Re: [PATCH] f2fs: read page index before freeing

2018-11-26 Thread Chao Yu
On 2018/11/27 8:22, PanBian wrote:
> On Mon, Nov 26, 2018 at 07:07:08PM +0800, Chao Yu wrote:
>> On 2018/11/26 18:28, PanBian wrote:
>>> On Mon, Nov 26, 2018 at 05:13:53PM +0800, Chao Yu wrote:
>>>> Hi Pan,
>>>>
>>>> On 2018/11/22 18:58, Pan Bian wrote:
>>>>> The function truncate_node frees the page with f2fs_put_page. However,
>>>>> the page index is read after that. So, the patch reads the index before
>>>>> freeing the page.
>>>>
>>>> I notice that you found another use-after-free bug in ext4, out of
>>>> curiosity, I'd like to ask how do you find those bugs? by tool or code 
>>>> review?
>>>
>>> I found such bugs by the aid of a tool I wrote recently. I designed a method
>>> to automatically find paired alloc/free functions. With such functions, I
>>> wrote two checkers, one to check mismatched alloc/free bugs, the other to
>>> check use-after-free and double-free bugs.
>>
>> Excellent! Do you have any plan to open its source or announce it w/ binary
>> to linux kernel developers, I think w/ it we can help to improve kernel's
>> code quality efficiently.
> 
> Yes. I am now writing a paper about the method. I will open the source code
> as soon as I complete the paper and some optimizations.

Cool, if there is any progress, please let f2fs guys know, thank you in
advance. :)

Thanks,

> 
> Best,
> Pan
> 



Re: [PATCH] f2fs: read page index before freeing

2018-11-26 Thread Chao Yu
Hi Pan,

On 2018/11/22 18:58, Pan Bian wrote:
> The function truncate_node frees the page with f2fs_put_page. However,
> the page index is read after that. So, the patch reads the index before
> freeing the page.

I noticed that you found another use-after-free bug in ext4. Out of
curiosity, how did you find those bugs? With a tool, or by code review?

Thanks,



Re: [PATCH] f2fs: read page index before freeing

2018-11-22 Thread Chao Yu
Hi Pan,

On 2018/11/22 18:58, Pan Bian wrote:
> The function truncate_node frees the page with f2fs_put_page. However,
> the page index is read after that. So, the patch reads the index before
> freeing the page.

Good catch!

It will be better to add:

Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
Cc: 

> 
> Signed-off-by: Pan Bian 

Reviewed-by: Chao Yu 

Thanks,
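
For readers following the thread, here is a minimal user-space sketch of the
pattern behind the fix quoted above. It is not the f2fs code: struct node_page,
get_node_page(), put_node_page() and truncate_node_sketch() are hypothetical
names used only for illustration. The point is that any field still needed
(here the index) is copied into a local before the reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted page. */
struct node_page {
	unsigned long index;
	int refcount;
};

static struct node_page *get_node_page(unsigned long index)
{
	struct node_page *page = malloc(sizeof(*page));

	if (!page)
		return NULL;
	page->index = index;
	page->refcount = 1;
	return page;
}

/* Drop one reference; the page is freed when the last one goes away. */
static void put_node_page(struct node_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0)
		free(page);
}

/*
 * Read everything that is still needed from the page before putting it.
 * Touching page->index after put_node_page() would be a use-after-free.
 */
static unsigned long truncate_node_sketch(struct node_page *page)
{
	unsigned long index = page->index;	/* read first */

	put_node_page(page);			/* then release */
	return index;				/* only the saved copy is used */
}
```

Saving the value in a local keeps the release call where it was, so the only
change is the order of the read and the put.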



[PATCH v2] f2fs: add bio cache for IPU

2018-11-18 Thread Chao Yu
SQLite in WAL mode may trigger sequential IPU writes to the db-wal file. After
commit d1b3e72d5490 ("f2fs: submit bio of in-place-update pages"), we
lost the chance to merge pages in the internally managed bio cache, which
results in submitting more small-sized IOs.

So let's add a temporary bio in writepages() to cache mergeable write IO as
much as possible.

Test case:
1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"

Before:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096

After:
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
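
The merge decision behind the change in the traces above can be sketched in
user space as follows. This is only an illustrative model under stated
assumptions, not the kernel implementation: struct bio_cache, cache_add_block(),
cache_submit(), write_blocks() and the SKETCH_MAX_PAGES limit are hypothetical.
A block is merged into the cached bio only when it directly follows the last
cached block; otherwise the cached bio is submitted first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PAGES 256	/* assumed per-bio capacity, illustration only */

/* Hypothetical model of a cached write bio: one run of contiguous blocks. */
struct bio_cache {
	bool active;			/* is there a cached bio? */
	unsigned long last_block;	/* last block added to it */
	unsigned int nr_pages;		/* pages currently cached */
	unsigned int submitted;		/* bios sent to the device so far */
};

static void cache_submit(struct bio_cache *c)
{
	if (c->active) {
		c->submitted++;
		c->active = false;
		c->nr_pages = 0;
	}
}

/* Add one block; flush first if it cannot be merged into the cached bio. */
static void cache_add_block(struct bio_cache *c, unsigned long blkaddr)
{
	if (c->active && (c->last_block + 1 != blkaddr ||
			  c->nr_pages == SKETCH_MAX_PAGES))
		cache_submit(c);

	if (!c->active) {
		c->active = true;
		c->nr_pages = 0;
	}
	c->last_block = blkaddr;
	c->nr_pages++;
}

/* Write a list of blocks and return how many bios were submitted. */
static unsigned int write_blocks(const unsigned long *blks, size_t n)
{
	struct bio_cache c = { 0 };
	size_t i;

	assert(blks != NULL || n == 0);
	for (i = 0; i < n; i++)
		cache_add_block(&c, blks[i]);
	cache_submit(&c);	/* the caller must flush what is still cached */
	return c.submitted;
}
```

In this model, sixteen consecutive blocks produce a single submission, while a
gap in the block addresses forces an extra one. The final cache_submit() call
matters: a bio left in the cache is never issued.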

Signed-off-by: Chao Yu 
---
v2:
- submit cached bio for cp error case.
 fs/f2fs/data.c| 61 +--
 fs/f2fs/f2fs.h|  3 +++
 fs/f2fs/segment.c |  5 +++-
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8780f3d737c4..7dffafb8b2c5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -474,6 +474,49 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 	return 0;
 }
 
+int f2fs_merge_page_bio(struct f2fs_io_info *fio)
+{
+	struct bio *bio = *fio->bio;
+	struct page *page = fio->encrypted_page ?
+			fio->encrypted_page : fio->page;
+
+	if (!f2fs_is_valid_blkaddr(fio->sbi, fio->new_blkaddr,
+			__is_meta_io(fio) ? META_GENERIC : DATA_GENERIC))
+		return -EFAULT;
+
+	trace_f2fs_submit_page_bio(page, fio);
+	f2fs_trace_ios(fio, 0);
+
+	if (bio && (*fio->last_block + 1 != fio->new_blkaddr ||
+			!__same_bdev(fio->sbi, fio->new_blkaddr, bio))) {
+		__submit_bio(fio->sbi, bio, fio->type);
+		bio = NULL;
+	}
+alloc_new:
+	if (!bio) {
+		bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
+				BIO_MAX_PAGES, false, fio->type, fio->temp);
+		*fio->last_block = fio->new_blkaddr;
+		bio_set_op_attrs(bio, fio->op, fio->op_flags);
+	}
+
+	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
+		__submit_bio(fio->sbi, bio, fio->type);
+		bio = NULL;
+		goto alloc_new;
+	}
+
+	if (fio->io_wbc)
+		wbc_account_io(fio->io_wbc, page, PAGE_SIZE);
+
+	*fio->last_block = fio->new_blkaddr;
+
+	inc_page_count(fio->sbi, WB_DATA_TYPE(fio->page));
+
+	*fio->bio = bio;
+	return 0;
+}
+
 void f2fs_submit_page_write(struct f2fs_io_info *fio)
 {
 	struct f2fs_sb_info *sbi = fio->sbi;
@@ -1894,6 +1937,8 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
 }
 
 static int __write_data_page(struct page *page, bool *submitted,
+				struct bio **bio,
+				sector_t *last_block,
 				struct writeback_control *wbc,
 				enum iostat_type io_type)
 {
@@ -1919,6 +1964,8 @@ static int __write_data_page(struct page *page, bool *submitted,
.need_

[PATCH] f2fs: add to account direct IO

2018-10-31 Thread Chao Yu
This patch adds f2fs_dio_submit_bio() to hook the submit_io/end_io functions
in the direct IO path, in order to account DIO.

Later, we will add this count into is_idle() to let the background GC/discard
threads be aware of DIO.
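
The hooking idea can be sketched outside the kernel as below. This is a
simplified model, assuming a hypothetical struct request with a completion
callback; acct_submit(), acct_end_io() and mark_completed() are illustrative
names, not kernel APIs. The original callback and private data are saved,
replaced by an accounting wrapper, and restored before the original completion
runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified request with a completion callback. */
struct request {
	void (*end_io)(struct request *req);
	void *private;
	int completed;
};

/* Saved state so the original completion can be restored. */
struct acct_ctx {
	void (*orig_end_io)(struct request *req);
	void *orig_private;
	int *inflight;		/* counter being maintained */
};

static void acct_end_io(struct request *req)
{
	struct acct_ctx *ctx = req->private;

	assert(ctx != NULL);
	(*ctx->inflight)--;

	/* Restore the caller's callback and data, then chain to it. */
	req->end_io = ctx->orig_end_io;
	req->private = ctx->orig_private;
	free(ctx);

	req->end_io(req);
}

/* Wrap the request's completion so that in-flight I/O is counted. */
static int acct_submit(struct request *req, int *inflight)
{
	struct acct_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return -1;

	ctx->orig_end_io = req->end_io;
	ctx->orig_private = req->private;
	ctx->inflight = inflight;

	req->end_io = acct_end_io;
	req->private = ctx;
	(*inflight)++;
	return 0;
}

/* Example of an original completion handler. */
static void mark_completed(struct request *req)
{
	req->completed = 1;
}
```

The counter is incremented at submission and decremented at completion, so a
reader of the counter sees how many requests are in flight at any moment.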

Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c  | 51 -
 fs/f2fs/debug.c |  4 
 fs/f2fs/f2fs.h  | 10 ++
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 3a874bedc923..b5c711b35f8e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2601,6 +2601,53 @@ static int check_direct_IO(struct inode *inode, struct iov_iter *iter,
 	return 0;
 }
 
+static void f2fs_dio_end_io(struct bio *bio)
+{
+	struct f2fs_private_dio *dio = bio->bi_private;
+
+	dec_page_count(F2FS_I_SB(dio->inode),
+			dio->write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
+
+	bio->bi_private = dio->orig_private;
+	bio->bi_end_io = dio->orig_end_io;
+
+	kfree(dio);
+
+	bio_endio(bio);
+}
+
+static void f2fs_dio_submit_bio(struct bio *bio, struct inode *inode,
+							loff_t file_offset)
+{
+	struct f2fs_private_dio *dio;
+	bool write = (bio_op(bio) == REQ_OP_WRITE);
+	int err;
+
+	dio = f2fs_kzalloc(F2FS_I_SB(inode),
+			sizeof(struct f2fs_private_dio), GFP_NOFS);
+	if (!dio) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	dio->inode = inode;
+	dio->orig_end_io = bio->bi_end_io;
+	dio->orig_private = bio->bi_private;
+	dio->write = write;
+
+	bio->bi_end_io = f2fs_dio_end_io;
+	bio->bi_private = dio;
+
+	inc_page_count(F2FS_I_SB(inode),
+			write ? F2FS_DIO_WRITE : F2FS_DIO_READ);
+
+	submit_bio(bio);
+	return;
+out:
+	bio->bi_error = err;
+	bio_endio(bio);
+}
+
 static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
struct address_space *mapping = iocb->ki_filp->f_mapping;
@@ -2647,7 +2694,9 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
down_read(>i_gc_rwsem[READ]);
}
 
-   err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio);
+   err = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
+   iter, get_data_block_dio, NULL, f2fs_dio_submit_bio,
+   DIO_LOCKING | DIO_SKIP_HOLES);
 
if (do_opu)
up_read(>i_gc_rwsem[READ]);
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index e327eefdbc02..06e72f9c8654 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -53,6 +53,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
 	si->vw_cnt = atomic_read(&sbi->vw_cnt);
 	si->max_aw_cnt = atomic_read(&sbi->max_aw_cnt);
 	si->max_vw_cnt = atomic_read(&sbi->max_vw_cnt);
+	si->nr_dio_read = get_pages(sbi, F2FS_DIO_READ);
+	si->nr_dio_write = get_pages(sbi, F2FS_DIO_WRITE);
 	si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA);
 	si->nr_wb_data = get_pages(sbi, F2FS_WB_DATA);
 	si->nr_rd_data = get_pages(sbi, F2FS_RD_DATA);
@@ -374,6 +376,8 @@ static int stat_show(struct seq_file *s, void *v)
 		seq_printf(s, "  - Inner Struct Count: tree: %d(%d), node: %d\n",
 				si->ext_tree, si->zombie_tree, si->ext_node);
 		seq_puts(s, "\nBalancing F2FS Async:\n");
+		seq_printf(s, "  - DIO (R: %4d, W: %4d)\n",
+			   si->nr_dio_read, si->nr_dio_write);
 		seq_printf(s, "  - IO_R (Data: %4d, Node: %4d, Meta: %4d\n",
 			   si->nr_rd_data, si->nr_rd_node, si->nr_rd_meta);
 		seq_printf(s, "  - IO_W (CP: %4d, Data: %4d, Flush: (%4d %4d %4d), "
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index b9cec3f2184c..2363512c0e1c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -957,6 +957,8 @@ enum count_type {
F2FS_RD_DATA,
F2FS_RD_NODE,
F2FS_RD_META,
+   F2FS_DIO_WRITE,
+   F2FS_DIO_READ,
NR_COUNT_TYPE,
 };
 
@@ -1336,6 +1338,13 @@ struct f2fs_sb_info {
__u32 s_chksum_seed;
 };
 
+struct f2fs_private_dio {
+   struct inode *inode;
+   void *orig_private;
+   bio_end_io_t *orig_end_io;
+   bool write;
+};
+
 #ifdef CONFIG_F2FS_FAULT_INJECTION
 #define f2fs_show_injection_info(type) \
printk_ratelimited("%sF2FS-fs : inject %s in %s of %pF\n",  \
@@ -3158,6 +3167,7 @@ struct f2fs_stat_info {
int total_count, utilization;
int bg_gc, nr_wb_cp_data, nr_wb_data;
int nr_rd_data, nr_rd_node, nr_rd_meta;
+   int nr_dio_read, nr_dio_write;
unsigned int io_skip_bggc, other_skip_bggc;
int nr_flushing, nr_flushed, flush_list_empty;
int nr_discarding, nr_discarded;
-- 
2.18.0.rc1



Re: [f2fs-dev] [PATCH] f2fs: clear cold data flag if IO is not counted

2018-10-15 Thread Chao Yu
On 2018/10/11 5:22, Jaegeuk Kim wrote:
> If we clear the cold data flag out of the writeback flow, we can miscount
> -1 by end_io.

I didn't get it, which count do you mean?

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/data.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 29a9d3b8f709..4102799b5558 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2636,10 +2636,6 @@ static int f2fs_set_data_page_dirty(struct page *page)
>   if (!PageUptodate(page))
>   SetPageUptodate(page);
>  
> - /* don't remain PG_checked flag which was set during GC */
> - if (is_cold_data(page))
> - clear_cold_data(page);
> -
>   if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) {
>   if (!IS_ATOMIC_WRITTEN_PAGE(page)) {
>   f2fs_register_inmem_page(inode, page);
> 



Re: [f2fs-dev] [PATCH] f2fs: fix quota info to adjust recovered data

2018-09-30 Thread Chao Yu
On 2018-10-1 9:27, Jaegeuk Kim wrote:
> On 10/01, Chao Yu wrote:
>> On 2018-10-1 7:58, Jaegeuk Kim wrote:
>>> On 09/29, Chao Yu wrote:
>>>> On 2018/9/29 7:40, Jaegeuk Kim wrote:
>>>>> Testing other fix.
>>>>>
>>>>> ---
>>>>>  fs/f2fs/checkpoint.c |  7 +++
>>>>>  fs/f2fs/f2fs.h   |  1 +
>>>>>  fs/f2fs/gc.c | 10 +-
>>>>>  fs/f2fs/super.c  | 22 +-
>>>>>  4 files changed, 38 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>>> index 524b87667cf4..3fde91f41a91 100644
>>>>> --- a/fs/f2fs/checkpoint.c
>>>>> +++ b/fs/f2fs/checkpoint.c
>>>>> @@ -1494,6 +1494,7 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>  {
>>>>>   struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
>>>>>   unsigned long long ckpt_ver;
>>>>> + bool need_up = false;
>>>>>   int err = 0;
>>>>>  
>>>>>   if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
>>>>> @@ -1506,6 +1507,10 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>   f2fs_msg(sbi->sb, KERN_WARNING,
>>>>>   "Start checkpoint disabled!");
>>>>>   }
>>>>> + if (!is_sbi_flag_set(sbi, SBI_QUOTA_INIT)) {
>>>>> + need_up = true;
>>>>> + down_read(&sbi->sb->s_umount);
>>>>
>>>> This is to avoid show warning when calling dquot_writeback_dquots() in
>>>> f2fs_quota_sync(), right?
>>>
>>> Yup. Unfortunately, this can't fix all the issues, so I'm testing trylock
>>> simply in this case.
>>
>> Oh, that's just warning, it could not be harmful, I think we can simply remove
>> WARN_ON_ONCE in dquot_writeback_dquots to fix this?
> 
> Well, I think it'd be better to keep it.

Shall we ask the maintainer of the quota subsystem for a suggestion?

Thanks,

> 
>>
>>>
>>>>
>>>>> + }
>>>>>   mutex_lock(&sbi->cp_mutex);
>>>>>  
>>>>>   if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
>>>>> @@ -1582,6 +1587,8 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
>>>>>   trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
>>>>>  out:
>>>>>   mutex_unlock(&sbi->cp_mutex);
>>>>> + if (need_up)
>>>>> + up_read(&sbi->sb->s_umount);
>>>>>   return err;
>>>>>  }
>>>>>  
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index 57c829dd107e..30194f2f108e 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -1096,6 +1096,7 @@ enum {
>>>>>   SBI_IS_SHUTDOWN,/* shutdown by ioctl */
>>>>>   SBI_IS_RECOVERED,   /* recovered orphan/data */
>>>>>   SBI_CP_DISABLED,/* CP was disabled last mount */
>>>>> + SBI_QUOTA_INIT, /* avoid sb->s_umount lock */
>>>>>   SBI_QUOTA_NEED_FLUSH,   /* need to flush quota info in CP */
>>>>>   SBI_QUOTA_SKIP_FLUSH,   /* skip flushing quota in current CP */
>>>>>   SBI_QUOTA_NEED_REPAIR,  /* quota file may be corrupted */
>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>>> index adaf5a695b12..deece448cb3b 100644
>>>>> --- a/fs/f2fs/gc.c
>>>>> +++ b/fs/f2fs/gc.c
>>>>> @@ -55,9 +55,14 @@ static int gc_thread_func(void *data)
>>>>>   f2fs_stop_checkpoint(sbi, false);
>>>>>   }
>>>>>  
>>>>> - if (!sb_start_write_trylock(sbi->sb))
>>>>> + if (!down_read_trylock(&sbi->sb->s_umount))
>>>>>   continue;
>>>>>  
>>>>> + set_sbi_flag(sbi, SBI_QUOTA_INIT);
>>>>> +
>>>>> + if (!sb_start_write_trylock(sbi->sb))
>>>>> + goto next_umount;
>&g

[PATCH 1/2] f2fs: add to account meta IO

2018-09-29 Thread Chao Yu
This patch adds accounting of meta IO, so that write IO from f2fs can be
shown more comprehensively via the 'status' debugfs entry.

Signed-off-by: Chao Yu 
---
 fs/f2fs/debug.c   | 13 +
 fs/f2fs/f2fs.h| 15 +++
 fs/f2fs/segment.c |  1 +
 3 files changed, 29 insertions(+)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index d3c402183e3c..da1cabbc4973 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -118,6 +118,9 @@ static void update_general_status(struct f2fs_sb_info *sbi)
 		si->curzone[i] = GET_ZONE_FROM_SEC(sbi, si->cursec[i]);
 	}
 
+	for (i = META_CP; i < META_MAX; i++)
+		si->meta_count[i] = atomic_read(&sbi->meta_count[i]);
+
 	for (i = 0; i < 2; i++) {
 		si->segment_count[i] = sbi->segment_count[i];
 		si->block_count[i] = sbi->block_count[i];
@@ -329,6 +332,13 @@ static int stat_show(struct seq_file *s, void *v)
   si->prefree_count, si->free_segs, si->free_secs);
seq_printf(s, "CP calls: %d (BG: %d)\n",
si->cp_count, si->bg_cp_count);
+   seq_printf(s, "  - cp blocks : %u\n", si->meta_count[META_CP]);
+   seq_printf(s, "  - sit blocks : %u\n",
+   si->meta_count[META_SIT]);
+   seq_printf(s, "  - nat blocks : %u\n",
+   si->meta_count[META_NAT]);
+   seq_printf(s, "  - ssa blocks : %u\n",
+   si->meta_count[META_SSA]);
seq_printf(s, "GC calls: %d (BG: %d)\n",
   si->call_count, si->bg_gc);
seq_printf(s, "  - data segments : %d (%d)\n",
@@ -441,6 +451,7 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi)
 {
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
struct f2fs_stat_info *si;
+   int i;
 
si = f2fs_kzalloc(sbi, sizeof(struct f2fs_stat_info), GFP_KERNEL);
if (!si)
@@ -466,6 +477,8 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi)
 	atomic_set(&sbi->inline_inode, 0);
 	atomic_set(&sbi->inline_dir, 0);
 	atomic_set(&sbi->inplace_count, 0);
+	for (i = META_CP; i < META_MAX; i++)
+		atomic_set(&sbi->meta_count[i], 0);
 
 	atomic_set(&sbi->aw_cnt, 0);
 	atomic_set(&sbi->vw_cnt, 0);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index b1738113e821..0f82f342e514 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -202,6 +202,7 @@ enum {
META_NAT,
META_SIT,
META_SSA,
+   META_MAX,
META_POR,
DATA_GENERIC,
META_GENERIC,
@@ -1267,6 +1268,7 @@ struct f2fs_sb_info {
 */
 #ifdef CONFIG_F2FS_STAT_FS
struct f2fs_stat_info *stat_info;   /* FS status information */
+   atomic_t meta_count[META_MAX];  /* # of meta blocks */
unsigned int segment_count[2];  /* # of allocated segments */
unsigned int block_count[2];/* # of allocated blocks */
atomic_t inplace_count; /* # of inplace update */
@@ -3146,6 +3148,7 @@ struct f2fs_stat_info {
int cursec[NR_CURSEG_TYPE];
int curzone[NR_CURSEG_TYPE];
 
+   unsigned int meta_count[META_MAX];
unsigned int segment_count[2];
unsigned int block_count[2];
unsigned int inplace_count;
@@ -3197,6 +3200,17 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
 		if (f2fs_has_inline_dentry(inode))			\
 			(atomic_dec(&F2FS_I_SB(inode)->inline_dir));	\
 	} while (0)
+#define stat_inc_meta_count(sbi, blkaddr)				\
+	do {								\
+		if (blkaddr < SIT_I(sbi)->sit_base_addr)		\
+			atomic_inc(&(sbi)->meta_count[META_CP]);	\
+		else if (blkaddr < NM_I(sbi)->nat_blkaddr)		\
+			atomic_inc(&(sbi)->meta_count[META_SIT]);	\
+		else if (blkaddr < SM_I(sbi)->ssa_blkaddr)		\
+			atomic_inc(&(sbi)->meta_count[META_NAT]);	\
+		else if (blkaddr < SM_I(sbi)->main_blkaddr)		\
+			atomic_inc(&(sbi)->meta_count[META_SSA]);	\
+	} while (0)
 #define stat_inc_seg_type(sbi, curseg) \
((sbi)->segment_count[(curseg)->alloc_type]++)
 #define stat_inc_block_count(sbi, curseg)  \
@@ -3284,6 +3298,7 @@ void f2fs_destroy_root_stats(void);
 #define stat_inc_volatile_write(inode) do { } while (0)
 #define stat_dec_volatile_write(inode) do { }

[PATCH v3] Revert: "f2fs: check last page index in cached bio to decide submission"

2018-09-27 Thread Chao Yu
From: Chao Yu 

There is one case in which we can leave a bio cached in f2fs, resulting in a
hung page writeback waiter.

Thread A					Thread B
- f2fs_write_cache_pages
 - f2fs_submit_page_write
 page #0 cached in bio #0 of cold log
 - f2fs_submit_page_write
 page #1 cached in bio #1 of warm log
						- f2fs_write_cache_pages
						 - f2fs_submit_page_write
						 bio is full, submit bio #1 contain page #1
 - f2fs_submit_merged_write_cond(, page #1)
 fail to submit bio #0 due to page #1 is not in any cached bios.

Signed-off-by: Chao Yu 
---
v3:
- only pass page parameter for f2fs_submit_merged_write_cond in reclaimer path
 fs/f2fs/checkpoint.c |  3 +--
 fs/f2fs/data.c   | 38 +++---
 fs/f2fs/f2fs.h   |  4 ++--
 fs/f2fs/node.c   | 11 +--
 fs/f2fs/segment.c| 11 +--
 5 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d312d2829d5a..dbffb3f6c5c7 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -279,8 +279,7 @@ static int __f2fs_write_meta_page(struct page *page,
dec_page_count(sbi, F2FS_DIRTY_META);
 
if (wbc->for_reclaim)
-   f2fs_submit_merged_write_cond(sbi, page->mapping->host,
-   0, page->index, META);
+   f2fs_submit_merged_write_cond(sbi, NULL, page, 0, META);
 
unlock_page(page);
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 36998b078b1b..02d5ce888a4a 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -322,8 +322,8 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
io->bio = NULL;
 }
 
-static bool __has_merged_page(struct f2fs_bio_info *io,
-   struct inode *inode, nid_t ino, pgoff_t idx)
+static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode,
+   struct page *page, nid_t ino)
 {
struct bio_vec *bvec;
struct page *target;
@@ -332,7 +332,7 @@ static bool __has_merged_page(struct f2fs_bio_info *io,
if (!io->bio)
return false;
 
-   if (!inode && !ino)
+   if (!inode && !page && !ino)
return true;
 
bio_for_each_segment_all(bvec, io->bio, i) {
@@ -342,11 +342,10 @@ static bool __has_merged_page(struct f2fs_bio_info *io,
else
target = fscrypt_control_page(bvec->bv_page);
 
-   if (idx != target->index)
-   continue;
-
if (inode && inode == target->mapping->host)
return true;
+   if (page && page == target)
+   return true;
if (ino && ino == ino_of_node(target))
return true;
}
@@ -355,7 +354,8 @@ static bool __has_merged_page(struct f2fs_bio_info *io,
 }
 
 static bool has_merged_page(struct f2fs_sb_info *sbi, struct inode *inode,
-   nid_t ino, pgoff_t idx, enum page_type type)
+   struct page *page, nid_t ino,
+   enum page_type type)
 {
enum page_type btype = PAGE_TYPE_OF_BIO(type);
enum temp_type temp;
@@ -366,7 +366,7 @@ static bool has_merged_page(struct f2fs_sb_info *sbi, struct inode *inode,
io = sbi->write_io[btype] + temp;
 
down_read(&io->io_rwsem);
-   ret = __has_merged_page(io, inode, ino, idx);
+   ret = __has_merged_page(io, inode, page, ino);
up_read(&io->io_rwsem);
 
/* TODO: use HOT temp only for meta pages now. */
@@ -397,12 +397,12 @@ static void __f2fs_submit_merged_write(struct 
f2fs_sb_info *sbi,
 }
 
 static void __submit_merged_write_cond(struct f2fs_sb_info *sbi,
-   struct inode *inode, nid_t ino, pgoff_t idx,
-   enum page_type type, bool force)
+   struct inode *inode, struct page *page,
+   nid_t ino, enum page_type type, bool force)
 {
enum temp_type temp;
 
-   if (!force && !has_merged_page(sbi, inode, ino, idx, type))
+   if (!force && !has_merged_page(sbi, inode, page, ino, type))
return;
 
for (temp = HOT; temp < NR_TEMP_TYPE; temp++) {
@@ -421,10 +421,10 @@ void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, 
enum page_type type)
 }
 
 void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi,
-   struct inode *inode, nid_t ino, pgoff_t idx,
-   enum page_type type)
+

Re: [f2fs-dev] [PATCH 2/2 v3] f2fs: avoid f2fs_bug_on if f2fs_get_meta_page_nofail got EIO

2018-09-27 Thread Chao Yu
On 2018-9-21 5:48, Jaegeuk Kim wrote:
> This patch avoids BUG_ON when f2fs_get_meta_page_nofail got EIO during
> xfstests/generic/475.
> 
> Signed-off-by: Jaegeuk Kim 
Reviewed-by: Chao Yu 

Thanks,


Re: [PATCH v3] f2fs: submit cached bio to avoid endless PageWriteback

2018-09-25 Thread Chao Yu
On 2018/9/26 8:20, Jaegeuk Kim wrote:
> On 09/21, Chao Yu wrote:
>> On 2018/9/18 10:14, Chao Yu wrote:
>>> On 2018/9/18 10:02, Jaegeuk Kim wrote:
>>>> On 09/18, Chao Yu wrote:
>>>>> On 2018/9/18 9:37, Jaegeuk Kim wrote:
>>>>>> On 09/18, Chao Yu wrote:
>>>>>>> On 2018/9/18 9:04, Jaegeuk Kim wrote:
>>>>>>>> On 09/13, Chao Yu wrote:
>>>>>>>>> From: Chao Yu 
>>>>>>>>>
>>>>>>>>> When migrating encrypted block from background GC thread, we only add
>>>>>>>>> them into f2fs inner bio cache, but forget to submit the cached bio, it
>>>>>>>>> may cause potential deadlock when we are waiting page writebacked, fix
>>>>>>>>> it.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Chao Yu 
>>>>>>>>> ---
>>>>>>>>> v3:
>>>>>>>>> clean up codes suggested by Jaegeuk.
>>>>>>>>>  fs/f2fs/f2fs.h |  2 +-
>>>>>>>>>  fs/f2fs/gc.c   | 71 
>>>>>>>>> +++---
>>>>>>>>>  fs/f2fs/node.c | 13 ++---
>>>>>>>>>  3 files changed, 61 insertions(+), 25 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>>>> index b676b82312e0..917b2ca76aac 100644
>>>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>>>> @@ -2869,7 +2869,7 @@ struct page *f2fs_new_node_page(struct 
>>>>>>>>> dnode_of_data *dn, unsigned int ofs);
>>>>>>>>>  void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
>>>>>>>>>  struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t 
>>>>>>>>> nid);
>>>>>>>>>  struct page *f2fs_get_node_page_ra(struct page *parent, int start);
>>>>>>>>> -void f2fs_move_node_page(struct page *node_page, int gc_type);
>>>>>>>>> +int f2fs_move_node_page(struct page *node_page, int gc_type);
>>>>>>>>>  int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode 
>>>>>>>>> *inode,
>>>>>>>>>   struct writeback_control *wbc, bool atomic,
>>>>>>>>>   unsigned int *seq_id);
>>>>>>>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>>>>>>>> index a4c1a419611d..f57622cfe058 100644
>>>>>>>>> --- a/fs/f2fs/gc.c
>>>>>>>>> +++ b/fs/f2fs/gc.c
>>>>>>>>> @@ -461,7 +461,7 @@ static int check_valid_map(struct f2fs_sb_info 
>>>>>>>>> *sbi,
>>>>>>>>>   * On validity, copy that node with cold status, otherwise (invalid 
>>>>>>>>> node)
>>>>>>>>>   * ignore that.
>>>>>>>>>   */
>>>>>>>>> -static void gc_node_segment(struct f2fs_sb_info *sbi,
>>>>>>>>> +static int gc_node_segment(struct f2fs_sb_info *sbi,
>>>>>>>>>   struct f2fs_summary *sum, unsigned int segno, int 
>>>>>>>>> gc_type)
>>>>>>>>>  {
>>>>>>>>>   struct f2fs_summary *entry;
>>>>>>>>> @@ -469,6 +469,7 @@ static void gc_node_segment(struct f2fs_sb_info 
>>>>>>>>> *sbi,
>>>>>>>>>   int off;
>>>>>>>>>   int phase = 0;
>>>>>>>>>   bool fggc = (gc_type == FG_GC);
>>>>>>>>> + int submitted = 0;
>>>>>>>>>  
>>>>>>>>>   start_addr = START_BLOCK(sbi, segno);
>>>>>>>>>  
>>>>>>>>> @@ -482,10 +483,11 @@ static void gc_node_segment(struct f2fs_sb_info 
>>>>>>>>> *sbi,
>>>>>>>>>   nid_t nid = le32_to_cpu(entry->nid);
>>>>>>>>>   struct page *node_page;
>>>>>>>>>   struct node_info ni;
>>>>&

Re: [PATCH] jfs: remove redundant dquot_initialize() in jfs_evict_inode()

2018-09-20 Thread Chao Yu
On 2018/9/20 22:24, Dave Kleikamp wrote:
> On 9/20/18 9:18 AM, Chao Yu wrote:
>> Ping,
>>
>> Any comments?
> 
> Sorry for putting it off. It looks good to me. I'll push it upstream.

Thanks for your review. ;)

Thanks,

> 
> Thanks,
> Dave
> 
>>
>> On 2018/9/17 15:12, Chao Yu wrote:
>>> We don't need to call dquot_initialize() twice in jfs_evict_inode(),
>>> remove one of them for cleanup.
>>>
>>> Signed-off-by: Chao Yu 
>>> ---
>>>  fs/jfs/inode.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
>>> index 054cc761b426..805ae9e8944a 100644
>>> --- a/fs/jfs/inode.c
>>> +++ b/fs/jfs/inode.c
>>> @@ -166,7 +166,6 @@ void jfs_evict_inode(struct inode *inode)
>>> /*
>>>  * Free the inode from the quota allocation.
>>>  */
>>> -   dquot_initialize(inode);
>>> dquot_free_inode(inode);
>>> }
>>> } else {
>>>


Re: [PATCH] jfs: remove redundant dquot_initialize() in jfs_evict_inode()

2018-09-20 Thread Chao Yu
Ping,

Any comments?

On 2018/9/17 15:12, Chao Yu wrote:
> We don't need to call dquot_initialize() twice in jfs_evict_inode(),
> remove one of them for cleanup.
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/jfs/inode.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
> index 054cc761b426..805ae9e8944a 100644
> --- a/fs/jfs/inode.c
> +++ b/fs/jfs/inode.c
> @@ -166,7 +166,6 @@ void jfs_evict_inode(struct inode *inode)
>   /*
>* Free the inode from the quota allocation.
>*/
> - dquot_initialize(inode);
>   dquot_free_inode(inode);
>   }
>   } else {
> 


Re: [PATCH] f2fs: avoid GC causing encrypted file corrupted

2018-09-18 Thread Chao Yu
On 2018/9/18 20:39, Yunlong Song wrote:
> The encrypted file may be corrupted by GC in following case:
> 
> Time 1: | segment 1 blkaddr = A |  GC -> | segment 2 blkaddr = B |
> Encrypted block 1 is moved from blkaddr A of segment 1 to blkaddr B of
> segment 2,
> 
> Time 2: | segment 1 blkaddr = B |  GC -> | segment 3 blkaddr = C |
> 
> Before page 1 is written back and if segment 2 become a victim, then
> page 1 is moved from blkaddr B of segment 2 to blkaddr C of segment 3,
> during the GC process of Time 2, f2fs should wait for page 1 written back
> before reading it, or move_data_block will read a garbage block from
> blkaddr B since page is not written back to blkaddr B yet.
> 
> Commit 6aa58d8a ("f2fs: readahead encrypted block during GC") introduce
> ra_data_block to read encrypted block, but it forgets to add
> f2fs_wait_on_page_writeback to avoid racing between GC and flush.
> 
> Signed-off-by: Yunlong Song 

Reviewed-by: Chao Yu 

Thanks,


Re: [PATCH] ext4: fix to propagate error from dquot_initialize()

2018-09-17 Thread Chao Yu
Hi Shilong,

On 2018/9/17 16:18, Wang Shilong wrote:
> Hi Chao,
> 
>    I sent a early patch  series which included this but forgot  to send to
> f2fs list..
> https://patchwork.ozlabs.org/patch/968759/

Oh, sorry, I didn't notice that one.

To all, please ignore this duplicate patch.

> 
> It looks Ted still have some questions for my first patch, so that second
> patch is not applied yet..
> 
> Thanks,
> 



[PATCH] ext4: fix to propagate error from dquot_initialize()

2018-09-17 Thread Chao Yu
In ext4_ioctl_setproject(), we forgot to check the error returned by
dquot_initialize(). If we ignore such an error, quota info can later
become out of date. Fix it.

Signed-off-by: Chao Yu 
---
 fs/ext4/ioctl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index a7074115d6f6..e6d11cb07e87 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -364,7 +364,9 @@ static int ext4_ioctl_setproject(struct file *filp, __u32 projid)
brelse(iloc.bh);
}
 
-   dquot_initialize(inode);
+   err = dquot_initialize(inode);
+   if (err)
+   goto out_unlock;
 
handle = ext4_journal_start(inode, EXT4_HT_QUOTA,
EXT4_QUOTA_INIT_BLOCKS(sb) +
-- 
2.18.0.rc1



[PATCH] jfs: remove redundant dquot_initialize() in jfs_evict_inode()

2018-09-17 Thread Chao Yu
We don't need to call dquot_initialize() twice in jfs_evict_inode(),
remove one of them for cleanup.

Signed-off-by: Chao Yu 
---
 fs/jfs/inode.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 054cc761b426..805ae9e8944a 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -166,7 +166,6 @@ void jfs_evict_inode(struct inode *inode)
/*
 * Free the inode from the quota allocation.
 */
-   dquot_initialize(inode);
dquot_free_inode(inode);
}
} else {
-- 
2.18.0.rc1



Re: [PATCH] Revert "staging: erofs: disable compiling temporarile"

2018-09-12 Thread Chao Yu
Hi Stephen,

On 2018/9/12 15:34, Stephen Rothwell wrote:
> Hi Chao,
> 
> On Wed, 12 Sep 2018 15:19:16 +0800 Chao Yu  wrote:
>>
>> To make sure, did -next tree enable erofs compiling now?
> 
> Yes, from yesterday.

Great, thanks for your help. :)

> 
>> Xiang has made two patches to fix integration issue with other vfs changes,
>> and Greg and David have already picked them in their tree.
>>
>> staging: erofs: rename superblock flags (MS_xyz -> SB_xyz)
>> staging: erofs: update .mount and .remount_sb
> 
> I noticed, thanks.
> 



[PATCH] f2fs: submit cached bio to avoid endless PageWriteback

2018-09-11 Thread Chao Yu
When migrating encrypted blocks from the background GC thread, we only add
them to the f2fs inner bio cache but forget to submit the cached bio. This
may cause a potential deadlock when we wait for page writeback to finish.
Fix it.

Signed-off-by: Chao Yu 
---
 fs/f2fs/gc.c | 42 +-
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index c4ea4009cf05..a2ea0d445345 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -687,7 +687,7 @@ static int ra_data_block(struct inode *inode, pgoff_t index)
  * Move data block via META_MAPPING while keeping locked data page.
  * This can be used to move blocks, aka LBAs, directly on disk.
  */
-static void move_data_block(struct inode *inode, block_t bidx,
+static int move_data_block(struct inode *inode, block_t bidx,
int gc_type, unsigned int segno, int off)
 {
struct f2fs_io_info fio = {
@@ -706,25 +706,29 @@ static void move_data_block(struct inode *inode, block_t bidx,
struct node_info ni;
struct page *page, *mpage;
block_t newaddr;
-   int err;
+   int err = 0;
bool lfs_mode = test_opt(fio.sbi, LFS);
 
/* do not read out */
page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
if (!page)
-   return;
+   return -ENOMEM;
 
-   if (!check_valid_map(F2FS_I_SB(inode), segno, off))
+   if (!check_valid_map(F2FS_I_SB(inode), segno, off)) {
+   err = -ENOENT;
goto out;
+   }
 
if (f2fs_is_atomic_file(inode)) {
F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
+   err = -EAGAIN;
goto out;
}
 
if (f2fs_is_pinned_file(inode)) {
f2fs_pin_file_control(inode, true);
+   err = -EAGAIN;
goto out;
}
 
@@ -735,6 +739,7 @@ static void move_data_block(struct inode *inode, block_t bidx,
 
if (unlikely(dn.data_blkaddr == NULL_ADDR)) {
ClearPageUptodate(page);
+   err = -ENOENT;
goto put_out;
}
 
@@ -817,6 +822,7 @@ static void move_data_block(struct inode *inode, block_t bidx,
fio.new_blkaddr = newaddr;
f2fs_submit_page_write(&fio);
if (fio.retry) {
+   err = -EAGAIN;
if (PageWriteback(fio.encrypted_page))
end_page_writeback(fio.encrypted_page);
goto put_page_out;
@@ -840,6 +846,8 @@ static void move_data_block(struct inode *inode, block_t bidx,
f2fs_put_dnode(&dn);
 out:
f2fs_put_page(page, 1);
+
+   return err;
 }
 
 static void move_data_page(struct inode *inode, block_t bidx, int gc_type,
@@ -919,7 +927,7 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type,
  * If the parent node is not valid or the data block address is different,
  * the victim data block is ignored.
  */
-static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
+static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
struct gc_inode_list *gc_list, unsigned int segno, int gc_type)
 {
struct super_block *sb = sbi->sb;
@@ -927,6 +935,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
block_t start_addr;
int off;
int phase = 0;
+   int submitted = 0;
 
start_addr = START_BLOCK(sbi, segno);
 
@@ -943,7 +952,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 
/* stop BG_GC if there is not enough free sections. */
if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0))
-   return;
+   return submitted;
 
if (check_valid_map(sbi, segno, off) == 0)
continue;
@@ -1015,6 +1024,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
if (inode) {
struct f2fs_inode_info *fi = F2FS_I(inode);
bool locked = false;
+   int err;
 
if (S_ISREG(inode->i_mode)) {
if (!down_write_trylock(&fi->i_gc_rwsem[READ]))
@@ -1033,12 +1043,15 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 
start_bidx = f2fs_start_bidx_of_node(nofs, inode)
+ ofs_in_node;
-   if (f2fs_post_read_required(inode))
-   move_data_block(inode, start_bidx, gc_type,
-   segno, off);
-   else
+   if (f2f

Re: [PATCH 2/2] f2fs: fix to avoid quota inode leak in ->put_super

2018-09-07 Thread Chao Yu
I can see it in dev, thanks for merging. ;)

On 2018/9/8 6:38, Jaegeuk Kim wrote:
> I merged as one. Please check dev. :)
> 
> On 09/06, Chao Yu wrote:
>> generic/019 reports below error:
>>
>>  __quota_error: 1160 callbacks suppressed
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds.  Have 
>> a nice day...
>>
>> If we fail in the path below because writing the dquot block fails, we
>> will miss releasing the quota inode; fix it.
>>
>> - f2fs_put_super
>>  - f2fs_quota_off_umount
>>   - f2fs_quota_off
>>- f2fs_quota_sync   <-- failed
>>- dquot_quota_off   <-- missed to call
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/super.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index c026aaccf218..328f58647f4c 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -1900,10 +1900,12 @@ void f2fs_quota_off_umount(struct super_block *sb)
>>  for (type = 0; type < MAXQUOTAS; type++) {
>>  err = f2fs_quota_off(sb, type);
>>  if (err) {
>> +int ret = dquot_quota_off(sb, type);
>> +
>>  f2fs_msg(sb, KERN_ERR,
>>  "Fail to turn off disk quota "
>> -"(type: %d, err: %d), Please "
>> -"run fsck to fix it.", type, err);
>> +"(type: %d, err: %d, ret:%d), Please "
>> +"run fsck to fix it.", type, err, ret);
>>  set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
>>  }
>>  }
>> -- 
>> 2.18.0.rc1
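
For illustration only (not part of the quoted patch): a user-space model of
the call path above, where the quota-off helper returns early on a failed
sync and the umount path has to release the quota inode itself. The names
(struct model_sb, model_quota_off(), model_dquot_quota_off(),
model_quota_off_umount()) are hypothetical stand-ins for the f2fs and quota
functions named in the commit message.

```c
#include <assert.h>

#define MODEL_MAXQUOTAS 3

/* Hypothetical stand-in for the superblock quota state. */
struct model_sb {
	int inode_held[MODEL_MAXQUOTAS];	/* 1 while the quota inode is referenced */
	int sync_fails[MODEL_MAXQUOTAS];	/* 1 to simulate a failing quota sync */
};

/* Stand-in for dquot_quota_off(): always drops the quota inode reference. */
static int model_dquot_quota_off(struct model_sb *sb, int type)
{
	sb->inode_held[type] = 0;
	return 0;
}

/* Stand-in for f2fs_quota_off(): bails out early if the sync fails. */
static int model_quota_off(struct model_sb *sb, int type)
{
	if (sb->sync_fails[type])
		return -28;	/* -ENOSPC, the error seen in the log above */
	return model_dquot_quota_off(sb, type);
}

/* Umount path with the fix: release the inode even when quota-off failed. */
static int model_quota_off_umount(struct model_sb *sb)
{
	int failures = 0;

	for (int type = 0; type < MODEL_MAXQUOTAS; type++) {
		if (model_quota_off(sb, type)) {
			model_dquot_quota_off(sb, type);
			failures++;
		}
	}
	return failures;
}
```

Without the extra model_dquot_quota_off() call in the error branch, the
failing quota type would keep its inode reference after umount, which
corresponds to the "Busy inodes after unmount" message in the report.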


[PATCH v7] f2fs: guarantee journalled quota data by checkpoint

2018-09-07 Thread Chao Yu
From: Chao Yu 

For journalled quota mode, let checkpoint flush dquot dirty data and
quota file data to guarantee persistence of all quota sysfiles in the
last checkpoint. This way, we can avoid corrupting quota sysfiles on
sudden power-off (SPO).

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there are
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operations().
To avoid this, if our retry count exceeds the threshold, let's just skip
flushing and retry in the next checkpoint.
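
For illustration only (not part of the patch): the interplay of the three
states can be modelled in user space as below. The flag names follow the
description above, but struct model_sbi, quota_sync() and
flush_quota_before_checkpoint() are hypothetical, and the retry limit of 8
is an assumption, since the value of DEFAULT_RETRY_QUOTA_FLUSH_COUNT is not
shown in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Model flags named after the states described above; not kernel code. */
enum {
	QUOTA_NEED_FLUSH  = 1 << 0,
	QUOTA_NEED_REPAIR = 1 << 1,
	QUOTA_SKIP_FLUSH  = 1 << 2,
};

#define RETRY_QUOTA_FLUSH_COUNT 8	/* assumed threshold for this sketch */

struct model_sbi {
	unsigned int flags;
	int redirty_budget;	/* how often writers re-dirty quota during a sync */
	int syncs;		/* number of quota syncs performed */
};

/* Flush only if neither SKIP_FLUSH nor NEED_REPAIR is set. */
static bool need_flush_quota(const struct model_sbi *sbi)
{
	if (sbi->flags & QUOTA_SKIP_FLUSH)
		return false;
	if (sbi->flags & QUOTA_NEED_REPAIR)
		return false;
	return (sbi->flags & QUOTA_NEED_FLUSH) != 0;
}

/* Model of a quota sync: concurrent writers may dirty the quota again. */
static void quota_sync(struct model_sbi *sbi)
{
	sbi->syncs++;
	if (sbi->redirty_budget > 0) {
		sbi->redirty_budget--;
		sbi->flags |= QUOTA_NEED_FLUSH;
	}
}

/*
 * Retry the flush until quota is clean. Returns false only when the retry
 * budget ran out and flushing was skipped until the next checkpoint.
 */
static bool flush_quota_before_checkpoint(struct model_sbi *sbi)
{
	int cnt = 0;

	while (need_flush_quota(sbi)) {
		if (++cnt > RETRY_QUOTA_FLUSH_COUNT) {
			sbi->flags |= QUOTA_SKIP_FLUSH;
			return false;
		}
		sbi->flags &= ~QUOTA_NEED_FLUSH;
		quota_sync(sbi);
	}
	return true;
}
```

A lightly loaded model (redirty_budget of 2) converges after three syncs. A
heavily loaded one gives up after the assumed eight retries and sets the
skip flag instead of looping forever.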

Signed-off-by: Weichao Guo 
Signed-off-by: Chao Yu 
---
v7:
- fix compile error.

 fs/f2fs/checkpoint.c|  45 +++--
 fs/f2fs/data.c  |   8 ++-
 fs/f2fs/f2fs.h  |   7 +++
 fs/f2fs/super.c | 106 
 include/linux/f2fs_fs.h |   1 +
 5 files changed, 152 insertions(+), 15 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index fa2e0d3c4945..4ce2f90b4fb2 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1086,6 +1086,15 @@ static void __prepare_cp_block(struct f2fs_sb_info *sbi)
ckpt->next_free_nid = cpu_to_le32(last_nid);
 }
 
+static bool __need_flush_quota(struct f2fs_sb_info *sbi)
+{
+   if (is_sbi_flag_set(sbi, SBI_QUOTA_SKIP_FLUSH))
+   return false;
+   if (is_sbi_flag_set(sbi, SBI_QUOTA_NEED_REPAIR))
+   return false;
+   return is_sbi_flag_set(sbi, SBI_QUOTA_NEED_FLUSH);
+}
+
 /*
  * Freeze all the FS-operations for checkpoint.
  */
@@ -1097,12 +1106,31 @@ static int block_operations(struct f2fs_sb_info *sbi)
.for_reclaim = 0,
};
struct blk_plug plug;
-   int err = 0;
+   int err = 0, cnt = 0;
 
blk_start_plug(&plug);
 
-retry_flush_dents:
+retry_flush_quotas:
+   if (__need_flush_quota(sbi)) {
+   if (++cnt > DEFAULT_RETRY_QUOTA_FLUSH_COUNT) {
+   set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH);
+   f2fs_lock_all(sbi);
+   goto retry_flush_dents;
+   }
+   clear_sbi_flag(sbi, SBI_QUOTA_NEED_FLUSH);
+   err = f2fs_quota_sync(sbi->sb, -1);
+   if (err)
+   goto out;
+   }
+
f2fs_lock_all(sbi);
+   if (__need_flush_quota(sbi)) {
+   f2fs_unlock_all(sbi);
+   cond_resched();
+   goto retry_flush_quotas;
+   }
+
+retry_flush_dents:
/* write all the dirty dentry pages */
if (get_pages(sbi, F2FS_DIRTY_DENTS)) {
f2fs_unlock_all(sbi);
@@ -1110,7 +1138,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
if (err)
goto out;
cond_resched();
-   goto retry_flush_dents;
+   goto retry_flush_quotas;
}
 
/*
@@ -1126,7 +1154,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
if (err)
goto out;
cond_resched();
-   goto retry_flush_dents;
+   goto retry_flush_quotas;
}
 
 retry_flush_nodes:
@@ -1217,6 +1245,14 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc)
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
__set_ckpt_flags(ckpt, CP_FSCK_FLAG);
 
+   if (is_sbi_flag_set(sbi, SBI_QUOTA_SKIP_FLUSH))
+   __set_ckpt_flags(ckpt, CP_QUOTA_NEED_FSCK_FLAG);
+   else
+   __clear_ckpt_flags(ckpt, CP_QUOTA_NEED_FSCK_FLAG);
+
+   if (is_sbi_flag_set(sbi, SBI_QUOTA_NEED_REPAIR))
+   __set_ckpt_flags(ckpt, CP_QUOTA_NEED_FSCK_FLAG);
+
/* set this flag to activate crc|cp_ver for recovery */
__set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG);
__clear_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG);
@@ -1424,6 +1460,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 
clear_sbi_flag(sbi, SBI_IS_DIRTY);
clear_sbi_flag(sbi, SBI_NEED_CP);
+   clear_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH);
__set_cp_next_pack(sbi);
 
/*
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8c204f896c22..eb60a870a1df 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -49,7 +49,7 @@ static bool __is_cp_guaranteed(struct page *page)
inode->i_ino ==  F2FS_NODE_INO(sbi) ||
S_ISDIR(i

Re: [PATCH v2] f2fs: fix to avoid NULL pointer dereference on se->discard_map

2018-09-04 Thread Chao Yu
On 2018/9/4 23:25, Vicente Bergas wrote:
> On Mon, Sep 3, 2018 at 9:52 PM, Chao Yu  wrote:
>> From: Chao Yu 
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=200951
>>
>> There is a NULL pointer dereference issue reported in bugzilla:
>>
>> Hi,
>> in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
>>
>> The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
>> There are four partitions:
>>  sda1: FAT  /boot
>>  sda2: F2FS /
>>  sda3: F2FS /home
>>  sda4: F2FS
>>
>> The bridge is ASMT1153e which uses the "uas" driver.
>> There is no TRIM pass-through, so, when mounting it reports:
>>  mounting with "discard" option, but the device does not support discard
>>
>> The USB host is USB3.0 and UASP capable. It is the one on RK3399.
>>
>> Given this everything works fine, except there is no TRIM support.
>>
>> In order to enable TRIM a new UDEV rule is added [1]:
>>  /etc/udev/rules.d/10-sata-bridge-trim.rules:
>>  ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
>> After reboot any F2FS write hangs forever and dmesg reports:
>>  Unable to handle kernel NULL pointer dereference
>>
>> Also tested on a x86_64 system: works fine even with TRIM enabled.
>>  same disc
>>  same bridge
>>  different usb host controller
>>  different cpu architecture
>>  not root filesystem
>>
>> Regards,
>>   Vicenç.
>>
>> [1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
>>
>>  Unable to handle kernel NULL pointer dereference at virtual address 
>> 003e
>>  Mem abort info:
>>ESR = 0x9604
>>Exception class = DABT (current EL), IL = 32 bits
>>SET = 0, FnV = 0
>>EA = 0, S1PTW = 0
>>  Data abort info:
>>ISV = 0, ISS = 0x0004
>>CM = 0, WnR = 0
>>  user pgtable: 4k pages, 48-bit VAs, pgdp = 626e3122
>>  [003e] pgd=
>>  Internal error: Oops: 9604 [#1] SMP
>>  Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio 
>> dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils 
>> snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm 
>> videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp 
>> videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev 
>> drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek 
>> drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops 
>> dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys 
>> hid_kensington
>>  CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
>>  Hardware name: Sapphire-RK3399 Board (DT)
>>  pstate: 0005 (nzcv daif -PAN -UAO)
>>  pc : update_sit_entry+0x304/0x4b0
>>  lr : update_sit_entry+0x108/0x4b0
>>  sp : 0ca13bd0
>>  x29: 0ca13bd0 x28: 003e
>>  x27: 0020 x26: 0008
>>  x25: 0048 x24: 8000ebb85cf8
>>  x23: 0253 x22: 
>>  x21: 000535f2 x20: ffdf
>>  x19: 8000eb9e6800 x18: 8000eb9e6be8
>>  x17: 07ce6926 x16: 1c83ffa8
>>  x15:  x14: 8000f602df90
>>  x13: 0006 x12: 0040
>>  x11: 0228 x10: 
>>  x9 :  x8 : 
>>  x7 : 000535f2 x6 : 8000ebff3440
>>  x5 : 8000ebff3440 x4 : 8000ebe3a6c8
>>  x3 :  x2 : 0020
>>  x1 :  x0 : 8000eb9e5800
>>  Process nvim (pid: 957, stack limit = 0x63a78320)
>>  Call trace:
>>   update_sit_entry+0x304/0x4b0
>>   f2fs_invalidate_blocks+0x98/0x140
>>   truncate_node+0x90/0x400
>>   f2fs_remove_inode_page+0xe8/0x340
>>   f2fs_evict_inode+0x2b0/0x408
>>   evict+0xe0/0x1e0
>>   iput+0x160/0x260
>>   do_u

Re: [PATCH 1/2] f2fs: fix to avoid quota inode leak in ->put_super

2018-08-20 Thread Chao Yu
On 2018/8/18 23:16, Jaegeuk Kim wrote:
> On 08/17, Chao Yu wrote:
>> generic/019 reports below error:
>>
>>  __quota_error: 1160 callbacks suppressed
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  Quota error (device zram1): write_blk: dquota write failed
>>  Quota error (device zram1): qtree_write_dquot: Error -28 occurred while 
>> creating quota
>>  VFS: Busy inodes after unmount of zram1. Self-destruct in 5 seconds.  Have 
>> a nice day...
>>
>> If we fail in the path below because writing the dquot block fails, we
>> will miss releasing the quota inode; fix it.
>>
>> - f2fs_put_super
>>  - f2fs_quota_off_umount
>>   - f2fs_quota_off
>>- f2fs_quota_sync   <-- failed
>>- dquot_quota_off   <-- missed to call
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/super.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index a5df9fbc6355..9647bbcdfd2b 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -1977,10 +1977,12 @@ void f2fs_quota_off_umount(struct super_block *sb)
>>  for (type = 0; type < MAXQUOTAS; type++) {
>>  err = f2fs_quota_off(sb, type);
>>  if (err) {
>> +int ret = dquot_quota_off(sb, type);
>> +
> 
> Could you check the mainline version?

I guess we missed applying the patch below? Could you check that?

f2fs: report error if quota off error during umount

Thanks,

> 
>>  f2fs_msg(sb, KERN_ERR,
>>  "Fail to turn off disk quota "
>> -"(type: %d, err: %d), Please "
>> -"run fsck to fix it.", type, err);
>> +"(type: %d, err: %d, ret:%d), Please "
>> +"run fsck to fix it.", type, err, ret);
>>  set_sbi_flag(F2FS_SB(sb), SBI_NEED_FSCK);
>>  }
>>  }
>> -- 
>> 2.18.0.rc1


Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read

2018-08-14 Thread Chao Yu
On 2018/8/14 12:04, Jaegeuk Kim wrote:
> On 08/14, Chao Yu wrote:
>> On 2018/8/14 4:11, Jaegeuk Kim wrote:
>>> On 08/13, Chao Yu wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> On 2018/8/11 2:56, Jaegeuk Kim wrote:
>>>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>>>> to fix the drop in sequential read throughput.
>>>>>
>>>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>>>> device: UFS
>>>>>
>>>>> Before -
>>>>> read throughput: 185 MB/s
>>>>> total read requests: 85177 (of these ~8 are 4KB size requests).
>>>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>>>
>>>>> After -
>>>>> read throughput: 758 MB/s
>>>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
>>>>
>>>> IMO, it only impacts sequential read performance in a large file which may
>>>> be fragmented during multi-thread writing.
>>>>
>>>> In android environment, mostly, the large file should be cold type, such
>>>> as apk, mp3, rmvb, jpeg..., so I think we only need to serialize
>>>> writepages() for cold data area writer.
>>>>
>>>> So how about adding a mount option to serialize writepage() for different
>>>> type of log, e.g. in android, using serialize=4; by default, using
>>>> serialize=7
>>>> HOT_DATA   1
>>>> WARM_DATA  2
>>>> COLD_DATA  4
>>>
>>> Well, I don't think we need to give too many mount options for this
>>> fragmented case. How about doing this for the large files only like this?
>>
>> Thread A write 512 pages            Thread B write 8 pages
>>
>> - writepages()
>>  - mutex_lock(&sbi->writepages);
>>   - writepage();
>> ...
>>                                     - writepages()
>>                                      - writepage()
>>
>>   - writepage();
>> ...
>>  - mutex_unlock(&sbi->writepages);
>>
>> Above case will also cause fragmentation since we didn't serialize all
>> concurrent IO with the lock.
>>
>> Do we need to consider such case?
> 
> We can simply allow 512 and 8 in the same segment, which would not be a big
> deal, when considering starvation of Thread B.

Yeah, but in reality, there would be more threads competing in the same log
header, so I worry that the effect of defragmenting will not be as good as we
expect; anyway, for the benchmark, it's enough.

Thanks,
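
For illustration only: the serialize= mask suggested earlier in this thread
(quoted above) can be read as a plain bitmask test. This is a sketch of the
idea; the mount option was only a proposal here, and the enum and helper
names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as listed in the proposal; the names are made up for the sketch. */
enum log_type_bit {
	SERIALIZE_HOT_DATA  = 1,
	SERIALIZE_WARM_DATA = 2,
	SERIALIZE_COLD_DATA = 4,
};

/* True if writers of the given log type should take the writepages lock. */
static bool must_serialize(unsigned int serialize_mask, enum log_type_bit type)
{
	return (serialize_mask & type) != 0;
}
```

With serialize=4 only cold data writers would be serialized, while
serialize=7 covers hot, warm and cold data alike.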

> 
>>
>> Thanks,
>>
>>>
>>> From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim 
>>> Date: Thu, 9 Aug 2018 17:53:34 -0700
>>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
>>>  sequential read
>>>
>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>> to fix the drop in sequential read throughput.
>>>
>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>> device: UFS
>>>
>>> Before -
>>> read throughput: 185 MB/s
>>> total read requests: 85177 (of these ~8 are 4KB size requests).
>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>
>>> After -
>>> read throughput: 758 MB/s
>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
>>>
>>> Signed-off-by: Sahitya Tummala 
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  Documentation/ABI/testing/sysfs-fs-f2fs |  8 
>>>  fs/f2fs/data.c  | 10 ++
>>>  fs/f2fs/f2fs.h  |  2 ++
>>>  fs/f2fs/segment.c   |  1 +
>>>  fs/f2fs/super.c |  1 +
>>>  fs/f2fs/sysfs.c |  2 ++
>>>  6 files changed, 24 insertions(+)
>>>
>>> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
>>> b/Documentation/ABI/testing/sysfs-fs-f2fs
>>> index 9b0123388f18..94a24aedcdb2 100644
>>> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
>>> 

[PATCH] f2fs: fix use-after-free of dicard command entry

2018-08-07 Thread Chao Yu
As Dan Carpenter reported:

The patch 20ee4382322c: "f2fs: issue small discard by LBA order" from
Jul 8, 2018, leads to the following Smatch warning:

fs/f2fs/segment.c:1277 __issue_discard_cmd_orderly()
warn: 'dc' was already freed.

See also:
fs/f2fs/segment.c:2550 __issue_discard_cmd_range() warn: 'dc' was already freed.

In order to fix this issue, let's get error from __submit_discard_cmd(),
and release current discard command after we referenced next one.
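
For illustration only (not part of the patch): the ordering described above,
taking a reference to the next command before releasing the current one, can
be shown outside the kernel. In this sketch struct cmd, submit_cmd(),
issue_all() and make_list() are hypothetical names, and a plain singly
linked list stands in for whatever structure the real code walks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a discard command; not the f2fs structure. */
struct cmd {
	int id;
	int fail;		/* non-zero: submission of this command fails */
	struct cmd *next;
};

/* Hypothetical submit helper: reports the error instead of freeing 'c'. */
static int submit_cmd(const struct cmd *c)
{
	return c->fail ? -5 /* stands in for -EIO */ : 0;
}

/*
 * Walk the list and drop every command whose submission failed.
 * The successor is read before the current node may be freed, so the
 * loop never touches freed memory. Returns the number of freed nodes.
 */
static int issue_all(struct cmd **head)
{
	struct cmd **link = head;
	int freed = 0;

	while (*link) {
		struct cmd *cur = *link;
		struct cmd *next = cur->next;	/* reference next first */

		if (submit_cmd(cur)) {
			*link = next;		/* unlink, then release */
			free(cur);
			freed++;
		} else {
			link = &cur->next;
		}
	}
	return freed;
}

/* Build a list with ids 0..n-1; commands with odd ids are made to fail. */
static struct cmd *make_list(int n)
{
	struct cmd *head = NULL;

	for (int i = n - 1; i >= 0; i--) {
		struct cmd *c = malloc(sizeof(*c));

		assert(c != NULL);
		c->id = i;
		c->fail = i & 1;
		c->next = head;
		head = c;
	}
	return head;
}
```

The point of the sketch is the split of responsibilities: the submit helper
only returns an error, and the caller, which already holds the pointer to
the next entry, decides when the current one is released.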

Reported-by: Dan Carpenter 
Signed-off-by: Chao Yu 
---
 fs/f2fs/segment.c | 79 +++
 1 file changed, 45 insertions(+), 34 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index be1bf38400ca..8826ea683804 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -998,7 +998,7 @@ static void __update_discard_tree_range(struct f2fs_sb_info *sbi,
struct block_device *bdev, block_t lstart,
block_t start, block_t len);
 /* this function is copied from blkdev_issue_discard from block/blk-lib.c */
-static void __submit_discard_cmd(struct f2fs_sb_info *sbi,
+static int __submit_discard_cmd(struct f2fs_sb_info *sbi,
struct discard_policy *dpolicy,
struct discard_cmd *dc,
unsigned int *issued)
@@ -1015,10 +1015,10 @@ static void __submit_discard_cmd(struct f2fs_sb_info *sbi,
int err = 0;
 
if (dc->state != D_PREP)
-   return;
+   return 0;
 
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
-   return;
+   return 0;
 
trace_f2fs_issue_discard(bdev, dc->start, dc->len);
 
@@ -1057,43 +1057,44 @@ static void __submit_discard_cmd(struct f2fs_sb_info *sbi,
SECTOR_FROM_BLOCK(len),
GFP_NOFS, 0, &bio);
 submit:
-   if (!err && bio) {
-   /*
-* should keep before submission to avoid D_DONE
-* right away
-*/
+   if (err) {
spin_lock_irqsave(&dc->lock, flags);
-   if (last)
+   if (dc->state == D_PARTIAL)
dc->state = D_SUBMIT;
-   else
-   dc->state = D_PARTIAL;
-   dc->bio_ref++;
spin_unlock_irqrestore(&dc->lock, flags);
 
-   atomic_inc(&dcc->issing_discard);
-   dc->issuing++;
-   list_move_tail(&dc->list, wait_list);
+   break;
+   }
 
-   /* sanity check on discard range */
-   __check_sit_bitmap(sbi, start, start + len);
+   f2fs_bug_on(sbi, !bio);
 
-   bio->bi_private = dc;
-   bio->bi_end_io = f2fs_submit_discard_endio;
-   bio->bi_opf |= flag;
-   submit_bio(bio);
+   /*
+* should keep before submission to avoid D_DONE
+* right away
+*/
+   spin_lock_irqsave(&dc->lock, flags);
+   if (last)
+   dc->state = D_SUBMIT;
+   else
+   dc->state = D_PARTIAL;
+   dc->bio_ref++;
+   spin_unlock_irqrestore(&dc->lock, flags);
 
-   atomic_inc(&dcc->issued_discard);
+   atomic_inc(&dcc->issing_discard);
+   dc->issuing++;
+   list_move_tail(&dc->list, wait_list);
 
-   f2fs_update_iostat(sbi, FS_DISCARD, 1);
-   } else {
-   spin_lock_irqsave(&dc->lock, flags);
-   if (dc->state == D_PARTIAL)
-   dc->state = D_SUBMIT;
-   spin_unlock_irqrestore(&dc->lock, flags);
+   /* sanity check on discard range */
+   __check_sit_bitmap(sbi, start, start + len);
 
-   __remove_discard_cmd(sbi, dc);
-   err = -EIO;
-   }
+   bio->bi_private = dc;
+   bio->bi_end_io = f2fs_submit_discard_endio;
+   bio->bi_opf |= flag;
+   submit_bio(bio);
+
+   atomic_inc(&dcc->issued_discard);
+
+   f2fs_update_iostat(sbi, FS_DISCARD, 1);
 
lstart += len;
start += len;
@@ -1101,8 +1102,9 @@ static void __submit_discard_cmd(struct f2fs_sb_info *sbi,
len = total_len;
}
 
-   if (len)
+   if (!err && len)
__update_discard_tr

Re: [PATCH] drivers/staging: Remove some unneeded semicolon

2018-08-05 Thread Chao Yu
Hi Xiang,

On 2018/8/6 9:27, Gao Xiang wrote:
> Hi Jiang,
> 
> On 2018/8/5 21:57, zhong jiang wrote:
>> Those semicolons are unneeded, just remove them.
>>
>> Signed-off-by: zhong jiang 
> 
> Thanks for your patch. Since erofs and gasket are different features, it is
> better to separate this into two patches.
> Could you please cc the linux-erofs mailing list as well?
> 
> Yes, there is an extra semicolon in z_erofs_vle_unzip_all, it was reported by 
> Julia Lawall several days ago.
> Actually, there is a patch in linux-erofs mailing list, but it seems that 
> Chao hasn't reviewed it yet...

Oh, sorry, I missed this one, let me check/review all recent patches again and
update them in tree later.

Thanks,

> 
> https://lists.ozlabs.org/pipermail/linux-erofs/2018-August/000303.html
> 
> I will add Signed-off-by: zhong jiang to the original
> patch if you don't mind.
> 
> Thanks,
> Gao Xiang
> 
> .
> 



Re: [PATCH] staging: erofs: fix if assignment style issue

2018-08-05 Thread Chao Yu
On 2018/8/5 23:21, Kristaps Čivkulis wrote:
> Fix coding style issue "do not use assignment in if condition"
> detected by checkpatch.pl.
> 
> Signed-off-by: Kristaps Čivkulis 

Reviewed-by: Chao Yu 
Thanks,



Re: linux-next: build failure after merge of the staging tree

2018-08-02 Thread Chao Yu
On 2018/8/2 15:14, Greg KH wrote:
> On Thu, Aug 02, 2018 at 03:01:59PM +0800, Chao Yu wrote:
>> Hi Greg,
>>
>> On 2018/8/2 14:15, Greg KH wrote:
>>> On Wed, Aug 01, 2018 at 05:09:13PM +0800, Chao Yu wrote:
>>>> Hi Stephen,
>>>>
>>>> On 2018/7/30 14:31, Gao Xiang wrote:
>>>>> Hi Stephen,
>>>>>
>>>>> On 2018/7/30 14:16, Stephen Rothwell wrote:
>>>>>> Hi Greg,
>>>>>>
>>>>>> After merging the staging tree, today's linux-next build (x86_64
>>>>>> allmodconfig) failed like this:
>>>>>>
>>>>>> drivers/staging/erofs/super.c: In function 'erofs_read_super':
>>>>>> drivers/staging/erofs/super.c:343:17: error: 'MS_RDONLY' undeclared 
>>>>>> (first use in this function); did you mean 'IS_RDONLY'?
>>>>>>   sb->s_flags |= MS_RDONLY | MS_NOATIME;
>>>>>>  ^
>>>>>>  IS_RDONLY
>>>>>> drivers/staging/erofs/super.c:343:17: note: each undeclared identifier 
>>>>>> is reported only once for each function it appears in
>>>>>> drivers/staging/erofs/super.c:343:29: error: 'MS_NOATIME' undeclared 
>>>>>> (first use in this function); did you mean 'S_NOATIME'?
>>>>>>   sb->s_flags |= MS_RDONLY | MS_NOATIME;
>>>>>>  ^~
>>>>>>  S_NOATIME
>>>>>> drivers/staging/erofs/super.c: In function 'erofs_mount':
>>>>>> drivers/staging/erofs/super.c:501:10: warning: passing argument 5 of 
>>>>>> 'mount_bdev' makes integer from pointer without a cast [-Wint-conversion]
>>>>>>, erofs_fill_super);
>>>>>>   ^~~~
>>>>>> In file included from include/linux/buffer_head.h:12:0,
>>>>>>  from drivers/staging/erofs/super.c:14:
>>>>>> include/linux/fs.h:2151:23: note: expected 'size_t {aka long unsigned 
>>>>>> int}' but argument is of type 'int (*)(struct super_block *, void *, 
>>>>>> int)'
>>>>>>  extern struct dentry *mount_bdev(struct file_system_type *fs_type,
>>>>>>^~
>>>>>> drivers/staging/erofs/super.c:500:9: error: too few arguments to 
>>>>>> function 'mount_bdev'
>>>>>>   return mount_bdev(fs_type, flags, dev_name,
>>>>>>  ^~
>>>>>> In file included from include/linux/buffer_head.h:12:0,
>>>>>>  from drivers/staging/erofs/super.c:14:
>>>>>> include/linux/fs.h:2151:23: note: declared here
>>>>>>  extern struct dentry *mount_bdev(struct file_system_type *fs_type,
>>>>>>^~
>>>>>> drivers/staging/erofs/super.c: At top level:
>>>>>> drivers/staging/erofs/super.c:518:20: error: initialization from 
>>>>>> incompatible pointer type [-Werror=incompatible-pointer-types]
>>>>>>   .mount  = erofs_mount,
>>>>>> ^~~
>>>>>> drivers/staging/erofs/super.c:518:20: note: (near initialization for 
>>>>>> 'erofs_fs_type.mount')
>>>>>> drivers/staging/erofs/super.c: In function 'erofs_remount':
>>>>>> drivers/staging/erofs/super.c:630:12: error: 'MS_RDONLY' undeclared 
>>>>>> (first use in this function); did you mean 'IS_RDONLY'?
>>>>>>   *flags |= MS_RDONLY;
>>>>>> ^
>>>>>> IS_RDONLY
>>>>>> drivers/staging/erofs/super.c: At top level:
>>>>>> drivers/staging/erofs/super.c:640:16: error: initialization from 
>>>>>> incompatible pointer type [-Werror=incompatible-pointer-types]
>>>>>>   .remount_fs = erofs_remount,
>>>>>> ^
>>>>>>
>>>>>> Caused by various commits creating erofs in the staging tree interacting
>>>>>> with various commits redoing the mount infrastructure in the vfs tree.
>>>>>>
>>>>>> I have disabled CONFIG_EROFS_FS for now:
>>>>
>>>> Xiang has submitted several patches as below to fix compiling error on 
>>>> -next
>>>> tree, could you consider to merg

Re: linux-next: build failure after merge of the staging tree

2018-08-02 Thread Chao Yu
Hi Greg,

On 2018/8/2 14:15, Greg KH wrote:
> On Wed, Aug 01, 2018 at 05:09:13PM +0800, Chao Yu wrote:
>> Hi Stephen,
>>
>> On 2018/7/30 14:31, Gao Xiang wrote:
>>> Hi Stephen,
>>>
>>> On 2018/7/30 14:16, Stephen Rothwell wrote:
>>>> Hi Greg,
>>>>
>>>> After merging the staging tree, today's linux-next build (x86_64
>>>> allmodconfig) failed like this:
>>>>
>>>> drivers/staging/erofs/super.c: In function 'erofs_read_super':
>>>> drivers/staging/erofs/super.c:343:17: error: 'MS_RDONLY' undeclared (first 
>>>> use in this function); did you mean 'IS_RDONLY'?
>>>>   sb->s_flags |= MS_RDONLY | MS_NOATIME;
>>>>  ^
>>>>  IS_RDONLY
>>>> drivers/staging/erofs/super.c:343:17: note: each undeclared identifier is 
>>>> reported only once for each function it appears in
>>>> drivers/staging/erofs/super.c:343:29: error: 'MS_NOATIME' undeclared 
>>>> (first use in this function); did you mean 'S_NOATIME'?
>>>>   sb->s_flags |= MS_RDONLY | MS_NOATIME;
>>>>  ^~
>>>>  S_NOATIME
>>>> drivers/staging/erofs/super.c: In function 'erofs_mount':
>>>> drivers/staging/erofs/super.c:501:10: warning: passing argument 5 of 
>>>> 'mount_bdev' makes integer from pointer without a cast [-Wint-conversion]
>>>>, erofs_fill_super);
>>>>   ^~~~
>>>> In file included from include/linux/buffer_head.h:12:0,
>>>>  from drivers/staging/erofs/super.c:14:
>>>> include/linux/fs.h:2151:23: note: expected 'size_t {aka long unsigned 
>>>> int}' but argument is of type 'int (*)(struct super_block *, void *, int)'
>>>>  extern struct dentry *mount_bdev(struct file_system_type *fs_type,
>>>>^~
>>>> drivers/staging/erofs/super.c:500:9: error: too few arguments to function 
>>>> 'mount_bdev'
>>>>   return mount_bdev(fs_type, flags, dev_name,
>>>>  ^~
>>>> In file included from include/linux/buffer_head.h:12:0,
>>>>  from drivers/staging/erofs/super.c:14:
>>>> include/linux/fs.h:2151:23: note: declared here
>>>>  extern struct dentry *mount_bdev(struct file_system_type *fs_type,
>>>>^~
>>>> drivers/staging/erofs/super.c: At top level:
>>>> drivers/staging/erofs/super.c:518:20: error: initialization from 
>>>> incompatible pointer type [-Werror=incompatible-pointer-types]
>>>>   .mount  = erofs_mount,
>>>> ^~~
>>>> drivers/staging/erofs/super.c:518:20: note: (near initialization for 
>>>> 'erofs_fs_type.mount')
>>>> drivers/staging/erofs/super.c: In function 'erofs_remount':
>>>> drivers/staging/erofs/super.c:630:12: error: 'MS_RDONLY' undeclared (first 
>>>> use in this function); did you mean 'IS_RDONLY'?
>>>>   *flags |= MS_RDONLY;
>>>> ^
>>>> IS_RDONLY
>>>> drivers/staging/erofs/super.c: At top level:
>>>> drivers/staging/erofs/super.c:640:16: error: initialization from 
>>>> incompatible pointer type [-Werror=incompatible-pointer-types]
>>>>   .remount_fs = erofs_remount,
>>>> ^
>>>>
>>>> Caused by various commits creating erofs in the staging tree interacting
>>>> with various commits redoing the mount infrastructure in the vfs tree.
>>>>
>>>> I have disabled CONFIG_EROFS_FS for now:
>>
>> Xiang has submitted several patches as below to fix compiling error on -next
>> tree, could you consider to merge those temporary fixes into -next after 
>> merging
>> staging-next's updates, and reenable CONFIG_EROFS_FS for further integrity
>> compiling and test?
>>
>> staging: erofs: fix superblock/inode flags (MS_RDONLY -> SB_RDONLY, 
>> S_NOATIME)
>> https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000282.html
>>
>> staging: erofs: remove RADIX_TREE_EXCEPTIONAL_{ENTRY, SHIFT}
>> https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000283.html
>>
>> staging: erofs: update .mount and .remount_sb
>> https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000285.html
> 
> Why have these not been submitted to me for inclusion in my tree?
Oh, let me 

Re: linux-next: build failure after merge of the staging tree

2018-08-02 Thread Chao Yu
Hi Stephen,

Sorry, yesterday I missed this email due to my email filter.

On 2018/8/1 23:07, Stephen Rothwell wrote:
> Hi Chao,
> 
> On Wed, 1 Aug 2018 17:09:13 +0800 Chao Yu  wrote:
>>
>> Xiang has submitted several patches as below to fix compiling error on -next
>> tree, could you consider to merge those temporary fixes into -next after
>> merging staging-next's updates, and reenable CONFIG_EROFS_FS for further
>> integrity compiling and test?
>>
>> staging: erofs: fix superblock/inode flags (MS_RDONLY -> SB_RDONLY, S_NOATIME)
>> https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000282.html
>>
>> staging: erofs: remove RADIX_TREE_EXCEPTIONAL_{ENTRY, SHIFT}
>> https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000283.html
>>
>> staging: erofs: update .mount and .remount_sb
>> https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000285.html
> 
> OK, I will apply those tomorrow (actually later today :-)) and stop
> disabling CONFIG_EROFS_FS.

OK, thanks for doing that. Xiang and I will keep an eye on the compile results.

> 
>> BTW, for this condition that erofs was not covered by some common vfs
>> stuff changes in other one's tree, who should take care of those
>> missing fixes during coming next merge window?
> 
> It might be easiest for Greg to add the disabling CONFIG_EROFS_FS patch
> to the staging tree itself for his first pull request during the merge
> window and then send a second pull request (after the vfs and maybe the
> Xarray stuff has been merged by Linus) with these patches followed by a
> revert of the disabling patch.

Thanks for the advice, I think that's a good way to solve the issue. Let me send
a patch to disable erofs compilation temporarily, to avoid conflicts during the
merge window. :)

Thanks,

> 



Re: linux-next: build failure after merge of the staging tree

2018-08-01 Thread Chao Yu
Hi Stephen,

On 2018/7/30 14:31, Gao Xiang wrote:
> Hi Stephen,
> 
> On 2018/7/30 14:16, Stephen Rothwell wrote:
>> Hi Greg,
>>
>> After merging the staging tree, today's linux-next build (x86_64
>> allmodconfig) failed like this:
>>
>> drivers/staging/erofs/super.c: In function 'erofs_read_super':
>> drivers/staging/erofs/super.c:343:17: error: 'MS_RDONLY' undeclared (first 
>> use in this function); did you mean 'IS_RDONLY'?
>>   sb->s_flags |= MS_RDONLY | MS_NOATIME;
>>  ^
>>  IS_RDONLY
>> drivers/staging/erofs/super.c:343:17: note: each undeclared identifier is 
>> reported only once for each function it appears in
>> drivers/staging/erofs/super.c:343:29: error: 'MS_NOATIME' undeclared (first 
>> use in this function); did you mean 'S_NOATIME'?
>>   sb->s_flags |= MS_RDONLY | MS_NOATIME;
>>  ^~
>>  S_NOATIME
>> drivers/staging/erofs/super.c: In function 'erofs_mount':
>> drivers/staging/erofs/super.c:501:10: warning: passing argument 5 of 
>> 'mount_bdev' makes integer from pointer without a cast [-Wint-conversion]
>>, erofs_fill_super);
>>   ^~~~
>> In file included from include/linux/buffer_head.h:12:0,
>>  from drivers/staging/erofs/super.c:14:
>> include/linux/fs.h:2151:23: note: expected 'size_t {aka long unsigned int}' 
>> but argument is of type 'int (*)(struct super_block *, void *, int)'
>>  extern struct dentry *mount_bdev(struct file_system_type *fs_type,
>>^~
>> drivers/staging/erofs/super.c:500:9: error: too few arguments to function 
>> 'mount_bdev'
>>   return mount_bdev(fs_type, flags, dev_name,
>>  ^~
>> In file included from include/linux/buffer_head.h:12:0,
>>  from drivers/staging/erofs/super.c:14:
>> include/linux/fs.h:2151:23: note: declared here
>>  extern struct dentry *mount_bdev(struct file_system_type *fs_type,
>>^~
>> drivers/staging/erofs/super.c: At top level:
>> drivers/staging/erofs/super.c:518:20: error: initialization from 
>> incompatible pointer type [-Werror=incompatible-pointer-types]
>>   .mount  = erofs_mount,
>> ^~~
>> drivers/staging/erofs/super.c:518:20: note: (near initialization for 
>> 'erofs_fs_type.mount')
>> drivers/staging/erofs/super.c: In function 'erofs_remount':
>> drivers/staging/erofs/super.c:630:12: error: 'MS_RDONLY' undeclared (first 
>> use in this function); did you mean 'IS_RDONLY'?
>>   *flags |= MS_RDONLY;
>> ^
>> IS_RDONLY
>> drivers/staging/erofs/super.c: At top level:
>> drivers/staging/erofs/super.c:640:16: error: initialization from 
>> incompatible pointer type [-Werror=incompatible-pointer-types]
>>   .remount_fs = erofs_remount,
>> ^
>>
>> Caused by various commits creating erofs in the staging tree interacting
>> with various commits redoing the mount infrastructure in the vfs tree.
>>
>> I have disabled CONFIG_EROFS_FS for now:

Xiang has submitted several patches, listed below, to fix the compile errors on
the -next tree. Could you consider merging those temporary fixes into -next after
merging staging-next's updates, and re-enabling CONFIG_EROFS_FS for further
integration build and test coverage?

staging: erofs: fix superblock/inode flags (MS_RDONLY -> SB_RDONLY, S_NOATIME)
https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000282.html

staging: erofs: remove RADIX_TREE_EXCEPTIONAL_{ENTRY, SHIFT}
https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000283.html

staging: erofs: update .mount and .remount_sb
https://lists.ozlabs.org/pipermail/linux-erofs/2018-July/000285.html

BTW, in a case like this, where erofs was not covered by common vfs changes made
in someone else's tree, who should take care of the missing fixes during the
coming merge window?

Thanks,

> 
> I will fix them as soon as possible, and test it with the latest linux-next 
> code.
> It seems caused by some vfs changes.
> 
> Thanks,
> Gao Xiang
> 
> .
> 



Re: [PATCH] f2fs: avoid race between zero_range and background GC

2018-07-27 Thread Chao Yu
On 2018/7/27 18:29, Jaegeuk Kim wrote:
> On 07/26, Chao Yu wrote:
>> Thread A                             Background GC
>> - f2fs_zero_range
>>  - truncate_pagecache_range
>>                                      - gc_data_segment
>>                                       - get_read_data_page
>>                                        - move_data_page
>>                                         - set_page_dirty
>>                                         - set_cold_data
>>  - f2fs_do_zero_range
>>   - dn->data_blkaddr = NEW_ADDR;
>>   - f2fs_set_data_blkaddr
>>
>> Actually, we don't need to set dirty & checked flag on the page, since
>> all valid data in the page should be zeroed by zero_range().
> 
> But, it doesn't matter too much, right?

No, it matters: if the dirtied page is written back after f2fs_do_zero_range(),
the result of zero_range() will be wrong, because the page that should have been
zeroed still contains valid user data.

> 
>> Use i_gc_rwsem[WRITE] to avoid such race condition.
> 
> Hope to avoid abusing i_gc_rwsem[] tho.

Agreed, let's try to avoid it until we have to use it.

Thanks,

> 
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/file.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>> index 267ec3794e1e..7bd2412a8c37 100644
>> --- a/fs/f2fs/file.c
>> +++ b/fs/f2fs/file.c
>> @@ -1309,6 +1309,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t 
>> offset, loff_t len,
>>  if (ret)
>>  return ret;
>>  
>> +down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>  down_write(&F2FS_I(inode)->i_mmap_sem);
>>  ret = filemap_write_and_wait_range(mapping, offset, offset + len - 1);
>>  if (ret)
>> @@ -1389,6 +1390,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t 
>> offset, loff_t len,
>>  }
>>  out_sem:
>>  up_write(&F2FS_I(inode)->i_mmap_sem);
>> +up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>  
>>  return ret;
>>  }
>> -- 
>> 2.18.0.rc1


[PATCH 1/4] f2fs: don't keep meta pages used for block migration

2018-07-27 Thread Chao Yu
For migration of an encrypted inode's blocks, we load the encrypted block data
into the meta inode's page cache. After checkpoint, all of those intermediate
pages should be clean and no one will read them again, so let's just release
them to free up memory.

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 12bebb8fa13d..67834d0ca422 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1499,6 +1499,14 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
commit_checkpoint(sbi, ckpt, start_blk);
wait_on_all_pages_writeback(sbi);
 
+   /*
+* invalidate intermediate page cache borrowed from meta inode
+* which are used for migration of encrypted inode's blocks.
+*/
+   if (f2fs_sb_has_encrypt(sbi->sb))
+   invalidate_mapping_pages(META_MAPPING(sbi),
+   MAIN_BLKADDR(sbi), MAX_BLKADDR(sbi) - 1);
+
f2fs_release_ino_entry(sbi, false);
 
f2fs_reset_fsync_node_info(sbi);
-- 
2.18.0.rc1



[PATCH 4/4] f2fs: fix to spread clear_cold_data()

2018-07-27 Thread Chao Yu
We need to drop PG_checked flag on page as well when we clear PG_uptodate
flag, in order to avoid treating the page as GCing one later.

Signed-off-by: Weichao Guo 
Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c| 8 +++-
 fs/f2fs/dir.c | 1 +
 fs/f2fs/segment.c | 4 +++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a29f3162b887..2817e2f4eb17 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1768,6 +1768,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
/* This page is already truncated */
if (fio->old_blkaddr == NULL_ADDR) {
ClearPageUptodate(page);
+   clear_cold_data(page);
goto out_writepage;
}
 got_it:
@@ -1943,8 +1944,10 @@ static int __write_data_page(struct page *page, bool 
*submitted,
 
 out:
inode_dec_dirty_pages(inode);
-   if (err)
+   if (err) {
ClearPageUptodate(page);
+   clear_cold_data(page);
+   }
 
if (wbc->for_reclaim) {
f2fs_submit_merged_write_cond(sbi, inode, 0, page->index, DATA);
@@ -2534,6 +2537,8 @@ void f2fs_invalidate_page(struct page *page, unsigned int 
offset,
}
}
 
+   clear_cold_data(page);
+
/* This is atomic written page, keep Private */
if (IS_ATOMIC_WRITTEN_PAGE(page))
return f2fs_drop_inmem_page(inode, page);
@@ -2552,6 +2557,7 @@ int f2fs_release_page(struct page *page, gfp_t wait)
if (IS_ATOMIC_WRITTEN_PAGE(page))
return 0;
 
+   clear_cold_data(page);
set_page_private(page, 0);
ClearPagePrivate(page);
return 1;
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 7f955c4e86a4..1e4a4122eb0c 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -734,6 +734,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
clear_page_dirty_for_io(page);
ClearPagePrivate(page);
ClearPageUptodate(page);
+   clear_cold_data(page);
inode_dec_dirty_pages(dir);
f2fs_remove_dirty_inode(dir);
}
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 58abbdc53561..4d83961745e6 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -267,8 +267,10 @@ static int __revoke_inmem_pages(struct inode *inode,
}
 next:
/* we don't need to invalidate this in the sccessful status */
-   if (drop || recover)
+   if (drop || recover) {
ClearPageUptodate(page);
+   clear_cold_data(page);
+   }
set_page_private(page, 0);
ClearPagePrivate(page);
f2fs_put_page(page, 1);
-- 
2.18.0.rc1



[PATCH 3/4] f2fs: fix avoid race between truncate and background GC

2018-07-27 Thread Chao Yu
Thread A                                Background GC
- f2fs_setattr isize to 0
 - truncate_setsize
                                        - gc_data_segment
                                         - f2fs_get_read_data_page page #0
                                          - set_page_dirty
                                          - set_cold_data
 - f2fs_truncate

- f2fs_setattr isize to 4k
- read 4k <--- hit data in cached page #0

The race condition above can cause a read to return invalid data from a
truncated page; fix it by taking the i_gc_rwsem[WRITE] lock.

Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c |  4 
 fs/f2fs/file.c | 33 +++--
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 071224ded5f4..a29f3162b887 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2214,10 +2214,14 @@ static void f2fs_write_failed(struct address_space 
*mapping, loff_t to)
loff_t i_size = i_size_read(inode);
 
if (to > i_size) {
+   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
down_write(&F2FS_I(inode)->i_mmap_sem);
+
truncate_pagecache(inode, i_size);
f2fs_truncate_blocks(inode, i_size, true);
+
up_write(&F2FS_I(inode)->i_mmap_sem);
+   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
}
 }
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7bd2412a8c37..ed5c9b0e0d0c 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -796,22 +796,25 @@ int f2fs_setattr(struct dentry *dentry, struct iattr 
*attr)
}
 
if (attr->ia_valid & ATTR_SIZE) {
-   if (attr->ia_size <= i_size_read(inode)) {
-   down_write(&F2FS_I(inode)->i_mmap_sem);
-   truncate_setsize(inode, attr->ia_size);
+   bool to_smaller = (attr->ia_size <= i_size_read(inode));
+
+   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+   down_write(&F2FS_I(inode)->i_mmap_sem);
+
+   truncate_setsize(inode, attr->ia_size);
+
+   if (to_smaller)
err = f2fs_truncate(inode);
-   up_write(&F2FS_I(inode)->i_mmap_sem);
-   if (err)
-   return err;
-   } else {
-   /*
-* do not trim all blocks after i_size if target size is
-* larger than i_size.
-*/
-   down_write(&F2FS_I(inode)->i_mmap_sem);
-   truncate_setsize(inode, attr->ia_size);
-   up_write(&F2FS_I(inode)->i_mmap_sem);
+   /*
+* do not trim all blocks after i_size if target size is
+* larger than i_size.
+*/
+   up_write(&F2FS_I(inode)->i_mmap_sem);
+   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+   if (err)
+   return err;
 
+   if (!to_smaller) {
/* should convert inline inode here */
if (!f2fs_may_inline_data(inode)) {
err = f2fs_convert_inline_inode(inode);
@@ -958,6 +961,7 @@ static int punch_hole(struct inode *inode, loff_t offset, 
loff_t len)
 
blk_start = (loff_t)pg_start << PAGE_SHIFT;
blk_end = (loff_t)pg_end << PAGE_SHIFT;
+   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
down_write(&F2FS_I(inode)->i_mmap_sem);
truncate_inode_pages_range(mapping, blk_start,
blk_end - 1);
@@ -966,6 +970,7 @@ static int punch_hole(struct inode *inode, loff_t offset, 
loff_t len)
ret = f2fs_truncate_hole(inode, pg_start, pg_end);
f2fs_unlock_op(sbi);
up_write(&F2FS_I(inode)->i_mmap_sem);
+   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
}
}
 
-- 
2.18.0.rc1



Re: [f2fs-dev] [PATCH] f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc

2018-07-25 Thread Chao Yu
On 2018/7/25 11:22, Jaegeuk Kim wrote:
> The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of
> fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop.

It seems there are other paths having the same issue, how about fixing all of
them in this patch?

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/file.c| 2 ++
>  fs/f2fs/segment.c | 1 -
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 84293423f830..3a5c35fa0603 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1754,6 +1754,8 @@ static int f2fs_ioc_commit_atomic_write(struct file 
> *filp)
>   if (ret)
>   return ret;
>  
> + f2fs_balance_fs(F2FS_I_SB(inode), true);
> +
>   inode_lock(inode);
>  
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index d28fa03a115f..17354089b4ab 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -445,7 +445,6 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   struct f2fs_inode_info *fi = F2FS_I(inode);
>   int err;
>  
> - f2fs_balance_fs(sbi, true);
>   f2fs_lock_op(sbi);
>  
>   set_inode_flag(inode, FI_ATOMIC_COMMIT);
> 


Re: [PATCH 2/5] f2fs: add cur_victim_sec for BG_GC to avoid skipping BG_GC victim

2018-07-25 Thread Chao Yu
On 2018/7/24 23:19, Yunlong Song wrote:
> 
> 
> On 2018/7/24 22:17, Chao Yu wrote:
>> On 2018/7/24 21:39, Yunlong Song wrote:
>>>
>>> On 2018/7/24 21:11, Chao Yu wrote:
>>>> On 2018/7/23 22:10, Yunlong Song wrote:
>>>>> If f2fs aborts BG_GC, then the section bit of victim_secmap will be set,
>>>>> which will cause the section skipped in the future get_victim of BG_GC.
>>>>> In a worst case that each section in the victim_secmap is set and there
>>>>> are enough free sections (so FG_GC can not be triggered), then BG_GC
>>>>> will skip all the sections and cannot find any victims, causing BG_GC
>>>> If f2fs aborts BG_GC, we'd better to clear victim_secmap?
>>> We can keep the bit set in victim_secmap for FG_GC use next time as before, 
>>> the
>> No, I don't think we could assume that FGGC will come soon, and in adaptive
>> mode, after we triggered SSR aggressively, FG_GC will be much less.
>>
>> For your case, we need to clear victim_secmap.
> However, if it is cleared, then FG_GC will lose the chance to have a quick
> selection of the victim candidate, which BG_GC has selected and aborted in
> last round or there are still some blocks ungced because these blocks belong
> to an opening atomic file. Especially for the large section case, when BG_GC
> stops its job if IO state change from idle to busy, then it is better that
> FG_GC can continue to gc the section selected before. So how about adding
> another map to record these sections, and make FG_GC/BG_GC select these
> sections, as for the old victim_secmap, keep its old logic, BG_GC can not
> select those sections in victim_secmap, but FG_GC can.

How about discussing GC optimization ideas offline? It will be faster and more
direct than on the mailing list. :)

Thanks,

> 
>>
>>> difference is that this patch will make BG_GC ignore the bit set in
>>> victim_secmap, so BG_GC can still get the section (which is in set) as
>>> victim and do GC jobs.
>> I guess this scenario is the case our previous scheme tries to prevent, since
>> if in selected section, all block there are cached and set dirty, BGGC will
>> end up with doing nothing, it's inefficient.
> 
> OK, I understand.
> 
>>
>> Thanks,
>>
>>>>> failed each time. Besides, SSR also uses BG_GC to get ssr segment, if
>>>> Looks like foreground GC will try to grab section which is selected as
>>>> victim of background GC?
>>> Yes, this is exactly the value of victim_secmap, it helps FG_GC reduce time
>>> in selecting victims and continue the job which BG_GC has not finished.
>>>
>>>> Thanks,
>>>>
>>>>> many sections in the victim_secmap are set, then SSR cannot get a proper
>>>>> ssr segment to allocate blocks, which makes SSR inefficiently. To fix
>>>>> this problem, we can add cur_victim_sec for BG_GC similar like that in
>>>>> FG_GC to avoid selecting the same section repeatedly.
>>>>>
>>>>> Signed-off-by: Yunlong Song 
>>>>> ---
>>>>>    fs/f2fs/f2fs.h  |  3 ++-
>>>>>    fs/f2fs/gc.c    | 15 +--
>>>>>    fs/f2fs/segment.h   |  3 ++-
>>>>>    fs/f2fs/super.c |  3 ++-
>>>>>    include/trace/events/f2fs.h | 18 --
>>>>>    5 files changed, 27 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index 57a8851..f8a7b42 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -1217,7 +1217,8 @@ struct f2fs_sb_info {
>>>>>    /* for cleaning operations */
>>>>>    struct mutex gc_mutex;    /* mutex for GC */
>>>>>    struct f2fs_gc_kthread    *gc_thread;    /* GC thread */
>>>>> -    unsigned int cur_victim_sec;    /* current victim section num */
>>>>> +    unsigned int cur_fg_victim_sec;    /* current FG_GC victim 
>>>>> section
>>>>> num */
>>>>> +    unsigned int cur_bg_victim_sec;    /* current BG_GC victim 
>>>>> section
>>>>> num */
>>>>>    unsigned int gc_mode;    /* current GC state */
>>>>>    /* for skip statistic */
>>>>>    unsigned long long skipped_ato

[PATCH] f2fs: fix to restrict mount condition when without CONFIG_QUOTA

2018-07-25 Thread Chao Yu
From: Chao Yu 

As with the quota_ino feature, we need to reject an RDWR mount of an image
that enables the project_quota feature when CONFIG_QUOTA is not set in
the kernel.

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index dbc1cb53581f..bb40f08d2861 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -775,6 +775,12 @@ static int parse_options(struct super_block *sb, char 
*options)
 "without CONFIG_QUOTA");
return -EINVAL;
}
+   if (f2fs_sb_has_project_quota(sbi->sb) && !f2fs_readonly(sbi->sb)) {
+   f2fs_msg(sb, KERN_ERR,
+   "Filesystem with project quota feature cannot be "
+   "mounted RDWR without CONFIG_QUOTA");
+   return -EINVAL;
+   }
 #endif
 
if (F2FS_IO_SIZE_BITS(sbi) && !test_opt(sbi, LFS)) {
-- 
2.16.2.17.g38e79b1fd
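
The mount rule described in the patch above boils down to a small decision function. The following is a simplified sketch, not the kernel's parse_options(): the feature bit values and the demo_check_quota_mount() helper are hypothetical, and the CONFIG_QUOTA build option is modelled as a runtime boolean purely so that both cases can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only. */
#define DEMO_FEATURE_QUOTA_INO  0x1u
#define DEMO_FEATURE_PRJQUOTA   0x2u

/*
 * Returns 0 if the mount may proceed, -EINVAL if it must be rejected.
 * A read-write mount of an image with a quota-related feature is refused
 * when quota support is not built in; read-only mounts are always allowed.
 */
static int demo_check_quota_mount(unsigned int features, bool readonly,
				  bool have_config_quota)
{
	if (have_config_quota || readonly)
		return 0;
	if (features & (DEMO_FEATURE_QUOTA_INO | DEMO_FEATURE_PRJQUOTA))
		return -EINVAL;
	return 0;
}
```

Rejecting only the RDWR case keeps such images readable on kernels without quota support, while preventing writes that would leave the quota accounting stale.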



Re: [f2fs-dev] [PATCH] f2fs: fix 32-bit format string warning

2018-07-25 Thread Chao Yu
On 2018/7/24 17:34, Arnd Bergmann wrote:
> On 32-bit targets, size_t is often 'unsigned int', so printing it as %lu
> causes a warning:
> 
> fs/f2fs/inode.c: In function 'sanity_check_inode':
> fs/f2fs/inode.c:247:4: error: format '%lu' expects argument of type 'long 
> unsigned int', but argument 7 has type 'unsigned int' [-Werror=format=]
> 
> The correct format string is %zu.
> 
> Fixes: ba3a252d3367 ("f2fs: fix to do sanity check with i_extra_isize")
> Signed-off-by: Arnd Bergmann 

I noticed the issue as well, thank you for fixing it. However, the original buggy
patch has not been upstreamed yet, so how about merging this fix into the original
patch, if you don't mind?

Thanks,

> ---
>  fs/f2fs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 3fe63b0c7325..4fd339fd3ff2 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -245,7 +245,7 @@ static bool sanity_check_inode(struct inode *inode, 
> struct page *node_page)
>   set_sbi_flag(sbi, SBI_NEED_FSCK);
>   f2fs_msg(sbi->sb, KERN_WARNING,
>   "%s: inode (ino=%lx) has corrupted i_extra_isize: %d, "
> - "max: %lu",
> + "max: %zu",
>   __func__, inode->i_ino, fi->i_extra_isize,
>   F2FS_TOTAL_EXTRA_ATTR_SIZE);
>   return false;
> 
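
The format-string point in the quoted patch can be reproduced in plain C. This is a small user-space sketch, with snprintf() standing in for the kernel's message helper and format_max_size() being a hypothetical name: "%zu" matches size_t on both 32-bit and 64-bit targets, whereas "%lu" only matches where size_t happens to be unsigned long.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a size_t value portably. Returns the number of characters
 * written (excluding the terminating NUL), as snprintf() does.
 */
static int format_max_size(char *buf, size_t buflen, size_t max)
{
	/* %zu is the C99 conversion for size_t, whatever its width is. */
	return snprintf(buf, buflen, "max: %zu", max);
}
```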


[PATCH v4 1/2] f2fs: fix to avoid broken of dnode block list

2018-07-25 Thread Chao Yu
The f2fs recovery flow relies on the dnode block linked list: recovery of an
fsynced file depends on the previous dnodes in the list being persisted. So
during fsync() we should wait for all regular inodes' dnodes to be written
back before issuing the flush.

This way, we can avoid the dnode block list being broken by out-of-order
IO submission in the IO scheduler or driver.

Sheng Yong helped test this patch:

Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8

1 / PERSIST / Index

Base:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1    867.82        204.15        41440.03      41370.54      680.8        1025.94      1031.08
2    871.87        205.87        41370.3       40275.2       791.14       1065.84      1101.7
3    866.52        205.69        41795.67      40596.16      694.69       1037.16      1031.48
Avg  868.737       205.237       41535.3       40747.3       722.21       1042.98      1054.75

After:
     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1    798.81        202.5         41143         40613.87      602.71       838.08       913.83
2    805.79        206.47        40297.2       41291.46      604.44       840.75       924.27
3    814.83        206.17        41209.57      40453.62      602.85       834.66       927.91
Avg  806.477       205.047       40883.25667   40786.31667   603.333      837.83       922.003

Patched/Original:
     0.928332713   0.999074239   0.984300676   1.000957528   0.835398753  0.803303994  0.874141189

It looks like atomic write suffers a performance regression.

I suspect the culprit is that we force a wait for all dnodes to reach the
storage cache before we issue PREFLUSH+FUA.

BTW, commit ("f2fs: don't need to wait for node writes for atomic write") will
cause a problem: we may lose the data of the last transaction after SPO, even if
the atomic write returned no error:

- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
 - writeback data: P1, P2, P3;
 - writeback node: N1, N2, N3;  <--- If N1 and N2 are not written back but N3,
   with fsync_mark, is written back, SPOR won't find N3 because the node chain
   is broken, so the last transaction is lost.
 - preflush + fua;
- power-cut

If we don't wait for dnode writeback for atomic_write:

     SEQ-RD(MB/s)  SEQ-WR(MB/s)  RND-RD(IOPS)  RND-WR(IOPS)  Insert(TPS)  Update(TPS)  Delete(TPS)
1    779.91        206.03        41621.5       40333.16      716.9        1038.21      1034.85
2    848.51        204.35        40082.44      39486.17      791.83       1119.96      1083.77
3    772.12        206.27        41335.25      41599.65      723.29       1055.07      971.92
Avg  800.18        205.55        41013.06333   40472.99333   744.007      1071.08      1030.18

Patched/Original:
     0.92108464    1.001526693   0.987425886   0.993268102   1.030180511  1.026942031  0.976702294

SQLite's performance recovers.

Jaegeuk:
"Practically, I don't see db corruption because of this. We can excuse to lose
the last transaction."

Finally, we decided to keep the original semantics of the atomic write interface:
we don't wait for all dnode writeback before preflush+fua submission.

Signed-off-by: Chao Yu 
---
v4:
- add test number with Androbench.
- don't wait dnode writeback for atomic_write.
- add missing filemap_check_errors.
 fs/f2fs/checkpoint.c |   2 +
 fs/f2fs/data.c   |   2 +
 fs/f2fs/f2fs.h   |  21 ++-
 fs/f2fs/file.c   |   5 +-
 fs/f2fs/node.c   | 144 +++
 fs/f2fs/super.c  |   4 ++
 6 files changed, 150 insertions(+), 28 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 04841f32d4d9..e010fecce097 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1407,6 +1407,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct 
cp_control *cpc)
 
f2fs_release_ino_entry(sbi, false);
 
+   f2fs_reset_fsync_node_info(sbi);
+
clear_sbi_flag(sbi, SBI_IS_DIRTY);
clear_sbi_flag(sbi, SBI_NEED_CP);
__set_cp_next_pack(sbi);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6b8ca5011bfd..572c91e43337 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -177,6 +177,8 @@ static void f2fs_write_end_io(struct bio *bio)
page->index != nid_of_node(page));
 
dec_page_count(sbi, type);
+   if (f2fs_in_warm_node_list(sbi, page))
+   f2fs_del_fsync_node_entry(sbi, page);
clear_cold_data(page);
end_page_writeback(page);
}
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs
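
The dependency on the dnode chain described in the commit message above can be modelled in a few lines. This is only a toy sketch under stated assumptions: struct demo_dnode and demo_recoverable_fsyncs() are hypothetical, and real roll-forward recovery follows on-disk next-block addresses rather than an array. It does capture the key property, though: the walk stops at the first block that never reached storage, so a later fsync-marked node is invisible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one dnode block in the chain. */
struct demo_dnode {
	bool persisted;   /* did the block reach stable storage? */
	bool fsync_mark;  /* does it carry an fsync mark? */
};

/*
 * Walk the chain from its head and stop at the first block that was
 * never persisted. Returns how many fsync-marked nodes a recovery
 * pass could still find.
 */
static size_t demo_recoverable_fsyncs(const struct demo_dnode *chain, size_t n)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (!chain[i].persisted)
			break;  /* chain is broken: later nodes are unreachable */
		if (chain[i].fsync_mark)
			found++;
	}
	return found;
}
```

With N1 and N2 missing and only N3 persisted, the function reports zero recoverable fsyncs, which corresponds to the lost last transaction in the power-cut scenario above.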

[PATCH v4 2/2] f2fs: let checkpoint flush dnode page of regular

2018-07-25 Thread Chao Yu
The fsyncer waits for writeback of all regular inodes' dnode pages before
flushing; if async dnode pages are blocked by the IO scheduler, fsync
performance may decrease.

In this patch, we let f2fs_balance_fs_bg() trigger a checkpoint to flush
these regular dnode pages, so that async IO of dnode pages is eliminated
and the fsyncer only needs to wait for sync IO.

Signed-off-by: Chao Yu 
---
v4:
- rebase to last dev-test.
 fs/f2fs/node.c| 8 +++-
 fs/f2fs/node.h| 5 +
 fs/f2fs/segment.c | 4 +++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d976b8fe479d..2bc5ff76c19c 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1478,6 +1478,10 @@ static int __write_node_page(struct page *page, bool 
atomic, bool *submitted,
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
goto redirty_out;
 
+   if (wbc->sync_mode == WB_SYNC_NONE &&
+   IS_DNODE(page) && is_cold_node(page))
+   goto redirty_out;
+
/* get old block addr of this node page */
nid = nid_of_node(page);
f2fs_bug_on(sbi, page->index != nid);
@@ -1804,10 +1808,12 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
}
 
if (step < 2) {
+   if (wbc->sync_mode == WB_SYNC_NONE && step == 1)
+   goto out;
step++;
goto next_step;
}
-
+out:
if (nwritten)
f2fs_submit_merged_write(sbi, NODE);
 
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index 8f34bdffde93..0f4db7a61254 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -135,6 +135,11 @@ static inline bool excess_cached_nats(struct f2fs_sb_info 
*sbi)
return NM_I(sbi)->nat_cnt >= DEF_NAT_CACHE_THRESHOLD;
 }
 
+static inline bool excess_dirty_nodes(struct f2fs_sb_info *sbi)
+{
+   return get_pages(sbi, F2FS_DIRTY_NODES) >= sbi->blocks_per_seg * 8;
+}
+
 enum mem_type {
FREE_NIDS,  /* indicates the free nid list */
NAT_ENTRIES,/* indicates the cached nat entry */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 17354089b4ab..27f9a3202d6f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -509,7 +509,8 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi)
else
f2fs_build_free_nids(sbi, false, false);
 
-   if (!is_idle(sbi) && !excess_dirty_nats(sbi))
+   if (!is_idle(sbi) &&
+   (!excess_dirty_nats(sbi) && !excess_dirty_nodes(sbi)))
return;
 
/* checkpoint is the only way to shrink partial cached entries */
@@ -517,6 +518,7 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi)
!f2fs_available_free_memory(sbi, INO_ENTRIES) ||
excess_prefree_segs(sbi) ||
excess_dirty_nats(sbi) ||
+   excess_dirty_nodes(sbi) ||
f2fs_time_over(sbi, CP_TIME)) {
if (test_opt(sbi, DATA_FLUSH)) {
struct blk_plug plug;
-- 
2.18.0.rc1
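
The new threshold in the patch above is easy to check with concrete numbers. Below is a stand-alone sketch: struct demo_sbi and demo_excess_dirty_nodes() are hypothetical simplifications of the kernel helper, and the value of 512 blocks per segment used in the note afterwards is just an assumed example, not something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for struct f2fs_sb_info. */
struct demo_sbi {
	unsigned int blocks_per_seg;  /* blocks in one segment */
	long dirty_nodes;             /* number of dirty node pages */
};

/* True once dirty node pages reach eight segments' worth of blocks. */
static bool demo_excess_dirty_nodes(const struct demo_sbi *sbi)
{
	return sbi->dirty_nodes >= (long)sbi->blocks_per_seg * 8;
}
```

Assuming 512 blocks per segment, the check trips at 4096 dirty node pages, at which point the background balance path would consider triggering a checkpoint.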



Re: [PATCH 2/5] f2fs: add cur_victim_sec for BG_GC to avoid skipping BG_GC victim

2018-07-24 Thread Chao Yu
On 2018/7/24 21:39, Yunlong Song wrote:
> 
> 
> On 2018/7/24 21:11, Chao Yu wrote:
>> On 2018/7/23 22:10, Yunlong Song wrote:
>>> If f2fs aborts BG_GC, then the section bit of victim_secmap will be set,
>>> which will cause the section skipped in the future get_victim of BG_GC.
>>> In a worst case that each section in the victim_secmap is set and there
>>> are enough free sections (so FG_GC can not be triggered), then BG_GC
>>> will skip all the sections and cannot find any victims, causing BG_GC
>> If f2fs aborts BG_GC, we'd better to clear victim_secmap?
> We can keep the bit set in victim_secmap for FG_GC use next time as before, 
> the

No, I don't think we can assume that FG_GC will come soon; in adaptive
mode, once we trigger SSR aggressively, FG_GC will happen much less often.

For your case, we need to clear victim_secmap.

> difference is that this patch will make BG_GC ignore the bit set in
> victim_secmap, so BG_GC can still get the section (which is in set) as
> victim and do GC jobs.

I guess this scenario is the case our previous scheme tries to prevent: if all
blocks in the selected section are already cached and dirty, BG_GC will end up
doing nothing, which is inefficient.

Thanks,

>>
>>> failed each time. Besides, SSR also uses BG_GC to get ssr segment, if
>> Looks like foreground GC will try to grab section which is selected as
>> victim of background GC?
> Yes, this is exactly the value of victim_secmap, it helps FG_GC reduce time in
> selecting victims and continue the job which BG_GC has not finished.
> 
>>
>> Thanks,
>>
>>> many sections in the victim_secmap are set, then SSR cannot get a proper
>>> ssr segment to allocate blocks, which makes SSR inefficiently. To fix
>>> this problem, we can add cur_victim_sec for BG_GC similar like that in
>>> FG_GC to avoid selecting the same section repeatedly.
>>>
>>> Signed-off-by: Yunlong Song 
>>> ---
>>>   fs/f2fs/f2fs.h  |  3 ++-
>>>   fs/f2fs/gc.c    | 15 +--
>>>   fs/f2fs/segment.h   |  3 ++-
>>>   fs/f2fs/super.c |  3 ++-
>>>   include/trace/events/f2fs.h | 18 --
>>>   5 files changed, 27 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 57a8851..f8a7b42 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -1217,7 +1217,8 @@ struct f2fs_sb_info {
>>>   /* for cleaning operations */
>>>   struct mutex gc_mutex;    /* mutex for GC */
>>>   struct f2fs_gc_kthread    *gc_thread;    /* GC thread */
>>> -    unsigned int cur_victim_sec;    /* current victim section num */
>>> +    unsigned int cur_fg_victim_sec;    /* current FG_GC victim section
>>> num */
>>> +    unsigned int cur_bg_victim_sec;    /* current BG_GC victim section
>>> num */
>>>   unsigned int gc_mode;    /* current GC state */
>>>   /* for skip statistic */
>>>   unsigned long long skipped_atomic_files[2];    /* FG_GC and BG_GC */
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 2ba470d..705d419 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -367,8 +367,6 @@ static int get_victim_by_default(struct f2fs_sb_info 
>>> *sbi,
>>>     if (sec_usage_check(sbi, secno))
>>>   goto next;
>>> -    if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
>>> -    goto next;
>>>     cost = get_gc_cost(sbi, segno, &p);
>>>   @@ -391,14 +389,17 @@ static int get_victim_by_default(struct f2fs_sb_info
>>> *sbi,
>>>   if (p.alloc_mode == LFS) {
>>>   secno = GET_SEC_FROM_SEG(sbi, p.min_segno);
>>>   if (gc_type == FG_GC)
>>> -    sbi->cur_victim_sec = secno;
>>> -    else
>>> +    sbi->cur_fg_victim_sec = secno;
>>> +    else {
>>>   set_bit(secno, dirty_i->victim_secmap);
>>> +    sbi->cur_bg_victim_sec = secno;
>>> +    }
>>>   }
>>>   *result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
>>>     trace_f2fs_get_victim(sbi->sb, type, gc_type, ,
>>> -    sbi->cur_victim_sec,
>>> +    sbi->cur_fg_victim_sec,
>>> +    sbi->cur_bg_victim_sec,
>>>   

Re: [PATCH 2/5] f2fs: add cur_victim_sec for BG_GC to avoid skipping BG_GC victim

2018-07-24 Thread Chao Yu
On 2018/7/23 22:10, Yunlong Song wrote:
> If f2fs aborts BG_GC, then the section bit of victim_secmap will be set,
> which will cause the section skipped in the future get_victim of BG_GC.
> In a worst case that each section in the victim_secmap is set and there
> are enough free sections (so FG_GC can not be triggered), then BG_GC
> will skip all the sections and cannot find any victims, causing BG_GC

If f2fs aborts BG_GC, shouldn't we clear victim_secmap instead?

> failed each time. Besides, SSR also uses BG_GC to get ssr segment, if

It looks like foreground GC will try to grab a section that has been selected
as a victim of background GC?

Thanks,

> many sections in the victim_secmap are set, then SSR cannot get a proper
> ssr segment to allocate blocks, which makes SSR inefficiently. To fix
> this problem, we can add cur_victim_sec for BG_GC similar like that in
> FG_GC to avoid selecting the same section repeatedly.
> 
> Signed-off-by: Yunlong Song 
> ---
>  fs/f2fs/f2fs.h  |  3 ++-
>  fs/f2fs/gc.c| 15 +--
>  fs/f2fs/segment.h   |  3 ++-
>  fs/f2fs/super.c |  3 ++-
>  include/trace/events/f2fs.h | 18 --
>  5 files changed, 27 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 57a8851..f8a7b42 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1217,7 +1217,8 @@ struct f2fs_sb_info {
>   /* for cleaning operations */
>   struct mutex gc_mutex;  /* mutex for GC */
>   struct f2fs_gc_kthread  *gc_thread; /* GC thread */
> - unsigned int cur_victim_sec;/* current victim section num */
> + unsigned int cur_fg_victim_sec; /* current FG_GC victim section 
> num */
> + unsigned int cur_bg_victim_sec; /* current BG_GC victim section 
> num */
>   unsigned int gc_mode;   /* current GC state */
>   /* for skip statistic */
>   unsigned long long skipped_atomic_files[2]; /* FG_GC and BG_GC */
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 2ba470d..705d419 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -367,8 +367,6 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>  
>   if (sec_usage_check(sbi, secno))
>   goto next;
> - if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
> - goto next;
>  
>   cost = get_gc_cost(sbi, segno, );
>  
> @@ -391,14 +389,17 @@ static int get_victim_by_default(struct f2fs_sb_info 
> *sbi,
>   if (p.alloc_mode == LFS) {
>   secno = GET_SEC_FROM_SEG(sbi, p.min_segno);
>   if (gc_type == FG_GC)
> - sbi->cur_victim_sec = secno;
> - else
> + sbi->cur_fg_victim_sec = secno;
> + else {
>   set_bit(secno, dirty_i->victim_secmap);
> + sbi->cur_bg_victim_sec = secno;
> + }
>   }
>   *result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
>  
>   trace_f2fs_get_victim(sbi->sb, type, gc_type, ,
> - sbi->cur_victim_sec,
> + sbi->cur_fg_victim_sec,
> + sbi->cur_bg_victim_sec,
>   prefree_segments(sbi), free_segments(sbi));
>   }
>  out:
> @@ -1098,7 +1099,9 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>   }
>  
>   if (gc_type == FG_GC)
> - sbi->cur_victim_sec = NULL_SEGNO;
> + sbi->cur_fg_victim_sec = NULL_SEGNO;
> + else
> + sbi->cur_bg_victim_sec = NULL_SEGNO;
>  
>   if (!sync) {
>   if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index 5049551..b21bb96 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -787,7 +787,8 @@ static inline block_t sum_blk_addr(struct f2fs_sb_info 
> *sbi, int base, int type)
>  
>  static inline bool sec_usage_check(struct f2fs_sb_info *sbi, unsigned int 
> secno)
>  {
> - if (IS_CURSEC(sbi, secno) || (sbi->cur_victim_sec == secno))
> + if (IS_CURSEC(sbi, secno) || (sbi->cur_fg_victim_sec == secno) ||
> + (sbi->cur_bg_victim_sec == secno))
>   return true;
>   return false;
>  }
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 7187885..ef69ebf 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -2386,7 +2386,8 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
>   sbi->root_ino_num = le32_to_cpu(raw_super->root_ino);
>   sbi->node_ino_num = le32_to_cpu(raw_super->node_ino);
>   sbi->meta_ino_num = le32_to_cpu(raw_super->meta_ino);
> - sbi->cur_victim_sec = NULL_SECNO;
> + sbi->cur_fg_victim_sec = NULL_SECNO;
> + sbi->cur_bg_victim_sec = NULL_SECNO;
>   

Re: [PATCH v2] f2fs: clear the remaining prefree_map of the section

2018-07-17 Thread Chao Yu
On 2018/7/16 18:03, Yunlong Song wrote:
> For the case when sbi->segs_per_sec > 1 with lfs mode, take
> section:segment = 5 for example, if the section prefree_map is
> ...previous section | current section (1 1 0 1 1) | next section...,
> then the start = x, end = x + 1, after start = start_segno +
> sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
> their bitmap is still set, which will cause duplicated
> f2fs_issue_discard of this same section in the next write_checkpoint, so
> fix it.

I mean:

Subject: [PATCH] f2fs: issue discard align to section in LFS mode

---
 fs/f2fs/segment.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index f12dad627fb4..6640c790cf64 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1761,6 +1761,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
unsigned int start = 0, end = -1;
unsigned int secno, start_segno;
bool force = (cpc->reason & CP_DISCARD);
+   bool need_align = test_opt(sbi, LFS) && sbi->segs_per_sec > 1;

mutex_lock(_i->seglist_lock);

@@ -1772,10 +1773,15 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info 
*sbi,
end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi),
start + 1);

-   for (i = start; i < end; i++)
-   clear_bit(i, prefree_map);
+   if (need_align) {
+   start = rounddown(start, sbi->segs_per_sec);
+   end = roundup(end, sbi->segs_per_sec);
+   }

-   dirty_i->nr_dirty[PRE] -= end - start;
+   for (i = start; i < end; i++) {
+   if (test_and_clear_bit(i, prefree_map))
+   dirty_i->nr_dirty[PRE]--;
+   }

if (!test_opt(sbi, DISCARD))
continue;
@@ -2564,6 +2570,7 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct 
fstrim_range *range)
struct discard_policy dpolicy;
unsigned long long trimmed = 0;
int err = 0;
+   bool need_align = test_opt(sbi, LFS) && sbi->segs_per_sec > 1;

if (start >= MAX_BLKADDR(sbi) || range->len < sbi->blocksize)
return -EINVAL;
@@ -2582,6 +2589,12 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct 
fstrim_range *range)
end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
GET_SEGNO(sbi, end);

+   if (need_align) {
+   start_segno = rounddown(start_segno, sbi->segs_per_sec);
+   end_segno = roundup(end_segno, sbi->segs_per_sec);
+   end_segno = min(end_segno, MAIN_SEGS(sbi) - 1);
+   }
+
cpc.reason = CP_DISCARD;
cpc.trim_minlen = max_t(__u64, 1, F2FS_BYTES_TO_BLK(range->minlen));
cpc.trim_start = start_segno;
-- 
2.18.0.rc1



> 
> Signed-off-by: Yunlong Song 
> ---
>  fs/f2fs/segment.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index cfff7cf..5dc1d5cc 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1729,6 +1729,15 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info 
> *sbi,
>   if (!test_opt(sbi, DISCARD))
>   continue;
>  
> + if (test_opt(sbi, LFS) && sbi->segs_per_sec > 1) {
> + start = rounddown(start, sbi->segs_per_sec);
> + i = end;
> + end = roundup(end, sbi->segs_per_sec);
> + while (++i < end)
> + if (test_and_clear_bit(i, prefree_map))
> + dirty_i->nr_dirty[PRE]--;
> + }
> +
>   if (force && start >= cpc->trim_start &&
>   (end - 1) <= cpc->trim_end)
>   continue;
> 
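
The section alignment discussed in this thread can be sketched in user space
as follows. This is only an illustration under stated assumptions: the bool
array stands in for prefree_map, and round_down_to(), round_up_to() and
clear_prefree_aligned() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Round x down to a multiple of step (step must be non-zero). */
static unsigned int round_down_to(unsigned int x, unsigned int step)
{
	return x - (x % step);
}

/* Round x up to a multiple of step (step must be non-zero). */
static unsigned int round_up_to(unsigned int x, unsigned int step)
{
	return round_down_to(x + step - 1, step);
}

/*
 * Widen the run [start, end) of set bits to section boundaries, clear every
 * bit that is still set inside the widened range, and return how many bits
 * were cleared. `map` is a hypothetical stand-in for prefree_map.
 */
static unsigned int clear_prefree_aligned(bool *map, unsigned int nsegs,
					  unsigned int start, unsigned int end,
					  unsigned int segs_per_sec)
{
	unsigned int i, cleared = 0;

	assert(segs_per_sec > 0 && start < end && end <= nsegs);

	start = round_down_to(start, segs_per_sec);
	end = round_up_to(end, segs_per_sec);
	if (end > nsegs)
		end = nsegs;

	for (i = start; i < end; i++) {
		if (map[i]) {
			map[i] = false;
			cleared++;
		}
	}
	return cleared;
}
```

With five segments per section and the bitmap (1 1 0 1 1) from the
description, the first run covers only the first two segments; widening it to
the section boundary also clears the last two, so no bit is left behind for
the next checkpoint.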



Re: [PATCH 5/5] f2fs: do not __punch_discard_cmd in lfs mode

2018-07-12 Thread Chao Yu
On 2018/7/12 23:09, Yunlong Song wrote:
> In lfs mode, it is better to submit and wait for discard of the
> new_blkaddr's overall section, rather than punch it which makes
> more small discards and is not friendly with flash alignment. And
> f2fs does not have to wait discard of each new_blkaddr except for the
> start_block of each section with this patch.

On non-zoned block devices, unaligned discard is allowed. If synchronous discard
is very slow, it will block the block allocator here; rather than that, I prefer
to just punch the 4k LBA out of the discard entry, for performance.

If you want to avoid this condition, I suggest issuing large discards
more quickly.

Thanks,

> 
> Signed-off-by: Yunlong Song 
> ---
>  fs/f2fs/segment.c | 76 
> ++-
>  fs/f2fs/segment.h |  7 -
>  2 files changed, 75 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index f6c20e0..bce321a 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -893,7 +893,19 @@ static void __remove_discard_cmd(struct f2fs_sb_info 
> *sbi,
>  static void f2fs_submit_discard_endio(struct bio *bio)
>  {
>   struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
> + struct f2fs_sb_info *sbi = F2FS_SB(dc->bdev->bd_super);
>  
> + if (test_opt(sbi, LFS)) {
> + unsigned int segno = GET_SEGNO(sbi, dc->lstart);
> + unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
> + int cnt = (dc->len >> sbi->log_blocks_per_seg) /
> + sbi->segs_per_sec;
> +
> + while (cnt--) {
> + set_bit(secno, FREE_I(sbi)->discard_secmap);
> + secno++;
> + }
> + }
>   dc->error = blk_status_to_errno(bio->bi_status);
>   dc->state = D_DONE;
>   complete_all(>wait);
> @@ -1349,8 +1361,15 @@ static void f2fs_wait_discard_bio(struct f2fs_sb_info 
> *sbi, block_t blkaddr)
>   dc = (struct discard_cmd *)f2fs_lookup_rb_tree(>root,
>   NULL, blkaddr);
>   if (dc) {
> - if (dc->state == D_PREP) {
> + if (dc->state == D_PREP && !test_opt(sbi, LFS))
>   __punch_discard_cmd(sbi, dc, blkaddr);
> + else if (dc->state == D_PREP && test_opt(sbi, LFS)) {
> + struct discard_policy dpolicy;
> +
> + __init_discard_policy(sbi, , DPOLICY_FORCE, 1);
> + __submit_discard_cmd(sbi, , dc);
> + dc->ref++;
> + need_wait = true;
>   } else {
>   dc->ref++;
>   need_wait = true;
> @@ -2071,9 +2090,10 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   unsigned int hint = GET_SEC_FROM_SEG(sbi, *newseg);
>   unsigned int old_zoneno = GET_ZONE_FROM_SEG(sbi, *newseg);
>   unsigned int left_start = hint;
> - bool init = true;
> + bool init = true, check_discard = test_opt(sbi, LFS) ? true : false;
>   int go_left = 0;
>   int i;
> + unsigned long *free_secmap;
>  
>   spin_lock(_i->segmap_lock);
>  
> @@ -2084,11 +2104,25 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   goto got_it;
>   }
>  find_other_zone:
> - secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint);
> + if (check_discard) {
> + int entries = f2fs_bitmap_size(MAIN_SECS(sbi)) / 
> sizeof(unsigned long);
> +
> + free_secmap = free_i->tmp_secmap;
> + for (i = 0; i < entries; i++)
> + free_secmap[i] = (!(free_i->free_secmap[i] ^
> + free_i->discard_secmap[i])) | 
> free_i->free_secmap[i];
> + } else
> + free_secmap = free_i->free_secmap;
> +
> + secno = find_next_zero_bit(free_secmap, MAIN_SECS(sbi), hint);
>   if (secno >= MAIN_SECS(sbi)) {
>   if (dir == ALLOC_RIGHT) {
> - secno = find_next_zero_bit(free_i->free_secmap,
> + secno = find_next_zero_bit(free_secmap,
>   MAIN_SECS(sbi), 0);
> + if (secno >= MAIN_SECS(sbi) && check_discard) {
> + check_discard = false;
> + goto find_other_zone;
> + }
>   f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi));
>   } else {
>   go_left = 1;
> @@ -2098,13 +2132,17 @@ static void get_new_segment(struct f2fs_sb_info *sbi,
>   if (go_left == 0)
>   goto skip_left;
>  
> - while (test_bit(left_start, free_i->free_secmap)) {
> + while (test_bit(left_start, free_secmap)) {
>   if (left_start > 0) {
>   left_start--;
>   continue;
>   }
> - 

Re: [f2fs-dev] [PATCH 2/4] f2fs: allow wrong configure dio to buffered write

2018-07-09 Thread Chao Yu
On 2018/7/7 5:09, Jaegeuk Kim wrote:
> This fixes to support unaligned dio as buffered writes.

Should we return -EINVAL instead, as the write(2) manual says:

EINVAL fd is attached to an object which is unsuitable for writing; or the file
was opened with the O_DIRECT flag, and either the address specified in buf, the
value specified in count, or the current file offset is not suitably aligned.

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/data.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index e66379961804..6e8e78bb64a7 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2425,7 +2425,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, 
> struct iov_iter *iter)
>  
>   err = check_direct_IO(inode, iter, offset);
>   if (err)
> - return err;
> + return 0;
>  
>   if (f2fs_force_buffered_io(inode, rw))
>   return 0;
> 
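
For reference, the alignment rule quoted from the manual can be sketched as a
small user-space check. This is not the kernel's check_direct_IO();
dio_alignment_check() and its blksize parameter are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 0 when the file offset, the buffer address and the byte count are
 * all multiples of blksize, and -EINVAL otherwise. blksize must be a power
 * of two so that (blksize - 1) can be used as a mask.
 */
static int dio_alignment_check(uint64_t offset, const void *buf, size_t count,
			       unsigned int blksize)
{
	uint64_t mask;

	assert(blksize && !(blksize & (blksize - 1)));
	mask = (uint64_t)blksize - 1;

	if ((offset & mask) || ((uintptr_t)buf & mask) ||
	    ((uint64_t)count & mask))
		return -EINVAL;
	return 0;
}
```

The question in this thread is what the caller does with a non-zero result:
return it to user space, or fall back to a buffered write.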


[PATCH] f2fs: enable real-time discard by default

2018-07-05 Thread Chao Yu
f2fs is designed for flash-based storage, so let's enable real-time
discard by default. Users who do not want it can pass the 'nodiscard'
mount option at mount time.

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 848badda50ad..980edeb0b650 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1366,12 +1366,11 @@ static void default_options(struct f2fs_sb_info *sbi)
set_opt(sbi, NOHEAP);
sbi->sb->s_flags |= SB_LAZYTIME;
set_opt(sbi, FLUSH_MERGE);
-   if (f2fs_sb_has_blkzoned(sbi->sb)) {
+   set_opt(sbi, DISCARD);
+   if (f2fs_sb_has_blkzoned(sbi->sb))
set_opt_mode(sbi, F2FS_MOUNT_LFS);
-   set_opt(sbi, DISCARD);
-   } else {
+   else
set_opt_mode(sbi, F2FS_MOUNT_ADAPTIVE);
-   }
 
 #ifdef CONFIG_F2FS_FS_XATTR
set_opt(sbi, XATTR_USER);
-- 
2.18.0.rc1



[PATCH v2 1/2] f2fs: fix to avoid broken of dnode block list

2018-07-04 Thread Chao Yu
The f2fs recovery flow relies on the dnode block linked list: recovery of an
fsynced file depends on the earlier dnodes in the list being persistent. So
during fsync() we should wait for writeback of the dnodes of all regular
inodes before issuing a flush.

This way, we avoid the dnode block list being broken by out-of-order
IO submission in the IO scheduler or driver.

Signed-off-by: Chao Yu 
---
v2: add missing definition modification in f2fs.h.
 fs/f2fs/f2fs.h |  2 +-
 fs/f2fs/file.c | 17 -
 fs/f2fs/node.c |  4 ++--
 3 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 859ecde81dd0..a9da5a089cb4 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2825,7 +2825,7 @@ pgoff_t f2fs_get_next_page_offset(struct dnode_of_data 
*dn, pgoff_t pgofs);
 int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode);
 int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from);
 int f2fs_truncate_xattr_node(struct inode *inode);
-int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino);
+int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi);
 int f2fs_remove_inode_page(struct inode *inode);
 struct page *f2fs_new_inode_page(struct inode *inode);
 struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 752ff678bfe0..ecca7b833268 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -292,19 +292,10 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
start, loff_t end,
goto sync_nodes;
}
 
-   /*
-* If it's atomic_write, it's just fine to keep write ordering. So
-* here we don't need to wait for node write completion, since we use
-* node chain which serializes node blocks. If one of node writes are
-* reordered, we can see simply broken chain, resulting in stopping
-* roll-forward recovery. It means we'll recover all or none node blocks
-* given fsync mark.
-*/
-   if (!atomic) {
-   ret = f2fs_wait_on_node_pages_writeback(sbi, ino);
-   if (ret)
-   goto out;
-   }
+
+   ret = f2fs_wait_on_node_pages_writeback(sbi);
+   if (ret)
+   goto out;
 
/* once recovery info is written, don't need to tack this */
f2fs_remove_ino_entry(sbi, ino, APPEND_INO);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 849c2ed9c152..0810c8117d46 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1710,7 +1710,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
return ret;
 }
 
-int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
+int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi)
 {
pgoff_t index = 0;
struct pagevec pvec;
@@ -1726,7 +1726,7 @@ int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info 
*sbi, nid_t ino)
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
 
-   if (ino && ino_of_node(page) == ino) {
+   if (IS_DNODE(page) && is_cold_node(page)) {
f2fs_wait_on_page_writeback(page, NODE, true);
if (TestClearPageError(page))
ret = -EIO;
-- 
2.18.0.rc1



Re: [PATCH] f2fs: Replace strncpy with memcpy

2018-07-01 Thread Chao Yu
On 2018/7/2 10:16, Guenter Roeck wrote:
> On 07/01/2018 06:53 PM, Chao Yu wrote:
>> On 2018/7/2 4:57, Guenter Roeck wrote:
>>> gcc 8.1.0 complains:
>>>
>>> fs/f2fs/namei.c: In function 'f2fs_update_extension_list':
>>> fs/f2fs/namei.c:257:3: warning:
>>> 'strncpy' output truncated before terminating nul copying
>>> as many bytes from a string as its length
>>> fs/f2fs/namei.c:249:3: warning:
>>> 'strncpy' output truncated before terminating nul copying
>>> as many bytes from a string as its length
>>>
>>> Using strncpy() is indeed less than perfect since the length of data to
>>> be copied has already been determined with strlen(). Replace strncpy()
>>> with memcpy() to address the warning and optimize the code a little.
>>>
>>> Signed-off-by: Guenter Roeck 
>>> ---
>>>   fs/f2fs/namei.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
>>> index 231b7f3ea7d3..e75607544f7c 100644
>>> --- a/fs/f2fs/namei.c
>>> +++ b/fs/f2fs/namei.c
>>> @@ -246,7 +246,7 @@ int f2fs_update_extension_list(struct f2fs_sb_info 
>>> *sbi, const char *name,
>>> return -EINVAL;
>>>   
>>> if (hot) {
>>> -   strncpy(extlist[count], name, strlen(name));
>>> +   memcpy(extlist[count], name, strlen(name));
>>
>> How about replacing with strcpy(extlist[count], name)? Because name length 
>> has
>> already been checked before f2fs_update_extension_list, it should be valid, 
>> and
>> will not cause overflow during copying.
>>
> 
> Your call; feel free to submit an alternative. Since it is different files, 
> static
> analysis might not know and complain, though. You might want to make sure 
> that this
> doesn't happen, and also add a comment explaining the reason for using 
> strcpy().

Yeah, that could be changed in another patch, but it will be trivial. Anyway, to
fix this gcc complaint, this patch looks good to me, thanks for the patch. :)

Reviewed-by: Chao Yu 

Thanks,

> 
> Thanks,
> Guenter
> 
> 
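
The copy pattern under discussion can be sketched outside the kernel as
below. EXT_LEN and set_extension() are hypothetical names for this
illustration; the slot size of 8 is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <string.h>

#define EXT_LEN 8	/* hypothetical slot size for this sketch */

/*
 * Copy name into a fixed-size slot. Returns 0 on success and -1 when name
 * is empty or would not leave room for the terminating NUL. On failure the
 * slot is left untouched.
 */
static int set_extension(char slot[EXT_LEN], const char *name)
{
	size_t len;

	assert(slot != NULL && name != NULL);
	len = strlen(name);
	if (len == 0 || len >= EXT_LEN)
		return -1;

	memset(slot, 0, EXT_LEN);	/* guarantees NUL termination */
	memcpy(slot, name, len);	/* length is already known */
	return 0;
}
```

Since the length is computed with strlen() and validated first, memcpy() does
the same work as strncpy() with that bound, without triggering the
truncation warning.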



[PATCH v2 2/2] f2fs: fix to do sanity check with extra_attr feature

2018-06-25 Thread Chao Yu
From: Chao Yu 

If FI_EXTRA_ATTR is set in an inode by fuzzing, inode.i_addr[0] will be
parsed as inode.i_extra_isize. Then, in __recover_inline_status, the inline
data address can go beyond the page boundary, resulting in an invalid
memory access.

So, when reading the inode page, let's do a sanity check between the
EXTRA_ATTR feature of the fs and the extra_attr bit of the inode; if they
are inconsistent, refuse to load the inode.

- Overview
Out-of-bound access in f2fs_iget() when mounting a corrupted f2fs image

- Reproduce

The following message is reported by a KASAN build of the 4.18 upstream kernel.
[  819.392227] 
==
[  819.393901] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0x736/0x1530
[  819.395329] Read of size 4 at addr 8801f099c968 by task mount/1292

[  819.397079] CPU: 1 PID: 1292 Comm: mount Not tainted 4.18.0-rc1+ #4
[  819.397082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  819.397088] Call Trace:
[  819.397124]  dump_stack+0x7b/0xb5
[  819.397154]  print_address_description+0x70/0x290
[  819.397159]  kasan_report+0x291/0x390
[  819.397163]  ? f2fs_iget+0x736/0x1530
[  819.397176]  check_memory_region+0x139/0x190
[  819.397182]  __asan_loadN+0xf/0x20
[  819.397185]  f2fs_iget+0x736/0x1530
[  819.397197]  f2fs_fill_super+0x1b4f/0x2b40
[  819.397202]  ? f2fs_fill_super+0x1b4f/0x2b40
[  819.397208]  ? f2fs_commit_super+0x1b0/0x1b0
[  819.397227]  ? set_blocksize+0x90/0x140
[  819.397241]  mount_bdev+0x1c5/0x210
[  819.397245]  ? f2fs_commit_super+0x1b0/0x1b0
[  819.397252]  f2fs_mount+0x15/0x20
[  819.397256]  mount_fs+0x60/0x1a0
[  819.397267]  ? alloc_vfsmnt+0x309/0x360
[  819.397272]  vfs_kern_mount+0x6b/0x1a0
[  819.397282]  do_mount+0x34a/0x18c0
[  819.397300]  ? lockref_put_or_lock+0xcf/0x160
[  819.397306]  ? copy_mount_string+0x20/0x20
[  819.397318]  ? memcg_kmem_put_cache+0x1b/0xa0
[  819.397324]  ? kasan_check_write+0x14/0x20
[  819.397334]  ? _copy_from_user+0x6a/0x90
[  819.397353]  ? memdup_user+0x42/0x60
[  819.397359]  ksys_mount+0x83/0xd0
[  819.397365]  __x64_sys_mount+0x67/0x80
[  819.397388]  do_syscall_64+0x78/0x170
[  819.397403]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  819.397422] RIP: 0033:0x7f54c667cb9a
[  819.397424] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 
0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
[  819.397483] RSP: 002b:7ffd8f46cd08 EFLAGS: 0202 ORIG_RAX: 
00a5
[  819.397496] RAX: ffda RBX: 00dfa030 RCX: 7f54c667cb9a
[  819.397498] RDX: 00dfa210 RSI: 00dfbf30 RDI: 00e02ec0
[  819.397501] RBP:  R08:  R09: 0013
[  819.397503] R10: c0ed R11: 0202 R12: 00e02ec0
[  819.397505] R13: 00dfa210 R14:  R15: 0003

[  819.397866] Allocated by task 139:
[  819.398702]  save_stack+0x46/0xd0
[  819.398705]  kasan_kmalloc+0xad/0xe0
[  819.398709]  kasan_slab_alloc+0x11/0x20
[  819.398713]  kmem_cache_alloc+0xd1/0x1e0
[  819.398717]  dup_fd+0x50/0x4c0
[  819.398740]  copy_process.part.37+0xbed/0x32e0
[  819.398744]  _do_fork+0x16e/0x590
[  819.398748]  __x64_sys_clone+0x69/0x80
[  819.398752]  do_syscall_64+0x78/0x170
[  819.398756]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[  819.399097] Freed by task 159:
[  819.399743]  save_stack+0x46/0xd0
[  819.399747]  __kasan_slab_free+0x13c/0x1a0
[  819.399750]  kasan_slab_free+0xe/0x10
[  819.399754]  kmem_cache_free+0x89/0x1e0
[  819.399757]  put_files_struct+0x132/0x150
[  819.399761]  exit_files+0x62/0x70
[  819.399766]  do_exit+0x47b/0x1390
[  819.399770]  do_group_exit+0x86/0x130
[  819.399774]  __x64_sys_exit_group+0x2c/0x30
[  819.399778]  do_syscall_64+0x78/0x170
[  819.399782]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[  819.400115] The buggy address belongs to the object at 8801f099c680
which belongs to the cache files_cache of size 704
[  819.403234] The buggy address is located 40 bytes to the right of
704-byte region [8801f099c680, 8801f099c940)
[  819.405689] The buggy address belongs to the page:
[  819.406709] page:ea0007c26700 count:1 mapcount:0 
mapping:8801f69a3340 index:0x8801f099d380 compound_mapcount: 0
[  819.408984] flags: 0x2008100(slab|head)
[  819.409932] raw: 02008100 ea00077fb600 00020002 
8801f69a3340
[  819.411514] raw: 8801f099d380 8013 0001 

[  819.413073] page dumped because: kasan: bad access detected

[  819.414539] Memory state around the buggy address:
[  819.415521]  8801f099c800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  819.416981]  8801f099c880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  819.418454] >8801f099c900: fb fb fb fb fb fb
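
The consistency rule described in the commit message can be sketched as a
pure predicate. This is a minimal illustration, assuming the only rejected
combination is an inode that claims extra attributes on a filesystem without
the feature; extra_attr_sane() is a hypothetical name, not the kernel helper.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when the inode may be loaded. An inode whose extra_attr bit
 * is set on a filesystem that lacks the EXTRA_ATTR feature is treated as
 * corrupted and rejected.
 */
static bool extra_attr_sane(bool fs_has_extra_attr, bool inode_has_extra_attr)
{
	if (inode_has_extra_attr && !fs_has_extra_attr)
		return false;
	return true;
}
```

Rejecting the inode at load time means i_addr[0] is never interpreted as
i_extra_isize on a filesystem that does not support it.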

[PATCH v2] f2fs: disable f2fs_check_rb_tree_consistence

2018-06-22 Thread Chao Yu
If there are millions of discard entries cached in the rb-tree, each
sanity check of it can cause very long latency, because cmd_lock is
held throughout and blocks other lock grabbers.

On the other hand, the check has been enabled for a very long time
and, as far as we have seen, no inconsistency caused by bugs has shown up.

Still, we do not choose to kill it outright; instead, add a flag that
disables the check for now. If related code changes, we can re-enable
it to detect bugs.

Signed-off-by: Yunlei He 
Signed-off-by: Chao Yu 
---
v2: use unlikely suggested by Jaegeuk.
 fs/f2fs/f2fs.h|  1 +
 fs/f2fs/segment.c | 10 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 99ad25513b4c..5302bad1566b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -309,6 +309,7 @@ struct discard_cmd_control {
atomic_t issing_discard;/* # of issing discard */
atomic_t discard_cmd_cnt;   /* # of cached cmd count */
struct rb_root root;/* root of discard rb-tree */
+   bool rbtree_check;  /* config for consistence check 
*/
 };
 
 /* for the list of fsync inodes, used only during recovery */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 553075973be4..f7eaefe8b1d1 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1199,8 +1199,9 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
mutex_lock(>cmd_lock);
if (list_empty(pend_list))
goto next;
-   f2fs_bug_on(sbi,
-   !f2fs_check_rb_tree_consistence(sbi, >root));
+   if (unlikely(dcc->rbtree_check))
+   f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
+   >root));
blk_start_plug();
list_for_each_entry_safe(dc, tmp, pend_list, list) {
f2fs_bug_on(sbi, dc->state != D_PREP);
@@ -1752,6 +1753,7 @@ static int create_discard_cmd_control(struct f2fs_sb_info 
*sbi)
dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
dcc->undiscard_blks = 0;
dcc->root = RB_ROOT;
+   dcc->rbtree_check = false;
 
init_waitqueue_head(>discard_wait_queue);
SM_I(sbi)->dcc_info = dcc;
@@ -2381,7 +2383,9 @@ static void __issue_discard_cmd_range(struct f2fs_sb_info 
*sbi,
issued = 0;
 
mutex_lock(>cmd_lock);
-   f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi, >root));
+   if (unlikely(dcc->rbtree_check))
+   f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
+   >root));
 
dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(>root,
NULL, start,
-- 
2.18.0.rc1
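
The idea of keeping an expensive check but gating it behind a flag can be
sketched in user space as follows. The struct and function names are
hypothetical, and unlikely() is defined here with the GCC/Clang builtin to
mirror the kernel macro.

```c
#include <assert.h>
#include <stdbool.h>

/* GCC/Clang branch hint, mirroring the kernel's unlikely() macro. */
#define unlikely(x) __builtin_expect(!!(x), 0)

struct discard_ctl_sketch {
	bool rbtree_check;		/* run the expensive check? */
	unsigned int checks_run;	/* how many times the check executed */
};

/* Stand-in for an O(n) tree walk; always reports a consistent tree. */
static bool expensive_consistency_check(struct discard_ctl_sketch *dcc)
{
	dcc->checks_run++;
	return true;
}

/* Issue path: only pay for the check when the flag is set. */
static void issue_discards(struct discard_ctl_sketch *dcc)
{
	if (unlikely(dcc->rbtree_check)) {
		bool ok = expensive_consistency_check(dcc);

		assert(ok);
		(void)ok;
	}
	/* ... discard submission would follow here ... */
}
```

With the flag off by default, the hot path costs one predictable branch; a
developer touching the rb-tree code can turn the flag on to get the old
behaviour back.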



[PATCH 1/2] f2fs: relocate readdir_ra configure initialization

2018-06-11 Thread Chao Yu
readdir_ra is a sysfs configuration rather than a mount option, so it should
not be initialized in default_options(); otherwise a remount can reset it
to enabled, which may not be what the user wants. So let's move it to
f2fs_tuning_parameters().

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 56f3af86f62c..19ef966a99e9 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1357,7 +1357,6 @@ static void default_options(struct f2fs_sb_info *sbi)
F2FS_OPTION(sbi).test_dummy_encryption = false;
F2FS_OPTION(sbi).s_resuid = make_kuid(_user_ns, F2FS_DEF_RESUID);
F2FS_OPTION(sbi).s_resgid = make_kgid(_user_ns, F2FS_DEF_RESGID);
-   sbi->readdir_ra = 1;
 
set_opt(sbi, BG_GC);
set_opt(sbi, INLINE_XATTR);
@@ -2655,6 +2654,8 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info 
*sbi)
sm_i->dcc_info->discard_granularity = 1;
sm_i->ipu_policy = 1 << F2FS_IPU_FORCE;
}
+
+   sbi->readdir_ra = 1;
 }
 
 static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
-- 
2.18.0.rc1



[PATCH 2/2] f2fs: fix error path of fill_super

2018-06-11 Thread Chao Yu
In fill_super, if the root inode's attributes are incorrect, we need to
call f2fs_destroy_stats to release the stats memory.

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 19ef966a99e9..a790dae6c16f 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2914,7 +2914,7 @@ static int f2fs_fill_super(struct super_block *sb, void 
*data, int silent)
if (!S_ISDIR(root->i_mode) || !root->i_blocks || !root->i_size) {
iput(root);
err = -EINVAL;
-   goto free_node_inode;
+   goto free_stats;
}
 
sb->s_root = d_make_root(root); /* allocate root dentry */
-- 
2.18.0.rc1



[PATCH v2 2/2] f2fs: fix to update mtime correctly

2018-06-04 Thread Chao Yu
From: Chao Yu 

If we change the system time to the past, get_mtime() will return an
overflowed time, and SIT_I(sbi)->max_mtime will be updated
incorrectly. This patch fixes both issues.

Signed-off-by: Chao Yu 
---
v2:
- fix to correct return value of get_mtime().
 fs/f2fs/checkpoint.c |  2 +-
 fs/f2fs/segment.c|  7 ---
 fs/f2fs/segment.h| 17 ++---
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 76e1856e4666..c7ffa1e0e021 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1223,7 +1223,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct 
cp_control *cpc)
 * modify checkpoint
 * version number is already updated
 */
-   ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi));
+   ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi, true));
ckpt->free_segment_count = cpu_to_le32(free_segments(sbi));
for (i = 0; i < NR_CURSEG_NODE_TYPE; i++) {
ckpt->cur_node_segno[i] =
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 66983acaad16..c1bd31da772d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1822,8 +1822,9 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, 
block_t blkaddr, int del)
(new_vblocks > sbi->blocks_per_seg)));
 
se->valid_blocks = new_vblocks;
-   se->mtime = get_mtime(sbi);
-   SIT_I(sbi)->max_mtime = se->mtime;
+   se->mtime = get_mtime(sbi, false);
+   if (se->mtime > SIT_I(sbi)->max_mtime)
+   SIT_I(sbi)->max_mtime = se->mtime;
 
/* Update valid block bitmap */
if (del > 0) {
@@ -3879,7 +3880,7 @@ static void init_min_max_mtime(struct f2fs_sb_info *sbi)
if (sit_i->min_mtime > mtime)
sit_i->min_mtime = mtime;
}
-   sit_i->max_mtime = get_mtime(sbi);
+   sit_i->max_mtime = get_mtime(sbi, false);
up_write(_i->sentry_lock);
 }
 
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index c574131ac7e1..f18fc82fbe99 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -745,12 +745,23 @@ static inline void set_to_next_sit(struct sit_info 
*sit_i, unsigned int start)
 #endif
 }
 
-static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi)
+static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi,
+   bool base_time)
 {
struct sit_info *sit_i = SIT_I(sbi);
-   time64_t now = ktime_get_real_seconds();
+   time64_t diff, now = ktime_get_real_seconds();
 
-   return sit_i->elapsed_time + now - sit_i->mounted_time;
+   if (now >= sit_i->mounted_time)
+   return sit_i->elapsed_time + now - sit_i->mounted_time;
+
+   /* system time is set to the past */
+   if (!base_time) {
+   diff = sit_i->mounted_time - now;
+   if (sit_i->elapsed_time >= diff)
+   return sit_i->elapsed_time - diff;
+   return 0;
+   }
+   return sit_i->elapsed_time;
 }
 
 static inline void set_summary(struct f2fs_summary *sum, nid_t nid,
-- 
2.16.2.17.g38e79b1fd
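
The arithmetic of the patched get_mtime() can be tried out in user space with
the sketch below. It takes the three time values as plain parameters instead
of reading them from sit_info; mtime_sketch() and update_max_mtime() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * elapsed: seconds accumulated before this mount
 * mounted: wall-clock time at mount
 * now:     current wall-clock time
 */
static uint64_t mtime_sketch(uint64_t elapsed, int64_t mounted, int64_t now,
			     bool base_time)
{
	uint64_t diff;

	if (now >= mounted)
		return elapsed + (uint64_t)(now - mounted);

	/* The clock was set to a point before the mount time. */
	if (!base_time) {
		diff = (uint64_t)(mounted - now);
		return elapsed >= diff ? elapsed - diff : 0;
	}
	return elapsed;
}

/* max_mtime only ever moves forward. */
static void update_max_mtime(uint64_t *max_mtime, uint64_t mtime)
{
	if (mtime > *max_mtime)
		*max_mtime = mtime;
}
```

For example, with elapsed = 100 and mounted = 1000, a clock reading of 950
gives 50 for a segment mtime and 100 for the checkpoint base time, and a
reading of 800 is clamped to 0 instead of wrapping around.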



[PATCH 1/2] f2fs: fix to let caller retry allocating block address

2018-05-28 Thread Chao Yu
From: Chao Yu <yuch...@huawei.com>

With io_bits set to 2 and LFS mode enabled, generic/013 reports the dmesg below:

BUG: unable to handle kernel NULL pointer dereference at 0104
*pdpt = 29b7b001 *pde = 
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: crc32_generic zram f2fs(O) rfcomm bnep bluetooth 
ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi 
snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev snd_seq_device aesni_intel 
snd_timer aes_i586 snd crypto_simd cryptd soundcore i2c_piix4 serio_raw mac_hid 
video parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000
CPU: 0 PID: 11161 Comm: fsstress Tainted: G   O  4.17.0-rc2 #38
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs]
EFLAGS: 00010206 CPU: 0
EAX: e863dcd8 EBX:  ECX: 0100 EDX: 0200
ESI: e863dcf4 EDI: f6f82768 EBP: e863dbb0 ESP: e863db74
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0104 CR3: 29a62020 CR4: 000406f0
Call Trace:
 do_write_page+0x6f/0xc0 [f2fs]
 write_data_page+0x4a/0xd0 [f2fs]
 do_write_data_page+0x327/0x630 [f2fs]
 __write_data_page+0x34b/0x820 [f2fs]
 __f2fs_write_data_pages+0x42d/0x8c0 [f2fs]
 f2fs_write_data_pages+0x27/0x30 [f2fs]
 do_writepages+0x1a/0x70
 __filemap_fdatawrite_range+0x94/0xd0
 filemap_write_and_wait_range+0x3d/0xa0
 __generic_file_write_iter+0x11a/0x1f0
 f2fs_file_write_iter+0xdd/0x3b0 [f2fs]
 __vfs_write+0xd2/0x150
 vfs_write+0x9b/0x190
 ksys_write+0x45/0x90
 sys_write+0x16/0x20
 do_fast_syscall_32+0xaa/0x22c
 entry_SYSENTER_32+0x4c/0x7b
EIP: 0xb7fc8c51
EFLAGS: 0246 CPU: 0
EAX: ffda EBX: 0003 ECX: 09cde000 EDX: 1000
ESI: 0003 EDI: 1000 EBP:  ESP: bfbded38
 DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
Code: e8 f9 77 34 c9 8b 45 e0 8b 80 b8 00 00 00 39 45 d8 0f 84 bb 02 00 00 8b 
45 e0 8b 80 b8 00 00 00 8d 50 d8 8b 08 89 55 f0 8b 50 04 <89> 51 04 89 0a c7 00 
00 01 00 00 c7 40 04 00 02 00 00 8b 45 dc
EIP: f2fs_submit_page_write+0x28d/0x550 [f2fs] SS:ESP: 0068:e863db74
CR2: 0104
---[ end trace 4cac79c0d1305ee6 ]---

allocate_data_block submits all sequential pending IOs in the order of a
FIFO list. If we fail to submit another user's IO due to an unaligned write,
we retry allocating a new block address for the current IO, which
initializes fio.list again. If the fio was already in the list, this breaks
the FIFO list and results in the panic above.

Thread A                        Thread B
- do_write_page
 - allocate_data_block
  - list_add_tail
  : fioA cached in FIFO list.
                                - do_write_page
                                 - allocate_data_block
                                  - list_add_tail
                                  : fioB cached in FIFO list.
                                 - f2fs_submit_page_write
                                 : fail to submit IO
                                 - allocate_data_block
                                  - INIT_LIST_HEAD
 - f2fs_submit_page_write
  - list_del  <-- NULL pointer dereference

This patch adds a fio.retry parameter to indicate the failure status of each
IO, and avoids bailing out while there is still pending IO in the FIFO list.
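
The list corruption described above can be reproduced in miniature. This is a userspace sketch under stated assumptions: struct list_node and the helpers below are hypothetical stand-ins for the kernel's struct list_head, list_add_tail and INIT_LIST_HEAD, not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_node {
    struct list_node *next, *prev;
};

/* Make a node point at itself, as INIT_LIST_HEAD does. */
static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Append a node just before the head, i.e. at the tail of the FIFO. */
static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* True when every node's neighbours point back at it. */
static bool list_is_consistent(const struct list_node *head)
{
    const struct list_node *pos = head;
    size_t guard = 0;

    do {
        if (pos->next->prev != pos || pos->prev->next != pos)
            return false;
        pos = pos->next;
    } while (pos != head && ++guard < 1024);

    return pos == head;
}
```

Re-initializing a node that is still queued leaves its neighbours pointing at it while it points only at itself, so a later deletion works on stale links. That is the situation the retry path must avoid.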

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/data.c| 14 ++
 fs/f2fs/f2fs.h|  3 ++-
 fs/f2fs/gc.c  |  5 +++--
 fs/f2fs/segment.c | 11 ++-
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 31c2edb217ec..97e6df852f37 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -462,13 +462,12 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
return 0;
 }
 
-int f2fs_submit_page_write(struct f2fs_io_info *fio)
+void f2fs_submit_page_write(struct f2fs_io_info *fio)
 {
struct f2fs_sb_info *sbi = fio->sbi;
enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
struct page *bio_page;
-   int err = 0;
 
f2fs_bug_on(sbi, is_read_io(fio->op));
 
@@ -478,7 +477,7 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
spin_lock(&io->io_lock);
if (list_empty(&io->io_list)) {
spin_unlock(&io->io_lock);
-   goto out_fail;
+   goto out;
}
fio = list_first_entry(&io->io_list,
struct f2fs_io_info, list);
@@ -505,9 +504,9 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
if (io->bio == NULL) {
if ((fio->type == DATA || fio->type == NODE) &&
fio->new_blkaddr & F2FS_IO_SIZE_MASK(sbi)) {
-   err = -EAGAIN;
dec_page_count(sbi, WB_DATA_TYPE(bio_page));
-   

[PATCH 2/2] f2fs: fix to avoid accessing cross the boundary

2018-05-28 Thread Chao Yu
From: Chao Yu <yuch...@huawei.com>

With io_bits set to 2 and LFS mode enabled, generic/017 reports the dmesg below:

BUG: unable to handle kernel NULL pointer dereference at 0039
*pdpt = 2fcb2001 *pde = 
Oops:  [#1] PREEMPT SMP
Modules linked in: crc32_generic zram f2fs(O) bnep rfcomm bluetooth 
ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi 
snd_seq_midi_event snd_rawmidi pcbc snd_seq joydev aesni_intel aes_i586 
snd_seq_device snd_timer crypto_simd cryptd snd soundcore i2c_piix4 serio_raw 
mac_hid video parport_pc ppdev lp parport hid_generic usbhid psmouse hid e1000
CPU: 2 PID: 20779 Comm: xfs_io Tainted: G   O  4.17.0-rc2 #38
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: is_checkpointed_data+0x84/0xd0 [f2fs]
EFLAGS: 00010207 CPU: 2
EAX:  EBX: f5cd7000 ECX: fe32 EDX: 0039
ESI: 01cd EDI: ec95fb6c EBP: e264bd80 ESP: e264bd6c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0039 CR3: 2fe55660 CR4: 000406f0
Call Trace:
 __exchange_data_block+0xb3f/0x1000 [f2fs]
 f2fs_fallocate+0xab9/0x16b0 [f2fs]
 vfs_fallocate+0x17c/0x2d0
 ksys_fallocate+0x42/0x70
 sys_fallocate+0x31/0x40
 do_fast_syscall_32+0xaa/0x22c
 entry_SYSENTER_32+0x4c/0x7b
EIP: 0xb7f98c51
EFLAGS: 0293 CPU: 2
EAX: ffda EBX: 0003 ECX: 0008 EDX: 01001000
ESI:  EDI: 1000 EBP:  ESP: bfc0357c
 DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
Code: 00 00 d3 e8 8b 4d ec 2b 02 8b 55 f0 6b c0 1c 03 41 70 29 d6 8b 93 d0 06 
00 00 8b 40 0c 83 ea 01 21 d6 89 f2 89 f1 c1 ea 03 f7 d1 <0f> be 14 10 83 e1 07 
b8 01 00 00 00 d3 e0 85 c2 89 f8 0f 95 c3
EIP: is_checkpointed_data+0x84/0xd0 [f2fs] SS:ESP: 0068:e264bd6c
CR2: 0039
---[ end trace 9a4d4087cce6080a ]---

This is because, in the recovery flow of __exchange_data_block, we passed len
instead of olen to __roll_back_blkaddrs. That indicates the wrong array size
and results in copying random block addresses into the dnode page.

Later, once such a random block address is accessed by is_checkpointed_data,
it can cause a NULL pointer dereference.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index fab65a0bd4cc..694ef319f979 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1144,7 +1144,7 @@ static int __exchange_data_block(struct inode *src_inode,
return 0;
 
 roll_back:
-   __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, len);
+   __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, olen);
kvfree(src_blkaddr);
kvfree(do_replace);
return ret;
-- 
2.16.2.17.g38e79b1fd
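
The len/olen distinction can be illustrated with a small userspace sketch. The helpers chunk_len and roll_back are hypothetical names, not f2fs functions. The assumption is that work proceeds in chunks, that the scratch arrays are sized per chunk, and that a rollback must therefore use the chunk length rather than the total remaining length.

```c
#include <stddef.h>

/* Length of the current chunk: never more than the scratch array holds. */
static size_t chunk_len(size_t remaining, size_t chunk_max)
{
    return remaining < chunk_max ? remaining : chunk_max;
}

/*
 * Restore the first "count" saved entries. Passing the total remaining
 * length here instead of the chunk length would read past "saved".
 */
static void roll_back(unsigned int *table, const unsigned int *saved,
                      size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[i] = saved[i];
}
```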



[PATCH 2/2] f2fs: fix error path of move_data_page

2018-05-28 Thread Chao Yu
This patch fixes error path of move_data_page:
- clear cold data flag if it fails to write page.
- redirty page for non-ENOMEM case.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/gc.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 50bb8fc25275..885032fc3a61 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -780,9 +780,14 @@ static void move_data_page(struct inode *inode, block_t 
bidx, int gc_type,
set_cold_data(page);
 
err = do_write_data_page(&fio);
-   if (err == -ENOMEM && is_dirty) {
-   congestion_wait(BLK_RW_ASYNC, HZ/50);
-   goto retry;
+   if (err) {
+   clear_cold_data(page);
+   if (err == -ENOMEM) {
+   congestion_wait(BLK_RW_ASYNC, HZ/50);
+   goto retry;
+   }
+   if (is_dirty)
+   set_page_dirty(page);
}
}
 out:
-- 
2.17.0.391.g1f1cddd558b5



[PATCH 1/2] f2fs: don't drop dentry pages after fs shutdown

2018-05-28 Thread Chao Yu
As described in commit "f2fs: don't drop any page on f2fs_cp_error()
case":

"We still provide readdir() after shutdown, so we should keep pages to
avoid additional IOs."

In order to provide the latest directory structure, let's keep dentry
pages in cache after fs shutdown.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/data.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9d3e2e1c1e33..31c2edb217ec 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1801,6 +1801,12 @@ static int __write_data_page(struct page *page, bool 
*submitted,
/* we should bypass data pages to proceed the kworkder jobs */
if (unlikely(f2fs_cp_error(sbi))) {
mapping_set_error(page->mapping, -EIO);
+   /*
+* don't drop any dirty dentry pages for keeping latest
+* directory structure.
+*/
+   if (S_ISDIR(inode->i_mode))
+   goto redirty_out;
goto out;
}
 
-- 
2.17.0.391.g1f1cddd558b5



[PATCH v3] f2fs: fix to avoid race during access gc_thread pointer

2018-05-28 Thread Chao Yu
Thread A                        Thread B
- f2fs_remount
 - stop_gc_thread
                                - f2fs_sbi_store
   sbi->gc_thread = NULL;
                                  access sbi->gc_thread->gc_*

Previously, we allocated memory for sbi->gc_thread based on the background
gc thread mount option, and that memory can be released if the mount option
is turned off. However, several places still access the gc_thread pointer
without considering this race condition, resulting in a NULL pointer
dereference.

In order to fix this issue, use sb->s_umount to exclude those operations.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v3:
- fix missing 'gc_urgent' case
 fs/f2fs/sysfs.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index dd940d156af6..ac3ea6044936 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -165,7 +165,7 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a,
return snprintf(buf, PAGE_SIZE, "%u\n", *ui);
 }
 
-static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
+static ssize_t __f2fs_sbi_store(struct f2fs_attr *a,
struct f2fs_sb_info *sbi,
const char *buf, size_t count)
 {
@@ -278,6 +278,23 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
return count;
 }
 
+static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
+   struct f2fs_sb_info *sbi,
+   const char *buf, size_t count)
+{
+   ssize_t ret;
+   bool gc_entry = (!strcmp(a->attr.name, "gc_urgent") ||
+   a->struct_type == GC_THREAD);
+
+   if (gc_entry)
+   down_read(&sbi->sb->s_umount);
+   ret = __f2fs_sbi_store(a, sbi, buf, count);
+   if (gc_entry)
+   up_read(&sbi->sb->s_umount);
+
+   return ret;
+}
+
 static ssize_t f2fs_attr_show(struct kobject *kobj,
struct attribute *attr, char *buf)
 {
-- 
2.17.0.391.g1f1cddd558b5
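
The decision of which sysfs stores take s_umount can be isolated as a pure predicate. This is a sketch only: enum attr_type and store_needs_umount_lock are hypothetical names that mirror the gc_entry test in the wrapper above, and the enum values are illustrative.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of attribute classes; the values are illustrative. */
enum attr_type { ATTR_GC_THREAD, ATTR_SM_INFO, ATTR_SB_INFO };

/*
 * Mirrors the gc_entry condition: the store must be excluded against
 * remount when it touches gc_thread state, either because the attribute
 * belongs to the GC thread class or because it is "gc_urgent".
 */
static bool store_needs_umount_lock(const char *name, enum attr_type type)
{
    return strcmp(name, "gc_urgent") == 0 || type == ATTR_GC_THREAD;
}
```

Keeping the check in one place means every store that reaches gc_thread state takes the lock, while unrelated attributes skip it.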



Re: [RESEND PATCH V5 24/33] f2fs: conver to bio_for_each_page_all2

2018-05-28 Thread Chao Yu
On 2018/5/25 11:46, Ming Lei wrote:
> bio_for_each_page_all() can't be used any more after multipage bvec is
> enabled, so we have to convert to bio_for_each_page_all2().
> 
> Signed-off-by: Ming Lei <ming@redhat.com>

Acked-by: Chao Yu <yuch...@huawei.com>

Thanks,



Re: [PATCH] f2fs-tools: fix overflow bug of start_sector when computing zone_align_start_offset

2018-05-28 Thread Chao Yu
On 2018/5/26 16:09, Yunlong Song wrote:
> zone_align_start_offset should be u64, but config.start_sector is u32,
> so it may be overflow when computing zone_align_start_offset.

Could you rebase this patch on top of "f2fs-tools: fix to match with the
start_sector"?

Thanks,

> 
> Signed-off-by: Yunlong Song 
> ---
>  fsck/resize.c  | 7 ---
>  mkfs/f2fs_format.c | 4 ++--
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/fsck/resize.c b/fsck/resize.c
> index d285dd7..8ac7d45 100644
> --- a/fsck/resize.c
> +++ b/fsck/resize.c
> @@ -11,7 +11,8 @@
>  
>  static int get_new_sb(struct f2fs_super_block *sb)
>  {
> - u_int32_t zone_size_bytes, zone_align_start_offset;
> + u_int32_t zone_size_bytes;
> + u_int64_t zone_align_start_offset;
>   u_int32_t blocks_for_sit, blocks_for_nat, blocks_for_ssa;
>   u_int32_t sit_segments, nat_segments, diff, total_meta_segments;
>   u_int32_t total_valid_blks_available;
> @@ -27,10 +28,10 @@ static int get_new_sb(struct f2fs_super_block *sb)
>  
>   zone_size_bytes = segment_size_bytes * segs_per_zone;
>   zone_align_start_offset =
> - (c.start_sector * c.sector_size +
> + ((u_int64_t) c.start_sector * c.sector_size +
>   2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
>   zone_size_bytes * zone_size_bytes -
> - c.start_sector * c.sector_size;
> + (u_int64_t) c.start_sector * c.sector_size;
>  
>   set_sb(segment_count, (c.target_sectors * c.sector_size -
>   zone_align_start_offset) / segment_size_bytes /
> diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
> index 0a99a77..f045e23 100644
> --- a/mkfs/f2fs_format.c
> +++ b/mkfs/f2fs_format.c
> @@ -212,10 +212,10 @@ static int f2fs_prepare_super_block(void)
>   set_sb(block_count, c.total_sectors >> log_sectors_per_block);
>  
>   zone_align_start_offset =
> - (c.start_sector * c.sector_size +
> + ((u_int64_t) c.start_sector * c.sector_size +
>   2 * F2FS_BLKSIZE + zone_size_bytes - 1) /
>   zone_size_bytes * zone_size_bytes -
> - c.start_sector * c.sector_size;
> + (u_int64_t) c.start_sector * c.sector_size;
>  
>   if (c.start_sector % c.sectors_per_blk) {
>   MSG(1, "\t%s: Align start sector number to the page unit\n",
> 
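
The overflow discussed in the quoted patch can be shown in isolation. This is a minimal sketch assuming 32-bit unsigned int; the function names are hypothetical and only contrast a 32-bit multiply with a widened one.

```c
#include <stdint.h>

/* The multiply happens in 32 bits and wraps before the result is widened. */
static uint64_t offset_wrapping(uint32_t start_sector, uint32_t sector_size)
{
    return start_sector * sector_size;
}

/* Widening one operand first makes the multiply happen in 64 bits. */
static uint64_t offset_widened(uint32_t start_sector, uint32_t sector_size)
{
    return (uint64_t)start_sector * sector_size;
}
```

For example, a start sector of 8388608 with 512-byte sectors is exactly 2^32 bytes, which the 32-bit multiply reduces to 0.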



Re: [PATCH v2] f2fs-tools: fix to match with the start_sector

2018-05-28 Thread Chao Yu
On 2018/5/7 10:15, Yunlong Song wrote:
> f2fs-tools uses ioctl BLKSSZGET to get sector_size, however, this ioctl
> will return a value which may be larger than 512 (according to the value
> of q->limits.logical_block_size), then this will be inconsistent with
> the start_sector, since start_sector is got from ioctl HDIO_GETGEO and
> is always in 512 size unit for a sector. To fix this problem, just
> change the sector_size to the default value when computing with
> start_sector. And fix sectors_per_blk as well.
> 
> Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>

Reviewed-by: Chao Yu <yuch...@huawei.com>

Thanks,
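
The unit mismatch described in the quoted commit message can be sketched as follows. The names are hypothetical; the only assumption taken from the text is that the start sector is always counted in 512-byte units, whatever logical sector size the device reports.

```c
#include <stdint.h>

/* Assumed fixed unit of the start sector, per the description above. */
#define GEO_SECTOR_SIZE 512u

/* Correct byte offset: the start sector is always in 512-byte units. */
static uint64_t start_offset_bytes(uint32_t start_sector)
{
    return (uint64_t)start_sector * GEO_SECTOR_SIZE;
}

/* Mismatched variant: scales by the reported logical sector size. */
static uint64_t start_offset_bytes_mismatched(uint32_t start_sector,
                                              uint32_t logical_sector_size)
{
    return (uint64_t)start_sector * logical_sector_size;
}
```

On a device reporting 4096-byte logical sectors, the mismatched variant places a partition that starts at sector 2048 at 8 MiB instead of 1 MiB.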



[PATCH 2/2] f2fs: clean up with clear_radix_tree_dirty_tag

2018-05-26 Thread Chao Yu
Introduce clear_radix_tree_dirty_tag to include common codes for cleanup.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/data.c   | 11 +++
 fs/f2fs/dir.c|  8 +---
 fs/f2fs/f2fs.h   |  1 +
 fs/f2fs/inline.c |  7 +--
 fs/f2fs/node.c   | 12 ++--
 5 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 98a12526e3d0..9d3e2e1c1e33 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2594,6 +2594,17 @@ const struct address_space_operations f2fs_dblock_aops = 
{
 #endif
 };
 
+void clear_radix_tree_dirty_tag(struct page *page)
+{
+   struct address_space *mapping = page_mapping(page);
+   unsigned long flags;
+
+   xa_lock_irqsave(&mapping->i_pages, flags);
+   radix_tree_tag_clear(&mapping->i_pages, page_index(page),
+   PAGECACHE_TAG_DIRTY);
+   xa_unlock_irqrestore(&mapping->i_pages, flags);
+}
+
 int __init f2fs_init_post_read_processing(void)
 {
bio_post_read_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 8c9c2f31b253..e20539ba0554 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -698,8 +698,6 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
struct  f2fs_dentry_block *dentry_blk;
unsigned int bit_pos;
int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len));
-   struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
int i;
 
f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
@@ -732,11 +730,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
 
if (bit_pos == NR_DENTRY_IN_BLOCK &&
!truncate_hole(dir, page->index, page->index + 1)) {
-   xa_lock_irqsave(&mapping->i_pages, flags);
-   radix_tree_tag_clear(&mapping->i_pages, page_index(page),
-PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(&mapping->i_pages, flags);
-
+   clear_radix_tree_dirty_tag(page);
clear_page_dirty_for_io(page);
ClearPagePrivate(page);
ClearPageUptodate(page);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 659c63dae81c..da43959b725a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2966,6 +2966,7 @@ int f2fs_migrate_page(struct address_space *mapping, 
struct page *newpage,
struct page *page, enum migrate_mode mode);
 #endif
 bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
+void clear_radix_tree_dirty_tag(struct page *page);
 
 /*
  * gc.c
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 1eaa2049eafa..83e6881fd21e 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -204,8 +204,6 @@ int f2fs_write_inline_data(struct inode *inode, struct page 
*page)
 {
void *src_addr, *dst_addr;
struct dnode_of_data dn;
-   struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
int err;
 
set_new_dnode(&dn, inode, NULL, NULL, 0);
@@ -227,10 +225,7 @@ int f2fs_write_inline_data(struct inode *inode, struct 
page *page)
kunmap_atomic(src_addr);
set_page_dirty(dn.inode_page);
 
-   xa_lock_irqsave(&mapping->i_pages, flags);
-   radix_tree_tag_clear(&mapping->i_pages, page_index(page),
-PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(&mapping->i_pages, flags);
+   clear_radix_tree_dirty_tag(page);
 
set_inode_flag(inode, FI_APPEND_WRITE);
set_inode_flag(inode, FI_DATA_EXIST);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 67b3e89975af..59041acbf7ac 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -102,18 +102,10 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int 
type)
 
 static void clear_node_page_dirty(struct page *page)
 {
-   struct address_space *mapping = page->mapping;
-   unsigned int long flags;
-
if (PageDirty(page)) {
-   xa_lock_irqsave(&mapping->i_pages, flags);
-   radix_tree_tag_clear(&mapping->i_pages,
-   page_index(page),
-   PAGECACHE_TAG_DIRTY);
-   xa_unlock_irqrestore(&mapping->i_pages, flags);
-
+   clear_radix_tree_dirty_tag(page);
clear_page_dirty_for_io(page);
-   dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES);
+   dec_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES);
}
ClearPageUptodate(page);
 }
-- 
2.17.0.391.g1f1cddd558b5



[PATCH 1/2] f2fs: fix to don't trigger writeback during recovery

2018-05-26 Thread Chao Yu
- f2fs_fill_super
 - recover_fsync_data
  - recover_data
   - del_fsync_inode
    - iput
     - iput_final
      - write_inode_now
       - f2fs_write_inode
        - f2fs_balance_fs
         - f2fs_balance_fs_bg
          - sync_dirty_inodes

With the data_flush mount option, in order to avoid entering the writeback
flow above during recovery, let's detect the recovery status and return
early from f2fs_balance_fs_bg.

Signed-off-by: Chao Yu <yuch...@huawei.com>
Signed-off-by: Yunlei He <heyun...@huawei.com>
---
 fs/f2fs/segment.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 7dc59ae05e94..76947f2856bf 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -486,6 +486,9 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
 
 void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi)
 {
+   if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
+   return;
+
/* try to shrink extent cache when there is no enough memory */
if (!available_free_memory(sbi, EXTENT_CACHE))
f2fs_shrink_extent_tree(sbi, EXTENT_CACHE_SHRINK_NUMBER);
-- 
2.17.0.391.g1f1cddd558b5



Re: [PATCH v2] f2fs: let fstrim issue discard commands in lower priority

2018-05-26 Thread Chao Yu
On 2018/5/26 3:49, Jaegeuk Kim wrote:
> The fstrim gathers huge number of large discard commands, and tries to issue
> without IO awareness, which results in long user-perceive IO latencies on
> READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
> seconds due to long discard latency.
> 
> This patch limits the maximum size to 2MB per candidate, and check IO 
> congestion
> when issuing them to disk.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
> 
> Change log from v1:
>  - wait all discard bios in put_super & congested case in trimfs
> 
>  fs/f2fs/f2fs.h|   4 +-
>  fs/f2fs/segment.c | 139 +-
>  2 files changed, 78 insertions(+), 65 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 3cc56b4df03f..6e0677aff8ca 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -178,6 +178,7 @@ enum {
>  
>  #define MAX_DISCARD_BLOCKS(sbi)  BLKS_PER_SEC(sbi)
>  #define DEF_MAX_DISCARD_REQUEST  8   /* issue 8 discards per 
> round */
> +#define DEF_MAX_DISCARD_LEN  512 /* Max. 2MB per discard */
>  #define DEF_MIN_DISCARD_ISSUE_TIME   50  /* 50 ms, if exists */
>  #define DEF_MID_DISCARD_ISSUE_TIME   500 /* 500 ms, if device busy */
>  #define DEF_MAX_DISCARD_ISSUE_TIME   6   /* 60 s, if no candidates */
> @@ -698,7 +699,8 @@ static inline void set_extent_info(struct extent_info 
> *ei, unsigned int fofs,
>  static inline bool __is_discard_mergeable(struct discard_info *back,
>   struct discard_info *front)
>  {
> - return back->lstart + back->len == front->lstart;
> + return (back->lstart + back->len == front->lstart) &&
> + (back->len + front->len < DEF_MAX_DISCARD_LEN);
>  }
>  
>  static inline bool __is_discard_back_mergeable(struct discard_info *cur,
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index c67d92bf2968..0150719e580d 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1139,68 +1139,6 @@ static int __queue_discard_cmd(struct f2fs_sb_info 
> *sbi,
>   return 0;
>  }
>  
> -static void __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
> - struct discard_policy *dpolicy,
> - unsigned int start, unsigned int end)
> -{
> - struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
> - struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
> - struct rb_node **insert_p = NULL, *insert_parent = NULL;
> - struct discard_cmd *dc;
> - struct blk_plug plug;
> - int issued;
> -
> -next:
> - issued = 0;
> -
> - mutex_lock(&dcc->cmd_lock);
> - f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, &dcc->root));
> -
> - dc = (struct discard_cmd *)__lookup_rb_tree_ret(&dcc->root,
> - NULL, start,
> - (struct rb_entry **)&prev_dc,
> - (struct rb_entry **)&next_dc,
> - &insert_p, &insert_parent, true);
> - if (!dc)
> - dc = next_dc;
> -
> - blk_start_plug(&plug);
> -
> - while (dc && dc->lstart <= end) {
> - struct rb_node *node;
> -
> - if (dc->len < dpolicy->granularity)
> - goto skip;
> -
> - if (dc->state != D_PREP) {
> - list_move_tail(&dc->list, &dcc->fstrim_list);
> - goto skip;
> - }
> -
> - __submit_discard_cmd(sbi, dpolicy, dc);
> -
> - if (++issued >= dpolicy->max_requests) {
> - start = dc->lstart + dc->len;
> -
> - blk_finish_plug(&plug);
> - mutex_unlock(&dcc->cmd_lock);
> -
> - schedule();
> -
> - goto next;
> - }
> -skip:
> - node = rb_next(&dc->rb_node);
> - dc = rb_entry_safe(node, struct discard_cmd, rb_node);
> -
> - if (fatal_signal_pending(current))
> - break;
> - }
> -
> - blk_finish_plug(&plug);
> - mutex_unlock(&dcc->cmd_lock);
> -}
> -
>  static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
>   struct discard_policy *dpolicy)
>  {
> @@ -1341,7 +1279,18 @@ static unsigned int __wait_discard_cmd_range(struct 
> f2fs_sb_info *sbi,
>  static void __wait_all_discard_cmd(struct f2fs_sb_info *sbi,
>   struct discard_policy *dpolicy)
>  {
> - __wait_discard_cmd_range(sbi, dpolicy, 0, UINT_MAX);
> + struct discard_policy dp;
> +
> + if (dpolicy) {
> + __wait_discard_cmd_range(sbi, dpolicy, 0, UINT_MAX);
> + return;
> + }
> +
> + /* wait all */
> + init_discard_policy(&dp, DPOLICY_FSTRIM, 1);
> + __wait_discard_cmd_range(sbi, &dp, 0, UINT_MAX);
> + init_discard_policy(&dp, DPOLICY_UMOUNT, 1);
> + __wait_discard_cmd_range(sbi, &dp, 0, 

Re: [RFC PATCH] f2fs: add fsync_mode=nobarrier for non-atomic files

2018-05-25 Thread Chao Yu
On 2018/5/26 9:04, Jaegeuk Kim wrote:
> For non-atomic files, this patch adds an option to give nobarrier which
> doesn't issue flush commands to the device.
> 
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>

Reviewed-by: Chao Yu <yuch...@huawei.com>

Thanks,



[PATCH v2] f2fs: keep migration IO order in LFS mode

2018-05-25 Thread Chao Yu
For non-migration IO, we keep the submission order of data/node blocks the
same as the allocation sequence by sorting IOs in the per-log io_list, but
migration IO could be submitted out of order.

In LFS mode, all IOs, including migration IO, should stay ordered, so this
patch adds an additional lock to keep the submission order.

Signed-off-by: Chao Yu <yuch...@huawei.com>
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
v2:
- introduce variable lfs_mode to record the historical option, which avoids
the option being changed in between.
 fs/f2fs/f2fs.h| 2 ++
 fs/f2fs/gc.c  | 6 ++
 fs/f2fs/segment.c | 5 +
 fs/f2fs/super.c   | 1 +
 4 files changed, 14 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index dc0a462461e8..3cc56b4df03f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1124,6 +1124,8 @@ struct f2fs_sb_info {
struct f2fs_bio_info *write_io[NR_PAGE_TYPE];   /* for write bios */
struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE];
/* bio ordering for NODE/DATA */
+   /* keep migration IO order for LFS mode */
+   struct rw_semaphore io_order_lock;
mempool_t *write_io_dummy;  /* Dummy pages */
 
/* for checkpoint */
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 5ef3233c38d2..50bb8fc25275 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -610,6 +610,7 @@ static void move_data_block(struct inode *inode, block_t 
bidx,
struct page *page;
block_t newaddr;
int err;
+   bool lfs_mode = test_opt(fio.sbi, LFS);
 
/* do not read out */
page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
@@ -653,6 +654,9 @@ static void move_data_block(struct inode *inode, block_t 
bidx,
fio.page = page;
fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr;
 
+   if (lfs_mode)
+   down_write(&fio.sbi->io_order_lock);
+
allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr,
&sum, CURSEG_COLD_DATA, NULL, false);
 
@@ -709,6 +713,8 @@ static void move_data_block(struct inode *inode, block_t 
bidx,
 put_page_out:
f2fs_put_page(fio.encrypted_page, 1);
 recover_block:
+   if (lfs_mode)
+   up_write(&fio.sbi->io_order_lock);
if (err)
__f2fs_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr,
true, true);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a05208954dd5..c67d92bf2968 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2735,7 +2735,10 @@ static void do_write_page(struct f2fs_summary *sum, 
struct f2fs_io_info *fio)
 {
int type = __get_segment_type(fio);
int err;
+   bool keep_order = (test_opt(fio->sbi, LFS) && type == CURSEG_COLD_DATA);
 
+   if (keep_order)
+   down_read(&fio->sbi->io_order_lock);
 reallocate:
allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
&fio->new_blkaddr, sum, type, fio, true);
@@ -2748,6 +2751,8 @@ static void do_write_page(struct f2fs_summary *sum, 
struct f2fs_io_info *fio)
} else if (!err) {
update_device_state(fio);
}
+   if (keep_order)
+   up_read(&fio->sbi->io_order_lock);
 }
 
 void write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 8e5f0a178f5d..1b42fc7e4b29 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2365,6 +2365,7 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
for (i = 0; i < NR_PAGE_TYPE - 1; i++)
for (j = HOT; j < NR_TEMP_TYPE; j++)
mutex_init(&sbi->wio_mutex[i][j]);
+   init_rwsem(&sbi->io_order_lock);
spin_lock_init(&sbi->cp_lock);
 
sbi->dirty_device = 0;
-- 
2.17.0.391.g1f1cddd558b5



Re: [PATCH v2] f2fs: let fstrim issue discard commands in lower priority

2018-05-25 Thread Chao Yu
On 2018/5/26 3:49, Jaegeuk Kim wrote:
> The fstrim gathers huge number of large discard commands, and tries to issue
> without IO awareness, which results in long user-perceive IO latencies on
> READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
> seconds due to long discard latency.
> 
> This patch limits the maximum size to 2MB per candidate, and check IO 
> congestion
> when issuing them to disk.
> 
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>

It looks good to me. :)

Reviewed-by: Chao Yu <yuch...@huawei.com>

Thanks,
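
The 2MB limit mentioned in the quoted description follows from the merge check in the patch. Below is a userspace sketch under stated assumptions: the 4 KiB block size, struct extent, and the function names are hypothetical, while the 512-block cap and the two conditions mirror DEF_MAX_DISCARD_LEN and __is_discard_mergeable as quoted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE      4096u  /* assumed block size in bytes */
#define SKETCH_MAX_DISCARD_LEN 512u   /* blocks, mirrors DEF_MAX_DISCARD_LEN */

/* Hypothetical stand-in for struct discard_info. */
struct extent {
    uint32_t lstart;  /* first block of the extent */
    uint32_t len;     /* length in blocks */
};

/* Merge only contiguous extents whose combined length stays under the cap. */
static bool is_mergeable(const struct extent *back, const struct extent *front)
{
    return back->lstart + back->len == front->lstart &&
           back->len + front->len < SKETCH_MAX_DISCARD_LEN;
}

/* Upper bound of a single discard candidate, in bytes. */
static uint64_t max_discard_bytes(void)
{
    return (uint64_t)SKETCH_MAX_DISCARD_LEN * SKETCH_BLOCK_SIZE;
}
```

Note the strict comparison: two contiguous 256-block extents add up to exactly 512 blocks and are therefore left unmerged.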



Re: [PATCH v3] f2fs: Fix deadlock in shutdown ioctl

2018-05-25 Thread Chao Yu
On 2018/5/18 14:21, Sahitya Tummala wrote:
> f2fs_ioc_shutdown() ioctl gets stuck in the below path
> when issued with F2FS_GOING_DOWN_FULLSYNC option.
> 
> __switch_to+0x90/0xc4
> percpu_down_write+0x8c/0xc0
> freeze_super+0xec/0x1e4
> freeze_bdev+0xc4/0xcc
> f2fs_ioctl+0xc0c/0x1ce0
> f2fs_compat_ioctl+0x98/0x1f0
> 
> Signed-off-by: Sahitya Tummala <stumm...@codeaurora.org>

Reviewed-by: Chao Yu <yuch...@huawei.com>

Thanks,




Re: [PATCH] f2fs: let fstrim issue discard commands in lower priority

2018-05-25 Thread Chao Yu
On 2018/5/25 14:18, Jaegeuk Kim wrote:
> On 05/25, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2018/5/25 13:10, Jaegeuk Kim wrote:
>>> The fstrim gathers huge number of large discard commands, and tries to issue
>>> without IO awareness, which results in long user-perceive IO latencies on
>>> READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
>>> seconds due to long discard latency.
>>>
>>> This patch limits the maximum size to 2MB per candidate, and check IO 
>>> congestion
>>> when issuing them to disk.
>>>
>>> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
>>> ---
>>>  fs/f2fs/f2fs.h|   4 +-
>>>  fs/f2fs/segment.c | 123 +++---
>>>  2 files changed, 64 insertions(+), 63 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 3bddf13794d9..75ae7fc86ae8 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -178,6 +178,7 @@ enum {
>>>  
>>>  #define MAX_DISCARD_BLOCKS(sbi)BLKS_PER_SEC(sbi)
>>>  #define DEF_MAX_DISCARD_REQUEST8   /* issue 8 discards per 
>>> round */
>>> +#define DEF_MAX_DISCARD_LEN512 /* Max. 2MB per discard 
>>> */
>>>  #define DEF_MIN_DISCARD_ISSUE_TIME 50  /* 50 ms, if exists */
>>>  #define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device busy */
>>>  #define DEF_MAX_DISCARD_ISSUE_TIME 6   /* 60 s, if no candidates */
>>> @@ -698,7 +699,8 @@ static inline void set_extent_info(struct extent_info 
>>> *ei, unsigned int fofs,
>>>  static inline bool __is_discard_mergeable(struct discard_info *back,
>>> struct discard_info *front)
>>>  {
>>> -   return back->lstart + back->len == front->lstart;
>>> +   return (back->lstart + back->len == front->lstart) &&
>>> +   (back->len + front->len < DEF_MAX_DISCARD_LEN);
>>>  }
>>>  
>>>  static inline bool __is_discard_back_mergeable(struct discard_info *cur,
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 843fc2e6d41c..ba996d4091bc 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -1139,68 +1139,6 @@ static int __queue_discard_cmd(struct f2fs_sb_info 
>>> *sbi,
>>> return 0;
>>>  }
>>>  
>>> -static void __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
>>> -   struct discard_policy *dpolicy,
>>> -   unsigned int start, unsigned int end)
>>> -{
>>> -   struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>>> -   struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
>>> -   struct rb_node **insert_p = NULL, *insert_parent = NULL;
>>> -   struct discard_cmd *dc;
>>> -   struct blk_plug plug;
>>> -   int issued;
>>> -
>>> -next:
>>> -   issued = 0;
>>> -
>>> -   mutex_lock(&dcc->cmd_lock);
>>> -   f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, &dcc->root));
>>> -
>>> -   dc = (struct discard_cmd *)__lookup_rb_tree_ret(&dcc->root,
>>> -   NULL, start,
>>> -   (struct rb_entry **)&prev_dc,
>>> -   (struct rb_entry **)&next_dc,
>>> -   &insert_p, &insert_parent, true);
>>> -   if (!dc)
>>> -   dc = next_dc;
>>> -
>>> -   blk_start_plug(&plug);
>>> -
>>> -   while (dc && dc->lstart <= end) {
>>> -   struct rb_node *node;
>>> -
>>> -   if (dc->len < dpolicy->granularity)
>>> -   goto skip;
>>> -
>>> -   if (dc->state != D_PREP) {
>>> -   list_move_tail(&dc->list, &dcc->fstrim_list);
>>> -   goto skip;
>>> -   }
>>> -
>>> -   __submit_discard_cmd(sbi, dpolicy, dc);
>>> -
>>> -   if (++issued >= dpolicy->max_requests) {
>>> -   start = dc->lstart + dc->len;
>>> -
>>> -   blk_finish_plug(&plug);
>>> -   mutex_unlock(&dcc->cmd_lock);
>>> -
>>> -   schedule();
>>> -
>>> -   goto next;
>>> -

Re: [PATCH] f2fs: let fstrim issue discard commands in lower priority

2018-05-25 Thread Chao Yu
Hi Jaegeuk,

On 2018/5/25 13:10, Jaegeuk Kim wrote:
> The fstrim gathers huge number of large discard commands, and tries to issue
> without IO awareness, which results in long user-perceive IO latencies on
> READ, WRITE, and FLUSH in UFS. We've observed some of commands take several
> seconds due to long discard latency.
> 
> This patch limits the maximum size to 2MB per candidate, and check IO 
> congestion
> when issuing them to disk.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/f2fs.h|   4 +-
>  fs/f2fs/segment.c | 123 +++---
>  2 files changed, 64 insertions(+), 63 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 3bddf13794d9..75ae7fc86ae8 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -178,6 +178,7 @@ enum {
>  
>  #define MAX_DISCARD_BLOCKS(sbi)  BLKS_PER_SEC(sbi)
>  #define DEF_MAX_DISCARD_REQUEST  8   /* issue 8 discards per 
> round */
> +#define DEF_MAX_DISCARD_LEN  512 /* Max. 2MB per discard */
>  #define DEF_MIN_DISCARD_ISSUE_TIME   50  /* 50 ms, if exists */
>  #define DEF_MID_DISCARD_ISSUE_TIME   500 /* 500 ms, if device busy */
>  #define DEF_MAX_DISCARD_ISSUE_TIME   60000   /* 60 s, if no candidates */
> @@ -698,7 +699,8 @@ static inline void set_extent_info(struct extent_info 
> *ei, unsigned int fofs,
>  static inline bool __is_discard_mergeable(struct discard_info *back,
>   struct discard_info *front)
>  {
> - return back->lstart + back->len == front->lstart;
> + return (back->lstart + back->len == front->lstart) &&
> + (back->len + front->len < DEF_MAX_DISCARD_LEN);
>  }
>  
>  static inline bool __is_discard_back_mergeable(struct discard_info *cur,
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 843fc2e6d41c..ba996d4091bc 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1139,68 +1139,6 @@ static int __queue_discard_cmd(struct f2fs_sb_info 
> *sbi,
>   return 0;
>  }
>  
> -static void __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
> - struct discard_policy *dpolicy,
> - unsigned int start, unsigned int end)
> -{
> - struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
> - struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
> - struct rb_node **insert_p = NULL, *insert_parent = NULL;
> - struct discard_cmd *dc;
> - struct blk_plug plug;
> - int issued;
> -
> -next:
> - issued = 0;
> -
> - mutex_lock(>cmd_lock);
> - f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, >root));
> -
> - dc = (struct discard_cmd *)__lookup_rb_tree_ret(>root,
> - NULL, start,
> - (struct rb_entry **)_dc,
> - (struct rb_entry **)_dc,
> - _p, _parent, true);
> - if (!dc)
> - dc = next_dc;
> -
> - blk_start_plug();
> -
> - while (dc && dc->lstart <= end) {
> - struct rb_node *node;
> -
> - if (dc->len < dpolicy->granularity)
> - goto skip;
> -
> - if (dc->state != D_PREP) {
> - list_move_tail(>list, >fstrim_list);
> - goto skip;
> - }
> -
> - __submit_discard_cmd(sbi, dpolicy, dc);
> -
> - if (++issued >= dpolicy->max_requests) {
> - start = dc->lstart + dc->len;
> -
> - blk_finish_plug();
> - mutex_unlock(>cmd_lock);
> -
> - schedule();
> -
> - goto next;
> - }
> -skip:
> - node = rb_next(>rb_node);
> - dc = rb_entry_safe(node, struct discard_cmd, rb_node);
> -
> - if (fatal_signal_pending(current))
> - break;
> - }
> -
> - blk_finish_plug();
> - mutex_unlock(>cmd_lock);
> -}
> -
>  static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
>   struct discard_policy *dpolicy)
>  {
> @@ -2397,6 +2335,67 @@ bool exist_trim_candidates(struct f2fs_sb_info *sbi, 
> struct cp_control *cpc)
>   return has_candidate;
>  }
>  
> +static void __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
> + struct discard_policy *dpolicy,
> + unsigned int start, unsigned int end)
> +{
> + struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
> + struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
> + struct rb_node **insert_p = NULL, *insert_parent = NULL;
> + struct discard_cmd *dc;
> + struct blk_plug plug;
> + int issued;

unsigned int cur = start;

> +
> +next:
> + issued = 0;
> +
> + mutex_lock(>cmd_lock);
> + f2fs_bug_on(sbi, 

[PATCH] f2fs: keep migration IO order in LFS mode

2018-05-24 Thread Chao Yu
For non-migration IO, we keep the submission order of data/node blocks
the same as the allocation sequence by sorting IOs in the per-log io_list,
but migration IO can be submitted out of order.

In LFS mode, all IOs, including migration IO, should stay ordered, so
this patch adds an additional lock to keep the submission order.

Signed-off-by: Chao Yu <yuch...@huawei.com>
Signed-off-by: Yunlong Song <yunlong.s...@huawei.com>
---
 fs/f2fs/f2fs.h| 2 ++
 fs/f2fs/gc.c  | 5 +
 fs/f2fs/segment.c | 5 +
 fs/f2fs/super.c   | 1 +
 4 files changed, 13 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e06b622ba661..233e00068472 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1127,6 +1127,8 @@ struct f2fs_sb_info {
struct f2fs_bio_info *write_io[NR_PAGE_TYPE];   /* for write bios */
struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE];
/* bio ordering for NODE/DATA */
+   /* keep migration IO order for LFS mode */
+   struct rw_semaphore io_order_lock;
mempool_t *write_io_dummy;  /* Dummy pages */
 
/* for checkpoint */
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 5ef3233c38d2..fca1d3745535 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -653,6 +653,9 @@ static void move_data_block(struct inode *inode, block_t 
bidx,
fio.page = page;
fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr;
 
+   if (test_opt(fio.sbi, LFS))
+   down_write(>io_order_lock);
+
allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, ,
, CURSEG_COLD_DATA, NULL, false);
 
@@ -709,6 +712,8 @@ static void move_data_block(struct inode *inode, block_t 
bidx,
 put_page_out:
f2fs_put_page(fio.encrypted_page, 1);
 recover_block:
+   if (test_opt(fio.sbi, LFS))
+   up_write(>io_order_lock);
if (err)
__f2fs_replace_block(fio.sbi, , newaddr, fio.old_blkaddr,
true, true);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 697e7f8464b4..6b688e2d0b5d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2749,7 +2749,10 @@ static void do_write_page(struct f2fs_summary *sum, 
struct f2fs_io_info *fio)
 {
int type = __get_segment_type(fio);
int err;
+   bool keep_order = (test_opt(fio->sbi, LFS) && type == CURSEG_COLD_DATA);
 
+   if (keep_order)
+   down_read(>sbi->io_order_lock);
 reallocate:
allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
>new_blkaddr, sum, type, fio, true);
@@ -2762,6 +2765,8 @@ static void do_write_page(struct f2fs_summary *sum, 
struct f2fs_io_info *fio)
} else if (!err) {
update_device_state(fio);
}
+   if (keep_order)
+   up_read(>sbi->io_order_lock);
 }
 
 void write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 8e5f0a178f5d..1b42fc7e4b29 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2365,6 +2365,7 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
for (i = 0; i < NR_PAGE_TYPE - 1; i++)
for (j = HOT; j < NR_TEMP_TYPE; j++)
mutex_init(>wio_mutex[i][j]);
+   init_rwsem(>io_order_lock);
spin_lock_init(>cp_lock);
 
sbi->dirty_device = 0;
-- 
2.17.0.391.g1f1cddd558b5



[PATCH 2/2] f2fs: detect synchronous writeback more earlier

2018-05-23 Thread Chao Yu
From: Chao Yu <yuch...@huawei.com>

This patch detects synchronous writeback earlier than before, in order
to avoid unnecessary page writeback before exiting asynchronous
writeback.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/data.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index feb4c224b305..91cbf20b448a 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1919,6 +1919,13 @@ static int f2fs_write_cache_pages(struct address_space 
*mapping,
struct page *page = pvec.pages[i];
bool submitted = false;
 
+   /* give a priority to WB_SYNC threads */
+   if (atomic_read(_M_SB(mapping)->wb_sync_req) &&
+   wbc->sync_mode == WB_SYNC_NONE) {
+   done = 1;
+   break;
+   }
+
done_index = page->index;
 retry_write:
lock_page(page);
@@ -1973,9 +1980,7 @@ static int f2fs_write_cache_pages(struct address_space 
*mapping,
last_idx = page->index;
}
 
-   /* give a priority to WB_SYNC threads */
-   if ((atomic_read(_M_SB(mapping)->wb_sync_req) ||
-   --wbc->nr_to_write <= 0) &&
+   if (--wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
break;
-- 
2.16.2.17.g38e79b1fd
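
The effect of moving the check can be seen in a toy model of the page loop.
This is a sketch under stated assumptions, not the real
f2fs_write_cache_pages(): pages_written(), sync_req_at and check_early are
hypothetical names, and "writing" a page is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

/*
 * Toy model of the page loop: returns how many pages get written.
 * sync_req_at is the hypothetical loop index at which a WB_SYNC
 * writer starts waiting (use nr_pages or more for "never").
 * check_early selects the new placement of the wb_sync_req test.
 */
int pages_written(int nr_pages, long nr_to_write, enum sync_mode mode,
		  int sync_req_at, bool check_early)
{
	int written = 0;
	int i;

	assert(nr_pages >= 0);
	for (i = 0; i < nr_pages; i++) {
		bool sync_pending = (i >= sync_req_at);

		/* new behaviour: give way before touching the page */
		if (check_early && sync_pending && mode == WB_SYNC_NONE)
			break;

		written++;	/* stands in for locking and writing the page */

		if (check_early) {
			if (--nr_to_write <= 0 && mode == WB_SYNC_NONE)
				break;
		} else {
			/* old behaviour: the request is only seen after the write */
			if ((sync_pending || --nr_to_write <= 0) &&
			    mode == WB_SYNC_NONE)
				break;
		}
	}
	return written;
}
```

In this model, an asynchronous writer that notices a sync request at index 3
writes four pages with the old placement and three with the new one.
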



[PATCH 1/2] f2fs: clean up with is_valid_blkaddr()

2018-05-23 Thread Chao Yu
From: Chao Yu <yuch...@huawei.com>

- rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
- introduce is_valid_blkaddr() for cleanup.

No logic change in this patch.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/checkpoint.c |  4 ++--
 fs/f2fs/data.c   | 18 +-
 fs/f2fs/f2fs.h   |  9 -
 fs/f2fs/file.c   |  2 +-
 fs/f2fs/inode.c  |  2 +-
 fs/f2fs/node.c   |  5 ++---
 fs/f2fs/recovery.c   |  6 +++---
 fs/f2fs/segment.c|  4 ++--
 fs/f2fs/segment.h|  2 +-
 9 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 33d2da006789..e38221d96564 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -119,7 +119,7 @@ struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t 
index)
return __get_meta_page(sbi, index, false);
 }
 
-bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type)
+bool is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type)
 {
switch (type) {
case META_NAT:
@@ -175,7 +175,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, 
int nrpages,
blk_start_plug();
for (; nrpages-- > 0; blkno++) {
 
-   if (!is_valid_blkaddr(sbi, blkno, type))
+   if (!is_valid_meta_blkaddr(sbi, blkno, type))
goto out;
 
switch (type) {
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c368e651d3fd..feb4c224b305 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -428,7 +428,7 @@ int f2fs_submit_page_write(struct f2fs_io_info *fio)
spin_unlock(>io_lock);
}
 
-   if (fio->old_blkaddr != NEW_ADDR)
+   if (is_valid_blkaddr(fio->old_blkaddr))
verify_block_addr(fio, fio->old_blkaddr);
verify_block_addr(fio, fio->new_blkaddr);
 
@@ -984,7 +984,7 @@ int f2fs_map_blocks(struct inode *inode, struct 
f2fs_map_blocks *map,
 next_block:
blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node);
 
-   if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) {
+   if (!is_valid_blkaddr(blkaddr)) {
if (create) {
if (unlikely(f2fs_cp_error(sbi))) {
err = -EIO;
@@ -1619,15 +1619,6 @@ static inline bool need_inplace_update(struct 
f2fs_io_info *fio)
return should_update_inplace(inode, fio);
 }
 
-static inline bool valid_ipu_blkaddr(struct f2fs_io_info *fio)
-{
-   if (fio->old_blkaddr == NEW_ADDR)
-   return false;
-   if (fio->old_blkaddr == NULL_ADDR)
-   return false;
-   return true;
-}
-
 int do_write_data_page(struct f2fs_io_info *fio)
 {
struct page *page = fio->page;
@@ -1642,7 +1633,7 @@ int do_write_data_page(struct f2fs_io_info *fio)
f2fs_lookup_extent_cache(inode, page->index, )) {
fio->old_blkaddr = ei.blk + page->index - ei.fofs;
 
-   if (valid_ipu_blkaddr(fio)) {
+   if (is_valid_blkaddr(fio->old_blkaddr)) {
ipu_force = true;
fio->need_lock = LOCK_DONE;
goto got_it;
@@ -1669,7 +1660,8 @@ int do_write_data_page(struct f2fs_io_info *fio)
 * If current allocation needs SSR,
 * it had better in-place writes for updated data.
 */
-   if (ipu_force || (valid_ipu_blkaddr(fio) && need_inplace_update(fio))) {
+   if (ipu_force || (is_valid_blkaddr(fio->old_blkaddr) &&
+   need_inplace_update(fio))) {
err = encrypt_one_page(fio);
if (err)
goto out_writepage;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 76d8de0f21a7..49cd6c30843c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2658,6 +2658,13 @@ static inline void f2fs_update_iostat(struct 
f2fs_sb_info *sbi,
spin_unlock(>iostat_lock);
 }
 
+static inline bool is_valid_blkaddr(block_t blkaddr)
+{
+   if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR)
+   return false;
+   return true;
+}
+
 /*
  * file.c
  */
@@ -2878,7 +2885,7 @@ void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool 
end_io);
 struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
 struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
 struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index);
-bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type);
+bool is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int 
type);
 int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
int type, bool sync);
 void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index da13ea3e190a..b569c7c5db6d 1006

Re: [PATCH v4] f2fs: fix to avoid race during access gc_thread pointer

2018-05-17 Thread Chao Yu
On 2018/5/8 5:36, Jaegeuk Kim wrote:
> On 05/07, Chao Yu wrote:
>> Thread A                Thread B                Thread C
>> - f2fs_remount
>>  - stop_gc_thread
>>                         - f2fs_sbi_store
>>                                                 - issue_discard_thread
>>    sbi->gc_thread = NULL;
>>                           sbi->gc_thread->gc_wake = 1
>>                                                   access sbi->gc_thread->gc_urgent
>>
>> Previously, we allocated memory for sbi->gc_thread based on the background
>> gc thread mount option, and the memory can be released if we turn off
>> that mount option. However, several places still access the gc_thread
>> pointer without considering the race condition, resulting in a NULL
>> pointer dereference.
>>
>> In order to fix this issue, introduce gc_rwsem to exclude those operations.
>>
>> Signed-off-by: Chao Yu <yuch...@huawei.com>
>> ---
>> v4:
>> - use introduced sbi.gc_rwsem lock instead of sb.s_umount.
> 
> We can use this first.
> 
>>From e62e8d3ece6ee8a4aeac8ffd6161d25851f8b3f0 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaeg...@kernel.org>
> Date: Mon, 7 May 2018 14:22:40 -0700
> Subject: [PATCH] f2fs: introduce sbi->gc_mode to determine the policy
> 
> This is to avoid sbi->gc_thread pointer access.
> 
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> ---
>  fs/f2fs/f2fs.h|  8 
>  fs/f2fs/gc.c  | 28 
>  fs/f2fs/gc.h  |  2 --
>  fs/f2fs/segment.c |  4 ++--
>  fs/f2fs/sysfs.c   | 33 +
>  5 files changed, 47 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 80490a7991a7..779d8b26878c 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1065,6 +1065,13 @@ enum {
>   MAX_TIME,
>  };
>  
> +enum {
> + GC_NORMAL,
> + GC_IDLE_CB,
> + GC_IDLE_GREEDY,
> + GC_URGENT,
> +};
> +
>  enum {
>   WHINT_MODE_OFF, /* not pass down write hints */
>   WHINT_MODE_USER,/* try to pass down hints given by users */
> @@ -1193,6 +1200,7 @@ struct f2fs_sb_info {
>   struct mutex gc_mutex;  /* mutex for GC */
>   struct f2fs_gc_kthread  *gc_thread; /* GC thread */
>   unsigned int cur_victim_sec;/* current victim section num */
> + unsigned int gc_mode;   /* current GC state */
>  
>   /* threshold for gc trials on pinned files */
>   u64 gc_pin_file_threshold;
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 9bb2ddbbed1e..7ec8ea75dfde 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -76,7 +76,7 @@ static int gc_thread_func(void *data)
>* invalidated soon after by user update or deletion.
>* So, I'd like to wait some time to collect dirty segments.
>*/
> - if (gc_th->gc_urgent) {
> + if (sbi->gc_mode == GC_URGENT) {
>   wait_ms = gc_th->urgent_sleep_time;
>   mutex_lock(>gc_mutex);
>   goto do_gc;
> @@ -131,8 +131,6 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
>   gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
>   gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
>  
> - gc_th->gc_idle = 0;
> - gc_th->gc_urgent = 0;
>   gc_th->gc_wake= 0;
>  
>   sbi->gc_thread = gc_th;
> @@ -158,21 +156,19 @@ void stop_gc_thread(struct f2fs_sb_info *sbi)
>   sbi->gc_thread = NULL;
>  }
>  
> -static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type)
> +static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type)
>  {
>   int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
>  
> - if (!gc_th)
> - return gc_mode;
> -
> - if (gc_th->gc_idle) {
> - if (gc_th->gc_idle == 1)
> - gc_mode = GC_CB;
> - else if (gc_th->gc_idle == 2)
> - gc_mode = GC_GREEDY;
> - }
> - if (gc_th->gc_urgent)
> + switch (sbi->gc_mode) {
> + case GC_IDLE_CB:
> + gc_mode = GC_CB;
> + break;
> + case GC_IDLE_GREEDY:
> + case GC_URGENT:
>   gc_mode = GC_GREEDY;
> + break;
> + }
>   return gc_mode;
>  }
>  
> @@ -187,7 +183,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int 
> gc_type,
>   p->max_search = dirty_i->nr_dirty[type];
> 

Re: [PATCH 1/3] f2fs: fix to wait page writeback during revoking atomic write

2018-05-17 Thread Chao Yu
Hi Jaegeuk,

Could you recheck this patch?

On 2018/4/23 10:36, Chao Yu wrote:
> After revoking an atomic write, the related LBA can be reused by others, so we
> need to wait for page writeback before reusing the LBA, in order to avoid
> interference between old in-flight atomic-write IO and new IO.
> 
> Signed-off-by: Chao Yu <yuch...@huawei.com>
> ---
>  fs/f2fs/segment.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index cba59fc58eb6..24299f81f80d 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -231,6 +231,8 @@ static int __revoke_inmem_pages(struct inode *inode,
>  
>   lock_page(page);
>  
> + f2fs_wait_on_page_writeback(page, DATA, true);
> +
>   if (recover) {
>   struct dnode_of_data dn;
>   struct node_info ni;
> 



Re: [PATCH v2] f2fs: Fix deadlock in shutdown ioctl

2018-05-17 Thread Chao Yu
On 2018/5/17 16:03, Sahitya Tummala wrote:
> f2fs_ioc_shutdown() ioctl gets stuck in the below path
> when issued with F2FS_GOING_DOWN_FULLSYNC option.
> 
> __switch_to+0x90/0xc4
> percpu_down_write+0x8c/0xc0
> freeze_super+0xec/0x1e4
> freeze_bdev+0xc4/0xcc
> f2fs_ioctl+0xc0c/0x1ce0
> f2fs_compat_ioctl+0x98/0x1f0
> 
> Signed-off-by: Sahitya Tummala 
> ---
> v2:
> remove lock coverage for only F2FS_GOING_DOWN_FULLSYNC case as suggested by 
> Chao.
> 
>  fs/f2fs/file.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 6b94f19..5a132c9 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1857,6 +1857,7 @@ static int f2fs_ioc_shutdown(struct file *filp, 
> unsigned long arg)

How about:

if (in != F2FS_GOING_DOWN_FULLSYNC)
mnt_want_write_file();

switch()
{
handle command;
}

if (in != F2FS_GOING_DOWN_FULLSYNC)
mnt_drop_write_file();

Thanks,

>  
>   switch (in) {
>   case F2FS_GOING_DOWN_FULLSYNC:
> + mnt_drop_write_file(filp);
>   sb = freeze_bdev(sb->s_bdev);
>   if (IS_ERR(sb)) {
>   ret = PTR_ERR(sb);
> @@ -1894,7 +1895,8 @@ static int f2fs_ioc_shutdown(struct file *filp, 
> unsigned long arg)
>  
>   f2fs_update_time(sbi, REQ_TIME);
>  out:
> - mnt_drop_write_file(filp);
> + if (in != F2FS_GOING_DOWN_FULLSYNC)
> + mnt_drop_write_file(filp);
>   return ret;
>  }
>  
> 



Re: [PATCH] f2fs: Fix deadlock in shutdown ioctl

2018-05-16 Thread Chao Yu
On 2018/5/15 15:57, Sahitya Tummala wrote:
> On Mon, May 14, 2018 at 11:39:42AM +0800, Chao Yu wrote:
>> On 2018/5/10 21:20, Sahitya Tummala wrote:
>>> f2fs_ioc_shutdown() ioctl gets stuck in the below path
>>> when going down with full sync (F2FS_GOING_DOWN_FULLSYNC)
>>> option.
>>>
>>> __switch_to+0x90/0xc4
>>> percpu_down_write+0x8c/0xc0
>>> freeze_super+0xec/0x1e4
>>> freeze_bdev+0xc4/0xcc
>>> f2fs_ioctl+0xc0c/0x1ce0
>>> f2fs_compat_ioctl+0x98/0x1f0
>>>
>>> Fix this by not holding write access during this ioctl.
>>
>> I think we can just remove lock coverage for the F2FS_GOING_DOWN_FULLSYNC
>> path; for the other paths, we need to keep it as it is.
>>
> 
> Thanks, I thought about it too but then I checked that XFS shutdown ioctl is
> not taking any lock for this ioctl. Hence, I followed the same in F2FS.
> Do you know why XFS is not taking any lock? 

I don't know. :(

> Is it really needed in shutdown ioctl?

IMO, yes, we should keep freeze and remount aware of the shutdown operation.

Thanks,

> 
> --
> Sent by a consultant of the Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
> 
> 



Re: [PATCH v2 1/3] f2fs: fix to initialize i_current_depth according to inode type

2018-05-15 Thread Chao Yu
Sorry, I sent the wrong patch, please ignore this one.

On 2018/5/15 18:50, Chao Yu wrote:
> i_current_depth is used only for directory inodes, but its space is
> shared with the i_gc_failures field used for regular inodes. To avoid
> affecting i_gc_failures' value, this patch initializes the union's
> fields according to inode type.
> 
> Signed-off-by: Chao Yu <yuch...@huawei.com>
> ---
> v2:
> - rebase code.
>  fs/f2fs/inode.c | 12 +---
>  fs/f2fs/namei.c |  3 +++
>  fs/f2fs/super.c |  1 -
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index 7f2fe4574c48..3a74a1cf3264 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -232,8 +232,10 @@ static int do_read_inode(struct inode *inode)
>   inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
>   inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
>   inode->i_generation = le32_to_cpu(ri->i_generation);
> -
> - fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
> + if (S_ISDIR(inode->i_mode))
> + fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
> + else if (S_ISREG(inode->i_mode))
> + fi->i_gc_failures = le16_to_cpu(ri->i_gc_failures);
>   fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
>   fi->i_flags = le32_to_cpu(ri->i_flags);
>   fi->flags = 0;
> @@ -422,7 +424,11 @@ void update_inode(struct inode *inode, struct page 
> *node_page)
>   ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec);
>   ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
>   ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
> - ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth);
> + if (S_ISDIR(inode->i_mode))
> + ri->i_current_depth =
> + cpu_to_le32(F2FS_I(inode)->i_current_depth);
> + else if (S_ISREG(inode->i_mode))
> + ri->i_gc_failures = cpu_to_le16(F2FS_I(inode)->i_gc_failures);
>   ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid);
>   ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags);
>   ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino);
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> index fef6e3ab2135..bcfc4219b29e 100644
> --- a/fs/f2fs/namei.c
> +++ b/fs/f2fs/namei.c
> @@ -54,6 +54,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, 
> umode_t mode)
>   F2FS_I(inode)->i_crtime = current_time(inode);
>   inode->i_generation = sbi->s_next_generation++;
>  
> + if (S_ISDIR(inode->i_mode))
> + F2FS_I(inode)->i_current_depth = 1;
> +
>   err = insert_inode_locked(inode);
>   if (err) {
>   err = -EINVAL;
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 294be9e92aee..55ccc2eaaa2e 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -826,7 +826,6 @@ static struct inode *f2fs_alloc_inode(struct super_block 
> *sb)
>  
>   /* Initialize f2fs-specific inode info */
>   atomic_set(>dirty_pages, 0);
> - fi->i_current_depth = 1;
>   init_rwsem(>i_sem);
>   INIT_LIST_HEAD(>dirty_list);
>   INIT_LIST_HEAD(>gdirty_list);
> 



[PATCH] f2fs: fix to initialize min_mtime with ULLONG_MAX

2018-05-15 Thread Chao Yu
Since sit_i.min_mtime's type is unsigned long long, we should
initialize it with the max value of that type, ULLONG_MAX, instead of
LLONG_MAX.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 100fb6454527..bc93c9efbbd2 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3828,7 +3828,7 @@ static void init_min_max_mtime(struct f2fs_sb_info *sbi)
 
down_write(_i->sentry_lock);
 
-   sit_i->min_mtime = LLONG_MAX;
+   sit_i->min_mtime = ULLONG_MAX;
 
for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) {
unsigned int i;
-- 
2.17.0.391.g1f1cddd558b5
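
To show why the initializer should match the type, here is a small user-space
sketch of a minimum scan. find_min_mtime() and the sample values are
hypothetical; real mtimes are presumably far below LLONG_MAX, so this only
illustrates the type argument and does not reproduce an observed failure.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return the smallest entry of mtimes[], starting the scan from init. */
unsigned long long find_min_mtime(const unsigned long long *mtimes, size_t n,
				  unsigned long long init)
{
	unsigned long long min = init;
	size_t i;

	assert(mtimes != NULL);
	for (i = 0; i < n; i++)
		if (mtimes[i] < min)
			min = mtimes[i];
	return min;
}

/* Hypothetical sample in which every mtime is above LLONG_MAX. */
static const unsigned long long sample[] = { ULLONG_MAX - 1, ULLONG_MAX - 5 };

/* Old initializer: the result stays at LLONG_MAX, which is not in the sample. */
unsigned long long min_with_llong_max(void)
{
	return find_min_mtime(sample, 2, LLONG_MAX);
}

/* New initializer: the true minimum of the sample is found. */
unsigned long long min_with_ullong_max(void)
{
	return find_min_mtime(sample, 2, ULLONG_MAX);
}
```
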



[PATCH v2 1/3] f2fs: fix to initialize i_current_depth according to inode type

2018-05-15 Thread Chao Yu
i_current_depth is used only for directory inodes, but its space is
shared with the i_gc_failures field used for regular inodes. To avoid
affecting i_gc_failures' value, this patch initializes the union's
fields according to inode type.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v2:
- rebase code.
 fs/f2fs/inode.c | 12 +---
 fs/f2fs/namei.c |  3 +++
 fs/f2fs/super.c |  1 -
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7f2fe4574c48..3a74a1cf3264 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -232,8 +232,10 @@ static int do_read_inode(struct inode *inode)
inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
inode->i_generation = le32_to_cpu(ri->i_generation);
-
-   fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+   if (S_ISDIR(inode->i_mode))
+   fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+   else if (S_ISREG(inode->i_mode))
+   fi->i_gc_failures = le16_to_cpu(ri->i_gc_failures);
fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
fi->i_flags = le32_to_cpu(ri->i_flags);
fi->flags = 0;
@@ -422,7 +424,11 @@ void update_inode(struct inode *inode, struct page 
*node_page)
ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec);
ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
-   ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth);
+   if (S_ISDIR(inode->i_mode))
+   ri->i_current_depth =
+   cpu_to_le32(F2FS_I(inode)->i_current_depth);
+   else if (S_ISREG(inode->i_mode))
+   ri->i_gc_failures = cpu_to_le16(F2FS_I(inode)->i_gc_failures);
ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid);
ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags);
ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino);
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index fef6e3ab2135..bcfc4219b29e 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -54,6 +54,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, 
umode_t mode)
F2FS_I(inode)->i_crtime = current_time(inode);
inode->i_generation = sbi->s_next_generation++;
 
+   if (S_ISDIR(inode->i_mode))
+   F2FS_I(inode)->i_current_depth = 1;
+
err = insert_inode_locked(inode);
if (err) {
err = -EINVAL;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 294be9e92aee..55ccc2eaaa2e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -826,7 +826,6 @@ static struct inode *f2fs_alloc_inode(struct super_block 
*sb)
 
/* Initialize f2fs-specific inode info */
atomic_set(>dirty_pages, 0);
-   fi->i_current_depth = 1;
init_rwsem(>i_sem);
INIT_LIST_HEAD(>dirty_list);
INIT_LIST_HEAD(>gdirty_list);
-- 
2.17.0.391.g1f1cddd558b5



Re: [PATCH] f2fs: Fix deadlock in shutdown ioctl

2018-05-13 Thread Chao Yu
On 2018/5/10 21:20, Sahitya Tummala wrote:
> f2fs_ioc_shutdown() ioctl gets stuck in the below path
> when going down with full sync (F2FS_GOING_DOWN_FULLSYNC)
> option.
> 
> __switch_to+0x90/0xc4
> percpu_down_write+0x8c/0xc0
> freeze_super+0xec/0x1e4
> freeze_bdev+0xc4/0xcc
> f2fs_ioctl+0xc0c/0x1ce0
> f2fs_compat_ioctl+0x98/0x1f0
> 
> Fix this by not holding write access during this ioctl.

I think we can just remove lock coverage for the F2FS_GOING_DOWN_FULLSYNC path;
for the other paths, we need to keep it as it is.

Thanks,

> 
> Signed-off-by: Sahitya Tummala 
> ---
>  fs/f2fs/file.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index b926df7..2c2e61b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1835,10 +1835,6 @@ static int f2fs_ioc_shutdown(struct file *filp, 
> unsigned long arg)
>   if (get_user(in, (__u32 __user *)arg))
>   return -EFAULT;
>  
> - ret = mnt_want_write_file(filp);
> - if (ret)
> - return ret;
> -
>   switch (in) {
>   case F2FS_GOING_DOWN_FULLSYNC:
>   sb = freeze_bdev(sb->s_bdev);
> @@ -1878,7 +1874,6 @@ static int f2fs_ioc_shutdown(struct file *filp, 
> unsigned long arg)
>  
>   f2fs_update_time(sbi, REQ_TIME);
>  out:
> - mnt_drop_write_file(filp);
>   return ret;
>  }
>  
> 



Re: [PATCH 32/76] fs/f2fs: Use inode_sb() helper instead of inode->i_sb

2018-05-10 Thread Chao Yu
On 2018/5/9 2:03, Mark Fasheh wrote:
> Signed-off-by: Mark Fasheh <mfas...@suse.de>

Reviewed-by: Chao Yu <yuch...@huawei.com>

Thanks,



[PATCH] f2fs: fix to let checkpoint guarantee atomic page persistence

2018-05-08 Thread Chao Yu
1. thread A: commit_inmem_pages submits data into the block layer, but
hasn't waited for its writeback.
2. thread A: commit_inmem_pages updates the related node.
3. thread B: does a checkpoint, flushing all nodes to disk.
4. SPOR

Then, the atomic file becomes corrupted since nodes are flushed before data.

This patch treats atomic pages as checkpoint-guaranteed ones, so that
during checkpoint we can make sure all atomic pages are written back
together with the metadata of the atomic file.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/data.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 778b23fce4fa..734be00fab3a 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -45,6 +45,8 @@ static bool __is_cp_guaranteed(struct page *page)
if (inode->i_ino == F2FS_META_INO(sbi) ||
inode->i_ino ==  F2FS_NODE_INO(sbi) ||
S_ISDIR(inode->i_mode) ||
+   (S_ISREG(inode->i_mode) &&
+   is_inode_flag_set(inode, FI_ATOMIC_FILE)) ||
is_cold_data(page))
return true;
return false;
-- 
2.17.0.391.g1f1cddd558b5



Re: [PATCH] f2fs: fix to wait IO writeback in __revoke_inmem_pages()

2018-05-08 Thread Chao Yu
On 2018/5/8 11:31, Jaegeuk Kim wrote:
> On 05/08, Chao Yu wrote:
>> On 2018/5/8 4:46, Jaegeuk Kim wrote:
>>> On 04/27, Chao Yu wrote:
>>>> On 2018/4/27 0:36, Jaegeuk Kim wrote:
>>>>> On 04/26, Chao Yu wrote:
>>>>>> On 2018/4/26 23:48, Jaegeuk Kim wrote:
>>>>>>> On 04/26, Chao Yu wrote:
>>>>>>>> Thread A                              Thread B
>>>>>>>> - f2fs_ioc_commit_atomic_write
>>>>>>>>  - commit_inmem_pages
>>>>>>>>   - f2fs_submit_merged_write_cond
>>>>>>>>   : write data
>>>>>>>>                                       - write_checkpoint
>>>>>>>>                                        - do_checkpoint
>>>>>>>>                                        : commit all node within CP
>>>>>>>>                                        -> SPO
>>>>>>>>   - f2fs_do_sync_file
>>>>>>>>    - file_write_and_wait_range
>>>>>>>>    : wait data writeback
>>>>>>>>
>>>>>>>> In the above race condition, data and node can be flushed in reversed
>>>>>>>> order when a checkpoint comes before f2fs_do_sync_file; after SPOR,
>>>>>>>> this results in atomic written data being corrupted.
>>>>>>>
>>>>>>> Wait, what is the problem here? Thread B could succeed checkpoint, 
>>>>>>> there is
>>>>>>> no problem. If it fails, there is no fsync mark where we can recover 
>>>>>>> it, so
>>>>>>
>>>>>> Node is flushed by checkpoint before data, with reversed order, that's 
>>>>>> the problem.
>>>>>
>>>>> What do you mean? Data should be in disk, in order to proceed checkpoint.
>>>>
>>>> 1. thread A: commit_inmem_pages submit data into block layer, but haven't 
>>>> waited
>>>> it writeback.
>>>> 2. thread A: commit_inmem_pages update related node.
>>>> 3. thread B: do checkpoint, flush all nodes to disk
>>>
>>> How about, in block_operations(),
>>>
>>> down_read_trylock(_I(inode)->i_gc_rwsem[WRITE]);
>>> if (fail)
>>> wait_on_all_pages_writeback(F2FS_WB_DATA);
>>> else
>>> up_read(_I(inode)->i_gc_rwsem[WRITE]);
>>
>> I sent one patch for that, could you check it?
>>
>> Adding wait_on_all_pages_writeback in block_operations() can make 
>> checkpoint()
>> wait pages writeback one more time, which break IO flow, so what's your 
>> concern
>> here?
> 
> Performance. And I can see wait_on_all_pages_writeback() waits only for
> F2FS_WB_CP_DATA in checkpoint()?

Oh, you mean waiting for all F2FS_WB_DATA pages to be written back. What about
just treating atomic write pages as F2FS_WB_CP_DATA, so that we can wait for
atomic pages in wait_on_all_pages_writeback() in do_checkpoint()?

Thanks,

> 
> 
>>
>> Thanks,
>>
>>>
>>>
>>>> 4. SPOR
>>>>
>>>> Then, atomic file becomes corrupted since nodes is flushed before data.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>> we can just ignore the last written data as nothing.
>>>>>>>
>>>>>>>>
>>>>>>>> This patch adds f2fs_wait_on_page_writeback in __revoke_inmem_pages() 
>>>>>>>> to
>>>>>>>> keep data and node of atomic file being flushed orderly.
>>>>>>>>
>>>>>>>> Signed-off-by: Chao Yu <yuch...@huawei.com>
>>>>>>>> ---
>>>>>>>>  fs/f2fs/file.c| 4 
>>>>>>>>  fs/f2fs/segment.c | 3 +++
>>>>>>>>  2 files changed, 7 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>> index be7578774a47..a352804af244 100644
>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>> @@ -217,6 +217,9 @@ static int f2fs_do_sync_file(struct file *file, 
>>>>>>>> loff_t start, loff_t end,
>>>>>>>>  
>>>>>>>>

[PATCH v2] f2fs: fix to avoid race during access gc_thread pointer

2018-05-07 Thread Chao Yu
Thread A                        Thread B
- f2fs_remount
 - stop_gc_thread
                                - f2fs_sbi_store
   sbi->gc_thread = NULL;
                                  access sbi->gc_thread->gc_*

Previously, we allocated memory for sbi->gc_thread based on the background
gc thread mount option, and that memory can be released if we turn off
that mount option. However, several places still access the gc_thread
pointer without considering this race condition, which results in a NULL
pointer dereference.

In order to fix this issue, use sb->s_umount to exclude those operations.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v2:
- fix to cover __struct_ptr() with sb->s_umount suggested by Jaegeuk.
 fs/f2fs/sysfs.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 7432192ebe17..79f4e4ac8200 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -168,7 +168,7 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a,
return snprintf(buf, PAGE_SIZE, "%u\n", *ui);
 }
 
-static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
+static ssize_t __f2fs_sbi_store(struct f2fs_attr *a,
struct f2fs_sb_info *sbi,
const char *buf, size_t count)
 {
@@ -261,6 +261,22 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
return count;
 }
 
+static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
+   struct f2fs_sb_info *sbi,
+   const char *buf, size_t count)
+{
+   ssize_t ret;
+   bool gc_entry = (a->struct_type == GC_THREAD);
+
+   if (gc_entry)
+   down_read(&sbi->sb->s_umount);
+   ret = __f2fs_sbi_store(a, sbi, buf, count);
+   if (gc_entry)
+   up_read(&sbi->sb->s_umount);
+
+   return ret;
+}
+
 static ssize_t f2fs_attr_show(struct kobject *kobj,
struct attribute *attr, char *buf)
 {
-- 
2.17.0.391.g1f1cddd558b5



[PATCH] f2fs: fix to avoid race during access gc_thread pointer

2018-05-07 Thread Chao Yu
Thread A                        Thread B
- f2fs_remount
 - stop_gc_thread
                                - f2fs_sbi_store
   sbi->gc_thread = NULL;
                                  access sbi->gc_thread->gc_*

Previously, we allocated memory for sbi->gc_thread based on the background
gc thread mount option, and that memory can be released if we turn off
that mount option. However, several places still access the gc_thread
pointer without considering this race condition, which results in a NULL
pointer dereference.

In order to fix this issue, use sb->s_umount to exclude those operations.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/sysfs.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index f3c3fb4cbb0d..0aec4db7fa02 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -179,6 +179,7 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
unsigned long t;
unsigned int *ui;
ssize_t ret;
+   bool gc_entry = (a->struct_type == GC_THREAD);
 
ptr = __struct_ptr(sbi, a->struct_type);
if (!ptr)
@@ -277,8 +278,14 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
return count;
}
 
+   if (gc_entry)
+   down_read(&sbi->sb->s_umount);
+
*ui = t;
 
+   if (gc_entry)
+   up_read(&sbi->sb->s_umount);
+
if (!strcmp(a->attr.name, "iostat_enable") && *ui == 0)
f2fs_reset_iostat(sbi);
return count;
-- 
2.17.0.391.g1f1cddd558b5



Re: [PATCH v4] f2fs: fix to avoid race during access gc_thread pointer

2018-05-07 Thread Chao Yu
On 2018/5/8 11:17, Jaegeuk Kim wrote:
> On 05/08, Chao Yu wrote:
>> On 2018/5/8 5:36, Jaegeuk Kim wrote:
>>> On 05/07, Chao Yu wrote:
>>>> Thread A   Thread BThread C
>>>> - f2fs_remount
>>>>  - stop_gc_thread
>>>>- f2fs_sbi_store
>>>>- issue_discard_thread
>>>>sbi->gc_thread = NULL;
>>>>  sbi->gc_thread->gc_wake = 1
>>>>  access 
>>>> sbi->gc_thread->gc_urgent
>>>>
>>>> Previously, we allocate memory for sbi->gc_thread based on background
>>>> gc thread mount option, the memory can be released if we turn off
>>>> that mount option, but still there are several places access gc_thread
>>>> pointer without considering race condition, result in NULL point
>>>> dereference.
>>>>
>>>> In order to fix this issue, introduce gc_rwsem to exclude those operations.
>>>>
>>>> Signed-off-by: Chao Yu <yuch...@huawei.com>
>>>> ---
>>>> v4:
>>>> - use introduced sbi.gc_rwsem lock instead of sb.s_umount.
>>>
>>> We can use this first.
>>>
>>> >From e62e8d3ece6ee8a4aeac8ffd6161d25851f8b3f0 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim <jaeg...@kernel.org>
>>> Date: Mon, 7 May 2018 14:22:40 -0700
>>> Subject: [PATCH] f2fs: introduce sbi->gc_mode to determine the policy
>>>
>>> This is to avoid sbi->gc_thread pointer access.
>>>
>>> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
>>> ---
>>>  fs/f2fs/f2fs.h|  8 
>>>  fs/f2fs/gc.c  | 28 
>>>  fs/f2fs/gc.h  |  2 --
>>>  fs/f2fs/segment.c |  4 ++--
>>>  fs/f2fs/sysfs.c   | 33 +
>>>  5 files changed, 47 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 80490a7991a7..779d8b26878c 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -1065,6 +1065,13 @@ enum {
>>> MAX_TIME,
>>>  };
>>>  
>>> +enum {
>>> +   GC_NORMAL,
>>> +   GC_IDLE_CB,
>>> +   GC_IDLE_GREEDY,
>>> +   GC_URGENT,
>>> +};
>>> +
>>>  enum {
>>> WHINT_MODE_OFF, /* not pass down write hints */
>>> WHINT_MODE_USER,/* try to pass down hints given by users */
>>> @@ -1193,6 +1200,7 @@ struct f2fs_sb_info {
>>> struct mutex gc_mutex;  /* mutex for GC */
>>> struct f2fs_gc_kthread  *gc_thread; /* GC thread */
>>> unsigned int cur_victim_sec;/* current victim section num */
>>> +   unsigned int gc_mode;   /* current GC state */
>>>  
>>> /* threshold for gc trials on pinned files */
>>> u64 gc_pin_file_threshold;
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 9bb2ddbbed1e..7ec8ea75dfde 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -76,7 +76,7 @@ static int gc_thread_func(void *data)
>>>  * invalidated soon after by user update or deletion.
>>>  * So, I'd like to wait some time to collect dirty segments.
>>>  */
>>> -   if (gc_th->gc_urgent) {
>>> +   if (sbi->gc_mode == GC_URGENT) {
>>> wait_ms = gc_th->urgent_sleep_time;
>>> mutex_lock(&sbi->gc_mutex);
>>> goto do_gc;
>>> @@ -131,8 +131,6 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
>>> gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
>>> gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
>>>  
>>> -   gc_th->gc_idle = 0;
>>> -   gc_th->gc_urgent = 0;
>>> gc_th->gc_wake= 0;
>>>  
>>> sbi->gc_thread = gc_th;
>>> @@ -158,21 +156,19 @@ void stop_gc_thread(struct f2fs_sb_info *sbi)
>>> sbi->gc_thread = NULL;
>>>  }
>>>  
>>> -static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type)
>>> +static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type)
>>>  {
>>> int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
>>>  
>>> -   if (!gc_th)

Re: [PATCH] f2fs: fix to wait IO writeback in __revoke_inmem_pages()

2018-05-07 Thread Chao Yu
On 2018/5/8 4:46, Jaegeuk Kim wrote:
> On 04/27, Chao Yu wrote:
>> On 2018/4/27 0:36, Jaegeuk Kim wrote:
>>> On 04/26, Chao Yu wrote:
>>>> On 2018/4/26 23:48, Jaegeuk Kim wrote:
>>>>> On 04/26, Chao Yu wrote:
>>>>>> Thread A Thread B
>>>>>> - f2fs_ioc_commit_atomic_write
>>>>>>  - commit_inmem_pages
>>>>>>   - f2fs_submit_merged_write_cond
>>>>>>   : write data
>>>>>>  - write_checkpoint
>>>>>>   - do_checkpoint
>>>>>>   : commit all node within CP
>>>>>>   -> SPO
>>>>>>   - f2fs_do_sync_file
>>>>>>- file_write_and_wait_range
>>>>>>: wait data writeback
>>>>>>
>>>>>> In above race condition, data/node can be flushed in reversed order when
>>>>>> coming a checkpoint before f2fs_do_sync_file, after SPOR, it results in
>>>>>> atomic written data being corrupted.
>>>>>
>>>>> Wait, what is the problem here? Thread B could succeed checkpoint, there 
>>>>> is
>>>>> no problem. If it fails, there is no fsync mark where we can recover it, 
>>>>> so
>>>>
>>>> Node is flushed by checkpoint before data, with reversed order, that's the 
>>>> problem.
>>>
>>> What do you mean? Data should be in disk, in order to proceed checkpoint.
>>
>> 1. thread A: commit_inmem_pages submit data into block layer, but haven't 
>> waited
>> it writeback.
>> 2. thread A: commit_inmem_pages update related node.
>> 3. thread B: do checkpoint, flush all nodes to disk
> 
> How about, in block_operations(),
> 
>   down_read_trylock(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   if (fail)
>   wait_on_all_pages_writeback(F2FS_WB_DATA);
>   else
>   up_read(&F2FS_I(inode)->i_gc_rwsem[WRITE]);

I sent one patch for that, could you check it?

Adding wait_on_all_pages_writeback() in block_operations() makes checkpoint()
wait for page writeback one more time, which breaks the IO flow, so what's your
concern here?

Thanks,

> 
> 
>> 4. SPOR
>>
>> Then, atomic file becomes corrupted since nodes is flushed before data.
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>> we can just ignore the last written data as nothing.
>>>>>
>>>>>>
>>>>>> This patch adds f2fs_wait_on_page_writeback in __revoke_inmem_pages() to
>>>>>> keep data and node of atomic file being flushed orderly.
>>>>>>
>>>>>> Signed-off-by: Chao Yu <yuch...@huawei.com>
>>>>>> ---
>>>>>>  fs/f2fs/file.c| 4 
>>>>>>  fs/f2fs/segment.c | 3 +++
>>>>>>  2 files changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>> index be7578774a47..a352804af244 100644
>>>>>> --- a/fs/f2fs/file.c
>>>>>> +++ b/fs/f2fs/file.c
>>>>>> @@ -217,6 +217,9 @@ static int f2fs_do_sync_file(struct file *file, 
>>>>>> loff_t start, loff_t end,
>>>>>>  
>>>>>>  trace_f2fs_sync_file_enter(inode);
>>>>>>  
>>>>>> +if (atomic)
>>>>>> +goto write_done;
>>>>>> +
>>>>>>  /* if fdatasync is triggered, let's do in-place-update */
>>>>>>  if (datasync || get_dirty_pages(inode) <= 
>>>>>> SM_I(sbi)->min_fsync_blocks)
>>>>>>  set_inode_flag(inode, FI_NEED_IPU);
>>>>>> @@ -228,6 +231,7 @@ static int f2fs_do_sync_file(struct file *file, 
>>>>>> loff_t start, loff_t end,
>>>>>>  return ret;
>>>>>>  }
>>>>>>  
>>>>>> +write_done:
>>>>>>  /* if the inode is dirty, let's recover all the time */
>>>>>>  if (!f2fs_skip_inode_update(inode, datasync)) {
>>>>>>  f2fs_write_inode(inode, NULL);
>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>> index 584483426584..9ca3d0a43d93 100644
>>>>>> --- a/fs/f2fs/segment.c
>>>>>> +++ b/fs/f2fs/segment.c
>>>>>> @@ -230,6 +230,8 @@ static int __revoke_inmem_pages(struct inode *inode,
>>>>>>  
>>>>>>  lock_page(page);
>>>>>>  
>>>>>> +f2fs_wait_on_page_writeback(page, DATA, true);
>>>>>> +
>>>>>>  if (recover) {
>>>>>>  struct dnode_of_data dn;
>>>>>>  struct node_info ni;
>>>>>> @@ -415,6 +417,7 @@ static int __commit_inmem_pages(struct inode *inode)
>>>>>>  /* drop all uncommitted pages */
>>>>>>  __revoke_inmem_pages(inode, &fi->inmem_pages, true,
>>>>>> false);
>>>>>>  } else {
>>>>>> +/* wait all committed IOs writeback and release them 
>>>>>> from list */
>>>>>>  __revoke_inmem_pages(inode, &revoke_list, false, false);
>>>>>>  }
>>>>>>  
>>>>>> -- 
>>>>>> 2.15.0.55.gc2ece9dc4de6
>>>
>>> .
>>>
> 
> .
> 



[PATCH] f2fs: fix to wait atomic pages writeback in block_operations()

2018-05-07 Thread Chao Yu
1. thread A: commit_inmem_pages submits data into the block layer, but
hasn't waited for its writeback.
2. thread A: commit_inmem_pages updates the related node.
3. thread B: does a checkpoint, flushing all nodes to disk.
4. SPOR

Then the atomic file becomes corrupted, since nodes are flushed before data.

This patch fixes it by trying to wait for writeback of all atomic pages in
block_operations().

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/checkpoint.c |  4 +++-
 fs/f2fs/data.c   |  2 ++
 fs/f2fs/f2fs.h   |  2 ++
 fs/f2fs/segment.c| 17 +
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 33d2da006789..d53d53f55c51 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1067,6 +1067,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
goto retry_flush_dents;
}
 
+   wait_inmem_pages_writeback(sbi);
+
/*
 * POR: we should ensure that there are no dirty node pages
 * until finishing nat/sit flush. inode->i_blocks can be updated.
@@ -1115,7 +1117,7 @@ static void unblock_operations(struct f2fs_sb_info *sbi)
f2fs_unlock_all(sbi);
 }
 
-static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
+void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
 {
DEFINE_WAIT(wait);
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5a979b5ee278..c181f58948c0 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -48,6 +48,8 @@ static bool __is_cp_guaranteed(struct page *page)
if (inode->i_ino == F2FS_META_INO(sbi) ||
inode->i_ino ==  F2FS_NODE_INO(sbi) ||
S_ISDIR(inode->i_mode) ||
+   (S_ISREG(inode->i_mode) &&
+   is_inode_flag_set(inode, FI_ATOMIC_FILE)) ||
is_cold_data(page))
return true;
return false;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index bda9c3ce08ef..adfd512ae4a1 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2839,6 +2839,7 @@ void destroy_node_manager_caches(void);
 bool need_SSR(struct f2fs_sb_info *sbi);
 void register_inmem_page(struct inode *inode, struct page *page);
 void drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure);
+void wait_inmem_pages_writeback(struct f2fs_sb_info *sbi);
 void drop_inmem_pages(struct inode *inode);
 void drop_inmem_page(struct inode *inode, struct page *page);
 int commit_inmem_pages(struct inode *inode);
@@ -2926,6 +2927,7 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi);
 void update_dirty_page(struct inode *inode, struct page *page);
 void remove_dirty_inode(struct inode *inode);
 int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type);
+void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi);
 int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc);
 void init_ino_entry_info(struct f2fs_sb_info *sbi);
 int __init create_checkpoint_caches(void);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 24b71d450374..e8a81cbd6808 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -305,6 +305,23 @@ void drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool 
gc_failure)
goto next;
 }
 
+void wait_inmem_pages_writeback(struct f2fs_sb_info *sbi)
+{
+   struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
+   struct f2fs_inode_info *fi;
+
+   spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
+   list_for_each_entry(fi, head, inmem_ilist) {
+   if (!down_read_trylock(&fi->i_gc_rwsem[WRITE])) {
+   spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
+   wait_on_all_pages_writeback(sbi);
+   return;
+   }
+   up_read(&fi->i_gc_rwsem[WRITE]);
+   }
+   spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
+}
+
 void drop_inmem_pages(struct inode *inode)
 {
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-- 
2.17.0.391.g1f1cddd558b5



Re: [PATCH v4] f2fs: fix to avoid race during access gc_thread pointer

2018-05-07 Thread Chao Yu
On 2018/5/8 5:36, Jaegeuk Kim wrote:
> On 05/07, Chao Yu wrote:
>> Thread A Thread BThread C
>> - f2fs_remount
>>  - stop_gc_thread
>>  - f2fs_sbi_store
>>  - issue_discard_thread
>>sbi->gc_thread = NULL;
>>sbi->gc_thread->gc_wake = 1
>>access 
>> sbi->gc_thread->gc_urgent
>>
>> Previously, we allocate memory for sbi->gc_thread based on background
>> gc thread mount option, the memory can be released if we turn off
>> that mount option, but still there are several places access gc_thread
>> pointer without considering race condition, result in NULL point
>> dereference.
>>
>> In order to fix this issue, introduce gc_rwsem to exclude those operations.
>>
>> Signed-off-by: Chao Yu <yuch...@huawei.com>
>> ---
>> v4:
>> - use introduced sbi.gc_rwsem lock instead of sb.s_umount.
> 
> We can use this first.
> 
>>From e62e8d3ece6ee8a4aeac8ffd6161d25851f8b3f0 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaeg...@kernel.org>
> Date: Mon, 7 May 2018 14:22:40 -0700
> Subject: [PATCH] f2fs: introduce sbi->gc_mode to determine the policy
> 
> This is to avoid sbi->gc_thread pointer access.
> 
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> ---
>  fs/f2fs/f2fs.h|  8 
>  fs/f2fs/gc.c  | 28 
>  fs/f2fs/gc.h  |  2 --
>  fs/f2fs/segment.c |  4 ++--
>  fs/f2fs/sysfs.c   | 33 +
>  5 files changed, 47 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 80490a7991a7..779d8b26878c 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1065,6 +1065,13 @@ enum {
>   MAX_TIME,
>  };
>  
> +enum {
> + GC_NORMAL,
> + GC_IDLE_CB,
> + GC_IDLE_GREEDY,
> + GC_URGENT,
> +};
> +
>  enum {
>   WHINT_MODE_OFF, /* not pass down write hints */
>   WHINT_MODE_USER,/* try to pass down hints given by users */
> @@ -1193,6 +1200,7 @@ struct f2fs_sb_info {
>   struct mutex gc_mutex;  /* mutex for GC */
>   struct f2fs_gc_kthread  *gc_thread; /* GC thread */
>   unsigned int cur_victim_sec;/* current victim section num */
> + unsigned int gc_mode;   /* current GC state */
>  
>   /* threshold for gc trials on pinned files */
>   u64 gc_pin_file_threshold;
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 9bb2ddbbed1e..7ec8ea75dfde 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -76,7 +76,7 @@ static int gc_thread_func(void *data)
>* invalidated soon after by user update or deletion.
>* So, I'd like to wait some time to collect dirty segments.
>*/
> - if (gc_th->gc_urgent) {
> + if (sbi->gc_mode == GC_URGENT) {
>   wait_ms = gc_th->urgent_sleep_time;
>   mutex_lock(&sbi->gc_mutex);
>   goto do_gc;
> @@ -131,8 +131,6 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
>   gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
>   gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
>  
> - gc_th->gc_idle = 0;
> - gc_th->gc_urgent = 0;
>   gc_th->gc_wake= 0;
>  
>   sbi->gc_thread = gc_th;
> @@ -158,21 +156,19 @@ void stop_gc_thread(struct f2fs_sb_info *sbi)
>   sbi->gc_thread = NULL;
>  }
>  
> -static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type)
> +static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type)
>  {
>   int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
>  
> - if (!gc_th)
> - return gc_mode;
> -
> - if (gc_th->gc_idle) {
> - if (gc_th->gc_idle == 1)
> - gc_mode = GC_CB;
> - else if (gc_th->gc_idle == 2)
> - gc_mode = GC_GREEDY;
> - }
> - if (gc_th->gc_urgent)
> + switch (sbi->gc_mode) {
> + case GC_IDLE_CB:
> + gc_mode = GC_CB;
> + break;
> + case GC_IDLE_GREEDY:
> + case GC_URGENT:
>   gc_mode = GC_GREEDY;
> + break;
> + }
>   return gc_mode;
>  }
>  
> @@ -187,7 +183,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int 
> gc_type,
>   p->max_search = dirty_i->nr_dirty[type];
> 
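
As a side note on the quoted gc_mode patch, the policy mapping in the new select_gc_type() can be tried out in isolation. The sketch below is standalone C, not the kernel source: the enumerator names follow the quoted diff, but their numeric values, the enum type names, and the plain unsigned int parameter (in place of sbi) are assumptions made for illustration.

```c
#include <assert.h>

/* Policy requested by the user; names follow the quoted patch. */
enum gc_mode_req { GC_NORMAL, GC_IDLE_CB, GC_IDLE_GREEDY, GC_URGENT };

/* Victim selection algorithm and GC type; the values are assumed here. */
enum gc_algo { GC_CB, GC_GREEDY };
enum gc_kind { BG_GC, FG_GC };

/*
 * Pick the victim selection algorithm. The default depends on whether
 * this is background or foreground GC; an explicit mode overrides it.
 */
static int select_gc_type(unsigned int mode, int gc_type)
{
    int algo = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;

    switch (mode) {
    case GC_IDLE_CB:
        algo = GC_CB;
        break;
    case GC_IDLE_GREEDY:
    case GC_URGENT:
        algo = GC_GREEDY;
        break;
    default:
        break; /* GC_NORMAL keeps the default */
    }
    return algo;
}
```

Since the mode is a plain integer stored in the superblock info, no pointer can go stale under a reader, which is the point of the quoted change.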

[PATCH v2 3/3] f2fs: avoid stucking GC due to atomic write

2018-05-07 Thread Chao Yu
f2fs doesn't allow abuse of the atomic write class interface, so besides
limiting the total memory usage of in-mem pages, we need to limit
atomic-write usage as well when the filesystem is seriously fragmented.
Otherwise we may run into an infinite loop during foreground GC, because
target blocks in the victim segment belong to a file that has been
atomic-opened for a long time.

Now, we detect failures due to atomic write in foreground GC; if
the count exceeds the threshold, we drop all atomic written data in
cache. With this, I expect we can keep the system running safely and
prevent a DoS attack.

In addition, this patch shows GC skip information in debugfs;
for now it just shows the count of skips caused by atomic write.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v2:
- rebase code.
 fs/f2fs/data.c|  2 +-
 fs/f2fs/debug.c   |  6 ++
 fs/f2fs/f2fs.h| 21 +++--
 fs/f2fs/file.c| 20 ++--
 fs/f2fs/gc.c  | 27 +++
 fs/f2fs/inode.c   |  6 --
 fs/f2fs/segment.c | 11 ++-
 fs/f2fs/segment.h |  2 ++
 8 files changed, 75 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index f7365ce45450..e394b5486c91 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2320,7 +2320,7 @@ static int f2fs_write_begin(struct file *file, struct 
address_space *mapping,
f2fs_put_page(page, 1);
f2fs_write_failed(mapping, pos + len);
if (drop_atomic)
-   drop_inmem_pages_all(sbi);
+   drop_inmem_pages_all(sbi, false);
return err;
 }
 
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index d92a01cb420c..8febd9160635 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -104,6 +104,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->avail_nids = NM_I(sbi)->available_nids;
si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID];
si->bg_gc = sbi->bg_gc;
+   si->skipped_atomic_files[BG_GC] = sbi->skipped_atomic_files[BG_GC];
+   si->skipped_atomic_files[FG_GC] = sbi->skipped_atomic_files[FG_GC];
si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
/ 2;
@@ -342,6 +344,10 @@ static int stat_show(struct seq_file *s, void *v)
si->bg_data_blks);
seq_printf(s, "  - node blocks : %d (%d)\n", si->node_blks,
si->bg_node_blks);
+   seq_printf(s, "Skipped : atomic write %llu (%llu)\n",
+   si->skipped_atomic_files[BG_GC] +
+   si->skipped_atomic_files[FG_GC],
+   si->skipped_atomic_files[BG_GC]);
seq_puts(s, "\nExtent Cache:\n");
seq_printf(s, "  - Hit Count: L1-1:%llu L1-2:%llu L2:%llu\n",
si->hit_largest, si->hit_cached,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 73a1a39889bc..3286127708f9 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -617,15 +617,20 @@ enum {
 
 #define DEF_DIR_LEVEL  0
 
+enum {
+   GC_FAILURE_PIN,
+   GC_FAILURE_ATOMIC,
+   MAX_GC_FAILURE
+};
+
 struct f2fs_inode_info {
struct inode vfs_inode; /* serve a vfs inode */
unsigned long i_flags;  /* keep an inode flags for ioctl */
unsigned char i_advise; /* use to give file attribute hints */
unsigned char i_dir_level;  /* use for dentry level for large dir */
-   union {
-   unsigned int i_current_depth;   /* only for directory depth */
-   unsigned short i_gc_failures;   /* only for regular file */
-   };
+   unsigned int i_current_depth;   /* only for directory depth */
+   /* for gc failure statistic */
+   unsigned int i_gc_failures[MAX_GC_FAILURE];
unsigned int i_pino;/* parent inode number */
umode_t i_acl_mode; /* keep file acl mode temporarily */
 
@@ -1194,6 +1199,8 @@ struct f2fs_sb_info {
struct rw_semaphore gc_rwsem;   /* rw semaphore for gc_thread */
struct f2fs_gc_kthread  *gc_thread; /* GC thread */
unsigned int cur_victim_sec;/* current victim section num */
+   /* for skip statistic */
+   unsigned long long skipped_atomic_files[2];
 
/* threshold for gc trials on pinned files */
u64 gc_pin_file_threshold;
@@ -2242,6 +2249,7 @@ enum {
FI_EXTRA_ATTR,  /* indicate file has extra attribute */
FI_PROJ_INHERIT,/* indicate file inherits projectid */
FI_PIN_FILE,/* indicate file should not be gced */
+   FI_ATOMIC_REVOKE_REQUEST,/* indicate atomic committed data has been 
dropped */
 };
 
 static inline void _

[PATCH v2 2/3] f2fs: introduce GC_I for cleanup

2018-05-07 Thread Chao Yu
Introduce GC_I to replace sbi->gc_thread for cleanup, no logic changes.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v2:
- rebase code.
 fs/f2fs/debug.c   |  2 +-
 fs/f2fs/f2fs.h|  5 +
 fs/f2fs/gc.c  | 14 +++---
 fs/f2fs/segment.c |  4 ++--
 fs/f2fs/super.c   |  4 ++--
 fs/f2fs/sysfs.c   |  8 
 6 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index a66107b5cfff..d92a01cb420c 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -221,7 +221,7 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->cache_mem = 0;
 
/* build gc */
-   if (sbi->gc_thread)
+   if (GC_I(sbi))
si->cache_mem += sizeof(struct f2fs_gc_kthread);
 
/* build merge flush thread */
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e238d0ea0be7..73a1a39889bc 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1412,6 +1412,11 @@ static inline struct sit_info *SIT_I(struct f2fs_sb_info 
*sbi)
return (struct sit_info *)(SM_I(sbi)->sit_info);
 }
 
+static inline struct f2fs_gc_kthread *GC_I(struct f2fs_sb_info *sbi)
+{
+   return (struct f2fs_gc_kthread *)(sbi->gc_thread);
+}
+
 static inline struct free_segmap_info *FREE_I(struct f2fs_sb_info *sbi)
 {
return (struct free_segmap_info *)(SM_I(sbi)->free_info);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index b74714be7be7..3c7914425b4e 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -26,8 +26,8 @@
 static int gc_thread_func(void *data)
 {
struct f2fs_sb_info *sbi = data;
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
-   wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
+   wait_queue_head_t *wq = &gc_th->gc_wait_queue_head;
unsigned int wait_ms;
 
wait_ms = gc_th->min_sleep_time;
@@ -136,8 +136,8 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
gc_th->gc_wake= 0;
 
sbi->gc_thread = gc_th;
-   init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
-   sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
+   init_waitqueue_head(&gc_th->gc_wait_queue_head);
+   gc_th->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
"f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev));
if (IS_ERR(gc_th->f2fs_gc_task)) {
err = PTR_ERR(gc_th->f2fs_gc_task);
@@ -150,7 +150,7 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
 
 void stop_gc_thread(struct f2fs_sb_info *sbi)
 {
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
if (!gc_th)
return;
kthread_stop(gc_th->f2fs_gc_task);
@@ -188,7 +188,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int 
gc_type,
p->ofs_unit = 1;
} else {
down_read(&sbi->gc_rwsem);
-   p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
+   p->gc_mode = select_gc_type(GC_I(sbi), gc_type);
up_read(&sbi->gc_rwsem);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
p->max_search = dirty_i->nr_dirty[DIRTY];
@@ -198,7 +198,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int 
gc_type,
/* we need to check every dirty segments in the FG_GC case */
down_read(&sbi->gc_rwsem);
if (gc_type != FG_GC &&
-   (sbi->gc_thread && !sbi->gc_thread->gc_urgent) &&
+   (GC_I(sbi) && !GC_I(sbi)->gc_urgent) &&
p->max_search > sbi->max_victim_search)
p->max_search = sbi->max_victim_search;
up_read(&sbi->gc_rwsem);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 33d146939048..c660efad7590 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -180,7 +180,7 @@ bool need_SSR(struct f2fs_sb_info *sbi)
return false;
 
down_read(&sbi->gc_rwsem);
-   if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
+   if (GC_I(sbi) && GC_I(sbi)->gc_urgent)
gc_urgent = true;
up_read(&sbi->gc_rwsem);
 
@@ -1429,7 +1429,7 @@ static int issue_discard_thread(void *data)
dcc->discard_wake = 0;
 
down_read(&sbi->gc_rwsem);
-   if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
+   if (GC_I(sbi) && GC_I(sbi)->gc_urgent)
init_discard_policy(&dpolicy, DPOLICY_FORCE, 1);
up_read(&sbi->gc_rwsem);
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 55ccc2eaaa2e..6bc0eb2084ff 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1476,14 +1476,14 @@ static int f2fs_remount(struct super_block *sb, int 
*flags, char *data)
 */

[PATCH v2 1/3] f2fs: fix to initialize i_current_depth according to inode type

2018-05-07 Thread Chao Yu
i_current_depth is used only for directory inodes, but its space is
shared with the i_gc_failures field used for regular inodes. In order to
avoid affecting i_gc_failures' value, this patch initializes
the union's fields according to inode type.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v2:
- rebase code.
 fs/f2fs/inode.c | 12 +---
 fs/f2fs/namei.c |  3 +++
 fs/f2fs/super.c |  1 -
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7f2fe4574c48..3a74a1cf3264 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -232,8 +232,10 @@ static int do_read_inode(struct inode *inode)
inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
inode->i_generation = le32_to_cpu(ri->i_generation);
-
-   fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+   if (S_ISDIR(inode->i_mode))
+   fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+   else if (S_ISREG(inode->i_mode))
+   fi->i_gc_failures = le16_to_cpu(ri->i_gc_failures);
fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
fi->i_flags = le32_to_cpu(ri->i_flags);
fi->flags = 0;
@@ -422,7 +424,11 @@ void update_inode(struct inode *inode, struct page 
*node_page)
ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec);
ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
-   ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth);
+   if (S_ISDIR(inode->i_mode))
+   ri->i_current_depth =
+   cpu_to_le32(F2FS_I(inode)->i_current_depth);
+   else if (S_ISREG(inode->i_mode))
+   ri->i_gc_failures = cpu_to_le16(F2FS_I(inode)->i_gc_failures);
ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid);
ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags);
ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino);
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index fef6e3ab2135..bcfc4219b29e 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -54,6 +54,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, 
umode_t mode)
F2FS_I(inode)->i_crtime = current_time(inode);
inode->i_generation = sbi->s_next_generation++;
 
+   if (S_ISDIR(inode->i_mode))
+   F2FS_I(inode)->i_current_depth = 1;
+
err = insert_inode_locked(inode);
if (err) {
err = -EINVAL;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 294be9e92aee..55ccc2eaaa2e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -826,7 +826,6 @@ static struct inode *f2fs_alloc_inode(struct super_block 
*sb)
 
/* Initialize f2fs-specific inode info */
atomic_set(&fi->dirty_pages, 0);
-   fi->i_current_depth = 1;
init_rwsem(&fi->i_sem);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
-- 
2.17.0.391.g1f1cddd558b5
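
To make the union hazard described in this patch concrete, here is a small standalone sketch. It does not use the f2fs structures: struct mem_inode, enum inode_kind and init_shared_field() are hypothetical names, and only the idea is taken from the patch, namely that the depth default must be written for directories only because a regular file reads the same bytes as its GC failure count.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical inode kinds for the sketch. */
enum inode_kind { KIND_DIR, KIND_REG, KIND_OTHER };

/* The two fields share storage, as in the union this patch deals with. */
struct mem_inode {
    enum inode_kind kind;
    union {
        unsigned int current_depth;  /* meaningful for directories only */
        unsigned short gc_failures;  /* meaningful for regular files only */
    };
};

/*
 * Initialize the shared field according to the inode type. Writing
 * current_depth = 1 unconditionally would also change the bytes that a
 * regular file later reads back as gc_failures.
 */
static void init_shared_field(struct mem_inode *mi, enum inode_kind kind)
{
    memset(mi, 0, sizeof(*mi));
    mi->kind = kind;
    if (kind == KIND_DIR)
        mi->current_depth = 1;
}
```

With the type check in place, a freshly created regular file starts with a zero failure count, while a directory still starts at depth 1.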



[PATCH v4] f2fs: fix to avoid race during access gc_thread pointer

2018-05-07 Thread Chao Yu
Thread A                Thread B                Thread C
- f2fs_remount
 - stop_gc_thread
                        - f2fs_sbi_store
                                                - issue_discard_thread
   sbi->gc_thread = NULL;
                          sbi->gc_thread->gc_wake = 1
                                                  access sbi->gc_thread->gc_urgent

Previously, we allocated memory for sbi->gc_thread based on the background
gc thread mount option, and that memory can be released if we turn off
that mount option. However, several places still access the gc_thread
pointer without considering this race condition, which results in a NULL
pointer dereference.

In order to fix this issue, introduce gc_rwsem to exclude those operations.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v4:
- use introduced sbi.gc_rwsem lock instead of sb.s_umount.
 fs/f2fs/f2fs.h|  1 +
 fs/f2fs/gc.c  |  4 
 fs/f2fs/segment.c | 11 ++-
 fs/f2fs/super.c   | 19 ++-
 fs/f2fs/sysfs.c   | 14 +++---
 5 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 80490a7991a7..e238d0ea0be7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1191,6 +1191,7 @@ struct f2fs_sb_info {
 
/* for cleaning operations */
struct mutex gc_mutex;  /* mutex for GC */
+   struct rw_semaphore gc_rwsem;   /* rw semaphore for gc_thread */
struct f2fs_gc_kthread  *gc_thread; /* GC thread */
unsigned int cur_victim_sec;/* current victim section num */
 
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 9bb2ddbbed1e..b74714be7be7 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -187,17 +187,21 @@ static void select_policy(struct f2fs_sb_info *sbi, int 
gc_type,
p->max_search = dirty_i->nr_dirty[type];
p->ofs_unit = 1;
} else {
+   down_read(&sbi->gc_rwsem);
p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
+   up_read(&sbi->gc_rwsem);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
p->max_search = dirty_i->nr_dirty[DIRTY];
p->ofs_unit = sbi->segs_per_sec;
}
 
/* we need to check every dirty segments in the FG_GC case */
+   down_read(&sbi->gc_rwsem);
if (gc_type != FG_GC &&
(sbi->gc_thread && !sbi->gc_thread->gc_urgent) &&
p->max_search > sbi->max_victim_search)
p->max_search = sbi->max_victim_search;
+   up_read(&sbi->gc_rwsem);
 
/* let's select beginning hot/small space first in no_heap mode*/
if (test_opt(sbi, NOHEAP) &&
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 320cc1c57246..33d146939048 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -174,11 +174,18 @@ bool need_SSR(struct f2fs_sb_info *sbi)
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
+   bool gc_urgent = false;
 
if (test_opt(sbi, LFS))
return false;
+
+   down_read(&sbi->gc_rwsem);
if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
-   return true;
+   gc_urgent = true;
+   up_read(&sbi->gc_rwsem);
+
+   if (gc_urgent)
+   return false;
 
return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
@@ -1421,8 +1428,10 @@ static int issue_discard_thread(void *data)
if (dcc->discard_wake)
dcc->discard_wake = 0;
 
+   down_read(&sbi->gc_rwsem);
if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
init_discard_policy(&dpolicy, DPOLICY_FORCE, 1);
+   up_read(&sbi->gc_rwsem);
 
sb_start_intwrite(sbi->sb);
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c8e5fe5d71fe..294be9e92aee 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1476,15 +1476,23 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
 * option. Also sync the filesystem.
 */
if ((*flags & SB_RDONLY) || !test_opt(sbi, BG_GC)) {
+   down_write(&sbi->gc_rwsem);
if (sbi->gc_thread) {
stop_gc_thread(sbi);
need_restart_gc = true;
}
-   } else if (!sbi->gc_thread) {
-   err = start_gc_thread(sbi);
-   if (err)
-   goto restore_opts;
-   need_stop_gc = true;
+   

Re: [PATCH v3] f2fs: fix to avoid race during access gc_thread pointer

2018-05-05 Thread Chao Yu
On 2018/5/5 18:02, Chao Yu wrote:
> Thread A                Thread B                Thread C
> - f2fs_remount
>  - stop_gc_thread
>                         - f2fs_sbi_store
>                                                 - issue_discard_thread
>    sbi->gc_thread = NULL;
>                           sbi->gc_thread->gc_wake = 1
>                                                   access sbi->gc_thread->gc_urgent
> 
> Previously, we allocated memory for sbi->gc_thread based on the background
> gc thread mount option. The memory can be released if we turn off
> that mount option, but several places still access the gc_thread
> pointer without considering the race condition, resulting in a NULL pointer
> dereference.
> 
> In order to fix this issue, use sb->s_umount to exclude those operations.

I encountered a deadlock with this patch:

 dump_stack+0x5f/0x86
 __lock_acquire+0xff7/0x10e0
 lock_acquire+0xae/0x220
 down_read+0x38/0x60                <- tries to lock s_umount again
 need_SSR+0x5d/0x160 [f2fs]
 allocate_segment_by_default+0xb7/0x1c0 [f2fs]
 allocate_data_block+0x183/0x4c0 [f2fs]
 do_write_page+0x52/0x80 [f2fs]
 write_data_page+0x4a/0xd0 [f2fs]
 do_write_data_page+0x327/0x630 [f2fs]
 __write_data_page+0x34b/0x800 [f2fs]
 __f2fs_write_data_pages+0x3f1/0x8e0 [f2fs]
 f2fs_write_data_pages+0x27/0x30 [f2fs]
 do_writepages+0x1a/0x70
 __writeback_single_inode+0x55/0x7e0
 writeback_sb_inodes+0x21b/0x490
 __writeback_inodes_wb+0x7c/0xb0    <- trylock_super already holds s_umount
 wb_writeback+0x3e2/0x580
 wb_workfn+0x251/0x6b0
 process_one_work+0x196/0x550
 worker_thread+0x31/0x360
 kthread+0xe3/0x110
 ret_from_fork+0x2e/0x38

So, would it be better to introduce a private lock to avoid the race condition?

Thanks,

> 
> Signed-off-by: Chao Yu <yuch...@huawei.com>
> ---
> v3:
> - use sb->s_umount to make all race cases exclusive.
>  fs/f2fs/gc.c  |  4 
>  fs/f2fs/segment.c | 11 ++-
>  fs/f2fs/sysfs.c   | 14 +++---
>  3 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 9bb2ddbbed1e..d7d469f9be0a 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -187,17 +187,21 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
>   p->max_search = dirty_i->nr_dirty[type];
>   p->ofs_unit = 1;
>   } else {
> + down_read(&sbi->sb->s_umount);
>   p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
> + up_read(&sbi->sb->s_umount);
>   p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
>   p->max_search = dirty_i->nr_dirty[DIRTY];
>   p->ofs_unit = sbi->segs_per_sec;
>   }
>  
>   /* we need to check every dirty segments in the FG_GC case */
> + down_read(&sbi->sb->s_umount);
>   if (gc_type != FG_GC &&
>   (sbi->gc_thread && !sbi->gc_thread->gc_urgent) &&
>   p->max_search > sbi->max_victim_search)
>   p->max_search = sbi->max_victim_search;
> + up_read(&sbi->sb->s_umount);
>  
>   /* let's select beginning hot/small space first in no_heap mode*/
>   if (test_opt(sbi, NOHEAP) &&
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 320cc1c57246..74e184ab0544 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -174,11 +174,18 @@ bool need_SSR(struct f2fs_sb_info *sbi)
>   int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
>   int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
>   int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
> + bool gc_urgent = false;
>  
>   if (test_opt(sbi, LFS))
>   return false;
> +
> + down_read(&sbi->sb->s_umount);
>   if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
> - return true;
> + gc_urgent = true;
> + up_read(&sbi->sb->s_umount);
> +
> + if (gc_urgent)
> + return false;
>  
>   return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
>   SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
> @@ -1421,8 +1428,10 @@ static int issue_discard_thread(void *data)
>   if (dcc->discard_wake)
>   dcc->discard_wake = 0;
>  
> + down_read(&sbi->sb->s_umount);
>   if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
>   init_discard_policy(&dpolicy, DPOLICY_FORCE, 1);
> + up_read(&sbi->sb->s_umount);
>  
>   sb_start_intwrite(sbi->sb);
>

[PATCH 3/3] f2fs: avoid stucking GC due to atomic write

2018-05-05 Thread Chao Yu
f2fs doesn't allow abuse of the atomic write class interface, so besides
limiting the total memory usage of in-mem pages, we need to limit
atomic-write usage as well when the filesystem is seriously fragmented.
Otherwise we may run into an infinite loop during foreground GC, because
target blocks in the victim segment belong to a file that has been opened
for atomic write for a long time.

Now, we detect failures due to atomic write in foreground GC. If
the count exceeds a threshold, we drop all atomic written data in
cache. I expect this to keep the system running safely and to
prevent DoS attacks.
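
The counting scheme described above can be sketched roughly as follows. This is a simplified userspace illustration, not the code in this patch: the threshold value, the struct, and the function name are hypothetical, and resetting the counter after a drop is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold; the value used by the patch is not shown here. */
#define DEMO_MAX_SKIP_ATOMIC 16

/* Hypothetical bookkeeping for foreground GC passes. */
struct gc_stats_demo {
	unsigned long long skipped_atomic;  /* passes blocked by atomic files */
	unsigned int drops;                 /* times atomic data was dropped */
};

/*
 * Account for one foreground GC pass. Returns true when the failure count
 * has exceeded the threshold and the caller should drop all atomic written
 * data held in cache.
 */
static bool demo_account_fg_gc(struct gc_stats_demo *st, bool blocked_by_atomic)
{
	if (!blocked_by_atomic)
		return false;

	st->skipped_atomic++;
	if (st->skipped_atomic <= DEMO_MAX_SKIP_ATOMIC)
		return false;

	/* Threshold exceeded: record the drop and start counting again. */
	st->drops++;
	st->skipped_atomic = 0;
	return true;
}
```

With this shape, foreground GC can only be blocked by atomic files a bounded number of times before the cached atomic data is discarded.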

In addition, this patch shows GC skip information in debugfs;
for now it just shows the count of skips caused by atomic write.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/data.c|  2 +-
 fs/f2fs/debug.c   |  8 
 fs/f2fs/f2fs.h| 19 +--
 fs/f2fs/file.c| 20 ++--
 fs/f2fs/gc.c  | 31 +++
 fs/f2fs/gc.h  |  3 +++
 fs/f2fs/inode.c   |  6 --
 fs/f2fs/segment.c | 11 ++-
 fs/f2fs/segment.h |  2 ++
 9 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index f7365ce45450..e394b5486c91 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2320,7 +2320,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
f2fs_put_page(page, 1);
f2fs_write_failed(mapping, pos + len);
if (drop_atomic)
-   drop_inmem_pages_all(sbi);
+   drop_inmem_pages_all(sbi, false);
return err;
 }
 
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index d92a01cb420c..159427a5549c 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -104,6 +104,10 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->avail_nids = NM_I(sbi)->available_nids;
si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID];
si->bg_gc = sbi->bg_gc;
+   si->skipped_atomic_files[BG_GC] =
+   sbi->gc_thread->skipped_atomic_files[BG_GC];
+   si->skipped_atomic_files[FG_GC] =
+   sbi->gc_thread->skipped_atomic_files[FG_GC];
si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
/ 2;
@@ -342,6 +346,10 @@ static int stat_show(struct seq_file *s, void *v)
si->bg_data_blks);
seq_printf(s, "  - node blocks : %d (%d)\n", si->node_blks,
si->bg_node_blks);
+   seq_printf(s, "Skipped : atomic write %llu (%llu)\n",
+   si->skipped_atomic_files[BG_GC] +
+   si->skipped_atomic_files[FG_GC],
+   si->skipped_atomic_files[BG_GC]);
seq_puts(s, "\nExtent Cache:\n");
seq_printf(s, "  - Hit Count: L1-1:%llu L1-2:%llu L2:%llu\n",
si->hit_largest, si->hit_cached,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 06ca1e218c01..38a951917ade 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -617,15 +617,20 @@ enum {
 
 #define DEF_DIR_LEVEL  0
 
+enum {
+   GC_FAILURE_PIN,
+   GC_FAILURE_ATOMIC,
+   MAX_GC_FAILURE
+};
+
 struct f2fs_inode_info {
struct inode vfs_inode; /* serve a vfs inode */
unsigned long i_flags;  /* keep an inode flags for ioctl */
unsigned char i_advise; /* use to give file attribute hints */
unsigned char i_dir_level;  /* use for dentry level for large dir */
-   union {
-   unsigned int i_current_depth;   /* only for directory depth */
-   unsigned short i_gc_failures;   /* only for regular file */
-   };
+   unsigned int i_current_depth;   /* only for directory depth */
+   /* for gc failure statistic */
+   unsigned int i_gc_failures[MAX_GC_FAILURE];
unsigned int i_pino;/* parent inode number */
umode_t i_acl_mode; /* keep file acl mode temporarily */
 
@@ -2241,6 +2246,7 @@ enum {
FI_EXTRA_ATTR,  /* indicate file has extra attribute */
FI_PROJ_INHERIT,/* indicate file inherits projectid */
FI_PIN_FILE,/* indicate file should not be gced */
+   FI_ATOMIC_REVOKE_REQUEST,/* indicate atomic committed data has been dropped */
 };
 
 static inline void __mark_inode_dirty_flag(struct inode *inode,
@@ -2339,7 +2345,7 @@ static inline void f2fs_i_depth_write(struct inode *inode, unsigned int depth)
 static inline void f2fs_i_gc_failures_write(struct inode *inode,
unsigned int count)
 {
-   F2FS_I(inode)->i_gc_failures = count;
+   F

[PATCH 1/3] f2fs: fix to initialize i_current_depth according to inode type

2018-05-05 Thread Chao Yu
i_current_depth is used only for directory inodes, but its space is
shared with the i_gc_failures field used for regular inodes. In order to
avoid affecting the value of i_gc_failures, this patch initializes
the union's fields according to inode type.
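
A minimal userspace sketch of the aliasing problem, assuming a simplified struct (inode_info_demo and demo_init_inode are hypothetical names, not f2fs code): because both fields share storage, an unconditional "depth = 1" would also change what a regular file reads back as its GC failure count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the shared fields in f2fs_inode_info. */
struct inode_info_demo {
	bool is_dir;
	union {
		unsigned int i_current_depth;   /* meaningful for directories only */
		unsigned short i_gc_failures;   /* meaningful for regular files only */
	};
};

/* Initialize only the union member that matches the inode type. */
static void demo_init_inode(struct inode_info_demo *fi, bool is_dir)
{
	fi->is_dir = is_dir;
	/* Clear the shared storage through its largest member first. */
	fi->i_current_depth = 0;
	if (is_dir)
		fi->i_current_depth = 1;
	/* Regular files keep i_gc_failures at zero. */
}
```

Initializing by type keeps the regular-file counter from being seeded with a directory depth.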

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/inode.c | 12 +---
 fs/f2fs/namei.c |  3 +++
 fs/f2fs/super.c |  1 -
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7f2fe4574c48..3a74a1cf3264 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -232,8 +232,10 @@ static int do_read_inode(struct inode *inode)
inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
inode->i_generation = le32_to_cpu(ri->i_generation);
-
-   fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+   if (S_ISDIR(inode->i_mode))
+   fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+   else if (S_ISREG(inode->i_mode))
+   fi->i_gc_failures = le16_to_cpu(ri->i_gc_failures);
fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
fi->i_flags = le32_to_cpu(ri->i_flags);
fi->flags = 0;
@@ -422,7 +424,11 @@ void update_inode(struct inode *inode, struct page *node_page)
ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec);
ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
-   ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth);
+   if (S_ISDIR(inode->i_mode))
+   ri->i_current_depth =
+   cpu_to_le32(F2FS_I(inode)->i_current_depth);
+   else if (S_ISREG(inode->i_mode))
+   ri->i_gc_failures = cpu_to_le16(F2FS_I(inode)->i_gc_failures);
ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid);
ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags);
ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino);
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index fef6e3ab2135..bcfc4219b29e 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -54,6 +54,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
F2FS_I(inode)->i_crtime = current_time(inode);
inode->i_generation = sbi->s_next_generation++;
 
+   if (S_ISDIR(inode->i_mode))
+   F2FS_I(inode)->i_current_depth = 1;
+
err = insert_inode_locked(inode);
if (err) {
err = -EINVAL;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c8e5fe5d71fe..8e5f0a178f5d 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -826,7 +826,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
 
/* Initialize f2fs-specific inode info */
atomic_set(&fi->dirty_pages, 0);
-   fi->i_current_depth = 1;
init_rwsem(&fi->i_sem);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
-- 
2.17.0.391.g1f1cddd558b5



[PATCH 2/3] f2fs: introduce GC_I for cleanup

2018-05-05 Thread Chao Yu
Introduce GC_I to replace sbi->gc_thread for cleanup, no logic changes.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
 fs/f2fs/debug.c   |  2 +-
 fs/f2fs/f2fs.h|  5 +
 fs/f2fs/gc.c  | 14 +++---
 fs/f2fs/segment.c |  4 ++--
 fs/f2fs/super.c   |  4 ++--
 fs/f2fs/sysfs.c   |  8 
 6 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index a66107b5cfff..d92a01cb420c 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -221,7 +221,7 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->cache_mem = 0;
 
/* build gc */
-   if (sbi->gc_thread)
+   if (GC_I(sbi))
si->cache_mem += sizeof(struct f2fs_gc_kthread);
 
/* build merge flush thread */
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 80490a7991a7..06ca1e218c01 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1411,6 +1411,11 @@ static inline struct sit_info *SIT_I(struct f2fs_sb_info *sbi)
return (struct sit_info *)(SM_I(sbi)->sit_info);
 }
 
+static inline struct f2fs_gc_kthread *GC_I(struct f2fs_sb_info *sbi)
+{
+   return (struct f2fs_gc_kthread *)(sbi->gc_thread);
+}
+
 static inline struct free_segmap_info *FREE_I(struct f2fs_sb_info *sbi)
 {
return (struct free_segmap_info *)(SM_I(sbi)->free_info);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index d7d469f9be0a..812189dd06e5 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -26,8 +26,8 @@
 static int gc_thread_func(void *data)
 {
struct f2fs_sb_info *sbi = data;
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
-   wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
+   wait_queue_head_t *wq = &gc_th->gc_wait_queue_head;
unsigned int wait_ms;
 
wait_ms = gc_th->min_sleep_time;
@@ -136,8 +136,8 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
gc_th->gc_wake= 0;
 
sbi->gc_thread = gc_th;
-   init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
-   sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
+   init_waitqueue_head(&gc_th->gc_wait_queue_head);
+   gc_th->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
"f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev));
if (IS_ERR(gc_th->f2fs_gc_task)) {
err = PTR_ERR(gc_th->f2fs_gc_task);
@@ -150,7 +150,7 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
 
 void stop_gc_thread(struct f2fs_sb_info *sbi)
 {
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
if (!gc_th)
return;
kthread_stop(gc_th->f2fs_gc_task);
@@ -188,7 +188,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
p->ofs_unit = 1;
} else {
down_read(&sbi->sb->s_umount);
-   p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
+   p->gc_mode = select_gc_type(GC_I(sbi), gc_type);
up_read(&sbi->sb->s_umount);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
p->max_search = dirty_i->nr_dirty[DIRTY];
@@ -198,7 +198,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
/* we need to check every dirty segments in the FG_GC case */
down_read(&sbi->sb->s_umount);
if (gc_type != FG_GC &&
-   (sbi->gc_thread && !sbi->gc_thread->gc_urgent) &&
+   (GC_I(sbi) && !GC_I(sbi)->gc_urgent) &&
p->max_search > sbi->max_victim_search)
p->max_search = sbi->max_victim_search;
up_read(&sbi->sb->s_umount);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 74e184ab0544..ef7d46c106df 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -180,7 +180,7 @@ bool need_SSR(struct f2fs_sb_info *sbi)
return false;
 
down_read(&sbi->sb->s_umount);
-   if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
+   if (GC_I(sbi) && GC_I(sbi)->gc_urgent)
gc_urgent = true;
up_read(&sbi->sb->s_umount);
 
@@ -1429,7 +1429,7 @@ static int issue_discard_thread(void *data)
dcc->discard_wake = 0;
 
down_read(&sbi->sb->s_umount);
-   if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
+   if (GC_I(sbi) && GC_I(sbi)->gc_urgent)
init_discard_policy(&dpolicy, DPOLICY_FORCE, 1);
up_read(&sbi->sb->s_umount);
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 8e5f0a178f5d..77ad8aa7c1ed 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1475,11 +1475,11 @@ static int f2fs_re

[PATCH v3] f2fs: fix to avoid race during access gc_thread pointer

2018-05-05 Thread Chao Yu
Thread A                Thread B                Thread C
- f2fs_remount
 - stop_gc_thread
                        - f2fs_sbi_store
                                                - issue_discard_thread
   sbi->gc_thread = NULL;
                          sbi->gc_thread->gc_wake = 1
                                                  access sbi->gc_thread->gc_urgent

Previously, we allocated memory for sbi->gc_thread based on the background
gc thread mount option. The memory can be released if we turn off
that mount option, but several places still access the gc_thread
pointer without considering the race condition, resulting in a NULL pointer
dereference.

In order to fix this issue, use sb->s_umount to exclude those operations.

Signed-off-by: Chao Yu <yuch...@huawei.com>
---
v3:
- use sb->s_umount to make all race cases exclusive.
 fs/f2fs/gc.c  |  4 
 fs/f2fs/segment.c | 11 ++-
 fs/f2fs/sysfs.c   | 14 +++---
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 9bb2ddbbed1e..d7d469f9be0a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -187,17 +187,21 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
p->max_search = dirty_i->nr_dirty[type];
p->ofs_unit = 1;
} else {
+   down_read(&sbi->sb->s_umount);
p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
+   up_read(&sbi->sb->s_umount);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
p->max_search = dirty_i->nr_dirty[DIRTY];
p->ofs_unit = sbi->segs_per_sec;
}
 
/* we need to check every dirty segments in the FG_GC case */
+   down_read(&sbi->sb->s_umount);
if (gc_type != FG_GC &&
(sbi->gc_thread && !sbi->gc_thread->gc_urgent) &&
p->max_search > sbi->max_victim_search)
p->max_search = sbi->max_victim_search;
+   up_read(&sbi->sb->s_umount);
 
/* let's select beginning hot/small space first in no_heap mode*/
if (test_opt(sbi, NOHEAP) &&
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 320cc1c57246..74e184ab0544 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -174,11 +174,18 @@ bool need_SSR(struct f2fs_sb_info *sbi)
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
+   bool gc_urgent = false;
 
if (test_opt(sbi, LFS))
return false;
+
+   down_read(&sbi->sb->s_umount);
if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
-   return true;
+   gc_urgent = true;
+   up_read(&sbi->sb->s_umount);
+
+   if (gc_urgent)
+   return false;
 
return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
@@ -1421,8 +1428,10 @@ static int issue_discard_thread(void *data)
if (dcc->discard_wake)
dcc->discard_wake = 0;
 
+   down_read(&sbi->sb->s_umount);
if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
init_discard_policy(&dpolicy, DPOLICY_FORCE, 1);
+   up_read(&sbi->sb->s_umount);
 
sb_start_intwrite(sbi->sb);
 
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 6d8d8f41e517..1cba68812b32 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -173,6 +173,7 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
unsigned long t;
unsigned int *ui;
ssize_t ret;
+   bool gc_entry = (a->struct_type == GC_THREAD);
 
ptr = __struct_ptr(sbi, a->struct_type);
if (!ptr)
@@ -248,16 +249,23 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
if (!strcmp(a->attr.name, "trim_sections"))
return -EINVAL;
 
-   *ui = t;
-
-   if (!strcmp(a->attr.name, "iostat_enable") && *ui == 0)
+   if (!strcmp(a->attr.name, "iostat_enable") && t == 0)
f2fs_reset_iostat(sbi);
+
+   if (gc_entry)
+   down_read(&sbi->sb->s_umount);
+
if (!strcmp(a->attr.name, "gc_urgent") && t == 1 && sbi->gc_thread) {
sbi->gc_thread->gc_wake = 1;
wake_up_interruptible_all(&sbi->gc_thread->gc_wait_queue_head);
wake_up_discard_thread(sbi, true);
}
 
+   *ui = t;
+
+   if (gc_entry)
+   up_read(&sbi->sb->s_umount);
+
return count;
 }
 
-- 
2.17.0.391.g1f1cddd558b5


