Re: [f2fs-dev] [GIT PULL] vfs: Add support for timestamp limits

2019-08-31 Thread Deepa Dinamani
> I think it's unclear from the orangefs source code what the intention is,
> as there is a mix of signed and unsigned types used for the inode
> timestamps:
>
> #define encode_PVFS_time encode_int64_t
> #define encode_int64_t(pptr,x) do { \
> *(int64_t*) *(pptr) = cpu_to_le64(*(x)); \
> *(pptr) += 8; \
> } while (0)
> #define decode_PVFS_time decode_int64_t
> #define decode_int64_t(pptr,x) do { \
> *(x) = le64_to_cpu(*(int64_t*) *(pptr)); \
> *(pptr) += 8; \
> } while (0)
>
> This suggests that making it unsigned may have been an accident.
>
> Then again, it's clearly and consistently printed as unsigned in
> user space:
>
> gossip_debug(
> GOSSIP_GETATTR_DEBUG, " VERSION is %llu, mtime is %llu\n",
> llu(s_op->attr.mtime), llu(resp_attr->mtime));

I think I had noticed these two and decided maybe the intention was to
use unsigned types.

> A related issue I noticed is this:
>
> PVFS_time PINT_util_mktime_version(PVFS_time time)
> {
> struct timeval t = {0,0};
> PVFS_time version = (time << 32);
>
> gettimeofday(&t, NULL);
> version |= (PVFS_time)t.tv_usec;
> return version;
> }
> PVFS_time PINT_util_mkversion_time(PVFS_time version)
> {
> return (PVFS_time)(version >> 32);
> }
> static PINT_sm_action getattr_verify_attribs(
> struct PINT_smcb *smcb, job_status_s *js_p)
> {
> ...
> resp_attr->mtime = PINT_util_mkversion_time(s_op->attr.mtime);
> ...
> }
>
> which suggests that at least for some purposes, the mtime field
> is only an unsigned 32-bit number (1970..2106). From my reading,
> this affects the on-disk format, but not the protocol implemented
> by the kernel.
>
> atime and ctime are apparently 64-bit, but mtime is only 32-bit
> seconds, plus a 32-bit 'version'. I suppose the server could be
> fixed to allow a larger range, but probably would take it out of
> the 'version' bits, not the upper half.

I had missed this part. Thanks.
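
To spell out why that caps mtime at 32 bits: the seconds are shifted into
the upper half of the 64-bit version, so any value that does not fit in
32 bits is silently truncated on the round trip. A standalone illustration
(plain C written for this mail, not PVFS code; uint64_t is used to keep
the shifts well-defined, while PVFS_time itself is an int64_t):

#include <inttypes.h>
#include <stdio.h>

/* same shape as PINT_util_mktime_version()/PINT_util_mkversion_time() */
static uint64_t mktime_version(uint64_t time, uint64_t usec)
{
	return (time << 32) | usec;	/* only the low 32 bits of 'time' survive */
}

static uint64_t mkversion_time(uint64_t version)
{
	return version >> 32;
}

int main(void)
{
	uint64_t t = (uint64_t)1 << 32;	/* first second past 2106-02-07 */

	/* prints 0: the timestamp wrapped back to 1970 */
	printf("%" PRIu64 "\n", mkversion_time(mktime_version(t, 0)));
	return 0;
}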

> To be on the safe side, I suppose the kernel can only assume
> an unsigned 32-bit range to be available. If the server gets
> extended beyond that, it would have to pass a feature flag.

This makes sense to me also. And, as Arnd pointed out on IRC, if
there are negative timestamps already in use, this will be a
problem for those use cases.
I can update the patch to use the limits 0..U32_MAX.
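
For concreteness, a minimal sketch of what that update could look like,
assuming the s_time_min/s_time_max superblock fields added earlier in this
series (the exact hook point in orangefs_fill_sb() is my assumption, for
illustration only):

	/* in orangefs_fill_sb(), alongside the other sb setup;
	 * hypothetical placement, not an actual orangefs commit */
	sb->s_time_min = 0;		/* 1970-01-01T00:00:00Z */
	sb->s_time_max = U32_MAX;	/* ~2106-02-07 */

With those limits set, the generic clamping added by this series would
then handle out-of-range values.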

-Deepa




[PATCH v2 2/2] f2fs: fix to reserve space for IO align feature

2019-08-31 Thread Chao Yu
https://bugzilla.kernel.org/show_bug.cgi?id=204137

With the script below, we will hit a panic during new segment allocation:

DISK=bingo.img
MOUNT_DIR=/mnt/f2fs

dd if=/dev/zero of=$DISK bs=1M count=105
mkfs.f2fs -a 1 -o 19 -t 1 -z 1 -f -q $DISK

mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"

for (( i = 0; i < 4096; i++ )); do
name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
mkdir $MOUNT_DIR/$name
done

umount $MOUNT_DIR
rm $DISK

--- Core dump ---
Call Trace:
 allocate_segment_by_default+0x9d/0x100 [f2fs]
 f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
 do_write_page+0x62/0x110 [f2fs]
 f2fs_outplace_write_data+0x43/0xc0 [f2fs]
 f2fs_do_write_data_page+0x386/0x560 [f2fs]
 __write_data_page+0x706/0x850 [f2fs]
 f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
 f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
 do_writepages+0x1c/0x70
 __filemap_fdatawrite_range+0xaa/0xe0
 filemap_fdatawrite+0x1f/0x30
 f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
 block_operations+0xdc/0x350 [f2fs]
 f2fs_write_checkpoint+0x104/0x1150 [f2fs]
 f2fs_sync_fs+0xa2/0x120 [f2fs]
 f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
 f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
 do_writepages+0x1c/0x70
 __writeback_single_inode+0x45/0x320
 writeback_sb_inodes+0x273/0x5c0
 wb_writeback+0xff/0x2e0
 wb_workfn+0xa1/0x370
 process_one_work+0x138/0x350
 worker_thread+0x4d/0x3d0
 kthread+0x109/0x140
 ret_from_fork+0x25/0x30

The root cause is that, with the IO alignment feature enabled, in the
worst case we need F2FS_IO_SIZE() free blocks for a single 4KB write,
because dummy pages are filled in to keep the IO aligned.
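
For scale: as I read f2fs.h, F2FS_IO_SIZE() expands to 1 << io_bits
blocks, so the io_bits=7 mount in the script above can eat up to 128
blocks (512KB) of free space for a single 4KB write:

	/* fs/f2fs/f2fs.h (paraphrased; check your tree) */
	#define F2FS_IO_SIZE(sbi)	(1 << F2FS_OPTION(sbi).write_io_size_bits)

	/* io_bits=7: 1 << 7 = 128 blocks * 4KB = 512KB per worst-case write */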

So we can easily run out of free segments during a non-inline directory's
data writeback, even in the middle of foreground GC.

In order to fix this issue, I propose reserving additional free space
for the IO alignment feature to handle the worst-case free space usage
ratio during FGGC.

Fixes: 0a595ebaaa6b ("f2fs: support IO alignment for DATA and NODE writes")
Signed-off-by: Chao Yu 
---
v2:
- just rebase, no logic change
 fs/f2fs/f2fs.h    |  5 +
 fs/f2fs/segment.h |  3 ++-
 fs/f2fs/super.c   | 43 +++
 fs/f2fs/sysfs.c   |  4 +++-
 4 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9c010e6cba5c..84fa41845e18 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -920,6 +920,7 @@ struct f2fs_sm_info {
unsigned int segment_count; /* total # of segments */
unsigned int main_segments; /* # of segments in main area */
unsigned int reserved_segments; /* # of reserved segments */
+   unsigned int additional_reserved_segments;/* reserved segs for IO align feature */
unsigned int ovp_segments;  /* # of overprovision segments */
 
/* a threshold to reclaim prefree segments */
@@ -1767,6 +1768,10 @@ static inline unsigned int get_available_block_count(struct f2fs_sb_info *sbi,
if (!__allow_reserved_blocks(sbi, inode, cap))
avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks;
 
+   if (F2FS_IO_ALIGNED(sbi))
+   avail_user_block_count -= sbi->blocks_per_seg *
+   SM_I(sbi)->additional_reserved_segments;
+
if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
if (avail_user_block_count > sbi->unusable_block_count)
avail_user_block_count -= sbi->unusable_block_count;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 325781a1ae4d..78d0f7b4c47a 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -508,7 +508,8 @@ static inline unsigned int free_segments(struct f2fs_sb_info *sbi)
 
 static inline int reserved_segments(struct f2fs_sb_info *sbi)
 {
-   return SM_I(sbi)->reserved_segments;
+   return SM_I(sbi)->reserved_segments +
+   SM_I(sbi)->additional_reserved_segments;
 }
 
 static inline unsigned int free_sections(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 46d10a94721d..fd5efcd89f6e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -277,6 +277,45 @@ static inline void limit_reserve_root(struct f2fs_sb_info *sbi)
   F2FS_OPTION(sbi).s_resgid));
 }
 
+static inline int adjust_reserved_segment(struct f2fs_sb_info *sbi)
+{
+   unsigned int sec_blks = sbi->blocks_per_seg * sbi->segs_per_sec;
+   unsigned int avg_vblocks;
+   unsigned int wanted_reserved_segments;
+   block_t avail_user_block_count;
+
+   if (!F2FS_IO_ALIGNED(sbi))
+   return 0;
+
+   /* average valid block count in section in worst case */
+   avg_vblocks = sec_blks / F2FS_IO_SIZE(sbi);
+
+   /*
+* we need enough free space when migrating one section in worst case
+*/
+   wanted_reserved_segments = (F2FS_IO_SIZE(sbi) / avg_vblocks) *
+   

[PATCH v2 1/2] f2fs: introduce get_available_block_count() for cleanup

2019-08-31 Thread Chao Yu
inc_valid_block_count() and inc_valid_node_count() contain very
similar code for calculating the available user block count.

This patch introduces a new helper, get_available_block_count(),
to hold the common code, and uses it in both places as a cleanup.

Signed-off-by: Chao Yu 
---
v2:
- fix panic during recovery
 fs/f2fs/f2fs.h | 47 +++
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a89ad8cab821..9c010e6cba5c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1756,6 +1756,27 @@ static inline bool __allow_reserved_blocks(struct f2fs_sb_info *sbi,
return false;
 }
 
+static inline unsigned int get_available_block_count(struct f2fs_sb_info *sbi,
+   struct inode *inode, bool cap)
+{
+   block_t avail_user_block_count;
+
+   avail_user_block_count = sbi->user_block_count -
+   sbi->current_reserved_blocks;
+
+   if (!__allow_reserved_blocks(sbi, inode, cap))
+   avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks;
+
+   if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
+   if (avail_user_block_count > sbi->unusable_block_count)
+   avail_user_block_count -= sbi->unusable_block_count;
+   else
+   avail_user_block_count = 0;
+   }
+
+   return avail_user_block_count;
+}
+
 static inline void f2fs_i_blocks_write(struct inode *, block_t, bool, bool);
 static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
 struct inode *inode, blkcnt_t *count)
@@ -1782,17 +1803,8 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
 
spin_lock(&sbi->stat_lock);
sbi->total_valid_block_count += (block_t)(*count);
-   avail_user_block_count = sbi->user_block_count -
-   sbi->current_reserved_blocks;
+   avail_user_block_count = get_available_block_count(sbi, inode, true);
 
-   if (!__allow_reserved_blocks(sbi, inode, true))
-   avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks;
-   if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
-   if (avail_user_block_count > sbi->unusable_block_count)
-   avail_user_block_count -= sbi->unusable_block_count;
-   else
-   avail_user_block_count = 0;
-   }
if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
diff = sbi->total_valid_block_count - avail_user_block_count;
if (diff > *count)
@@ -2005,7 +2017,8 @@ static inline int inc_valid_node_count(struct f2fs_sb_info *sbi,
struct inode *inode, bool is_inode)
 {
block_t valid_block_count;
-   unsigned int valid_node_count, user_block_count;
+   unsigned int valid_node_count;
+   unsigned int avail_user_block_count;
int err;
 
if (is_inode) {
@@ -2027,16 +2040,10 @@ static inline int inc_valid_node_count(struct f2fs_sb_info *sbi,
 
spin_lock(&sbi->stat_lock);
 
-   valid_block_count = sbi->total_valid_block_count +
-   sbi->current_reserved_blocks + 1;
-
-   if (!__allow_reserved_blocks(sbi, inode, false))
-   valid_block_count += F2FS_OPTION(sbi).root_reserved_blocks;
-   user_block_count = sbi->user_block_count;
-   if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
-   user_block_count -= sbi->unusable_block_count;
+   valid_block_count = sbi->total_valid_block_count + 1;
+   avail_user_block_count = get_available_block_count(sbi, inode, false);
 
-   if (unlikely(valid_block_count > user_block_count)) {
+   if (unlikely(valid_block_count > avail_user_block_count)) {
spin_unlock(&sbi->stat_lock);
goto enospc;
}
-- 
2.18.0.rc1



Re: [f2fs-dev] [PATCH v4] f2fs: add bio cache for IPU

2019-08-31 Thread Chao Yu
On 2019/2/19 16:15, Chao Yu wrote:
> @@ -1976,10 +2035,13 @@ static int __write_data_page(struct page *page, bool *submitted,
>   }
>  
>   unlock_page(page);
> - if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode))
> + if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
> + f2fs_submit_ipu_bio(sbi, bio, page);
>   f2fs_balance_fs(sbi, need_balance_fs);
> + }

The above bio submission was added to avoid the deadlock below:

- __write_data_page
 - f2fs_do_write_data_page
  - set_page_writeback			--- set writeback flag
  - f2fs_inplace_write_data
 - f2fs_balance_fs
  - f2fs_gc
   - do_garbage_collect
- gc_data_segment
 - move_data_page
  - f2fs_wait_on_page_writeback
   - wait_on_page_writeback  --- wait writeback

However, it breaks the merging of IPU IOs. To solve this issue, it looks
like we need a global bio cache for the IPU merge case; later,
f2fs_wait_on_page_writeback() can check whether the page under writeback
is still held in the cached bio, and submit that bio if necessary.
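
Roughly, I imagine something like the sketch below. All of the helper
names (f2fs_in_ipu_bio(), f2fs_submit_merged_ipu_write(), the bio cached
in sbi) are hypothetical and only illustrate the flow, and the
f2fs_wait_on_page_writeback() signature is approximated from the current
tree:

void f2fs_wait_on_page_writeback(struct page *page,
				enum page_type type, bool ordered)
{
	if (PageWriteback(page)) {
		struct f2fs_sb_info *sbi = F2FS_P_SB(page);

		/* if the page sits in the cached, not-yet-submitted IPU
		 * bio, submit that bio first so the wait can make progress */
		if (f2fs_in_ipu_bio(sbi, page))			/* hypothetical */
			f2fs_submit_merged_ipu_write(sbi);	/* hypothetical */
		if (ordered)
			wait_on_page_writeback(page);
	}
}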

Jaegeuk, any thoughts?

Thanks,

