Re: [f2fs-dev] [PATCH v2] f2fs: fix max orphan inodes calculation
-- >8 --
From ce2462523dd5940b59f770c09a50d4babff5fcdb Mon Sep 17 00:00:00 2001
From: Changman Lee cm224@samsung.com
Date: Mon, 9 Mar 2015 08:07:04 +0900
Subject: [PATCH] f2fs: cleanup statement about max orphan inodes calc

Through each macro, we can read the meaning easily.

Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/checkpoint.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 53bc328..384bfc4 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1104,13 +1104,6 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi)
 		im->ino_num = 0;
 	}
 
-	/*
-	 * considering 512 blocks in a segment 8+cp_payload blocks are
-	 * needed for cp and log segment summaries. Remaining blocks are
-	 * used to keep orphan entries with the limitation one reserved
-	 * segment for cp pack we can have max 1020*(504-cp_payload)
-	 * orphan entries
-	 */
 	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
 			NR_CURSEG_TYPE - __cp_payload(sbi)) *
 			F2FS_ORPHANS_PER_BLOCK;
-- 
1.9.1

--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes()
On Fri, Feb 27, 2015 at 01:13:14PM +0100, Sebastian Andrzej Siewior wrote:

In a preempt-off environment, with a lot of FS activity (write/delete), I run into a CPU stall:

| NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
| Modules linked in:
| CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: GW 3.19.0-00010-g10c11c51ffed #153
| Workqueue: writeback bdi_writeback_workfn (flush-179:0)
| task: df23 ti: df23e000 task.ti: df23e000
| PC is at __submit_merged_bio+0x6c/0x110
| LR is at f2fs_submit_merged_bio+0x74/0x80
…
| [c00085c4] (gic_handle_irq) from [c0012e84] (__irq_svc+0x44/0x5c)
| Exception stack(0xdf23fb48 to 0xdf23fb90)
| fb40: deef3484 0001 0001 0027 deef3484
| fb60: deef3440 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
| fb80: c02191f0 c0218fa0 6013
| [c0012e84] (__irq_svc) from [c0218fa0] (__submit_merged_bio+0x6c/0x110)
| [c0218fa0] (__submit_merged_bio) from [c02191f0] (f2fs_submit_merged_bio+0x74/0x80)
| [c02191f0] (f2fs_submit_merged_bio) from [c021624c] (sync_dirty_dir_inodes+0x70/0x78)
| [c021624c] (sync_dirty_dir_inodes) from [c0216358] (write_checkpoint+0x104/0xc10)
| [c0216358] (write_checkpoint) from [c021231c] (f2fs_sync_fs+0x80/0xbc)
| [c021231c] (f2fs_sync_fs) from [c0221eb8] (f2fs_balance_fs_bg+0x4c/0x68)
| [c0221eb8] (f2fs_balance_fs_bg) from [c021e9b8] (f2fs_write_node_pages+0x40/0x110)
| [c021e9b8] (f2fs_write_node_pages) from [c00de620] (do_writepages+0x34/0x48)
| [c00de620] (do_writepages) from [c0145714] (__writeback_single_inode+0x50/0x228)
| [c0145714] (__writeback_single_inode) from [c0146184] (writeback_sb_inodes+0x1a8/0x378)
| [c0146184] (writeback_sb_inodes) from [c01463e4] (__writeback_inodes_wb+0x90/0xc8)
| [c01463e4] (__writeback_inodes_wb) from [c01465f8] (wb_writeback+0x1dc/0x28c)
| [c01465f8] (wb_writeback) from [c0146dd8] (bdi_writeback_workfn+0x2ac/0x460)
| [c0146dd8] (bdi_writeback_workfn) from [c003c3fc] (process_one_work+0x11c/0x3a4)
| [c003c3fc] (process_one_work) from [c003c844] (worker_thread+0x17c/0x490)
| [c003c844] (worker_thread) from [c0041398] (kthread+0xec/0x100)
| [c0041398] (kthread) from [c000ed10] (ret_from_fork+0x14/0x24)

As it turns out, the code loops in sync_dirty_dir_inodes() and waits for others to make progress, but since it never leaves the CPU, no progress is made. At the time of this stall, there is also an rm process blocked:

| rm R running 0 1989 1774 0x
| [c047c55c] (__schedule) from [c00486dc] (__cond_resched+0x30/0x4c)
| [c00486dc] (__cond_resched) from [c047c8c8] (_cond_resched+0x4c/0x54)
| [c047c8c8] (_cond_resched) from [c00e1aec] (truncate_inode_pages_range+0x1f0/0x5e8)
| [c00e1aec] (truncate_inode_pages_range) from [c00e1fd8] (truncate_inode_pages+0x28/0x30)
| [c00e1fd8] (truncate_inode_pages) from [c00e2148] (truncate_inode_pages_final+0x60/0x64)
| [c00e2148] (truncate_inode_pages_final) from [c020c92c] (f2fs_evict_inode+0x4c/0x268)
| [c020c92c] (f2fs_evict_inode) from [c0137214] (evict+0x94/0x140)
| [c0137214] (evict) from [c01377e8] (iput+0xc8/0x134)
| [c01377e8] (iput) from [c01333e4] (d_delete+0x154/0x180)
| [c01333e4] (d_delete) from [c0129870] (vfs_rmdir+0x114/0x12c)
| [c0129870] (vfs_rmdir) from [c012d644] (do_rmdir+0x158/0x168)
| [c012d644] (do_rmdir) from [c012dd90] (SyS_unlinkat+0x30/0x3c)
| [c012dd90] (SyS_unlinkat) from [c000ec40] (ret_fast_syscall+0x0/0x4c)

As explained by Jaegeuk Kim:
|This inode is the directory (c.f., do_rmdir) causing an infinite loop on
|sync_dirty_dir_inodes.
|The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
|inode is under eviction, it submits bios and does it again until eviction
|is finished.

This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO is submitted so other threads can make progress.

Signed-off-by: Sebastian Andrzej Siewior bige...@linutronix.de
---

Hi Jaegeuk,

How about adding cond_resched() right after f2fs_submit_merged_bio in sync_dirty_dir_inodes? Could you test this?

So I added it as you suggested.
I've seen that the two functions looped for 5 sec, but the system did not freeze as it did before the patch. So it seems to work, thanks.

Hi Sebastian,

After this patch, does your test complete without any CPU stall? IMHO, the context should be switched even without cond_resched() once the task has consumed its time quota, so the patch just reduces system latency by yielding. I thought of another way: discarding the pages of the inode being evicted from the merged bio instead of submitting them. If so, evict() doesn't need to wait for writeback. Just my curiosity about this problem.

Thanks,

 fs/f2fs/checkpoint.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7f794b72b3b7..a2ad3df39f24 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -796,6 +796,7 @@ void
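
The stall discussed in this message can be illustrated with a toy, single-threaded model of a non-preemptive CPU. This is a userspace sketch, not kernel code: `struct evict_task`, `yield_to_evictor()` and `sync_until_evicted()` are hypothetical names, and the iteration cap is only a stand-in for the soft-lockup watchdog. It assumes, as the report describes, that the evicting task can only make progress when the looping task gives up the CPU.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task that is evicting the inode. */
struct evict_task {
	int steps_left;	/* work remaining before eviction completes */
};

/* Models cond_resched(): hand the other task one slice of CPU time. */
static void yield_to_evictor(struct evict_task *t)
{
	if (t->steps_left > 0)
		t->steps_left--;
}

/*
 * Models the retry loop: keep retrying until eviction has finished.
 * Returns the number of iterations that were needed, or -1 if the cap
 * (our stand-in for the watchdog timeout) is reached first.
 */
static int sync_until_evicted(struct evict_task *t, bool do_yield, int cap)
{
	int iter;

	for (iter = 0; iter < cap; iter++) {
		if (t->steps_left == 0)
			return iter;
		/* the real code would submit the merged bio here */
		if (do_yield)
			yield_to_evictor(t);
	}
	return -1;
}
```

With `do_yield` set, the loop ends as soon as the evicting task has finished its remaining steps; without it, `steps_left` never changes and the loop runs into the cap, which is the behaviour the watchdog reported.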
Re: [f2fs-dev] [PATCH v2] f2fs: fix max orphan inodes calculation
On Fri, Feb 27, 2015 at 05:38:13PM +0800, Wanpeng Li wrote:

cp_payload is introduced for the sit bitmap to support large volumes, and it is placed just after the block of f2fs_checkpoint + nat bitmap, so the first segment should include F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload + orphan blocks. However, the current max orphan inodes calculation doesn't consider cp_payload; this patch fixes it by subtracting the number of cp_payload blocks from the total blocks of the first segment when calculating max orphan inodes.

Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
v1 -> v2:
 * adjust comments above the codes
 * fix coding style issue

 fs/f2fs/checkpoint.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index db82e09..a914e99 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1103,13 +1103,15 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi)
 	}
 
 	/*
-	 * considering 512 blocks in a segment 8 blocks are needed for cp
-	 * and log segment summaries. Remaining blocks are used to keep
-	 * orphan entries with the limitation one reserved segment
-	 * for cp pack we can have max 1020*504 orphan entries
+	 * considering 512 blocks in a segment 8+cp_payload blocks are
+	 * needed for cp and log segment summaries. Remaining blocks are
+	 * used to keep orphan entries with the limitation one reserved
+	 * segment for cp pack we can have max 1020*(504-cp_payload)
+	 * orphan entries
 	 */

Hi all,

I think the code below gives us enough information, so the comment above is not needed. Someone could also get confused by the constant 1020. What do you think about removing the comment?

Regards,
Changman

 	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
-			NR_CURSEG_TYPE) * F2FS_ORPHANS_PER_BLOCK;
+			NR_CURSEG_TYPE - __cp_payload(sbi)) *
+			F2FS_ORPHANS_PER_BLOCK;
 }
 
 int __init create_checkpoint_caches(void)
-- 
1.9.1
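
As a worked example of the calculation in this patch, the sketch below evaluates the same expression in userspace. The figures 512, 8 and 1020 come from the comment in the patch; splitting the 8 reserved blocks into F2FS_CP_PACKS = 2 and NR_CURSEG_TYPE = 6 is an assumption about the kernel headers of that time, and `max_orphans()` is a hypothetical helper, not a kernel function.

```c
/* Assumed values; together they give the "8 blocks" from the comment. */
#define F2FS_CP_PACKS		2
#define NR_CURSEG_TYPE		6
/* Orphan inode entries per block, the "1020" from the comment. */
#define F2FS_ORPHANS_PER_BLOCK	1020

/*
 * Same expression as in init_ino_entry_info(): every block of the
 * segment that is not used for cp, summaries or cp_payload can hold
 * orphan entries.
 */
static unsigned int max_orphans(unsigned int blocks_per_seg,
				unsigned int cp_payload)
{
	return (blocks_per_seg - F2FS_CP_PACKS -
			NR_CURSEG_TYPE - cp_payload) *
			F2FS_ORPHANS_PER_BLOCK;
}
```

With 512 blocks per segment and no payload this yields 1020 * 504 = 514080 entries; with a cp_payload of 2 blocks it drops to 1020 * 502 = 512040, which is the difference the fix accounts for.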
Re: [f2fs-dev] [PATCH 5/5 v2] f2fs: introduce a batched trim
Hi Jaegeuk,

IMHO, it would be better to let the user decide the trim size, considering trim latency. Otherwise, additional checkpoints that the user doesn't want will occur.

Regards,
Changman

On Mon, Feb 02, 2015 at 03:29:25PM -0800, Jaegeuk Kim wrote:

Change log from v1:
 o add description
 o change the # of batched segments suggested by Chao
 o make consistent for # of batched segments

This patch introduces a batched trimming feature, which submits split discard commands.

This is to avoid long latency due to huge trim commands. If fstrim was triggered ranging from 0 to the end of device, we should lock all the checkpoint-related mutexes, resulting in very long latency.

Signed-off-by: Jaegeuk Kim jaeg...@kernel.org
---
 fs/f2fs/f2fs.h    |  2 ++
 fs/f2fs/segment.c | 16 +++++++++++-----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8231a59..ec5e66f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -105,6 +105,8 @@ enum {
 	CP_DISCARD,
 };
 
+#define BATCHED_TRIM_SEGMENTS(sbi)	(((sbi)->segs_per_sec) << 5)
+
 struct cp_control {
 	int reason;
 	__u64 trim_start;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 5ea57ec..b85bb97 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1066,14 +1066,20 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
 	end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
 						GET_SEGNO(sbi, end);
 	cpc.reason = CP_DISCARD;
-	cpc.trim_start = start_segno;
-	cpc.trim_end = end_segno;
 	cpc.trim_minlen = range->minlen >> sbi->log_blocksize;
 
 	/* do checkpoint to issue discard commands safely */
-	mutex_lock(&sbi->gc_mutex);
-	write_checkpoint(sbi, &cpc);
-	mutex_unlock(&sbi->gc_mutex);
+	for (; start_segno <= end_segno;
+				start_segno += BATCHED_TRIM_SEGMENTS(sbi)) {
+		cpc.trim_start = start_segno;
+		cpc.trim_end = min_t(unsigned int,
+				start_segno + BATCHED_TRIM_SEGMENTS(sbi) - 1,
+				end_segno);
+
+		mutex_lock(&sbi->gc_mutex);
+		write_checkpoint(sbi, &cpc);
+		mutex_unlock(&sbi->gc_mutex);
+	}
 out:
 	range->len = cpc.trimmed << sbi->log_blocksize;
 	return 0;
-- 
2.1.1
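
The batching loop in this patch can be tried outside the kernel with a small sketch that only computes the per-checkpoint segment ranges. `struct trim_batch` and `split_trim()` are hypothetical names used for illustration; the batch size is passed in directly rather than derived from `segs_per_sec`, and it must be greater than zero.

```c
/* One [start, end] segment range handled by a single checkpoint. */
struct trim_batch {
	unsigned int start;
	unsigned int end;
};

/*
 * Split [start_segno, end_segno] into ranges of at most `batch`
 * segments, mirroring the for loop in f2fs_trim_fs(). At most max_out
 * ranges are written to out; the number written is returned.
 */
static unsigned int split_trim(unsigned int start_segno,
			       unsigned int end_segno, unsigned int batch,
			       struct trim_batch *out, unsigned int max_out)
{
	unsigned int n = 0;

	for (; start_segno <= end_segno && n < max_out;
				start_segno += batch) {
		unsigned int last = start_segno + batch - 1;

		out[n].start = start_segno;
		/* same clamp as the min_t() call in the patch */
		out[n].end = last < end_segno ? last : end_segno;
		n++;
	}
	return n;
}
```

For segments 0 to 99 and a batch of 32 (one segment per section shifted left by 5), this produces four ranges, the last one being 96 to 99. Each range costs one checkpoint, which is the extra work Changman's reply points out.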
Re: [f2fs-dev] [RFC PATCH 06/10] f2fs: add core functions for rb-tree extent cache
Hi Chao,

Great work. :)

2015-01-12 16:14 GMT+09:00 Chao Yu chao2...@samsung.com:

This patch adds core functions including slab cache init function and init/lookup/update/shrink/destroy function for rb-tree based extent cache.

Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail design and implementation of extent cache.

Todo:
 * add a cached_ei into struct extent_tree for a quick recent cache.
 * register rb-based extent cache shrink with mm shrink interface.
 * disable dir inode's extent cache.

Signed-off-by: Chao Yu chao2...@samsung.com
Signed-off-by: Jaegeuk Kim jaeg...@kernel.org
Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/data.c | 458 +
 fs/f2fs/node.c | 9 +-
 2 files changed, 466 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 4f5b871e..bf8c5eb 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -25,6 +25,9 @@
 #include "trace.h"
 #include <trace/events/f2fs.h>

~ snip ~

+
+static void f2fs_update_extent_tree(struct inode *inode, pgoff_t fofs,
+							block_t blkaddr)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	nid_t ino = inode->i_ino;
+	struct extent_tree *et;
+	struct extent_node *en = NULL, *en1 = NULL, *en2 = NULL, *en3 = NULL;
+	struct extent_node *den = NULL;
+	struct extent_info *pei;
+	struct extent_info ei;
+	unsigned int endofs;
+
+	if (is_inode_flag_set(F2FS_I(inode), FI_NO_EXTENT))
+		return;
+
+retry:
+	down_write(&sbi->extent_tree_lock);
+	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
+	if (!et) {

We've already made some useful functions. How about using f2fs_kmem_cache_alloc and f2fs_radix_tree_insert?

+		et = kmem_cache_alloc(extent_tree_slab, GFP_ATOMIC);
+		if (!et) {
+			up_write(&sbi->extent_tree_lock);
+			goto retry;
+		}
+		if (radix_tree_insert(&sbi->extent_tree_root, ino, et)) {
+			up_write(&sbi->extent_tree_lock);
+			kmem_cache_free(extent_tree_slab, et);
+			goto retry;
+		}
+		memset(et, 0, sizeof(struct extent_tree));
+		et->ino = ino;
+		et->root = RB_ROOT;
+		rwlock_init(&et->lock);
+		atomic_set(&et->refcount, 0);
+		et->count = 0;
+		sbi->total_ext_tree++;
+	}
+	atomic_inc(&et->refcount);
+	up_write(&sbi->extent_tree_lock);
+

~ snip ~

+
+	write_unlock(&et->lock);
+	atomic_dec(&et->refcount);
+}
+
+void f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
+{
+	struct extent_tree *treevec[EXT_TREE_VEC_SIZE];
+	struct extent_node *en, *tmp;
+	unsigned long ino = F2FS_ROOT_INO(sbi);
+	struct radix_tree_iter iter;
+	void **slot;
+	unsigned int found;
+	unsigned int node_cnt = 0, tree_cnt = 0;
+
+	if (available_free_memory(sbi, EXTENT_CACHE))
+		return;
+
+	spin_lock(&sbi->extent_lock);
+	list_for_each_entry_safe(en, tmp, &sbi->extent_list, list) {
+		if (!nr_shrink--)
+			break;
+		list_del_init(&en->list);
+	}
+	spin_unlock(&sbi->extent_lock);
+

IMHO, it's expensive to walk every extent_tree just to free the extent_nodes for which list_empty() is true. Is there any idea to improve this? For example, if each extent_node kept a pointer to its extent_root, it would be faster because we would not need to walk all the trees. Of course, that uses more memory.

Still, I think your patchset might just as well be merged, because the patches are well made and the feature is clearly separated behind a mount option. We could improve this later.

Regards,
Changman

+	down_read(&sbi->extent_tree_lock);
+	while ((found = radix_tree_gang_lookup(&sbi->extent_tree_root,
+				(void **)treevec, ino, EXT_TREE_VEC_SIZE))) {
+		unsigned i;
+
+		ino = treevec[found - 1]->ino + 1;
+		for (i = 0; i < found; i++) {
+			struct extent_tree *et = treevec[i];
+
+			atomic_inc(&et->refcount);
+			write_lock(&et->lock);
+			node_cnt += __free_extent_tree(sbi, et, false);
+			write_unlock(&et->lock);
+			atomic_dec(&et->refcount);
+		}
+	}
+	up_read(&sbi->extent_tree_lock);
+
+	down_write(&sbi->extent_tree_lock);
+	radix_tree_for_each_slot(slot, &sbi->extent_tree_root, &iter,
+							F2FS_ROOT_INO(sbi)) {
+		struct extent_tree *et = (struct extent_tree *)*slot
Re: [f2fs-dev] [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi Chao,

On Sun, Jan 04, 2015 at 11:19:28AM +0800, Chao Yu wrote:

Hi Changman,

Sorry for replying late!

-Original Message-
From: Changman Lee [mailto:cm224@samsung.com]
Sent: Tuesday, December 30, 2014 8:32 AM
To: Jaegeuk Kim
Cc: Chao Yu; linux-f2fs-devel@lists.sourceforge.net; linux-ker...@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi all,

On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:

[snip]

Nice draft. :)

Please see the draft below.

1) Extent management:
If we use global management that manages all extents from different inodes in sbi, we will face serious lock contention when we access extents belonging to different inodes concurrently; the loss may outweigh the gain.

Agreed.

So we choose local management for extents, which means all extents are managed by the inode itself to avoid the above lock contention. Additionally, we manage all extents globally by linking all inodes into a global lru list for the extent cache shrinker.

Approach:
a) build extent tree/rwlock/lru list/extent count in each inode.
 *extent tree: link all extents in an rb-tree;
 *rwlock: protect fields when accessing the extent cache concurrently;
 *lru list: sort all extents in access-time order;
 *extent count: record the total count of extents in the cache.
b) use an lru shrink list in sbi to manage all inodes which have cached extents.
 *an inode will be added or repositioned in this global list whenever an extent is accessed in this inode.
 *use a spinlock to protect this shrink list.

1. How about adding a data structure with inode number instead of referring inode pointer?
2. How about managing extent entries globally and setting an upper bound to the number of extent entries instead of limiting them per each inode? (The rb-tree will handle many extents per inode.)
3. It needs to set a minimum length for the candidate of extent cache. (e.g., 64)

Agreed.

So, for example,

struct ino_entry_for_extents {
	inode number;
	rb_tree for extent_entry objects;
	rwlock;
};

struct extent_entry {
	blkaddr, len;
	list_head *;
};

Something like this.

[A, B, C, ... are extent entry]

The sbi has
1. an extent_list: (LRU) A - B - C - D - E - F - G (MRU)
2. radix_tree: ino_entry_for_extents (#10) has D, B in rb-tree
             ` ino_entry_for_extents (#11) has A, C in rb-tree
             ` ino_entry_for_extents (#12) has F in rb-tree
             ` ino_entry_for_extents (#13) has G, E in rb-tree

In f2fs_update_extent_cache and __get_data_block for #10, ino_entry_for_extents (#10) is found and D or B is updated. Then, updated entries are moved to MRU.

In f2fs_evict_inode for #11, A and C are moved to LRU. But, if this inode is unlinked, all of A, C, and ino_entry_for_extents (#11) should be released.

In f2fs_balance_fs_bg, some LRU extents are released according to the amount of consumed memory. Then, it frees any ino_entry_for_extents having no extent.

IMO, we don't need to consider readahead for this, since get_data_block will be called by VFS readahead.

Furthermore, we need to think about whether LRU is really best or not. IMO, the extent cache aims to improve second access speed, rather than initial cold misses. So, maybe MRU or another algorithm would be better.

Right. It's very complicated to judge which is better. In the read or write path, extents could be created every time. At that time, we should decide which extent to evict in favor of new extents if we set an upper bound. In an update, one extent could be separated into 3. It requires 3 insertions and 1 deletion. So if updates happen frequently, we could give up extent management for some ranges. And we need to bring in ideas from vm management, for example, active/inactive lists and a second chance for promotion, or batched work for insertion/deletion. Then I suddenly thought, 'Simple is best'. Let's think about better ideas together.

Yeah, how about using the opposite order to that of the page cache manager? For example:

node pages A,B,C,D are in the page cache;
extents a,b,c,d are in the extent cache;
extent a is built from page A, ..., d is built from page D.

page cache: LRU A - B - C - D MRU
extent cache: LRU a - b - c - d MRU

If we use
1) the same LRU order, cache pairs A-a, B-b, ... may be reclaimed at the same time on OOM.
2) the opposite order, maybe A,B in page cache and d,c in extent cache will be reclaimed, but we still can hit whole cache
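
The remark earlier in this message that an update can separate one extent into three pieces can be made concrete with a short sketch. It assumes the simplest case, in which a single block in the middle of a cached extent is remapped to a new block address; `struct extent` and `split_extent()` are hypothetical names and do not come from the patch under discussion.

```c
/* A run of len blocks: file offset fofs maps to block address blkaddr. */
struct extent {
	unsigned int fofs;
	unsigned int blkaddr;
	unsigned int len;
};

/*
 * Remap the single block at file offset fofs to new_blk and return the
 * pieces that replace ext: an optional left part, the updated block and
 * an optional right part. Returns the number of pieces written to out,
 * or 0 if fofs is not covered by ext.
 */
static int split_extent(struct extent ext, unsigned int fofs,
			unsigned int new_blk, struct extent out[3])
{
	unsigned int off;
	int n = 0;

	if (fofs < ext.fofs || fofs - ext.fofs >= ext.len)
		return 0;
	off = fofs - ext.fofs;

	if (off > 0)
		out[n++] = (struct extent){ ext.fofs, ext.blkaddr, off };
	out[n++] = (struct extent){ fofs, new_blk, 1 };
	if (off + 1 < ext.len)
		out[n++] = (struct extent){ fofs + 1, ext.blkaddr + off + 1,
					    ext.len - off - 1 };
	return n;
}
```

Updating offset 4 of an extent covering offsets 0 to 9 removes one entry and adds three, which is the "3 insertions and 1 deletion" cost mentioned above; an update at either end of the extent yields only two pieces.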
Re: [f2fs-dev] [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi all,

On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:

[snip]

Nice draft. :)

Please see the draft below.

1) Extent management:
If we use global management that manages all extents from different inodes in sbi, we will face serious lock contention when we access extents belonging to different inodes concurrently; the loss may outweigh the gain.

Agreed.

So we choose local management for extents, which means all extents are managed by the inode itself to avoid the above lock contention. Additionally, we manage all extents globally by linking all inodes into a global lru list for the extent cache shrinker.

Approach:
a) build extent tree/rwlock/lru list/extent count in each inode.
 *extent tree: link all extents in an rb-tree;
 *rwlock: protect fields when accessing the extent cache concurrently;
 *lru list: sort all extents in access-time order;
 *extent count: record the total count of extents in the cache.
b) use an lru shrink list in sbi to manage all inodes which have cached extents.
 *an inode will be added or repositioned in this global list whenever an extent is accessed in this inode.
 *use a spinlock to protect this shrink list.

1. How about adding a data structure with inode number instead of referring inode pointer?
2. How about managing extent entries globally and setting an upper bound to the number of extent entries instead of limiting them per each inode? (The rb-tree will handle many extents per inode.)
3. It needs to set a minimum length for the candidate of extent cache. (e.g., 64)

Agreed.

So, for example,

struct ino_entry_for_extents {
	inode number;
	rb_tree for extent_entry objects;
	rwlock;
};

struct extent_entry {
	blkaddr, len;
	list_head *;
};

Something like this.

[A, B, C, ... are extent entry]

The sbi has
1. an extent_list: (LRU) A - B - C - D - E - F - G (MRU)
2. radix_tree: ino_entry_for_extents (#10) has D, B in rb-tree
             ` ino_entry_for_extents (#11) has A, C in rb-tree
             ` ino_entry_for_extents (#12) has F in rb-tree
             ` ino_entry_for_extents (#13) has G, E in rb-tree

In f2fs_update_extent_cache and __get_data_block for #10, ino_entry_for_extents (#10) is found and D or B is updated. Then, updated entries are moved to MRU.

In f2fs_evict_inode for #11, A and C are moved to LRU. But, if this inode is unlinked, all of A, C, and ino_entry_for_extents (#11) should be released.

In f2fs_balance_fs_bg, some LRU extents are released according to the amount of consumed memory. Then, it frees any ino_entry_for_extents having no extent.

IMO, we don't need to consider readahead for this, since get_data_block will be called by VFS readahead.

Furthermore, we need to think about whether LRU is really best or not. IMO, the extent cache aims to improve second access speed, rather than initial cold misses. So, maybe MRU or another algorithm would be better.

Right. It's very complicated to judge which is better. In the read or write path, extents could be created every time. At that time, we should decide which extent to evict in favor of new extents if we set an upper bound. In an update, one extent could be separated into 3. It requires 3 insertions and 1 deletion. So if updates happen frequently, we could give up extent management for some ranges. And we need to bring in ideas from vm management, for example, active/inactive lists and a second chance for promotion, or batched work for insertion/deletion. Then I suddenly thought, 'Simple is best'. Let's think about better ideas together.

Thanks,

2) Limitation:
In one inode, as we split or add extents in the extent cache on read/write, the number of extents will grow, so memory and CPU overhead will increase. In order to control the memory and CPU overhead, we try to set an upper bound to limit the total number of extents in each inode. This number is a global configuration which is visible to all inodes. This number will be exported to sysfs so it can be configured according to the user's requirements. By default, the designed number is 8.

Chao,
It's better to control the # of extents globally rather than limit extents per inode, as Jaegeuk said, to reduce extent management overhead.

3) Shrinker:
There are two shrink paths:
a) one is triggered when the extent count has exceeded the upper bound of the inode's extent cache. We will try to release extent(s) from the head of the inode's inner extent lru list until the extent count is equal to the upper bound. This operation could be in f2fs_update_extent_cache().
b) the other one is triggered when memory utilization exceeds a threshold; we try to get inodes from the head of the global lru list(s), and release a fixed number of extent(s) (by default: 64 extents)
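
Shrink path a) above, releasing extents from the head of an LRU list until the count is back at the upper bound, can be sketched with a plain singly linked queue. This is an illustration only: `struct extent_node` here is a hypothetical userspace type, not the kernel structure, and memory ownership stays with the caller.

```c
#include <stddef.h>

/* Hypothetical cached extent; `next` points towards the MRU end. */
struct extent_node {
	struct extent_node *next;
	unsigned int fofs;
	unsigned int len;
};

/* LRU queue: head is the least recently used entry, tail the most. */
struct extent_lru {
	struct extent_node *head;
	struct extent_node *tail;
	unsigned int count;
};

/* Append a newly created or just accessed extent at the MRU end. */
static void lru_add_mru(struct extent_lru *lru, struct extent_node *en)
{
	en->next = NULL;
	if (lru->tail)
		lru->tail->next = en;
	else
		lru->head = en;
	lru->tail = en;
	lru->count++;
}

/*
 * Detach entries from the LRU end until count <= upper_bound.
 * Returns how many entries were released.
 */
static unsigned int lru_shrink(struct extent_lru *lru,
			       unsigned int upper_bound)
{
	unsigned int released = 0;

	while (lru->count > upper_bound && lru->head) {
		struct extent_node *victim = lru->head;

		lru->head = victim->next;
		if (!lru->head)
			lru->tail = NULL;
		victim->next = NULL;
		lru->count--;
		released++;
	}
	return released;
}
```

With the proposed default bound of 8, adding a ninth and tenth extent and then shrinking releases the two oldest entries. The same routine would serve a global list if the bound were applied filesystem-wide, as suggested in the replies.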
Re: [f2fs-dev] [PATCH v2] f2fs: add block count by in-place-update in stat info
Change from v1
 o use atomic_t inplace_count for more accurate suggested by Chao

-- >8 --
From 7a42b27c8df45494e806d625be03830bfa8c30ff Mon Sep 17 00:00:00 2001
From: Changman Lee cm224@samsung.com
Date: Wed, 24 Dec 2014 02:16:54 +0900
Subject: [PATCH] f2fs: add block count by in-place-update in stat info

This patch adds block count by in-place-update in stat.

Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/debug.c   | 4 ++++
 fs/f2fs/f2fs.h    | 5 ++++-
 fs/f2fs/segment.c | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 91e8f69..2b64221 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -79,6 +79,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
 		si->segment_count[i] = sbi->segment_count[i];
 		si->block_count[i] = sbi->block_count[i];
 	}
+
+	si->inplace_count = atomic_read(&sbi->inplace_count);
 }
 
 /*
@@ -277,6 +279,7 @@ static int stat_show(struct seq_file *s, void *v)
 		for (j = 0; j < si->util_free; j++)
 			seq_putc(s, '-');
 		seq_puts(s, "]\n\n");
+		seq_printf(s, "IPU: %u blocks\n", si->inplace_count);
 		seq_printf(s, "SSR: %u blocks in %u segments\n",
 			   si->block_count[SSR], si->segment_count[SSR]);
 		seq_printf(s, "LFS: %u blocks in %u segments\n",
@@ -331,6 +334,7 @@ int f2fs_build_stats(struct f2fs_sb_info *sbi)
 
 	atomic_set(&sbi->inline_inode, 0);
 	atomic_set(&sbi->inline_dir, 0);
+	atomic_set(&sbi->inplace_count, 0);
 
 	mutex_lock(&f2fs_stat_mutex);
 	list_add_tail(&si->stat_list, &f2fs_stat_list);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ec58bb2..72d2aab 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -589,6 +589,7 @@ struct f2fs_sb_info {
 	struct f2fs_stat_info *stat_info;	/* FS status information */
 	unsigned int segment_count[2];		/* # of allocated segments */
 	unsigned int block_count[2];		/* # of allocated blocks */
+	atomic_t inplace_count;			/* # of inplace update */
 	int total_hit_ext, read_hit_ext;	/* extent cache hit ratio */
 	atomic_t inline_inode;			/* # of inline_data inodes */
 	atomic_t inline_dir;			/* # of inline_dentry inodes */
@@ -1514,6 +1515,7 @@ struct f2fs_stat_info {
 
 	unsigned int segment_count[2];
 	unsigned int block_count[2];
+	unsigned int inplace_count;
 	unsigned base_mem, cache_mem;
 };
 
@@ -1553,7 +1555,8 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
 		((sbi)->segment_count[(curseg)->alloc_type]++)
 #define stat_inc_block_count(sbi, curseg)				\
 		((sbi)->block_count[(curseg)->alloc_type]++)
-
+#define stat_inc_inplace_blocks(sbi)					\
+		(atomic_inc(&(sbi)->inplace_count))
 #define stat_inc_seg_count(sbi, type)					\
 	do {								\
 		struct f2fs_stat_info *si = F2FS_STAT(sbi);		\
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 42607a6..fd9bc96 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1235,6 +1235,7 @@ void write_data_page(struct page *page, struct dnode_of_data *dn,
 void rewrite_data_page(struct page *page, block_t old_blkaddr,
 					struct f2fs_io_info *fio)
 {
+	stat_inc_inplace_blocks(F2FS_P_SB(page));
 	f2fs_submit_page_mbio(F2FS_P_SB(page), page, old_blkaddr, fio);
 }
 
-- 
1.9.1
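
The v2 change, counting in-place updates with an atomic counter so that concurrent writers do not lose increments, has a close userspace analogue in C11 atomics. The sketch below is only that analogue: `struct fs_stats` and the three helpers are hypothetical names, and `atomic_uint` from `<stdatomic.h>` stands in for the kernel's `atomic_t`.

```c
#include <stdatomic.h>

/* Hypothetical userspace counterpart of the counter added to sbi. */
struct fs_stats {
	atomic_uint inplace_count;	/* # of in-place updated blocks */
};

/* Counterpart of atomic_set(&sbi->inplace_count, 0). */
static void stats_init(struct fs_stats *s)
{
	atomic_init(&s->inplace_count, 0);
}

/* Counterpart of stat_inc_inplace_blocks(): one atomic increment. */
static void stat_inc_inplace_blocks(struct fs_stats *s)
{
	atomic_fetch_add_explicit(&s->inplace_count, 1,
				  memory_order_relaxed);
}

/* Counterpart of atomic_read() when the stat file is shown. */
static unsigned int stat_read_inplace_blocks(struct fs_stats *s)
{
	return atomic_load_explicit(&s->inplace_count,
				    memory_order_relaxed);
}
```

Relaxed ordering is chosen here because the value is a statistic only and nothing else is synchronized through it; that choice is an assumption of this sketch, not something stated in the patch.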
Re: [f2fs-dev] [RFC PATCH] f2fs: add extent cache base on rb-tree
On Mon, Dec 22, 2014 at 11:36:09PM -0800, Jaegeuk Kim wrote:

Hi Chao,

On Tue, Dec 23, 2014 at 11:01:39AM +0800, Chao Yu wrote:

Hi Jaegeuk,

-Original Message-
From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
Sent: Tuesday, December 23, 2014 7:16 AM
To: Chao Yu
Cc: 'Changman Lee'; linux-f2fs-devel@lists.sourceforge.net; linux-ker...@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Chao,

On Mon, Dec 22, 2014 at 03:10:30PM +0800, Chao Yu wrote:

Hi Changman,

-Original Message-
From: Changman Lee [mailto:cm224@samsung.com]
Sent: Monday, December 22, 2014 10:03 AM
To: Chao Yu
Cc: Jaegeuk Kim; linux-f2fs-devel@lists.sourceforge.net; linux-ker...@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Yu,

Good approach.

Thank you. :)

As you know, however, f2fs breaks extent itself due to COW.

Yes, and sometimes f2fs uses IPU when overwriting; in this condition, by using this approach we can cache more contiguous mapping extents for better performance.

Hmm. When f2fs faces this case, there is no chance to make an extent itself at all.

With the new implementation in this patch, f2fs will build the extent cache on readpage/readpages.

I don't understand your points exactly. :( If there are no on-disk extents, it doesn't matter when the caches are built. Could you define what scenarios you're looking at?

Unlike other filesystems such as btrfs, the minimum extent of f2fs could have 4KB granularity. So we would have lots of extents per inode, and it could lead to overhead in managing extents.

Agreed, the more extents grow in one inode, the more memory pressure and the longer rb-tree operation latency we face. IMO, to solve this problem, we'd better add a limitation or shrink ability to the extent cache:
1. limit the extent number per inode with a value set from sysfs, and discard extents from the inode's extent lru list if we hit the limit; (e.g. in FAT, max number of mapping extent per inode is fixed: 8)
2. add all extents of inodes into a global lru list; we will try to shrink this list if we're facing memory pressure.

What do you think? Any better ideas are welcome. :)

Historically, the reason that I added only one small extent cache is that I wanted to avoid additional data structures having any overhead in the critical data write path.

Thank you for telling me the history of the original extent cache.

Instead, I intended to use a well-operating node page cache. We need to consider what the benefit would be of using an extent cache rather than the existing node page cache.

IMO, the node page cache belongs to the system-level cache; the filesystem subsystem cannot control it completely. Cached uptodate node pages will be invalidated by using drop_caches from sysfs, or by the mm reclaimer, resulting in more IO when we need these node pages next time.

Yes, that's exactly what I wanted.

The new extent cache belongs to the filesystem-level cache; it is completely controlled by the filesystem itself. What we can gain is: on the one hand, it is used as a first-level cache above the node page cache, which can also increase the cache hit ratio.

I don't think so. The hit ratio depends on the cache policy. The node page cache is managed globally by the kernel in an LRU manner, so I think this can show an affordable hit ratio.

On the other hand, it is more stable and controllable than the node page cache.

It depends on how you can control the extent cache. But I'm not sure that would be better than the page cache managed by MM.

So, my concerns are:

1. Redundant memory overhead
 : The extent cache is likely on top of the node page cache, which will consume memory redundantly.

2. CPU overhead
 : In every block address update, it needs to traverse extent cache entries.

3. Effectiveness
 : We have a node page cache that is managed by MM in LRU order. I think this provides a good hit ratio, system-wide memory reclaiming algorithms, and a well-defined locking mechanism.

4. Cache reclaiming policy
 a. global approach: it needs to consider lock contention, CPU overhead, and shrinker. I don't think it is better than page cache.
 b. local approach: there still exist cold misses at the initial read operations. After that, how does the extent cache increase the hit ratio more than the node page cache does? For example, in the case of a pretty normal scenario like open - read - close - open - read ..., we can't get benefits from a locally-managed extent cache, while node page
Re: [f2fs-dev] [PATCH 1/2] f2fs: conduct f2fs_gc as explicit gc_type
Hi,

On Tue, Dec 23, 2014 at 12:00:37AM -0800, Jaegeuk Kim wrote:
Hi Changman,

On Tue, Dec 23, 2014 at 08:37:38AM +0900, Changman Lee wrote:
f2fs has 2 gc_type; foreground gc and background gc. In the case of foreground gc, f2fs will select victim as greedy. Otherwise, as cost-benefit. And also it runs as greedy in SSR mode. Until now, f2fs_gc conducted with BG_GC as default. So we couldn't expect how it runs; BG_GC or FG_GC and GREEDY or COST_BENEFIT.

What does this mean? In f2fs_gc, the gc_type will be changed according to the number of free sections.

Right, but when I turn on trace I saw 3 cases.
1. BG_GC and COST_BENEFIT
2. BG_GC and GREEDY
3. FG_GC and GREEDY
I expected that case 1 is likely to operate only by gc_thread. But it was not.

Therefore sometimes it runs as BG_GC/COST_BENEFIT although gc_thread don't put f2fs_gc to work.

You mean f2fs_balance_fs? In this case, again, the gc_type will be assigned FG_GC.

Why do you want to set FG_GC/GREEDY for the SSR victims?

We should allocate a block as soon as possible. In the case of FG_GC, it also uses invalid blocks dirtied by background gc. In another case, if (BG_GC && test_bit(victim_secmap)), it will be skipped. I intended that SSR operates fast like FG_GC.

Regards,
Changman

Thanks,

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/f2fs.h    | 2 +-
 fs/f2fs/gc.c      | 5 ++---
 fs/f2fs/segment.c | 6 +++---
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ae6dfb6..c956535 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1476,7 +1476,7 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *, u64, u64);
 int start_gc_thread(struct f2fs_sb_info *);
 void stop_gc_thread(struct f2fs_sb_info *);
 block_t start_bidx_of_node(unsigned int, struct f2fs_inode_info *);
-int f2fs_gc(struct f2fs_sb_info *);
+int f2fs_gc(struct f2fs_sb_info *, int);
 void build_gc_manager(struct f2fs_sb_info *);
 int __init create_gc_caches(void);
 void destroy_gc_caches(void);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index eec0933..e1fa53a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -80,7 +80,7 @@ static int gc_thread_func(void *data)
 		stat_inc_bggc_count(sbi);

 		/* if return value is not zero, no victim was selected */
-		if (f2fs_gc(sbi))
+		if (f2fs_gc(sbi, BG_GC))
 			wait_ms = gc_th->no_gc_sleep_time;

 		/* balancing f2fs's metadata periodically */
@@ -691,10 +691,9 @@ static void do_garbage_collect(struct f2fs_sb_info *sbi, unsigned int segno,
 	f2fs_put_page(sum_page, 1);
 }

-int f2fs_gc(struct f2fs_sb_info *sbi)
+int f2fs_gc(struct f2fs_sb_info *sbi, int gc_type)
 {
 	unsigned int segno, i;
-	int gc_type = BG_GC;
 	int nfree = 0;
 	int ret = -1;
 	struct cp_control cpc;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index fd9bc96..3b32404 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -281,7 +281,7 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi)
 	 */
 	if (has_not_enough_free_secs(sbi, 0)) {
 		mutex_lock(&sbi->gc_mutex);
-		f2fs_gc(sbi);
+		f2fs_gc(sbi, FG_GC);
 	}
 }

@@ -994,12 +994,12 @@ static int get_ssr_segment(struct f2fs_sb_info *sbi, int type)

 	if (IS_NODESEG(type) || !has_not_enough_free_secs(sbi, 0))
 		return v_ops->get_victim(sbi,
-				&(curseg)->next_segno, BG_GC, type, SSR);
+				&(curseg)->next_segno, FG_GC, type, SSR);

 	/* For data segments, let's do SSR more intensively */
 	for (; type >= CURSEG_HOT_DATA; type--)
 		if (v_ops->get_victim(sbi, &(curseg)->next_segno,
-						BG_GC, type, SSR))
+						FG_GC, type, SSR))
 			return 1;
 	return 0;
 }
--
1.9.1
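Note: the selection rule discussed in this thread (greedy for foreground GC and for SSR, cost-benefit for background GC, plus the promotion to FG_GC that Jaegeuk mentions when free sections run short) can be sketched in user-space C as below. This is only an illustration under stated assumptions, not the code in fs/f2fs/gc.c: the enum names and the helpers select_policy() and effective_gc_type() are hypothetical.

```c
#include <stdbool.h>

/* Names mirror the thread; everything here is a user-space stand-in. */
enum gc_type    { BG_GC, FG_GC };
enum alloc_mode { LFS, SSR };
enum gc_policy  { GC_COST_BENEFIT, GC_GREEDY };

/*
 * SSR always selects its victim greedily. Otherwise foreground GC is
 * greedy and background GC uses cost-benefit.
 */
static enum gc_policy select_policy(enum gc_type type, enum alloc_mode mode)
{
	if (mode == SSR)
		return GC_GREEDY;
	return type == FG_GC ? GC_GREEDY : GC_COST_BENEFIT;
}

/*
 * Hypothetical helper: a requested type is promoted to FG_GC when the
 * filesystem is short of free sections, as described in the thread.
 */
static enum gc_type effective_gc_type(enum gc_type requested,
				      bool short_of_free_sections)
{
	return short_of_free_sections ? FG_GC : requested;
}
```

Under these assumptions the three traced combinations fall out directly: BG_GC with LFS gives cost-benefit, BG_GC with SSR gives greedy, and FG_GC gives greedy in either mode.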
[f2fs-dev] [PATCH] f2fs: add block count by in-place-update in stat info
This patch adds block count by in-place-update in stat.

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/debug.c   | 3 +++
 fs/f2fs/f2fs.h    | 5 ++++-
 fs/f2fs/segment.c | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 91e8f69..46bef86 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -79,6 +79,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
 		si->segment_count[i] = sbi->segment_count[i];
 		si->block_count[i] = sbi->block_count[i];
 	}
+
+	si->inplace_count = sbi->inplace_count;
 }

 /*
@@ -277,6 +279,7 @@ static int stat_show(struct seq_file *s, void *v)
 		for (j = 0; j < si->util_free; j++)
 			seq_putc(s, '-');
 		seq_puts(s, "]\n\n");
+		seq_printf(s, "IPU: %u blocks\n", si->inplace_count);
 		seq_printf(s, "SSR: %u blocks in %u segments\n",
 			   si->block_count[SSR], si->segment_count[SSR]);
 		seq_printf(s, "LFS: %u blocks in %u segments\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ec58bb2..ae6dfb6 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -589,6 +589,7 @@ struct f2fs_sb_info {
 	struct f2fs_stat_info *stat_info;	/* FS status information */
 	unsigned int segment_count[2];		/* # of allocated segments */
 	unsigned int block_count[2];		/* # of allocated blocks */
+	unsigned int inplace_count;		/* # of inplace update */
 	int total_hit_ext, read_hit_ext;	/* extent cache hit ratio */
 	atomic_t inline_inode;			/* # of inline_data inodes */
 	atomic_t inline_dir;			/* # of inline_dentry inodes */
@@ -1514,6 +1515,7 @@ struct f2fs_stat_info {

 	unsigned int segment_count[2];
 	unsigned int block_count[2];
+	unsigned int inplace_count;
 	unsigned base_mem, cache_mem;
 };

@@ -1553,7 +1555,8 @@ static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
 		((sbi)->segment_count[(curseg)->alloc_type]++)
 #define stat_inc_block_count(sbi, curseg)				\
 		((sbi)->block_count[(curseg)->alloc_type]++)
-
+#define stat_inc_inplace_blocks(sbi)					\
+		((sbi)->inplace_count++)
 #define stat_inc_seg_count(sbi, type)					\
 	do {								\
 		struct f2fs_stat_info *si = F2FS_STAT(sbi);		\
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 42607a6..fd9bc96 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1235,6 +1235,7 @@ void write_data_page(struct page *page, struct dnode_of_data *dn,
 void rewrite_data_page(struct page *page, block_t old_blkaddr,
 					struct f2fs_io_info *fio)
 {
+	stat_inc_inplace_blocks(F2FS_P_SB(page));
 	f2fs_submit_page_mbio(F2FS_P_SB(page), page, old_blkaddr, fio);
 }

--
1.9.1
Re: [f2fs-dev] [RFC PATCH] f2fs: add extent cache base on rb-tree
Hi,

On Mon, Dec 22, 2014 at 03:10:30PM +0800, Chao Yu wrote:
Hi Changman,

-----Original Message-----
From: Changman Lee [mailto:cm224@samsung.com]
Sent: Monday, December 22, 2014 10:03 AM
To: Chao Yu
Cc: Jaegeuk Kim; linux-f2fs-devel@lists.sourceforge.net; linux-ker...@vger.kernel.org
Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Yu,

Good approach.

Thank you. :)

As you know, however, f2fs breaks extent itself due to COW.

Yes, and sometimes f2fs use IPU when override writing, in this condition, by using this approach we can cache more contiguous mapping extent for better performance.

Unlike other filesystem like btrfs, minimum extent of f2fs could have 4KB granularity. So we would have lots of extents per inode and it could lead to overhead to manage extents.

Agree, the more number of extents are growing in one inode, the more memory pressure and longer latency operating in rb-tree we are facing. IMO, to solve this problem, we'd better to add limitation or shrink ability into extent cache:
1. limit extent number per inode with the value set from sysfs and discard extent from inode's extent lru list if we touch the limitation; (e.g. in FAT, max number of mapping extent per inode is fixed: 8)
2. add all extents of inodes into a global lru list, we will try to shrink this list if we're facing memory pressure.

How do you think? or any better ideas are welcome. :)

I think both of them are considerable options. How about adding extent to inode selected by user using ioctl or xattr? In the case of read most files having large size, user could get a benefit surely although they are separated some pieces.

Thanks,

Anyway, mount option could be alternative for this patch.

Yes, will do.

Thanks,
Yu

On Fri, Dec 19, 2014 at 06:49:29PM +0800, Chao Yu wrote:
Now f2fs have page-block mapping cache which can cache only one extent mapping between contiguous logical address and physical address. Normally, this design will work well because f2fs will expand coverage area of the mapping extent when we write forward sequentially. But when we write data randomly in Out-Place-Update mode, the extent will be shorten and hardly be expanded for most time as following reasons:
1. The short part of extent will be discarded if we break contiguous mapping in the middle of extent.
2. The new mapping will be added into mapping cache only at head or tail of the extent.
3. We will drop the extent cache when the extent became very fragmented.
4. We will not update the extent with mapping which we get from readpages or readpage.

To solve above problems, this patch adds extent cache base on rb-tree like other filesystems (e.g.: ext4/btrfs) in f2fs. By this way, f2fs can support another more effective cache between dnode page cache and disk. It will supply high hit ratio in the cache with fewer memory when dnode page cache are reclaimed in environment of low memory.

Todo:
* introduce mount option for extent cache.
* add shrink ability for extent cache.

Signed-off-by: Chao Yu <chao2...@samsung.com>
---
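Note: the single-extent behaviour listed in the patch description (merge only at the head or tail, discard the shorter part when a mapping in the middle changes) can be sketched in user-space C as below. This is a minimal illustration under stated assumptions, not the f2fs implementation: struct extent and extent_update() are hypothetical names, and a zero length is used here to mean "empty cache".

```c
#include <stdbool.h>
#include <stdint.h>

/* One cached run of contiguous logical-to-physical block mappings. */
struct extent {
	uint32_t fofs;	/* first file offset, in blocks */
	uint32_t blk;	/* first physical block address */
	uint32_t len;	/* number of contiguous blocks; 0 means empty */
};

/*
 * Record that file block `fofs` now maps to physical block `blk`.
 * Returns true if the cached extent changed.
 */
static bool extent_update(struct extent *e, uint32_t fofs, uint32_t blk)
{
	if (e->len == 0) {
		e->fofs = fofs;
		e->blk = blk;
		e->len = 1;
		return true;
	}

	/* Head merge: the new mapping sits directly in front of the extent. */
	if (e->fofs > 0 && e->blk > 0 &&
	    fofs == e->fofs - 1 && blk == e->blk - 1) {
		e->fofs--;
		e->blk--;
		e->len++;
		return true;
	}

	/* Tail merge: the new mapping sits directly behind the extent. */
	if (fofs == e->fofs + e->len && blk == e->blk + e->len) {
		e->len++;
		return true;
	}

	/* A block inside the extent moved: split and keep the longer side. */
	if (fofs >= e->fofs && fofs < e->fofs + e->len) {
		uint32_t left = fofs - e->fofs;
		uint32_t right = e->len - left - 1;

		if (blk == e->blk + left)
			return false;	/* mapping unchanged */
		if (left >= right) {
			e->len = left;
		} else {
			e->fofs = fofs + 1;
			e->blk += left + 1;
			e->len = right;
		}
		return true;
	}

	/* Not adjacent: a single-extent cache cannot represent it. */
	return false;
}
```

The last branch is the limitation the RFC targets: a random out-of-place write that is not adjacent to the cached run is simply lost, which is why a multi-extent structure such as an rb-tree is proposed.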
Re: [f2fs-dev] [PATCH v2] f2fs: merge two uchar variable in struct node_info to reduce memory cost
Hi Yu,

This patch is effective only in 32 bit machine. In case of 64 bit machine, nat_entry will be aligned in 8 bytes due to pointer variable (i.e. struct list_head). So it can't get any benefit to reduce memory usage. In the case of node_info, however, it will be gain in terms of memory usage. Hence, I think it's not correct for commit log to describe this patch.

Thanks,

Reviewed-by: Changman Lee <cm224@samsung.com>

2014-12-15 18:33 GMT+09:00 Chao Yu <chao2...@samsung.com>:
This patch moves one member of struct nat_entry: _flag_ to struct node_info, so _version_ in struct node_info and _flag_ with unsigned char type will merge to one 32-bit space in register/memory. Then the size of nat_entry will reduce its size from 28 bytes to 24 bytes and slab memory using by f2fs will be reduced.

changes from v1:
 o introduce inline copy_node_info() to copy valid data from node info suggested by Jaegeuk Kim, it can avoid bug.

Signed-off-by: Chao Yu <chao2...@samsung.com>
---
 fs/f2fs/node.c |  4 ++--
 fs/f2fs/node.h | 33 ++++++++++++++++++++++-----------
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index f83326c..5aa54a0 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -268,7 +268,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
 	e = __lookup_nat_cache(nm_i, ni->nid);
 	if (!e) {
 		e = grab_nat_entry(nm_i, ni->nid);
-		e->ni = *ni;
+		copy_node_info(&e->ni, ni);
 		f2fs_bug_on(sbi, ni->blk_addr == NEW_ADDR);
 	} else if (new_blkaddr == NEW_ADDR) {
 		/*
@@ -276,7 +276,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
 		 * previous nat entry can be remained in nat cache.
 		 * So, reinitialize it with new information.
 		 */
-		e->ni = *ni;
+		copy_node_info(&e->ni, ni);
 		f2fs_bug_on(sbi, ni->blk_addr != NULL_ADDR);
 	}

diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index d10b644..eb59167 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -29,6 +29,14 @@
 /* return value for read_node_page */
 #define LOCKED_PAGE	1

+/* For flag in struct node_info */
+enum {
+	IS_CHECKPOINTED,	/* is it checkpointed before? */
+	HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
+	HAS_LAST_FSYNC,		/* has the latest node fsync mark? */
+	IS_DIRTY,		/* this nat entry is dirty? */
+};
+
 /*
  * For node information
  */
@@ -37,18 +45,11 @@ struct node_info {
 	nid_t ino;		/* inode number of the node's owner */
 	block_t	blk_addr;	/* block address of the node */
 	unsigned char version;	/* version of the node */
-};
-
-enum {
-	IS_CHECKPOINTED,	/* is it checkpointed before? */
-	HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
-	HAS_LAST_FSYNC,		/* has the latest node fsync mark? */
-	IS_DIRTY,		/* this nat entry is dirty? */
+	unsigned char flag;	/* for node information bits */
 };

 struct nat_entry {
 	struct list_head list;	/* for clean or dirty nat list */
-	unsigned char flag;	/* for node information bits */
 	struct node_info ni;	/* in-memory node information */
 };

@@ -63,20 +64,30 @@ struct nat_entry {

 #define inc_node_version(version)	(++version)

+static inline void copy_node_info(struct node_info *dst,
+						struct node_info *src)
+{
+	dst->nid = src->nid;
+	dst->ino = src->ino;
+	dst->blk_addr = src->blk_addr;
+	dst->version = src->version;
+	/* should not copy flag here */
+}
+
 static inline void set_nat_flag(struct nat_entry *ne,
 				unsigned int type, bool set)
 {
 	unsigned char mask = 0x01 << type;
 	if (set)
-		ne->flag |= mask;
+		ne->ni.flag |= mask;
 	else
-		ne->flag &= ~mask;
+		ne->ni.flag &= ~mask;
 }

 static inline bool get_nat_flag(struct nat_entry *ne, unsigned int type)
 {
 	unsigned char mask = 0x01 << type;
-	return ne->flag & mask;
+	return ne->ni.flag & mask;
 }

 static inline void nat_reset_flag(struct nat_entry *ne)
--
2.1.2
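Note: the reason the v2 patch introduces copy_node_info() instead of a plain struct assignment can be sketched in user-space C as below. Once the flag byte lives inside struct node_info, assigning the whole struct would overwrite the cached flag bits. This is an illustration under stated assumptions, not the kernel code: uint32_t stands in for nid_t and block_t, and the flag helpers here take a struct node_info pointer directly rather than a struct nat_entry.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
	IS_CHECKPOINTED,	/* is it checkpointed before? */
	HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
	HAS_LAST_FSYNC,		/* has the latest node fsync mark? */
	IS_DIRTY,		/* this entry is dirty? */
};

/* Stand-in layout: version and flag sit next to each other. */
struct node_info {
	uint32_t nid;
	uint32_t ino;
	uint32_t blk_addr;
	unsigned char version;
	unsigned char flag;
};

/* Copy the node data but leave the destination's flag bits alone. */
static void copy_node_info(struct node_info *dst, const struct node_info *src)
{
	dst->nid = src->nid;
	dst->ino = src->ino;
	dst->blk_addr = src->blk_addr;
	dst->version = src->version;
}

/* Set or clear one flag bit, addressed by its enum position. */
static void set_nat_flag(struct node_info *ni, unsigned int type, bool set)
{
	unsigned char mask = (unsigned char)(1u << type);

	if (set)
		ni->flag |= mask;
	else
		ni->flag &= (unsigned char)~mask;
}

/* Test one flag bit. */
static bool get_nat_flag(const struct node_info *ni, unsigned int type)
{
	return (ni->flag & (1u << type)) != 0;
}
```

On common ABIs where uint32_t is 4-byte aligned, the two unsigned char members share space that would otherwise be tail padding, so adding flag should not grow struct node_info; the exact sizes of the enclosing entry still depend on pointer width and alignment, which is the point raised in the review above.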
[f2fs-dev] [PATCH 1/3] f2fs: check if inode state is dirty at fsync
If inode state is dirty, go straight to write.

Suggested-by: Jaegeuk Kim <jaeg...@kernel.org>
Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/file.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index b6f3fbf..0b97002 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -138,6 +138,17 @@ static inline bool need_do_checkpoint(struct inode *inode)
 	return need_cp;
 }

+static bool need_inode_page_update(struct f2fs_sb_info *sbi, nid_t ino)
+{
+	struct page *i = find_get_page(NODE_MAPPING(sbi), ino);
+	bool ret = false;
+	/* But we need to avoid that there are some inode updates */
+	if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino))
+		ret = true;
+	f2fs_put_page(i, 0);
+	return ret;
+}
+
 int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 {
 	struct inode *inode = file->f_mapping->host;
@@ -168,19 +179,21 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 	}

+	/* if the inode is dirty, let's recover all the time */
+	if (!datasync && is_inode_flag_set(fi, FI_DIRTY_INODE)) {
+		update_inode_page(inode);
+		goto go_write;
+	}
+
 	/*
 	 * if there is no written data, don't waste time to write recovery info.
 	 */
 	if (!is_inode_flag_set(fi, FI_APPEND_WRITE) &&
 			!exist_written_data(sbi, ino, APPEND_INO)) {
-		struct page *i = find_get_page(NODE_MAPPING(sbi), ino);

-		/* But we need to avoid that there are some inode updates */
-		if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino)) {
-			f2fs_put_page(i, 0);
+		/* it may call write_inode just prior to fsync */
+		if (need_inode_page_update(sbi, ino))
 			goto go_write;
-		}
-		f2fs_put_page(i, 0);

 		if (is_inode_flag_set(fi, FI_UPDATE_WRITE) ||
 				exist_written_data(sbi, ino, UPDATE_INO))
--
1.9.1
Re: [f2fs-dev] [PATCH] f2fs: check if inode's state is dirty or not before skip fsync
On Thu, Dec 04, 2014 at 04:58:29PM -0800, Jaegeuk Kim wrote:
On Wed, Dec 03, 2014 at 10:46:38AM +0900, Changman Lee wrote:
Hi Jaegeuk,

Thanks for explanation.

On Tue, Dec 02, 2014 at 11:42:19AM -0800, Jaegeuk Kim wrote:
On Tue, Dec 02, 2014 at 01:21:31PM +0900, Changman Lee wrote:
Hi,

f2fs_dirty_inode just set fi->flag as FI_DIRTY_INODE not to call update_inode_page. Instead, we do it when f2fs_write_inode is called. Do you have any reason to do like this?

Actually, I'd like to use inode caches instead of dirty node pages as much as possible to mitigate memory pressure as well as reduce node page writes. But, the reality is that f2fs triggers update_inode_page frequently, since some inode information like i_blocks and i_links should be recovered consistently from sudden power-cuts.

I got it. No objection.

How about move update_inode_page from write_inode to dirty_inode? And we can update inode page when mark_inode_dirty or mark_inode_dirty_sync is called. Then, we control write I/O in write_inode according to wbc->sync_mode.

What do you mean controlling write I/O in write_inode? The write_inode does not trigger any I/Os. We're controlling node page writes by f2fs_write_node_pages.

Sorry, it's not enough for my explanation. At __writeback_single_inode, it calls write_inode if inode is dirty. And at ext4_write_inode and btrfs_write_inode, they issue write according to wbc->sync_mode. However, current f2fs doesn't issue any write i/o. Could you review it?

Hi,

Well, I'm not quite sure that f2fs should do this. In terms of recovery, we don't need to do this.

Anyway, if we call update_inode_page in mark_inode_dirty, f2fs would suffer from a lot of dirty node pages.

Got it. But I think we should write dirty node after update_inode_page in write_inode if wbc->sync_mode == WB_SYNC_ALL.

Why do we have to do this? Again, there is no problem wrt recovery, but that causes unnecessary IOs.

Finally, I have one more question. At f2fs_sync_file, in the case of need_cp is true and file_wrong_pino f2fs calls write_inode. But the inode isn't written back. Is it okay? Could you elaborate on it?

No problem. That pino will be used only for fsynced inodes after checkpoint.

I got it. My concern was started from this. If there is no problem, I think current f2fs_write_inode is also no problem. Thanks Jaegeuk.

Then, let's merge your suggestion below.

Lastly, I have curiosity related to write node; APPEND or UPDATE. Before fsync is called, isn't there any possibility to be changed to APPEND from UPDATE. If so, we might lost recovery info. I think we'd better check if there is a situation.

Regards,
Changman

Thanks,

Thanks,

Thanks,

Could you consider this once?

Thanks,

On Mon, Dec 01, 2014 at 02:52:57PM -0800, Jaegeuk Kim wrote:
On Mon, Dec 01, 2014 at 04:05:20PM +0900, Changman Lee wrote:
It makes sense to check inode's state than check if inode page is dirty or not.

Nice catch. However, we should leave the original condition, since write_inode can be called in prior to this fsync call. And, this is not a proper fix, since it still can skip to write its inode page.

How about this one?

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 146e58a..6690599 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -168,6 +168,12 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 	}

+	/* if the inode is dirty, let's recover all the time */
+	if (is_inode_flag_set(fi, FI_DIRTY_INODE)) {
+		update_inode_page(inode);
+		goto go_write;
+	}
+
 	/*
 	 * if there is no written data, don't waste time to write recovery info.
 	 */
--
2.1.1

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/file.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7c2ec3e..0c5ae87 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -173,14 +173,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	if (!is_inode_flag_set(fi, FI_APPEND_WRITE) &&
 			!exist_written_data(sbi, ino, APPEND_INO)) {
-		struct page *i = find_get_page(NODE_MAPPING(sbi), ino);

 		/* But we need to avoid that there are some inode updates */
-		if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino)) {
-			f2fs_put_page(i, 0);
+		if (is_inode_flag_set(fi
Re: [f2fs-dev] [PATCH] f2fs: check if inode's state is dirty or not before skip fsync
Hi Jaegeuk,

Thanks for explanation.

On Tue, Dec 02, 2014 at 11:42:19AM -0800, Jaegeuk Kim wrote:
On Tue, Dec 02, 2014 at 01:21:31PM +0900, Changman Lee wrote:
Hi,

f2fs_dirty_inode just set fi->flag as FI_DIRTY_INODE not to call update_inode_page. Instead, we do it when f2fs_write_inode is called. Do you have any reason to do like this?

Actually, I'd like to use inode caches instead of dirty node pages as much as possible to mitigate memory pressure as well as reduce node page writes. But, the reality is that f2fs triggers update_inode_page frequently, since some inode information like i_blocks and i_links should be recovered consistently from sudden power-cuts.

I got it. No objection.

How about move update_inode_page from write_inode to dirty_inode? And we can update inode page when mark_inode_dirty or mark_inode_dirty_sync is called. Then, we control write I/O in write_inode according to wbc->sync_mode.

What do you mean controlling write I/O in write_inode? The write_inode does not trigger any I/Os. We're controlling node page writes by f2fs_write_node_pages.

Sorry, it's not enough for my explanation. At __writeback_single_inode, it calls write_inode if inode is dirty. And at ext4_write_inode and btrfs_write_inode, they issue write according to wbc->sync_mode. However, current f2fs doesn't issue any write i/o. Could you review it?

Anyway, if we call update_inode_page in mark_inode_dirty, f2fs would suffer from a lot of dirty node pages.

Got it. But I think we should write dirty node after update_inode_page in write_inode if wbc->sync_mode == WB_SYNC_ALL.

Finally, I have one more question. At f2fs_sync_file, in the case of need_cp is true and file_wrong_pino f2fs calls write_inode. But the inode isn't written back. Is it okay? Could you elaborate on it?

Thanks,

Thanks,

Could you consider this once?

Thanks,

On Mon, Dec 01, 2014 at 02:52:57PM -0800, Jaegeuk Kim wrote:
On Mon, Dec 01, 2014 at 04:05:20PM +0900, Changman Lee wrote:
It makes sense to check inode's state than check if inode page is dirty or not.

Nice catch. However, we should leave the original condition, since write_inode can be called in prior to this fsync call. And, this is not a proper fix, since it still can skip to write its inode page.

How about this one?

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 146e58a..6690599 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -168,6 +168,12 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 	}

+	/* if the inode is dirty, let's recover all the time */
+	if (is_inode_flag_set(fi, FI_DIRTY_INODE)) {
+		update_inode_page(inode);
+		goto go_write;
+	}
+
 	/*
 	 * if there is no written data, don't waste time to write recovery info.
 	 */
--
2.1.1

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/file.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7c2ec3e..0c5ae87 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -173,14 +173,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	if (!is_inode_flag_set(fi, FI_APPEND_WRITE) &&
 			!exist_written_data(sbi, ino, APPEND_INO)) {
-		struct page *i = find_get_page(NODE_MAPPING(sbi), ino);

 		/* But we need to avoid that there are some inode updates */
-		if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino)) {
-			f2fs_put_page(i, 0);
+		if (is_inode_flag_set(fi, FI_DIRTY_INODE) ||
+				need_inode_block_update(sbi, ino))
 			goto go_write;
-		}
-		f2fs_put_page(i, 0);

 		if (is_inode_flag_set(fi, FI_UPDATE_WRITE) ||
 				exist_written_data(sbi, ino, UPDATE_INO))
--
1.9.1
[f2fs-dev] f2fs_write_inode
Hi guys,

I was wondering why f2fs_write_inode doesn't submit any I/O according to wbc->sync_mode. If you have any idea, answer to my questions, please.

And at f2fs_sync_file,

	if (need_cp) {

Q: We've already called sync_fs. Is there any scenario like below ?
   I referred to 354a3399dc6f7e556d04e1c731cd50e08eeb44bd but I can't guess the situation.

		if (file_wrong_pino(inode) && inode->i_nlink == 1 &&
				get_parent_ino(inode, &pino)) {
			fi->i_pino = pino;
			file_got_pino(inode);
			up_write(&fi->i_sem);
			mark_inode_dirty_sync(inode);

Q: Update but no write I/O. How to recover after SPO ?

			ret = f2fs_write_inode(inode, NULL);
			if (ret)
				goto out;
		} else {
			up_write(&fi->i_sem);
		}
	} else {
	~ snip ~
out:
	return ret;

Regards,
Changman
Re: [f2fs-dev] [PATCH] f2fs: check if inode's state is dirty or not before skip fsync
Hi,

f2fs_dirty_inode just set fi->flag as FI_DIRTY_INODE not to call update_inode_page. Instead, we do it when f2fs_write_inode is called. Do you have any reason to do like this?

How about move update_inode_page from write_inode to dirty_inode? And we can update inode page when mark_inode_dirty or mark_inode_dirty_sync is called. Then, we control write I/O in write_inode according to wbc->sync_mode.

Could you consider this once?

Thanks,

On Mon, Dec 01, 2014 at 02:52:57PM -0800, Jaegeuk Kim wrote:
On Mon, Dec 01, 2014 at 04:05:20PM +0900, Changman Lee wrote:
It makes sense to check inode's state than check if inode page is dirty or not.

Nice catch. However, we should leave the original condition, since write_inode can be called in prior to this fsync call. And, this is not a proper fix, since it still can skip to write its inode page.

How about this one?

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 146e58a..6690599 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -168,6 +168,12 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		return ret;
 	}

+	/* if the inode is dirty, let's recover all the time */
+	if (is_inode_flag_set(fi, FI_DIRTY_INODE)) {
+		update_inode_page(inode);
+		goto go_write;
+	}
+
 	/*
 	 * if there is no written data, don't waste time to write recovery info.
 	 */
--
2.1.1

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/file.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7c2ec3e..0c5ae87 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -173,14 +173,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	if (!is_inode_flag_set(fi, FI_APPEND_WRITE) &&
 			!exist_written_data(sbi, ino, APPEND_INO)) {
-		struct page *i = find_get_page(NODE_MAPPING(sbi), ino);

 		/* But we need to avoid that there are some inode updates */
-		if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino)) {
-			f2fs_put_page(i, 0);
+		if (is_inode_flag_set(fi, FI_DIRTY_INODE) ||
+				need_inode_block_update(sbi, ino))
 			goto go_write;
-		}
-		f2fs_put_page(i, 0);

 		if (is_inode_flag_set(fi, FI_UPDATE_WRITE) ||
 				exist_written_data(sbi, ino, UPDATE_INO))
--
1.9.1
[f2fs-dev] [PATCH] f2fs: more fast lookup for gc_inode list
If there are many inodes that have data blocks in victim segment, it takes long time to find a inode in gc_inode list. Let's use radix_tree to reduce lookup time. Signed-off-by: Changman Lee cm224@samsung.com --- fs/f2fs/gc.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 29fc7e5..fc765c1 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -24,6 +24,7 @@ #include gc.h #include trace/events/f2fs.h +RADIX_TREE(gc_inode_root, GFP_ATOMIC); static struct kmem_cache *winode_slab; static int gc_thread_func(void *data) @@ -338,13 +339,13 @@ static const struct victim_selection default_v_ops = { .get_victim = get_victim_by_default, }; -static struct inode *find_gc_inode(nid_t ino, struct list_head *ilist) +static struct inode *find_gc_inode(nid_t ino) { struct inode_entry *ie; - list_for_each_entry(ie, ilist, list) - if (ie-inode-i_ino == ino) - return ie-inode; + ie = radix_tree_lookup(gc_inode_root, ino); + if (ie) + return ie-inode; return NULL; } @@ -352,13 +353,19 @@ static void add_gc_inode(struct inode *inode, struct list_head *ilist) { struct inode_entry *new_ie; - if (inode == find_gc_inode(inode-i_ino, ilist)) { + new_ie = radix_tree_lookup(gc_inode_root, inode-i_ino); + if (new_ie) { iput(inode); return; } new_ie = f2fs_kmem_cache_alloc(winode_slab, GFP_NOFS); new_ie-inode = inode; + + if (radix_tree_insert(gc_inode_root, inode-i_ino, new_ie)) { + kmem_cache_free(winode_slab, new_ie); + return; + } list_add_tail(new_ie-list, ilist); } @@ -367,7 +374,7 @@ static void put_gc_inode(struct list_head *ilist) struct inode_entry *ie, *next_ie; list_for_each_entry_safe(ie, next_ie, ilist, list) { iput(ie-inode); - list_del(ie-list); + radix_tree_delete(gc_inode_root, ie-inode-i_ino); kmem_cache_free(winode_slab, ie); } } @@ -614,7 +621,7 @@ next_step: } /* phase 3 */ - inode = find_gc_inode(dni.ino, ilist); + inode = find_gc_inode(dni.ino); if (inode) { start_bidx = start_bidx_of_node(nofs, F2FS_I(inode)); -- 1.9.1 
-- Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server from Actuate! Instantly Supercharge Your Business Reports and Dashboards with Interactivity, Sharing, Native Excel Exports, App Integration more Get technology previously reserved for billion-dollar corporations, FREE http://pubads.g.doubleclick.net/gampad/clk?id=157005751iu=/4140/ostg.clktrk ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
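The lookup change in the patch above can be illustrated outside the kernel. The following is only a user-space sketch, not the kernel radix tree API: `struct entry`, `list_find`, `rt_insert` and `rt_lookup` are hypothetical names invented for this illustration. It contrasts a linear list walk, whose cost grows with the number of entries, with a fixed-depth radix tree keyed by inode number, whose lookup always takes the same number of steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct inode_entry. */
struct entry {
	uint32_t ino;
	struct entry *next;	/* list linkage, like the gc_inode list */
};

/* Linear search: the cost grows with the number of entries. */
static struct entry *list_find(struct entry *head, uint32_t ino)
{
	for (struct entry *e = head; e; e = e->next)
		if (e->ino == ino)
			return e;
	return NULL;
}

/* Toy radix tree: 8 levels of 4 bits each cover a 32-bit key. */
#define RT_BITS   4
#define RT_FANOUT (1u << RT_BITS)
#define RT_LEVELS (32 / RT_BITS)

struct rt_node {
	void *slot[RT_FANOUT];
};

/* Extract the 4-bit digit of the key used at the given level. */
static unsigned rt_index(uint32_t key, int level)
{
	return (key >> (RT_BITS * (RT_LEVELS - 1 - level))) & (RT_FANOUT - 1);
}

/* Returns 0 on success, -1 if the key exists or memory runs out. */
static int rt_insert(struct rt_node *root, uint32_t key, void *item)
{
	struct rt_node *n = root;
	unsigned leaf;

	for (int level = 0; level < RT_LEVELS - 1; level++) {
		unsigned i = rt_index(key, level);

		if (!n->slot[i]) {
			n->slot[i] = calloc(1, sizeof(struct rt_node));
			if (!n->slot[i])
				return -1;
		}
		n = n->slot[i];
	}
	leaf = rt_index(key, RT_LEVELS - 1);
	if (n->slot[leaf])
		return -1;
	n->slot[leaf] = item;
	return 0;
}

/* Lookup walks at most RT_LEVELS slots, however many items are stored. */
static void *rt_lookup(const struct rt_node *root, uint32_t key)
{
	const struct rt_node *n = root;

	for (int level = 0; level < RT_LEVELS - 1 && n; level++)
		n = n->slot[rt_index(key, level)];
	return n ? n->slot[rt_index(key, RT_LEVELS - 1)] : NULL;
}
```

A caller would zero-initialise a `struct rt_node` as the root, insert entries by inode number, and use `rt_lookup` where the list walk was used before. The sketch never frees interior nodes, which a real implementation would have to do.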
[f2fs-dev] [PATCH] f2fs: move put_gc_inode into gc_mutex
There is no lock to protect the gc_inode list, so let's move put_gc_inode
inside gc_mutex; otherwise links of the list might be lost.

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/gc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 657683c9..99e1720 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -733,9 +733,9 @@ gc_more:
 
 	if (gc_type == FG_GC)
 		write_checkpoint(sbi, &cpc);
 stop:
-	mutex_unlock(&sbi->gc_mutex);
-
 	put_gc_inode(&ilist);
+
+	mutex_unlock(&sbi->gc_mutex);
 	return ret;
 }
-- 
1.9.1
Re: [f2fs-dev] [PATCH] f2fs: move put_gc_inode into gc_mutex
On Thu, Nov 27, 2014 at 07:55:14PM -0800, Jaegeuk Kim wrote:
> Hi Changman,
>
> On Thu, Nov 27, 2014 at 06:42:54PM +0900, Changman Lee wrote:
> > There is no lock to protect the gc_inode list, so let's move put_gc_inode
> > inside gc_mutex; otherwise links of the list might be lost.
>
> Could you explain why the links can be lost?
> Cause the ilist is a local variable.

Hi Jaegeuk,

Oh, I missed that ilist is a local variable.
Sorry, ignore this patch.

Thanks,

> IIRC, the reason why put_gc_inode is called outside of gc_mutex is to avoid
> deadlock between f2fs_evict_inode and gc operations.
> I'm not sure it still has a problem, but it is unclear that we have to move
> put_gc_inode inside gc_mutex.
>
> Are you facing with any bug on this?
>
> Thanks,
>
> > Signed-off-by: Changman Lee <cm224@samsung.com>
> > ---
> >  fs/f2fs/gc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 657683c9..99e1720 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -733,9 +733,9 @@ gc_more:
> >  
> >  	if (gc_type == FG_GC)
> >  		write_checkpoint(sbi, &cpc);
> >  stop:
> > -	mutex_unlock(&sbi->gc_mutex);
> > -
> >  	put_gc_inode(&ilist);
> > +
> > +	mutex_unlock(&sbi->gc_mutex);
> >  	return ret;
> >  }
> > --
> > 1.9.1
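The deadlock concern recalled in the reply above can be sketched in user space. Everything here is a toy model built on an assumption: that dropping the last reference to an inode may run an eviction path which itself needs the GC lock. The thread only recalls this from memory ("IIRC") and does not confirm it still applies. `toy_mutex`, `toy_iput`, `toy_evict`, `put_inside_lock` and `put_after_unlock` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock that counts would-be self-deadlocks. */
struct toy_mutex {
	bool held;
	int would_deadlock;
};

/* Returns false (and counts it) where a real mutex would hang. */
static bool toy_lock(struct toy_mutex *m)
{
	if (m->held) {
		m->would_deadlock++;
		return false;
	}
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

struct toy_inode {
	int refcount;
};

/* ASSUMPTION: eviction needs the GC lock; this models the recalled hazard. */
static void toy_evict(struct toy_mutex *gc_lock)
{
	if (toy_lock(gc_lock))
		toy_unlock(gc_lock);
}

/* Dropping the last reference triggers eviction. */
static void toy_iput(struct toy_inode *inode, struct toy_mutex *gc_lock)
{
	if (--inode->refcount == 0)
		toy_evict(gc_lock);
}

/* Variant 1: references are dropped while the GC lock is still held. */
static int put_inside_lock(struct toy_inode *inodes, size_t n)
{
	struct toy_mutex gc_lock = { false, 0 };

	toy_lock(&gc_lock);
	for (size_t i = 0; i < n; i++)
		toy_iput(&inodes[i], &gc_lock);
	toy_unlock(&gc_lock);
	return gc_lock.would_deadlock;
}

/* Variant 2: the lock is released first, then references are dropped. */
static int put_after_unlock(struct toy_inode *inodes, size_t n)
{
	struct toy_mutex gc_lock = { false, 0 };

	toy_lock(&gc_lock);
	/* GC work would happen here while the lock is held. */
	toy_unlock(&gc_lock);
	for (size_t i = 0; i < n; i++)
		toy_iput(&inodes[i], &gc_lock);
	return gc_lock.would_deadlock;
}
```

Under the stated assumption, the first variant reports one would-be deadlock per inode whose last reference is dropped, while the second reports none. Because the list being released is local to the caller, dropping the references after the unlock does not expose it to other threads.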
[f2fs-dev] [PATCH] f2fs: cleanup if-statement of phase in gc_data_segment
Little cleanup to distinguish each phase easily

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/gc.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 81686b2..de00713 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -602,27 +602,28 @@ next_step:
 
 			data_page = find_data_page(inode,
 					start_bidx + ofs_in_node, false);
-			if (IS_ERR(data_page))
-				goto next_iput;
+			if (IS_ERR(data_page)) {
+				iput(inode);
+				continue;
+			}
 
 			f2fs_put_page(data_page, 0);
 			add_gc_inode(inode, ilist);
-		} else {
-			inode = find_gc_inode(dni.ino, ilist);
-			if (inode) {
-				start_bidx = start_bidx_of_node(nofs,
-								F2FS_I(inode));
-				data_page = get_lock_data_page(inode,
-						start_bidx + ofs_in_node);
-				if (IS_ERR(data_page))
-					continue;
-				move_data_page(inode, data_page, gc_type);
-				stat_inc_data_blk_count(sbi, 1);
-			}
+			continue;
+		}
+
+		/* phase 3 */
+		inode = find_gc_inode(dni.ino, ilist);
+		if (inode) {
+			start_bidx = start_bidx_of_node(nofs,
+							F2FS_I(inode));
+			data_page = get_lock_data_page(inode,
+					start_bidx + ofs_in_node);
+			if (IS_ERR(data_page))
+				continue;
+			move_data_page(inode, data_page, gc_type);
+			stat_inc_data_blk_count(sbi, 1);
 		}
-		continue;
-next_iput:
-		iput(inode);
 	}
 
 	if (++phase < 4)
-- 
1.9.1
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Simon, Thanks very much for your interest. It becomes more clear due to your explanation. Regards, Changman On Tue, Nov 25, 2014 at 08:05:23PM +0100, Simon Baatz wrote: Hi Changman, On Mon, Nov 24, 2014 at 11:46:46AM +0900, Changman Lee wrote: Hi Simon, Thanks for your explanation kindly. On Sun, Nov 23, 2014 at 11:08:54AM +0100, Simon Baatz wrote: Hi Changman, Jaegeuk, On Thu, Nov 20, 2014 at 05:47:29PM +0900, Changman Lee wrote: On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote: On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote: Hi Jaegeuk, We should call flush_dcache_page before kunmap because the purpose of the cache flush is to address aliasing problem related to virtual address. Oh, I just followed zero_user_segments below. static inline void zero_user_segments(struct page *page, unsigned start1, unsigned end1, unsigned start2, unsigned end2) { void *kaddr = kmap_atomic(page); BUG_ON(end1 PAGE_SIZE || end2 PAGE_SIZE); if (end1 start1) memset(kaddr + start1, 0, end1 - start1); if (end2 start2) memset(kaddr + start2, 0, end2 - start2); kunmap_atomic(kaddr); flush_dcache_page(page); } Is this a wrong reference? Or, a bug? Well.. Data in cache only have to be flushed until before other users read the data. If so, it's not a bug. Yes, it is not a bug, since flush_dcache_page() needs to be able to deal with non-kmapped pages. However, this may create overhead in some situations. Previously, I was vague but I thought that it should be different according to vaddr exists or not. So I told jaegeuk that it should be better to change an order between flush_dache_page and kunmap. But actually, it doesn't matter the order between them except the situation you said. Could you explain the situation that makes overhead by flushing after kummap. I can't imagine it by just seeing flush_dcache_page code. I was a not very precise here. 
Yes, flush_dcache_page() on ARM does the same in both situations since it has no idea whether it is called before or after kunmap. However, flush_kernel_dcache_page() can assume that it is called before kunmap and thus, for example, does not need to pin a highmem page by kmap_high_get() (apart from not having to care about flushing user space mappings) According to documentation (see Documentation/cachetlb.txt), this is a use for flush_kernel_dcache_page(), since the page has been modified by the kernel only. In contrast to flush_dcache_page(), this function must be called before kunmap(). flush_kernel_dcache_page() does not need to flush the user space aliases. Additionally, at least on ARM, it does not flush at all when called within kmap_atomic()/kunmap_atomic(), when kunmap_atomic() is going to flush the page anyway. (I know that almost no one uses flush_kernel_dcache_page() (probably because almost no one knows when to use which of the two functions), but it may save a few cache flushes on architectures which are affected by aliasing) Anyway I modified as below. Thanks, From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim jaeg...@kernel.org Date: Tue, 18 Nov 2014 10:50:21 -0800 Subject: [PATCH] f2fs: call flush_dcache_page when the page was updated Whenever f2fs updates mapped pages, it needs to call flush_dcache_page. Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- fs/f2fs/dir.c| 7 ++- fs/f2fs/inline.c | 2 ++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index 5a49995..fabf4ee 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de, f2fs_wait_on_page_writeback(page, type); de-ino = cpu_to_le32(inode-i_ino); set_de_type(de, inode); - if (!f2fs_has_inline_dentry(dir)) + if (!f2fs_has_inline_dentry(dir)) { + flush_dcache_page(page); kunmap(page); + } Is this a page that may be mapped into user space? 
(I may be completely wrong here, since I have no idea how this code works. But it looks like as if the answer is no ;-) ). It is not necessary to flush pages that cannot be seen by user space (see also the NOTE in the documentation of flush_dcache_page() in cachetlb.txt). Thus, if you know that a page will not be mapped into user space, please don't create the overhead of flushing it. In the case of dentry unlike inline data
Re: [f2fs-dev] [PATCH] f2fs: add cleancache support
On Sun, Nov 23, 2014 at 11:18:00PM -0800, Jaegeuk Kim wrote: On Mon, Nov 24, 2014 at 03:19:43PM +0900, Changman Lee wrote: On Sun, Nov 23, 2014 at 09:42:12PM -0800, Jaegeuk Kim wrote: On Thu, Nov 20, 2014 at 01:38:51PM +0900, Changman Lee wrote: On Fri, Nov 14, 2014 at 02:53:02PM +0900, Changman Lee wrote: On Thu, Nov 13, 2014 at 05:27:51PM -0800, Jaegeuk Kim wrote: Hi Changman, On Thu, Nov 13, 2014 at 02:34:50PM +0900, Changman Lee wrote: To use cleancache, fs must explicitly enable cleancache by calling cleancache_init_fs. Good catch! Prior to merge this patch, can you share any testing results or performance numbers? Not yet, I'll try to get numbers. Hi, This is the result of kernel compile on xen-4.4 enabled tmem : cleancache and frontswap. I'm afraid that there is little difference by cleancache. The cleancache shows a few cache hits but the effect through it doesn't show. I don't know best benchmark to testify it yet. Finally, I couldn't discover any bug during test. [before patch] 1 2 3 Elapsed time25:00.6725:07.0925:00.38 Major fault 31100 31410 31333 Minor fault 276869398 276869318 276871144 [after patch] 1 2 3 Elapsed time25:12.3425:13.2925:11.99 Major fault 31559 32069 31801 Minor fault 276870283 276868046 276869251 [cleancache] - diff between start and end 1 2 3 failed_gets 1277980 1296355 1300368 invalidates 2588227 2651722 2655285 puts1289970 1323685 1320623 *succ_gets* 11 121299 114310 Hi Changman, So, what is your suggestion? IMO, we first need to find a way exploiting cleancache over f2fs, so that we can introduce some guide for users. Until then, how about keeping this patch for a while? The performance of cleancache depends on workload but ext4 and btrfs support it already. So how about allowing to enable cleancache on f2fs? If backend of cleancache doesn't exists, there is no effect for f2fs. I think negative effectness of cleancache is little. Anyway, a final decision lies in your hand. 
I'm not sure, but it seems that nobody uses the cleancache. https://www.google.co.kr/trends/explore#q=cleancache And, as you've shown even worse performance under a simple workload, I don't understand why you want to add this. Let me know, if I'm missing any rationale. Okay, let's keep it until before finding a way exploiting it well. I thought to estimate firefox's startup time. To do it, however, I needed to install ubuntu on f2fs. It takes long time to set up test environment. So I gave up. :( I have no rationale now. Thanks Thanks, Thanks, Changman What condition will be the best way to exploit f2fs and cleancache? Not clear. I think we can make a cleancache client for f2fs so that can compenstate a penalty of node pages which are read mostly. Can we confirm that f2fs satisfies most of requirements described by cleancache.txt below? Good point. At a quick glance, F2FS seems to satisfy most of requirements. Through a experimental, I'll try to check side effect. Some points for a filesystem to consider: - The FS should be block-device-based (e.g. a ram-based FS such as tmpfs should not enable cleancache) - To ensure coherency/correctness, the FS must ensure that all file removal or truncation operations either go through VFS or add hooks to do the equivalent cleancache invalidate operations - To ensure coherency/correctness, either inode numbers must be unique across the lifetime of the on-disk file OR the FS must provide an encode_fh function. - The FS must call the VFS superblock alloc and deactivate routines or add hooks to do the equivalent cleancache calls done there. - To maximize performance, all pages fetched from the FS should go through the do_mpag_readpage routine or the FS should add hooks to do the equivalent (cf. btrfs) - Currently, the FS blocksize must be the same as PAGESIZE. This is not an architectural restriction, but no backends currently support anything different. 
- A clustered FS should invoke the shared_init_fs cleancache hook to get best performance for some backends. Thanks, Signed-off-by: Changman Lee cm224@samsung.com --- fs/f2fs/super.c | 3
[f2fs-dev] [PATCH 2/2] f2fs: no more dirty_nat_entires when flushing
After flushing dirty nat entries, there must be no dirty nat entries left.

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/node.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index f6bd222..fc1077b 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1925,10 +1925,10 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi,
 	else
 		f2fs_put_page(page, 1);
 
-	if (!set->entry_cnt) {
-		radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set);
-		kmem_cache_free(nat_entry_set_slab, set);
-	}
+	f2fs_bug_on(sbi, set->entry_cnt);
+
+	radix_tree_delete(&NM_I(sbi)->nat_set_root, set->set);
+	kmem_cache_free(nat_entry_set_slab, set);
 }
 
 /*
-- 
1.9.1
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Simon,
Thanks for your kind explanation.

On Sun, Nov 23, 2014 at 11:08:54AM +0100, Simon Baatz wrote:
> Hi Changman, Jaegeuk,
>
> On Thu, Nov 20, 2014 at 05:47:29PM +0900, Changman Lee wrote:
> > On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote:
> > > On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote:
> > > > Hi Jaegeuk,
> > > >
> > > > We should call flush_dcache_page before kunmap because the purpose of the
> > > > cache flush is to address aliasing problem related to virtual address.
> > >
> > > Oh, I just followed zero_user_segments below.
> > >
> > > static inline void zero_user_segments(struct page *page,
> > > 	unsigned start1, unsigned end1,
> > > 	unsigned start2, unsigned end2)
> > > {
> > > 	void *kaddr = kmap_atomic(page);
> > >
> > > 	BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);
> > >
> > > 	if (end1 > start1)
> > > 		memset(kaddr + start1, 0, end1 - start1);
> > >
> > > 	if (end2 > start2)
> > > 		memset(kaddr + start2, 0, end2 - start2);
> > >
> > > 	kunmap_atomic(kaddr);
> > > 	flush_dcache_page(page);
> > > }
> > >
> > > Is this a wrong reference? Or, a bug?
> >
> > Well.. Data in cache only have to be flushed until before other users read the data.
> > If so, it's not a bug.
>
> Yes, it is not a bug, since flush_dcache_page() needs to be able to
> deal with non-kmapped pages. However, this may create overhead in
> some situations.

Previously, I was vague, but I thought that it should differ depending on
whether the vaddr exists or not. So I told Jaegeuk that it would be better
to change the order between flush_dcache_page and kunmap. But actually, the
order between them doesn't matter except in the situation you mentioned.
Could you explain the situation in which flushing after kunmap creates
overhead? I can't figure it out just by reading the flush_dcache_page code.

> According to documentation (see Documentation/cachetlb.txt), this is
> a use for flush_kernel_dcache_page(), since the page has been
> modified by the kernel only. In contrast to flush_dcache_page(),
> this function must be called before kunmap().
>
> flush_kernel_dcache_page() does not need to flush the user space
> aliases. Additionally, at least on ARM, it does not flush at all
> when called within kmap_atomic()/kunmap_atomic(), when
> kunmap_atomic() is going to flush the page anyway. (I know that
> almost no one uses flush_kernel_dcache_page() (probably because
> almost no one knows when to use which of the two functions), but it
> may save a few cache flushes on architectures which are affected by
> aliasing)
>
> > > Anyway I modified as below.
> > >
> > > Thanks,
> > >
> > > From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 2001
> > > From: Jaegeuk Kim <jaeg...@kernel.org>
> > > Date: Tue, 18 Nov 2014 10:50:21 -0800
> > > Subject: [PATCH] f2fs: call flush_dcache_page when the page was updated
> > >
> > > Whenever f2fs updates mapped pages, it needs to call flush_dcache_page.
> > >
> > > Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> > > ---
> > >  fs/f2fs/dir.c    | 7 ++++++-
> > >  fs/f2fs/inline.c | 2 ++
> > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> > > index 5a49995..fabf4ee 100644
> > > --- a/fs/f2fs/dir.c
> > > +++ b/fs/f2fs/dir.c
> > > @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de,
> > >  	f2fs_wait_on_page_writeback(page, type);
> > >  	de->ino = cpu_to_le32(inode->i_ino);
> > >  	set_de_type(de, inode);
> > > -	if (!f2fs_has_inline_dentry(dir))
> > > +	if (!f2fs_has_inline_dentry(dir)) {
> > > +		flush_dcache_page(page);
> > >  		kunmap(page);
> > > +	}
>
> Is this a page that may be mapped into user space? (I may be
> completely wrong here, since I have no idea how this code works. But
> it looks like as if the answer is no ;-) ).
>
> It is not necessary to flush pages that cannot be seen by user space
> (see also the NOTE in the documentation of flush_dcache_page() in
> cachetlb.txt). Thus, if you know that a page will not be mapped into
> user space, please don't create the overhead of flushing it.

In the case of a dentry page, unlike inline data, the page is not mapped
to user space, so the dcache flush only adds overhead. Is that what you mean?

Best regards,
Changman

> - Simon
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
On Wed, Nov 19, 2014 at 10:45:33PM -0800, Jaegeuk Kim wrote:
> On Thu, Nov 20, 2014 at 03:04:10PM +0900, Changman Lee wrote:
> > Hi Jaegeuk,
> >
> > We should call flush_dcache_page before kunmap because the purpose of the
> > cache flush is to address aliasing problem related to virtual address.
>
> Oh, I just followed zero_user_segments below.
>
> static inline void zero_user_segments(struct page *page,
> 	unsigned start1, unsigned end1,
> 	unsigned start2, unsigned end2)
> {
> 	void *kaddr = kmap_atomic(page);
>
> 	BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);
>
> 	if (end1 > start1)
> 		memset(kaddr + start1, 0, end1 - start1);
>
> 	if (end2 > start2)
> 		memset(kaddr + start2, 0, end2 - start2);
>
> 	kunmap_atomic(kaddr);
> 	flush_dcache_page(page);
> }
>
> Is this a wrong reference? Or, a bug?

Well.. Data in cache only have to be flushed until before other users read the data.
If so, it's not a bug.

> Anyway I modified as below.
>
> Thanks,
>
> From 7cb7b27c8cd2efc8a31d79239bef5b41c6e79216 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaeg...@kernel.org>
> Date: Tue, 18 Nov 2014 10:50:21 -0800
> Subject: [PATCH] f2fs: call flush_dcache_page when the page was updated
>
> Whenever f2fs updates mapped pages, it needs to call flush_dcache_page.
>
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> ---
>  fs/f2fs/dir.c    | 7 ++++++-
>  fs/f2fs/inline.c | 2 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 5a49995..fabf4ee 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de,
>  	f2fs_wait_on_page_writeback(page, type);
>  	de->ino = cpu_to_le32(inode->i_ino);
>  	set_de_type(de, inode);
> -	if (!f2fs_has_inline_dentry(dir))
> +	if (!f2fs_has_inline_dentry(dir)) {
> +		flush_dcache_page(page);
>  		kunmap(page);
> +	}
>  	set_page_dirty(page);
>  	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
>  	mark_inode_dirty(dir);
> @@ -365,6 +367,7 @@ static int make_empty_dir(struct inode *inode,
>  	make_dentry_ptr(&d, (void *)dentry_blk, 1);
>  	do_make_empty_dir(inode, parent, &d);
>  
> +	flush_dcache_page(dentry_page);
>  	kunmap_atomic(dentry_blk);
>  
>  	set_page_dirty(dentry_page);
> @@ -578,6 +581,7 @@ fail:
>  		update_inode_page(dir);
>  		clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR);
>  	}
> +	flush_dcache_page(dentry_page);
>  	kunmap(dentry_page);
>  	f2fs_put_page(dentry_page, 1);
>  	return err;
> @@ -660,6 +664,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
>  	bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
>  			NR_DENTRY_IN_BLOCK,
>  			0);
> +	flush_dcache_page(page);
>  	kunmap(page); /* kunmap - pair of f2fs_find_entry */
>  	set_page_dirty(page);
>  
> diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
> index f26fb87..4291c1f 100644
> --- a/fs/f2fs/inline.c
> +++ b/fs/f2fs/inline.c
> @@ -106,6 +106,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page)
>  	src_addr = inline_data_addr(dn->inode_page);
>  	dst_addr = kmap_atomic(page);
>  	memcpy(dst_addr, src_addr, MAX_INLINE_DATA);
> +	flush_dcache_page(page);
>  	kunmap_atomic(dst_addr);
>  	SetPageUptodate(page);
>  no_update:
> @@ -357,6 +358,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, struct page *ipage,
>  	memcpy(dentry_blk->filename, inline_dentry->filename,
>  				NR_INLINE_DENTRY * F2FS_SLOT_LEN);
>  
> +	flush_dcache_page(page);
>  	kunmap_atomic(dentry_blk);
>  	SetPageUptodate(page);
>  	set_page_dirty(page);
> --
> 2.1.1
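The call ordering used in the revised patch (map, write, flush, unmap) can be shown with a user-space mock. This is only an illustration of ordering: `mock_kmap`, `mock_flush`, `mock_kunmap` and `update_page` are hypothetical stand-ins that record a trace, not the kernel functions, and they perform no real cache maintenance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space stand-ins only: none of these are the kernel functions.
 * Each stub appends one letter to a trace so the call order can be checked.
 */
struct mock_page {
	char data[16];
	char trace[8];	/* must start zeroed; stays NUL-terminated */
};

static void trace_add(struct mock_page *p, char c)
{
	size_t n = strlen(p->trace);

	if (n + 1 < sizeof(p->trace))
		p->trace[n] = c;
}

static char *mock_kmap(struct mock_page *p)
{
	trace_add(p, 'M');
	return p->data;
}

static void mock_flush(struct mock_page *p)
{
	trace_add(p, 'F');
}

static void mock_kunmap(struct mock_page *p)
{
	trace_add(p, 'U');
}

/* Update the page contents, flushing while the mapping is still live. */
static void update_page(struct mock_page *p, const char *src, size_t len)
{
	char *kaddr = mock_kmap(p);

	if (len > sizeof(p->data))
		len = sizeof(p->data);
	memcpy(kaddr, src, len);
	mock_flush(p);		/* before the unmap, as in the revised patch */
	mock_kunmap(p);
}
```

After one `update_page` call on a zero-initialised `struct mock_page`, the trace reads "MFU". As the thread notes, the quoted zero_user_segments() helper flushes after kunmap_atomic(), so this mock shows one ordering, not a rule that the other is wrong.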
[f2fs-dev] [PATCH] f2fs: fix wrong data structure when create slab
The slab for sit_entry_set was created with the size of nat_entry_set.

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fs/f2fs/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index e094675..9de857f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2231,7 +2231,7 @@ int __init create_segment_manager_caches(void)
 		goto fail;
 
 	sit_entry_set_slab = f2fs_kmem_cache_create("sit_entry_set",
-			sizeof(struct nat_entry_set));
+			sizeof(struct sit_entry_set));
 	if (!sit_entry_set_slab)
 		goto destory_discard_entry;
 
-- 
1.9.1
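The kind of mismatch fixed above, a cache named after one struct but sized with another, can be illustrated with a toy user-space cache. The structs and helpers below (`small_set`, `large_set`, `toy_cache_create`, `TOY_CACHE_FOR`, `toy_cache_fits`) are hypothetical and are not the f2fs or slab APIs. The macro is one possible way, assumed here purely for illustration, to derive name and size from the same type so they cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real f2fs structs have different fields. */
struct small_set {
	int id;
};

struct large_set {
	int id;
	long payload[8];
};

/* Toy object cache that only remembers its name and object size. */
struct toy_cache {
	const char *name;
	size_t obj_size;
};

static struct toy_cache toy_cache_create(const char *name, size_t obj_size)
{
	struct toy_cache c = { name, obj_size };

	return c;
}

/* Derive both the name and the size from one type token. */
#define TOY_CACHE_FOR(type) toy_cache_create(#type, sizeof(struct type))

/* True when objects from this cache can hold `need` bytes. */
static int toy_cache_fits(const struct toy_cache *c, size_t need)
{
	return c->obj_size >= need;
}
```

Depending on which struct is larger, a mismatched size either wastes memory per object or makes objects too small for their intended use. The sketch only checks the second case and makes no claim about which case applied in f2fs.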
Re: [f2fs-dev] [PATCH] f2fs: add cleancache support
On Fri, Nov 14, 2014 at 02:53:02PM +0900, Changman Lee wrote:
> On Thu, Nov 13, 2014 at 05:27:51PM -0800, Jaegeuk Kim wrote:
> > Hi Changman,
> >
> > On Thu, Nov 13, 2014 at 02:34:50PM +0900, Changman Lee wrote:
> > > To use cleancache, fs must explicitly enable cleancache by calling
> > > cleancache_init_fs.
> >
> > Good catch!
> > Prior to merge this patch, can you share any testing results or performance
> > numbers?
>
> Not yet, I'll try to get numbers.

Hi,

This is the result of a kernel compile on xen-4.4 with tmem enabled
(cleancache and frontswap).
I'm afraid cleancache makes little difference.
Cleancache shows some cache hits, but they have no visible effect.
I don't know the best benchmark to verify it yet.
Finally, I couldn't find any bug during the test.

[before patch]
                       1           2           3
Elapsed time    25:00.67    25:07.09    25:00.38
Major fault        31100       31410       31333
Minor fault    276869398   276869318   276871144

[after patch]
                       1           2           3
Elapsed time    25:12.34    25:13.29    25:11.99
Major fault        31559       32069       31801
Minor fault    276870283   276868046   276869251

[cleancache] - diff between start and end
                       1           2           3
failed_gets      1277980     1296355     1300368
invalidates      2588227     2651722     2655285
puts             1289970     1323685     1320623
*succ_gets*           11      121299      114310

Thanks,
Changman

> > What condition will be the best way to exploit f2fs and cleancache?
>
> Not clear. I think we can make a cleancache client for f2fs so that can
> compenstate a penalty of node pages which are read mostly.
>
> > Can we confirm that f2fs satisfies most of requirements described by
> > cleancache.txt below?
>
> Good point. At a quick glance, F2FS seems to satisfy most of requirements.
> Through a experimental, I'll try to check side effect.
>
> > Some points for a filesystem to consider:
> >
> > - The FS should be block-device-based (e.g. a ram-based FS such
> >   as tmpfs should not enable cleancache)
> > - To ensure coherency/correctness, the FS must ensure that all
> >   file removal or truncation operations either go through VFS or
> >   add hooks to do the equivalent cleancache invalidate operations
> > - To ensure coherency/correctness, either inode numbers must
> >   be unique across the lifetime of the on-disk file OR the
> >   FS must provide an encode_fh function.
> > - The FS must call the VFS superblock alloc and deactivate routines
> >   or add hooks to do the equivalent cleancache calls done there.
> > - To maximize performance, all pages fetched from the FS should
> >   go through the do_mpag_readpage routine or the FS should add
> >   hooks to do the equivalent (cf. btrfs)
> > - Currently, the FS blocksize must be the same as PAGESIZE. This
> >   is not an architectural restriction, but no backends currently
> >   support anything different.
> > - A clustered FS should invoke the shared_init_fs cleancache
> >   hook to get best performance for some backends.
> >
> > Thanks,
> >
> > > Signed-off-by: Changman Lee <cm224@samsung.com>
> > > ---
> > >  fs/f2fs/super.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > > index 512ffd8..2ebb960 100644
> > > --- a/fs/f2fs/super.c
> > > +++ b/fs/f2fs/super.c
> > > @@ -24,6 +24,7 @@
> > >  #include <linux/blkdev.h>
> > >  #include <linux/f2fs_fs.h>
> > >  #include <linux/sysfs.h>
> > > +#include <linux/cleancache.h>
> > >  
> > >  #include "f2fs.h"
> > >  #include "node.h"
> > > @@ -1144,6 +1145,8 @@ try_onemore:
> > >  		if (err)
> > >  			goto free_kobj;
> > >  	}
> > > +
> > > +	cleancache_init_fs(sb);
> > >  	return 0;
> > >  
> > >  free_kobj:
> > > --
> > > 1.9.1
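The counters in the message above can be turned into a hit ratio, succ_gets / (succ_gets + failed_gets), to judge how often cleancache served a page. The sketch below is only a worked calculation over the posted numbers. `cc_stats`, `cc_hit_ppm` and `to_centisec` are hypothetical helper names, and treating "gets" as succ_gets plus failed_gets is an assumption about how the counters relate.

```c
#include <assert.h>
#include <stdint.h>

/* Counters taken from the "[cleancache]" table in the message. */
struct cc_stats {
	uint64_t succ_gets;
	uint64_t failed_gets;
};

/* Hit ratio in parts per million; integer math, rounded down. */
static uint64_t cc_hit_ppm(struct cc_stats s)
{
	uint64_t total = s.succ_gets + s.failed_gets;

	return total ? s.succ_gets * 1000000ull / total : 0;
}

/* Convert an elapsed time "mm:ss.cc" (given as fields) to centiseconds. */
static uint64_t to_centisec(unsigned min, unsigned sec, unsigned centi)
{
	return ((uint64_t)min * 60 + sec) * 100 + centi;
}
```

With the posted counters this gives roughly 8 ppm for run 1, 8.6% for run 2 and 8.1% for run 3. Summing the elapsed times gives 4508.14 s before and 4537.62 s after the patch, so the patched kernel was about 9.8 s (roughly 0.65%) slower per run on average, which matches the observation that the hits did not translate into a speedup for this workload.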
Re: [f2fs-dev] [PATCH 1/3] f2fs: call flush_dcache_page when the page was updated
Hi Jaegeuk,

We should call flush_dcache_page before kunmap because the purpose of the
cache flush is to address aliasing problem related to virtual address.

On Wed, Nov 19, 2014 at 02:35:08PM -0800, Jaegeuk Kim wrote:
> Whenever f2fs updates mapped pages, it needs to call flush_dcache_page.
>
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> ---
>  fs/f2fs/dir.c    | 7 ++++++-
>  fs/f2fs/inline.c | 4 +++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 5a49995..312fbfc 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -287,8 +287,10 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry *de,
>  	f2fs_wait_on_page_writeback(page, type);
>  	de->ino = cpu_to_le32(inode->i_ino);
>  	set_de_type(de, inode);
> -	if (!f2fs_has_inline_dentry(dir))
> +	if (!f2fs_has_inline_dentry(dir)) {
>  		kunmap(page);
> +		flush_dcache_page(page);
> +	}
>  	set_page_dirty(page);
>  	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
>  	mark_inode_dirty(dir);
> @@ -366,6 +368,7 @@ static int make_empty_dir(struct inode *inode,
>  	do_make_empty_dir(inode, parent, &d);
>  
>  	kunmap_atomic(dentry_blk);
> +	flush_dcache_page(dentry_page);
>  
>  	set_page_dirty(dentry_page);
>  	f2fs_put_page(dentry_page, 1);
> @@ -579,6 +582,7 @@ fail:
>  		clear_inode_flag(F2FS_I(dir), FI_UPDATE_DIR);
>  	}
>  	kunmap(dentry_page);
> +	flush_dcache_page(dentry_page);
>  	f2fs_put_page(dentry_page, 1);
>  	return err;
>  }
> @@ -661,6 +665,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
>  			NR_DENTRY_IN_BLOCK,
>  			0);
>  	kunmap(page); /* kunmap - pair of f2fs_find_entry */
> +	flush_dcache_page(page);
>  	set_page_dirty(page);
>  
>  	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
> index f26fb87..8b7cc51 100644
> --- a/fs/f2fs/inline.c
> +++ b/fs/f2fs/inline.c
> @@ -45,8 +45,8 @@ void read_inline_data(struct page *page, struct page *ipage)
>  	src_addr = inline_data_addr(ipage);
>  	dst_addr = kmap_atomic(page);
>  	memcpy(dst_addr, src_addr, MAX_INLINE_DATA);
> -	flush_dcache_page(page);
>  	kunmap_atomic(dst_addr);
> +	flush_dcache_page(page);
>  	SetPageUptodate(page);
>  }
>  
> @@ -107,6 +107,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page)
>  	dst_addr = kmap_atomic(page);
>  	memcpy(dst_addr, src_addr, MAX_INLINE_DATA);
>  	kunmap_atomic(dst_addr);
> +	flush_dcache_page(page);
>  	SetPageUptodate(page);
>  no_update:
>  	/* write data page to try to make data consistent */
> @@ -358,6 +359,7 @@ static int f2fs_convert_inline_dir(struct inode *dir, struct page *ipage,
>  				NR_INLINE_DENTRY * F2FS_SLOT_LEN);
>  
>  	kunmap_atomic(dentry_blk);
> +	flush_dcache_page(page);
>  	SetPageUptodate(page);
>  	set_page_dirty(page);
>  
> --
> 2.1.1
[f2fs-dev] [PATCH] mkfs.f2fs: introduce some macros to simplify coding style
This patch tries to simplify coding style for readability. Rename shortly o rename super_block to sb And, introduce some macros. o set/get_cp o set/get_sb o next/prev_zone, last_zone and last_section o ALIGN, SEG_ALIGN and ZONE_ALIGN Signed-off-by: Changman Lee cm224@samsung.com --- include/f2fs_fs.h | 6 + lib/libf2fs.c | 1 + mkfs/f2fs_format.c | 548 +++- mkfs/f2fs_format_main.c | 1 + 4 files changed, 272 insertions(+), 284 deletions(-) diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h index efddfca..0c3ba04 100644 --- a/include/f2fs_fs.h +++ b/include/f2fs_fs.h @@ -230,6 +230,7 @@ struct f2fs_configuration { u_int32_t cur_seg[6]; u_int32_t segs_per_sec; u_int32_t secs_per_zone; + u_int32_t segs_per_zone; u_int32_t start_sector; u_int64_t total_sectors; u_int32_t sectors_per_blk; @@ -786,4 +787,9 @@ f2fs_hash_t f2fs_dentry_hash(const unsigned char *, int); extern struct f2fs_configuration config; +#define ALIGN(val, size) ((val) + (size) - 1) / (size) +#define SEG_ALIGN(blks)ALIGN(blks, config.blks_per_seg) +#define ZONE_ALIGN(blks) ALIGN(blks, config.blks_per_seg * \ + config.segs_per_zone) + #endif /*__F2FS_FS_H */ diff --git a/lib/libf2fs.c b/lib/libf2fs.c index 14e4164..8123528 100644 --- a/lib/libf2fs.c +++ b/lib/libf2fs.c @@ -357,6 +357,7 @@ void f2fs_init_configuration(struct f2fs_configuration *c) c-overprovision = 5; c-segs_per_sec = 1; c-secs_per_zone = 1; + c-segs_per_zone = 1; c-heap = 1; c-vol_label = ; c-device_name = NULL; diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c index c0028a3..a8d2db6 100644 --- a/mkfs/f2fs_format.c +++ b/mkfs/f2fs_format.c @@ -22,7 +22,71 @@ #include f2fs_format_utils.h extern struct f2fs_configuration config; -struct f2fs_super_block super_block; +struct f2fs_super_block sb; +struct f2fs_checkpoint *cp; + +/* Return first segment number of each area */ +#define prev_zone(cur) (config.cur_seg[cur] - config.segs_per_zone) +#define next_zone(cur) (config.cur_seg[cur] + config.segs_per_zone) +#define last_zone(cur) ((cur 
- 1) * config.segs_per_zone) +#define last_section(cur) (cur + (config.secs_per_zone - 1) * config.segs_per_sec) + +#define set_sb_le64(member, val) (sb.member = cpu_to_le64(val)) +#define set_sb_le32(member, val) (sb.member = cpu_to_le32(val)) +#define set_sb_le16(member, val) (sb.member = cpu_to_le16(val)) +#define get_sb_le64(member)le64_to_cpu(sb.member) +#define get_sb_le32(member)le32_to_cpu(sb.member) +#define get_sb_le16(member)le16_to_cpu(sb.member) + +#define set_sb(member, val)\ + do {\ + typeof(sb.member) t;\ + switch (sizeof(t)) {\ + case 8: set_sb_le64(member, val); break; \ + case 4: set_sb_le32(member, val); break; \ + case 2: set_sb_le16(member, val); break; \ + } \ + } while(0) + +#define get_sb(member) \ + ({ \ + typeof(sb.member) t;\ + switch (sizeof(t)) {\ + case 8: t = get_sb_le64(member); break; \ + case 4: t = get_sb_le32(member); break; \ + case 2: t = get_sb_le16(member); break; \ + } \ + t; \ + }) + +#define set_cp_le64(member, val) (cp-member = cpu_to_le64(val)) +#define set_cp_le32(member, val) (cp-member = cpu_to_le32(val)) +#define set_cp_le16(member, val) (cp-member = cpu_to_le16(val)) +#define get_cp_le64(member)le64_to_cpu(cp-member) +#define get_cp_le32(member)le32_to_cpu(cp-member) +#define get_cp_le16(member)le16_to_cpu(cp-member) + +#define set_cp(member, val)\ + do {\ + typeof(cp-member) t; \ + switch (sizeof(t)) {\ + case 8: set_cp_le64(member, val); break; \ + case 4: set_cp_le32(member, val); break; \ + case 2: set_cp_le16(member, val); break
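The two ideas in the macros above, a ceiling-division helper and a setter that picks the endian conversion from the field width, can be tried in isolation. The following is only a minimal sketch under stated assumptions: `demo_super_block`, `demo_sb`, `demo_set_sb`, `demo_get_sb` and `DEMO_DIV_CEIL` are hypothetical names, not the ones in f2fs-tools, and the conversions come from glibc's `<endian.h>` rather than the project's own `cpu_to_le*` helpers. The sketch wraps the whole division in parentheses so that the macro can be used safely inside a larger expression.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* exposes the <endian.h> helpers on glibc */
#endif
#include <endian.h>
#include <stdint.h>

/*
 * Ceiling division: how many `size`-sized units are needed to hold `val`.
 * The outer parentheses keep the expansion intact in expressions such as
 * `x / DEMO_DIV_CEIL(a, b)`.
 */
#define DEMO_DIV_CEIL(val, size)	(((val) + (size) - 1) / (size))

/* Hypothetical stand-in for an on-disk structure with little-endian fields. */
struct demo_super_block {
	uint64_t block_count;
	uint32_t segment_count;
	uint16_t minor_ver;
};

static struct demo_super_block demo_sb;

/* Store a host-order value, choosing the conversion by the field's width. */
#define demo_set_sb(member, val)					\
	do {								\
		switch (sizeof(demo_sb.member)) {			\
		case 8: demo_sb.member = htole64(val); break;		\
		case 4: demo_sb.member = htole32(val); break;		\
		case 2: demo_sb.member = htole16(val); break;		\
		}							\
	} while (0)

/* Load a field back into host order; the result is widened to 64 bits. */
#define demo_get_sb(member)						\
	(sizeof(demo_sb.member) == 8 ? le64toh(demo_sb.member) :	\
	 sizeof(demo_sb.member) == 4 ? (uint64_t)le32toh(demo_sb.member) : \
				       (uint64_t)le16toh(demo_sb.member))
```

With these definitions, `demo_set_sb(segment_count, 0x1234)` followed by `demo_get_sb(segment_count)` round-trips the value on either byte order, and `100 / DEMO_DIV_CEIL(10, 4)` divides by 3 as intended. Without the outer parentheses the same expression would expand to `100 / (10 + 4 - 1) / 4`.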
[f2fs-dev] [PATCH 2/2] mkfs.f2fs: fix missing endian conversion
This is for conversion from cpu to little endian and vice versa. Signed-off-by: Changman Lee cm224@samsung.com --- mkfs/f2fs_format.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c index 0a9d728..c0028a3 100644 --- a/mkfs/f2fs_format.c +++ b/mkfs/f2fs_format.c @@ -71,7 +71,7 @@ static void configure_extension_list(void) memcpy(super_block.extension_list[i++], *extlist, name_len); extlist++; } - super_block.extension_count = i; + super_block.extension_count = cpu_to_le32(i); if (!ext_str) return; @@ -86,7 +86,7 @@ static void configure_extension_list(void) break; } - super_block.extension_count = i; + super_block.extension_count = cpu_to_le32(i); free(config.extension_list); } @@ -211,7 +211,7 @@ static int f2fs_prepare_super_block(void) if (max_sit_bitmap_size (CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 65)) { max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1; - super_block.cp_payload = F2FS_BLK_ALIGN(max_sit_bitmap_size); + super_block.cp_payload = cpu_to_le32(F2FS_BLK_ALIGN(max_sit_bitmap_size)); } else { max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - max_sit_bitmap_size; -- 1.9.1 -- Comprehensive Server Monitoring with Site24x7. Monitor 10 servers for $9/Month. Get alerted through email, SMS, voice calls or mobile push notifications. Take corrective actions from your mobile device. http://pubads.g.doubleclick.net/gampad/clk?id=154624111iu=/4140/ostg.clktrk ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
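To make the effect of the conversion concrete, here is a small byte-level sketch of what a `cpu_to_le32`-style helper guarantees: the least significant byte is stored first regardless of the host byte order. The `demo_*` functions are illustrative stand-ins written for this note, not the helpers used by mkfs.f2fs.

```c
#include <stdint.h>
#include <string.h>

/* Encode a host-order value so that its in-memory bytes are little-endian. */
static uint32_t demo_cpu_to_le32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v & 0xff),
		(unsigned char)((v >> 8) & 0xff),
		(unsigned char)((v >> 16) & 0xff),
		(unsigned char)((v >> 24) & 0xff),
	};
	uint32_t stored;

	memcpy(&stored, b, sizeof(stored));
	return stored;
}

/* Decode a little-endian in-memory value back to host order. */
static uint32_t demo_le32_to_cpu(uint32_t stored)
{
	unsigned char b[4];

	memcpy(b, &stored, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Return byte `i` of the stored representation, as it would land on disk. */
static unsigned char demo_byte_at(uint32_t stored, int i)
{
	unsigned char b[4];

	memcpy(b, &stored, sizeof(b));
	return b[i];
}
```

On a little-endian host both conversions are no-ops, which is why a missing conversion such as the plain `extension_count = i` assignment goes unnoticed there and only breaks the on-disk format on big-endian hosts.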
Re: [f2fs-dev] [PATCH] f2fs: add cleancache support
On Fri, Nov 14, 2014 at 11:08:15AM +0800, Chao Yu wrote:

Hi Changman,

-Original Message- From: Changman Lee [mailto:cm224@samsung.com] Sent: Thursday, November 13, 2014 1:35 PM To: linux-fsde...@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: [f2fs-dev] [PATCH] f2fs: add cleancache support

To use cleancache, fs must explicitly enable cleancache by calling cleancache_init_fs.

Good catch! AFAIK, cleancache works only if we initialize a backend and register its ops, but since we merged commit 962564604873 ("staging: zcache: delete it"), we have lost the zcache backend. Are there other backends?

Regards, Yu

Hi Yu,

AFAIK, hypervisors such as Xen actively use cleancache and frontswap. GCMA (Guaranteed CMA), which is currently being submitted, also plans to use cleancache. I think it would not hurt to be prepared for them.

Thanks,

Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/super.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 512ffd8..2ebb960 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -24,6 +24,7 @@
 #include <linux/blkdev.h>
 #include <linux/f2fs_fs.h>
 #include <linux/sysfs.h>
+#include <linux/cleancache.h>
 
 #include "f2fs.h"
 #include "node.h"
@@ -1144,6 +1145,8 @@ try_onemore:
 		if (err)
 			goto free_kobj;
 	}
+
+	cleancache_init_fs(sb);
 	return 0;
 
 free_kobj:
-- 
1.9.1
Re: [f2fs-dev] [PATCH] f2fs: add cleancache support
On Thu, Nov 13, 2014 at 05:27:51PM -0800, Jaegeuk Kim wrote:

Hi Changman,

On Thu, Nov 13, 2014 at 02:34:50PM +0900, Changman Lee wrote:

To use cleancache, fs must explicitly enable cleancache by calling cleancache_init_fs.

Good catch! Prior to merging this patch, can you share any testing results or performance numbers?

Not yet, I'll try to get numbers.

What condition will be the best way to exploit f2fs and cleancache?

Not clear. I think we can make a cleancache client for f2fs that compensates for the penalty of node pages, which are read-mostly.

Can we confirm that f2fs satisfies most of the requirements described in cleancache.txt below?

Good point. At a quick glance, F2FS seems to satisfy most of the requirements. I'll run an experiment to check for side effects.

Some points for a filesystem to consider:
- The FS should be block-device-based (e.g. a ram-based FS such as tmpfs should not enable cleancache)
- To ensure coherency/correctness, the FS must ensure that all file removal or truncation operations either go through VFS or add hooks to do the equivalent cleancache invalidate operations
- To ensure coherency/correctness, either inode numbers must be unique across the lifetime of the on-disk file OR the FS must provide an encode_fh function.
- The FS must call the VFS superblock alloc and deactivate routines or add hooks to do the equivalent cleancache calls done there.
- To maximize performance, all pages fetched from the FS should go through the do_mpage_readpage routine or the FS should add hooks to do the equivalent (cf. btrfs)
- Currently, the FS blocksize must be the same as PAGESIZE. This is not an architectural restriction, but no backends currently support anything different.
- A clustered FS should invoke the shared_init_fs cleancache hook to get best performance for some backends.
Thanks,

Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/super.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 512ffd8..2ebb960 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -24,6 +24,7 @@
 #include <linux/blkdev.h>
 #include <linux/f2fs_fs.h>
 #include <linux/sysfs.h>
+#include <linux/cleancache.h>
 
 #include "f2fs.h"
 #include "node.h"
@@ -1144,6 +1145,8 @@ try_onemore:
 		if (err)
 			goto free_kobj;
 	}
+
+	cleancache_init_fs(sb);
 	return 0;
 
 free_kobj:
-- 
1.9.1
Re: [f2fs-dev] [PATCH 1/5] f2fs: disable roll-forward when active_logs = 2
On Mon, Nov 10, 2014 at 07:07:59AM -0800, Jaegeuk Kim wrote: Hi Changman, On Mon, Nov 10, 2014 at 06:54:37PM +0900, Changman Lee wrote: On Sat, Nov 08, 2014 at 11:36:05PM -0800, Jaegeuk Kim wrote: The roll-forward mechanism should be activated when the number of active logs is not 2. Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- fs/f2fs/file.c| 2 ++ fs/f2fs/segment.c | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 46311e7..54722a0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -132,6 +132,8 @@ static inline bool need_do_checkpoint(struct inode *inode) need_cp = true; else if (test_opt(sbi, FASTBOOT)) need_cp = true; + else if (sbi-active_logs == 2) + need_cp = true; return need_cp; } diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 2fb3d7f..16721b5d 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1090,8 +1090,8 @@ static int __get_segment_type_4(struct page *page, enum page_type p_type) else return CURSEG_COLD_DATA; } else { - if (IS_DNODE(page) !is_cold_node(page)) - return CURSEG_HOT_NODE; + if (IS_DNODE(page) is_cold_node(page)) + return CURSEG_WARM_NODE; Hi Jaegeuk, We should take hot/cold seperation into account as well. In case of dir inode, it will be mixed with COLD_NODE. If it's trade-off, let's notice it kindly as comments. NAK. This patch tries to fix a bug, which is not a trade-off. We should write files' direct node blocks in CURSEG_WARM_NODE for recovery. Thanks, Okay, a word of 'trade-off' is wrong. We must be able to do recovery. However, we break a rule of hot/cold separation we want. So I thought we should notice its negative effect. Anyway, how about putting WARM and HOT together instead HOT and COLD? We can distinguish enough if they are direct node and have fsync_mark at recovery time although HOT/WARM are mixed. Let me know if there is my misundertanding. 
Thanks, Regards, Changman else return CURSEG_COLD_NODE; } -- 2.1.1
Re: [f2fs-dev] [PATCH 1/5] f2fs: disable roll-forward when active_logs = 2
On Sat, Nov 08, 2014 at 11:36:05PM -0800, Jaegeuk Kim wrote:

The roll-forward mechanism should be activated when the number of active logs is not 2.

Signed-off-by: Jaegeuk Kim jaeg...@kernel.org
---
 fs/f2fs/file.c    | 2 ++
 fs/f2fs/segment.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 46311e7..54722a0 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -132,6 +132,8 @@ static inline bool need_do_checkpoint(struct inode *inode)
 		need_cp = true;
 	else if (test_opt(sbi, FASTBOOT))
 		need_cp = true;
+	else if (sbi->active_logs == 2)
+		need_cp = true;
 
 	return need_cp;
 }
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 2fb3d7f..16721b5d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1090,8 +1090,8 @@ static int __get_segment_type_4(struct page *page, enum page_type p_type)
 		else
 			return CURSEG_COLD_DATA;
 	} else {
-		if (IS_DNODE(page) && !is_cold_node(page))
-			return CURSEG_HOT_NODE;
+		if (IS_DNODE(page) && is_cold_node(page))
+			return CURSEG_WARM_NODE;

Hi Jaegeuk,

We should take hot/cold separation into account as well. In the case of a dir inode, it will be mixed with COLD_NODE. If this is a trade-off, let's note it in a comment.

Regards, Changman

 		else
 			return CURSEG_COLD_NODE;
 	}
-- 
2.1.1
Re: [f2fs-dev] [PATCH 3/5] f2fs: control the memory footprint used by ino entries
On Sat, Nov 08, 2014 at 11:36:07PM -0800, Jaegeuk Kim wrote: This patch adds to control the memory footprint used by ino entries. This will conduct best effort, not strictly. Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- fs/f2fs/node.c| 28 ++-- fs/f2fs/node.h| 3 ++- fs/f2fs/segment.c | 3 ++- 3 files changed, 26 insertions(+), 8 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 44b8afe..4ea2c47 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -31,22 +31,38 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int type) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct sysinfo val; + unsigned long avail_ram; unsigned long mem_size = 0; bool res = false; si_meminfo(val); - /* give 25%, 25%, 50% memory for each components respectively */ + + /* only uses low memory */ + avail_ram = val.totalram - val.totalhigh; + + /* give 25%, 25%, 50%, 50% memory for each components respectively */ Hi Jaegeuk, The memory usage of nm_i should be 100% but it's 125%. Mistake or intended? 
if (type == FREE_NIDS) { - mem_size = (nm_i-fcnt * sizeof(struct free_nid)) 12; - res = mem_size ((val.totalram * nm_i-ram_thresh / 100) 2); + mem_size = (nm_i-fcnt * sizeof(struct free_nid)) + PAGE_CACHE_SHIFT; + res = mem_size ((avail_ram * nm_i-ram_thresh / 100) 2); } else if (type == NAT_ENTRIES) { - mem_size = (nm_i-nat_cnt * sizeof(struct nat_entry)) 12; - res = mem_size ((val.totalram * nm_i-ram_thresh / 100) 2); + mem_size = (nm_i-nat_cnt * sizeof(struct nat_entry)) + PAGE_CACHE_SHIFT; + res = mem_size ((avail_ram * nm_i-ram_thresh / 100) 2); } else if (type == DIRTY_DENTS) { if (sbi-sb-s_bdi-dirty_exceeded) return false; mem_size = get_pages(sbi, F2FS_DIRTY_DENTS); - res = mem_size ((val.totalram * nm_i-ram_thresh / 100) 1); + res = mem_size ((avail_ram * nm_i-ram_thresh / 100) 1); + } else if (type == INO_ENTRIES) { + int i; + + if (sbi-sb-s_bdi-dirty_exceeded) + return false; + for (i = 0; i = UPDATE_INO; i++) + mem_size += (sbi-ino_num[i] * sizeof(struct ino_entry)) + PAGE_CACHE_SHIFT; + res = mem_size ((avail_ram * nm_i-ram_thresh / 100) 1); } return res; } diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h index acb71e5..d10b644 100644 --- a/fs/f2fs/node.h +++ b/fs/f2fs/node.h @@ -106,7 +106,8 @@ static inline void raw_nat_from_node_info(struct f2fs_nat_entry *raw_ne, enum mem_type { FREE_NIDS, /* indicates the free nid list */ NAT_ENTRIES,/* indicates the cached nat entry */ - DIRTY_DENTS /* indicates dirty dentry pages */ + DIRTY_DENTS,/* indicates dirty dentry pages */ + INO_ENTRIES,/* indicates inode entries */ }; struct nat_entry_set { diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 16721b5d..e094675 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -276,7 +276,8 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi) { /* check the # of cached NAT entries and prefree segments */ if (try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK) || - excess_prefree_segs(sbi)) + excess_prefree_segs(sbi) || + available_free_memory(sbi, INO_ENTRIES)) 
f2fs_sync_fs(sbi-sb, true); } -- 2.1.1
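The budget arithmetic in available_free_memory() is easier to follow with numbers plugged in. Below is a minimal user-space sketch, assuming the stripped operators in the quoted diff are right shifts and a less-than comparison: usage in pages is `(count * object size) >> PAGE_CACHE_SHIFT`, and the budget is `ram_thresh` percent of low memory shifted right by 2 (a quarter) or by 1 (a half). The `demo_*` names, the 4 KiB page size and the sample values are assumptions made for illustration.

```c
#include <stdbool.h>

#define DEMO_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Pages occupied by `count` cached objects of `obj_size` bytes (rounded down). */
static unsigned long demo_cache_pages(unsigned long count, unsigned long obj_size)
{
	return (count * obj_size) >> DEMO_PAGE_SHIFT;
}

/*
 * Budget in pages: `ram_thresh` percent of `avail_ram` pages, divided by
 * 2^share_shift (shift 2 gives a quarter of that amount, shift 1 gives half).
 */
static unsigned long demo_budget_pages(unsigned long avail_ram,
				       unsigned int ram_thresh,
				       unsigned int share_shift)
{
	return (avail_ram * ram_thresh / 100) >> share_shift;
}

/* True while the cache still fits inside its share of the budget. */
static bool demo_under_budget(unsigned long count, unsigned long obj_size,
			      unsigned long avail_ram, unsigned int ram_thresh,
			      unsigned int share_shift)
{
	return demo_cache_pages(count, obj_size) <
	       demo_budget_pages(avail_ram, ram_thresh, share_shift);
}
```

For example, with 262144 pages of low memory (1 GiB) and a threshold of 10, a shift of 2 yields a budget of 6553 pages and a shift of 1 yields 13107 pages; one million 24-byte entries occupy 5859 pages and therefore still fit in the smaller budget.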
[f2fs-dev] Don't merge not sended patches
Hi Jaegeuk,

I found two new patches when I pulled f2fs-tools. They never appeared on the mailing list. Even when a patch is trivial, I think it should be posted to our mailing list before it is merged.

Thanks, Changman
Re: [f2fs-dev] [PATCH 4/5] f2fs: write node pages if checkpoint is not doing
On Sat, Nov 08, 2014 at 11:36:08PM -0800, Jaegeuk Kim wrote: It needs to write node pages if checkpoint is not doing in order to avoid memory pressure. Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- fs/f2fs/node.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 4ea2c47..6f514fb 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1314,10 +1314,12 @@ static int f2fs_write_node_page(struct page *page, return 0; } - if (wbc-for_reclaim) - goto redirty_out; - - down_read(sbi-node_write); + if (wbc-for_reclaim) { + if (!down_read_trylock(sbi-node_write)) + goto redirty_out; Previously, we skipped write_page for reclaim path, but from now on, we will write out node page to reclaim memory at any time except checkpoint. We should remember it may occur to break merging bio. Got it. Reviewed-by: Changman Lee cm224@samsung.com + } else { + down_read(sbi-node_write); + } set_page_writeback(page); write_node_page(sbi, page, fio, nid, ni.blk_addr, new_addr); set_node_addr(sbi, ni, new_addr, is_fsync_dnode(page)); -- 2.1.1 -- ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH] f2fs: implement -o dirsync
On Sun, Nov 09, 2014 at 10:24:22PM -0800, Jaegeuk Kim wrote: If a mount option has dirsync, we should call checkpoint for all the directory operations. Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- fs/f2fs/namei.c | 24 1 file changed, 24 insertions(+) diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index 6312dd2..db3ee09 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -138,6 +138,9 @@ static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode, stat_inc_inline_inode(inode); d_instantiate(dentry, inode); unlock_new_inode(inode); + + if (IS_DIRSYNC(dir)) + f2fs_sync_fs(sbi-sb, 1); return 0; out: handle_failed_inode(inode); @@ -164,6 +167,9 @@ static int f2fs_link(struct dentry *old_dentry, struct inode *dir, f2fs_unlock_op(sbi); d_instantiate(dentry, inode); + + if (IS_DIRSYNC(dir)) + f2fs_sync_fs(sbi-sb, 1); return 0; out: clear_inode_flag(F2FS_I(inode), FI_INC_LINK); @@ -233,6 +239,9 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry) f2fs_delete_entry(de, page, dir, inode); f2fs_unlock_op(sbi); + if (IS_DIRSYNC(dir)) + f2fs_sync_fs(sbi-sb, 1); + /* In order to evict this inode, we set it dirty */ mark_inode_dirty(inode); Let's move it below mark_inode_dirty. After sync, it's unnecessary inserting inode into dirty_list. 
fail: @@ -268,6 +277,9 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry, d_instantiate(dentry, inode); unlock_new_inode(inode); + + if (IS_DIRSYNC(dir)) + f2fs_sync_fs(sbi-sb, 1); return err; out: handle_failed_inode(inode); @@ -304,6 +316,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) d_instantiate(dentry, inode); unlock_new_inode(inode); + if (IS_DIRSYNC(dir)) + f2fs_sync_fs(sbi-sb, 1); return 0; out_fail: @@ -346,8 +360,12 @@ static int f2fs_mknod(struct inode *dir, struct dentry *dentry, f2fs_unlock_op(sbi); alloc_nid_done(sbi, inode-i_ino); + d_instantiate(dentry, inode); unlock_new_inode(inode); + + if (IS_DIRSYNC(dir)) + f2fs_sync_fs(sbi-sb, 1); return 0; out: handle_failed_inode(inode); @@ -461,6 +479,9 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, } f2fs_unlock_op(sbi); + + if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) + f2fs_sync_fs(sbi-sb, 1); return 0; put_out_dir: @@ -600,6 +621,9 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, update_inode_page(new_dir); f2fs_unlock_op(sbi); + + if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir)) + f2fs_sync_fs(sbi-sb, 1); return 0; out_undo: /* Still we may fail to recover name info of f2fs_inode here */ -- 2.1.1 -- ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH] mkfs.f2fs: reclaim free space in case of regular file
If we use regular file instead block device, let's reclaim its free space. Signed-off-by: Changman Lee cm224@samsung.com --- configure.ac | 2 +- mkfs/f2fs_format_utils.c | 18 -- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index 0111e72..d66cb73 100644 --- a/configure.ac +++ b/configure.ac @@ -57,7 +57,7 @@ PKG_CHECK_MODULES([libuuid], [uuid]) # Checks for header files. AC_CHECK_HEADERS([linux/fs.h fcntl.h mntent.h stdlib.h string.h \ - sys/ioctl.h sys/mount.h unistd.h]) + sys/ioctl.h sys/mount.h unistd.h linux/falloc.h]) # Checks for typedefs, structures, and compiler characteristics. AC_C_INLINE diff --git a/mkfs/f2fs_format_utils.c b/mkfs/f2fs_format_utils.c index 9892a8f..88b9953 100644 --- a/mkfs/f2fs_format_utils.c +++ b/mkfs/f2fs_format_utils.c @@ -6,18 +6,26 @@ * * Dual licensed under the GPL or LGPL version 2 licenses. */ +#define _LARGEFILE_SOURCE #define _LARGEFILE64_SOURCE +#ifndef _GNU_SOURCE +#define _GNU_SOURCE +#endif #include stdio.h #include unistd.h #include sys/ioctl.h #include sys/stat.h +#include fcntl.h #include f2fs_fs.h #ifdef HAVE_LINUX_FS_H #include linux/fs.h #endif +#ifdef HAVE_LINUX_FALLOC_H +#include linux/falloc.h +#endif int f2fs_trim_device() { @@ -37,9 +45,15 @@ int f2fs_trim_device() #if defined(WITH_BLKDISCARD) defined(BLKDISCARD) MSG(0, Info: Discarding device\n); - if (S_ISREG(stat_buf.st_mode)) + if (S_ISREG(stat_buf.st_mode)) { +#ifdef FALLOC_FL_PUNCH_HOLE + if (fallocate(config.fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + range[0], range[1]) 0) { + MSG(0, Info: fallocate(PUNCH_HOLE|KEEP_SIZE) is failed\n); + } +#endif return 0; - else if (S_ISBLK(stat_buf.st_mode)) { + } else if (S_ISBLK(stat_buf.st_mode)) { if (ioctl(config.fd, BLKDISCARD, range) 0) { MSG(0, Info: This device doesn't support TRIM\n); } else { -- 1.9.1 -- ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
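The regular-file branch of the patch can be exercised on its own. This is a rough sketch, assuming Linux with glibc: it punches a hole over a byte range with `fallocate(2)` while keeping the file size, and treats failure as non-fatal in the same best-effort spirit as the patch. `demo_punch_hole` and `demo_size_after_punch` are hypothetical helpers and are not part of mkfs.f2fs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* fallocate() is a GNU/Linux extension */
#endif
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Release the blocks backing [offset, offset + len) of a regular file
 * without changing its size.
 * Returns 0 on success, 1 if `fd` is not a regular file, -1 on failure.
 */
static int demo_punch_hole(int fd, off_t offset, off_t len)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	if (!S_ISREG(st.st_mode))
		return 1;
#ifdef FALLOC_FL_PUNCH_HOLE
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) < 0)
		return -1;	/* e.g. the filesystem does not support it */
	return 0;
#else
	return -1;
#endif
}

/*
 * Create a temporary file of `size` bytes, punch a hole over all of it and
 * return the file size afterwards, or -1 if the setup failed.
 */
static off_t demo_size_after_punch(off_t size)
{
	char path[] = "/tmp/f2fs-demo-XXXXXX";
	struct stat st;
	off_t result = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);		/* the file disappears once fd is closed */
	if (ftruncate(fd, size) == 0) {
		(void)demo_punch_hole(fd, 0, size);	/* best effort */
		if (fstat(fd, &st) == 0)
			result = st.st_size;
	}
	close(fd);
	return result;
}
```

Because FALLOC_FL_KEEP_SIZE is set, `demo_size_after_punch(1 << 20)` reports the original 1 MiB size whether or not the underlying filesystem supports hole punching; only the allocated block count can differ.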
Re: [f2fs-dev] [PATCH 04/10] f2fs: give an option to enable in-place-updates during fsync to users
Hi JK, I think it' nicer if this can be used as 'OR' with other policy together. If so, we can also cover the weakness in high utilization. Regard, Changman On Sun, Sep 14, 2014 at 03:14:18PM -0700, Jaegeuk Kim wrote: If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- Documentation/ABI/testing/sysfs-fs-f2fs | 7 +++ Documentation/filesystems/f2fs.txt | 9 - fs/f2fs/f2fs.h | 1 + fs/f2fs/file.c | 7 +++ fs/f2fs/segment.c | 3 ++- fs/f2fs/segment.h | 14 ++ fs/f2fs/super.c | 2 ++ 7 files changed, 33 insertions(+), 10 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 62dd725..6f9157f 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -44,6 +44,13 @@ Description: Controls the FS utilization condition for the in-place-update policies. +What:/sys/fs/f2fs/disk/min_fsync_blocks +Date:September 2014 +Contact: Jaegeuk Kim jaeg...@kernel.org +Description: + Controls the dirty page count condition for the in-place-update + policies. + What:/sys/fs/f2fs/disk/max_small_discards Date:November 2013 Contact: Jaegeuk Kim jaegeuk@samsung.com diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index a2046a7..d010da8 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -194,13 +194,20 @@ Files in /sys/fs/f2fs/devname updates in f2fs. There are five policies: 0: F2FS_IPU_FORCE, 1: F2FS_IPU_SSR, 2: F2FS_IPU_UTIL, 3: F2FS_IPU_SSR_UTIL, - 4: F2FS_IPU_DISABLE. 
+ 4: F2FS_IPU_FSYNC, 5: F2FS_IPU_DISABLE. min_ipu_util This parameter controls the threshold to trigger in-place-updates. The number indicates percentage of the filesystem utilization, and used by F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies. + min_fsync_blocks This parameter controls the threshold to trigger + in-place-updates when F2FS_IPU_FSYNC mode is set. + The number indicates the number of dirty pages + when fsync needs to flush on its call path. If + the number is less than this value, it triggers + in-place-updates. + max_victim_search This parameter controls the number of trials to find a victim segment when conducting SSR and cleaning operations. The default value is 4096 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 2756c16..4f84d2a 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -386,6 +386,7 @@ struct f2fs_sm_info { unsigned int ipu_policy;/* in-place-update policy */ unsigned int min_ipu_util; /* in-place-update threshold */ + unsigned int min_fsync_blocks; /* threshold for fsync */ /* for flush command control */ struct flush_cmd_control *cmd_control_info; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 77426c7..af06e22 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -154,12 +154,11 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) trace_f2fs_sync_file_enter(inode); /* if fdatasync is triggered, let's do in-place-update */ - if (datasync) + if (get_dirty_pages(inode) = SM_I(sbi)-min_fsync_blocks) set_inode_flag(fi, FI_NEED_IPU); - ret = filemap_write_and_wait_range(inode-i_mapping, start, end); - if (datasync) - clear_inode_flag(fi, FI_NEED_IPU); + clear_inode_flag(fi, FI_NEED_IPU); + if (ret) { trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); return ret; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index e158d63..c6f627b 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1928,8 +1928,9 @@ int build_segment_manager(struct
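The suggestion at the top of this reply, letting the fsync policy be OR-ed with the other policies, could look roughly like the sketch below. It is purely illustrative: the patch under review uses plain enum values 0 to 5, whereas the bit-flag encoding, the `demo_*` names and the simplified conditions here are assumptions, and the `<=` comparison against min_fsync_blocks is a guess because the operator is partly lost in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical bit-flag encoding so that several policies can be combined. */
enum demo_ipu_flag {
	DEMO_IPU_FORCE = 1u << 0,	/* always update in place */
	DEMO_IPU_SSR   = 1u << 1,	/* when slack space recycling is needed */
	DEMO_IPU_UTIL  = 1u << 2,	/* when utilization exceeds min_ipu_util */
	DEMO_IPU_FSYNC = 1u << 3,	/* for small fsync/fdatasync requests */
};

/* Hypothetical snapshot of the state that the decision depends on. */
struct demo_state {
	unsigned int policy;		/* OR of demo_ipu_flag bits, 0 disables IPU */
	unsigned int util;		/* filesystem utilization in percent */
	unsigned int min_ipu_util;	/* threshold for DEMO_IPU_UTIL */
	unsigned int dirty_pages;	/* dirty pages of the inode being synced */
	unsigned int min_fsync_blocks;	/* threshold for DEMO_IPU_FSYNC */
	bool need_ssr;			/* free space is running low */
	bool in_fsync;			/* called on the fsync path */
};

/* Return true if any enabled policy asks for an in-place update. */
static bool demo_need_inplace_update(const struct demo_state *s)
{
	if (s->policy & DEMO_IPU_FORCE)
		return true;
	if ((s->policy & DEMO_IPU_SSR) && s->need_ssr)
		return true;
	if ((s->policy & DEMO_IPU_UTIL) && s->util > s->min_ipu_util)
		return true;
	if ((s->policy & DEMO_IPU_FSYNC) && s->in_fsync &&
	    s->dirty_pages <= s->min_fsync_blocks)
		return true;
	return false;
}
```

With `policy = DEMO_IPU_UTIL | DEMO_IPU_FSYNC`, a small fsync is updated in place even at low utilization, and ordinary writeback switches to in-place updates once utilization passes the threshold, which is the high-utilization weakness the reply mentions.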
Re: [f2fs-dev] [PATCH] f2fs: reposition unlock_new_inode to prevent accessing invalid inode
Hi Chao,

I agree that unlock_new_inode should be called after make_bad_inode.

About this scenario, I think we should check whether the following can still occur: a newly allocated inode could be picked as a victim by the gc thread. Then f2fs_iget called by Thread A has to fail, because we handled it as a bad inode in Thread B. However, f2fs_iget could still get the inode. How about checking it with is_bad_inode() in f2fs_iget?

Thanks,

On Tue, Aug 26, 2014 at 06:35:29PM +0800, Chao Yu wrote:

Because of a race condition on the inode cache, the following scenario can appear:

[Thread a]                              [Thread b]
- f2fs_mkdir
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata failed here
                                        - gc_thread_func
                                        - f2fs_gc
                                        - do_garbage_collect
                                        - gc_data_segment
                                        - f2fs_iget
                                        - iget_locked
                                        - wait_on_inode
- unlock_new_inode
                                        - move_data_page
- make_bad_inode
- iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the newly allocated inode should be marked bad so that other threads cannot access it. But in the scenario above, f2fs can access the invalid inode before it is marked bad. This patch fixes the potential problem; the issue was found by code review.
Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/namei.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index 6b53ce9..845f1be 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -134,8 +134,8 @@ static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode, return 0; out: clear_nlink(inode); - unlock_new_inode(inode); make_bad_inode(inode); + unlock_new_inode(inode); iput(inode); alloc_nid_failed(sbi, ino); return err; @@ -267,8 +267,8 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry, return err; out: clear_nlink(inode); - unlock_new_inode(inode); make_bad_inode(inode); + unlock_new_inode(inode); iput(inode); alloc_nid_failed(sbi, inode-i_ino); return err; @@ -308,8 +308,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) out_fail: clear_inode_flag(F2FS_I(inode), FI_INC_LINK); clear_nlink(inode); - unlock_new_inode(inode); make_bad_inode(inode); + unlock_new_inode(inode); iput(inode); alloc_nid_failed(sbi, inode-i_ino); return err; @@ -354,8 +354,8 @@ static int f2fs_mknod(struct inode *dir, struct dentry *dentry, return 0; out: clear_nlink(inode); - unlock_new_inode(inode); make_bad_inode(inode); + unlock_new_inode(inode); iput(inode); alloc_nid_failed(sbi, inode-i_ino); return err; @@ -688,8 +688,8 @@ release_out: out: f2fs_unlock_op(sbi); clear_nlink(inode); - unlock_new_inode(inode); make_bad_inode(inode); + unlock_new_inode(inode); iput(inode); alloc_nid_failed(sbi, inode-i_ino); return err; -- 2.0.0.421.g786a89d -- Slashdot TV. Video for Nerds. Stuff that matters. http://tv.slashdot.org/ ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- Slashdot TV. Video for Nerds. Stuff that matters. http://tv.slashdot.org/ ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
On Tue, Jul 29, 2014 at 05:22:15AM -0700, Jaegeuk Kim wrote:

Hi Changman,

On Tue, Jul 29, 2014 at 09:41:11AM +0900, Changman Lee wrote:

Hi Jaegeuk,

On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote:

This patch enforces in-place-updates only when fdatasync is requested. If we adopt in-place-updates for fdatasync, we can skip writing the recovery information.

But, as you know, random writes occur when switching to in-place-updates, and that will degrade write performance. Is there any case where in-place-updates are better, other than recovery or high utilization?

As I described, you can easily imagine that if users request a small amount of data writes with fdatasync, we have to do data writes + node writes. But if we can do in-place-updates, we don't need to write node blocks. Surely this triggers random writes; however, the amount of data is pretty small and the device handles it very fast with its internal cache, so it can enhance performance.

Thanks,

Partially agree. Sometimes I see that SSR shows lower performance than IPU. One of the reasons might be node writes. Anyway, if so, we should know the total number of dirty pages for fdatasync, and the threshold is very tunable according to the random write performance of the device.
Thanks, Thanks Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- fs/f2fs/f2fs.h| 1 + fs/f2fs/file.c| 7 +++ fs/f2fs/segment.h | 4 3 files changed, 12 insertions(+) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ab36025..8f8685e 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -998,6 +998,7 @@ enum { FI_INLINE_DATA, /* used for inline data*/ FI_APPEND_WRITE,/* inode has appended data */ FI_UPDATE_WRITE,/* inode has in-place-update data */ + FI_NEED_IPU,/* used fo ipu for fdatasync */ }; static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 121689a..e339856 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) return 0; trace_f2fs_sync_file_enter(inode); + + /* if fdatasync is triggered, let's do in-place-update */ + if (datasync) + set_inode_flag(fi, FI_NEED_IPU); + ret = filemap_write_and_wait_range(inode-i_mapping, start, end); if (ret) { trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); return ret; } + if (datasync) + clear_inode_flag(fi, FI_NEED_IPU); /* * if there is no written data, don't waste time to write recovery info. diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index ee5c75e..55973f7 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -486,6 +486,10 @@ static inline bool need_inplace_update(struct inode *inode) if (S_ISDIR(inode-i_mode)) return false; + /* this is only set during fdatasync */ + if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU)) + return true; + switch (SM_I(sbi)-ipu_policy) { case F2FS_IPU_FORCE: return true; -- 1.8.5.2 (Apple Git-48) -- Want fast and easy access to all the code in your enterprise? Index and search up to 200,000 lines of code with a free copy of Black Duck Code Sight - the same software that powers the world's largest code search on Ohloh, the Black Duck Open Hub! Try it now. 
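The flag handling discussed in this thread can be sketched as a small userspace model. This is not the kernel code: `model_file`, `model_ipu_policy` and both functions are hypothetical names, and every policy other than "force" is collapsed into a single `MODEL_IPU_OTHER` value that never asks for in-place updates, which is a simplification made only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy values; only the FORCE case appears in the quoted hunk. */
enum model_ipu_policy { MODEL_IPU_OTHER, MODEL_IPU_FORCE };

struct model_file {
	bool is_dir;
	bool need_ipu;	/* models the FI_NEED_IPU inode flag */
};

/* Mirrors the check order in the patch: directory, then flag, then policy. */
static bool model_need_inplace_update(const struct model_file *f,
				      enum model_ipu_policy policy)
{
	assert(f != NULL);
	if (f->is_dir)
		return false;
	if (f->need_ipu)	/* only set while fdatasync is in progress */
		return true;
	return policy == MODEL_IPU_FORCE;
}

/*
 * Models the flag handling around writeback in the sync path and
 * returns whether the writeback would be done in place.
 */
static bool model_sync_file(struct model_file *f, bool datasync,
			    enum model_ipu_policy policy)
{
	bool in_place;

	assert(f != NULL);
	if (datasync)
		f->need_ipu = true;

	/* Writeback happens here; it consults the in-place-update check. */
	in_place = model_need_inplace_update(f, policy);

	if (datasync)
		f->need_ipu = false;
	return in_place;
}

static struct model_file sample_regular = { .is_dir = false, .need_ipu = false };
static struct model_file sample_dir = { .is_dir = true, .need_ipu = false };
```

With these definitions, an fdatasync-style call on a regular file selects the in-place path even under the non-forcing policy, while a plain fsync-style call does not, and directories never do.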
Re: [f2fs-dev] [PATCH] Remove an unnecessary line in allocate_data_block.
On Tue, Jul 29, 2014 at 06:24:48AM -0700, Jaegeuk Kim wrote:

Hi Dongho,

First, please write the patch following the proper rules (e.g., include a description).

About this change, my answer is negative. When considering SSR, we need to take care of the following scenario.

- old segno : X
- new address : Z
- old curseg : Y

This means that a new block is supposed to be written to Z from X, and Z is newly allocated in the same path from Y. In that case, we should trigger locate_dirty_segment for Y, since it was a current segment and can be dirty owing to SSR, but it was not included in the dirty list.

Thanks,

We have already chosen the old curseg (Y), and then we allocate the new address (Z) from the old curseg (Y). After that we call refresh_sit_entry(old address, new address). In that function, we call locate_dirty_segment with the old seg and the old curseg. So calling locate_dirty_segment again after refresh_sit_entry is redundant.

Thanks,

On Mon, Jul 28, 2014 at 08:34:25AM +, Dongho Sim wrote:

Hi, Chao. It's my mistake. Thanks :-)

Signed-off-by: Dongho Sim dh@samsung.com
---
 fs/f2fs/segment.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 8a6e57d..7af4a8d 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -973,14 +973,12 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 {
 	struct sit_info *sit_i = SIT_I(sbi);
 	struct curseg_info *curseg;
-	unsigned int old_cursegno;
 
 	curseg = CURSEG_I(sbi, type);
 
 	mutex_lock(&curseg->curseg_mutex);
 
 	*new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
-	old_cursegno = curseg->segno;
 
 	/*
 	 * __add_sum_entry should be resided under the curseg_mutex
@@ -1001,7 +999,6 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 	 * since SSR needs latest valid block information.
*/ refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr); - locate_dirty_segment(sbi, old_cursegno); mutex_unlock(sit_i-sentry_lock); -- 1.9.1 --- Original Message --- Sender : ?超chao2...@samsung.com 工程?/SRC-Nanjing-Mobile Solution Lab/삼성전자 Date : 2014-07-28 16:21 (GMT+09:00) Title : Re: [f2fs-dev] [PATCH] Remove an unnecessary line in allocate_data_block. Hi Dongho, - Original Message - From: Dongho Sim Sent: Monday, July 28, 2014 1:51 PM To: Chao Yu Cc: jaeg...@kernel.org, linux-f2fs-devel@lists.sourceforge.net Subject: Re: [f2fs-dev] [PATCH] Remove an unnecessary line in allocate_data_block. Yes, there was another one. Thanks Chao, :-) Signed-off-by: Dongho Sim --- fs/f2fs/segment.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 8a6e57d..3ab7749 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -980,7 +980,6 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, mutex_lock(curseg-curseg_mutex); *new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg); - old_cursegno = curseg-segno; The definition of old_cursegno also should be removed. Thanks, Yu /* * __add_sum_entry should be resided under the curseg_mutex @@ -1001,7 +1000,6 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, * since SSR needs latest valid block information. */ refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr); - locate_dirty_segment(sbi, old_cursegno); mutex_unlock(sit_i-sentry_lock); -- 1.9.1 --- Original Message --- Sender : Chao Yu Date : 2014-07-28 14:35 (GMT+09:00) Title : RE: [f2fs-dev] [PATCH] Remove an unnecessary line in allocate_data_block. Hi Dongho, -Original Message- From: Dongho Sim [mailto:dh@samsung.com] Sent: Monday, July 28, 2014 7:03 AM To: jaeg...@kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: [f2fs-dev] [PATCH] Remove an unnecessary line in allocate_data_block. Hi. There was an unnecessary line in function, allocate_data_block. 
It is already done in refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr); Thanks. Agreed, How about removing old_cursegno too as it's no longer used in allocate_data_block? Thanks, Yu Signed-off-by: Dongho Sim --- fs/f2fs/segment.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 8a6e57d..a3c7aae 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1001,7 +1001,6 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, * since SSR needs latest valid block information. */ refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr); - locate_dirty_segment(sbi,
Re: [f2fs-dev] [PATCH 07/11] f2fs: enable in-place-update for fdatasync
Hi Jaegeuk, On Fri, Jul 25, 2014 at 03:47:21PM -0700, Jaegeuk Kim wrote: This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip to write the recovery information. But, as you know, random write occurs when changing into in-place-updates. It will degrade write performance. Is there any case in-place-updates is better, except recovery or high utilization? Thanks Signed-off-by: Jaegeuk Kim jaeg...@kernel.org --- fs/f2fs/f2fs.h| 1 + fs/f2fs/file.c| 7 +++ fs/f2fs/segment.h | 4 3 files changed, 12 insertions(+) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ab36025..8f8685e 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -998,6 +998,7 @@ enum { FI_INLINE_DATA, /* used for inline data*/ FI_APPEND_WRITE,/* inode has appended data */ FI_UPDATE_WRITE,/* inode has in-place-update data */ + FI_NEED_IPU,/* used fo ipu for fdatasync */ }; static inline void set_inode_flag(struct f2fs_inode_info *fi, int flag) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 121689a..e339856 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -127,11 +127,18 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) return 0; trace_f2fs_sync_file_enter(inode); + + /* if fdatasync is triggered, let's do in-place-update */ + if (datasync) + set_inode_flag(fi, FI_NEED_IPU); + ret = filemap_write_and_wait_range(inode-i_mapping, start, end); if (ret) { trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret); return ret; } + if (datasync) + clear_inode_flag(fi, FI_NEED_IPU); /* * if there is no written data, don't waste time to write recovery info. 
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index ee5c75e..55973f7 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -486,6 +486,10 @@ static inline bool need_inplace_update(struct inode *inode) if (S_ISDIR(inode-i_mode)) return false; + /* this is only set during fdatasync */ + if (is_inode_flag_set(F2FS_I(inode), FI_NEED_IPU)) + return true; + switch (SM_I(sbi)-ipu_policy) { case F2FS_IPU_FORCE: return true; -- 1.8.5.2 (Apple Git-48) -- Want fast and easy access to all the code in your enterprise? Index and search up to 200,000 lines of code with a free copy of Black Duck Code Sight - the same software that powers the world's largest code search on Ohloh, the Black Duck Open Hub! Try it now. http://p.sf.net/sfu/bds ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- Infragistics Professional Build stunning WinForms apps today! Reboot your WinForms applications with our WinForms controls. Build a bridge from your legacy apps to the future. http://pubads.g.doubleclick.net/gampad/clk?id=153845071iu=/4140/ostg.clktrk ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH 1/2 V4] mkfs.f2fs: large volume support
Hi, Jaegeuk Long time ago, I sent 3 patches for large volume support for mkfs, fsck and kernel. But you've missed one patch of mkfs. So I resend the patch resovled conflict with current git tree. Changes from V3 o remove cp_payload in f2fs_super_block Changes from V2 o remove CP_LARGE_VOL_LFLAG instead, use cp_payload in superblock because disk size is determined at format Changes from V1 o fix orphan node blkaddr Regards, Changman Lee -- 8 -- From b7d46c6aaf786d28f82c0fe5d116b561c03b4cb2 Mon Sep 17 00:00:00 2001 From: Changman Lee cm224@samsung.com Date: Thu, 10 Jul 2014 15:26:04 +0900 Subject: [PATCH] mkfs.f2fs: large volume support This patch supports large volume over about 3TB. Signed-off-by: Changman Lee cm224@samsung.com --- include/f2fs_fs.h | 8 ++ mkfs/f2fs_format.c | 79 +++--- 2 files changed, 71 insertions(+), 16 deletions(-) diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h index 53b8cb9..80ce918 100644 --- a/include/f2fs_fs.h +++ b/include/f2fs_fs.h @@ -221,6 +221,7 @@ enum { #define F2FS_LOG_SECTORS_PER_BLOCK 3 /* 4KB: F2FS_BLKSIZE */ #define F2FS_BLKSIZE 4096/* support only 4KB block */ #define F2FS_MAX_EXTENSION 64 /* # of extension entries */ +#define F2FS_BLK_ALIGN(x) (((x) + F2FS_BLKSIZE - 1) / F2FS_BLKSIZE) #define NULL_ADDR 0x0U #define NEW_ADDR -1U @@ -456,6 +457,13 @@ struct f2fs_nat_block { #define SIT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_sit_entry)) /* + * F2FS uses 4 bytes to represent block address. As a result, supported size of + * disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. + */ +#define F2FS_MAX_SEGMENT ((16 * 1024 * 1024) / 2) +#define MAX_SIT_BITMAP_SIZE((F2FS_MAX_SEGMENT / SIT_ENTRY_PER_BLOCK) / 8) + +/* * Note that f2fs_sit_entry-vblocks has the following bit-field information. 
* [15:10] : allocation type such as CURSEG__TYPE * [9:0] : valid block count diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c index 1568545..a62a8fe 100644 --- a/mkfs/f2fs_format.c +++ b/mkfs/f2fs_format.c @@ -101,7 +101,8 @@ static int f2fs_prepare_super_block(void) u_int32_t blocks_for_sit, blocks_for_nat, blocks_for_ssa; u_int32_t total_valid_blks_available; u_int64_t zone_align_start_offset, diff, total_meta_segments; - u_int32_t sit_bitmap_size, max_nat_bitmap_size, max_nat_segments; + u_int32_t sit_bitmap_size, max_sit_bitmap_size; + u_int32_t max_nat_bitmap_size, max_nat_segments; u_int32_t total_zones; super_block.magic = cpu_to_le32(F2FS_SUPER_MAGIC); @@ -197,8 +198,26 @@ static int f2fs_prepare_super_block(void) */ sit_bitmap_size = ((le32_to_cpu(super_block.segment_count_sit) / 2) log_blks_per_seg) / 8; - max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - - sit_bitmap_size; + + if (sit_bitmap_size MAX_SIT_BITMAP_SIZE) + max_sit_bitmap_size = MAX_SIT_BITMAP_SIZE; + else + max_sit_bitmap_size = sit_bitmap_size; + + /* +* It should be reserved minimum 1 segment for nat. +* When sit is too large, we should expand cp area. It requires more pages for cp. 
+*/ + if (max_sit_bitmap_size + (CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 65)) { + max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1; + super_block.cp_payload = F2FS_BLK_ALIGN(max_sit_bitmap_size); + } else { + max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 + - max_sit_bitmap_size; + super_block.cp_payload = 0; + } + max_nat_segments = (max_nat_bitmap_size * 8) log_blks_per_seg; if (le32_to_cpu(super_block.segment_count_nat) max_nat_segments) @@ -414,6 +433,7 @@ static int f2fs_write_check_point_pack(void) u_int64_t cp_seg_blk_offset = 0; u_int32_t crc = 0; int i; + char *cp_payload = NULL; ckp = calloc(F2FS_BLKSIZE, 1); if (ckp == NULL) { @@ -427,6 +447,12 @@ static int f2fs_write_check_point_pack(void) return -1; } + cp_payload = calloc(F2FS_BLKSIZE, 1); + if (cp_payload == NULL) { + MSG(1, \tError: Calloc Failed for cp_payload!!!\n); + return -1; + } + /* 1. cp page 1 of checkpoint pack 1 */ ckp-checkpoint_ver = cpu_to_le64(1); ckp-cur_node_segno[0] = @@ -465,9 +491,10 @@ static int f2fs_write_check_point_pack(void) ((le32_to_cpu(ckp-free_segment_count) + 6 - le32_to_cpu(ckp-overprov_segment_count)) * config.blks_per_seg)); - ckp-cp_pack_total_block_count
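The bitmap split that this hunk adds to f2fs_prepare_super_block() can be sketched in isolation. The sketch below is a hedged reading of the patch, not the mkfs source: the comparison operators were lost in this archive, so `>` is assumed in both tests; `room` stands for `CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1` and is passed in as a parameter because its value is not given here; all `model_*` and `MODEL_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BLKSIZE		4096u
/* Same rounding as F2FS_BLK_ALIGN in the patch: bytes to whole 4KB blocks. */
#define MODEL_BLK_ALIGN(x)	(((x) + MODEL_BLKSIZE - 1) / MODEL_BLKSIZE)

struct model_cp_sizes {
	uint32_t max_nat_bitmap_size;	/* bytes left for the NAT bitmap */
	uint32_t cp_payload;		/* extra checkpoint blocks for the SIT bitmap */
};

/*
 * sit_bitmap_size: SIT bitmap size computed from the segment count
 * sit_bitmap_cap:  stands in for MAX_SIT_BITMAP_SIZE
 * room:            bitmap bytes available inside the checkpoint block
 */
static struct model_cp_sizes model_split_bitmaps(uint32_t sit_bitmap_size,
						 uint32_t sit_bitmap_cap,
						 uint32_t room)
{
	struct model_cp_sizes out;
	uint32_t max_sit = sit_bitmap_size > sit_bitmap_cap ?
				sit_bitmap_cap : sit_bitmap_size;

	assert(room > 0);
	if (max_sit > room + 64) {
		/* SIT bitmap moves to payload blocks; NAT gets the whole room. */
		out.max_nat_bitmap_size = room;
		out.cp_payload = MODEL_BLK_ALIGN(max_sit);
	} else {
		/* Guard for this sketch only: keep the subtraction from wrapping. */
		assert(max_sit <= room);
		out.max_nat_bitmap_size = room - max_sit;
		out.cp_payload = 0;
	}
	return out;
}
```

For example, with the made-up values `room = 3900` and `sit_bitmap_cap = 19065`, a 1000-byte SIT bitmap stays inside the checkpoint block and leaves 2900 bytes for NAT, while a 30000-byte one is capped at 19065 bytes and needs 5 payload blocks.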
Re: [f2fs-dev] [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in, find_in_block
Hello,

On Fri, Jul 04, 2014 at 11:25:35PM -0700, Jaegeuk Kim wrote:

To Changman,

Just to be sure, can you reproduce this issue on an x86 machine with proper benchmarks? (i.e., test_bit_le vs. find_next_bit_le)

bit_mod_test shows quite different results between the server and the desktop.

CPU i5 x86_64 Ubuntu Server - 3.16.0-rc3

[266627.204776] find_next_bit_le  test_bit_le
[266627.205319] 1832              1774
[266627.206223] 1292              1746
[266627.207092] 1205              1746
[266627.207876] 914               1746
[266627.208710] 1082              1746
[266627.209506] 956               1746
[266627.210175] 523               1746
[266627.211839] 3907              1746
[266627.212898] 1850              1746
[266627.214046] 2153              1746
[266627.215118] 1894              1746

CPU i7 x86_64 Mint Desktop - 3.13.0-24-generic

[432284.422356] find_next_bit_le  test_bit_le
[432284.423470] 3771              3878
[432284.425400] 2671              3696
[432284.427221] 2492              3760
[432284.428908] 1971              3696
[432284.430640] 2191              3730
[432284.432323] 1986              3696
[432284.433741] 1123              3698
[432284.437269] 8299              3696
[432284.439487] 3842              3696
[432284.441850] 4334              3696
[432284.444080] 3885              3696

To all,

I cautiously suspect that the performance might be different when processing f2fs_find_entry, since L1/L2 cache misses due to intermediate routines such as string matching can have some effect on it. But, IMO, it is still worth investigating this issue and contemplating how to detect all ones or not. Ah, one solution may be using 2 bytes from the reserved space, total 3, to indicate how many valid dentries are stored in the dentry block. Any ideas?

Agree. When more than half of the bits are ones, test_bit is better than find_next_bit. So we can decide whether to use test_bit or find_next_bit depending on the count of one bits.

When just comparing test_bit and find_next_bit, I think test_bit is more effective in f2fs. Think about f2fs's dentry management policy: one dentry bucket is filled, then the next dentry bucket is filled, from lower to higher level. If empty slots exist at a lower level, they are used first.
So, I guess that one bits come to outnumber zero bits as time goes by.

Thanks,

Thanks,

On Fri, Jul 04, 2014 at 04:04:09PM +0800, Gu Zheng wrote:

Hi Yu,

Thanks.

On 07/04/2014 02:21 PM, Chao Yu wrote:

Hi Jaegeuk, Gu, Changman

-Original Message-
From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
Sent: Friday, July 04, 2014 1:36 PM
To: Gu Zheng
Cc: f2fs; fsdevel; 이창만; 俞
Subject: Re: [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in, find_in_block

Well, how about testing with many ones in the bit streams?

Thanks,

On Thu, Jul 03, 2014 at 06:14:02PM +0800, Gu Zheng wrote:

Hi Jaegeuk, Changman

This is just a simple test; I'm not sure it addresses our qualms.

Bitmap size: 216 (the same as f2fs dentry_bits).
CPU: Intel i5 x86_64.
Time counting is based on tsc (lower is faster).

[Index of 1]  find_next_bit_le  test_bit_le
0             20                117
1             20                114
2             20                113
3             20                139
4             22                121
5             22                118
6             22                115
8             22                112
9             22                106
10            22                105
11            22                100
16            22                98
48            22                97
80            27                95
104           27                92
136           32                95
160           32                92
184           32                90
200           27                87
208           35                84

According to the result, find_next_bit_le is always better than test_bit_le. There may be some noise, but I think the result is clear. Hope it can help us. :)

ps. The sample is attached too.

Thanks,
Gu

I hope this could provide some help for this patch. I modified Gu's code like this and added a few test cases:

static void test_bit_search_speed(void)
{
	unsigned long flags;
	uint64_t tsc_s_b1, tsc_s_e1, tsc_s_b2, tsc_s_e2;
	int i, j, pos;
	const void *bit_addr;

	local_irq_save(flags);
	preempt_disable();

	printk("find_next_bit test_bit_le\n");

	for (i = 0; i < 24; i++) {
Re: [f2fs-dev] [PATCH 3/4] f2fs: use find_next_bit_le rather than test_bit_le in, find_in_block
Hi, Gu

Unfortunately, find_next_bit isn't always better than test_bit. Refer to commit 5d0c667121bfc8be76d1580f485bddbe73465d1a.

I remember that Jaegeuk previously changed find_next_bit to test_bit because find_next_bit consumed a lot of CPU time when there are many dentries, as in a postmark workload.

Sorry, I should have reported this sooner.

On Tue, Jun 24, 2014 at 06:20:41PM +0800, Gu Zheng wrote:

Use find_next_bit_le rather than test_bit_le to improve search speed slightly.

Signed-off-by: Gu Zheng guz.f...@cn.fujitsu.com
---
 fs/f2fs/dir.c |   43 +++++++++++++++++++++----------------------
 1 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 3edd561..ba510fb 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -93,42 +93,41 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page,
 			const char *name, size_t namelen, int *max_slots,
 			f2fs_hash_t namehash, struct page **res_page)
 {
-	struct f2fs_dir_entry *de;
-	unsigned long bit_pos = 0;
+	unsigned long bit_pos = 0, bit_start = 0;
 	struct f2fs_dentry_block *dentry_blk = kmap(dentry_page);
 	const void *dentry_bits = &dentry_blk->dentry_bitmap;
-	int max_len = 0;
 
-	while (bit_pos < NR_DENTRY_IN_BLOCK) {
-		if (!test_bit_le(bit_pos, dentry_bits)) {
-			if (bit_pos == 0)
-				max_len = 1;
-			else if (!test_bit_le(bit_pos - 1, dentry_bits))
-				max_len++;
-			bit_pos++;
-			continue;
+	while (bit_start < NR_DENTRY_IN_BLOCK) {
+		struct f2fs_dir_entry *de;
+		int max_len = 0;
+
+		bit_pos = find_next_bit_le(dentry_bits,
+				NR_DENTRY_IN_BLOCK, bit_start);
+
+		max_len = bit_pos - bit_start;
+		if (max_len > *max_slots) {
+			*max_slots = max_len;
+			max_len = 0;
 		}
+
+		if (bit_pos >= NR_DENTRY_IN_BLOCK)
+			break;
+
 		de = &dentry_blk->dentry[bit_pos];
 		if (early_match_name(name, namelen, namehash, de)) {
 			if (!memcmp(dentry_blk->filename[bit_pos],
 							name, namelen)) {
 				*res_page = dentry_page;
-				goto found;
+				return de;
 			}
 		}
-		if (max_len > *max_slots) {
-			*max_slots = max_len;
-			max_len = 0;
-		}
-		bit_pos += GET_DENTRY_SLOTS(le16_to_cpu(de->name_len));
+
+		bit_start = bit_pos +
+				GET_DENTRY_SLOTS(le16_to_cpu(de->name_len));
 	}
 
-	de = NULL;
 	kunmap(dentry_page);
-found:
-	if (max_len > *max_slots)
-		*max_slots = max_len;
-	return de;
+	return NULL;
 }
 
 static struct f2fs_dir_entry *find_in_level(struct inode *dir,
--
1.7.7
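The two scanning styles compared in this thread can be illustrated outside the kernel. The sketch below is a userspace approximation only: `model_test_bit_le` and `model_find_next_bit_le` are hypothetical stand-ins for the kernel helpers (the real ones work on machine words, this one skips zero bytes), and each entry is assumed to occupy one slot, unlike the real dentry block where GET_DENTRY_SLOTS decides the width. Both functions compute the longest run of free slots, the quantity find_in_block reports through max_slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test a single bit in a little-endian byte bitmap. */
static int model_test_bit_le(size_t nr, const uint8_t *bitmap)
{
	return (bitmap[nr / 8] >> (nr % 8)) & 1;
}

/*
 * Return the index of the first set bit at or after start, or size if
 * there is none. Whole zero bytes are skipped in one step.
 */
static size_t model_find_next_bit_le(const uint8_t *bitmap, size_t size,
				     size_t start)
{
	size_t pos = start;

	assert(bitmap != NULL);
	while (pos < size) {
		if (pos % 8 == 0 && size - pos >= 8 && bitmap[pos / 8] == 0) {
			pos += 8;
			continue;
		}
		if (model_test_bit_le(pos, bitmap))
			return pos;
		pos++;
	}
	return size;
}

/* Longest run of zero bits, visiting every bit (test_bit style). */
static size_t max_free_run_test_bit(const uint8_t *bitmap, size_t size)
{
	size_t pos, run = 0, best = 0;

	for (pos = 0; pos < size; pos++) {
		if (model_test_bit_le(pos, bitmap)) {
			run = 0;
			continue;
		}
		if (++run > best)
			best = run;
	}
	return best;
}

/* Longest run of zero bits, jumping between set bits (find_next_bit style). */
static size_t max_free_run_find_next(const uint8_t *bitmap, size_t size)
{
	size_t start = 0, best = 0;

	while (start < size) {
		size_t pos = model_find_next_bit_le(bitmap, size, start);

		if (pos - start > best)
			best = pos - start;
		start = pos + 1;	/* one slot per entry in this sketch */
	}
	return best;
}

/* 24 slots with only slot 0 and slot 23 in use. */
static const uint8_t sample_bitmap[3] = { 0x01, 0x00, 0x80 };
```

Both functions return the same answer; which one is faster depends on how many bits are set, which is the point being argued in the thread. A sparse bitmap lets the find-next style skip long gaps, while a mostly full bitmap makes it restart its search almost every slot.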
Re: [f2fs-dev] [PATCH] f2fs: avoid overflow when large directory feathure is enabled
Hi, Chao

Good catch. Please also update Documentation/filesystems/f2fs.txt.

On Tue, May 27, 2014 at 09:06:52AM +0800, Chao Yu wrote:

When the large directory feature is enabled, there is one case which could cause overflow in dir_buckets(), as follows:

special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.

Here we define MAX_DIR_BUCKETS to limit the return value when the condition could trigger a potential overflow.

Signed-off-by: Chao Yu chao2...@samsung.com
---
 fs/f2fs/dir.c           |    4 ++--
 include/linux/f2fs_fs.h |    3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index c3f1485..966acb0 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -23,10 +23,10 @@ static unsigned long dir_blocks(struct inode *inode)
 
 static unsigned int dir_buckets(unsigned int level, int dir_level)
 {
-	if (level < MAX_DIR_HASH_DEPTH / 2)
+	if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
 		return 1 << (level + dir_level);
 	else
-		return 1 << ((MAX_DIR_HASH_DEPTH / 2 + dir_level) - 1);
+		return MAX_DIR_BUCKETS;
 }
 
 static unsigned int bucket_blocks(unsigned int level)
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index 8c03f71..ba6f312 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -394,6 +394,9 @@ typedef __le32	f2fs_hash_t;
 /* MAX level for dir lookup */
 #define MAX_DIR_HASH_DEPTH	63
 
+/* MAX buckets in one level of dir */
+#define MAX_DIR_BUCKETS		(1 << ((MAX_DIR_HASH_DEPTH / 2) - 1))
+
 #define SIZE_OF_DIR_ENTRY	11	/* by byte */
 #define SIZE_OF_DENTRY_BITMAP	((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - 1) / \
 				BITS_PER_BYTE)
--
1.7.10.4
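The clamping in this patch is easy to check in a standalone program. The sketch below copies the patched dir_buckets() into userspace under two assumptions: the `<` and `<<` operators, which were lost from the archived diff, are restored as shown, and the shifts use unsigned constants so the sketch itself stays well defined. The added assert on dir_level is only a guard for this sketch.

```c
#include <assert.h>

/* MAX level for dir lookup */
#define MAX_DIR_HASH_DEPTH	63

/* MAX buckets in one level of dir: 1 << 30 with the value above */
#define MAX_DIR_BUCKETS		(1u << ((MAX_DIR_HASH_DEPTH / 2) - 1))

/*
 * Number of hash buckets at a given directory level. The shift is only
 * performed while level + dir_level stays below 31; anything larger is
 * clamped instead of shifting a 32-bit value by 32 or more.
 */
static unsigned int dir_buckets(unsigned int level, int dir_level)
{
	assert(dir_level >= 0);
	if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
		return 1u << (level + dir_level);
	else
		return MAX_DIR_BUCKETS;
}
```

For instance, level 29 with dir_level 3 passed the old `level < MAX_DIR_HASH_DEPTH / 2` check and would have shifted by 32; with the patched condition it returns MAX_DIR_BUCKETS instead.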
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Tue, May 27, 2014 at 02:32:57PM +0800, Chao Yu wrote: Hi changman, -Original Message- From: Changman Lee [mailto:cm224@samsung.com] Sent: Tuesday, May 27, 2014 9:25 AM To: Chao Yu Cc: Jaegeuk Kim; linux-fsde...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages Hi, Chao Could you think about following once. move node_inode in front of build_segment_manager, then use node_inode instead of bd_inode. Jaegeuk and I discussed this solution previously in [PATCH 3/3 V3] f2fs: introduce f2fs_cache_node_page() to add page into node_inode cache You can see it from this url: http://sourceforge.net/p/linux-f2fs/mailman/linux-f2fs-devel/?viewmonth=201312page=5 And it seems not easy to change order of build_*_manager and make node_inode, because there are dependency between them. Sorry to make a mess your patch thread. I've understood it. In your patch, using NAT journal seems to be possible. Anyway, thanks for your answer. On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Changes from V1 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. Call Trace: [f1031630] ? 
ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] [f103e22d] build_curseg+0x2bd/0x620 [f2fs] [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] [c115b66a] mount_bdev+0x16a/0x1a0 [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] [c115c096] mount_fs+0x36/0x170 [c1173635] vfs_kern_mount+0x55/0xe0 [c1175388] do_mount+0x1e8/0x900 [c1175d72] SyS_mount+0x82/0xc0 [c16059cc] sysenter_do_call+0x12/0x22 Suggested-by: Jaegeuk Kim jaegeuk@samsung.com Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/node.c | 52 1 file changed, 24 insertions(+), 28 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d60d3d..02a59e9 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) /* * ra_sum_pages() merge contiguous pages into one bio and submit. - * these pre-readed pages are linked in pages list. + * these pre-readed pages are alloced in bd_inode's mapping tree. 
*/ -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, int start, int nrpages) { - struct page *page; - int page_idx = start; + struct inode *inode = sbi-sb-s_bdev-bd_inode; + struct address_space *mapping = inode-i_mapping; + int i, page_idx = start; struct f2fs_io_info fio = { .type = META, .rw = READ_SYNC | REQ_META | REQ_PRIO }; - for (; page_idx start + nrpages; page_idx++) { - /* alloc temporal page for read node summary info*/ - page = alloc_page(GFP_F2FS_ZERO); - if (!page) + for (i = 0; page_idx start + nrpages; page_idx++, i++) { + /* alloc page in bd_inode for reading node summary info */ + pages[i] = grab_cache_page(mapping, page_idx); + if (!pages[i]) break; - - lock_page(page); - page-index = page_idx; - list_add_tail(page-lru, pages); + f2fs_submit_page_mbio(sbi, pages[i], page_idx, fio); } - list_for_each_entry(page, pages, lru) - f2fs_submit_page_mbio(sbi, page, page-index, fio); - f2fs_submit_merged_bio(sbi, META, READ); - - return page_idx - start; + return i; } int restore_node_summary(struct f2fs_sb_info *sbi, @@ -1694,11 +1688,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, { struct f2fs_node *rn; struct f2fs_summary *sum_entry; - struct page *page, *tmp; + struct inode *inode = sbi-sb-s_bdev-bd_inode; block_t addr; int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); - int i, last_offset, nrpages, err = 0; - LIST_HEAD(page_list); + struct page *pages[bio_blocks]; + int i, idx, last_offset, nrpages, err = 0; /* scan the node segment */ last_offset = sbi-blocks_per_seg; @@ -1709,29 +1703,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi
Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Mon, May 26, 2014 at 02:26:24PM +0800, Chao Yu wrote: Hi changman, -Original Message- From: Changman Lee [mailto:cm224@samsung.com] Sent: Friday, May 23, 2014 1:14 PM To: Jaegeuk Kim Cc: Chao Yu; linux-fsde...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages On Wed, May 21, 2014 at 12:36:46PM +0900, Jaegeuk Kim wrote: Hi Chao, 2014-05-16 (금), 17:14 +0800, Chao Yu: Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Call Trace: [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] [f103e22d] build_curseg+0x2bd/0x620 [f2fs] [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] [c115b66a] mount_bdev+0x16a/0x1a0 [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] [c115c096] mount_fs+0x36/0x170 [c1173635] vfs_kern_mount+0x55/0xe0 [c1175388] do_mount+0x1e8/0x900 [c1175d72] SyS_mount+0x82/0xc0 [c16059cc] sysenter_do_call+0x12/0x22 Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/node.c | 49 - 1 file changed, 28 insertions(+), 21 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d60d3d..b5cd814 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1658,13 +1658,16 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) /* * ra_sum_pages() merge contiguous pages into one bio and submit. - * these pre-readed pages are linked in pages list. + * these pre-readed pages are alloced in bd_inode's mapping tree. 
*/ -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, int start, int nrpages) { struct page *page; + struct inode *inode = sbi-sb-s_bdev-bd_inode; How about use sbi-meta_inode instead of bd_inode, then we can do caching summary pages for further i/o. In my understanding, In ra_sum_pages() we readahead node pages in NODE segment, then we could padding current summary caching with nid of node page's footer. So we should not cache this readaheaded pages in meta_inode's mapping. Do I miss something? Regards Sorry, you're right. Forget about caching. I've confused ra_sum_pages with summary segments. + struct address_space *mapping = inode-i_mapping; int page_idx = start; + int alloced, readed; struct f2fs_io_info fio = { .type = META, .rw = READ_SYNC | REQ_META | REQ_PRIO @@ -1672,21 +1675,23 @@ static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, for (; page_idx start + nrpages; page_idx++) { /* alloc temporal page for read node summary info*/ - page = alloc_page(GFP_F2FS_ZERO); + page = grab_cache_page(mapping, page_idx); if (!page) break; - - lock_page(page); - page-index = page_idx; - list_add_tail(page-lru, pages); + page_cache_release(page); IMO, we don't need to do like this. Instead, for() { page = grab_cache_page(); if (!page) break; page[page_idx] = page; f2fs_submit_page_mbio(sbi, page, fio); } f2fs_submit_merged_bio(sbi, META, READ); return page_idx - start; Afterwards, in restore_node_summry(), lock_page() will wait the end_io for read. ... 
f2fs_put_page(pages[index], 1); Thanks, } - list_for_each_entry(page, pages, lru) - f2fs_submit_page_mbio(sbi, page, page-index, fio); + alloced = page_idx - start; + readed = find_get_pages_contig(mapping, start, alloced, pages); + BUG_ON(alloced != readed); + + for (page_idx = 0; page_idx readed; page_idx++) + f2fs_submit_page_mbio(sbi, pages[page_idx], + pages[page_idx]-index, fio); f2fs_submit_merged_bio(sbi, META, READ); - return page_idx - start; + return readed
[f2fs-dev] [PATCH 1/2 V3] mkfs.f2fs: large volume support
Changes from V2 o remove CP_LARGE_VOL_LFLAG instead, use cp_payload in superblock because disk size is determined at format Changes from V1 o fix orphan node blkaddr -- 8 -- From 7e5e66699bb383e4fa7ce970e1cc8e10eb0a5c6f Mon Sep 17 00:00:00 2001 From: root root@f2fs-00.(none) Date: Mon, 12 May 2014 22:01:38 +0900 Subject: [PATCH 1/2] mkfs.f2fs: large volume support This patch supports large volume over about 3TB. Signed-off-by: Changman Lee cm224@samsung.com --- include/f2fs_fs.h |9 +++ mkfs/f2fs_format.c | 68 +++- 2 files changed, 66 insertions(+), 11 deletions(-) diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h index 94d8dc3..3003f7f 100644 --- a/include/f2fs_fs.h +++ b/include/f2fs_fs.h @@ -223,6 +223,7 @@ enum { #define F2FS_LOG_SECTORS_PER_BLOCK 3 /* 4KB: F2FS_BLKSIZE */ #define F2FS_BLKSIZE 4096/* support only 4KB block */ #define F2FS_MAX_EXTENSION 64 /* # of extension entries */ +#define F2FS_BLK_ALIGN(x) (((x) + F2FS_BLKSIZE - 1) / F2FS_BLKSIZE) #define NULL_ADDR 0x0U #define NEW_ADDR -1U @@ -279,6 +280,7 @@ struct f2fs_super_block { __le16 volume_name[512];/* volume name */ __le32 extension_count; /* # of extensions below */ __u8 extension_list[F2FS_MAX_EXTENSION][8]; /* extension array */ + __le32 cp_payload; } __attribute__((packed)); /* @@ -457,6 +459,13 @@ struct f2fs_nat_block { #define SIT_ENTRY_PER_BLOCK (PAGE_CACHE_SIZE / sizeof(struct f2fs_sit_entry)) /* + * F2FS uses 4 bytes to represent block address. As a result, supported size of + * disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. + */ +#define F2FS_MAX_SEGMENT ((16 * 1024 * 1024) / 2) +#define MAX_SIT_BITMAP_SIZE((F2FS_MAX_SEGMENT / SIT_ENTRY_PER_BLOCK) / 8) + +/* * Note that f2fs_sit_entry-vblocks has the following bit-field information. 
* [15:10] : allocation type such as CURSEG__TYPE * [9:0] : valid block count diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c index cdbf74a..58550a2 100644 --- a/mkfs/f2fs_format.c +++ b/mkfs/f2fs_format.c @@ -102,7 +102,8 @@ static int f2fs_prepare_super_block(void) u_int32_t blocks_for_sit, blocks_for_nat, blocks_for_ssa; u_int32_t total_valid_blks_available; u_int64_t zone_align_start_offset, diff, total_meta_segments; - u_int32_t sit_bitmap_size, max_nat_bitmap_size, max_nat_segments; + u_int32_t sit_bitmap_size, max_sit_bitmap_size; + u_int32_t max_nat_bitmap_size, max_nat_segments; u_int32_t total_zones; super_block.magic = cpu_to_le32(F2FS_SUPER_MAGIC); @@ -217,8 +218,25 @@ static int f2fs_prepare_super_block(void) */ sit_bitmap_size = ((le32_to_cpu(super_block.segment_count_sit) / 2) log_blks_per_seg) / 8; - max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - - sit_bitmap_size; + + if (sit_bitmap_size MAX_SIT_BITMAP_SIZE) + max_sit_bitmap_size = MAX_SIT_BITMAP_SIZE; + else + max_sit_bitmap_size = sit_bitmap_size; + + /* +* It should be reserved minimum 1 segment for nat. +* When sit is too large, we should expand cp area. It requires more pages for cp. 
+*/ + if (max_sit_bitmap_size + (CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 65)) { + max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1; + super_block.cp_payload = F2FS_BLK_ALIGN(max_sit_bitmap_size); + } else { + max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - max_sit_bitmap_size; + super_block.cp_payload = 0; + } + max_nat_segments = (max_nat_bitmap_size * 8) log_blks_per_seg; if (le32_to_cpu(super_block.segment_count_nat) max_nat_segments) @@ -434,6 +452,7 @@ static int f2fs_write_check_point_pack(void) u_int64_t cp_seg_blk_offset = 0; u_int32_t crc = 0; int i; + char *cp_payload = NULL; ckp = calloc(F2FS_BLKSIZE, 1); if (ckp == NULL) { @@ -447,6 +466,12 @@ static int f2fs_write_check_point_pack(void) return -1; } + cp_payload = calloc(F2FS_BLKSIZE, 1); + if (cp_payload == NULL) { + MSG(1, \tError: Calloc Failed for cp_payload!!!\n); + return -1; + } + /* 1. cp page 1 of checkpoint pack 1 */ ckp-checkpoint_ver = cpu_to_le64(1); ckp-cur_node_segno[0] = @@ -485,9 +510,11 @@ static int f2fs_write_check_point_pack(void) ((le32_to_cpu(ckp-free_segment_count) + 6 - le32_to_cpu(ckp-overprov_segment_count)) * config.blks_per_seg)); - ckp-cp_pack_total_block_count
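The sizing arithmetic in this patch can be checked in isolation. The following is a minimal userspace sketch, not the mkfs.f2fs code itself: `sit_entries_per_block` and `inline_room` are hypothetical parameters standing in for SIT_ENTRY_PER_BLOCK and for the space left in the first checkpoint block, and the comparison is assumed to be "greater than" because the operator did not survive in the archived text.

```c
#include <assert.h>
#include <stdint.h>

#define F2FS_BLKSIZE 4096u
/* Same rounding as the F2FS_BLK_ALIGN macro added by the patch. */
#define F2FS_BLK_ALIGN(x) (((x) + F2FS_BLKSIZE - 1) / F2FS_BLKSIZE)
/* 16 TB of 2 MB segments, as in the patch comment. */
#define F2FS_MAX_SEGMENT ((16u * 1024 * 1024) / 2)

/* Bytes of SIT version bitmap needed for the largest supported volume. */
static uint32_t max_sit_bitmap_size(uint32_t sit_entries_per_block)
{
	assert(sit_entries_per_block > 0);
	return (F2FS_MAX_SEGMENT / sit_entries_per_block) / 8;
}

/*
 * Clamp the SIT bitmap size, then decide how many extra checkpoint
 * blocks (cp_payload) it needs. inline_room is a hypothetical name for
 * the threshold below which the bitmap still fits in the first block.
 */
static uint32_t cp_payload_blocks(uint32_t sit_bitmap_size,
				  uint32_t max_size, uint32_t inline_room)
{
	uint32_t clamped;

	assert(max_size <= UINT32_MAX - F2FS_BLKSIZE);
	clamped = sit_bitmap_size > max_size ? max_size : sit_bitmap_size;
	if (clamped > inline_room)
		return F2FS_BLK_ALIGN(clamped);	/* bitmap moves to its own blocks */
	return 0;				/* bitmap stays inline */
}
```

Assuming 55 SIT entries per 4 KB block (an assumption about the on-disk entry size, not stated in the patch), the clamp works out to 19065 bytes, which rounds up to 5 payload blocks.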
[f2fs-dev] [PATCH 2/2 V3] fsck.f2fs: large volume support
Changes from V2 o remove CP_LARGE_VOL_FLAG instead, use cp_payload in superblock because disk size is determined at format Changes from V1 o fix orphan node blkaddr -- 8 -- From 405367374f868a8cf29bef62c06bf53271b58f52 Mon Sep 17 00:00:00 2001 From: Changman Lee cm224@samsung.com Date: Mon, 12 May 2014 22:03:46 +0900 Subject: [PATCH 2/2] fsck.f2fs: large volume support This patch support large volume over about 3TB. Signed-off-by: Changman Lee cm224@samsung.com --- fsck/f2fs.h | 14 +++--- fsck/fsck.c |7 +-- fsck/mount.c | 22 -- lib/libf2fs.c |4 ++-- 4 files changed, 38 insertions(+), 9 deletions(-) diff --git a/fsck/f2fs.h b/fsck/f2fs.h index e1740fe..427a733 100644 --- a/fsck/f2fs.h +++ b/fsck/f2fs.h @@ -203,9 +203,17 @@ static inline unsigned long __bitmap_size(struct f2fs_sb_info *sbi, int flag) static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag) { struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); - int offset = (flag == NAT_BITMAP) ? - le32_to_cpu(ckpt-sit_ver_bitmap_bytesize) : 0; - return ckpt-sit_nat_version_bitmap + offset; + int offset; + if (le32_to_cpu(F2FS_RAW_SUPER(sbi)-cp_payload) 0) { + if (flag == NAT_BITMAP) + return ckpt-sit_nat_version_bitmap; + else + return ((char *)ckpt + F2FS_BLKSIZE); + } else { + offset = (flag == NAT_BITMAP) ? 
+ le32_to_cpu(ckpt-sit_ver_bitmap_bytesize) : 0; + return ckpt-sit_nat_version_bitmap + offset; + } } static inline bool is_set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f) diff --git a/fsck/fsck.c b/fsck/fsck.c index 20582c9..a1d5dd0 100644 --- a/fsck/fsck.c +++ b/fsck/fsck.c @@ -653,11 +653,14 @@ int fsck_chk_orphan_node(struct f2fs_sb_info *sbi) block_t start_blk, orphan_blkaddr, i, j; struct f2fs_orphan_block *orphan_blk; + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); - if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG)) + if (!is_set_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG)) return 0; - start_blk = __start_cp_addr(sbi) + 1; + start_blk = __start_cp_addr(sbi) + 1 + + le32_to_cpu(F2FS_RAW_SUPER(sbi)-cp_payload); + orphan_blkaddr = __start_sum_addr(sbi) - 1; orphan_blk = calloc(BLOCK_SZ, 1); diff --git a/fsck/mount.c b/fsck/mount.c index e2f3ace..24ef3bf 100644 --- a/fsck/mount.c +++ b/fsck/mount.c @@ -129,6 +129,7 @@ void print_raw_sb_info(struct f2fs_sb_info *sbi) DISP_u32(sb, root_ino); DISP_u32(sb, node_ino); DISP_u32(sb, meta_ino); + DISP_u32(sb, cp_payload); printf(\n); } @@ -285,6 +286,7 @@ void *validate_checkpoint(struct f2fs_sb_info *sbi, block_t cp_addr, unsigned lo /* Read the 2nd cp block in this CP pack */ cp_page_2 = malloc(PAGE_SIZE); cp_addr += le32_to_cpu(cp_block-cp_pack_total_block_count) - 1; + if (dev_read_block(cp_page_2, cp_addr) 0) goto invalid_cp2; @@ -295,7 +297,7 @@ void *validate_checkpoint(struct f2fs_sb_info *sbi, block_t cp_addr, unsigned lo crc = *(unsigned int *)((unsigned char *)cp_block + crc_offset); if (f2fs_crc_valid(crc, cp_block, crc_offset)) - goto invalid_cp1; + goto invalid_cp2; cur_version = le64_to_cpu(cp_block-checkpoint_ver); @@ -319,8 +321,9 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) unsigned long blk_size = sbi-blocksize; unsigned long long cp1_version = 0, cp2_version = 0; unsigned long long cp_start_blk_no; + unsigned int cp_blks = 1 + 
le32_to_cpu(F2FS_RAW_SUPER(sbi)-cp_payload); - sbi-ckpt = malloc(blk_size); + sbi-ckpt = malloc(cp_blks * blk_size); if (!sbi-ckpt) return -ENOMEM; /* @@ -351,6 +354,20 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) memcpy(sbi-ckpt, cur_page, blk_size); + if (cp_blks 1) { + int i; + unsigned long long cp_blk_no; + + cp_blk_no = le32_to_cpu(raw_sb-cp_blkaddr); + if (cur_page == cp2) + cp_blk_no += 1 le32_to_cpu(raw_sb-log_blocks_per_seg); + /* copy sit bitmap */ + for (i = 1; i cp_blks; i++) { + unsigned char *ckpt = (unsigned char *)sbi-ckpt; + dev_read_block(cur_page, cp_blk_no + i); + memcpy(ckpt + i * blk_size, cur_page, blk_size); + } + } free(cp1); free(cp2); return 0; @@ -697,6 +714,7 @@ void check_block_count(struct f2fs_sb_info *sbi, int valid_blocks = 0; int i; + /* check segment usage */ ASSERT(GET_SIT_VBLOCKS(raw_sit) = sbi-blocks_per_seg); diff --git a/lib
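The bitmap lookup changed by this patch can be illustrated with a small userspace sketch. It is only a model under stated assumptions: `struct ckpt_view` and its field names are hypothetical, both sizes are assumed to be already converted to CPU byte order, and the test on cp_payload is assumed to be "greater than zero" since the operator is missing from the archived diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLKSIZE 4096u

enum bitmap_kind { NAT_BITMAP, SIT_BITMAP };

/* Hypothetical stand-in for the in-memory checkpoint copy. */
struct ckpt_view {
	unsigned char *base;		/* start of the first checkpoint block */
	size_t bitmap_off;		/* offset of the version bitmap area in block 0 */
	uint32_t sit_bitmap_bytes;	/* sit_ver_bitmap_bytesize, CPU-endian */
	uint32_t cp_payload;		/* extra checkpoint blocks, CPU-endian */
};

/* Bytes to allocate for the checkpoint copy: one block plus the payload. */
static size_t ckpt_buf_size(uint32_t cp_payload)
{
	return ((size_t)cp_payload + 1) * BLKSIZE;
}

/* Locate the requested bitmap in either layout. */
static unsigned char *bitmap_ptr(const struct ckpt_view *v, enum bitmap_kind kind)
{
	assert(v && v->base);
	if (v->cp_payload > 0) {
		/* Large volume: NAT bitmap stays inline, SIT bitmap starts at block 1. */
		if (kind == NAT_BITMAP)
			return v->base + v->bitmap_off;
		return v->base + BLKSIZE;
	}
	/* Classic layout: SIT bitmap first, NAT bitmap right behind it. */
	return v->base + v->bitmap_off +
		(kind == NAT_BITMAP ? v->sit_bitmap_bytes : 0);
}
```

With a payload of two blocks the buffer must hold three blocks, which is why the patch sizes the allocation from cp_blks rather than from a single block.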
Re: [f2fs-dev] [PATCH v2] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
Hi, Chao Could you think about following once. move node_inode in front of build_segment_manager, then use node_inode instead of bd_inode. On Tue, May 27, 2014 at 08:41:07AM +0800, Chao Yu wrote: Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Changes from V1 o remove redundant code in ra_sum_pages() suggested by Jaegeuk Kim. Call Trace: [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] [f103e22d] build_curseg+0x2bd/0x620 [f2fs] [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] [c115b66a] mount_bdev+0x16a/0x1a0 [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] [c115c096] mount_fs+0x36/0x170 [c1173635] vfs_kern_mount+0x55/0xe0 [c1175388] do_mount+0x1e8/0x900 [c1175d72] SyS_mount+0x82/0xc0 [c16059cc] sysenter_do_call+0x12/0x22 Suggested-by: Jaegeuk Kim jaegeuk@samsung.com Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/node.c | 52 1 file changed, 24 insertions(+), 28 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d60d3d..02a59e9 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1658,35 +1658,29 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) /* * ra_sum_pages() merge contiguous pages into one bio and submit. - * these pre-readed pages are linked in pages list. + * these pre-readed pages are alloced in bd_inode's mapping tree. 
*/ -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, int start, int nrpages) { - struct page *page; - int page_idx = start; + struct inode *inode = sbi-sb-s_bdev-bd_inode; + struct address_space *mapping = inode-i_mapping; + int i, page_idx = start; struct f2fs_io_info fio = { .type = META, .rw = READ_SYNC | REQ_META | REQ_PRIO }; - for (; page_idx start + nrpages; page_idx++) { - /* alloc temporal page for read node summary info*/ - page = alloc_page(GFP_F2FS_ZERO); - if (!page) + for (i = 0; page_idx start + nrpages; page_idx++, i++) { + /* alloc page in bd_inode for reading node summary info */ + pages[i] = grab_cache_page(mapping, page_idx); + if (!pages[i]) break; - - lock_page(page); - page-index = page_idx; - list_add_tail(page-lru, pages); + f2fs_submit_page_mbio(sbi, pages[i], page_idx, fio); } - list_for_each_entry(page, pages, lru) - f2fs_submit_page_mbio(sbi, page, page-index, fio); - f2fs_submit_merged_bio(sbi, META, READ); - - return page_idx - start; + return i; } int restore_node_summary(struct f2fs_sb_info *sbi, @@ -1694,11 +1688,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, { struct f2fs_node *rn; struct f2fs_summary *sum_entry; - struct page *page, *tmp; + struct inode *inode = sbi-sb-s_bdev-bd_inode; block_t addr; int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); - int i, last_offset, nrpages, err = 0; - LIST_HEAD(page_list); + struct page *pages[bio_blocks]; + int i, idx, last_offset, nrpages, err = 0; /* scan the node segment */ last_offset = sbi-blocks_per_seg; @@ -1709,29 +1703,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, nrpages = min(last_offset - i, bio_blocks); /* read ahead node pages */ - nrpages = ra_sum_pages(sbi, page_list, addr, nrpages); + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); if (!nrpages) return -ENOMEM; - list_for_each_entry_safe(page, tmp, page_list, lru) { + for (idx = 0; idx nrpages; idx++) 
{ if (err) goto skip; - lock_page(page); - if (unlikely(!PageUptodate(page))) { + lock_page(pages[idx]); + if (unlikely(!PageUptodate(pages[idx]))) { err = -EIO; } else { - rn = F2FS_NODE(page); + rn = F2FS_NODE(pages[idx]); sum_entry-nid = rn-footer.nid; sum_entry-version = 0;
[f2fs-dev] [PATCH V3] f2fs: large volume support
Changes from V2 o fix conversion like le32_to_cpu o use is_set_ckpt_flags instead of bit operation o check return value after memory allocation Changes from V1 o fix orphan node blkaddr for large volume Jaegeuk, What is your opinion about reallocation of sbi-ckpt ? If you have any idea, let me know. Thanks. -- 8 -- From 5a821fcec79fb9570a26104238b3c2391f6160ae Mon Sep 17 00:00:00 2001 From: Changman Lee cm224@samsung.com Date: Mon, 12 May 2014 12:27:43 +0900 Subject: [PATCH] f2fs: large volume support f2fs's cp has one page which consists of struct f2fs_checkpoint and version bitmap of sit and nat. To support lots of segments, we need more blocks for sit bitmap. So let's arrange sit bitmap as following: +-++ | f2fs_checkpoint | sit bitmap | | + nat bitmap|| +-++ 0 4kN blocks Signed-off-by: Changman Lee cm224@samsung.com --- fs/f2fs/checkpoint.c| 59 +++ fs/f2fs/f2fs.h | 13 +-- include/linux/f2fs_fs.h |2 ++ 3 files changed, 68 insertions(+), 6 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index fe968c7..cf2d1a7 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -366,12 +366,18 @@ static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) void recover_orphan_inodes(struct f2fs_sb_info *sbi) { block_t start_blk, orphan_blkaddr, i, j; + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG)) return; sbi-por_doing = true; - start_blk = __start_cp_addr(sbi) + 1; + + if (is_set_ckpt_flags(ckpt, CP_LARGE_VOL_FLAG)) + start_blk = __start_cp_addr(sbi) + F2FS_BLK_ALIGN( + le32_to_cpu(ckpt-sit_ver_bitmap_bytesize)); + else + start_blk = __start_cp_addr(sbi) + 1; orphan_blkaddr = __start_sum_addr(sbi) - 1; ra_meta_pages(sbi, start_blk, orphan_blkaddr, META_CP); @@ -544,6 +550,35 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) cp_block = (struct f2fs_checkpoint *)page_address(cur_page); memcpy(sbi-ckpt, cp_block, blk_size); + if (is_set_ckpt_flags(sbi-ckpt, 
CP_LARGE_VOL_FLAG)) { + int i, cp_blks; + block_t cp_blk_no; + + cp_blk_no = le32_to_cpu(fsb-cp_blkaddr); + if (cur_page == cp2) + cp_blk_no += 1 le32_to_cpu(fsb-log_blocks_per_seg); + + cp_blks = 1 + F2FS_BLK_ALIGN( + le32_to_cpu(cp_block-sit_ver_bitmap_bytesize)); + + kfree(sbi-ckpt); + sbi-ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL); + if (!sbi-ckpt) + return -ENOMEM; + + memcpy(sbi-ckpt, cp_block, blk_size); + + for (i = 1; i cp_blks; i++) { + void *sit_bitmap_ptr; + unsigned char *ckpt = (unsigned char *)sbi-ckpt; + + cur_page = get_meta_page(sbi, cp_blk_no + i); + sit_bitmap_ptr = page_address(cur_page); + memcpy(ckpt + i * blk_size, sit_bitmap_ptr, blk_size); + f2fs_put_page(cur_page, 1); + } + } + f2fs_put_page(cp1, 1); f2fs_put_page(cp2, 1); return 0; @@ -736,6 +771,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) __u32 crc32 = 0; void *kaddr; int i; + int sit_bitmap_blks = 0; /* * This avoids to conduct wrong roll-forward operations and uses @@ -786,16 +822,22 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) orphan_blocks = (sbi-n_orphans + F2FS_ORPHANS_PER_BLOCK - 1) / F2FS_ORPHANS_PER_BLOCK; - ckpt-cp_pack_start_sum = cpu_to_le32(1 + orphan_blocks); + if (is_set_ckpt_flags(ckpt, CP_LARGE_VOL_FLAG)) + sit_bitmap_blks = F2FS_BLK_ALIGN( + le32_to_cpu(ckpt-sit_ver_bitmap_bytesize)); + ckpt-cp_pack_start_sum = cpu_to_le32(1 + sit_bitmap_blks + + orphan_blocks); if (is_umount) { set_ckpt_flags(ckpt, CP_UMOUNT_FLAG); ckpt-cp_pack_total_block_count = cpu_to_le32(2 + - data_sum_blocks + orphan_blocks + NR_CURSEG_NODE_TYPE); + sit_bitmap_blks + data_sum_blocks + + orphan_blocks + NR_CURSEG_NODE_TYPE); } else { clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG); ckpt-cp_pack_total_block_count = cpu_to_le32(2 + - data_sum_blocks + orphan_blocks); + sit_bitmap_blks + data_sum_blocks
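The block accounting that do_checkpoint() performs in this patch can be sketched as plain arithmetic. This is a userspace illustration only: `struct cp_layout` and `cp_pack_layout()` are hypothetical names, and the values 1020 for F2FS_ORPHANS_PER_BLOCK and 3 for NR_CURSEG_NODE_TYPE are assumptions taken from the f2fs headers rather than from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ORPHANS_PER_BLOCK	1020u	/* assumed F2FS_ORPHANS_PER_BLOCK */
#define NR_CURSEG_NODE_TYPE	3u	/* assumed number of node log types */

/* Hypothetical result type: where summaries start, and the pack length. */
struct cp_layout {
	uint32_t start_sum;	/* cp_pack_start_sum */
	uint32_t total_blocks;	/* cp_pack_total_block_count */
};

static struct cp_layout cp_pack_layout(uint32_t n_orphans,
				       uint32_t sit_bitmap_blks,
				       uint32_t data_sum_blocks,
				       bool is_umount)
{
	struct cp_layout l;
	uint32_t orphan_blocks;

	assert(n_orphans <= UINT32_MAX - ORPHANS_PER_BLOCK);
	/* Round the orphan count up to whole blocks. */
	orphan_blocks = (n_orphans + ORPHANS_PER_BLOCK - 1) / ORPHANS_PER_BLOCK;

	/* First cp block, then SIT bitmap blocks, then orphan blocks. */
	l.start_sum = 1 + sit_bitmap_blks + orphan_blocks;

	/* Two cp blocks frame the pack; node summaries are added on umount. */
	l.total_blocks = 2 + sit_bitmap_blks + data_sum_blocks + orphan_blocks;
	if (is_umount)
		l.total_blocks += NR_CURSEG_NODE_TYPE;
	return l;
}
```

For example, 1021 orphans need two orphan blocks, so with five SIT bitmap blocks the summaries start at offset 8 within the pack.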
Re: [f2fs-dev] [PATCH] f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
On Wed, May 21, 2014 at 12:36:46PM +0900, Jaegeuk Kim wrote: Hi Chao, 2014-05-16 (금), 17:14 +0800, Chao Yu: Previously we allocate pages with no mapping in ra_sum_pages(), so we may encounter a crash in event trace of f2fs_submit_page_mbio where we access mapping data of the page. We'd better allocate pages in bd_inode mapping and invalidate these pages after we restore data from pages. It could avoid crash in above scenario. Call Trace: [f1031630] ? ftrace_raw_event_f2fs_write_checkpoint+0x80/0x80 [f2fs] [f10377bb] f2fs_submit_page_mbio+0x1cb/0x200 [f2fs] [f103c5da] restore_node_summary+0x13a/0x280 [f2fs] [f103e22d] build_curseg+0x2bd/0x620 [f2fs] [f104043b] build_segment_manager+0x1cb/0x920 [f2fs] [f1032c85] f2fs_fill_super+0x535/0x8e0 [f2fs] [c115b66a] mount_bdev+0x16a/0x1a0 [f102f63f] f2fs_mount+0x1f/0x30 [f2fs] [c115c096] mount_fs+0x36/0x170 [c1173635] vfs_kern_mount+0x55/0xe0 [c1175388] do_mount+0x1e8/0x900 [c1175d72] SyS_mount+0x82/0xc0 [c16059cc] sysenter_do_call+0x12/0x22 Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/node.c | 49 - 1 file changed, 28 insertions(+), 21 deletions(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d60d3d..b5cd814 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1658,13 +1658,16 @@ int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) /* * ra_sum_pages() merge contiguous pages into one bio and submit. - * these pre-readed pages are linked in pages list. + * these pre-readed pages are alloced in bd_inode's mapping tree. */ -static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, +static int ra_sum_pages(struct f2fs_sb_info *sbi, struct page **pages, int start, int nrpages) { struct page *page; + struct inode *inode = sbi-sb-s_bdev-bd_inode; How about use sbi-meta_inode instead of bd_inode, then we can do caching summary pages for further i/o. 
+ struct address_space *mapping = inode-i_mapping; int page_idx = start; + int alloced, readed; struct f2fs_io_info fio = { .type = META, .rw = READ_SYNC | REQ_META | REQ_PRIO @@ -1672,21 +1675,23 @@ static int ra_sum_pages(struct f2fs_sb_info *sbi, struct list_head *pages, for (; page_idx start + nrpages; page_idx++) { /* alloc temporal page for read node summary info*/ - page = alloc_page(GFP_F2FS_ZERO); + page = grab_cache_page(mapping, page_idx); if (!page) break; - - lock_page(page); - page-index = page_idx; - list_add_tail(page-lru, pages); + page_cache_release(page); IMO, we don't need to do like this. Instead, for() { page = grab_cache_page(); if (!page) break; page[page_idx] = page; f2fs_submit_page_mbio(sbi, page, fio); } f2fs_submit_merged_bio(sbi, META, READ); return page_idx - start; Afterwards, in restore_node_summry(), lock_page() will wait the end_io for read. ... f2fs_put_page(pages[index], 1); Thanks, } - list_for_each_entry(page, pages, lru) - f2fs_submit_page_mbio(sbi, page, page-index, fio); + alloced = page_idx - start; + readed = find_get_pages_contig(mapping, start, alloced, pages); + BUG_ON(alloced != readed); + + for (page_idx = 0; page_idx readed; page_idx++) + f2fs_submit_page_mbio(sbi, pages[page_idx], + pages[page_idx]-index, fio); f2fs_submit_merged_bio(sbi, META, READ); - return page_idx - start; + return readed; } int restore_node_summary(struct f2fs_sb_info *sbi, @@ -1694,11 +1699,11 @@ int restore_node_summary(struct f2fs_sb_info *sbi, { struct f2fs_node *rn; struct f2fs_summary *sum_entry; - struct page *page, *tmp; + struct inode *inode = sbi-sb-s_bdev-bd_inode; block_t addr; int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); - int i, last_offset, nrpages, err = 0; - LIST_HEAD(page_list); + struct page *pages[bio_blocks]; + int i, index, last_offset, nrpages, err = 0; /* scan the node segment */ last_offset = sbi-blocks_per_seg; @@ -1709,29 +1714,31 @@ int restore_node_summary(struct f2fs_sb_info *sbi, nrpages = 
min(last_offset - i, bio_blocks); /* read ahead node pages */ - nrpages = ra_sum_pages(sbi, page_list, addr, nrpages); + nrpages = ra_sum_pages(sbi, pages, addr, nrpages); if (!nrpages) return -ENOMEM; - list_for_each_entry_safe(page, tmp, page_list, lru) { + for (index = 0; index nrpages;
Re: [f2fs-dev] [PATCH] f2fs: large volume support
On 수, 2014-05-21 at 13:33 +0900, Jaegeuk Kim wrote: Hi Changman, 2014-05-12 (월), 15:59 +0900, Changman Lee: f2fs's cp has one page which consists of struct f2fs_checkpoint and version bitmap of sit and nat. To support lots of segments, we need more blocks for sit bitmap. So let's arrange sit bitmap as following: +-++ | f2fs_checkpoint | sit bitmap | | + nat bitmap|| +-++ 0 4kN blocks Signed-off-by: Changman Lee cm224@samsung.com --- fs/f2fs/checkpoint.c| 47 --- fs/f2fs/f2fs.h | 13 +++-- include/linux/f2fs_fs.h |2 ++ 3 files changed, 57 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index fe968c7..f418243 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -544,6 +544,32 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) cp_block = (struct f2fs_checkpoint *)page_address(cur_page); memcpy(sbi-ckpt, cp_block, blk_size); + if (is_set_ckpt_flags(sbi-ckpt, CP_LARGE_VOL_FLAG)) { + int i, cp_blks; + block_t cp_blk_no; + + cp_blk_no = le32_to_cpu(fsb-cp_blkaddr); + if (cur_page == cp2) + cp_blk_no += 1 le32_to_cpu(fsb-log_blocks_per_seg); + + cp_blks = 1 + F2FS_BLK_ALIGN(cp_block-sit_ver_bitmap_bytesize); Should covert le32_to_cpu(cp_block-sit_ver_bitmap_bytesize). Got it. + + kfree(sbi-ckpt); + sbi-ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL); Why does it have to reallocate this and not to handle -ENOMEM correctly? I think it's more simple than using another variable to point sit_ver_bitmap and it doesn't require alloc and free for the variable. 
+ + memcpy(sbi-ckpt, cp_block, blk_size); + + for (i = 1; i cp_blks; i++) { + void *sit_bitmap_ptr; + unsigned char *ckpt = (unsigned char *)sbi-ckpt; + + cur_page = get_meta_page(sbi, cp_blk_no + i); + sit_bitmap_ptr = page_address(cur_page); + memcpy(ckpt + i * blk_size, sit_bitmap_ptr, blk_size); + f2fs_put_page(cur_page, 1); + } + } + f2fs_put_page(cp1, 1); f2fs_put_page(cp2, 1); return 0; @@ -736,6 +762,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) __u32 crc32 = 0; void *kaddr; int i; + int sit_bitmap_blks = 0; /* * This avoids to conduct wrong roll-forward operations and uses @@ -786,16 +813,21 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) orphan_blocks = (sbi-n_orphans + F2FS_ORPHANS_PER_BLOCK - 1) / F2FS_ORPHANS_PER_BLOCK; - ckpt-cp_pack_start_sum = cpu_to_le32(1 + orphan_blocks); + if (is_set_ckpt_flags(ckpt, CP_LARGE_VOL_FLAG)) + sit_bitmap_blks = F2FS_BLK_ALIGN(ckpt-sit_ver_bitmap_bytesize); + ckpt-cp_pack_start_sum = cpu_to_le32(1 + sit_bitmap_blks + + orphan_blocks); if (is_umount) { set_ckpt_flags(ckpt, CP_UMOUNT_FLAG); ckpt-cp_pack_total_block_count = cpu_to_le32(2 + - data_sum_blocks + orphan_blocks + NR_CURSEG_NODE_TYPE); + sit_bitmap_blks + data_sum_blocks + + orphan_blocks + NR_CURSEG_NODE_TYPE); } else { clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG); ckpt-cp_pack_total_block_count = cpu_to_le32(2 + - data_sum_blocks + orphan_blocks); + sit_bitmap_blks + data_sum_blocks + + orphan_blocks); } if (sbi-n_orphans) @@ -821,6 +853,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) set_page_dirty(cp_page); f2fs_put_page(cp_page, 1); + for (i = 1; i 1 + sit_bitmap_blks; i++) { + cp_page = grab_meta_page(sbi, start_blk++); + kaddr = page_address(cp_page); + memcpy(kaddr, (char *)ckpt + i * F2FS_BLKSIZE, + (1 sbi-log_blocksize)); + set_page_dirty(cp_page); + f2fs_put_page(cp_page, 1); + } + if (sbi-n_orphans) { write_orphan_inodes(sbi, start_blk); start_blk += orphan_blocks; diff --git 
a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 676a2c6..9e147ae 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -764,9 +764,18 @@ static inline unsigned long __bitmap_size(struct f2fs_sb_info *sbi, int flag) static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag) { struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); - int offset = (flag == NAT_BITMAP
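The two review points raised in this reply, converting the little-endian size before using it and handling a failed allocation, can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: `le32_decode()`, `cp_blocks_needed()` and `grow_ckpt()` are hypothetical helpers, and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLKSIZE 4096u

/* Portable stand-in for le32_to_cpu(): decode 4 little-endian bytes. */
static uint32_t le32_decode(const unsigned char b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* One checkpoint block plus enough blocks for the SIT bitmap. */
static uint32_t cp_blocks_needed(uint32_t sit_bitmap_bytes)
{
	/* 64-bit intermediate so the round-up cannot wrap. */
	return 1 + (uint32_t)(((uint64_t)sit_bitmap_bytes + BLKSIZE - 1) / BLKSIZE);
}

/*
 * Replace *ckpt with a larger zeroed buffer holding the first block.
 * Returns the block count, or -ENOMEM with *ckpt left untouched.
 */
static int grow_ckpt(unsigned char **ckpt, const unsigned char raw_size[4])
{
	uint32_t cp_blks = cp_blocks_needed(le32_decode(raw_size));
	unsigned char *bigger = calloc(cp_blks, BLKSIZE);

	if (!bigger)
		return -ENOMEM;	/* old buffer is still valid for the caller */
	memcpy(bigger, *ckpt, BLKSIZE);
	free(*ckpt);
	*ckpt = bigger;
	return (int)cp_blks;
}
```

Allocating the new buffer before freeing the old one is a design choice of this sketch; it keeps the caller's pointer usable on the error path, which the quoted kfree-then-kzalloc sequence does not.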
[f2fs-dev] [PATCH] f2fs: large volume support
f2fs's cp has one page which consists of struct f2fs_checkpoint and version bitmap of sit and nat. To support lots of segments, we need more blocks for sit bitmap. So let's arrange sit bitmap as following: +-++ | f2fs_checkpoint | sit bitmap | | + nat bitmap|| +-++ 0 4kN blocks Signed-off-by: Changman Lee cm224@samsung.com --- fs/f2fs/checkpoint.c| 55 +++ fs/f2fs/f2fs.h | 13 +-- include/linux/f2fs_fs.h |2 ++ 3 files changed, 64 insertions(+), 6 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index fe968c7..05e18f8 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -366,12 +366,18 @@ static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) void recover_orphan_inodes(struct f2fs_sb_info *sbi) { block_t start_blk, orphan_blkaddr, i, j; + struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG)) return; sbi-por_doing = true; - start_blk = __start_cp_addr(sbi) + 1; + + if (is_set_ckpt_flags(ckpt, CP_LARGE_VOL_FLAG)) + start_blk = __start_cp_addr(sbi) + + F2FS_BLK_ALIGN(ckpt-sit_ver_bitmap_bytesize); + else + start_blk = __start_cp_addr(sbi) + 1; orphan_blkaddr = __start_sum_addr(sbi) - 1; ra_meta_pages(sbi, start_blk, orphan_blkaddr, META_CP); @@ -544,6 +550,32 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) cp_block = (struct f2fs_checkpoint *)page_address(cur_page); memcpy(sbi-ckpt, cp_block, blk_size); + if (is_set_ckpt_flags(sbi-ckpt, CP_LARGE_VOL_FLAG)) { + int i, cp_blks; + block_t cp_blk_no; + + cp_blk_no = le32_to_cpu(fsb-cp_blkaddr); + if (cur_page == cp2) + cp_blk_no += 1 le32_to_cpu(fsb-log_blocks_per_seg); + + cp_blks = 1 + F2FS_BLK_ALIGN(cp_block-sit_ver_bitmap_bytesize); + + kfree(sbi-ckpt); + sbi-ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL); + + memcpy(sbi-ckpt, cp_block, blk_size); + + for (i = 1; i cp_blks; i++) { + void *sit_bitmap_ptr; + unsigned char *ckpt = (unsigned char *)sbi-ckpt; + + cur_page = get_meta_page(sbi, cp_blk_no + i); + 
sit_bitmap_ptr = page_address(cur_page); + memcpy(ckpt + i * blk_size, sit_bitmap_ptr, blk_size); + f2fs_put_page(cur_page, 1); + } + } + f2fs_put_page(cp1, 1); f2fs_put_page(cp2, 1); return 0; @@ -736,6 +768,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) __u32 crc32 = 0; void *kaddr; int i; + int sit_bitmap_blks = 0; /* * This avoids to conduct wrong roll-forward operations and uses @@ -786,16 +819,21 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) orphan_blocks = (sbi-n_orphans + F2FS_ORPHANS_PER_BLOCK - 1) / F2FS_ORPHANS_PER_BLOCK; - ckpt-cp_pack_start_sum = cpu_to_le32(1 + orphan_blocks); + if (is_set_ckpt_flags(ckpt, CP_LARGE_VOL_FLAG)) + sit_bitmap_blks = F2FS_BLK_ALIGN(ckpt-sit_ver_bitmap_bytesize); + ckpt-cp_pack_start_sum = cpu_to_le32(1 + sit_bitmap_blks + + orphan_blocks); if (is_umount) { set_ckpt_flags(ckpt, CP_UMOUNT_FLAG); ckpt-cp_pack_total_block_count = cpu_to_le32(2 + - data_sum_blocks + orphan_blocks + NR_CURSEG_NODE_TYPE); + sit_bitmap_blks + data_sum_blocks + + orphan_blocks + NR_CURSEG_NODE_TYPE); } else { clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG); ckpt-cp_pack_total_block_count = cpu_to_le32(2 + - data_sum_blocks + orphan_blocks); + sit_bitmap_blks + data_sum_blocks + + orphan_blocks); } if (sbi-n_orphans) @@ -821,6 +859,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) set_page_dirty(cp_page); f2fs_put_page(cp_page, 1); + for (i = 1; i 1 + sit_bitmap_blks; i++) { + cp_page = grab_meta_page(sbi, start_blk++); + kaddr = page_address(cp_page); + memcpy(kaddr, (char *)ckpt + i * F2FS_BLKSIZE, + (1 sbi-log_blocksize)); + set_page_dirty(cp_page); + f2fs_put_page(cp_page, 1); + } + if (sbi-n_orphans) { write_orphan_inodes(sbi, start_blk
[f2fs-dev] [PATCH 2/2] fsck.f2fs: large volume support
In the case of volume size is over 2.x TB, checkpoint pack is also expanded over 4KB. It consists of f2fs_checkpoint and nat bitmap in a blocks, and n blocks of sit bitmap. Signed-off-by: Changman Lee cm224@samsung.com --- fsck/f2fs.h | 14 +++--- fsck/mount.c | 30 +- lib/libf2fs.c |4 ++-- 3 files changed, 42 insertions(+), 6 deletions(-) diff --git a/fsck/f2fs.h b/fsck/f2fs.h index e1740fe..439ab8c 100644 --- a/fsck/f2fs.h +++ b/fsck/f2fs.h @@ -203,9 +203,17 @@ static inline unsigned long __bitmap_size(struct f2fs_sb_info *sbi, int flag) static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag) { struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); - int offset = (flag == NAT_BITMAP) ? - le32_to_cpu(ckpt-sit_ver_bitmap_bytesize) : 0; - return ckpt-sit_nat_version_bitmap + offset; + int offset; + if (ckpt-ckpt_flags CP_LARGE_VOL_FLAG) { + if (flag == NAT_BITMAP) + return ckpt-sit_nat_version_bitmap; + else + return ((char *)ckpt + F2FS_BLKSIZE); + } else { + offset = (flag == NAT_BITMAP) ? 
+ le32_to_cpu(ckpt-sit_ver_bitmap_bytesize) : 0; + return ckpt-sit_nat_version_bitmap + offset; + } } static inline bool is_set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f) diff --git a/fsck/mount.c b/fsck/mount.c index e2f3ace..a12a6cf 100644 --- a/fsck/mount.c +++ b/fsck/mount.c @@ -265,6 +265,7 @@ void *validate_checkpoint(struct f2fs_sb_info *sbi, block_t cp_addr, unsigned lo unsigned long long cur_version = 0, pre_version = 0; unsigned int crc = 0; size_t crc_offset; + unsigned int sit_bitmap_blks = 0; /* Read the 1st cp block in this CP pack */ cp_page_1 = malloc(PAGE_SIZE); @@ -284,7 +285,10 @@ void *validate_checkpoint(struct f2fs_sb_info *sbi, block_t cp_addr, unsigned lo /* Read the 2nd cp block in this CP pack */ cp_page_2 = malloc(PAGE_SIZE); + if (cp_block-ckpt_flags CP_LARGE_VOL_FLAG) + sit_bitmap_blks = F2FS_BLK_ALIGN(cp_block-sit_ver_bitmap_bytesize); cp_addr += le32_to_cpu(cp_block-cp_pack_total_block_count) - 1; + if (dev_read_block(cp_page_2, cp_addr) 0) goto invalid_cp2; @@ -295,7 +299,7 @@ void *validate_checkpoint(struct f2fs_sb_info *sbi, block_t cp_addr, unsigned lo crc = *(unsigned int *)((unsigned char *)cp_block + crc_offset); if (f2fs_crc_valid(crc, cp_block, crc_offset)) - goto invalid_cp1; + goto invalid_cp2; cur_version = le64_to_cpu(cp_block-checkpoint_ver); @@ -351,6 +355,29 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) memcpy(sbi-ckpt, cur_page, blk_size); + if (sbi-ckpt-ckpt_flags CP_LARGE_VOL_FLAG) { + int i, cp_blks; + unsigned long long cp_blk_no; + + free(sbi-ckpt); + + cp_blk_no = le32_to_cpu(raw_sb-cp_blkaddr); + if (cur_page == cp2) + cp_blk_no += 1 le32_to_cpu(raw_sb-log_blocks_per_seg); + + cp_blks = 1 + F2FS_BLK_ALIGN(sbi-ckpt-sit_ver_bitmap_bytesize); + + /* allocate cp size */ + sbi-ckpt = malloc(cp_blks * blk_size); + /* copy first cp data including nat bitmap */ + memcpy(sbi-ckpt, cur_page, blk_size); + /* copy sit bitmap */ + for (i = 1; i cp_blks; i++) { + unsigned char *ckpt = (unsigned char 
*)sbi-ckpt; + dev_read_block(cur_page, cp_blk_no + i); + memcpy(ckpt + i * blk_size, cur_page, blk_size); + } + } free(cp1); free(cp2); return 0; @@ -697,6 +724,7 @@ void check_block_count(struct f2fs_sb_info *sbi, int valid_blocks = 0; int i; + /* check segment usage */ ASSERT(GET_SIT_VBLOCKS(raw_sit) = sbi-blocks_per_seg); diff --git a/lib/libf2fs.c b/lib/libf2fs.c index fb3f8c1..1a16dd2 100644 --- a/lib/libf2fs.c +++ b/lib/libf2fs.c @@ -342,8 +342,8 @@ int f2fs_crc_valid(u_int32_t blk_crc, void *buf, int len) cal_crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, buf, len); if (cal_crc != blk_crc) { - DBG(0,CRC validation failed: cal_crc = %u \ - blk_crc = %u buff_size = 0x%x, + DBG(0,CRC validation failed: cal_crc = %u, + blk_crc = %u buff_size = 0x%x\n, cal_crc, blk_crc, len); return -1; } -- 1.7.9.5
[f2fs-dev] [PATCH 2/2] f2fs: call set_dirty_dir_page if inode is directory.
It is more legible and efficient to check that inode->i_mode is a directory
before calling set_dirty_dir_page, rather than checking inside it.

Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/checkpoint.c |    3 ---
 fs/f2fs/data.c       |    3 ++-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 1e03ca5..cc61962 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -578,9 +578,6 @@ void set_dirty_dir_page(struct inode *inode, struct page *page)
 	struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
 	struct dir_inode_entry *new;
 
-	if (!S_ISDIR(inode->i_mode))
-		return;
-
 	new = f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
 	new->inode = inode;
 	INIT_LIST_HEAD(&new->list);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index acd0159..ecfa674 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1043,7 +1043,8 @@ static int f2fs_set_data_page_dirty(struct page *page)
 	if (!PageDirty(page)) {
 		__set_page_dirty_nobuffers(page);
-		set_dirty_dir_page(inode, page);
+		if (S_ISDIR(inode->i_mode))
+			set_dirty_dir_page(inode, page);
 		return 1;
 	}
 	return 0;
-- 
1.7.10.4
[f2fs-dev] [PATCH] f2fstat: add memory information used by f2fs
This patch adds memory information used by f2fs. Signed-off-by: Changman Lee cm224@samsung.com --- tools/f2fstat.c | 69 +++ 1 file changed, 44 insertions(+), 25 deletions(-) diff --git a/tools/f2fstat.c b/tools/f2fstat.c index b4f22ae..8ece660 100644 --- a/tools/f2fstat.c +++ b/tools/f2fstat.c @@ -16,6 +16,9 @@ */ #define F2FS_STATUS/sys/kernel/debug/f2fs/status +#define KEY_NODE 0x0001 +#define KEY_META 0x0010 + unsigned long util; unsigned long used_node_blks; unsigned long used_data_blks; @@ -33,9 +36,9 @@ unsigned long gc_node_blks; //unsigned long extent_hit_ratio; -unsigned long dirty_node; +unsigned long dirty_node, node_kb; unsigned long dirty_dents; -unsigned long dirty_meta; +unsigned long dirty_meta, meta_kb; unsigned long nat_caches; unsigned long dirty_sit; @@ -43,7 +46,7 @@ unsigned long free_nids; unsigned long ssr_blks; unsigned long lfs_blks; - +unsigned long memory_kb; struct options { int delay; @@ -54,6 +57,7 @@ struct options { struct mm_table { const char *name; unsigned long *val; + int flag; }; static int compare_mm_table(const void *a, const void *b) @@ -84,21 +88,22 @@ void f2fstat(struct options *opt) int found_cnt = 0; static struct mm_table f2fstat_table[] = { - { - Data, used_data_blks }, - { - Dirty, dirty_segs }, - { - Free, free_segs }, - { - NATs, nat_caches }, - { - Node, used_node_blks }, - { - Prefree,prefree_segs }, - { - SITs, dirty_sit }, - { - Valid, valid_segs }, - { - dents, dirty_dents }, - { - meta, dirty_meta }, - { - nodes, dirty_node }, - { GC calls, gc }, - { LFS,lfs_blks }, - { SSR,ssr_blks }, - { Utilization,util }, + { - Data, used_data_blks,0 }, + { - Dirty, dirty_segs,0 }, + { - Free, free_segs, 0 }, + { - NATs, nat_caches,0 }, + { - Node, used_node_blks,0 }, + { - Prefree,prefree_segs, 0 }, + { - SITs, dirty_sit, 0 }, + { - Valid, valid_segs,0 }, + { - dents, dirty_dents, 0 }, + { - meta, dirty_meta,KEY_META }, + { - nodes, dirty_node,KEY_NODE }, + { GC calls, gc,0 }, + { LFS,lfs_blks, 0 }, + { Memory, 
memory_kb, 0 }, + { SSR,ssr_blks, 0 }, + { Utilization,util, 0 }, }; f2fstat_table_cnt = sizeof(f2fstat_table)/sizeof(struct mm_table); @@ -147,6 +152,20 @@ void f2fstat(struct options *opt) goto nextline; *(found-val) = strtoul(head, tail, 10); + if (found-flag) { + int npages; + tail = strstr(head, in); + head = tail + 2; + npages = strtoul(head, tail, 10); + switch (found-flag (KEY_NODE | KEY_META)) { + case KEY_NODE: + node_kb = npages * 4; + break; + case KEY_META: + meta_kb = npages * 4; + break; + } + } if (++found_cnt == f2fstat_table_cnt) break; nextline: @@ -193,13 +212,13 @@ void parse_option(int argc, char *argv[], struct options *opt) void print_head(void) { - printf(---utilization--- ---main area ---balancing async-- -gc- ---alloc---\n); - printf(util node data free valid dirty prefree node dent meta sit gcssrlfs\n); + fprintf(stderr, ---utilization--- ---main area ---balancing async-- -gc- ---alloc--- -memory-\n); + fprintf(stderr, util node data free valid dirty prefree node dent meta sit gcssrlfs total node meta\n); } int main(int argc, char *argv[]) { - char format[] = %3ld %6ld %6ld %6ld %6ld %6ld %6ld %5ld %5ld %3ld %3ld %5ld %6ld %6ld\n; + char format[] = %3ld %6ld %6ld %6ld %6ld %6ld %6ld %5ld %5ld %3ld %3ld %5ld %6ld %6ld %6ld %6ld %6ld\n; int
[f2fs-dev] [PATCH 2/2] fibmap.f2fs: add bdev information
This patch shows devname and start_lba, counted from sector zero of the
device. fibmap reports LBAs relative to the partition, but sometimes we want
to know the absolute LBA of a file so that it can be compared with blktrace
output.

Signed-off-by: Changman Lee cm224@samsung.com
---
 tools/fibmap.c | 44 +++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/tools/fibmap.c b/tools/fibmap.c
index 9eb6b90..ed0a08e 100644
--- a/tools/fibmap.c
+++ b/tools/fibmap.c
@@ -7,6 +7,8 @@
 #include <errno.h>
 #include <sys/ioctl.h>
 #include <sys/stat.h>
+#include <libgen.h>
+#include <linux/hdreg.h>
 #include <linux/types.h>
 #include <linux/fs.h>
 
@@ -41,6 +43,42 @@ void print_stat(struct stat64 *st)
 	printf("\n\n");
 }
 
+void stat_bdev(struct stat64 *st, unsigned int *start_lba)
+{
+	struct stat bdev_stat;
+	struct hd_geometry geom;
+	char devname[32] = { 0, };
+	char linkname[32] = { 0, };
+	int fd;
+
+	sprintf(devname, "/dev/block/%d:%d", major(st->st_dev), minor(st->st_dev));
+
+	fd = open(devname, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	if (fstat(fd, &bdev_stat) < 0)
+		goto out;
+
+	if (S_ISBLK(bdev_stat.st_mode)) {
+		if (ioctl(fd, HDIO_GETGEO, &geom) < 0)
+			*start_lba = 0;
+		else
+			*start_lba = geom.start;
+	}
+
+	if (readlink(devname, linkname, sizeof(linkname)) < 0)
+		goto out;
+
+	printf("bdev info---\n");
+	printf("devname = %s\n", basename(linkname));
+	printf("start_lba = %u\n", *start_lba);
+
+out:
+	close(fd);
+
+}
+
 int main(int argc, char *argv[])
 {
 	int fd;
@@ -50,6 +88,7 @@ int main(int argc, char *argv[])
 	int total_blks;
 	unsigned int i;
 	struct file_ext ext;
+	__u32 start_lba;
 	__u32 blknum;
 
 	if (argc != 2) {
@@ -73,9 +112,12 @@ int main(int argc, char *argv[])
 		goto out;
 	}
 
+	stat_bdev(&st, &start_lba);
+
 	total_blks = (st.st_size + st.st_blksize - 1) / st.st_blksize;
 
-	printf("\n%s :\n", filename);
+	printf("\nfile info---\n");
+	printf("%s :\n", filename);
 	print_stat(&st);
 	printf("file_pos start_blk end_blkblks\n");
 
-- 
1.7.9.5
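The point of printing start_lba is that a partition-relative block number can be turned into an absolute sector and matched against blktrace. A minimal sketch of that arithmetic follows. It is not part of the patch: `abs_sector` is a hypothetical helper, and it assumes that start_lba is counted in 512-byte sectors and that the block number is in units of the filesystem block size.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u	/* assumed sector size for start_lba */

/*
 * Hypothetical helper: convert a partition-relative filesystem block
 * number into an absolute sector number on the whole device.
 */
static uint64_t abs_sector(uint64_t start_lba, uint64_t fs_block,
			   uint32_t block_size)
{
	/* Each filesystem block spans block_size / SECTOR_SIZE sectors. */
	return start_lba + fs_block * (block_size / SECTOR_SIZE);
}
```

For example, with a partition starting at sector 2048 and 4096-byte blocks, block 10 begins at sector 2048 + 10 * 8.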
[f2fs-dev] [PATCH] f2fs-tools: add f2fstat to print f2fs's status in sec
This tool prints /sys/kernel/debug/f2fs/status in sec so that we can monitor variation of f2fs status. Signed-off-by: Changman Lee cm224@samsung.com --- Makefile.am |2 +- configure.ac |1 + tools/Makefile.am |7 ++ tools/f2fstat.c | 216 + 4 files changed, 225 insertions(+), 1 deletion(-) create mode 100644 tools/Makefile.am create mode 100644 tools/f2fstat.c diff --git a/Makefile.am b/Makefile.am index ca376b4..d2921d6 100644 --- a/Makefile.am +++ b/Makefile.am @@ -2,4 +2,4 @@ ACLOCAL_AMFLAGS = -I m4 -SUBDIRS = man lib mkfs fsck +SUBDIRS = man lib mkfs fsck tools diff --git a/configure.ac b/configure.ac index c5ca858..c2dafb0 100644 --- a/configure.ac +++ b/configure.ac @@ -83,6 +83,7 @@ AC_CONFIG_FILES([ lib/Makefile mkfs/Makefile fsck/Makefile + tools/Makefile ]) AC_OUTPUT diff --git a/tools/Makefile.am b/tools/Makefile.am new file mode 100644 index 000..8442387 --- /dev/null +++ b/tools/Makefile.am @@ -0,0 +1,7 @@ +## Makefile.am + +AM_CPPFLAGS = ${libuuid_CFLAGS} -I$(top_srcdir)/include +AM_CFLAGS = -Wall +sbin_PROGRAMS = f2fstat +f2fstat_SOURCES = f2fstat.c +f2fstat_LDADD = ${libuuid_LIBS} $(top_builddir)/lib/libf2fs.la diff --git a/tools/f2fstat.c b/tools/f2fstat.c new file mode 100644 index 000..75027a8 --- /dev/null +++ b/tools/f2fstat.c @@ -0,0 +1,216 @@ +#include stdio.h +#include unistd.h +#include stdlib.h +#include string.h +#include fcntl.h + +#ifdef DEBUG +#define dbg(fmt, args...) printf(fmt, __VA_ARGS__); +#else +#define dbg(fmt, args...) 
+#endif + +/* + * f2fs status + */ +#define F2FS_STATUS/sys/kernel/debug/f2fs/status + +unsigned long util; +unsigned long used_node_blks; +unsigned long used_data_blks; +//unsigned long inline_inode; + +unsigned long free_segs; +unsigned long valid_segs; +unsigned long dirty_segs; +unsigned long prefree_segs; + +unsigned long gc; +unsigned long bg_gc; +unsigned long gc_data_blks; +unsigned long gc_node_blks; + +//unsigned long extent_hit_ratio; + +unsigned long dirty_node; +unsigned long dirty_dents; +unsigned long dirty_meta; +unsigned long nat_caches; +unsigned long dirty_sit; + +unsigned long free_nids; + +unsigned long ssr_blks; +unsigned long lfs_blks; + + +struct options { + int delay; + int interval; +}; + +struct mm_table { + const char *name; + unsigned long *val; +}; + +static int compare_mm_table(const void *a, const void *b) +{ + dbg([COMPARE] %s, %s\n, ((struct mm_table *)a)-name, ((struct mm_table *)b)-name); + return strcmp(((struct mm_table *)a)-name, ((struct mm_table *)b)-name); +} + +static inline void remove_newline(char **head) +{ +again: + if (**head == '\n') { + *head = *head + 1; + goto again; + } +} + +void f2fstat(void) +{ + int fd; + int ret; + char keyname[32]; + char buf[4096]; + struct mm_table key = { keyname, NULL }; + struct mm_table *found; + int f2fstat_table_cnt; + char *head, *tail; + + static struct mm_table f2fstat_table[] = { + { - Data, used_data_blks }, + { - Dirty, dirty_segs }, + { - Free, free_segs }, + { - NATs, nat_caches }, + { - Node, used_node_blks }, + { - Prefree,prefree_segs }, + { - SITs, dirty_sit }, + { - Valid, valid_segs }, + { - dents, dirty_dents }, + { - meta, dirty_meta }, + { - nodes, dirty_node }, + { GC calls, gc }, + { LFS,lfs_blks }, + { SSR,ssr_blks }, + { Utilization,util }, + }; + + f2fstat_table_cnt = sizeof(f2fstat_table)/sizeof(struct mm_table); + + fd = open(F2FS_STATUS, O_RDONLY); + if (fd 0) { + perror(open F2FS_STATUS); + exit(EXIT_FAILURE); + } + + ret = read(fd, buf, 4096); + if (ret 0) 
{ + perror(read F2FS_STATUS); + exit(EXIT_FAILURE); + } + buf[ret] = '\0'; + + head = buf; + for (;;) { + remove_newline(head); + tail = strchr(head, ':'); + if (!tail) + break; + *tail = '\0'; + if (strlen(head) = sizeof(keyname)) { + dbg([OVER] %s\n, head); + *tail = ':'; + tail = strchr(head, '\n'); + head = tail + 1; + continue; + } + + strcpy(keyname, head); + + found = bsearch(key, f2fstat_table, f2fstat_table_cnt, sizeof(struct mm_table), compare_mm_table
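The key lookup in f2fstat relies on bsearch(), which only works when the table is kept sorted in the same order the comparison function defines (here strcmp order on the key name). The following is a minimal stand-alone sketch of that pattern, not the tool's code: `struct kv`, `compare_kv` and `lookup` are illustrative names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative table entry: a key name and where to store its value. */
struct kv {
	const char *name;
	unsigned long *val;
};

/* Order entries by name; the table must be sorted with the same rule. */
static int compare_kv(const void *a, const void *b)
{
	return strcmp(((const struct kv *)a)->name,
		      ((const struct kv *)b)->name);
}

/* Return the entry whose name matches, or NULL if there is none. */
static struct kv *lookup(struct kv *table, size_t cnt, const char *name)
{
	struct kv key = { name, NULL };

	return bsearch(&key, table, cnt, sizeof(table[0]), compare_kv);
}
```

If a new key is added to such a table, it has to be inserted at its sorted position, otherwise bsearch() may miss it.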
[f2fs-dev] [PATCH] f2fs: unify rw and sync parameter into rw
When we submit io, we can know whether the io is read or write and sync mode or not. So we can remove redundant sync parameter. Signed-off-by: Changman Lee cm224@samsung.com --- fs/f2fs/checkpoint.c | 12 ++-- fs/f2fs/data.c | 19 --- fs/f2fs/f2fs.h |2 +- fs/f2fs/gc.c |2 +- fs/f2fs/node.c | 15 --- fs/f2fs/segment.c| 16 6 files changed, 32 insertions(+), 34 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 38f4a224..76b557c 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -158,8 +158,8 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, } if (nwritten) - f2fs_submit_merged_bio(sbi, type, nr_to_write == LONG_MAX, - WRITE); + f2fs_submit_merged_bio(sbi, type, + (nr_to_write == LONG_MAX) ? WRITE_SYNC : WRITE); return nwritten; } @@ -592,7 +592,7 @@ retry: * We should submit bio, since it exists several * wribacking dentry pages in the freeing inode. */ - f2fs_submit_merged_bio(sbi, DATA, true, WRITE); + f2fs_submit_merged_bio(sbi, DATA, WRITE_SYNC); } goto retry; } @@ -798,9 +798,9 @@ void write_checkpoint(struct f2fs_sb_info *sbi, bool is_umount) trace_f2fs_write_checkpoint(sbi-sb, is_umount, finish block_ops); - f2fs_submit_merged_bio(sbi, DATA, true, WRITE); - f2fs_submit_merged_bio(sbi, NODE, true, WRITE); - f2fs_submit_merged_bio(sbi, META, true, WRITE); + f2fs_submit_merged_bio(sbi, DATA, WRITE_SYNC); + f2fs_submit_merged_bio(sbi, NODE, WRITE_SYNC); + f2fs_submit_merged_bio(sbi, META, WRITE_SYNC); /* * update checkpoint pack index diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 4e2fc09..470db6a 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -95,7 +95,7 @@ static void f2fs_write_end_io(struct bio *bio, int err) static void __submit_merged_bio(struct f2fs_sb_info *sbi, struct f2fs_bio_info *io, - enum page_type type, bool sync, int rw) + enum page_type type, int rw) { enum page_type btype = PAGE_TYPE_OF_BIO(type); @@ -106,16 +106,12 @@ static void __submit_merged_bio(struct f2fs_sb_info *sbi, rw |= 
REQ_META; if (is_read_io(rw)) { - if (sync) - rw |= READ_SYNC; submit_bio(rw, io-bio); trace_f2fs_submit_read_bio(sbi-sb, rw, type, io-bio); io-bio = NULL; return; } - if (sync) - rw |= WRITE_SYNC; if (type = META_FLUSH) rw |= WRITE_FLUSH_FUA; @@ -136,7 +132,7 @@ static void __submit_merged_bio(struct f2fs_sb_info *sbi, } void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi, - enum page_type type, bool sync, int rw) + enum page_type type, int rw) { enum page_type btype = PAGE_TYPE_OF_BIO(type); struct f2fs_bio_info *io; @@ -144,7 +140,7 @@ void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi, io = is_read_io(rw) ? sbi-read_io : sbi-write_io[btype]; mutex_lock(io-io_mutex); - __submit_merged_bio(sbi, io, type, sync, rw); + __submit_merged_bio(sbi, io, type, rw); mutex_unlock(io-io_mutex); } @@ -195,7 +191,7 @@ void f2fs_submit_page_mbio(struct f2fs_sb_info *sbi, struct page *page, inc_page_count(sbi, F2FS_WRITEBACK); if (io-bio io-last_block_in_bio != blk_addr - 1) - __submit_merged_bio(sbi, io, type, true, rw); + __submit_merged_bio(sbi, io, type, rw); alloc_new: if (io-bio == NULL) { bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi)); @@ -212,7 +208,7 @@ alloc_new: if (bio_add_page(io-bio, page, PAGE_CACHE_SIZE, 0) PAGE_CACHE_SIZE) { - __submit_merged_bio(sbi, io, type, true, rw); + __submit_merged_bio(sbi, io, type, rw); goto alloc_new; } @@ -733,7 +729,7 @@ write: goto redirty_out; if (wbc-for_reclaim) - f2fs_submit_merged_bio(sbi, DATA, true, WRITE); + f2fs_submit_merged_bio(sbi, DATA, WRITE_SYNC); clear_cold_data(page); out: @@ -785,7 +781,8 @@ static int f2fs_write_data_pages(struct address_space *mapping, ret = write_cache_pages(mapping, wbc, __f2fs_writepage, mapping); if (locked) mutex_unlock(sbi-writepages); - f2fs_submit_merged_bio(sbi, DATA, wbc-sync_mode == WB_SYNC_ALL, WRITE); + f2fs_submit_merged_bio(sbi
Re: [f2fs-dev] [PATCH] f2fs: introduce f2fs_find_next(_zero)_bit
I agree. Your suggestion is more good. Thanks for your review. On 2013년 11월 15일 13:31, Jaegeuk Kim wrote: Hi, IMO, it would be better give names like __find_rev_next(_zero)_bit. If there is no objection, I'll modify and apply them by myself. Thanks, :) 2013-11-15 (금), 10:42 +0900, Changman Lee: When f2fs_set_bit is used, in a byte MSB and LSB is reversed, in that case we can use f2fs_find_next_bit or f2fs_find_next_zero_bit. Signed-off-by: Changman Lee cm224@samsung.com --- fs/f2fs/segment.c | 143 + 1 file changed, 143 insertions(+) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index fa284d3..b2de887 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -21,6 +21,149 @@ #include trace/events/f2fs.h /* + * f2fs_ffs is copied from include/asm-generic/bitops/__ffs.h because + * MSB and LSB is reversed in a byte by f2fs_set_bit. + */ +static inline unsigned long f2fs_ffs(unsigned long word) +{ +int num = 0; + +#if BITS_PER_LONG == 64 +if ((word 0x) == 0) { +num += 32; +word = 32; +} +#endif +if ((word 0x) == 0) { +num += 16; +word = 16; +} +if ((word 0xff) == 0) { +num += 8; +word = 8; +} +if ((word 0xf0) == 0) +num += 4; +else +word = 4; +if ((word 0xc) == 0) +num += 2; +else +word = 2; +if ((word 0x2) == 0) +num += 1; +return num; +} + +#define f2fs_ffz(x) f2fs_ffs(~(x)) + +/* + * f2fs_find_next(_zero)_bit is copied from lib/find_next_bit.c becasue + * f2fs_set_bit makes MSB and LSB reversed in a byte. 
+ * Example: + * LSB -- MSB + * f2fs_set_bit(0, bitmap) = 0001 + * f2fs_set_bit(7, bitmap) = 1000 + */ +static unsigned long f2fs_find_next_bit(const unsigned long *addr, +unsigned long size, unsigned long offset) +{ +const unsigned long *p = addr + BIT_WORD(offset); +unsigned long result = offset ~(BITS_PER_LONG - 1); +unsigned long tmp; +unsigned long mask, submask; +unsigned long quot, rest; + +if (offset = size) +return size; +size -= result; +offset %= BITS_PER_LONG; +if (!offset) +goto aligned; +tmp = *(p++); +quot = (offset 3) 3; +rest = offset 0x7; +mask = ~0UL quot; +submask = (unsigned char)(0xff rest) rest; +submask = quot; +mask = submask; +tmp = mask; +if (size BITS_PER_LONG) +goto found_first; +if (tmp) +goto found_middle; +size -= BITS_PER_LONG; +result += BITS_PER_LONG; +aligned: +while (size ~(BITS_PER_LONG-1)) { +tmp = *(p++); +if (tmp) +goto found_middle; +result += BITS_PER_LONG; +size -= BITS_PER_LONG; +} +if (!size) +return result; +tmp = *p; + +found_first: +tmp = (~0UL (BITS_PER_LONG - size)); +if (tmp == 0UL) /* Are any bits set? */ +return result + size; /* Nope. 
*/ +found_middle: +return result + f2fs_ffs(tmp); +} + +static unsigned long f2fs_find_next_zero_bit(const unsigned long *addr, +unsigned long size, unsigned long offset) +{ +const unsigned long *p = addr + BIT_WORD(offset); +unsigned long result = offset ~(BITS_PER_LONG - 1); +unsigned long tmp; +unsigned long mask, submask; +unsigned long quot, rest; + +if (offset = size) +return size; +size -= result; +offset %= BITS_PER_LONG; +if (!offset) +goto aligned; +tmp = *(p++); +quot = (offset 3) 3; +rest = offset 0x7; +mask = ~(~0UL quot); +submask = (unsigned char)~((unsigned char)(0xff rest) rest); +submask = quot; +mask += submask; +tmp |= mask; +if (size BITS_PER_LONG) +goto found_first; +if (~tmp) +goto found_middle; +size -= BITS_PER_LONG; +result += BITS_PER_LONG; +aligned: +while (size ~(BITS_PER_LONG - 1)) { +tmp = *(p++); +if (~tmp) +goto found_middle; +result += BITS_PER_LONG; +size -= BITS_PER_LONG; +} +if (!size) +return result; +tmp = *p; + +found_first: +tmp |= ~0UL size; +if (tmp == ~0UL)/* Are any bits zero? */ +return result + size; /* Nope. */ +found_middle: +return result + f2fs_ffz(tmp); +} + +/* * This function balances dirty node and dentry pages. * In addition, it controls garbage collection. */ -- DreamFactory - Open Source REST JSON Services for HTML5 Native Apps OAuth
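To make the reversed bit order concrete, here is a small user-space sketch. It assumes the layout described in the patch comment, where bit 0 of the bitmap is the most significant bit of the first byte. Unlike the patch, it scans one bit at a time for clarity rather than a word at a time, and the `rev_*` names are made up for this illustration.

```c
#include <stddef.h>

/* Set bit nr, counting from the most significant bit of each byte. */
static void rev_set_bit(unsigned int nr, unsigned char *map)
{
	map[nr >> 3] |= (unsigned char)(1u << (7 - (nr & 7)));
}

/* Test bit nr in the same reversed order. */
static int rev_test_bit(unsigned int nr, const unsigned char *map)
{
	return (map[nr >> 3] >> (7 - (nr & 7))) & 1;
}

/* Index of the first set bit at or after offset, or size if none. */
static unsigned int rev_find_next_bit(const unsigned char *map,
				      unsigned int size, unsigned int offset)
{
	unsigned int i;

	for (i = offset; i < size; i++)
		if (rev_test_bit(i, map))
			return i;
	return size;
}

/* Index of the first clear bit at or after offset, or size if none. */
static unsigned int rev_find_next_zero_bit(const unsigned char *map,
					   unsigned int size,
					   unsigned int offset)
{
	unsigned int i;

	for (i = offset; i < size; i++)
		if (!rev_test_bit(i, map))
			return i;
	return size;
}
```

The generic find_next_bit() cannot be used on such a bitmap because it assumes bit 0 is the least significant bit of each byte, so it would report the wrong positions within every byte.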
[f2fs-dev] [PATCH] f2fs: issue more large discard command
When f2fs issues discard commands and the prefree segments are contiguous,
gather the adjacent segments and issue one larger discard instead of one
discard per segment.

** blktrace **
179,10 585942.619023770 971 C D 131072 + 2097152 [0]
179,1033665 108.840475468 971 C D 2228224 + 2494464 [0]
179,1033671 109.131616427 971 C D 14909440 + 344064 [0]
179,1033677 109.137100677 971 C D 15261696 + 4096 [0]

Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/segment.c | 40 ++--
 1 file changed, 34 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index b7186a3..09f1375 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -141,8 +141,12 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi)
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	unsigned int segno = -1;
 	unsigned int total_segs = TOTAL_SEGS(sbi);
+	bool init = true;
+	int count = 0;
+	int start_segno, prev_segno;
 
 	mutex_lock(&dirty_i->seglist_lock);
+
 	while (1) {
 		segno = find_next_bit(dirty_i->dirty_segmap[PRE], total_segs,
 				segno + 1);
@@ -152,15 +156,39 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi)
 		if (test_and_clear_bit(segno, dirty_i->dirty_segmap[PRE]))
 			dirty_i->nr_dirty[PRE]--;
 
-		/* Let's use trim */
-		if (test_opt(sbi, DISCARD))
-			blkdev_issue_discard(sbi->sb->s_bdev,
-					START_BLOCK(sbi, segno) <<
+		if (init) {
+			init = false;
+			start_segno = segno;
+			prev_segno = segno;
+			count = 1;
+			continue;
+		}
+
+		if (segno == prev_segno + 1) {
+			count++;
+			prev_segno = segno;
+		} else {
+			if (test_opt(sbi, DISCARD))
+				blkdev_issue_discard(sbi->sb->s_bdev,
+					START_BLOCK(sbi, start_segno) <<
 					sbi->log_sectors_per_block,
-					1 << (sbi->log_sectors_per_block +
-						sbi->log_blocks_per_seg),
+					(1 << (sbi->log_sectors_per_block +
+					sbi->log_blocks_per_seg)) * count,
 					GFP_NOFS, 0);
+			start_segno = segno;
+			prev_segno = segno;
+			count = 1;
+		}
 	}
+
+	if (count && test_opt(sbi, DISCARD))
+		blkdev_issue_discard(sbi->sb->s_bdev,
+			START_BLOCK(sbi, start_segno) <<
+			sbi->log_sectors_per_block,
+			(1 << (sbi->log_sectors_per_block +
+			sbi->log_blocks_per_seg)) * count,
+			GFP_NOFS, 0);
+
 	mutex_unlock(&dirty_i->seglist_lock);
 }
-- 
1.7.9.5
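The gathering step in this patch can be looked at in isolation. Below is a minimal user-space sketch of it, not the kernel code: `struct seg_run` and `coalesce_runs` are hypothetical names, and the input is assumed to be an ascending list of segment numbers, as successive find_next_bit() calls would produce.

```c
#include <stddef.h>

/* One contiguous run of segments: [start, start + count). */
struct seg_run {
	unsigned int start;
	unsigned int count;
};

/*
 * Merge an ascending list of segment numbers into contiguous runs.
 * Returns the number of runs written to out (at most max_runs).
 */
static size_t coalesce_runs(const unsigned int *segnos, size_t n,
			    struct seg_run *out, size_t max_runs)
{
	size_t nruns = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (nruns > 0 &&
		    segnos[i] == out[nruns - 1].start + out[nruns - 1].count) {
			/* Adjacent to the current run: extend it. */
			out[nruns - 1].count++;
			continue;
		}
		if (nruns == max_runs)
			break;
		/* Gap found: start a new run at this segment. */
		out[nruns].start = segnos[i];
		out[nruns].count = 1;
		nruns++;
	}
	return nruns;
}
```

Each resulting run would correspond to a single discard covering count segments, which is why the discard length in the patch is multiplied by count.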
[f2fs-dev] [PATCH v2] f2fs: cleanup waiting routine for writeback pages in cp
Use the general waiting method supported by the kernel.

 o changes from v1
   If any waiter exists at end io, wake it up.

Signed-off-by: Changman Lee cm224@samsung.com
---
 fs/f2fs/checkpoint.c | 25 -
 fs/f2fs/f2fs.h       |  2 +-
 fs/f2fs/segment.c    |  5 +++--
 fs/f2fs/super.c      |  1 +
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d430157..5716e5e 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -634,6 +634,21 @@ static void unblock_operations(struct f2fs_sb_info *sbi)
 	f2fs_unlock_all(sbi);
 }
 
+static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
+{
+	DEFINE_WAIT(wait);
+
+	for (;;) {
+		prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
+
+		if (!get_pages(sbi, F2FS_WRITEBACK))
+			break;
+
+		io_schedule();
+	}
+	finish_wait(&sbi->cp_wait, &wait);
+}
+
 static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
 {
 	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
@@ -743,15 +758,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
 	f2fs_put_page(cp_page, 1);
 
 	/* wait for previous submitted node/meta pages writeback */
-	sbi->cp_task = current;
-	while (get_pages(sbi, F2FS_WRITEBACK)) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!get_pages(sbi, F2FS_WRITEBACK))
-			break;
-		io_schedule();
-	}
-	__set_current_state(TASK_RUNNING);
-	sbi->cp_task = NULL;
+	wait_on_all_pages_writeback(sbi);
 
 	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
 	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 625eb4b..89dc750 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -372,7 +372,7 @@ struct f2fs_sb_info {
 	struct mutex writepages;		/* mutex for writepages() */
 	bool por_doing;				/* recovery is doing or not */
 	bool on_build_free_nids;		/* build_free_nids is doing */
-	struct task_struct *cp_task;		/* checkpoint task */
+	wait_queue_head_t cp_wait;
 
 	/* for orphan inode management */
 	struct list_head orphan_inode_list;	/* orphan inode list */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 3d4d5fc..74e81cb 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -592,8 +592,9 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 	if (p->is_sync)
 		complete(p->wait);
 
-	if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
-		wake_up_process(p->sbi->cp_task);
+	if (!get_pages(p->sbi, F2FS_WRITEBACK) &&
+			!list_empty(&p->sbi->cp_wait.task_list))
+		wake_up(&p->sbi->cp_wait);
 
 	kfree(p);
 	bio_put(bio);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index e42351c..00e79df 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -876,6 +876,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	spin_lock_init(&sbi->stat_lock);
 	init_rwsem(&sbi->bio_sem);
 	init_rwsem(&sbi->cp_rwsem);
+	init_waitqueue_head(&sbi->cp_wait);
 	init_sb_info(sbi);
 
 	/* get an inode for meta space */
-- 
1.7.10.4
Re: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
Review attached patch, please. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 3:51 PM To: jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Previously, check_block_count check valid_map with bit data type in common scenario that sit has all ones or zeros bitmap, it makes low mount performance. So let's check the special bitmap with integer data type instead of the bit one. v1--v2: use find_next_{zero_}bit_le for better performance and readable as Jaegeuk suggested. use neat logogram in comment as Gu Zheng suggested. search continuous ones or zeros for better performance when checking mixed bitmap. Suggested-by: Jaegeuk Kim jaegeuk@samsung.com Signed-off-by: Shu Tan shu@samsung.com Signed-off-by: Chao Yu chao2...@samsung.com --- fs/f2fs/segment.h | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index abe7094..a7abfa8 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -550,8 +550,9 @@ static inline void check_block_count(struct f2fs_sb_info *sbi, { struct f2fs_sm_info *sm_info = SM_I(sbi); unsigned int end_segno = sm_info-segment_count - 1; + bool is_valid = test_bit_le(0, raw_sit-valid_map) ? 
true : false; int valid_blocks = 0; - int i; + int cur_pos = 0, next_pos; /* check segment usage */ BUG_ON(GET_SIT_VBLOCKS(raw_sit) sbi-blocks_per_seg); @@ -560,9 +561,19 @@ static inline void check_block_count(struct f2fs_sb_info *sbi, BUG_ON(segno end_segno); /* check bitmap with valid block count */ - for (i = 0; i sbi-blocks_per_seg; i++) - if (f2fs_test_bit(i, raw_sit-valid_map)) - valid_blocks++; + do { + if (is_valid) { + next_pos = find_next_zero_bit_le(raw_sit-valid_map, + sbi-blocks_per_seg, + cur_pos); + valid_blocks += next_pos - cur_pos; + } else + next_pos = find_next_bit_le(raw_sit-valid_map, + sbi-blocks_per_seg, + cur_pos); + cur_pos = next_pos; + is_valid = !is_valid; + } while (cur_pos sbi-blocks_per_seg); BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks); } -- 1.7.9.5 -- Android is increasing in popularity, but the open development platform that developers love is also attractive to malware creators. Download this white paper to learn more about secure code signing practices that can help keep Android apps secure. http://pubads.g.doubleclick.net/gampad/clk?id=65839951iu=/4140/ostg.clktrk ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel 0001-f2fs-use-pre-calculated-value-to-get-sum-of-valid-bl.patch Description: Binary data -- Android is increasing in popularity, but the open development platform that developers love is also attractive to malware creators. Download this white paper to learn more about secure code signing practices that can help keep Android apps secure. http://pubads.g.doubleclick.net/gampad/clk?id=65839951iu=/4140/ostg.clktrk___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
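The idea in the quoted patch is to count valid blocks by jumping from the end of one run of equal bits to the next, instead of testing every bit. A minimal user-space sketch of that alternation follows. It is only an illustration: the helper names are invented, the bitmap uses plain little-endian bit order, and the run search is a simple loop standing in for find_next_bit_le() and find_next_zero_bit_le().

```c
#include <stddef.h>

/* Test bit nr with the least significant bit of each byte first. */
static int test_bit_simple(const unsigned char *map, unsigned int nr)
{
	return (map[nr >> 3] >> (nr & 7)) & 1;
}

/* First position >= pos whose bit equals want, or size if none. */
static unsigned int next_pos_with(const unsigned char *map, unsigned int size,
				  unsigned int pos, int want)
{
	while (pos < size && test_bit_simple(map, pos) != want)
		pos++;
	return pos;
}

/* Count set bits by walking alternating runs of ones and zeros. */
static unsigned int count_valid_by_runs(const unsigned char *map,
					unsigned int size)
{
	unsigned int cur = 0, next, valid = 0;
	int is_valid;

	if (size == 0)
		return 0;

	is_valid = test_bit_simple(map, 0);
	do {
		/* Jump to where the current run ends. */
		next = next_pos_with(map, size, cur, !is_valid);
		if (is_valid)
			valid += next - cur;
		cur = next;
		is_valid = !is_valid;
	} while (cur < size);

	return valid;
}
```

For an all-ones or all-zeros bitmap the loop body runs once, which is the common case the patch description is aiming at.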
Re: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance
As you know, if any data or function are used once, we can use some keywords like __initdata for data and __init for function. -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 7:52 PM To: 'Changman Lee'; jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Hi Lee, -Original Message- From: Changman Lee [mailto:cm224@samsung.com] Sent: Tuesday, October 29, 2013 3:36 PM To: 'Chao Yu'; jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: RE: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Review attached patch, please. Could we hide the pre calculated value by generating it in allocated memory by func, because the value will be no use after build_sit_entries(); Regards Yu -Original Message- From: Chao Yu [mailto:chao2...@samsung.com] Sent: Tuesday, October 29, 2013 3:51 PM To: jaegeuk@samsung.com Cc: linux-fsde...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net Subject: [f2fs-dev] [PATCH V2 RESEND] f2fs: check all ones or zeros bitmap with bitops for better mount performance Previously, check_block_count check valid_map with bit data type in common scenario that sit has all ones or zeros bitmap, it makes low mount performance. So let's check the special bitmap with integer data type instead of the bit one. v1--v2: use find_next_{zero_}bit_le for better performance and readable as Jaegeuk suggested. use neat logogram in comment as Gu Zheng suggested. search continuous ones or zeros for better performance when checking mixed bitmap. 
Suggested-by: Jaegeuk Kim <jaegeuk@samsung.com>
Signed-off-by: Shu Tan <shu@samsung.com>
Signed-off-by: Chao Yu <chao2...@samsung.com>
---
 fs/f2fs/segment.h | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index abe7094..a7abfa8 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -550,8 +550,9 @@ static inline void check_block_count(struct f2fs_sb_info *sbi,
 {
 	struct f2fs_sm_info *sm_info = SM_I(sbi);
 	unsigned int end_segno = sm_info->segment_count - 1;
+	bool is_valid = test_bit_le(0, raw_sit->valid_map) ? true : false;
 	int valid_blocks = 0;
-	int i;
+	int cur_pos = 0, next_pos;
 
 	/* check segment usage */
 	BUG_ON(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg);
@@ -560,9 +561,19 @@ static inline void check_block_count(struct f2fs_sb_info *sbi,
 	BUG_ON(segno > end_segno);
 
 	/* check bitmap with valid block count */
-	for (i = 0; i < sbi->blocks_per_seg; i++)
-		if (f2fs_test_bit(i, raw_sit->valid_map))
-			valid_blocks++;
+	do {
+		if (is_valid) {
+			next_pos = find_next_zero_bit_le(raw_sit->valid_map,
+					sbi->blocks_per_seg,
+					cur_pos);
+			valid_blocks += next_pos - cur_pos;
+		} else
+			next_pos = find_next_bit_le(raw_sit->valid_map,
+					sbi->blocks_per_seg,
+					cur_pos);
+		cur_pos = next_pos;
+		is_valid = !is_valid;
+	} while (cur_pos < sbi->blocks_per_seg);
 	BUG_ON(GET_SIT_VBLOCKS(raw_sit) != valid_blocks);
 }
 
-- 
1.7.9.5
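The run-scanning idea from the patch can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: bit_is_set() and next_bit_equal() are hypothetical stand-ins for test_bit_le() and find_next_{zero_}bit_le(), and the bit order (bit i in byte i/8, least significant bit first) is an assumption made for this example. It counts valid blocks by jumping from one run boundary to the next instead of testing every bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Hypothetical stand-in for test_bit_le(): bit `pos` lives in byte
 * pos / 8, least significant bit first (assumed layout). */
static int bit_is_set(const uint8_t *map, size_t pos)
{
	return (map[pos / BITS_PER_BYTE] >> (pos % BITS_PER_BYTE)) & 1;
}

/* Hypothetical stand-in for find_next_bit_le() (want = 1) and
 * find_next_zero_bit_le() (want = 0): return the first position
 * >= start whose bit equals `want`, or nbits if there is none. */
static size_t next_bit_equal(const uint8_t *map, size_t nbits,
			     size_t start, int want)
{
	const uint8_t no_match = want ? 0x00 : 0xff;
	size_t pos = start;

	while (pos < nbits) {
		/* Fast path: skip a whole aligned byte that cannot match. */
		if (pos % BITS_PER_BYTE == 0 &&
		    nbits - pos >= BITS_PER_BYTE &&
		    map[pos / BITS_PER_BYTE] == no_match) {
			pos += BITS_PER_BYTE;
			continue;
		}
		if (bit_is_set(map, pos) == want)
			return pos;
		pos++;
	}
	return nbits;
}

/* Count set bits by walking alternating runs of ones and zeros,
 * mirroring the do/while loop in the patch. */
size_t count_valid_blocks(const uint8_t *map, size_t nbits)
{
	size_t valid = 0, cur = 0, next;
	int in_ones_run;

	assert(map != NULL);
	if (nbits == 0)
		return 0;

	in_ones_run = bit_is_set(map, 0);
	do {
		/* The current run ends where the opposite bit value starts. */
		next = next_bit_equal(map, nbits, cur, !in_ones_run);
		if (in_ones_run)
			valid += next - cur;
		cur = next;
		in_ones_run = !in_ones_run;
	} while (cur < nbits);

	return valid;
}
```

For an all-ones or all-zeros bitmap the loop body runs exactly once, which is the common case the commit message targets; a mixed bitmap costs one iteration per run rather than one per bit.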
[f2fs-dev] [PATCH] f2fs-tools: discard is default but not set in config
Flash devices support discard; therefore discard is the default, but it is not set in the config.

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 lib/libf2fs.c      |    1 +
 mkfs/f2fs_format.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/libf2fs.c b/lib/libf2fs.c
index 6947425..9046986 100644
--- a/lib/libf2fs.c
+++ b/lib/libf2fs.c
@@ -364,6 +364,7 @@ void f2fs_init_configuration(struct f2fs_configuration *c)
 	c->heap = 1;
 	c->vol_label = "";
 	c->device_name = NULL;
+	c->trim = 1;
 }
 
 static int is_mounted(const char *mpt, const char *device)
diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
index 5b017c7..364bb46 100644
--- a/mkfs/f2fs_format.c
+++ b/mkfs/f2fs_format.c
@@ -917,6 +917,7 @@ int f2fs_trim_device()
 		return -1;
 	}
 
+	MSG(0, "Info: Discarding device\n");
 	if (S_ISREG(stat_buf.st_mode))
 		return 0;
 	else if (S_ISBLK(stat_buf.st_mode)) {
-- 
1.7.9.5
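To make the effect of the default concrete, here is a small self-contained sketch of the decision the patch touches. It is only an illustration: struct mkfs_config, init_config() and plan_trim() are hypothetical names, not f2fs-tools API, and the handling of file types other than regular files and block devices is an assumption, since the hunk above does not show that branch.

```c
#define _XOPEN_SOURCE 700	/* for S_IFREG and friends under strict -std modes */

#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down stand-in for struct f2fs_configuration. */
struct mkfs_config {
	int trim;			/* 1 = discard the device before formatting */
	const char *device_name;
};

enum trim_action {
	TRIM_SKIP,	/* nothing to discard */
	TRIM_DISCARD,	/* a real tool would issue the discard request here */
	TRIM_ERROR	/* unsupported file type (assumed behaviour) */
};

/* Set defaults; discard is on unless the user turns it off. */
void init_config(struct mkfs_config *c)
{
	assert(c != NULL);
	c->trim = 1;
	c->device_name = NULL;
}

/* Decide what to do for a target whose st_mode is `mode`. */
enum trim_action plan_trim(const struct mkfs_config *c, mode_t mode)
{
	assert(c != NULL);
	if (!c->trim)
		return TRIM_SKIP;	/* user disabled discard */
	if (S_ISREG(mode))
		return TRIM_SKIP;	/* image file: nothing to discard */
	if (S_ISBLK(mode))
		return TRIM_DISCARD;	/* block device: discard it */
	return TRIM_ERROR;
}
```

Without the default assignment in the initializer, the flag would stay at whatever the zeroed structure holds, so discard would only happen when requested explicitly.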
[f2fs-dev] [PATCH] f2fs-tools: add stat information into fibmap
This patch shows stat information about a file with fragmented state.

Signed-off-by: Changman Lee <cm224@samsung.com>
---
 fsck/fibmap.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fsck/fibmap.c b/fsck/fibmap.c
index 8726d3d..0ced7ca 100644
--- a/fsck/fibmap.c
+++ b/fsck/fibmap.c
@@ -26,6 +26,21 @@ void print_ext(struct file_ext *ext)
 					ext->end_blk, ext->blk_count);
 }
 
+void print_stat(struct stat64 *st)
+{
+	printf("\n");
+	printf("dev [%d:%d]\n", major(st->st_dev), minor(st->st_dev));
+	printf("ino [0x%8lx : %ld]\n", st->st_ino, st->st_ino);
+	printf("mode [0x%8x : %d]\n", st->st_mode, st->st_mode);
+	printf("nlink [0x%8lx : %ld]\n", st->st_nlink, st->st_nlink);
+	printf("uid [0x%8x : %d]\n", st->st_uid, st->st_uid);
+	printf("gid [0x%8x : %d]\n", st->st_gid, st->st_gid);
+	printf("size [0x%8lx : %ld]\n", st->st_size, st->st_size);
+	printf("blksize [0x%8lx : %ld]\n", st->st_blksize, st->st_blksize);
+	printf("blocks[0x%8lx : %ld]\n", st->st_blocks, st->st_blocks);
+	printf("\n\n");
+}
+
 int main(int argc, char *argv[])
 {
 	int fd;
@@ -61,6 +76,7 @@ int main(int argc, char *argv[])
 	total_blks = (st.st_size + st.st_blksize - 1) / st.st_blksize;
 
 	printf("\n%s :\n", filename);
+	print_stat(&st);
 	printf("file_pos start_blk end_blkblks\n");
 	blknum = 0;
 
-- 
1.7.10.4
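The patch prints each stat field in hex and decimal with %lx/%ld, which assumes the field types are long-sized on the build platform. The sketch below shows one possible portable variant, assuming plain struct stat rather than stat64; total_blocks(), format_field() and print_stat_portable() are hypothetical names introduced for illustration and are not part of f2fs-tools. It also spells out the rounding-up division used for total_blks in the surrounding code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Number of blocks needed to hold `size` bytes, rounding up; the same
 * arithmetic as total_blks = (st_size + st_blksize - 1) / st_blksize. */
uintmax_t total_blocks(uintmax_t size, uintmax_t blksize)
{
	assert(blksize > 0);
	return (size + blksize - 1) / blksize;
}

/* Format one "name    [0x     hex : dec]" row into buf. Widening the
 * value to uintmax_t keeps the conversion specifiers and the argument
 * types matched whatever width ino_t, off_t, etc. have. Returns the
 * snprintf() result. */
int format_field(char *buf, size_t len, const char *name, uintmax_t v)
{
	return snprintf(buf, len, "%-8s[0x%8" PRIxMAX " : %" PRIuMAX "]",
			name, v, v);
}

/* Print a few fields of *st, one row per field. */
void print_stat_portable(const struct stat *st)
{
	char row[64];

	format_field(row, sizeof(row), "ino", (uintmax_t)st->st_ino);
	puts(row);
	format_field(row, sizeof(row), "size", (uintmax_t)st->st_size);
	puts(row);
	format_field(row, sizeof(row), "blksize", (uintmax_t)st->st_blksize);
	puts(row);
	format_field(row, sizeof(row), "blocks", (uintmax_t)st->st_blocks);
	puts(row);
}
```

A caller would fill a struct stat with stat() or fstat() and pass its address, for example print_stat_portable(&st). The cast to uintmax_t is a design choice of this sketch; signed fields such as st_size are assumed to be non-negative here.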
Re: [f2fs-dev] [PATCH 1/4] f2fs: reorganize the f2fs_setattr() function.
On 2013. 6. 21. at 4:30 PM, Namjae Jeon <linkinj...@gmail.com> wrote:

Sorry for the late reply. I was very busy.
Could you tell me what you will do if a difference arises between xattr and i_mode?

First of all, I want to know which case causes mismatched permissions between xattr and i_mode.
And when we call chmod, the inode is locked in sys_chmod. If so, can inode->i_mode be changed by any inode update during chmod even though the inode is locked?

update_inode updates the raw inode on disk from inode->i_mode.
As you know, a dirtied inode page will be written back to disk at an unexpected time, according to the dirty ratio or the expiry time.
If you modify inode->i_mode immediately, the inode could be written back earlier than the xattr.
So I think it is possible that inode->i_mode and the xattr differ when SPO occurs, and so on.
The purpose of i_acl_mode is to update i_mode and xattr together in the same lock region.

Could you please tell me what the same lock region is? (inode->i_mutex or mutex_lock_op(sbi))
Thanks.

I meant the latter.

Subject: [PATCH v2] f2fs: reorganize the f2fs_setattr(), f2fs_set_acl, f2fs_setxattr()
From: Namjae Jeon <namjae.j...@samsung.com>