[f2fs-dev] write performance difference 3.18.21/git f2fs

2015-09-25 Thread Marc Lehmann
Ok, before I tried the f2fs git I made another short test with the
original 3.18.21 f2fs, and it was as fast as before. Then I used the
faulty f2fs module, which forced a reboot.

Now I started to redo the 3.18.21 test + git f2fs, with the same parameters
(specifically, -s90), and while it didn't start out to be as slow as 4.2.1,
it's similarly slow.

After 218GiB, I stopped the test, giving me an average of 50MiB/s.

Here is typical dstat output (again, dsk/sde):

http://ue.tst.eu/7a40644b3432e2932bdd8c1f6b6fc32d.txt

So less read behaviour than with 4.2.1, but also very slow writes.

That means the performance drop moves with f2fs, not the kernel version.

This is the resulting status:

http://ue.tst.eu/6d94e9bfad48a433bbc6f7daeaf5eb38.txt

Just for fun I'll start doing a -s64 run.

-- 
The choice of a   Deliantra, the free code+content MORPG
  -==- _GNU_  http://www.deliantra.net
  ==-- _   generation
  ---==---(_)__  __   __  Marc Lehmann
  --==---/ / _ \/ // /\ \/ /  schm...@schmorp.de
  -=/_/_//_/\_,_/ /_/\_\

--
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] write performance difference 3.18.21/4.2.1

2015-09-25 Thread Jaegeuk Kim
On Fri, Sep 25, 2015 at 05:47:12PM +0800, Chao Yu wrote:
> Hi Marc, Jaegeuk,
> 
> > -Original Message-
> > From: Marc Lehmann [mailto:schm...@schmorp.de]
> > Sent: Friday, September 25, 2015 2:51 PM
> > To: Jaegeuk Kim
> > Cc: linux-f2fs-devel@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] write performance difference 3.18.21/4.2.1
> > 
> > On Thu, Sep 24, 2015 at 11:28:36AM -0700, Jaegeuk Kim  
> > wrote:
> > > One thing that we can try is to run the latest f2fs source in v3.18.
> > > This branch supports f2fs for v3.18.
> > 
> > Ok, please bear with me, the last time I built my own kernel was during
> > the 2.4 timeframe, and this is a ubuntu kernel. What I did is this:
> > 
> >    git clone -b linux-3.18 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
> >    cd f2fs/fs/f2fs
> >    rsync -avPR include/linux/f2fs_fs.h include/trace/events/f2fs.h /usr/src/linux-headers-3.18.21-031821/.
> >    make -C /lib/modules/3.18.21-031821-generic/build/ M=$PWD modules modules_install
> > 
> > I then rmmod f2fs/insmod the resulting module, and tried to mount my
> > existing f2fs fs for a quick test, but got a null ptr exception on "mount":
> > 
> > http://ue.tst.eu/e4628dcee97324e580da1bafad938052.txt
> 
> This is my fault, sorry about introducing this oops. :(
> 
> Please revert the commit 7c5e466755ff ("f2fs: readahead cp payload
> pages when mount") since in this commit we try to access invalid
> SIT_I(sbi)->sit_base_addr which should be inited later.

Oops, I'll just remove this patch, since it hadn't gone too far yet.

Thanks,

> 
> Thanks,
> 
> > 
> > Probably caused by me not building a full kernel, but recreating how Ubuntu
> > builds their kernels on a Debian system isn't something I look forward to.
> > 
> > > For example, if I can represent blocks like:
> > [number of logs discussion]
> > 
> > Thanks for this explanation - two logs don't look so bad from a
> > locality viewpoint (not a big issue for flash, but a big issue for
> > rotational devices - I also realised I can't use dmcache, as dmcache, even
> > in writethrough mode, writes back all data after an unclean shutdown,
> > which would positively kill the disk).
> > 
> > Since whatever speed difference I saw with two logs wasn't big, you
> > completely sold me on 6 logs, or 4 (especially if it speeds up the gc,
> > which I haven't tested much yet). Two logs was merely a test anyway (the
> > same with no_heap; I don't know what it does, but I thought it was worth
> > a try, as metadata + data nearer together is better than having them at
> > opposite ends of the log or so).
> > 


Re: [f2fs-dev] sync/umount hang on 3.18.21, 1.4TB gone after crash

2015-09-25 Thread Jaegeuk Kim
Hi Chao,

[snip]

> > It seems there was no fsync after sync at all. That's why f2fs recovered
> > back to the latest checkpoint. Anyway, I'm thinking that it's worth adding
> > a kind of periodic checkpoint.
> 
> Agree, I have had that in mind for a long time. Since Yunlei said that they
> may lose all data of newly generated photos after an abnormal poweroff, I
> wrote the patch below, but I haven't had much time to test and tune
> it.
> 
> I hope if you have time, we can discuss the implementation of periodic cp.
> Maybe in another thread. :)

Sure. Actually, in my view, we can use our gc thread and the existing VFS inode
lists.
Let's take some time to think about this.

Thanks,

> 
> From c81c03fb69612350b12a14bccc07a1fd95cf606b Mon Sep 17 00:00:00 2001
> From: Chao Yu 
> Date: Wed, 5 Aug 2015 22:58:54 +0800
> Subject: [PATCH] f2fs: support background data flush
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/data.c  | 100 
> 
>  fs/f2fs/f2fs.h  |  15 +
>  fs/f2fs/inode.c |  16 +
>  fs/f2fs/namei.c |   7 
>  fs/f2fs/super.c |  50 ++--
>  5 files changed, 186 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index a82abe9..39b6339 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -20,6 +20,8 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/kthread.h>
> +#include <linux/freezer.h>
>  
>  #include "f2fs.h"
>  #include "node.h"
> @@ -27,6 +29,104 @@
>  #include "trace.h"
>  #include 
>  
> +static void f2fs_do_data_flush(struct f2fs_sb_info *sbi)
> +{
> + struct list_head *inode_list = &sbi->inode_list;
> + struct f2fs_inode_info *fi, *tmp;
> + struct inode *inode;
> + unsigned int number;
> +
> + spin_lock(&sbi->inode_lock);
> + number = sbi->inode_num;
> + list_for_each_entry_safe(fi, tmp, inode_list, i_flush) {
> +
> + if (number-- == 0)
> + break;
> +
> + inode = &fi->vfs_inode;
> +
> + /*
> +  * If the inode is in evicting path, we will fail to igrab
> +  * inode since I_WILL_FREE or I_FREEING should be set in
> +  * inode, so after grab valid inode, it's safe to flush
> +  * dirty page after unlock inode_lock.
> +  */
> + inode = igrab(inode);
> + if (!inode)
> + continue;
> +
> + spin_unlock(&sbi->inode_lock);
> +
> + if (!get_dirty_pages(inode))
> + goto next;
> +
> + filemap_flush(inode->i_mapping);
> +next:
> + iput(inode);
> + spin_lock(&sbi->inode_lock);
> + }
> + spin_unlock(&sbi->inode_lock);
> +}
> +
> +static int f2fs_data_flush_thread(void *data)
> +{
> + struct f2fs_sb_info *sbi = data;
> + wait_queue_head_t *wq = &sbi->dflush_wait_queue;
> + struct cp_control cpc;
> + unsigned long wait_time;
> +
> + wait_time = sbi->wait_time;
> +
> + do {
> + if (try_to_freeze())
> + continue;
> + else
> + wait_event_interruptible_timeout(*wq,
> + kthread_should_stop(),
> + msecs_to_jiffies(wait_time));
> + if (kthread_should_stop())
> + break;
> +
> + if (sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE)
> + continue;
> +
> + mutex_lock(&sbi->gc_mutex);
> +
> + f2fs_do_data_flush(sbi);
> +
> + cpc.reason = __get_cp_reason(sbi);
> + write_checkpoint(sbi, &cpc);
> +
> + mutex_unlock(&sbi->gc_mutex);
> +
> + } while (!kthread_should_stop());
> + return 0;
> +}
> +
> +int start_data_flush_thread(struct f2fs_sb_info *sbi)
> +{
> + dev_t dev = sbi->sb->s_bdev->bd_dev;
> + int err = 0;
> +
> + init_waitqueue_head(&sbi->dflush_wait_queue);
> + sbi->data_flush_thread = kthread_run(f2fs_data_flush_thread, sbi,
> + "f2fs_flush-%u:%u", MAJOR(dev), MINOR(dev));
> + if (IS_ERR(sbi->data_flush_thread)) {
> + err = PTR_ERR(sbi->data_flush_thread);
> + sbi->data_flush_thread = NULL;
> + }
> +
> + return err;
> +}
> +
> +void stop_data_flush_thread(struct f2fs_sb_info *sbi)
> +{
> + if (!sbi->data_flush_thread)
> + return;
> + kthread_stop(sbi->data_flush_thread);
> + sbi->data_flush_thread = NULL;
> +}
> +
>  static void f2fs_read_end_io(struct bio *bio)
>  {
>   struct bio_vec *bvec;
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index f1a90ff..b6790c9 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -52,6 +52,7 @@
>  #define F2FS_MOUNT_NOBARRIER 0x0800
>  #define F2FS_MOUNT_FASTBOOT  0x1000
>  #define F2FS_MOUNT_EXTENT_CACHE  0x2000
> +#define F2FS_MOUNT_DATA_FLUSH
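For illustration, the control flow of the (truncated) patch above can be modelled in userspace. Everything below is an illustrative stand-in I am adding, not f2fs code: each tick of the proposed flush thread skips work while the filesystem is frozen, otherwise flushes the dirty inodes and writes a checkpoint.

```c
#include <stdbool.h>

/* Minimal userspace model of the proposed f2fs_data_flush_thread() loop.
 * All names here are made-up stand-ins for the real sbi state; the real
 * code runs as a kthread and takes sbi->gc_mutex around the work. */
struct flush_model {
    bool frozen;      /* models sbi->sb->s_writers.frozen >= SB_FREEZE_WRITE */
    int dirty_inodes; /* models the length of sbi->inode_list */
    int checkpoints;  /* number of checkpoints written so far */
};

/* one expiry of wait_event_interruptible_timeout() */
void flush_tick(struct flush_model *m)
{
    if (m->frozen)
        return;           /* leave a frozen filesystem alone */
    m->dirty_inodes = 0;  /* f2fs_do_data_flush(): filemap_flush() each inode */
    m->checkpoints++;     /* write_checkpoint(sbi, &cpc) */
}
```

The key property this captures is that a frozen filesystem is never touched, while an unfrozen one gets both a data flush and a checkpoint on every period.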

Re: [f2fs-dev] sync/umount hang on 3.18.21, 1.4TB gone after crash

2015-09-25 Thread Jaegeuk Kim
On Fri, Sep 25, 2015 at 08:00:19AM +0200, Marc Lehmann wrote:
> On Thu, Sep 24, 2015 at 11:50:23AM -0700, Jaegeuk Kim  
> wrote:
> > > When I came back after ~10 hours, I found a number of hung task messages
> > > in syslog, and when I entered sync, sync was consuming 100% system time.
> > 
> > Hmm, at this time, it would be good to check what process is stuck through
> > sysrq.
> 
> It was only intermittently, but here they are. The first one is almost
> certainly the sync that I originally didn't have a backtrace for, the
> second one is one that came up frequently during the f2fs test.
> 
>INFO: task sync:10577 blocked for more than 120 seconds.
>  Tainted: GW  OE   4.2.1-040201-generic #201509211431
>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>syncD 88082ec964c0 0 10577  10549 0x
> 88000210fdc8 0082 88062ef2a940 88010337e040
> 0246 88000211 8806294915f8 8805c939b800
> 88000210fe54 8121a910 88000210fde8 817a5a37
>Call Trace:
> [] ? SyS_tee+0x360/0x360
> [] schedule+0x37/0x80
> [] wb_wait_for_completion+0x49/0x80
> [] ? prepare_to_wait_event+0xf0/0xf0
> [] sync_inodes_sb+0x94/0x1b0
> [] ? SyS_tee+0x360/0x360
> [] sync_inodes_one_sb+0x15/0x20
> [] iterate_supers+0xb9/0x110
> [] sys_sync+0x35/0x90
> [] entry_SYSCALL_64_fastpath+0x16/0x75
> 
>INFO: task watchdog/1:14743 blocked for more than 120 seconds.
>  Tainted: P   OE  3.18.21-031821-generic #201509020527
>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>watchdog/1  D 88082ec93300 0 14743  2 0x
> 8801a2383c48 0046 880273a5 00013300
> 8801a2383fd8 00013300 8802e642a800 880273a5
> 1000 81c23d80 81c23d84 880273a5
>Call Trace:
> [] schedule_preempt_disabled+0x29/0x70
> [] __mutex_lock_slowpath+0x95/0x100
> [] ? enqueue_entity+0x289/0xb20
> [] mutex_lock+0x23/0x37
> [] x86_pmu_event_init+0x343/0x430
> [] perf_init_event+0xcb/0x130
> [] perf_event_alloc+0x398/0x440
> [] ? put_prev_entity+0x31/0x3f0
> [] ? restart_watchdog_hrtimer+0x60/0x60
> [] perf_event_create_kernel_counter+0x26/0x100
> [] watchdog_nmi_enable+0xcd/0x170
> [] watchdog_enable+0x45/0xa0
> [] smpboot_thread_fn+0xb9/0x1a0
> [] ? __kthread_parkme+0x4c/0x80
> [] ? SyS_setgroups+0x180/0x180
> [] kthread+0xc9/0xe0
> [] ? kthread_create_on_node+0x180/0x180
> [] ret_from_fork+0x58/0x90
> [] ? kthread_create_on_node+0x180/0x180
> 
> The watchdog might or might not be unrelated, but it is either a 4.2.1
> thing (new kernel) or f2fs related. I only had them during the f2fs test,
> and often, not before or after.
> 
> (I don't know what that kernel thread does, but the system was somewhat
> sluggish during the test, and other, unrelated services, were negatively
> affected).
> 
> > It seems there was no fsync after sync at all. That's why f2fs recovered
> > back to the latest checkpoint. Anyway, I'm thinking that it's worth adding
> > a kind of periodic checkpoint.
> 
> Well, would it sync more often if this problem hadn't occurred? Most
> filesystems (or rather, the filesystems I use, btrfs, xfs, ext* and zfs)
> seem to have their own regular commit interval, or otherwise commit
> frequently if it is cheap enough.

AFAIK, the *commit* therein means syncing metadata, not user data, doesn't it?
So, even if you saw no data loss, the filesystem doesn't guarantee all the data
was completely recovered, since sync or fsync was not called for those files.

I think you need to tune the system-wide flusher-related parameters mentioned
by Chao for your workloads.
And we can expect periodic checkpoints to make the previously flushed data
recoverable.

Thanks,

> 


Re: [f2fs-dev] SMR drive test 2; 128GB partition; no obvious corruption, much more sane behaviour, weird overprovisioning

2015-09-25 Thread Jaegeuk Kim
On Fri, Sep 25, 2015 at 07:42:25AM +0200, Marc Lehmann wrote:
> On Thu, Sep 24, 2015 at 10:27:49AM -0700, Jaegeuk Kim  
> wrote:
> > > In the end, I might settle with -s64, and currently do tests with -s90.
> > 
> > Got it. But why -s90? :)
> 
> He :) It's a nothing-special number between 64 and 128, that's all.

Oh, then, I don't think that is a good magic number.
It seems that you decided to use -s64, so it'd be better to keep it when
comparing any perf results.

> > I just pushed the patches to master branch in f2fs-tools.git.
> > Could you pull them and check them?
> 
> Got them, last patch was the "check sit types" change.
> 
> > I added one more patch to avoid harmless sit_type fixes previously you 
> > reported.
> > 
> > And, for the 8TB case, let me check again. It seems that we need to handle 
> > under
> > 1% overprovision ratio. (e.g., 0.5%)
> 
> That might make me potentially very happy. But my main concern at the
> moment is stability - even when you have a backup, restoring 8TB will take
> days, and backups are never up to date.
> 
> It would be nice to be able to control it more from the user side though.
> 
> For example, I have not yet reached 0.0% free with f2fs. That's fine, I don't
> plan to, but I need to know at which percentage I should stop, which is
> something I can only really find out with experiments.
> 
> And just filling these 8TB disks takes days, so the question is, can I
> simulate near-full behaviour with smaller partitions.

Why not? :)
I think the behavior should be the same. And it'd be good to set small sections
in order to see it more clearly.

Anyway, I wrote a patch to consider under 1% for large partitions.

 section  ovp ratio  ovp size

For 8TB,
 -s1: 0.07%   -> 10GB
 -s32   : 0.39%   -> 65GB
 -s64   : 0.55%   -> 92GB
 -s128  : 0.78%   -> 132GB

For 128GB,
 -s1: 0.55%   -> 1.4GB
 -s32   : 3.14%   -> 8GB
 -s64   : 4.45%   -> 12GB
 -s128  : 6.32%   -> 17GB

Let me test this patch for a while, and then push into our git.

Thanks,

From 2cdb04b52f202e931e370564396366d44bd4d1e2 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Fri, 25 Sep 2015 09:31:04 -0700
Subject: [PATCH] mkfs.f2fs: support <1% overprovision ratio

Big partitions need an under-1% overprovision ratio to acquire more usable space.

section  ovp ratio  ovp size
For 8TB,
-s1: 0.07% -> 10GB
-s32   : 0.39% -> 65GB
-s64   : 0.55% -> 92GB
-s128  : 0.78% -> 132GB

For 128GB,
-s1: 0.55% -> 1.4GB
-s32   : 3.14% -> 8GB
-s64   : 4.45% -> 12GB
-s128  : 6.32% -> 17GB

Signed-off-by: Jaegeuk Kim 
---
 include/f2fs_fs.h   |  2 +-
 mkfs/f2fs_format.c  | 12 ++--
 mkfs/f2fs_format_main.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 38a774c..359deec 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -225,7 +225,7 @@ enum f2fs_config_func {
 struct f2fs_configuration {
u_int32_t sector_size;
u_int32_t reserved_segments;
-   u_int32_t overprovision;
+   double overprovision;
u_int32_t cur_seg[6];
u_int32_t segs_per_sec;
u_int32_t secs_per_zone;
diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
index 2d4ab09..176bdea 100644
--- a/mkfs/f2fs_format.c
+++ b/mkfs/f2fs_format.c
@@ -155,19 +155,19 @@ static void configure_extension_list(void)
free(config.extension_list);
 }
 
-static u_int32_t get_best_overprovision(void)
+static double get_best_overprovision(void)
 {
-   u_int32_t reserved, ovp, candidate, end, diff, space;
-   u_int32_t max_ovp = 0, max_space = 0;
+   double reserved, ovp, candidate, end, diff, space;
+   double max_ovp = 0, max_space = 0;
 
if (get_sb(segment_count_main) < 256) {
candidate = 10;
end = 95;
diff = 5;
} else {
-   candidate = 1;
+   candidate = 0.01;
end = 10;
-   diff = 1;
+   diff = 0.01;
}
 
for (; candidate <= end; candidate += diff) {
@@ -533,7 +533,7 @@ static int f2fs_write_check_point_pack(void)
set_cp(overprov_segment_count, get_cp(overprov_segment_count) +
get_cp(rsvd_segment_count));
 
-   MSG(0, "Info: Overprovision ratio = %u%%\n", config.overprovision);
+   MSG(0, "Info: Overprovision ratio = %.3lf%%\n", config.overprovision);
MSG(0, "Info: Overprovision segments = %u (GC reserved = %u)\n",
get_cp(overprov_segment_count),
config.reserved_segments);
diff --git a/mkfs/f2fs_format_main.c b/mkfs/f2fs_format_main.c
index fc612d8..2ea809c 100644
--- a/mkfs/f2fs_format_main.c
+++ b/mkfs/f2fs_format_main.c
@@ -99,7 +99,7 @@ static void f2fs_parse_options(int argc, char *argv[])
config.vol_label = optarg;
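A quick standalone sketch (not mkfs code) of why the patch widens these fields from u_int32_t to double: a sub-1% candidate or step simply truncates to zero in integer arithmetic, so the old search loop in get_best_overprovision() could never explore fractional ratios.

```c
/* A value below 1 truncates to 0 when stored in an integer, which is why
 * the candidate loop could not step in 0.01 increments before the patch. */
unsigned int as_u32(double v)
{
    return (unsigned int)v;
}

/* Count how many candidate ratios a loop shaped like the patched one
 * visits; with floating-point accumulation the exact count can drift by
 * one around the nominal (end - start) / step + 1. */
int candidates_visited(double start, double end, double step)
{
    int n = 0;
    for (double c = start; c <= end; c += step)
        n++;
    return n;
}
```

With the old integer types, `candidate = 0.01` and `diff = 0.01` would both have become 0, so the loop could never have made progress at all.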

Re: [f2fs-dev] [PATCH v2] f2fs: fix to correct freed section number during gc

2015-09-25 Thread Jaegeuk Kim
On Fri, Sep 25, 2015 at 05:50:55PM +0800, Chao Yu wrote:
> This patch fixes f2fs to maintain the right count of sections freed in
> garbage collection when triggering a foreground gc.
> 
> Besides, when a foreground gc is running on the currently selected section,
> once we fail to gc one segment, it's better to abandon gcing the remaining
> segments in that section, because we will select the next victim for
> foreground gc anyway, so gc on the remaining segments in the previous section
> would become overhead and also cause long latency for the caller.
> 
> Signed-off-by: Chao Yu 
> ---
> v2:
>  o avoid calc the wrong value when freed segments across sections.
>  fs/f2fs/gc.c | 22 +-
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index e932740..256ebd4 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -802,7 +802,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi)
>   unsigned int segno = NULL_SEGNO;
>   unsigned int i;
>   int gc_type = BG_GC;
> - int nfree = 0;
> + int sec_freed = 0, seg_freed;
>   int ret = -1;
>   struct cp_control cpc;
>   struct gc_inode_list gc_list = {
> @@ -817,7 +817,7 @@ gc_more:
>   if (unlikely(f2fs_cp_error(sbi)))
>   goto stop;
>  
> - if (gc_type == BG_GC && has_not_enough_free_secs(sbi, nfree)) {
> + if (gc_type == BG_GC && has_not_enough_free_secs(sbi, sec_freed)) {
>   gc_type = FG_GC;
>   if (__get_victim(sbi, &segno, gc_type) || prefree_segments(sbi))
>   write_checkpoint(sbi, &cpc);
> @@ -833,13 +833,25 @@ gc_more:
>   ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno), sbi->segs_per_sec,
>   META_SSA);
>  
> - for (i = 0; i < sbi->segs_per_sec; i++)
> - nfree += do_garbage_collect(sbi, segno + i, &gc_list, gc_type);
> + for (i = 0, seg_freed = 0; i < sbi->segs_per_sec; i++) {
> + /*
> +  * for FG_GC case, halt gcing left segments once failed one
> +  * of segments in selected section to avoid long latency.
> +  */
> + if (!do_garbage_collect(sbi, segno + i, &gc_list, gc_type) &&
> + gc_type == FG_GC)
> + break;
> + if (gc_type == FG_GC)
> + seg_freed++;
> + }

How about?

if (i == sbi->segs_per_sec && gc_type == FG_GC)
sec_freed++;

> +
> + if (seg_freed == sbi->segs_per_sec)
> + sec_freed++;
>  
>   if (gc_type == FG_GC)
>   sbi->cur_victim_sec = NULL_SEGNO;
>  
> - if (has_not_enough_free_secs(sbi, nfree))
> + if (has_not_enough_free_secs(sbi, sec_freed))
>   goto gc_more;
>  
>   if (gc_type == FG_GC)
> -- 
> 2.5.2
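To make the control flow concrete, here is a userspace model of the FG_GC part of the loop under discussion. `ok[]` is a made-up stand-in for do_garbage_collect() succeeding on each segment, and the return value follows Jaegeuk's suggestion that a section counts as freed only when every segment in it was collected. Illustrative only, not kernel code:

```c
/* FG_GC over one section: try each segment, abandon the section on the
 * first failure (the long-latency fix), and report the section as freed
 * only if all segs_per_sec segments succeeded. */
int fg_gc_section_freed(const int *ok, int segs_per_sec)
{
    int i;
    for (i = 0; i < segs_per_sec; i++)
        if (!ok[i])
            break;             /* skip the remaining segments */
    return i == segs_per_sec;  /* the suggested i == segs_per_sec check */
}
```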



Re: [f2fs-dev] write performance difference 3.18.21/4.2.1

2015-09-25 Thread Marc Lehmann
On Fri, Sep 25, 2015 at 05:47:12PM +0800, Chao Yu  wrote:
> Please revert the commit 7c5e466755ff ("f2fs: readahead cp payload
> pages when mount") since in this commit we try to access invalid
> SIT_I(sbi)->sit_base_addr which should be inited later.

Wow, you are fast. To make it short, the new module loads and mounts. Since
systemd failed to clear the dmcache again, I need to wait a few hours for it
to write back before testing. On the plus side, this gives a fairly high
chance of fragmented memory, so I can test the code that avoids oom on mount
as well :)

> > Since whatever speed difference I saw with two logs wasn't big, you
> > completely sold me on 6 logs, or 4 (especially if it speeds up the gc,
> > which I haven't tested much yet). Two logs was merely a test anyway (the
> > same with no_heap; I don't know what it does, but I thought it was worth
> > a try, as metadata + data nearer together is better than having them at
> > opposite ends of the log or so).
> 
> If the section size is pretty large, no_heap would be enough. The original
> intention was to provide more contiguous space for data only, so that a big
> file could have a large extent instead of being split by its metadata.

Great, so no_heap it is.

Also, I was thinking a bit more on the active_logs issue.

The problem with SMR drives and too many logs is not just locality,
but the fatc that appending data, unlike as with flash, requires a
read-modify-write cycle. Likewise, I am pretty sure the disk can't keep
6 open write fragments in memory - maybe it can only keep one, so every
metadata write might cause a RMW cycle again, because it's not big enough
to fill a full zone (17-30MB).

So, hmm, well, separating the metadata that frequently changes
(directories) from the rest is necessary for the GC to not have to copy
almost all data blocks, but otherwise, it's nice if everything else clumps
together.

(Likewise, stat information probably changes a lot more often than file
data, e.g. chown -R user . will change stat data regardless of whether the
files already belong to that user, and it would be nice if that means the
data blocks can be kept untouched. Similarly, renames.)

What would you recommend for this case?
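A back-of-the-envelope number for the read-modify-write cost Marc describes. The zone sizes are the 17-30MB he quotes; this is rough arithmetic for illustration, not a drive specification:

```c
/* An append smaller than an SMR zone forces the drive to read, modify and
 * rewrite the whole zone, so the bytes physically written per byte
 * appended is roughly zone_bytes / append_bytes. */
double rmw_amplification(double zone_bytes, double append_bytes)
{
    return zone_bytes / append_bytes;
}
```

A 4KiB metadata update into a ~20MB zone then costs on the order of 5000x the logical write, which is why clumping hot metadata into a few sequentially written logs matters so much here.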



Re: [f2fs-dev] SMR drive test 2; 128GB partition; no obvious corruption, much more sane behaviour, weird overprovisioning

2015-09-25 Thread Marc Lehmann
On Fri, Sep 25, 2015 at 10:45:46AM -0700, Jaegeuk Kim  
wrote:
> > He :) It's a nothing-special number between 64 and 128, that's all.
> 
> Oh, then, I don't think that is a good magic number.

Care to share why? :)

> It seems that you decided to use -s64, so it'd better to keep it to address
> any perf results.

Is there anything specially good about powers of two? Or do you just want to
reduce the number of changed variables?

If yes, should I do the 3.18.21 test with -s90 (as the 3.18.21 and 4.2.1
tests before), or with -s64?

> > And just filling these 8TB disks takes days, so the question is, can I
> > simulate near-full behaviour with smaller partitions.
> 
> Why not? :)
> I think the behavior should be same. And, it'd good to set small sections
> in order to see it more clearly.

The section size is a critical parameter for these drives. Also, the data
mix is the same for 8TB and smaller partitions (in these tests, which were
meant to be the first round of tests only anyway).

So a smaller section size compared to the full partition test, I think,
would result in very different behaviour. Likewise, if a small partition
has comparatively more (or absolutely less) overprovision (and/or reserved
space), this again might cause different behaviour.

At least to me, it's not obvious what a good comparable overprovision ratio
is to test full device behaviour on a smaller partition.

Also, section sizes vary by a factor of two over the device, so what might
work fine with -s64 in the middle of the disk might work badly at the end.

Likewise, since the files don't get larger, the GC might do a much better
job at -s64 than at -s128 (almost certainly, actually).

As a thought experiment, what happens when I use -s8 or a similar small size?
If the GC writes linearly, there won't be too many RMW cycles. But is that
guaranteed even with an aging filesystem?

If yes, then the best -s number might be 1. Because all I rely on is
mostly linear batched large writes, not so much large batched reads.

That is, unfortunately, not something I can easily test.

> Let me test this patch for a while, and then push into our git.

Thanks, will do so, then.



Re: [f2fs-dev] SMR drive test 2; 128GB partition; no obvious corruption, much more sane behaviour, weird overprovisioning

2015-09-25 Thread Marc Lehmann
On Fri, Sep 25, 2015 at 04:05:48PM +0800, Chao Yu  wrote:
> Actually, we should set the value of 'count' parameter to indicate how many
> times we want to do gc in one batch, at most 16 times in a loop for each
> ioctl invoking:
> ioctl(fd, F2FS_IOC_GC, );
> After ioctl retruned successfully, 'count' parameter will contain the count
> of gces we did actually.

Ah, so this way, I could even find out when to stop.

> One batch means a certain number of GCs executing serially.

Thanks for the explanation - well, I guess there is no harm in setting
count to 1 and calling it repeatedly, as GC operations should generally be
slow enough that many repeated calls will be ok.
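A small model of the batching semantics Chao describes: the caller asks for up to 'count' GC rounds (capped at 16 per ioctl), and on return 'count' holds how many actually ran, so getting back fewer than requested signals there is nothing left to collect. This is a made-up stand-in for the real F2FS_IOC_GC call, not its implementation:

```c
/* gc_batch() mimics the described interface: *count in = rounds wanted,
 * *count out = rounds actually performed.  'reclaimable' stands in for
 * how many victim sections the filesystem still has. */
int gc_batch(int *count, int reclaimable)
{
    int want = *count > 16 ? 16 : *count;          /* at most 16 per call */
    int done = want < reclaimable ? want : reclaimable;
    *count = done;
    return 0;
}
```

Setting count to 1 and looping, as suggested above, then stops cleanly the first time the returned count is 0.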



Re: [f2fs-dev] sync/umount hang on 3.18.21, 1.4TB gone after crash

2015-09-25 Thread Marc Lehmann
On Thu, Sep 24, 2015 at 11:50:23AM -0700, Jaegeuk Kim  
wrote:
> > When I came back after ~10 hours, I found a number of hung task messages
> > in syslog, and when I entered sync, sync was consuming 100% system time.
> 
> Hmm, at this time, it would be good to check what process is stuck through
> sysrq.

It was only intermittently, but here they are. The first one is almost
certainly the sync that I originally didn't have a backtrace for, the
second one is one that came up frequently during the f2fs test.

   INFO: task sync:10577 blocked for more than 120 seconds.
 Tainted: GW  OE   4.2.1-040201-generic #201509211431
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   syncD 88082ec964c0 0 10577  10549 0x
88000210fdc8 0082 88062ef2a940 88010337e040
0246 88000211 8806294915f8 8805c939b800
88000210fe54 8121a910 88000210fde8 817a5a37
   Call Trace:
[] ? SyS_tee+0x360/0x360
[] schedule+0x37/0x80
[] wb_wait_for_completion+0x49/0x80
[] ? prepare_to_wait_event+0xf0/0xf0
[] sync_inodes_sb+0x94/0x1b0
[] ? SyS_tee+0x360/0x360
[] sync_inodes_one_sb+0x15/0x20
[] iterate_supers+0xb9/0x110
[] sys_sync+0x35/0x90
[] entry_SYSCALL_64_fastpath+0x16/0x75

   INFO: task watchdog/1:14743 blocked for more than 120 seconds.
 Tainted: P   OE  3.18.21-031821-generic #201509020527
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   watchdog/1  D 88082ec93300 0 14743  2 0x
8801a2383c48 0046 880273a5 00013300
8801a2383fd8 00013300 8802e642a800 880273a5
1000 81c23d80 81c23d84 880273a5
   Call Trace:
[] schedule_preempt_disabled+0x29/0x70
[] __mutex_lock_slowpath+0x95/0x100
[] ? enqueue_entity+0x289/0xb20
[] mutex_lock+0x23/0x37
[] x86_pmu_event_init+0x343/0x430
[] perf_init_event+0xcb/0x130
[] perf_event_alloc+0x398/0x440
[] ? put_prev_entity+0x31/0x3f0
[] ? restart_watchdog_hrtimer+0x60/0x60
[] perf_event_create_kernel_counter+0x26/0x100
[] watchdog_nmi_enable+0xcd/0x170
[] watchdog_enable+0x45/0xa0
[] smpboot_thread_fn+0xb9/0x1a0
[] ? __kthread_parkme+0x4c/0x80
[] ? SyS_setgroups+0x180/0x180
[] kthread+0xc9/0xe0
[] ? kthread_create_on_node+0x180/0x180
[] ret_from_fork+0x58/0x90
[] ? kthread_create_on_node+0x180/0x180

The watchdog might or might not be unrelated, but it is either a 4.2.1
thing (new kernel) or f2fs related. I only had them during the f2fs test,
and often, not before or after.

(I don't know what that kernel thread does, but the system was somewhat
sluggish during the test, and other, unrelated services, were negatively
affected).

> It seems there was no fsync after sync at all. That's why f2fs recovered back
> to the latest checkpoint. Anyway, I'm thinking that it's worth adding a kind
> of periodic checkpoint.

Well, would it sync more often if this problem hadn't occurred? Most
filesystems (or rather, the filesystems I use, btrfs, xfs, ext* and zfs)
seem to have their own regular commit interval, or otherwise commit
frequently if it is cheap enough.



Re: [f2fs-dev] sync/umount hang on 3.18.21, 1.4TB gone after crash

2015-09-25 Thread Marc Lehmann
On Fri, Sep 25, 2015 at 08:00:19AM +0200, Marc Lehmann  
wrote:
> On Thu, Sep 24, 2015 at 11:50:23AM -0700, Jaegeuk Kim  
> wrote:
> > > When I came back after ~10 hours, I found a number of hung task messages
> > > in syslog, and when I entered sync, sync was consuming 100% system time.
> > 
> > Hmm, at this time, it would be good to check what process is stuck through
> > sysrq.
> 
> It was only intermittently, but here they are.

I meant "here are backtraces from the stuck process", from syslog, not via
sysrq of course.



Re: [f2fs-dev] SMR drive test 2; 128GB partition; no obvious corruption, much more sane behaviour, weird overprovisioning

2015-09-25 Thread Chao Yu
> -Original Message-
> From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
> Sent: Friday, September 25, 2015 1:21 AM
> To: Marc Lehmann
> Cc: Chao Yu; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] SMR drive test 2; 128GB partition; no obvious 
> corruption, much more
> sane behaviour, weird overprovisioning
> 
> On Thu, Sep 24, 2015 at 01:43:24AM +0200, Marc Lehmann wrote:
> > On Thu, Sep 24, 2015 at 01:30:22AM +0200, Marc Lehmann  
> > wrote:
> > > > One thing I note is that gc_min_sleep_time is not be set in your script,
> > > > so in some condition gc may still do the sleep with gc_min_sleep_time 
> > > > (30
> > > > seconds by default) instead of gc_max_sleep_time which we expect.
> > >
> > > Ah, sorry, I actually set gc_min_sleep_time to 100, but forgot to include
> > > it.
> >
> > Sorry, that sounded confusing - I set it to 100 in previous tests, and 
> > forgot
> > to include it, so it was running with 3. When experimenting, I actually
> > do get the gc to do more frequent operations now.
> >
> > Is there any obvious harm setting it to a very low value (such as 100 or 
> > 10)?
> >
> > I assume all it does is have less time buffer between the last operation
> > and the gc starting. When I write in batches, or when I know the fs will be
> > idle, there shouldn't be any harm, performance wise, of letting it work all
> > the time.
> 
> Yeah, I don't think it matters with very small time periods, since the timer
> is set after background GC is done.
> But we use msecs_to_jiffies(), so I hope not to use something like 10 ms, since
> each background GC reads the victim blocks into the page cache and then just
> sets them as dirty.
> That means that, after a while, we hope the flusher will write them all to
> disk and we finally get a free section.
> So, IMO, we need to give some time slots to the flusher as well.
> 
> For example, if write bandwidth is 30MB/s and section size is 128MB, it needs
> about 4secs to write one section.

It's better for us to consider the VM dirty data flush policy. IIRC, Fengguang
did the optimization work on writeback: if the dirty ratio (dirty bytes?) is
not high, the VM will flush data somewhat slowly, but as the dirty ratio
increases, the VM will flush data more aggressively. If we want to use a large
share of the max bandwidth, the values of the following interfaces could be
considered when tuning them together with the gc policy of f2fs.

/proc/sys/vm/
dirty_background_bytes
dirty_background_ratio
dirty_expire_centisecs
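As a quick sketch (the knob names are the real /proc/sys/vm files; the value
shown in the commented write is illustrative, not a recommendation), the
current writeback settings can be inspected, and lowered, like this:

```sh
# Print the current VM writeback knobs before co-tuning them with f2fs gc.
for k in dirty_background_bytes dirty_background_ratio dirty_expire_centisecs; do
    printf '%s = %s\n' "$k" "$(cat /proc/sys/vm/$k)"
done

# Illustrative change (needs root): expire dirty pages after 10s instead of
# the default 30s, so the flusher writes back gc-dirtied pages sooner.
# echo 1000 > /proc/sys/vm/dirty_expire_centisecs
```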

Thanks,

> So, how about setting
>  - gc_min_time to 1~2 secs,
>  - gc_max_time to 3~4 secs,
>  - gc_idle_time to 10 secs,
>  - reclaim_segments to 64 (sync when 1 section becomes prefree)
> 
> Thanks,
> 
> >
> > --
> > The choice of a   Deliantra, the free code+content MORPG
> >   -==- _GNU_  http://www.deliantra.net
> >   ==-- _   generation
> >   ---==---(_)__  __   __  Marc Lehmann
> >   --==---/ / _ \/ // /\ \/ /  schm...@schmorp.de
> >   -=/_/_//_/\_,_/ /_/\_\
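The sleep times suggested above map onto f2fs's sysfs tunables (the same knobs
set earlier in this thread). A sketch, with an example device path and
illustrative millisecond values:

```sh
# Example device path; adjust to your f2fs mount. Values are illustrative.
d=/sys/fs/f2fs/sde1
echo 1500  > $d/gc_min_sleep_time    # "gc_min_time to 1~2 secs"
echo 3500  > $d/gc_max_sleep_time    # "gc_max_time to 3~4 secs"
echo 10000 > $d/gc_no_gc_sleep_time  # "gc_idle_time to 10 secs"
echo 64    > $d/reclaim_segments     # sync when one section becomes prefree
```

Note the gc_min_time/gc_max_time/gc_idle_time names in the quote are shorthand;
the actual sysfs files are gc_min_sleep_time, gc_max_sleep_time and
gc_no_gc_sleep_time, which take milliseconds.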


--
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] SMR drive test 2; 128GB partition; no obvious corruption, much more sane behaviour, weird overprovisioning

2015-09-25 Thread Chao Yu
> -Original Message-
> From: Marc Lehmann [mailto:schm...@schmorp.de]
> Sent: Thursday, September 24, 2015 7:30 AM
> To: Chao Yu
> Cc: 'Jaegeuk Kim'; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] SMR drive test 2; 128GB partition; no obvious 
> corruption, much more
> sane behaviour, weird overprovisioning
> 
> On Wed, Sep 23, 2015 at 04:55:57PM +0800, Chao Yu  
> wrote:
> > >echo 1 >gc_idle
> > >echo 1000 >gc_max_sleep_time
> > >echo 5000 >gc_no_gc_sleep_time
> >
> > One thing I note is that gc_min_sleep_time is not set in your script,
> > so in some conditions gc may still sleep with gc_min_sleep_time (30
> > seconds by default) instead of gc_max_sleep_time, which we expect.
> 
> Ah, sorry, I actually set gc_min_sleep_time to 100, but forgot to include
> it.
> 
> > In 4.3 rc1 kernel, we have add a new ioctl to trigger in batches gc, maybe
> > we can use it as one option.
> 
> Yes, such an ioctl could be useful to me, although I do not intend to have
> background gc off.
> 
> I assume that the ioctl will block for the time it runs, and I can ask it
> to do up to 16 batches in one go (by default)? That sounds indeed very

Actually, we should set the value of the 'count' parameter to indicate how many
times we want to do gc in one batch, at most 16 times in a loop for each
ioctl invocation:
ioctl(fd, F2FS_IOC_GC, &count);
After the ioctl returns successfully, the 'count' parameter will contain the
number of GC passes we actually did.

> useful to have.
> 
> What is "one batch" in terms of gc, one section?

One batch means a certain number of GC passes executing serially.

We have foreground/background modes in the GC procedure:
1) In foreground gc mode, it will try to gc several sections until there are
enough free sections;
2) In background gc mode, it will try to gc one section.
So we will not know how many sections will be freed in one batch, because it
depends on a) which mode we will use (the gc mode is chosen dynamically,
depending on the current status of free sections/dirty data) and b) whether a
victim exists or not.

Thanks,

> 
> --
> The choice of a   Deliantra, the free code+content MORPG
>   -==- _GNU_  http://www.deliantra.net
>   ==-- _   generation
>   ---==---(_)__  __   __  Marc Lehmann
>   --==---/ / _ \/ // /\ \/ /  schm...@schmorp.de
>   -=/_/_//_/\_,_/ /_/\_\


--
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: fix to correct freed section number during gc

2015-09-25 Thread Chao Yu
Hi Jaegeuk,

> On Sep 24, 2015, at 5:08 AM, Jaegeuk Kim  wrote:
> 
> Hi Chao,
> 
> On Wed, Sep 23, 2015 at 06:11:36PM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>> 
>>> -Original Message-
>>> From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
>>> Sent: Wednesday, September 23, 2015 6:54 AM
>>> To: Chao Yu
>>> Cc: linux-f2fs-devel@lists.sourceforge.net; linux-ker...@vger.kernel.org
>>> Subject: Re: [PATCH] f2fs: fix to correct freed section number during gc
>>> 
>>> Hi Chao,
>>> 
>>> On Tue, Sep 22, 2015 at 09:18:18PM +0800, Chao Yu wrote:
We pass 'nfree' to has_not_enough_free_secs to check whether there are
enough free sections, but 'nfree' indicates the number of segments gced;
we should alter the value to the number of sections.
>>> 
>>> Yeah, but I think we need to increase nfree only when an entire section is 
>>> gced
>>> completely, since sometimes nfree can be increased across sections.
>> 
>> Agree, I will fix that.
>> 
>> Still one question: for foreground gc, why would we give up retrying to write
>> out pages of the last victim, and instead try to select another victim for
>> cleanup?
>> Will the newly introduced method cause longer latency for the caller than before?
> 
> Hmm. Very occasionally, I've seen that gc goes into an infinite loop to clean 
> up
> one victim. In order to avoid that, I added giving up and then doing gc again.
> I think there is no problem in normal cases. Even in an abnormal case, I expect
> that the next victim would be selected again because it should have the lowest
> moving cost.

Got it, thanks for your explanation! :)

I have sent the v2 patch, please help to review.

Thanks,

> 
>> 
>> Thanks,
> 
> --
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


--
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] write performance difference 3.18.21/4.2.1

2015-09-25 Thread Marc Lehmann
On Thu, Sep 24, 2015 at 11:28:36AM -0700, Jaegeuk Kim  
wrote:
> One thing that we can try is to run the latest f2fs source in v3.18.
> This branch supports f2fs for v3.18.

Ok, please bear with me, the last time I built my own kernel was during
the 2.4 timeframe, and this is an Ubuntu kernel. What I did is this:

   git clone -b linux-3.18 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
   cd f2fs/fs/f2fs
   rsync -avPR include/linux/f2fs_fs.h include/trace/events/f2fs.h /usr/src/linux-headers-3.18.21-031821/.
   make -C /lib/modules/3.18.21-031821-generic/build/ M=$PWD modules modules_install

I then rmmod f2fs/insmod the resulting module, and tried to mount my
existing f2fs fs for a quick test, but got a null ptr exception on "mount":

http://ue.tst.eu/e4628dcee97324e580da1bafad938052.txt

Probably caused by me not building a full kernel, but recreating how Ubuntu
builds their kernels on a Debian system isn't something I look forward to.

> For example, if I can represent blocks like:
[number of logs discussion]

Thanks for this explanation - two logs don't look so bad from a
locality viewpoint (not a big issue for flash, but a big issue for
rotational devices - I also realised I can't use dmcache, as dmcache, even
in writethrough mode, writes back all data after an unclean shutdown,
which would positively kill the disk).

Since whatever speed difference I saw with two logs wasn't big, you
completely sold me on 6 logs, or 4 (especially if it speeds up the gc,
which I haven't tested much yet). Two logs was merely a test anyway (the
same with no_heap; I don't know what it does, but I thought it was worth
a try, as metadata + data nearer together is better than having them at
opposite ends of the log or so).

-- 
The choice of a   Deliantra, the free code+content MORPG
  -==- _GNU_  http://www.deliantra.net
  ==-- _   generation
  ---==---(_)__  __   __  Marc Lehmann
  --==---/ / _ \/ // /\ \/ /  schm...@schmorp.de
  -=/_/_//_/\_,_/ /_/\_\

--
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] write performance difference 3.18.21/4.2.1

2015-09-25 Thread Chao Yu
Hi Marc, Jaegeuk,

> -Original Message-
> From: Marc Lehmann [mailto:schm...@schmorp.de]
> Sent: Friday, September 25, 2015 2:51 PM
> To: Jaegeuk Kim
> Cc: linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] write performance difference 3.18.21/4.2.1
> 
> On Thu, Sep 24, 2015 at 11:28:36AM -0700, Jaegeuk Kim  
> wrote:
> > One thing that we can try is to run the latest f2fs source in v3.18.
> > This branch supports f2fs for v3.18.
> 
> Ok, please bear with me, the last time I built my own kernel was during
> the 2.4 timeframe, and this is a ubuntu kernel. What I did is this:
> 
>git clone -b linux-3.18 git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
>cd f2fs/fs/f2fs
>rsync -avPR include/linux/f2fs_fs.h include/trace/events/f2fs.h /usr/src/linux-headers-3.18.21-031821/.
>make -C /lib/modules/3.18.21-031821-generic/build/ M=$PWD modules modules_install
> 
> I then rmmod f2fs/insmod the resulting module, and tried to mount my
> existing f2fs fs for a quick test, but got a null ptr exception on "mount":
> 
> http://ue.tst.eu/e4628dcee97324e580da1bafad938052.txt

This is my fault, sorry about introducing this oops. :(

Please revert the commit 7c5e466755ff ("f2fs: readahead cp payload
pages when mount"), since in this commit we try to access the invalid
SIT_I(sbi)->sit_base_addr, which should be initialized later.
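A sketch of that revert against the tree cloned earlier (module/build paths
repeat the build commands from the quoted message; adjust to your setup):

```sh
# In the f2fs tree cloned earlier (branch linux-3.18):
cd f2fs
git revert 7c5e466755ff   # drop "f2fs: readahead cp payload pages when mount"
cd fs/f2fs
make -C /lib/modules/3.18.21-031821-generic/build/ M=$PWD modules modules_install
# then rmmod f2fs, insmod the rebuilt module, and retry the mount
```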

Thanks,

> 
> Probably caused me not building a full kernel, but recreating how ubuntu
> build their kernels on a debian system isn't something I look forward to.
> 
> > For example, if I can represent blocks like:
> [number of logs discussion]
> 
> Thanks for this explanation - two logs doesn't look so bad, from a
> locality viewpoint (not a big issue for flash, but a big issue for
> rotational devices - I also realised I can't use dmcache as dmcache, even
> in writethrough mode, writes back all data after an unclean shutdown,
> which would positively kill the disk).
> 
> Since whatever speed difference I saw with two logs wasn't big, you
> completely sold me on 6 logs, or 4 (especially if it seepds up the gc,
> which I haven't much tested yet). Two logs was merely a test anyway (the
> same with no_heap, I don't know what it does, but I thought it is worth
> a try, as metadata + data nearer together is better than having them at
> opposite ends of the log or so).
> 
> --
> The choice of a   Deliantra, the free code+content MORPG
>   -==- _GNU_  http://www.deliantra.net
>   ==-- _   generation
>   ---==---(_)__  __   __  Marc Lehmann
>   --==---/ / _ \/ // /\ \/ /  schm...@schmorp.de
>   -=/_/_//_/\_,_/ /_/\_\
> 
> --
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


--
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH] f2fs: remove unneeded f2fs_{, un}lock_op in do_recover_data()

2015-09-25 Thread Chao Yu
Protecting the recovery flow with cp_rwsem is not needed, since we already
prevent any checkpoint from being triggered by holding cp_mutex.

Signed-off-by: Chao Yu 
---
 fs/f2fs/recovery.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index acc21f2..c5daec5 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -383,15 +383,11 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
start = start_bidx_of_node(ofs_of_node(page), fi);
end = start + ADDRS_PER_PAGE(page, fi);
 
-   f2fs_lock_op(sbi);
-
set_new_dnode(&dn, inode, NULL, NULL, 0);
 
err = get_dnode_of_data(&dn, start, ALLOC_NODE);
-   if (err) {
-   f2fs_unlock_op(sbi);
+   if (err)
goto out;
-   }
 
f2fs_wait_on_page_writeback(dn.node_page, NODE);
 
@@ -456,7 +452,6 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
set_page_dirty(dn.node_page);
 err:
f2fs_put_dnode(&dn);
-   f2fs_unlock_op(sbi);
 out:
f2fs_msg(sbi->sb, KERN_NOTICE,
"recover_data: ino = %lx, recovered = %d blocks, err = %d",
-- 
2.5.2



--
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel