[PATCH 1/9] writeback: plug writeback at a high level

2015-03-10 Thread Josef Bacik
From: Dave Chinner <dchin...@redhat.com>

Doing writeback on lots of little files causes terrible IOPS storms
because of the per-mapping writeback plugging we do. This
essentially causes immediate dispatch of IO for each mapping,
regardless of the context in which writeback is occurring.
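
(For context, here is a hypothetical sketch of the plugging pattern this
patch moves to. blk_start_plug()/blk_finish_plug() and struct blk_plug are
the real block-layer API used in the diff below; submit_io_batch(),
batch_submit_io() and work_list are made-up stand-ins for the writeback
code that issues the per-mapping IO.)

#include <linux/blkdev.h>	/* struct blk_plug, blk_start_plug(), blk_finish_plug() */
#include <linux/list.h>		/* struct list_head, list_empty() */

/* Hypothetical helper: pops the next work item and issues its IO. */
static void batch_submit_io(struct list_head *work_list);

/*
 * Sketch of high-level plugging: hold one plug across a whole batch of
 * submissions so the block layer can merge adjacent requests instead of
 * dispatching each one immediately.
 */
static void submit_io_batch(struct list_head *work_list)
{
	struct blk_plug plug;

	blk_start_plug(&plug);		/* IO now collects in the per-task plug list */
	while (!list_empty(work_list))
		batch_submit_io(work_list);
	blk_finish_plug(&plug);		/* flush the plug: merged requests hit the device */
}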

IOWs, running a concurrent write-lots-of-small-4k-files workload using fsmark
on XFS results in a huge number of IOPS being issued for data
writes.  Metadata writes are sorted and plugged at a high level by
XFS, so they aggregate nicely into large IOs. However, data writeback IOs
are dispatched in individual 4k IOs, even when the blocks of two
consecutively written files are adjacent.

Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
metadata CRCs enabled.

Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)

Test:

$ ./fs_mark  -D  1  -S0  -n  1  -s  4096  -L  120  -d
/mnt/scratch/0  -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d
/mnt/scratch/3  -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d
/mnt/scratch/6  -d  /mnt/scratch/7

Result:

             wall     sys      create rate     Physical write IO
             time     CPU      (avg files/s)   IOPS      Bandwidth
             -------  -------  --------------  --------  ---------
unpatched    6m56s    15m47s   24,000+/-500    26,000    130MB/s
patched      5m06s    13m28s   32,800+/-600     1,500    180MB/s
improvement  -26.44%  -14.68%  +36.67%         -94.23%   +38.46%

If I use zero-length files, this workload runs at about 500 IOPS, so
plugging drops the data IOs from roughly 25,500/s to 1,000/s.
3 lines of code, 35% better throughput for 15% less CPU.

The benefits of plugging at this layer are likely to be higher for
spinning media, as the IO patterns for this workload are going to make a
much bigger difference on high IO latency devices.

Signed-off-by: Dave Chinner <dchin...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/fs-writeback.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e907052..a9ff2b7 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -659,7 +659,9 @@ static long writeback_sb_inodes(struct super_block *sb,
unsigned long start_time = jiffies;
long write_chunk;
long wrote = 0;  /* count both pages and inodes */
+   struct blk_plug plug;
 
+   blk_start_plug(&plug);
while (!list_empty(&wb->b_io)) {
struct inode *inode = wb_inode(wb->b_io.prev);
 
@@ -756,6 +758,7 @@ static long writeback_sb_inodes(struct super_block *sb,
break;
}
}
+   blk_finish_plug(&plug);
return wrote;
 }
 
-- 
1.9.3


