Re: [PATCH] btrfs: fix write_dev_supers

2009-06-10 Thread Hisashi Hifumi

At 20:25 09/06/09, Chris Mason wrote:
> On Tue, Jun 09, 2009 at 10:46:55AM +0900, Hisashi Hifumi wrote:
> > Hi.
> >
> > I got following BUG trace.
> > This is violation of BUG_ON(!buffer_locked(bh)) check on submit_bh()
> > function.
> > In write_dev_supers(), if wait parameter is set and buffer_uptodate() check
> > is negative,  submit_bh() is executed and hit above BUG_ON.
> > So I fixed this issue.
>
> Thanks for finding this bug and sending the patch.
>
> This function is very confusing.  If wait parameter is set, it
> isn't supposed to do any IO at all.  The caller first does
> write_dev_supers with wait == 0, and that sends all the supers down on
> all the devices.
>
> Then it calls again with wait == 1, which is supposed to make sure all
> the supers actually got to disk.
>
> We should change the wait == 0 behavior to leave a reference held on all
> the buffers, and wait == 1 to drop that reference.  That way the buffer
> won't disappear while we are waiting, and we can return an error if the
> buffer wasn't up to date when wait == 1.


Like this?

I changed the wait == 0 case to take an extra reference. In the wait == 1
case, if the buffer is uptodate the reference is released; otherwise the
buffer is locked so that execution can proceed to submit_bh().

Thanks.
 
Signed-off-by: Hisashi Hifumi hifumi.hisa...@oss.ntt.co.jp

diff -Nrup linux-2.6.30-rc8.org/fs/btrfs/disk-io.c linux-2.6.30-rc8.btrfs/fs/btrfs/disk-io.c
--- linux-2.6.30-rc8.org/fs/btrfs/disk-io.c	2009-06-04 16:26:25.0 +0900
+++ linux-2.6.30-rc8.btrfs/fs/btrfs/disk-io.c	2009-06-10 15:41:03.0 +0900
@@ -2044,8 +2044,10 @@ static int write_dev_supers(struct btrfs
 			wait_on_buffer(bh);
 			if (buffer_uptodate(bh)) {
 				brelse(bh);
+				brelse(bh);
 				continue;
-			}
+			} else
+				lock_buffer(bh);
 		} else {
 			btrfs_set_super_bytenr(sb, bytenr);
 
@@ -2062,6 +2064,7 @@ static int write_dev_supers(struct btrfs
 
 			set_buffer_uptodate(bh);
 			get_bh(bh);
+			get_bh(bh);
 			lock_buffer(bh);
 			bh->b_end_io = btrfs_end_buffer_write_sync;
 		}
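
A minimal userspace sketch of the reference protocol Chris describes in the
quoted text, for illustration only. It assumes a toy struct toy_bh in place
of the kernel's struct buffer_head; every toy_* name is hypothetical. It
models the proposal to return an error from the wait == 1 pass, not the
lock-and-resubmit behaviour of the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct buffer_head; not the kernel type. */
struct toy_bh {
	int  refcount;   /* models b_count */
	bool uptodate;   /* models BH_Uptodate */
	bool locked;     /* models BH_Lock, held while I/O is in flight */
};

static void toy_get_bh(struct toy_bh *bh)
{
	bh->refcount++;
}

static void toy_brelse(struct toy_bh *bh)
{
	assert(bh->refcount > 0);
	bh->refcount--;
}

/* Models I/O completion: record the result, unlock, drop the I/O reference. */
static void toy_end_io(struct toy_bh *bh, bool ok)
{
	bh->uptodate = ok;
	bh->locked = false;
	toy_brelse(bh);
}

/* wait == 0: start the write and keep one extra reference for the wait pass. */
static void toy_submit_pass(struct toy_bh *bh)
{
	bh->uptodate = true;
	toy_get_bh(bh);      /* reference owned by the in-flight I/O */
	toy_get_bh(bh);      /* extra reference held until the wait pass */
	bh->locked = true;   /* submit_bh() requires a locked buffer */
}

/* wait == 1: issue no new I/O; report the result and drop the extra reference. */
static int toy_wait_pass(struct toy_bh *bh)
{
	int err;

	assert(!bh->locked);          /* stands in for wait_on_buffer() returning */
	err = bh->uptodate ? 0 : -5;  /* -5 mirrors -EIO */
	toy_brelse(bh);
	return err;
}
```

Because the extra reference is only dropped in the wait pass, the buffer
cannot go away between the two calls, and a failed write shows up as an
error return instead of a second submission.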



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix write_dev_supers

2009-06-09 Thread Hisashi Hifumi

At 20:25 09/06/09, Chris Mason wrote:
> On Tue, Jun 09, 2009 at 10:46:55AM +0900, Hisashi Hifumi wrote:
> > Hi.
> >
> > I got following BUG trace.
> > This is violation of BUG_ON(!buffer_locked(bh)) check on submit_bh()
> > function.
> > In write_dev_supers(), if wait parameter is set and buffer_uptodate() check
> > is negative,  submit_bh() is executed and hit above BUG_ON.
> > So I fixed this issue.
>
> Thanks for finding this bug and sending the patch.
>
> This function is very confusing.  If wait parameter is set, it
> isn't supposed to do any IO at all.  The caller first does
> write_dev_supers with wait == 0, and that sends all the supers down on
> all the devices.
>
> Then it calls again with wait == 1, which is supposed to make sure all
> the supers actually got to disk.
>
> We should change the wait == 0 behavior to leave a reference held on all
> the buffers, and wait == 1 to drop that reference.  That way the buffer
> won't disappear while we are waiting, and we can return an error if the
> buffer wasn't up to date when wait == 1.
>
> Are you interested in fixing this?

Yes, I want to fix this. 



[PATCH] btrfs: fix write_dev_supers

2009-06-08 Thread Hisashi Hifumi
Hi.

I got the following BUG trace.
It is a violation of the BUG_ON(!buffer_locked(bh)) check in submit_bh().
In write_dev_supers(), if the wait parameter is set and the buffer_uptodate()
check fails, submit_bh() is called on an unlocked buffer and hits the BUG_ON
above.
This patch fixes the issue.
Thanks.


Jun  9 00:41:32 dl580 kernel: ------------[ cut here ]------------
Jun  9 00:41:32 dl580 kernel: kernel BUG at fs/buffer.c:2933!
Jun  9 00:41:32 dl580 kernel: invalid opcode:  [#1] SMP
Jun  9 00:41:32 dl580 kernel: last sysfs file: /sys/devices/system/cpu/cpu7/cache/index1/shared_cpu_map
Jun  9 00:41:32 dl580 kernel: CPU 3
Jun  9 00:41:32 dl580 kernel: Modules linked in: btrfs zlib_deflate ext4 jbd2 crc16 sg qla2xxx scsi_transport_fc autofs4 i2c_dev i2c_core sunrpc ipv6 serio_raw tg3 libphy ata_piix libata shpchp rtc_cmos rtc_core rtc_lib cciss sd_mod scsi_mod ext3 jbd [last unloaded: scsi_transport_fc]
Jun  9 00:41:32 dl580 kernel: Pid: 5207, comm: umount Tainted: GW  2.6.30-rc6 #1 ProLiant DL580 G3
Jun  9 00:41:32 dl580 kernel: RIP: 0010:[802c458b]  [802c458b] submit_bh+0x1a/0x105
Jun  9 00:41:32 dl580 kernel: RSP: 0018:8801f46e5bf8  EFLAGS: 00010246
Jun  9 00:41:32 dl580 kernel: RAX: 0028 RBX: 88018a7ea420 RCX: 0000
Jun  9 00:41:32 dl580 kernel: RDX: 88018a7ea420 RSI: 88018a7ea420 RDI: 0419
Jun  9 00:41:32 dl580 kernel: RBP: 8801f46e5c18 R08: 802c533d R09: 0000
Jun  9 00:41:32 dl580 kernel: R10: 0001 R11: 0088 R12: 88021d448248
Jun  9 00:41:32 dl580 kernel: R13: 0419 R14: 8802191dacbb R15: 0000
Jun  9 00:41:32 dl580 kernel: FS:  7fd64fef3760() GS:88002815() knlGS:0000
Jun  9 00:41:32 dl580 kernel: CS:  0010 DS:  ES:  CR0: 8005003b
Jun  9 00:41:32 dl580 kernel: CR2: 0044ef40 CR3: 000104287000 CR4: 06e0
Jun  9 00:41:32 dl580 kernel: DR0:  DR1:  DR2: 0000
Jun  9 00:41:32 dl580 kernel: DR3:  DR6: 0ff0 DR7: 0400
Jun  9 00:41:32 dl580 kernel: Process umount (pid: 5207, threadinfo 8801f46e4000, task ffff8801e1168000)
Jun  9 00:41:32 dl580 kernel: Stack:
Jun  9 00:41:32 dl580 kernel:  0003 88018a7ea420 88021d448248 0003
Jun  9 00:41:32 dl580 kernel:  8801f46e5c68 a02d9979  00010001
Jun  9 00:41:32 dl580 kernel:  0001 88021d448248  8802191dacbb
Jun  9 00:41:32 dl580 kernel: Call Trace:
Jun  9 00:41:33 dl580 kernel:  [a02d9979] write_dev_supers+0x1eb/0x258 [btrfs]
Jun  9 00:41:33 dl580 kernel:  [a02d9b6d] write_all_supers+0x187/0x1c8 [btrfs]
Jun  9 00:41:33 dl580 kernel:  [a02d9bbc] write_ctree_super+0xe/0x10 [btrfs]
Jun  9 00:41:33 dl580 kernel:  [a02de39f] btrfs_commit_transaction+0x6bb/0x841 [btrfs]
Jun  9 00:41:33 dl580 kernel:  [80246914] ? autoremove_wake_function+0x0/0x38
Jun  9 00:41:33 dl580 kernel:  [a02c14ed] btrfs_sync_fs+0x67/0x72 [btrfs]
Jun  9 00:41:33 dl580 kernel:  [802e6e3a] quota_sync_sb+0x42/0xf3
Jun  9 00:41:33 dl580 kernel:  [802e6f14] sync_dquots+0x29/0x138
Jun  9 00:41:33 dl580 kernel:  [802a8c29] __fsync_super+0x1e/0x7b
Jun  9 00:41:33 dl580 kernel:  [802a8c97] fsync_super+0x11/0x22
Jun  9 00:41:33 dl580 kernel:  [802a8ea9] generic_shutdown_super+0x26/0xe2
Jun  9 00:41:33 dl580 kernel:  [802a8fb6] kill_anon_super+0x17/0x3b
Jun  9 00:41:33 dl580 kernel:  [802a92e8] deactivate_super+0x62/0x77
Jun  9 00:41:33 dl580 kernel:  [802bb7ae] mntput_no_expire+0xec/0x12c
Jun  9 00:41:33 dl580 kernel:  [802bbcff] sys_umount+0x2c5/0x31c
Jun  9 00:41:33 dl580 kernel:  [8020aeeb] system_call_fastpath+0x16/0x
Jun  9 00:41:33 dl580 kernel: Code: e0 eb ec 44 89 e8 48 83 c4 18 5b 41 5c 41 5d 5d c3 55 48 89 e5 41 55 41 54 53 48 83 ec 08 41 89 fd 48 89 f3 48 8b 06 a8 04 75 04 0f 0b eb fe a8 20 75 04 0f 0b eb fe 48 83 7e 38 00 75 04 0f 0b
Jun  9 00:41:33 dl580 kernel: RIP  [802c458b] submit_bh+0x1a/0x105
Jun  9 00:41:33 dl580 kernel:  RSP 8801f46e5bf8
Jun  9 00:41:33 dl580 kernel: ---[ end trace 4eaa2a86a8e2da24 ]---



Signed-off-by: Hisashi Hifumi hifumi.hisa...@oss.ntt.co.jp

--- linux-2.6.30-rc8.org/fs/btrfs/disk-io.c	2009-06-04 16:26:25.0 +0900
+++ linux-2.6.30-rc8.btrfs/fs/btrfs/disk-io.c	2009-06-08 18:42:46.0 +0900
@@ -2045,6 +2045,9 @@ static int write_dev_supers(struct btrfs
 			if (buffer_uptodate(bh)) {
 				brelse(bh);
 				continue;
+			} else {
+				get_bh(bh);
+				lock_buffer(bh);
 			}
 		} else

Re: [RFC] [PATCH] Btrfs: improve fsync/osync write performance

2009-04-01 Thread Hisashi Hifumi

At 20:27 09/03/31, Chris Mason wrote:
> On Tue, 2009-03-31 at 14:18 +0900, Hisashi Hifumi wrote:
> > Hi Chris.
> >
> > I noticed performance of fsync() and write() with O_SYNC flag on Btrfs is
> > very slow as compared to ext3/4. I used blktrace to try to investigate the
> > cause of this. One cause is that unplug is done by kblockd even if the
> > I/O is issued through fsync() or write() with O_SYNC flag. kblockd's
> > unplug timeout is 3msec, so unplug via kblockd can slow I/O response.
> > To increase fsync/osync write performance, speeding up unplug should be
> > done here.
> >
> > Btrfs's write I/O is issued via kernel thread, not via user application
> > context that calls fsync(). While waiting for page writeback,
> > wait_on_page_writeback() can not unplug I/O sometimes on Btrfs because
> > submit_bio is not called from user application context so when submit_bio
> > is called from kernel thread, wait_on_page_writeback() sleeps on
> > io_schedule().
> >
>
> This is exactly right, and one of the uglier side effects of the async
> helper kernel threads.  I've been thinking for a while about a clean way
> to fix it.
>
> > I introduced btrfs_wait_on_page_writeback() on following patch, this is
> > replacement of wait_on_page_writeback() for Btrfs. This does unplug every
> > 1 tick while waiting for page writeback.
> >
> > I did a performance test using the sysbench.
> >
> > # sysbench --num-threads=4 --max-requests=1  --test=fileio --file-num=1
> > --file-block-size=4K --file-total-size=128M --file-test-mode=rndwr
> > --file-fsync-freq=5  run
> >
> > The result was:
> > -2.6.29
> >
> > Test execution summary:
> > total time:  628.1047s
> > total number of events:  1
> > total time taken by event execution: 413.0834
> > per-request statistics:
> >  min:0.s
> >  avg:0.0413s
> >  max:1.9075s
> >  approx.  95 percentile: 0.3712s
> >
> > Threads fairness:
> > events (avg/stddev):   2500./29.21
> > execution time (avg/stddev):   103.2708/4.04
> >
> >
> > -2.6.29-patched
> >
> > Test execution summary:
> > total time:  579.8049s
> > total number of events:  10004
> > total time taken by event execution: 355.3098
> > per-request statistics:
> >  min:0.s
> >  avg:0.0355s
> >  max:1.7670s
> >  approx.  95 percentile: 0.3154s
> >
> > Threads fairness:
> > events (avg/stddev):   2501./8.03
> > execution time (avg/stddev):   88.8274/1.94
> >
> >
> > This patch has some effect for performance improvement.
> >
> > I think there are other reasons that should be fixed why fsync() or
> > write() with O_SYNC flag is slow on Btrfs.
> >
>
> Very nice.  Could I trouble you to try one more experiment?  The other
> way to fix this is to use WRITE_SYNC instead of WRITE.  Could you
> please hardcode WRITE_SYNC in the btrfs submit_bio paths and benchmark
> that?
>
> It doesn't cover as many cases as your patch, but it might have a lower
> overall impact.
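
For reference, the size of the improvement implied by the two sysbench runs
quoted above can be worked out with a small helper. This is only a sketch:
reduction_pct() is a hypothetical name, and the inputs are the figures as
they appear in the quoted output.

```c
#include <assert.h>

/* Relative reduction, in percent, going from `before` to `after`. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

With the quoted numbers, total time (628.1047s to 579.8049s) drops by
roughly 7.7%, average per-request time (0.0413s to 0.0355s) by roughly 14%,
and the 95th percentile (0.3712s to 0.3154s) by roughly 15%.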


Hi.
I wrote a patch that hardcodes WRITE_SYNC in the btrfs submit_bio paths, as
shown below, and ran the sysbench test with it.
I will try your unplug patch later.

diff -Nrup linux-2.6.29.org/fs/btrfs/disk-io.c linux-2.6.29.btrfs_sync/fs/btrfs/disk-io.c
--- linux-2.6.29.org/fs/btrfs/disk-io.c	2009-03-24 08:12:14.0 +0900
+++ linux-2.6.29.btrfs_sync/fs/btrfs/disk-io.c	2009-04-01 16:26:56.0 +0900
@@ -2068,7 +2068,7 @@ static int write_dev_supers(struct btrfs
 		}
 
 		if (i == last_barrier && do_barriers && device->barriers) {
-			ret = submit_bh(WRITE_BARRIER, bh);
+			ret = submit_bh(WRITE_BARRIER|WRITE_SYNC, bh);
 			if (ret == -EOPNOTSUPP) {
 				printk("btrfs: disabling barriers on dev %s\n",
 				       device->name);
@@ -2076,10 +2076,10 @@ static int write_dev_supers(struct btrfs
 				device->barriers = 0;
 				get_bh(bh);
 				lock_buffer(bh);
-				ret = submit_bh(WRITE, bh);
+				ret = submit_bh(WRITE_SYNC, bh);
 			}
 		} else {
-			ret = submit_bh(WRITE, bh);
+			ret = submit_bh(WRITE_SYNC, bh);
 		}
 
 		if (!ret && wait) {
diff -Nrup linux-2.6.29.org/fs/btrfs/extent_io.c linux-2.6.29.btrfs_sync/fs/btrfs/extent_io.c
--- linux-2.6.29.org/fs/btrfs/extent_io.c	2009-03-24 08:12:14.0 +0900
+++ linux-2.6.29.btrfs_sync/fs/btrfs/extent_io.c	2009-04-01 14:48:08.0 +0900
@@ -1851,8 +1851,11 @@ static int submit_one_bio(int rw, struct
 	if (tree->ops && tree->ops->submit_bio_hook)
 		tree->ops->submit_bio_hook(page->mapping->host, rw, bio

Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated

2009-02-02 Thread Hisashi Hifumi

At 23:12 09/02/02, Chris Mason wrote:
> On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:
> > Hi Chris.
> >
> > I think it is needed to call mark_inode_dirty() when file size expands
> > in order to flush metadata updates to HDD through sync() syscall or
> > background_writeout().
> >
>
> Thanks for reading through this code and sending the patch.
>
> I find the I_DIRTY flags one of the more confusing parts of the generic
> fs writeback code.  But, I think what happens is the
> btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which
> does:
>
> if (mapping->host) {
> 	/* !PageAnon && !swapper_space */
> 	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> }
>
> This should be enough to make sure the btrfs inodes are processed by
> background writeout and sync().  Please let me know if I'm misreading
> things.

Yes, as you pointed out, btrfs_set_page_dirty calls
if (mapping->host) {
	/* !PageAnon && !swapper_space */
	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
through __set_page_dirty_nobuffers().
But I_DIRTY_PAGES is not sufficient.
To flush the metadata update to HDD through sync(), the I_DIRTY_SYNC or
I_DIRTY_DATASYNC flag is needed; see __sync_single_inode().
Thanks.




Re: [PATCH] btrfs: call mark_inode_dirty when i_size is updated

2009-02-02 Thread Hisashi Hifumi

At 10:04 09/02/03, Chris Mason wrote:
> On Tue, 2009-02-03 at 09:36 +0900, Hisashi Hifumi wrote:
> > At 23:12 09/02/02, Chris Mason wrote:
> > > On Mon, 2009-02-02 at 20:00 +0900, Hisashi Hifumi wrote:
> > > > Hi Chris.
> > > >
> > > > I think it is needed to call mark_inode_dirty() when file size expands
> > > > in order to flush metadata updates to HDD through sync() syscall or
> > > > background_writeout().
> > > >
> > >
> > > Thanks for reading through this code and sending the patch.
> > >
> > > I find the I_DIRTY flags one of the more confusing parts of the generic
> > > fs writeback code.  But, I think what happens is the
> > > btrfs_set_page_dirty function calls __set_page_dirty_nobuffers() which
> > > does:
> > >
> > > if (mapping->host) {
> > > 	/* !PageAnon && !swapper_space */
> > > 	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > > }
> > >
> > > This should be enough to make sure the btrfs inodes are processed by
> > > background writeout and sync().  Please let me know if I'm misreading
> > > things.
> >
> > Surely, as you pointed out, btrfs_set_page_dirty calls
> > if (mapping->host) {
> > 	/* !PageAnon && !swapper_space */
> > 	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > }
> > through __set_page_dirty_nobuffers.
> > But I_DIRTY_PAGES is not sufficient.
> > To flush metadata update to HDD through sync(), I_DIRTY_SYNC or
> > I_DIRTY_DATASYNC flag is needed. see __sync_single_inode.
>
> Since btrfs uses a dirty_inode callback, our inodes are never really
> dirty. The btree metadata always has the same information as the in-core
> inode does.
>
> The extra transaction commit steps taken at sync time are enough to get
> all the relevant metadata on disk.
>
> So, I think what happens is that I_DIRTY_PAGES is enough to get the data
> pages on disk and the transaction commit gets the metadata on disk.


The metadata update transaction is made through the dirty_inode callback, but
the dirty_inode callback only runs when the I_DIRTY_SYNC or I_DIRTY_DATASYNC
flag is set (see __mark_inode_dirty).

Also, committing the transaction to disk through the write_inode callback
requires the I_DIRTY_SYNC or I_DIRTY_DATASYNC flag (see __sync_single_inode).

So I think my patch fixes this issue.
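
The flag argument above can be illustrated with a small userspace model.
This is a sketch, not kernel code: the flag values and the two checks are
written from memory of the 2.6.29-era include/linux/fs.h and
fs/fs-writeback.c and should be treated as assumptions, and the function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values (2.6.29-era include/linux/fs.h). */
#define I_DIRTY_SYNC      1
#define I_DIRTY_DATASYNC  2
#define I_DIRTY_PAGES     4

/* Simplified model of the test __mark_inode_dirty() makes before
 * calling the filesystem's dirty_inode callback. */
static bool calls_dirty_inode(int flags)
{
	return (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) != 0;
}

/* Simplified model of the test __sync_single_inode() makes before
 * calling write_inode. */
static bool calls_write_inode(int dirty)
{
	return (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) != 0;
}

/* Self-check of the model: I_DIRTY_PAGES alone reaches neither callback. */
static void check_pages_only(void)
{
	assert(!calls_dirty_inode(I_DIRTY_PAGES));
	assert(!calls_write_inode(I_DIRTY_PAGES));
}
```

Under these assumptions, an inode marked only with I_DIRTY_PAGES gets its
data pages written but never reaches either callback, which is the gap the
mark_inode_dirty() call is meant to close.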


