3.14.18 btrfs_set_item_key_safe BUG

2014-09-15 Thread Daniel J Blueman
On 3.14.18 with a BTRFS partition mounted
noatime,autodefrag,compress=lzo, I see the second assertion in
btrfs_set_item_key_safe() trip:

void btrfs_set_item_key_safe(struct btrfs_root *root, struct btrfs_path *path,
			     struct btrfs_key *new_key)
{
	struct btrfs_disk_key disk_key;
	struct extent_buffer *eb;
	int slot;

	eb = path->nodes[0];
	slot = path->slots[0];
	if (slot > 0) {
		btrfs_item_key(eb, &disk_key, slot - 1);
		BUG_ON(comp_keys(&disk_key, new_key) >= 0);
	}
	if (slot < btrfs_header_nritems(eb) - 1) {
		btrfs_item_key(eb, &disk_key, slot + 1);
		BUG_ON(comp_keys(&disk_key, new_key) <= 0); <---
	}
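
To make the invariant behind those two assertions concrete, here is a minimal user-space sketch. It assumes keys sort by (objectid, type, offset); the demo_* names are made up for illustration and are not kernel API. The new key must sort strictly after the left neighbour and strictly before the right neighbour, and the second assertion tripping means the new key was not below the key in slot + 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a btrfs key: (objectid, type, offset). */
struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Three-way compare, ordering by objectid, then type, then offset. */
static int demo_comp_keys(const struct demo_key *a, const struct demo_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Returns true when new_key may replace keys[slot] without breaking the
 * strictly ascending order of the leaf: it must sort after the left
 * neighbour (if any) and before the right neighbour (if any).
 */
static bool demo_key_fits_slot(const struct demo_key *keys, int nritems,
			       int slot, const struct demo_key *new_key)
{
	assert(slot >= 0 && slot < nritems);

	if (slot > 0 && demo_comp_keys(&keys[slot - 1], new_key) >= 0)
		return false;
	if (slot < nritems - 1 && demo_comp_keys(&keys[slot + 1], new_key) <= 0)
		return false;
	return true;
}
```

With three file extent keys at offsets 0, 4096 and 8192, moving the middle key to offset 6000 passes, while moving it to 8192 collides with the right neighbour, which is the condition the second BUG_ON guards against.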

Full backtrace:

kernel BUG at /home/apw/COD/linux/fs/btrfs/ctree.c:3215!
invalid opcode:  [#1] SMP
Modules linked in: nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc
bonding psmouse serio_raw joydev video mac_hid lpc_ich lp parport
hid_generic usbhid hid bcache btrfs raid10 raid456 async_raid6_recov
async_pq raid6_pq async_xor ahci xor async_memcpy libahci async_tx
raid1 e1000e ptp pps_core raid0 multipath linear
CPU: 0 PID: 6742 Comm: btrfs-endio-wri Not tainted
3.14.18-031418-generic #201409060201
Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0b 09/17/2012
task: 880418609d70 ti: 880121e92000 task.ti: 880121e92000
RIP: 0010:[a01693f1] [a01693f1]
btrfs_set_item_key_safe+0x141/0x150 [btrfs]
RSP: 0018:880121e93b28 EFLAGS: 00010246
RAX:  RBX: 0011 RCX: 3e60
RDX:  RSI: 880121e93c67 RDI: 880121e93b07
RBP: 880121e93b88 R08: 1000 R09: 880121e93b48
R10:  R11:  R12: 88009ce9bcc0
R13: 880121e93c67 R14: 880121e93b47 R15: 8804145f7c60
FS: () GS:88042fc0() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 7ff34a8d1890 CR3: 01c0d000 CR4: 001407f0
Stack:
 880121e93b88 880405ca 8803140d1000 d900
 6c00f6bf 3e60 880121e93b88 8804145f7c60
 88009ce9bcc0 3e5e 0001 0c46
Call Trace:
 [a01a1868] __btrfs_drop_extents+0x5a8/0xc80 [btrfs]
 [a0165e00] ? tree_mod_log_free_eb+0x240/0x260 [btrfs]
 [a0191d6b]
insert_reserved_file_extent.constprop.60+0xab/0x310 [btrfs]
 [a018ee10] ? start_transaction.part.35+0x80/0x540 [btrfs]
 [a0198565] btrfs_finish_ordered_io+0x465/0x500 [btrfs]
 [a0198615] finish_ordered_fn+0x15/0x20 [btrfs]
 [a01bd8f0] worker_loop+0xa0/0x330 [btrfs]
 [a01bd850] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs]
 [810930c9] kthread+0xc9/0xe0
 [81093000] ? flush_kthread_worker+0xb0/0xb0
 [81784abc] ret_from_fork+0x7c/0xb0
 [81093000] ? flush_kthread_worker+0xb0/0xb0
Code: 00 00 4c 89 f6 4c 89 e7 48 98 48 8d 04 80 48 8d 54 80 65 e8 b2
6c 04 00 4c 89 ee 4c 89 f7 e8 d7 f4 ff ff 85 c0 0f 8f 5c ff ff ff 0f
0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55

After rebooting, btrfs check (btrfs-tools 3.14.1-1) shows:

checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 inode 16170969 errors 80, file extent overlap
root 5 inode 17592262 errors 100, file extent discount
found 752124326140 bytes used err is 1
total csum bytes: 2415994160
total tree bytes: 18200276992
total fs tree bytes: 14156120064
total extent tree bytes: 1240526848
btree space waste bytes: 2998597745
file data blocks allocated: 2473980772352
 referenced 2731118456832
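
For readers unfamiliar with the two inode errors above, here is a rough sketch of what I understand them to mean. It assumes the error value is a hexadecimal bitmask, that "overlap" means a file extent starts before the previous one ends, and that "discount" means a gap between consecutive extents. The demo_* names and the simplified walk are illustrative only and are not taken from btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ERR_OVERLAP  0x80u   /* value printed as "errors 80" above */
#define DEMO_ERR_DISCOUNT 0x100u  /* value printed as "errors 100" above */

/* One file extent item: starting file offset and length in bytes. */
struct demo_extent {
	uint64_t offset;
	uint64_t len;
};

/*
 * Walk extents sorted by offset and flag overlaps and gaps.
 * Assumes the file is expected to be covered contiguously from offset 0.
 */
static unsigned int demo_check_extents(const struct demo_extent *ext, size_t n)
{
	unsigned int errors = 0;
	uint64_t expected = 0; /* end of the range covered so far */

	assert(ext != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (ext[i].offset < expected)
			errors |= DEMO_ERR_OVERLAP;
		else if (ext[i].offset > expected)
			errors |= DEMO_ERR_DISCOUNT;
		if (ext[i].offset + ext[i].len > expected)
			expected = ext[i].offset + ext[i].len;
	}
	return errors;
}
```

Under these assumptions, extents {0, 8192} followed by {4096, 4096} report the overlap bit, and {0, 4096} followed by {8192, 4096} report the discount bit.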

Should I stop trusting compression and autodefrag, or is this
filesystem corruption left over from previous issues, in which case I
should rebuild the FS?

Thanks,
  Daniel
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


3.15 btrfs free space cache oops

2014-08-11 Thread Daniel J Blueman
When running MonetDB against a BTRFS RAID-0 set over 4 SSDs [1] on
3.15.5, we see io_ctl have a bad address of 0x20, causing a fatal
pagefault in memcpy() [2]:

(gdb) list *(__btrfs_write_out_cache+0x3e4)
0x81365984 is in __btrfs_write_out_cache
(fs/btrfs/free-space-cache.c:521).
516			if (io_ctl->index >= io_ctl->num_pages)
517				return -ENOSPC;
518			io_ctl_map_page(io_ctl, 0);
519		}
520
521		memcpy(io_ctl->cur, bitmap, PAGE_CACHE_SIZE);
522		io_ctl_set_crc(io_ctl, io_ctl->index - 1);
523		if (io_ctl->index < io_ctl->num_pages)
524			io_ctl_map_page(io_ctl, 0);
525		return 0;

I can try to reproduce it if more data would be useful.

Thanks,
  Daniel

-- [1]

mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata
/dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2
mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread

-- [2]

BUG: unable to handle kernel paging request at 0020
IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
PGD 3bca02c067 PUD 3bcf5fb067 PMD 0
Oops:  [#1] SMP
Modules linked in:
CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7
Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013
task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000
RIP: 0010:[8135a374] [8135a374]
__btrfs_write_out_cache+0x3e4/0x8e0
RSP: 0018:8809aefcfc40 EFLAGS: 00010246
RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200
RDX: 1000 RSI: 0020 RDI: 884fb9321000
RBP: 8809aefcfd48 R08: 0200 R09: 
R10:  R11: 884fb9320ffc R12: 8831e3303740
R13: 880100579970 R14: 880bb38061c0 R15: 0020
FS: 7fb9447ed700() GS:884bbfc8() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 0020 CR3: 00329b71c000 CR4: 000407e0
Stack:
 8809aefcfc90 0011 000e 884fbbc2c870
 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec
 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000
Call Trace:
 [81a75b4b] ? _raw_spin_lock+0xb/0x20
 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0
 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670
 [813199c5] commit_cowonly_roots+0x195/0x250
 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0
 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70
 [8132b6b2] btrfs_sync_file+0x182/0x2a0
 [8114a450] do_fsync+0x50/0x80
 [8114a6de] SyS_fdatasync+0xe/0x20
 [81a766e6] system_call_fastpath+0x1a/0x1f
Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85
fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3
48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
 RSP 8809aefcfc40
CR2: 0020
-- 
Daniel J Blueman


3.13.5 btrfs read() oops

2014-03-07 Thread Daniel J Blueman
With kernel 3.13.5 (Ubuntu mainline), when plugging in an (evidently
twitchy) USB3 stick with a BTRFS filesystem, I hit an oops in read()
[1].

Full dmesg output is at:
http://quora.org/2014/btrfs-oops.txt

Thanks,
  Daniel

-- [1]

IP: 0010:[8135eaf6] [8135eaf6] memcpy+0x6/0x110
RSP: 0018:88025fa1b910 EFLAGS: 00010207
RAX: 88005c3d906e RBX: 027e RCX: 027e
RDX: 027e RSI: 00050800 RDI: 88005c3d906e
RBP: 88025fa1b948 R08: 1000 R09: 88025fa1b918
R10:  R11:  R12: 8800560e6350
R13: 1600 R14: 88005c3d92ec R15: 027e
FS: 7f9272f79700() GS:88026f3c() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 7f9264010018 CR3: 00025f79a000 CR4: 001407e0
Stack:
 a036401c 1000 8800837f3800 8801e041a000
  8800763df218 880064c8c4c0 88025fa1ba08
 a0348f9c 0f18  1000
Call Trace:
 [a036401c] ? read_extent_buffer+0xbc/0x110 [btrfs]
 [a0348f9c] btrfs_get_extent+0x91c/0x970 [btrfs]
 [a0360217] __do_readpage+0x357/0x730 [btrfs]
 [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
 [a0360972] __extent_readpages.constprop.41+0x2a2/0x2c0 [btrfs]
 [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
 [a03627f6] extent_readpages+0x1b6/0x1c0 [btrfs]
 [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
 [81192f03] ? alloc_pages_current+0xa3/0x160
 [a03467df] btrfs_readpages+0x1f/0x30 [btrfs]
 [811578d9] __do_page_cache_readahead+0x1b9/0x270
 [81157dd2] ondemand_readahead+0x152/0x2a0
 [81157f51] page_cache_sync_readahead+0x31/0x50
 [8114d655] generic_file_aio_read+0x4c5/0x700
 [811b671a] do_sync_read+0x5a/0x90
 [811b6db5] vfs_read+0x95/0x160
 [811b78c9] SyS_read+0x49/0xa0
 [81715bff] tracesys+0xe1/0xe6
-- 
Daniel J Blueman


Barrier remount failure

2013-12-25 Thread Daniel J Blueman
On 3.13-rc5, it's possible to remount a mounted BTRFS filesystem with
'nobarrier', but not possible to remount with 'barrier'.

Is this expected?

Many thanks,
  Daniel
-- 
Daniel J Blueman


btrfs_join_transaction bug...

2013-09-02 Thread Daniel J Blueman
+0x350/0x390 [btrfs]
 [a02e4233] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [a0330fc4] relocate_block_group+0x434/0x570 [btrfs]
 [a03312b7] btrfs_relocate_block_group+0x1b7/0x2f0 [btrfs]
 [a03093b6] btrfs_relocate_chunk.isra.62+0x56/0x3e0 [btrfs]
 [a03080c9] ? should_balance_chunk.isra.66+0x49/0x2f0 [btrfs]
 [a030cda2] __btrfs_balance+0x312/0x3f0 [btrfs]
 [a030d1ba] btrfs_balance+0x33a/0x5d0 [btrfs]
 [a03162af] btrfs_ioctl_balance+0x22f/0x550 [btrfs]
 [a0317f09] btrfs_ioctl+0x4f9/0xa90 [btrfs]
 [8109caf6] ? account_user_time+0xa6/0xc0
 [8109d134] ? vtime_account_user+0x74/0x90
 [811c471c] do_vfs_ioctl+0x7c/0x2f0
 [810210a9] ? syscall_trace_enter+0x29/0x270
 [811c4a21] SyS_ioctl+0x91/0xb0
 [81735aaf] tracesys+0xe1/0xe6
---[ end trace 552316f62b37bc3a ]---
BTRFS error (device ram1) in __btrfs_free_extent:5693: errno=-28 No space left
BTRFS info (device ram1): forced readonly
BTRFS debug (device ram1): run_one_delayed_ref returned -28
BTRFS error (device ram1) in btrfs_run_delayed_refs:2677: errno=-28 No
space left
------------[ cut here ]------------
Kernel BUG at a0330b83 [verbose debug info unavailable]
invalid opcode:  [#1] SMP
Modules linked in: dm_crypt snd_hda_codec_hdmi ipt_REJECT xt_limit
xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp
nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables
arc4 b43 joydev mac80211 snd_hda_codec_cirrus rfcomm bnep cfg80211
snd_hda_intel snd_hda_codec ssb uvcvideo ax88179_178a usbnet snd_hwdep
applesmc videobuf2_vmalloc mii snd_pcm btusb videobuf2_memops
videobuf2_core input_polldev bluetooth snd_page_alloc videodev nfsd
snd_seq_midi snd_seq_midi_event auth_rpcgss snd_rawmidi bcm5974
nfs_acl snd_seq nfs snd_seq_device lockd snd_timer binfmt_misc bcma
mei_me lpc_ich sunrpc snd mei soundcore fscache apple_gmux mac_hid
apple_bl nls_iso8859_1 lp parport btrfs xor zlib_deflate raid6_pq
libcrc32c microcode hid_generic hid_apple usbhid hid nouveau i915
mxm_wmi wmi ttm i2c_algo_bit ahci drm_kms_helper libahci drm video
CPU: 5 PID: 22243 Comm: btrfs Tainted: G        W
3.11.0-031100rc7-generic #201308252135
Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
MBP101.88Z.00EE.B02.1208081132 08/08/2012
task: 880261685dc0 ti: 8801c766c000 task.ti: 8801c766c000
RIP: 0010:[a0330b83]  [a0330b83]
merge_reloc_roots+0x273/0x280 [btrfs]
RSP: 0018:8801c766db18  EFLAGS: 00010286
RAX: 8801c766db48 RBX: ffe2 RCX: ffe2
RDX: 0941 RSI:  RDI: 880262f52000
RBP: 8801c766db88 R08:  R09: ea000894d5c0
R10: a02bc74a R11: 00017960 R12: 880212c33800
R13: 880212c30c58 R14: 8801f2793800 R15: 880212c30800
FS:  7f870b70b780() GS:88026f34() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f1ad76dd050 CR3: 00024e02c000 CR4: 001407e0
Stack:
 8801c766db28 880212c33d90  008c
 0005 8801c766db48 8801c766db48 8801c766db48
 88021fca6480 880212c33800 88021fca6480 ffe2
Call Trace:
 [a0330e4b] relocate_block_group+0x2bb/0x570 [btrfs]
 [a03312b7] btrfs_relocate_block_group+0x1b7/0x2f0 [btrfs]
 [a03093b6] btrfs_relocate_chunk.isra.62+0x56/0x3e0 [btrfs]
 [a03080c9] ? should_balance_chunk.isra.66+0x49/0x2f0 [btrfs]
 [a030cda2] __btrfs_balance+0x312/0x3f0 [btrfs]
 [a030d1ba] btrfs_balance+0x33a/0x5d0 [btrfs]
 [a03162af] btrfs_ioctl_balance+0x22f/0x550 [btrfs]
 [a0317f09] btrfs_ioctl+0x4f9/0xa90 [btrfs]
 [8109caf6] ? account_user_time+0xa6/0xc0
 [8109d134] ? vtime_account_user+0x74/0x90
 [811c471c] do_vfs_ioctl+0x7c/0x2f0
 [810210a9] ? syscall_trace_enter+0x29/0x270
 [811c4a21] SyS_ioctl+0x91/0xb0
 [81735aaf] tracesys+0xe1/0xe6
Code: b8 48 39 45 c0 74 b5 48 8d 7d c0 e8 18 a8 ff ff eb aa 48 8b 45
c8 48 8d 55 c0 4c 89 6d c8 49 89 55 00 49 89 45 08 4c 89 28 eb b3 0f
0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89
RIP  [a0330b83] merge_reloc_roots+0x273/0x280 [btrfs]
 RSP 8801c766db18
-- 
Daniel J Blueman


btrfs_search_slot failure...

2012-07-31 Thread Daniel J Blueman
With Chris's current for-linus branch against 3.5.0, while doing I/O
and defrag/balance/scrub for a short time, I see btrfs_search_slot
fail to find the key, leaving its loop and returning 1, which trips
this assertion [1].
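
As a reminder of the return convention involved, here is a small user-space sketch. It assumes the usual convention of 0 for "found" and 1 for "not found, slot is the insertion point"; demo_search_slot and the flat uint64_t keys are simplifications for illustration, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Binary search over sorted keys.
 * Returns 0 when key is found (*slot = its index), or 1 when it is
 * absent (*slot = index where it would have to be inserted).
 */
static int demo_search_slot(const uint64_t *keys, size_t n, uint64_t key,
			    size_t *slot)
{
	size_t low = 0, high = n;

	assert(slot != NULL);

	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (keys[mid] < key) {
			low = mid + 1;
		} else if (keys[mid] > key) {
			high = mid;
		} else {
			*slot = mid;
			return 0;
		}
	}
	*slot = low;
	return 1;
}
```

A caller that asserts on a non-zero return, as the relocation code in [1] appears to do, will trip whenever the expected item is missing from the tree.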

Let me know if you are interested in more testing, details or debug output.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/relocation.c:3222!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 1
Modules linked in: brd netconsole dm_crypt dm_mod coretemp kvm_intel
kvm uvcvideo videobuf2_core videodev videobuf2_vmalloc
videobuf2_memops microcode iwlwifi binfmt_misc btrfs i915 cfbcopyarea
cfbimgblt cfbfillrect video [last unloaded: netconsole]

Pid: 7040, comm: btrfs Tainted: G        W    3.5.0-debug+ #4 Dell
Inc. Latitude E5420/006X7M
RIP: 0010:[a00fb327]  [a00fb327]
__add_tree_block.part.54+0xe7/0xf0 [btrfs]
RSP: 0018:88021bbb9ad8  EFLAGS: 00010202
RAX: 0001 RBX: 880209bec2d0 RCX: 
RDX:  RSI:  RDI: 
RBP: 88021bbb9b48 R08: 88021bbb9a94 R09: 88021bbb99f0
R10:  R11:  R12: 88021257e800
R13: 88021bbb9c40 R14: 15e78000 R15: 1000
FS:  7ff71964f740() GS:88022ec8() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f20850da000 CR3: 00021bdfe000 CR4: 000407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 7040, threadinfo 88021bbb8000, task 8802146e9f10)
Stack:
 15e78000 15e78000 88021bbb9b48 8000a00d1c18
 00a815e7 ff10 88021bbb9bb8 
 88021bbb9b78 15e78000 88021257e800 1000
Call Trace:
 [a00fb39d] __add_tree_block+0x6d/0xb0 [btrfs]
 [a00fba93] add_data_references+0xf3/0x290 [btrfs]
 [a00fdebd] relocate_block_group+0x3bd/0x560 [btrfs]
 [a00fe224] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [a00da31a] btrfs_relocate_chunk.isra.55+0x4a/0x240 [btrfs]
 [a00d4f32] ? free_extent_buffer+0x32/0x90 [btrfs]
 [a00dd78c] __btrfs_balance+0x2fc/0x3f0 [btrfs]
 [a00ddb73] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [a00e4080] btrfs_ioctl_balance+0x130/0x430 [btrfs]
 [a00e80b8] btrfs_ioctl+0x428/0x8e0 [btrfs]
 [810f5baa] ? do_brk+0x22a/0x320
 [8112f3d7] do_vfs_ioctl+0x87/0x340
 [8122eb04] ? lockdep_sys_exit_thunk+0x35/0x67
 [8112f6da] sys_ioctl+0x4a/0x80
 [815f6a22] system_call_fastpath+0x16/0x1b
Code: f9 ff 8b 45 98 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0
4c 8b 7d f8 c9 c3 66 0f 1f 84 00 00 00 00 00 b8 f4 ff ff ff eb da 0f
0b 0f 1f 80 00 00 00 00 55 89 d0 45 31 c9 48 89 e5 41 56 41
RIP  [a00fb327] __add_tree_block.part.54+0xe7/0xf0 [btrfs]
 RSP 88021bbb9ad8
-- 
Daniel J Blueman


Re: Please hammer my for-linus branch

2012-07-10 Thread Daniel J Blueman
On 2 July 2012 12:20, Liu Bo liubo2...@cn.fujitsu.com wrote:
 On 07/02/2012 11:35 AM, Daniel J Blueman wrote:

 Hi everyone,

 I've got a nice set of fixes from Josef, Jan, Ilya and others in my
 for-linus branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
 for-linus

 Some of the changes are fixes for the tree logging code, so I ran some
 extra crash runs against them Friday night.

 I ended up with a new crash in the tree log directory deletion replay
 code, so I didn't send out the pull request to Linus.

 It isn't clear yet if the new crash is because I was testing differently
 or if it is a regression.  I'm nailing it down this weekend, but please
 give my for-linus a shot.

 With this branch (3.4.0), my test has consistently been hitting the
 BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
 insert_inline_extent_backref [1]. This is followed by a string of
 other issues [2] and a hard lockup, so I used netconsole to collect
 this.

 I'm preparing my btrfs test for xfstests integration, but can slip you
 it if interested. It hits this case in ~30s.



 IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:

 BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid ==
 BTRFS_TREE_LOG_OBJECTID);

 This should help you, can you give it a try?

Bo, this did address the assertion I was tripping, so looks good from
here; it allowed me to report the second (different) assertion of
course.

If you still think the fix is sound, is it a good idea for 3.5-rc7?

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: Please hammer my for-linus branch

2012-07-10 Thread Daniel J Blueman
On 11 July 2012 09:37, Liu Bo liubo2...@cn.fujitsu.com wrote:
 On 07/10/2012 08:18 PM, Daniel J Blueman wrote:

 On 2 July 2012 12:20, Liu Bo liubo2...@cn.fujitsu.com wrote:
 On 07/02/2012 11:35 AM, Daniel J Blueman wrote:

 Hi everyone,

 I've got a nice set of fixes from Josef, Jan, Ilya and others in my
 for-linus branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
 for-linus

 Some of the changes are fixes for the tree logging code, so I ran some
 extra crash runs against them Friday night.

 I ended up with a new crash in the tree log directory deletion replay
 code, so I didn't send out the pull request to Linus.

 It isn't clear yet if the new crash is because I was testing differently
 or if it is a regression.  I'm nailing it down this weekend, but please
 give my for-linus a shot.
 With this branch (3.4.0), my test has consistently been hitting the
 BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
 insert_inline_extent_backref [1]. This is followed by a string of
 other issues [2] and a hard lockup, so I used netconsole to collect
 this.

 I'm preparing my btrfs test for xfstests integration, but can slip you
 it if interested. It hits this case in ~30s.


 IMO the BUG_ON is meant to avoid to mix 'log tree' in, it should be:

 BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID && root_objectid ==
 BTRFS_TREE_LOG_OBJECTID);

 This should help you, can you give it a try?

 Bo, this did address the assertion I was tripping, so looks good from
 here; it allowed me to report the second (different) assertion of
 course.

 If you still think the fix is sound, is it a good idea for 3.5-rc7?


 Hi Daniel,

 I'm sorry but it is not ready yet, as it does not catch the root cause of the 
 bug.

 Josef has found that the bug comes from disabling merging delayed refs and is 
 working on the bug
 with Arne.  As the root cause has been found, the bug will be fixed soon IMO.

Now I see the two issues are connected.

 Btw, while testing with your great test scripts, I also post patches for two 
 bugs, which may have address your
 other issues.  Their links are

 http://www.spinics.net/lists/linux-btrfs/msg17761.html
 http://www.spinics.net/lists/linux-btrfs/msg17764.html

Great work indeed!

Thanks Bo,
  Daniel
-- 
Daniel J Blueman


Re: Please hammer my for-linus branch

2012-07-04 Thread Daniel J Blueman
On 4 July 2012 13:19, Liu Bo liubo2...@cn.fujitsu.com wrote:
 On 07/04/2012 11:37 AM, Daniel J Blueman wrote:
 Hi everyone,

 I've got a nice set of fixes from Josef, Jan, Ilya and others in my
 for-linus branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
 for-linus

 Some of the changes are fixes for the tree logging code, so I ran some
 extra crash runs against them Friday night.

 I ended up with a new crash in the tree log directory deletion replay
 code, so I didn't send out the pull request to Linus.

 It isn't clear yet if the new crash is because I was testing differently
 or if it is a regression.  I'm nailing it down this weekend, but please
 give my for-linus a shot.

 I consistently run into this assertion [1] while running a fio
 workload on a fresh RAID10 filesystem with a balance running.

 Let me know if you need steps to reproduce, debug etc.

 Seems that additional condition does not catch the bug.

 Plz show us the steps to reproduce, I'll try to reproduce it locally and nail 
 it down.

The reproducer auto-generated from my test [1] consistently hits the
spot here; config @ http://quora.org/2012/kconfig-btrfs . You'll need
the fio workload file [2] in the same dir.

Thanks,
  Daniel

--- [1]

#!/bin/bash -ex

modprobe brd rd_size=1572864 rd_nr=4
# or use kernel param: ramdisk_size=1572864
mkdir -p /tmp/btrfsathon
sync

mkfs.btrfs -m raid1 -d raid1 -l 4096 -n 4096 /dev/ram2 /dev/ram3 /dev/ram1
mount /dev/ram1 /tmp/btrfsathon -o nodatacow,autodefrag,ssd,flushoncommit
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.017
fio --timeout=60 ./workload ||: & sleep 0.000
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.012
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.010
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.003
btrfs filesystem defragment /tmp/btrfsathon ||: & sleep 0.003
btrfs filesystem balance /tmp/btrfsathon ||: & sleep 0.003
fio --timeout=60 ./workload ||: & sleep 0.000
wait
umount /tmp/btrfsathon

--- [2] 'workload'

[global]
directory=/tmp/btrfsathon
rw=randread
size=128m
ioengine=libaio
iodepth=32
invalidate=1
direct=1

[bgwriter]
rw=randwrite
iodepth=32

[queryA]
iodepth=2
ioengine=mmap
direct=0
thinktime=1

[queryB]
iodepth=2
ioengine=mmap
direct=0
thinktime=1

[bgupdater]
rw=randrw
iodepth=32
size=64m
-- 
Daniel J Blueman


Re: Please hammer my for-linus branch

2012-07-03 Thread Daniel J Blueman
 Hi everyone,

 I've got a nice set of fixes from Josef, Jan, Ilya and others in my
 for-linus branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

 Some of the changes are fixes for the tree logging code, so I ran some
 extra crash runs against them Friday night.

 I ended up with a new crash in the tree log directory deletion replay
 code, so I didn't send out the pull request to Linus.

 It isn't clear yet if the new crash is because I was testing differently
 or if it is a regression.  I'm nailing it down this weekend, but please
 give my for-linus a shot.

I consistently run into this assertion [1] while running a fio
workload on a fresh RAID10 filesystem with a balance running.

Let me know if you need steps to reproduce, debug etc.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/extent-tree.c:1728!
invalid opcode:  [#1] SMP DEBUG_PAGEALLOC CPU 1

Modules linked in: brd dm_crypt dm_mod kvm_intel kvm binfmt_misc
coretemp microcode uvcvideo videobuf2_core videodev videobuf2_vmalloc
videobuf2_memops iwlwifi netconsole btrfs i915 cfbcopyarea cfbimgblt
cfbfillrect video

Pid: 31436, comm: btrfs Tainted: G        W    3.4.0-debug+ #6 Dell
Inc. Latitude E5420/0H5TG2
RIP: 0010:[a00ad739]  [a00ad739]
update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
RSP: 0018:88021dfab858  EFLAGS: 00010213
RAX: 00b0 RBX: 8802061555a0 RCX: 88021cf1d000
RDX:  RSI: 0f3a RDI: 8800c4e5bc20
RBP: 88021dfab8b8 R08: 00b0 R09: 88021dfab808
R10:  R11:  R12: 8800c4e5bc20
R13: 0001 R14: 0001 R15: 0f10
FS:  7fdb25012740() GS:88022ec4() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f882b763f70 CR3: 00021d547000 CR4: 000407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 31436, threadinfo 88021dfaa000, task 88021fdc5ca0)
Stack:
 8801ff3ed000 00098123124f 8801ff3ed000 8802083270a0
  0f3a 880206156000 8802061555a0
 8801ff3ed000 8802083270a0  
Call Trace:
 [a00ad7c8] insert_inline_extent_backref+0x88/0x100 [btrfs]
 [a00a0be5] ? btrfs_alloc_path+0x15/0x20 [btrfs]
 [a00ad8da] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [a00af077] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [a00b2efe] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [a00b2fed] run_clustered_refs+0xdd/0x370 [btrfs]
 [a00b33c9] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [a00c4c97] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [a00c4f93] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [a01114e9] relocate_block_group+0x439/0x560 [btrfs]
 [a01117d4] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [a00eea4a] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [a00e9722] ? free_extent_buffer+0x32/0x90 [btrfs]
 [a00f1db4] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [a00f21a3] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [a00f7f30] btrfs_ioctl_balance+0x140/0x440 [btrfs]
 [a00fbd67] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [810f1616] ? do_brk+0x246/0x360
 [8112f607] do_vfs_ioctl+0x87/0x340
 [8122a434] ? lockdep_sys_exit_thunk+0x35/0x67
 [8112f90a] sys_ioctl+0x4a/0x80
 [815b8122] system_call_fastpath+0x16/0x1b
Code:e8 5d f6 02 00 45 31 c9 48 8b 4d a0 89 c2 44 8b 45 a8 eb b5 66 0f
1f 44 00 00 41 bd 0d 00 00 00 00 41 be 0d 00 00 00 00 e9 6b fe ff ff
f0b 0f 0b 0f 1f 00 55 48 89 e5 48 83 c4 80 48 8b 45 20 4c 89
RIP  [a00ad739] update_inline_extent_backref+0x2a9/0x2b0 [btrfs]
 RSP 88021dfab858
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Please hammer my for-linus branch

2012-07-02 Thread Daniel J Blueman
On 2 July 2012 21:34, Josef Bacik jba...@fusionio.com wrote:
 On Sun, Jul 01, 2012 at 09:35:01PM -0600, Daniel J Blueman wrote:
  Hi everyone,
 
  I've got a nice set of fixes from Josef, Jan, Ilya and others in my
  for-linus branch:
 
  git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
  for-linus
 
  Some of the changes are fixes for the tree logging code, so I ran some
  extra crash runs against them Friday night.
 
  I ended up with a new crash in the tree log directory deletion replay
  code, so I didn't send out the pull request to Linus.
 
  It isn't clear yet if the new crash is because I was testing differently
  or if it is a regression.  I'm nailing it down this weekend, but please
  give my for-linus a shot.

 With this branch (3.4.0), my test has consistently been hitting the
 BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID) in
 insert_inline_extent_backref [1]. This is followed by a string of
 other issues [2] and a hard lockup, so I used netconsole to collect
 this.

 I'm preparing my btrfs test for xfstests integration, but can slip you
 it if interested. It hits this case in ~30s.


 Can you apply this and capture the output, I have a feeling I know what this 
 is.
 Thanks,

 Josef

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 5775dc4..917ea70 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -1766,7 +1766,13 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
                                            bytenr, num_bytes, parent,
                                            root_objectid, owner, offset, 1);
         if (ret == 0) {
 -               BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
 +               if (owner < BTRFS_FIRST_FREE_OBJECTID) {
 +                       printk(KERN_ERR "bad inline extent, bytenr=%Lu, "
 +                              "num_bytes=%Lu, parent=%Lu, root=%Lu, owner=%Lu"
 +                              ", offset=%Lu\n", bytenr, num_bytes, parent,
 +                              root_objectid, owner, offset);
 +                       BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
 +               }
                 update_inline_extent_backref(trans, root, path, iref,
                                              refs_to_add, extent_op);
         } else if (ret == -ENOENT) {

Bo's additional condition 'root_objectid == BTRFS_TREE_LOG_OBJECTID'
seemed to hold it off.
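
To show why the extra condition held it off for the values in the debug output [1] (owner=0, root=5), here is a small sketch comparing the two predicates. The constant values are what I recall from fs/btrfs/ctree.h and should be treated as assumptions; the demo_* names and the DEMO_BUG_ON macro are illustrative stand-ins only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, as recalled from fs/btrfs/ctree.h. */
#define DEMO_FIRST_FREE_OBJECTID 256ULL
#define DEMO_TREE_LOG_OBJECTID   ((uint64_t)-6LL)

/* User-space stand-in for BUG_ON(): abort when the condition is true. */
#define DEMO_BUG_ON(cond) assert(!(cond))

/* Original check: trips for any owner below the first free objectid. */
static bool demo_original_check_trips(uint64_t owner)
{
	return owner < DEMO_FIRST_FREE_OBJECTID;
}

/* Bo's narrowed check: trips only when the root is also the log tree. */
static bool demo_narrowed_check_trips(uint64_t owner, uint64_t root_objectid)
{
	return owner < DEMO_FIRST_FREE_OBJECTID &&
	       root_objectid == DEMO_TREE_LOG_OBJECTID;
}
```

With owner=0 and root=5, the original predicate is true and the narrowed one is false, so DEMO_BUG_ON(demo_narrowed_check_trips(0, 5)) passes. The narrowed check stays quiet here without explaining why a tree block reference reached this path in the first place.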

Here is the debug you asked for [1]. After we've determined the right
fix for this issue, I'll post the other issues I was seeing.

Thanks!
  Daniel

--- [1]

device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 1 transid 3 /dev/ram3
device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 2 transid 3 /dev/ram0
device fsid c5cf90d4-0301-4877-8f34-e8e82fe6ab0a devid 1 transid 4 /dev/ram3
btrfs: allowing degraded mounts
btrfs: force zlib compression
btrfs: disabling disk space caching
btrfs: enabling auto defrag
btrfs: enabling auto recovery
btrfs: no dev_stats entry found for device /dev/ram0 (devid 2) (OK on
first mount after mkfs)
btrfs: no dev_stats entry found for device /dev/ram3 (devid 1) (OK on
first mount after mkfs)
btrfs: relocating block group 512425984 flags 20
btrfs: found 2 extents
btrfs: relocating block group 190382080 flags 9
btrfs: found 4756 extents
btrfs: found 4756 extents
bad inline extent, bytenr=36909056, num_bytes=4096, parent=0, root=5,
owner=0, offset=0
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent-tree.c:1774!
invalid opcode:  [#1] SMP DEBUG_PAGEALLOC CPU 0

Modules linked in:
 brd dm_crypt dm_mod kvm_intel kvm uvcvideo videobuf2_core videodev
videobuf2_vmalloc videobuf2_memops coretemp microcode iwlwifi
netconsole btrfs i915 cfbcopyarea cfbimgblt cfbfillrect video

Pid: 8055, comm: btrfs-endio-wri Not tainted 3.4.0-debug+ #5 Dell Inc.
Latitude E5420/0H5TG2

RIP: 0010:[a009685b] [a009685b]
insert_inline_extent_backref+0x11b/0x120 [btrfs]
RSP: 0018:880200415a40  EFLAGS: 00010282
RAX: 006d RBX: 88020e99c1b0 RCX: 
RDX: 8103cde5 RSI: 0001 RDI: 8103d170
RBP: 880200415ac0 R08: 0002 R09: 
R10:  R11:  R12: 88020e7f7000
R13: 88020ef80e60 R14:  R15: 1000
FS:  () GS:88022ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f6916262000 CR3: 000221e24000 CR4: 000407f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs-endio-wri (pid: 8055, threadinfo 880200414000, task
880222029ee0)
Stack:
  0005  
 88020001 a0089be5 880200415aa0 02333000
 8801f6f15000 0ea1 8801f6f15000 88020e99c1b0

Call Trace:
 [a0089be5

Re: Please hammer my for-linus branch

2012-07-01 Thread Daniel J Blueman
]
 [81110457] ? kmem_cache_alloc+0xe7/0x180
 [a009b90a] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [a009d0a7] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [a00a0f2e] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [a00a101d] run_clustered_refs+0xdd/0x370 [btrfs]
 [a00a13f9] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [a00b29c7] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [a00b2cc3] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [a00fecc9] relocate_block_group+0x439/0x560 [btrfs]
 [a00fefb4] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [a00dc84a] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [a00d7592] ? free_extent_buffer+0x32/0x90 [btrfs]
 [a00dfb14] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [a00dff03] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [a00e5c30] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [a00e96c7] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [810f2526] ? do_brk+0x246/0x360
 [81130987] do_vfs_ioctl+0x87/0x340
 [8122b894] ? lockdep_sys_exit_thunk+0x35/0x67
 [81130c8a] sys_ioctl+0x4a/0x80
 [815bc622] system_call_fastpath+0x16/0x1b

BUG: scheduling while atomic: btrfs/3219/0x1002
INFO: lockdep is turned off.
Modules linked in: brd netconsole dm_crypt dm_mod kvm_intel kvm
coretemp microcode uvcvideo videobuf2_core iwlwifi videodev
videobuf2_vmalloc videobuf2_memops btrfs i915 cfbcopyarea video
cfbimgblt cfbfillrect
Pid: 3219, comm: btrfs Tainted: G  D  3.4.0-debug+ #1
Call Trace:
 [815a674a] __schedule_bug+0x5d/0x61
 [815ba0fb] __schedule+0x8fb/0x9a0
 [810055a7] ? show_trace_log_lvl+0x57/0x70
 [810055d0] ? show_trace+0x10/0x20
 [815a469f] ? dump_stack+0x72/0x7b
 [8106c4e5] __cond_resched+0x25/0x40
 [815ba21d] _cond_resched+0x2d/0x40
 [815b9414] down_read+0x24/0x5c
 [8105143f] exit_signals+0x1f/0x130
 [81042956] do_exit+0xb6/0x480
 [81005677] oops_end+0x77/0xb0
 [810057f3] die+0x53/0x80
 [81002354] do_trap+0xc4/0x170
 [81002630] do_invalid_op+0x90/0xb0
 [a009b867] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [a009466b] ? btrfs_search_slot+0x67b/0x760 [btrfs]
 [a00923ff] ? btrfs_leaf_free_space+0x5f/0xb0 [btrfs]
 [8122b85d] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [815bbf09] ? restore_args+0x30/0x30
 [815bd695] invalid_op+0x15/0x20
 [a009b867] ? insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 [a009b7de] ? insert_inline_extent_backref+0x5e/0xf0 [btrfs]
 [81110457] ? kmem_cache_alloc+0xe7/0x180
 [a009b90a] __btrfs_inc_extent_ref+0x9a/0x1f0 [btrfs]
 [a009d0a7] run_delayed_tree_ref+0x167/0x190 [btrfs]
 [a00a0f2e] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [a00a101d] run_clustered_refs+0xdd/0x370 [btrfs]
 [a00a13f9] btrfs_run_delayed_refs+0x149/0x340 [btrfs]
 [a00b29c7] __btrfs_end_transaction+0xa7/0x360 [btrfs]
 [a00b2cc3] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [a00fecc9] relocate_block_group+0x439/0x560 [btrfs]
 [a00fefb4] btrfs_relocate_block_group+0x1c4/0x300 [btrfs]
 [a00dc84a] btrfs_relocate_chunk.isra.52+0x4a/0x240 [btrfs]
 [a00d7592] ? free_extent_buffer+0x32/0x90 [btrfs]
 [a00dfb14] __btrfs_balance+0x2f4/0x3f0 [btrfs]
 [a00dff03] btrfs_balance+0x2f3/0x4d0 [btrfs]
 [a00e5c30] btrfs_ioctl_balance+0x140/0x290 [btrfs]
 [a00e96c7] btrfs_ioctl+0x5c7/0x7f0 [btrfs]
 [810f2526] ? do_brk+0x246/0x360
 [81130987] do_vfs_ioctl+0x87/0x340
 [8122b894] ? lockdep_sys_exit_thunk+0x35/0x67
 [81130c8a] sys_ioctl+0x4a/0x80
 [815bc622] system_call_fastpath+0x16/0x1b
note: btrfs[3219] exited with preempt_count 1
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


3.4-rc6: delayed alloc deadlock...

2012-05-08 Thread Daniel J Blueman
]
 [815ae241] ? __schedule+0x351/0x8b0
 [a00eee39] btrfs_ioctl+0x409/0x770 [btrfs]
 [81128767] do_vfs_ioctl+0x87/0x340
 [81128a6a] sys_ioctl+0x4a/0x80
 [815b09a2] system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman


[PATCH V2] btrfs: fix message printing

2012-05-07 Thread Daniel J Blueman
Fix various messages to include newline and module prefix.

Signed-off-by: Daniel J Blueman dan...@quora.org
---
 fs/btrfs/super.c   |    8 ++++----
 fs/btrfs/volumes.c |    6 +++---
 fs/btrfs/zlib.c    |    8 ++++----
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c5f8fca..c0b8727 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -216,7 +216,7 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *root, const char *function,
 			       unsigned int line, int errno)
 {
-	WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted");
+	WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted\n");
 	trans->aborted = errno;
 	/* Nothing used. The other threads that have joined this
 	 * transaction may be able to continue. */
@@ -511,11 +511,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_set_opt(info->mount_opt, ENOSPC_DEBUG);
 			break;
 		case Opt_defrag:
-			printk(KERN_INFO "btrfs: enabling auto defrag");
+			printk(KERN_INFO "btrfs: enabling auto defrag\n");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
 			break;
 		case Opt_recovery:
-			printk(KERN_INFO "btrfs: enabling auto recovery");
+			printk(KERN_INFO "btrfs: enabling auto recovery\n");
 			btrfs_set_opt(info->mount_opt, RECOVERY);
 			break;
 		case Opt_skip_balance:
@@ -1501,7 +1501,7 @@ static int btrfs_interface_init(void)
 static void btrfs_interface_exit(void)
 {
 	if (misc_deregister(&btrfs_misc) < 0)
-		printk(KERN_INFO "misc_deregister failed for control device");
+		printk(KERN_INFO "btrfs: misc_deregister failed for control device\n");
 }
 
 static int __init init_btrfs_fs(void)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1411b99..79b603d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -619,7 +619,7 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 
 		bdev = blkdev_get_by_path(device->name, flags, holder);
 		if (IS_ERR(bdev)) {
-			printk(KERN_INFO "open %s failed\n", device->name);
+			printk(KERN_INFO "btrfs: open %s failed\n", device->name);
 			goto error;
 		}
 		filemap_write_and_wait(bdev->bd_inode->i_mapping);
@@ -3719,7 +3719,7 @@ static int __btrfs_map_block(struct btrfs_mapping_tree *map_tree, int rw,
 	read_unlock(&em_tree->lock);
 
 	if (!em) {
-		printk(KERN_CRIT "unable to find logical %llu len %llu\n",
+		printk(KERN_CRIT "btrfs: unable to find logical %llu len %llu\n",
 		       (unsigned long long)logical,
 		       (unsigned long long)*length);
 		BUG();
@@ -4129,7 +4129,7 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
 
 	total_devs = bbio->num_stripes;
 	if (map_length < length) {
-		printk(KERN_CRIT "mapping failed logical %llu bio len %llu "
+		printk(KERN_CRIT "btrfs: mapping failed logical %llu bio len %llu "
 		       "len %llu\n", (unsigned long long)logical,
 		       (unsigned long long)length,
 		       (unsigned long long)map_length);
diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index 92c2065..9acb846 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -97,7 +97,7 @@ static int zlib_compress_pages(struct list_head *ws,
 	*total_in = 0;
 
 	if (Z_OK != zlib_deflateInit(&workspace->def_strm, 3)) {
-		printk(KERN_WARNING "deflateInit failed\n");
+		printk(KERN_WARNING "btrfs: deflateInit failed\n");
 		ret = -1;
 		goto out;
 	}
@@ -125,7 +125,7 @@ static int zlib_compress_pages(struct list_head *ws,
 	while (workspace->def_strm.total_in < len) {
 		ret = zlib_deflate(&workspace->def_strm, Z_SYNC_FLUSH);
 		if (ret != Z_OK) {
-			printk(KERN_DEBUG "btrfs deflate in loop returned %d\n",
+			printk(KERN_DEBUG "btrfs: deflate in loop returned %d\n",
 			       ret);
 			zlib_deflateEnd(&workspace->def_strm);
 			ret = -1;
@@ -252,7 +252,7 @@ static int zlib_decompress_biovec(struct list_head *ws, struct page **pages_in,
 	}
 
 	if (Z_OK != zlib_inflateInit2(&workspace->inf_strm, wbits)) {
-		printk(KERN_WARNING "inflateInit failed\n");
+		printk(KERN_WARNING "btrfs: inflateInit failed\n");
 		return -1;
 	}
 	while (workspace->inf_strm.total_in < srclen) {
@@ -336,7 +336,7 @@ static int zlib_decompress

[PATCH] Add missing printing newlines

2012-05-06 Thread Daniel J Blueman
Fix BTRFS messages to print a newline where there should be one.

Signed-off-by: Daniel J Blueman dan...@quora.org
---
 fs/btrfs/super.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c5f8fca..c99cb72 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -216,7 +216,7 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *root, const char *function,
 			       unsigned int line, int errno)
 {
-	WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted");
+	WARN_ONCE(1, KERN_DEBUG "btrfs: Transaction aborted\n");
 	trans->aborted = errno;
 	/* Nothing used. The other threads that have joined this
 	 * transaction may be able to continue. */
@@ -511,11 +511,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_set_opt(info->mount_opt, ENOSPC_DEBUG);
 			break;
 		case Opt_defrag:
-			printk(KERN_INFO "btrfs: enabling auto defrag");
+			printk(KERN_INFO "btrfs: enabling auto defrag\n");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
 			break;
 		case Opt_recovery:
-			printk(KERN_INFO "btrfs: enabling auto recovery");
+			printk(KERN_INFO "btrfs: enabling auto recovery\n");
 			btrfs_set_opt(info->mount_opt, RECOVERY);
 			break;
 		case Opt_skip_balance:
@@ -1501,7 +1501,7 @@ static int btrfs_interface_init(void)
 static void btrfs_interface_exit(void)
 {
 	if (misc_deregister(&btrfs_misc) < 0)
-		printk(KERN_INFO "misc_deregister failed for control device");
+		printk(KERN_INFO "misc_deregister failed for control device\n");
 }
 
 static int __init init_btrfs_fs(void)
-- 
1.7.9.5
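
The two rules these patches enforce (a module prefix and a terminating newline on every message) can be sketched in userspace C. This is only an illustration under stated assumptions: btrfs_format_msg is a hypothetical helper, not a kernel or btrfs API, and it writes into a caller-supplied buffer so the result can be inspected.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): formats a message with a
 * "btrfs: " prefix and makes sure it ends in a newline. Returns the
 * length written, excluding the terminating NUL, or -1 if buf is too
 * small to hold the prefix, the message and a newline.
 */
static int btrfs_format_msg(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, size, "btrfs: ");
	if (n < 0 || (size_t)n >= size)
		return -1;

	va_start(ap, fmt);
	m = vsnprintf(buf + n, size - n, fmt, ap);
	va_end(ap);
	/* Keep one spare byte so a newline can always be appended. */
	if (m < 0 || (size_t)(n + m) + 1 >= size)
		return -1;
	n += m;

	/* Append the newline only when the caller did not supply one. */
	if (buf[n - 1] != '\n') {
		buf[n++] = '\n';
		buf[n] = '\0';
	}
	return n;
}
```

Calling it with "open %s failed" and "/dev/sda" yields "btrfs: open /dev/sda failed" followed by a newline, whether or not the caller remembered the trailing "\n".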



[PATCH] Fix break before assignment

2012-05-03 Thread Daniel J Blueman
Fix control flow to store count before breaking loop.

Signed-off-by: Daniel J Blueman dan...@quora.org
---
 fs/btrfs/ctree.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e801f22..2227420 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3465,8 +3465,8 @@ int btrfs_insert_some_items(struct btrfs_trans_handle *trans,
 	for (i = 0; i < nr; i++) {
 		if (total_size + data_size[i] + sizeof(struct btrfs_item) >
 		    BTRFS_LEAF_DATA_SIZE(root)) {
-			break;
 			nr = i;
+			break;
 		}
 		total_data += data_size[i];
 		total_size += data_size[i] + sizeof(struct btrfs_item);
-- 
1.7.9.5
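
The control-flow problem can be reproduced outside the kernel. The sketch below assumes a simplified stand-in for the loop: count_items_that_fit, leaf_size and item_header are hypothetical names, not btrfs identifiers. It stores the truncated count before leaving the loop, which is what the patch does.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the batching loop: returns how many of the
 * nr items fit into leaf_size bytes, where item i costs data_size[i]
 * plus a fixed per-item header.
 */
static int count_items_that_fit(const size_t *data_size, int nr,
				size_t leaf_size, size_t item_header)
{
	size_t total_size = 0;
	int i;

	for (i = 0; i < nr; i++) {
		if (total_size + data_size[i] + item_header > leaf_size) {
			/*
			 * Store the truncated count first. A statement
			 * placed after the break would never execute.
			 */
			nr = i;
			break;
		}
		total_size += data_size[i] + item_header;
	}
	return nr;
}
```

With the two statements in the original order, the function would return the untruncated nr and the caller would try to insert items that do not fit.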



Re: [PATCH] Fix break before assignment

2012-05-03 Thread Daniel J Blueman
On 3 May 2012 22:04, David Sterba d...@jikos.cz wrote:
 On Thu, May 03, 2012 at 09:44:49PM +0800, Daniel J Blueman wrote:
 Fix control flow to store count before breaking loop.

 http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg10483.html
 but it's a dead code anyway.

Noted.

Chris, is it a good time to take out btrfs_insert_some_items or, if
keeping it, apply David's patch to avoid related surprises?

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: fs going r/o when out of space...

2012-05-02 Thread Daniel J Blueman
On 2 May 2012 22:01, Jeff Mahoney je...@suse.com wrote:
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1

 On 05/02/2012 01:44 AM, Daniel J Blueman wrote:
 I see the filesystem going readonly when run_clustered_refs
 returns -ENOSPC [1], so it looks like we need something like:

 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -2451,7 +2451,8 @@ again:
  		ret = run_clustered_refs(trans, root, &cluster);
  		if (ret < 0) {
  			spin_unlock(&delayed_refs->lock);
 -			btrfs_abort_transaction(trans, root, ret);
 +			if (ret != -ENOSPC)
 +				btrfs_abort_transaction(trans, root, ret);
  			return ret;
  		}

 No?

 No. In most cases ENOSPC is indistinguishable from any other error. An
 ENOSPC in deep code means that the reservation for the transaction
 wasn't big enough.

Ahh, makes sense. When I get some time, I'll dump the transaction
state and check how the reservation code calculates the length...

Thanks,
  Daniel
-- 
Daniel J Blueman


worker list corruption crash

2012-04-26 Thread Daniel J Blueman
In 3.4-rc4, I've come across worker list corruption while scrubbing,
leading to (in two separate cases) warning [1] and crashing [2]. The
connection with scrubbing is likely the increased rate of worker
threads starting and stopping.

In btrfs_stop_workers, access to worker->worker_list is done without
holding worker->lock (it is held at all other call sites). We can't take
worker->lock there due to lock inversion deadlock (as it is the outer
lock), and if we drop workers->lock to acquire worker->lock and
then workers->lock, we can't guarantee worker is still valid.

It feels like a global workers list pointer should be used and its
lock should be the outer one to avoid this scenario, or maybe I'm
missing something?
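
The inversion described above can be made concrete with a toy, single-threaded lock-order tracker, loosely in the spirit of lockdep. Everything here is hypothetical and for illustration only: order_tracker and its functions are not kernel APIs, and lock classes are plain integers chosen by the caller.

```c
#include <string.h>

#define MAX_CLASSES 8

/*
 * Toy lock-order tracker (single-threaded, illustration only).
 * held_before[a][b] is set once class b has been taken while a was held.
 */
struct order_tracker {
	unsigned char held_before[MAX_CLASSES][MAX_CLASSES];
	int held[MAX_CLASSES];
	int nheld;
};

static void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Record taking lock class cls. Returns 1 if this contradicts an order
 * seen earlier (a potential ABBA deadlock), 0 if not, and -1 on bad
 * input or when too many locks are held.
 */
static int tracker_acquire(struct order_tracker *t, int cls)
{
	int i, inversion = 0;

	if (cls < 0 || cls >= MAX_CLASSES || t->nheld >= MAX_CLASSES)
		return -1;

	for (i = 0; i < t->nheld; i++) {
		int outer = t->held[i];

		/* cls was previously the outer lock of 'outer'. */
		if (t->held_before[cls][outer])
			inversion = 1;
		t->held_before[outer][cls] = 1;
	}
	t->held[t->nheld++] = cls;
	return inversion;
}

/* Release the most recently acquired lock class. */
static void tracker_release(struct order_tracker *t)
{
	if (t->nheld > 0)
		t->nheld--;
}
```

If class 0 stands for worker->lock and class 1 for workers->lock, taking 0 then 1 (the order used in try_worker_shutdown) records the ordering; a later attempt to take 1 then 0, as btrfs_stop_workers would need to, makes tracker_acquire return 1.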

Daniel

--- [1]

WARNING: at lib/list_debug.c:55 __list_del_entry+0xa1/0xd0()
Hardware name: Latitude E5420
list_del corruption. prev->next should be 88019cb3e268, but was
88021af4f628
Pid: 5232, comm: btrfs-scrub-4 Not tainted 3.4.0-rc4-debug+ #1
Call Trace:
 [8103c54a] warn_slowpath_common+0x7a/0xb0
 [8103c621] warn_slowpath_fmt+0x41/0x50
 [81229931] __list_del_entry+0xa1/0xd0
 [a01087d5] try_worker_shutdown+0x73/0xad [btrfs]
 [a00dfbff] worker_loop+0x17f/0x330 [btrfs]
 [a00dfa80] ? check_pending_worker_creates.isra.1+0xd0/0xd0 [btrfs]
 [8105d9ee] kthread+0x8e/0xa0
 [815ae0d4] kernel_thread_helper+0x4/0x10
 [815ac799] ? retint_restore_args+0xe/0xe
 [8105d960] ? __init_kthread_worker+0x70/0x70
 [815ae0d0] ? gs_change+0xb/0xb

(gdb) list *(try_worker_shutdown+0x73)
0x7e854 is in try_worker_shutdown (fs/btrfs/async-thread.c:241).
warning: Source file is more recent than executable.
236		    atomic_read(&worker->num_pending) == 0) {
237			freeit = 1;
238			list_del_init(&worker->worker_list);
239			worker->workers->num_workers--;
240		}
241		spin_unlock(&worker->workers->lock);
242		spin_unlock_irq(&worker->lock);
243	
244		if (freeit)
245			put_worker(worker);

--- [2]

BUG: unable to handle kernel paging request at 8157f529
IP: [8108dd2e] __lock_acquire+0x1be/0x900
PGD 1a0d067 PUD 1a11063 PMD 14001e1
Oops: 0003 [#1] SMP
CPU 1
Pid: 2975, comm: btrfs-scrub-3 Tainted: G        W    3.4.0-rc4-debug+
#1 Dell Inc. Latitude E5420/0H5TG2
RIP: 0010:[8108dd2e]  [8108dd2e] __lock_acquire+0x1be/0x900
RSP: 0018:8801ad747d00  EFLAGS: 00010082
RAX: 81110b08 RBX: 8801df242288 RCX: 
RDX:  RSI:  RDI: 8801df242288
RBP: 8801ad747d70 R08: 0002 R09: 0001
R10:  R11: 8801df39c190 R12: 8801df39bc00
R13:  R14: 0002 R15: 8157f391
FS:  () GS:88022ec8() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 8157f529 CR3: 01a0b000 CR4: 000407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs-scrub-3 (pid: 2975, threadinfo 8801ad746000, task
8801df39bc00)
Stack:
 8801ad747d20 0286 8801ad747d80 82577548
 8801ad747d60 8801df39c190 8801 8104a5ca
 8801ad747d80 8801df39bc00 0046 880221cdde90
Call Trace:
 [8104a5ca] ? del_timer_sync+0x8a/0xc0
 [8108e995] lock_acquire+0x55/0x70
 [a010878b] ? try_worker_shutdown+0x29/0xad [btrfs]
 [815abaac] _raw_spin_lock+0x3c/0x50
 [a010878b] ? try_worker_shutdown+0x29/0xad [btrfs]
 [a010878b] try_worker_shutdown+0x29/0xad [btrfs]
 [a00dfbff] worker_loop+0x17f/0x330 [btrfs]
 [a00dfa80] ? check_pending_worker_creates.isra.1+0xd0/0xd0 [btrfs]
 [8105d9ee] kthread+0x8e/0xa0
 [815ae0d4] kernel_thread_helper+0x4/0x10
 [815ac799] ? retint_restore_args+0xe/0xe
 [8105d960] ? __init_kthread_worker+0x70/0x70
 [815ae0d0] ? gs_change+0xb/0xb
Code: 00 48 c7 c7 50 30 7b 81 89 55 b0 e8 6d e8 fa ff 8b 55 b0 eb a8
0f 1f 84 00 00 00 00 00 4c 8b 7c d3 08 4d 85 ff 0f 84 c9 fe ff ff f0
41 ff 87 98 0$
RIP  [8108dd2e] __lock_acquire+0x1be/0x900
 RSP 8801ad747d00
CR2: 8157f529

(gdb) list *(try_worker_shutdown+0x29)
0x7e80a is in try_worker_shutdown (fs/btrfs/async-thread.c:232).
227	
228		spin_lock_irq(&worker->lock);
229		spin_lock(&worker->workers->lock);
230		if (worker->workers->num_workers > 1 &&
231		    worker->idle &&
232		    !worker->working &&
233		    !list_empty(&worker->worker_list) &&
234		    list_empty(&worker->prio_pending) &&
235		    list_empty(&worker->pending) &&
236		    atomic_read(&worker->num_pending) == 0) {
-- 
Daniel J Blueman

[PATCH] Fix space checking during fs resize

2012-04-25 Thread Daniel J Blueman
Fix out-of-space checking, addressing a warning and potential resource
leak when resizing the filesystem down while allocating blocks.

Signed-off-by: Daniel J Blueman dan...@quora.org
Reviewed-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/relocation.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 017281d..cd2b46e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3811,7 +3811,7 @@ restart:
 
 		ret = btrfs_block_rsv_check(rc->extent_root, rc->block_rsv, 5);
 		if (ret < 0) {
-			if (ret != -EAGAIN) {
+			if (ret != -ENOSPC) {
 				err = ret;
 				WARN_ON(1);
 				break;
-- 
1.7.9.5



[PATCH] Add missing unlocks on error paths

2012-04-25 Thread Daniel J Blueman
Correctly drop locks during error cases.

Signed-off-by: Daniel J Blueman dan...@quora.org
---
 fs/btrfs/transaction.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 11b77a5..ede3988 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -73,8 +73,11 @@ loop:
 
 	cur_trans = root->fs_info->running_transaction;
 	if (cur_trans) {
-		if (cur_trans->aborted)
+		if (cur_trans->aborted) {
+			spin_unlock(&root->fs_info->trans_lock);
 			return cur_trans->aborted;
+		}
+
 		atomic_inc(&cur_trans->use_count);
 		atomic_inc(&cur_trans->num_writers);
 		cur_trans->num_joined++;
@@ -1400,6 +1403,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	ret = commit_fs_roots(trans, root);
 	if (ret) {
 		mutex_unlock(&root->fs_info->tree_log_mutex);
+		mutex_unlock(&root->fs_info->reloc_mutex);
 		goto cleanup_transaction;
 	}
 
@@ -1411,6 +1415,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	ret = commit_cowonly_roots(trans, root);
 	if (ret) {
 		mutex_unlock(&root->fs_info->tree_log_mutex);
+		mutex_unlock(&root->fs_info->reloc_mutex);
 		goto cleanup_transaction;
 	}
 
-- 
1.7.9.5
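
One common way to avoid this class of bug is to route every exit through a single unlock label. The sketch below is an assumption-laden illustration, not the btrfs code: toy_lock only counts depth so an imbalance is observable, and commit_step with its step_result parameter is a hypothetical stand-in for the real work.

```c
#include <assert.h>

/* Toy lock that only counts depth; it is not a real mutex. */
struct toy_lock {
	int depth;
};

static void toy_acquire(struct toy_lock *l)
{
	l->depth++;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->depth > 0);
	l->depth--;
}

/*
 * Hypothetical commit step: takes two locks, runs a step that may fail,
 * and releases both locks on the success and failure paths alike.
 */
static int commit_step(struct toy_lock *reloc, struct toy_lock *tree_log,
		       int step_result)
{
	int ret;

	toy_acquire(reloc);
	toy_acquire(tree_log);

	ret = step_result;	/* stands in for the real work */
	if (ret)
		goto out_unlock;

	/* ... further work under both locks would go here ... */

out_unlock:
	/* Release in reverse order of acquisition. */
	toy_release(tree_log);
	toy_release(reloc);
	return ret;
}
```

After a failing call such as commit_step(&a, &b, -5), both depth counters are back at zero, which is the property the patch restores for reloc_mutex.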



[PATCH] Fix minor type issues

2012-04-25 Thread Daniel J Blueman
Address some minor type issues identified by sparse checker.

Signed-off-by: Daniel J Blueman dan...@quora.org
---
 fs/btrfs/ioctl.c |2 +-
 fs/btrfs/ulist.c |4 ++--
 fs/btrfs/ulist.h |5 ++---
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 18cc23d..b410879 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2911,7 +2911,7 @@ long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 		up_read(&info->groups_sem);
 	}
 
-	user_dest = (struct btrfs_ioctl_space_info *)
+	user_dest = (struct btrfs_ioctl_space_info __user *)
 		(arg + sizeof(struct btrfs_ioctl_space_args));
 
 	if (copy_to_user(user_dest, dest_orig, alloc_size))
diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
index 12f5147..ad993bc 100644
--- a/fs/btrfs/ulist.c
+++ b/fs/btrfs/ulist.c
@@ -95,7 +95,7 @@ EXPORT_SYMBOL(ulist_reinit);
  *
  * The allocated ulist will be returned in an initialized state.
  */
-struct ulist *ulist_alloc(unsigned long gfp_mask)
+struct ulist *ulist_alloc(gfp_t gfp_mask)
 {
struct ulist *ulist = kmalloc(sizeof(*ulist), gfp_mask);
 
@@ -144,7 +144,7 @@ EXPORT_SYMBOL(ulist_free);
  * unaltered.
  */
 int ulist_add(struct ulist *ulist, u64 val, unsigned long aux,
- unsigned long gfp_mask)
+ gfp_t gfp_mask)
 {
int i;
 
diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h
index 2e25dec..6568c35 100644
--- a/fs/btrfs/ulist.h
+++ b/fs/btrfs/ulist.h
@@ -59,10 +59,9 @@ struct ulist {
 void ulist_init(struct ulist *ulist);
 void ulist_fini(struct ulist *ulist);
 void ulist_reinit(struct ulist *ulist);
-struct ulist *ulist_alloc(unsigned long gfp_mask);
+struct ulist *ulist_alloc(gfp_t gfp_mask);
 void ulist_free(struct ulist *ulist);
-int ulist_add(struct ulist *ulist, u64 val, unsigned long aux,
- unsigned long gfp_mask);
+int ulist_add(struct ulist *ulist, u64 val, unsigned long aux, gfp_t gfp_mask);
 struct ulist_node *ulist_next(struct ulist *ulist, struct ulist_node *prev);
 
 #endif
-- 
1.7.9.5



block_rsv_check EAGAIN vs ENOSPC...

2012-04-24 Thread Daniel J Blueman
With 3.4-rc4 under certain workloads, I see btrfs_block_rsv_check
return -ENOSPC.

btrfs_block_rsv_check can only return -ENOSPC or 0, yet
relocation.c:3816 checks for -EAGAIN. That check is either redundant or
should test for -ENOSPC, as I initially suspected.

Let me know which behaviour is intended and I'll get a patch tested and sent.
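
To make the mismatch concrete, here is a small userspace sketch. It assumes a checker that, like the one described above, can only return 0 or -ENOSPC; rsv_check and is_fatal are hypothetical names, not btrfs functions.

```c
#include <errno.h>

/* Hypothetical reservation check: can only return 0 or -ENOSPC. */
static int rsv_check(unsigned long reserved, unsigned long needed)
{
	return reserved >= needed ? 0 : -ENOSPC;
}

/*
 * Returns 1 when the caller should treat ret as fatal, 0 when it may
 * carry on. benign_errno is the positive errno value that the caller
 * considers recoverable.
 */
static int is_fatal(int ret, int benign_errno)
{
	return ret < 0 && ret != -benign_errno;
}
```

With EAGAIN as the benign code, every failure of rsv_check is classified as fatal, because -EAGAIN is never returned; with ENOSPC as the benign code, the shortfall is treated as recoverable.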

Thanks,
  Daniel
-- 
Daniel J Blueman


[PATCH] Prevent root_list corruption

2012-04-23 Thread Daniel J Blueman
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add
correct locking to address this.

Signed-off-by: Daniel J Blueman dan...@quora.org
---
 fs/btrfs/relocation.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 017281d..5a105a0 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1279,7 +1279,9 @@ static int __update_reloc_root(struct btrfs_root *root, int del)
 		if (rb_node)
 			backref_tree_panic(rb_node, -EEXIST, node->bytenr);
 	} else {
+		spin_lock(&root->fs_info->trans_lock);
 		list_del_init(&root->root_list);
+		spin_unlock(&root->fs_info->trans_lock);
 		kfree(node);
 	}
 	return 0;
-- 
1.7.9.5



Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Daniel J Blueman
Leho Kraav leho at kraav.com writes:
[]
 Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end
 of device
 Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976,
 limit=20967424

I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
which tests out fine here. The workaround is to not mount with
'discard' until eg ~3.4-rc3 or later.

Thanks,
  Daniel

[1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
[2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
-- 
Daniel J Blueman


Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Daniel J Blueman
On 9 April 2012 22:44, Leho Kraav l...@kraav.com wrote:
 On 09.04.2012 17:35, Daniel J Blueman wrote:

 Leho Kraav leho at kraav.com writes:
 []

 Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end
 of device
 Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976,
 limit=20967424


 I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
 which tests out fine here. The workaround is to not mount with
 'discard' until eg ~3.4-rc3 or later.

 [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
 [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649

 Oh wow, thanks. This sounds exactly like what happened. I got the livelock
 post off my search results, but the patch post doesn't seem to have any of
 the keywords I was looking for, since I had no idea it could be related to
 discards.

 So can this become a problem earlier too, not only when the space used is
 approaching limits? If not, I think I should be good until 3.4:

Looks like it affects at least 3.3 and 3.4-rc1/2 in all circumstances.

 $ sudo btrfs fi show
 Label: 'S9-HOME'  uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
        Total devices 1 FS bytes used 12.93GB
        devid    1 size 60.00GB used 20.04GB path /dev/dm-0

 Label: 'S9-ROOT'  uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
        Total devices 1 FS bytes used 8.75GB
        devid    1 size 30.00GB used 18.29GB path /dev/dm-1

 I think I'd like to keep using discard for SSD still, unless a smart
 person says it's not particularly useful anyway.

If your SSD has background garbage collection and there are disk idle
periods, the synchronous discards will have little benefit.

 So while I'm on 3.3, is the patch from gmane:16649 good enough to eliminate
 immediate dangers?

Yes.

 And is the previous filesystem still hosed for good then? Or mounting the
 images with -discard might help?

It seems like the kernel caught and prevented the discard after the
end of the partition, so the data should be fine; scrubbing will tell
you.

Daniel
-- 
Daniel J Blueman


umount vs delayed allocation potential deadlock...

2012-04-09 Thread Daniel J Blueman
 [8113d1d8] writeback_inodes_sb_nr_if_idle+0x38/0x60
 [a00a0b2a] shrink_delalloc+0x13a/0x200 [btrfs]
 [a00a7672] reserve_metadata_bytes.isra.70+0x1c2/0x430 [btrfs]
 [8108e801] ? __lock_release+0x21/0xd0
 [a00a86ff] btrfs_delalloc_reserve_metadata+0x12f/0x240 [btrfs]
 [a00a890b] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
 [a00bceea] btrfs_direct_IO+0x14a/0x410 [btrfs]
 [810cffbf] ? do_writepages+0x1f/0x40
 [810c641c] generic_file_direct_write+0xcc/0x190
 [a0112391] __btrfs_direct_write+0x40/0x146 [btrfs]
 [a00c602f] ? btrfs_update_time+0x5f/0x160 [btrfs]
 [a00ca3af] btrfs_file_aio_write+0x33f/0x350 [btrfs]
 [815ac54b] ? _raw_spin_unlock_irq+0x2b/0x50
 [815ac54b] ? _raw_spin_unlock_irq+0x2b/0x50
 [a00ca070] ? __btrfs_buffered_write+0x340/0x340 [btrfs]
 [8115b1a9] aio_rw_vect_retry+0xb9/0x160
 [8115b0f0] ? aio_advance_iovec+0x90/0x90
 [8115caee] aio_run_iocb+0x5e/0x150
 [8115d265] io_submit_one+0x175/0x220
 [8115d6d9] do_io_submit+0x129/0x1c0
 [8115d77b] sys_io_submit+0xb/0x10
 [815ad122] system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman


[3.4-rc1] attempt to access beyond end of device and livelock

2012-04-06 Thread Daniel J Blueman
Hi Josef, Chris,

When testing BTRFS with RAID 0 metadata on linux-3.4-rc1, we see
discard ranges exceeding the end of the block device [1], potentially
causing data loss; when this occurs, filesystem writeback becomes
catatonic due to continual resubmission.

Simply mounting a RAID 0 metadata filesystem with discard and copying
some data in [2] provokes the issue.

Thanks,
 Daniel

--- [1]

attempt to access beyond end of device
ram0: rw=129, want=8452072, limit=4096000
...
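
The numbers in this message can be checked by hand. In the block layer's report, limit is the device size in 512-byte sectors and want is the end sector of the rejected request; rd_size for the brd module is given in KiB, so 2048000 KiB corresponds to 4096000 sectors, matching the limit shown. A minimal sketch of the arithmetic and of an overflow-safe bounds check follows; ramdisk_sectors and range_within_device are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* rd_size is given in KiB; convert it to 512-byte sectors. */
static uint64_t ramdisk_sectors(uint64_t rd_size_kib)
{
	return rd_size_kib * 1024u / SECTOR_SIZE;
}

/*
 * Returns 1 if the range [start, start + nr) lies within a device of
 * dev_sectors sectors. Written so that start + nr is never computed
 * and therefore cannot overflow.
 */
static int range_within_device(uint64_t start, uint64_t nr,
			       uint64_t dev_sectors)
{
	return nr <= dev_sectors && start <= dev_sectors - nr;
}
```

A request ending at sector 8452072 fails this check against a 4096000-sector device, whatever its length, which is the condition the log line reports.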

--- [2]

modprobe brd rd_size=2048000 (or boot with ramdisk_size=2048000)
mkfs.btrfs -m raid0 /dev/ram0 /dev/ram1
mount /dev/ram0 /mnt -o discard
cd /mnt && tar -xvzf linux.tar.gz
access beyond end of device and livelock
-- 
Daniel J Blueman


attempt to access beyond end of device and livelock

2012-03-25 Thread Daniel J Blueman
Hi Dongyang, Yan,

When testing BTRFS with RAID 0 metadata on linux-3.3, we see discard
ranges exceeding the end of the block device [1], potentially causing
data loss; when this occurs, filesystem writeback becomes catatonic due
to continual resubmission.

The reproducer is quite simple [2]. Hope this proves useful...

Thanks,
  Daniel

--- [1]

attempt to access beyond end of device
ram0: rw=129, want=8452072, limit=4096000
...

--- [2]

modprobe brd rd_size=2048000 (or boot with ramdisk_size=2048000)
mkfs.btrfs -m raid0 /dev/ram0 /dev/ram1
mount /dev/ram0 /mnt -o discard
fio testcase
umount /mnt

--- [3] testcase

[global]
directory=/mnt
rw=randread
size=256m
ioengine=libaio
iodepth=4
invalidate=1
direct=1

[bgwriter]
rw=randwrite
iodepth=32

[queryA]
iodepth=1
ioengine=mmap
thinktime=3

[queryB]
iodepth=1
ioengine=mmap
thinktime=1

[bgupdater]
rw=randrw
iodepth=16
thinktime=1
size=32m
-- 
Daniel J Blueman


Re: 'bad tree block start' mount failure...

2012-03-21 Thread Daniel J Blueman
On 21 March 2012 00:16, Andrea Gelmini andrea.gelm...@gmail.com wrote:
 2012/3/20 Daniel J Blueman dan...@quora.org:
 mkfs.btrfs -m raid0 -d raid0 /dev/sdb1 /dev/sdc1
 mount /dev/sdb1 /mnt
 umount /mnt
 mount /dev/sdb1 /mnt -o compress
 umount /mnt
 mount /dev/sdb1 /mnt -o ssd
 umount /mnt
 mount /dev/sdb1 /mnt -o discard
 umount /mnt
 mount /dev/sdb1 /mnt
 mount failure

 Well, I can't reproduce this. It's also true that I use some
 out-of-the-tree patches.
 I wrote this each step. They must be in a script?

I can reproduce this booting with the ubuntu 3.3 mainline kernel with
eg 'ramdisk_size=2048000' and then:

# mkfs.btrfs -m raid0 -d raid0 /dev/ram0 /dev/ram1
# mount /dev/ram0 /mnt
# umount /mnt
# mount /dev/ram0 /mnt -o compress
# umount /mnt
# mount /dev/ram0 /mnt -o ssd
# umount /mnt
# mount /dev/ram0 /mnt -o discard
# umount /mnt
# mount /dev/ram0 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/ram0,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

Thanks,
  Daniel
-- 
Daniel J Blueman


btrfs_search_slot BUG...

2012-03-08 Thread Daniel J Blueman
When testing out 16KB blocks with direct I/O [1] on 3.3-rc6, we
quickly see btrfs_search_slot returning positive numbers, popping an
assertion [2].

Are block sizes larger than 4KB known to be broken for now?

Thanks,
  Daniel

--- [1]

mkfs.btrfs -m raid1 -d raid1 -l 16k -n 16k /dev/sda /dev/sdb
mount /dev/sda /store && cd /store
fio /usr/share/doc/fio/examples/iometer-file-access-server

--- [2]

kernel BUG at /home/apw/COD/linux/fs/btrfs/extent-tree.c:1481!
-- 
Daniel J Blueman


[3.2.2] disk-io.c:413, extent-tree.c:1481, transactions.c:1220: bad luck

2012-01-29 Thread Daniel J Blueman
]
 [a06337eb] btrfs_finish_ordered_io+0x16b/0x340 [btrfs]
 [a0633a11] btrfs_writepage_end_io_hook+0x51/0xa0 [btrfs]
 [a064a4db] end_bio_extent_writepage+0x13b/0x180 [btrfs]
 [8164975b] ? schedule_timeout+0x18b/0x2e0
 [811b14ad] bio_endio+0x1d/0x40
 [a06283a4] end_workqueue_fn+0xf4/0x130 [btrfs]
 [a06578ac] worker_loop+0x15c/0x4c0 [btrfs]
 [a0657750] ? check_pending_worker_creates+0xf0/0xf0 [btrfs]
 [8108bb06] kthread+0x96/0xa0
 [816559b4] kernel_thread_helper+0x4/0x10
 [8108ba70] ? kthread_worker_fn+0x190/0x190
 [816559b0] ? gs_change+0x13/0x13
Code: 8b 75 20 48 89 c3 48 8b 7d 18 e8 82 c2 ff ff 48 39 d8 77 1f b8
1d 00 00 00 e9 0a ff ff ff a8 01 90 0f 85 74 fe ff ff 0f 0b eb fe 0f
0b eb fe 0f 0b eb fe 4c 89 fb 44 8b 7d ac 83 7d 30 00 41 be
RIP  [a061c3ca] lookup_inline_extent_backref+0x2ea/0x400 [btrfs]
 RSP 88021f7038d0
-- 
Daniel J Blueman


[PATCH] btrfs: fix cast address space annotation in ioctl.c

2011-06-23 Thread Daniel J Blueman
One of the casts in ioctl.c loses the __user annotation; cast so it is
correctly maintained.
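
As a rough illustration of what the annotation does, here is a user-space sketch. It assumes the usual arrangement where __user expands to a sparse attribute only when the checker is running and to nothing otherwise; the struct layouts and the helper name are hypothetical stand-ins, not the real ioctl definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: sparse defines __CHECKER__; a normal compiler sees an empty macro. */
#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
#else
# define __user
#endif

/* Hypothetical stand-ins for the ioctl header and payload structures. */
struct space_args { unsigned long long space_slots, total_spaces; };
struct space_info { unsigned long long flags, total_bytes, used_bytes; };

/*
 * Return the payload array that follows the header in the user buffer.
 * The cast keeps __user, so the checker still sees a user-space pointer.
 */
static struct space_info __user *user_info_array(void __user *arg)
{
	return (struct space_info __user *)
		((char __user *)arg + sizeof(struct space_args));
}
```

Dropping __user from the cast would compile identically, which is why only the checker notices the difference.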

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..79c32d8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2708,7 +2708,7 @@ long btrfs_ioctl_space_info(struct btrfs_root
*root, void __user *arg)
up_read(&info->groups_sem);
}

-   user_dest = (struct btrfs_ioctl_space_info *)
+   user_dest = (struct btrfs_ioctl_space_info __user *)
(arg + sizeof(struct btrfs_ioctl_space_args));

if (copy_to_user(user_dest, dest_orig, alloc_size))
-- 
Daniel J Blueman


[PATCH 3.0-rc3] btrfs: fix oops on failure path

2011-06-19 Thread Daniel J Blueman
I hit this BTRFS oops [1] in 3.0-rc3, clearly due to filesystem corruption.

If lookup_extent_backref fails, path->nodes[0] reasonably could be
null, so look before leaping [2].
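
The idea can be sketched in user space as follows. The struct definitions and the function name are hypothetical and much simpler than the kernel's; the point is only that the diagnostic path must not dereference a slot the failed lookup never filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct leaf { int nritems; };
struct path { struct leaf *nodes[8]; };

/*
 * Print the leaf only if the lookup left one in the path.
 * Returns 1 if something was printed, 0 if there was nothing to print.
 */
static int print_leaf_checked(const struct path *path)
{
	if (!path->nodes[0])
		return 0;
	printf("leaf with %d items\n", path->nodes[0]->nritems);
	return 1;
}
```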

Chris, if happy, can you squeeze this into the drop for -rc4 please?

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

--- [1]

leaf free space ret -1678719553, leaf data size 3995, used 1678723548 nritems 60
parent transid verify failed on 113373184 wanted 31 found 13951
leaf free space ret -1678719553, leaf data size 3995, used 1678723548 nritems 60
leaf free space ret -1678719553, leaf data size 3995, used 1678723548 nritems 60
leaf free space ret -1678719553, leaf data size 3995, used 1678723548 nritems 60
leaf free space ret -1678719553, leaf data size 3995, used 1678723548 nritems 60
BUG: unable to handle kernel NULL pointer dereference at 0030
IP: [8122d8e8] btrfs_print_leaf+0x28/0x810
PGD 206386067 PUD 20639e067 PMD 0
Oops:  [#1] SMP
CPU 2
Modules linked in: binfmt_misc kvm_intel kvm microcode arc4 uvcvideo
videodev v4l2_compat_ioctl32 i915 mei(C) iwlagn drm_kms_helper
mac80211 drm i2c_algo_bit video sdhci_pci sdhci mmc_core usb_storage

Pid: 1526, comm: rm Tainted: G C  3.0.0-rc3-340c+ #4 Dell Inc.
Latitude E5420/0H5TG2
RIP: 0010:[8122d8e8]  [8122d8e8] btrfs_print_leaf+0x28/0x810
RSP: 0018:8802063f7ab8  EFLAGS: 00010286
RAX: fffb RBX: 88022dc5de10 RCX: af74
RDX: 0008 RSI:  RDI: 880223f5b000
RBP: 8802063f7b48 R08: 81259152 R09: 0001
R10: fffb R11: 00020562a000 R12: 0005
R13: 8802063f7fd8 R14:  R15: 1000
FS:  7f95c55b3720() GS:88022ec4() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0030 CR3: 0002063ac000 CR4: 000406e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process rm (pid: 1526, threadinfo 8802063f6000, task 880210a15da0)
Stack:
 1000 8802063f7c00 06bb 8125a124
 880223f5b000 1000 8802063f7b48 88022dc5de10
 06bb 001000a8 8802063f7b00 880210bb8360
Call Trace:
 [8125a124] ? set_extent_dirty+0x24/0x30
 [812261f2] __btrfs_free_extent+0x672/0x720
 [8121bf60] ? btrfs_del_leaf+0xd0/0x100
 [81228ac9] run_clustered_refs+0x379/0x840
 [81279b00] ? btrfs_find_ref_cluster+0x60/0x190
 [81229050] btrfs_run_delayed_refs+0xc0/0x200
 [8123a558] __btrfs_end_transaction+0x88/0x250
 [8123a780] btrfs_end_transaction+0x10/0x20
 [81244420] btrfs_evict_inode+0x180/0x210
 [8110dd2b] evict+0x7b/0x150
 [8110df25] iput+0xd5/0x1a0
 [81103964] do_unlinkat+0x104/0x1d0
 [8112b88b] ? fsnotify_find_inode_mark+0x2b/0x40
 [810f4561] ? filp_close+0x61/0x90
 [81104c5d] sys_unlinkat+0x1d/0x40
 [8165f0fb] system_call_fastpath+0x16/0x1b
Code: 00 00 00 55 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 6d
e8 4c 89 65 e0 4c 89 75 f0 4c 89 7d f8 65 4c 8b 2c 25 c8 b5 00 00
 8b 46 30 49 81 ed d8 1f 00 00 48 89 f3 41 ff 45 1c 48 ba 00
RIP  [8122d8e8] btrfs_print_leaf+0x28/0x810
 RSP 8802063f7ab8
CR2: 0030

--- [2]

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b42efc2..1848f8f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4454,7 +4454,8 @@ static int __btrfs_free_extent(struct
btrfs_trans_handle *trans,
extent_slot = path->slots[0];
}
} else {
-   btrfs_print_leaf(extent_root, path->nodes[0]);
+   if (path->nodes[0])
+   btrfs_print_leaf(extent_root, path->nodes[0]);
WARN_ON(1);
printk(KERN_ERR "btrfs unable to find ref byte nr %llu "
       "parent %llu root %llu  owner %llu offset %llu\n",
-- 
Daniel J Blueman


Re: kernel BUG at fs/btrfs/inode.c:4676!

2011-06-09 Thread Daniel J Blueman
On 10 June 2011 09:57, Andy Lutomirski l...@mit.edu wrote:
 On 06/06/2011 06:19 AM, Marek Otahal wrote:

 Hello,
 the issue happens every time when i have to hard power-off my notebook
 (suspend problems).
 With kernel 2.6.39 the partition is unmountable, solution is to boot
 2.6.38 kernel which
 1/ is able to mount the partition,
 2/ by doing that fixes the problem so later .39 (after clean shutdown) can
 mount it also.

 Same problem here.  Mounting with 2.6.38 says:

 [   41.906259] Btrfs loaded
 [   41.906747] device fsid e040a9d60da49596-66c0275e348878bf devid 1 transid
 69217 /dev/mapper/vg_midnight_ssd-home
 [   41.908767] btrfs: disk space caching is enabled
 [   42.232185] btrfs: unlinked 17 orphans
 [   42.232189] btrfs: truncated 2 orphans

 dmesg in 2.6.39.1 says:
[]
 [   15.004255] kernel BUG at fs/btrfs/inode.c:4676!
[]

I've been experiencing the same issue also.

Josef/Chris, would a metadata snapshot or full block snapshot help
debug this regression? I can probably set up a small testcase to
trigger this.

Thanks,
  Daniel
-- 
Daniel J Blueman


[3.0-rc1] insert_dir_item hitting assertion during log replay

2011-05-31 Thread Daniel J Blueman
On 10 April 2011 16:29, Daniel J Blueman daniel.blue...@gmail.com wrote:
 When rebooting from a crash, thus during log replay on 2.6.39-rc2,
 btrfs_insert_dir_item caused an assertion failure [1]. The fs was
 being mounted clear_cache on an SSD.

On 3.0-rc1 with a fresh filesystem, after a few crashes with other
bugs, I tripped the assert at inode.c:4582 during log replay at mount
time, i.e. btrfs_insert_dir_item() is returning non-zero.

I have a metadata image captured from when this occurred in 2.6.39-rc2,
and have instrumented the upstream functions to locate where we're
failing should it happen again in a debug session. Is there anything
else we can do?

Thanks,
  Daniel

 --- [1] 2.6.39-rc2 trace

 kernel BUG at fs/btrfs/inode.c:4665!
 invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
 last sysfs file:
 /sys/devices/virtual/wmi/A80593CE-A997-11DA-B012-B622A1EF5492/uevent
 CPU 3
 Modules linked in: video sdhci_pci sdhci mmc_core

 Pid: 328, comm: mount Not tainted 2.6.39-rc2-350cd+ #1 Dell Inc.
 Latitude E5420/0H5TG2
 RIP: 0010:[812a2962]  [812a2962] 
 btrfs_add_link+0x132/0x190
 RSP: 0018:88021e1097d8  EFLAGS: 00010282
 RAX: ffef RBX: 88021d965f70 RCX: 0006
 RDX: ffef RSI: 88021efe4710 RDI: 88021efe4020
 RBP: 88021e109848 R08:  R09: 88022d7c03f0
 R10: 0001 R11: 0001 R12: 88021d966720
 R13: 88021e0261b0 R14: 000f R15: 88021d959000
 FS:  7fcee7b3d800() GS:88022ec6() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 7f5e5700 CR3: 00021e6ef000 CR4: 000406e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process mount (pid: 328, threadinfo 88021e108000, task 88021efe4020)
 Stack:
  88020001 0016 88021e109978 0016
  0010555e 0001 1000 
  88021e03a000  00b0 88021e109ae8
 Call Trace:
  [812ccb45] add_inode_ref+0x2f5/0x3b0
  [81058e61] ? get_parent_ip+0x11/0x50
  [812cdff6] replay_one_buffer+0x2c6/0x3a0
  [81099fd0] ? mark_held_locks+0x70/0xa0
  [81058e61] ? get_parent_ip+0x11/0x50
  [812ca978] walk_up_log_tree+0x168/0x320
  [812cdd30] ? replay_one_dir_item+0xe0/0xe0
  [812cb188] walk_log_tree+0xe8/0x290
  [8109a18d] ? trace_hardirqs_on+0xd/0x10
  [812d] btrfs_recover_log_trees+0x220/0x320
  [812cdd30] ? replay_one_dir_item+0xe0/0xe0
  [81295521] open_ctree+0x1301/0x16b0
  [81331ab4] ? snprintf+0x34/0x40
  [812701e3] btrfs_fill_super.clone.14+0x73/0x130
  [811a4aaf] ? disk_name+0x5f/0xc0
  [8132ef77] ? strlcpy+0x47/0x60
  [812705e0] btrfs_mount+0x340/0x3e0
  [81143e9b] mount_fs+0x1b/0xd0
  [8115fece] vfs_kern_mount+0x5e/0xd0
  [8116045f] do_kern_mount+0x4f/0x100
  [81161ea4] do_mount+0x1e4/0x220
  [8116228b] sys_mount+0x8b/0xe0
  [8170adfb] system_call_fastpath+0x16/0x1b
 Code: 4c 89 d2 44 89 f1 4c 89 ee 4c 89 1c 24 4c 89 55 a8 4c 89 5d a0
 e8 5f c6 fe ff 4c 8b 5d a0 4c 8b 55 a8 85 c0 75 bc e9 31 ff ff ff 0f
 0b 48 8b b2 d0 fc ff ff 48 8d 7d b0 b9 11 00 00 00 4d 89 d9
 RIP  [812a2962] btrfs_add_link+0x132/0x190
  RSP 88021e1097d8
-- 
Daniel J Blueman


[3.0-rc1] delayed insertion allocation failing...

2011-05-30 Thread Daniel J Blueman
Hi Miao,

When booting 3.0-rc1 with an existing BTRFS filesystem with a normal
desktop use pattern, we see btrfs_batch_insert_items() sometimes
attempt an overly-large kmalloc (>= order 11) [1], which
subsequently fails.
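
To put a number on "order 11", here is a small sketch of the size-to-order arithmetic. It assumes 4 KiB pages and a MAX_ORDER of 11; both constants and the function names are illustrative, and the loop only mirrors what the kernel's get_order() computes rather than reproducing it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER 11	/* assumption: default MAX_ORDER */

/* Smallest order such that (page size << order) covers `size` bytes. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* The page allocator refuses requests at or above the maximum order. */
static int allocation_too_large(size_t size)
{
	return order_for_size(size) >= SKETCH_MAX_ORDER;
}
```

Under these assumptions the largest request that can succeed is order 10, i.e. 4 MiB, so a single kmalloc of anything larger trips the warning in [1].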

Thanks,
  Daniel

--- [1]

WARNING: at mm/page_alloc.c:2074 __alloc_pages_nodemask+0x206/0x800()
Hardware name: Latitude E5420
Modules linked in: kvm_intel kvm microcode arc4 uvcvideo videodev
v4l2_compat_ioctl32 iwlagn mac80211 sdhci_pci sdhci mmc_core i915
drm_kms_helper drm mei(C) i2c_algo_bit video
Pid: 188, comm: btrfs-delayed-m Tainted: G C  3.0.0-rc1-100debug+ #1
Call Trace:
 [8105ceaa] warn_slowpath_common+0x7a/0xb0
 [8105cef5] warn_slowpath_null+0x15/0x20
 [810f0df6] __alloc_pages_nodemask+0x206/0x800
 [812bfa53] ? map_extent_buffer+0xd3/0xe0
 [812b3db4] ? btrfs_item_offset+0xe4/0xf0
 [81122bd0] alloc_pages_current+0xa0/0x110
 [810ec43f] __get_free_pages+0xf/0x50
 [8112e71a] __kmalloc+0x13a/0x160
 [8171ac00] ? _raw_spin_unlock+0x30/0x60
 [812e6050] btrfs_batch_insert_items+0x110/0x290
 [812e6326] btrfs_insert_delayed_items+0x156/0x170
 [812e674d] btrfs_async_run_delayed_node_done+0x6d/0x1f0
 [812c80f6] worker_loop+0x86/0x2b0
 [812c8070] ? check_pending_worker_creates.clone.4+0xe0/0xe0
 [8107dea6] kthread+0xb6/0xc0
 [8109882d] ? trace_hardirqs_on_caller+0x13d/0x180
 [8171c894] kernel_thread_helper+0x4/0x10
 [81048a47] ? finish_task_switch+0x77/0x100
 [8171afc4] ? retint_restore_args+0xe/0xe
 [8107ddf0] ? __init_kthread_worker+0x70/0x70
 [8171c890] ? gs_change+0xb/0xb
-- 
Daniel J Blueman


Re: very poor read / write performance compared to other FS's?

2011-05-12 Thread Daniel J Blueman
On 13 May 2011 07:15, Marek Fstump marekfst...@gmail.com wrote:
 On Thu, May 12, 2011 at 4:39 PM, Josef Bacik jo...@redhat.com wrote:
 On Wed, May 11, 2011 at 11:33:35PM +0100, Marek Fstump wrote:
 Hi

 I am very interested in using BTRFS for my solution but in basic tests
 it seems to be very poor on read and write performance.  I am
 surprised by this so suspect that maybe I am doing something
 incorrectly or that there are updates I should be using, but I am not
 sure how I update BTRFS on SLES11

 Summary:
 RESULTS on link below
 SLES11 SP1
 Compared Sequential read/write performance against XFS and OCFS2
 Backend storage – FusionIO SLC SSD = circa 750MBsec

 Tests  set as follows:
 Filesystem contains 30 x 4GB files (made of random data)
 Read tests will read from 1 to 30 files concurrently
 Write tests will write 1 to 30 concurrent NEW files (simple 000’s)
 dd -direct flag used on writes

 All defaults used for mounting etc.

 Results shown in attachment.

 BTRFS looks an excellent FS and perfect for my application and I am
 hoping that there are some factors that I am missing
 and would appreciate any advice / help


 Yeah our O_DIRECT performance is less than stellar, I just did a bunch of 
 work
 to try and help us get a little better performance.  Would you mind pulling
 down linus's git tree and testing on that and seeing if you get better
 performance?  Thanks,

 Josef


 Hi Josef

 Forgive me as i am a 'storage guy' - so when you say pull down linus's
 git tree and test do you mean grab the latest kernel?  i know very
 stupid question, but just want to make sure i get it right... if so,
 then yes i will and i will add some more storage power also to see if
 it scales.

For SLES 11, the kernel RPMs here may be your best shot:

http://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/

(eg kernel-default-2.6.38.6-1.1.x86_64.rpm)

You'll probably have to download dependent RPMs from there too.

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: [PATCH, fixed] Prevent oopsing in posix_acl_valid()

2011-05-09 Thread Daniel J Blueman
Hi Chris,

On 4 May 2011 22:40, Josef Bacik jo...@redhat.com wrote:
 On 05/03/2011 10:54 PM, Daniel J Blueman wrote:

 If posix_acl_from_xattr() returns an error code, a negative address is
 dereferenced causing an oops; fix by checking for an error code first.

 Typo fixed; too much late-night coding.

 Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com
 ---
  fs/btrfs/acl.c |    5 +++--
  1 files changed, 3 insertions(+), 2 deletions(-)

 diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
 index 5d505aa..44ea5b9 100644
 --- a/fs/btrfs/acl.c
 +++ b/fs/btrfs/acl.c
 @@ -178,12 +178,13 @@ static int btrfs_xattr_acl_set(struct dentry
 *dentry, const char *name,

        if (value) {
                acl = posix_acl_from_xattr(value, size);
 +               if (IS_ERR(acl))
 +                       return PTR_ERR(acl);
 +
                if (acl) {
                        ret = posix_acl_valid(acl);
                        if (ret)
                                goto out;
 -               } else if (IS_ERR(acl)) {
 -                       return PTR_ERR(acl);
                }
        }


 Actually pulled this down and compiled it this time to make sure it worked.
  You can add

 Reviewed-by: Josef Bacik jo...@redhat.com

Will this fix go upstream for the final 2.6.39, now that the last -rc
is already out? I hit it in two independent cases when rebooting after
other kernel crashes.

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: [2.6.29-rc2] insert_dir_item hitting assertion during log replay

2011-05-08 Thread Daniel J Blueman
On 12 April 2011 00:07, Daniel J Blueman daniel.blue...@gmail.com wrote:
 On 11 April 2011 23:32, Josef Bacik jo...@redhat.com wrote:
 On 04/10/2011 04:29 AM, Daniel J Blueman wrote:

 When rebooting from a crash, thus during log replay on 2.6.39-rc2,
 btrfs_insert_dir_item caused an assertion failure [1]. The fs was
 being mounted clear_cache on an SSD.

 Probably it's not so easy to reproduce, but better to report it...


 Do you still have this fs, and does it still panic the same way on mount?
  Thanks,

 I still have this fs, though it didn't panic at next mount. I guess
 this creates a case for cooking a script that eg logically disconnects
 a block device during activity (hdparm or echo 1 delete) then
 reconnects it for remount...let me know if interested.

I've hit this a few times recently following a crash in 2.6.39-rc (eg
with -rc6 [1]) and have found the only way to access the data is mount
-o ro,notreelog.

I guess btrfs_insert_dir_item is failing due to corruption of the
directory inode. The only solution here would be to gracefully discard
the log item being replayed and print a warning that the filesystem
has corruption, right?

Daniel

--- [1]

kernel BUG at fs/btrfs/inode.c:4676!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-2/uevent
CPU 1
Modules linked in: binfmt_misc kvm_intel kvm arc4 ecb uvcvideo
videodev v4l2_compat_ioctl32 microcode i915 iwlagn sdhci_pci sdhci
drm_kms_helper mac80211 drm i2c_algo_bit mmc_core video

Pid: 1372, comm: mount Tainted: G   M2.6.39-rc6-330cd+ #3 Dell
Inc. Latitude E5420/0H5TG2
RIP: 0010:[812a3262]  [812a3262] btrfs_add_link+0x132/0x190
RSP: 0018:8802102fd7b8  EFLAGS: 00010282
RAX: ffef RBX: 880212594860 RCX: 0040
RDX: ffef RSI:  RDI: 8112e413
RBP: 8802102fd828 R08:  R09: 88022d732090
R10: 0025 R11:  R12: 8802124cd010
R13: 88020dc48000 R14: 000f R15: 88021dcbc000
FS:  7f58dc76b800() GS:88022ec2() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f7a5adeb110 CR3: 00020e2e9000 CR4: 000406e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process mount (pid: 1372, threadinfo 8802102fc000, task 88021e5ec020)
Stack:
 88020001 1ab8 8802102fd958 1ab8
 0015d5c1 0001 1000 
 88020dc25000  007e 8802102fdac8
Call Trace:
 [812cd595] add_inode_ref+0x2f5/0x3b0
 [81059261] ? get_parent_ip+0x11/0x50
 [812cea46] replay_one_buffer+0x2c6/0x3a0
 [8112d126] ? init_object+0x46/0x80
 [81059261] ? get_parent_ip+0x11/0x50
 [812cb3c8] walk_up_log_tree+0x168/0x320
 [812ce780] ? replay_one_dir_item+0xe0/0xe0
 [812cbbd8] walk_log_tree+0xe8/0x290
 [8109a59d] ? trace_hardirqs_on+0xd/0x10
 [812d0a50] btrfs_recover_log_trees+0x220/0x320
 [812ce780] ? replay_one_dir_item+0xe0/0xe0
 [81295ca1] open_ctree+0x1301/0x16b0
 [81332504] ? snprintf+0x34/0x40
 [81270873] btrfs_fill_super.clone.14+0x73/0x130
 [811a4ebf] ? disk_name+0x5f/0xc0
 [8132f9c7] ? strlcpy+0x47/0x60
 [81270cdf] btrfs_mount+0x3af/0x450
 [811442eb] mount_fs+0x1b/0xd0
 [8116027e] vfs_kern_mount+0x5e/0xd0
 [8116080f] do_kern_mount+0x4f/0x100
 [81162264] do_mount+0x1e4/0x220
 [8116264b] sys_mount+0x8b/0xe0
 [8170927b] system_call_fastpath+0x16/0x1b
Code: 4c 89 d2 44 89 f1 4c 89 ee 4c 89 1c 24 4c 89 55 a8 4c 89 5d a0
e8 df c4 fe ff 4c 8b 5d a0 4c 8b 55 a8 85 c0 75 bc e9 31 ff ff ff 0f
0b 48 8b b2 d0 fc ff ff 48 8d 7d b0 b9 11 00 00 00 4d 89 d9
RIP  [812a3262] btrfs_add_link+0x132/0x190
 RSP 8802102fd7b8
-- 
Daniel J Blueman


Re: abysmal performance

2011-05-03 Thread Daniel J Blueman
On 3 May 2011 19:30, Bernhard Schmidt be...@birkenwald.de wrote:
[]
 The way the defrag ioctl works is that it schedules things for defrag
 but doesn't force out the IO immediately unless you use -f.

 So, to test the result of the defrag, you need to either wait a bit or
 run sync.

 Did so, no change. See my reply to cwillu for the data.

Can you try with the compression option enabled? Eg:

# filefrag foo.dat
foo.dat: 11 extents found

# find . -xdev -type f -print0 | xargs -0 btrfs filesystem defragment -c

# filefrag foo.dat
foo.dat: 1 extent found

Seems to work fine on 2.6.39-rc5; I mounted with '-o
compress,clear_cache' though.

Daniel
-- 
Daniel J Blueman


[PATCH] Prevent oopsing in posix_acl_valid()

2011-05-03 Thread Daniel J Blueman
If posix_acl_from_xattr() returns an error code, a negative address is
dereferenced causing an oops; fix by checking for error code first.

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com
---
 fs/btrfs/acl.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 5d505aa..cad6fbb 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -178,12 +178,13 @@ static int btrfs_xattr_acl_set(struct dentry *dentry, 
const char *name,
 
if (value) {
acl = posix_acl_from_xattr(value, size);
+   if (IS_ERR(acl)
+   return PTR_ERR(acl);
+
if (acl) {
ret = posix_acl_valid(acl);
if (ret)
goto out;
-   } else if (IS_ERR(acl)) {
-   return PTR_ERR(acl);
}
}
 
-- 
1.7.4.1



[PATCH, fixed] Prevent oopsing in posix_acl_valid()

2011-05-03 Thread Daniel J Blueman
If posix_acl_from_xattr() returns an error code, a negative address is
dereferenced causing an oops; fix by checking for an error code first.

Typo fixed; too much late-night coding.
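
For anyone unfamiliar with the error-pointer convention, the ordering issue can be sketched in user space. The helpers below are simplified re-implementations written for this example (the kernel's live in linux/err.h), and struct acl, acl_valid() and check_acl() are hypothetical stand-ins. An encoded error is a non-NULL pointer, so a plain "if (acl)" test lets it through to code that dereferences it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space versions of the error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct acl { int count; };	/* hypothetical stand-in */

/* Stand-in for a validator that reads through the pointer. */
static int acl_valid(const struct acl *acl)
{
	return acl->count > 0 ? 0 : -EINVAL;
}

/* Patched ordering: rule out an encoded error before the NULL test. */
static int check_acl(const struct acl *acl)
{
	if (IS_ERR(acl))
		return (int)PTR_ERR(acl);
	if (acl)
		return acl_valid(acl);
	return 0;
}
```

With the old ordering, the IS_ERR() branch sat in the else of "if (acl)" and so could never be reached for an error pointer.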

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com
---
 fs/btrfs/acl.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 5d505aa..44ea5b9 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -178,12 +178,13 @@ static int btrfs_xattr_acl_set(struct dentry *dentry, 
const char *name,
 
if (value) {
acl = posix_acl_from_xattr(value, size);
+   if (IS_ERR(acl))
+   return PTR_ERR(acl);
+
if (acl) {
ret = posix_acl_valid(acl);
if (ret)
goto out;
-   } else if (IS_ERR(acl)) {
-   return PTR_ERR(acl);
}
}
 
-- 
1.7.4.1



Re: [PATCH v3 1/3] btrfs: move btrfs_cmp_device_free_bytes to super.c

2011-05-02 Thread Daniel J Blueman
On 2 May 2011 16:47, Arne Jansen sensi...@gmx.net wrote:
 this function won't be used here anymore, so move it to super.c where it is
 used for df-calculation

 Signed-off-by: Arne Jansen sensi...@gmx.net
 ---
  fs/btrfs/super.c   |   25 +
  fs/btrfs/volumes.c |   13 -
  fs/btrfs/volumes.h |   15 ---
  3 files changed, 25 insertions(+), 28 deletions(-)

 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 0ac712e..d8c9a49 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -913,6 +913,31 @@ static int btrfs_remount(struct super_block *sb, int 
 *flags, char *data)
        return 0;
  }

 +/* Used to sort the devices by max_avail(descending sort) */
 +int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2)
 +{
 +       if (((struct btrfs_device_info *)dev_info1)->max_avail >
 +           ((struct btrfs_device_info *)dev_info2)->max_avail)
 +               return -1;
 +       else if (((struct btrfs_device_info *)dev_info1)->max_avail <
 +                ((struct btrfs_device_info *)dev_info2)->max_avail)
 +               return 1;
 +       else
 +       return 0;
 +}
 +
 +/*
 + * sort the devices by max_avail, in which max free extent size of each 
 device
 + * is stored.(Descending Sort)
 + */
 +static inline void btrfs_descending_sort_devices(
 +                                       struct btrfs_device_info *devices,
 +                                       size_t nr_devices)
 +{
 +       sort(devices, nr_devices, sizeof(struct btrfs_device_info),
 +            btrfs_cmp_device_free_bytes, NULL);
 +}
 +
  /*
  * The helper to calc the free space on the devices that can be used to store
  * file data.
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 8b9fb8c..a9f1fc2 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -2282,19 +2282,6 @@ static noinline u64 chunk_bytes_by_type(u64 type, u64 
 calc_size,
                return calc_size * num_stripes;
  }

 -/* Used to sort the devices by max_avail(descending sort) */
 -int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2)
 -{
 -       if (((struct btrfs_device_info *)dev_info1)->max_avail >
 -           ((struct btrfs_device_info *)dev_info2)->max_avail)
 -               return -1;
 -       else if (((struct btrfs_device_info *)dev_info1)->max_avail <
 -                ((struct btrfs_device_info *)dev_info2)->max_avail)
 -               return 1;
 -       else
 -               return 0;
 -}
 -
  static int __btrfs_calc_nstripes(struct btrfs_fs_devices *fs_devices, u64 
 type,
                                 int *num_stripes, int *min_stripes,
                                 int *sub_stripes)
 diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
 index cc2eada..b502f01 100644
 --- a/fs/btrfs/volumes.h
 +++ b/fs/btrfs/volumes.h
 @@ -157,21 +157,6 @@ struct map_lookup {
        struct btrfs_bio_stripe stripes[];
  };

 -/* Used to sort the devices by max_avail(descending sort) */
 -int btrfs_cmp_device_free_bytes(const void *dev_info1, const void 
 *dev_info2);
 -
 -/*
 - * sort the devices by max_avail, in which max free extent size of each 
 device
 - * is stored.(Descending Sort)
 - */
 -static inline void btrfs_descending_sort_devices(
 -                                       struct btrfs_device_info *devices,
 -                                       size_t nr_devices)
 -{
 -       sort(devices, nr_devices, sizeof(struct btrfs_device_info),
 -            btrfs_cmp_device_free_bytes, NULL);
 -}
 -
  int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
                                   u64 end, u64 *length);


btrfs_cmp_device_free_bytes() can be marked static, since there are no
users outside the compilation unit.
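
Roughly what I have in mind, as a user-space sketch: qsort() stands in for the kernel's sort(), and struct device_info and the function names are cut-down, hypothetical versions of the ones in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down version of the per-device record. */
struct device_info { uint64_t max_avail; };

/* static: only the sort helper in this file needs to see the comparator. */
static int cmp_device_free_bytes(const void *a, const void *b)
{
	const struct device_info *x = a;
	const struct device_info *y = b;

	if (x->max_avail > y->max_avail)
		return -1;	/* larger max_avail sorts first */
	if (x->max_avail < y->max_avail)
		return 1;
	return 0;
}

/* Sort devices by max_avail, largest first. */
void descending_sort_devices(struct device_info *devices, size_t nr_devices)
{
	qsort(devices, nr_devices, sizeof(*devices), cmp_device_free_bytes);
}
```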

Daniel
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] mark internal functions static (fixed)

2011-04-14 Thread Daniel J Blueman
This didn't make it in before, so updating to 2.6.39-rc3 and resending:

Prevent needless exporting of internal functions from compilation units by 
marking them static.

---
 fs/btrfs/ctree.c   |2 +-
 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/extent-tree.c |4 ++--
 fs/btrfs/inode.c   |2 +-
 fs/btrfs/ioctl.c   |2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 84d7ca1..6581b37 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -74,7 +74,7 @@ noinline void btrfs_set_path_blocking(struct btrfs_path *p)
  * retake all the spinlocks in the path.  You can safely use NULL
  * for held
  */
-noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
+static noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
struct extent_buffer *held)
 {
int i;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8f1d44b..59f2567 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2313,7 +2313,7 @@ static int write_dev_supers(struct btrfs_device *device,
return errors < i ? 0 : -1;
 }
 
-int write_all_supers(struct btrfs_root *root, int max_mirrors)
+static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
struct list_head *head;
struct btrfs_device *dev;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..5394255 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -75,7 +75,7 @@ static int block_group_bits(struct btrfs_block_group_cache 
*cache, u64 bits)
return (cache->flags & bits) == bits;
 }
 
-void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
+static void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
 {
atomic_inc(&cache->count);
 }
@@ -3586,7 +3586,7 @@ static void block_rsv_add_bytes(struct btrfs_block_rsv 
*block_rsv,
spin_unlock(&block_rsv->lock);
 }
 
-void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
+static void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
 struct btrfs_block_rsv *dest, u64 num_bytes)
 {
struct btrfs_space_info *space_info = block_rsv->space_info;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5cc64ab..e5e0cf7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5305,7 +5305,7 @@ out:
return em;
 }
 
-struct extent_map *btrfs_get_extent_fiemap(struct inode *inode, struct page 
*page,
+static struct extent_map *btrfs_get_extent_fiemap(struct inode *inode, struct 
page *page,
   size_t pg_offset, u64 start, u64 len,
   int create)
 {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cfc264f..de76a6d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2281,7 +2281,7 @@ static void get_block_group_info(struct list_head 
*groups_list,
}
 }
 
-long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 {
struct btrfs_ioctl_space_args space_args;
struct btrfs_ioctl_space_info space;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix address space annotation (fixed)

2011-04-14 Thread Daniel J Blueman
Fix address space annotation.

---
 fs/btrfs/ioctl.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index de76a6d..2b1e53e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2287,7 +2287,7 @@ static long btrfs_ioctl_space_info(struct btrfs_root 
*root, void __user *arg)
struct btrfs_ioctl_space_info space;
struct btrfs_ioctl_space_info *dest;
struct btrfs_ioctl_space_info *dest_orig;
-   struct btrfs_ioctl_space_info *user_dest;
+   struct btrfs_ioctl_space_info __user *user_dest;
struct btrfs_space_info *info;
u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
   BTRFS_BLOCK_GROUP_SYSTEM,
@@ -2387,7 +2387,7 @@ static long btrfs_ioctl_space_info(struct btrfs_root 
*root, void __user *arg)
up_read(&info->groups_sem);
}
 
-   user_dest = (struct btrfs_ioctl_space_info *)
+   user_dest = (struct btrfs_ioctl_space_info __user *)
(arg + sizeof(struct btrfs_ioctl_space_args));
 
if (copy_to_user(user_dest, dest_orig, alloc_size))
-- 
1.7.1



[PATCH] mark internal functions static

2011-04-11 Thread Daniel J Blueman
Hi Chris,

This didn't make it in before, so updating to 2.6.39-rc2 and resending:

Prevent needless exporting of internal functions from compilation
units by marking them static.

Signed-off-by: Daniel J Blueman <daniel.blue...@gmail.com>

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 84d7ca1..6581b37 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -74,7 +74,7 @@ noinline void btrfs_set_path_blocking(struct btrfs_path *p)
  * retake all the spinlocks in the path.  You can safely use NULL
  * for held
  */
-noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
+static noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
struct extent_buffer *held)
 {
int i;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8f1d44b..59f2567 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2313,7 +2313,7 @@ static int write_dev_supers(struct btrfs_device *device,
return errors < i ? 0 : -1;
 }

-int write_all_supers(struct btrfs_root *root, int max_mirrors)
+static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
struct list_head *head;
struct btrfs_device *dev;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..5394255 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -75,7 +75,7 @@ static int block_group_bits(struct
btrfs_block_group_cache *cache, u64 bits)
return (cache->flags & bits) == bits;
 }

-void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
+static void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
 {
atomic_inc(&cache->count);
 }
@@ -3586,7 +3586,7 @@ static void block_rsv_add_bytes(struct
btrfs_block_rsv *block_rsv,
spin_unlock(&block_rsv->lock);
 }

-void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
+static void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
 struct btrfs_block_rsv *dest, u64 num_bytes)
 {
struct btrfs_space_info *space_info = block_rsv->space_info;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6541339..6370184 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5305,7 +5305,7 @@ out:
return em;
 }

-struct extent_map *btrfs_get_extent_fiemap(struct inode *inode,
struct page *page,
+static struct extent_map *btrfs_get_extent_fiemap(struct inode
*inode, struct page *page,
   size_t pg_offset, u64 start, u64 len,
   int create)
 {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cfc264f..de76a6d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2281,7 +2281,7 @@ static void get_block_group_info(struct
list_head *groups_list,
}
 }

-long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 {
struct btrfs_ioctl_space_args space_args;
struct btrfs_ioctl_space_info space;
-- 
Daniel J Blueman
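For readers unfamiliar with the effect of the patch above, here is a minimal stand-alone sketch (plain user-space C, not kernel code) of what marking a function static does. The names clamp_mirrors and count_supers are made up for illustration and do not appear in btrfs.

```c
/*
 * Internal helper: 'static' gives it internal linkage, so the symbol is
 * not visible to other compilation units and cannot clash with them.
 */
static int clamp_mirrors(int max_mirrors)
{
	return max_mirrors < 1 ? 1 : max_mirrors;
}

/* Public entry point: external linkage, callable from other files. */
int count_supers(int max_mirrors)
{
	return clamp_mirrors(max_mirrors);
}
```

Another file can call count_supers(), but a reference to clamp_mirrors() from another file fails at link time. That is the intent of the patch: helpers used inside a single compilation unit stop occupying the global symbol namespace, and the compiler is free to inline or drop them.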


[PATCH] fix user annotation in ioctl.c

2011-04-11 Thread Daniel J Blueman
Correct the address space annotation in ioctl.c.

Signed-off-by: Daniel J Blueman <daniel.blue...@gmail.com>

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index cfc264f..0474ec3 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2287,7 +2287,7 @@ long btrfs_ioctl_space_info(struct btrfs_root
*root, void __user *arg)
struct btrfs_ioctl_space_info space;
struct btrfs_ioctl_space_info *dest;
struct btrfs_ioctl_space_info *dest_orig;
-   struct btrfs_ioctl_space_info *user_dest;
+   struct btrfs_ioctl_space_info __user *user_dest;
struct btrfs_space_info *info;
u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
   BTRFS_BLOCK_GROUP_SYSTEM,
@@ -2387,7 +2387,7 @@ long btrfs_ioctl_space_info(struct btrfs_root
*root, void __user *arg)
up_read(&info->groups_sem);
}

-   user_dest = (struct btrfs_ioctl_space_info *)
+   user_dest = (struct btrfs_ioctl_space_info __user *)
(arg + sizeof(struct btrfs_ioctl_space_args));

if (copy_to_user(user_dest, dest_orig, alloc_size))
-- 
Daniel J Blueman
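A rough user-space sketch of why the annotation matters. It assumes sparse-style definitions of __user and __force similar to those in the kernel's compiler.h of that era; struct space_info_demo, fake_copy_to_user and fill_user_dest are hypothetical stand-ins, not btrfs or kernel APIs. With a regular compiler both macros expand to nothing; under sparse (which defines __CHECKER__), a cast that silently drops the __user qualifier is reported, which is what the patch avoids by keeping the qualifier on user_dest.

```c
#include <stddef.h>
#include <string.h>

/* Assumed to mirror the kernel's sparse annotations; no-ops otherwise. */
#ifdef __CHECKER__
# define __user  __attribute__((noderef, address_space(1)))
# define __force __attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical stand-in for struct btrfs_ioctl_space_info. */
struct space_info_demo {
	unsigned long long flags;
	unsigned long long total_bytes;
	unsigned long long used_bytes;
};

/* Stand-in for copy_to_user(): returns the number of bytes NOT copied. */
static unsigned long fake_copy_to_user(void __user *to, const void *from,
				       unsigned long n)
{
	/* __force tells the checker this address-space cast is deliberate. */
	memcpy((void __force *)to, from, n);
	return 0;
}

/*
 * Copy 'count' records into the user buffer, just past a header of
 * 'header_size' bytes. The destination pointer keeps its __user
 * qualifier through the cast, as in the patch.
 */
int fill_user_dest(void __user *arg, size_t header_size,
		   const struct space_info_demo *src, size_t count)
{
	struct space_info_demo __user *user_dest;

	user_dest = (struct space_info_demo __user *)
		((char __user *)arg + header_size);

	if (fake_copy_to_user(user_dest, src, count * sizeof(*src)))
		return -1;
	return 0;
}
```

The generated code is identical with or without the annotation; the qualifier only exists so that a static checker can tell user pointers from kernel pointers.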


Re: [2.6.29-rc2] insert_dir_item hitting assertion during log replay

2011-04-11 Thread Daniel J Blueman
On 11 April 2011 23:32, Josef Bacik <jo...@redhat.com> wrote:
> On 04/10/2011 04:29 AM, Daniel J Blueman wrote:
>>
>> When rebooting from a crash, thus during log replay on 2.6.39-rc2,
>> btrfs_insert_dir_item caused an assertion failure [1]. The fs was
>> being mounted clear_cache on an SSD.
>>
>> Probably it's not so easy to reproduce, but better to report it...
>>
>
> Do you still have this fs, and does it still panic the same way on mount?
> Thanks,

I still have this fs, though it didn't panic at the next mount. I guess
this makes a case for cooking up a script that e.g. logically disconnects
a block device during activity (hdparm or echo 1 > delete) then
reconnects it for remount... let me know if interested.

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: [PATCH] mark internal functions static

2011-04-11 Thread Daniel J Blueman
On 11 April 2011 23:45, Josef Bacik <jo...@redhat.com> wrote:
> On 04/11/2011 11:40 AM, Daniel J Blueman wrote:
>>
>> Hi Chris,
>>
>> This didn't make it in before, so updating to 2.6.39-rc2 and resending:
>>
>> Prevent needless exporting of internal functions from compilation
>> units by marking them static.
>>
>
> Looks like you have line wrapping on or something, the page looks mangled.
> Thanks,

The only way I can solve this in gmail webmail is by attaching the patch.

Is this acceptable? If the mailing list strips attachments, I guess
sending the patch both inline and attached may be a get-out-of-jail...

Daniel
-- 
Daniel J Blueman


[2.6.29-rc2] insert_dir_item hitting assertion during log replay

2011-04-10 Thread Daniel J Blueman
When rebooting from a crash, thus during log replay on 2.6.39-rc2,
btrfs_insert_dir_item caused an assertion failure [1]. The fs was
being mounted clear_cache on an SSD.

Probably it's not so easy to reproduce, but better to report it...

--- [1]

kernel BUG at fs/btrfs/inode.c:4665!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file:
/sys/devices/virtual/wmi/A80593CE-A997-11DA-B012-B622A1EF5492/uevent
CPU 3
Modules linked in: video sdhci_pci sdhci mmc_core

Pid: 328, comm: mount Not tainted 2.6.39-rc2-350cd+ #1 Dell Inc.
Latitude E5420/0H5TG2
RIP: 0010:[812a2962]  [812a2962] btrfs_add_link+0x132/0x190
RSP: 0018:88021e1097d8  EFLAGS: 00010282
RAX: ffef RBX: 88021d965f70 RCX: 0006
RDX: ffef RSI: 88021efe4710 RDI: 88021efe4020
RBP: 88021e109848 R08:  R09: 88022d7c03f0
R10: 0001 R11: 0001 R12: 88021d966720
R13: 88021e0261b0 R14: 000f R15: 88021d959000
FS:  7fcee7b3d800() GS:88022ec6() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7f5e5700 CR3: 00021e6ef000 CR4: 000406e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process mount (pid: 328, threadinfo 88021e108000, task 88021efe4020)
Stack:
 88020001 0016 88021e109978 0016
 0010555e 0001 1000 
 88021e03a000  00b0 88021e109ae8
Call Trace:
 [812ccb45] add_inode_ref+0x2f5/0x3b0
 [81058e61] ? get_parent_ip+0x11/0x50
 [812cdff6] replay_one_buffer+0x2c6/0x3a0
 [81099fd0] ? mark_held_locks+0x70/0xa0
 [81058e61] ? get_parent_ip+0x11/0x50
 [812ca978] walk_up_log_tree+0x168/0x320
 [812cdd30] ? replay_one_dir_item+0xe0/0xe0
 [812cb188] walk_log_tree+0xe8/0x290
 [8109a18d] ? trace_hardirqs_on+0xd/0x10
 [812d] btrfs_recover_log_trees+0x220/0x320
 [812cdd30] ? replay_one_dir_item+0xe0/0xe0
 [81295521] open_ctree+0x1301/0x16b0
 [81331ab4] ? snprintf+0x34/0x40
 [812701e3] btrfs_fill_super.clone.14+0x73/0x130
 [811a4aaf] ? disk_name+0x5f/0xc0
 [8132ef77] ? strlcpy+0x47/0x60
 [812705e0] btrfs_mount+0x340/0x3e0
 [81143e9b] mount_fs+0x1b/0xd0
 [8115fece] vfs_kern_mount+0x5e/0xd0
 [8116045f] do_kern_mount+0x4f/0x100
 [81161ea4] do_mount+0x1e4/0x220
 [8116228b] sys_mount+0x8b/0xe0
 [8170adfb] system_call_fastpath+0x16/0x1b
Code: 4c 89 d2 44 89 f1 4c 89 ee 4c 89 1c 24 4c 89 55 a8 4c 89 5d a0
e8 5f c6 fe ff 4c 8b 5d a0 4c 8b 55 a8 85 c0 75 bc e9 31 ff ff ff 0f
0b 48 8b b2 d0 fc ff ff 48 8d 7d b0 b9 11 00 00 00 4d 89 d9
RIP  [812a2962] btrfs_add_link+0x132/0x190
 RSP 88021e1097d8
-- 
Daniel J Blueman


2.6.29-rc2 oops and assertion failure...

2011-04-07 Thread Daniel J Blueman
 [812762d5] ? btrfs_alloc_path+0x15/0x30
 [812a1640] btrfs_truncate_inode_items+0x110/0x770
 [810506b1] ? get_parent_ip+0x11/0x50
 [817094d0] ? _raw_spin_unlock+0x30/0x60
 [812a21fb] btrfs_evict_inode+0x18b/0x200
 [8115b511] evict+0x81/0x180
 [8115b9c6] iput_final+0xe6/0x1a0
 [8115bab6] iput+0x36/0x50
 [811672de] writeback_sb_inodes+0x12e/0x1d0
 [81167e9b] writeback_inodes_wb+0x7b/0x180
 [8116825b] wb_writeback+0x2bb/0x320
 [8115c882] ? get_nr_inodes+0x62/0xb0
 [811684dc] wb_do_writeback+0x21c/0x230
 [81168582] bdi_writeback_thread+0x92/0x180
 [811684f0] ? wb_do_writeback+0x230/0x230
 [81080596] kthread+0xb6/0xc0
 [8109629d] ? trace_hardirqs_on_caller+0x14d/0x190
 [8170b154] kernel_thread_helper+0x4/0x10
 [81055718] ? finish_task_switch+0x78/0x110
 [81709884] ? retint_restore_args+0xe/0xe
 [810804e0] ? __init_kthread_worker+0x70/0x70
 [8170b150] ? gs_change+0xb/0xb
Code: ff ff e8 79 bf 42 00 e9 ae fe ff ff eb 02 90 90 e8 6b bf 42 00
eb 01 90 e9 33 fe ff ff 48 83 be 47 01 00 00 f7 0f 85 c2 fd ff ff 0f
0b eb fe 48 3b 50 20 0f 84 04 ff ff ff 0f 0b eb fe 83 7d c4
RIP  [812da5ab] btrfs_reloc_cow_block+0x28b/0x2c0
 RSP 8803057817f0
---[ end trace a7919e7f17c0a728 ]---
-- 
Daniel J Blueman


Re: 2.6.29-rc2 oops and assertion failure...

2011-04-07 Thread Daniel J Blueman
Hi Josef, Chris,

On 8 April 2011 00:23, Josef Bacik <jo...@redhat.com> wrote:
> On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
>>
>> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
>> an older (extent refcounting) issue, I am consistently able to hit an
>> oops [] and an assertion failure [].
>
> Sorry about that, please apply the patch I just sent this morning
>
> [PATCH] Btrfs: deal with the case that we run out of space in the cache

Superb work - the btrfs_write_out_cache oops is addressed, so now we
(separately) hit a few other assertions at: volumes.c:2013 [1],
volumes.c:2063 [2] and volumes.c:2703 [3] with the previous
reproducer.

Let me know if adding any debugging or other testing may be useful.

Thanks,
  Daniel

--- [1]

kernel BUG at fs/btrfs/volumes.c:2013!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: ppp_generic slhc tun brd loop

Pid: 17040, comm: btrfs Tainted: GW   2.6.39-rc2-350cd+ #3
Supermicro X8STi/X8STi
RIP: 0010:[812c214b]  [812c214b] btrfs_balance+0x27b/0x280
RSP: 0018:88015c923e08  EFLAGS: 00010282
RAX: fffb RBX: 880301d6e1b0 RCX: 0040
RDX: fffb RSI:  RDI: 8112e425
RBP: 88015c923e88 R08:  R09: 8802f8ee53f0
R10: 0012 R11: 0098 R12: 8802f909a490
R13: 8802f909bc38 R14: 1000 R15: 7fffd1599ce0
FS:  7f3c4b6f4740() GS:88031fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00f00098 CR3: 00015c921000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 17040, threadinfo 88015c922000, task 88030b898000)
Stack:
 880307cd5498 880301d6c120 88015c923e38 81085b9e
 880308a5d700 0008 88015c923f48 81031d5c
 ea000a9e7b40 88015c923f58 88030b898000 88015c8aa300
Call Trace:
 [81085b9e] ? up_read+0x1e/0x40
 [81031d5c] ? do_page_fault+0x1cc/0x440
 [812c9ec0] btrfs_ioctl+0x450/0x590
 [81152e8d] do_vfs_ioctl+0x8d/0x330
 [81141444] ? fget_light+0x274/0x3c0
 [81106cc0] ? __do_fault+0x150/0x5d0
 [8115317a] sys_ioctl+0x4a/0x80
 [8170a03b] system_call_fastpath+0x16/0x1b
Code: 81 c7 d8 22 00 00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0
eb d2 85 c0 74 a7 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f
0b eb fe 90 55 48 89 e5 48 83 ec 40 8b 05 e2 62 72 00 4c 89
RIP  [812c214b] btrfs_balance+0x27b/0x280
 RSP 88015c923e08

--- [2]

kernel BUG at fs/btrfs/volumes.c:2063!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: brd loop

Pid: 13460, comm: btrfs Tainted: GW   2.6.39-rc2-350cd+ #3
Supermicro X8STi/X8STi
RIP: 0010:[812c213b]  [812c213b] btrfs_balance+0x26b/0x280
RSP: 0018:8800b1827e08  EFLAGS: 00010282
RAX: fffb RBX: 88030934d168 RCX: 0006
RDX: fffb RSI: 880308fc06f0 RDI: 880308fc
RBP: 8800b1827e88 R08:  R09: 
R10:  R11:  R12: 8802ff5455e8
R13: 8800b1827e38 R14: 00010d56 R15: 8800b1827e18
FS:  7fce737e5740() GS:88031fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 02371688 CR3: b1ff8000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs (pid: 13460, threadinfo 8800b1826000, task 880308fc)
Stack:
 0100 88030934e1b0 0100 010d56e4
 880308837a00 0008 0100 0113bbe4
 880308fc0600 8800b1827f58 880308fc 8801f8c56c00
Call Trace:
 [812c9ec0] btrfs_ioctl+0x450/0x590
 [81152e8d] do_vfs_ioctl+0x8d/0x330
 [8114148f] ? fget_light+0x2bf/0x3c0
 [8109629d] ? trace_hardirqs_on_caller+0x14d/0x190
 [8115317a] sys_ioctl+0x4a/0x80
 [8170a03b] system_call_fastpath+0x16/0x1b
Code: 7c 90 fb ff 48 8b 55 88 48 8b ba 58 01 00 00 48 81 c7 d8 22 00
00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0 eb d2 85 c0 74 a7 0f
0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 90
RIP  [812c213b] btrfs_balance+0x26b/0x280
 RSP 8800b1827e08

--- [3]

kernel BUG at fs/btrfs/volumes.c:2703!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-3/uevent
CPU 0
Modules linked in: brd loop

Pid: 14333, comm: btrfs-delalloc- Tainted: GW
2.6.39-rc2-350cd+ #3 Supermicro X8STi/X8STi
RIP: 0010:[812c08c2

2.6.39-rc2 filesystem balance lock ordering...

2011-04-06 Thread Daniel J Blueman
 [812c14ed] btrfs_relocate_chunk+0x6d/0x3b0
 [8105584d] ? sub_preempt_count+0x9d/0xd0
 [812b5b2e] ? unmap_extent_buffer+0xe/0x40
 [812ab4a5] ? btrfs_dev_extent_chunk_offset+0xe5/0xf0
 [812c1a4a] btrfs_shrink_device+0x21a/0x3d0
 [812c1fdb] btrfs_balance+0x10b/0x280
 [81085b9e] ? up_read+0x1e/0x40
 [81031d5c] ? do_page_fault+0x1cc/0x440
 [812c9ec0] btrfs_ioctl+0x450/0x590
 [81152e8d] do_vfs_ioctl+0x8d/0x330
 [8114148f] ? fget_light+0x2bf/0x3c0
 [8109629d] ? trace_hardirqs_on_caller+0x14d/0x190
 [8115317a] sys_ioctl+0x4a/0x80
 [81709ffb] system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman


[2.6.39-rc1] extent reference leaking...

2011-03-30 Thread Daniel J Blueman
When running the Linux Test Project against a BTRFS RAID 1 array,
after some time I see BTRFS trying to free an extent that still has
state [1].

Let me know if anyone is interested in a more specific reproducer and
I'll take a look.

Daniel

--- [1]

WARNING: at fs/btrfs/extent_io.c:3371 free_extent_buffer+0x31/0x40()
Hardware name: X8STi
Modules linked in: tun microcode loop raid10 raid456 async_memcpy
async_pq async_xor xor async_raid6_recov raid6_pq async_tx raid1 raid0
multipath linear md_mod
Pid: 14202, comm: ftest08 Tainted: GW   2.6.39-rc1-350cd #1
Call Trace:
 [8105f81a] warn_slowpath_common+0x7a/0xb0
 [8105f865] warn_slowpath_null+0x15/0x20
 [812b6711] free_extent_buffer+0x31/0x40
 [8127ad40] btrfs_lock_root_node+0x20/0x50
 [8127b530] btrfs_search_slot+0x400/0x790
 [81275eb5] ? btrfs_alloc_path+0x15/0x30
 [81275eb5] ? btrfs_alloc_path+0x15/0x30
 [8128e053] btrfs_lookup_csums_range+0x83/0x4b0
 [812b56be] ? unmap_extent_buffer+0xe/0x40
 [812a8744] ? btrfs_file_extent_compression+0xe4/0xf0
 [812cd325] copy_items+0x315/0x3f0
 [812cf03a] btrfs_log_inode+0x3ea/0x520
 [812ce3bb] ? start_log_trans+0x6b/0x150
 [812cf463] btrfs_log_inode_parent+0x193/0x2e0
 [81156424] ? dget_parent+0x94/0x100
 [811563a7] ? dget_parent+0x17/0x100
 [812cf664] btrfs_log_dentry_safe+0x44/0x70
 [812a5afb] btrfs_sync_file+0xeb/0x190
 [8116bb43] vfs_fsync_range+0x73/0x90
 [81141470] ? fget_raw+0x260/0x260
 [8116bbb7] vfs_fsync+0x17/0x20
 [8116bbf5] do_fsync+0x35/0x60
 [8116bc4b] sys_fsync+0xb/0x10
 [816edf3b] system_call_fastpath+0x16/0x1b
---[ end trace a7919e7f17c0a728 ]---
-- 
Daniel J Blueman


2.6.38 defragment compression oops...

2011-03-24 Thread Daniel J Blueman
I found that I'm able to provoke undefined behaviour with 2.6.38 with
extent defragmenting + recompression, eg:

mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt
cp -xa / /mnt
find /mnt -print0 | xargs -0 btrfs filesystem defragment -vc

After a short time, I was seeing what looked like a secondary effect
[1]. Reproducing with lock instrumentation reported recursive spinlock
acquisition, probably a false-positive from the locking scheme not
being annotated, so better report it now.

Daniel

--- [1]

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [a00e23cb] write_extent_buffer+0xbb/0x1b0 [btrfs]
PGD 0
Oops:  [#1] SMP
last sysfs file: /sys/devices/pci:00/:00:1e.0/:06:04.0/local_cpus
CPU 1
Modules linked in: microcode psmouse serio_raw ioatdma i7core_edac
joydev lp edac_core dca parport raid10 raid456 async_raid6_recov
async_pq usbhid hid raid6_pq async_xor xor async_memcpy async_tx raid1
raid0 multipath linear ahci btrfs zlib_deflate libahci e1000e
libcrc32c

Pid: 1119, comm: btrfs-delalloc- Tainted: GW
2.6.38-020638-generic #201103151303 Supermicro X8STi/X8STi
RIP: 0010:[a00e23cb]  [a00e23cb]
write_extent_buffer+0xbb/0x1b0 [btrfs]
RSP: 0018:880303a0bbc0  EFLAGS: 00010a86
RAX: db74 RBX: 0d26 RCX: 8800
RDX:  RSI: 0002fa19 RDI: 88023c8353f8
RBP: 880303a0bc00 R08: 0001 R09: 
R10:  R11: 0017 R12: db738800
R13: 028c R14: 880303a0bfd8 R15: 
FS:  () GS:8800df48() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2:  CR3: 01a03000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs-delalloc- (pid: 1119, threadinfo 880303a0a000, task
8803046cad80)
Stack:
 880280e63cc0 8802fd10ad26 0001 880303a0a000
 ea000a75ba30 0fb2 08f7 02da
 880303a0bcb0 a00c5bb0 002e0001 
Call Trace:
 [a00c5bb0] insert_inline_extent+0x330/0x350 [btrfs]
 [a00c5cf6] cow_file_range_inline+0x126/0x160 [btrfs]
 [a00c68f0] compress_file_range+0x3b0/0x580 [btrfs]
 [a00c6af5] async_cow_start+0x35/0x50 [btrfs]
 [a00eac0c] worker_loop+0xac/0x260 [btrfs]
 [a00eab60] ? worker_loop+0x0/0x260 [btrfs]
 [81086317] kthread+0x97/0xa0
 [8100ce24] kernel_thread_helper+0x4/0x10
 [81086280] ? kthread+0x0/0xa0
 [8100ce20] ? kernel_thread_helper+0x0/0x10
Code: 16 00 00 48 8d 04 0a 48 b9 b7 6d db b6 6d db b6 6d 48 c1 f8 03
48 0f af c1 48 b9 00 00 00 00 00 88 ff ff 48 c1 e0 0c 4c 8d 24 08 48
8b 02 a8 08 0f 85 9c 00 00 00 be cb 0e 00 00 48 c7 c7 b8 7c
RIP  [a00e23cb] write_extent_buffer+0xbb/0x1b0 [btrfs]
 RSP 880303a0bbc0
CR2: 
---[ end trace a7919e7f17c0a728 ]---
note: btrfs-delalloc-exited with preempt_count 1
-- 
Daniel J Blueman


2.6.38 fs balance lock ordering...

2011-03-24 Thread Daniel J Blueman
While doing a filesystem balance, lockdep detected a potential lock
ordering issue [1].

Thanks,
  Daniel

--- [1]

===
[ INFO: possible circular locking dependency detected ]
2.6.38.1-341cd+ #10
---
btrfs/1101 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [812cddb9]
prealloc_file_extent_cluster+0x59/0x180

but task is already holding lock:
 (&fs_info->cleaner_mutex){+.+.+.}, at: [812cfcb7]
btrfs_relocate_block_group+0x197/0x2d0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&fs_info->cleaner_mutex){+.+.+.}:
   [8109628a] lock_acquire+0x5a/0x70
   [816c9cde] mutex_lock_nested+0x5e/0x390
   [812828e1] btrfs_commit_super+0x21/0xe0
   [812857a2] close_ctree+0x332/0x3a0
   [8125fd08] btrfs_put_super+0x18/0x30
   [8113ae7d] generic_shutdown_super+0x6d/0xf0
   [8113af91] kill_anon_super+0x11/0x60
   [8113b6b5] deactivate_locked_super+0x45/0x60
   [8113c2b5] deactivate_super+0x45/0x60
   [81158729] mntput_no_expire+0x99/0xf0
   [8115996c] sys_umount+0x7c/0x3c0
   [81002dfb] system_call_fastpath+0x16/0x1b

-> #1 (&type->s_umount_key#24){++}:
   [8109628a] lock_acquire+0x5a/0x70
   [816ca372] down_read+0x42/0x60
   [8115e935] writeback_inodes_sb_nr_if_idle+0x35/0x60
   [812723ae] shrink_delalloc+0xee/0x180
   [81273253] btrfs_delalloc_reserve_metadata+0x163/0x180
   [812732ab] btrfs_delalloc_reserve_space+0x3b/0x60
   [8129563d] btrfs_file_aio_write+0x61d/0x9c0
   [81137f12] do_sync_write+0xd2/0x110
   [81138a88] vfs_write+0xc8/0x190
   [81138c3c] sys_write+0x4c/0x90
   [81002dfb] system_call_fastpath+0x16/0x1b

-> #0 (&sb->s_type->i_mutex_key#12){+.+.+.}:
   [810961a8] __lock_acquire+0x1ba8/0x1c30
   [8109628a] lock_acquire+0x5a/0x70
   [816c9cde] mutex_lock_nested+0x5e/0x390
   [812cddb9] prealloc_file_extent_cluster+0x59/0x180
   [812ce0a1] relocate_file_extent_cluster+0x91/0x380
   [812ce44b] relocate_data_extent+0xbb/0xd0
   [812cf843] relocate_block_group+0x323/0x600
   [812cfcc8] btrfs_relocate_block_group+0x1a8/0x2d0
   [812b09c3] btrfs_relocate_chunk+0x83/0x600
   [812b160d] btrfs_balance+0x20d/0x280
   [812b8b86] btrfs_ioctl+0x1b6/0xa80
   [8114a43d] do_vfs_ioctl+0x9d/0x590
   [8114a97a] sys_ioctl+0x4a/0x80
   [81002dfb] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

2 locks held by btrfs/1101:
 #0:  (&fs_info->volume_mutex){+.+.+.}, at: [812b148b]
btrfs_balance+0x8b/0x280
 #1:  (&fs_info->cleaner_mutex){+.+.+.}, at: [812cfcb7]
btrfs_relocate_block_group+0x197/0x2d0

stack backtrace:
Pid: 1101, comm: btrfs Tainted: GW   2.6.38.1-341cd+ #10
Call Trace:
 [810937fb] ? print_circular_bug+0xeb/0xf0
 [810961a8] ? __lock_acquire+0x1ba8/0x1c30
 [812a5fd1] ? map_private_extent_buffer+0xe1/0x210
 [812cddb9] ? prealloc_file_extent_cluster+0x59/0x180
 [8109628a] ? lock_acquire+0x5a/0x70
 [812cddb9] ? prealloc_file_extent_cluster+0x59/0x180
 [810565f5] ? add_preempt_count+0x75/0xd0
 [816c9cde] ? mutex_lock_nested+0x5e/0x390
 [812cddb9] ? prealloc_file_extent_cluster+0x59/0x180
 [81125fa3] ? init_object+0x43/0x80
 [81051121] ? get_parent_ip+0x11/0x50
 [812cddb9] ? prealloc_file_extent_cluster+0x59/0x180
 [812ce0a1] ? relocate_file_extent_cluster+0x91/0x380
 [812ce44b] ? relocate_data_extent+0xbb/0xd0
 [812cf843] ? relocate_block_group+0x323/0x600
 [812cfcc8] ? btrfs_relocate_block_group+0x1a8/0x2d0
 [812b09c3] ? btrfs_relocate_chunk+0x83/0x600
 [812a62d2] ? read_extent_buffer+0xf2/0x230
 [8126c286] ? btrfs_search_slot+0x886/0xa90
 [8105654d] ? sub_preempt_count+0x9d/0xd0
 [812a62d2] ? read_extent_buffer+0xf2/0x230
 [812b160d] ? btrfs_balance+0x20d/0x280
 [812b8b86] ? btrfs_ioctl+0x1b6/0xa80
 [8103146c] ? do_page_fault+0x1cc/0x440
 [8114a43d] ? do_vfs_ioctl+0x9d/0x590
 [8113943f] ? fget_light+0x1df/0x3c0
 [8114a97a] ? sys_ioctl+0x4a/0x80
 [81002dfb] ? system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman
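To make the report above easier to read, here is a toy sketch of the kind of check lockdep performs, greatly simplified: it records "lock B was taken while holding lock A" edges and refuses a new edge that would close a cycle. This is an illustration in user-space C under my reading of the traces (i_mutex is held when s_umount is taken in the write path, and s_umount is held when cleaner_mutex is taken at umount); the enum and function names are made up and are not lockdep's API.

```c
#include <stdbool.h>

/* Lock classes named after those in the report. */
enum lock_class { I_MUTEX, S_UMOUNT, CLEANER_MUTEX, MAX_LOCK_CLASSES };

/* edge[a][b] is true once some task acquired b while holding a. */
static bool edge[MAX_LOCK_CLASSES][MAX_LOCK_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCK_CLASSES; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'taken' while holding 'held'".
 * Returns false, without recording anything, if 'held' is already
 * reachable from 'taken', i.e. the new edge would close a cycle.
 */
bool record_dependency(int held, int taken)
{
	bool seen[MAX_LOCK_CLASSES] = { false };

	if (reachable(taken, held, seen))
		return false;
	edge[held][taken] = true;
	return true;
}
```

Feeding it the chain from the report, record_dependency(I_MUTEX, S_UMOUNT) and record_dependency(S_UMOUNT, CLEANER_MUTEX) are accepted, while record_dependency(CLEANER_MUTEX, I_MUTEX), the balance path, is rejected. No deadlock has to occur for the cycle to be noticed, which is why lockdep reports it as "possible".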


[PATCH] btrfs: fix dip leak

2011-03-09 Thread Daniel J Blueman
2010/11/22 Miao Xie <mi...@cn.fujitsu.com>:
> bio_endio() will free dip and dip->csums, so dip and dip->csums will be
> freed twice. Fix it.
>
> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
> ---
>  fs/btrfs/inode.c |    9 +++--
>  1 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 558cac2..5a5edc7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5731,7 +5731,7 @@ static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
>
>         ret = btrfs_bio_wq_end_io(root->fs_info, bio, 0);
>         if (ret)
> -               goto out_err;
> +               goto free_ordered;
>
>         if (write && !skip_sum) {
>                 ret = btrfs_wq_submit_bio(BTRFS_I(inode)->root->fs_info,
> @@ -5740,7 +5740,7 @@ static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
>                                    __btrfs_submit_bio_start_direct_io,
>                                    __btrfs_submit_bio_done);
>                 if (ret)
> -                       goto out_err;
> +                       goto free_ordered;
>                 return;
>         } else if (!skip_sum)
>                 btrfs_lookup_bio_sums_dio(root, inode, bio,
> @@ -5748,11 +5748,8 @@ static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
>
>         ret = btrfs_map_bio(root, rw, bio, 0, 1);
>         if (ret)
> -               goto out_err;
> +               goto free_ordered;
>         return;
> -out_err:
> -       kfree(dip->csums);
> -       kfree(dip);
>  free_ordered:
>         /*
>          * If this is a write, we need to clean up the reserved space and kill

The previous patch is broken and leaks dip when the dip->csums allocation
fails; bio->bi_end_io isn't set at the point where the free_ordered
branch is consequently taken, thus bio_endio doesn't call the function
which would free it in the normal case. Fix.

Signed-off-by: Daniel J Blueman <daniel.blue...@gmail.com>

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0efdb65..53f4c8e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6056,6 +6056,7 @@ static void btrfs_submit_direct(int rw, struct
bio *bio, struct inode *inode,
if (!skip_sum) {
dip->csums = kmalloc(sizeof(u32) * bio->bi_vcnt, GFP_NOFS);
if (!dip->csums) {
+   kfree(dip);
ret = -ENOMEM;
goto free_ordered;
}
-- 
Daniel J Blueman
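The ownership rule behind this fix can be shown in a small user-space sketch: when a second allocation fails before any completion handler has taken ownership of the first object, the error path has to free that object itself. Everything below is hypothetical scaffolding (demo_alloc, demo_free, struct dio_private_demo and the failure-injection knob are made up); it only mirrors the shape of the kernel code, not its API.

```c
#include <errno.h>
#include <stdlib.h>

/* Demo allocator with failure injection, for illustration only. */
static int live_allocs;            /* allocations not yet freed */
static int allocs_until_fail = -1; /* negative: never fail */

static void *demo_alloc(size_t size)
{
	void *p;

	if (allocs_until_fail == 0)
		return NULL;
	if (allocs_until_fail > 0)
		allocs_until_fail--;
	p = malloc(size);
	if (p)
		live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Stand-in for the dip structure in the patch. */
struct dio_private_demo {
	unsigned int *csums;
};

/* Returns 0 and stores the new object in *out, or -ENOMEM on failure. */
int dio_private_create(struct dio_private_demo **out, size_t nr_vecs)
{
	struct dio_private_demo *dip = demo_alloc(sizeof(*dip));

	if (!dip)
		return -ENOMEM;

	dip->csums = demo_alloc(sizeof(*dip->csums) * nr_vecs);
	if (!dip->csums) {
		/* Nothing else owns dip yet, so it must be freed here. */
		demo_free(dip);
		return -ENOMEM;
	}

	*out = dip;
	return 0;
}

void dio_private_destroy(struct dio_private_demo *dip)
{
	demo_free(dip->csums);
	demo_free(dip);
}
```

Setting allocs_until_fail to 1 makes the second allocation fail; with the demo_free(dip) call in place live_allocs returns to 0, and without it one allocation stays outstanding, which is the leak the patch closes.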


[2.6.38-rc6, patch] fix delayed_refs locking on error path...

2011-03-01 Thread Daniel J Blueman
Correctly unlock delayed_refs in the error case.

Signed-off-by: Daniel J Blueman <daniel.blue...@gmail.com>

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1aa8d6..c48d699 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2787,6 +2787,7 @@ static int btrfs_destroy_delayed_refs(struct
btrfs_transaction *trans,
spin_lock(&delayed_refs->lock);
if (delayed_refs->num_entries == 0) {
printk(KERN_INFO "delayed_refs has NO entry\n");
+   spin_unlock(&delayed_refs->lock);
return ret;
}

-- 
Daniel J Blueman
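The bug class fixed above (returning early while still holding a lock) can be reproduced in a few lines of user-space C. This is a sketch under stated assumptions: toy_spinlock is a C11 atomic_flag spinlock standing in for the kernel's spinlock_t, and ref_queue with its helpers is invented for the example.

```c
#include <stdatomic.h>

/* Toy spinlock standing in for the kernel's spinlock_t. */
struct toy_spinlock {
	atomic_flag held;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		; /* busy-wait until the holder clears the flag */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct ref_queue {
	struct toy_spinlock lock;
	unsigned long num_entries;
};

void ref_queue_init(struct ref_queue *q, unsigned long num_entries)
{
	atomic_flag_clear(&q->lock.held);
	q->num_entries = num_entries;
}

/* Drop all queued entries; returns how many were dropped. */
unsigned long ref_queue_destroy(struct ref_queue *q)
{
	unsigned long dropped = 0;

	toy_spin_lock(&q->lock);
	if (q->num_entries == 0) {
		/* Early return: the lock must be released here too. */
		toy_spin_unlock(&q->lock);
		return dropped;
	}
	while (q->num_entries > 0) {
		q->num_entries--;
		dropped++;
	}
	toy_spin_unlock(&q->lock);
	return dropped;
}

/* Returns 1 if the lock was free (and leaves it free), 0 if still held. */
int ref_queue_lock_is_free(struct ref_queue *q)
{
	if (atomic_flag_test_and_set(&q->lock.held))
		return 0;
	atomic_flag_clear(&q->lock.held);
	return 1;
}
```

Without the unlock in the empty-queue branch, the next toy_spin_lock() on the same queue would spin forever; in the kernel the equivalent mistake leaves the spinlock held with preemption disabled.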


[2.6.38-rc6, patch] mark some internal functions static...

2011-03-01 Thread Daniel J Blueman
Prevent needless exporting of internal functions from compilation
units by marking them static.

Signed-off-by: Daniel J Blueman <daniel.blue...@gmail.com>

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b5baff0..5e49196 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -74,7 +74,7 @@ noinline void btrfs_set_path_blocking(struct btrfs_path *p)
  * retake all the spinlocks in the path.  You can safely use NULL
  * for held
  */
-noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
+static noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
struct extent_buffer *held)
 {
int i;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e1aa8d6..c48d699 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2279,7 +2279,7 @@ static int write_dev_supers(struct btrfs_device *device,
return errors < i ? 0 : -1;
 }

-int write_all_supers(struct btrfs_root *root, int max_mirrors)
+static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
struct list_head *head;
struct btrfs_device *dev;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f3c96fc..1961081 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -77,7 +77,7 @@ static int block_group_bits(struct
btrfs_block_group_cache *cache, u64 bits)
return (cache->flags & bits) == bits;
 }

-void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
+static void btrfs_get_block_group(struct btrfs_block_group_cache *cache)
 {
atomic_inc(&cache->count);
 }
@@ -3576,7 +3576,7 @@ static void block_rsv_add_bytes(struct
btrfs_block_rsv *block_rsv,
spin_unlock(&block_rsv->lock);
 }

-void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
+static void block_rsv_release_bytes(struct btrfs_block_rsv *block_rsv,
 struct btrfs_block_rsv *dest, u64 num_bytes)
 {
struct btrfs_space_info *space_info = block_rsv->space_info;
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a039065..ec5015c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1371,7 +1371,7 @@ out:
return ret;
 }

-bool try_merge_free_space(struct btrfs_block_group_cache *block_group,
+static bool try_merge_free_space(struct btrfs_block_group_cache *block_group,
  struct btrfs_free_space *info, bool update_stat)
 {
struct btrfs_free_space *left_info;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index be2d4f6..7b97854 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2193,7 +2193,7 @@ static void get_block_group_info(struct
list_head *groups_list,
}
 }

-long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 {
struct btrfs_ioctl_space_args space_args;
struct btrfs_ioctl_space_info space;
-- 
Daniel J Blueman


Re: [2.6.38-rc6] create-rebalance-mount crash...

2011-02-25 Thread Daniel J Blueman
On 24 February 2011 20:48, liubo <liubo2...@cn.fujitsu.com> wrote:
> On 02/24/2011 04:13 PM, Daniel J Blueman wrote:
>> When creating a filesystem (single or redundant) with BTRFS and
>> subsequently executing a balance [1], we see a kernel oops at the next
>> mount [2].
>>
>
> Hi, Daniel,
>
> After digging into this, I've come up with a patch; would you please test
> it on your box? Hope this is helpful, thanks.
>
> From: Liu Bo <liubo2...@cn.fujitsu.com>
>
> [PATCH] btrfs: fix OOPS of empty filesystem after balance
>
> btrfs will exclude unused block groups via a thread.
> When an empty filesystem is balanced, the block group with tag DATA may be
> dropped, and after umount, this will lead to an OOPS when we mount it again.

[snip]

Thanks, Bo; the patch addresses the oops.

Daniel

Reported-by: Daniel J Blueman <daniel.blue...@gmail.com>
Tested-by: Daniel J Blueman <daniel.blue...@gmail.com>
-- 
Daniel J Blueman


[2.6.38-rc1] btrfs potential false-positive lockdep report...

2011-01-19 Thread Daniel J Blueman
I saw a lockdep report with an instrumented 2.6.38-rc1 kernel [1].

Checking the code, it looks more likely a false-positive due to the
lock manipulation to satisfy lockdep, since CONFIG_DEBUG_LOCK_ALLOC is
defined.

Is this the case?

Thanks,
  Daniel

--- [1]

=============================================
[ INFO: possible recursive locking detected ]
2.6.38-rc1-340cd+ #7
---------------------------------------------
gnome-screensav/4276 is trying to acquire lock:
 (&(&eb->lock)->rlock){+.+...}, at: [81301078] btrfs_try_spin_lock+0x58/0x100

but task is already holding lock:
 (&(&eb->lock)->rlock){+.+...}, at: [8130113d] btrfs_clear_lock_blocking+0x1d/0x30

other info that might help us debug this:
2 locks held by gnome-screensav/4276:
 #0:  (&sb->s_type->i_mutex_key#10){+.+.+.}, at: [81178fb1] do_lookup+0x1c1/0x2c0
 #1:  (&(&eb->lock)->rlock){+.+...}, at: [8130113d] btrfs_clear_lock_blocking+0x1d/0x30

stack backtrace:
Pid: 4276, comm: gnome-screensav Not tainted 2.6.38-rc1-340cd+ #7
Call Trace:
 [810a7a10] ? __lock_acquire+0x1040/0x1d10
 [810a42ed] ? trace_hardirqs_on_caller+0x15d/0x1b0
 [8100bdfc] ? native_sched_clock+0x2c/0x80
 [8100bc33] ? sched_clock+0x13/0x20
 [810a87a6] ? lock_acquire+0xc6/0x280
 [81301078] ? btrfs_try_spin_lock+0x58/0x100
 [816d31cb] ? _raw_spin_lock+0x3b/0x70
 [81301078] ? btrfs_try_spin_lock+0x58/0x100
 [8130113d] ? btrfs_clear_lock_blocking+0x1d/0x30
 [81301078] ? btrfs_try_spin_lock+0x58/0x100
 [812b4c27] ? btrfs_search_slot+0x917/0xa10
 [8100bc33] ? sched_clock+0x13/0x20
 [812c520a] ? btrfs_lookup_dir_item+0x7a/0x110
 [810a434d] ? trace_hardirqs_on+0xd/0x10
 [811563d3] ? kmem_cache_alloc+0x163/0x2f0
 [812db274] ? btrfs_lookup_dentry+0xa4/0x480
 [810a2a4e] ? put_lock_stats+0xe/0x40
 [810a2b2c] ? lock_release_holdtime+0xac/0x150
 [81055ce1] ? get_parent_ip+0x11/0x50
 [8105a1fd] ? sub_preempt_count+0x9d/0xd0
 [812db661] ? btrfs_lookup+0x11/0x30
 [81178b10] ? d_alloc_and_lookup+0x40/0x80
 [81186420] ? d_lookup+0x30/0x60
 [81178fd3] ? do_lookup+0x1e3/0x2c0
 [8117832e] ? generic_permission+0x1e/0xb0
 [8117b5a1] ? link_path_walk+0x141/0xbd0
 [8117af68] ? path_init_rcu+0x1b8/0x280
 [8117c326] ? do_path_lookup+0x56/0x130
 [8117d022] ? user_path_at+0x52/0xa0
 [8109375e] ? up_read+0x1e/0x40
 [810335c8] ? do_page_fault+0x1f8/0x510
 [8109543d] ? sched_clock_cpu+0xdd/0x120
 [81171ff1] ? vfs_fstatat+0x41/0x80
 [810a2a4e] ? put_lock_stats+0xe/0x40
 [810a2b2c] ? lock_release_holdtime+0xac/0x150
 [81172066] ? vfs_stat+0x16/0x20
 [8117223f] ? sys_newstat+0x1f/0x50
 [810a42ed] ? trace_hardirqs_on_caller+0x15d/0x1b0
 [816d2f39] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [81003192] ? system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman


Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Daniel J Blueman
Hi Heinz,

On 5 December 2010 14:33, Heinz Diehl h...@fritha.org wrote:
 On 05.12.2010, Theodore Tso wrote:

 As another thought, what version of GCC are people using who
 are having difficulty? Could this perhaps be a compiler-related issue?

 h...@liesel:~ gcc -v
 Using built-in specs.
 Target: x86_64-suse-linux
 Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
 --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada
 --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3
 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/
 --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap
 --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit
 --enable-libstdcxx-allocator=new --disable-libstdcxx-pch
 --enable-version-specific-runtime-libs --program-suffix=-4.3
 --enable-linux-futex --without-system-libunwind --with-cpu=generic
 --build=x86_64-suse-linux
 Thread model: posix
 gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)

A bit late to the party, but does memtest86 pass over multiple iterations?

Also, can you send the output of 'sudo lspci -vv', so we can check for
known-buggy controllers and bridges?

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: ls flush-btrfs-1 sit at 100% sys

2010-11-18 Thread Daniel J Blueman
On 18 November 2010 06:03, Brian Sullivan bexam...@gmail.com wrote:
 Nothing shows up in dmesg.

 [ 8114.870020] ls            R  running task        0  3438   3375 0x0004
 [ 8114.870020]  88036339dab8 0086 88036339da60
 88036339dfd8
 [ 8114.870020]  000139c0  88036339dfd8
 88036339dfd8
 [ 8114.870020]  000139c0 88034f670398 88034f6703a0
 88034f67
 [ 8114.870020] Call Trace:
 [ 8114.870020]  [8159f7b4] ? schedule+0x224/0x660
 [ 8114.870020]  [815a01de] schedule_timeout+0x19e/0x2e0
 [ 8114.870020]  [81057690] enqueue_task_fair+0x50/0x60
 [ 8114.870020]  [8105d550] enqueue_task+0x70/0xd0
 [ 8114.870020]  [8105e9be] ? try_to_wake_up+0x18e/0x3f0
 [ 8114.870020]  [8105ec20] ? default_wake_function+0x0/0x20
 [ 8114.870020]  [815a0196] ? schedule_timeout+0x156/0x2e0
 [ 8114.870020]  [81181399] ? 
 writeback_inodes_sb_nr_if_idle+0x49/0x70
 [ 8114.870020]  [a0e84607] ? shrink_delalloc+0x127/0x170 [btrfs]
 [ 8114.870020]  [a0e84727] ? reserve_metadata_bytes+0xd7/0x1f0 
 [btrfs]
 [ 8114.870020]  [a0e84913] ? btrfs_block_rsv_add+0x43/0x60 [btrfs]
 [ 8114.870020]  [81085e00] ? autoremove_wake_function+0x0/0x40
 [ 8114.870020]  [a0e8498b] ?
 btrfs_trans_reserve_metadata+0x5b/0xa0 [btrfs]
 [ 8114.870020]  [a0e9a0be] ? start_transaction+0xbe/0x210 [btrfs]
 [ 8114.870020]  [8116fa80] ? filldir+0x0/0xf0
 [ 8114.870020]  [a0e9a423] ? btrfs_start_transaction+0x13/0x20 
 [btrfs]
 [ 8114.870020]  [a0e9d3e8] ? btrfs_dirty_inode+0x98/0x120 [btrfs]
 [ 8114.870020]  [8116fa80] ? filldir+0x0/0xf0
 [ 8114.870020]  [81182d9a] ? __mark_inode_dirty+0x3a/0x200
 [ 8114.870020]  [811754f4] ? touch_atime+0xf4/0x100
 [ 8114.870020]  [8116f92c] ? vfs_readdir+0xcc/0xd0
 [ 8114.870020]  [8116f9ba] ? sys_getdents+0x8a/0xe0
 [ 8114.870020]  [815a2515] ? page_fault+0x25/0x30
 [ 8114.870020]  [8100c132] ? system_call_fastpath+0x16/0x1b
 [ 8114.870020] flush-btrfs-1 R  running task        0  3439      2 0x
 [ 8114.870020]  8803631c5e60 0046 8803631c5e00
 
 [ 8114.870020]  8803631c5e20 000139c0 8803631c5fd8
 8803631c5fd8
 [ 8114.870020]  000139c0 8803507fd9b0 8803631c5e60
 880362205b00
 [ 8114.870020] Call Trace:
 [ 8114.870020]  [81182b00] ? bdi_writeback_thread+0x0/0x260
 [ 8114.870020]  [81182bb3] bdi_writeback_thread+0xb3/0x260
 [ 8114.870020]  [81182b00] ? bdi_writeback_thread+0x0/0x260
 [ 8114.870020]  [81085727] kthread+0x97/0xa0
 [ 8114.870020]  [8100cf24] kernel_thread_helper+0x4/0x10
 [ 8114.870020]  [81085690] ? kthread+0x0/0xa0
 [ 8114.870020]  [8100cf20] ? kernel_thread_helper+0x0/0x10

Interesting. If you mount the filesystem with 'noatime,nodiratime' or
'ro', does it allow ls to return?

Daniel
-- 
Daniel J Blueman


Re: Kernel 2.6.36 btrfs csum bugreport

2010-11-01 Thread Daniel J Blueman
On 1 November 2010 00:35, Andreas Bauer a...@voltage.de wrote:
 So I conclude that these messages are faulty because data is read correctly.
  In addition, when you have more than one btrfs you cannot see from the
  message which fs it is referring to.

  Is this a raid1 or a dup array?

 No, plain vanilla partition on physical hard disk. Btrfs was made with the 
 command mkfs.btrfs /dev/sdc1 no extra arguments.

By default, metadata is duplicated, thus it could be that BTRFS is
using the correct copy of the metadata after finding checksum errors
in the first copy.
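
The fallback described above can be sketched in C as follows. This is only an illustration under stated assumptions: the struct, the function names and the toy checksum are hypothetical (Btrfs itself uses crc32c and its own read path), and nothing here is taken from the kernel source. It keeps two copies of a block, each with the checksum stored at write time, and returns the first copy that still verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16	/* tiny block size, for illustration only */

/* Hypothetical stored block: payload plus the checksum recorded at write time. */
struct block_copy {
	uint8_t  data[BLOCK_SIZE];
	uint32_t stored_csum;
};

/* Toy checksum standing in for the real one (Btrfs uses crc32c). */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/* Write the same payload and checksum to every copy, as duplication would. */
static void write_dup(struct block_copy *copies, int ncopies,
		      const uint8_t *payload)
{
	assert(ncopies > 0);
	for (int i = 0; i < ncopies; i++) {
		memcpy(copies[i].data, payload, BLOCK_SIZE);
		copies[i].stored_csum = toy_csum(payload, BLOCK_SIZE);
	}
}

/*
 * Try each copy in order. Copy the payload of the first copy whose
 * checksum verifies into 'out' and return its index, or return -1 if
 * every copy fails verification.
 */
static int read_with_fallback(const struct block_copy *copies, int ncopies,
			      uint8_t *out)
{
	assert(ncopies > 0);
	for (int i = 0; i < ncopies; i++) {
		if (toy_csum(copies[i].data, BLOCK_SIZE) == copies[i].stored_csum) {
			memcpy(out, copies[i].data, BLOCK_SIZE);
			return i;
		}
		/* A real filesystem would log the checksum failure here. */
	}
	return -1;
}
```

In this model a reader still gets correct data when the first copy is corrupt, while a checksum failure would already have been logged for that copy. That matches the symptom of warnings in the log alongside data that reads back fine.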

Daniel
-- 
Daniel J Blueman


[2.6.37-rc0 patch] direct I/O submission fixes v3

2010-10-30 Thread Daniel J Blueman
Hi Chris,

These patches from Josef and me are still relevant, but were not in
your last mainline pull request.

Could you add them if you are happy with them? I've rediffed them [1,2]
against your for-linux tree.

Many thanks,
  Daniel

--- [1]

Fix use-after-free on error path.

Signed-off-by: Josef Bacik jo...@redhat.com

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 558cac2..986cc40 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5761,7 +5761,7 @@ free_ordered:
        if (write) {
                struct btrfs_ordered_extent *ordered;
                ordered = btrfs_lookup_ordered_extent(inode,
-                                                     dip->logical_offset);
+                                                     file_offset);
                if (!test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags) &&
                    !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
                        btrfs_free_reserved_extent(root, ordered->start,

--- [2]

Fix leak of 'dip' on error path and unnecessary double-assignment.

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 558cac2..312eeb7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5701,15 +5701,15 @@ static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
               ret = -ENOMEM;
               goto free_ordered;
       }
-       dip->csums = NULL;

       if (!skip_sum) {
               dip->csums = kmalloc(sizeof(u32) * bio->bi_vcnt, GFP_NOFS);
               if (!dip->csums) {
                       ret = -ENOMEM;
-                       goto free_ordered;
+                       goto out_err;
               }
-       }
+       } else
+               dip->csums = NULL;

       dip->private = bio->bi_private;
       dip->inode = inode;
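
As a rough userspace illustration of what patch [2] is about, the sketch below shows the goto-based cleanup pattern: once the private structure has been allocated, a later allocation failure has to jump to a label that frees it rather than one that skips past it. All names here (struct dio_private, counted_alloc, submit_sketch, the labels) are hypothetical and are not taken from fs/btrfs/inode.c. The allocation counter exists only so that a leak can be detected.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-request private structure ('dip'). */
struct dio_private {
	unsigned int *csums;
};

/* Number of allocations currently outstanding; nonzero at exit means a leak. */
static int live_allocs;

/* malloc wrapper that counts allocations; 'fail' simulates -ENOMEM. */
static void *counted_alloc(size_t size, int fail)
{
	void *p = fail ? NULL : malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		assert(live_allocs > 0);
		live_allocs--;
		free(p);
	}
}

/*
 * Sketch of a submission path. 'fail_csums' simulates failure of the
 * checksum array allocation. In a real driver, ownership of 'dip' would
 * pass to the I/O completion handler on success; here it is freed at once.
 */
static int submit_sketch(int skip_sum, size_t nvecs, int fail_csums)
{
	struct dio_private *dip;
	int ret = 0;

	dip = counted_alloc(sizeof(*dip), 0);
	if (!dip) {
		ret = -ENOMEM;
		goto out;		/* nothing allocated yet */
	}

	if (!skip_sum) {
		dip->csums = counted_alloc(sizeof(unsigned int) * nvecs,
					   fail_csums);
		if (!dip->csums) {
			ret = -ENOMEM;
			goto free_dip;	/* must release 'dip' as well */
		}
	} else {
		dip->csums = NULL;	/* assigned exactly once per branch */
	}

	/* ... the I/O would be submitted here ... */
	counted_free(dip->csums);
free_dip:
	counted_free(dip);
out:
	return ret;
}
```

If the failure branch jumped straight to 'out' instead of 'free_dip', live_allocs would stay at 1 after a failed call. That corresponds to the leak of 'dip' that the patch description mentions.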

-- Forwarded message --
From: Daniel J Blueman daniel.blue...@gmail.com
Date: 25 July 2010 19:53
Subject: Re: [2.6.35-rc6 patch] direct I/O submission fixes v2
To: Josef Bacik jo...@redhat.com, Chris Mason chris.ma...@oracle.com
Cc: Linux BTRFS linux-btrfs@vger.kernel.org


On 25 July 2010 15:42, Josef Bacik jo...@redhat.com wrote:
 On Sat, Jul 24, 2010 at 12:01:59AM +0100, Daniel J Blueman wrote:
 Hi Chris,

 This fixes some issues relating to direct I/O submission, however a
 further patch will be needed to handle the case where allocation of
 'dip' fails, which is always dereferenced when finding the ordered
 extent.


 Hi,

 There's an easier way to do this.  This patch should fix the problem,

 Signed-off-by: Josef Bacik jo...@redhat.com

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 3232945..7259ef9 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -5815,7 +5815,7 @@ free_ordered:
        if (write) {
                struct btrfs_ordered_extent *ordered;
                ordered = btrfs_lookup_ordered_extent(inode,
 -                                                     dip->logical_offset);
 +                                                     file_offset);
                if (!test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags) &&
                    !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
                        btrfs_free_reserved_extent(root, ordered->start,


Good move!

With your patch applied, mine (now not priority) then becomes:

Fix leak of 'dip' on error path and double assignment.

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1bff92a..bd7f940 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5652,15 +5652,15 @@ static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
               ret = -ENOMEM;
               goto free_ordered;
       }
-       dip->csums = NULL;

       if (!skip_sum) {
               dip->csums = kmalloc(sizeof(u32) * bio->bi_vcnt, GFP_NOFS);
               if (!dip->csums) {
                       ret = -ENOMEM;
-                       goto free_ordered;
+                       goto out_err;
               }
-       }
+       } else
+               dip->csums = NULL;

       dip->private = bio->bi_private;
       dip->inode = inode;
-- 
Daniel J Blueman


Re: Performance degradation

2010-09-26 Thread Daniel J Blueman
Hi Dave,

On 24 September 2010 20:39, Dave Cundiff syshack...@gmail.com wrote:
 It shouldn't but it depends on how much metadata we have to read in to
 process the snapshots.  Could you do a few sysrq-w?  We'll see where you
 are spending your time based on the traces.

 Also, using 'perf' may give a good picture of where the time is spent, eg:

 $ sudo perf record -a sleep 20
 $ sudo perf report | tee profile.txt
 --
 Daniel J Blueman


 The cleaner finished before I was able to get any debug, however,
 here's sysrq's and perf data from my very slow running backups. I also
 did a test on the same box with rsync between the ext3 and btrfs
 filesystem with 1 large file. rsync can pretty much saturate the
 100mbit port writing to the ext3 filesystem. The ext3 and btrfs
 filesystems are on the same RAID5 container. Just different
 partitions.

 btrfs filesystem:
 rsync --progress -v -e 'ssh' a2s55:/var/log/exim_mainlog .
 exim_mainlog
    64019026 100%    1.76MB/s    0:00:34 (xfer#1, to-check=0/1)

 Ext3:
 rsync --progress -v -e 'ssh' a2s55:/var/log/exim_mainlog .
 exim_mainlog
    64032337 100%    8.40MB/s    0:00:07 (xfer#1, to-check=0/1)

 Here's an iostat across 20 seconds while the backups were running and
 actively trying to copy a 10gig file:
 sdb               0.00     0.00  0.25 27.45     2.00  4000.80   144.51
    0.01    0.28   0.22   0.60

 The sysrq and perf data is also from the backup trying to copy a 10gig
 file. Its not gonna complete for almost 2 hours... :(

 home/dmportal/public_html/data3m7_data_wiki.tgz
 1782775808  15%    1.57MB/s    1:40:45

 The only process I could get to show blocked in the sysrq was ssh
 which is how rsync is transmitting its data.

 [336155.004946]   task                        PC stack   pid father
 [336155.005018] ssh           D 0001140c8669  4424 20748  20747 0x0080
 [336155.005024]  880402421628 0086 8804024215d8
 4000
 [336155.008563]  00013140 8801cd458378 8801cd458000
 880236dd16b0
 [336155.008567]  02421678 810e23d8 000a
 0001
 [336155.008572] Call Trace:
 [336155.008581]  [810e23d8] ? free_page_list+0xe8/0x100
 [336155.008589]  [815cde98] schedule_timeout+0x138/0x2b0
 [336155.008595]  [81058920] ? process_timeout+0x0/0x10
 [336155.008599]  [815ccee0] io_schedule_timeout+0x80/0xe0
 [336155.008606]  [810ed9a1] congestion_wait+0x71/0x90
 [336155.008610]  [8106a280] ? autoremove_wake_function+0x0/0x40
 [336155.008614]  [810e4171] shrink_inactive_list+0x2e1/0x310
 [336155.008619]  [810e449f] shrink_zone+0x2ff/0x4e0
 [336155.008625]  [810e16b9] ? shrink_slab+0x149/0x180
 [336155.008628]  [810e486e] zone_reclaim+0x1ee/0x290
 [336155.008632]  [810d926a] ? zone_watermark_ok+0x2a/0x100
 [336155.008637]  [810db55a] get_page_from_freelist+0x5fa/0x7b0
 [336155.008642]  [81130956] ? free_poll_entry+0x26/0x30
 [336155.008647]  [81037f6e] ? select_idle_sibling+0x9e/0x120
 [336155.008652]  [810dc860] __alloc_pages_nodemask+0x130/0x7e0
 [336155.008658]  [8103f7cf] ? enqueue_task_fair+0x4f/0x60
 [336155.008661]  [8103d8f8] ? enqueue_task+0x68/0xa0
 [336155.008665]  [814bd8f1] ? __alloc_skb+0x51/0x180
 [336155.008671]  [8110f02c] kmalloc_large_node+0x5c/0xb0
 [336155.008676]  [81112166] __kmalloc_node_track_caller+0xf6/0x1f0
 [336155.008682]  [814ba2f0] ? sock_alloc_send_pskb+0x1a0/0x2f0
 [336155.008685]  [814bd920] __alloc_skb+0x80/0x180
 [336155.008689]  [814ba2f0] sock_alloc_send_pskb+0x1a0/0x2f0
 [336155.008693]  [81037a6e] ? __wake_up_sync_key+0x5e/0x80
 [336155.008697]  [814ba455] sock_alloc_send_skb+0x15/0x20
 [336155.008704]  [81567c22] unix_stream_sendmsg+0x282/0x3c0
 [336155.008709]  [814b6b20] sock_aio_write+0x170/0x180
 [336155.008715]  [8111ff95] do_sync_write+0xd5/0x120
 [336155.008720]  [812204af] ? security_file_permission+0x1f/0x70
 [336155.008724]  [8112034e] vfs_write+0x17e/0x190
 [336155.008727]  [81120d25] sys_write+0x55/0x90
 [336155.008733]  [81002e1b] system_call_fastpath+0x16/0x1b


 and 60 seconds of perf data while the 10gig file was languishing. I
 trimmed it just to the top entries. Lemme know if you need the entire
 thing.

 # Samples: 140516
 #
 # Overhead          Command                   Shared Object  Symbol
 #   ...  ..  ..
 #
    25.92%              ssh  /lib64/libcrypto.so.0.9.8e      [.] 0x0e0bb8
     5.19%            rsync  /usr/bin/rsync                  [.] md5_process
     5.18%           [idle]  [kernel]                        [k] intel_idle
     4.35%              ssh  /usr/bin/ssh                    [.] 0x0254ae
     4.00%           [idle]  [kernel]                        [k] read_hpet
     3.67

Re: Btrfs, NFS (v3) and ESTALE

2010-09-23 Thread Daniel J Blueman
Hi David,

On 23 September 2010 12:02, David Flynn dav...@rd.bbc.co.uk wrote:
 Dear all;

 On a cluster of ~35 machines used for batch processing, which all mount
 via NFS (v3) a BTRFS export, I am experiencing issues that are causing
 NFS clients to occasionally produce Stale NFS handle errors on accessing
 this file system.  I would be interested to know if this is possibly
 related to use of BTRFS, or is mere coincidence.

 Background:
  - The NFS server is running 2.6.33, with a btrfs file system created
    under the same kernel.

  - The file system is mounted as:
    /dev/md2 /work btrfs rw,noatime,nodiratime 0 0

  - The file system is exported as:
    /work           world(rw,wdelay,root_squash,no_subtree_check)

  - Clients are mostly 2.6.35, however, problems have also been
    seen with 2.6.32.

  - Clients mount (from /proc/mounts)
    vc-fs0:/work /work nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.29.146.16,mountvers=3,mountport=51102,mountproto=udp,addr=172.29.146.16 0 0

 The problem manifests itself when issuing a job to the cluster, of ~120
 tasks on 30 nodes.  We will occasionally find that a machine reports
 NFS stale filehandle errors when trying to stat a directory.  The
 directory will not have been deleted during the lifetime of the job,
 however some (eg 30) sub-directories will have been created.

 The errors are usually seen from a machine that has not done any work.

 For example:

 (2.6.35:)
 vcfe0:~$ ls -l /work >/dev/null
 --launch job (doesn't do anything on vcfe, uses different nodes)--
 ... time passes (unknown how long) ...

 vcfe0:~$ ls -l /work >/dev/null
 ls: cannot access /work/marta-cip-test: Stale NFS file handle
 ls: cannot access /work/andrea-test-ais: Stale NFS file handle

 (2.6.35:)
 vc-r210-0:~$ ls -l /work >/dev/null
 vc-r210-0:~$

 (2.6.32:)
 b36048:~$ ls -l /work/ >/dev/null
 ls: cannot access /work/marta-cip-test: Stale NFS file handle
 ls: cannot access /work/andrea-test-ais: Stale NFS file handle

 Two separate machines are seeing the same stale file handles.  b36048
 hadn't even touched /work for some considerable time before doing that
 ls.

 Performing `touch /work/andrea-test-ais' on the client will allow the
 client machine to stat the directory again, however, doing it on the
 file server does not.

 Performing `echo 2 > /proc/sys/vm/drop_caches' on the client will
 sometimes solve the problem for that client [but not always].

 I've not yet found a reliable way to reproduce the problem, other than
 running large jobs (we aren't running small ones at the moment, so can't
 say if it is related to size)

 I would be interested to know if anyone believes this may be related to
 the use of btrfs, (or even a configuration / nfs cache coherency problem).

 Some extra anecdotal evidence:
  I don't recall this being an issue before we upgraded all the compute
  nodes to 2.6.35.  Previously they used 2.6.33, but an upgrade was
  forced due to an nfs bug under high write loads.  However, it may be
  that the nature of the jobs that we are running now has changed
  slightly too.

I was experiencing a similar pattern of ESTALE issues with NFS with
2.6.33 (IIRC) and cached data on ext4, and could reproduce it from
time to time performing kernel rebuilds over NFS.

I've CC'd Trond on the full email to see if it rings a bell. The best
outcome may be if we write a micro-reproducer which exploits this race
using cached data.

Thanks,
  Daniel
-- 
Daniel J Blueman


[2.6.35-rc6 patch] direct I/O submission fixes v2

2010-09-07 Thread Daniel J Blueman
Hi Chris,

On 25 July 2010 15:42, Josef Bacik jo...@redhat.com wrote:
 On Sat, Jul 24, 2010 at 12:01:59AM +0100, Daniel J Blueman wrote:
 Hi Chris,

 This fixes some issues relating to direct I/O submission, however a
 further patch will be needed to handle the case where allocation of
 'dip' fails, which is always dereferenced when finding the ordered
 extent.


 Hi,

 There's an easier way to do this.  This patch should fix the problem,

 Signed-off-by: Josef Bacik jo...@redhat.com

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 3232945..7259ef9 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -5815,7 +5815,7 @@ free_ordered:
        if (write) {
                struct btrfs_ordered_extent *ordered;
                ordered = btrfs_lookup_ordered_extent(inode,
 -                                                     dip->logical_offset);
 +                                                     file_offset);
                if (!test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags) &&
                    !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
                        btrfs_free_reserved_extent(root, ordered->start,


Good move!

With your patch applied, mine (now not priority) then becomes:

Fix leak of 'dip' on error path and double assignment.

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1bff92a..bd7f940 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5652,15 +5652,15 @@ static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
               ret = -ENOMEM;
               goto free_ordered;
       }
-       dip->csums = NULL;

       if (!skip_sum) {
               dip->csums = kmalloc(sizeof(u32) * bio->bi_vcnt, GFP_NOFS);
               if (!dip->csums) {
                       ret = -ENOMEM;
-                       goto free_ordered;
+                       goto out_err;
               }
-       }
+       } else
+               dip->csums = NULL;

       dip->private = bio->bi_private;
       dip->inode = inode;


I don't see Josef's and my patches at
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git
yet.

They still appear relevant; let me know if you'd like them rediffed
against your tree.

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: [2.6.35-rc6 patch] direct I/O submission fixes v2

2010-07-25 Thread Daniel J Blueman
On 25 July 2010 15:42, Josef Bacik jo...@redhat.com wrote:
 On Sat, Jul 24, 2010 at 12:01:59AM +0100, Daniel J Blueman wrote:
 Hi Chris,

 This fixes some issues relating to direct I/O submission, however a
 further patch will be needed to handle the case where allocation of
 'dip' fails, which is always dereferenced when finding the ordered
 extent.


 Hi,

 There's an easier way to do this.  This patch should fix the problem,

 Signed-off-by: Josef Bacik jo...@redhat.com

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 3232945..7259ef9 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -5815,7 +5815,7 @@ free_ordered:
        if (write) {
                struct btrfs_ordered_extent *ordered;
                ordered = btrfs_lookup_ordered_extent(inode,
 -                                                     dip->logical_offset);
 +                                                     file_offset);
                if (!test_bit(BTRFS_ORDERED_PREALLOC, &ordered->flags) &&
                    !test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
                        btrfs_free_reserved_extent(root, ordered->start,


Good move!

With your patch applied, mine (now not priority) then becomes:

Fix leak of 'dip' on error path and double assignment.

Signed-off-by: Daniel J Blueman daniel.blue...@gmail.com

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1bff92a..bd7f940 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5652,15 +5652,15 @@ static void btrfs_submit_direct(int rw, struct bio *bio, struct inode *inode,
               ret = -ENOMEM;
               goto free_ordered;
       }
-       dip->csums = NULL;

       if (!skip_sum) {
               dip->csums = kmalloc(sizeof(u32) * bio->bi_vcnt, GFP_NOFS);
               if (!dip->csums) {
                       ret = -ENOMEM;
-                       goto free_ordered;
+                       goto out_err;
               }
-       }
+       } else
+               dip->csums = NULL;

       dip->private = bio->bi_private;
       dip->inode = inode;

-- 
Daniel J Blueman


Re: kernel BUG at fs/btrfs/extent-tree.c:1353

2010-07-23 Thread Daniel J Blueman
On 22 July 2010 19:07, Johannes Hirte johannes.hi...@fem.tu-ilmenau.de wrote:
 On Monday 19 July 2010, 10:01:46, Miao Xie wrote:
 On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
  On Thursday 15 July 2010, 02:11:04, Dave Chinner wrote:
  On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
  On Thursday 08 July 2010, 16:31:09, Chris Mason wrote:
  I'm not sure if btrfs is to blame for this error. After the errors I
  switched to XFS on this system and got now this error:
 
  ls -l .kde4/share/apps/akregator/data/
  ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure
  needs cleaning
  total 4
  ?? ? ?    ?        ?            ? feeds.opml
 
  What is the error reported in dmesg when the XFS filesystem shuts down?
 
  Nothing. I double checked the logs. There are only the messages when
  mounting the filesystem. No other errors are reported than the
  inaccessible file and the output from xfs_check.

 Is there anything wrong with your disks or memory?
 Sometimes the bad memory can break the filesystem. I have met this kind of
 problem some time ago.

 I don't think that's the case. I've checked the RAM with memtest86+ and got no
 errors. I got the errors with two different disks, the first one with btrfs,
 the second one now with XFS. Before changing to the second disk, I ran
 badblocks on it to be sure it has no errors.

There are also some known-buggy chipsets. One still around is the
Nvidia CK804/MCP55: under certain patterns of spatially-local pending
reads and writes to the memory controller, a 64-byte request would
occasionally be returned with the wrong offset. I was hitting it with
some 27-Gbit adapters and managed to capture it on a PCI-e protocol
analyser. Rsync between network and local disk would sometimes hit it
too.
-- 
Daniel J Blueman


Re: volume broken? btrfsck fails

2010-07-08 Thread Daniel J Blueman
On 8 July 2010 01:21, Daniel Kozlowski dan.kozlow...@gmail.com wrote:
 On Tue, Jul 6, 2010 at 8:19 PM, Chris Mason chris.ma...@oracle.com wrote:
 I am also having the same problem with a slightly different setup. In my
 case I cannot mount the filesystem.

 What is your hardware setup here?  Including write cache settings.  Did
 you have crashes with 2.6.35-rc1 or rc2?

 My setup is

 Eight hard Drive
 four 1TB Drives
 four 500GB Drives
 All drives are connected through a 3ware Inc 9550SX SATA-II RAID PCI-X card
 The card is configured to export all drives essentially acting as a
 SATA port multiplier. (drives show up sdb - sdi)
 Drives are configured in btrfs raid0
 Filesystem is mounted using:
 mount -t btrfs /dev/sdb /opt

 I have been able to lock up the system on
 2.6.33.5-124.fc13.x86_64
 2.6.35-0.13.rc3.git2.fc14.x86_64
 2.6.35-0.23.rc3.git6.fc14.x86_64
 and
 2.6.35-0.23.rc3.git6.fc14.x86_64 with a DKMS build of the btrfs module
 (Btrfs v0.19-16-g075587c-dirty)

 If you would like me to pull out another version of the kernel or roll
 back specific commits from the kernel module I can

 I have been able to get different responses from different versions:
 2.6.33.* - This will mount the volume but will hang shortly after
 mounting when reading data from the filesystem (ls /opt); it writes a
 bunch of transid verify failed messages and hangs on ls
 2.6.34.* - Will not mount at all; still gives the transid verify failed
  messages and hangs on mount


 Looks like we're looping on a single block.  What happens when you
 dmesg -n1 to cut down on the console traffic?

 Nothing changes I still have endless repeats of

 parent transid verify failed on 1682586464256 wanted 285114 found 11257

 If that doesn't help we can change it to spit a stack trace to figure
 out where the looping is happening.  We should be erroring out instead
 of hitting it over and over again.

 In my kernel noviceness i tried attaching gdb to the btrfs-endio-met,
 however apparently you can't attach gdb to a kernel thread like that
 If you could assist me in obtaining a call trace I will gladly attempt
 to resolve the matter.

For grabbing kernel backtraces:

$ sudo -s
# dmesg -c > /dev/null
# echo t > /proc/sysrq-trigger
# dmesg > backtraces.txt
(there are other ways with

The problem is that you'll be taking instantaneous snapshots, each of
which may or may not be representative of the main loop, but a few
samples taken together should be.

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: Copy/move btrfs volume

2010-07-01 Thread Daniel J Blueman
On 1 July 2010 11:28, Lubos Kolouch lubos.kolo...@gmail.com wrote:
 Hello,

 I am testing btrfs on one of our backup servers
 (many millions of files, 1.5TB size, running on (non-btrfs-provided-)
 raid5).

 I am using subvolumes/snapshots with following rsync.

 It works very well, but I would like to ask a question... say I would need
 to copy/move the files to different server/disk.

 Normally I would do it with rsync, but I guess it will not preserve the
 subvolumes, it will also not detect that they are the same files (I guess
 they are not just normal hardlinks). So I would end up with duplicated
 files.

 What is the correct way to do this?

The only way to do this while preserving the sharing is to use hardlinks
between duplicated files (hardlinks reference-count the inode), and then
use 'rsync -H'.
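
For anyone unsure what "reference counts the inode" means in practice, here is a small C sketch using the POSIX link() and stat() calls. It assumes a writable /tmp on a filesystem that supports hardlinks; the function name and file names are illustrative only. It creates a file, hard-links it under a second name, and checks that both names share one inode with a link count of 2. That shared inode is what 'rsync -H' detects and recreates on the destination.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file, hard-link it under a second name, and report
 * whether both names resolve to the same inode with a link count of 2.
 * Returns 1 if so, 0 if not, and -1 on error. Both files are removed
 * again before returning.
 */
static int hardlink_shares_inode(void)
{
	char orig[] = "/tmp/hardlink-demo-XXXXXX";
	char copy[sizeof(orig) + 8];
	struct stat a, b;
	int fd, n, result = -1;

	fd = mkstemp(orig);		/* fills in the XXXXXX suffix */
	if (fd < 0)
		return -1;
	close(fd);

	n = snprintf(copy, sizeof(copy), "%s.link", orig);
	assert(n > 0 && (size_t)n < sizeof(copy));	/* no truncation */

	if (link(orig, copy) != 0)
		goto out_orig;

	if (stat(orig, &a) == 0 && stat(copy, &b) == 0)
		result = (a.st_dev == b.st_dev && a.st_ino == b.st_ino &&
			  a.st_nlink == 2 && b.st_nlink == 2);

	unlink(copy);
out_orig:
	unlink(orig);
	return result;
}
```

Files that are only shared between Btrfs snapshots are separate inodes in separate subvolumes, so a check like this would not treat them as hardlinks.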

Dan
-- 
Daniel J Blueman


Re: Btrfs: broken file system design (was Unbound(?) internal fragmentation in Btrfs)

2010-06-18 Thread Daniel J Blueman
On Fri, Jun 18, 2010 at 1:32 PM, Edward Shishkin
edward.shish...@gmail.com wrote:
 Mat wrote:

 On Thu, Jun 3, 2010 at 4:58 PM, Edward Shishkin edw...@redhat.com wrote:

 Hello everyone.

 I was asked to review/evaluate Btrfs for using in enterprise
 systems and the below are my first impressions (linux-2.6.33).

 The first test I have made was filling an empty 659M (/dev/sdb2)
 btrfs partition (mounted to /mnt) with 2K files:

 # for i in $(seq 100); \
 do dd if=/dev/zero of=/mnt/file_$i bs=2048 count=1; done
 (terminated after getting No space left on device reports).

 # ls /mnt | wc -l
 59480

 So, I got the dirty utilization 59480*2048 / (659*1024*1024) = 0.17,
 and the first obvious question is hey, where are other 83% of my
 disk space??? I looked at the btrfs storage tree (fs_tree) and was
 shocked with the situation on the leaf level. The Appendix B shows
 5 adjacent btrfs leafs, which have the same parent.

 For example, look at the leaf 29425664: items 1 free space 3892
 (of 4096!!). Note, that this free space (3892) is _dead_: any
 attempts to write to the file system will result in No space left
 on device.

 Internal fragmentation (see Appendix A) of those 5 leafs is
 (1572+3892+1901+3666+1675)/(4096*5) = 0.62. This is even worse than
 ext4 and xfs: the latter will show fragmentation near zero in this
 example with blocksize = 2K. Even with 4K blocksize they will
 show better utilization 0.50 (against 0.38 in btrfs)!

 I have a small question for btrfs developers: Why do you folks put
 inline extents, xattr, etc items of variable size to the B-tree
 in spite of the fact that B-tree is a data structure NOT for variable
 sized records? This disadvantage of B-trees was widely discussed.
 For example, maestro D. Knuth warned about this issue long time
 ago (see Appendix C).

 It is a well known fact that internal fragmentation of classic Bayer's
 B-trees is restricted by the value 0.50 (see Appendix C). However it
 takes place only if your tree contains records of the _same_ length
 (for example, extent pointers). Once you put to your B-tree records
 of variable length (restricted only by leaf size, like btrfs inline
 extents), your tree LOSES this boundary. Moreover, even worse:
 it is clear, that in this case utilization of B-tree scales as zero(!).
 That said, for every small E and for every amount of data N we
 can construct a consistent B-tree, which contains data N and has
 utilization worse than E. I.e. from the standpoint of utilization
 such trees can be completely degenerated.

 That said, the very important property of B-trees, which guarantees
 non-zero utilization, has been lost, and I don't see in Btrfs code any
 substitution for this property. In other words, where is a formal
 guarantee that all disk space of our users won't be eaten by internal
 fragmentation? I consider such guarantee as a *necessary* condition
 for putting a file system to production.

Wow...a small part of me says 'well said', on the basis that your
assertions are true, but I do think such critique needs to be more
constructive; it is almost impossible to be a great engineer and a
great academic at once in a time-pressured environment.

If you can produce some specific suggestions with code references,
I'm sure we'll get some good discussion with potential to improve from
where we are.
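
For anyone wanting to check the figures quoted above, a small sketch
that redoes the arithmetic. The numbers are taken straight from the
report; the variable names are mine:

```python
def fraction(part, whole):
    """Plain ratio, kept as a function so both figures use the same formula."""
    return part / whole


# 59480 files of 2048 bytes on a 659M partition
data_utilisation = fraction(59480 * 2048, 659 * 1024 * 1024)

# Free space reported in the 5 adjacent 4096-byte leaves
leaf_free = [1572, 3892, 1901, 3666, 1675]
leaf_fragmentation = fraction(sum(leaf_free), 4096 * len(leaf_free))
leaf_utilisation = 1 - leaf_fragmentation

print(f"{data_utilisation:.3f} {leaf_fragmentation:.3f} {leaf_utilisation:.3f}")
# prints: 0.176 0.620 0.380
```

So the 0.62 and 0.38 figures hold with the divisor taken as 4096*5, and
the overall data utilisation is 0.176 (0.17 if truncated rather than
rounded).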

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: Confused by performance

2010-06-16 Thread Daniel J Blueman
On Wed, Jun 16, 2010 at 7:08 PM, K. Richard Pixley r...@noir.com wrote:
 Once again I'm stumped by some performance numbers and hoping for some
 insight.

 Using an 8-core server, building in parallel, I'm building some code.  Using
 ext2 over a 5-way, (5 disk), lvm partition, I can build that code in 35
 minutes.  Tests with dd on the raw disk and lvm partitions show me that I'm
 getting near linear improvement from the raw stripe, even with dd runs
 exceeding 10G, so I think that convinces me that my disks and controller
 subsystem are capable of operating in parallel and in concert.  hdparm -t
 numbers seem to support what I'm seeing from dd.

 Running the same build, same parallelism, over a btrfs (defaults) partition
 on a single drive, I'm seeing very consistent build times around an hour,
 which is reasonable.  I get a little under an hour on ext4 single disk,
 again, very consistently.

 However, if I build a btrfs file system across the 5 disks, my build times
 decline to around 1.5 - 2hrs, although there's about a 30min variation
 between different runs.

 If I build a btrfs file system across the 5-way lvm stripe, I get even worse
 performance at around 2.5hrs per build, with about a 45min variation between
 runs.

 I can't explain these last two results.  Any theories?

Try mounting the BTRFS filesystem with 'nobarrier', since this may be
an obvious difference. Also, for metadata-write-intensive workloads,
try 'mkfs.btrfs -m single' when creating the filesystem. Of course,
none of this explains the variance.

I'd say it's worth employing 'blktrace' to see what's happening at a
lower level, and even, eg, switching between the deadline and CFQ I/O
schedulers.
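
When comparing schedulers between runs it helps to record which one was
actually active. A minimal sketch, assuming the usual sysfs listing
format in which the active scheduler is shown in square brackets (for
example "noop deadline [cfq]"); the function names are illustrative:

```python
def active_scheduler(sysfs_text):
    """Return the bracketed (active) entry from a scheduler listing."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_scheduler(device):
    """Read the active scheduler for a block device name such as 'sda'."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

Logging read_scheduler() for each member disk next to the build time
makes it easier to tell whether a slow run used a different scheduler.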

Daniel
-- 
Daniel J Blueman


Re: SSD Optimizations

2010-03-11 Thread Daniel J Blueman
On Wed, Mar 10, 2010 at 11:13 PM, Gordan Bobic gor...@bobich.net wrote:
 Marcus Fritzsch wrote:

 Hi there,

 On Wed, Mar 10, 2010 at 8:49 PM, Gordan Bobic gor...@bobich.net wrote:

 [...]
 Are there similar optimizations available in BTRFS?

 There is an SSD mount option available[1].

 [1] http://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options

 But what _exactly_ does it do?

Chris explains that '-o ssd' changes allocator behaviour to favour
spatial locality. '-o ssd_spread' does the opposite, for devices where
RMW cycles carry a higher penalty. Elsewhere IIRC, Chris also said BTRFS
attempts to submit 128KB BIOs where possible (or wishful thinking?):

http://markmail.org/message/4sq4uco2lghgxzzz
-- 
Daniel J Blueman


Re: [2.6.33 regression] btrfs mount causes memory corruption

2010-02-25 Thread Daniel J Blueman
On Thu, Feb 25, 2010 at 8:38 PM, Josef Bacik jo...@redhat.com wrote:
 On Thu, Feb 25, 2010 at 03:29:34PM -0500, Andrew Lutomirski wrote:
 On Thu, Feb 25, 2010 at 3:23 PM, Josef Bacik jo...@redhat.com wrote:
  On Thu, Feb 25, 2010 at 03:01:08PM -0500, Andrew Lutomirski wrote:
  Mounting btrfs corrupts memory and causes nasty crashes within a few
  seconds.  This seems to happen even if the mount fails (note the
  unrecognized mount option).  This is a regression from 2.6.32, and
  I've attached an example.
 
 
  And it only happens when you mount a btrfs fs?  Can you show me a
  trace of when you mount a btrfs fs with valid mount options?  I'd like
  to see if we're not cleaning up something properly or what.  Thanks,

 Seems OK.  Or maybe I just got lucky, but it's crashed every time I
 tried to mount with 'acl' before.

 I even went through a couple iterations of trying to mount with
 'xattr' and 'user_xattr', both of which failed.


 Ok it looks like we have a problem kfree'ing the wrong stuff.  we kstrdup
 the options string, but then strsep screws with the pointer, so when we
 kfree() it, we're not giving it the right pointer.  Please try this patch,
 and mount with -o acl and other such garbage to make sure it actually
 worked (acl isn't a valid mount option btw).  Let me know if it works.
 Thanks,

 Josef


 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 8a1ea6e..f8b4521 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -128,7 +128,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
  {
        struct btrfs_fs_info *info = root->fs_info;
        substring_t args[MAX_OPT_ARGS];
 -       char *p, *num;
 +       char *p, *num, *orig;
        int intarg;
        int ret = 0;

 @@ -143,6 +143,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
        if (!options)
                return -ENOMEM;

 +       orig = options;

        while ((p = strsep(&options, ",")) != NULL) {
                int token;
 @@ -280,7 +281,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
                }
        }
  out:
 -       kfree(options);
 +       kfree(orig);
        return ret;
  }

The patch is good, and is the same as the one I was testing to fix this
issue, which I found a day earlier with -rc8.
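
For anyone following along, a toy model of why the saved pointer
matters. It is not kernel code: string offsets stand in for pointers,
offset 0 stands in for the address returned by kstrdup(), and I assume
the parser jumps to 'out' on the first unrecognised option:

```python
def strsep(buf, pos, delim=","):
    """Model of C strsep(): return (token, new_pos).

    new_pos is None once the last token has been handed out, mirroring
    strsep() storing NULL through its first argument.
    """
    if pos is None:
        return None, None
    end = buf.find(delim, pos)
    if end == -1:
        return buf[pos:], None
    return buf[pos:end], end + 1


def parse_options(buf, valid):
    """Return (orig, options) as they stand when the 'out:' label is reached."""
    orig = options = 0  # offset 0 models the start of the allocation
    while True:
        token, options = strsep(buf, options)
        if token is None:
            break
        if token and token not in valid:
            break  # models 'goto out' on an unrecognised option
    return orig, options


print(parse_options("noatime,ssd", {"noatime", "ssd"}))      # (0, None)
print(parse_options("noatime,acl,ssd", {"noatime", "ssd"}))  # (0, 12)
```

In the model, 'options' ends up as either None (NULL) or an offset into
the middle of the buffer, never the start of the allocation, which is
why the free has to use 'orig'.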

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: does btrfs have RAID I/O throughput (un)limiting sysctls, similar to md?

2010-02-06 Thread Daniel J Blueman
On Sat, Feb 6, 2010 at 12:37 AM, 0bo0 0.bugs.onl...@gmail.com wrote:
 i've a 4 drive array connected via a PCIe SATA card.

 per OS (opensuse) default, md RAID I/O performance was being limited by,

  cat /proc/sys/dev/raid/speed_limit_min
    1000
  cat /proc/sys/dev/raid/speed_limit_max
    20

 changing,

  echo dev.raid.speed_limit_min=10  /etc/sysctl.conf
  echo dev.raid.speed_limit_max=60  /etc/sysctl.conf

 enabled full/best I/O throughput.

These proc entries affect only array reconstruction, not general I/O
performance/throughput, so they matter only in the edge case of
applications requiring maximum latency/minimum throughput guarantees.
-- 
Daniel J Blueman


file/extent checksums for dedup/sync...

2010-01-27 Thread Daniel J Blueman
For purposes of data deduplication and data synchronisation, it would
be a powerful tool to expose file data checksums.

Since eg BTRFS uses the crc32c algorithm [1], it's possible to compute
the file's overall CRC from the accumulation of the CRCs from all its
extents.

For now, exposing this via an IOCTL may be sufficient, though any
ideas for introducing it in a more standard way? (it's a pity that
when stat64 was introduced, reserved fields weren't added)

Thanks,
  Daniel

[1] http://www.research.ibm.com/haifa/satran/ips/Vince-Luben-crc32c-01.pdf
-- 
Daniel J Blueman


Re: file/extent checksums for dedup/sync...

2010-01-27 Thread Daniel J Blueman
On Wed, Jan 27, 2010 at 12:30 PM, Andi Kleen a...@firstfloor.org wrote:
 Daniel J Blueman daniel.blue...@gmail.com writes:

 For purposes of data deduplication and data synchronisation, it would
 be a powerful tool to expose file data checksums.

 Since eg BTRFS uses the crc32c algorithm [1], it's possible to compute
 the file's overall CRC from the accumulation of the CRCs from all its
 extents.

 For now, exposing this via an IOCTL may be sufficient, though any
 ideas for introducing it in a more standard way? (it's a pity that
 when stat64 was introduced, reserved fields weren't added)

 The problem of doing it in any standard way is that it would
 hard code the way the file system does checksums in the applications.
 So the file system could never change it without breaking
 user space.

I guess the filesystem would need to express this in the resulting
data-structure, eg:
 - type 1 corresponds to using the crc32c algorithm with starting seed
N and accumulating ascending over data extents, padding with modulus
remainder or sparse holes with 0
 - type 2 etc
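
To make 'type 1' concrete, a rough sketch of such an accumulation in
user space. The extent list format, function names and default seed are
assumptions for illustration only, and say nothing about how BTRFS
stores its checksums on disk:

```python
def _make_table():
    """Lookup table for reflected CRC-32C (Castagnoli polynomial 0x82F63B78)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_update(crc, data):
    """Feed data into a running (not yet finalised) CRC register."""
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def crc32c(data, seed=0xFFFFFFFF):
    """One-shot CRC-32C of data."""
    return crc32c_update(seed, data) ^ 0xFFFFFFFF


def file_crc32c(extents, size, seed=0xFFFFFFFF):
    """Accumulate over (offset, bytes) extents in ascending offset order.

    Sparse holes, and any tail up to size, are fed in as zero bytes.
    Overlapping extents are not handled (bytes() raises ValueError).
    """
    crc, pos = seed, 0
    for offset, data in sorted(extents):
        crc = crc32c_update(crc, bytes(offset - pos))  # zero-fill the hole
        crc = crc32c_update(crc, data)
        pos = offset + len(data)
    crc = crc32c_update(crc, bytes(size - pos))
    return crc ^ 0xFFFFFFFF
```

Any two implementations that agree on the seed, the extent order and
the zero padding will produce the same value, which is what the type
field would need to promise.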

The next question is whether filesystem (eg BTRFS) compression comes
before or after checksumming.
-- 
Daniel J Blueman


Re: Benchmarking btrfs on HW Raid ... BAD

2009-09-28 Thread Daniel J Blueman
On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer fwei...@bfk.de wrote:
 * Tobias Oetiker:

 Running this on a single disk, I get the quite acceptable results.
 When running on-top of a Areca HW Raid6 (lvm partitioned)
 then both read and write performance go down by at least 2
 magnitudes.

 Does the HW RAID use write caching (preferably battery-backed)?

I believe Areca controllers have an option for writeback or
writethrough caching, so it's worth checking this and that you're
running the current firmware, in case of errata. Ironically, disabling
writeback will give the OS tighter control of request latency, but
throughput may drop a lot. I still can't help thinking that this is
down to the behaviour of the controller, due to the 1-disk case
working well.

One way to test this would be to configure the array as 6 or 7 devices,
and allow BTRFS/DM to manage the array, then see if performance under
write load is better, and with or without writeback caching...

Daniel
-- 
Daniel J Blueman


Re: reiserfs3/ext4/btrfs RAID read performance

2009-09-20 Thread Daniel J Blueman
On Sep 20, 11:50 am, wbr...@gmail.com wrote:
 On Sun, Sep 20, 2009 at 3:47 AM, Daniel J Blueman

 daniel.blue...@gmail.com wrote:
  On Sep 19, 7:20 pm, wbr...@gmail.com wrote:

  RAID details:
 
  md8 : active raid10 sda7[0] sdd7[3] sdc7[2] sdb7[1]
62925824 blocks 256K chunks 2 far-copies [4/4] []
 
  Ext4:
  mkfs.ext4 -E stride=64,stripe-width=128 /dev/md8
  mount -t ext4 -o noatime,auto_da_alloc,commit=600 /dev/md8 /mnt/md8

Here, stripe-width should be 4* stride, not that it'll make much
difference.
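
For reference, the usual arithmetic behind these two mkfs.ext4 values,
as a rough sketch. A 4KB filesystem block size is assumed, and the
number of disks to multiply by is left as a parameter, since that is
the point in question for a far-layout RAID 10:

```python
def ext4_raid_hints(chunk_kib, block_kib, stripe_disks):
    """Return (stride, stripe_width) in filesystem blocks."""
    stride = chunk_kib // block_kib        # fs blocks per RAID chunk
    stripe_width = stride * stripe_disks   # fs blocks per full stripe
    return stride, stripe_width


print(ext4_raid_hints(256, 4, 2))  # (64, 128), the values used above
print(ext4_raid_hints(256, 4, 4))  # (64, 256), with 4 * stride
```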

  Reiser3:
  mount -t reiserfs /dev/md8 /mnt/md8
  mount -t reiserfs -o noatime,notail /dev/md8 /dev/md8
 
  Ext4 results:
  intial create total runs 10 avg 172.76 MB/s (user 0.43s sys 0.60s)
  create total runs 14 avg 36.49 MB/s (user 0.42s sys 0.59s)
  patch total runs 15 avg 15.16 MB/s (user 0.24s sys 0.49s)
  compile total runs 14 avg 64.07 MB/s (user 0.10s sys 0.59s)
  clean total runs 10 avg 393.43 MB/s (user 0.02s sys 0.06s)
  read tree total runs 11 avg 20.47 MB/s (user 0.53s sys 0.74s)
  read compiled tree total runs 4 avg 32.94 MB/s (user 0.61s sys 1.17s)
  delete tree total runs 10 avg 2.51 seconds (user 0.24s sys 0.42s)
  delete compiled tree total runs 4 avg 2.63 seconds (user 0.28s sys 0.50s)
  stat tree total runs 11 avg 1.99 seconds (user 0.23s sys 0.18s)
  stat compiled tree total runs 7 avg 2.11 seconds (user 0.27s sys 0.21s)
 
  Reiser3 results:
  intial create total runs 10 avg 82.74 MB/s (user 0.45s sys 1.13s)
  create total runs 14 avg 28.54 MB/s (user 0.45s sys 1.19s)
  patch total runs 15 avg 10.91 MB/s (user 0.24s sys 0.86s)
  compile total runs 14 avg 47.49 MB/s (user 0.10s sys 1.27s)
  clean total runs 10 avg 270.21 MB/s (user 0.02s sys 0.15s)
  read tree total runs 11 avg 26.33 MB/s (user 0.54s sys 0.81s)
  read compiled tree total runs 4 avg 41.82 MB/s (user 0.62s sys 1.36s)
  delete tree total runs 10 avg 3.38 seconds (user 0.24s sys 0.72s)
  delete compiled tree total runs 4 avg 4.14 seconds (user 0.27s sys 0.88s)
  stat tree total runs 11 avg 2.09 seconds (user 0.22s sys 0.18s)
  stat compiled tree total runs 7 avg 2.27 seconds (user 0.25s sys 0.21s)
 
  It would be interesting to also compare against BTRFS if on 2.6.30 or
  newer, if you can.

 BTRFS 2.6.31

 mkfs.btrfs -d raid10 -m raid10 /dev/sda7 /dev/sdb7 /dev/sdc7 /dev/sdd7
 mount -t btrfs -o noatime /dev/sda7 /mnt/md8

 intial create total runs 10 avg 158.85 MB/s (user 0.45s sys 0.93s)
 create total runs 14 avg 32.67 MB/s (user 0.44s sys 0.90s)
 patch total runs 15 avg 8.91 MB/s (user 0.22s sys 0.84s)
 compile total runs 14 avg 61.02 MB/s (user 0.09s sys 0.50s)
 clean total runs 10 avg 245.12 MB/s (user 0.02s sys 0.18s)

 read tree total runs 11 avg 14.03 MB/s (user 0.48s sys 0.87s)
 read compiled tree total runs 4 avg 29.14 MB/s (user 0.54s sys 1.37s)

 delete tree total runs 10 avg 9.77 seconds (user 0.28s sys 1.37s)
 delete compiled tree total runs 4 avg 11.91 seconds (user 0.31s sys 1.60s)
 stat tree total runs 11 avg 4.36 seconds (user 0.25s sys 0.33s)
 stat compiled tree total runs 7 avg 5.29 seconds (user 0.29s sys 0.37s)

Not ext4 specific, but I found that 64KB chunk-size RAID 10 (layout
f2 if using MD) and increasing the readahead of /dev/mdX to drives*256
gave the best performance on a 4-drive SATA array. Consider using
aligned partitions (ext4 has internal alignment, I don't think BTRFS
does), placed at the outer edge of identical disks if using non-SSDs.

Thanks,
  Daniel
--
Daniel J Blueman


Re: btrfs csum failed on git .pack file

2009-09-09 Thread Daniel J Blueman
On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe jens.ax...@oracle.com wrote:
 On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
 On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
  On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
   On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
 Just got this error today in my dmesg:
 btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 
 43905798

 linux % find . -inum 1483065
 ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack

 It's the main pack file from my git linux kernel tree:

   
Hmm, I ran into something very similar. Care to check what the 
corrupted
block of data looks like (and how big it is)?
  
   I've already deleted the file in question unfortunately.
   On IRC Chris decided that either bad RAM or a harddrive error was the
   most likely reason for this checksum mismatch.
 
  Darn, that's too bad. The corruption issue I had was also in a git pack
  file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
  in the file, and I blamed it on the (cheap) SSD drive that hosted the
  local git repo. It's still the most likely explanation given the nature
  of the problem, however it would have been really interesting to see
  what corruption you had.

 If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
 be using the same hardware (30GB Vertex in my case).

 Spooky, yes indeed that's the very same drive I'm using. Also see my
 postings on this very issue here, top two entries:

 http://axboe.livejournal.com/

 So that pretty much looks like it reaffirms some of my suspicions. Is
 the drive in a laptop that you suspend and resume?

If you're on firmware < 1.30, the changelog includes some fixes which
may be relevant, eg if block 0 is relevant, or you're
suspending/resuming:

- Race condition occurred during soft reset handler
- If read fail occurs during reading stamp information, firmware
corrupted block 0.
- Power off recovery had bug in certain circumstances

http://www.ocztechnologyforum.com/forum/showthread.php?t=57516
-- 
Daniel J Blueman


Re: btrfs csum failed on git .pack file

2009-09-09 Thread Daniel J Blueman
On Wed, Sep 9, 2009 at 9:26 AM, Jens Axboe jens.ax...@oracle.com wrote:
 On Wed, Sep 09 2009, Daniel J Blueman wrote:
 On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe jens.ax...@oracle.com wrote:
  On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
  On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
   On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
 On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
  Just got this error today in my dmesg:
  btrfs csum failed ino 1483065 off 158482432 csum 4283543305 
  private 43905798
 
  linux % find . -inum 1483065
  ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
 
  It's the main pack file from my git linux kernel tree:
 

 Hmm, I ran into something very similar. Care to check what the 
 corrupted
 block of data looks like (and how big it is)?
   
I've already deleted the file in question unfortunately.
On IRC Chris decided that either bad RAM or a harddrive error was the
most likely reason for this checksum mismatch.
  
   Darn, that's too bad. The corruption issue I had was also in a git pack
   file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
   in the file, and I blamed it on the (cheap) SSD drive that hosted the
   local git repo. It's still the most likely explanation given the nature
   of the problem, however it would have been really interesting to see
   what corruption you had.
 
  If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
  be using the same hardware (30GB Vertex in my case).
 
  Spooky, yes indeed that's the very same drive I'm using. Also see my
  postings on this very issue here, top two entries:
 
  http://axboe.livejournal.com/
 
  So that pretty much looks like it reaffirms some of my suspicions. Is
  the drive in a laptop that you suspend and resume?

 If you're on firmware < 1.30, the changelog includes some fixes which
 may be relevant, eg if block 0 is relevant, or you're
 suspending/resuming:

 - Race condition occurred during soft reset handler
 - If read fail occurs during reading stamp information, firmware
 corrupted block 0.
 - Power off recovery had bug in certain circumstances

 http://www.ocztechnologyforum.com/forum/showthread.php?t=57516

 The issue is pretty much moot at this point, since OCZ support were not
 really interested in providing any sort of real technical support to
 find out what really caused this issue. My main worry was reliability of
 these cheaper SSD drives, and that worry is still not resolved. If you
 read the blog entries, I do comment on the apparently scary basic bugs
 that are still being fixed on the Indilinx controllers. I do expect some
 basic level of data integrity from a consumer product and at least some
 interest in resolving weird corruption issues if things go wrong. Since
 OCZ cannot provide anything like that, I have a hard time recommending
 these drives for anything but very casual use. Fast, cheap, reliable.
 Pick any two.

 My drive was running 1.10 at the time of the problem.

It looks like we need a small tool which performs patterned block I/O
to the device, updating a checksum as it goes, and performing
integrity sweeps at intervals, at a lower level than fsx. Either the
device can be trusted or it can't.
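
Roughly what I have in mind, as a minimal sketch rather than a finished
tool. The block layout (index, generation, payload, CRC32) and all
names are made up for illustration; a real version would target the raw
device with O_DIRECT so that sweeps are not served from the page cache:

```python
import hashlib
import os
import struct
import zlib

BLOCK = 4096                    # bytes per test block (assumed)
HEADER = struct.Struct("<QQ")   # block index, generation
TRAILER = struct.Struct("<I")   # CRC32 of header + payload


def make_block(index, generation):
    """Build the deterministic, self-describing block for (index, generation)."""
    header = HEADER.pack(index, generation)
    payload_len = BLOCK - HEADER.size - TRAILER.size
    digest = hashlib.sha256(header).digest()
    payload = (digest * (payload_len // len(digest) + 1))[:payload_len]
    crc = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(crc)


def write_pattern(path, nblocks, generation):
    """Write nblocks patterned blocks to path and flush them out."""
    with open(path, "wb") as f:
        for index in range(nblocks):
            f.write(make_block(index, generation))
        f.flush()
        os.fsync(f.fileno())


def sweep(path, nblocks, generation):
    """Return the indexes of blocks that no longer match what was written."""
    bad = []
    with open(path, "rb") as f:
        for index in range(nblocks):
            if f.read(BLOCK) != make_block(index, generation):
                bad.append(index)
    return bad
```

Running write_pattern() once, then sweep() at intervals (and across
suspend/resume cycles), would show which blocks changed underneath us;
the index in each header also helps spot a block that landed in the
wrong place.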

I had a problem like this with nVidia CK804/MCP55 chipsets corrupting
data under a triple-edge case workload.
-- 
Daniel J Blueman


Re: [2.6.31-rc6, BTRFS] potential memory leaks...

2009-08-16 Thread Daniel J Blueman
On Fri, Aug 14, 2009 at 5:35 PM, Catalin Marinas catalin.mari...@arm.com wrote:
 Daniel J Blueman daniel.blue...@gmail.com wrote:
 There is a good chance that the BTRFS kmemleak reports using 2.6.31-rc6
 [1] are false-positives, due to the overwriting of the static pointers
 [2]. Does this ring true with anyone else?

 If you do a few echo scan > /sys/kernel/debug/kmemleak, do they
 disappear?

 The static pointers are scanned by kmemleak, unless they are in the
 .data.init section (which is removed anyway).

The reports I picked above _are_ indeed transient.

This is directed more at LKML, but on every mount (at least on ext4 and
BTRFS), we do see persistent reports [1], even after scanning,
unmounting and scanning again.

Daniel

--- [1]

unreferenced object 0x88006133d260 (size 32):
  comm mount, pid 27209, jiffies 4300626089
  backtrace:
[810de2f1] create_object+0x141/0x2d0
[810de5c5] kmemleak_alloc+0x55/0x60
[810dadfb] __kmalloc_track_caller+0x17b/0x260
[810c1429] kstrdup+0x39/0x70
[a01d2b9a] btrfs_parse_options+0x5a/0x3a0 [btrfs]
[a01edf45] open_ctree+0x9c5/0x13f0 [btrfs]
[a01d24ec] btrfs_get_sb+0x3fc/0x500 [btrfs]
[810e2478] vfs_kern_mount+0x58/0xd0
[810e255e] do_kern_mount+0x4e/0x110
[810fc55a] do_mount+0x2ca/0x8d0
[810fcc1b] sys_mount+0xbb/0xf0
[8100bdeb] system_call_fastpath+0x16/0x1b

unreferenced object 0x88007a5d9c80 (size 128):
  comm mount, pid 2030, jiffies 4294963872
  backtrace:
[810de2f1] create_object+0x141/0x2d0
[810de5c5] kmemleak_alloc+0x55/0x60
[810da14b] __kmalloc+0x18b/0x270
[81162c8d] ext4_mb_init+0x1cd/0x670
[8114e363] ext4_fill_super+0x1883/0x2810
[810e388d] get_sb_bdev+0x17d/0x1b0
[8114a8c3] ext4_get_sb+0x13/0x20
[810e2478] vfs_kern_mount+0x58/0xd0
[810e255e] do_kern_mount+0x4e/0x110
[810fc55a] do_mount+0x2ca/0x8d0
[810fcc1b] sys_mount+0xbb/0xf0
[8100bdeb] system_call_fastpath+0x16/0x1b

unreferenced object 0x88006071a000 (size 8192):
  comm mount, pid 27460, jiffies 4303151389
  backtrace:
[810de2f1] create_object+0x141/0x2d0
[810de5c5] kmemleak_alloc+0x55/0x60
[810d9f73] kmem_cache_alloc+0x153/0x1a0
[81171ec7] journal_init_common+0x1e7/0x2a0
[81172995] jbd2_journal_init_inode+0x15/0x1b0
[8114e869] ext4_fill_super+0x1d89/0x2810
[810e388d] get_sb_bdev+0x17d/0x1b0
[8114a8c3] ext4_get_sb+0x13/0x20
[810e2478] vfs_kern_mount+0x58/0xd0
[810e255e] do_kern_mount+0x4e/0x110
[810fc55a] do_mount+0x2ca/0x8d0
[810fcc1b] sys_mount+0xbb/0xf0
[8100bdeb] system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman


[2.6.31-rc6, BTRFS] potential memory leaks...

2009-08-14 Thread Daniel J Blueman
There is a good chance that the BTRFS kmemleak reports using 2.6.31-rc6
[1] are false-positives, due to the overwriting of the static pointers
[2]. Does this ring true with anyone else?

--- [1]

unreferenced object 0x88001eda7000 (size 168):
  comm rm, pid 14794, jiffies 4301710929
  backtrace:
[810de2f1] create_object+0x141/0x2d0
[810de5c5] kmemleak_alloc+0x55/0x60
[810d9f73] kmem_cache_alloc+0x153/0x1a0
[811df959] alloc_extent_state+0x19/0x70
[811e1eb3] clear_extent_bit+0x233/0x2e0
[811e20ee] try_release_extent_state+0x7e/0xa0
[811bf7f3] btree_releasepage+0x63/0xa0
[810ad0be] try_to_release_page+0x2e/0x60
[810b872c] invalidate_mapping_pages+0x1ac/0x1c0
[811b746a] __btrfs_free_extent+0x56a/0x8e0
[811b7c9d] run_one_delayed_ref+0x4bd/0x4f0
[811b9a8f] run_clustered_refs+0xcf/0x360
[811b9de6] btrfs_run_delayed_refs+0xc6/0x1f0
[811c4129] __btrfs_end_transaction+0x59/0x130
[811c421b] btrfs_end_transaction+0xb/0x10
[811cc6d2] btrfs_delete_inode+0x112/0x130

unreferenced object 0x88006c5912a0 (size 168):
  comm make, pid 3983, jiffies 4296054079
  backtrace:
[810de2f1] create_object+0x141/0x2d0
[810de5c5] kmemleak_alloc+0x55/0x60
[810d9f73] kmem_cache_alloc+0x153/0x1a0
[811df959] alloc_extent_state+0x19/0x70
[811e0cae] set_extent_bit+0x1ee/0x390
[811e1a73] lock_extent+0x73/0xa0
[811e27a7] __extent_read_full_page+0x97/0x610
[811e3119] read_extent_buffer_pages+0x3f9/0x540
[811bf75f] readahead_tree_block+0x4f/0x60
[811a7e03] read_block_for_search+0x2f3/0x3b0
[811b003b] btrfs_next_leaf+0x28b/0x3f0
[811c5cea] btrfs_real_readdir+0x1ca/0x4e0
[810f0580] vfs_readdir+0xb0/0xd0
[810f06f7] sys_getdents+0x87/0xe0
[8100bdeb] system_call_fastpath+0x16/0x1b

unreferenced object 0x88003bf5d800 (size 256):
  comm btrfs-transacti, pid 2060, jiffies 4301667515
  backtrace:
[810de2f1] create_object+0x141/0x2d0
[810de5c5] kmemleak_alloc+0x55/0x60
[810d9f73] kmem_cache_alloc+0x153/0x1a0
[811dfa29] alloc_extent_buffer+0x79/0x3e0
[811bf688] btrfs_find_create_tree_block+0x28/0x30
[811b46a1] btrfs_init_new_buffer+0x31/0x140
[811b4854] btrfs_alloc_free_block+0xa4/0x230
[811ac2d7] __btrfs_cow_block+0x137/0x670
[811acf0f] btrfs_cow_block+0xef/0x1f0
[811af6ba] btrfs_search_slot+0x19a/0x890
[811bd1ee] btrfs_del_csums+0xee/0x2e0
[811b75b9] __btrfs_free_extent+0x6b9/0x8e0
[811b7be2] run_one_delayed_ref+0x402/0x4f0
[811b9a8f] run_clustered_refs+0xcf/0x360
[811b9de6] btrfs_run_delayed_refs+0xc6/0x1f0
[811c4a8a] btrfs_commit_transaction+0x7a/0x750

unreferenced object 0x8800668e8600 (size 256):
  comm btrfs-endio-wri, pid 2053, jiffies 4301877227
  backtrace:
[810de2f1] create_object+0x141/0x2d0
[810de5c5] kmemleak_alloc+0x55/0x60
[810d9f73] kmem_cache_alloc+0x153/0x1a0
[811dfa29] alloc_extent_buffer+0x79/0x3e0
[811bf688] btrfs_find_create_tree_block+0x28/0x30
[811b46a1] btrfs_init_new_buffer+0x31/0x140
[811b4854] btrfs_alloc_free_block+0xa4/0x230
[811ac2d7] __btrfs_cow_block+0x137/0x670
[811acf0f] btrfs_cow_block+0xef/0x1f0
[811af6ba] btrfs_search_slot+0x19a/0x890
[811b0819] btrfs_insert_empty_items+0x69/0xd0
[811b7998] run_one_delayed_ref+0x1b8/0x4f0
[811b9a8f] run_clustered_refs+0xcf/0x360
[811b9de6] btrfs_run_delayed_refs+0xc6/0x1f0
[811c4129] __btrfs_end_transaction+0x59/0x130
[811c421b] btrfs_end_transaction+0xb/0x10

--- [2] fs/btrfs/extent_io.c:46

static struct kmem_cache *extent_state_cache;
static struct kmem_cache *extent_buffer_cache;
-- 
Daniel J Blueman


[2.6.31-rc4] uninitialised memory during read_sb...

2009-07-29 Thread Daniel J Blueman
When mounting a btrfs filesystem on my server running 2.6.31-rc4,
kmemcheck spotted some believed-uninitialised memory [1], 128 bytes
into the inode structure accessed via BTRFS_I [2,3].

The filesystem was created with btrfstools-0.18 under 2.6.30, so this
is perhaps an issue relating to the forward-rolling disk format changes,
or simply relating to the inode size? It should be reproducible.

Thanks,
 Daniel

--- [1]

device fsid bf4baee4f8fc876b-fe3bbc7a5af849a devid 1 transid 29478 /dev/sda1
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory
(88007ac803c0)
b1e01781b5ca66815b90008125456581
 u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u u
 ^

Modules linked in: ath9k snd_hda_codec_realtek mac80211 led_class ath
snd_hda_intel snd_hda_codec snd_pcm snd_timer snd pl2303 soundcore
snd_page_alloc
Pid: 2172, comm: mount Tainted: G        W  2.6.31-rc4-274sd #1 OEM
RIP: 0010:[811be5d3]  [811be5d3] open_ctree+0x673/0x1360
RSP: 0018:88007d769bf8  EFLAGS: 00010246
RAX: 88007ac80670 RBX:  RCX: 88007ac80440
RDX: 821731d0 RSI: 0001 RDI: 821731d0
RBP: 88007d769d28 R08: 7fff R09: 
R10:  R11: 0001 R12: 88007d87d948
R13: 88007d87c000 R14: 88007d15d000 R15: 88007d15a000
FS:  7fa15cd1e780() GS:8800022fc000() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 88007f80cb40 CR3: 7e583000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 4ff0 DR7: 0400
 [811a385c] btrfs_get_sb+0x3fc/0x500
 [810e09b8] vfs_kern_mount+0x58/0xd0
 [810e0a9e] do_kern_mount+0x4e/0x110
 [810fa9ca] do_mount+0x2ca/0x8d0
 [810fb08b] sys_mount+0xbb/0xf0
 [8100bdeb] system_call_fastpath+0x16/0x1b
 [] 0x

--- [2]

811bdf60 open_ctree:
open_ctree():
...
/store/kernel/linux/fs/btrfs/disk-io.c:1610
811be5b7:       49 8b 85 40 19 00 00    mov    0x1940(%r13),%rax
811be5be:       48 8b 80 28 02 00 00    mov    0x228(%rax),%rax
811be5c5:       4c 89 a0 e8 00 00 00    mov    %r12,0xe8(%rax)
BTRFS_I():
/store/kernel/linux/fs/btrfs/btrfs_inode.h:147
811be5cc:       49 8b 8d 40 19 00 00    mov    0x1940(%r13),%rcx   ---
rb_set_parent():
/store/kernel/linux/include/linux/rbtree.h:125
811be5d3:       48 8b 41 80             mov    -0x80(%rcx),%rax
811be5d7:       48 8d 51 80             lea    -0x80(%rcx),%rdx
811be5db:       83 e0 03                and    $0x3,%eax
811be5de:       48 09 c2                or     %rax,%rdx
811be5e1:       48 89 51 80             mov    %rdx,-0x80(%rcx)

--- [3]

static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
{
       return container_of(inode, struct btrfs_inode, vfs_inode);
}
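
For readers unfamiliar with container_of(), a small user-space model of
the pointer arithmetic using ctypes. The two structures are
hypothetical stand-ins, not the real kernel layouts, and the 0x80 bytes
of padding are chosen only to echo the constant offset seen in [2]:

```python
import ctypes


class Inode(ctypes.Structure):
    """Stand-in for struct inode."""
    _fields_ = [("i_ino", ctypes.c_ulong)]


class BtrfsInode(ctypes.Structure):
    """Stand-in for struct btrfs_inode with the VFS inode embedded in it."""
    _fields_ = [("private", ctypes.c_char * 0x80),
                ("vfs_inode", Inode)]


def container_of(member_addr, outer_type, member_name):
    """Recover the enclosing structure from the address of one of its members."""
    offset = getattr(outer_type, member_name).offset
    return outer_type.from_address(member_addr - offset)


outer = BtrfsInode()
outer.vfs_inode.i_ino = 42
inode_addr = ctypes.addressof(outer.vfs_inode)
recovered = container_of(inode_addr, BtrfsInode, "vfs_inode")
print(ctypes.addressof(recovered) == ctypes.addressof(outer))  # True
```

The point is that BTRFS_I() is pure address arithmetic: it reads
nothing itself, so any uninitialised read has to come from the fields
touched afterwards.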
--
Daniel J Blueman


SSDs and filesystem alignment...

2009-02-22 Thread Daniel J Blueman
Does BTRFS perform any journal and/or filesystem structure alignment
(for benefit to SSD longevity and SSD, RAID array and large-sector
device performance) at present?

ext4's Ted Tso will deliver 128KB alignment with the next release of
e2fsprogs (ie 1.41.4) [1], so perhaps it's a good idea for btrfsprogs
also, if not already available?
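
The arithmetic involved is just rounding offsets up to the alignment
boundary. A minimal sketch, where the 128KB figure comes from the above
and the sector-63 partition start is only a commonly seen example:

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (both in bytes)."""
    return (offset + alignment - 1) // alignment * alignment


ERASE_BLOCK = 128 * 1024

# A partition starting at sector 63 (512-byte sectors) is not aligned...
print(align_up(63 * 512, ERASE_BLOCK))  # 131072

# ...while an already aligned offset is left unchanged.
print(align_up(ERASE_BLOCK, ERASE_BLOCK))  # 131072
```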

Daniel

--- [1]
http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
-- 
Daniel J Blueman


GRUB and BTRFS support...

2009-01-12 Thread Daniel J Blueman
Hi Chris et al,

Firstly, BTRFS is progressing at a very healthy pace and seems to be a
fantastic filesystem so far, and its being pulled into 2.6.29-rc1 is
fine news indeed.

Is there any plan to add BTRFS support to GRUB (v1 or 2)? I've been
unable to find any information so far...

Many thanks,
  Daniel
-- 
Daniel J Blueman