Re: Btrfs moved my root tree off the end of the device

2011-09-08 Thread Justin Gottula
On Wed, Sep 7, 2011 at 10:51 PM, Arne Jansen sensi...@gmx.net wrote:
 I think Liu Bo posted a fix for this a while ago:

 [PATCH] Btrfs: fix a bug of balance on full multi-disk partitions

 -Arne

The behavior that this patch describes looks pretty much exactly like
what I was seeing, although in my case the relocation was triggered by
defragging rather than by multi-device balancing (I'm not sure whether
that ultimately makes any difference, given what he says is responsible
for the problem). I may be able to find a way to reliably reproduce the
buggy behavior; if so, I'll try the bleeding-edge git kernel (which
appears to have this patch by now) and see if the problem goes away.


Justin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs moved my root tree off the end of the device

2011-09-07 Thread C Anthony Risinger
On Wed, Sep 7, 2011 at 6:33 PM, Justin Gottula jus...@jgottula.com wrote:

 Hi,

 I recently created a Btrfs volume on top of a software (mdadm) raid5 array
 (since Btrfs currently lacks raid5 support at the FS level). On this 640 GB
 volume, I stored a ~400 GB tar file. After a couple weeks of use, I used
 'btrfs defragment' on this file in an effort to (a) defrag and (b) compress
 the file. I made sure I was using the latest version of the userspace
 utilities (Btrfs v0.19-35-g1b444cd-dirty) as well as kernel 3.0.

did you use the integration branch, i.e.:

http://git.darksatanic.net/repo/btrfs-progs-unstable.git

... this has the latest code for the time being.  looks like
`integration-20110805` is the most recent head.

 [snip]

 At this point, I took a disk image and dived in, and in doing so discovered
 that somehow, there were CHUNK_ITEMs in the chunk tree that referred to
 physical address ranges that were entirely outside of the device (matching
 up to the ranges showing up in the kernel log over and over). Evidently, the
 filesystem driver thought it should move the root tree onto a chunk that
 existed at a nonexistent offset in the device. I checked the superblocks and
 verified that the total_bytes fields matched up correctly to the actual
 device size, which leaves me wondering how those chunks ever got there.
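The consistency check described above (comparing each chunk's physical stripe range with the device size recorded in the superblock's total_bytes) can be sketched roughly as follows. This is a hypothetical illustration only: the `Stripe` tuple and the `stripes_past_end` helper are made up for this example, and the stripe offsets are invented sample values, not data decoded from the reporter's disk image. The device size is assumed to equal the `limit=` value from the kernel log (1248171999 sectors of 512 bytes).

```python
from typing import List, NamedTuple


class Stripe(NamedTuple):
    """Hypothetical, simplified view of one stripe of a chunk item."""
    devid: int
    physical: int  # byte offset of the stripe on the device
    length: int    # stripe length in bytes


def stripes_past_end(stripes: List[Stripe], total_bytes: int) -> List[Stripe]:
    """Return the stripes whose [physical, physical + length) range
    extends beyond total_bytes, i.e. off the end of the device."""
    return [s for s in stripes if s.physical + s.length > total_bytes]


# Assumption: device size taken from limit= in the kernel log (512-byte sectors).
DEVICE_BYTES = 1248171999 * 512

# Invented sample stripes: the first fits, the second runs off the end.
example = [
    Stripe(devid=1, physical=637_000_000_000, length=1 << 30),
    Stripe(devid=1, physical=638_500_000_000, length=1 << 30),
]

for s in stripes_past_end(example, DEVICE_BYTES):
    overshoot = s.physical + s.length - DEVICE_BYTES
    print(f"devid {s.devid}: stripe at {s.physical} ends {overshoot} bytes past the device")
```

A real check would read the chunk items and superblock from the image (for example with the btrfs-progs debugging tools); the point here is only the range comparison.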

i could be way off base here, but your report reminded me of:

[thread] http://www.spinics.net/lists/linux-btrfs/index.html#12121

---
extent data backref root 5 objectid 258 offset 18446744073709543424 count 1
extent data backref root 5 objectid 257 offset 0 count 1

 So I think we have to live with this defect, just fix relocation for
 the negative offset case?

I prefer fixing relocation.
---

... which, if i understood correctly, surfaced some issues with
relocation that could cause the offset to be grossly inaccurate (e.g.
off the device completely?)

could of course be completely unrelated :-)
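For reference, the huge offset in the quoted backref, 18446744073709543424, is 2^64 - 8192, so the same 64 bits read as a signed integer give -8192. That is presumably the "negative offset case" mentioned in the quote. A minimal sketch of the reinterpretation (the helper name `u64_to_s64` is hypothetical):

```python
def u64_to_s64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - (1 << 64) if value >= 1 << 63 else value


offset = 18446744073709543424  # from "extent data backref root 5 objectid 258 ..."
print(u64_to_s64(offset))
```

The same result can be had with `struct.unpack('<q', struct.pack('<Q', offset))[0]`; the arithmetic form just makes the two's-complement step explicit.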

-- 

C Anthony


Re: Btrfs moved my root tree off the end of the device

2011-09-07 Thread Arne Jansen
On 08.09.2011 01:33, Justin Gottula wrote:
 Hi,
 
 
 I recently created a Btrfs volume on top of a software (mdadm) raid5 array
 (since Btrfs currently lacks raid5 support at the FS level). On this 640 GB
 volume, I stored a ~400 GB tar file. After a couple weeks of use, I used
 'btrfs defragment' on this file in an effort to (a) defrag and (b) compress
 the file. I made sure I was using the latest version of the userspace
 utilities (Btrfs v0.19-35-g1b444cd-dirty) as well as kernel 3.0.

I think Liu Bo posted a fix for this a while ago:

[PATCH] Btrfs: fix a bug of balance on full multi-disk partitions

-Arne

 Now, this may or may not have had something to do with the drives being at
 55 degrees Celsius, which I only discovered later, but after I had left this
 operation to work overnight, I came back to some scary messages in the
 kernel log.
 Immediately after the operation started (as far as I can tell), the following
 messages showed up in the kernel log:
 
 
 [17055.793912] btrfs: relocating block group 636489826304 flags 1
 
 [17112.566998] attempt to access beyond end of device
 [17112.567003] md127p1: rw=145, want=1248172032, limit=1248171999
 [17112.567004] attempt to access beyond end of device
 [17112.567006] md127p1: rw=145, want=1248172280, limit=1248171999
 [17112.567008] attempt to access beyond end of device
 [17112.567009] md127p1: rw=145, want=1248172416, limit=1248171999
 [17112.567011] attempt to access beyond end of device
 [17112.567012] md127p1: rw=145, want=1248172664, limit=1248171999
 [17112.567014] attempt to access beyond end of device
 [17112.567015] md127p1: rw=145, want=1248172912, limit=1248171999
 [17112.567016] attempt to access beyond end of device
 [17112.567018] md127p1: rw=145, want=1248172928, limit=1248171999
 (thousands more of the above in rapid succession)
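Assuming `want=` and `limit=` are counts of 512-byte sectors (as the block layer reports them for this warning), the limit works out to 1248171999 × 512 = 639,064,063,488 bytes, about 639.1 GB, which matches the 640 GB volume, and the first request reaches 33 sectors beyond it. In the first log line, `flags 1` is BTRFS_BLOCK_GROUP_DATA, i.e. a data block group. A rough sketch for pulling these numbers out of the log follows; the regex and the `overrun` helper are hypothetical, written only for the line format shown here.

```python
import re
from typing import Optional, Tuple

SECTOR = 512  # assumption: want= and limit= are in 512-byte sectors

LINE_RE = re.compile(
    r"(?P<dev>\S+): rw=\d+, want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def overrun(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (device, sectors past the limit, device size in bytes)
    for one 'want=..., limit=...' line, or None if the line doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    want, limit = int(m["want"]), int(m["limit"])
    return m["dev"], want - limit, limit * SECTOR


log = """\
[17112.567003] md127p1: rw=145, want=1248172032, limit=1248171999
[17112.567018] md127p1: rw=145, want=1248172928, limit=1248171999
"""

for line in log.splitlines():
    dev, past, size = overrun(line)
    print(f"{dev}: {past} sectors past the end of a {size / 1e9:.1f} GB device")
```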
 
 
 and occasionally a few of these:
 
 
 [17157.916746] btrfs csum failed ino 258 off 8192 csum 2566472073 private 1679206033
 [17157.916758] btrfs csum failed ino 258 off 12288 csum 2566472073 private 248979876
 [17157.916771] btrfs csum failed ino 258 off 16384 csum 2566472073 private 3790022839
 
 
 Then, later, another burst of the same device access warnings followed by this:
 
 
 [20063.971837] [ cut here ]
 [20063.972050] kernel BUG at fs/btrfs/ctree.c:300!
 [20063.972238] invalid opcode:  [#1] PREEMPT SMP 
 [20063.972666] CPU 0 
 [20063.972669] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm
 drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi
 parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg
 edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep snd_pcm
 snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid
 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
 raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd
 ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod
 [20063.974670] 
 [20063.974670] Pid: 7425, comm: btrfs-endio-wri Not tainted 3.0-ARCH #1 System manufacturer System Product Name/M4N68T-M-V2
 [20063.974670] RIP: 0010:[a01a1df1]  [a01a1df1] update_ref_for_cow+0x331/0x340 [btrfs]
 [20063.974670] RSP: 0018:8800236798b0  EFLAGS: 00010282
 [20063.974670] RAX: fffb RBX: 88002539f000 RCX: 0001bc22b580
 [20063.974670] RDX: 0001bc22b540 RSI: 60ffc24024f0 RDI: eacb7930
 [20063.974670] RBP: 880023679900 R08: a01a121a R09:
 [20063.974670] R10: fffb R11: 0001 R12: 880020a72a00
 [20063.974670] R13: 880037c7f000 R14: 88002c24fc00 R15: 88002367997c
 [20063.974670] FS:  7f6c7ea81740() GS:88003d80() knlGS:
 [20063.974670] CS:  0010 DS:  ES:  CR0: 8005003b
 [20063.974670] CR2: 7fe40de62000 CR3: 25978000 CR4: 06f0
 [20063.974670] DR0:  DR1:  DR2:
 [20063.974670] DR3:  DR6: 0ff0 DR7: 0400
 [20063.974670] Process btrfs-endio-wri (pid: 7425, threadinfo 880023678000, task 88003a43f300)
 [20063.974670] Stack:
 [20063.974670]  1000 880023678000 880023679fd8 880025cb8010
 [20063.974670]  880023679910 88002539f000 88002c24fc00 880020a72a00
 [20063.974670]  880037c7f000 880003955900 8800236799b0 a01a2276
 [20063.974670] Call Trace:
 [20063.974670]  [a01a2276] __btrfs_cow_block+0x476/0x890 [btrfs]
 [20063.974670]  [a01a27a8] btrfs_cow_block+0x118/0x360 [btrfs]
 [20063.974670]  [a01a7e0e] btrfs_search_slot+0x1de/0x900 [btrfs]
 [20063.974670]  [a01bac58] btrfs_lookup_file_extent+0x38/0x40 [btrfs]
 [20063.974670]  [a01d54c4] btrfs_drop_extents+0x104/0xa10 [btrfs]
 [20063.974670]  [a01cadf3]