Re: Kernel error during btrfs balance
Hi,

It took me a couple of days, because I needed to patch my kernel first and then issue a rebalance, which ran for more than two days. Nevertheless, the rebalance succeeded without any kernel BUG-messages, so apparently your patch works!

I noticed that at first, the messages were like this:

[79329.526490] btrfs: found 1939 extents
[79375.950834] btrfs: found 1939 extents
[79376.083599] btrfs: relocating block group 352220872704 flags 1
[80052.940435] btrfs: found 3786 extents
[80108.439657] btrfs: found 3786 extents
[80112.325548] btrfs: relocating block group 351147130880 flags 1

Just like I saw during previous balance-runs. Then all of a sudden the messages changed to:

[104178.827594] btrfs allocation failed flags 1, wanted 2013265920
[104178.827599] space_info has 4271198208 free, is not full
[104178.827602] space_info total=214748364800, used=210440957952, pinned=0, reserved=36208640, may_use=3168993280, readonly=0
[104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144 used 0 pinned 0 reserved
[104178.827610] entry offset 1778384896, bytes 86016, bitmap yes
[104178.827612] entry offset 1855827968, bytes 20480, bitmap no
[104178.827614] entry offset 1855852544, bytes 20480, bitmap no
[104178.827617] block group has cluster?: no
[104178.827618] 0 blocks of free space at or bigger than bytes is
[104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024 used 0 pinned 0 reserved
[104178.827624] entry offset 8891924480, bytes 4096, bitmap yes
[104178.827626] block group has cluster?: no
[104178.827628] 0 blocks of free space at or bigger than bytes is
[104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120 used 0 pinned 0 reserved
[104178.827634] block group has cluster?: no

And so on. Does this indicate an error of any sort, or is this expected behaviour?

Kind regards,
Erik.

On 01/21/2011 10:19 AM, Yan, Zheng wrote:
please try patch attached below, Thanks.
---
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b37d723..49d6b13 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1158,6 +1158,7 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
 	new_node->bytenr = dest->node->start;
 	new_node->level = node->level;
 	new_node->lowest = node->lowest;
+	new_node->checked = 1;
 	new_node->root = dest;

 	if (!node->lowest) {
---

On Fri, Jan 21, 2011 at 4:50 PM, Erik Logtenberg e...@logtenberg.eu wrote:
Hi,

I hit the same bug again I think:

[291835.724344] [ cut here ]
[291835.724376] kernel BUG at fs/btrfs/relocation.c:836!
[291835.724401] invalid opcode: [#1] SMP
[291835.724424] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[291835.724461] CPU 0
[291835.724472] Modules linked in: uvcvideo snd_usb_audio snd_usbmidi_lib videodev v4l1_compat snd_rawmidi v4l2_compat_ioctl32 btrfs zlib_deflate libcrc32c sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt tun ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss exportfs nls_utf8 cifs fscache sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm dummy uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm snd_timer i2c_i801 snd shpchp iTCO_wdt iTCO_vendor_support soundcore dell_wmi sparse_keymap snd_page_alloc serio_raw joydev wmi dcdbas microcode usb_storage uas raid1 pata_acpi ata_generic radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[291835.725002]
[291835.725013] Pid: 27386, comm: btrfs Tainted: G I 2.6.37-2.fc15.x86_64 #1
[291835.725062] RIP: 0010:[a0565237] [a0565237] build_backref_tree+0x473/0xd6d [btrfs]
[291835.725126] RSP: 0018:8800373bf9c8 EFLAGS: 00010246
[291835.725152] RAX: 8801367d5100 RBX: 88020b110880 RCX: 0040
[291835.725186] RDX: 0030 RSI: 006dd08d3000 RDI: 880100069820
[291835.725219] RBP: 8800373bfaf8 R08: 8050 R09: 8800373bf980 [291835.725253] R10: 8800373bf918 R11: 88020b110880 R12: 8801367d5100 [291835.725254] R13: 88012c0a24c0 R14: 88021e2013f0 R15: 88021e201cf0 [291835.725254] FS: 7fcb1a6cc760() GS:8800bfa0() knlGS: [291835.725254] CS: 0010 DS: ES: CR0: 8005003b [291835.725254] CR2: 02feeeb8 CR3: 0001c2943000 CR4: 000426e0 [291835.725254] DR0: DR1: DR2: [291835.725254] DR3: DR6: 0ff0 DR7: 0400 [291835.725254] Process btrfs (pid: 27386, threadinfo 8800373be000, task 88022452ae40) [291835.725254] Stack: [291835.725254] ea0004b5a470 ea00 8800373bf9f8
Re: Kernel error during btrfs balance
On Wed, Jan 26, 2011 at 10:04:02AM +0100, Erik Logtenberg wrote: Hi, It took me a couple of days, because I needed to patch my kernel first and then issue a rebalance, which ran for more than two days. Nevertheless, the rebalance succeeded without any kernel BUG-messages, so apparently your patch works! I noticed that at first, the messages were like this: [79329.526490] btrfs: found 1939 extents [79375.950834] btrfs: found 1939 extents [79376.083599] btrfs: relocating block group 352220872704 flags 1 [80052.940435] btrfs: found 3786 extents [80108.439657] btrfs: found 3786 extents [80112.325548] btrfs: relocating block group 351147130880 flags 1 Just like I saw during previous balance-runs. Then all of a sudden the messages changed to: [104178.827594] btrfs allocation failed flags 1, wanted 2013265920 [104178.827599] space_info has 4271198208 free, is not full [104178.827602] space_info total=214748364800, used=210440957952, pinned=0, reserved=36208640, may_use=3168993280, readonly=0 [104178.827606] block group 1107296256 has 5368709120 bytes, 5368582144 used 0 pinned 0 reserved [104178.827610] entry offset 1778384896, bytes 86016, bitmap yes [104178.827612] entry offset 1855827968, bytes 20480, bitmap no [104178.827614] entry offset 1855852544, bytes 20480, bitmap no [104178.827617] block group has cluster?: no [104178.827618] 0 blocks of free space at or bigger than bytes is [104178.827621] block group 8623489024 has 5368709120 bytes, 5368705024 used 0 pinned 0 reserved [104178.827624] entry offset 8891924480, bytes 4096, bitmap yes [104178.827626] block group has cluster?: no [104178.827628] 0 blocks of free space at or bigger than bytes is [104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120 used 0 pinned 0 reserved [104178.827634] block group has cluster?: no And so on. Does this indicate an error of any sort, or is this expected behaviour? 
As far as I know, it means that you've run out of space, and not every block group has been rewritten by the balance process.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- In one respect at least, the Martians are a happy people: --- they have no lawyers.
Re: Kernel error during btrfs balance
[104178.827624] entry offset 8891924480, bytes 4096, bitmap yes [104178.827626] block group has cluster?: no [104178.827628] 0 blocks of free space at or bigger than bytes is [104178.827631] block group 17213423616 has 5368709120 bytes, 5368709120 used 0 pinned 0 reserved [104178.827634] block group has cluster?: no And so on. Does this indicate an error of any sort, or is this expected behaviour? As far as I know, it means that you've run out of space, and not every block group has been rewritten by the balance process. Hugo. It is a 300GB volume with 79GB free. So hardly out of space. Moreover, I started the balance operation with the sole purpose of reclaiming some free space. The volume had like 40GB less free space when balance started, which was used by / reserved for Metadata. Kind regards, Erik. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
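The numbers in the log can be checked by hand, which may help when deciding between "out of space" and "something else". The sketch below is only arithmetic on the logged values; the struct and function names are made up for illustration, and the formula for "free" is an assumption that happens to reproduce the logged 4271198208 (note that may_use is not subtracted). It also shows that the free bytes left in the two block groups listed equal the sum of their "entry" lines (86016 + 20480 + 20480 and 4096), all far smaller than the 2013265920 bytes (1920 MiB) the failed allocation wanted.

```c
#include <assert.h>
#include <stdint.h>

/* Counters as printed in the "space_info total=..." log line.
 * Hypothetical struct, not the kernel's own definition. */
struct space_info_counters {
	uint64_t total, used, pinned, reserved, may_use, readonly;
};

/* Assumption: free = total - used - pinned - reserved - readonly.
 * may_use is deliberately left out; with it left out the result
 * matches the "space_info has ... free" line in the log. */
uint64_t space_info_free(const struct space_info_counters *s)
{
	return s->total - s->used - s->pinned - s->reserved - s->readonly;
}

/* Free bytes in one block group, from its
 * "has N bytes, M used P pinned R reserved" line. */
uint64_t block_group_free(uint64_t size, uint64_t used,
			  uint64_t pinned, uint64_t reserved)
{
	return size - used - pinned - reserved;
}
```

So the data space_info as a whole still has about 4 GB unallocated inside existing block groups, but in the groups shown it is scattered in pieces of a few kilobytes. Whether that should make a balance print these messages is a separate question for the developers.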
Re: Kernel error during btrfs balance
Yesterday I reported a similar problem in this mailing list, in the thread "version". Running kernel 2.6.37 didn't show this error, but running kernel 2.6.38-rc2 ended with errors.

Best regards!
Helmut

Ah, indeed, just like you I use 2.6.38-rc2. Or to be more precise: 2.6.38-0.rc2.git0.1.fc14.x86_64, which is the latest rawhide kernel, with one additional patch, being the one-liner from Zheng Yan.

Kind regards,
Erik.
Re: version (was: btrfs, broken design?)
On 01/21/2011 03:32 PM, Diego Calleja wrote:
On Friday, 21 January 2011 10:54:00 Helmut Hullen wrote:
And I have never seen something like a Changelog - that would be fine too.

Check the wiki, I keep that updated: https://btrfs.wiki.kernel.org/index.php/Main_Page#News

I like the Changelog on the wiki [1] very much, since it shows the most important changes in easily understandable language. Unfortunately, the most recent change is 2.6.35 (August 2010), while we are currently at 2.6.37 (stable) and 2.6.38-rc2 (mainline). Especially 2.6.38(-rc2) contains many interesting new btrfs features and lots of important fixes. It would be very nice if the Changelog could list/explain those.

[1] https://btrfs.wiki.kernel.org/index.php/Changelog

The News [2] section on the Main page does mention one more version, 2.6.37. But the News section is less elaborate than the Changelog, and I also notice that 2.6.36 is not mentioned in the News section. Still, 2.6.36 does contain all kinds of btrfs-related changes.

[2] https://btrfs.wiki.kernel.org/index.php/Main_Page#News

Diego, please don't read anything negative in my comments, I enjoy and respect your work very much! If you could find time to add those latest changes to the wiki, it would be greatly appreciated.

Kind regards,
Erik.
[PATCH] Btrfs: Fix balance panic
Mark the cloned backref_node as checked in clone_backref_node()

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 045c9c2..bef9c22 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1157,6 +1157,7 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
 	new_node->bytenr = dest->node->start;
 	new_node->level = node->level;
 	new_node->lowest = node->lowest;
+	new_node->checked = 1;
 	new_node->root = dest;

 	if (!node->lowest) {
Re: version (was: btrfs, broken design?)
On Wednesday, 26 January 2011 11:13:20 Erik Logtenberg wrote:
Diego, pls don't read anything negative in my comments, I enjoy and respect your work very much! If you could find time to add those latest changes to the wiki, it would be greatly appreciated.

Thanks for your suggestion, I've updated the Changelog and removed the old items from the news section. 2.6.36 didn't have many btrfs changes; there were no new features.
Re: [PATCH] Btrfs: Fix balance panic
Hello, Yan,

You wrote on 26.01.11:

Mark the cloned backref_node as checked in clone_backref_node()

Signed-off-by: Yan, Zheng zheng.z@intel.com
---
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 045c9c2..bef9c22 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1157,6 +1157,7 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
 	new_node->bytenr = dest->node->start;
 	new_node->level = node->level;
 	new_node->lowest = node->lowest;
+	new_node->checked = 1;
 	new_node->root = dest;

 	if (!node->lowest) {
--

Sorry - didn't solve my problem:

-- last lines from dmesg ---
bio too big device sdc (256 240)
bio too big device sdc (256 240)
[...] more than 800 such lines
bio too big device sdc (256 240)
bio too big device sdc (256 240)
bio too big device sdc (256 240)
[ cut here ]
kernel BUG at fs/btrfs/volumes.c:2097!
invalid opcode: [#1]
last sysfs file: /sys/devices/pci:00/:00:07.1/host1/target1:0:0/1:0:0:0/block/sdb/dev
Modules linked in: sg nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_DSCP xt_multiport xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter iptable_mangle ip_tables xt_iprange x_tables nfsd exportfs 8139too 8139cp savagefb fb_ddc i2c_algo_bit vgastate i2c_piix4 piix e100 mii intel_agp intel_gtt agpgart cmd64x video thermal_sys ac battery yenta_socket pcmcia_rsrc pcmcia pcmcia_core thinkpad_acpi hwmon rfkill nvram fuse
Pid: 5991, comm: btrfs Not tainted 2.6.38-rc2-OD1 #2 26478EG/26478EG
EIP: 0060:[c1235264] EFLAGS: 00010282 CPU: 0
EIP is at btrfs_balance+0x2d4/0x2e0
EAX: fffb EBX: cfa58000 ECX: d7ce6c18 EDX: d7ce6818
ESI: d0574000 EDI: cf958400 EBP: cc4e3e9c ESP: cc4e3e38
DS: 007b ES: 007b FS: GS: 00e0 SS: 0068
Process btrfs (pid: 5991, ti=cc4e2000 task=cce6c3c0 task.ti=cc4e2000)
Stack: 99cc 0001 00e4 0010 0100 ccdcc000 cce8d058 aea58000 0001 99cc 0001 0100b790 00e4 0199cc00 0001 e400 ccc85380 ffea
Call Trace:
[c123bcf1] btrfs_ioctl+0x2e1/0x9d0
[c123ba10] ? btrfs_ioctl+0x0/0x9d0
[c10c3f65] do_vfs_ioctl+0x85/0x590
[c10206db] ? do_page_fault+0x17b/0x380
[c10b554b] ? do_sys_open+0xdb/0x110
[c10c44f7] sys_ioctl+0x87/0x90
[c1753d0c] syscall_call+0x7/0xb
Code: 1b ff ff ff 89 f0 e8 cc 75 fb ff 8b 55 b4 8b 82 10 01 00 00 05 74 19 00 00 e8 09 dc 51 00 e9 70 fd ff ff 31 db eb dd 85 c0 74 9d 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 90 55 89 e5 56 53 83 ec 34 3e
EIP: [c1235264] btrfs_balance+0x2d4/0x2e0 SS:ESP 0068:cc4e3e38
---[ end trace ebf8fd68179e0b7a ]---
# end dmesg --

Best regards!
Helmut
btrfs BUG during Ceph cosd open() syscall
Hi, I got this kernel BUG on a server running multiple Ceph cosd instances, during a heavy write load generated by multiple Ceph clients. The server was running the current ceph unstable kernel (a3f5274e535 in git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git). Please let me know what other information you need to make this report useful. -- Jim BUG: unable to handle kernel NULL pointer dereference at 0100 [97221.834832] IP: [a075b3ab] btrfs_drop_inode+0x10/0x36 [btrfs] [97221.834832] PGD 198d6b067 PUD 13d79f067 PMD 0 [97221.834832] Oops: [#1] SMP [97221.834832] last sysfs file: /sys/devices/pci:00/:00:1e.0/:10:0d.0/local_cpus [97221.834832] CPU 3 [97221.834832] Modules linked in: loop btrfs zlib_deflate ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge ] [97221.834832] [97221.834832] Pid: 30295, comm: cosd Not tainted 2.6.37-00017-ga3f5274 #4 0DT097/PowerEdge 1950 [97221.834832] RIP: 0010:[a075b3ab] [a075b3ab] btrfs_drop_inode+0x10/0x36 [btrfs] [97221.834832] RSP: 0018:8801cf205c08 EFLAGS: 00010282 [97221.834832] RAX: a075b39b RBX: 88018490a3a0 RCX: 0001 [97221.834832] RDX: RSI: 819e7ea0 RDI: 88018490a3a0 [97221.834832] RBP: 8801cf205c08 R08: e8ccefa8 R09: [97221.834832] R10: 8801488e9658 R11: R12: 88021b5c6400 [97221.834832] R13: 8801fad145a0 R14: 8801faf8c440 R15: 88017bab9848 [97221.834832] FS: 7f0b011f9940() GS:8800cfcc() knlGS: [97221.834832] CS: 0010 DS: ES: CR0: 8005003b [97221.834832] CR2: 0100 CR3: 0001b8c89000 CR4: 06e0 [97221.834832] DR0: DR1: DR2: [97221.834832] DR3: DR6: 0ff0 DR7: 0400 [97221.834832] Process cosd (pid: 30295, threadinfo 8801cf204000, task 8801488e9610) [97221.834832] Stack: [97221.834832] 8801cf205c28 810fd714 fffb fffb [97221.834832] 8801cf205cd8 a07587e8 8801cf205c48 0102 [97221.834832] 000fcf205c58 88022f5c46a0 8801d1ef8800 8136a638 [97221.834832] Call Trace: [97221.834832] [810fd714] iput+0x5c/0x1e0 [97221.834832] 
[a07587e8] btrfs_new_inode+0x2d3/0x2e5 [btrfs]
[97221.834832] [8136a638] ? _cond_resched+0xe/0x22
[97221.834832] [8136ae20] ? mutex_lock+0x16/0x3a
[97221.834832] [a0756da1] ? start_transaction+0x176/0x1bc [btrfs]
[97221.834832] [a075d1fc] btrfs_create+0xbb/0x1fa [btrfs]
[97221.834832] [810f49e2] vfs_create+0x76/0x96
[97221.834832] [810f56af] do_last+0x24d/0x4d3
[97221.834832] [810f5b16] do_filp_open+0x1e1/0x4c5
[97221.834832] [81031061] ? should_resched+0xe/0x2f
[97221.834832] [8136a638] ? _cond_resched+0xe/0x22
[97221.834832] [811aa669] ? might_fault+0xe/0x10
[97221.834832] [811aa753] ? __strncpy_from_user+0x20/0x4a
[97221.834832] [810e9023] do_sys_open+0x62/0xeb
[97221.834832] [810e90df] sys_open+0x20/0x22
[97221.834832] [81002c2b] system_call_fastpath+0x16/0x1b
[97221.834832] Code: 53 fc 94 e0 4c 89 e7 e8 f6 8a 95 e0 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 48 8b 97 68 fe ff ff 83 ba 00 01 00 00 00 75 12 48 8b 82 28 01 00 00 b9 01 00
[97221.834832] RIP [a075b3ab] btrfs_drop_inode+0x10/0x36 [btrfs]
[97221.834832] RSP 8801cf205c08
[97221.834832] CR2: 0100
[97222.207152] ---[ end trace 32eb84782eb8 ]---
Who wants metadata image from botched filesystem?
Hello.

So I had a filesystem that became broken. On 2.6.37 with for-linus unstable, it was hanging hard when accessing some directories. I created the metadata image and can put it somewhere if you want to use it for something.

Thanks.

--
This message represents the official view of the voices in my head
Re: [RFC][PATCH] Btrfs: New inode number allocator
On 01/26/2011 02:53 AM, Li Zefan wrote:
Here comes the compatibility issue. It's fine to mount old btrfs, because we'll just use the original way to find free ino. But we can't mount new btrfs in older kernels, because the OFFSET makes highest objectid overflow when it is cast to unsigned long in 32bits system. We can store ino extents to a separate btree, and then the new btrfs can be mounted in older kernels, but another problem will arise when remounting it in new kernels - creating new files will probably fail (but not oops) because the ino extent items are not consistent with inode items. If the above behavior (failing to create files) is not acceptable, we'll have to add an incompat flag.

I can't comment on the patch from a technical point of view. However, I want to add a comment about the compatibility issues. I remember that Linus was not happy about a filesystem which is not compatible when the kernel version decreases. IIRC during the switch from ext3 to ext4 there were some issues during a git bisect process.

So my suggestions are:
- don't allow an automatic switch to the new inode allocation policy. It should be the user who forces this switch (via a mount option, for example)
- in case the performance regressions are noticeable, allow the user to use the old policy, which, if I understood correctly, works fine on a 64 bit system [*].

Regards
G.Baroncelli

[*] Supposing files are created continuously at 1000 files per second, it takes 2^64/1000 sec = ~ 585,000,000 years to exhaust all the available inode numbers.
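The footnote's estimate is easy to redo. A small sketch in C, under stated assumptions: UINT64_MAX stands in for 2^64, a year is taken as 365 days, and the function names are invented for this example. At 1000 new files per second it gives roughly 585 million years, the same order of magnitude as the figure in the footnote above. The second function does the same for a 32-bit number space, which is the size of unsigned long on the 32-bit systems mentioned in the quoted text; whether inode numbers really are limited that way in a given setup is not something this sketch can tell.

```c
#include <assert.h>
#include <stdint.h>

/* 365-day year in seconds (31,536,000); leap years are ignored. */
#define SECONDS_PER_YEAR (365ULL * 24 * 60 * 60)
#define SECONDS_PER_DAY  (24ULL * 60 * 60)

/* Whole years until a 64-bit number space is used up when numbers are
 * handed out at a constant rate and never reused. */
uint64_t years_to_exhaust_u64(uint64_t files_per_second)
{
	return UINT64_MAX / files_per_second / SECONDS_PER_YEAR;
}

/* Whole days until a 32-bit number space is used up at the same rate. */
uint64_t days_to_exhaust_u32(uint64_t files_per_second)
{
	return (uint64_t)UINT32_MAX / files_per_second / SECONDS_PER_DAY;
}
```

With 1000 files per second the 64-bit case is of no practical concern, while a 32-bit space would last only about seven weeks at that rate.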
Re: Atomic file data replace API
On Wed, Jan 26, 2011 at 8:30 PM, Chris Mason chris.ma...@oracle.com wrote:
My answer hasn't really changed ;) Replacing file data is a common operation, but it is still surprisingly complex. Again, the truncate is O(size of the file) and it is actually impossible to do this atomically in most filesystems.

Unfortunately life isn't trivial. ;) Given that it's common, it doesn't make sense to have code duplication in lots of apps to implement the temp file rename pattern. If it's too complex to implement in the FS (ATM), would it be possible to implement it in a higher layer?

You don't notice this because xfs/ext34/btrfs (and many others) have code that makes sure a truncate is restarted if you crash. So, it appears to be atomic even though we're really just restarting the operation. In order to have a truncate + replacement of data operation, we'd have to do a disk format change that includes both the truncate and the new data.

I'm not sure why the disk format would have to change. Conceptually, just like the temp file case, you'd write the new data to newly allocated blocks. After (and I guess that's the complex part) they're safely on disk, you update the metadata, in an atomic way. It would look a lot like echo data > file.new ; truncate file ; mv file.new file, but recorded in the FS metadata.

I don't have this in the btrfs roadmap. It would be nice but most people use databases for things that require atomic operations.

Executables and files shouldn't be in a DB.

Olaf
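For reference, the "temp file rename pattern" that applications duplicate today looks roughly like the following. This is a minimal sketch using only POSIX calls (open, write, fsync, rename); the function name and the "<path>.new" naming are assumptions taken from the shell example in the message above, not an existing API. It does not fsync the containing directory, does not retry on EINTR, and does not preserve ownership or permissions of the old file, all of which a careful application would also have to deal with.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace the contents of `path` with `len` bytes from `data`.
 * Returns 0 on success, -1 on error. Hypothetical helper. */
int replace_file_contents(const char *path, const void *data, size_t len)
{
	char tmp[4096];
	const char *p = data;
	int n, fd;

	/* Temporary name "<path>.new" in the same directory, so that
	 * rename() stays within one filesystem. */
	n = snprintf(tmp, sizeof(tmp), "%s.new", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* Write all of the new data to the temporary file. */
	while (len > 0) {
		ssize_t w = write(fd, p, len);
		if (w < 0)
			goto fail;
		p += w;
		len -= (size_t)w;
	}

	/* The new data must be on disk before the name is switched. */
	if (fsync(fd) != 0)
		goto fail;
	if (close(fd) != 0) {
		unlink(tmp);
		return -1;
	}

	/* rename() replaces the old file under its name in one step. */
	if (rename(tmp, path) != 0) {
		unlink(tmp);
		return -1;
	}
	return 0;

fail:
	close(fd);
	unlink(tmp);
	return -1;
}
```

A caller would use it as replace_file_contents("config", buf, buflen); readers of "config" then see either the complete old contents or the complete new contents, never a mix.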
Re: LOOP_GET_STATUS(64) truncates pathnames to 64 chars (was Re: Bug in mkfs.btrfs?!)
Hi,

attached is the answer from Jari Ruusu, (one of?!) the main developers of loop-aes. It seems that checking whether a loop device is mounted by following the link isn't the best idea :)

I'll have time to look deeper into his example around 14.02. I'll then try to fix that issue in mkfs.btrfs. If someone wants to fix it earlier, do it :)

Regards,
Felix

---BeginMessage---
First of all, I am not subscribed to linux-cry...@kernelnewbies.org mailing list. But I do check web archive occasionally, at least for now. If you want loop-AES maintainer to see your posts, please CC my @users.sourceforge.net address, or linux-cry...@vger.kernel.org

Felix Blanke wrote:
If I'm using the unpatched losetup (from util-linux-ng) and looping a device like /dev/disk/by-id/ata-WDC_WD6400AAKS-22A7B2_WD-WCASY7780706-part3 it is using readlink to track down the link to e.g. /dev/sda3. But if I'm using the patched losetup I don't see any readlink in the strace. Because no readlink is used there is a problem with device names which have more than 64 characters (the field lo_name is an array of 64 char). It is truncated like

/dev/loop0: [0010]:6229 (/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GC_CVPO939201JX160AGN-par) encryption=AES128

That does make trouble with e.g. btrfs, because mkfs.btrfs uses the devicename provided by losetup to check if a device is mounted. That can't work with an incomplete devicename.

mkfs.btrfs authors goofed. Using truncated name is doomed to fail. Correct way to check that is to compare loop backing file inode and device numbers to inode and device numbers of suspected file. Here is sample losetup output:

/dev/loop6: [0902]:209029 (/dev/md3) offset=4096 encryption=AES128 multi-key-v3
             ^^^^  ^^^^^^
            device inode

Device number (0902) is hexadecimal. Inode number (209029) is decimal number. /dev/md3 block special device node is present at mounted file system on device 0x0902. Anything that symlinks or hardlinks to that inode 209029 on device 0x0902, is same file or device.
Even if losetup were to follow symlinks for that backing file name, symlink destination would not necessarily be shorter. Although in your case it would be shorter. Also, old versions of cryptoloop do not store the backing file name at all. They use that same space for cipher algorithm name.

Below is source code for correct way for mkfs.btrfs to test loop backing file status. Feel free to forward it to mkfs.btrfs authors.

--
Jari Ruusu 1024R/3A220F51 5B 4B F9 BB D3 3F 52 E9 DB 1D EB E3 24 0E A9 DD

/* loop-backing-dev-check.c / (c) 2011 Jari Ruusu / GNU GPL */

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

/* Returns 1 if loopdev is a loop device backed by suspect, else 0. */
int loop_backing_dev_check(char *loopdev, char *suspect)
{
	int fd, ret = 0;
	struct stat statbuf;
	struct loop_info64 info64;
	struct loop_info info;

	if(!loopdev || !*loopdev || !suspect || !*suspect) goto outret0;
	/* loopdev must be a block device with major number 7 (loop) */
	if(stat(loopdev, &statbuf) || !S_ISBLK(statbuf.st_mode)) goto outret0;
	if((statbuf.st_rdev & 0xfff00) != 0x700) goto outret0;
	/* from here on statbuf describes the suspected backing file */
	if(stat(suspect, &statbuf)) goto outret0;
	if((fd = open(loopdev, O_RDONLY)) < 0) goto outret0;
	if(!ioctl(fd, LOOP_GET_STATUS64, &info64)) {
		if(statbuf.st_dev != info64.lo_device) goto closeret0;
		if(statbuf.st_ino != info64.lo_inode) goto closeret0;
		ret = 1; /* loop backing device/file is same as suspect */
	} else if(!ioctl(fd, LOOP_GET_STATUS, &info)) {
		/* fall back to the older ioctl */
		if(statbuf.st_dev != info.lo_device) goto closeret0;
		if(statbuf.st_ino != info.lo_inode) goto closeret0;
		ret = 1; /* loop backing device/file is same as suspect */
	}
closeret0:
	close(fd);
outret0:
	return ret;
}

/* usage: ./loop-backing-dev-check /dev/loop0 /dev/fd0 */
int main(int argc, char **argv)
{
	if(argc != 3) exit(1);
	if(loop_backing_dev_check(argv[1], argv[2])) {
		printf("loop device %s is associated with %s\n", argv[1], argv[2]);
		exit(0);
	}
	exit(1);
}
---End Message---
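For tools that only have the losetup text output to work with, the "[device]:inode" pair can be pulled out of the line instead of the (possibly truncated) name. A minimal sketch in C, assuming the line layout shown in the two samples in this message; the function name is hypothetical, and querying the loop device with the ioctl as in the program above remains the more direct route.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the backing device number (hexadecimal) and inode number
 * (decimal) from a losetup output line such as
 *   "/dev/loop6: [0902]:209029 (/dev/md3) offset=4096 ..."
 * Returns 1 on success, 0 if the line does not contain "[dev]:inode". */
int parse_losetup_backing(const char *line, unsigned int *dev,
			  unsigned long *ino)
{
	const char *bracket = strchr(line, '[');

	if (bracket == NULL)
		return 0;
	/* %x reads the hexadecimal device number, %lu the decimal inode. */
	return sscanf(bracket, "[%x]:%lu", dev, ino) == 2;
}
```

For the sample line above this yields device 0x0902 and inode 209029, which is enough to identify the backing file even when the name in parentheses has been cut off at 64 characters.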
2.6.38-rc2 oops's when rebalancing on different size drives (was Re: version)
On 26/01/11 01:37, Helmut Hullen wrote:
bio too big device sdc (256 240)
bio too big device sdc (256 240)
bio too big device sdc (256 240)
bio too big device sdc (256 240)

Oh dear, those are errors from the block layer, looks like btrfs is doing something wrong there.. :-(

[ cut here ]
kernel BUG at fs/btrfs/volumes.c:2097!

It looks like btrfs isn't handling errors coming back from the block layer - at that point it's just called btrfs_relocate_chunk()..

So my guess is that the rebalancing code is naive and assumes the drives are the same size - but I can't quite follow what the code above that BUG_ON() is doing to verify that..

Chris M. ?

cheers!
Chris

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: full btrfs partition, became unmountable (+ a solution that thankfully worked for me)
Hello Shawn,

it's now performing a sequential read of the volume, which will probably take significantly more time for you than for me (where I was dealing with an image of a 16GB SD card, stored on a recent mechanical SATA disk).

I'm a bit confused by what happens while reading the potential supers. At first the blocks appear valid, then they are all misplaced (meaning the bytenr field != the bytenr from which the block has been read, IOW the block is most probably not part of btrfs structures, from what I understand). From the output before the "will attempt to find useful trees" messages, it seems btrfsck is now doing a sequential read not just of /dev/sde, but also of every single block device?

disk-io.c: try_emergency_tree_fixup() is probably a bit too silent for your use case at the moment. You might want to uncomment the commented-out fprintf there; this will make it very verbose (an extra line per structure block) but will provide clues as to where on disk it is working.

-- Cyrille

On Thursday 27 January 2011 at 01:18 -0500, Shawn Stricker wrote:
any chance of getting a little more informative output? I started the command at about 2250 Eastern and now at 0117 Eastern the command is still running and all of the attached output happened in the first few minutes (under 5).

On Jan 26, 2011, at 2:46 AM, Cyrille Chépélov wrote:
On Tuesday 25 January 2011 at 23:38 -0500, Shawn Stricker wrote:
Not sure where you pulled your source from but a fresh checkout of either master or next of git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git does not compile properly. They both fail with

cc1: warnings being treated as errors
disk-io.c: In function ‘btrfs_read_dev_super’:
disk-io.c:937: error: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’
disk-io.c:957: error: implicit declaration of function ‘uuid_unparse’

am I patching/compiling from the wrong source or is there something I am missing?
uh, I had been compiling with CFLAGS=-g, where the makefile specifies -O2 -Werror. -Werror causes warnings to be treated as errors, which is a good thing in a way (makes sure stuff like this gets caught :) )

fixes are:
* line 937 (patched), should be %llu instead of %lu
* line 957, there should be a prototype for uuid_unparse(), most certainly by including uuid/uuid.h

please try this patch instead. Thanks for the feedback!

-- Cyrille

On Jan 25, 2011, at 1:46 PM, Cyrille Chépélov wrote:
Hello all,

Last Friday, the /var and /home partition on one of my appliances became full. This should normally not be much of a problem, except that after the incident, I had been unable to mount the partition back again. The appliance runs 2.6.32 as provided by Debian during the last two months. The rescue computer runs 2.6.37; both exhibited the same behaviour at mount: an infinite loop-and-abort cycle (I unfortunately did not write down the exact messages, but in a nutshell, there was not enough free space to replay the log, so it aborted).

After pulling the SD card (yes) to break the loop, I ended up with a corrupt file system. Any attempt to mount, debug or fsck (using btrfs-tools 0.19+20100601 as shipped by Debian, or compiled from git 1b444cd2e6ab8dcafdd) aborted with the following message:

btrfs-debug-tree: disk-io.c:741: open_ctree_fd: Assertion `!(! tree_root-node)' failed.

After much scavenging on the disk image, I finally managed to recover, using the (dirty) patch attached here. Since apparently other people had similar issues, I'm posting it in the hope it might be useful.

-- Cyrille

PS: Chris, if btrfs-images of before and after my butcher fix would be useful to you, just let me know.

scavenge.patch
scavenge-2.patch
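Format-string mismatches like the one at disk-io.c:937 tend to come back whenever a 64-bit value is printed with a specifier that only fits one platform. One common way to sidestep them, shown here as a minimal sketch and not as what the scavenge patch actually does, is to keep the value in a uint64_t and use the PRIu64 macro from <inttypes.h>; format_bytenr is a made-up helper for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit byte offset into buf. PRIu64 expands to the right
 * conversion for uint64_t on the platform being compiled for, so the
 * same line builds cleanly with -Werror on 32-bit and 64-bit systems.
 * Returns the value returned by snprintf(). */
int format_bytenr(char *buf, size_t size, uint64_t bytenr)
{
	return snprintf(buf, size, "bytenr %" PRIu64, bytenr);
}
```

The alternative used in the fix list above, %llu, is also fine as long as the argument really is (or is cast to) unsigned long long.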
Re: [RFC][PATCH] Btrfs: New inode number allocator
Chris Mason wrote:
Excerpts from Li Zefan's message of 2011-01-25 20:53:00 -0500:
(WARNING: this patch is not completed or well-tested)

We used to allocate inode number by searching through inode items, but it made the allocation slower and slower as more and more files created. The current code just records the highest objectid in the btree without reusing old inode numbers, which will make the filesystem run out of inode number as we create/delete files. In this patch, free inode numbers are stored in the fs tree with key: [start, BTRFS_INO_EXTENT_KEY, end]

Thanks a lot for working on this, it isn't an easy problem. I think Josef's free space cache for the extent allocation tree is the model you want to use. They are actually solving exactly the same problem:

In the extent allocation tree, a free extent is one with no keys in the tree. In the FS tree, a free inode is one with no keys in the tree.

He has a cache that gets written on a per block group basis for the free extents in that block group. It's a somewhat easier problem to solve in the inode number cache because you don't have the same problem where you need free blocks to store the free block cache ;)

In his code, the cache stores the generation number of the commit that was used to create the cache. If a cache unaware kernel mounts the filesystem and makes changes, we notice on the next mount because the cache generation number doesn't match the filesystem generation number.

It will probably be easiest to dedicate a specific objectid to the inode number cache in each FS tree (say objectid == -12ULL), and then put the caching items directly in the tree under that objectid.

I'd suggest that you also reuse his code to compactly store a range of free extents. It wouldn't be hard to have a simple compression scheme that stored ranges for huge chunks of free inode numbers and did a bitmask for ranges where there are lots of free individual inodes.

I'll take your suggestion and try to implement it. Thanks.
(btw, I'll be off from Feb 29th to Mar 7th for Chinese Spring Festival) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
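The cache scheme discussed in this message (ranges for large runs of free inode numbers, a bitmask where free numbers are scattered, and a generation number to detect a stale cache) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the btrfs implementation: FreeInoCache, build_cache, is_free, cache_is_valid, BITMAP_SPAN and MIN_RUN are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field

BITMAP_SPAN = 64  # hypothetical: inode numbers covered by one bitmap entry
MIN_RUN = 8       # hypothetical: shortest run worth storing as a range


@dataclass
class FreeInoCache:
    generation: int                              # commit generation the cache was built at
    ranges: list = field(default_factory=list)   # inclusive (start, end) pairs
    bitmaps: dict = field(default_factory=dict)  # base inode number -> bitmask


def build_cache(free_inos, generation):
    """Store long runs of free inode numbers as ranges, the rest as bitmask bits."""
    cache = FreeInoCache(generation)
    inos = sorted(set(free_inos))
    i = 0
    while i < len(inos):
        # Extend j to the end of the run of consecutive numbers starting at i.
        j = i
        while j + 1 < len(inos) and inos[j + 1] == inos[j] + 1:
            j += 1
        if j - i + 1 >= MIN_RUN:
            cache.ranges.append((inos[i], inos[j]))
        else:
            for ino in inos[i:j + 1]:
                base = ino - ino % BITMAP_SPAN
                cache.bitmaps[base] = cache.bitmaps.get(base, 0) | (1 << (ino - base))
        i = j + 1
    return cache


def is_free(cache, ino):
    """Look the inode number up in the ranges first, then in the bitmaps."""
    if any(start <= ino <= end for start, end in cache.ranges):
        return True
    base = ino - ino % BITMAP_SPAN
    return bool((cache.bitmaps.get(base, 0) >> (ino - base)) & 1)


def cache_is_valid(cache, fs_generation):
    """A cache written at a different generation than the filesystem is stale."""
    return cache.generation == fs_generation
```

For example, build_cache(list(range(1000, 2000)) + [300, 302, 305], generation=42) keeps one range (1000, 1999) and a single bitmap entry with three bits set. If the filesystem generation has since moved on, cache_is_valid returns False and the cache would have to be rebuilt.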
Re: full btrfs partition, became unmountable (+ a solution that thankfully worked for me)
Any chance of getting a little more informative output? I started the command at about 2250 Eastern and now at 0117 Eastern the command is still running and all of the attached output happened in the first few minutes (under 5).

btrfsck /dev/sde
trying potential super #0 at bytenr 65536
super #0 at bytenr 65536 has better generation 134838 than 0, using that
trying potential super #1 at bytenr 67108864
super #1 at bytenr 67108864 has same generation 134838 than 134838, skipping
warning: super #1 at bytenr 67108864 has different contents!
trying potential super #2 at bytenr 274877906944
super #2 at bytenr 274877906944 has same generation 134838 than 134838, skipping
warning: super #2 at bytenr 274877906944 has different contents!
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 8679965255889070385
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 11385464139938791651
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 9270412280288921994
trying potential super #0 at bytenr 65536
super #0 at bytenr 65536 has better generation 2155 than 0, using that
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 7739426643357674384
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 15592201610856999042
trying potential super #0 at bytenr 65536
super #0 at bytenr 65536 has better generation 134838 than 0, using that
trying potential super #1 at bytenr 67108864
super #1 at bytenr 67108864 has same generation 134838 than 134838, skipping
warning: super #1 at bytenr 67108864 has different contents!
trying potential super #2 at bytenr 274877906944
super #2 at bytenr 274877906944 has same generation 134838 than 134838, skipping
warning: super #2 at bytenr 274877906944 has different contents!
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 13794433748072589868
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 6338804170709571794
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 1827607198315921929
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 0
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 1254821329273892037
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 5355923006792833603
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 15445565961457297964
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 3079817357236378973
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 2007935378006179730
trying potential super #0 at bytenr 65536
invalid magic
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 5729257636792198197
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 9602773462471183673
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 327680
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 0
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 18446744073709551615
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 0
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 0
trying potential super #2 at bytenr 274877906944
got only 0 bytes instead of 2859
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 0
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 0
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 4313900536667142911
trying potential super #0 at bytenr 65536
super #0 at bytenr 65536 has better generation 134838 than 0, using that
trying potential super #1 at bytenr 67108864
super #1 at bytenr 67108864 has same generation 134838 than 134838, skipping
warning: super #1 at bytenr 67108864 has different contents!
trying potential super #2 at bytenr 274877906944
super #2 at bytenr 274877906944 has same generation 134838 than 134838, skipping
warning: super #2 at bytenr 274877906944 has different contents!
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 1142399309793345613
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 6887355887353813266
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 10874904992214108498
trying potential super #0 at bytenr 65536
misplaced block thinks it's at 8679965255889070385
trying potential super #1 at bytenr 67108864
misplaced block thinks it's at 16378195527537296748
trying potential super #2 at bytenr 274877906944
misplaced block thinks it's at 9378314511156802577
trying potential super #0 at bytenr 65536
super #0
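For readers puzzling over the output above: the three bytenr values are the locations of the primary superblock and its first two mirrors, and the "better generation ... using that" / "same generation ... skipping" lines suggest the tool keeps the first copy with the strictly highest generation. The sketch below reproduces that reading. The offset arithmetic matches the three values in the log; the function names (sb_offset, pick_super) and the selection loop are assumptions for illustration and are not taken from the btrfsck source.

```python
SUPER_INFO_OFFSET = 64 * 1024   # primary superblock at 64 KiB (65536)
SUPER_MIRROR_SHIFT = 12         # each further mirror is 2**12 times further out


def sb_offset(mirror):
    """Byte offset of superblock copy `mirror` (0 = primary)."""
    if mirror == 0:
        return SUPER_INFO_OFFSET
    return (16 * 1024) << (SUPER_MIRROR_SHIFT * mirror)


def pick_super(candidates):
    """Pick the copy with the strictly highest generation.

    `candidates` is a list of (mirror, generation) pairs; generation is None
    for a copy that could not be used (misplaced block, invalid magic,
    short read). Returns (mirror, generation), or (None, 0) if nothing usable.
    """
    best, best_gen = None, 0
    for mirror, gen in candidates:
        if gen is None:
            continue            # unusable copy, try the next one
        if gen > best_gen:      # "has better generation ..., using that"
            best, best_gen = mirror, gen
        # equal generation: "has same generation ..., skipping"
    return best, best_gen
```

With the values from the first pass in the log, pick_super([(0, 134838), (1, 134838), (2, 134838)]) keeps super #0, and sb_offset(1) and sb_offset(2) give 67108864 and 274877906944.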
Re: version
Excerpts from Helmut Hullen's message of 2011-01-25 09:37:00 -0500:
> crashes with the dmesg lines
>
> - dmesg ---
> bio too big device sdc (256 240)
> bio too big device sdc (256 240)
> bio too big device sdc (256 240)
> bio too big device sdc (256 240)
> [ cut here ]
> kernel BUG at fs/btrfs/volumes.c:2097!

Ugh, this one is an old friend I thought I had fixed up. The two devices have different limits on the max size of the bio, and we're using one that is too large. I'll get it fixed for the next rc.

-chris
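To illustrate the mismatch described above: if the numbers in the message are read as a 256-sector bio hitting a device that accepts at most 240 sectors (an assumption about the log format), one way to stay safe on a multi-device filesystem is to size I/O by the smallest limit among the devices. The sketch below shows only that idea; it is not the fix that went into the kernel, and common_max_sectors, split_io and the device name sdb are hypothetical.

```python
def common_max_sectors(device_limits):
    """Largest I/O size (in sectors) that every device accepts."""
    return min(device_limits.values())


def split_io(start_sector, nr_sectors, max_sectors):
    """Split one request into (start, length) chunks no larger than max_sectors."""
    chunks = []
    while nr_sectors > 0:
        n = min(nr_sectors, max_sectors)
        chunks.append((start_sector, n))
        start_sector += n
        nr_sectors -= n
    return chunks
```

With limits of {"sdb": 256, "sdc": 240}, the common limit is 240 sectors, and a 256-sector request starting at sector 0 is split into (0, 240) and (240, 16).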