Re: fresh btrfs filesystem, out of disk space, hundreds of gigs free
Duncan 1i5t5.duncan at cox.net writes:

> Jon Nelson posted on Fri, 21 Mar 2014 19:00:51 -0500 as excerpted:
>
>> Using openSUSE 13.1 on x86_64 which - as of this writing - is 3.11.10,
>> Would a more recent kernel than 3.11 have done me any good?
>
> [Reordered the kernel question from below to here, where you reported
> the running version.]
>
> As both mkfs.btrfs and the wiki recommend, always use the latest
> kernel. In fact, the kernel config's btrfs option had a pretty strong
> warning thru 3.12 that was only toned down in 3.13 as well, so I'd
> definitely recommend at least the latest 3.13.x stable series kernel
> in any case.

I would like to say that your response is one of the most useful and detailed responses I've ever received on a mailing list. Thank you! The "please run the very latest kernel/userland" advice is sort of true for everything, though. Also, I am of the understanding that the openSUSE folks back-port *some* of the btrfs-relevant bits to both the kernel and the userspace tools, but I could be wrong, too.

>> I tried to copy a bunch of files over to a btrfs filesystem (which
>> was mounted as /, in fact). After some time, things ground to a halt
>> and I got out of disk space errors. btrfs fi df / showed about 1TB
>> of *data* free, and 500MB of metadata free.
>
> It's the metadata, plus no space left to allocate more. See below.

Right. Although I did a poor job of noting it, I understood at least that much.

>> Below are the btrfs fi df / and btrfs fi show.
>>
>> turnip:~ # btrfs fi df /
>> Data, single: total=1.80TiB, used=832.22GiB
>> System, DUP: total=8.00MiB, used=204.00KiB
>> System, single: total=4.00MiB, used=0.00
>> Metadata, DUP: total=5.50GiB, used=5.00GiB
>> Metadata, single: total=8.00MiB, used=0.00
>
> FWIW, the system and metadata single chunks reported there are an
> artifact from mkfs.btrfs and aren't used (used=0.00). At some point
> it should be updated to remove them automatically, but meanwhile, a
> balance should remove them from the listing.
> If you do that balance immediately after filesystem creation, at the
> first mount, you'll be rid of them when there's not a whole lot of
> other data on the filesystem to balance as well. That would leave:
>
> Data, single: total=1.80TiB, used=832.22GiB
> System, DUP: total=8.00MiB, used=204.00KiB
> Metadata, DUP: total=5.50GiB, used=5.00GiB
>
> Metadata is the red flag here. Metadata chunks are 256 MiB in size,
> but in default DUP mode, two are allocated at once, thus 512 MiB at a
> time. And you're under 512 MiB free, so you're running on the last
> pair of metadata chunks, which means depending on the operation, you
> may need to allocate metadata pretty quickly. You can probably copy a
> few files before that, but a big copy operation with many files at a
> time would likely need to allocate more metadata.

The size of the chunks allocated is especially useful information. I've not seen that anywhere else, and it does explain a fair bit.

> But for a complete picture you need the filesystem show output,
> below, as well...
>
> turnip:~ # btrfs fi show
> Label: none uuid: 9379c138-b309-4556-8835-0f156b863d29
> Total devices 1 FS bytes used 837.22GiB
> devid 1 size 1.81TiB used 1.81TiB path /dev/sda3
> Btrfs v3.12+20131125
>
> OK. Here we see the root problem. Size 1.81 TiB, used 1.81 TiB. No
> unallocated space at all. Whichever runs out of space first, data or
> metadata, you'll be stuck.

Now it's at this point that I am unclear. I thought the above said: 1 device on this filesystem, 837.22 GiB used, and device ID #1 is /dev/sda3, is 1.81 TiB in size, and btrfs is using 1.81 TiB of that. Which I interpret differently. Can you go into more detail as to how (from btrfs fi show) we can say the _filesystem_ (not the device) is full?

> And as was discussed above, you're going to need another pair of
> metadata chunks allocated pretty quickly, but there's no unallocated
> space available to allocate to them, so no surprise at all you got
> free-space errors! =:^(
>
> Conversely, you have all sorts of free data space.
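On the question of how "fi show" tells us the filesystem is full: btrfs allocates chunks out of the raw device space, so the unallocated pool is the device "size" minus the device "used", and here that difference is zero. The arithmetic, sketched in shell against the figures quoted above (captured copies, not a live filesystem):

```shell
# Both figures below are the ones reported in the thread, not live data.

# Metadata headroom: "Metadata, DUP: total=5.50GiB, used=5.00GiB",
# with DUP metadata allocated in 2 x 256 MiB = 512 MiB steps.
awk 'BEGIN {
    free_mib = (5.50 - 5.00) * 1024
    printf "metadata free: %.0f MiB of a 512 MiB DUP allocation step\n", free_mib
}'

# Device-level headroom from "btrfs fi show": size minus used is the
# unallocated pool that any new chunk pair must come from.
fi_show='devid 1 size 1.81TiB used 1.81TiB path /dev/sda3'
echo "$fi_show" | awk '{ gsub(/TiB/, ""); printf "unallocated: %.2f TiB\n", $4 - $6 }'
```

So the metadata headroom is at most one DUP allocation step, and the pool that step would have to come from is empty.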
> Data space is allocated in gig-size chunks, and you have nearly a TiB
> of free data space, which means there's quite a few nearly empty data
> chunks available. To correct that imbalance and free the extra data
> space to the pool so more metadata can be allocated, you run a
> balance.

In fact, I did try a balance - both a data-only and a metadata-only balance. The metadata-only balance failed. I cancelled the data-only balance early, although perhaps I should have been more patient. I went from a running system to working from a rescue environment -- I was under a bit of time pressure to get things moving again.

> Here, you probably want a balance of the data only, since it's what's
> unbalanced, and on slow spinning rust (as opposed to fast SSD)
> rewriting /everything/, as balance does by default, will take some
> time. To do data only, use the -d option:
>
> # btrfs balance start -d /
>
> (You said it was mounted on root, so
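One note on the full data balance suggested above: balance also accepts a usage filter (-dusage=N, a real balance feature, though it post-dates some of the kernels in this thread) that restricts the rewrite to data chunks at most N% full, which is far faster on spinning disks when most data chunks are nearly empty. Sketched here with the commands printed rather than executed, since a balance rewrites live data:

```shell
# MNT is the mount point from the thread. Printed, not run: a balance
# is a heavyweight operation on a live filesystem.
MNT=/
for cmd in \
    "btrfs balance start -dusage=5 $MNT" \
    "btrfs balance start -d $MNT"
do
    echo "would run: $cmd"
done
```

The filtered form only rewrites mostly-empty data chunks, which is exactly what a 1.80 TiB data allocation holding 832 GiB is made of; the unfiltered form is the fallback on older btrfs-progs.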
fresh btrfs filesystem, out of disk space, hundreds of gigs free
Using openSUSE 13.1 on x86_64 which - as of this writing - is 3.11.10, I tried to copy a bunch of files over to a btrfs filesystem (which was mounted as /, in fact). After some time, things ground to a halt and I got out of disk space errors. btrfs fi df / showed about 1TB of *data* free, and 500MB of metadata free. Below are the btrfs fi df / and btrfs fi show.

I ended up having to reboot the machine. I was not able to get the machine to boot again after that, and ended up having to resort to a rescue environment, at which point I copied everything over to an ext4 filesystem.

This is the first time I have tried btrfs since I experienced (unfixable) corruption a year or so back, with 3.7 and up. I was led to believe that the ENOSPC errors had been resolved. Would a more recent kernel than 3.11 have done me any good?

turnip:~ # btrfs fi df /
Data, single: total=1.80TiB, used=832.22GiB
System, DUP: total=8.00MiB, used=204.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.50GiB, used=5.00GiB
Metadata, single: total=8.00MiB, used=0.00

turnip:~ # btrfs fi show
Label: none uuid: 9379c138-b309-4556-8835-0f156b863d29
Total devices 1 FS bytes used 837.22GiB
devid 1 size 1.81TiB used 1.81TiB path /dev/sda3
Btrfs v3.12+20131125

--
Jon
Software Blacksmith

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
report: my btrfs filesystem failed hard today
I had a btrfs filesystem under 3.9.8 that failed /hard/ today. So hard that the filesystem could not be mounted because there wasn't enough free space, unless it was mounted read-only. This happened after I ran out of metadata space (is there a way to increase the amount of metadata storage?) while still having many gigs free of data space, as reported by btrfs fi df.

I tried balancing the metadata, defrag'ing files (with -czlib) and even tried mounting with -o remount,metadata_ratio={several values}, none of which worked, and then it crashed hard. First, it killed systemd's logger (journald), which refused to start. Following the crash, I was not able to mount the filesystem without -o ro. -o recovery did not work.

The first messages are (when I ran out of disk space):

2013-07-17T12:37:36.159351-05:00 linux-3y3i kernel: [58301.322033] btrfs allocation failed flags 1, wanted 1310720
2013-07-17T12:37:36.159367-05:00 linux-3y3i kernel: [58301.322037] space_info 1 has 19119284224 free, is full
2013-07-17T12:37:36.159369-05:00 linux-3y3i kernel: [58301.322039] space_info total=146070831104, used=126941663232, pinned=643072, reserved=9175040, may_use=43724800, readonly=65536
2013-07-17T12:37:36.159370-05:00 linux-3y3i kernel: [58301.322041] block group 1107296256 has 5368709120 bytes, 4263174144 used 0 pinned 0 reserved
2013-07-17T12:37:36.159371-05:00 linux-3y3i kernel: [58301.322042] entry offset 1107296256, bytes 20082688, bitmap yes
2013-07-17T12:37:36.159372-05:00 linux-3y3i kernel: [58301.322044] entry offset 1241513984, bytes 35172352, bitmap yes
2013-07-17T12:37:36.159373-05:00 linux-3y3i kernel: [58301.322045] entry offset 1246859264, bytes 12288, bitmap no
...
2013-07-17T12:37:36.160185-05:00 linux-3y3i kernel: [58301.322387] block group has cluster?: no
2013-07-17T12:37:36.160186-05:00 linux-3y3i kernel: [58301.322388] 40 blocks of free space at or bigger than bytes is
2013-07-17T12:37:36.160188-05:00 linux-3y3i kernel: [58301.322390] block group 8623489024 has 5368709120 bytes, 4745510912 used 0 pinned 0 reserved
... and a few hundred thousand more lines

I also tried 3.7 and 3.10.1 (although I could not try 3.10.1 to mount the filesystem after it finally went boom). Lots and lots (10's of thousands) of:

2013-07-17T13:25:24.690135-05:00 linux-3y3i kernel: [ 493.772797] [ cut here ]
2013-07-17T13:25:24.690137-05:00 linux-3y3i kernel: [ 493.772825] WARNING: at fs/btrfs/extent-tree.c:6556 btrfs_alloc_free_block+0x3bd/0x3d0 [btrfs]()
2013-07-17T13:25:24.690138-05:00 linux-3y3i kernel: [ 493.772826] Hardware name: 4239CTO
2013-07-17T13:25:24.690140-05:00 linux-3y3i kernel: [ 493.772828] btrfs: block rsv returned -28
2013-07-17T13:25:24.690141-05:00 linux-3y3i kernel: [ 493.772829] Modules linked in: fuse ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs reiserfs xt_tcpudp xt_pkttype xt_LOG af_packet xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables arc4 iwldvm snd_hda_codec_hdmi mac80211 iTCO_wdt snd_hda_codec_conexant iTCO_vendor_support mperf snd_usb_audio snd_usbmidi_lib coretemp snd_rawmidi snd_seq_device snd_hda_intel kvm_intel kvm microcode pcspkr snd_hda_codec snd_hwdep snd_pcm iwlwifi sr_mod sdhci_pci sdhci mmc_core cfg80211 joydev cdrom snd_timer e1000e snd_page_alloc thinkpad_acpi lpc_ich mfd_core rfkill i2c_i801 ptp pps_core wmi snd soundcore battery tpm_tis ac tpm tpm_bios tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor libcrc32c sha256_generic dm_crypt dm_mod ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul i915 drm_kms_helper drm thermal i2c_algo_bit video button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix
2013-07-17T13:25:24.690144-05:00 linux-3y3i kernel: [ 493.772882] Pid: 334, comm: btrfs-balance Tainted: GW 3.9.8-1.gfccf19c-desktop #3
2013-07-17T13:25:24.690145-05:00 linux-3y3i kernel: [ 493.772883] Call Trace:
2013-07-17T13:25:24.690146-05:00 linux-3y3i kernel: [ 493.772895] [81004728] dump_trace+0x88/0x300
2013-07-17T13:25:24.690148-05:00 linux-3y3i kernel: [ 493.772902] [815c8e9b] dump_stack+0x69/0x6f
2013-07-17T13:25:24.690149-05:00 linux-3y3i kernel: [ 493.772907] [81046069] warn_slowpath_common+0x79/0xc0
2013-07-17T13:25:24.690150-05:00 linux-3y3i kernel: [ 493.772912] [81046165] warn_slowpath_fmt+0x45/0x50
2013-07-17T13:25:24.690151-05:00 linux-3y3i kernel: [ 493.772923] [a0224fad] btrfs_alloc_free_block+0x3bd/0x3d0 [btrfs]
2013-07-17T13:25:24.690152-05:00 linux-3y3i kernel: [ 493.772955] [a0210685] __btrfs_cow_block+0x135/0x560 [btrfs]
2013-07-17T13:25:24.690153-05:00 linux-3y3i kernel: [ 493.772974] [a0210c5c]
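For readers wondering about the metadata_ratio attempts mentioned in the report above: as I read the btrfs mount-option documentation, metadata_ratio=N asks the allocator to force a metadata chunk allocation after roughly every N data chunk allocations (hedged reading; check btrfs(5) on your system). A hypothetical reconstruction of the "{several values}" remount loop, with the commands printed rather than executed since they only make sense on the affected system (/mnt is a stand-in for the real mount point):

```shell
# Print, don't run: remounting with metadata_ratio is system-specific.
for ratio in 1 2 4 8; do
    echo "mount -o remount,metadata_ratio=$ratio /mnt"
done
```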
Re: report: my btrfs filesystem failed hard today
On Wed, Jul 17, 2013 at 6:04 PM, Josef Bacik jba...@fusionio.com wrote:
> On Wed, Jul 17, 2013 at 05:44:23PM -0500, Jon Nelson wrote:
>> I had a btrfs filesystem under 3.9.8 that failed /hard/ today. So
>> hard that the filesystem could not be mounted because there wasn't
>> enough free space, unless it was mounted read-only. This happened
>> after I ran out of metadata space (is there a way to increase the
>> amount of metadata storage?) while still having many gigs free of
>> data space, as reported by btrfs fi df.
>>
>> I tried balancing the metadata, defrag'ing files (with -czlib) and
>> even tried mounting with -o remount,metadata_ratio={several values},
>> none of which worked, and then it crashed hard. First, it killed
>> systemd's logger (journald), which refused to start. Following the
>> crash, I was not able to mount the filesystem without -o ro.
>> -o recovery did not work.
>
> Can you try btrfs-next, I did some work in this area in the last few
> months.

Unfortunately, I could not wait. I really needed that system back up, so I copied everything off, reformatted with ext4, and copied it all back. I (probably?) have more logs, though.

--
Jon
Software Blacksmith
Re: hang on 3.9, 3.10-rc5
On Fri, Jun 21, 2013 at 6:30 AM, Chris Mason chris.ma...@fusionio.com wrote:
> Quoting Jon Nelson (2013-06-20 21:46:46)
>> Is this what you are looking for? After this, the CPU gets stuck and
>> I have to reboot.
>>
>> [360491.932226] [ cut here ]
>> [360491.932261] kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!
>
> Aha, this is not a hang but a crash. I know the end result is the
> same (you push the reset button) but for a crash we always want to
> see this oops message.

In my case, pick a file, any file. I randomly chose a file, and this most recent time I chose $HOME/.config/Clementine/jamendo.db.

> I'll try to reproduce (and check out the bugzilla too). Do you have
> compression on?
>
> -chris

I don't have compression on:

/blah/blah on / type btrfs (rw,relatime,ssd,space_cache,enospc_debug)

But those mount opts don't seem to matter. I checked out the bugzilla entry, it sure looks similar!

--
Jon
Software Blacksmith
Re: hang on 3.9, 3.10-rc5
Is this what you are looking for? After this, the CPU gets stuck and I have to reboot.

[360491.932226] [ cut here ]
[360491.932261] kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!
[360491.932312] invalid opcode: [#1] PREEMPT SMP
[360491.932344] Modules linked in: xfs nilfs2 jfs usb_storage nls_iso8859_1 nls_cp437 vfat fat mmc_block nfsv4 auth_rpcgss nfs fscache lockd sunrpc tun snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device fuse xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_conexant iTCO_wdt iTCO_vendor_support mperf coretemp kvm_intel kvm snd_hda_intel snd_hda_codec microcode snd_hwdep snd_pcm thinkpad_acpi snd_timer joydev pcspkr sr_mod snd tpm_tis iwlwifi cdrom e1000e wmi tpm battery ac tpm_bios cfg80211 sdhci_pci soundcore ptp sdhci snd_page_alloc i2c_i801 rfkill pps_core mmc_core lpc_ich mfd_core tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor libcrc32c sha256_generic dm_crypt dm_mod crc32_pclmul ghash_clmulni_intel crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul thermal i915 drm_kms_helper drm i2c_algo_bit video button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix
[360491.933095] CPU 3
[360491.933110] Pid: 22166, comm: btrfs-endio-wri Not tainted 3.9.6-1.g8ead728-desktop #1 LENOVO 4239CTO/4239CTO
[360491.933161] RIP: 0010:[a022075b] [a022075b] __tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.933225] RSP: 0018:88014cad7888 EFLAGS: 00010297
[360491.933253] RAX: RBX: 880065705b40 RCX: 88014cad7828
[360491.933289] RDX: 0247cc54 RSI: 88014949e821 RDI: 8801bf916640
[360491.933326] RBP: 880073048490 R08: 1000 R09: 88014cad7838
[360491.933362] R10: R11: R12: 8801c0556640
[360491.933398] R13: 0001731e R14: 003d R15: 880158b76340
[360491.933435] FS: () GS:88021e2c() knlGS:
[360491.933476] CS: 0010 DS: ES: CR0: 80050033
[360491.933506] CR2: 7f0a360017f8 CR3: 00015be03000 CR4: 000407e0
[360491.933543] DR0: DR1: DR2:
[360491.933579] DR3: DR6: 0ff0 DR7: 0400
[360491.933617] Process btrfs-endio-wri (pid: 22166, threadinfo 88014cad6000, task 8801a5f8a6c0)
[360491.933662] Stack:
[360491.933674] 88011b573800 88010126e7f0 0001 1600
[360491.933719] 880073048490 a0228d45 6db6db6db6db6db7 8801c0556640
[360491.933763] 88020e764000 8800 0001731e 880197cb8158
[360491.933807] Call Trace:
[360491.933848] [a0228d45] btrfs_search_old_slot+0x635/0x950 [btrfs]
[360491.933909] [a02a1ec6] __resolve_indirect_refs+0x156/0x640 [btrfs]
[360491.934044] [a02a2e0c] find_parent_nodes+0x95c/0x1050 [btrfs]
[360491.934176] [a02a3592] btrfs_find_all_roots+0x92/0x100 [btrfs]
[360491.934307] [a02a401e] iterate_extent_inodes+0x16e/0x370 [btrfs]
[360491.934440] [a02a42b8] iterate_inodes_from_logical+0x98/0xc0 [btrfs]
[360491.934572] [a024c1c8] record_extent_backrefs+0x68/0xe0 [btrfs]
[360491.934652] [a0256d80] btrfs_finish_ordered_io+0x150/0x990 [btrfs]
[360491.934739] [a0276ef3] worker_loop+0x153/0x560 [btrfs]
[360491.934833] [810697c3] kthread+0xb3/0xc0
[360491.934864] [815dc6bc] ret_from_fork+0x7c/0xb0
[360491.934896] DWARF2 unwinder stuck at ret_from_fork+0x7c/0xb0
[360491.934925]
[360491.934934] Leftover inexact backtrace:
[360491.934965] [81069710] ? kthread_create_on_node+0x120/0x120
[360491.934999] Code: c1 48 63 43 58 48 89 c2 48 c1 e2 05 48 8d 54 10 65 48 63 43 2c 48 89 c6 48 c1 e6 05 48 8d 74 30 65 e8 3a af 04 00 e9 b3 fe ff ff 0f 0b 0f 0b 90 41 57 41 56 41 55 41 54 55 48 89 fd 53 48 83 ec
[360491.935188] RIP [a022075b] __tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.935233] RSP 88014cad7888
[360491.946047] ---[ end trace 1475a0830dcadf9c ]---
[360491.946051] note: btrfs-endio-wri[22166] exited with preempt_count 1

On Thu, Jun 20, 2013 at 8:11 PM, Chris Mason chris.ma...@fusionio.com wrote:
> Quoting Jon Nelson (2013-06-18 13:19:04)
>> Josef Bacik jbacik at fusionio.com writes:
>>> On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
>>>> I'm also seeing this hang regularly with both 3.9 and 3.10-rc5. Is
>>>> this a known problem? In this case there is no powercycling; just
>>>> a regular ceph-osd
Re: hang on 3.9, 3.10-rc5
Josef Bacik jbacik at fusionio.com writes:
> On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
>> I'm also seeing this hang regularly with both 3.9 and 3.10-rc5. Is
>> this a known problem? In this case there is no powercycling; just a
>> regular ceph-osd workload.
> ..

I'm able to cause a complete kernel hang by defrag'ing even one file on 3.9.X (3.9.0 through 3.9.4, so far).
Re: WARNING: at fs/btrfs/free-space-cache.c:921 __btrfs_write_out_cache+0x6b9/0x9a0 [btrfs]()
Josef Bacik jbacik at fusionio.com writes:
> ..
> Ok well that's not good, I'm not sure how you got a 156 gigabyte
> block group, but that's why that warning is going off. Can you pull
> btrfs-image down from here
>
> git://github.com/josefbacik/btrfs-progs.git

What is the difference between this git repo and the one at git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git ? I notice that the former has commits the latter doesn't. Is the latter the analogue to btrfs-next?
Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro
I need to know soon if there is going to be anything I can do to rescue this filesystem. I've tried 3.7.10, 3.8.[5,6,7,8] and btrfs-next as of (bba653d1207646b17671c6cb9a0629736811848a). btrfs-next - at least - merely failed the mount; all of the others failed but also ran into a BUG, requiring a reboot. I *can* mount this with -o recovery,ro but nothing else works.

On Sat, Apr 20, 2013 at 8:20 AM, Jon Nelson jnel...@jamponi.net wrote:
> Using 3.8.8, I tried mounting with -o recovery and -o
> recovery,nospace_cache (which shouldn't be any different, if I'm
> understanding the kernel sources properly) without any benefit. Then
> I tried btrfs-next (bba653d1207646b17671c6cb9a0629736811848a as of
> this writing), also without being able to mount the filesystem,
> except for one big improvement -- it doesn't BUG/crash the kernel, it
> just fails the mount. As before, -o recovery,ro works.
>
> These seem to be the most useful of the messages:
>
> 2013-04-20T08:10:06.263708-05:00 turnip kernel: [ 638.385206] BTRFS error (device sda3) in btrfs_run_delayed_refs:2657: errno=-2 No such entry
> 2013-04-20T08:10:06.263711-05:00 turnip kernel: [ 638.385211] BTRFS warning (device sda3): Skipping commit of aborted transaction.
> 2013-04-20T08:10:06.263713-05:00 turnip kernel: [ 638.385215] BTRFS error (device sda3) in cleanup_transaction:1450: errno=-2 No such entry
> 2013-04-20T08:10:06.263715-05:00 turnip kernel: [ 638.385322] BTRFS error (device sda3): Error removing orphan entry, stopping orphan cleanup
> 2013-04-20T08:10:06.263717-05:00 turnip kernel: [ 638.385334] BTRFS critical (device sda3): could not do orphan cleanup -22
> 2013-04-20T08:10:06.263720-05:00 turnip kernel: [ 638.385350] btrfs: commit super ret -30
>
> Using debug-tree, I can determine that the most probable root backup
> is just one generation back, but btrfs-restore doesn't seem to want
> to let me use it:
>
> btrfs root backup slot 0
>         tree root gen 246528 block 2621770289152
>         extent root gen 246528 block 2621506080768
>         chunk root gen 220757 block 2622035951616
>         device root gen 220757 block 2621945528320
>         csum root gen 246529 block 2621775081472
>         fs root gen 246529 block 2621774274560
>         440384839680 used 1629622038016 total 4 devices
>
> btrfs root backup slot 1
>         tree root gen 246529 block 2619191820288
>         extent root gen 246530 block 2619724374016
>         chunk root gen 220757 block 2622035951616
>         device root gen 220757 block 2621945528320
>         csum root gen 246530 block 2619804864512
>         fs root gen 246530 block 2619723927552
>         440382750720 used 1629622038016 total 4 devices
>
> btrfs root backup slot 2
>         tree root gen 246530 block 2621340016640
>         extent root gen 246530 block 2619724374016
>         chunk root gen 220757 block 2622035951616
>         device root gen 220757 block 2621945528320
>         csum root gen 246530 block 2619804864512
>         fs root gen 246530 block 2619723927552
>         440385257472 used 1629622038016 total 4 devices
>
> btrfs root backup slot 3
>         tree root gen 246527 block 2621585006592
>         extent root gen 246528 block 2621506080768
>         chunk root gen 220757 block 2622035951616
>         device root gen 220757 block 2621945528320
>         csum root gen 246527 block 2621345435648
>         fs root gen 246528 block 2621586079744
>         440384839680 used 1629622038016 total 4 devices
>
> turnip:~/recovery # btrfs restore -r 2619191820288 -vv -i /dev/sdd /tmp/foo
> Error reading root
> turnip:~/recovery #
>
> Is there a way for me to use btrfs tools to tell the superblock to go
> ahead and use backup root #1 in this case?
>
> On Tue, Apr 16, 2013 at 11:44 AM, Jon Nelson jnel...@jamponi.net wrote:
>> Tried to mount with -o recovery using 3.8.7. No change. Does anybody
>> have any suggestions?
>>
>> On Sat, Apr 13, 2013 at 6:21 PM, Jon Nelson jnel...@jamponi.net wrote:
>>> I have a 4-disk btrfs filesystem in raid1 mode. I'm running
>>> openSUSE 12.3, 3.7.10, x86_64. A few days ago something went wrong
>>> and the filesystem re-mounted itself RO. After reboot, it didn't
>>> come up. After a fair bit of work, I can get the filesystem to
>>> mount with -o recovery,ro. However, if I use -o recovery alone or
>>> any other option I eventually hit a BUG and that's that. I've tried
>>> with up to kernel 3.8.6 without improvement.
>>>
>>> My first question is this: how can I make it so I can use the
>>> filesystem without having to mount it with -o recovery,ro from a
>>> rescue environment (I have imaged all four drives *and* made a full
>>> filesystem-level backup, except for snapshots and some others)?
>>>
>>> My second set of questions is: what went wrong initially, what went
>>> wrong with the recovery(s), and are there fixes in kernels after
>>> 3.8.6 that might be involved? I
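One hedged observation on the failed restore quoted above: in the btrfs-progs versions I'm familiar with, restore's -r flag expects a root *objectid*, while a byte offset of an alternate tree root (which is what the backup-slot "tree root ... block" numbers are) is passed with -t instead. A sketch of the retry, printed rather than executed since it needs the actual damaged device:

```shell
# Hypothetical retry using the slot-1 tree root byte offset with -t
# (per my reading of btrfs-progs; -r takes a root id, not a block).
TREE_ROOT=2619191820288   # "tree root ... block" from backup slot 1
echo "btrfs restore -t $TREE_ROOT -vv -i /dev/sdd /tmp/foo"
```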
Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro
On Tue, Apr 23, 2013 at 10:03 AM, Liu Bo bo.li@oracle.com wrote:
> Can you please show us where it BUG_ON (or logs) when mounting with
> -o recovery? (the stack info below seems not to be a result of
> '-o recovery'?)

I have this from 3.8.8:

2013-04-19T21:19:47.060937-05:00 turnip kernel: [ 100.608815] device fsid 7feedf1e-9711-4900-af9c-92738ea8aace devid 5 transid 246530 /dev/sdd
2013-04-19T21:19:47.072892-05:00 turnip kernel: [ 100.619694] btrfs: enabling auto recovery
2013-04-19T21:19:47.072925-05:00 turnip kernel: [ 100.619700] btrfs: disk space caching is enabled
2013-04-19T21:19:55.156935-05:00 turnip kernel: [ 108.692720] btrfs: csum mismatch on free space cache
2013-04-19T21:19:55.156978-05:00 turnip kernel: [ 108.692745] btrfs: failed to load free space cache for block group 2618814627840
2013-04-19T21:19:55.156982-05:00 turnip kernel: [ 108.693108] btrfs: csum mismatch on free space cache
2013-04-19T21:19:55.156985-05:00 turnip kernel: [ 108.693133] btrfs: failed to load free space cache for block group 2619888369664
2013-04-19T21:19:57.924956-05:00 turnip kernel: [ 111.453762] btrfs: csum mismatch on free space cache
2013-04-19T21:19:57.924999-05:00 turnip kernel: [ 111.453784] btrfs: failed to load free space cache for block group 2620962111488
2013-04-19T21:19:58.139783-05:00 turnip kernel: [ 111.521418] leaf 2618819321856 total ptrs 25 free space 1780
2013-04-19T21:19:58.139814-05:00 turnip kernel: [ 111.521428] item 0 key (2621340270592 a8 4096) itemoff 3908 itemsize 87
2013-04-19T21:19:58.139817-05:00 turnip kernel: [ 111.521434] extent refs 5 gen 214210 flags 2
2013-04-19T21:19:58.139822-05:00 turnip kernel: [ 111.521439] tree block key (562484 1 0) level 0
2013-04-19T21:19:58.139864-05:00 turnip kernel: [ 111.521441] tree block backref root 5
2013-04-19T21:19:58.139873-05:00 turnip kernel: [ 111.521445] shared block backref parent 2621774995456
2013-04-19T21:19:58.139875-05:00 turnip kernel: [ 111.521448] shared block backref parent 2621603078144
2013-04-19T21:19:58.139877-05:00 turnip kernel: [ 111.521451] shared block backref parent 2621340147712
2013-04-19T21:19:58.139879-05:00 turnip kernel: [ 111.521453] shared block backref parent 2621312724992
2013-04-19T21:19:58.139880-05:00 turnip kernel: [ 111.521458] item 1 key (2621340274688 a8 4096) itemoff 3857 itemsize 51
2013-04-19T21:19:58.139882-05:00 turnip kernel: [ 111.521462] extent refs 1 gen 214217 flags 258
2013-04-19T21:19:58.139884-05:00 turnip kernel: [ 111.521466] tree block key (159644 54 627062569) level 0
2013-04-19T21:19:58.139885-05:00 turnip kernel: [ 111.521469] shared block backref parent 2621340254208
2013-04-19T21:19:58.139887-05:00 turnip kernel: [ 111.521473] item 2 key (2621340286976 a8 4096) itemoff 3761 itemsize 96
2013-04-19T21:19:58.139888-05:00 turnip kernel: [ 111.521476] extent refs 6 gen 214207 flags 2
2013-04-19T21:19:58.139890-05:00 turnip kernel: [ 111.521480] tree block key (562122 c 469151) level 0
2013-04-19T21:19:58.139891-05:00 turnip kernel: [ 111.521482] tree block backref root 5
2013-04-19T21:19:58.139892-05:00 turnip kernel: [ 111.521485] shared block backref parent 2622020726784
2013-04-19T21:19:58.139894-05:00 turnip kernel: [ 111.521488] shared block backref parent 2621647806464
2013-04-19T21:19:58.139895-05:00 turnip kernel: [ 111.521491] shared block backref parent 2621520723968
2013-04-19T21:19:58.139897-05:00 turnip kernel: [ 111.521494] shared block backref parent 2621378756608
2013-04-19T21:19:58.139898-05:00 turnip kernel: [ 111.521496] shared block backref parent 2621341069312
2013-04-19T21:19:58.139900-05:00 turnip kernel: [ 111.521501] item 3 key (2621340295168 a8 4096) itemoff 3710 itemsize 51
2013-04-19T21:19:58.139901-05:00 turnip kernel: [ 111.521505] extent refs 1 gen 241434 flags 2
2013-04-19T21:19:58.139903-05:00 turnip kernel: [ 111.521509] tree block key (2620335087616 b6 2621224001536) level 0
2013-04-19T21:19:58.139904-05:00 turnip kernel: [ 111.521516] tree block backref root 2
2013-04-19T21:19:58.139906-05:00 turnip kernel: [ 111.521521] item 4 key (2621340299264 a8 4096) itemoff 3659 itemsize 51
2013-04-19T21:19:58.139907-05:00 turnip kernel: [ 111.521525] extent refs 1 gen 208634 flags 258
2013-04-19T21:19:58.139909-05:00 turnip kernel: [ 111.521529] tree block key (11446 1 0) level 1
2013-04-19T21:19:58.139910-05:00 turnip kernel: [ 111.521531] shared block backref parent 2621405974528
2013-04-19T21:19:58.139912-05:00 turnip kernel: [ 111.521541] item 5 key (2621340303360 a8 4096) itemoff 3608 itemsize 51
2013-04-19T21:19:58.139913-05:00 turnip kernel: [ 111.521544] extent refs 1 gen 208634 flags 258
2013-04-19T21:19:58.139915-05:00 turnip kernel: [ 111.521548] tree block key (11547 6c 0) level 0
2013-04-19T21:19:58.139917-05:00 turnip kernel: [ 111.521550] shared block backref parent 2621340299264
2013-04-19T21:19:58.139918-05:00 turnip kernel: [ 111.521555] item 6 key (2621340307456 a8 4096) itemoff 3557 itemsize
Re: minor patch to cmds-restore.c
On Fri, Apr 19, 2013 at 11:48 PM, Eric Sandeen sand...@redhat.com wrote:
> On 4/19/13 7:11 PM, Jon Nelson wrote:
>> The following is a minor patch to cmds-restore.c
>
> Hi Jon - just a note - When sending a patch like this, it's best to
> follow the standard patch format, which closely mimics the kernel
> patch submission guidelines:
> ...
> That way it's clear to the reviewer as well as making the source
> control history more descriptive. Hum, it'd be nice to have it in the
> manpage too...

Thank you for the constructive and polite feedback! I will try to adhere to that the next time I send a patch. If I can find some time later, I'll update the manpage, too. Before I do that, is the verbiage in the patch acceptable?

--
Jon
Software Blacksmith
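For readers following along: the standard patch format the reviewer refers to is the one git itself generates. A hedged sketch of that workflow, with the commands echoed rather than run so it is safe anywhere (the subject prefix is a plausible convention, not taken from the actual patch):

```shell
# Hypothetical btrfs-progs patch submission flow.
echo "git commit -s"   # -s appends a Signed-off-by: trailer
echo "git format-patch -1 --subject-prefix='PATCH btrfs-progs'"
echo "git send-email --to=linux-btrfs@vger.kernel.org 0001-*.patch"
```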
Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro
Using 3.8.8, I tried mounting with -o recovery and -o recovery,nospace_cache (which shouldn't behave any differently, if I'm understanding the kernel sources properly) without any benefit. Then I tried btrfs-next (bba653d1207646b17671c6cb9a0629736811848a as of this writing), also without being able to mount the filesystem, except for one big improvement -- it doesn't BUG/crash the kernel, it just fails the mount. As before, -o recovery,ro works. These seem to be the most useful of the messages:

2013-04-20T08:10:06.263708-05:00 turnip kernel: [ 638.385206] BTRFS error (device sda3) in btrfs_run_delayed_refs:2657: errno=-2 No such entry
2013-04-20T08:10:06.263711-05:00 turnip kernel: [ 638.385211] BTRFS warning (device sda3): Skipping commit of aborted transaction.
2013-04-20T08:10:06.263713-05:00 turnip kernel: [ 638.385215] BTRFS error (device sda3) in cleanup_transaction:1450: errno=-2 No such entry
2013-04-20T08:10:06.263715-05:00 turnip kernel: [ 638.385322] BTRFS error (device sda3): Error removing orphan entry, stopping orphan cleanup
2013-04-20T08:10:06.263717-05:00 turnip kernel: [ 638.385334] BTRFS critical (device sda3): could not do orphan cleanup -22
2013-04-20T08:10:06.263720-05:00 turnip kernel: [ 638.385350] btrfs: commit super ret -30

Using debug-tree, I can determine that the most probable root backup is just one generation back, but btrfs restore doesn't seem to want to let me use it:

btrfs root backup slot 0
    tree root gen 246528 block 2621770289152
    extent root gen 246528 block 2621506080768
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246529 block 2621775081472
    fs root gen 246529 block 2621774274560
    440384839680 used 1629622038016 total 4 devices

btrfs root backup slot 1
    tree root gen 246529 block 2619191820288
    extent root gen 246530 block 2619724374016
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246530 block 2619804864512
    fs root gen 246530 block 2619723927552
    440382750720 used 1629622038016 total 4 devices

btrfs root backup slot 2
    tree root gen 246530 block 2621340016640
    extent root gen 246530 block 2619724374016
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246530 block 2619804864512
    fs root gen 246530 block 2619723927552
    440385257472 used 1629622038016 total 4 devices

btrfs root backup slot 3
    tree root gen 246527 block 2621585006592
    extent root gen 246528 block 2621506080768
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246527 block 2621345435648
    fs root gen 246528 block 2621586079744
    440384839680 used 1629622038016 total 4 devices

turnip:~/recovery # btrfs restore -r 2619191820288 -vv -i /dev/sdd /tmp/foo
Error reading root
turnip:~/recovery #

Is there a way for me to use the btrfs tools to tell the superblock to go ahead and use backup root #1 in this case?

On Tue, Apr 16, 2013 at 11:44 AM, Jon Nelson jnel...@jamponi.net wrote:

Tried to mount with -o recovery using 3.8.7. No change. Does anybody have any suggestions?

On Sat, Apr 13, 2013 at 6:21 PM, Jon Nelson jnel...@jamponi.net wrote:

I have a 4-disk btrfs filesystem in raid1 mode. I'm running openSUSE 12.3, 3.7.10, x86_64. A few days ago something went wrong and the filesystem re-mounted itself RO. After reboot, it didn't come up. After a fair bit of work, I can get the filesystem to mount with -o recovery,ro. However, if I use -o recovery alone or any other option I eventually hit a BUG and that's that. I've tried with kernels up to 3.8.6 without improvement.

My first question is this: how can I make it possible to use the filesystem without having to mount it with -o recovery,ro from a rescue environment? (I have imaged all four drives *and* made a full filesystem-level backup, except for snapshots and some others.)

My second set of questions is: what went wrong initially, what went wrong with the recovery attempts, and are there fixes in kernels after 3.8.6 that might be involved? I have *some* logs, and I might be able to share portions of them. I also took a btrfs-image.

Using a very recent btrfs-progs git pull, 'btrfs repair ...' gives me:

ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
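The backup-root listing above can be parsed mechanically rather than by eyeball. A small sketch (the text layout is assumed from the debug-tree listing quoted above) that pulls the tree-root block number for a given backup slot; whether a given btrfs-progs version will accept that block number (e.g. via restore's tree-location option) depends on the tool, so treat the second half as a hypothetical usage:

```shell
# Extract the "tree root ... block <bytenr>" entry for one backup slot out of
# btrfs-debug-tree style text. Parsing assumes the layout shown above.
backup_tree_root() {   # usage: backup_tree_root <slot>   (listing on stdin)
    awk -v slot="$1" '
        /btrfs root backup slot/ { cur = $NF; next }   # remember current slot
        cur == slot && /tree root gen/ {
            # line looks like: tree root gen <gen> block <bytenr>
            for (i = 1; i <= NF; i++) if ($i == "block") print $(i + 1)
            exit
        }'
}

# Hypothetical follow-up (depends on the btrfs-progs version at hand):
#   bytenr=$(btrfs-debug-tree ... | backup_tree_root 1)
#   btrfs restore -t "$bytenr" -vv -i /dev/sdd /tmp/foo
```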
minor patch to cmds-restore.c
The following is a minor patch to cmds-restore.c (the quotes and angle brackets below are reconstructed; the archive stripped them):

diff --git a/cmds-restore.c b/cmds-restore.c
index c75e187..273c813 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -917,14 +917,16 @@ out:
 }
 
 const char * const cmd_restore_usage[] = {
-	"btrfs restore [options] <device>",
+	"btrfs restore [options] <device> [<destination>]",
 	"Try to restore files from a damaged filesystem (unmounted)",
 	"",
+	"-l              list roots",
 	"-s              get snapshots",
 	"-v              verbose",
 	"-i              ignore errors",
 	"-o              overwrite",
-	"-t              tree location",
+	"-r <rootid>     root location",
+	"-t <treeid>     tree location",
 	"-f <offset>     filesystem location",
 	"-u <block>      super mirror",
 	"-d              find dir",

-- Jon Software Blacksmith

-- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro
Tried to mount with -o recovery using 3.8.7. No change. Does anybody have any suggestions?

On Sat, Apr 13, 2013 at 6:21 PM, Jon Nelson jnel...@jamponi.net wrote:

I have a 4-disk btrfs filesystem in raid1 mode. I'm running openSUSE 12.3, 3.7.10, x86_64. A few days ago something went wrong and the filesystem re-mounted itself RO. After reboot, it didn't come up. After a fair bit of work, I can get the filesystem to mount with -o recovery,ro. However, if I use -o recovery alone or any other option I eventually hit a BUG and that's that. I've tried with kernels up to 3.8.6 without improvement.

My first question is this: how can I make it possible to use the filesystem without having to mount it with -o recovery,ro from a rescue environment? (I have imaged all four drives *and* made a full filesystem-level backup, except for snapshots and some others.)

My second set of questions is: what went wrong initially, what went wrong with the recovery attempts, and are there fixes in kernels after 3.8.6 that might be involved? I have *some* logs, and I might be able to share portions of them. I also took a btrfs-image.

Using a very recent btrfs-progs git pull, 'btrfs repair ...' gives me:

ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
checking extents
Backref 341888225280 parent 2621340434432 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 341888225280 parent 2621340434432 owner 0 offset 0 found 1 wanted 0 back 0x6dc8500
Incorrect local backref count on 341888225280 root 1 owner 496 offset 0 found 0 wanted 1 back 0x2bb636c0
backpointer mismatch on [341888225280 262144]
Unable to find block group for 0
btrfs: extent-tree.c:284: find_search_start: Assertion `!(1)' failed.
enabling repair mode
Checking filesystem on /dev/sdd
UUID: 7feedf1e-9711-4900-af9c-92738ea8aace

and some of the errors are here:

[ 314.095449] [ cut here ]
[ 314.095526] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5208 __btrfs_free_extent+0x853/0x890 [btrfs]()
[ 314.095541] Hardware name: TA790GX XE
[ 314.09] Modules linked in: dm_mod af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd btrfs acpi_cpufreq mperf kvm_amd zlib_deflate libcrc32c kvm radeon sr_mod ttm drm_kms_helper cdrom processor sg via_velocity drm i2c_algo_bit shpchp pci_hotplug sp5100_tco i2c_piix4 edac_core edac_mce_amd thermal ata_generic thermal_sys r8169 pata_atiixp k10temp pcspkr microcode crc_ccitt wmi soundcore snd_page_alloc button autofs4
[ 314.095867] Pid: 5310, comm: btrfs-transacti Not tainted 3.8.6-2-desktop #1
[ 314.095875] Call Trace:
[ 314.095904] [81004748] dump_trace+0x88/0x300
[ 314.095923] [815a9128] dump_stack+0x69/0x6f
[ 314.095937] [81044f49] warn_slowpath_common+0x79/0xc0
[ 314.095968] [a0400db3] __btrfs_free_extent+0x853/0x890 [btrfs]
[ 314.096061] [a0404b0f] run_clustered_refs+0x48f/0xb20 [btrfs]
[ 314.096147] [a0408a9a] btrfs_run_delayed_refs+0xca/0x320 [btrfs]
[ 314.096249] [a04182e0] btrfs_commit_transaction+0x80/0xb00 [btrfs]
[ 314.096379] [a0411b4d] transaction_kthread+0x19d/0x220 [btrfs]
[ 314.096492] [81068043] kthread+0xb3/0xc0
[ 314.096506] [815bbf7c] ret_from_fork+0x7c/0xb0
[ 314.096515] ---[ end trace 64d3998241407ddc ]---
[ 314.096520] btrfs unable to find ref byte nr 2621340344320 parent 0 root 2 owner 1 offset 0
[ 314.096526] [ cut here ]
[ 314.096551] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5265 __btrfs_free_extent+0x7ba/0x890 [btrfs]()
[ 314.096554] Hardware name: TA790GX XE
[ 314.096556] Modules linked in: dm_mod af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd btrfs acpi_cpufreq mperf kvm_amd zlib_deflate libcrc32c kvm radeon sr_mod ttm drm_kms_helper cdrom processor sg via_velocity drm i2c_algo_bit shpchp pci_hotplug sp5100_tco i2c_piix4 edac_core edac_mce_amd thermal ata_generic thermal_sys r8169 pata_atiixp k10temp pcspkr microcode crc_ccitt wmi soundcore snd_page_alloc button autofs4
[ 314.096613] Pid: 5310, comm: btrfs-transacti Tainted: GW 3.8.6-2-desktop #1
[ 314.096615] Call Trace:
[ 314.096627] [81004748] dump_trace+0x88/0x300
[ 314.096636] [815a9128] dump_stack+0x69/0x6f
[ 314.096646
corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro
I have a 4-disk btrfs filesystem in raid1 mode. I'm running openSUSE 12.3, 3.7.10, x86_64. A few days ago something went wrong and the filesystem re-mounted itself RO. After reboot, it didn't come up. After a fair bit of work, I can get the filesystem to mount with -o recovery,ro. However, if I use -o recovery alone or any other option I eventually hit a BUG and that's that. I've tried with kernels up to 3.8.6 without improvement.

My first question is this: how can I make it possible to use the filesystem without having to mount it with -o recovery,ro from a rescue environment? (I have imaged all four drives *and* made a full filesystem-level backup, except for snapshots and some others.)

My second set of questions is: what went wrong initially, what went wrong with the recovery attempts, and are there fixes in kernels after 3.8.6 that might be involved? I have *some* logs, and I might be able to share portions of them. I also took a btrfs-image.

Using a very recent btrfs-progs git pull, 'btrfs repair ...' gives me:

ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
checking extents
Backref 341888225280 parent 2621340434432 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 341888225280 parent 2621340434432 owner 0 offset 0 found 1 wanted 0 back 0x6dc8500
Incorrect local backref count on 341888225280 root 1 owner 496 offset 0 found 0 wanted 1 back 0x2bb636c0
backpointer mismatch on [341888225280 262144]
Unable to find block group for 0
btrfs: extent-tree.c:284: find_search_start: Assertion `!(1)' failed.
enabling repair mode
Checking filesystem on /dev/sdd
UUID: 7feedf1e-9711-4900-af9c-92738ea8aace

and some of the errors are here:

[ 314.095449] [ cut here ]
[ 314.095526] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5208 __btrfs_free_extent+0x853/0x890 [btrfs]()
[ 314.095541] Hardware name: TA790GX XE
[ 314.09] Modules linked in: dm_mod af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd btrfs acpi_cpufreq mperf kvm_amd zlib_deflate libcrc32c kvm radeon sr_mod ttm drm_kms_helper cdrom processor sg via_velocity drm i2c_algo_bit shpchp pci_hotplug sp5100_tco i2c_piix4 edac_core edac_mce_amd thermal ata_generic thermal_sys r8169 pata_atiixp k10temp pcspkr microcode crc_ccitt wmi soundcore snd_page_alloc button autofs4
[ 314.095867] Pid: 5310, comm: btrfs-transacti Not tainted 3.8.6-2-desktop #1
[ 314.095875] Call Trace:
[ 314.095904] [81004748] dump_trace+0x88/0x300
[ 314.095923] [815a9128] dump_stack+0x69/0x6f
[ 314.095937] [81044f49] warn_slowpath_common+0x79/0xc0
[ 314.095968] [a0400db3] __btrfs_free_extent+0x853/0x890 [btrfs]
[ 314.096061] [a0404b0f] run_clustered_refs+0x48f/0xb20 [btrfs]
[ 314.096147] [a0408a9a] btrfs_run_delayed_refs+0xca/0x320 [btrfs]
[ 314.096249] [a04182e0] btrfs_commit_transaction+0x80/0xb00 [btrfs]
[ 314.096379] [a0411b4d] transaction_kthread+0x19d/0x220 [btrfs]
[ 314.096492] [81068043] kthread+0xb3/0xc0
[ 314.096506] [815bbf7c] ret_from_fork+0x7c/0xb0
[ 314.096515] ---[ end trace 64d3998241407ddc ]---
[ 314.096520] btrfs unable to find ref byte nr 2621340344320 parent 0 root 2 owner 1 offset 0
[ 314.096526] [ cut here ]
[ 314.096551] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5265 __btrfs_free_extent+0x7ba/0x890 [btrfs]()
[ 314.096554] Hardware name: TA790GX XE
[ 314.096556] Modules linked in: dm_mod af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd btrfs acpi_cpufreq mperf kvm_amd zlib_deflate libcrc32c kvm radeon sr_mod ttm drm_kms_helper cdrom processor sg via_velocity drm i2c_algo_bit shpchp pci_hotplug sp5100_tco i2c_piix4 edac_core edac_mce_amd thermal ata_generic thermal_sys r8169 pata_atiixp k10temp pcspkr microcode crc_ccitt wmi soundcore snd_page_alloc button autofs4
[ 314.096613] Pid: 5310, comm: btrfs-transacti Tainted: GW 3.8.6-2-desktop #1
[ 314.096615] Call Trace:
[ 314.096627] [81004748] dump_trace+0x88/0x300
[ 314.096636] [815a9128] dump_stack+0x69/0x6f
[ 314.096646] [81044f49] warn_slowpath_common+0x79/0xc0
[ 314.096673] [a0400d1a] __btrfs_free_extent+0x7ba/0x890 [btrfs]
[ 314.096752] [a0404b0f] run_clustered_refs+0x48f/0xb20 [btrfs]
[ 314.096832] [a0408a9a] btrfs_run_delayed_refs+0xca/0x320
device error stats resettable?
I have a device that is part of a 4-device btrfs raid1 setup. I had accidentally jiggled the cable for this device and it started racking up errors (about 90,000). After fixing the cable (and running a scrub), all of the errors were repaired (woo!), but the device still shows a large error count. Is there a way to reset the device error count? I'm on 3.8.2 on openSUSE 12.3 x86_64.

-- Jon Software Blacksmith
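Until the counters are reset, a quick way to keep an eye on them is to total the `_errs` lines that `btrfs device stats` prints. A small sketch; the line format (e.g. `[/dev/sdb].write_io_errs   12`) is assumed from current btrfs-progs output and may vary by version:

```shell
# Sum all per-device error counters from `btrfs device stats`-style output,
# so a cron job can flag any nonzero total. Reads the stats text on stdin.
total_dev_errs() {
    awk '/_errs/ { sum += $NF } END { print sum + 0 }'
}

# Hypothetical usage on a mounted filesystem:
#   btrfs device stats /mnt | total_dev_errs    # 0 means all counters clear
```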
Re: device error stats resettable?
On Mon, Apr 1, 2013 at 5:39 PM, Anand Jain anand.j...@oracle.com wrote:

Is there a way to reset the device error count?

there is -z, is it not what you are looking for?

# btrfs dev stat --help
usage: btrfs device stats [-z] path|device
Show current device IO stats.
-z to reset stats afterwards.

Aargh. Newer btrfs tool than I was using. Using the latest git version, this works great, of course. Thanks!

-- Jon Software Blacksmith
Re: btrfs-show vs. btrfs different output
On Thu, Mar 21, 2013 at 11:25 AM, Eric Sandeen sand...@redhat.com wrote:

On 3/21/13 10:29 AM, Jon Nelson wrote:

On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen sand...@redhat.com wrote:

On 3/21/13 10:04 AM, Jon Nelson wrote:

... 2. the current git btrfs-show and btrfs fi show both output *different* devices for the filesystem with UUID b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

does blkid output find that uuid anywhere? Since you're working in git, can you maybe do a little bisecting to find out when it changed? Should be a fairly quick test?

blkid does /not/ report that uuid anywhere. git bisect, if I did it correctly, says:

6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
commit 6eba9002956ac40db87d42fb653a0524dc568810
Author: Goffredo Baroncelli kreij...@inwind.it
Date:   Tue Sep 4 19:59:26 2012 +0200

    Correct un-initialized fsid variable

:100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243 03952051b5e25e0b67f0f910c84d93eb90de8480 M disk-io.c

Ok, I think this is another case of greedily scanning stale backup superblocks (did you ever have btrfs on the whole sda or sdb?). btrfs_read_dev_super() currently tries to scan all 3 superblocks (primary + 2 backups). I'm guessing that you have some stale backup superblocks on sda and/or sdb.

Before the above commit, if the first sb didn't look valid, it'd skip to the 2nd. If the 2nd (stale) one looked OK, it'd compare its fsid to an uninitialized variable (fsid), which would fail (since the fsid contents were random). Same for the 3rd backup if found, and eventually it'd return -1 as failure and not report the device.

After the commit, it'd skip the first invalid sb as well. But this time, it takes the fsid from the 2nd superblock as good and makes it through the loop thinking that it's found something valid. Hence the report of a device which you didn't expect, even though the first superblock is indeed wiped out.

There are some patches floating around to stop this backup superblock scanning altogether. This might fix it for you; it basically returns failure if any superblock on the device is found to be bad. What we really need is the right bits in the right places to let the administrator know if a device looks like it might be corrupt and in need of fixing, vs. ignoring it altogether. Not sure if this is something we want upstream, but you could test it if you like.

I did test and it appears to resolve the issue for me. Thank you!

-- Jon Software Blacksmith
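For anyone wanting to check a suspect disk for the stale backup superblocks described above: btrfs keeps its primary superblock 64KiB into each device and backup copies at 64MiB and 256GiB. A sketch that computes those offsets; the commented-out dd line is a hypothetical inspection step (it reads raw devices, so it is left disabled), and the magic-string offset within the superblock is an assumption from the on-disk format:

```shell
# btrfs superblock byte offsets: primary at 64KiB, mirrors at 64MiB and 256GiB.
btrfs_sb_offsets() {
    echo $((64 * 1024)) $((64 * 1024 * 1024)) $((256 * 1024 * 1024 * 1024))
}

# Hypothetical check for a superblock copy at each location (assumes the
# "_BHRfS_M" magic sits 64 bytes into the superblock):
#   for off in $(btrfs_sb_offsets); do
#       dd if=/dev/sdb bs=1 skip=$((off + 64)) count=8 2>/dev/null; echo
#   done
```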
crash with 3.7.10 and balance.
I'm running openSUSE 12.3 on x86_64. I was running a balance:

btrfs balance -dusage=5 -v /

using the latest btrfs tools code from git (as of this writing) and got a crash:

[304158.496250] btrfs: found 75 extents
[304159.309289] btrfs: relocating block group 2303295684608 flags 17
[304159.839886] btrfs: found 1 extents
[304161.484616] [ cut here ]
[304161.484668] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-default-3.7.10/linux-3.7/fs/btrfs/super.c:246 __btrfs_abort_transaction+0xc3/0xe0 [btrfs]()
[304161.484671] Hardware name: TA790GX XE
[304161.484673] btrfs: Transaction aborted
[304161.484675] Modules linked in: af_packet md5 xt_REDIRECT xt_pkttype xt_physdev xt_TCPMSS xt_tcpudp xt_LOG xt_limit iptable_nat nf_nat_ipv4 nf_nat iptable_mangle xt_mark nfsd nfs_acl nfs fscache lockd auth_rpcgss ebt_ip sunrpc ebtable_filter ebtables bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_ftp nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek acpi_cpufreq snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd sr_mod cdrom radeon mperf ttm soundcore ata_generic sg via_velocity sp5100_tco drm_kms_helper kvm_amd kvm microcode snd_page_alloc r8169 pcspkr crc_ccitt button i2c_piix4 k10temp edac_core drm pata_atiixp i2c_algo_bit edac_mce_amd shpchp pci_hotplug wmi tcp_htcp autofs4 btrfs zlib_deflate libcrc32c raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid0 raid1 ohci_hcd ehci_hcd usbcore usb_common scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh dm_mirror dm_region_hash dm_log dm_mod edd fan thermal processor thermal_sys
[304161.484749] Pid: 22397, comm: btrfs Tainted: GW 3.7.10-1.1-default #1
[304161.484751] Call Trace:
[304161.484770] [81004728] dump_trace+0x78/0x2c0
[304161.484777] [8153b44e] dump_stack+0x69/0x6f
[304161.484785] [81043f69] warn_slowpath_common+0x79/0xc0
[304161.484791] [81044065] warn_slowpath_fmt+0x45/0x50
[304161.484812] [a01662e3] __btrfs_abort_transaction+0xc3/0xe0 [btrfs]
[304161.484844] [a0175ddd] __btrfs_inc_extent_ref+0x1ed/0x250 [btrfs]
[304161.484899] [a017c3f6] run_clustered_refs+0x666/0xa90 [btrfs]
[304161.484954] [a017f4ea] btrfs_run_delayed_refs+0xca/0x310 [btrfs]
[304161.485012] [a018f7d9] __btrfs_end_transaction+0xf9/0x420 [btrfs]
[304161.485085] [a01d457d] merge_reloc_root+0x48d/0x520 [btrfs]
[304161.485214] [a01d4711] merge_reloc_roots+0x101/0x140 [btrfs]
[304161.485337] [a01d4bde] relocate_block_group+0x25e/0x6b0 [btrfs]
[304161.485459] [a01d51d9] btrfs_relocate_block_group+0x1a9/0x2e0 [btrfs]
[304161.485579] [a01aed4d] btrfs_relocate_chunk.isra.53+0x5d/0x6e0 [btrfs]
[304161.485674] [a01b3086] btrfs_balance+0x826/0xd60 [btrfs]
[304161.485770] [a01b88f6] btrfs_ioctl_balance+0x136/0x420 [btrfs]
[304161.485878] [a01bcc64] btrfs_ioctl+0xe54/0x1870 [btrfs]
[304161.485967] [81177b1f] do_vfs_ioctl+0x8f/0x520
[304161.485973] [81178050] sys_ioctl+0xa0/0xc0
[304161.485979] [8154f2ed] system_call_fastpath+0x1a/0x1f
[304161.485989] [7f03050aef27] 0x7f03050aef26
[304161.485991] ---[ end trace d010cbea0d653c96 ]---
[304161.485995] BTRFS error (device sdd) in __btrfs_inc_extent_ref:1952: Object already exists
[304161.485996] btrfs is forced readonly
[304161.486051] [ cut here ]
[304161.486138] kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-default-3.7.10/linux-3.7/fs/btrfs/relocation.c:2279!
[304161.486299] invalid opcode: [#1] SMP
[304161.486371] Modules linked in: af_packet md5 xt_REDIRECT xt_pkttype xt_physdev xt_TCPMSS xt_tcpudp xt_LOG xt_limit iptable_nat nf_nat_ipv4 nf_nat iptable_mangle xt_mark nfsd nfs_acl nfs fscache lockd auth_rpcgss ebt_ip sunrpc ebtable_filter ebtables bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_ftp nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek acpi_cpufreq snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd sr_mod cdrom radeon mperf ttm soundcore ata_generic sg via_velocity sp5100_tco drm_kms_helper kvm_amd kvm microcode snd_page_alloc r8169 pcspkr crc_ccitt button i2c_piix4 k10temp edac_core drm pata_atiixp i2c_algo_bit edac_mce_amd shpchp pci_hotplug wmi tcp_htcp autofs4 btrfs zlib_deflate libcrc32c raid456 async_raid6_recov async_pq raid6_pq async_xor xor
btrfs-show vs. btrfs different output
I'm running openSUSE 12.3 x86_64, which has an unknown git version but reports v0.19. I'm also supplying the output from git, which reports itself as v0.20-rc1-253-g7854c8b. The problem is that btrfs-show (git) and btrfs fi show (git) give /different/ output from each other, which is also different from the (older) btrfs.

First btrfs-show (git):

**
** WARNING: this program is considered deprecated
** Please consider to switch to the btrfs utility
**
Label: none uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
    Total devices 2 FS bytes used 230.34GB
    devid 1 size 298.09GB used 298.09GB path /dev/sda
    *** Some devices missing

Label: none uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
    Total devices 4 FS bytes used 348.21GB
    devid 4 size 460.76GB used 444.23GB path /dev/sdb3
    devid 3 size 460.76GB used 444.23GB path /dev/sda3
    devid 6 size 298.09GB used 282.53GB path /dev/sdc
    devid 5 size 298.09GB used 282.53GB path /dev/sdd

Btrfs v0.20-rc1-253-g7854c8b

Now btrfs fi show (git):

failed to open /dev/sr0: No medium found
Label: none uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
    Total devices 4 FS bytes used 348.21GB
    devid 5 size 298.09GB used 282.53GB path /dev/sdd
    devid 6 size 298.09GB used 282.53GB path /dev/sdc
    devid 4 size 460.76GB used 444.23GB path /dev/sdb3
    devid 3 size 460.76GB used 444.23GB path /dev/sda3

Label: none uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
    Total devices 2 FS bytes used 230.34GB
    devid 1 size 298.09GB used 298.09GB path /dev/sdb
    *** Some devices missing

Btrfs v0.20-rc1-253-g7854c8b

And now the (older) btrfs fi show:

failed to read /dev/sr0
Label: none uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
    Total devices 4 FS bytes used 348.21GB
    devid 5 size 298.09GB used 282.53GB path /dev/sdd
    devid 6 size 298.09GB used 282.53GB path /dev/sdc
    devid 4 size 460.76GB used 444.23GB path /dev/sdb3
    devid 3 size 460.76GB used 444.23GB path /dev/sda3

Btrfs v0.19+

which has similar output to the (older) btrfs-show. The differences are:

1. the order of devices varies (not a big deal)
2. the current git btrfs-show and btrfs fi show both output *different* devices for the filesystem with UUID b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

Somewhat confusingly, /dev/disk/by-uuid/ only shows three devices:

bd40cb4e-93bb-4600-8455-ca1185aa8abe -> ../../md3
7feedf1e-9711-4900-af9c-92738ea8aace -> ../../sda3
4f0b27a5-5f5b-413c-a71b-b5a3bec5482c -> ../../md2

What's going on with the varied output from (git) btrfs-show vs. 'btrfs fi show' (vs. the older, as-shipped btrfs)?

-- Jon Software Blacksmith
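When two tools disagree like this, it helps to extract each filesystem's claimed device list mechanically and compare it against blkid. A sketch; the parsing assumes the `btrfs fi show` layout quoted above (a `Label: ... uuid: <uuid>` header followed by indented `devid ... path <dev>` lines):

```shell
# List the block devices that a `btrfs fi show`-style listing attributes to
# one filesystem uuid. Listing on stdin, uuid as $1.
devices_for_uuid() {
    awk -v id="$1" '
        /uuid:/              { cur = $NF }       # start of a new fs record
        cur == id && /path/  { print $NF }'      # its devid ... path lines
}

# Hypothetical cross-check against what blkid believes:
#   btrfs fi show | devices_for_uuid b5dc52bd-21bf-4173-8049-d54d88c82240
#   blkid -t UUID=b5dc52bd-21bf-4173-8049-d54d88c82240
```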
Re: btrfs-show vs. btrfs different output
On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen sand...@redhat.com wrote:

On 3/21/13 10:04 AM, Jon Nelson wrote:

... 2. the current git btrfs-show and btrfs fi show both output *different* devices for the filesystem with UUID b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

does blkid output find that uuid anywhere? Since you're working in git, can you maybe do a little bisecting to find out when it changed? Should be a fairly quick test?

blkid does /not/ report that uuid anywhere. git bisect, if I did it correctly, says:

6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
commit 6eba9002956ac40db87d42fb653a0524dc568810
Author: Goffredo Baroncelli kreij...@inwind.it
Date:   Tue Sep 4 19:59:26 2012 +0200

    Correct un-initialized fsid variable

:100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243 03952051b5e25e0b67f0f910c84d93eb90de8480 M disk-io.c

-- Jon Software Blacksmith
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Sun, Dec 12, 2010 at 8:06 PM, Ted Ts'o ty...@mit.edu wrote:

On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote:

I'm glad you've been able to reproduce the problem! If you should need any further assistance, please do not hesitate to ask.

This patch seems to fix the problem for me. (Unless the partition is mounted with mblk_io_submit.) Could you confirm that it fixes it for you as well?

I believe I have applied the (relevant) inode.c changes to bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc, rebuilt, and begun testing. Now at 28 passes without error, I think I can say that the patch appears to resolve the issue.

-- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Sat, Dec 11, 2010 at 9:16 PM, Jon Nelson jnel...@jamponi.net wrote:

On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o ty...@mit.edu wrote:

Yes, indeed. Is this in the virtualized environment or on real hardware at this point? And how many CPUs do you have configured in your virtualized environment, and how much memory? Is having a certain number of CPUs critical for reproducing the problem? Is constricting the amount of memory important?

Originally, I observed the behavior on really real hardware. Since then, I have been able to reproduce it in VirtualBox and qemu-kvm, with openSUSE 11.3 and KUbuntu. All of the more recent tests have been with qemu-kvm. I have one CPU configured in the environment and 512MB of memory. I have not done any memory-constriction tests whatsoever.

It'll be a lot easier if I can reproduce it locally, which is why I'm asking all of these questions.

On Sat, Dec 11, 2010 at 8:34 PM, Ted Ts'o ty...@mit.edu wrote:

One experiment --- can you try this with the file system mounted with data=writeback, and see if the problem reproduces in that journalling mode?

That test is running now, first with encryption. I will report if it shows problems. If it does, I will wait until I have been able to see that a few times, and move to a no-encryption test. Typically, I have to run quite a few more iterations of that test before problems show up (if they will at all).

I want to rule out (if possible) journal_submit_inode_data_buffers() racing with mpage_da_submit_io(). I don't think that's the issue, but I'd prefer to do the experiment to make sure. So if you can use a kernel and system configuration which triggers the problem, and then try changing the mount options to include data=writeback, and then rerun the test, and let me know if the problem still reproduces, I'd be really grateful.

Using 2.6.37-rc5 with data=writeback,noatime and LUKS encryption, I hit the problem 71 times out of 173.
-- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Sun, Dec 12, 2010 at 6:43 AM, Ted Ts'o ty...@mit.edu wrote:

On Sun, Dec 12, 2010 at 04:18:29AM -0600, Jon Nelson wrote:

I have one CPU configured in the environment and 512MB of memory. I have not done any memory-constriction tests whatsoever.

I've finally been able to reproduce it myself, on real hardware. SMP is not necessary to reproduce it, although having more than one CPU doesn't hurt. What I did need to do (on real hardware with 4 gigs of memory) was to turn off swap and pin enough memory so that free memory was around 200 megs or so before the start of the test. (This is the natural amount of free memory that the system would try to reach, since 200 megs is about 5% of 4 gigs.) Then, during the test, free memory would drop to 50-70 megabytes, forcing writeback to run, and then I could trigger it about 1-2 times out of three.

I'm guessing that when you used 512MB of memory, that was in effect a memory-constriction test, and if you were to push the memory down a little further, it might reproduce even more quickly. My next step is to try to reproduce this in a VM, and then I can start probing to see what might be going on.

I'm glad you've been able to reproduce the problem! If you should need any further assistance, please do not hesitate to ask.

-- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o ty...@mit.edu wrote:

Yes, indeed. Is this in the virtualized environment or on real hardware at this point? And how many CPUs do you have configured in your virtualized environment, and how much memory? Is having a certain number of CPUs critical for reproducing the problem? Is constricting the amount of memory important?

Originally, I observed the behavior on really real hardware. Since then, I have been able to reproduce it in VirtualBox and qemu-kvm, with openSUSE 11.3 and KUbuntu. All of the more recent tests have been with qemu-kvm. I have one CPU configured in the environment and 512MB of memory. I have not done any memory-constriction tests whatsoever.

It'll be a lot easier if I can reproduce it locally, which is why I'm asking all of these questions.

On Sat, Dec 11, 2010 at 8:34 PM, Ted Ts'o ty...@mit.edu wrote:

One experiment --- can you try this with the file system mounted with data=writeback, and see if the problem reproduces in that journalling mode?

That test is running now, first with encryption. I will report if it shows problems. If it does, I will wait until I have been able to see that a few times, and move to a no-encryption test. Typically, I have to run quite a few more iterations of that test before problems show up (if they will at all).

I want to rule out (if possible) journal_submit_inode_data_buffers() racing with mpage_da_submit_io(). I don't think that's the issue, but I'd prefer to do the experiment to make sure. So if you can use a kernel and system configuration which triggers the problem, and then try changing the mount options to include data=writeback, and then rerun the test, and let me know if the problem still reproduces, I'd be really grateful.

-- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson jnel...@jamponi.net wrote: On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote: On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 from the tests I've done that one showed the least or no corruption if you count the empty /etc/env.d/03opengl as an artefact Yes, that's a good test. Also try commit bd2d0210cf. The patch series most likely to be at fault, if there is a regression, is between 5a87b7a5d and bd2d0210cf inclusive. I did a lot of testing before submitting it, but that was a tricky rewrite. If you can reproduce the problem reliably, it might be good to try commit 16828088f9 (the commit before 5a87b7a5d) and commit bd2d0210cf. If it reliably reproduces on bd2d0210cf, but is clean on 16828088f9, then it's my ext4 block i/o submission patches, and we'll need to either figure out what's going on or back out that set of changes. If that's the case, a bisect of those changes (it's only 6 commits, so it shouldn't take long) would be most appreciated. I observed the behavior on bd2d0210cf in a qemu-kvm install of openSUSE 11.3 (x86_64) on a *totally* different host - an AMD quad-core. I did /not/ observe the behavior on 16828088f9 (yet). I'll run the test a few more times on 1682.. Additionally, I am building a bisected kernel now ( cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get back at it for a while. cb20d5188366f04d96d2e07b1240cc92170ade40 seems OK so far. I'm going to try 1de3e3df917459422cb2aecac440febc8879d410 soon.
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Fri, Dec 10, 2010 at 8:58 AM, Jon Nelson jnel...@jamponi.net wrote: On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson jnel...@jamponi.net wrote: On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote: On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 from the tests I've done that one showed the least or no corruption if you count the empty /etc/env.d/03opengl as an artefact Yes, that's a good test. Also try commit bd2d0210cf. The patch series most likely to be at fault, if there is a regression, is between 5a87b7a5d and bd2d0210cf inclusive. I did a lot of testing before submitting it, but that was a tricky rewrite. If you can reproduce the problem reliably, it might be good to try commit 16828088f9 (the commit before 5a87b7a5d) and commit bd2d0210cf. If it reliably reproduces on bd2d0210cf, but is clean on 16828088f9, then it's my ext4 block i/o submission patches, and we'll need to either figure out what's going on or back out that set of changes. If that's the case, a bisect of those changes (it's only 6 commits, so it shouldn't take long) would be most appreciated. I observed the behavior on bd2d0210cf in a qemu-kvm install of openSUSE 11.3 (x86_64) on a *totally* different host - an AMD quad-core. I did /not/ observe the behavior on 16828088f9 (yet). I'll run the test a few more times on 1682.. Additionally, I am building a bisected kernel now ( cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get back at it for a while. cb20d5188366f04d96d2e07b1240cc92170ade40 seems OK so far. I'm going to try 1de3e3df917459422cb2aecac440febc8879d410 soon. Barring false negatives, bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc appears to be the culprit (according to git bisect). I will test bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc again, confirm the behavior, and work backwards to try to reduce the possibility of false negatives. 
-- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Fri, Dec 10, 2010 at 10:54 AM, Jon Nelson jnel...@jamponi.net wrote: On Fri, Dec 10, 2010 at 8:58 AM, Jon Nelson jnel...@jamponi.net wrote: On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson jnel...@jamponi.net wrote: On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote: On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 from the tests I've done that one showed the least or no corruption if you count the empty /etc/env.d/03opengl as an artefact Yes, that's a good test. Also try commit bd2d0210cf. The patch series most likely to be at fault, if there is a regression, is between 5a87b7a5d and bd2d0210cf inclusive. I did a lot of testing before submitting it, but that was a tricky rewrite. If you can reproduce the problem reliably, it might be good to try commit 16828088f9 (the commit before 5a87b7a5d) and commit bd2d0210cf. If it reliably reproduces on bd2d0210cf, but is clean on 16828088f9, then it's my ext4 block i/o submission patches, and we'll need to either figure out what's going on or back out that set of changes. If that's the case, a bisect of those changes (it's only 6 commits, so it shouldn't take long) would be most appreciated. I observed the behavior on bd2d0210cf in a qemu-kvm install of openSUSE 11.3 (x86_64) on a *totally* different host - an AMD quad-core. I did /not/ observe the behavior on 16828088f9 (yet). I'll run the test a few more times on 1682.. Additionally, I am building a bisected kernel now ( cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get back at it for a while. cb20d5188366f04d96d2e07b1240cc92170ade40 seems OK so far. I'm going to try 1de3e3df917459422cb2aecac440febc8879d410 soon. Barring false negatives, bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc appears to be the culprit (according to git bisect). 
I will test bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc again, confirm the behavior, and work backwards to try to reduce the possibility of false negatives. A few additional notes, in no particular order: - For me, triggering the problem is fairly easy when encryption is involved. - I'm now 81 iterations into testing bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc *without* encryption. Out of 81 iterations, I have 4 failures: #16, 40, 62, and 64. I will now try 1de3e3df917459422cb2aecac440febc8879d410 much more extensively. Is this useful information? -- Jon
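[Editor's note: the bisect Ted asks for — narrowing the six-commit series between 16828088f9 (good) and bd2d0210cf (bad) — can be driven automatically with `git bisect run`. A sketch of that workflow on a throwaway repository; the toy repo, commit messages, and `flag`-file check are invented for illustration. Against a real kernel tree the good/bad arguments would be the commit ids above, and the run script would build and boot the kernel and run the psql test.]

```shell
set -e
# Build a toy repo of 8 commits; the "regression" appears at commit 5.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3 4 5 6 7 8; do
    [ "$i" -ge 5 ] && echo broken > flag    # the bug lands here
    echo "$i" > file
    git add .
    git commit -qm "commit $i"
done
# Bisect between the known-good root commit and the bad tip. "git bisect run"
# re-runs the check at every step: exit 0 means good, nonzero means bad.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" > /dev/null
git bisect run sh -c '! grep -q broken flag 2>/dev/null' > /dev/null
firstbad=$(git log -1 --format=%s refs/bisect/bad)   # first bad commit
echo "first bad: $firstbad"
git bisect reset > /dev/null 2>&1
```

With only 6 commits in the suspect range, this converges in about three build-and-test steps.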
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Thu, Dec 9, 2010 at 2:13 PM, Ted Ts'o ty...@mit.edu wrote: On Thu, Dec 09, 2010 at 12:10:58PM -0600, Jon Nelson wrote: You should be OK, there. Are you using encryption or no? I had difficulty replicating the issue without encryption. Yes, I'm using encryption. LUKS with aes-xts-plain-sha256, and then LVM on top of LUKS. Hmm. The cipher is listed as: aes-cbc-essiv:sha256 If you can point out how to query pgsql_tmp (I'm using a completely default postgres install), that would be helpful, but I don't think it would be going anywhere else. Normally it's /var/lib/pgsql/data/pgsql_tmp (or /var/lib/postgres/data/pgsql_tmp in your case). By placing /var/lib/{postgresql,pgsql}/data on the LUKS + ext4 volume, on both openSUSE 11.3 and Kubuntu, I was able to replicate the problem easily, in VirtualBox. I can give qemu a try. In both cases I was using a 2.6.37x kernel. Ah, I'm not using virtualization. I'm running on a X410 laptop, on raw hardware. Perhaps virtualization slows things down enough that it triggers? Or maybe you're running with a more constrained memory than I? How much memory do you have configured in your VM? 512MB. 'free' reports 75MB used, 419MB free. I originally noticed the problem on really real hardware (thinkpad T61p), however. -- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote: On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 from the tests I've done that one showed the least or no corruption if you count the empty /etc/env.d/03opengl as an artefact Yes, that's a good test. Also try commit bd2d0210cf. The patch series most likely to be at fault, if there is a regression, is between 5a87b7a5d and bd2d0210cf inclusive. I did a lot of testing before submitting it, but that was a tricky rewrite. If you can reproduce the problem reliably, it might be good to try commit 16828088f9 (the commit before 5a87b7a5d) and commit bd2d0210cf. If it reliably reproduces on bd2d0210cf, but is clean on 16828088f9, then it's my ext4 block i/o submission patches, and we'll need to either figure out what's going on or back out that set of changes. If that's the case, a bisect of those changes (it's only 6 commits, so it shouldn't take long) would be most appreciated. I observed the behavior on bd2d0210cf in a qemu-kvm install of openSUSE 11.3 (x86_64) on a *totally* different host - an AMD quad-core. I did /not/ observe the behavior on 16828088f9 (yet). I'll run the test a few more times on 1682.. Additionally, I am building a bisected kernel now ( cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get back at it for a while. I hope this helps. -- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 9:37 PM, Jon Nelson jnel...@jamponi.net wrote: On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o ty...@mit.edu wrote: On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: 1. create a database (from bash): createdb test 2. place the following contents in a file (I used 't.sql'): begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 1000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; 3. execute that sql: psql -f t.sql --echo-all test With 2.6.34.7 I can re-run [3] all day long, as many times as I want, without issue. With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails pretty frequently. So I just tried to reproduce this on an Ubuntu 10.04 system running 2.6.37-rc5 (completely stock except for a few apparmor patches that I needed to keep the apparmor userspace from complaining). I'm using Postgres 8.4.5-0ubuntu10.04. Using the above procedure, I wasn't able to reproduce. Then I realized this might have been because I was using an SSD root file system (which is secured using LUKS/dm-crypt, with LVM on top of dm-crypt). So I mounted a file system on a 5400 rpm laptop disk, which is also protected using LUKS/dm-crypt with LVM on top. I then executed the PostgreSQL commands: CREATE TABLESPACE test LOCATION '/kbuild/postgres'; SET default_tablespace = test; COMMIT \quit I then re-ran the above procedure, and verified that all of the I/O was going to the 5400rpm laptop disk. I then ran the above procedure a half-dozen times, and I still haven't been able to reproduce any Postgresql errors or kernel errors. Jon, can you help me identify what might be different with your run and mine? What version of Postgres are you using? One difference is the location of the transaction logs (pg_xlog). In my case, /var/lib/pgsql/data *is* the mountpoint for the test volume (actually, it's a symlink to the mount point). In your case, that is not so. 
Perhaps that makes a difference? pgsql_tmp might also be on two different volumes in your case (I can't be sure). I grabbed a Kubuntu iso and installed Kubuntu 10.10, and then upgraded to 'natty', and eventually to 2.6.37-8-generic. With that install, and postgresql's data (/var/lib/postgresql/data) being located on a LUKS+ext4 volume, I easily observe the behavior. Does this help? -- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
I finally found some time to test this out. With 2.6.37-rc4 (openSUSE KOTD kernel) I easily encounter the issue. Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64 install, installed all updates, installed postgresql and the 'KOTD' (Kernel of the Day) kernel, and ran the following tests (as postgres user because I'm lazy). 1. create a database (from bash): createdb test 2. place the following contents in a file (I used 't.sql'): begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 1000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; 3. execute that sql: psql -f t.sql --echo-all test With 2.6.34.7 I can re-run [3] all day long, as many times as I want, without issue. With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails pretty frequently. Then I tested with the 'vanilla' kernel available here: http://download.opensuse.org/repositories/Kernel:/vanilla/standard/ The 'vanilla' kernel exhibited the same problems. The version I tested: 2.6.37-rc4-219-g771f8bc-vanilla. Incidentally, quick tests of jfs, xfs, and ext3 do _not_ show the same problems, although I will note that I usually saw failure at least 1 in 3, but sometimes had to re-run the sql test 4 or 5 times before I saw failure. I will continue to do some testing, but I will hold off on testing the commits above until I receive further testing suggestions. -- Jon
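[Editor's note: step [3] has to be repeated many times before a failure shows up, so a small driver loop helps. A sketch — `run_iterations` is a made-up helper name; in the actual runs the command under test would be `psql -f t.sql --echo-all test`:]

```shell
# Run a command N times and report how many iterations failed.
# The command is passed as the remaining arguments; in the thread it
# would be: psql -f t.sql --echo-all test
run_iterations() {
    n=$1; shift
    failures=0
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" > /dev/null 2>&1 || {
            failures=$((failures + 1))
            echo "iteration $i: FAILED"
        }
        i=$((i + 1))
    done
    echo "$failures of $n iterations failed"
}
```

For example, `run_iterations 40 psql -f t.sql --echo-all test` would re-run the reproducer 40 times and tally failures, matching the "40+ iterations" runs reported later in the thread.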
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 12:22 PM, Mike Snitzer snit...@redhat.com wrote: On Tue, Dec 07 2010 at 1:10pm -0500, Jon Nelson jnel...@jamponi.net wrote: I finally found some time to test this out. With 2.6.37-rc4 (openSUSE KOTD kernel) I easily encounter the issue. Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64 install, installed all updates, installed postgresql and the 'KOTD' (Kernel of the Day) kernel, and ran the following tests (as postgres user because I'm lazy). 1. create a database (from bash): createdb test 2. place the following contents in a file (I used 't.sql'): begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 1000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; 3. execute that sql: psql -f t.sql --echo-all test With 2.6.34.7 I can re-run [3] all day long, as many times as I want, without issue. With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails pretty frequently. How does it fail? postgres errors? kernel errors? postgresql errors. Typically, header corruption but from the limited visibility I've had into this via strace, what I see is zeroed pages where there shouldn't be. I just ran a test and got: ERROR: invalid page header in block 37483 of relation base/16384/16417 but that is not the only error one might get. Then I tested with the 'vanilla' kernel available here: http://download.opensuse.org/repositories/Kernel:/vanilla/standard/ The 'vanilla' kernel exhibited the same problems. The version I tested: 2.6.37-rc4-219-g771f8bc-vanilla. Incidentally, quick tests of jfs, xfs, and ext3 do _not_ show the same problems, although I will note that I usually saw failure at least 1 in 3, but sometimes had to re-run the sql test 4 or 5 times before I saw failure. I will continue to do some testing, but I will hold off on testing the commits above until I receive further testing suggestions. 
OK, so to be clear: your testing is on dm-crypt + ext4? Yes. I took a virtual hard disk which shows up as /dev/sdb, used cryptsetup to format it as a LUKS volume, mounted the LUKS volume, formatted as ext4 (or whatever), mounted that, rsync'd over a blank postgresql 'data' directory, started postgresql, became the postgres user, and proceeded to create the db and run the sql. And you're testing upstream based kernels (meaning the dm-crypt scalability patch that has been in question is _not_ in the mix)? I am testing both the KOTD kernels and the vanilla kernels - neither of which has the dm-crypt patches (as far as I know). -- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 13:45:14 -0500: On Tue, Dec 7, 2010 at 12:22 PM, Mike Snitzer snit...@redhat.com wrote: On Tue, Dec 07 2010 at 1:10pm -0500, Jon Nelson jnel...@jamponi.net wrote: I finally found some time to test this out. With 2.6.37-rc4 (openSUSE KOTD kernel) I easily encounter the issue. Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64 install, installed all updates, installed postgresql and the 'KOTD' (Kernel of the Day) kernel, and ran the following tests (as postgres user because I'm lazy). 1. create a database (from bash): createdb test 2. place the following contents in a file (I used 't.sql'): begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 1000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; 3. execute that sql: psql -f t.sql --echo-all test With 2.6.34.7 I can re-run [3] all day long, as many times as I want, without issue. With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails pretty frequently. How does it fail? postgres errors? kernel errors? postgresql errors. Typically, header corruption but from the limited visibility I've had into this via strace, what I see is zeroed pages where there shouldn't be. This sounds a lot like a bug higher up than dm-crypt. Zeros tend to come from some piece of code explicitly filling a page with zeros, and that often happens in the corner cases for O_DIRECT and a few other places in the filesystem. Have you tried triggering this with a regular block device? I just tried the whole set of tests, but with /dev/sdb directly (as ext4) without any crypt-y bits. It takes more iterations but out of 6 tests I had one failure: same type of thing, 'invalid page header in block '. I can't guarantee that it is a full-page of zeroes, just what I saw from the (limited) stracing I did. 
-- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote: postgresql errors. Typically, header corruption but from the limited visibility I've had into this via strace, what I see is zeroed pages where there shouldn't be. This sounds a lot like a bug higher up than dm-crypt. Zeros tend to come from some piece of code explicitly filling a page with zeros, and that often happens in the corner cases for O_DIRECT and a few other places in the filesystem. Have you tried triggering this with a regular block device? I just tried the whole set of tests, but with /dev/sdb directly (as ext4) without any crypt-y bits. It takes more iterations but out of 6 tests I had one failure: same type of thing, 'invalid page header in block '. I can't guarantee that it is a full-page of zeroes, just what I saw from the (limited) stracing I did. Fantastic. Now for our usual suspects: 1) Is postgres using O_DIRECT? If yes, please turn it off According to strace, O_DIRECT didn't show up once during the test. 2) Is postgres allocating sparse files? If yes, please have it fully allocate the file instead. That's a tough one. I don't think postgresql does that, but I'm not an expert here. 3) Is postgres using preallocation (fallocate)? If yes, please have it fully allocate the file instead As far as strace is concerned, postgresql is not using fallocate in this version. -- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 2:33 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote: postgresql errors. Typically, header corruption but from the limited visibility I've had into this via strace, what I see is zeroed pages where there shouldn't be. This sounds a lot like a bug higher up than dm-crypt. Zeros tend to come from some piece of code explicitly filling a page with zeros, and that often happens in the corner cases for O_DIRECT and a few other places in the filesystem. Have you tried triggering this with a regular block device? I just tried the whole set of tests, but with /dev/sdb directly (as ext4) without any crypt-y bits. It takes more iterations but out of 6 tests I had one failure: same type of thing, 'invalid page header in block '. I can't guarantee that it is a full-page of zeroes, just what I saw from the (limited) stracing I did. Fantastic. Now for our usual suspects: 1) Is postgres using O_DIRECT? If yes, please turn it off According to strace, O_DIRECT didn't show up once during the test. 2) Is postgres allocating sparse files? If yes, please have it fully allocate the file instead. That's a tough one. I don't think postgresql does that, but I'm not an expert here. 3) Is postgres using preallocation (fallocate)? If yes, please have it fully allocate the file instead As far as strace is concerned, postgresql is not using fallocate in this version. Well, the only other usual suspect would be mmap. Does the strace show that you're using read/write for file IO or is it doing a lot of mmaps on the files? I'm pretty sure postgresql uses regular file I/O and not mmap. 
-- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote: postgresql errors. Typically, header corruption but from the limited visibility I've had into this via strace, what I see is zeroed pages where there shouldn't be. This sounds a lot like a bug higher up than dm-crypt. Zeros tend to come from some piece of code explicitly filling a page with zeros, and that often happens in the corner cases for O_DIRECT and a few other places in the filesystem. Have you tried triggering this with a regular block device? I just tried the whole set of tests, but with /dev/sdb directly (as ext4) without any crypt-y bits. It takes more iterations but out of 6 tests I had one failure: same type of thing, 'invalid page header in block '. I can't guarantee that it is a full-page of zeroes, just what I saw from the (limited) stracing I did. Fantastic. Now for our usual suspects: 1) Is postgres using O_DIRECT? If yes, please turn it off According to strace, O_DIRECT didn't show up once during the test. 2) Is postgres allocating sparse files? If yes, please have it fully allocate the file instead. That's a tough one. I don't think postgresql does that, but I'm not an expert here. Ok, please compare du -k and du -k --apparent-size for each of the files involved in the postgres run. Because this is all done in a transaction (which fails), and because the table is a TEMPORARY table, there *are* no files once the transaction fails because postgresql unlinks them. I can modify the test to use real tables and do things outside of a transaction, however. I was using fdatasync[1] and now I'm using sync. I'm on 9 iterations without a failure (on ext4 - no crypt). 
Theoretically, these settings only make a difference in the event of a crash. However, could they make a difference in terms of the paths taken in the kernel? [1] for wal_sync_method -- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o ty...@mit.edu wrote: On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: 1. create a database (from bash): createdb test 2. place the following contents in a file (I used 't.sql'): begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 1000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; 3. execute that sql: psql -f t.sql --echo-all test With 2.6.34.7 I can re-run [3] all day long, as many times as I want, without issue. With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails pretty frequently. So I just tried to reproduce this on an Ubuntu 10.04 system running 2.6.37-rc5 (completely stock except for a few apparmor patches that I needed to keep the apparmor userspace from complaining). I'm using Postgres 8.4.5-0ubuntu10.04. Using the above procedure, I wasn't able to reproduce. Then I realized this might have been because I was using an SSD root file system (which is secured using LUKS/dm-crypt, with LVM on top of dm-crypt). So I mounted a file system on a 5400 rpm laptop disk, which is also protected using LUKS/dm-crypt with LVM on top. I then executed the PostgreSQL commands: CREATE TABLESPACE test LOCATION '/kbuild/postgres'; SET default_tablespace = test; COMMIT \quit I then re-ran the above procedure, and verified that all of the I/O was going to the 5400rpm laptop disk. I then ran the above procedure a half-dozen times, and I still haven't been able to reproduce any Postgresql errors or kernel errors. Jon, can you help me identify what might be different with your run and mine? What version of Postgres are you using? I am using postgres 8.4.5 on openSUSE 11.3 x86_64. The problems were observed on both real hardware (thinkpad T61p) and in virtualbox, where all current testing is taking place. The current kernel is a vanilla (unpatched) kernel. I *did* set wal_sync_method to fdatasync, however, if that is relevant. 
Otherwise, the pg config is stock. With no crypt involved, I did have to iterate the tests to observe the issue - a half-dozen times or more were necessary. Typically, when crypt was involved, the issue would manifest much more rapidly. -- Jon
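[Editor's note: the wal_sync_method setting referred to above is a postgresql.conf knob controlling how WAL writes are forced out to disk. A sketch of the fragment; fdatasync is the non-default setting used in these runs, and the alternative values listed are the ones PostgreSQL 8.4 accepts:]

```
# postgresql.conf -- how WAL updates are forced out to disk
wal_sync_method = fdatasync   # setting used in the runs above
# other accepted values: fsync, fsync_writethrough, open_datasync, open_sync
```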
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 3:02 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 15:48:58 -0500: On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote: postgresql errors. Typically, header corruption but from the limited visibility I've had into this via strace, what I see is zeroed pages where there shouldn't be. This sounds a lot like a bug higher up than dm-crypt. Zeros tend to come from some piece of code explicitly filling a page with zeros, and that often happens in the corner cases for O_DIRECT and a few other places in the filesystem. Have you tried triggering this with a regular block device? I just tried the whole set of tests, but with /dev/sdb directly (as ext4) without any crypt-y bits. It takes more iterations but out of 6 tests I had one failure: same type of thing, 'invalid page header in block '. I can't guarantee that it is a full-page of zeroes, just what I saw from the (limited) stracing I did. Fantastic. Now for our usual suspects: Maybe not so fantastic. I kept testing and had no more failures. At all. After 40+ iterations I gave up. I went back to trying ext4 on a LUKS volume. The 'hit' ratio went to something like 1 in 3, or better. I will continue to do testing with and without LUKS. I did /not/ reboot between tests, but I do start with a fresh postgres database. -- Jon
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o ty...@mit.edu wrote: On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: 1. create a database (from bash): createdb test 2. place the following contents in a file (I used 't.sql'): begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 1000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; 3. execute that sql: psql -f t.sql --echo-all test With 2.6.34.7 I can re-run [3] all day long, as many times as I want, without issue. With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails pretty frequently. So I just tried to reproduce this on an Ubuntu 10.04 system running 2.6.37-rc5 (completely stock except for a few apparmor patches that I needed to keep the apparmor userspace from complaining). I'm using Postgres 8.4.5-0ubuntu10.04. Using the above procedure, I wasn't able to reproduce. Then I realized this might have been because I was using an SSD root file system (which is secured using LUKS/dm-crypt, with LVM on top of dm-crypt). So I mounted a file system on a 5400 rpm laptop disk, which is also protected using LUKS/dm-crypt with LVM on top. I then executed the PostgreSQL commands: CREATE TABLESPACE test LOCATION '/kbuild/postgres'; SET default_tablespace = test; COMMIT \quit I then re-ran the above procedure, and verified that all of the I/O was going to the 5400rpm laptop disk. I then ran the above procedure a half-dozen times, and I still haven't been able to reproduce any Postgresql errors or kernel errors. Jon, can you help me identify what might be different with your run and mine? What version of Postgres are you using? One difference is the location of the transaction logs (pg_xlog). In my case, /var/lib/pgsql/data *is* the mountpoint for the test volume (actually, it's a symlink to the mount point). In your case, that is not so. Perhaps that makes a difference? 
pgsql_tmp might also be on two different volumes in your case (I can't be sure).

--
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
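The three-step reproduction above lends itself to being scripted. A minimal sketch, assuming the `test` database and the `t.sql` file from the report already exist; the loop and its messages are illustrative, not part of the original procedure:

```shell
#!/bin/sh
# Re-run the t.sql transaction until psql reports an error, counting
# how many clean iterations precede the first failure.
i=0
while psql -f t.sql --echo-all test >/dev/null 2>&1; do
    i=$((i + 1))
    echo "iteration $i ok"
done
echo "failed after $i clean iterations"
```

On a healthy kernel this loops indefinitely; on the affected 2.6.37-rc kernels the report suggests it would stop within a handful of iterations.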
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason chris.ma...@oracle.com wrote:

Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500:

On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote:

Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500:

On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote:

postgresql errors. Typically, header corruption, but from the limited visibility I've had into this via strace, what I see is zeroed pages where there shouldn't be any.

This sounds a lot like a bug higher up than dm-crypt. Zeros tend to come from some piece of code explicitly filling a page with zeros, and that often happens in the corner cases for O_DIRECT and a few other places in the filesystem. Have you tried triggering this with a regular block device?

I just tried the whole set of tests, but with /dev/sdb directly (as ext4) without any crypt-y bits. It takes more iterations, but out of 6 tests I had one failure: same type of thing, 'invalid page header in block '. I can't guarantee that it is a full page of zeroes, just what I saw from the (limited) stracing I did.

Fantastic. Now for our usual suspects:

1) Is postgres using O_DIRECT? If yes, please turn it off.

According to strace, O_DIRECT didn't show up once during the test.

2) Is postgres allocating sparse files? If yes, please have it fully allocate the file instead.

That's a tough one. I don't think postgresql does that, but I'm not an expert here.

Ok, please compare du -k and du -k --apparent-size for each of the files involved in the postgres run.

One of the files (the table itself) is very slightly sparse: 588240 (apparent) vs 588244.

--
Jon
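Chris's sparse-file check (comparing `du -k` against `du -k --apparent-size`) can be run over a whole directory at once. A sketch using a throwaway directory and a deliberately sparse file to show the comparison; the paths are illustrative, and in practice you would point DIR at the postgres data directory:

```shell
#!/bin/sh
# Flag files whose on-disk size is smaller than their apparent size,
# i.e. files containing holes (sparse files).
DIR=$(mktemp -d)
printf 'x%.0s' $(seq 1 4096) > "$DIR/dense"   # 4 KiB, fully allocated
truncate -s 1M "$DIR/sparse"                  # 1 MiB apparent, no blocks

for f in "$DIR"/*; do
    apparent=$(du -k --apparent-size "$f" | cut -f1)
    ondisk=$(du -k "$f" | cut -f1)
    if [ "$ondisk" -lt "$apparent" ]; then
        echo "sparse: $f (apparent=${apparent}K, on-disk=${ondisk}K)"
    fi
done
```

Note that a file can also report a slightly *larger* on-disk size than apparent size (as the table file above does) simply because allocation happens in whole blocks.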
Re: dm-crypt barrier support is effective
On Wed, Dec 1, 2010 at 12:24 PM, Milan Broz mb...@redhat.com wrote:

On 12/01/2010 06:35 PM, Matt wrote:

Thanks for pointing to v6! I hadn't noticed that there was a new one :) Well, so I'll restore my box to a working/productive state and will try out v6 (I'm pretty confident that it'll work without problems).

It's the same as the previous version, just with a fixed header (to track it properly in patchwork); the second patch adds some read optimisation, nothing that should help here. Anyway, I ran several tests on 2.6.37-rc3+ and see no integrity problems (using xfs, ext3 and ext4 over dm-crypt). So please try to check which change causes these problems for you; it can be something completely unrelated to these patches. (If anyone knows how to trigger some corruption with btrfs/dm-crypt, let me know. I am not able to reproduce it either.)

Perhaps this is useful: for myself, I found that when I started using 2.6.37-rc3, postgresql started having a *lot* of problems with corruption. Specifically, I noted zeroed pages, corruption in headers, all sorts of stuff on /newly created/ tables, especially during index creation. I had a fairly high hit rate of failure. I backed off to 2.6.34.7 and have *zero* problems (in fact, prior to 2.6.37-rc3, I had never had a corruption issue with postgresql). I ran on 2.6.36 for a few weeks as well, without issue. I am using dm-crypt with lvm on top of that, and ext4 on top of that.

--
Jon
weird ENOSPC with defragment directory
Most other directories on /var/cache, *except* those created by squid, can be defragmented. The filesystem was converted from ext3/4.

turnip:~ # uname -a
Linux turnip 2.6.34-12-default #1 SMP 2010-06-29 02:39:08 +0200 x86_64 x86_64 x86_64 GNU/Linux

(stock openSUSE 11.3 kernel)

turnip:~ # btrfsctl -d /var/cache/squid/01/93
ioctl: No space left on device
turnip:~ # find !$
find /var/cache/squid/01/93
/var/cache/squid/01/93
/var/cache/squid/01/93/00019321
/var/cache/squid/01/93/00019378
turnip:~ # ls -la !$
ls -la /var/cache/squid/01/93
total 2
drwxr-x--- 1 squid nogroup   32 Aug 13 17:13 .
drwxr-x--- 1 squid nogroup 1024 Jun  4 18:35 ..
-rw-r----- 1 squid nogroup 1777 Jul 13 22:31 00019321
-rw-r----- 1 squid nogroup  537 Jul 13 22:31 00019378
turnip:~ #

That seems... strange.

--
Jon
Re: weird ENOSPC with defragment directory
On Mon, Aug 16, 2010 at 10:15 PM, Jon Nelson jnel...@jamponi.net wrote:

Most other directories on /var/cache, *except* those created by squid, can be defragmented. The filesystem was converted from ext3/4.

turnip:~ # uname -a
Linux turnip 2.6.34-12-default #1 SMP 2010-06-29 02:39:08 +0200 x86_64 x86_64 x86_64 GNU/Linux

(stock openSUSE 11.3 kernel)

turnip:~ # btrfsctl -d /var/cache/squid/01/93
ioctl: No space left on device
turnip:~ # find !$
find /var/cache/squid/01/93
/var/cache/squid/01/93
/var/cache/squid/01/93/00019321
/var/cache/squid/01/93/00019378
turnip:~ # ls -la !$
ls -la /var/cache/squid/01/93
total 2
drwxr-x--- 1 squid nogroup   32 Aug 13 17:13 .
drwxr-x--- 1 squid nogroup 1024 Jun  4 18:35 ..
-rw-r----- 1 squid nogroup 1777 Jul 13 22:31 00019321
-rw-r----- 1 squid nogroup  537 Jul 13 22:31 00019378
turnip:~ #

That seems... strange.

It gets stranger. If I issue a 'sync' command, chances are the defragment command will work. If I issue a bunch of them (in series, however), then I get ENOSPC.

find /var/cache -xdev -type d -exec btrfsctl -d {} \;

seems to do it every time, with or without -depth. I get 100% success and then 100% failure - no mixing.

--
Jon
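A workaround sketch based on the sync observation above, assuming it holds in general: flush dirty data before each directory defragment so that the previous defrag's allocations are committed before the next one starts. The error reporting is illustrative; `btrfsctl -d` is the defragment command from the session above.

```shell
#!/bin/sh
# Walk every directory under /var/cache on this filesystem and
# defragment it, syncing first so dirty data and metadata from the
# previous defrag have been committed.
find /var/cache -xdev -type d -print | while read -r d; do
    sync
    btrfsctl -d "$d" || echo "defrag failed: $d"
done
```

Using a pipe into `while read` rather than `-exec` makes it easy to interleave the `sync` between defrag calls, which `-exec` alone cannot do.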