Re: fresh btrfs filesystem, out of disk space, hundreds of gigs free

2014-03-22 Thread Jon Nelson
Duncan 1i5t5.duncan at cox.net writes:


 Jon Nelson posted on Fri, 21 Mar 2014 19:00:51 -0500 as excerpted:

  Using openSUSE 13.1 on x86_64 which - as of this writing - is 3.11.10,
  Would a more recent kernel than 3.11 have done me any good?

 [Reordered the kernel question from below to here, where you reported the
 running version.]

 As both mkfs.btrfs and the wiki recommend, always use the latest kernel.
 In fact, the kernel config's btrfs option had a pretty strong warning thru
 3.12 that was only toned down in 3.13 as well, so I'd definitely
 recommend at least the latest 3.13.x stable series kernel in any case.

I would like to say that your response is one of the most useful and
detailed responses I've ever received on a mailing list. Thank you!

The "please run the very latest kernel/userland" advice is sort of true for
everything, though. Also, I am of the understanding that the openSUSE folks
back-port *some* of the btrfs-relevant bits to both the kernel and the
userspace tools, but I could be wrong, too.

  I tried to copy a bunch of files over to a btrfs filesystem (which was
  mounted as /, in fact).
 
  After some time, things ground to a halt and I got out of disk space
  errors. btrfs fi df /   showed about 1TB of *data* free, and 500MB
  of metadata free.

 It's the metadata, plus no space left to allocate more.  See below.

Right. Although I did a poor job of noting it, I understood at least that much.

  Below are the btrfs fi df /  and  btrfs fi show.
 
 
  turnip:~ # btrfs fi df /
  Data, single: total=1.80TiB, used=832.22GiB
  System, DUP: total=8.00MiB, used=204.00KiB
  System, single: total=4.00MiB, used=0.00
  Metadata, DUP: total=5.50GiB, used=5.00GiB
  Metadata, single: total=8.00MiB, used=0.00

 FWIW, the system and metadata single chunks reported there are an
 artifact from mkfs.btrfs and aren't used (used=0.00).  At some point it
 should be updated to remove them automatically, but meanwhile, a balance
 should remove them from the listing.  If you do that balance immediately
 after filesystem creation, at the first mount, you'll be rid of them when
 there's not a whole lot of other data on the filesystem to balance as
 well.  That would leave:

  Data, single: total=1.80TiB, used=832.22GiB
  System, DUP: total=8.00MiB, used=204.00KiB
  Metadata, DUP: total=5.50GiB, used=5.00GiB

 Metadata is the red-flag here.  Metadata chunks are 256 MiB in size, but
 in default DUP mode, two are allocated at once, thus 512 MiB at a time.
 And you're under 512 MiB free so you're running on the last pair of
 metadata chunks, which means depending on the operation, you may need to
 allocate metadata pretty quickly.  You can probably copy a few files
 before that, but a big copy operation with many files at a time would
 likely need to allocate more metadata.

The size of the chunks allocated is especially useful information. I've not
seen that anywhere else, and it does explain a fair bit.
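
To make the chunk arithmetic above concrete, here is a minimal sketch that parses the quoted `btrfs fi df /` output and compares the free space inside the existing metadata chunks with the size of the next DUP allocation. The 256 MiB chunk size and the "two at a time" rule are taken from the explanation above; the function names are illustrative only and are not part of any btrfs tool.

```python
import re

# Output of `btrfs fi df /` as quoted in this thread (unused "single" rows omitted).
FI_DF = """\
Data, single: total=1.80TiB, used=832.22GiB
System, DUP: total=8.00MiB, used=204.00KiB
Metadata, DUP: total=5.50GiB, used=5.00GiB
"""

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Figures taken from the explanation above: 256 MiB metadata chunks,
# allocated two at a time in DUP mode.
METADATA_CHUNK_MIB = 256
DUP_ALLOCATION_MIB = 2 * METADATA_CHUNK_MIB


def to_bytes(text):
    """Convert a size such as '5.50GiB' (or a bare '0.00') to bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|KiB|MiB|GiB|TiB)?", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2) or "B"]


def parse_fi_df(text):
    """Map (type, profile) to (total_bytes, used_bytes)."""
    rows = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+), (\w+): total=([^,]+), used=(\S+)", line)
        if match:
            kind, profile, total, used = match.groups()
            rows[(kind, profile)] = (to_bytes(total), to_bytes(used))
    return rows


def metadata_headroom_mib(rows):
    """Free space inside already-allocated DUP metadata chunks, in MiB."""
    total, used = rows[("Metadata", "DUP")]
    return (total - used) / 2**20


if __name__ == "__main__":
    rows = parse_fi_df(FI_DF)
    print(f"metadata headroom: {metadata_headroom_mib(rows):.0f} MiB")
    print(f"next DUP allocation needs: {DUP_ALLOCATION_MIB} MiB unallocated")
```

The tool prints rounded figures, so the headroom computed here is only approximate; the point is that it is no larger than a single DUP allocation.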

 But for a complete picture you need the filesystem show output, below, as
 well...

  turnip:~ # btrfs fi show
  Label: none  uuid: 9379c138-b309-4556-8835-0f156b863d29
  Total devices 1 FS bytes used 837.22GiB
  devid 1 size 1.81TiB used 1.81TiB path /dev/sda3
 
  Btrfs v3.12+20131125

 OK.  Here we see the root problem.  Size 1.81 TiB, used 1.81 TiB.  No
 unallocated space at all.  Whichever runs out of space first, data or
 metadata, you'll be stuck.

Now it's at this point that I am unclear. I thought the above said:
1 device on this filesystem, 837.22 GiB used.
and
device ID #1 is /dev/sda3, is 1.81TiB in size, and btrfs is using 1.81TiB
of that

Which I interpret differently. Can you go into more detail as to how (from
btrfs fi show) we can say the _filesystem_ (not the device) is full?
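
One way to read the two lines in question is sketched below, on the assumption (consistent with the explanation quoted above) that "used" on the devid line means space allocated to chunks, while "FS bytes used" means space actually occupied inside those chunks. The helper names are made up for illustration, and the figures are the rounded ones from the output above.

```python
import re

# The two relevant lines of `btrfs fi show`, as quoted in this thread.
FI_SHOW = """\
Total devices 1 FS bytes used 837.22GiB
devid 1 size 1.81TiB used 1.81TiB path /dev/sda3
"""

GIB_PER_UNIT = {"MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}


def to_gib(text):
    """Convert a size such as '1.81TiB' to GiB."""
    match = re.fullmatch(r"([0-9.]+)(MiB|GiB|TiB)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * GIB_PER_UNIT[match.group(2)]


def summarize(text):
    """Return unallocated space and allocated-but-unused space, in GiB."""
    fs_used = to_gib(re.search(r"FS bytes used (\S+)", text).group(1))
    devices = re.findall(
        r"devid\s*(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)", text
    )
    size = sum(to_gib(dev_size) for _, dev_size, _, _ in devices)
    allocated = sum(to_gib(dev_used) for _, _, dev_used, _ in devices)
    return {
        # Space on the devices not yet handed out to any chunk.
        "unallocated_gib": size - allocated,
        # Space handed out to chunks but not holding data or metadata.
        "allocated_but_unused_gib": allocated - fs_used,
    }


if __name__ == "__main__":
    for name, value in summarize(FI_SHOW).items():
        print(f"{name}: {value:.2f}")
```

Under that reading the device has nothing left to hand out to new chunks, even though roughly a terabyte sits idle inside chunks that are already allocated. The second figure is only rough, because DUP metadata is counted twice on the device side.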

 And as was discussed above, you're going to need another pair of metadata
 chunks allocated pretty quickly, but there's no unallocated space
 available to allocate to them, so no surprise at all you got free-space
 errors! =:^(

 Conversely, you have all sorts of free data space.  Data space is
 allocated in gig-size chunks, and you have nearly a TiB of free data-
 space, which means there's quite a few nearly empty data chunks available.

 To correct that imbalance and free the extra data space to the pool so
 more metadata can be allocated, you run a balance.
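
As a rough, back-of-the-envelope sketch of how much a data balance could return to the pool: assuming the "gig-size" data chunks mentioned above and perfect compaction (which a real balance will not necessarily achieve), the upper bound can be worked out from the `btrfs fi df` figures. The function name is hypothetical.

```python
import math

DATA_TOTAL_GIB = 1.80 * 1024  # "total=1.80TiB" from btrfs fi df
DATA_USED_GIB = 832.22        # "used=832.22GiB" from btrfs fi df
DATA_CHUNK_GIB = 1            # "gig-size chunks", per the explanation above


def reclaimable_chunks(total_gib, used_gib, chunk_gib=DATA_CHUNK_GIB):
    """Upper bound on data chunks that could be freed by perfect compaction."""
    needed = math.ceil(used_gib / chunk_gib)       # chunks required for the data
    allocated = math.floor(total_gib / chunk_gib)  # chunks currently allocated
    return max(allocated - needed, 0)


if __name__ == "__main__":
    count = reclaimable_chunks(DATA_TOTAL_GIB, DATA_USED_GIB)
    print(f"at most about {count} data chunks could be returned to the pool")
```

Even a small fraction of that would be enough for another 512 MiB pair of metadata chunks.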

In fact, I did try a balance - both a data-only and a metadata-only balance.
 The metadata-only balance failed. I cancelled the data-only balance early,
although perhaps I should have been more patient. I went from a running
system to working from a rescue environment -- I was under a bit of time
pressure to get things moving again.

 Here, you probably want a balance of the data only, since it's what's
 unbalanced, and on slow spinning rust (as opposed to fast SSD) rewriting
 /everything/, as balance does by default, will take some time.  To do
 data only, use the -d option:

 # btrfs balance start -d /

 (You said it was mounted on root, so

fresh btrfs filesystem, out of disk space, hundreds of gigs free

2014-03-21 Thread Jon Nelson
Using openSUSE 13.1 on x86_64 which - as of this writing - is 3.11.10,
I tried to copy a bunch of files over to a btrfs filesystem (which was
mounted as /, in fact).

After some time, things ground to a halt and I got out of disk space errors.
btrfs fi df /   showed about 1TB of *data* free, and 500MB of metadata free.

Below are the btrfs fi df / and btrfs fi show.
I ended up having to reboot the machine. I was not able to get the
machine to boot again after that, and ended up having to resort to a
rescue environment, at which point I copied everything over to an ext4
filesystem.

This is the first time I have tried btrfs since I experienced
(unfixable) corruption a year or so back, with 3.7 and up. I was led
to believe that the ENOSPC errors had been resolved.

Would a more recent kernel than 3.11 have done me any good?


turnip:~ # btrfs fi df /
Data, single: total=1.80TiB, used=832.22GiB
System, DUP: total=8.00MiB, used=204.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.50GiB, used=5.00GiB
Metadata, single: total=8.00MiB, used=0.00
turnip:~ # btrfs fi show
Label: none  uuid: 9379c138-b309-4556-8835-0f156b863d29
Total devices 1 FS bytes used 837.22GiB
devid 1 size 1.81TiB used 1.81TiB path /dev/sda3

Btrfs v3.12+20131125


-- 
Jon
Software Blacksmith
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


report: my btrfs filesystem failed hard today

2013-07-17 Thread Jon Nelson
I had a btrfs filesystem under 3.9.8 that failed /hard/ today. So hard
that the filesystem could not be mounted because there wasn't enough
free space, unless it was mounted read only.

This happened after I ran out of metadata space (is there a way to
increase the amount of metadata storage?) while still having many gigs
free of data space, as reported by btrfs fi df. I tried balancing the
metadata, defrag'ing files (with -czlib) and even tried mounting with
-o remount,metadata_ratio={several values}, none of which worked, and
then it crashed hard.

First, it killed systemd's logger (journald), which refused to
start. Following the crash, I was not able to mount the filesystem
without -o ro.  -o recovery did not work.

The first messages are (when I ran out of disk space):

2013-07-17T12:37:36.159351-05:00 linux-3y3i kernel: [58301.322033]
btrfs allocation failed flags 1, wanted 1310720
2013-07-17T12:37:36.159367-05:00 linux-3y3i kernel: [58301.322037]
space_info 1 has 19119284224 free, is full
2013-07-17T12:37:36.159369-05:00 linux-3y3i kernel: [58301.322039]
space_info total=146070831104, used=126941663232, pinned=643072,
reserved=9175040, may_use=43724800, readonly=65536
2013-07-17T12:37:36.159370-05:00 linux-3y3i kernel: [58301.322041]
block group 1107296256 has 5368709120 bytes, 4263174144 used 0 pinned
0 reserved
2013-07-17T12:37:36.159371-05:00 linux-3y3i kernel: [58301.322042]
entry offset 1107296256, bytes 20082688, bitmap yes
2013-07-17T12:37:36.159372-05:00 linux-3y3i kernel: [58301.322044]
entry offset 1241513984, bytes 35172352, bitmap yes
2013-07-17T12:37:36.159373-05:00 linux-3y3i kernel: [58301.322045]
entry offset 1246859264, bytes 12288, bitmap no
...
2013-07-17T12:37:36.160185-05:00 linux-3y3i kernel: [58301.322387]
block group has cluster?: no
2013-07-17T12:37:36.160186-05:00 linux-3y3i kernel: [58301.322388] 40
blocks of free space at or bigger than bytes is
2013-07-17T12:37:36.160188-05:00 linux-3y3i kernel: [58301.322390]
block group 8623489024 has 5368709120 bytes, 4745510912 used 0 pinned
0 reserved
...
and a few hundred thousand more lines
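
The figures in the `space_info` lines above can be cross-checked with a few lines of Python. This is only a sketch: the formula below (total minus used, pinned, reserved and readonly) is inferred from the fact that it reproduces the "has 19119284224 free" value in this particular log, and is not taken from the kernel source.

```python
import re

# The space_info line from the log above, with the wrapped lines joined.
SPACE_INFO = (
    "space_info total=146070831104, used=126941663232, pinned=643072, "
    "reserved=9175040, may_use=43724800, readonly=65536"
)
REPORTED_FREE = 19119284224  # from "space_info 1 has 19119284224 free, is full"
WANTED = 1310720             # from "allocation failed flags 1, wanted 1310720"


def parse_space_info(line):
    """Extract the name=value counters from a space_info log line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def free_bytes(fields):
    """Free bytes, using the formula that matches this log (an inference)."""
    return (
        fields["total"]
        - fields["used"]
        - fields["pinned"]
        - fields["reserved"]
        - fields["readonly"]
    )


if __name__ == "__main__":
    fields = parse_space_info(SPACE_INFO)
    free = free_bytes(fields)
    print(f"computed free: {free} bytes ({free / 2**30:.1f} GiB)")
    print(f"matches log:   {free == REPORTED_FREE}")
    print(f"wanted:        {WANTED / 2**20:.2f} MiB")
```

So the failed request was small compared with the free bytes reported. My understanding is that "is full" refers to no further chunks being allocatable rather than to zero free bytes, but that is an assumption.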


I also tried 3.7 and 3.10.1 (although I could not use 3.10.1 to try
mounting the filesystem after it finally went boom).


Lots and lots (tens of thousands) of:

2013-07-17T13:25:24.690135-05:00 linux-3y3i kernel: [  493.772797]
[ cut here ]
2013-07-17T13:25:24.690137-05:00 linux-3y3i kernel: [  493.772825]
WARNING: at fs/btrfs/extent-tree.c:6556
btrfs_alloc_free_block+0x3bd/0x3d0 [btrfs]()
2013-07-17T13:25:24.690138-05:00 linux-3y3i kernel: [  493.772826]
Hardware name: 4239CTO
2013-07-17T13:25:24.690140-05:00 linux-3y3i kernel: [  493.772828]
btrfs: block rsv returned -28
2013-07-17T13:25:24.690141-05:00 linux-3y3i kernel: [  493.772829]
Modules linked in: fuse ufs qnx4 hfsplus hfs minix vfat msdos fat jfs
xfs reiserfs xt_tcpudp xt_pkttype xt_LOG af_packet xt_limit
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables arc4 iwldvm snd_hda_codec_hdmi mac80211 iTCO_wdt
snd_hda_codec_conexant iTCO_vendor_support mperf snd_usb_audio
snd_usbmidi_lib coretemp snd_rawmidi snd_seq_device snd_hda_intel
kvm_intel kvm microcode pcspkr snd_hda_codec snd_hwdep snd_pcm iwlwifi
sr_mod sdhci_pci sdhci mmc_core cfg80211 joydev cdrom snd_timer e1000e
snd_page_alloc thinkpad_acpi lpc_ich mfd_core rfkill i2c_i801 ptp
pps_core wmi snd soundcore battery tpm_tis ac tpm tpm_bios
tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor libcrc32c
sha256_generic dm_crypt dm_mod ghash_clmulni_intel crc32_pclmul
crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts
gf128mul i915 drm_kms_helper drm thermal i2c_algo_bit video button
processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc
scsi_dh_alua scsi_dh ata_generic ata_piix
2013-07-17T13:25:24.690144-05:00 linux-3y3i kernel: [  493.772882]
Pid: 334, comm: btrfs-balance Tainted: GW
3.9.8-1.gfccf19c-desktop #3
2013-07-17T13:25:24.690145-05:00 linux-3y3i kernel: [  493.772883] Call Trace:
2013-07-17T13:25:24.690146-05:00 linux-3y3i kernel: [  493.772895]
[81004728] dump_trace+0x88/0x300
2013-07-17T13:25:24.690148-05:00 linux-3y3i kernel: [  493.772902]
[815c8e9b] dump_stack+0x69/0x6f
2013-07-17T13:25:24.690149-05:00 linux-3y3i kernel: [  493.772907]
[81046069] warn_slowpath_common+0x79/0xc0
2013-07-17T13:25:24.690150-05:00 linux-3y3i kernel: [  493.772912]
[81046165] warn_slowpath_fmt+0x45/0x50
2013-07-17T13:25:24.690151-05:00 linux-3y3i kernel: [  493.772923]
[a0224fad] btrfs_alloc_free_block+0x3bd/0x3d0 [btrfs]
2013-07-17T13:25:24.690152-05:00 linux-3y3i kernel: [  493.772955]
[a0210685] __btrfs_cow_block+0x135/0x560 [btrfs]
2013-07-17T13:25:24.690153-05:00 linux-3y3i kernel: [  493.772974]
[a0210c5c] 

Re: report: my btrfs filesystem failed hard today

2013-07-17 Thread Jon Nelson
On Wed, Jul 17, 2013 at 6:04 PM, Josef Bacik jba...@fusionio.com wrote:
 On Wed, Jul 17, 2013 at 05:44:23PM -0500, Jon Nelson wrote:
 I had a btrfs filesystem under 3.9.8 that failed /hard/ today. So hard
 that the filesystem could not be mounted because there wasn't enough
 free space, unless it was mounted read only.

 This happened after I ran out of metadata space (is there a way to
 increase the amount of metadata storage?) while still having many gigs
 free of data space, as reported by btrfs fi df. I tried balancing the
 metadata, defrag'ing files (with -czlib) and even tried mounting with
 -o remount,metadata_ratio={several values}, none of which worked, and
 then it crashed hard.

 First, it killed systemd's logger (journald), which refused to
 start. Following the crash, I was not able to mount the filesystem
 without -o ro.  -o recovery did not work.


 Can you try btrfs-next, I did some work in this area in the last few months.

Unfortunately, I could not wait. I really needed that system back
up, so I copied everything off, reformatted with ext4, and copied it
all back.
I (probably?) have more logs, though.


--
Jon
Software Blacksmith


Re: hang on 3.9, 3.10-rc5

2013-06-21 Thread Jon Nelson
On Fri, Jun 21, 2013 at 6:30 AM, Chris Mason chris.ma...@fusionio.com wrote:
 Quoting Jon Nelson (2013-06-20 21:46:46)
 Is this what you are looking for?
 After this, the CPU gets stuck and I have to reboot.


 [360491.932226] [ cut here ]
 [360491.932261] kernel BUG at
 /home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!

 Aha, this is not a hang but a crash.  I know the end result is the same
 (you push the reset button) but for a crash we always want to see this
 oops message.
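
For anyone sorting through saved logs, the hang-versus-crash distinction drawn above can be checked mechanically. The sketch below looks for the "kernel BUG at" marker and pulls out the source file and line; the sample text is the excerpt quoted in this thread, and the function name is hypothetical.

```python
import re

# Excerpt from the log quoted in this thread (the path wraps onto a second line).
SAMPLE = """\
[360491.932226] [ cut here ]
[360491.932261] kernel BUG at
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!
[360491.932312] invalid opcode:  [#1] PREEMPT SMP
"""

# \s+ also matches the line break between "kernel BUG at" and the path.
BUG_RE = re.compile(r"kernel BUG at\s+(\S+):(\d+)!")


def find_kernel_bugs(log_text):
    """Return (source_file, line_number) for every 'kernel BUG at' marker."""
    return [(path, int(line)) for path, line in BUG_RE.findall(log_text)]


if __name__ == "__main__":
    bugs = find_kernel_bugs(SAMPLE)
    if bugs:
        for path, line in bugs:
            print(f"crash: BUG at {path} line {line}")
    else:
        print("no BUG marker found; this may be a genuine hang")
```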

In my case, pick a file, any file. I randomly choose a file, and
this most recent time I chose $HOME/.config/Clementine/jamendo.db.

 I'll try to reproduce (and check out the bugzilla too).  Do you have
 compression on?

 -chris

I don't have compression on:

/blah/blah on / type btrfs (rw,relatime,ssd,space_cache,enospc_debug)

But those mount opts don't seem to matter.
I checked out the bugzilla entry, it sure looks similar!
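
A small sketch of the check behind the "compression on?" question: it splits the option list out of a `mount` output line and looks for the btrfs `compress` or `compress-force` options. The sample line is the one quoted above; the helper names are illustrative only.

```python
import re

# The mount line quoted above.
MOUNT_LINE = "/blah/blah on / type btrfs (rw,relatime,ssd,space_cache,enospc_debug)"


def mount_options(line):
    """Return the comma-separated options from a `mount` output line."""
    match = re.search(r"\(([^)]*)\)\s*$", line)
    return match.group(1).split(",") if match else []


def compression_enabled(options):
    """True if a compress or compress-force option is present."""
    names = {option.split("=", 1)[0] for option in options}
    return bool(names & {"compress", "compress-force"})


if __name__ == "__main__":
    options = mount_options(MOUNT_LINE)
    print(options)
    print("compression enabled:", compression_enabled(options))
```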


--
Jon
Software Blacksmith


Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Jon Nelson
Is this what you are looking for?
After this, the CPU gets stuck and I have to reboot.


[360491.932226] [ cut here ]
[360491.932261] kernel BUG at
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!
[360491.932312] invalid opcode:  [#1] PREEMPT SMP
[360491.932344] Modules linked in: xfs nilfs2 jfs usb_storage
nls_iso8859_1 nls_cp437 vfat fat mmc_block nfsv4 auth_rpcgss nfs
fscache lockd sunrpc tun snd_usb_audio snd_usbmidi_lib snd_rawmidi
snd_seq_device fuse xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables arc4 iwldvm mac80211 snd_hda_codec_hdmi
snd_hda_codec_conexant iTCO_wdt iTCO_vendor_support mperf coretemp
kvm_intel kvm snd_hda_intel snd_hda_codec microcode snd_hwdep snd_pcm
thinkpad_acpi snd_timer joydev pcspkr sr_mod snd tpm_tis iwlwifi cdrom
e1000e wmi tpm battery ac tpm_bios cfg80211 sdhci_pci soundcore ptp
sdhci snd_page_alloc i2c_i801 rfkill pps_core mmc_core lpc_ich
mfd_core tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor
libcrc32c sha256_generic dm_crypt dm_mod crc32_pclmul
ghash_clmulni_intel crc32c_intel aesni_intel ablk_helper cryptd lrw
aes_x86_64 xts gf128mul thermal i915 drm_kms_helper drm i2c_algo_bit
video button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw
scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix
[360491.933095] CPU 3
[360491.933110] Pid: 22166, comm: btrfs-endio-wri Not tainted
3.9.6-1.g8ead728-desktop #1 LENOVO 4239CTO/4239CTO
[360491.933161] RIP: 0010:[a022075b]  [a022075b]
__tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.933225] RSP: 0018:88014cad7888  EFLAGS: 00010297
[360491.933253] RAX:  RBX: 880065705b40 RCX:
88014cad7828
[360491.933289] RDX: 0247cc54 RSI: 88014949e821 RDI:
8801bf916640
[360491.933326] RBP: 880073048490 R08: 1000 R09:
88014cad7838
[360491.933362] R10:  R11:  R12:
8801c0556640
[360491.933398] R13: 0001731e R14: 003d R15:
880158b76340
[360491.933435] FS:  () GS:88021e2c()
knlGS:
[360491.933476] CS:  0010 DS:  ES:  CR0: 80050033
[360491.933506] CR2: 7f0a360017f8 CR3: 00015be03000 CR4:
000407e0
[360491.933543] DR0:  DR1:  DR2:

[360491.933579] DR3:  DR6: 0ff0 DR7:
0400
[360491.933617] Process btrfs-endio-wri (pid: 22166, threadinfo
88014cad6000, task 8801a5f8a6c0)
[360491.933662] Stack:
[360491.933674]  88011b573800 88010126e7f0 0001
1600
[360491.933719]  880073048490 a0228d45 6db6db6db6db6db7
8801c0556640
[360491.933763]  88020e764000 8800 0001731e
880197cb8158
[360491.933807] Call Trace:
[360491.933848]  [a0228d45] btrfs_search_old_slot+0x635/0x950 [btrfs]
[360491.933909]  [a02a1ec6]
__resolve_indirect_refs+0x156/0x640 [btrfs]
[360491.934044]  [a02a2e0c] find_parent_nodes+0x95c/0x1050 [btrfs]
[360491.934176]  [a02a3592] btrfs_find_all_roots+0x92/0x100 [btrfs]
[360491.934307]  [a02a401e] iterate_extent_inodes+0x16e/0x370 [btrfs]
[360491.934440]  [a02a42b8]
iterate_inodes_from_logical+0x98/0xc0 [btrfs]
[360491.934572]  [a024c1c8] record_extent_backrefs+0x68/0xe0 [btrfs]
[360491.934652]  [a0256d80]
btrfs_finish_ordered_io+0x150/0x990 [btrfs]
[360491.934739]  [a0276ef3] worker_loop+0x153/0x560 [btrfs]
[360491.934833]  [810697c3] kthread+0xb3/0xc0
[360491.934864]  [815dc6bc] ret_from_fork+0x7c/0xb0
[360491.934896] DWARF2 unwinder stuck at ret_from_fork+0x7c/0xb0
[360491.934925]
[360491.934934] Leftover inexact backtrace:
[360491.934934]
[360491.934965]  [81069710] ? kthread_create_on_node+0x120/0x120
[360491.934999] Code: c1 48 63 43 58 48 89 c2 48 c1 e2 05 48 8d 54 10
65 48 63 43 2c 48 89 c6 48 c1 e6 05 48 8d 74 30 65 e8 3a af 04 00 e9
b3 fe ff ff 0f 0b 0f 0b 90 41 57 41 56 41 55 41 54 55 48 89 fd 53 48
83 ec
[360491.935188] RIP  [a022075b]
__tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.935233]  RSP 88014cad7888
[360491.946047] ---[ end trace 1475a0830dcadf9c ]---
[360491.946051] note: btrfs-endio-wri[22166] exited with preempt_count 1

On Thu, Jun 20, 2013 at 8:11 PM, Chris Mason chris.ma...@fusionio.com wrote:
 Quoting Jon Nelson (2013-06-18 13:19:04)
 Josef Bacik jbacik at fusionio.com writes:

 
  On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
   I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
   a known problem?  In this case there is no powercycling; just a regular
   ceph-osd

Re: hang on 3.9, 3.10-rc5

2013-06-18 Thread Jon Nelson
Josef Bacik jbacik at fusionio.com writes:

 
 On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
  I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
  a known problem?  In this case there is no powercycling; just a regular
  ceph-osd workload.

..


I'm able to cause a complete kernel hang by defrag'ing even one 
file on 3.9.X (3.9.0 through 3.9.4, so far).





Re: WARNING: at fs/btrfs/free-space-cache.c:921 __btrfs_write_out_cache+0x6b9/0x9a0 [btrfs]()

2013-05-01 Thread Jon Nelson
Josef Bacik jbacik at fusionio.com writes:

..
 Ok well that's not good, I'm not sure how you got a 156 gigabyte block group,
 but that's why that warning is going off.  Can you pull btrfs-image down from
 here
 
 git://github.com/josefbacik/btrfs-progs.git

What is the difference between this git repo and the one at 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

I notice that the former has commits the latter doesn't. Is the latter the 
analogue to btrfs-next ? 





Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro

2013-04-23 Thread Jon Nelson
I need to know soon if there is going to be anything I can do to
rescue this filesystem.
I've tried 3.7.10, 3.8.[5,6,7,8] and btrfs-next as of (
bba653d1207646b17671c6cb9a0629736811848a ).
btrfs-next - at least - merely failed the mount; all of the others
failed and also ran into a BUG, requiring a reboot.
I *can* mount this with -o recovery,ro but nothing else works.

On Sat, Apr 20, 2013 at 8:20 AM, Jon Nelson jnel...@jamponi.net wrote:
 Using 3.8.8, I tried mounting with -o recovery and -o
 recovery,nospace_cache (which shouldn't be any different, if I'm
 understanding the kernel sources properly) without any benefit.

 Then I tried btrfs-next ( bba653d1207646b17671c6cb9a0629736811848a as
 of this writing ) also without being able to mount the filesystem,
 except for one big improvement -- it doesn't BUG/crash the kernel, it
 just fails the mount.

 As before, -o recovery,ro works.

 These seem to be the most useful of the messages:

 2013-04-20T08:10:06.263708-05:00 turnip kernel: [  638.385206] BTRFS
 error (device sda3) in btrfs_run_delayed_refs:2657: errno=-2 No such
 entry
 2013-04-20T08:10:06.263711-05:00 turnip kernel: [  638.385211] BTRFS
 warning (device sda3): Skipping commit of aborted transaction.
 2013-04-20T08:10:06.263713-05:00 turnip kernel: [  638.385215] BTRFS
 error (device sda3) in cleanup_transaction:1450: errno=-2 No such
 entry
 2013-04-20T08:10:06.263715-05:00 turnip kernel: [  638.385322] BTRFS
 error (device sda3): Error removing orphan entry, stopping orphan
 cleanup
 2013-04-20T08:10:06.263717-05:00 turnip kernel: [  638.385334] BTRFS
 critical (device sda3): could not do orphan cleanup -22
 2013-04-20T08:10:06.263720-05:00 turnip kernel: [  638.385350] btrfs:
 commit super ret -30
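
The numeric codes in the messages above (errno=-2, -22 and -30) can be turned into their symbolic names with Python's standard `errno` module. This is a convenience sketch, not part of any btrfs tooling; the kernel's own wording ("No such entry") differs from the libc text that `os.strerror` returns.

```python
import errno
import os

# Codes that appear in the log messages above.
CODES = [-2, -22, -30]


def describe(code):
    """Return (symbolic name, libc description) for a kernel-style errno."""
    number = abs(code)  # the kernel logs errors as negative numbers
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


if __name__ == "__main__":
    for code in CODES:
        name, text = describe(code)
        print(f"{code:>4}  {name:<8} {text}")
```

Read this way, the last message (-30, read-only filesystem) looks like a consequence of the earlier transaction abort rather than a separate fault, though that interpretation is mine.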

 Using debug-tree, I can determine that the most probable root backup
 is just one generation back, but btrfs-restore doesn't seem to want to
 let me use it:
 btrfs root backup slot 0
 tree root gen 246528 block 2621770289152
 extent root gen 246528 block 2621506080768
 chunk root gen 220757 block 2622035951616
 device root gen 220757 block 2621945528320
 csum root gen 246529 block 2621775081472
 fs root gen 246529 block 2621774274560
 440384839680 used 1629622038016 total 4 devices
 btrfs root backup slot 1
 tree root gen 246529 block 2619191820288
 extent root gen 246530 block 2619724374016
 chunk root gen 220757 block 2622035951616
 device root gen 220757 block 2621945528320
 csum root gen 246530 block 2619804864512
 fs root gen 246530 block 2619723927552
 440382750720 used 1629622038016 total 4 devices
 btrfs root backup slot 2
 tree root gen 246530 block 2621340016640
 extent root gen 246530 block 2619724374016
 chunk root gen 220757 block 2622035951616
 device root gen 220757 block 2621945528320
 csum root gen 246530 block 2619804864512
 fs root gen 246530 block 2619723927552
 440385257472 used 1629622038016 total 4 devices
 btrfs root backup slot 3
 tree root gen 246527 block 2621585006592
 extent root gen 246528 block 2621506080768
 chunk root gen 220757 block 2622035951616
 device root gen 220757 block 2621945528320
 csum root gen 246527 block 2621345435648
 fs root gen 246528 block 2621586079744
 440384839680 used 1629622038016 total 4 devices

 turnip:~/recovery # btrfs restore -r 2619191820288 -vv -i  /dev/sdd /tmp/foo
 Error reading root
 turnip:~/recovery #

 Is there a way for me to use btrfs tools to tell the superblock to go
 ahead and use backup root #1 in this case?
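
To show how "one generation back" falls out of the backup-slot listing above, here is a minimal sketch that parses the slot headers and the "tree root" lines and picks the second-newest generation. Only the tree root lines are reproduced in the sample; the function names are hypothetical and this is not a btrfs tool.

```python
import re

# Slot headers and tree root lines from the listing above.
LISTING = """\
btrfs root backup slot 0
tree root gen 246528 block 2621770289152
btrfs root backup slot 1
tree root gen 246529 block 2619191820288
btrfs root backup slot 2
tree root gen 246530 block 2621340016640
btrfs root backup slot 3
tree root gen 246527 block 2621585006592
"""


def tree_roots(text):
    """Map slot number to (generation, block) of that slot's tree root."""
    slots = {}
    slot = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.fullmatch(r"btrfs root backup slot (\d+)", line)
        if match:
            slot = int(match.group(1))
            continue
        match = re.fullmatch(r"tree root gen (\d+) block (\d+)", line)
        if match and slot is not None:
            slots[slot] = (int(match.group(1)), int(match.group(2)))
    return slots


def previous_generation(slots):
    """Return (generation, block) of the second-newest tree root."""
    ordered = sorted(slots.values(), reverse=True)
    return ordered[1]


if __name__ == "__main__":
    gen, block = previous_generation(tree_roots(LISTING))
    print(f"one generation back: gen {gen} at block {block}")
```

The block it selects is the same number that was passed to `btrfs restore` in the transcript above.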


 On Tue, Apr 16, 2013 at 11:44 AM, Jon Nelson jnel...@jamponi.net wrote:
 Tried to mount with -o recovery using 3.8.7.  No change. Does
 anybody have any suggestions?


 On Sat, Apr 13, 2013 at 6:21 PM, Jon Nelson jnel...@jamponi.net wrote:
 I have a 4-disk btrfs filesystem in raid1 mode.
 I'm running openSUSE 12.3, 3.7.10, x86_64.
 A few days ago something went wrong and the filesystem re-mounted itself RO.
 After reboot, it didn't come up.
 After a fair bit of work, I can get the filesystem to mount with -o
 recovery,ro.  However, if I use -o recovery alone or any other option
 I eventually hit a BUG and that's that. I've tried with up to kernel
 3.8.6 without improvement.

 My first question is this: how can I make it so I can use the
 filesystem without having to mount it with -o recovery,ro from a
 rescue environment (I have imaged all four drives *and* made a full
 filesystem-level backup, except for snapshots and some others).

 My second set of questions is: what went wrong initially, what went
 wrong with the recovery(s), and are there fixes in kernels after 3.8.6
 that might be involved?

 I

Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro

2013-04-23 Thread Jon Nelson
On Tue, Apr 23, 2013 at 10:03 AM, Liu Bo bo.li@oracle.com wrote:

 Can you please show us where it hits the BUG_ON (or the logs) when mounting
 with -o recovery?
 (the stack info below seems not to be a result of '-o recovery'?)

I have this from 3.8.8:

2013-04-19T21:19:47.060937-05:00 turnip kernel: [  100.608815] device
fsid 7feedf1e-9711-4900-af9c-92738ea8aace devid 5 transid 246530
/dev/sdd
2013-04-19T21:19:47.072892-05:00 turnip kernel: [  100.619694] btrfs:
enabling auto recovery
2013-04-19T21:19:47.072925-05:00 turnip kernel: [  100.619700] btrfs:
disk space caching is enabled
2013-04-19T21:19:55.156935-05:00 turnip kernel: [  108.692720] btrfs:
csum mismatch on free space cache
2013-04-19T21:19:55.156978-05:00 turnip kernel: [  108.692745] btrfs:
failed to load free space cache for block group 2618814627840
2013-04-19T21:19:55.156982-05:00 turnip kernel: [  108.693108] btrfs:
csum mismatch on free space cache
2013-04-19T21:19:55.156985-05:00 turnip kernel: [  108.693133] btrfs:
failed to load free space cache for block group 2619888369664
2013-04-19T21:19:57.924956-05:00 turnip kernel: [  111.453762] btrfs:
csum mismatch on free space cache
2013-04-19T21:19:57.924999-05:00 turnip kernel: [  111.453784] btrfs:
failed to load free space cache for block group 2620962111488
2013-04-19T21:19:58.139783-05:00 turnip kernel: [  111.521418] leaf
2618819321856 total ptrs 25 free space 1780
2013-04-19T21:19:58.139814-05:00 turnip kernel: [  111.521428]
item 0 key (2621340270592 a8 4096) itemoff 3908 itemsize 87
2013-04-19T21:19:58.139817-05:00 turnip kernel: [  111.521434]
extent refs 5 gen 214210 flags 2
2013-04-19T21:19:58.139822-05:00 turnip kernel: [  111.521439]
tree block key (562484 1 0) level 0
2013-04-19T21:19:58.139864-05:00 turnip kernel: [  111.521441]
tree block backref root 5
2013-04-19T21:19:58.139873-05:00 turnip kernel: [  111.521445]
shared block backref parent 2621774995456
2013-04-19T21:19:58.139875-05:00 turnip kernel: [  111.521448]
shared block backref parent 2621603078144
2013-04-19T21:19:58.139877-05:00 turnip kernel: [  111.521451]
shared block backref parent 2621340147712
2013-04-19T21:19:58.139879-05:00 turnip kernel: [  111.521453]
shared block backref parent 2621312724992
2013-04-19T21:19:58.139880-05:00 turnip kernel: [  111.521458]
item 1 key (2621340274688 a8 4096) itemoff 3857 itemsize 51
2013-04-19T21:19:58.139882-05:00 turnip kernel: [  111.521462]
extent refs 1 gen 214217 flags 258
2013-04-19T21:19:58.139884-05:00 turnip kernel: [  111.521466]
tree block key (159644 54 627062569) level 0
2013-04-19T21:19:58.139885-05:00 turnip kernel: [  111.521469]
shared block backref parent 2621340254208
2013-04-19T21:19:58.139887-05:00 turnip kernel: [  111.521473]
item 2 key (2621340286976 a8 4096) itemoff 3761 itemsize 96
2013-04-19T21:19:58.139888-05:00 turnip kernel: [  111.521476]
extent refs 6 gen 214207 flags 2
2013-04-19T21:19:58.139890-05:00 turnip kernel: [  111.521480]
tree block key (562122 c 469151) level 0
2013-04-19T21:19:58.139891-05:00 turnip kernel: [  111.521482]
tree block backref root 5
2013-04-19T21:19:58.139892-05:00 turnip kernel: [  111.521485]
shared block backref parent 2622020726784
2013-04-19T21:19:58.139894-05:00 turnip kernel: [  111.521488]
shared block backref parent 2621647806464
2013-04-19T21:19:58.139895-05:00 turnip kernel: [  111.521491]
shared block backref parent 2621520723968
2013-04-19T21:19:58.139897-05:00 turnip kernel: [  111.521494]
shared block backref parent 2621378756608
2013-04-19T21:19:58.139898-05:00 turnip kernel: [  111.521496]
shared block backref parent 2621341069312
2013-04-19T21:19:58.139900-05:00 turnip kernel: [  111.521501]
item 3 key (2621340295168 a8 4096) itemoff 3710 itemsize 51
2013-04-19T21:19:58.139901-05:00 turnip kernel: [  111.521505]
extent refs 1 gen 241434 flags 2
2013-04-19T21:19:58.139903-05:00 turnip kernel: [  111.521509]
tree block key (2620335087616 b6 2621224001536) level 0
2013-04-19T21:19:58.139904-05:00 turnip kernel: [  111.521516]
tree block backref root 2
2013-04-19T21:19:58.139906-05:00 turnip kernel: [  111.521521]
item 4 key (2621340299264 a8 4096) itemoff 3659 itemsize 51
2013-04-19T21:19:58.139907-05:00 turnip kernel: [  111.521525]
extent refs 1 gen 208634 flags 258
2013-04-19T21:19:58.139909-05:00 turnip kernel: [  111.521529]
tree block key (11446 1 0) level 1
2013-04-19T21:19:58.139910-05:00 turnip kernel: [  111.521531]
shared block backref parent 2621405974528
2013-04-19T21:19:58.139912-05:00 turnip kernel: [  111.521541]
item 5 key (2621340303360 a8 4096) itemoff 3608 itemsize 51
2013-04-19T21:19:58.139913-05:00 turnip kernel: [  111.521544]
extent refs 1 gen 208634 flags 258
2013-04-19T21:19:58.139915-05:00 turnip kernel: [  111.521548]
tree block key (11547 6c 0) level 0
2013-04-19T21:19:58.139917-05:00 turnip kernel: [  111.521550]
shared block backref parent 2621340299264
2013-04-19T21:19:58.139918-05:00 turnip kernel: [  111.521555]
item 6 key (2621340307456 a8 4096) itemoff 3557 itemsize 

Re: minor patch to cmds-restore.c

2013-04-20 Thread Jon Nelson
On Fri, Apr 19, 2013 at 11:48 PM, Eric Sandeen sand...@redhat.com wrote:
 On 4/19/13 7:11 PM, Jon Nelson wrote:
 The following is a minor patch to cmds-restore.c


 Hi Jon - just a note -

 When sending a patch like this, it's best to follow the standard
 patch format, which closely mimics the kernel patch submission
 guidelines:

...
 That way it's clear to the reviewer as well as making the source
 control history more descriptive.

 Hum, it'd be nice to have it in the manpage too...

Thank you for the constructive and polite feedback! I will try to
adhere to that the next time I send a patch. If I can find some time later,
I'll update the manpage, too.

Before I do that, is the verbiage in the patch acceptable?

--
Jon
Software Blacksmith


Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro

2013-04-20 Thread Jon Nelson
Using 3.8.8, I tried mounting with -o recovery and -o
recovery,nospace_cache (which shouldn't be any different, if I'm
understanding the kernel sources properly) without any benefit.

Then I tried btrfs-next ( bba653d1207646b17671c6cb9a0629736811848a as
of this writing ) also without being able to mount the filesystem,
except for one big improvement -- it doesn't BUG/crash the kernel, it
just fails the mount.

As before, -o recovery,ro works.

These seem to be the most useful of the messages:

2013-04-20T08:10:06.263708-05:00 turnip kernel: [  638.385206] BTRFS
error (device sda3) in btrfs_run_delayed_refs:2657: errno=-2 No such
entry
2013-04-20T08:10:06.263711-05:00 turnip kernel: [  638.385211] BTRFS
warning (device sda3): Skipping commit of aborted transaction.
2013-04-20T08:10:06.263713-05:00 turnip kernel: [  638.385215] BTRFS
error (device sda3) in cleanup_transaction:1450: errno=-2 No such
entry
2013-04-20T08:10:06.263715-05:00 turnip kernel: [  638.385322] BTRFS
error (device sda3): Error removing orphan entry, stopping orphan
cleanup
2013-04-20T08:10:06.263717-05:00 turnip kernel: [  638.385334] BTRFS
critical (device sda3): could not do orphan cleanup -22
2013-04-20T08:10:06.263720-05:00 turnip kernel: [  638.385350] btrfs:
commit super ret -30
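
The negative numbers in these messages are kernel error codes. A minimal Python sketch for decoding them, assuming the usual Linux errno numbering (the helper name `decode_kernel_errno` is made up for illustration):

```python
import errno
import os


def decode_kernel_errno(code: int) -> str:
    """Map a negative kernel return value such as -30 to its symbolic errno name."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{code}: {name} ({os.strerror(number)})"


# Codes copied from the log lines above.
for code in (-2, -22, -30):
    print(decode_kernel_errno(code))
```

On Linux these decode to ENOENT, EINVAL and EROFS; the last one would be consistent with the filesystem having gone read-only by the time the superblock commit was attempted.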

Using debug-tree, I can determine that the most probable root backup
is just one generation back, but btrfs-restore doesn't seem to want to
let me use it:
btrfs root backup slot 0
    tree root gen 246528 block 2621770289152
    extent root gen 246528 block 2621506080768
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246529 block 2621775081472
    fs root gen 246529 block 2621774274560
    440384839680 used 1629622038016 total 4 devices
btrfs root backup slot 1
    tree root gen 246529 block 2619191820288
    extent root gen 246530 block 2619724374016
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246530 block 2619804864512
    fs root gen 246530 block 2619723927552
    440382750720 used 1629622038016 total 4 devices
btrfs root backup slot 2
    tree root gen 246530 block 2621340016640
    extent root gen 246530 block 2619724374016
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246530 block 2619804864512
    fs root gen 246530 block 2619723927552
    440385257472 used 1629622038016 total 4 devices
btrfs root backup slot 3
    tree root gen 246527 block 2621585006592
    extent root gen 246528 block 2621506080768
    chunk root gen 220757 block 2622035951616
    device root gen 220757 block 2621945528320
    csum root gen 246527 block 2621345435648
    fs root gen 246528 block 2621586079744
    440384839680 used 1629622038016 total 4 devices

turnip:~/recovery # btrfs restore -r 2619191820288 -vv -i  /dev/sdd /tmp/foo
Error reading root
turnip:~/recovery #

Is there a way for me to use btrfs tools to tell the superblock to go
ahead and use backup root #1 in this case?
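
To see at a glance which backup slot is newest, the listing can be sorted by tree-root generation. This is only a sketch, assuming the text format shown in this message; it parses just the `tree root` lines, and the names `BackupRoot` and `parse_backup_roots` are illustrative, not part of btrfs-progs:

```python
import re
from dataclasses import dataclass

# Tree-root lines copied from the backup-root listing in this message.
SAMPLE = """\
btrfs root backup slot 0
tree root gen 246528 block 2621770289152
btrfs root backup slot 1
tree root gen 246529 block 2619191820288
btrfs root backup slot 2
tree root gen 246530 block 2621340016640
btrfs root backup slot 3
tree root gen 246527 block 2621585006592
"""


@dataclass
class BackupRoot:
    slot: int
    tree_root_gen: int
    tree_root_block: int


def parse_backup_roots(text):
    """Collect (slot, generation, block) for every 'tree root' line."""
    roots = []
    slot = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"btrfs root backup slot (\d+)$", line)
        if m:
            slot = int(m.group(1))
            continue
        m = re.match(r"tree root gen (\d+) block (\d+)$", line)
        if m and slot is not None:
            roots.append(BackupRoot(slot, int(m.group(1)), int(m.group(2))))
    return roots


# Newest generation first.
roots = sorted(parse_backup_roots(SAMPLE), key=lambda r: r.tree_root_gen, reverse=True)
for r in roots:
    print(f"slot {r.slot}: gen {r.tree_root_gen} block {r.tree_root_block}")
```

With the values above, slot 2 holds the newest tree root (generation 246530), and slot 1 (block 2619191820288, the one passed to btrfs restore) is one generation behind it.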


On Tue, Apr 16, 2013 at 11:44 AM, Jon Nelson jnel...@jamponi.net wrote:
 Tried to mount with -o recovery using 3.8.7.  No change. Does
 anybody have any suggestions?


 On Sat, Apr 13, 2013 at 6:21 PM, Jon Nelson jnel...@jamponi.net wrote:
 I have a 4-disk btrfs filesystem in raid1 mode.
 I'm running openSUSE 12.3, 3.7.10, x86_64.
 A few days ago something went wrong and the filesystem re-mounted itself RO.
 After reboot, it didn't come up.
 After a fair bit of work, I can get the filesystem to mount with -o
 recovery,ro.  However, if I use -o recovery alone or any other option
 I eventually hit a BUG and that's that. I've tried with up to kernel
 3.8.6 without improvement.

 My first question is this: how can I make it so I can use the
 filesystem without having to mount it with -o recovery,ro from a
 rescue environment? (I have imaged all four drives *and* made a full
 filesystem-level backup, except for snapshots and some others.)

 My second set of questions is: what went wrong initially, what went
 wrong with the recovery(s), and are there fixes in kernels after 3.8.6
 that might be involved?

 I have *some* logs, and I might be able to share portions of them.
 I also took a btrfs-image.


 Using a very recent btrfs-progs git pull, 'btrfs repair ...' gives me:
 ERROR: device scan failed '/dev/sdb' - Device or resource busy
 ERROR: device scan failed '/dev/sda' - Device or resource busy
 failed to open /dev/sr0: No medium found
 ERROR: device scan failed '/dev/sdb' - Device or resource busy
 ERROR: device scan failed '/dev/sda' - Device or resource busy
 failed to open /dev/sr0: No medium found

minor patch to cmds-restore.c

2013-04-19 Thread Jon Nelson
The following is a minor patch to cmds-restore.c


diff --git a/cmds-restore.c b/cmds-restore.c
index c75e187..273c813 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -917,14 +917,16 @@ out:
 }
 
 const char * const cmd_restore_usage[] = {
-	"btrfs restore [options] <device>",
+	"btrfs restore [options] <device> [destination]",
 	"Try to restore files from a damaged filesystem (unmounted)",
 	"",
+	"-l              list roots",
 	"-s              get snapshots",
 	"-v              verbose",
 	"-i              ignore errors",
 	"-o              overwrite",
-	"-t              tree location",
+	"-r <rootid>     root location",
+	"-t <treeid>     tree location",
 	"-f <offset>     filesystem location",
 	"-u <block>      super mirror",
 	"-d              find dir",


--
Jon
Software Blacksmith


Re: corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro

2013-04-16 Thread Jon Nelson
Tried to mount with -o recovery using 3.8.7.  No change. Does
anybody have any suggestions?


On Sat, Apr 13, 2013 at 6:21 PM, Jon Nelson jnel...@jamponi.net wrote:
 I have a 4-disk btrfs filesystem in raid1 mode.
 I'm running openSUSE 12.3, 3.7.10, x86_64.
 A few days ago something went wrong and the filesystem re-mounted itself RO.
 After reboot, it didn't come up.
 After a fair bit of work, I can get the filesystem to mount with -o
 recovery,ro.  However, if I use -o recovery alone or any other option
 I eventually hit a BUG and that's that. I've tried with up to kernel
 3.8.6 without improvement.

 My first question is this: how can I make it so I can use the
 filesystem without having to mount it with -o recovery,ro from a
 rescue environment? (I have imaged all four drives *and* made a full
 filesystem-level backup, except for snapshots and some others.)

 My second set of questions is: what went wrong initially, what went
 wrong with the recovery(s), and are there fixes in kernels after 3.8.6
 that might be involved?

 I have *some* logs, and I might be able to share portions of them.
 I also took a btrfs-image.


 Using a very recent btrfs-progs git pull, 'btrfs repair ...' gives me:
 ERROR: device scan failed '/dev/sdb' - Device or resource busy
 ERROR: device scan failed '/dev/sda' - Device or resource busy
 failed to open /dev/sr0: No medium found
 ERROR: device scan failed '/dev/sdb' - Device or resource busy
 ERROR: device scan failed '/dev/sda' - Device or resource busy
 failed to open /dev/sr0: No medium found
 checking extents
 Backref 341888225280 parent 2621340434432 owner 0 offset 0 num_refs 0
 not found in extent tree
 Incorrect local backref count on 341888225280 parent 2621340434432
 owner 0 offset 0 found 1 wanted 0 back 0x6dc8500
 Incorrect local backref count on 341888225280 root 1 owner 496 offset
 0 found 0 wanted 1 back 0x2bb636c0
 backpointer mismatch on [341888225280 262144]
 Unable to find block group for 0
 btrfs: extent-tree.c:284: find_search_start: Assertion `!(1)' failed.
 enabling repair mode
 Checking filesystem on /dev/sdd
 UUID: 7feedf1e-9711-4900-af9c-92738ea8aace


 and some of the errors are here:

 [  314.095449] [ cut here ]
 [  314.095526] WARNING: at
 /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5208
 __btrfs_free_extent+0x853/0x890 [btrfs]()
 [  314.095541] Hardware name: TA790GX XE
 [  314.09] Modules linked in: dm_mod af_packet
 cpufreq_conservative cpufreq_userspace cpufreq_powersave
 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
 snd_hwdep snd_pcm snd_timer snd btrfs
 acpi_cpufreq mperf kvm_amd zlib_deflate libcrc32c kvm radeon
 sr_mod ttm drm_kms_helper cdrom processor sg via_velocity drm
 i2c_algo_bit shpchp pci_hotplug sp5100_tco i2c_piix4 edac_core
 edac_mce_amd thermal
 ata_generic thermal_sys r8169 pata_atiixp k10temp pcspkr microcode
 crc_ccitt wmi soundcore snd_page_alloc button autofs4
 [  314.095867] Pid: 5310, comm: btrfs-transacti Not tainted 3.8.6-2-desktop #1
 [  314.095875] Call Trace:
 [  314.095904]  [81004748] dump_trace+0x88/0x300
 [  314.095923]  [815a9128] dump_stack+0x69/0x6f
 [  314.095937]  [81044f49] warn_slowpath_common+0x79/0xc0
 [  314.095968]  [a0400db3] __btrfs_free_extent+0x853/0x890 [btrfs]
 [  314.096061]  [a0404b0f] run_clustered_refs+0x48f/0xb20 [btrfs]
 [  314.096147]  [a0408a9a] btrfs_run_delayed_refs+0xca/0x320 [btrfs]
 [  314.096249]  [a04182e0] btrfs_commit_transaction+0x80/0xb00 [btrfs]
 [  314.096379]  [a0411b4d] transaction_kthread+0x19d/0x220 [btrfs]
 [  314.096492]  [81068043] kthread+0xb3/0xc0
 [  314.096506]  [815bbf7c] ret_from_fork+0x7c/0xb0
 [  314.096515] ---[ end trace 64d3998241407ddc ]---
 [  314.096520] btrfs unable to find ref byte nr 2621340344320 parent 0
 root 2  owner 1 offset 0
 [  314.096526] [ cut here ]
 [  314.096551] WARNING: at
 /home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5265
 __btrfs_free_extent+0x7ba/0x890 [btrfs]()
 [  314.096554] Hardware name: TA790GX XE
 [  314.096556] Modules linked in: dm_mod af_packet
 cpufreq_conservative cpufreq_userspace cpufreq_powersave
 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
 snd_hwdep snd_pcm snd_timer snd btrfs acpi_cpufreq mperf kvm_amd
 zlib_deflate libcrc32c kvm radeon sr_mod ttm drm_kms_helper cdrom
 processor sg via_velocity drm i2c_algo_bit shpchp pci_hotplug
 sp5100_tco i2c_piix4 edac_core edac_mce_amd thermal ata_generic
 thermal_sys r8169 pata_atiixp k10temp pcspkr microcode crc_ccitt wmi
 soundcore snd_page_alloc button autofs4
 [  314.096613] Pid: 5310, comm: btrfs-transacti Tainted: GW
 3.8.6-2-desktop #1
 [  314.096615] Call Trace:
 [  314.096627]  [81004748] dump_trace+0x88/0x300
 [  314.096636]  [815a9128] dump_stack+0x69/0x6f
 [  314.096646

corrupted filesystem. mounts with -o recovery,ro but not -o recovery or -o ro

2013-04-13 Thread Jon Nelson
I have a 4-disk btrfs filesystem in raid1 mode.
I'm running openSUSE 12.3, 3.7.10, x86_64.
A few days ago something went wrong and the filesystem re-mounted itself RO.
After reboot, it didn't come up.
After a fair bit of work, I can get the filesystem to mount with -o
recovery,ro.  However, if I use -o recovery alone or any other option
I eventually hit a BUG and that's that. I've tried with up to kernel
3.8.6 without improvement.

My first question is this: how can I make it so I can use the
filesystem without having to mount it with -o recovery,ro from a
rescue environment? (I have imaged all four drives *and* made a full
filesystem-level backup, except for snapshots and some others.)

My second set of questions is: what went wrong initially, what went
wrong with the recovery(s), and are there fixes in kernels after 3.8.6
that might be involved?

I have *some* logs, and I might be able to share portions of them.
I also took a btrfs-image.


Using a very recent btrfs-progs git pull, 'btrfs repair ...' gives me:
ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
ERROR: device scan failed '/dev/sdb' - Device or resource busy
ERROR: device scan failed '/dev/sda' - Device or resource busy
failed to open /dev/sr0: No medium found
checking extents
Backref 341888225280 parent 2621340434432 owner 0 offset 0 num_refs 0
not found in extent tree
Incorrect local backref count on 341888225280 parent 2621340434432
owner 0 offset 0 found 1 wanted 0 back 0x6dc8500
Incorrect local backref count on 341888225280 root 1 owner 496 offset
0 found 0 wanted 1 back 0x2bb636c0
backpointer mismatch on [341888225280 262144]
Unable to find block group for 0
btrfs: extent-tree.c:284: find_search_start: Assertion `!(1)' failed.
enabling repair mode
Checking filesystem on /dev/sdd
UUID: 7feedf1e-9711-4900-af9c-92738ea8aace


and some of the errors are here:

[  314.095449] [ cut here ]
[  314.095526] WARNING: at
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5208
__btrfs_free_extent+0x853/0x890 [btrfs]()
[  314.095541] Hardware name: TA790GX XE
[  314.09] Modules linked in: dm_mod af_packet
cpufreq_conservative cpufreq_userspace cpufreq_powersave
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm snd_timer snd btrfs
acpi_cpufreq mperf kvm_amd zlib_deflate libcrc32c kvm radeon
sr_mod ttm drm_kms_helper cdrom processor sg via_velocity drm
i2c_algo_bit shpchp pci_hotplug sp5100_tco i2c_piix4 edac_core
edac_mce_amd thermal
ata_generic thermal_sys r8169 pata_atiixp k10temp pcspkr microcode
crc_ccitt wmi soundcore snd_page_alloc button autofs4
[  314.095867] Pid: 5310, comm: btrfs-transacti Not tainted 3.8.6-2-desktop #1
[  314.095875] Call Trace:
[  314.095904]  [81004748] dump_trace+0x88/0x300
[  314.095923]  [815a9128] dump_stack+0x69/0x6f
[  314.095937]  [81044f49] warn_slowpath_common+0x79/0xc0
[  314.095968]  [a0400db3] __btrfs_free_extent+0x853/0x890 [btrfs]
[  314.096061]  [a0404b0f] run_clustered_refs+0x48f/0xb20 [btrfs]
[  314.096147]  [a0408a9a] btrfs_run_delayed_refs+0xca/0x320 [btrfs]
[  314.096249]  [a04182e0] btrfs_commit_transaction+0x80/0xb00 [btrfs]
[  314.096379]  [a0411b4d] transaction_kthread+0x19d/0x220 [btrfs]
[  314.096492]  [81068043] kthread+0xb3/0xc0
[  314.096506]  [815bbf7c] ret_from_fork+0x7c/0xb0
[  314.096515] ---[ end trace 64d3998241407ddc ]---
[  314.096520] btrfs unable to find ref byte nr 2621340344320 parent 0
root 2  owner 1 offset 0
[  314.096526] [ cut here ]
[  314.096551] WARNING: at
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.8.6/linux-3.8/fs/btrfs/extent-tree.c:5265
__btrfs_free_extent+0x7ba/0x890 [btrfs]()
[  314.096554] Hardware name: TA790GX XE
[  314.096556] Modules linked in: dm_mod af_packet
cpufreq_conservative cpufreq_userspace cpufreq_powersave
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm snd_timer snd btrfs acpi_cpufreq mperf kvm_amd
zlib_deflate libcrc32c kvm radeon sr_mod ttm drm_kms_helper cdrom
processor sg via_velocity drm i2c_algo_bit shpchp pci_hotplug
sp5100_tco i2c_piix4 edac_core edac_mce_amd thermal ata_generic
thermal_sys r8169 pata_atiixp k10temp pcspkr microcode crc_ccitt wmi
soundcore snd_page_alloc button autofs4
[  314.096613] Pid: 5310, comm: btrfs-transacti Tainted: GW
3.8.6-2-desktop #1
[  314.096615] Call Trace:
[  314.096627]  [81004748] dump_trace+0x88/0x300
[  314.096636]  [815a9128] dump_stack+0x69/0x6f
[  314.096646]  [81044f49] warn_slowpath_common+0x79/0xc0
[  314.096673]  [a0400d1a] __btrfs_free_extent+0x7ba/0x890 [btrfs]
[  314.096752]  [a0404b0f] run_clustered_refs+0x48f/0xb20 [btrfs]
[  314.096832]  [a0408a9a] btrfs_run_delayed_refs+0xca/0x320 

device error stats resettable?

2013-04-01 Thread Jon Nelson
I have a device that is part of a 4-device btrfs raid1 setup.
I accidentally jiggled the cable for this device and it started racking
up errors (about 90,000). After fixing the cable (and running a scrub),
all of the errors are fixed (woo!), but the device still shows lots of
errors. Is there a way to reset the device error count?

I'm on 3.8.2 on openSUSE 12.3 x86_64.


--
Jon
Software Blacksmith


Re: device error stats resettable?

2013-04-01 Thread Jon Nelson
On Mon, Apr 1, 2013 at 5:39 PM, anand jain anand.j...@oracle.com wrote:

Is there a way to reset the device
 error count?


 there is -z, is it not what you are looking for ?

 --
 # btrfs dev stat --help
 usage: btrfs device stats [-z] path|device

 Show current device IO stats. -z to reset stats afterwards.

Aargh. Newer btrfs tool than I was using. Using the latest git
version, this works great, of course.
Thanks!
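
For checking whether any counters are still non-zero before resetting them, the stats output can be parsed mechanically. A small sketch, assuming the `[device].counter value` line format that `btrfs device stats` prints; the sample numbers and device names below are invented for illustration, and the function names are hypothetical:

```python
import re
from collections import defaultdict

# Invented sample in the "[device].counter value" layout.
SAMPLE = """\
[/dev/sdc].write_io_errs   90000
[/dev/sdc].read_io_errs    12
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs   0
[/dev/sdd].read_io_errs    0
[/dev/sdd].flush_io_errs   0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
"""


def parse_device_stats(text):
    """Return {device: {counter name: value}} from stats-style output."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        m = re.match(r"\[(.+?)\]\.(\w+)\s+(\d+)$", line.strip())
        if m:
            device, counter, value = m.groups()
            stats[device][counter] = int(value)
    return dict(stats)


def devices_with_errors(stats):
    """List devices that have at least one non-zero counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


stats = parse_device_stats(SAMPLE)
print(devices_with_errors(stats))
```

Only devices listed by `devices_with_errors` would need their counters reset with -z once the underlying fault is fixed.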

--
Jon
Software Blacksmith


Re: btrfs-show vs. btrfs different output

2013-03-22 Thread Jon Nelson
On Thu, Mar 21, 2013 at 11:25 AM, Eric Sandeen sand...@redhat.com wrote:
 On 3/21/13 10:29 AM, Jon Nelson wrote:
 On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen sand...@redhat.com wrote:
 On 3/21/13 10:04 AM, Jon Nelson wrote:
 ...
 2. the current git btrfs-show and btrfs fi show both output
 *different* devices for device with UUID
 b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

 does blkid output find that uuid anywhere?

 Since you're working in git, can you maybe do a little bisecting
 to find out when it changed?  Should be a fairly quick test?

 blkid does /not/ report that uuid anywhere.

 git bisect, if I did it correctly, says:


 6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
 commit 6eba9002956ac40db87d42fb653a0524dc568810
 Author: Goffredo Baroncelli kreij...@inwind.it
 Date:   Tue Sep 4 19:59:26 2012 +0200

 Correct un-initialized fsid variable

 :100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
 03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c

 Ok, I think this is another case of greedily scanning stale
 backup superblocks (did you ever have btrfs on the whole sda
 or sdb?)

 btrfs_read_dev_super() currently tries to scan all 3 superblocks
 (primary and 2 backups).  I'm guessing that you have some stale
 backup superblocks on sda and/or sdb.

 Before the above commit, if the first sb didn't look valid,
 it'd skip to the 2nd.  If the 2nd (stale) one looked OK,
 it'd compare its fsid to an uninitialized variable (fsid)
 which would fail (since the fsid contents were random.)
 Same for the 3rd backup if found, and eventually it'd return
 -1 as failure and not report the device.

 After the commit, it'd skip the first invalid sb as well.
 But this time, it takes the fsid from the 2nd superblock as
 good and makes it through the loop thinking that it's found
 something valid.  Hence the report of a device which you didn't
 expect even though the first superblock is indeed wiped out.

 There are some patches floating around to stop this
 backup superblock scanning altogether.

 This might fix it for you; it basically returns failure
 if any superblock on the device is found to be bad.

 What we really need is the right bits in the right places
 to let the administrator know if a device looks like it might
 be corrupt and in need of fixing, vs. ignoring it altogether.

 Not sure if this is something we want upstream but you could
 test it if you like.

I did test and it appears to resolve the issue for me.
Thank you!
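
For anyone reading along, the three behaviours described in the quoted explanation can be modelled in a few lines. This is a simplified Python model of that description, not the btrfs-progs code; `Superblock`, `scan_superblocks` and `scan_strict` are hypothetical names, and the "garbage" fsid merely stands in for the uninitialized variable:

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    valid: bool            # True if the copy passes the basic sanity checks
    fsid: str = ""
    generation: int = 0


def scan_superblocks(copies, fsid=None):
    """Return the newest copy whose fsid matches, or None.

    fsid=None models the behaviour after the bisected commit: the fsid is
    adopted from the first valid copy. Passing a garbage fsid models the
    earlier behaviour, where the comparison value was uninitialized.
    """
    latest = None
    for sb in copies:
        if not sb.valid:
            continue
        if fsid is None:
            fsid = sb.fsid
        elif sb.fsid != fsid:
            continue
        if latest is None or sb.generation > latest.generation:
            latest = sb
    return latest


def scan_strict(copies):
    """Model of the tested patch as described: any bad copy means failure."""
    if not all(sb.valid for sb in copies):
        return None
    return scan_superblocks(copies)


# A whole-disk device whose primary superblock was wiped but whose two
# backup copies still carry an old filesystem's fsid.
stale = "b5dc52bd-21bf-4173-8049-d54d88c82240"
device = [Superblock(False), Superblock(True, stale, 10), Superblock(True, stale, 10)]

before = scan_superblocks(device, fsid="garbage")  # nothing matches, device not reported
after = scan_superblocks(device)                   # stale backup accepted, device reported
strict = scan_strict(device)                       # primary is bad, device not reported
print(before, after, strict)
```

In this model only the middle case reports the stale filesystem, which lines up with the bisect result and with the patch making the phantom device disappear.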

-- 
Jon
Software Blacksmith


crash with 3.7.10 and balance.

2013-03-22 Thread Jon Nelson
I'm running openSUSE 12.3 on x86_64.
I was running a balance:
btrfs balance -dusage=5 -v /

using the latest btrfs tools code from git (as of this writing)
and got a crash:

[304158.496250] btrfs: found 75 extents
[304159.309289] btrfs: relocating block group 2303295684608 flags 17
[304159.839886] btrfs: found 1 extents
[304161.484616] [ cut here ]
[304161.484668] WARNING: at
/home/abuild/rpmbuild/BUILD/kernel-default-3.7.10/linux-3.7/fs/btrfs/super.c:246
__btrfs_abort_transaction+0xc3/0xe0 [btrfs]()
[304161.484671] Hardware name: TA790GX XE
[304161.484673] btrfs: Transaction aborted
[304161.484675] Modules linked in: af_packet md5 xt_REDIRECT
xt_pkttype xt_physdev xt_TCPMSS xt_tcpudp xt_LOG xt_limit iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle xt_mark nfsd nfs_acl nfs fscache
lockd auth_rpcgss ebt_ip sunrpc ebtable_filter ebtables bridge stp llc
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_ftp
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables cpufreq_conservative cpufreq_userspace
cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek
acpi_cpufreq snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer
snd sr_mod cdrom radeon mperf ttm soundcore ata_generic sg
via_velocity sp5100_tco drm_kms_helper kvm_amd kvm microcode
snd_page_alloc r8169 pcspkr crc_ccitt button i2c_piix4 k10temp
edac_core drm pata_atiixp i2c_algo_bit edac_mce_amd shpchp pci_hotplug
wmi tcp_htcp autofs4 btrfs zlib_deflate libcrc32c raid456
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx raid10 raid0 raid1 ohci_hcd ehci_hcd usbcore usb_common
scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh dm_mirror
dm_region_hash dm_log dm_mod edd fan thermal processor thermal_sys
[304161.484749] Pid: 22397, comm: btrfs Tainted: GW
3.7.10-1.1-default #1
[304161.484751] Call Trace:
[304161.484770]  [81004728] dump_trace+0x78/0x2c0
[304161.484777]  [8153b44e] dump_stack+0x69/0x6f
[304161.484785]  [81043f69] warn_slowpath_common+0x79/0xc0
[304161.484791]  [81044065] warn_slowpath_fmt+0x45/0x50
[304161.484812]  [a01662e3]
__btrfs_abort_transaction+0xc3/0xe0 [btrfs]
[304161.484844]  [a0175ddd] __btrfs_inc_extent_ref+0x1ed/0x250 [btrfs]
[304161.484899]  [a017c3f6] run_clustered_refs+0x666/0xa90 [btrfs]
[304161.484954]  [a017f4ea] btrfs_run_delayed_refs+0xca/0x310 [btrfs]
[304161.485012]  [a018f7d9] __btrfs_end_transaction+0xf9/0x420 [btrfs]
[304161.485085]  [a01d457d] merge_reloc_root+0x48d/0x520 [btrfs]
[304161.485214]  [a01d4711] merge_reloc_roots+0x101/0x140 [btrfs]
[304161.485337]  [a01d4bde] relocate_block_group+0x25e/0x6b0 [btrfs]
[304161.485459]  [a01d51d9]
btrfs_relocate_block_group+0x1a9/0x2e0 [btrfs]
[304161.485579]  [a01aed4d]
btrfs_relocate_chunk.isra.53+0x5d/0x6e0 [btrfs]
[304161.485674]  [a01b3086] btrfs_balance+0x826/0xd60 [btrfs]
[304161.485770]  [a01b88f6] btrfs_ioctl_balance+0x136/0x420 [btrfs]
[304161.485878]  [a01bcc64] btrfs_ioctl+0xe54/0x1870 [btrfs]
[304161.485967]  [81177b1f] do_vfs_ioctl+0x8f/0x520
[304161.485973]  [81178050] sys_ioctl+0xa0/0xc0
[304161.485979]  [8154f2ed] system_call_fastpath+0x1a/0x1f
[304161.485989]  [7f03050aef27] 0x7f03050aef26
[304161.485991] ---[ end trace d010cbea0d653c96 ]---
[304161.485995] BTRFS error (device sdd) in
__btrfs_inc_extent_ref:1952: Object already exists
[304161.485996] btrfs is forced readonly
[304161.486051] [ cut here ]
[304161.486138] kernel BUG at
/home/abuild/rpmbuild/BUILD/kernel-default-3.7.10/linux-3.7/fs/btrfs/relocation.c:2279!
[304161.486299] invalid opcode:  [#1] SMP
[304161.486371] Modules linked in: af_packet md5 xt_REDIRECT
xt_pkttype xt_physdev xt_TCPMSS xt_tcpudp xt_LOG xt_limit iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle xt_mark nfsd nfs_acl nfs fscache
lockd auth_rpcgss ebt_ip sunrpc ebtable_filter ebtables bridge stp llc
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_ftp
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables cpufreq_conservative cpufreq_userspace
cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek
acpi_cpufreq snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer
snd sr_mod cdrom radeon mperf ttm soundcore ata_generic sg
via_velocity sp5100_tco drm_kms_helper kvm_amd kvm microcode
snd_page_alloc r8169 pcspkr crc_ccitt button i2c_piix4 k10temp
edac_core drm pata_atiixp i2c_algo_bit edac_mce_amd shpchp pci_hotplug
wmi tcp_htcp autofs4 btrfs zlib_deflate libcrc32c raid456
async_raid6_recov async_pq raid6_pq async_xor xor 

btrfs-show vs. btrfs different output

2013-03-21 Thread Jon Nelson
I'm running openSUSE 12.3 x86_64, which ships btrfs tools built from an
unknown git version that reports itself as v0.19.
I'm also supplying the output from a current git build, which reports
itself as:
v0.20-rc1-253-g7854c8b

The problem is that btrfs-show (git) and btrfs fi show (git) give
/different/ output from each other which is also different from the
(older) btrfs.

First btrfs-show (git):

**
** WARNING: this program is considered deprecated
** Please consider to switch to the btrfs utility
**
Label: none  uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
    Total devices 2 FS bytes used 230.34GB
    devid    1 size 298.09GB used 298.09GB path /dev/sda
    *** Some devices missing

Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
    Total devices 4 FS bytes used 348.21GB
    devid    4 size 460.76GB used 444.23GB path /dev/sdb3
    devid    3 size 460.76GB used 444.23GB path /dev/sda3
    devid    6 size 298.09GB used 282.53GB path /dev/sdc
    devid    5 size 298.09GB used 282.53GB path /dev/sdd

Btrfs v0.20-rc1-253-g7854c8b


Now btrfs fi show (git):

failed to open /dev/sr0: No medium found
Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
    Total devices 4 FS bytes used 348.21GB
    devid    5 size 298.09GB used 282.53GB path /dev/sdd
    devid    6 size 298.09GB used 282.53GB path /dev/sdc
    devid    4 size 460.76GB used 444.23GB path /dev/sdb3
    devid    3 size 460.76GB used 444.23GB path /dev/sda3

Label: none  uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
    Total devices 2 FS bytes used 230.34GB
    devid    1 size 298.09GB used 298.09GB path /dev/sdb
    *** Some devices missing

Btrfs v0.20-rc1-253-g7854c8b


And now the (older) btrfs fi show:

failed to read /dev/sr0
Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
    Total devices 4 FS bytes used 348.21GB
    devid    5 size 298.09GB used 282.53GB path /dev/sdd
    devid    6 size 298.09GB used 282.53GB path /dev/sdc
    devid    4 size 460.76GB used 444.23GB path /dev/sdb3
    devid    3 size 460.76GB used 444.23GB path /dev/sda3

Btrfs v0.19+

which has similar output to the (older) btrfs-show.


The differences are:

1. the order of devices varies (not a big deal)
2. the current git btrfs-show and btrfs fi show both output
*different* devices for device with UUID
b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

Somewhat confusingly, /dev/disk/by-uuid/ only shows three devices:

bd40cb4e-93bb-4600-8455-ca1185aa8abe -> ../../md3
7feedf1e-9711-4900-af9c-92738ea8aace -> ../../sda3
4f0b27a5-5f5b-413c-a71b-b5a3bec5482c -> ../../md2

What's going on with the varied output from (git) btrfs-show vs.
'btrfs fi show' (vs. the older, as-shipped btrfs)?
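
The disagreement between the two tools can be pinned down mechanically by reducing each listing to a map from filesystem UUID to device paths. A rough sketch, assuming the `uuid:` and `path` keywords seen in the output above; the samples are abbreviated copies of that output and the helper names are made up:

```python
# Abbreviated copies of the two git-build listings from this message.
BTRFS_SHOW = """\
Label: none  uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
devid 1 size 298.09GB used 298.09GB path /dev/sda
Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
devid 4 size 460.76GB used 444.23GB path /dev/sdb3
devid 3 size 460.76GB used 444.23GB path /dev/sda3
"""

FI_SHOW = """\
Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
devid 4 size 460.76GB used 444.23GB path /dev/sdb3
devid 3 size 460.76GB used 444.23GB path /dev/sda3
Label: none  uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
devid 1 size 298.09GB used 298.09GB path /dev/sdb
"""


def parse_show(text):
    """Return {filesystem uuid: set of device paths} from show-style output."""
    filesystems = {}
    current = None
    for line in text.splitlines():
        words = line.split()
        if "uuid:" in words:
            current = words[words.index("uuid:") + 1]
            filesystems[current] = set()
        elif current and words and words[0].startswith("devid") and "path" in words:
            filesystems[current].add(words[words.index("path") + 1])
    return filesystems


def differing(a, b):
    """Return {uuid: (paths in a, paths in b)} for filesystems that disagree."""
    return {
        uuid: (a.get(uuid, set()), b.get(uuid, set()))
        for uuid in sorted(set(a) | set(b))
        if a.get(uuid, set()) != b.get(uuid, set())
    }


diff = differing(parse_show(BTRFS_SHOW), parse_show(FI_SHOW))
for uuid, (left, right) in diff.items():
    print(uuid, sorted(left), sorted(right))
```

Device order is ignored because sets are compared, so only the b5dc52bd filesystem is flagged: one tool attributes it to /dev/sda and the other to /dev/sdb.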


-- 
Jon
Software Blacksmith


Re: btrfs-show vs. btrfs different output

2013-03-21 Thread Jon Nelson
On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen sand...@redhat.com wrote:
 On 3/21/13 10:04 AM, Jon Nelson wrote:
...
 2. the current git btrfs-show and btrfs fi show both output
 *different* devices for device with UUID
 b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

 does blkid output find that uuid anywhere?

 Since you're working in git, can you maybe do a little bisecting
 to find out when it changed?  Should be a fairly quick test?

blkid does /not/ report that uuid anywhere.

git bisect, if I did it correctly, says:


6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
commit 6eba9002956ac40db87d42fb653a0524dc568810
Author: Goffredo Baroncelli kreij...@inwind.it
Date:   Tue Sep 4 19:59:26 2012 +0200

Correct un-initialized fsid variable

:100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c





-- 
Jon
Software Blacksmith


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-13 Thread Jon Nelson
On Sun, Dec 12, 2010 at 8:06 PM, Ted Ts'o ty...@mit.edu wrote:
 On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote:
 I'm glad you've been able to reproduce the problem! If you should need
 any further assistance, please do not hesitate to ask.

 This patch seems to fix the problem for me.  (Unless the partition is
 mounted with mblk_io_submit.)

 Could you confirm that it fixes it for you as well?

I believe I have applied the (relevant) inode.c changes to
bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc, rebuilt and begun testing.
Now at 28 passes without error, I think I can say that the patch
appears to resolve the issue.
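
As a rough sanity check on that conclusion, here is how unlikely 28 clean passes would be without a fix. This is only a back-of-the-envelope sketch: it assumes the runs are independent and that the unpatched failure rate for this test is close to the 71-in-173 figure reported earlier in this thread, which was measured with different mount options.

```python
failures, runs = 71, 173   # reported for 2.6.37-rc5 with data=writeback and LUKS
clean_passes = 28          # consecutive passes observed with the patch applied

p_fail = failures / runs                       # estimated per-run failure probability
p_all_clean = (1 - p_fail) ** clean_passes     # chance of that many clean runs by luck

print(f"per-run failure rate: {p_fail:.1%}")
print(f"chance of {clean_passes} clean passes without a fix: {p_all_clean:.2e}")
```

Under those assumptions the chance of seeing 28 clean passes by luck is well below one in a million.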

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Jon Nelson
On Sat, Dec 11, 2010 at 9:16 PM, Jon Nelson jnel...@jamponi.net wrote:
 On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o ty...@mit.edu wrote:
 Yes, indeed.  Is this in the virtualized environment or on real
 hardware at this point?  And how many CPU's do you have configured in
 your virtualized environment, and how much memory?  Is having a
 certain number of CPU's critical for reproducing the problem?  Is
 constricting the amount of memory important?

 Originally, I observed the behavior on really real hardware.

 Since then, I have been able to reproduce it in VirtualBox and
 qemu-kvm, with openSUSE 11.3 and KUbuntu. All of the more recent tests
 have been with qemu-kvm.

 I have one CPU configured in the environment, 512MB of memory.
 I have not done any memory-constriction tests whatsoever.

 It'll be a lot easier if I can reproduce it locally, which is why I'm
 asking all of these questions.

 On Sat, Dec 11, 2010 at 8:34 PM, Ted Ts'o ty...@mit.edu wrote:
 One experiment --- can you try this with the file system mounted with
 data=writeback, and see if the problem reproduces in that journalling
 mode?

 That test is running now, first with encryption. I will report if it
 shows problems. If it does, I will wait until I have been able to see
 that a few times, and move to a no-encryption test. Typically, I have
 to run quite a few more iterations of that test before problems show
 up (if they will at all).

 I want to rule out (if possible) journal_submit_inode_data_buffers()
 racing with mpage_da_submit_io().  I don't think that's the issue, but
 I'd prefer to do the experiment to make sure.  So if you can use a
 kernel and system configuration which triggers the problem, and then
 try changing the mount options to include data=writeback, and then
 rerun the test, and let me know if the problem still reproduces, I'd
 be really grateful.

Using 2.6.37-rc5 and data=writeback,noatime and LUKS encryption I hit
the problem 71 times out of 173.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Jon Nelson
On Sun, Dec 12, 2010 at 6:43 AM, Ted Ts'o ty...@mit.edu wrote:
 On Sun, Dec 12, 2010 at 04:18:29AM -0600, Jon Nelson wrote:
  I have one CPU configured in the environment, 512MB of memory.
  I have not done any memory-constriction tests whatsoever.

 I've finally been able to reproduce it myself, on real hardware.  SMP
 is not necessary to reproduce it, although having more than one CPU
 doesn't hurt.  What I did need to do (on real hardware with 4 gigs of
 memory) was to turn off swap and pin enough memory so that free memory
 was around 200megs or so before the start of the test.  (This is the
 natural amount of free memory that the system would try to reach,
 since 200 megs is about 5% of 4 gigs.)

 Then, during the test, free memory would drop to 50-70 megabytes,
 forcing writeback to run, and then I could trigger it about 1-2 times
 out of three.

 I'm guessing that when you used 512mb of memory, that was in effect a
 memory-constriction test, and if you were to push the memory down a
 little further, it might reproduce even more quickly.  My next step is
 to try to reproduce this in a VM, and then I can start probing to see
 what might be going on.

I'm glad you've been able to reproduce the problem! If you should need
any further assistance, please do not hesitate to ask.
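
For anyone repeating the memory-constrained setup described in the quoted text, free memory can be read from /proc/meminfo before each run. A minimal sketch: the sample text and its numbers are made up for illustration, and only the "MemFree: N kB" line format is taken from what the kernel prints.

```python
def meminfo_kb(text, field="MemFree"):
    """Return the value in kB of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


# Made-up sample; on a live system pass open("/proc/meminfo").read() instead.
SAMPLE = "MemTotal:        4046012 kB\nMemFree:          204800 kB\n"
free_mb = meminfo_kb(SAMPLE) / 1024
print(f"free memory: {free_mb:.0f} MB")
```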


-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-11 Thread Jon Nelson
On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o ty...@mit.edu wrote:
 Yes, indeed.  Is this in the virtualized environment or on real
 hardware at this point?  And how many CPU's do you have configured in
 your virtualized environment, and how much memory?  Is having a
 certain number of CPU's critical for reproducing the problem?  Is
 constricting the amount of memory important?

Originally, I observed the behavior on really real hardware.

Since then, I have been able to reproduce it in VirtualBox and
qemu-kvm, with openSUSE 11.3 and KUbuntu. All of the more recent tests
have been with qemu-kvm.

I have one CPU configured in the environment, 512MB of memory.
I have not done any memory-constriction tests whatsoever.

 It'll be a lot easier if I can reproduce it locally, which is why I'm
 asking all of these questions.

On Sat, Dec 11, 2010 at 8:34 PM, Ted Ts'o ty...@mit.edu wrote:
 One experiment --- can you try this with the file system mounted with
 data=writeback, and see if the problem reproduces in that journalling
 mode?

That test is running now, first with encryption. I will report if it
shows problems. If it does, I will wait until I have been able to see
that a few times, and move to a no-encryption test. Typically, I have
to run quite a few more iterations of that test before problems show
up (if they will at all).

 I want to rule out (if possible) journal_submit_inode_data_buffers()
 racing with mpage_da_submit_io().  I don't think that's the issue, but
 I'd prefer to do the experiment to make sure.  So if you can use a
 kernel and system configuration which triggers the problem, and then
 try changing the mount options to include data=writeback, and then
 rerun the test, and let me know if the problem still reproduces, I'd
 be really grateful.


-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-10 Thread Jon Nelson
On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson jnel...@jamponi.net wrote:
 On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote:
 On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote:

 Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179

 from the tests I've done that one showed the least or no corruption if
 you count the empty /etc/env.d/03opengl as an artefact

 Yes, that's a good test.  Also try commit bd2d0210cf.  The patch
 series that is most likely to be at fault, if there is a regression,
 is between 5a87b7a5d and bd2d0210cf inclusive.

 I did a lot of testing before submitting it, but that was a tricky
 rewrite.  If you can reproduce the problem reliably, it might be good
 to try commit 16828088f9 (the commit before 5a87b7a5d) and commit
 bd2d0210cf.  If it reliably reproduces on bd2d0210cf, but is clean on
 16828088f9, then it's my ext4 block i/o submission patches, and we'll
 need to either figure out what's going on or back out that set of
 changes.

 If that's the case, a bisect of those changes (it's only 6 commits, so
 it shouldn't take long) would be most appreciated.

 I observed the behavior on bd2d0210cf in a qemu-kvm install of
 openSUSE 11.3 (x86_64) on *totally* different host - an AMD quad-core.

 I did /not/ observe the behavior on 16828088f9 (yet). I'll run the
 test a few more times on 1682..

 Additionally, I am building a bisected kernel now (
 cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get
 back at it for a while.

cb20d5188366f04d96d2e07b1240cc92170ade40 seems OK so far. I'm going to
try 1de3e3df917459422cb2aecac440febc8879d410 soon.
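
The narrowing that git bisect performs over a short range can be sketched as a binary search. The commit names below are placeholders, not the real hashes or their order, and the sketch assumes every commit after the first bad one is also bad.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered commit list for the first commit where is_bad is true.

    Assumes the last commit is known to be bad and that badness, once
    introduced, persists in every later commit.
    Returns the culprit and the list of commits that had to be tested.
    """
    lo, hi = 0, len(commits) - 1
    tested = []
    while lo < hi:
        mid = (lo + hi) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo], tested


# Hypothetical six-commit range in which "c5" introduces the problem.
commits = ["c1", "c2", "c3", "c4", "c5", "c6"]
culprit, tested = first_bad(commits, lambda c: c >= "c5")
print(culprit, tested)
```

With an intermittent failure, is_bad has to run many iterations before it can answer "good" with any confidence, which is where most of the time goes.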


-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-10 Thread Jon Nelson
On Fri, Dec 10, 2010 at 8:58 AM, Jon Nelson jnel...@jamponi.net wrote:
 On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson jnel...@jamponi.net wrote:
 On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote:
 On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote:

 Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179

 from the tests I've done that one showed the least or no corruption if
 you count the empty /etc/env.d/03opengl as an artefact

 Yes, that's a good test.  Also try commit bd2d0210cf.  The patch
 series that is most likely to be at fault, if there is a regression,
 is between 5a87b7a5d and bd2d0210cf inclusive.

 I did a lot of testing before submitting it, but that was a tricky
 rewrite.  If you can reproduce the problem reliably, it might be good
 to try commit 16828088f9 (the commit before 5a87b7a5d) and commit
 bd2d0210cf.  If it reliably reproduces on bd2d0210cf, but is clean on
 16828088f9, then it's my ext4 block i/o submission patches, and we'll
 need to either figure out what's going on or back out that set of
 changes.

 If that's the case, a bisect of those changes (it's only 6 commits, so
 it shouldn't take long) would be most appreciated.

 I observed the behavior on bd2d0210cf in a qemu-kvm install of
 openSUSE 11.3 (x86_64) on *totally* different host - an AMD quad-core.

 I did /not/ observe the behavior on 16828088f9 (yet). I'll run the
 test a few more times on 1682..

 Additionally, I am building a bisected kernel now (
 cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get
 back at it for a while.

 cb20d5188366f04d96d2e07b1240cc92170ade40 seems OK so far. I'm going to
 try 1de3e3df917459422cb2aecac440febc8879d410 soon.

Barring false negatives, bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
appears to be the culprit (according to git bisect).
I will test bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc again, confirm
the behavior, and work backwards to try to reduce the possibility of
false negatives.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-10 Thread Jon Nelson
On Fri, Dec 10, 2010 at 10:54 AM, Jon Nelson jnel...@jamponi.net wrote:
 On Fri, Dec 10, 2010 at 8:58 AM, Jon Nelson jnel...@jamponi.net wrote:
 On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson jnel...@jamponi.net wrote:
 On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote:
 On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote:

 Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179

 from the tests I've done that one showed the least or no corruption if
 you count the empty /etc/env.d/03opengl as an artefact

 Yes, that's a good test.  Also try commit bd2d0210cf.  The patch
 series that is most likely to be at fault, if there is a regression,
 is between 5a87b7a5d and bd2d0210cf inclusive.

 I did a lot of testing before submitting it, but that was a tricky
 rewrite.  If you can reproduce the problem reliably, it might be good
 to try commit 16828088f9 (the commit before 5a87b7a5d) and commit
 bd2d0210cf.  If it reliably reproduces on bd2d0210cf, but is clean on
 16828088f9, then it's my ext4 block i/o submission patches, and we'll
 need to either figure out what's going on or back out that set of
 changes.

 If that's the case, a bisect of those changes (it's only 6 commits, so
 it shouldn't take long) would be most appreciated.

 I observed the behavior on bd2d0210cf in a qemu-kvm install of
 openSUSE 11.3 (x86_64) on *totally* different host - an AMD quad-core.

 I did /not/ observe the behavior on 16828088f9 (yet). I'll run the
 test a few more times on 1682..

 Additionally, I am building a bisected kernel now (
 cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get
 back at it for a while.

 cb20d5188366f04d96d2e07b1240cc92170ade40 seems OK so far. I'm going to
 try 1de3e3df917459422cb2aecac440febc8879d410 soon.

 Barring false negatives, bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
 appears to be the culprit (according to git bisect).
 I will test bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc again, confirm
 the behavior, and work backwards to try to reduce the possibility of
 false negatives.

A few additional notes, in no particular order:

- For me, triggering the problem is fairly easy when encryption is involved.
- I'm now 81 iterations into testing
  bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc *without* encryption.  Out of
  81 iterations, I have 4 failures: #16, #40, #62, and #64.

I will now try 1de3e3df917459422cb2aecac440febc8879d410 much more extensively.
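
On the false-negative worry: if a bad kernel fails a single run with probability p, it passes n runs in a row with probability (1 - p)^n. A minimal sketch of how many clean runs are needed before calling a commit good, assuming independent runs; the 95% threshold is my own choice, and the two rates come from the counts reported in this thread (4 in 81 without encryption, roughly 1 in 3 with it).

```python
import math


def clean_runs_needed(p_fail, confidence=0.95):
    """Smallest n for which a bad kernel survives n clean runs with
    probability at most 1 - confidence."""
    if not 0 < p_fail < 1:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))


for label, p in (("without encryption", 4 / 81), ("with encryption", 1 / 3)):
    print(f"{label}: {clean_runs_needed(p)} clean runs")
```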

Is this useful information?

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson
On Thu, Dec 9, 2010 at 2:13 PM, Ted Ts'o ty...@mit.edu wrote:
 On Thu, Dec 09, 2010 at 12:10:58PM -0600, Jon Nelson wrote:

 You should be OK, there. Are you using encryption or no?
 I had difficulty replicating the issue without encryption.

 Yes, I'm using encryption.  LUKS with aes-xts-plain-sha256, and then
 LVM on top of LUKS.

Hmm.
The cipher is listed as:
aes-cbc-essiv:sha256

  If you can point out how to query pgsql_tmp (I'm using a completely
  default postgres install), that would be helpful, but I don't think it
  would be going anywhere else.

 Normally it's /var/lib/pgsql/data/pgsql_tmp (or
 /var/lib/postgres/data/pgsql_tmp in your case). By placing
 /var/lib/{postgresql,pgsql}/data on the LUKS + ext4 volume, on both
 openSUSE 11.3 and Kubuntu, I was able to replicate the problem easily,
 in VirtualBox. I can give qemu a try. In both cases I was using a
 2.6.37x kernel.

 Ah, I'm not using virtualization.  I'm running on a X410 laptop, on
 raw hardware.  Perhaps virtualization slows things down enough that it
 triggers?  Or maybe you're running with a more constrained memory than
 I?  How much memory do you have configured in your VM?

512MB.

'free' reports 75MB, 419MB free.

I originally noticed the problem on really real hardware (thinkpad
T61p), however.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson
On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o ty...@mit.edu wrote:
 On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote:

 Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179

 from the tests I've done that one showed the least or no corruption if
 you count the empty /etc/env.d/03opengl as an artefact

 Yes, that's a good test.  Also try commit bd2d0210cf.  The patch
 series that is most likely to be at fault, if there is a regression,
 is between 5a87b7a5d and bd2d0210cf inclusive.

 I did a lot of testing before submitting it, but that was a tricky
 rewrite.  If you can reproduce the problem reliably, it might be good
 to try commit 16828088f9 (the commit before 5a87b7a5d) and commit
 bd2d0210cf.  If it reliably reproduces on bd2d0210cf, but is clean on
 16828088f9, then it's my ext4 block i/o submission patches, and we'll
 need to either figure out what's going on or back out that set of
 changes.

 If that's the case, a bisect of those changes (it's only 6 commits, so
 it shouldn't take long) would be most appreciated.

I observed the behavior on bd2d0210cf in a qemu-kvm install of
openSUSE 11.3 (x86_64) on *totally* different host - an AMD quad-core.

I did /not/ observe the behavior on 16828088f9 (yet). I'll run the
test a few more times on 1682..

Additionally, I am building a bisected kernel now (
cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get
back at it for a while.

I hope this helps.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-08 Thread Jon Nelson
On Tue, Dec 7, 2010 at 9:37 PM, Jon Nelson jnel...@jamponi.net wrote:
 On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o ty...@mit.edu wrote:
 On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote:
  1. create a database (from bash):
 
  createdb test
 
  2. place the following contents in a file (I used 't.sql'):
 
  begin;
  create temporary table foo as select x as a, ARRAY[x] as b FROM
  generate_series(1, 1000 ) AS x;
  create index foo_a_idx on foo (a);
  create index foo_b_idx on foo USING GIN (b);
  rollback;
 
  3. execute that sql:
 
  psql -f t.sql --echo-all test
 
  With 2.6.34.7 I can re-run [3] all day long, as many times as I want,
  without issue.
 
  With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails
  pretty frequently.

 So I just tried to reproduce this on an Ubuntu 10.04 system running
 2.6.37-rc5 (completely stock except for a few apparmor patches that I
 needed to keep the apparmor userspace from complaining).  I'm using
 Postgres 8.4.5-0ubuntu10.04.

 Using the above procedure, I wasn't able to reproduce.  Then I
 realized this might have been because I was using an SSD root file
 system (which is secured using LUKS/dm-crypt, with LVM on top of
 dm-crypt).  So I mounted a file system on a 5400 rpm laptop disk, which
 is also protected using LUKS/dm-crypt with LVM on top.  I then
 executed the PostgreSQL commands:

 CREATE TABLESPACE test LOCATION '/kbuild/postgres';
 SET default_tablespace = test;
 COMMIT
 \quit

 I then re-ran the above procedure, and verified that all of the I/O
 was going to the 5400rpm laptop disk.

 I then ran the above procedure a half-dozen times, and I still haven't
 been able to reproduce any Postgresql errors or kernel errors.

 Jon, can you help me identify what might be different with your run
 and mine?  What version of Postgres are you using?

 One difference is the location of the transaction logs (pg_xlog). In
 my case, /var/lib/pgsql/data *is* mountpoint for the test volume
 (actually, it's a symlink to the mount point). In your case, that is
 not so. Perhaps that makes a difference?  pgsql_tmp might also be on
 two different volumes in your case (I can't be sure).


I grabbed a Kubuntu iso and installed Kubuntu 10.10, and then upgraded
to 'natty', and eventually to 2.6.37-8-generic.

With that install, and postgresql's data (/var/lib/postgresql/data)
being located on a LUKS+ext4 volume, I easily observe the behavior.

Does this help?

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
I finally found some time to test this out. With 2.6.37-rc4 (openSUSE
KOTD kernel) I easily encounter the issue.

Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64
install, installed all updates, installed postgresql and the 'KOTD'
(Kernel of the Day) kernel, and ran the following tests (as the postgres
user because I'm lazy).

1. create a database (from bash):

createdb test

2. place the following contents in a file (I used 't.sql'):

begin;
create temporary table foo as select x as a, ARRAY[x] as b FROM
generate_series(1, 1000 ) AS x;
create index foo_a_idx on foo (a);
create index foo_b_idx on foo USING GIN (b);
rollback;

3. execute that sql:

psql -f t.sql --echo-all test


With 2.6.34.7 I can re-run [3] all day long, as many times as I want,
without issue.

With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails
pretty frequently.
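
Re-running step 3 by hand gets tedious, so here is a minimal sketch of a driver that repeats a command and records which iterations fail. It assumes the command exits non-zero on failure; for psql that means passing -v ON_ERROR_STOP=1, which was not part of the command line above. The demo at the bottom uses a stand-in command so the sketch runs without a database.

```python
import subprocess
import sys


def run_iterations(cmd, iterations):
    """Run cmd repeatedly; return the 1-based iteration numbers that exited non-zero."""
    failed = []
    for i in range(1, iterations + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failed.append(i)
    return failed


# Command for the real test; ON_ERROR_STOP is an addition to the step 3 command.
PSQL_CMD = ["psql", "-v", "ON_ERROR_STOP=1", "-f", "t.sql", "--echo-all", "test"]

# Stand-in command that always succeeds, so the sketch runs anywhere.
DEMO_CMD = [sys.executable, "-c", "raise SystemExit(0)"]
print(run_iterations(DEMO_CMD, 3))
```

For the actual test, call run_iterations(PSQL_CMD, n) as the postgres user and keep the returned list of failing iterations.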

Then I tested with the 'vanilla' kernel available here:

http://download.opensuse.org/repositories/Kernel:/vanilla/standard/

The 'vanilla' kernel exhibited the same problems.
The version I tested:  2.6.37-rc4-219-g771f8bc-vanilla.

Incidentally, quick tests of jfs, xfs, and ext3 do _not_ show the same
problems. I will note that with ext4 I usually saw a failure at least
1 time in 3, but sometimes had to re-run the sql test 4 or 5 times
before I saw a failure.

I will continue to do some testing, but I will hold off on testing the
commits above until I receive further testing suggestions.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 12:22 PM, Mike Snitzer snit...@redhat.com wrote:
 On Tue, Dec 07 2010 at  1:10pm -0500,
 Jon Nelson jnel...@jamponi.net wrote:

 I finally found some time to test this out. With 2.6.37-rc4 (openSUSE
 KOTD kernel) I easily encounter the issue.

 Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64
 install, installed all updates, installed postgresql and the 'KOTD'
 (Kernel of the Day)
 kernel, and ran the following tests (as postgres user because I'm
 lazy).

 1. create a database (from bash):

 createdb test

 2. place the following contents in a file (I used 't.sql'):

 begin;
 create temporary table foo as select x as a, ARRAY[x] as b FROM
 generate_series(1, 1000 ) AS x;
 create index foo_a_idx on foo (a);
 create index foo_b_idx on foo USING GIN (b);
 rollback;

 3. execute that sql:

 psql -f t.sql --echo-all test


 With 2.6.34.7 I can re-run [3] all day long, as many times as I want,
 without issue.

 With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails
 pretty frequently.

 How does it fail?  postgres errors?  kernel errors?

postgresql errors. Typically, header corruption but from the limited
visibility I've had into this via strace, what I see is zeroed pages
where there shouldn't be.

I just ran a test and got:

ERROR:  invalid page header in block 37483 of relation base/16384/16417

but that is not the only error one might get.

 Then I tested with the 'vanilla' kernel available here:

 http://download.opensuse.org/repositories/Kernel:/vanilla/standard/

 The 'vanilla' kernel exhibited the same problems.
 The version I tested:  2.6.37-rc4-219-g771f8bc-vanilla.

 Incidentally, quick tests of jfs, xfs, and ext3 do _not_ show the same
 problems, although I will note that I usually saw failure at least 1
 in 3, but sometimes had to re-run the sql test 4 or 5 times before I
 saw failure.

 I will continue to do some testing, but I will hold off on testing the
 commits above until I receive further testing suggestions.

 OK, so to be clear: your testing is on dm-crypt + ext4?

Yes. I took a virtual hard disk which shows up as /dev/sdb, used
cryptsetup to format it as a LUKS volume, mounted the LUKS volume,
formatted as ext4 (or whatever), mounted that, rsync'd over a blank
postgresql 'data' directory, started postgresql, became the postgres
user, and proceeded to create the db and run the sql.
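
The volume setup described above can be written down as a command plan. This is a dry-run sketch only: it prints the commands and does not execute them, since luksFormat destroys whatever is on the device. The mapping name "pgtest" and the mount point "/mnt/pgtest" are placeholders of mine; only /dev/sdb comes from the description.

```python
import shlex


def luks_ext4_plan(device, mapping, mountpoint):
    """Return the commands (as argument lists) to build a LUKS + ext4 test volume."""
    mapped = f"/dev/mapper/{mapping}"
    return [
        ["cryptsetup", "luksFormat", device],         # destroys existing data on device
        ["cryptsetup", "luksOpen", device, mapping],  # creates /dev/mapper/<mapping>
        ["mkfs.ext4", mapped],
        ["mount", mapped, mountpoint],
    ]


# Dry run: print the commands instead of executing them.
plan = luks_ext4_plan("/dev/sdb", "pgtest", "/mnt/pgtest")
for argv in plan:
    print(shlex.join(argv))
```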

 And you're testing upstream based kernels (meaning the dm-crypt
 scalability patch that has been in question is _not_ in the mix)?

I am testing both the KOTD kernels and the vanilla kernels - neither
of which has the dm-crypt patches (as far as I know).

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Jon Nelson's message of 2010-12-07 13:45:14 -0500:
 On Tue, Dec 7, 2010 at 12:22 PM, Mike Snitzer snit...@redhat.com wrote:
  On Tue, Dec 07 2010 at  1:10pm -0500,
  Jon Nelson jnel...@jamponi.net wrote:
 
  I finally found some time to test this out. With 2.6.37-rc4 (openSUSE
  KOTD kernel) I easily encounter the issue.
 
  Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64
  install, installed all updates, installed postgresql and the 'KOTD'
  (Kernel of the Day)
  kernel, and ran the following tests (as postgres user because I'm
  lazy).
 
  1. create a database (from bash):
 
  createdb test
 
  2. place the following contents in a file (I used 't.sql'):
 
  begin;
  create temporary table foo as select x as a, ARRAY[x] as b FROM
  generate_series(1, 1000 ) AS x;
  create index foo_a_idx on foo (a);
  create index foo_b_idx on foo USING GIN (b);
  rollback;
 
  3. execute that sql:
 
  psql -f t.sql --echo-all test
 
 
  With 2.6.34.7 I can re-run [3] all day long, as many times as I want,
  without issue.
 
  With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails
  pretty frequently.
 
  How does it fail?  postgres errors?  kernel errors?

 postgresql errors. Typically, header corruption but from the limited
 visibility I've had into this via strace, what I see is zeroed pages
 where there shouldn't be.

 This sounds a lot like a bug higher up than dm-crypt.  Zeros tend to
 come from some piece of code explicitly filling a page with zeros, and
 that often happens in the corner cases for O_DIRECT and a few other
 places in the filesystem.

 Have you tried triggering this with a regular block device?

I just tried the whole set of tests, but with /dev/sdb directly (as
ext4) without any crypt-y bits.
It takes more iterations but out of 6 tests I had one failure: same
type of thing, 'invalid page header in block '.

I can't guarantee that it is a full-page of zeroes, just what I saw
from the (limited) stracing I did.
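
One way to check for whole zeroed pages without strace is to scan a relation file block by block. A minimal sketch, assuming the file still exists on disk and that the build uses PostgreSQL's default 8192-byte block size; the demo file is synthetic. PostgreSQL can legitimately hold all-zero pages, so a hit is a lead, not proof of corruption.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default block size; an assumption about this build


def zeroed_blocks(path, block_size=BLOCK_SIZE):
    """Return the indexes of blocks in path that consist entirely of zero bytes."""
    zero = bytes(block_size)
    found = []
    with open(path, "rb") as f:
        index = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            # A short trailing block is compared against zeros of the same length.
            if block == zero[:len(block)]:
                found.append(index)
            index += 1
    return found


# Synthetic three-block file whose middle block is all zeros.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "relation")
    with open(path, "wb") as f:
        f.write(b"\x01" * BLOCK_SIZE + bytes(BLOCK_SIZE) + b"\x02" * BLOCK_SIZE)
    demo_hits = zeroed_blocks(path)
print(demo_hits)
```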



-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500:
 On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote:
  postgresql errors. Typically, header corruption but from the limited
  visibility I've had into this via strace, what I see is zeroed pages
  where there shouldn't be.
 
  This sounds a lot like a bug higher up than dm-crypt.  Zeros tend to
  come from some piece of code explicitly filling a page with zeros, and
  that often happens in the corner cases for O_DIRECT and a few other
  places in the filesystem.
 
  Have you tried triggering this with a regular block device?

 I just tried the whole set of tests, but with /dev/sdb directly (as
 ext4) without any crypt-y bits.
 It takes more iterations but out of 6 tests I had one failure: same
 type of thing, 'invalid page header in block '.

 I can't guarantee that it is a full-page of zeroes, just what I saw
 from the (limited) stracing I did.

 Fantastic. Now for our usual suspects:

 1) Is postgres using O_DIRECT?  If yes, please turn it off

According to strace, O_DIRECT didn't show up once during the test.
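
The check amounts to searching the strace output for open calls that carry the flag. A minimal sketch of that search: the sample lines are made up for illustration (they are not from my trace), and the pattern only handles the plain open()/openat() line format.

```python
import re

# Matches lines such as: open("path", O_RDWR|O_CREAT, 0600) = 7
OPEN_CALL = re.compile(r"^(?:\d+\s+)?(open|openat)\((.*)\)\s+=\s+(-?\d+)")


def opens_with_flag(lines, flag="O_DIRECT"):
    """Return the open()/openat() lines whose argument list contains flag."""
    hits = []
    for line in lines:
        line = line.strip()
        match = OPEN_CALL.match(line)
        if match and flag in re.split(r"[|,\s]+", match.group(2)):
            hits.append(line)
    return hits


# Made-up sample lines in strace's usual format.
SAMPLE = [
    'open("base/16384/16417", O_RDWR) = 7',
    'open("/tmp/scratch", O_WRONLY|O_CREAT|O_DIRECT, 0600) = 8',
    'write(7, "\\0\\0\\0\\0"..., 8192) = 8192',
]
hits = opens_with_flag(SAMPLE)
print(len(hits), "open call(s) with O_DIRECT")
```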

 2) Is postgres allocating sparse files?  If yes, please have it fully
 allocate the file instead.

That's a tough one. I don't think postgresql does that, but I'm not an
expert here.

 3) Is postgres using preallocation (fallocate)?  If yes, please have it
 fully allocate the file instead

As far as strace is concerned, postgresql is not using fallocate in
this version.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:33 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500:
 On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500:
  On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com 
  wrote:
   postgresql errors. Typically, header corruption but from the limited
   visibility I've had into this via strace, what I see is zeroed pages
   where there shouldn't be.
  
   This sounds a lot like a bug higher up than dm-crypt.  Zeros tend to
   come from some piece of code explicitly filling a page with zeros, and
   that often happens in the corner cases for O_DIRECT and a few other
   places in the filesystem.
  
   Have you tried triggering this with a regular block device?
 
  I just tried the whole set of tests, but with /dev/sdb directly (as
  ext4) without any crypt-y bits.
  It takes more iterations but out of 6 tests I had one failure: same
  type of thing, 'invalid page header in block '.
 
  I can't guarantee that it is a full-page of zeroes, just what I saw
  from the (limited) stracing I did.
 
  Fantastic. Now for our usual suspects:
 
  1) Is postgres using O_DIRECT?  If yes, please turn it off

 According to strace, O_DIRECT didn't show up once during the test.

  2) Is postgres allocating sparse files?  If yes, please have it fully
  allocate the file instead.

 That's a tough one. I don't think postgresql does that, but I'm not an
 expert here.

  3) Is postgres using preallocation (fallocate)?  If yes, please have it
  fully allocate the file instead

 As far as strace is concerned, postgreSQL is not using fallocate in
 this version.

 Well, the only other usual suspect would be mmap.  Does the strace show
 that you're using read/write for file IO or is it doing a lot of mmaps
 on the files?

I'm pretty sure postgresql uses regular file I/O and not mmap.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500:
 On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500:
  On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com 
  wrote:
   postgresql errors. Typically, header corruption but from the limited
   visibility I've had into this via strace, what I see is zeroed pages
   where there shouldn't be.
  
   This sounds a lot like a bug higher up than dm-crypt.  Zeros tend to
   come from some piece of code explicitly filling a page with zeros, and
   that often happens in the corner cases for O_DIRECT and a few other
   places in the filesystem.
  
   Have you tried triggering this with a regular block device?
 
  I just tried the whole set of tests, but with /dev/sdb directly (as
  ext4) without any crypt-y bits.
  It takes more iterations but out of 6 tests I had one failure: same
  type of thing, 'invalid page header in block '.
 
  I can't guarantee that it is a full-page of zeroes, just what I saw
  from the (limited) stracing I did.
 
  Fantastic. Now for our usual suspects:
 
  1) Is postgres using O_DIRECT?  If yes, please turn it off

 According to strace, O_DIRECT didn't show up once during the test.

  2) Is postgres allocating sparse files?  If yes, please have it fully
  allocate the file instead.

 That's a tough one. I don't think postgresql does that, but I'm not an
 expert here.

 Ok, please compare du -k and du -k --apparent-size for each of the
 files involved in the postgres run.

Because this is all done in a transaction (which fails), and because
the table is a TEMPORARY table, there *are* no files once the
transaction fails because postgresql unlinks them.

I can modify the test to use real tables and do things outside of a
transaction, however.
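
Once the files survive the run, the du -k versus du -k --apparent-size comparison can also be done from a script. A minimal sketch, assuming Linux, where st_blocks is counted in 512-byte units. Very small files may be stored inline and report zero blocks, so the result is only meaningful for files of at least a few blocks.

```python
import os


def looks_sparse(apparent_size, allocated_bytes):
    """True when fewer bytes are allocated on disk than the file's length."""
    return allocated_bytes < apparent_size


def size_report(path):
    """Return (apparent size, allocated bytes, sparse?) for one file."""
    st = os.stat(path)
    allocated = st.st_blocks * 512  # 512-byte units on Linux
    return st.st_size, allocated, looks_sparse(st.st_size, allocated)


# A 1 GiB file with a single 4 KiB block allocated would count as sparse.
print(looks_sparse(1 << 30, 4096))
```

Calling size_report() on each file under the relation's directory gives the same comparison as the two du invocations.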

I was using fdatasync[1] and now I'm using sync. I'm on 9 iterations
without a failure (on ext4 - no crypt). Theoretically, these settings
only make a difference in the event of a crash. However, could they
make a difference in terms of the paths taken in the kernel?


[1] for wal_sync_method
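
On the question of kernel paths: fsync() and fdatasync() can be exercised directly, outside of postgresql. A minimal sketch, assuming Linux (os.fdatasync is not available on every platform, so it falls back to os.fsync); the file names and payload are made up. fdatasync() may skip metadata that is not needed to read the data back, so the two calls can take different routes through the filesystem. Whether that matters for this bug is not something the sketch can show.

```python
import os
import tempfile


def write_and_flush(path, payload, method="fdatasync"):
    """Write payload to path, flush it with the chosen call, return bytes written."""
    flush = {
        "fsync": os.fsync,
        "fdatasync": getattr(os, "fdatasync", os.fsync),  # fallback if unavailable
    }[method]
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, payload)
        flush(fd)
    finally:
        os.close(fd)
    return written


# Write one 8 KiB block with each method into a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    results = {
        method: write_and_flush(os.path.join(tmp, method), b"x" * 8192, method)
        for method in ("fsync", "fdatasync")
    }
print(results)
```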

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o ty...@mit.edu wrote:
 On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote:
  1. create a database (from bash):
 
  createdb test
 
  2. place the following contents in a file (I used 't.sql'):
 
  begin;
  create temporary table foo as select x as a, ARRAY[x] as b FROM
  generate_series(1, 1000 ) AS x;
  create index foo_a_idx on foo (a);
  create index foo_b_idx on foo USING GIN (b);
  rollback;
 
  3. execute that sql:
 
  psql -f t.sql --echo-all test
 
  With 2.6.34.7 I can re-run [3] all day long, as many times as I want,
  without issue.
 
  With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails
  pretty frequently.

 So I just tried to reproduce this on an Ubuntu 10.04 system running
 2.6.37-rc5 (completely stock except for a few apparmor patches that I
 needed to keep the apparmor userspace from complaining).  I'm using
 Postgres 8.4.5-0ubuntu10.04.

 Using the above procedure, I wasn't able to reproduce.  Then I
 realized this might have been because I was using an SSD root file
 system (which is secured using LUKS/dm-crypt, with LVM on top of
 dm-crypt).  So I mounted a file system on a 5400 rpm laptop disk, which
 is also protected using LUKS/dm-crypt with LVM on top.  I then
 executed the PostgreSQL commands:

 CREATE TABLESPACE test LOCATION '/kbuild/postgres';
 SET default_tablespace = test;
 COMMIT
 \quit

 I then re-ran the above procedure, and verified that all of the I/O
 was going to the 5400rpm laptop disk.

 I then ran the above procedure a half-dozen times, and I still haven't
 been able to reproduce any Postgresql errors or kernel errors.

 Jon, can you help me identify what might be different with your run
 and mine?  What version of Postgres are you using?

I am using postgres 8.4.5 on openSUSE 11.3 x86_64.
The problems were observed on both real hardware (thinkpad T61p) and
in virtualbox, where all current testing is taking place. The current
kernel is a vanilla (unpatched) kernel. I *did* set wal_sync_method
to fdatasync, however, if that is relevant. Otherwise, the pg config
is stock. With no crypt involved, I did have to iterate the tests to
observe the issue - a half-dozen times or more were necessary.
Typically, when crypt was involved, the issue would manifest much more
rapidly.
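
Since the failure only shows up after several iterations, a small loop
can save some typing. This is only a sketch: run_until_failure is a
hypothetical helper name, not something from this thread, and it simply
re-runs whatever command it is given until that command exits non-zero.

```sh
#!/bin/sh
# Hypothetical helper (not part of the thread): run a command up to MAX
# times and stop at the first non-zero exit status.
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Run the command exactly as given; its output is left visible.
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}
```

For the test case above this could be invoked as
"run_until_failure 40 psql -v ON_ERROR_STOP=1 -f t.sql --echo-all test".
ON_ERROR_STOP matters here because psql otherwise exits with status 0
even when a statement in the script fails.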

-- 
Jon
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 3:02 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Jon Nelson's message of 2010-12-07 15:48:58 -0500:
 On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500:
  On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote:
   Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500:
   On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote:
    postgresql errors. Typically, header corruption but from the limited
    visibility I've had into this via strace, what I see is zeroed pages
    where there shouldn't be.
   
    This sounds a lot like a bug higher up than dm-crypt.  Zeros tend to
    come from some piece of code explicitly filling a page with zeros, and
    that often happens in the corner cases for O_DIRECT and a few other
    places in the filesystem.
   
    Have you tried triggering this with a regular block device?
  
   I just tried the whole set of tests, but with /dev/sdb directly (as
   ext4) without any crypt-y bits.
   It takes more iterations but out of 6 tests I had one failure: same
   type of thing, 'invalid page header in block '.
  
   I can't guarantee that it is a full-page of zeroes, just what I saw
   from the (limited) stracing I did.
  
   Fantastic. Now for our usual suspects:

Maybe not so fantastic. I kept testing and had no more failures. At
all. After 40+ iterations I gave up.
I went back to trying ext4 on a LUKS volume. The 'hit' ratio went to
something like 1 in 3, or better.

I will continue to do testing with and without LUKS. I did /not/
reboot between tests, but I do start with a fresh postgres database.

-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o ty...@mit.edu wrote:
 On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote:
  1. create a database (from bash):
 
  createdb test
 
  2. place the following contents in a file (I used 't.sql'):
 
  begin;
  create temporary table foo as select x as a, ARRAY[x] as b FROM
  generate_series(1, 1000 ) AS x;
  create index foo_a_idx on foo (a);
  create index foo_b_idx on foo USING GIN (b);
  rollback;
 
  3. execute that sql:
 
  psql -f t.sql --echo-all test
 
  With 2.6.34.7 I can re-run [3] all day long, as many times as I want,
  without issue.
 
  With 2.6.37-rc4-13 (the currently-installed KOTD kernel) it fails
  pretty frequently.

 So I just tried to reproduce this on an Ubuntu 10.04 system running
 2.6.37-rc5 (completely stock except for a few apparmor patches that I
 needed to keep the apparmor userspace from complaining).  I'm using
 Postgres 8.4.5-0ubuntu10.04.

 Using the above procedure, I wasn't able to reproduce.  Then I
 realized this might have been because I was using an SSD root file
 system (which is secured using LUKS/dm-crypt, with LVM on top of
 dm-crypt).  So I mounted a file system on a 5400 rpm laptop disk, which
 is also protected using LUKS/dm-crypt with LVM on top.  I then
 executed the PostgreSQL commands:

 CREATE TABLESPACE test LOCATION '/kbuild/postgres';
 SET default_tablespace = test;
 COMMIT
 \quit

 I then re-ran the above procedure, and verified that all of the I/O
 was going to the 5400rpm laptop disk.

 I then ran the above procedure a half-dozen times, and I still haven't
 been able to reproduce any PostgreSQL errors or kernel errors.

 Jon, can you help me identify what might be different with your run
 and mine?  What version of Postgres are you using?

One difference is the location of the transaction logs (pg_xlog). In
my case, /var/lib/pgsql/data *is* the mount point for the test volume
(actually, it's a symlink to the mount point). In your case, that is
not so. Perhaps that makes a difference?  pgsql_tmp might also be on
two different volumes in your case (I can't be sure).


-- 
Jon


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500:
 On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason chris.ma...@oracle.com wrote:
  Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500:
  On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason chris.ma...@oracle.com wrote:
   postgresql errors. Typically, header corruption but from the limited
   visibility I've had into this via strace, what I see is zeroed pages
   where there shouldn't be.
  
   This sounds a lot like a bug higher up than dm-crypt.  Zeros tend to
   come from some piece of code explicitly filling a page with zeros, and
   that often happens in the corner cases for O_DIRECT and a few other
   places in the filesystem.
  
   Have you tried triggering this with a regular block device?
 
  I just tried the whole set of tests, but with /dev/sdb directly (as
  ext4) without any crypt-y bits.
  It takes more iterations but out of 6 tests I had one failure: same
  type of thing, 'invalid page header in block '.
 
  I can't guarantee that it is a full-page of zeroes, just what I saw
  from the (limited) stracing I did.
 
  Fantastic. Now for our usual suspects:
 
  1) Is postgres using O_DIRECT?  If yes, please turn it off

 According to strace, O_DIRECT didn't show up once during the test.

  2) Is postgres allocating sparse files?  If yes, please have it fully
  allocate the file instead.

 That's a tough one. I don't think postgresql does that, but I'm not an
 expert here.

 Ok, please compare du -k and du -k --apparent-size for each of the
 files involved in the postgres run.

One of the files (the table itself) is very slightly sparse:
588240 (apparent) vs 588244
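
For anyone repeating the comparison over several files, something along
these lines prints both numbers side by side. It is a rough sketch:
sparse_report is a hypothetical helper name, and it assumes GNU du (for
--apparent-size). An allocated size smaller than the apparent size
suggests holes in the file; a small difference in either direction can
also just be block rounding.

```sh
#!/bin/sh
# Hypothetical helper (not part of the thread): for each file, print the
# allocated size and the apparent size in KiB, plus their difference.
# Assumes GNU du, which provides --apparent-size.
sparse_report() {
    for f in "$@"; do
        allocated=$(du -k -- "$f" | cut -f1)
        apparent=$(du -k --apparent-size -- "$f" | cut -f1)
        # A positive difference means fewer blocks are allocated than the
        # file length implies, which hints at holes.
        diff=$((apparent - allocated))
        printf '%s\tallocated=%s\tapparent=%s\tdiff=%s\n' \
            "$f" "$allocated" "$apparent" "$diff"
    done
}
```

Usage would be "sparse_report FILE..." with the data files from the
postgres run as arguments.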

-- 
Jon


Re: dm-crypt barrier support is effective

2010-12-01 Thread Jon Nelson
On Wed, Dec 1, 2010 at 12:24 PM, Milan Broz mb...@redhat.com wrote:
 On 12/01/2010 06:35 PM, Matt wrote:
 Thanks for pointing to v6 ! I hadn't noticed that there was a new one :)

 Well, so I'll restore my box to a working/productive state and will
 try out v6 (I'm pretty confident that it'll work without problems).

 It's the same as the previous one, just with a fixed header (to track it
 properly in patchwork); the second patch adds some read optimisation,
 nothing that should help here.

 Anyway, I ran several tests on 2.6.37-rc3+ and saw no integrity
 problems (using xfs, ext3 and ext4 over dmcrypt).

 So please try to check which change causes these problems for you,
 it can be something completely unrelated to these patches.

 (If anyone knows how to trigger some corruption with btrfs/dmcrypt,
 let me know; I am not able to reproduce it either.)

Perhaps this is useful: I found that when I started using
2.6.37rc3, postgresql started having a *lot* of problems with
corruption. Specifically, I noted zeroed pages, corruption in headers,
all sorts of stuff on /newly created/ tables, especially during index
creation. I had a fairly high hit rate of failure. I backed off to
2.6.34.7 and have *zero* problems (in fact, prior to 2.6.37rc3, I had
never had a corruption issue with postgresql). I ran on 2.6.36 for a
few weeks as well, without issue.

I am using dm-crypt with lvm on top of that, and ext4 on top of that.

-- 
Jon


weird ENOSPC with defragment directory

2010-08-16 Thread Jon Nelson
Most other directories on /var/cache, *except* those created by squid,
can be defragmented.
The filesystem was converted from ext3/4.

turnip:~ # uname -a
Linux turnip 2.6.34-12-default #1 SMP 2010-06-29 02:39:08 +0200 x86_64
x86_64 x86_64 GNU/Linux

(stock openSUSE 11.3 kernel)

turnip:~ # btrfsctl -d /var/cache/squid/01/93
ioctl:: No space left on device
turnip:~ # find !$
find /var/cache/squid/01/93
/var/cache/squid/01/93
/var/cache/squid/01/93/00019321
/var/cache/squid/01/93/00019378
turnip:~ # ls -la !$
ls -la /var/cache/squid/01/93
total 2
drwxr-x--- 1 squid nogroup   32 Aug 13 17:13 .
drwxr-x--- 1 squid nogroup 1024 Jun  4 18:35 ..
-rw-r----- 1 squid nogroup 1777 Jul 13 22:31 00019321
-rw-r----- 1 squid nogroup  537 Jul 13 22:31 00019378
turnip:~ #

That seems... strange.

-- 
Jon


Re: weird ENOSPC with defragment directory

2010-08-16 Thread Jon Nelson
On Mon, Aug 16, 2010 at 10:15 PM, Jon Nelson jnel...@jamponi.net wrote:
 Most other directories on /var/cache, *except* those created by squid,
 can be defragmented.
 The filesystem was converted from ext3/4.

 turnip:~ # uname -a
 Linux turnip 2.6.34-12-default #1 SMP 2010-06-29 02:39:08 +0200 x86_64
 x86_64 x86_64 GNU/Linux

 (stock openSUSE 11.3 kernel)

 turnip:~ # btrfsctl -d /var/cache/squid/01/93
 ioctl:: No space left on device
 turnip:~ # find !$
 find /var/cache/squid/01/93
 /var/cache/squid/01/93
 /var/cache/squid/01/93/00019321
 /var/cache/squid/01/93/00019378
 turnip:~ # ls -la !$
 ls -la /var/cache/squid/01/93
 total 2
 drwxr-x--- 1 squid nogroup   32 Aug 13 17:13 .
 drwxr-x--- 1 squid nogroup 1024 Jun  4 18:35 ..
 -rw-r----- 1 squid nogroup 1777 Jul 13 22:31 00019321
 -rw-r----- 1 squid nogroup  537 Jul 13 22:31 00019378
 turnip:~ #

 That seems... strange.

It gets stranger. If I issue a 'sync' command, chances are the
defragment command will work. If I issue a bunch of defragment commands
in series, however, then I get ENOSPC.

find /var/cache -xdev -type d -exec btrfsctl -d {} \;

Seems to do it every time, with or without -depth. I get 100% success
and then 100% failure - no mixing.
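
Given that a 'sync' beforehand usually lets a single defragment through,
one thing to try is syncing before every directory. The following is
only a sketch of that idea, not a verified fix: defrag_dirs_with_sync
and the DEFRAG_CMD variable are hypothetical names, and the default
command is the same "btrfsctl -d" used earlier in this thread.

```sh
#!/bin/sh
# Assumption for illustration: the defragment command is configurable and
# defaults to "btrfsctl -d", the command used earlier in this thread.
DEFRAG_CMD=${DEFRAG_CMD:-"btrfsctl -d"}

# Hypothetical helper: walk the directories under ROOT (staying on one
# filesystem) and call sync before each defragment attempt.
defrag_dirs_with_sync() {
    root=$1
    find "$root" -xdev -type d | while IFS= read -r dir; do
        sync
        # DEFRAG_CMD is left unquoted on purpose so that "btrfsctl -d"
        # splits into the command and its flag.
        if ! $DEFRAG_CMD "$dir"; then
            echo "defragment failed: $dir" >&2
        fi
    done
}
```

This would be run as "defrag_dirs_with_sync /var/cache". Syncing once
per directory is slow, so it is more useful for confirming the pattern
than as a routine.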

-- 
Jon