Re: Make existing snapshots read-only?

2012-05-29 Thread Stephane Chazelas
2012-05-28 12:37:00 -0600, Bruce Guenter:
 
 Is there any way to mark existing snapshots as read-only? Making new
 ones read-only is easy enough, but what about existing ones?
[...]

you can always do

btrfs sub snap -r vol vol-ro
btrfs sub del vol
mv vol-ro vol
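
The three commands above can be wrapped in a small function. This is only a sketch: make_snapshot_ro and the DRY_RUN variable are names made up here for illustration, not part of btrfs-progs. Note that between the delete and the rename the original name does not exist, so nothing should be using the subvolume while this runs.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, run it otherwise.
ro_run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Hypothetical helper: replace subvolume $1 with a read-only snapshot of itself.
make_snapshot_ro() {
    vol=$1
    ro_run btrfs subvolume snapshot -r "$vol" "$vol-ro" &&
    ro_run btrfs subvolume delete "$vol" &&
    ro_run mv "$vol-ro" "$vol"
}

# Example: DRY_RUN=1 make_snapshot_ro vol
```

Later btrfs-progs releases can also flip the flag in place with "btrfs property set -ts vol ro true", which avoids the delete and rename where it is available.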

-- 
Stephane

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Cloning a Btrfs partition

2011-12-08 Thread Stephane CHAZELAS
2011-12-07, 12:35(-06), BJ Quinn:
 I've got a 6TB btrfs array (two 3TB drives in a RAID 0). It's
 about 2/3 full and has lots of snapshots. I've written a
 script that runs through the snapshots and copies the data
 efficiently (rsync --inplace --no-whole-file) from the main
 6TB array to a backup array, creating snapshots on the backup
 array and then continuing on copying the next snapshot.
 Problem is, it looks like it will take weeks to finish. 

 I've tried simply using dd to clone the btrfs partition, which
 technically appears to work, but then it appears that the UUID
 between the arrays is identical, so I can only mount one or
 the other. This means I can't continue to simply update the
 backup array with the new snapshots created on the main array
 (my script is capable of catching up the backup array with
 the new snapshots, but if I can't mount both arrays...). 
[...]

You can mount them if you specify the devices upon mount.
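
As a sketch of what "specify the devices" looks like in practice: btrfs accepts one device= mount option per member device. The helper name btrfs_device_opts and the device names in the example are placeholders, not anything from the original setup.

```shell
# Hypothetical helper: turn a list of devices into a btrfs "device=" option string.
btrfs_device_opts() {
    opts=
    for dev in "$@"; do
        # Prepend a comma only when opts is already non-empty.
        opts=${opts:+$opts,}device=$dev
    done
    printf '%s\n' "$opts"
}

# Example (placeholder device names):
#   mount -o "$(btrfs_device_opts /dev/sdc /dev/sdd)" /dev/sdc /mnt/backup
```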

Here's a method to transfer a full FS to some other with
different layout.

In this example, we're transferring from a FS on a 3GB device
(/dev/loop1) to a new FS on two 2GB devices (/dev/loop2,
/dev/loop3):

truncate -s 3G a1
truncate -s 2G b1 b2
losetup /dev/loop1 a1
losetup /dev/loop2 b1
losetup /dev/loop3 b2

# our src FS on 1 disk:
mkfs.btrfs /dev/loop1
mkdir A B
mount /dev/loop1 A
# now we can fill it up, create subvolumes and snapshots...

# at this point, we decide to make a clone of it. To do that, we
# will make a snapshot of the device. For that, we need
# temporary storage as a block device. That could be a disk
# (like a USB key) or a nbd to another host, or anything. Here,
# I'm going to use a loop device to a file. You need enough
# space to store any modification done on the src FS while
# you're doing the transfer, plus whatever the transfer itself
# needs (I can't tell you much about that).

truncate -s 100M sa
losetup /dev/loop4 sa

umount A
size=$(blockdev --getsize /dev/loop1)
echo 0 $size snapshot-origin /dev/loop1 | dmsetup create a
echo 0 $size snapshot /dev/loop1 /dev/loop4 N 8 | dmsetup create aSnap

# now we have /dev/mapper/a as the src device which we can
# remount as such and use:
mount /dev/mapper/a A

# and aSnap as a writable snapshot of the src device, which we
# mount separately:
mount /dev/mapper/aSnap B

# The trick here is that we're going to add the two new devices
# to B and remove the snapshot one. btrfs will automatically
# migrate the data to the new device:
btrfs device add /dev/loop2 /dev/loop3 B
btrfs device delete /dev/mapper/aSnap B
# END
Once that's completed, you should have a copy of A in B.

You may want to watch the status of the snapshot while you're
transferring to check that it doesn't get full.
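
One way to watch it, as a rough sketch: for a snapshot target, "dmsetup status" reports the allocated and total sectors of the COW store as used/total in the fourth field. The helper name snapshot_usage is made up here, and the field layout is an assumption worth checking against your own dmsetup output.

```shell
# Hypothetical helper: read "dmsetup status" output for a snapshot target on
# stdin and print how full the COW store is, as an integer percentage.
# Lines that do not carry a used/total field (e.g. an invalidated snapshot)
# produce no output.
snapshot_usage() {
    awk '$3 == "snapshot" && $4 ~ /^[0-9]+\/[0-9]+$/ {
             split($4, a, "/")
             printf "%d\n", a[1] * 100 / a[2]
         }'
}

# Example: dmsetup status aSnap | snapshot_usage
```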

That method can't be used to do some incremental syncing
between two FS for which you'd still need something similar to
zfs send (speaking of which, you may want to consider
zfsonlinux which is now reaching a point where it's about as
stable as btrfs, same performance level if not better and has a
lot more features. I'm doing the switch myself while waiting for
btrfs to be a bit more mature)

Because of the same uuid, the btrfs commands like filesystem
show will not always give sensible outputs. I tried to rename
the fsid by changing it in the superblocks, but it looks like it
is also included in a few other places where changing it
manually breaks some checksums, so I guess someone would have to
write a tool to do that job. I'm surprised it doesn't exist
already (or maybe it does and I'm not aware of it?).

-- 
Stephane



Re: Cloning a Btrfs partition

2011-12-08 Thread Stephane CHAZELAS
2011-12-08, 10:49(-05), Phillip Susi:
 On 12/7/2011 1:49 PM, BJ Quinn wrote:
 What I need isn't really an equivalent zfs send -- my script can do
 that. As I remember, zfs send was pretty slow too in a scenario like
 this. What I need is to be able to clone a btrfs array somehow -- dd
 would be nice, but as I said I end up with the identical UUID
 problem. Is there a way to change the UUID of an array?

 No, btrfs send is exactly what you need.  Using dd is slow because it 
 copies unused blocks, and requires the source fs be unmounted.
[...]

Not necessarily, you can snapshot them (as in the method I
suggested). If your FS is already on a device mapper device, you
can even get away with not unmounting it (freeze, reload the
device mapper table with a snapshot-origin one and thaw).

 and the destination be an empty partition.  rsync is slow
 because it can't take advantage of the btrfs tree to quickly
 locate the files (or parts of them) that have changed.  A
 btrfs send would solve all of these issues.
[...]

When you want to clone a FS using a similar device or set of
devices, a tool like clone2fs or ntfsclone that copies only the
used sectors across sequentially would probably be a lot more
efficient as it copies the data at the max speed of the drive,
seeking as little as possible.

-- 
Stephane



mounting btrfs FS on zfs zvol hangs

2011-11-23 Thread Stephane Chazelas
Hiya,

yes, you'll probably think that is crazy, but after observing
better performance with btrfs in some workloads on md RAID5
than with btrfs's built-in RAID10, I thought I'd try btrfs on a zfs
(in-kernel, not fuse) zvol (on raidz) just for a laugh.

While this procedure worked for ext4 and xfs, for btrfs, the
mount hangs suggesting there might be something wrong with btrfs
and/or zfs.

Here's what I'm doing:

zpool create X raidz /dev/sd{a,b,c,d,e,f}
zfs create -V 6T -o refreservation=0 X/Y
mkfs.btrfs /dev/zvol/X/Y
mount /dev/zvol/X/Y /mnt

backtrace for mount:

mount   D 0009 0  2193   1761 0x
 880401b4d9a8 0082 0001 
 880401b4dfd8 880401b4dfd8 880401b4dfd8 00012a40
 8802092f 88040ef7dc80 880401b4d988 88041fa732c0
Call Trace:
 [81109af0] ? __lock_page+0x70/0x70
 [81056e9f] schedule+0x3f/0x60
 [815e85af] io_schedule+0x8f/0xd0
 [81109afe] sleep_on_page+0xe/0x20
 [815e8dcf] __wait_on_bit+0x5f/0x90
 [81109c68] wait_on_page_bit+0x78/0x80
 [81081d80] ? autoremove_wake_function+0x40/0x40
 [a041ab7a] read_extent_buffer_pages+0x3ca/0x430 [btrfs]
 [a03eec40] ? btrfs_destroy_pinned_extent+0xb0/0xb0 [btrfs]
 [a03f0daa] btree_read_extent_buffer_pages.isra.62+0x8a/0xc0 [btrfs]
 [a03f2021] read_tree_block+0x41/0x60 [btrfs]
 [a03f4735] open_ctree+0xe75/0x1760 [btrfs]
 [812f2f64] ? snprintf+0x34/0x40
 [a03d1438] btrfs_fill_super.isra.38+0x78/0x150 [btrfs]
 [811cf55a] ? disk_name+0xba/0xc0
 [812efbd7] ? strlcpy+0x47/0x60
 [a03d2806] btrfs_mount+0x3c6/0x470 [btrfs]
 [8116ae13] mount_fs+0x43/0x1b0
 [8118565a] vfs_kern_mount+0x6a/0xc0
 [81186a54] do_kern_mount+0x54/0x110
 [811884f4] do_mount+0x1a4/0x260
 [81188990] sys_mount+0x90/0xe0
 [815f27c2] system_call_fastpath+0x16/0x1b

3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:27:26 UTC 2011
x86_64

Best regards,
Stephane


stripe alignment consideration for btrfs on RAID5

2011-11-23 Thread Stephane CHAZELAS
Hiya,

is there any recommendation out there to set up a btrfs FS on top
of hardware or software raid5 or raid6 wrt stripe/stride alignment?

From mkfs.btrfs, it doesn't look like there's much that can be
adjusted that would help, and what I'm asking might not even
make sense for btrfs but I thought I'd just ask.

Thanks,
Stephane



Re: stripe alignment consideration for btrfs on RAID5

2011-11-23 Thread Stephane CHAZELAS
2011-11-23, 09:08(-08), Blair Zajac:

 On Nov 23, 2011, at 9:04 AM, Stephane CHAZELAS wrote:

 Hiya,
 
 is there any recommendation out there to set up a btrfs FS on top
 of hardware or software raid5 or raid6 wrt stripe/stride alignment?

 Isn't the advantage of having btrfs do all the raiding itself
 so one gets the checksums?  If one puts btrfs on top of
 software or hardware raid, then if there is a checksum error,
 you don't have another copy of the data to fall back to.  If
 one uses btrfs' raid1 or above for data and metadata, then you
 can suffer a checksum failure and get a good copy from another
 drive?
[...]

Yes, but btrfs doesn't support raid5 yet and I have a limited
number of drives I can connect to that system, and storage
capacity is more important for me than the odd chance of
corruptions of odd sectors (which can be mitigated by running
regular RAID checks).

Also, my tests of btrfs raid10 didn't indicate it was reliable
enough yet (when a drive disappears and reappears, btrfs seems
to get quite confused).

-- 
Stephane



Re: btrfs-delalloc - threaded?

2011-11-22 Thread Stephane CHAZELAS
2011-09-6, 11:21(-05), Andrew Carlson:
 I was doing some testing with writing out data to a BTRFS filesystem
 with the compress-force option.  With 1 program running, I saw
 btrfs-delalloc taking about 1 CPU worth of time, much as could be
 expected.  I then started up 2 programs at the same time, writing data
 to the BTRFS volume.  btrfs-delalloc still only used 1 CPU worth of
 time.  Is btrfs-delalloc threaded, to where it can use more than 1 CPU
 worth of time?  Is there a threshold where it would start using more
 CPU?
[...]

Hiya,

I observe the same here. The bottleneck when writing data
sequentially seems to be btrfs-delalloc using 100% of the
time of one CPU.

If I do several writes in parallel, a few more btrfs-delalloc's
appear (3 when filling up 5 files concurrently), but
btrfs-delalloc is still the bottleneck. Interestingly, if I
write to 10 files simultaneously, I see only two btrfs-delalloc
and the throughput is lower.

That's on ubuntu 11.10 3.0.0-13 amd64, 12 core, 16GB DDR3
1333MHz RAM. raid10 on 6 drives.

Note that zfsonlinux does perform a lot better in that regard
(on a raidz (ZFS raid5) on those same 6 drives): 50% CPU
utilisation and max out the disk bandwidth.

-- 
Stephane



Re: btrfs-delalloc - threaded?

2011-11-22 Thread Stephane CHAZELAS
2011-11-22, 09:47(-05), Chris Mason:
 On Tue, Nov 22, 2011 at 02:30:07PM +, Stephane CHAZELAS wrote:
 2011-09-6, 11:21(-05), Andrew Carlson:
  I was doing some testing with writing out data to a BTRFS filesystem
  with the compress-force option.  With 1 program running, I saw
  btrfs-delalloc taking about 1 CPU worth of time, much as could be
  expected.  I then started up 2 programs at the same time, writing data
  to the BTRFS volume.  btrfs-delalloc still only used 1 CPU worth of
  time.  Is btrfs-delalloc threaded, to where it can use more than 1 CPU
  worth of time?  Is there a threshold where it would start using more
  CPU?
 [...]
 
 Hiya,
 
 I observe the same here. The bottleneck when writing data
 sequentially seems to be btrfs-delalloc using 100% of the
 time of one CPU.

 The compression is spread out to multiple CPUs.  Using zlib on my 4 cpu
 box, I get 4 delalloc threads working on two concurrent dds.

 The thread hand off is based on the amount of work queued up to each
 thread, and you're probably just below the threshold where it kicks off
 another one.  Are you using lzo or zlib? 

mounted with -o compress-force, so getting whatever the default
compression algorithm is.

 What is the workload you're using?  We can make the compression code
 more aggressive at fanning out.
[...]

That was a basic test of:

head -c 40M /dev/urandom > a
(while :; do cat a; done) | pv -rab > b

(I expect the content of a to be cached in memory).

Running dstat -df and top in parallel.

Nothing else reading or writing to that FS.

btrfs maxes out at about 150MB/s, and zfs at about 400MB/s.

For the concurrent writing, replace

pv -rab > b with pv -rab | tee b c d e > f

(I suppose there's a fair chance of this incurring disk seeking,
so reduced throughput is probably to be expected. I get the same
kind of throughput (maybe 15% more) with zfs raid5 in that case).

-- 
Stephane



status of raid10 reliability

2011-11-17 Thread Stephane CHAZELAS
Hiya,

Before setting up a new RAID10 btrfs array with 6 drives, I
wanted to check how well it behaved in case of disk failure.
I've not been too impressed. Is RAID10 btrfs support only
meant for reading performance improvement?

My test method was:

Use the device-mapper to have devices mapped (linear) to loop
devices

zsh syntax:

# l=({1..4})
# mv /dev/loop$^l .
# truncate -s1T $l
# s=$(blockdev --getsize /dev/loop1)
# for f ($l) losetup loop$f $f
# for f ($l) echo 0 $s linear loop$f 0 | dmsetup create hd$f
# mkfs.btrfs -m raid10 -d raid10 /dev/mapper/hd$^l
# d=(device=/dev/mapper/hd$^l)
# mount -o ${(j:,:)d} /dev/mapper/hd1 /mnt/3

Then write some data, and then use DM's error target to simulate
a failing drive (all I/O ends up in error):

# dmsetup suspend hd3; echo 0 $s error | dmsetup reload hd3; dmsetup resume hd3

Then write some more data. The FS doesn't become degraded
automatically. If I restore the drive:

# dmsetup suspend hd3; echo 0 $s linear loop3 0 | dmsetup reload hd3; dmsetup resume hd3

More funny things occur of course as btrfs doesn't seem to have
registered it being broken.
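
For repeating the test, the fail and restore steps can be sketched as a pair of helpers. Swapping the table of an existing mapping is a suspend, reload, resume sequence in both directions. All function names and the DRY_RUN variable below are made up for illustration; the table is passed with dmsetup's --table option instead of on stdin.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, run it otherwise.
dm_run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Replace the table of existing mapping $1 with table $2.
dm_swap_table() {
    name=$1 table=$2
    dm_run dmsetup suspend "$name" &&
    dm_run dmsetup reload "$name" --table "$table" &&
    dm_run dmsetup resume "$name"
}

# fail_drive NAME SECTORS: every I/O to the mapping now returns an error.
fail_drive()    { dm_swap_table "$1" "0 $2 error"; }

# restore_drive NAME SECTORS BACKING_DEV: map linearly onto the backing device again.
restore_drive() { dm_swap_table "$1" "0 $2 linear $3 0"; }

# Example: fail_drive hd3 "$s"; ...write data...; restore_drive hd3 "$s" loop3
```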

If I do a scrub with the failing drive, it BUGs ON:

[13960.286464] [ cut here ]
[13960.286484] kernel BUG at 
/home/blank/debian/kernel/release/linux-2.6/linux-2.6-3.1.0/debian/build/source_amd64_none/fs/btrfs/volumes.c:2891!
[13960.286496] invalid opcode:  [#1] SMP
[13960.286507] CPU 0
[13960.286510] Modules linked in: vboxnetadp(O) vboxnetflt(O) vboxdrv(O) 
ip6table_filter ip6_tables ebtable_nat acpi_cpufreq mperf ebtables 
cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats 
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter 
ip_tables x_tables bridge stp parport_pc ppdev lp parport rfcomm bnep 
binfmt_misc uinput deflate ctr twofish_generic twofish_x86_64 twofish_common 
camellia serpent blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic 
sha256_generic sha1_generic hmac crypto_null af_key fuse nfsd nfs lockd fscache 
auth_rpcgss nfs_acl sunrpc dm_crypt coretemp loop kvm_intel kvm uvcvideo 
bcm5974 videodev media v4l2_compat_ioctl32 cryptd snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm 
aes_x86_64 aes_generic snd_seq_midi ecb btusb bluetooth rfkill snd_rawmidi 
nouveau ttm snd_seq_midi_event drm_kms_helper snd_seq drm i2c_algo_bit mxm_wmi 
snd_timer wmi snd_seq_device joydev video battery snd ac apple_bl power_supply 
soundcore snd_page_alloc applesmc pcspkr input_polldev i2c_i801 i2c_core evdev 
button processor thermal_sys ext4 mbcache jbd2 crc16 btrfs zlib_deflate crc32c 
libcrc32c raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor 
async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd dm_mirror 
dm_region_hash dm_log dm_mod sg sr_mod sd_mod cdrom crc_t10dif hid_apple usbhid 
hid ata_generic uhci_hcd firewire_ohci sata_sil24 ata_piix firewire_core 
crc_itu_t libata sky2 ehci_hcd scsi_mod usbcore [last unloaded: scsi_wait_scan]
[13960.287012]
[13960.287016] Pid: 12681, comm: btrfs-scrub-0 Tainted: GW  O 
3.1.0-1-amd64 #1 Apple Inc. MacBookPro4,1/Mac-F42C86C8
[13960.287037] RIP: 0010:[a020fc8a]  [a020fc8a] 
__btrfs_map_block+0xfd/0x629 [btrfs]
[13960.287061] RSP: 0018:880078c87cb0  EFLAGS: 00010282
[13960.287067] RAX: 0042 RBX: 880078c87d68 RCX: 54ee
[13960.287076] RDX:  RSI: 0046 RDI: 0246
[13960.287085] RBP: 8800378f6380 R08: 0002 R09: fffe
[13960.287093] R10: 0001 R11: 88007e395c90 R12: 
[13960.287102] R13: 0008 R14:  R15: 0001
[13960.287114] FS:  () GS:88013fc0() 
knlGS:
[13960.287126] CS:  0010 DS:  ES:  CR0: 8005003b
[13960.287136] CR2: f76990a0 CR3: 000104319000 CR4: 06f0
[13960.287147] DR0:  DR1:  DR2: 
[13960.287159] DR3:  DR6: 0ff0 DR7: 0400
[13960.287169] Process btrfs-scrub-0 (pid: 12681, threadinfo 880078c86000, 
task 880021698280)
[13960.287184] Stack:
[13960.287189]   88010001 0040 
88008a1f40f8
[13960.287207]  88008a1f4100 880078c87d60 0001 
0001
[13960.287223]  880078c87cf0 880109671000 880123416400 

[13960.287242] Call Trace:
[13960.287259]  [a022aa1f] ? scrub_recheck_error+0x105/0x29b [btrfs]
[13960.287280]  [a022ac2a] ? scrub_checksum+0x75/0x372 [btrfs]
[13960.287288]  [810394c7] ? check_preempt_wakeup+0x122/0x18b
[13960.287297]  [81036e0c] ? set_next_entity+0x32/0x52
[13960.287304]  [8100d031] ? load_gs_index+0x7/0xa
[13960.287312]  [8100d6a8] ? __switch_to+0x15a/0x20e
[13960.287331]  

Re: status of raid10 reliability

2011-11-17 Thread Stephane CHAZELAS
2011-11-17 17:09:25 +, Stephane CHAZELAS:
[...]
 Before setting up a new RAID10 btrfs array with 6 drives, I
 wanted to check how good it behaved in case of disk failure.
 I've not been too impressed. Is RAID10 btrfs support only
 meant for reading performance improvement?
 
 My test method was:
 
 Use the device-mapper to have devices mapped (linear) to loop
 devices
[...]
 Then write some data, and then use DM's error target to simulate
 a failing drive (all I/O ends up in error):
 
 # dmsetup suspend hd3; echo 0 $s error | dmsetup reload hd3; dmsetup resume 
 hd3
[...]

Note that I did the same test with both md (raid10) and
zfsonlinux (raidz) and it worked as expected.

-- 
Stephane


ENOSPC on almost empty FS

2011-11-01 Thread Stephane CHAZELAS
Hiya,

trying to restore a FS from a backup (tgz) on a freshly made
btrfs this morning, I got ENOSPCs after about 100MB out of 4GB
have been extracted. strace indicates that the ENOSPC are upon
the open(O_WRONLY).

Restoring with:

mkfs.btrfs /dev/mapper/VG_USB-root
mount -o compress-force,ssd $_ /mnt
cd /mnt
pv ~/backup.tgz | gunzip | sudo bsdtar -xpSf - --numeric-owner


That's on a LVM LV with the PV on a USB key.

If I suspend the job and resume it, then the ENOSPCs go away.

The only way I could restore the backup was via rate limiting
the untar:

zcat ~/backup.tgz | pv -L 3m | sudo bsdtar -xpSf - --numeric-owner

That 3MB/s wasn't even enough, as 3 files triggered an ENOSPC,
but I did untar them separately afterwards.

That's with debian's 3.0.0-1 amd64 kernel.

Is that expected behavior due to the way allocation works in
btrfs?

-- 
Stephane



Re: btrfs fi defrag -c

2011-10-28 Thread Stephane CHAZELAS
2011-10-28, 10:25(+08), Li Zefan:
[...]
 # df . -h
 Filesystem        Size  Used Avail Use% Mounted on
 /home/lizf/tmp/a  2.0G  409M  1.4G  23% /mnt

OK, why are we not gaining space after compression though?


 And I was not surprised, as there's a regression.

 With this fix:

 http://marc.info/?l=linux-btrfs&m=131495014823121&w=2
[...]

Thanks. That's the one that's scheduled for 3.2 and maybe 3.1.x,
right?

-- 
Stephane



Re: Unable to mount (or, why not to work late at night).

2011-10-28 Thread Stephane CHAZELAS
2011-10-28, 07:57(+07), Fajar A. Nugraha:
[...]
 Already got 'em.  Everything that tries to even think about modifying stuff
 (btrfs-zero-log, btrfsck, and btrfs-debug-tree) all dump core:

 Your last resort (for now, anyway) might be using restore from
 Josef's btrfs-progs: https://github.com/josefbacik/btrfs-progs

 It might be able to copy some data.

I also have got one FS in that same situation. I tried
everything on it including that restore (which bailed out with
those same error messages IIRC).

The only thing that got me a bit further was to use an alternate
superblock, though that screwed the FS even further as I need to
reboot the machine after trying to mount it (mount hangs and
there are some btrfs tasks using all the CPU time).

Fortunately, for that one, I had a not too old backup at the
block device level.

-- 
Stephane



lseek hanging

2011-10-27 Thread Stephane CHAZELAS
This morning, I'm seeing strange behavior when doing a tail -f
on a log file.

cat log runs successfully, but
tail -f log hangs.

Running a strace shows it hanging on lseek(3, 0, SEEK_CUR...
3 being the fd for that log file.

In dmesg:

[59881.520030] INFO: task btrfs-delalloc-:763 blocked for more than 120 seconds.
[59881.527205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59881.535068] btrfs-delalloc- D 000100e2892b 0   763  2 0x
[59881.542161]  88014738bc20 0046 4000 

[59881.549673]  00012840 00012840 00012840 
8801478c2580
[59881.557147]  00012840 88014738bfd8 00012840 
00012840
[59881.568736] Call Trace:
[59881.571219]  [a02ff90c] wait_current_trans.clone.22+0xab/0xdc 
[btrfs]
[59881.578589]  [81068ffb] ? wake_up_bit+0x2a/0x2a
[59881.584012]  [8136f6e6] ? _raw_spin_lock+0xe/0x10
[59881.589613]  [a0300b3f] start_transaction+0xe3/0x231 [btrfs]
[59881.596176]  [a0300cd0] btrfs_join_transaction+0x15/0x17 [btrfs]
[59881.603103]  [a03074bb] compress_file_range+0x297/0x515 [btrfs]
[59881.609926]  [a030776e] async_cow_start+0x35/0x4a [btrfs]
[59881.616237]  [8136f736] ? _raw_spin_lock_irq+0x1f/0x21
[59881.622277]  [a0324f0a] worker_loop+0x19d/0x4cb [btrfs]
[59881.628433]  [a0324d6d] ? btrfs_queue_worker+0x27a/0x27a [btrfs]
[59881.635330]  [81068b2c] kthread+0x82/0x8a
[59881.640227]  [81376124] kernel_thread_helper+0x4/0x10
[59881.646160]  [81068aaa] ? kthread_worker_fn+0x14c/0x14c
[59881.652293]  [81376120] ? gs_change+0x13/0x13
[59881.657555] INFO: task flush-btrfs-1:2675 blocked for more than 120 seconds.
[59881.664617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59881.672477] flush-btrfs-1   D 000100e28931 0  2675  2 0x
[59881.679628]  88013c559990 0046  
8801
[59881.687118]  00012840 00012840 00012840 
88013cf76140
[59881.694591]  00012840 88013c559fd8 00012840 
00012840
[59881.702064] Call Trace:
[59881.704536]  [810c9843] ? lock_page+0x2f/0x2f
[59881.709800]  [8136e084] io_schedule+0x63/0x7e
[59881.715042]  [810c9851] sleep_on_page+0xe/0x12
[59881.720376]  [8136e663] __wait_on_bit_lock+0x46/0x8f
[59881.726241]  [810d3463] ? pagevec_lru_move_fn+0xaa/0xc0
[59881.732372]  [810c980d] __lock_page+0x66/0x6d
[59881.737628]  [81069034] ? autoremove_wake_function+0x39/0x39
[59881.744173]  [8103cea6] ? should_resched+0xe/0x2e
[59881.749779]  [a0319fa7] lock_page+0x2a/0x2e [btrfs]
[59881.755631]  [a031cf63] 
extent_write_cache_pages.clone.10.clone.17+0xba/0x28e [btrfs]
[59881.764382]  [a031d3a7] extent_writepages+0x47/0x5c [btrfs]
[59881.770877]  [a0303e3e] ? uncompress_inline.clone.32+0x119/0x119 
[btrfs]
[59881.778494]  [a0302f0e] btrfs_writepages+0x27/0x29 [btrfs]
[59881.784867]  [810d22d9] do_writepages+0x21/0x2a
[59881.790302]  [81131aba] writeback_single_inode+0xb5/0x1c6
[59881.796588]  [81131ddd] writeback_sb_inodes+0xbc/0x138
[59881.802683]  [8113279e] writeback_inodes_wb+0x172/0x184
[59881.808795]  [81132a1c] wb_writeback+0x26c/0x3aa
[59881.814297]  [81132ca1] wb_do_writeback+0x147/0x1a0
[59881.820081]  [8136e58d] ? schedule_timeout+0xb3/0xe3
[59881.825947]  [81132d86] bdi_writeback_thread+0x8c/0x20f
[59881.832056]  [81132cfa] ? wb_do_writeback+0x1a0/0x1a0
[59881.838062]  [81068b2c] kthread+0x82/0x8a
[59881.842959]  [81376124] kernel_thread_helper+0x4/0x10
[59881.848896]  [81068aaa] ? kthread_worker_fn+0x14c/0x14c
[59881.855005]  [81376120] ? gs_change+0x13/0x13
[59881.860267] INFO: task ntfsclone:2787 blocked for more than 120 seconds.
[59881.866981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[59881.874842] ntfsclone   D 000100e28625 0  2787   2767 0x
[59881.881929]  88013b421b88 0082 814044e0 

[59881.889399]  00012840 00012840 00012840 
88013bcba5c0
[59881.896895]  00012840 88013b421fd8 00012840 
00012840
[59881.904388] Call Trace:
[59881.906848]  [81069259] ? prepare_to_wait+0x76/0x81
[59881.912630]  [a02ff90c] wait_current_trans.clone.22+0xab/0xdc 
[btrfs]
[59881.919970]  [81068ffb] ? wake_up_bit+0x2a/0x2a
[59881.925410]  [8136f6e6] ? _raw_spin_lock+0xe/0x10
[59881.931014]  [a0300b3f] start_transaction+0xe3/0x231 [btrfs]
[59881.937572]  [a0300cd0] btrfs_join_transaction+0x15/0x17 [btrfs]
[59881.944479]  [a0309bc4] btrfs_dirty_inode+0x2c/0x117 [btrfs]
[59881.951051]  [81131f6a] 

btrfs fi defrag -c

2011-10-27 Thread Stephane Chazelas
I don't quite understand the behavior of btrfs fi defrag

~# truncate -s2G ~/a
~# mkfs.btrfs ~/a
nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
~# mount -o loop ~/a /mnt/1
~# cd /mnt/1
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G   64K  1.8G   1% /mnt/1
/mnt/1# yes | head -c400M > a
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G   64K  1.8G   1% /mnt/1
/mnt/1# sync
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G  402M  1.4G  23% /mnt/1
/mnt/1# btrfs fi defrag -c a

(exit status == 20 BTW).

(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G  415M  994M  30% /mnt/1

No space gain, even lost 15M or 400M depending on how you look at it.

/mnt/1# btrfs fi defrag  a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G  797M  612M  57% /mnt/1

Lost another 400M.

/mnt/1# ls -l
total 409600
-rw-r--r-- 1 root root 419430400 Oct 27 19:53 a
/mnt/1# btrfs fi balance .
/mnt/1# sync
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G  798M  845M  49% /mnt/1

Possibly reclaimed some of the space?

At the point where it says 612M free, if I do:
/mnt/1# cat < /dev/zero > b
cat: write error: No space left on device
/mnt/1# ls -lh b
-rw-r--r-- 1 root root 612M Oct 27 20:14 b

There was indeed 612M free.


When the FS is mounted with compress:

~# mkfs.btrfs ./a
nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
~# mount -o compress ./a /mnt/1
~# cd /mnt/1
/mnt/1# yes | head -c400M > a
/mnt/1# sync
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G   14M  1.8G   1% /mnt/1
/mnt/1# btrfs fi defrag -c ./a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G   21M  1.4G   2% /mnt/1

Lost 400M?

/mnt/1# btrfs fi defrag ./a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/loop1  2.0G   21M  1.4G   2% /mnt/1

I take it it doesn't uncompress?

I'm a bit confused here.

(that's with 3.0 amd64)

-- 
Stephane


Re: how stable are snapshots at the block level?

2011-10-25 Thread Stephane CHAZELAS
2011-10-25, 07:46(-04), Edward Ned Harvey:
[...]
 My suggestion to the OP of this thread is to use rsync for now, wait for
 btrfs send, or switch to zfs.
[...]

rsync won't work if you've got snapshot volumes though (unless
you're prepared to have a backup copy thousands of times the
size of the original or have a framework in place to replicate
the snapshots on the backup copy as soon as they are created
(but before they're being written to)).

To backup a btrfs FS with snapshots, the only option seems to
be to copy the block devices for now (or the other trick
mentioned earlier).

-- 
Stephane



Re: how stable are snapshots at the block level?

2011-10-24 Thread Stephane CHAZELAS
2011-10-23, 17:19(+02), Mathijs Kwik:
[...]
 For this case (my laptop) I can stick to file-based rsync, but I think
 some guarantees should exist at the block level. Many virtual machines
 and cloud hosting services (like ec2) provide block-level snapshots.
 With xfs, I can freeze the filesystem for a short amount of time
 (100ms), snapshot, unfreeze. I don't think such a lock/freeze feature
 exists for btrfs
[...]

That FS-freeze feature has been moved to the vfs layer so is
available to any filesystem now.

You can either use xfs_io (see -F option to freeze for foreign
FS) like for xfs FS or use fsfreeze from util-linux.
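
A minimal sketch of the freeze, snapshot, thaw sequence with fsfreeze from util-linux. The wrapper name with_frozen_fs and the DRY_RUN variable are made up here, and the snapshot command is whatever your block layer provides. The point of the wrapper is that the thaw runs even when the snapshot command fails.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, run it otherwise.
fz_run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# with_frozen_fs MOUNTPOINT COMMAND [ARGS...]
# Freeze the filesystem, run COMMAND, then thaw regardless of COMMAND's result.
# Returns COMMAND's exit status.
with_frozen_fs() {
    mnt=$1; shift
    fz_run fsfreeze -f "$mnt" || return
    "$@"; status=$?
    fz_run fsfreeze -u "$mnt"
    return "$status"
}

# Example (placeholder snapshot command):
#   with_frozen_fs /mnt lvcreate -s -n snap -L 1G vg/lv
```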

Note that you can thaw file systems with a sysrq combination
now. (for instance with xen using xm sysrq vm j).

For block level snapshots, see also ddsnap (device mapper target
unfortunately no longer maintained) and lvm of course (but
doesn't scale well with several snapshots).

-- 
Stephane



Re: how stable are snapshots at the block level?

2011-10-24 Thread Stephane CHAZELAS
2011-10-24, 09:59(-04), Edward Ned Harvey:
[...]
 If you are reading the raw device underneath btrfs, you are
 not getting the benefit of the filesystem checksumming.  If
 you encounter an undetected read/write error, it will silently
 pass.  Your data will be corrupted, you'll never know about it
 until you see the side-effects (whatever they may be).
[...]

I don't follow you here. If you're cloning a device holding a
btrfs FS, you'll clone the checksums as well. If there were
errors, they will be detected on the cloned FS as well?

 There is never a situation where block level copies have any
 advantage over something like btrfs send.  Except perhaps
 forensics or espionage.  But in terms of fast efficient
 reliable backups, btrfs send has every advantage and no
 disadvantage compared to block level copy.

$ btrfs send
ERROR: unknown command 'send'
Usage:
[...]

(from 2011-10-12 integration branch). Am I missing something?

 There are many situations where btrfs send has an advantage
 over both block level and file level copies.  It instantly
 knows all the relevant disk blocks to send, it preserves every
 property, it's agnostic about filesystem size or layout on
 either sending or receiving end, you have the option to create
 different configurations on each side, including compression
 etc.  And so on.
[...]

That sounds like zfs send, I didn't know btrfs had it yet.

My understanding was that to clone/backup a btrfs FS, you could
only clone the block devices or use the device add + device
del trick with some extra copy-on-write (LVM, nbd) layer.

-- 
Stephane



Re: BTRFS thinks that a device is mounted

2011-10-21 Thread Stephane CHAZELAS
2011-10-21, 00:39(+03), Nikos Voutsinas:
[...]
 ## Comment: Of course /dev/sdh is not mounted.
 mount |grep /dev/sdh
 root@lxc:~#
[...]

Note that mount(8) uses /etc/mtab to find out what is mounted.
And if that file is not a symlink to /proc/mounts, the
information is not necessarily correct.

You can also have a look at /proc/mounts directly.
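As an illustration, here is a small sketch that reads the kernel's own list rather than /etc/mtab. The function names are mine; the only format assumption is that each line starts with the source device followed by the mount point, separated by whitespace.

```python
from typing import Dict, List


def mounted_sources(mounts_text: str) -> Dict[str, List[str]]:
    """Map each source (first field) to the mount points it appears on.

    Expects text in /proc/mounts format. Special characters in paths
    stay in their escaped form (e.g. \\040 for a space).
    """
    result: Dict[str, List[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        result.setdefault(fields[0], []).append(fields[1])
    return result


def read_kernel_mounts(path: str = "/proc/mounts") -> str:
    """Read the kernel's view of what is mounted (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    table = mounted_sources(read_kernel_mounts())
    print(table.get("/dev/sdh", "not listed in /proc/mounts"))
```

Bear in mind that a multi-device btrfs file system is listed under only one of its member devices, so a device not showing up there is not conclusive on its own.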

-- 
Stephane



Re: recursive subvolume delete

2011-10-02 Thread Stephane Chazelas
2011-10-02 16:38:21 +0200, krz...@gmail.com :
 Also I think there are no real tools to find out which
 directories are subvolumes/snapshots
[...]

On my system (Debian), there's a mountpoint command (from the
initscripts package from
http://savannah.nongnu.org/projects/sysvinit) that will tell you
that (it compares the st_dev of the given directory with that of
directory/..).
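The same check is easy to sketch in a few lines. The function name is mine, and the extra inode comparison (so that / itself counts) is my addition on top of the st_dev comparison described above.

```python
import os


def is_device_boundary(directory: str) -> bool:
    """Return True if `directory` sits on a different device than its parent.

    Compares st_dev of the directory with that of directory/.. .
    The root directory is its own parent, so it is detected by the
    two entries having the same inode number.
    """
    here = os.stat(directory)
    parent = os.stat(os.path.join(directory, ".."))
    return here.st_dev != parent.st_dev or here.st_ino == parent.st_ino
```

Since btrfs subvolumes and snapshots report their own st_dev, a directory for which this returns True inside a btrfs mount is a candidate subvolume (or an ordinary mount point).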

-- 
Stephane


Re: high CPU usage and low perf

2011-09-30 Thread Stephane Chazelas
2011-09-27 10:15:09 +0100, Stephane Chazelas:
[...]
 btrfs-transacti R  running task0   963  2 0x
  880143af7730 0001 ff10 880143af77b0
  8801456da420  e86aa840 1000
  ffe4 8801462ba800 880109f9b540 88002a95eba8
 Call Trace:
  [a032765e] ? tree_search_offset+0x18f/0x1b8 [btrfs]
  [a02eb745] ? btrfs_reserve_extent+0xb0/0x190 [btrfs]
  [a02ebdfc] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs]
  [a02dea3d] ? __btrfs_cow_block+0x102/0x31e [btrfs]
  [a02ebdfc] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs]
  [a02dea3d] ? __btrfs_cow_block+0x102/0x31e [btrfs]
  [a02dd400] ? btrfs_set_node_key+0x1a/0x20 [btrfs]
  [a02ded5d] ? btrfs_cow_block+0x104/0x14e [btrfs]
  [a02e1c34] ? btrfs_search_slot+0x162/0x4cb [btrfs]
  [a02e2ea3] ? btrfs_insert_empty_items+0x6a/0xba [btrfs]
  [a02e9bf3] ? run_clustered_refs+0x370/0x682 [btrfs]
  [a032d201] ? btrfs_find_ref_cluster+0xd/0x13c [btrfs]
  [a02e9fd6] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs]
  [a02f8467] ? btrfs_commit_transaction+0x38f/0x709 [btrfs]
  [8136f6e6] ? _raw_spin_lock+0xe/0x10
  [a02f79fe] ? join_transaction.clone.23+0xc1/0x200 [btrfs]
[...]

Any idea anyone? The above suggests btrfs struggles to allocate
space, even though the FS is only 66% full.

For now, my workaround is to reboot the system once a day. Not
ideal...

I'm also suspecting some data corruption which I'm investigating
now (on a file written via mmap()).

Thanks,
Stephane


high CPU usage and low perf

2011-09-27 Thread Stephane Chazelas
Hiya,


Recently, a btrfs file system of mine started to behave very poorly,
with some btrfs kernel tasks taking 100% of CPU time.

# btrfs fi show /dev/sdb
Label: none  uuid: b3ce8b16-970e-4ba8-b9d2-4c7de270d0f1
Total devices 3 FS bytes used 4.25TB
devid2 size 2.73TB used 1.52TB path /dev/sdc
devid1 size 2.70TB used 1.49TB path /dev/sda4
devid3 size 2.73TB used 1.52TB path /dev/sdb

Btrfs v0.19-100-g4964d65

FS mounted with compress-force,noatime

(Can't do a filesystem df just now, as there's a umount
running, there should be around 33% free).

Kernel 3.0, with patch:
http://www.spinics.net/lists/linux-btrfs/msg11023.html

While the FS is running, I see for instance btrfs-transacti
taking 100% CPU and iostat shows no disk activity. Writing
performance is dreadful (a few kB/s).

sysrq-t gives:

btrfs-transacti R  running task0   963  2 0x
 880143af7730 0001 ff10 880143af77b0
 8801456da420  e86aa840 1000
 ffe4 8801462ba800 880109f9b540 88002a95eba8
Call Trace:
 [a032765e] ? tree_search_offset+0x18f/0x1b8 [btrfs]
 [a02eb745] ? btrfs_reserve_extent+0xb0/0x190 [btrfs]
 [a02ebdfc] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs]
 [a02dea3d] ? __btrfs_cow_block+0x102/0x31e [btrfs]
 [a02ebdfc] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs]
 [a02dea3d] ? __btrfs_cow_block+0x102/0x31e [btrfs]
 [a02dd400] ? btrfs_set_node_key+0x1a/0x20 [btrfs]
 [a02ded5d] ? btrfs_cow_block+0x104/0x14e [btrfs]
 [a02e1c34] ? btrfs_search_slot+0x162/0x4cb [btrfs]
 [a02e2ea3] ? btrfs_insert_empty_items+0x6a/0xba [btrfs]
 [a02e9bf3] ? run_clustered_refs+0x370/0x682 [btrfs]
 [a032d201] ? btrfs_find_ref_cluster+0xd/0x13c [btrfs]
 [a02e9fd6] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs]
 [a02f8467] ? btrfs_commit_transaction+0x38f/0x709 [btrfs]
 [8136f6e6] ? _raw_spin_lock+0xe/0x10
 [a02f79fe] ? join_transaction.clone.23+0xc1/0x200 [btrfs]
 [81068ffb] ? wake_up_bit+0x2a/0x2a
 [a02f28fd] ? transaction_kthread+0x175/0x22a [btrfs]
 [a02f2788] ? btrfs_congested_fn+0x86/0x86 [btrfs]
 [81068b2c] ? kthread+0x82/0x8a
 [81376124] ? kernel_thread_helper+0x4/0x10
 [81068aaa] ? kthread_worker_fn+0x14c/0x14c
 [81376120] ? gs_change+0x13/0x13

After a while, with no FS activity, it does calm down though.

umount has already used over 10 minutes of CPU time:
# ps -flC umount
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 R root      6045  1853 65  80   0 -  2538 -      09:46 pts/2    00:11:06 umount /backup

sysrq-t gives:

[515954.295050] umount  R  running task0  6045   1853 0x
[515954.295050]  88011131c600 0001 811cb1ee 88012c2fd598
[515954.295050]  8801456da420 1000 8800 8801456da420
[515954.295050]  88012c2fd578 a0327d96 880111bebb60 1000
[515954.295050] Call Trace:
[515954.295050]  [a032765e] ? tree_search_offset+0x18f/0x1b8 [btrfs]
[515954.295050]  [8103ce8e] ? need_resched+0x23/0x2d
[515954.295050]  [81103ccd] ? kmem_cache_alloc+0x94/0x105
[515954.295050]  [a0329ff7] ? btrfs_find_space_cluster+0xce/0x189 [btrfs]
[515954.295050]  [a02eaaa0] ? find_free_extent.clone.64+0x549/0x8c7 [btrfs]
[515954.295050]  [a032765e] ? tree_search_offset+0x18f/0x1b8 [btrfs]
[515954.295050]  [a02eb745] ? btrfs_reserve_extent+0xb0/0x190 [btrfs]
[515954.295050]  [a02ebdfc] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs]
[515954.295050]  [a02dea3d] ? __btrfs_cow_block+0x102/0x31e [btrfs]
[515954.295050]  [a02ebdfc] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs]
[515954.295050]  [a02dea3d] ? __btrfs_cow_block+0x102/0x31e [btrfs]
[515954.295050]  [a02e5312] ? lookup_inline_extent_backref+0xa5/0x328 [btrfs]
[515954.295050]  [a02e76ef] ? __btrfs_free_extent+0xc3/0x55b [btrfs]
[515954.295050]  [8110480f] ? kfree+0x72/0x7b
[515954.295050]  [a032d19d] ? btrfs_delayed_ref_lock+0x4a/0xa1 [btrfs]
[515954.295050]  [a02e9ebb] ? run_clustered_refs+0x638/0x682 [btrfs]
[515954.295050]  [a032d200] ? btrfs_find_ref_cluster+0xc/0x13c [btrfs]
[515954.295050]  [a02e9fd6] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs]
[515954.295050]  [a02f710d] ? commit_cowonly_roots+0x78/0x18f [btrfs]
[515954.295050]  [8103ce8e] ? need_resched+0x23/0x2d
[515954.295050]  [8103cea6] ? should_resched+0xe/0x2e
[515954.295050]  [a02f84d7] ? btrfs_commit_transaction+0x3ff/0x709 [btrfs]
[515954.295050]  [8136f6e6] ? _raw_spin_lock+0xe/0x10
[515954.295050]  [a02f7b07] ? join_transaction.clone.23+0x1ca/0x200 [btrfs]
[515954.295050]  

Re: high CPU usage and low perf

2011-09-27 Thread Stephane Chazelas
2011-09-27 10:15:09 +0100, Stephane Chazelas:
[...]
 a btrfs file system of mine started to behave very poorly with
 some btrfs kernel tasks taking 100% of CPU time.
 
 # btrfs fi show /dev/sdb
 Label: none  uuid: b3ce8b16-970e-4ba8-b9d2-4c7de270d0f1
 Total devices 3 FS bytes used 4.25TB
 devid2 size 2.73TB used 1.52TB path /dev/sdc
 devid1 size 2.70TB used 1.49TB path /dev/sda4
 devid3 size 2.73TB used 1.52TB path /dev/sdb
 
 Btrfs v0.19-100-g4964d65
 
 FS mounted with compress-force,noatime
 
 (Can't do a filesystem df just now, as there's a umount
 running, there should be around 33% free).
[...]

The umount just returned.

# btrfs fi df /backup
Data, RAID0: total=4.20TB, used=4.20TB
Data: total=8.00MB, used=7.97MB
System, RAID1: total=8.00MB, used=344.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=162.75GB, used=59.30GB
Metadata: total=8.00MB, used=0.00

It's now running fine again after reload of btrfs module and
remount.

-- 
Stephane


Re: kernel BUG at fs/btrfs/inode.c:4676!

2011-09-17 Thread Stephane Chazelas
2011-06-06 12:19:56 +0200, Marek Otahal:
 Hello, 
 the issue happens every time when i have to hard power-off my notebook 
 (suspend problems). 
 With kernel 2.6.39 the partition is unmountable, solution is to boot 2.6.38 
 kernel which 
 1/ is able to mount the partition, 
 2/ by doing that fixes the problem so later .39 (after clean shutdown) can 
 mount it also. 
[...]

I've just been hit by this (3.0). I dug up a 2.6.38 kernel and got
it back running just the same. Has any progress been made on this?

[39564.802905] device fsid 01b919f7-32cd-4d09-be1c-1810249001b2 devid 1 transid 21097 /dev/mapper/VG_USB_debian-root
[39565.555655] ------------[ cut here ]------------
[39565.555662] kernel BUG at /build/buildd-linux-2.6_3.0.0-3-amd64-9ClimQ/linux-2.6-3.0.0/debian/build/source_amd64_none/fs/btrfs/inode.c:4586!
[39565.555668] invalid opcode:  [#1] SMP
[39565.555672] CPU 1
[39565.555674] Modules linked in: ext2 hfsplus nls_utf8 nls_cp437 vfat fat 
ip6table_filter ip6_tables ebtable_nat ebtables vboxnetadp(O) vboxnetflt(O) 
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack ipt_REJECT vboxdrv(O) xt_CHECKSUM iptable_mangle xt_tcpudp bridge 
stp parport_pc ppdev lp parport rfcomm bnep bluetooth rfkill xt_multiport 
iptable_filter ip_tables x_tables snd_hrtimer acpi_cpufreq mperf 
cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats 
binfmt_misc fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ext3 jbd 
loop dm_crypt kvm_intel kvm uvcvideo videodev media v4l2_compat_ioctl32 
nvidia(P) snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm 
snd_seq snd_timer snd_seq_device evdev i7core_edac snd i2c_i801 edac_core 
pcspkr soundcore i2c_core asus_atk0110 snd_page_alloc button processor 
thermal_sys ext4 mbcache jbd2 crc16 btrfs zlib_deflate crc32c libcrc32c dm_mod 
raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy 
async_tx raid1 raid0 multipath linear md_mod nbd sg sd_mod sr_mod crc_t10dif 
cdrom ata_generic usb_storage usbhid hid uas pata_jmicron firewire_ohci 
firewire_core crc_itu_t ahci libahci ehci_hcd libata scsi_mod r8169 mii usbcore 
[last unloaded: scsi_wait_scan]
[39565.555806]
[39565.555810] Pid: 18729, comm: mount Tainted: P  IO 3.0.0-1-amd64 #1 System manufacturer System Product Name/P7P55D
[39565.555817] RIP: 0010:[a01fe008]  [a01fe008] btrfs_add_link+0x120/0x178 [btrfs]
[39565.555850] RSP: 0018:8801c5273858  EFLAGS: 00010282
[39565.555854] RAX: ffef RBX: 8801caf96d90 RCX: 8802124d01d8
[39565.555858] RDX: 000e RSI: 8801948da880 RDI: 0292
[39565.555862] RBP: 880162b12800 R08: 0050 R09: 000d
[39565.555866] R10: 000c R11: 00015670 R12: 8801caf7d1d8
[39565.555870] R13: 000b R14: 88017b3c0600 R15: 8801cecfb540
[39565.555875] FS:  7f1062f7b7e0() GS:88023fc2() knlGS:
[39565.555879] CS:  0010 DS:  ES:  CR0: 80050033
[39565.555883] CR2: 7f84b7689000 CR3: 00017b112000 CR4: 06e0
[39565.555887] DR0:  DR1:  DR2: 
[39565.555891] DR3:  DR6: 0ff0 DR7: 0400
[39565.555896] Process mount (pid: 18729, threadinfo 8801c5272000, task 880236e947f0)
[39565.555899] Stack:
[39565.555901]  0001 01c9 8801 000592ad
[39565.555908]  000592ad 0001 880203d96e00 000c
[39565.555915]  1000 8801cec8b7f0 8801c52739e8 8801caf7d1d8
[39565.555922] Call Trace:
[39565.555948]  [a021ddf5] ? add_inode_ref+0x2f3/0x385 [btrfs]
[39565.555974]  [a0220063] ? replay_one_buffer+0x181/0x1fb [btrfs]
[39565.556000]  [a0210c4e] ? alloc_extent_buffer+0x6f/0x295 [btrfs]
[39565.556025]  [a021f7e8] ? walk_down_log_tree+0x153/0x29c [btrfs]
[39565.556050]  [a021f9b2] ? walk_log_tree+0x81/0x196 [btrfs]
[39565.556074]  [a01f0b4e] ? btrfs_read_fs_root_no_radix+0x166/0x1a5 [btrfs]
[39565.556099]  [a0221177] ? btrfs_recover_log_trees+0x192/0x297 [btrfs]
[39565.556125]  [a021fee2] ? replay_one_dir_item+0xb3/0xb3 [btrfs]
[39565.556148]  [a01ef722] ? btree_read_extent_buffer_pages.clone.63+0x6f/0xb2 [btrfs]
[39565.556173]  [a01f2f47] ? open_ctree+0x10f5/0x140e [btrfs]
[39565.556180]  [811aa488] ? string.clone.2+0x39/0x9f
[39565.556187]  [810fdc92] ? sget+0x363/0x381
[39565.556207]  [a01d9743] ? btrfs_mount+0x228/0x470 [btrfs]
[39565.556213]  [810cdcd6] ? pcpu_next_pop+0x37/0x45
[39565.556219]  [810cda22] ? cpumask_next+0x18/0x1d
[39565.556224]  [810ceb4c] ? pcpu_alloc+0x7b4/0x7cc
[39565.556232]  [810fe52b] ? mount_fs+0x67/0x150
[39565.556241]  [8c4c] ? vfs_kern_mount+0x58/0x97
[39565.556249]  

Re: btrfs hung tasks

2011-07-28 Thread Stephane Chazelas
2011-07-28 07:23:43 +0100, Stephane Chazelas:
 Hiya, I got below those last night. That was 3 minutes after a
 bunch of rsync and ntfsclone processes started.
 
 It's the first time it happens. I upgraded from 3.0rc6 to 3.0
 yesterday.
[...]

And again this morning, though at that point only one ntfsclone
process was actively writing to the FS.

At this point, I can read directories and stat(2) files on that
FS, but reading or writing files hangs.

I'll try and revert to 3.0rc6 to see if that makes a difference.
call traces for some processes trying to read from the FS:

cat D 8801424ee240 0  3478  1 0x0005
 8801424ee240 0086 8801080497e8 8800494322e0
 8801461908b0 00012800 880108049fd8 880108049fd8
 00012800 8801424ee240 00012800 00012800
Call Trace:
 [813367ec] ? _raw_spin_lock_irqsave+0x9/0x25
 [a02d989c] ? btrfs_tree_lock+0x9a/0xa7 [btrfs]
 [a02d9754] ? btrfs_spin_on_block+0x49/0x49 [btrfs]
 [a0297edb] ? btrfs_set_path_blocking+0x21/0x32 [btrfs]
 [a029ba81] ? btrfs_search_slot+0x3c6/0x4d6 [btrfs]
 [a02a923a] ? btrfs_lookup_csum+0x65/0x105 [btrfs]
 [a02c9bb2] ? btrfs_lookup_ordered_extent+0x2b/0x69 [btrfs]
 [a02ca1fa] ? btrfs_find_ordered_sum+0x34/0xcc [btrfs]
 [a02a9449] ? __btrfs_lookup_bio_sums+0x16f/0x2ed [btrfs]
 [a02e3bb8] ? btrfs_submit_compressed_read+0x3b7/0x42e [btrfs]
 [a02ca73b] ? submit_one_bio+0x85/0xbc [btrfs]
 [a02cc880] ? submit_extent_page.clone.16+0x118/0x1b9 [btrfs]
 [a02cc290] ? check_page_uptodate+0x36/0x36 [btrfs]
 [a02ccda4] ? __extent_read_full_page+0x463/0x4cc [btrfs]
 [a02cc290] ? check_page_uptodate+0x36/0x36 [btrfs]
 [a02b4e09] ? uncompress_inline.clone.32+0x117/0x117 [btrfs]
 [a02cd92b] ? extent_readpages+0xb1/0xf6 [btrfs]
 [a02b4e09] ? uncompress_inline.clone.32+0x117/0x117 [btrfs]
 [810be21b] ? __do_page_cache_readahead+0x124/0x1c8
 [810be526] ? ra_submit+0x1c/0x23
 [810b6e9e] ? generic_file_aio_read+0x2a7/0x5c7
 [810fb5f1] ? do_sync_read+0xb1/0xea
 [81336815] ? _raw_spin_lock_irq+0xd/0x1a
 [810fbc10] ? vfs_read+0x9f/0xf2
 [81012599] ? syscall_trace_enter+0xb5/0x15d
 [810fbca8] ? sys_read+0x45/0x6b
 [8133bca7] ? tracesys+0xd9/0xde


wc  D 8801424ef710 0  3495  1 0x0005
 8801424ef710 0086 811ab802 88014951f5c0
 8160b020 00012800 880109617fd8 880109617fd8
 00012800 8801424ef710 00012800 00012800
Call Trace:
 [811ab802] ? delay_tsc+0x2b/0x68
 [813367ec] ? _raw_spin_lock_irqsave+0x9/0x25
 [a02d989c] ? btrfs_tree_lock+0x9a/0xa7 [btrfs]
 [a02d9754] ? btrfs_spin_on_block+0x49/0x49 [btrfs]
 [a02ce989] ? map_private_extent_buffer+0xa3/0xc4 [btrfs]
 [a029816d] ? btrfs_lock_root_node+0x1d/0x3f [btrfs]
 [a029b7a1] ? btrfs_search_slot+0xe6/0x4d6 [btrfs]
 [a02ac5a1] ? btrfs_header_generation.clone.17+0xf/0x14 [btrfs]
 [a02a923a] ? btrfs_lookup_csum+0x65/0x105 [btrfs]
 [a02c9bb2] ? btrfs_lookup_ordered_extent+0x2b/0x69 [btrfs]
 [a02ca1fa] ? btrfs_find_ordered_sum+0x34/0xcc [btrfs]
 [a02a9449] ? __btrfs_lookup_bio_sums+0x16f/0x2ed [btrfs]
 [a02b3226] ? btrfs_submit_bio_hook+0xa4/0x129 [btrfs]
 [a02ca73b] ? submit_one_bio+0x85/0xbc [btrfs]
 [a02cc880] ? submit_extent_page.clone.16+0x118/0x1b9 [btrfs]
 [a02cc290] ? check_page_uptodate+0x36/0x36 [btrfs]
 [a02ccda4] ? __extent_read_full_page+0x463/0x4cc [btrfs]
 [a02cc290] ? check_page_uptodate+0x36/0x36 [btrfs]
 [a02b4e09] ? uncompress_inline.clone.32+0x117/0x117 [btrfs]
 [a02cd92b] ? extent_readpages+0xb1/0xf6 [btrfs]
 [a02b4e09] ? uncompress_inline.clone.32+0x117/0x117 [btrfs]
 [810be21b] ? __do_page_cache_readahead+0x124/0x1c8
 [810be526] ? ra_submit+0x1c/0x23
 [810b6e62] ? generic_file_aio_read+0x26b/0x5c7
 [810fb5f1] ? do_sync_read+0xb1/0xea
 [81336815] ? _raw_spin_lock_irq+0xd/0x1a
 [810fbc10] ? vfs_read+0x9f/0xf2
 [81012599] ? syscall_trace_enter+0xb5/0x15d
 [810fbca8] ? sys_read+0x45/0x6b
 [8133bca7] ? tracesys+0xd9/0xde


tailD 88014651b750 0  3442   1844 0x0004
 88014651b750 0082 880147b06508 
 8801495660c0 00012800 88010e05dfd8 88010e05dfd8
 00012800 88014651b750 00012800 00012800
Call Trace:
 [81335e18] ? __mutex_lock_common.clone.5+0x114/0x179
 [81335cf1] ? mutex_lock+0x1a/0x2d
 [810fb6d8] ? generic_file_llseek+0x21/0x52
 [810fb745] ? sys_lseek+0x3c/0x59
 [8133ba92] ? system_call_fastpath+0x16/0x1b


rm  D

BUG() in btrfs-fixup (Was: btrfs invalid opcode)

2011-07-27 Thread Stephane Chazelas
2011-07-25 17:38:10 +0100, Jeremy Sanders:
 I'm afraid this is a rather old kernel, 2.6.35.13-92.fc14.x86_64, but this 
 error looks rather similar to 
 http://www.spinics.net/lists/linux-btrfs/msg11053.html
 
 Has this been fixed? I was simultaneously doing rsyncs into different 
 subvolumes (one reading and one writing).
[...]
 [454244.123523] kernel BUG at fs/btrfs/inode.c:1528!
[...]
 [454244.124338] Pid: 3158, comm: btrfs-fixup-0 Not tainted 
 2.6.35.13-92.fc14.x86_64 #1 C51MCP51/C51GM03
 [454244.124338] RIP: 0010:[a048ed89]  [a048ed89] 
 btrfs_writepage_fixup_worker+0xde/0x118 [btrfs]
[...]

Hi Jeremy,

glad I'm not the only one with that issue. That may renew the
interest in it...

I don't think much progress has been made on it.

We could compare our experience to see what contributes to its
occurrence.

It occurs (quite reproducibly) for me when rsyncing from a
multi-device multi-subvolume btrfs fs (mounted with
compress-force) onto a single device, no subvolume btrfs fs also
mounted with compress-force. It also happens when the target is
mounted with compress instead of compress-force but not if I
leave out compress.

I only get one occurrence of those BUG()s until I reboot.

After the occurrence of that BUG(), I saw a number of
misbehaviors that may or may not be linked to it:

- btrfs eating all memory (mostly in the btrfs_inode_cache slab),
  resulting in a crash. That doesn't happen anymore since I'm
  mounting with noatime and use CONFIG_SLUB (though I suspect
  it's noatime alone that did the trick)

- occasionally, 20 to 95% of write(2) system calls to files on
  the source FS take 4 seconds, making it hardly usable. I also
  notice a flush-btrfs-1 stuck in D state

How does that compare with your experience?

-- 
Stephane


Re: write(2) taking 4s

2011-07-19 Thread Stephane Chazelas
2011-07-18 20:37:25 +0100, Stephane Chazelas:
 2011-07-18 11:39:12 +0100, Stephane Chazelas:
  2011-07-17 10:17:37 +0100, Stephane Chazelas:
   2011-07-16 13:12:10 +0100, Stephane Chazelas:
Still on my btrfs-based backup system. I still see one BUG()
reached in btrfs-fixup per boot time, no memory exhaustion
anymore. There is now however something new: write performance
is down to a few bytes per second.
   [...]
   
   The condition that was causing that seems to have cleared by
   itself this morning before 4am.
   
   flush-btrfs-1 and sync are still in D state.
   
   Can't really tell what cleared it. Could be when the first of
   the rsyncs ended as all the other ones (and ntfsclones from nbd
   devices) ended soon after
  [...]
  
  New nightly backup, and it's happening again. Started about 40
  minutes after the start of the backup.
 [...]
  Actively  running at the moment are 1 rsync and 3 ntfsclone.
 [...]
 
 And then again today.
 
 Interestingly, I killall -STOPed all the ntfsclone and rsync
 processes and:
[...]
 Now 95% of the write(2)s take 4 seconds (while it was about 15%
 before I stopped the processes).
[...]

And this morning, after killing everything so that nothing was
writing to the FS anymore, 95% of write(2)s were delayed as well
(according to strace -Te write yes > file-on-btrfs).

Then I rebooted (sysrq-b) and am trying btrfsck (from
integration-20110705) on it, but btrfsck is using 8G of memory
on a system that has only 5G so it's swapping in and out
constantly and getting nowhere (and renders the system hardly usable)

I found
http://thread.gmane.org/gmane.comp.file-systems.btrfs/5716/focus=5728
from last year. Is that still the case?

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1950 root      20   0 7684m 4.4g  232 R    4 91.1   4:22.87 btrfsck
(and still growing)

 vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  2 3232016 115232   4524   3520  698  708  3305   716  991  570  3  1 56 39
 0  2 3231816 111536   5976   3428 2964  532  4912   532 1569  683  1  0 46 53
 0  2 3231144 105832   8144   3536 3140   24  5324    24 1612  392  1  1 38 60
 0  2 3231532 104964   8180   3684 2672  900  2708   900 1017  324  1  1 34 64

-- 
Stephane


Re: write(2) taking 4s

2011-07-18 Thread Stephane Chazelas
2011-07-17 10:17:37 +0100, Stephane Chazelas:
 2011-07-16 13:12:10 +0100, Stephane Chazelas:
  Still on my btrfs-based backup system. I still see one BUG()
  reached in btrfs-fixup per boot time, no memory exhaustion
  anymore. There is now however something new: write performance
  is down to a few bytes per second.
 [...]
 
 The condition that was causing that seems to have cleared by
 itself this morning before 4am.
 
 flush-btrfs-1 and sync are still in D state.
 
 Can't really tell what cleared it. Could be when the first of
 the rsyncs ended as all the other ones (and ntfsclones from nbd
 devices) ended soon after
[...]

New nightly backup, and it's happening again. Started about 40
minutes after the start of the backup.

----system---- -net/total- ---procs--- --dsk/sda-----dsk/sdb-----dsk/sdc--
     time     | recv  send|run blk new| read  writ: read  writ: read  writ
17-07 20:19:18|   0 0 |0.0 0.0 0.0| 142k   31k: 119k   36k: 120k   33k
17-07 20:19:48|8087k  224k|1.2 5.3 0.1|2976k   98k: 793k  400k:2856k  375k
17-07 20:20:18|5174k  134k|0.8 4.6 0.9| 880k  179k: 830k  916k:1801k  825k
17-07 20:20:48|6634k  148k|1.3 4.9 0.2| 609k  101k:1259k   96k:2628k   98k
17-07 20:21:18|6725k  165k|0.7 5.8 0.0| 237k  442k: 975k  723k:1870k  644k
17-07 20:21:48|7100k  153k|0.7 5.4   0| 305k   83k:1124k  314k:2155k  274k
17-07 20:22:18|4440k  178k|0.5 5.3 0.0| 296k 1775B:2094k  240k:1663k  239k
17-07 20:22:48|8181k  220k|0.9 5.8   0| 360k  410B:1579k  196k:2065k  196k
17-07 20:23:18|8144k  228k|1.3 5.6   0| 348k   54k:1781k  216k:2213k  164k
17-07 20:23:48|5506k  185k|0.8 5.2 0.1| 307k    0 :2040k    0 :2166k    0
17-07 20:24:18|6260k  206k|1.0 5.4 0.1| 474k   78k:2034k  285k:2218k  207k
17-07 20:24:48|8420k  314k|1.5 5.4   0| 313k  363k:2367k  391k:2182k  124k
17-07 20:25:18|8367k  247k|0.9 5.1 0.2| 475k   77k:1797k   75k:2220k  410B
17-07 20:25:48|7511k  179k|1.0 4.7   0| 406k 7646B:1596k  145k:2397k  147k
17-07 20:26:18|7930k  162k|0.7 5.1   0| 991k  410B:1468k   26k:2186k   26k
17-07 20:26:48|7757k  176k|1.0 5.3   0|1884k   26k:1147k   58k:2761k   32k
[...]
17-07 20:57:18|6917k  120k|0.3 4.1   0|  56k  410B:  65k 4506B: 213k 4506B
17-07 20:57:48|5698k  103k|0.1 4.0   0|   0   410B:  27k 6007B: 590k 6007B
17-07 20:58:18|6582k  117k|0.2 4.0   0| 229k   20k: 195k  956B: 290k   21k
17-07 20:58:48|6048k  110k|0.6 4.0 0.1|  32k   21k:  81k  410B: 331k   21k
17-07 20:59:18|8057k  138k|0.6 4.1   0|  42k 5871B:  33k  410B:  35k 5871B
17-07 20:59:48|7369k  145k|0.5 4.1   0|  59k 3959B: 230k  410B: 532k 3959B
17-07 21:00:18|8189k  140k|0.7 4.0   0|  53k 6007B:  58k  410B:  40k 6007B
17-07 21:00:48|7596k  137k|0.3 4.2   0|  24k 6690B: 250k  410B:  15k 5734B
17-07 21:01:18|8448k  145k|0.7 4.2   0|  24k 1365B: 325k 6827B:  15k 7646B
17-07 21:01:48|6821k  119k|0.3 4.0   0|  17k  410B: 175k 3004B:  11k 3004B
17-07 21:02:18|3614k   66k|0.7 2.7   0|  39k  410B: 538k 4779B:  45k 4779B
17-07 21:02:48| 417k   14k|0.5 1.3 0.3| 106k 1638B: 209k 4779B:   0  4779B
17-07 21:03:18| 353k 7979B|0.8 1.2   0|   0  1229B: 449k 2867B:   0  2867B
17-07 21:03:48| 327k 8981B|1.1 1.2   0|   0   410B: 686k 4506B:  43k 4506B
[...]
18-07 11:02:48| 243k 4866B|0.0 1.2 0.1|   0  2458B:   0  3550B:   0  3550B
18-07 11:03:18| 274k 5506B|0.1 1.2 0.1|   0  1775B:   0  3550B:   0  3550B
18-07 11:03:48| 238k 4851B|0.1 1.2 0.0|   0  4369B:   0  3550B:   0  3550B
18-07 11:04:18| 243k 4999B|0.1 1.1 0.1|   0  4506B:   0  3550B:   0  3550B
18-07 11:04:48| 288k 6488B|0.1 1.1 0.4|   0  2458B:   0  3550B:   0  3550B


Because that's after the week-end, there's not much to write.
What's holding 3 of the backups is actually writing log data
like xx% Completed.

Actively  running at the moment are 1 rsync and 3 ntfsclone.

# strace -tt -s 2 -Te write -p 8771 -p 8567 -p 8856 -p 8403
Process 8771 attached - interrupt to quit
Process 8567 attached - interrupt to quit
Process 8856 attached - interrupt to quit
Process 8403 attached - interrupt to quit
[pid  8403] 11:12:26.539830 write(4, es..., 1024 unfinished ...
[pid  8771] 11:12:26.540417 write(4, hb..., 4096 unfinished ...
[pid  8567] 11:12:26.555211 write(1,  3..., 25 unfinished ...
[pid  8856] 11:12:26.593232 write(1,  6..., 25 unfinished ...
[pid  8403] 11:12:30.635257 ... write resumed ) = 1024 4.095271
[pid  8403] 11:12:30.635309 write(4, 19..., 112 unfinished ...
[pid  8567] 11:12:30.635364 ... write resumed ) = 25 4.080091
[pid  8856] 11:12:30.635553 ... write resumed ) = 25 4.042268
[pid  8771] 11:12:30.635799 ... write resumed ) = 4096 4.095350
[pid  8771] 11:12:30.636182 write(4, hb..., 4096 unfinished ...
[pid  8567] 11:12:30.649904 write(1,  3..., 25 unfinished ...
[pid  8403] 11:12:30.651452 ... write resumed ) = 112 0.015921
[pid  8567] 11:12:30.651595 ... write resumed ) = 25 0.001640
[pid  8403] 11:12:30.651787 write(4, @d..., 1024 unfinished ...
[pid  8771] 11:12:30.651865 ... write resumed ) = 4096 0.015638
[pid  8771] 11:12:30.652281 write(4, hb..., 4096 unfinished ...
[pid  8856] 11:12:30.657579

Re: write(2) taking 4s

2011-07-18 Thread Stephane Chazelas
2011-07-18 11:39:12 +0100, Stephane Chazelas:
 2011-07-17 10:17:37 +0100, Stephane Chazelas:
  2011-07-16 13:12:10 +0100, Stephane Chazelas:
   Still on my btrfs-based backup system. I still see one BUG()
   reached in btrfs-fixup per boot time, no memory exhaustion
   anymore. There is now however something new: write performance
   is down to a few bytes per second.
  [...]
  
  The condition that was causing that seems to have cleared by
  itself this morning before 4am.
  
  flush-btrfs-1 and sync are still in D state.
  
  Can't really tell what cleared it. Could be when the first of
  the rsyncs ended as all the other ones (and ntfsclones from nbd
  devices) ended soon after
 [...]
 
 New nightly backup, and it's happening again. Started about 40
 minutes after the start of the backup.
[...]
 Actively  running at the moment are 1 rsync and 3 ntfsclone.
[...]

And then again today.

Interestingly, I killall -STOPed all the ntfsclone and rsync
processes and:

# strace -tt -Te write yes > a-file-on-the-btrfs-fs
20:23:26.635848 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095223>
20:23:30.731391 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095769>
20:23:34.827390 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095788>
20:23:38.923388 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095771>

Now 95% of the write(2)s take 4 seconds (while it was about 15%
before I stopped the processes).
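For what it's worth, percentages like these can be computed from saved strace -T output with something like the sketch below. The function name and the 1-second threshold are arbitrary choices of mine; it accepts the elapsed time at the end of each line with or without the surrounding angle brackets, and ignores lines without a numeric return value and a duration (such as "unfinished" lines).

```python
import re
from typing import Iterable

# Matches the tail of a completed call: "= <retval> <elapsed>" where the
# elapsed time may or may not be wrapped in angle brackets.
_DURATION = re.compile(r"=\s*-?\d+\s+<?(\d+\.\d+)>?\s*$")


def slow_fraction(strace_lines: Iterable[str], threshold: float = 1.0) -> float:
    """Fraction of completed calls whose elapsed time is >= threshold seconds."""
    durations = []
    for line in strace_lines:
        match = _DURATION.search(line)
        if match:
            durations.append(float(match.group(1)))
    if not durations:
        return 0.0
    slow = sum(1 for d in durations if d >= threshold)
    return slow / len(durations)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python slow.py < strace.log
    print(f"{slow_fraction(sys.stdin) * 100:.0f}% of calls were slow")
```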

[304257.760119] yes S 88001e8e3780 0 13179  13178 0x0001
[304257.760119]  88001e8e3780 0086  8160b020
[304257.760119]  000127c0 880074543fd8 880074543fd8 000127c0
[304257.760119]  88001e8e3780 880074542010 0286 00010286
[304257.760119] Call Trace:
[304257.760119]  [81339912] ? schedule_timeout+0xa0/0xd7
[304257.760119]  [8105238c] ? lock_timer_base+0x49/0x49
[304257.760119]  [a0162e7d] ? shrink_delalloc+0x100/0x14e [btrfs]
[304257.760119]  [a0163d66] ? btrfs_delalloc_reserve_metadata+0xf9/0x10b [btrfs]
[304257.760119]  [a0167aa8] ? btrfs_delalloc_reserve_space+0x20/0x3e [btrfs]
[304257.760119]  [a0182540] ? __btrfs_buffered_write+0x137/0x2dc [btrfs]
[304257.760119]  [a017ad0e] ? btrfs_dirty_inode+0x119/0x139 [btrfs]
[304257.760119]  [a0182a7a] ? btrfs_file_aio_write+0x395/0x42b [btrfs]
[304257.760119]  [8100866d] ? __switch_to+0x19c/0x288
[304257.760119]  [810fee6d] ? do_sync_write+0xb1/0xea
[304257.760119]  [81056522] ? ptrace_notify+0x7f/0x9d
[304257.760119]  [811691ce] ? security_file_permission+0x18/0x2d
[304257.760119]  [810ff7c4] ? vfs_write+0xa4/0xff
[304257.760119]  [810116a7] ? syscall_trace_enter+0xb6/0x15b
[304257.760119]  [810ff8d5] ? sys_write+0x45/0x6e
[304257.760119]  [81340027] ? tracesys+0xd9/0xde

After killall -CONT, it's back to 15% write(2)s delayed.

What's going on?

-- 
Stephane


Re: write(2) taking 4s. (Was: Memory leak?)

2011-07-17 Thread Stephane Chazelas
2011-07-16 13:12:10 +0100, Stephane Chazelas:
 Still on my btrfs-based backup system. I still see one BUG()
 reached in btrfs-fixup per boot time, no memory exhaustion
 anymore. There is now however something new: write performance
 is down to a few bytes per second.
[...]

The condition that was causing that seems to have cleared by
itself this morning before 4am.

flush-btrfs-1 and sync are still in D state.

Can't really tell what cleared it. Could be when the first of
the rsyncs ended as all the other ones (and ntfsclones from nbd
devices) ended soon after

Cheers,
Stephane



write(2) taking 4s. (Was: Memory leak?)

2011-07-16 Thread Stephane Chazelas
Still on my btrfs-based backup system. I still see one BUG()
reached in btrfs-fixup per boot time, no memory exhaustion
anymore. There is now however something new: write performance
is down to a few bytes per second.

I've got a few processes (rsync, patched ntfsclone, shells
mostly) writing to files at the same time on this server.

disk stats per second:

--dsk/sda-----dsk/sdb-----dsk/sdc--
 read  writ: read  writ: read  writ
 264k   44k: 193k   44k: 225k   42k
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0    60k:   0     0 :   0     0
   0    12k:   0  1176k:   0  1164k
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0    40k:   0     0 :8192B    0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :4096B    0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
 324k    0 : 248k    0 : 548k    0
   0  4096B:   0     0 :   0     0
 352k    0 :   0     0 :   0     0
   0     0 :   0     0 :   0     0
   0     0 :   0     0 :4096B    0
   0     0 :   0     0 :   0     0
   0    80k:   0     0 :   0     0
rsync:

# strace -Ts0 -p 5015
Process 5015 attached - interrupt to quit
write(3, ""..., 1024)   = 1024 <0.007700>
write(3, ""..., 1024)   = 1024 <0.015822>
write(3, ""..., 1024)   = 1024 <0.031853>
write(3, ""..., 1024)   = 1024 <0.015881>
write(3, ""..., 1024)   = 1024 <0.015911>
write(3, ""..., 1024)   = 1024 <0.015796>
write(3, ""..., 1024)   = 1024 <0.031946>
write(3, ""..., 1024)   = 1024 <4.083854>
write(3, ""..., 1024)   = 1024 <0.007925>
write(3, ""..., 1024)   = 1024 <0.003776>
write(3, ""..., 1024)   = 1024 <0.031862>
write(3, ""..., 1024)   = 1024 <0.011807>
write(3, ""..., 1024)   = 1024 <0.019742>
write(3, ""..., 1024)   = 1024 <0.015857>
write(3, ""..., 1024)   = 1024 <0.031833>
write(3, ""..., 1024)   = 1024 <0.015789>
write(3, ""..., 1024)   = 1024 <0.015926>
write(3, ""..., 1024)   = 1024 <4.095967>
write(3, ""..., 1024)   = 1024 <0.019798>
write(3, ""..., 1024)   = 1024 <4.083682>
write(3, ""..., 1024)   = 1024 <0.015398>
write(3, ""..., 1024)   = 1024 <0.015951>
write(3, ""..., 1024)   = 1024 <0.035837>
write(3, ""..., 1024)   = 1024 <0.015962>
write(3, ""..., 1024)   = 1024 <0.015909>
write(3, ""..., 1024)   = 1024 <0.015967>
write(3, ""..., 48) = 48 <0.003782>
write(3, ""..., 1024)   = 1024 <0.031802>
write(3, ""..., 1024)   = 1024 <0.015811>
write(3, ""..., 1024)   = 1024 <0.015944>
write(3, ""..., 1024)   = 1024 <0.019810>
write(3, ""..., 1024)   = 1024 <0.031948>

ntfsclone (patched to only write modified clusters):

# strace -Te write -p 4717
Process 4717 attached - interrupt to quit
write(1, " 65.16 percent completed\r", 25) = 25 <0.008996>
write(1, " 65.16 percent completed\r", 25) = 25 <0.743358>
write(1, " 65.16 percent completed\r", 25) = 25 <0.306582>
write(1, " 65.17 percent completed\r", 25) = 25 <4.082723>
write(1, " 65.17 percent completed\r", 25) = 25 <0.006402>
write(1, " 65.17 percent completed\r", 25) = 25 <0.012582>
write(1, " 65.17 percent completed\r", 25) = 25 <4.052504>
write(1, " 65.17 percent completed\r", 25) = 25 <0.012111>
write(1, " 65.17 percent completed\r", 25) = 25 <0.016001>
write(1, " 65.17 percent completed\r", 25) = 25 <4.028017>
write(1, " 65.18 percent completed\r", 25) = 25 <0.013365>
write(1, " 65.18 percent completed\r", 25) = 25 <0.003963>
(that's writing to a log file)

See how many write(2)s take 4 seconds.
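
To put a number on that, the strace -T output can be piped through a small awk filter. This is a rough sketch, not something I ran on the server: the function name slow_writes and the 1 second default threshold are illustrative choices, and it assumes the elapsed time is the last field of each line (with or without the angle brackets strace normally prints).

```shell
#!/bin/sh
# slow_writes (hypothetical helper): read `strace -T` output on stdin and
# report how many write(2) calls took at least LIMIT seconds (default 1).
slow_writes() {
    awk -v limit="${1:-1}" '
        /^write\(/ {
            t = $NF                 # elapsed time is the last field
            gsub(/[<>]/, "", t)     # strace prints it as <seconds>
            total++
            if (t + 0 >= limit) slow++
        }
        END {
            printf "%d of %d write(2) calls took >= %ss\n", slow, total, limit
        }'
}

# Typical use (strace writes its trace to stderr):
#   strace -T -e trace=write -p PID 2>&1 | slow_writes 1
```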

No issue when writing to an ext4 FS.

SMART status on all drives OK.

What else could I look at?

Attached is a sysrq-t output.

-- 
Stephane


sysrq-t.txt.xz
Description: Binary data


Re: write(2) taking 4s. (Was: Memory leak?)

2011-07-16 Thread Stephane Chazelas
2011-07-16 13:12:10 +0100, Stephane Chazelas:
[...]
 ntfsclone (patched to only write modified clusters):
 
 # strace -Te write -p 4717
 Process 4717 attached - interrupt to quit
 write(1, " 65.16 percent completed\r", 25) = 25 <0.008996>
 write(1, " 65.16 percent completed\r", 25) = 25 <0.743358>
 write(1, " 65.16 percent completed\r", 25) = 25 <0.306582>
 write(1, " 65.17 percent completed\r", 25) = 25 <4.082723>
 write(1, " 65.17 percent completed\r", 25) = 25 <0.006402>
 write(1, " 65.17 percent completed\r", 25) = 25 <0.012582>
 write(1, " 65.17 percent completed\r", 25) = 25 <4.052504>
 write(1, " 65.17 percent completed\r", 25) = 25 <0.012111>
 write(1, " 65.17 percent completed\r", 25) = 25 <0.016001>
 write(1, " 65.17 percent completed\r", 25) = 25 <4.028017>
 write(1, " 65.18 percent completed\r", 25) = 25 <0.013365>
 write(1, " 65.18 percent completed\r", 25) = 25 <0.003963>
 (that's writing to a log file)
 
 See how many write(2)s take 4 seconds.
[...]

top - 17:14:18 up 1 day,  9:20,  3 users,  load average: 1.00, 1.06, 1.11
Tasks: 146 total,   1 running, 145 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 25.0%id, 75.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   5094800k total,  4616412k used,   478388k free,  1420192k buffers
Swap:  4194300k total, 8720k used,  4185580k free,  2266240k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 2156 root      20   0     0    0    0 D    0  0.0   0:00.02 0 flush-btrfs-1
 6206 root      20   0 19112 1284  916 R    0  0.0   0:00.09 0 top
    1 root      20   0  8400  220  196 S    0  0.0   0:02.26 0 init
(all the other processes sleeping)

I suspect load 1 is for that flush-btrfs-1 in

[86372.445554] flush-btrfs-1   D 88014438da30 0  2156  2 0x
[86372.445554]  88014438da30 0046 8100e366 
88014438a9a0
[86372.445554]  000127c0 880021501fd8 880021501fd8 
000127c0
[86372.445554]  88014438da30 880021500010 8100e366 
81066ec6
[86372.445554] Call Trace:
[86372.445554]  [8100e366] ? read_tsc+0x5/0x16
[86372.445554]  [8100e366] ? read_tsc+0x5/0x16
[86372.445554]  [81066ec6] ? timekeeping_get_ns+0xd/0x2a
[86372.445554]  [810b607c] ? __lock_page+0x63/0x63
[86372.445554]  [81339473] ? io_schedule+0x84/0xc3
[86372.445554]  [811aa0aa] ? radix_tree_gang_lookup_tag_slot+0x7a/0x9f
[86372.445554]  [810b6085] ? sleep_on_page+0x9/0xd
[86372.445554]  [813399d5] ? __wait_on_bit_lock+0x3c/0x85
[86372.445554]  [810b6076] ? __lock_page+0x5d/0x63
[86372.445554]  [8105fff7] ? autoremove_wake_function+0x2a/0x2a
[86372.445554]  [a0192784] ? T.1090+0xba/0x234 [btrfs]
[86372.445554]  [a01929ee] ? extent_writepages+0x40/0x56 [btrfs]
[86372.445554]  [a017e5f0] ? btrfs_submit_direct+0x403/0x403 [btrfs]
[86372.445554]  [8111b29c] ? writeback_single_inode+0xb8/0x1b8
[86372.445554]  [8111b5af] ? writeback_sb_inodes+0xc2/0x13b
[86372.445554]  [8111b96e] ? writeback_inodes_wb+0xfd/0x10f
[86372.445554]  [8111c088] ? wb_writeback+0x213/0x330
[86372.445554]  [81052368] ? lock_timer_base+0x25/0x49
[86372.445554]  [8111c312] ? wb_do_writeback+0x16d/0x1fc
[86372.445554]  [81052824] ? del_timer_sync+0x34/0x3e
[86372.445554]  [8111c464] ? bdi_writeback_thread+0xc3/0x1ff
[86372.445554]  [8111c3a1] ? wb_do_writeback+0x1fc/0x1fc
[86372.445554]  [8111c3a1] ? wb_do_writeback+0x1fc/0x1fc
[86372.445554]  [8105fb91] ? kthread+0x7a/0x82
[86372.445554]  [81340f24] ? kernel_thread_helper+0x4/0x10
[86372.445554]  [8105fb17] ? kthread_worker_fn+0x147/0x147
[86372.445554]  [81340f20] ? gs_change+0x13/0x13

Also, if I run sync(1), it seems to never return.

[120348.788021] syncD 88011b3e1bc0 0  6215   1789 0x
[120348.788021]  88011b3e1bc0 0082  
8160b020
[120348.788021]  000127c0 8800b0f25fd8 8800b0f25fd8 
000127c0
[120348.788021]  88011b3e1bc0 8800b0f24010 88011b3e1bc0 
00014fc127c0
[120348.788021] Call Trace:
[120348.788021]  [8133989f] ? schedule_timeout+0x2d/0xd7
[120348.788021]  [8111f2fa] ? __sync_filesystem+0x74/0x74
[120348.788021]  [81339714] ? wait_for_common+0xd1/0x14e
[120348.788021]  [810427f2] ? try_to_wake_up+0x18c/0x18c
[120348.788021]  [8111f2fa] ? __sync_filesystem+0x74/0x74
[120348.788021]  [8111f2fa] ? __sync_filesystem+0x74/0x74
[120348.788021]  [8111bab6] ? writeback_inodes_sb_nr+0x72/0x78
[120348.788021]  [8111f2d4] ? __sync_filesystem+0x4e/0x74
[120348.788021]  [81100d0d] ? iterate_supers+0x5e/0xab
[120348.788021]  [8111f337] ? sys_sync+0x28/0x53
[120348.788021]  [8133fe12] ? system_call_fastpath+0x16/0x1b

Re: Memory leak?

2011-07-12 Thread Stephane Chazelas
2011-07-11 10:01:21 +0100, Stephane Chazelas:
[...]
 Same without dmcrypt. So to sum up, BUG() reached in btrfs-fixup
 thread when doing an 
 
 - rsync (though I also got (back when on ubuntu and 2.6.38) at
   least one occurrence using bsdtar | bsdtar)
 - of a large amount of data (with a large number of files),
   though the bug occurs quite early probably after having
   transferred about 50-100GB
 - the source FS being btrfs with compress-force on 3 devices
   (one of which slightly shorter than the others) and a lot of
   subvolumes and snapshots (I'm now copying from read-only
   snapshots but that happened with RW ones as well).
 - to a newly created btrfs fs
 - on one device (/dev/sdd or dmcrypt)
 - mounted with compress or compress-force.
 
 - noatime on either source or dest doesn't make a difference
   (wrt the occurrence of fixup BUG())
 - can't reproduce it when dest is not mounted with compress
 - beside that BUG(),
 - kernel memory is being used up (mostly in
   btrfs_inode_cache) and can't be reclaimed (leading to crash
   with oom killing everybody)
 - the target FS can be unmounted but that does not reclaim
   memory. However the *source* FS (that is not the one we tried
   with and without compress) cannot be unmounted (umount hangs,
   see another email for its stack trace).
 - Only way to get out of there is reboot with sysrq-b
 - happens with 2.6.38, 2.6.39, 3.0.0rc6
 - CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
   CONFIG_DEBUG_SLAB_LEAK, slub_debug don't tell us anything
   useful (there's more info in /proc/slabinfo when
   CONFIG_SLAB_DEBUG is on, see below)
 - happens with CONFIG_SLUB as well.
[...]

I don't know which of CONFIG_SLUB or noatime made the
difference, but in that setup with both enabled, I do get the
BUG(), but the system memory is not exhausted even after rsync
goes over the section with millions of files where it used to
cause the oom crash.

The only issue remaining then is that I can't umount the source
FS (which causes reboot issues). We could still have 2 or 3
different issues here for all we know.

The situation is a lot more comfortable for me now though.

-- 
Stephane


Re: feature request: btrfs-image without zeroing data

2011-07-11 Thread Stephane Chazelas
2011-07-11 02:00:51 +0200, krz...@gmail.com :
 Documentation says that btrfs-image zeros data. Feature request is for
 disabling this. btrfs-image could be used to copy filesystem to
 another drive (for example with snapshots, when copying it file by
 file would take much longer time or actually was not possible
 (snapshots)). btrfs-image in turn could be used to actually shrink loop
 devices/sparse file containing btrfs - by copying filesystem to new
 loop device/sparse file.
 
 Also it would be nice if copying filesystem could occur without
 intermediate dump to a file...
[...]

I second that.

See also
http://thread.gmane.org/gmane.comp.file-systems.btrfs/9675/focus=9820
for a way to transfer btrfs fs.

(Add a layer of copy-on-write on the original devices (LVM
snapshots, nbd/qemu-nbd cow...), btrfs add the new device(s)
and then btrfs del of the cow'ed original devices.)

-- 
Stephane


Re: feature request: btrfs-image without zeroing data

2011-07-11 Thread Stephane Chazelas
2011-07-11 14:39:18 +0200, krz...@gmail.com :
 2011/7/11 Stephane Chazelas stephane_chaze...@yahoo.fr:
[...]
  See also
  http://thread.gmane.org/gmane.comp.file-systems.btrfs/9675/focus=9820
  for a way to transfer btrfs fs.
 
  (Add a layer of copy-on-write on the original devices (LVM
  snapshots, nbd/qemu-nbd cow...), btrfs add the new device(s)
  and then btrfs del of the cow'ed original devices.)
[...]
 Copying on block level (dd, lvm) is old trick, however this takes same
 amount of time regardless of actual space used in filesystem. Hence
 this feature request. Images inside filesystem can copy only actually
 used data and metadata, which dramatically reduces copy times in large
 volumes that are not filled up...

The method I suggest doesn't copy the whole disks, please read
more carefully. It can also work to copy from a 3 disk setup to
a 1 disk setup or the other way round.

With btrfs, you can add devices to a FS dynamically. You can
also delete devices, in which case the data is transferred to
the other devices. The method I suggest uses that feature.
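
As a rough illustration of the add/delete part only, the sketch below prints the commands rather than running them. The helper name migrate_plan is made up, the device names and mount point are placeholders, and the copy-on-write layer over the original devices is not shown. Review the printed commands before running any of them as root.

```shell
#!/bin/sh
# migrate_plan (hypothetical helper): print, without executing, the btrfs
# commands that would move a mounted FS from old devices to new ones.
# usage: migrate_plan MOUNTPOINT "NEW_DEVICES" "OLD_DEVICES"
migrate_plan() {
    mnt=$1
    # First grow the FS onto the new device(s)...
    for dev in $2; do
        echo "btrfs device add $dev $mnt"
    done
    # ...then remove the old one(s); btrfs relocates their data to the
    # remaining devices as part of the delete.
    for dev in $3; do
        echo "btrfs device delete $dev $mnt"
    done
}

# Placeholder names, one old device and two new ones:
migrate_plan /mnt "/dev/loop2 /dev/loop3" "/dev/loop1"
```

Only the space actually in use on the deleted devices has to be moved, which is the difference from a block-level copy.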

Cheers,
Stephane


Re: Memory leak?

2011-07-11 Thread Stephane Chazelas
2011-07-11 11:00:19 -0400, Chris Mason:
 Excerpts from Stephane Chazelas's message of 2011-07-11 05:01:21 -0400:
  2011-07-10 19:37:28 +0100, Stephane Chazelas:
   2011-07-10 08:44:34 -0400, Chris Mason:
   [...]
    Great, we're on the right track.  Does it trigger with mount -o compress
    instead of mount -o compress_force?
   [...]
   
   It does trigger. I get that same invalid opcode.
   
   BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
   information than with SLAB_DEBUG.
   
   I'm trying now without dmcrypt. Then I won't have much bandwidth
   for testing.
  [...]
  
  Same without dmcrypt. So to sum up, BUG() reached in btrfs-fixup
  thread when doing an 
[...]
  - CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
    CONFIG_DEBUG_SLAB_LEAK, slub_debug don't tell us anything
    useful (there's more info in /proc/slabinfo when
    CONFIG_SLAB_DEBUG is on, see below)
[...]
 This is some fantastic debugging, thank you.  I know you tested with
 slab debugging turned on, did you have CONFIG_DEBUG_PAGEALLOC on as
 well?

Yes when using SLAB, not when using SLUB.

 It's probably something to do with a specific file, but pulling that
 file out without extra printks is going to be tricky.  I'll see if I can
 reproduce it here.
[...]

For one occurrence, I know what file was being transferred at the
time of the crash (looking at ctimes on the dest FS, see one of
my earlier emails). And after just checking on the latest BUG(),
it's a different one.
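
The ctime check mentioned above can be sketched as follows. This is an assumption about how one might do it, not the exact command I ran: the helper name latest_by_ctime is made up, it relies on GNU find's -printf, and it breaks on file names containing newlines.

```shell
#!/bin/sh
# latest_by_ctime (hypothetical helper): print the N regular files with the
# most recent ctime under DIR, oldest first. Requires GNU find (-printf).
# usage: latest_by_ctime DIR [N]
latest_by_ctime() {
    find "$1" -type f -printf '%C@ %p\n' |  # ctime as epoch seconds, then path
        sort -n |                           # oldest first
        tail -n "${2:-5}" |                 # keep the N most recent
        cut -d' ' -f2-                      # drop the timestamp column
}

# Typical use after a crash, on the destination FS:
#   latest_by_ctime /dst 10
```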

Also, when I resume the rsync (so it doesn't transfer the
already transferred files), it does BUG() again.

regards,
Stephane


Re: Memory leak?

2011-07-11 Thread Stephane Chazelas
2011-07-11 12:25:51 -0400, Chris Mason:
[...]
  Also, when I resume the rsync (so it doesn't transfer the
  already transferred files), it does BUG() again.
 
 Ok, could you please send along the exact rsync command you were
 running?
[...]

I did earlier, but here it is again:

rsync --archive --xattrs --hard-links --numeric-ids --sparse --acls /src/ /dst/

Also with:

(cd /src && bsdtar cf - .) | pv | (cd /dst && bsdtar -xpSf - --numeric-owner)

-- 
Stephane



Re: Memory leak?

2011-07-10 Thread Stephane Chazelas
2011-07-10 08:44:34 -0400, Chris Mason:
[...]
 Great, we're on the right track.  Does it trigger with mount -o compress
 instead of mount -o compress_force?
[...]

It does trigger. I get that same invalid opcode.

BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
information than with SLAB_DEBUG.

I'm trying now without dmcrypt. Then I won't have much bandwidth
for testing.

-- 
Stephane


A lot of writing to FS only read (Was: Memory leak?)

2011-07-09 Thread Stephane Chazelas
2011-07-09 08:09:55 +0100, Stephane Chazelas:
 2011-07-08 16:12:28 -0400, Chris Mason:
 [...]
   I'm running a dstat -df at the same time and I'm seeing
   substantive amount of disk writes on the disks that hold the
   source FS (and I'm rsyncing from read-only snapshot subvolumes
   in case you're thinking of atimes) almost more than onto the
   drive holding the target FS!?
[...]

One thing I didn't mention is that before doing the rsync, I
deleted a few snapshot volumes (those were read-only snapshots
of read-only snapshots) and recreated them, and those are the
ones I'm rsyncing from (basically, I prepare an area to be
rsynced made of the latest snapshots of a few subvolumes).

-- 
Stephane


Re: Memory leak?

2011-07-09 Thread Stephane Chazelas
2011-07-08 12:17:54 -0400, Chris Mason:
[...]
 How easily can you recompile your kernel with more debugging flags?
 That should help narrow it down.  I'm looking for CONFIG_SLAB_DEBUG (or
 slub) and CONFIG_DEBUG_PAGEALLOC
[...]

I tried that (with CONFIG_DEBUG_SLAB_LEAK as well), but it made
no difference whatsoever.

-- 
Stephane


Re: Memory leak?

2011-07-09 Thread Stephane Chazelas
2011-07-09 13:25:00 -0600, cwillu:
 On Sat, Jul 9, 2011 at 11:09 AM, Stephane Chazelas
 stephane_chaze...@yahoo.fr wrote:
  2011-07-08 11:06:08 -0400, Chris Mason:
  [...]
  I would do two things.  First, I'd turn off compress_force.  There's no
  explicit reason for this, it just seems like the most likely place for
  a bug.
  [...]
 
  I couldn't reproduce it with compress_force turned off, the
  inode_cache reached 600MB but never stayed high.
 
  Not using compress_force is not an option for me though
  unfortunately.
 
 Disabling compression doesn't decompress everything that's already compressed.

Yes. But the very issue here is that I get this problem when I
copy data onto an empty drive. If I don't enable compression
there, it simply doesn't fit. In that very case, support for
compression is the main reason why I use btrfs here.

-- 
Stephane


Re: Memory leak?

2011-07-09 Thread Stephane Chazelas
2011-07-09 08:09:55 +0100, Stephane Chazelas:
 2011-07-08 16:12:28 -0400, Chris Mason:
 [...]
   I'm running a dstat -df at the same time and I'm seeing
   substantive amount of disk writes on the disks that hold the
   source FS (and I'm rsyncing from read-only snapshot subvolumes
   in case you're thinking of atimes) almost more than onto the
   drive holding the target FS!?
  
  These are probably atime updates.  You can completely disable them with
  mount -o noatime.
 [...]
 
 I don't think it is. First, as I said, I'm rsyncing from
 read-only snapshots (and I could see atimes were not updated)
 and nothing else is running. Then now looking at the traces this
 morning, There was a lot written in the first minutes, then it
 calmed down.
[...]

How embarrassing, sorry

In that instance I wasn't rsyncing from the right place, so I
was indeed copying non-readonly volumes before copying readonly
ones. So, those writes were probably due to atimes.

-- 
Stephane


Re: Memory leak?

2011-07-08 Thread Stephane Chazelas
2011-07-08 11:06:08 -0400, Chris Mason:
[...]
 So the invalid opcode in btrfs-fixup-0 is the big problem.  We're
 either failing to write because we weren't able to allocate memory (and
 not dealing with it properly) or there is a bigger problem.
 
 Does the btrfs-fixup-0 oops come before or after the ooms?

Hi Chris, thanks for looking into this.

It comes long before. Hours before there's any problem. So it
seems unrelated.

 Please send along any oops output during the run.  Only the first
 (earliest) oops matters.

There's always only one in between two reboots. I've sent two
already, but here they are:

Jul  1 18:25:57  [ cut here ]
Jul  1 18:25:57  kernel BUG at 
/media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583!
Jul  1 18:25:57  invalid opcode:  [#1] SMP
Jul  1 18:25:57  CPU 1
Jul  1 18:25:57  Modules linked in: sha256_generic cryptd aes_x86_64 
aes_generic cbc dm_crypt fuse snd_pcm psmouse tpm_tis tpm i2c_i801 snd_timer 
snd soundcore snd_page_alloc i3200_edac tpm_bios serio_raw evdev pcspkr 
processor button thermal_sys i2c_core container edac_core sg sr_mod cdrom ext4 
mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress 
usb_storage sd_mod crc_t10dif uas uhci_hcd ahci libahci libata ehci_hcd e1000e 
scsi_mod usbcore [last unloaded: scsi_wait_scan]
Jul  1 18:25:57 
Jul  1 18:25:57  Pid: 747, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 
empty empty/Tyan Tank GT20 B5211
Jul  1 18:25:57  RIP: 0010:[a014b0f4]  [a014b0f4] 
btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs]
Jul  1 18:25:57  RSP: 0018:8801483ffde0  EFLAGS: 00010246
Jul  1 18:25:57  RAX:  RBX: ea000496a430 RCX: 

Jul  1 18:25:57  RDX:  RSI: 06849000 RDI: 
880071c1fcb8
Jul  1 18:25:57  RBP: 06849000 R08: 0008 R09: 
8801483ffd98
Jul  1 18:25:57  R10: dead00200200 R11: dead00100100 R12: 
880071c1fd90
Jul  1 18:25:57  R13:  R14: 8801483ffdf8 R15: 
06849fff
Jul  1 18:25:57  FS:  () GS:88014fd0() 
knlGS:
Jul  1 18:25:57  CS:  0010 DS:  ES:  CR0: 8005003b
Jul  1 18:25:57  CR2: f7596000 CR3: 00013def9000 CR4: 
06e0
Jul  1 18:25:57  DR0:  DR1:  DR2: 

Jul  1 18:25:57  DR3:  DR6: 0ff0 DR7: 
0400
Jul  1 18:25:57  Process btrfs-fixup-0 (pid: 747, threadinfo 8801483fe000, 
task 88014672efa0)
Jul  1 18:25:57  Stack:
Jul  1 18:25:57   880071c1fc28 8800c70165c0  
88011e61ca28
Jul  1 18:25:57    880146ef41c0 880146ef4210 
880146ef41d8
Jul  1 18:25:57   880146ef41c8 880146ef4200 880146ef41e8 
a01669fa
Jul  1 18:25:57  Call Trace:
Jul  1 18:25:57   [a01669fa] ? worker_loop+0x186/0x4a1 [btrfs]
Jul  1 18:25:57   [813369ca] ? schedule+0x5ed/0x61a
Jul  1 18:25:57   [a0166874] ? btrfs_queue_worker+0x24a/0x24a [btrfs]
Jul  1 18:25:57   [a0166874] ? btrfs_queue_worker+0x24a/0x24a [btrfs]
Jul  1 18:25:57   [8105faed] ? kthread+0x7a/0x82
Jul  1 18:25:57   [8133e524] ? kernel_thread_helper+0x4/0x10
Jul  1 18:25:57   [8105fa73] ? kthread_worker_fn+0x147/0x147
Jul  1 18:25:57   [8133e520] ? gs_change+0x13/0x13
Jul  1 18:25:57  Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 fb 
ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff 0f 0b 
eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48
Jul  1 18:25:57  RIP  [a014b0f4] 
btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs]
Jul  1 18:25:57   RSP 8801483ffde0
Jul  1 18:25:57  ---[ end trace 9744d33381de3d04 ]---

Jul  4 12:50:51  [ cut here ]
Jul  4 12:50:51  kernel BUG at 
/media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583!
Jul  4 12:50:51  invalid opcode:  [#1] SMP
Jul  4 12:50:51  CPU 0
Jul  4 12:50:51  Modules linked in: lm85 dme1737 hwmon_vid coretemp ipmi_si 
ipmi_msghandler sha256_generic cryptd aes_x86_64 aes_generic cbc fuse dm_crypt 
snd_pcm snd_timer snd sg soundcore i3200_edac snd_page_alloc sr_mod processor 
tpm_tis i2c_i801 pl2303 pcspkr thermal_sys i2c_core tpm edac_core tpm_bios 
cdrom usbserial container evdev psmouse button serio_raw ext4 mbcache jbd2 
crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress sd_mod 
crc_t10dif usb_storage uas uhci_hcd ahci libahci ehci_hcd libata e1000e usbcore 
scsi_mod [last unloaded: i2c_dev]
Jul  4 12:50:51 
Jul  4 12:50:51  Pid: 764, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 
empty empty/Tyan Tank GT20 B5211
Jul  4 12:50:51  RIP: 0010:[a00820f4]  [a00820f4] 
btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs]
Jul  4 12:50:51  RSP: 

Re: Memory leak?

2011-07-08 Thread Stephane Chazelas
2011-07-08 16:41:23 +0100, Stephane Chazelas:
 2011-07-08 11:06:08 -0400, Chris Mason:
 [...]
  So the invalid opcode in btrfs-fixup-0 is the big problem.  We're
  either failing to write because we weren't able to allocate memory (and
  not dealing with it properly) or there is a bigger problem.
  
  Does the btrfs-fixup-0 oops come before or after the ooms?
 
 Hi Chris, thanks for looking into this.
 
 It comes long before. Hours before there's any problem. So it
 seems unrelated.

Though every time I had the issue, there had been such an
invalid opcode before. But also, I only had both the invalid
opcode and the memory issue when doing that rsync onto an
external hard drive.

  Please send along any oops output during the run.  Only the first
  (earliest) oops matters.
 
 There's always only one in between two reboots. I've sent two
 already, but here they are:
[...]

I dug up the traces from before I switched to Debian (thinking
that getting a newer kernel would improve matters), in case it helps:

Jun  4 18:12:58  [ cut here ]
Jun  4 18:12:58  kernel BUG at /build/buildd/linux-2.6.38/fs/btrfs/inode.c:1555!
Jun  4 18:12:58  invalid opcode:  [#2] SMP
Jun  4 18:12:58  last sysfs file: /sys/devices/virtual/block/dm-2/dm/name
Jun  4 18:12:58  CPU 0
Jun  4 18:12:58  Modules linked in: sha256_generic cryptd aes_x86_64 
aes_generic dm_crypt psmouse serio_raw xgifb(C+) i3200_edac edac_core nbd btrfs 
zlib_deflate libcrc32c xenbus_probe_frontend ums_cypress usb_storage uas e1000e 
ahci libahci
Jun  4 18:12:58 
Jun  4 18:12:58  Pid: 416, comm: btrfs-fixup-0 Tainted: G  D  C  
2.6.38-7-server #35-Ubuntu empty empty/Tyan Tank GT20 B5211
Jun  4 18:12:58  RIP: 0010:[a0099765]  [a0099765] 
btrfs_writepage_fixup_worker+0x145/0x150 [btrfs]
Jun  4 18:12:58  RSP: 0018:88003cfddde0  EFLAGS: 00010246
Jun  4 18:12:58  RAX:  RBX: ea04ca88 RCX: 

Jun  4 18:12:58  RDX: 88003cfddd98 RSI:  RDI: 
8800152088b0
Jun  4 18:12:58  RBP: 88003cfdde30 R08: e8c09988 R09: 
88003cfddd98
Jun  4 18:12:58  R10:  R11:  R12: 
010ec000
Jun  4 18:12:58  R13: 880015208988 R14:  R15: 
010ecfff
Jun  4 18:12:58  FS:  () GS:88003fc0() 
knlGS:
Jun  4 18:12:58  CS:  0010 DS:  ES:  CR0: 8005003b
Jun  4 18:12:58  CR2: 00e73fe8 CR3: 30fcc000 CR4: 
06f0
Jun  4 18:12:58  DR0:  DR1:  DR2: 

Jun  4 18:12:58  DR3:  DR6: 0ff0 DR7: 
0400
Jun  4 18:12:58  Process btrfs-fixup-0 (pid: 416, threadinfo 88003cfdc000, 
task 880036912dc0)
Jun  4 18:12:58  Stack:
Jun  4 18:12:58   880039c4e120 880015208820 88003cfdde90 
880032da4b80
Jun  4 18:12:58   88003cfdde30 88003ce915a0 88003cfdde90 
88003cfdde80
Jun  4 18:12:58   880036912dc0 88003ce915f0 88003cfddee0 
a00c34f4
Jun  4 18:12:58  Call Trace:
Jun  4 18:12:58   [a00c34f4] worker_loop+0xa4/0x3a0 [btrfs]
Jun  4 18:12:58   [a00c3450] ? worker_loop+0x0/0x3a0 [btrfs]
Jun  4 18:12:58   [81087116] kthread+0x96/0xa0
Jun  4 18:12:58   [8100cde4] kernel_thread_helper+0x4/0x10
Jun  4 18:12:58   [81087080] ? kthread+0x0/0xa0
Jun  4 18:12:58   [8100cde0] ? kernel_thread_helper+0x0/0x10
Jun  4 18:12:58  Code: 1f 80 00 00 00 00 48 8b 7d b8 48 8d 4d c8 41 b8 50 00 00 
00 4c 89 fa 4c 89 e6 e8 37 d1 01 00 eb b6 48 89 df e8 8d 1a 07 e1 eb 9a 0f 0b 
66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55
Jun  4 18:12:58  RIP  [a0099765] 
btrfs_writepage_fixup_worker+0x145/0x150 [btrfs]
Jun  4 18:12:58   RSP 88003cfddde0
Jun  4 18:12:58  ---[ end trace e5cf15794ff3ebdb ]---

And:

Jun  5 00:58:10  BUG: Bad page state in process rsync  pfn:1bfdf
Jun  5 00:58:10  page:ea61f8c8 count:0 mapcount:0 mapping:  
(null) index:0x2300
Jun  5 00:58:10  page flags: 0x110(dirty)
Jun  5 00:58:10  Pid: 1584, comm: rsync Tainted: G  D  C  2.6.38-7-server 
#35-Ubuntu
Jun  5 00:58:10  Call Trace:
Jun  5 00:58:10   [8111250b] ? dump_page+0x9b/0xd0
Jun  5 00:58:10   [8111260c] ? bad_page+0xcc/0x120
Jun  5 00:58:10   [81112905] ? prep_new_page+0x1a5/0x1b0
Jun  5 00:58:10   [815d755e] ? _raw_spin_lock+0xe/0x20
Jun  5 00:58:10   [a00b7391] ? test_range_bit+0x111/0x150 [btrfs]
Jun  5 00:58:10   [81112b74] ? get_page_from_freelist+0x264/0x650
Jun  5 00:58:10   [a0073cce] ? 
generic_bin_search.clone.42+0x19e/0x200 [btrfs]
Jun  5 00:58:10   [81113778] ? __alloc_pages_nodemask+0x118/0x830
Jun  5 00:58:10   [a0073cce] ? 
generic_bin_search.clone.42+0x19e/0x200 [btrfs]
Jun  5 00:58:10   [815d755e] ? _raw_spin_lock+0xe/0x20
Jun  5 00:58:10   [811541d2] ? get_partial_node

Re: Memory leak?

2011-07-08 Thread Stephane Chazelas
2011-07-08 12:17:54 -0400, Chris Mason:
[...]
  Jun  5 00:58:10  BUG: Bad page state in process rsync  pfn:1bfdf
  Jun  5 00:58:10  page:ea61f8c8 count:0 mapcount:0 mapping:  
  (null) index:0x2300
  Jun  5 00:58:10  page flags: 0x110(dirty)
  Jun  5 00:58:10  Pid: 1584, comm: rsync Tainted: G  D  C  
  2.6.38-7-server #35-Ubuntu
  Jun  5 00:58:10  Call Trace:
 
 Ok, this one is really interesting.  Did you get this after another oops
 or was it after a reboot?
 

After the oops above (a few hours after though). The two reports
were together with nothing in between in the kern.log.

That was the only occurrence though.

 How easily can you recompile your kernel with more debugging flags?
 That should help narrow it down.  I'm looking for CONFIG_SLAB_DEBUG (or
 slub) and CONFIG_DEBUG_PAGEALLOC
[...]

I can try next week.

-- 
Stephane


Re: Memory leak?

2011-07-08 Thread Stephane Chazelas
2011-07-08 12:15:08 -0400, Chris Mason:
[...]
 You described this workload as rsync, is there anything else running?
[...]

Nope. Nothing else. And at least initially, that was onto an
empty drive, so a basic copy.

rsync --archive --xattrs --hard-links --numeric-ids --sparse --acls


Cheers,
Stephane


Re: Memory leak?

2011-07-08 Thread Stephane Chazelas
2011-07-08 12:15:08 -0400, Chris Mason:
[...]
 I'd definitely try without -o compress_force.
[...]

I just started that overnight.

I'm running a dstat -df at the same time and I'm seeing a
substantial amount of disk writes on the disks that hold the
source FS (and I'm rsyncing from read-only snapshot subvolumes
in case you're thinking of atimes), almost more than onto the
drive holding the target FS!?

--dsk/sda-- --dsk/sdb-- --dsk/sdc-- --dsk/sdd--
 read  writ: read  writ: read  writ: read  writ
1000k    0 : 580k    0 :2176k    0 :   0     0
1192k    0 :  76k    0 : 988k    0 :   0     0
 436k    0 : 364k    0 :1984k    0 :   0    33M
 824k    0 : 812k    0 :4276k    0 :   0     0
3004k    0 :2868k    0 :5488k    0 :   0     0
 796k  564k: 640k   25M:2284k 4600k:   0     0
 108k    0 :  68k   23M: 648k   35M:   0     0
1712k   12k:1644k   12k:2476k 7864k:   0     0
 732k    0 : 620k    0 :3192k    0 :   0     0
1148k    0 :1116k    0 :3532k    0 :   0    19M
1392k    0 :1380k    0 :4416k    0 :   0  7056k
1336k    0 :1212k    0 :6664k    0 :   0  3148k
 820k    0 : 784k    0 :4528k    0 :   0    48M
1336k    0 :1340k    0 :3964k    0 :   0  8996k
1528k    0 :1280k    0 :2908k    0 :   0     0
1352k    0 :1216k    0 :3880k    0 :   0     0
 864k    0 : 888k    0 :1684k    0 :   0     0
1268k    0 :1208k    0 :3072k    0 :   0     0

(source FS is on sda4+sdb+sdc, target on sdd, sda also has an
ext4 FS)

How can that be? Some garbage collection, background defrag or
something like that? But then, if I stop the rsync, those writes
stop as well.

-- 
Stephane


Re: Memory leak?

2011-07-07 Thread Stephane Chazelas
2011-07-06 09:11:11 +0100, Stephane Chazelas:
[...]
 extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
 (in bytes)
[...]
 01:00  267192640  668595744  2321646000  3418048
 01:10  267192640  668595744  2321646000  3418048
 01:20  267192640  668595744  2321646000  3418048
 01:30  267192640  668595744  2321646000  3418048
 01:40  267192640  668595744  2321646000  3418048
[...]

I've just come across
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=4b9465cb9e3859186eefa1ca3b990a5849386320

GIT author     Chris Mason chris.ma...@oracle.com
GIT            Fri, 3 Jun 2011 13:36:29 +0000 (09:36 -0400)
GIT committer  Chris Mason chris.ma...@oracle.com
GIT            Sat, 4 Jun 2011 12:03:47 +0000 (08:03 -0400)
GIT commit     4b9465cb9e3859186eefa1ca3b990a5849386320
GIT tree       8fc06452fb75e52f6c1c2e2253c2ff6700e622fd
GIT parent     e7786c3ae517b2c433edc91714e86be770e9f1ce
GIT Btrfs: add mount -o inode_cache
GIT 
GIT This makes the inode map cache default to off until we
GIT fix the overflow problem when the free space crcs don't fit
GIT inside a single page.

I would have thought that would have disabled that
btrfs_inode_cache. And I can see that patch is in 3.0.0-rc5 (I'm
not mounting with -o inode_cache). So, why those 2.2GiB in
btrfs_inode_cache above?
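
As a quick sanity check on that figure, here is a minimal Python sketch that converts the btrfs_inode_cache byte count quoted above to GiB. It assumes the column reads 2321646000 bytes; the helper name to_gib is only for illustration.

```python
# Byte count read from the btrfs_inode_cache column quoted above
# (assumption: the column value is 2321646000).
inode_cache_bytes = 2321646000


def to_gib(n_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


print(f"{to_gib(inode_cache_bytes):.2f} GiB")  # 2.16 GiB
```

That is roughly the 2.2GiB mentioned.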

-- 
Stephane


Re: Memory leak?

2011-07-07 Thread Stephane Chazelas
2011-07-07 16:20:20 +0800, Li Zefan:
[...]
 btrfs_inode_cache is a slab cache for in memory inodes, which is of
 struct btrfs_inode.
[...]

Thanks Li.

If that's a cache, the system should be able to reuse the space
there when it's low on memory, shouldn't it? What would be the
conditions where that couldn't be done? (like in my case, where
the oom killer was hired to free memory rather than reclaiming
that cache memory).

Best regards,
Stephane


Re: Memory leak?

2011-07-06 Thread Stephane Chazelas
2011-07-03 13:38:57 -0600, cwillu:
 On Sun, Jul 3, 2011 at 1:09 PM, Stephane Chazelas
 stephane_chaze...@yahoo.fr wrote:
[...]
  Now, on a few occasions (actually, most of the time), when I
  rsynced the data (about 2.5TB) onto the external drive, the
  system would crash after some time with Out of memory and no
  killable process. Basically, something in kernel was allocating
  the whole memory, then oom mass killed everybody and crash.
[...]
 Look at the output of slabtop (should be installed by default, procfs
 package), before rsync for comparison, and during.

Hi,

so, no crash this time, but at the end of the rsync, there's a
whole chunk of memory that is no longer available to processes
(about 3.5G). As suggested by Carey, I watched /proc/slabinfo
during the rsync process (see below for a report of the most
significant ones over time).
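
For anyone wanting to collect the same kind of figures, here is a rough Python sketch of how per-cache byte counts can be derived from /proc/slabinfo text. It assumes the usual layout where each data line starts with name, active_objs, num_objs, objsize, and it assumes bytes are computed as num_objs * objsize, which may not be exactly how the report below was produced. The sample text and its numbers are made up for the demonstration.

```python
CACHES = ("extent_map", "delayed_node",
          "btrfs_inode_cache", "btrfs_free_space_cache")


def slab_bytes(slabinfo_text, names=CACHES):
    """Return {cache name: num_objs * objsize} for the requested caches."""
    sizes = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Data lines look like: name active_objs num_objs objsize ...
        if len(fields) >= 4 and fields[0] in names:
            num_objs, objsize = int(fields[2]), int(fields[3])
            sizes[fields[0]] = num_objs * objsize
    return sizes


# Made-up sample in /proc/slabinfo layout (not real measurements).
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
btrfs_inode_cache 2 2 1000 4 1 : tunables 0 0 0 : slabdata 1 1 0
extent_map 100 128 88 46 1 : tunables 0 0 0 : slabdata 3 3 0
"""

print(slab_bytes(SAMPLE))
```

On a live system the text would come from reading /proc/slabinfo (usually as root) at regular intervals.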

Does that mean that if the system had had less than 3G of RAM, it
would have crashed?

I tried to reclaim the space without success. I had a process
allocate as much memory as it could. Then I unmounted the btrfs
fs that rsync was copying onto (the one on LUKS).
btrfs_inode_cache hardly changed. Then I tried to unmount the
source (the one on 3 hard disks and plenty of subvolumes).
umount hung. The FS disappeared from /proc/mounts. Here is the
backtrace:

[169270.268005] umount  D 880145ebe770 0 24079   1290 0x0004
[169270.268005]  880145ebe770 0086  
8160b020
[169270.268005]  00012840 880123bc7fd8 880123bc7fd8 
00012840
[169270.268005]  880145ebe770 880123bc6010 7fffac84f4a8 
0001
[169270.268005] Call Trace:
[169270.268005]  [81337d50] ? rwsem_down_failed_common+0xda/0x10e
[169270.268005]  [811aca63] ? call_rwsem_down_write_failed+0x13/0x20
[169270.268005]  [813376c7] ? down_write+0x25/0x27
[169270.268005]  [810ff6eb] ? deactivate_super+0x30/0x3d
[169270.268005]  [81114135] ? sys_umount+0x2ea/0x315
[169270.268005]  [8133d412] ? system_call_fastpath+0x16/0x1b

iostat shows nothing being written to the drives.

extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
(in bytes)
09:40 131560  0  0128
09:50 131560  0   2000128 rsync started at 9:52
10:00   15832608   87963264 146583 325440
10:10 386056   85071168 1444656000 832960
10:20 237600   33988032 15497690001355584
10:30   23300288  145209600  4872740002237056
10:40   24667104  148610016  5064920002304448
10:50   22479776  139655808  5118930002382464
11:004137672104169645920002425344
11:1054985923211776   127420002443648
11:204567904140889695990002452608
11:302276736130982446850002453696
11:402225696 42192019870002455424
11:501971552  90432 3830002466176
12:001761672  74016 3270002469760
12:101939608  85824 4010002473536
12:202136288 121824 5510002479168
12:302367288 135648 6190002486016
12:401380984 181152 9110002485696
12:501053272 20217610270002483712
13:001938200 21974411520002491712
13:102037112 22377612040002494528
13:201775664 24912012440002497216
13:301704560 36604817320002500608
13:401433344 4688642462501824
13:508553248   20505888   673320002503168
14:00   12682208   34351200  1419680002494208
14:10   18488800   50282784  1778030002500544
14:20   19435592   46767744  1635820002505920
14:30   18734936   44863488  1565010002507200
14:40   21865184   46053504  1601850002484928
14:50   24457664   46473120  1624460002499200
15:00   24401344   47700576  1667230002502784
15:10   31390304   63426240  2211790002521472
15:20   34594560   61365600  2142430002524160
15:30   33836704   60934752  2126950002526400
15:40   33358776   60598944  2114550002528320
15:50   34909952   62583840  2184920002526272
16:00   44326656   65875392  2301230002529792
16:10   45840608   66373632  2321140002532736
16:20   47848064   66577536  2320480002535872
16:30   48013152   6160  2406510002536128
16:40   47594184   67766976  2362410002536576
16:50   48144184   67739904  236122542144
17:00   48005848   67639392  2352980002544000
17:10   48253920   67661280  2353760002537216
17:20   48857952   67612032  2349950002536000
17:30   48514752   67611168  2349860002535488
17:40   48436872   67609728  2349240002534528
17:50   48902216   67765248  2356540002542400
18:00   49055160   67763520  236022542912
18:10   48749712   67727520  235742550464
18:20   48631088   67705344  2355570002553280
18:30   49101096   6344  235713000220
18:40   48609264   67782816  2356010002558912
18:50

Memory leak?

2011-07-03 Thread Stephane Chazelas
Hiya,

I've got a server using btrfs to implement a backup system.
Basically, every night, for a few machines, I rsync (or copy by
other methods) their file systems into one btrfs subvolume each
and then snapshot it.

On that server, the btrfs fs is on 3 3TB drives, mounted with
compress-force. Every week, I rsync the most recent snapshot of
a selection of subvolumes onto an encrypted (LUKS) external hard
drive (3TB as well).

Now, on a few occasions (actually, most of the time), when I
rsynced the data (about 2.5TB) onto the external drive, the
system would crash after some time with "Out of memory and no
killable process". Basically, something in the kernel was
allocating all the memory, then the OOM killer killed every
process and the system crashed.

That was with ubuntu 2.6.38. I had then moved to debian and
2.6.39 and thought the problem was fixed, but it just happened
again with 3.0.0rc5 while rsyncing onto an initially empty btrfs
fs.

I'm going to resume the rsync again, and it's likely to happen
again. Is there anything simple (as I've got very little time to
look into that) I could do to help debug the issue? (I'm not 100%
sure it's btrfs's fault, but that's the most likely culprit.)

For a start, I'll switch the console to serial, and watch
/proc/vmstat. Anything else I could do?

Note that that server has never crashed when doing a lot of IO
at the same time in a lot of subvolumes with remote hosts. It's
only when copying data to that external drive on LUKS that it
seems to crash.

Cheers,
Stephane


[PATCH] Re: [btrfs-progs integration] incorrect argument checking for btrfs sub snap -r

2011-07-01 Thread Stephane Chazelas
2011-06-30 22:55:15 +0200, Andreas Philipp:
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
  
 On 30.06.2011 14:34, Stephane Chazelas wrote:
  Looks like this was missing in integration-20110626 for the
  readonly snapshot patch:
 
  diff --git a/btrfs.c b/btrfs.c
  index e117172..be6ece5 100644
  --- a/btrfs.c
  +++ b/btrfs.c
  @@ -49,7 +49,7 @@ static struct Command commands[] = {
  /*
  avoid short commands different for the case only
  */
  - { do_clone, 2,
  + { do_clone, -1,
  "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
  "Create a writable/readonly snapshot of the subvolume <source> with\n"
  "the name <name> in the <dest> directory.",
 
  Without that, btrfs sub snap -r x y would fail as it's not *2*
  arguments.
 Unfortunately, this is not correct either. -1 means that the minimum
 number of arguments is 1 and since we need at least source and
 name this is 2. So the correct version should be -2.
[...]

Sorry, without looking closely at the source, I assumed -1 meant
"defer the checking to the subcommand handler".

do_clone will indeed return an error if the number of arguments
is less than expected (so with -2, you'll get a different error
message if you do "btrfs sub snap -r foo" or "btrfs sub snap
foo"), but by the way it will not if it's more than expected.
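
To make the convention discussed above concrete, here is a small Python sketch of the argument-count rule as described in this thread: a non-negative value means exactly that many arguments, a negative value means at least that many. The function name and the error strings are hypothetical; this is not the btrfs-progs code.

```python
def check_nargs(expected, given):
    """Return None if `given` satisfies `expected`, else an error string.

    Convention as described in this thread: expected >= 0 means exactly
    that many arguments; expected < 0 means at least abs(expected).
    """
    if expected < 0:
        if given < -expected:
            return f"too few arguments: need at least {-expected}, got {given}"
    elif given != expected:
        return f"wrong number of arguments: need {expected}, got {given}"
    return None


# "btrfs sub snap -r x y" passes three arguments after the command name.
print(check_nargs(2, 3))    # rejected: not exactly 2
print(check_nargs(-2, 3))   # accepted: at least 2
print(check_nargs(-2, 1))   # rejected: fewer than 2
```

With -2 in the table, the option parsing in the handler still has to reject surplus arguments itself, which is what the second hunk below is for.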

So the patch should probably be:

diff --git a/btrfs.c b/btrfs.c
index e117172..b50c58a 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -49,7 +49,7 @@ static struct Command commands[] = {
 	/*
 		avoid short commands different for the case only
 	*/
-	{ do_clone, 2,
+	{ do_clone, -2,
 	  "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
 		"Create a writable/readonly snapshot of the subvolume <source> with\n"
 		"the name <name> in the <dest> directory.",
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 1d18c59..3415afc 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
 			return 1;
 		}
 	}
-	if (argc - optind < 2) {
+	if (argc - optind != 2) {
 		fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
 		free(argv);
 		return 1;

Cheers,
Stephane


Re: [PATCH] Re: [btrfs-progs integration] incorrect argument checking for btrfs sub snap -r

2011-07-01 Thread Stephane Chazelas
2011-07-01 11:42:23 +0100, Hugo Mills:
[...]
   diff --git a/btrfs.c b/btrfs.c
   index e117172..b50c58a 100644
   --- a/btrfs.c
   +++ b/btrfs.c
   @@ -49,7 +49,7 @@ static struct Command commands[] = {
 /*
 avoid short commands different for the case only
 */
   - { do_clone, 2,
   + { do_clone, -2,
   "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
   "Create a writable/readonly snapshot of the subvolume <source> with\n"
   "the name <name> in the <dest> directory.",
   diff --git a/btrfs_cmds.c b/btrfs_cmds.c
   index 1d18c59..3415afc 100644
   --- a/btrfs_cmds.c
   +++ b/btrfs_cmds.c
   @@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
 return 1;
 }
 }
   - if (argc - optind < 2) {
   + if (argc - optind != 2) {
 fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
 free(argv);
 return 1;
  
  Thanks for having another look at this. You are perfectly right. Should
  we patch my patch or should I rework a corrected version? What do you
  think Hugo?
 
Could you send a follow-up patch with just the second hunk, please?
 I screwed up the process with this (processing patches too quickly to
 catch the review), and I've already published the patch with the first
 hunk, above, into the for-chris branch.

Hugo, not sure what you mean nor whom you're talking to, but I
can certainly copy-paste the second hunk from above here:

diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 1d18c59..3415afc 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
 			return 1;
 		}
 	}
-	if (argc - optind < 2) {
+	if (argc - optind != 2) {
 		fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
 		free(argv);
 		return 1;

Cheers,
Stephane


[3.0.0rc5] invalid opcode

2011-07-01 Thread Stephane Chazelas
Hi,

I just got one of those:

[ 8203.192107] [ cut here ]
[ 8203.192146] kernel BUG at 
/media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583!
[ 8203.192210] invalid opcode:  [#1] SMP
[ 8203.192246] CPU 1
[ 8203.192256] Modules linked in: sha256_generic cryptd aes_x86_64 aes_generic 
cbc dm_crypt fuse snd_pcm psmouse tpm_tis tpm i2c_i801 snd_timer snd soundcore 
snd_page_alloc i3200_edac tpm_bios serio_raw evdev pcspkr processor button 
thermal_sys i2c_core container edac_core sg sr_mod cdrom ext4 mbcache jbd2 
crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress usb_storage 
sd_mod crc_t10dif uas uhci_hcd ahci libahci libata ehci_hcd e1000e scsi_mod 
usbcore [last unloaded: scsi_wait_scan]
[ 8203.192603]
[ 8203.192630] Pid: 747, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 
empty empty/Tyan Tank GT20 B5211
[ 8203.192697] RIP: 0010:[a014b0f4]  [a014b0f4] 
btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs]
[ 8203.192781] RSP: 0018:8801483ffde0  EFLAGS: 00010246
[ 8203.192816] RAX:  RBX: ea000496a430 RCX: 
[ 8203.192855] RDX:  RSI: 06849000 RDI: 880071c1fcb8
[ 8203.192893] RBP: 06849000 R08: 0008 R09: 8801483ffd98
[ 8203.192932] R10: dead00200200 R11: dead00100100 R12: 880071c1fd90
[ 8203.192971] R13:  R14: 8801483ffdf8 R15: 06849fff
[ 8203.193010] FS:  () GS:88014fd0() 
knlGS:
[ 8203.193067] CS:  0010 DS:  ES:  CR0: 8005003b
[ 8203.193103] CR2: f7596000 CR3: 00013def9000 CR4: 06e0
[ 8203.193141] DR0:  DR1:  DR2: 
[ 8203.193180] DR3:  DR6: 0ff0 DR7: 0400
[ 8203.193219] Process btrfs-fixup-0 (pid: 747, threadinfo 8801483fe000, 
task 88014672efa0)
[ 8203.193277] Stack:
[ 8203.193304]  880071c1fc28 8800c70165c0  
88011e61ca28
[ 8203.193371]   880146ef41c0 880146ef4210 
880146ef41d8
[ 8203.193434]  880146ef41c8 880146ef4200 880146ef41e8 
a01669fa
[ 8203.193497] Call Trace:
[ 8203.193538]  [a01669fa] ? worker_loop+0x186/0x4a1 [btrfs]
[ 8203.193579]  [813369ca] ? schedule+0x5ed/0x61a
[ 8203.193624]  [a0166874] ? btrfs_queue_worker+0x24a/0x24a [btrfs]
[ 8203.193673]  [a0166874] ? btrfs_queue_worker+0x24a/0x24a [btrfs]
[ 8203.193714]  [8105faed] ? kthread+0x7a/0x82
[ 8203.193750]  [8133e524] ? kernel_thread_helper+0x4/0x10
[ 8203.193788]  [8105fa73] ? kthread_worker_fn+0x147/0x147
[ 8203.193825]  [8133e520] ? gs_change+0x13/0x13
[ 8203.193859] Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 fb 
ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff 0f 0b 
eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48
[ 8203.194087] RIP  [a014b0f4] 
btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs]
[ 8203.194160]  RSP 8801483ffde0
[ 8203.194907] ---[ end trace 9744d33381de3d04 ]---

Should I be worried?

Cheers,
Stephane


Re: subvolumes missing from btrfs subvolume list output

2011-06-30 Thread Stephane Chazelas
2011-06-30 11:18:42 +0200, Andreas Philipp:
[...]
  After that, I posted a patch to fix btrfs-progs, which Chris
  agreed on:
 
  http://marc.info/?l=linux-btrfs&m=129238454714319&w=2
  [...]
 
  Great. Thanks a lot
 
  It fixes my problem indeed.
 
  Which brings me to my next question: where to find the latest
  btrfs-progs if not at
  git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
[...]
 Hugo Mills keeps an integration branch with nearly all patches to
 btrfs-progs applied.
 See
 
 http://www.spinics.net/lists/linux-btrfs/msg10594.html
 
 and for the last update
 
 http://www.spinics.net/lists/linux-btrfs/msg10890.html
[...]

Thanks.

It might be worth adding a link to that to
https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories

Note that it (integration-20110626) doesn't seem to include the fix in
http://marc.info/?l=linux-btrfs&m=129238454714319&w=2 though.

-- 
Stephane


[PATCH] [btrfs-progs integration] incorrect argument checking for btrfs sub snap -r

2011-06-30 Thread Stephane Chazelas
Looks like this was missing in integration-20110626 for the
readonly snapshot patch:

diff --git a/btrfs.c b/btrfs.c
index e117172..be6ece5 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -49,7 +49,7 @@ static struct Command commands[] = {
 	/*
 		avoid short commands different for the case only
 	*/
-	{ do_clone, 2,
+	{ do_clone, -1,
 	  "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
 		"Create a writable/readonly snapshot of the subvolume <source> with\n"
 		"the name <name> in the <dest> directory.",

Without that, btrfs sub snap -r x y would fail as it's not *2*
arguments.

-- 
Stephane


subvolumes missing from btrfs subvolume list output

2011-06-29 Thread Stephane Chazelas
Hiya,

I've got a btrfs FS with 84 subvolumes in it (some created
with btrfs sub create, some with btrfs sub snap of the other
ones). There's no nesting of subvolumes at all (all direct
children of the root subvolume).

The btrfs subvolume list is only showing 80 subvolumes. The 4
missing ones (1 original volume, 3 snapshots) do exist on disk
and files in there have different st_devs from any other
subvolume.

How would I start investigating what's wrong? And how would I fix it?

I found
http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208

which looks like the same issue, with Li Zefan saying he had a
fix, but I couldn't find any mention that it was actually fixed.

Has anybody got any update on that?

Thanks in advance,
Stephane


Re: subvolumes missing from btrfs subvolume list output

2011-06-29 Thread Stephane Chazelas
2011-06-29 15:37:47 +0100, Stephane Chazelas:
[...]
 I found
 http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208
 
 which looks like the same issue, with Li Zefan saying he had a
 fix, but I couldn't find any mention that it was actually fixed.
 
 Has anybody got any update on that?
[...]

I've found
http://thread.gmane.org/gmane.comp.file-systems.btrfs/8232

but no corresponding fix in ioctl.c:
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=history;f=fs/btrfs/ioctl.c

I'm under the impression that the issue has been forgotten
about.

From what I managed to gather though, it seems that what's on
disk is correct, it's just the ioctl and/or btrfs sub list
that's wrong. Am I right?

(btw, I forgot to mention the kernel version: 3.0rc4 amd64,
btrfs tools from git)

Cheers,
Stephane


different st_dev's in one subvolume

2011-06-01 Thread Stephane Chazelas
Hiya,

please consider this:

~# truncate -s1G ./a
~# mkfs.btrfs ./a
~# sudo mount -o loop ./a /mnt/1
~# cd /mnt/1
/mnt/1# ls
/mnt/1# btrfs sub c A
Create subvolume './A'
/mnt/1# btrfs sub c A/B
Create subvolume 'A/B'
/mnt/1# touch A/inA A/B/inB
/mnt/1# btrfs sub snap A A.snap
Create a snapshot of 'A' in './A.snap'
/mnt/1# zmodload zsh/stat
/mnt/1# zstat +device ./**/*
. 25
A 26
A/B 27
A/B/inB 27
A/inA 26
A.snap 28
A.snap/B 23
A.snap/inA 28

Why does A.snap/B have a different st_dev from A.snap's?
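
For those without zsh's zstat, here is a rough Python equivalent that collects st_dev for every path under a directory and reports where st_dev changes relative to the parent directory. The function names are mine, and the demo runs on a throwaway temporary directory rather than on a btrfs mount, so it only illustrates the mechanics.

```python
import os
import tempfile


def st_dev_by_path(root):
    """Map each path under `root` (relative to it) to its lstat() st_dev."""
    devices = {".": os.lstat(root).st_dev}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            devices[os.path.relpath(full, root)] = os.lstat(full).st_dev
    return devices


def boundaries(devices):
    """Return paths whose st_dev differs from their parent directory's."""
    changed = []
    for path, dev in devices.items():
        if path == ".":
            continue
        parent = os.path.dirname(path) or "."
        if devices[parent] != dev:
            changed.append(path)
    return sorted(changed)


# Demo on a plain temporary directory mimicking the A, A/B, A/inA layout.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "A", "B"))
    open(os.path.join(tmp, "A", "inA"), "w").close()
    demo = st_dev_by_path(tmp)
    for path in sorted(demo):
        print(path, demo[path])
    print("st_dev changes at:", boundaries(demo))
```

Run against /mnt/1 from the transcript above, boundaries() would be expected to list A, A/B, A.snap and A.snap/B, since each of those shows a st_dev different from its parent's.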

Also:

/mnt/1# touch A.snap/B/foo
touch: cannot touch `A.snap/B/foo': Permission denied

I can rmdir that directory OK though.

Also note that the permissions are different:

/mnt/1# ll A
total 0
drwx------ 1 root root 6 Jun  2 00:54 B/
-rw-r--r-- 1 root root 0 Jun  2 00:54 inA
/mnt/1# ll A.snap
total 0
drwxr-xr-x 1 root root 0 Jun  2 01:29 B/
-rw-r--r-- 1 root root 0 Jun  2 00:54 inA

If I create another snap of A or A.snap, the B in there gets
the same st_dev (23).

/mnt/1# btrfs sub create A.snap/B/C
Create subvolume 'A.snap/B/C'
ERROR: cannot create subvolume
# btrfs sub snap A.snap/B B.snap
ERROR: 'A.snap/B' is not a subvolume

-- 
Stephane


Re: different st_dev's in one subvolume

2011-06-01 Thread Stephane Chazelas
2011-06-02 01:39:41 +0100, Stephane Chazelas:
[...]
 /mnt/1# zstat +device ./**/*
 . 25
 A 26
 A/B 27
 A/B/inB 27
 A/inA 26
 A.snap 28
 A.snap/B 23
 A.snap/inA 28
 
 Why does A.snap/B have a different st_dev from A.snap's?
[...]
 If I create another snap of A or A.snap, the B in there gets
 the same st_dev (23).
[...]

And same inode, ctime, mtime, atime... And when I create a new
snapshot, all those (regardless of where they are) have their
times updated at once.

I also noticed the st_nlink is always one but then came across
http://thread.gmane.org/gmane.comp.file-systems.btrfs/4580

-- 
Stephane


Re: strange btrfs sub list output

2011-05-31 Thread Stephane Chazelas
2011-05-27 13:49:52 +0200, Andreas Philipp:
[...]
  Thanks, I can understand that. What I don't get is how one creates
  a subvol with a top-level other than 5. I might be missing the
  obvious, though.
 
  If I do:
 
  btrfs sub create A
  btrfs sub create A/B
  btrfs sub snap A A/B/C
 
  A, A/B, A/B/C have their top-level being 5. How would I get a new
  snapshot to be a child of A/B for instance?
 
  In my case, 285, was not appearing in the btrfs sub list output,
  287 was a child of 285 with path data while all I did was create
  a snapshot of 284 (path u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol
  5) in u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30
 
  So I did manage to get a volume with a parent other than 5, but I
  did not ask for it.
[...]
 Reconsidering the explanations on btrfs subvolume list in this thread
 I get the impression that a line in the output of btrfs subvolume list
 with top level other than 5 indicates that the backrefs from one
 subvolume to its parent are broken.
 
 What's your opinion on this?
[...]

Given that I don't really get what the parent-child relationship
means in that context, I can't really comment.

In effect, the snapshot had been created and was attached to the
right directory (but didn't appear in the sub list), and there
was an additional data volume that I had not asked for nor
created that had the snapshot above as parent and that did
appear in the sub list.

It looks very much like a bug to me. I'd like to understand
more so that I can try to avoid running into it again.

-- 
Stephane


Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-26 22:22:03 +0100, Stephane Chazelas:
[...]
 I get a btrfs sub list output that I don't understand:
 
 # btrfs sub list /backup/
 ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data
 ID 260 top level 5 path u2/linux/lvm/linux/var/data
 ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11
 ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07
 ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07
 ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07
 ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26
 ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08
 ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22
 ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15
 ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14
 ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14
 ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14
 ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29
 ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26
 ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07
 ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01
 ID 277 top level 5 path u2/linux/lvm/linux/home/data
 ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27
 ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27
 ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27
 ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data
 ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19
 ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data
 ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data
 ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24
 ID 287 top level 285 path data
 ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data
 ID 289 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11
 ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data
 ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11
 ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11
[...]
 There is no /backup/data directory. There is however a
 /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that
 contains the same thing as what I get if I mount the fs with
 subvolid=287. And I did do a btrfs sub snap data
 snapshots/2011-03-30 there.
 
 What could be the cause of that? How to fix it?
 
 In case that matters, there used to be more components in the
 path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data.
[...]

I tried deleting the
/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30
subvolume (what seems to be id 287) and I get:

# btrfs sub delete snapshots/2011-03-30
Delete subvolume '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30'
ERROR: cannot delete 
'/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30'

With a strace, it tells me:

ioctl(3, 0x5000940f, 0x7fffc7841a80)= -1 ENOTEMPTY (Directory not empty)

Then I realised that there was a data directory in there and
that snapshots/2011-03-30 was actually id 285 (which doesn't
appear in the btrfs sub list) and snapshots/2011-03-30/data is
id 287.

What do those top-level IDs mean by the way?

Then I was able to delete snapshots/2011-03-30/data, but
snapshots/2011-03-30 still didn't appear in the list.

Then I was able to delete snapshots/2011-03-30 and recreate it,
and this time it was fine.

Still don't know what happened there.

-- 
Stephane



Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-27 10:21:03 +0200, Andreas Philipp:
[...]
  What do those top-level IDs mean by the way?
 The top-level ID associated with a subvolume is NOT the ID of this
 particular subvolume but of the subvolume containing it. Since the
 root/initial (sub-)volume has always ID 0, the subvolumes of depth
 1 will all have top-level ID set to 0. You need those top-level IDs to
 correctly mount a specific subvolume by name.
 
 # mount /dev/dummy -o subvol=subvolume,subvolrootid=top-level ID
 /mountpoint
 
 Of course, you do need them, if you specify the subvolume to mount by
 its ID.
[...]

Thanks Andreas for pointing out that subvolrootid option (it
might be worth adding it to
https://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options
BTW).

In my case, on a freshly made btrfs file system, subvolumes have
top-level 5 (and neither the volume with ID 0 nor the one with
ID 5 appears in the btrfs sub list).

All the top-levels are 5, and I don't even know how to create a
subvolume with a different top-level there, so I wonder how the
subvol that I had created with

btrfs sub snap data snapshots/2011-03-30

ended up being a subvolume with ID 285 that doesn't appear in
the btrfs sub list and contains a subvolume of path data
(with its top-level being 285). All the other
subvolumes and snapshots I've created in the exact same way were
created with a top-level of 5, have an entry in btrfs sub list,
and don't have subvolumes of their own.

-- 
Stephane


Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
Is there a way to derive the subvolume ID from the stat(2)
st_dev, by the way?

# btrfs sub list .
ID 256 top level 5 path a
ID 257 top level 5 path b
# zstat +dev . a b
. 27
a 28
b 29

Are the dev numbers allocated in the same order as the
subvolids? Would there be any /sys, /proc, ioctl interface to
get this kind of information?

-- 
Stephane


Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-27 10:12:24 +0100, Hugo Mills:
[skipped useful clarification]
 
That's all rather dense, and probably too much information. Hope
 it's helpful, though.
[...]

It is, thanks.

How would one end up in a situation where the output of btrfs
sub list . has:

ID 287 top level 285 path data

How could a subvolume 285 become a top level?

How does one get a subvolume with a top-level other than 5?
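
To spot such entries in a long listing, here is a small Python sketch that parses "btrfs sub list" output in the format shown in this thread and reports subvolumes whose top level is neither 5 nor the ID of another listed subvolume. The function names are mine, and treating 5 as the default top level is an assumption taken from the listings in this thread.

```python
import re

LINE_RE = re.compile(r"^ID (\d+) top level (\d+) path (.+)$")


def parse_sub_list(text):
    """Parse `btrfs sub list` lines into (id, top_level, path) tuples."""
    subvols = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            subvols.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return subvols


def unusual_top_levels(subvols, root_id=5):
    """Return entries whose top level is neither root_id nor a listed ID."""
    listed = {sid for sid, _, _ in subvols}
    return [s for s in subvols if s[1] != root_id and s[1] not in listed]


# Three lines taken from the listing earlier in this thread.
SAMPLE = """\
ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data
ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24
ID 287 top level 285 path data
"""

print(unusual_top_levels(parse_sub_list(SAMPLE)))
# [(287, 285, 'data')]
```

Here 287 is flagged because its top level, 285, does not itself appear in the list.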

-- 
Stephane


Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-27 10:45:23 +0100, Hugo Mills:
[...]
  How could a subvolume 285 become a top level?
 
  How does one get a subvolume with a top-level other than 5?
 
This just means that subvolume 287 was created (somewhere) inside
 subvolume 285.
 
Due to the way that the FS trees and subvolumes work, there's no
 global namespace structure in btrfs; that is, there's no single data
 structure that represents the entirety of the file/directory hierarchy
 in the filesystem. Instead, it's broken up into these sub-namespaces
 called subvolumes, and we only record parent/child relationships for
 each subvolume separately. The full path you get from btrfs subv
 list is reconstructed from that information in userspace(*).
[...]

Thanks, I can understand that. What I don't get is how one
creates a subvol with a top-level other than 5. I might be
missing the obvious, though.

If I do:

btrfs sub create A
btrfs sub create A/B
btrfs sub snap A A/B/C

A, A/B, A/B/C have their top-level being 5. How would I get a
new snapshot to be a child of A/B for instance?

In my case, 285, was not appearing in the btrfs sub list output,
287 was a child of 285 with path data while all I did was
create a snapshot of 284 (path
u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol 5) in
u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30

So I did manage to get a volume with a parent other than 5, but
I did not ask for it.

-- 
Stephane


strange btrfs sub list output

2011-05-26 Thread Stephane Chazelas
Hiya,

I get a btrfs sub list output that I don't understand:

# btrfs sub list /backup/
ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data
ID 260 top level 5 path u2/linux/lvm/linux/var/data
ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11
ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07
ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07
ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07
ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26
ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08
ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22
ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15
ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14
ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14
ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14
ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29
ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26
ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07
ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01
ID 277 top level 5 path u2/linux/lvm/linux/home/data
ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27
ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27
ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27
ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data
ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19
ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data
ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data
ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24
ID 287 top level 285 path data
ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data
ID 289 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11
ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data
ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11
ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11

See ID 287 above.
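
To spot such entries in a longer listing, a small filter can help. This is only a sketch: it assumes the "ID <id> top level <parent> path <path>" line format shown above, that 5 is the ID of the filesystem's top-level tree, and that paths contain no spaces. odd_parents is a made-up helper name, not a btrfs command.

```shell
# Print "ID parent path" for every subvolume whose "top level" is not 5.
# Reads `btrfs sub list` style lines on stdin.
odd_parents() {
    awk '$1 == "ID" && $3 == "top" && $4 == "level" && $5 != 5 {
             print $2, $5, $7
         }'
}

# Usage (as root): btrfs sub list /backup | odd_parents
```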

There is no /backup/data directory. There is however a
/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that
contains the same thing as what I get if I mount the fs with
subvolid=287. And I did do a btrfs sub snap data
snapshots/2011-03-30 there.

What could be the cause of that? How to fix it?

In case that matters, there used to be more components in the
path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data.

Thanks,
Stephane


Re: curious writes on mounted, not used btrfs filesystem

2011-05-22 Thread Stephane Chazelas
2011-05-21 14:58:21 +0200, Tomasz Chmielewski:
 I have a btrfs filesystem (2.6.39) which is mounted, but otherwise, not used:
 
 # lsof -n|grep /mnt/btrfs

Processes with open fds are one thing. You could also have loop
devices set up on it, for instance.

 #
 
 
 I noticed that whenever I do sync, btrfs will write for around 6.5s and 
 write 13 MB (see below).
[...]

You could try and play with /proc/sys/vm/block_dump to see what
is being written (remember to disable logging of kernel messages
by syslog).

-- 
Stephane


Re: curious writes on mounted, not used btrfs filesystem

2011-05-22 Thread Stephane Chazelas
2011-05-22 11:52:37 +0200, Tomasz Chmielewski:
[...]
 Can you try running these commands yourself:
 
 iostat -k 1 /your/btrfs/device
 
 
 And in a second terminal:
 
 while true; do sync ; done
 
 
 To see if your btrfs makes writes on sync each time?
[...]

Yes it does. And I see:

<7>[38554.244219] sync(3354): dirtied inode 1 (?) on dm-11
<7>[38554.244740] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors)
<7>[38554.244774] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors)
<7>[38554.249963] sync(3354): WRITE block 128 on dm-11 (8 sectors)
<7>[38554.250010] sync(3354): WRITE block 131072 on dm-11 (8 sectors)

<7>[38567.312908] sync(3356): dirtied inode 1 (?) on dm-11
<7>[38567.313330] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors)
<7>[38567.313350] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors)
<7>[38567.313358] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors)
<7>[38567.313366] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors)
<7>[38567.313393] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors)
<7>[38567.313403] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors)
<7>[38567.325194] sync(3356): WRITE block 128 on dm-11 (8 sectors)
<7>[38567.325244] sync(3356): WRITE block 131072 on dm-11 (8 sectors)

<7>[38570.374449] sync(3358): dirtied inode 1 (?) on dm-11
<7>[38570.374976] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors)
<7>[38570.375011] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors)
<7>[38570.379221] sync(3358): WRITE block 128 on dm-11 (8 sectors)
<7>[38570.379272] sync(3358): WRITE block 131072 on dm-11 (8 sectors)

<7>[38572.170816] sync(3359): dirtied inode 1 (?) on dm-11
<7>[38572.171289] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors)
<7>[38572.171300] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors)
<7>[38572.171304] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors)
<7>[38572.171308] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors)
<7>[38572.171320] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors)
<7>[38572.171325] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors)
<7>[38572.180338] sync(3359): WRITE block 128 on dm-11 (8 sectors)
<7>[38572.180386] sync(3359): WRITE block 131072 on dm-11 (8 sectors)

<7>[38574.186559] sync(3360): dirtied inode 1 (?) on dm-11
<7>[38574.187090] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors)
<7>[38574.187125] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors)
<7>[38574.191602] sync(3360): WRITE block 128 on dm-11 (8 sectors)
<7>[38574.191654] sync(3360): WRITE block 131072 on dm-11 (8 sectors)

<7>[38576.370003] sync(3361): dirtied inode 1 (?) on dm-11
<7>[38576.370452] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors)
<7>[38576.370470] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors)
<7>[38576.370478] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors)
<7>[38576.370485] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors)
<7>[38576.370513] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors)
<7>[38576.370523] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors)
<7>[38576.379718] sync(3361): WRITE block 128 on dm-11 (8 sectors)
<7>[38576.379766] sync(3361): WRITE block 131072 on dm-11 (8 sectors)

Every other sync the same.
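
To compare one sync with the next, the volume written per device can be totalled from such lines. A rough sketch, assuming the "WRITE block N on DEV (S sectors)" message format seen above and 512-byte sectors; written_bytes is a made-up helper name, and with several devices the output order is unspecified.

```shell
# Sum the sectors of all "WRITE block" lines read on stdin, per device,
# and print "device bytes" (assumes 512-byte sectors).
written_bytes() {
    awk '{
             for (i = 1; i + 5 <= NF; i++)
                 if ($i == "WRITE" && $(i + 1) == "block") {
                     n = $(i + 5)            # e.g. "(56"
                     gsub(/[^0-9]/, "", n)   # keep the digits only
                     sum[$(i + 4)] += n
                 }
         }
         END { for (d in sum) print d, sum[d] * 512 }'
}

# Usage: dmesg | written_bytes
```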

-- 
Stephane


Re: ssd option for USB flash drive?

2011-05-19 Thread Stephane Chazelas
2011-05-19 21:04:54 +0200, Hubert Kario:
 On Wednesday 18 of May 2011 00:02:52 Stephane Chazelas wrote:
  Hiya,
  
  I've not found much detail on what the ssd btrfs mount option
  did. Would it make sense to enable it to a fs on a USB flash
  drive?
 
 yes, enabling discard is pointless though (no USB storage supports it AFAIK).
  
  I'm using btrfs (over LVM) on a Live Linux USB stick to benefit
  from btrfs's compression and am trying to improve the
  performance.
 
 ssd mode won't improve performance by much (if any).
 
 You need to remember that USB2.0 is limited to about 20-30MiB/s (depending on 
 CPU) so it will be slow no matter what you do

Thanks Hubert for the feedback.

Well, for hard drives over USB, I can get to 40MiB/s read and
write easily. Here, I believe the bottleneck is the flash
memory. With that particular USB flash drive Corsair Voyager GT
16GB, I can get 25MiB/s sequential read and 17MiB/s sequential
write, but that falls down to about 3-5MiB/s random write.

[...]
 aligning logical blocks to erase blocks can give some performance but the
 only way to make it really fast is not to use USB
[...]

For something that fits in your pocket and is almost
universally bootable, there are not so many other options.

I tried changing the alignment on FAT32 and it didn't make
any difference. Playing with /proc/sys/vm/block_dump, I could see
chunks of 3, 4, 5 data sectors being written at once regardless
of the cluster size being used anyway. Interestingly when a user
process writes to /dev/sdx, block_dump shows 4k writes to
/dev/sdx only regardless of the size of the user writes while if
it goes via the filesystem I can see writes of up to 120k.

Also, I've very little knowledge of what happens at the layers
below the block device (scsi interface, usb-storage, and the
device controller itself). For instance, I see that
/sys/block/sdi/queue/rotational is 1 for that usb stick. Why is
that, and what does it mean in terms of performance and
scheduling of reads and writes?

I wonder now what credit to give to recommendations like in
http://www.patriotmemory.com/forums/showthread.php?3696-HOWTO-Increase-write-speed-by-aligning-FAT32
http://linux-howto-guide.blogspot.com/2009/10/increase-usb-flash-drive-write-speed.html

Doing an apt-get upgrade on that stick takes hours when the same
takes a few minutes on an internal drive.

If I boot a kvm virtual machine on that USB stick with a disk
cache mode of unsafe, that is writes are hardly ever flushed
to underlying storage, then that becomes lightning fast (at the
expense of possibly losing data in case of host failure, but I'm
not too worried about that), and flushing writes to device
upon VM shutdown only takes a couple of minutes.

So I figured that if I could make sure writing to the flash
device is asynchronous (and reads privileged), that would help.

There are probably solutions based on aufs or fuse, but I
thought there might be something in btrfs itself or in the
standard core layers usually underneath it.

-- 
Stephane


Re: ssd option for USB flash drive?

2011-05-19 Thread Stephane Chazelas
2011-05-19 15:54:23 -0600, cwillu:
[...]
 Try with the ssd_spread mount option.
[...]

Thanks. I'll try that.

  I wonder now what credit to give to recommendations like in
  http://www.patriotmemory.com/forums/showthread.php?3696-HOWTO-Increase-write-speed-by-aligning-FAT32
  http://linux-howto-guide.blogspot.com/2009/10/increase-usb-flash-drive-write-speed.html
 
  Doing an apt-get upgrade on that stick takes hours when the same
  takes a few minutes on an internal drive.
 
 Also, there's a package libeatmydata which will provide an
 eatmydata command, which you can prefix your apt-get commands with.
 This will disable the excessive sync calls that dpkg makes, and should
 dramatically decrease the time for those sorts of things to finish.
 Flash as found in thumb drives doesn't have much in the way of crash
 guarantees anyway, so you're not really giving up much safety.

Thanks. That's very useful indeed.

Note that if you use that on aptitude/apt-get that means that
the daemons started/restarted in the process will be affected,
but it could be all the better in my case.
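
For scripted use, the prefix can be made optional so that the same script still runs where libeatmydata is not installed. A small sketch; nosync is a hypothetical wrapper name, and the fallback simply runs the command unchanged.

```shell
# Run a command under eatmydata when it is installed, else run it as is.
nosync() {
    if command -v eatmydata >/dev/null 2>&1; then
        eatmydata "$@"
    else
        "$@"
    fi
}

# Usage: nosync apt-get upgrade
```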

Now, with that eatmydata, I'm thinking of trying qemu-nbd -c
/dev/nbd0 /dev/mapper/original-device with that and have the
rootfs mounted on that /dev/nbd0.

That eatmydata could be a workaround to the problem I was
mentioning at
https://lists.ubuntu.com/archives/ubuntu-server-bugs/2010-June/037846.html

-- 
Stephane


ssd option for USB flash drive?

2011-05-17 Thread Stephane Chazelas
Hiya,

I've not found much detail on what the ssd btrfs mount option
did. Would it make sense to enable it to a fs on a USB flash
drive?

I'm using btrfs (over LVM) on a Live Linux USB stick to benefit
from btrfs's compression and am trying to improve the
performance.

Would anybody have any recommendation on how to improve
performance there? Like what would be the best way to
enable/increase writeback buffer or any way to make sure writes
are delayed and asynchronous? Would disabling read-ahead help?
(at which level would it be done?). Any other tip (like
disabling atime, aligning blocks/extents, figure out erase block
sizes if relevant...)?

Many thanks in advance,
Stephane


Re: wrong values in df and btrfs filesystem df

2011-04-12 Thread Stephane Chazelas
2011-04-12 15:22:57 +0800, Miao Xie:
[...]
 But the algorithm of df command doesn't simulate the above allocation
 correctly, this simulated allocation just allocates the stripes from two
 disks, and then, these two disks have no free space, but the third disk
 still has 1.2TB free space, df command thinks this space can be used to
 make a new RAID0 block group and ignores it. This is a bug, I think.
[...]

Thanks a lot Miao for the detailed explanation. So, the disk
space is not lost, it's just df not reporting the available
space correctly. That's me relieved.

It explains why I'm getting:

# blockdev --getsize64 /dev/sda4
2967698087424
# blockdev --getsize64 /dev/sdb
3000592982016
# blockdev --getsize64 /dev/sdc
3000592982016
# truncate -s 2967698087424 a
# truncate -s 3000592982016 b
# truncate -s 3000592982016 c
# losetup /dev/loop0 ./a
# losetup /dev/loop1 ./b
# losetup /dev/loop2 ./c
# mkfs.btrfs a b c
# btrfs device scan /dev/loop[0-2]
Scanning for Btrfs filesystems in '/dev/loop0'
Scanning for Btrfs filesystems in '/dev/loop1'
Scanning for Btrfs filesystems in '/dev/loop2'
# mount  /dev/loop0 /mnt/1
# df -k /mnt/1
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/loop0           8758675828        56 5859474304   1% /mnt/1
# echo $(((8758675828 - 5859474304)*2**10))
2968782360576

One disk worth of space lost according to df.

While it should have been something more like
$(((3000592982016-2967698087424)*2)) (about 60GB), or about 0
after the quasi-round-robin allocation patch, right?
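
The subtraction above can be wrapped for reuse on other filesystems. A minimal sketch in POSIX shell arithmetic, assuming df -k style figures in 1K blocks and a shell with 64-bit integers; df_gap_bytes is a hypothetical name.

```shell
# Bytes that df counts in the filesystem size but not as available:
# (1K-blocks - Available) * 1024.
# usage: df_gap_bytes SIZE_KIB AVAIL_KIB
df_gap_bytes() {
    echo $(( ($1 - $2) * 1024 ))
}

# Usage: df_gap_bytes 8758675828 5859474304
```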

Best regards,
Stephane


Re: wrong values in df and btrfs filesystem df

2011-04-11 Thread Stephane Chazelas
2011-04-10 18:13:51 +0800, Miao Xie:
[...]
  # df /srv/MM
 
  Filesystem           1K-blocks      Used Available Use% Mounted on
  /dev/sdd1            5846053400 1593436456 2898463184  36% /srv/MM
 
  # btrfs filesystem df /srv/MM
 
  Data, RAID0: total=1.67TB, used=1.48TB
  System, RAID1: total=16.00MB, used=112.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=3.75GB, used=2.26GB
 
  # btrfs-show
 
  Label: MMedia  uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2
 Total devices 3 FS bytes used 1.48TB
 devid    3 size 1.81TB used 573.76GB path /dev/sdb1
 devid    2 size 1.81TB used 573.77GB path /dev/sde1
 devid    1 size 1.82TB used 570.01GB path /dev/sdd1
 
  Btrfs Btrfs v0.19
 
  
 
  df shows an Available value which isn't related to any real value.  
  
 I _think_ that value is the amount of space not allocated to any
  block group. If that's so, then Available (from df) plus the three
  total values (from btrfs fi df) should equal the size value from df.
 
 This value excludes the space that can not be allocated to any block group,
 This feature was implemented to fix the bug df command add the disk space,
 which can not be allocated to any block group forever, into the Available
 value.
 (see the changelog of the commit 6d07bcec969af335d4e35b3921131b7929bd634e)
 
 This implementation just like fake chunk allocation, but the fake allocation
 just allocate the space from two of these three disks, doesn't spread the
 stripes over all the disks, which has enough space.
[...]

Hi Miao,

would you care to expand a bit on that? In Helmut's case above
where all the drives have at least 1.2TB free, how would there
be un-allocatable space?

What's the implication of having disks of differing sizes? Does
that mean that the extra space on larger disks is lost?

Thanks,
Stephane


Re: btrfs balancing start - and stop?

2011-04-11 Thread Stephane Chazelas
2011-04-06 12:43:50 +0100, Stephane Chazelas:
[...]
 The rate is going down. It's now down to about 14kB/s
 
 [658654.295752] btrfs: relocating block group 3919858106368 flags 20
 [671932.913235] btrfs: relocating block group 3919589670912 flags 20
 [686189.296126] btrfs: relocating block group 3919321235456 flags 20
 [701511.523990] btrfs: relocating block group 3919052800000 flags 20
 [718591.316339] btrfs: relocating block group 3918784364544 flags 20
 [725567.081031] btrfs: relocating block group 3918515929088 flags 20
 [744415.011581] btrfs: relocating block group 3918247493632 flags 20
 [762365.021458] btrfs: relocating block group 3917979058176 flags 20
 [780504.726067] btrfs: relocating block group 3917710622720 flags 20
[...]
 At this rate, the balancing would be over in about 8 years.
[...]

Hurray! The btrfs balance eventually ran through after almost exactly 2 weeks.
It didn't get down to 0:

[1189505.152717] btrfs: found 60527 extents
[1189505.440565] btrfs: relocating block group 3910731300864 flags 20
[1199805.071045] btrfs: found 60235 extents
[1199805.447821] btrfs: relocating block group 3910462865408 flags 20
[1207914.737372] btrfs: found 58039 extents

iostat reckons 9TB have been written to disk in the whole
process (4.5TB read from them (!?)).

There hasn't been any change in allocation though:

# df -h /backup
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda4             8.2T  3.5T  3.2T  53% /backup
# btrfs fi df /backup
Data, RAID0: total=3.42TB, used=3.41TB
System, RAID1: total=16.00MB, used=228.00KB
Metadata, RAID1: total=28.00GB, used=20.47GB
# btrfs fi show
Label: none  uuid: a0ae35c4-51f2-405f-a4bb-e4f134b1d193
Total devices 3 FS bytes used 3.43TB
devid    4 size 2.73TB used 1.17TB path /dev/sdc
devid    3 size 2.73TB used 1.17TB path /dev/sdb
devid    2 size 2.70TB used 1.14TB path /dev/sda4

Btrfs Btrfs v0.19

Still 1.5TB missing.

-- 
Stephane


Re: wrong values in df and btrfs filesystem df

2011-04-09 Thread Stephane Chazelas
2011-04-09 10:11:41 +0100, Hugo Mills:
[...]
  # df /srv/MM
  
  Filesystem           1K-blocks      Used Available Use% Mounted on
  /dev/sdd1            5846053400 1593436456 2898463184  36% /srv/MM
  
  # btrfs filesystem df /srv/MM
  
  Data, RAID0: total=1.67TB, used=1.48TB
  System, RAID1: total=16.00MB, used=112.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=3.75GB, used=2.26GB
  
  # btrfs-show
  
  Label: MMedia  uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2
  Total devices 3 FS bytes used 1.48TB
  devid    3 size 1.81TB used 573.76GB path /dev/sdb1
  devid    2 size 1.81TB used 573.77GB path /dev/sde1
  devid    1 size 1.82TB used 570.01GB path /dev/sdd1
  
  Btrfs Btrfs v0.19
  
  
  
  df shows an Available value which isn't related to any real value.  
 
I _think_ that value is the amount of space not allocated to any
 block group. If that's so, then Available (from df) plus the three
 total values (from btrfs fi df) should equal the size value from df.
[...]

Well,

$ echo $((2898463184 + 1.67*2**30 + 4*2**10 + 16*2**10*2 + 3.75*2**20*2))
4699513214.079

I do get the same kind of discrepancy:

$ df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb              8.2T  3.5T  3.2T  53% /mnt
$ sudo btrfs fi show
Label: none  uuid: ...
Total devices 3 FS bytes used 3.43TB
devid    4 size 2.73TB used 1.17TB path /dev/sdc
devid    3 size 2.73TB used 1.17TB path /dev/sdb
devid    2 size 2.70TB used 1.14TB path /dev/sda4
$ sudo btrfs fi df /mnt
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=35.25GB, used=20.55GB


$ echo $((3.2 + 3.41 + 2*16/2**20 + 2*35.25/2**10))
6.678847656253

-- 
Stephane



Re: btrfs balancing start - and stop?

2011-04-06 Thread Stephane Chazelas
2011-04-04 20:07:54 +0100, Stephane Chazelas:
[...]
   4.7 more days to go. And I reckon it will have written about 9
   TB to disk by that time (which is the total size of the volume,
   though only 3.8TB are occupied).
  
  Yes - that's the pessimistic estimation. As Hugo has explained it can  
  finish faster - just look to the data tomorrow again.
 [...]
 
 That may be an optimistic estimation actually, as there hasn't
 been much progress in the last 34 hours:
[...]

The rate is going down. It's now down to about 14kB/s

[658654.295752] btrfs: relocating block group 3919858106368 flags 20
[671932.913235] btrfs: relocating block group 3919589670912 flags 20
[686189.296126] btrfs: relocating block group 3919321235456 flags 20
[701511.523990] btrfs: relocating block group 3919052800000 flags 20
[718591.316339] btrfs: relocating block group 3918784364544 flags 20
[725567.081031] btrfs: relocating block group 3918515929088 flags 20
[744415.011581] btrfs: relocating block group 3918247493632 flags 20
[762365.021458] btrfs: relocating block group 3917979058176 flags 20
[780504.726067] btrfs: relocating block group 3917710622720 flags 20

Even though it is reading and writing to disk at a much higher
rate. Here stats every second:

--dsk/sda-----dsk/sdb-----dsk/sdc--
 read  writ: read  writ: read  writ
   0     0 : 540k    0 :  12k    0
   0     0 : 704k    0 :  20k    0
   0     0 :1068k    0 :  24k    0
   0     0 : 968k    0 :   0     0
   0     0 : 932k    0 :4096B    0
   0     0 : 832k  880k: 152k 1320k
  60k 4096B: 880k  140k:   0    28M
  68k    0 : 308k    0 :4096B 9240k
   0    48k:   0     0 :   0  7852k
   0     0 : 576k 6192k:4096B   26M
   0     0 : 100k   18M:   0     0
   0     0 :  28k   10M:   0     0
   0     0 :   0  7020k:   0     0
   0     0 :  52k   13M:   0     0
   0    12k: 528k   17M:   0    12k
   0     0 : 884k    0 :8192B    0
   0     0 :1068k    0 :  20k    0
   0     0 : 660k    0 :   0     0
   0    40k: 776k    0 :4096B    0
   0     0 : 576k    0 :   0     0
   0     0 : 596k    0 :8192B    0
1096k   28k: 664k    0 :4096B    0
   0     0 : 660k    0 :   0     0
   0     0 : 592k    0 :8192B    0

At this rate, the balancing would be over in about 8 years.
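
The 14kB/s figure can be reproduced from the relocation messages. A rough sketch, assuming that the distance between successive block group addresses approximates the amount of data relocated (which is how the figure above appears to be derived) and that the timestamps have no padding inside the brackets; reloc_rate is a hypothetical helper.

```shell
# Average relocation rate in bytes per second, from dmesg lines such as
#   [762365.021458] btrfs: relocating block group 3917979058176 flags 20
# read on stdin.
reloc_rate() {
    awk '/relocating block group/ {
             t = substr($1, 2) + 0      # "[762365.021458]" -> 762365.021458
             g = $6 + 0                 # block group address
             if (seen) {
                 bytes += (pg > g) ? pg - g : g - pg
                 secs += t - pt
             }
             pt = t; pg = g; seen = 1
         }
         END { if (secs > 0) printf "%.0f\n", bytes / secs }'
}

# Usage: dmesg | reloc_rate
```

At roughly 14.8kB/s, relocating the 3.41TB of data reported by btrfs fi df would take on the order of 8 years if all of it still had to be moved, which is consistent with the estimate above.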

Since the start of the balance:
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              10.04         1.56         1.57    1228286    1237359
sdc             396.24         1.77         3.95    1397015    3115057
sdb             421.17         1.87         3.95    1473759    3115093

I think that's the end of my attempt to transfer that FS to
another machine (see other thread). I'll have to ditch that copy
and try again from scratch with another approach.

Before I do that, is there anything I can do to help investigate
the problem?

regards,
Stephane


Re: cloning single-device btrfs file system onto multi-device one

2011-04-06 Thread Stephane Chazelas
2011-03-28 14:17:48 +0100, Stephane Chazelas:
[...]
 So here is how I transferred a 6TB btrfs on one 6TB raid5 device
 (on host src) over the network onto a btrfs on 3 3TB hard drives
[...]
 I then did a btrfs fi balance again and let it run through. However here is
 what I get:
[...]

Sorry, it didn't run through and it is still running (after 9
days) and there are indications it could still be running 8 years from
now (see other thread). There hasn't been any change in the
amount of free space reported by df since the beginning of the
balance (there are still 2TB missing).

Cheers,
Stephane


Re: btrfs balancing start - and stop?

2011-04-04 Thread Stephane Chazelas
2011-04-03 21:35:00 +0200, Helmut Hullen:
 Hallo, Stephane,
 
 Du meintest am 03.04.11:
 
  balancing about 2 TByte needed about 20 hours.
 
 [...]
 
  Hugo has explained the limits of regarding
 
  dmesg | grep relocating
 
  or (more simple) the last lines of dmesg and looking for the
  relocating lines. But: what do these lines tell now? What is the
  (pessimistic) estimation when you extrapolate the data?
 
 [...]
 
  4.7 more days to go. And I reckon it will have written about 9
  TB to disk by that time (which is the total size of the volume,
  though only 3.8TB are occupied).
 
 Yes - that's the pessimistic estimation. As Hugo has explained it can  
 finish faster - just look to the data tomorrow again.
[...]

That may be an optimistic estimation actually, as there hasn't
been much progress in the last 34 hours:

# dmesg | awk -F '[][ ]+' '/reloc/ && n++%5==0 {x=(n-$7)/($2-t)/1048576; printf "%s\t%s\t%.2f\t%*s\n", $2/3600,$7, x, x/3, ""; t=$2; n=$7}' | tr ' ' '*' | tail -40
125.629 4170039951360   11.93   ***
125.641 4166818725888   70.99   ***********************
125.699 4157155049472   43.87   **************
125.753 4144270147584   63.34   *********************
125.773 4137827696640   84.98   ****************************
125.786 4134606471168   64.39   *********************
125.823 4124942794752   70.09   ***********************
125.87  4112057892864   71.66   ***********************
125.887 4105615441920   100.60  *********************************
125.898 4102394216448   81.26   ***************************
125.935 4092730540032   69.06   ***********************
126.33  4085751218176   4.69    *
131.904 4072597880832   0.63
132.082 4059712978944   19.20   ******
132.12  4053270528000   45.52   ***************
132.138 4050049302528   45.60   ***************
132.225 4040385626112   29.68   *********
132.267 4027500724224   81.17   ***************************
132.283 4021058273280   106.31  ***********************************
132.29  4017837047808   110.42  ************************************
132.316 4008173371392   100.54  *********************************
132.358 3995288469504   81.18   ***************************
132.475 3988846018560   14.62   ****
132.514 3985624793088   21.55   *******
132.611 3975961116672   26.40   ********
132.663 3963076214784   65.31   *********************
132.678 3956633763840   120.11  ****************************************
132.685 3956365328384   10.26   ***
137.701 3949922877440   0.34
137.709 3946701651968   106.54  ***********************************
137.744 3937037975552   72.10   ************************
137.889 3927105863680   18.18   ******
137.901 3926837428224   5.85    *
141.555 3926300557312   0.04
141.93  3925226815488   0.76
151.227 3924421509120   0.02
151.491 3924153073664   0.27
151.712 3923616202752   0.64
165.301 3922542460928   0.02
174.346 3921737154560   0.02

At this rate (third field expressed in MiB/s), it could take
months to complete.

iostat still reports writes at about 5MiB/s though. Note that
this system is not doing anything else at all.

There definitely seems to be scope for optimisation in the
balancing I'd say.

-- 
Stephane
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balancing start - and stop?

2011-04-01 Thread Stephane Chazelas
On Fri, 2011-04-01 at 14:12 +0200, Helmut Hullen wrote:
 Hallo, Struan,
 
 Du meintest am 01.04.11:
 
  1) Is the balancing operation expected to take many hours (or days?)
  on a filesystem such as this? Or are there known issues with the
  algorithm that are yet to be addressed?
 
 May be. Balancing about 15 GByte needed about 2 hours (or less),  
 balancing about 2 TByte needed about 20 hours.
[...]

I've had a balance running since Monday on a 9TB volume (3.5 of which
are used, 3.2 allegedly free), showing no sign of finishing soon. Should
I be worried?

Using /proc/sys/vm/block_dump, I can see it's seeking all over the
place, which is probably why throughput is not high. I can also see it
writing several times to the same sectors.

# df -h /backup
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda4             8.2T  3.5T  3.2T  53% /backup
# btrfs fi sh
Label: none  uuid: ...
Total devices 3 FS bytes used 3.43TB
devid    4 size 2.73TB used 1.16TB path /dev/sdc
devid    3 size 2.73TB used 1.16TB path /dev/sdb
devid    2 size 2.70TB used 1.14TB path /dev/sda4

Btrfs Btrfs v0.19
# ps -eolstart,args | grep balance
Mon Mar 28 11:18:18 2011 sudo btrfs fi balance /backup
Mon Mar 28 11:18:18 2011 btrfs fi balance /backup
# date
Fri Apr  1 19:28:40 BST 2011
# btrfs fi df /backup
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=27.75GB, used=20.47GB
# iostat -md
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              14.49         2.37         2.39     903123     913112
sdc             501.23         2.68         5.06    1022456    1928462
sdb             477.28         2.58         5.06     982853    1928482

It's already written more than the used space.
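
The total written so far can be added up from the last column. A minimal sketch, assuming iostat -md prints six columns per device with MB_wrtn last and that the devices of interest are named sd*; total_mb_written is a hypothetical helper.

```shell
# Sum the MB_wrtn column (6th field) of `iostat -md` device lines on stdin.
total_mb_written() {
    awk '$1 ~ /^sd/ && NF == 6 { sum += $6 } END { print sum + 0 }'
}

# Usage: iostat -md | total_mb_written
```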

Cheers,
Stephane



Re: cloning single-device btrfs file system onto multi-device one

2011-03-28 Thread Stephane Chazelas
2011-03-22 18:06:29 -0600, cwillu:
  I can mount it back, but not if I reload the btrfs module, in which case I 
  get:
 
  [ 1961.328280] Btrfs loaded
  [ 1961.328695] device fsid df4e5454eb7b1c23-7a68fc421060b18b devid 1 transid 118 /dev/loop0
  [ 1961.329007] btrfs: failed to read the system array on loop0
  [ 1961.340084] btrfs: open_ctree failed
 
 Did you rescan all the loop devices (btrfs dev scan /dev/loop*) after
 reloading the module, before trying to mount again?

Thanks. That probably was the issue, that and using too big
files on too small volumes I'd guess.

I've tried it in real life and it seemed to work to some extent.
So here is how I transferred a 6TB btrfs on one 6TB raid5 device
(on host src) over the network onto a btrfs on 3 3TB hard drives
(on host dst):

on src:

lvcreate -s -L100G -n snap /dev/VG/vol
nbd-server 12345 /dev/VG/snap

(if you're not lucky enough to have used lvm there, you can use
nbd-server's copy-on-write feature).

on dst:

nbd-client src 12345 /dev/nbd0
mount /dev/nbd0 /mnt
btrfs device add /dev/sdb /dev/sdc /dev/sdd /mnt
  # in reality it was /dev/sda4 (a little under 3TB), /dev/sdb,
  # /dev/sdc
btrfs device delete /dev/nbd0 /mnt
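
The steps on dst can be collected into a dry-run helper that only prints the commands, so that they can be reviewed before anything touches the disks. This is a sketch under the assumption that the source is already attached as a block device (for instance over NBD as above); clone_plan and its argument order are made up for illustration.

```shell
# Print (do not run) the commands to move a btrfs from one source device
# onto a set of target devices.
# usage: clone_plan SRC_DEV MOUNTPOINT TARGET_DEV...
clone_plan() {
    src=$1
    mnt=$2
    shift 2
    echo "mount $src $mnt"
    echo "btrfs device add $* $mnt"
    echo "btrfs device delete $src $mnt"
}

# Usage: clone_plan /dev/nbd0 /mnt /dev/sdb /dev/sdc /dev/sdd
```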

That was relatively fast (about 18 hours) but failed with an
error. Apparently, it managed to fill up the 3 3TB drives (as
shown by btrfs fi show). Usage for /dev/nbd0 was at 16MB though
(?!)

I then did a btrfs fi balance /mnt. I could see usage on the
drives go down quickly. However, that was writing data onto
/dev/nbd0 so was threatening to fill up my LVM snapshot. I then
cancelled that by doing a hard reset on dst (couldn't find
any other way). And then:

Upon reboot, I mounted /dev/sdb instead of /dev/nbd0 in case
that made a difference and then ran the 

btrfs device delete /dev/nbd0 /mnt

again, which this time went through.

I then did a btrfs fi balance again and let it run through. However here is
what I get:

$ df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        8.2T  3.5T  3.2T  53% /mnt

Only 3.2T left. How would I reclaim the missing space?

$ sudo btrfs fi show
Label: none  uuid: ...
Total devices 3 FS bytes used 3.43TB
devid    4 size 2.73TB used 1.17TB path /dev/sdc
devid    3 size 2.73TB used 1.17TB path /dev/sdb
devid    2 size 2.70TB used 1.14TB path /dev/sda4
$ sudo btrfs fi df /mnt
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=35.25GB, used=20.55GB

So that kind of worked but that is of little use to me as 2TB
kind of disappeared under my feet in the process.
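
As a rough cross-check of the figures above, the per-device columns of
the btrfs fi show listing can be summed with a small awk helper. This is
only a sketch: sum_devices is a hypothetical name (not part of
btrfs-progs), and it assumes all sizes are printed in TB as they are
here.

```shell
# Sum the "size" and "used" columns of `btrfs fi show` output on stdin.
# Assumption: every value carries the same unit (TB in the listing above).
sum_devices() {
    LC_ALL=C awk '
        /devid/ {
            for (i = 1; i < NF; i++) {
                if ($i == "size") total += $(i + 1)   # "2.73TB" converts to 2.73
                if ($i == "used") alloc += $(i + 1)
            }
        }
        END {
            printf "size=%.2fTB allocated=%.2fTB unallocated=%.2fTB\n",
                   total, alloc, total - alloc
        }'
}
```

Fed the three devid lines above, it prints size=8.16TB allocated=3.48TB
unallocated=4.68TB. The 3.48TB allocated matches 3.41TB of RAID0 data
plus two copies of the 35.25GB RAID1 metadata, so about 4.68TB is still
unallocated, against the 3.2T that df reports as available.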

Any idea, anyone?

Thanks
Stephane


Re: cloning single-device btrfs file system onto multi-device one

2011-03-28 Thread Stephane Chazelas
2011-03-23 12:13:45 +0700, Fajar A. Nugraha:
 On Mon, Mar 21, 2011 at 11:24 PM, Stephane Chazelas
 stephane.chaze...@gmail.com wrote:
  AFAICT, compression is enabled at mount time and would
  only apply to newly created files. Is there a way to compress
  files already in a btrfs filesystem?
 
 You need to select the files manually (not possible to select a
 directory), but yes, it's possible using btrfs filesystem defragment
 -c
[...]

Thanks. However I find that for files that have snapshots, it
ends up increasing disk usage instead of reducing it (size of
the file + size of the compressed file, instead of size of the
file).

If I do the btrfs fi de on both the volume and its snapshot, I
end up with some benefit only if the compression ratio is over
2 (and with more snapshots, there's little chance of getting any
benefit at all). Also, with dozens of snapshots on a 4TB volume,
it's likely to take weeks to do.
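
The arithmetic behind that can be sketched as follows. This is a
simplified model, not a description of btrfs internals: it assumes the
volume and its snapshots initially share all extents of the file, and
that each defragmented copy ends up with its own compressed extents.
usage_after_defrag is a hypothetical helper name.

```shell
# usage_after_defrag SIZE RATIO COPIES DEFRAGGED
#   SIZE      uncompressed file size (any unit)
#   RATIO     compression ratio (2 means the file shrinks to half)
#   COPIES    number of subvolumes referencing the file (volume + snapshots)
#   DEFRAGGED how many of those were rewritten with defragment -c
usage_after_defrag() {
    LC_ALL=C awk -v s="$1" -v r="$2" -v n="$3" -v d="$4" 'BEGIN {
        usage = d * s / r          # one compressed copy per rewritten subvolume
        if (d < n) usage += s      # untouched copies still pin the original extents
        printf "%.1f\n", usage
    }'
}
```

For a file of size 100 with a ratio of 2.5, "usage_after_defrag 100 2.5 2 1"
gives 140.0 (worse than the original 100), "usage_after_defrag 100 2.5 2 2"
gives 80.0, and with three snapshots all rewritten,
"usage_after_defrag 100 2.5 4 4" gives 160.0. Under this model there is
a gain only when the ratio exceeds the number of copies.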

Is there a way around that?

Thanks
Stephane


Re: cloning single-device btrfs file system onto multi-device one

2011-03-22 Thread Stephane Chazelas
2011-03-21 16:24:50 +, Stephane Chazelas:
[...]
 I'm trying to move a btrfs FS that's on a hardware raid 5 (6TB
 large, 4 of which are in use) to another machine with 3 3TB HDs
 and preserve all the subvolumes/snapshots.
[...]

I tried one approach: export a LVM snapshot of the old fs as a
nbd device, mount it from the new machine (/dev/nbd0), then add
the new disks to the FS (btrfs device add) and then delete
/dev/nbd0 which I'd hope would relocate all the extents onto the
new disks.
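
The approach just described can be sketched as a dry-run script. The
volume group, snapshot size, port, host name and mount point are
placeholders taken from this thread; run, export_snapshot and
relocate_to are hypothetical helper names. By default the script only
prints the commands; nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Old machine: snapshot the volume and export the snapshot over nbd.
# /dev/VG/vol, 100G and port 12345 are placeholders.
export_snapshot() {
    run lvcreate -s -L 100G -n snap /dev/VG/vol
    run nbd-server 12345 /dev/VG/snap
}

# New machine: attach the export, add the new disks given as arguments,
# then delete the nbd device so that its extents get relocated.
relocate_to() {
    run nbd-client src 12345 /dev/nbd0
    run mount /dev/nbd0 /mnt
    run btrfs device add "$@" /mnt
    run btrfs device delete /dev/nbd0 /mnt
}
```

Calling "relocate_to /dev/sdb /dev/sdc /dev/sdd" prints the four
commands for review before anything touches the disks.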

I did some experiments with some loop devices but got all sorts
of results with different versions of kernels (debian unstable
2.6.37 and 2.6.38 amd64).

Here is what I did:

dd seek=512 bs=1M of=./a < /dev/null
dd seek=256 bs=1M of=./b < /dev/null
dd seek=256 bs=1M of=./c < /dev/null
mkfs.btrfs ./a
losetup /dev/loop0 ./a
losetup /dev/loop1 ./b
losetup /dev/loop2 ./c
mount /dev/loop0 /mnt
yes | head -c 300M > /mnt/test
btrfs device add /dev/loop1 /mnt
btrfs device add /dev/loop2 /mnt
# btrfs filesystem balance /mnt
btrfs device delete /dev/loop0 /mnt
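
The dd lines at the top of that listing only serve to create sparse
backing files for the loop devices. A minimal sketch of that step,
using truncate instead of dd and needing no root access, could look
like this (make_backing_file and apparent_size are hypothetical helper
names; the losetup, mkfs.btrfs and mount steps are left out since they
require root):

```shell
# Create a sparse backing file: make_backing_file PATH SIZE_IN_MIB.
# No data blocks are written, so the file uses almost no disk space
# until something is stored in it.
make_backing_file() {
    truncate -s "$(( $2 * 1024 * 1024 ))" "$1"
}

# Print the apparent size of a file in bytes.
apparent_size() {
    echo "$(( $(wc -c < "$1") ))"
}
```

For example, "make_backing_file ./a 512" followed by "apparent_size ./a"
reports 536870912 bytes, the same 512MB as the first dd line.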

In 2.6.38, upon the balance as well as upon the delete, it
seemed to go in a loop, the system at 70% wait, and some 
btrfs: found 1 extents
2 to 3 times per second in dmesg. I tried leaving it on for a few
hours and it didn't help. The only thing I could do is reboot.
Disk usage of the a, b, c files was not increasing, though
dstat -d showed some disk writing at ~500kB/s (so I suppose it
was writing the same blocks over and over and seeking a lot).

In 2.6.37, I managed to get it working once, though I don't
know how and never managed to reproduce it.

Upon the delete, I can see some relocations in dmesg output, but
then:

# btrfs device delete /dev/loop0 /mnt
ERROR: error removing the device '/dev/loop0'
(no error in dmesg)

Upon umount, here is what I find in dmesg:

[...]
[ 1802.357205] btrfs: relocating block group 0 flags 2
[ 1860.193351] [ cut here ]
[ 1860.193373] WARNING: at 
/build/buildd-linux-2.6_2.6.37-2-amd64-bITS0h/linux-2.6-2.6.37/debian/build/source_amd64_none/fs/btrfs/volumes.c:544
 __btrfs_close_devices+0xb5/0xd0 [btrfs]()
[ 1860.193379] Hardware name: MacBookPro4,1
[ 1860.193382] Modules linked in: btrfs libcrc32c hidp vboxnetadp vboxnetflt 
vboxdrv ip6table_filter ip6_tables ebtable_nat ebtables acpi_cpufreq mperf 
cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats 
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state 
nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp 
parport_pc ppdev lp parport sco bnep rfcomm l2cap kvm_intel binfmt_misc kvm 
deflate ctr twofish_generic twofish_x86_64 twofish_common camellia serpent 
blowfish cast5 des_generic cbc cryptd aes_x86_64 aes_generic xcbc rmd160 
sha512_generic sha256_generic sha1_generic hmac crypto_null af_key fuse nfsd 
exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc loop dm_crypt 
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss 
snd_mixer_oss snd_pcm uvcvideo videodev nouveau btusb bluetooth snd_seq_midi 
lib80211_crypt_tkip snd_rawmidi snd_seq_midi_event v4l1_compat rfkill snd_seq 
bcm5974 wl(P) ttm drm_kms_helper v4l2_compat_ioctl32 snd_timer snd_seq_device 
drm i2c_i801 i2c_algo_bit snd tpm_tis soundcore video snd_page_alloc lib80211 
joydev i2c_core tpm tpm_bios battery ac applesmc input_polldev evdev pcspkr 
mbp_nvidia_bl output power_supply processor thermal_sys button ext4 mbcache 
jbd2 crc16 raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor 
async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd dm_mirror 
dm_region_hash dm_log dm_mod zlib_deflate crc32c sg sd_mod sr_mod cdrom 
crc_t10dif hid_apple usbhid hid ata_generic sata_sil24 uhci_hcd ata_piix libata 
ehci_hcd scsi_mod usbcore sky2 firewire_ohci firewire_core crc_itu_t nls_base 
[last unloaded: uinput]
[ 1860.193550] Pid: 14808, comm: umount Tainted: PW   2.6.37-2-amd64 #1
[ 1860.193552] Call Trace:
[ 1860.193561]  [81047084] ? warn_slowpath_common+0x78/0x8c
[ 1860.193577]  [a0c74a4b] ? __btrfs_close_devices+0xb5/0xd0 [btrfs]
[ 1860.193593]  [a0c74a83] ? btrfs_close_devices+0x1d/0x70 [btrfs]
[ 1860.193610]  [a0c53e64] ? close_ctree+0x2cd/0x32f [btrfs]
[ 1860.193616]  [8110580d] ? dispose_list+0xa7/0xb9
[ 1860.193627]  [a0c3d1f3] ? btrfs_put_super+0x10/0x1d [btrfs]
[ 1860.193633]  [810f5c67] ? generic_shutdown_super+0x5c/0xd4
[ 1860.193638]  [810f5d1e] ? kill_anon_super+0x9/0x40
[ 1860.193642]  [810f5794] ? deactivate_locked_super+0x1e/0x3d
[ 1860.193647]  [8110928e] ? sys_umount+0x2cf/0x2fa
[ 1860.193653]  [81009a12] ? system_call_fastpath+0x16/0x1b
[ 1860.193656] ---[ end trace 4e4b8320dc6e70cc ]---


I can mount it back, but not if I reload the btrfs module, in which case I get:

[ 1961.328280] Btrfs loaded
[ 1961.328695] device fsid df4e5454eb7b1c23-7a68fc421060b18b devid 1 transid 
118 /dev/loop0

cloning single-device btrfs file system onto multi-device one

2011-03-21 Thread Stephane Chazelas
Hiya,

I'm trying to move a btrfs FS that's on a hardware raid 5 (6TB
large, 4 of which are in use) to another machine with 3 3TB HDs
and preserve all the subvolumes/snapshots.

Is there a way to do that without using a software/hardware raid
on the new machine (that is, just using btrfs multi-device)?

If fewer than 3TB were occupied, I suppose I could just resize
it so that it fits on one 3TB hd, then copy device to device
onto a 3TB disk, add the 2 other ones and do a balance, but
here, I can't do that.

I suspect that if compression was enabled, the FS could fit on
3 TB, but AFAICT, compression is enabled at mount time and would
only apply to newly created files. Is there a way to compress
files already in a btrfs filesystem?

Any help would be appreciated.
Stephane
