Re: Make existing snapshots read-only?
2012-05-28 12:37:00 -0600, Bruce Guenter:
> Is there any way to mark existing snapshots as read-only? Making new
> ones read-only is easy enough, but what about existing ones?
[...]

You can always do:

  btrfs sub snap -r vol vol-ro
  btrfs sub del vol
  mv vol-ro vol

--
Stephane
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Cloning a Btrfs partition
2011-12-08, 10:49(-05), Phillip Susi:
> On 12/7/2011 1:49 PM, BJ Quinn wrote:
>> What I need isn't really an equivalent "zfs send" -- my script can do
>> that. As I remember, zfs send was pretty slow too in a scenario like
>> this. What I need is to be able to clone a btrfs array somehow -- dd
>> would be nice, but as I said I end up with the identical UUID
>> problem. Is there a way to change the UUID of an array?
>
> No, btrfs send is exactly what you need. Using dd is slow because it
> copies unused blocks, and requires the source fs be unmounted.
[...]

Not necessarily: you can snapshot the devices (as in the method I
suggested). If your FS is already on a device-mapper device, you can
even get away with not unmounting it (freeze, reload the device-mapper
table with a snapshot-origin one, and thaw).

> and the destination be an empty partition. rsync is slow
> because it can't take advantage of the btrfs tree to quickly
> locate the files (or parts of them) that have changed. A
> btrfs send would solve all of these issues.
[...]

When you want to clone a FS onto a similar device or set of devices, a
tool like clone2fs or ntfsclone that copies only the used sectors
across sequentially would probably be a lot more efficient, as it
copies the data at the max speed of the drive, seeking as little as
possible.

--
Stephane
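The efficiency argument above can be made concrete: a used-block cloner
boils down to walking a list of allocated extents and copying only
those ranges, seeking past the holes. Here is a minimal sketch; the
extent list, file names, and sizes are all made up for illustration (a
real tool would read the allocation map from the source filesystem):

```shell
# Hypothetical used-block copy: copy only the allocated extents,
# given as "offset length" pairs in 512-byte sectors.
SRC=src.img
DST=dst.img

# Sample 64KiB source "device" and a made-up extent list.
dd if=/dev/zero of="$SRC" bs=512 count=128 2>/dev/null
printf '0 8\n64 16\n' > extents.txt

: > "$DST"
while read -r off len; do
  # seek/skip keep source and destination offsets identical,
  # so unused ranges are never read or written.
  dd if="$SRC" of="$DST" bs=512 skip="$off" seek="$off" \
     count="$len" conv=notrunc 2>/dev/null
done < extents.txt
```

Since the sample's last extent ends at sector 80, the destination comes
out 80*512 bytes long, with a hole where nothing was copied.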
Re: Cloning a Btrfs partition
2011-12-07, 12:35(-06), BJ Quinn:
> I've got a 6TB btrfs array (two 3TB drives in a RAID 0). It's
> about 2/3 full and has lots of snapshots. I've written a
> script that runs through the snapshots and copies the data
> efficiently (rsync --inplace --no-whole-file) from the main
> 6TB array to a backup array, creating snapshots on the backup
> array and then continuing on copying the next snapshot.
> Problem is, it looks like it will take weeks to finish.
>
> I've tried simply using dd to clone the btrfs partition, which
> technically appears to work, but then it appears that the UUID
> between the arrays is identical, so I can only mount one or
> the other. This means I can't continue to simply update the
> backup array with the new snapshots created on the main array
> (my script is capable of "catching up" the backup array with
> the new snapshots, but if I can't mount both arrays...).
[...]

You can mount them if you specify the devices upon mount.

Here's a method to transfer a full FS to another one with a different
layout. In this example, we're transferring from a FS on a 3GB device
(/dev/loop1) to a new FS on two 2GB devices (/dev/loop2, /dev/loop3):

truncate -s 3G a1
truncate -s 2G b1 b2
losetup /dev/loop1 a1
losetup /dev/loop2 b1
losetup /dev/loop3 b2

# our src FS on 1 disk:
mkfs.btrfs /dev/loop1
mkdir A B
mount /dev/loop1 A
# now we can fill it up, create subvolumes and snapshots...

# at this point, we decide to make a clone of it. To do that, we
# will make a snapshot of the device. For that, we need
# temporary storage as a block device. That could be a disk
# (like a USB key) or an nbd to another host, or anything. Here,
# I'm going to use a loop device to a file. You need enough
# space to store any modification done on the src FS while
# you're doing the transfer, plus whatever is needed to do the
# transfer itself (I can't tell you much about that).
truncate -s 100M sa
losetup /dev/loop4 sa
umount A
size=$(blockdev --getsize /dev/loop1)
echo 0 "$size" snapshot-origin /dev/loop1 | dmsetup create a
echo 0 "$size" snapshot /dev/loop1 /dev/loop4 N 8 | dmsetup create aSnap

# now we have /dev/mapper/a as the src device, which we can
# remount as such and use:
mount /dev/mapper/a A

# and aSnap as a writable snapshot of the src device, which we
# mount separately:
mount /dev/mapper/aSnap B

# The trick here is that we're going to add the two new devices
# to "B" and remove the snapshot one. btrfs will automatically
# migrate the data to the new devices:
btrfs device add /dev/loop2 /dev/loop3 B
btrfs device delete /dev/mapper/aSnap B
# END

Once that's completed, you should have a copy of A in B. You may want
to watch the status of the snapshot while you're transferring, to check
that it doesn't get full.

That method can't be used to do incremental "syncing" between two FSes,
for which you'd still need something similar to "zfs send" (speaking of
which, you may want to consider zfsonlinux, which is now reaching a
point where it's about as stable as btrfs, with the same performance
level if not better, and a lot more features. I'm doing the switch
myself while waiting for btrfs to be a bit more mature).

Because of the identical UUID, btrfs commands like "filesystem show"
will not always give sensible output. I tried to rename the fsid by
changing it in the superblocks, but it looks like it is also included
in a few other places where changing it manually breaks some checksums,
so I guess someone would have to write a tool to do that job. I'm
surprised it doesn't exist already (or maybe it does and I'm not aware
of it?).

--
Stephane
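One way to watch that fill level is "dmsetup status", whose snapshot
targets report "<used>/<total>" sectors in the fourth field. A small
sketch of turning that into a percentage (the status line below is a
fabricated sample, not output from a real aSnap device):

```shell
# "dmsetup status <dev>" for a snapshot target prints roughly:
#   <start> <length> snapshot <used>/<total> <metadata-sectors>
# The line below is a made-up sample standing in for
# "dmsetup status aSnap" on a live system.
status_line="0 204800 snapshot 10240/204800 16"

# Field 4 is "<used>/<total>"; turn it into an integer percentage.
pct=$(printf '%s\n' "$status_line" |
  awk '{ split($4, a, "/"); printf "%d", a[1] * 100 / a[2] }')

echo "snapshot ${pct}% full"
```

If that percentage reaches 100 before the "btrfs device delete"
finishes, the snapshot is invalidated and the transfer has to be
restarted with a bigger temporary device.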
Re: stripe alignment consideration for btrfs on RAID5
2011-11-23, 09:08(-08), Blair Zajac:
> On Nov 23, 2011, at 9:04 AM, Stephane CHAZELAS wrote:
>> Hiya,
>>
>> is there any recommendation out there to setup a btrfs FS on top
>> of hardware or software raid5 or raid6 wrt stripe/stride alignment?
>
> Isn't the advantage of having btrfs do all the raiding itself
> so one gets the checksums? If one puts btrfs on top of
> software or hardware raid, then if there is a checksum error,
> you don't have another copy of the data to fall back to. If
> one uses btrfs' raid1 or above for data and metadata, then you
> can suffer a checksum failure and get a good copy from another
> drive?
[...]

Yes, but btrfs doesn't support raid5 yet, I have a limited number of
drives I can connect to that system, and storage capacity is more
important for me than the odd chance of corruption of the occasional
sector (which can be mitigated by running regular RAID checks).

Also, my tests of btrfs raid10 didn't indicate it was reliable enough
yet (when a drive disappears and reappears, btrfs seems to get quite
confused).

--
Stephane
stripe alignment consideration for btrfs on RAID5
Hiya,

is there any recommendation out there to setup a btrfs FS on top of
hardware or software raid5 or raid6 wrt stripe/stride alignment?

From mkfs.btrfs, it doesn't look like there's much that can be adjusted
that would help, and what I'm asking might not even make sense for
btrfs, but I thought I'd just ask.

Thanks,
Stephane
mounting btrfs FS on zfs zvol hangs
Hiya,

yes, you'll probably think this is crazy, but after observing better
performance with btrfs in some workloads on md RAID5 than with btrfs'
builtin RAID10, I thought I'd try btrfs on a zfs (in-kernel, not fuse)
zvol (on raidz), just for a laugh. While this procedure worked for ext4
and xfs, for btrfs the mount hangs, suggesting there might be something
wrong with btrfs and/or zfs.

Here's what I'm doing:

zpool create X raidz /dev/sd{a,b,c,d,e,f}
zfs create -V 6T -o refreservation=0 X/Y
mkfs.btrfs /dev/zvol/X/Y
mount /dev/zvol/X/Y /mnt

backtrace for mount:

mount D 0009 0 2193 1761 0x
 880401b4d9a8 0082 0001 880401b4dfd8
 880401b4dfd8 880401b4dfd8 00012a40 8802092f
 88040ef7dc80 880401b4d988 88041fa732c0
Call Trace:
 [] ? __lock_page+0x70/0x70
 [] schedule+0x3f/0x60
 [] io_schedule+0x8f/0xd0
 [] sleep_on_page+0xe/0x20
 [] __wait_on_bit+0x5f/0x90
 [] wait_on_page_bit+0x78/0x80
 [] ? autoremove_wake_function+0x40/0x40
 [] read_extent_buffer_pages+0x3ca/0x430 [btrfs]
 [] ? btrfs_destroy_pinned_extent+0xb0/0xb0 [btrfs]
 [] btree_read_extent_buffer_pages.isra.62+0x8a/0xc0 [btrfs]
 [] read_tree_block+0x41/0x60 [btrfs]
 [] open_ctree+0xe75/0x1760 [btrfs]
 [] ? snprintf+0x34/0x40
 [] btrfs_fill_super.isra.38+0x78/0x150 [btrfs]
 [] ? disk_name+0xba/0xc0
 [] ? strlcpy+0x47/0x60
 [] btrfs_mount+0x3c6/0x470 [btrfs]
 [] mount_fs+0x43/0x1b0
 [] vfs_kern_mount+0x6a/0xc0
 [] do_kern_mount+0x54/0x110
 [] do_mount+0x1a4/0x260
 [] sys_mount+0x90/0xe0
 [] system_call_fastpath+0x16/0x1b

3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:27:26 UTC 2011 x86_64

Best regards,
Stephane
Re: btrfs-delalloc - threaded?
2011-11-22, 09:47(-05), Chris Mason:
> On Tue, Nov 22, 2011 at 02:30:07PM +0000, Stephane CHAZELAS wrote:
>> 2011-09-6, 11:21(-05), Andrew Carlson:
>> > I was doing some testing with writing out data to a BTRFS filesystem
>> > with the compress-force option. With 1 program running, I saw
>> > btrfs-delalloc taking about 1 CPU worth of time, much as could be
>> > expected. I then started up 2 programs at the same time, writing data
>> > to the BTRFS volume. btrfs-delalloc still only used 1 CPU worth of
>> > time. Is btrfs-delalloc threaded, to where it can use more than 1 CPU
>> > worth of time? Is there a threshold where it would start using more
>> > CPU?
>> [...]
>>
>> Hiya,
>>
>> I observe the same here. The bottleneck when writing data
>> sequentially seems to be btrfs-delalloc using 100% of the
>> time of one CPU.
>
> The compression is spread out to multiple CPUs. Using zlib on my 4 cpu
> box, I get 4 delalloc threads working on two concurrent dds.
>
> The thread hand off is based on the amount of work queued up to each
> thread, and you're probably just below the threshold where it kicks off
> another one. Are you using lzo or zlib?

Mounted with -o compress-force, so getting whatever the default
compression algorithm is.

> What is the workload you're using? We can make the compression code
> more aggressive at fanning out.
[...]

That was a basic test of:

head -c 40M /dev/urandom > a
(while :; do cat a; done) | pv -rab > b

(I expect the content of "a" to be cached in memory.)

Running "dstat -df" and top in parallel, with nothing else reading or
writing to that FS, btrfs maxes out at about 150MB/s, and zfs at about
400MB/s.

For the concurrent writing, replace pv with:

pv | tee b c d e > f

(I suppose there's a fair chance of this incurring disk seeking, so
reduced throughput is probably to be expected. I get the same kind of
throughput (maybe 15% more) with zfs raid5 in that case.)
--
Stephane
Re: btrfs-delalloc - threaded?
2011-09-6, 11:21(-05), Andrew Carlson:
> I was doing some testing with writing out data to a BTRFS filesystem
> with the compress-force option. With 1 program running, I saw
> btrfs-delalloc taking about 1 CPU worth of time, much as could be
> expected. I then started up 2 programs at the same time, writing data
> to the BTRFS volume. btrfs-delalloc still only used 1 CPU worth of
> time. Is btrfs-delalloc threaded, to where it can use more than 1 CPU
> worth of time? Is there a threshold where it would start using more
> CPU?
[...]

Hiya,

I observe the same here. The bottleneck when writing data sequentially
seems to be btrfs-delalloc using 100% of the time of one CPU.

If I do several writes in parallel, a few more btrfs-delalloc's appear
(3 when filling up 5 files concurrently), but btrfs-delalloc is still
the bottleneck. Interestingly, if I write to 10 files simultaneously, I
see only two btrfs-delalloc's and the throughput is lower.

That's on ubuntu 11.10, 3.0.0-13 amd64, 12 cores, 16GB DDR3 1333MHz
RAM, raid10 on 6 drives.

Note that zfsonlinux does perform a lot better in that regard (on a
raidz (ZFS raid5) on those same 6 drives): 50% CPU utilisation, maxing
out the disk bandwidth.

--
Stephane
Re: status of raid10 reliability
2011-11-17 17:09:25 +0000, Stephane CHAZELAS:
[...]
> Before setting up a new RAID10 btrfs array with 6 drives, I
> wanted to check how good it behaved in case of disk failure.
> I've not been too impressed. Is RAID10 btrfs support only
> meant for reading performance improvement?
>
> My test method was:
>
> Use the device-mapper to have devices mapped (linear) to loop
> devices
[...]
> Then write some data, and then use DM's error target to simulate
> a failing drive (all I/O ends up in error):
>
> # dmsetup suspend hd3; echo 0 $s error | dmsetup reload hd3; dmsetup resume hd3
[...]

Note that I did the same test with both md (raid10) and zfsonlinux
(raidz) and it worked as expected.

--
Stephane
status of raid10 reliability
Hiya,

Before setting up a new RAID10 btrfs array with 6 drives, I wanted to
check how good it behaved in case of disk failure. I've not been too
impressed. Is RAID10 btrfs support only meant for reading performance
improvement?

My test method was:

Use the device-mapper to have devices mapped (linear) to loop devices
(zsh syntax):

# l=({1..4})
# mv /dev/loop$^l .
# truncate -s1T $l
# s=$(blockdev --getsize /dev/loop1)
# for f ($l) losetup loop$f $f
# for f ($l) echo 0 $s linear loop$f 0 | dmsetup create hd$f
# mkfs.btrfs -m raid10 -d raid10 /dev/mapper/hd$^l
# d=(device=/dev/mapper/hd$^l)
# mount -o ${(j:,:)d} /dev/mapper/hd1 /mnt/3

Then write some data, and then use DM's error target to simulate a
failing drive (all I/O ends up in error):

# dmsetup suspend hd3; echo 0 $s error | dmsetup reload hd3; dmsetup resume hd3

Then write some more data. The FS doesn't become degraded
automatically. If I restore the drive:

# echo 0 $s linear loop3 0 | dmsetup create hd3

More funny things occur of course, as btrfs doesn't seem to have
registered it being broken. If I do a scrub with the failing drive, it
BUGs on:

[13960.286464] [ cut here ]
[13960.286484] kernel BUG at /home/blank/debian/kernel/release/linux-2.6/linux-2.6-3.1.0/debian/build/source_amd64_none/fs/btrfs/volumes.c:2891!
[13960.286496] invalid opcode: [#1] SMP [13960.286507] CPU 0 [13960.286510] Modules linked in: vboxnetadp(O) vboxnetflt(O) vboxdrv(O) ip6table_filter ip6_tables ebtable_nat acpi_cpufreq mperf ebtables cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp parport_pc ppdev lp parport rfcomm bnep binfmt_misc uinput deflate ctr twofish_generic twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic hmac crypto_null af_key fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt coretemp loop kvm_intel kvm uvcvideo bcm5974 videodev media v4l2_compat_ioctl32 cryptd snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm aes_x86_64 aes_generic snd_seq_midi ecb btusb bluetooth rfkill snd_rawmidi nouveau ttm snd_seq_midi_event drm_kms_helper snd_seq drm i2c_algo_bit mxm_wmi snd_timer wmi snd_seq_device joydev video battery snd ac apple_bl power_supply soundcore snd_page_alloc applesmc pcspkr input_polldev i2c_i801 i2c_core evdev button processor thermal_sys ext4 mbcache jbd2 crc16 btrfs zlib_deflate crc32c libcrc32c raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd dm_mirror dm_region_hash dm_log dm_mod sg sr_mod sd_mod cdrom crc_t10dif hid_apple usbhid hid ata_generic uhci_hcd firewire_ohci sata_sil24 ata_piix firewire_core crc_itu_t libata sky2 ehci_hcd scsi_mod usbcore [last unloaded: scsi_wait_scan] [13960.287012] [13960.287016] Pid: 12681, comm: btrfs-scrub-0 Tainted: GW O 3.1.0-1-amd64 #1 Apple Inc. 
MacBookPro4,1/Mac-F42C86C8 [13960.287037] RIP: 0010:[] [] __btrfs_map_block+0xfd/0x629 [btrfs] [13960.287061] RSP: 0018:880078c87cb0 EFLAGS: 00010282 [13960.287067] RAX: 0042 RBX: 880078c87d68 RCX: 54ee [13960.287076] RDX: RSI: 0046 RDI: 0246 [13960.287085] RBP: 8800378f6380 R08: 0002 R09: fffe [13960.287093] R10: 0001 R11: 88007e395c90 R12: [13960.287102] R13: 0008 R14: R15: 0001 [13960.287114] FS: () GS:88013fc0() knlGS: [13960.287126] CS: 0010 DS: ES: CR0: 8005003b [13960.287136] CR2: f76990a0 CR3: 000104319000 CR4: 06f0 [13960.287147] DR0: DR1: DR2: [13960.287159] DR3: DR6: 0ff0 DR7: 0400 [13960.287169] Process btrfs-scrub-0 (pid: 12681, threadinfo 880078c86000, task 880021698280) [13960.287184] Stack: [13960.287189] 88010001 0040 88008a1f40f8 [13960.287207] 88008a1f4100 880078c87d60 0001 0001 [13960.287223] 880078c87cf0 880109671000 880123416400 [13960.287242] Call Trace: [13960.287259] [] ? scrub_recheck_error+0x105/0x29b [btrfs] [13960.287280] [] ? scrub_checksum+0x75/0x372 [btrfs] [13960.287288] [] ? check_preempt_wakeup+0x122/0x18b [13960.287297] [] ? set_next_entity+0x32/0x52 [13960.287304] [] ? load_gs_index+0x7/0xa [13960.287312] [] ? __switch_to+0x15a/0x20e [13960.287331] [] ? worker_loop+0x16a/0x45d [btrfs] [13960.287341] [] ? __schedule+0x5ac/0x5c3 [13960.287360] [] ? btrfs_queue_worker+0x25b
ENOSPC on almost empty FS
Hiya,

trying to restore a FS from a backup (tgz) on a freshly made btrfs this
morning, I got ENOSPCs after about 100MB out of 4GB had been extracted.
strace indicates that the ENOSPCs are upon the open(O_WRONLY).

Restoring with:

mkfs.btrfs /dev/mapper/VG_USB-root
mount -o compress-force,ssd $_ /mnt
cd /mnt
pv ~/backup.tgz | gunzip | sudo bsdtar -xpSf - --numeric-owner

That's on a LVM LV with the PV on a USB key.

If I suspend the job and resume it, then the ENOSPCs go away. The only
way I could restore the backup was via rate-limiting the untar:

zcat ~/backup.tgz | pv -L 300 | sudo bsdtar -xpSf - --numeric-owner

Even that wasn't enough, as 3 files still triggered an ENOSPC, but I
did untar them separately afterwards.

That's with debian's 3.0.0-1 amd64 kernel.

Is that expected behavior due to the way allocation works in btrfs?

--
Stephane
Re: Unable to mount (or, why not to work late at night).
2011-10-28, 07:57(+07), Fajar A. Nugraha:
[...]
>> Already got 'em. Everything that tries to even think about modifying stuff
>> (btrfs-zero-log, btrfsck, and btrfs-debug-tree) all dump core:
>
> Your last resort (for now, anyway) might be using "restore" from
> Josef's btrfs-progs: https://github.com/josefbacik/btrfs-progs
>
> It might be able to copy some data.

I also have one FS in that same situation. I tried everything on it,
including that "restore" (which bailed out with those same error
messages IIRC).

The only thing that got me a bit further was to use an alternate
superblock, though that screwed the FS even further, as I needed to
reboot the machine after trying to mount it (mount hangs and some btrfs
tasks use all the CPU time).

Fortunately, for that one, I had a not-too-old backup at the block
device level.

--
Stephane
Re: btrfs fi defrag -c
2011-10-28, 10:25(+08), Li Zefan:
[...]
> # df . -h
> Filesystem            Size  Used Avail Use% Mounted on
> /home/lizf/tmp/a      2.0G  409M  1.4G  23% /mnt

OK, but why are we not gaining space after compression?

> And I was not surprised, as there's a regression.
>
> With this fix:
>
> http://marc.info/?l=linux-btrfs&m=131495014823121&w=2
[...]

Thanks. That's the one that's scheduled for 3.2 and maybe 3.1.x, right?

--
Stephane
btrfs fi defrag -c
I don't quite understand the behavior of "btrfs fi defrag".

~# truncate -s2G ~/a
~# mkfs.btrfs ~/a
nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
~# mount -o loop ~/a /mnt/1
/mnt/1# cd x
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   64K  1.8G   1% /mnt/1
/mnt/1# yes | head -c400M > a
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   64K  1.8G   1% /mnt/1
/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  402M  1.4G  23% /mnt/1
/mnt/1# btrfs fi defrag -c a

(exit status == 20, BTW)

(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  415M  994M  30% /mnt/1

No space gain; even lost 15M or 400M, depending on how you look at it.

/mnt/1# btrfs fi defrag a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  797M  612M  57% /mnt/1

Lost another 400M.

/mnt/1# ls -l
total 409600
-rw-r--r-- 1 root root 419430400 Oct 27 19:53 a
/mnt/1# btrfs fi balance .
/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G  798M  845M  49% /mnt/1

Possibly reclaimed some of the space?

At the point where it said 612M free, if I do:

/mnt/1# cat < /dev/zero > b
cat: write error: No space left on device
/mnt/1# ls -lh b
-rw-r--r-- 1 root root 612M Oct 27 20:14 b

There was indeed 612M free.

When the FS is mounted with compress:

~# mkfs.btrfs ./a
nodesize 4096 leafsize 4096 sectorsize 4096 size 2.00GB
~# mount -o compress ./a /mnt/1
~# cd /mnt/1
/mnt/1# yes | head -c400M > a
/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   14M  1.8G   1% /mnt/1
/mnt/1# btrfs fi defrag -c ./a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   21M  1.4G   2% /mnt/1

Lost 400M?

/mnt/1# btrfs fi defrag ./a
(20)/mnt/1# sync
/mnt/1# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop1            2.0G   21M  1.4G   2% /mnt/1

I take it it doesn't uncompress?
I'm a bit confused here.

(That's with 3.0 amd64.)

--
Stephane
lseek hanging
This morning, I have a strange behavior when doing a "tail -f" on a log file. "cat log" runs successfully, but "tail -f log" hangs. Running a strace shows it hanging on lseek(3, 0, SEEK_CUR... 3 being the fd for that log file. In dmesg: [59881.520030] INFO: task btrfs-delalloc-:763 blocked for more than 120 seconds. [59881.527205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59881.535068] btrfs-delalloc- D 000100e2892b 0 763 2 0x [59881.542161] 88014738bc20 0046 4000 [59881.549673] 00012840 00012840 00012840 8801478c2580 [59881.557147] 00012840 88014738bfd8 00012840 00012840 [59881.568736] Call Trace: [59881.571219] [] wait_current_trans.clone.22+0xab/0xdc [btrfs] [59881.578589] [] ? wake_up_bit+0x2a/0x2a [59881.584012] [] ? _raw_spin_lock+0xe/0x10 [59881.589613] [] start_transaction+0xe3/0x231 [btrfs] [59881.596176] [] btrfs_join_transaction+0x15/0x17 [btrfs] [59881.603103] [] compress_file_range+0x297/0x515 [btrfs] [59881.609926] [] async_cow_start+0x35/0x4a [btrfs] [59881.616237] [] ? _raw_spin_lock_irq+0x1f/0x21 [59881.622277] [] worker_loop+0x19d/0x4cb [btrfs] [59881.628433] [] ? btrfs_queue_worker+0x27a/0x27a [btrfs] [59881.635330] [] kthread+0x82/0x8a [59881.640227] [] kernel_thread_helper+0x4/0x10 [59881.646160] [] ? kthread_worker_fn+0x14c/0x14c [59881.652293] [] ? gs_change+0x13/0x13 [59881.657555] INFO: task flush-btrfs-1:2675 blocked for more than 120 seconds. [59881.664617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59881.672477] flush-btrfs-1 D 000100e28931 0 2675 2 0x [59881.679628] 88013c559990 0046 8801 [59881.687118] 00012840 00012840 00012840 88013cf76140 [59881.694591] 00012840 88013c559fd8 00012840 00012840 [59881.702064] Call Trace: [59881.704536] [] ? lock_page+0x2f/0x2f [59881.709800] [] io_schedule+0x63/0x7e [59881.715042] [] sleep_on_page+0xe/0x12 [59881.720376] [] __wait_on_bit_lock+0x46/0x8f [59881.726241] [] ? 
pagevec_lru_move_fn+0xaa/0xc0 [59881.732372] [] __lock_page+0x66/0x6d [59881.737628] [] ? autoremove_wake_function+0x39/0x39 [59881.744173] [] ? should_resched+0xe/0x2e [59881.749779] [] lock_page+0x2a/0x2e [btrfs] [59881.755631] [] extent_write_cache_pages.clone.10.clone.17+0xba/0x28e [btrfs] [59881.764382] [] extent_writepages+0x47/0x5c [btrfs] [59881.770877] [] ? uncompress_inline.clone.32+0x119/0x119 [btrfs] [59881.778494] [] btrfs_writepages+0x27/0x29 [btrfs] [59881.784867] [] do_writepages+0x21/0x2a [59881.790302] [] writeback_single_inode+0xb5/0x1c6 [59881.796588] [] writeback_sb_inodes+0xbc/0x138 [59881.802683] [] writeback_inodes_wb+0x172/0x184 [59881.808795] [] wb_writeback+0x26c/0x3aa [59881.814297] [] wb_do_writeback+0x147/0x1a0 [59881.820081] [] ? schedule_timeout+0xb3/0xe3 [59881.825947] [] bdi_writeback_thread+0x8c/0x20f [59881.832056] [] ? wb_do_writeback+0x1a0/0x1a0 [59881.838062] [] kthread+0x82/0x8a [59881.842959] [] kernel_thread_helper+0x4/0x10 [59881.848896] [] ? kthread_worker_fn+0x14c/0x14c [59881.855005] [] ? gs_change+0x13/0x13 [59881.860267] INFO: task ntfsclone:2787 blocked for more than 120 seconds. [59881.866981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59881.874842] ntfsclone D 000100e28625 0 2787 2767 0x [59881.881929] 88013b421b88 0082 814044e0 [59881.889399] 00012840 00012840 00012840 88013bcba5c0 [59881.896895] 00012840 88013b421fd8 00012840 00012840 [59881.904388] Call Trace: [59881.906848] [] ? prepare_to_wait+0x76/0x81 [59881.912630] [] wait_current_trans.clone.22+0xab/0xdc [btrfs] [59881.919970] [] ? wake_up_bit+0x2a/0x2a [59881.925410] [] ? _raw_spin_lock+0xe/0x10 [59881.931014] [] start_transaction+0xe3/0x231 [btrfs] [59881.937572] [] btrfs_join_transaction+0x15/0x17 [btrfs] [59881.944479] [] btrfs_dirty_inode+0x2c/0x117 [btrfs] [59881.951051] [] __mark_inode_dirty+0x31/0x19e [59881.957069] [] ? 
mnt_clone_write+0x12/0x2a [59881.962867] [] file_update_time+0xed/0x111 [59881.968725] [] btrfs_file_aio_write+0x1a6/0x495 [btrfs] [59881.975657] [] do_sync_write+0xcb/0x108 [59881.981172] [] ? security_file_permission+0x2e/0x33 [59881.987743] [] vfs_write+0xac/0xff [59881.992814] [] sys_write+0x4a/0x6e [59881.997955] [] system_call_fastpath+0x16/0x1b [59882.003978] INFO: task tail:2789 blocked for more than 120 seconds. [59882.010257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59882.018137] tailD 000100e2ca63 0 2789 2604 0x0004 [59882.025225] 8801337c5e48 0082 8801337c5e28 8801 [59882.032698] 0001
Re: how stable are snapshots at the block level?
2011-10-25, 07:46(-04), Edward Ned Harvey:
[...]
> My suggestion to the OP of this thread is to use rsync for now, wait for
> btrfs send, or switch to zfs.
[...]

rsync won't work if you've got snapshot volumes, though (unless you're
prepared to have a backup copy thousands of times the size of the
original, or have a framework in place to replicate the snapshots on
the backup copy as soon as they are created (but before they're being
written to)).

To backup a btrfs FS with snapshots, the only option seems to be to
copy the block devices for now (or the other trick mentioned earlier).

--
Stephane
Re: how stable are snapshots at the block level?
2011-10-24, 09:59(-04), Edward Ned Harvey:
[...]
> If you are reading the raw device underneath btrfs, you are
> not getting the benefit of the filesystem checksumming. If
> you encounter an undetected read/write error, it will silently
> pass. Your data will be corrupted, you'll never know about it
> until you see the side-effects (whatever they may be).
[...]

I don't follow you here. If you're cloning a device holding a btrfs
FS, you'll clone the checksums as well. If there were errors, they
will be detected on the cloned FS as well?

> There is never a situation where block level copies have any
> advantage over something like btrfs send. Except perhaps
> forensics or espionage. But in terms of fast efficient
> reliable backups, btrfs send has every advantage and no
> disadvantage compared to block level copy.

$ btrfs send
ERROR: unknown command 'send'
Usage:
[...]

(from the 2011-10-12 integration branch). Am I missing something?

> There are many situations where btrfs send has an advantage
> over both block level and file level copies. It instantly
> knows all the relevant disk blocks to send, it preserves every
> property, it's agnostic about filesystem size or layout on
> either sending or receiving end, you have the option to create
> different configurations on each side, including compression
> etc. And so on.
[...]

That sounds like "zfs send"; I didn't know btrfs had it yet. My
understanding was that to clone/backup a btrfs FS, you could only clone
the block devices, or use the "device add" + "device del" trick with
some extra copy-on-write (LVM, nbd) layer.

--
Stephane
Re: how stable are snapshots at the block level?
2011-10-23, 17:19(+02), Mathijs Kwik:
[...]
> For this case (my laptop) I can stick to file-based rsync, but I think
> some guarantees should exist at the block level. Many virtual machines
> and cloud hosting services (like ec2) provide block-level snapshots.
> With xfs, I can freeze the filesystem for a short amount of time
> (<100ms), snapshot, unfreeze. I don't think such a lock/freeze feature
> exists for btrfs
[...]

That FS-freeze feature has been moved to the vfs layer, so it is now
available to any filesystem. You can either use xfs_io (see the -F
option to "freeze" for foreign FSes), as for xfs, or use fsfreeze from
util-linux.

Note that you can now thaw filesystems with a sysrq combination (for
instance, with xen, using "xm sysrq vm j").

For block-level snapshots, see also ddsnap (a device-mapper target,
unfortunately no longer maintained) and of course lvm (which doesn't
scale well with several snapshots, though).

--
Stephane
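The freeze/snapshot/thaw sequence described above can be scripted.
This is a hedged sketch only: the mount point, the lvcreate arguments,
and the DRY_RUN guard are all hypothetical, and by default the script
just prints the steps instead of executing them:

```shell
#!/bin/sh
# Freeze, snapshot, thaw -- sketch only. MNT and the lvcreate
# arguments are placeholders; with DRY_RUN=1 (the default) each
# step is printed rather than executed.
MNT=${MNT:-/mnt/data}
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run fsfreeze -f "$MNT"                         # block writes, flush dirty data
run lvcreate -s -n datasnap -L 1G /dev/vg/data # hypothetical LVM snapshot
run fsfreeze -u "$MNT"                         # thaw as soon as possible
```

Keeping the window between the two fsfreeze calls short matters,
because every write to the FS blocks while it is frozen.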
Re: BTRFS thinks that a device is mounted
2011-10-21, 00:39(+03), Nikos Voutsinas:
[...]
> ## Comment: Of course /dev/sdh is not mounted.
> mount | grep /dev/sdh
> root@lxc:~#
[...]

Note that mount(8) uses /etc/mtab to find out what is mounted, and if
that file is not a symlink to /proc/mounts, the information is not
necessarily correct. You can also have a look at /proc/mounts directly.

--
Stephane
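Reading the kernel's own table sidesteps a stale /etc/mtab entirely.
This small sketch just reprints /proc/mounts in mount(8)'s familiar
format:

```shell
# /proc/mounts is maintained by the kernel and is always current;
# a regular-file /etc/mtab is maintained by mount(8) and can go
# stale (e.g. across chroots or container setups).
while read -r dev mnt fstype opts rest; do
  printf '%s on %s type %s (%s)\n' "$dev" "$mnt" "$fstype" "$opts"
done < /proc/mounts
```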
Re: recursive subvolume delete
2011-10-02 16:38:21 +0200, krz...@gmail.com : > Also I think there are no real tools to find out which > directories are subvolumes/snapshots [...] On my system (Debian), there's the "mountpoint" command (from the initscripts package, from http://savannah.nongnu.org/projects/sysvinit) that will tell you that (it compares the st_dev of the given directory with that of directory/..). -- Stephane
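The st_dev comparison that mountpoint performs is easy to reproduce in a few lines of shell; since btrfs gives each subvolume its own st_dev, the same test spots subvolume roots as well as mount points. A sketch, not a replacement for the real tool:

```shell
# True if "$1" sits on a different st_dev than its parent directory,
# i.e. it is a mount point or a btrfs subvolume/snapshot root.
# (The root directory "/" is a special case this sketch does not handle.)
is_boundary() {
    [ "$(stat -c %d -- "$1")" != "$(stat -c %d -- "$1/..")" ]
}

is_boundary /proc && echo "/proc is a mount/subvolume boundary"
is_boundary /etc  || echo "/etc is just a plain directory here"
```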
Re: high CPU usage and low perf
2011-09-27 10:15:09 +0100, Stephane Chazelas: [...] > btrfs-transacti R running task 0 963 2 0x > 880143af7730 0001 ff10 880143af77b0 > 8801456da420 e86aa840 1000 > ffe4 8801462ba800 880109f9b540 88002a95eba8 > Call Trace: > [] ? tree_search_offset+0x18f/0x1b8 [btrfs] > [] ? btrfs_reserve_extent+0xb0/0x190 [btrfs] > [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] > [] ? __btrfs_cow_block+0x102/0x31e [btrfs] > [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] > [] ? __btrfs_cow_block+0x102/0x31e [btrfs] > [] ? btrfs_set_node_key+0x1a/0x20 [btrfs] > [] ? btrfs_cow_block+0x104/0x14e [btrfs] > [] ? btrfs_search_slot+0x162/0x4cb [btrfs] > [] ? btrfs_insert_empty_items+0x6a/0xba [btrfs] > [] ? run_clustered_refs+0x370/0x682 [btrfs] > [] ? btrfs_find_ref_cluster+0xd/0x13c [btrfs] > [] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs] > [] ? btrfs_commit_transaction+0x38f/0x709 [btrfs] > [] ? _raw_spin_lock+0xe/0x10 > [] ? join_transaction.clone.23+0xc1/0x200 [btrfs] [...] Any idea anyone? The above suggests btrfs struggles to allocate space, even though the FS is only 66% full. For now, my workaround is to reboot the system once a day. Not ideal... I'm also suspecting some data corruption, which I'm investigating now (on a file written via mmap()). Thanks, Stephane
Re: high CPU usage and low perf
2011-09-27 10:15:09 +0100, Stephane Chazelas: [...] > a btrfs file system of mine started to behave very poorly with > some btrfs kernel tasks taking 100% of CPU time. > > # btrfs fi show /dev/sdb > Label: none uuid: b3ce8b16-970e-4ba8-b9d2-4c7de270d0f1 > Total devices 3 FS bytes used 4.25TB > devid 2 size 2.73TB used 1.52TB path /dev/sdc > devid 1 size 2.70TB used 1.49TB path /dev/sda4 > devid 3 size 2.73TB used 1.52TB path /dev/sdb > > Btrfs v0.19-100-g4964d65 > > FS mounted with compress-force,noatime > > (Can't do a "filesystem df" just now, as there's a umount > running; there should be around 33% free). [...] The umount just returned. # btrfs fi df /backup Data, RAID0: total=4.20TB, used=4.20TB Data: total=8.00MB, used=7.97MB System, RAID1: total=8.00MB, used=344.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=162.75GB, used=59.30GB Metadata: total=8.00MB, used=0.00 It's now running fine again after reloading the btrfs module and remounting. -- Stephane
high CPU usage and low perf
Hiya, Recently, a btrfs file system of mine started to behave very poorly with some btrfs kernel tasks taking 100% of CPU time. # btrfs fi show /dev/sdb Label: none uuid: b3ce8b16-970e-4ba8-b9d2-4c7de270d0f1 Total devices 3 FS bytes used 4.25TB devid2 size 2.73TB used 1.52TB path /dev/sdc devid1 size 2.70TB used 1.49TB path /dev/sda4 devid3 size 2.73TB used 1.52TB path /dev/sdb Btrfs v0.19-100-g4964d65 FS mounted with compress-force,noatime (Can't do a "filesystem df" just now, as there's a umount running, there should be around 33% free). Kernel 3.0, with patch: http://www.spinics.net/lists/linux-btrfs/msg11023.html While the FS is running, I see for instance btrfs-transacti taking 100% CPU and iostat shows no disk activity. Writing performance is dreadful (a few kB/s). sysrq-t gives: btrfs-transacti R running task0 963 2 0x 880143af7730 0001 ff10 880143af77b0 8801456da420 e86aa840 1000 ffe4 8801462ba800 880109f9b540 88002a95eba8 Call Trace: [] ? tree_search_offset+0x18f/0x1b8 [btrfs] [] ? btrfs_reserve_extent+0xb0/0x190 [btrfs] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [] ? btrfs_set_node_key+0x1a/0x20 [btrfs] [] ? btrfs_cow_block+0x104/0x14e [btrfs] [] ? btrfs_search_slot+0x162/0x4cb [btrfs] [] ? btrfs_insert_empty_items+0x6a/0xba [btrfs] [] ? run_clustered_refs+0x370/0x682 [btrfs] [] ? btrfs_find_ref_cluster+0xd/0x13c [btrfs] [] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs] [] ? btrfs_commit_transaction+0x38f/0x709 [btrfs] [] ? _raw_spin_lock+0xe/0x10 [] ? join_transaction.clone.23+0xc1/0x200 [btrfs] [] ? wake_up_bit+0x2a/0x2a [] ? transaction_kthread+0x175/0x22a [btrfs] [] ? btrfs_congested_fn+0x86/0x86 [btrfs] [] ? kthread+0x82/0x8a [] ? kernel_thread_helper+0x4/0x10 [] ? kthread_worker_fn+0x14c/0x14c [] ? gs_change+0x13/0x13 After a while, with no FS activity, it does calm down though. 
umount has already used over 10 minutes of CPU time: # ps -flC umount F S UIDPID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 4 R root 6045 1853 65 80 0 - 2538 - 09:46 pts/200:11:06 umount /backup sysrq-t gives: [515954.295050] umount R running task0 6045 1853 0x [515954.295050] 88011131c600 0001 811cb1ee 88012c2fd598 [515954.295050] 8801456da420 1000 8800 8801456da420 [515954.295050] 88012c2fd578 a0327d96 880111bebb60 1000 [515954.295050] Call Trace: [515954.295050] [] ? tree_search_offset+0x18f/0x1b8 [btrfs] [515954.295050] [] ? need_resched+0x23/0x2d [515954.295050] [] ? kmem_cache_alloc+0x94/0x105 [515954.295050] [] ? btrfs_find_space_cluster+0xce/0x189 [btrfs] [515954.295050] [] ? find_free_extent.clone.64+0x549/0x8c7 [btrfs] [515954.295050] [] ? tree_search_offset+0x18f/0x1b8 [btrfs] [515954.295050] [] ? btrfs_reserve_extent+0xb0/0x190 [btrfs] [515954.295050] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [515954.295050] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [515954.295050] [] ? btrfs_alloc_free_block+0x22e/0x349 [btrfs] [515954.295050] [] ? __btrfs_cow_block+0x102/0x31e [btrfs] [515954.295050] [] ? lookup_inline_extent_backref+0xa5/0x328 [btrfs] [515954.295050] [] ? __btrfs_free_extent+0xc3/0x55b [btrfs] [515954.295050] [] ? kfree+0x72/0x7b [515954.295050] [] ? btrfs_delayed_ref_lock+0x4a/0xa1 [btrfs] [515954.295050] [] ? run_clustered_refs+0x638/0x682 [btrfs] [515954.295050] [] ? btrfs_find_ref_cluster+0xc/0x13c [btrfs] [515954.295050] [] ? btrfs_run_delayed_refs+0xd1/0x17c [btrfs] [515954.295050] [] ? commit_cowonly_roots+0x78/0x18f [btrfs] [515954.295050] [] ? need_resched+0x23/0x2d [515954.295050] [] ? should_resched+0xe/0x2e [515954.295050] [] ? btrfs_commit_transaction+0x3ff/0x709 [btrfs] [515954.295050] [] ? _raw_spin_lock+0xe/0x10 [515954.295050] [] ? join_transaction.clone.23+0x1ca/0x200 [btrfs] [515954.295050] [] ? wake_up_bit+0x2a/0x2a [515954.295050] [] ? btrfs_sync_fs+0x9f/0xa7 [btrfs] [515954.295050] [] ? 
__sync_filesystem+0x66/0x7a [515954.295050] [] ? sync_filesystem+0x4c/0x50 [515954.295050] [] ? generic_shutdown_super+0x38/0xf6 [515954.295050] [] ? kill_anon_super+0x16/0x50 [515954.295050] [] ? deactivate_locked_super+0x26/0x4b [515954.295050] [] ? deactivate_super+0x3a/0x3e [515954.295050] [] ? mntput_no_expire+0xd0/0xd5 [515954.295050] [] ? sys_umount+0x2ee/0x31c [515954.295050] [] ? system_call_fastpath+0x16/0x1b Last time it happened, I hard rebooted the system, and it was fine for a while. This time, I'll try and let umount finish. Would anybody know what is happening and how to get out of it? Thanks. Stephane
Re: kernel BUG at fs/btrfs/inode.c:4676!
2011-06-06 12:19:56 +0200, Marek Otahal: > Hello, > the issue happens every time when i have to hard power-off my notebook > (suspend problems). > With kernel 2.6.39 the partition is unmountable, solution is to boot 2.6.38 > kernel which > 1/ is able to mount the partition, > 2/ by doing that fixes the problem so later .39 (after clean shutdown) can > mount it also. [...] I've just been hit by this (3.0). I dug up a 2.6.38 kernel and got it back running just the same. Has any progress made on this? [39564.802905] device fsid 01b919f7-32cd-4d09-be1c-1810249001b2 devid 1 transid 21097 /dev/mapper/VG_USB_debian-root [39565.555655] [ cut here ] [39565.555662] kernel BUG at /build/buildd-linux-2.6_3.0.0-3-amd64-9ClimQ/linux-2.6-3.0.0/debian/build/source_amd64_none/fs/btrfs/inode.c:4586! [39565.555668] invalid opcode: [#1] SMP [39565.555672] CPU 1 [39565.555674] Modules linked in: ext2 hfsplus nls_utf8 nls_cp437 vfat fat ip6table_filter ip6_tables ebtable_nat ebtables vboxnetadp(O) vboxnetflt(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT vboxdrv(O) xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp parport_pc ppdev lp parport rfcomm bnep bluetooth rfkill xt_multiport iptable_filter ip_tables x_tables snd_hrtimer acpi_cpufreq mperf cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats binfmt_misc fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ext3 jbd loop dm_crypt kvm_intel kvm uvcvideo videodev media v4l2_compat_ioctl32 nvidia(P) snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device evdev i7core_edac snd i2c_i801 edac_core pcspkr soundcore i2c_core asus_atk0110 snd_page_alloc button processor thermal_sys ext4 mbcache jbd2 crc16 btrfs zlib_deflate crc32c libcrc32c dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd sg sd_mod sr_mod crc_t10dif cdrom ata_generic 
usb_storage usbhid hid uas pata_jmicron firewire_ohci firewire_core crc_itu_t ahci libahci ehci_hcd libata scsi_mod r8169 mii usbcore [last unloaded: scsi_wait_scan] [39565.555806] [39565.555810] Pid: 18729, comm: mount Tainted: P IO 3.0.0-1-amd64 #1 System manufacturer System Product Name/P7P55D [39565.555817] RIP: 0010:[] [] btrfs_add_link+0x120/0x178 [btrfs] [39565.555850] RSP: 0018:8801c5273858 EFLAGS: 00010282 [39565.555854] RAX: ffef RBX: 8801caf96d90 RCX: 8802124d01d8 [39565.555858] RDX: 000e RSI: 8801948da880 RDI: 0292 [39565.555862] RBP: 880162b12800 R08: 0050 R09: 000d [39565.555866] R10: 000c R11: 00015670 R12: 8801caf7d1d8 [39565.555870] R13: 000b R14: 88017b3c0600 R15: 8801cecfb540 [39565.555875] FS: 7f1062f7b7e0() GS:88023fc2() knlGS: [39565.555879] CS: 0010 DS: ES: CR0: 80050033 [39565.555883] CR2: 7f84b7689000 CR3: 00017b112000 CR4: 06e0 [39565.555887] DR0: DR1: DR2: [39565.555891] DR3: DR6: 0ff0 DR7: 0400 [39565.555896] Process mount (pid: 18729, threadinfo 8801c5272000, task 880236e947f0) [39565.555899] Stack: [39565.555901] 0001 01c9 8801 000592ad [39565.555908] 000592ad 0001 880203d96e00 000c [39565.555915] 1000 8801cec8b7f0 8801c52739e8 8801caf7d1d8 [39565.555922] Call Trace: [39565.555948] [] ? add_inode_ref+0x2f3/0x385 [btrfs] [39565.555974] [] ? replay_one_buffer+0x181/0x1fb [btrfs] [39565.556000] [] ? alloc_extent_buffer+0x6f/0x295 [btrfs] [39565.556025] [] ? walk_down_log_tree+0x153/0x29c [btrfs] [39565.556050] [] ? walk_log_tree+0x81/0x196 [btrfs] [39565.556074] [] ? btrfs_read_fs_root_no_radix+0x166/0x1a5 [btrfs] [39565.556099] [] ? btrfs_recover_log_trees+0x192/0x297 [btrfs] [39565.556125] [] ? replay_one_dir_item+0xb3/0xb3 [btrfs] [39565.556148] [] ? btree_read_extent_buffer_pages.clone.63+0x6f/0xb2 [btrfs] [39565.556173] [] ? open_ctree+0x10f5/0x140e [btrfs] [39565.556180] [] ? string.clone.2+0x39/0x9f [39565.556187] [] ? sget+0x363/0x381 [39565.556207] [] ? btrfs_mount+0x228/0x470 [btrfs] [39565.556213] [] ? 
pcpu_next_pop+0x37/0x45 [39565.556219] [] ? cpumask_next+0x18/0x1d [39565.556224] [] ? pcpu_alloc+0x7b4/0x7cc [39565.556232] [] ? mount_fs+0x67/0x150 [39565.556241] [] ? vfs_kern_mount+0x58/0x97 [39565.556249] [] ? do_kern_mount+0x49/0xd8 [39565.556255] [] ? do_mount+0x690/0x6f6 [39565.556262] [] ? should_resched+0x5/0x24 [39565.556269] [] ? _cond_resched+0x9/0x20 [39565.556275] [] ? memdup_user+0x36/0x5b [39565.556280] [] ? sys_mount+0x88/0xc3 [39565.556287] [] ? system_call_fastpath+0x16/0x1b [39565.556291] Code: 89 f
Re: btrfs hung tasks
2011-07-28 10:22:54 -0400, Josef Bacik: > On Thu, Jul 28, 2011 at 07:23:43AM +0100, Stephane Chazelas wrote: > > Hiya, I got below those last night. That was 3 minutes after a > > bunch of rsync and ntfsclone processes started. > > > > It's the first time it happens. I upgraded from 3.0rc6 to 3.0 > > yesterday. > > > > Ok I fixed that recently and Chris just sent it to Linus. The patch you are > looking for is > > Btrfs: use a worker thread to do caching [...] Thanks a lot Josef, that seems to have done the trick. I've not reproduced the issue yet with that patch applied. Strange that I only get the issue with 3.0 and not 3.0rc6, though. -- Stephane
Re: btrfs hung tasks
2011-07-28 07:23:43 +0100, Stephane Chazelas: > Hiya, I got below those last night. That was 3 minutes after a > bunch of rsync and ntfsclone processes started. > > It's the first time it happens. I upgraded from 3.0rc6 to 3.0 > yesterday. [...] And again this morning, though at that point only one ntfsclone process was actively writing to the FS. At this point, I can read directories and stat(2) files on that FS, but reading or writing files hangs. I'll try and revert to 3.0rc6 to see if that makes a difference. call traces for some processes trying to read from the FS: cat D 8801424ee240 0 3478 1 0x0005 8801424ee240 0086 8801080497e8 8800494322e0 8801461908b0 00012800 880108049fd8 880108049fd8 00012800 8801424ee240 00012800 00012800 Call Trace: [] ? _raw_spin_lock_irqsave+0x9/0x25 [] ? btrfs_tree_lock+0x9a/0xa7 [btrfs] [] ? btrfs_spin_on_block+0x49/0x49 [btrfs] [] ? btrfs_set_path_blocking+0x21/0x32 [btrfs] [] ? btrfs_search_slot+0x3c6/0x4d6 [btrfs] [] ? btrfs_lookup_csum+0x65/0x105 [btrfs] [] ? btrfs_lookup_ordered_extent+0x2b/0x69 [btrfs] [] ? btrfs_find_ordered_sum+0x34/0xcc [btrfs] [] ? __btrfs_lookup_bio_sums+0x16f/0x2ed [btrfs] [] ? btrfs_submit_compressed_read+0x3b7/0x42e [btrfs] [] ? submit_one_bio+0x85/0xbc [btrfs] [] ? submit_extent_page.clone.16+0x118/0x1b9 [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? __extent_read_full_page+0x463/0x4cc [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? extent_readpages+0xb1/0xf6 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? __do_page_cache_readahead+0x124/0x1c8 [] ? ra_submit+0x1c/0x23 [] ? generic_file_aio_read+0x2a7/0x5c7 [] ? do_sync_read+0xb1/0xea [] ? _raw_spin_lock_irq+0xd/0x1a [] ? vfs_read+0x9f/0xf2 [] ? syscall_trace_enter+0xb5/0x15d [] ? sys_read+0x45/0x6b [] ? 
tracesys+0xd9/0xde wc D 8801424ef710 0 3495 1 0x0005 8801424ef710 0086 811ab802 88014951f5c0 8160b020 00012800 880109617fd8 880109617fd8 00012800 8801424ef710 00012800 00012800 Call Trace: [] ? delay_tsc+0x2b/0x68 [] ? _raw_spin_lock_irqsave+0x9/0x25 [] ? btrfs_tree_lock+0x9a/0xa7 [btrfs] [] ? btrfs_spin_on_block+0x49/0x49 [btrfs] [] ? map_private_extent_buffer+0xa3/0xc4 [btrfs] [] ? btrfs_lock_root_node+0x1d/0x3f [btrfs] [] ? btrfs_search_slot+0xe6/0x4d6 [btrfs] [] ? btrfs_header_generation.clone.17+0xf/0x14 [btrfs] [] ? btrfs_lookup_csum+0x65/0x105 [btrfs] [] ? btrfs_lookup_ordered_extent+0x2b/0x69 [btrfs] [] ? btrfs_find_ordered_sum+0x34/0xcc [btrfs] [] ? __btrfs_lookup_bio_sums+0x16f/0x2ed [btrfs] [] ? btrfs_submit_bio_hook+0xa4/0x129 [btrfs] [] ? submit_one_bio+0x85/0xbc [btrfs] [] ? submit_extent_page.clone.16+0x118/0x1b9 [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? __extent_read_full_page+0x463/0x4cc [btrfs] [] ? check_page_uptodate+0x36/0x36 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? extent_readpages+0xb1/0xf6 [btrfs] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [] ? __do_page_cache_readahead+0x124/0x1c8 [] ? ra_submit+0x1c/0x23 [] ? generic_file_aio_read+0x26b/0x5c7 [] ? do_sync_read+0xb1/0xea [] ? _raw_spin_lock_irq+0xd/0x1a [] ? vfs_read+0x9f/0xf2 [] ? syscall_trace_enter+0xb5/0x15d [] ? sys_read+0x45/0x6b [] ? tracesys+0xd9/0xde tailD 88014651b750 0 3442 1844 0x0004 88014651b750 0082 880147b06508 8801495660c0 00012800 88010e05dfd8 88010e05dfd8 00012800 88014651b750 00012800 00012800 Call Trace: [] ? __mutex_lock_common.clone.5+0x114/0x179 [] ? mutex_lock+0x1a/0x2d [] ? generic_file_llseek+0x21/0x52 [] ? sys_lseek+0x3c/0x59 [] ? system_call_fastpath+0x16/0x1b rm D 8801424bd060 0 3504 1 0x0005 8801424bd060 0086 0001 8801495660c0 00012800 880118b21fd8 880118b21fd8 00012800 8801424bd060 00012800 00012800 Call Trace: [] ? _raw_spin_lock_irqsave+0x9/0x25 [] ? wait_current_trans.clone.22+0xa1/0xd0 [btrfs] [] ? 
wake_up_bit+0x23/0x23 [] ? start_transaction+0xd9/0x227 [btrfs] [] ? __unlink_start_trans+0x52/0x399 [btrfs] [] ? should_resched+0x5/0x24 [] ? _cond_resched+0x9/0x20 [] ? generic_permission+0xe/0x9b [] ? btrfs_unlink+0x1e/0xa4 [btrfs] [] ? vfs_unlink+0x65/0xbe [] ? do_unlinkat+0xc6/0x14d [] ? ptrace_report_syscall.clone.8+0x27/0x4f [] ? syscall_trace_enter+0xb5/0x15d [] ? tracesys+0xd9/0xde tee D 88013d9d4140 0 3506 1 0x0005 88013d9d4140 0082 880142634c40 880142634730 8160b020 00012800 88007e18bfd8 88007
btrfs hung tasks
Hiya, I got below those last night. That was 3 minutes after a bunch of rsync and ntfsclone processes started. It's the first time it happens. I upgraded from 3.0rc6 to 3.0 yesterday. sysrq-t output attached. [30961.476020] INFO: task kthreadd:2 blocked for more than 120 seconds. [30961.482414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30961.490271] kthreaddD 880149526e20 0 2 0 0x [30961.497489] 880149526e20 0046 88014fffb700 ea7eb520 [30961.504977] 880147045890 00012800 88014952ffd8 88014952ffd8 [30961.512525] 00012800 880149526e20 00012800 00012800 [30961.520179] Call Trace: [30961.522702] [] ? read_tsc+0x5/0x14 [30961.527850] [] ? timekeeping_get_ns+0xd/0x2a [30961.533861] [] ? lock_page+0x20/0x20 [30961.539134] [] ? io_schedule+0x5b/0x75 [30961.548853] [] ? sleep_on_page+0x9/0x10 [30961.554424] [] ? __wait_on_bit+0x3e/0x71 [30961.560042] [] ? wait_on_page_bit+0x6a/0x70 [30961.565912] [] ? autoremove_wake_function+0x2a/0x2a [30961.572478] [] ? migrate_pages+0x1b6/0x39b [30961.578243] [] ? suitable_migration_target+0x35/0x35 [30961.584877] [] ? compact_zone+0x68e/0x6a1 [30961.590567] [] ? shrink_dcache_memory+0x132/0x15d [30961.596934] [] ? compact_zone_order+0x9e/0xab [30961.602954] [] ? try_to_compact_pages+0x90/0xe6 [30961.609147] [] ? __alloc_pages_direct_compact+0xac/0x163 [30961.616124] [] ? __alloc_pages_nodemask+0x485/0x75c [30961.622665] [] ? copy_process+0x109/0x10eb [30961.628462] [] ? perf_event_task_sched_out+0x48/0x54 [30961.635108] [] ? do_fork+0xff/0x263 [30961.640262] [] ? kernel_thread+0x7b/0x83 [30961.645846] [] ? kthread_worker_fn+0x149/0x149 [30961.651955] [] ? gs_change+0x13/0x13 [30961.657193] [] ? kthreadd+0xe7/0x125 [30961.662434] [] ? finish_task_switch+0x84/0xaf [30961.668477] [] ? kernel_thread_helper+0x4/0x10 [30961.674587] [] ? tsk_fork_get_node+0x1a/0x1a [30961.680518] [] ? gs_change+0x13/0x13 [30961.685772] INFO: task btrfs-delayed-m:785 blocked for more than 120 seconds. 
[30961.692919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30961.700793] btrfs-delayed-m D 88014628a300 0 785 2 0x [30961.707874] 88014628a300 0046 88014549cc28 eb3f [30961.715347] 880147044ab0 00012800 8801484c3fd8 8801484c3fd8 [30961.722816] 00012800 88014628a300 00012800 00012800 [30961.730283] Call Trace: [30961.732750] [] ? _raw_spin_lock_irqsave+0x9/0x25 [30961.739056] [] ? btrfs_tree_lock+0x9a/0xa7 [btrfs] [30961.745524] [] ? btrfs_spin_on_block+0x49/0x49 [btrfs] [30961.752337] [] ? btrfs_lock_root_node+0x1d/0x3f [btrfs] [30961.759231] [] ? btrfs_search_slot+0xe6/0x4d6 [btrfs] [30961.765948] [] ? mutex_lock+0xd/0x2d [30961.771190] [] ? should_resched+0x5/0x24 [30961.776809] [] ? btrfs_lookup_inode+0x25/0x87 [btrfs] [30961.783527] [] ? need_resched+0x1a/0x23 [30961.789062] [] ? should_resched+0x5/0x24 [30961.794654] [] ? _cond_resched+0x9/0x20 [30961.800155] [] ? mutex_lock+0xd/0x2d [30961.805412] [] ? btrfs_update_delayed_inode+0x6b/0x126 [btrfs] [30961.812967] [] ? btrfs_async_run_delayed_node_done+0x9f/0x1cb [btrfs] [30961.821126] [] ? worker_loop+0x17e/0x498 [btrfs] [30961.827420] [] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [30961.834320] [] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [30961.841209] [] ? kthread+0x7a/0x82 [30961.846276] [] ? kernel_thread_helper+0x4/0x10 [30961.852381] [] ? kthread_worker_fn+0x149/0x149 [30961.858487] [] ? gs_change+0x13/0x13 [30961.863725] INFO: task btrfs-transacti:892 blocked for more than 120 seconds. [30961.870870] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30961.878727] btrfs-transacti D 880147e24ee0 0 892 2 0x [30961.885811] 880147e24ee0 0046 ea06f318 8800 [30961.893280] 8801495660c0 00012800 880144357fd8 880144357fd8 [30961.900743] 00012800 880147e24ee0 00012800 00012800 [30961.908210] Call Trace: [30961.910669] [] ? read_tsc+0x5/0x14 [30961.915737] [] ? lock_page+0x20/0x20 [30961.920978] [] ? io_schedule+0x5b/0x75 [30961.926390] [] ? 
sleep_on_page+0x9/0x10 [30961.931888] [] ? __wait_on_bit+0x3e/0x71 [30961.937473] [] ? wait_on_page_bit+0x6a/0x70 [30961.943320] [] ? autoremove_wake_function+0x2a/0x2a [30961.949871] [] ? lock_page+0x11/0x20 [btrfs] [30961.955817] [] ? extent_write_cache_pages.clone.10.clone.17+0xed/0x21f [btrfs] [30961.964730] [] ? pagevec_lookup_tag+0x18/0x1f [30961.970757] [] ? filemap_fdatawait_range+0x12c/0x14a [30961.977400] [] ? extent_writepages+0x44/0x5a [btrfs] [30961.984042] [] ? uncompress_inline.clone.32+0x117/0x117 [btrfs] [30961.991639] [] ? __filemap_fdatawrite_range+0x4b/0x50 [30961.998365] [] ? btrfs_wait_ordered_range+0x53/0x11
BUG() in btrfs-fixup (Was: btrfs invalid opcode)
2011-07-25 17:38:10 +0100, Jeremy Sanders: > I'm afraid this is a rather old kernel, 2.6.35.13-92.fc14.x86_64, but this > error looks rather similar to > http://www.spinics.net/lists/linux-btrfs/msg11053.html > > Has this been fixed? I was simultaneously doing rsyncs into different > subvolumes (one reading and one writing). [...] > [454244.123523] kernel BUG at fs/btrfs/inode.c:1528! [...] > [454244.124338] Pid: 3158, comm: btrfs-fixup-0 Not tainted > 2.6.35.13-92.fc14.x86_64 #1 C51MCP51/C51GM03 > [454244.124338] RIP: 0010:[] [] > btrfs_writepage_fixup_worker+0xde/0x118 [btrfs] [...] Hi Jeremy, glad I'm not the only one with that issue. That may renew the interest in it... I don't think much progress has been made on it. We could compare our experiences to see what contributes to its occurrence. It occurs (quite reproducibly) for me when rsyncing from a multi-device, multi-subvolume btrfs FS (mounted with compress-force) onto a single-device, no-subvolume btrfs FS, also mounted with compress-force. It also happens when the target is mounted with compress instead of compress-force, but not if I leave out "compress". I only get one occurrence of those BUG()s until I reboot. After the occurrence of that BUG(), I saw a number of misbehaviors that may or may not be linked to it: - btrfs eating all memory (mostly in the btrfs_inode_cache slab), resulting in a crash. That doesn't happen anymore since I started mounting with noatime and using CONFIG_SLUB (though I suspect it's noatime alone that did the trick) - occasionally, 20 to 95% of write(2) system calls to files on the source FS take 4 seconds, making it hardly usable. I also notice a flush-btrfs-1 stuck in "D" state. How does that compare with your experience? -- Stephane
Re: write(2) taking 4s
2011-07-18 20:37:25 +0100, Stephane Chazelas: > 2011-07-18 11:39:12 +0100, Stephane Chazelas: > > 2011-07-17 10:17:37 +0100, Stephane Chazelas: > > > 2011-07-16 13:12:10 +0100, Stephane Chazelas: > > > > Still on my btrfs-based backup system. I still see one BUG() > > > > reached in btrfs-fixup per boot time, no memory exhaustion > > > > anymore. There is now however something new: write performance > > > > is down to a few bytes per second. > > > [...] > > > > > > The condition that was causing that seems to have cleared by > > > itself this morning before 4am. > > > > > > flush-btrfs-1 and sync are still in D state. > > > > > > Can't really tell what cleared it. Could be when the first of > > > the rsyncs ended as all the other ones (and ntfsclones from nbd > > > devices) ended soon after > > [...] > > > > New nightly backup, and it's happening again. Started about 40 > > minutes after the start of the backup. > [...] > > Actively running at the moment are 1 rsync and 3 ntfsclone. > [...] > > And then again today. > > Interestingly, I "killall -STOP"ed all the ntfsclone and rsync > processes and: [...] > Now 95% of the write(2)s take 4 seconds (while it was about 15% > before I stopped the processes). [...] And this morning, after killing everything so that nothing was writing to the FS anymore, 95% of write(2)s were delayed as well (according to strace -Te write yes > file-on-btrfs). Then I rebooted (sysrq-b) and am trying btrfsck (from integration-20110705) on it, but btrfsck is using 8G of memory on a system that has only 5G so it's swapping in and out constantly and getting nowhere (and renders the system hardly usable) I found http://thread.gmane.org/gmane.comp.file-systems.btrfs/5716/focus=5728 from last year. Is that still the case? 
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1950 root 20 0 7684m 4.4g 232 R 4 91.1 4:22.87 btrfsck (and still growing) vmstat 1 procs ---memory--- ---swap-- ---io--- --system-- ---cpu--- r b swpd free buff cache si so bi bo in cs us sy id wa 2 2 3232016 115232 4524 3520 698 708 3305 716 991 570 3 1 56 39 0 2 3231816 111536 5976 3428 2964 532 4912 532 1569 683 1 0 46 53 0 2 3231144 105832 8144 3536 3140 24 532 424 1612 392 1 1 38 60 0 2 3231532 104964 8180 3684 2672 900 2708 900 1017 324 1 1 34 64 -- Stephane
Re: write(2) taking 4s
2011-07-18 11:39:12 +0100, Stephane Chazelas: > 2011-07-17 10:17:37 +0100, Stephane Chazelas: > > 2011-07-16 13:12:10 +0100, Stephane Chazelas: > > > Still on my btrfs-based backup system. I still see one BUG() > > > reached in btrfs-fixup per boot time, no memory exhaustion > > > anymore. There is now however something new: write performance > > > is down to a few bytes per second. > > [...] > > > > The condition that was causing that seems to have cleared by > > itself this morning before 4am. > > > > flush-btrfs-1 and sync are still in D state. > > > > Can't really tell what cleared it. Could be when the first of > > the rsyncs ended as all the other ones (and ntfsclones from nbd > > devices) ended soon after > [...] > > New nightly backup, and it's happening again. Started about 40 > minutes after the start of the backup. [...] > Actively running at the moment are 1 rsync and 3 ntfsclone. [...] And then again today. Interestingly, I "killall -STOP"ed all the ntfsclone and rsync processes and: # strace -tt -Te write yes > a-file-on-the-btrfs-fs 20:23:26.635848 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095223> 20:23:30.731391 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095769> 20:23:34.827390 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095788> 20:23:38.923388 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 4096) = 4096 <4.095771> Now 95% of the write(2)s take 4 seconds (while it was about 15% before I stopped the processes). [304257.760119] yes S 88001e8e3780 0 13179 13178 0x0001 [304257.760119] 88001e8e3780 0086 8160b020 [304257.760119] 000127c0 880074543fd8 880074543fd8 000127c0 [304257.760119] 88001e8e3780 880074542010 0286 00010286 [304257.760119] Call Trace: [304257.760119] [] ? schedule_timeout+0xa0/0xd7 [304257.760119] [] ? lock_timer_base+0x49/0x49 [304257.760119] [] ? shrink_delalloc+0x100/0x14e [btrfs] [304257.760119] [] ? 
btrfs_delalloc_reserve_metadata+0xf9/0x10b [btrfs] [304257.760119] [] ? btrfs_delalloc_reserve_space+0x20/0x3e [btrfs] [304257.760119] [] ? __btrfs_buffered_write+0x137/0x2dc [btrfs] [304257.760119] [] ? btrfs_dirty_inode+0x119/0x139 [btrfs] [304257.760119] [] ? btrfs_file_aio_write+0x395/0x42b [btrfs] [304257.760119] [] ? __switch_to+0x19c/0x288 [304257.760119] [] ? do_sync_write+0xb1/0xea [304257.760119] [] ? ptrace_notify+0x7f/0x9d [304257.760119] [] ? security_file_permission+0x18/0x2d [304257.760119] [] ? vfs_write+0xa4/0xff [304257.760119] [] ? syscall_trace_enter+0xb6/0x15b [304257.760119] [] ? sys_write+0x45/0x6e [304257.760119] [] ? tracesys+0xd9/0xde After killall -CONT, it's back to 15% write(2)s delayed. What's going on? -- Stephane
Re: write(2) taking 4s
2011-07-17 10:17:37 +0100, Stephane Chazelas: > 2011-07-16 13:12:10 +0100, Stephane Chazelas: > > Still on my btrfs-based backup system. I still see one BUG() > > reached in btrfs-fixup per boot time, no memory exhaustion > > anymore. There is now however something new: write performance > > is down to a few bytes per second. > [...] > > The condition that was causing that seems to have cleared by > itself this morning before 4am. > > flush-btrfs-1 and sync are still in D state. > > Can't really tell what cleared it. Could be when the first of > the rsyncs ended as all the other ones (and ntfsclones from nbd > devices) ended soon after [...] New nightly backup, and it's happening again. Started about 40 minutes after the start of the backup. system -net/total- ---procs--- --dsk/sda-dsk/sdb-dsk/sdc-- time | recv send|run blk new| read writ: read writ: read writ 17-07 20:19:18| 0 0 |0.0 0.0 0.0| 142k 31k: 119k 36k: 120k 33k 17-07 20:19:48|8087k 224k|1.2 5.3 0.1|2976k 98k: 793k 400k:2856k 375k 17-07 20:20:18|5174k 134k|0.8 4.6 0.9| 880k 179k: 830k 916k:1801k 825k 17-07 20:20:48|6634k 148k|1.3 4.9 0.2| 609k 101k:1259k 96k:2628k 98k 17-07 20:21:18|6725k 165k|0.7 5.8 0.0| 237k 442k: 975k 723k:1870k 644k 17-07 20:21:48|7100k 153k|0.7 5.4 0| 305k 83k:1124k 314k:2155k 274k 17-07 20:22:18|4440k 178k|0.5 5.3 0.0| 296k 1775B:2094k 240k:1663k 239k 17-07 20:22:48|8181k 220k|0.9 5.8 0| 360k 410B:1579k 196k:2065k 196k 17-07 20:23:18|8144k 228k|1.3 5.6 0| 348k 54k:1781k 216k:2213k 164k 17-07 20:23:48|5506k 185k|0.8 5.2 0.1| 307k0 :2040k0 :2166k0 17-07 20:24:18|6260k 206k|1.0 5.4 0.1| 474k 78k:2034k 285k:2218k 207k 17-07 20:24:48|8420k 314k|1.5 5.4 0| 313k 363k:2367k 391k:2182k 124k 17-07 20:25:18|8367k 247k|0.9 5.1 0.2| 475k 77k:1797k 75k:2220k 410B 17-07 20:25:48|7511k 179k|1.0 4.7 0| 406k 7646B:1596k 145k:2397k 147k 17-07 20:26:18|7930k 162k|0.7 5.1 0| 991k 410B:1468k 26k:2186k 26k 17-07 20:26:48|7757k 176k|1.0 5.3 0|1884k 26k:1147k 58k:2761k 32k [...] 
17-07 20:57:18|6917k 120k|0.3 4.1 0| 56k 410B: 65k 4506B: 213k 4506B
17-07 20:57:48|5698k 103k|0.1 4.0 0| 0 410B: 27k 6007B: 590k 6007B
17-07 20:58:18|6582k 117k|0.2 4.0 0| 229k 20k: 195k 956B: 290k 21k
17-07 20:58:48|6048k 110k|0.6 4.0 0.1| 32k 21k: 81k 410B: 331k 21k
17-07 20:59:18|8057k 138k|0.6 4.1 0| 42k 5871B: 33k 410B: 35k 5871B
17-07 20:59:48|7369k 145k|0.5 4.1 0| 59k 3959B: 230k 410B: 532k 3959B
17-07 21:00:18|8189k 140k|0.7 4.0 0| 53k 6007B: 58k 410B: 40k 6007B
17-07 21:00:48|7596k 137k|0.3 4.2 0| 24k 6690B: 250k 410B: 15k 5734B
17-07 21:01:18|8448k 145k|0.7 4.2 0| 24k 1365B: 325k 6827B: 15k 7646B
17-07 21:01:48|6821k 119k|0.3 4.0 0| 17k 410B: 175k 3004B: 11k 3004B
17-07 21:02:18|3614k 66k|0.7 2.7 0| 39k 410B: 538k 4779B: 45k 4779B
17-07 21:02:48| 417k 14k|0.5 1.3 0.3| 106k 1638B: 209k 4779B: 0 4779B
17-07 21:03:18| 353k 7979B|0.8 1.2 0| 0 1229B: 449k 2867B: 0 2867B
17-07 21:03:48| 327k 8981B|1.1 1.2 0| 0 410B: 686k 4506B: 43k 4506B
[...]
18-07 11:02:48| 243k 4866B|0.0 1.2 0.1| 0 2458B: 0 3550B: 0 3550B
18-07 11:03:18| 274k 5506B|0.1 1.2 0.1| 0 1775B: 0 3550B: 0 3550B
18-07 11:03:48| 238k 4851B|0.1 1.2 0.0| 0 4369B: 0 3550B: 0 3550B
18-07 11:04:18| 243k 4999B|0.1 1.1 0.1| 0 4506B: 0 3550B: 0 3550B
18-07 11:04:48| 288k 6488B|0.1 1.1 0.4| 0 2458B: 0 3550B: 0 3550B

Because that's after the weekend, there's not much to write.
What's holding up 3 of the backups is actually writing log data
like "xx% Completed". Actively running at the moment are 1 rsync
and 3 ntfsclones.

# strace -tt -s 2 -Te write -p 8771 -p 8567 -p 8856 -p 8403
Process 8771 attached - interrupt to quit
Process 8567 attached - interrupt to quit
Process 8856 attached - interrupt to quit
Process 8403 attached - interrupt to quit
[pid 8403] 11:12:26.539830 write(4, "es"..., 1024
[pid 8771] 11:12:26.540417 write(4, "hb"..., 4096
[pid 8567] 11:12:26.555211 write(1, " 3"..., 25
[pid 8856] 11:12:26.593232 write(1, " 6"..., 25
[pid 8403] 11:12:30.635257 <... write resumed> ) = 1024 <4.095271>
[pid 8403] 11:12:30.635309 write(4, "19"..., 112
[pid 8567] 11:12:30.635364 <... write resumed> ) = 25 <4.080091>
[pid 8856] 11:12:30.635553 <... write resumed> ) = 25 <4.042268>
[pid 8771] 11:12:30.635799 <... write resumed> ) = 4096 <4.095350>
[pid 8771] 11:12:30.636182 write(4, "hb"..., 4096
[pid 8567] 11:12:30.649904 write(1, " 3"..., 25
[pid 8403] 11:12:30.651452 <... write resumed> ) = 112 <0.015921>
[pid 8567] 11:12:30.651595 <... write resumed> ) = 25 <0.001640>
[pid 8403] 11:12:30.65
Re: write(2) taking 4s. (Was: Memory leak?)
2011-07-16 13:12:10 +0100, Stephane Chazelas:
> Still on my btrfs-based backup system. I still see one BUG()
> reached in btrfs-fixup per boot time, no memory exhaustion
> anymore. There is now however something new: write performance
> is down to a few bytes per second.
[...]

The condition that was causing that seems to have cleared by
itself this morning before 4am.

flush-btrfs-1 and sync are still in D state.

Can't really tell what cleared it. Could be when the first of
the rsyncs ended, as all the other ones (and ntfsclones from nbd
devices) ended soon after.

Cheers,
Stephane
Re: write(2) taking 4s. (Was: Memory leak?)
2011-07-16 13:12:10 +0100, Stephane Chazelas:
[...]
> ntfsclone (patched to only write modified clusters):
>
> # strace -Te write -p 4717
> Process 4717 attached - interrupt to quit
> write(1, " 65.16 percent completed\r", 25) = 25 <0.008996>
> write(1, " 65.16 percent completed\r", 25) = 25 <0.743358>
> write(1, " 65.16 percent completed\r", 25) = 25 <0.306582>
> write(1, " 65.17 percent completed\r", 25) = 25 <4.082723>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.006402>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.012582>
> write(1, " 65.17 percent completed\r", 25) = 25 <4.052504>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.012111>
> write(1, " 65.17 percent completed\r", 25) = 25 <0.016001>
> write(1, " 65.17 percent completed\r", 25) = 25 <4.028017>
> write(1, " 65.18 percent completed\r", 25) = 25 <0.013365>
> write(1, " 65.18 percent completed\r", 25) = 25 <0.003963>
>
> (that's writing to a log file)
>
> See how many write(2)s take 4 seconds.
[...]

top - 17:14:18 up 1 day, 9:20, 3 users, load average: 1.00, 1.06, 1.11
Tasks: 146 total, 1 running, 145 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 25.0%id, 75.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 5094800k total, 4616412k used, 478388k free, 1420192k buffers
Swap: 4194300k total, 8720k used, 4185580k free, 2266240k cached

 PID USER PR NI VIRT  RES SHR S %CPU %MEM TIME+   P COMMAND
2156 root 20  0     0   0   0 D    0  0.0 0:00.02 0 flush-btrfs-1
6206 root 20  0 19112 1284 916 R  0  0.0 0:00.09 0 top
   1 root 20  0  8400  220 196 S  0  0.0 0:02.26 0 init

(all the other processes sleeping)

I suspect load 1 is for that flush-btrfs-1 in:

[86372.445554] flush-btrfs-1 D 88014438da30 0 2156 2 0x
[86372.445554] 88014438da30 0046 8100e366 88014438a9a0
[86372.445554] 000127c0 880021501fd8 880021501fd8 000127c0
[86372.445554] 88014438da30 880021500010 8100e366 81066ec6
[86372.445554] Call Trace:
[86372.445554] [] ? read_tsc+0x5/0x16
[86372.445554] [] ? read_tsc+0x5/0x16
[86372.445554] [] ? timekeeping_get_ns+0xd/0x2a
[86372.445554] [] ? __lock_page+0x63/0x63
[86372.445554] [] ? io_schedule+0x84/0xc3
[86372.445554] [] ? radix_tree_gang_lookup_tag_slot+0x7a/0x9f
[86372.445554] [] ? sleep_on_page+0x9/0xd
[86372.445554] [] ? __wait_on_bit_lock+0x3c/0x85
[86372.445554] [] ? __lock_page+0x5d/0x63
[86372.445554] [] ? autoremove_wake_function+0x2a/0x2a
[86372.445554] [] ? T.1090+0xba/0x234 [btrfs]
[86372.445554] [] ? extent_writepages+0x40/0x56 [btrfs]
[86372.445554] [] ? btrfs_submit_direct+0x403/0x403 [btrfs]
[86372.445554] [] ? writeback_single_inode+0xb8/0x1b8
[86372.445554] [] ? writeback_sb_inodes+0xc2/0x13b
[86372.445554] [] ? writeback_inodes_wb+0xfd/0x10f
[86372.445554] [] ? wb_writeback+0x213/0x330
[86372.445554] [] ? lock_timer_base+0x25/0x49
[86372.445554] [] ? wb_do_writeback+0x16d/0x1fc
[86372.445554] [] ? del_timer_sync+0x34/0x3e
[86372.445554] [] ? bdi_writeback_thread+0xc3/0x1ff
[86372.445554] [] ? wb_do_writeback+0x1fc/0x1fc
[86372.445554] [] ? wb_do_writeback+0x1fc/0x1fc
[86372.445554] [] ? kthread+0x7a/0x82
[86372.445554] [] ? kernel_thread_helper+0x4/0x10
[86372.445554] [] ? kthread_worker_fn+0x147/0x147
[86372.445554] [] ? gs_change+0x13/0x13

Also, if I run sync(1), it seems to never return.

[120348.788021] sync D 88011b3e1bc0 0 6215 1789 0x
[120348.788021] 88011b3e1bc0 0082 8160b020
[120348.788021] 000127c0 8800b0f25fd8 8800b0f25fd8 000127c0
[120348.788021] 88011b3e1bc0 8800b0f24010 88011b3e1bc0 00014fc127c0
[120348.788021] Call Trace:
[120348.788021] [] ? schedule_timeout+0x2d/0xd7
[120348.788021] [] ? __sync_filesystem+0x74/0x74
[120348.788021] [] ? wait_for_common+0xd1/0x14e
[120348.788021] [] ? try_to_wake_up+0x18c/0x18c
[120348.788021] [] ? __sync_filesystem+0x74/0x74
[120348.788021] [] ? __sync_filesystem+0x74/0x74
[120348.788021] [] ? writeback_inodes_sb_nr+0x72/0x78
[120348.788021] [] ? __sync_filesystem+0x4e/0x74
[120348.788021] [] ? iterate_supers+0x5e/0xab
[120348.788021] [] ? sys_sync+0x28/0x53
[120348.788021] [] ? system_call_fastpath+0x16/0x1b

--
Stephane
write(2) taking 4s. (Was: Memory leak?)
Still on my btrfs-based backup system. I still see one BUG()
reached in btrfs-fixup per boot time, no memory exhaustion
anymore. There is now however something new: write performance
is down to a few bytes per second.

I've got a few processes (rsync, patched ntfsclone, shells
mostly) writing to files at the same time on this server.

disk stats per second:

--dsk/sda-dsk/sdb-dsk/sdc--
 read writ: read writ: read writ
 264k 44k: 193k 44k: 225k 42k
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 60k: 0 0 : 0 0
 0 12k: 0 1176k: 0 1164k
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 40k: 0 0 :8192B 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 :4096B 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 324k 0 : 248k 0 : 548k 0
 0 4096B: 0 0 : 0 0
 352k 0 : 0 0 : 0 0
 0 0 : 0 0 : 0 0
 0 0 : 0 0 :4096B 0
 0 0 : 0 0 : 0 0
 0 80k: 0 0 : 0 0

rsync:

# strace -Ts0 -p 5015
Process 5015 attached - interrupt to quit
write(3, ""..., 1024) = 1024 <0.007700>
write(3, ""..., 1024) = 1024 <0.015822>
write(3, ""..., 1024) = 1024 <0.031853>
write(3, ""..., 1024) = 1024 <0.015881>
write(3, ""..., 1024) = 1024 <0.015911>
write(3, ""..., 1024) = 1024 <0.015796>
write(3, ""..., 1024) = 1024 <0.031946>
write(3, ""..., 1024) = 1024 <4.083854>
write(3, ""..., 1024) = 1024 <0.007925>
write(3, ""..., 1024) = 1024 <0.003776>
write(3, ""..., 1024) = 1024 <0.031862>
write(3, ""..., 1024) = 1024 <0.011807>
write(3, ""..., 1024) = 1024 <0.019742>
write(3, ""..., 1024) = 1024 <0.015857>
write(3, ""..., 1024) = 1024 <0.031833>
write(3, ""..., 1024) = 1024 <0.015789>
write(3, ""..., 1024) = 1024 <0.015926>
write(3, ""..., 1024) = 1024 <4.095967>
write(3, ""..., 1024) = 1024 <0.019798>
write(3, ""..., 1024) = 1024 <4.083682>
write(3, ""..., 1024) = 1024 <0.015398>
write(3, ""..., 1024) = 1024 <0.015951>
write(3, ""..., 1024) = 1024 <0.035837>
write(3, ""..., 1024) = 1024 <0.015962>
write(3, ""..., 1024) = 1024 <0.015909>
write(3, ""..., 1024) = 1024 <0.015967>
write(3, ""..., 48) = 48 <0.003782>
write(3, ""..., 1024) = 1024 <0.031802>
write(3, ""..., 1024) = 1024 <0.015811>
write(3, ""..., 1024) = 1024 <0.015944>
write(3, ""..., 1024) = 1024 <0.019810>
write(3, ""..., 1024) = 1024 <0.031948>

ntfsclone (patched to only write modified clusters):

# strace -Te write -p 4717
Process 4717 attached - interrupt to quit
write(1, " 65.16 percent completed\r", 25) = 25 <0.008996>
write(1, " 65.16 percent completed\r", 25) = 25 <0.743358>
write(1, " 65.16 percent completed\r", 25) = 25 <0.306582>
write(1, " 65.17 percent completed\r", 25) = 25 <4.082723>
write(1, " 65.17 percent completed\r", 25) = 25 <0.006402>
write(1, " 65.17 percent completed\r", 25) = 25 <0.012582>
write(1, " 65.17 percent completed\r", 25) = 25 <4.052504>
write(1, " 65.17 percent completed\r", 25) = 25 <0.012111>
write(1, " 65.17 percent completed\r", 25) = 25 <0.016001>
write(1, " 65.17 percent completed\r", 25) = 25 <4.028017>
write(1, " 65.18 percent completed\r", 25) = 25 <0.013365>
write(1, " 65.18 percent completed\r", 25) = 25 <0.003963>

(that's writing to a log file)

See how many write(2)s take 4 seconds.

No issue when writing to an ext4 FS. SMART status on all drives
OK. What else could I look at?

Attached is a sysrq-t output.

--
Stephane

sysrq-t.txt.xz
Description: Binary data
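The diagnosis above relies on running strace -T and eyeballing the per-call durations. That step can be automated with a small filter that prints only the slow writes. This is just a sketch, not from the original post: $PID is a placeholder for the process to watch, and the 1-second threshold is an arbitrary choice.

```shell
# Print only the write(2) calls that took longer than 1 second, using the
# <duration> suffix that strace -T appends to each line. Works for plain
# lines and for "<... write resumed>" lines alike, since the duration is
# always the last <...> field. $PID is hypothetical.
strace -T -e trace=write -p "$PID" 2>&1 |
  awk -F'[<>]' '/</ && $(NF-1) + 0 > 1 { print "slow write:", $0 }'
```

The awk program splits each line on angle brackets, so the second-to-last field is the duration; adding 0 forces a numeric comparison.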
Re: Memory leak?
2011-07-11 10:01:21 +0100, Stephane Chazelas:
[...]
> Same without dmcrypt. So to sum up, BUG() reached in btrfs-fixup
> thread when doing an
>
> - rsync (though I also got (back when on ubuntu and 2.6.38) at
>   least one occurrence using bsdtar | bsdtar)
> - of a large amount of data (with a large number of files),
>   though the bug occurs quite early, probably after having
>   transferred about 50-100GB
> - the source FS being btrfs with compress-force on 3 devices
>   (one of which slightly shorter than the others) and a lot of
>   subvolumes and snapshots (I'm now copying from read-only
>   snapshots but that happened with RW ones as well).
> - to a newly created btrfs fs
> - on one device (/dev/sdd or dmcrypt)
> - mounted with compress or compress-force.
>
> - noatime on either source or dest doesn't make a difference
>   (wrt the occurrence of fixup BUG())
> - can't reproduce it when dest is not mounted with compress
> - beside that BUG(),
>   - kernel memory is being used up (mostly in
>     btrfs_inode_cache) and can't be reclaimed (leading to crash
>     with oom killing everybody)
>   - the target FS can be unmounted but that does not reclaim
>     memory. However the *source* FS (that is not the one we tried
>     with and without compress) cannot be unmounted (umount hangs,
>     see another email for its stack trace).
>   - Only way to get out of there is reboot with sysrq-b
> - happens with 2.6.38, 2.6.39, 3.0.0rc6
> - CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
>   CONFIG_DEBUG_SLAB_LEAK, slub_debug don't tell us anything
>   useful (there's more info in /proc/slabinfo when
>   CONFIG_SLAB_DEBUG is on, see below)
> - happens with CONFIG_SLUB as well.
[...]

I don't know which of CONFIG_SLUB or noatime made it, but in that
setup with both enabled, I do get the BUG(), but the system memory
is not exhausted, even after rsync goes over the section with
millions of files where it used to cause the oom crash.

The only issue remaining then is that I can't umount the source
FS (which in turn causes reboot issues). We could still have 2 or
3 different issues here for all we know. The situation is a lot
more comfortable for me now though.

--
Stephane
Re: Memory leak?
2011-07-11 12:25:51 -0400, Chris Mason:
[...]
> > Also, when I resume the rsync (so it doesn't transfer the
> > already transferred files), it does BUG() again.
>
> Ok, could you please send along the exact rsync command you were
> running?
[...]

I did earlier, but here it is again:

rsync --archive --xattrs --hard-links --numeric-ids --sparse --acls /src/ /dst/

Also with:

(cd /src && bsdtar cf - .) | pv | (cd /dst && bsdtar -xpSf - --numeric-owner)

--
Stephane
Re: Memory leak?
2011-07-11 11:00:19 -0400, Chris Mason:
> Excerpts from Stephane Chazelas's message of 2011-07-11 05:01:21 -0400:
> > 2011-07-10 19:37:28 +0100, Stephane Chazelas:
> > > 2011-07-10 08:44:34 -0400, Chris Mason:
> > > [...]
> > > > Great, we're on the right track. Does it trigger with mount -o compress
> > > > instead of mount -o compress_force?
> > > [...]
> > >
> > > It does trigger. I get that same "invalid opcode".
> > >
> > > BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
> > > information than with SLAB_DEBUG.
> > >
> > > I'm trying now without dmcrypt. Then I won't have much bandwidth
> > > for testing.
> > [...]
> >
> > Same without dmcrypt. So to sum up, BUG() reached in btrfs-fixup
> > thread when doing an
[...]
> > - CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
> > CONFIG_DEBUG_SLAB_LEAK, slub_debug don't tell us anything
> > useful (there's more info in /proc/slabinfo when
> > CONFIG_SLAB_DEBUG is on, see below)
[...]
> This is some fantastic debugging, thank you. I know you tested with
> slab debugging turned on, did you have CONFIG_DEBUG_PAGEALLOC on as
> well?

Yes when using SLAB, not when using SLUB.

> It's probably something to do with a specific file, but pulling that
> file out without extra printks is going to be tricky. I'll see if I can
> reproduce it here.
[...]

For one occurrence, I know which file was being transferred at the
time of the crash (by looking at ctimes on the dest FS, see one of
my earlier emails). And after just checking on the latest BUG(),
it's a different one.

Also, when I resume the rsync (so it doesn't transfer the
already transferred files), it does BUG() again.

regards,
Stephane
Re: feature request: btrfs-image without zeroing data
2011-07-11 14:39:18 +0200, krz...@gmail.com :
> 2011/7/11 Stephane Chazelas :
[...]
> > See also
> > http://thread.gmane.org/gmane.comp.file-systems.btrfs/9675/focus=9820
> > for a way to transfer btrfs fs.
> >
> > (Add a layer of "copy-on-write" on the original devices (LVM
> > snapshots, nbd/qemu-nbd cow...), "btrfs add" the new device(s)
> > and then "btrfs del" of the cow'ed original devices.)
[...]
> Copying on block level (dd, lvm) is old trick, however this takes same
> ammount of time regardless of actual space used in filesystem. Hence
> this feature request. Images inside filesystem can copy only actualy
> used data and metadata, which dramaticly reduces copy times in large
> volumes that are not filled up...

The method I suggest doesn't copy the whole disks, please read
more carefully. It can also work to copy from a 3-disk setup to a
1-disk setup, or the other way round.

With btrfs, you can add devices to a FS dynamically; you can also
delete devices, in which case the data is transferred to the
other devices. The method I suggest uses that feature.

Cheers,
Stephane
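The add-then-delete migration described above can be sketched as a short script. This is only an illustration, not the exact commands from the thread: the device names and mount point are hypothetical, it uses the `btrfs device add`/`btrfs device delete` spelling of the subcommands, and it assumes a copy-on-write layer (LVM snapshot, qemu-nbd cow, ...) has already been put over the original devices so they are never written to. Setting DRY_RUN=1 only prints the commands instead of running them.

```shell
#!/bin/sh
# Migrate a btrfs FS mounted on /mnt from /dev/old1..old3 to /dev/new1
# by growing the FS onto the target and then evacuating the old devices.
# All device paths are hypothetical placeholders.
run() { [ "$DRY_RUN" = 1 ] && echo "$@" || "$@"; }

run btrfs device add /dev/new1 /mnt        # grow the FS onto the target
for d in /dev/old1 /dev/old2 /dev/old3; do
    run btrfs device delete "$d" /mnt      # relocates used data off "$d"
done
```

Because `device delete` relocates only allocated chunks, the time taken scales with the data actually stored, which is the point being made in the reply.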
Re: feature request: btrfs-image without zeroing data
2011-07-11 02:00:51 +0200, krz...@gmail.com :
> Documentation says that btrfs-image zeros data. Feature request is for
> disabling this. btrfs-image could be used to copy filesystem to
> another drive (for example with snapshots, when copying it file by
> file would take much longer time or acctualy was not possible
> (snapshots)). btrfs-image in turn could be used to actualy shrink loop
> devices/sparse file containing btrfs - by copying filesystem to new
> loop device/sparse file.
>
> Also it would be nice if copying filesystem could occour without
> intermediate dump to a file...
[...]

I second that.

See also
http://thread.gmane.org/gmane.comp.file-systems.btrfs/9675/focus=9820
for a way to transfer btrfs fs.

(Add a layer of "copy-on-write" on the original devices (LVM
snapshots, nbd/qemu-nbd cow...), "btrfs add" the new device(s)
and then "btrfs del" of the cow'ed original devices.)

--
Stephane
Re: Memory leak?
2011-07-10 19:37:28 +0100, Stephane Chazelas:
> 2011-07-10 08:44:34 -0400, Chris Mason:
> [...]
> > Great, we're on the right track. Does it trigger with mount -o compress
> > instead of mount -o compress_force?
> [...]
>
> It does trigger. I get that same "invalid opcode".
>
> BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
> information than with SLAB_DEBUG.
>
> I'm trying now without dmcrypt. Then I won't have much bandwidth
> for testing.
[...]

Same without dmcrypt. So to sum up, BUG() reached in the
btrfs-fixup thread when doing:

- an rsync (though I also got (back when on ubuntu and 2.6.38) at
  least one occurrence using bsdtar | bsdtar)
- of a large amount of data (with a large number of files),
  though the bug occurs quite early, probably after having
  transferred about 50-100GB
- the source FS being btrfs with compress-force on 3 devices (one
  of which slightly shorter than the others) and a lot of
  subvolumes and snapshots (I'm now copying from read-only
  snapshots, but it happened with RW ones as well)
- to a newly created btrfs fs
- on one device (/dev/sdd or dmcrypt)
- mounted with compress or compress-force.

Further observations:

- noatime on either source or dest doesn't make a difference (wrt
  the occurrence of the fixup BUG())
- can't reproduce it when dest is not mounted with compress
- besides that BUG():
  - kernel memory is being used up (mostly in btrfs_inode_cache)
    and can't be reclaimed (leading to a crash with oom killing
    everybody)
  - the target FS can be unmounted, but that does not reclaim
    memory. However, the *source* FS (that is, not the one we
    tried with and without compress) cannot be unmounted (umount
    hangs, see another email for its stack trace)
  - the only way out is a reboot with sysrq-b
- happens with 2.6.38, 2.6.39, 3.0.0rc6
- CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
  CONFIG_DEBUG_SLAB_LEAK and slub_debug don't tell us anything
  useful (there's more info in /proc/slabinfo when
  CONFIG_SLAB_DEBUG is on, see below)
- happens with CONFIG_SLUB as well.
slabinfo sampled roughly every 60-70 seconds, including the
"globalstat" statistics:

slabinfo - version: 2.1 (statistics)
# name : tunables : slabdata : globalstat : cpustat
btrfs_inode_cache 77610 77610 409611 : tunables 24 128 : slabdata 77610 77610 0 : globalstat 77610 77610 77610000 000 : cpustat104 77609 98 5
btrfs_inode_cache 165696 165696 409611 : tunables 24 128 : slabdata 165696 165696 0 : globalstat 174592 166889 17311700 37 800 : cpustat 14375 174178 21198 1659
btrfs_inode_cache 173906 173906 409611 : tunables 24 128 : slabdata 173906 173906 0 : globalstat 231342 196133 22884880 37 800 : cpustat 24914 230649 75318 6338
btrfs_inode_cache 201190 201190 409611 : tunables 24 128 : slabdata 201190 201190 0 : globalstat 338963 201190 33145480 38 1100 : cpustat 53954 335583 173512 14834
btrfs_inode_cache 224106 224143 409611 : tunables 24 128 : slabdata 224106 224143 96 : globalstat 453173 267189 44210180 38 1300 : cpustat 77063 448023 277242 23875
btrfs_inode_cache 126520 126520 409611 : tunables 24 128 : slabdata 126520 126520 0 : globalstat 486327 267189 472461 320 38 1300 : cpustat 96675 479904 414073 35992
btrfs_inode_cache 144723 144723 409611 : tunables 24 128 : slabdata 144723 144723 0 : globalstat 537600 267189 521248 320 38 1500 : cpustat 114446 530048 459922 39849
btrfs_inode_cache 176590 176590 409611 : tunables 24 128 : slabdata 176590 176590 0 : globalstat 626027 267189 605212 320 38 3500 : cpustat 142336 616188 535659 46275
btrfs_inode_cache 225715 225752 409611 : tunables 24 128 : slabdata 225715 225752 96 : globalstat 766387 267189 739439 320 38 6000 : cpustat 181607 753564 653165 56404
btrfs_inode_cache 179039 179076 409611 : tunables 24 128 : slabdata 179039 179076 84 : globalstat 821296 267189 793315 480 38 6000 : cpustat 189640 808027 753396 65349
btrfs_inode_cache 139572 139609 409611 : tunables 24 128 : slabdata 139572 139609 96 : globalstat 890513 267189 858553 560 38 6000 : cpustat 214964 875265 874796 75992
btrfs_inode_cache 122064 122101 409611 : tunables 24 128 : slabdata 122064 122101 96 : globalstat 936515 267189 903015 720 38 6600 : cpustat 230345 920877 947006 82282
btrfs_inode_cache 136431 136468 409611 : tunables 24 128 : slabdata 136431 136468
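Watching btrfs_inode_cache grow by eye in raw slabinfo samples like the ones above is tedious; a one-liner can reduce each sample to an approximate memory figure. A sketch, assuming the standard /proc/slabinfo column order (name, active objects, total objects, object size in bytes):

```shell
# Report roughly how much memory btrfs_inode_cache holds:
# total objects ($3) times object size ($4), in MB.
awk '$1 == "btrfs_inode_cache" { printf "%s: %.1f MB\n", $1, $3 * $4 / 1048576 }' /proc/slabinfo
```

Run under watch(1) (e.g. `watch -n 60 ...`) this gives the same 60-70 second sampling as above, without the surrounding noise.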
Re: Memory leak?
2011-07-10 08:44:34 -0400, Chris Mason:
[...]
> Great, we're on the right track. Does it trigger with mount -o compress
> instead of mount -o compress_force?
[...]

It does trigger. I get that same "invalid opcode".

BTW, I tried with CONFIG_SLUB and slub_debug and no more useful
information than with SLAB_DEBUG.

I'm trying now without dmcrypt. Then I won't have much bandwidth
for testing.

--
Stephane
Re: Memory leak?
2011-07-09 08:09:55 +0100, Stephane Chazelas:
> 2011-07-08 16:12:28 -0400, Chris Mason:
> [...]
> > > I'm running a "dstat -df" at the same time and I'm seeing
> > > substantive amount of disk writes on the disks that hold the
> > > source FS (and I'm rsyncing from read-only snapshot subvolumes
> > > in case you're thinking of atimes) almost more than onto the
> > > drive holding the target FS!?
> >
> > These are probably atime updates. You can completely disable them with
> > mount -o noatime.
> [...]
>
> I don't think it is. First, as I said, I'm rsyncing from
> read-only snapshots (and I could see atimes were not updated)
> and nothing else is running. Then now looking at the traces this
> morning, there was a lot written in the first minutes, then it
> calmed down.
[...]

How embarrassing, sorry. In that instance I wasn't rsyncing from
the right place, so I was indeed copying non-readonly volumes
before copying the readonly ones. So those writes were probably
due to atimes.

--
Stephane
Re: Memory leak?
2011-07-09 13:25:00 -0600, cwillu:
> On Sat, Jul 9, 2011 at 11:09 AM, Stephane Chazelas wrote:
> > 2011-07-08 11:06:08 -0400, Chris Mason:
> > [...]
> >> I would do two things. First, I'd turn off compress_force. There's no
> >> explicit reason for this, it just seems like the most likely place for
> >> a bug.
> > [...]
> >
> > I couldn't reproduce it with compress_force turned off, the
> > inode_cache reached 600MB but never stayed high.
> >
> > Not using compress_force is not an option for me though
> > unfortunately.
>
> Disabling compression doesn't decompress everything that's already compressed.

Yes. But the very issue here is that I get this problem when I
copy data onto an empty drive. If I don't enable compression
there, it simply doesn't fit. In this very case, support for
compression is the main reason why I use btrfs here.

--
Stephane
Re: Memory leak?
2011-07-08 11:06:08 -0400, Chris Mason:
[...]
> I would do two things. First, I'd turn off compress_force. There's no
> explicit reason for this, it just seems like the most likely place for
> a bug.
[...]

I couldn't reproduce it with compress_force turned off; the
inode_cache reached 600MB but never stayed high.

Not using compress_force is not an option for me though,
unfortunately.

--
Stephane
Re: Memory leak?
2011-07-08 12:17:54 -0400, Chris Mason:
[...]
> How easily can you recompile your kernel with more debugging flags?
> That should help narrow it down. I'm looking for CONFIG_SLAB_DEBUG (or
> slub) and CONFIG_DEBUG_PAGEALLOC
[...]

I tried that (with CONFIG_DEBUG_SLAB_LEAK as well), but it made no
difference whatsoever.

--
Stephane
A lot of writing to FS only read (Was: Memory leak?)
2011-07-09 08:09:55 +0100, Stephane Chazelas:
> 2011-07-08 16:12:28 -0400, Chris Mason:
> [...]
> > > I'm running a "dstat -df" at the same time and I'm seeing
> > > substantive amount of disk writes on the disks that hold the
> > > source FS (and I'm rsyncing from read-only snapshot subvolumes
> > > in case you're thinking of atimes) almost more than onto the
> > > drive holding the target FS!?
[...]

One thing I didn't mention is that before doing the rsync, I
deleted a few snapshot volumes (those were read-only snapshots of
read-only snapshots) and recreated them, and those are the ones
I'm rsyncing from (basically, I prepare an area to be rsynced,
made of the latest snapshots of a few subvolumes).

--
Stephane
Re: Memory leak?
2011-07-08 16:12:28 -0400, Chris Mason:
[...]
> > I'm running a "dstat -df" at the same time and I'm seeing
> > substantive amount of disk writes on the disks that hold the
> > source FS (and I'm rsyncing from read-only snapshot subvolumes
> > in case you're thinking of atimes) almost more than onto the
> > drive holding the target FS!?
>
> These are probably atime updates. You can completely disable them with
> mount -o noatime.
[...]

I don't think it is. First, as I said, I'm rsyncing from
read-only snapshots (and I could see atimes were not updated)
and nothing else is running. Then, looking at the traces this
morning, there was a lot written in the first minutes, then it
calmed down.

That's every 5 minutes (in megabytes written to the hard disks
making part of the source FS):

  976 *|**|**
 1786 *||
 2486 *|**|**
 3139 *||
 3720 *|*|*
 4226 *|**|**
 4756 *||***
 5134 **||***
 5573 ***|*|
 5991 |**|
 6423 |***|*
 6580 ||*
 6598 ||*
 6602 ||*
 6671 ||*
 6774 ||**
 6887 ||**
 6928 ||**
 6943 |*|**
 6954 |*|**
 6969 |*|**
 6982 |*|**
 6998 |*|**
 7016 |*|**
 7106 |*|**
 7143 |*|**
 7184 |*|***
 7216 |*|***
 7244 |*|***
 7315 |*|***
 7347 |**|***
 7358 |**|***
 7418 |**|***
 7432 |**|***
 7438 |**|***
 7438 |**|***
 7462 |**|***
 7483 |**|***
 7528 |**|***
 7529 |**|***
 7618 |**|
 7869 |***|
 8325 *||**
 8419 *||**
 8533 *|*|**
 8587 *|*|**
 8731 *|*|***
 8830 *|*|***
 8869 *|*|***
 8881 *|**|***
 8903 *|**|***
 8974 *|**|***
 9044 *|**|***
 9100 *|**|
 9492 *|***|*
 9564 *|***|*
 9640 *||*
 9684 *||*
 9756 *||*
 9803 *||*
 9836 *||*
 9845 *||**
 9881 *||**
 9989 *||**
10083 *|*|**
10172 *|*|**
10230 *|*|**
10250 *|*|***
10271 *|*|***
10291 *|*|***
10306 *|*|***
10314 *|*|***
10330 *|*|***
10395 *|*|***
10468 *|**|***
10641 *|**|***
10728 *|**|
10818 *|**|**
Re: Memory leak?
2011-07-08 12:15:08 -0400, Chris Mason:
[...]
> I'd definitely try without -o compress_force.
[...]

Just started that over the night. I'm running a "dstat -df" at
the same time, and I'm seeing a substantial amount of disk writes
on the disks that hold the source FS (and I'm rsyncing from
read-only snapshot subvolumes, in case you're thinking of
atimes), almost more than onto the drive holding the target FS!?

--dsk/sda-- --dsk/sdb-- --dsk/sdc-- --dsk/sdd--
 read  writ: read  writ: read  writ: read  writ
1000k 0 : 580k 0 :2176k 0 : 0 0
1192k 0 : 76k 0 : 988k 0 : 0 0
 436k 0 : 364k 0 :1984k 0 : 0 33M
 824k 0 : 812k 0 :4276k 0 : 0 0
3004k 0 :2868k 0 :5488k 0 : 0 0
 796k 564k: 640k 25M:2284k 4600k: 0 0
 108k 0 : 68k 23M: 648k 35M: 0 0
1712k 12k:1644k 12k:2476k 7864k: 0 0
 732k 0 : 620k 0 :3192k 0 : 0 0
1148k 0 :1116k 0 :3532k 0 : 0 19M
1392k 0 :1380k 0 :4416k 0 : 0 7056k
1336k 0 :1212k 0 :6664k 0 : 0 3148k
 820k 0 : 784k 0 :4528k 0 : 0 48M
1336k 0 :1340k 0 :3964k 0 : 0 8996k
1528k 0 :1280k 0 :2908k 0 : 0 0
1352k 0 :1216k 0 :3880k 0 : 0 0
 864k 0 : 888k 0 :1684k 0 : 0 0
1268k 0 :1208k 0 :3072k 0 : 0 0

(the source FS is on sda4+sdb+sdc, the target on sdd; sda also
has an ext4 FS)

How can that be? Some garbage collection, background defrag or
something like that? But then, if I stop the rsync, those writes
stop as well.

--
Stephane
Re: Memory leak?
2011-07-08 12:15:08 -0400, Chris Mason:
[...]
> You described this workload as rsync, is there anything else running?
[...]

Nope. Nothing else. And at least initially, that was onto an
empty drive, so a basic copy:

rsync --archive --xattrs --hard-links --numeric-ids --sparse --acls

Cheers,
Stephane
Re: Memory leak?
2011-07-08 12:17:54 -0400, Chris Mason:
[...]
> > Jun 5 00:58:10 BUG: Bad page state in process rsync pfn:1bfdf
> > Jun 5 00:58:10 page:ea61f8c8 count:0 mapcount:0 mapping: (null) index:0x2300
> > Jun 5 00:58:10 page flags: 0x110(dirty)
> > Jun 5 00:58:10 Pid: 1584, comm: rsync Tainted: G D C 2.6.38-7-server #35-Ubuntu
> > Jun 5 00:58:10 Call Trace:
>
> Ok, this one is really interesting. Did you get this after another oops
> or was it after a reboot?

After the oops above (a few hours after, though). The two reports
were together, with nothing in between in the kern.log. That was
the only occurrence though.

> How easily can you recompile your kernel with more debugging flags?
> That should help narrow it down. I'm looking for CONFIG_SLAB_DEBUG (or
> slub) and CONFIG_DEBUG_PAGEALLOC
[...]

I can try next week.

--
Stephane
Re: Memory leak?
2011-07-08 16:41:23 +0100, Stephane Chazelas:
> 2011-07-08 11:06:08 -0400, Chris Mason:
> [...]
> > So the invalidate opcode in btrfs-fixup-0 is the big problem. We're
> > either failing to write because we weren't able to allocate memory (and
> > not dealing with it properly) or there is a bigger problem.
> >
> > Does the btrfs-fixup-0 oops come before or after the ooms?
>
> Hi Chris, thanks for looking into this.
>
> It comes long before. Hours before there's any problem. So it
> seems unrelated.

Though every time I had the issue, there had been such an
"invalid opcode" before. But also, I only had both the "invalid
opcode" and the memory issue when doing that rsync onto the
external hard drive.

> > Please send along any oops output during the run. Only the first
> > (earliest) oops matters.
>
> There's always only one in between two reboots. I've sent two
> already, but here they are:
[...]

I dug up the traces from before I switched to debian (thinking
getting a newer kernel would improve matters) in case it helps:

Jun 4 18:12:58 [ cut here ]
Jun 4 18:12:58 kernel BUG at /build/buildd/linux-2.6.38/fs/btrfs/inode.c:1555!
Jun 4 18:12:58 invalid opcode: [#2] SMP
Jun 4 18:12:58 last sysfs file: /sys/devices/virtual/block/dm-2/dm/name
Jun 4 18:12:58 CPU 0
Jun 4 18:12:58 Modules linked in: sha256_generic cryptd aes_x86_64 aes_generic dm_crypt psmouse serio_raw xgifb(C+) i3200_edac edac_core nbd btrfs zlib_deflate libcrc32c xenbus_probe_frontend ums_cypress usb_storage uas e1000e ahci libahci
Jun 4 18:12:58
Jun 4 18:12:58 Pid: 416, comm: btrfs-fixup-0 Tainted: G D C 2.6.38-7-server #35-Ubuntu empty empty/Tyan Tank GT20 B5211
Jun 4 18:12:58 RIP: 0010:[] [] btrfs_writepage_fixup_worker+0x145/0x150 [btrfs]
Jun 4 18:12:58 RSP: 0018:88003cfddde0 EFLAGS: 00010246
Jun 4 18:12:58 RAX: RBX: ea04ca88 RCX:
Jun 4 18:12:58 RDX: 88003cfddd98 RSI: RDI: 8800152088b0
Jun 4 18:12:58 RBP: 88003cfdde30 R08: e8c09988 R09: 88003cfddd98
Jun 4 18:12:58 R10: R11: R12: 010ec000
Jun 4 18:12:58 R13: 880015208988 R14: R15: 010ecfff
Jun 4 18:12:58 FS: () GS:88003fc0() knlGS:
Jun 4 18:12:58 CS: 0010 DS: ES: CR0: 8005003b
Jun 4 18:12:58 CR2: 00e73fe8 CR3: 30fcc000 CR4: 06f0
Jun 4 18:12:58 DR0: DR1: DR2:
Jun 4 18:12:58 DR3: DR6: 0ff0 DR7: 0400
Jun 4 18:12:58 Process btrfs-fixup-0 (pid: 416, threadinfo 88003cfdc000, task 880036912dc0)
Jun 4 18:12:58 Stack:
Jun 4 18:12:58 880039c4e120 880015208820 88003cfdde90 880032da4b80
Jun 4 18:12:58 88003cfdde30 88003ce915a0 88003cfdde90 88003cfdde80
Jun 4 18:12:58 880036912dc0 88003ce915f0 88003cfddee0 a00c34f4
Jun 4 18:12:58 Call Trace:
Jun 4 18:12:58 [] worker_loop+0xa4/0x3a0 [btrfs]
Jun 4 18:12:58 [] ? worker_loop+0x0/0x3a0 [btrfs]
Jun 4 18:12:58 [] kthread+0x96/0xa0
Jun 4 18:12:58 [] kernel_thread_helper+0x4/0x10
Jun 4 18:12:58 [] ? kthread+0x0/0xa0
Jun 4 18:12:58 [] ? kernel_thread_helper+0x0/0x10
Jun 4 18:12:58 Code: 1f 80 00 00 00 00 48 8b 7d b8 48 8d 4d c8 41 b8 50 00 00 00 4c 89 fa 4c 89 e6 e8 37 d1 01 00 eb b6 48 89 df e8 8d 1a 07 e1 eb 9a <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55
Jun 4 18:12:58 RIP [] btrfs_writepage_fixup_worker+0x145/0x150 [btrfs]
Jun 4 18:12:58 RSP
Jun 4 18:12:58 ---[ end trace e5cf15794ff3ebdb ]---

And:

Jun 5 00:58:10 BUG: Bad page state in process rsync pfn:1bfdf
Jun 5 00:58:10 page:ea61f8c8 count:0 mapcount:0 mapping: (null) index:0x2300
Jun 5 00:58:10 page flags: 0x110(dirty)
Jun 5 00:58:10 Pid: 1584, comm: rsync Tainted: G D C 2.6.38-7-server #35-Ubuntu
Jun 5 00:58:10 Call Trace:
Jun 5 00:58:10 [] ? dump_page+0x9b/0xd0
Jun 5 00:58:10 [] ? bad_page+0xcc/0x120
Jun 5 00:58:10 [] ? prep_new_page+0x1a5/0x1b0
Jun 5 00:58:10 [] ? _raw_spin_lock+0xe/0x20
Jun 5 00:58:10 [] ? test_range_bit+0x111/0x150 [btrfs]
Jun 5 00:58:10 [] ? get_page_from_freelist+0x264/0x650
Jun 5 00:58:10 [] ? generic_bin_search.clone.42+0x19e/0x200 [btrfs]
Jun 5 00:58:10 [] ? __alloc_pages_nodemask+0x118/0x830
Jun 5 00:58:10 [] ? generic_bin_search.clone.42+0x19e/0x200 [btrfs]
Jun 5 00:58:10 [] ? _raw_spin_lock+0xe/0x20
Jun 5 00:58:10 [] ? get_partial_node+0x92/0xb0
Jun 5 00:58:10 [] ? btrfs_submit_compressed_read+0x15d/0x4e0 [btrfs]
Jun 5 00:58:10 [] ? alloc_pages_current+0xa5/0x110
Jun 5 00:58:10 [] ? btrfs_s
Re: Memory leak?
2011-07-08 11:06:08 -0400, Chris Mason: [...] > So the invalidate opcode in btrfs-fixup-0 is the big problem. We're > either failing to write because we weren't able to allocate memory (and > not dealing with it properly) or there is a bigger problem. > > Does the btrfs-fixup-0 oops come before or after the ooms? Hi Chris, thanks for looking into this. It comes long before. Hours before there's any problem. So it seems unrelated. > Please send along any oops output during the run. Only the first > (earliest) oops matters. There's always only one in between two reboots. I've sent two already, but here they are: Jul 1 18:25:57 [ cut here ] Jul 1 18:25:57 kernel BUG at /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! Jul 1 18:25:57 invalid opcode: [#1] SMP Jul 1 18:25:57 CPU 1 Jul 1 18:25:57 Modules linked in: sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt fuse snd_pcm psmouse tpm_tis tpm i2c_i801 snd_timer snd soundcore snd_page_alloc i3200_edac tpm_bios serio_raw evdev pcspkr processor button thermal_sys i2c_core container edac_core sg sr_mod cdrom ext4 mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress usb_storage sd_mod crc_t10dif uas uhci_hcd ahci libahci libata ehci_hcd e1000e scsi_mod usbcore [last unloaded: scsi_wait_scan] Jul 1 18:25:57 Jul 1 18:25:57 Pid: 747, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 empty empty/Tyan Tank GT20 B5211 Jul 1 18:25:57 RIP: 0010:[] [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] Jul 1 18:25:57 RSP: 0018:8801483ffde0 EFLAGS: 00010246 Jul 1 18:25:57 RAX: RBX: ea000496a430 RCX: Jul 1 18:25:57 RDX: RSI: 06849000 RDI: 880071c1fcb8 Jul 1 18:25:57 RBP: 06849000 R08: 0008 R09: 8801483ffd98 Jul 1 18:25:57 R10: dead00200200 R11: dead00100100 R12: 880071c1fd90 Jul 1 18:25:57 R13: R14: 8801483ffdf8 R15: 06849fff Jul 1 18:25:57 FS: () GS:88014fd0() knlGS: Jul 1 18:25:57 CS: 0010 DS: ES: CR0: 8005003b Jul 1 18:25:57 CR2: f7596000 CR3: 
00013def9000 CR4: 06e0 Jul 1 18:25:57 DR0: DR1: DR2: Jul 1 18:25:57 DR3: DR6: 0ff0 DR7: 0400 Jul 1 18:25:57 Process btrfs-fixup-0 (pid: 747, threadinfo 8801483fe000, task 88014672efa0) Jul 1 18:25:57 Stack: Jul 1 18:25:57 880071c1fc28 8800c70165c0 88011e61ca28 Jul 1 18:25:57 880146ef41c0 880146ef4210 880146ef41d8 Jul 1 18:25:57 880146ef41c8 880146ef4200 880146ef41e8 a01669fa Jul 1 18:25:57 Call Trace: Jul 1 18:25:57 [] ? worker_loop+0x186/0x4a1 [btrfs] Jul 1 18:25:57 [] ? schedule+0x5ed/0x61a Jul 1 18:25:57 [] ? btrfs_queue_worker+0x24a/0x24a [btrfs] Jul 1 18:25:57 [] ? btrfs_queue_worker+0x24a/0x24a [btrfs] Jul 1 18:25:57 [] ? kthread+0x7a/0x82 Jul 1 18:25:57 [] ? kernel_thread_helper+0x4/0x10 Jul 1 18:25:57 [] ? kthread_worker_fn+0x147/0x147 Jul 1 18:25:57 [] ? gs_change+0x13/0x13 Jul 1 18:25:57 Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 fb ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff <0f> 0b eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48 Jul 1 18:25:57 RIP [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] Jul 1 18:25:57 RSP Jul 1 18:25:57 ---[ end trace 9744d33381de3d04 ]--- Jul 4 12:50:51 [ cut here ] Jul 4 12:50:51 kernel BUG at /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! 
Jul 4 12:50:51 invalid opcode: [#1] SMP Jul 4 12:50:51 CPU 0 Jul 4 12:50:51 Modules linked in: lm85 dme1737 hwmon_vid coretemp ipmi_si ipmi_msghandler sha256_generic cryptd aes_x86_64 aes_generic cbc fuse dm_crypt snd_pcm snd_timer snd sg soundcore i3200_edac snd_page_alloc sr_mod processor tpm_tis i2c_i801 pl2303 pcspkr thermal_sys i2c_core tpm edac_core tpm_bios cdrom usbserial container evdev psmouse button serio_raw ext4 mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress sd_mod crc_t10dif usb_storage uas uhci_hcd ahci libahci ehci_hcd libata e1000e usbcore scsi_mod [last unloaded: i2c_dev] Jul 4 12:50:51 Jul 4 12:50:51 Pid: 764, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 empty empty/Tyan Tank GT20 B5211 Jul 4 12:50:51 RIP: 0010:[] [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] Jul 4 12:50:51 RSP: 0018:880147ffdde0 EFLAGS: 00010246 Jul 4 12:50:51 RAX: RBX: ea0004648098 RCX: Jul 4 12:50:51 RDX: RSI: 05854000 RDI: 8800073f18d0 Jul 4 12:50:51 RBP: 0
Re: Memory leak?
2011-07-06 09:11:11 +0100, Stephane Chazelas: > 2011-07-03 13:38:57 -0600, cwillu: > > On Sun, Jul 3, 2011 at 1:09 PM, Stephane Chazelas > > wrote: > [...] > > > Now, on a few occasions (actually, most of the time), when I > > > rsynced the data (about 2.5TB) onto the external drive, the > > > system would crash after some time with "Out of memory and no > > > killable process". Basically, something in kernel was allocating > > > the whole memory, then oom mass killed everybody and crash. > [...] > > Look at the output of slabtop (should be installed by default, procfs > > package), before rsync for comparison, and during. > > Hi, > > so, no crash this time [...] Another attempt, again onto an empty drive, this time with 3.0.0-rc6. Exact same scenario. slabinfo: extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache (in bytes) 2011-07-07_10:40 143264 29952 0 120832 2011-07-07_10:50 16088160 59593248 22814 196352 2011-07-07_11:00 19549728 83101824 25890 475776 2011-07-07_11:10 17695040 52438464 163416000 758976 2011-07-07_11:20 16723168 55321344 180821095040 2011-07-07_11:30 15468640 46624032 1665680001344256 2011-07-07_11:40 14651648 43962048 1439160001944640 2011-07-07_11:509010144262080097320002231616 2011-07-07_12:004224352364665692240002473280 2011-07-07_12:102528416 45676819840002511040 2011-07-07_12:202400640 42681614520002590336 2011-07-07_12:302241888 33696010520002590336 2011-07-07_12:404669632 10224864 368040002662080 2011-07-07_12:504386976432057680320002662080 2011-07-07_13:00428243217933765862662080 2011-07-07_13:104270816123177628480002662080 2011-07-07_13:204255328114192023480002662080 2011-07-07_13:304255328113443222320002662080 2011-07-07_13:404255328113068821920002662080 2011-07-07_13:504255328113068821720002662080 2011-07-07_14:004239840113068821680002665856 2011-07-07_14:1042127367259616 28422669632 2011-07-07_14:2070083209936576 231760002673408 2011-07-07_14:3069153929929088 225120002677184 2011-07-07_14:4070470409929088 
253920002680960 2011-07-07_14:5085377609929088 263680002707392 2011-07-07_15:00 10469888 12804480 401160002718720 2011-07-07_15:10 13195776 14028768 47082726272 2011-07-07_15:20 13571360 13946400 457480002730048 2011-07-07_15:30 19580704 16200288 545960002737600 2011-07-07_15:40 19449056 16192800 524120002737600 2011-07-07_15:50 19445184 16192800 524120002737600 2011-07-07_16:00 19425824 19450080 66482741376 2011-07-07_16:10 24858240 25994592 910760002756480 2011-07-07_16:20 24246464 25930944 774480002760256 2011-07-07_16:30 25477760 35144928 1313120002767808 2011-07-07_16:40 40625024 85512960 3264440002767808 2011-07-07_16:50 53247744 102390912 346562767808 2011-07-07_17:00 63465952 113390784 3884160002786688 2011-07-07_17:10 10020736 13523328 24962805568 2011-07-07_17:2089714249345024 28902809344 2011-07-07_17:309257952 31408416 78042805568 2011-07-07_17:409257952 31378464 768120002805568 2011-07-07_17:509257952 31374720 758280002805568 2011-07-07_18:006857312 13725504 21942805568 2011-07-07_18:106605632 10707840 378160002813120 2011-07-07_18:20 10768032 17252352 540320002820672 2011-07-07_18:30 21737408 74595456 2104920002824448 2011-07-07_18:40 14554848 16967808 412320002832000 2011-07-07_18:506594016 10281024 308760002832000 2011-07-07_19:006322976 11467872 408040002835776 2011-07-07_19:107798208 12227904 46522843328 2011-07-07_19:209927808 13279968 46962850880 2011-07-07_19:30 10237568 13272480 466440002854656 2011-07-07_19:40 17160704 16368768 56742865984 2011-07-07_19:50 26039200 29105856 1018280002881088 2011-07-07_20:00 27878400 33988032 1155280002881088 2011-07-07_20:10 30604288 39151008 125962881088 2011-07-07_20:20 31339968 39049920 1254760002881088 2011-07-07_20:30 31297376 39042432 1279280002881088 2011-07-07_20:40 31390304 39038688 1290480002881088 2011-07-07_20:50 31390304 39038688 127282888640 2011-07-07_21:00 31390304 39038688 1271880002892416 2011-07-07_21:10 31738784 39038688 127162892416 2011-07-07_21:20 40299776 49342176 173702896192 
2011-07-07_21:30 46041952 58904352 206176000
Re: Memory leak?
2011-07-07 16:20:20 +0800, Li Zefan:
[...]
> btrfs_inode_cache is a slab cache for in-memory inodes, which is of
> struct btrfs_inode.
[...]

Thanks Li. If that's a cache, the system should be able to reuse the
space there when it's low on memory, shouldn't it? What would be the
conditions under which that couldn't be done? (Like in my case, where
the OOM killer was invoked to free memory rather than that cache memory
being reclaimed.)

Best regards,
Stephane
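One quick check on that question is whether the kernel itself accounts the slab memory as reclaimable. Below is a minimal sketch (the `slab_split` helper name is mine); a shrinkable cache such as btrfs_inode_cache would normally be counted under SReclaimable, while memory the kernel cannot drop ends up under SUnreclaim:

```shell
#!/bin/sh
# slab_split: report reclaimable vs unreclaimable slab memory from
# /proc/meminfo-style input on stdin.
slab_split() {
    awk '/^SReclaimable:/ { r = $2 }
         /^SUnreclaim:/   { u = $2 }
         END { printf "reclaimable=%dkB unreclaimable=%dkB\n", r, u }'
}

# On a live system: slab_split < /proc/meminfo
printf 'SReclaimable: 2266256 kB\nSUnreclaim: 65432 kB\n' | slab_split
# → reclaimable=2266256kB unreclaimable=65432kB
```

If the bulk of the growth showed up under SUnreclaim, that would point at memory the shrinkers cannot free, which is consistent with the OOM behaviour described.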
Re: Memory leak?
2011-07-06 09:11:11 +0100, Stephane Chazelas:
[...]
> extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache
> (in bytes)
[...]
> 01:00 267192640 668595744 2321646000 3418048
> 01:10 267192640 668595744 2321646000 3418048
> 01:20 267192640 668595744 2321646000 3418048
> 01:30 267192640 668595744 2321646000 3418048
> 01:40 267192640 668595744 2321646000 3418048
[...]

I've just come across
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=4b9465cb9e3859186eefa1ca3b990a5849386320

GIT> author    Chris Mason  Fri, 3 Jun 2011 13:36:29 +0000 (09:36 -0400)
GIT> committer Chris Mason  Sat, 4 Jun 2011 12:03:47 +0000 (08:03 -0400)
GIT> commit 4b9465cb9e3859186eefa1ca3b990a5849386320
GIT> tree   8fc06452fb75e52f6c1c2e2253c2ff6700e622fd
GIT> parent e7786c3ae517b2c433edc91714e86be770e9f1ce
GIT>
GIT> Btrfs: add mount -o inode_cache
GIT>
GIT> This makes the inode map cache default to off until we
GIT> fix the overflow problem when the free space crcs don't fit
GIT> inside a single page.

I would have thought that would have disabled that btrfs_inode_cache,
and I can see that patch is in 3.0.0-rc5 (I'm not mounting with -o
inode_cache). So, why those 2.2GiB in btrfs_inode_cache above?

-- 
Stephane
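As a sanity check that the option really is off, the active mount options can be read back from /proc/mounts. A sketch (the `has_mount_opt` helper is made up for illustration; field 4 of a mounts line is the comma-separated option list):

```shell
#!/bin/sh
# has_mount_opt: scan /proc/mounts-style input on stdin and test whether
# the filesystem mounted at $1 carries mount option $2.
has_mount_opt() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, a, ",")
            for (i = 1; i <= n; i++) if (a[i] == opt) found = 1
        }
        END { exit !found }'
}

# On a live system: has_mount_opt /backup inode_cache < /proc/mounts
printf '/dev/sdb1 /backup btrfs rw,compress-force=zlib 0 0\n' |
    has_mount_opt /backup inode_cache && echo enabled || echo disabled
# → disabled
```

That the cache still grows while the option is absent is exactly the puzzle raised above.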
Re: Memory leak?
2011-07-03 13:38:57 -0600, cwillu: > On Sun, Jul 3, 2011 at 1:09 PM, Stephane Chazelas > wrote: [...] > > Now, on a few occasions (actually, most of the time), when I > > rsynced the data (about 2.5TB) onto the external drive, the > > system would crash after some time with "Out of memory and no > > killable process". Basically, something in kernel was allocating > > the whole memory, then oom mass killed everybody and crash. [...] > Look at the output of slabtop (should be installed by default, procfs > package), before rsync for comparison, and during. Hi, so, no crash this time, but at the end of the rsync, there's a whole chunk of memory that is no longer available to processes (about 3.5G). As suggested by Carey, I watched /proc/slabinfo during the rsync process (see below for a report of the most significant ones over time). Does that mean that if the system had had less than 3G of RAM, it would have crashed? I tried to reclaim the space without success. I had a process allocate as much memory as it could. Then I unmounted the btrfs fs that rsync was copying onto (the one on LUKS). btrfs_inode_cache hardly changed. Then I tried to unmount the source (the one on 3 hard disks and plenty of subvolumes). umount hung. The FS disappeared from /proc/mounts. Here is the backtrace: [169270.268005] umount D 880145ebe770 0 24079 1290 0x0004 [169270.268005] 880145ebe770 0086 8160b020 [169270.268005] 00012840 880123bc7fd8 880123bc7fd8 00012840 [169270.268005] 880145ebe770 880123bc6010 7fffac84f4a8 0001 [169270.268005] Call Trace: [169270.268005] [] ? rwsem_down_failed_common+0xda/0x10e [169270.268005] [] ? call_rwsem_down_write_failed+0x13/0x20 [169270.268005] [] ? down_write+0x25/0x27 [169270.268005] [] ? deactivate_super+0x30/0x3d [169270.268005] [] ? sys_umount+0x2ea/0x315 [169270.268005] [] ? system_call_fastpath+0x16/0x1b iostat shows nothing being written to the drives. 
extent_map delayed_node btrfs_inode_cache btrfs_free_space_cache (in bytes) 09:40 131560 0 0128 09:50 131560 0 2000128 rsync started at 9:52 10:00 15832608 87963264 146583 325440 10:10 386056 85071168 1444656000 832960 10:20 237600 33988032 15497690001355584 10:30 23300288 145209600 4872740002237056 10:40 24667104 148610016 5064920002304448 10:50 22479776 139655808 5118930002382464 11:004137672104169645920002425344 11:1054985923211776 127420002443648 11:204567904140889695990002452608 11:302276736130982446850002453696 11:402225696 42192019870002455424 11:501971552 90432 3830002466176 12:001761672 74016 3270002469760 12:101939608 85824 4010002473536 12:202136288 121824 5510002479168 12:302367288 135648 6190002486016 12:401380984 181152 9110002485696 12:501053272 20217610270002483712 13:001938200 21974411520002491712 13:102037112 22377612040002494528 13:201775664 24912012440002497216 13:301704560 36604817320002500608 13:401433344 4688642462501824 13:508553248 20505888 673320002503168 14:00 12682208 34351200 1419680002494208 14:10 18488800 50282784 1778030002500544 14:20 19435592 46767744 1635820002505920 14:30 18734936 44863488 1565010002507200 14:40 21865184 46053504 1601850002484928 14:50 24457664 46473120 1624460002499200 15:00 24401344 47700576 1667230002502784 15:10 31390304 63426240 2211790002521472 15:20 34594560 61365600 2142430002524160 15:30 33836704 60934752 2126950002526400 15:40 33358776 60598944 2114550002528320 15:50 34909952 62583840 2184920002526272 16:00 44326656 65875392 2301230002529792 16:10 45840608 66373632 2321140002532736 16:20 47848064 66577536 2320480002535872 16:30 48013152 6160 2406510002536128 16:40 47594184 67766976 2362410002536576 16:50 48144184 67739904 236122542144 17:00 48005848 67639392 2352980002544000 17:10 48253920 67661280 2353760002537216 17:20 48857952 67612032 2349950002536000 17:30 48514752 67611168 2349860002535488 17:40 48436872 67609728 2349240002534528 17:50 48902216 67765248 2356540002542400 18:00 49055160 67763520 
236022542912 18:10 48749712 67727520 235742550464 18:20 48631088 67705344 2355570002553280 18:30 49101096 6344 235713000220 18:40 48609264 67782816 2356010002558912 18:50 48480080 67808160 235595000256179
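For reference, a table like the one above can be collected with a small sampler over /proc/slabinfo. This is only a sketch (the `slab_bytes` helper is mine, and it assumes the slabinfo 2.1 field layout, where fields 3 and 4 are num_objs and objsize):

```shell
#!/bin/sh
# slab_bytes: from slabinfo-2.1-style input on stdin, print the total
# size in bytes (num_objs * objsize) of each cache named as an argument.
slab_bytes() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        { for (i = 1; i <= n; i++)
              if ($1 == w[i]) printf "%s=%d\n", $1, $3 * $4 }'
}

# A sampler along the lines of what produced the table above might be:
#   while sleep 600; do
#       date +%Y-%m-%d_%H:%M
#       slab_bytes extent_map delayed_node btrfs_inode_cache \
#           btrfs_free_space_cache < /proc/slabinfo
#   done >> slab.log
printf 'extent_map 100 128 112 36 1 : tunables 0 0 0 : slabdata 4 4 0\n' |
    slab_bytes extent_map
# → extent_map=14336
```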
Memory leak?
Hiya,

I've got a server using btrfs to implement a backup system. Basically,
every night, for a few machines, I rsync (and other methods) their file
systems into one btrfs subvolume each and then snapshot it. On that
server, the btrfs fs is on 3 3TB drives, mounted with compress-force.
Every week, I rsync the most recent snapshot of a selection of
subvolumes onto an encrypted (LUKS) external hard drive (3TB as well).

Now, on a few occasions (actually, most of the time), when I rsynced
the data (about 2.5TB) onto the external drive, the system would crash
after some time with "Out of memory and no killable process".
Basically, something in the kernel was allocating the whole memory,
then the OOM killer mass-killed everybody and the system crashed. That
was with Ubuntu's 2.6.38. I had then moved to Debian and 2.6.39 and
thought the problem was fixed, but it just happened again with 3.0.0rc5
while rsyncing onto an initially empty btrfs fs.

I'm going to resume the rsync again, and it's likely to happen again.
Is there anything simple (as I've got very little time to look into
that) I could do to help debug the issue? (I'm not 100% sure it's
btrfs's fault, but that's the most likely culprit.) For a start, I'll
switch the console to serial and watch /proc/vmstat. Anything else I
could do?

Note that that server has never crashed when doing a lot of IO at the
same time in a lot of subvolumes with remote hosts. It's only when
copying data to that external drive on LUKS that it seems to crash.

Cheers,
Stephane
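The nightly scheme described above can be sketched as a short script. The hostnames and the /backup path are invented for illustration, and the `do_backup` helper is mine; with RUN=echo it is a dry run that only prints the commands instead of executing them:

```shell
#!/bin/sh
# do_backup: rsync each machine into its own subvolume, then take a
# read-only, date-stamped snapshot of it. With RUN=echo the commands are
# only printed; unset RUN to execute for real (needs root + btrfs-progs).
do_backup() { # usage: do_backup <pool> <host>...
    pool=$1; shift
    for host in "$@"; do
        $RUN btrfs subvolume create "$pool/$host"   # first run only
        $RUN rsync -aHAX --delete "$host:/" "$pool/$host/"
        $RUN btrfs subvolume snapshot -r "$pool/$host" \
            "$pool/snapshots/$host-$(date +%F)"
    done
}

RUN=echo                       # dry run
do_backup /backup alpha beta   # alpha/beta are hypothetical machines
```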
Re: [3.0.0rc5] invalid opcode
2011-07-01 15:07:52 -0400, Josef Bacik: > On 07/01/2011 01:44 PM, Stephane Chazelas wrote: [...] > > [ 8203.192146] kernel BUG at > > /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! > > [ 8203.192210] invalid opcode: [#1] SMP [...] > > [ 8203.193219] Process btrfs-fixup-0 (pid: 747, threadinfo > > 8801483fe000, task 88014672efa0) [...] > > [ 8203.193538] [] ? worker_loop+0x186/0x4a1 [btrfs] > > [ 8203.193579] [] ? schedule+0x5ed/0x61a > > [ 8203.193624] [] ? btrfs_queue_worker+0x24a/0x24a > > [btrfs] > > [ 8203.193673] [] ? btrfs_queue_worker+0x24a/0x24a > > [btrfs] > > [ 8203.193714] [] ? kthread+0x7a/0x82 > > [ 8203.193750] [] ? kernel_thread_helper+0x4/0x10 > > [ 8203.193788] [] ? kthread_worker_fn+0x147/0x147 > > [ 8203.193825] [] ? gs_change+0x13/0x13 > > [ 8203.193859] Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 > > fb ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff > > <0f> 0b eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48 > > [ 8203.194087] RIP [] > > btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] > > [ 8203.194160] RSP > > [ 8203.194907] ---[ end trace 9744d33381de3d04 ]--- > > > > Should I be worried? > > > > A little, can you reproduce it? Thanks, [...] Well no. I have no idea what triggered it. It was in the middle of doing an rsync from one compressed btrfs fs on 3 HDs (1 partition and 2 HDs) onto a btrfs FS on one LUKS device (compressed as well). That rsync is still happily ongoing and the server looks fine. Nothing else happening. What's btrfs-fixup meant to do? Cheers, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[3.0.0rc5] invalid opcode
Hi, I just got one of those: [ 8203.192107] [ cut here ] [ 8203.192146] kernel BUG at /media/data/mattems/src/linux-2.6-3.0.0~rc5/debian/build/source_amd64_none/fs/btrfs/inode.c:1583! [ 8203.192210] invalid opcode: [#1] SMP [ 8203.192246] CPU 1 [ 8203.192256] Modules linked in: sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt fuse snd_pcm psmouse tpm_tis tpm i2c_i801 snd_timer snd soundcore snd_page_alloc i3200_edac tpm_bios serio_raw evdev pcspkr processor button thermal_sys i2c_core container edac_core sg sr_mod cdrom ext4 mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c ums_cypress usb_storage sd_mod crc_t10dif uas uhci_hcd ahci libahci libata ehci_hcd e1000e scsi_mod usbcore [last unloaded: scsi_wait_scan] [ 8203.192603] [ 8203.192630] Pid: 747, comm: btrfs-fixup-0 Not tainted 3.0.0-rc5-amd64 #1 empty empty/Tyan Tank GT20 B5211 [ 8203.192697] RIP: 0010:[] [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] [ 8203.192781] RSP: 0018:8801483ffde0 EFLAGS: 00010246 [ 8203.192816] RAX: RBX: ea000496a430 RCX: [ 8203.192855] RDX: RSI: 06849000 RDI: 880071c1fcb8 [ 8203.192893] RBP: 06849000 R08: 0008 R09: 8801483ffd98 [ 8203.192932] R10: dead00200200 R11: dead00100100 R12: 880071c1fd90 [ 8203.192971] R13: R14: 8801483ffdf8 R15: 06849fff [ 8203.193010] FS: () GS:88014fd0() knlGS: [ 8203.193067] CS: 0010 DS: ES: CR0: 8005003b [ 8203.193103] CR2: f7596000 CR3: 00013def9000 CR4: 06e0 [ 8203.193141] DR0: DR1: DR2: [ 8203.193180] DR3: DR6: 0ff0 DR7: 0400 [ 8203.193219] Process btrfs-fixup-0 (pid: 747, threadinfo 8801483fe000, task 88014672efa0) [ 8203.193277] Stack: [ 8203.193304] 880071c1fc28 8800c70165c0 88011e61ca28 [ 8203.193371] 880146ef41c0 880146ef4210 880146ef41d8 [ 8203.193434] 880146ef41c8 880146ef4200 880146ef41e8 a01669fa [ 8203.193497] Call Trace: [ 8203.193538] [] ? worker_loop+0x186/0x4a1 [btrfs] [ 8203.193579] [] ? schedule+0x5ed/0x61a [ 8203.193624] [] ? btrfs_queue_worker+0x24a/0x24a [btrfs] [ 8203.193673] [] ? 
btrfs_queue_worker+0x24a/0x24a [btrfs] [ 8203.193714] [] ? kthread+0x7a/0x82 [ 8203.193750] [] ? kernel_thread_helper+0x4/0x10 [ 8203.193788] [] ? kthread_worker_fn+0x147/0x147 [ 8203.193825] [] ? gs_change+0x13/0x13 [ 8203.193859] Code: 41 b8 50 00 00 00 4c 89 f1 e8 d5 3b 01 00 48 89 df e8 fb ac f6 e0 ba 01 00 00 00 4c 89 ee 4c 89 e7 e8 ce 05 01 00 e9 4e ff ff ff <0f> 0b eb fe 48 8b 3c 24 41 b8 50 00 00 00 4c 89 f1 4c 89 fa 48 [ 8203.194087] RIP [] btrfs_writepage_fixup_worker+0xdb/0x120 [btrfs] [ 8203.194160] RSP [ 8203.194907] ---[ end trace 9744d33381de3d04 ]--- Should I be worried? Cheers, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Re: [btrfs-progs integration] incorrect argument checking for "btrfs sub snap -r"
2011-07-01 11:42:23 +0100, Hugo Mills:
[...]
> > > diff --git a/btrfs.c b/btrfs.c
> > > index e117172..b50c58a 100644
> > > --- a/btrfs.c
> > > +++ b/btrfs.c
> > > @@ -49,7 +49,7 @@ static struct Command commands[] = {
> > >   /*
> > >      avoid short commands different for the case only
> > >   */
> > > - { do_clone, 2,
> > > + { do_clone, -2,
> > >   "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
> > >     "Create a writable/readonly snapshot of the subvolume <source> with\n"
> > >     "the name <name> in the <dest> directory.",
> > > diff --git a/btrfs_cmds.c b/btrfs_cmds.c
> > > index 1d18c59..3415afc 100644
> > > --- a/btrfs_cmds.c
> > > +++ b/btrfs_cmds.c
> > > @@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
> > >           return 1;
> > >       }
> > >   }
> > > - if (argc - optind < 2) {
> > > + if (argc - optind != 2) {
> > >     fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
> > >     free(argv);
> > >     return 1;
> > >
> > Thanks for having another look at this. You are perfectly right. Should
> > we patch my patch or should I rework a corrected version? What do you
> > think Hugo?
>
> Could you send a follow-up patch with just the second hunk, please?
> I screwed up the process with this (processing patches too quickly to
> catch the review), and I've already published the patch with the first
> hunk, above, into the for-chris branch.

Hugo, not sure what you mean nor whom you're talking to, but I can
certainly copy-paste the second hunk from above here:

diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 1d18c59..3415afc 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
          return 1;
      }
  }
- if (argc - optind < 2) {
+ if (argc - optind != 2) {
    fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
    free(argv);
    return 1;

Cheers,
Stephane
[PATCH] Re: [btrfs-progs integration] incorrect argument checking for "btrfs sub snap -r"
2011-06-30 22:55:15 +0200, Andreas Philipp:
> On 30.06.2011 14:34, Stephane Chazelas wrote:
> > Looks like this was missing in integration-20110626 for the
> > readonly snapshot patch:
> >
> > diff --git a/btrfs.c b/btrfs.c
> > index e117172..be6ece5 100644
> > --- a/btrfs.c
> > +++ b/btrfs.c
> > @@ -49,7 +49,7 @@ static struct Command commands[] = {
> >   /*
> >      avoid short commands different for the case only
> >   */
> > - { do_clone, 2,
> > + { do_clone, -1,
> >   "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
> >     "Create a writable/readonly snapshot of the subvolume <source> with\n"
> >     "the name <name> in the <dest> directory.",
> >
> > Without that, "btrfs sub snap -r x y" would fail as it's not *2*
> > arguments.
> Unfortunately, this is not correct either. "-1" means that the minimum
> number of arguments is 1 and since we need at least <source> and <name>
> this is 2. So the correct version should be -2.
[...]

Sorry, without looking closely at the source, I assumed -1 meant "defer
the checking to the subcommand handler". do_clone will indeed return an
error if the number of arguments is less than expected (so with -2,
you'll get a different error message for "btrfs sub snap -r foo" than
for "btrfs sub snap foo"), but, by the way, it will not if there are
more arguments than expected.
So the patch should probably be:

diff --git a/btrfs.c b/btrfs.c
index e117172..b50c58a 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -49,7 +49,7 @@ static struct Command commands[] = {
  /*
     avoid short commands different for the case only
  */
- { do_clone, 2,
+ { do_clone, -2,
  "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
    "Create a writable/readonly snapshot of the subvolume <source> with\n"
    "the name <name> in the <dest> directory.",
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 1d18c59..3415afc 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -355,7 +355,7 @@ int do_clone(int argc, char **argv)
          return 1;
      }
  }
- if (argc - optind < 2) {
+ if (argc - optind != 2) {
    fprintf(stderr, "Invalid arguments for subvolume snapshot\n");
    free(argv);
    return 1;

Cheers,
Stephane
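The logic of the corrected check is easy to see outside C as well. Here is the same "exactly two operands after option parsing" rule sketched with POSIX sh getopts (`snap_args_ok` is an illustrative name, not part of btrfs-progs):

```shell
#!/bin/sh
# snap_args_ok: mimic do_clone's fixed argument check. After getopts has
# consumed an optional -r, exactly two operands must remain; this is the
# shell counterpart of "argc - optind != 2". The old "< 2" test would
# have silently accepted extra operands.
snap_args_ok() {
    OPTIND=1
    while getopts r opt "$@"; do
        case $opt in
        r) : ;;          # read-only flag, ignored here
        *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    [ "$#" -eq 2 ]
}

snap_args_ok -r src dst && echo accepted     # → accepted
snap_args_ok -r src     || echo too few      # → too few
snap_args_ok src dst extra || echo too many  # → too many
```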
[PATCH] [btrfs-progs integration] incorrect argument checking for "btrfs sub snap -r"
Looks like this was missing in integration-20110626 for the readonly
snapshot patch:

diff --git a/btrfs.c b/btrfs.c
index e117172..be6ece5 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -49,7 +49,7 @@ static struct Command commands[] = {
  /*
     avoid short commands different for the case only
  */
- { do_clone, 2,
+ { do_clone, -1,
  "subvolume snapshot", "[-r] <source> [<dest>/]<name>\n"
    "Create a writable/readonly snapshot of the subvolume <source> with\n"
    "the name <name> in the <dest> directory.",

Without that, "btrfs sub snap -r x y" would fail as it's not *2*
arguments.

-- 
Stephane
Re: subvolumes missing from "btrfs subvolume list" output
2011-06-30 11:18:42 +0200, Andreas Philipp:
[...]
> >> After that, I posted a patch to fix btrfs-progs, which Chris
> >> aggreed on:
> >>
> >> http://marc.info/?l=linux-btrfs&m=129238454714319&w=2
> [...]
> >
> > Great. Thanks a lot
> >
> > It fixes my problem indeed.
> >
> > Which brings me to my next question: where to find the latest
> > btrfs-progs if not at
> > git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
[...]
> Hugo Mills keeps an integration branch with nearly all patches to
> btrfs-progs applied. See
>
> http://www.spinics.net/lists/linux-btrfs/msg10594.html
>
> and for the last update
>
> http://www.spinics.net/lists/linux-btrfs/msg10890.html
[...]

Thanks. It might be worth adding a link to that to
https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories

Note that it (integration-20110626) doesn't seem to include the fix in
http://marc.info/?l=linux-btrfs&m=129238454714319&w=2 though.

-- 
Stephane
Re: subvolumes missing from "btrfs subvolume list" output
2011-06-30 08:47:38 +0800, Li Zefan:
> Stephane Chazelas wrote:
> > 2011-06-29 15:37:47 +0100, Stephane Chazelas:
> > [...]
> >> I found
> >> http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208
> >>
> >> which looks like the same issue, with Li Zefan saying he had a
> >> fix, but I couldn't find any mention that it was actually fixed.
> >>
> >> Has anybody got any update on that?
> > [...]
> >
> > I've found
> > http://thread.gmane.org/gmane.comp.file-systems.btrfs/8232
>
> After that, I posted a patch to fix btrfs-progs, which Chris aggreed on:
>
> http://marc.info/?l=linux-btrfs&m=129238454714319&w=2
[...]

Great. Thanks a lot. It fixes my problem indeed.

Which brings me to my next question: where to find the latest
btrfs-progs if not at
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
(that one hasn't been modified in 8 months)? Have changes for read-only
subvolumes, per-directory settings... been applied to some publicly
available repository?

Cheers,
Stephane
Re: subvolumes missing from "btrfs subvolume list" output
2011-06-29 15:37:47 +0100, Stephane Chazelas:
[...]
> I found
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208
>
> which looks like the same issue, with Li Zefan saying he had a
> fix, but I couldn't find any mention that it was actually fixed.
>
> Has anybody got any update on that?
[...]

I've found
http://thread.gmane.org/gmane.comp.file-systems.btrfs/8232
but no corresponding fix in ioctl.c's history:
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=history;f=fs/btrfs/ioctl.c

I'm under the impression that the issue has been forgotten about. From
what I managed to gather though, it seems that what's on disk is
correct; it's just the ioctl and/or "btrfs sub list" that's wrong. Am I
right?

(BTW, I forgot to mention the kernel version: 3.0rc4 amd64, btrfs tools
from git.)

Cheers,
Stephane
subvolumes missing from "btrfs subvolume list" output
Hiya,

I've got a btrfs FS with 84 subvolumes in it (some created with "btrfs
sub create", some with "btrfs sub snap" of the other ones). There's no
nesting of subvolumes at all (they are all direct children of the root
subvolume). "btrfs subvolume list" is only showing 80 subvolumes. The 4
missing ones (1 original volume, 3 snapshots) do exist on disk, and
files in there have different st_devs from any other subvolume.

How would I start investigating what's wrong? And how would I fix it?

I found
http://thread.gmane.org/gmane.comp.file-systems.btrfs/8123/focus=8208
which looks like the same issue, with Li Zefan saying he had a fix, but
I couldn't find any mention that it was actually fixed.

Has anybody got any update on that?

Thanks in advance,
Stephane
Re: different st_dev's in one subvolume
2011-06-02 01:39:41 +0100, Stephane Chazelas: [...] > /mnt/1# zstat +device ./**/* > . 25 > A 26 > A/B 27 > A/B/inB 27 > A/inA 26 > A.snap 28 > A.snap/B 23 > A.snap/inA 28 > > Why does A.snap/B have a different st_dev from A.snap's? [...] > If I create another snap of A or A.snap, the "B" in there gets > the same st_dev (23). [...] And same inode, ctime, mtime, atime... And when I create a new snapshot, all those (regardless of where they are) have their times updated at once. I also noticed the st_nlink is always one but then came across http://thread.gmane.org/gmane.comp.file-systems.btrfs/4580 -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
different st_dev's in one subvolume
Hiya, please consider this: ~# truncate -s1G ./a ~# mkfs.btrfs ./a ~# sudo mount -o loop ./a /mnt/1 ~# cd /mnt/1 /mnt/1# ls /mnt/1# btrfs sub c A Create subvolume './A' /mnt/1# btrfs sub c A/B Create subvolume 'A/B' /mnt/1# touch A/inA A/B/inB /mnt/1# btrfs sub snap A A.snap Create a snapshot of 'A' in './A.snap' /mnt/1# zmodload zsh/stat /mnt/1# zstat +device ./**/* . 25 A 26 A/B 27 A/B/inB 27 A/inA 26 A.snap 28 A.snap/B 23 A.snap/inA 28 Why does A.snap/B have a different st_dev from A.snap's? Also: /mnt/1# touch A.snap/B/foo touch: cannot touch `A.snap/B/foo': Permission denied I can rmdir that directory OK though. Also note that the permissions are different: /mnt/1# ll A total 0 drwx-- 1 root root 6 Jun 2 00:54 B/ -rw-r--r-- 1 root root 0 Jun 2 00:54 inA /mnt/1# ll A.snap total 0 drwxr-xr-x 1 root root 0 Jun 2 01:29 B/ -rw-r--r-- 1 root root 0 Jun 2 00:54 inA If I create another snap of A or A.snap, the "B" in there gets the same st_dev (23). /mnt/1# btrfs sub create A.snap/B/C Create subvolume 'A.snap/B/C' ERROR: cannot create subvolume # btrfs sub snap A.snap/B B.snap ERROR: 'A.snap/B' is not a subvolume -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
2011-05-27 13:49:52 +0200, Andreas Philipp: [...] > > Thanks, I can understand that. What I don't get is how one creates > > a subvol with a top-level other than 5. I might be missing the > > obvious, though. > > > > If I do: > > > > btrfs sub create A btrfs sub create A/B btrfs sub snap A A/B/C > > > > A, A/B, A/B/C have their top-level being 5. How would I get a new > > snapshot to be a child of A/B for instance? > > > > In my case, 285, was not appearing in the btrfs sub list output, > > 287 was a child of 285 with path "data" while all I did was create > > a snapshot of 284 (path u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol > > 5) in u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 > > > > So I did manage to get a volume with a parent other than 5, but I > > did not ask for it. [...] > Reconsidering the explanations on btrfs subvolume list in this thread > I get the impression that a line in the output of btrfs subvolume list > with top level other than 5 indicates that the backrefs from one > subvolume to its parent are broken. > > What's your opinion on this? [...] Given that I don't really get what the parent-child relationship means in that context, I can't really comment. In effect, the snapshot had been created and was attached to the right directory (but didn't appear in the sub list), and there was an additional "data" volume that I had not asked for nor created that had the snapshot above as parent and that did appear in the sub list. It pretty much looks like a bug to me, I'd like to understand more so that I can maybe try and avoid running into it again. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
2011-05-27 10:45:23 +0100, Hugo Mills: [...] > > How could a "subvolume 285" become a "top level"? > > > How does one get a subvolume with a top-level other than "5"? > >This just means that subvolume 287 was created (somewhere) inside > subvolume 285. > >Due to the way that the FS trees and subvolumes work, there's no > global namespace structure in btrfs; that is, there's no single data > structure that represents the entirety of the file/directory hierarchy > in the filesystem. Instead, it's broken up into these sub-namespaces > called subvolumes, and we only record parent/child relationships for > each subvolume separately. The "full path" you get from "btrfs subv > list" is reconstructed from that information in userspace(*). [...] Thanks, I can understand that. What I don't get is how one creates a subvol with a top-level other than 5. I might be missing the obvious, though. If I do: btrfs sub create A btrfs sub create A/B btrfs sub snap A A/B/C A, A/B, A/B/C have their top-level being 5. How would I get a new snapshot to be a child of A/B for instance? In my case, 285, was not appearing in the btrfs sub list output, 287 was a child of 285 with path "data" while all I did was create a snapshot of 284 (path u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol 5) in u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 So I did manage to get a volume with a parent other than 5, but I did not ask for it. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
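[Editorial note: Hugo's description of per-subvolume backrefs can be sketched as a toy path reconstruction. Each subvolume stores only (parent tree ID, name within the parent), and userspace walks those records up to the FS tree (ID 5) to rebuild the "full path". The records below are hypothetical, reusing the 285/287 IDs from the thread, with directory components collapsed into the name for brevity.]

```python
#!/usr/bin/env python3
# Toy model of the userspace path reconstruction "btrfs subvolume list"
# performs: there is no global namespace tree, only per-subvolume
# (parent_id, name_in_parent) backrefs that are walked up to the FS
# tree (ID 5).
FS_TREE = 5

def full_path(subvols, subvol_id):
    """Rebuild a subvolume's full path from backref records."""
    parts = []
    while subvol_id != FS_TREE:
        parent, name = subvols[subvol_id]
        parts.append(name)
        subvol_id = parent
    return "/".join(reversed(parts))

# Hypothetical records mirroring the 285/287 case in the thread:
subvols = {
    285: (FS_TREE, "snapshots/2011-03-30"),
    287: (285, "data"),
}
print(full_path(subvols, 287))   # -> snapshots/2011-03-30/data
```

This also shows why a broken or missing backref for 285 would make both 285 and the reconstructed path of 287 disappear or look wrong in the listing, while the on-disk directory stays reachable.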
Re: strange btrfs sub list output
2011-05-27 10:12:24 +0100, Hugo Mills: [skipped useful clarification] > >That's all rather dense, and probably too much information. Hope > it's helpful, though. [...] It is, thanks. How would one end up in a situation where the output of "btrfs sub list ." has: ID 287 top level 285 path data How could a "subvolume 285" become a "top level"? How does one get a subvolume with a top-level other than "5"? -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
Is there a way to derive the subvolume ID from the stat(2) st_dev, by the way? # btrfs sub list . ID 256 top level 5 path a ID 257 top level 5 path b # zstat +dev . a b . 27 a 28 b 29 Are the dev numbers allocated in the same order as the subvolids? Would there be any /sys, /proc, ioctl interface to get this kind of information? -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
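[Editorial note: as far as I know there is no st_dev-to-subvolume-ID interface, but the reverse question (path to containing subvolume ID) is answerable through the BTRFS_IOC_INO_LOOKUP ioctl, the same mechanism btrfs-progs uses internally. A hedged sketch follows: the ioctl number is derived here from the kernel's _IOWR macro for a 4096-byte argument struct, and actually invoking it needs a path on a mounted btrfs (and, on older kernels, root).]

```python
#!/usr/bin/env python3
# Sketch: find the ID of the subvolume containing a path via
# BTRFS_IOC_INO_LOOKUP.  With treeid=0 and objectid=256 (the first free
# objectid, i.e. a subvolume's root directory), the kernel fills in the
# tree ID of the subvolume the open fd lives in.
import fcntl
import os
import struct

BTRFS_FIRST_FREE_OBJECTID = 256
# _IOWR(0x94, 18, struct btrfs_ioctl_ino_lookup_args):
#   dir (read|write) = 3 << 30, size 4096 << 16, type 0x94 << 8, nr 18
BTRFS_IOC_INO_LOOKUP = (3 << 30) | (4096 << 16) | (0x94 << 8) | 18

def subvol_id(path):
    """Return the tree (subvolume) ID containing `path` (btrfs only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # struct btrfs_ioctl_ino_lookup_args {
        #     __u64 treeid; __u64 objectid; char name[4080]; }
        args = bytearray(struct.pack("=QQ4080x", 0, BTRFS_FIRST_FREE_OBJECTID))
        fcntl.ioctl(fd, BTRFS_IOC_INO_LOOKUP, args)
        treeid = struct.unpack_from("=QQ", args)[0]
        return treeid
    finally:
        os.close(fd)

if __name__ == "__main__":
    import sys
    print(subvol_id(sys.argv[1] if len(sys.argv) > 1 else "."))
```

The encoding is consistent with the strace output later in this archive, where BTRFS_IOC_SNAP_DESTROY shows up as 0x5000940f (_IOW, same 0x94 type, nr 15).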
Re: strange btrfs sub list output
2011-05-27 10:21:03 +0200, Andreas Philipp: [...] > > What do those top-level IDs mean by the way? > The top-level ID associated with a subvolume is NOT the ID of this > particular subvolume but of the subvolume containing it. Since the > "root/initial" (sub-)volume always has ID 0, the subvolumes of "depth" > 1 will all have top-level ID set to 0. You need those top-level IDs to > correctly mount a specific subvolume by name. > > # mount /dev/dummy -o subvol=<name>,subvolrootid=<top-level-ID> /mountpoint > > Of course, you do not need them if you specify the subvolume to mount by > its ID. [...] Thanks Andreas for pointing out that subvolrootid (might be worth adding it to https://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options BTW). In my case, on a freshly made btrfs file system, subvolumes have top-level 5 (and neither volume with id 0 nor 5 appears in the btrfs sub list). All the top-levels are 5, and I don't even know how to create a subvolume with a different top-level there, so I wonder how that subvol that I had created with btrfs sub snap data snapshots/2011-03-30 ended up being a subvolume with ID 285 that doesn't appear in the "btrfs sub list" and contains a subvolume of "path" "data" in there (with its top-level being 285). All the other subvolumes and snapshots I've created in the exact same way are created with a top-level 5, have an entry in "btrfs sub list" and don't have subvolumes of their own. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange btrfs sub list output
2011-05-26 22:22:03 +0100, Stephane Chazelas: [...] > I get a btrfs sub list output that I don't understand: > > # btrfs sub list /backup/ > ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data > ID 260 top level 5 path u2/linux/lvm/linux/var/data > ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11 > ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07 > ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07 > ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07 > ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26 > ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08 > ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22 > ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15 > ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14 > ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14 > ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14 > ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29 > ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26 > ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07 > ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01 > ID 277 top level 5 path u2/linux/lvm/linux/home/data > ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27 > ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27 > ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27 > ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data > ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19 > ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data > ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data > ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24 > ID 287 
top level 285 path data > ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data > ID 289 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11 > ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data > ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11 > ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11 [...] > There is no "/backup/data" directory. There is however a > /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that > contains the same thing as what I get if I mount the fs with > subvolid=287. And I did do a btrfs sub snap data > snapshots/2011-03/30 there. > > What could be the cause of that? How to fix it? > > In case that matters, there used to be more components in the > path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data. [...] I tried deleting the /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 subvolume (what seems to be id 287) and I get: # btrfs sub delete snapshots/2011-03-30 Delete subvolume '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30' ERROR: cannot delete '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30' With a strace, it tells me: ioctl(3, 0x5000940f, 0x7fffc7841a80)= -1 ENOTEMPTY (Directory not empty) Then I realised that there was a "data" directory in there and that snapshots/2011-03-30 was actually id 285 (which doesn't appear in the btrfs sub list) and "snapshots/2011-03-30/data" is id 287. What do those top-level IDs mean by the way? Then I was able to delete snapshots/2011-03-30/data, but snapshots/2011-03-30 still didn't appear in the list. Then I was able to delete snapshots/2011-03-30 and recreate it, and this time it was fine. Still don't know what happened there. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
strange btrfs sub list output
Hiya, I get a btrfs sub list output that I don't understand: # btrfs sub list /backup/ ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data ID 260 top level 5 path u2/linux/lvm/linux/var/data ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11 ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07 ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07 ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07 ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26 ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08 ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22 ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15 ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14 ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14 ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14 ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29 ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26 ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07 ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01 ID 277 top level 5 path u2/linux/lvm/linux/home/data ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27 ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27 ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27 ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19 ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24 ID 287 top level 285 path data ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data ID 289 top level 5 path 
u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11 ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11 ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11 See ID 287 above. There is no "/backup/data" directory. There is however a /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that contains the same thing as what I get if I mount the fs with subvolid=287. And I did do a btrfs sub snap data snapshots/2011-03/30 there. What could be the cause of that? How to fix it? In case that matters, there used to be more components in the path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data. Thanks, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: curious writes on mounted, not used btrfs filesystem
2011-05-22 11:52:37 +0200, Tomasz Chmielewski: [...] > Can you try running these commands yourself: > > iostat -k 1 > > > And in a second terminal: > > while true; do sync ; done > > > To see if your btrfs makes writes on sync each time? [...] Yes it does. And I see: <7>[38554.244219] sync(3354): dirtied inode 1 (?) on dm-11 <7>[38554.244740] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors) <7>[38554.244774] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors) <7>[38554.249963] sync(3354): WRITE block 128 on dm-11 (8 sectors) <7>[38554.250010] sync(3354): WRITE block 131072 on dm-11 (8 sectors) <7>[38567.312908] sync(3356): dirtied inode 1 (?) on dm-11 <7>[38567.313330] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors) <7>[38567.313350] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors) <7>[38567.313358] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors) <7>[38567.313366] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors) <7>[38567.313393] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors) <7>[38567.313403] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors) <7>[38567.325194] sync(3356): WRITE block 128 on dm-11 (8 sectors) <7>[38567.325244] sync(3356): WRITE block 131072 on dm-11 (8 sectors) <7>[38570.374449] sync(3358): dirtied inode 1 (?) on dm-11 <7>[38570.374976] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors) <7>[38570.375011] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors) <7>[38570.379221] sync(3358): WRITE block 128 on dm-11 (8 sectors) <7>[38570.379272] sync(3358): WRITE block 131072 on dm-11 (8 sectors) <7>[38572.170816] sync(3359): dirtied inode 1 (?) 
on dm-11 <7>[38572.171289] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors) <7>[38572.171300] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors) <7>[38572.171304] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors) <7>[38572.171308] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors) <7>[38572.171320] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors) <7>[38572.171325] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors) <7>[38572.180338] sync(3359): WRITE block 128 on dm-11 (8 sectors) <7>[38572.180386] sync(3359): WRITE block 131072 on dm-11 (8 sectors) <7>[38574.186559] sync(3360): dirtied inode 1 (?) on dm-11 <7>[38574.187090] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors) <7>[38574.187125] btrfs-submit-0(29134): WRITE block 741832 on dm-11 (56 sectors) <7>[38574.191602] sync(3360): WRITE block 128 on dm-11 (8 sectors) <7>[38574.191654] sync(3360): WRITE block 131072 on dm-11 (8 sectors) <7>[38576.370003] sync(3361): dirtied inode 1 (?) on dm-11 <7>[38576.370452] btrfs-submit-0(29134): WRITE block 301256 on dm-11 (24 sectors) <7>[38576.370470] btrfs-submit-0(29134): WRITE block 741960 on dm-11 (24 sectors) <7>[38576.370478] btrfs-submit-0(29134): WRITE block 301288 on dm-11 (24 sectors) <7>[38576.370485] btrfs-submit-0(29134): WRITE block 741992 on dm-11 (24 sectors) <7>[38576.370513] btrfs-submit-0(29134): WRITE block 301312 on dm-11 (8 sectors) <7>[38576.370523] btrfs-submit-0(29134): WRITE block 742016 on dm-11 (8 sectors) <7>[38576.379718] sync(3361): WRITE block 128 on dm-11 (8 sectors) <7>[38576.379766] sync(3361): WRITE block 131072 on dm-11 (8 sectors) Every other sync the same. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
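[Editorial note: a block_dump trace like the one quoted above can be aggregated per process with a short script. The sketch below is mine, not from the thread; it assumes 512-byte sectors and the exact "name(pid): WRITE block N on dev (S sectors)" message format shown.]

```python
#!/usr/bin/env python3
# Sketch: sum bytes written per (process, device) from
# /proc/sys/vm/block_dump style kernel messages, assuming 512-byte
# sectors and the message format seen in the quoted dmesg output.
import re
from collections import defaultdict

LINE = re.compile(r'(\S+)\((\d+)\): WRITE block \d+ on (\S+) \((\d+) sectors\)')

def written_per_process(lines):
    totals = defaultdict(int)
    for line in lines:
        m = LINE.search(line)
        if m:
            name, _pid, dev, sectors = m.groups()
            totals[(name, dev)] += int(sectors) * 512
    return dict(totals)

sample = [
    "<7>[38554.244740] btrfs-submit-0(29134): WRITE block 301128 on dm-11 (56 sectors)",
    "<7>[38554.249963] sync(3354): WRITE block 128 on dm-11 (8 sectors)",
]
print(written_per_process(sample))
# -> {('btrfs-submit-0', 'dm-11'): 28672, ('sync', 'dm-11'): 4096}
```

Feeding it the full dmesg excerpt above shows at a glance that btrfs-submit-0 accounts for nearly all the bytes written on each otherwise-idle sync.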
Re: curious writes on mounted, not used btrfs filesystem
2011-05-21 14:58:21 +0200, Tomasz Chmielewski: > I have a btrfs filesystem (2.6.39) which is mounted, but otherwise, not used: > > # lsof -n|grep /mnt/btrfs processes with open fds are one thing. You could also have loop devices set up on it, for instance. > # > > > I noticed that whenever I do "sync", btrfs will write for around 6.5s and > write 13 MB (see below). [...] You could try and play with /proc/sys/vm/block_dump to see what is being written (remember to disable logging of kernel messages by syslog). -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ssd option for USB flash drive?
2011-05-19 15:54:23 -0600, cwillu: [...] > Try with the "ssd_spread" mount option. [...] Thanks. I'll try that. > > I wonder now what credit to give to recommendations like in > > http://www.patriotmemory.com/forums/showthread.php?3696-HOWTO-Increase-write-speed-by-aligning-FAT32 > > http://linux-howto-guide.blogspot.com/2009/10/increase-usb-flash-drive-write-speed.html > > > > Doing an apt-get upgrade on that stick takes hours when the same > > takes a few minutes on an internal drive. > > Also, there's a package "libeatmydata" which will provide an > "eatmydata" command, which you can prefix your apt-get commands with. > This will disable the excessive sync calls that dpkg makes, and should > dramatically decrease the time for those sorts of things to finish. > Flash as found in thumb drives doesn't have much in the way of crash > guarantees anyway, so you're not really giving up much safety. Thanks. That's very useful indeed. Note that if you use that on aptitude/apt-get that means that the daemons started/restarted in the process will be affected, but it could be all the better in my case. Now, with that eatmydata, I'm thinking of trying qemu-nbd -c /dev/nbd0 /dev/mapper/original-device with that and have the rootfs mounted on that /dev/nbd0. That eatmydata could be a workaround to the problem I was mentioning at https://lists.ubuntu.com/archives/ubuntu-server-bugs/2010-June/037846.html -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ssd option for USB flash drive?
2011-05-19 21:04:54 +0200, Hubert Kario: > On Wednesday 18 of May 2011 00:02:52 Stephane Chazelas wrote: > > Hiya, > > > > I've not found much detail on what the "ssd" btrfs mount option > > did. Would it make sense to enable it to a fs on a USB flash > > drive? > > yes, enabling discard is pointless though (no USB storage supports it AFAIK). > > > I'm using btrfs (over LVM) on a Live Linux USB stick to benefit > > from btrfs's compression and am trying to improve the > > performance. > > ssd mode won't improve performance by much (if any). > > You need to remember that USB2.0 is limited to about 20-30MiB/s (depending on > CPU) so it will be slow no matter what you do Thanks Hubert for the feedback. Well, for hard drives over USB, I can get to 40MiB/s read and write easily. Here, I believe the bottleneck is the flash memory. With that particular USB flash drive Corsair Voyager GT 16GB, I can get 25MiB/s sequential read and 17MiB/s sequential write, but that falls down to about 3-5MiB/s random write. [...] > aligning logical blocks to erase blocks can give some performance but the > only > way to make it really fast is not to use USB [...] For something that fits in your pocket and is almost universally bootable, there are not so many other options. I tried changing the alignment on FAT32 and it didn't make any difference. Playing with /proc/sys/vm/block_dump, I could see chunks of 3, 4, 5 data sectors being written at once regardless of the cluster size being used anyway. Interestingly when a user process writes to /dev/sdx, block_dump shows 4k writes to /dev/sdx only regardless of the size of the user writes while if it goes via the filesystem I can see writes of up to 120k.
Also, I've very little knowledge of what happens at layers below the block device (scsi interface, usb-storage, and the device controller itself, for instance, I see /sys/block/sdi/queue/rotational is 1 for that usb stick, why, what does that mean in terms of performance and scheduling of read-writes?) I wonder now what credit to give to recommendations like in http://www.patriotmemory.com/forums/showthread.php?3696-HOWTO-Increase-write-speed-by-aligning-FAT32 http://linux-howto-guide.blogspot.com/2009/10/increase-usb-flash-drive-write-speed.html Doing an apt-get upgrade on that stick takes hours when the same takes a few minutes on an internal drive. If I boot a kvm virtual machine on that USB stick with a disk cache mode of "unsafe", that is writes are hardly ever flushed to underlying storage, then that becomes lightning fast (at the expense of possibly losing data in case of host failure, but I'm not too worried about that), and flushing writes to device upon VM shutdown only takes a couple of minutes. So I figured that if I could make sure writing to the flash device is asynchronous (and reads privileged), that would help. There are probably some solutions with aufs or fuse, but I thought there might be a solution in btrfs or some standard core layers usually underneath it. -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ssd option for USB flash drive?
Hiya, I've not found much detail on what the "ssd" btrfs mount option did. Would it make sense to enable it for a fs on a USB flash drive? I'm using btrfs (over LVM) on a Live Linux USB stick to benefit from btrfs's compression and am trying to improve the performance. Would anybody have any recommendation on how to improve performance there? Like what would be the best way to enable/increase writeback buffer or any way to make sure writes are delayed and asynchronous? Would disabling read-ahead help? (at which level would it be done?). Any other tip (like disabling atime, aligning blocks/extents, figure out erase block sizes if relevant...)? Many thanks in advance, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: wrong values in "df" and "btrfs filesystem df"
2011-04-12 15:22:57 +0800, Miao Xie: [...] > But the algorithm of df command doesn't simulate the above allocation > correctly, this > simulated allocation just allocates the stripes from two disks, and then, > these two disks > have no free space, but the third disk still has 1.2TB free space, df command > thinks > this space can be used to make a new RAID0 block group and ignores it. This > is a bug, > I think. [...] Thanks a lot Miao for the detailed explanation. So, the disk space is not lost, it's just df not reporting the available space correctly. That's me relieved. It explains why I'm getting: # blockdev --getsize64 /dev/sda4 2967698087424 # blockdev --getsize64 /dev/sdb 3000592982016 # blockdev --getsize64 /dev/sdc 3000592982016 # truncate -s 2967698087424 a # truncate -s 3000592982016 b # truncate -s 3000592982016 c # losetup /dev/loop0 ./a # losetup /dev/loop1 ./b # losetup /dev/loop2 ./c # mkfs.btrfs a b c # btrfs device scan /dev/loop[0-2] Scanning for Btrfs filesystems in '/dev/loop0' Scanning for Btrfs filesystems in '/dev/loop1' Scanning for Btrfs filesystems in '/dev/loop2' # mount /dev/loop0 /mnt/1 # df -k /mnt/1 Filesystem 1K-blocks Used Available Use% Mounted on /dev/loop0 8758675828 56 5859474304 1% /mnt/1 # echo $(((8758675828 - 5859474304)*2**10)) 2968782360576 One disk worth of space lost according to df. While it should have been more something like $(((3000592982016-2967698087424)*2)) (about 60GB), or about 0 after the quasi-round-robin allocation patch, right? Best regards, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
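[Editorial note: the numbers above can be cross-checked directly; this sketch only restates the post's shell arithmetic in one place, with all sizes in bytes.]

```python
#!/usr/bin/env python3
# Restate the arithmetic from the df output above (all sizes in bytes).
disk_sizes = [2967698087424, 3000592982016, 3000592982016]   # sda4, sdb, sdc

blocks_1k, available_1k = 8758675828, 5859474304             # from df -k
unaccounted = (blocks_1k - available_1k) * 1024
print(unaccounted)               # 2968782360576: almost exactly one disk

# What would be genuinely unallocatable if every chunk had to stripe all
# three disks: the amount by which the two larger disks exceed the
# smaller one.
wasted_if_three_stripes = (3000592982016 - 2967698087424) * 2
print(wasted_if_three_stripes)   # 65789789184: the "about 60GB" in the post
```

So the df "Available" figure understates the free space by roughly one whole disk, while the true unallocatable remainder is at most ~61 GiB (and about zero once allocation spreads round-robin).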
Re: btrfs balancing start - and stop?
2011-04-06 12:43:50 +0100, Stephane Chazelas: [...] > The rate is going down. It's now down to about 14kB/s > > [658654.295752] btrfs: relocating block group 3919858106368 flags 20 > [671932.913235] btrfs: relocating block group 3919589670912 flags 20 > [686189.296126] btrfs: relocating block group 3919321235456 flags 20 > [701511.523990] btrfs: relocating block group 391905280 flags 20 > [718591.316339] btrfs: relocating block group 3918784364544 flags 20 > [725567.081031] btrfs: relocating block group 3918515929088 flags 20 > [744415.011581] btrfs: relocating block group 3918247493632 flags 20 > [762365.021458] btrfs: relocating block group 3917979058176 flags 20 > [780504.726067] btrfs: relocating block group 3917710622720 flags 20 [...] > At this rate, the balancing would be over in about 8 years. [...] Hurray! The btrfs balance eventually ran through after almost exactly 2 weeks. It didn't get down to 0: [1189505.152717] btrfs: found 60527 extents [1189505.440565] btrfs: relocating block group 3910731300864 flags 20 [1199805.071045] btrfs: found 60235 extents [1199805.447821] btrfs: relocating block group 3910462865408 flags 20 [1207914.737372] btrfs: found 58039 extents iostat reckons 9TB have been written to disk in the whole process (4.5TB read from them (!?)). There hasn't been any change in allocation though: # df -h /backup FilesystemSize Used Avail Use% Mounted on /dev/sda4 8.2T 3.5T 3.2T 53% /backup # btrfs fi df /backup Data, RAID0: total=3.42TB, used=3.41TB System, RAID1: total=16.00MB, used=228.00KB Metadata, RAID1: total=28.00GB, used=20.47GB # btrfs fi show Label: none uuid: a0ae35c4-51f2-405f-a4bb-e4f134b1d193 Total devices 3 FS bytes used 3.43TB devid4 size 2.73TB used 1.17TB path /dev/sdc devid3 size 2.73TB used 1.17TB path /dev/sdb devid2 size 2.70TB used 1.14TB path /dev/sda4 Btrfs Btrfs v0.19 Still 1.5TB missing. 
-- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: wrong values in "df" and "btrfs filesystem df"
2011-04-10 18:13:51 +0800, Miao Xie: [...] > >> # df /srv/MM > >> > >> Filesystem 1K-blocks Used Available Use% Mounted on > >> /dev/sdd15846053400 1593436456 2898463184 36% /srv/MM > >> > >> # btrfs filesystem df /srv/MM > >> > >> Data, RAID0: total=1.67TB, used=1.48TB > >> System, RAID1: total=16.00MB, used=112.00KB > >> System: total=4.00MB, used=0.00 > >> Metadata, RAID1: total=3.75GB, used=2.26GB > >> > >> # btrfs-show > >> > >> Label: MMedia uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2 > >>Total devices 3 FS bytes used 1.48TB > >>devid3 size 1.81TB used 573.76GB path /dev/sdb1 > >>devid2 size 1.81TB used 573.77GB path /dev/sde1 > >>devid1 size 1.82TB used 570.01GB path /dev/sdd1 > >> > >> Btrfs Btrfs v0.19 > >> > >> > >> > >> "df" shows an "Available" value which isn't related to any real value. > > > >I _think_ that value is the amount of space not allocated to any > > block group. If that's so, then Available (from df) plus the three > > "total" values (from btrfs fi df) should equal the size value from df. > > This value excludes the space that can not be allocated to any block group, > This feature was implemented to fix the bug df command add the disk space, > which > can not be allocated to any block group forever, into the "Available" value. > (see the changelog of the commit 6d07bcec969af335d4e35b3921131b7929bd634e) > > This implementation just like fake chunk allocation, but the fake allocation > just allocate the space from two of these three disks, doesn't spread the > stripes over all the disks, which has enough space. [...] Hi Miao, would you care to expand a bit on that. In Helmut's case above where all the drives have at least 1.2TB free, how would there be un-allocatable space? What's the implication of having disks of differing sizes? Does that mean that the extra space on larger disks is lost? 
Thanks, Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
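[Editorial note: Miao's explanation can be made concrete with a toy allocator. This is a simplified model of my own, not the kernel's actual code: a RAID0 chunk takes an equal stripe from every disk that still has free space (minimum two), whereas the buggy df simulation only ever drew stripes from two disks.]

```python
#!/usr/bin/env python3
# Toy model of RAID0 chunk allocation over unequal disks (not the
# kernel's actual allocator).  Each chunk stripes equally over the n
# disks with the most free space; pass max_stripes=2 to mimic the buggy
# df simulation that only ever used two disks.
def usable_raid0(free, max_stripes=None):
    free = sorted(free, reverse=True)
    used = 0
    while True:
        n = sum(1 for f in free if f > 0)
        if max_stripes is not None:
            n = min(n, max_stripes)
        if n < 2:               # RAID0 needs at least two stripes
            return used
        step = free[n - 1]      # drain down to the smallest contributor
        used += step * n
        for i in range(n):
            free[i] -= step
        free.sort(reverse=True)

disks = [2967698087424, 3000592982016, 3000592982016]    # sda4, sdb, sdc
print(sum(disks) - usable_raid0(disks))                  # 0: nothing lost
print(sum(disks) - usable_raid0(disks, max_stripes=2))   # 2967698087424
```

Spreading over all disks leaves nothing unallocatable for these sizes, while the two-disk model strands exactly one disk's worth (2967698087424 bytes), matching the discrepancy in the df output quoted elsewhere in the thread.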
Re: wrong values in "df" and "btrfs filesystem df"
2011-04-09 10:11:41 +0100, Hugo Mills: [...] > > # df /srv/MM > > > > Filesystem 1K-blocks Used Available Use% Mounted on > > /dev/sdd15846053400 1593436456 2898463184 36% /srv/MM > > > > # btrfs filesystem df /srv/MM > > > > Data, RAID0: total=1.67TB, used=1.48TB > > System, RAID1: total=16.00MB, used=112.00KB > > System: total=4.00MB, used=0.00 > > Metadata, RAID1: total=3.75GB, used=2.26GB > > > > # btrfs-show > > > > Label: MMedia uuid: 120b036a-883f-46aa-bd9a-cb6a1897c8d2 > > Total devices 3 FS bytes used 1.48TB > > devid3 size 1.81TB used 573.76GB path /dev/sdb1 > > devid2 size 1.81TB used 573.77GB path /dev/sde1 > > devid1 size 1.82TB used 570.01GB path /dev/sdd1 > > > > Btrfs Btrfs v0.19 > > > > > > > > "df" shows an "Available" value which isn't related to any real value. > >I _think_ that value is the amount of space not allocated to any > block group. If that's so, then Available (from df) plus the three > "total" values (from btrfs fi df) should equal the size value from df. [...] Well, $ echo $((2898463184 + 1.67*2**30 + 4*2**10 + 16*2**10*2 + 3.75*2**20*2)) 4699513214.079 I do get the same kind of discrepancy: $ df -h /mnt FilesystemSize Used Avail Use% Mounted on /dev/sdb 8.2T 3.5T 3.2T 53% /mnt $ sudo btrfs fi show Label: none uuid: ... Total devices 3 FS bytes used 3.43TB devid4 size 2.73TB used 1.17TB path /dev/sdc devid3 size 2.73TB used 1.17TB path /dev/sdb devid2 size 2.70TB used 1.14TB path /dev/sda4 $ sudo btrfs fi df /mnt Data, RAID0: total=3.41TB, used=3.41TB System, RAID1: total=16.00MB, used=232.00KB Metadata, RAID1: total=35.25GB, used=20.55GB $ echo $((3.2 + 3.41 + 2*16/2**20 + 2*35.25/2**10)) 6.678847656253 -- Stephane -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: cloning single-device btrfs file system onto multi-device one
2011-03-28 14:17:48 +0100, Stephane Chazelas:
[...]
> So here is how I transferred a 6TB btrfs on one 6TB raid5 device
> (on host src) over the network onto a btrfs on 3 3TB hard drives
[...]
> I then did a btrfs fi balance again and let it run through. However
> here is what I get:
[...]

Sorry, it didn't run through: it is still running (after 9 days), and
there are indications it could still be running 8 years from now (see
other thread).

There hasn't been any change in the amount of free space reported by
df since the beginning of the balance (there are still 2TB missing).

Cheers,
Stephane
Re: btrfs balancing start - and stop?
2011-04-04 20:07:54 +0100, Stephane Chazelas:
[...]
> > > 4.7 more days to go. And I reckon it will have written about 9
> > > TB to disk by that time (which is the total size of the volume,
> > > though only 3.8TB are occupied).
> >
> > Yes - that's the pessimistic estimation. As Hugo has explained it can
> > finish faster - just look to the data tomorrow again.
> [...]
>
> That may be an optimistic estimation actually, as there hasn't
> been much progress in the last 34 hours:
[...]

The rate is going down. It's now down to about 14kB/s:

[658654.295752] btrfs: relocating block group 3919858106368 flags 20
[671932.913235] btrfs: relocating block group 3919589670912 flags 20
[686189.296126] btrfs: relocating block group 3919321235456 flags 20
[701511.523990] btrfs: relocating block group 3919052800000 flags 20
[718591.316339] btrfs: relocating block group 3918784364544 flags 20
[725567.081031] btrfs: relocating block group 3918515929088 flags 20
[744415.011581] btrfs: relocating block group 3918247493632 flags 20
[762365.021458] btrfs: relocating block group 3917979058176 flags 20
[780504.726067] btrfs: relocating block group 3917710622720 flags 20

Even though it is reading and writing to disk at a much higher rate.
Here are the stats every second:

--dsk/sda-- --dsk/sdb-- --dsk/sdc--
 read  writ: read  writ: read  writ
   0     0 : 540k    0 :  12k    0
   0     0 : 704k    0 :  20k    0
   0     0 :1068k    0 :  24k    0
   0     0 : 968k    0 :   0     0
   0     0 : 932k    0 :4096B    0
   0     0 : 832k  880k: 152k 1320k
  60k 4096B: 880k  140k:   0    28M
  68k    0 : 308k    0 :4096B 9240k
   0    48k:   0     0 :   0  7852k
   0     0 : 576k 6192k:4096B   26M
   0     0 : 100k   18M:   0     0
   0     0 :  28k   10M:   0     0
   0     0 :   0  7020k:   0     0
   0     0 :  52k   13M:   0     0
   0    12k: 528k   17M:   0    12k
   0     0 : 884k    0 :8192B    0
   0     0 :1068k    0 :  20k    0
   0     0 : 660k    0 :   0     0
   0    40k: 776k    0 :4096B    0
   0     0 : 576k    0 :   0     0
   0     0 : 596k    0 :8192B    0
1096k   28k: 664k    0 :4096B    0
   0     0 : 660k    0 :   0     0
   0     0 : 592k    0 :8192B    0

At this rate, the balancing would be over in about 8 years.
Since the start of the balance:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              10.04         1.56         1.57    1228286    1237359
sdc             396.24         1.77         3.95    1397015    3115057
sdb             421.17         1.87         3.95    1473759    3115093

I think that's the end of my attempt to transfer that FS to another
machine (see other thread). I'll have to ditch that copy and try again
from scratch with another approach.

Before I do that, is there anything I can do to help investigate the
problem?

regards,
Stephane
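[Editorial aside, not part of the original message: the ~14kB/s figure quoted above can be sanity-checked, since consecutive "relocating block group" offsets in the dmesg excerpt differ by exactly 268435456 bytes, i.e. one 256MiB block group per interval. A sketch using the last two timestamps:]

```shell
# Each "relocating block group" dmesg line is one 256MiB block group
# below the previous one, so the relocation rate between two adjacent
# samples is 256MiB divided by the timestamp delta.
awk 'BEGIN {
  t1 = 762365.021458; t2 = 780504.726067   # last two timestamps quoted
  printf "%.1f KiB/s\n", 256 * 1024 / (t2 - t1)
}'
```

which comes out at roughly the 14kB/s the message reports.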
Re: btrfs balancing start - and stop?
2011-04-03 21:35:00 +0200, Helmut Hullen:
> Hello, Stephane,
>
> You wrote on 03.04.11:
>
> > balancing about 2 TByte needed about 20 hours.
> [...]
>
> >> Hugo has explained the limits of regarding
> >>
> >> dmesg | grep relocating
> >>
> >> or (more simple) the last lines of "dmesg" and looking for the
> >> "relocating" lines. But: what do these lines tell now? What is the
> >> (pessimistic) estimation when you extrapolate the data?
> [...]
>
> > 4.7 more days to go. And I reckon it will have written about 9
> > TB to disk by that time (which is the total size of the volume,
> > though only 3.8TB are occupied).
>
> Yes - that's the pessimistic estimation. As Hugo has explained it can
> finish faster - just look to the data tomorrow again.
[...]

That may be an optimistic estimation actually, as there hasn't
been much progress in the last 34 hours:

# dmesg | awk -F '[][ ]+' '/reloc/ && c++%5==0 {x=(n-$7)/($2-t)/1048576; printf "%s\t%s\t%.2f\t%*s\n", $2/3600, $7, x, x/3, ""; t=$2; n=$7}' | tr ' ' '*' | tail -40
125.629	4170039951360	11.93	***
125.641	4166818725888	70.99	***
125.699	4157155049472	43.87	**
125.753	4144270147584	63.34	*
125.773	4137827696640	84.98
125.786	4134606471168	64.39	*
125.823	4124942794752	70.09	***
125.87	4112057892864	71.66	***
125.887	4105615441920	100.60	*
125.898	4102394216448	81.26	***
125.935	4092730540032	69.06	***
126.33	4085751218176	4.69	*
131.904	4072597880832	0.63
132.082	4059712978944	19.20	**
132.12	4053270528000	45.52	***
132.138	4050049302528	45.60	***
132.225	4040385626112	29.68	*
132.267	4027500724224	81.17	***
132.283	4021058273280	106.31	***
132.29	4017837047808	110.42
132.316	4008173371392	100.54	*
132.358	3995288469504	81.18	***
132.475	3988846018560	14.62
132.514	3985624793088	21.55	***
132.611	3975961116672	26.40
132.663	3963076214784	65.31	*
132.678	3956633763840	120.11
132.685	3956365328384	10.26	***
137.701	3949922877440	0.34
137.709	3946701651968	106.54	***
137.744	3937037975552	72.10
137.889	3927105863680	18.18	**
137.901	3926837428224	5.85	*
141.555	3926300557312	0.04
141.93	3925226815488	0.76
151.227	3924421509120	0.02
151.491	3924153073664	0.27
151.712	3923616202752	0.64
165.301	3922542460928	0.02
174.346	3921737154560	0.02

At this rate (third field expressed in MiB/s), it could take months to
complete. iostat still reports writes at about 5MiB/s though.

Note that this system is not doing anything else at all.

There definitely seems to be scope for optimisation in the "balancing"
I'd say.

--
Stephane
Re: btrfs balancing start - and stop?
2011-04-01 21:26:00 +0200, Helmut Hullen:
> Hello, Stephane,

Hi Helmut,

> You wrote on 01.04.11:
>
> >> balancing about 2 TByte needed about 20 hours.
> [...]
>
> > I've got a balance running since Monday on a 9TB volume (3.5 of which
> > are used, 3.2 allegedly free), showing no sign of finishing soon.
> > Should I be worried?
>
> > Using /proc/sys/vm/block_dump, I can see it's seeking all over the
> > place, which is probably why throughput is not high. I can also see
> > it writing several times to the same sectors.
>
> Hugo has explained the limits of regarding
>
> dmesg | grep relocating
>
> or (more simple) the last lines of "dmesg" and looking for the
> "relocating" lines. But: what do these lines tell now? What is the
> (pessimistic) estimation when you extrapolate the data?
>
> (please excuse my gerlish)
[...]

$ dmesg | grep reloc | sed -e 1b -e '$!d'
[370178.209571] btrfs: relocating block group 5612075220992 flags 20
[546163.062739] btrfs: relocating block group 3923616202752 flags 20

So, if it's to go down to zero:

$ echo $((3923616202752*(546163.062739-370178.209571)/(5612075220992-3923616202752)/86400))
4.7332292856705722

4.7 more days to go. And I reckon it will have written about 9 TB to
disk by that time (which is the total size of the volume, though only
3.8TB are occupied).

--
Stephane
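[Editorial aside, not part of the original message: the one-shot estimate above can be wrapped into a small reusable helper. This is a sketch of my own; it assumes, as the message does, that the block group offsets fall linearly to zero, which the later messages in the thread show is optimistic.]

```shell
# Estimate days remaining for a balance from two dmesg samples:
# eta_days T1 BG1 T2 BG2, where T is the uptime timestamp and BG the
# "relocating block group" offset, assuming BG shrinks linearly to 0.
eta_days() {
  awk -v t1="$1" -v bg1="$2" -v t2="$3" -v bg2="$4" \
      'BEGIN { print bg2 * (t2 - t1) / (bg1 - bg2) / 86400 }'
}
eta_days 370178.209571 5612075220992 546163.062739 3923616202752
```

With the two samples from the message this reproduces the ~4.7 day figure.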
Re: btrfs balancing start - and stop?
On Fri, 2011-04-01 at 14:12 +0200, Helmut Hullen wrote:
> Hello, Struan,
>
> You wrote on 01.04.11:
>
> > 1) Is the balancing operation expected to take many hours (or days?)
> > on a filesystem such as this? Or are there known issues with the
> > algorithm that are yet to be addressed?
>
> May be. Balancing about 15 GByte needed about 2 hours (or less),
> balancing about 2 TByte needed about 20 hours.
[...]

I've got a balance running since Monday on a 9TB volume (3.5 of which
are used, 3.2 allegedly free), showing no sign of finishing soon.
Should I be worried?

Using /proc/sys/vm/block_dump, I can see it's seeking all over the
place, which is probably why throughput is not high. I can also see it
writing several times to the same sectors.

# df -h /backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       8.2T  3.5T  3.2T  53% /backup
# btrfs fi sh
Label: none  uuid: ...
        Total devices 3 FS bytes used 3.43TB
        devid 4 size 2.73TB used 1.16TB path /dev/sdc
        devid 3 size 2.73TB used 1.16TB path /dev/sdb
        devid 2 size 2.70TB used 1.14TB path /dev/sda4

Btrfs Btrfs v0.19
# ps -eolstart,args | grep balance
Mon Mar 28 11:18:18 2011 sudo btrfs fi balance /backup
Mon Mar 28 11:18:18 2011 btrfs fi balance /backup
# date
Fri Apr  1 19:28:40 BST 2011
# btrfs fi df /backup
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=27.75GB, used=20.47GB
# iostat -md
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda              14.49         2.37         2.39     903123     913112
sdc             501.23         2.68         5.06    1022456    1928462
sdb             477.28         2.58         5.06     982853    1928482

It's already written more than the used space.

Cheers,
Stephane
Re: cloning single-device btrfs file system onto multi-device one
2011-03-28 14:24:03 +0100, Stephane Chazelas:
> 2011-03-23 12:13:45 +0700, Fajar A. Nugraha:
> > On Mon, Mar 21, 2011 at 11:24 PM, Stephane Chazelas wrote:
> > > AFAICT, compression is enabled at mount time and would
> > > only apply to newly created files. Is there a way to compress
> > > files already in a btrfs filesystem?
> >
> > You need to select the files manually (not possible to select a
> > directory), but yes, it's possible using "btrfs filesystem defragment
> > -c"
> [...]
>
> Thanks. However I find that for files that have snapshots, it
> ends up increasing disk usage instead of reducing it (size of
> the file + size of the compressed file, instead of size of the
> file).
>
> If I do the btrfs fi de on both the volume and its snapshot, I
> end up with some benefit only if the compression ratio is over
> 2 (and with more snapshots, there's little chance of getting any
> benefit at all). Also, with dozens of snapshots on a 4TB volume,
> it's likely to take weeks to do.
>
> Is there a way around that?
[...]

OK, sorry. I can see now that it's a FAQ. So the answer to my question
would be "no"...

--
Stephane
Re: cloning single-device btrfs file system onto multi-device one
2011-03-23 12:13:45 +0700, Fajar A. Nugraha:
> On Mon, Mar 21, 2011 at 11:24 PM, Stephane Chazelas wrote:
> > AFAICT, compression is enabled at mount time and would
> > only apply to newly created files. Is there a way to compress
> > files already in a btrfs filesystem?
>
> You need to select the files manually (not possible to select a
> directory), but yes, it's possible using "btrfs filesystem defragment
> -c"
[...]

Thanks. However I find that for files that have snapshots, it ends up
increasing disk usage instead of reducing it (size of the file + size
of the compressed file, instead of size of the file).

If I do the btrfs fi de on both the volume and its snapshot, I end up
with some benefit only if the compression ratio is over 2 (and with
more snapshots, there's little chance of getting any benefit at all).
Also, with dozens of snapshots on a 4TB volume, it's likely to take
weeks to do.

Is there a way around that?

Thanks,
Stephane
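[Editorial aside, not part of the original message: since "btrfs filesystem defragment -c" did not recurse into directories at the time, the usual workaround was to drive it with find. A sketch of my own; it does nothing about the snapshot-duplication problem discussed above, and the DRYRUN knob is a hypothetical convenience for previewing.]

```shell
# Compress existing files by running "btrfs filesystem defragment -c"
# on each regular file, since the tool does not recurse on its own.
# Set DRYRUN=echo to preview the commands instead of running them.
compress_existing() {
  find "$1" -xdev -type f -exec ${DRYRUN:-} btrfs filesystem defragment -c {} \;
}
# example (preview only): DRYRUN=echo compress_existing /mnt/vol
```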
Re: cloning single-device btrfs file system onto multi-device one
2011-03-22 18:06:29 -0600, cwillu:
> > I can mount it back, but not if I reload the btrfs module, in which
> > case I get:
> >
> > [ 1961.328280] Btrfs loaded
> > [ 1961.328695] device fsid df4e5454eb7b1c23-7a68fc421060b18b devid 1
> > transid 118 /dev/loop0
> > [ 1961.329007] btrfs: failed to read the system array on loop0
> > [ 1961.340084] btrfs: open_ctree failed
>
> Did you rescan all the loop devices (btrfs dev scan /dev/loop*) after
> reloading the module, before trying to mount again?

Thanks. That probably was the issue, that and using too big files on
too small volumes I'd guess.

I've tried it in real life and it seemed to work to some extent. So
here is how I transferred a 6TB btrfs on one 6TB raid5 device (on host
src) over the network onto a btrfs on 3 3TB hard drives (on host dst):

on src:

lvcreate -s -L100G -n snap /dev/VG/vol
nbd-server 12345 /dev/VG/snap

(if you're not lucky enough to have used lvm there, you can use
nbd-server's copy-on-write feature).

on dst:

nbd-client src 12345 /dev/nbd0
mount /dev/nbd0 /mnt
btrfs device add /dev/sdb /dev/sdc /dev/sdd /mnt
# in reality it was /dev/sda4 (a little under 3TB), /dev/sdb,
# /dev/sdc
btrfs device delete /dev/nbd0 /mnt

That was relatively fast (about 18 hours) but failed with an error.
Apparently, it managed to fill up the 3 3TB drives (as shown by btrfs
fi show). Usage for /dev/nbd0 was at 16MB though (?!)

I then did a "btrfs fi balance /mnt". I could see usage on the drives
go down quickly. However, that was writing data onto /dev/nbd0 so was
threatening to fill up my LVM snapshot. I then cancelled that by doing
a hard reset on "dst" (couldn't find any other way).

Upon reboot, I mounted /dev/sdb instead of /dev/nbd0 in case that made
a difference, and then ran the btrfs device delete /dev/nbd0 /mnt
again, which this time went through.

I then did a btrfs fi balance again and let it run through.
However here is what I get:

$ df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        8.2T  3.5T  3.2T  53% /mnt

Only 3.2T left. How would I reclaim the missing space?

$ sudo btrfs fi show
Label: none  uuid: ...
        Total devices 3 FS bytes used 3.43TB
        devid 4 size 2.73TB used 1.17TB path /dev/sdc
        devid 3 size 2.73TB used 1.17TB path /dev/sdb
        devid 2 size 2.70TB used 1.14TB path /dev/sda4
$ sudo btrfs fi df /mnt
Data, RAID0: total=3.41TB, used=3.41TB
System, RAID1: total=16.00MB, used=232.00KB
Metadata, RAID1: total=35.25GB, used=20.55GB

So that kind of worked, but it is of little use to me as 2TB kind of
disappeared under my feet in the process.

Any ideas, anyone?

Thanks,
Stephane
Re: cloning single-device btrfs file system onto multi-device one
2011-03-21 16:24:50 +0000, Stephane Chazelas:
[...]
> I'm trying to move a btrfs FS that's on a hardware raid 5 (6TB
> large, 4 of which are in use) to another machine with 3 3TB HDs
> and preserve all the subvolumes/snapshots.
[...]

I tried one approach: export an LVM snapshot of the old fs as an nbd
device, mount it from the new machine (/dev/nbd0), then add the new
disks to the FS (btrfs device add) and then delete /dev/nbd0, which I'd
hoped would relocate all the extents onto the new disks.

I did some experiments with some loop devices but got all sorts of
results with different versions of kernels (debian unstable 2.6.37 and
2.6.38 amd64). Here is what I did:

dd seek=512 bs=1M of=./a < /dev/null
dd seek=256 bs=1M of=./b < /dev/null
dd seek=256 bs=1M of=./c < /dev/null
mkfs.btrfs ./a
losetup /dev/loop0 ./a
losetup /dev/loop1 ./b
losetup /dev/loop2 ./c
mount /dev/loop0 /mnt
yes | head -c 300M > /mnt/test
btrfs device add /dev/loop1 /mnt
btrfs device add /dev/loop2 /mnt
# btrfs filesystem balance /mnt
btrfs device delete /dev/loop0 /mnt

In 2.6.38, upon the "balance" as well as upon the "delete", it seemed
to go into a loop, the system at 70% wait, with some

btrfs: found 1 extents

messages 2 to 3 times per second in dmesg. I tried leaving it on for a
few hours and it didn't help. The only thing I could do was reboot.
Disk usage of the a, b, c files was not increasing, though dstat -d
showed some disk writing at ~500kB/s (so I suppose it was writing the
same blocks over and over and seeking a lot).

In 2.6.37, I managed to have it working once, though I don't know how
and never managed to reproduce it. Upon the delete, I can see some
relocations in the dmesg output, but then:

# btrfs device delete /dev/loop0 /mnt
ERROR: error removing the device '/dev/loop0'

(no error in dmesg)

Upon umount, here is what I find in dmesg:

[...]
[ 1802.357205] btrfs: relocating block group 0 flags 2
[ 1860.193351] ------------[ cut here ]------------
[ 1860.193373] WARNING: at /build/buildd-linux-2.6_2.6.37-2-amd64-bITS0h/linux-2.6-2.6.37/debian/build/source_amd64_none/fs/btrfs/volumes.c:544 __btrfs_close_devices+0xb5/0xd0 [btrfs]()
[ 1860.193379] Hardware name: MacBookPro4,1
[ 1860.193382] Modules linked in: btrfs libcrc32c hidp vboxnetadp vboxnetflt vboxdrv ip6table_filter ip6_tables ebtable_nat ebtables acpi_cpufreq mperf cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_stats ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp parport_pc ppdev lp parport sco bnep rfcomm l2cap kvm_intel binfmt_misc kvm deflate ctr twofish_generic twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc cryptd aes_x86_64 aes_generic xcbc rmd160 sha512_generic sha256_generic sha1_generic hmac crypto_null af_key fuse nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc loop dm_crypt snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm uvcvideo videodev nouveau btusb bluetooth snd_seq_midi lib80211_crypt_tkip snd_rawmidi snd_seq_midi_event v4l1_compat rfkill snd_seq bcm5974 wl(P) ttm drm_kms_helper v4l2_compat_ioctl32 snd_timer snd_seq_device drm i2c_i801 i2c_algo_bit snd tpm_tis soundcore video snd_page_alloc lib80211 joydev i2c_core tpm tpm_bios battery ac applesmc input_polldev evdev pcspkr mbp_nvidia_bl output power_supply processor thermal_sys button ext4 mbcache jbd2 crc16 raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod nbd dm_mirror dm_region_hash dm_log dm_mod zlib_deflate crc32c sg sd_mod sr_mod cdrom crc_t10dif hid_apple usbhid hid ata_generic sata_sil24 uhci_hcd ata_piix libata ehci_hcd scsi_mod usbcore sky2 firewire_ohci firewire_core crc_itu_t nls_base [last unloaded: uinput]
[ 1860.193550] Pid: 14808, comm: umount Tainted: P W 2.6.37-2-amd64 #1
[ 1860.193552] Call Trace:
[ 1860.193561] [] ? warn_slowpath_common+0x78/0x8c
[ 1860.193577] [] ? __btrfs_close_devices+0xb5/0xd0 [btrfs]
[ 1860.193593] [] ? btrfs_close_devices+0x1d/0x70 [btrfs]
[ 1860.193610] [] ? close_ctree+0x2cd/0x32f [btrfs]
[ 1860.193616] [] ? dispose_list+0xa7/0xb9
[ 1860.193627] [] ? btrfs_put_super+0x10/0x1d [btrfs]
[ 1860.193633] [] ? generic_shutdown_super+0x5c/0xd4
[ 1860.193638] [] ? kill_anon_super+0x9/0x40
[ 1860.193642] [] ? deactivate_locked_super+0x1e/0x3d
[ 1860.193647] [] ? sys_umount+0x2cf/0x2fa
[ 1860.193653] [] ? system_call_fastpath+0x16/0x1b
[ 1860.193656] ---[ end trace 4e4b8320dc6e70cc ]---

I can mount it back, but not if I reload the btrfs module, in which
case I get:

[ 1961.328280] Btrfs loaded
[ 1961.328695] device fsid df4e5454eb7b1c23-7a68fc421060b18b devid 1 transid 118 /dev/loop0
[ 1961.329007] btrfs: failed to read the system array on loop0
[ 1961.340084] btrfs: open_ctree failed
cloning single-device btrfs file system onto multi-device one
Hiya,

I'm trying to move a btrfs FS that's on a hardware raid 5 (6TB large,
4 of which are in use) to another machine with 3 3TB HDs and preserve
all the subvolumes/snapshots. Is there a way to do that without using a
software/hardware raid on the new machine (that is, just using btrfs
multi-device)?

If fewer than 3TB were occupied, I suppose I could just resize it so
that it fits on one 3TB hd, then copy device to device onto a 3TB disk,
add the 2 other ones and do a "balance", but here, I can't do that.

I suspect that if compression was enabled, the FS could fit on 3TB, but
AFAICT, compression is enabled at mount time and would only apply to
newly created files. Is there a way to compress files already in a
btrfs filesystem?

Any help would be appreciated.

Stephane
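[Editorial aside, not part of the original message: the shrink-and-copy plan mentioned above would look roughly like the sketch below. It is my own illustration, not applicable in this thread since more than 3TB were in use; the device names and the 2900g size are made up, and run() only prints each step rather than executing it.]

```shell
# The "resize, raw-copy, add, balance" plan sketched in the question.
# run() previews each command; drop the echo to execute for real.
run() { echo "+ $*"; }
run btrfs filesystem resize 2900g /mnt/old   # shrink FS below one 3TB disk
run umount /mnt/old
run dd if=/dev/old_raid of=/dev/sdb bs=1M    # device-to-device copy
run mount /dev/sdb /mnt/new
run btrfs device add /dev/sdc /dev/sdd /mnt/new
run btrfs filesystem balance /mnt/new        # spread extents over all disks
```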