btrfs warning in 3.7.0 with nfsd

2012-12-19 Thread Steve Leung


I'm getting some kernel warnings from a btrfs raid1 filesystem that's 
serving up nfs4 to a couple of other computers on a small home network. 
I ran 3.6.x for several weeks before upgrading and didn't see any warnings 
like this.


[ 1712.223791] WARNING: at fs/btrfs/tree-log.c:3716 
btrfs_log_inode_parent+0x291/0x2fd [btrfs]()
[ 1712.223793] Hardware name: System Product Name
[ 1712.223794] Modules linked in: lirc_dev xt_tcpudp iptable_filter ip_tables 
x_tables nfsd auth_rpcgss nfs_acl nfs lockd fscache sunrpc loop firewire_sbp2 
snd_hda_codec_analog iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec 
snd_hwdep snd_pcm snd_page_alloc snd_seq coretemp kvm_intel acpi_cpufreq mperf 
snd_timer lpc_ich evdev asus_atk0110 kvm processor button psmouse serio_raw 
snd_seq_device mfd_core parport_pc pcspkr thermal_sys i2c_i801 snd microcode 
soundcore parport i2c_core btrfs crc32c libcrc32c zlib_deflate sg sd_mod 
crc_t10dif ata_generic uhci_hcd pata_jmicron ahci libahci firewire_ohci 
firewire_core crc_itu_t ata_piix libata r8169 mii scsi_mod ehci_hcd usbcore 
usb_common
[ 1712.223840] Pid: 2314, comm: nfsd Not tainted 3.7.0 #1
[ 1712.223842] Call Trace:
[ 1712.223849]  [] ? warn_slowpath_common+0x76/0x8a
[ 1712.223864]  [] ? btrfs_log_inode_parent+0x291/0x2fd 
[btrfs]
[ 1712.223879]  [] ? btrfs_log_dentry_safe+0x35/0x4e [btrfs]
[ 1712.223894]  [] ? btrfs_sync_file+0x151/0x1e6 [btrfs]
[ 1712.223908]  [] ? btrfs_file_aio_write+0x374/0x3c5 [btrfs]
[ 1712.223913]  [] ? tcp_release_cb+0x46/0x94
[ 1712.223917]  [] ? release_sock+0xe9/0x11f
[ 1712.223922]  [] ? tcp_sendmsg+0x6ef/0x801
[ 1712.223936]  [] ? __btrfs_buffered_write+0x2e8/0x2e8 
[btrfs]
[ 1712.223940]  [] ? do_sync_readv_writev+0x57/0x94
[ 1712.223943]  [] ? do_readv_writev+0x94/0x108
[ 1712.223947]  [] ? exportfs_decode_fh+0xcc/0x257
[ 1712.223958]  [] ? seconds_since_boot+0x11/0x1a [sunrpc]
[ 1712.223966]  [] ? cache_check+0x2c/0x24c [sunrpc]
[ 1712.223975]  [] ? nfsd_vfs_write.isra.8+0xc3/0x20e [nfsd]
[ 1712.223978]  [] ? kmem_cache_alloc+0x8e/0xfd
[ 1712.223987]  [] ? renew_client_locked+0x76/0x7f [nfsd]
[ 1712.223995]  [] ? renew_client+0x18/0x25 [nfsd]
[ 1712.224003]  [] ? find_confirmed_client+0x47/0x5c [nfsd]
[ 1712.224024]  [] ? nfsd_write+0x7f/0xf0 [nfsd]
[ 1712.224033]  [] ? should_resched+0x5/0x23
[ 1712.224040]  [] ? nfsd4_write+0xcb/0xf3 [nfsd]
[ 1712.224049]  [] ? nfsd4_proc_compound+0x224/0x3b2 [nfsd]
[ 1712.224057]  [] ? nfsd_dispatch+0x93/0x145 [nfsd]
[ 1712.224069]  [] ? svc_process_common+0x289/0x439 [sunrpc]
[ 1712.224075]  [] ? nfsd_destroy.constprop.2+0x3c/0x3c [nfsd]
[ 1712.224086]  [] ? svc_process+0x111/0x12d [sunrpc]
[ 1712.224095]  [] ? nfsd+0xaa/0xfe [nfsd]
[ 1712.224100]  [] ? kthread+0x81/0x89
[ 1712.224105]  [] ? __kthread_parkme+0x5c/0x5c
[ 1712.224109]  [] ? ret_from_fork+0x7c/0xb0
[ 1712.224114]  [] ? __kthread_parkme+0x5c/0x5c
[ 1712.224116] ---[ end trace 58e10839083ba9da ]---

This is the first warning; subsequent ones look broadly similar to my 
eyes.


btrfs fi show:

Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
	Total devices 3 FS bytes used 1.02TB
	devid    3 size 1.36TB used 1.22TB path /dev/sdb1
	devid    1 size 1.36TB used 1.18TB path /dev/sdc1
	devid    2 size 464.73GB used 318.03GB path /dev/sda1

This is a vanilla amd64 3.7.0 built using Debian's make-kpkg.  Userspace 
is a recent Debian testing.


I'm not sure exactly what triggers this but it seems to happen regularly 
with NFS activity.  I haven't observed any failures or bad performance 
from this.


I know 3.7 introduced some NFS changes too so I hope I'm directing this 
report correctly.


Should I be concerned about my data?  Can I provide any other information? 
I can try btrfs-next too if it'll help.


Thanks!

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About btrfs qgroup import/export command

2012-12-19 Thread Miao Xie
On Wed, 19 Dec 2012 12:40:25 +0100, Arne Jansen wrote:
> On 19.12.2012 12:25, Miao Xie wrote:
>> Hi, everyone.
>>
>> As we know, there is no backup function for qgroups. When a problem
>> occurs, users must recover the qgroup configuration manually, which is
>> inconvenient. Besides that, some users might want to import an existing
>> qgroup configuration into a new filesystem. Btrfs does not have such a
>> function, so it can only be done manually.
>>
>> So we want to implement btrfs qgroup import/export commands.
>> 1) The 'btrfs qgroup export' command will export the qgroup tree
>>    into a user-specified file (stdout by default).
>>
>> 2) The user may first modify the configuration file and then
>>    import it into the filesystem (with the 'btrfs qgroup import' command).
>>
>> The file may be formatted as follows:
>>
>> Qgroupid  is_compressed  is_exclusive  limited_size  parent
>> --------  -------------  ------------  ------------  ------
>> 0/1       0              0             10G           1/0
>> 1/0       1              1             20G           ---
>>
>>  If 'is_exclusive' is set, 'limited_size' corresponds to the max exclusive size,
>>  otherwise to the max referenced size. Here 'parent' excludes ancestral qgroups.
>>
>> Is there any comment about this idea? 
> 
> The configuration only really makes sense in combination with the existing
> subvolumes. Even if the target has subvolumes under the same name, they
> might have different internal IDs. So it might make more sense to address
> the level 0 qgroups by name.

Yeah. Thanks for your suggestion.

> Also it might be misleading to apply a configuration to an existing fs, as
> it currently is not possible to get correct accounting if the fs is not
> empty. Rescan is not yet implemented.
> So instead of just saving and restoring the qgroup config, it might make
> more sense to create a new filesystem including all subvolumes and quota
> config from a config file.
> But I'm not completely convinced that this is a feature that is needed
> frequently. If I want a standard deployment, I simply write a script that
> creates the fs + subvol + quota.

But if you set a new value for some groups, you must modify your script at
the same time, which is a bit troublesome.

Thanks
Miao


Re: [PATCH] Btrfs: don't bother updating the inode when evicting

2012-12-19 Thread Miao Xie
On Wed, 19 Dec 2012 10:02:59 -0500, Josef Bacik wrote:
> On Tue, Dec 18, 2012 at 06:58:33PM -0700, Miao Xie wrote:
>> On Tue, 18 Dec 2012 15:51:57 -0500, Josef Bacik wrote:
>>> We're deleting the stupid thing, no sense in updating the inode for the new
>>> size.  We're running into having 50-100 orphans left over with xfstests 83
>>> because of ENOSPC when trying to start the transaction for the inode update.
>>> This patch fixes this problem.  Thanks,
>>
>> This patch is wrong; it will introduce inconsistent metadata in the
>> snapshot tree. The reason is as follows:
>>
>> commit 8407aa464331556e4f6784f974030b83fc7585ed
>> Author: Miao Xie
>> Date:   Fri Sep 7 01:43:32 2012 -0600
>>
>>     Btrfs: fix corrupted metadata in the snapshot
>>
>>     When we delete a inode, we will remove all the delayed items including delayed
>>     inode update, and then truncate all the relative metadata. If there is lots of
>>     metadata, we will end the current transaction, and start a new transaction to
>>     truncate the left metadata. In this way, we will leave a inode item that its
>>     link counter is > 0, and also may leave some directory index items in fs/file tree
>>     after the current transaction ends. In other words, the metadata in this fs/file tree
>>     is inconsistent. If we create a snapshot for this tree now, we will find a inode with
>>     corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
>>     because its link counter is not 0.
>>
>>     We fix this problem by updating the inode item before the current transaction ends.
>>
>>     Signed-off-by: Miao Xie
>>
> 
> So why don't we fix unlink to call btrfs_update_inode_item so that the nlink
> counter is set to 0?  The orphan item will be carried over into the snapshot if
> we don't actually evict the inode before we do the snapshot and then the orphan
> cleanup will take care of the rest?  Thanks,

But that would hurt file deletion performance.

Thanks
Miao

> 
> Josef



Re: [GIT PULL] Btrfs updates

2012-12-19 Thread Hugo Mills
On Wed, Dec 19, 2012 at 07:09:57PM +0100, Roy Sigurd Karlsbakk wrote:
> > [ sorry, resend. My lbdb autocompleted with an extra r in kernel.org ]
> > 
> > Hi everyone,
> > 
> > My for-linus branch has a big set of fixes and features:
> 
> Does this include raid-[56]?

   As Chris's last sentence in that mail says, not in this pull
request, but it should be ready for Friday.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- We demand rigidly defined areas of doubt and uncertainty! ---




Re: [GIT PULL] Btrfs updates

2012-12-19 Thread Roy Sigurd Karlsbakk
> [ sorry, resend. My lbdb autocompleted with an extra r in kernel.org ]
> 
> Hi everyone,
> 
> My for-linus branch has a big set of fixes and features:

Does this include raid-[56]?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
r...@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is
an elementary imperative for all pedagogues to avoid excessive use of
idioms of xenotypic etymology. In most cases, adequate and
relevant synonyms exist in Norwegian.


Re: Online Deduplication for Btrfs (Master's thesis)

2012-12-19 Thread Martin Křížek
On Mon, Dec 17, 2012 at 2:33 PM, Alexander Block  wrote:
> I did some research on deduplication in the past and there are some
> problems that you will face. I'll try to list some of them (for sure
> not all).
>
> On Mon, Dec 17, 2012 at 1:05 PM, Martin Křížek wrote:
>> Hello everyone,
>>
>> my name is Martin Krizek. I am a student at Faculty of Information, Brno
>> University of Technology, Czech Republic. As my master's thesis I chose to
>> work on Online Deduplication for Btrfs.
>>
>> My goal is to study Btrfs design, the offline deduplication patch [1] and to
>> come up with a design for the online dedup, this semester. I will be
>> implementing the feature next semester (spring, that is).
> The offline dedup patch is quite old and won't apply/compile anymore.
> You should probably look into the clone ioctl which basically does the
> same as the extent_same ioctl from the patch. Based on the clone ioctl
> you can at least learn how to "inject" existing extents into other
> inodes.

I am aware of this, thanks for pointing that out though.

>>
>> I would like to briefly introduce my ideas on what I think the feature
>> should look like.
>>
>> * Use cases
>> The two main use cases for the dedup are:
>> 1. virtual machine images
>> 2. backup servers
>>
>> * When to dedup
>> The choice of 'when to dedup' is not up to me as the master's thesis
>> specification states "online". :)
>>
>> * What to dedup
>> I'd say the most reasonable is block-level deduplication as it seems the most
>> general approach to me.
> Here you have two options:
> 1. Based on the per volume tree where you'll find btrfs_file_extent
> items which point to the global extent tree.
> 2. Based on the global extent tree. By using the backref resolving
> code, you can find which volumes refer to these extents.

Agreed.

>>
>> * Controlling dedup
>> - turn on/off deduplication - specify subvolumes on which
>>   deduplication is turned on (mount, ioctl - inherited),
>> - turn on/off byte-by-byte comparison of blocks that have same hashes
>>   (mount, ioctl),
>> - deduplication statistics (ioctl)
> You'll run into trouble when online dedup is turned on and off again. While
> it is offline, extents still get written, but you won't have your hash
> tree up-to-date. You'll need to find a way to update when dedup is
> online again, without too much performance loss while updating.

Well, yes.

>>
>> * Limitations
>> Not really limitations, but this is a list of situations when dedup will not
>> be triggered:
>> - encryption,
> I've already heard somewhere else that encryption+dedup is not
> possible but I don't understand why. Can someone explain this
> limitation?
>> - compression - basically, dedup is kind of compression, might be worth
>>   looking into it in the future though
>> - inline/prealloc extents,
> Should be possible to dedup inline extents, but must be configurable
> (e.g. minimum block size). People should also be able to completely
> disable it when performance is important.

I agree that it's possible, but I don't think it's worth doing. You
won't save much storage that way.

>> - data across subvolumes
> Should be possible. See my comment on the key in the hash tree.

Again, I agree that it's possible but this would be something I
probably won't go into.

>>
>> * How to store hashes
>> The obvious choice would be to use the checksum tree that holds block
>> checksums of each extent. The problem with the checksum tree is that the
>> checksums are looked up by logical address for the start of the extent data.
>> This is undesirable since it needs to be done the other way around. Logical
>> addresses need to be looked up by a hash.
>> To solve this, another key type would be created inside the checksum tree (or
>> maybe better, a new tree would be introduced) that
>> would have a hash as item's right-hand key value. This way, items could be
>> looked up on a hash:
>> (root, HASH_ITEM, hash)
>> The root value says which root (subvolume) the hashed block is stored on. The
>> hash value is the hash itself.
> With the root inside the key you make it impossible to allow
> cross-subvolume deduplication. Also, the offset field in the key that
> you plan to use for the hash is only 64bit, so you can at best store a
> part of the hash in the key. You should probably split the hash into 3
> parts: 64bit to put into the objectid field, 64bit to put into the
> offset field and the remainder into the item data. A lookup would then
> do the necessary splitting and in case a match is found also compare
> the remainder found in the items data.

Right, I thought of this but somehow forgot to mention it. Yes, the
hash would need to be split if it does not fit into the offset. Thanks
for noticing.
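
The three-way split suggested above can be sketched in userspace C. This is only an illustration under assumptions: a 32-byte hash (e.g. SHA-256), little-endian packing to mirror the on-disk convention, and the names dedup_hash_key, split_hash and hash_matches are made up for this example; none of them exist in btrfs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 32                      /* assumption: 256-bit hash */
#define HASH_REMAINDER (HASH_LEN - 16)   /* bytes that do not fit in the key */

/* Hypothetical layout, not an existing btrfs structure. */
struct dedup_hash_key {
    uint64_t objectid;                   /* first 64 bits of the hash */
    uint64_t offset;                     /* next 64 bits of the hash */
    uint8_t remainder[HASH_REMAINDER];   /* rest, kept in the item data */
};

/* Read 8 bytes as a little-endian value, independent of host byte order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Split a full hash into the two key fields plus the item-data remainder. */
static void split_hash(const uint8_t hash[HASH_LEN], struct dedup_hash_key *key)
{
    assert(hash != NULL && key != NULL);
    key->objectid = load_le64(hash);
    key->offset = load_le64(hash + 8);
    memcpy(key->remainder, hash + 16, HASH_REMAINDER);
}

/* A lookup hit on (objectid, offset) is only a real match if the
 * remainder stored in the item data is identical as well. */
static int hash_matches(const struct dedup_hash_key *a,
                        const struct dedup_hash_key *b)
{
    return a->objectid == b->objectid && a->offset == b->offset &&
           memcmp(a->remainder, b->remainder, HASH_REMAINDER) == 0;
}
```

A lookup would call split_hash() on the hash of the block being written, search on the two 64-bit fields, and then use hash_matches() to rule out hashes that differ only in the remainder.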

>> The item data would be of the following structure:
>> struct btrfs_hash_item {
>> 	__le64 disk_bytenr;
>> 	__le64 disk_num_bytes;
>> 	__le64 offset;
>> }
> You could omit the offset here and store the sum of disk_bytenr
> and offset in one fiel

Re: Online Deduplication for Btrfs (Master's thesis)

2012-12-19 Thread Martin Křížek
On Mon, Dec 17, 2012 at 2:12 PM, Hubert Kario  wrote:
> On Monday 17 of December 2012 13:05:01 Martin Křížek wrote:
>> * Limitations
>> Not really limitations, but this is a list of situations when dedup will
>> not be triggered:
>> - compression - basically, dedup is kind of compression, might be worth
>> looking into it in the future though
>
> I don't see why it would be incompatible, compressed blocks are data like
> any other. COW and subvolume snapshots work with compressed nodes just as
> well as with regular ones...
>

I agree, I am not saying it's incompatible, just that I probably won't
deal with that unless time permits.

Thanks,
Martin


Re: BUG during log recovery

2012-12-19 Thread David Sterba
On Tue, Dec 18, 2012 at 03:10:24PM +0100, Jan Steffens wrote:
> 	if (!caching_ctl) {
> 		BUG_ON(!block_group_cache_done(block_group));
> 		ret = btrfs_remove_free_space(block_group, start, num_bytes);
> 		BUG_ON(ret); /* -ENOMEM */ // <<< 6185

The actual value of ret is -11 = -EAGAIN (EAX: fff5 in the report)

> 	} else {
> 		mutex_lock(&caching_ctl->mutex);
> 
> 		if (start >= caching_ctl->progress) {
> 			ret = add_excluded_extent(root, start, num_bytes);
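
For readers wondering how a register value in the report maps to an errno: kernel functions return negative errno values, so reading the 32-bit register contents as a signed integer gives the error code. The sketch below assumes the full EAX value was 0xfffffff5 (the archived message only shows "fff5"), and reg_to_errno is a helper name made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit register value as a signed
 * kernel-style return code (0 on success, negative errno on failure).
 * Done with explicit two's-complement arithmetic so it does not rely on
 * implementation-defined unsigned-to-signed conversion. */
static int reg_to_errno(uint32_t reg)
{
    if (reg >= 0x80000000u)
        return -(int)(~reg) - 1;
    return (int)reg;
}
```

With that assumption, reg_to_errno(0xfffffff5u) evaluates to -11, which is -EAGAIN on Linux, whereas the -ENOMEM named in the code comment would have shown up as 0xfffffff4.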


Re: Bug: trying to create reflink on different device results in empty file

2012-12-19 Thread David Sterba
On Tue, Dec 11, 2012 at 11:16:03PM +0100, Koen De Wit wrote:
> When trying to create a clone (reflink) of a file on a different device,
> you'll get this error:
> Invalid cross-device link
> 
> However, an empty file is created on the target location. An invalid clone
> operation should not result in the creation of a file.

The clone operation inside btrfs needs two files that are already open (i.e. it
works with file descriptors, not with filenames), so the empty file exists
prior to the ioctl call.  The file should therefore be removed by 'cp' (or
whoever calls the clone ioctl), because the caller knows whether it makes
sense to delete the file or not.
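
What such caller-side cleanup might look like, as a rough sketch and not what 'cp' actually does: the function name clone_file is made up here, and the ioctl is spelled FICLONE, the name newer kernel headers give to the request number that btrfs defines as BTRFS_IOC_CLONE (a fallback definition is included for older headers).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int)   /* same request number as BTRFS_IOC_CLONE */
#endif

/* Clone src into a newly created dst. If the clone fails, remove the
 * empty dst again. Returns 0 on success or a negative errno. */
static int clone_file(const char *src, const char *dst)
{
    int ret = 0;
    int src_fd, dst_fd;

    assert(src != NULL && dst != NULL);

    src_fd = open(src, O_RDONLY);
    if (src_fd < 0)
        return -errno;

    /* O_EXCL: we only ever delete a file that this call created. */
    dst_fd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (dst_fd < 0) {
        ret = -errno;
        close(src_fd);
        return ret;
    }

    if (ioctl(dst_fd, FICLONE, src_fd) < 0) {
        ret = -errno;   /* e.g. -EXDEV for a cross-device clone */
        unlink(dst);    /* the caller cleans up the empty target */
    }

    close(dst_fd);
    close(src_fd);
    return ret;
}
```

Opening the target with O_EXCL is a design choice of this sketch: it guarantees that the unlink on failure never removes a file that existed before the call.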

david


Re: HIT WARN_ON WARNING: at fs/btrfs/extent-tree.c:6339 btrfs_alloc_free_block+0x126/0x330 [btrfs]()

2012-12-19 Thread Rock Lee
Hi,

 I only started writing the testcase a few hours ago, so it will have
some problems.

 You can probably ignore the unimplemented and ugly parts.

 Any feedback is welcome. :)

 I have uploaded the test source file to Github. Please open this link:


https://github.com/Zimilo/btrfs-testing-suites/blob/master/fallocate/fallocate.c

 The latest kernel commit is 752451f01c4567b506bf4343082682dbb8fb30dd in
Linus's git tree.

 I am testing on a 20GB loop device.

 When running the second case, run the sync command manually several
times at the same time. The problem can always be reproduced.

 You will hit the WARN_ON, and dmesg will report it.


 Apart from this problem, there is another bug: btrfs_fallocate doesn't
guarantee that subsequent writes to that range will not fail for lack of
disk space.
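
The guarantee in question can be checked from userspace roughly like this. It is a minimal sketch and not the linked testcase: the function name prealloc_then_write is made up for illustration, and it uses posix_fallocate() rather than the raw fallocate() call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Preallocate len bytes, then write the whole range. After a successful
 * preallocation, none of the writes should fail with ENOSPC.
 * Returns 0 on success or a negative errno. */
static int prealloc_then_write(const char *path, off_t len)
{
    char buf[4096];
    int ret;
    int fd;

    assert(path != NULL && len >= 0);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    /* posix_fallocate() returns the error number directly, not -1. */
    ret = posix_fallocate(fd, 0, len);
    if (ret) {
        ret = -ret;
        goto out;
    }

    memset(buf, 0xab, sizeof(buf));
    for (off_t off = 0; off < len; off += (off_t)sizeof(buf)) {
        size_t chunk = (len - off) < (off_t)sizeof(buf)
                           ? (size_t)(len - off) : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, off);

        if (n < 0) {
            ret = -errno;   /* -ENOSPC here would be the reported bug */
            goto out;
        }
        if ((size_t)n != chunk) {
            ret = -EIO;     /* short write, reported as a generic error */
            goto out;
        }
    }
out:
    close(fd);
    return ret;
}
```

To say anything about the bug described here, the check would have to run on a nearly full btrfs filesystem, with len close to the remaining free space.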



2012/12/19 Josef Bacik :
> On Wed, Dec 19, 2012 at 08:12:01AM -0700, Rock Lee wrote:
>> Hi all,
>>
>> Has anyone met this problem before? When doing the tests, I hit
>> the WARN_ON. Does this log make sense, or has someone already fixed the
>> problem?
>>
>>  If needed, I can supply the detailed log and the testcase source file.
>>
>>  Version: the latest code in Linus's git tree.
>>
>
> If you can give me your testcase I will love you forever.  Thanks,
>
> Josef



Re: HIT WARN_ON WARNING: at fs/btrfs/extent-tree.c:6339 btrfs_alloc_free_block+0x126/0x330 [btrfs]()

2012-12-19 Thread Josef Bacik
On Wed, Dec 19, 2012 at 08:12:01AM -0700, Rock Lee wrote:
> Hi all,
> 
> Has anyone met this problem before? When doing the tests, I hit
> the WARN_ON. Does this log make sense, or has someone already fixed the
> problem?
> 
>  If needed, I can supply the detailed log and the testcase source file.
> 
>  Version: the latest code in Linus's git tree.
>

If you can give me your testcase I will love you forever.  Thanks,

Josef 


Re: HIT WARN_ON WARNING: at fs/btrfs/extent-tree.c:6339 btrfs_alloc_free_block+0x126/0x330 [btrfs]()

2012-12-19 Thread cwillu
On Wed, Dec 19, 2012 at 9:12 AM, Rock Lee  wrote:
> Hi all,
>
> Has anyone met this problem before? When doing the tests, I hit
> the WARN_ON. Does this log make sense, or has someone already fixed the
> problem?
>
>  If needed, I can supply the detailed log and the testcase source file.

That'd be good, as well as the specific kernel version.


HIT WARN_ON WARNING: at fs/btrfs/extent-tree.c:6339 btrfs_alloc_free_block+0x126/0x330 [btrfs]()

2012-12-19 Thread Rock Lee
Hi all,

Has anyone met this problem before? When doing the tests, I hit
the WARN_ON. Does this log make sense, or has someone already fixed the
problem?

 If needed, I can supply the detailed log and the testcase source file.

 Version: the latest code in Linus's git tree.


[ 2140.981293] use_block_rsv: 336 callbacks suppressed
[ 2140.981295] ------------[ cut here ]------------
[ 2140.981308] WARNING: at fs/btrfs/extent-tree.c:6339
btrfs_alloc_free_block+0x126/0x330 [btrfs]()
[ 2140.981309] Hardware name: 2356BG6
...

[ 2140.981568] ------------[ cut here ]------------
[ 2140.981574] WARNING: at fs/btrfs/extent-tree.c:6339
btrfs_alloc_free_block+0x126/0x330 [btrfs]()
[ 2140.981574] Hardware name: 2356BG6
[ 2140.981575] btrfs: block rsv returned -28
 
 Always this value.



Re: BTree lock contention

2012-12-19 Thread Atri Sharma
Thanks a ton David. Now it is there.

Regards,

Atri

On Wed, Dec 19, 2012 at 8:39 PM, David Sterba  wrote:
> On Wed, Dec 19, 2012 at 08:34:08PM +0530, Atri Sharma wrote:
>> Sorry for bothering again, but I can't see Btree Lock contention marked
>> as in progress or my name anywhere on
>> https://btrfs.wiki.kernel.org/index.php/Project_ideas
>>
>> Could you please check?
>
> I don't know what happened to the edit, but I'm sure I moved the 'lock
> contention' under 'claimed projects' with your name added. Now it's
> there.
>
> david



-- 
Regards,

Atri
l'apprenant


Re: BTree lock contention

2012-12-19 Thread David Sterba
On Wed, Dec 19, 2012 at 08:34:08PM +0530, Atri Sharma wrote:
> Sorry for bothering again, but I can't see Btree Lock contention marked
> as in progress or my name anywhere on
> https://btrfs.wiki.kernel.org/index.php/Project_ideas
> 
> Could you please check?

I don't know what happened to the edit, but I'm sure I moved the 'lock
contention' under 'claimed projects' with your name added. Now it's
there.

david


Re: BTree lock contention

2012-12-19 Thread Atri Sharma
Hi David,

Sorry for bothering again, but I can't see Btree Lock contention marked
as in progress or my name anywhere on
https://btrfs.wiki.kernel.org/index.php/Project_ideas

Could you please check?

Thanks,

Atri

On Wed, Dec 19, 2012 at 8:30 PM, Atri Sharma  wrote:
> Thanks a ton David.
>
> Should I be sending a plan of my approach before starting work?
>
> Atri
>
> On Wed, Dec 19, 2012 at 6:47 PM, David Sterba  wrote:
>> On Tue, Dec 18, 2012 at 07:57:23PM +0530, Atri Sharma wrote:
>>> I am pretty weak with Mediawiki. Is there any way somebody could
>>> please place my name 'Atri Sharma' for the Btree locking contention in
>>> the wiki?
>>
>> Done. We're using templates for the project descriptions, and clicking
>> 'edit' leads to editing the template itself, so one has to edit the
>> containing section (or the whole page) instead.
>>
>>
>> david
>
>
>
> --
> Regards,
>
> Atri
> l'apprenant



-- 
Regards,

Atri
l'apprenant


Re: [PATCH] Btrfs: don't bother updating the inode when evicting

2012-12-19 Thread Josef Bacik
On Tue, Dec 18, 2012 at 06:58:33PM -0700, Miao Xie wrote:
> On Tue, 18 Dec 2012 15:51:57 -0500, Josef Bacik wrote:
> > We're deleting the stupid thing, no sense in updating the inode for the new
> > size.  We're running into having 50-100 orphans left over with xfstests 83
> > because of ENOSPC when trying to start the transaction for the inode update.
> > This patch fixes this problem.  Thanks,
> 
> This patch is wrong; it will introduce inconsistent metadata in the
> snapshot tree. The reason is as follows:
> 
> commit 8407aa464331556e4f6784f974030b83fc7585ed
> Author: Miao Xie
> Date:   Fri Sep 7 01:43:32 2012 -0600
> 
>     Btrfs: fix corrupted metadata in the snapshot
> 
>     When we delete a inode, we will remove all the delayed items including delayed
>     inode update, and then truncate all the relative metadata. If there is lots of
>     metadata, we will end the current transaction, and start a new transaction to
>     truncate the left metadata. In this way, we will leave a inode item that its
>     link counter is > 0, and also may leave some directory index items in fs/file tree
>     after the current transaction ends. In other words, the metadata in this fs/file tree
>     is inconsistent. If we create a snapshot for this tree now, we will find a inode with
>     corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
>     because its link counter is not 0.
> 
>     We fix this problem by updating the inode item before the current transaction ends.
> 
>     Signed-off-by: Miao Xie
> 

So why don't we fix unlink to call btrfs_update_inode_item so that the nlink
counter is set to 0?  The orphan item will be carried over into the snapshot if
we don't actually evict the inode before we do the snapshot and then the orphan
cleanup will take care of the rest?  Thanks,

Josef


Re: BTree lock contention

2012-12-19 Thread Atri Sharma
Thanks a ton David.

Should I be sending a plan of my approach before starting work?

Atri

On Wed, Dec 19, 2012 at 6:47 PM, David Sterba  wrote:
> On Tue, Dec 18, 2012 at 07:57:23PM +0530, Atri Sharma wrote:
>> I am pretty weak with Mediawiki. Is there any way somebody could
>> please place my name 'Atri Sharma' for the Btree locking contention in
>> the wiki?
>
> Done. We're using templates for the project descriptions, and clicking
> 'edit' leads to editing the template itself, so one has to edit the
> containing section (or the whole page) instead.
>
>
> david



-- 
Regards,

Atri
l'apprenant


Re: [RFC PATCH V6 2/2] Btrfs: Add a new ioctl to change the label of a mounted file system

2012-12-19 Thread David Sterba
On Tue, Dec 18, 2012 at 11:06:07AM +0800, Jeff Liu wrote:
> +static int btrfs_ioctl_set_fslabel(struct file *file, void __user *arg)
> +{
> +	struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
> +	struct btrfs_super_block *super_block = root->fs_info->super_copy;
> +	struct btrfs_trans_handle *trans;
> +	char label[BTRFS_LABEL_SIZE];
> +	int ret;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (copy_from_user(label, arg, sizeof(label)))
> +		return -EFAULT;
> +
> +	if (strnlen(label, BTRFS_LABEL_SIZE) == BTRFS_LABEL_SIZE)
> +		return -EINVAL;
> +
> +	ret = mnt_want_write_file(file);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&root->fs_info->volume_mutex);
> +	trans = btrfs_start_transaction(root, 1);
> +	if (IS_ERR(trans)) {
> +		ret = PTR_ERR(trans);
> +		goto out_unlock;
> +	}
> +
> +	strcpy(super_block->label, label);
> +	btrfs_end_transaction(trans, root);

If this fails, e.g. with EIO, the error will not be reported back to the user.

	ret = btrfs_end_transaction(trans, root);

should fix it.

> +
> +out_unlock:
> +	mutex_unlock(&root->fs_info->volume_mutex);
> +	mnt_drop_write_file(file);
> +	return ret;
> +}
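
The two checks that matter in this function, rejecting a label with no terminating NUL and handing the result of the last step back to the caller, can be modeled in plain userspace C. This is only a sketch: LABEL_SIZE, set_label and the commit callbacks are stand-ins invented for illustration, not the btrfs API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>

#define LABEL_SIZE 256   /* stand-in for BTRFS_LABEL_SIZE */

/* Stand-in for the step that ends the transaction: 0 or a negative errno. */
typedef int (*commit_fn)(void);

/* label must point to a buffer of at least LABEL_SIZE bytes or to a
 * NUL-terminated string; dst must have room for LABEL_SIZE bytes. */
static int set_label(char *dst, const char *label, commit_fn commit)
{
    assert(dst != NULL && label != NULL && commit != NULL);

    /* No NUL within the buffer means the label does not fit. */
    if (strnlen(label, LABEL_SIZE) == LABEL_SIZE)
        return -EINVAL;

    strcpy(dst, label);

    /* Return the result instead of dropping it, so that a failure such
     * as -EIO reaches the caller. */
    return commit();
}

static int commit_ok(void)  { return 0; }
static int commit_eio(void) { return -EIO; }
```

Calling set_label() with commit_eio returns -EIO, which is the behaviour the one-line change to the patch is meant to produce.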


Re: BTree lock contention

2012-12-19 Thread David Sterba
On Tue, Dec 18, 2012 at 07:57:23PM +0530, Atri Sharma wrote:
> I am pretty weak with Mediawiki. Is there any way somebody could
> please place my name 'Atri Sharma' for the Btree locking contention in
> the wiki?

Done. We're using templates for the project descriptions, and clicking
'edit' leads to editing the template itself, so one has to edit the
containing section (or the whole page) instead.


david


Re: About btrfs qgroup import/export command

2012-12-19 Thread Arne Jansen
On 19.12.2012 12:25, Miao Xie wrote:
> Hi, everyone.
> 
> As we know, there is no backup function for qgroups. When a problem
> occurs, users must recover the qgroup configuration manually, which is
> inconvenient. Besides that, some users might want to import an existing
> qgroup configuration into a new filesystem. Btrfs does not have such a
> function, so it can only be done manually.
> 
> So we want to implement btrfs qgroup import/export commands.
> 1) The 'btrfs qgroup export' command will export the qgroup tree
>    into a user-specified file (stdout by default).
> 
> 2) The user may first modify the configuration file and then
>    import it into the filesystem (with the 'btrfs qgroup import' command).
> 
> The file may be formatted as follows:
> 
> Qgroupid  is_compressed  is_exclusive  limited_size  parent
> --------  -------------  ------------  ------------  ------
> 0/1       0              0             10G           1/0
> 1/0       1              1             20G           ---
> 
>  If 'is_exclusive' is set, 'limited_size' corresponds to the max exclusive size,
>  otherwise to the max referenced size. Here 'parent' excludes ancestral qgroups.
> 
> Is there any comment about this idea? 

The configuration only really makes sense in combination with the existing
subvolumes. Even if the target has subvolumes under the same name, they
might have different internal IDs. So it might make more sense to address
the level 0 qgroups by name.
Also it might be misleading to apply a configuration to an existing fs, as
it currently is not possible to get correct accounting if the fs is not
empty. Rescan is not yet implemented.
So instead of just saving and restoring the qgroup config, it might make
more sense to create a new filesystem including all subvolumes and quota
config from a config file.
But I'm not completely convinced that this is a feature that is needed
frequently. If I want a standard deployment, I simply write a script that
creates the fs + subvol + quota.

-Arne

> 
> Thanks
> Miao



About btrfs qgroup import/export command

2012-12-19 Thread Miao Xie
Hi, everyone.

As we know, there is no backup function for qgroups. When a problem
occurs, users must recover the qgroup configuration manually, which is
inconvenient. Besides that, some users might want to import an existing
qgroup configuration into a new filesystem. Btrfs has no such function,
so this can only be done manually.

So we want to implement btrfs qgroup import/export commands.
1) The 'btrfs qgroup export' command will export the qgroup tree
  into a user-specified file (stdout by default).

2) The user may first modify the configuration file and then
  import it into the filesystem with the 'btrfs qgroup import' command.

The file may be formatted as follows:

Qgroupid  is_compressed  is_exclusive  limited_size  parent
------------------------------------------------------------
0/1       0              0             10G           1/0
1/0       1              1             20G           ---

 If 'is_exclusive' is set, 'limited_size' corresponds to the max exclusive
 size, else to the max referenced size. Here 'parent' excludes ancestral
 qgroups.
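
To make the proposed file format concrete, here is a minimal C sketch that parses one row of it. The struct qgroup_row, the functions parse_size and parse_qgroup_row, and the K/M/G/T suffix handling are all assumptions made for illustration; the format itself is only a proposal in this thread, and nothing here reflects an existing btrfs-progs interface.

```c
#include <stdio.h>
#include <string.h>

/* One row of the proposed (hypothetical) qgroup export file. */
struct qgroup_row {
	unsigned level;                 /* "level/id": level part */
	unsigned long long id;          /* "level/id": id part */
	int is_compressed;
	int is_exclusive;
	unsigned long long limit_bytes; /* limited_size converted to bytes */
	int has_parent;                 /* 0 when the parent column is "---" */
	unsigned parent_level;
	unsigned long long parent_id;
};

/* Convert a size such as "10G" to bytes (assumed suffixes: K, M, G, T). */
static int parse_size(const char *s, unsigned long long *out)
{
	unsigned long long n;
	char unit = '\0';

	if (sscanf(s, "%llu%c", &n, &unit) < 1)
		return -1;
	switch (unit) {
	case 'T': n <<= 10; /* fall through */
	case 'G': n <<= 10; /* fall through */
	case 'M': n <<= 10; /* fall through */
	case 'K': n <<= 10; /* fall through */
	case '\0': break;
	default: return -1;
	}
	*out = n;
	return 0;
}

/* Parse "level/id compressed exclusive size parent"; returns 0 on success. */
int parse_qgroup_row(const char *line, struct qgroup_row *row)
{
	char size[32], parent[32];

	memset(row, 0, sizeof(*row));
	if (sscanf(line, "%u/%llu %d %d %31s %31s", &row->level, &row->id,
		   &row->is_compressed, &row->is_exclusive, size, parent) != 6)
		return -1;
	if (parse_size(size, &row->limit_bytes))
		return -1;
	if (strcmp(parent, "---") == 0)
		return 0;	/* top-level qgroup, no parent */
	if (sscanf(parent, "%u/%llu", &row->parent_level, &row->parent_id) != 2)
		return -1;
	row->has_parent = 1;
	return 0;
}
```

Given the first sample row, parse_qgroup_row(" 0/1 0 0 10G 1/0", &row) fills in level 0, id 1, a limit of 10 GiB, and parent 1/0. A row whose parent column is "---" leaves has_parent at 0.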

Are there any comments on this idea?

Thanks
Miao










Re: Feeback on RAID1 feature of Btrfs

2012-12-19 Thread C Anthony Risinger
On Tue, Dec 18, 2012 at 6:13 AM, Hugo Mills  wrote:
> On Tue, Dec 18, 2012 at 01:20:20PM +0200, Brendan Hide wrote:
>> On 2012/12/17 06:23 PM, Hugo Mills wrote:
>> >On Mon, Dec 17, 2012 at 04:51:33PM +0100, Sebastien Luttringer wrote:
>> >>Hello,
>> snip
>> >>I get the feeling that RAID1 only allow one disk removing. Which is more
>> >>a RAID5 feature.
>> >The RAID-1 support in btrfs makes exactly two copies of each item
>> >of data, so you can lose at most one disk from the array safely. Lose
>> >any more, and you're likely to have lost data, as you've found out.
>> >>I'm afraid Btrfs raid1 will not be working before the end of the world.
>> >It does work (as you demonstrated with the first disk being
>> >removed) -- but just not as you thought it should. Now, you can argue
>> >that "RAID-1" isn't a good name to use here, but there's no good name
>> >in RAID terminology to describe what we actually have here.
>> Technically, btrfs's "RAID1" implementation is much closer to RAID1E
>> than traditional RAID1. See
>> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#RAID_1E or 
>> http://pic.dhe.ibm.com/infocenter/director/v5r2/index.jsp?topic=/serveraid_9.00/fqy0_craid1e.html
>>
>> Perhaps a new name, as with ZFS, might be appropriate. RAID-Z and
>> RAID-Z2, for example, could not adequately be described by any
>> existing RAID terminology and, technically, RAID-Z still isn't a
>> RAID in the classical sense.
>
>Yeah, we did have a naming scheme proposed, with combinations of
> nCmSpP, where n is the number of copies held, m the number of stripes,
> and p the number of parity stripes. So btrfs RAID-1 is 2C, RAID-5 on 5
> disks would be 4S1P, and RAID-10 on 4 disks would be 2C2S.

...yes.  something like this not only reflects reality better, it
actually transfers information in a consistent way (vs RAID-XYZ...
meaningless ENUM!). you could maybe do something like:

2C2S : -1S : 0

...or similar, showing:

{normal}
{OFFSET max degraded [rel boundary]}
{OFFSET current}

... which instantly makes the useful boundaries known, along with the
active "panic level" i should be experiencing :)
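
The nCmSpP naming quoted above can be sketched in C as follows. The function name profile_name, the rule for omitting a component (1 copy, 1 stripe, 0 parity), and the "1C" fallback for a plain single copy are assumptions inferred from the three examples in the thread (2C, 4S1P, 2C2S); this is not an existing btrfs interface.

```c
#include <stdio.h>
#include <string.h>

/* Append "<value><tag>" to the NUL-terminated string in buf. */
static int append_part(char *buf, size_t len, unsigned value, char tag)
{
	size_t used = strlen(buf);
	int n = snprintf(buf + used, len - used, "%u%c", value, tag);

	/* Fail if snprintf errored or the output was truncated. */
	return (n < 0 || (size_t)n >= len - used) ? -1 : 0;
}

/*
 * Build an nCmSpP-style name: n copies, m stripes, p parity stripes.
 * Assumption: components that carry no information are left out,
 * so RAID-1 becomes "2C" rather than "2C1S0P".
 * Returns 0 on success, -1 if buf is too small.
 */
int profile_name(char *buf, size_t len, unsigned copies,
		 unsigned stripes, unsigned parity)
{
	if (len == 0)
		return -1;
	buf[0] = '\0';
	if (copies > 1 && append_part(buf, len, copies, 'C'))
		return -1;
	if (stripes > 1 && append_part(buf, len, stripes, 'S'))
		return -1;
	if (parity > 0 && append_part(buf, len, parity, 'P'))
		return -1;
	/* Assumed fallback: a plain single copy is written as "1C". */
	if (buf[0] == '\0' && append_part(buf, len, 1, 'C'))
		return -1;
	return 0;
}
```

With these rules, profile_name(buf, sizeof(buf), 1, 4, 1) produces "4S1P" for RAID-5 on 5 disks, and (2, 2, 0) produces "2C2S" for RAID-10 on 4 disks, matching the examples Hugo gave.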

> I'd prefer
> to see that than some non-"standard" RAID-18KTHXBYE formulation.

^^^ this. the term "RAID" conjures expectations that run afoul of
btrfs's reality and should thus simply be avoided altogether

IMO, unless you wish/must explicitly correlate some similarity X,
there is no need to even mention the word RAID, because it carries no
information.

-- 

C Anthony


[PATCH] Btrfs: make delayed ref lock logic more readable

2012-12-19 Thread Miao Xie
The delayed ref mutex is locked and unlocked in different functions, and
the lock functions are not named uniformly, which hurts readability.
This patch reworks the lock logic to make it more readable.

Signed-off-by: Miao Xie 
---
 fs/btrfs/delayed-ref.c |  8 ++++++++
 fs/btrfs/delayed-ref.h |  6 ++++++
 fs/btrfs/extent-tree.c | 42 ++++++++++++++++++++++++------------------
 3 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 455894f..b7a0641 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -426,6 +426,14 @@ again:
return 1;
 }
 
+void btrfs_release_ref_cluster(struct list_head *cluster)
+{
+   struct list_head *pos, *q;
+
+   list_for_each_safe(pos, q, cluster)
+   list_del_init(pos);
+}
+
 /*
  * helper function to update an extent delayed ref in the
  * rbtree.  existing and update must both have the same
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index fe50392..7939149 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -211,8 +211,14 @@ struct btrfs_delayed_ref_head *
 btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr);
 int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans,
   struct btrfs_delayed_ref_head *head);
+static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
+{
+   mutex_unlock(&head->mutex);
+}
+
 int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans,
   struct list_head *cluster, u64 search_start);
+void btrfs_release_ref_cluster(struct list_head *cluster);
 
 int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info,
struct btrfs_delayed_ref_root *delayed_refs,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ae3c24a..b6ed965 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2143,7 +2143,6 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
  node->num_bytes);
}
}
-   mutex_unlock(&head->mutex);
return ret;
}
 
@@ -2258,7 +2257,7 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
 * process of being added. Don't run this ref yet.
 */
list_del_init(&locked_ref->cluster);
-   mutex_unlock(&locked_ref->mutex);
+   btrfs_delayed_ref_unlock(locked_ref);
locked_ref = NULL;
delayed_refs->num_heads_ready++;
spin_unlock(&delayed_refs->lock);
@@ -2297,25 +2296,22 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
btrfs_free_delayed_extent_op(extent_op);
 
if (ret) {
-   list_del_init(&locked_ref->cluster);
-   mutex_unlock(&locked_ref->mutex);
-
-   printk(KERN_DEBUG "btrfs: run_delayed_extent_op returned %d\n", ret);
+   printk(KERN_DEBUG
+  "btrfs: run_delayed_extent_op "
+  "returned %d\n", ret);
spin_lock(&delayed_refs->lock);
+   btrfs_delayed_ref_unlock(locked_ref);
return ret;
}
 
goto next;
}
-
-   list_del_init(&locked_ref->cluster);
-   locked_ref = NULL;
}
 
ref->in_tree = 0;
rb_erase(&ref->rb_node, &delayed_refs->root);
delayed_refs->num_entries--;
-   if (locked_ref) {
+   if (!btrfs_delayed_ref_is_head(ref)) {
/*
 * when we play the delayed ref, also correct the
 * ref_mod on head
@@ -2337,20 +2333,29 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
ret = run_one_delayed_ref(trans, root, ref, extent_op,
  must_insert_reserved);
 
-   btrfs_put_delayed_ref(ref);
btrfs_free_delayed_extent_op(extent_op);
-   count++;
-
if (ret) {
-   if (locked_ref) {
-   list_del_init(&locked_ref->cluster);
-   mutex_unlock(&locked_ref->mutex);
-   }
-   printk(KERN_DEBUG "btrfs: run_one_delayed_ref returned %d\n", ret);
+   btrfs_