Re: btrfsck does not fix

2014-02-09 Thread Hendrik Friedel

Hi Chris,

thanks for your reply.

 ./btrfs filesystem show /dev/sdb1

Label: none  uuid: 989306aa-d291-4752-8477-0baf94f8c42f
Total devices 2 FS bytes used 3.47TiB
devid    1 size 2.73TiB used 1.74TiB path /dev/sdb1
devid    2 size 2.73TiB used 1.74TiB path /dev/sdc1


> I don't understand the no spare part. You have 3.47T of data, and yet the
> single device size is 2.73T.
> There is no way to migrate 1.74T from sdc1 to sdb1 because there isn't enough
> space.


Fair point. I summed up manually (with du) and apparently missed some 
data. I can move the 0.8TiB out of the way. I just don't have 3.5TiB 
'spare'.





>> btrfs device delete /dev/sdc1 /mnt/BTRFS/rsnapshot/
>> btrfs device delete /dev/sdc1 /mnt/BTRFS/backups/
>> btrfs device delete /dev/sdc1 /mnt/BTRFS/Video/
>> btrfs filesystem balance start /mnt/BTRFS/Video/
>
> I don't understand this sequence because I don't know what you've mounted where,


I'm sorry. Here you go:
./btrfs subvolume list /mnt/BTRFS/Video
ID 256 gen 226429 top level 5 path Video -- /mnt/BTRFS/Video/
ID 1495 gen 226141 top level 5 path rsnapshot  -- /mnt/BTRFS/rsnapshot
ID  gen 226429 top level 256 path Snapshot -- not mounted
ID 5845 gen 226375 top level 5 path backups -- /mnt/BTRFS/backups



> but in any case maybe it's a bug that you're not getting errors for each
> of these commands because you can't delete sdc1 from a raid0 volume.

That makes sense. I read that procedure somewhere in the -totally 
unvalidated- Internet.
In case the missing Error-Message is a Bug: Is this place here 
sufficient to report it, or is there a Bug-Tracker?



> You'd first have to convert the data, metadata, and system profiles to
> single (metadata can be set to dup). And then you'd be able to delete
> a device so long as there's room on remaining devices, which you
> don't have.


Yes, but I can create that space.
So, for me the next steps would be to:
-generate enough room on the filesystem
-btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/BTRFS/Video
-btrfs device delete /dev/sdc1 /mnt/BTRFS/Video

Right?


>> next, I'm doing the balance for the subvolume /mnt/BTRFS/backups
>
> You told us above you deleted that subvolume. So how are you balancing it?


Yes, that was my understanding from my research:
You tell btrfs, that you want to remove one disc from the filesystem and 
then balance it to move the data on the remaining disc. I did find this 
logical. I was expecting that I possibly need a further command to tell 
btrfs that it's not a raid anymore, but I thought this could also be 
automagical.
I understand, that's not the way it is implemented, but it's not a crazy 
idea, is it?



> And also, balance applies to a mountpoint, and even if you mount a
> subvolume to that mountpoint, the whole file system is balanced.
> Not just the mounted subvolume.

That is confusing. (I mean: I understand what you are saying, but it's 
counterintuitive). Why is this the case?



>> In parallel, I try to delete /mnt/BTRFS/rsnapshot, but it fails:
>>   btrfs subvolume delete  /mnt/BTRFS/rsnapshot/
>>   Delete subvolume '/mnt/BTRFS/rsnapshot'
>>   ERROR: cannot delete '/mnt/BTRFS/rsnapshot' - Inappropriate ioctl for device
>>
>> Why's that?
>> But even more: How do I free sdc1 now?!
>
> Well I'm pretty confused because again, I can't tell if your paths refer to
> subvolumes or if they refer to mount points.

Now I am confused. These paths are the paths to which I mounted the 
subvolumes:

my (abbreviated) fstab:
UUID=xy  /mnt/BTRFS/Video btrfs subvol=Video
UUID=xy /mnt/BTRFS/rsnapshot btrfs subvol=rsnapshot
UUID=xy /mnt/BTRFS/backups btrfs subvol=backups


> The balance and device delete commands all refer to a mount point,
> which is the path returned by the df command.

So this:
/dev/sdb1   5,5T  3,5T  2,0T   64% /mnt/BTRFS/Video
/dev/sdb1   5,5T  3,5T  2,0T   64% /mnt/BTRFS/backups
/dev/sdc1   5,5T  3,5T  2,0T   64% /mnt/BTRFS/rsnapshot


> The subvolume delete command needs a path to subvolume that starts with the
> mount point.

Sorry, this I do not understand, no matter how hard I think about it..
What would it be in my case?

Thanks for your help! I appreciate it.


Greetings,
Hendrik
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Provide a better free space estimate on RAID1

2014-02-09 Thread Roman Mamedov
On Sun, 9 Feb 2014 06:38:53 +0000 (UTC)
Duncan 1i5t5.dun...@cox.net wrote:

 RAID or multi-device filesystems aren't 1970s features and break 1970s 
 behavior and the assumptions associated with it.  If you're not prepared 
 to deal with those broken assumptions, don't.  Use mdraid or dmraid or lvm 
 or whatever to combine your multiple devices into one logical devices as 
 presented, and put your filesystem (either traditional filesystem, or 
 even btrfs using traditional single-device functionality) on top of the 
 single device the layer beneath the filesystem presents.  Problem solved! 
 =:^)
 
 Note that df only lists a single device as well, not the multiple 
 component devices of the filesystem.  That's broken functionality by your 
 definition, too, and again, using some other layer like lvm or mdraid to 
 present multiple devices as a single virtual device, with a traditional 
 single-device filesystem layout on top of that single device... solves 
 the problem!

No reason BTRFS can't work well in a similar simplistic usage scenario.

You seem to insist there is no way around it being too flexible for its own
good, but all those advanced features absolutely don't *have* to get in the
way of everyday usage for users who don't require them.

 Meanwhile, what I've done here is use one of df's commandline options to 
 set its block size to 2 MiB, and further used bash's alias functionality 
 to setup an alias accordingly:
 
 alias df='df -B2M'
 
 $ df /h
 Filesystem 2M-blocks  Used Available Use% Mounted on
 /dev/sda6  20480 12186  7909  61% /h
 
 $ sudo btrfs fi show /h
 Label: hm0238gcnx+35l0  uuid: ce23242a-b0a9-423f-a9c3-7db2729f48d6
 Total devices 2 FS bytes used 11.90GiB
 devid    1 size 20.00GiB used 14.78GiB path /dev/sda6
 devid    2 size 20.00GiB used 14.78GiB path /dev/sdb6
 
 $ sudo btrfs fi df /h
 Data, RAID1: total=14.00GiB, used=11.49GiB
 System, RAID1: total=32.00MiB, used=16.00KiB
 Metadata, RAID1: total=768.00MiB, used=414.94MiB
 
 
 On btrfs such as the above I can read the 2M blocks as 1M and be happy.

 On btrfs such as my /boot, which aren't raid1 (I have two separate 
 /boots, one on each device, with grub2 configured separately for each to 
 provide a backup), or if I df my media partitions still on reiserfs on 
 the old spinning rust, I can either double the figures DF gives me, or 
 add a second -B option at the CLI, overriding the aliased option.
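
To spell out the arithmetic behind reading 2M blocks as 1M, here is a minimal shell sketch. It assumes a two-device RAID1 like the one shown above, where every byte is stored twice; the function names are made up for illustration and are not part of df or btrfs.

```sh
# Illustration only: function names are hypothetical.

# Raw MiB represented by a count of 2 MiB df blocks.
blocks_2m_to_raw_mib() {
  echo $(( $1 * 2 ))
}

# MiB usable for data on a two-device RAID1: raw space divided by two
# copies, which is numerically the same as the 2 MiB block count.
raid1_usable_mib() {
  echo $(( $(blocks_2m_to_raw_mib "$1") / 2 ))
}

blocks_2m_to_raw_mib 20480   # prints 40960 (2 x 20 GiB devices)
raid1_usable_mib 20480       # prints 20480 (20 GiB usable)
```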

Congratulations, you broke your df readings on all other filesystems to fix
them on btrfs.

 If I wanted something fully automated, it'd be easy enough to setup a 
 script that checked what filesystem I was df-ing, matched that against a 
 table of filesystems to preferred df block sizes, and supplied the 
 appropriate -BxX option accordingly.

I am not sure this would work well in the network share scenario described
earlier, with clients which in the real world are largely Windows-based.

-- 
With respect,
Roman




Re: Provide a better free space estimate on RAID1

2014-02-09 Thread Kai Krakow
Duncan 1i5t5.dun...@cox.net wrote:

 Roman Mamedov posted on Sun, 09 Feb 2014 04:10:50 +0600 as excerpted:
 
 If you need to perform a btrfs-specific operation, you can easily use
 the btrfs-specific tools to prepare for it, specifically use btrfs fi
 df which could give provide every imaginable interpretation of free
 space estimate and then some.
 
 UNIX 'df' and the 'statfs' call on the other hand should keep the
 behavior people are accustomized to rely on since 1970s.
 
 Which it does... on filesystems that only have 1970s filesystem features.
 =:^)
 
 RAID or multi-device filesystems aren't 1970s features and break 1970s
 behavior and the assumptions associated with it.  If you're not prepared
 to deal with those broken assumptions, don't.  Use mdraid or dmraid or lvm
 or whatever to combine your multiple devices into one logical devices as
 presented, and put your filesystem (either traditional filesystem, or
 even btrfs using traditional single-device functionality) on top of the
 single device the layer beneath the filesystem presents.  Problem solved!
 =:^)
 
 Note that df only lists a single device as well, not the multiple
 component devices of the filesystem.  That's broken functionality by your
 definition, too, and again, using some other layer like lvm or mdraid to
 present multiple devices as a single virtual device, with a traditional
 single-device filesystem layout on top of that single device... solves
 the problem!
 
 
 Meanwhile, what I've done here is use one of df's commandline options to
 set its block size to 2 MiB, and further used bash's alias functionality
 to setup an alias accordingly:
 
 alias df='df -B2M'
 
 $ df /h
 Filesystem 2M-blocks  Used Available Use% Mounted on
 /dev/sda6  20480 12186  7909  61% /h
 
 $ sudo btrfs fi show /h
 Label: hm0238gcnx+35l0  uuid: ce23242a-b0a9-423f-a9c3-7db2729f48d6
 Total devices 2 FS bytes used 11.90GiB
 devid    1 size 20.00GiB used 14.78GiB path /dev/sda6
 devid    2 size 20.00GiB used 14.78GiB path /dev/sdb6
 
 $ sudo btrfs fi df /h
 Data, RAID1: total=14.00GiB, used=11.49GiB
 System, RAID1: total=32.00MiB, used=16.00KiB
 Metadata, RAID1: total=768.00MiB, used=414.94MiB
 
 
 On btrfs such as the above I can read the 2M blocks as 1M and be happy.
 On btrfs such as my /boot, which aren't raid1 (I have two separate
 /boots, one on each device, with grub2 configured separately for each to
 provide a backup), or if I df my media partitions still on reiserfs on
 the old spinning rust, I can either double the figures DF gives me, or
 add a second -B option at the CLI, overriding the aliased option.
 
 If I wanted something fully automated, it'd be easy enough to setup a
 script that checked what filesystem I was df-ing, matched that against a
 table of filesystems to preferred df block sizes, and supplied the
 appropriate -BxX option accordingly.  As I guess most admins after a few
 years, I've developed quite a library of scripts/aliases for various
 things I do routinely enough to warrant it, and this would be just one
 more joining the list. =:^)

Well done... And a good idea, I didn't think of it yet. But it's my idea of 
fixing it in user space. :-)
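
The table-driven wrapper mentioned in the quoted text could look roughly like the sketch below. The mount points and block sizes in the table are assumptions taken from the examples in this thread, and df_block_size and mydf are hypothetical names; only GNU df's -B option comes from the original.

```sh
# Hypothetical lookup table: mount point -> preferred df block size.
df_block_size() {
  case "$1" in
    /h)    echo 2M ;;   # two-device btrfs raid1: 2M blocks read as usable MiB
    /boot) echo 1M ;;   # single-device filesystem: plain 1M blocks
    *)     echo 1M ;;   # default for anything not listed
  esac
}

# Run df on one mount point with the block size taken from the table.
# Usage: mydf /h
mydf() {
  df -B"$(df_block_size "$1")" "$1"
}
```

This keeps plain df untouched for every other filesystem, which avoids the side effect of a global alias.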

I usually leave the discussion when people start to argue with pointers
to Unix tradition... That's like starting a systemd discussion and telling
me that systemd is broken by design while mentioning in the same sentence
that sysvinit is working perfectly fine. The latter doesn't. The former
is a matter of personal taste but is in no case broken... But... Well...

 But of course it's your system in question, and you can patch btrfs to
 output anything you like, in any format you like.  No need to bother with
 df's -B option if you'd prefer to patch the kernel instead.  Me, I'll
 stick to the -B option.  =:^)

That's essentially the FOSS idea. Actually, I don't want df behavior being
broken for me. It uses the statfs syscall, which returns blocks. Cutting the
returned values in half lies about the properties of the device - for EVERY
application out there, no matter which assumptions are being made about the
returned values. This breaks the statfs syscall. User space should simply not
rely on the assumption that 1k of user data occupies 1k worth of blocks
(that's not true anyway because metadata has to be allocated, too). When I
first had contact with Unix, df returned used/free blocks - native BLOCKS!
No option to make it human-readable. No implied promise that it would show
you usable space for actual written data. The blocks were given as 512-byte
sectors. I was okay with that. I knew: if I cut the values in half, I'd
get about the size of data I could perhaps fit on the device. If it had been
a property of the device that 512 bytes of user data would write two blocks,
nobody would have cared about df displaying wrong values.
 
-- 
Replies to list only preferred.


Re: Provide a better free space estimate on RAID1

2014-02-09 Thread Kai Krakow
Roman Mamedov r...@romanrm.net wrote:

 When I started to use unix, df returned blocks, not bytes. Without your
 proposed patch, it does that right. With your patch, it does it wrong.
 
 It returns total/used/available space that is usable/used/available by/for
 user data.

No, it does not. It returns space allocatable to the filesystem. That's user 
data and meta data. That can be far from your expectations depending on how 
allocation on the filesystem works.

-- 
Replies to list only preferred.



Re: [RFC PATCH 2/2] Revert Btrfs: remove transaction from btrfs send

2014-02-09 Thread Filipe David Manana
On Sun, Feb 9, 2014 at 2:39 AM, Shilong Wang wangshilong1...@gmail.com wrote:
 2014-02-08 23:46 GMT+08:00 Wang Shilong wangshilong1...@gmail.com:
 From: Wang Shilong wangsl.f...@cn.fujitsu.com

 This reverts commit 41ce9970a8a6a362ae8df145f7a03d789e9ef9d2.
 Previously i was thinking we can use readonly root's commit root
 safely while it is not true, readonly root may be cowed with the
 following cases.

 1.snapshot send root will cow source root.
 2.balance,device operations will also cow readonly send root
 to relocate.

 So i have two ideas to make us safe to use commit root.

 --approach 1:
 make it protected by transaction and end transaction properly and we research
 next item from root node(see btrfs_search_slot_for_read()).

 --approach 2:
 add another counter to local root structure to sync snapshot with send.
 and add a global counter to sync send with exclusive device operations.

 So with approach 2, send can use commit root safely, because we make sure
 send root can not be cowed during send. Unfortunately, it make codes *ugly*
 and more complex to maintain.

 To make snapshot and send exclusively, device operations and send operation
 exclusively with each other is a little confusing for common users.

 So why not drop into previous way.

 Cc: Josef Bacik jba...@fb.com
 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 ---
 Josef, if we reach agreement to adopt this approach, please revert
 Filipe's patch(Btrfs: make some tree searches in send.c more efficient)
 from btrfs-next.

 Oops, this patch guarantee searching commit roots are all protected by
 transaction, Filipe's
 patch is ok, we need update Josef's previous patch.

Hi Shilong,

I am confused. Can you explain why that optimization patch is a
problem, either with or without your patch or any other patch
currently flying around?

Either before or after the optimization, we search through the commit
root and after a key search we process a key while holding the leaf's
extent buffer. Both approaches call btrfs_next_leaf too (either
directly or via btrfs_search_slot_for_read).

Thanks


 Wang



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: [RFC PATCH 2/2] Revert Btrfs: remove transaction from btrfs send

2014-02-09 Thread Shilong Wang
2014-02-09 21:52 GMT+08:00 Filipe David Manana fdman...@gmail.com:
 On Sun, Feb 9, 2014 at 2:39 AM, Shilong Wang wangshilong1...@gmail.com 
 wrote:
 2014-02-08 23:46 GMT+08:00 Wang Shilong wangshilong1...@gmail.com:
 From: Wang Shilong wangsl.f...@cn.fujitsu.com

 This reverts commit 41ce9970a8a6a362ae8df145f7a03d789e9ef9d2.
 Previously i was thinking we can use readonly root's commit root
 safely while it is not true, readonly root may be cowed with the
 following cases.

 1.snapshot send root will cow source root.
 2.balance,device operations will also cow readonly send root
 to relocate.

 So i have two ideas to make us safe to use commit root.

 --approach 1:
 make it protected by transaction and end transaction properly and we 
 research
 next item from root node(see btrfs_search_slot_for_read()).

 --approach 2:
 add another counter to local root structure to sync snapshot with send.
 and add a global counter to sync send with exclusive device operations.

 So with approach 2, send can use commit root safely, because we make sure
 send root can not be cowed during send. Unfortunately, it make codes *ugly*
 and more complex to maintain.

 To make snapshot and send exclusively, device operations and send operation
 exclusively with each other is a little confusing for common users.

 So why not drop into previous way.

 Cc: Josef Bacik jba...@fb.com
 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 ---
 Josef, if we reach agreement to adopt this approach, please revert
 Filipe's patch(Btrfs: make some tree searches in send.c more efficient)
 from btrfs-next.

 Oops, this patch guarantee searching commit roots are all protected by
 transaction, Filipe's
 patch is ok, we need update Josef's previous patch.

 Hi Shilong,

 I am confused. Can you explain why that optimization patch is a
 problem, either with or without your patch or any other patch
 currently flying around?

 Either before or after the optimization, we search through the commit
 root and after a key search we process a key while holding the leaf's
 extent buffer. Both approaches call btrfs_next_leaf too (either
 directly or via btrfs_search_slot_for_read).

Sorry, my mistake: your patch does not have a problem. You may not have noticed
my follow-up comments in this thread for this patch; we need to update Josef's
previous patch, not yours. ^_^

Thanks,
Wang

 Thanks


 Wang



 --
 Filipe David Manana,

 Reasonable men adapt themselves to the world.
  Unreasonable men adapt the world to themselves.
  That's why all progress depends on unreasonable men.


[PATCH][V3] Provide a better free space estimate [was]Re: Provide a better free space estimate on RAID1

2014-02-09 Thread Goffredo Baroncelli
On 02/07/2014 05:40 AM, Roman Mamedov wrote:
 On Thu, 06 Feb 2014 20:54:19 +0100
 Goffredo Baroncelli kreij...@libero.it wrote:
 
[...]

As Roman pointed out, df shows the raw space available. However,
when a RAID level is used, the space available to the user is
less.
This patch tries to address this by correcting the estimate
on the basis of the RAID level.

This is my third revision of this patch. In this revision, I
addressed the bugs related to an incorrect evaluation of the
free space in case of RAID1 [1] and DUP.

I have to point out that the free space estimation is quite
approximate, because it assumes:

a) all the new files are allocated in data chunks
b) the free space will not be consumed by metadata
c) the already allocated chunks are not evaluated for the free
space estimation

These assumptions are unrelated to my patch.

I performed some tests with a filesystem composed of seven 51GB disks.
Here are my df results:

Profile: single
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        351G  512K  348G   1% /mnt/btrfs1

Profile: raid1
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        351G  1.3M  175G   1% /mnt/btrfs1

Profile: raid10
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        351G  2.3M  177G   1% /mnt/btrfs1

Profile: raid5
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        351G  2.0M  298G   1% /mnt/btrfs1

Profile: raid6
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        351G  1.8M  248G   1% /mnt/btrfs1


Profile: DUP (only one 50GB disk was used)
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdc         51G  576K   26G   1% /mnt/btrfs1
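
As a rough cross-check of these figures, the redundancy rules that the patch comment lists (half the space for RAID1/RAID10/DUP, one stripe for RAID5, two stripes for RAID6) can be modelled in a few lines of shell. This is a simplified sketch, not the kernel code: estimate_avail is a hypothetical helper, and it takes the 348G reported for the single profile as the raw space.

```sh
# Simplified model of the redundancy rules, for illustration only.
# Arguments: profile, number of devices, raw available space (any unit).
# Hypothetical helper; it does not reproduce btrfs_calc_avail_data_space().
estimate_avail() {
  profile=$1
  ndev=$2
  raw=$3
  case "$profile" in
    raid1|raid10|dup) echo $(( $raw / 2 )) ;;                  # two copies
    raid5)            echo $(( $raw * ($ndev - 1) / $ndev )) ;; # 1 parity stripe
    raid6)            echo $(( $raw * ($ndev - 2) / $ndev )) ;; # 2 parity stripes
    single|raid0)     echo "$raw" ;;                           # no redundancy
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
}

for p in single raid1 raid10 raid5 raid6; do
  echo "$p: $(estimate_avail "$p" 7 348)G"
done
```

With 7 devices and 348G raw this gives 298 for raid5 and 248 for raid6, in line with the table; the raid1 and raid10 results (174) land within a few GB of the reported 175G and 177G.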


Below my patch.

BR
G.Baroncelli

[1] The bug predates my patch; try to see what happens when you
create a RAID1 filesystem with three disks.

Change history:
V1  First issue
V2  Correct an (old) bug when in RAID10 the number of disks isn't
    a multiple of 4
V3  Correct the free space estimation in RAID1 (when the
    number of disks is odd) and DUP



diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d71a11d..4064a5f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1481,10 +1481,16 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 		num_stripes = nr_devices;
 	} else if (type & BTRFS_BLOCK_GROUP_RAID1) {
 		min_stripes = 2;
-		num_stripes = 2;
+		num_stripes = nr_devices;
 	} else if (type & BTRFS_BLOCK_GROUP_RAID10) {
 		min_stripes = 4;
-		num_stripes = 4;
+		num_stripes = nr_devices;
+	} else if (type & BTRFS_BLOCK_GROUP_RAID5) {
+		min_stripes = 3;
+		num_stripes = nr_devices;
+	} else if (type & BTRFS_BLOCK_GROUP_RAID6) {
+		min_stripes = 4;
+		num_stripes = nr_devices;
 	}
 
 	if (type & BTRFS_BLOCK_GROUP_DUP)
@@ -1560,9 +1566,44 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
 
 		if (devices_info[i].max_avail >= min_stripe_size) {
 			int j;
-			u64 alloc_size;
+			u64 alloc_size, delta;
+			int k, div;
+
+			/*
+			 * Depending by the RAID profile, we use some
+			 * disk space as redundancy:
+			 * RAID1, RAID10, DUP - half of space used as redundancy
+			 * RAID5              - 1 stripe used as redundancy
+			 * RAID6              - 2 stripes used as redundancy
+			 * RAID0,LINEAR       - no redundancy
+			 */
+			if (type & BTRFS_BLOCK_GROUP_RAID1) {
+				k = num_stripes;
+				div = 2;
+			} else if (type & BTRFS_BLOCK_GROUP_DUP) {
+				k = num_stripes;
+				div = 2;
+			} else if (type & BTRFS_BLOCK_GROUP_RAID10) {
+				k = num_stripes;
+				div = 2;
+			} else if (type & BTRFS_BLOCK_GROUP_RAID5) {
+				k = num_stripes-1;
+				div = 1;
+			} else if (type & BTRFS_BLOCK_GROUP_RAID6) {
+				k = num_stripes-2;
+				div = 1;
+			} else { /* RAID0/LINEAR */
+				k = num_stripes;
+				div = 1;
+			}
+
+			delta = 

[GIT PULL] Btrfs

2014-02-09 Thread Chris Mason

Hi Linus,

Please pull my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

This is a small collection of fixes.

Josef Bacik (2) commits (+4/-5):
    Btrfs: don't loop forever if we can't run because of the tree mod log (+1/-0)
    Btrfs: fix assert screwup for the pending move stuff (+3/-5)

David Sterba (1) commits (+1/-1):
    btrfs: reserve no transaction units in btrfs_ioctl_set_features

Filipe David Borba Manana (1) commits (+2/-0):
    Btrfs: fix data corruption when reading/updating compressed extents

Jeff Mahoney (1) commits (+2/-2):
    btrfs: commit transaction after setting label and features

Total: (5) commits (+9/-8)

 fs/btrfs/compression.c | 2 ++
 fs/btrfs/extent-tree.c | 1 +
 fs/btrfs/ioctl.c       | 6 +++---
 fs/btrfs/send.c        | 8 +++-----
 4 files changed, 9 insertions(+), 8 deletions(-)


Error: could not do orphan cleanup -22

2014-02-09 Thread Pavel Volkov
There was a similar discussion about an error in January 2013 but it related to 
some kernel panic.
I don't know if I encountered the same thing.

These errors from system journal bother me:

 2月 09 22:18:53 melforce kernel: BTRFS error (device sdb3): Error removing 
orphan entry, stopping orphan cleanup
 2月 09 22:18:53 melforce kernel: BTRFS critical (device sdb3): could not do 
orphan cleanup -22

I run kernel 3.12.10.
I'll explain what I did at that moment.
Subvolumes were already mounted at /home and /var and I mounted the root
subvolume at /mnt/btr2.
Then I executed the ls command on /home/btr2. ls gave me "invalid argument"
errors, but still displayed the contents.
The next time I ran ls (right away), there were no more errors.

Another example is a script that mounts the same thing and then takes snapshots.
If I run the script manually, it never fails. If I run it from a cron job, one
of the snapshot commands fails, telling me that /home/btr2/var isn't accessible
(I don't remember the exact error message; I can look if it shows up again).

Someone said in the January thread that -22 error messages are harmless, but in
this case userspace tools break, so I wouldn't consider this totally harmless.


BTRFS with RAID1 cannot boot when removing drive

2014-02-09 Thread Saint Germain
Hello,

I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with
backported kernel 3.12-0.bpo.1-amd64) using a motherboard with UEFI.

However, I haven't managed to make the system boot when removing the
first hard drive.

I have installed Debian with the following partitions on the first hard
drive (no BTRFS subsystem):
/dev/sda1: for / (BTRFS)
/dev/sda2: for /home (BTRFS)
/dev/sda3: for swap

Then I added another drive for a RAID1 configuration (with btrfs
balance) and I installed grub on the second hard drive with
grub-install /dev/sdb.

If I boot on sdb, it takes sda1 as the root filesystem.
If I switch the cables, it always takes the first hard drive as the
root filesystem (now sdb).
If I disconnect /dev/sda, the system doesn't boot, with a message
saying that it hasn't found the UUID:

Scanning for BTRFS filesystems...
mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on /root failed: Invalid argument

Can you tell me what I have done incorrectly?
Is it because of UEFI? If yes, I haven't understood how I can correct
it in a simple way.

As an extra question, I also don't see how I can configure the system to
get the correct swap in case of disk failure. Should I force both swap
partitions to have the same UUID?

Many thanks in advance !


Here are some outputs for info:

btrfs filesystem show
Label: none  uuid: 743d6b3b-71a7-4869-a0af-83549555284b
Total devices 2 FS bytes used 27.96MB
devid    1 size 897.98GB used 3.03GB path /dev/sda2
devid    2 size 897.98GB used 3.03GB path /dev/sdb2

Label: none  uuid: c64fca2a-5700-4cca-abac-3a61f2f7486c
Total devices 2 FS bytes used 3.85GB
devid    1 size 27.94GB used 7.03GB path /dev/sda1
devid    2 size 27.94GB used 7.03GB path /dev/sdb1

blkid 
/dev/sda1: UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c UUID_SUB=77ffad34-681c-4c43-9143-9b73da7d1ae3 TYPE=btrfs
/dev/sda3: UUID=469715b2-2fa3-4462-b6f5-62c04a60a4a2 TYPE=swap
/dev/sda2: UUID=743d6b3b-71a7-4869-a0af-83549555284b UUID_SUB=744510f5-5bd5-4df4-b8c4-0fc1a853199a TYPE=btrfs
/dev/sdb1: UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c UUID_SUB=2615fd98-f2ad-4e7b-84bc-0ee7f9770ca0 TYPE=btrfs
/dev/sdb2: UUID=743d6b3b-71a7-4869-a0af-83549555284b UUID_SUB=8783a7b1-57ef-4bcc-ae7f-be20761e9a19 TYPE=btrfs
/dev/sdb3: UUID=56fbbe2f-7048-488f-b263-ab2eb000d1e1 TYPE=swap

cat /etc/fstab
# file system                             mount point  type   options   dump  pass
UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c /            btrfs  defaults  0     1
UUID=743d6b3b-71a7-4869-a0af-83549555284b /home        btrfs  defaults  0     2
UUID=469715b2-2fa3-4462-b6f5-62c04a60a4a2 none         swap   sw        0     0

cat /boot/grub/grub.cfg 
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  load_env
fi
set default=0
if [ ${prev_saved_entry} ]; then
  set saved_entry=${prev_saved_entry}
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z ${boot_once} ]; then
    saved_entry=${chosen}
    save_env saved_entry
  fi
}

function load_video {
  insmod vbe
  insmod vga
  insmod video_bochs
  insmod video_cirrus
}

insmod part_msdos
insmod btrfs
set root='(hd1,msdos1)'
search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
if loadfont /usr/share/grub/unicode.pf2 ; then
  set gfxmode=640x480
  load_video
  insmod gfxterm
  insmod part_msdos
  insmod btrfs
  set root='(hd1,msdos1)'
  search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
  set locale_dir=($root)/boot/grub/locale
  set lang=fr_FR
  insmod gettext
fi
terminal_output gfxterm
set timeout=5
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
insmod part_msdos
insmod btrfs
set root='(hd1,msdos1)'
search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
insmod png
if background_image /usr/share/images/desktop-base/joy-grub.png; then
  set color_normal=white/black
  set color_highlight=black/white
else
  set menu_color_normal=cyan/blue
  set menu_color_highlight=white/blue
fi
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Debian GNU/Linux, with Linux 3.12-0.bpo.1-amd64' --class debian --class gnu-linux --class gnu --class os {
	load_video
	insmod gzio
	insmod part_msdos
	insmod btrfs
	set root='(hd1,msdos1)'
	search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
	echo	'Chargement de Linux 3.12-0.bpo.1-amd64 ...'
	linux	/boot/vmlinuz-3.12-0.bpo.1-amd64 root=UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c ro  quiet
	echo	'Chargement du disque mémoire initial ...'
	initrd  

Re: [PATCH] xfstests: Btrfs: add test for large metadata blocks

2014-02-09 Thread Dave Chinner
On Sat, Feb 08, 2014 at 09:30:51AM +0100, Koen De Wit wrote:
 On 02/07/2014 11:49 PM, Dave Chinner wrote:
 On Fri, Feb 07, 2014 at 06:14:45PM +0100, Koen De Wit wrote:
  echo -n $xattr_value | md5sum
  ${ATTR_PROG} -Lq -s attr_$char -V $xattr_value $file
  ${ATTR_PROG} -Lq -g attr_$char $file | md5sum
  ${ATTR_PROG} -Lq -g attr_$char $lnkfile | md5sum
 
 is all that needs to be done here.
 
 The problem with this is that the length of the output will depend on the 
 page size. The code above runs for every valid leafsize, which can be any 
 multiple of the page size up to 64KB, as defined in the loop initialization:
 for leafsize in `seq $pagesize_kb $pagesize_kb 64`; do

That's only a limit on the mkfs leafsize parameter, yes? And the
limitation is that the leaf size can't be smaller than page size?

So really, the attribute sizes that are being tested are independent
of the mkfs parameters being tested. i.e:

for attrsize in `seq 4 4 64`; do
	if [ $attrsize -lt $pagesize ]; then
		leafsize=$pagesize
	else
		leafsize=$attrsize
	fi
	$BTRFS_MKFS_PROG -l $leafsize $SCRATCH_DEV

And now the test executes a fixed loop, testing the same attribute
sizes on all the filesystems under test. i.e. the attribute sizes
being tested are *independent* of the mkfs parameters being tested.
Always test the same attribute sizes, the mkfs parameters simply
vary by page size.
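
The leaf-size selection in the loop above can be pulled into a small helper, sketched here. It assumes both sizes are in KB and uses a made-up 16 KB page size for the demonstration; leafsize_for is a hypothetical name, not an xfstests helper.

```sh
# Hypothetical helper: choose the mkfs leaf size (KB) for an attribute
# size (KB). The leaf size is the larger of the page size and the
# attribute size, mirroring the if/else in the snippet above.
leafsize_for() {
  attrsize=$1
  pagesize=$2
  if [ "$attrsize" -lt "$pagesize" ]; then
    echo "$pagesize"
  else
    echo "$attrsize"
  fi
}

# Fixed set of attribute sizes, independent of the page size (assumed 16 here).
for attrsize in $(seq 4 4 64); do
  echo "attrsize=${attrsize}K leafsize=$(leafsize_for "$attrsize" 16)K"
done
```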

 +_scratch_unmount
 +
 +# Some illegal leafsizes
 +
 +_scratch_mkfs -l 0 2>> $seqres.full
 +echo $?
 Same again - you are dumping the error output into a different
 file, then detecting the error manually. pass the output of
 _scratch_mkfs through a filter, and let errors cause golden
 output mismatches.
 
 I did this to make the golden output not depend on the output of
 mkfs.btrfs, inspired by
 http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=commit;h=fd7a8e885732475c17488e28b569ac1530c8eb59
 and
 http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=commit;h=78d86b996c9c431542fdbac11fa08764b16ceb7d
 However, in my opinion the test should simply be updated if the
 output of mkfs.btrfs changes, so I agree with you and I fixed this
 in v2.

While I agree with the sentiment, I'm questioning the
implementation. i.e. you've done this differently to every other
test that needs to check for failures. run_check would be just
fine, as would be simply filtering the output of mkfs.

FWIW, the method for detecting the cp error in the second commit
is for a very specific case. It could have also been done with a
filter, as we have done in the past with such error messages. So
what's good for one case is not necessarily the right way to handle
the output for another.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: faster/more efficient insertion of file extent items

2014-02-09 Thread Filipe David Borba Manana
This is an extension to my previous commit titled:

  Btrfs: faster file extent item replace operations
  (hash 1acae57b161ef1282f565ef907f72aeed0eb71d9)

Instead of inserting the new file extent item only if we deleted existing
file extent items covering our target file range, also allow inserting
the new file extent item if we didn't find any existing items to delete
and replace_extent != 0. In this case our caller would do another
tree search to insert the new file extent item anyway, so just
combine the two tree searches into a single one, saving CPU time, reducing
lock contention and reducing btree node/leaf COW operations.

This covers the case where applications keep doing tail append writes to
files, which for example is the case of Apache CouchDB (its database and
view index files are always open with O_APPEND).

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---
 fs/btrfs/file.c |   52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0165b86..006af2f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -720,7 +720,7 @@ int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
 	if (drop_cache)
 		btrfs_drop_extent_cache(inode, start, end - 1, 0);
 
-	if (start >= BTRFS_I(inode)->disk_i_size)
+	if (start >= BTRFS_I(inode)->disk_i_size && !replace_extent)
 		modify_tree = 0;
 
 	while (1) {
@@ -938,34 +938,42 @@ next_slot:
 		 * Set path->slots[0] to first slot, so that after the delete
 		 * if items are move off from our leaf to its immediate left or
 		 * right neighbor leafs, we end up with a correct and adjusted
-		 * path->slots[0] for our insertion.
+		 * path->slots[0] for our insertion (if replace_extent != 0).
 		 */
 		path->slots[0] = del_slot;
 		ret = btrfs_del_items(trans, root, path, del_slot, del_nr);
 		if (ret)
 			btrfs_abort_transaction(trans, root, ret);
+	}
 
-		leaf = path->nodes[0];
-		/*
-		 * leaf eb has flag EXTENT_BUFFER_STALE if it was deleted (that
-		 * is, its contents got pushed to its neighbors), in which case
-		 * it means path->locks[0] == 0
-		 */
-		if (!ret && replace_extent && leafs_visited == 1 &&
-		    path->locks[0] &&
-		    btrfs_leaf_free_space(root, leaf) >=
-		    sizeof(struct btrfs_item) + extent_item_size) {
-
-			key.objectid = ino;
-			key.type = BTRFS_EXTENT_DATA_KEY;
-			key.offset = start;
-			setup_items_for_insert(root, path, &key,
-					       &extent_item_size,
-					       extent_item_size,
-					       sizeof(struct btrfs_item) +
-					       extent_item_size, 1);
-			*key_inserted = 1;
+	leaf = path->nodes[0];
+	/*
+	 * If btrfs_del_items() was called, it might have deleted a leaf, in
+	 * which case it unlocked our path, so check path->locks[0] matches a
+	 * write lock.
+	 */
+	if (!ret && replace_extent && leafs_visited == 1 &&
+	    (path->locks[0] == BTRFS_WRITE_LOCK_BLOCKING ||
+	     path->locks[0] == BTRFS_WRITE_LOCK) &&
+	    btrfs_leaf_free_space(root, leaf) >=
+	    sizeof(struct btrfs_item) + extent_item_size) {
+
+		key.objectid = ino;
+		key.type = BTRFS_EXTENT_DATA_KEY;
+		key.offset = start;
+		if (!del_nr && path->slots[0] < btrfs_header_nritems(leaf)) {
+			struct btrfs_key slot_key;
+
+			btrfs_item_key_to_cpu(leaf, &slot_key, path->slots[0]);
+			if (btrfs_comp_cpu_keys(&key, &slot_key) > 0)
+				path->slots[0]++;
 		}
+		setup_items_for_insert(root, path, &key,
+				       &extent_item_size,
+				       extent_item_size,
+				       sizeof(struct btrfs_item) +
+				       extent_item_size, 1);
+		*key_inserted = 1;
 	}
 
 	if (!replace_extent || !(*key_inserted))
-- 
1.7.9.5



Re: Provide a better free space estimate on RAID1

2014-02-09 Thread Duncan
Roman Mamedov posted on Sun, 09 Feb 2014 15:20:00 +0600 as excerpted:

 On Sun, 9 Feb 2014 06:38:53 + (UTC)
 Duncan 1i5t5.dun...@cox.net wrote:
 
 RAID or multi-device filesystems aren't 1970s features and break 1970s
 behavior and the assumptions associated with it.  If you're not
 prepared to deal with those broken assumptions, don't.  Use mdraid or
 dmraid or lvm or whatever to combine your multiple devices into one
 logical device as presented, and put your filesystem (either
 traditional filesystem, or even btrfs using traditional single-device
 functionality) on top of the single device the layer beneath the
 filesystem presents.  Problem solved! =:^)
 
 No reason BTRFS can't work well in a similar simplistic usage scenario.
 
 You seem to insist there is no way around it being too flexible for its
 own good, but all those advanced features absolutely don't *have* to
 get in the way of everyday usage for users who don't require them.

Not really.  I'm more insisting that I've not seen a good kernel-space 
solution to the problem yet, and believe that it's a userspace or wetware 
problem.

And I provided a userspace/wetware solution that works for me, too. =:^)

 Meanwhile, what I've done here is use one of df's commandline options
 to set its block size to 2 MiB, and further used bash's alias
 functionality to setup an alias accordingly:
 
 alias df='df -B2M'
 
 $ df /h
 Filesystem     2M-blocks  Used Available Use% Mounted on
 /dev/sda6          20480 12186      7909  61% /h

 On btrfs such as the above I can read the 2M blocks as 1M and be happy.

 On btrfs such as my /boot, which aren't raid1 (I have two separate
 /boots, one on each device, with grub2 configured separately for each
 to provide a backup), or if I df my media partitions still on reiserfs
 on the old spinning rust, I can either double the figures DF gives me,
 or add a second -B option at the CLI, overriding the aliased option.
 
 Congratulations, you broke your df readings on all other filesystems to
 fix them on btrfs.

No.  It clearly says 2M blocks.  Nothing's broken at all, except perhaps 
the user's wetware.

I just find it easier to do the doubling in wetware on the occasions 
it's needed, in MiB, than to do the halving on the more frequent occasions 
(since all my core automounted filesystems that I'd normally be doing df 
on are btrfs raid1) in larger KiB or byte units, and I don't need to do 
that wetware arithmetic often enough to have gone to the trouble of 
setting up the software-scripted version I propose below.

 If I wanted something fully automated, it'd be easy enough to setup a
 script that checked what filesystem I was df-ing, matched that against
 a table of filesystems to preferred df block sizes, and supplied the
 appropriate -BxX option accordingly.
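
As a rough illustration of such a table-driven wrapper, here is a sketch only. The function names and the table entries are assumptions, with /h standing in for a btrfs raid1 mount as in the df output above, and -B is the GNU df block-size option:

```sh
# df_block_size_for MOUNTPOINT
# Hypothetical lookup table: mount point -> preferred df block size.
df_block_size_for() {
    case "$1" in
        /h)    echo 2M ;;   # btrfs raid1: 2M blocks can be read directly as MiB
        /boot) echo 1M ;;   # not raid1: plain MiB
        *)     echo 1M ;;   # default for everything not listed
    esac
}

# tdf MOUNTPOINT
# Run df on a mount point with the block size taken from the table.
tdf() {
    df -B"$(df_block_size_for "$1")" "$1"
}
```

With this in a shell startup file, tdf /h would report 2M blocks while tdf /boot reports 1M blocks, so no alias override is needed per filesystem.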
 
 I am not sure this would work well in the network share scenario
 described earlier, with clients which in the real world are largely
 Windows-based.

So patch the Windows-based stuff... oh, you've let them be your master (in 
the context of my sig below) and you can't...  Well, servant by choice, I 
guess...  There's freedom if you want it... which in fact you are using 
to do your kernel patches.  Try patching the MS Windows kernel and 
distributing those patches, and see how far you get! =:^(

FWIW/IMO, in the business context Ernie Ball made the right decision.  
One BSA audit was enough.  He said no more, and the company moved to free 
as in freedom software and isn't beholden to the whims of any servantware 
or the BSA auditors enforcing it, any longer. =:^)

But as I said, your systems (or your company's systems), play servant 
with them and be subject to the BSA gestapo (or the equivalent in your 
country) if you will.  No skin off my nose.  <shrug>


Meanwhile, you said it yourself, users aren't normally concerned about 
this.  And others pointed out that to the degree users /are/ concerned, 
they should be looking at their quotas, not filesystem level usage.

And admins, assuming they're proper admins, not the simple "here's my 
MCSE, I'm certified to do anything, and if I can't do it, it's not 
possible" types, should have the wetware resources to either deal with 
the problem there, or script their own solutions, offloading it from 
wetware to installation-specific userspace software scripts as necessary.


All that said, it's worth noting that there ARE already API changes 
proposed and working their way thru the pipeline, that would expose 
various bits of necessary data to userspace in a standardized API that 
filesystems other than btrfs could make use of as well, with the intent 
of then updating coreutils (the package containing df) and friends to 
allow them to make use of the information exposed by this API to improve 
their default information output and allow for additional CLI level 
options as appropriate.  Presumably other userspace apps, including the 
GUIs over time, would follow the same course.

But the key is, getting a standardized modern API ready 

Re: [PATCH 1/2] btrfs-progs: Add missing devices check for mounted btrfs.

2014-02-09 Thread Qu Wenruo

On Fri, 07 Feb 2014 17:34:46 +0800, Anand Jain wrote:



 IMO btrfs-progs shouldn't add its own intelligence to know if a disk
 is missing. If btrfs-kernel doesn't know when a disk is missing,
 that's a bug to fix in btrfs-kernel. Yes, that is indeed true as
 of now in btrfs-kernel: the btrfs kernel has no idea when a disk
 goes missing; -EIO alone doesn't tell btrfs that. I am trying
 to fix this first.

 But the problem is there isn't a good way within btrfs/FS
 to know when a disk goes missing. Did I miss anything?

Yes, kernel detection is the best way.
But since the kernel has no better way to detect a missing device yet, I 
think the btrfs-progs fix is good enough for now.


Since btrfs fi show with the -d option already scans /dev to find 
filesystems and check for missing disks,
I think adding a user-land check, even on the ioctl path, is still 
somewhat reasonable.
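
The user-land check under discussion boils down to trying to open each device path recorded for the mounted filesystem. A minimal shell sketch of that idea follows. It is not the btrfs-progs code itself, and missing_devices is a hypothetical helper name:

```sh
# missing_devices PATH...
# Print every given device path that cannot be opened for reading,
# mirroring the open(path, O_RDONLY) test in the patch below.
missing_devices() {
    for dev in "$@"; do
        # The subshell contains the failed redirection; its error
        # message is discarded.
        if ! ( : < "$dev" ) 2>/dev/null; then
            echo "$dev"
        fi
    done
}
```

An empty result means every listed path could be opened. Note that a path can fail to open for reasons other than a removed disk (permissions, for instance), which is one limit of doing this check in user space.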


Thanks
Qu



Thanks, Anand


On 02/07/2014 02:45 PM, Qu Wenruo wrote:

btrfs/003 in xfstests checks whether btrfs fi show can find
missing devices.

But before this patch, btrfs-progs did not check whether a device is missing
when given a mounted btrfs mountpoint/block device.
This patch fixes the bug, so btrfs/003 passes.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Cc: Anand Jain anand.j...@oracle.com
---
  cmds-filesystem.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 384d1b9..4c9933d 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -363,6 +363,8 @@ static int print_one_fs(struct btrfs_ioctl_fs_info_args *fs_info,
 		char *label, char *path)
 {
 	int i;
+	int fd;
+	int missing;
 	char uuidbuf[BTRFS_UUID_UNPARSED_SIZE];
 	struct btrfs_ioctl_dev_info_args *tmp_dev_info;
 	int ret;
@@ -385,6 +387,14 @@ static int print_one_fs(struct btrfs_ioctl_fs_info_args *fs_info,
 
 	for (i = 0; i < fs_info->num_devices; i++) {
 		tmp_dev_info = (struct btrfs_ioctl_dev_info_args *)&dev_info[i];
+
+		/* Add check for missing devices even mounted */
+		fd = open((char *)tmp_dev_info->path, O_RDONLY);
+		if (fd < 0) {
+			missing = 1;
+			continue;
+		}
+		close(fd);
 		printf("\tdevid %4llu size %s used %s path %s\n",
 			tmp_dev_info->devid,
 			pretty_size(tmp_dev_info->total_bytes),
@@ -392,6 +402,8 @@ static int print_one_fs(struct btrfs_ioctl_fs_info_args *fs_info,
 			tmp_dev_info->path);
 	}
 
+	if (missing)
+		printf("\t*** Some devices missing\n");
 	printf("\n");
 	return 0;
 }







Re: [PATCH 2/2] btrfs-progs: Add -p/--print-missing options for btrfs fi show

2014-02-09 Thread Qu Wenruo

On Fri, 07 Feb 2014 17:26:11 +0800, Anand Jain wrote:


 What's needed is a more comprehensive btrfs fi show
 which shows the flags (including missing) per disk.

Yes indeed.

 And also show the FS/Raid status. Which I am working on.

 sorry -p feature would be covered by default in the
 coming revamp of btrfs fi show.

That's all right.
Just a kind reminder: if the output format changes, don't forget to modify 
the related xfstests testcase.


Thanks,
Qu


Thanks, Anand


On 02/07/2014 02:46 PM, Qu Wenruo wrote:

Since a mounted btrfs filesystem keeps the info for all its devices even if a
device is removed after mount (like btrfs/003 in xfstests),
we can use that info to print the known missing devices if possible.

So the -p/--print-missing options are added to print possibly missing
devices.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
  cmds-filesystem.c | 26 --
  man/btrfs.8.in|  4 +++-
  2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 4c9933d..77b142c 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -360,7 +360,7 @@ static u64 calc_used_bytes(struct btrfs_ioctl_space_args *si)
 static int print_one_fs(struct btrfs_ioctl_fs_info_args *fs_info,
 		struct btrfs_ioctl_dev_info_args *dev_info,
 		struct btrfs_ioctl_space_args *space_info,
-		char *label, char *path)
+		char *label, char *path, int print_missing)
 {
 	int i;
 	int fd;
@@ -392,7 +392,14 @@ static int print_one_fs(struct btrfs_ioctl_fs_info_args *fs_info,
 		fd = open((char *)tmp_dev_info->path, O_RDONLY);
 		if (fd < 0) {
 			missing = 1;
-			continue;
+			if (print_missing)
+				printf("\tdevid %4llu size %s used %s path %s (missing)\n",
+				       tmp_dev_info->devid,
+				       pretty_size(tmp_dev_info->total_bytes),
+				       pretty_size(tmp_dev_info->bytes_used),
+				       tmp_dev_info->path);
+			else
+				continue;
 		}
 		close(fd);
 		printf("\tdevid %4llu size %s used %s path %s\n",
@@ -440,7 +447,7 @@ static int check_arg_type(char *input)
 	return BTRFS_ARG_UNKNOWN;
 }
 
-static int btrfs_scan_kernel(void *search)
+static int btrfs_scan_kernel(void *search, int print_missing)
 {
 	int ret = 0, fd;
 	FILE *f;
@@ -477,7 +484,8 @@ static int btrfs_scan_kernel(void *search)
 		fd = open(mnt->mnt_dir, O_RDONLY);
 		if ((fd != -1) && !get_df(fd, &space_info_arg)) {
 			print_one_fs(&fs_info_arg, dev_info_arg,
-				     space_info_arg, label, mnt->mnt_dir);
+				     space_info_arg, label, mnt->mnt_dir,
+				     print_missing);
 			kfree(space_info_arg);
 			memset(label, 0, sizeof(label));
 		}
@@ -500,6 +508,7 @@ static const char * const cmd_show_usage[] = {
 	"Show the structure of a filesystem",
 	"-d|--all-devices   show only disks under /dev containing btrfs filesystem",
 	"-m|--mounted       show only mounted btrfs",
+	"-p|--print-missing show known missing device if possible",
 	"If no argument is given, structure of all present filesystems is shown.",
 	NULL
 };
@@ -513,6 +522,7 @@ static int cmd_show(int argc, char **argv)
 	int ret;
 	int where = BTRFS_SCAN_LBLKID;
 	int type = 0;
+	int print_missing = 0;
 	char mp[BTRFS_PATH_NAME_MAX + 1];
 	char path[PATH_MAX];
 
@@ -521,9 +531,10 @@ static int cmd_show(int argc, char **argv)
 		static struct option long_options[] = {
 			{ "all-devices", no_argument, NULL, 'd'},
 			{ "mounted", no_argument, NULL, 'm'},
+			{ "print-missing", no_argument, NULL, 'p'},
 			{ NULL, no_argument, NULL, 0 },
 		};
-		int c = getopt_long(argc, argv, "dm", long_options,
+		int c = getopt_long(argc, argv, "dmp", long_options,
 					&long_index);
 		if (c < 0)
 			break;
@@ -534,6 +545,9 @@ static int cmd_show(int argc, char **argv)
 		case 'm':
 			where = BTRFS_SCAN_MOUNTED;
 			break;
+		case 'p':
+			print_missing = 1;
+			break;
 		default:
 			usage(cmd_show_usage);
 		}
@@ -571,7 +585,7 @@ static int cmd_show(int argc, char **argv)
 		goto devs_only;
 
 	/* show mounted btrfs */
-	ret = btrfs_scan_kernel(search);
+	ret = btrfs_scan_kernel(search, print_missing);
 	if (search && !ret)
 		return 0;
 
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 8fea115..db2e355 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -25,7 +25,7 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBfilesystem df\fP\fI <path>\fP
 .PP
-\fBbtrfs\fP \fBfilesystem show\fP [\fI--mounted\fP|\fI--all-devices\fP|\fI<uuid>\fP]\fP
+\fBbtrfs\fP \fBfilesystem show\fP [\fI--mounted\fP|\fI--all-devices\fP|\fI--print-missing\fP|\fI<uuid>\fP]\fP
 .PP
 \fBbtrfs\fP \fBfilesystem 

Issue with btrfs balance

2014-02-09 Thread Austin S Hemmelgarn
I just recently discovered something about btrfs filesystem balance that
(as far as I can see) isn't documented anywhere, and doesn't necessarily
have an obvious (to the average user) explanation.

Apparently, trying to use -mconvert=dup or -sconvert=dup on a
multi-device filesystem using one of the RAID profiles for metadata
fails with a statement to look at the kernel log, which doesn't show
anything at all about the failure.

Based on what I've been able to understand from the source, it appears
that the kernel stops you from converting to a dup profile for metadata
in this case because it thinks that such a profile doesn't work on
multiple devices, despite the fact that you can take a single device
filesystem, add a device, and it will still work fine even without
converting the metadata/system profiles.

I feel at the very least, this should be documented, and the kernel
should give at least some indication of what went wrong.  Ideally, this
should be changed to allow converting to dup so that when converting a
multi-device filesystem to single-device, you never have to have
metadata or system chunks use a single profile.
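
For reference, the detour this currently forces when shrinking to one device looks roughly like the sketch below. The function name is hypothetical, it assumes data also uses a RAID profile that must be converted first, and it only prints the commands rather than running them. The -f flag is needed because reducing metadata redundancy and converting system chunks both require force:

```sh
# plan_shrink_to_one_device MOUNTPOINT DEVICE_TO_REMOVE
# Print (not execute) the command sequence for going from a
# multi-device filesystem to a single device with dup metadata.
plan_shrink_to_one_device() {
    mnt=$1
    dev=$2
    # Step 1: drop to single profiles, since dup is refused on multiple devices.
    echo "btrfs balance start -f -dconvert=single -mconvert=single -sconvert=single $mnt"
    # Step 2: remove the extra device.
    echo "btrfs device delete $dev $mnt"
    # Step 3: only now can metadata and system chunks go to dup.
    echo "btrfs balance start -f -mconvert=dup -sconvert=dup $mnt"
}
```

Between step 1 and step 3 the metadata has no redundancy at all, which is the window that allowing a direct conversion to dup would close.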


Re: BTRFS with RAID1 cannot boot when removing drive

2014-02-09 Thread Duncan
Saint Germain posted on Sun, 09 Feb 2014 22:40:55 +0100 as excerpted:

 I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with
 backported kernel 3.12-0.bpo.1-amd64) using a motherboard with UEFI.

My systems don't do UEFI, but I do run GPT partitions and use grub2 for 
booting, with grub2-core installed to a BIOS/reserved type partition 
(instead of as an EFI service as it would be with UEFI).  And I have root 
filesystem btrfs two-device raid1 mode working fine here, tested bootable 
with only one device of the two available.

So while I can't help you directly with UEFI, I know the rest of it can/
does work.

One more thing:  I do have a (small) separate btrfs /boot, actually two 
of them as I set up a separate /boot on each of the two devices in order 
to have a backup /boot, since grub can only point to one /boot by 
default, and while pointing to another in grub's rescue mode is possible, 
I didn't want to have to deal with that if the first /boot was corrupted, 
as it's easier to simply point the BIOS at a different drive entirely and 
load its (independently installed and configured) grub and /boot.

But grub2's btrfs module reads raid1 mode just fine as I can access files 
on the btrfs raid1 mode rootfs directly from grub without issue, so 
that's not a problem.

But I strongly suspect I know what is... and it's a relatively easy fix.  
See below.  =:^)

 However I haven't managed to make the system boot when removing the
 first hard drive.
 
 I have installed Debian with the following partition on the first hard
 drive (no BTRFS subsystem):
 /dev/sda1: for / (BTRFS)
 /dev/sda2: for /home (BTRFS)
 /dev/sda3: for swap
 
 Then I added another drive for a RAID1 configuration (with btrfs
 balance) and I installed grub on the second hard drive with
 grub-install /dev/sdb.

Just for clarification as you don't mention it specifically, altho your 
btrfs filesystem show information suggests you did it this way, are your 
partition layouts identical on both drives?

That's what I've done here, and I definitely find that easiest to manage 
and even just to think about, tho it's definitely not a requirement.  But 
using different partition layouts does significantly increase management 
complexity, so it's useful to avoid if possible. =:^)

 If I boot on sdb, it takes sda1 as the root filesystem

 If I switch the cable, it always takes the first hard drive as
 the root filesystem (now sdb)

That's normal /appearance/, but that /appearance/ doesn't fully reflect 
reality.

The problem is that mount output (and /proc/self/mounts), fstab, etc, 
were designed with single-device filesystems in mind, and multi-device 
btrfs has to be made to fit the existing rules as best it can.

So what's actually happening is that for a btrfs composed of multiple 
devices, since there's only one device slot for the kernel to list 
devices, it only displays the first one it happens to come across, even 
tho the filesystem will normally (unless degraded) require that all 
component devices be available and logically assembled into the 
filesystem before it can be mounted.

When you boot on sdb, naturally, the sdb component of the multi-device 
filesystem is the one that the kernel finds, so it's the one listed, even tho the 
filesystem is actually composed of more devices, not just that one.  When 
you switch the cables, the first one is, at least on your system, always 
the first device component of the filesystem detected, so it's always the 
one occupying the single device slot available for display, even tho the 
filesystem has actually assembled all devices into the complete 
filesystem before mounting.

 If I disconnect /dev/sda, the system doesn't boot with a message saying
 that it hasn't found the UUID:
 
 Scanning for BTRFS filesystems...
 mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c
 on /root failed: Invalid argument
 
 Can you tell me what I have done incorrectly ?
 Is it because of UEFI ? If yes I haven't understood how I can correct it
 in a simple way.

As you haven't mentioned it and the grub config below doesn't mention it 
either, I'm almost certain that you're simply not aware of the degraded 
mount option, and when/how it should be used.
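
For illustration: a degraded mount is requested with the degraded mount option, e.g. mount -o degraded /dev/sdb1 /mnt, and for a root filesystem the option can be passed on the kernel command line through rootflags=. Below is a small sketch of a helper that adds it to a command line string. add_degraded is a hypothetical name, the device name above is an example only, and the check for an existing "degraded" is deliberately crude:

```sh
# add_degraded CMDLINE
# Print CMDLINE with "degraded" added to the root mount options.
add_degraded() {
    case "$1" in
        *rootflags=*degraded*)
            # Already present (crude substring check): leave untouched.
            printf '%s\n' "$1" ;;
        *rootflags=*)
            # Append to the existing comma-separated rootflags list.
            printf '%s\n' "$1" |
                sed 's/rootflags=\([^ ]*\)/rootflags=\1,degraded/' ;;
        *)
            # No rootflags yet: add one at the end.
            printf '%s rootflags=degraded\n' "$1" ;;
    esac
}
```

In practice this is what one would do by hand when editing the linux line at the grub prompt for a one-off boot with a device missing. Making degraded a permanent default is generally not advisable, since it hides a failed device.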

And if you're not aware of that, chances are you're not aware of the 
btrfs wiki, and the multitude of other very helpful information it has 
available.  I'd suggest you spend some time reading it, as it'll very 
likely save you quite some btrfs administration questions and headaches 
down the road, as you continue to work with btrfs.

Bookmark it and refer to it often! =:^)

https://btrfs.wiki.kernel.org

(Click on the guides and usage information in contents under section 5, 
documentation.)

Here's the mount options page.  Note that the kernel btrfs documentation 
also includes mount options:

https://btrfs.wiki.kernel.org/index.php/Mount_options

$KERNELDIR/Documentation/filesystems/btrfs.txt

You should be able to mount a two-device 

[PATCH 0/3] make 'btrfs fi show /mnt/point/' works with ending '/' character

2014-02-09 Thread Qu Wenruo
Before this patchset, 'btrfs fi show' works with '/mnt/point' but not
'/mnt/point/', which is very annoying since tab completion adds a '/' to a
directory.

This patchset just reuses the find_mount_root function, with a small
modification to ignore the trailing '/' only when needed.

Qu Wenruo (3):
  btrfs-progs: move find_mount_root to utils.[ch]
  btrfs-progs: Add path_is_mp option for find_mount_root.
  btrfs-progs: reuse find_mount_root to determine arg type and so on.

 cmds-filesystem.c |  7 +
 cmds-receive.c|  2 +-
 cmds-send.c   | 53 ++---
 cmds-subvolume.c  |  2 +-
 commands.h|  1 -
 utils.c   | 88 +--
 utils.h   |  1 +
 7 files changed, 86 insertions(+), 68 deletions(-)

-- 
1.8.5.4



[PATCH 2/3] btrfs-progs: Add path_is_mp option for find_mount_root.

2014-02-09 Thread Qu Wenruo
Add a path_is_mp option to find_mount_root, allowing path to be treated as a
mount point; if no restricted(*) match is found, -ENOENT is returned.

*: a restricted match allows only the trailing '/' to differ, since path
completion often adds a '/' at the end but mount points in /proc/self/mounts
have none.
e.g. /mnt/data and /mnt/data/ is a restricted match, but
/mnt/data and /mnt/data/something is not a restricted match.
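
The matching rule itself is small enough to state as a shell sketch. This is an illustration of the rule only, not the C code in the patch, and is_restricted_match is a hypothetical name:

```sh
# is_restricted_match PATH MOUNTPOINT
# Succeed only if PATH equals MOUNTPOINT, or equals MOUNTPOINT plus
# exactly one trailing '/'.
is_restricted_match() {
    [ "$1" = "$2" ] || [ "$1" = "$2/" ]
}
```

So is_restricted_match /mnt/data/ /mnt/data succeeds, while is_restricted_match /mnt/data/something /mnt/data fails, matching the two examples above.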

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 cmds-receive.c   |  2 +-
 cmds-send.c  |  4 ++--
 cmds-subvolume.c |  2 +-
 utils.c  | 32 
 utils.h  |  2 +-
 5 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index cce37a7..810fd59 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -843,7 +843,7 @@ static int do_receive(struct btrfs_receive *r, const char *tomnt, int r_fd)
 		goto out;
 	}
 
-	ret = find_mount_root(dest_dir_full_path, &r->root_path);
+	ret = find_mount_root(dest_dir_full_path, &r->root_path, 0);
 	if (ret < 0) {
 		ret = -EINVAL;
 		fprintf(stderr, "ERROR: failed to determine mount point "
diff --git a/cmds-send.c b/cmds-send.c
index 9d49ce9..967f45a 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -349,7 +349,7 @@ static int init_root_path(struct btrfs_send *s, const char *subvol)
 	if (s->root_path)
 		goto out;
 
-	ret = find_mount_root(subvol, &s->root_path);
+	ret = find_mount_root(subvol, &s->root_path, 0);
 	if (ret < 0) {
 		ret = -EINVAL;
 		fprintf(stderr, "ERROR: failed to determine mount point "
@@ -584,7 +584,7 @@ int cmd_send(int argc, char **argv)
 		goto out;
 	}
 
-	ret = find_mount_root(subvol, &mount_root);
+	ret = find_mount_root(subvol, &mount_root, 0);
 	if (ret < 0) {
 		fprintf(stderr, "ERROR: find_mount_root failed on %s: "
 			"%s\n", subvol,
diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 0bd76f2..a78a535 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -931,7 +931,7 @@ static int cmd_subvol_show(int argc, char **argv)
 		goto out;
 	}
 
-	ret = find_mount_root(fullpath, &mnt);
+	ret = find_mount_root(fullpath, &mnt, 0);
 	if (ret < 0) {
 		fprintf(stderr, "ERROR: find_mount_root failed on %s: "
 			"%s\n", fullpath, strerror(-ret));
diff --git a/utils.c b/utils.c
index 8f06d4e..e39ad79 100644
--- a/utils.c
+++ b/utils.c
@@ -2111,7 +2111,13 @@ int lookup_ino_rootid(int fd, u64 *rootid)
 	return 0;
 }
 
-int find_mount_root(const char *path, char **mount_root)
+/*
+ * Find the mount root of a given path.
+ * Return 0 when found and restore the mount root into mount_root.
+ * If path_is_mp is set, path will be treated as mount point and compare
+ * in a restricted way.
+ */
+int find_mount_root(const char *path, char **mount_root, int path_is_mp)
 {
 	FILE *mnttab;
 	int fd;
@@ -2144,17 +2150,27 @@ int find_mount_root(const char *path, char **mount_root)
 	endmntent(mnttab);
 
 	if (!longest_match) {
-		fprintf(stderr,
-			"ERROR: Failed to find mount root for path %s.\n",
-			path);
-		return -ENOENT;
+		ret = -ENOENT;
+		goto out;
+	}
+
+	/* Only the last '/' in path may differs if path_is_mp */
+	if (path_is_mp && strlen(path) != strlen(longest_match)) {
+		if (strlen(path) != strlen(longest_match) + 1 ||
+		    path[strlen(path) - 1] != '/') {
+			ret = -ENOENT;
+			goto out;
+		}
 	}
 
 	ret = 0;
-	*mount_root = realpath(longest_match, NULL);
-	if (!*mount_root)
-		ret = -errno;
+	if (mount_root) {
+		*mount_root = realpath(longest_match, *mount_root);
+		if (!*mount_root)
+			ret = -errno;
+	}
 
+out:
 	free(longest_match);
 	return ret;
 }
diff --git a/utils.h b/utils.h
index e074732..b69712e 100644
--- a/utils.h
+++ b/utils.h
@@ -96,6 +96,6 @@ int ask_user(char *question);
 int lookup_ino_rootid(int fd, u64 *rootid);
 int btrfs_scan_lblkid(int update_kernel);
 int get_btrfs_mount(const char *dev, char *mp, size_t mp_size);
-int find_mount_root(const char *path, char **mount_root);
+int find_mount_root(const char *path, char **mount_root, int path_is_mp);
 
 #endif
-- 
1.8.5.4



[PATCH 1/3] btrfs-progs: move find_mount_root to utils.[ch]

2014-02-09 Thread Qu Wenruo
Move find_mount_root to utils.[ch] for general use.

Signed-off-by: Qu Wenruo quwen...@cn.fuijitsu.com
---
 cmds-send.c | 49 +
 commands.h  |  1 -
 utils.c | 48 
 utils.h |  1 +
 4 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/cmds-send.c b/cmds-send.c
index fc9a01e..9d49ce9 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -39,6 +39,7 @@
 #include "ioctl.h"
 #include "commands.h"
 #include "list.h"
+#include "utils.h"
 
 #include "send.h"
 #include "send-utils.h"
@@ -57,54 +58,6 @@ struct btrfs_send {
 	struct subvol_uuid_search sus;
 };
 
-int find_mount_root(const char *path, char **mount_root)
-{
-	FILE *mnttab;
-	int fd;
-	struct mntent *ent;
-	int len;
-	int ret;
-	int longest_matchlen = 0;
-	char *longest_match = NULL;
-
-	fd = open(path, O_RDONLY | O_NOATIME);
-	if (fd < 0)
-		return -errno;
-	close(fd);
-
-	mnttab = setmntent("/proc/self/mounts", "r");
-	if (!mnttab)
-		return -errno;
-
-	while ((ent = getmntent(mnttab))) {
-		len = strlen(ent->mnt_dir);
-		if (strncmp(ent->mnt_dir, path, len) == 0) {
-			/* match found */
-			if (longest_matchlen < len) {
-				free(longest_match);
-				longest_matchlen = len;
-				longest_match = strdup(ent->mnt_dir);
-			}
-		}
-	}
-	endmntent(mnttab);
-
-	if (!longest_match) {
-		fprintf(stderr,
-			"ERROR: Failed to find mount root for path %s.\n",
-			path);
-		return -ENOENT;
-	}
-
-	ret = 0;
-	*mount_root = realpath(longest_match, NULL);
-	if (!*mount_root)
-		ret = -errno;
-
-	free(longest_match);
-	return ret;
-}
-
 static int get_root_id(struct btrfs_send *s, const char *path, u64 *root_id)
 {
 	struct subvol_info *si;
diff --git a/commands.h b/commands.h
index 23c1201..db70043 100644
--- a/commands.h
+++ b/commands.h
@@ -126,5 +126,4 @@ int cmd_rescue(int argc, char **argv);
 int test_issubvolume(char *path);
 
 /* send.c */
-int find_mount_root(const char *path, char **mount_root);
 char *get_subvol_name(char *mnt, char *full_path);
diff --git a/utils.c b/utils.c
index 75b37f3..8f06d4e 100644
--- a/utils.c
+++ b/utils.c
@@ -2110,3 +2110,51 @@ int lookup_ino_rootid(int fd, u64 *rootid)
 
 	return 0;
 }
+
+int find_mount_root(const char *path, char **mount_root)
+{
+	FILE *mnttab;
+	int fd;
+	struct mntent *ent;
+	int len;
+	int ret;
+	int longest_matchlen = 0;
+	char *longest_match = NULL;
+
+	fd = open(path, O_RDONLY | O_NOATIME);
+	if (fd < 0)
+		return -errno;
+	close(fd);
+
+	mnttab = setmntent("/proc/self/mounts", "r");
+	if (!mnttab)
+		return -errno;
+
+	while ((ent = getmntent(mnttab))) {
+		len = strlen(ent->mnt_dir);
+		if (strncmp(ent->mnt_dir, path, len) == 0) {
+			/* match found */
+			if (longest_matchlen < len) {
+				free(longest_match);
+				longest_matchlen = len;
+				longest_match = strdup(ent->mnt_dir);
+			}
+		}
+	}
+	endmntent(mnttab);
+
+	if (!longest_match) {
+		fprintf(stderr,
+			"ERROR: Failed to find mount root for path %s.\n",
+			path);
+		return -ENOENT;
+	}
+
+	ret = 0;
+	*mount_root = realpath(longest_match, NULL);
+	if (!*mount_root)
+		ret = -errno;
+
+	free(longest_match);
+	return ret;
+}
diff --git a/utils.h b/utils.h
index 512c51b..e074732 100644
--- a/utils.h
+++ b/utils.h
@@ -96,5 +96,6 @@ int ask_user(char *question);
 int lookup_ino_rootid(int fd, u64 *rootid);
 int btrfs_scan_lblkid(int update_kernel);
 int get_btrfs_mount(const char *dev, char *mp, size_t mp_size);
+int find_mount_root(const char *path, char **mount_root);
 
 #endif
-- 
1.8.5.4
