Re: [PATCH] Btrfs: fix a tree mod bug while inserting a new root

2012-10-22 Thread Liu Bo
On 10/23/2012 01:05 AM, Jan Schmidt wrote:
> Hi liubo,
> 
> On Mon, October 22, 2012 at 16:02 (+0200), Liu Bo wrote:
>> According to btree's balance algorithm, when we split a root into two parts,
>> we insert a new one to be their parent:
>>
>>  new root
>> node A/  \
>>   | x1 x2 x3 x4 x5 x6 |   =>  node A node A'
>> | x1 x2 x3 - - - |  | x4 x5 x6 - - - |
>>  split
>>
>> The original root won't be freed because it becomes a child of the new root,
>> and a move to keep balance is needed then.
>>
>> So we should not add REMOVE_WHILE_FREEING keys for the old root, otherwise,
>> we will hit use-after-free since we first add REMOVE_WHILE_FREEING keys and
>> then add REMOVE keys, which is invalid.
> 
> I don't like adding another parameter there, the function is already confusing
> without it. I've got a different fix for that problem here as well. I haven't
> been sending it since Friday because there's at least one additional problem 
> in
> the tree mod log, and I'd like to see all of the issues fixed.
> 
> There's also a fix for double frees from push_node_left here. That one may be
> fixing the other issue you're seeing (which I still cannot reproduce). I'm 
> still
> not convinced it's a good idea to change the semantics in del_ptr as done in
> your previous patch set.
> 

If you have better fixes, that'd be good.

> Probably we can try working together on irc in a more interactive fashion? Or
> tell me if you want my patches anywhere before I send them out.
> 

OK, I'm on IRC now, lets rock it ;)

thanks,
liubo


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Chris Murphy

On Oct 22, 2012, at 1:50 PM, Hugo Mills  wrote:
> 
>   Yes, the automatic single -> RAID-0 upgrade was fixed. If you
> haven't run a balance on (at least) the metadata after adding the new
> device, then you won't get the DUP -> RAID-1 upgrade on metadata. (I
> can tell you haven't run the balance, because you still have the empty
> single metadata chunk).

I think it will be common for a balance of either type to not occur to most 
users. So if there's a way to automatically get a metadata only balance, such 
that metadata goes from dup to raid1, that would be nice.


>> What is the likelihood of a mkfs.btrfs 2+ device change in the
>> default data profile from raid0 to single?
> 
>   Non-zero. I think it mostly just wants someone to write the patch,
> and then beat off any resulting bikeshedding. :)

I'm curious of the negatives of -d single by default. Maybe this warrants a 
separate thread for discussion? As for the ensuing input for additional 
features once going down that path, the only one I'm coming up with is already 
mentioned above. Simple make options or additional steps go away for "most" 
people - whoever they are.

Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Goffredo Baroncelli
On 2012-10-22 21:50, Hugo Mills wrote:
> It's more like a balance which moves everything that has
> some (part of its) existence on a device. So when you have
> RAID-0 or RAID-1 data, all of the related chunks on other
> disks get moved too (so in RAID-1, it's the mirror chunk as
> well as the chunk on the removed disk that gets
> rewritten).
>>> 
>>> Does this mean "device delete" depends on an ability to make
>>> writes to the device being removed? I immediately think of SSD
>>> failures, which seem to fail writing, while still being able to
>>> reliably read. Would that behavior inhibit the ability to
>>> remove the device from the volume?
> No, the device being removed isn't modified at all. (Which causes 
> its own set of weird problemettes, but I think most of those have
> gone away).

IIRC, when a device is deleted, the 1st superblock is zeroed.
Moreover btrfs needs to be able to read the device in order to delete
it. Of course these rules aren't applied when a device is classified
as "missing".

See the function btrfs_rm_device() in fs/btrfs/volumes.c for the details.

> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs-progs: fix unaligned accesses

2012-10-22 Thread Andre Diekwisch
Arne Jansen  gmx.net> writes:

> 
> There are some unaligned accesses in progs that cause malfunction or
> crashes on ARM.
> This patch fixes the ones we stumbled upon.
> 
> Signed-off-by: Arne Jansen  gmx.net>
> ---
> 
> Change v1->v2:
> Somehow sent the wrong patch without the patch to the setget functions.


Hello,

I reported these unaligned accesses originally and tested this patch on an ARMv5
kirkwood device using a 3.6.2 kernel.
I set /proc/cpu/alignment to 5 (signal+warn), and got no signals anymore using
this patch. I tested mkfs.btrfs, sub list and sub find-new (which all signalled
before the patch) and additionally inspect-internal inode-resolve and btrfsck.

None of them signalled or showed any other issues.
tested-by: Andre Diekwisch  mail.upb.de>


Thanks a lot Arne, this patch finally renders btrfs-progs usable on an
ARM-machine without enabling unaligned access trapping for userspace.
I'll report back should I run into more issues regarding unaligned accesses.

Hope this patch gets accepted.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] parse_size(): check for invalid suffix

2012-10-22 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Check that the suffix for the parse_size() input is of only
one character.
---
 utils.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/utils.c b/utils.c
index 732c782..de75309 100644
--- a/utils.c
+++ b/utils.c
@@ -1226,6 +1226,11 @@ u64 parse_size(char *s)
char c;
u64 mult = 1;
 
+   if( len <= 0 ){
+   fprintf(stderr, "ERROR: size value is empty\n");
+   exit(50);
+   }
+
if (!isdigit(s[len - 1])) {
c = tolower(s[len - 1]);
switch (c) {
@@ -1242,6 +1247,13 @@ u64 parse_size(char *s)
exit(1);
}
s[len - 1] = '\0';
+   len--;
+   }
+   if( len > 0 && !isdigit(s[len - 1])) {
+   fprintf(stderr, "ERROR: Illegal size value contains "
+   "non-digit character %c in wrong position\n",
+   s[len-1]);
+   exit(51);
}
return strtoull(s, NULL, 0) * mult;
 }
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] parse_size(): replace atoll() with strtoull()

2012-10-22 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Replace the function atoll with strtoull()
---
 utils.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utils.c b/utils.c
index 705be7b..732c782 100644
--- a/utils.c
+++ b/utils.c
@@ -1243,6 +1243,6 @@ u64 parse_size(char *s)
}
s[len - 1] = '\0';
}
-   return atoll(s) * mult;
+   return strtoull(s, NULL, 0) * mult;
 }
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] Update the man page with the new prefixes.

2012-10-22 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

---
 man/btrfs.8.in  |3 +++
 man/mkfs.btrfs.8.in |3 +++
 2 files changed, 6 insertions(+)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 9222580..33bd78d 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -184,6 +184,9 @@ defragment operations.
 
 \fB-t size\fP defragment only files at least \fIsize\fR bytes big
 
+For \fBstart\fP, \fBlen\fP, \fBsize\fP it is possible to append a prefix
+like \fBk\fP for 1 KBytes, \fBm\fP for 1 MBytes...
+
 NOTE: defragmenting with kernels up to 2.6.37 will unlink COW-ed copies of 
data,
 don't use it if you use snapshots, have de-duplicated your data or made
 copies with \fBcp --reflink\fP.
diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in
index 72025ed..6f37cd8 100644
--- a/man/mkfs.btrfs.8.in
+++ b/man/mkfs.btrfs.8.in
@@ -69,6 +69,9 @@ Do not perform whole device TRIM operation by default.
 .TP
 \fB\-V\fR, \fB\-\-version\fR
 Print the \fBmkfs.btrfs\fP version and exit.
+.SH UNIT
+As default the unit is the byte, however it is possible to append a suffix
+to the arguments like \fBk\fP for KBytes, \fBm\fP for MBytes...
 .SH AVAILABILITY
 .B mkfs.btrfs
 is part of btrfs-progs. Btrfs is currently under heavy development,
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] parse_size(): add new suffixes

2012-10-22 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Add new suffixes in parse_size() function. New suffixes are: T as
terabyte, P as petabyte, E as exabyte. Note these units are
multiply of 2 .
---
 utils.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/utils.c b/utils.c
index de75309..a5fabdc 100644
--- a/utils.c
+++ b/utils.c
@@ -1234,6 +1234,12 @@ u64 parse_size(char *s)
if (!isdigit(s[len - 1])) {
c = tolower(s[len - 1]);
switch (c) {
+   case 'e':
+   mult *= 1024;
+   case 'p':
+   mult *= 1024;
+   case 't':
+   mult *= 1024;
case 'g':
mult *= 1024;
case 'm':
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] Move parse_size() to utils.[hc]

2012-10-22 Thread Goffredo Baroncelli
From: Goffredo Baroncelli 

Move the function from cmds-filesystem.c and mkfs.c to utils.c
---
 cmds-filesystem.c |   26 --
 mkfs.c|   31 ---
 utils.c   |   26 ++
 utils.h   |2 ++
 4 files changed, 28 insertions(+), 57 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 9c43d35..507239a 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -311,32 +311,6 @@ static int cmd_sync(int argc, char **argv)
return 0;
 }
 
-static u64 parse_size(char *s)
-{
-   int len = strlen(s);
-   char c;
-   u64 mult = 1;
-
-   if (!isdigit(s[len - 1])) {
-   c = tolower(s[len - 1]);
-   switch (c) {
-   case 'g':
-   mult *= 1024;
-   case 'm':
-   mult *= 1024;
-   case 'k':
-   mult *= 1024;
-   case 'b':
-   break;
-   default:
-   fprintf(stderr, "Unknown size descriptor %c\n", c);
-   exit(1);
-   }
-   s[len - 1] = '\0';
-   }
-   return atoll(s) * mult;
-}
-
 static int parse_compress_type(char *s)
 {
if (strcmp(optarg, "zlib") == 0)
diff --git a/mkfs.c b/mkfs.c
index 47f0c9c..ca850d9 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -54,37 +54,6 @@ struct directory_name_entry {
struct list_head list;
 };
 
-static u64 parse_size(char *s)
-{
-   int len = strlen(s);
-   char c;
-   u64 mult = 1;
-   u64 ret;
-
-   s = strdup(s);
-
-   if (len && !isdigit(s[len - 1])) {
-   c = tolower(s[len - 1]);
-   switch (c) {
-   case 'g':
-   mult *= 1024;
-   case 'm':
-   mult *= 1024;
-   case 'k':
-   mult *= 1024;
-   case 'b':
-   break;
-   default:
-   fprintf(stderr, "Unknown size descriptor %c\n", c);
-   exit(1);
-   }
-   s[len - 1] = '\0';
-   }
-   ret = atol(s) * mult;
-   free(s);
-   return ret;
-}
-
 static int make_root_dir(struct btrfs_root *root, int mixed)
 {
struct btrfs_trans_handle *trans;
diff --git a/utils.c b/utils.c
index 205e667..705be7b 100644
--- a/utils.c
+++ b/utils.c
@@ -1220,3 +1220,29 @@ scan_again:
return 0;
 }
 
+u64 parse_size(char *s)
+{
+   int len = strlen(s);
+   char c;
+   u64 mult = 1;
+
+   if (!isdigit(s[len - 1])) {
+   c = tolower(s[len - 1]);
+   switch (c) {
+   case 'g':
+   mult *= 1024;
+   case 'm':
+   mult *= 1024;
+   case 'k':
+   mult *= 1024;
+   case 'b':
+   break;
+   default:
+   fprintf(stderr, "Unknown size descriptor %c\n", c);
+   exit(1);
+   }
+   s[len - 1] = '\0';
+   }
+   return atoll(s) * mult;
+}
+
diff --git a/utils.h b/utils.h
index 3a0368b..714fd7a 100644
--- a/utils.h
+++ b/utils.h
@@ -46,4 +46,6 @@ int check_label(char *input);
 int get_mountpt(char *dev, char *mntpt, size_t size);
 
 int btrfs_scan_block_devices(int run_ioctl);
+
+u64 parse_size(char *s);
 #endif
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][BTRFS-PROGS] Update to parse_size()

2012-10-22 Thread Goffredo Baroncelli
Hi all,

the following patches attempt to address some issues to the function
parse_size():
- this function is defined both in mkfs.c and cmds-filesystem.c; I 
moved it in utils.c (which is already used in both mkfs.btrfs and
btrfs) in order to avoid code duplication.
- it used the function atoll(); I replaceed atoll() with strtoull() 
because we are dealing with u64
- no check on suffixes was performed. If the user put 'MB' as suffix he got
bytes instead megabytes. The patches check the suffix is valid
- add new suffixes (t,p,e for terabytes, petabytes, exabytes)
- update the man page of the command mkfs.btrfs and 
"btrfs filesystem defragment", both use parse_size()

Several peoples (see cc's) suggested these improvements with different 
patches, I collected them togheter.

Please reviewed them, test them. Comments are welcome.

The patches are available also to 

You can pull the patch from 
http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git
branch
parse_size


Signed-off-by: Goffredo Baroncelli 


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: recheck bio against block device when we map the bio

2012-10-22 Thread Josef Bacik
On Mon, Oct 22, 2012 at 10:14:58AM -0600, Alex Elder wrote:
> On 10/19/2012 04:01 PM, Josef Bacik wrote:
> > Alex reported a problem where we were writing between chunks on a rbd
> > device.  The thing is we do bio_add_page using logical offsets, but the
> > physical offset may be different.  So when we map the bio now check to see
> > if the bio is still ok with the physical offset, and if it is not split the
> > bio up and redo the bio_add_page with the physical sector.  This fixes the
> > problem for Alex and doesn't affect performance in the normal case.  Thanks,
> > 
> > Reported-and-tested-by: Alex Elder 
> > Signed-off-by: Josef Bacik 
> > ---
> >  fs/btrfs/volumes.c |  159 
> > ++-
> >  1 files changed, 131 insertions(+), 28 deletions(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index a8adf26..eaaf0bf 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -4217,6 +4217,113 @@ static noinline void schedule_bio(struct btrfs_root 
> > *root,
> >&device->work);
> >  }
> >  
> > +static int bio_size_ok(struct block_device *bdev, struct bio *bio,
> > +  sector_t sector)
> > +{
> > +   struct bio_vec *prev;
> > +   struct request_queue *q = bdev_get_queue(bdev);
> > +   unsigned short max_sectors = queue_max_sectors(q);
> > +   struct bvec_merge_data bvm = {
> > +   .bi_bdev = bdev,
> > +   .bi_sector = sector,
> > +   .bi_rw = bio->bi_rw,
> > +   };
> > +
> > +   if (bio->bi_vcnt == 0) {
> > +   WARN_ON(1);
> > +   return 1;
> > +   }
> > +
> > +   prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
> > +   if ((bio->bi_size >> 9) > max_sectors)
> > +   return 0;
> > +
> > +   if (!q->merge_bvec_fn)
> > +   return 1;
> > +
> > +   bvm.bi_size = bio->bi_size - prev->bv_len;
> > +   if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
> > +   return 0;
> 
> I wanted to mention one concern that occurred to me over the
> weekend.
> 
> By checking only the last bio_vec in the bio here it's
> conceivable you'll miss another boundary that sits earlier
> in the bio.
> 
> We technically allow the boundaries of objects that back rbd
> disk images to be as small as 4 KB (but by default it's 4 MB).
> 4 KB is really unlikely, but 256 KB (which is smaller than
> 128-entry bio) is less of a stretch.
> 
> I also came up with a different way of splitting that makes
> it moot anyway, obviating the need for doing this check at
> all so after a little more testing I'll get that posted.
> 

Right but if we go over the boundary earlier in the bio shouldn't we trip the
test anyway?  For example, if you have 4k images and the last one happens to be
4k aligned but is in a 8 megabyte bio it should fail, and if it doesn't the
merge function is wrong :).  Once it fails we add all pages starting from the
first one forward, so we will end up with correct bios if we have to split it.
Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: recheck bio against block device when we map the bio

2012-10-22 Thread Josef Bacik
On Fri, Oct 19, 2012 at 08:31:17PM -0600, Liu Bo wrote:
> On 10/20/2012 05:01 AM, Josef Bacik wrote:
> > Alex reported a problem where we were writing between chunks on a rbd
> > device.  The thing is we do bio_add_page using logical offsets, but the
> > physical offset may be different.  So when we map the bio now check to see
> > if the bio is still ok with the physical offset, and if it is not split the
> > bio up and redo the bio_add_page with the physical sector.  This fixes the
> > problem for Alex and doesn't affect performance in the normal case.  Thanks,
> > 
> > Reported-and-tested-by: Alex Elder 
> > Signed-off-by: Josef Bacik 
> > ---
> >  fs/btrfs/volumes.c |  159 
> > ++-
> >  1 files changed, 131 insertions(+), 28 deletions(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index a8adf26..eaaf0bf 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -4217,6 +4217,113 @@ static noinline void schedule_bio(struct btrfs_root 
> > *root,
> >&device->work);
> >  }
> >  
> > +static int bio_size_ok(struct block_device *bdev, struct bio *bio,
> > +  sector_t sector)
> > +{
> > +   struct bio_vec *prev;
> > +   struct request_queue *q = bdev_get_queue(bdev);
> > +   unsigned short max_sectors = queue_max_sectors(q);
> > +   struct bvec_merge_data bvm = {
> > +   .bi_bdev = bdev,
> > +   .bi_sector = sector,
> > +   .bi_rw = bio->bi_rw,
> > +   };
> > +
> > +   if (bio->bi_vcnt == 0) {
> > +   WARN_ON(1);
> > +   return 1;
> > +   }
> > +
> > +   prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
> 
> I prefer 'last' for this.
> 
> 
> Others look good to me :)
> 
> Reviewed-by: Liu Bo 
> 

Heh I had made that change but apparently didn't commit it before I sent it out.
Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fill the global reserve when unpinning space

2012-10-22 Thread Josef Bacik
Dave gave me an image of a very full file system that would abort the
transaction because it ran out of space while committing the transaction.
This is because we would think there was plenty of room to create a snapshot
even though the global reserve was not full.  This happens because we
calculate the global reserve size before we unpin any space, so after we
unpin the space we allow reservations to occur even though we haven't
reserved all of the space for our global reserve.  Fix this by adding to the
global reserve while unpinning in order to make sure we always have enough
space to do our work.  With this patch we no longer end up with an aborted
transaction, we return ENOSPC properly to the person trying to create the
snapshot.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c |   29 -
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2136add..b765025 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4949,9 +4949,13 @@ static int unpin_extent_range(struct btrfs_root *root, 
u64 start, u64 end)
 {
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_block_group_cache *cache = NULL;
+   struct btrfs_space_info *space_info;
+   struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
u64 len;
+   bool readonly;
 
while (start <= end) {
+   readonly = false;
if (!cache ||
start >= cache->key.objectid + cache->key.offset) {
if (cache)
@@ -4969,15 +4973,30 @@ static int unpin_extent_range(struct btrfs_root *root, 
u64 start, u64 end)
}
 
start += len;
+   space_info = cache->space_info;
 
-   spin_lock(&cache->space_info->lock);
+   spin_lock(&space_info->lock);
spin_lock(&cache->lock);
cache->pinned -= len;
-   cache->space_info->bytes_pinned -= len;
-   if (cache->ro)
-   cache->space_info->bytes_readonly += len;
+   space_info->bytes_pinned -= len;
+   if (cache->ro) {
+   space_info->bytes_readonly += len;
+   readonly = true;
+   }
spin_unlock(&cache->lock);
-   spin_unlock(&cache->space_info->lock);
+   if (!readonly) {
+   spin_lock(&global_rsv->lock);
+   if (!global_rsv->full) {
+   len = min(len, global_rsv->size -
+ global_rsv->reserved);
+   global_rsv->reserved += len;
+   space_info->bytes_may_use += len;
+   if (global_rsv->reserved >= global_rsv->size)
+   global_rsv->full = 1;
+   }
+   spin_unlock(&global_rsv->lock);
+   }
+   spin_unlock(&space_info->lock);
}
 
if (cache)
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 01:36:31PM -0600, Chris Murphy wrote:
> On Oct 22, 2012, at 11:18 AM, Hugo Mills  wrote:
> > 
> >   It's more like a balance which moves everything that has some (part
> > of its) existence on a device. So when you have RAID-0 or RAID-1 data,
> > all of the related chunks on other disks get moved too (so in RAID-1,
> > it's the mirror chunk as well as the chunk on the removed disk that
> > gets rewritten).
> 
> Does this mean "device delete" depends on an ability to make writes
> to the device being removed? I immediately think of SSD failures,
> which seem to fail writing, while still being able to reliably read.
> Would that behavior inhibit the ability to remove the device from
> the volume?

   No, the device being removed isn't modified at all. (Which causes
its own set of weird problemettes, but I think most of those have gone
away).

> >> [ 2152.257163] btrfs: no missing devices found to remove
> >> 
> >> So they're missing but not missing?
> > 
> >   If you run sync, or wait for 30 seconds, you'll find that fi show
> > shows the correct information again -- btrfs fi show reads the
> > superblocks directly, and if you run it immediately after the dev del,
> > they've not been flushed back to disk yet.
>
> Even after an hour, btrfs fi show says there are missing devices.
> After mkfs.btrfs on that "missing" device, 'btrfs fi show' no longer
> shows the missing device message.

   Hmm. Someone had this on IRC yesterday. It sounds like something's
not properly destroying the superblock(s) on the removed device.

> >   I think we should probably default to single on multi-device
> > filesystems, not RAID-0, as this kind of problem bites a lot of
> > people, particularly when trying to drop the second disk in a pair.
> 
> I'm not thinking of an obvious advantage raid0 has over single other
> than performance. It seems the more common general purpose use case
> is better served by single, especially the likelihood of volumes
> being grown with arbitrary drive capacities.

   Indeed.

> I found this [1] thread discussing a case where a -d single volume
> is upgraded to the raid0 profile. I'm not finding this to be the
> case when trying it today. mkfs.btrfs on 1 drive, then adding a 2nd
> drive, produces:
> Data: total=8.00MB, used=128.00KB
> System, DUP: total=8.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=409.56MB, used=24.00KB
> Metadata: total=8.00MB, used=0.00
>
> This appears to retain the single profile. This is expected at this
> point? What I find a bit problematic is that metadata is still DUP
> rather than being automatically upgraded to raid1.

   Yes, the automatic single -> RAID-0 upgrade was fixed. If you
haven't run a balance on (at least) the metadata after adding the new
device, then you won't get the DUP -> RAID-1 upgrade on metadata. (I
can tell you haven't run the balance, because you still have the empty
single metadata chunk).

> What is the likelihood of a mkfs.btrfs 2+ device change in the
> default data profile from raid0 to single?

   Non-zero. I think it mostly just wants someone to write the patch,
and then beat off any resulting bikeshedding. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I spent most of my money on drink, women and fast cars. The ---   
  rest I wasted.  -- James Hunt  


signature.asc
Description: Digital signature


[PATCH] Btrfs: do not bug when we fail to commit the transaction

2012-10-22 Thread Josef Bacik
We BUG if we fail to commit the transaction when creating a snapshot, which
is just obnoxious.  Remove the BUG_ON().  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ioctl.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3b8b509..8fcf9a5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -571,7 +571,8 @@ static int create_snapshot(struct btrfs_root *root, struct 
dentry *dentry,
ret = btrfs_commit_transaction(trans,
   root->fs_info->extent_root);
}
-   BUG_ON(ret);
+   if (ret)
+   goto fail;
 
ret = pending_snapshot->error;
if (ret)
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: Use btrfs_update_inode_fallback when creating a snapshot

2012-10-22 Thread Josef Bacik
On a really full file system I was getting ENOSPC back from
btrfs_update_inode when trying to update the parent inode when creating a
snapshot.  Just use the fallback method so we can update the inode and not
have to worry about having a delayed ref.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/ctree.h   |2 ++
 fs/btrfs/inode.c   |7 +++
 fs/btrfs/transaction.c |2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 59141a4..b61d9b5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3344,6 +3344,8 @@ struct extent_map *btrfs_get_extent(struct inode *inode, 
struct page *page,
 int btrfs_update_inode(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  struct inode *inode);
+int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
+   struct btrfs_root *root, struct inode *inode);
 int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode);
 int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode);
 int btrfs_orphan_cleanup(struct btrfs_root *root);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4919cc3..8e78c6d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -95,8 +95,6 @@ static noinline int cow_file_range(struct inode *inode,
   struct page *locked_page,
   u64 start, u64 end, int *page_started,
   unsigned long *nr_written, int unlock);
-static noinline int btrfs_update_inode_fallback(struct btrfs_trans_handle 
*trans,
-   struct btrfs_root *root, struct inode *inode);
 
 static int btrfs_init_inode_security(struct btrfs_trans_handle *trans,
 struct inode *inode,  struct inode *dir,
@@ -3362,8 +3360,9 @@ noinline int btrfs_update_inode(struct btrfs_trans_handle 
*trans,
return btrfs_update_inode_item(trans, root, inode);
 }
 
-static noinline int btrfs_update_inode_fallback(struct btrfs_trans_handle 
*trans,
-   struct btrfs_root *root, struct inode *inode)
+noinline int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
+struct btrfs_root *root,
+struct inode *inode)
 {
int ret;
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 83aa617..4e1def4 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1198,7 +1198,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
btrfs_i_size_write(parent_inode, parent_inode->i_size +
 dentry->d_name.len * 2);
parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
-   ret = btrfs_update_inode(trans, parent_root, parent_inode);
+   ret = btrfs_update_inode_fallback(trans, parent_root, parent_inode);
if (ret)
btrfs_abort_transaction(trans, root, ret);
 fail:
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Chris Murphy

On Oct 22, 2012, at 11:04 AM, Goffredo Baroncelli  wrote:

> Which version of "btrfs" tool are you using ? There was a bug on this.
> Try the latest.

No idea.

On Oct 22, 2012, at 11:18 AM, Hugo Mills  wrote:
> 
>   It's more like a balance which moves everything that has some (part
> of its) existence on a device. So when you have RAID-0 or RAID-1 data,
> all of the related chunks on other disks get moved too (so in RAID-1,
> it's the mirror chunk as well as the chunk on the removed disk that
> gets rewritten).

Does this mean "device delete" depends on an ability to make writes to the 
device being removed? I immediately think of SSD failures, which seem to fail 
writing, while still being able to reliably read. Would that behavior inhibit 
the ability to remove the device from the volume?


>> [ 2152.257163] btrfs: no missing devices found to remove
>> 
>> So they're missing but not missing?
> 
>   If you run sync, or wait for 30 seconds, you'll find that fi show
> shows the correct information again -- btrfs fi show reads the
> superblocks directly, and if you run it immediately after the dev del,
> they've not been flushed back to disk yet.


Even after an hour, btrfs fi show says there are missing devices. After 
mkfs.btrfs on that "missing" device, 'btrfs fi show' no longer shows the 
missing device message.


>   I think we should probably default to single on multi-device
> filesystems, not RAID-0, as this kind of problem bites a lot of
> people, particularly when trying to drop the second disk in a pair.

I'm not thinking of an obvious advantage raid0 has over single other than 
performance. It seems the more common general purpose use case is better served 
by single, especially the likelihood of volumes being grown with arbitrary 
drive capacities.

I found this [1] thread discussing a case where a -d single volume is upgraded 
to the raid0 profile. I'm not finding this to be the case when trying it today. 
mkfs.btrfs on 1 drive, then adding a 2nd drive, produces:
Data: total=8.00MB, used=128.00KB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=409.56MB, used=24.00KB
Metadata: total=8.00MB, used=0.00

This appears to retain the single profile. This is expected at this point? What 
I find a bit problematic is that metadata is still DUP rather than being 
automatically upgraded to raid1.

What is the likelihood of a mkfs.btrfs 2+ device change in the default data 
profile from raid0 to single?


[1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16278--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5/6

2012-10-22 Thread Andreas Philipp

On 10/22/2012 06:05 PM, Hugo Mills wrote:
> On Mon, Oct 22, 2012 at 10:58:07AM -0500, Michael wrote:
>> Does anyone know when RAID 5/6 are planned to be included in the
>> Kernel?
>
> This is in the FAQ:
>
>
https://btrfs.wiki.kernel.org/index.php/FAQ#Can_I_use_RAID.5B56.5D_on_my_Btrfs_filesystem.3F
>
> Short answer: Not yet, probably soon.
>
>> I am starting to buy parts for my next computer and would very
>> much like to use BTRFS because I want a FS that can grow and also
>> recover from undetected read errors - it will be large enough that
>> these are possible. I'm hoping that it will be available for use in
>> the coming months.
>
> You can switch storage types on the fly, so you could at least
> start with RAID-1, and then restripe to RAID-5 (or -6) when it's
> stable enough for you. This assumes that you can manage to use RAID-1
> in the first place and expand later.
If you can live without redundancy at the beginning, you could also
start with RAID-0, and then restripe to RAID-5.

Andreas

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 10:42:18AM -0600, Chris Murphy wrote:
> Thanks for the response Hugo,
> 
> On Oct 22, 2012, at 3:19 AM, Hugo Mills  wrote:
> 
> >   I'm not entirely sure what's going on here(*), but it looks like an
> > awkward interaction between the unequal sizes of the devices, the fact
> > that three of them are very small, and the RAID-0/RAID-1 on
> > data/metadata respectively.
> 
> I'm fine accepting the devices are very small and the original file system 
> was packed completely full: to the point this is effectively sabotage. 
> 
> The idea was merely to test how a full (I was aiming more for 90%, not 97%, 
> oops) volume handles being migrated to a replacement disk, which I think for 
> a typical user would be larger not the same, knowing in advance that not all 
> of the space on the new disk is usable. And I was doing it at a one order 
> magnitude reduced scale for space consideration.
> 
> 
> >   You can't relocate any of the data chunks, because RAID-0 requires
> > at least two chunks, and all your data chunks are more than 50% full,
> > so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB
> > chunk on the remaining space on the small disk, which is the only way
> > it could proceed.
> 
> Interesting. So the way "device delete" moves extents is not at all similar 
> to how LVM pvmove moves extents, which is unidirectional (away from the 
> device being demoted). My, seemingly flawed, expectation was that "device 
> delete" would cause extents on the deleted device to be moved to the newly 
> added disk.

   It's more like a balance which moves everything that has some (part
of its) existence on a device. So when you have RAID-0 or RAID-1 data,
all of the related chunks on other disks get moved too (so in RAID-1,
it's the mirror chunk as well as the chunk on the removed disk that
gets rewritten).

> If I add yet another 12GB virtual disk, sdf, and then attempt a delete, it 
> works, no errors. Result:
> [root@f18v ~]# btrfs device delete /dev/sdb /mnt
> [root@f18v ~]# btrfs fi show
> failed to read /dev/sr0
> Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
>   Total devices 5 FS bytes used 7.52GB
>   devid5 size 12.00GB used 4.17GB path /dev/sdf
>   devid4 size 12.00GB used 4.62GB path /dev/sde
>   devid3 size 3.00GB used 2.68GB path /dev/sdd
>   devid2 size 3.00GB used 2.68GB path /dev/sdc
>   *** Some devices missing
> 
> However, I think that last line is a bug. When I
> 
> [root@f18v ~]# btrfs device delete missing /mnt
> 
> I get
> 
> [ 2152.257163] btrfs: no missing devices found to remove
> 
> So they're missing but not missing?

   If you run sync, or wait for 30 seconds, you'll find that fi show
shows the correct information again -- btrfs fi show reads the
superblocks directly, and if you run it immediately after the dev del,
they've not been flushed back to disk yet.

> > btrfs balance start -dconvert=single /mountpoint

> Yeah that's perhaps a better starting point for many regular Joe
> users setting up a multiple device btrfs volume, in particular where
> different sized disks can be anticipated.

   I think we should probably default to single on multi-device
filesystems, not RAID-0, as this kind of problem bites a lot of
people, particularly when trying to drop the second disk in a pair.

   In similar vein, I'd suggest that an automatic downgrade from
RAID-1 to DUP metadata on removing one device from a 2-device array
should also be done, but I suspect there's some good reasons for not
doing that, that I've not thought of. This has also bitten a lot of
people in the past.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- "There's more than one way to do it" is not a commandment. It ---  
   is a dire warning.


signature.asc
Description: Digital signature


Re: [PATCH] Btrfs: fix a tree mod bug while inserting a new root

2012-10-22 Thread Jan Schmidt
Hi liubo,

On Mon, October 22, 2012 at 16:02 (+0200), Liu Bo wrote:
> According to btree's balance algorithm, when we split a root into two parts,
> we insert a new one to be their parent:
> 
>  new root
> node A/  \
>   | x1 x2 x3 x4 x5 x6 |   =>  node A node A'
> | x1 x2 x3 - - - |  | x4 x5 x6 - - - |
>  split
> 
> The original root won't be freed because it becomes a child of the new root,
> and a move to keep balance is needed then.
> 
> So we should not add REMOVE_WHILE_FREEING keys for the old root, otherwise,
> we will hit use-after-free since we first add REMOVE_WHILE_FREEING keys and
> then add REMOVE keys, which is invalid.

I don't like adding another parameter there, the function is already confusing
without it. I've got a different fix for that problem here as well. I haven't
been sending it since Friday because there's at least one additional problem in
the tree mod log, and I'd like to see all of the issues fixed.

There's also a fix for double frees from push_node_left here. That one may be
fixing the other issue you're seeing (which I still cannot reproduce). I'm still
not convinced it's a good idea to change the semantics in del_ptr as done in
your previous patch set.

Probably we can try working together on irc in a more interactive fashion? Or
tell me if you want my patches anywhere before I send them out.

-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Goffredo Baroncelli
On 2012-10-22 18:42, Chris Murphy wrote:
> [root@f18v ~]# btrfs fi show
> failed to read /dev/sr0
> Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
>   Total devices 5 FS bytes used 7.52GB
>   devid5 size 12.00GB used 4.17GB path /dev/sdf
>   devid4 size 12.00GB used 4.62GB path /dev/sde
>   devid3 size 3.00GB used 2.68GB path /dev/sdd
>   devid2 size 3.00GB used 2.68GB path /dev/sdc
>   *** Some devices missing
> 
> However, I think that last line is a bug. When I


Which version of "btrfs" tool are you using ? There was a bug on this.
Try the latest.

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: don't check the permission of the subvolume which we want to delete

2012-10-22 Thread Goffredo Baroncelli
On 2012-10-22 13:38, Miao Xie wrote:
> Step to reproduce:
>  # mkfs.btrfs 
>  # mount -o user_subvol_rm_allowed  
>  # mkdir /dir0
>  # chmod 777 /dir0
>  # btrfs sub snap  /dir0/snap0
>  # su -c "btrfs sub del /dir0/snap0" -s /bin/bash nobody
>  ERROR: cannot delete '/dir0/snap0' - Permission denied
> 
> This is because we checked the permission of the subvolume that we want to
> delete, and found the user - nobody have no WRITE permission of this 
> subvolume.
> 
> I think we need not check the permission of the subvolume we want to delete,
> because we have the right to clean up the directory since we have WRITE and
> EXECUTE permission, just like rmdir command.

I think that removing a subvolume is a bit different than removing a
directory.
With "user_subvol_rm_allowed" allow an ordinary user to remove  things
that an plain "rm -rf" is not allowed.

For example if inside a directories tree there is a directory with
permission 700 (uid==root) an ordinary user is not able to remove this
directory.
Instead with the subvolumes (and the flag user_subvol_rm_allowed) an
ordinary user would be allowed to do it.
As mitigation it is required that user has WX permission on subvolume.
IIRC this is what we discussed at time.

See the thread "[PATCH] Btrfs: allow subvol deletion by unprivileged
user with -o user_subvol_rm_allowed"

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg06525.html


> 
> Signed-off-by: Miao Xie 
> ---
>  fs/btrfs/ioctl.c |4 
>  1 files changed, 0 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index f5a2e6c..29fb07c 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2062,10 +2062,6 @@ static noinline int btrfs_ioctl_snap_destroy(struct 
> file *file,
>   if (root == dest)
>   goto out_dput;
>  
> - err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
> - if (err)
> - goto out_dput;
> -
>   /* check if subvolume may be deleted by a non-root user */
>   err = btrfs_may_delete(dir, dentry, 1);
>   if (err)


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Chris Murphy

Thanks for the response Hugo,

On Oct 22, 2012, at 3:19 AM, Hugo Mills  wrote:

>   I'm not entirely sure what's going on here(*), but it looks like an
> awkward interaction between the unequal sizes of the devices, the fact
> that three of them are very small, and the RAID-0/RAID-1 on
> data/metadata respectively.

I'm fine accepting the devices are very small and the original file system was 
packed completely full: to the point this is effectively sabotage. 

The idea was merely to test how a full (I was aiming more for 90%, not 97%, 
oops) volume handles being migrated to a replacement disk, which I think for a 
typical user would be larger not the same, knowing in advance that not all of 
the space on the new disk is usable. And I was doing it at a one order 
magnitude reduced scale for space consideration.


>   You can't relocate any of the data chunks, because RAID-0 requires
> at least two chunks, and all your data chunks are more than 50% full,
> so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB
> chunk on the remaining space on the small disk, which is the only way
> it could proceed.

Interesting. So the way "device delete" moves extents is not at all similar to 
how LVM pvmove moves extents, which is unidirectional (away from the device 
being demoted). My, seemingly flawed, expectation was that "device delete" 
would cause extents on the deleted device to be moved to the newly added disk.

If I add yet another 12GB virtual disk, sdf, and then attempt a delete, it 
works, no errors. Result:
[root@f18v ~]# btrfs device delete /dev/sdb /mnt
[root@f18v ~]# btrfs fi show
failed to read /dev/sr0
Label: none  uuid: 6e96a96e-3357-4f23-b064-0f0713366d45
Total devices 5 FS bytes used 7.52GB
devid5 size 12.00GB used 4.17GB path /dev/sdf
devid4 size 12.00GB used 4.62GB path /dev/sde
devid3 size 3.00GB used 2.68GB path /dev/sdd
devid2 size 3.00GB used 2.68GB path /dev/sdc
*** Some devices missing

However, I think that last line is a bug. When I

[root@f18v ~]# btrfs device delete missing /mnt

I get

[ 2152.257163] btrfs: no missing devices found to remove

So they're missing but not missing?

> btrfs balance start -dconvert=single /mountpoint


Yeah that's perhaps a better starting point for many regular Joe users setting 
up a multiple device btrfs volume, in particular where different sized disks 
can be anticipated.


Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fits: tool to parse stream

2012-10-22 Thread Chris Mason
On Mon, Oct 22, 2012 at 12:41:49AM -0600, Arne Jansen wrote:
> On 15.10.2012 16:32, Chris Mason wrote:
> > On Sat, Oct 13, 2012 at 09:41:28AM -0600, David Sterba wrote:
> >> On Sat, Oct 13, 2012 at 09:08:57AM +0100, Rory Campbell-Lange wrote:
> >>> Perhaps "BTRFS Incremental Stream" or "Backup Incremental Stream" should
> >>> be considered, with the file extension ".bis".
> >>
> >> From the brainstorming we had about the name, the intention behind the
> >> stream is to be filesystem independent.
> >>
> >> So this might be named FIS with .fis extension, that is not far from the
> >> proposed name. However, this hasn't been approved by the Slang committee
> >> yet :)
> > 
> > urbandictionary has some entries for fis, although I'm not really sure
> > that matters.  .fs is still my favorite, or .fss if you want to have the
> > word stream in there.
> 
> We finally settled on 'far', Filesystem Agnostic Replication.

I like that one a lot.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: recheck bio against block device when we map the bio

2012-10-22 Thread Alex Elder
On 10/19/2012 04:01 PM, Josef Bacik wrote:
> Alex reported a problem where we were writing between chunks on a rbd
> device.  The thing is we do bio_add_page using logical offsets, but the
> physical offset may be different.  So when we map the bio now check to see
> if the bio is still ok with the physical offset, and if it is not split the
> bio up and redo the bio_add_page with the physical sector.  This fixes the
> problem for Alex and doesn't affect performance in the normal case.  Thanks,
> 
> Reported-and-tested-by: Alex Elder 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/volumes.c |  159 ++-
>  1 files changed, 131 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a8adf26..eaaf0bf 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4217,6 +4217,113 @@ static noinline void schedule_bio(struct btrfs_root 
> *root,
>  &device->work);
>  }
>  
> +static int bio_size_ok(struct block_device *bdev, struct bio *bio,
> +sector_t sector)
> +{
> + struct bio_vec *prev;
> + struct request_queue *q = bdev_get_queue(bdev);
> + unsigned short max_sectors = queue_max_sectors(q);
> + struct bvec_merge_data bvm = {
> + .bi_bdev = bdev,
> + .bi_sector = sector,
> + .bi_rw = bio->bi_rw,
> + };
> +
> + if (bio->bi_vcnt == 0) {
> + WARN_ON(1);
> + return 1;
> + }
> +
> + prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
> + if ((bio->bi_size >> 9) > max_sectors)
> + return 0;
> +
> + if (!q->merge_bvec_fn)
> + return 1;
> +
> + bvm.bi_size = bio->bi_size - prev->bv_len;
> + if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
> + return 0;

I wanted to mention one concern that occurred to me over the
weekend.

By checking only the last bio_vec in the bio here it's
conceivable you'll miss another boundary that sits earlier
in the bio.

We technically allow the boundaries of objects that back rbd
disk images to be as small as 4 KB (but by default it's 4 MB).
4 KB is really unlikely, but 256 KB (which is smaller than
128-entry bio) is less of a stretch.

I also came up with a different way of splitting that makes
it moot anyway, obviating the need for doing this check at
all so after a little more testing I'll get that posted.

-Alex


> + return 1;
> +}
> +
> +static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio 
> *bbio,
> +   struct bio *bio, u64 physical, int dev_nr,
> +   int rw, int async)
> +{
> + struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
> +
> + bio->bi_private = bbio;
> + bio->bi_private = merge_stripe_index_into_bio_private(
> + bio->bi_private, (unsigned int)dev_nr);
> + bio->bi_end_io = btrfs_end_bio;
> + bio->bi_sector = physical >> 9;
> +#ifdef DEBUG
> + {
> + struct rcu_string *name;
> +
> + rcu_read_lock();
> + name = rcu_dereference(dev->name);
> + pr_debug("btrfs_map_bio: rw %d, secor=%llu, dev=%lu "
> +  "(%s id %llu), size=%u\n", rw,
> +  (u64)bio->bi_sector, (u_long)dev->bdev->bd_dev,
> +  name->str, dev->devid, bio->bi_size);
> + rcu_read_unlock();
> + }
> +#endif
> + bio->bi_bdev = dev->bdev;
> + if (async)
> + schedule_bio(root, dev, rw, bio);
> + else
> + btrfsic_submit_bio(rw, bio);
> +}
> +
> +static int breakup_stripe_bio(struct btrfs_root *root, struct btrfs_bio 
> *bbio,
> +   struct bio *first_bio, struct btrfs_device *dev,
> +   int dev_nr, int rw, int async)
> +{
> + struct bio_vec *bvec = first_bio->bi_io_vec;
> + struct bio *bio;
> + int nr_vecs = bio_get_nr_vecs(dev->bdev);
> + u64 physical = bbio->stripes[dev_nr].physical;
> +
> +again:
> + bio = btrfs_bio_alloc(dev->bdev, physical >> 9, nr_vecs, GFP_NOFS);
> + if (!bio)
> + return -ENOMEM;
> +
> + while (bvec <= (first_bio->bi_io_vec + first_bio->bi_vcnt - 1)) {
> + if (bio_add_page(bio, bvec->bv_page, bvec->bv_len,
> +  bvec->bv_offset) < bvec->bv_len) {
> + u64 len = bio->bi_size;
> +
> + atomic_inc(&bbio->stripes_pending);
> + submit_stripe_bio(root, bbio, bio, physical, dev_nr,
> +   rw, async);
> + physical += len;
> + goto again;
> + }
> + bvec++;
> + }
> +
> + submit_stripe_bio(root, bbio, bio, physical, dev_nr, rw, async);
> + return 0;
> +}
> +
> +static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 log

Re: [PATCH] Btrfs: recheck bio against block device when we map the bio

2012-10-22 Thread Alex Elder
On 10/19/2012 09:31 PM, Liu Bo wrote:
>> +
>> > +  prev = &bio->bi_io_vec[bio->bi_vcnt - 1];
> I prefer 'last' for this.

I said exactly the same thing.

> Others look good to me :)
> 
> Reviewed-by: Liu Bo 

-Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RAID 5/6

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 10:58:07AM -0500, Michael wrote:
> Does anyone know when RAID 5/6 are planned to be included in the
> Kernel?

   This is in the FAQ:

https://btrfs.wiki.kernel.org/index.php/FAQ#Can_I_use_RAID.5B56.5D_on_my_Btrfs_filesystem.3F

   Short answer: Not yet, probably soon.

> I am starting to buy parts for my next computer and would very
> much like to use BTRFS because I want a FS that can grow and also
> recover from undetected read errors - it will be large enough that
> these are possible. I'm hoping that it will be available for use in
> the coming months.

   You can switch storage types on the fly, so you could at least
start with RAID-1, and then restripe to RAID-5 (or -6) when it's
stable enough for you. This assumes that you can manage to use RAID-1
in the first place and expand later.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- "There's more than one way to do it" is not a commandment. It ---  
   is a dire warning.


signature.asc
Description: Digital signature


RAID 5/6

2012-10-22 Thread Michael
Hello,
Does anyone know when RAID 5/6 are planned to be included in the
Kernel? I am starting to buy parts for my next computer and would very
much like to use BTRFS because I want a FS that can grow and also
recover from undetected read errors - it will be large enough that
these are possible. I'm hoping that it will be available for use in
the coming months.
Regards,
Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Weird Warning

2012-10-22 Thread Jérôme Poulin
Here is an excerpt of btrfsck:

# ./btrfsck /dev/nbd1
checking extents
corrupt extent record: key 314454016 168 524288
corrupt extent record: key 314978304 168 524288
corrupt extent record: key 315502592 168 524288
corrupt extent record: key 316026880 168 524288
corrupt extent record: key 316551168 168 524288
corrupt extent record: key 317075456 168 409600
corrupt extent record: key 317485056 168 372736
corrupt extent record: key 317857792 168 65536
corrupt extent record: key 317960192 168 1081344
ref mismatch on [289406976 36864] extent item 1, found 0
Incorrect local backref count on 289406976 root 257 owner
18446744073709551604 offset 0 found 0 wanted 1 back 0x2604be0
backpointer mismatch on [289406976 36864]
owner ref check failed [289406976 36864]
ref mismatch on [289443840 36864] extent item 1, found 0
Incorrect local backref count on 289443840 root 259 owner
18446744073709551604 offset 0 found 0 wanted 1 back 0x2604cc0
backpointer mismatch on [289443840 36864]
owner ref check failed [289443840 36864]
ref mismatch on [304230400 262144] extent item 1, found 0
Incorrect local backref count on 304230400 root 259 owner 261 offset
262144 found 0 wanted 1 back 0x2604da0
backpointer mismatch on [304230400 262144]
owner ref check failed [304230400 262144]
ref mismatch on [304492544 524288] extent item 1, found 0
Incorrect local backref count on 304492544 root 259 owner 261 offset
524288 found 0 wanted 1 back 0x2604e80
backpointer mismatch on [304492544 524288]
owner ref check failed [304492544 524288]
ref mismatch on [305016832 524288] extent item 1, found 0
Incorrect local backref count on 305016832 root 259 owner 261 offset
1048576 found 0 wanted 1 back 0x2604f60
backpointer mismatch on [305016832 524288]
owner ref check failed [305016832 524288]
ref mismatch on [305541120 524288] extent item 1, found 0
Incorrect local backref count on 305541120 root 259 owner 261 offset
1572864 found 0 wanted 1 back 0x2605040
backpointer mismatch on [305541120 524288]
owner ref check failed [305541120 524288]
ref mismatch on [313929728 524288] extent item 65536, found 1
ref mismatch on [314454016 524288] extent item 65536, found 1
Backref 314454016 root 259 owner 261 offset 10485760 num_refs 0 not
found in extent tree
Incorrect local backref count on 314454016 root 259 owner 261 offset
10485760 found 1 wanted 0 back 0x2600010
backpointer mismatch on [314454016 524288]
ref mismatch on [314978304 524288] extent item 65536, found 1
Backref 314978304 root 259 owner 261 offset 11010048 num_refs 0 not
found in extent tree
Incorrect local backref count on 314978304 root 259 owner 261 offset
11010048 found 1 wanted 0 back 0x26000f0
backpointer mismatch on [314978304 524288]
ref mismatch on [315502592 524288] extent item 281474976776192, found 1
Backref 315502592 root 259 owner 261 offset 11534336 num_refs 0 not
found in extent tree
Incorrect local backref count on 315502592 root 259 owner 261 offset
11534336 found 1 wanted 0 back 0x26001d0
backpointer mismatch on [315502592 524288]
ref mismatch on [316026880 524288] extent item 4294967296, found 1
Backref 316026880 root 259 owner 261 offset 12058624 num_refs 0 not
found in extent tree
Incorrect local backref count on 316026880 root 259 owner 261 offset
12058624 found 1 wanted 0 back 0x26002b0
backpointer mismatch on [316026880 524288]
ref mismatch on [316551168 524288] extent item 4294967296, found 1
Backref 316551168 root 259 owner 261 offset 12582912 num_refs 0 not
found in extent tree
Incorrect local backref count on 316551168 root 259 owner 261 offset
12582912 found 1 wanted 0 back 0x2600390
backpointer mismatch on [316551168 524288]
ref mismatch on [317075456 409600] extent item 4294967297, found 1
Backref 317075456 root 259 owner 261 offset 13107200 num_refs 0 not
found in extent tree
Incorrect local backref count on 317075456 root 259 owner 261 offset
13107200 found 1 wanted 0 back 0x2600470
backpointer mismatch on [317075456 409600]
ref mismatch on [317485056 372736] extent item 4294967296, found 1
Backref 317485056 root 259 owner 262 offset 0 num_refs 0 not found in
extent tree
Incorrect local backref count on 317485056 root 259 owner 262 offset 0
found 1 wanted 0 back 0x2600550
backpointer mismatch on [317485056 372736]
ref mismatch on [317857792 65536] extent item 4294967296, found 1
Backref 317857792 root 1 owner 258 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 317857792 root 1 owner 258 offset 0
found 1 wanted 0 back 0x2604b00
backpointer mismatch on [317857792 65536]
ref mismatch on [317960192 1081344] extent item 4294967296, found 1
Backref 317960192 root 259 owner 262 offset 368640 num_refs 0 not
found in extent tree
Incorrect local backref count on 317960192 root 259 owner 262 offset
368640 found 1 wanted 0 back 0x2600630
backpointer mismatch on [317960192 1081344]
ref mismatch on [539897856 8796093022208] extent item 0, found 1
Backref 539897856 root 259 owner 261 offset 18374686479673196544
num_refs 0 not found in extent tree
In

Re: Weird Warning

2012-10-22 Thread Jérôme Poulin
Tried with cmason 3.6.0 for-linus tree. Did a "find" on the BTRFS.

[  152.846595] [ cut here ]
[  152.848841] kernel BUG at fs/btrfs/extent-tree.c:1519!
[  152.848841] invalid opcode:  [#1] SMP
[  152.848841] Modules linked in:
[  152.848841] CPU 0
[  152.848841] Pid: 1256, comm: btrfs-transacti Not tainted 3.6.0+ #3
Bochs Bochs
[  152.848841] RIP: 0010:[]  []
lookup_inline_extent_backref+0x4c8/0x500
[  152.848841] RSP: 0018:880014987a20  EFLAGS: 00010246
[  152.848841] RAX: 0020 RBX: 880016d65580 RCX: 880016bcc000
[  152.848841] RDX: 2000 RSI: 880017ffd000 RDI: 880016d65580
[  152.848841] RBP: 00b2 R08: 2f94 R09: 8800149879e0
[  152.848841] R10:  R11: 0002 R12: 2fb1
[  152.848841] R13: 2f17 R14: 880016c6b090 R15: 2f94
[  152.848841] FS:  () GS:880017c0()
knlGS:
[  152.848841] CS:  0010 DS:  ES:  CR0: 8005003b
[  152.848841] CR2: ff600400 CR3: 164c2000 CR4: 06b0
[  152.848841] DR0:  DR1:  DR2: 
[  152.848841] DR3:  DR6: 0ff0 DR7: 0400
[  152.848841] Process btrfs-transacti (pid: 1256, threadinfo
880014986000, task 88001496c580)
[  152.848841] Stack:
[  152.848841]  3f9b 811aca93 
1000
[  152.848841]  880016417800 0035 
12f22000
[  152.848841]  880014987b40 880016de20b8 
00ff812150bd
[  152.848841] Call Trace:
[  152.848841]  [] ? leaf_space_used+0x113/0x160
[  152.848841]  [] ? __btrfs_free_extent+0xcd/0x890
[  152.848841]  [] ? btrfs_merge_delayed_refs+0x21a/0x2f0
[  152.848841]  [] ? run_clustered_refs+0x307/0xa90
[  152.848841]  [] ? btrfs_tree_unlock+0x1e/0xc0
[  152.848841]  [] ? btrfs_run_delayed_refs+0xca/0x2f0
[  152.848841]  [] ?
btrfs_write_dirty_block_groups+0x15f/0x5f0
[  152.848841]  [] ? commit_cowonly_roots+0x127/0x1e8
[  152.848841]  [] ? btrfs_commit_transaction+0x54e/0xa50
[  152.848841]  [] ? abort_exclusive_wait+0xb0/0xb0
[  152.848841]  [] ? start_transaction+0x78/0x390
[  152.848841]  [] ? usleep_range+0x40/0x40
[  152.848841]  [] ? transaction_kthread+0x1a5/0x1d0
[  152.848841]  [] ? cleaner_kthread+0xd0/0xd0
[  152.848841]  [] ? kthread+0x85/0x90
[  152.848841]  [] ? kernel_thread_helper+0x4/0x10
[  152.848841]  [] ? kthread_worker_fn+0x110/0x110
[  152.848841]  [] ? gs_change+0x13/0x13
[  152.848841] Code: ff 4d 89 f7 31 c0 4c 8b 74 24 48 e9 78 ff ff ff
0f 1f 00 b8 0d 00 00 00 e9 32 fd ff ff 66 0f 1f 44 00 00 a8 01 0f 85
dc fc ff ff <0f> 0b 66 0f 1f 44 00 00 c7 44 24 1c 0d 00 00 00 e9 9e fb
ff ff
[  152.848841] RIP  []
lookup_inline_extent_backref+0x4c8/0x500
[  152.848841]  RSP 
[  152.950405] ---[ end trace b34b153b89ec3692 ]---

On Fri, Oct 19, 2012 at 6:13 PM, cwillu  wrote:
> On Fri, Oct 19, 2012 at 3:51 PM, Jérôme Poulin  wrote:
>> After updating to 3.5.5, I get thi on boot and listing some dir freezes.
>> I don't have anything important on that volume but I'm willing to
>> debug the problem if needed.  Would I need a more recent kernel?
>
> Probably worth trying 3.7-rc1, or at least cmason's for-linus (which
> is 3.6.0 + the btrfs changes that went into 3.7).
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 21/22] btrfs: add support for read_iter and write_iter

2012-10-22 Thread Dave Kleikamp
btrfs can use generic_file_read_iter(). Base btrfs_file_write_iter()
on btrfs_file_aio_write(), then have the latter call the former.

Signed-off-by: Dave Kleikamp 
Cc: Zach Brown 
Cc: Chris Mason 
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/file.c | 55 ++-
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9ab1bed..576d2f0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1361,27 +1361,23 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,
 }
 
 static ssize_t __btrfs_direct_write(struct kiocb *iocb,
-   const struct iovec *iov,
-   unsigned long nr_segs, loff_t pos,
-   loff_t *ppos, size_t count, size_t ocount)
+struct iov_iter *iter, loff_t pos,
+   loff_t *ppos, size_t count)
 {
struct file *file = iocb->ki_filp;
-   struct iov_iter i;
ssize_t written;
ssize_t written_buffered;
loff_t endbyte;
int err;
 
-   written = generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos,
-   count, ocount);
+   written = generic_file_direct_write_iter(iocb, iter, pos, ppos, count);
 
if (written < 0 || written == count)
return written;
 
pos += written;
count -= written;
-   iov_iter_init(&i, iov, nr_segs, count, written);
-   written_buffered = __btrfs_buffered_write(file, &i, pos);
+   written_buffered = __btrfs_buffered_write(file, iter, pos);
if (written_buffered < 0) {
err = written_buffered;
goto out;
@@ -1398,9 +1394,8 @@ out:
return written ? written : err;
 }
 
-static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
-   const struct iovec *iov,
-   unsigned long nr_segs, loff_t pos)
+static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
+struct iov_iter *iter, loff_t pos)
 {
struct file *file = iocb->ki_filp;
struct inode *inode = fdentry(file)->d_inode;
@@ -1409,18 +1404,13 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
u64 start_pos;
ssize_t num_written = 0;
ssize_t err = 0;
-   size_t count, ocount;
+   size_t count;
 
sb_start_write(inode->i_sb);
 
mutex_lock(&inode->i_mutex);
 
-   err = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);
-   if (err) {
-   mutex_unlock(&inode->i_mutex);
-   goto out;
-   }
-   count = ocount;
+   count = iov_iter_count(iter);
 
current->backing_dev_info = inode->i_mapping->backing_dev_info;
err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode));
@@ -1468,14 +1458,10 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
}
 
if (unlikely(file->f_flags & O_DIRECT)) {
-   num_written = __btrfs_direct_write(iocb, iov, nr_segs,
-  pos, ppos, count, ocount);
+   num_written = __btrfs_direct_write(iocb, iter, pos, ppos,
+  count);
} else {
-   struct iov_iter i;
-
-   iov_iter_init(&i, iov, nr_segs, count, num_written);
-
-   num_written = __btrfs_buffered_write(file, &i, pos);
+   num_written = __btrfs_buffered_write(file, iter, pos);
if (num_written > 0)
*ppos = pos + num_written;
}
@@ -1506,6 +1492,23 @@ out:
return num_written ? num_written : err;
 }
 
+static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
+   const struct iovec *iov,
+   unsigned long nr_segs, loff_t pos)
+{
+   struct iov_iter i;
+   int ret;
+   size_t count;
+
+   ret = generic_segment_checks(iov, &nr_segs, &count, VERIFY_WRITE);
+   if (ret)
+   return ret;
+
+   iov_iter_init(&i, iov, nr_segs, count, 0);
+
+   return btrfs_file_write_iter(iocb, &i, pos);
+}
+
 int btrfs_release_file(struct inode *inode, struct file *filp)
 {
/*
@@ -2282,7 +2285,9 @@ const struct file_operations btrfs_file_operations = {
.write  = do_sync_write,
.aio_read   = generic_file_aio_read,
.splice_read= generic_file_splice_read,
+   .read_iter  = generic_file_read_iter,
.aio_write  = btrfs_file_aio_write,
+   .write_iter = btrfs_file_write_iter,
.mmap   = btrfs_file_mmap,
.open   = generic_file_open,
.release= btrfs_release_file,
-- 
1.7.12.3

--
To unsubscribe from this list: send the line "unsubscribe linux-bt

[PATCH] Btrfs: fix a tree mod bug while inserting a new root

2012-10-22 Thread Liu Bo
According to btree's balance algorithm, when we split a root into two parts,
we insert a new one to be their parent:

 new root
node A/  \
  | x1 x2 x3 x4 x5 x6 |   =>  node A node A'
| x1 x2 x3 - - - |  | x4 x5 x6 - - - |
 split

The original root won't be freed because it becomes a child of the new root,
and a move to keep balance is needed then.

So we should not add REMOVE_WHILE_FREEING keys for the old root, otherwise,
we will hit use-after-free since we first add REMOVE_WHILE_FREEING keys and
then add REMOVE keys, which is invalid.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c |   16 +---
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b334362..26987ef 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -639,7 +639,8 @@ __tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, 
struct extent_buffer *eb)
 static noinline int
 tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
 struct extent_buffer *old_root,
-struct extent_buffer *new_root, gfp_t flags)
+struct extent_buffer *new_root,
+gfp_t flags, int free_old)
 {
struct tree_mod_elem *tm;
int ret;
@@ -647,7 +648,8 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
if (tree_mod_dont_log(fs_info, NULL))
return 0;
 
-   __tree_mod_log_free_eb(fs_info, old_root);
+   if (free_old)
+   __tree_mod_log_free_eb(fs_info, old_root);
 
ret = tree_mod_alloc(fs_info, flags, &tm);
if (ret < 0)
@@ -797,11 +799,11 @@ tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, 
struct extent_buffer *eb)
 
 static noinline void
 tree_mod_log_set_root_pointer(struct btrfs_root *root,
- struct extent_buffer *new_root_node)
+ struct extent_buffer *new_root_node, int free_old)
 {
int ret;
ret = tree_mod_log_insert_root(root->fs_info, root->node,
-  new_root_node, GFP_NOFS);
+  new_root_node, GFP_NOFS, free_old);
BUG_ON(ret < 0);
 }
 
@@ -1029,7 +1031,7 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
parent_start = 0;
 
extent_buffer_get(cow);
-   tree_mod_log_set_root_pointer(root, cow);
+   tree_mod_log_set_root_pointer(root, cow, 1);
rcu_assign_pointer(root->node, cow);
 
btrfs_free_tree_block(trans, root, buf, parent_start,
@@ -1725,7 +1727,7 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
goto enospc;
}
 
-   tree_mod_log_set_root_pointer(root, child);
+   tree_mod_log_set_root_pointer(root, child, 1);
rcu_assign_pointer(root->node, child);
 
add_root_to_dirty_list(root);
@@ -3107,7 +3109,7 @@ static noinline int insert_new_root(struct 
btrfs_trans_handle *trans,
btrfs_mark_buffer_dirty(c);
 
old = root->node;
-   tree_mod_log_set_root_pointer(root, c);
+   tree_mod_log_set_root_pointer(root, c, 0);
rcu_assign_pointer(root->node, c);
 
/* the super has an extra ref to root->node */
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] filter snapshot(s) by its parent uuid

2012-10-22 Thread Lenz Grimmer
On 10/19/2012 10:37 AM, Rory Campbell-Lange wrote:

> From my perspective as a user I would be grateful if the following changes in
> syntax for listing subvolumes could be considered:
> 
> In addition to 
> 
> btrfs subvolume list [-apurts] [-g [+|-]value] [-c [+|-]value] 
> [--sort=gen,ogen,rootid,path] 
> btrfs su list [-apurts] [-g [+|-]value] [-c [+|-]value] 
> [--sort=gen,ogen,rootid,path] 
> List subvolumes (and snapshots)
> (and Anand's patch to allow /)
> 
> I believe the following shortcuts may be useful
> 
> btrfs subvolumes [-apurts] [-g [+|-]value] [-c [+|-]value] 
> [--sort=gen,ogen,rootid,path] []
> btrfs sl [-apurts] [-g [+|-]value] [-c [+|-]value] 
> [--sort=gen,ogen,rootid,path] []
> 
> Although from a technical perspective 'subvolume list' may be logical, listing
> subvolumes (plural) seems like a different sort of operation to those relating
> to a subvolume (singular) for create/alter/show/delete/snapshot operations.

+1 to this proposal (FWIW) - thanks for making the user interface more
intuitive :)

-- 
  Lenz Grimmer  - http://www.lenzg.net/



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/2] Btrfs: do not delete a subvolume which is in a R/O subvolume

2012-10-22 Thread cwillu
On Mon, Oct 22, 2012 at 5:39 AM, Miao Xie  wrote:
> Step to reproduce:
>  # mkfs.btrfs 
>  # mount  
>  # btrfs sub create /subv0
>  # btrfs sub snap  /subv0/snap0
>  # change /subv0 from R/W to R/O
>  # btrfs sub del /subv0/snap0
>
> We deleted the snapshot successfully. I think we should not be able to delete
> the snapshot since the parent subvolume is R/O.

snap0 isn't read-only in that case, right?  From a user interaction
standpoint, this seems like it just forces a user to rm -rf rather
btrfs sub del, which strikes me as a bit ham-handed when all we really
care about is leaving a (the?) directory entry where snap0 used to be.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Btrfs: do not delete a subvolume which is in a R/O subvolume

2012-10-22 Thread Miao Xie
Step to reproduce:
 # mkfs.btrfs 
 # mount  
 # btrfs sub create /subv0
 # btrfs sub snap  /subv0/snap0
 # change /subv0 from R/W to R/O
 # btrfs sub del /subv0/snap0

We deleted the snapshot successfully. I think we should not be able to delete
the snapshot since the parent subvolume is R/O.

Signed-off-by: Miao Xie 
---
 fs/btrfs/ioctl.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 29fb07c..54c56c7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2061,13 +2061,13 @@ static noinline int btrfs_ioctl_snap_destroy(struct 
file *file,
err = -EINVAL;
if (root == dest)
goto out_dput;
-
-   /* check if subvolume may be deleted by a non-root user */
-   err = btrfs_may_delete(dir, dentry, 1);
-   if (err)
-   goto out_dput;
}
 
+   /* check if subvolume may be deleted by a user */
+   err = btrfs_may_delete(dir, dentry, 1);
+   if (err)
+   goto out_dput;
+
if (btrfs_ino(inode) != BTRFS_FIRST_FREE_OBJECTID) {
err = -EINVAL;
goto out_dput;
-- 
1.7.6.5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: don't check the permission of the subvolume which we want to delete

2012-10-22 Thread Miao Xie
Step to reproduce:
 # mkfs.btrfs 
 # mount -o user_subvol_rm_allowed  
 # mkdir /dir0
 # chmod 777 /dir0
 # btrfs sub snap  /dir0/snap0
 # su -c "btrfs sub del /dir0/snap0" -s /bin/bash nobody
 ERROR: cannot delete '/dir0/snap0' - Permission denied

This is because we checked the permission of the subvolume that we want to
delete, and found the user - nobody have no WRITE permission of this subvolume.

I think we need not check the permission of the subvolume we want to delete,
because we have the right to clean up the directory since we have WRITE and
EXECUTE permission, just like rmdir command.

Signed-off-by: Miao Xie 
---
 fs/btrfs/ioctl.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f5a2e6c..29fb07c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2062,10 +2062,6 @@ static noinline int btrfs_ioctl_snap_destroy(struct file 
*file,
if (root == dest)
goto out_dput;
 
-   err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
-   if (err)
-   goto out_dput;
-
/* check if subvolume may be deleted by a non-root user */
err = btrfs_may_delete(dir, dentry, 1);
if (err)
-- 
1.7.6.5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: device delete, error removing device

2012-10-22 Thread Hugo Mills
On Mon, Oct 22, 2012 at 12:02:08AM -0600, Chris Murphy wrote:
> 
> On Oct 21, 2012, at 10:32 PM, Chris Murphy  wrote:
> 
> > This is stock Fedora 18 beta kernel, 3.6.1-1.fc18.x86_64 #1 SMP Mon Oct 8 
> > 17:19:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
> 
> Probably not a good idea to omit this is a beta *test candidate* not a beta. 
> 
> Two things that make this possibly not realistic:
> 
> 1. The virtual disks are obviously very small, 3GB each with the 4th one only 
> 12GB.
> 
> 2. The original 3 device volume was ~97% full with a single large file prior 
> to adding the 4th device. Approximately 313MB free space remained on the 
> volume.

   I'm not entirely sure what's going on here(*), but it looks like an
awkward interaction between the unequal sizes of the devices, the fact
that three of them are very small, and the RAID-0/RAID-1 on
data/metadata respectively.

   You can't relocate any of the data chunks, because RAID-0 requires
at least two chunks, and all your data chunks are more than 50% full,
so it can't put one 0.55 GiB chunk on the big disk and one 0.55 GiB
chunk on the remaining space on the small disk, which is the only way
it could proceed.

   You _may_ be able to get some more success by changing the data to
"single":

# btrfs balance start -dconvert=single /mountpoint

   You may also possibly be able to reclaim some metadata space with:

# btrfs balance start -m /mountpoint

but I think that's unlikely.

   Hugo.

(*) It may be an as-yet-undiscovered reservation problem, in which
case you get to see Josef scream loudly and hide under his desk,
gibbering.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- If it ain't broke,  hit it again. ---


signature.asc
Description: Digital signature


problem replacing failing drive

2012-10-22 Thread sam tygier
hi,

I have a 2 drive btrfs raid set up. It was created first with a single drive, 
and then adding a second and doing
btrfs fi balance start -dconvert=raid1 /data

the original drive is showing smart errors so i want to replace it. i dont 
easily have space in my desktop for an extra disk, so i decided to proceed by 
shutting down. taking out the old failing drive and putting in the new drive. 
this is similar to the description at
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_Failed_Devices
(the other reason to try this is to simulate what would happen if a drive did 
completely fail).

so after swapping the drives and rebooting, i try to mount as degraded. i 
instantly get a kernel panic, 
http://www.hep.man.ac.uk/u/sam/pub/IMG_5397_crop.png

so far all this has been with 3.5 kernel. so i upgraded to 3.6.2 and tried to 
mount degraded again.

first with just sudo mount /dev/sdd2 /mnt, then with sudo mount -o degraded 
/dev/sdd2 /mnt

[  582.535689] device label bdata devid 1 transid 25342 /dev/sdd2
[  582.536196] btrfs: disk space caching is enabled
[  582.536602] btrfs: failed to read the system array on sdd2
[  582.536860] btrfs: open_ctree failed
[  606.784176] device label bdata devid 1 transid 25342 /dev/sdd2
[  606.784647] btrfs: allowing degraded mounts
[  606.784650] btrfs: disk space caching is enabled
[  606.785131] btrfs: failed to read chunk root on sdd2
[  606.785331] btrfs warning page private not zero on page 392922368
[  606.785408] btrfs: open_ctree failed
[  782.422959] device label bdata devid 1 transid 25342 /dev/sdd2

no panic is good progress, but something is still not right.

my options would seem to be
1) reconnect old drive (probably in a USB caddy), see if it mounts as if 
nothing ever happened. or possibly try and recover it back to a working raid1. 
then try again with adding the new drive first, then removing the old one.
2) give up experimenting and create a new btrfs raid1, and restore from backup

both leave me with a worry about what would happen if a disk in a raid 1 did 
die. (unless is was the panic that did some damage that borked the filesystem.)

thanks.

sam

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html