Re: btrfs filesystem corruptions with 4.18 git kernels

2018-07-20 Thread Christian Kujau
On Fri, 20 Jul 2018, Alexander Wetzel wrote:
> [  979.223808] BTRFS: error (device sdc2) in __btrfs_cow_block:1080: errno=-5 IO failure

Are there no other messages in syslog? "IO failure" (from 
fs/btrfs/super.c:75) sounds like a problem with the underlying device. 
Maybe try w/o the "discard" mount option? Does "cat /dev/sdc2 > /dev/null" 
complete w/o errors?
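
For reference, a possible sequence along those lines (device names taken from
the report; the remount step assumes the filesystem is still mountable):

  cat /dev/sdc2 > /dev/null           # read the whole device, watch for I/O errors
  dmesg | grep -i 'sdc\|i/o error'    # any block-layer errors logged?
  smartctl -a /dev/sdc                # SMART health of the underlying disk
  mount -o remount,nodiscard /mnt     # drop "discard" without unmounting, if accepted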

C.
-- 
BOFH excuse #184:

loop found in loop in redundant loopback
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs send -p

2017-09-25 Thread Christian Brauner
Hi guys,


It seems that btrfs v4.12.1 allows:

(1) btrfs send -p <parent> <subvol>

but disallows

(2) btrfs send <subvol> -p <parent>

Code-wise it assumes that <subvol> is always found at optind == 1. I was
about to patch this but I'm not sure which way we'd like to go with this:
either properly allow only (1) and reject (2), or accept both. In any case,
this seems to me like a pretty serious regression and we'd like to get it
settled soon. :)
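
For illustration, a minimal sketch of the optind handling in question
(hypothetical stand-alone code, not the actual btrfs-progs source): glibc's
getopt() permutes argv so that non-option arguments end up after the options,
which is why <subvol> should be read from argv[optind] after parsing instead
of being assumed to sit at a fixed index.

  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char *argv[])
  {
      const char *parent = NULL;
      int c;

      while ((c = getopt(argc, argv, "p:")) != -1) {
          if (c == 'p')
              parent = optarg;
          else
              return 1; /* unknown option */
      }
      if (optind >= argc) {
          fprintf(stderr, "usage: %s [-p <parent>] <subvol>\n", argv[0]);
          return 1;
      }
      /* getopt() permuted argv, so both call styles end up here. */
      printf("subvol=%s parent=%s\n", argv[optind], parent ? parent : "(none)");
      return 0;
  }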

Thanks!
Christian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Btrfs data recovery

2017-08-13 Thread Christian Rene Thelen
I have formatted an encrypted disk containing an LVM volume with a btrfs filesystem.

All superblocks appear to be destroyed; the btrfs-progs tools can't find the 
root tree anymore and scalpel, binwalk, foremost & co return only scrap. The 
filesystem was on an ssd and mounted with -o compression=lzo.

How screwed am I? Any chances to recover some files? Is there a plausible way 
to rebuild the superblock manually? Checking the raw image with xxd gives me 
not a single readable word.

I managed to decrypt the LV and dd it to an image. What can I do?
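
For anyone searching later, a plausible first pass with btrfs-progs, assuming
the decrypted image sits at /tmp/backup.img (paths illustrative; btrfs keeps
superblock copies at 64KiB, 64MiB and 256GiB, and if the format overwrote all
of them there is little left to find):

  losetup -fP --show /tmp/backup.img         # expose the image as e.g. /dev/loop0
  btrfs rescue super-recover -v /dev/loop0   # try to restore from superblock mirrors
  btrfs-find-root /dev/loop0                 # scan for surviving tree roots
  btrfs restore -v -t <bytenr> /dev/loop0 /recovery   # read-only extraction from a found root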
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] btrfs-progs: send: fail on first -ENODATA only

2017-05-01 Thread Christian Brauner
Hi,

The original bug-reporter verified that my patch fixes the bug. See

https://bugzilla.kernel.org/show_bug.cgi?id=195597

Christian

On Sat, Apr 29, 2017 at 11:54:05PM +0200, Christian Brauner wrote:
> Returning -ENODATA is only considered invalid on the first run of the loop.
> 
> Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
> ---
>  cmds-receive.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/cmds-receive.c b/cmds-receive.c
> index b59f00e4..72e9c8f3 100644
> --- a/cmds-receive.c
> +++ b/cmds-receive.c
> @@ -1091,6 +1091,7 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
>   char *dest_dir_full_path;
>   char root_subvol_path[PATH_MAX];
>   int end = 0;
> + int iterations = 0;
>  
>   dest_dir_full_path = realpath(tomnt, NULL);
>   if (!dest_dir_full_path) {
> @@ -1198,13 +1199,18 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
>rctx,
>rctx->honor_end_cmd,
>max_errors);
> - if (ret < 0 && ret == -ENODATA) {
> + if (ret < 0) {
> + if (ret != -ENODATA)
> + goto out;
> +
>   /* Empty stream is invalid */
> - error("empty stream is not considered valid");
> - ret = -EINVAL;
> - goto out;
> - } else if (ret < 0) {
> - goto out;
> + if (iterations == 0) {
> + error("empty stream is not considered valid");
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + ret = 1;
>   }
>   if (ret > 0)
>   end = 1;
> @@ -1213,6 +1219,8 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
>   ret = finish_subvol(rctx);
>   if (ret < 0)
>   goto out;
> +
> + iterations++;
>   }
>   ret = 0;
>  
> -- 
> 2.11.0
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] btrfs-progs: send: fail on first -ENODATA only

2017-04-29 Thread Christian Brauner
Returning -ENODATA is only considered invalid on the first run of the loop.

Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
---
 cmds-receive.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index b59f00e4..72e9c8f3 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -1091,6 +1091,7 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
char *dest_dir_full_path;
char root_subvol_path[PATH_MAX];
int end = 0;
+   int iterations = 0;
 
dest_dir_full_path = realpath(tomnt, NULL);
if (!dest_dir_full_path) {
@@ -1198,13 +1199,18 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
 rctx,
 rctx->honor_end_cmd,
 max_errors);
-   if (ret < 0 && ret == -ENODATA) {
+   if (ret < 0) {
+   if (ret != -ENODATA)
+   goto out;
+
/* Empty stream is invalid */
-   error("empty stream is not considered valid");
-   ret = -EINVAL;
-   goto out;
-   } else if (ret < 0) {
-   goto out;
+   if (iterations == 0) {
+   error("empty stream is not considered valid");
+   ret = -EINVAL;
+   goto out;
+   }
+
+   ret = 1;
}
if (ret > 0)
end = 1;
@@ -1213,6 +1219,8 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
ret = finish_subvol(rctx);
if (ret < 0)
goto out;
+
+   iterations++;
}
ret = 0;
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/1] btrfs-progs: send: fail on first -ENODATA only

2017-04-29 Thread Christian Brauner
Christian Brauner (1):
  btrfs-progs: send: fail on first -ENODATA only

 cmds-receive.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v2] btrfs-progs: fix btrfs send & receive with -e flag

2017-04-28 Thread Christian Brauner
Hi,

On Fri, Apr 28, 2017 at 02:55:31PM +0530, Lakshmipathi.G wrote:
> Seems like a user reported an issue with this patch. Please check
> https://bugzilla.kernel.org/show_bug.cgi?id=195597

I can take a look. What I'm wondering about is why it fails only in the
HDD-to-SSD case. If -ENODATA is returned with this patch, it should mean that
there was no header data. So is the user sure that this doesn't indicate a
genuine error?

Christian

> 
> 
> Cheers,
> Lakshmipathi.G
> 
> 
> On Tue, Apr 4, 2017 at 1:51 AM, Christian Brauner
> <christian.brau...@ubuntu.com> wrote:
> > The old check here tried to ensure that empty streams are not considered valid.
> > The old check however, will always fail when only one run through the while(1)
> > loop is needed and honor_end_cmd is set. So this:
> >
> > btrfs send /some/subvol | btrfs receive -e /some/
> >
> > will consistently fail because -e causes honor_end_cmd to be set and
> > btrfs_read_and_process_send_stream() to correctly return 1. So the command will
> > be successful but btrfs receive will error out because the send - receive
> > concluded in one run through the while(1) loop.
> >
> > If we want to exclude empty streams we need a way to tell the difference between
> > btrfs_read_and_process_send_stream() returning 1 because read_buf() did not
> > detect any data and read_and_process_cmd() returning 1 because honor_end_cmd was
> > set. Without introducing too many changes the best way to me seems to have
> > btrfs_read_and_process_send_stream() return -ENODATA in the first case. The rest
> > stays the same. We can then check for -ENODATA in do_receive() and report a
> > proper error in this case. This should also be backwards compatible to previous
> > versions of btrfs receive. They will fail on empty streams because a negative
> > value is returned. The only thing that they will lack is a nice error message.
> >
> > Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
> > ---
> > Changelog: 2017-04-03
> > - no changes
> > ---
> >  cmds-receive.c | 13 +
> >  send-stream.c  |  2 +-
> >  2 files changed, 6 insertions(+), 9 deletions(-)
> >
> > diff --git a/cmds-receive.c b/cmds-receive.c
> > index 6cf22637..b59f00e4 100644
> > --- a/cmds-receive.c
> > +++ b/cmds-receive.c
> > @@ -1091,7 +1091,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
> > char *dest_dir_full_path;
> > char root_subvol_path[PATH_MAX];
> > int end = 0;
> > -   int count;
> >
> > dest_dir_full_path = realpath(tomnt, NULL);
> > if (!dest_dir_full_path) {
> > @@ -1186,7 +1185,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
> > if (ret < 0)
> > goto out;
> >
> > -   count = 0;
> > while (!end) {
> > if (rctx->cached_capabilities_len) {
> > if (g_verbose >= 3)
> > @@ -1200,16 +1198,15 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
> >  rctx,
> >  rctx->honor_end_cmd,
> >  max_errors);
> > -   if (ret < 0)
> > -   goto out;
> > -   /* Empty stream is invalid */
> > -   if (ret && count == 0) {
> > +   if (ret < 0 && ret == -ENODATA) {
> > +   /* Empty stream is invalid */
> > error("empty stream is not considered valid");
> > ret = -EINVAL;
> > goto out;
> > +   } else if (ret < 0) {
> > +   goto out;
> > }
> > -   count++;
> > -   if (ret)
> > +   if (ret > 0)
> > end = 1;
> >
> > close_inode_for_write(rctx);
> > diff --git a/send-stream.c b/send-stream.c
> > index 5a028cd9..78f2571a 100644
> > --- a/send-stream.c
> > +++ b/send-stream.c
> > @@ -492,7 +492,7 @@ int btrfs_read_and_process_send_stream(int fd,
> > if (ret < 0)
> > goto out;
> > if (ret) {
> > -   ret = 1;
> > +   ret = -ENODATA;
> > goto out;
> > }
> >
> > --
> > 2.11.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Prevent escaping btrfs quota

2017-04-21 Thread Christian Brauner
Hi guys,

If a qgroup is created for a btrfs subvolume /some/path and limits are set, and
a new btrfs subvolume /some/path/bla is then created, it does not inherit the
parent subvolume's /some/path qgroup and limits. The only way to achieve
something similar is to create a common "parent" qgroup and assign both the
parent btrfs subvolume /some/path and the child subvolume /some/path/bla to
this qgroup. This seems unintuitive and actually proves to be a problem. For
example, when creating a container that uses a btrfs subvolume as storage
backend and then allowing users to create additional subvolumes in another
container, they can easily escape the qgroup and its limits, and even with the
common qgroup nothing seems to prevent them from simply removing themselves
from it. What is the proper way to deal with this case (see the sketch below)
such that:
a) the subvolume automatically gets the same qgroup as the parent
b) the subvolume cannot escape the qgroup

where b) is the more pressing concern.
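
For reference, the common-parent workaround described above, as a sketch
(the qgroup IDs are examples and have to exist on the filesystem):

  btrfs qgroup create 1/100 /some/path           # higher-level "parent" qgroup
  btrfs qgroup limit 10G 1/100 /some/path        # limit applies to the group as a whole
  btrfs qgroup assign 0/258 1/100 /some/path     # level-0 qgroup of /some/path
  btrfs qgroup assign 0/259 1/100 /some/path     # level-0 qgroup of /some/path/bla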

Thanks!
Christian




[PATCH 1/2 v2] btrfs-progs: fix btrfs send & receive with -e flag

2017-04-03 Thread Christian Brauner
The old check here tried to ensure that empty streams are not considered valid.
The old check however, will always fail when only one run through the while(1)
loop is needed and honor_end_cmd is set. So this:

btrfs send /some/subvol | btrfs receive -e /some/

will consistently fail because -e causes honor_end_cmd to be set and
btrfs_read_and_process_send_stream() to correctly return 1. So the command will
be successful but btrfs receive will error out because the send - receive
concluded in one run through the while(1) loop.

If we want to exclude empty streams we need a way to tell the difference between
btrfs_read_and_process_send_stream() returning 1 because read_buf() did not
detect any data and read_and_process_cmd() returning 1 because honor_end_cmd was
set. Without introducing too many changes the best way to me seems to have
btrfs_read_and_process_send_stream() return -ENODATA in the first case. The rest
stays the same. We can then check for -ENODATA in do_receive() and report a
proper error in this case. This should also be backwards compatible to previous
versions of btrfs receive. They will fail on empty streams because a negative
value is returned. The only thing that they will lack is a nice error message.

Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
---
Changelog: 2017-04-03
- no changes
---
 cmds-receive.c | 13 +
 send-stream.c  |  2 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index 6cf22637..b59f00e4 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -1091,7 +1091,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
char *dest_dir_full_path;
char root_subvol_path[PATH_MAX];
int end = 0;
-   int count;
 
dest_dir_full_path = realpath(tomnt, NULL);
if (!dest_dir_full_path) {
@@ -1186,7 +1185,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
if (ret < 0)
goto out;
 
-   count = 0;
while (!end) {
if (rctx->cached_capabilities_len) {
if (g_verbose >= 3)
@@ -1200,16 +1198,15 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
 rctx,
 rctx->honor_end_cmd,
 max_errors);
-   if (ret < 0)
-   goto out;
-   /* Empty stream is invalid */
-   if (ret && count == 0) {
+   if (ret < 0 && ret == -ENODATA) {
+   /* Empty stream is invalid */
error("empty stream is not considered valid");
ret = -EINVAL;
goto out;
+   } else if (ret < 0) {
+   goto out;
}
-   count++;
-   if (ret)
+   if (ret > 0)
end = 1;
 
close_inode_for_write(rctx);
diff --git a/send-stream.c b/send-stream.c
index 5a028cd9..78f2571a 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -492,7 +492,7 @@ int btrfs_read_and_process_send_stream(int fd,
if (ret < 0)
goto out;
if (ret) {
-   ret = 1;
+   ret = -ENODATA;
goto out;
}
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2 v2] btrfs-progs: fix btrfs send & receive with -e flag

2017-04-03 Thread Christian Brauner
Hi guys,

This is the second version of the patch. It contains no functional changes but
merely adds a second small patch to adapt the test to use -e with btrfs receive
in order to terminate on an end marker in the stream. Thanks to David for
pointing this out. Here's the description of the patch again:

The old check here tried to ensure that empty streams are not considered valid.
The old check however, will always fail when only one run through the while(1)
loop is needed and honor_end_cmd is set. So this:

btrfs send /some/subvol | btrfs receive -e /some/

will consistently fail because -e causes honor_end_cmd to be set and
btrfs_read_and_process_send_stream() to correctly return 1. So the command will
be successful but btrfs receive will error out because the send - receive
concluded in one run through the while(1) loop.

If we want to exclude empty streams we need a way to tell the difference between
btrfs_read_and_process_send_stream() returning 1 because read_buf() did not
detect any data and read_and_process_cmd() returning 1 because honor_end_cmd was
set. Without introducing too many changes the best way to me seems to have
btrfs_read_and_process_send_stream() return -ENODATA in the first case. The rest
stays the same. We can then check for -ENODATA in do_receive() and report a
proper error in this case. This should also be backwards compatible to previous
versions of btrfs receive. They will fail on empty streams because a negative
value is returned. The only thing that they will lack is a nice error message.

Christian Brauner (2):
  btrfs-progs: fix btrfs send & receive with -e flag
  tests: use receive -e to terminate on end marker

 cmds-receive.c  | 13 +
 send-stream.c   |  2 +-
 tests/misc-tests/018-recv-end-of-stream/test.sh | 12 ++--
 3 files changed, 12 insertions(+), 15 deletions(-)

-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2 v2] tests: use receive -e to terminate on end marker

2017-04-03 Thread Christian Brauner
Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
---
 tests/misc-tests/018-recv-end-of-stream/test.sh | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/misc-tests/018-recv-end-of-stream/test.sh b/tests/misc-tests/018-recv-end-of-stream/test.sh
index d39683e9..90655929 100755
--- a/tests/misc-tests/018-recv-end-of-stream/test.sh
+++ b/tests/misc-tests/018-recv-end-of-stream/test.sh
@@ -34,7 +34,7 @@ test_full_empty_stream() {
 
run_check $TOP/mkfs.btrfs -f $TEST_DEV
run_check_mount_test_dev
-   run_check $SUDO_HELPER $TOP/btrfs receive -v -f "$str" "$TEST_MNT"
+   run_check $SUDO_HELPER $TOP/btrfs receive -e -v -f "$str" "$TEST_MNT"
run_check_umount_test_dev
 
run_check rm -f -- "$str"
@@ -65,7 +65,7 @@ test_full_simple_stream() {
 
run_check $TOP/mkfs.btrfs -f $TEST_DEV
run_check_mount_test_dev
-   run_check $SUDO_HELPER $TOP/btrfs receive -v -f "$str" "$TEST_MNT"
+   run_check $SUDO_HELPER $TOP/btrfs receive -e -v -f "$str" "$TEST_MNT"
run_check_umount_test_dev
 
run_check rm -f -- "$str"
@@ -96,8 +96,8 @@ test_incr_empty_stream() {
 
run_check $TOP/mkfs.btrfs -f $TEST_DEV
run_check_mount_test_dev
-   run_check $SUDO_HELPER $TOP/btrfs receive -v -f "$fstr" "$TEST_MNT"
-   run_check $SUDO_HELPER $TOP/btrfs receive -v -f "$istr" "$TEST_MNT"
+   run_check $SUDO_HELPER $TOP/btrfs receive -e -v -f "$fstr" "$TEST_MNT"
+   run_check $SUDO_HELPER $TOP/btrfs receive -e -v -f "$istr" "$TEST_MNT"
run_check_umount_test_dev
 
run_check rm -f -- "$fstr" "$istr"
@@ -136,8 +136,8 @@ test_incr_simple_stream() {
 
run_check $TOP/mkfs.btrfs -f $TEST_DEV
run_check_mount_test_dev
-   run_check $SUDO_HELPER $TOP/btrfs receive -v -f "$fstr" "$TEST_MNT"
-   run_check $SUDO_HELPER $TOP/btrfs receive -v -f "$istr" "$TEST_MNT"
+   run_check $SUDO_HELPER $TOP/btrfs receive -e -v -f "$fstr" "$TEST_MNT"
+   run_check $SUDO_HELPER $TOP/btrfs receive -e -v -f "$istr" "$TEST_MNT"
run_check_umount_test_dev
 
run_check rm -f -- "$fstr" "$istr"
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: fix btrfs send & receive with -e flag

2017-04-03 Thread Christian Brauner
Hi guys,

Friendly ping. Just checking in on this patch since I haven't heard back so far
and this is a blocker in some scenarios where we're using btrfs.

Thanks!
Christian

On Fri, Mar 24, 2017 at 04:00:57PM +0100, Christian Brauner wrote:
> The old check here tried to ensure that empty streams are not considered 
> valid.
> The old check however, will always fail when only one run through the while(1)
> loop is needed and honor_end_cmd is set. So this:
> 
> btrfs send /some/subvol | btrfs receive -e /some/
> 
> will consistently fail because -e causes honor_end_cmd to be set and
> btrfs_read_and_process_send_stream() to correctly return 1. So the command 
> will
> be successful but btrfs receive will error out because the send - receive
> concluded in one run through the while(1) loop.
> 
> If we want to exclude empty streams we need a way to tell the difference 
> between
> btrfs_read_and_process_send_stream() returning 1 because read_buf() did not
> detect any data and read_and_process_cmd() returning 1 because honor_end_cmd 
> was
> set. Without introducing too many changes the best way to me seems to have
> btrfs_read_and_process_send_stream() return -ENODATA in the first case. The 
> rest
> stays the same. We can then check for -ENODATA in do_receive() and report a
> proper error in this case. This should also be backwards compatible to 
> previous
> versions of btrfs receive. They will fail on empty streams because a negative
> value is returned. The only thing that they will lack is a nice error message.
> 
> Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
> ---
>  cmds-receive.c | 13 +
>  send-stream.c  |  2 +-
>  2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/cmds-receive.c b/cmds-receive.c
> index 6cf22637..b59f00e4 100644
> --- a/cmds-receive.c
> +++ b/cmds-receive.c
> @@ -1091,7 +1091,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
>   char *dest_dir_full_path;
>   char root_subvol_path[PATH_MAX];
>   int end = 0;
> - int count;
>  
>   dest_dir_full_path = realpath(tomnt, NULL);
>   if (!dest_dir_full_path) {
> @@ -1186,7 +1185,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
>   if (ret < 0)
>   goto out;
>  
> - count = 0;
>   while (!end) {
>   if (rctx->cached_capabilities_len) {
>   if (g_verbose >= 3)
> @@ -1200,16 +1198,15 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
>rctx,
>rctx->honor_end_cmd,
>max_errors);
> - if (ret < 0)
> - goto out;
> - /* Empty stream is invalid */
> - if (ret && count == 0) {
> + if (ret < 0 && ret == -ENODATA) {
> + /* Empty stream is invalid */
>   error("empty stream is not considered valid");
>   ret = -EINVAL;
>   goto out;
> + } else if (ret < 0) {
> + goto out;
>   }
> - count++;
> - if (ret)
> + if (ret > 0)
>   end = 1;
>  
>   close_inode_for_write(rctx);
> diff --git a/send-stream.c b/send-stream.c
> index 5a028cd9..78f2571a 100644
> --- a/send-stream.c
> +++ b/send-stream.c
> @@ -492,7 +492,7 @@ int btrfs_read_and_process_send_stream(int fd,
>   if (ret < 0)
>   goto out;
>   if (ret) {
> - ret = 1;
> + ret = -ENODATA;
>   goto out;
>   }
>  
> -- 
> 2.11.0
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 4:48 PM, Roman Mamedov <r...@romanrm.net> wrote:
> 
> On Mon, 27 Mar 2017 15:20:37 +0200
> Christian Theune <c...@flyingcircus.io> wrote:
> 
>> (Background info: we’re migrating large volumes from btrfs to xfs and can
>> only do this step by step: copying some data, shrinking the btrfs volume,
>> extending the xfs volume, rinse repeat. If someone should have any
>> suggestions to speed this up and not having to think in terms of _months_
>> then I’m all ears.)
> 
> I would only suggest that you reconsider XFS. You can't shrink XFS, therefore
> you won't have the flexibility to migrate in the same way to anything better
> that comes along in the future (ZFS perhaps? or even Bcachefs?). XFS does not
> perform that much better than Ext4, and, very importantly, Ext4 can be shrunk.

That is true. However, we have moved the expected feature set of the
filesystem (i.e. cow) down to “store files safely and reliably”, and we’ve seen
too much breakage with ext4 in the past. Of course “persistence means you’ll
have to say I’m sorry”, and thus with either choice we may be faced with some
issue in the future that we might have circumvented with another solution; and
yes, flexibility is worth a great deal.

We’ve run XFS and ext4 on different (large and small) workloads over the last 2
years, and I have to say I’m much happier with XFS, even with the shrinking
limitation.

To us, ext4 is prohibitive with its fsck performance, and we do like the tight
error checking in XFS.

Thanks for the reminder though - especially in a public archive it is wise to
make this flexibility tradeoff known. :-)

Hugs,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 4:17 PM, Austin S. Hemmelgarn <ahferro...@gmail.com> 
> wrote:
> 
> One other thing that I just thought of:
> For a backup system, assuming some reasonable thinning system is used for the 
> backups, I would personally migrate things slowly over time by putting new 
> backups on the new filesystem, and shrinking the old filesystem as the old 
> backups there get cleaned out.  Unfortunately, most backup software I've seen 
> doesn't handle this well, so it's not all that easy to do, but it does save 
> you from having to migrate data off of the old filesystem, and means you 
> don't have to worry as much about the resize of the old FS taking forever.

Right. This is an option we can do from a software perspective (our own 
solution - https://bitbucket.org/flyingcircus/backy) but our systems in use 
can’t hold all the data twice. Even though we’re migrating to a backend 
implementation that uses less data than before I have to perform an “inplace” 
migration in some way. This is VM block device backup. So basically we migrate 
one VM with all its previous data and that works quite fine with a little 
headroom. However, migrating all VMs to a new “full” backup and then wait for 
the old to shrink would only work if we had a completely empty backup server in 
place, which we don’t.

Also: the idea of migrating on btrfs has its downside - the performance of 
“mkdir” and “fsync” is abysmal at the moment. I’m waiting for the current 
shrinking job to finish but this is likely limited to the “find free space” 
algorithm. We’re talking about a few megabytes converted per second. Sigh.

Cheers,
Christian Theune

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 3:50 PM, Christian Theune <c...@flyingcircus.io> wrote:
> 
> Hi,
> 
>> On Mar 27, 2017, at 3:46 PM, Austin S. Hemmelgarn <ahferro...@gmail.com> 
>> wrote:
>>> 
>>>> Something I’d like to verify: does having traffic on the volume have
>>>> the potential to delay this infinitely? I.e. does the system write
>>>> to any segments that we’re trying to free so it may have to work on
>>>> the same chunk over and over again? If not, then this means it’s
>>>> just slow and we’re looking forward to about 2 months worth of time
>>>> shrinking this volume. (And then again on the next bigger server
>>>> probably about 3-4 months).
>>> 
>>>  I don't know. I would hope not, but I simply don't know enough
>>> about the internal algorithms for that. Maybe someone else can confirm?
>> I'm not 100% certain, but I believe that while it can delay things, it can't 
>> do so infinitely.  AFAICT from looking at the code (disclaimer: I am not a C 
>> programmer by profession), it looks like writes to chunks that are being 
>> compacted or moved will go to the new location, not the old one, but writes 
>> to chunks which aren't being touched by the resize currently will just go to 
>> where the chunk is currently.  Based on this, lowering the amount of traffic 
>> to the FS could probably speed things up a bit, but it likely won't help 
>> much.
> 
> I hoped that this is the strategy implemented, otherwise it would end up in 
> an infinite cat-and-mouse game. ;)
> 
>>>> (Background info: we’re migrating large volumes from btrfs to xfs
>>>> and can only do this step by step: copying some data, shrinking the
>>>> btrfs volume, extending the xfs volume, rinse repeat. If someone
>>>> should have any suggestions to speed this up and not having to think
>>>> in terms of _months_ then I’m all ears.)
>>> 
>>>  All I can suggest is to move some unused data off the volume and do
>>> it in fewer larger steps. Sorry.
>> Same.
>> 
>> The other option though is to just schedule a maintenance window, nuke the 
>> old FS, and restore from a backup.  If you can afford to take the system 
>> off-line temporarily, this will almost certainly go faster (assuming you 
>> have a reasonably fast means of restoring backups).
> 
> Well. This is the backup. ;)

One strategy that does come to mind: we’re converting our backup from a system 
that uses reflinks to a non-reflink based system. We can convert this in place 
so this would remove all the reflink stuff in the existing filesystem and then 
maybe we can do the FS conversion faster once this isn’t an issue any longer. I 
think I’ll

Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 3:46 PM, Austin S. Hemmelgarn <ahferro...@gmail.com> 
> wrote:
>> 
>>> Something I’d like to verify: does having traffic on the volume have
>>> the potential to delay this infinitely? I.e. does the system write
>>> to any segments that we’re trying to free so it may have to work on
>>> the same chunk over and over again? If not, then this means it’s
>>> just slow and we’re looking forward to about 2 months worth of time
>>> shrinking this volume. (And then again on the next bigger server
>>> probably about 3-4 months).
>> 
>>   I don't know. I would hope not, but I simply don't know enough
>> about the internal algorithms for that. Maybe someone else can confirm?
> I'm not 100% certain, but I believe that while it can delay things, it can't 
> do so infinitely.  AFAICT from looking at the code (disclaimer: I am not a C 
> programmer by profession), it looks like writes to chunks that are being 
> compacted or moved will go to the new location, not the old one, but writes 
> to chunks which aren't being touched by the resize currently will just go to 
> where the chunk is currently.  Based on this, lowering the amount of traffic 
> to the FS could probably speed things up a bit, but it likely won't help much.

I hoped that this is the strategy implemented, otherwise it would end up in an 
infinite cat-and-mouse game. ;)

>>> (Background info: we’re migrating large volumes from btrfs to xfs
>>> and can only do this step by step: copying some data, shrinking the
>>> btrfs volume, extending the xfs volume, rinse repeat. If someone
>>> should have any suggestions to speed this up and not having to think
>>> in terms of _months_ then I’m all ears.)
>> 
>>   All I can suggest is to move some unused data off the volume and do
>> it in fewer larger steps. Sorry.
> Same.
> 
> The other option though is to just schedule a maintenance window, nuke the 
> old FS, and restore from a backup.  If you can afford to take the system 
> off-line temporarily, this will almost certainly go faster (assuming you have 
> a reasonably fast means of restoring backups).

Well. This is the backup. ;)

Thanks,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

> On Mar 27, 2017, at 3:07 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> 
>   On my hardware (consumer HDDs and SATA, RAID-1 over 6 devices), it
> takes about a minute to move 1 GiB of data. At that rate, it would
> take 1000 minutes (or about 16 hours) to move 1 TiB of data.
> 
>   However, there are cases where some items of data can take *much*
> longer to move. The biggest of these is when you have lots of
> snapshots. When that happens, some (but not all) of the metadata can
> take a very long time. In my case, with a couple of hundred snapshots,
> some metadata chunks take 4+ hours to move.

Thanks for that info. The 1min per 1GiB is what I saw too - the “it can take 
longer” wasn’t really explainable to me.

As I’m not using snapshots: would large files (100+gb) with long chains of CoW 
history (specifically reflink copies) also hurt?

Something I’d like to verify: does having traffic on the volume have the 
potential to delay this infinitely? I.e. does the system write to any segments 
that we’re trying to free so it may have to work on the same chunk over and 
over again? If not, then this means it’s just slow and we’re looking forward to 
about 2 months worth of time shrinking this volume. (And then again on the next 
bigger server probably about 3-4 months).

(Background info: we’re migrating large volumes from btrfs to xfs and can only 
do this step by step: copying some data, shrinking the btrfs volume, extending 
the xfs volume, rinse repeat. If someone should have any suggestions to speed 
this up and not having to think in terms of _months_ then I’m all ears.)

Cheers,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Re: Shrinking a device - performance?

2017-03-27 Thread Christian Theune

> On Mar 27, 2017, at 1:51 PM, Christian Theune <c...@flyingcircus.io> wrote:
> 
> Hi,
> 
> (I hope I’m not double posting. My mail client was misconfigured and I think 
> I only managed to send the mail correctly this time.)

Turns out I did double post. Mea culpa.

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

(I hope I’m not double posting. My mail client was misconfigured and I think I 
only managed to send the mail correctly this time.)

I’m currently shrinking a device and it seems that the performance of shrink is 
abysmal. I intended to shrink a ~22TiB filesystem down to 20TiB. This is still 
using LVM underneath so that I can’t just remove a device from the filesystem 
but have to use the resize command.
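
For reference, the resize in question boils down to a single command (mount
point illustrative):

  btrfs filesystem resize 20T /srv/backy   # shrink the single device to 20TiB
  btrfs filesystem show /srv/backy         # "used" on devid 1 drops as chunks are relocated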

Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
   Total devices 1 FS bytes used 18.21TiB
   devid    1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy

This has been running since last Thursday, so roughly 3.5days now. The “used” 
number in devid1 has moved about 1TiB in this time. The filesystem is seeing 
regular usage (read and write) and when I’m suspending any application traffic 
I see about 1GiB of movement every now and then. Maybe once every 30 seconds or 
so.

Does this sound fishy or normal to you?

Kind regards,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Shrinking a device - performance?

2017-03-27 Thread Christian Theune
Hi,

I’m currently shrinking a device and it seems that the performance of shrink is 
abysmal. I intended to shrink a ~22TiB filesystem down to 20TiB. This is still 
using LVM underneath so that I can’t just remove a device from the filesystem 
but have to use the resize command.

Label: 'backy'  uuid: 3d0b7511-4901-4554-96d4-e6f9627ea9a4
Total devices 1 FS bytes used 18.21TiB
devid    1 size 20.00TiB used 20.71TiB path /dev/mapper/vgsys-backy

This has been running since last Thursday, so roughly 3.5days now. The “used” 
number in devid1 has moved about 1TiB in this time. The filesystem is seeing 
regular usage (read and write) and when I’m suspending any application traffic 
I see about 1GiB of movement every now and then. Maybe once every 30 seconds or 
so.

Does this sound fishy or normal to you?

Kind regards,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





[PATCH] btrfs-progs: fix btrfs send & receive with -e flag

2017-03-24 Thread Christian Brauner
The old check here tried to ensure that empty streams are not considered valid.
The old check however, will always fail when only one run through the while(1)
loop is needed and honor_end_cmd is set. So this:

btrfs send /some/subvol | btrfs receive -e /some/

will consistently fail because -e causes honor_end_cmd to be set and
btrfs_read_and_process_send_stream() to correctly return 1. So the command will
be successful but btrfs receive will error out because the send - receive
concluded in one run through the while(1) loop.

If we want to exclude empty streams we need a way to tell the difference between
btrfs_read_and_process_send_stream() returning 1 because read_buf() did not
detect any data and read_and_process_cmd() returning 1 because honor_end_cmd was
set. Without introducing too many changes the best way to me seems to have
btrfs_read_and_process_send_stream() return -ENODATA in the first case. The rest
stays the same. We can then check for -ENODATA in do_receive() and report a
proper error in this case. This should also be backwards compatible to previous
versions of btrfs receive. They will fail on empty streams because a negative
value is returned. The only thing that they will lack is a nice error message.

Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
---
 cmds-receive.c | 13 +
 send-stream.c  |  2 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index 6cf22637..b59f00e4 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -1091,7 +1091,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
char *dest_dir_full_path;
char root_subvol_path[PATH_MAX];
int end = 0;
-   int count;
 
dest_dir_full_path = realpath(tomnt, NULL);
if (!dest_dir_full_path) {
@@ -1186,7 +1185,6 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
if (ret < 0)
goto out;
 
-   count = 0;
while (!end) {
if (rctx->cached_capabilities_len) {
if (g_verbose >= 3)
@@ -1200,16 +1198,15 @@ static int do_receive(struct btrfs_receive *rctx, const char *tomnt,
 rctx,
 rctx->honor_end_cmd,
 max_errors);
-   if (ret < 0)
-   goto out;
-   /* Empty stream is invalid */
-   if (ret && count == 0) {
+   if (ret < 0 && ret == -ENODATA) {
+   /* Empty stream is invalid */
error("empty stream is not considered valid");
ret = -EINVAL;
goto out;
+   } else if (ret < 0) {
+   goto out;
}
-   count++;
-   if (ret)
+   if (ret > 0)
end = 1;
 
close_inode_for_write(rctx);
diff --git a/send-stream.c b/send-stream.c
index 5a028cd9..78f2571a 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -492,7 +492,7 @@ int btrfs_read_and_process_send_stream(int fd,
if (ret < 0)
goto out;
if (ret) {
-   ret = 1;
+   ret = -ENODATA;
goto out;
}
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Downgrading kernel 4.9 to 4.4 with space_cache=v2 enabled?

2017-02-23 Thread Christian Theune
Hi,

just for future reference if someone finds this thread: there is a bit of 
output I’m seeing with this crashing kernel (unclear whether related to btrfs 
or not):

  31 | 02/23/2017 | 09:51:22 | OS Stop/Shutdown #0x4f | Run-time critical stop 
| Asserted
  32 | Linux kernel panic: Out of memo
  33 | Linux kernel panic: ry and no k
  34 | Linux kernel panic: illable pro
  35 | Linux kernel panic: cesses...

Cheers,
Christian


--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





Downgrading kernel 4.9 to 4.4 with space_cache=v2 enabled?

2017-02-23 Thread Christian Theune
Hi,

not sure whether it’s possible, but we tried space_cache=v2 and obviously after 
working fine in staging it broke in production. Or rather: we upgraded from 4.4 
to 4.9 and enabled the space_cache. Our production volume is around 50TiB 
usable (underlying HW Raid 6).

The machine crashes silently every 15 hours or so and takes _ages_ to reboot. 
It currently is stuck trying to mount the local filesystems and I guess btrfs is 
doing something, but I don’t have shell access yet.

I’m wondering whether we can downgrade by booting back into 4.4 or will this 
break things even further? (We’ve had some unpleasant surprises with FS’ in the 
last months, so I thought I’d rather ask.)
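
For context, a rough sketch of the mechanics (worth double-checking against
current documentation): space_cache=v2 stores free space in a tree guarded by a
compat_ro feature flag, so a 4.4 kernel, which predates that feature, should
refuse a read-write mount rather than corrupt anything. Going back to v1 would
look roughly like this, assuming a btrfs-progs version recent enough to support
clearing the v2 cache:

  # with the filesystem unmounted:
  btrfs check --clear-space-cache v2 /dev/sdX
  # rebuild the v1 cache on the next mount (plain "space_cache" on 4.4):
  mount -o space_cache=v1 /dev/sdX /mnt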

Kind regards,
Christian

--
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick





btrfs receive leaves new subvolume modifiable during operation

2017-01-31 Thread Christian Lupien
I have been testing btrfs send/receive. I like it.

During those tests I discovered that it is possible to access and
modify (add files, delete files, ...) the new receive snapshot during
the transfer. After the transfer it becomes readonly, but it could
already have been modified.

So you can end up with a source and a destination which are not the
same. Therefore, during a subsequent incremental transfer I can get
receive to crash (trying to unlink a file that is not in the parent but
should be).

Is this behavior by design or will it be prevented in the future?

I can of course just not modify the subvolume during receive but is
there a way to make sure no user/program modifies it?

I can also get in the same kind of trouble by modifying a parent (after
changing its property temporarily to ro=false). send/receive is
checking that the same parent uuid is available on both sides but not
that generation has not changed. Of course in this case it requires
direct user intervention. Never changing the ro property of subvolumes
would prevent the problem. 

Again, is this by design?
Otherwise I would suggest finding a way to avoid those conditions
(using the generation maybe?). There could be an override option to
allow more flexibility if needed.
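
For what it's worth, the read-only flag in question can be inspected and
deliberately flipped with the property commands, which is exactly the hole
described above (paths illustrative):

  btrfs property get -ts /mnt/snap ro         # "ro=true" once receive has finished
  btrfs property set -ts /mnt/snap ro false   # re-enables writes behind send's back
  btrfs subvolume show /mnt/snap              # shows Received UUID and generation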

Thanks
Christian   
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/1] btrfs lockdep annotation

2017-01-24 Thread Christian Borntraeger
On 01/24/2017 11:22 AM, Filipe Manana wrote:
> On Tue, Jan 24, 2017 at 9:01 AM, Christian Borntraeger
> <borntrae...@de.ibm.com> wrote:
>> Chris,
>>
>> since my bug report about this did not result in any fix and since
> 
> It was fixed and the fix landed in 4.10-rc4:

Thanks, I missed that last pull.

> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=781feef7e6befafd4d9787d1f7ada1f9ccd504e4
> 
>> this disables lockdep before the code that I want to debug runs
>> here is my attempt to fix it.
>> Please double check if the subclass looks right. It seems to work
>> for me but I do not know enough about btrfs to decide if this is
>> right or not.
>>
>> Christian Borntraeger (1):
>>   btrfs: add lockdep annotation for btrfs_log_inode
>>
>>  fs/btrfs/tree-log.c | 2 +-
>>
>> --
>> 2.7.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs: add lockdep annotation for btrfs_log_inode

2017-01-24 Thread Christian Borntraeger
Add a proper subclass to get rid of the following lockdep
error.

 [ INFO: possible recursive locking detected ]
 4.9.0+ #279 Not tainted
 ---------------------------------------------
 vim/4801 is trying to acquire lock:
  (&ei->log_mutex){+.+...}, at: [<03ff82057592>]
 btrfs_log_inode+0x182/0xfa8 [btrfs]

 but task is already holding lock:
  (&ei->log_mutex){+.+...}, at: [<03ff82057592>]
 btrfs_log_inode+0x182/0xfa8 [btrfs]

  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&ei->log_mutex);
   lock(&ei->log_mutex);

 *** DEADLOCK ***

  May be due to missing lock nesting notation

 3 locks held by vim/4801:
  #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<03ff81fc274c>]
 btrfs_sync_file+0x204/0x728 [btrfs]
  #1:  (sb_internal#2){.+.+..}, at: [<03ff81fa38e0>]
 start_transaction+0x318/0x770 [btrfs]
  #2:  (&ei->log_mutex){+.+...}, at: [<03ff82057592>]

[...]
 Call Trace:
 ([<00115ffc>] show_trace+0xe4/0x108)
  [<001160f8>] show_stack+0x68/0xe0
  [<00652d52>] dump_stack+0x9a/0xd8
  [<00209bb0>] __lock_acquire+0xac8/0x1bd0
  [<0020b3c6>] lock_acquire+0x106/0x4a0
  [<00a1fb36>] mutex_lock_nested+0xa6/0x428
  [<03ff82057592>] btrfs_log_inode+0x182/0xfa8 [btrfs]
  [<03ff82057c76>] btrfs_log_inode+0x866/0xfa8 [btrfs]
  [<03ff81ffe278>] btrfs_log_inode_parent+0x218/0x988 [btrfs]
  [<03ff81aa>] btrfs_log_dentry_safe+0x7a/0xa0 [btrfs]
  [<03ff81fc29b6>] btrfs_sync_file+0x46e/0x728 [btrfs]
  [<0044aeee>] do_fsync+0x5e/0x90
  [<0044b2ba>] SyS_fsync+0x32/0x40
  [<00a26786>] system_call+0xd6/0x288

Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
---
 fs/btrfs/tree-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3d33c4e..a3ec717 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4648,7 +4648,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle 
*trans,
return ret;
}
 
-   mutex_lock(&BTRFS_I(inode)->log_mutex);
+   mutex_lock_nested(&BTRFS_I(inode)->log_mutex, inode_only);
 
/*
 * a brute force approach to making sure we get the most uptodate
-- 
2.7.4
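
As background, a minimal generic sketch of the annotation used above
(illustrative kernel-style code, not btrfs): mutex_lock_nested() assigns the
second acquisition a distinct lockdep subclass, telling lockdep that taking two
locks of the same class here is an intended parent/child nesting rather than a
self-deadlock.

  #include <linux/mutex.h>
  #include <linux/lockdep.h>

  struct node {
          struct mutex lock;
          struct node *parent;
  };

  static void lock_parent_then_child(struct node *child)
  {
          mutex_lock(&child->parent->lock);                      /* subclass 0 */
          mutex_lock_nested(&child->lock, SINGLE_DEPTH_NESTING); /* subclass 1 */
          /* ... operate on both nodes ... */
          mutex_unlock(&child->lock);
          mutex_unlock(&child->parent->lock);
  }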

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/1] btrfs lockdep annotation

2017-01-24 Thread Christian Borntraeger
Chris,

since my bug report about this did not result in any fix, and since
this disables lockdep before the code that I want to debug runs,
here is my attempt to fix it.
Please double check if the subclass looks right. It seems to work
for me but I do not know enough about btrfs to decide if this is
right or not.

Christian Borntraeger (1):
  btrfs: add lockdep annotation for btrfs_log_inode

 fs/btrfs/tree-log.c | 2 +-

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs_alloc_tree_block: Faulting instruction address: 0xc02d4584

2017-01-19 Thread Christian Kujau
Hi,

after upgrading this powerpc32 box from 4.10-rc2 to -rc4, the message 
below occured a few hours after boot. Full dmesg and .config:

  http://nerdbynature.de/bits/4.10-rc4/

Any ideas?

Thanks,
Christian.


Faulting instruction address: 0xc02d4584
Oops: Kernel access of bad area, sig: 11 [#1]
PowerMac
Modules linked in: ecb xt_tcpudp iptable_filter ip_tables x_tables 
nfnetlink_log nfnetlink sha256_generic twofish_generic twofish_common 
usb_storage therm_adt746x loop i2c_powermac arc4 firewire_sbp2 b43 
rng_core ssb bcma mac80211 cfg80211 ecryptfs [last unloaded: nbd]
CPU: 0 PID: 1395 Comm: btrfs-transacti Tainted: GW   
4.10.0-rc4-1-gab8184b #1
task: ee7162e0 task.stack: ee9cc000
NIP: c02d4584 LR: c02d4574 CTR: c00d0df0
REGS: ee9cdaa0 TRAP: 0300   Tainted: GW
(4.10.0-rc4-1-gab8184b)
MSR: 9032 <EE,ME,IR,DR,RI>
  CR: 24422248  XER: 
DAR: 10581054 DSISR: 4200 
GPR00: c02d4574 ee9cdb50 ee71d4b8 10581050 0001 dbc88118  0020 
GPR08: 1205 0004   24422444   93c3 
GPR16: f000 0001    ee260800   
GPR24:  0001 ee9cdc1b eef5c1a0 1000 ee47 dbc88118 ee470170 
NIP [c02d4584] btrfs_alloc_tree_block+0x18c/0x5c4
LR [c02d4574] btrfs_alloc_tree_block+0x17c/0x5c4
Call Trace:
[ee9cdb50] [c02d4574] btrfs_alloc_tree_block+0x17c/0x5c4 (unreliable)
[ee9cdbf0] [c02b86d4] __btrfs_cow_block+0x110/0x638
[ee9cdc70] [c02b8d74] btrfs_cow_block+0xdc/0x1b0
[ee9cdca0] [c02bc48c] btrfs_search_slot+0x1c0/0x904
[ee9cdd10] [c02dc680] btrfs_lookup_inode+0x3c/0x124
[ee9cdd50] [c02ec204] btrfs_update_inode_item+0x4c/0x10c
[ee9cdd80] [c02d05e4] cache_save_setup+0xc0/0x400
[ee9cdde0] [c02d4d54] btrfs_start_dirty_block_groups+0x184/0x47c
[ee9cde50] [c02e7e84] btrfs_commit_transaction+0x148/0xac4
[ee9cdeb0] [c02e313c] transaction_kthread+0x1d0/0x1ec
[ee9cdf00] [c004f1fc] kthread+0xf8/0x124
[ee9cdf40] [c0011480] ret_from_kernel_thread+0x5c/0x64
--- interrupt: 0 at   (null)
LR =   (null)
Instruction dump:
4800b3ed 7f838040 7c7e1b78 419d0430 806300d4 81db 81fb0004 4bdfe2b9 
3924 7ee6bb78 38630050 7fc5f378 <7dc91d2c> 7de01d2c 809501cf 807501cb 
---[ end trace 937683537ecd986b ]---



-- 
BOFH excuse #342:

HTTPD Error 4004 : very old Intel cpu - insufficient processing power
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs: still lockdep splat for 4.9-rc5+ (btrfs_log_inode)

2016-11-25 Thread Christian Borntraeger
FWIW, I still see the lockdep splat in btrfs in 4.9-rc5+

[  159.698343] =
[  159.698345] [ INFO: possible recursive locking detected ]
[  159.698347] 4.9.0-rc5+ #136 Tainted: GW  
[  159.698348] -
[  159.698349] vim/6913 is trying to acquire lock:
[  159.698350]  (
[  159.698351] &ei->log_mutex
[  159.698352] ){+.+...}
[  159.698353] , at: 
[  159.698445] [<03ff8202c3f4>] btrfs_log_inode+0x124/0x1048 [btrfs]
[  159.698447] 
   but task is already holding lock:
[  159.698448]  (
[  159.698449] &ei->log_mutex
[  159.698449] ){+.+...}
[  159.698450] , at: 
[  159.698469] [<03ff8202c3f4>] btrfs_log_inode+0x124/0x1048 [btrfs]
[  159.698470] 
   other info that might help us debug this:
[  159.698471]  Possible unsafe locking scenario:

[  159.698472]CPU0
[  159.698473]
[  159.698474]   lock(
[  159.698475] &ei->log_mutex
[  159.698476] );
[  159.698476]   lock(
[  159.698477] &ei->log_mutex
[  159.698478] );
[  159.698479] 
*** DEADLOCK ***

[  159.698480]  May be due to missing lock nesting notation

[  159.698482] 3 locks held by vim/6913:
[  159.698483]  #0: 
[  159.698483]  (
[  159.698484] &sb->s_type->i_mutex_key
[  159.698508] #15
[  159.698509] ){+.+.+.}
[  159.698509] , at: 
[  159.698528] [<03ff81ff66f8>] btrfs_sync_file+0x1b8/0x560 [btrfs]
[  159.698529]  #1: 
[  159.698530]  (
[  159.698531] sb_internal
[  159.698531] #2
[  159.698532] ){.+.+..}
[  159.698532] , at: 
[  159.698551] [<03ff81fdad2a>] start_transaction+0x312/0x600 [btrfs]
[  159.698552]  #2: 
[  159.698552]  (
[  159.698553] &ei->log_mutex
[  159.698554] ){+.+...}
[  159.698555] , at: 
[  159.698573] [<03ff8202c3f4>] btrfs_log_inode+0x124/0x1048 [btrfs]
[  159.698574] 
   stack backtrace:
[  159.698577] CPU: 22 PID: 6913 Comm: vim Tainted: GW   4.9.0-rc5+ 
#136
[  159.698578] Hardware name: IBM  2964 NC9  704
  (LPAR)
[  159.698580] Stack:
[  159.698581]00fae55635f0 00fae5563680 0003 

[  159.698584]00fae5563720 00fae5563698 00fae5563698 
0020
[  159.698587] 00fa0020 00fa000a 
000a
[  159.698589]000c 00fae55636e8  

[  159.698592]0400037e5c40 001127a4 00fae5563680 
00fae55636d8
[  159.698594] Call Trace:
[  159.698599] ([<0011266a>] show_trace+0xea/0xf0)
[  159.698601]  [<00112748>] show_stack+0x68/0xe0 
[  159.698605]  [<004ef502>] dump_stack+0x9a/0xd8 
[  159.698609]  [<001a6078>] validate_chain.isra.22+0xbd8/0xd48 
[  159.698611]  [<001a741c>] __lock_acquire+0x304/0x7f0 
[  159.698613]  [<001a7fe6>] lock_acquire+0xfe/0x2d8 
[  159.698617]  [<008c5216>] mutex_lock_nested+0x86/0x3e8 
[  159.698636]  [<03ff8202c3f4>] btrfs_log_inode+0x124/0x1048 [btrfs] 
[  159.698655]  [<03ff8202cfcc>] btrfs_log_inode+0xcfc/0x1048 [btrfs] 
[  159.698674]  [<03ff8202d5bc>] btrfs_log_inode_parent+0x1fc/0x918 [btrfs] 
[  159.698693]  [<03ff8202ef22>] btrfs_log_dentry_safe+0x7a/0xa0 [btrfs] 
[  159.698712]  [<03ff81ff68fc>] btrfs_sync_file+0x3bc/0x560 [btrfs] 
[  159.698715]  [<00349ade>] do_fsync+0x5e/0x90 
[  159.698716]  [<00349e6a>] SyS_fsync+0x32/0x40 
[  159.698718]  [<008cae8e>] system_call+0xd6/0x270 
[  159.698719] INFO: lockdep is turned off.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resizing BTRFS - raw partition

2016-11-02 Thread Christian Völker
Yohoo!

Well, slightly off-topic now.

It is a Debian derivative. Univention Corporate Server.

uname -r says: 4.1.0-ucs190-amd64

So I am pretty fine with the kernel.

But indeed, I have to think about getting a more up-to-date btrfs, as on
resizing I ran into the "File too large" issue and had to use "max"
instead of "+100G". No big deal, just a minor issue.

I use btrfs to checksum all files stored there, as I have had several issues
in the last years where files got corrupted.

Greetings

Christian


Am 02.11.2016 um 11:14 schrieb Adam Borowski:
> On Wed, Nov 02, 2016 at 09:29:26AM +, Hugo Mills wrote:
>> On Wed, Nov 02, 2016 at 10:18:03AM +0100, Christian Völker wrote:
>>> thanks for the quick reply. Regarding version - I prefer to use stable
>>> Linux versions and I am not going to upgrade just btrfs outside of
>>> the vendor's builds. So I am stuck happily with this version. And I have run
>>> Linux for more than 10 years, so I am really fine with it, I guess :D
>>    Well, btrfs-progs 0.19 was last released several years ago. If your
>> kernel is of the same kind of age, then you're going to be seeing a
>> whole load of really nasty data-corrupting or filesystem-breaking bugs
>> which have since been fixed. Basically, if something goes wrong with
>> your FS when you're running a kernel that old, the main response you'll
>> get is, "well, that was silly of you, wasn't it?", and you'll have to
>> make a new filesystem and restore from your backups and hope it
>> doesn't happen again.
> -progs 0.19 imply kernel 2.6.32 which comes from btrfs' infancy, when it
> was hardly merged into mainline.  It's buggier than experimental features
> like RAID5/6 nowadays.
>
>>    I would currently recommend running a 4.4 kernel or later. If you
>> want a "stable" kernel version from a distribution, and want some kind
>> of support for it when it goes wrong, you're probably going to have to
>> pay someone (Red Hat or SuSE, most likely) to support your
>> configuraion.
> Kernels around 3.16 or so are pretty reliable -- ones I'm using on
> production are 3.13 and 3.14, without a single issue.
>
> As 2.6.32 is for you "stable" rather than "ancient and unsupported", I guess
> you're on RHEL or a derivative.  For them, 3.10 is the next stable, which
> is on the verge of what could be reasonable (but I still second Hugo's
> advice of using at least the current LTS kernel, ie 4.4).
>
> TL;DR:
> DO NOT USE BTRFS ON ANCIENT KERNELS!!1!elebenty-one!
>
>
> Meow!



Re: Resizing BTRFS - raw partition

2016-11-02 Thread Christian Völker
Hi Hugo,

thanks for the quick reply. Regarding version - I prefer to use stable
Linux versions and I am not going to upgrade just btrfs outside of
the vendor's builds. So I am stuck happily with this version. And I have
run Linux for more than 10 years, so I am really fine with it, I guess :D

And thanks again for your proposal. Yes, your command worked.

I had to tell btrfs the devid!

So this did NOT work:

 btrfs fi resize  max /srv/share/

Instead the following two commands worked:

 btrfs fi resize  1:max /srv/share/
 btrfs fi resize  2:max /srv/share/

And now both physical devices show the correct size.

It seems really strange to me that I have to tell btrfs to resize each
single disk instead of automatically resizing all disks... I bet next
time I'll have forgotten it again :-(


Greetings


Christian





Resizing BTRFS - raw partition

2016-11-02 Thread Christian Völker
Hi all,

I am using btrfs as follows:
root@srv:/srv# btrfs filesystem show
Label: none  uuid: c8f24351-ddc4-4866-843c-4e95fcb498d4
Total devices 2 FS bytes used 1005.37GB
devid2 size 1.00TB used 1023.98GB path /dev/sdc
devid1 size 1.46TB used 1.00TB path /dev/sdb

Btrfs Btrfs v0.19

It is running inside a virtual machine running on VMware ESXi. I
increased both virtual disks to 1.5TB now. I did a scsi-rescan and fdisk
-l tells me the new size:
Disk /dev/sdb: 1610.6 GB, 1610612736000 bytes
Disk /dev/sdc: 1610.6 GB, 1610612736000 bytes

There are no partitions created on the disks, just raw devices used for
BTRFS. I found several sites where they resized a btrfs filesystem based
on LVM and the new size was immediately recognized. But how to do on raw
partitions?

How do I tell btrfs the devices have been resized?
I did not find a rescan command. btrfs scan does not change anything. 
Do I really have to reboot?

Thanks!

Christian



Re: lockdep warning in btrfs in 4.8-rc3

2016-09-08 Thread Christian Borntraeger
On 09/08/2016 01:48 PM, Christian Borntraeger wrote:
> Chris,
> 
> with 4.8-rc3 I get the following on an s390 box:

Sorry for the noise, just saw the fix in your pull request.



lockdep warning in btrfs in 4.8-rc3

2016-09-08 Thread Christian Borntraeger
Chris,

with 4.8-rc3 I get the following on an s390 box:


[ 1094.009172] =
[ 1094.009174] [ INFO: possible recursive locking detected ]
[ 1094.009177] 4.8.0-rc3 #126 Tainted: GW  
[ 1094.009179] -
[ 1094.009180] vim/12891 is trying to acquire lock:
[ 1094.009182]  (&ei->log_mutex){+.+...}, at: [<03ff817e83c6>] 
btrfs_log_inode+0x126/0x1010 [btrfs]
[ 1094.009256] 
   but task is already holding lock:
[ 1094.009258]  (&ei->log_mutex){+.+...}, at: [<03ff817e83c6>] 
btrfs_log_inode+0x126/0x1010 [btrfs]
[ 1094.009276] 
   other info that might help us debug this:
[ 1094.009278]  Possible unsafe locking scenario:

[ 1094.009280]CPU0
[ 1094.009281]
[ 1094.009282]   lock(&ei->log_mutex);
[ 1094.009284]   lock(&ei->log_mutex);
[ 1094.009286] 
*** DEADLOCK ***

[ 1094.009288]  May be due to missing lock nesting notation

[ 1094.009290] 3 locks held by vim/12891:
[ 1094.009291]  #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: 
[<03ff817afbd6>] btrfs_sync_file+0x1de/0x5e8 [btrfs]
[ 1094.009311]  #1:  (sb_internal#2){.+.+..}, at: [<0035e0ba>] 
__sb_start_write+0x122/0x138
[ 1094.009320]  #2:  (&ei->log_mutex){+.+...}, at: [<03ff817e83c6>] 
btrfs_log_inode+0x126/0x1010 [btrfs]
[ 1094.009370] 
   stack backtrace:
[ 1094.009375] CPU: 14 PID: 12891 Comm: vim Tainted: GW   4.8.0-rc3 
#126
[ 1094.009377] Hardware name: IBM  2964 NC9  704
  (LPAR)
[ 1094.009380]00f061367608 00f061367698 0002 
 
  00f061367738 00f0613676b0 00f0613676b0 
001133ec 
    00f7000a 
00f7000a 
  00f0613676f8 00f061367698  
 
  040001d821c8 001133ec 00f061367698 
00f0613676e8 
[ 1094.009396] Call Trace:
[ 1094.009401] ([<00113334>] show_trace+0xec/0xf0)
[ 1094.009403] ([<0011339a>] show_stack+0x62/0xe8)
[ 1094.009406] ([<0055211c>] dump_stack+0x9c/0xe0)
[ 1094.009411] ([<001d9930>] validate_chain.isra.22+0xc00/0xd70)
[ 1094.009413] ([<001dad9c>] __lock_acquire+0x39c/0x7d8)
[ 1094.009414] ([<001db8d0>] lock_acquire+0x108/0x320)
[ 1094.009420] ([<008845c6>] mutex_lock_nested+0x86/0x3f8)
[ 1094.009440] ([<03ff817e83c6>] btrfs_log_inode+0x126/0x1010 [btrfs])
[ 1094.009457] ([<03ff817e8fb2>] btrfs_log_inode+0xd12/0x1010 [btrfs])
[ 1094.009474] ([<03ff817e95b4>] btrfs_log_inode_parent+0x244/0x980 [btrfs])
[ 1094.009490] ([<03ff817eafea>] btrfs_log_dentry_safe+0x7a/0xa0 [btrfs])
[ 1094.009506] ([<03ff817afe1a>] btrfs_sync_file+0x422/0x5e8 [btrfs])
[ 1094.009512] ([<0039e64e>] do_fsync+0x5e/0x90)
[ 1094.009514] ([<0039e9e2>] SyS_fsync+0x32/0x40)
[ 1094.009517] ([<0088a336>] system_call+0xd6/0x270)
[ 1094.009518] INFO: lockdep is turned off.



Re: kworker threads may be working saner now instead of using 100% of a CPU core for minutes (Re: Still not production ready)

2016-09-07 Thread Christian Rohmann


On 03/20/2016 12:24 PM, Martin Steigerwald wrote:
>> btrfs kworker thread uses up 100% of a Sandybridge core for minutes on
>> > random write into big file
>> > https://bugzilla.kernel.org/show_bug.cgi?id=90401
> I think I saw this up to kernel 4.3. I think I didn´t see this with 4.4 
> anymore and definately not with 4.5.
> 
> So it may be fixed.
> 
> Did anyone else see kworker threads using 100% of a core for minutes with 4.4 
> / 4.5?

I run 4.8-rc5 and currently see this issue. kworker has been running at
100% for hours now and seems stuck there.

Anything I should look at in order to narrow this down to a root cause?


Regards

Christian


Re: Strange behavior after "rm -rf //"

2016-08-12 Thread Christian Kujau
On Fri, 12 Aug 2016, Russell Coker wrote:
> There are a variety of ways of giving the same result that rm
> doesn't reject. "/*" wasn't caught last time I checked. See the above 
> URL if you want to test out various rm operations as root. ;)

Oh, yes - "rm -r /*" would work, even with a current coreutils version. 
But since the OP stated "rm -rf //" I was curious about the userspace
part on that.

Thanks for the link to the SELinux playground, nice setup!

Christian.
-- 
BOFH excuse #346:

Your/our computer(s) had suffered a memory leak, and we are waiting for them to 
be topped up.


Re: Strange behavior after "rm -rf //"

2016-08-09 Thread Christian Kujau
On Mon, 8 Aug 2016, Ivan Sizov wrote:
> I'd run "rm -rf //" by mistake two days ago. I'd stopped it after five

Out of curiosity, what version of coreutils is this? The --preserve-root 
option has been the default for quite some time now:

> Don't include dirname.h, since system.h does it now.
> (usage, main): --preserve-root is now the default.
> 2006-09-03 02:53:58 +
http://git.savannah.gnu.org/cgit/coreutils.git/commit/src/rm.c?id=89ffaa19909d31dffbcf12fb4498afb72666f6c9

Even coreutils-6.10 from Debian/5 refuses to remove "/":

$ sudo rm -rf /
rm: cannot remove root directory `/'

Christian.
-- 
BOFH excuse #243:

The computer fleetly, mouse and all.


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-15 Thread Christian Rohmann
Hey Qu, all

On 07/15/2016 05:56 AM, Qu Wenruo wrote:
> 
> The good news is, we have patch to slightly speedup the mount, by
> avoiding reading out unrelated tree blocks.
> 
> In our test environment, it takes 15% less time to mount a fs filled
> with 16K files(2T used space).
> 
> https://patchwork.kernel.org/patch/9021421/

I have a 30TB RAID6 filesystem with compression on and I've seen mount
times of up to 20 minutes (!).

I don't want to sound unfair, but a 15% improvement, while good, is not
in the league where BTRFS needs to be.
Do I understand your comments correctly that further improvement would
result in a change of the on-disk format?



Thanks and with regards

Christian


4.6.0-rc3+: WARNING: CPU: 16 PID: 17257 at fs/btrfs/inode.c:9261 btrfs_destroy_inode

2016-04-27 Thread Christian Borntraeger
Folks,

I can sometimes trigger the following bug

[  244.493534] [ cut here ]
[  244.493624] WARNING: CPU: 16 PID: 17257 at fs/btrfs/inode.c:9261 
btrfs_destroy_inode+0x288/0x2b0 [btrfs]
[  244.493626] Kernel panic - not syncing: panic_on_warn set ...

[  244.493629] CPU: 16 PID: 17257 Comm: dwz Not tainted 4.6.0-rc3+ #56
[  244.493631]00fb3d8a3790 00fb3d8a3820 0002 
 
   00fb3d8a38c0 00fb3d8a3838 00fb3d8a3838 00239364 
   00528488 0080e46e 007f02c8 000b 
   00fb3d8a3880 00fb3d8a3820   
   040003ff8185a39e 00113c1e 00fb3d8a3820 00fb3d8a3880 
[  244.493641] Call Trace:
[  244.493646] ([<00113b0a>] show_trace+0x62/0x78)
[  244.493647] ([<00113bd2>] show_stack+0x72/0xf0)
[  244.493650] ([<0040a8ca>] dump_stack+0x9a/0xd8)
[  244.493652] ([<00238b9e>] panic+0xf6/0x230)
[  244.493656] ([<001398e2>] __warn+0x11a/0x120)
[  244.493657] ([<0040a010>] report_bug+0x90/0xf8)
[  244.493658] ([<00100a4a>] do_report_trap+0xea/0x108)
[  244.493660] ([<00100bcc>] illegal_op+0xd4/0x150)
[  244.493662] ([<0067b760>] pgm_check_handler+0x15c/0x1a4)
[  244.493678] ([<03ff817d0ff0>] btrfs_destroy_inode+0x288/0x2b0 [btrfs])
[  244.493680] ([<00fb42aff1c8>] 0xfb42aff1c8)
[  244.493683] ([<002de5b6>] __dentry_kill+0x1c6/0x238)
[  244.493684] ([<002de7fc>] dput+0x1d4/0x298)
[  244.493687] ([<002c6d24>] __fput+0x144/0x1e8)
[  244.493689] ([<0015a806>] task_work_run+0xc6/0xe8)
[  244.493714] ([<0010923a>] do_notify_resume+0x5a/0x60)
[  244.493715] ([<0067b45c>] system_call+0xdc/0x24c)

The WARN_ON is
WARN_ON(BTRFS_I(inode)->csum_bytes);

The file system is shared on 4 multipath targets on SCSI disks.
I seem to be able to reproduce it on kvm/next (4.6.0-rc3+) but 4.5.0 seems 
fine.
Any ideas?

Christian



Re: Replacing RAID-1 devices with larger disks

2016-02-28 Thread Christian Robottom Reis
On Sun, Feb 28, 2016 at 05:15:32PM -0300, Christian Robottom Reis wrote:
> I've managed to do the actual swap using a series of btrfs replace
> commands with no special arguments, and the system is now live and
> booting from the 256GB drives. However, I haven't actually noticed any
> difference in btrfs fi show output, and usage looks weird. Has anyone
seen this before or have a clue as to why?

Yes, now I do, about 10 minutes after writing that mail. After a btrfs
replace, if the device being added is larger than the original device,
you need to issue:

btrfs fi resize <devid>:max <mountpoint>

to actually use that disk space. So for something like:

> Label: 'root'  uuid: 670d1132-00dc-4511-a2f6-d28ce08b4d3a
> Total devices 2 FS bytes used 9.33GiB
> devid1 size 13.97GiB used 11.78GiB path /dev/sda1
> devid2 size 13.97GiB used 11.78GiB path /dev/sdb1
> 
> Label: 'var'  uuid: 815b3280-e90f-483a-b244-1d2dfe9b6e67
> Total devices 2 FS bytes used 56.14GiB
> devid1 size 80.00GiB used 80.00GiB path /dev/sda3
> devid2 size 80.00GiB used 80.00GiB path /dev/sdb3

You need to do:

btrfs fi resize 1:max /
btrfs fi resize 2:max /

btrfs fi resize 1:max /var
btrfs fi resize 2:max /var

And it looks great now:

Label: 'root'  uuid: 670d1132-00dc-4511-a2f6-d28ce08b4d3a
Total devices 2 FS bytes used 9.34GiB
devid1 size 40.00GiB used 10.78GiB path /dev/sda1
devid2 size 40.00GiB used 10.78GiB path /dev/sdb1

Label: 'var'  uuid: 815b3280-e90f-483a-b244-1d2dfe9b6e67
Total devices 2 FS bytes used 56.16GiB
devid1 size 160.00GiB used 80.00GiB path /dev/sda3
devid2 size 160.00GiB used 80.00GiB path /dev/sdb3
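
As an aside, with more devices the per-devid resize is easy to script; a
minimal sketch, assuming the resize syntax shown above, with the mount
point as a placeholder:

    # Sketch: resize every devid of a mounted filesystem to max.
    # /var is a placeholder; devids are parsed from 'btrfs filesystem show'.
    mnt=/var
    for devid in $(btrfs filesystem show "$mnt" | awk '/devid/ {print $2}'); do
        btrfs filesystem resize "${devid}:max" "$mnt"
    done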

This would be nice to document in the manpage for replace; it would also
be a good addition to the best google hit for replace RAID-1:


http://unix.stackexchange.com/questions/227560/how-to-replace-a-device-in-btrfs-raid-1-filesystem

but I don't have enough reputation to do it myself.
-- 
Christian Robottom Reis | [+55 16] 3376 0125   | http://async.com.br/~kiko
| [+55 16] 991 126 430 | http://launchpad.net/~kiko


Replacing RAID-1 devices with larger disks

2016-02-28 Thread Christian Robottom Reis
Hello there,

I'm running a btrfs RAID-1 on two 128GB SSDs that were getting kind
of full. I found two 256GB SSDs that I plan to use to replace the 128GB
versions.

I've managed to do the actual swap using a series of btrfs replace
commands with no special arguments, and the system is now live and
booting from the 256GB drives. However, I haven't actually noticed any
difference in btrfs fi show output, and usage looks weird. Has anyone
seen this before or have a clue as to why?

The relevant partition sizes are now (sdb is identical):

/dev/sda1   *        2048    83888127    41943040   83  Linux
/dev/sda3        92276736   427821055   167772160   83  Linux

Here's the show output:

Label: 'root'  uuid: 670d1132-00dc-4511-a2f6-d28ce08b4d3a
Total devices 2 FS bytes used 9.33GiB
devid1 size 13.97GiB used 11.78GiB path /dev/sda1
devid2 size 13.97GiB used 11.78GiB path /dev/sdb1

Label: 'var'  uuid: 815b3280-e90f-483a-b244-1d2dfe9b6e67
Total devices 2 FS bytes used 56.14GiB
devid1 size 80.00GiB used 80.00GiB path /dev/sda3
devid2 size 80.00GiB used 80.00GiB path /dev/sdb3

Those sizes have not changed over the resize; i.e. the original sda1/sdb1 pair
was 14GB and the sda3/sdb3 pair was 80GB, and after the replace, they haven't
changed.

And usage for / is now weird:

Overall:
Device size:  27.94GiB
Device allocated: 21.56GiB
Device unallocated:6.38GiB
Device missing:  0.00B
Used: 18.66GiB
Free (estimated):  3.99GiB  (min: 3.99GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  208.00MiB  (used: 0.00B)

Data,RAID1: Size:9.00GiB, Used:8.20GiB
   /dev/sda1   9.00GiB
   /dev/sdb1   9.00GiB

Metadata,RAID1: Size:1.75GiB, Used:1.13GiB
   /dev/sda1   1.75GiB
   /dev/sdb1   1.75GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB
   /dev/sda1  32.00MiB
   /dev/sdb1  32.00MiB

Usage for /var also looks wrong, but in a different way:

Overall:
Device size: 160.00GiB
Device allocated:160.00GiB
Device unallocated:2.00MiB
Device missing:  0.00B
Used:112.28GiB
Free (estimated): 21.20GiB  (min: 21.20GiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,RAID1: Size:74.97GiB, Used:53.77GiB
   /dev/sda3  74.97GiB
   /dev/sdb3  74.97GiB

Metadata,RAID1: Size:5.00GiB, Used:2.37GiB
   /dev/sda3   5.00GiB
   /dev/sdb3   5.00GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB
   /dev/sda3  32.00MiB
   /dev/sdb3  32.00MiB

Unallocated:
   /dev/sda3   1.00MiB
   /dev/sdb3   1.00MiB


Version information:

async@riff:~$ uname -a
Linux riff 4.2.0-30-generic #36~14.04.1-Ubuntu SMP Fri Feb 26 18:49:23
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

async@riff:~$ btrfs --version
btrfs-progs v4.0

Thanks,
-- 
Christian Robottom Reis | [+55 16] 3376 0125   | http://async.com.br/~kiko



Re: task btrfs-cleaner:770 blocked for more than 120 seconds.

2016-02-19 Thread Christian Rohmann
Hey liubo,

thanks for the quick response.

On 02/18/2016 05:59 PM, Liu Bo wrote:
>> Apparently also with 4.4 there is some sort of blocking happening ...
>> > just at 38580:
> OK, what does 'sysrq-w' say?

The problem has not appeared again for some time. Do I need to catch it
right when it happens? If so, what evidence should I collect and how?



Regards

Christian


Re: task btrfs-cleaner:770 blocked for more than 120 seconds.

2016-02-18 Thread Christian Rohmann


On 02/14/2016 11:42 PM, Roman Mamedov wrote:
> FWIW I had a persistently repeating deadlock on 4.1 and 4.3, but
> after upgrade to 4.4 it no longer happens.


Apparently also with 4.4 there is some sort of blocking happening ...
just at 38580:

 cut 

[Wed Feb 17 16:43:48 2016] INFO: task btrfs-cleaner:38580 blocked for
more than 120 seconds.
[Wed Feb 17 16:43:48 2016]   Not tainted 4.4.0-customkernel #1
[Wed Feb 17 16:43:48 2016] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 17 16:43:48 2016] btrfs-cleaner   D 882c27295dc0 0
38580  2 0x
[Wed Feb 17 16:43:48 2016]  882c16fa6480 88161a980280
882a3d744000 882a3d743df8
[Wed Feb 17 16:43:48 2016]  8815fead7104 882c16fa6480
 8815fead7108
[Wed Feb 17 16:43:48 2016]  81559a31 8815fead7100
81559cba 8155b5a0
[Wed Feb 17 16:43:48 2016] Call Trace:
[Wed Feb 17 16:43:48 2016]  [] ? schedule+0x31/0x80
[Wed Feb 17 16:43:48 2016]  [] ?
schedule_preempt_disabled+0xa/0x10
[Wed Feb 17 16:43:48 2016]  [] ?
__mutex_lock_slowpath+0x90/0x110
[Wed Feb 17 16:43:48 2016]  [] ? mutex_lock+0x1b/0x30
[Wed Feb 17 16:43:48 2016]  [] ?
btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
[Wed Feb 17 16:43:48 2016]  [] ? __schedule+0x286/0x8f0
[Wed Feb 17 16:43:48 2016]  [] ?
cleaner_kthread+0x1a7/0x200 [btrfs]
[Wed Feb 17 16:43:48 2016]  [] ?
check_leaf+0x340/0x340 [btrfs]
[Wed Feb 17 16:43:48 2016]  [] ? kthread+0xcf/0xf0
[Wed Feb 17 16:43:48 2016]  [] ? kthread_park+0x50/0x50
[Wed Feb 17 16:43:48 2016]  [] ? ret_from_fork+0x3f/0x70
[Wed Feb 17 16:43:48 2016]  [] ? kthread_park+0x50/0x50

[Wed Feb 17 17:23:48 2016] INFO: task btrfs-cleaner:38580 blocked for
more than 120 seconds.
[Wed Feb 17 17:23:48 2016]   Not tainted 4.4.0-customkernel #1
[Wed Feb 17 17:23:48 2016] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 17 17:23:48 2016] btrfs-cleaner   D 881627a35dc0 0
38580  2 0x
[Wed Feb 17 17:23:48 2016]  882c16fa6480 88161a956f00
882a3d744000 882a3d743df8
[Wed Feb 17 17:23:48 2016]  8815fead7104 882c16fa6480
 8815fead7108
[Wed Feb 17 17:23:48 2016]  81559a31 8815fead7100
81559cba 8155b5a0
[Wed Feb 17 17:23:48 2016] Call Trace:
[Wed Feb 17 17:23:48 2016]  [] ? schedule+0x31/0x80
[Wed Feb 17 17:23:48 2016]  [] ?
schedule_preempt_disabled+0xa/0x10
[Wed Feb 17 17:23:48 2016]  [] ?
__mutex_lock_slowpath+0x90/0x110
[Wed Feb 17 17:23:48 2016]  [] ? mutex_lock+0x1b/0x30
[Wed Feb 17 17:23:48 2016]  [] ?
btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
[Wed Feb 17 17:23:48 2016]  [] ? __schedule+0x286/0x8f0
[Wed Feb 17 17:23:48 2016]  [] ?
cleaner_kthread+0x1a7/0x200 [btrfs]
[Wed Feb 17 17:23:48 2016]  [] ?
check_leaf+0x340/0x340 [btrfs]
[Wed Feb 17 17:23:48 2016]  [] ? kthread+0xcf/0xf0
[Wed Feb 17 17:23:48 2016]  [] ? kthread_park+0x50/0x50
[Wed Feb 17 17:23:48 2016]  [] ? ret_from_fork+0x3f/0x70
[Wed Feb 17 17:23:48 2016]  [] ? kthread_park+0x50/0x50

[Wed Feb 17 17:57:48 2016] INFO: task btrfs-cleaner:38580 blocked for
more than 120 seconds.
[Wed Feb 17 17:57:48 2016]   Not tainted 4.4.0-customkernel #1
[Wed Feb 17 17:57:48 2016] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Feb 17 17:57:48 2016] btrfs-cleaner   D 881627a95dc0 0
38580  2 0x
[Wed Feb 17 17:57:48 2016]  882c16fa6480 88161a980fc0
882a3d744000 882a3d743df8
[Wed Feb 17 17:57:48 2016]  8815fead7104 882c16fa6480
 8815fead7108
[Wed Feb 17 17:57:48 2016]  81559a31 8815fead7100
81559cba 8155b5a0
[Wed Feb 17 17:57:48 2016] Call Trace:
[Wed Feb 17 17:57:48 2016]  [] ? schedule+0x31/0x80
[Wed Feb 17 17:57:48 2016]  [] ?
schedule_preempt_disabled+0xa/0x10
[Wed Feb 17 17:57:48 2016]  [] ?
__mutex_lock_slowpath+0x90/0x110
[Wed Feb 17 17:57:48 2016]  [] ? mutex_lock+0x1b/0x30
[Wed Feb 17 17:57:48 2016]  [] ?
btrfs_delete_unused_bgs+0xee/0x3f0 [btrfs]
[Wed Feb 17 17:57:48 2016]  [] ? __schedule+0x286/0x8f0
[Wed Feb 17 17:57:48 2016]  [] ?
cleaner_kthread+0x1a7/0x200 [btrfs]
[Wed Feb 17 17:57:48 2016]  [] ?
check_leaf+0x340/0x340 [btrfs]
[Wed Feb 17 17:57:48 2016]  [] ? kthread+0xcf/0xf0
[Wed Feb 17 17:57:48 2016]  [] ? kthread_park+0x50/0x50
[Wed Feb 17 17:57:48 2016]  [] ? ret_from_fork+0x3f/0x70
[Wed Feb 17 17:57:48 2016]  [] ? kthread_park+0x50/0x50


 cut 


[Docs]? Only one Subvolume with DUP (or different parameters)?

2016-02-16 Thread Christian Völker
Hi Guys,

sorry for the simple question and I assume every developer here laughs
about this question.

Anyway:

I have read loads of documents but did not find an answer for sure. Even
though I assume I am right.

On a created btrfs filesystem, is it possible to have subvolumes with
data duplication and another subvolume without (resp. with just metadata
duplication)?

I have some large filesystems currently on ext4 and I am thinking of
changing to btrfs. Some of the data is more important than the rest. So I
want to have data duplication on the important files (sorted into a mount
point) and none for the other subvolume.

So I want to have the advantage of redundancy of important files
combined with the flexibility of the volume manager and shared disk space.

Possible?


GReetings

Christian



Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core?

2016-02-10 Thread Christian Rohmann
Hey btrfs-folks,


I did a bit of digging using "perf":


1)
 * "perf stat -B -p 3933 sleep 60"
 * "perf stat -e 'btrfs:*' -a sleep 60"
 -> http://fpaste.org/320718/10016145/



2)
 * perf record -e block:block_rq_issue -ag" for about 30 seconds:
 -> http://fpaste.org/320719/51101751/raw/


3)
* perf top
 -> http://fpaste.org/320720/45511028/






Regards

Christian


Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core?

2016-02-09 Thread Christian Rohmann


On 02/01/2016 09:52 PM, Chris Murphy wrote:
>> Would some sort of stracing or profiling of the process help to narrow
>> > down where the time is currently spent and why the balancing is only
>> > running single-threaded?
> This can't be straced. Someone a lot more knowledgeable than I am
> might figure out where all the waits are with just a sysrq + t, if it
> is a hold up in say parity computations. Otherwise perf which is a
> rabbit hole but perf top is kinda cool to watch. That might give you
> an idea where most of the cpu cycles are going if you can isolate the
> workload to just the balance. Otherwise you may end up with noisy
> data.

My balance run has now been working away since the 19th of January:
 "885 out of about 3492 chunks balanced (996 considered),  75% left"

So this will take several more WEEKS to finish. Is there really nothing
anyone here wants me to do or analyze to help find the root cause of
this? I mean with this kind of performance there is no way a RAID6 can
be used in production. Not because the code is not stable or
functioning, but because regular maintenance like replacing a drive or
growing an array takes WEEKS in which another maintenance procedure
could be necessary or, much worse, another drive might have failed.

What I'm saying is: Such a slow RAID6 balance renders the redundancy
unusable because drives might fail quicker than the potential rebuild
(read "balance").



Regards

Christian


Re: Progress indicator when (slowly) mounting a btrfs filesystem?

2016-02-01 Thread Christian Rohmann
Hey Chris,

On 01/28/2016 12:47 AM, Chris Murphy wrote:
> Might be a bug, but more likely might be a lack of optimization. If it
> eventually mounts without errors that's a pretty good plus. Lots of
> file systems can't handle power failures well at all.

So what and how should I go about profiling such a long-running mount in
order to help find the points where optimization is needed most?
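
A first guess on my side, with device and mount point as placeholders:
record a system-wide perf profile while the slow mount runs, then look at
where the kernel time goes.

    # -a = system-wide, -g = call graphs; /dev/sdX and /mnt are placeholders.
    perf record -ag -- mount /dev/sdX /mnt
    perf report --sort symbol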

Just as a side note: I don't think it's your own expectation to be like
all those other file systems ;-)


Regards

Christian


Re: "WARNING: device 0 not present" during scrub?

2016-01-31 Thread Christian Pernegger
On 31 January 2016 at 02:42, Chris Murphy <li...@colorremedies.com> wrote:
> On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger
> It may be stable for Debian but is Debian explicitly supporting
> Btrfs with this release? I don't think they are.

The modules are in the kernel, the progs are in the main archive, it's
an option in the installer. It's not the default fs but I couldn't
find any indication that it's more or less supported than, say, xfs.
Why they've chosen 3.16 (and not 3.18, which would be a long term
release) I don't know, but the fact remains that that's the default
kernel of a tier 1 distro, so people using it are going to be around
for a while.

> But absolutely, of course we hope the problem is gone with the newer
> version, *that's how file system development works.*

Be that as it may, as I said, that approach doesn't inspire
confidence. If I had the vaguest idea about how to reproduce it, sure,
but all I have is an apparently lightly corrupted or at the very least
glitchy fs (it mounts and unmounts just fine). How would I know if a
new kernel helped things?

> I can see how it might seem like it's a reasonable question to just
> ask first, but it really isn't. There's just so much development
> happening right now, a developer is not in a great position to think
> that far back for specific problems and whether yours might be one of
> them, and in what kernel version it was fixed. *shrug* just doesn't
> work that way, that's why there are changelogs for every sub kernel
> version.

I do understand your point of view, but: If a possible fs corruption
bug on a widespread (if older) kernel after one month of use and
without any discernible cause gets nothing more than *shrug* from this
list then btrfs isn't production ready nor ready for any kind of
day-to-day use, not because of code maturity but because of that
mindset. IMHO the btrfs-genie is too far out of the bottle for that,
the wording of the stability status on the wiki much too inviting.

Anyway, I knew what I was getting into, so I'll just chalk it up to
experience and move on. Keep up the good work!

> Have you checked out ZFS on Linux? That might fit your use case better
> because it has the features you're asking for, but at least the ZFS
> portion is older and considered more stable.

It seemed a bit over the top on a single disk and 4GB of (not even
ECC) RAM. Between btrfs' heavy development and zfsonlinux being stable
but needing potentially less stable Solaris-glue and having no
distro-side support I thought I'd try btrfs first.

Regards,
Christian Pernegger


Re: "WARNING: device 0 not present" during scrub?

2016-01-30 Thread Christian Pernegger
On 30 January 2016 at 21:10, Henk Slager  wrote:
> Can you mount the fs (readonly)?

No idea, it's still mounted (rw even), aside from the scrub failing
and debug-tree crashing I wouldn't know anything was amiss. I was kind
of reluctant to shut the machine down lest it then wouldn't come up at
all.

>  unmount and run a   btrfs check -p /dev/mapper/sda3_crypt

That would mean shutting it down and booting from a rescue image on
USB (any suggestions for something with a recent kernel and progs?).
That's fine of course, if there's nothing more to be gleaned from the
running system.

> I think there is a relation between the many ata2 messages and this
> scrub failure.

There's exactly one of these errors on every resume from suspend, I'd
assumed it's just the disk being slow to wake up. Even if they aren't
benign, I made sure beforehand that the box did not sleep during the
scrub and according to the logs it didn't.
Suspend-resume and/or systemd are still likely culprits of course.

> You can use brute-force rsync -c (and more, see manpage) to validate your
> data, assuming your source data isn't on btrfs.

The data that I can verify, i.e. where the source machines still have
the version from the current backup, checks out.
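
For reference, that rsync check boils down to something like the following
sketch (both paths are placeholders):

    # -r recursive, -n dry run, -c full-content checksums, -i itemize
    # each differing file.
    rsync -rnci /source/ /mnt/backup/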

> A workaround might be to disable PM for the system,

The system's supposed to wake up once daily (nightly), pull in
rdiff-backups from a few others and go back to sleep 20 min later.
Keeping it awake 24/7 is a no-go noise and cost-wise. (For testing /
debugging, sure, just not in the long run.)

> An obvious piece of advice is to use a 4.4 kernel and tools. Debian 'stable' 
> doesn't mean
> that every piece of the kernel and tooling fits that 'stamp'. [...] Maybe you 
> could switch
> to a rolling release linux distro or just update the debian kernel.

Using Debian stable usually means that once something is set up and
works it keeps working until the hardware dies with little to no user
interaction. For someting that sits in a corner and pulls in backups
that suits me just fine. If there's a specific reason to update the
kernel and btrfs-progs, it's easily done of course, but "let's hope it
has gone away with the newer version" doesn't inspire me with
confidence on its own.

> But the more fundamental question is why you use btrfs? What features
> do you need that ext4 or xfs or reiserfs don't have?

Data checksumming. I don't mind a bit flipping here or there in old
backups / archives but I'd have liked to know if something went bad
and which files were affected. Compression. Dedup that works on mortal
hardware. To a lesser degree, subvolumes.
Also I wanted to get familiar with the next big thing in Linux file
systems. :-) My bigger boxes use md + dm-crypt + lvm + manual
checksumming and the moment I can replace that (or part of it) with
something integrated, I will. Once the resilience and fault tolerance
is there. (The other day md-raid10 was so unfazed by what must have
been a disk with a half-dead controller that it took me half a day to
find out which one it was ...)
I was fully aware that I might run into trouble, I just didn't expect
it to take less than a month and/or happen without provocation.



The current install is expendable, even though it irks me to have to
redo it (I didn't backup that, wanted to get it just right first), but
I'd really like to find and fix the problem before I do, otherwise I
might be back to square one in a month or so ...

Cheers,
C.


Trouble with broken RAID5/6 System after trying to solve a problem, want to recover contained Data

2015-12-28 Thread Christian
I found out that I can also show more information about files and directories 
contained in the filesystem.

btrfs-debug-tree shows me the following info:

parent transid verify failed on 2234958286848 wanted 35674 found 35675
parent transid verify failed on 2234958286848 wanted 35674 found 35675
parent transid verify failed on 2234958286848 wanted 35674 found 35675
Ignoring transid failure
root tree
node 2234958307328 level 1 items 20 free 101 generation 35675 owner 1
fs uuid a405d8a3-64f5-4b70-88fa-29df558262b0
chunk uuid 167d74cd-1bd1-4c1c-9acd-3fcb1ef4f483
key (EXTENT_TREE ROOT_ITEM 0) block 2234958311424 (545644119) gen 35674
key (1374 INODE_ITEM 0) block 2235090616320 (545676420) gen 35673
key (1381 INODE_ITEM 0) block 2235090214912 (545676322) gen 35673
key (1396 INODE_ITEM 0) block 2235090239488 (545676328) gen 35673
key (1411 INODE_ITEM 0) block 2235090538496 (545676401) gen 35673
key (1425 EXTENT_DATA 0) block 2235187728384 (545700129) gen 35637
key (1440 EXTENT_DATA 0) block 2235890171904 (545871624) gen 34929
key (1449 INODE_ITEM 0) block 2513856499712 (613734497) gen 35102
key (1464 INODE_ITEM 0) block 2235132993536 (545686766) gen 35636
key (1479 INODE_ITEM 0) block 2235090554880 (545676405) gen 35673
key (1494 INODE_ITEM 0) block 2235830722560 (545857110) gen 29339
key (1509 INODE_ITEM 0) block 2234985762816 (545650821) gen 34043
key (1524 INODE_ITEM 0) block 2236064706560 (545914235) gen 35321
key (1539 INODE_ITEM 0) block 2235081564160 (545674210) gen 35572
key (1547 INODE_ITEM 0) block 2235090591744 (545676414) gen 35673
key (1562 INODE_ITEM 0) block 2235090604032 (545676417) gen 35673
key (FREE_SPACE UNTYPED 2268378497024) block 2235090870272 (545676482) 
gen 35673
key (FREE_SPACE UNTYPED 2365015261184) block 2235090853888 (545676478) 
gen 35673
key (FREE_SPACE UNTYPED 2556174860288) block 2235833815040 (545857865) 
gen 29340
key (FREE_SPACE UNTYPED 2633484271616) block 2234958319616 (545644121) 
gen 35675
leaf 2234958311424 items 9 free space 1349 generation 35674 owner 1
fs uuid a405d8a3-64f5-4b70-88fa-29df558262b0
chunk uuid 167d74cd-1bd1-4c1c-9acd-3fcb1ef4f483
...
leaf 2235411533824 items 50 free space 9 generation 25816 owner 5
fs uuid a405d8a3-64f5-4b70-88fa-29df558262b0
chunk uuid 167d74cd-1bd1-4c1c-9acd-3fcb1ef4f483
...
item 2 key (259 DIR_ITEM 714625598) itemoff 3831 itemsize 51
location key (380075 INODE_ITEM 0) type FILE
namelen 21 datalen 0 name: Pixar - Gasplanet.mpg
...


Does someone know how to recover the files from the filesystem?
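
One avenue worth sketching here (an assumption on my part, not something
already tried in this thread): offline recovery with btrfs restore.

    # -i = ignore errors, -v = verbose; device and target directory
    # are placeholders.
    btrfs restore -iv /dev/sdX /mnt/recovery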

Kind regards,
Christian



Re: How to properly and efficiently balance RAID6 after more drives are added?

2015-11-11 Thread Christian Rohmann
Sorry for the late reply to this list regarding this topic
...

On 09/04/2015 01:04 PM, Duncan wrote:
> And of course, only with 4.1 (nominally 3.19 but there were initial 
> problems) was raid6 mode fully code-complete and functional -- before 
> that, runtime worked, it calculated and wrote the parity stripes as it 
> should, but the code to recover from problems wasn't complete, so you 
> were effectively running a slow raid0 in terms of recovery ability, but 
> one that got "magically" updated to raid6 once the recovery code was 
> actually there and working.

As others who write to this ML, I ran into crashes when trying to do a
balance of my filesystem.
I moved through the different kernel versions and btrfs-tools and am
currently running kernel 4.3 + 4.3-rc1 of the tools, but still, after about
an hour of balancing (and actually moving chunks), the machine crashes
horribly without leaving any good stack trace or anything in the kernel
log which I could report here :(

Any ideas on how I could proceed to get some usable debug info for the
devs to look at?


> So I'm guessing you have some 8-strip-stripe chunks at say 20% full or 
> some such.  There's 19.19 data TiB used of 22.85 TiB allocated, a spread 
> of over 3 TiB.  A full nominal-size data stripe allocation, given 12 
> devices in raid6, will be 10x1GiB data plus 2x1GiB parity, so there's 
> about 3.5 TiB / 10 GiB extra stripes worth of chunks, 350 stripes or so, 
> that should be freeable, roughly (the fact that you probably have 8-
> strip, 12-strip, and 4-strip stripes, on the same filesystem, will of 
> course change that a bit, as will the fact that four devices are much 
> smaller than the other eight).

The new devices have been in place for a while (> 2 months) now, and are
barely used. Why is there not more data being put onto the new disks?
Even without a balance new data should spread evenly across all devices
right? From the IOPs I can see that only the 8 disks which always have
been in the box are doing any heavy lifting and the new disks are mostly
idle.

Anything I could do to narrow down where a certain file is stored across
the devices?






Regards

Christian


[PATCH] btrfs: fix possible leak in btrfs_ioctl_balance()

2015-10-20 Thread Christian Engelmayer
Commit 8eb934591f8b ("btrfs: check unsupported filters in balance
arguments") adds a jump to exit label out_bargs in case the argument
check fails. At this point in addition to the bargs memory, the
memory for struct btrfs_balance_control has already been allocated.
Ownership of bctl is passed to btrfs_balance() in the good case,
thus the memory is not freed due to the introduced jump. Make sure
that the memory gets freed in any case as necessary. Detected by
Coverity CID 1328378.

Signed-off-by: Christian Engelmayer <cenge...@gmx.at>
---
The proposed patch is only test compiled.
---
 fs/btrfs/ioctl.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3e3e6130637f..8d20f3b1cab0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4641,7 +4641,7 @@ locked:
 
	if (bctl->flags & ~(BTRFS_BALANCE_ARGS_MASK | BTRFS_BALANCE_TYPE_MASK)) {
ret = -EINVAL;
-   goto out_bargs;
+   goto out_bctl;
}
 
 do_balance:
@@ -4655,12 +4655,15 @@ do_balance:
need_unlock = false;
 
ret = btrfs_balance(bctl, bargs);
+   bctl = NULL;
 
if (arg) {
if (copy_to_user(arg, bargs, sizeof(*bargs)))
ret = -EFAULT;
}
 
+out_bctl:
+   kfree(bctl);
 out_bargs:
kfree(bargs);
 out_unlock:
-- 
1.9.1



Re: How to properly and efficiently balance RAID6 after more drives are added?

2015-09-04 Thread Christian Rohmann
Hello Duncan,

thanks a million for taking the time and effort to explain all that.
I understand that all the devices must have been chunk-allocated for
btrfs to tell me all available "space" was used (read "allocated to data
chunks").

The filesystem is quite old already, with kernels starting at 3.12 (I
believe) and now 4.2, always with the most current version of btrfs-progs
Debian has available.


On 09/03/2015 04:22 AM, Duncan wrote:
> But what we /do/ know from what you posted (from after the add), the 
> previously existing devices are "100% chunk-allocated", size 3.64 TiB, 
> used 3.64 TiB, on each of the first eight devices.
> 
> I don't know how much of (the user docs on) the wiki you've read, and/or 
> understood, but for many people, it takes awhile to really understand a 
> few major differences between btrfs and most other filesystems.
> 
> 1) Btrfs separates data and metadata into separate allocations, 
> allocating, tracking and reporting them separately.  While some 
> filesystems do allocate separately, few expose the separate data and 
> metadata allocation detail to the user.
> 
> 2) Btrfs allocates and uses space in two steps, first allocating/
> reserving relatively large "chunks" from free-space into separate data 
> and metadata chunks, then using space from these chunk allocations as 
> needed, until they're full and more must be allocated.  Nominal[1] chunk 
> size is 1 GiB for data, 256 MiB for metadata.

> 3) Up until a few kernel cycles ago, btrfs could and would automatically 
> allocate chunks as needed, but wouldn't deallocate them when they 
> emptied.  Once they were allocated for data or metadata, that's how they 
> stayed allocated, unless/until the user did a balance manually, at which 
> point the chunk rewrite would consolidate the used space and free any 
> unused chunk-space back to the unallocated space pool.

> The result was that given normal usage writing and deleting data, over 
> time, all unallocated space would typically end up allocated as data 
> chunks, such that at some point the filesystem would run out of metadata 
> space and need to allocate more metadata chunks, but couldn't, because of 
> all those extra partially to entirely empty data chunks that were 
> allocated and never freed.
> 
> Since IIRC 3.17 or so (kernel cycle from unverified memory, but that 
> should be close), btrfs will automatically deallocate chunks if they're 
> left entirely empty, so the problem has disappeared to a large extent, 
> tho it's still possible to eventually end up with a bunch of not-quite-
> empty data chunks, that require a manual balance to consolidate and clean 
> up.


I am running a full balance now, it's at 94% remaining (running for 48
hrs already ;-) ).

Is there any way I should / could "scan" for empty data chunks or almost
empty data chunks which could be freed in order to have more chunks
available for the actual balancing or new chunks that should be used
with a 10 drive RAID6? I understand that btrfs NOW does that somewhat
automagically, but my FS is quite old and used already and there is new
data coming in all the time, so I want that properly spread across all
the drives.
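
A sketch of such a pass, assuming balance filters are available in this
btrfs-progs version: rewrite only data block groups that are at most 10%
full, so near-empty chunks are freed back to unallocated space.

    # /mountpoint is a placeholder; only data block groups <= 10% full
    # are rewritten.
    btrfs balance start -dusage=10 /mountpoint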


Regards

Christian


How to properly and efficiently balance RAID6 after more drives are added?

2015-09-02 Thread Christian Rohmann
Hello btrfs-enthusiasts,

I have a rather big btrfs RAID6 with currently 12 devices. It used to be
only 8 drives 4TB each, but I successfully added 4 more drives with 1TB
each at some point. What I am trying to find out, and that's my main
reason for posting this, is how to balance the data on the drives now.

I am wondering what I should read from this "btrfs filesystem show" output:

--- cut ---
Total devices 12 FS bytes used 19.23TiB
devid1 size 3.64TiB used 3.64TiB path /dev/sdc
devid2 size 3.64TiB used 3.64TiB path /dev/sdd
devid3 size 3.64TiB used 3.64TiB path /dev/sde
devid4 size 3.64TiB used 3.64TiB path /dev/sdf
devid5 size 3.64TiB used 3.64TiB path /dev/sdh
devid6 size 3.64TiB used 3.64TiB path /dev/sdi
devid7 size 3.64TiB used 3.64TiB path /dev/sdj
devid8 size 3.64TiB used 3.64TiB path /dev/sdb
devid9 size 931.00GiB used 535.48GiB path /dev/sdg
devid   10 size 931.00GiB used 535.48GiB path /dev/sdk
devid   11 size 931.00GiB used 535.48GiB path /dev/sdl
devid   12 size 931.00GiB used 535.48GiB path /dev/sdm

btrfs-progs v4.1.2
--- cut ---


First of all I wonder why the first 8 disks are shown as "full" ("used =
size"), but there is 5.3TB of free space for the fs shown by "df":

--- cut ---
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdc 33T   20T  5.3T  79% /somemountpointsomewhere
--- cut ---

Also "btrfs filesystem df" doesn't give me any clues on the matter:

--- cut ---
btrfs filesystem df /srv/mirror/
Data, single: total=8.00MiB, used=0.00B
Data, RAID6: total=22.85TiB, used=19.19TiB
System, single: total=4.00MiB, used=0.00B
System, RAID6: total=12.00MiB, used=1.34MiB
Metadata, single: total=8.00MiB, used=0.00B
Metadata, RAID6: total=42.09GiB, used=38.42GiB
GlobalReserve, single: total=512.00MiB, used=1.58MiB
--- cut ---




What I am very certain about is that the "load" of I/O requests is not
equal yet, as iostat clearly shows:

--- cut ---
Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc  21.40 4.41   42.22   12.71  3626.12   940.79
166.29 3.82   69.38   42.83  157.60   5.98  32.82
sdb  22.35 4.45   41.29   12.71  3624.20   941.27
169.09 4.22   77.88   46.75  178.97   6.10  32.96
sdd  22.03 4.44   41.60   12.73  3623.76   943.22
168.13 3.79   69.45   42.53  157.48   6.05  32.85
sde  21.21 4.43   42.30   12.74  3621.39   943.36
165.88 3.82   69.28   42.99  156.62   5.98  32.90
sdf  22.19 4.42   41.42   12.75  3623.65   940.63
168.51 3.77   69.36   42.64  156.13   6.05  32.79
sdh  21.35 4.46   42.25   12.68  3623.12   940.28
166.14 3.95   71.72   43.61  165.40   6.02  33.06
sdi  21.92 4.38   41.67   12.79  3622.03   942.91
167.63 3.49   63.83   40.23  140.74   6.02  32.77
sdj  21.31 4.41   42.26   12.72  3625.32   941.50
166.12 3.99   72.25   44.50  164.44   6.00  33.01
sdg   8.90 4.97   12.53   21.16  1284.47  1630.08
173.02 0.83   24.61   27.31   23.02   1.77   5.95
sdk   9.14 4.94   12.30   21.19  1284.61  1630.02
174.07 0.79   23.41   26.59   21.57   1.76   5.91
sdl   8.88 4.95   12.58   21.19  1284.46  1630.06
172.62 0.80   23.80   25.68   22.68   1.78   6.00
sdm   9.07 4.85   12.35   21.29  1284.43  1630.01
173.26 0.79   23.57   26.57   21.83   1.77   5.94

--- cut ---



Should I run btrfs balance on the filesystem? If so, what FILTERS would
I then use in order for the data and therefore requests to be better
distributed?




With regards and thanks in advance,


Christian


Re: How to properly and efficiently balance RAID6 after more drives are added?

2015-09-02 Thread Christian Rohmann
Hey Hugo,

thanks for the quick response.

On 09/02/2015 01:30 PM, Hugo Mills wrote:
> You had some data on the first 8 drives with 6 data+2 parity, then 
> added four more. From that point on, you were adding block groups
> with 10 data+2 parity. At some point, the first 8 drives became
> full, and then new block groups have been added only to the new
> drives, using 2 data+2 parity.

Even though the old 8 drive RAID6 was not full yet? Read: There was
still some terabytes of free space.


>> Should I run btrfs balance on the filesystem? If so, what FILTERS
>> would I then use in order for the data and therefore requests to
>> be better distributed?
> 
> Yes, you should run a balance. You probably need to free up some 
> space on the first 8 drives first, to give the allocator a chance
> to use all 12 devices in a single stripe. This can also be done
> with a balance. Sadly, with the striped RAID levels (0, 10, 5, 6),
> it's generally harder to ensure that all of the data is striped as
> evenly as is possible(*). I don't think there are any filters that
> you should to use -- just balance everything. The first time
> probably won't do the job fully. A second balance probably will.
> These are going to take a very long time to run (in your case, I'd
> guess at least a week for each balance). I would recommend starting
> the balance in a tmux or screen session, and also creating a second
> shell in the same session to run monitoring processes. I typically
> use something like:
> 
> watch -n60 sudo btrfs fi show\; echo\; btrfs fi df /mountpoint\;
> echo\; btrfs bal stat /mountpoint

Yeah, that's what I usually do. The thing is that one does not get any
progress indication and estimate about how long a task will take.


> (*) Hmmm... idea for a new filter: min/max stripe width? Then you 
> could balance only the block groups that aren't at full width,
> which is probably what's needed here.

Consider my question and motivation a rather obvious use case of
running out of disk space (or iops) and simply adding some more
drives. A balance needs to be straightforward for people to understand
and perform such tasks.



Regards

Christian


qgroup limit clearing, was Re: Btrfs progs release 4.1

2015-06-22 Thread Christian Robottom Reis
On Mon, Jun 22, 2015 at 05:00:23PM +0200, David Sterba wrote:
>    - qgroup:
>      - show: distinguish no limits and 0 limit value
>      - limit: ability to clear the limit

I'm using kernel 4.1-rc7 as per:

root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# uname -a
Linux riff 4.1.0-040100rc7-generic #201506080035 SMP Mon Jun 8 04:36:20 UTC 
2015 x86_64 x86_64 x86_64 GNU/Linux

But apart from still having major issues with qgroups (quota enforcement
triggers even when there seems to be plenty of free space) clearing
limits with btrfs-progs 4.1 doesn't revert back to 'none', instead
confusingly setting the quota to 16EiB. Using:

root@riff:/var/lib/lxc/juju-trusty-lxc-template/rootfs# btrfs version
btrfs-progs v4.1

I start from:

qgroupid rfer excl max_rfer max_excl 
     
0/5   2.15GiB  1.95GiB none none 
0/261 1.42GiB  1.11GiB none100.00GiB 
0/265 1.09GiB600.59MiB none100.00GiB 
0/271   793.32MiB366.40MiB none100.00GiB 
0/274   514.96MiB142.92MiB none100.00GiB 

I then issue:

root@riff# btrfs qgroup limit -e none 261 /var
root@riff# btrfs qgroup limit none 261 /var

I end up with:

qgroupid rfer excl max_rfer max_excl 
     
0/5   2.15GiB  1.95GiB none none 
0/261 1.42GiB  1.11GiB 16.00EiB 16.00EiB 
0/265 1.09GiB600.59MiB none100.00GiB 
0/271   793.32MiB366.40MiB none100.00GiB 
0/274   514.96MiB142.92MiB none100.00GiB 

Is that expected?
-- 
Christian Robottom Reis | [+55 16] 3376 0125   | http://async.com.br/~kiko
CEO, Async Open Source  | [+55 16] 9 9112 6430 | http://launchpad.net/~kiko


Re: trim not working and irreparable errors from btrfsck

2015-06-17 Thread Christian

On 06/17/2015 10:22 AM, Chris Murphy wrote:

> On Wed, Jun 17, 2015 at 6:56 AM, Christian Dysthe cdys...@gmail.com wrote:
>> Hi,
>>
>> Sorry for asking more about this. I'm not a developer but trying to learn.
>> In my case I get several errors like this one:
>>
>> root 2625 inode 353819 errors 400, nbytes wrong
>>
>> Is it inode 353819 I should focus on and what is the number after root, in
>> this case 2625?
>
> I'm going to guess it's tree root 2625, which is the same thing as fs
> tree, which is the same thing as subvolume. Each subvolume has its own
> inodes. So on a given Btrfs volume, an inode number can exist more
> than once, but in separate subvolumes. When you use btrfs inspect
> inode it will list all files with that inode number, but only the one
> in subvol ID 2625 is what you care about deleting and replacing.
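
As a concrete sketch of that lookup (the inspect-internal spelling is
assumed from btrfs-progs, and the mount point is a placeholder):

    # Resolve inode 353819 (from the report above) to path names.
    btrfs inspect-internal inode-resolve 353819 /mnt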

Thanks! Deleting the file for that inode took care of it. No more 
errors. Restored it from a backup.


However, fstrim still gives me 0 B (0 bytes) trimmed, so that may be 
another problem. Is there a way to check if trim works?
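
Two quick checks for whether the device advertises discard support at all
(a sketch; /dev/sda is a placeholder):

    # Non-zero DISC-GRAN/DISC-MAX means the device supports discard.
    lsblk --discard /dev/sda
    hdparm -I /dev/sda | grep -i trim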


--
//Christian




Re: trim not working and irreparable errors from btrfsck

2015-06-17 Thread Christian

On 06/17/2015 11:28 AM, Chris Murphy wrote:



>> However, fstrim still gives me 0 B (0 bytes) trimmed, so that may be
>> another problem. Is there a way to check if trim works?
>
> That sounds like maybe your SSD is blacklisted for trim, is all I can
> think of. So trim shouldn't be the cause of the problem if it's being
> blacklisted. The recent problems appear to be around newer SSDs that
> support queued trim and newer kernels that issue queued trim. There
> have been some patches related to trim to the kernel, but the
> existence of blacklisting and claims of bugs in firmware make it
> difficult to test and isolate.
>
> http://techreport.com/news/28473/some-samsung-ssds-may-suffer-from-a-buggy-trim-implementation

This is an Intel SSD in a Lenovo Thinkpad X1 Carbon. Trim worked until a 
few weeks ago and still works for my small ext4 boot partition (just ran 
it to check). I will keep looking for a solution. Thanks!


--
//Christian




trim not working and irreparable errors from btrfsck

2015-06-16 Thread Christian

Hi,

I have a btrfs partition on my laptop containing / and /home. Recently I 
noticed that trim (fstrim) didn't work anymore. It always tells me there is 
nothing to trim, which cannot be correct. I then tried to run btrfsck 
--repair and this is the result:


ubuntu@ubuntu:~$ sudo btrfsck /dev/sda3 --repair
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sda3
UUID: 3d52dc93-c89f-453f-965d-8601d11e7710
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 257 inode 353819 errors 400, nbytes wrong
root 2260 inode 353819 errors 400, nbytes wrong
root 2262 inode 353819 errors 400, nbytes wrong
root 2264 inode 353819 errors 400, nbytes wrong
root 2266 inode 353819 errors 400, nbytes wrong
root 2268 inode 353819 errors 400, nbytes wrong
root 2270 inode 353819 errors 400, nbytes wrong
root 2273 inode 353819 errors 400, nbytes wrong
root 2275 inode 353819 errors 400, nbytes wrong
root 2277 inode 353819 errors 400, nbytes wrong
root 2279 inode 353819 errors 400, nbytes wrong
root 2281 inode 353819 errors 400, nbytes wrong
root 2283 inode 353819 errors 400, nbytes wrong
root 2285 inode 353819 errors 400, nbytes wrong
root 2287 inode 353819 errors 400, nbytes wrong
root 2289 inode 353819 errors 400, nbytes wrong
root 2291 inode 353819 errors 400, nbytes wrong
root 2293 inode 353819 errors 400, nbytes wrong
root 2295 inode 353819 errors 400, nbytes wrong
root 2297 inode 353819 errors 400, nbytes wrong
root 2299 inode 353819 errors 400, nbytes wrong
root 2301 inode 353819 errors 400, nbytes wrong
root 2303 inode 353819 errors 400, nbytes wrong
root 2305 inode 353819 errors 400, nbytes wrong
root 2317 inode 353819 errors 400, nbytes wrong
root 2320 inode 353819 errors 400, nbytes wrong
root 2326 inode 353819 errors 400, nbytes wrong
root 2556 inode 353819 errors 400, nbytes wrong
root 2574 inode 353819 errors 400, nbytes wrong
root 2592 inode 353819 errors 400, nbytes wrong
root 2601 inode 353819 errors 400, nbytes wrong
root 2617 inode 353819 errors 400, nbytes wrong
root 2620 inode 353819 errors 400, nbytes wrong
root 2621 inode 353819 errors 400, nbytes wrong
root 2624 inode 353819 errors 400, nbytes wrong
root 2625 inode 353819 errors 400, nbytes wrong
root 2626 inode 353819 errors 400, nbytes wrong
root 2627 inode 353819 errors 400, nbytes wrong
root 2628 inode 353819 errors 400, nbytes wrong
root 2629 inode 353819 errors 400, nbytes wrong
root 2630 inode 353819 errors 400, nbytes wrong
root 2631 inode 353819 errors 400, nbytes wrong
root 2632 inode 353819 errors 400, nbytes wrong
found 49406146650 bytes used err is 1
total csum bytes: 117817456
total tree bytes: 792248320
total fs tree bytes: 604520448
total extent tree bytes: 52101120
btree space waste bytes: 180897891
file data blocks allocated: 195314843648
 referenced 135991529472
Btrfs v3.17

I get the exact same result every time I run btrfsck --repair. Is the 
file system irreparable, or is there something I can do to save it?
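(A sketch, not part of the original mail: all the errors point at a single
inode, 353819, and the thread was resolved by deleting that file. btrfs can map
the inode number back to a path; the mount point below is an assumption.)

# btrfs inspect-internal inode-resolve 353819 /mnt/subvol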



--
//Christian




Mysterious device (id 0).

2015-05-14 Thread Christian

Hi,

My laptop has a small ext4 /boot partition /dev/sda1 and a large btrfs 
partition /dev/sda3 containing / and /home as a subvolume.


When I run scrub on /dev/sda3 and check status with sudo btrfs scrub 
status -d /dev/sda3 I get:


scrub status for 3d52dc93-c89f-453f-965d-8601d11e7710
scrub device /dev/sda3 (id 1) history
scrub started at Thu May 14 19:54:02 2015 and finished after 235 seconds
total bytes scrubbed: 111.03GiB with 0 errors
scrub device  (id 0) history
scrub started at Thu May 14 19:54:02 2015 and was aborted after 0 
seconds
total bytes scrubbed: 0.00B with 0 errors

A device (id 0) is indicated, but I have no idea what it is.

When I run scrub with a script (cron) the same device (id 0) shows up in 
a warning:


/etc/cron.daily/btrfs-scrub:
WARNING: device 0 not present
scrub device /dev/sda3 (id 1) done
scrub started at Sat May  2 08:03:46 2015 and finished after 241 
seconds

total bytes scrubbed: 109.06GiB with 0 errors
scrub device  (id 0) canceled
scrub started at Sat May  2 08:03:46 2015 and was aborted after 0 
seconds

total bytes scrubbed: 0.00B with 0 errors

What is this device (id 0) which never gets scrubbed but always shows up?

I'm running Ubuntu 15.04 with the 3.19.0-16-generic kernel.

P.S. I am not a developer.
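(A hedged sketch, not from the mail: comparing the device IDs that scrub
reports against the devices the filesystem itself lists may show whether id 0
is a stale entry.)

# btrfs fi show /dev/sda3     (lists the devices btrfs knows about, with devids)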

--
//Christian




Re: Quota limit question

2015-03-06 Thread Christian Robottom Reis
Just as a follow-up, I upgraded btrfs-tools and the kernel again. I
currently have a filesystem which reports 1G exclusive use:

root@riff# btrfs qg show -r -e /var -p -c
qgroupid     rfer     excl  max_rfer   max_excl  parent  child 
--------     ----     ----  --------   --------  ------  ----- 
0/261     1.52GiB  1.01GiB     0.00B  100.00GiB     ---    --- 

This filesystem reports over quota, and removing the quota fixes that:

root@riff# touch x
touch: cannot touch ‘x’: Disk quota exceeded
root@riff# btrfs qg limit -e none 261 /var
root@riff# touch x
root@riff# 

So at the moment quotas are pretty much unusable in kernel 3.18.6/tools
3.18.2, at least for my use case, and that's a bit surprising since
there isn't anything very interesting about it (other than it contains a
bunch of lxc-cloned rootfs).

I've proactively added Yang, who has submitted a few patches on quota
checking recently, to let me know whether he thinks this should be
fixed with a trunk kernel, or whether he'd like to investigate or consider
this further. Thanks!
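(A hedged idea, not in the original mail: qgroup accounting can go stale, and a
rescan forces the kernel to recompute it. Whether that helps this particular
case is an open question.)

# btrfs quota rescan /var        (recompute qgroup usage)
# btrfs quota rescan -s /var     (poll the rescan status)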

On Wed, Dec 24, 2014 at 03:52:41AM +, Duncan wrote:
 Christian Robottom Reis posted on Tue, 23 Dec 2014 18:36:02 -0200 as
 excerpted:
 
  On Tue, Dec 16, 2014 at 11:15:37PM -0200, Christian Robottom Reis wrote:
  # btrfs qgroup limit 2000m 0/261 . && touch x touch: cannot touch
  ‘x’: Disk quota exceeded
  
  The strange thing is that it doesn't seem to be actually out of space:
  
  # btrfs qgroup show -p -r -e /var | grep 261
  0/261810048  391114752   2097152000  0  ---
  
  Replying to myself as I had not yet been subscribed in time to receive a
  reply; I just upgraded to 3.18.1 and am seeing the same issue on the
  same subvolume (and on no others).
 
 Looking at the thread here on gmane.org (list2news and list2web gateway), 
 it appears my reply was the only reply in any case, and it was general as 
 I don't run quotas myself.
 
 Basically I suggested upgrading, as the quota code as some rather huge 
 bugs in it (quotas could go seriously negative!) with the old versions 
 you were running.  But you've upgraded at least the kernel now (userspace 
 you didn't say).
 
 Here's a link to the thread on the gmane web interface for completeness, 
 but the above about covers my reply, as I said the only one until your 
 thread bump and my reply here, so there's not much new there unless 
 someone posts further followups to this thread...
 
 
 http://comments.gmane.org/gmane.comp.file-systems.btrfs/41491
 
 
 -- 
 Duncan - List replies preferred.   No HTML msgs.
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.  Richard Stallman
 
-- 
Christian Robottom Reis   | [+1] 612 888 4935| http://launchpad.net/~kiko
Canonical VP Hyperscale   | [+55 16] 9 9112 6430


Re: [PATCH] btrfs-progs: make btrfs qgroups show human readable sizes

2015-01-09 Thread Christian Robottom Reis
On Fri, Jan 09, 2015 at 02:47:05PM +0800, Fan Chengniang wrote:
 make btrfs qgroups show human readable sizes, using -h option, example:

Oh! This is really nice. I wonder, would there be a sane way to show the
actual path the qgroup is associated with as well?
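(A hedged sketch, not from the mail: until such an option exists, qgroups of
the form 0/<id> can be mapped to paths by hand, since <id> is the subvolume ID;
the mount point is an assumption.)

# btrfs subvolume list /var      (match the printed IDs against 0/<id> qgroups)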


Re: Quota limit question

2014-12-23 Thread Christian Robottom Reis
On Tue, Dec 16, 2014 at 11:15:37PM -0200, Christian Robottom Reis wrote:
 # btrfs qgroup limit 2000m 0/261 . && touch x
 touch: cannot touch ‘x’: Disk quota exceeded
 
 The strange thing is that it doesn't seem to be actually out of space:
 
 # btrfs qgroup show -p -r -e /var | grep 261
 0/261810048  391114752   2097152000  0  ---   

Replying to myself as I had not yet been subscribed in time to receive a
reply; I just upgraded to 3.18.1 and am seeing the same issue on the
same subvolume (and on no others).

root@riff:/etc# uname -a
Linux riff 3.18.1-031801-generic #201412170637 SMP Wed Dec 17 11:38:50
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

It's quite odd that this specific subvolume acts up, given that there
are quite a few others that are closer to the quota:

subvol                  group    total          unshared
---------------------------------------------------------
(unknown)   0/5  1.37G / none  1.16G /   none
lxc-template1/rootfs0/2590.68G / none  0.10G /  2.00G
machine-2/rootfs0/2611.07G / none  0.40G /  2.00G
machine-3/rootfs0/2651.17G / none  0.41G /  2.00G
lxc-template2/rootfs0/2710.77G / none  0.31G /  2.00G
lxc-template3/rootfs0/2740.46G / none  0.02G /  2.00G
machine-4/rootfs0/2837.12G / none  6.21G / 10.00G
machine-5/rootfs0/2881.05G / none  0.34G /  2.00G
machine-6/rootfs0/289   11.33G / none 10.74G / 15.00G
machine-7/rootfs0/2901.30G / none  0.68G /  2.00G
machine-8/rootfs0/2921.00G / none  0.33G /  2.00G
machine-9/rootfs0/2931.17G / none  0.38G /  2.00G
machine-10/rootfs   0/3061.34G / none  0.62G /  2.00G
machine-11/rootfs   0/3189.49G / none  8.75G / 15.00G
lxc-template4/rootfs0/3200.79G / none  0.78G /  2.00G
machine-14/rootfs   0/3231.10G / none  0.45G /  2.00G

The LWN article suggests that btrfs is quite conservative with quotas,
but shouldn't 265, 290, 306, 320 and 323 all be out of quota as well? Or
is there a lot else that goes into the calculation beyond the numbers
reported by btrfs qgroup show?

What could I do to help investigate further?
-- 
Christian Robottom Reis | [+55 16] 3376 0125   | http://async.com.br/~kiko
CEO, Async Open Source  | [+55 16] 9 9112 6430 | http://launchpad.net/~kiko


Quota limit question

2014-12-16 Thread Christian Robottom Reis
Hello there,

I'm trying out btrfs on a machine we use to host a number of
containers. After a misbehaved process filled the partition allocated to
the containers, I decided to experiment with quotas to isolate the
containers from each other. But I've now run into an oddity with one of
the containers, which reports being out of space:

# btrfs qgroup limit 2000m 0/261 . && touch x
touch: cannot touch ‘x’: Disk quota exceeded

The strange thing is that it doesn't seem to be actually out of space:

# btrfs qgroup show -p -r -e /var | grep 261
0/261810048  391114752   2097152000  0  ---   

which pretty-printed is 1.04G rfer and 0.36G excl (perhaps the qgroup
show command could take an option to display in other units?)

I can only get it to allow me to start using it again if I go over 5808M:

# btrfs qgroup limit 5807m 0/261 . && rm -f x && touch x
rm: cannot remove ‘x’: Disk quota exceeded
# btrfs qgroup limit 5808m 0/261 . && rm -f x && touch x
#

Why specifically 5808 I'm not sure, but I binary searched until I got to
that number. Does anyone have a clue as to why that might be happening,
and perhaps what I'm missing?

For completeness, some details on the filesystem and system:

# btrfs fi show /var
Label: var  uuid: 815b3280-e90f-483a-b244-1d2dfe9b6e67
Total devices 2 FS bytes used 31.48GiB
devid1 size 80.00GiB used 55.91GiB path /dev/sda3
devid2 size 80.00GiB used 55.91GiB path /dev/sdb3

root@riff:/var/lib/lxc/async-local-machine-2/rootfs# btrfs fi df /var
Data, RAID1: total=53.88GiB, used=30.45GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=1.03GiB

# btrfs qgroup show -p -r -e /var
qgroupid  rfer        excl        max_rfer     max_excl    parent  
--------  ----------  ----------  -----------  ----------  ------  
0/5  1486852096  1252569088  0   0  --- 
0/259727175168   104947712   0   5368709120 --- 
0/261810048  391114752   2097152000  0  ---   
0/2651255923712  442871808   0   5368709120 --- 
0/271831856640   333189120   0   5368709120 --- 
0/274498761728   228270080   5368709120 --- 
0/2837666098176  6691426304  10737418240 0  --- 
0/2881118441472  348901376   0   5368709120 --- 
0/28911134029824 10498187264 16106127360 0  --- 
0/2901412505600  694210560   10737418240 0  --- 
0/2921131053056  73440   0   5368709120 --- 
0/2931258176512  401141760   0   5368709120 --- 
0/3061430532096  656773120   0   5368709120 --- 
0/3189309212672  8509857792  10737418240 0  --- 
0/320860209152   837406720   0   5368709120 --- 
0/3231167962112  469741568   0   5368709120 --- 

# btrfs --version
Btrfs v3.12

# uname -a
Linux riff 3.13.0-43-generic #72-Ubuntu SMP Mon Dec 8 19:35:06 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux

Thanks,
-- 
Christian Robottom Reis | [+55 16] 3376 0125   | http://async.com.br/~kiko
Async Open Source   | [+55 16] 9 9112 6430 | http://launchpad.net/~kiko


Re: suspicious number of devices: 72057594037927936

2014-10-27 Thread Christian Kujau
On Mon, 27 Oct 2014 at 16:35, David Sterba wrote:
 Yeah sorry, I sent the v2 too late, here's an incremental that applies
 on top of current 3.18-rc
 
 https://patchwork.kernel.org/patch/5160651/

Yup, that fixes it. Thank you! If it's needed:

  Tested-by: Christian Kujau li...@nerdbynature.de

@Filipe: and thanks for warning me about 3.17 - I used 3.17.0 since it 
came out and compiled kernels on the btrfs partition and haven't had any 
issues. But it wasn't used very often, so whatever the serious issues 
were, I haven't experienced any.

Christian.
-- 
BOFH excuse #98:

The vendor put the bug there.


Does btrfs-restore report missing/corrupt files?

2014-10-26 Thread Christian Tschabuschnig

Hello,

currently I am trying to recover a btrfs filesystem which had a few subvolumes. 
When running
# btrfs restore -sx /dev/xxx .
one subvolume gets restored.

Would the restore utility report any corruption within this subvolume? May I 
assume that all data was recovered if there are no messages on STDERR?

In this particular case there were a few messages on STDERR but I believe they 
refer to the other subvolumes:
Check tree block failed, want=43702427648, have=4902726477564852953
Check tree block failed, want=43702427648, have=4902726477564852953
Check tree block failed, want=43702427648, have=9670034583150859267
Check tree block failed, want=43702427648, have=4902726477564852953
Check tree block failed, want=43702427648, have=4902726477564852953
read block failed check_tree_block
Error reading subvolume ./cur-root: 18446744073709551611
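(A hedged note, not from the mail: as far as I know, restore copies extents
without verifying data checksums, so an empty STDERR is not a strong guarantee
by itself. Listing the tree roots and restoring them one at a time may at least
show which subvolumes are reachable; <root-objectid> is a placeholder.)

# btrfs restore -l /dev/xxx                        (list the tree roots restore can see)
# btrfs restore -r <root-objectid> -sx /dev/xxx .  (restore a single subvolume)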


[PATCH 1/1] btrfs-progs: fix compiler warning

2014-06-05 Thread Christian Hesse
gcc 4.9.0 gives warnings about possibly uninitialized values when
compiling with function inlining and optimization level two enabled
(CFLAGS="-finline-functions -O2").

Initializing the values fixes the warning. Hope this is correct.

Signed-off-by: Christian Hesse m...@eworm.de
---
 cmds-send.c   | 2 +-
 send-stream.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cmds-send.c b/cmds-send.c
index 1cd457d..0c3e96b 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -416,7 +416,7 @@ int cmd_send(int argc, char **argv)
u32 i;
char *mount_root = NULL;
char *snapshot_parent = NULL;
-   u64 root_id;
+   u64 root_id = 0;
u64 parent_root_id = 0;
int full_send = 1;
int new_end_cmd_semantic = 0;
diff --git a/send-stream.c b/send-stream.c
index 88e18e2..54edafe 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -216,7 +216,7 @@ static int tlv_get_string(struct btrfs_send_stream *s, int attr, char **str)
 {
int ret;
void *data;
-   int len;
+   int len = 0;
 
TLV_GET(s, attr, data, len);
 
-- 
2.0.0



Re: [PATCH 1/1] btrfs-progs: fix compiler warning

2014-06-05 Thread Christian Hesse
David Sterba dste...@suse.cz on Wed, 2014/06/04 18:44:
 On Wed, Jun 04, 2014 at 09:19:26AM +0200, Christian Hesse wrote:
   It seems to be related to default gcc flags from distribution?
  
  Probably. I did compile with optimization, so adding -O2 may do the trick:
  
  make CFLAGS="${CFLAGS} -O2" all
 
 The warning appears with -O2, so the question is if gcc is not able to
 reason about the values (ie. a false positive) or if there's a bug that
 I don't see.

I do not see a bug either. So probably this is a false positive...

Looks like the warning is triggered as soon as -ftree-vrp is added to CFLAGS.
From gcc man page:

-ftree-vrp
   Perform Value Range Propagation on trees.  This is similar to the
   constant propagation pass, but instead of values, ranges of values are
   propagated.  This allows the optimizers to remove unnecessary range
   checks like array bound checks and null pointer checks.  This is
   enabled by default at -O2 and higher.  Null pointer check elimination
   is only done if -fdelete-null-pointer-checks is enabled.

Is it possible that gcc optimized away some checks?
-- 
Schoene Gruesse
Chris
 O ascii ribbon campaign
   stop html mail - www.asciiribbon.org


signature.asc
Description: PGP signature


Re: [PATCH 1/1] btrfs-progs: fix compiler warning

2014-06-04 Thread Christian Hesse
Qu Wenruo quwen...@cn.fujitsu.com on Wed, 2014/06/04 14:48:
 
  Original Message 
 Subject: [PATCH 1/1] btrfs-progs: fix compiler warning
 From: Christian Hesse m...@eworm.de
 To: linux-btrfs@vger.kernel.org
 Date: 2014年06月03日 19:29
  gcc 4.9.0 gives a warning: array subscript is above array bounds
 
  Checking for greater or equal instead of just equal fixes this.
 
  Signed-off-by: Christian Hesse m...@eworm.de
  ---
cmds-restore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/cmds-restore.c b/cmds-restore.c
  index 96b97e1..534a49e 100644
  --- a/cmds-restore.c
  +++ b/cmds-restore.c
  @@ -169,7 +169,7 @@ again:
  break;
  }

  -   if (level == BTRFS_MAX_LEVEL)
  +   if (level >= BTRFS_MAX_LEVEL)
  return 1;

  slot = path->slots[level] + 1;

 Also I failed to reproduce the bug.
 Using gcc-4.9.0-3 from Archlinux core repo.

Exactly the same here. ;)

 It seems to be related to default gcc flags from distribution?

Probably. I did compile with optimization, so adding -O2 may do the trick:

make CFLAGS="${CFLAGS} -O2" all
-- 
Schoene Gruesse
Chris
 O ascii ribbon campaign
   stop html mail - www.asciiribbon.org


signature.asc
Description: PGP signature


[PATCH 1/1] btrfs-progs: fix build, manpage compression command

2014-06-03 Thread Christian Hesse
man pages for btrfs-progs are compressed by gzip by default. In Makefile
the variable GZIP is used; this evaluates to 'gzip gzip' on my system.
From man gzip:

 The environment variable GZIP can hold a set of default options for
 gzip. These options are interpreted first and can be overwritten by
 explicit command line parameters.

So using any other variable name fixes this.

Signed-off-by: Christian Hesse m...@eworm.de
---
 Documentation/Makefile | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 45299bb..e79dd8f 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -45,7 +45,7 @@ MANPAGE_XSL = manpage-normal.xsl
 XMLTO = xmlto
 XMLTO_EXTRA =
 XMLTO_EXTRA = -m manpage-bold-literal.xsl
-GZIP = gzip
+GZIPCMD = gzip
 INSTALL ?= install
 RM ?= rm -f
 LNS ?= ln -sf
@@ -56,7 +56,7 @@ ifneq ($(findstring $(MAKEFLAGS),s),s)
 ifndef V
QUIET_ASCIIDOC  = @echo '   ' ASCIIDOC $@;
QUIET_XMLTO = @echo '   ' XMLTO $@;
-   QUIET_GZIP  = @echo '   ' GZIP $@;
+   QUIET_GZIP  = @echo '   ' GZIPCMD $@;
QUIET_STDERR= 2> /dev/null
QUIET_SUBDIR0   = +@subdir=
QUIET_SUBDIR1   = ;$(NO_SUBDIR) echo '   ' SUBDIR $$subdir; \
@@ -80,9 +80,9 @@ clean:
$(RM) *.xml *.xml+ *.8 *.8.gz
 
 %.8.gz : %.8
-   $(QUIET_GZIP)$(GZIP) -n -c $< > $@
+   $(QUIET_GZIP)$(GZIPCMD) -n -c $< > $@
 
-%.8 : %.xml 
+%.8 : %.xml
    $(QUIET_XMLTO)$(RM) $@ && \
    $(XMLTO) -m $(MANPAGE_XSL) $(XMLTO_EXTRA) man $<
 %.xml : %.txt asciidoc.conf
-- 
2.0.0



[PATCH 1/1] btrfs-progs: fix compiler warning

2014-06-03 Thread Christian Hesse
gcc 4.9.0 gives a warning: format ‘%d’ expects argument of type ‘int’,
but argument 2 has type ‘u64’

Using %llu and casting to unsigned long long (same as bytenr) fixes this.

Signed-off-by: Christian Hesse m...@eworm.de
---
 btrfs-select-super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/btrfs-select-super.c b/btrfs-select-super.c
index 15e6921..d7cd187 100644
--- a/btrfs-select-super.c
+++ b/btrfs-select-super.c
@@ -100,8 +100,8 @@ int main(int ac, char **av)
/* we don't close the ctree or anything, because we don't want a real
 * transaction commit.  We just want the super copy we pulled off the
 * disk to overwrite all the other copies
-*/ 
-   printf("using SB copy %d, bytenr %llu\n", num,
+*/
+   printf("using SB copy %llu, bytenr %llu\n", (unsigned long long)num,
   (unsigned long long)bytenr);
return ret;
 }
-- 
2.0.0



Re: [PATCH 1/1] btrfs-progs: fix build, manpage compression command

2014-06-03 Thread Christian Hesse
David Sterba dste...@suse.cz on Tue, 2014/06/03 11:14:
 On Tue, Jun 03, 2014 at 08:09:25AM +0200, Christian Hesse wrote:
  man pages for btrfs-progs are compressed by gzip by default. In Makefile
  the variable GZIP is used; this evaluates to 'gzip gzip' on my system.
  From man gzip:
  
   The environment variable GZIP can hold a set of default options for
   gzip. These options are interpreted first and can be overwritten by
   explicit command line parameters.
  
  So using any other variable name fixes this.
 
 Thanks, I can see that you've fixed this bug for the second time.  The
 GZIP variable name got reverted during the asciidoc update and slipped
 through, sorry for that.

No problem, it is not a big deal.

  --- a/Documentation/Makefile
  +++ b/Documentation/Makefile
  @@ -56,7 +56,7 @@ ifneq ($(findstring $(MAKEFLAGS),s),s)
   ifndef V
  QUIET_ASCIIDOC  = @echo '   ' ASCIIDOC $@;
  QUIET_XMLTO = @echo '   ' XMLTO $@;
  -   QUIET_GZIP  = @echo '   ' GZIP $@;
  +   QUIET_GZIP  = @echo '   ' GZIPCMD $@;
 
 JFYI, I've removed this change so the output stays the same.
 
 I've assembled a branch containing doc-only fixes, including this one,
 and asked Chris to do a 3.14.3 release.

Thanks.
Though I would be fine if this just goes into the next regular release.
-- 
Schoene Gruesse
Chris
 O ascii ribbon campaign
   stop html mail - www.asciiribbon.org


signature.asc
Description: PGP signature


Re: [PATCH 1/1] btrfs-progs: fix compiler warning

2014-06-03 Thread Christian Hesse
David Sterba dste...@suse.cz on Tue, 2014/06/03 18:52:
 On Tue, Jun 03, 2014 at 01:29:19PM +0200, Christian Hesse wrote:
  gcc 4.9.0 gives a warning: array subscript is above array bounds
  
  Checking for greater or equal instead of just equal fixes this.
 
 That fixes the warning, but I don't see the code path that leads to
  level >= BTRFS_MAX_LEVEL
 
 On the first pass, when level = 1 and the first while() reaches at most
 BTRFS_MAX_LEVEL, the equality test is enough. So it has to go through
 the second while() where level is decremented and in the range
 [1..BTRFS_MAX_LEVEL-1] before 'goto again' jumps to the first while
 again.

I suppose gcc does not know how much level can be increased within
function next_leaf(). level > BTRFS_MAX_LEVEL is never met at runtime, but
this way gcc knows that level can never be bigger than BTRFS_MAX_LEVEL.

Any better way to fix this?
-- 
Schoene Gruesse
Chris
 O ascii ribbon campaign
   stop html mail - www.asciiribbon.org


signature.asc
Description: PGP signature


btrfsck is using far too much memory

2014-04-25 Thread Christian Robert

btrfsck is using far too much memory!

I tried a btrfsck on my /dev/md127 (13T) and had to kill it
because btrfsck used 7.8 GiB of RAM (the machine has 8 GiB of RAM, plus it was
four GiB into swap when I killed it).

btrfsck should not try to bring the whole metadata into memory, etc.

my 2 cents (will reformat /dev/md127 tomorrow; currently saving what can be
saved from the old read-only fs)


Xtian.


Re: Some impossible benchmark results with LUKS - what am I missing?

2014-03-26 Thread Christian Robert

The only thing I can see is that LUKS is not passing the barrier (do these
writes in sequence, then ack) through to the devices,
so it acts as a caching device (dangerous from a btrfs point of view).

I'm *no* expert in that matter at all.

I may be wrong.

Xtian.

On 2014-03-26 21:56, Evan Powell wrote:

Order did not matter - I have run a few tests showing this same disparity in 
both orders.

I reran this evening using the drop_caches command you provided; no change in
throughput results. The improvement I've seen has been present through multiple
reboots so far, and after clearing RAM using dd on different test files. It
doesn't look like caching is causing this, so I'm still stumped.
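(A hedged sketch, not from the thread: direct I/O takes the page cache out of
the measurement entirely, which should rule caching in or out for good. File
names and sizes are assumptions; the flags are GNU dd.)

# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.out of=/dev/null bs=1M iflag=direct                           (O_DIRECT read)
# dd if=/dev/zero of=direct.out bs=1M count=20480 oflag=direct conv=fsync  (O_DIRECT write + flush)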

Output of cryptsetup luksDump is similar on all 4 devices:

Version:1
Cipher name:aes
Cipher mode:xts-plain64
Hash spec:  sha1
Payload offset: 4102
MK bits:256
...
(followed by different iterations and salts)

(Offtopic: migrating between encrypted and unencrypted drives is really 
painless on btrfs: rebalance to raid1, delete two devices, remove their 
encrypted volumes, readd the devices, remove and readd the 2 remaining 
encrypted disks without encryption, rebalance to raid10. All live, without 
losing data! This is awesome!)

Evan Powell | Technical Lead
epow...@zenoss.com

- Original Message -

From: Matt jackdac...@gmail.com
To: Evan Powell epow...@zenoss.com
Cc: linux-btrfs linux-btrfs@vger.kernel.org
Sent: Wednesday, March 26, 2014 4:47:19 PM
Subject: Some impossible benchmark results with LUKS - what am I missing?


Hey folks,



I have been experimenting with btrfs on a home NAS box, and have some benchmark
results that I don't believe. I'm hoping someone here has some insight on what
I've missed that would cause such a result.



The short version: I'm seeing roughly a 36% (on write) to 55% (on read)
performance *improvement* when using btrfs on LUKS containers over btrfs on raw
disks. This should not be the case!



The test setup:
My test file is a large file that I previously generated from /dev/urandom. I
saw similar results using /dev/zero as input, as well as a repeated copy of the
whole Ubuntu 14.04 ISO (i.e., real-ish data).



My calculated MB/s numbers are based on the 'real' time in the output.



Live-booted Ubuntu 14.04 (nightly from 3/11/14, kernel 3.13.5)
4x 4TB WD Red drives in standard (no RAID) configuration
i3-4130 CPU (has AES-NI for accelerated encryption)
Default BTRFS options from the disk gui, always raid10.



Tested configurations:
Raw: btrfs raid10 on 4x raw drives
Encrypted: btrfs raid10 on 4 separate LUKS containers on 4x raw drives (default
LUKS options)




Read command: $ time sh -c "dd if=test.out of=/dev/null bs=4k"
Raw:
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 149.841 s, 140 MB/s



real 2m29.849s
user 0m2.764s
sys 0m7.064s



= 133.467690809 MB/s



Encrypted:
$ time sh -c "dd if=test2.out of=/dev/null bs=4k"
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 96.6127 s, 217 MB/s



real 1m36.627s
user 0m3.331s
sys 0m9.518s



= 206.981485506 MB/s




Read+Write: $ time sh -c "dd if=test2.out of=test20grand.out bs=4k && sync"
Raw:
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 227.069 s, 92.4 MB/s



real 3m49.701s
user 0m2.854s
sys 0m15.936s



= 87.069712365 MB/s



Encrypted:
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 167.823 s, 125 MB/s



real 2m48.784s
user 0m2.955s
sys 0m17.956s



= 118.494644042 MB/s




Any ideas what could explain this result?



One coworker suggested that perhaps the LUKS container was returning from
'sync' early, before actually finishing the write to disk. This would seem to
violate the assumptions of the 'sync' primitive, so I have my doubts.



I'm also interested in learning how I can reliably benchmark the real cost of
running full-disk encryption under btrfs on my system.



Thanks!




Evan Powell | Technical Lead
epow...@zenoss.com



Hi Evan,

just to be sure:

did you do a

echo 3 > /proc/sys/vm/drop_caches

before *each* test ?

also try reversing the order of tests like so:

Encrypted
RAW

whether that makes a difference


It would also be interesting to see the output of

cryptsetup luksDump

and

Version: *
Cipher name: *
Cipher mode: *
Hash spec: *




Interesting find indeed ! Thanks for sharing the finding

I'm currently using Btrfs on an encrypted system partition (without
AES-NI supported hardware) and things already feel and are faster than
with ext4

we need to find out what this magic is =)


Kind Regards


Thanks

Matt



Re: Why cannot I move a read-only snapshot around?

2013-10-25 Thread Christian Robert

you can change a ro snapshot into a rw snapshot

you just snapshot it without the -r option

ex:

# btrfs subv snap -r linux-3.12-rc5 snap_ro
Create a readonly snapshot of 'linux-3.12-rc5' in './snap_ro'

# touch ./snap_ro/helo
touch: cannot touch ‘./snap_ro/helo’: Read-only file system

# btrfs subv snap snap_ro snap_rw
Create a snapshot of 'snap_ro' in './snap_rw'

# touch ./snap_rw/helo

# btrfs subv delete ./snap_ro
Delete subvolume '/data/snap_ro'

# mv snap_rw snap_any

# ls -ld snap*
drwxrwxr-x 1 root root 944 Oct 26 01:41 snap_any/


On 2013-10-24 18:09, Karl Kiniger wrote:

Hi

(pls see also my other reply in this thread)

On Thu 131024, Duncan wrote:

Karl Kiniger posted on Thu, 24 Oct 2013 17:29:56 +0200 as excerpted:


Dear list, (newbie alert)

After successfully sending and receiving a dozen related snapshots I
want to move them all to the readonly folder but I cannot:


I see you mention fedora 19 in a followup, but for those not on fedora,
that's not much help figuring out which kernel you're running.  It's
likely that the following is your problem, tho there's not enough
information in your post to be sure.


I promise to include more info in the future but just received
snapshots should be read-only if I read the docs correctly.



There was a recent regression with nested subvolumes that may be what
you're running into.  Kernel 3.11 was affected as well as early 3.12-rcs
and I believe 3.10 also but I'm not sure how far back, except that
someone mentioned trying an old kernel (3.8 or 3.6-ish) and moving
subvolumes into subvolumes worked there (tho doing anything involving
writing into read-only snapshots shouldn't work, by design, but that
doesn't appear to be what you're doing, you're just trying to move read-
only snapshots to a different location on a read/write base or parent
subvolume, this post assuming it's a parent subvolume, thus triggering
the nested subvolumes bug).


No nested subvolumes involved. (Is this true? It is all inside the top-level
volume, or whatever it is called in btrfs.)


A fix is available but I'm not sure whether it got into 3.12 (which is
just about to be released) or will now have to wait for 3.13.  So either
try latest 3.12 git and see if its there, or find and cherry-pick the
patch, applying it against 3.11 or 3.12.  (Given that btrfs is still an
experimental filesystem with fixes applied every kernel, while reverting
to an old enough kernel should unregress this particular problem, I can't
recommend it except possibly for testing against data you don't care
about, since by doing so you're exposing yourself to other known and now
fixed bugs.)

Agreed, I don't want to go back to older kernels - too risky. The data are
backed up anyway (on ZFS, if you are curious), but the time invested into
my current btrfs setup would be gone.

I can live with the current situation; it's just not nice to have the
snapshots lying around in a place where they do not belong.

If it were possible to temporarily make the r/o snapshots r/w just for
the purpose of moving (being aware that caution is needed) I would
not hesitate to try that.

Karl




--
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman






broken superblocks

2013-10-19 Thread Christian Switaiski
During a system deadlock I was forced to hard-shutdown my system while a
`btrfs balance` was ongoing; both ctrl+c and `btrfs balance cancel` failed.
Since the restart the partition is broken: the kernel fails to mount it
and btrfs' repair tools exit because of errors without doing anything. The
partition is 20GB, and so has 2 superblocks: both are broken :/




kernel is 3.11.5, btrfs-progs are from git, and the distro is Gentoo
 btrfs output start 

# mount /dev/sda2 /mnt/repair
mount: wrong fs type, bad option, bad superblock on /dev/sda2,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail or so

# dmesg | tail
[279861.340796] device label root-x64 devid 1 transid 535462 /dev/sda2
[279861.341416] btrfs: disk space caching is enabled
[279861.342223] attempt to access beyond end of device <--
[279861.342226] sda2: rw=32, want=44040448, limit=41943040 <--
[279861.342236] attempt to access beyond end of device
[279861.342237] sda2: rw=32, want=44302592, limit=41943040
[279861.347811] btrfs: open_ctree failed

# btrfs-show-super /dev/sda2
superblock: bytenr=65536, device=/dev/sda2
-
csum0xbd53d63e [match]
bytenr  65536
...
total_bytes 21474836480
bytes_used  18661335040
...
dev_item.total_bytes21474836480
dev_item.bytes_used 22032678912 <--- beyond part size!
...

# btrfs fi show
Label: 'root-x64'  uuid: f87210e5-47a9-44af-9797-6afea2cdaae8
Total devices 1 FS bytes used 17.38GB
devid 1 size 20.00GB used 20.52GB path /dev/sda2   <--- again beyond partsize

Btrfs v0.20-rc1-358-g194aa4a

# btrfsck /dev/sda2
Check tree block failed, want=39317532672, have=0
read block failed check_tree_block
Couldn't setup extent tree
Checking filesystem on /dev/sda2
UUID: f87210e5-47a9-44af-9797-6afea2cdaae8
Critical roots corrupted, unable to fsck the FS

# btrfs restore /dev/sda2 /tmp/fix
Check tree block failed, want=39317532672, have=0
read block failed check_tree_block
Couldn't setup extent tree
Check tree block failed, want=39317532672, have=0
read block failed check_tree_block
Couldn't read fs root: -5

# btrfs-find-root /dev/sda2
Super think's the tree root is at 29556736, chunk root 40659582976
Found tree root at 29556736 gen 535462 level 1

btrfs-debug-tree and `btrfs chunk-recover -v` output a lot of stuff but at the
end both fail:

# btrfs-debug-tree /dev/sda2
...
Check tree block failed, want=39317532672, have=0
read block failed check_tree_block
btrfs-debug-tree: btrfs-debug-tree.c:237: main: Assertion `!(ret < 0)'
failed.

Aborted

# btrfs chunk-recover -v /dev/sda2
...
Total Chunks:   31
  Heathy:   29
  Bad:  2
Orphan Block Groups:
  Block Group: start = 37710987264, len = 268435456, flag = 1
  Block Group: start = 37979422720, len = 230686720, flag = 1
  Block Group: start = 38210109440, len = 33554432, flag = 1
Orphan Device Extents:
  Device extent: devid = 1, start = 3527409664, len = 268435456, chunk  
offset = 37710987264

Check tree block failed, want=39317532672, have=0
read block failed check_tree_block
Couldn't setup extent tree
open with broken chunk error
Fail to recover the chunk tree.

 btrfs output end 



Because btrfs-debug-tree and `btrfs chunk-recover` say there are still chunks,
I have the hope that most data is still intact and only the btrfs tools stumble
on some invalid data. Is there any hope to e.g. force btrfs to salvage all the
data it can?
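(A hedged sketch, not from the mail: the kernel log shows accesses wanted past
the device end, so comparing the real partition size with what the superblock
believes may confirm whether the device shrank underneath the filesystem.)

# blockdev --getsize64 /dev/sda2                                  (actual size in bytes)
# btrfs-show-super /dev/sda2 | grep -E 'total_bytes|bytes_used'   (what the super thinks)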


Thanks,
Christian Switaiski


PS: I already tried to patch btrfs-progs to not exit after the errors
occurred (that helped with a problem I had in the past) but it just causes
segfaults :/
Also I am a bit confused that none of the btrfs progs show a 'beyond
anything' msg like the kernel does.







[BUG] Reflinking fails for files 2GB on 32-bit platform [SOLVED - NO BUG]

2013-10-16 Thread Christian Weinberger
Please ignore the previous post.
I have to correct myself: this is not a btrfs issue. In fact, the wrong
(non-64-bit) calls were used in the test program.




Fwd: btrfsck and ctree version

2013-01-30 Thread polack christian
i know that the proposed ctree.c file is from a kernel source, but
btrfsck is user-space only; since btrfs-next is newer than
btrfs-progs i was hoping for a commit of this change to the user-space
version.

since this file-system was created prior to kernel 3.2 there is no
tree root backup

 i was hoping to use btrfsck to regenerate the csums which are failing
at mount time (Input/output error)

/var/log/messages: btrfs csum failed ino 1048522 off 5124096 csum
1219517398 private 836806197

 i didn't find any way to deactivate the csum check with a mount option

or, as chris says, is there a way to regenerate the cache on the block device?

is there a solution ?
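(A hedged pointer, not from the thread: newer btrfsck has an option that drops
and recreates the checksum tree, so existing data checksums are discarded
rather than verified. It may stop the csum failures, but only run it with a
backup at hand; the device name is a placeholder.)

# btrfsck --init-csum-tree /dev/sdX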

thanks for your responses

olivier


2013/1/29 Chris Mason chris.ma...@fusionio.com

 On Mon, Jan 28, 2013 at 03:03:08PM -0700, David Sterba wrote:
  On Mon, Jan 28, 2013 at 03:07:13PM +0100, polack christian wrote:
   i did use btrfsck to recover it
   i got the tool from
  
   git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
  
   and i got this error message:
   ...
   Check tree block failed, want=294555648, have=0
   Check tree block failed, want=294559744, have=0
   Check tree block failed, want=294559744, have=0
   btrfsck: ctree.c:1690: leaf_space_used: Assertion `!(data_len < 0)' 
   failed.
   Aborted (core dumped)
  
   looking at
  
   git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
 
  but this is a kernel source repository, not progs, I wonder
 
   this error in ctree.c have been corrected by this commit
  
   http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=41be1f3b40b87de33cd2e7463dce88596dbdccc4
 
  how this could happen. I have looked at the whether it does not silently
  fix a bug, nothing wrong I can see now.  How did you verify that the
  patch fixes the fsck problem?

 It sounds much more like the reboot or remount cleared the cache on the
 block device.

 -chris





fix btrfs-progs build

2012-10-08 Thread Christian Hesse
Hello everybody,

man pages for btrfs-progs are compressed by gzip by default. In Makefile the
variable GZIP is used; this evaluates to 'gzip gzip' on my system. From man
gzip:

 The environment variable GZIP can hold a set of default options for gzip.
 These options are interpreted first and can be overwritten by explicit
 command line parameters.

So using any other variable name fixes this. Patch is attached.
-- 
main(a){char*c=/*Schoene Gruesse */B?IJj;MEH
CX:;,b;for(a/*Chris   get my mail address:*/=0;b=c[a++];)
putchar(b-1/(/*   gcc -o sig sig.c  ./sig*/b/42*2-3)*42);}
diff --git a/man/Makefile b/man/Makefile
index 4a90b75..f7b57f7 100644
--- a/man/Makefile
+++ b/man/Makefile
@@ -1,4 +1,4 @@
-GZIP=gzip
+GZIPCMD=gzip
 INSTALL= install
 
 prefix ?= /usr/local
@@ -12,22 +12,22 @@ MANPAGES = mkfs.btrfs.8.gz btrfsctl.8.gz btrfsck.8.gz btrfs-image.8.gz \
 all: $(MANPAGES)
 
 mkfs.btrfs.8.gz: mkfs.btrfs.8.in
-	$(GZIP) -n -c mkfs.btrfs.8.in > mkfs.btrfs.8.gz
+	$(GZIPCMD) -n -c mkfs.btrfs.8.in > mkfs.btrfs.8.gz
 
 btrfs.8.gz: btrfs.8.in
-	$(GZIP) -n -c btrfs.8.in > btrfs.8.gz
+	$(GZIPCMD) -n -c btrfs.8.in > btrfs.8.gz
 
 btrfsctl.8.gz: btrfsctl.8.in
-	$(GZIP) -n -c btrfsctl.8.in > btrfsctl.8.gz
+	$(GZIPCMD) -n -c btrfsctl.8.in > btrfsctl.8.gz
 
 btrfsck.8.gz: btrfsck.8.in
-	$(GZIP) -n -c btrfsck.8.in > btrfsck.8.gz
+	$(GZIPCMD) -n -c btrfsck.8.in > btrfsck.8.gz
 
 btrfs-image.8.gz: btrfs-image.8.in
-	$(GZIP) -n -c btrfs-image.8.in > btrfs-image.8.gz
+	$(GZIPCMD) -n -c btrfs-image.8.in > btrfs-image.8.gz
 
 btrfs-show.8.gz: btrfs-show.8.in
-	$(GZIP) -n -c btrfs-show.8.in > btrfs-show.8.gz
+	$(GZIPCMD) -n -c btrfs-show.8.in > btrfs-show.8.gz
 
 clean :
 	rm -f $(MANPAGES)


Re: fix btrfs-progs build

2012-10-08 Thread Christian Hesse
Chris Mason chris.ma...@fusionio.com on Mon, 2012/10/08 10:29:
 On Mon, Oct 08, 2012 at 08:17:13AM -0600, Christian Hesse wrote:
  Hello everybody,
  
  man pages for btrfs-progs are compressed by gzip by default. In Makefile
  the variable GZIP is used; this evaluates to 'gzip gzip' on my system.
  From man gzip:
  
   The environment variable GZIP can hold a set of default options for
   gzip. These options are interpreted first and can be overwritten by
   explicit command line parameters.
  
  So using any other variable name fixes this. Patch is attached.
 
 Ok, which system is this?  Just curious, I'll pull in the patch.

This is Arch Linux with gzip 1.5-1.
-- 
main(a){char*c=/*Schoene Gruesse */B?IJj;MEH
CX:;,b;for(a/*Chris   get my mail address:*/=0;b=c[a++];)
putchar(b-1/(/*   gcc -o sig sig.c  ./sig*/b/42*2-3)*42);}


Re: fix btrfs-progs build

2012-10-08 Thread Christian Hesse
Chris Mason chris.ma...@fusionio.com on Mon, 2012/10/08 10:33:
 On Mon, Oct 08, 2012 at 08:30:31AM -0600, Christian Hesse wrote:
  Chris Mason chris.ma...@fusionio.com on Mon, 2012/10/08 10:29:
   On Mon, Oct 08, 2012 at 08:17:13AM -0600, Christian Hesse wrote:
Hello everybody,

man pages for btrfs-progs are compressed by gzip by default. In
    Makefile the variable GZIP is used; this evaluates to 'gzip gzip' on
my system. From man gzip:

 The environment variable GZIP can hold a set of default options for
 gzip. These options are interpreted first and can be overwritten by
 explicit command line parameters.

So using any other variable name fixes this. Patch is attached.
   
   Ok, which system is this?  Just curious, I'll pull in the patch.
  
  This is Arch Linux with gzip 1.5-1.
 
 Strange, I'm running arch linux with gzip 1.5-1 and it builds.
 I wonder if something else is expanding it.  I'll take the patch
 regardless, there's no reason to add build problems when we don't need
 to.

This happens if you have exported GZIP to your environment. So probably most
people are not affected.
-- 
main(a){char*c=/*Schoene Gruesse */B?IJj;MEH
CX:;,b;for(a/*Chris   get my mail address:*/=0;b=c[a++];)
putchar(b-1/(/*   gcc -o sig sig.c  ./sig*/b/42*2-3)*42);}


Re: btrfsck crashes

2012-07-12 Thread Christian Volkmann

Anand Jain wrote:



  If this is a deliberate corruption can you pls share the test-case ?
  if not have you tried mount with recovery and the scrub. ? scrub
  would be preferred choice over btrfsck.





Scrub does not fix the problem. I replaced the real host name with myhost.
What seems strange to me: the mentioned paths for the errors point to the same
file names; just the myhost part is different.

The btrfsck fails with the same crash after the scrub.


speedy:/home/cv # btrfs scrub status /backup.old
scrub status for fa7034c8-86d4-4aa3-9fde-ecd7051ff43c
scrub started at Thu Jul 12 20:21:08 2012 and finished after 1495 
seconds
total bytes scrubbed: 115.49GiB with 9 errors
error details: verify=3 csum=6
corrected errors: 3, uncorrectable errors: 6, unverified errors: 0

Should I continue with any analysis for bug hunting or just reformat and forget?
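(A sketch, not from the mail: for bug hunting, a metadata-only image lets
developers inspect the broken trees without the data. The output path is an
assumption; the filesystem should be unmounted first.)

# btrfs-image -c 9 /dev/md3 /tmp/md3-metadata.img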

Best regards,
Christian

[ 5059.168649] btrfs: checksum/header error at logical 1532956672 on dev 
/dev/md3, sector 7204744: metadata leaf (level 0) in tree 2
[ 5059.168656] btrfs: checksum/header error at logical 1532956672 on dev 
/dev/md3, sector 7204744: metadata leaf (level 0) in tree 2
[ 5065.581348] btrfs: fixed up error at logical 1532956672 on dev /dev/md3
[ 5065.587844] btrfs: checksum/header error at logical 1532960768 on dev 
/dev/md3, sector 7204752: metadata leaf (level 0) in tree 2
[ 5065.587851] btrfs: checksum/header error at logical 1532960768 on dev 
/dev/md3, sector 7204752: metadata leaf (level 0) in tree 2
[ 5065.599317] btrfs: fixed up error at logical 1532960768 on dev /dev/md3
[ 5065.599500] btrfs: checksum/header error at logical 1532964864 on dev 
/dev/md3, sector 7204760: metadata leaf (level 0) in tree 2
[ 5065.599506] btrfs: checksum/header error at logical 1532964864 on dev 
/dev/md3, sector 7204760: metadata leaf (level 0) in tree 2
[ 5065.607379] btrfs: fixed up error at logical 1532964864 on dev /dev/md3
[ 5074.964900] btrfs: checksum error at logical 2327654400 on dev /dev/md3, 
sector 8756888: metadata leaf (level 0) in tree 5
[ 5074.964907] btrfs: checksum error at logical 2327654400 on dev /dev/md3, 
sector 8756888: metadata leaf (level 0) in tree 5
[ 5075.977763] btrfs: unable to fixup (regular) error at logical 2327654400 on 
dev /dev/md3
[ 5085.133646] btrfs: checksum error at logical 2327654400 on dev /dev/md3, 
sector 10854040: metadata leaf (level 0) in tree 5
[ 5085.133653] btrfs: checksum error at logical 2327654400 on dev /dev/md3, 
sector 10854040: metadata leaf (level 0) in tree 5
[ 5086.148842] btrfs: unable to fixup (regular) error at logical 2327654400 on 
dev /dev/md3
[ 6436.036292] btrfs: checksum error at logical 139801403392 on dev /dev/md3, sector 
331786256, root 5, inode 2960268, offset 1345069056, length 4096, links 1 (path: 
int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/myhost.de/statistics/logs/access_ssl_log.processed)
[ 6436.036300] btrfs: unable to fixup (regular) error at logical 139801403392 
on dev /dev/md3
[ 6454.615722] btrfs: checksum error at logical 141661282304 on dev /dev/md3, sector 
335418832, root 5, inode 2968078, offset 104292352, length 4096, links 1 (path: 
int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/myhost.no/statistics/logs/error_log)
[ 6454.615736] btrfs: unable to fixup (regular) error at logical 141661282304 
on dev /dev/md3
[ 6455.523759] btrfs: checksum error at logical 140794101760 on dev /dev/md3, sector 
333725120, root 5, inode 2964438, offset 87449600, length 4096, links 1 (path: 
int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/myhost.fr/statistics/logs/access_log.processed)
[ 6455.523775] btrfs: unable to fixup (regular) error at logical 140794101760 
on dev /dev/md3
[ 6475.865387] btrfs: checksum error at logical 143052115968 on dev /dev/md3, sector 
338135304, root 5, inode 3000621, offset 1078595584, length 4096, links 1 (path: 
int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/otherhost.com/statistics/logs/access_log.processed)
[ 6475.865403] btrfs: unable to fixup (regular) error at logical 143052115968 
on dev /dev/md3

speedy:/tmp/btrfs/btrfs-progs # ./btrfsck /dev/md3
checking extents
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
Csum didn't match
owner ref check failed [2327654400 4096]
ref mismatch on [101138354176 98304] extent item 1, found 0
Incorrect local backref count on 101138354176 root 5 owner 1867898 offset 0 
found 0 wanted 1 back 0x787c260
backpointer mismatch on [101138354176 98304]
owner ref check failed [101138354176 98304]
ref mismatch on [101138452480 106496] extent item 1, found 0
Incorrect local backref count on 101138452480 root 5 owner 1867899 offset 0 
found 0 wanted 1 back 0x787c2a0
backpointer mismatch on [101138452480 106496]
owner ref check failed [101138452480

Re: btrfsck crashes

2012-07-09 Thread Christian Volkmann

Anand Jain wrote:

 What I have seen: buf is 0, after read_tree_block.

   Yes since we not checking extent_buffer_uptodate for the csum_root_tree,
   that will pass the null buf, The following patch will avoid sending null
   buffer
 https://patchwork.kernel.org/patch/1148831/

   However whether --init-csum-tree will build the good csum I think that
   will still depends on the corruption IMO.

 -Anand


.)
The patch does not help.
This is false: !extent_buffer_uptodate(info->csum_root->node)

.)
Output btrfsck of 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git ,
patched at line 3552.

speedy:/tmp/btrfs/btrfs-progs # gdb ./btrfsck
GNU gdb (GDB) SUSE (7.3-41.1.2)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-suse-linux.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /tmp/btrfs/btrfs-progs/btrfsck...done.
(gdb) r /dev/md3
Starting program: /tmp/btrfs/btrfs-progs/btrfsck /dev/md3
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: zypper install -C 
debuginfo(build-id)=f20c99249f5a5776e1377d3bd728502e3f455a3f
Missing separate debuginfo for /lib64/libuuid.so.1
Try: zypper install -C 
debuginfo(build-id)=24ae727f9cd5fb29f81b0f965859d3cf4668bf17
Missing separate debuginfo for /lib64/libc.so.6
Try: zypper install -C 
debuginfo(build-id)=7b169b1db50384b70e3e4b4884cd56432d5de796
checking extents
checksum verify failed on 2327654400 wanted 89AAEA38 found 72
checksum verify failed on 2327654400 wanted 89AAEA38 found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 89AAEA38 found 72
Csum didn't match
owner ref check failed [2327654400 4096]
ref mismatch on [101138354176 98304] extent item 1, found 0
Incorrect local backref count on 101138354176 root 5 owner 1867898 offset 0 
found 0 wanted 1 back 0x182ebd20
backpointer mismatch on [101138354176 98304]
owner ref check failed [101138354176 98304]
ref mismatch on [101138452480 106496] extent item 1, found 0
Incorrect local backref count on 101138452480 root 5 owner 1867899 offset 0 
found 0 wanted 1 back 0xefb8d0
backpointer mismatch on [101138452480 106496]
owner ref check failed [101138452480 106496]
ref mismatch on [101138558976 8192] extent item 1, found 0
Incorrect local backref count on 101138558976 root 5 owner 1867901 offset 0 
found 0 wanted 1 back 0x5a22350
backpointer mismatch on [101138558976 8192]
owner ref check failed [101138558976 8192]
ref mismatch on [101138567168 16384] extent item 1, found 0
Incorrect local backref count on 101138567168 root 5 owner 1867902 offset 0 
found 0 wanted 1 back 0x5a22390
backpointer mismatch on [101138567168 16384]
owner ref check failed [101138567168 16384]
ref mismatch on [101138583552 16384] extent item 1, found 0
Incorrect local backref count on 101138583552 root 5 owner 1867903 offset 0 
found 0 wanted 1 back 0x19dfaae0
backpointer mismatch on [101138583552 16384]
owner ref check failed [101138583552 16384]
Errors found in extent allocation tree
checking fs roots
checksum verify failed on 2327654400 wanted 89AAEA38 found 72
checksum verify failed on 2327654400 wanted 89AAEA38 found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 89AAEA38 found 72
Csum didn't match

Program received signal SIGSEGV, Segmentation fault.
0x00402264 in btrfs_header_level (eb=0x0) at ctree.h:1540
1540BTRFS_SETGET_HEADER_FUNCS(header_level, struct btrfs_header, level, 8);
(gdb)


.)
Against which git should I create a regular patch?
This git from the wiki seems not to be up to date:
 http://git.darksatanic.net/repo/btrfs-progs-unstable.git

The line numbers do not match this repository:
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

.)
What seems strange to me: why does the same block 2327654400 seem to want
two different checksums?

checksum verify failed on 2327654400 wanted 89AAEA38 found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72


Thanks  regards,
Christian




On 09/07/12 00:08, Christian Volkmann wrote:

Hi there,

I have a corrupted filesystem. This filesystem crashes btrfsck.

A gdb analysis showed me:
(gdb) bt
#0 0x00402379 in btrfs_header_nritems (eb=0x0) at ctree.h:1426
#1 0x00408c14 in run_next_block (root=0x73fb40, bits=0x740d50, 
bits_nr=1024, last=0x7fffd948, pending=0x7fffda40,
seen=0x7fffda50, reada=0x7fffda30, nodes=0x7fffda20, 
extent_cache=0x7fffda60) at btrfsck.c:2512
#2 0x004099e2 in check_extents (root=0x73fb40) at btrfsck.c:2792
#3 0x00409bec in main (ac=1, av=0x7fffdbe8) at btrfsck.c:2853

What I have seen: buf is 0

btrfsck crashes

2012-07-08 Thread Christian Volkmann

Hi there,

I have a corrupted filesystem. This filesystem crashes btrfsck.

A gdb analysis showed me:
(gdb) bt
#0  0x00402379 in btrfs_header_nritems (eb=0x0) at ctree.h:1426
#1  0x00408c14 in run_next_block (root=0x73fb40, bits=0x740d50, 
bits_nr=1024, last=0x7fffd948, pending=0x7fffda40,
seen=0x7fffda50, reada=0x7fffda30, nodes=0x7fffda20, 
extent_cache=0x7fffda60) at btrfsck.c:2512
#2  0x004099e2 in check_extents (root=0x73fb40) at btrfsck.c:2792
#3  0x00409bec in main (ac=1, av=0x7fffdbe8) at btrfsck.c:2853

What I have seen: buf is 0, after read_tree_block.

btrfsck.c:2511 buf = read_tree_block(root, bytenr, size, 0);
  2512 nritems = btrfs_header_nritems(buf);

So ctree.h crashes here with btrfs_header_nritems(buf)
...
static inline u##bits btrfs_##name(struct extent_buffer *eb)\
{   \
struct btrfs_header *h = (struct btrfs_header *)eb->data;   \
return le##bits##_to_cpu(h->member);\
}   \
...

I expect the error case eb == 0 is not covered by ctree.h.
Maybe another fix is required, e.g. hardening btrfsck against a NULL buffer.
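
Something like this untested sketch (against the btrfsck.c lines quoted
above; the exact error handling is only illustrative) is what I have in mind:

	buf = read_tree_block(root, bytenr, size, 0);
	if (!buf) {
		/* bail out instead of letting btrfs_header_nritems()
		 * dereference a NULL extent buffer */
		fprintf(stderr, "failed to read tree block %llu\n",
			(unsigned long long)bytenr);
		return -EIO;
	}
	nritems = btrfs_header_nritems(buf);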

The file system crashes the kernel on some accesses. I did not follow this up,
because the file system is corrupt. (Using openSUSE Tumbleweed 3.4.4-31-desktop.)
Maybe the kernel code requires such checks as well?

Please contact me if I should do some further tests with this file system
or try some tools for a fix. (Developer knowledge is available.)

Another minor issue: btrfsck uses a lot of memory, but this might be normal
(> 800 MB).

Best regards,
Christian



PS: Just if anyone is interested:
- History and what I tried: openSUSE btrfsck showed the messages below in the first step.
- /sbin/btrfsck /dev/md3 --repair removed some of the messages, except the checksum ones.
- File system is mounted with:
  /backup  btrfs  defaults,compress=zlib,noatime 1 2
- The filesystem is used to back up some unix systems with heavy usage of:
  rsync -aH  --link-dest=...
  So each file should typically have multiple hard links.

===
Is there anybody interested in fixing this file system with me,
to check btrfsck?

speedy:/home/cv # /sbin/btrfsck /dev/md3
checking extents
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
Csum didn't match
owner ref check failed [2327654400 4096]
ref mismatch on [101138354176 98304] extent item 1, found 0
Incorrect local backref count on 101138354176 root 5 owner 1867898 offset 0 
found 0 wanted 1 back 0x1f076d0
backpointer mismatch on [101138354176 98304]
owner ref check failed [101138354176 98304]
ref mismatch on [101138452480 106496] extent item 1, found 0
Incorrect local backref count on 101138452480 root 5 owner 1867899 offset 0 
found 0 wanted 1 back 0x6aa85d0
backpointer mismatch on [101138452480 106496]
owner ref check failed [101138452480 106496]
ref mismatch on [101138558976 8192] extent item 1, found 0
Incorrect local backref count on 101138558976 root 5 owner 1867901 offset 0 
found 0 wanted 1 back 0x6aa8610
backpointer mismatch on [101138558976 8192]
owner ref check failed [101138558976 8192]
ref mismatch on [101138567168 16384] extent item 1, found 0
Incorrect local backref count on 101138567168 root 5 owner 1867902 offset 0 
found 0 wanted 1 back 0x1f8fa80
backpointer mismatch on [101138567168 16384]
owner ref check failed [101138567168 16384]
ref mismatch on [101138583552 16384] extent item 1, found 0
Incorrect local backref count on 101138583552 root 5 owner 1867903 offset 0 
found 0 wanted 1 back 0x1f8fac0
backpointer mismatch on [101138583552 16384]
owner ref check failed [101138583552 16384]
Errors found in extent allocation tree
checking fs roots
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
Csum didn't match
Speicherzugriffsfehler (segmentation fault)


no dev_stats entry found / OK on first mount after mkfs

2012-07-02 Thread Christian Kujau
Hi,

after upgrading from 3.4.0 to 3.5.0-rc5 on this powerpc machine, the 
following is printed during bootup:

  [   18.630750] device fsid ce8c9df5-0a93-47c6-adf6-25084f352a4f devid 1 
transid 11061 /dev/hda7
  [   18.637193] btrfs: disk space caching is enabled
  [   18.706423] btrfs: no dev_stats entry found for device /dev/hda7 (devid 1) 
(OK on first mount after mkfs)

The btrfs on hda7 was created many months ago and has been mounted 
several times since then. Assuming "first mount after mkfs" does not apply 
here, is it then NOT OK that no dev_stats entry has been found?

IOW: should I worry about this message?

Thanks,
Christian.
-- 
BOFH excuse #313:

your process is not ISO 9000 compliant


Re: no dev_stats entry found / OK on first mount after mkfs

2012-07-02 Thread Christian Kujau
On Tue, 3 Jul 2012 at 01:41, Ilya Dryomov wrote:
  after upgrading from 3.4.0 to 3.5.0-rc5 on this powerpc machine, the 
  following is printed during bootup:
  
[   18.630750] device fsid ce8c9df5-0a93-47c6-adf6-25084f352a4f devid 1 
  transid 11061 /dev/hda7
[   18.637193] btrfs: disk space caching is enabled
[   18.706423] btrfs: no dev_stats entry found for device /dev/hda7 
  (devid 1) (OK on first mount after mkfs)
  
  The btrfs on hda7 has been created many months ago and has been mounted 
  several times since then. Assuming first mount after mkfs does not apply 
  here, is it then NOT OK that no dev_stats entry has been found?
  
  IOW: should I worry about this message?
 
 No, you should not.  If this is your first mount after upgrading your
 kernel to the one with the btrfs dev stats feature, "first mount after mkfs"
 sort of applies.

OK, this is what I'd hoped for :-) Maybe the message should read "OK on 
first mount after mkfs or kernel upgrade" - but then again nobody else got 
worried, so the current message is OK, I guess.

Thanks for replying,
Christian.
-- 
BOFH excuse #332:

suboptimal routing experience


Re: Ceph on btrfs 3.4rc

2012-05-24 Thread Christian Brunner
Same thing here.

I've tried really hard, but even after 12 hours I wasn't able to get a
single warning from btrfs.

I think you cracked it!

Thanks,
Christian

2012/5/24 Martin Mailand mar...@tuxadero.com:
 Hi,
 the ceph cluster is running under heavy load for the last 13 hours without a
 problem, dmesg is empty and the performance is good.

 -martin

 Am 23.05.2012 21:12, schrieb Martin Mailand:

 this patch is running for 3 hours without a Bug and without the Warning.
 I will let it run overnight and report tomorrow.
 It looks very good ;-)


Re: Ceph on btrfs 3.4rc

2012-05-23 Thread Christian Brunner
2012/5/22 Josef Bacik jo...@redhat.com:


 Yeah you would also need to change orphan_meta_reserved.  I fixed this by just
 taking the BTRFS_I(inode)->lock when messing with these since we don't want to
 take up all that space in the inode just for a marker.  I ran this patch for 3
 hours with no issues, let me know if it works for you.  Thanks,
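
(The pattern Josef describes is, as I understand it, roughly the following.
This is only my sketch of the idea, reusing the orphan_meta_reserved name
from his mail; it is not the actual patch:

	spin_lock(&BTRFS_I(inode)->lock);
	if (!BTRFS_I(inode)->orphan_meta_reserved) {
		BTRFS_I(inode)->orphan_meta_reserved = 1;
		reserve = 1;
	}
	spin_unlock(&BTRFS_I(inode)->lock);

The existing per-inode spinlock guards the marker, so no extra lock or
dedicated synchronization object has to be added to the inode.)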

Compared to the last runs, I had to run it much longer, but somehow I
managed to hit a BUG_ON again:

[448281.002087] couldn't find orphan item for 2027, nlink 1, root 308,
root being deleted no
[448281.011339] [ cut here ]
[448281.016590] kernel BUG at fs/btrfs/inode.c:2230!
[448281.021837] invalid opcode:  [#1] SMP
[448281.026525] CPU 4
[448281.028670] Modules linked in: btrfs zlib_deflate libcrc32c xfs
exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt
iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core
iomemory_vsl(PO) hpsa squashfs [last unloaded: btrfs]
[448281.052215]
[448281.053977] Pid: 16018, comm: ceph-osd Tainted: PW  O
3.3.5-1.fits.1.el6.x86_64 #1 HP ProLiant DL180 G6
[448281.06] RIP: 0010:[a04a17ab]  [a04a17ab]
btrfs_orphan_del+0x19b/0x1b0 [btrfs]
[448281.075965] RSP: 0018:880458257d18  EFLAGS: 00010292
[448281.081987] RAX: 0063 RBX: 8803a28ebc48 RCX:
2fdb
[448281.090042] RDX:  RSI: 0046 RDI:
0246
[448281.098093] RBP: 880458257d58 R08: 81af6100 R09:

[448281.106146] R10: 0004 R11:  R12:
0001
[448281.114202] R13: 88052e130400 R14: 0001 R15:
8805beae9e10
[448281.122262] FS:  7fa2e772f700() GS:88062728()
knlGS:
[448281.131386] CS:  0010 DS:  ES:  CR0: 80050033
[448281.137879] CR2: ff600400 CR3: 0005015a5000 CR4:
06e0
[448281.145929] DR0:  DR1:  DR2:

[448281.153974] DR3:  DR6: 0ff0 DR7:
0400
[448281.162043] Process ceph-osd (pid: 16018, threadinfo
880458256000, task 88055b711940)
[448281.171646] Stack:
[448281.173987]  880458257dff 8803a28eba98 880458257d58
8805beae9e10
[448281.182377]   88052e130400 88029ff33380
8803a28ebc48
[448281.190766]  880458257e08 a04ab4e6 
8803a28ebc48
[448281.199155] Call Trace:
[448281.202005]  [a04ab4e6] btrfs_truncate+0x5f6/0x660 [btrfs]
[448281.209203]  [a04ab646] btrfs_setattr+0xf6/0x1a0 [btrfs]
[448281.216202]  [811816fb] notify_change+0x18b/0x2b0
[448281.222517]  [81276541] ? selinux_inode_permission+0xd1/0x130
[448281.229990]  [81165f44] do_truncate+0x64/0xa0
[448281.235919]  [81172669] ? inode_permission+0x49/0x100
[448281.242617]  [81166197] sys_truncate+0x137/0x150
[448281.248838]  [8158b1e9] system_call_fastpath+0x16/0x1b
[448281.255631] Code: a0 49 8b 8d f0 02 00 00 8b 53 48 4c 0f 45 c0 48
85 f6 74 1b 80 bb 60 fe ff ff 84 74 12 48 c7 c7 e8 1d 50 a0 31 c0 e8
9d ea 0d e1 0f 0b eb fe 48 8b 73 40 eb e8 66 66 2e 0f 1f 84 00 00 00
00 00
[448281.277435] RIP  [a04a17ab] btrfs_orphan_del+0x19b/0x1b0 [btrfs]
[448281.285229]  RSP 880458257d18
[448281.289667] ---[ end trace 9adc7b36a3e66872 ]---

Sorry,
Christian


Re: Ceph on btrfs 3.4rc

2012-05-22 Thread Christian Brunner
 ]---


Regards,
Christian


btrfs: Probably the larger filesystem I will see for a long time

2012-05-18 Thread Christian Robert

Probably the largest filesystem I will ever see. Tried 8 exabytes, but it failed.

[root@CentOS6-A:/root] # df
Filesystem              1K-blocks      Used         Available  Use%  Mounted
/dev/mapper/vg01-root    17915884  11533392           5513572   68%  /
/dev/sda1                  508745    140314            342831   30%  /boot
/dev/mapper/data_0       66993872   1644372          61994060    3%  /mnt/data_0
/dev/mapper/data_1  7881299347898368    508360  7881248224091896    1%  /mnt/data_1

[root@CentOS6-A:/root] # df -h
Filesystem Size  Used  Avail  Use%  Mounted
/dev/mapper/vg01-root   18G   11G   5.3G   68%  /
/dev/sda1  497M  138M   335M   30%  /boot
/dev/mapper/data_0  64G  1.6G60G3%  /mnt/data_0
/dev/mapper/data_1 7.0E  497M   7.0E1%  /mnt/data_1

[root@CentOS6-A:/root] # df -Th
Filesystem  Type  Size  Used  Avail  Use%
/dev/mapper/vg01-root   ext4   18G   11G   5.3G  68%
/dev/sda1   ext4  497M  138M   335M  30%
/dev/mapper/data_0  ext4   64G  1.6G60G  3%
/dev/mapper/data_1 btrfs  7.0E  499M   7.0E  1%
[root@CentOS6-A:/root] #


[root@CentOS6-A:/root] # uname -rv
3.4.0-rc7+ #23 SMP Wed May 16 20:20:47 EDT 2012


Made with a dm-thin device sitting on a device pair (metadata: 256 MB, data:
23 GB), roughly as sketched below,

running on my laptop at home.
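
(For the curious, the dm-thin setup was along these lines; this is a sketch
from memory, and the device names, sector counts and block size are
illustrative, not the exact commands:

# dmsetup create pool --table \
    "0 48234496 thin-pool /dev/vg01/meta /dev/vg01/data 128 0"
# dmsetup message /dev/mapper/pool 0 "create_thin 0"
# dmsetup create data_1 --table \
    "0 15762598695796736 thin /dev/mapper/pool 0"
# mkfs.btrfs /dev/mapper/data_1

The thin target lets the virtual size, the 7 EiB worth of 512-byte sectors
in the last table line, vastly exceed the real data device.)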

Yes, this is 7 exabytes, or 7,168 petabytes, or 7,340,032 terabytes, or
7,516,192,768 gigabytes.


Please do not answer; it is just a statement of fact at 3.4-rc7 (it was not
working at 3.4-rc3, if I remember correctly).


Xtian.


Re: Ceph on btrfs 3.4rc

2012-05-17 Thread Christian Brunner
2012/5/17 Josef Bacik jo...@redhat.com:
 On Thu, May 17, 2012 at 05:12:55PM +0200, Martin Mailand wrote:
 Hi Josef,
 no there was nothing above. Here the is another dmesg output.


 Hrm ok give this a try and hopefully this is it, still couldn't reproduce.
 Thanks,

 Josef

Well, I hate to say it, but the new patch doesn't seem to change much...

Regards,
Christian

[  123.507444] Btrfs loaded
[  202.683630] device fsid 2aa7531c-0e3c-4955-8542-6aed7ab8c1a2 devid
1 transid 4 /dev/sda
[  202.693704] btrfs: use lzo compression
[  202.697999] btrfs: enabling inode map caching
[  202.702989] btrfs: enabling auto defrag
[  202.707190] btrfs: disk space caching is enabled
[  202.712721] btrfs flagging fs with big metadata feature
[  207.839761] device fsid f81ff6a1-c333-4daf-989f-a28139f15f08 devid
1 transid 4 /dev/sdb
[  207.849681] btrfs: use lzo compression
[  207.853987] btrfs: enabling inode map caching
[  207.858970] btrfs: enabling auto defrag
[  207.863173] btrfs: disk space caching is enabled
[  207.868635] btrfs flagging fs with big metadata feature
[  210.857328] device fsid 9b905faa-f4fa-4626-9cae-2cd0287b30f7 devid
1 transid 4 /dev/sdc
[  210.867265] btrfs: use lzo compression
[  210.871560] btrfs: enabling inode map caching
[  210.876550] btrfs: enabling auto defrag
[  210.880757] btrfs: disk space caching is enabled
[  210.886228] btrfs flagging fs with big metadata feature
[  214.296287] device fsid f7990e4c-90b0-4691-9502-92b60538574a devid
1 transid 4 /dev/sdd
[  214.306510] btrfs: use lzo compression
[  214.310855] btrfs: enabling inode map caching
[  214.315905] btrfs: enabling auto defrag
[  214.320174] btrfs: disk space caching is enabled
[  214.325706] btrfs flagging fs with big metadata feature
[ 1337.937379] [ cut here ]
[ 1337.942526] kernel BUG at fs/btrfs/inode.c:2224!
[ 1337.947671] invalid opcode:  [#1] SMP
[ 1337.952255] CPU 5
[ 1337.954300] Modules linked in: btrfs zlib_deflate libcrc32c xfs
exportfs sunrpc bonding ipv6 sg pcspkr serio_raw iTCO_wdt
iTCO_vendor_support iomemory_vsl(PO) ixgbe dca mdio i7core_edac
edac_core hpsa squashfs [last unloaded: scsi_wait_scan]
[ 1337.978570]
[ 1337.980230] Pid: 6812, comm: ceph-osd Tainted: P   O
3.3.5-1.fits.1.el6.x86_64 #1 HP ProLiant DL180 G6
[ 1337.991592] RIP: 0010:[a035675c]  [a035675c]
btrfs_orphan_del+0x14c/0x150 [btrfs]
[ 1338.001897] RSP: 0018:8805e1171d38  EFLAGS: 00010282
[ 1338.007815] RAX: fffe RBX: 88061c3c8400 RCX: 00b37f48
[ 1338.015768] RDX: 00b37f47 RSI: 8805ec2a1cf0 RDI: ea0017b0a840
[ 1338.023724] RBP: 8805e1171d68 R08: 60f9d88028a0 R09: a033016a
[ 1338.031675] R10:  R11: 0004 R12: 8805de7f57a0
[ 1338.039629] R13: 0001 R14: 0001 R15: 8805ec2a5280
[ 1338.047584] FS:  7f4bffc6e700() GS:8806272a()
knlGS:
[ 1338.056600] CS:  0010 DS:  ES:  CR0: 80050033
[ 1338.063003] CR2: ff600400 CR3: 0005e34c3000 CR4: 06e0
[ 1338.070954] DR0:  DR1:  DR2: 
[ 1338.078909] DR3:  DR6: 0ff0 DR7: 0400
[ 1338.086865] Process ceph-osd (pid: 6812, threadinfo
8805e117, task 88060fa81940)
[ 1338.096268] Stack:
[ 1338.098509]  8805e1171d68 8805ec2a5280 88051235b920

[ 1338.106795]  88051235b920 0008 8805e1171e08
a036043c
[ 1338.115082]    
00011000
[ 1338.123367] Call Trace:
[ 1338.126111]  [a036043c] btrfs_truncate+0x5bc/0x640 [btrfs]
[ 1338.133213]  [a03605b6] btrfs_setattr+0xf6/0x1a0 [btrfs]
[ 1338.140105]  [811816fb] notify_change+0x18b/0x2b0
[ 1338.146320]  [81276541] ? selinux_inode_permission+0xd1/0x130
[ 1338.153699]  [81165f44] do_truncate+0x64/0xa0
[ 1338.159527]  [81172669] ? inode_permission+0x49/0x100
[ 1338.166128]  [81166197] sys_truncate+0x137/0x150
[ 1338.172244]  [8158b1e9] system_call_fastpath+0x16/0x1b
[ 1338.178936] Code: 89 e7 e8 88 7d fe ff eb 89 66 0f 1f 44 00 00 be
a4 08 00 00 48 c7 c7 59 49 3b a0 45 31 ed e8 5c 78 cf e0 45 31 f6 e9
30 ff ff ff 0f 0b eb fe 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c 89 65
e0 4c
[ 1338.200623] RIP  [a035675c] btrfs_orphan_del+0x14c/0x150 [btrfs]
[ 1338.208317]  RSP 8805e1171d38
[ 1338.212681] ---[ end trace 86be14f0f863ea79 ]---


Re: Ceph on btrfs 3.4rc

2012-05-11 Thread Christian Brunner
2012/5/10 Josef Bacik jo...@redhat.com:
 On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
 On 24 April 2012 at 18:26, Sage Weil s...@newdream.net wrote:
  On Tue, 24 Apr 2012, Josef Bacik wrote:
  On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
   After running ceph on XFS for some time, I decided to try btrfs again.
   Performance with the current for-linux-min branch and big metadata
   is much better. The only problem (?) I'm still seeing is a warning
   that seems to occur from time to time:
 
  Actually, before you do that... we have a new tool,
  test_filestore_workloadgen, that generates a ceph-osd-like workload on the
  local file system.  It's a subset of what a full OSD might do, but if
  we're lucky it will be sufficient to reproduce this issue.  Something like
 
   test_filestore_workloadgen --osd-data /foo --osd-journal /bar
 
  will hopefully do the trick.
 
  Christian, maybe you can see if that is able to trigger this warning?
  You'll need to pull it from the current master branch; it wasn't in the
  last release.

 Trying to reproduce with test_filestore_workloadgen didn't work for
 me. So here are some instructions on how to reproduce with a minimal
 ceph setup.
 [...]

 Well I feel like an idiot, I finally get it to reproduce, go look at where I
 want to put my printks and there's the problem staring me right in the face.
 I've looked seriously at this problem 2 or 3 times and have missed this every
 single freaking time.  Here is the patch I'm trying, please try it on yours to
 make sure it fixes the problem.  It takes like 2 hours for it to reproduce for
 me so I won't be able to fully test it until tomorrow, but so far it hasn't
 broken anything so it should be good.  Thanks,

Great! I've put your patch on my testbox and will run a test over the
weekend. I'll report back on monday.

Thanks,
Christian


Re: Ceph on btrfs 3.4rc

2012-05-04 Thread Christian Brunner
2012/5/3 Josef Bacik jo...@redhat.com:
 On Thu, May 03, 2012 at 09:38:27AM -0700, Josh Durgin wrote:
 On Thu, 3 May 2012 11:20:53 -0400, Josef Bacik jo...@redhat.com
 wrote:
  On Thu, May 03, 2012 at 08:17:43AM -0700, Josh Durgin wrote:
 
  Yeah all that was in the right place, I rebooted and I magically
  stopped getting
  that error, but now I'm getting this
 
  http://fpaste.org/OE92/
 
  with that ping thing repeating over and over.  Thanks,

 That just looks like the osd isn't running. If you restart the
 osd with 'debug osd = 20' the osd log should tell us what's going on.

 Ok that part was my fault, Duh I need to redo the tmpfs and mkcephfs stuff 
 after
 reboot.  But now I'm back to my original problem

 http://fpaste.org/PfwO/

 I have the osd class dir = /usr/lib64/rados-classes thing set and libcls_rbd 
 is
 in there, so I'm not sure what is wrong.  Thanks,

That's really strange. Do you have the osd logs in /var/log/ceph? If
so, can you check whether you find anything about rbd or class loading
in there?

Another thing you should try is whether you can access ceph with rados:

# rados -p rbd ls
# rados -p rbd -i /proc/cpuinfo put testobj
# rados -p rbd -o - get testobj
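
If you prefer, the same smoke test can be done in a few lines of librados C
(an untested sketch in the spirit of my rbdtest.c, error checking omitted;
compile with something like "gcc -o radostest radostest.c -lrados", where the
file name is of course arbitrary):

#include <stdio.h>
#include <rados/librados.h>

int main(void) {
    rados_t cluster;
    rados_ioctx_t io;
    char buf[64];

    rados_create(&cluster, NULL);            /* connect as the default client */
    rados_conf_read_file(cluster, NULL);     /* default config search path */
    rados_connect(cluster);
    rados_ioctx_create(cluster, "rbd", &io); /* same pool as in the commands above */

    /* write a small test object and read it back */
    rados_write(io, "testobj", "hello", 5, 0);
    int r = rados_read(io, "testobj", buf, sizeof(buf), 0);
    printf("read %d bytes: %.*s\n", r, r > 0 ? r : 0, buf);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}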

Regards,
Christian


Re: Ceph on btrfs 3.4rc

2012-04-30 Thread Christian Brunner
2012/4/29 tsuna tsuna...@gmail.com:
 On Fri, Apr 20, 2012 at 8:09 AM, Christian Brunner
 christ...@brunner-muc.de wrote:
 After running ceph on XFS for some time, I decided to try btrfs again.
 Performance with the current for-linux-min branch and big metadata
 is much better.

 I've heard that although performance from btrfs is better at first, it
 degrades over time due to metadata fragmentation, whereas XFS'
 performance starts off a little worse, but remains stable even after
 weeks of heavy utilization.  Would be curious to hear your (or
 others') feedback on that topic.

Metadata fragmentation was a big problem (for us) in the past. With
the "big metadata" feature (mkfs.btrfs -l 64k -n 64k) these problems
seem to be solved. We do not use it in production yet, but my stress
test didn't show any degradation. The only remaining issues I've seen
are these warnings.

Regards,
Christian


Re: Ceph on btrfs 3.4rc

2012-04-27 Thread Christian Brunner
On 24 April 2012 at 18:26, Sage Weil s...@newdream.net wrote:
 On Tue, 24 Apr 2012, Josef Bacik wrote:
 On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
  After running ceph on XFS for some time, I decided to try btrfs again.
  Performance with the current for-linux-min branch and big metadata
  is much better. The only problem (?) I'm still seeing is a warning
  that seems to occur from time to time:

 Actually, before you do that... we have a new tool,
 test_filestore_workloadgen, that generates a ceph-osd-like workload on the
 local file system.  It's a subset of what a full OSD might do, but if
 we're lucky it will be sufficient to reproduce this issue.  Something like

  test_filestore_workloadgen --osd-data /foo --osd-journal /bar

 will hopefully do the trick.

 Christian, maybe you can see if that is able to trigger this warning?
 You'll need to pull it from the current master branch; it wasn't in the
 last release.

Trying to reproduce with test_filestore_workloadgen didn't work for
me. So here are some instructions on how to reproduce with a minimal
ceph setup.

You will need a single system with two disks and a bit of memory.

- Compile and install ceph (detailed instructions:
http://ceph.newdream.net/docs/master/ops/install/mkcephfs/)

- For the test setup I've used two tmpfs files as journal devices. To
create these, do the following:

# mkdir -p /ceph/temp
# mount -t tmpfs tmpfs /ceph/temp
# dd if=/dev/zero of=/ceph/temp/journal0 count=500 bs=1024k
# dd if=/dev/zero of=/ceph/temp/journal1 count=500 bs=1024k

- Now you should create and mount btrfs. Here is what I did:

# mkfs.btrfs -l 64k -n 64k /dev/sda
# mkfs.btrfs -l 64k -n 64k /dev/sdb
# mkdir /ceph/osd.000
# mkdir /ceph/osd.001
# mount -o noatime,space_cache,inode_cache,autodefrag /dev/sda /ceph/osd.000
# mount -o noatime,space_cache,inode_cache,autodefrag /dev/sdb /ceph/osd.001

- Create /etc/ceph/ceph.conf similar to the attached ceph.conf. You
will probably have to change the btrfs devices and the hostname
(os39).

- Create the ceph filesystems:

# mkdir /ceph/mon
# mkcephfs -a -c /etc/ceph/ceph.conf

- Start ceph (e.g. service ceph start)

- Now you should be able to use ceph - "ceph -s" will tell you about
the state of the ceph cluster.

- "rbd create --size 100 testimg" will create an rbd image on the ceph cluster.

- Compile my test with "gcc -o rbdtest rbdtest.c -lrbd" and run it
with "./rbdtest testimg".

I can see the first btrfs_orphan_commit_root warning after an hour or
so... I hope that I've described all necessary steps. If there is a
problem just send me a note.

Thanks,
Christian


ceph.conf
Description: Binary data


Re: Ceph on btrfs 3.4rc

2012-04-23 Thread Christian Brunner
I decided to run the test over the weekend. The good news is that the
system is still running without performance degradation. But in the
meantime I've got over 5000 WARNINGs of this kind:

[330700.043557] btrfs: block rsv returned -28
[330700.043559] [ cut here ]
[330700.048898] WARNING: at fs/btrfs/extent-tree.c:6220
btrfs_alloc_free_block+0x357/0x370 [btrfs]()
[330700.058880] Hardware name: ProLiant DL180 G6
[330700.064044] Modules linked in: btrfs zlib_deflate libcrc32c xfs
exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt
iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio
iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[330700.090361] Pid: 7954, comm: btrfs-endio-wri Tainted: PW
O 3.3.2-1.fits.1.el6.x86_64 #1
[330700.100393] Call Trace:
[330700.103263]  [8104df6f] warn_slowpath_common+0x7f/0xc0
[330700.110201]  [8104dfca] warn_slowpath_null+0x1a/0x20
[330700.116905]  [a03436f7] btrfs_alloc_free_block+0x357/0x370 [btrfs]
[330700.124988]  [a0330eb0] ? __btrfs_cow_block+0x330/0x530 [btrfs]
[330700.132787]  [a0398174] ?
btrfs_add_delayed_data_ref+0x64/0x1c0 [btrfs]
[330700.141369]  [a0372d8b] ? read_extent_buffer+0xbb/0x120 [btrfs]
[330700.149194]  [a0365d6d] ?
btrfs_token_item_offset+0x5d/0xe0 [btrfs]
[330700.157373]  [a0330cb3] __btrfs_cow_block+0x133/0x530 [btrfs]
[330700.165023]  [a032f2ed] ?
read_block_for_search+0x14d/0x3d0 [btrfs]
[330700.173183]  [a0331684] btrfs_cow_block+0xf4/0x1f0 [btrfs]
[330700.180552]  [a03344b8] btrfs_search_slot+0x3e8/0x8e0 [btrfs]
[330700.188128]  [a03469f4] btrfs_lookup_csum+0x74/0x170 [btrfs]
[330700.195634]  [811589e5] ? kmem_cache_alloc+0x105/0x130
[330700.202551]  [a03477e0] btrfs_csum_file_blocks+0xd0/0x6d0 [btrfs]
[330700.210542]  [a03768b1] ? clear_extent_bit+0x161/0x420 [btrfs]
[330700.218237]  [a0354109] add_pending_csums+0x49/0x70 [btrfs]
[330700.225706]  [a0357de6]
btrfs_finish_ordered_io+0x276/0x3d0 [btrfs]
[330700.233940]  [a0357f8c]
btrfs_writepage_end_io_hook+0x4c/0xa0 [btrfs]
[330700.242345]  [a0376cb9] end_extent_writepage+0x69/0x100 [btrfs]
[330700.250192]  [a0376db6] end_bio_extent_writepage+0x66/0xa0 [btrfs]
[330700.258327]  [8119959d] bio_endio+0x1d/0x40
[330700.264214]  [a034b135] end_workqueue_fn+0x45/0x50 [btrfs]
[330700.271612]  [a03831df] worker_loop+0x14f/0x5a0 [btrfs]
[330700.278672]  [a0383090] ? btrfs_queue_worker+0x300/0x300 [btrfs]
[330700.286582]  [a0383090] ? btrfs_queue_worker+0x300/0x300 [btrfs]
[330700.294535]  [810703fe] kthread+0x9e/0xb0
[330700.300244]  [8158c224] kernel_thread_helper+0x4/0x10
[330700.307031]  [81070360] ? kthread_freezable_should_stop+0x70/0x70
[330700.315061]  [8158c220] ? gs_change+0x13/0x13
[330700.321167] ---[ end trace b8c31966cca74ca0 ]---

The filesystems have plenty of free space:

/dev/sda  1.9T   16G  1.8T   1% /ceph/osd.000
/dev/sdb  1.9T   15G  1.8T   1% /ceph/osd.001
/dev/sdc  1.9T   13G  1.8T   1% /ceph/osd.002
/dev/sdd  1.9T   14G  1.8T   1% /ceph/osd.003

# btrfs fi df /ceph/osd.000
Data: total=38.01GB, used=15.53GB
System, DUP: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=37.50GB, used=82.19MB
Metadata: total=8.00MB, used=0.00

A few more btrfs_orphan_commit_root WARNINGS are present too. If
needed I could upload the messages file.

Regards,
Christian

On 20 April 2012 at 17:09, Christian Brunner christ...@brunner-muc.de wrote:
 After running ceph on XFS for some time, I decided to try btrfs again.
 Performance with the current for-linux-min branch and big metadata
 is much better. The only problem (?) I'm still seeing is a warning
 that seems to occur from time to time:

 [87703.784552] [ cut here ]
 [87703.789759] WARNING: at fs/btrfs/inode.c:2103
 btrfs_orphan_commit_root+0xf6/0x100 [btrfs]()
 [87703.799070] Hardware name: ProLiant DL180 G6
 [87703.804024] Modules linked in: btrfs zlib_deflate libcrc32c xfs
 exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt
 iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio
 iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
 [87703.828166] Pid: 929, comm: kworker/1:2 Tainted: P           O
 3.3.2-1.fits.1.el6.x86_64 #1
 [87703.837513] Call Trace:
 [87703.840280]  [8104df6f] warn_slowpath_common+0x7f/0xc0
 [87703.847016]  [8104dfca] warn_slowpath_null+0x1a/0x20
 [87703.853533]  [a0355686] btrfs_orphan_commit_root+0xf6/0x100 
 [btrfs]
 [87703.861541]  [a0350a06] commit_fs_roots+0xc6/0x1c0 [btrfs]
 [87703.868674]  [a0351bcb]
 btrfs_commit_transaction+0x5db/0xa50 [btrfs]
 [87703.876745]  [810127a3] ? __switch_to+0x153/0x440
 [87703.882966]  [81070a90] ? wake_up_bit+0x40/0x40
 [87703.888997

Ceph on btrfs 3.4rc

2012-04-20 Thread Christian Brunner
After running ceph on XFS for some time, I decided to try btrfs again.
Performance with the current for-linux-min branch and big metadata
is much better. The only problem (?) I'm still seeing is a warning
that seems to occur from time to time:

[87703.784552] [ cut here ]
[87703.789759] WARNING: at fs/btrfs/inode.c:2103
btrfs_orphan_commit_root+0xf6/0x100 [btrfs]()
[87703.799070] Hardware name: ProLiant DL180 G6
[87703.804024] Modules linked in: btrfs zlib_deflate libcrc32c xfs
exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt
iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio
iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[87703.828166] Pid: 929, comm: kworker/1:2 Tainted: P   O
3.3.2-1.fits.1.el6.x86_64 #1
[87703.837513] Call Trace:
[87703.840280]  [8104df6f] warn_slowpath_common+0x7f/0xc0
[87703.847016]  [8104dfca] warn_slowpath_null+0x1a/0x20
[87703.853533]  [a0355686] btrfs_orphan_commit_root+0xf6/0x100 [btrfs]
[87703.861541]  [a0350a06] commit_fs_roots+0xc6/0x1c0 [btrfs]
[87703.868674]  [a0351bcb]
btrfs_commit_transaction+0x5db/0xa50 [btrfs]
[87703.876745]  [810127a3] ? __switch_to+0x153/0x440
[87703.882966]  [81070a90] ? wake_up_bit+0x40/0x40
[87703.888997]  [a0352040] ?
btrfs_commit_transaction+0xa50/0xa50 [btrfs]
[87703.897271]  [a035205f] do_async_commit+0x1f/0x30 [btrfs]
[87703.904262]  [81068949] process_one_work+0x129/0x450
[87703.910777]  [8106b7eb] worker_thread+0x17b/0x3c0
[87703.916991]  [8106b670] ? manage_workers+0x220/0x220
[87703.923504]  [810703fe] kthread+0x9e/0xb0
[87703.928952]  [8158c224] kernel_thread_helper+0x4/0x10
[87703.93]  [81070360] ? kthread_freezable_should_stop+0x70/0x70
[87703.943323]  [8158c220] ? gs_change+0x13/0x13
[87703.949149] ---[ end trace b8c31966cca731fa ]---
[91128.812399] [ cut here ]
[91128.817576] WARNING: at fs/btrfs/inode.c:2103
btrfs_orphan_commit_root+0xf6/0x100 [btrfs]()
[91128.826930] Hardware name: ProLiant DL180 G6
[91128.831897] Modules linked in: btrfs zlib_deflate libcrc32c xfs
exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt
iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio
iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[91128.856086] Pid: 6806, comm: btrfs-transacti Tainted: PW  O
3.3.2-1.fits.1.el6.x86_64 #1
[91128.865912] Call Trace:
[91128.868670]  [8104df6f] warn_slowpath_common+0x7f/0xc0
[91128.875379]  [8104dfca] warn_slowpath_null+0x1a/0x20
[91128.881900]  [a0355686] btrfs_orphan_commit_root+0xf6/0x100 [btrfs]
[91128.889894]  [a0350a06] commit_fs_roots+0xc6/0x1c0 [btrfs]
[91128.897019]  [a03a2b61] ?
btrfs_run_delayed_items+0xf1/0x160 [btrfs]
[91128.905075]  [a0351bcb]
btrfs_commit_transaction+0x5db/0xa50 [btrfs]
[91128.913156]  [a03524b2] ? start_transaction+0x92/0x310 [btrfs]
[91128.920643]  [81070a90] ? wake_up_bit+0x40/0x40
[91128.926667]  [a034cfcb] transaction_kthread+0x26b/0x2e0 [btrfs]
[91128.934254]  [a034cd60] ?
btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[91128.943671]  [a034cd60] ?
btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[91128.953079]  [810703fe] kthread+0x9e/0xb0
[91128.958532]  [8158c224] kernel_thread_helper+0x4/0x10
[91128.965133]  [81070360] ? kthread_freezable_should_stop+0x70/0x70
[91128.972913]  [8158c220] ? gs_change+0x13/0x13
[91128.978826] ---[ end trace b8c31966cca731fb ]---

I'm able to reproduce this with ceph on a single server with 4 disks
(4 filesystems/osds) and a small test program based on librbd. It is
simply writing random bytes on a rbd volume (see attachment).
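
In essence the write loop is something like the following (a sketch of the
idea only, reusing the names from the attached program; chunk size, offset
handling and the missing error checks are illustrative, see the attached
rbdtest.c for the real thing):

    /* after rbd_open(), as in the attachment */
    char buf[4096];
    rbd_image_info_t info;

    rbd_stat(image, &info, sizeof(info));    /* query the image size */
    signal(SIGALRM, alarm_handler);
    alarm(10);

    for (;;) {
        /* pick a random block-aligned offset inside the image */
        uint64_t off = ((uint64_t)rand() % (info.size / sizeof(buf))) * sizeof(buf);
        memset(buf, rand() & 0xff, sizeof(buf));
        rbd_write(image, off, sizeof(buf), buf);
        nr_writes++;
    }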

Is this something I should care about? Any hints on solving this
would be appreciated.

Thanks,
Christian
#include <inttypes.h>
#include <rbd/librbd.h>
#include <stdio.h>
#include <signal.h>

int nr_writes = 0;

void
alarm_handler(int sig) {
    fprintf(stderr, "Writes/sec: %i\n", nr_writes/10);
    nr_writes = 0;
    alarm(10);
}


int main(int argc, char *argv[]) {
    char *clientname;
    rados_t cluster;
    rados_ioctx_t io_ctx;
    rbd_image_t image;
    char *pool = "rbd";
    char *imgname = argv[1];

    if (rados_create(&cluster, NULL) < 0) {
        fprintf(stderr, "error initializing");
        return 1;
    }

    rados_conf_read_file(cluster, NULL);

    if (rados_connect(cluster) < 0) {
        fprintf(stderr, "error connecting");
        rados_shutdown(cluster);
        return 1;
    }

    if (rados_ioctx_create(cluster, pool, &io_ctx) < 0) {
        fprintf(stderr, "error opening pool %s", pool);
        rados_shutdown(cluster);
        return 1;
    }

    int r = rbd_open(io_ctx, imgname, &image, NULL);
    if (r < 0) {
        fprintf(stderr, "error reading header from %s", imgname);
        rados_ioctx_destroy(io_ctx);
rados_shutdown
