Re: [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check

2015-10-30 Thread Anand Jain



Qu,

 We shouldn't mark FS readonly when chunks are degradable.
 As below.

Thanks, Anand


diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 39a2d57..dbb2483 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3530,7 +3530,7 @@ static int write_all_supers(struct btrfs_root *root, int max_mirrors)


if (do_barriers) {
ret = barrier_all_devices(root->fs_info);
-   if (ret) {
+   if (ret < 0) {
mutex_unlock(
&root->fs_info->fs_devices->device_list_mutex);
btrfs_std_error(root->fs_info, ret,




On 09/21/2015 10:10 AM, Qu Wenruo wrote:

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices(), but it can easily be converted to the new
per-chunk degradable check framework.

Now btrfs_device has two extra members recording the send/wait errors,
set at write_dev_flush() time.
barrier_all_devices() then checks them in a similar but more accurate
way than the old code.

Signed-off-by: Qu Wenruo 
---
  fs/btrfs/disk-io.c | 13 +
  fs/btrfs/volumes.c |  6 +-
  fs/btrfs/volumes.h |  4 
  3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d64299f..7cd94e7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3400,8 +3400,6 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
  {
struct list_head *head;
struct btrfs_device *dev;
-   int errors_send = 0;
-   int errors_wait = 0;
int ret;

/* send down all the barriers */
@@ -3410,7 +3408,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
if (dev->missing)
continue;
if (!dev->bdev) {
-   errors_send++;
+   dev->err_send = 1;
continue;
}
if (!dev->in_fs_metadata || !dev->writeable)
@@ -3418,7 +3416,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)

ret = write_dev_flush(dev, 0);
if (ret)
-   errors_send++;
+   dev->err_send = 1;
}

/* wait for all the barriers */
@@ -3426,7 +3424,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
if (dev->missing)
continue;
if (!dev->bdev) {
-   errors_wait++;
+   dev->err_wait = 1;
continue;
}
if (!dev->in_fs_metadata || !dev->writeable)
@@ -3434,10 +3432,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)

ret = write_dev_flush(dev, 1);
if (ret)
-   errors_wait++;
+   dev->err_wait = 1;
}
-   if (errors_send > info->num_tolerated_disk_barrier_failures ||
-   errors_wait > info->num_tolerated_disk_barrier_failures)
+   if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
return -EIO;
return 0;
  }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f1ef215..88266fa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6945,8 +6945,12 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
btrfs_get_num_tolerated_disk_barrier_failures(
map->type);
for (i = 0; i < map->num_stripes; i++) {
-   if (map->stripes[i].dev->missing)
+   if (map->stripes[i].dev->missing ||
+   map->stripes[i].dev->err_wait ||
+   map->stripes[i].dev->err_send)
missing++;
+   map->stripes[i].dev->err_wait = 0;
+   map->stripes[i].dev->err_send = 0;
}
if (missing > max_tolerated) {
ret = -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index fe758df..cd02556 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -76,6 +76,10 @@ struct btrfs_device {
int can_discard;
int is_tgtdev_for_dev_replace;

+   /* for barrier_all_devices() check */
+   int err_send;
+   int err_wait;
+
  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
seqcount_t data_seqcount;
  #endif




Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-30 Thread Erkki Seppala
Filipe Manana  writes:
> Try this (just sent a few minutes ago):
> https://patchwork.kernel.org/patch/7463161/

I've been using this patch for a week now, doing two rebalances a day
(one per file system) - no problem so far. Thanks!

Probably unrelated to this, I did experience one reboot without any
trace, possibly because I had enabled panic = 10 and panic_on_oops = 1,
but that event did not happen anywhere near the time a balance was running.
I wonder if the hang detector could trigger a reboot with that configuration?
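
For reference, those two knobs are ordinary sysctls; a quick illustration
of such a configuration (using the same values as above) would be:

    # Reboot 10 seconds after a panic, and treat any oops as a panic.
    sysctl -w kernel.panic=10
    sysctl -w kernel.panic_on_oops=1

    # Verify the running values.
    sysctl kernel.panic kernel.panic_on_oops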

Thanks again for the great work, your detective work is always
impressive :).

-- 
  _
 / __// /__   __   http://www.modeemi.fi/~flux/\   \
/ /_ / // // /\ \/ /\  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi  \/



Re: [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check

2015-10-30 Thread Anand Jain


On 10/30/2015 07:41 PM, Qu Wenruo wrote:



在 2015年10月30日 16:32, Anand Jain 写道:



Qu,

  We shouldn't mark FS readonly when chunks are degradable.
  As below.

Thanks, Anand


diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 39a2d57..dbb2483 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3530,7 +3530,7 @@ static int write_all_supers(struct btrfs_root
*root, int max_mirrors)

 if (do_barriers) {
 ret = barrier_all_devices(root->fs_info);
-   if (ret) {
+   if (ret < 0) {
 mutex_unlock(
 &root->fs_info->fs_devices->device_list_mutex);
 btrfs_std_error(root->fs_info, ret,




Sorry, I didn't get the point here.

There should be no difference between ret and ret < 0,
as barrier_all_devices() will only return -EIO or 0.


 Oh sorry, you are right. I missed that point.

Thanks, Anand



Thanks,
Qu




On 09/21/2015 10:10 AM, Qu Wenruo wrote:

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices(), but it can easily be converted to the new
per-chunk degradable check framework.

Now btrfs_device has two extra members recording the send/wait errors,
set at write_dev_flush() time.
barrier_all_devices() then checks them in a similar but more accurate
way than the old code.

Signed-off-by: Qu Wenruo 
---
  fs/btrfs/disk-io.c | 13 +
  fs/btrfs/volumes.c |  6 +-
  fs/btrfs/volumes.h |  4 
  3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d64299f..7cd94e7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3400,8 +3400,6 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)
  {
  struct list_head *head;
  struct btrfs_device *dev;
-int errors_send = 0;
-int errors_wait = 0;
  int ret;

  /* send down all the barriers */
@@ -3410,7 +3408,7 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)
  if (dev->missing)
  continue;
  if (!dev->bdev) {
-errors_send++;
+dev->err_send = 1;
  continue;
  }
  if (!dev->in_fs_metadata || !dev->writeable)
@@ -3418,7 +3416,7 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)

  ret = write_dev_flush(dev, 0);
  if (ret)
-errors_send++;
+dev->err_send = 1;
  }

  /* wait for all the barriers */
@@ -3426,7 +3424,7 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)
  if (dev->missing)
  continue;
  if (!dev->bdev) {
-errors_wait++;
+dev->err_wait = 1;
  continue;
  }
  if (!dev->in_fs_metadata || !dev->writeable)
@@ -3434,10 +3432,9 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)

  ret = write_dev_flush(dev, 1);
  if (ret)
-errors_wait++;
+dev->err_wait = 1;
  }
-if (errors_send > info->num_tolerated_disk_barrier_failures ||
-errors_wait > info->num_tolerated_disk_barrier_failures)
+if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
  return -EIO;
  return 0;
  }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f1ef215..88266fa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6945,8 +6945,12 @@ int btrfs_check_degradable(struct btrfs_fs_info
*fs_info, unsigned flags)
  btrfs_get_num_tolerated_disk_barrier_failures(
  map->type);
  for (i = 0; i < map->num_stripes; i++) {
-if (map->stripes[i].dev->missing)
+if (map->stripes[i].dev->missing ||
+map->stripes[i].dev->err_wait ||
+map->stripes[i].dev->err_send)
  missing++;
+map->stripes[i].dev->err_wait = 0;
+map->stripes[i].dev->err_send = 0;
  }
  if (missing > max_tolerated) {
  ret = -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index fe758df..cd02556 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -76,6 +76,10 @@ struct btrfs_device {
  int can_discard;
  int is_tgtdev_for_dev_replace;

+/* for barrier_all_devices() check */
+int err_send;
+int err_wait;
+
  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
  seqcount_t data_seqcount;
  #endif




Re: corrupted RAID1: unsuccessful recovery / help needed

2015-10-30 Thread Duncan
Lukas Pirl posted on Fri, 30 Oct 2015 10:43:41 +1300 as excerpted:

> If there is one subvolume that contains all other (read only) snapshots
> and there is insufficient storage to copy them all separately:
> Is there an elegant way to preserve those when moving the data across
> disks?

AFAIK, no elegant way without a writable mount.

Tho I'm not sure, btrfs send, to a btrfs elsewhere using receive, may 
work, since you did specify read-only snapshots, which is what send 
normally works with in order to avoid changes to the snapshot while 
it's sending it.  My own use-case doesn't involve either snapshots or 
send/receive, however, so I'm not sure if send can work with a read-only 
filesystem or not, but I think its normal method of operation is to 
create those read-only snapshots itself, which of course would require a 
writable filesystem, so I'm guessing it won't work unless you can 
convince it to use the read-only mounts as-is.

The less elegant way would involve manual deduplication.  Copy one 
snapshot, then another, and dedup what hasn't changed between the two, 
then add a third and dedup again. ...  Depending on the level of dedup 
(file vs block level) and the level of change in your filesystem, this 
should ultimately take about the same level of space as a full backup 
plus a series of incrementals.
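
Something like the following rough sketch illustrates the idea (the paths
are made up, and duperemove is just one possible dedup tool):

    dest=/mnt/new/backups
    mkdir -p "$dest"
    for snap in /mnt/old/snapshots/*; do
        # Plain copy -- no sharing with previously copied snapshots yet.
        cp -a "$snap" "$dest/$(basename "$snap")"
        # Dedup identical extents across everything copied so far.
        duperemove -dhr "$dest"
    done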


Meanwhile, this does reinforce the point that snapshots don't replace 
full backups, that being the reason I don't use them here, since if the 
filesystem goes bad, it'll very likely take all the snapshots with it.

Snapshots do tend to be pretty convenient, arguably /too/ convenient and 
near-zero-cost to make, as people then tend to just do scheduled 
snapshots, without thinking about their overhead and maintenance costs on 
the filesystem, until they already have problems.  I'm not sure if you 
are a regular list reader and have thus seen my normal spiel on btrfs 
snapshot scaling and recommended limits to avoid problems or not, so if 
not, here's a slightly condensed version...

Btrfs has scaling issues that appear when trying to manage too many 
snapshots.  These tend to appear first in tools like balance and check, 
where time to process a filesystem goes up dramatically as the number of 
snapshots increases, to the point where it can become entirely 
impractical to manage at all somewhere near the 100k snapshots range, and 
is already dramatically affecting runtime at 10k snapshots.

As a result, I recommend keeping per-subvol snapshots to 250-ish, which 
will allow snapshotting four subvolumes while still keeping total 
filesystem snapshots to 1000, or eight subvolumes at a filesystem total 
of 2000 snapshots, levels where the scaling issues should remain well 
within control.  And 250-ish snapshots per subvolume is actually very 
reasonable even with half-hour scheduled snapshotting, provided a 
reasonable scheduled snapshot thinning program is also implemented, 
cutting say to hourly after six hours, six-hourly after a day, 12 hourly 
after 2 days, daily after a week, and weekly after four weeks to a 
quarter (13 weeks).  Out beyond a quarter or two, certainly within a 
year, longer term backups to other media should be done, and snapshots 
beyond that can be removed entirely, freeing up the space the old 
snapshots kept locked down and helping to keep the btrfs healthy and 
functioning well within its practical scalability limits.
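
As a rough illustration only (names, paths and the retention number are
arbitrary), a simple cap-based variant of that thinning idea could look like:

    snapdir=/mnt/.snapshots     # snapshots named e.g. root-2015-10-30-1430
    keep=250                    # per-subvolume cap discussed above

    # Take the scheduled read-only snapshot...
    btrfs subvolume snapshot -r /mnt "$snapdir/root-$(date +%F-%H%M)"

    # ...then delete the oldest snapshots beyond the cap (the names sort
    # chronologically, so everything but the newest $keep gets removed).
    ls -d "$snapdir"/root-* | head -n -"$keep" | while read -r old; do
            btrfs subvolume delete "$old"
    done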

Because a balance that takes a month to complete because it's dealing 
with a few hundred k snapshots is in practice (for most people) not 
worthwhile to do at all, and also in practice, a year or even six months 
out, are you really going to care about the precise half-hour snapshot, 
or is the next daily or weekly snapshot going to be just as good, and a 
whole lot easier to find among a couple hundred snapshots than hundreds 
of thousands?

If you have far too many snapshots, perhaps this sort of thinning 
strategy will as well allow you to copy and dedup only key snapshots, say 
weekly plus daily for the last week, doing the backup thing manually, as 
well, modifying the thinning strategy accordingly if necessary to get it 
to fit.  Tho using the copy and dedup strategy above will still require 
at least double the full space of a single copy, plus the space necessary 
for each deduped snapshot copy you keep, since the dedup occurs after the 
copy.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check

2015-10-30 Thread Qu Wenruo



在 2015年10月30日 16:32, Anand Jain 写道:



Qu,

  We shouldn't mark FS readonly when chunks are degradable.
  As below.

Thanks, Anand


diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 39a2d57..dbb2483 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3530,7 +3530,7 @@ static int write_all_supers(struct btrfs_root
*root, int max_mirrors)

 if (do_barriers) {
 ret = barrier_all_devices(root->fs_info);
-   if (ret) {
+   if (ret < 0) {
 mutex_unlock(
 &root->fs_info->fs_devices->device_list_mutex);
 btrfs_std_error(root->fs_info, ret,




Sorry, I didn't get the point here.

There should be no difference between ret and ret < 0,
as barrier_all_devices() will only return -EIO or 0.

Or did I miss something?

Thanks,
Qu




On 09/21/2015 10:10 AM, Qu Wenruo wrote:

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices(), but it can easily be converted to the new
per-chunk degradable check framework.

Now btrfs_device has two extra members recording the send/wait errors,
set at write_dev_flush() time.
barrier_all_devices() then checks them in a similar but more accurate
way than the old code.

Signed-off-by: Qu Wenruo 
---
  fs/btrfs/disk-io.c | 13 +
  fs/btrfs/volumes.c |  6 +-
  fs/btrfs/volumes.h |  4 
  3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d64299f..7cd94e7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3400,8 +3400,6 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)
  {
  struct list_head *head;
  struct btrfs_device *dev;
-int errors_send = 0;
-int errors_wait = 0;
  int ret;

  /* send down all the barriers */
@@ -3410,7 +3408,7 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)
  if (dev->missing)
  continue;
  if (!dev->bdev) {
-errors_send++;
+dev->err_send = 1;
  continue;
  }
  if (!dev->in_fs_metadata || !dev->writeable)
@@ -3418,7 +3416,7 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)

  ret = write_dev_flush(dev, 0);
  if (ret)
-errors_send++;
+dev->err_send = 1;
  }

  /* wait for all the barriers */
@@ -3426,7 +3424,7 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)
  if (dev->missing)
  continue;
  if (!dev->bdev) {
-errors_wait++;
+dev->err_wait = 1;
  continue;
  }
  if (!dev->in_fs_metadata || !dev->writeable)
@@ -3434,10 +3432,9 @@ static int barrier_all_devices(struct
btrfs_fs_info *info)

  ret = write_dev_flush(dev, 1);
  if (ret)
-errors_wait++;
+dev->err_wait = 1;
  }
-if (errors_send > info->num_tolerated_disk_barrier_failures ||
-errors_wait > info->num_tolerated_disk_barrier_failures)
+if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
  return -EIO;
  return 0;
  }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f1ef215..88266fa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6945,8 +6945,12 @@ int btrfs_check_degradable(struct btrfs_fs_info
*fs_info, unsigned flags)
  btrfs_get_num_tolerated_disk_barrier_failures(
  map->type);
  for (i = 0; i < map->num_stripes; i++) {
-if (map->stripes[i].dev->missing)
+if (map->stripes[i].dev->missing ||
+map->stripes[i].dev->err_wait ||
+map->stripes[i].dev->err_send)
  missing++;
+map->stripes[i].dev->err_wait = 0;
+map->stripes[i].dev->err_send = 0;
  }
  if (missing > max_tolerated) {
  ret = -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index fe758df..cd02556 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -76,6 +76,10 @@ struct btrfs_device {
  int can_discard;
  int is_tgtdev_for_dev_replace;

+/* for barrier_all_devices() check */
+int err_send;
+int err_wait;
+
  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
  seqcount_t data_seqcount;
  #endif




RichACLs for BTRFS? (this time complete)

2015-10-30 Thread Marcel Ritter
Hi btrfs-developers,

I just read about the possible/planned merge of richacl patches into
linux kernel 4.4.

See http://lwn.net/Articles/661078/
See http://lwn.net/Articles/661357/

Will btrfs support richacls with kernel 4.4?

According to the btrfs wiki, this topic has not been claimed:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#RichACLs_.2F_NFS4_ACLS

As we'd like to use btrfs with NFSv4 I'd really like to see richacls on btrfs.

Hope someone can comment on this topic.

Bye,
   Marcel

PS: Please excuse my former incomplete posting.


Re: corrupted RAID1: unsuccessful recovery / help needed

2015-10-30 Thread Duncan
Lukas Pirl posted on Fri, 30 Oct 2015 10:43:41 +1300 as excerpted:

> Is e.g. "balance" also influenced by the userspace tools or does
> the kernel the actual work?

btrfs balance is done "online", that is, on the (writable-)mounted 
filesystem, and the kernel does the real work.  It's the tools that work 
on the unmounted filesystem, btrfs check, btrfs restore, btrfs rescue, 
etc, where the userspace code does the real work, and thus where being 
current and having all the latest userspace fixes is vital.

If you can't mount writable, you can't balance.
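
In command terms (illustrative invocations only), the split looks like:

    # Online: needs a writable btrfs mount.
    btrfs balance start -dusage=50 /mnt

    # Offline: the userspace tools run against the unmounted device instead.
    btrfs check /dev/sdX
    btrfs restore /dev/sdX /mnt/recovered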

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: random i/o error without error in dmesg

2015-10-30 Thread Duncan
Marc Joliet posted on Thu, 29 Oct 2015 22:10:24 +0100 as excerpted:

>>Meanwhile, as explained in the systemd docs (specifically the systemd
>>for administrators series, IIRC), systemd dropping back to the initr* is
>>actually its way of automatically doing effectively the same thing we
>>were using lib_users and all those restarts to do, getting rid of all
>>possible still running on root executables, including systemd itself, by
>>reexecing systemd itself back in the initr*, as a way to help eliminate
>>*everything* running on root, so it can not only be remounted read-only,
>>but actually unmounted the same as any other filesystem, as userspace is
>>now actually running from the initr* once again.  That's a far *FAR*
>>safer reboot after upgrade than traditional sysvinit solutions were able
>>to do. =:^)
> 
> Yeah, the ability to do that is a nice plus of using an initramfs.
> Although I've never been clear on why it's *safer*.  Is it because the
> remount might fail?  Or are there other reasons, too?

While I don't claim anything but informed admin level authority on the 
problem...

It's first worth noting that the problem a return to initramfs helps 
solve is in practice reasonably rare and obscure, since if it weren't, 
people would have been experiencing it in serious numbers on sysvinit-
based systems all along, and something would have been done to solve it 
long before systemd came along.  So it's a relatively narrow issue that 
in practice can only affect a few users, a relatively small portion of 
the time.

From my read of the systemd docs, it's more pointing out a theoretical 
issue than a practical one, pointing out that systemd is in fact a more 
theoretically correct solution to the (implicitly mostly theoretical) 
problem.

In that context, I believe the (mostly theoretical) point is as much that 
we were treating / (and perhaps another mount or two) special, remounting 
it read-only instead of unmounting it because in practice there wasn't 
any other choice, and that now that systemd offers the choice, it can in 
fact be treated just like any other filesystem, fully unmounting it 
before shutdown.

Since exceptions to rules are nice places for bugs to hide, in theory at 
least (the remount-ro root being such a universal exception that in 
practice it's a rule of its own, and bugs couldn't long hide in that 
exception /because/ of its universalness), being able to treat / like any 
other filesystem and unmount it is a "purer and more correct" solution.

IOW, it's a nice counter to the "systemd isn't unixy enough" point, as 
here, it's more "unixy" than sysvinit ever was.

That said, I expect that over the years there have been plenty of 
otherwise nice implementations of various useful things that ran into a 
shutdown/reboot-time problem due to root's remount-ro exception, that 
either limited them to non-root-filesystem deployment or sent them back 
for a workaround, if not causing them to be rejected outright as unworkable, 
that in this new return-to-initr*-and-unmount-root environment will see 
faster deployment without the workarounds that heretofore were required.  
Of course that'll end up being a limitation on deployment on non-initr* 
direct-to-root boot sequences, but in this primarily prebuilt binary 
distro with prebuilt, by-necessity-modular-kernel-and-initr* environment, 
that's unlikely to slow down wide deployment by much, and anyone wanting 
to do direct-to-root boots and/or non-systemd-based deployments will just 
have to find their own workarounds, which may ultimately be incorporated 
into upstream, or not, depending on upstream's whims.

Which, bringing it all back to the btrfs list title topic, is already 
where multi-device btrfs as / filesystem is in terms of initr*, since 
that's basically broken without an initr* to assemble it.  And of course 
the same thing goes for / on LVM, since it too requires userspace to 
activate, which means initr* if / is on it.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



RichACLs for BTRFS?

2015-10-30 Thread Marcel Ritter
Hi btrfs-developers,

I just read about the possible/planned merge of richacl patches into
linux kernel 4.4.

See http://lwn.net/Articles/661078/
See http://lwn.net/Articles/661357/

Will btrfs support richacls with kernel 4.4?
According to


Re: [3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup double free

2015-10-30 Thread Johannes Henninger
Woops, just noticed I copied and pasted a typo there. Sorry for the
trouble. It should be:

Tested-by: Johannes Henninger 


Re: corrupted RAID1: unsuccessful recovery / help needed

2015-10-30 Thread Hugo Mills
On Fri, Oct 30, 2015 at 10:58:47AM +, Duncan wrote:
> Lukas Pirl posted on Fri, 30 Oct 2015 10:43:41 +1300 as excerpted:
> 
> > If there is one subvolume that contains all other (read only) snapshots
> > and there is insufficient storage to copy them all separately:
> > Is there an elegant way to preserve those when moving the data across
> > disks?

   If they're read-only snapshots already, then yes:

sent=
for sub in *; do
   btrfs send $sent $sub | btrfs receive /where/ever
   sent="$sent -c$sub"
done

   That will preserve the shared extents between the subvols on the
receiving FS.

   If they're not read-only, then snapshotting each one again as RO
before sending would be the approach, but if your FS is itself RO,
that's not going to be possible, and you need to look at Duncan's
email.

   Hugo.

> AFAIK, no elegant way without a writable mount.
> 
> Tho I'm not sure, btrfs send, to a btrfs elsewhere using receive, may 
> work, since you did specify read-only snapshots, which is what send 
> normally works with in ordered to avoid changes to the snapshot while 
> it's sending it.  My own use-case doesn't involve either snapshots or 
> send/receive, however, so I'm not sure if send can work with a read-only 
> filesystem or not, but I think its normal method of operation is to 
> create those read-only snapshots itself, which of course would require a 
> writable filesystem, so I'm guessing it won't work unless you can 
> convince it to use the read-only mounts as-is.
> 
> The less elegant way would involve manual deduplication.  Copy one 
> snapshot, then another, and dedup what hasn't changed between the two, 
> then add a third and dedup again. ...  Depending on the level of dedup 
> (file vs block level) and the level of change in your filesystem, this 
> should ultimately take about the same level of space as a full backup 
> plus a series of incrementals.
> 
> 
> Meanwhile, this does reinforce the point that snapshots don't replace 
> full backups, that being the reason I don't use them here, since if the 
> filesystem goes bad, it'll very likely take all the snapshots with it.
> 
> Snapshots do tend to be pretty convenient, arguably /too/ convenient and 
> near-zero-cost to make, as people then tend to just do scheduled 
> snapshots, without thinking about their overhead and maintenance costs on 
> the filesystem, until they already have problems.  I'm not sure if you 
> are a regular list reader and have thus seen my normal spiel on btrfs 
> snapshot scaling and recommended limits to avoid problems or not, so if 
> not, here's a slightly condensed version...
> 
> Btrfs has scaling issues that appear when trying to manage too many 
> snapshots.  These tend to appear first in tools like balance and check, 
> where time to process a filesystem goes up dramatically as the number of 
> snapshots increases, to the point where it can become entirely 
> impractical to manage at all somewhere near the 100k snapshots range, and 
> is already dramatically affecting runtime at 10k snapshots.
> 
> As a result, I recommend keeping per-subvol snapshots to 250-ish, which 
> will allow snapshotting four subvolumes while still keeping total 
> filesystem snapshots to 1000, or eight subvolumes at a filesystem total 
> of 2000 snapshots, levels where the scaling issues should remain well 
> within control.  And 250-ish snapshots per subvolume is actually very 
> reasonable even with half-hour scheduled snapshotting, provided a 
> reasonable scheduled snapshot thinning program is also implemented, 
> cutting say to hourly after six hours, six-hourly after a day, 12 hourly 
> after 2 days, daily after a week, and weekly after four weeks to a 
> quarter (13 weeks).  Out beyond a quarter or two, certainly within a 
> year, longer term backups to other media should be done, and snapshots 
> beyond that can be removed entirely, freeing up the space the old 
> snapshots kept locked down and helping to keep the btrfs healthy and 
> functioning well within its practical scalability limits.
> 
> Because a balance that takes a month to complete because it's dealing 
> with a few hundred k snapshots is in practice (for most people) not 
> worthwhile to do at all, and also in practice, a year or even six months 
> out, are you really going to care about the precise half-hour snapshot, 
> or is the next daily or weekly snapshot going to be just as good, and a 
> whole lot easier to find among a couple hundred snapshots than hundreds 
> of thousands?
> 
> If you have far too many snapshots, perhaps this sort of thinning 
> strategy will as well allow you to copy and dedup only key snapshots, say 
> weekly plus daily for the last week, doing the backup thing manually, as 
> well, modifying the thinning strategy accordingly if necessary to get it 
> to fit.  Tho using the copy and dedup strategy above will still require 
> at least double the full space of a single copy, plus the space necessary 
> for each deduped snapshot 

Re: [PATCH 5/6] btrfs-progs: free comparer_set in cmd_qgroup_show

2015-10-30 Thread David Sterba
On Thu, Oct 29, 2015 at 05:31:47PM +0800, Zhao Lei wrote:
> comparer_set, which was allocated by malloc(), should be freed before
> the function returns.
> 
> Signed-off-by: Zhao Lei 
> ---
>  cmds-qgroup.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/cmds-qgroup.c b/cmds-qgroup.c
> index a64b716..f069d32 100644
> --- a/cmds-qgroup.c
> +++ b/cmds-qgroup.c
> @@ -290,7 +290,7 @@ static int cmd_qgroup_show(int argc, char **argv)
>   int filter_flag = 0;
>   unsigned unit_mode;
>  
> - struct btrfs_qgroup_comparer_set *comparer_set;
> + struct btrfs_qgroup_comparer_set *comparer_set = NULL;
>   struct btrfs_qgroup_filter_set *filter_set;
>   filter_set = btrfs_qgroup_alloc_filter_set();
>   comparer_set = btrfs_qgroup_alloc_comparer_set();
> @@ -372,6 +372,8 @@ static int cmd_qgroup_show(int argc, char **argv)
>   fprintf(stderr, "ERROR: can't list qgroups: %s\n",
>   strerror(e));
>  
> + free(comparer_set);

Doh, Coverity correctly found that comparer_set is freed inside
btrfs_show_qgroups() a few lines above. Patch dropped.

> +


Re: corrupted RAID1: unsuccessful recovery / help needed

2015-10-30 Thread Austin S Hemmelgarn

On 2015-10-30 06:58, Duncan wrote:

Lukas Pirl posted on Fri, 30 Oct 2015 10:43:41 +1300 as excerpted:


If there is one subvolume that contains all other (read only) snapshots
and there is insufficient storage to copy them all separately:
Is there an elegant way to preserve those when moving the data across
disks?


AFAIK, no elegant way without a writable mount.

Tho I'm not sure, btrfs send, to a btrfs elsewhere using receive, may
work, since you did specify read-only snapshots, which is what send
normally works with in order to avoid changes to the snapshot while
it's sending it.  My own use-case doesn't involve either snapshots or
send/receive, however, so I'm not sure if send can work with a read-only
filesystem or not, but I think its normal method of operation is to
create those read-only snapshots itself, which of course would require a
writable filesystem, so I'm guessing it won't work unless you can
convince it to use the read-only mounts as-is.
Unless something has significantly changed since I last looked, send 
only works on existing snapshots and doesn't create any directly itself, 
and as such should work fine to send snapshots from a read-only 
filesystem.  In theory, you could use it to send all the snapshots at 
once, although that would probably take a long time, so you'll probably 
have to use a loop like the fragment of shell-script that Hugo suggested 
in his response.  That should result in an (almost) identical level of 
sharing.


The less elegant way would involve manual deduplication.  Copy one
snapshot, then another, and dedup what hasn't changed between the two,
then add a third and dedup again. ...  Depending on the level of dedup
(file vs block level) and the level of change in your filesystem, this
should ultimately take about the same level of space as a full backup
plus a series of incrementals.
If you're using duperemove (which is the only maintained dedupe tool I 
know of for BTRFS), then this will likely take a long time for any 
reasonable amount of data, and probably take up more space on the 
destination drive than it does on the source (while duperemove does 
block-based deduplication, it uses large chunks by default).


Meanwhile, this does reinforce the point that snapshots don't replace
full backups, that being the reason I don't use them here, since if the
filesystem goes bad, it'll very likely take all the snapshots with it.
FWIW, while I don't use them directly myself as a backup, they are 
useful when doing a backup to get a guaranteed stable version of the 
filesystem being backed-up (this is also one of the traditional use 
cases for LVM snapshots, although those have a lot of different issues 
to deal with).  For local backups (I also do cloud-storage based remote 
backups, but local is what matters in this case because it's where I 
actually use send/receive and snapshots) I use two different methods 
depending on the amount of storage I have:
1. If I'm relatively limited on local storage (like in my laptop where 
the secondary internal disk is only 64G), I use a temporary snapshot to 
generate a SquashFS image of the system, which I then store on the 
secondary drive.
2. If I have a lot of spare space (like on my desktop where I have 4x 
1TB HDD's and 2x 128G SSD's), I make a snapshot of the filesystem, then 
use send/receive to transfer that to a backup filesystem on a separate 
disk.  I then keep the original snapshot around on the filesystem so I 
can do incremental send/receive to speed up future backups.
In both cases, I can directly boot my most recent backups if need be, 
and in the second case, I can actually use it to trivially regenerate 
the backed-up filesystems (by simply doing a send/receive in the 
opposite direction).
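
A minimal sketch of that second scheme (snapshot names and mount points
here are made up for illustration):

    # First run: full send of a read-only snapshot.
    btrfs subvolume snapshot -r / /snapshots/root-new
    btrfs send /snapshots/root-new | btrfs receive /mnt/backup

    # Later runs: keep the previous snapshot around and send only the delta.
    btrfs send -p /snapshots/root-prev /snapshots/root-new | btrfs receive /mnt/backup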


Beyond providing a stable system-image for backups, the only valid use 
case for snapshots in my opinion is to provide the equivalent to MS 
Windows' 'Restore Point' feature (which I'm pretty sure is done 
currently by RHEL and SLES if they are installed on BTRFS) and possibly 
'File History' for people who for some reason can't use real VCS or just 
need to store the last few revisions (which is itself done by stuff like 
'snapper').







Re: [PATCH V8 02/13] Btrfs: Compute and look up csums based on sectorsized blocks

2015-10-30 Thread Josef Bacik

On 10/28/2015 04:10 AM, Chandan Rajendra wrote:

Checksums are applicable to sectorsize units. The current code uses
bio->bv_len units to compute and look up checksums. This works on machines
where sectorsize == PAGE_SIZE. This patch makes the checksum computation and
lookup code work with sectorsize units.

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 


Reviewed-by: Josef Bacik 

Thanks,

Josef



Re: [PATCH V8 03/13] Btrfs: Direct I/O read: Work on sectorsized blocks

2015-10-30 Thread Josef Bacik

On 10/28/2015 04:10 AM, Chandan Rajendra wrote:

The direct I/O read's endio and corresponding repair functions work on
page sized blocks. This commit adds the ability for direct I/O read to work on
subpagesized blocks.



Reviewed-by: Josef Bacik 

Thanks,

Josef


Re: [PATCH V8 01/13] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size

2015-10-30 Thread Josef Bacik

On 10/28/2015 04:10 AM, Chandan Rajendra wrote:

Currently, the code reserves/releases extents in multiples of PAGE_CACHE_SIZE
units. Fix this by doing reservation/releases in block size units.

Signed-off-by: Chandan Rajendra 


Reviewed-by: Josef Bacik 

Thanks,

Josef


Re: FW: btrfs-progs: android build

2015-10-30 Thread David Sterba
On Mon, Aug 31, 2015 at 06:33:01PM +0200, David Sterba wrote:
> So the preliminary support is merged. Outstanding issues are all related
> to blkid API:
> 
> - is_ssd

Fixed by a trivial ifdef around the function.

> - btrfs_wipe_existing_sb
> - check_overwrite
>
> In the ssd check case it's safe to provide the 'return 1' replacement,
> but the other two are related to safety measures and I'm not comfortable
> to ifdef them out yet.

I'm not able to find the exact version of libblkid that android uses,
closest guess is 2.14 which is pretty old. The low-level probing has
been added in 2.15 and I think we can't avoid that. Reimplementing the
missing blkid functionality is possible but I'd rather not go that way.

Please let me know if there's a version of android build system that
provides sufficiently new blkid otherwise I'm afraid that we can't
support it.


Re: bad handling of unpartitioned device in sysfs_devno_to_wholedisk() (which breaks mkfs.btrfs)

2015-10-30 Thread Karel Zak
On Fri, Oct 30, 2015 at 09:43:28AM +0800, Tom Yan wrote:
> So I noticed that SSD detection somehow does not work on unpartitioned
> devices in mkfs.btrfs:
> https://bugzilla.kernel.org/show_bug.cgi?id=102921
> 
> Later I found out that it breaks at blkid_devno_to_wholedisk() in is_ssd():
> http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/mkfs.c?h=v4.2.3#n1103
> 
> which Elliot had shown an example with strace:
> https://lists.01.org/pipermail/linux-nvdimm/2015-September/002109.html
> 
> And I think the problem occurs in the sysfs_get_devname() here:
> https://git.kernel.org/cgit/utils/util-linux/util-linux.git/tree/lib/sysfs.c?h=v2.27#n785
> 
> Since sysfs_get_devname() has to call sysfs_readlink() later, which
> outputs a long full device path in /sys, I don't think we should call
> it directly with the buffer "diskname"; callers won't expect that it
> has to be large enough to carry that path in the middle of the
> process. For example, in is_ssd() a char array of size 32 is used
> ("wholedisk").

You're right. The function sysfs_get_devname() is not too elegant, as
it uses the devname buffer for readlink. Fixed; the bugfix will be in
v2.27.1.

Thanks!

Karel

-- 
 Karel Zak  
 http://karelzak.blogspot.com


Re: RichACLs for BTRFS? (this time complete)

2015-10-30 Thread Austin S Hemmelgarn

On 2015-10-30 05:45, Marcel Ritter wrote:

Hi btrfs-developers,

I just read about the possible/planned merge of richacl patches into
linux kernel 4.4.

See http://lwn.net/Articles/661078/
See http://lwn.net/Articles/661357/

Will btrfs support richacls with kernel 4.4?

According to the btrfs wiki, this topic has not been claimed:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#RichACLs_.2F_NFS4_ACLS

As we'd like to use btrfs with NFSv4 I'd really like to see richacls on btrfs.

Hope someone can comment on this topic.

While I don't think we'll directly support richacls, it shouldn't be 
hard to integrate them, as they're just stored in a couple of xattrs in 
the 'system' prefix.  AFAICT, all that would really be needed is to make 
sure things are wired up correctly so that we can differentiate between 
using POSIX ACLs and richacls.  While I don't agree with a number of 
choices the developers have made with richacls (it should be relatively 
easy to find the long discussions we've had in the LKML archives), I do 
agree that the two different models should not be mixed.
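
For what it's worth, once an implementation lands that storage model is
easy to observe, since the ACL is just another system.* xattr (sketch only;
'file' is a placeholder):

    # After setting an ACL with the richacl userspace tools (setrichacl),
    # the raw attribute is visible like any other xattr:
    getfattr -n system.richacl -e hex file
    # ...while getrichacl prints the NFSv4-style view:
    getrichacl file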







Periodic kernel freezes

2015-10-30 Thread Alex Adriaanse
I have an EC2 instance on AWS that tends to freeze several times per week. When 
it freezes it stops responding to network traffic, disk I/O stops, and CPU goes 
to 100%. The system comes back fine after a reboot. I was finally able to get a 
kernel backtrace from when this happened today, which I have attached to this 
email.

The VM in question runs Debian Jessie, and has 3 BTRFS filesystems, including 
the root filesystem. Details are included below.

Any ideas?

Thanks,

Alex



# uname -a
Linux prod-docker-1-a 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 
(2015-10-09) x86_64 GNU/Linux

#   btrfs --version
Btrfs v3.17

# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/xvda   8.0G  1.3G  6.4G  17% /
udev 10M 0   10M   0% /dev
tmpfs   3.0G  8.6M  3.0G   1% /run
tmpfs   7.5G   12K  7.5G   1% /dev/shm
tmpfs   5.0M 0  5.0M   0% /run/lock
tmpfs   7.5G 0  7.5G   0% /sys/fs/cgroup
/dev/xvdb50G  3.9G   45G   9% /var/lib/docker
/dev/xvdc   200G   70G  130G  35% /srv/volumes


# btrfs fi show
Label: none  uuid: 8a293966-5c19-485c-a819-a6b801a1085d
Total devices 1 FS bytes used 1.21GiB
devid1 size 8.00GiB used 3.28GiB path /dev/xvda

Label: 'docker'  uuid: 5bf935e0-4519-43d9-b2e9-b3fb19374b72
Total devices 1 FS bytes used 3.70GiB
devid1 size 50.00GiB used 6.04GiB path /dev/xvdb

Label: 'volumes'  uuid: 2d121370-7879-4485-8fd5-1fe0db5a0c12
Total devices 1 FS bytes used 68.82GiB
devid1 size 200.00GiB used 124.04GiB path /dev/xvdc

Btrfs v3.17


# btrfs fi df /
Data, single: total=2.85GiB, used=1.17GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=204.75MiB, used=38.03MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

# btrfs fi df /var/lib/docker
Data, single: total=4.01GiB, used=3.52GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=179.58MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=64.00MiB, used=0.00B

# btrfs fi df /srv/volumes
Data, single: total=122.01GiB, used=68.55GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=277.20MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=96.00MiB, used=0.00B
[344317.872151] [ cut here ]
[344317.876091] kernel BUG at 
/build/linux-xkTWug/linux-3.16.7-ckt11/mm/page_alloc.c:1011!
[344317.876091] invalid opcode:  [#1] SMP 
[344317.876091] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack 
ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
xt_addrtype iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp 
llc crc32_pclmul ppdev ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper cryptd evdev psmouse serio_raw parport_pc parport ttm 
drm_kms_helper drm i2c_piix4 i2c_core processor thermal_sys button autofs4 
btrfs xor raid6_pq ata_generic xen_blkfront crct10dif_pclmul crct10dif_common 
crc32c_intel ata_piix libata scsi_mod ixgbevf(O)
[344317.876091] CPU: 0 PID: 9842 Comm: kworker/u30:7 Tainted: G   O  
3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u5
[344317.876091] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
[344317.876091] Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
[344317.876091] task: 8800eb30b630 ti: 880001a08000 task.ti: 
880001a08000
[344317.876091] RIP: 0010:[]  [] 
move_freepages+0x107/0x110
[344317.876091] RSP: 0018:880001a0b918  EFLAGS: 00010006
[344317.876091] RAX: 8803e08fb000 RBX:  RCX: 
0001
[344317.876091] RDX: ea000d922fc8 RSI: ea000d91c000 RDI: 
8803e08fbe00
[344317.876091] RBP: 0001 R08: 8803e08fbe00 R09: 

[344317.876091] R10:  R11: 8803e08fbeb0 R12: 
ea000d91cbd0
[344317.876091] R13:  R14:  R15: 
8803e08fbe00
[344317.876091] FS:  () GS:8803e040() 
knlGS:
[344317.876091] CS:  0010 DS:  ES:  CR0: 80050033
[344317.876091] CR2: 7fd0a085fc00 CR3: 00035f1d5000 CR4: 
001406f0
[344317.876091] Stack:
[344317.876091]  81143c1c  0002115a8000 
ea000d91cbf0
[344317.876091]  8803e08fbe90 8803e0412f78 8800eb30b698 
8803e08fbe00
[344317.876091]  001f  0001 
001a
[344317.876091] Call Trace:
[344317.876091]  [] ? __rmqueue+0x37c/0x460
[344317.876091]  [] ? get_page_from_freelist+0x685/0x910
[344317.876091]  [] ? __alloc_pages_nodemask+0x16d/0xb30
[344317.876091]  [] ? __alloc_pages_nodemask+0x16d/0xb30
[344317.876091]  [] ? btrfs_find_space_for_alloc+0x22a/0x270 
[btrfs]

Re: Periodic kernel freezes

2015-10-30 Thread David Goodwin


On 30/10/2015 16:25, Alex Adriaanse wrote:

I have an EC2 instance on AWS that tends to freeze several times per
week. When it freezes it stops responding to network traffic, disk
I/O stops, and CPU goes to 100%. The system comes back fine after a
reboot. I was finally able to get a kernel backtrace from when this
happened today, which I have attached to this email.

The VM in question runs Debian Jessie, and has 3 BTRFS filesystems,
including the root filesystem. Details are included below.

Any ideas?



Hi Alex -

I kept experiencing problems with the Jessie 3.16.x kernel on EC2 (and 
elsewhere) with BTRFS.


Out of 8 nodes, one managed an uptime of 90 days, while the average was 
about 21 days.


Crashes were seemingly random, and it was difficult to get stack traces.

For the stack traces I did get, it wasn't always obvious that the 
problem lay with BTRFS.


Reboots normally needed to be forceful.

I'd suggest upgrading to a backports kernel (I compiled various 4.1.x 
kernels, but there's now 4.2.x in jessie-backports).


You might also want to turn off compression...
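
Roughly, on a stock Jessie install, that would amount to something like the
following (illustrative commands only):

    # Pull a newer kernel from jessie-backports.
    echo 'deb http://httpredir.debian.org/debian jessie-backports main' \
            > /etc/apt/sources.list.d/backports.list
    apt-get update
    apt-get -t jessie-backports install linux-image-amd64

    # To drop compression, remove any compress/compress-force options from
    # the btrfs lines in /etc/fstab and remount (or reboot).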

David.