Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-30 Thread Erkki Seppala
Filipe Manana  writes:
> Try this (just sent a few minutes ago):
> https://patchwork.kernel.org/patch/7463161/

I've been using this patch for a week now, doing two rebalances a day
(one per filesystem), and have had no problems so far. Thanks!

Probably unrelated to this, I did experience one reboot without any
trace, possibly because I had set panic = 10 and panic_on_oops = 1,
but it did not happen anywhere near the time a balance was running. I
wonder whether the hung task detector could trigger a reboot with that
configuration?
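For reference, here is a minimal sketch that prints the settings involved, assuming a Linux system with the standard /proc/sys/kernel interface (the function name show_panic_settings is hypothetical). With panic_on_oops = 1, a BUG() such as the one hit in insert_inline_extent_backref becomes a panic, and panic = 10 then reboots the machine 10 seconds later, which may leave nothing in the on-disk logs. The hung task detector, by contrast, normally only logs a warning once a task has been blocked for hung_task_timeout_secs (120 seconds by default, the same interval as in the hung-task message mentioned in the original report); it panics only when kernel.hung_task_panic is set to 1.

```shell
#!/bin/sh
# Print the kernel settings that decide whether an oops or a hung task
# ends in a panic (and, with panic > 0, an automatic reboot).
# Values are read from /proc/sys/kernel; a knob that does not exist on
# this kernel is reported as "n/a".
show_panic_settings() {
    for knob in panic panic_on_oops hung_task_panic hung_task_timeout_secs; do
        f="/proc/sys/kernel/$knob"
        if [ -r "$f" ]; then
            val=$(cat "$f")
        else
            val="n/a"
        fi
        printf 'kernel.%s = %s\n' "$knob" "$val"
    done
}

show_panic_settings
```

The same values can also be queried with `sysctl kernel.panic kernel.panic_on_oops kernel.hung_task_panic`; reading /proc directly just avoids failing outright on a kernel built without the hung task detector.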

Thanks again for the great work, your detective work is always
impressive :).

-- 
  _
 / __// /__   __   http://www.modeemi.fi/~flux/\   \
/ /_ / // // /\ \/ /\  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi  \/

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-22 Thread Filipe Manana
On Thu, Oct 22, 2015 at 6:32 AM, Erkki Seppala  wrote:
> I don't really have any other 'solution' than disabling the rebalancing
> for the time being and running it only as needed, as I did before.

Try this (just sent a few minutes ago):
https://patchwork.kernel.org/patch/7463161/

thanks

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-22 Thread Stéphane Lesimple

On 2015-10-22 10:53, Filipe Manana wrote:
> Try this (just sent a few minutes ago):
> https://patchwork.kernel.org/patch/7463161/

Awesome, I'll also try it right now under 4.3.0-rc6. My system is
currently hit so hard by this bug that it no longer survives a balance
for longer than a few minutes.

Will keep you posted on the outcome.

Thanks,

--
Stéphane.




Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-22 Thread Erkki Seppala
Hello,

Thanks for the super-fast response :).

I've applied the patch and will wait and see. Any effects should be
visible within a week, given daily rebalances of two filesystems.



BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing

2015-10-21 Thread Erkki Seppala
Hello,

Recently I added daily rebalancing to my cron.d (after finding myself in
a no-space situation), and not long after that, I found my PC had
crashed overnight. With no sign of it in any log (not even the network
log, even though there should have been), I had nothing to go on, but
last night it crashed again after starting the rebalance, and this time
there was some information in the kernel log.

Kernel version: 4.2.3 (package linux-image-4.2.0-1-amd64 version 4.2.3-1
from Debian Unstable)

The dump is available at:

  http://www.modeemi.fi/~flux/btrfs/btrfs-BUG-2015-10-55.txt

The log is available as well (with some unrelated USB and firewall
logging stripped). It shows that a kernel task hung for 120 seconds
yesterday evening, but that was on another btrfs filesystem and is
another story:

  http://www.modeemi.fi/~flux/btrfs/btrfs-2015-10-55.txt

I'm not quite sure which of the btrfs balance commands caused the
issue, but here is my script:

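#!/bin/sh
# Run one balance pass per usage threshold given on the command line,
# first for data (d) and then for metadata (m) block groups.
fs="$1"
if [ -z "$fs" ]; then
  echo "usage: btrfs-balance / 0 1 5 10 20 50" >&2
  exit 1
fi
shift
for usage in d m; do
  for a in "$@"; do
    date
    /bin/btrfs balance start "$fs" -v "-${usage}usage=$a"
  done
done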

And it was started at 07:30 with:

  /usr/local/sbin/btrfs-balance / 0 1 2 5 10 20 30 50 70
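To see exactly which balance commands this invocation issues, and in what order, here is a hypothetical dry-run variant of the script (the function name plan_balance and the BTRFS variable are illustrative, not part of the original). It prints each command instead of running it: all nine data passes (-dusage=0 up to -dusage=70) come first, followed by the nine metadata passes, for 18 balance runs in total.

```shell
#!/bin/sh
# Hypothetical dry-run variant of the btrfs-balance script: prints the
# balance commands in the order the real script would run them, without
# executing anything. BTRFS defaults to /bin/btrfs, as in the original.
BTRFS=${BTRFS:-/bin/btrfs}

plan_balance() {
    fs="$1"
    if [ -z "$fs" ]; then
        echo "usage: plan_balance <mountpoint> <usage%>..." >&2
        return 1
    fi
    shift
    # Data block groups (d) first, then metadata (m); one pass per threshold.
    for kind in d m; do
        for pct in "$@"; do
            echo "$BTRFS balance start $fs -v -${kind}usage=$pct"
        done
    done
}

plan_balance / 0 1 2 5 10 20 30 50 70
```

Since the real script prints `date` before each pass, matching the last timestamp in its output against the time of the crash would be one way to narrow down which pass was running, provided that output ends up somewhere that survives the crash.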

I should add that the filesystem in question sits on MD RAID10, which
in turn is backed by four SSDs, so its I/O is reasonably fast, if that
matters. There should not have been much competing I/O at the time of
the crash.

Before Duncan asks ;-), I have only a moderate number of subvolumes and
snapshots, i.e. one subvolume for each of /, /var/log/journal and /home,
24 snapshots of / and /home plus <10 snapshots of /.

Before that balance there was another balance on another BTRFS RAID10,
but given the timestamps I'm fairly sure it wasn't the cause.

I don't really have any other 'solution' than disabling the rebalancing
for the time being and running it only as needed, as I did before.

Cheers,
