Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Vedran Vucic
Hello,

I managed to delete the illegal snapshot with the command:
btrfs subvolume delete /.snapshots/741/snapshot
I then ran
btrfs balance / -dusage=0 -musage=0
and had no issues while increasing the value up to 40.
But at value 4- for -dusage= and -musage=
I got a message that there is no space left on the disk.
Do you have any advice on how to handle that?

Vedran


On Thu, Nov 12, 2015 at 1:32 PM, Austin S Hemmelgarn
 wrote:
> On 2015-11-11 17:11, Vedran Vucic wrote:
>>
>> Hello,
>>
>> I use openSUSE 13.2 on my Toshiba Satellite laptop. I noticed that I was
>> running out of disk space, checked the documentation, and realized that
>> there were many snapshots.  I used YaST Snapper to delete snapshots.
>> I noticed that one snapshot, number 748, could not be deleted.
>> I opened a terminal and after the command:
>> snapper -c root delete 748
>> I got the message "Illegal snapshot."
>
> This sounds like some sort of issue with snapper, not BTRFS itself, but see
> below for some suggestions.
>>
>> I would like to delete it since it is an old one.
>> Please find details about my system as requested on your wiki page.
>> uname -a
>> Linux linux-jjcc.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23 00:46:04
>> UTC 2015 (6be6a97) i686 i686 i386 GNU/Linux
>>
>> btrfs --version
>> btrfs-progs v4.0+20150429
>>
>> btrfs fi show
>> Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
>>  Total devices 1 FS bytes used 10.98GiB
>>  devid1 size 18.71GiB used 18.71GiB path /dev/sda6
>>
>> btrfs fi df /
>> Data, single: total=15.19GiB, used=10.37GiB
>> System, DUP: total=8.00MiB, used=16.00KiB
>> System, single: total=4.00MiB, used=0.00B
>> Metadata, DUP: total=1.75GiB, used=622.53MiB
>> Metadata, single: total=8.00MiB, used=0.00B
>> GlobalReserve, single: total=208.00MiB, used=0.00B
>> Please find attached dmesg.log as requested.
>>
>> Please advise what I have to do in order to delete the snapshot that is
>> reported to be illegal.
>
> Have you tried running 'btrfs subvolume delete' on the snapshot?  You'll
> have to find the full path to it first of course, but that shouldn't be too
> hard.  Based on the lack of BTRFS error messages in the kernel log you
> posted, I'm pretty certain that this isn't an issue with the filesystem
> itself (although the filesystem doesn't look particularly healthy, see
> further below), so manually deleting the snapshot using the regular BTRFS
> commands should work just fine.  That said, you may also want to look into
> changing the config for snapper, as it has a ridiculously aggressive
> retention policy for snapshots by default, which tends to lead to excessive
> space usage on filesystems smaller than about 250GB.
>
> You may also want to look at running a balance on the filesystem, the
> numbers from btrfs fi show and btrfs fi df look somewhat worrying, you've
> got all the space on the disk allocated as chunks by BTRFS, but have a
> significant amount of empty space in those chunks.  Given that fact, ENOSPC
> issues are a very real possibility, and you'll probably have to run a series
> of partial balances to fix this (and it's important to do it before it
> becomes a visible issue also, because once you start getting ENOSPC errors,
> it is a lot harder to fix).  Try running a balance with '-dusage=0
> -musage=0', then re-run repeatedly increasing the number for both arguments
> by 5 each time until you get to 50.  If a run complains about 'ENOSPC errors
> during balance', re-run it with the same number for -dusage and -musage.  If
> you end up re-running with the same value 3 times and keep getting the
> errors, then you're probably beyond the point of this being fixable, and
> should just recreate the filesystem (you do have backups, right?).
> Otherwise, after finishing the run with '-dusage=50 -musage=50'
> successfully, run a full balance without the dusage and musage options.
>
>
>
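The stepped balance described in the quoted advice above can be sketched as a small shell function. This is only a sketch under stated assumptions: the function name stepped_balance and the MOUNT and MAX_RERUNS variables are made up for illustration, "re-running 3 times" is read as three extra attempts per value, and it has to run as root on a system with btrfs-progs installed.

```shell
#!/bin/sh
# Sketch of the stepped balance from the advice above.
# MOUNT and MAX_RERUNS are illustrative names, not btrfs settings.
MOUNT="${MOUNT:-/}"
MAX_RERUNS="${MAX_RERUNS:-3}"

stepped_balance() {
    pct=0
    while [ "$pct" -le 50 ]; do
        reruns=0
        # Retry the same usage value until it succeeds or the re-run
        # budget for this value is used up.
        until btrfs balance start -dusage="$pct" -musage="$pct" "$MOUNT"; do
            if [ "$reruns" -ge "$MAX_RERUNS" ]; then
                echo "balance keeps failing at usage=$pct, giving up" >&2
                return 1
            fi
            reruns=$((reruns + 1))
        done
        pct=$((pct + 5))
    done
    # Reached only if every partial pass succeeded: finish with a full,
    # unfiltered balance.
    btrfs balance start "$MOUNT"
}

# Example invocation (commented out so sourcing this file has no effect):
# stepped_balance
```

A non-zero return corresponds to the "probably beyond the point of this being fixable" case in the advice, at which point checking backups comes before anything else.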
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Vedran Vucic
Hello,

Here are outputs of commands as you requested:
 btrfs fi df /
Data, single: total=8.00GiB, used=7.71GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.12GiB, used=377.25MiB
GlobalReserve, single: total=128.00MiB, used=0.00B

btrfs fi show
Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
Total devices 1 FS bytes used 8.08GiB
devid1 size 18.71GiB used 10.31GiB path /dev/sda6

btrfs-progs v4.0+20150429

Thanks,

vedran

On Fri, Nov 13, 2015 at 5:30 PM, Austin S Hemmelgarn
 wrote:
> On 2015-11-13 11:12, Vedran Vucic wrote:
>>
>> Hello,
>>
>> I managed to delete the illegal snapshot with the command:
>> btrfs subvolume delete /.snapshots/741/snapshot
>> I then ran
>> btrfs balance / -dusage=0 -musage=0
>> and had no issues while increasing the value up to 40.
>> But at value 4- for -dusage= and -musage=
>> I got a message that there is no space left on the disk.
>> Do you have any advice on how to handle that?
>
>
> Can you post the output of 'btrfs fi df' and 'btrfs fi show' again? Both
> should have changed after the balance, and I'd need to see what it looks
> like now to be able to give any reasonable advice.
>
>


Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Austin S Hemmelgarn

On 2015-11-13 11:12, Vedran Vucic wrote:

> Hello,
>
> I managed to delete the illegal snapshot with the command:
> btrfs subvolume delete /.snapshots/741/snapshot
> I then ran
> btrfs balance / -dusage=0 -musage=0
> and had no issues while increasing the value up to 40.
> But at value 4- for -dusage= and -musage=
> I got a message that there is no space left on the disk.
> Do you have any advice on how to handle that?


Can you post the output of 'btrfs fi df' and 'btrfs fi show' again? 
Both should have changed after the balance, and I'd need to see what it 
looks like now to be able to give any reasonable advice.







Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Austin S Hemmelgarn

On 2015-11-13 12:30, Vedran Vucic wrote:

> Hello,
>
> Here are outputs of commands as you requested:
>   btrfs fi df /
> Data, single: total=8.00GiB, used=7.71GiB
> System, DUP: total=32.00MiB, used=16.00KiB
> Metadata, DUP: total=1.12GiB, used=377.25MiB
> GlobalReserve, single: total=128.00MiB, used=0.00B
>
> btrfs fi show
> Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
>  Total devices 1 FS bytes used 8.08GiB
>  devid1 size 18.71GiB used 10.31GiB path /dev/sda6
>
> btrfs-progs v4.0+20150429

Hmm, that's odd, based on these numbers, you should be having no issue 
at all trying to run a balance. You might be hitting some other bug in 
the kernel, however, but I don't remember if there were any known bugs 
related to ENOSPC or balance in the version you're running.  You might 
see if trying to re-run the balance with '-dusage=40 -musage=40' works 
correctly (I've seen cases where the first run fails, but subsequent 
ones work because the first one made some progress despite failing).
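
The number behind that assessment is the unallocated space: device size minus allocated space, both taken from 'btrfs fi show'. A minimal sketch, assuming the sizes are printed in GiB as in the output above; the helper name unallocated_gib is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs fi show` output on stdin and print, for
# each devid line, the space not yet allocated to any chunk (size - used).
# Assumes both values carry a GiB suffix; other lines are skipped.
unallocated_gib() {
    awk '
        /devid/ {
            size = ""; used = ""
            # Locate the values by their labels rather than by position.
            for (i = 1; i < NF; i++) {
                if ($i == "size") size = $(i + 1)
                if ($i == "used") used = $(i + 1)
            }
            if (size ~ /GiB$/ && used ~ /GiB$/) {
                sub(/GiB$/, "", size)
                sub(/GiB$/, "", used)
                printf "%.2f\n", size - used
            }
        }'
}

# Example invocation (commented out so sourcing this file has no effect):
# btrfs fi show / | unallocated_gib
```

Fed the output quoted above, this prints 8.40: about 8.4GiB of the 18.71GiB device is not allocated to any chunk, which is why a balance would normally be expected to have room to work.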

> On Fri, Nov 13, 2015 at 5:30 PM, Austin S Hemmelgarn
>  wrote:
>> On 2015-11-13 11:12, Vedran Vucic wrote:
>>>
>>> Hello,
>>>
>>> I managed to delete the illegal snapshot with the command:
>>> btrfs subvolume delete /.snapshots/741/snapshot
>>> I then ran
>>> btrfs balance / -dusage=0 -musage=0
>>> and had no issues while increasing the value up to 40.
>>> But at value 4- for -dusage= and -musage=
>>> I got a message that there is no space left on the disk.
>>> Do you have any advice on how to handle that?
>>
>>
>> Can you post the output of 'btrfs fi df' and 'btrfs fi show' again? Both
>> should have changed after the balance, and I'd need to see what it looks
>> like now to be able to give any reasonable advice.









Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Austin S Hemmelgarn

On 2015-11-13 13:42, Hugo Mills wrote:

> On Fri, Nov 13, 2015 at 01:10:12PM -0500, Austin S Hemmelgarn wrote:
>> On 2015-11-13 12:30, Vedran Vucic wrote:
>>> Hello,
>>>
>>> Here are outputs of commands as you requested:
>>>   btrfs fi df /
>>> Data, single: total=8.00GiB, used=7.71GiB
>>> System, DUP: total=32.00MiB, used=16.00KiB
>>> Metadata, DUP: total=1.12GiB, used=377.25MiB
>>> GlobalReserve, single: total=128.00MiB, used=0.00B
>>>
>>> btrfs fi show
>>> Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
>>>  Total devices 1 FS bytes used 8.08GiB
>>>  devid1 size 18.71GiB used 10.31GiB path /dev/sda6
>>>
>>> btrfs-progs v4.0+20150429
>>>
>> Hmm, that's odd, based on these numbers, you should be having no
>> issue at all trying to run a balance. You might be hitting some
>> other bug in the kernel, however, but I don't remember if there were
>> any known bugs related to ENOSPC or balance in the version you're
>> running.
>
> There's one specific bug that shows up with ENOSPC exactly like
> this. It's in all versions of the kernel, there's no known solution,
> and no guaranteed mitigation strategy, I'm afraid. Various things like
> balancing, or adding, balancing, and removing a device again have been
> tried. Sometimes they seem to help; sometimes they just make the
> problem worse.
>
> We average maybe one report a week or so with this particular
> set of symptoms.

We should get this listed on the Wiki on the Gotcha's page ASAP, 
especially considering that it's a pretty significant bug (not quite as 
bad as data corruption, but pretty darn close).


Vedran, could you try running the balance with just '-dusage=40' and 
then again with just '-musage=40'?  If just one of those fails, it could 
help narrow things down significantly.
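
One way to run the two passes separately and record which of them fails is sketched below. The function name split_balance and the MOUNT variable are made up for illustration; it assumes btrfs-progs is installed and that it runs as root.

```shell
#!/bin/sh
# Sketch: run a data-only and then a metadata-only balance at the same
# usage threshold and report each result separately.
MOUNT="${MOUNT:-/}"

split_balance() {
    pct="${1:-40}"
    for filter in d m; do
        # -dusage= limits the pass to data chunks, -musage= to metadata.
        if btrfs balance start "-${filter}usage=${pct}" "$MOUNT"; then
            echo "-${filter}usage=${pct}: ok"
        else
            echo "-${filter}usage=${pct}: failed"
        fi
    done
}

# Example invocation (commented out so sourcing this file has no effect):
# split_balance 40
```

If only one of the two lines reports a failure, that points at the chunk type (data or metadata) that triggers the ENOSPC.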


Hugo, is there anything else known about this issue (I don't recall 
seeing it mentioned before, and a quick web search didn't turn up much)? 
 In particular:
1. Is there any known way to reliably reproduce it (I would assume not, 
as that would likely lead to a mitigation strategy.  If someone does 
find a reliable reproducer, please let me know, I've got some 
significant spare processor time and storage space I could dedicate to 
getting traces and filesystem images for debugging, and already have 
most of the required infrastructure set up for something like this)?
2. Is it contagious (that is, if I send a snapshot from a filesystem 
that is affected by it, does the filesystem that receives the snapshot 
become affected; if we could find a way to reproduce it, I could easily 
answer this question within a couple of minutes of reproducing it)?
3. Do we have any kind of statistics beyond the rate of reports (for 
example, does it happen more often on bigger filesystems, or possibly 
more frequently with certain chunk profiles)?






Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Hugo Mills
On Fri, Nov 13, 2015 at 02:40:44PM -0500, Austin S Hemmelgarn wrote:
> On 2015-11-13 13:42, Hugo Mills wrote:
> >On Fri, Nov 13, 2015 at 01:10:12PM -0500, Austin S Hemmelgarn wrote:
> >>On 2015-11-13 12:30, Vedran Vucic wrote:
> >>>Hello,
> >>>
> >>>Here are outputs of commands as you requested:
> >>>  btrfs fi df /
> >>>Data, single: total=8.00GiB, used=7.71GiB
> >>>System, DUP: total=32.00MiB, used=16.00KiB
> >>>Metadata, DUP: total=1.12GiB, used=377.25MiB
> >>>GlobalReserve, single: total=128.00MiB, used=0.00B
> >>>
> >>>btrfs fi show
> >>>Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
> >>> Total devices 1 FS bytes used 8.08GiB
> >>> devid1 size 18.71GiB used 10.31GiB path /dev/sda6
> >>>
> >>>btrfs-progs v4.0+20150429
> >>>
> >>Hmm, that's odd, based on these numbers, you should be having no
> >>issue at all trying to run a balance. You might be hitting some
> >>other bug in the kernel, however, but I don't remember if there were
> >>any known bugs related to ENOSPC or balance in the version you're
> >>running.
> >
> >There's one specific bug that shows up with ENOSPC exactly like
> >this. It's in all versions of the kernel, there's no known solution,
> >and no guaranteed mitigation strategy, I'm afraid. Various things like
> >balancing, or adding, balancing, and removing a device again have been
> >tried. Sometimes they seem to help; sometimes they just make the
> >problem worse.
> >
> >We average maybe one report a week or so with this particular
> >set of symptoms.
> We should get this listed on the Wiki on the Gotcha's page ASAP,
> especially considering that it's a pretty significant bug (not quite
> as bad as data corruption, but pretty darn close).

   It's certainly mentioned in the FAQ, in the main entry on
unexpected ENOSPC. The text takes you through identifying when there's
the "usual" problem, then goes on to say that if you've hit ENOSPC
with free space still to be unallocated, you've got this issue.
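
That distinction can be written down as a simple check on the 'btrfs fi show' numbers. This is only a sketch: the function name classify_enospc and the 1GiB threshold (roughly one data chunk) are assumptions for illustration, not something taken from the FAQ.

```shell
#!/bin/sh
# Hypothetical helper: classify_enospc SIZE_GIB ALLOCATED_GIB
# SIZE_GIB is the device size and ALLOCATED_GIB the per-device "used"
# value from `btrfs fi show`. The 1 GiB threshold is an assumption.
classify_enospc() {
    awk -v size="$1" -v alloc="$2" 'BEGIN {
        if (size - alloc < 1.0) {
            # Everything is handed out as chunks: the "usual" ENOSPC case.
            print "fully-allocated"
        } else {
            # ENOSPC despite unallocated space: the case discussed here.
            print "unallocated-space-remains"
        }
    }'
}
```

With the numbers from this thread, the first report (18.71 and 18.71) comes out as fully-allocated, and the report after the partial balances (18.71 and 10.31) as unallocated-space-remains.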

> Vedran, could you try running the balance with just '-dusage=40' and
> then again with just '-musage=40'?  If just one of those fails, it
> could help narrow things down significantly.
> 
> Hugo, is there anything else known about this issue (I don't recall
> seeing it mentioned before, and a quick web search didn't turn up
> much)?

   I grumble about it regularly on IRC, where we get many more reports
of it than on the mailing list. There have been a couple on here that
I can recall, but not many.

>  In particular:
> 1. Is there any known way to reliably reproduce it (I would assume
> not, as that would likely lead to a mitigation strategy.  If someone
> does find a reliable reproducer, please let me know, I've got some
> significant spare processor time and storage space I could dedicate
> to getting traces and filesystem images for debugging, and already
> have most of the required infrastructure set up for something like
> this)?

   None that I know of. I can start asking people for btrfs-image
dumps again, if you want to investigate. I did do that for a while, to
pass them to josef, but he said he didn't need any more of them after
a while. (He was always planning on investigating it, but kept getting
diverted by data corruption bugs, which have higher priority).

> 2. Is it contagious (that is, if I send a snapshot from a filesystem
> that is affected by it, does the filesystem that receives the
> snapshot become affected; if we could find a way to reproduce it, I
> could easily answer this question within a couple of minutes of
> reproducing it)?

   No, as far as I know, it doesn't transfer via send/receive.
send/receive is largely equivalent to copying the data by other means
-- receive is implemented almost exclusively in userspace, with only a
couple of ioctls for mucking around with the UUIDs at the end.

> 3. Do we have any kind of statistics beyond the rate of reports (for
> example, does it happen more often on bigger filesystems, or
> possibly more frequently with certain chunk profiles)?

   Not that I've noticed, no. We've had it on small and large,
single-device and many devices, HDD and SSD, converted and not
converted. At one point, a couple of years ago, I did think it was
down to converted filesystems, because we had a run of them, but that
seems not to be the case.

   Hugo.

-- 
Hugo Mills | The glass is neither half-full nor half-empty; it is
hugo@... carfax.org.uk | twice as large as it needs to be.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |  Dr Jon Whitehead




Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Henk Slager
Vedran,

I see 2 snapshot numbers (748 and 741), maybe copy-paste error or
typo, but can you confirm that the illegal one is deleted?

/Henk

On Fri, Nov 13, 2015 at 6:30 PM, Vedran Vucic  wrote:
> Hello,
>
> Here are outputs of commands as you requested:
>  btrfs fi df /
> Data, single: total=8.00GiB, used=7.71GiB
> System, DUP: total=32.00MiB, used=16.00KiB
> Metadata, DUP: total=1.12GiB, used=377.25MiB
> GlobalReserve, single: total=128.00MiB, used=0.00B
>
> btrfs fi show
> Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
> Total devices 1 FS bytes used 8.08GiB
> devid1 size 18.71GiB used 10.31GiB path /dev/sda6
>
> btrfs-progs v4.0+20150429
>
> Thanks,
>
> vedran
>
> On Fri, Nov 13, 2015 at 5:30 PM, Austin S Hemmelgarn
>  wrote:
>> On 2015-11-13 11:12, Vedran Vucic wrote:
>>>
>>> Hello,
>>>
>>> I managed to delete the illegal snapshot with the command:
>>> btrfs subvolume delete /.snapshots/741/snapshot
>>> I then ran
>>> btrfs balance / -dusage=0 -musage=0
>>> and had no issues while increasing the value up to 40.
>>> But at value 4- for -dusage= and -musage=
>>> I got a message that there is no space left on the disk.
>>> Do you have any advice on how to handle that?
>>
>>
>> Can you post the output of 'btrfs fi df' and 'btrfs fi show' again? Both
>> should have changed after the balance, and I'd need to see what it looks
>> like now to be able to give any reasonable advice.
>>
>>


Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Hugo Mills
On Fri, Nov 13, 2015 at 01:10:12PM -0500, Austin S Hemmelgarn wrote:
> On 2015-11-13 12:30, Vedran Vucic wrote:
> >Hello,
> >
> >Here are outputs of commands as you requested:
> >  btrfs fi df /
> >Data, single: total=8.00GiB, used=7.71GiB
> >System, DUP: total=32.00MiB, used=16.00KiB
> >Metadata, DUP: total=1.12GiB, used=377.25MiB
> >GlobalReserve, single: total=128.00MiB, used=0.00B
> >
> >btrfs fi show
> >Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
> > Total devices 1 FS bytes used 8.08GiB
> > devid1 size 18.71GiB used 10.31GiB path /dev/sda6
> >
> >btrfs-progs v4.0+20150429
> >
> Hmm, that's odd, based on these numbers, you should be having no
> issue at all trying to run a balance. You might be hitting some
> other bug in the kernel, however, but I don't remember if there were
> any known bugs related to ENOSPC or balance in the version you're
> running.

   There's one specific bug that shows up with ENOSPC exactly like
this. It's in all versions of the kernel, there's no known solution,
and no guaranteed mitigation strategy, I'm afraid. Various things like
balancing, or adding, balancing, and removing a device again have been
tried. Sometimes they seem to help; sometimes they just make the
problem worse.

   We average maybe one report a week or so(*) with this particular
set of symptoms.

   Hugo.

(*) 

>  You might see if trying to re-run the balance with
> '-dusage=40 -musage=40' works correctly (I've seen cases where the
> first run fails, but subsequent ones work because the first one made
> some progress despite failing).
> >On Fri, Nov 13, 2015 at 5:30 PM, Austin S Hemmelgarn
> > wrote:
> >>On 2015-11-13 11:12, Vedran Vucic wrote:
> >>>
> >>>Hello,
> >>>
> >>>I managed to delete the illegal snapshot with the command:
> >>>btrfs subvolume delete /.snapshots/741/snapshot
> >>>I then ran
> >>>btrfs balance / -dusage=0 -musage=0
> >>>and had no issues while increasing the value up to 40.
> >>>But at value 4- for -dusage= and -musage=
> >>>I got a message that there is no space left on the disk.
> >>>Do you have any advice on how to handle that?
> >>
> >>
> >>Can you post the output of 'btrfs fi df' and 'btrfs fi show' again? Both
> >>should have changed after the balance, and I'd need to see what it looks
> >>like now to be able to give any reasonable advice.
> >>
> >>
> 
> 



-- 
Hugo Mills | You can play with your friends' privates, but you
hugo@... carfax.org.uk | can't play with your friends' childrens' privates.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |   C++ coding rule




Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Vedran Vucic
Hello,

My system is a laptop, not a heavily loaded machine such as a server.
openSUSE 13.2 was installed approximately two months ago, so the issue is
not the result of a long-term lack of administration or maintenance.
Please let me know if I can help in any other way.

Thanks,

vedran

On Fri, Nov 13, 2015 at 9:15 PM, Vedran Vucic  wrote:
> Hello,
>
> I guess that it might be a bug in the kernel.
> This command succeeded:
> btrfs balance start / -dusage=50 -musage=35
>
> Any -musage value above 35 caused an ENOSPC message. Otherwise it went fine.
> Thanks for the support,
>
> vedran
>
> On Fri, Nov 13, 2015 at 7:42 PM, Hugo Mills  wrote:
>> On Fri, Nov 13, 2015 at 01:10:12PM -0500, Austin S Hemmelgarn wrote:
>>> On 2015-11-13 12:30, Vedran Vucic wrote:
>>> >Hello,
>>> >
>>> >Here are outputs of commands as you requested:
>>> >  btrfs fi df /
>>> >Data, single: total=8.00GiB, used=7.71GiB
>>> >System, DUP: total=32.00MiB, used=16.00KiB
>>> >Metadata, DUP: total=1.12GiB, used=377.25MiB
>>> >GlobalReserve, single: total=128.00MiB, used=0.00B
>>> >
>>> >btrfs fi show
>>> >Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
>>> > Total devices 1 FS bytes used 8.08GiB
>>> > devid1 size 18.71GiB used 10.31GiB path /dev/sda6
>>> >
>>> >btrfs-progs v4.0+20150429
>>> >
>>> Hmm, that's odd, based on these numbers, you should be having no
>>> issue at all trying to run a balance. You might be hitting some
>>> other bug in the kernel, however, but I don't remember if there were
>>> any known bugs related to ENOSPC or balance in the version you're
>>> running.
>>
>> There's one specific bug that shows up with ENOSPC exactly like
>> this. It's in all versions of the kernel, there's no known solution,
>> and no guaranteed mitigation strategy, I'm afraid. Various things like
>> balancing, or adding, balancing, and removing a device again have been
>> tried. Sometimes they seem to help; sometimes they just make the
>> problem worse.
>>
>> We average maybe one report a week or so(*) with this particular
>> set of symptoms.
>>
>> Hugo.
>>
>> (*) 
>>
>>>  You might see if trying to re-run the balance with
>>> '-dusage=40 -musage=40' works correctly (I've seen cases where the
>>> first run fails, but subsequent ones work because the first one made
>>> some progress despite failing).
>>> >On Fri, Nov 13, 2015 at 5:30 PM, Austin S Hemmelgarn
>>> > wrote:
>>> >>On 2015-11-13 11:12, Vedran Vucic wrote:
>>> >>>
>>> >>>Hello,
>>> >>>
>>> >>>I managed to delete the illegal snapshot with the command:
>>> >>>btrfs subvolume delete /.snapshots/741/snapshot
>>> >>>I then ran
>>> >>>btrfs balance / -dusage=0 -musage=0
>>> >>>and had no issues while increasing the value up to 40.
>>> >>>But at value 4- for -dusage= and -musage=
>>> >>>I got a message that there is no space left on the disk.
>>> >>>Do you have any advice on how to handle that?
>>> >>
>>> >>
>>> >>Can you post the output of 'btrfs fi df' and 'btrfs fi show' again? Both
>>> >>should have changed after the balance, and I'd need to see what it looks
>>> >>like now to be able to give any reasonable advice.
>>> >>
>>> >>
>>>
>>>
>>
>>
>>
>> --
>> Hugo Mills | You can play with your friends' privates, but you
>> hugo@... carfax.org.uk | can't play with your friends' childrens' privates.
>> http://carfax.org.uk/  |
>> PGP: E2AB1DE4  |   C++ coding rule


Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Austin S Hemmelgarn

On 2015-11-13 14:55, Hugo Mills wrote:

> On Fri, Nov 13, 2015 at 02:40:44PM -0500, Austin S Hemmelgarn wrote:
>> On 2015-11-13 13:42, Hugo Mills wrote:
>>> On Fri, Nov 13, 2015 at 01:10:12PM -0500, Austin S Hemmelgarn wrote:
>>>> On 2015-11-13 12:30, Vedran Vucic wrote:
>>>>> Hello,
>>>>>
>>>>> Here are outputs of commands as you requested:
>>>>>   btrfs fi df /
>>>>> Data, single: total=8.00GiB, used=7.71GiB
>>>>> System, DUP: total=32.00MiB, used=16.00KiB
>>>>> Metadata, DUP: total=1.12GiB, used=377.25MiB
>>>>> GlobalReserve, single: total=128.00MiB, used=0.00B
>>>>>
>>>>> btrfs fi show
>>>>> Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
>>>>>  Total devices 1 FS bytes used 8.08GiB
>>>>>  devid1 size 18.71GiB used 10.31GiB path /dev/sda6
>>>>>
>>>>> btrfs-progs v4.0+20150429
>>>>>
>>>> Hmm, that's odd, based on these numbers, you should be having no
>>>> issue at all trying to run a balance. You might be hitting some
>>>> other bug in the kernel, however, but I don't remember if there were
>>>> any known bugs related to ENOSPC or balance in the version you're
>>>> running.
>>>
>>> There's one specific bug that shows up with ENOSPC exactly like
>>> this. It's in all versions of the kernel, there's no known solution,
>>> and no guaranteed mitigation strategy, I'm afraid. Various things like
>>> balancing, or adding, balancing, and removing a device again have been
>>> tried. Sometimes they seem to help; sometimes they just make the
>>> problem worse.
>>>
>>> We average maybe one report a week or so with this particular
>>> set of symptoms.
>>
>> We should get this listed on the Wiki on the Gotcha's page ASAP,
>> especially considering that it's a pretty significant bug (not quite
>> as bad as data corruption, but pretty darn close).
>
> It's certainly mentioned in the FAQ, in the main entry on
> unexpected ENOSPC. The text takes you through identifying when there's
> the "usual" problem, then goes on to say that if you've hit ENOSPC
> with free space still to be unallocated, you've got this issue.

It should still probably be on the Gotcha's page also, as it definitely 
fits the general description of the stuff there.

>> Vedran, could you try running the balance with just '-dusage=40' and
>> then again with just '-musage=40'?  If just one of those fails, it
>> could help narrow things down significantly.
>>
>> Hugo, is there anything else known about this issue (I don't recall
>> seeing it mentioned before, and a quick web search didn't turn up
>> much)?
>
> I grumble about it regularly on IRC, where we get many more reports
> of it than on the mailing list. There have been a couple on here that
> I can recall, but not many.

Ah, that would explain it, I'm almost never on IRC.



>>  In particular:
>> 1. Is there any known way to reliably reproduce it (I would assume
>> not, as that would likely lead to a mitigation strategy.  If someone
>> does find a reliable reproducer, please let me know, I've got some
>> significant spare processor time and storage space I could dedicate
>> to getting traces and filesystem images for debugging, and already
>> have most of the required infrastructure set up for something like
>> this)?
>
> None that I know of. I can start asking people for btrfs-image
> dumps again, if you want to investigate. I did do that for a while, to
> pass them to josef, but he said he didn't need any more of them after
> a while. (He was always planning on investigating it, but kept getting
> diverted by data corruption bugs, which have higher priority).

I don't have the experience to be able to properly debug it myself from 
images (my expertise has always been finding bugs, not necessarily 
fixing them), but was more offering to try and generate images (if we 
could find some series of commands that reproduces this at least some of 
the time, I have the resources to run a couple of VM's doing that over 
and over again until it hits the bug).  If I could get some, I might be 
able to put some assertions into the kernel so that it panics when 
there's an ENOSPC in the balance code, and get a stack trace, but the 
more I think about it, the more likely it seems that that isn't going to 
be too helpful.



>> 2. Is it contagious (that is, if I send a snapshot from a filesystem
>> that is affected by it, does the filesystem that receives the
>> snapshot become affected; if we could find a way to reproduce it, I
>> could easily answer this question within a couple of minutes of
>> reproducing it)?
>
> No, as far as I know, it doesn't transfer via send/receive.
> send/receive is largely equivalent to copying the data by other means
> -- receive is implemented almost exclusively in userspace, with only a
> couple of ioctls for mucking around with the UUIDs at the end.

I thought that might be the case, but wanted to ask just to be safe (I 
do local backups on some systems using send/receive, largely because 
this means if my regular root filesystem gets corrupted, I can directly 
boot the backups, run a couple of commands, and then have a working 
system again in about 5 or 10 minutes, but if this could spread through 
send/receive, then that makes backups done this way less useful (because 
this is 

Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Duncan
Hugo Mills posted on Fri, 13 Nov 2015 19:55:20 + as excerpted:

> receive is implemented almost exclusively in userspace, with only a
> couple of ioctls for mucking around with the UUIDs at the end.

I wasn't aware of that and had assumed kernel space.  Apart from the 
topic of discussion here, that has implications for the old "how old is 
too old" versioning question regarding userspace, so thanks, Hugo. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Vedran Vucic
Hello,

I guess that it might be a bug in the kernel.
This command succeeded:
btrfs balance start / -dusage=50 -musage=35

Any -musage value above 35 caused an ENOSPC message. Otherwise it went fine.
Thanks for the support,

vedran

On Fri, Nov 13, 2015 at 7:42 PM, Hugo Mills  wrote:
> On Fri, Nov 13, 2015 at 01:10:12PM -0500, Austin S Hemmelgarn wrote:
>> On 2015-11-13 12:30, Vedran Vucic wrote:
>> >Hello,
>> >
>> >Here are outputs of commands as you requested:
>> >  btrfs fi df /
>> >Data, single: total=8.00GiB, used=7.71GiB
>> >System, DUP: total=32.00MiB, used=16.00KiB
>> >Metadata, DUP: total=1.12GiB, used=377.25MiB
>> >GlobalReserve, single: total=128.00MiB, used=0.00B
>> >
>> >btrfs fi show
>> >Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
>> > Total devices 1 FS bytes used 8.08GiB
>> > devid1 size 18.71GiB used 10.31GiB path /dev/sda6
>> >
>> >btrfs-progs v4.0+20150429
>> >
>> Hmm, that's odd, based on these numbers, you should be having no
>> issue at all trying to run a balance. You might be hitting some
>> other bug in the kernel, however, but I don't remember if there were
>> any known bugs related to ENOSPC or balance in the version you're
>> running.
>
> There's one specific bug that shows up with ENOSPC exactly like
> this. It's in all versions of the kernel, there's no known solution,
> and no guaranteed mitigation strategy, I'm afraid. Various things like
> balancing, or adding, balancing, and removing a device again have been
> tried. Sometimes they seem to help; sometimes they just make the
> problem worse.
>
> We average maybe one report a week or so(*) with this particular
> set of symptoms.
>
> Hugo.
>
> (*) 
>
>>  You might see if trying to re-run the balance with
>> '-dusage=40 -musage=40' works correctly (I've seen cases where the
>> first run fails, but subsequent ones work because the first one made
>> some progress despite failing).
>> >On Fri, Nov 13, 2015 at 5:30 PM, Austin S Hemmelgarn
>> > wrote:
>> >>On 2015-11-13 11:12, Vedran Vucic wrote:
>> >>>
>> >>>Hello,
>> >>>
>> >>>I managed to delete the illegal snapshot with the command:
>> >>>btrfs subvolume delete /.snapshots/741/snapshot
>> >>>I then ran
>> >>>btrfs balance / -dusage=0 -musage=0
>> >>>and had no issues while increasing the value up to 40.
>> >>>But at value 4- for -dusage= and -musage=
>> >>>I got a message that there is no space left on the disk.
>> >>>Do you have any advice on how to handle that?
>> >>
>> >>
>> >>Can you post the output of 'btrfs fi df' and 'btrfs fi show' again? Both
>> >>should have changed after the balance, and I'd need to see what it looks
>> >>like now to be able to give any reasonable advice.
>> >>
>> >>
>>
>>
>
>
>
> --
> Hugo Mills | You can play with your friends' privates, but you
> hugo@... carfax.org.uk | can't play with your friends' childrens' privates.
> http://carfax.org.uk/  |
> PGP: E2AB1DE4  |   C++ coding rule
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Hugo Mills
On Fri, Nov 13, 2015 at 09:11:46PM +, Duncan wrote:
> Hugo Mills posted on Fri, 13 Nov 2015 19:55:20 + as excerpted:
> 
> > receive is implemented almost exclusively in userspace, with only a
> > couple of ioctls for mucking around with the UUIDs at the end.
> 
> I wasn't aware of that and had assumed kernel space.  Apart from the 
> topic of discussion here, that has implications for the old "how old is 
> too old" versioning question regarding userspace, so thanks, Hugo. =:^)

   Note that send is still heavily kernel-side. It's only receive
that's all in userspace.

   Hugo.

-- 
Hugo Mills |
hugo@... carfax.org.uk | __(_'>
http://carfax.org.uk/  | Squeak!
PGP: E2AB1DE4  |




Re: illegal snapshot, cannot be deleted

2015-11-13 Thread Duncan
Hugo Mills posted on Fri, 13 Nov 2015 21:13:41 + as excerpted:

> On Fri, Nov 13, 2015 at 09:11:46PM +, Duncan wrote:
>> Hugo Mills posted on Fri, 13 Nov 2015 19:55:20 + as excerpted:
>> 
>> > receive is implemented almost exclusively in userspace, with only a
>> > couple of ioctls for mucking around with the UUIDs at the end.
>> 
>> I wasn't aware of that and had assumed kernel space.  Apart from the
>> topic of discussion here, that has implications for the old "how old is
>> too old" versioning question regarding userspace, so thanks, Hugo. =:^)
> 
> Note that send is still heavily kernel-side. It's only receive
> that's all in userspace.

Being "runtime", send's kernel-side use would be expected.

The more general rule is that runtime operations are kernel-side, so the 
userspace version doesn't have the same importance, unless you're trying 
to use userspace tools such as check, rescue and recover, generally run 
on an unmounted btrfs, to recover from damage. In the general case that's 
where userspace code really goes to work, and thus where its version 
becomes important.

That the receive side of the send/receive feature is an exception to this 
general rule is the news: we now have a runtime tool, normally run on a 
mounted btrfs, where the userspace code is doing the work. The userspace 
version therefore becomes important there too, newer versions having the 
latest fixes.
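
As a small illustration of the point above, here is a sketch that prints
both versions side by side, so it is clear which one applies to a given
operation. The function name report_versions is made up for this example;
'uname -r' and 'btrfs --version' are the same commands used elsewhere in
this thread.

```shell
#!/bin/sh
# Print the kernel version (relevant for send and most runtime operations)
# and the btrfs-progs version (relevant for receive, check, rescue, ...).
# report_versions is a hypothetical helper name, not part of btrfs-progs.
report_versions() {
    echo "kernel: $(uname -r)"
    if command -v btrfs >/dev/null 2>&1; then
        echo "userspace: $(btrfs --version)"
    else
        echo "userspace: btrfs-progs not found in PATH"
    fi
}
```

Calling report_versions on the receiving machine shows the userspace
version that actually does the work for receive.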

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: illegal snapshot, cannot be deleted

2015-11-12 Thread Austin S Hemmelgarn

On 2015-11-11 17:11, Vedran Vucic wrote:
> Hello,
>
> I use OpenSuse 13.2 on my Toshiba Satellite laptop. I noticed that I run
> out of disk space, checked documentation and I realized that there were
> many snapshots.  I used Yast Snapper to delete snapshots.
> I noticed that one snapshot  with number 748 could not be deleted.
> I entered terminal and after the command:
> snapper -c root delete 748
> I got message Illegal snapshot.

This sounds like some sort of issue with snapper, not BTRFS itself, but 
see below for some suggestions.

> I would like to delete it since it is old one.
> Please find details about my system as requested on your wiki page.
> uname -a
> Linux linux-jjcc.site 3.16.7-29-desktop #1 SMP PREEMPT Fri Oct 23 00:46:04
> UTC 2015 (6be6a97) i686 i686 i386 GNU/Linux
>
> btrfs --version
> btrfs-progs v4.0+20150429
>
> btrfs fi show
> Label: none  uuid: d6934db3-3ac9-49d0-83db-287be7b995a5
>  Total devices 1 FS bytes used 10.98GiB
>  devid    1 size 18.71GiB used 18.71GiB path /dev/sda6
>
> btrfs fi df /
> Data, single: total=15.19GiB, used=10.37GiB
> System, DUP: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=1.75GiB, used=622.53MiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=208.00MiB, used=0.00B
> Please find attached dmesg.log as requested.
>
> Please advise what have to do in order to delete snapshot that is reported
> to be illegal.

Have you tried running 'btrfs subvolume delete' on the snapshot?  You'll 
have to find the full path to it first of course, but that shouldn't be 
too hard.  Based on the lack of BTRFS error messages in the kernel log 
you posted, I'm pretty certain that this isn't an issue with the 
filesystem itself (although the filesystem doesn't look particularly 
healthy, see further below), so manually deleting the snapshot using the 
regular BTRFS commands should work just fine.  That said, you may also 
want to look into changing the config for snapper, as it has a 
ridiculously aggressive retention policy for snapshots by default, which 
tends to lead to excessive space usage on filesystems smaller than about 
250GB.
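
A rough sketch of the manual route described above, for illustration only.
The helper names and the BTRFS_CMD variable are made up for this example.
The /.snapshots/<number>/snapshot layout is the one reported later in this
thread, and it is an assumption that snapshot 748 follows the same layout.
'btrfs subvolume list -s' restricts the listing to snapshots.

```shell
#!/bin/sh
# Hypothetical helper names; only the btrfs subcommands themselves are real.
BTRFS_CMD="${BTRFS_CMD:-btrfs}"   # set BTRFS_CMD="echo btrfs" for a dry run

# Build the path snapper uses for snapshot number $1 under /.snapshots
# (layout assumed from the path reported in this thread).
snapper_snapshot_path() {
    printf '/.snapshots/%s/snapshot\n' "$1"
}

# List snapshot subvolumes on the filesystem mounted at $1 (default /).
list_snapshots() {
    $BTRFS_CMD subvolume list -s "${1:-/}"
}

# Delete snapshot number $1 directly with btrfs, bypassing snapper.
delete_snapshot() {
    $BTRFS_CMD subvolume delete "$(snapper_snapshot_path "$1")"
}
```

Run list_snapshots first to confirm the path exists, then
'delete_snapshot 748' as root. With BTRFS_CMD="echo btrfs" the functions
only print the commands they would run.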


You may also want to look at running a balance on the filesystem.  The 
numbers from btrfs fi show and btrfs fi df look somewhat worrying: 
you've got all the space on the disk allocated as chunks by BTRFS 
(18.71GiB used of 18.71GiB), but have a significant amount of empty 
space in those chunks (data chunks total 15.19GiB with only 10.37GiB 
used).  Given that fact, ENOSPC issues are a very real possibility, and 
you'll probably have to run a series of partial balances to fix this.  
It's important to do it before it becomes a visible issue, because once 
you start getting ENOSPC errors, it is a lot harder to fix.

Try running a balance with '-dusage=0 -musage=0', then re-run 
repeatedly, increasing the number for both arguments by 5 each time 
until you get to 50.  If a run complains about 'ENOSPC errors during 
balance', re-run it with the same number for -dusage and -musage.  If 
you end up re-running with the same value 3 times and keep getting the 
errors, then you're probably beyond the point of this being fixable, and 
should just recreate the filesystem (you do have backups, right?).  
Otherwise, after finishing the run with '-dusage=50 -musage=50' 
successfully, run a full balance without the dusage and musage options.
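
The stepped procedure above could be scripted roughly as follows.  This is 
only a sketch: stepped_balance, BTRFS_CMD, MOUNTPOINT and MAX_RERUNS are 
names invented for this example, the mount point / is taken from this 
thread, and treating any non-zero exit status as an ENOSPC failure is a 
simplifying assumption.  The filters are passed to 'btrfs balance start'.

```shell
#!/bin/sh
# Sketch of the stepped balance. All names below are hypothetical.
BTRFS_CMD="${BTRFS_CMD:-btrfs}"   # set BTRFS_CMD="echo btrfs" for a dry run
MOUNTPOINT="${MOUNTPOINT:-/}"     # filesystem to balance
MAX_RERUNS=3                      # re-runs allowed at the same usage value

stepped_balance() {
    usage=0
    while [ "$usage" -le 50 ]; do
        reruns=0
        # Any failure is treated like ENOSPC: re-run with the same value.
        until $BTRFS_CMD balance start "-dusage=$usage" "-musage=$usage" "$MOUNTPOINT"; do
            if [ "$reruns" -ge "$MAX_RERUNS" ]; then
                echo "usage=$usage still failing after $MAX_RERUNS re-runs, giving up" >&2
                return 1
            fi
            reruns=$((reruns + 1))
        done
        usage=$((usage + 5))
    done
    # All partial balances succeeded: finish with a full balance.
    $BTRFS_CMD balance start "$MOUNTPOINT"
}
```

Run it as root by calling stepped_balance.  A dry run with 
BTRFS_CMD="echo btrfs" prints the sequence of commands without touching 
the filesystem, which is worth doing once before the real run.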





