Re: btrfs problems

2018-09-22 Thread Duncan
Adrian Bastholm posted on Thu, 20 Sep 2018 23:35:57 +0200 as excerpted:

> Thanks a lot for the detailed explanation.
> Aabout "stable hardware/no lying hardware". I'm not running any raid
> hardware, was planning on just software raid. three drives glued
> together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would
> this be a safer bet, or would You recommend running the sausage method
> instead, with "-d single" for safety ? I'm guessing that if one of the
> drives dies the data is completely lost Another variant I was
> considering is running a raid1 mirror on two of the drives and maybe a
> subvolume on the third, for less important stuff

Agreed with CMurphy's reply, but he didn't mention...

As I wrote elsewhere recently (I don't remember if it was in a reply to 
you before you tried zfs and came back, or to someone else), I'll repeat 
it here, more briefly this time...

Keep in mind that on btrfs, it's possible (and indeed the default with 
multiple devices) to run data and metadata at different raid levels.

IMO, as long as you're following an appropriate backup policy that backs 
up anything valuable enough to be worth the time/trouble/resources of 
doing so, such that if you /do/ lose the array you still have a backup of 
anything you considered valuable enough to worry about (and that caveat 
always applies, no matter where or how it's stored: the value of data is 
in practice defined not by arbitrary claims but by the number of backups 
it's considered worth having)...

With that backups caveat, I'm now confident /enough/ about raid56 mode to 
be comfortable cautiously recommending it for data, tho I'd still /not/ 
recommend it for metadata, which should remain at the multi-device 
default of raid1.

That way, you're only risking a limited amount of raid5 data to the not-
yet-as-mature raid56 mode, while the metadata remains protected by the 
more mature raid1 mode. If something does go wrong, it's much more likely 
to be only a few files lost instead of the entire filesystem, which is 
what's at risk if your metadata is raid56 as well. The metadata, 
including checksums, will be intact, so scrub should tell you which files 
are bad, and if those few files are valuable they'll be on the backup and 
easy enough to restore, compared to restoring the entire filesystem. And 
for most use-cases, metadata should be relatively small compared to data, 
so duplicating metadata as raid1 while doing raid5 for data should go 
much easier on capacity than raid1 for both would.
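
To illustrate, a minimal sketch of that split, using the same three 
drives you mentioned (device names and mount point are examples only):

  # new filesystem: raid5 data, raid1 metadata
  mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd

  # or convert the profiles of an existing multi-device filesystem
  btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt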

Tho I'd still recommend raid1 for data as well, if it's not cost-
prohibitive for the amount of data you need to store, for its higher 
maturity and tested ability to use the good copy to rewrite the bad one 
if one copy goes bad (in theory, raid56 mode can rebuild from parity as 
well, but that's not yet as well tested, and there's still the narrow 
degraded-mode crash write hole to worry about). But for people on a 
really tight budget, or who are storing double-digit TB of data or more, 
I can understand why they prefer raid5, and I do think raid5 is stable 
enough for data now, as long as the metadata remains raid1 AND they're 
actually executing on a good backup policy.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs problems

2018-09-20 Thread Remi Gauvin
On 2018-09-20 05:35 PM, Adrian Bastholm wrote:
> Thanks a lot for the detailed explanation.
> About "stable hardware/no lying hardware": I'm not running any raid
> hardware, I was planning on just software raid, three drives glued
> together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would
> this be a safer bet, or would you recommend running the sausage method
> instead, with "-d single" for safety? I'm guessing that if one of the
> drives dies the data is completely lost.
> Another variant I was considering is running a raid1 mirror on two of
> the drives and maybe a subvolume on the third, for less important
> stuff.

In case you were not aware, it's perfectly acceptable with BTRFS to use
RAID 1 over 3 devices.  Even more amazing, regardless of how many
devices you start with (2, 3, 4, whatever), you can add a single drive to
the array to increase capacity (at 50%, of course; i.e., adding a 4TB
drive will give you 2TB of usable space, assuming the other drives add up
to at least 4TB to match it).
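
A minimal sketch of that kind of setup (device names and mount point are
examples only):

  # raid1 for both data and metadata across three devices
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd

  # later: grow the array with a fourth drive, then rebalance
  mount /dev/sdb /mnt
  btrfs device add /dev/sde /mnt
  btrfs balance start /mnt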



Re: btrfs problems

2018-09-20 Thread Chris Murphy
On Thu, Sep 20, 2018 at 3:36 PM Adrian Bastholm  wrote:
>
> Thanks a lot for the detailed explanation.
> About "stable hardware/no lying hardware": I'm not running any raid
> hardware, I was planning on just software raid.

Yep. I'm referring to the drives, their firmware, cables, logic board,
its firmware, the power supply, power, etc. Btrfs is by nature
intolerant of corruption. Other file systems are more tolerant because
they don't know about it (although recent versions of XFS and ext4 are
now defaulting to checksummed metadata and journals).


> three drives glued
> together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would
> this be a safer bet, or would you recommend running the sausage method
> instead, with "-d single" for safety? I'm guessing that if one of the
> drives dies the data is completely lost.
> Another variant I was considering is running a raid1 mirror on two of
> the drives and maybe a subvolume on the third, for less important
> stuff

RAID does not substantially reduce the chances of data loss. It's not
anything like a backup. It's an uptime enhancer. If you have backups,
and your primary storage dies, of course you can restore from backup
no problem, but it takes time and while the restore is happening,
you're not online - uptime is killed. If that's a negative, you might
want to run RAID so you can keep working during the degraded period, and
instead of a restore you're doing a rebuild. But of course there is a
chance of failure during the degraded period. So you have to have a
backup anyway. At least with Btrfs/ZFS, there is another reason to run
with some replication like raid1 or raid5 and that's so that if
there's corruption or a bad sector, Btrfs doesn't just detect it, it
can fix it up with the good copy.

For what it's worth, make sure the drives have lower SCT ERC time than
the SCSI command timer. This is the same for Btrfs as it is for md and
LVM RAID. The command timer default is 30 seconds, and most drives
have SCT ERC disabled with very high recovery times well over 30
seconds. So either set SCT ERC to something like 70 deciseconds, or
increase the command timer to something like 120 or 180 seconds (either
one is absurdly high, but what you want is for the drive to eventually
give up and report a discrete error, which Btrfs can do something about,
rather than trigger a SATA link reset, which Btrfs can't do anything
about).
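
For illustration, one way to do either of those, assuming a hypothetical
/dev/sda (SCT ERC settings usually don't survive a power cycle, so
they're typically reapplied from a udev rule or boot script):

  # check, then set the drive's error recovery timeout to 7 seconds
  smartctl -l scterc /dev/sda
  smartctl -l scterc,70,70 /dev/sda

  # or raise the kernel's command timer for that drive to 180 seconds
  echo 180 > /sys/block/sda/device/timeout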




-- 
Chris Murphy


Re: btrfs problems

2018-09-20 Thread Adrian Bastholm
Thanks a lot for the detailed explanation.
About "stable hardware/no lying hardware": I'm not running any raid
hardware, I was planning on just software raid, three drives glued
together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would
this be a safer bet, or would you recommend running the sausage method
instead, with "-d single" for safety? I'm guessing that if one of the
drives dies the data is completely lost.
Another variant I was considering is running a raid1 mirror on two of
the drives and maybe a subvolume on the third, for less important
stuff.

BR Adrian
On Thu, Sep 20, 2018 at 9:39 PM Chris Murphy  wrote:
>
> On Thu, Sep 20, 2018 at 11:23 AM, Adrian Bastholm  wrote:
> > On Mon, Sep 17, 2018 at 2:44 PM Qu Wenruo  wrote:
> >
> >>
> >> Then I strongly recommend to use the latest upstream kernel and progs
> >> for btrfs. (thus using Debian Testing)
> >>
> >> And if anything went wrong, please report asap to the mail list.
> >>
> >> Especially for fs corruption, that's the ghost I'm always chasing for.
> >> So if any corruption happens again (although I hope it won't happen), I
> >> may have a chance to catch it.
> >
> > You got it
> >> >
> >> >> Anyway, enjoy your stable fs even it's not btrfs
> >
> >> > My new stable fs is too rigid. Can't grow it, can't shrink it, can't
> >> > remove vdevs from it , so I'm planning a comeback to BTRFS. I guess
> >> > after the dust settled I realize I like the flexibility of BTRFS.
> >> >
> > I'm back to btrfs.
> >
> >> From the code aspect, the biggest difference is the chunk layout.
> >> Due to the ext* block group usage, each block group header (except some
> >> sparse bg) is always used, thus btrfs can't use them.
> >>
> >> This leads to highly fragmented chunk layout.
> >
> > The only thing I really understood is "highly fragmented" == not good
> > . I might need to google these "chunk" thingies
>
> Chunks are synonymous with block groups. They're like a super extent, or
> extent of extents.
>
> The block group is how Btrfs maps the logical addresses used most
> everywhere in Btrfs land to the device + physical location of extents.
> It's how a file is referenced by only one logical address, without
> needing to know either where the extent is located or how many copies
> there are. The block group allocation profile is what determines whether
> there's one copy, duplicate copies, or raid1, 10, 5, 6 copies of a chunk
> and where the copies are located. It's also fundamental to how device
> add, remove, replace, file system resize, and balance all interrelate.
>
>
> >> If your primary concern is to make the fs as stable as possible, then
> >> keep snapshots to a minimal amount, avoid any functionality you won't
> >> use, like qgroup, routinely balance, RAID5/6.
> >
> > So, is RAID5 stable enough ? reading the wiki there's a big fat
> > warning about some parity issues, I read an article about silent
> > corruption (written a while back), and chris says he can't recommend
> > raid56 to mere mortals.
>
> Depends on how you define stable. In recent kernels it's stable on
> stable hardware, i.e. no lying hardware (actually flushes when it
> claims it has), no power failures, and no failed devices. Of course
> it's designed to help protect against a clear loss of a device, but
> there's tons of stuff here that's just not finished including ejecting
> bad devices from the array like md and lvm raids will do. Btrfs will
> just keep trying, through all the failures. There are some patches to
> moderate this but I don't think they're merged yet.
>
> You'd also want to be really familiar with how to handle degraded
> operation, if you're going to depend on it, and how to replace a bad
> device. Last I refreshed my memory on it, it's advised to use "btrfs
> device add" followed by "btrfs device remove" for raid56; whereas
> "btrfs replace" is preferred for all other profiles. I'm not sure if
> the "btrfs replace" issues with parity raid were fixed.
>
> Metadata as raid56 shows a lot more problem reports than metadata
> raid1, so there's something goofy going on in those cases. I'm not
> sure how well understood they are. But other people don't have
> problems with it.
>
> It's worth looking through the archives about some things. Btrfs
> raid56 isn't exactly perfectly COW, there is read-modify-write code
> that means there can be overwrites. I vaguely recall that it's COW in
> the logical layer, but the physical writes can end up being RMW or not
> for sure COW.
>
>
>
> --
> Chris Murphy



-- 
Vänliga hälsningar / Kind regards,
Adrian Bastholm

``I would change the world, but they won't give me the sourcecode``


Re: btrfs problems

2018-09-20 Thread Chris Murphy
On Thu, Sep 20, 2018 at 11:23 AM, Adrian Bastholm  wrote:
> On Mon, Sep 17, 2018 at 2:44 PM Qu Wenruo  wrote:
>
>>
>> Then I strongly recommend to use the latest upstream kernel and progs
>> for btrfs. (thus using Debian Testing)
>>
>> And if anything went wrong, please report asap to the mail list.
>>
>> Especially for fs corruption, that's the ghost I'm always chasing for.
>> So if any corruption happens again (although I hope it won't happen), I
>> may have a chance to catch it.
>
> You got it
>> >
>> >> Anyway, enjoy your stable fs even it's not btrfs
>
>> > My new stable fs is too rigid. Can't grow it, can't shrink it, can't
>> > remove vdevs from it , so I'm planning a comeback to BTRFS. I guess
>> > after the dust settled I realize I like the flexibility of BTRFS.
>> >
> I'm back to btrfs.
>
>> From the code aspect, the biggest difference is the chunk layout.
>> Due to the ext* block group usage, each block group header (except some
>> sparse bg) is always used, thus btrfs can't use them.
>>
>> This leads to highly fragmented chunk layout.
>
> The only thing I really understood is "highly fragmented" == not good.
> I might need to google these "chunk" thingies.

Chunks are synonymous with block groups. They're like a super extent, or
extent of extents.

The block group is how Btrfs maps the logical addresses used most
everywhere in Btrfs land to the device + physical location of extents.
It's how a file is referenced by only one logical address, without
needing to know either where the extent is located or how many copies
there are. The block group allocation profile is what determines whether
there's one copy, duplicate copies, or raid1, 10, 5, 6 copies of a chunk
and where the copies are located. It's also fundamental to how device
add, remove, replace, file system resize, and balance all interrelate.
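
For what it's worth, a quick way to see those per-block-group profiles on
a mounted filesystem (the mount point is just an example):

  btrfs filesystem usage /mnt
  # or the older, shorter summary
  btrfs filesystem df /mnt

Both report the allocation of each block group type (Data, Metadata,
System) along with the profile (single, DUP, RAID1, RAID5, ...) in use.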


>> If your primary concern is to make the fs as stable as possible, then
>> keep snapshots to a minimal amount, avoid any functionality you won't
>> use, like qgroup, routinely balance, RAID5/6.
>
> So, is RAID5 stable enough? Reading the wiki, there's a big fat
> warning about some parity issues; I read an article about silent
> corruption (written a while back), and Chris says he can't recommend
> raid56 to mere mortals.

Depends on how you define stable. In recent kernels it's stable on
stable hardware, i.e. no lying hardware (actually flushes when it
claims it has), no power failures, and no failed devices. Of course
it's designed to help protect against a clear loss of a device, but
there's tons of stuff here that's just not finished including ejecting
bad devices from the array like md and lvm raids will do. Btrfs will
just keep trying, through all the failures. There are some patches to
moderate this but I don't think they're merged yet.

You'd also want to be really familiar with how to handle degraded
operation, if you're going to depend on it, and how to replace a bad
device. Last I refreshed my memory on it, it's advised to use "btrfs
device add" followed by "btrfs device remove" for raid56; whereas
"btrfs replace" is preferred for all other profiles. I'm not sure if
the "btrfs replace" issues with parity raid were fixed.

Metadata as raid56 shows a lot more problem reports than metadata
raid1, so there's something goofy going on in those cases. I'm not
sure how well understood they are. But other people don't have
problems with it.

It's worth looking through the archives about some things. Btrfs
raid56 isn't exactly perfectly COW; there is read-modify-write code,
which means there can be overwrites. I vaguely recall that it's COW in
the logical layer, but the physical writes can end up being RMW rather
than strictly COW.



-- 
Chris Murphy


Re: btrfs problems

2018-09-20 Thread Adrian Bastholm
On Mon, Sep 17, 2018 at 2:44 PM Qu Wenruo  wrote:

>
> Then I strongly recommend to use the latest upstream kernel and progs
> for btrfs. (thus using Debian Testing)
>
> And if anything went wrong, please report asap to the mail list.
>
> Especially for fs corruption, that's the ghost I'm always chasing for.
> So if any corruption happens again (although I hope it won't happen), I
> may have a chance to catch it.

You got it
> >
> >> Anyway, enjoy your stable fs even it's not btrfs

> > My new stable fs is too rigid. Can't grow it, can't shrink it, can't
> > remove vdevs from it , so I'm planning a comeback to BTRFS. I guess
> > after the dust settled I realize I like the flexibility of BTRFS.
> >
I'm back to btrfs.

> From the code aspect, the biggest difference is the chunk layout.
> Due to the ext* block group usage, each block group header (except some
> sparse bg) is always used, thus btrfs can't use them.
>
> This leads to highly fragmented chunk layout.

The only thing I really understood is "highly fragmented" == not good.
I might need to google these "chunk" thingies.

> We doesn't have error report about such layout yet, but if you want
> everything to be as stable as possible, I still recommend to use a newly
> created fs.

I guess I'll stick with ext4 on the rootfs

> > Another thing is I'd like to see a "first steps after getting started
> > " section in the wiki. Something like take your first snapshot, back
> > up, how to think when running it - can i just set some cron jobs and
> > forget about it, or does it need constant attention, and stuff like
> > that.
>
> There are projects do such things automatically, like snapper.
>
> If your primary concern is to make the fs as stable as possible, then
> keep snapshots to a minimal amount, avoid any functionality you won't
> use, like qgroup, routinely balance, RAID5/6.

So, is RAID5 stable enough? Reading the wiki, there's a big fat
warning about some parity issues; I read an article about silent
corruption (written a while back), and Chris says he can't recommend
raid56 to mere mortals.

> And keep the necessary btrfs specific operations to minimal, like
> subvolume/snapshot (and don't keep too many snapshots, say over 20),
> shrink, send/receive.
>
> Thanks,
> Qu
>
> >
> > BR Adrian
> >
> >
>


-- 
Vänliga hälsningar / Kind regards,
Adrian Bastholm

``I would change the world, but they won't give me the sourcecode``


Re: btrfs problems

2018-09-17 Thread Stefan K
> If your primary concern is to make the fs as stable as possible, then
> keep snapshots to a minimum and avoid any functionality you won't
> use, like qgroups, routine balance, and RAID5/6.
> 
> And keep the btrfs-specific operations you do need to a minimum, like
> subvolume/snapshot (don't keep too many snapshots, say over 20),
> shrink, and send/receive.

hehe, that sounds like "hey, use btrfs, it's cool, but please - don't use 
any btrfs-specific features" ;)

best
Stefan


Re: btrfs problems

2018-09-17 Thread Qu Wenruo


On 2018/9/17 7:55 PM, Adrian Bastholm wrote:
>> Well, I'd say Debian is really not your first choice for btrfs.
>> The kernel is really old for btrfs.
>>
>> My personal recommendation is to use a rolling release distribution like
>> vanilla Archlinux, whose kernel is already at 4.18.7 now.
> 
> I just upgraded to Debian Testing which has the 4.18 kernel

Then I strongly recommend using the latest upstream kernel and progs
for btrfs (thus using Debian Testing).

And if anything goes wrong, please report it asap to the mailing list.

Especially for fs corruption, that's the ghost I'm always chasing.
So if any corruption happens again (although I hope it won't), I
may have a chance to catch it.

> 
>> Anyway, enjoy your stable fs even if it's not btrfs anymore.
> 
> My new stable fs is too rigid. Can't grow it, can't shrink it, can't
> remove vdevs from it , so I'm planning a comeback to BTRFS. I guess
> after the dust settled I realize I like the flexibility of BTRFS.
> 
> 
>  This time I'm considering BTRFS as rootfs as well, can I do an
> in-place conversion ? There's this guide
> (https://www.howtoforge.com/how-to-convert-an-ext3-ext4-root-file-system-to-btrfs-on-ubuntu-12.10)
> I was planning on following.

Btrfs-convert is recommended mostly for a short-term trial (it gives you
the ability to roll back to ext* without anything modified).

From the code aspect, the biggest difference is the chunk layout.
Due to the way ext* uses its block groups, each block group header
(except in some sparse block groups) is always in use, thus btrfs can't
use that space.

This leads to a highly fragmented chunk layout.
We don't have error reports about such a layout yet, but if you want
everything to be as stable as possible, I still recommend using a newly
created fs.
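
For reference, a minimal sketch of the convert/rollback cycle (the
partition name is just an example; the filesystem must be unmounted, and
rollback only works while the ext2_saved subvolume is still intact):

  e2fsck -f /dev/sdb1
  btrfs-convert /dev/sdb1

  # to go back to ext* if you change your mind
  btrfs-convert -r /dev/sdb1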

> 
> Another thing is I'd like to see a "first steps after getting started
> " section in the wiki. Something like take your first snapshot, back
> up, how to think when running it - can i just set some cron jobs and
> forget about it, or does it need constant attention, and stuff like
> that.

There are projects that do such things automatically, like snapper.

If your primary concern is to make the fs as stable as possible, then
keep snapshots to a minimum and avoid any functionality you won't
use, like qgroups, routine balance, and RAID5/6.

And keep the btrfs-specific operations you do need to a minimum, like
subvolume/snapshot (don't keep too many snapshots, say over 20),
shrink, and send/receive.
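
As one possible minimal routine (the config name and paths are only
examples, and snapper itself is optional):

  # let snapper manage a small, rotating set of snapshots
  snapper -c data create-config /mnt/data
  snapper -c data create -d "before upgrade"

  # plus a periodic scrub, e.g. monthly from cron, to catch corruption early
  btrfs scrub start -B /mnt/data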

Thanks,
Qu

> 
> BR Adrian
> 
> 





Re: btrfs problems

2018-09-17 Thread Adrian Bastholm
> Well, I'd say Debian is really not your first choice for btrfs.
> The kernel is really old for btrfs.
>
> My personal recommendation is to use a rolling release distribution like
> vanilla Archlinux, whose kernel is already at 4.18.7 now.

I just upgraded to Debian Testing which has the 4.18 kernel

> Anyway, enjoy your stable fs even if it's not btrfs anymore.

My new stable fs is too rigid. Can't grow it, can't shrink it, can't
remove vdevs from it, so I'm planning a comeback to BTRFS. I guess
after the dust settled I realize I like the flexibility of BTRFS.


This time I'm considering BTRFS as rootfs as well. Can I do an
in-place conversion? There's this guide
(https://www.howtoforge.com/how-to-convert-an-ext3-ext4-root-file-system-to-btrfs-on-ubuntu-12.10)
that I was planning on following.

Another thing: I'd like to see a "first steps after getting started"
section in the wiki. Something like take your first snapshot, back
up, how to think when running it - can I just set some cron jobs and
forget about it, or does it need constant attention, and stuff like
that.

BR Adrian


-- 
Vänliga hälsningar / Kind regards,
Adrian Bastholm

``I would change the world, but they won't give me the sourcecode``


Re: btrfs problems

2018-09-16 Thread Chris Murphy
On Sun, Sep 16, 2018 at 2:11 PM, Adrian Bastholm  wrote:
> Thanks for answering Qu.
>
>> At this timing, your fs is already corrupted.
>> I'm not sure about the reason, it can be a failed CoW combined with
>> powerloss, or corrupted free space cache, or some old kernel bugs.
>>
>> Anyway, the metadata itself is already corrupted, and I believe it
>> happens even before you noticed.
>  I suspected it had to be like that
>>
>> > BTRFS check --repair is not recommended, it
>> > crashes , doesn't fix all problems, and I later found out that my
>> > lost+found dir had about 39G of lost files and dirs.
>>
>> lost+found is completely created by btrfs check --repair.
>>
>> > I spent about two days trying to fix everything, removing a disk,
>> > adding it again, checking , you name it. I ended up removing one disk,
>> > reformatting it, and moving the data there.
>>
>> Well, I would recommend to submit such problem to the mail list *BEFORE*
>> doing any write operation to the fs (including btrfs check --repair).
>> As it would help us to analyse the failure pattern to further enhance btrfs.
>
> IMHO that's a, how should I put it, a design flaw, the wrong way of
> looking at how people think, with all respect to all the very smart
> people that put in countless hours of hard work. Users expect an fs
> check and repair to repair, not to break stuff.
> Reading that --repair is "destructive" is contradictory even to me.

It's contradictory to everyone including the developers. No developer
set out to make --repair dangerous from the outset. It just turns out
that it was a harder problem to solve and the thought was that it
would keep getting better.

Newer versions are "should be safe" now, even if they can't fix
everything. The far bigger issue I think the developers are aware of
is that depending on repair at all for any Btrfs of appreciable size
is simply not scalable. Taking a day or a week to run a repair on a
large file system is unworkable. And that's why it's better to avoid
inconsistencies in the first place, which is what Btrfs is supposed to
do; if that's not happening, it's a bug somewhere in Btrfs and also
sometimes in the hardware.


> This problem emerged in a direcory where motion (the camera software)
> was saving pictures. Either killing the process or a powerloss could
> have left these jpg files (or fs metadata) in a bad state. Maybe
> that's something to go on. I was thinking that there's not much anyone
> can do without root access to my box anyway, and I'm not sure I was
> prepared to give that to anyone.

I can't recommend raid56 for people new to Btrfs. It really takes
qualified hardware to make sure there's no betrayal, as everything
gets a lot more complicated with raid56. The general state of faulty
device handling on Btrfs, makes raid56 very much a hands on approach
you can't turn your back on it. And then when jumping into raid5, I
advise raid1 for metadata. It reduces problems. And that's true for
raid6 also, except that raid1 metadata is less redundancy than raid6
itself, so...it's not helpful if you end up losing 2 devices.

If you need production grade parity raid you should use openzfs,
although I can't speak to how it behaves with respect to faulty
devices on Linux.




>> Any btrfs unexpected behavior, from strange ls output to aborted
>> transaction, please consult with the mail list first.
>> (Of course, with kernel version and btrfs-progs version, which is
>> missing in your console log though)
>
> Linux jenna 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21)
> x86_64 GNU/Linux
> btrfs-progs is already the newest version (4.7.3-1).

Well, the newest versions are kernel 4.18.8 and btrfs-progs 4.17.1, so
in Btrfs terms the ones you're running are kinda old.

That is not inherently bad, but there are literally thousands of
additions and deletions since kernel 4.9 so there's almost no way
anyone on this list, except a developer familiar with backport status,
can tell you if the problem you're seeing is a bug that's been fixed
in that particular version. There aren't that many developers that
familiar with that status who also have time to read user reports.
Since this is an upstream list, most developers will want to know if
you're able to reproduce the problem with a mainline kernel, because
if you can it's very probable it's a bug that needs to be fixed
upstream first before it can be backported. That's just the nature of
kernel development generally. And you'll find the same thing on ext4
and XFS lists...

The main reason why people use Debian and its older kernel bases is
they're willing to accept certain bugginess in favor of stability.
Transient bugs are really bad in that world. Consistent bugs they just
find work arounds for (avoidance) until there's a known highly tested
backport, because they want "The Behavior" to be predictable, both
good and bad. That is not a model well suited to a file system that's
in as active a development state as Btrfs is. It's better now than it was
even a 

Fwd: btrfs problems

2018-09-16 Thread Adrian Bastholm
...

And also raid56 is still considered experimental, and has various
problems if the hardware lies (like if some writes happen out of order
or faster on some devices than others, and it's much harder to repair
because the repair tools aren't raid56 feature complete).

https://btrfs.wiki.kernel.org/index.php/Status

I think it's less scary than "dangerous" or "unstable" but anyway,
there are known problems unique to raid56 that will need future
features to make it as reliable as single, raid1, raid10. And like any
parity raid it sucks performance wise for random writes, especially
when using hard drives.



On Sun, Sep 16, 2018 at 1:40 PM, Adrian Bastholm  wrote:
> Hi Chris
>> There's almost no useful information provided for someone to even try
>> to reproduce your results, isolate cause and figure out the bugs.
> I realize that. That's why I wasn't really asking for help, I was
> merely giving some feedback.
>
>> No kernel version. No btrfs-progs version. No description of the
>> hardware and how it's laid out, and what mkfs and mount options are
>> being used. No one really has the time to speculate.
>
> I understand, and I apologize. I could have added more detail.
>
>>
>> >BTRFS check --repair is not recommended
>>
>> Right. So why did you run it anyway?
>
> Because "repair" implies it does something to help you. That's how
> most people's brains work. My fs is broken. I'll try "REPAIR"
>
>
>> man btrfs check:
>>
>> Warning
>>Do not use --repair unless you are advised to do so by a
>> developer or an experienced user
>>
>>
>> It is always a legitimate complaint, despite this warning, if btrfs
>> check --repair makes things worse, because --repair shouldn't ever
>> make things worse.
>
> I don't think it made things worse. It's more like it didn't do
> anything. That's when I started trying to copy a new file to the file
> with the question mark attributes (lame, I know) to see what happens.
> The "corrupted" file suddenly had attributes, and so on.
> check --repair removed the extra files and left me at square one, so not 
> worse.
>
>>But Btrfs repairs are complicated, and that's why
>> the warning is there. I suppose the devs could have made the flag
>> --riskyrepair but I doubt this would really slow users down that much.
>
> calling it --destructive or --deconstruct, or something even more
> scary would slow people down
>
>> A big part of --repair fixes weren't known to make things worse at the
>> time, and edge cases where it made things worse kept popping up, so
>> only in hindsight does it make sense --repair maybe could have been
>> called something different to catch the user's attention.
>
> Exactly. It's not too late to rename it. And maybe make it dump a
> filesystem report with everything a developer would need (within
> reason) to trace the error
>
>> But anyway, I see this same sort of thing on the linux-raid list all
>> the time. People run into trouble, and they press full forward making
>> all kinds of changes, each change increases the chance of data loss.
>> And then they come on the list with WTF messages. And it's always a
>> lesson in patience for the list regulars and developers... if only
>> you'd come to us with questions sooner.
>
> True. I found the list a bit late. I tried the IRC channel but I
> couldn't post messages.
>
>> > Please have a look at the console logs.
>>
>> These aren't logs. It's a record of shell commands. Logs would include
>> kernel messages, ideally all of them. Why is device 3 missing?
>
> It was a RAID5 array of three drives. When doing btrfs check on two of
> the drives I got the "drive x is missing" message. I figured that maybe
> it had something to do with which one was the "first" drive or something.
> Likewise, btrfs check crashed when I was running it against the drives
> where I got the "drive x missing" message.
>
>
>> We have no idea. Most of Btrfs code is in the kernel, problems are reported 
>> by
>> the kernel. So we need kernel messages, user space messages aren't
>> enough.
>
>> Anyway, good luck with openzfs, cool project.
> Cool project, not so cool pitfalls. I might head back to BTRFS after
> all... see the response to Qu.
>
> Thanks for answering, and sorry for the shortcomings of my feedback
> /A
>
>>
>> --
>> Chris Murphy
>
>
>
> --
> Vänliga hälsningar / Kind regards,
> Adrian Bastholm
>
> ``I would change the world, but they won't give me the sourcecode``



--
Chris Murphy


-- 
Vänliga hälsningar / Kind regards,
Adrian Bastholm

``I would change the world, but they won't give me the sourcecode``


Re: btrfs problems

2018-09-16 Thread Chris Murphy
On Sun, Sep 16, 2018 at 7:58 AM, Adrian Bastholm  wrote:
> Hello all
> Actually I'm not trying to get any help any more, I gave up BTRFS on
> the desktop, but I'd like to share my efforts of trying to fix my
> problems, in hope I can help some poor noob like me.

There's almost no useful information provided for someone to even try
to reproduce your results, isolate cause and figure out the bugs.

No kernel version. No btrfs-progs version. No description of the
hardware and how it's laid out, and what mkfs and mount options are
being used. No one really has the time to speculate.


>BTRFS check --repair is not recommended

Right. So why did you run it anyway?

man btrfs check:

Warning
   Do not use --repair unless you are advised to do so by a
developer or an experienced user


It is always a legitimate complaint, despite this warning, if btrfs
check --repair makes things worse, because --repair shouldn't ever
make things worse. But Btrfs repairs are complicated, and that's why
the warning is there. I suppose the devs could have made the flag
--riskyrepair but I doubt this would really slow users down that much.
A big part of the --repair fixes weren't known to make things worse at the
time, and edge cases where they made things worse kept popping up, so
only in hindsight does it make sense that --repair maybe could have been
called something different to catch the user's attention.

But anyway, I see this same sort of thing on the linux-raid list all
the time. People run into trouble, and they press full forward making
all kinds of changes, each change increases the chance of data loss.
And then they come on the list with WTF messages. And it's always a
lesson in patience for the list regulars and developers... if only
you'd come to us with questions sooner.

> Please have a look at the console logs.

These aren't logs. It's a record of shell commands. Logs would include
kernel messages, ideally all of them. Why is device 3 missing? We have
no idea. Most of Btrfs code is in the kernel, problems are reported by
the kernel. So we need kernel messages, user space messages aren't
enough.

Anyway, good luck with openzfs, cool project.


-- 
Chris Murphy


Re: btrfs problems

2018-09-16 Thread Qu Wenruo


On 2018/9/16 9:58 PM, Adrian Bastholm wrote:
> Hello all
> Actually I'm not trying to get any help any more, I gave up BTRFS on
> the desktop, but I'd like to share my efforts of trying to fix my
> problems, in hope I can help some poor noob like me.
> 
> I decided to use BTRFS after reading the ArsTechnica article about the
> next-gen filesystems, and BTRFS seemed like the natural choice, open
> source, built into linux, etc. I even bought a HP microserver to have
> everything on because none of the commercial NAS-es supported BTRFS.
> What a mistake, I wasted weeks in total managing something that could
> have taken a day to set up, and I'd have MUCH more functionality now
> (if I wasn't hit by some ransomware, that is).
> 
> I had three 1TB drives, chose to use raid, and all was good for a
> while, until started fiddling with Motion, the image capturing
> software. When you kill that process (my take on it) a file can be
> written but it ends up with question marks instead of attributes, and
> it's impossible to remove.

At this point, your fs is already corrupted.
I'm not sure about the reason; it could be a failed CoW combined with
power loss, a corrupted free space cache, or some old kernel bug.

Anyway, the metadata itself is already corrupted, and I believe it
happened even before you noticed.

> BTRFS check --repair is not recommended, it
> crashes , doesn't fix all problems, and I later found out that my
> lost+found dir had about 39G of lost files and dirs.

lost+found is created entirely by btrfs check --repair.

> I spent about two days trying to fix everything, removing a disk,
> adding it again, checking , you name it. I ended up removing one disk,
> reformatting it, and moving the data there.

Well, I would recommend submitting such a problem to the mailing list
*BEFORE* doing any write operation to the fs (including btrfs check
--repair), as it would help us analyse the failure pattern and further
enhance btrfs.

> Now I removed BTRFS
> entirely and replaced it with a OpenZFS mirror array, to which I'll
> add the third disk later when I transferred everything over.

Understandable; it's really annoying when a fs just gets itself corrupted,
and without much btrfs-specific knowledge it would just be hell to try
any method to fix it (a lot of them would just make the case worse).

> 
> Please have a look at the console logs. I've been running linux on the
> desktop for the past 15 years, so I'm not a noob, but for running
> BTRFS you better be involved in the development of it.

I'd say, yes.
For any unexpected btrfs behavior, don't use btrfs check --repair unless
you're a developer or a developer has asked you to.

For any unexpected btrfs behavior, from strange ls output to an aborted
transaction, please consult the mailing list first.
(Of course, with the kernel version and btrfs-progs version, which are
missing in your console log.)

In fact, in recent kernel releases (IIRC starting from v4.15), btrfs is
already doing much better error detection, so it would detect such a
problem early on and protect the fs from being modified further.

(This further shows the importance of using the latest mainline
kernel rather than some old kernel provided by a stable distribution.)

Thanks,
Qu

> In my humble
> opinion, it's not for us "users" just yet. Not even for power users.
> 
> For those of you considering building a NAS without special purposes,
> don't. Buy a synology, pop in a couple of drives, and enjoy the ride.
> 
> 
> 
>  root  /home/storage/motion/2017-05-24  1  ls -al
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> total 4
> drwxrwxrwx 1 motion   motion   114 Sep 14 12:48 .
> drwxrwxr-x 1 motion   adyhasch  60 Sep 14 09:42 ..
> -? ? ??  ?? 36-20170524201346-02.jpg
> -? ? ??  ?? 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch  62 Sep 14 12:43 remove.py
> root  /home/storage/motion/2017-05-24  1  touch test.raw
>  root  /home/storage/motion/2017-05-24  cat /dev/random > test.raw
> ^C
> root  /home/storage/motion/2017-05-24  ls -al
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> ls: cannot access '36-20170524201346-02.jpg': No such file or directory
> total 8
> drwxrwxrwx 1 motion   motion   130 Sep 14 13:12 .
> drwxrwxr-x 1 motion   adyhasch  60 Sep 14 09:42 ..
> -? ? ??  ?? 36-20170524201346-02.jpg
> -? ? ??  ?? 36-20170524201346-02.jpg
> -rwxr-xr-x 1 adyhasch adyhasch  62 Sep 14 12:43 remove.py
> -rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
>  root  /home/storage/motion/2017-05-24  1  cp test.raw
> 36-20170524201346-02.jpg
> 'test.raw' -> '36-20170524201346-02.jpg'
> 
>  root  /home/storage/motion/2017-05-24  ls -al
> total 20
> drwxrwxrwx 1 motion   motion   178 Sep 14 

btrfs problems

2018-09-16 Thread Adrian Bastholm
Hello all
Actually I'm not trying to get any help any more, I gave up BTRFS on
the desktop, but I'd like to share my efforts of trying to fix my
problems, in hope I can help some poor noob like me.

I decided to use BTRFS after reading the ArsTechnica article about the
next-gen filesystems, and BTRFS seemed like the natural choice, open
source, built into linux, etc. I even bought a HP microserver to have
everything on because none of the commercial NAS-es supported BTRFS.
What a mistake, I wasted weeks in total managing something that could
have taken a day to set up, and I'd have MUCH more functionality now
(if I wasn't hit by some ransomware, that is).

I had three 1TB drives, chose to use raid, and all was good for a
while, until I started fiddling with Motion, the image-capturing
software. When you kill that process (my take on it) a file can be
written but it ends up with question marks instead of attributes, and
it's impossible to remove. BTRFS check --repair is not recommended: it
crashes, doesn't fix all problems, and I later found out that my
lost+found dir had about 39G of lost files and dirs.
I spent about two days trying to fix everything, removing a disk,
adding it again, checking , you name it. I ended up removing one disk,
reformatting it, and moving the data there. Now I removed BTRFS
entirely and replaced it with a OpenZFS mirror array, to which I'll
add the third disk later when I transferred everything over.

Please have a look at the console logs. I've been running linux on the
desktop for the past 15 years, so I'm not a noob, but for running
BTRFS you better be involved in the development of it. In my humble
opinion, it's not for us "users" just yet. Not even for power users.

For those of you considering building a NAS without special purposes,
don't. Buy a synology, pop in a couple of drives, and enjoy the ride.



 root  /home/storage/motion/2017-05-24  1  ls -al
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
total 4
drwxrwxrwx 1 motion   motion   114 Sep 14 12:48 .
drwxrwxr-x 1 motion   adyhasch  60 Sep 14 09:42 ..
-? ? ??  ?? 36-20170524201346-02.jpg
-? ? ??  ?? 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch  62 Sep 14 12:43 remove.py
root  /home/storage/motion/2017-05-24  1  touch test.raw
 root  /home/storage/motion/2017-05-24  cat /dev/random > test.raw
^C
root  /home/storage/motion/2017-05-24  ls -al
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
ls: cannot access '36-20170524201346-02.jpg': No such file or directory
total 8
drwxrwxrwx 1 motion   motion   130 Sep 14 13:12 .
drwxrwxr-x 1 motion   adyhasch  60 Sep 14 09:42 ..
-? ? ??  ?? 36-20170524201346-02.jpg
-? ? ??  ?? 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch  62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
 root  /home/storage/motion/2017-05-24  1  cp test.raw
36-20170524201346-02.jpg
'test.raw' -> '36-20170524201346-02.jpg'

 root  /home/storage/motion/2017-05-24  ls -al
total 20
drwxrwxrwx 1 motion   motion   178 Sep 14 13:13 .
drwxrwxr-x 1 motion   adyhasch  60 Sep 14 09:42 ..
-rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch  62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw

 root  /home/storage/motion/2017-05-24  chmod 777 36-20170524201346-02.jpg

 root  /home/storage/motion/2017-05-24  ls -al
total 20
drwxrwxrwx 1 motion   motion   178 Sep 14 13:13 .
drwxrwxr-x 1 motion   adyhasch  60 Sep 14 09:42 ..
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch  62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw
 root  /home/storage/motion/2017-05-24  unlink 36-20170524201346-02.jpg
unlink: cannot unlink '36-20170524201346-02.jpg': No such file or directory

 root  /home/storage/motion/2017-05-24  1  ls -al
total 20
drwxrwxrwx 1 motion   motion   178 Sep 14 13:13 .
drwxrwxr-x 1 motion   adyhasch  60 Sep 14 09:42 ..
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxrwxrwx 1 root root 338 Sep 14 13:13 36-20170524201346-02.jpg
-rwxr-xr-x 1 adyhasch adyhasch  62 Sep 14 12:43 remove.py
-rwxrwxrwx 1 root root 338 Sep 14 13:12 test.raw

 root  /home/storage/motion/2017-05-24  journalctl -k | grep BTRFS
Sep 

Re: Kernel Oops / btrfs problems

2017-10-26 Thread Liu Bo
On Wed, Oct 25, 2017 at 02:05:48PM +0200, andreas.bt...@diezwickers.de wrote:
> Hi,
> 
> I've had problems with a btrfs filesystem on a USB disk. I made a
> successful backup of all data and created the filesystem from scratch.
> I'm not able to restore all the backed-up data because of a kernel oops.
> The problem is reproducible.
> 
> I've checked my RAM already with memtest86 for 8h without any problems
> found.
> 
> Could you be so kind as to provide any information to help solve the
> problem?
> 
> Thanks for your help.
> 
> Andreas
> 
> uname -a
> 
> Linux fatblock 4.12.0-0.bpo.2-amd64 #1 SMP Debian 4.12.13-1~bpo9+1
> (2017-09-28) x86_64 GNU/Linux
> 
> btrfs --version
> ===
> btrfs-progs v4.9.1
> 
> btrfs fi df /mnt/archive/
> =
> Data, single: total=1.20TiB, used=1.20TiB
> System, DUP: total=40.00MiB, used=160.00KiB
> Metadata, DUP: total=3.50GiB, used=2.79GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> smartctl -a /dev/sdf
> 
> andi@fatblock:/etc/initramfs-tools$ sudo smartctl -a /dev/sdf
> smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.12.0-0.bpo.2-amd64] (local
> build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family: Seagate NAS HDD
> Device Model: ST4000VN000-1H4168
> Serial Number:Z300MKYZ
> LU WWN Device Id: 5 000c50 063dd0357
> Firmware Version: SC43
> User Capacity:4.000.787.030.016 bytes [4,00 TB]
> Sector Sizes: 512 bytes logical, 4096 bytes physical
> Rotation Rate:5900 rpm
> Form Factor:  3.5 inches
> Device is:In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
> Local Time is:Wed Oct 25 14:03:51 2017 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00)   Offline data collection activity
>   was never started.
>   Auto Offline Data Collection: Disabled.
> Self-test execution status:  (   0)   The previous self-test routine
> completed
>   without error or no self-test has ever
>   been run.
> Total time to complete Offline
> data collection:  (  117) seconds.
> Offline data collection
> capabilities:  (0x73) SMART execute Offline immediate.
>   Auto Offline data collection on/off 
> support.
>   Suspend Offline collection upon new
>   command.
>   No Offline surface scan supported.
>   Self-test supported.
>   Conveyance Self-test supported.
>   Selective Self-test supported.
> SMART capabilities:(0x0003)   Saves SMART data before entering
>   power-saving mode.
>   Supports SMART auto save timer.
> Error logging capability:(0x01)   Error logging supported.
>   General Purpose Logging supported.
> Short self-test routine
> recommended polling time:  (   1) minutes.
> Extended self-test routine
> recommended polling time:  ( 517) minutes.
> Conveyance self-test routine
> recommended polling time:  (   2) minutes.
> SCT capabilities:(0x10bd) SCT Status supported.
>   SCT Error Recovery Control supported.
>   SCT Feature Control supported.
>   SCT Data Table supported.
> 
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE UPDATED
> WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate 0x000f   109   099   006Pre-fail Always
> -   23673928
>   3 Spin_Up_Time0x0003   094   093   000Pre-fail Always
> -   0
>   4 Start_Stop_Count0x0032   078   078   020Old_age Always
> -   23040
>   5 Reallocated_Sector_Ct   0x0033   100   100   010Pre-fail Always
> -   0
>   7 Seek_Error_Rate 0x000f   083   060   030Pre-fail Always
> -   218165830
>   9 Power_On_Hours  0x0032   059   059   000Old_age Always
> -   36667
>  10 Spin_Retry_Count0x0013   100   100   097Pre-fail Always
> -   0
>  12 Power_Cycle_Count   0x0032   100   100   020Old_age Always
> -   31
> 184 

Kernel Oops / btrfs problems

2017-10-25 Thread andreas . btrfs

Hi,

I've had problems with a btrfs filesystem on a USB disk. I made a 
successful backup of all data and created the filesystem from scratch.

I'm not able to restore all the backed-up data because of a kernel oops.
The problem is reproducible.

I've checked my RAM already with memtest86 for 8h without any problems 
found.


Could you be so kind as to provide any information to help solve the problem?

Thanks for your help.

Andreas

uname -a

Linux fatblock 4.12.0-0.bpo.2-amd64 #1 SMP Debian 4.12.13-1~bpo9+1 
(2017-09-28) x86_64 GNU/Linux


btrfs --version
===
btrfs-progs v4.9.1

btrfs fi df /mnt/archive/
=
Data, single: total=1.20TiB, used=1.20TiB
System, DUP: total=40.00MiB, used=160.00KiB
Metadata, DUP: total=3.50GiB, used=2.79GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

smartctl -a /dev/sdf

andi@fatblock:/etc/initramfs-tools$ sudo smartctl -a /dev/sdf
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.12.0-0.bpo.2-amd64] (local 
build)

Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate NAS HDD
Device Model: ST4000VN000-1H4168
Serial Number:Z300MKYZ
LU WWN Device Id: 5 000c50 063dd0357
Firmware Version: SC43
User Capacity:4.000.787.030.016 bytes [4,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate:5900 rpm
Form Factor:  3.5 inches
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:Wed Oct 25 14:03:51 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status:  (   0)	The previous self-test routine 
completed

without error or no self-test has ever
been run.
Total time to complete Offline
data collection:(  117) seconds.
Offline data collection
capabilities:(0x73) SMART execute Offline immediate.
Auto Offline data collection on/off 
support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:(0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:(0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time:(   1) minutes.
Extended self-test routine
recommended polling time:( 517) minutes.
Conveyance self-test routine
recommended polling time:(   2) minutes.
SCT capabilities:  (0x10bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE 
UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f   109   099   006Pre-fail 
Always   -   23673928
  3 Spin_Up_Time0x0003   094   093   000Pre-fail 
Always   -   0
  4 Start_Stop_Count0x0032   078   078   020Old_age 
Always   -   23040
  5 Reallocated_Sector_Ct   0x0033   100   100   010Pre-fail 
Always   -   0
  7 Seek_Error_Rate 0x000f   083   060   030Pre-fail 
Always   -   218165830
  9 Power_On_Hours  0x0032   059   059   000Old_age 
Always   -   36667
 10 Spin_Retry_Count0x0013   100   100   097Pre-fail 
Always   -   0
 12 Power_Cycle_Count   0x0032   100   100   020Old_age 
Always   -   31
184 End-to-End_Error0x0032   100   100   099Old_age   Always 
  -   0
187 Reported_Uncorrect  0x0032   100   100   000Old_age   Always 
  -   0
188 Command_Timeout 0x0032   100   099   000Old_age   

Re: btrfs problems on new file system

2015-12-26 Thread Chris Murphy
On Sat, Dec 26, 2015 at 4:38 AM,   wrote:
> Duncan <1i5t5.dun...@cox.net> wrote:
>
>> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
>>
>> > Chris Murphy  wrote:
>> >
>> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
>> >> to wrap that text and make it unreadable on list. I think the problems
>> >> with your volume happened before the messages, but it's hard to say.
>> >> Also, a generation of nearly 5000 is not that new?
>> >
>> > The file system was only a few days old.  It was on an lvm volume group
>> > which consisted of two ssd drives, so I am not sure what you are saying
>> > about lvm cache -- how could I do anything different?
>> >
>> >> On another thread someone said you probably need to specify the device
>> >> to mount when using Btrfs and lvmcache? And the device to specify is
>> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
>> >> origin, it can result in corruption.
>> >
>> > See above.
>>
>> I think he mixed up two threads and thought you were running lvm-cache,
>> not just regular lvm, which should be good unless you're exposing lvm
>> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
>> actually universal.  Since btrfs is multi-device and uses the UUID to
>> track which devices belong to it (because they're _supposed_ to be
>> universally unique, it's even in the _name_!), if it sees the same UUID
>> it'll consider it part of the same filesystem, thus potentially causing
>> corruption if it's a snapshot or something that's not actually supposed
>> to be part of the (current) filesystem.
>
> I found a few more log entries, perhaps these may be helpful to track
> this down, or maybe prevent the filesystem from going read-only.

No, you need to post the entire dmesg. The "cut here" part is maybe
useful for a developer diagnosing Btrfs's response to the problem, but
the problem, or the pre-problem, happened before this.
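
As a practical note (command names only; nothing specific to this system
is assumed), the full kernel log can be captured to a file and posted
somewhere rather than pasted inline:

  # kernel messages for the current boot
  journalctl -k -b > dmesg.txt
  # or, without systemd's journal
  dmesg > dmesg.txt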

-- 
Chris Murphy


Re: btrfs problems on new file system

2015-12-26 Thread covici
Chris Murphy  wrote:

> On Sat, Dec 26, 2015 at 4:38 AM,   wrote:
> > Duncan <1i5t5.dun...@cox.net> wrote:
> >
> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
> >>
> >> > Chris Murphy  wrote:
> >> >
> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
> >> >> to wrap that text and make it unreadable on list. I think the problems
> >> >> with your volume happened before the messages, but it's hard to say.
> >> >> Also, a generation of nearly 5000 is not that new?
> >> >
> >> > The file system was only a few days old.  It was on an lvm volume group
> >> > which consisted of two ssd drives, so I am not sure what you are saying
> >> > about lvm cache -- how could I do anything different?
> >> >
> >> >> On another thread someone said you probably need to specify the device
> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
> >> >> origin, it can result in corruption.
> >> >
> >> > See above.
> >>
> >> I think he mixed up two threads and thought you were running lvm-cache,
> >> not just regular lvm, which should be good unless you're exposing lvm
> >> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
> >> actually universal.  Since btrfs is multi-device and uses the UUID to
> >> track which devices belong to it (because they're _supposed_ to be
> >> universally unique, it's even in the _name_!), if it sees the same UUID
> >> it'll consider it part of the same filesystem, thus potentially causing
> >> corruption if it's a snapshot or something that's not actually supposed
> >> to be part of the (current) filesystem.
> >
> > I found a few more log entries, perhaps these may be helpful to track
> > this down, or maybe prevent the filesystem from going read-only.
> 
> No, you need to post the entire dmesg. The "cut here" part is maybe
> useful for a developer diagnosing Btrfs's response to the problem, but
> the problem, or the pre-problem, happened before this.

It would be a 20MB file if I were to post the whole thing, but I can
tell you there were no hardware errors at any time.


-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

 John Covici
 cov...@ccs.covici.com


Re: btrfs problems on new file system

2015-12-26 Thread Chris Murphy
On Sat, Dec 26, 2015 at 12:22 PM,   wrote:
> Chris Murphy  wrote:
>
>> On Sat, Dec 26, 2015 at 4:38 AM,   wrote:
>> > Duncan <1i5t5.dun...@cox.net> wrote:
>> >
>> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
>> >>
>> >> > Chris Murphy  wrote:
>> >> >
>> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
>> >> >> to wrap that text and make it unreadable on list. I think the problems
>> >> >> with your volume happened before the messages, but it's hard to say.
>> >> >> Also, a generation of nearly 5000 is not that new?
>> >> >
>> >> > The file system was only a few days old.  It was on an lvm volume group
>> >> > which consisted of two ssd drives, so I am not sure what you are saying
>> >> > about lvm cache -- how could I do anything different?
>> >> >
>> >> >> On another thread someone said you probably need to specify the device
>> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
>> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
>> >> >> origin, it can result in corruption.
>> >> >
>> >> > See above.
>> >>
>> >> I think he mixed up two threads and thought you were running lvm-cache,
>> >> not just regular lvm, which should be good unless you're exposing lvm
>> >> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
>> >> actually universal.  Since btrfs is multi-device and uses the UUID to
>> >> track which devices belong to it (because they're _supposed_ to be
>> >> universally unique, it's even in the _name_!), if it sees the same UUID
>> >> it'll consider it part of the same filesystem, thus potentially causing
>> >> corruption if it's a snapshot or something that's not actually supposed
>> >> to be part of the (current) filesystem.
>> >
>> > I found a few more log entries, perhaps these may be helpful to track
>> > this down, or maybe prevent the filesystem from going read-only.
>>
>> No, you need to post the entire dmesg. The "cut here" part is maybe
>> useful for a developer diagnosing Btrfs's response to the problem, but
>> the problem, or the pre-problem, happened before this.
>
> It would be a 20meg file, if I were to post the whole file.  but I can
> tell you, no hardware errors at any time.

The kernel is tainted, looks like a proprietary kernel module, so you
have to have very good familiarity with the workings of that module to
know whether it might affect what's going on, or you'd have to retest
without that kernel module.
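
For what it's worth, a quick way to see the taint state and the loaded
modules on a running system (generic commands, nothing btrfs-specific):

  cat /proc/sys/kernel/tainted   # non-zero bitmask; bit 0 set means a
                                 # proprietary module is loaded
  lsmod                          # list the currently loaded modules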

Anyway, asking for the whole dmesg isn't arbitrary; it saves time
having to ask for more later. The two things you've provided so far
aren't enough: any number of problems could result in those messages.
So my suggestion is, when people ask for something, provide it or don't
provide it, but don't complain about what they're asking for. The
output from btrfs-debug-tree might be several hundred MB. The output
from btrfs-image might be several GB. So if you're not willing to
provide 100 kB, let alone 20 MB, of kernel messages that might give some
hint of what's going on, the resistance itself is off-putting. It's like
having to pull someone else's loose tooth for them; no one really wants
to do that.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-26 Thread Chris Murphy
On Sat, Dec 26, 2015 at 1:02 PM,   wrote:
> Chris Murphy  wrote:
>
>> On Sat, Dec 26, 2015 at 12:22 PM,   wrote:
>> > Chris Murphy  wrote:
>> >
>> >> On Sat, Dec 26, 2015 at 4:38 AM,   wrote:
>> >> > Duncan <1i5t5.dun...@cox.net> wrote:
>> >> >
>> >> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
>> >> >>
>> >> >> > Chris Murphy  wrote:
>> >> >> >
>> >> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs 
>> >> >> >> tend
>> >> >> >> to wrap that text and make it unreadable on list. I think the 
>> >> >> >> problems
>> >> >> >> with your volume happened before the messages, but it's hard to say.
>> >> >> >> Also, a generation of nearly 5000 is not that new?
>> >> >> >
>> >> >> > The file system was only a few days old.  It was on an lvm volume 
>> >> >> > group
>> >> >> > which consisted of two ssd drives, so I am not sure what you are 
>> >> >> > saying
>> >> >> > about lvm cache -- how could I do anything different?
>> >> >> >
>> >> >> >> On another thread someone said you probably need to specify the 
>> >> >> >> device
>> >> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
>> >> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> >> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount 
>> >> >> >> the
>> >> >> >> origin, it can result in corruption.
>> >> >> >
>> >> >> > See above.
>> >> >>
>> >> >> I think he mixed up two threads and thought you were running lvm-cache,
>> >> >> not just regular lvm, which should be good unless you're exposing lvm
>> >> >> snapshots and thus letting btrfs see multiple supposed UUIDs that 
>> >> >> aren't
>> >> >> actually universal.  Since btrfs is multi-device and uses the UUID to
>> >> >> track which devices belong to it (because they're _supposed_ to be
>> >> >> universally unique, it's even in the _name_!), if it sees the same UUID
>> >> >> it'll consider it part of the same filesystem, thus potentially causing
>> >> >> corruption if it's a snapshot or something that's not actually supposed
>> >> >> to be part of the (current) filesystem.
>> >> >
>> >> > I found a few more log entries, perhaps these may be helpful to track
>> >> > this down, or maybe prevent the filesystem from going read-only.
>> >>
>> >> No, you need to post the entire dmesg. The "cut here" part is maybe
>> >> useful for a developer diagnosing Btrfs's response to the problem, but
>> >> the problem, or the pre-problem, happened before this.
>> >
>> > It would be a 20meg file, if I were to post the whole file.  but I can
>> > tell you, no hardware errors at any time.
>>
>> The kernel is tainted, looks like a proprietary kernel module, so you
>> have to have very good familiarity with the workings of that module to
>> know whether it might affect what's going on, or you'd have to retest
>> without that kernel module.
>>
>> Anyway, asking for the whole dmesg isn't arbitrary, it saves times
>> having to ask for more later. The two things you've provided so far
>> aren't enough, any number of problems could result in those messages.
>> So my suggestion is when people ask for something, provide it or don't
>> provide it, but don't complain about what they're asking for. The
>> output from btrfs-debug-tree might be several hundred MB. The output
>> from btrfs-image might be several GB. So if you're not willing to
>> provide 100kB, let alone 20MB, of kernel messages that might give some
>> hint what's going on, the resistance itself is off putting. It's like
>> having to pull your own loose tooth for you, no one really wants to do
>> that.
>
> How far back do you want to go in terms of the messages?

The kernel log buffer isn't that big by default which is why I asked
for the entire dmesg, not the entire /var/log/messages file. But if
you can reproduce the problem with a new boot, that'd certainly make
the kernel log shorter and cleaner if that's the concern.
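
For example, right after reproducing the problem, something along these
lines captures just the kernel messages (the paths are only examples):

  dmesg > /tmp/dmesg-btrfs.txt
  # or, on systemd systems, only the kernel messages for the current boot:
  journalctl -k -b > /tmp/dmesg-btrfs.txt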

The errno -95 is itself sufficiently rare that there's no real way to
answer your question; I don't know that anyone would even know what
they're looking for until they find it. It's even possible it won't be
found by looking at kernel messages at all.

How was the fs created? Conversion? If mkfs.btrfs, what version of
progs and what options were used to create it?  And what was happening
at the time of the first errno=-95?

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-26 Thread covici
Chris Murphy  wrote:

> On Sat, Dec 26, 2015 at 12:22 PM,   wrote:
> > Chris Murphy  wrote:
> >
> >> On Sat, Dec 26, 2015 at 4:38 AM,   wrote:
> >> > Duncan <1i5t5.dun...@cox.net> wrote:
> >> >
> >> >> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
> >> >>
> >> >> > Chris Murphy  wrote:
> >> >> >
> >> >> >> If you can post the entire dmesg somewhere that'd be useful. MUAs 
> >> >> >> tend
> >> >> >> to wrap that text and make it unreadable on list. I think the 
> >> >> >> problems
> >> >> >> with your volume happened before the messages, but it's hard to say.
> >> >> >> Also, a generation of nearly 5000 is not that new?
> >> >> >
> >> >> > The file system was only a few days old.  It was on an lvm volume 
> >> >> > group
> >> >> > which consisted of two ssd drives, so I am not sure what you are 
> >> >> > saying
> >> >> > about lvm cache -- how could I do anything different?
> >> >> >
> >> >> >> On another thread someone said you probably need to specify the 
> >> >> >> device
> >> >> >> to mount when using Btrfs and lvmcache? And the device to specify is
> >> >> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
> >> >> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount 
> >> >> >> the
> >> >> >> origin, it can result in corruption.
> >> >> >
> >> >> > See above.
> >> >>
> >> >> I think he mixed up two threads and thought you were running lvm-cache,
> >> >> not just regular lvm, which should be good unless you're exposing lvm
> >> >> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't
> >> >> actually universal.  Since btrfs is multi-device and uses the UUID to
> >> >> track which devices belong to it (because they're _supposed_ to be
> >> >> universally unique, it's even in the _name_!), if it sees the same UUID
> >> >> it'll consider it part of the same filesystem, thus potentially causing
> >> >> corruption if it's a snapshot or something that's not actually supposed
> >> >> to be part of the (current) filesystem.
> >> >
> >> > I found a few more log entries, perhaps these may be helpful to track
> >> > this down, or maybe prevent the filesystem from going read-only.
> >>
> >> No, you need to post the entire dmesg. The "cut here" part is maybe
> >> useful for a developer diagnosing Btrfs's response to the problem, but
> >> the problem, or the pre-problem, happened before this.
> >
> > It would be a 20meg file, if I were to post the whole file.  but I can
> > tell you, no hardware errors at any time.
> 
> The kernel is tainted, looks like a proprietary kernel module, so you
> have to have very good familiarity with the workings of that module to
> know whether it might affect what's going on, or you'd have to retest
> without that kernel module.
> 
> Anyway, asking for the whole dmesg isn't arbitrary, it saves times
> having to ask for more later. The two things you've provided so far
> aren't enough, any number of problems could result in those messages.
> So my suggestion is when people ask for something, provide it or don't
> provide it, but don't complain about what they're asking for. The
> output from btrfs-debug-tree might be several hundred MB. The output
> from btrfs-image might be several GB. So if you're not willing to
> provide 100kB, let alone 20MB, of kernel messages that might give some
> hint what's going on, the resistance itself is off putting. It's like
> having to pull your own loose tooth for you, no one really wants to do
> that.

How far back do you want to go in terms of the messages?


-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do
you spend it?

 John Covici
 cov...@ccs.covici.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-26 Thread covici
Duncan <1i5t5.dun...@cox.net> wrote:

> covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:
> 
> > Chris Murphy  wrote:
> > 
> >> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
> >> to wrap that text and make it unreadable on list. I think the problems
> >> with your volume happened before the messages, but it's hard to say.
> >> Also, a generation of nearly 5000 is not that new?
> > 
> > The file system was only a few days old.  It was on an lvm volume group
> > which consisted of two ssd drives, so I am not sure what you are saying
> > about lvm cache -- how could I do anything different?
> > 
> >> On another thread someone said you probably need to specify the device
> >> to mount when using Btrfs and lvmcache? And the device to specify is
> >> the combined HDD+SSD logical device, for lvmcache that's the "cache
> >> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
> >> origin, it can result in corruption.
> > 
> > See above.
> 
> I think he mixed up two threads and thought you were running lvm-cache, 
> not just regular lvm, which should be good unless you're exposing lvm 
> snapshots and thus letting btrfs see multiple supposed UUIDs that aren't 
> actually universal.  Since btrfs is multi-device and uses the UUID to 
> track which devices belong to it (because they're _supposed_ to be 
> universally unique, it's even in the _name_!), if it sees the same UUID 
> it'll consider it part of the same filesystem, thus potentially causing 
> corruption if it's a snapshot or something that's not actually supposed 
> to be part of the (current) filesystem.

I found a few more log entries, perhaps these may be helpful to track
this down, or maybe prevent the filesystem from going read-only.
[ cut here ]
Dec 25 03:57:42 ccs.covici.com kernel: WARNING: CPU: 1 PID: 16580 at 
fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
Dec 25 03:57:42 ccs.covici.com kernel: BTRFS: Transaction aborted (error -95)
Dec 25 03:57:42 ccs.covici.com kernel: Modules linked in: rfcomm ip6table_nat 
nf_nat_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_mangle ip6table_raw 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_log_ipv6 ip6table_filter ip6_tables sit 
tunnel4 ip_tunnel vmnet(O) fuse vmw_vsock_vmci_transport vsock vmw_vmci 
vmmon(O) uinput cmac ecb xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_recent 
xt_comment ipt_REJECT nf_reject_ipv4 xt_addrtype xt_mark xt_CT xt_multiport 
xt_NFLOG nfnetlink_log xt_LOG nf_log_ipv4 nf_log_common nf_nat_tftp 
nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre 
nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda 
nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite 
nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre 
nf_conntrack_netlink
Dec 25 03:57:42 ccs.covici.com kernel:  nfnetlink nf_conntrack_netbios_ns 
nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp 
xt_tcpudp xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_filter 
ip_tables x_tables bnep ext4 jbd2 gpio_ich snd_emu10k1_synth snd_emux_synth 
snd_seq_midi_emul snd_seq_virmidi snd_pcm_oss snd_seq_dummy snd_seq_oss 
snd_seq_midi snd_seq_midi_event snd_seq snd_mixer_oss btusb joydev btintel 
btbcm snd_emu10k1 bluetooth intel_rapl rfkill iosf_mbi x86_pkg_temp_thermal 
crc16 snd_util_mem snd_hwdep coretemp snd_ac97_codec ac97_bus kvm_intel 
snd_rawmidi snd_seq_device kvm snd_pcm e1000e snd_timer r8169 emu10k1_gp snd 
ptp gameport microcode i2c_i801 pps_core pcspkr lpc_ich mii acpi_cpufreq 
8250_fintek processor
Dec 25 03:57:42 ccs.covici.com kernel:  button sch_fq_codel nvidia(PO) drm 
agpgart hid_logitech_hidpp dm_snapshot dm_bufio hid_logitech_dj usbhid btrfs 
xor raid6_pq ata_generic pata_acpi uas usb_storage crct10dif_pclmul 
crc32_pclmul crc32c_intel cryptd xhci_pci xhci_hcd ehci_pci ehci_hcd ahci 
libahci pata_marvell libata usbcore usb_common dm_mirror dm_region_hash dm_log 
dm_mod ipv6 autofs4
Dec 25 03:57:42 ccs.covici.com kernel: CPU: 1 PID: 16580 Comm: kworker/u16:5 
Tainted: P   O4.1.12-gentoo #1
Dec 25 03:57:42 ccs.covici.com kernel: Hardware name: Supermicro C7P67/C7P67, 
BIOS 4.6.4 07/01/2011
Dec 25 03:57:42 ccs.covici.com kernel: Workqueue: btrfs-endio-write 
btrfs_endio_write_helper [btrfs]
Dec 25 03:57:42 ccs.covici.com kernel:  0009 88037ca27c28 
81458291 8000
Dec 25 03:57:42 ccs.covici.com kernel:  88037ca27c78 88037ca27c68 
81045b50 88037ca27c58
Dec 25 03:57:42 ccs.covici.com kernel:  a0370008 ffa1 
880166d8e228 a0400aa0
Dec 25 03:57:42 ccs.covici.com kernel: Call Trace:
Dec 25 03:57:42 ccs.covici.com kernel:  [] 
dump_stack+0x4f/0x7b
Dec 25 03:57:42 ccs.covici.com kernel:  [] 
warn_slowpath_common+0xa1/0xbb
Dec 25 03:57:42 

Re: btrfs problems on new file system

2015-12-26 Thread Duncan
covici posted on Sat, 26 Dec 2015 02:29:11 -0500 as excerpted:

> Chris Murphy  wrote:
> 
>> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
>> to wrap that text and make it unreadable on list. I think the problems
>> with your volume happened before the messages, but it's hard to say.
>> Also, a generation of nearly 5000 is not that new?
> 
> The file system was only a few days old.  It was on an lvm volume group
> which consisted of two ssd drives, so I am not sure what you are saying
> about lvm cache -- how could I do anything different?
> 
>> On another thread someone said you probably need to specify the device
>> to mount when using Btrfs and lvmcache? And the device to specify is
>> the combined HDD+SSD logical device, for lvmcache that's the "cache
>> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount the
>> origin, it can result in corruption.
> 
> See above.

I think he mixed up two threads and thought you were running lvm-cache, 
not just regular lvm, which should be good unless you're exposing lvm 
snapshots and thus letting btrfs see multiple supposed UUIDs that aren't 
actually universal.  Since btrfs is multi-device and uses the UUID to 
track which devices belong to it (because they're _supposed_ to be 
universally unique, it's even in the _name_!), if it sees the same UUID 
it'll consider it part of the same filesystem, thus potentially causing 
corruption if it's a snapshot or something that's not actually supposed 
to be part of the (current) filesystem.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs problems on new file system

2015-12-25 Thread covici
Hi.  I created a file system using 4.3.1 version of btrfsprogs and have
been using it for some three days.  I have gotten the following errors
in the log this morning:
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981
Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
transid verify failed on 51776421888 wanted 4983 found 4981

The file system was then made read only.  I unmounted, did a check
without repair which said it was fine, and remounted successfully in
read/write mode, but am I in trouble?  This was on a solid state drive
using lvm.

Thanks in advance for any suggestions.

-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do
you spend it?

 John Covici
 cov...@ccs.covici.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-25 Thread Henk Slager
On Fri, Dec 25, 2015 at 11:03 AM,   wrote:
> Hi.  I created a file system using 4.3.1 version of btrfsprogs and have
> been using it for some three days.  I have gotten the following errors
> in the log this morning:
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
> Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> transid verify failed on 51776421888 wanted 4983 found 4981
>
> The file system was then made read only.  I unmounted, did a check
> without repair which said it was fine, and remounted successfully in
> read/write mode, but am I in trouble?  This was on a solid state drive
> using lvm.
What kernel version are you using?
I think you might have some hardware error or glitch somewhere;
otherwise I don't know why you would have such errors. These kinds of
errors remind me of SATA/cable failures over quite a period of time
(multiple days). Or something with lvm or trim of the SSD.
Anything unusual with the SSD if you run smartctl?
A btrfs check will indeed likely result in an OK for this case.
What about running a read-only scrub?
Maybe running memtest86+ can rule out the worst case.
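
For example (the device and mount point below are only placeholders):

  smartctl -a /dev/sdX              # SMART health and attribute table
  btrfs scrub start -Bdr /mnt       # read-only scrub, stay in the foreground
  btrfs scrub status /mnt           # error summary afterwards
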
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-25 Thread Duncan
covici posted on Fri, 25 Dec 2015 16:14:58 -0500 as excerpted:

> Henk Slager  wrote:
> 
>> On Fri, Dec 25, 2015 at 11:03 AM,   wrote:
>> > Hi.  I created a file system using 4.3.1 version of btrfsprogs and
>> > have been using it for some three days.  I have gotten the following
>> > errors in the log this morning:

>> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
>> > transid verify failed on 51776421888 wanted 4983 found 4981

[Several of these within a second, same block and transids, wanted 4983, 
found 4981.]

>> > The file system was then made read only.  I unmounted, did a check
>> > without repair which said it was fine, and remounted successfully in
>> > read/write mode, but am I in trouble?  This was on a solid state
>> > drive using lvm.
>> What kernel version are you using?
>> I think you might have some hardware error or glitch somewhere,
>> otherwise I don't know why you have such errors. These kind of errors
>> remind me of SATA/cable failures over quite a period of time (multipe
>> days). Or something with lvm or trim of SSD.
>> Any unusual with the SSD if you run  smartctl?
>> A btrfs check will indeed likely result in an OK for this case.
>> What about running read-only scrub?
>> Maybe running  memtest86+  can rule-out the worst case.
> 
> I am running 4.1.12-gentoo and btrfs progs 4.3.1.  Same thing happened
> on another filesystem, so I switched them over to ext4 and no troubles
> since.  As far as I know the ssd drives are fine, I have been using them
> for months.  Maybe btrfs needs some more work.  I did do scrubs on the
> filesystems after I went offline and remounted them, and they were
> successful, and I got no errors from the lower layers at all.  Maybe
> I'll try this in a year or so.

Well, as I seem to say every few posts, btrfs is "still stabilizing, not 
fully stable and mature", so it's a given that more work is needed, tho 
it's demonstrated to be "stable enough" for many in daily use, as long as 
they're generally aware of stability status and are following the admin's 
rule of backups[1] with the increased risk-factor of running "still 
stabilizing" filesystems in mind.

The very close generation/transid numbers, only two commits apart, for 
the exact same block, within the same second, indicate a quite recent 
block-write update failure, possibly only a minute or two old.  You could 
tell how recent by comparing the generation/transid in the superblock 
(using btrfs-show-super) at as close to the same time as possible, seeing 
how far ahead it is.
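
For example, with the dm-20 device name from the log above standing in for
the real one:

  btrfs-show-super /dev/dm-20 | grep -i generation
  # (newer btrfs-progs spell this: btrfs inspect-internal dump-super /dev/dm-20)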

I'd check smartctl -A for the device(s), then run scrub and check it 
again, to see if the raw number for ID5, Reallocated_Sector_Ct (or 
similar for your device) changed.  (I have some experience with this.[2])
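
A rough sketch of that check, with /dev/sdX and /mnt standing in for the
real device and mount point:

  smartctl -A /dev/sdX | grep -i reallocated   # note the raw value
  btrfs scrub start -B /mnt                    # scrub in the foreground
  smartctl -A /dev/sdX | grep -i reallocated   # compare the raw value again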

If the raw reallocated sector count goes up, it's obviously the device.  
If it doesn't but scrub fixes an error, then it's likely elsewhere in the 
hardware (cabling, power, memory or storage bus errors, sata/scsi 
controller...).  If scrub detects but can't fix the error, the lack of a fix 
is probably due to single mode, with the original error possibly due to a 
bad shutdown/umount or a btrfs bug.  If scrub says it's fine, then 
whatever it was was temporary and could be due to all sorts of things, from 
a cosmic-ray-induced memory error, to a btrfs bug, to...

In any case, if scrub fixes or doesn't detect an error, I'd not worry 
about it too much, as it doesn't seem to be affecting operation, you 
didn't get a lockup or backtrace, etc.  In fact, I'd take that as 
indication of btrfs normal problem detection and self-healing, likely due 
to being able to pull a valid copy from elsewhere due to raidN or dup 
redundancy or parity.
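
Whether that self-healing is even possible depends on the data/metadata
profiles in use, which something like this shows (the mount point is a
placeholder, output abbreviated):

  btrfs fi df /mnt
  # Data, single: ...     <- single data has no second copy to repair from
  # Metadata, DUP: ...    <- DUP/raid1 metadata can be repaired from the extra copy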

Tho there's no shame in simply deciding btrfs is still too "stabilizing, 
not fully stable and mature" for you, either.  I know I'd still hesitate 
to use it in a full production environment, unless I had both good/tested 
backups and failover in place.  "Good enough for daily use, provided 
there's backups if you don't consider the data throwaway", is just that; 
it's not really yet good enough for "I just need it to work, reliably, 
because it's big money and people's jobs if it doesn't."

---
[1] Admin's rule of backups:  For any given level of backup, you either 
have it, or by your actions are defining the data to be of less value 
than the hassle and resources taken to do the backup, multiplied by the 
risk factor of actually needing that backup.  As a consequence, 
after-the-fact protests to the contrary are simply lies: actions spoke 
louder than words, and they defined the time and hassle saved as more 
valuable, so the more valuable thing was saved in any case, and the user 
should be happy they saved the more valuable hassle and resources even if 
the data got lost.

And of course with btrfs still stabilizing, that risk factor remains 
somewhat elevated, meaning more levels of backups need to be kept, for 

Re: btrfs problems on new file system

2015-12-25 Thread covici
Chris Murphy  wrote:

> If you can post the entire dmesg somewhere that'd be useful. MUAs tend
> to wrap that text and make it unreadable on list. I think the problems
> with your volume happened before the messages, but it's hard to say.
> Also, a generation of nearly 5000 is not that new?

The file system was only a few days old.  It was on an lvm volume group
which consisted of two ssd drives, so I am not sure what you are saying
about lvm cache -- how could I do anything different?


> 
> On another thread someone said you probably need to specify the device
> to mount when using Btrfs and lvmcache? And the device to specify is
> the combined HDD+SSD logical device, for lvmcache that's the "cache
> LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount
> the origin, it can result in corruption.

See above.


-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do
you spend it?

 John Covici
 cov...@ccs.covici.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-25 Thread Chris Murphy
If you can post the entire dmesg somewhere that'd be useful. MUAs tend
to wrap that text and make it unreadable on list. I think the problems
with your volume happened before the messages, but it's hard to say.
Also, a generation of nearly 5000 is not that new?

On another thread someone said you probably need to specify the device
to mount when using Btrfs and lvmcache? And the device to specify is
the combined HDD+SSD logical device, for lvmcache that's the "cache
LV", which is the OriginLV + CachePoolLV. If Btrfs decides to mount
the origin, it can result in corruption.


Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-25 Thread covici
Henk Slager  wrote:

> On Fri, Dec 25, 2015 at 11:03 AM,   wrote:
> > Hi.  I created a file system using 4.3.1 version of btrfsprogs and have
> > been using it for some three days.  I have gotten the following errors
> > in the log this morning:
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> > transid verify failed on 51776421888 wanted 4983 found 4981
> >
> > The file system was then made read only.  I unmounted, did a check
> > without repair which said it was fine, and remounted successfully in
> > read/write mode, but am I in trouble?  This was on a solid state drive
> > using lvm.
> What kernel version are you using?
> I think you might have some hardware error or glitch somewhere,
> otherwise I don't know why you have such errors. These kind of errors
> remind me of SATA/cable failures over quite a period of time (multipe
> days). Or something with lvm or trim of SSD.
> Any unusual with the SSD if you run  smartctl?
> A btrfs check will indeed likely result in an OK for this case.
> What about running read-only scrub?
> Maybe running  memtest86+  can rule-out the worst case.

I am running 4.1.12-gentoo and btrfs progs 4.3.1.  Same thing happened
on another filesystem, so I switched them over to ext4 and no troubles
since.  As far as I know the ssd drives are fine, I have been using them
for months.  Maybe btrfs needs some more work.  I did do scrubs on the
filesystems after I went offline and remounted them, and they were
successful, and I got no errors from the lower layers at all.  Maybe
I'll try this in a year or so.



-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do
you spend it?

 John Covici
 cov...@ccs.covici.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems on new file system

2015-12-25 Thread covici
Duncan <1i5t5.dun...@cox.net> wrote:

> covici posted on Fri, 25 Dec 2015 16:14:58 -0500 as excerpted:
> 
> > Henk Slager  wrote:
> > 
> >> On Fri, Dec 25, 2015 at 11:03 AM,   wrote:
> >> > Hi.  I created a file system using 4.3.1 version of btrfsprogs and
> >> > have been using it for some three days.  I have gotten the following
> >> > errors in the log this morning:
> 
> >> > Dec 25 04:10:16 ccs.covici.com kernel: BTRFS (device dm-20): parent
> >> > transid verify failed on 51776421888 wanted 4983 found 4981
> 
> [Several of these within a second, same block and transids, wanted 4983, 
> found 4981.]
> 
> >> > The file system was then made read only.  I unmounted, did a check
> >> > without repair which said it was fine, and remounted successfully in
> >> > read/write mode, but am I in trouble?  This was on a solid state
> >> > drive using lvm.
> >> What kernel version are you using?
> >> I think you might have some hardware error or glitch somewhere,
> >> otherwise I don't know why you have such errors. These kind of errors
> >> remind me of SATA/cable failures over quite a period of time (multipe
> >> days). Or something with lvm or trim of SSD.
> >> Any unusual with the SSD if you run  smartctl?
> >> A btrfs check will indeed likely result in an OK for this case.
> >> What about running read-only scrub?
> >> Maybe running  memtest86+  can rule-out the worst case.
> > 
> > I am running 4.1.12-gentoo and btrfs progs 4.3.1.  Same thing happened
> > on another filesystem, so I switched them over to ext4 and no troubles
> > since.  As far as I know the ssd drives are fine, I have been using them
> > for months.  Maybe btrfs needs some more work.  I did do scrubs on the
> > filesystems after I went offline and remounted them, and they were
> > successful, and I got no errors from the lower layers at all.  Maybe
> > I'll try this in a year or so.
> 
> Well, as I seem to say every few posts, btrfs is "still stabilizing, not 
> fully stable and mature", so it's a given that more work is needed, tho 
> it's demonstrated to be "stable enough" for many in daily use, as long as 
> they're generally aware of stability status and are following the admin's 
> rule of backups[1] with the increased risk-factor of running "still 
> stabilizing" filesystems in mind.
> 
> The very close generation/transid numbers, only two commits apart, for 
> the exact same block, within the same second, indicate a quite recent 
> block-write update failure, possibly only a minute or two old.  You could 
> tell how recent by comparing the generation/transid in the superblock 
> (using btrfs-show-super) at as close to the same time as possible, seeing 
> how far ahead it is.
> 
> I'd check smartctl -A for the device(s), then run scrub and check it 
> again, to see if the raw number for ID5, Reallocated_Sector_Ct (or 
> similar for your device) changed.  (I have some experience with this.[2])
> 
> If the raw reallocated sector count goes up, it's obviously the device.  
> If it doesn't but scrub fixes an error, then it's likely elsewhere in the 
> hardware (cabling, power, memory or storage bus errors, sata/scsi 
> controller...).  If scrub detects but can't fix the error the lack of fix 
> is probably due to single mode, with the original error due possibly to a 
> bad shutdown/umount or a btrfs bug.  If scrub says it's fine, then 
> whatever it was was temporary could be due to all sorts of things, from a 
> cosmic ray induced memory error, to btrfs bug, to...
> 
> In any case, if scrub fixes or doesn't detect an error, I'd not worry 
> about it too much, as it doesn't seem to be affecting operation, you 
> didn't get a lockup or backtrace, etc.  In fact, I'd take that as 
> indication of btrfs normal problem detection and self-healing, likely due 
> to being able to pull a valid copy from elsewhere due to raidN or dup 
> redundancy or parity.
> 
> Tho there's no shame in simply deciding btrfs is still too "stabilizing, 
> not fully stable and mature" for you, either.  I know I'd still hesitate 
> to use it in a full production environment, unless I had both good/tested 
> backups and failover in place.  "Good enough for daily use, provided 
> there's backups if you don't consider the data throwaway", is just that; 
> it's not really yet good enough for "I just need it to work, reliably, 
> because it's big money and people's jobs if it doesn't."
> 
> ---
> [1] Admin's rule of backups:  For any given level of backup, you either 
> have it, or by your actions are defining the data to be of less value 
> than the hassle and resources taken to do the backup, multiplied by the 
> risk factor of actually needing that backup.  As a consequence, after the 
> fact protests to the contrary are simply lies, as actions spoke louder 
> than words and they defined the time and hassle saved as more valuable, 
> so the valuable was saved in any case and in this case the user should be 
> happy they saved the 

Re: btrfs problems and fedora 14

2010-11-26 Thread david grant
Thank you all for your help and in particular you cwillu (sounds
strangely formal!).

Yes, I can now boot into a snapshot but I thought it might be helpful to
explain why I thought otherwise.

I am totally anal about having backups of a current operating system
and using those for testing.  I thought that the best way to do this with
btrfs was to rsync the file system to another partition but exclude all
snapshots. This worked very well as long as I mounted only the root file
system of the copy, but what I did was add snapshots to the copy, and at
some point (probably at the start) the btree system was corrupted; I only
saw this on backtracking and checking all the messages. Also, I didn't
want to boot from a snapshot of my working operating system for fear I
could screw things up and have to re-install from scratch.  In order to
try again, I deleted all snapshots from the original system, did an
rsync and checked the copy. I then made a snapshot of the copy via yum,
used rootflags and it worked!!

So, cwillu, after your scolding of me and your (perfectly reasonable)
questioning of my understanding, I did get it together for booting.

BUT I am still left with the problem that caused it for me: how do I
back up (clone?) a btrfs file system with snapshots to another btrfs
partition (apart from using dd)? I just hope I don't get scolded again
and told I am not up to it.




On Wed, 2010-11-24 at 03:19 -0600, cwillu wrote: 
 On Wed, Nov 24, 2010 at 1:32 AM, david grant d...@david-grant.com wrote:
  Hugo, you told me how to mount a snapshot. Thank you, that works but you
  didn't tell me how to boot into it.
 
 He also gave you the command to set the default subvolume/snapshot
 used if you don't provide one:  btrfs subvolume set-default id
 path.  There's also a standard way to send mount options for the
 root filesystem, which would allow you to use the mount options he
 provided (which Anthony pointed out in his email).
 
  Anthony, I really hoped that you had provided the answer using grub but
  all combinations of your suggestions result in a boot failure with
  standard error message of unable to mount root because of of wrong fs
  type etc. I assume that with your suggestion I need a standard fstab
  entry with default options but it doesn't work even with subvol options.
  I am always nervous of messing with the MBR so I want to stick with
  grub.
 
 He meant that you distribution uses an initial ram filesystem loaded
 into memory with necessary modules, placed in the same place as the
 kernel image that grub loads.  This is unrelated to the MBR.
 
  Perhaps this is a fedora problem but I have to say I find it very
  strange that they tout btrfs as the future, particularly with respect to
  rollbacks but provide no guide to doing this. I assume it is a
  combination of grub boot parameters and fstab but nobody seems to know
  what to do.
 
 The future != the present.  Btrfs will make things like rollback easy
 to implement, but it's not implemented yet in useful way for an
 untechnical user.  The hard technical bits are over and done with by
 the time there are guides on the various forums about how to do
 rollback with btrfs.
 
  I am not a techo so I just need simple instructions. Is there any other
  site, I should be posting this on?
 
 Not to belabour the point, but a more careful reading of what people
 told you would have gotten you up and running.  If those instructions
 were too technical, then you probably shouldn't be using btrfs yet:
 it's very much at a some assembly required stage, and if you don't
 understand how your system boots at a basic-but-technical level,
 you're either going to come away frustrated, or you're going to have
 to learn at least some linux administrator 101.  :)
 
 Understand what the commands people are giving you actually do, and
 you'll have this working in no time.
 
 [sorry for sending this twice David, I consistently fail to hit reply
 to all when replying to mailing lists]  :(



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems and fedora 14

2010-11-26 Thread Oystein Viggen
* [david grant] 

 BUT I am still left with the problem that caused it for me: how do I
 backup (clone?) a btrfs file system with snapshots to another btrfs
 partition (apart from using dd). I just hope I don't get scolded again
 and told I am not up to it.

I don't think you can conveniently clone the filesystem including the
snapshots to another computer or partition using traditional userspace
tools like tar or rsync, since they'd end up de-linking the reflink-ness
of the snapshots, so that all the snapshots end up taking the full
space.

However, I can think of one or two strategies that might help you
achieve something close to what you actually want:

1. If the snapshots are just for online backup, you could backup only
what you consider the live subvol (or even better: a very recent
snapshot of it), and then make snapshots on the target filesystem after
each backup.  While this isn't really a backup including the snapshots,
it might serve the purpose you want.

2. You could rsync the oldest snapshot, make a snapshot of it on the
target filesystem named the same as your second-oldest snapshot, rsync
(--inplace) the second-oldest snapshot into that newly created snapshot,
and repeat until you've done all the snapshots.  My head is already
spinning, but it seems to me that it should be possible to automate this
in a not-too-ugly shell script that also handles updates in a sane way.
This falls to bits, however, if the various snapshots are regularly
written to, or if you can't be sure of their creation order.  (For dated
backup snapshots, there shouldn't be a problem.)  A rough sketch of this
follows below.
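
A very rough, untested sketch of strategy 2, assuming the snapshots live
under /mnt/src/snapshots, sort by name in creation order (e.g. dated
names), and the target btrfs filesystem is mounted at /mnt/dst:

  #!/bin/sh
  SRC=/mnt/src/snapshots
  DST=/mnt/dst/snapshots
  mkdir -p "$DST"
  prev=""
  for snap in $(ls "$SRC" | sort); do
      if [ -z "$prev" ]; then
          # Oldest snapshot: create a fresh subvolume and copy everything.
          btrfs subvolume create "$DST/$snap"
      else
          # Later snapshots: start from a snapshot of the previous copy so
          # unchanged files keep sharing space, then rsync the differences.
          btrfs subvolume snapshot "$DST/$prev" "$DST/$snap"
      fi
      rsync -a --inplace --delete "$SRC/$snap/" "$DST/$snap/"
      prev="$snap"
  done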

What would be really awesome is some sort of btrfs-send program that
handles all this the best way for you, but I don't think that exists
(yet).  User-friendly tools will undoubtedly appear as btrfs is more
widely used, but I guess it's still partly in the "roll your own" early
adopter stage.  :)

Øystein
-- 
Windows is too dangerous to be left to Windows admins.
 -- James Riden in the monastery

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems and fedora 14

2010-11-26 Thread David Pottage

On 26/11/10 10:11, Oystein Viggen wrote:

What would be really awesome is some sort of btrfs-send program that
handles all this the best way for you, but I don't think that exists
(yet).  User friendly tools will undoubtedly appear as btrfs is more
used, but I guess it's still partly in the roll your own early adopter
stage.  :)
   

I agree.

In the past this sort of thing has been handled by adding features to 
tar, so that tar 'knows' how to pack up a filesystem with the latest new 
features (extended meta data for example), and how to restore that data 
at the other end.


--
David Pottage.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems and fedora 14

2010-11-26 Thread cwillu
On Fri, Nov 26, 2010 at 3:40 AM, david grant d...@david-grant.com wrote:
 I am totally anal about having backups of a current operating systems
 and using those for testing I thought tat the best way to do this with
 btrfs was to rsync the file system to another partition but exclude all
 snapshots. This worked very well as long as I mounted only the root file
 system of the copy but what I did was add snapshots to the copy and at
 some point (probably at the start) the btree system was corrupted but I
 only saw this on backtracking and checking all messages. Also, I didn't
 want to boot from a snapshot of my working operating system for fear I
 could screw things up and have to re-install from scratch.  In order to
 try again, I deleted all snapshots from the original system, did an
 rsync and checked the copy. I then made a snapshot of the copy via yum,
 used rootflags and it worked!!

 So, cwillu, after your scolding of me and your (perfectly reasonable)
 questioning of my understanding, I did get it together for booting.

 BUT I am still left with the problem that caused it for me: how do I
 backup (clone?) a btrfs file system with snapshots to another btrfs
 partition (apart from using dd).

I use rsync myself, and explicitly list the subvolumes and mirrors I
want copied, which sounds pretty much like what you were doing.  The
corruption you saw definitely wasn't supposed to happen, but depending
on which kernels you've used and (to a lesser extent) whether a few
particular kernel options are set, it isn't too surprising.  Things
_have_ been pretty stable for me for the last little while, basically
since 2.6.35+btrfs_git, and I use snapshots quite a bit.

What I use in a nutshell is:

# If today's backup subvolume doesn't exist yet, seed it as a snapshot
# of yesterday's backup so unchanged files keep sharing space.
mountpoint ${BACKUP_TO}/${TODAY} || {
    btrfs subvolume snapshot ${BACKUP_TO}/${YESTERDAY} \
        ${BACKUP_TO}/${TODAY} || exit 1
}

# Take a temporary snapshot of the live root, rsync it into today's
# backup, then drop the temporary snapshot again.
btrfs subvolume snapshot / /backup-snap && {
    rsync -vaxR --inplace --delete --ignore-errors /backup-snap/./ \
        ${BACKUP_TO}/${TODAY}/
    btrfs subvolume delete /backup-snap
}

This will give you incremental backups while avoiding the worst of the
duplication.  I haven't verified that rsync actually does anything
useful COW-wise at the file level, but that's the idea behind the
--inplace option (without it, rsync writes to a copy, and replaces the
original, which COW can't help with).

This is still a little ways from actually making new snapshots to
fully reproduce the existing filesystem, but I'm not certain that's
what you were after.

 I just hope I don't get scolded again and told I am not up to it.

In point of fact, I said that you _were_ up to it (you were), and that
you'd have it running in no time (you did) once you understood things
better (you do).  I win?  (I always win)  =D

[Also, please post your replies under the quoted original, not on top.
 Easier to follow the thread that way.]
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems and fedora 14

2010-11-24 Thread cwillu
On Wed, Nov 24, 2010 at 1:32 AM, david grant d...@david-grant.com wrote:
 Hugo, you told me how to mount a snapshot. Thank you, that works but you
 didn't tell me how to boot into it.

He also gave you the command to set the default subvolume/snapshot
used if you don't provide one:  "btrfs subvolume set-default <id>
<path>".  There's also a standard way to send mount options for the
root filesystem, which would allow you to use the mount options he
provided (which Anthony pointed out in his email).

 Anthony, I really hoped that you had provided the answer using grub but
 all combinations of your suggestions result in a boot failure with
 standard error message of unable to mount root because of of wrong fs
 type etc. I assume that with your suggestion I need a standard fstab
 entry with default options but it doesn't work even with subvol options.
 I am always nervous of messing with the MBR so I want to stick with
 grub.

He meant that your distribution uses an initial RAM filesystem loaded
into memory with the necessary modules, placed in the same place as the
kernel image that grub loads.  This is unrelated to the MBR.

 Perhaps this is a fedora problem but I have to say I find it very
 strange that they tout btrfs as the future, particularly with respect to
 rollbacks but provide no guide to doing this. I assume it is a
 combination of grub boot parameters and fstab but nobody seems to know
 what to do.

The future != the present.  Btrfs will make things like rollback easy
to implement, but it's not yet implemented in a useful way for an
untechnical user.  The hard technical bits will be over and done with by
the time there are guides on the various forums about how to do
rollback with btrfs.

 I am not a techo so I just need simple instructions. Is there any other
 site, I should be posting this on?

Not to belabour the point, but a more careful reading of what people
told you would have gotten you up and running.  If those instructions
were too technical, then you probably shouldn't be using btrfs yet:
it's very much at a "some assembly required" stage, and if you don't
understand how your system boots at a basic-but-technical level,
you're either going to come away frustrated, or you're going to have
to learn at least some linux administrator 101.  :)

Understand what the commands people are giving you actually do, and
you'll have this working in no time.

[sorry for sending this twice David, I consistently fail to hit reply
to all when replying to mailing lists]  :(
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems and fedora 14

2010-11-23 Thread david grant
Thank you all for your responses to my boot snapshot problem but it
still exists.
Hugo, you told me how to mount a snapshot. Thank you, that works but you
didn't tell me how to boot into it.

Anthony, I really hoped that you had provided the answer using grub, but
all combinations of your suggestions result in a boot failure with the
standard error message about being unable to mount root because of the
wrong fs type, etc. I assume that with your suggestion I need a standard
fstab entry with default options, but it doesn't work even with subvol
options. I am always nervous of messing with the MBR so I want to stick
with grub.

Perhaps this is a fedora problem but I have to say I find it very
strange that they tout btrfs as the future, particularly with respect to
rollbacks but provide no guide to doing this. I assume it is a
combination of grub boot parameters and fstab but nobody seems to know
what to do.

I am not a techo so I just need simple instructions. Is there any other
site I should be posting this on?

Thanks in anticipation


On Tue, 2010-11-23 at 00:45 -0600, C Anthony Risinger wrote: 
 On Mon, Nov 22, 2010 at 10:47 PM, Wenyi Liu qingshen...@gmail.com wrote:
  2010/11/23, david grant d...@david-grant.com:
  I thought I would try btrfs on a new installation of f14. yes, I know
  its experimental but stable so it seemed to be a good time to try it.
  I am not sure if I have missed something out of all my searching but am
  I correct in thinking that currently:
   I. it is not possible to boot from a snapshot of the operating
  system and, in particular, the yum snapshots cannot be used for
  that purpose
 
  Is the Fedora grub support btrfs now?
  In this page http://fedoraproject.org/wiki/Features/SystemRollbackWithBtrfs
  I got the following information:
  (deferred) a patch to grub1 -- on top of the already existing patch to
  support btrfs in grub1 -- to allow selecting between snapshots of the
  boot partition.
 
 all you need to do is add:
 
 subvol=<name of the snapshot>
 
 -- or --
 
 subvolid=<id of the snapshot>
 
 to your kernel boot line (edit in grub on the fly)... however, if
 fedora is like archlinux in this respect (brief google search seems to
 agree), you will actually need to add this:
 
 rootflags=subvol=<name of the snapshot>
 
 where `rootflags` are the mount options passed to the initramfs/root
 device.  also, you really don't need grub, whatsoever[1]; in arch,
 we use an initramfs hook to perform system rollback by dynamically
 modifying the rootflags in accordance with the user's choice:
 
 http://aur.archlinux.org/packages/mkinitcpio-btrfs/mkinitcpio-btrfs/btrfs_hook
 
 perhaps someone in fedora can adapt that script... it's rather simple,
 and it's MUCH easier and safer than fiddling with grub legacy[1].
 
 C Anthony
 
 [1] note however, that a proper grub2/extlinux solution is ideal to
 support kernel-level rollbacks.  in the link above, everything is
 rolled back except the kernel (residing on /boot... non-btrfs).
 though, a kexec solution may be possible.
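
As an illustration only (the kernel version, device and snapshot name here
are made up), a grub legacy menu.lst entry with the rootflags added might
look like:

  title Fedora 14 (rollback to snapshot yum_20101120)
      root (hd0,0)
      kernel /vmlinuz-2.6.35.6-45.fc14.x86_64 ro root=/dev/sda3 rootflags=subvol=yum_20101120
      initrd /initramfs-2.6.35.6-45.fc14.x86_64.img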


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs problems and fedora 14

2010-11-22 Thread david grant
I thought I would try btrfs on a new installation of f14. Yes, I know
it's experimental but stable, so it seemed to be a good time to try it.
I am not sure if I have missed something in all my searching, but am
I correct in thinking that currently:
     I. it is not possible to boot from a snapshot of the operating
        system and, in particular, the yum snapshots cannot be used for
        that purpose
    II. it is so easy to create raid arrays of btrfs partitions, but they
        cannot be read by f13 or f14
   III. it is not possible to copy btrfs partitions with snapshots,
        except possibly by the use of dd.
This is not meant to be a put-down of btrfs but a plea for some
clarification and, in particular, for the ability to boot snapshots.

Hope I can get a response.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs problems and fedora 14

2010-11-22 Thread Hugo Mills
   Hi,

On Tue, Nov 23, 2010 at 10:19:43AM +1100, david grant wrote:
 I thought I would try btrfs on a new installation of f14. yes, I know
 its experimental but stable so it seemed to be a good time to try it.
 I am not sure if I have missed something out of all my searching but am
 I correct in thinking that currently: 
  I. it is not possible to boot from a snapshot of the operating
 system and, in particular, the yum snapshots cannot be used for
 that purpose 

   You can use btrfs subvolume set-default to set the default
subvolume that is mounted if no subvol= or subvolid= parameter is
given to mount. (And you can then subsequently access the original
root of the filesystem using mount -o subvolid=0).
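
   As a rough sketch only (the subvolume ID, device and mount point
below are made-up placeholders; take the real ID from the output of
btrfs subvolume list):

   # list subvolumes/snapshots and note the ID of the one you want to boot
   btrfs subvolume list /
   # make that snapshot the default for future mounts of this filesystem
   btrfs subvolume set-default 257 /
   # the top-level root remains reachable explicitly
   mount -o subvolid=0 /dev/sda2 /mnt/btrfs-top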

 II. it is so easy to create raid arrays of btrfs partitions but they
 cannot be read by f13 or f14 

   There's no particular reason that this should be the case. How do
you come to this conclusion? What did you try, what did you expect to
happen, and what actually happened?

III. it is not possible to copy btrfs partitions with snapshots
 except possibly by the use of dd.

   Again, I can't see a reason that this shouldn't work. What are you
trying to do, exactly, and how is it failing?

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I believe that it's closely correlated with ---   
   the aeroswine coefficient.




Re: btrfs problems and fedora 14

2010-11-22 Thread Wenyi Liu
2010/11/23, david grant d...@david-grant.com:
 I thought I would try btrfs on a new installation of f14. yes, I know
 it's experimental but stable so it seemed to be a good time to try it.
 I am not sure if I have missed something out of all my searching but am
 I correct in thinking that currently:
  I. it is not possible to boot from a snapshot of the operating
 system and, in particular, the yum snapshots cannot be used for
 that purpose

Does Fedora's grub support btrfs now?
On this page, http://fedoraproject.org/wiki/Features/SystemRollbackWithBtrfs,
I found the following information:
(deferred) a patch to grub1 -- on top of the already existing patch to
support btrfs in grub1 -- to allow selecting between snapshots of the
boot partition.

 II. it is so easy to create raid arrays of btrfs partitions but they
 cannot be read by f13 or f14
III. it is not possible to copy btrfs partitions with snapshots
 except possibly by the use of dd.
 This is not meant to be a put down of btrfs but a plea to have some
 clarification and in particular the ability to boot snapshots.

 Hope I can get a response




Re: btrfs problems and fedora 14

2010-11-22 Thread C Anthony Risinger
On Mon, Nov 22, 2010 at 10:47 PM, Wenyi Liu qingshen...@gmail.com wrote:
 2010/11/23, david grant d...@david-grant.com:
 I thought I would try btrfs on a new installation of f14. yes, I know
 it's experimental but stable so it seemed to be a good time to try it.
 I am not sure if I have missed something out of all my searching but am
 I correct in thinking that currently:
      I. it is not possible to boot from a snapshot of the operating
         system and, in particular, the yum snapshots cannot be used for
         that purpose

 Does Fedora's grub support btrfs now?
 In this page http://fedoraproject.org/wiki/Features/SystemRollbackWithBtrfs
 I got the following information:
 (deferred) a patch to grub1 -- on top of the already existing patch to
 support btrfs in grub1 -- to allow selecting between snapshots of the
 boot partition.

all you need to do is add:

subvol=<name of the snapshot>

-- or --

subvolid=<id of the snapshot>

to your kernel boot line (edit it in grub on the fly)... however, if
fedora is like archlinux in this respect (a brief google search seems to
agree), you will actually need to add this:

rootflags=subvol=<name of the snapshot>

where `rootflags` are the mount options passed to the initramfs/root
device.  also, you really don't need grub, whatsoever[1]; in arch,
we use an initramfs hook to perform system rollback by dynamically
modifying the rootflags in accordance with the user's choice:

http://aur.archlinux.org/packages/mkinitcpio-btrfs/mkinitcpio-btrfs/btrfs_hook

perhaps someone in fedora can adapt that script... it's rather simple,
and it's MUCH easier and safer than fiddling with grub legacy[1].
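
if you do go the edit-grub-on-the-fly route instead, a grub legacy
stanza might look roughly like this (kernel version, root device and
snapshot name are placeholders, not real values from any system):

title Fedora 14 (boot the yum-20101122 snapshot)
    root (hd0,0)
    kernel /vmlinuz-2.6.35.6-45.fc14.x86_64 ro root=/dev/sda2 rootflags=subvol=yum-20101122 rhgb quiet
    initrd /initramfs-2.6.35.6-45.fc14.x86_64.img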

C Anthony

[1] note however, that a proper grub2/extlinux solution is ideal to
support kernel-level rollbacks.  in the link above, everything is
rolled back except the kernel (residing on /boot... non-btrfs).
though, a kexec solution may be possible.
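a very rough, untested sketch of what that kexec path might look like
(kernel/initramfs paths, root device and snapshot name are placeholders,
and you need kexec-tools installed):

    # load the target kernel/initramfs with rootflags pointing at the snapshot
    kexec -l /boot/vmlinuz-2.6.35.6-45.fc14.x86_64 \
        --initrd=/boot/initramfs-2.6.35.6-45.fc14.x86_64.img \
        --append="ro root=/dev/sda2 rootflags=subvol=yum-20101122"
    # then jump straight into it, skipping firmware/bootloader
    kexec -e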