Re: Bug/regression: Read-only mount not read-only

2015-12-04 Thread Austin S Hemmelgarn

On 2015-12-02 18:40, Qu Wenruo wrote:
> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
>> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>>> On a side note, do either XFS or ext4 support removing the norecovery
>>> option from the mount flags through mount -o remount?  Even if they
>>> don't, that might be a nice feature to have in BTRFS if we can safely
>>> support it.
>>
>> It's not remountable today on xfs:
>>
>>     /* ro -> rw */
>>     if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
>>         if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
>>             xfs_warn(mp,
>>                 "ro->rw transition prohibited on norecovery mount");
>>             return -EINVAL;
>>         }
>>
>> not sure about ext4.
>>
>> -Eric
>
> Not remountable is very good to implement it.
> Makes things super easy to do.
>
> Or we will need to add log replay for remount time.
>
> I'd like to implement it first for non-remountable case as a try.
> And for the option name, I prefer something like "notreereplay", but I
> don't consider it the best one yet

I entirely understand wanting a simple implementation first; my only
point is that it would be a potentially useful feature to have if we
could sanely implement it.







Re: Bug/regression: Read-only mount not read-only

2015-12-04 Thread Austin S Hemmelgarn

On 2015-12-02 18:51, Hugo Mills wrote:
> On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
>> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
>>> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>>>> On a side note, do either XFS or ext4 support removing the norecovery
>>>> option from the mount flags through mount -o remount?  Even if they
>>>> don't, that might be a nice feature to have in BTRFS if we can safely
>>>> support it.
>>>
>>> It's not remountable today on xfs:
>>>
>>>     /* ro -> rw */
>>>     if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
>>>         if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
>>>             xfs_warn(mp,
>>>                 "ro->rw transition prohibited on norecovery mount");
>>>             return -EINVAL;
>>>         }
>>>
>>> not sure about ext4.
>>>
>>> -Eric
>>
>> Not remountable is very good to implement it.
>> Makes things super easy to do.
>>
>> Or we will need to add log replay for remount time.
>>
>> I'd like to implement it first for non-remountable case as a try.
>> And for the option name, I prefer something like "notreereplay", but
>> I don't consider it the best one yet
>
> Thinking out loud...
>
> no-log-replay, no-log, hard-ro, ro-log,
> really-read-only-i-mean-it-this-time-honest-guvnor
>
> Delete hyphens at your pleasure.

Personally, I think no-log-replay (with or without hyphens) is the most 
concise option name.  With something like this, it should be as clear as 
possible what is being done.







Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Eric Sandeen
On 12/2/15 3:23 AM, Qu Wenruo wrote:
> 
> 
> Qu Wenruo wrote on 2015/12/02 17:06 +0800:
>>
>>
>> Russell Coker wrote on 2015/12/02 17:25 +1100:
>>> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
>>>> yes, xfs does; we have "-o norecovery" if you don't want that, or need
>>>> to mount a filesystem with a dirty log on a readonly device.
>>>
>>> That option also works with Ext3/4 so it seems to be a standard way of
>>> dealing
>>> with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
>>> regard.
>>>
>> BTW, does -o norecovery implies -o ro?
>>
>> If not, how does it keep the filesystem consistent?
>>
>> I'd like to follow that ext2/xfs behavior, but I'm not familiar with
>> those filesystems.
>>
>> Thanks,
>> Qu
>>
> 
> OK, norecovery implies ro.

For XFS, it doesn't imply it, it requires it; i.e. both must be stated 
explicitly:

    /*
     * no recovery flag requires a read-only mount
     */
    if ((mp->m_flags & XFS_MOUNT_NORECOVERY) &&
        !(mp->m_flags & XFS_MOUNT_RDONLY)) {
        xfs_warn(mp, "no-recovery mounts must be read-only.");
        return -EINVAL;
    }

ext4 is the same, I believe:

    } else if (test_opt(sb, NOLOAD) && !(sb->s_flags & MS_RDONLY) &&
               ext4_has_feature_journal_needs_recovery(sb)) {
        ext4_msg(sb, KERN_ERR, "required journal recovery "
               "suppressed and not mounted read-only");
        goto failed_mount_wq;

so if you'd like btrfs to be consistent with these, I would not make
norecovery imply ro; rather, I would make it require an explicit ro, i.e.

mount -o ro,norecovery
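
The "requires, not implies" rule can be sketched in isolation as below. This is a minimal userspace illustration, not kernel code: the option bits OPT_RDONLY and OPT_NORECOVERY and the function check_norecovery_opts() are hypothetical names standing in for the XFS and ext4 checks quoted above.

```c
#include <errno.h>

/* Hypothetical option bits; stand-ins for the real per-filesystem flags. */
#define OPT_RDONLY     (1u << 0)
#define OPT_NORECOVERY (1u << 1)

/*
 * Return 0 if the option combination is acceptable, -EINVAL otherwise.
 * norecovery never switches ro on by itself; without an explicit ro
 * the mount is refused.
 */
static int check_norecovery_opts(unsigned int opts)
{
    if ((opts & OPT_NORECOVERY) && !(opts & OPT_RDONLY))
        return -EINVAL;
    return 0;
}
```

With this shape, "ro,norecovery" passes and a bare "norecovery" fails, so the person mounting always has to spell out ro.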

-Eric

> So I think it's possible to do the same thing for btrfs.
> I'll try to do it soon.
> 
> Thanks,
> Qu
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Austin S Hemmelgarn

On 2015-12-02 11:54, Eric Sandeen wrote:
> On 12/2/15 3:23 AM, Qu Wenruo wrote:
>> Qu Wenruo wrote on 2015/12/02 17:06 +0800:
>>> Russell Coker wrote on 2015/12/02 17:25 +1100:
>>>> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
>>>>> yes, xfs does; we have "-o norecovery" if you don't want that, or need
>>>>> to mount a filesystem with a dirty log on a readonly device.
>>>>
>>>> That option also works with Ext3/4 so it seems to be a standard way of
>>>> dealing
>>>> with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
>>>> regard.
>>>
>>> BTW, does -o norecovery implies -o ro?
>>>
>>> If not, how does it keep the filesystem consistent?
>>>
>>> I'd like to follow that ext2/xfs behavior, but I'm not familiar with
>>> those filesystems.
>>>
>>> Thanks,
>>> Qu
>>
>> OK, norecovery implies ro.
>
> For XFS, it doesn't imply it, it requires it; i.e. both must be stated
> explicitly:
>
>     /*
>      * no recovery flag requires a read-only mount
>      */
>     if ((mp->m_flags & XFS_MOUNT_NORECOVERY) &&
>         !(mp->m_flags & XFS_MOUNT_RDONLY)) {
>         xfs_warn(mp, "no-recovery mounts must be read-only.");
>         return -EINVAL;
>     }
>
> ext4 is the same, I believe:
>
>     } else if (test_opt(sb, NOLOAD) && !(sb->s_flags & MS_RDONLY) &&
>                ext4_has_feature_journal_needs_recovery(sb)) {
>         ext4_msg(sb, KERN_ERR, "required journal recovery "
>                "suppressed and not mounted read-only");
>         goto failed_mount_wq;
>
> so if you'd like btrfs to be consistent with these, I would not make
> norecovery imply ro; rather, I would make it require an explicit ro, i.e.
>
> mount -o ro,norecovery

Agreed; with something like that, it should be as blatantly obvious as
possible that you can't write to the FS.


On a side note, do either XFS or ext4 support removing the norecovery 
option from the mount flags through mount -o remount?  Even if they 
don't, that might be a nice feature to have in BTRFS if we can safely 
support it.







Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Hugo Mills
On Wed, Dec 02, 2015 at 12:48:39PM -0500, Austin S Hemmelgarn wrote:
> On 2015-12-02 11:54, Eric Sandeen wrote:
> >On 12/2/15 3:23 AM, Qu Wenruo wrote:
> >>Qu Wenruo wrote on 2015/12/02 17:06 +0800:
> >>>Russell Coker wrote on 2015/12/02 17:25 +1100:
> >>>>On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
> >>>>>yes, xfs does; we have "-o norecovery" if you don't want that, or need
> >>>>>to mount a filesystem with a dirty log on a readonly device.
> >>>>
> >>>>That option also works with Ext3/4 so it seems to be a standard way of
> >>>>dealing
> >>>>with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
> >>>>regard.
[snip]
> >so if you'd like btrfs to be consistent with these, I would not make
> >norecovery imply ro; rather, I would make it require an explicit ro,
> >i.e.
> >
> >mount -o ro,norecovery
> Agreed; with something like that, it should be as blatantly obvious as
> possible that you can't write to the FS.
> 
> On a side note, do either XFS or ext4 support removing the
> norecovery option from the mount flags through mount -o remount?
> Even if they don't, that might be a nice feature to have in BTRFS if
> we can safely support it.

   One minor awkwardness with "norecovery", I've just realised: we
already have a "recovery" mount option. That's going to make things
really confusing if we stick to that name.

   Hugo.

-- 
Hugo Mills | Reintarnation: Coming back from the dead as a
hugo@... carfax.org.uk | hillbilly
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Qu Wenruo



Russell Coker wrote on 2015/12/02 17:25 +1100:
> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
>> yes, xfs does; we have "-o norecovery" if you don't want that, or need
>> to mount a filesystem with a dirty log on a readonly device.
>
> That option also works with Ext3/4 so it seems to be a standard way of dealing
> with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
> regard.


BTW, does -o norecovery imply -o ro?

If not, how does it keep the filesystem consistent?

I'd like to follow that ext2/xfs behavior, but I'm not familiar with 
those filesystems.


Thanks,
Qu




Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Qu Wenruo



Qu Wenruo wrote on 2015/12/02 17:06 +0800:
> Russell Coker wrote on 2015/12/02 17:25 +1100:
>> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
>>> yes, xfs does; we have "-o norecovery" if you don't want that, or need
>>> to mount a filesystem with a dirty log on a readonly device.
>>
>> That option also works with Ext3/4 so it seems to be a standard way of
>> dealing
>> with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
>> regard.
>
> BTW, does -o norecovery implies -o ro?
>
> If not, how does it keep the filesystem consistent?
>
> I'd like to follow that ext2/xfs behavior, but I'm not familiar with
> those filesystems.
>
> Thanks,
> Qu



OK, norecovery implies ro.

So I think it's possible to do the same thing for btrfs.
I'll try to do it soon.

Thanks,
Qu





Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Eric Sandeen
On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:

> On a side note, do either XFS or ext4 support removing the norecovery
> option from the mount flags through mount -o remount?  Even if they
> don't, that might be a nice feature to have in BTRFS if we can safely
> support it.

It's not remountable today on xfs:

    /* ro -> rw */
    if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
        if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
            xfs_warn(mp,
                "ro->rw transition prohibited on norecovery mount");
            return -EINVAL;
        }

not sure about ext4.
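
For anyone who wants to play with the remount rule outside the kernel, here is a rough standalone sketch of the same decision. The names OPT_RDONLY, OPT_NORECOVERY and check_remount() are made up for illustration; only the logic mirrors the XFS excerpt above.

```c
#include <errno.h>

/* Hypothetical option bits; stand-ins for the real per-filesystem flags. */
#define OPT_RDONLY     (1u << 0)
#define OPT_NORECOVERY (1u << 1)

/*
 * cur:     options of the existing mount
 * want_rw: nonzero if the remount request asks for read-write
 *
 * Refuse the ro -> rw transition when the filesystem was mounted with
 * norecovery, because the log was never replayed for that mount.
 */
static int check_remount(unsigned int cur, int want_rw)
{
    if ((cur & OPT_RDONLY) && want_rw && (cur & OPT_NORECOVERY))
        return -EINVAL;
    return 0;
}
```

A plain ro mount can still go rw here; only the norecovery case is rejected, which is the "simple first" behaviour being discussed for btrfs.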

-Eric


Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Duncan
Hugo Mills posted on Wed, 02 Dec 2015 23:51:55 +0000 as excerpted:

> On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
>> 
>> Not remountable is very good to implement it.
>> Makes things super easy to do.
>> 
>> Or we will need to add log replay for remount time.
>> 
>> I'd like to implement it first for non-remountable case as a try. And
>> for the option name, I prefer something like "notreereplay", but I
>> don't consider it the best one yet
> 
>Thinking out loud...
> 
> no-log-replay, no-log, hard-ro, ro-log,
> really-read-only-i-mean-it-this-time-honest-guvnor
> 
> Delete hyphens at your pleasure.

I want the bikeshed green with black polkadots! =:^)

More seriously, ro-noreplay ?

As Hugo says, norecovery clashes with the recovery option we already 
have, so unless we _really_ want to maintain cross-filesystem mount 
option compatibility, that's not going to work.

I'm not sure we want to encourage thinking of it as a log, since it's not 
a log in the journalling-filesystem sense but much more limited.

And I think ro needs to be in there for clarity.

hard-ro strikes my fancy as well, but ro-noreplay seems clearer to me.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Hugo Mills
On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
> 
> 
> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
> >On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
> >
> >>On a side note, do either XFS or ext4 support removing the norecovery
> >>option from the mount flags through mount -o remount?  Even if they
> >>don't, that might be a nice feature to have in BTRFS if we can safely
> >>support it.
> >
> >It's not remountable today on xfs:
> >
> > /* ro -> rw */
> > if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
> > if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
> > xfs_warn(mp,
> > "ro->rw transition prohibited on norecovery mount");
> > return -EINVAL;
> > }
> >
> >not sure about ext4.
> >
> >-Eric
> 
> Not remountable is very good to implement it.
> Makes things super easy to do.
> 
> Or we will need to add log replay for remount time.
> 
> I'd like to implement it first for non-remountable case as a try.
> And for the option name, I prefer something like "notreereplay", but
> I don't consider it the best one yet

   Thinking out loud...

no-log-replay, no-log, hard-ro, ro-log,
really-read-only-i-mean-it-this-time-honest-guvnor

Delete hyphens at your pleasure.

   Hugo.

-- 
Hugo Mills | ORLY? IÄ! R'LYH!
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Qu Wenruo



On 12/03/2015 06:48 AM, Eric Sandeen wrote:
> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>> On a side note, do either XFS or ext4 support removing the norecovery
>> option from the mount flags through mount -o remount?  Even if they
>> don't, that might be a nice feature to have in BTRFS if we can safely
>> support it.
>
> It's not remountable today on xfs:
>
>     /* ro -> rw */
>     if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
>         if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
>             xfs_warn(mp,
>                 "ro->rw transition prohibited on norecovery mount");
>             return -EINVAL;
>         }
>
> not sure about ext4.
>
> -Eric


Not being remountable is very good for implementing it.
It makes things super easy to do.

Otherwise we would need to add log replay at remount time.

I'd like to implement it first for the non-remountable case as a try.
And for the option name, I prefer something like "notreereplay", but I
don't consider it the best one yet.


Thanks,
Qu




Re: Bug/regression: Read-only mount not read-only

2015-12-01 Thread Chris Mason
On Tue, Dec 01, 2015 at 02:46:32PM +0800, Qu Wenruo wrote:
> 
> 
> Chris Mason wrote on 2015/11/30 11:48 -0500:
> >On Sat, Nov 28, 2015 at 01:46:34PM +0000, Hugo Mills wrote:
> >>We've just had someone on IRC with a problem mounting their FS. The
> >>main problem is that they've got a corrupt log tree. That isn't the
> >>subject of this email, though.
> >>
> >>The issue I'd like to raise is that even with -oro as a point
> >>option, the FS is trying to replay the log tree. The dmesg output from
> >>mount -oro is at the end of the email.
> >>
> >>Now, my memory, experience and understanding is that the FS
> >>doesn't, and shouldn't replay the log tree on a RO mount, because the
> >>FS should still be consistent even without the reply, and
> >>RO-means-actually-RO is possible and desirable. (Compared to a
> >>journalling FS, where journal replay is required for a consistent,
> >>usable FS).
> >>
> >>So, this looks to me like a regression that's come in somewhere.
> >>
> >>(Just for completeness, the system in question usually runs 4.2.5,
> >>but the live CD the OP is using is 4.2.3).
> >
> >We do need to replay the log tree, even on readonly mounts.  Otherwise
> >files created and fsunk before crashing may not even exist.
> >
> >We'll bail out of the log replay on readonly media, but otherwise the
> >replay always happens.
> >
> >-chris
> 
> Or disable log_tree (making fsync as slow as sync).
> And there will be no log replay, making RO mount real RO.
> I think we can add it to kernel btrfs documentation.

True, without the log tree there's nothing to replay.

> 
> 
> Or, in my wildest dream, introduce a per-inode tree to record file
> extents/dir items.
> 
> Then fsync will only need to sync the inode file extent/dir item tree.(and
> its direct parent maybe)
> And better random read/write performance.
> 
> Although that's just my dream
> 
> But I'm a little curious about why btrfs choose to pack dir items and file
> extents into the same subvolume tree at design time.
> Unlike most of other file systems(ext4 for example).
> 
> Is it just designed for simplicity?

It's partially simplicity, but it also helps with locality.  When you're
working with lots of files in a single directory, we're able to do many
operations faster because we're not jumping around to other indexes for
individual file extents.

The cost is contention at the top of the btree, which I'm still hoping
to fix without having to go all the way down to per-file trees.

-chris


Re: Bug/regression: Read-only mount not read-only

2015-12-01 Thread Chris Mason
On Mon, Nov 30, 2015 at 05:06:00PM +0000, Hugo Mills wrote:
> On Mon, Nov 30, 2015 at 11:48:01AM -0500, Chris Mason wrote:
> > On Sat, Nov 28, 2015 at 01:46:34PM +0000, Hugo Mills wrote:
> > >We've just had someone on IRC with a problem mounting their FS. The
> > > main problem is that they've got a corrupt log tree. That isn't the
> > > subject of this email, though.
> > > 
> > >The issue I'd like to raise is that even with -oro as a point
> > > option, the FS is trying to replay the log tree. The dmesg output from
> > > mount -oro is at the end of the email.
> > > 
> > >Now, my memory, experience and understanding is that the FS
> > > doesn't, and shouldn't replay the log tree on a RO mount, because the
> > > FS should still be consistent even without the reply, and
> > > RO-means-actually-RO is possible and desirable. (Compared to a
> > > journalling FS, where journal replay is required for a consistent,
> > > usable FS).
> > > 
> > >So, this looks to me like a regression that's come in somewhere.
> > > 
> > >(Just for completeness, the system in question usually runs 4.2.5,
> > > but the live CD the OP is using is 4.2.3).
> > 
> > We do need to replay the log tree, even on readonly mounts.  Otherwise
> > files created and fsunk before crashing may not even exist.
> 
>I'm actually happy with that, as long as the log tree is retained
> until it _can_ be played back. I think it's much more important that
> read-only actually means read-only *as much as is possible* (if for no
> other reason than being able to test the status of the log tree).
> Obviously, for journalling FSes, a journal reply is required by the
> design of the FS, but with a CoW FS, the FS should be consistent if
> possibly outdated with a RO mount.

Normally I'd agree, but we have a long tradition of mounting root
readonly at first for no good reason at all.  This is why reiserfs/ext
(and I think xfs) all replay logs on readonly mounts.  It's not an
admin initiated action but an early stage of boot.

> 
>Maybe there should be a "replay-log" mount option to modify the
> "ro" option to allow the log to be replayed but no further
> modifications? (i.e. keep the plain "ro" case to be the safest option
> that makes the fewest changes to the FS structure -- none).
> 

I'd do it the other way around, have a mount option that is emergency
readonly.

> > We'll bail out of the log replay on readonly media, but otherwise the
> > replay always happens.
> 
>OK, so what was happening in the cases where a filesystem was
> mountable RO, but not RW, and then btrfs-zero-log allowed the FS to be
> mounted? I've handled any number of people with exactly those
> symptoms, and it's been like that for a while. What I saw on IRC a
> couple of days ago seems to be new behaviour.

Something else was being skipped, probably btrfs_cleanup_fs_roots()

-chris


Re: Bug/regression: Read-only mount not read-only

2015-12-01 Thread Eric Sandeen
On 12/1/15 1:00 PM, Chris Mason wrote:
> On Mon, Nov 30, 2015 at 05:06:00PM +0000, Hugo Mills wrote:
>> On Mon, Nov 30, 2015 at 11:48:01AM -0500, Chris Mason wrote:
>>> On Sat, Nov 28, 2015 at 01:46:34PM +0000, Hugo Mills wrote:
>>>>    We've just had someone on IRC with a problem mounting their FS. The
>>>> main problem is that they've got a corrupt log tree. That isn't the
>>>> subject of this email, though.
>>>>
>>>>    The issue I'd like to raise is that even with -oro as a point
>>>> option, the FS is trying to replay the log tree. The dmesg output from
>>>> mount -oro is at the end of the email.
>>>>
>>>>    Now, my memory, experience and understanding is that the FS
>>>> doesn't, and shouldn't replay the log tree on a RO mount, because the
>>>> FS should still be consistent even without the reply, and
>>>> RO-means-actually-RO is possible and desirable. (Compared to a
>>>> journalling FS, where journal replay is required for a consistent,
>>>> usable FS).
>>>>
>>>>    So, this looks to me like a regression that's come in somewhere.
>>>>
>>>>    (Just for completeness, the system in question usually runs 4.2.5,
>>>> but the live CD the OP is using is 4.2.3).
>>>
>>> We do need to replay the log tree, even on readonly mounts.  Otherwise
>>> files created and fsunk before crashing may not even exist.
>>
>>I'm actually happy with that, as long as the log tree is retained
>> until it _can_ be played back. I think it's much more important that
>> read-only actually means read-only *as much as is possible* (if for no
>> other reason than being able to test the status of the log tree).
>> Obviously, for journalling FSes, a journal reply is required by the
>> design of the FS, but with a CoW FS, the FS should be consistent if
>> possibly outdated with a RO mount.
> 
> Normally I'd agree, but we have a long tradition of mounting root
> readonly at first for no good reason at all.  This is why reiserfs/ext
> (and I think xfs) all replay logs on readonly mounts.  It's not an
> admin initiated action but an early stage of boot.

yes, xfs does; we have "-o norecovery" if you don't want that, or need
to mount a filesystem with a dirty log on a readonly device.

TBH I think it comes down to semantics: does a readonly mount mean
that the filesystem will not write to the block device, or does it mean
that you cannot write to the block device through the filesystem?
Subtle difference.

I think most filesystems treat it as "you cannot write to the filesystem"
but will still replay the log for consistency, because that's what is
normally expected.

If you're doing forensics, run blockdev --setro /dev/blah to be sure; use
fs-specific mount options to bypass any log replay that would otherwise
be done, and have at it ...
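
For reference, what blockdev --setro does underneath is the BLKROSET ioctl on the device node. A minimal C sketch of that call follows; the helper name set_blockdev_ro() is made up here, it is Linux-specific, and it typically needs root to succeed on a real device.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKROSET */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Mark a block device read-only (ro = 1) or read-write (ro = 0) at the
 * block layer. Returns 0 on success or a negative errno value.
 */
static int set_blockdev_ro(const char *dev, int ro)
{
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -errno;

    int ret = 0;
    if (ioctl(fd, BLKROSET, &ro) < 0)
        ret = -errno;   /* save errno before close() can change it */

    close(fd);
    return ret;
}
```

Calling set_blockdev_ro("/dev/blah", 1) before mounting gives the "filesystem will not write to the block device" guarantee at the device level, independent of what any particular filesystem does with the ro mount flag.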

-Eric



Re: Bug/regression: Read-only mount not read-only

2015-11-30 Thread Hugo Mills
On Mon, Nov 30, 2015 at 11:48:01AM -0500, Chris Mason wrote:
> On Sat, Nov 28, 2015 at 01:46:34PM +0000, Hugo Mills wrote:
> >We've just had someone on IRC with a problem mounting their FS. The
> > main problem is that they've got a corrupt log tree. That isn't the
> > subject of this email, though.
> > 
> >The issue I'd like to raise is that even with -oro as a point
> > option, the FS is trying to replay the log tree. The dmesg output from
> > mount -oro is at the end of the email.
> > 
> >Now, my memory, experience and understanding is that the FS
> > doesn't, and shouldn't replay the log tree on a RO mount, because the
> > FS should still be consistent even without the reply, and
> > RO-means-actually-RO is possible and desirable. (Compared to a
> > journalling FS, where journal replay is required for a consistent,
> > usable FS).
> > 
> >So, this looks to me like a regression that's come in somewhere.
> > 
> >(Just for completeness, the system in question usually runs 4.2.5,
> > but the live CD the OP is using is 4.2.3).
> 
> We do need to replay the log tree, even on readonly mounts.  Otherwise
> files created and fsunk before crashing may not even exist.

   I'm actually happy with that, as long as the log tree is retained
until it _can_ be played back. I think it's much more important that
read-only actually means read-only *as much as is possible* (if for no
other reason than being able to test the status of the log tree).
Obviously, for journalling FSes, a journal replay is required by the
design of the FS, but with a CoW FS, the FS should be consistent if
possibly outdated with a RO mount.

   Maybe there should be a "replay-log" mount option to modify the
"ro" option to allow the log to be replayed but no further
modifications? (i.e. keep the plain "ro" case to be the safest option
that makes the fewest changes to the FS structure -- none).

> We'll bail out of the log replay on readonly media, but otherwise the
> replay always happens.

   OK, so what was happening in the cases where a filesystem was
mountable RO, but not RW, and then btrfs-zero-log allowed the FS to be
mounted? I've handled any number of people with exactly those
symptoms, and it's been like that for a while. What I saw on IRC a
couple of days ago seems to be new behaviour.

   Hugo.

-- 
Hugo Mills | argc, argv, argh!
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Bug/regression: Read-only mount not read-only

2015-11-30 Thread Austin S Hemmelgarn

On 2015-11-30 11:48, Chris Mason wrote:
> On Sat, Nov 28, 2015 at 01:46:34PM +0000, Hugo Mills wrote:
>> We've just had someone on IRC with a problem mounting their FS. The
>> main problem is that they've got a corrupt log tree. That isn't the
>> subject of this email, though.
>>
>> The issue I'd like to raise is that even with -oro as a point
>> option, the FS is trying to replay the log tree. The dmesg output from
>> mount -oro is at the end of the email.
>>
>> Now, my memory, experience and understanding is that the FS
>> doesn't, and shouldn't replay the log tree on a RO mount, because the
>> FS should still be consistent even without the reply, and
>> RO-means-actually-RO is possible and desirable. (Compared to a
>> journalling FS, where journal replay is required for a consistent,
>> usable FS).
>>
>> So, this looks to me like a regression that's come in somewhere.
>>
>> (Just for completeness, the system in question usually runs 4.2.5,
>> but the live CD the OP is using is 4.2.3).
>
> We do need to replay the log tree, even on readonly mounts.  Otherwise
> files created and fsunk before crashing may not even exist.

I would argue that if a user is trying to mount read-only after a crash 
(that is, the user requests a read-only mount, not if the kernel forces 
it), then that probably means that the user has a specific reason for 
doing so, and doesn't want us writing to the filesystem at all.  I 
understand wanting consistency, but if your system just crashed and your 
FS won't mount RW, then it's probably not a good idea to do anything 
that would cause it to be written to until you've figured out what's 
wrong and fixed it.  Because of how BTRFS is designed, roughly half of
the things that are needed for recovery need a mounted filesystem.  If
you can't mount RW, then something _is_ broken, and you shouldn't be
doing anything to the FS unless the user tells you to.


> We'll bail out of the log replay on readonly media, but otherwise the
> replay always happens.

We have the ability to make a RO mount truly RO, so we should have some 
way to do that without needing to jump through hoops to make the media 
read-only.  Not needing to jump through hoops to do this is a BIG 
selling point for some people (myself included) for a filesystem. 
Perhaps we should provide an option to control if the log replay happens 
at all (and then we wouldn't need btrfs-zero-log)?  Or we could replay 
the log in memory, and only write changes to disk if the FS is mounted RW.






Re: Bug/regression: Read-only mount not read-only

2015-11-30 Thread Chris Mason
On Sat, Nov 28, 2015 at 01:46:34PM +0000, Hugo Mills wrote:
>We've just had someone on IRC with a problem mounting their FS. The
> main problem is that they've got a corrupt log tree. That isn't the
> subject of this email, though.
> 
>The issue I'd like to raise is that even with -oro as a point
> option, the FS is trying to replay the log tree. The dmesg output from
> mount -oro is at the end of the email.
> 
>Now, my memory, experience and understanding is that the FS
> doesn't, and shouldn't replay the log tree on a RO mount, because the
> FS should still be consistent even without the reply, and
> RO-means-actually-RO is possible and desirable. (Compared to a
> journalling FS, where journal replay is required for a consistent,
> usable FS).
> 
>So, this looks to me like a regression that's come in somewhere.
> 
>(Just for completeness, the system in question usually runs 4.2.5,
> but the live CD the OP is using is 4.2.3).

We do need to replay the log tree, even on readonly mounts.  Otherwise
files created and fsunk before crashing may not even exist.

We'll bail out of the log replay on readonly media, but otherwise the
replay always happens.
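
Put as a truth table in code, the behaviour described above looks roughly like this. It is only a sketch of the description in this mail, not the btrfs mount path; would_replay_log() and its parameters are invented names.

```c
#include <stdbool.h>

/*
 * Would the log tree be replayed at mount time?
 *
 * As described above, the ro mount flag alone does not prevent replay;
 * only read-only media does. With no log tree there is nothing to replay.
 */
static bool would_replay_log(bool log_tree_present, bool mounted_ro,
                             bool media_ro)
{
    (void)mounted_ro;   /* deliberately ignored: ro mounts still replay */

    if (!log_tree_present)
        return false;
    return !media_ro;
}
```

The ignored mounted_ro argument is the whole point of the thread: mount -o ro on writable media still ends up on the replay path.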

-chris


Re: Bug/regression: Read-only mount not read-only

2015-11-30 Thread Austin S Hemmelgarn

On 2015-11-30 10:28, Hugo Mills wrote:
> On Mon, Nov 30, 2015 at 09:59:40AM -0500, Austin S Hemmelgarn wrote:
>> On 2015-11-28 08:46, Hugo Mills wrote:
>>> We've just had someone on IRC with a problem mounting their FS. The
>>> main problem is that they've got a corrupt log tree. That isn't the
>>> subject of this email, though.
>>>
>>> The issue I'd like to raise is that even with -oro as a point
>>> option, the FS is trying to replay the log tree. The dmesg output from
>>> mount -oro is at the end of the email.
>>>
>>> Now, my memory, experience and understanding is that the FS
>>> doesn't, and shouldn't replay the log tree on a RO mount, because the
>>> FS should still be consistent even without the reply, and
>>> RO-means-actually-RO is possible and desirable. (Compared to a
>>> journalling FS, where journal replay is required for a consistent,
>>> usable FS).
>>
>> This is exactly how it should behave (being able to say that a RO
>> mount is really RO (if atimes aren't enabled) is a huge selling
>> point).  On a side note, a properly designed journaling filesystem
>> _can_ be made to behave like this, but it makes the filesystem
>> _really_ slow if you don't have enough RAM to cache all the blocks
>> modified by the journal (because each block access has to check the
>> journal for modifications).
>>
>>> So, this looks to me like a regression that's come in somewhere.
>>
>> I'm not sure that this ever worked the way it should.  It should be
>> fixed regardless of what state things were however.
>
> I'm pretty sure it was like that at some point, because I've used
> it as a diagnostic tool: If you can mount OK with -oro, but not
> without, then the log tree is broken, and btrfs-zero-log can be used
> to good effect. (In fact, that's what I was trying to do with the OP
> when I spotted the issue).

Hmm, in that case, it looks like a bisection is in order, but that may 
be tough without a known broken filesystem image.  I would offer to try 
and bisect it myself, but I seem to have misplaced the scripts I had 
been using to automate testing, and I won't be able to take the time to 
look for them properly (and possibly re-write them) until this weekend.







Re: Bug/regression: Read-only mount not read-only

2015-11-30 Thread Austin S Hemmelgarn

On 2015-11-28 08:46, Hugo Mills wrote:
> We've just had someone on IRC with a problem mounting their FS. The
> main problem is that they've got a corrupt log tree. That isn't the
> subject of this email, though.
>
> The issue I'd like to raise is that even with -oro as a mount
> option, the FS is trying to replay the log tree. The dmesg output from
> mount -oro is at the end of the email.
>
> Now, my memory, experience and understanding is that the FS
> doesn't, and shouldn't replay the log tree on a RO mount, because the
> FS should still be consistent even without the replay, and
> RO-means-actually-RO is possible and desirable. (Compared to a
> journalling FS, where journal replay is required for a consistent,
> usable FS).

This is exactly how it should behave (being able to say that a RO mount
is really RO (if atimes aren't enabled) is a huge selling point).  On a
side note, a properly designed journaling filesystem _can_ be made to
behave like this, but it makes the filesystem _really_ slow if you don't
have enough RAM to cache all the blocks modified by the journal (because
each block access has to check the journal for modifications).

> So, this looks to me like a regression that's come in somewhere.

I'm not sure that this ever worked the way it should.  It should be
fixed regardless of what state things were in, however.
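
The side note about journaling filesystems can be illustrated with a small sketch. It is a toy model under stated assumptions, not how any real filesystem is implemented: the struct, the function name, the flat in-memory "disk" and the 16-byte block size are all hypothetical. The disk is never written; every read first checks the journal for a newer copy of the block.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16 /* tiny blocks keep the example readable */

/* One logged modification: a full replacement for a single block. */
struct journal_entry {
    size_t block_no;
    char data[BLOCK_SIZE];
};

/*
 * Read block `block_no` without modifying `disk`.
 * The journal is scanned newest-first so that the latest logged version
 * wins. This scan on every access is the cost mentioned above.
 */
void overlay_read(const char *disk,
                  const struct journal_entry *journal, size_t n_entries,
                  size_t block_no, char out[BLOCK_SIZE])
{
    for (size_t i = n_entries; i-- > 0; ) {
        if (journal[i].block_no == block_no) {
            memcpy(out, journal[i].data, BLOCK_SIZE);
            return;
        }
    }
    /* No journal entry touches this block: fall back to the disk copy. */
    memcpy(out, disk + block_no * BLOCK_SIZE, BLOCK_SIZE);
}
```

An in-memory index from block number to newest journal entry would avoid the linear scan, which is where the RAM requirement in the side note comes from.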







Re: Bug/regression: Read-only mount not read-only

2015-11-30 Thread Hugo Mills
On Mon, Nov 30, 2015 at 09:59:40AM -0500, Austin S Hemmelgarn wrote:
> On 2015-11-28 08:46, Hugo Mills wrote:
> >We've just had someone on IRC with a problem mounting their FS. The
> >main problem is that they've got a corrupt log tree. That isn't the
> >subject of this email, though.
> >
> >The issue I'd like to raise is that even with -oro as a mount
> >option, the FS is trying to replay the log tree. The dmesg output from
> >mount -oro is at the end of the email.
> >
> >Now, my memory, experience and understanding is that the FS
> >doesn't, and shouldn't replay the log tree on a RO mount, because the
> >FS should still be consistent even without the replay, and
> >RO-means-actually-RO is possible and desirable. (Compared to a
> >journalling FS, where journal replay is required for a consistent,
> >usable FS).
> This is exactly how it should behave (being able to say that a RO
> mount is really RO (if atimes aren't enabled) is a huge selling
> point).  On a side note, a properly designed journaling filesystem
> _can_ be made to behave like this, but it makes the filesystem
> _really_ slow if you don't have enough RAM to cache all the blocks
> modified by the journal (because each block access has to check the
> journal for modifications).
> >
> >So, this looks to me like a regression that's come in somewhere.
> I'm not sure that this ever worked the way it should.  It should be
> fixed regardless of what state things were in, however.

   I'm pretty sure it was like that at some point, because I've used
it as a diagnostic tool: If you can mount OK with -oro, but not
without, then the log tree is broken, and btrfs-zero-log can be used
to good effect. (In fact, that's what I was trying to do with the OP
when I spotted the issue).

   Hugo.

-- 
Hugo Mills | You are not stuck in traffic: you are traffic
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |German ad campaign
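
The diagnostic rule in the message above can be written as a small decision function. This is only a sketch of the reasoning: the enum and function names are hypothetical, and the two inputs stand for the results of attempting a mount with -oro and a mount without it.

```c
#include <stdbool.h>

/* Hypothetical labels for this sketch. */
enum diagnosis {
    FS_MOUNTS_FINE,   /* the normal mount works: nothing to diagnose */
    LOG_TREE_SUSPECT, /* ro works, rw fails: candidate for btrfs-zero-log */
    INCONCLUSIVE      /* ro fails too: this test cannot single out the log */
};

enum diagnosis diagnose(bool ro_mount_ok, bool rw_mount_ok)
{
    if (rw_mount_ok)
        return FS_MOUNTS_FINE;
    if (ro_mount_ok)
        return LOG_TREE_SUSPECT;
    return INCONCLUSIVE;
}
```

The rule only discriminates if a read-only mount skips log replay. If -oro also replays the log, as reported in this thread, a corrupt log tree makes both mounts fail and the result is INCONCLUSIVE.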




Re: Bug/regression: Read-only mount not read-only

2015-11-30 Thread Qu Wenruo



Chris Mason wrote on 2015/11/30 11:48 -0500:
> On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
> > We've just had someone on IRC with a problem mounting their FS. The
> > main problem is that they've got a corrupt log tree. That isn't the
> > subject of this email, though.
> >
> > The issue I'd like to raise is that even with -oro as a mount
> > option, the FS is trying to replay the log tree. The dmesg output from
> > mount -oro is at the end of the email.
> >
> > Now, my memory, experience and understanding is that the FS
> > doesn't, and shouldn't replay the log tree on a RO mount, because the
> > FS should still be consistent even without the replay, and
> > RO-means-actually-RO is possible and desirable. (Compared to a
> > journalling FS, where journal replay is required for a consistent,
> > usable FS).
> >
> > So, this looks to me like a regression that's come in somewhere.
> >
> > (Just for completeness, the system in question usually runs 4.2.5,
> > but the live CD the OP is using is 4.2.3).
>
> We do need to replay the log tree, even on readonly mounts.  Otherwise
> files created and fsynced before crashing may not even exist.
>
> We'll bail out of the log replay on readonly media, but otherwise the
> replay always happens.
>
> -chris


Or disable the log tree (making fsync as slow as sync).
Then there will be no log replay, making an RO mount really RO.
I think we can add that to the kernel btrfs documentation.


Or, in my wildest dream, introduce a per-inode tree to record file
extents/dir items.

Then fsync will only need to sync the inode's file extent/dir item
tree (and maybe its direct parent).

That would also give better random read/write performance.

Although that's just my dream.

But I'm a little curious about why btrfs chose to pack dir items and
file extents into the same subvolume tree at design time, unlike most
other file systems (ext4, for example).

Is it just designed for simplicity?

Thanks,
Qu
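
A toy model of the trade-off discussed above, tying the "disable the log tree" suggestion to Chris's point about fsynced files. Everything here is hypothetical (struct, field and function names) and files are reduced to simple counters. On btrfs the log tree is disabled with the notreelog mount option; the sketch only assumes that fsync then falls back to a full commit, as the message describes.

```c
#include <stdbool.h>

struct toy_fs {
    bool tree_log_enabled; /* false models the "disable log tree" case */
    int committed_files;   /* files reachable from the committed trees */
    int logged_files;      /* files recorded only in the log tree */
};

/* fsync of one newly created file. */
void toy_fsync_new_file(struct toy_fs *fs)
{
    if (fs->tree_log_enabled)
        fs->logged_files++;    /* cheap: only the log tree is written */
    else
        fs->committed_files++; /* expensive: behaves like a full sync */
}

/* Number of files visible after a crash and remount. */
int toy_files_after_crash(const struct toy_fs *fs, bool replay_log)
{
    return fs->committed_files + (replay_log ? fs->logged_files : 0);
}
```

With the log tree enabled, skipping replay after a crash hides the fsynced file, which is the problem Chris points out. With it disabled, the file is already committed, so a read-only mount has nothing to replay.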




Bug/regression: Read-only mount not read-only

2015-11-28 Thread Hugo Mills
   We've just had someone on IRC with a problem mounting their FS. The
main problem is that they've got a corrupt log tree. That isn't the
subject of this email, though.

   The issue I'd like to raise is that even with -oro as a mount
option, the FS is trying to replay the log tree. The dmesg output from
mount -oro is at the end of the email.

   Now, my memory, experience and understanding is that the FS
doesn't, and shouldn't replay the log tree on a RO mount, because the
FS should still be consistent even without the replay, and
RO-means-actually-RO is possible and desirable. (Compared to a
journalling FS, where journal replay is required for a consistent,
usable FS).

   So, this looks to me like a regression that's come in somewhere.

   (Just for completeness, the system in question usually runs 4.2.5,
but the live CD the OP is using is 4.2.3).

   Hugo.

[ 2058.530542] BTRFS info (device sda1): disk space caching is enabled
[ 2058.530548] BTRFS: has skinny extents
[ 2060.449981] ------------[ cut here ]------------
[ 2060.450015] WARNING: CPU: 1 PID: 2650 at fs/btrfs/extent-tree.c:6255 __btrfs_free_extent.isra.68+0x8c8/0xd70 [btrfs]()
[ 2060.450031] Modules linked in: bnep bluetooth rfkill ip6t_rpfilter 
ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_nat 
ebtable_broute bridge ebtables ip6table_mangle ip6table_nat nf_conntrack_ipv6 
nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_raw ip6table_filter 
ip6_tables iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_security iptable_raw coretemp kvm_intel 
kvm snd_hda_codec_realtek snd_hda_codec_generic gpio_ich snd_hda_intel 
snd_hda_codec iTCO_wdt iTCO_vendor_support ppdev lpc_ich snd_hda_core snd_hwdep 
snd_seq snd_seq_device snd_pcm snd_timer snd i2c_i801 soundcore mei_me mei 
tpm_infineon parport_pc tpm_tis parport shpchp tpm acpi_cpufreq nfsd 
auth_rpcgss nfs_acl lockd grace isofs squashfs btrfs xor i915 raid6_pq 
hid_logitech_hidpp
[ 2060.450111]  video i2c_algo_bit drm_kms_helper drm uas crc32c_intel 8021q 
garp stp usb_storage llc serio_raw mrp r8169 hid_logitech_dj mii scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua sunrpc loop
[ 2060.450191] CPU: 1 PID: 2650 Comm: mount Tainted: GW   
4.2.3-300.fc23.x86_64 #1
[ 2060.450195] Hardware name: MSI MS-7636/H55M-P31(MS-7636)   , BIOS V1.9 
09/14/2010
[ 2060.450197]   73c9bbcf 8800780af618 
81771fca
[ 2060.450202]    8800780af658 
8109e4a6
[ 2060.450206]  0002 00252f595000 fffe 

[ 2060.450211] Call Trace:
[ 2060.450221]  [] dump_stack+0x45/0x57
[ 2060.450229]  [] warn_slowpath_common+0x86/0xc0
[ 2060.450233]  [] warn_slowpath_null+0x1a/0x20
[ 2060.450252]  [] __btrfs_free_extent.isra.68+0x8c8/0xd70 [btrfs]
[ 2060.450309]  [] ? find_ref_head+0x5a/0x80 [btrfs]
[ 2060.450331]  [] __btrfs_run_delayed_refs+0x998/0x1080 [btrfs]
[ 2060.450351]  [] btrfs_run_delayed_refs.part.73+0x74/0x270 [btrfs]
[ 2060.450371]  [] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[ 2060.450420]  [] btrfs_commit_transaction+0x56/0xad0 [btrfs]
[ 2060.450447]  [] ? free_extent_buffer+0x4f/0xa0 [btrfs]
[ 2060.450474]  [] btrfs_recover_log_trees+0x3ed/0x490 [btrfs]
[ 2060.450501]  [] ? replay_one_extent+0x680/0x680 [btrfs]
[ 2060.450524]  [] open_ctree+0x19be/0x23f0 [btrfs]
[ 2060.450539]  [] btrfs_mount+0x94e/0xa70 [btrfs]
[ 2060.450546]  [] ? find_next_bit+0x15/0x20
[ 2060.450551]  [] ? pcpu_alloc+0x38d/0x670
[ 2060.450557]  [] mount_fs+0x38/0x160
[ 2060.450561]  [] ? __alloc_percpu+0x15/0x20
[ 2060.450565]  [] vfs_kern_mount+0x6b/0x110
[ 2060.450579]  [] btrfs_mount+0x1e8/0xa70 [btrfs]
[ 2060.450584]  [] ? pcpu_alloc+0x38d/0x670
[ 2060.450588]  [] mount_fs+0x38/0x160
[ 2060.450592]  [] ? __alloc_percpu+0x15/0x20
[ 2060.450596]  [] vfs_kern_mount+0x6b/0x110
[ 2060.450601]  [] do_mount+0x246/0xce0
[ 2060.450605]  [] ? memdup_user+0x46/0x80
[ 2060.450609]  [] SyS_mount+0x9f/0x100
[ 2060.450616]  [] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2060.450649] ---[ end trace 7b4fe08881eca151 ]---
[ 2060.450655] BTRFS info (device sda1): leaf 437960704 total ptrs 242 free space 2247
[ 2060.450659]  item 0 key (159696797696 169 0) itemoff 16250 itemsize 33
[ 2060.450662]  extent refs 1 gen 21134 flags 2
[ 2060.450664]  tree block backref root 2
[ 2060.450667]  item 1 key (159696830464 169 1) itemoff 16217 itemsize 33
[ 2060.450670]  extent refs 1 gen 21134 flags 2
[ 2060.450672]  tree block backref root 2
[ 2060.450675]  item 2 key (159696846848 169 0) itemoff 16184 itemsize 33
[ 2060.450677]  extent refs 1 gen 21134 flags 2
[ 2060.450679]  tree block backref root 2
[ 2060.450682]  item 3 key (159696863232 169 0) itemoff 16151 itemsize 33
[ 2060.450684]  extent refs 1 gen 21134 flags 2
[ 2060.450686]  tree block backref root 2
[ 2060.450689]  item 4 key (159696879616 169 0) itemoff