Re: Bug/regression: Read-only mount not read-only
On 2015-12-02 18:40, Qu Wenruo wrote:
> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
>> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>>> On a side note, do either XFS or ext4 support removing the norecovery option from the mount flags through mount -o remount? Even if they don't, that might be a nice feature to have in BTRFS if we can safely support it.
>>
>> It's not remountable today on xfs:
>> [snip]
>> not sure about ext4.
>>
>> -Eric
>
> Not being remountable makes it much easier to implement; it makes things super easy to do. Otherwise, we will need to add log replay at remount time.
>
> I'd like to implement the non-remountable case first, as a try. As for the option name, I prefer something like "notreereplay", but I don't consider it the best one yet.

I entirely understand wanting a simple implementation first; my only point is that it would be a potentially useful feature to have if we could sanely implement it.
Re: Bug/regression: Read-only mount not read-only
On 2015-12-02 18:51, Hugo Mills wrote:
> On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
>> [snip]
>> I'd like to implement the non-remountable case first, as a try. As for the option name, I prefer something like "notreereplay", but I don't consider it the best one yet.
>
> Thinking out loud...
>
> no-log-replay, no-log, hard-ro, ro-log, really-read-only-i-mean-it-this-time-honest-guvnor
>
> Delete hyphens at your pleasure.

Personally, I think no-log-replay (with or without hyphens) is the most concise option name. With something like this, it should be as clear as possible what is being done.
Re: Bug/regression: Read-only mount not read-only
On 12/2/15 3:23 AM, Qu Wenruo wrote:
> Qu Wenruo wrote on 2015/12/02 17:06 +0800:
>> Russell Coker wrote on 2015/12/02 17:25 +1100:
>>> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
>>>> yes, xfs does; we have "-o norecovery" if you don't want that, or need to mount a filesystem with a dirty log on a readonly device.
>>>
>>> That option also works with Ext3/4 so it seems to be a standard way of dealing with this. I think that BTRFS should do what Ext3/4 and XFS do in this regard.
>>
>> BTW, does -o norecovery imply -o ro?
>>
>> If not, how does it keep the filesystem consistent?
>>
>> I'd like to follow that ext2/xfs behavior, but I'm not familiar with those filesystems.
>>
>> Thanks,
>> Qu
>
> OK, norecovery implies ro.

For XFS, it doesn't imply it, it requires it; i.e. both must be stated explicitly:

        /*
         * no recovery flag requires a read-only mount
         */
        if ((mp->m_flags & XFS_MOUNT_NORECOVERY) &&
            !(mp->m_flags & XFS_MOUNT_RDONLY)) {
                xfs_warn(mp, "no-recovery mounts must be read-only.");
                return -EINVAL;
        }

ext4 is the same, I believe:

        } else if (test_opt(sb, NOLOAD) && !(sb->s_flags & MS_RDONLY) &&
                   ext4_has_feature_journal_needs_recovery(sb)) {
                ext4_msg(sb, KERN_ERR, "required journal recovery "
                         "suppressed and not mounted read-only");
                goto failed_mount_wq;

so if you'd like btrfs to be consistent with these, I would not make norecovery imply ro; rather, I would make it require an explicit ro, i.e.

mount -o ro,norecovery

-Eric

> So I think it's possible to do the same thing for btrfs.
> I'll try to do it soon.
>
> Thanks,
> Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
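A minimal C sketch of the rule Eric describes for XFS and ext4: norecovery is refused unless ro is also given, rather than silently implying it. The flag macros and `sketch_validate_mount_flags()` are hypothetical illustrations; they are not btrfs, XFS or ext4 definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical mount flags for illustration; not real btrfs definitions. */
#define SKETCH_MOUNT_RDONLY     (1u << 0)
#define SKETCH_MOUNT_NORECOVERY (1u << 1)

/*
 * Mount-time validation: skipping log replay is only accepted when the
 * caller has also asked for a read-only mount explicitly. norecovery
 * does not turn on ro by itself; the mount is refused instead.
 */
static int sketch_validate_mount_flags(unsigned int flags)
{
    if ((flags & SKETCH_MOUNT_NORECOVERY) && !(flags & SKETCH_MOUNT_RDONLY)) {
        fprintf(stderr, "no-recovery mounts must be read-only\n");
        return -EINVAL;
    }
    return 0;
}
```

With these flags, `ro,norecovery` is accepted and a bare `norecovery` fails with `-EINVAL`, so the user always has to spell out that the mount is read-only.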
Re: Bug/regression: Read-only mount not read-only
On 2015-12-02 11:54, Eric Sandeen wrote:
> On 12/2/15 3:23 AM, Qu Wenruo wrote:
>> Qu Wenruo wrote on 2015/12/02 17:06 +0800:
>>> [snip]
>>> BTW, does -o norecovery imply -o ro?
>>>
>>> If not, how does it keep the filesystem consistent?
>>> [snip]
>>
>> OK, norecovery implies ro.
>
> For XFS, it doesn't imply it, it requires it; i.e. both must be stated explicitly:
> [snip]
> so if you'd like btrfs to be consistent with these, I would not make norecovery imply ro; rather, I would make it require an explicit ro, i.e.
>
> mount -o ro,norecovery

Agreed; with something like that, it should be as blatantly obvious as possible that you can't write to the FS.

On a side note, do either XFS or ext4 support removing the norecovery option from the mount flags through mount -o remount? Even if they don't, that might be a nice feature to have in BTRFS if we can safely support it.
Re: Bug/regression: Read-only mount not read-only
On Wed, Dec 02, 2015 at 12:48:39PM -0500, Austin S Hemmelgarn wrote:
> On 2015-12-02 11:54, Eric Sandeen wrote:
>> On 12/2/15 3:23 AM, Qu Wenruo wrote:
>>> Qu Wenruo wrote on 2015/12/02 17:06 +0800:
>>>> Russell Coker wrote on 2015/12/02 17:25 +1100:
>>>>> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
>>>>>> yes, xfs does; we have "-o norecovery" if you don't want that, or need to mount a filesystem with a dirty log on a readonly device.
>>>>>
>>>>> That option also works with Ext3/4 so it seems to be a standard way of dealing with this. I think that BTRFS should do what Ext3/4 and XFS do in this regard.
> [snip]
>> so if you'd like btrfs to be consistent with these, I would not make norecovery imply ro; rather, I would make it require an explicit ro, i.e.
>>
>> mount -o ro,norecovery
>
> Agreed; with something like that, it should be as blatantly obvious as possible that you can't write to the FS.
>
> On a side note, do either XFS or ext4 support removing the norecovery option from the mount flags through mount -o remount? Even if they don't, that might be a nice feature to have in BTRFS if we can safely support it.

One minor awkwardness with "norecovery", I've just realised: we already have a "recovery" mount option. That's going to make things really confusing if we stick to that name.

Hugo.

--
Hugo Mills             | Reintarnation: Coming back from the dead as a
hugo@... carfax.org.uk | hillbilly
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
Re: Bug/regression: Read-only mount not read-only
Russell Coker wrote on 2015/12/02 17:25 +1100:
> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
>> yes, xfs does; we have "-o norecovery" if you don't want that, or need to mount a filesystem with a dirty log on a readonly device.
>
> That option also works with Ext3/4 so it seems to be a standard way of dealing with this. I think that BTRFS should do what Ext3/4 and XFS do in this regard.

BTW, does -o norecovery imply -o ro?

If not, how does it keep the filesystem consistent?

I'd like to follow that ext2/xfs behavior, but I'm not familiar with those filesystems.

Thanks,
Qu
Re: Bug/regression: Read-only mount not read-only
Qu Wenruo wrote on 2015/12/02 17:06 +0800:
> Russell Coker wrote on 2015/12/02 17:25 +1100:
> [snip]
> BTW, does -o norecovery imply -o ro?
>
> If not, how does it keep the filesystem consistent?
>
> I'd like to follow that ext2/xfs behavior, but I'm not familiar with those filesystems.
>
> Thanks,
> Qu

OK, norecovery implies ro.

So I think it's possible to do the same thing for btrfs. I'll try to do it soon.

Thanks,
Qu
Re: Bug/regression: Read-only mount not read-only
On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
> On a side note, do either XFS or ext4 support removing the norecovery option from the mount flags through mount -o remount? Even if they don't, that might be a nice feature to have in BTRFS if we can safely support it.

It's not remountable today on xfs:

        /* ro -> rw */
        if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
                if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
                        xfs_warn(mp,
                "ro->rw transition prohibited on norecovery mount");
                        return -EINVAL;
                }

not sure about ext4.

-Eric
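A minimal C sketch of the XFS remount rule Eric quotes: a ro->rw remount is refused while norecovery is in effect. `struct sketch_mount` and `sketch_remount()` are hypothetical illustrations, not btrfs or XFS code. The alternative Qu mentions in this thread, replaying the log at remount time, is deliberately not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-mount state; field names are illustrative only. */
struct sketch_mount {
    bool rdonly;     /* currently mounted read-only */
    bool norecovery; /* log replay was skipped at mount time */
};

/*
 * Remount to the requested state. A ro -> rw transition is refused on a
 * norecovery mount, mirroring the XFS check. Allowing it would instead
 * require replaying the skipped log at remount time.
 */
static int sketch_remount(struct sketch_mount *mp, bool want_rdonly)
{
    if (mp->rdonly && !want_rdonly && mp->norecovery) {
        fprintf(stderr, "ro->rw transition prohibited on norecovery mount\n");
        return -EINVAL;
    }
    mp->rdonly = want_rdonly;
    return 0;
}
```

Refusing the transition keeps the implementation trivial, which is why Qu prefers to start with the non-remountable case.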
Re: Bug/regression: Read-only mount not read-only
Hugo Mills posted on Wed, 02 Dec 2015 23:51:55 + as excerpted:

> On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
>>
>> Not being remountable makes it much easier to implement; it makes things super easy to do. Otherwise, we will need to add log replay at remount time.
>>
>> I'd like to implement the non-remountable case first, as a try. As for the option name, I prefer something like "notreereplay", but I don't consider it the best one yet.
>
> Thinking out loud...
>
> no-log-replay, no-log, hard-ro, ro-log, really-read-only-i-mean-it-this-time-honest-guvnor
>
> Delete hyphens at your pleasure.

I want the bikeshed green with black polkadots! =:^)

More seriously, ro-noreplay?

As Hugo says, norecovery clashes with the recovery option we already have, so unless we _really_ want to maintain cross-filesystem mount option compatibility, that's not going to work.

I'm not sure we want to encourage thinking of it as a log, since it's not a log in the journalling-filesystem sense but something much more limited. And I think ro needs to be in there for clarity.

hard-ro strikes my fancy as well, but ro-noreplay seems clearer to me.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Bug/regression: Read-only mount not read-only
On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
>> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>>> On a side note, do either XFS or ext4 support removing the norecovery option from the mount flags through mount -o remount? Even if they don't, that might be a nice feature to have in BTRFS if we can safely support it.
>>
>> It's not remountable today on xfs:
>> [snip]
>> not sure about ext4.
>>
>> -Eric
>
> Not being remountable makes it much easier to implement; it makes things super easy to do. Otherwise, we will need to add log replay at remount time.
>
> I'd like to implement the non-remountable case first, as a try. As for the option name, I prefer something like "notreereplay", but I don't consider it the best one yet.

Thinking out loud...

no-log-replay, no-log, hard-ro, ro-log, really-read-only-i-mean-it-this-time-honest-guvnor

Delete hyphens at your pleasure.

Hugo.
Re: Bug/regression: Read-only mount not read-only
On 12/03/2015 06:48 AM, Eric Sandeen wrote:
> On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
>> On a side note, do either XFS or ext4 support removing the norecovery option from the mount flags through mount -o remount? Even if they don't, that might be a nice feature to have in BTRFS if we can safely support it.
>
> It's not remountable today on xfs:
> [snip]
> not sure about ext4.
>
> -Eric

Not being remountable makes it much easier to implement; it makes things super easy to do. Otherwise, we will need to add log replay at remount time.

I'd like to implement the non-remountable case first, as a try. As for the option name, I prefer something like "notreereplay", but I don't consider it the best one yet.

Thanks,
Qu
Re: Bug/regression: Read-only mount not read-only
On Tue, Dec 01, 2015 at 02:46:32PM +0800, Qu Wenruo wrote:
> Chris Mason wrote on 2015/11/30 11:48 -0500:
>> On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
>>> [snip]
>>
>> We do need to replay the log tree, even on readonly mounts. Otherwise files created and fsynced before crashing may not even exist.
>>
>> We'll bail out of the log replay on readonly media, but otherwise the replay always happens.
>>
>> -chris
>
> Or disable the log tree (making fsync as slow as sync). Then there will be no log replay, making an RO mount really RO. I think we can add that to the kernel btrfs documentation.

True, without the log tree there's nothing to replay.

> Or, in my wildest dreams, introduce a per-inode tree to record file extents/dir items.
>
> Then fsync would only need to sync the inode's file extent/dir item tree (and maybe its direct parent), and we'd get better random read/write performance.
>
> Although that's just my dream.
>
> But I'm a little curious why btrfs chose to pack dir items and file extents into the same subvolume tree at design time, unlike most other file systems (ext4, for example).
>
> Is it just designed for simplicity?

It's partially simplicity, but it also helps with locality. When you're working with lots of files in a single directory, we're able to do many operations faster because we're not jumping around to other indexes for individual file extents. The cost is contention at the top of the btree, which I'm still hoping to fix without having to go all the way down to per-file trees.

-chris
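To illustrate the locality argument, here is a simplified C model of btrfs key ordering. Items sort by (objectid, type, offset), so a directory's entries and the inode items and file extents of inodes with nearby numbers (as files created together in one directory usually have) end up adjacent in the same tree. The struct is a simplification of the on-disk key. The type values follow those used by btrfs, but the names, the sample inode numbers and everything else are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of a btrfs key: (objectid, type, offset). */
struct sketch_key {
    uint64_t objectid; /* inode number for items in a subvolume tree */
    uint8_t  type;     /* item type */
    uint64_t offset;   /* type-specific: file offset, dir index, ... */
};

/* Item type values as used by btrfs; names shortened for this sketch. */
enum {
    KEY_INODE_ITEM  = 1,
    KEY_DIR_ITEM    = 84,
    KEY_DIR_INDEX   = 96,
    KEY_EXTENT_DATA = 108,
};

/* Keys compare by objectid, then type, then offset (qsort comparator). */
static int sketch_key_cmp(const void *a, const void *b)
{
    const struct sketch_key *x = a, *y = b;

    if (x->objectid != y->objectid)
        return x->objectid < y->objectid ? -1 : 1;
    if (x->type != y->type)
        return x->type < y->type ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}
```

Sorting a mix of items for a directory (inode 256) and two files in it (257, 258) yields 256's inode item and directory entries, then 257's inode item and extents, then 258's. One tree walk covers the whole directory's worth of work, which is the locality Chris describes. The shared root is also where the contention he mentions comes from.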
Re: Bug/regression: Read-only mount not read-only
On Mon, Nov 30, 2015 at 05:06:00PM +, Hugo Mills wrote:
> On Mon, Nov 30, 2015 at 11:48:01AM -0500, Chris Mason wrote:
>> On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
>>> [snip]
>>
>> We do need to replay the log tree, even on readonly mounts. Otherwise files created and fsynced before crashing may not even exist.
>
> I'm actually happy with that, as long as the log tree is retained until it _can_ be played back. I think it's much more important that read-only actually means read-only *as much as is possible* (if for no other reason than being able to test the status of the log tree). Obviously, for journalling FSes, a journal replay is required by the design of the FS, but with a CoW FS, the FS should be consistent, if possibly outdated, with a RO mount.

Normally I'd agree, but we have a long tradition of mounting root readonly at first for no good reason at all. This is why reiserfs/ext (and I think xfs) all replay logs on readonly mounts. It's not an admin-initiated action but an early stage of boot.

> Maybe there should be a "replay-log" mount option to modify the "ro" option to allow the log to be replayed but no further modifications? (i.e. keep the plain "ro" case to be the safest option that makes the fewest changes to the FS structure -- none).

I'd do it the other way around: have a mount option that is emergency readonly.

>> We'll bail out of the log replay on readonly media, but otherwise the replay always happens.
>
> OK, so what was happening in the cases where a filesystem was mountable RO, but not RW, and then btrfs-zero-log allowed the FS to be mounted? I've handled any number of people with exactly those symptoms, and it's been like that for a while. What I saw on IRC a couple of days ago seems to be new behaviour.

Something else was being skipped, probably btrfs_cleanup_fs_roots().

-chris
Re: Bug/regression: Read-only mount not read-only
On 12/1/15 1:00 PM, Chris Mason wrote:
> On Mon, Nov 30, 2015 at 05:06:00PM +, Hugo Mills wrote:
>> On Mon, Nov 30, 2015 at 11:48:01AM -0500, Chris Mason wrote:
>>> On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
>>>> [snip]
>>>
>>> We do need to replay the log tree, even on readonly mounts. Otherwise files created and fsynced before crashing may not even exist.
>>
>> I'm actually happy with that, as long as the log tree is retained until it _can_ be played back. I think it's much more important that read-only actually means read-only *as much as is possible* (if for no other reason than being able to test the status of the log tree). Obviously, for journalling FSes, a journal replay is required by the design of the FS, but with a CoW FS, the FS should be consistent, if possibly outdated, with a RO mount.
>
> Normally I'd agree, but we have a long tradition of mounting root readonly at first for no good reason at all. This is why reiserfs/ext (and I think xfs) all replay logs on readonly mounts. It's not an admin-initiated action but an early stage of boot.
yes, xfs does; we have "-o norecovery" if you don't want that, or need to mount a filesystem with a dirty log on a readonly device.

TBH I think it comes down to semantics: does a readonly mount mean that the filesystem will not write to the block device, or does it mean that you cannot write to the block device through the filesystem? Subtle difference.

I think most filesystems treat it as "you cannot write to the filesystem" but will still replay the log for consistency, because that's what is normally expected.

If you're doing forensics, blockdev --setro /dev/blah to be sure; use fs-specific mount options to bypass any log replay that would otherwise be done, and have at it ...

-Eric
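The behaviour described so far can be summarised as a small decision function. This is a hypothetical C sketch, not btrfs code, and the names are made up for illustration. Whether the mount itself is ro is deliberately not an input, because a plain ro mount still replays the log. Only an explicit opt-out (norecovery or whatever btrfs ends up calling it) or read-only media prevent the replay.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of the hypothetical decision below. */
enum sketch_replay {
    SKETCH_REPLAY_NOT_NEEDED, /* log is clean: nothing to replay */
    SKETCH_REPLAY_DO,         /* replay the log, even on a plain ro mount */
    SKETCH_REPLAY_SKIP,       /* explicit opt-out: leave the log alone */
    SKETCH_REPLAY_REFUSE,     /* dirty log on read-only media: cannot replay */
};

/*
 * Decide what to do with the log at mount time. The ro mount flag is
 * intentionally not a parameter: a plain ro mount still replays.
 */
static enum sketch_replay sketch_replay_decision(bool log_dirty,
                                                 bool norecovery,
                                                 bool bdev_rdonly)
{
    if (!log_dirty)
        return SKETCH_REPLAY_NOT_NEEDED;
    if (norecovery)
        return SKETCH_REPLAY_SKIP;
    if (bdev_rdonly)
        return SKETCH_REPLAY_REFUSE;
    return SKETCH_REPLAY_DO;
}
```

The forensics recipe Eric gives (set the device read-only, then pass the opt-out) corresponds to `sketch_replay_decision(true, true, true)`, which yields `SKETCH_REPLAY_SKIP` without touching the media.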
Re: Bug/regression: Read-only mount not read-only
On Mon, Nov 30, 2015 at 11:48:01AM -0500, Chris Mason wrote:
> On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
>> [snip]
>> So, this looks to me like a regression that's come in somewhere.
>>
>> (Just for completeness, the system in question usually runs 4.2.5, but the live CD the OP is using is 4.2.3.)
>
> We do need to replay the log tree, even on readonly mounts. Otherwise files created and fsynced before crashing may not even exist.

I'm actually happy with that, as long as the log tree is retained until it _can_ be played back. I think it's much more important that read-only actually means read-only *as much as is possible* (if for no other reason than being able to test the status of the log tree). Obviously, for journalling FSes, a journal replay is required by the design of the FS, but with a CoW FS, the FS should be consistent, if possibly outdated, with a RO mount.

Maybe there should be a "replay-log" mount option to modify the "ro" option to allow the log to be replayed but no further modifications? (i.e. keep the plain "ro" case to be the safest option that makes the fewest changes to the FS structure -- none).

> We'll bail out of the log replay on readonly media, but otherwise the replay always happens.
OK, so what was happening in the cases where a filesystem was mountable RO, but not RW, and then btrfs-zero-log allowed the FS to be mounted? I've handled any number of people with exactly those symptoms, and it's been like that for a while. What I saw on IRC a couple of days ago seems to be new behaviour.

Hugo.
Re: Bug/regression: Read-only mount not read-only
On 2015-11-30 11:48, Chris Mason wrote:
> On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
>> [snip]
>
> We do need to replay the log tree, even on readonly mounts. Otherwise files created and fsynced before crashing may not even exist.

I would argue that if a user is trying to mount read-only after a crash (that is, the user requests a read-only mount, not if the kernel forces it), then that probably means that the user has a specific reason for doing so, and doesn't want us writing to the filesystem at all. I understand wanting consistency, but if your system just crashed and your FS won't mount RW, then it's probably not a good idea to do anything that would cause it to be written to until you've figured out what's wrong and fixed it. Because of how BTRFS is designed, about half of the things typically needed for recovery require a mounted filesystem. If you can't mount RW, then something _is_ broken, and you shouldn't be doing anything to the FS unless the user tells you to.

> We'll bail out of the log replay on readonly media, but otherwise the replay always happens.
We have the ability to make a RO mount truly RO, so we should have some way to do that without needing to jump through hoops to make the media read-only. Not having to jump through hoops to do this is a BIG selling point in a filesystem for some people (myself included).

Perhaps we should provide an option to control whether the log replay happens at all (and then we wouldn't need btrfs-zero-log)? Or we could replay the log in memory, and only write changes to disk if the FS is mounted RW.
Re: Bug/regression: Read-only mount not read-only
On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
> We've just had someone on IRC with a problem mounting their FS. The main problem is that they've got a corrupt log tree. That isn't the subject of this email, though.
>
> The issue I'd like to raise is that even with -oro as a mount option, the FS is trying to replay the log tree. The dmesg output from mount -oro is at the end of the email.
>
> Now, my memory, experience and understanding is that the FS doesn't, and shouldn't, replay the log tree on a RO mount, because the FS should still be consistent even without the replay, and RO-means-actually-RO is possible and desirable. (Compared to a journalling FS, where journal replay is required for a consistent, usable FS.)
>
> So, this looks to me like a regression that's come in somewhere.
>
> (Just for completeness, the system in question usually runs 4.2.5, but the live CD the OP is using is 4.2.3.)

We do need to replay the log tree, even on readonly mounts. Otherwise files created and fsynced before crashing may not even exist.

We'll bail out of the log replay on readonly media, but otherwise the replay always happens.

-chris
Re: Bug/regression: Read-only mount not read-only
On 2015-11-30 10:28, Hugo Mills wrote:
> On Mon, Nov 30, 2015 at 09:59:40AM -0500, Austin S Hemmelgarn wrote:
>> On 2015-11-28 08:46, Hugo Mills wrote:
>>> [snip]
>>> So, this looks to me like a regression that's come in somewhere.
>>
>> I'm not sure that this ever worked the way it should. It should be fixed regardless of what state things were in, however.
>
> I'm pretty sure it was like that at some point, because I've used it as a diagnostic tool: if you can mount OK with -oro, but not without, then the log tree is broken, and btrfs-zero-log can be used to good effect. (In fact, that's what I was trying to do with the OP when I spotted the issue.)

Hmm, in that case, it looks like a bisection is in order, but that may be tough without a known broken filesystem image.
I would offer to try and bisect it myself, but I seem to have misplaced the scripts I had been using to automate testing, and I won't be able to take the time to look for them properly (and possibly re-write them) until this weekend. smime.p7s Description: S/MIME Cryptographic Signature
Re: Bug/regression: Read-only mount not read-only
On 2015-11-28 08:46, Hugo Mills wrote:
> We've just had someone on IRC with a problem mounting their FS. The
> main problem is that they've got a corrupt log tree. That isn't the
> subject of this email, though.
>
> The issue I'd like to raise is that even with -oro as a point
> option, the FS is trying to replay the log tree. The dmesg output from
> mount -oro is at the end of the email.
>
> Now, my memory, experience and understanding is that the FS
> doesn't, and shouldn't replay the log tree on a RO mount, because the
> FS should still be consistent even without the replay, and
> RO-means-actually-RO is possible and desirable. (Compared to a
> journalling FS, where journal replay is required for a consistent,
> usable FS).
This is exactly how it should behave (being able to say that a RO
mount is really RO (if atimes aren't enabled) is a huge selling
point). On a side note, a properly designed journaling filesystem
_can_ be made to behave like this, but it makes the filesystem
_really_ slow if you don't have enough RAM to cache all the blocks
modified by the journal (because each block access has to check the
journal for modifications).
>
> So, this looks to me like a regression that's come in somewhere.
I'm not sure that this ever worked the way it should. It should be
fixed regardless of what state things were however.
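Austin's side note describes how a journaling filesystem could stay truly read-only: rather than replaying the journal onto the disk, it keeps the journaled blocks in memory and checks them on every read. Below is a minimal C sketch of that overlay idea under simplifying assumptions (a tiny fixed-size in-memory "device", and the whole journal already parsed and held in RAM); `struct ro_view`, `note_journal_entry` and `read_block` are hypothetical names, not any real filesystem's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16 /* tiny blocks, to keep the sketch small */
#define NUM_BLOCKS 8

/* Hypothetical in-memory model of a device plus its un-replayed journal. */
struct ro_view {
    unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];    /* on-disk contents; never modified */
    bool journaled[NUM_BLOCKS];                    /* does the journal hold a newer copy? */
    unsigned char journal[NUM_BLOCKS][BLOCK_SIZE]; /* newest journaled copy of each block */
};

/*
 * Record one journal entry, as if read from the on-disk journal at mount
 * time. Entries are assumed to arrive in journal order, so a later entry for
 * the same block replaces an earlier one and the newest copy wins.
 */
static void note_journal_entry(struct ro_view *v, size_t blk,
                               const unsigned char *data)
{
    assert(blk < NUM_BLOCKS);
    memcpy(v->journal[blk], data, BLOCK_SIZE);
    v->journaled[blk] = true;
}

/*
 * Read a block without ever writing to the disk: the journal overlay is
 * consulted first, and only blocks it does not cover come from the disk.
 */
static const unsigned char *read_block(const struct ro_view *v, size_t blk)
{
    assert(blk < NUM_BLOCKS);
    if (v->journaled[blk])
        return v->journal[blk];
    return v->disk[blk];
}
```

The `journaled[blk]` check on every `read_block` call is the per-access cost Austin mentions. In this sketch it stays cheap only because all journaled blocks are assumed to fit in RAM; if they did not, each lookup could require going back to the on-disk journal, which is where the slowdown would come from.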
Re: Bug/regression: Read-only mount not read-only
On Mon, Nov 30, 2015 at 09:59:40AM -0500, Austin S Hemmelgarn wrote:
> On 2015-11-28 08:46, Hugo Mills wrote:
> >We've just had someone on IRC with a problem mounting their FS. The
> >main problem is that they've got a corrupt log tree. That isn't the
> >subject of this email, though.
> >
> >The issue I'd like to raise is that even with -oro as a point
> >option, the FS is trying to replay the log tree. The dmesg output from
> >mount -oro is at the end of the email.
> >
> >Now, my memory, experience and understanding is that the FS
> >doesn't, and shouldn't replay the log tree on a RO mount, because the
> >FS should still be consistent even without the replay, and
> >RO-means-actually-RO is possible and desirable. (Compared to a
> >journalling FS, where journal replay is required for a consistent,
> >usable FS).
> This is exactly how it should behave (being able to say that a RO
> mount is really RO (if atimes aren't enabled) is a huge selling
> point). On a side note, a properly designed journaling filesystem
> _can_ be made to behave like this, but it makes the filesystem
> _really_ slow if you don't have enough RAM to cache all the blocks
> modified by the journal (because each block access has to check the
> journal for modifications).
> >
> >So, this looks to me like a regression that's come in somewhere.
> I'm not sure that this ever worked the way it should. It should be
> fixed regardless of what state things were however.

I'm pretty sure it was like that at some point, because I've used
it as a diagnostic tool: If you can mount OK with -oro, but not
without, then the log tree is broken, and btrfs-zero-log can be used
to good effect. (In fact, that's what I was trying to do with the OP
when I spotted the issue).

Hugo.

-- 
Hugo Mills | You are not stuck in traffic: you are traffic
hugo@... carfax.org.uk | http://carfax.org.uk/
PGP: E2AB1DE4 | German ad campaign
Re: Bug/regression: Read-only mount not read-only
Chris Mason wrote on 2015/11/30 11:48 -0500:
> On Sat, Nov 28, 2015 at 01:46:34PM +, Hugo Mills wrote:
>> We've just had someone on IRC with a problem mounting their FS. The
>> main problem is that they've got a corrupt log tree. That isn't the
>> subject of this email, though.
>>
>> The issue I'd like to raise is that even with -oro as a point
>> option, the FS is trying to replay the log tree. The dmesg output from
>> mount -oro is at the end of the email.
>>
>> Now, my memory, experience and understanding is that the FS
>> doesn't, and shouldn't replay the log tree on a RO mount, because the
>> FS should still be consistent even without the replay, and
>> RO-means-actually-RO is possible and desirable. (Compared to a
>> journalling FS, where journal replay is required for a consistent,
>> usable FS).
>>
>> So, this looks to me like a regression that's come in somewhere.
>>
>> (Just for completeness, the system in question usually runs 4.2.5,
>> but the live CD the OP is using is 4.2.3).
>
> We do need to replay the log tree, even on readonly mounts. Otherwise
> files created and fsunk before crashing may not even exist.
>
> We'll bail out of the log replay on readonly media, but otherwise the
> replay always happens.
>
> -chris

Or disable log_tree (making fsync as slow as sync).
Then there will be no log replay, making a RO mount really RO.

I think we can add that to the btrfs kernel documentation.

Or, in my wildest dream, introduce a per-inode tree to record file
extents/dir items.
Then fsync would only need to sync the inode's file extent/dir item
tree (and maybe its direct parent), and random read/write performance
would be better.
Although that's just my dream.

But I'm a little curious about why btrfs chose to pack dir items and
file extents into the same subvolume tree at design time, unlike most
other file systems (ext4 for example).
Was it just designed for simplicity?

Thanks,
Qu
Bug/regression: Read-only mount not read-only
We've just had someone on IRC with a problem mounting their FS. The
main problem is that they've got a corrupt log tree. That isn't the
subject of this email, though.

The issue I'd like to raise is that even with -oro as a point
option, the FS is trying to replay the log tree. The dmesg output from
mount -oro is at the end of the email.

Now, my memory, experience and understanding is that the FS
doesn't, and shouldn't replay the log tree on a RO mount, because the
FS should still be consistent even without the replay, and
RO-means-actually-RO is possible and desirable. (Compared to a
journalling FS, where journal replay is required for a consistent,
usable FS).

So, this looks to me like a regression that's come in somewhere.

(Just for completeness, the system in question usually runs 4.2.5,
but the live CD the OP is using is 4.2.3).

Hugo.

[ 2058.530542] BTRFS info (device sda1): disk space caching is enabled
[ 2058.530548] BTRFS: has skinny extents
[ 2060.449981] [ cut here ]
[ 2060.450015] WARNING: CPU: 1 PID: 2650 at fs/btrfs/extent-tree.c:6255 __btrfs_free_extent.isra.68+0x8c8/0xd70 [btrfs]()
[ 2060.450031] Modules linked in: bnep bluetooth rfkill ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge ebtables ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security iptable_raw coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic gpio_ich snd_hda_intel snd_hda_codec iTCO_wdt iTCO_vendor_support ppdev lpc_ich snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd i2c_i801 soundcore mei_me mei tpm_infineon parport_pc tpm_tis parport shpchp tpm acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace isofs squashfs btrfs xor i915 raid6_pq hid_logitech_hidpp
[ 2060.450111] video i2c_algo_bit drm_kms_helper drm uas crc32c_intel 8021q garp stp usb_storage llc serio_raw mrp r8169 hid_logitech_dj mii scsi_dh_rdac scsi_dh_emc scsi_dh_alua sunrpc loop
[ 2060.450191] CPU: 1 PID: 2650 Comm: mount Tainted: GW 4.2.3-300.fc23.x86_64 #1
[ 2060.450195] Hardware name: MSI MS-7636/H55M-P31(MS-7636) , BIOS V1.9 09/14/2010
[ 2060.450197] 73c9bbcf 8800780af618 81771fca
[ 2060.450202] 8800780af658 8109e4a6
[ 2060.450206] 0002 00252f595000 fffe
[ 2060.450211] Call Trace:
[ 2060.450221] [] dump_stack+0x45/0x57
[ 2060.450229] [] warn_slowpath_common+0x86/0xc0
[ 2060.450233] [] warn_slowpath_null+0x1a/0x20
[ 2060.450252] [] __btrfs_free_extent.isra.68+0x8c8/0xd70 [btrfs]
[ 2060.450309] [] ? find_ref_head+0x5a/0x80 [btrfs]
[ 2060.450331] [] __btrfs_run_delayed_refs+0x998/0x1080 [btrfs]
[ 2060.450351] [] btrfs_run_delayed_refs.part.73+0x74/0x270 [btrfs]
[ 2060.450371] [] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[ 2060.450420] [] btrfs_commit_transaction+0x56/0xad0 [btrfs]
[ 2060.450447] [] ? free_extent_buffer+0x4f/0xa0 [btrfs]
[ 2060.450474] [] btrfs_recover_log_trees+0x3ed/0x490 [btrfs]
[ 2060.450501] [] ? replay_one_extent+0x680/0x680 [btrfs]
[ 2060.450524] [] open_ctree+0x19be/0x23f0 [btrfs]
[ 2060.450539] [] btrfs_mount+0x94e/0xa70 [btrfs]
[ 2060.450546] [] ? find_next_bit+0x15/0x20
[ 2060.450551] [] ? pcpu_alloc+0x38d/0x670
[ 2060.450557] [] mount_fs+0x38/0x160
[ 2060.450561] [] ? __alloc_percpu+0x15/0x20
[ 2060.450565] [] vfs_kern_mount+0x6b/0x110
[ 2060.450579] [] btrfs_mount+0x1e8/0xa70 [btrfs]
[ 2060.450584] [] ? pcpu_alloc+0x38d/0x670
[ 2060.450588] [] mount_fs+0x38/0x160
[ 2060.450592] [] ? __alloc_percpu+0x15/0x20
[ 2060.450596] [] vfs_kern_mount+0x6b/0x110
[ 2060.450601] [] do_mount+0x246/0xce0
[ 2060.450605] [] ? memdup_user+0x46/0x80
[ 2060.450609] [] SyS_mount+0x9f/0x100
[ 2060.450616] [] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2060.450649] ---[ end trace 7b4fe08881eca151 ]---
[ 2060.450655] BTRFS info (device sda1): leaf 437960704 total ptrs 242 free space 2247
[ 2060.450659] item 0 key (159696797696 169 0) itemoff 16250 itemsize 33
[ 2060.450662] extent refs 1 gen 21134 flags 2
[ 2060.450664] tree block backref root 2
[ 2060.450667] item 1 key (159696830464 169 1) itemoff 16217 itemsize 33
[ 2060.450670] extent refs 1 gen 21134 flags 2
[ 2060.450672] tree block backref root 2
[ 2060.450675] item 2 key (159696846848 169 0) itemoff 16184 itemsize 33
[ 2060.450677] extent refs 1 gen 21134 flags 2
[ 2060.450679] tree block backref root 2
[ 2060.450682] item 3 key (159696863232 169 0) itemoff 16151 itemsize 33
[ 2060.450684] extent refs 1 gen 21134 flags 2
[ 2060.450686] tree block backref root 2
[ 2060.450689] item 4 key (159696879616 169 0) itemoff