from:"Theodore Ts'o"

Bug#1056145: e2fsprogs: FTBFS on hurd-i386

2023-11-19 Thread Theodore Ts'o

tag 1056145 pending
thanks

On Fri, Nov 17, 2023 at 07:09:27PM +0100, Svante Signell wrote:
> Source: e2fsprogs
> Version: 1.47.0-2
> Severity: important
> Tags: patch
> User: debian-h...@lists.debian.org
> Usertags: hurd
> X-Debbugs-CC: debian-h...@lists.debian.org
> 
> I chose to submit this patch to Debian instead of upstream as recommended. The
> upstream reports seemed to be a little dated:
> https://sourceforge.net/projects/e2fsprogs/ However, that page has the latest
> version available for download.

The upstream git repo is at
https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git and most
discussions happen on the linux-ext4 kernel mailing list on
vger.kernel.org.  I don't really use sourceforge for much besides
uploading the release tarballs mostly for old time's sake.

But that's fine, becuase in addition to being the Debian maintainer
for e2fsprogs, I'malso the upstream maintainer.  :-)

> Regarding the patch it is a very simple ifdef solution to that specific file. 
> As
> can be found in other parts of the code similar definitions are made. Maybe a
> common header file defining PATH_MAX would be a better solution. (or 
> dynamically
> allocating space for strings as is recommended for GNU/Hurd)

We don't have a common header file which includes limit.h, which is
what brings in PATH_MAX.  And the library interfaces are set up much
like gethostname, where the caller allocates the buffer and passes in
a length.

So we'll just do what we've done in other places, and what most other
programs do that seem to deal with Hurd's lack of PATH_MAX.  Unlike
your patch, though, we'll put the #ifndef PATH_MAX at the beginning of
file, after the header files are included.

- Ted

commit 795dcc264f48098ca5b214bba7d1b94189b2e491
Author: Theodore Ts'o 
Date:   Sun Nov 19 21:06:12 2023 -0500

tune2fs.c: define PATH_MAX if it is not defined by the system headers

This is needed to compile on GNU/Hurd.

Addresses-Debian-Bug: #1056145
Signed-off-by: Theodore Ts'o 

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 458f7cf6a..9ffe0d199 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -51,11 +51,15 @@ extern int optind;
 #include 
 #include 
 #include 
-#include 
+#include /* for PATH_MAX */
 #ifdef HAVE_SYS_IOCTL_H
 #include 
 #endif

+#ifndef PATH_MAX
+#define PATH_MAX 4096
+#endif
+
 #include "ext2fs/ext2_fs.h"
 #include "ext2fs/ext2fs.h"
 #include "ext2fs/kernel-jbd.h"

Bug#1055808: fsck at boot always skipped due to APM_EMULATION kernel option

2023-11-11 Thread Theodore Ts'o

On Sat, Nov 11, 2023 at 10:22:21PM +0100, Christoph Biedl wrote:
> Package: e2fsprogs
> Version: 1.47.0-2+b1
> Severity: normal
> Tags: upstream
> X-Debbugs-Cc: debian.a...@manchmal.in-ulm.de
> 
> Greetings,
> 
> Summary: If the APM_EMULATION kernel option is enabled (and built-in),
> fsck during boot may be skipped all the time.
> 
> Longer story: One day I noticed some virtual machines have huge "mount
> count" values, and "next check after" at a date far in the past. (I have
> enable_periodic_fsck set.) After some research I learned I had (by
> accident) enabled the APM_EMULATION kernel option. So /proc/apm exists
> but has more or less static content[1]. The important part is is_on_batt
> in e2fsck/unix.c which reads the AC status as 0xff ("unknown"), and
> effectively treats this as "on battery".
> 
> So may I ask you to re-visit that check in that regard? I'll disable
> that option anyway, other people still might get into trouble because
> of that.

This was behaviour that was requested per Debian bug #205177[1], and
note that it doesn't skip the checks forever.  It just doubles the
check interval (so if check interval is 14 days, it will make it be 28
days), and if the max_mount_count was 20, it will not wait until the
40th mount.   So it's not "always skipped"; it's rather deferred.

[1]  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=205177

I'll also note that for modern file systems, mke2fs disables periodic
checks entirely, by setting check interval to zero and the max mount
count to -1.  For example from the output of "dumpe2fs -h" run on my
laptop's root file system:

Filesystem created:   Sat Dec 19 17:28:55 2020
Last mount time:  Wed Nov  1 13:44:11 2023
Last write time:  Wed Nov  1 13:44:09 2023
Mount count:  58
Maximum mount count:  -1
Last checked: Sun Aug 15 23:12:00 2021
Check interval:   0 ()

So yeah, there is a large mount count, and it hasn't been checked
since August 2021.  But that has nothing to do with is_on_batt()
check.  Are you in fact explicitly enabling periodic fsck checks?  If
this is a cloud image, I would expect the file systems to be new
enough that it should be getting the new mke2fs defaults.

Basically, we assume that modern hardware is more robust, and we're no
longer doing periodic fsck checks on general principles.  This was
more because we didn't trust crappy PC hardware many decades ago.  But
as file systems have gotten bigger, and fsck times started taking
longer and longer, we dispensed with periodic checks entirely.

These days if you are paranoid enough to want to do periodic checks,
the better solution is to rely on e2scrub (assuming that your are
using LVM).

Cheers,

- Ted

Bug#1053111: e2fsprogs FTBFS when systemd.pc changes systemdsystemunitdir

2023-09-28 Thread Theodore Ts'o

On Tue, Sep 26, 2023 at 10:31:41PM +0200, Helmut Grohne wrote:
> Source: e2fsprogs
> Version: 1.47.0-2
> Tags: ftbfs patch
> User: helm...@debian.org
> Usertags: dep17m2
> 
> We want to change the value of systemdsystemunitdir in systemd.pc to
> point below /usr. e2fsprogs' upstream build system consumes this
> variable while the packaging hard codes its current value. When we
> change it, e2fsprogs will FTBFS. Consider applying the attached patch to
> avoid this failure.

Out of curiosity, do you know when this change to systemdsystemunitdir
is planned to happen?  I was planning on releasing a 1.47.1 release in
the next month or two, and I'm curious about the planned timing of the
change in systemd.pc.  Are we talking about something that is going to
happen in days?  weeks?  months?

It would be helpful to know how urgently we need to get an updated
Debian source package released.

Thanks,

- Ted

Bug#1035543: init-system-helpers: new systemd units may not get enabled on upgrades from bullseye if systemd is installed

2023-06-01 Thread Theodore Ts'o

severity 1035543 normal
retitle 1035543 e2fsprogs: on an upgrade from bullseye e2scrub-reap.service may 
be wanted by default.target instead of multi-user.target
thanks

On Thu, Jun 01, 2023 at 07:40:25PM +0100, James Addison wrote:
> > So it's not a big deal; is that correct so this patch is not worth
> > trying to shoehorn in beform Bookworm ships, and this particular patch
> > can be safely downgraded from important, right?
> 
> I think all of that is true, yep.

OK, I've reset this to normal.

> Also, arguing against my own revert-patch: I think it could be said
> that multi-user is the "better" target to use here, because the
> default could be "graphical" or some later-reached system state
> whereas this is a relatively low-level (if small) system cleanup
> service.

Right, that's I believe the point of bug #991349; it's possible that
the system adminsitrator might manually set default.target to point to
graphical.target, per [1].  And since multi-user.target is a subset of
graphical.target, it makes sense to make the Wanted-by to be
multi-user.target.

[1] https://www.baeldung.com/linux/systemd-target-multi-user

And so it's fair to keep this bug open, but I think it makes it a
normal bug, with the resumption that hopefully it can be fixed via 
deb-systemd-helper or init-system-helpers.

In this particular case, since we *always* want it to be
default.target, since the whole *point* is to clean up after a failed
e2scrub, it seems really unlikely to me that the system administrator
would change this.  So this is one where it's probably fair for the
postinstall script to just fix the wanted-by link **always** if the
the systemd unit file says Wanted-by: default.target, and the symlink
is inconsistent with it.

Maybe this will mess with some kind of hidden systemd rules of the
road, but quite frankly, the fact that this is causing so much pain
and confusion, I'm not sure I care.

> I'd agree with downgrading the severity to below RC (and it's your
> package, so feel free, I think?  maybe even close it?).  If anyone
> arrives here/reports other bugs as a result of experiencing it, I
> think we can let them know that it's safe to remove the legacy
> 'default.target' symlink (does that sound correct too?).

I think so.  The symlink should *never* be considered normative,
right?  The unit file should be considered the single source of truth.
If the system administrator wants to do something weird, they are
supposed to copy /lib/systemd/system/e2scrub_reap.service to
/etc/systemd/system/e2scrub_reap.service, and if the symlink is
inconsistent with what's in the unit file (with the one in
/etc/systemd/...  if it exists taking precedence over the one in
/lib/systemd/..., we should just fix the symlink.

So this should be fixable in deb-systemd-helper, or I can just
implement a Big Fat Hammer in e2fsprogs's postinst script.

Does that sound right?

- Ted

Bug#1035543: init-system-helpers: new systemd units may not get enabled on upgrades from bullseye if systemd is installed

2023-06-01 Thread Theodore Ts'o

In addition to Bookworm being hard frozen, I question the importance
of this patch, the bug priority, and whether the title is correct.
After all, at least with respect to e2fsprogs systemd unit *will*
still be enabled.  It will just be enabled using
../multi-user.target/wanted instead of ../default.target/wanted.

Ok, piuparts whines about it, and I agree that it's ideal if things
are the stame regardless of whether the distro is freshly installed or
uprgaded --- but e2scrub-repeat.service will *still* be enabled,
right?  And so the bug title is misleaing, right?

So it's not a big deal; is that correct so this patch is not worth
trying to shoehorn in beform Bookworm ships, and this particular patch
can be safely downgraded from importanted, right?

Or else, what am I missing?

Thanks,

- Ted

P.S.  And repeat after me, "systemd is *so* much more simpler than
init.d scripts because it's declarative"

Bug#1035543: init-system-helpers: new systemd units may not get enabled on upgrades from bullseye if systemd is installed

2023-05-27 Thread Theodore Ts'o

On Sat, May 27, 2023 at 11:09:32PM +0200, Helmut Grohne wrote:
> Hi,
> 
> I sat down with Jochen in Hamburg to try and fix this.
> 
> On Sun, May 14, 2023 at 03:21:24PM -0400, Theodore Ts'o wrote:
> > Can someone send the instructions on how to fix this?
> 
> We wish we could give you. Instead, we document our findings, so maybe the
> next one looking into this bug has a better idea, but for now we give up
> as it is too late for bookworm anyway.

Helmut, Jochem, thanks so much for trying to look into this.  Here's
some additional context from my research.

First of all, the change to use WantedBy=default.target to
Wantedby-multi-user.target, as described in Message #19 of this bug,
was in response to a bug report from Ansgar, bug report #991349:

>I noticed that e2scrub_reap.service uses
>
>  WantedBy=default.target
>
>instead of the more usual
>
>  WantedBy=multi-user.target.
>
>As default.target is usually just an alias for multi-user.target or
>graphical.target, this means it will even be pulled in if someone uses
>some other custom target. This feels rather unexpected.
>
>Is there any reason not to use WantedBy=multi-user.target?

At the time, I thought to myself, sure, makes sense, and made the
change in commit b42c9788c75d ("e2scrub: use
WantedBy=multi-user.target in e2scrub_reap.service"), and in the
commit I noted "Addresses-Debian-Bug: #991349"

As near as I can tell, on a system that started with the Bullseye
version of e2fsprogs, and which has then updated to the Bookform
version e2fsprogs, via periodic updates to testing (Bookworm), the
default.target link still exists:

% ls -l /etc/systemd/system/default.target.wants/e2scrub_reap.service
0 lrwxrwxrwx 1 root root 40 Dec 19  2020 
/etc/systemd/system/default.target.wants/e2scrub_reap.service -> 
/lib/systemd/system/e2scrub_reap.service

... and this is enough for systemctl status to seem to think that
e2scrub_reap is still enabled:

% systemctl status e2scrub_reap
○ e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots
 Loaded: loaded (/lib/systemd/system/e2scrub_reap.service; enabled; preset: 
enable>
 Active: inactive (dead) since Sat 2023-05-27 17:53:22 EDT; 1h 34min ago
   Docs: man:e2scrub_all(8)
Process: 1309 ExecStart=/sbin/e2scrub_all -A -r (code=exited, 
status=0/SUCCESS)
   Main PID: 1309 (code=exited, status=0/SUCCESS)
CPU: 12ms
 ...

So sure, /etc/systemd.d/system/multi-user.target.wants/e2scrub_reap.service
doesn't exist.  *But* it still exists in .../default.target.wants/...
which seems to be enough to keep the e2scrub_reap service enabled.  Right?

What am I missing?

In any case, I am still unclear (a) what is actually broken in this
particular setup, since according to systemctl status the systemd unit
is apparently still appropriate enabled, even if it isn't via the
expected Wanted-b: multi-user.target.

And secondly, (b) what is e2fsprogs's control scripts supposed to have
done differently?  That is, if this is indeed this is a bug in
e2fsprogs --- what did I do wrong, and how do I fix it?

And if the answer is you should never, ever, try to change a Wanted-by
line in a systemd script, because debian's systemd unit file
infrastructure is too fragile to handle this correctly, given that
bookworm is about to ship with "Wanted-by: multi-user.target", what's
the best path forward at this point?

I'll note that e2scrub_reap.service is just a helper unit file which
is only needed to clean up after a system crash while e2scrub is
running --- and that will only happen if the user has edited and
appropriately configured e2scrub in /etc/e2scrub.conf.

So from my maintainer's perspective, what I am going for is that
e2scrub_reap.service and e2scrub_all.timer should *always* be enabled,
since the real control point (as far as I am concerned) is
/etc/e2scrub.conf.  I really don't actually *care* whether it is
enabled via default.target.wanted or multi-user.target.wanted.

If I need to be sent to some systemd re-educational camp to understand
the finer points about default.target vs multi-user.target, and
whether it acctually makes any difference whether the systemd unit
file says "Wanted-by: multi-user.target", but in the upgraded
bullseye->bookworm installation, the symlink is still in
*/default.target.wanted/* --- please point me at the documentation.

Otherwise, I'm beginning to think that nothing is actually broken, and
the bullseye2bookworm piuparts tests is just being overly picky, but
nothing is actually broken in actual practice.  And perhaps I should
just close this bug as "Working as Intended".

Again, what am I missing? 

- Ted

P.S.  I really *am* trying to get with the systemd program, but this
all of this complexity is just hopelessly confusing.  :-( :-( :-(

P.P.S.  And there is actually a case where thi

Bug#1035543: init-system-helpers: new systemd units may not get enabled on upgrades from bullseye if systemd is installed

2023-05-14 Thread Theodore Ts'o

On Sun, May 14, 2023 at 06:03:59PM +0200, Michael Biebl wrote:
> > Please reassign it there together with instructions how to fix it, i.e.
> > what should be done in the maintainer scripts.

Can someone send the instructions on how to fix this?

I'm always amused by people who claim systemd is "simpler" to
understand than init.d scripts.  :-)

I have no clue what's going on; is default.target obsolete?  If we
change the Wanted-by in a systemd unit file, what magic incantation is
required?  Can someone point me at documentation that describes all of
this?

Thanks,

- Ted

Bug#1034746: Bug: mk2efs/mkfs.ext4 does not create the correct amount of inodes if -i is specified with 640k to 832k

2023-04-24 Thread Theodore Ts'o

On Sun, Apr 23, 2023 at 09:34:39AM +0200, The MH wrote:
> Package: e2fsprogs
> Version: 1.46.5
> Severity: minor
> 
> I did not find this bug in the patchnotes for the latest versions on
> e2fsprogs.sourceforge.net/e2fsprogs-release.html, so I assume it is
> still present.
> 
> I stumbled upon this, because I wanted to specifiy -i 768k for my main
> data drive (a 2 TB hard drive) as kind of less "aggressive" option
> than -i 1M or -T largefile.
> 
> I proceeded to test this behaviour against a file container with
> exactly 4 GiB. From what I know the value should increment in steps of
> 512 for every 64k as this is the boundary for one inode block per
> block group which ranges from 16 to 8 in this scenario.
> 
> -i 512k: number of inodes: expected 8192 actual 8192 -> ok
> -i 576k: number of inodes: expected 7680 actual 7680 -> ok
> 
> -i 640k: number of inodes: expected 7168 actual 6656

I'm not sure how you are calculating the "expected" value, but the
inode ratio is literally that.  So when you have an inode ratio of
640k, we take the total size of the file system, and divide it by the
inode ratio to determine the target number of inodes.  So the target is

4GiB / 640k or 4294967296 / 655360 or 6553.60.  This gets rounded up
to 6556.

> -i 704k: number of inodes: expected 6656 actual 6144

And...

4 GiB / 704k or 4294967296 / 720896 or 5957.78

... which gets rounded up to 6144.  So how does the rounding up
happen?  Well, a 4GiB file system, using a 4k block size, has 1048576
4k blocks.  Since there are 32,768 blocks in a block group, that means
there are 32 block groups.  So that means we need roughly 186.18
inodes per block group.  Since the default inode size is 256 bytes,
that means we can have 16 inodes per 4k block, and if we had 11 inode
table blocks per block group, that would work out to be 176 inodes per
block group, which is too small (it's less than 186 inodes), and 12
inode table blocks works out to 192 inodes per block group.  And 32
block groups times 192 inodes per block group works out to 6144, which
is the actual number that you've seen.

So this isn't a bug in mke2fs, but rather an apparent misunderstanding
of how the inode ratio is used.  In general, the basic idea is if the
system administrator thinks that the average size of the inodes is,
say, 704k, we need to make sure that there are at *least* 5958 inodes
(that is, 4 GiB / 704k and since the .78 in 5957.78 makes no sense,
that gets rounded to 5958).  But we need to have an integral number of
inode table blocks, and there's no point in having a partially filled
inode table block, and that's why it ends up being 6144 inodes.  I'm
not at all sure how you came up with your expected 6656 inodes,
consider that 6656 * 704k is 4.47 GiB, which is quite a bit more than
4GiB.

Cheers,

- Ted

Bug#1031640: Bug#1030940: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-21 Thread Theodore Ts'o

On Tue, Feb 21, 2023 at 12:17:20PM +, Christopher Obbard wrote:
> Control: severity -1 important
> Control: retitle -1 e2fsprogs generates filesystems which cannot be
> mounted on systems with older e2fsprogs
> 
> It turns out for debos the situation is a bit different. Since debos
> uses packages on the host to prepare the partitions, new features in 
> e2fsprogs from unstable which are unsupported in the target suite
> causes the built system to not boot.
> 
> One such failure at boot-time of a Debian testing system is when
> systemd-fsck tries to check a filesystem, it fails to mount the device:
> 
>   /dev/mmcblk0p5 has unsupported feature(s): FEATURE_C12
> 
> So, in short we cannot run images built for suites with older e2fsprogs
> on a host system which has e2fsprogs >=1.47.0-1.
> 
> Fixing grub2 will not fix the issue for debos, so I have retitled the
> bug.
> 
> In debos with newer e2fsprogs, we can set the following flag on a
> partition to disable the feature which allows the system running older
> e2fsprogs to boot, but older e2fsprogs doesn't allow us to set this
> flag:
> 
>   features: [ "^orphan_file" ]
> 
> So to my mind, the bug is in e2fsprogs as it doesn't maintain backwards
> compatibility.

E2fsprogs is always going to be adding new features.  Even if I dumb
down /etc/mk2fs.conf to not enable this feature in Bookworm, it *will*
enable this new feature in the next release of Debian.  This is true
for all file systems; we add new features to improve user experience.

So packages that are creating file systems to be consumed by older
target OS's must either (a) be aware of these changes for all file
systems (for example XFS is enabling the "bigtime" feature by default
in Bookworm to fix y2038 bug by expanding the size of the timestamp in
their inodes), OR, (b) run the mke2fs from the target OS in build
chroot.

This might get tricky for some target OS's, such as for example, GNU
Hurd, but the Hurd *absolutely* needs to either use mkfs.ext4 from
Hurd, or if you are using the mke2fs from Linux, you'll need
"mkfs.ext2 -O hurd".

The assumption that mkfs's from file systems never change and provide
full backwards compatibility for all target OS's is very much an
anti-pattern.

Best regards,

- Ted

Bug#1031622: d-i regression since bookworm alpha 1: creates a filesystem with FEATURE_C12 unsupported by the installed e2fsck

2023-02-19 Thread Theodore Ts'o

On Sun, Feb 19, 2023 at 01:23:19PM +, Simon McVittie wrote:
> 
> I think this could be caused by debian-installer having udebs from
> e2fsprogs 1.47.0-1 in the installation environment, so that the version
> used to create the root filesystem has newer feature flags than the
> installed version of e2fsck can cope with (similar to #1030939 and
> #1031325).

I thought the Debian-Installer would only be using udebs from Testing?
I'm confused how it would be picking up e2fsprogs 1.47.0-1 from
unstable, if it is going to be installing.  Currently e2fsprogs is
blocked from migrating to testing precisely because we didn't want d-i
to be picking up e2fsprogs 1.47.0.

That being said FEATURE_C12 is the orphan_file feature which is only
going to be enabled by default in e2fsprogs 1.47.0.  I'm just really
confused why d-i would be picking up the e2fsprogs udebs from
unstable.

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-17 Thread Theodore Ts'o

On Fri, Feb 17, 2023 at 10:34:29PM +0200, Adrian Bunk wrote:
> 
> The same general problem applies in various "building non-Debian 
> embedded Linux filesystem on Debian" situations where the target
> chroot does not contain mkfs.ext4.

In practice, if the root file system is using ext4, the target chroot
has to have e2fsprogs installed so that fsck.ext4 exists.  So I'm not
sure it's all that unreasonable?

> Or if it is present, it would be a chroot for the target architecture
> where you might have to run mkfs.ext4 under qemu.

Sure, but that's not that difficult; I have a script[1] that will
setup chroots for foreign architectures which use binfmt_misc to run
(for example) arm32 or arm64 binaries using qemu on an x86 machine,
without needing to boot an arm32/arm64 kernel.  For a more detailed
explanation see slide #8 of this slide deck[2].  The setup-buildchroot
script is turn-key; my experience is that several new college grads
and interns hired at $WORK have had no problems setting it up.

[1] https://github.com/tytso/xfstests-bld/blob/master/setup-buildchroot#L364
[2] https://thunk.org/android-xfstests

Also, in practice, if you are building an image for a foreign system,
you'll need to have a qemu setup to run the second stage of the
debootstrap in any case.  I've just found it simpler to run the mkfs
and debootstrap in a chroot using qemu-user-static compared to messing
around with debootstrap --second-stage, which requires running a
chroot and qemu in any case.

> All image building tools that support bookworm (including all image 
> building tools we ship in bookworm) have to ensure that what they
> produce works on the intended target. There is no general solution
> *how* to do that that could be applied in all cases.

Well, the general solution we can give them is instructions to adjust
/etc/mke2fs.conf on those systems needing to run those image building
systems.  It's a one-line change.  This can be documented in the
release notes, or in an e2fsprogs NEWS entry.

If the RT really insists, we can edit /etc/mkefs.conf for Bookworm to
not enable metadata_csum_seed by default.  This will make it more
difficult for root file systems cloud VM's to be able adjust the file
system UUID on the fly, while the file system is mounted, so that's
the tradeoff.  Quite frankly, the distro that I really care about for
$WORK is Arch, since Google's Cloud Optimized OS is based off of Arch
userspace packages.  So if the Debian Release Team would like to
disable metadata_csum_seed by default in e2fsprogs, I will make that
change and upload a new version of e2fsprogs 1.47.0.

I don't think that's the right way to go, since I don't consider image
building to be a super-common use case, and those who do that can edit
/etc/mke2fs.conf on their own.  However, I accept that this is the
Release Team's call.

If we do make this change to mke2fs.conf for Bookworm, my intention is
to upload a version of e2fsprogs which reverts this change as soon as
Bookworm ships as stable, so that Debian 13 will enable
metadata-csum-seed by default.  At that point, people using Sid or
Debian Testing will have problems if they want to build images for
Bullseye, and I'll note that Daniel, who originally registered this
objection, was building images using Debian Unstable.  So this will
will only give him a reprieve for a few months before he will need to
push changes to vmdb2 or make that one-line change to
/etc/mke2fs.conf.

If the Debian release team could let me know which path they would
prefer, I would appreciate it.  At this point, the grub2 package
version (2.06-8) which supports metadata-csum-seed has migrated to
testing.  So the only thing the e2fsprogs migration is now blocked on
is the RT's decision on this bug.

Best regards,

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-17 Thread Theodore Ts'o

On Fri, Feb 17, 2023 at 12:43:01PM -0700, Sam Hartman wrote:
> 
> I am not entirely convinced that using current rather than guest
> tools for image building is an anti-pattern.  You've been working on
> filesystems for a long time; I've been working on various image
> building projects since my first watchmaker project at MIT.

For what it's worth, I've been building test appliances[1] for the
purposes of file system testing since 2013 (initially just for Qemu,
but now for Cloud[2] environments as well as Android[3]).  Admittedly,
I haven't been doing this as long as you, but I'm not unfamiliar with
building appliance images using debootstrap, and I've *always* built
my KVM/Qemu images using a build chroot[4], including the mkfs and
debootstrap steps.

[1] https://github.com/tytso/xfstests-bld/blob/master/test-appliance/gen-image
[2] https://thunk.org/gce-xfstests
[3] https://thunk.org/android-xfstests
[4] https://github.com/tytso/xfstests-bld/blob/master/setup-buildchroot

So I'm happy to talk to you off-line about best practices for building
images, but this is something that I do *regularly*, since there are
weekly updates from upstream xfstests.  Thus, I have personal
experience that using a build chroot for image creation really is
*not* *that* *hard*.  For that matter, when I'm doing test appliance
development, I'm running regression tests[5] several times a day which
build images in a chroot.

[5] https://github.com/tytso/xfstests-bld/blob/master/selftests/appliance

Everything is automated.  No fuss, no muss, no dirty dishes.  :-)
Futher, it provides better build reproducibility, no matter who is
building the image, and it also makes full GPL compliance (if you are
publishing the test appliances) much, *much* easier --- I can provide
an exhaustive list of all components (including mkfs.ext4!) and control
scripts involved with the creation of the image.

Cheers,

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-17 Thread Theodore Ts'o

On Fri, Feb 17, 2023 at 02:08:28PM -0500, Theodore Ts'o wrote:
> So enabling what may be
> convenient, but ultimately an anti-pattern is something that hopefully
> in the long-term Debian should be striving towards.

Sigh, I managed to invert the sense of what I was trying to say.  This
should have read:

So enabling what may be convenient, but ultimately an anti-pattern is
something that hopefully in the long-term Debian should be trying to
*avoid*.

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-17 Thread Theodore Ts'o

On Fri, Feb 17, 2023 at 08:51:33AM -0800, Russ Allbery wrote:
> Adrian Bunk  writes:
> 
> > The image creators could just set the features they enable to what they
> > copied from /etc/mke2fs.conf from the target distribution, a label with
> > a timestamp wouldn'tbring much benefit here.
> 
> That's a very good point and I'm embarrassed it wasn't immediately obvious
> to me.  Sorry about the noise.

One other thing I woud add here is (a) this whole discussion of
mke2fs.conf only helps for ext4, and the general problem is something
that extends to all file system.  The immediate question may be ext4
specific, but as I mentioned earlier, XFS is enabling the "bigtime"
feature for the first time in Bookworm.  So enabling what may be
convenient, but ultimately an anti-pattern is something that hopefully
in the long-term Debian should be striving towards.  Yes, it's
annoying and and extra work.  So is using build chroots if we are
building packages for a older Distro versions.  But it's the right
thing to do.

Secondly, (b) there may be a misapprehension that it is possible to
get an identical file system just by controlling the contents of
mke2fs and/or specifying the file system features on the command line.
While this is mostly true, it is not the whole story.  For example,
the size and location of the journal is determined hueristically, and
in the past, this has changed as we have discovered that (for example)
making the default journal size larger would result in better
performance.  The location of the journal has also changed from the
beginning of storage device (low-numbered LBA's) to the middle of the
device.

So just as a binary compiled using the default gcc on Buster might be
different from a binary build using Sid, even if you you force the use
of older glibc shared libraries at link time or use -static to
statically link with the glibc installed in your random desktop
environment, the results from using mke2fs on Debian N may be
different from using mke2fs on Debian M, even if you control for thea
file system features.  (And other things which _are_ controllable on
mke2fs.conf, but which go beyond mke file system features.  For
example, in Bookworm starting with e2fsprogs 1.46.4, the default inode
size is forced to always be 256 bytes, even for small file systems.
This was not true in Buster and older Debian distributions.)

So if you want a bootable image that is identical to what would be
created by using the Buster or Bullseye debian-interaller, you
*really* want to use the mkfs that was supplied by Buster or Bullseye,
and that means running the mkfs from a chroot.

Best regards,

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-16 Thread Theodore Ts'o

On Thu, Feb 16, 2023 at 11:45:23AM -0800, Russ Allbery wrote:
> 
> Yes, I'm probably understating the difficulty of making this change in
> practice inside image building software as it's currently constructed.
> 
> My concern about changing mkfs options is that I worry that this would be
> a constantly-changing target that has to be synchronized across multiple
> pieces of software that are not currently well-aligned with file system
> development.  One thing that would make this much easier would be for mkfs
> to support some sort of compatibility level flag that sets all of the
> options, whatever they may be, to their defaults as of some point in the
> past.  Image building software could then pick a conservative default set
> point when building images for older distributions.  That change still has
> to be added to all of the image building software, but it might be easier
> than building a local chroot of the target distribution before using it to
> build the file system the actual installation is going into.

As a long-term solution, one could image changing the various image
creation tools to do something like "mfks.ext4 -T grub2_dumbdown
/dev/XXX", and then have something like the following in
/etc/mke2fs.conf:

[fs_types]
grub2_dumbdown = {
features = ^metadata_csum_seed,^casefold
}

If you care about being able to run fsck.ext4 on really old
distributions, one could even add things like

[fs_types]
jessie_dumbdown = {
features = ^metadata_csum_seed,^metadata_csum
}

etc.

Maintaining this would be a nightmare, and I'd want to ask for help,
since this would be change if we also want to add dumbdown file system
usage types for Ubuntu, and potentially, other Debian derivitives.

Even if we stick with grub2_dumbdown, again, how far back do we go?
Some people have said, "just one release", but I bet there will be
people wanting to create Stretch installer images using a sid
userspace.  So I'd want to have some kind of formal understanding
about what it is that we feel obliged to support.

Given the number of users of the iamge building tools, it's a pretty
specialized use case with not a lot of users, and they can also just
edit /etc/mke2fs.conf to have their favored default.  From my
e2fsprogs maintainers hat on, that's my position --- I interpret
"users" in the Debian Social Contract for the general users, and not
necessarily something that needs to work for every single exotic use
case, especially if they tend to be more of the power users.

If we're not allowed to inconvenience *any* users, then it's hard to
make forward progress.  Certainly some people (including myself) were
inconvenienced for things like /usr-unification and the transition to
systemd.  We went ahead with the transition even though some users
were inconvenienced.  And quite honestly, if a few power users need to
edit /etc/mke2fs.conf to remove metadata_csum_seed because they want
to do something which is *REALLY* not a good idea (using Debian M to
build a file system meant for use on Debian M-X) --- for *ALL* file
systems, not just ext4.  It may be that some users have gotten lucky,
and it mostly works.  But it's a bad idea, and encouraging this bad
practice is eventually going to lead to tears.

But, if the Debian Release team would like to override my position, my
suggestion would be to just change the default for /etc/mke2fs.conf
for *everyone* running Debian bookworm, and with the understanding
that this will be reverted in Debian testing after the next stable
release, and that moving forward, we make it the image building tools
problem if they want to support this highly dubious practice of using
Debian N+X's mkfs to build images for Debian N.  I would strongly
suggest that they figure out which file system features they need to
explicitly turn off for ext4, xfs, btrfs, etc., if they want to build
old distro versions using whatever random software they have installed
on their desktop.

Best release engineering practice is that you use a known, controlled
build environment, and not whatever random package versions might be
on your desktop.  While that might be "inconvenient" when building
packages, we've learned to live with it.  I use sbuild; others might
use pbuild, or their own custom schroot setup.  But I dare say all
responsible people doing release engineering use a known build
environment.  Why should this be any different if you are building
images --- especially images that you expect *your* users or customers
to be depending on?  What are your responsibilities to them?  (Whether
or not the Debian Social contract applies to them or not.)

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-16 Thread Theodore Ts'o

On Wed, Feb 15, 2023 at 07:39:28PM -0800, Russ Allbery wrote:
> It had never occurred to me before that the version of the system on which
> I run mke2fs would matter as long as I didn't pick a newer file system
> type (ext5 or something).  Now I know!  Until today, I had no idea ext4
> even *had* "incompat" features or that mke2fs could make file systems that
> grub couldn't understand.

While ext2 pioneered the whole concept of "compat", "rocompat" and
"incompat" features, these days pretty much all modern Linux file
systems have this.  As I mentioned earlier, xfs is enabling their
incompat "bigtime" feature for the first time in Bookworm.

File sysems have been evolving pretty much continuously since ext2 was
first released 30 years ago.  Poeple may have gotten lucky that grub
only (mostly) cares about incompat and rocompat features, but things
like adding extended attributes, 64-bit block number support,
post-2038 timestamps, etc., all require changes to the on-disk format.
And may of these updates did not happen at a extN to extN+1 boundary.

As far as the kernel is concerned, ext2/ext3/ext4 is really more about
allowing multiple implementations co-existing.  These days, "ext3"
file systems are handled by the code in fs/ext4/*.c, and the only
reason why we've kept fs/ext2/*.c is to provide sample file system
code more than anything else.  Many distributions (including Debian)
use fs/ext4 to support the ext2 file systems, via kernel config
CONFIG_EXT4_USE_FOR_EXT2.

As I mentioned, for the most part file system developers have tended
to care much more about kernel and file system utility
back-compatibility.  This is probably because we have our own ways of
controlling these issues using controls such as /etc/mke2fs.conf, but
also because we tend to worry much more about data disks than
installers.  And if much fewer Debian users tend to boot using say,
xfs or f2fs, perhaps people have just not noticed and complained.

Moving forward, certainly I and other fs developers will be spending
more time worrying about grub2 handling various file system features.
For example, grub2 doesn't understand the ext4 "casefold" feature, and
someday someone might want to make a Debian derivitive where users'
home directories work like MacOS and Windows...

Documenting this in ext4(5) is a bit challenging, because (a) not all
users use the grub2 bootloader, and (b) that means we'd have to
continuously update ext4(5) as grub2 changes.  But certainly we could
make a generic warning in ext4(5).

It might be that a Wiki is the best place for this.  The Arch and
Gentoo wikis have some of this already, since Arch and Gentoo users
tend to believe in enabling all file system developers (whether it is
good for them or not).  That's great for me in terms of beta testers
for features that I don't think is 100% ready for prime time, but
that's another reason why using a newer mkfs is a bad idea.

For example, the inline_data feature is *not* ready for prime time,
but it's been getting better, and someday we might enable it by
default in Debian N+x.  It'll *mostly* work so long as you don't
stress-test creating writing, truncating, etc., small files, but if
vmdb2 enables inline_data for Bookworm even though the Bookworm
e2fsprogs doesn't enable inline_data by default (but some future
Debian N+x does enable it by default), some users' lives might get
*exciting*, since the Bookworm will probably will be on the 6.1 LTS
kernel.

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-15 Thread Theodore Ts'o

On Wed, Feb 15, 2023 at 04:06:55PM -0700, Sam Hartman wrote:
> 
> You argue about shared libraries for non-packaged binaries.
> I think we mostly don't care about that, and again, I think that's at
> least a generally recognized thing that came out of our focus on
> packages and package dependencies.

Note that package dependencies doesn't allow a binary created on
Debian N to work on Debian N-1.  It just *prevents* the package from
being installed on Debian N-1.  If you care about allowing the package
to be instaslled on Debian N-1, that's what build chroots are for.
That's how we create packages for backports, after all.

I'm making a similar argument for mkfs and bootable images for older
distributions.  Just as you use a build chroot when you build a
package for Debian N-1, you should use a chroot when building a
bootable image for Debian N-1.  And that means using the chroot for
*everything*; not just installing the grub bootloader, but also
running mkfs.

(Note that using the sid grub bootloader while using the sid's
mkfs.ext4 would work in this particular case, but if you want a pure
"Bullseye" image which is identical to what running the Bullseye d-i
would create, then you should run Bullseye's mkfs, not sid's mkfs.)

  - Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-15 Thread Theodore Ts'o

On Wed, Feb 15, 2023 at 01:17:38PM -0700, Sam Hartman wrote:
> 
> I.E. I think your question of "for how long" has a very simple answer
> based on our history: if we care about stability in this instance it's
> for +/-1 Debian release.
> 
> I'm struggling trying to figure out whether we should commit to that
> stability.

I recogniuze that there are precedents that go in both directions.  We
have *never* required that shared library linkages created in Debian N
work in Debian N-1.  Sure, there are workarounds that you can use
(e.g., compiling with -static), but I've listed workarounds for mke2fs
as well.

In other cases, we may have gone out of our way to provide these sorts
of transitions.

> I also think it would be reasonable for the project to decide we care
> about this stability, and that we want bullseye grub to work with a
> filesystem made on sid.

Well, I agree that it's a Distribution level decision.  And if the
distribution decides that we should acccomodate this particular case
case, we can certainly make a Debian-specific change to the
/etc/mke2fs.conf config file which doesn't enable metadata_csum_seed
by default.

> I understand you do not support that stability decision.

The argument I would make is that it is a case by case decision, and
not a global one where *all* all generated objects created by Debian N
have to work in Debian N-1.  For example, this is most manifestly
*not* true for any shared library compiled by default that uses glibc.
And I don't personally consider vmdb2 to be important enough to be
worth this kind of solicitude when we haven't done it for shared
library lankage --- just based on the popcon statistics if nothing
else.

But, if the release team belives that a note in the release notes is
not sufficient, I can certainly change the default for Debian.  The
*reason* why the default features can be configured in
/etc/mke2fs.conf is because distributions or individual users might
want to make different decisions about the defaults than the upstream
maintainer of e2fsprogs.  So I'll certainly abide by the ruling of the
release team in this matter, and I guess if Daniel doesn't like the
decision they make, he can always appeal this to the Debian TC.

The appeal chain on this one seems pretty clear.  As the Debian
maintainer for the e2fsprogs package; I have my opinion on how this
can be resolved.  Daniel has appelaed this to release team, and if
necessary, he can appeal to the Technical Committee, and he can ever
try to organize a Debian GR if he feels so strongly about the issue.

Given that there are some *very* easy workarounds, which I have
listed, just as there are simple workarounds for creating a binary for
Bookworm that will work for Bullseye, I really do believe this is a
tempest in a teacup.

Best regards,

- Ted

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-15 Thread Theodore Ts'o

On Wed, Feb 15, 2023 at 11:47:08AM +0200, Adrian Bunk wrote:
> 
> For normal library dependencies
>   Depends: libc6 (>= 2.34)
> will do the right thing automatically.

Sure, but dependencies only apply if you are using building packages.
If you are not building packages, but just moving binaries between
distribution versions --- which is analogous to what is going on here
when someone moves a file system from one distro version to another.
For example:

% echo 'int main() { printf("Hello, world\n"); return 0; }' > /tmp/foo.c ; gcc 
-w -o /tmp/foo /tmp/foo.c ; /tmp/foo ; schroot -c bullseye-amd64 /tmp/foo
Hello, world
/tmp/foo: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found 
(required by /tmp/foo)

> e2fsprogs adding Breaks against older versions of bootloaders and other 
> software that lacks support for the new default might be a good idea.
> 
> The situtation is different when a relationship cannot be reliably 
> expressed by package dependencies.

Unfortunately, package dependencies won't address the concern raised
by Daniel.  The mkfs.ext4, debootstrap, etc., are being run on a one
version of the distro, such as for example sid or Bookworm, but then
when the grub bootloader is installed --- presumably in a chroot, so
that you get the Bullseye grub on a Bullseye VM image --- it won't
know that the base file system was built on Sid or Bookworm.  Of
course, if you run the mkfs.ext4 in a Bullseye chroot, everything will
work *just* *fine*.  It's just that this isn't Daniel's workflow.

> For the kernel the answer to your "how long" is that packages should 
> also work with the kernel from the previous and the kernel from the
> next Debian release.

This isn't a problem with the kernel.  For the ext4 feature that
Daniel cited, metadata_csum_seed was first supported in 4.4 kernel
(note that Debian *stretch* used the 4.9 kernel and 4.4 was first
released in 2016 --- six years ago).  On the file system utility side,
e2fsprogs 1.43 also support for metadata_csum_seed (again, support
going back to stretch).

The xfs bigtime feature, which was similarly enabled in Bookworm, was
supported in the kernel the xfsprogs for **many** years.  In general,
file system developers do understand the concern of needing to wait
before enabling a feature by default in mke2fs.conf.

The issue here is in grub support.  Part of the issue is that
admittedly, file system developers don't worry as much about
bootloaders as much as we do for kernels and the file system
utilities.  The other is that the grub2 upstream simply hasn't been as
responsive or quick at doing releases.  The fix landed in grub2
upstream in June of 2021.  So all distributions tend to carry large
number of grub2 cherry-picks.

> If the feature stays enabled by default in bookworm:
> Everyone using grub is an x86 thing, for embedded ARM it is more common 
> to use a once ported ancient u-boot on your hardware forever.[1]
> A bug against the release-notes pseudo-package with text describing
> the incompatibility and possible workarounds would be appreciated.

Would you prefer that we repurpose (and retitle) this bug, or open a
new one?

Here's a quick rough draft of what that text might look like, at least
for the ext4 metadata_csum_seed.  (If it's helpful, I can try to also
write a blurb for XFS, although I'd want to run it by the XFS kernel
maintainer, Darrick Wong first.)

 cut here ---

In the Bookworm release, the mkfs.ext4 will enable the orphan_file and
metadata_csum_seed feature.  The orphan_file feature significantly
improves performance for workloads which are truncated or deleting
files in parallel.  The orphan_file feature is an ext4 "compat"
feature, so file systems with the orphan_file feature can be mounted
on older kernels who do not understand the orphan_file feature, so
long as the file system was cleanly unmounted.  If the file system was
not cleanly unmounted (for example, due to a crash or power failure),
the file system must be fsck'ed or mounted on a Bookworm system before
the file system is transferred to an older system.

The metadata_csum_seed feature allows the file system UUID to
be modified without needing to update all of the file system metadata,
which is often important in VM context where root file systems are
cloned from read-only images, and having unique UUID's is important to
avoid confusion when the block device might be mounted on another
whose root file system is similarly cloned from the same base image.

The metadata_csum_seed feature is an ext4 "incompat" feature, which
means that the kernel must understand the feature before it will be
able mount a file system with that feature.  Fortunately, the Linux
kernels starting with 4.4 (and the Debian 9.0 "stretch" release)
support the metadata_csum_seed feature.  So data disks formatted with
metadata_csum_seed can be mounted on systems with older distro
versions without concerns in nearly all case.

Unfortunately, grub2 support for the metadata_csum_seed

Bug#1031325: e2fsprogs 1.47.0 introduces a breaking change into Bookworm, breaking grub and making installations of Ubuntu and Debian releases via debootstrap impossible

2023-02-14 Thread Theodore Ts'o

There is more about this in the referenced bugs, but I dispute
Daniel's characterization of the issue.

I will draw the analogy of building a program which links against
glibc for Bookworm resulting in a binary that will not run on Buster.
We expect that, and we tell people to use build chroots.  This is not
something which is actionable, and we don't hold back glibc updates
just because you can no longer build on Debian 10.0 something that
won't work on Debian 9.0, or 8.0.

The same is true for file system featuers.  We add new features to
improve the user experience.  This is true for all file systems: ext4,
xfs, btrfs.  For example, in Bookworm, the version of xfsprogs is
going to enable the incompat "bigtime" feature for the first time.
This fixes XFS's 2038 problem.  A file system has the bigtime feature
enabled won't boot on Grub versions older than 2.06.  That is just the
way of the world.

As it truns out, for e2fsprogs, users (or distributions) can very
easily change the default file system features by editing the
configuration file /etc/mke2fs.conf.  So if a user wants to ask mke2fs
to disable certain features by default, and "dumb down" the
capabilities of the file system, they can to do that --- on the
command line, by tuning the file system after the fact, or by editing
the /etc/mke2fs.conf file.  They can even set the MKE2FS_CONFIG
environment variable to point at their own custom config file if they
would like.

We can change the default for mke2fs.conf file for Debian.  I don't
think it's warranted, and I'm not aware of any other distribution
doing this.  The fact that file system featuers that fix certain
problems (such as the 2038 bug) will cause older versions the
distribution to fail to accept that file system is always going to be
the case.  So how long do we hold back some new feature?  A year?  Two
years?  Five years?  Ten years?  Again, we don't do this with shared
library linkages; we just tell people to use a build chroot.

I would gently suggest that the most general solution to this problem
is to do the same for building VM images, if people won't want to be
bothered to learn how to configure the mfks program.  After all,
according to popcon, there are 960 times as many people who have use
gcc-10 recently (7685) as vmdb2 (8).

So if we are to hold e2fsprogs and xfsprogs to the standard that a
file system created by default must work on all older Debian and
Ubuntu distributions to some arbitrary point in history, should we
1000 times as much hold gcc-10 and clang to the same standard?

Obviously, that is a ridiculous thing to do.

Best regards,

- Ted

Bug#1030846: Bug#1030939: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-14 Thread Theodore Ts'o

On Tue, Feb 14, 2023 at 07:35:51PM +0100, Daniel Leidert wrote:
> 
> As soon as this version hits testing, you have successfully disabled
> the last working environment to use vmdb2 to create images of Ubuntu
> and Debian. As soon as this version hits Testing, one then can no
> longer build images for either any Ubuntu release nor Debian Bullseye
> or older. I can then only build images for Bookworm and Sid. Please
> think about that.

The number of users who use vmdb2 are quite small; for those users who
do, there is a simple fix.  You can either diable the feature using
tune2fs, or you can just make an edit to your /etc/mke2fs.conf file to
not enable the feature.  A one line change to /etc/mke2fs.conf is
hardly what I'd call "impossible".

> You *seriously* break the debootstrap method of installing Debian or
> Ubuntu.

As you have pointed out, this is not a debootstrap issue, since it
doesn't create the file system.  The uestion is in how the file system
is created, and this is not hard to fix.  You can just run "mke2fs -O
^metadata_csum_seed _file_or_block_device_"; you can run "tune2fs -O
^metadata_csum_seed _file_or_block_device"; you can make a one-line
change to /etc/mke2fs.conf.

> You haven't documented how to disable that
> breaking change when creating filesystems for distributions where grub
> does not support this (there is not even a NEWS entry). You haven't
> even announced that breaking change. And yet you are going to break one
> of our two standard methods of installing Debian or Ubuntu. How is any
> of what you have been saying so far addressing any of these issues?

Sorry, as far as I'm concerned, vmdb2 is not that important.  We don't
document in a NEWS entry or anywhere else, how to build a binary that
links against a newer version of glibc so it will work on an older
system.  And I would consider the compiler toolchain to be far more
common a usecase than vmdb2.  Indeed, your use of "toolchain" for file
system utilities is a new one for me.  I've never heard the term
"toolchain" used in such a way before.

> I do not understand why you are pushing 1.47.0 over 1.46.6, which you
> had uploaded just five days before the former. You haven't even
> presented a reason yet.

It has anumber of new features that will improve ext4's performance on
highly parallel workloads; it makes it possible for cloud VM's to be
able to safely update the root file systems's UUID while it is
mounted, among other changes.

- Ted

Bug#1030846: Bug#1030939: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-14 Thread Theodore Ts'o

There is another issue with vmdb2 if you are using XFS.  Starting with
xfsprogs 5.15 (which is already in testing), bigtime is enabled by
default, so that newly created XFS file systems won't be subject to
timestamp overflow in 2038.  Grub didn't land support for this feature
until 8b1e5d1936ff ("fs/xfs: Add bigtime incompat feature support") in
May 2021, despite the fact that XFS has had this feature for years and
years and years.

So if you aren't using the latest security fixes, and you are using
XFS as the boot partition --- it won't work on buster and bullseye.
"Fortunately", there were were massive number security vulnerabilities
in grub2 which forced a backport of grub2 2.06 to bullseye and buster,
so if you have the security updates enabled, you'll probably be OK ---
but it was only because of massive number of security problems forced
that backport.


In any case, a version of grub that will support the csum_seed feature
will be landing in Bookworm in just a few days.  So at that point,
you'll be able to create VM images for Bookworm and Sid that will work
with the e2fsprogs in sid.  The current plan of record is that it will
only be at that point that e2fsprogs will be allowed to migrate into
Bookworm.

For slowly moving upstreams like grub2, distributions *have* to take
updates before grub2 finally gets around to doing a release --- to get
security fixes if nothing else!  The support for csum_seed has been in
Fedora and other distributions for a while, since the patches had
landed in grub2 in June 2021.  I probably should have made sure the
feature had landed in Debian's grub2 packaging earlier; that's my bad,
and my apologies for that.

Note that Debian's grub2 has well over 100 patches, nearly all of
which are backports from grub2's git repo.  So the argument that
"there doesn't exist a formally released grub2 release" isn't
particularly compelling, since all the distros are backporting
patches.  The only question is how *many* commits release has an
individual distribution taken.


By the way, in the case of the csum_seed feature, it's pretty
straightforward to just run "tune2fs -O ^metadata_csum_seed
/tmp/boot.img".  If the UUID has been changed since the file system
was created, you'll have to do this while the file system is unmounted
and it will take a few seconds, but that's almost certainly not the
case with vmdb2.

   - Ted

Bug#1030846: Bug#1030939: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-13 Thread Theodore Ts'o

On Tue, Feb 14, 2023 at 01:01:38AM +0100, Daniel Leidert wrote:
> Hi Steve,
> 
> I believe that your fix to grub2 in Sid is not enough to handle
> #1030939/#1030846.
> 
> This problem breaks e.g. vmdb2. I can no longer create a Bullseye
> system image with vmdb2 on Sid, because the grub-install step in the
> Bullseye chroot now fails, because the created filesystem (created with
> e2fsprogs on Sid) cannot be recognized by grub.

I'm confused, why does using vmdb2 on *sid* involve running
grub-install on a *bulleye* chroot?  That seems to be extremely
ill-advised.  If you are trying to install a bullseye system, why
can't you using vmdb2 on bullseye?

And if you are trying to install a sid or bookworm system using vmdb2,
why can't you (or vmdb2) use a sid or bookworm chroot for doing the
grub-install?

> I have to downgrade e2fsprogs to the version in Testing to get the
> build going again. It also means that a Bookworm system can never be
> used to format and debootstrap a Bullseye or Buster system that
> requires a grub installation.
> 
> I guess that the fix applied to grub2 in Sid would have to be applied
> to grub2 in Bullseye as well (and basically to any grub2 package in any
> Debian or Ubuntu or Raspbian release supported by debootstrap).

I can understand why we need to keep things synchronized so that a
debian installer for Bookworm be able to generate a bootable system
using the version of grub and e2fsprogs in Bookworm.

But a generalized requirement that we be able to use debootstrap and
vmdb2 to be able to bootstrap an arbitrarily old distribution doesn't
seem to be reasonable.  It would mean that we couldn't update to newer
versions of the C library, or of new file system featuers, because we
are somehow obliged to be able to boostrap ancient versions of the
system.  After all, we don't require that a binary built using
Bookworm be able to run on Bullseye.  How is this any different?

I generate test appliances for file system regression testing which
run on Debian Bullseye on my Debian Bookworm system --- and this gets
done using build chroot[1].  I even have super-convenient scripts to
create the build chroot[2].  This is not hard why can't vmdb2 do
the same thing?

[1] 
https://github.com/tytso/xfstests-bld/blob/master/Documentation/building-xfstests.md
[2] https://github.com/tytso/xfstests-bld/blob/master/setup-buildchroot

- Ted

Bug#1030939: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-10 Thread Theodore Ts'o

On Fri, Feb 10, 2023 at 08:31:04AM +0100, Cyril Brulebois wrote:
> > Holding back file system development because grub2 uptsream is super
> > slow doesn't seem like a reasonable way forward, so I really don't
> > want to set this precedent.
> 
> The Bookworm freeze has started, we need to be able to install stuff, so
> we need a solution for the grub2/e2fsprogs situation *now*.
> 
> I really don't care about “setting a precedent”.

But that problem has already been solved by cloning the bug back to
e2fpsrogs (#1030939) which will prevent e2fsprogs from transitioning
to testing, no?  So what's the problem.

The fact that grub2 upstream is super-slow in doing releases is
already a pain in *ass that has caused much pain over the years.
Adding a conflict just adds more pain, without adding any additional
benefit.  So what's the point?

 - Ted

Bug#1030939: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-09 Thread Theodore Ts'o

On Thu, Feb 09, 2023 at 06:55:08PM +0100, Sven Joachim wrote:
> >>
> >> Thanks from me as well :-).  To prevent e2fsprogs from migrating to
> >> testing before grub2 and breaking d-i, I am reassigning a copy of this
> >> bug back to e2fsprogs.  It may be closed once grub2 2.06-8 enters
> >> Bookworm.   Perhaps it would also be a good idea to add a versioned
> >> "Breaks: grub2-common (<< 2.06-8~)" to e2fsprogs.
> >
> > Perhaps we could just add a "Breaks: grub2-common (<< 2.06-8~)" to
> > e2fsprogs-udeb?
> 
> That is not going to help, because IIUC grub-install is run from the
> target system that you are installing, and there is no
> grub2-common-udeb.

Right, but if the conflict in e2fsprogs-udeb prevents the installer
from pulling in an overly new version of e2fsprogs-udeb, that woul be
sufficient, no?

After all, you can install e2fsprogs 1.47, and install a newer version
of grub, and there is no problem --- unless you enable the the
csum-seed function on an already installed system.  And you can't do
that, because you it's an incompat feature.

OTOH, with e2fsprogs 1.46 you can *ready* run the command "tune2fs -O
large_dir /dev/root" on a running system, and then when you reboot,
the system won't come back, because grub will see the large_dir
feature and freak out (unecessarily).  But adding an conflict with
1.46 and grub makes no sense, since it's not _that_ likely that user
will deploy that particular foot-gun.  (It happens with Arch and
Gentoo, but those users tend to be much more adventurous, aided and
abetted by Wiki pages that encourage users to help kernel developers
beta test new features.  :-)

The reason why I really don't want to add the conflicts with e2fsprogs
1.47 is because for people who are using sid, there is aboslutely
nothing wrong with installing e2fsprogs 1.47.  It's only if they use
the installer that sucks in e2fsprogs that there's a problem --- it's
not an inherent conflict with grub per se.  If it were, then we
shouldn't allow installation of e2fsprogs 1.46 because grub2 upstream
is painfully slow in putting out releases --- and that's not
reasonable.

Now, what I *could* do is change the mke2fs.conf in e2fsprogs-udeb to
remove the default use of the csum-seed feature  but is it worth
it?  Hopefully the new version of grub2 will come out soon, at which
point I would then need to revert the hacked mke2fs.conf --- and your
dup'ing this bug should prevent the installer from picking up
e2fsprogs 1.47 until the new version of grub gets released.

I'll note that grub is trying to use an independent implementation of
ext4, as opposed to using the upstream libraries such as libext2 for
ext2/3/4 and libxfs for XFS, combined with grub's very slow release
cycle, means that this kind of thing is going to be happening a *lot*.
It's something that I and Darrick Wong (the XFS maintainer) have
kvetched about a number of times on our weekly conference calls.

For example, recent grub commits that impact XFS that aren't in 2.06
include:

commit a4b495520e4dc41a896a8b916a64eda9970c50ea
Author: Erwan Velu 
Date:   Wed Aug 25 15:31:52 2021 +0200

fs/xfs: Fix unreadable filesystem with v4 superblock

The commit 8b1e5d193 (fs/xfs: Add bigtime incompat feature support)
introduced the bigtime support by adding some features in v3 inodes

Holding back file system development because grub2 uptsream is super
slow doesn't seem like a reasonable way forward, so I really don't
want to set this precedent.

Thinking about this some more, I'd much rather have some kind of
explicit scheme, such as:

   mkfs.xfs -T grub2-dumbdown /dev/XXX

Which disables various new file system features that aren't yet
supported by grub, but which users might very well want to use on
non-boot disks.  

This could be done by doing something as simple as adding to mke2fs.conf:

[fs_types]
 grub2-dumbdown = {
features = ^large_dir,^metadata_csum
 }

Then we could teach the Debian installer to use the file system usage
type "grub2-dumbdown".

Of course, moving forward we'd have to update mke2fs.conf as grub
finally learns new features, and since mke2fs.conf is a config file,
people would have to edit it by hand post-install if they have
customized it in any way.  Unless we add the above in
/etc/mke2fs.conf.d/grub2-dumbdown, and then teach mke2fs to understand
the /etc/mke2fs.conf.d directory...

  - Ted

Bug#1030846: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-09 Thread Theodore Ts'o

On Thu, Feb 09, 2023 at 06:04:23PM +0100, Sven Joachim wrote:
> 
> Thanks from me as well :-).  To prevent e2fsprogs from migrating to
> testing before grub2 and breaking d-i, I am reassigning a copy of this
> bug back to e2fsprogs.  It may be closed once grub2 2.06-8 enters
> Bookworm.   Perhaps it would also be a good idea to add a versioned
> "Breaks: grub2-common (<< 2.06-8~)" to e2fsprogs.

Perhaps we could just add a "Breaks: grub2-common (<< 2.06-8~)" to
e2fsprogs-udeb?  After all, it's not that e2fsprogs breaks
grub2-common, per se, unless the *installer* happens to to use << grub
2.06-8~ and e2fsprogs 1.47.0-1.

- Ted

Bug#1030846: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-08 Thread Theodore Ts'o

On Wed, Feb 08, 2023 at 09:12:05PM +, Steve McIntyre wrote:
> 
> I've just queued these up in our repo for the next grub upload, due in
> a few days.

Many thanks, Steve!

- Ted

Bug#1030846: e2fsprogs: generates filesystems that grub-install doesn't recognize

2023-02-08 Thread Theodore Ts'o

On Wed, Feb 08, 2023 at 11:39:48AM +0100, Cyril Brulebois wrote:
> 
> Spotted via debian-installer tests: grub-install fails with “unknown
> filesystem” when trying to run a simple `grub-install /dev/sda` with
> an all-default installation.

The fix for this issue landed in the grub2 repository on June 11,
2021:

commit 7fd5feff97c4b1f446f8fcf6d37aca0c64e7c763
Author: Javier Martinez Canillas 
Date:   Fri Jun 11 21:36:16 2021 +0200

fs/ext2: Ignore checksum seed incompat feature

This incompat feature is used to denote that the filesystem stored its
metadata checksum seed in the superblock. This is used to allow tune2fs
changing the UUID on a mounted metdata_csum filesystem without having
to rewrite all the disk metadata. However, the GRUB doesn't use the
metadata checksum at all. So, it can just ignore this feature if it
is enabled. This is consistent with the GRUB filesystem code in general
which just does a best effort to access the filesystem's data.

The checksum seed incompat feature has to be removed from the ignore
list if the support for metadata checksum verification is added to the
GRUB ext2 driver later.

Suggested-by: Eric Sandeen 
Suggested-by: Lukas Czerner 
Signed-off-by: Javier Martinez Canillas 
Reviewed-by: Lukas Czerner 
Reviewed-by: Daniel Kiper 

Unfortunately, this just missed the last grub release, since grub 2.06
was tagged on June 8, 2021.  

There are two ways we can fix this.  One is we can disable the
checksum seed feature in the Debian release of mke2fs.conf.  Or we can
land this above-mentioned commit into the grub2 package.  Since the
hard freeze is almost upon us, I'm happy to handle this either way.

If we *are* going to backport some grub2 patches, may also make a plug
for this one, which is also in the upstream grub2 git repo:

commit 2e9fa73a040462b81bfbfe56c0bc7ad2d30b446b
Author: Theodore Ts'o 
Date:   Tue Aug 30 22:41:59 2022 -0400

fs/ext2: Ignore the large_dir incompat feature

Recently, ext4 added the large_dir feature, which adds support for
a 3 level htree directory support.

The GRUB supports existing file systems with htree directories by
ignoring their existence, and since the index nodes for the hash tree
look like deleted directory entries (by design), the GRUB can simply do
a brute force O(n) linear search of directories. The same is true for
3 level deep htrees indicated by large_dir feature flag.

Hence, it is safe for the GRUB to ignore the large_dir incompat feature.

Fixes: https://savannah.gnu.org/bugs/?61606

Signed-off-by: Theodore Ts'o 
Reviewed-by: Daniel Kiper 

Otherwise, what can happen is that users may choose to enable the
large_dir feature, and if they do it on the root file system, they can
end up making their system unbootable.  I've had some Arch and Gentoo
users get bitten by this

> The “e2fsprogs gets a new upstream release and new flags” hypothesis was
> confirmed by building d-i with e2fsprogs-udeb_1.46.6-1_amd64.udeb
> rebranded as 1.48, which made the problem disappear.

Alternatively, I can apply this patch to e2fsprogs:

diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
index b7fc95df7..ff47bdb39 100644
--- a/misc/mke2fs.conf.in
+++ b/misc/mke2fs.conf.in
@@ -11,7 +11,7 @@
features = has_journal
}
ext4 = {
-   features = 
has_journal,extent,huge_file,flex_bg,metadata_csum,metadata_csum_seed,64bit,dir_nlink,extra_isize,orphan_file
+   features = 
has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize,orphan_file
}
small = {
blocksize = 1024

Which will kind of suck, since the reason for enabling
metadata_csum_seed is to be able to reliably change the label and file
system uuid on a mounted file system, which matters for certain cloud
workloads.  Specifically, for Google's Cloud Optimized OS, which
attempts to update the file system UUID and resize the root file
system to fit the size of the cloud-emulated block device on two
separate, racing systemd unit files.  This which works 99.9% of the
time, but when you launch a huge number of cloud images, that last
0.1% of the time is really noticeable.

But that's for COS; if we have to disable the metadata_csum_seed
feature on Debian file systems, I can live with that.

Cheers,

- Ted

Bug#1022096: e2fsprogs: debian: make the copyright file machine readable

2023-01-31 Thread Theodore Ts'o

On Wed, Oct 19, 2022 at 11:45:22PM +0200, Bastian Germann wrote:
> Source: e2fsprogs
> Severity: serious
> Tags: patch
> Version: 1.46.6~rc1-1
> 
> Hi Theodore,
> 
> There are several distribution licenses and copyright information not
> mentioned, which are required by Debian Policy §12.5. I have attached
> a patch fixing this, which also has the conversion of the copyright
> files that are still in use to the machine-readable format.
> 
> Also attached is a patch for the upstream source (as you are also the
> upstream maintainer) that applies a license requirement that I found
> not to be present.

Applied to e2fsprogs upstream maint branch, thanks.

- Ted

Bug#1025791: fsck.f2fs is unable to check mounted FS

2022-12-14 Thread Theodore Ts'o

OK, I've done some final fixups (there was something weird with the
spelling patch that caused dgit to have heartburt) and have done a
"dgit push-source" and pushed git tree to salsa.  Thanks again for
your help.

Since you are investigating using f2fs, I thought I would let you know
that there are the following test failures of the file system:

KERNEL:kernel 6.1.0-xfstests #2 SMP PREEMPT_DYNAMIC Mon Dec 12 16:09:40 EST 
2022 x86_64
CMDLINE:   -c f2fs/default -g auto
CPUS:  2
MEM:   7680

f2fs/default: 676 tests, 5 failures, 224 skipped, 4923 seconds
  Failures: generic/050 generic/064 generic/252 generic/506 generic/563

The same test failures happen with f2fs-tools 1.14 and 1.15, so these
are not f2fs-tools regressions.  These are actually fewer test
failures than I am seeing with btrfs:

btrfs/4k: 953 tests, 21 failures, 176 skipped, 12135 seconds 
  Failures: btrfs/006 btrfs/012 btrfs/049 btrfs/100 btrfs/162 btrfs/163 
btrfs/184 btrfs/192 btrfs/196 btrfs/197 btrfs/218 btrfs/219 btrfs/235
btrfs/254 btrfs/277 btrfs/291 generic/455 generic/457
  Flaky: btrfs/028: 40% (2/5)   generic/475: 60% (3/5)
shared/298: 20% (1/5)

 although it's worse than the ext4/4k and xfs/4k test results
(which tests 100% green).  

If you are curious about exactly what the test failures are, just
checkout the xfstests-dev repo, and then look at the first few lines
of the test script, for example:

 {/usr/projects/xfstests-bld/build-64}   (master)
1079% head fstests-bld/xfstests-dev/tests/generic/050
#! /bin/bash
# SPDX-License-Identifier: GPL-2.0
# Copyright (c) 2009 Christoph Hellwig.
#
# FS QA Test No. 050
#
# Check out various mount/remount/unmount scenarious on a read-only blockdev.
#
seqfull=$0
. ./common/preamble

(Which is a bit relevant to this original bug report, which was purely
accidental, and perhaps a bit of synchronicity.)

On Tue, Dec 13, 2022 at 11:53:53PM +0300, Michael Tokarev wrote:
> 
> (and now this has absolutely nothing to do with #1025791 anyway.. ;))

Heh.

Going back to this bug, I'm pretty sure it's not going to be an easy
fix, unless you want to take over upstream f2fs-tools.  Which if you
ask the f2fs-tools maintainers, they might actually be willing to
delegate; you never know until you ask.  (As an aside, that's how I
got involved with ext2/ext3/ext4.  Originally ext2 userspace tools
were a modified version of the minix userspace tools, and I decided I
could do better.  So after rewriting it from scratch, creating a real
library interface that could actually have a stable shared library
ABI, and using test driven development, I was able to speed up e2fsck
by a factor of 10 or so.  So if you're interested in getting involved
with file system development  :-)

Failing that, probably the best approach is a quick patch, which could
either be maintained in the Debian packaging, or perhaps we can get it
upstream, might be to teach fsck.f2fs check to see if the open of the
block device is failing with EBUSY, and if the -a or -p option was
passed to fsck.f2fs, to print a warning message that running fsck on
boot is not supported, and then exit with a status code of 0.

This is a moral equivalent of what xfs and btrfs is doing.  (See the
shell script found in /sbin/fsck.xfs and /sbin/fsck.btrfs.)  It won't
allow f2fs to check a mounted file system, but at least it will clean
up the boot failures.

Cheers,

- Ted

Bug#1025791: fsck.f2fs is unable to check mounted FS

2022-12-13 Thread Theodore Ts'o

On Mon, Dec 12, 2022 at 09:41:38AM +0300, Michael Tokarev wrote:
> 
> I pushed "mjt" branch yesterday to the repository (without pristine-tar
> and upstream pieces so far) which updates f2fs to 1.15 and adds minor
> cleanups to d/rules. Please take a look.

Thanks!  The main thing I will note is that it appears that you
checked in f2fs-tools v1.15.0 as a squashed commit.  That is, the
signle commit d0ba994589b5 seems to be a delta between v1.14.0 and
v1.15.0.  What I've done instead is do a git fetch of the upstream git
repo[1] and then performed a "git merge v1.15.0".  This allows the
full upstream git history to be in the debian branch, which makes
future cherry-picks and merges to be easier.  

[1] git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git

I've then rebased the rest of your changes from the salsa/mjt branch
on top, and then added a commit to change the distribution from
"UNRELEASED" to "unstable".

After adding pristine-tar, "gbp buildpackage" seems to work just fine.
I'm in the middle of running a quick regression test using
"gce-xfstests -c f2fs/default -g auto".  If that is clean, I'll do a
formal signed build using dgit.

> This is a known issue. I've faced it a few times myself, though in
> my case I didn't want to add more work for ftp-masters and tried to
> avoid soname bump.

Yeah, unfortunately, the library communicates with its callers
(including the specification of the device name, etc.) in the global
variable:

struct f2fs_configuration c;

So unfortunately, there are soname bumps needed pretty much at every
single release, because the library API design was, well the sort
of thing you would expect for a flash FTL firmware programmer.  :-(

> So, let's process 1.15. I don't really care much about the lack
> of shared library at this time.  Please take a look.  I'm not the
> first-day Debian developer, but maybe there's something specific
> to this package or your habits which I didn't think about.

I think you did a great job!  I'll let you know how my regression test
works out.  (gce-xfstests is more of a kernel test than a userspace
test, but it's the best that we have for f2fs-tools)

 - Ted

Bug#1025791: fsck.f2fs is unable to check mounted FS

2022-12-11 Thread Theodore Ts'o

On Sun, Dec 11, 2022 at 09:55:48AM +0300, Michael Tokarev wrote:
> 
> Hmm.  I'm a bit afraid it will be similar to my experience with samba.
> I come across a (data corruption) bug, discovered it's been fixed upstream
> a few years ago, tried to push the fix to debian, and ended up becoming a
> single maintainer of samba package in debian, and basically repackaging
> everything (because the package has been maintained in "rob Beter to pay
> Paul" or "random tinkering at the margins" mode for a few years).
> 
> I don't mind updating f2fs to current version in debian.  It does not seem
> to have this particular issue fixed though.

Yeah, I'm not surprised.  The f2fs file system was engineered
primarily for Android, and I'm not aware of a lot of people trying to
use f2fs as a root file system on a Linux desktop or server.  And
Android has a read-only root file system (which uses dm-verity to
cryptographically guarantee that it can't be modified), and so the
data partition can be fsck'ed before it is mounted.

So this is not likely to be something which would be fixed by the f2fs
upstream.  That's not something that they are interested in, since
their primary focus has always been for Android.

I've taken a quick look at the f2fs-tools sources, and fixing it to
support allow fsck.f2fs to operate on a mounted file system is going
to require massive surgery.  The short version is you can tell that
f2fs was written originally by firmware engineers from Samsung.

The software abstractions are utterly lacking (libf2fs communicates
with the application binary using the global variable "c", of type
struct f2fs_configuration).  The block device is opened in
get_device_info(i), with the block device name passed in
c.devices[i].path.

For e2fsprogs, the logic to determine whether the block device is
mounted, and whether it is mounted as the root device can be found in
lib/ext2fs/is_mounted.c, and the logic to decide whether it is safe to
open the block device read-only is in e2fsck/unix.c (search for code
involving EXT2_MF_ISROOT and EXT2_MF_READONLY).

There is no easy or clean way I can see to shoehorn this logic into
f2fs-tools.  If were going to tackle it, the very first thing I would
do is completely restructure libf2fs so it has a clean interface, and
then add silly things like regression tests.  Basically, the code
needs to be rototilled and rewritten almost from scratch.  Take a look
at libext2fs, which has had a stable API for multiple decades, and
then look at libf2fs, and weep.

So probably the best you can do to address this bug is to prevent the
boot up errors, by doing something like /sbin/fsck.xfs and
/sbin/fsck.btrfs.  Unless, that is, you want work with upstream to do
some pretty major surgery to f2fs-tools (whether it's done in a super
ugly way, or by first restructing the code so it has some clean
abstractions).

> (I don't remember how to join a group in whatever debian development system
> is in use there, - it was alioth before, now it seems to be just the
> mailing list part of it).

The filesystems group is pretty informal.  There is the mailing list
on lists.alioth.debian.org, and the git tree on salsa.debian.org is
writable to any DD; we're not doing any group access control.  (We
could have, but file system packages are set up with the Salsa default
which is that they are located the top-level Debian folder -- and
changing the access requires moving the gitlab project, as I
understand things.)

The maintainer is set to filesystems-de...@lists.alioth.debian.org,
and the uploaders are the folks who are primarily working on the
package.  For f2fs-tools, it's Vincent cheng (who hasn't done an
upload since 2018), and me (and my last upload was in January 2021).
I have tried to do things like clean up the debian/rules file, and
keep things up to date with Debian standards.

Probably the only place where the package is a bit unclean is that
upstream *regularly* breaks the ABI, which means that we effectively
had to bump the package version every single upstream release.
Because the ABI is completely unstable, the normal Debian rationale
for using shared library breaks down (at every single package upload,
we end up breaking the ABI, which means any packages depending on the
shared library ends up using the old, stale shared library anyway).
And I got tired of ftpmasters taking 6-9 months to process an upload
since there was always a new binary package (from the shared library
bump --- EVERY SINGLE D*MNED TIME).

Besides, the only dependency on libf2fs-tools is android-sdk packages,
which is based on an ancient Android SDK (25, and 28, where Android
SDK 28 EOL'ed in January 2022), so it's kind of hopeless from a
security perspective *anyway*, and anyone seriously doing Android
development is going to be using AOSP git repos.

Anyway, this is technically a Debian policy violation, but I really
haven't been able to bring myself to care, since it really doesn't
make any practical difference from

Bug#1025791: fsck.f2fs is unable to check mounted FS

2022-12-09 Thread Theodore Ts'o

On Fri, Dec 09, 2022 at 11:16:46AM +0300, Michael Tokarev wrote:
> Package: f2fs-tools
> Version: 1.14.0-2
> Severity: important
> 
> When running an fsck.f2fs on a readonly-mounted filesystem (root filesystem
> for example, which obviously can not be unmounted), fsck.f2fs always fails:
> 
> I don't know if this is a kernel issue (who explicitly prevents
> opening of a mounted block device) or f2fs-tools issue (who is
> not doing any special actions required to work around that), but
> the result is that the root filesystem is never checked for errors,
> and the boot process always complains about failed fsck.
> 
> Other filesystems (eg ext*fs) is able to check and repair a mounted
> (read-only) filesystem just fine.
> 
> This is why I think this bug is of Severity: important.  The only way
> to check f2fs root  filesystem for errors is to put fsck.f2fs into
> initramfs and go from there, or boot some rescue media.

Well, btrfs and xfs have /sbin/fsck.$FSTYP which is a no-op when it is
run with the -a option.  Whether or not this is a good idea is
something where I have my own opinions  but as the ext4
maintainer, I'm obviously biased.  :-)

The point is that there are many many other file systems supported by
Debian which never checks the root file system for errors --- even if
you use an initramfs.

The other thing I'll note is that f2fs-tools isn't getting much love
in Debian at the moment.  The original maintainer/uploader was Vincent
Cheng, and I joined the Filesystem Group to help out mainly because I
wnated to get an updated f2fs-tools for kernel testing[1].  I don't
actually use f2fs except for kernel testing, so I wouldn't want to be
making changes to f2fs-tools or its support scripts, since I might not
notice any potential problems.

[1] https://thunk.org/gce-xfstests

So if you or other folks would be interested in helping with
f2fs-tools, especially if you are actually interested in using f2fs as
a root file system, please consider joining the file system group and
helping out.  There is new version of f2fs-tools that hasn't been
packaged yet and I'm not sure when I will have the time

   - Ted

Bug#1023450: e2fsprogs - Does not agree with kernel on clean state

2022-11-07 Thread Theodore Ts'o

On Fri, Nov 04, 2022 at 01:17:35PM +0100, Bastian Blank wrote:
> Package: e2fsprogs
> Version: 1.46.6~rc1-1+b1
> Severity: important
> 
> While doing an online resize on lnux 6.0.5, the kernel considers the
> filesystem broken:
> 
> | [   56.958856] EXT4-fs (sda1): resizing filesystem from 491515 to 4161531 
> blocks
> | [   57.290138] EXT4-fs (sda1): resized filesystem to 4161531
> | [   57.332382] EXT4-fs (sda1): Invalid checksum for backup superblock 32768
> ...
> 
> Currently I would assume this is a bug in the kernel, do you know more?

Yes, this is a kernel bug.  It's fixed in upstream commit 9a8c5b0d0615
("ext4: update the backup superblock's at the end of the online
resize"), which landed is in v6.1-rc4 and which is tagged for
backporting to LTS.

Apologies for this not getting noticed in my pre-release testing.  I'm
working to improve the testing so this would have gotten caught
earlier.

- Ted

commit 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4
Author: Theodore Ts'o 
Date:   Thu Oct 27 16:04:36 2022 -0400

ext4: update the backup superblock's at the end of the online resize

When expanding a file system using online resize, various fields in
the superblock (e.g., s_blocks_count, s_inodes_count, etc.) change.
To update the backup superblocks, the online resize uses the function
update_backups() in fs/ext4/resize.c.  This function was not updating
the checksum field in the backup superblocks.  This wasn't a big deal
previously, because e2fsck didn't care about the checksum field in the
backup superblock.  (And indeed, update_backups() goes all the way
back to the ext3 days, well before we had support for metadata
checksums.)

However, there is an alternate, more general way of updating
superblock fields, ext4_update_primary_sb() in fs/ext4/ioctl.c.  This
function does check the checksum of the backup superblock, and if it
doesn't match will mark the file system as corrupted.  That was
clearly not the intent, so avoid to aborting the resize when a bad
superblock is found.

In addition, teach update_backups() to properly update the checksum in
the backup superblocks.  We will eventually want to unify
updapte_backups() with the infrasture in ext4_update_primary_sb(), but
that's for another day.

Note: The problem has been around for a while; it just didn't really
matter until ext4_update_primary_sb() was added by commit bbc605cdb1e1
("ext4: implement support for get/set fs label").  And it became
trivially easy to reproduce after commit 827891a38acc ("ext4: update
the s_overhead_clusters in the backup sb's when resizing") in v6.0.

Cc: sta...@kernel.org # 5.17+
Fixes: bbc605cdb1e1 ("ext4: implement support for get/set fs label")
Signed-off-by: Theodore Ts'o

Bug#1021750: general: the nodelalloc mount option should be used by default for ext4 in /etc/fstab

2022-10-14 Thread Theodore Ts'o

On Fri, Oct 14, 2022 at 03:37:29PM +0200, Marco d'Itri wrote:
> On Oct 14, Vincent Lefevre  wrote:
> 
> > > This is obviously convenient on Guillem's part, but we have to optimize 
> > > systems by default for the general case and not just for dpkg -i.
> > This dpkg FAQ says that this is not beneficial for just dpkg,
> > but also "for any application in the system".
> [citation needed]
> 
> I hope that you understand why at this point I cannot trust as is the 
> opinions of the dpkg maintainer.

The dpkg FAQ is just wrong.  It relates to controversy which is over a
dozen years old.  For more information see, see Josef Sipek's blog
post from 2009, "O_PONIES & Other Assorted Wishes"

https://blahg.josefsipek.net/?p=364

The O_PONIES mention references an 2009 April Fool's patch:

https://lore.kernel.org/linux-fsdevel/20090401041843.gn19...@josefsipek.net/

Because buggy applications and clueless application programmers vastly
outnumbers file system maintainers, at the 2009 LSF/MM workshop, a
number of the major file system developers agreed on the following
workaround.  If the application opens a pre-existing file with
O_TRUNC, or renames a newly created file on top of pre-existing file,
we will force the delayed allocation to be automatically resolved when
the file is closed (in the first case) or the rename (in the second
case).  It does *not* force a file system commit.  So if you crash
within 5 seconds of the close(2) or rename(2), you will still suffer
data loss.

HOWEVER, this was always the case for buggy applicatons that refused
to call fsync(2) who were relying on the old ext3 file system
semantics.  It did not guarantee that things would work; it would just
*mostly* work, since *usually* you didn't crash within 5 seconds of
rewriting a file.

So no, we're not going to be making the default change to /etc/fstab.

NACK.

- Ted

Bug#1021250: libcom-err2: sudo apt install wine32 fail to install

2022-10-04 Thread Theodore Ts'o

On Tue, Oct 04, 2022 at 01:45:31PM +0200, David Gonzalo Rodriguez wrote:
> Package: libcom-err2
> Version: 1.46.5-2~bpo11+2
> Severity: important
> Tags: upstream
> X-Debbugs-Cc: z80u...@gmail.com

What is the error message when you run "sudo apt install wine32"?

What is the reason for believing this is caused by libcom-err2?  I
don't see dependencies on libcom-err2 by the wine package.

 - Ted

Bug#1016675: This is already fixed upstream

2022-08-11 Thread Theodore Ts'o

On Thu, Aug 11, 2022 at 08:02:15PM +1200, Alex King wrote:
> We are using Direct IO to slow mkfs and therefore lower I/O load.  I'm not
> aware of a way to directly ask mkfs to reduce the I/O load or work more
> slowly.  Using ionice at idle priority would be ideal, but we are using the
> mq-deadline scheduler rather than bfq or anything.  I understand ionice is a
> noop under the deadline scheduler.
> 
> In this case we want the mkfs to be lower priority that other tasks, e.g.
> other XEN guests.  And sometimes we are creating large filesystems on
> spinning disk.

Ah!  Thanks for the clarification.  I wonder if adding the following
to your /etc/mke2fs.conf would work for you:

[options]
   sync_kludge = 5

>From mke2fs.conf:

  sync_kludge
 If this relation is set to a positive integer, then while
 writing the inode table, mke2fs will request the operating
 system flush out pending writes to initialize the inode table
 every sync_kludge block groups.  This is needed to work
 around buggy kernels that don't handle writeback throttling
 correctly.

It might be that Direct I/O is still better for you, since it avoids
consuming memory in the page cache.  OTOH, the advantage of using
sync_kludge is it is more adjustable, since you can specify an integer
number of block groups between fsync() calls.

We could also think about a more direct way to regulate the I/O load
by (for example) timing how long it takes to complete the fsync(2) and
then use that to make mke2fs sleep between each batch of block group
writes, so we could specify a target kb/s or mb/s that mke2fs should
try to hit

Cheers,

- Ted

Bug#1016675: This is already fixed upstream

2022-08-10 Thread Theodore Ts'o

I'm just curious --- what is your use case for wanting to use Direct
I/O in mke2fs?  In general, using buffered I/O is much faster, and the
historically, Direct I/O option was primarily used as a workaround for
kernels (long ago) that were buggy with respect to write throttling,
and userspace programs which issued a large number of number of
buffered writes, especially in a constrained memory cgroup, could
potentially get OOM-killed.

What the kernel *should* do is to throttled the buffered writes, by
cause the buffered write(2) to block until there is enough free memory
in the cgroup so the user space program doesn't get OOM-killed.

I don't believe modern kernels has had write throttling bugs in quite
a while, however.

One of the reasons I didn't bother backporting the bug to Debian
Stable was because I had assumed the -D option isn't commonly used.
That's why I'm curious why you were using the -D option and ran across
this bug.

Thanks,

- Ted

Bug#1010263: CVE-2022-1304

2022-04-28 Thread Theodore Ts'o

Applied upstream as commit: ab51d587

Bug#1010264: Bug#1010263: Bug#1010264: CVE-2022-28391

2022-04-28 Thread Theodore Ts'o

On Thu, Apr 28, 2022 at 09:30:45AM +0200, Salvatore Bonaccorso wrote:
> 
> Theodore, btw the BTS reference for the e2fsprogs issue is #1010263
> and the CVE id CVE-2022-1304.

Yes, I've noted that already.

> #1010264 and CVE-2022-28391 is respectively for busybox. the bug
> already reassigned accordingly earlier.

Apologies, I didn't get an e-mail notification for the bug getting
reassigned; I should have double checked the BTS web page for the bug
before replying.

Regards,

- Ted

Bug#1010264: CVE-2022-28391

2022-04-27 Thread Theodore Ts'o

On Wed, Apr 27, 2022 at 01:55:27PM +0200, Moritz Muehlenhoff wrote:
> Package: e2fsprogs
> Version: 1.46.5-2
> Severity: important
> 
> This issue was found by Alpine:
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/13661
> 
> Details and the patches they used are in the report above, but the
> patches are not yet merged upstream, might be worth to wait until
> that's fixed since the impact is rather low.

Um, going to that link results in the (closed) alpine bug from three
weeks ago:

"netstat is vulnerable to escape sequence injection (busybox)"

"Alpine ships BusyBox with the netstat applet enabled. This is
vulnerable to escape sequence injection when used from an VT
compatible terminal. To exploit this vulnerability the PTR for a
remote host must contain a escape sequence and the victim has to
execute netstat. I've set up an example at [elided] with the PTR
resolving to \027[33\;46mlocalhost."

The string "e2fsprogs" appears nowhere in on the page.

I've done a search on Alpine/aports looking for "e2fsprogs" and could
only find:

e2fsprogs can be uninstalled manually on systems that depend on it
#13584 · created 1 month ago by Álvaro Torralba

updated 1 month ago
modloop verification fails with apline usb drive when local disk partition has 
a alpine installation
#11136 · created 2 years ago by nico

Neither seems to be security related.  Are you sure this was correctly
filed against e2fsprogs?

- Ted

Bug#1008107: "mke2fs -E android_sparse" yields: "Unimplemented ext2 library function while setting up superblock" (not built against libsparse?)

2022-03-22 Thread Theodore Ts'o

On Tue, Mar 22, 2022 at 11:59:49AM -0400, Daniel Kahn Gillmor wrote:
> Package: libext2fs2
> Version: 1.46.5-2
> Control: affects -1 + fastboot android-sdk-platform-tools
> 
> The -E android_sparse option for mke2fs fails because libext2fs2 reports
> EXT2_ET_UNIMPLEMENTED, presumably because libext2fs2 isn't built with
> ENABLE_LIBSPARSE .  here's the failure:

Yep.

The explanation of why this is not easy to fix can be found at:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=928551#75

Given Debian's strong policy about using dynamic libraries everywhere,
with no exception (an opinion for which I strongly disagree, but oh
well, I don't make the rules), there's no way I can drag in libsparse
as a dependency without bloating anything that requries e2fsprogs.
Say, like the installer given that ext4 is still default file system
for Debian.

What is needed is to implement something like how libss (used by
debufs) will try to look for one of the following shared libraries:

#define DEFAULT_LIBPATH 
"libreadline.so.8:libreadline.so.7:libreadline.so.6:libreadline.so.5:libreadline.so.4:libreadline.so:libedit.so.2:libedit.so:libeditline.so.0:libeditline.so"

... and if present, it will dlopen it and use it.  But if it is not
present, it will return EXT2_ET_UNIMPLEMENTED.  The implementation of
this logic is in lib/et/get_readline.c in the e2fsprogs sources.  We
need to do something similar in lib/ext2fs/sparse_io.c.

However, this is not high on my priority list to work upon, because
there is a much simpler workaround --- just download the Android SDK
from Google.  :-)  That being said, patches are greatefully accepted,
so if you're interested in fixing this, please send patches, either
attached to Debian bug #928551, or to the upstream development list:
linux-e...@vger.kernel.org

Regards,

- Ted

Bug#1006253: iwd is crashing with a segfault

2022-02-27 Thread Theodore Ts'o

Hi Jonas,

I was checking Debian Bug tracker to make sure the bug metadata was
correct after closing it, and I realized that you had replied on
Tuesday, February 22nd (Message #10).  My apologies, for some reason
that message never arrived at my inbox (and I checked the spam filter
as well).  So I didn't realize that you had replied to me with the
uploaded fix until just now.

Many thanks for your work!

- Ted

Bug#1003583: e2fsprogs and libext2fs2 in bullseye-backports missing dependency

2022-01-12 Thread Theodore Ts'o

found 1003583 e2fsprogs/1.46.5-2~bpo11+1
notfound 1003583 e2fsprogs/1.46.2-2
thanks

> Dear Maintainer,
> 
>* e2fsprogs and libext2fs2 version 1.46.5-2~bpo11+1 are in the
> bullseye-backports repository
>* both depend on libc6 (>= 2.33) for amd64
>* libc6 versions 2.33 and higher do not exist in bullseye or
> bullseye-backports repositories
>* Result:
> cannot upgrade to the version of either that are in the backports

Thanks for the bug report.  Apparently dgit let me down, in that "dgit
gbp-build" failed to use the proper build chroot, while "gbp
buildpackage" *does* use the correct build chroot.

I had tested first using "gbp buildpackage", and then got taken
surprised when "dgit --quilt=nofix gbp-build" did *not* use the proper
schroot.  At a guess, dgit is somehow invoking gbp buildpackage in a
way that causes the debian/gbp.conf is ignored.

In any case, no matter how it happened, the version of e2fsprogs that
dgit uploaded to bullseye-backport was built without a chroot, using
my Debian testing environment.  I've tested this and it's quite
reproducible.   

I've rebuilt using "gbp buildpackage" directly, which does DTRT, so
I'll reupload without uding dgit, and that should fix the problem.
When have a free day or two, I'll try figure out why dgit surprised
the heck out of me, but until then I'll stop using dgit as an
unreliable/incrutable tool.  :-(

   - Ted

Bug#992469: WARNING: dh_installsystemd is moving unit files to /usr/lib/systemd/system

2021-08-19 Thread Theodore Ts'o

On Thu, Aug 19, 2021 at 11:46:21AM +0200, Michael Biebl wrote:
> Am 19.08.21 um 08:27 schrieb Michael Biebl:
> > I'll check later today, if i-s-h (init-system-helpers) does properly
> > handle this new path. If so, I'd say the bug should be reassigned to
> > lintian and we should start transitioning the files to
> > /usr/lib/systemd/system.
> 
> I now remember updating i-s-h [1].
> 
> So we should be safe using /usr/lib/systemd/system I'd say.

OK, thanks for confirming this.  What really worried me was this text
in lintian:

N:   Systemd in Debian searches for unit files in /lib/systemd/system/ and
N:   /etc/systemd/system. Notably, it does *not* look in
N:   /usr/lib/systemd/system/ for service files.

This implied that it was *systemd* that didn't like /usr/lib/systemd,
and I didn't understand the subtlty that it was really the how
Debian's init-system-helpers worked which was the issue.

So it sounds like it's OK for me to upload a package like e2fsprogs
with a systemd unit file, despite the lintian flagging this as an
error.

  - Ted

Bug#992094: e2fsprogs: mke2fs fails to create file system

2021-08-11 Thread Theodore Ts'o

tags 992094 +pending
thanks

On Wed, Aug 11, 2021 at 02:56:06PM +0200, Heinrich Schuchardt wrote:
> 
> https://ci.debian.net/data/autopkgtest/unstable/amd64/e/e2fsprogs/14285096/log.gz
> 
> shows the following errors for debian/tests/fuse2fs:
> 
> mke2fs:
> No such file or directory while trying to determine filesystem size

Thanks for the bug report.  This bug is already fixed upstream in
commit 53464654bd33 ("mke2fs: fix creating a file system image w/o a
pre-existing file") and will be in the upcoming 1.46.4 release, which
is currently considered mostly complete pending updates from the
Translation Project.

Feel free to take a look at / test the "maint" branch at

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

and if you see any poitential problems, please let me know!

   - Ted

Bug#991922: FTBFS on s390x: Tests failed: f_baddotdir

2021-08-05 Thread Theodore Ts'o

tags 991922 +pending
thanks

On Thu, Aug 05, 2021 at 02:54:38PM -0300, Antonio Terceiro wrote:
> Source: e2fsprogs
> Version: 1.46.3-1
> Severity: serious
> Justification: fails to build from source
> 
> e2fsprogs currently fails to build on s390x buildd, and I was just able
> to reproduce it on a porter box.

(Note: the bug doesn't exist in 1.46.2, which is fortuante since
things are locked down for the upcoming stable release.)

Thanks for the bug report.  I'm glad to report that this is already
fixed upstream:

commit 225e5d093b519f9dbe9fcaacd995426f0e5194f6
Author: Theodore Ts'o 
Date:   Wed Jul 28 13:51:13 2021 -0400

e2fsck: fix f_baddotdir failure on big-endian systems

Commit 63f44aafb1f2 ("e2fsck: fix ".." more gracefully if possible")
changed the check_dot() function to try to avoid resetting the '..'
entry when the '.' entry is too large..  But if we do that, then on
big-endian systems, we need to try byte swapping the rest of the
directory entries, or else the f_baddotdir test will fail on
big-endian systems.

Also add a check to avoid UBSAN warning when there is not enough space
at the end of the directory block for a directory entry, and so we can
potentially overflow some pointer arithmetic when trying to byte swap
the remainder of the (negative) space in the directory block.

Fixes: 63f44aafb1f2 ("e2fsck: fix ".." more gracefully if possible")
Signed-off-by: Theodore Ts'o 

I only noticed the problem when I uploaded 1.46.3-1 to experimental on
July 27th, and like you, was able to reproduce it on a porter box and
fixed it the next day.  I am planning 1.46.4 release next week to fix
this and a number of other bugs:

ea97af65 - (maint) libext2fs: improve error handling in POSIX ACL conversions 
(6 days ago)
165c8541 - setup-schroot: install the acl and libreadline-dev packages (6 days 
ago)
5148709f - tests: skip m_rootdir_acl on GNU Hurd (6 days ago)
7a97083d - libext2fs: fix translation of Posix ACL's on big-endian systems (6 
days ago)
654be045 - contrib: add setup-schroot command for use on Debian porter boxes (7 
days ago)
c3062042 - tests: add description for j_recover_fast_commit (7 days ago)
5a3ea390 - tests: force test file systems to be built for the Linux OS (7 days 
ago)
4d988e1b - tests: try using truncate command before falling back to using dd (8 
days ago)
9aba0528 - libsupport: fix sort_r.h to work on the GNU Hurd (8 days ago)
225e5d09 - e2fsck: fix f_baddotdir failure on big-endian systems (8 days ago)

- Ted

Bug#986332: lsattr on certiain files in /dev results in "stack smashing detected"

2021-07-21 Thread Theodore Ts'o

tags 986332 +pending
thanks

The following commit should fix things; it will be in the next release
of e2fsprogs.

- Ted

commit a3f844da91f0c01209a5d778a5af57fabe245332
Author: Theodore Ts'o 
Date:   Fri Jul 16 22:31:26 2021 -0400

libe2p: use stat to prevent calling EXT2_IOC_[GS]ETFLAGS on devices

Some devices can react badly to the EXT2_IOC_[GS]ETFLAGS ioctls, since
ioctl codes are not guaranteed to be unique across different device
drivers and file systems.

Addresses-Debian-Bug: #986332
Signed-off-by: Theodore Ts'o 

diff --git a/lib/e2p/fgetflags.c b/lib/e2p/fgetflags.c
index 93e130c6..24a7166b 100644
--- a/lib/e2p/fgetflags.c
+++ b/lib/e2p/fgetflags.c
@@ -79,14 +79,26 @@ int fgetflags (const char * name, unsigned long * flags)
*flags = f;
return (save_errno);
 #elif HAVE_EXT2_IOCTLS
+   struct stat buf;
int fd, r, f, save_errno = 0;
 
+   if (!stat(name, ) &&
+   !S_ISREG(buf.st_mode) && !S_ISDIR(buf.st_mode)) {
+   errno = EOPNOTSUPP;
+   return -1;
+   }
fd = open(name, OPEN_FLAGS);
if (fd == -1) {
if (errno == ELOOP || errno == ENXIO)
errno = EOPNOTSUPP;
return -1;
}
+   if (!fstat(fd, ) &&
+   !S_ISREG(buf.st_mode) && !S_ISDIR(buf.st_mode)) {
+   close(fd);
+   errno = EOPNOTSUPP;
+   return -1;
+   }
r = ioctl(fd, EXT2_IOC_GETFLAGS, );
if (r == -1) {
if (errno == ENOTTY)
diff --git a/lib/e2p/fsetflags.c b/lib/e2p/fsetflags.c
index 6455e386..d865d243 100644
--- a/lib/e2p/fsetflags.c
+++ b/lib/e2p/fsetflags.c
@@ -80,14 +80,26 @@ int fsetflags (const char * name, unsigned long flags)
int f = (int) flags;
return syscall(SYS_fsctl, name, EXT2_IOC_SETFLAGS, , 0);
 #elif HAVE_EXT2_IOCTLS
+   struct stat buf;
int fd, r, f, save_errno = 0;
 
+   if (!stat(name, ) &&
+   !S_ISREG(buf.st_mode) && !S_ISDIR(buf.st_mode)) {
+   errno = EOPNOTSUPP;
+   return -1;
+   }
fd = open(name, OPEN_FLAGS);
if (fd == -1) {
if (errno == ELOOP || errno == ENXIO)
errno = EOPNOTSUPP;
return -1;
}
+   if (!fstat(fd, ) &&
+   !S_ISREG(buf.st_mode) && !S_ISDIR(buf.st_mode)) {
+   close(fd);
+   errno = EOPNOTSUPP;
+   return -1;
+   }
f = (int) flags;
r = ioctl(fd, EXT2_IOC_SETFLAGS, );
if (r == -1) {

Bug#989612: mke2fs -E offset=N still warns about existing partition table at the beginning

2021-07-21 Thread Theodore Ts'o

tags 989612 +pending
thanks

Thanks for reporting this bug.  The following commit should fix the
issue.

- Ted

commit 942b00cb9d2f2b52f4c58877d523145ee59a89b0
Author: Theodore Ts'o 
Date:   Wed Jul 21 15:46:09 2021 -0400

mke2fs: do not warn about a pre-existing partition table when using a 
non-zero offset

The existing code attempted to avoid warning about a pre-existing file
system with a non-zero offset, but because the offset was not set at
the time of the check, this intention was not actually working.  So
this commit will suppress warnings about pre-existing a partition
table as well as pre-existing file system when there is a non-zero
offset.

Addresses-Debian-Bug: #989612
Signed-off-by: Theodore Ts'o 

diff --git a/lib/support/plausible.c b/lib/support/plausible.c
index 2a3ae140..bbed2a70 100644
--- a/lib/support/plausible.c
+++ b/lib/support/plausible.c
@@ -282,11 +282,11 @@ int check_plausibility(const char *device, int flags, int 
*ret_is_dev)
return !has_magic;
}
 #endif
-
-   ret = check_partition_table(device);
-   if (ret >= 0)
-   return ret;
-
+   if (flags & CHECK_FS_EXIST) {
+   ret = check_partition_table(device);
+   if (ret >= 0)
+   return ret;
+   }
return 1;
 }
 
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 5a35e9ef..c7b32316 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1973,18 +1973,8 @@ profile_error:
profile_get_integer(profile, "options", "proceed_delay", 0, 0,
_delay);
 
-   /* The isatty() test is so we don't break existing scripts */
-   flags = CREATE_FILE;
-   if (isatty(0) && isatty(1) && !offset)
-   flags |= CHECK_FS_EXIST;
-   if (!quiet)
-   flags |= VERBOSE_CREATE;
-   if (fs_blocks_count == 0)
-   flags |= NO_SIZE;
-   else
+   if (fs_blocks_count)
explicit_fssize = 1;
-   if (!check_plausibility(device_name, flags, _device) && !force)
-   proceed_question(proceed_delay);
 
check_mount(device_name, force, _("filesystem"));
 
@@ -2650,6 +2640,17 @@ profile_error:
 
free(fs_type);
free(usage_types);
+
+   /* The isatty() test is so we don't break existing scripts */
+   flags = CREATE_FILE;
+   if (isatty(0) && isatty(1) && !offset)
+   flags |= CHECK_FS_EXIST;
+   if (!quiet)
+   flags |= VERBOSE_CREATE;
+   if (!explicit_fssize)
+   flags |= NO_SIZE;
+   if (!check_plausibility(device_name, flags, _device) && !force)
+   proceed_question(proceed_delay);
 }
 
 static int should_do_undo(const char *name)

Bug#989612: mke2fs -E offset=N still warns about existing partition table at the beginning

2021-06-20 Thread Theodore Ts'o

On Thu, Jun 17, 2021 at 10:39:20AM +0200, Paul Gevers wrote:
> Control: block 988830 by -1
> 
> Hi Josh, Theodore,
> 
> On Tue, 08 Jun 2021 09:14:41 -0700 Josh Triplett 
> wrote:
> > mke2fs with -E offset=N does not seem to take the offset into account
> > when checking the target to see if it seems to already contain
> > something.
> 
> Same as for bug 989630, do you know if this is a regression?

Nope, this has been around since 2013, and Josh is the first person to
have noticed and complained.

The offset feature is super-rare, and most of the time it's used only
as part of an automated image-building script where the image contains
a partition table as well as one or more file systems.

- Ted

Bug#989630: mke2fs with size limit and default discard will discard data after size limit

2021-06-20 Thread Theodore Ts'o

On Sat, Jun 19, 2021 at 08:44:52PM -0700, Josh Triplett wrote:
> On Thu, Jun 17, 2021 at 10:36:48AM +0200, Paul Gevers wrote:
> > On Tue, 08 Jun 2021 20:22:39 -0700 Josh Triplett 
> > wrote:
> > > Package: e2fsprogs
> > > Version: 1.46.2-2
> > > Severity: important
> > > X-Debbugs-Cc: j...@joshtriplett.org
> > > 
> > > "important" because this does cause data loss; not filing as
> > > release-critical because using mke2fs on a full disk or full disk image
> > > with a specified size to write a partition may be a niche use case.
> > 
> > Do you know if this is a regression with respect to the version in
> > bullseye (1.46.2-1)?
> 
> I'm fairly sure this bug has existed for quite a long time. Please don't
> let it affect propagation of new versions.

Indeed, this bug has been around since 2008, and most of the time,
people only use the size option when they are creating an file system
in a image file, where this wouldn't matter.  And if they are running
mke2fs on a partition with a size limit, which is unlikely to begin
with, it's even more unlikely they would care about what came after
the file system in the partition.

That's why no one noticed or complained for 13 years; and so I would
consider "important" to be a stretch.  :-)

- Ted

Bug#988830: [pre-approval] unblock e2fsprogs [Was: Bug#987641: e2fsprogs: FTBFS on armel/armhf with a 64-bit kernel]

2021-06-07 Thread Theodore Ts'o

Control: tags -1 -moreinfo
thanks

On Sun, Jun 06, 2021 at 06:18:27AM +0200, Paul Gevers wrote:
> On 03-06-2021 22:06, Paul Gevers wrote:
> > As I can't judge this myself, I'll take your word for it. If the upload
> > can happen in the next couple of days, let's have this in unstable and
> > if nothing is reported, have it in bullseye.
> 
> Please removed the moreinfo tag once the package is available.

I've just uploaded the package to unstable.

The debdiff is attached for your convenience.

   - Ted

diff -Nru e2fsprogs-1.46.2/debian/changelog e2fsprogs-1.46.2/debian/changelog
--- e2fsprogs-1.46.2/debian/changelog   2021-02-28 21:24:34.0 -0500
+++ e2fsprogs-1.46.2/debian/changelog   2021-06-07 07:27:15.0 -0400
@@ -1,3 +1,18 @@
+e2fsprogs (1.46.2-2) unstable; urgency=medium
+
+  * Fix FTBFS on arm32 when building on a system with an arm64 kernel.
+This also fixes a portability problem when replaying a fast-commit
+journal on an arm64 kernel when using an arm32 userspace.
+(Closes: #987641)
+  * Fix additional fast-commit portability problems on non-x86 systems
+which don't allow non-aligned pointer dereferences (for example, on a
+sparc64 system).
+  * Fix a missing mutex unlock on an error path which can cause e2fsck
+to hang on I/O errors.
+  * Clean up and improve the man page for e2image
+
+ -- Theodore Y. Ts'o   Mon, 07 Jun 2021 07:27:15 -0400
+
 e2fsprogs (1.46.2-1) unstable; urgency=medium
 
   * New upstream release
diff -Nru 
e2fsprogs-1.46.2/debian/patches/0001-e2fsck-fix-portability-problems-caused-by-unaligned-.patch
 
e2fsprogs-1.46.2/debian/patches/0001-e2fsck-fix-portability-problems-caused-by-unaligned-.patch
--- 
e2fsprogs-1.46.2/debian/patches/0001-e2fsck-fix-portability-problems-caused-by-unaligned-.patch
 1969-12-31 19:00:00.0 -0500
+++ 
e2fsprogs-1.46.2/debian/patches/0001-e2fsck-fix-portability-problems-caused-by-unaligned-.patch
 2021-06-07 07:27:15.0 -0400
@@ -0,0 +1,326 @@
+From 40960a7118171498448870b26d1c867f92fa430e Mon Sep 17 00:00:00 2001
+From: Theodore Ts'o 
+Date: Mon, 3 May 2021 15:37:33 -0400
+Subject: [PATCH 1/5] e2fsck: fix portability problems caused by unaligned
+ accesses
+
+The on-disk format for the ext4 journal can have unaigned 32-bit
+integers.  This can happen when replaying a journal using a obsolete
+checksum format (which was never popularly used, since the v3 format
+replaced v2 while the metadata checksum feature was being stablized),
+and in the fast commit feature (which landed in the 5.10 kernel,
+although it is not enabled by default).
+
+This commit fixes the following regression tests on some platforms
+(such as running 32-bit arm architectures on a 64-bit arm kernel):
+j_recover_csum2_32bit, j_recover_csum2_64bit, j_recover_fast_commit.
+
+https://github.com/tytso/e2fsprogs/issues/65
+
+Addresses-Debian-Bug: #987641
+Signed-off-by: Theodore Ts'o 
+---
+ e2fsck/journal.c   | 83 --
+ e2fsck/recovery.c  | 22 
+ tests/j_recover_fast_commit/script |  1 -
+ 3 files changed, 56 insertions(+), 50 deletions(-)
+
+diff --git a/e2fsck/journal.c b/e2fsck/journal.c
+index a425bbd1..bd0e4f31 100644
+--- a/e2fsck/journal.c
 b/e2fsck/journal.c
+@@ -286,9 +286,9 @@ static int ext4_fc_replay_scan(journal_t *j, struct 
buffer_head *bh,
+   int ret = JBD2_FC_REPLAY_CONTINUE;
+   struct ext4_fc_add_range *ext;
+   struct ext4_fc_tl *tl;
+-  struct ext4_fc_tail *tail;
++  struct ext4_fc_tail tail;
+   __u8 *start, *end;
+-  struct ext4_fc_head *head;
++  struct ext4_fc_head head;
+   struct ext2fs_extent ext2fs_ex = {0};
+ 
+   state = >fc_replay_state;
+@@ -338,16 +338,15 @@ static int ext4_fc_replay_scan(journal_t *j, struct 
buffer_head *bh,
+   break;
+   case EXT4_FC_TAG_TAIL:
+   state->fc_cur_tag++;
+-  tail = (struct ext4_fc_tail *)ext4_fc_tag_val(tl);
++  memcpy(, ext4_fc_tag_val(tl), sizeof(tail));
+   state->fc_crc = jbd2_chksum(j, state->fc_crc, tl,
+   sizeof(*tl) +
+   offsetof(struct ext4_fc_tail,
+   fc_crc));
+   jbd_debug(1, "tail tid %d, expected %d\n",
+-  le32_to_cpu(tail->fc_tid),
+-  expected_tid);
+-  if (le32_to_cpu(tail->fc_tid) == expected_tid &&
+-  le32_to_cpu(tail->fc_crc) == state->fc_crc) {
++le32_to_cpu(tail.fc_tid), expected_tid);
++  if (le32_to_cpu(tail.fc_tid) == expected_tid &&
++  le32_

Bug#987069: document which file systems support cgroupv2 I/O controller

2021-05-14 Thread Theodore Ts'o

On Thu, May 13, 2021 at 11:04:14PM -0400, Theodore Ts'o wrote:
> I'll give you another example that we learned the hard way.  Depending
> on how tight you make your memory cgroups and how tight you constrain
> your I/O controller, it's possible for write throttling --- where
> processes which are dirtying memory faster than they can be written
> out are put to sleep instead of triggering the OOM killer.  It turns
> out write throttling when the total system memory is low is quite
> different from a particular memory cgroup is low on free memory, and
> so the complex interactions between the memory cgroup controller and
> and the I/O cgroup controller is another reason why there appears to
> be a guaranteed employment act for data center kernel engineers.  :-)

Rereading this, I realized it wasn't completely clear because I left
out a part of the sentence.  Let me reword that.

What can happen if you have a memory cgroup controller with tight
memory constraints on a container, and an I/O controller throttling
I/O in that same container, this can lead to the OOM killer kill off
one or more of the processes in that container, because write
throttling caused by memory cgroup limits may not work sufficiently
well to prevent the OOM killer deciding to start randomly killing
processes --- and the I/O throttling essentially makes things worse,
because that prevents the I/O from relieving the pressure, but if
nothing prevents processes from continuing to dirty pages, then the
OOM killer starts going on a moderous rampage and starts killing
innocent processes.

What's important to note is that of these components are "working"
exactly as advertised.  It's just combined effects may not be what you
wanted, which is an overall systems problem.

Cheers,

- Ted

Bug#987069: document which file systems support cgroupv2 I/O controller

2021-05-13 Thread Theodore Ts'o

replication, the speed benefits are worth it, and we have other ways
of protecting against single node failures --- either due to fs
corruption after a power failure, or the network router at the top of
the rack failing, or a power distribution unit failure taking out an
entire row of racks of servers.  And data=writeback also has
tradeoffs:

  "A crash+recovery can cause incorrect data to appear in files which
  were written shortly before the crash."
  - https://www.kernel.org/doc/html/latest/admin-guide/ext4.html#data-mode

> I'm also not sure, because everywhere I've looked appears to document
> that this technology is not filesystem specific (except the Facebook
> cgroupv2 iogroup announcement which asserts btrfs-only).
> 
> CCing Theodore Ts'o, who will definitely know if ext4 is supported!

As described above, under *some* circumstances, ext4 *may* work with
I/O latency.  But until you test it, you won't know for sure.

So more generally, that's the problem with I/O controllers in general.
They are fundamentally oblivious to how they interact with other parts
of the kernel, and until you do the testing, and perform any
remediation that you may find is necessasry, you may end up being
surprised when you put it into production use.  Josef did some brief
testing with XFS and ext4, and noted that it didn't work in their
default configuration, and he did whatever work was needed to make it
work well for Facebook using btrfs.

I'll give you another example that we learned the hard way.  Depending
on how tight you make your memory cgroups and how tight you constrain
your I/O controller, it's possible for write throttling --- where
processes which are dirtying memory faster than they can be written
out are put to sleep instead of triggering the OOM killer.  It turns
out write throttling when the total system memory is low is quite
different from a particular memory cgroup is low on free memory, and
so the complex interactions between the memory cgroup controller and
and the I/O cgroup controller is another reason why there appears to
be a guaranteed employment act for data center kernel engineers.  :-)

So even with btrfs, how you configure your system may be quite
different from how Facebook configures its data center servers, so I'd
encourage you to be careful, and do a lot of careful testing before
you deploy the I/O latency cgroup in production, It's not that Block
I/O latency controller won't work with any particular file system ---
the problem is that it may work too well, or at least not the way you
expect, ala the magic broomstick carrying water in Disney's
Fantasia[2].

[2] 
https://video.disney.com/watch/sorcerer-s-apprentice-fantasia-4ea9ebc01a74ea59a5867853

Cheers,

- Ted

P.S.  As far as other file systems (f2fs, jfs, reiserfs, etc.) the
same issues apply; if they don't mark metadata I/O with the REQ_META
flag, things may not go well.  If they do, the next question is
whether they are doing a lot of I/O on kernel threads on behalf of
various userspace processes.  But until you actually test and verify,
I would hesitate to make any kind of guarantee.

Bug#987641: e2fsprogs: FTBFS on armel/armhf with a 64-bit kernel

2021-05-03 Thread Theodore Ts'o

On Mon, May 03, 2021 at 11:00:37PM +0200, Aurelien Jarno wrote:
> 
> Maybe I should give a bit of context here. First of all, there is one armhf
> buildd, arm-arm-01, setup as an arm64 machine with a 32-bit armhf chroot. It
> has been setup following [1] a study from Steve McIntyre [1]. It appears
> that e2fsprogs first failed to build there [2] and got requeued on another
> buildd where it succeed.
> 
> Now with my DSA and buildd maintainer hat on, we have been experiencing for
> quite a lot of VM crashes when building packages in 32-bit armhf/armel VMs
> on arm64 machines, so we have recently stopped using VMs to build them and
> instead rely on chroots.

Thanks for the context.  I had indeed noticed shortly after 1.46.2-1
was released that it had failed on the first armhf buildd, and then
when it was retried, it got successfully built.  Given that this was
right before the bulleye release freeze hardened, this had been on my
radar screen to fix, since it was clearly non-optimal, but I had
assumed that it would be OK to let things slide until after the
Bullseye release, since after all e2fsprogs 1.46.2-1 *did*
successfully get built on armhf.

For me, this is really a question of timing.  It will definitely be
the case that the next source upload of e2fsprogs will have the
armhf/armel build fix.  The question I have is should I upload the fix
before Bullseye releases, or after the Bullseye release.

What is the impact on the buildd and DSA support effort if we wait
until after the Debian 11.0 release?  What is the pain if we leave
this unfixed until Bullseye releases (I'm assuming that it's going to
be released soon)?  The buildd's aren't going to be rebuilding
e2fsprogs until the next source upload, I would think.

Contrawise, what is the impact on the Debian Release and Debian
Installer teams if I push out a bug-fix-only e2fsprogs source package
in the next week or so?

I'll do what is least disruptive for all of the relevant teams.  Let
me know what's preferred.

Cheers,

- Ted

Bug#987641: e2fsprogs: FTBFS on armel/armhf with a 64-bit kernel

2021-05-03 Thread Theodore Ts'o

On Mon, Apr 26, 2021 at 11:01:45PM +0200, Aurelien Jarno wrote:
> Source: e2fsprogs
> Version: 1.46.2-1
> Severity: serious
> Tags: upstream ftbfs
> Justification: fails to build from source (but built successfully in the past)
> Forwarded: https://github.com/tytso/e2fsprogs/issues/65
> 
> e2fsprogs builds fine on armel/armhf when built on a machine with a
> 32-bit kernel. However it fails to build on a machine with a 64-bit
> kernel due to alignments issues which are not trapped by the kernel:
> 
> A build log is available there:
> https://tests.reproducible-builds.org/debian/logs/unstable/armhf/e2fsprogs_1.46.2-1.build2.log.gz

Hi, thanks for the bug report.  I have a patch which should address
this problem.  (See below).

I have a question for the Debian Release Team (cc'ed).  Do you agree
this is considered "serious"?  It will build from source on a system
with a arm-32 kernel.  It is only when cross-compiling on armel or
armhf on a aarch64 platform that some regression tests
(j_recover_csum2_32bit, j_recover_csum2_64bit, and j_recover_fast_commit)
will fail, and it is this which causes dpkg-buildpackage when run on a
arm-32 chroot on a 64-bit arm system to fail.

So it is not completely clear to me that this qualifies as a FTBFS,
such that the release team would grant an migration into bullseye
given that we are currently in "frozen hard to get hot"[1]

[1] https://lists.debian.org/debian-devel-announce/2021/03/msg6.html

I actually wouldn't mind it if I could sneak in an update into
Bullseye, especially if I could fix one or two other bugs at the same
time that don't qualify for RC.  (For example, Debian Bug: #984472[2],
and a bug fix for mke2fs -D which was reported by the Yocto
project[3].

[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984472
[3] https://github.com/tytso/e2fsprogs/pull/68

However, given that e2fsprogs is used by the installer, and we're
pretty late in the Bulleye Freeze --- I'd like to get a general
approval of the concept of an e2fsprogs update for Bullseye at this
point in time before I prepare an update and file a formal unblock
request.

Or we can do this after the initial Bullseye release, if that would be
more convenient for the release process.

What say ye?

Many thanks,

- Ted

commit bc8e4e56fcdd2a9e65cc87f6303dd127f79ad22d
Author: Theodore Ts'o 
Date:   Mon May 3 15:37:33 2021 -0400

e2fsck: fix portability problems caused by unaligned accesses

The on-disk format for the ext4 journal can have unaigned 32-bit
integers.  This can happen when replaying a journal using a obsolete
checksum format (which was never popularly used, since the v3 format
replaced v2 while the metadata checksum feature was being stablized),
and in the fast commit feature (which landed in the 5.10 kernel,
although it is not enabled by default).

This commit fixes the following regression tests on some platforms
(such as running 32-bit arm architectures on a 64-bit arm kernel):
j_recover_csum2_32bit, j_recover_csum2_64bit, j_recover_fast_commit.

https://github.com/tytso/e2fsprogs/issues/65

    Addresses-Debian-Bug: #987641
Signed-off-by: Theodore Ts'o 

diff --git a/e2fsck/journal.c b/e2fsck/journal.c
index a425bbd1..2231b811 100644
--- a/e2fsck/journal.c
+++ b/e2fsck/journal.c
@@ -278,6 +278,23 @@ static int process_journal_block(ext2_filsys fs,
return 0;
 }

+/*
+ * This function works on unaligned 32-bit pointers, which is needed
+ * since fast_commit's on-disk format does not guarantee that pointers
+ * are 32-bit aligned.
+ */
+static __u32 get_le32(__le32 *p)
+{
+   unsigned char *cp = (unsigned char *) p;
+   __u32 ret;
+
+   ret = (__u32) *cp++;
+   ret = ret | ((__u32) *cp++ << 8);
+   ret = ret | ((__u32) *cp++ << 16);
+   ret = ret | ((__u32) *cp++ << 24);
+   return ret;
+}
+
 static int ext4_fc_replay_scan(journal_t *j, struct buffer_head *bh,
int off, tid_t expected_tid)
 {
@@ -344,10 +361,10 @@ static int ext4_fc_replay_scan(journal_t *j, struct 
buffer_head *bh,
offsetof(struct ext4_fc_tail,
fc_crc));
jbd_debug(1, "tail tid %d, expected %d\n",
-   le32_to_cpu(tail->fc_tid),
+   get_le32(>fc_tid),
expected_tid);
-   if (le32_to_cpu(tail->fc_tid) == expected_tid &&
-   le32_to_cpu(tail->fc_crc) == state->fc_crc) {
+   if (get_le32(>fc_tid) == expected_tid &&
+   get_le32(>fc_crc) == state->fc_crc) {
state->fc_rep

Bug#986332: lsattr on certiain files in /dev results in "stack smashing detected"

2021-04-05 Thread Theodore Ts'o

On Mon, Apr 05, 2021 at 12:46:48AM +0200, Chris Hofstaedtler wrote:
> 
> AFAICT, for /dev/dri/card0 the ioctl ends up in the kernel's
> drm_ioctl [2], which will blindly call copy_to_user assuming the
> output size is the same as the input size (8 bytes). This is wrong
> for FS_IOC_GETFLAGS, at least for normal files.
> 
> Maybe the best thing is to put the lstat check back in?
> Or maybe lsattr should expect that the kernel might actually use the
> 8 bytes? I have checked various fs ioctl functions, and they all
> seem to return 4 bytes, except for orangefs [3] ...

What's going on is that apparently there is an overlap between the
ioctl code FS_IOC_GETFLAGS (aka EXT2_IOC_GETFLAGS) and some ioctl code
used by device driver responding to /dev/dri/card0, in drm_ioctl.  I
had the vague thought that at some point, we might be able to set and
get file system flags on device files, which is why I removed the
lstat check.  I wasn't counting on the fact that there would be ioctl
code collisions --- which in retrospect, was hopelessly optimistic on
my part.

So yeah, we need to put the lstat check back in.

I checked fs/orange/file.c and it is also using 4 bytes (int is always
32 bits even on 64-bit platforms):

if (cmd == FS_IOC_GETFLAGS) {
ret = orangefs_getflags(inode, );
if (ret)
return ret;
gossip_debug(GOSSIP_FILE_DEBUG,
 "orangefs_ioctl: FS_IOC_GETFLAGS: %llu\n",
 (unsigned long long)uval);
return put_user(uval, (int __user *)arg);
  ^

% cat /tmp/foo.c
#include 
#include 

int main(int argc, char **argv)
{
printf("size of int: %d\n", sizeof(int));
return 0;
}
% cc -o /tmp/foo /tmp/foo.c
% /tmp/foo
size of int: 4

Fortunately, the fortify compile option detectsd the stack smash, so
it's not critical that we get this fixed ASAP, but we ultimately do
need to put the lstat check back in.

- Ted

Bug#984472: [PATCH 2/2] resize2fs: close the file system on errors or early exits

2021-03-07 Thread Theodore Ts'o

When resize2fs exits early, perhaps because of an error, we should
free the file system so that if MMP is in use, the MMP block is reset.
This also releases the memory to avoid memory leak reports.

Addresses-Debian-Bug: #984472
Signed-off-by: Theodore Ts'o 
---
 resize/main.c | 50 ++
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/resize/main.c b/resize/main.c
index ccfd2896..8621d101 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -448,11 +448,15 @@ int main (int argc, char ** argv)
(fs->super->s_free_inodes_count > 
fs->super->s_inodes_count))
checkit = 1;
 
+   if ((fs->super->s_last_orphan != 0) ||
+   ext2fs_has_feature_journal_needs_recovery(fs->super))
+   checkit = 1;
+
if (checkit) {
fprintf(stderr,
_("Please run 'e2fsck -f %s' first.\n\n"),
device_name);
-   exit(1);
+   goto errout;
}
}
 
@@ -463,7 +467,7 @@ int main (int argc, char ** argv)
if (fs->super->s_feature_compat & ~EXT2_LIB_FEATURE_COMPAT_SUPP) {
com_err(program_name, EXT2_ET_UNSUPP_FEATURE,
"(%s)", device_name);
-   exit(1);
+   goto errout;
}
 
min_size = calculate_minimum_resize_size(fs, flags);
@@ -471,6 +475,9 @@ int main (int argc, char ** argv)
if (print_min_size) {
printf(_("Estimated minimum size of the filesystem: %llu\n"),
   (unsigned long long) min_size);
+   success_exit:
+   (void) ext2fs_close_free();
+   remove_error_table(_ext2_error_table);
exit(0);
}
 
@@ -497,7 +504,7 @@ int main (int argc, char ** argv)
if (retval) {
com_err(program_name, retval, "%s",
_("while trying to determine filesystem size"));
-   exit(1);
+   goto errout;
}
if (force_min_size)
new_size = min_size;
@@ -507,7 +514,7 @@ int main (int argc, char ** argv)
if (new_size == 0) {
com_err(program_name, 0,
_("Invalid new size: %s\n"), new_size_str);
-   exit(1);
+   goto errout;
}
} else {
new_size = max_size;
@@ -527,7 +534,7 @@ int main (int argc, char ** argv)
com_err(program_name, 0, "%s",
_("New size too large to be "
  "expressed in 32 bits\n"));
-   exit(1);
+   goto errout;
}
}
new_group_desc_count = ext2fs_div64_ceil(new_size -
@@ -540,20 +547,20 @@ int main (int argc, char ** argv)
com_err(program_name, 0,
_("New size results in too many block group "
  "descriptors.\n"));
-   exit(1);
+   goto errout;
}
 
if (!force && new_size < min_size) {
com_err(program_name, 0,
_("New size smaller than minimum (%llu)\n"),
(unsigned long long) min_size);
-   exit(1);
+   goto errout;
}
if (use_stride >= 0) {
if (use_stride >= (int) fs->super->s_blocks_per_group) {
com_err(program_name, 0, "%s",
_("Invalid stride length"));
-   exit(1);
+   goto errout;
}
fs->stride = fs->super->s_raid_stride = use_stride;
ext2fs_mark_super_dirty(fs);
@@ -580,52 +587,52 @@ int main (int argc, char ** argv)
" is only %llu (%dk) blocks.\nYou requested a new size"
" of %llu blocks.\n\n"), (unsigned long long) max_size,
blocksize / 1024, (unsigned long long) new_size);
-   exit(1);
+   goto errout;
}
if ((flags & RESIZE_DISABLE_64BIT) && (flags & RESIZE_ENABLE_64BIT)) {
fprintf(stderr, _("Cannot set and unset 64bit feature.\n"));
-   exit(1);
+   goto errout;
} else if (flags & (RESIZE_DISABLE_64BIT | RESIZE_ENABLE_64BIT)) {
if (new_size >= (1ULL << 32)) {
fprintf(stderr, _("Cannot change the 64bit feature "

Bug#984472: [PATCH 1/2] resize2fs: avoid allocating over the MMP block

2021-03-07 Thread Theodore Ts'o

When resizing past the point where the reserve inode has reserved
space for the block group descriptors to expand, and resize2fs (in an
offline resize) needs to move the allocation bitmaps and/or inode
table around, it's possible for resize2fs to allocate over the MMP
block, which would be bad.

Prevent this from happening by reserving the MMP block as a file
system metadata block (which it is) in resize2fs's accounting.

Addresses-Debian-Bug: #984472
Signed-off-by: Theodore Ts'o 
---
 resize/resize2fs.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index a0d08e5b..daaa3d49 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -1177,6 +1177,11 @@ static errcode_t mark_table_blocks(ext2_filsys fs,
if (blk)
ext2fs_mark_block_bitmap2(bmap, blk);
}
+   /* Reserve the MMP block */
+   if (ext2fs_has_feature_mmp(fs->super) &&
+   fs->super->s_mmp_block > fs->super->s_first_data_block &&
+   fs->super->s_mmp_block < ext2fs_blocks_count(fs->super))
+   ext2fs_mark_block_bitmap2(bmap, fs->super->s_mmp_block);
return 0;
 }
 
-- 
2.30.0

Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)

2021-03-06 Thread Theodore Ts'o

On Fri, Mar 05, 2021 at 09:48:50PM -0500, Theodore Ts'o wrote:
> I can't reproduce the problem given your file system image.  Given
> your description, this is almost certainly operator error.

OK, I was finally able to reproduce the problem, but not using your
reproduction instructions.  I reproduced it via:

1.  e2image -r rbd13.e2i.qcow2 /tmp/rbd13
2.  truncate -s 10T /tmp/rbd13
3.  e2fsck -fy /tmp/rbd13
4.  resize2fs /tmp/rbd13
5.  e2fsck -fy /tmp/rbd13

The bug report was also incorrect by saying that resizing the file
system to 4T was sufficient; that is not true.  It can only be
reproduced by when a file system is resized sufficiently large that
there is no longer enough room to grow the block group descriptors
without moving the allocation bitmaps and/or inode table out of the
way in order to create room for the block group descriptors.  (As in
the above reproduction recipe.)

% e2image -r rbd13.e2i.qcow2 /tmp/rbd13
e2image 1.46.2 (28-Feb-2021)
% truncate -s 4T /tmp/rbd13
% resize2fs  /tmp/rbd13
resize2fs 1.46.2 (28-Feb-2021)
Please run 'e2fsck -f /tmp/rbd13' first.
% e2fsck -f /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
e2fsck: MMP: e2fsck being run while checking MMP block
MMP check failed: If you are sure the filesystem is not in use on any node, run:
'tune2fs -f -E clear_mmp /tmp/rbd13'
MMP_block:
mmp_magic: 0x4d4d50
mmp_check_interval: 10
mmp_sequence: e24d4d50
mmp_update_date: Sat Mar  6 22:47:25 2021
mmp_update_time: 1615088845
mmp_node_name: cwcc
mmp_device_name: /tmp/rbd13

/tmp/rbd13: ** WARNING: Filesystem still has errors **

This is a separate bug.  The issue here is that resize2fs is exiting
after printing the "Please run 'e2fsck -f /tmp/rbd13' first." without
cleanly stopping (resetting) the MMP protection.  And this doesn't
lead to file system corruption, as we can see here:

% tune2fs -f -E clear_mmp /tmp/rbd13
tune2fs 1.46.2 (28-Feb-2021)
% e2fsck -fy /tmp/rbd13
e2fsck 1.46.2 (28-Feb-2021)
Clearing orphaned inode 45617124 (uid=107, gid=115, mode=0100600, size=16777216)
Clearing orphaned inode 15073744 (uid=0, gid=0, mode=0100644, size=593696)
Clearing orphaned inode 15073743 (uid=0, gid=0, mode=0100644, size=3031904)
Clearing orphaned inode 50331709 (uid=0, gid=0, mode=0100644, size=149704)
Clearing orphaned inode 50332495 (uid=0, gid=0, mode=0100755, size=231560)
Clearing orphaned inode 50332319 (uid=0, gid=0, mode=0100644, size=2670992)
Clearing orphaned inode 50332271 (uid=0, gid=0, mode=0100644, size=651472)
Clearing orphaned inode 50332251 (uid=0, gid=0, mode=0100644, size=282752)
Clearing orphaned inode 13 (uid=0, gid=0, mode=0100600, size=0)
Pass 1: Checking inodes, blocks, and sizes
Inode 46530577 extent tree (at level 1) could be shorter.  Optimize? yes

Inode 46530714 extent tree (at level 1) could be shorter.  Optimize? yes

Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/tmp/rbd13: * FILE SYSTEM WAS MODIFIED *
/tmp/rbd13: 225913/67108864 files (8.6% non-contiguous), 185961407/268435456 
blocks

The workaround to the first bug is to clear the MMP feature, do the
offline resize, and then enable the MMP feature again.  Of course,
while the MMP feature you won't be protected by another node trying to
modify the file system.  But it will allow you to grow the file system.

Another workaround is to simply do an online resize (that is, on the
node where the file system is mounted, run resize2fs on the mounted
file system).  However, depending on the kernel version, it won't
allow you to resize past the limits of the reserved block group
descriptors reserved by the resize inode.

- Ted

Bug#984472: e2fsprogs: resize2fs on inconsistent filesystem (in a way that e2fsck doesn't detect) completely corrupts MMP data (in a way e2fsck can't recover from)

2021-03-05 Thread Theodore Ts'o

I can't reproduce the problem given your file system image.  Given
your description, this is almost certainly operator error.

> $ e2fsck -f   /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> e2fsck: MMP: e2fsck being run while checking MMP block
> MMP check failed: If you are sure the filesystem is not in use on any node, 
> run:
> 'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'
> MMP_block:
> mmp_magic: 0x4d4d50
> mmp_check_interval: 10
> mmp_sequence: e24d4d50
> mmp_update_date: Wed Mar  3 14:51:38 2021
> mmp_update_time: 1614779498
> mmp_node_name: tarta
> mmp_device_name: /dev/zvol/filling/store/nabijacz

What this message means is that some *other* node was trying to run
fsck on the node at the same time as your e2fsck run.

The key in this message is 

MMP check failed: If you are sure the filesystem is not in use on any node, run:
  

When you then run 

'tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test'

It forcibly clears the MMP block.  The MMP protects the file system
from simulatenous modification by more than one system.  Given that
you you had this file system on some kind of remote block device
(which presumably is why you were using the multi-mount protection
feature in the first place), forcibly overriding the MMP protection is
a bad thing.  It's the functional equivalent of turning off the gun's
safety, and then aiming the gun at your foot, and pulling the trigger.

> 
> $ tune2fs -f -E clear_mmp /dev/zvol/filling/store/nabijaczleweli/e2test
> tune2fs 1.46.2 (28-Feb-2021)
> 
> $ e2fsck -fy   /dev/zvol/filling/store/nabijaczleweli/e2test
> e2fsck 1.46.2 (28-Feb-2021)
> ext2fs_open2: Superblock checksum does not match superblock
> e2fsck: Superblock invalid, trying backup blocks...
> Superblock has invalid MMP magic.  Fix? yes

The bad checksum and the invalid MMP magic means that some other
system was also modifying the file system while tune2fs was clearing
the MMP block.

If I run the same set of commands in your logs after unpacking your image:

% unzstd rbd13.e2i.qcow2.zst
% e2image -r rbd13.e2i.qcow2 rbd13
% truncate -s 4T rbd13  # This "expands" the file to 4TB

... and then running the same set of commands using "rbd13" instead of
"/dev/zvol/filling/store/...", it works just fine.  But on the local
file, obviously there won't be any other system or node modifying the
file system, and there is no corruption.


I do see some problems in how resize2fs handles MMP devices.  In
particular resize2fs doesn't check the MMP block.  So if there is some
other process messing with the file system, the result can be file
system corruption.  But you really shouldn't have been trying to
resize the file system if someone else is trying to use the file
system.

And even if you do this, if you run "tune2fs -f -E clear_mmp ..." and
something else is using the file system, you're going to be doomed,
anyway.

I suppose we can add some "are you sure?  ARE YOU REALLY SURE?"
question to "tune2fs -f -E clear_mmp", and maybe even force the user
to type the string "I AGREE TO ASSUME RESPONSIBILITY FOR FILE SYSTEM
DESTRUCTION IF OTHER NODES ARE USING, MOUNTING, OR MODIFYING THE FILE
SYSTEM", but at the end of the day, there is only so much we can do
protect against PEBCAK failures

- Ted

Bug#926130: e2fsck returns 1 which makes e2scrub fails

2021-02-14 Thread Theodore Ts'o

tag 926130 +moreinfo
thanks

On Sat, May 25, 2019 at 06:54:57PM -0400, Theodore Ts'o wrote:
> On Tue, May 14, 2019 at 09:42:58AM +0200, Laurent Bigonville wrote:
> > 
> > I've attached the requested logs. I don't see anything weird here.
> 
> Yeah, I'm not seeing anything in the logs either.  :-(
> 
> Would you be willing to send me a compressed e2image file?  The
> commands to generate are:
> 
>   e2image -Q /dev/fornost/docker docker.qcow2
>   xz docker.qcow2
> 
> .. and then send me the docker.qcow2.xz file.  It will include only
> the metadata blocks of the file system.  See the e2image man page for
> more details.  This *will* include filenames in the directory blocks,
> so if that makes you uncomfortable, you can use the -S option --- but
> if it's all the same to you, it would be more convenient if you could
> send it to me w/o using -S option.  The file will hopefully be small
> enough that you can just e-mail it to me, or put it someplace where I
> can download it (perhaps encrypted using GPG).

Hi, it's been a while.  Are you seeing e2scrub problems, especially
with newer versions of e2fsprogs?  If you are, would you be willing to
send me a compressed e2image file?

Thanks!

- Ted

Bug#981070: e2fsprogs: (Possibly) e2fsck sometimes enables ext4 features on ext2 filesystems?

2021-01-26 Thread Theodore Ts'o

On Wed, Jan 27, 2021 at 01:10:33AM +0100, Christoph Biedl wrote:
> 
> Yes, and my question here: Is it possible the existence of that bogus
> fscrypt feature flag made e2fsck or the kernel think this is an ext4,
> and things went downhill from there? That's a situation I'd like to
> avoid - since in the (today rare) case the bootloader cannot handle
> ext4, this will result in an unbootable system.

So, it had nothing to do with the fscrypt flag; the kernel message was
because that an inode flag was set on an inode, *not* a feature flag
in the superblock.

Looking at the tune2fs -l output, the issue was the inline_data file
system feature flag:

> Filesystem features:  ext_attr resize_inode dir_index filetype 
> inline_data sparse_super

That *is* a feature which e2fsck will offer to set, but only if there
is a valid inode which (a) has the inline data flag in the inode, (b)
there is a "system.data" extended attribute which has a non-zero size.
In that case, if the inode has valid inline data, e2fsck will assume
that what had happened feature flag was erroneously cleared, and will
ask permission to set it back:

Pass 1: Checking inodes, blocks, and sizes
Inode 12 has inline data, but superblock is missing INLINE_DATA feature
Fix? yes

If there is an inode that has the INLINE_DATA_FL inode flag set, but
there is not a valid system.data extended attribute, then e2fsck will
assume that it's random garbage written into the inode table that
happened to have the INLINE_DATA_FL bit set in the i_flags field set,
then e2fsck will ask permission to clear the inode and then get rid of
the file:

Pass 1: Checking inodes, blocks, and sizes
Inode 12 has INLINE_DATA_FL flag on filesystem without inline data support.
Clear? yes
Pass 2: Checking directory structure
Entry 'foo' in / (2) has deleted/unused inode 12.  Clear? yes

What could have happened on your file system?  Well, there are two
scenarios that could explain what had happened:

1) Somehow the inode was corrupted to (a) both set the inline data
flag, and (b) a valid extended attribute that had "system.data" (which
can't be set via the userspace API; it would have had to been
magically, random set).   Highly unlikely.

2) There was a random bit flip that enabled the inline_data feature
flag in the superblock.  The other fscrypt kernel message would
be explained another random bit flip and/or random garbage written
into an inode table block.

3) An admin accidentally ran "tune2fs -O inline_data /dev/sdXX" to
enable the inline_data feature.  The fscrypt message could be
explained as above.

In any case, e2fsck is doing the right thing.  It *is* possible for
e2fsck to set the inline_data feature flag, yes. but it's under
very tightly constrained circumstances, and the alternative would be
to have e2fsck to delete user data that could potentially be quite
valuable.  (For example, a cryptographic key which protects a bitcoin
wallet with $220 million dollars worth of bitcoin in it.  :-P )

Could this happen when it shouldn't?  Well, it would highly unlikely
--- as in one in bazillions odds unlucky.  It's actually much more
likely that a random bitflip in the in-memory superblock toggled the
inline_data feature bit set.

- Ted

Bug#981070: e2fsprogs: (Possibly) e2fsck sometimes enables ext4 features on ext2 filesystems?

2021-01-25 Thread Theodore Ts'o

On Tue, Jan 26, 2021 at 12:10:20AM +0100, Christoph Biedl wrote:
> Package: e2fsprogs
> Version: 1.44.5-1+deb10u3
> Severity: normal
> 
> Dear Maintainer,
> 
> at first, I am sorry to tell I cannot provide the raw data. I was quite
> in a hurry and overwrote it before realizing its high value for
> debugging.
> 
> Story: An ext2 file system, actually a boot partition, suffered severe
> damage for whatever reason, possibly controller issues. After massive
> repair done by e2fsck (on amd64, from Debian testing), the file system
> was clean and usable again (also, no damage to the important files but a
> lot of garbage in lost+found), *but* the kernel no longer recognizes it
> as ext2:
> 
>   EXT4-fs (sdf3): mounted filesystem without journal. Opts: (null)
> 
> Expected:
> 
>   EXT4-fs (sdf3): mounting ext2 file system using the ext4 subsystem

That's just a matter of how it was mounted.  If you mount with -t
ext2, you'll get:

# mke2fs -t ext2 /tmp/foo.img  2M
# mount -t ext2 /tmp/foo.img /mnt
[39916.330690] EXT4-fs (loop0): mounting ext2 file system using the ext4 
subsystem

If you mount the same file system with ext4, you'll get:

# mount -t ext4 /tmp/foo.img /mnt

[39880.180962] EXT4-fs (loop0): mounted filesystem without journal. Opts: 
(null). Quota mode: none.

> Therefore I assume e2fsck, while repairing the damage, by accident
> enabled some feature bits that made the kernel think it is dealing with
> an ext4
> 
> As I said, the raw image is destroyed. However I kept the output of
> "tune2fs -l" (again, from testing), possibly this provides enough hint
> to investigate?

You didn't actually include the output of tune2fs -l in your bug
report.  Given that you named them "broken" and "okay" I assume you
meant to attach them, but forgot to do so?

That being said, I'm pretty sure e2fsck enabled file system features;
in general, it doesn't do that, and certainly not with any of the
features added since the ext4 era.

My best guess is that I/O errors resulted in garbage getting written
to the inode table, and e2fsck wasn't clearly the fscrypt inode flag
when the fscrypt feature flag was not set; that would explain the
fscrypt kernel error message that you saw.

Cheers,

- Ted

Bug#979411: resize2fs: no percentage completion bars displayed with -p

2021-01-07 Thread Theodore Ts'o

On Wed, Jan 06, 2021 at 02:31:46PM +0200, Andrei POPESCU wrote:
> Package: e2fsprogs
> Version: 1.44.5-1+deb10u3
> Severity: normal
> Tags: upstream
> 
> According to the man page the -p option should print out "percentage 
> completion bars", though this was not done on a recent operation 
> (offline shrink).

If the resize operation doesn't require anything complex --- for
example, merely adjusting the number of blocks in the superblock,
without needing to relocate any blocks or inodes, we end up not doing
anything time-consuming enough to trigger a progress bar.

I suppose we could add a pointless progress bar which shows up for a
few milliseconds, but it would be a "even if you don't blink, you
still might miss it".  For example:

% mke2fs -Fq -t ext4 foo.img 16384
% e2fsck -fy foo.img
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
foo.img: 11/4096 files (0.0% non-contiguous), 1813/16384 blocks
% time resize2fs -p foo.img 16380
resize2fs 1.45.6 (20-Mar-2020)
Resizing the filesystem on foo.img to 16380 (1k) blocks.
The filesystem on foo.img is now 16380 (1k) blocks long.

real0m0.003s
user0m0.002s
sys 0m0.000s

Probably the best thing to do is to change the man page to say that
progress bars will be displayed for non-trivial resize operations, or
some such.

Cheers,

- Ted

Bug#931679: /sbin/e2scrub_all: e2scrub_all -r passes the snapshot, but e2scrub -r expects the original

2019-07-11 Thread Theodore Ts'o

Control: tag -1 +pending

On Tue, Jul 09, 2019 at 06:36:03PM +1000, Trent W. Buck wrote:
> Package: e2fsprogs
> Version: 1.45.2-1
> Severity: minor
> File: /sbin/e2scrub_all
> 
> e2scrub_all calls e2scrub with the wrong argument:

Thanks for the bug report!  This will be fixed in the next release of
e2fsprogs.

- Ted

commit 9b2c33f9daaecc593755fa6d45b6d910f8fe2f7b
Author: Theodore Ts'o 
Date:   Thu Jul 11 13:28:05 2019 -0400

e2scrub_all: fix "e2scurb_all -r"

The e2scrub_all program was broken by commit c7d6525ecaab
("e2scrub_all: refactor device probe loop") so that it would use the
path of the snapshot volume instead of the base volume.  This caused
"e2scrub_all -r" to pass the wrong pathname to e2scrub, with the
result that e2scrub would abort with an error instead of removing the
snapshot volume.

Fixes: c7d6525ecaab ("e2scrub_all: refactor device probe loop")
    Addresses-Debian-Bug: #931679
Signed-off-by: Theodore Ts'o 

diff --git a/scrub/e2scrub_all.in b/scrub/e2scrub_all.in
index 24b2c681..f342faf2 100644
--- a/scrub/e2scrub_all.in
+++ b/scrub/e2scrub_all.in
@@ -115,7 +115,8 @@ ls_scan_targets() {

 # Find leftover scrub snapshots
 ls_reap_targets() {
-   lvs -o lv_path -S lv_role=snapshot -S lv_name=~\(e2scrub$\) --noheadings
+lvs -o lv_path -S lv_role=snapshot -S lv_name=~\(e2scrub$\) \
+   --noheadings | sed -e 's/.e2scrub$//'
 }

 # Figure out what we're targeting

Bug#931266: e2fsprogs: Build-Depends on udev and systemd on non-Linux architectures

2019-07-04 Thread Theodore Ts'o

Control: tags -1 +pending

On Sat, Jun 29, 2019 at 10:48:35PM +0100, James Clarke wrote:
> As of the above version, e2fsprogs Build-Depends on udev and systemd
> (and cron too which, whilst somewhat perplexing, is not the issue here),
> thereby rendering its build dependencies unsatisfiable on non-Linux
> architectures, namely GNU/kFreeBSD and GNU/Hurd (the latter being
> particularly important given it uses ext2fs as its filesystem). I went
> digging to see why this was but the commit[1] that added it isn't any
> more enlightening. I haven't tried building it without those packages, but can
> only assume the upstream code still works without udev and systemd. Can we
> please have this fixed in Debian so e2fsprogs builds everywhere again?

Thanks for reporting this!  The udev, systemd, and cron packages are
only needed to so the e2scrub scripts will be properly configured, and
e2scrub only works for Linux systems (since it relies on Linux's LVM
system).

I'll fix this for the next release.

- Ted

Bug#931387: e2scrub invoked for non-LVM device

2019-07-04 Thread Theodore Ts'o

On Thu, Jul 04, 2019 at 08:52:02PM +0200, Marc Haber wrote:
> 
> I agree with you there. A few word of explanation in some readme or in
> the mail sent out by /usr/lib/x86_64-linux-gnu/e2fsprogs/e2scrub_fail)
> would have been nice though.

I'll at adding some comments in the scripts.

> > Reported-by: Marc Haber 
> 
> Please don't use the unplussed mail address that I have never
> communicated. mh+debian-b...@zugschlus.de would be enough, it's a valid
> and working address, and one that spammers don't handle too well.

I've ammended the commit, thanks.

- Ted

Bug#931387: e2scrub invoked for non-LVM device

2019-07-04 Thread Theodore Ts'o

Control: tags -1 +pending

On Wed, Jul 03, 2019 at 07:59:11PM +0200, Marc Haber wrote:
> 
> (additional issue #1)
> the -C option is not mentioned in the e2scrub_all manual page,

That's been fixed (by removing the -C) upstream already.

commit 6ec2060a291b873ba120aa7cda940c6604eac550
Author: Darrick J. Wong 
Date:   Mon Jun 3 21:27:12 2019 -0700

e2scrub: remove -C from e2scrub_all

We already have the "SERVICE_MODE=1" feature that signals to e2scrub
that we're running as a background daemon and therefore we should exit
quietly if conditions aren't right.

It's therefore unnecessary to have a separate -C flag to achieve the
same outcome for cron jobs.  Merge the two together.

Signed-off-by: Darrick J. Wong 
Signed-off-by: Theodore Ts'o 

> (additional issue #2)
> e2scub_all -n reveals that it would actually invoke systemctl start
> e2scrub@- which, manually invoked, gives some feedback about its
> failure:

Yes, that was a bug; e2scprobe e2scrub_all should haven't issued the
"systemctl start e2scrub@-" command for your system.

> I don't think that having an e2 file system on a LUKS device on an LVM
> logical volume is such an exotic thing to have.

It's not exotic, but it's not something we had tested.  So thanks for
your bug report!  On my system I have the LVM PV on top of the LUKS
device (so that all of the LV's are encrypted), and on your system you
have it the other way around.

So the problem is that when we scan for LVM devices:

lvs -o lv_path --noheadings -S 
"lv_active=active,lv_role=public,lv_role!=snapshot,vg_free>256"

we get /dev/fan-c/root, but then when we then run lsblk on that
device, we get something like this:

NAMEMAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb   8:16   0 931,5G  0 disk  
fan-c_root  253:00 417,2G  0 lvm   
└─root  253:28   0 417,2G  0 crypt /

And so when we run lsblk -o NAME,MOUNTPOINT,FSTYPE -P -n -p
/dev/fan-c/root we get both the top-level and subdevice:

NAME="/dev/mapper/fan-c_root" MOUNTPOINT="" FSTYPE="crypto_LUKS"
NAME="/dev/mapper/root" MOUNTPOINT="/" FSTYPE="ext4"

That was the root cause of the failure.  Attached please find the fix
that I have applied to my git repo, and which will be in the next
release of e2fsprogs.

Thanks again for the bug report!

- Ted

P.S.  Sorry for the complexity; we were trying to use systemd to
handle the error reporting.  I do wonder if we would have been better
off if we done things the more old-fashioned way, instead of trying to
take advantage of all of systemd's "value add".  My personal opinion
is whether or not we like systemd, it's won the war, so we might as
well try to take advantages of it, since we're stuck with the
disadvantages.  And to be fair to systemd, the problems we've been
having haven't been with e2scrube_all triggering e2scrub@ and
with the service file triggering e2scrub_fail@ if there we need
to send a report to the system administrator.  The fact that we have
to use systemd-escape to transmogrify "/" to "-", and "/usr/projects"
to "-usr-projects" is certainly confusing (which is where e2scrub@-
comes from) --- but it hasn't been the actual cause of the e2scrub
bugs that we've had.

The bugs that we have found have been more in the complex stacking of
dm-crypt, luks, etc, and our trying to use "lvs" and "lsblk" to figure
out which devices should get scrubbed.  And there have been a bunch
hiding there --- and we currently don't scrub file systems on
thin-provisioned (dm-thinkp) devices at all, because that would be
even harder to get right.  So the issues have been on the
lvm/thinp/lsblk side, not the systemd integration aspect.

 commit 0207f41174ac94eb98ed29e52bc6e1c5f6a8bdd3
Author: Theodore Ts'o 
Date:   Thu Jul 4 11:39:45 2019 -0400

e2scrub_all: correctly handle the case where LUKS is stacked on an LV

We handle the case where an LVM's PV is stacked on top of a dm-crypt
device, but not the case where it's the other way around, where a LVM
LV contains a LUKS encrypted file system.  Fix this oversight.

Addresses-Debian-Bug: #931387

Reported-by: Marc Haber 
Signed-off-by: Theodore Ts'o 

diff --git a/scrub/e2scrub_all.in b/scrub/e2scrub_all.in
index cdc37ced..24b2c681 100644
--- a/scrub/e2scrub_all.in
+++ b/scrub/e2scrub_all.in
@@ -102,8 +102,9 @@ ls_scan_targets() {
 if [ -z "$devices" ]; then
return 0;
 fi
-lsblk -o NAME,MOUNTPOINT,FSTYPE -P -n -p $devices | \
-   grep FSTYPE=\"ext\[234\]\" | while read vars ; do
+lsblk -o NAME,MOUNTPOINT,FSTYPE,TYPE -P -n -p $devices | \
+   grep FSTYPE=\"ext\[234\]\" | grep TYPE=\"lvm\" | \
+   while read vars ; do
eval "${vars}"

if [ "${scrub_all}" -eq 1 ] || [ -n "${MOUNTPOINT}" ]; then

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

2019-06-28 Thread Theodore Ts'o

By the way, here are the official criteria from the Debian Developer's
Reference Manual[1]:

   Extra care should be taken when uploading to stable. Basically, a
   package should only be uploaded to stable if one of the following
   happens:

   * a truly critical functionality problem

   * the package becomes uninstallable

   * a released architecture lacks the package

[1] 
https://www.debian.org/doc/manuals/developers-reference/ch05.en.html#upload-stable

This bug just doesn't qualify.   "critical" is defined as:

makes unrelated software on the system (or the whole system)
break, or causes serious data loss, or introduces a security hole
on systems where you install the package.[2]

Where as the lower-priority bug "grave" is defined as:

makes the package in question unusable or mostly so, or causes
data loss, or introduces a security hole allowing access to the
accounts of users who use the package.[2]

[2] https://www.debian.org/Bugs/Developer


This bug is at *best* grave*, because it only corrupts a small number
of files (those that have holes --- hence, not "serious" data loss)
and only if the user uses a fairly exotic feature which you have to
manually specify, by hand, from a root shell.  Given how unlikely it
will be for people to run into this, it could easily be argued that
this would be "important" or "normal".

And per the developer's reference manual, the release team doesn't get
out of bed for stable updates for anything lower that "truly critical
functionality problems".  I will note that these are criteria more
strict than what Red Hat uses for RHEL, but .  There's a reason
why I only use Debian Stable^H^H^H^H^H Obsolete for really simple
services, such as DNS, IMAP, Web servers, etc.  If I'm going to use
any kind of advanced or exotic features, the very restrictive
definition of allowable bug fixes that are currently defined for
Stable make it (IMHO) a bad idea.

If you want bug fixes that aren't quite at the "truly critical
functionality problem" level, the only answer we have today is Debian
Backports.

Personally, I think Debian as an organization has drawn the line for
Stable inclusion in completely the wrong place.  By being so concerned
about not letting any regressions through, to the point that debdeffs
have to be submitted to the Debian Release Team for their low-level,
detailed approval, and making Maintainers jump through hoops, bugs
aren't getting fixed --- even "important" or "grave" bugs.  But Debian
is a volunteer organization, and I've never cared enough to really
push on this.

Best regards,

- Ted

P.S.  I actually go through a fair amount of work to make sure the
e2fsprogs Debian source package will continue to build on stable and
oldstable, although recent initiatives to force the use of newer
debhelper compat levels hav made this more difficult.  So if people
want to build the very latest version of e2fsprogs on older Debian
releases, this is not hard, and this is what I as the upstream
maintainer would recommend.  It's not what the Debian Release Team
considers acceptable, but: ¯\_(ツ)_/¯

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

2019-06-28 Thread Theodore Ts'o

On Fri, Jun 28, 2019 at 11:47:46AM +0200, Rhonda D'Vine wrote:
> 
>  Bugs that appear in stable have to get fixed in stable.  The release
> team doesn't block bugfixes for stable, you are exaggerating here - and
> data loss is obviously a serious problem, especially when it is just a
> one-line change.  Your statement makes it sound like the stable release
> team is a nutcase -- which it clearly isn't.  Please don't imply that.  :/

There are *very*, *very*, *very* large number of bug-fixes for
e2fsprogs in version 1.43.[3456789].  This isn't actually the first
data corruption bug since 1.43.2.  And this doesn't include the data
corruption bugs found in 1.44.x and 1.45.x that apply to 1.43.2; that
would far more effort to track.

So there's actually a huge problem hiding here, which is that it's
just way too much effort to backport all of the fixes.  The problem is
that the release team isn't willing to take the maintenance branch
updates, but rather wants every single change to be logged as a Debian
bug, and individually extracted as a patch, and applied manually to
debian/patches.

And that's more effort than I as volunteer am willing to do.  Red Hat
charges a ***huge*** sum of money for crazy enterprise customers who
want that kind of loving care of each bug fix being manually curated.
And that's because it's a huge amount of work.

If someone wants to volunteer to do that work, that's great.  But
that's what I would really love to find.  And if they only want to do
it for *this* bug, as opposed to people using resize2fs to shrink file
systems, which is actually far more likely to happen --- although, I'm
not sure how many customers using Debian Obsolete do that sort of
thing --- that's fine.  But when there are a huge number of bug fixes,
it's very hard to bring the motivation to do what Red Hat charges
$ for its customers to do, when the rest of the open source world
just says, update to the latest !@#!@?! version, already!

 - Ted

P.S.  This isn't an issue just for e2fsprogs.  Even firefox-esr is
only supported for less than a year before the answer is, "Sorry, you
want the new fixes, you have to take some new features."  It's a huge
engineering effort to disentangle bug fixes and backport them to aging
code bases.

Bug#931142: dwarves: New upstream version should be packaged

2019-06-27 Thread Theodore Ts'o

On Wed, Jun 26, 2019 at 10:15:22PM -0400, Theodore Y. Ts'o wrote:
> 
> There is a newer version of dwarves upstream (1.14)

Upstream has since released version 1.15, with some miscellaneous bug
fixes.  I've updated debian/master at

https://salsa.debian.org/tytso/dwarves.git

I've also updated the debian/changelog to have a describe the new
features since 1.12:

dwarves-dfsg (1.15-1) unstable; urgency=medium

  * New upstream release (changes since 1.12):
- Add support for BTF encoding which is a much more compact way
  of encoding C type information.  It is derived from CTF (Compact
  C-Type format) and is designed for use with eBPF.
- Add initial support for the DWARF DW_TAG_partial_unit
- Improve support for pretty-printing unions
- Teach pahole to show where a struct was used, via the -I option
- Use the running kernel by default when no file name is passed
- Improve man pages
- Support new BTF deduplication algorithm found the Linux kernel's
  libbpf library, which allows type information for the kernel to
  be stored in roughly 1% of the space.
- Add a new utility, btfdiff, which compares the pretty-printed
  type information between two kernel images.
- Teach pahole to use the BTF information to pretty print structures
  using the BTF information using "pahole -F btf", which is much faster
  than using the Dwarf information.
- Infer the __packed__ attribute for structures without alignment holes
  and which violate the natural types' alignment requirements.
- Support DWARF5's DW_AT_alignment tag
- Add a --compile option to pfunct which produces compileable
  output for function prototypes in an object file.
- Miscellaneous bug fixes

 -- Theodore Y. Ts'o   Wed, 27 Jun 2019 10:16:23 -0400

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

2019-06-13 Thread Theodore Ts'o

On Thu, Jun 13, 2019 at 11:19:13PM +0200, Florian Zumbiehl wrote:
> 
> That fixed versions are available in buster and as a backport is of no use
> to anyone walking into this trap who doesn't know that they are about to
> corrupt their files. As far as I am concerned, that is more than enough
> reason to not fix the conversion in stretch--but please, at least make
> e2fsck bail out with an error message when someone asks it to corrupt the
> file system. That should be an easy and low-risk fix, and if you had done
> that when the bug was discovered, you would at least have saved me a lot of
> time, as I amh still trying to figure out what's been corrupted--and as
> noone should be using this version of the code that is almost guaranteed to
> corrupt your data, replacing it with an error message is not a regression
> either.

I'm very sorry that you lost data.

The problem is not the difficulty in backporting the change --- the
fix is really just a one-line change.  Most of the complexity in the
commit is in adding regression test.

The problem is actually making any kind of change to stretch at this
point.  The release team strongly discourages package updates, unless
the bug is ***really*** serious.  This is especially true for packages
like e2fsck which are built into the installer, since taking an
e2fsprogs package involves respinning the installer.

Given how niche the bmap2extent option is, and combine the judgement
that it's highly unlikely that people conservative enough to use
Debian Obsolete^H^H^H^H^H^H^H Stable would be using an exotic e2fsck
option with how many hoops the Debian Release Team makes people doing
stable updates (I feel actively discouraged from going down that
route) and I feel extremely demotatived towards trying to extract out
just the bug reports and backport them to ancient versions of
e2fsprogs.  Red Hat pays masochists vast amounts money to do that.
It's just not sort of thing that volunteers are likely to do, and if
you look at the very small number of packages that are updated to
Debian stretch (not even all security fixes), compared to what Red Hat
releases in its updates, it's just the way life is.

In my mind, this is why Debian Backports exists.  If you want the
latest bug fixes (including one whhich avoids data corruption when
doing off-line shrinks using resize2fs), and/or you are doing anything
at all exotic, my only recommendation is to use the Debian Backports
version of e2fsprogs.

Bottom line, for *this* particular bug (as opposed to the various
resize2fs shrink fixes) preparing an update to 1.43.4-2 is pretty
simple.  And if I had any optimism that the Release Team would be
willing to take a very late update to Debian Stretch at the point when
it's just about become oldstable, I might even be willing to do the
upload.  But it's not something where I'm feeling particularly
motivated to convince the release team that this is a change they
should be taking, and respinning the Stretch installer at this point.

Best regards,

- Ted

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

2019-06-13 Thread Theodore Ts'o

Control: severity -1 normal
Control: fixed -1 1.43.5-1

On Thu, Jun 13, 2019 at 04:09:15PM +0200, Florian Zumbiehl wrote:
> Package: e2fsprogs
> Version: 1.43.4-2
> Severity: critical
> 
> e2fsck potentially moves blocks around in sparse files when running with
> -E bmap2extent, in particular when the blocks are physically adjacent on
> the underlying block device, but have a hole in between in the file.
> ...
>
> With version 1.44.5-1~bpo9+1, the test above does not produce corruption on
> my system, but I have not investigated whether that is just coincidence or
> because the bug has been fixed. As this silently corrupts files, I would
> think it should get some fix in stretch, and be it by disabling the
> feature.

The bug was fixed in e2fsprogs 1.43.5, via the fix:

commit 855c2ecb21d1556c26d61df9b014e1c79dbbc956
Author: Darrick J. Wong 
Date:   Wed May 24 21:56:36 2017 -0400

e2fsck: fix sparse bmap to extent conversion

When e2fsck is trying to convert a sparse block-mapped file to an extent
file, we incorrectly merge block mappings that are physically contiguous
but not logically contiguous because of insufficient checking with the
extent we're constructing.  Therefore, compare the logical offsets for
contiguity as well.

Signed-off-by: Darrick J. Wong 
Signed-off-by: Theodore Ts'o 

It's an unfortunate bug, but we've lived with it for about ten years,
and it was only noticed in 2017.  The reality is very few people
actually try to convert an indirect mapped file system the new
extent-mapped system, because you get much more of the benefits of
using a more modern version of ext4 by doing a backup, reformat, and
restore of the file system.

Given that so few people have noticed, I don't believe this qualifies
as even "important", since doing the conversion is not a fundamental
aspect of e2fsck, so it doesn't qualify as a significant impact on the
usability of the package.  For similar reasons, I don't believe the
release team would agree to a new release of e2fsprogs to stretch ---
especially since (a) it's fixed in buster, (b) the fix is also
available in stretch-backports, and (c) in three weeks, Stretch will
become oldstable and Buster will become the new stable release of
Debian.

Feel free to escalate this to the Debain release team but I don't
think this is going to be considered worth their time (or mine) to be
worth a backport of the fix to e2fsprogs 1.43.4-2.

Cheers,

- Ted

Bug#929834: Buster/XFCE unlock screen is blank

2019-06-03 Thread Theodore Ts'o

On Sat, Jun 01, 2019 at 06:16:58AM +0530, Raj Kiran Grandhi wrote:
> 
> In a fresh install of Buster with XFCE desktop, locking the screen
> blanks the monitor and the monitor enters a power save state. After
> that, neither moving the mouse nor typing on the keyboard would turn
> the monitor back on.
> I could find two ways to get the display back on:
> 
> 1. Typing the password without any visual feedback (while the monitor
> continues to be in the power save state) unlocks the screen and the user
> session is displayed normally.
> 
> 2. Switching to another VT, say vt1 or vt2 turns the monitor back on
> and on switching
> back to the vt of the original session the unlock prompt is displayed
> normally and the screen can be unlocked.

There's another workaround (which is the one I use):

xset s off

This has other consequences as well, of course, but I tend to suspend
my laptop if it's ever going to be left alone, particularly if it's
running on battery.

And once I found a workaround which worked for my laptop, I was too
lazy to find a more "proper" fix.  :-)

- Ted

Bug#907034: e2fsprogs: mkfs.ext4 asks for "y/N" but requires a different char in some locales

2019-05-27 Thread Theodore Ts'o

Control: reopen -1

On Mon, May 27, 2019 at 01:22:43PM +0200, Matteo wrote:
> Package: e2fsprogs
> Version: 1.44.5-1~bpo9+1
> Followup-For: Bug #907034
> 
> in the stretch-backport version (and also in the strech one) of e2fsprogs, 
> with italian locale, mke2fs still asks for y/N, but expects "y" in order to 
> proceed.

My bad.  I fixed it for e2fsck, but not for mke2fs.  Thanks for
retesting and pointing out my mistake.  I'll fix it for the next
release, the same way I fixed it for e2fsck.  (We can still get in
trouble if there is some language where the word for yes beginsr with
a 'n', but Translations Are Hard --- it's just in this case, if the
translator doesn't keep up, and the language is wierd, it can result
in data loss.

I suppose could test for missing critical translations, and simply
abort and tell the user to "try again in English".  Hmm I may want
to do that...

- Ted

Bug#912185: mkfs.ext4 spanish translation incomplete cause problems

2019-05-26 Thread Theodore Ts'o

Control: tag -1 +pending

On Sun, Oct 28, 2018 at 06:42:47PM -0300, H. Gabriel Máculus wrote:
> Package: e2fsprogs
> Version: 1.43.4-2
> 
> When I run mkfs.ext4 to overwrite existing filesystem it prompt in
> english  (last line) and you need reply in spanish (S, n)

This has been fixed in the latest es.po from the Translation Project.  
It will be included in e2fsprogs 1.45.2.

- Ted

Bug#926130: e2fsck returns 1 which makes e2scrub fails

2019-05-25 Thread Theodore Ts'o

On Tue, May 14, 2019 at 09:42:58AM +0200, Laurent Bigonville wrote:
> 
> I've attached the requested logs. I don't see anything weird here.

Yeah, I'm not seeing anything in the logs either.  :-(

Would you be willing to send me a compressed e2image file?  The
commands to generate are:

e2image -Q /dev/fornost/docker docker.qcow2
xz docker.qcow2

.. and then send me the docker.qcow2.xz file.  It will include only
the metadata blocks of the file system.  See the e2image man page for
more details.  This *will* include filenames in the directory blocks,
so if that makes you uncomfortable, you can use the -S option --- but
if it's all the same to you, it would be more convenient if you could
send it to me w/o using -S option.  The file will hopefully be small
enough that you can just e-mail it to me, or put it someplace where I
can download it (perhaps encrypted using GPG).

Thanks!!

- Ted

Bug#929287: e2scrub_reap.service fails to start

2019-05-20 Thread Theodore Ts'o

Package: e2fsprogs
Version: 1.45.1-3 

Fixed in the most recent upload of e2fsprogs.  (Sorry, I typo'ed the
Closes: number in the changelog.  I'll fix that up in a future release.)

   - Ted

Bug#929263: cloud.debian.org: /usr/sbin not in default $PATH

2019-05-20 Thread Theodore Ts'o

On Mon, May 20, 2019 at 08:17:09PM -0700, Noah Meyerhans wrote:
> At this point, I think it'd be worth revisiting, at the project level,
> the historical tradition of leaving the sbin directories out of non-root
> paths. Setting aside all the single user desktop and laptop systems,
> there are enough alternative ways to grant restricted root (file ACLs,
> etc), and to run in alternate filesystem namespaces (e.g.  containers),
> that the functional distinctions that lead to the original directory
> split are probably applicable in a minority of situations these days.

Sure, and I often use /sbin/mke2fs on create file system images on
plain files and then corrupt them using /sbin/debugfs to create
regression tests for e2fsck, and I do all of this w/o being root.  So
I have /sbin and /usr/sbin and /usr/local/sbin in my path.  Along with
a whole bunch of other customizations in my dot files, of course.

But my usage is an edge case, and asking Debian to make changes to
global changes to the defaults for people like me was never something
I thought was justifiable.  Ultimately, if we believe that someday
we'll have the year of the Linux Desktop, where the primary
applications used by users are things like Firefox, Open Office,
Tuxracer, etc., adding /sbin and /usr/sbin might not be doing those
users a favor --- and those of us who are more technical are perfectly
capable of customizing our dot files.  (Heck, I just pull a git repo
into ~/dotfiles, and then run "make install" and I get a custom set of
my dotfiles for that installation.  :-)

> This isn't something that I feel strongly about, though. Anybody who
> does should retitle this bug appropriately and reassign it to the
> 'general' pseudopackage, whereupon it can be discussed on debian-devel.
> Otherwise it should get tagged wontfix, unless someone thinks this is an
> appropriate change to introduce at the cloud image level (I would not
> agree with this).

I agree this should be a project-level decision, and not cloud-image
specific.  I personally am against changing the default.  That's
because if someone is installing Debian on student laptops / desktops
at an educational institution like MIT, most of those users really
don't need /sbin and /usr/sbin; Debian users include far more than
just system administrators, kernel developers, and devops types.

I don't feel very strongly about it, though, so if the project as a
whole thinks Debian should be optimized for technical users, it's not
something I'll lose any sleep over.  I replace all of the default
dotfiles for myself, anyway.  :-)

- Ted

Bug#929263: cloud.debian.org: /usr/sbin not in default $PATH

2019-05-20 Thread Theodore Ts'o

On Mon, May 20, 2019 at 01:02:25PM -0400, Noah Meyerhans wrote:
> On Mon, May 20, 2019 at 11:26:00AM +0200, Jorge Barata González wrote:
> >Vagrant image debian/stretch64 v9.6.0
> >/usr/sbin is not included by default in $PATH
> > 
> >```
> >vagrant@stretch:~$ service
> >-bash: service: command not found
> >vagrant@stretch:~$ /usr/sbin/service
> >Usage: service < option > | --status-all | [ service_name [ command |
> >--full-restart ] ]
> >vagrant@stretch:~$ echo $PATH
> >/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
> >```
> 
> That path is set from /etc/profile, which is not modified by the vagrant
> images from the default that Debian installs. /usr/sbin is not in the
> default PATH in Debian normally.

Specifically, /usr/sbin and /sbin are not in the path for non-root
users, but they are included for root users:

if [ "`id -u`" -eq 0 ]; then
  PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
else
  PATH="/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games"
fi

(And since you really don't want to be running /usr/games/nethack as
root, /usr/games and /usr/local games is not in root's path.  :-)

This is a historical convention, going back decades, that only the
system administrators needs to run the programs in /sbin and
/usr/sbin.  So to avoid users getting confused when they might run
those programs and get "permission denied", historically normal users
won't have /sbin and /usr/sbin in their path.  However many system
administrators will have their own custom dot files which do include
those directories in their paths.

That assumption is perhaps less likely to be true for servers running
in cloud VM', but making things be different for cloud VM's as
compared to standard Debian configurations also has downsides in terms
of causing greater confusion.  So my suggestion would be for you to
simply have your own custom dotfiles which can set a PATH different
from the default.

- Ted

Bug#929254: e2fsprogs: e2scrub cronjob spurious falure

2019-05-20 Thread Theodore Ts'o

merge 929186 929254
thanks

On Mon, May 20, 2019 at 04:53:29AM +0200, Marc Lehmann wrote:
> Package: e2fsprogs
> Version: 1.45.1-1
> Severity: normal
> 
> I just got this mail from cron:
> 
>So sorry, the automatic e2scrub of /db on host failed.
> 
>May 19 03:10:39 host systemd[1]: Starting Online ext4 Metadata Check for 
> /db...
>May 19 03:10:39 host e2scrub@-db[5246]: /db: Not connnected to a LVM 
> logical volume.
>May 19 03:10:39 host e2scrub@-db[5246]: Usage: /sbin/e2scrub [OPTIONS] 
> mountpoint | device

There is a fix coming soon; my apologies for this regression!

> (And, a very minor typo, it should be "an LVM", not "a LVM")

Good point, thanks.

- Ted

Bug#928977: also cron spam when lvm is installed but not in use

2019-05-20 Thread Theodore Ts'o

On Sun, May 19, 2019 at 11:30:56AM -0400, Joey Hess wrote:
> I do not use lvm on my laptop, but I have lvm2 installed because I
> sometimes attach removable media that uses it. And so it does not seem
> sufficient to avoid complaining when lvm2 is not installed, because I am
> also getting cron spam:

Yeah, sorry, that's Bug #929186, and it's fixed by:

commit fbb9bfa4a5925f8049ef51d8f59eb94920781ec8
Author: Theodore Ts'o 
Date:   Sat May 18 23:04:49 2019 -0400

e2scrub_all: avoid scrubbing all devices when there is nothing to scrub

Running lsblk when there are no valid block devicse results in
generating all block devices as the list of devices to scrub; this
results in a lot of e2scrub_all failures.

Addresses-Debian-Bug: #929186

Signed-off-by: Theodore Ts'o 

I'll get a fixed version of e2fsprogs out to unstable shortly...

  - Ted

Bug#929186: e2fsprogs: e2scrub@-.service service is running against non-LVM disks and failing

2019-05-18 Thread Theodore Ts'o

tag 929186 +pending
thanks

On Sun, May 19, 2019 at 09:29:26AM +1200, jfp wrote:
> Package: e2fsprogs
> Version: 1.45.1-1
> Severity: important
> Tags: upstream
> 
> The latest update on multiple machines produces this error on all non-LVM disk
> (I have NO LVM disks in those workstations)

Oops, thanks for the bug report!  The following patch will be in the
the next release.  You can either apply this patch to
/sbin/e2scrub_all, or just disable the e2scrub_all timer unit.

- Ted

>From fbb9bfa4a5925f8049ef51d8f59eb94920781ec8 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o 
Date: Sat, 18 May 2019 23:04:49 -0400
Subject: [PATCH] e2scrub_all: avoid scrubbing all devices when there is
 nothing to scrub

Running lsblk when there are no valid block devicse results in
generating all block devices as the list of devices to scrub; this
results in a lot of e2scrub_all failures.

Addresses-Debian-Bug: #929186

Signed-off-by: Theodore Ts'o 
---
 scrub/e2scrub_all.in | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/scrub/e2scrub_all.in b/scrub/e2scrub_all.in
index 7ee653158..abc237e42 100644
--- a/scrub/e2scrub_all.in
+++ b/scrub/e2scrub_all.in
@@ -103,8 +103,12 @@ fi
 
 # Find scrub targets, make sure we only do this once.
 ls_scan_targets() {
-   lsblk -o NAME,MOUNTPOINT,FSTYPE -P -n -p \
- $(lvs -o lv_path --noheadings -S 
"lv_active=active,lv_role=public,lv_role!=snapshot,vg_free>${snap_size_mb}") | \
+local devices=$(lvs -o lv_path --noheadings -S 
"lv_active=active,lv_role=public,lv_role!=snapshot,vg_free>${snap_size_mb}")
+
+if [ -z "$devices" ]; then
+   return 0;
+fi
+lsblk -o NAME,MOUNTPOINT,FSTYPE -P -n -p $devices | \
grep FSTYPE=\"ext\[234\]\" | while read vars ; do
eval "${vars}"
 
-- 
2.19.1

Bug#928977: patch

2019-05-16 Thread Theodore Ts'o

On Thu, May 16, 2019 at 02:47:06PM -0400, Theodore Ts'o wrote:
> Control: tags -1 +pending
> 
> On Thu, May 16, 2019 at 05:17:10PM +0200, Adam Borowski wrote:
> > Cron spam from a Priority:Required package is not something I'd consider
> > fit for Buster.  Thus, could you please apply this patch?
> 
> Thanks for the patch, applied!

Actually, looking at this more closely, I think this might be a better
way to address things, since I need to deal with both the systemd and
cron cases:

commit ec23e6820b64509a4640c61ec94febdd7efd2905
Author: Theodore Ts'o 
Date:   Thu May 16 14:56:37 2019 -0400

e2scrub: stop cron spam if lvm2 is not installed.

Addresses-Debian-Bug: #928977

    Signed-off-by: Theodore Ts'o 

diff --git a/scrub/e2scrub_all.cron.in b/scrub/e2scrub_all.cron.in
index 7d42c3f2e..5bf83ec97 100644
--- a/scrub/e2scrub_all.cron.in
+++ b/scrub/e2scrub_all.cron.in
@@ -1,2 +1,2 @@
 30 3 * * 0 root test -e /run/systemd/system || @pkglibdir@/e2scrub_all_cron
-10 3 * * * root test -e /run/systemd/system || @root_sbindir@/e2scrub_all -A -r
+10 3 * * * root test -e /run/systemd/system || @root_sbindir@/e2scrub_all -C 
-A -r
diff --git a/scrub/e2scrub_all.in b/scrub/e2scrub_all.in
index 31ebc7970..eb7b55bee 100644
--- a/scrub/e2scrub_all.in
+++ b/scrub/e2scrub_all.in
@@ -26,6 +26,7 @@ if (( $EUID != 0 )); then
 fi
 
 scrub_all=0
+run_from_cron=0
 snap_size_mb=256
 reap=0
 conffile="@root_sysconfdir@/e2scrub.conf"
@@ -73,6 +74,7 @@ while getopts "nrAV" opt; do
"n") DBG="echo Would execute: " ;;
"r") scrub_args="${scrub_args} -r"; reap=1;;
"A") scrub_all=1;;
+   "C") run_from_cron=1;
"V") print_version; exitcode 0;;
*) print_help; exitcode 2;;
esac
@@ -84,13 +86,19 @@ shift "$((OPTIND - 1))"
 # when e2scrub_all is run out of cron or a systemd timer.
 
 if ! type lsblk >& /dev/null ; then
+if [ "${run_from_cron}" -eq 1 ] ; then
+   exitcode 0
+fi
 echo "e2scrub_all: can't find lsblk --- is util-linux installed?"
-exitcode 0
+exitcode 1
 fi
 
 if ! type lvcreate >& /dev/null ; then
+if [ "${run_from_cron}" -eq 1 ] ; then
+   exitcode 0
+fi
 echo "e2scrub_all: can't find lvcreate --- is lvm2 installed?"
-exitcode 0
+exitcode 1
 fi
 
 # Find scrub targets, make sure we only do this once.
diff --git a/scrub/e2scrub_all.service.in b/scrub/e2scrub_all.service.in
index 20f42bfe3..77b6ad599 100644
--- a/scrub/e2scrub_all.service.in
+++ b/scrub/e2scrub_all.service.in
@@ -8,5 +8,5 @@ Documentation=man:e2scrub_all(8)
 [Service]
 Type=oneshot
 Environment=SERVICE_MODE=1
-ExecStart=@root_sbindir@/e2scrub_all
+ExecStart=@root_sbindir@/e2scrub_all -C
 SyslogIdentifier=e2scrub_all
diff --git a/scrub/e2scrub_all_cron.in b/scrub/e2scrub_all_cron.in
index f9cff878c..bc26fee3d 100644
--- a/scrub/e2scrub_all_cron.in
+++ b/scrub/e2scrub_all_cron.in
@@ -65,4 +65,4 @@ on_ac_power() {
 test -e /run/systemd/system && exit 0
 on_ac_power || exit 0
 
-exec @root_sbindir@/e2scrub_all
+exec @root_sbindir@/e2scrub_all -C
diff --git a/scrub/e2scrub_reap.service.in b/scrub/e2scrub_reap.service.in
index cf26437cd..40511f735 100644
--- a/scrub/e2scrub_reap.service.in
+++ b/scrub/e2scrub_reap.service.in
@@ -16,7 +16,7 @@ NoNewPrivileges=yes
 User=root
 IOSchedulingClass=idle
 CPUSchedulingPolicy=idle
-ExecStart=@root_sbindir@/e2scrub_all -A -r
+ExecStart=@root_sbindir@/e2scrub_all -C -A -r
 SyslogIdentifier=%N
 RemainAfterExit=no

Bug#928977: patch

2019-05-16 Thread Theodore Ts'o

Control: tags -1 +pending

On Thu, May 16, 2019 at 05:17:10PM +0200, Adam Borowski wrote:
> Cron spam from a Priority:Required package is not something I'd consider
> fit for Buster.  Thus, could you please apply this patch?

Thanks for the patch, applied!

- Ted

Bug#926130: e2fsck returns 1 which makes e2scrub fails

2019-05-13 Thread Theodore Ts'o

Hi.  I haven't forgotten about your bug report.  The problem has been
I can't find any explanation for how e2fsck could have returned an
exit status of 1 given the output shown below.

> bigon@fornost:~$ sudo LC_ALL=C e2scrub /var/lib/sbuild/
>   Logical volume "sbuildbuild.e2scrub" created.
> e2fsck 1.45.0 (6-Mar-2019)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/fornost/sbuildbuild.e2scrub: 13/983040 files (0.0% non-contiguous),
> 68098/3932160 blocks

So... if you could do the following, I would greatly appreciate it.

1)  Confirm that this is still reproducing for you with e2fsprogs 1.45.0.

2) I've just released e2fsprogs 1.45.1 to Debian unstable.  If you
could try installing e2fsprogs, and see if you can still reproduce the
problem with e2fsprogs 1.45.1.

3) If you can, please create an /etc/e2fsck.conf containing the
following:

[options]
max_count_problems = 2
log_dir = /tmp
problem_log_filename = e2fsck-problems-%N.%h.INFO.%D-%T
log_filename = e2fsck-%N.%h.INFO.%D-%T
log_dir_wait = true

4) The rerun the e2scrub command and send me the output of
/tmp/e2fsck-problems-*INFO* and /tmp/e2fsck-*INFO* files that will be
created as a result of running e2fsck.

Many thanks, and my apologies for not getting back to you sooner.

  - Ted

Bug#924591: this requires linking in libsparse, which is from Android sources

2019-05-06 Thread Theodore Ts'o

clone 924591 -1
reassign 924591 fastboot 1:8.1.0+r23-4
retitle -1 e2fsprogs: add support for dynamically loading libsparse
severity -1 wishlist
thanks

I'm reassigning the original bug back to fastboot.  I've cloned the
bug and made it a feature request of having e2fsprogs dynamically load
libsparse if it is available.  That's not happening until after Buster
is released, though.

- Ted

Bug#927767: e2fsck: Needs to clear all MMP flags in one pass

2019-04-23 Thread Theodore Ts'o

On Tue, Apr 23, 2019 at 11:09:03AM -0700, Elliott Mitchell wrote:
> 
> One of the things encountered in a fortune file: "Every program has at
> least two purposes, one for which it was designed for, and for which it
> wasn't designed for."  (inexact quote)
> 
> The use case I've got for MMP is quite similar to the one you were
> originally thinking of.  My case many parameters are different, but the
> overall purpose is the same.

Sure, but that means that anything which tries to open or mount the
block device --- whether it be dumpe2fs, e2fsck, dump (of
dump/restore), debugfs, resize2fs, or the kernel mounting the file
system, is going to have to pay the MMP sleep penalty.  There really
is no way around it, because the assumption here is the only way we
can communicate with some other system, or some other VM, which might
have read/write access to the device, is via block reads and writes to
the device itself, using the MMP block.  And this is fundamentally a
polling interface, because of the fundamental limitations of the block
device, and we don't want to be constantly reading and writing from
the block device, because of the performance penalties, and because of
the write endurance problems it causes for the SSD.

If this is not satisfactory for you then you shouldn't be using
MMP, and need to be doing something else, like solving this problem at
the hypervisor layer.  I'd love to be able to offer something better,
but I'd also like to have faster of light travel and anti-gravity
technology.  The laws of physics are very hard to work around, alas

> > The problem though is while you don't *expect* to do a failover, two
> > systems can simultaneously access the device, and so you need to have
> > most of this logic.  Worse, if your use case includes the hypervisor
> > suspending a VM for an hour or two, before resuming it later, MMP
> > won't even protect you against the fumble finger case.
> 
> This depends on the hypervisor, but I'm getting the impression Xen (the
> one I'm using) gives some strong hints the VM had been paused (otherwise
> I wouldn't see kernel messages after suspend/resume).

It doesn't matter.  If you are depending on MMP to protect you against
simultaneously access to an ext4 file system, a suspended VM will
cause the MMP protections not to work.

If the hypervisor can solve the problem, then great!   Don't use MMP

> I've observed Xen's utilities have /some/ protection against the case I'm
> concerned with.  They will prevent you from attaching a block device
> which is mounted in "Domain-0" (not actually the hypervisor, but a
> privileged VM) to a VM.  They do not (or did not) protect you from
> attempting to mount a block device attached to a VM in "Domain-0".  I
> also don't know whether Xen's utilities will detect or prevent you from
> attaching a block device to multiple VMs.

So if these are real block devices in Domain 0 (as opposed to
something made up by the hypervisor and presented to its guest VM's),
then the O_EXCL/EBUSY mount protections should protect you.  I suspect
this might mean changing the Xen hypervisor so that it is passing
knowledge to whether some VM has a block device opened with O_EXCL is
passed down to the Xen layer --- and the Xen layer informing the guest
VM if some other VM has the block device opened with O_EXCL.  That's
what I mean by making it be the hypervisor's problem.

You'll be able to solve the problem much more efficiently (with less
overhead) and much more robustly, if you solve the problem at the
hypervisor control plane level.

Cheers,

- Ted

Bug#927767: e2fsck: Needs to clear all MMP flags in one pass

2019-04-23 Thread Theodore Ts'o

On Mon, Apr 22, 2019 at 07:34:40PM -0700, Elliott Mitchell wrote:
> 
> This is several VMs on a hypervisor.  Most filesystems aren't shared, I'm
> mostly using MMP for protection against my doing something stupid due to
> typing a command with the wrong device.  The VMs have separate /, and
> /var.  Their storage isn't being presented to them as a single disk, but
> instead being divided into block devices for filesystems in the
> privileged area.
> 
> The privileged area understands ext4 and I'm mostly worried about
> mistakenly trying to mount a filesystem in the privileged area while a VM
> is active on it.  There is no fallover, mostly protection against
> fumble-fingers.  %-)

MMP was designed for the use case where there were two servers that
connected to the same disk, using something like SCSI or Fibre
Channel.  The idea was one server would be the primary, and the other
would be the backup, and both could simultaneously access the disk ---
but of course, it wouldn't be safe for the two computers to try to run
e2fsck or mount the file system at the same time.

This is why clearing MMP is so slow; the primary computer (the one
holding the MMP "lock") writes to the MMP block once every 5 seconds,
updating a sequence counter.  The secondary computer, if it is trying
to acquire the lock, has to read the MMP block, sleep for 11 seconds
(in case the primary computer's update gets delayed due to the disk
being super busy, etc.), read tghe MMP block again, and if the MMP
block hasn't changed, it knows it's safe to grab the MMP lock.

It's actually a bit more complicated than that, since we also need to
deal with the situation where the primary computer is "dead" for say,
15 seconds, and then comes back to life.  This might happen if the
system is thrashing due to high memory pressure, and then the OOM
killer kills off some processes, and the system is restored.  So in
fact, the holder of the MMP lock has to query the MMP block before it
updates it, and if it finds the MMP lock has been stolen out from
under it, it's because it failed to update the MMP block, and so it
has to relinquish the lock, forcibly unmount the file system, and
assume the backup system has take over.

The problem though is while you don't *expect* to do a failover, two
systems can simultaneously access the device, and so you need to have
most of this logic.  Worse, if your use case includes the hypervisor
suspending a VM for an hour or two, before resuming it later, MMP
won't even protect you against the fumble finger case.

The real answer this sort of lockout should be implemented at the
hypervisor layer, since it should know, from its control plane,
whether or not two VM's are trying to use a particular virtual disk.
For example, Google Compute Engine's Persistent Disk product enforces
this restriction by default.  If you are using a regional persistent
disk, you can set up a standby VM in a different GCE zone, and in case
of a zonal failure, you can use a force-attach command[1] to steal the PD
from the original VM and connect it to the backup VM.  Then if the GCE
zone comes back to life (or the network partition gets healed, etc.),
the GCE control plane will prevent the original VM from accessing the
persistent disk which was stolen away from it.  (Disclosure: I work
for Google.)

[1] 
https://cloud.google.com/compute/docs/disks/regional-persistent-disk#force_attach

MMP was designed for the use case where the shared SCSI or FC disk
didn't have this kind of control plane logic, but in fact, it's
actually *way* more efficient to implement this feature above the
layer of the guest VM.

Cheers,

- Ted

Bug#927767: e2fsck: Needs to clear all MMP flags in one pass

2019-04-22 Thread Theodore Ts'o

On Mon, Apr 22, 2019 at 03:08:09PM -0700, Elliott Mitchell wrote:
> Package: e2fsprogs
> Version: 1.43.4-2
> Severity: minor
> 
> If being run on a system which uses MMP and they weren't unmounted
> cleanly, e2fsck will clear the MMP flags in order individually heavily
> slowing the check process.
> 
> At a minimum when `e2fsck -a` is run, if it needs to clear one MMP flag
> it should check for MMP flags on other filesystems and attempt to clear
> them too.
> 
> Needing two MMP waits (one for `fsck /` and one for `fsck -a`) is
> acceptable, 3+ is not.

What's the use case where you are using MMP on multiple file systems?
In most of the configurations that I've seen, each server will have
its own root file system, and the only thing which gets shared is a
single data partition.

In a boot sequence, e2fsck gets called separate for each partition.  For 
example:

% fsck -A -V -N
fsck from util-linux 2.33.1
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 /dev/mapper/lambda-root 
[/sbin/fsck.vfat (1) -- /boot/efi] fsck.vfat /dev/nvme0n1p1 
[/sbin/fsck.ext4 (1) -- /boot] fsck.ext4 /dev/nvme0n1p4

So e2fsck has no idea whether multiple file systems are being checked,
or only one.  E2fsck is also not parsing /etc/fstab --- that's the job
of the fsck driver program, or if you are using systemd, the systemd
generator.

So at the risk of sounding like Steve Jobs --- if you are trying to
use multiple MMP file systems, you're probably holding it wrong.  :-)

What is your use case; what are you trying to achieve, and how are
your systems setup?   How are the disks connected to multiple servers?

For that matter, there could easily be configurations where just
because one MMP file system is no longer accessible to its primary
system, another MMP file system might still be accessible to its
primary system.  (For example, one could imagine 3 systems, where each
system back up the other two, and so system #3 might be the backup for
systems #1 and #2.  Just because one MMP file system --- say the one
for #1 --- has failed over, doesn't mean that the other MMP file
system --- which might be hooked up to system #2 --- should have its
MMP flag cleared.)

- Ted

Bug#924591: this requires linking in libsparse, which is from Android sources

2019-04-22 Thread Theodore Ts'o

On Mon, Apr 22, 2019 at 10:19:46PM +0200, Hans-Christoph Steiner wrote:
> 
> I don't really know how fastboot in stretch provided the mke2fs support,
> but judging by the dependencies, it might have been that fastboot used
> to do the formatting itself, based on being linked to
> android-libext4-utils and android-libsparse.  The buster version of
> fastboot is clearly calling mk2efs, which in AOSP is built from an
> inline e2fsprogs fork.

Yes, that's correct.

>From running strings on the fastboot binary from Stretch, it's using
the statically linked-in make_ext4fs code.  The make_ext4fs was code
written years and years ago, back when Android senior management
(rumor has it was that it was Andy Rubin himself) didn't want to use
any GPL'ed code in userspace.  Fortunately, that's no longer the case.

The old make_ext4fs code was old, creaky, and didn't exactly work the
same way as mke2fs (since it was written as a clean-room
reimplementation from scratch).  As a result I was very happy when we
were finally able to take the make_ext4fs code and KILL IT WITH
FIRE[1].  :-)

[1] https://www.youtube.com/watch?v=Tnod9vtB4xA

Unfortunately, the focus was to make the make_ext4fs replacement work
with AOSP only.  I wasn't aware of Debian's native Android tools; but
even if I did, it's not clear that we could have gotten things working
within the scope of the intern project to drop make_ext4fs support and
port the necessary support code into e2fsprogs.

This change started landing in AOSP in November 2016 (it was a Fall
2016 intern project).  I'd have to check to be sure, but looking at
the Debian changelog, the AOSP release with the actual KILL IT WITH
FIRE commit probably landed in Debian sometime in late 2017.  Alas,
apparently no one had noticed the problem for well over a year.  So
I'm guessing Debian's fastboot, or at least its format command, is
rarely used by Debian users.  :-/

- Ted

Bug#924591: this requires linking in libsparse, which is from Android sources

2019-04-22 Thread Theodore Ts'o

On Mon, Apr 22, 2019 at 06:09:23PM +0200, Jonas Meurer wrote:
> Hans-Christoph Steiner:
> > Theodore Ts'o:
> >> So your choice --- we can either reassign this bug back to fastboot or
> >> android-sdk-platforms-tools, or I can downgrade the severity of this
> >> bug for e2fsprogs down to wishlist[1].  Let me know how you want to
> >> handle this.
> >>
> >> [1] This is because I view this both as a "feature request" and "bugs
> >> that are very difficult to fix due to major design considerations"
> >> (per https://www.debian.org/Bugs/Developer#severities), not to mention
> >> that it's going to affect a miniscule fraction of the e2fsprogs
> >> package's users.
> > 
> > Makes sense to me.  I'm fine with this being done post-Buster or as a
> > custom mke2fs in android-platform-system-core.
> 
> So the bottom line here is that the ext4 formatting support in fastboot
> remains broken in Buster, right? That would be very unfortunate and a
> regression compared to Stretch.

I'm not sure whether or not Stretch was using the old-style
make_ext4fs from AOSP, or was including the mke2fs from AOSP, but yes,
it sounds like it's a regression from stretch.  I'm not sure how many
Debian users are using the Debian-native fastboot versus using the
version from the Google SDK or the AOSP binaries, though.

It does seem that if this is considered high priority, the most
straightforward way to address this bug is going to be to include
building mke2fs from AOSP and placing it in
/usr/lib/android-sdk/platform-tools/mke2fs.  I know some folks on the
android tools teams aren't excited with that approach, but that
probably is the best thing to do if you want to address this in
Buster.

- Ted

Bug#924591: this requires linking in libsparse, which is from Android sources

2019-04-18 Thread Theodore Ts'o

On Thu, Apr 18, 2019 at 09:32:06PM +0200, Hans-Christoph Steiner wrote:
> 
> One possibility would be including libsparse as a patch, it doesn't
> change a lot:
> https://android.googlesource.com/platform/system/core/+log/master/libsparse
> 
> But it depends on Android's libbase and libz-host.

This might be "serious" bug from the fastboot package's perspective,
but there's no way in heck the release time is going to consider this
a bug that is "serious" priority for e2fsprogs.

More to the point, there's now way in the world I (or the release and
installer teams) are going to make e2fsprogs, which is an
"important=yes" package with priority "required" drag in the
android-libsparse, android-libbase, and zlib1g packages.

So the way you changed android-sdk-platforms-tools to use /sbin/mke2fs
was a really bad choice, especially this while we are in release
freeze for Buster.  There's no way in the world we are going to make a
change like this to a package like e2fsprogs which is used by the
installer at this point.

If we had more time, and if android-libsparse-dev shipped a static
library, we could have considered statically linking in
android-libsparse, android-libbase, and libz --- and see if they would
bloat the mke2fs and debugfs binaries by only a minimal amount.

This would also require making changes to e2fsprogs configure and
Makefiles, since currently we only have support for linking in
libsparse in the AOSP build files.  The reason for this is historical;
at the time when the intern working with Android team was working on
replace Android's make_ext4fs program with mke2fs and e2droid, there
was no distribution that was shipping libsparse, and trying to make
libsparse available to Linux desktop environments was *way* beyond the
scope of the Intern's project and time availability.

We can work on this trying to find a solution post-Buster --- either
using static linking, or *possibly* figuring out a way to optionally
use dlopen() to pull in libsparse for sparse_io.c, much like the way
libss optionally pulls in the readline library using dlopen at
runtime, back when we cared about making mke2fs fit on a two 1.44 MiB
boot/root install floppies.  :-)

Alternatively, you can build your own version of mke2fs using the
libsparse from AOSP.  If you want a solution that might make it in
during the Buster release freeze, that's probably the short-term
solution I would suggest.

So your choice --- we can either reassign this bug back to fastboot or
android-sdk-platforms-tools, or I can downgrade the severity of this
bug for e2fsprogs down to wishlist[1].  Let me know how you want to
handle this.

Cheers,

- Ted

[1] This is because I view this both as a "feature request" and "bugs
that are very difficult to fix due to major design considerations"
(per https://www.debian.org/Bugs/Developer#severities), not to mention
that it's going to affect a miniscule fraction of the e2fsprogs
package's users.

Bug#926293: e2fsprogs: Add method to initialize mount count

2019-04-03 Thread Theodore Ts'o

On Tue, Apr 02, 2019 at 07:43:34PM -0700, Elliott Mitchell wrote:
> Package: e2fsprogs
> Version: 1.43.4-2
> 
> Could `mke2fs` and `tune2fs` get some method to initialize the maximum
> mount count to a non-zero value?
> 
> Having to search for random values to initialize the maximum mount count
> is a bit annoying.  Perhaps a -O init_max_mount_count setting?
> 
> I understand the author's misgivings about all the complaints from people
> annoyed by precautionary checks.  Yet some of us are fine with occasional
> checks.  Meanwhile the time-based checks were just awful.

I suspect the best way forward is to teach tune2fs to be able to
support "tune2fs -c random".  The -O option is only for file system
features that are encoded in feature bitmaps in the ext4 superblock.

However, if you are using LVM, the better way to move forward is to
use e2scrub, which is in e2fsprogs 1.45.0.  That's not going to land
in Debian Buster, since we're in lockdown and the version of e2fsprogs
in Buster is 1.44.5-1.  But realistically, the change that you are
asking for isn't going to be landing any time soon, either.

 - Ted

Bug#926138: e2scrub_reap.service fails in LXC

2019-03-31 Thread Theodore Ts'o

On Sun, Mar 31, 2019 at 10:41:15PM +0200, Martin Pitt wrote:
> Package: e2fsprogs
> Version: 1.45.0-1
> 
> In a standard sid LXC container, this unit fails:
> 
> This is due to `AmbientCapabilities=CAP_SYS_ADMIN CAP_SYS_RAWIO`,
> and containers usually don't (and really should't) have RAWIO. Also,
> this unit seems fairly useless in containers anyway, as these only
> run on already mounted file systems.

Thanks for the bug report!

> Can you please consider adding
> 
>[Unit]
>ConditionVirtualization=!container

Hmm, I'm thinking instead of adding:

[Unit]
ConditionCapability=CAP_SYS_ADMIN
ConditionCapability=CAP_SYS_RAWIO

Could you test this and confirm whether or not this would correctly
cause the unit to be skipped for your lxc containers which don't have
CAP_SYS_RAWIO?

Thanks!

- Ted

Bug#926130: e2fsck returns 1 which makes e2scrub fails

2019-03-31 Thread Theodore Ts'o

On Sun, Mar 31, 2019 at 09:37:56PM +0200, Laurent Bigonville wrote:
> Package: e2fsprogs
> Version: 1.45.0-1
> Severity: important
> 
> Hi,
> 
> For one of my LV, e2scrub consistantly fails.
> 
> Debugging this it seems that e2fsck returns 1 but it's not showing any
> messages about reparing something.
> 
> According to the manpage, 1 means:
> 1- File system errors corrected

Can you send me (or attach to the bug) the output of running "e2scrub
"?

Thanks,

- Ted

Bug#924301: error isn't about space in filesystem

2019-03-18 Thread Theodore Ts'o

On Mon, Mar 18, 2019 at 10:05:13AM -0600, Bdale Garbee wrote:
> "Theodore Ts'o"  writes:
> 
> > If you'd like to apply this patch to /sbin/e2scrub_all and confirm
> > whether or not this fixes things for you, I'd appreciate any feedback.
> 
> Applied the patch, ran the cron entry command line by hand, and got zero
> output.

Thanks for testing!!

- Ted

Bug#924301: error isn't about space in filesystem

2019-03-17 Thread Theodore Ts'o

I plan change /sbin/e2scrub_all as follows:


diff --git a/scrub/e2scrub_all.in b/scrub/e2scrub_all.in
index 8bc868aa0..4cb90a0de 100644
--- a/scrub/e2scrub_all.in
+++ b/scrub/e2scrub_all.in
@@ -21,6 +21,7 @@
 PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 
 scrub_all=0
+snap_size_mb=256
 conffile="@root_sysconfdir@/e2scrub.conf"
 
 test -f "${conffile}" && . "${conffile}"
@@ -108,6 +109,9 @@ ls_scrub_targets() {
eval "${lvm_vars}"
echo "${LVM2_LV_ROLE}" | grep -q "snapshot" && continue
 
+   free_space="$(vgs -o vg_free --units m --noheadings --no-suffix 
"${LVM2_VG_NAME}" 2> /dev/null | sed -e 's/\..*//')"
+   test "${snap_size_mb}" -gt "${free_space}" && continue
+
if [ -n "${MOUNTPOINT}" ]; then
echo "${MOUNTPOINT}"
else

If you'd like to apply this patch to /sbin/e2scrub_all and confirm
whether or not this fixes things for you, I'd appreciate any feedback.

  - Ted

Bug#924301: error isn't about space in filesystem

2019-03-17 Thread Theodore Ts'o

On Sun, Mar 17, 2019 at 08:54:42AM -0600, Bdale Garbee wrote:
> The complaint is not about a lack of space inside the filesystem, the
> complaint is about a lack of unallocated space in the volume group.  I
> suspect a large number of users are now seeing this message from cron,
> as I am, because they have a volume group where all blocks are
> allocated.
> 
> I think either the program itself or the cron invocation needs to be
> updated to just not try and run the program when a volume group has no
> unallocated space...  I really don't want to see this error from cron
> over and over!

Yeah, that's what I'm going to do.  Will be uploading a new version
shortly.  For now, as a workaround, "systemctl disable
e2scrub_all.timer ; systemctl stop e2scrub_all.timer" will stop the
annoyance.

(Or if you want to use this feature, just resize the file system so
it's a bit smaller and also shrink the volume.)

Apologies for the annoyance

   - Ted

1 2 3 4 5 >

1 - 100 of 412 matches

Mail list logo