Bug#775234: /sbin/e2fsck: systemd boot will hang if e2fsck-based logging is enabled

2015-01-13 Thread Cedric Ware

Hello,

Theodore Ts'o (Monday 2015-01-12):
 Arguably this is as mucha bug in systemd-fsck since it could have been
 waiting for fsck to terminate instead of waiting for an EOF from the
 progress_fd pipe, but I agree that it would be better if e2fsck were
 to close the progress_fd.

Thank you.  Indeed, I wasn't sure where to file this bug; I chose e2fsck
because it had suffered from a similar one, and it seemed easier to fix
as you did than to replace systemd-fsck's while(!feof) { fscanf } loop
with something much more complicated...

OTOH, thinking about it, other filesystems' fsck utilities might be
affected too, not just e2fsck.

Do you think this bug should be reassigned to systemd?  Or that other fsck
maintainers should be made aware of the issue?

 This isn't an RC bug by any means (I don't think it even rates
 important) since very few people are using the e2fsck.conf logging
 capabilities --- most debian users pre-systemd have been been using
 /var/log/fsck/*, and systemd will do its own journalctl thing.

I'm not as well-versed as I'd like in the various logging mechanisms, but
given that /var/log/fsck isn't mounted RW at this stage of the boot,
doesn't logging there require e2fsck.conf logging?

Besides, even though few people may be using e2fsck.conf, they may be in for
a nasty surprise when upgrading to Jessie, or systemd users if they try
tinkering with e2fsck.conf; don't you think users should be warned somehow?
Perhaps the postinst script in the Debian package could check for it?

(And this doesn't cover other distributions or filesystems, of course.
Maybe it is really systemd's problem...)

Anyway, thanks for your time, and for the patch.

Best regards,
Cedric Ware.


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#775234: /sbin/e2fsck: systemd boot will hang if e2fsck-based logging is enabled

2015-01-13 Thread Theodore Ts'o
On Tue, Jan 13, 2015 at 10:57:32AM +0100, Cedric Ware wrote:
 
 Thank you.  Indeed, I wasn't sure where to file this bug; I chose e2fsck
 because it had suffered from a similar one, and it seemed easier to fix
 as you did than to replace systemd-fsck's while(!feof) { fscanf } loop
 with something much more complicated...
 
 OTOH, thinking about it, other filesystems' fsck utilities might be
 affected too, not just e2fsck.

Well, the -C option is to date only honored by e2fsck.  In fact
/sbin/fsck will only pass the -C option to fsck.ext[234].
systemd-fsck replaces /sbin/fsck, as part of systemd's we will
reimplement the entire low-level stack, incompatibly mission
statement, but I don't know of any other fsck programs which are using
this feature.

For that matter, I wasn't aware that anyone other than Ubuntu's
graphical boot is using it --- I have no idea why systemd-fsck would
care about the progress bar information, and how or why it might be
using it in the first place.  If I have time I'll have take a look at
the systemd-fsck sources, but that's neither here nor there.

  This isn't an RC bug by any means (I don't think it even rates
  important) since very few people are using the e2fsck.conf logging
  capabilities --- most debian users pre-systemd have been been using
  /var/log/fsck/*, and systemd will do its own journalctl thing.
 
 I'm not as well-versed as I'd like in the various logging mechanisms, but
 given that /var/log/fsck isn't mounted RW at this stage of the boot,
 doesn't logging there require e2fsck.conf logging?

Well, /etc/init.d/chroot{fs,root}.* both use the logsave command,
which is esparate from the e2fsck.conf logging.  I'll be honest; the
original reason why I implemented e2fsck.conf logging was for Google's
internal use, because we wanted separate log files for each file
system that was being checked.  It's not something I've really
advertised, and given that logcheck has existed for years and years,
my expectation was that most of the distributions had beenusing
logcheck and weren't going to be using the e2fsck.conf logging scheme.
If Jessie hadn't gone systemd, I probably would have some point tried
to submit patches to transfer the logging from the init.d scripts and
ssyte logsave to the e2fsck.conf system, but in the mean time, systemd
happened, and systemd has its own way of trapping all stdout
information from various init scripts and system daemons.

(Which leads to another problem, which is some daemons are extremely
chatty because they didn't expect their stdout to be captured, leading
to /var/log/journal bloat and excessive SSD wear, but that's an issue
for another day.  And since we aren't enabling persistent journalctl
logs in Jessie, it's doesn't affect Debian yet.)

In any case, this unfortunate interaction between the -C option and
e2fsck.conf logging is only really going to affect those users who
found the logging scheme described in the e2fsck.conf man page and
decided to use it that would be affected by this bug in Jessie.
Granted, there may be people who didn't know about /var/log/fsck,
and/or haven't figured out how to extract this information from
journalctl, but I expect it to be a relatively small number.

My understanding is that the release team is trying very hard to get
Jessie out the door, and so I don't want to try to push things,
especially since most debian users probably won't have discovered this
particular e2fsck.conf feature.

BTW, logsave has the same bug, but systemd-fsck doesn't use logsave,
and it's harder to fix in logsave because (a) it's supposed to be a
general purpose program, and having it sniff argv[] of its child would
be a rather unfortunate abstraction layer, and (2) it turns out
portably closing all file descriptors which are open is actually quite tricky:

http://stackoverflow.com/questions/1315382/closing-all-open-files-in-a-process
http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor

- Ted


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#775234: /sbin/e2fsck: systemd boot will hang if e2fsck-based logging is enabled

2015-01-13 Thread Cedric Ware

Theodore Ts'o (Tuesday 2015-01-13):
 My understanding is that the release team is trying very hard to get
 Jessie out the door, and so I don't want to try to push things,
 especially since most debian users probably won't have discovered this
 particular e2fsck.conf feature.

I understand.  At least, future versions of e2fsck will have the fix.

 Well, the -C option is to date only honored by e2fsck.  In fact
 /sbin/fsck will only pass the -C option to fsck.ext[234].

Ah, I hadn't noticed.  systemd-fsck does pass the -C option to /sbin/fsck,
for all filesystem types, as long as fsck.fstype exists in the PATH (and
isn't a symlink to /bin/true or the like).

So, if other fsck programs detach a child process, they won't even be able
to close the pipe's fd and will have the problem.  Do you know of any such?

  If I have time I'll have take a look at
 the systemd-fsck sources, but that's neither here nor there.

I have looked; if you think someone may be actually interested in a patch,
I could probably write it while the memory's fresh.  It's not that hard,
just somewhat tedious, as you couldn't use fscanf reliably on the pipe
itself; you'd need a poll() loop that relays the pipe to a memory stream
or something like that.  Do you think it would be worth it?

 For that matter, I wasn't aware that anyone other than Ubuntu's
 graphical boot is using it --- I have no idea why systemd-fsck would
 care about the progress bar information, and how or why it might be
 using it in the first place.

It writes a completion percentage to the console.  As for why it doesn't
just use -C0, I don't know.

All the best,
Cedric.


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#775234: /sbin/e2fsck: systemd boot will hang if e2fsck-based logging is enabled

2015-01-12 Thread Cedric Ware
Package: e2fsprogs
Version: 1.42.12-1
Severity: important
File: /sbin/e2fsck
Tags: upstream


Greetings,

While migrating to Jessie (debootstrap + pick-and-choose from old Wheezy's
/etc), I stumbled on the boot process freezing between mounting / and the
other filesystems.

I eventually traced it to the systemd-fsck-root service hanging if
/etc/e2fsck.conf specifies logging to a yet-unmounted directory.  I think
it is an fd leak similar to bug #682592:

* systemd-fsck forks fsck with argument -Cn, with fd n being one end
  of a pipe which systemd uses to read progress information.

* systemd-fsck attempts to read from pipe until EOF (systemd source file
  src/fsck/fsck.c, function process_progress).

* fsck forks e2fsck (actually fsck.ext4), which needs fd n.

* e2fsck forks/detaches a child to wait until directory mounted/writable,
  without closing fd n.

* e2fsck and fsck terminate, e2fsck child waits for boot to complete,
  systemd-fsck waits for pipe to be closed, deadlock.

Workaround: disable e2fsck logging.

strace logs attached, obtained by copying
/lib/systemd/system/systemd-fsck-root.service to /etc/systemd/system/
and replacing ExecStart=/lib/systemd/systemd-fsck with
ExecStart=strace -ff -o /run/logfile /lib/systemd/systemd-fsck, then
using systemd's debug shell on tty9 to copy /run/logfile.* elsewhere
and kill -1'ing waiting e2fsck processes to complete the boot.

I put the severity as important because it can unexpectedly leave the
system unbootable (not even in single-user mode).

Thanks in advance,
Cedric Ware.


-- System Information:
Debian Release: 8.0
  APT prefers testing
  APT policy: (990, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages e2fsprogs depends on:
ii  e2fslibs1.42.12-1
ii  libblkid1   2.25.2-4
ii  libc6   2.19-13
ii  libcomerr2  1.42.12-1
ii  libss2  1.42.12-1
ii  libuuid12.25.2-4
ii  util-linux  2.25.2-4

e2fsprogs recommends no packages.

Versions of packages e2fsprogs suggests:
pn  e2fsck-static  none
pn  gpart  none
ii  parted 3.2-6

-- no debconf information
execve(/lib/systemd/systemd-fsck, [/lib/systemd/systemd-fsck], [/* 1 var 
*/]) = 0
brk(0)  = 0x7f991e592000
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f991dd4b000
access(/etc/ld.so.preload, R_OK)  = -1 ENOENT (No such file or directory)
open(/etc/ld.so.cache, O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=94739, ...}) = 0
mmap(NULL, 94739, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f991dd33000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/lib/x86_64-linux-gnu/libpthread.so.0, O_RDONLY|O_CLOEXEC) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\20o\0\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=137440, ...}) = 0
mmap(NULL, 2213008, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f991d91
mprotect(0x7f991d928000, 2093056, PROT_NONE) = 0
mmap(0x7f991db27000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f991db27000
mmap(0x7f991db29000, 13456, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f991db29000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/lib/x86_64-linux-gnu/libc.so.6, O_RDONLY|O_CLOEXEC) = 3
read(3, \177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0P\34\2\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1729984, ...}) = 0
mmap(NULL, 3836448, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f991d567000
mprotect(0x7f991d706000, 2097152, PROT_NONE) = 0
mmap(0x7f991d906000, 24576, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19f000) = 0x7f991d906000
mmap(0x7f991d90c000, 14880, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f991d90c000
close(3)= 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f991dd32000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f991dd3
arch_prctl(ARCH_SET_FS, 0x7f991dd30780) = 0
mprotect(0x7f991d906000, 16384, PROT_READ) = 0
mprotect(0x7f991db27000, 4096, PROT_READ) = 0
mprotect(0x7f991df8a000, 12288, PROT_READ) = 0
mprotect(0x7f991dd4d000, 4096, PROT_READ) = 0
munmap(0x7f991dd33000, 94739)   = 0
set_tid_address(0x7f991dd30a50) = 253
set_robust_list(0x7f991dd30a60, 24) = 0
rt_sigaction(SIGRTMIN, {0x7f991d9169f0, [], 

Bug#775234: /sbin/e2fsck: systemd boot will hang if e2fsck-based logging is enabled

2015-01-12 Thread Theodore Ts'o
On Mon, Jan 12, 2015 at 10:00:08PM +0100, Cedric Ware wrote:

 I eventually traced it to the systemd-fsck-root service hanging if
 /etc/e2fsck.conf specifies logging to a yet-unmounted directory.  I think
 it is an fd leak similar to bug #682592:

 * systemd-fsck forks fsck with argument -Cn, with fd n being one end
   of a pipe which systemd uses to read progress information.

 * systemd-fsck attempts to read from pipe until EOF (systemd source file
   src/fsck/fsck.c, function process_progress).

 * fsck forks e2fsck (actually fsck.ext4), which needs fd n.

 * e2fsck forks/detaches a child to wait until directory mounted/writable,
   without closing fd n.

 * e2fsck and fsck terminate, e2fsck child waits for boot to complete,
   systemd-fsck waits for pipe to be closed, deadlock.


Arguably this is as mucha bug in systemd-fsck since it could have been
waiting for fsck to terminate instead of waiting for an EOF from the
progress_fd pipe, but I agree that it would be better if e2fsck were
to close the progress_fd.

This isn't an RC bug by any means (I don't think it even rates
important) since very few people are using the e2fsck.conf logging
capabilities --- most debian users pre-systemd have been been using
/var/log/fsck/*, and systemd will do its own journalctl thing.

So I'll fix this, with a patch like this (see attached), but it's not
something that's worth fixing in Jessie.

  - Ted

commit a4f95ccad41f5befb6768ff58fe26b179237df6a
Author: Theodore Ts'o ty...@mit.edu
Date:   Mon Jan 12 19:42:29 2015 -0500

e2fsck: close the progress_fd in the logfile child process

If e2fsck.conf's logging feature is enabled, and e2fsck is being run
via systemd-fsck, there will be a deadlock since systemd-fsck is
waiting for progress_fd pipe to be closed, instead of waiting for the
fsck process to exit --- and so the logfile child process won't exit
until it can write out the logfile, and systemd won't continue the
boot process so that the file system can be remounted read-write.
Oops.

Addresses-Debian-Bug: #775234

Signed-off-by: Theodore Ts'o ty...@mit.edu

diff --git a/e2fsck/logfile.c b/e2fsck/logfile.c
index c48b8eb..e004f31 100644
--- a/e2fsck/logfile.c
+++ b/e2fsck/logfile.c
@@ -20,6 +20,8 @@
 #include e2fsck.h
 #include pwd.h
 
+extern e2fsck_t e2fsck_global_ctx;   /* Try your very best not to use this! */
+
 struct string {
char*s;
int len;
@@ -233,6 +235,8 @@ static FILE *save_output(const char *s0, const char *s1, 
const char *s2)
}
 
if (pid == 0) {
+   if (e2fsck_global_ctx  e2fsck_global_ctx-progress_fd)
+   close(e2fsck_global_ctx-progress_fd);
if (daemon(0, 0)  0) {
perror(daemon);
exit(1);


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org