Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing
On Fri, 03.10.14 07:07, Jan Synacek (jsyna...@redhat.com) wrote: Now I was determined to fix this bug, however I'm left clueless as to how this is actually supposed to work. Is the entire logic in this piece of code wrong, or am I missing something? How is the (re)generating/mounting of machine-id supposed to work? I am pretty sure we *should* fail to boot if /etc/machine-id cannot be initialized because the root dir is read-only. However, we probably should print a good messages in this case, explaining the rationale. Thanks for the explanation! And yes, Cannot open %s isn't that helpful in this case. I'd be delighted to take a patch for this! ;-) Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing
On Mon, 22.09.14 11:27, Jan Synacek (jsyna...@redhat.com) wrote: Hello, I believe that the following code is not correct: src/core/machine-id-setup.c:188-190: mkdir_parents(etc_machine_id, 0755); fd = open(etc_machine_id, O_RDWR|O_CREAT|O_CLOEXEC|O_NOCTTY, 0444); if (fd = 0) writable = true; else { fd = open(etc_machine_id, O_RDONLY|O_CLOEXEC|O_NOCTTY); if (fd 0) { log_error(Cannot open %s: %m, etc_machine_id); return -errno; } writable = false; } src/core/machine-id-setup.c:218-249: r = generate(id, root); if (r 0) return r; if (S_ISREG(st.st_mode) writable) { lseek(fd, 0, SEEK_SET); if (loop_write(fd, id, 33, false) == 33) return 0; } fd = safe_close(fd); /* Hmm, we couldn't write it? So let's write it to * /run/machine-id as a replacement */ RUN_WITH_UMASK(0022) { r = write_string_file(run_machine_id, id); } if (r 0) { log_error(Cannot write %s: %s, run_machine_id, strerror(-r)); unlink(run_machine_id); return r; } /* And now, let's mount it over */ r = mount(run_machine_id, etc_machine_id, NULL, MS_BIND, NULL); if (r 0) { log_error(Failed to mount %s: %m, etc_machine_id); unlink_noerrno(run_machine_id); return -errno; } If /etc/machine-id is missing on the system, the first open() call should probably handle that case. That's actually not true (at least on my system), because the underlying filesystem is read-only at that time. The second open() call fails as well, because there's no /etc/machine-id, resulting in a boot failure. Yes, correct. We only support booting up with either: a) /etc/machine-id existing and populated, possibly read-only b) /etc/machine-id existing and empty, possibly read-only (in this case it is overmounted with a randomized file in /run) c) /etc/machine-id missing, and /etc writable (in which case it is initialized on first boot. We explicitly do not support booting up with the file missing and /etc being read-only. The machine-id is something we wan to guarantee existance of, apps should be able to read it during early boot, and be able to rely on that it won't fail. Now I was determined to fix this bug, however I'm left clueless as to how this is actually supposed to work. Is the entire logic in this piece of code wrong, or am I missing something? How is the (re)generating/mounting of machine-id supposed to work? I am pretty sure we *should* fail to boot if /etc/machine-id cannot be initialized because the root dir is read-only. However, we probably should print a good messages in this case, explaining the rationale. Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing
On Tue, 23.09.14 09:09, Jan Synacek (jsyna...@redhat.com) wrote: Simon McVittie simon.mcvit...@collabora.co.uk writes: On 22/09/14 10:27, Jan Synacek wrote: If /etc/machine-id is missing on the system, the first open() call should probably handle that case. That's actually not true (at least on my system), because the underlying filesystem is read-only at that time. *What* is not true on your system? Are you saying that it is not true that /etc/machine-id is missing? (From context, probably not.) Are you saying that the first open() call doesn't handle ENOENT? (It handles it by trying the second open() call, in the hope that that might work better, because maybe the first one failed with EPERM; trying the second one on ENOENT is useless, but harmless.) Sorry for not being clear. What I wanted to say was that the first open() call with O_CREAT flag obviously fails, because the root filesystem is by default mounted read-only first and remounted read-write later. I'm testing on Fedora rawhide, which I forgot to mention too. Well, the root dir doesn't have to be read-only. For example, when systemd is invoked within a container environment the root dir is usually already writable. Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing
Lennart Poettering lenn...@poettering.net writes: On Mon, 22.09.14 11:27, Jan Synacek (jsyna...@redhat.com) wrote: Hello, I believe that the following code is not correct: src/core/machine-id-setup.c:188-190: mkdir_parents(etc_machine_id, 0755); fd = open(etc_machine_id, O_RDWR|O_CREAT|O_CLOEXEC|O_NOCTTY, 0444); if (fd = 0) writable = true; else { fd = open(etc_machine_id, O_RDONLY|O_CLOEXEC|O_NOCTTY); if (fd 0) { log_error(Cannot open %s: %m, etc_machine_id); return -errno; } writable = false; } src/core/machine-id-setup.c:218-249: r = generate(id, root); if (r 0) return r; if (S_ISREG(st.st_mode) writable) { lseek(fd, 0, SEEK_SET); if (loop_write(fd, id, 33, false) == 33) return 0; } fd = safe_close(fd); /* Hmm, we couldn't write it? So let's write it to * /run/machine-id as a replacement */ RUN_WITH_UMASK(0022) { r = write_string_file(run_machine_id, id); } if (r 0) { log_error(Cannot write %s: %s, run_machine_id, strerror(-r)); unlink(run_machine_id); return r; } /* And now, let's mount it over */ r = mount(run_machine_id, etc_machine_id, NULL, MS_BIND, NULL); if (r 0) { log_error(Failed to mount %s: %m, etc_machine_id); unlink_noerrno(run_machine_id); return -errno; } If /etc/machine-id is missing on the system, the first open() call should probably handle that case. That's actually not true (at least on my system), because the underlying filesystem is read-only at that time. The second open() call fails as well, because there's no /etc/machine-id, resulting in a boot failure. Yes, correct. We only support booting up with either: a) /etc/machine-id existing and populated, possibly read-only b) /etc/machine-id existing and empty, possibly read-only (in this case it is overmounted with a randomized file in /run) c) /etc/machine-id missing, and /etc writable (in which case it is initialized on first boot. We explicitly do not support booting up with the file missing and /etc being read-only. The machine-id is something we wan to guarantee existance of, apps should be able to read it during early boot, and be able to rely on that it won't fail. Now I was determined to fix this bug, however I'm left clueless as to how this is actually supposed to work. Is the entire logic in this piece of code wrong, or am I missing something? How is the (re)generating/mounting of machine-id supposed to work? I am pretty sure we *should* fail to boot if /etc/machine-id cannot be initialized because the root dir is read-only. However, we probably should print a good messages in this case, explaining the rationale. Thanks for the explanation! And yes, Cannot open %s isn't that helpful in this case. Cheers, -- Jan Synacek Software Engineer, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing
Simon McVittie simon.mcvit...@collabora.co.uk writes: On 22/09/14 10:27, Jan Synacek wrote: If /etc/machine-id is missing on the system, the first open() call should probably handle that case. That's actually not true (at least on my system), because the underlying filesystem is read-only at that time. *What* is not true on your system? Are you saying that it is not true that /etc/machine-id is missing? (From context, probably not.) Are you saying that the first open() call doesn't handle ENOENT? (It handles it by trying the second open() call, in the hope that that might work better, because maybe the first one failed with EPERM; trying the second one on ENOENT is useless, but harmless.) Sorry for not being clear. What I wanted to say was that the first open() call with O_CREAT flag obviously fails, because the root filesystem is by default mounted read-only first and remounted read-write later. I'm testing on Fedora rawhide, which I forgot to mention too. If, for some reason, /etc/machine-id is missing, the logic simply doesn't work. I believe systemd shouldn't fail to boot just because it couldn't write or open the id file. I'm not sure how to solve this problem, though. Now I was determined to fix this bug, however I'm left clueless as to how this is actually supposed to work. Is the entire logic in this piece of code wrong, or am I missing something? How is the (re)generating/mounting of machine-id supposed to work? Installation of systemd is meant to create and populate /etc/machine-id (in the dpkg postinst, RPM %post, or whatever is the equivalent on your distribution). If that doesn't happen, systemd does the best it can to rectify the situation. Re-creating it would work, if your initramfs happened to have mounted the root filesystem read/write. If you're deleting /etc/machine-id in order to wipe a machine's identity when cloning the filesystem, then, yes, the generate a temporary machine ID and mount it over the top logic is not going to work. In Debian's live-build tool, I contributed a patch[1] to change the logic from delete /etc/machine-id to truncate /etc/machine-id to 0 bytes, which works better. That might help your situation? It doesn't help my situation, because I simply remove the file by hand and then try to boot the system. However, it brings a question if missing /etc/machine-id *and* the machine is still booting should even be a valid use case. I still believe it should be and it should be handled by systemd, not by external scripts/processes. S [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745840 -- Jan Synacek Software Engineer, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Incorrect logic when /etc/machine-id is missing
Hello, I believe that the following code is not correct: src/core/machine-id-setup.c:188-190: mkdir_parents(etc_machine_id, 0755); fd = open(etc_machine_id, O_RDWR|O_CREAT|O_CLOEXEC|O_NOCTTY, 0444); if (fd = 0) writable = true; else { fd = open(etc_machine_id, O_RDONLY|O_CLOEXEC|O_NOCTTY); if (fd 0) { log_error(Cannot open %s: %m, etc_machine_id); return -errno; } writable = false; } src/core/machine-id-setup.c:218-249: r = generate(id, root); if (r 0) return r; if (S_ISREG(st.st_mode) writable) { lseek(fd, 0, SEEK_SET); if (loop_write(fd, id, 33, false) == 33) return 0; } fd = safe_close(fd); /* Hmm, we couldn't write it? So let's write it to * /run/machine-id as a replacement */ RUN_WITH_UMASK(0022) { r = write_string_file(run_machine_id, id); } if (r 0) { log_error(Cannot write %s: %s, run_machine_id, strerror(-r)); unlink(run_machine_id); return r; } /* And now, let's mount it over */ r = mount(run_machine_id, etc_machine_id, NULL, MS_BIND, NULL); if (r 0) { log_error(Failed to mount %s: %m, etc_machine_id); unlink_noerrno(run_machine_id); return -errno; } If /etc/machine-id is missing on the system, the first open() call should probably handle that case. That's actually not true (at least on my system), because the underlying filesystem is read-only at that time. The second open() call fails as well, because there's no /etc/machine-id, resulting in a boot failure. I tried briding the code so that if the second open() fails, it jumps to the line 218. It then correctly generates the id in /run/machine-id, however the mount() call fails, because it's not possible to bind mount a file to nowhere, i.e. the destination always has to exist. The boot fails as well. Now I was determined to fix this bug, however I'm left clueless as to how this is actually supposed to work. Is the entire logic in this piece of code wrong, or am I missing something? How is the (re)generating/mounting of machine-id supposed to work? Cheers, -- Jan Synacek Software Engineer, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing
On 22/09/14 10:27, Jan Synacek wrote: If /etc/machine-id is missing on the system, the first open() call should probably handle that case. That's actually not true (at least on my system), because the underlying filesystem is read-only at that time. *What* is not true on your system? Are you saying that it is not true that /etc/machine-id is missing? (From context, probably not.) Are you saying that the first open() call doesn't handle ENOENT? (It handles it by trying the second open() call, in the hope that that might work better, because maybe the first one failed with EPERM; trying the second one on ENOENT is useless, but harmless.) Now I was determined to fix this bug, however I'm left clueless as to how this is actually supposed to work. Is the entire logic in this piece of code wrong, or am I missing something? How is the (re)generating/mounting of machine-id supposed to work? Installation of systemd is meant to create and populate /etc/machine-id (in the dpkg postinst, RPM %post, or whatever is the equivalent on your distribution). If that doesn't happen, systemd does the best it can to rectify the situation. Re-creating it would work, if your initramfs happened to have mounted the root filesystem read/write. If you're deleting /etc/machine-id in order to wipe a machine's identity when cloning the filesystem, then, yes, the generate a temporary machine ID and mount it over the top logic is not going to work. In Debian's live-build tool, I contributed a patch[1] to change the logic from delete /etc/machine-id to truncate /etc/machine-id to 0 bytes, which works better. That might help your situation? S [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745840 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel