Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing

2014-10-08 Thread Lennart Poettering
On Fri, 03.10.14 07:07, Jan Synacek (jsyna...@redhat.com) wrote:

  Now I was determined to fix this bug, however I'm left clueless as to
  how this is actually supposed to work. Is the entire logic in this piece
  of code wrong, or am I missing something? How is the
  (re)generating/mounting of machine-id supposed to work?
 
  I am pretty sure we *should* fail to boot if /etc/machine-id cannot be
  initialized because the root dir is read-only. However, we probably
  should print a good messages in this case, explaining the rationale.
 
 Thanks for the explanation! And yes, Cannot open %s isn't that helpful
 in this case.

I'd be delighted to take a patch for this! ;-)

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing

2014-10-02 Thread Lennart Poettering
On Mon, 22.09.14 11:27, Jan Synacek (jsyna...@redhat.com) wrote:

 Hello,
 
 I believe that the following code is not correct:
 
 src/core/machine-id-setup.c:188-190:
 
 mkdir_parents(etc_machine_id, 0755);
 fd = open(etc_machine_id, O_RDWR|O_CREAT|O_CLOEXEC|O_NOCTTY, 0444);
 if (fd = 0)
 writable = true;
 else {
 fd = open(etc_machine_id, O_RDONLY|O_CLOEXEC|O_NOCTTY);
 if (fd  0) {
 log_error(Cannot open %s: %m, etc_machine_id);
 return -errno;
 }
 
 writable = false;
 }
 
 src/core/machine-id-setup.c:218-249:
 
 r = generate(id, root);
 if (r  0)
 return r;
 
 if (S_ISREG(st.st_mode)  writable) {
 lseek(fd, 0, SEEK_SET);
 
 if (loop_write(fd, id, 33, false) == 33)
 return 0;
 }
 
 fd = safe_close(fd);
 
 /* Hmm, we couldn't write it? So let's write it to
  * /run/machine-id as a replacement */
 
 RUN_WITH_UMASK(0022) {
 r = write_string_file(run_machine_id, id);
 }
 if (r  0) {
 log_error(Cannot write %s: %s, run_machine_id, strerror(-r));
 unlink(run_machine_id);
 return r;
 }
 
 /* And now, let's mount it over */
 r = mount(run_machine_id, etc_machine_id, NULL, MS_BIND, NULL);
 if (r  0) {
 log_error(Failed to mount %s: %m, etc_machine_id);
 unlink_noerrno(run_machine_id);
 return -errno;
 }
 
 If /etc/machine-id is missing on the system, the first open() call
 should probably handle that case. That's actually not true (at least on
 my system), because the underlying filesystem is read-only at that
 time. The second open() call fails as well, because there's no
 /etc/machine-id, resulting in a boot failure.

Yes, correct. We only support booting up with either:

a) /etc/machine-id existing and populated, possibly read-only
b) /etc/machine-id existing and empty, possibly read-only (in this
   case it is overmounted with a randomized file in /run)
c) /etc/machine-id missing, and /etc writable (in which case it is
   initialized on first boot.

We explicitly do not support booting up with the file missing and /etc
being read-only. The machine-id is something we wan to guarantee
existance of, apps should be able to read it during early boot, and be
able to rely on that it won't fail. 

 Now I was determined to fix this bug, however I'm left clueless as to
 how this is actually supposed to work. Is the entire logic in this piece
 of code wrong, or am I missing something? How is the
 (re)generating/mounting of machine-id supposed to work?

I am pretty sure we *should* fail to boot if /etc/machine-id cannot be
initialized because the root dir is read-only. However, we probably
should print a good messages in this case, explaining the rationale.

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing

2014-10-02 Thread Lennart Poettering
On Tue, 23.09.14 09:09, Jan Synacek (jsyna...@redhat.com) wrote:

 Simon McVittie simon.mcvit...@collabora.co.uk writes:
  On 22/09/14 10:27, Jan Synacek wrote:
  If /etc/machine-id is missing on the system, the first open() call
  should probably handle that case. That's actually not true (at least on
  my system), because the underlying filesystem is read-only at that
  time.
 
  *What* is not true on your system?
 
  Are you saying that it is not true that /etc/machine-id is missing?
  (From context, probably not.)
 
  Are you saying that the first open() call doesn't handle ENOENT? (It
  handles it by trying the second open() call, in the hope that that
  might work better, because maybe the first one failed with EPERM; trying
  the second one on ENOENT is useless, but harmless.)
 
 Sorry for not being clear.
 
 What I wanted to say was that the first open() call with O_CREAT flag
 obviously fails, because the root filesystem is by default mounted
 read-only first and remounted read-write later. I'm testing on Fedora
 rawhide, which I forgot to mention too.

Well, the root dir doesn't have to be read-only. For example, when
systemd is invoked within a container environment the root dir is
usually already writable.

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing

2014-10-02 Thread Jan Synacek
Lennart Poettering lenn...@poettering.net writes:
 On Mon, 22.09.14 11:27, Jan Synacek (jsyna...@redhat.com) wrote:

 Hello,
 
 I believe that the following code is not correct:
 
 src/core/machine-id-setup.c:188-190:
 
 mkdir_parents(etc_machine_id, 0755);
 fd = open(etc_machine_id, O_RDWR|O_CREAT|O_CLOEXEC|O_NOCTTY, 0444);
 if (fd = 0)
 writable = true;
 else {
 fd = open(etc_machine_id, O_RDONLY|O_CLOEXEC|O_NOCTTY);
 if (fd  0) {
 log_error(Cannot open %s: %m, etc_machine_id);
 return -errno;
 }
 
 writable = false;
 }
 
 src/core/machine-id-setup.c:218-249:
 
 r = generate(id, root);
 if (r  0)
 return r;
 
 if (S_ISREG(st.st_mode)  writable) {
 lseek(fd, 0, SEEK_SET);
 
 if (loop_write(fd, id, 33, false) == 33)
 return 0;
 }
 
 fd = safe_close(fd);
 
 /* Hmm, we couldn't write it? So let's write it to
  * /run/machine-id as a replacement */
 
 RUN_WITH_UMASK(0022) {
 r = write_string_file(run_machine_id, id);
 }
 if (r  0) {
 log_error(Cannot write %s: %s, run_machine_id, strerror(-r));
 unlink(run_machine_id);
 return r;
 }
 
 /* And now, let's mount it over */
 r = mount(run_machine_id, etc_machine_id, NULL, MS_BIND, NULL);
 if (r  0) {
 log_error(Failed to mount %s: %m, etc_machine_id);
 unlink_noerrno(run_machine_id);
 return -errno;
 }
 
 If /etc/machine-id is missing on the system, the first open() call
 should probably handle that case. That's actually not true (at least on
 my system), because the underlying filesystem is read-only at that
 time. The second open() call fails as well, because there's no
 /etc/machine-id, resulting in a boot failure.

 Yes, correct. We only support booting up with either:

 a) /etc/machine-id existing and populated, possibly read-only
 b) /etc/machine-id existing and empty, possibly read-only (in this
case it is overmounted with a randomized file in /run)
 c) /etc/machine-id missing, and /etc writable (in which case it is
initialized on first boot.

 We explicitly do not support booting up with the file missing and /etc
 being read-only. The machine-id is something we wan to guarantee
 existance of, apps should be able to read it during early boot, and be
 able to rely on that it won't fail. 

 Now I was determined to fix this bug, however I'm left clueless as to
 how this is actually supposed to work. Is the entire logic in this piece
 of code wrong, or am I missing something? How is the
 (re)generating/mounting of machine-id supposed to work?

 I am pretty sure we *should* fail to boot if /etc/machine-id cannot be
 initialized because the root dir is read-only. However, we probably
 should print a good messages in this case, explaining the rationale.

Thanks for the explanation! And yes, Cannot open %s isn't that helpful
in this case.

Cheers,
-- 
Jan Synacek
Software Engineer, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing

2014-09-23 Thread Jan Synacek
Simon McVittie simon.mcvit...@collabora.co.uk writes:
 On 22/09/14 10:27, Jan Synacek wrote:
 If /etc/machine-id is missing on the system, the first open() call
 should probably handle that case. That's actually not true (at least on
 my system), because the underlying filesystem is read-only at that
 time.

 *What* is not true on your system?

 Are you saying that it is not true that /etc/machine-id is missing?
 (From context, probably not.)

 Are you saying that the first open() call doesn't handle ENOENT? (It
 handles it by trying the second open() call, in the hope that that
 might work better, because maybe the first one failed with EPERM; trying
 the second one on ENOENT is useless, but harmless.)

Sorry for not being clear.

What I wanted to say was that the first open() call with O_CREAT flag
obviously fails, because the root filesystem is by default mounted
read-only first and remounted read-write later. I'm testing on Fedora
rawhide, which I forgot to mention too.

If, for some reason, /etc/machine-id is missing, the logic simply
doesn't work. I believe systemd shouldn't fail to boot just because it
couldn't write or open the id file. I'm not sure how to solve this
problem, though.

 Now I was determined to fix this bug, however I'm left clueless as to
 how this is actually supposed to work. Is the entire logic in this piece
 of code wrong, or am I missing something? How is the
 (re)generating/mounting of machine-id supposed to work?

 Installation of systemd is meant to create and populate /etc/machine-id
 (in the dpkg postinst, RPM %post, or whatever is the equivalent on your
 distribution).

 If that doesn't happen, systemd does the best it can to rectify the
 situation. Re-creating it would work, if your initramfs happened to have
 mounted the root filesystem read/write.

 If you're deleting /etc/machine-id in order to wipe a machine's identity
 when cloning the filesystem, then, yes, the generate a temporary
 machine ID and mount it over the top logic is not going to work. In
 Debian's live-build tool, I contributed a patch[1] to change the logic
 from delete /etc/machine-id to truncate /etc/machine-id to 0 bytes,
 which works better. That might help your situation?

It doesn't help my situation, because I simply remove the file by hand
and then try to boot the system. However, it brings a question if
missing /etc/machine-id *and* the machine is still booting should even
be a valid use case. I still believe it should be and it should be
handled by systemd, not by external scripts/processes.

 S

 [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745840

-- 
Jan Synacek
Software Engineer, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Incorrect logic when /etc/machine-id is missing

2014-09-22 Thread Jan Synacek
Hello,

I believe that the following code is not correct:

src/core/machine-id-setup.c:188-190:

mkdir_parents(etc_machine_id, 0755);
fd = open(etc_machine_id, O_RDWR|O_CREAT|O_CLOEXEC|O_NOCTTY, 0444);
if (fd = 0)
writable = true;
else {
fd = open(etc_machine_id, O_RDONLY|O_CLOEXEC|O_NOCTTY);
if (fd  0) {
log_error(Cannot open %s: %m, etc_machine_id);
return -errno;
}

writable = false;
}

src/core/machine-id-setup.c:218-249:

r = generate(id, root);
if (r  0)
return r;

if (S_ISREG(st.st_mode)  writable) {
lseek(fd, 0, SEEK_SET);

if (loop_write(fd, id, 33, false) == 33)
return 0;
}

fd = safe_close(fd);

/* Hmm, we couldn't write it? So let's write it to
 * /run/machine-id as a replacement */

RUN_WITH_UMASK(0022) {
r = write_string_file(run_machine_id, id);
}
if (r  0) {
log_error(Cannot write %s: %s, run_machine_id, strerror(-r));
unlink(run_machine_id);
return r;
}

/* And now, let's mount it over */
r = mount(run_machine_id, etc_machine_id, NULL, MS_BIND, NULL);
if (r  0) {
log_error(Failed to mount %s: %m, etc_machine_id);
unlink_noerrno(run_machine_id);
return -errno;
}

If /etc/machine-id is missing on the system, the first open() call
should probably handle that case. That's actually not true (at least on
my system), because the underlying filesystem is read-only at that
time. The second open() call fails as well, because there's no
/etc/machine-id, resulting in a boot failure.

I tried briding the code so that if the second open() fails, it jumps to
the line 218. It then correctly generates the id in /run/machine-id,
however the mount() call fails, because it's not possible to bind
mount a file to nowhere, i.e. the destination always has to exist. The
boot fails as well.

Now I was determined to fix this bug, however I'm left clueless as to
how this is actually supposed to work. Is the entire logic in this piece
of code wrong, or am I missing something? How is the
(re)generating/mounting of machine-id supposed to work?

Cheers,
-- 
Jan Synacek
Software Engineer, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Incorrect logic when /etc/machine-id is missing

2014-09-22 Thread Simon McVittie
On 22/09/14 10:27, Jan Synacek wrote:
 If /etc/machine-id is missing on the system, the first open() call
 should probably handle that case. That's actually not true (at least on
 my system), because the underlying filesystem is read-only at that
 time.

*What* is not true on your system?

Are you saying that it is not true that /etc/machine-id is missing?
(From context, probably not.)

Are you saying that the first open() call doesn't handle ENOENT? (It
handles it by trying the second open() call, in the hope that that
might work better, because maybe the first one failed with EPERM; trying
the second one on ENOENT is useless, but harmless.)

 Now I was determined to fix this bug, however I'm left clueless as to
 how this is actually supposed to work. Is the entire logic in this piece
 of code wrong, or am I missing something? How is the
 (re)generating/mounting of machine-id supposed to work?

Installation of systemd is meant to create and populate /etc/machine-id
(in the dpkg postinst, RPM %post, or whatever is the equivalent on your
distribution).

If that doesn't happen, systemd does the best it can to rectify the
situation. Re-creating it would work, if your initramfs happened to have
mounted the root filesystem read/write.

If you're deleting /etc/machine-id in order to wipe a machine's identity
when cloning the filesystem, then, yes, the generate a temporary
machine ID and mount it over the top logic is not going to work. In
Debian's live-build tool, I contributed a patch[1] to change the logic
from delete /etc/machine-id to truncate /etc/machine-id to 0 bytes,
which works better. That might help your situation?

S

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745840

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel