Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-18 Thread Simon McVittie
On Sun, 18 Aug 2019 at 13:57:58 +0200, Marc Haber wrote:
> On Tue, 13 Aug 2019 18:30:51 +0100, Simon McVittie 
> >bubblewrap and other container-runners often use this when setting
> >up containers - for example if you have a Flatpak app installed, try
> >something like
> >
> >flatpak run --command=mount org.gnome.Recipes
> >
> >and you'll see that the container's /etc is a mixture of read-only
> >bind-mounts from the host system and read-only bind-mounts from the
> >runtime, some of which are individual files.
> 
> That must be a horrible clutter in mtab though.

There are a lot of lines in /proc/$pid/mounts for $pid inside
the container, yes. However, bubblewrap and other unprivileged
container-runners do not (cannot!) alter the mount table outside the
container (they operate in a private mount namespace owned by a private
user namespace), so /proc/$pid/mounts for $pid outside the container
remains uncluttered.

/etc/mtab is recommended to be a symlink to /proc/self/mounts, so
it reflects the mount table that is active for the process reading
it. In older installations where it was still a regular file, it was
updated by mount(8), so in practice it will reflect the mount table
that is active for the "init namespace" (the namespaces to which pid 1
belongs). bubblewrap uses mount(2) directly, not mount(8), so it will
not alter /etc/mtab if that is a regular file.

In any case, I think avoiding "clutter" in the mount table is quite a
long way down the list of the most important properties of a system.
I would prefer Flatpak to bind-mount in the correct mixture of the
runtime's /etc and the host system's /etc to make the app work correctly,
however much clutter that might result in.

If this clutter offends you, one good way to reduce it is to encourage
packages to work correctly with fewer "boilerplate" files in /etc
(the same category of changes that results in non-sysadmin-modified
D-Bus policy fragments migrating from /etc/dbus-1/system.d to
/usr/share/dbus-1/system.d, for example).

> We should also have a document containing what we want to have in the
> future, such as a comprehensive roadmap.

Sorry, I do not have enough Debian time available to write down a
comprehensive road map for the teams and packages I'm involved with,
never mind the whole project. Any time I spend on writing a road map
advocating good ideas is time that I am not spending on implementing
those good ideas.

For goals confined to a group of closely cooperating packages and
maintainers, the way to achieve changes is to just make them.

For goals with a wider scope, I think the closest tool we have is release
goals. If you want to propose release goals around new technologies in
Debian, please do!

Debian is mostly a do-ocracy - the people who do the work decide what
the work will be - so I don't think anyone really has the authority to
write down a road map for the project as a whole and expect developers to
follow it. The DPL, release team and technical committee are probably the
closest to the positions that could meaningfully define a road map, but
even those cannot assign or require developers to work on its contents.

smcv



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-18 Thread Marc Haber
On Tue, 13 Aug 2019 18:30:51 +0100, Simon McVittie 
wrote:
>On Tue, 13 Aug 2019 at 14:22:31 +0200, Marc Haber wrote:
>> On Tue, 13 Aug 2019 12:01:13 +0100, Simon McVittie 
>> wrote:
>> >(systemd cannot create a mount point that doesn't exist yet on a read-only
>> >file system, which is why a zero-byte file is preferred.
>> 
>> but you can bind-mount over a file? I wasn't aware of that.
>
>Yes, you can bind-mount a directory over another directory, or a
>non-directory non-symlink over another non-directory non-symlink.

I wasn't aware of that. How neat.

>bubblewrap and other container-runners often use this when setting
>up containers - for example if you have a Flatpak app installed, try
>something like
>
>flatpak run --command=mount org.gnome.Recipes
>
>and you'll see that the container's /etc is a mixture of read-only
>bind-mounts from the host system and read-only bind-mounts from the
>runtime, some of which are individual files.

That must be a horrible clutter in mtab though.

>> >> >Maybe /etc/machine-id should be part of the "API" of a Debian system in
>> >> >general (systemd or not)?
>> 
>> So /etc/machine-id should be in Policy?
>
>Probably yes, if that proposal has consensus, although a prerequisite
>for it being in Policy would be to have an implementation of making it
>exist even on systems with neither systemd nor dbus installed (Policy
>is meant to document what's true, not what we hope will become true).

We should also have a document containing what we want to have in the
future, such as a comprehensive roadmap. The absence of leadership in
this regard is probably one of the reasons why we have lost so much
momentum in adopting new technology.

Greetings
Marc
-- 
-- !! No courtesy copies, please !! -
Marc Haber |   " Questions are the | Mail address in header
Mannheim, Germany  | Beginning of Wisdom " | 
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 621 72739834



Re: duplicate popularity-contest ID

2019-08-15 Thread Thorsten Glaser
> Change popularity-contest by transmitting the hostid after it has been
> hashed with the content of /etc/machine-id.

Heh, there is no /etc/machine-id on my Debian system.

I have an …/etc/machine-id in buster and stretch chroots I created
and xenial, bionic and disco pbuilder base.cow directories, and
a /var/lib/dbus/machine-id file on my host system (sid), another sid
chroot, and the stretch chroot.


But depending only on files is the wrong approach anyway.

I published images for Debian/m68k back when I was still
active in the revival business. I *had* preinstalled popcon
but removed the hostid and included running dpkg-reconfigure
in the first-boot-setup instructions. (Maybe few did, but I
reset the package state, so the first apt call would do.)

The thing is, unless you create clones from a running system,
and never distribute an image for others to use, you cannot
just recreate them automatically on first start (otherwise
they’ll likely turn out the same); you need some user input,
outside-world date/time, hardware IDs (MAC addresses are a
start, full dmesg is better) and random bits.

I envision a thing that, while booting a cloned VM/image,
interactively asks the user for some keystrokes and offers
to download random data from a couple of known systems (I
know of four or five and have my own as well) or a manual
URL or by ssh’ing out (ssh -l user host dd if=/dev/urandom
count=4 | dd of=/dev/urandom). — This can only work if the
image is first-booted interactively, though. (It then has
to recreate SSH host keys, popcon ID, and apparently a
machine-id file I know nothing about… plus there are things
you cannot regenerate like filesystem-embedded NFS generation
numbers, crypto and LV names… you might wish to change the
hostname and perhaps the ext2fs label, but to do the same for
swap it needs to be recreated, and ext2fs retains the UUID,
etc.)
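Whatever a first-boot tool collects, the inputs Thorsten lists can be combined with the kernel's own randomness through a cryptographic hash to produce a new 128-bit ID. This is a minimal sketch, and `fresh_machine_id` is a hypothetical helper. Feeding the same bytes into the kernel pool, as the ssh/dd pipeline above does, is a separate step that the sketch does not attempt.

```python
import hashlib
import os
import time


def fresh_machine_id(*extra_inputs: bytes) -> str:
    """Derive a new machine-id(5)-style value (32 lowercase hex digits).

    The kernel CSPRNG provides the bulk of the randomness. The extra inputs
    (keystroke timings, MAC addresses, dmesg output, data fetched from a
    remote host) are mixed in, so clones whose kernel pool starts in the
    same state still end up with different IDs.
    """
    h = hashlib.sha256()

    def feed(chunk: bytes) -> None:
        # Length-prefix each input so that ("ab", "c") and ("a", "bc") differ.
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(chunk)

    feed(os.urandom(32))
    feed(time.time_ns().to_bytes(16, "big"))
    for chunk in extra_inputs:
        feed(chunk)
    return h.hexdigest()[:32]  # 128 bits, like /etc/machine-id
```

For example, fresh_machine_id(mac_address_bytes, dmesg_bytes) with whatever the first-boot dialogue gathered. Both argument names here are placeholders.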

Cloning systems is doomed to fail ☹

For my m68k case, it might have been useful to use it as a bootstrap
image (write it into your swap partition, boot it from TOS,
create new ext4 and debootstrap there, reboot that, mkswap over
the temp image). Others prefer solutions like ANSI-BEL, Puppet,
preseeding d-i, shell scripts, …

bye,
//mirabilos
-- 
 Beware of ritual lest you forget the meaning behind it.
 yeah but it means if you really care about something, don't
ritualise it, or you will lose it. don't fetishise it, don't
obsess. or you'll forget why you love it in the first place.



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-15 Thread Simon McVittie
On Thu, 15 Aug 2019 at 09:54:44 +0100, Ian Jackson wrote:
> Do we have a list of all the things this is (or might be) used for ?

As I said, I don't think a comprehensive list is feasible without
resorting to something like codesearch, because it's of similar scope to
a list of reasons to use the hostname (as in gethostname(2)), and indeed
some current uses of the hostname would probably be better to use the
machine ID (if that's possible to do without breaking compatibility,
which in some cases it won't be).

I believe the original reasoning for the D-Bus machine ID (from which
systemd took the idea) went something like this:

- I need to do $something per-machine
- but what if the user decides they don't like their current hostname
  (which after all is user-facing) and changes it to one that is more
  aesthetically appealing? then I won't be able to find the state that I
  previously stored in the home directory, X11 display or whatever is the
  shared resource
- conversely, what if the hostname is not unique among the machines sharing
  the X11 display/home directory/other shared resource, because they
  all think they are called "debian" or "localhost" or similar? then
  they will overwrite each other's state

so "like the hostname, but opaque" is quite a good mental model for it.

The second point (hostname sometimes changing) was exacerbated by the
tendency for some distros to pass a hostname received from DHCP to
sethostname(), although I think (I hope) everyone has now decided that
was a bad idea and stopped doing so.

The dbus commit that switched from hostnames to machine IDs in the X11
autolaunch protocol appears to have been in 2006, which was before
my first commit to dbus - I'm making an informed guess about former
maintainers' motivations here, not stating why I made a decision.
I cannot, now, change how dbus uses machine IDs without breaking backwards
compatibility.

Another precursor of the machine ID is gethostid(), but that's only
32 bits long, which is clearly not enough to be "unique among all UNIX
systems in existence" as its man page claims.
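The size matters because of the birthday bound. With N possible values, collisions become likely after roughly sqrt(N) machines, not N. The short calculation below assumes uniformly random IDs, which is the best case for gethostid().

```python
import math


def collision_probability(machines: int, id_bits: int) -> float:
    """Approximate P(at least two of `machines` random IDs are equal).

    Uses the birthday approximation 1 - exp(-n(n-1) / 2N) with N = 2**id_bits.
    """
    space = 2.0 ** id_bits
    x = machines * (machines - 1) / (2.0 * space)
    return -math.expm1(-x)  # 1 - exp(-x) without losing precision for tiny x


for bits, machines in ((32, 65_536), (128, 10**9)):
    print(f"{bits}-bit IDs, {machines} machines: {collision_probability(machines, bits):.3g}")
```

With 32-bit IDs, 65,536 machines already give a collision probability of roughly 0.39. A billion machines with 128-bit IDs give something on the order of 1e-21.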

> I wonder if we should in Debian have a "sticky door" policy on the use
> of machine-id, like we do for virtual packages: "please come to -devel
> for a peer review".

I can see the reasoning for suggesting this, but we don't ask for peer
review for uses of other identifying properties like the hostname, MAC
address, chassis serial number, etc., and I don't think it's realistic
to expect the authors of upstream software to come to Debian seeking
permission to use an OS interface. (In some cases the machine ID forms
part of an upstream API or an interface between stored data and the
software, from which Debian maintainers cannot diverge without breaking
compatibility, so the upstream design is relevant here.)

Also, this suggestion appears to be closing (or at least applying glue
to) the door of a stable that has not contained horses for more than a
decade. The concept of a machine UUID is far from new or innovative at
this point.

systemd's machine-id(5) now recommends using an HMAC of the machine
ID with an application-specific key (reference implementation:
sd_id128_get_machine_app_specific()), which is not something that the
D-Bus maintainers thought of back in 2006, but with hindsight is an
improvement over just having one form of the machine ID; but again,
not everything can do this without it being an incompatible change.
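The app-specific derivation can be sketched as follows. The details are based on my reading of systemd's implementation: HMAC-SHA256 keyed with the machine ID over a constant per-application ID, truncated to 128 bits and stamped as a version 4 UUID. Byte-for-byte compatibility with sd_id128_get_machine_app_specific() has not been verified, and `app_specific_id` is a hypothetical name.

```python
import hashlib
import hmac


def app_specific_id(machine_id_hex: str, app_id_hex: str) -> str:
    """Derive a per-application ID from the machine ID without exposing it."""
    machine = bytes.fromhex(machine_id_hex.strip())
    app = bytes.fromhex(app_id_hex.strip())
    if len(machine) != 16 or len(app) != 16:
        raise ValueError("both IDs must be 128-bit values written as 32 hex digits")
    # HMAC-SHA256 keyed with the machine ID, over the application's constant ID,
    # truncated to 128 bits.
    derived = bytearray(hmac.new(machine, app, hashlib.sha256).digest()[:16])
    # Stamp the result as a version 4, RFC 4122 variant UUID.
    derived[6] = (derived[6] & 0x0F) | 0x40
    derived[8] = (derived[8] & 0x3F) | 0x80
    return derived.hex()
```

Each application would ship one fixed, randomly chosen app ID. Two applications on the same machine then see unrelated values, and neither value reveals the raw /etc/machine-id.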

(If machine-id(5) already existed, but D-Bus didn't, and I was designing
D-Bus now, then I'd probably be using sd_id128_get_machine_app_specific()
or equivalent for the "is this peer on the same machine as me?" APIs
in the D-Bus specification; but that isn't the order in which things
happened.)

smcv



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-15 Thread Ian Jackson
Simon McVittie writes ("Re: Generating new IDs for cloning (was Re: duplicate 
popularity-contest ID)"):
> Somehow describing which containers and chroots should have a machine ID,
> which ones should share the host's machine ID and which ones don't need
> either is a gap in my proposal.

Do we have a list of all the things this is (or might be) used for ?
I think that would help us think about these kind of questions, and
also decide how important it was to improve this area.  (Please
forgive me if someone already mentioned such a list which I overlooked
in this thread.)

We need to think about the privacy implications.  Certainly the
machine id should not normally go into network protocols.

Can we think of other uses of the machine id that might seem like a
good idea to some upstreams, but of which we would disapprove (whether
for technical or ethical reasons) ?

I wonder if we should in Debian have a "sticky door" policy on the use
of machine-id, like we do for virtual packages: "please come to -devel
for a peer review".  We could spot these issues with a lintian check
maybe.

Sorry that these considerations probably seem rather negative.  It's
in my nature to try to spot the downsides and problems with things;
please take this as an attempt at constructive foresight.

If we decide this machine-id is a good thing that should be in X and Y
and Z (containers, chroots, "pets" running sysvinit, or whatever
combination, etc.) then we could implement that in our own
arrangements even if upstream haven't taken those patches yet.

Regards,
Ian.

-- 
Ian Jackson    These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-14 Thread Simon McVittie
On Tue, 13 Aug 2019 at 22:01:34 -0400, Theodore Y. Ts'o wrote:
> That's just a matter of having sysvinit (and other non-systemd init
> systems) have an init script which runs as soon as the root file
> system is remounted read/write to initialize /etc/machine-id if it
> doesn't exist or if it is a zero-length file, right?

Yes-ish, although it isn't *necessarily* an init system responsibility.
Somehow describing which containers and chroots should have a machine ID,
which ones should share the host's machine ID and which ones don't need
either is a gap in my proposal.

init is no longer Essential, so Debian chroots and containers will often
have neither systemd nor sysvinit (or any of the other alternatives),
but perhaps they should have a machine-id anyway - or perhaps container
managers that don't run a full init system, like schroot, should be
responsible for that? Or perhaps this requirement isn't necessary
for containers that don't provide either system services or user
logins? (The elephant in the room here is that Docker doesn't arrange to
have a machine-id, and also doesn't set the $container_uuid proposed in
.)

systemd-nspawn already sets up a machine ID for its containers, and lxc
(presumably also lxd) normally runs init, but schroot and Docker don't
normally run init and also don't take any particular steps to have a
machine ID.

Flatpak copies the machine ID from the host system into its containers,
and I would assume that other frameworks with "app containers" that are
conceptually part of the host machine rather than their own machine,
like Snap and AppImage, probably do the same.

An implementation of this should copy the dbus machine ID if it exists
(if the dbus machine ID differs from machine-id(5) then for historical
reasons various libraries will disagree on which is more important)
and the other subtleties described in systemd-machine-id-setup(1) are
probably also a good idea. On Linux systems dbus-uuidgen is not required,
because `tr -d - < /proc/sys/kernel/random/uuid` is suitable. I'm sure
kFreeBSD and Hurd have an equivalent, but I don't know what it is.
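Simon's outline, together with Ted's init-script idea, can be sketched as a small helper that an init script, container manager or postinst could call. This is hypothetical, not an existing Debian tool. It keeps a valid /etc/machine-id if there is one. Otherwise it copies the D-Bus machine ID, and failing that it generates a fresh ID the way the tr one-liner above does. The extra subtleties of systemd-machine-id-setup(1), such as container and DMI UUIDs, are not handled.

```python
import os
import re
import tempfile
import uuid
from typing import Optional

_MACHINE_ID = re.compile(r"[0-9a-f]{32}")


def read_valid_id(path: str) -> Optional[str]:
    """Return the ID stored at `path`, or None if missing, empty or malformed."""
    try:
        with open(path, encoding="ascii") as f:
            value = f.read().strip()
    except (OSError, UnicodeDecodeError):
        return None
    if _MACHINE_ID.fullmatch(value) and value != "0" * 32:
        return value
    return None


def new_random_id() -> str:
    """Equivalent of `tr -d - < /proc/sys/kernel/random/uuid`, with a fallback."""
    try:
        with open("/proc/sys/kernel/random/uuid", encoding="ascii") as f:
            return f.read().strip().replace("-", "")
    except OSError:
        return uuid.uuid4().hex  # e.g. where that Linux-specific file does not exist


def ensure_machine_id(etc_path: str = "/etc/machine-id",
                      dbus_path: str = "/var/lib/dbus/machine-id") -> str:
    """Make sure `etc_path` holds a valid ID, preferring the existing D-Bus one."""
    current = read_valid_id(etc_path)
    if current:
        return current
    chosen = read_valid_id(dbus_path) or new_random_id()
    # Write a temporary file next to the target, then rename it into place,
    # so that a crash never leaves a half-written machine-id behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(etc_path) or ".")
    with os.fdopen(fd, "w", encoding="ascii") as f:
        f.write(chosen + "\n")
    os.chmod(tmp, 0o444)
    os.replace(tmp, etc_path)
    return chosen
```

A zero-length /etc/machine-id counts as "needs an ID", just like a missing one.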

smcv



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-13 Thread Theodore Y. Ts'o
On Tue, Aug 13, 2019 at 06:30:51PM +0100, Simon McVittie wrote:
> > >> >Maybe /etc/machine-id should be part of the "API" of a Debian system in
> > >> >general (systemd or not)?
> > 
> > So /etc/machine-id should be in Policy?
> 
> Probably yes, if that proposal has consensus, although a prerequisite
> for it being in Policy would be to have an implementation of making it
> exist even on systems with neither systemd nor dbus installed (Policy
> is meant to document what's true, not what we hope will become true).

That's just a matter of having sysvinit (and other non-systemd init
systems) have an init script which runs as soon as the root file
system is remounted read/write to initialize /etc/machine-id if it
doesn't exist or if it is a zero-length file, right?

  - Ted



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-13 Thread Simon McVittie
On Tue, 13 Aug 2019 at 14:22:31 +0200, Marc Haber wrote:
> On Tue, 13 Aug 2019 12:01:13 +0100, Simon McVittie 
> wrote:
> >(systemd cannot create a mount point that doesn't exist yet on a read-only
> >file system, which is why a zero-byte file is preferred.
> 
> but you can bind-mount over a file? I wasn't aware of that.

Yes, you can bind-mount a directory over another directory, or a
non-directory non-symlink over another non-directory non-symlink.
(Symlinks get dereferenced before they're used as the source or
destination of a bind-mount.)
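The type rule can be written as a pre-flight check before attempting a bind mount. This is a minimal sketch, and `bind_mount_compatible` is a hypothetical helper. It checks only file types, not privileges and not whether the target is on a read-only file system. Because os.stat() follows symlinks, it mirrors the dereferencing described above.

```python
import os
import stat


def bind_mount_compatible(source: str, target: str) -> bool:
    """True if `source` may be bind-mounted over `target` as far as types go.

    Directory over directory and non-directory over non-directory are
    accepted. Mixing the two is not. Symlinks on either side are followed.
    """
    src_is_dir = stat.S_ISDIR(os.stat(source).st_mode)
    dst_is_dir = stat.S_ISDIR(os.stat(target).st_mode)
    return src_is_dir == dst_is_dir
```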

bubblewrap and other container-runners often use this when setting
up containers - for example if you have a Flatpak app installed, try
something like

flatpak run --command=mount org.gnome.Recipes

and you'll see that the container's /etc is a mixture of read-only
bind-mounts from the host system and read-only bind-mounts from the
runtime, some of which are individual files.

> >> >Maybe /etc/machine-id should be part of the "API" of a Debian system in
> >> >general (systemd or not)?
> 
> So /etc/machine-id should be in Policy?

Probably yes, if that proposal has consensus, although a prerequisite
for it being in Policy would be to have an implementation of making it
exist even on systems with neither systemd nor dbus installed (Policy
is meant to document what's true, not what we hope will become true).

smcv



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-13 Thread Marc Haber
On Tue, 13 Aug 2019 12:01:13 +0100, Simon McVittie 
wrote:
>(systemd cannot create a mount point that doesn't exist yet on a read-only
>file system, which is why a zero-byte file is preferred.

but you can bind-mount over a file? I wasn't aware of that.

>If you use systemd as init, install dbus, delete or empty /etc/machine-id,
>delete /var/lib/dbus/machine-id and reboot, then systemd will recreate
>/etc/machine-id very early in the boot process. Less early but still early
>in the boot process (before units with DefaultDependencies=yes, analogous
>to rcS in sysvinit), systemd-tmpfiles will make /var/lib/dbus/machine-id
>a symlink as directed by /usr/lib/tmpfiles.d/dbus.conf. By the time
>ordinary system services start, it is already a symlink. Might this be
>what happened on your test systems?

Probably, yes.

>> >Maybe /etc/machine-id should be part of the "API" of a Debian system in
>> >general (systemd or not)?
>> 
>> please elaborate on that.
>
>There are some properties that we guarantee every Debian system will have,
>even though they cannot be guaranteed on every GNU or Linux system. For
>example, Policy guarantees that /var/run is always a symlink to /run on
>Debian systems (even though they are distinct directories on e.g.
>Slackware[1], and as a result upstream projects like dbus can't rely on
>/var/run being equivalent to /run). Similarly, we guarantee that all
>Debian systems will have the base-passwd users and groups, with their
>canonical numeric values.

So /etc/machine-id should be in Policy?

Greetings
Marc
-- 
-- !! No courtesy copies, please !! -
Marc Haber |   " Questions are the | Mailadresse im Header
Mannheim, Germany  | Beginning of Wisdom " | 
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 621 72739834



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-13 Thread Sam Hartman
> "Simon" == Simon McVittie  writes:

Simon> If you use systemd as init, install dbus, delete or empty
Simon> /etc/machine-id, delete /var/lib/dbus/machine-id and reboot,

It is my experience that deleting /etc/machine-id doesn't actually work
(even if I delete the dbus machine id too).
It may well be that a zero-byte file works; I didn't know to try that.
But my experience matches that of someone earlier on the list: deleting
machine-id does not actually work.



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-13 Thread Simon McVittie
On Tue, 13 Aug 2019 at 11:50:27 +0200, Marc Haber wrote:
> On Thu, 8 Aug 2019 22:44:19 +0100, Simon McVittie 
> wrote:
> >Making /etc/machine-id a 0-byte file is considered to be the canonical
> >way to clear it, rather than actually deleting it, because if systemd is
> >running on a completely read-only root filesystem, it has code to create
> >a machine ID on a tmpfs and bind-mount it over the top of the empty file.
> 
> And what will systemd do when it encounters a zero-sized
> /etc/machine-id on a writable filesystem?

Replace it with a new machine ID, the same as if it didn't exist at all:

   After the machine ID is established, systemd(1) will attempt to save it
   to /etc/machine-id. If this fails, it will attempt to bind-mount a
   temporary file over /etc/machine-id. It is an error if the file system
   is read-only and does not contain a (possibly empty) /etc/machine-id
   file.
   — machine-id(5)

(systemd cannot create a mount point that doesn't exist yet on a read-only
file system, which is why a zero-byte file is preferred.)
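As a decision table, the man page's rules for saving an ID that systemd has just established show why an empty file is the canonical "cleared" state. This is a toy model for illustration only. The names are invented, and it ignores every other failure mode.

```python
from enum import Enum


class Commit(Enum):
    WRITTEN = "ID saved to /etc/machine-id"
    BIND_MOUNTED = "temporary file bind-mounted over /etc/machine-id"
    FAILED = "error: read-only and no file to mount over"


def commit_machine_id(root_writable: bool, file_exists: bool) -> Commit:
    """Toy model of the rules quoted from machine-id(5) above."""
    if root_writable:
        # Missing or empty: systemd can write the new ID in place.
        return Commit.WRITTEN
    if file_exists:
        # Even a zero-byte file is enough to serve as a bind-mount target.
        return Commit.BIND_MOUNTED
    # A mount point cannot be created on a read-only file system.
    return Commit.FAILED
```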

> >If you are doing cloning, stateless systems or similar activities,
> >and you know you will have a valid /etc/machine-id (you either use
> >systemd or have taken other steps to have one), then you can make
> >/var/lib/dbus/machine-id a symlink to /etc/machine-id (dbus comes with a
> >systemd-tmpfiles file to do this). This is not done by default in Debian,
> >or by `dbus-uuidgen --ensure`, for historical reasons
> 
> Interesting, I see this on a number of my test systems without having
> been active in this regard myself.

If you use systemd as init, install dbus, delete or empty /etc/machine-id,
delete /var/lib/dbus/machine-id and reboot, then systemd will recreate
/etc/machine-id very early in the boot process. Less early but still early
in the boot process (before units with DefaultDependencies=yes, analogous
to rcS in sysvinit), systemd-tmpfiles will make /var/lib/dbus/machine-id
a symlink as directed by /usr/lib/tmpfiles.d/dbus.conf. By the time
ordinary system services start, it is already a symlink. Might this be
what happened on your test systems?

I'm fairly sure that dbus-uuidgen (which is run in dbus.postinst,
and from /etc/init.d/dbus on non-systemd systems) always makes
/var/lib/dbus/machine-id a regular file rather than a symlink, but if
you already had dbus installed before you reset the machine ID, and you
did not subsequently boot with a non-systemd init, then dbus-uuidgen
wouldn't have been run. If /var/lib/dbus/machine-id is already a symlink
to a file with contents in the right format, dbus-uuidgen won't replace it.
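When auditing clones, it helps to know which of these situations a machine is in. The hypothetical helper below is not part of dbus or systemd. It distinguishes three cases: the D-Bus file is a symlink to /etc/machine-id, it is a regular file holding the same ID, or it is a regular file holding a different ID. The last case is the one where programs reading the two files in opposite orders disagree.

```python
import os
from typing import Optional


def _read(path: str) -> Optional[str]:
    try:
        with open(path, encoding="ascii") as f:
            return f.read().strip() or None
    except (OSError, UnicodeDecodeError):
        return None


def describe_machine_ids(etc_path: str = "/etc/machine-id",
                         dbus_path: str = "/var/lib/dbus/machine-id") -> str:
    """Classify how the D-Bus machine ID relates to machine-id(5).

    Returns one of "symlink", "symlink-elsewhere", "missing", "same", "different".
    """
    if os.path.islink(dbus_path):
        if os.path.realpath(dbus_path) == os.path.realpath(etc_path):
            return "symlink"
        return "symlink-elsewhere"
    etc_id, dbus_id = _read(etc_path), _read(dbus_path)
    if etc_id is None or dbus_id is None:
        return "missing"
    return "same" if etc_id == dbus_id else "different"
```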

> >Maybe /etc/machine-id should be part of the "API" of a Debian system in
> >general (systemd or not)?
> 
> please elaborate on that.

There are some properties that we guarantee every Debian system will have,
even though they cannot be guaranteed on every GNU or Linux system. For
example, Policy guarantees that /var/run is always a symlink to /run on
Debian systems (even though they are distinct directories on e.g.
Slackware[1], and as a result upstream projects like dbus can't rely on
/var/run being equivalent to /run). Similarly, we guarantee that all
Debian systems will have the base-passwd users and groups, with their
canonical numeric values.

Some of those properties either originated in systemd and were adopted in
Debian for even non-systemd systems (for example /usr/lib/os-release in
base-files, which originated in systemd as a replacement for lsb_release
and distro-specific facilities like /etc/debian_version), or originated in
Debian or some other distro and have been adopted by systemd as one of its
"API" guarantees on all distros that use it (for example systemd picked
up Debian's /etc/hostname, /etc/timezone and /etc/sysctl.d). In both
cases this is done on the basis that regardless of origin, they are good
ideas that should be spread further, and giving application and library
maintainers more "API" guarantees from the OS makes their jobs easier.

What I'm suggesting is that maybe a systemd-style /etc/machine-id should
be one of those properties that we guarantee, instead of relying on dbus
(which is an IPC system that has very little to do with uniquely
identifying a machine) to provide the closest thing that is guaranteed on
non-systemd-booted machines.

Because systemd does not delete /etc/machine-id even when purged (that
would be counterproductive for its intended purpose, and would break
stored state that refers to it), it is present on all systems that either
boot with systemd or have switched from systemd to sysvinit. The only
Debian systems that will not already have /etc/machine-id are those that
were installed either before systemd became the default (wheezy or older)
or as a minimal debootstrap with no init system at all (jessie or newer),
and that, even if they have since been upgraded to newer suites, have
never run systemd.postinst or booted with systemd.

smcv

[1] I think this is an 

Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-13 Thread Marc Haber
On Thu, 8 Aug 2019 22:44:19 +0100, Simon McVittie 
wrote:
>Making /etc/machine-id a 0-byte file is considered to be the canonical
>way to clear it, rather than actually deleting it, because if systemd is
>running on a completely read-only root filesystem, it has code to create
>a machine ID on a tmpfs and bind-mount it over the top of the empty file.

And what will systemd do when it encounters a zero-sized
/etc/machine-id on a writable filesystem?

>If you are doing cloning, stateless systems or similar activities,
>and you know you will have a valid /etc/machine-id (you either use
>systemd or have taken other steps to have one), then you can make
>/var/lib/dbus/machine-id a symlink to /etc/machine-id (dbus comes with a
>systemd-tmpfiles file to do this). This is not done by default in Debian,
>or by `dbus-uuidgen --ensure`, for historical reasons; maybe it should be,
>but to be confident that it was a correct change I'd have to think about
>the ways in which it might go wrong on non-systemd systems (with either
>a non-systemd init like sysvinit, or no init at all like minimal chroots).

Interesting, I see this on a number of my test systems without having
been active in this regard myself.

>Maybe /etc/machine-id should be part of the "API" of a Debian system in
>general (systemd or not)?

please elaborate on that.

Greetings
Marc
-- 
-- !! No courtesy copies, please !! -
Marc Haber |   " Questions are the | Mailadresse im Header
Mannheim, Germany  | Beginning of Wisdom " | 
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 621 72739834



Re: duplicate popularity-contest ID

2019-08-08 Thread Bill Allombert
On Tue, Aug 06, 2019 at 02:01:16PM -0400, Sam Hartman wrote:
> > "Bill" == Bill Allombert  writes:
> 
> Bill> This is potentially an excellent idea!
> 
> Bill> Does not /etc/machine-id suffer of exactly the same issue as
> Bill> /etc/popularity-contest.conf ?
> 
> A lot more procedures for cloning images know that they need to generate
> new /etc/machine-ids.
> 
> It's one of those things you tend to realize fairly quickly that you
> need to fix up in cloned images.
> Interactions with machined, systemd journal, and a few other things tend
> to make it obvious.

However, there are some points in "man machine-id" which are rather
problematic for popularity-contest:

  Optionally, for stateless systems, it is generated during runtime at
  early boot if it is found to be empty.

Unless such a system has very high uptime, it is much preferable for
stateless systems with identical images to report with the same popcon
ID rather than generate identical ghost submissions on each boot
(each of which stays active for 20 days).

  The machine-id may also be set, for example when network
  booting, by setting the systemd.machine_id= kernel command line
  parameter or passing the option --machine-id= to systemd. A machine-id
  may not be set to all zeros.

It seems machine-id is too closely tied to the boot process to be usable by popcon.

   When a machine is booted with systemd(1) the ID of the machine will
   be established. If systemd.machine_id= or --machine-id= options (see
   first section) are specified, this value will be used.  Otherwise, the
   value in /etc/machine-id will be used. If this file is empty or missing,
   systemd will attempt to use the D-Bus machine ID from
   /var/lib/dbus/machine-id, the value of the kernel command line option
   container_uuid, the KVM DMI product_uuid (on KVM systems), and finally a
   randomly generated UUID.

Again this would generate ghost submissions.
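The precedence in the man-page excerpt Bill quotes can be written out as a small function. It also shows where the ghost submissions come from. A stateless image with an empty /etc/machine-id and none of the other sources falls through to a fresh random ID on every boot. This is a hedged model for discussion, not systemd's code. The parameter names are invented, and treating a malformed value like an empty one is a simplification.

```python
import re
import uuid
from typing import Optional

_HEX32 = re.compile(r"[0-9a-f]{32}")


def _normalise(candidate: Optional[str]) -> Optional[str]:
    """Accept plain or dashed UUID spellings; reject empty and all-zero values."""
    if not candidate:
        return None
    value = candidate.strip().lower().replace("-", "")
    if _HEX32.fullmatch(value) and value != "0" * 32:
        return value
    return None


def establish_machine_id(cmdline_id: Optional[str] = None,
                         etc_machine_id: Optional[str] = None,
                         dbus_machine_id: Optional[str] = None,
                         container_uuid: Optional[str] = None,
                         kvm_product_uuid: Optional[str] = None) -> str:
    """Return the first usable ID, in the order given by machine-id(5)."""
    for candidate in (cmdline_id, etc_machine_id, dbus_machine_id,
                      container_uuid, kvm_product_uuid):
        value = _normalise(candidate)
        if value:
            return value
    # Nothing usable: a new random ID, i.e. a new identity on every boot.
    return uuid.uuid4().hex
```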

Cheers,
-- 
Bill. 

Imagine a large red swirl here. 



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-08 Thread Simon McVittie
On Thu, 08 Aug 2019 at 13:39:54 +0200, Marc Haber wrote:
> Generating a new machine-id doesn't seem as easy as generating a new
> ssh key: Removing /etc/machine-id doesn't do it as
> systemd-machine-id-setup seems to pull the machine-id from dbus.

For historical reasons (dbus originated the concept and
systemd generalized it into a non-D-Bus-specific "API"), each of
systemd-machine-id-setup and dbus-uuidgen tries to copy the other's
machine ID, to avoid problems where processes that read the two files
in opposite orders disagree on what the machine's unique ID is.
If you delete or empty /etc/machine-id you should also delete
/var/lib/dbus/machine-id.

Making /etc/machine-id a 0-byte file is considered to be the canonical
way to clear it, rather than actually deleting it, because if systemd is
running on a completely read-only root filesystem, it has code to create
a machine ID on a tmpfs and bind-mount it over the top of the empty file.
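Simon's advice can be turned into a short image-preparation step. This is a minimal sketch. It assumes the image's root file system is mounted at `root` and that the script runs as root, since the file is typically not writable by ordinary users. `reset_machine_id` is a hypothetical name, and other per-machine state such as SSH host keys is deliberately out of scope.

```python
import os


def reset_machine_id(root: str = "/") -> None:
    """Clear the machine ID of an image mounted at `root` before cloning it."""
    etc_id = os.path.join(root, "etc", "machine-id")
    dbus_id = os.path.join(root, "var", "lib", "dbus", "machine-id")
    # Truncate (or create) rather than delete, so a read-only root still
    # has a file for systemd to bind-mount over.
    with open(etc_id, "w"):
        pass
    # Remove the D-Bus copy so it cannot be copied back into /etc/machine-id.
    try:
        os.remove(dbus_id)
    except FileNotFoundError:
        pass
```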

If you are doing cloning, stateless systems or similar activities,
and you know you will have a valid /etc/machine-id (you either use
systemd or have taken other steps to have one), then you can make
/var/lib/dbus/machine-id a symlink to /etc/machine-id (dbus comes with a
systemd-tmpfiles file to do this). This is not done by default in Debian,
or by `dbus-uuidgen --ensure`, for historical reasons; maybe it should be,
but to be confident that it was a correct change I'd have to think about
the ways in which it might go wrong on non-systemd systems (with either
a non-systemd init like sysvinit, or no init at all like minimal chroots).
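For images where a valid /etc/machine-id is guaranteed, the symlink could be created at image-build time instead of by systemd-tmpfiles at boot. This is a minimal sketch under the same assumptions: the image is mounted at `root`, and the helper name is hypothetical. The absolute target is deliberate, because it is resolved inside the booted system rather than on the build host.

```python
import os


def link_dbus_to_machine_id(root: str = "/") -> None:
    """Make <root>/var/lib/dbus/machine-id a symlink to /etc/machine-id."""
    dbus_dir = os.path.join(root, "var", "lib", "dbus")
    link_path = os.path.join(dbus_dir, "machine-id")
    os.makedirs(dbus_dir, exist_ok=True)
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink("/etc/machine-id", tmp)
    # Atomically replace whatever was there: a regular file or an older link.
    os.replace(tmp, link_path)
```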

Maybe /etc/machine-id should be part of the "API" of a Debian system in
general (systemd or not)?

smcv



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-08 Thread Marc Haber
On Wed, 7 Aug 2019 11:15:22 -0400, Marvin Renich 
wrote:
>I think this is a good idea, but will require work and coordination to
>accomplish.  A wiki.debian.org page with your ideas and (perhaps on a
>separate page) a place to list things that need updating after the
>physical copying is complete would be wonderful, if you feel motivated
>to get it started.  :-)  Hostname, machine-id (new to me too!), and ssh
>host keys can start the list.

https://wiki.debian.org/MachineId

as a beginning.

Greetings
Marc
-- 
-- !! No courtesy copies, please !! -
Marc Haber |   " Questions are the | Mailadresse im Header
Mannheim, Germany  | Beginning of Wisdom " | 
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 621 72739834



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-08 Thread Bernhard Schmidt
On 08.08.19 at 13:39, Marc Haber wrote:
> On Wed, 07 Aug 2019 09:28:12 -0400, The Wanderer
>  wrote:
>> On 2019-08-07 at 04:26, Russell Stuart wrote:
>>
>>> On Wed, 2019-08-07 at 09:34 +0200, Marc Haber wrote:
>>>
>>>> I have been using Debian for two decades now, and I realized that
>>>> necessity two days ago.
>>>
>>> Ditto - except for me it was a few seconds ago.
>>
>> In my case, it was when I read this thread last night. (After more like
>> ~1.5 decades of Debian, for what that's worth.)
> 
> Generating a new machine-id doesn't seem as easy as generating a new
> ssh key: removing /etc/machine-id doesn't do it, as
> systemd-machine-id-setup seems to pull the machine-id from dbus.
> 
> I have four Banana Pis with identical machine IDs because they were
> cloned from a common image. Since that one originates from a Debian
> Wiki page about the Banana Pi, I guess that the vast majority of Banana
> Pis running Debian have this machine ID.
> 
> How do I generate a new one?

I followed
https://unix.stackexchange.com/questions/402999/it-is-ok-to-change-etc-machine-id
last time which means

rm -f /etc/machine-id
dbus-uuidgen --ensure=/etc/machine-id
rm /var/lib/dbus/machine-id
dbus-uuidgen --ensure

Last time, when I only removed /etc/machine-id (hoping it would be
regenerated on reboot), the machine became unbootable.
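After a procedure like this, it may be worth checking the result. The Python sketch below checks the file format described in machine-id(5). It also checks that the systemd and D-Bus copies agree, which smcv explains above is required. `new_machine_id()` uses `uuid.uuid4()` purely as a stand-in for producing a value. It is not how `dbus-uuidgen` or `systemd-machine-id-setup` work internally.

```python
import re
import uuid

# machine-id(5): a single newline-terminated, 32-character, lowercase hex ID.
_MACHINE_ID = re.compile(r"[0-9a-f]{32}\n")


def new_machine_id() -> str:
    """Return a random 128-bit ID in machine-id file format (stand-in only)."""
    return uuid.uuid4().hex + "\n"


def is_valid_machine_id(text: str) -> bool:
    return _MACHINE_ID.fullmatch(text) is not None


def ids_agree(etc_text: str, dbus_text: str) -> bool:
    """True if /etc/machine-id and /var/lib/dbus/machine-id hold the same valid ID."""
    return is_valid_machine_id(etc_text) and etc_text == dbus_text
```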

FTR, I have also only recently learned about this. Duplicate machine-ids
can have very nasty consequences. We recently had a weird networking
issue at one department where clients got assigned the same address from
the dynamic DHCP pool and kicked each other out of the network.

It took us a while to figure out that the admin had cloned Kubuntu 18.04
workstations that use systemd-networkd for network configuration.
By default, systemd-networkd's DHCP client sends a client identifier
derived from the machine-id, and isc-dhcp by default uses the client
identifier (if present) instead of the MAC address to track leases.
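Here is a toy model of why this causes trouble. If the server keys leases on the client identifier, two clones with different MAC addresses but the same client identifier (because it comes from a shared machine-id) look like one client. The functions below are hypothetical and greatly simplified. They are not isc-dhcp's actual lease logic.

```python
from typing import Dict, List, Optional


def lease_key(mac: str, client_id: Optional[str]) -> str:
    # Simplified model: a client identifier, when sent, takes precedence
    # over the MAC address for identifying the lease owner.
    return client_id if client_id is not None else mac


def assign(leases: Dict[str, str], pool: List[str], mac: str,
           client_id: Optional[str]) -> str:
    """Return the address for a client, allocating from `pool` on first sight."""
    key = lease_key(mac, client_id)
    if key not in leases:
        leases[key] = pool[len(leases)]  # next free address; no exhaustion handling
    return leases[key]
```

With a shared identifier, both clones are handed the same address. Without it, each MAC address gets its own lease.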

Bernhard



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-08 Thread Marc Haber
On Wed, 07 Aug 2019 09:28:12 -0400, The Wanderer
 wrote:
>On 2019-08-07 at 04:26, Russell Stuart wrote:
>
>> On Wed, 2019-08-07 at 09:34 +0200, Marc Haber wrote:
>> 
>>> I have been using Debian for two decades now, and I realized that
>>> necessity two days ago.
>> 
>> Ditto - except for me it was a few seconds ago.
>
>In my case, it was when I read this thread last night. (After more like
>~1.5 decades of Debian, for what that's worth.)

Generating a new machine-id doesn't seem as easy as generating a new
ssh key: removing /etc/machine-id doesn't do it, as
systemd-machine-id-setup seems to pull the machine-id from dbus.

I have four Banana Pis with identical machine IDs because they were
cloned from a common image. Since that one originates from a Debian
Wiki page about the Banana Pi, I guess that the vast majority of Banana
Pis running Debian have this machine ID.

How do I generate a new one?

Greetings
Marc



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-07 Thread The Wanderer
On 2019-08-07 at 16:59, Andrei POPESCU wrote:

> On Wed, 07 Aug 19, 09:28:12, The Wanderer wrote:
> 
>> I've begun to wonder whether it might be worth the overhead to set up
>> some type of mechanism to let packages which define such
>> machine-specific IDs A: declare the fact, in a central location which
> 
> Do you mean /etc? :)

It would probably go under /etc/, but that's not what you seem to mean.

>> the sysadmin of a machine where that package is installed can easily
>> check, and B: define an automated way of performing the appropriate
>> update / regenerate step in a way which covers all known places where
>> the ID needs to be updated.
> 
> 1. Delete the contents of /etc (all of it)
> 2. If a package doesn't find its "stuff" in /etc it regenerates it from 
> defaults.
> 
> http://0pointer.net/blog/projects/stateless.html

Maybe I'm missing something, but I don't think that covers all
cases.

For example, in the case where defining (I think it was) an mdadm array
embeds the then-existing hostname into the array definition, such that
the array will only be auto-assembled when it is detected on a machine
with that same hostname, it's not enough to simply wipe /etc/hostname;
you also need to arrange for the new hostname, once generated, to be
inserted into the definition of the existing array.

That's probably more relevant to the case of changing the hostname of a
single machine than of cloning a machine, since I'm hard pressed to
think of a plausible case for using RAID on a machine which is to be
cloned (and also since I think it's possible to explicitly omit the
hostname when creating the array, such that the array will auto-assemble
on any system), but there's no guarantee it's the only example of
something which needs to be updated outside of /etc/ in order for things
to keep working.

At a glance, there are also unique LVM IDs in /boot/grub/grub.cfg,
though whether those would need to be changed when cloning I don't know.
I also vaguely recall having once run into issues related to filesystems
being by default configured to mount by unique ID rather than by device
path, which thus didn't mount anymore once the filesystem had been
cloned from its original drive onto a different drive which had a
different ID, but it's been too long for me to dredge up any specifics.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw





Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-07 Thread Paul Wise
On Thu, Aug 8, 2019 at 4:59 AM Andrei POPESCU wrote:

> 1. Delete the contents of /etc (all of it)
> 2. If a package doesn't find its "stuff" in /etc it regenerates it from
> defaults.

There is still way too much stuff that defaults to installing
important files in /etc (default config settings, init scripts etc).

It would be nice to have a way to tell all postinsts to not generate
system-specific files (like machine-id or SSH keys) but do generate
caches. Cleaning up after postinsts seems like a hack to me.

There are some notes on reproducible installs here, IIRC Tails and
Webconverger have achieved that for their live images:

https://wiki.debian.org/ReproducibleInstalls

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-07 Thread Andrei POPESCU
On Wed, 07 Aug 19, 09:28:12, The Wanderer wrote:
> 
> I've begun to wonder whether it might be worth the overhead to set up
> some type of mechanism to let packages which define such
> machine-specific IDs A: declare the fact, in a central location which

Do you mean /etc? :)

> the sysadmin of a machine where that package is installed can easily
> check, and B: define an automated way of performing the appropriate
> update / regenerate step in a way which covers all known places where
> the ID needs to be updated.

1. Delete the contents of /etc (all of it)
2. If a package doesn't find its "stuff" in /etc it regenerates it from 
defaults.

http://0pointer.net/blog/projects/stateless.html


Kind regards,
Andrei
-- 
http://wiki.debian.org/FAQsFromDebianUser




Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-07 Thread Russ Allbery
The Wanderer  writes:

> This isn't the first time I've discovered that some aspect of a Debian
> system would actually need to be cleared and re-generated when that
> system is cloned, well after the point where it would have been easy for
> me to address that need. (Fortunately, although I've moved in the
> direction of cloned Debian systems multiple times in the past, so far
> all of those have petered out before reaching production. I still want
> to change that at some point, however.)

I'm only vaguely aware of this package and how it works, but doesn't
cloud-init have some support for customizing things that need to change
when cloning an image?  Does it already handle machine-id?  Maybe we just
need to popularize it and explain better how to add additional hooks for
the needs of other packages?

-- 
Russ Allbery (r...@debian.org)   



Re: duplicate popularity-contest ID

2019-08-07 Thread Russ Allbery
Michael Stone  writes:

> I don't think popcon is a good reason to pause if there are valid
> concerns suggesting removal is a good thing, for the exact reason that
> it's skewed to propagating existing practice. I'm not sure there's any
> really good use for popcon, but I'll continue to believe that any value
> it does have is more related to how many unique configurations it
> reflects rather than how many duplicate instances it can hold.

Seeing that some people out there in the world have installed and are
using a package that I maintain in Debian makes me happy.  I think we
shouldn't underestimate the pure psychological value of that, even if it's
hard to attribute specific meaning to the statistics.

(Thank you, Bill, for maintaining popcon!)




Re: duplicate popularity-contest ID

2019-08-07 Thread Michael Stone

On Wed, Aug 07, 2019 at 04:02:14PM +0100, Ian Jackson wrote:
>Michael Stone writes ("Re: duplicate popularity-contest ID"):
>> I guess the question is what is the point of the popcon statistics.
>> Insofar as they're used to determine defaults, skewing them toward
>> custom images (which likely do not care about defaults) is probably a
>> mistake.
>
>popcon is a really bad way to determine defaults because it is so
>heavily skewed by existing defaults.
>
>More useful uses of popcon include: estimating the downside, if some
>package is (or may become) broken or removed; and, maybe, estimating
>the user preferences between different non-default leaf packages.
>
>For me, if I were doing (say) RC bugfixing and was considering asking
>for a removal, even a moderate popcon figure would give me pause.
>Conversely, a low popcon figure would encourage me to consult on
>removing the package.

I don't think popcon is a good reason to pause if there are valid
concerns suggesting removal is a good thing, for the exact reason that
it's skewed to propagating existing practice. I'm not sure there's any
really good use for popcon, but I'll continue to believe that any value
it does have is more related to how many unique configurations it
reflects rather than how many duplicate instances it can hold.



Re: Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-07 Thread Marvin Renich
* The Wanderer  [190807 09:28]:
> Cloning isn't the only example of a case where some machine-specific
> configuration detail may need to be updated, without that being obvious
> in advance.
> 
> I've begun to wonder whether it might be worth the overhead to set up
> some type of mechanism to let packages which define such
> machine-specific IDs A: declare the fact, in a central location which
> the sysadmin of a machine where that package is installed can easily
> check, and B: define an automated way of performing the appropriate
> update / regenerate step in a way which covers all known places where
> the ID needs to be updated.

I think this is a good idea, but will require work and coordination to
accomplish.  A wiki.debian.org page with your ideas and (perhaps on a
separate page) a place to list things that need updating after the
physical copying is complete would be wonderful, if you feel motivated
to get it started.  :-)  Hostname, machine-id (new to me too!), and ssh
host keys can start the list.

...Marvin



Re: duplicate popularity-contest ID

2019-08-07 Thread Ian Jackson
Michael Stone writes ("Re: duplicate popularity-contest ID"):
> I guess the question is what is the point of the popcon statistics. 
> Insofar as they're used to determine defaults, skewing them toward 
> custom images (which likely do not care about defaults) is probably a 
> mistake.

popcon is a really bad way to determine defaults because it is so
heavily skewed by existing defaults.

More useful uses of popcon include: estimating the downside, if some
package is (or may become) broken or removed; and, maybe, estimating
the user preferences between different non-default leaf packages.

For me, if I were doing (say) RC bugfixing and was considering asking
for a removal, even a moderate popcon figure would give me pause.
Conversely, a low popcon figure would encourage me to consult on
removing the package.

Ian.

-- 
Ian Jackson    These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: duplicate popularity-contest ID

2019-08-07 Thread Michael Stone

On Wed, Aug 07, 2019 at 09:31:34AM +0200, Marc Haber wrote:
>On Tue, 6 Aug 2019 11:33:42 +, Bill Allombert
> wrote:
>>Yesterday I received the same popcon ID 2600 times, 4700 different IDs
>>were received two times, and 22000 IDs were received exactly once.
>>
>>I understand the need for totally identical systems, but then probably
>>it does not make sense for them to report to popcon.
>
>Why? Does a node in a cluster count less than a desktop installation?
>If so, why do we not value the input of our biggest users while
>putting so much focus on installations in a market segment that we're
>losing anyway?

I guess the question is what is the point of the popcon statistics.
Insofar as they're used to determine defaults, skewing them toward
custom images (which likely do not care about defaults) is probably a
mistake.



Generating new IDs for cloning (was Re: duplicate popularity-contest ID)

2019-08-07 Thread The Wanderer
On 2019-08-07 at 04:26, Russell Stuart wrote:

> On Wed, 2019-08-07 at 09:34 +0200, Marc Haber wrote:
> 
>> I have been using Debian for two decades now, and I realized that
>> necessity two days ago.
> 
> Ditto - except for me it was a few seconds ago.

In my case, it was when I read this thread last night. (After more like
~1.5 decades of Debian, for what that's worth.)

This isn't the first time I've discovered that some aspect of a Debian
system would actually need to be cleared and re-generated when that
system is cloned, well after the point where it would have been easy for
me to address that need. (Fortunately, although I've moved in the
direction of cloned Debian systems multiple times in the past, so far
all of those have petered out before reaching production. I still want
to change that at some point, however.)


Cloning isn't the only example of a case where some machine-specific
configuration detail may need to be updated, without that being obvious
in advance.

I've been bitten by attempting to change the name of a computer running
off of LVM on mdraid, and discovering that the hostname entered during
the original install process when those two things were configured had
actually been encoded into the definition of one of the two, such that
the machine could no longer automatically find its filesystems at boot
until some action to update the hostname in that definition was taken;
the original hostname was effectively a critical ID for that filesystem.
(I *still* haven't been able to pin down with certainty what action
would do that update safely.)

Since cloning a machine often involves specifying a new hostname for the
clone, I'd expect to encounter the same issue there - although it's
probably not all that common to want to clone a machine running from
RAID, so if the mdraid is where the hostname is needed, the issue may
not tend to come up in that context.

I wouldn't be even slightly surprised if there were other examples, as
well, somewhere in the package ecosystem.


I've begun to wonder whether it might be worth the overhead to set up
some type of mechanism to let packages which define such
machine-specific IDs A: declare the fact, in a central location which
the sysadmin of a machine where that package is installed can easily
check, and B: define an automated way of performing the appropriate
update / regenerate step in a way which covers all known places where
the ID needs to be updated.

Maybe a mechanism vaguely similar to /etc/init.d/ | /etc/rc*.d/ ? Say,
one directory (name bikeshedding welcome) to contain package-installed
scripts which will generate and apply the new GUID (or replace an
existing ID with a specified new one in all relevant places, for cases
such as the hostname one given above), and another directory to contain
symlinks to scripts in the first directory. Then either a flag file to
tell the system to run the symlinked scripts (and clear the flag) on the
next boot, or just let the presence of any such symlinks be the flag
indicating to run that script and remove the symlink at boot time.

That way, rather than needing to research to find out what elements of
the installed system need to be updated at clone time, the sysadmin
could just check the relevant directory, run any scripts whose effects
need to be applied pre-clone (if any), create appropriate symlinks for
whichever others the sysadmin wants to have run in this case, create the
flag file if applicable, shut down, and clone.
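The variant where the presence of a symlink is the flag could look roughly like the following Python sketch of the boot-time runner. This is not an existing Debian mechanism. The directory names (`/etc/clone-hooks/available` and `/etc/clone-hooks/enabled`) and the function name are hypothetical, since the proposal deliberately leaves naming open.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical layout; directory names are invented for illustration:
#   <root>/etc/clone-hooks/available/  scripts installed by packages
#   <root>/etc/clone-hooks/enabled/    symlinks the admin creates before cloning


def _exec(script: Path) -> None:
    # Executing the symlink runs its target; a non-zero exit raises.
    subprocess.run([str(script)], check=True)


def run_pending_hooks(root: str,
                      run: Optional[Callable[[Path], object]] = None) -> List[str]:
    """Run each enabled hook once at boot, removing its symlink afterwards."""
    run = run or _exec
    enabled = Path(root, "etc", "clone-hooks", "enabled")
    done: List[str] = []
    if not enabled.is_dir():
        return done
    # Lexical order, like /etc/rc*.d/, so hooks can use numeric prefixes.
    for link in sorted(enabled.iterdir()):
        if not link.is_symlink():
            continue
        run(link)
        # The symlink itself is the flag: delete it only after the hook
        # succeeded, so a failed hook is retried on the next boot.
        link.unlink()
        done.append(link.name)
    return done
```

Removing each symlink only after its hook succeeds gives "run once" semantics without a separate flag file. A hook that fails simply stays pending.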

...this would arguably be reminiscent of the Sysprep tool on the Windows
side of things, although probably more general, more flexible, and less
heavyweight. I'm sad that there is any need for such a thing
in the Linux world, but as long as there are machine-specific IDs which
need to be updated for effective cloning, I'd rather have such a
mechanism than need to do all the work (or do the research, and write
the necessary automation scripts) myself in every case.

I'm not particularly attached to that exact solution; it's just the
first one I came up with that seemed as if it could work with sufficient
generality. If people think the idea is worth pursuing but that solution
is not ideal, I would be more than happy to defer to those with more
expertise.

-- 
   The Wanderer (will, statistically, probably regret posting this)

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw





Re: duplicate popularity-contest ID

2019-08-07 Thread Russell Stuart
On Wed, 2019-08-07 at 09:34 +0200, Marc Haber wrote:
> I have been using Debian for two decades now, and I realized that
> necessity two days ago.

Ditto - except for me it was a few seconds ago.



Re: duplicate popularity-contest ID

2019-08-07 Thread Marc Haber
On Tue, 06 Aug 2019 14:01:16 -0400, Sam Hartman 
wrote:
>> "Bill" == Bill Allombert  writes:
>
>Bill> This is potentially an excellent idea!
>
>Bill> Doesn't /etc/machine-id suffer from exactly the same issue as
>Bill> /etc/popularity-contest.conf?
>
>A lot more procedures for cloning images know that they need to generate
>new /etc/machine-ids.

I have been using Debian for two decades now, and I realized that
necessity two days ago.

>It's one of those things you tend to realize fairly quickly that you
>need to fix up in cloned images.
>Interactions with machined, systemd journal, and a few other things tend
>to make it obvious.

I didn't.

Greetings
Marc



Re: duplicate popularity-contest ID

2019-08-07 Thread Marc Haber
On Tue, 6 Aug 2019 11:33:42 +, Bill Allombert
 wrote:
>Yesterday I received the same popcon ID 2600 times, 4700 different IDs
>were received two times, and 22000 IDs were received exactly once.
>
>I understand the need for totally identical systems, but then probably
>it does not make sense for them to report to popcon.

Why? Does a node in a cluster count less than a desktop installation?
If so, why do we not value the input of our biggest users while
putting so much focus on installations in a market segment that we're
losing anyway?

>A related issue is that the submission time is randomized, but if
>2600 systems have identical /etc/cron.d/popularity-contest files, they
>will report at the same time, causing network spikes.

Then the randomization should not be in the configuration file, but
for example hashed from the MAC address.
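Marc's suggestion amounts to deriving the delay at run time from something that already differs between clones. Below is a minimal Python sketch. It assumes the identifier passed in is actually unique per machine: a MAC address as suggested, or a properly regenerated /etc/machine-id. The function name and the one-day window are illustrative choices, not part of popularity-contest.

```python
import hashlib


def submission_offset(identifier: str, window_seconds: int = 24 * 3600) -> int:
    """Map a per-machine identifier to a stable offset in [0, window_seconds).

    The same identifier always yields the same offset, so each machine keeps a
    consistent submission time. Machines with different identifiers spread out
    across the window even if their cron files are byte-for-byte identical.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # 64 bits of hash reduced modulo the window; the modulo bias is negligible.
    return int.from_bytes(digest[:8], "big") % window_seconds
```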

Greetings
Marc



Re: duplicate popularity-contest ID

2019-08-06 Thread Paul Wise
On Tue, Aug 6, 2019 at 7:34 PM Bill Allombert wrote:

> A related issue is that the submission time is randomized, but if
> 2600 systems have identical /etc/cron.d/popularity-contest files, they
> will report at the same time, causing network spikes.

BTW, a systemd service timer has native randomisation with
RandomizedDelaySec/AccuracySec so adding one that shadows the cron job
and disabling the cron job on systemd systems could provide more load
spread because each system would send the data at a completely
different time each day. The apt package is a good example of how to
do this (except for the randomisation part).




Re: duplicate popularity-contest ID

2019-08-06 Thread Sam Hartman
> "Bill" == Bill Allombert  writes:

Bill> This is potentially an excellent idea!

Bill> Doesn't /etc/machine-id suffer from exactly the same issue as
Bill> /etc/popularity-contest.conf?

A lot more procedures for cloning images know that they need to generate
new /etc/machine-ids.

It's one of those things you tend to realize fairly quickly that you
need to fix up in cloned images.
Interactions with machined, systemd journal, and a few other things tend
to make it obvious.

--Sam



Re: duplicate popularity-contest ID

2019-08-06 Thread Bill Allombert
On Tue, Aug 06, 2019 at 12:08:13AM +0200, Marco d'Itri wrote:
> On Aug 05, Bill Allombert  wrote:
> 
> > Each Debian popularity-contest submitter is supposed to have
> > a different random 128bit popcon ID.
> > However, the popularity-contest server
> > receives a lot of submissions with identical popcon ID, which cause them
> > to be treated as a single submission.
> 
> > I am not quite sure what the reason for this problem is.
> > Maybe people use prebuilt system images with a pregenerated
> > /etc/popularity-contest.conf file (instead of one generated
> > by the popcon postinst).
> Probably yes.
> 
> > I am not sure what to do about this.
> Change popularity-contest to transmit the hostid after it has been
> hashed with the content of /etc/machine-id.

This is potentially an excellent idea!

Doesn't /etc/machine-id suffer from exactly the same issue as
/etc/popularity-contest.conf?

Are there any statistics about /etc/machine-id reuse or unexpected changes?

Cheers,
-- 
Bill. 

Imagine a large red swirl here.



Re: duplicate popularity-contest ID

2019-08-06 Thread Jeremy Stanley
On 2019-08-06 08:33:36 -0700 (-0700), Russ Allbery wrote:
[...]
> Hm.  I think that's still in the range of what could be explained
> by VM cloning, although the 2600 with the same ID is surprising.
[...]

A CI system which is using cloned virtual machines could easily do
that. I help operate a CI system which boots and deletes far more
than 2600 new virtual machines of some distro/version combinations
every day, though it hasn't relied on cloning to create images for a
few years now. On the other hand, including popcon on a test VM
*and* enabling it to report does seem like an odd choice for a CI
system, but I've seen far stranger misconfigurations over the years.
-- 
Jeremy Stanley




Re: duplicate popularity-contest ID

2019-08-06 Thread Russ Allbery
Bill Allombert  writes:

> Both.

> Yesterday I received the same popcon ID 2600 times, 4700 different IDs
> were received two times, and 22000 IDs were received exactly once.

Hm.  I think that's still in the range of what could be explained by VM
cloning, although the 2600 with the same ID is surprising.

> I understand the need for totally identical systems, but then probably
> it does not make sense for them to report to popcon.

Marco's suggestion of hashing with /etc/machine-id is a good one.  That
file is unique per cloned VM (if the cloning is done properly), and if you
hash it with the popcon ID to form a new ID, there shouldn't be any
realistic chance of leaking a unique identifier for the system that might
be useful for other purposes.
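Here is one possible shape for Marco's proposal, as a Python sketch. The construction is an illustrative assumption, not something popularity-contest implements: HMAC-SHA256 with the machine ID as key and the popcon ID as message, truncated to 128 bits. It follows the general advice in machine-id(5) to pass the machine ID through a keyed cryptographic hash rather than use it directly.

```python
import hashlib
import hmac


def derived_popcon_id(popcon_id: str, machine_id: str) -> str:
    """Combine a possibly cloned popcon ID with /etc/machine-id.

    Output is 32 lowercase hex digits (128 bits), the same size as the
    existing popcon ID. The machine ID cannot be read back from it.
    """
    digest = hmac.new(machine_id.strip().encode("ascii"),
                      popcon_id.strip().encode("ascii"),
                      hashlib.sha256)
    return digest.hexdigest()[:32]
```

Clones that share a popcon ID but have distinct machine IDs then report distinct derived IDs. Clones that also share a machine ID, because the cloning was not done properly, still collide.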

> A related issue is that the submission time is randomized, but if 2600
> systems have identical /etc/cron.d/popularity-contest files, they will
> report at the same time, causing network spikes.

You could add a bit of per-run randomization to the cron job.  I assume an
individual report doesn't take a lot of effort to process, so even skewing
that load across 10-20 seconds might be enough.
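Per-run jitter could be as simple as sleeping for a fresh random interval before submitting. Below is a minimal Python sketch. The function is hypothetical rather than part of popularity-contest, and the 20-second default just mirrors the 10-20 second figure above.

```python
import random
import time
from typing import Callable, Optional


def jittered_delay(max_jitter_s: float = 20.0,
                   rng: Optional[random.Random] = None,
                   sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random delay in [0, max_jitter_s] before submitting.

    The delay is drawn on every run (random.Random() seeds itself from the
    OS), so identical clones with identical cron files still diverge.
    """
    rng = rng or random.Random()
    delay = rng.uniform(0.0, max_jitter_s)
    sleep(delay)
    return delay
```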




Re: duplicate popularity-contest ID

2019-08-06 Thread Bill Allombert
On Mon, Aug 05, 2019 at 08:46:15AM -0700, Russ Allbery wrote:
> Bill Allombert  writes:
> 
> > Each Debian popularity-contest submitter is supposed to have a different
> > random 128bit popcon ID.  However, the popularity-contest server
> >  receives a lot of submissions with identical
> > popcon ID, which cause them to be treated as a single submission.
> 
> Are you getting lots and lots of submissions with one identical popcon ID,
> or lots of cases of 10-20 systems duplicating different popcon IDs?  I
> think those lead to different conclusions.

Both.
Yesterday I received the same popcon ID 2600 times, 4700 different IDs
were received two times, and 22000 IDs were received exactly once.

I understand the need for totally identical systems, but then probably
it does not make sense for them to report to popcon.
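The figures above form a histogram of submissions per ID. The following Python sketch shows how such a tally could be computed from one day's list of received IDs. It is an illustration, not the popcon server's actual code.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_histogram(submission_ids: Iterable[str]) -> Dict[int, int]:
    """Map "times an ID was seen" to "how many distinct IDs were seen that often".

    For the day described above, the result would include 2600: 1,
    2: 4700 and 1: 22000.
    """
    per_id = Counter(submission_ids)
    return dict(Counter(per_id.values()))
```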

A related issue is that the submission time is randomized, but if
2600 systems have identical /etc/cron.d/popularity-contest files, they
will report at the same time, causing network spikes.

Cheers,
-- 
Bill. 

Imagine a large red swirl here.



Re: duplicate popularity-contest ID

2019-08-05 Thread Marco d'Itri
On Aug 05, Bill Allombert  wrote:

> Each Debian popularity-contest submitter is supposed to have
> a different random 128bit popcon ID.
> However, the popularity-contest server
> receives a lot of submissions with identical popcon ID, which cause them
> to be treated as a single submission.

> I am not quite sure what the reason for this problem is.
> Maybe people use prebuilt system images with a pregenerated
> /etc/popularity-contest.conf file (instead of one generated
> by the popcon postinst).
Probably yes.

> I am not sure what to do about this.
Change popularity-contest to transmit the hostid after it has been
hashed with the content of /etc/machine-id.

-- 
ciao,
Marco




Re: duplicate popularity-contest ID

2019-08-05 Thread Russ Allbery
Bill Allombert  writes:

> Each Debian popularity-contest submitter is supposed to have a different
> random 128bit popcon ID.  However, the popularity-contest server
>  receives a lot of submissions with identical
> popcon ID, which cause them to be treated as a single submission.

Are you getting lots and lots of submissions with one identical popcon ID,
or lots of cases of 10-20 systems duplicating different popcon IDs?  I
think those lead to different conclusions.

If it's the second, I agree with the suggestion of cloned VMs.  Containers
are making this a bit less common, but building out a system and then
cloning it repeatedly used to be the most common way of scaling a web
service in environments such as AWS.




Re: duplicate popularity-contest ID

2019-08-05 Thread merkys
On 2019-08-05 15:29, Bill Allombert wrote:
> However, the popularity-contest server
> receives a lot of submissions with identical popcon ID, which cause them
> to be treated as a single submission.

I would suspect cloned VMs to have identical popcon IDs. In this case
the collation of identical IDs would be a desirable property, IMO.

Best,
Andrius

-- 
Andrius Merkys
Vilnius University Institute of Biotechnology, Saulėtekio al. 7, room V325
LT-10257 Vilnius, Lithuania



Re: duplicate popularity-contest ID

2019-08-05 Thread Andrey Rahmatullin
On Mon, Aug 05, 2019 at 02:29:33PM +0200, Bill Allombert wrote:
> Dear Debian developers,
> 
> Each Debian popularity-contest submitter is supposed to have
> a different random 128bit popcon ID.
> However, the popularity-contest server
> receives a lot of submissions with identical popcon ID, which cause them
> to be treated as a single submission.
Do you mean just one ID or several IDs with multiple submissions each?

-- 
WBR, wRAR




Re: duplicate popularity-contest ID

2019-08-05 Thread Jonathan Carter
Hey Yao and Bill

On 2019/08/05 14:31, "Yao Wei (魏銘廷)" wrote:
>> I am not quite sure what the reason for this problem is.
>> Maybe people use prebuilt system images with a pregenerated
>> /etc/popularity-contest.conf file (instead of one generated
>> by the popcon postinst).
> 
> Could this be caused by Debian-live installer based on Calamares?

Very unlikely: we don't install popularity-contest on live media, and
it's not added or removed at any point by Calamares. So when you
install popularity-contest on a system installed with Calamares from
live media, it's the same as installing it on any other Debian system
that didn't have it before.

I also just double-checked whether any /etc/popularity-contest.conf
exists on debian live images, and can confirm that it doesn't.

Bill, it might also be a good idea to ask on the debian-derivatives
mailing list, perhaps someone there might know. I don't suppose there's
any server logs with IPs that you can use to deduce from which country
it's coming from?

-Jonathan

-- 
  ⢀⣴⠾⠻⢶⣦⠀  Jonathan Carter (highvoltage) 
  ⣾⠁⢠⠒⠀⣿⡁  Debian Developer - https://wiki.debian.org/highvoltage
  ⢿⡄⠘⠷⠚⠋   https://debian.org | https://jonathancarter.org
  ⠈⠳⣄  Be Bold. Be brave. Debian has got your back.



Re: duplicate popularity-contest ID

2019-08-05 Thread Yao Wei (魏銘廷)


> On Aug 5, 2019, at 20:29, Bill Allombert  wrote:
> 
> I am not quite sure what the reason for this problem is.
> Maybe people use prebuilt system images with a pregenerated
> /etc/popularity-contest.conf file (instead of one generated
> by the popcon postinst).

Could this be caused by Debian-live installer based on Calamares?

Yao Wei

(This email is sent from a phone; sorry for HTML email if it happens.)