Re: [systemd-devel] fsckd needs to go

2015-05-22 Thread cee1
2015-05-22 20:23 GMT+08:00 Martin Pitt :
> Hello Lennart,
>
> sorry for the late answer, got stuck in different things in the past
> two weeks..
>
> Lennart Poettering [2015-04-28 17:33 +0200]:
>> On Fri, 03.04.15 14:58, Lennart Poettering (lenn...@poettering.net) wrote:
>>
>> > systemd-fsckd would try to connect to some AF_UNIX/SOCK_STREAM socket
>> > in the fs, after forking and before execing fsck in the child, and
>> > pass the connected socket to fsck via the -C switch. If the socket is
>> > not connectable it would avoid any -C switch. With this simple change
>> > you can make this work for you: simply write a daemon (outside of
> >> > systemd) that listens on that socket and reads the progress data from
>> > it. Using SO_PEERCRED you can query which fsck PID this is from and
>> > use it to kill it. You could even add this to ply natively if you
>> > wish, since it's kinda strange to bump this all off another daemon in
>> > the middle, unnecessarily.
>>
> >> I implemented this now, and removed fsckd in the process. The
>> progress data is now available on /run/systemd/fsck.progress which
>> should be an AF_UNIX/SOCK_STREAM socket.
>
> Great, thanks! This works fine, it's very similar to what Didier did
> before. I. e. fsckd essentially works almost unmodified (except for
> adjusting the socket path).
>
> So we'll maintain that patch downstream now. It makes maintaining
> translations harder, but so be it.
>
> >> Please test this, I only did some artificial testing myself, since I
>> don't use file systems that require fsck anymore myself.
>
> Neither do I, but there's always test/mocks/fsck which works very
> nicely.
>
> Thanks,
>
> Martin
>
> --
> Martin Pitt| http://www.piware.de
> Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)
> ___
> systemd-devel mailing list
> systemd-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel

Hey,

Just to mention it: we implemented a similar fsck progress report in
Loonux3[1] several years ago.

FYI:
* http://lists.freedesktop.org/archives/systemd-devel/2011-June/002654.html
* patch for systemd:
https://github.com/cee1/systemd/commit/c04c709880f0619434ff58580609300d892f281b
* patch for plymouth:
https://github.com/cee1/plymouth/commit/5be1bb7751b547fe5c125a42c3f2fe607568fa0f



--
1. http://dev.lemote.com/category/loonux3



Regards,

- cee1
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] fsckd needs to go

2015-05-22 Thread Martin Pitt
Hello Lennart,

sorry for the late answer, got stuck in different things in the past
two weeks..

Lennart Poettering [2015-04-28 17:33 +0200]:
> On Fri, 03.04.15 14:58, Lennart Poettering (lenn...@poettering.net) wrote:
> 
> > systemd-fsckd would try to connect to some AF_UNIX/SOCK_STREAM socket
> > in the fs, after forking and before execing fsck in the child, and
> > pass the connected socket to fsck via the -C switch. If the socket is
> > not connectable it would avoid any -C switch. With this simple change
> > you can make this work for you: simply write a daemon (outside of
> > systemd) that listens on that socket and reads the progress data from
> > it. Using SO_PEERCRED you can query which fsck PID this is from and
> > use it to kill it. You could even add this to ply natively if you
> > wish, since it's kinda strange to bump this all off another daemon in
> > the middle, unnecessarily.
> 
> I implemented this now, and removed fsckd in the process. The
> progress data is now available on /run/systemd/fsck.progress which
> should be an AF_UNIX/SOCK_STREAM socket.

Great, thanks! This works fine, it's very similar to what Didier did
before. I. e. fsckd essentially works almost unmodified (except for
adjusting the socket path).

So we'll maintain that patch downstream now. It makes maintaining
translations harder, but so be it.

> Please test this, I only did some artificial testing myself, since I
> don't use file systems that require fsck anymore myself.

Neither do I, but there's always test/mocks/fsck which works very
nicely.

Thanks,

Martin

-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] fsckd needs to go

2015-04-28 Thread Lennart Poettering
On Fri, 03.04.15 14:58, Lennart Poettering (lenn...@poettering.net) wrote:

> systemd-fsckd would try to connect to some AF_UNIX/SOCK_STREAM socket
> in the fs, after forking and before execing fsck in the child, and
> pass the connected socket to fsck via the -C switch. If the socket is
> not connectable it would avoid any -C switch. With this simple change
> you can make this work for you: simply write a daemon (outside of
> systemd) that listens on that socket and reads the progress data from
> it. Using SO_PEERCRED you can query which fsck PID this is from and
> use it to kill it. You could even add this to ply natively if you
> wish, since it's kinda strange to bump this all off another daemon in
> the middle, unnecessarily.
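
As a rough illustration of the connect-before-exec idea quoted above, here is a minimal Python sketch (systemd itself is written in C). The function names connect_progress() and fsck_argv() and the bare "fsck -a" command line are made up for this example and are not taken from the systemd sources. It only shows the decision described in the mail: pass -C<fd> when the socket is connectable, omit the switch otherwise.

```python
import socket


def connect_progress(path):
    """Return a connected AF_UNIX/SOCK_STREAM socket, or None if nobody listens."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        # No listener (or no such path): the caller simply skips -C.
        sock.close()
        return None
    # Python marks fds close-on-exec by default; fsck must inherit this one.
    sock.set_inheritable(True)
    return sock


def fsck_argv(device, sock):
    """Build an illustrative fsck command line; -C<fd> only with a live socket."""
    argv = ["fsck", "-a", device]  # hypothetical minimal command line
    if sock is not None:
        argv.append("-C%d" % sock.fileno())
    return argv
```

In a real helper these two calls would run in the forked child, followed by something like os.execvp(argv[0], argv), so that the PID seen on the listening side is the PID of fsck itself.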

I implemented this now, and removed fsckd in the process. The
progress data is now available on /run/systemd/fsck.progress which
should be an AF_UNIX/SOCK_STREAM socket. If you listen on it you will
get the raw fsck progress data through it. With SO_PEERCRED you can
figure out which fsck process is on the other side. If you do not
listen on it the progress data is instead printed to /dev/console
after converting it to percentage data.
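
A minimal sketch of the listening side described above, in Python for brevity. Assumptions that are not from this thread: the helper names listen(), peer_pid() and parse_progress(), and the line format "pass cur max device", which is what e2fsck is generally understood to write to its -C descriptor; please verify it against your fsck before relying on it. SO_PEERCRED is Linux-specific.

```python
import os
import socket
import struct

# Path named in the mail above; pass another path for experiments.
PROGRESS_SOCKET = "/run/systemd/fsck.progress"


def listen(path=PROGRESS_SOCKET):
    """Create the AF_UNIX/SOCK_STREAM listening socket at `path`."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file from an earlier run
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(8)
    return srv


def peer_pid(conn):
    """Return the PID of the process that connected, via SO_PEERCRED."""
    # struct ucred is three native ints: pid, uid, gid.
    size = struct.calcsize("3i")
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    pid, _uid, _gid = struct.unpack("3i", raw)
    return pid


def parse_progress(line):
    """Parse one raw line, ASSUMED to look like 'pass cur max device'."""
    fields = line.strip().split(None, 3)
    if len(fields) != 4:
        return None
    try:
        return int(fields[0]), int(fields[1]), int(fields[2]), fields[3]
    except ValueError:
        return None  # not a progress line
```

A consumer would accept() on the socket returned by listen(), call peer_pid() once per connection to learn which fsck it is talking to, and feed each received line to parse_progress().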

Please test this, I only did some artificial testing myself, since I
don't use file systems that require fsck anymore myself.

Sorry again for communicating this so badly initially!

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go -- possible compromise?

2015-04-08 Thread Kay Sievers
On Wed, Apr 8, 2015 at 4:18 PM, Martin Pitt  wrote:
> Lennart Poettering [2015-04-07 16:14 +0200]:
>> Well, the async IO socket handling thing was not dealt with. The newest
>> patches still use fgets().
>> [...]
>> The killer issue really is the safety issue. We shouldn't include
>> code in systemd that makes dangerous things like killing running
>> fscks an easily accessible operation, that has a graphical UI and
>> requires no authentication.
>
> So, would you reconsider your position if we address the two things
> above? I. e. replace fgets() by our own async buffering, and entirely
> remove the cancel support? Then we'd still get a proper feedback
> during boot instead of leaving the user in the dark why booting is
> stuck, but it stays noninteractive.

I don't think there is enough justification for a fsck daemon. Large
filesystems which need fsck in userspace are a thing from the past and
insufficiently developed technology for today's operating system
tasks. Basic filesystem consistency and maintenance tasks belong into
the kernel and nowhere else.

We made it just fine into the year 2015 with the support for the
legacy filesystems, and we did not need a specialized daemon so far.
Therefore, we can expect that the current level of support will be
sufficient for the coming years. We will support them well enough
until everybody will finally realize that they do not solve the
problems we face today, and that they need to be replaced.

Please keep things like fsckd in the distribution that wants to make
such promises about legacy technology. Systemd upstream should focus
on current and future technologies and not pimp up outdated
facilities, waste our time and add more complex logic and rules in
the basic boot process.

Thanks,
Kay


Re: [systemd-devel] fsckd needs to go -- possible compromise?

2015-04-08 Thread Martin Pitt
Hello all,

Lennart Poettering [2015-04-07 16:14 +0200]:
> Well, the async IO socket handling thing was not dealt with. The newest
> patches still use fgets().
> [...]
> The killer issue really is the safety issue. We shouldn't include
> code in systemd that makes dangerous things like killing running
> fscks an easily accessible operation, that has a graphical UI and
> requires no authentication.

So, would you reconsider your position if we address the two things
above? I. e. replace fgets() by our own async buffering, and entirely
remove the cancel support? Then we'd still get a proper feedback
during boot instead of leaving the user in the dark why booting is
stuck, but it stays noninteractive.

Thanks,

Martin

-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Lennart Poettering
On Wed, 08.04.15 13:03, Reindl Harald (h.rei...@thelounge.net) wrote:

> >>https://bugzilla.redhat.com/show_bug.cgi?id=1105877
> >
> >Hmm? i don't understand what that bug is about? Is it about /forcefsck
> >being ignored?
> 
> it is about "warning: checktime reached, running e2fsck is recommended" but
> the check didn't happen and that you *need* to "touch /forcefsck" while it
> should happen automatically

OK, reassigned to the kernel. It's somewhere between the kernel and
e2fsck to figure this out. We will always call fsck, it's up to fsck
to do something, and if it decides not to, then it would need to say
why, and get the kernel in sync...

> >And what does this bug have to do with systemd?
> 
> i don't get your reasoning for "Maybe the right fix for Ubuntu is to stop
> enabling the "routine" check logic?" because as seen a few months ago this
> routine check is important, otherwise you may not notice existing corruption
> (for whatever reason) until it is too late

Well, the file system folks at RH decided this makes no sense long
ago, please bring this up with them. Also note that the change RH was
carrying a long time is now upstream (see Martin's link), hence bring
this up with them.

systemd is not involved in this.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Reindl Harald



On 08.04.2015 at 12:48, Lennart Poettering wrote:
> On Wed, 08.04.15 12:31, Reindl Harald (h.rei...@thelounge.net) wrote:
>
>> On 08.04.2015 at 12:27, Lennart Poettering wrote:
>>> Well, the routine check is only done by Ubuntu/Debian, it is not
>>> enabled on any enterprise distro or on Fedora. Maybe Ubuntu/Debian
>>> should also turn this off?
>>>
>>> Note that the routine check is not different than a normal check
>>> really, it just is triggered by a mount counter instead of a dirty
>>> flag, that's all. Hence it makes little difference what you cancel,
>>> both is dangerous, and a bad idea to allow unauthenticated.
>>>
>>> Also, to my knowledge plymouth on Ubuntu never showed a different UI
>>> for both cases, did it? How is the admin supposed to know when it is
>>> just dangerous to cancel the fsck (in your "routine" check case), and
>>> when it is extra dangerous (in the non-"routine" check case)?
>>>
>>> Maybe the right fix for Ubuntu is to stop enabling the "routine" check
>>> logic?
>>
>> why would you want to disable it?
>>
>> short before christmas i had a faulty ext4 FS needing even manual
>> confirmation of repairs - i don't think it's a good idea to not trigger that
>> automatically and frankly it *should have been* triggered that way
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1105877
>
> Hmm? i don't understand what that bug is about? Is it about /forcefsck
> being ignored?

it is about "warning: checktime reached, running e2fsck is recommended"
but the check didn't happen and that you *need* to "touch /forcefsck"
while it should happen automatically

> And what does this bug have to do with systemd?

i don't get your reasoning for "Maybe the right fix for Ubuntu is to
stop enabling the "routine" check logic?" because as seen a few months
ago this routine check is important, otherwise you may not notice
existing corruption (for whatever reason) until it is too late






Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Lennart Poettering
On Wed, 08.04.15 12:31, Reindl Harald (h.rei...@thelounge.net) wrote:

> 
> 
> On 08.04.2015 at 12:27, Lennart Poettering wrote:
> >Well, the routine check is only done by Ubuntu/Debian, it is not
> >enabled on any enterprise distro or on Fedora. Maybe Ubuntu/Debian
> >should also turn this off?
> >
> >Note that the routine check is not different than a normal check
> >really, it just is triggered by a mount counter instead of a dirty
> >flag, that's all. Hence it makes little difference what you cancel,
> >both is dangerous, and a bad idea to allow unauthenticated.
> >
> >Also, to my knowledge plymouth on Ubuntu never showed a different UI
> >for both cases, did it? How is the admin supposed to know when it is
> >just dangerous to cancel the fsck (in your "routine" check case), and
> >when it is extra dangerous (in the non-"routine" check case)?
> >
> >Maybe the right fix for Ubuntu is to stop enabling the "routine" check
> >logic?
> 
> why would you want to disable it?
> 
> short before christmas i had a faulty ext4 FS needing even manual
> confirmation of repairs - i don't think it's a good idea to not trigger that
> automatically and frankly it *should have been* triggered that way
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1105877

Hmm? i don't understand what that bug is about? Is it about /forcefsck
being ignored? And what does this bug have to do with systemd?

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Martin Pitt
Lennart Poettering [2015-04-08 12:27 +0200]:
> Maybe the right fix for Ubuntu is to stop enabling the "routine" check
> logic? 

This already happened a while ago, through

  http://git.whamcloud.com/tools/e2fsprogs.git/commitdiff/3daf592646

So this indeed only affects older/upgraded installations.

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Reindl Harald



On 08.04.2015 at 12:27, Lennart Poettering wrote:
> Well, the routine check is only done by Ubuntu/Debian, it is not
> enabled on any enterprise distro or on Fedora. Maybe Ubuntu/Debian
> should also turn this off?
>
> Note that the routine check is not different than a normal check
> really, it just is triggered by a mount counter instead of a dirty
> flag, that's all. Hence it makes little difference what you cancel,
> both is dangerous, and a bad idea to allow unauthenticated.
>
> Also, to my knowledge plymouth on Ubuntu never showed a different UI
> for both cases, did it? How is the admin supposed to know when it is
> just dangerous to cancel the fsck (in your "routine" check case), and
> when it is extra dangerous (in the non-"routine" check case)?
>
> Maybe the right fix for Ubuntu is to stop enabling the "routine" check
> logic?

why would you want to disable it?

short before christmas i had a faulty ext4 FS needing even manual
confirmation of repairs - i don't think it's a good idea to not trigger
that automatically and frankly it *should have been* triggered that way

https://bugzilla.redhat.com/show_bug.cgi?id=1105877





Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Lennart Poettering
On Wed, 08.04.15 10:46, Martin Pitt (martin.p...@ubuntu.com) wrote:

> Reindl Harald [2015-04-08 10:32 +0200]:
> > nobody needs the ability to cancel a fsck because hardly anybody has any
> > insight whether doing so at that moment is horribly dangerous, and given that
> > fsck doesn't run for fun, why would you want to interrupt it and risk data loss?
> 
> You don't risk data loss by interrupting a routine check (that still
> happens on ext[234] every so often).

Well, the routine check is only done by Ubuntu/Debian, it is not
enabled on any enterprise distro or on Fedora. Maybe Ubuntu/Debian
should also turn this off?

Note that the routine check is not different than a normal check
really, it just is triggered by a mount counter instead of a dirty
flag, that's all. Hence it makes little difference what you cancel,
both is dangerous, and a bad idea to allow unauthenticated.

Also, to my knowledge plymouth on Ubuntu never showed a different UI
for both cases, did it? How is the admin supposed to know when it is
just dangerous to cancel the fsck (in your "routine" check case), and
when it is extra dangerous (in the non-"routine" check case)?

Maybe the right fix for Ubuntu is to stop enabling the "routine" check
logic? 

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Lennart Poettering
On Tue, 07.04.15 18:02, Dimitri John Ledkov (dimitri.j.led...@intel.com) wrote:

> On 3 April 2015 at 05:58, Lennart Poettering  wrote:
> > Heya,
> >
> > so we discussed the whole fsckd situation a bit more here in Berlin,
> > and we came to the conclusion that fsckd really should not exist the
> > way it does in systemd.
> >
> > To start with, the code is really wrong, it should never have been
> > merged in its current state, the read/write logic for the sockets is
> > completely borked (I cannot even boot my own machine reliably with
> > it!). And to my knowledge there has been no attempt to fix all of
> > that, even though I asked for it. It also doesn't do at all what I
> > suggested initially, as the flow of data is now fsck → systemd-fsck →
> > systemd-fsckd → plymouth, and that's just crazy, that's two steps too
> > many. systemd is supposed to be a few components playing well
> > together, but certainly not a baroque network of components where data
> > is passed through four hoops before it reaches the destination...
> >
> > Then, there's my general reservation with fsckd at all: file systems
> > that still require offline fsck are certainly not the future, but we
> > develop stuff for the future, and the idea to kill an fsck process
> > while it is running is also very very questionable. There's a reason
> 
> Is this about progress & control data or all things fsck?

Well, ext234 require fsck, there's no way around it. We need to call
it, and we will. But the idea of beefing this up with an UI and
specifically with an unauthenticated way to kill fsck while it is
ongoing, which is an inherently unsafe operation, is what I have
issues with.

> IMHO we do need to continue support ext4, and running fsck.ext4 when
> enforced, at least from initramfs, with progress output to the user
> and ability to cancel. Or is even fsck.ext4 obsolete these days and
> shouldn't be run automatically any more?

Nope. ext2, ext3, ext4, fat require an fsck tool to be run, and we
will.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Martin Pitt
Reindl Harald [2015-04-08 10:32 +0200]:
> nobody needs the ability to cancel a fsck because hardly anybody has any
> insight whether doing so at that moment is horribly dangerous, and given that
> fsck doesn't run for fun, why would you want to interrupt it and risk data loss?

You don't risk data loss by interrupting a routine check (that still
happens on ext[234] every so often).

But anyway, I don't mind much dropping the cancel ability, but we do
want a proper progress report. fsck can take an effing long time with
large spinning rust, and without progress report users will just
consider the boot hanging/broken and switch off the machine. That's a
lot riskier :-)

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)




Re: [systemd-devel] fsckd needs to go

2015-04-08 Thread Reindl Harald


On 08.04.2015 at 03:02, Dimitri John Ledkov wrote:
> Is this about progress & control data or all things fsck?
>
> IMHO we do need to continue support ext4, and running fsck.ext4 when
> enforced, at least from initramfs, with progress output to the user
> and ability to cancel

nobody needs the ability to cancel a fsck because hardly anybody has any
insight whether doing so at that moment is horribly dangerous, and given that
fsck doesn't run for fun, why would you want to interrupt it and risk data loss?






Re: [systemd-devel] fsckd needs to go

2015-04-07 Thread Dimitri John Ledkov
On 3 April 2015 at 05:58, Lennart Poettering  wrote:
> Heya,
>
> so we discussed the whole fsckd situation a bit more here in Berlin,
> and we came to the conclusion that fsckd really should not exist the
> way it does in systemd.
>
> To start with, the code is really wrong, it should never have been
> merged in its current state, the read/write logic for the sockets is
> completely borked (I cannot even boot my own machine reliably with
> it!). And to my knowledge there has been no attempt to fix all of
> that, even though I asked for it. It also doesn't do at all what I
> suggested initially, as the flow of data is now fsck → systemd-fsck →
> systemd-fsckd → plymouth, and that's just crazy, that's two steps too
> many. systemd is supposed to be a few components playing well
> together, but certainly not a baroque network of components where data
> is passed through four hoops before it reaches the destination...
>
> Then, there's my general reservation with fsckd at all: file systems
> that still require offline fsck are certainly not the future, but we
> develop stuff for the future, and the idea to kill an fsck process
> while it is running is also very very questionable. There's a reason

Is this about progress & control data or all things fsck?

IMHO we do need to continue to support ext4, and running fsck.ext4 when
enforced, at least from initramfs, with progress output to the user
and ability to cancel. Or is even fsck.ext4 obsolete these days and
shouldn't be run automatically any more?

How this is implemented - e.g. inside systemd project or not, is not
relevant, but systemd seems to be a better place for this.

In the upstart world, this was completely offloaded to mountall, which
directly passed "special update" messages to plymouthd, which themes
could choose to parse and display / act upon. This however was an
Ubuntu-specific patch, I believe.

The current implementation/integration for systemd-fsck is also
heading to plymouth upstream for generic support there in themes, I
believe.

-- 
Regards,

Dimitri.

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.


Re: [systemd-devel] fsckd needs to go

2015-04-07 Thread Lennart Poettering
On Mon, 06.04.15 15:21, Martin Pitt (martin.p...@ubuntu.com) wrote:

> Hello all,

Heya,

> Lennart Poettering [2015-04-03 16:34 +0200]:
> > Well, I had a brief look at this patch, but it still doesn't get the
> > socket IO stuff right. It uses synchronous fgets() to read things off
> > the sockets, that's still not OK, and is a major thing that is
> > wrong.
> 
> fsckd kicks malicious/broken fsck clients which send garbage, but if
> you want to do the buffering explicitly I can rework the patch to do
> that. 

It's not about sending garbage. It's about blocking. fsckd is supposed
to be a daemon talking to multiple clients, and hence it may never
block. It's how daemons on UNIX work... Hence fgets() on client
sockets has *no* place in the fsckd sources.
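
To make the blocking problem concrete: a minimal Python sketch of the non-blocking pattern argued for above, with one buffer per client so that a half-sent line from one fsck never stalls the others. LineBuffer, serve(), on_line and max_events are names invented for this example (max_events exists only so a demo can terminate); this is not the fsckd code, just one common way to do it with the standard selectors module.

```python
import selectors
import socket


class LineBuffer:
    """Accumulate bytes for one client and hand back only complete lines."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        self._buf += data
        # Everything before the last newline is complete; the rest waits.
        *lines, self._buf = self._buf.split(b"\n")
        return [l.decode("utf-8", "replace") for l in lines]


def serve(srv, on_line, max_events=None):
    """Multiplex all clients on one selector; no read waits on a single client."""
    sel = selectors.DefaultSelector()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, None)
    handled = 0
    while max_events is None or handled < max_events:
        for key, _mask in sel.select():
            handled += 1
            if key.data is None:  # the listening socket: new client
                try:
                    conn, _addr = srv.accept()
                except BlockingIOError:
                    continue
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, LineBuffer())
                continue
            conn = key.fileobj
            try:
                data = conn.recv(4096)
            except BlockingIOError:
                continue
            if not data:  # client closed its end
                sel.unregister(conn)
                conn.close()
                continue
            for line in key.data.feed(data):
                on_line(conn, line)
    sel.close()
```

With fgets() the daemon sits inside the read until a newline arrives from that one client; here a partial line just stays in that client's LineBuffer while the loop goes back to waiting on all sockets at once.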

> That is, if we actually keep fsckd in the upstream sources :-)
> I wouldn't like to spend time on this if you already pre-decided to
> kick this out, but I would ask to reconsider, and instead discuss
> what's wrong with the code.

Yeah, we decided to remove this, sorry!

I can only recommend to fix the async socket IO thing even if you
decide to maintain fsckd outside of systemd. It's just broken!

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go

2015-04-07 Thread Lennart Poettering
On Mon, 06.04.15 15:12, Martin Pitt (martin.p...@ubuntu.com) wrote:

> Hello Lennart, all,
> 
> Lennart Poettering [2015-04-03 14:58 +0200]:
> > To start with, the code is really wrong, it should never have been
> > merged in its current state, the read/write logic for the sockets is
> > completely borked (I cannot even boot my own machine reliably with
> > it!).
> 
> This is surprising indeed. If that's not just the journald/logind/D-Bus
> corruption (which we still haven't tracked down properly), do you have
> a journal of a hung boot? We never saw a boot failure due to fsck so
> far, so I'm naturally very interested in seeing what's wrong.

Sorry, no logs here, I already removed the thing. Sorry.

> > And to my knowledge there has been no attempt to fix all of that,
> > even though I asked for it.
> 
> As far as I see, every point that came up during reviews, including
> your recent one about "don't route fsck output through systemd-fsck"
> got addressed (that latter patch hasn't been committed though, I
> thought you wanted to review it yourself).

Well, the async IO socket handling thing was not dealt with. The newest
patches still use fgets(). Using stdio for processing sockets is
generally not a good idea, since it's blocking. And since you want to
process multiple connections at the same time you don't want
blocking. This is really broken.

Currently, if one fsck sends half a line, then this causes your daemon
to hang forever... This is not acceptable in our sources, sorry.

> > It also doesn't do at all what I suggested initially, as the flow of
> > data is now fsck → systemd-fsck → systemd-fsckd → plymouth, and
> > that's just crazy, that's two steps too many.
> 
> With the above patch it's fsck -> systemd-fsckd → plymouth, and I
> don't see how to eliminate yet another step?

For example, by making ply listen directly on the socket, instead of
making this indirect via fsckd...

> > Then, there's my general reservation with fsckd at all: file systems
> > that still require offline fsck are certainly not the future, but we
> > develop stuff for the future
> 
> I do agree with the sentiment; let me assure you that we don't easily
> spend days on such stuff in vain, but it's because there are millions
> of existing installations out there which still do have ext4 and fsck.
> If systemd upstreams say "we don't care about existing products, only
> about a future with just btrfs" that's your prerogative of course, but
> distros need to have a more product-oriented focus :-/

This is only one reason of many. The killer issue really is the
safety issue. We shouldn't include code in systemd that makes
dangerous things like killing running fscks an easily accessible
operation, that has a graphical UI and requires no authentication.

> > I hope such a solution is acceptable?
> 
> The data flow is very similar to what we have now, so this mostly
> amounts to maintaining fsckd in the systemd sources vs. maintaining it
> separately in Debian/Ubuntu. I'd be interested in what
> RHEL/SUSE/Arch/etc. want to do.

We never had code for this in Fedora/RHEL, and that's not going to
change. The ability to have a graphical UI for killing fscks without
authentication was an Ubuntu thing, and I figure it's going to stay
one.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go

2015-04-06 Thread Martin Pitt
Hello all,

Lennart Poettering [2015-04-03 16:34 +0200]:
> Well, I had a brief look at this patch, but it still doesn't get the
> socket IO stuff right. It uses synchronous fgets() to read things off
> the sockets, that's still not OK, and is a major thing that is
> wrong.

fsckd kicks malicious/broken fsck clients which send garbage, but if
you want to do the buffering explicitly I can rework the patch to do
that. That is, if we actually keep fsckd in the upstream sources :-)
I wouldn't like to spend time on this if you already pre-decided to
kick this out, but I would ask to reconsider, and instead discuss
what's wrong with the code.

Thanks,

Martin
-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] fsckd needs to go

2015-04-06 Thread Martin Pitt
Hello Lennart, all,

Lennart Poettering [2015-04-03 14:58 +0200]:
> To start with, the code is really wrong, it should never have been
> merged in its current state, the read/write logic for the sockets is
> completely borked (I cannot even boot my own machine reliably with
> it!).

This is surprising indeed. If that's not just the journald/logind/D-Bus
corruption (which we still haven't tracked down properly), do you have
a journal of a hung boot? We never saw a boot failure due to fsck so
far, so I'm naturally very interested in seeing what's wrong.

> And to my knowledge there has been no attempt to fix all of that,
> even though I asked for it.

As far as I see, every point that came up during reviews, including
your recent one about "don't route fsck output through systemd-fsck"
got addressed (that latter patch hasn't been committed though, I
thought you wanted to review it yourself).

> It also doesn't do at all what I suggested initially, as the flow of
> data is now fsck → systemd-fsck → systemd-fsckd → plymouth, and
> that's just crazy, that's two steps too many.

With the above patch it's fsck -> systemd-fsckd → plymouth, and I
don't see how to eliminate yet another step?

> Then, there's my general reservation with fsckd at all: file systems
> that still require offline fsck are certainly not the future, but we
> develop stuff for the future

I do agree with the sentiment; let me assure you that we don't easily
spend days on such stuff in vain, but it's because there are millions
of existing installations out there which still do have ext4 and fsck.
If systemd upstreams say "we don't care about existing products, only
about a future with just btrfs" that's your prerogative of course, but
distros need to have a more product-oriented focus :-/

> systemd-fsckd would try to connect to some AF_UNIX/SOCK_STREAM socket
> in the fs, after forking and before execing fsck in the child, and
> pass the connected socket to fsck via the -C switch. If the socket is
> not connectable it would avoid any -C switch. With this simple change
> you can make this work for you: simply write a daemon (outside of
> systemd) that listens on that socket and reads the progress data from
> it. Using SO_PEERCRED you can query which fsck PID this is from and
> use it to kill it. You could even add this to ply natively if you
> wish, since it's kinda strange to bump this all off another daemon in
> the middle, unnecessarily.
> 
> Changing this would actually make it very close to my initial
> suggestion, except that we would not have the receiving side for the
> progress data in systemd, you'd have to maintain that externally (or
> in ply). 
> 
> I hope such a solution is acceptable?

The data flow is very similar to what we have now, so this mostly
amounts to maintaining fsckd in the systemd sources vs. maintaining it
separately in Debian/Ubuntu. I'd be interested in what
RHEL/SUSE/Arch/etc. want to do.

Martin

-- 
Martin Pitt| http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)


Re: [systemd-devel] fsckd needs to go

2015-04-03 Thread Lennart Poettering
On Fri, 03.04.15 15:17, Didier Roche (didro...@ubuntu.com) wrote:

> >To start with, the code is really wrong, it should never have been
> >merged in its current state, the read/write logic for the sockets is
> >completely borked (I cannot even boot my own machine reliably with
> >it!). And to my knowledge there has been no attempt to fix all of
> >that, even though I asked for it. It also doesn't do at all what I
> >suggested initially, as the flow of data is now fsck → systemd-fsck →
> >systemd-fsckd → plymouth, and that's just crazy, that's two steps too
> >many. systemd is supposed to be a few components playing well
> >together, but certainly not a baroque network of components where data
> >is passed through four hoops before it reaches the destination...
>
> I misunderstood first what you wanted in 2011, reading back from the mailing
> list. You would have noted that no comment (even on the first review that
> was made) raised those points in the multiple reviews that occurred, hence
> it was merged. It's weird that it doesn't even boot your own machine
> reliably, as we have the first implementation running on all vivid machines
> by default, and it seems from the bug reports, reliably.

It might have to do with the fact that our ply set up (with themes,
...) is different than Ubuntu's.

> However, I'm a bit surprised about the statement that no attempt has been
> made to fix it. I think you saw I have always been responsive, prioritizing
> your suggestions over other work to fix them. When you first publicly voiced
> your personal reservations about fsckd on the mailing list and I understood
> what flow you wanted[1], I posted fixes *the day after* (with some
> back-and-forth review) to address your comments.

I reviewed some of the initial patches, but please note that it was
merged by Pitti before I had a final look at it.

Well, the major problems (with the socket handling) are still completely
unfixed, and no patch has been posted afaics that fixes them.

I am aware that we didn't communicate this all properly, but Kay,
Daniel, David and I only sat down the day before yesterday to come to
a conclusion about all of this.

> All of them were merged by other systemd hackers and some even by ourselves,
> but the biggest one, which directly addressed and implemented the flow of
> data you explicitly asked for is still waiting:
> http://lists.freedesktop.org/archives/systemd-devel/2015-March/029309.html
> (Note that this was proposed less than 48 hours after your complaint about
> the data flow). Knowing that you were on holidays, I didn't push others too
> much, but Martin and I pinged you on IRC about it when you were back. Am I
> missing anything?

Well, I had a brief look at this patch, but it still doesn't get the
socket IO stuff right. It uses synchronous fgets() to read things off
the sockets, that's still not OK, and is a major thing that is
wrong. By looking at the patch I am pretty sure this all will lock up
if you have multiple fscks, to the point where you cause all fscks but
the first one to stop until the first one is finished, and so
on... (It does remove the extra bumping off systemd-fsckd though,
that's good!)
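
For illustration, one way to avoid that kind of lock-up is to put every
progress socket into non-blocking mode and multiplex them in a single
event loop, instead of reading each one with a synchronous fgets(). The
following is only a minimal sketch in Python, not the code from the
patch: the class name ProgressMux and its methods are made up for this
example, and progress lines are treated as opaque text.

```python
import selectors
import socket


class ProgressMux:
    """Collect complete lines from many sockets without blocking on any of them."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.buffers = {}  # name -> bytes received so far without a newline

    def add(self, name, conn):
        conn.setblocking(False)
        self.buffers[name] = b""
        self.sel.register(conn, selectors.EVENT_READ, data=name)

    def poll(self, timeout=None):
        """Return (name, line) pairs for every line completed since the last call."""
        lines = []
        for key, _events in self.sel.select(timeout):
            conn, name = key.fileobj, key.data
            try:
                chunk = conn.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup, nothing to read yet
            if not chunk:
                # Peer closed the connection: forget about this socket.
                self.sel.unregister(conn)
                conn.close()
                del self.buffers[name]
                continue
            # Keep the trailing partial line buffered until its newline arrives.
            *complete, rest = (self.buffers[name] + chunk).split(b"\n")
            self.buffers[name] = rest
            lines.extend((name, c.decode(errors="replace")) for c in complete)
        return lines
```

With this shape, a sender that has written only half a line leaves its
data in the buffer, and lines from the other senders are still returned
from the same poll() call.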

> >Then, there's my general reservation with fsckd at all: file systems
> >that still require offline fsck are certainly not the future, but we
> >develop stuff for the future, and the idea to kill an fsck process
> >while it is running is also very very questionable. There's a reason
> >why such functionality never existed on Fedora or RHEL: it's risky. I
> >mean, it's all good allowing people to shoot themselves in the foot,
> >but there's really *no* point in making that easy and giving it a
> >fancy UI with support in the graphical boot splash. Shooting yourself
> >in the foot should be possible, but not *easily*! And certainly not be
> >allowed without prior authentication like you are doing it right now
> >with the plymouth support.
>
> I can understand those points, I'm just a little disappointed that this
> wasn't stated months ago, when we started to work on it and before the whole
> refactoring…

Yes, sorry for that. We should have sat down earlier, and come to a
conclusion about this. 

Sorry for the unclear message we were sending!

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] fsckd needs to go

2015-04-03 Thread Didier Roche

On 03/04/2015 14:58, Lennart Poettering wrote:
> Heya,

Hey Lennart,

> so we discussed the whole fsckd situation a bit more here in Berlin,
> and we came to the conclusion that fsckd really should not exist the
> way it does in systemd.
>
> To start with, the code is really wrong, it should never have been
> merged in its current state, the read/write logic for the sockets is
> completely borked (I cannot even boot my own machine reliably with
> it!). And to my knowledge there has been no attempt to fix all of
> that, even though I asked for it. It also doesn't do at all what I
> suggested initially, as the flow of data is now fsck → systemd-fsck →
> systemd-fsckd → plymouth, and that's just crazy, that's two steps too
> many. systemd is supposed to be a few components playing well
> together, but certainly not a baroque network of components where data
> is passed through four hoops before it reaches the destination...

I misunderstood first what you wanted in 2011, reading back from the
mailing list. You would have noted that no comment (even on the first
review that was made) raised those points in the multiple reviews that
occurred, hence it was merged. It's weird that it doesn't even boot your
own machine reliably, as we have the first implementation running on all
vivid machines by default, and it seems from the bug reports, reliably.

However, I'm a bit surprised about the statement that no attempt has
been made to fix it. I think you saw I have always been responsive,
prioritizing your suggestions over other work to fix them. When you
first publicly voiced your personal reservations about fsckd on the
mailing list and I understood what flow you wanted[1], I posted fixes
*the day after* (with some back-and-forth review) to address your
comments.

All of them were merged by other systemd hackers and some even by
ourselves, but the biggest one, which directly addressed and implemented
the flow of data you explicitly asked for, is still waiting:
http://lists.freedesktop.org/archives/systemd-devel/2015-March/029309.html
(Note that this was proposed less than 48 hours after your complaint
about the data flow.) Knowing that you were on holidays, I didn't push
others too much, but Martin and I pinged you on IRC about it when you
were back. Am I missing anything?

> Then, there's my general reservation with fsckd at all: file systems
> that still require offline fsck are certainly not the future, but we
> develop stuff for the future, and the idea to kill an fsck process
> while it is running is also very very questionable. There's a reason
> why such functionality never existed on Fedora or RHEL: it's risky. I
> mean, it's all good allowing people to shoot themselves in the foot,
> but there's really *no* point in making that easy and giving it a
> fancy UI with support in the graphical boot splash. Shooting yourself
> in the foot should be possible, but not *easily*! And certainly not be
> allowed without prior authentication like you are doing it right now
> with the plymouth support.

I can understand those points, I'm just a little disappointed that this
wasn't stated months ago, when we started to work on it and before the
whole refactoring…

> Thus, we decided to remove fsckd again entirely from systemd. However,
> if Ubuntu really wants to implement this anyway (I strongly
> believe that this is an absolute misfeature!), then I'd be willing to
> add the following for you:
>
> systemd-fsckd would try to connect to some AF_UNIX/SOCK_STREAM socket
> in the fs, after forking and before execing fsck in the child, and
> pass the connected socket to fsck via the -C switch. If the socket is
> not connectable it would avoid any -C switch. With this simple change
> you can make this work for you: simply write a daemon (outside of
> systemd) that listens on that socket and reads the progress data from
> it. Using SO_PEERCRED you can query which fsck PID this is from and
> use it to kill it. You could even add this to ply natively if you
> wish, since it's kinda strange to bump this all off another daemon in
> the middle, unnecessarily.
>
> Changing this would actually make it very close to my initial
> suggestion, except that we would not have the receiving side for the
> progress data in systemd, you'd have to maintain that externally (or
> in ply).

I'm not sure about changing it again when we are so close to the vivid
final release. We did implement all your suggestions and fixed it to
match those. I'm feeling a little bit uneasy about how all this turned
out, given all the goodwill we put into getting it contributed
upstream, but if that's the fate of it…


Didier


[1] http://lists.freedesktop.org/archives/systemd-devel/2015-March/029186.html



[systemd-devel] fsckd needs to go

2015-04-03 Thread Lennart Poettering
Heya,

so we discussed the whole fsckd situation a bit more here in Berlin,
and we came to the conclusion that fsckd really should not exist the
way it does in systemd.

To start with, the code is really wrong, it should never have been
merged in its current state, the read/write logic for the sockets is
completely borked (I cannot even boot my own machine reliably with
it!). And to my knowledge there has been no attempt to fix all of
that, even though I asked for it. It also doesn't do at all what I
suggested initially, as the flow of data is now fsck → systemd-fsck →
systemd-fsckd → plymouth, and that's just crazy, that's two steps too
many. systemd is supposed to be a few components playing well
together, but certainly not a baroque network of components where data
is passed through four hoops before it reaches the destination...

Then, there's my general reservation with fsckd at all: file systems
that still require offline fsck are certainly not the future, but we
develop stuff for the future, and the idea to kill an fsck process
while it is running is also very very questionable. There's a reason
why such functionality never existed on Fedora or RHEL: it's risky. I
mean, it's all good allowing people to shoot themselves in the foot,
but there's really *no* point in making that easy and giving it a
fancy UI with support in the graphical boot splash. Shooting yourself
in the foot should be possible, but not *easily*! And certainly not be
allowed without prior authentication like you are doing it right now
with the plymouth support.

Thus, we decided to remove fsckd again entirely from systemd. However,
if Ubuntu really wants to implement this anyway (I strongly
believe that this is an absolute misfeature!), then I'd be willing to
add the following for you:

systemd-fsckd would try to connect to some AF_UNIX/SOCK_STREAM socket
in the fs, after forking and before execing fsck in the child, and
pass the connected socket to fsck via the -C switch. If the socket is
not connectable it would avoid any -C switch. With this simple change
you can make this work for you: simply write a daemon (outside of
systemd) that listens on that socket and reads the progress data from
it. Using SO_PEERCRED you can query which fsck PID this is from and
use it to kill it. You could even add this to ply natively if you
wish, since it's kinda strange to bump this all off another daemon in
the middle, unnecessarily.
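
To make the sending side of this concrete, here is a rough sketch in
Python rather than the C that systemd would actually use. The function
name fsck_command, the bare "fsck <device>" command line and the
"-C<fd>" spelling are assumptions made for the example; it only shows
the rule "connect first, and add -C only if the connect succeeded".

```python
import os
import socket


def fsck_command(device, progress_path):
    """Return (argv, sock); sock is None when nothing listens on progress_path."""
    argv = ["fsck", device]
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(progress_path)
    except OSError:
        # Socket missing or nobody listening: run fsck without any -C switch.
        sock.close()
        return argv, None
    # The descriptor must survive exec so that fsck can write progress to it.
    sock.set_inheritable(True)
    argv.insert(1, f"-C{sock.fileno()}")
    return argv, sock
```

In the forked child one would call fsck_command() and then
os.execvp(argv[0], argv), so that the process which connected is the
same one that becomes fsck.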

Changing this would actually make it very close to my initial
suggestion, except that we would not have the receiving side for the
progress data in systemd, you'd have to maintain that externally (or
in ply). 
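
As a sketch of what such an external receiving side could look like
(Linux only, in Python for brevity), the helpers below create the
listening socket and use SO_PEERCRED to find the PID behind a
connection. The function names are invented for this example, and
SIGTERM is just one possible choice of signal.

```python
import os
import signal
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid
UCRED = struct.Struct("iII")


def listen(path):
    """Create the AF_UNIX/SOCK_STREAM socket the fsck side would connect to."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen()
    return srv


def peer_pid(conn):
    """PID the kernel recorded for the process that connected the other end."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    pid, _uid, _gid = UCRED.unpack(raw)
    return pid


def cancel(conn):
    """Ask the process behind this connection to stop."""
    os.kill(peer_pid(conn), signal.SIGTERM)
```

The credentials are captured when the peer calls connect(), which is
why connecting after the fork and before the exec matters: the PID seen
here is then the PID that fsck runs under.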

I hope such a solution is acceptable?

Lennart

-- 
Lennart Poettering, Red Hat