Re: live kernel upgrades (was: live kernel patching design)

2015-03-04 Thread Ingo Molnar

* Jiri Slaby  wrote:

> On 02/24/2015, 10:16 AM, Ingo Molnar wrote:
> >
> > and we don't design the Linux kernel for weird, extreme cases, we 
> > design for the common, sane case that has the broadest appeal, and 
> > we hope that the feature garners enough interest to be 
> > maintainable.
> 
> Hello,
> 
> oh, so why do we have NR_CPUS up to 8192, then? [...]

Because:

 - More CPUs is not some weird dead end, but a natural direction of
   hardware development.

 - Furthermore, we've gained a lot of scalability and other 
   improvements all around the kernel just by virtue of big iron 
   running into those problems first.

 - In the typical case there's no friction between 8192 CPUs and the 
   kernel's design. Where there was friction (and it happened), we 
   pushed back.

Such benefits add up and 8K CPUs support is a success story today.

That positive, symbiotic, multi-discipline relationship between 8K 
CPUs support design goals and 'regular Linux' design goals stands in 
stark contrast with the single-issue approach through which live 
kernel patching is designing itself into a dead end so early on ...

Thanks,

Ingo


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:

> > Your upgrade proposal is an *enormous* disruption to the 
> > system:
> > 
> > - a latency of "well below 10" seconds is completely
> >   unacceptable to most users who want to patch the kernel 
> >   of a production system _while_ it's in production.
> 
> I think this statement is false for the following reasons.

The statement is very true.

>   - I'd say the majority of system operators of production 
> systems can live with a couple of seconds of delay at a 
> well defined moment of the day or week - with gradual, 
> pretty much open ended improvements in that latency 
> down the line.

In the usual corporate setting, any noticeable outage, even out of
business hours, requires advance notice and the agreement of all
stakeholders - the teams that depend on the system.

If a live patching technology introduces an outage, it's not "live", and
for these bureaucratic reasons it will not be used; a regular reboot
will be scheduled instead.

>   - I think your argument ignores the fact that live 
> upgrades would extend the scope of 'users willing to 
> patch the kernel of a production system' _enormously_. 
> 
> For example, I have a production system with this much 
> uptime:
> 
>10:50:09 up 153 days,  3:58, 34 users,  load average: 0.00, 0.02, 0.05
> 
> Currently I'm reluctant to reboot the system to 
> upgrade the kernel (due to a reboot's intrusiveness), 
> which is why it has achieved a relatively high 
> uptime, but I'd definitely allow the kernel to upgrade 
> at 0:00am just fine. (I'd even give it up to a few 
> minutes, as long as TCP connections don't time out.)
> 
> And I don't think my usecase is special.

I agree that this is useful. But it is a different problem that only
partially overlaps with what we're trying to achieve with live patching.

If you can make full kernel upgrades work this way, which I doubt is
achievable in the next 10 years due to all the research and
infrastructure needed, then you certainly gain an additional group of
users. And a great tool. A large portion of those who ask for live
patching won't use it, though.

But honestly, I prefer a solution that works for small patches now over
a solution for unlimited patches sometime in the next decade.

> What gradual improvements in live upgrade latency am I 
> talking about?
> 
>  - For example the majority of pure user-space process 
>pages in RAM could be saved from the old kernel over 
>into the new kernel - i.e. they'd stay in place in RAM, 
>but they'd be re-hashed for the new data structures. 
>This avoids a big chunk of checkpointing overhead.

I'd have hoped this would be a given. If you can't preserve memory
contents and have to re-load from disk, you may just as well reboot
entirely; the time needed will not be much more.

>  - Likewise, most of the page cache could be saved from an
>old kernel to a new kernel as well - further reducing
>checkpointing overhead.
> 
>  - The PROT_NONE mechanism of the current NUMA balancing
>code could be used to transparently mark user-space 
>pages as 'checkpointed'. This would reduce system 
>interruption as only 'newly modified' pages would have 
>to be checkpointed when the upgrade happens.
> 
>  - Hardware devices could be marked as 'already in well
>defined state', skipping the more expensive steps of 
>driver initialization.
> 
>  - Possibly full user-space page tables could be preserved 
>over an upgrade: this way user-space execution would be 
>unaffected even in the micro level: cache layout, TLB
>patterns, etc.
> 
> There's lots of gradual speedups possible with such a model 
> IMO.

Yes, as I say above, guaranteeing decades of employment. ;)

> With live kernel patching we run into a brick wall of 
> complexity straight away: we have to analyze the nature of 
> the kernel modification, in the context of live patching, 
> and that only works for the simplest of kernel 
> modifications.

But you're able to _use_ it.

> With live kernel upgrades no such brick wall exists, just 
> about any transition between kernel versions is possible.

The brick wall you run into is "I need to implement full kernel state
serialization before I can do anything at all." It isn't even clear
_how_ to do that. Particularly with the Linux kernel's development
model, where internal ABIs and structures are always in flux, it may
not even be realistic.

> Granted, with live kernel upgrades it's much more complex 
> to get the 'simple' case into an even rudimentarily working 
> fashion (full userspace state has to be enumerated, saved 
> and restored), but once we are there, it's a whole new 
> category of goodness and it probably covers 90%+ of the 
> live kernel patching usecases on day 1 already ...

Feel free to start working on it. I'll stick with live patching.

-- 
Vojtech 

Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Jiri Slaby
On 02/24/2015, 10:16 AM, Ingo Molnar wrote:
> and we don't design the Linux kernel for weird, extreme 
> cases, we design for the common, sane case that has the 
> broadest appeal, and we hope that the feature garners 
> enough interest to be maintainable.

Hello,

oh, so why do we have NR_CPUS up to 8192, then? I haven't met a machine
with more than 16 cores yet. You have. But you haven't met a guy
thankful for live patching being so easy to implement, yet fast. I
have.

What some call extreme, others accept as standard. That is, I believe,
why you signed off on support for up to 8192 CPUs.

We develop Linux to be scalable, i.e. usable in *whatever* scenario you
can imagine. Be it large/small machines, lowmem/highmem,
NUMA/UMA, whatever. If you don't like something, you are free to disable
it. Democracy.

> This is not a problem in general: the weird case can take 
> care of itself just fine - 'specialized and weird' usually 
> means there's enough money to throw at special hardware and 
> human solutions or it goes extinct quickly ...

Live patching is not a random idea that is about to die. It is the result
of months of negotiations with customers and management, discussions
between developers, establishing teams and really thinking the idea
through. The decisions were also discussed at many conferences. I am
trying to shed some light on why we are not trying to improve criu or any
other already existing project. We studied papers, code, implementations,
kSplice and the like, and decided to lean toward what we have
implemented, presented and merged.

thanks,
-- 
js
suse labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 11:53:28AM +0100, Ingo Molnar wrote:
> 
> * Jiri Kosina  wrote:
> 
> > [...] We could optimize the kernel the craziest way we 
> > can, but hardware takes its time to reinitialize. And in 
> > most cases, you'd really need to reinitalize it; [...]
> 
> If we want to reinitialize a device, most of the longer 
> initialization latencies during bootup these days involve 
> things like: 'poke hardware, see if there's any response'. 
> Those are mostly going away quickly with modern, 
> well-enumerated hardware interfaces.
> 
> Just try a modprobe of a random hardware driver - most 
> initialization sequences are very fast. (That's how people 
> are able to do cold bootups in less than 1 second.)

Have you ever tried to boot a system with a large (> 100) number of
drives connected over FC? That takes time to discover and you have to do
the discovery as the configuration could have changed while you were not
looking.

Or a machine with terabytes of memory? Just initializing the memory
takes minutes.

Or a desktop with USB? And you have to reinitialize the USB bus and the
state of all the USB devices, because an application might be accessing
files on a USB drive.

> In theory this could also be optimized: we could avoid the 
> reinitialization step through an upgrade via relatively 
> simple means, for example if drivers define their own 
> version and the new kernel's driver checks whether the 
> previous state is from a compatible driver. Then the new 
> driver could do a shorter initialization sequence.

There you're clearly getting into the "so complex to maintain that it'll
never work reliably" territory.

> But I'd only do it only in special cases, where for some 
> reason the initialization sequence takes longer time and it 
> makes sense to share hardware discovery information between 
> two versions of the driver. I'm not convinced such a 
> mechanism is necessary in the general case.

-- 
Vojtech Pavlik
Director SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 10:44:05AM +0100, Ingo Molnar wrote:

> > This is the most common argument that's raised when live 
> > patching is discussed. "Why do we need live patching when we 
> > have redundancy?"
> 
> My argument is that if we start off with a latency of 10 
> seconds and improve that gradually, it will be good for 
> everyone with a clear, actionable route for even those who 
> cannot take a 10 seconds delay today.

Sure, we can do it that way. 

Or do it in the other direction.

Today we have a tool (livepatch) in the kernel that can apply trivial
single-function fixes without a measurable disruption to applications.
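
For reference, a minimal livepatch module has roughly the following
shape (loosely modeled on the kernel's livepatch sample; the exact
registration details have shifted between kernel versions, so treat
this as a sketch rather than something to copy verbatim):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include <linux/livepatch.h>

/* replacement for the original cmdline_proc_show() */
static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
        seq_printf(m, "%s\n", "this has been live patched");
        return 0;
}

static struct klp_func funcs[] = {
        {
                .old_name = "cmdline_proc_show",
                .new_func = livepatch_cmdline_proc_show,
        }, { }
};

static struct klp_object objs[] = {
        {
                /* a NULL name means the patched function lives in vmlinux */
                .funcs = funcs,
        }, { }
};

static struct klp_patch patch = {
        .mod = THIS_MODULE,
        .objs = objs,
};

static int livepatch_init(void)
{
        return klp_enable_patch(&patch);
}

static void livepatch_exit(void)
{
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");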

And we can improve it gradually to expand the range of fixes it can
apply.

Dependent functions can be done by kGraft's lazy migration.

Limited data structure changes can be handled by shadowing.
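
To illustrate the idea (a toy sketch of the concept only, not kGraft's
or the kernel's actual shadow-variable interface): when a fix needs a
new field on an existing object whose in-memory layout cannot change,
the new field can live in a side table keyed by the object's address:

#include <stdio.h>
#include <stdlib.h>

struct widget {                 /* original, unchangeable layout */
        int id;
};

struct shadow_entry {           /* side storage attached to a live widget */
        const struct widget *obj;
        int new_field;          /* the field the fix wants to add */
        struct shadow_entry *next;
};

#define NBUCKETS 64
static struct shadow_entry *shadow_tab[NBUCKETS];

static unsigned bucket(const struct widget *w)
{
        return (unsigned)(((unsigned long)w >> 4) % NBUCKETS);
}

static int *shadow_get_or_create(const struct widget *w)
{
        struct shadow_entry *e;

        for (e = shadow_tab[bucket(w)]; e; e = e->next)
                if (e->obj == w)
                        return &e->new_field;

        e = calloc(1, sizeof(*e));
        if (!e)
                abort();
        e->obj = w;
        e->next = shadow_tab[bucket(w)];
        shadow_tab[bucket(w)] = e;
        return &e->new_field;
}

/* the "patched" function uses the shadow field as if it were part of widget */
static void patched_touch(struct widget *w)
{
        (*shadow_get_or_create(w))++;
}

int main(void)
{
        struct widget w = { .id = 1 };

        patched_touch(&w);
        patched_touch(&w);
        printf("widget %d, shadow new_field = %d\n",
               w.id, *shadow_get_or_create(&w));
        return 0;
}

(No locking here; real shadow storage obviously has to be safe against
concurrent access and has to be freed when the object dies.)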

Major data structure and/or locking changes require stopping the kernel,
and trapping all tasks at the kernel/userspace boundary is clearly the
cleanest way to do that. It comes at a steep latency cost, though.

Full code replacement without change scope consideration requires full
serialization and deserialization of hardware and userspace
interface state, which is something we don't have today and would
require work on every single driver. Possible, but probably a decade of
effort.

With this approach you have something useful at every point, and every
piece of effort put in gives you a reward.

> Lets see the use cases:
> 
> > [...] Examples would be legacy applications which can't 
> > run in an active-active cluster and need to be restarted 
> > on failover.
> 
> Most clusters (say web frontends) can take a stoppage of a 
> couple of seconds.

It's easy to find examples of workloads that can be stopped. It doesn't
rule out a significant set of those where stopping them is very
expensive.

> > Another usecase is large HPC clusters, where all nodes 
> > have to run carefully synchronized. Once one gets behind 
> > in a calculation cycle, others have to wait for the 
> > results and the efficiency of the whole cluster goes 
> > down. [...]
> 
> I think calculation nodes on large HPC clusters qualify as 
> the specialized case that I mentioned, where the update 
> latency could be brought down into the 1 second range.
> 
> But I don't think calculation nodes are patched in the 
> typical case: you might want to patch Internet facing 
> frontend systems, the rest is left as undisturbed as 
> possible. So I'm not even sure this is a typical usecase.

They're not patched for security bugs, but stability bugs are an
important issue for multi-month calculations.

> In any case, there's no hard limit on how fast such a 
> kernel upgrade can get in principle, and the folks who care 
> about that latency will sure help out optimizing it and 
> many HPC projects are well funded.

So far, unless you come up with an effective solution, if you're
catching all tasks at the kernel/userspace boundary (the "Kragle"
approach), the service interruption is effectively unbounded, due to
tasks in D state.

> > The value of live patching is in near zero disruption.
> 
> Latency is a good attribute of a kernel upgrade mechanism, 
> but it's by far not the only attribute and we should 
> definitely not design limitations into the approach and 
> hurt all the other attributes, just to optimize that single 
> attribute.

It's an attribute I'm not willing to give up. On the other hand, I
definitely wouldn't argue against having modes of operation where the
latency is higher and the tool is more powerful.

> I.e. don't make it a single-issue project.

There is no need to worry about that. 

-- 
Vojtech Pavlik
Director SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Jiri Slaby
On 02/22/2015, 10:46 AM, Ingo Molnar wrote:
> Arbitrary live kernel upgrades could be achieved by 
> starting with the 'simple method' I outlined in earlier 
> mails, using some of the methods that kpatch and kGraft are 
> both utilizing or planning to utilize:
> 
>   - implement user task and kthread parking to get the 
> kernel into quiescent state.
> 
>   - implement (optional, thus ABI-compatible) 
> system call interruptability and restartability 
> support.
> 
>   - implement task state and (limited) device state
> snapshotting support
> 
>   - implement live kernel upgrades by:
> 
>   - snapshotting all system state transparently
> 
>   - fast-rebooting into the new kernel image without 
> shutting down and rebooting user-space, i.e. _much_ 
> faster than a regular reboot.
> 
>   - restoring system state transparently within the new 
> kernel image and resuming system workloads where 
> they were left.
> 
> Even complex external state like TCP socket state and 
> graphics state can be preserved over an upgrade. As far as 
> the user is concerned, nothing happened but a brief pause - 
> and he's now running a v3.21 kernel, not v3.20.
> 
> Obviously one of the simplest utilizations of live kernel 
> upgrades would be to apply simple security fixes to 
> production systems. But that's just a very simple 
> application of a much broader capability.
> 
> Note that if done right, then the time to perform a live 
> kernel upgrade on a typical system could be brought to well 
> below 10 seconds system stoppage time: adequate to the vast 
> majority of installations.
> 
> For special installations or well optimized hardware the 
> latency could possibly be brought below 1 second stoppage 
> time.

Hello,

IMNSHO, you cannot.

The criu-based approach you have just described is already alive as an
external project in Parallels. It is of course a perfect solution for
some use cases. But its use case is a distinctive one. It is not our
competitor, it complements our work. I will try to explain why.

It is highly dependent on HW. Kexec (or any other arbitrary
kernel-exchange mechanism) is not supported by all hardware, nor by all
drivers. For some devices there is not even a way to implement
snapshotting, which is obviously a real issue.

Downtime is highly dependent on the scenario. If you have plenty of
dirty memory, you have to flush it first. This might take minutes,
especially when using a network FS. Or you skip the flush, but then a
failure to replace the kernel is lethal. If you have a heap of open FDs,
restore time will take ages. You cannot cheat on any of this: it's pure
I/O. You cannot estimate the downtime, and that is a real downside.

Even if you can get the criu time under one second, this is still
unacceptable for live patching. Live patching has to be three orders of
magnitude faster than that, otherwise it makes no sense. If you can
afford a second, you probably already have a large enough window, or
good enough failure handling, to perform a full and above all safer
reboot/kexec anyway.

You cannot restore everything.
* TCP is one of the real beasts here. And there are indeed plenty of
theoretical papers behind this, explaining what can and cannot be done.
* NFS is another one.
* Xorg. Today, we cannot even smoothly switch between discrete and
integrated GFX chips. No go.
* There are indeed situations where NP-hard problems need to be solved
upon restoration. No way, if you want the restore to finish within this
century.

While you cannot live-patch everything using KLP, that limitation is
patch-dependent. Failure of restoration is condition-dependent, and the
condition is really fuzzy. That is a huge difference.

Although you present the criu-based approach as provably safe and
correct, in many cases it is not, and by definition cannot be.

That said, we are not going to start moving that way, apart from adopting
the many good points which emerged during the discussion (fake signals,
to pick one).

> This 'live kernel upgrades' approach would have various 
> advantages:
> 
>   - it brings together various principles working towards 
> shared goals:
> 
>   - the boot time reduction folks
>   - the checkpoint/restore folks
>   - the hibernation folks
>   - the suspend/resume and power management folks
>   - the live patching folks (you)
>   - the syscall latency reduction folks
> 
> if so many disciplines are working together then maybe 
> something really good and long term maintainable can 
> crystallize out of that effort.

I must admit, whenever I implemented something in the kernel, nobody did
any work for me. So the above will only result in the live patching teams
doing all the work. I am not saying we do not want to do the work. I am
only pointing out that there is no such thing as "working together with
other teams" (unless we are the ones paying their bills).

>   - it ignores the security theater that treats security
> fixes as a separate, disproportionally more important
> class of fixes and instead allows arbitrary complex

Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Petr Mladek
On Tue 2015-02-24 11:23:29, Ingo Molnar wrote:
> What gradual improvements in live upgrade latency am I 
> talking about?
> 
>  - For example the majority of pure user-space process 
>pages in RAM could be saved from the old kernel over 
>into the new kernel - i.e. they'd stay in place in RAM, 
>but they'd be re-hashed for the new data structures. 

I wonder how many structures we would need to rehash when we update
the whole kernel. I think that it is not only about memory but also
about any other subsystem: networking, scheduler, ...


>  - Hardware devices could be marked as 'already in well
>defined state', skipping the more expensive steps of 
>driver initialization.

This is another point that might easily go wrong. We know that
the quality of many drivers is not good. Yes, we want to make it
better. But we also know that system suspend has not worked well
on many systems for years, even with huge effort.


>  - Possibly full user-space page tables could be preserved 
>over an upgrade: this way user-space execution would be 
>unaffected even in the micro level: cache layout, TLB
>patterns, etc.
> 
> There's lots of gradual speedups possible with such a model 
> IMO.
> 
> With live kernel patching we run into a brick wall of 
> complexity straight away: we have to analyze the nature of 
> the kernel modification, in the context of live patching, 
> and that only works for the simplest of kernel 
> modifications.
> 
> With live kernel upgrades no such brick wall exists, just 
> about any transition between kernel versions is possible.

I see a big difference in complexity here. If verifying patches
is considered complex, then I think that it is much, much more
complicated to verify that a whole kernel upgrade is safe and
that all states will be properly preserved and reused.

In any case, I think that live patching won't be for just any Joe User.
The people producing patches will need to investigate the
changes anyway. They will not blindly take a patch off the internet
and convert it to a live patch. I think that this is true for
many other kernel features.


> Granted, with live kernel upgrades it's much more complex 
> to get the 'simple' case into an even rudimentarily working 
> fashion (full userspace state has to be enumerated, saved 
> and restored), but once we are there, it's a whole new 
> category of goodness and it probably covers 90%+ of the 
> live kernel patching usecases on day 1 already ...

I like the idea and I see the benefit for other tasks: system suspend,
migration of systems to other hardware, ... But I also think that it
is another level of functionality.

IMHO, live patching is somewhere on the way towards the full kernel
update and it will help there as well. For example, we will need to
somehow solve the transition of kthreads and thus fix their parking.

I think that live patching deserves its own separate solution. I consider
it much less risky but still valuable. I am sure that it will have
its users. Also, it will not block improving things for the full
update in the future.


Best Regards,
Petr


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Jiri Kosina  wrote:

> [...] We could optimize the kernel the craziest way we 
> can, but hardware takes its time to reinitialize. And in 
> most cases, you'd really need to reinitalize it; [...]

If we want to reinitialize a device, most of the longer 
initialization latencies during bootup these days involve 
things like: 'poke hardware, see if there's any response'. 
Those are mostly going away quickly with modern, 
well-enumerated hardware interfaces.

Just try a modprobe of a random hardware driver - most 
initialization sequences are very fast. (That's how people 
are able to do cold bootups in less than 1 second.)

In theory this could also be optimized: we could avoid the 
reinitialization step through an upgrade via relatively 
simple means, for example if drivers define their own 
version and the new kernel's driver checks whether the 
previous state is from a compatible driver. Then the new 
driver could do a shorter initialization sequence.
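
A sketch of what such a handoff could look like - every name below is
invented purely for illustration, nothing like this exists in the
kernel today:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FOO_STATE_MAGIC   0x464f4f31u   /* "FOO1" */
#define FOO_STATE_VERSION 3             /* bumped on incompatible layout changes */

struct foo_saved_state {
        uint32_t magic;
        uint32_t version;
        uint32_t negotiated_link_speed;  /* expensive-to-rediscover state */
        uint8_t  firmware_loaded;
};

/* old driver fills this in right before the upgrade kexecs the new kernel */
void foo_save_state(struct foo_saved_state *s, uint32_t speed, bool fw)
{
        s->magic = FOO_STATE_MAGIC;
        s->version = FOO_STATE_VERSION;
        s->negotiated_link_speed = speed;
        s->firmware_loaded = fw;
}

/* new driver reuses the handed-over state only if the version matches */
bool foo_try_fast_init(const struct foo_saved_state *s)
{
        if (s->magic != FOO_STATE_MAGIC || s->version != FOO_STATE_VERSION)
                return false;   /* fall back to full probe + firmware load */
        /* skip link training and firmware upload, trust the saved values */
        return s->firmware_loaded;
}

int main(void)
{
        struct foo_saved_state s;

        foo_save_state(&s, 8000 /* Mbit/s */, true);
        printf("fast init possible: %s\n", foo_try_fast_init(&s) ? "yes" : "no");
        return 0;
}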

But I'd do it only in special cases, where for some 
reason the initialization sequence takes a longer time and it 
makes sense to share hardware discovery information between 
two versions of the driver. I'm not convinced such a 
mechanism is necessary in the general case.

Thanks,

Ingo


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Pavel Machek  wrote:

> > More importantly, both kGraft and kpatch are pretty limited 
> > in what kinds of updates they allow, and neither kGraft nor 
> > kpatch has any clear path towards applying more complex 
> > fixes to kernel images that I can see: kGraft can only 
> > apply the simplest of fixes where both versions of a 
> > function are interchangeable, and kpatch is only marginally 
> > better at that - and that's pretty fundamental to both 
> > projects!
> > 
> > I think all of these problems could be resolved by shooting 
> > for the moon instead:
> > 
> >   - work towards allowing arbitrary live kernel upgrades!
> > 
> > not just 'live kernel patches'.
> 
> Note that live kernel upgrade would have interesting 
> implications outside kernel:
> 
> 1) glibc does "what kernel version is this?", caches the 
> result and alters behaviour accordingly.

That should be OK, as a new kernel will be ABI compatible 
with an old kernel.

A later optimization could update the glibc cache on an 
upgrade, fortunately both projects are open source.

> 2) apps will do recently_introduced_syscall(), get error 
> and not attempt it again.

That should be fine too.
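
The pattern Pavel describes is the usual ENOSYS-caching fallback,
roughly like this sketch (getrandom() is picked here only as a familiar
example of a recently introduced syscall):

#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static int getrandom_works = -1;        /* -1 = unknown, 0 = no, 1 = yes */

static int get_random_bytes(void *buf, size_t len)
{
        if (getrandom_works != 0) {
                long r = syscall(SYS_getrandom, buf, len, 0);

                if (r >= 0) {
                        getrandom_works = 1;
                        return (size_t)r == len ? 0 : -1;
                }
                if (errno != ENOSYS)
                        return -1;
                getrandom_works = 0;    /* cached: never retried, even if a
                                           live upgrade later adds the syscall */
        }
        /* fallback for old kernels: read /dev/urandom instead */
        FILE *f = fopen("/dev/urandom", "r");
        if (!f)
                return -1;
        size_t n = fread(buf, 1, len, f);
        fclose(f);
        return n == len ? 0 : -1;
}

int main(void)
{
        unsigned char buf[16];

        if (get_random_bytes(buf, sizeof(buf)) == 0)
                printf("got %zu random bytes\n", sizeof(buf));
        return 0;
}

As long as the upgraded kernel stays ABI compatible this keeps working;
it only means an already-running process won't notice newly added
syscalls until it restarts.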

Thanks,

Ingo


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> Your upgrade proposal is an *enormous* disruption to the 
> system:
> 
> - a latency of "well below 10" seconds is completely
>   unacceptable to most users who want to patch the kernel 
>   of a production system _while_ it's in production.

I think this statement is false for the following reasons.

  - I'd say the majority of system operators of production 
systems can live with a couple of seconds of delay at a 
well defined moment of the day or week - with gradual, 
pretty much open ended improvements in that latency 
down the line.

  - I think your argument ignores the fact that live 
upgrades would extend the scope of 'users willing to 
patch the kernel of a production system' _enormously_. 

For example, I have a production system with this much 
uptime:

   10:50:09 up 153 days,  3:58, 34 users,  load average: 0.00, 0.02, 0.05

Currently I'm reluctant to reboot the system to 
upgrade the kernel (due to a reboot's intrusiveness), 
which is why it has achieved a relatively high 
uptime, but I'd definitely allow the kernel to upgrade 
at 0:00am just fine. (I'd even give it up to a few 
minutes, as long as TCP connections don't time out.)

And I don't think my usecase is special.

What gradual improvements in live upgrade latency am I 
talking about?

 - For example the majority of pure user-space process 
   pages in RAM could be saved from the old kernel over 
   into the new kernel - i.e. they'd stay in place in RAM, 
   but they'd be re-hashed for the new data structures. 
   This avoids a big chunk of checkpointing overhead.

 - Likewise, most of the page cache could be saved from an
   old kernel to a new kernel as well - further reducing
   checkpointing overhead.

 - The PROT_NONE mechanism of the current NUMA balancing
   code could be used to transparently mark user-space 
   pages as 'checkpointed'. This would reduce system 
   interruption as only 'newly modified' pages would have 
   to be checkpointed when the upgrade happens.

 - Hardware devices could be marked as 'already in well
   defined state', skipping the more expensive steps of 
   driver initialization.

 - Possibly full user-space page tables could be preserved 
   over an upgrade: this way user-space execution would be 
   unaffected even in the micro level: cache layout, TLB
   patterns, etc.

There's lots of gradual speedups possible with such a model 
IMO.
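
To make the 'only newly modified pages need to be re-checkpointed' 
point concrete, here's a small userspace analogue (a toy sketch, not 
the kernel-side PROT_NONE/NUMA-hinting machinery): write-protect the 
already-checkpointed region and record the first write to each page:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 16

static char *region;
static long page_size;
static int page_dirty[NPAGES];  /* pages touched since the last "checkpoint" */

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
        long idx = ((char *)si->si_addr - region) / page_size;

        if (idx < 0 || idx >= NPAGES)
                _exit(1);       /* a real crash, not one of our tracking faults */

        page_dirty[idx] = 1;
        /* re-enable writes so the faulting instruction can be restarted */
        mprotect(region + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

static void checkpoint(void)
{
        /* pretend everything was saved; from now on track fresh dirtying */
        memset(page_dirty, 0, sizeof(page_dirty));
        mprotect(region, NPAGES * page_size, PROT_READ);
}

int main(void)
{
        struct sigaction sa;

        page_size = sysconf(_SC_PAGESIZE);
        region = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED)
                return 1;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        checkpoint();
        region[0] = 1;                  /* dirties page 0 */
        region[5 * page_size] = 1;      /* dirties page 5 */

        for (int i = 0; i < NPAGES; i++)
                if (page_dirty[i])
                        printf("page %d needs re-checkpointing\n", i);
        return 0;
}

The kernel could do the equivalent internally and essentially for free
by reusing the NUMA-balancing page protection machinery (or the
soft-dirty bit that CRIU already uses for incremental dumps).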

With live kernel patching we run into a brick wall of 
complexity straight away: we have to analyze the nature of 
the kernel modification, in the context of live patching, 
and that only works for the simplest of kernel 
modifications.

With live kernel upgrades no such brick wall exists, just 
about any transition between kernel versions is possible.

Granted, with live kernel upgrades it's much more complex 
to get the 'simple' case into an even rudimentarily working 
fashion (full userspace state has to be enumerated, saved 
and restored), but once we are there, it's a whole new 
category of goodness and it probably covers 90%+ of the 
live kernel patching usecases on day 1 already ...

Thanks,

Ingo


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Vojtech Pavlik  wrote:

> On Sun, Feb 22, 2015 at 03:01:48PM -0800, Andrew Morton wrote:
> 
> > On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina  
> > wrote:
> > 
> > > But if you ask the folks who are hungry for live bug 
> > > patching, they wouldn't care.
> > > 
> > > You mentioned "10 seconds", that's more or less equal 
> > > to infinity to them.
> > 
> > 10 seconds outage is unacceptable, but we're running 
> > our service on a single machine with no failover.  Who 
> > is doing this??
> 
> This is the most common argument that's raised when live 
> patching is discussed. "Why do we need live patching when we 
> have redundancy?"

My argument is that if we start off with a latency of 10 
seconds and improve that gradually, it will be good for 
everyone, with a clear, actionable route even for those 
who cannot take a 10-second delay today.

Let's see the use cases:

> [...] Examples would be legacy applications which can't 
> run in an active-active cluster and need to be restarted 
> on failover.

Most clusters (say web frontends) can take a stoppage of a 
couple of seconds.

> [...] Or trading systems, where the calculations must be 
> strictly serialized and response times are counted in 
> tens of microseconds.

All trading systems I'm aware of have daily maintenance 
time periods that can afford at minimum a couple of 
seconds of optional maintenance latency: stock trading 
systems can be maintained when there's no trading session 
(which is many hours); aftermarket or global trading 
systems can be maintained when the daily rollover 
interest is calculated, in a predetermined low activity 
period.

> Another usecase is large HPC clusters, where all nodes 
> have to run carefully synchronized. Once one gets behind 
> in a calculation cycle, others have to wait for the 
> results and the efficiency of the whole cluster goes 
> down. [...]

I think calculation nodes on large HPC clusters qualify as 
the specialized case that I mentioned, where the update 
latency could be brought down into the 1 second range.

But I don't think calculation nodes are patched in the 
typical case: you might want to patch Internet facing 
frontend systems, the rest is left as undisturbed as 
possible. So I'm not even sure this is a typical usecase.

In any case, there's no hard limit on how fast such a 
kernel upgrade can get in principle, and the folks who care 
about that latency will surely help out optimizing it - and 
many HPC projects are well funded.

> The value of live patching is in near zero disruption.

Latency is a good attribute of a kernel upgrade mechanism, 
but it's by far not the only attribute and we should 
definitely not design limitations into the approach and 
hurt all the other attributes, just to optimize that single 
attribute.

I.e. don't make it a single-issue project.

Thanks,

Ingo


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Arjan van de Ven  wrote:

> I think 10 seconds is Ingo exaggerating a bit, 
> since you can boot a full system in a lot less time than 
> that, and more so if you know more about the system (e.g. 
> don't need to spin down and then discover and spin up 
> disks). If you're talking about inside a VM it's even 
> more extreme than that.

Correct, I mentioned 10 seconds latency to be on the safe 
side - but in general I suspect it can be reduced to below 
1 second, which should be enough for everyone but the most 
specialized cases: even specialized HA servers will update 
their systems in low activity maintenance windows.

and we don't design the Linux kernel for weird, extreme 
cases, we design for the common, sane case that has the 
broadest appeal, and we hope that the feature garners 
enough interest to be maintainable.

This is not a problem in general: the weird case can take 
care of itself just fine - 'specialized and weird' usually 
means there's enough money to throw at special hardware and 
human solutions or it goes extinct quickly ...

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Arjan van de Ven arjanvande...@gmail.com wrote:

 I think 10 seconds is Ingo being a bit exaggerating, 
 since you can boot a full system in a lot less time than 
 that, and more so if you know more about the system (e.g. 
 don't need to spin down and then discover and spin up 
 disks). If you're talking about inside a VM it's even 
 more extreme than that.

Correct, I mentioned 10 seconds latency to be on the safe 
side - but in general I suspect it can be reduced to below 
1 second, which should be enough for everyone but the most 
specialized cases: even specialized HA servers will update 
their systems in low activity maintenance windows.

and we don't design the Linux kernel for weird, extreme 
cases, we design for the common, sane case that has the 
broadest appeal, and we hope that the feature garners 
enough interest to be maintainable.

This is not a problem in general: the weird case can take 
care of itself just fine - 'specialized and weird' usually 
means there's enough money to throw at special hardware and 
human solutions or it goes extinct quickly ...

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Pavel Machek pa...@ucw.cz wrote:

  More importantly, both kGraft and kpatch are pretty limited 
  in what kinds of updates they allow, and neither kGraft nor 
  kpatch has any clear path towards applying more complex 
  fixes to kernel images that I can see: kGraft can only 
  apply the simplest of fixes where both versions of a 
  function are interchangeable, and kpatch is only marginally 
  better at that - and that's pretty fundamental to both 
  projects!
  
  I think all of these problems could be resolved by shooting 
  for the moon instead:
  
- work towards allowing arbitrary live kernel upgrades!
  
  not just 'live kernel patches'.
 
 Note that live kernel upgrade would have interesting 
 implications outside kernel:
 
 1) glibc does what kernel version is this? caches 
 result and alters behaviour accordingly.

That should be OK, as a new kernel will be ABI compatible 
with an old kernel.

A later optimization could update the glibc cache on an 
upgrade, fortunately both projects are open source.

 2) apps will do recently_introduced_syscall(), get error 
 and not attempt it again.

That should be fine too.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Josh Poimboeuf jpoim...@redhat.com wrote:

 Your upgrade proposal is an *enormous* disruption to the 
 system:
 
 - a latency of well below 10 seconds is completely
   unacceptable to most users who want to patch the kernel 
   of a production system _while_ it's in production.

I think this statement is false for the following reasons.

  - I'd say the majority of system operators of production 
systems can live with a couple of seconds of delay at a 
well defined moment of the day or week - with gradual, 
pretty much open ended improvements in that latency 
down the line.

  - I think your argument ignores the fact that live 
upgrades would extend the scope of 'users willing to 
patch the kernel of a production system' _enormously_. 

For example, I have a production system with this much 
uptime:

   10:50:09 up 153 days,  3:58, 34 users,  load average: 0.00, 0.02, 0.05

While currently I'm reluctant to reboot the system to 
upgrade the kernel (due to a reboot's intrusiveness), 
and that is why it has achieved a relatively high 
uptime, but I'd definitely allow the kernel to upgrade 
at 0:00am just fine. (I'd even give it up to a few 
minutes, as long as TCP connections don't time out.)

And I don't think my usecase is special.

What gradual improvements in live upgrade latency am I 
talking about?

 - For example the majority of pure user-space process 
   pages in RAM could be saved from the old kernel over 
   into the new kernel - i.e. they'd stay in place in RAM, 
   but they'd be re-hashed for the new data structures. 
   This avoids a big chunk of checkpointing overhead.

 - Likewise, most of the page cache could be saved from an
   old kernel to a new kernel as well - further reducing
   checkpointing overhead.

 - The PROT_NONE mechanism of the current NUMA balancing
   code could be used to transparently mark user-space 
   pages as 'checkpointed'. This would reduce system 
   interruption as only 'newly modified' pages would have 
   to be checkpointed when the upgrade happens.

 - Hardware devices could be marked as 'already in well
   defined state', skipping the more expensive steps of 
   driver initialization.

 - Possibly full user-space page tables could be preserved 
   over an upgrade: this way user-space execution would be 
   unaffected even in the micro level: cache layout, TLB
   patterns, etc.

There's lots of gradual speedups possible with such a model 
IMO.

With live kernel patching we run into a brick wall of 
complexity straight away: we have to analyze the nature of 
the kernel modification, in the context of live patching, 
and that only works for the simplest of kernel 
modifications.

With live kernel upgrades no such brick wall exists, just 
about any transition between kernel versions is possible.

Granted, with live kernel upgrades it's much more complex 
to get the 'simple' case into an even rudimentarily working 
fashion (full userspace state has to be enumerated, saved 
and restored), but once we are there, it's a whole new 
category of goodness and it probably covers 90%+ of the 
live kernel patching usecases on day 1 already ...

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Vojtech Pavlik vojt...@suse.com wrote:

 On Sun, Feb 22, 2015 at 03:01:48PM -0800, Andrew Morton wrote:
 
  On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina jkos...@suse.cz 
  wrote:
  
   But if you ask the folks who are hungry for live bug 
   patching, they wouldn't care.
   
   You mentioned 10 seconds, that's more or less equal 
   to infinity to them.
  
  10 seconds outage is unacceptable, but we're running 
  our service on a single machine with no failover.  Who 
  is doing this??
 
 This is the most common argument that's raised when live 
 patching is discussed. Why do need live patching when we 
 have redundancy?

My argument is that if we start off with a latency of 10 
seconds and improve that gradually, it will be good for 
everyone with a clear, actionable route for even those who 
cannot take a 10 seconds delay today.

Lets see the use cases:

 [...] Examples would be legacy applications which can't 
 run in an active-active cluster and need to be restarted 
 on failover.

Most clusters (say web frontends) can take a stoppage of a 
couple of seconds.

 [...] Or trading systems, where the calculations must be 
 strictly serialized and response times are counted in 
 tens of microseconds.

All trading systems I'm aware of have daily maintenance 
time periods that can afford at minimum of a couple of 
seconds of optional maintenance latency: stock trading 
systems can be maintained when there's no trading session 
(which is many hours), aftermarket or global trading 
systems can be maintained when the daily rollover 
interested is calculated in a predetermined low activity 
period.

 Another usecase is large HPC clusters, where all nodes 
 have to run carefully synchronized. Once one gets behind 
 in a calculation cycle, others have to wait for the 
 results and the efficiency of the whole cluster goes 
 down. [...]

I think calculation nodes on large HPC clusters qualify as 
the specialized case that I mentioned, where the update 
latency could be brought down into the 1 second range.

But I don't think calculation nodes are patched in the 
typical case: you might want to patch Internet facing 
frontend systems, the rest is left as undisturbed as 
possible. So I'm not even sure this is a typical usecase.

In any case, there's no hard limit on how fast such a 
kernel upgrade can get in principle, and the folks who care 
about that latency will sure help out optimizing it and 
many HPC projects are well funded.

 The value of live patching is in near zero disruption.

Latency is a good attribute of a kernel upgrade mechanism, 
but it's by far not the only attribute and we should 
definitely not design limitations into the approach and 
hurt all the other attributes, just to optimize that single 
attribute.

I.e. don't make it a single-issue project.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Ingo Molnar

* Jiri Kosina jkos...@suse.cz wrote:

 [...] We could optimize the kernel the craziest way we 
 can, but hardware takes its time to reinitialize. And in 
 most cases, you'd really need to reinitalize it; [...]

If we want to reinitialize a device, most of the longer 
initialization latencies during bootup these days involve 
things like: 'poke hardware, see if there's any response'. 
Those are mostly going away quickly with modern, 
well-enumerated hardware interfaces.

Just try a modprobe of a random hardware driver - most 
initialization sequences are very fast. (That's how people 
are able to do cold bootups in less than 1 second.)

In theory this could also be optimized: we could avoid the 
reinitialization step through an upgrade via relatively 
simple means, for example if drivers define their own 
version and the new kernel's driver checks whether the 
previous state is from a compatible driver. Then the new 
driver could do a shorter initialization sequence.

But I'd only do it only in special cases, where for some 
reason the initialization sequence takes longer time and it 
makes sense to share hardware discovery information between 
two versions of the driver. I'm not convinced such a 
mechanism is necessary in the general case.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Jiri Slaby
On 02/22/2015, 10:46 AM, Ingo Molnar wrote:
 Arbitrary live kernel upgrades could be achieved by 
 starting with the 'simple method' I outlined in earlier 
 mails, using some of the methods that kpatch and kGraft are 
 both utilizing or planning to utilize:
 
   - implement user task and kthread parking to get the 
 kernel into quiescent state.
 
   - implement (optional, thus ABI-compatible) 
 system call interruptability and restartability 
 support.
 
   - implement task state and (limited) device state
 snapshotting support
 
   - implement live kernel upgrades by:
 
   - snapshotting all system state transparently
 
   - fast-rebooting into the new kernel image without 
 shutting down and rebooting user-space, i.e. _much_ 
 faster than a regular reboot.
 
   - restoring system state transparently within the new 
 kernel image and resuming system workloads where 
 they were left.
 
 Even complex external state like TCP socket state and 
 graphics state can be preserved over an upgrade. As far as 
 the user is concerned, nothing happened but a brief pause - 
 and he's now running a v3.21 kernel, not v3.20.
 
 Obviously one of the simplest utilizations of live kernel 
 upgrades would be to apply simple security fixes to 
 production systems. But that's just a very simple 
 application of a much broader capability.
 
 Note that if done right, then the time to perform a live 
 kernel upgrade on a typical system could be brought to well 
 below 10 seconds system stoppage time: adequate to the vast 
 majority of installations.
 
 For special installations or well optimized hardware the 
 latency could possibly be brought below 1 second stoppage 
 time.

Hello,

IMNSHO, you cannot.

The criu-based approach you have just described is already alive as an
external project in Parallels. It is of course a perfect solution for
some use cases. But its use case is a distinctive one. It is not our
competitor, it is our complementer. I will try to explain why.

It is highly dependent on HW. Kexec is not (or any other arbitrary
kernel-exchange mechanism would not be) supported by all HW, neither
drivers. There is not even a way to implement snapshotting for some
devices which is a real issue, obviously.

Downtime is highly dependent on the scenario. If you have a plenty of
dirty memory, you have to flush first. This might be minutes, especially
when using a network FS. Or you need not, but a failure to replace a
kernel is then lethal. If you have a heap of open FD, restore time will
take ages. You cannot fool any of those. It's pure I/O. You cannot
estimate the downtime and that is a real downside.

Even if you can get the criu time under one second, this is still
unacceptable for live patching. Live patching shall be by 3 orders of
magnitude faster than that, otherwise it makes no sense. If you can
afford a second, you probably already have a large enough windows or
failure handling to perform a full and mainly safer reboot/kexec anyway.

You cannot restore everything.
* TCP is one of the pure beasts in this. And there is indeed a plenty of
theoretical papers behind this, explaining what can or cannot be done.
* NFS is another one.
* Xorg. Today, we cannot even fluently switch between discreet and
native GFX chip. No go.
* There indeed are situations, where NP-hard problems need to be solved
upon restoration. No way, if you want to restore yet in this century.

While you cannot live-patch everything using KLP, it is patch-dependent.
Failure of restoration is condition-dependent and the condition is
really fuzzy. That is a huge difference.

Despite you put criu-based approach as provably safe and correct, it is
not in many cases and cannot be by definition.

That said, we are not going to start moving that way, except the many
good points which emerged during the discussion (fake signals to pick one).

 This 'live kernel upgrades' approach would have various 
 advantages:
 
   - it brings together various principles working towards 
 shared goals:
 
   - the boot time reduction folks
   - the checkpoint/restore folks
   - the hibernation folks
   - the suspend/resume and power management folks
   - the live patching folks (you)
   - the syscall latency reduction folks
 
 if so many disciplines are working together then maybe 
 something really good and long term maintainble can 
 crystalize out of that effort.

I must admit, whenever I implemented something in the kernel, nobody did
any work for me. So the above will only result in live patching teams to
do all the work. I am not saying we do not want to do the work. I am
only pointing out that there is nothing like work together with other
teams (unless we are sending them their pay-bills).

   - it ignores the security theater that treats security
 fixes as a separate, disproportionally more important
 class of fixes and instead allows arbitrary complex 
 

Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 10:44:05AM +0100, Ingo Molnar wrote:

  This is the most common argument that's raised when live 
  patching is discussed. Why do need live patching when we 
  have redundancy?
 
 My argument is that if we start off with a latency of 10 
 seconds and improve that gradually, it will be good for 
 everyone with a clear, actionable route for even those who 
 cannot take a 10 seconds delay today.

Sure, we can do it that way. 

Or do it in the other direction.

Today we have a tool (livepatch) in the kernel that can apply trivial
single-function fixes without a measurable disruption to applications.

And we can improve it gradually to expand the range of fixes it can
apply.

Dependent functions can be done by kGraft's lazy migration.

Limited data structure changes can be handled by shadowing.

Major data structure and/or locking changes require stopping the kernel,
and trapping all tasks at the kernel/userspace boundary is clearly the
cleanest way to do that. I comes at a steep latency cost, though.

Full code replacement without change scope consideration requires full
serialization and deserialization of hardware and userspace
interface state, which is something we don't have today and would
require work on every single driver. Possible, but probably a decade of
effort.

With this approach you have something useful at every point and every
piece of effort put in gives you a rewars.

 Lets see the use cases:
 
  [...] Examples would be legacy applications which can't 
  run in an active-active cluster and need to be restarted 
  on failover.
 
 Most clusters (say web frontends) can take a stoppage of a 
 couple of seconds.

It's easy to find examples of workloads that can be stopped. It doesn't
rule out a significant set of those where stopping them is very
expensive.

  Another usecase is large HPC clusters, where all nodes 
  have to run carefully synchronized. Once one gets behind 
  in a calculation cycle, others have to wait for the 
  results and the efficiency of the whole cluster goes 
  down. [...]
 
 I think calculation nodes on large HPC clusters qualify as 
 the specialized case that I mentioned, where the update 
 latency could be brought down into the 1 second range.
 
 But I don't think calculation nodes are patched in the 
 typical case: you might want to patch Internet facing 
 frontend systems, the rest is left as undisturbed as 
 possible. So I'm not even sure this is a typical usecase.

They're not patched for security bugs, but stability bugs are an
important issue for multi-month calculations.

 In any case, there's no hard limit on how fast such a 
 kernel upgrade can get in principle, and the folks who care 
 about that latency will sure help out optimizing it and 
 many HPC projects are well funded.

So far, unless you come up with an effective solutions, if you're
catching all tasks at the kernel/userspace boundary (the Kragle
approach), the service interruption is effectively unbounded due to
tasks in D state.

  The value of live patching is in near zero disruption.
 
 Latency is a good attribute of a kernel upgrade mechanism, 
 but it's by far not the only attribute and we should 
 definitely not design limitations into the approach and 
 hurt all the other attributes, just to optimize that single 
 attribute.

It's an attribute I'm not willing to give up. On the other hand, I
definitely wouldn't argue against having modes of operation where the
latency is higher and the tool is more powerful.

 I.e. don't make it a single-issue project.

There is no need to worry about that. 

-- 
Vojtech Pavlik
Director SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:

  Your upgrade proposal is an *enormous* disruption to the 
  system:
  
  - a latency of well below 10 seconds is completely
unacceptable to most users who want to patch the kernel 
of a production system _while_ it's in production.
 
 I think this statement is false for the following reasons.

The statement is very true.

   - I'd say the majority of system operators of production 
 systems can live with a couple of seconds of delay at a 
 well defined moment of the day or week - with gradual, 
 pretty much open ended improvements in that latency 
 down the line.

In the most usual corporate setting any noticeable outage, even out of
business hours, requires advance notice and an agreement of all
stakeholders - teams that depend on the system.

If a live patching technology introduces an outage, it's not live, and
because of these bureaucratic reasons it will not be used; a regular
reboot will be scheduled instead.

   - I think your argument ignores the fact that live 
 upgrades would extend the scope of 'users willing to 
 patch the kernel of a production system' _enormously_. 
 
 For example, I have a production system with this much 
 uptime:
 
10:50:09 up 153 days,  3:58, 34 users,  load average: 0.00, 0.02, 0.05
 
 While currently I'm reluctant to reboot the system to 
 upgrade the kernel (due to a reboot's intrusiveness), 
 and that is why it has achieved a relatively high 
 uptime, but I'd definitely allow the kernel to upgrade 
 at 0:00am just fine. (I'd even give it up to a few 
 minutes, as long as TCP connections don't time out.)
 
 And I don't think my usecase is special.

I agree that this is useful. But it is a different problem that only
partially overlaps with what we're trying to achieve with live patching.

If you can make full kernel upgrades work this way, which I doubt is
achievable in the next 10 years due to all the research and
infrastructure needed, then you certainly gain an additional group of
users. And a great tool. A large portion of those that ask for live
patching won't use it, though.

But honestly, I prefer a solution that works for small patches now, than
a solution for unlimited patches sometime in next decade.

> What gradual improvements in live upgrade latency am I 
> talking about?
> 
>  - For example the majority of pure user-space process 
>    pages in RAM could be saved from the old kernel over 
>    into the new kernel - i.e. they'd stay in place in RAM, 
>    but they'd be re-hashed for the new data structures. 
>    This avoids a big chunk of checkpointing overhead.

I'd have hoped this would be a given. If you can't preserve memory
contents and have to re-load from disk, you can just as well reboot
entirely; the time needed will not be much more.

>  - Likewise, most of the page cache could be saved from an
>    old kernel to a new kernel as well - further reducing
>    checkpointing overhead.
> 
>  - The PROT_NONE mechanism of the current NUMA balancing
>    code could be used to transparently mark user-space 
>    pages as 'checkpointed'. This would reduce system 
>    interruption as only 'newly modified' pages would have 
>    to be checkpointed when the upgrade happens.
> 
>  - Hardware devices could be marked as 'already in well
>    defined state', skipping the more expensive steps of 
>    driver initialization.
> 
>  - Possibly full user-space page tables could be preserved 
>    over an upgrade: this way user-space execution would be 
>    unaffected even in the micro level: cache layout, TLB
>    patterns, etc.
> 
> There's lots of gradual speedups possible with such a model 
> IMO.

Yes, as I say above, guaranteeing decades of employment. ;)
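
(Illustrative aside, not part of the original mail: the "mark pages as
checkpointed, re-checkpoint only what changed" idea quoted above already has
a user-space building block in the soft-dirty bit that CRIU uses for
incremental dumps. A minimal sketch, assuming a kernel built with
CONFIG_MEM_SOFT_DIRTY and enough privileges to use /proc/self/pagemap:)

/* soft-dirty.c: find which pages changed since the last "snapshot" using
 * the soft-dirty bit (pagemap bit 55).  Illustrative sketch only. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NPAGES 4

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *buf = aligned_alloc(psz, NPAGES * psz);
	int clear, pagemap, i;
	uint64_t entry;

	if (!buf)
		return 1;
	memset(buf, 1, NPAGES * psz);		/* populate all pages */

	/* "snapshot": clear the soft-dirty bits for the whole process */
	clear = open("/proc/self/clear_refs", O_WRONLY);
	if (clear < 0 || write(clear, "4", 1) != 1)
		return 1;
	close(clear);

	buf[2 * psz] = 42;			/* modify only page 2 */

	pagemap = open("/proc/self/pagemap", O_RDONLY);
	if (pagemap < 0)
		return 1;
	for (i = 0; i < NPAGES; i++) {
		off_t off = ((uintptr_t)(buf + i * psz) / psz) * 8;

		if (pread(pagemap, &entry, sizeof(entry), off) != sizeof(entry))
			return 1;
		printf("page %d: %s\n", i,
		       (entry & (1ULL << 55)) ? "dirty since snapshot" : "clean");
	}
	close(pagemap);
	return 0;
}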

> With live kernel patching we run into a brick wall of 
> complexity straight away: we have to analyze the nature of 
> the kernel modification, in the context of live patching, 
> and that only works for the simplest of kernel 
> modifications.

But you're able to _use_ it.

> With live kernel upgrades no such brick wall exists, just 
> about any transition between kernel versions is possible.

The brick wall you run into is "I need to implement full kernel state
serialization before I can do anything at all." That's something that
isn't even clear _how_ to do. Particularly with the Linux kernel's
development model, where internal ABI and structures are always in flux,
it may not even be realistic.

> Granted, with live kernel upgrades it's much more complex 
> to get the 'simple' case into an even rudimentarily working 
> fashion (full userspace state has to be enumerated, saved 
> and restored), but once we are there, it's a whole new 
> category of goodness and it probably covers 90%+ of the 
> live kernel patching usecases on day 1 already ...

Feel free to start working on it. I'll stick with live patching.

-- 
Vojtech Pavlik
Director SUSE Labs

Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Petr Mladek
On Tue 2015-02-24 11:23:29, Ingo Molnar wrote:
> What gradual improvements in live upgrade latency am I 
> talking about?
> 
>  - For example the majority of pure user-space process 
>    pages in RAM could be saved from the old kernel over 
>    into the new kernel - i.e. they'd stay in place in RAM, 
>    but they'd be re-hashed for the new data structures. 

I wonder how many structures we would need to rehash when we update
the whole kernel. I think that it is not only about memory but also
about any other subsystem: networking, scheduler, ...


>  - Hardware devices could be marked as 'already in well
>    defined state', skipping the more expensive steps of 
>    driver initialization.

This is another point that could easily go wrong. We know that
the quality of many drivers is not good. Yes, we want to make it
better. But we also know that system suspend has not worked well
on many systems for years, even with huge effort.


>  - Possibly full user-space page tables could be preserved 
>    over an upgrade: this way user-space execution would be 
>    unaffected even in the micro level: cache layout, TLB
>    patterns, etc.
> 
> There's lots of gradual speedups possible with such a model 
> IMO.
> 
> With live kernel patching we run into a brick wall of 
> complexity straight away: we have to analyze the nature of 
> the kernel modification, in the context of live patching, 
> and that only works for the simplest of kernel 
> modifications.
> 
> With live kernel upgrades no such brick wall exists, just 
> about any transition between kernel versions is possible.

I see here a big difference in complexity. If verifying patches
is considered complex, then I think that it is much, much more
complicated to verify that a whole kernel upgrade is safe and
that all states will be properly preserved and reused.

Otherwise, I think that live patching won't be for any Joe User.
The people producing patches will need to investigate the
changes anyway. They will not blindly take a patch from the internet
and convert it to a live patch. I think that this is true for
many other kernel features.


> Granted, with live kernel upgrades it's much more complex 
> to get the 'simple' case into an even rudimentarily working 
> fashion (full userspace state has to be enumerated, saved 
> and restored), but once we are there, it's a whole new 
> category of goodness and it probably covers 90%+ of the 
> live kernel patching usecases on day 1 already ...

I like the idea and I see the benefit for other tasks: system suspend,
migration of systems to other hardware, ... But I also think that it
is another level of functionality.

IMHO, live patching is somewhere on the way to the full kernel
update and it will help as well. For example, we will need to somehow
solve the transition of kthreads and thus fix their parking.
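
(Illustrative aside, not part of the original mail: the parking Petr refers
to already exists as an API - kthread_park()/kthread_parkme() - the open
problem is getting every kthread to actually reach such a safe point. A
minimal sketch of a parkable worker, as a stand-alone module:)

/* parkable-worker.c: a kthread with an explicit quiescent point, the kind
 * of spot where a patching (or upgrade) core could safely hold it. */
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/module.h>

static struct task_struct *worker;

static int worker_fn(void *data)
{
	while (!kthread_should_stop()) {
		if (kthread_should_park()) {
			/* quiescent: no locks held, no half-done work */
			kthread_parkme();
			continue;
		}
		msleep(100);		/* ... one unit of real work ... */
	}
	return 0;
}

static int __init demo_init(void)
{
	worker = kthread_run(worker_fn, NULL, "demo_worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	kthread_park(worker);		/* e.g. while a transition runs */
	kthread_unpark(worker);		/* resume afterwards */
	return 0;
}

static void __exit demo_exit(void)
{
	kthread_stop(worker);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");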

I think that live patching deserves its own separate solution. I consider
it much less risky but still valuable. I am sure that it will have
its users. Also, it will not block improving things for the full
update in the future.


Best Regards,
Petr


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Jiri Slaby
On 02/24/2015, 10:16 AM, Ingo Molnar wrote:
> and we don't design the Linux kernel for weird, extreme 
> cases, we design for the common, sane case that has the 
> broadest appeal, and we hope that the feature garners 
> enough interest to be maintainable.

Hello,

oh, so why do we have NR_CPUS up to 8192, then? I haven't met a machine
with more than 16 cores yet. You did. But you haven't met a guy thankful
for live patching being so easy to implement, yet fast. I did.

What some call extreme, others accept as standard. That is, I believe,
why you signed off on the support for up to 8192 CPUs.

We develop Linux to be scalable, i.e. usable in *whatever* scenario you
can imagine in any world. Be it large/small machines, lowmem/highmem,
NUMA/UMA, whatever. If you don't like something, you are free to disable
it. Democracy.

> This is not a problem in general: the weird case can take 
> care of itself just fine - 'specialized and weird' usually 
> means there's enough money to throw at special hardware and 
> human solutions or it goes extinct quickly ...

Live patching is not a random idea which is about to die. It is months
of negotiations with customers, management, and between developers,
establishing teams and really thinking about the idea. The decisions
were discussed at many conferences too. I am trying to shed some light
on why we are not trying to improve criu or any other already existing
project. We studied papers, code, implementations, kSplice and the like,
and decided in favor of what we have implemented, presented and merged.

thanks,
-- 
js
suse labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-24 Thread Vojtech Pavlik
On Tue, Feb 24, 2015 at 11:53:28AM +0100, Ingo Molnar wrote:
 
> * Jiri Kosina jkos...@suse.cz wrote:
> 
> > [...] We could optimize the kernel the craziest way we 
> > can, but hardware takes its time to reinitialize. And in 
> > most cases, you'd really need to reinitalize it; [...]
> 
> If we want to reinitialize a device, most of the longer 
> initialization latencies during bootup these days involve 
> things like: 'poke hardware, see if there's any response'. 
> Those are mostly going away quickly with modern, 
> well-enumerated hardware interfaces.
> 
> Just try a modprobe of a random hardware driver - most 
> initialization sequences are very fast. (That's how people 
> are able to do cold bootups in less than 1 second.)

Have you ever tried to boot a system with a large (> 100) number of
drives connected over FC? That takes time to discover and you have to do
the discovery as the configuration could have changed while you were not
looking.

Or a machine with terabytes of memory? Just initializing the memory
takes minutes.

Or a desktop with USB? And you have to reinitialize the USB bus and the
state of all the USB devices, because an application might be accessing
files on a USB drive.

> In theory this could also be optimized: we could avoid the 
> reinitialization step through an upgrade via relatively 
> simple means, for example if drivers define their own 
> version and the new kernel's driver checks whether the 
> previous state is from a compatible driver. Then the new 
> driver could do a shorter initialization sequence.

There you're clearly getting into the "so complex to maintain that it'll
never work reliably" territory.

> But I'd only do it only in special cases, where for some 
> reason the initialization sequence takes longer time and it 
> makes sense to share hardware discovery information between 
> two versions of the driver. I'm not convinced such a 
> mechanism is necessary in the general case.

-- 
Vojtech Pavlik
Director SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-23 Thread Pavel Machek
> kernel update as step one, maybe we want this on a kernel module
> level:
> Hot-swap of kernel modules, where a kernel module makes itself go
> quiet and serializes its state ("suspend" pretty much), then gets
> swapped out (hot) by its replacement,
> which then unserializes the state and continues.

Hmm. So Linux 5.0 will be a micro-kernel? :-).

Pavel


Re: live kernel upgrades (was: live kernel patching design)

2015-02-23 Thread Pavel Machek
> More importantly, both kGraft and kpatch are pretty limited 
> in what kinds of updates they allow, and neither kGraft nor 
> kpatch has any clear path towards applying more complex 
> fixes to kernel images that I can see: kGraft can only 
> apply the simplest of fixes where both versions of a 
> function are interchangeable, and kpatch is only marginally 
> better at that - and that's pretty fundamental to both 
> projects!
> 
> I think all of these problems could be resolved by shooting 
> for the moon instead:
> 
>   - work towards allowing arbitrary live kernel upgrades!
> 
> not just 'live kernel patches'.

Note that a live kernel upgrade would have interesting implications
outside the kernel:

1) glibc does "what kernel version is this?", caches the result
and alters behaviour accordingly.

2) apps will do recently_introduced_syscall(), get an error
and not attempt it again.
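
(Illustrative aside, not part of the original mail: a sketch of the caching
pattern described in 2), using memfd_create() - added in Linux 3.17 - as the
probed syscall; the cached ENOSYS result would survive a live upgrade to a
kernel that does implement it:)

/* syscall-cache.c: probe a newer syscall once, cache the failure, never
 * retry - exactly the behaviour that makes a live upgrade "invisible" in
 * the wrong way.  Needs libc headers that define SYS_memfd_create. */
#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static int memfd_available = -1;	/* -1 unknown, 0 no, 1 yes */

static int create_memfd(const char *name)
{
	if (memfd_available == 0)
		return -1;		/* cached "kernel too old" */

	int fd = syscall(SYS_memfd_create, name, 0);

	if (fd < 0 && errno == ENOSYS) {
		memfd_available = 0;	/* sticks even if the kernel is
					 * upgraded underneath us */
		return -1;
	}
	memfd_available = 1;
	return fd;
}

int main(void)
{
	int fd = create_memfd("demo");

	printf("memfd: %d (available=%d)\n", fd, memfd_available);
	return 0;
}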

Pavel


Re: live kernel upgrades (was: live kernel patching design)

2015-02-23 Thread Vojtech Pavlik
On Mon, Feb 23, 2015 at 11:42:17AM +0100, Richard Weinberger wrote:

> > Of course, if you are random Joe User, you can do whatever you want, i.e.
> > also compile your own home-brew patches and apply them randomly and brick
> > your system that way. But that's in no way different to what you as Joe
> > User can do today; there is nothing that will prevent you from shooting
> > yourself in a foot if you are creative.
> 
> Sorry if I ask something that got already discussed, I did not follow
> the whole live-patching discussion.
> 
> How much of the userspace tools will be public available?
> With live-patching mainline the kernel offers the mechanism, but
> random Joe user still needs
> the tools to create good live patches.

All the tools for kGraft and kpatch are available in public git
repositories.

Also, while kGraft has tools to automate the generation of patches,
these are generally not required to create a patch.

-- 
Vojtech Pavlik
Director SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-23 Thread Richard Weinberger
On Mon, Feb 23, 2015 at 9:17 AM, Jiri Kosina  wrote:
> On Sun, 22 Feb 2015, Arjan van de Ven wrote:
> Of course, if you are random Joe User, you can do whatever you want, i.e.
> also compile your own home-brew patches and apply them randomly and brick
> your system that way. But that's in no way different to what you as Joe
> User can do today; there is nothing that will prevent you from shooting
> yourself in a foot if you are creative.

Sorry if I ask something that got already discussed, I did not follow
the whole live-patching discussion.

How much of the userspace tools will be publicly available?
With live patching in mainline, the kernel offers the mechanism, but
random Joe User still needs
the tools to create good live patches.

-- 
Thanks,
//richard


Re: live kernel upgrades (was: live kernel patching design)

2015-02-23 Thread Jiri Kosina
On Sun, 22 Feb 2015, Arjan van de Ven wrote:

> There's a lot of logistical issues (can you patch a patched system... if 
> live patching is a first class citizen you end up with dozens and dozens 
> of live patches applied, some out of sequence etc etc). 

I can't speak on behalf of others, but I definitely can speak on behalf of 
SUSE, as we are already basing a product on this.

Yes, you can patch a patched system, you can patch one function multiple 
times, you can revert a patch. It's all tracked by dependencies.

Of course, if you are random Joe User, you can do whatever you want, i.e. 
also compile your own home-brew patches and apply them randomly and brick 
your system that way. But that's in no way different to what you as Joe 
User can do today; there is nothing that will prevent you from shooting 
yourself in a foot if you are creative.

Regarding "out of sequence", this is up to the vendor providing/packaging 
the patches to make sure that this is guaranteed not to happen. SUSE for 
example always provides an "all-in-one" patch for each and every released and 
supported kernel codestream in a cumulative manner, which takes care of 
the ordering issue completely.

It's not really too different from shipping external kernel modules and 
making sure they have proper dependencies that need to be satisfied before 
the module can be loaded.

> There's the "which patches do I have, and if the first patch for a 
> security hole was not complete, how do I cope by applying number two. 
> There's the "which of my 50.000 servers have which patch applied" 
> logistics.

Yes. That's easy if distro/patch vendors make reasonable userspace and 
distribution infrastructure around this.

Thanks,

-- 
Jiri Kosina
SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Vojtech Pavlik
On Sun, Feb 22, 2015 at 03:01:48PM -0800, Andrew Morton wrote:

> On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina  wrote:
> 
> > But if you ask the folks who are hungry for live bug patching, they 
> > wouldn't care.
> > 
> > You mentioned "10 seconds", that's more or less equal to infinity to them. 
> 
> 10 seconds outage is unacceptable, but we're running our service on a
> single machine with no failover.  Who is doing this??

This is the most common argument that's raised when live patching is
discussed. "Why do we need live patching when we have redundancy?"

People who are asking for live patching typically do have failover in
place, but prefer not to have to use it when they don't have to.

In many cases, the failover just can't be made transparent to the
outside world and there is a short outage. Examples would be legacy
applications which can't run in an active-active cluster and need to be
restarted on failover. Or trading systems, where the calculations must
be strictly serialized and response times are counted in tens of
microseconds. 

Another usecase is large HPC clusters, where all nodes have to run
carefully synchronized. Once one gets behind in a calculation cycle,
others have to wait for the results and the efficiency of the whole
cluster goes down. There are people who run realtime on them for
that reason. Dumping all data and restarting the HPC cluster takes a lot
of time and many nodes (out of tens of thousands) may not come back up,
making the restore from media difficult. Doing a rolling upgrade causes
the nodes one by one stall by 10+ seconds, which times 10k is a long
time, too.

And even the case where you have a perfect setup with everything
redundant and with instant failover does benefit from live patching.
Since you have to plan for failure, you have to plan for failure while
patching, too. With live patching you need 2 servers minimum (or N+1),
without, you need 3 (or N+2), as one will be offline during the
upgrade process.

10 seconds of outage may be acceptable in a disaster scenario. Not
necessarily for a regular update scenario.

The value of live patching is in near zero disruption.

-- 
Vojtech Pavlik
Director SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Arjan van de Ven
There's failover, there's running the core services in VMs (which can
migrate)...
I think 10 seconds is Ingo exaggerating a bit, since you can
boot a full system in a lot less time than that, and more so if you
know more about the system
(e.g. don't need to spin down and then discover and spin up disks). If
you're talking about inside a VM it's even more extreme than that.


Now, live patching sounds great as an ideal, but it may end up being
(mostly) similar to hardware hotplug: Everyone wants it, but nobody
wants to use it
(and just waits for a maintenance window instead). In the hotplug
case, while people say they want it, they're also aware that hardware
hotplug is fundamentally messy, and then nobody wants to do it on that
mission critical piece of hardware outside the maintenance window.
(hotswap drives seem to have been the exception to this, that seems to
have been worked out well enough, but that's replace-with-the-same).
I would be very afraid that hot kernel patching ends up in the same
space: The super-mission-critical folks are what it's aimed at, while
those are the exact same folks that would rather wait for the
maintenance window.

There's a lot of logistical issues (can you patch a patched system...
if live patching is a first class citizen you end up with dozens and
dozens of live patches applied, some out of sequence etc etc). There's
the "which patches do I have, and if the first patch for a security
hole was not complete, how do I cope by applying number two. There's
the "which of my 50.000 servers have which patch applied" logistics.

And Ingo is absolutely right: The scope is very fuzzy. Today's bugfix
is tomorrow's "oh oops it turns out exploitable".

I will throw a different hat in the ring: Maybe we don't want full
kernel update as step one, maybe we want this on a kernel module
level:
Hot-swap of kernel modules, where a kernel module makes itself go
quiet and serializes its state ("suspend" pretty much), then gets
swapped out (hot) by its replacement,
which then unserializes the state and continues.

If we can do this on a module level, then the next step is treating
more components of the kernel as modules, which is a fundamental
modularity thing.



On Sun, Feb 22, 2015 at 4:18 PM, Dave Airlie  wrote:
> On 23 February 2015 at 09:01, Andrew Morton  wrote:
>> On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina  wrote:
>>
>>> But if you ask the folks who are hungry for live bug patching, they
>>> wouldn't care.
>>>
>>> You mentioned "10 seconds", that's more or less equal to infinity to them.
>>
>> 10 seconds outage is unacceptable, but we're running our service on a
>> single machine with no failover.  Who is doing this??
>
> if I had to guess, telcos generally, you've only got one wire between a phone
> and the exchange and if the switch on the end needs patching it better be 
> fast.
>
> Dave.


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Dave Airlie
On 23 February 2015 at 09:01, Andrew Morton  wrote:
> On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina  wrote:
>
>> But if you ask the folks who are hungry for live bug patching, they
>> wouldn't care.
>>
>> You mentioned "10 seconds", that's more or less equal to infinity to them.
>
> 10 seconds outage is unacceptable, but we're running our service on a
> single machine with no failover.  Who is doing this??

if I had to guess, telcos generally, you've only got one wire between a phone
and the exchange and if the switch on the end needs patching it better be fast.

Dave.


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Andrew Morton
On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina  wrote:

> But if you ask the folks who are hungry for live bug patching, they 
> wouldn't care.
> 
> You mentioned "10 seconds", that's more or less equal to infinity to them. 

10 seconds outage is unacceptable, but we're running our service on a
single machine with no failover.  Who is doing this??


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Jiri Kosina

[ added live-patching@ ML as well, in consistency with Josh ]

On Sun, 22 Feb 2015, Ingo Molnar wrote:

> It's all still tons of work to pull off a 'live kernel upgrade' on 
> native hardware, but IMHO it's tons of very useful work that helps a 
> dozen non-competing projects, literally.

Yes, I agree, it might be nice-to-have feature. The only issue with that 
is that it's solving a completely different problem than live patching.

Guys working on criu have made quite a few steps in that direction already, 
of course; modulo bugs and current implementation limitations, you 
should be able to checkpoint your userspace, kexec to a new kernel, and 
restart your userspace.

But if you ask the folks who are hungry for live bug patching, they 
wouldn't care.

You mentioned "10 seconds", that's more or less equal to infinity to them. 
And frankly, even "10 seconds" is something we can't really guarantee. We 
could optimize the kernel the craziest way we can, but hardware takes its 
time to reinitialize. And in most cases, you'd really need to reinitialize 
it; I don't see a way how you could safely suspend it somehow in the old 
kernel and resume it in a new one, because the driver suspending the 
device might be completely different than the driver resuming the device. 
How are you able to provide hard guarantees that this is going to work?

So all in all, if you ask me -- yes, live kernel upgrades from v3.20 to 
v3.21, pretty cool feature. Is it related to the problem we are after with 
live bug patching? I very much don't think so.

Thanks,

-- 
Jiri Kosina
SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Jiri Kosina
On Sun, 22 Feb 2015, Josh Poimboeuf wrote:

> Yes, there have been some suggestions that we should support multiple 
> consistency models, but I haven't heard any good reasons that would 
> justify the added complexity.

I tend to agree, consistency models were just a temporary idea that seems 
likely to become unnecessary given all the ideas on the unified solution 
that have been presented so far.

(Well, with a small exception to this -- I still think we should be able 
to "fire and forget" for patches where it's guaranteed that no 
housekeeping is necessary -- my favorite example is again fixing out of 
bounds access in a certain syscall entry ... i.e. the "super-simple" 
consistency model).
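
(Illustrative aside, not part of the original mail: a made-up example of the
kind of fix the "super-simple" model is meant for - old and new function
take the same arguments, change no data semantics, and are therefore
interchangeable at any instant, so no housekeeping is needed:)

/* Hypothetical syscall helper, before and after a bounds fix.  Any task
 * may run either version at any time without ill effect. */
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>

static long copy_flags_old(u32 __user *dst, const u32 *src, int n)
{
	int i;

	for (i = 0; i <= n; i++)	/* off-by-one: reads past src[n-1] */
		if (put_user(src[i], dst + i))
			return -EFAULT;
	return 0;
}

static long copy_flags_new(u32 __user *dst, const u32 *src, int n)
{
	int i;

	for (i = 0; i < n; i++)		/* bounds respected */
		if (put_user(src[i], dst + i))
			return -EFAULT;
	return 0;
}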

-- 
Jiri Kosina
SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Josh Poimboeuf
On Sun, Feb 22, 2015 at 08:37:58AM -0600, Josh Poimboeuf wrote:
> On Sun, Feb 22, 2015 at 10:46:39AM +0100, Ingo Molnar wrote:
> >  - the whole 'consistency model' talk both projects employ 
> >reminds me of how we grew 'security modules': where 
> >people running various mediocre projects would in the 
> >end not seek to create a superior upstream project, but 
> >would seek the 'consensus' in the form of cross-acking 
> >each others' patches as long as their own code got 
> >upstream as well ...
> 
> That's just not the case.  The consistency models were used to describe
> the features and the pros and cons of the different approaches.
> 
> The RFC is not a compromise to get "cross-acks".  IMO it's an
> improvement on both kpatch and kGraft.  See the RFC cover letter [1] and
> the original consistency model discussion [2] for more details.

BTW, I proposed that with my RFC we only need a _single_ consistency
model.

Yes, there have been some suggestions that we should support multiple
consistency models, but I haven't heard any good reasons that would
justify the added complexity.

-- 
Josh


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Josh Poimboeuf
[ adding live-patching mailing list to CC ]

On Sun, Feb 22, 2015 at 10:46:39AM +0100, Ingo Molnar wrote:
> * Ingo Molnar  wrote:
> > Anyway, let me try to reboot this discussion back to 
> > technological details by summing up my arguments in 
> > another mail.
> 
> So here's how I see the kGraft and kpatch series. To not 
> put too fine a point on it, I think they are fundamentally 
> misguided in both implementation and in design, which turns 
> them into an (unwilling) extended arm of the security 
> theater:
> 
>  - kGraft creates a 'mixed' state where old kernel
>functions and new kernel functions are allowed to
>co-exist,

Yes, some tasks may be running old functions and some tasks may be
running new functions.  This would only cause a problem if there are
changes to global data semantics.  We have guidelines the patch author
can follow to ensure that this isn't a problem.

>attempting to get the patching done within a bound
>amount of time.

Don't forget about my RFC [1] which converges the system to a patched
state within a few seconds.  If the system isn't patched by then, the
user space tool can trigger a safe patch revert.
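
(Illustrative aside, not part of the original mail: a much-simplified sketch
of that per-task convergence idea. TIF_PATCH_PENDING is a made-up flag name
standing for "this task has not yet passed a safe point since the patch was
applied"; none of this is the actual RFC code:)

/* Clearing the flag at the kernel/userspace boundary switches one task to
 * the new world; once no task carries the flag, the transition is complete
 * (and if that never happens, the user-space tool can revert instead). */
#include <linux/sched.h>

static void patch_update_task(struct task_struct *task)
{
	/* called for 'current' on its way back to user space */
	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
}

static bool patch_transition_complete(void)
{
	struct task_struct *g, *t;
	bool complete = true;

	read_lock(&tasklist_lock);
	for_each_process_thread(g, t) {
		if (test_tsk_thread_flag(t, TIF_PATCH_PENDING)) {
			complete = false;
			goto out;
		}
	}
out:
	read_unlock(&tasklist_lock);
	return complete;
}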

>  - kpatch uses kernel stack backtraces to determine whether
>a task is executing a function or not - which IMO is
>fundamentally fragile as kernel stack backtraces are
>'debug info' and are maintained and created as such:
>we've had long lasting stack backtrace bugs which would
>now be turned into 'potentially patching a live
>function' type of functional (and hard to debug) bugs.
>I didn't see much effort that tries to turn this
>equation around and makes kernel stacktraces more
>robust.

Again, I proposed several stack unwinding validation improvements which
would make this a non-issue IMO.
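
(Illustrative aside, not part of the original mail: a simplified sketch of
the kpatch-style stack check being argued about, written against the
stacktrace API as it looked around this time; struct patched_func and the
helper name are made up:)

/* Refuse to switch a task over while any saved return address on its
 * stack still points into one of the functions being replaced. */
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>

struct patched_func {
	unsigned long old_addr;		/* start of the replaced function */
	unsigned long old_size;		/* its length in bytes */
};

static bool task_stack_touches_patch(struct task_struct *task,
				     struct patched_func *funcs, int nfuncs)
{
	unsigned long entries[32];
	struct stack_trace trace = {
		.entries	= entries,
		.max_entries	= ARRAY_SIZE(entries),
	};
	unsigned int i;
	int j;

	save_stack_trace_tsk(task, &trace);

	for (i = 0; i < trace.nr_entries; i++)
		for (j = 0; j < nfuncs; j++)
			if (entries[i] >= funcs[j].old_addr &&
			    entries[i] < funcs[j].old_addr + funcs[j].old_size)
				return true;	/* not safe to patch now */
	return false;
}

As written, this check is only as good as the unwinder - which is exactly
the fragility being debated here.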

>  - the whole 'consistency model' talk both projects employ 
>reminds me of how we grew 'security modules': where 
>people running various mediocre projects would in the 
>end not seek to create a superior upstream project, but 
>would seek the 'consensus' in the form of cross-acking 
>each others' patches as long as their own code got 
>upstream as well ...

That's just not the case.  The consistency models were used to describe
the features and the pros and cons of the different approaches.

The RFC is not a compromise to get "cross-acks".  IMO it's an
improvement on both kpatch and kGraft.  See the RFC cover letter [1] and
the original consistency model discussion [2] for more details.

>I'm not blaming Linus for giving in to allowing security
>modules: they might be the right model for such a hard 
>to define and in good part psychological discipline as 
>'security', but I sure don't see the necessity of doing
>that for 'live kernel patching'.
> 
> More importantly, both kGraft and kpatch are pretty limited 
> in what kinds of updates they allow, and neither kGraft nor 
> kpatch has any clear path towards applying more complex 
> fixes to kernel images that I can see: kGraft can only 
> apply the simplest of fixes where both versions of a 
> function are interchangeable, and kpatch is only marginally 
> better at that - and that's pretty fundamental to both 
> projects!

Sorry, but that is just not true.  We can apply complex patches,
including "non-interchangeable functions" and data structures/semantics.

The catch is that it requires the patch author to put in the work to
modify the patch to make it compatible with live patching.  But that's
an acceptable tradeoff for distros who want to support live patching.

> I think all of these problems could be resolved by shooting 
> for the moon instead:
> 
>   - work towards allowing arbitrary live kernel upgrades!
> 
> not just 'live kernel patches'.
> 
> Work towards the goal of full live kernel upgrades between 
> any two versions of a kernel that supports live kernel 
> upgrades (and that doesn't have fatal bugs in the kernel 
> upgrade support code requiring a hard system restart).
> 
> Arbitrary live kernel upgrades could be achieved by 
> starting with the 'simple method' I outlined in earlier 
> mails, using some of the methods that kpatch and kGraft are 
> both utilizing or planning to utilize:
> 
>   - implement user task and kthread parking to get the 
> kernel into quiescent state.
> 
>   - implement (optional, thus ABI-compatible) 
> system call interruptability and restartability 
> support.
> 
>   - implement task state and (limited) device state
> snapshotting support
> 
>   - implement live kernel upgrades by:
> 
>   - snapshotting all system state transparently
> 
>   - fast-rebooting into the new kernel image without 
> shutting down and rebooting user-space, i.e. _much_ 
> faster than a regular reboot.
> 
>   - restoring system state transparently within the new 
> kernel image and resuming system workloads where 
> they were left.
> 
> Even complex external 

Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Ingo Molnar

* Ingo Molnar  wrote:

> We have many of the building blocks in place and have 
> them available:
> 
>   - the freezer code already attempts at parking/unparking
> threads transparently, that could be fixed/extended.
> 
>   - hibernation, regular suspend/resume and in general
> power management has in essence already implemented
> most building blocks needed to enumerate and
> checkpoint/restore device state that otherwise gets
> lost in a shutdown/reboot cycle.
> 
>   - c/r patches started user state enumeration and
> checkpoint/restore logic

I forgot to mention:

- kexec allows the loading and execution of a new 
  kernel image.
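
(Illustrative aside, not part of the original mail: a minimal user-space
sketch of that kexec building block, using kexec_file_load() - available
since Linux 3.17 with CONFIG_KEXEC_FILE - plus the KEXEC reboot command. It
only stages and jumps into the new kernel; all of the snapshot/restore work
described above is the part that is missing. The command line below is an
example only.)

/* kexec-upgrade.c: stage a new kernel image and jump into it, skipping
 * firmware and bootloader.  Needs root and libc headers that define
 * SYS_kexec_file_load. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/reboot.h>

int main(int argc, char **argv)
{
	const char *cmdline = "root=/dev/sda1 ro quiet";	/* example only */
	int kernel_fd, initrd_fd;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <vmlinuz> <initrd>\n", argv[0]);
		return 1;
	}
	kernel_fd = open(argv[1], O_RDONLY);
	initrd_fd = open(argv[2], O_RDONLY);
	if (kernel_fd < 0 || initrd_fd < 0) {
		perror("open");
		return 1;
	}

	/* stage the new kernel image; nothing is disturbed yet */
	if (syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
		    strlen(cmdline) + 1, cmdline, 0UL) < 0) {
		perror("kexec_file_load");
		return 1;
	}

	sync();
	/* the disruptive step: jump straight into the staged kernel */
	syscall(SYS_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
		LINUX_REBOOT_CMD_KEXEC, NULL);
	perror("reboot");			/* reached only on failure */
	return 1;
}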

It's all still tons of work to pull off a 'live kernel 
upgrade' on native hardware, but IMHO it's tons of very 
useful work that helps a dozen non-competing projects, 
literally.

Thanks,

Ingo


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Ingo Molnar

* Ingo Molnar  wrote:

>   - implement live kernel upgrades by:
> 
>   - snapshotting all system state transparently

Note that this step can be sped up further in the end, 
because most of this work can be performed asynchronously 
and transparently prior to the live kernel upgrade step 
itself.

So if we split the snapshotting+parking preparatory step 
into two parts:

- do opportunistic snapshotting of 
  sleeping/inactive user tasks while allowing 
  snapshotted tasks to continue to run

- once that is completed, do snapshotting+parking 
  of all user tasks, even running ones

The first step is largely asynchronous, can be done with 
lower priority and does not park/stop any tasks on the 
system.

Only the second step counts as 'system stoppage time': and 
only those tasks have to be snapshotted again which 
executed any code since the first snapshotting run was 
performed.

Note that even this stoppage time can be reduced further: 
if a system is running critical services/users that need as 
little interruption as possible, they could be 
prioritized/ordered to be snapshotted/parked closest to the 
live kernel upgrade step.

>   - fast-rebooting into the new kernel image without 
> shutting down and rebooting user-space, i.e. _much_ 
> faster than a regular reboot.
> 
>   - restoring system state transparently within the new 
> kernel image and resuming system workloads where 
> they were left.
> 
> Even complex external state like TCP socket state and 
> graphics state can be preserved over an upgrade. As far 
> as the user is concerned, nothing happened but a brief 
> pause - and he's now running a v3.21 kernel, not v3.20.

So all this would allow 'live, rolling kernel upgrades' in 
the end.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Ingo Molnar

* Ingo Molnar mi...@kernel.org wrote:

   - implement live kernel upgrades by:
 
   - snapshotting all system state transparently

Note that this step can be sped up further in the end, 
because most of this work can be performed asynchronously 
and transparently prior to the live kernel upgrade step 
itself.

So if we split the snapshotting+parking preparatory step 
into two parts:

- do opportunistic snapshotting of 
  sleeping/inactive user tasks while allowing 
  snapshotted tasks to continue to run

- once that is completed, do snapshotting+parking 
  of all user tasks, even running ones

The first step is largely asynchronous, can be done with 
lower priority and does not park/stop any tasks on the 
system.

Only the second step counts as 'system stoppage time': and 
only those tasks have to be snapshotted again which 
executed any code since the first snapshotting run was 
performed.

Note that even this stoppage time can be reduced further: 
if a system is running critical services/users that need as 
little interruption as possible, they could be 
prioritized/ordered to be snapshotted/parked closest to the 
live kernel upgrade step.

   - fast-rebooting into the new kernel image without 
 shutting down and rebooting user-space, i.e. _much_ 
 faster than a regular reboot.
 
   - restoring system state transparently within the new 
 kernel image and resuming system workloads where 
 they were left.
 
 Even complex external state like TCP socket state and 
 graphics state can be preserved over an upgrade. As far 
 as the user is concerned, nothing happened but a brief 
 pause - and he's now running a v3.21 kernel, not v3.20.

So all this would allow 'live, rolling kernel upgrades' in 
the end.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Ingo Molnar

* Ingo Molnar mi...@kernel.org wrote:

 We have many of the building blocks in place and have 
 them available:
 
   - the freezer code already attempts at parking/unparking
 threads transparently, that could be fixed/extended.
 
   - hibernation, regular suspend/resume and in general
 power management has in essence already implemented
 most building blocks needed to enumerate and
 checkpoint/restore device state that otherwise gets
 lost in a shutdown/reboot cycle.
 
   - c/r patches started user state enumeration and
 checkpoint/restore logic

I forgot to mention:

- kexec allows the loading and execution of a new 
  kernel image.

It's all still tons of work to pull off a 'live kernel 
upgrade' on native hardware, but IMHO it's tons of very 
useful work that helps a dozen non-competing projects, 
literally.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Josh Poimboeuf
[ adding live-patching mailing list to CC ]

On Sun, Feb 22, 2015 at 10:46:39AM +0100, Ingo Molnar wrote:
 * Ingo Molnar mi...@kernel.org wrote:
  Anyway, let me try to reboot this discussion back to 
  technological details by summing up my arguments in 
  another mail.
 
 So here's how I see the kGraft and kpatch series. To not 
 put too fine a point on it, I think they are fundamentally 
 misguided in both implementation and in design, which turns 
 them into an (unwilling) extended arm of the security 
 theater:
 
  - kGraft creates a 'mixed' state where old kernel
functions and new kernel functions are allowed to
co-exist,

Yes, some tasks may be running old functions and some tasks may be
running new functions.  This would only cause a problem if there are
changes to global data semantics.  We have guidelines the patch author
can follow to ensure that this isn't a problem.

attempting to get the patching done within a bound
amount of time.

Don't forget about my RFC [1] which converges the system to a patched
state within a few seconds.  If the system isn't patched by then, the
user space tool can trigger a safe patch revert.

  - kpatch uses kernel stack backtraces to determine whether
a task is executing a function or not - which IMO is
fundamentally fragile as kernel stack backtraces are
'debug info' and are maintained and created as such:
we've had long lasting stack backtrace bugs which would
now be turned into 'potentially patching a live
function' type of functional (and hard to debug) bugs.
I didn't see much effort that tries to turn this
equation around and makes kernel stacktraces more
robust.

Again, I proposed several stack unwinding validation improvements which
would make this a non-issue IMO.

  - the whole 'consistency model' talk both projects employ 
reminds me of how we grew 'security modules': where 
people running various mediocre projects would in the 
end not seek to create a superior upstream project, but 
would seek the 'consensus' in the form of cross-acking 
each others' patches as long as their own code got 
upstream as well ...

That's just not the case.  The consistency models were used to describe
the features and the pros and cons of the different approaches.

The RFC is not a compromise to get cross-acks.  IMO it's an
improvement on both kpatch and kGraft.  See the RFC cover letter [1] and
the original consistency model discussion [2] for more details.

I'm not blaming Linus for giving in to allowing security
modules: they might be the right model for such a hard 
to define and in good part psychological discipline as 
'security', but I sure don't see the necessity of doing
that for 'live kernel patching'.
 
 More importantly, both kGraft and kpatch are pretty limited 
 in what kinds of updates they allow, and neither kGraft nor 
 kpatch has any clear path towards applying more complex 
 fixes to kernel images that I can see: kGraft can only 
 apply the simplest of fixes where both versions of a 
 function are interchangeableand kpatch is only marginally 
 better at that - and that's pretty fundamental to both 
 projects!

Sorry, but that is just not true.  We can apply complex patches,
including non-interchangeable functions and data structures/semantics.

The catch is that it requires the patch author to put in the work to
modify the patch to make it compatible with live patching.  But that's
an acceptable tradeoff for distros who want to support live patching.

 I think all of these problems could be resolved by shooting 
 for the moon instead:
 
   - work towards allowing arbitrary live kernel upgrades!
 
 not just 'live kernel patches'.
 
 Work towards the goal of full live kernel upgrades between 
 any two versions of a kernel that supports live kernel 
 upgrades (and that doesn't have fatal bugs in the kernel 
 upgrade support code requiring a hard system restart).
 
 Arbitrary live kernel upgrades could be achieved by 
 starting with the 'simple method' I outlined in earlier 
 mails, using some of the methods that kpatch and kGraft are 
 both utilizing or planning to utilize:
 
   - implement user task and kthread parking to get the 
 kernel into quiescent state.
 
   - implement (optional, thus ABI-compatible) 
 system call interruptability and restartability 
 support.
 
   - implement task state and (limited) device state
 snapshotting support
 
   - implement live kernel upgrades by:
 
   - snapshotting all system state transparently
 
   - fast-rebooting into the new kernel image without 
 shutting down and rebooting user-space, i.e. _much_ 
 faster than a regular reboot.
 
   - restoring system state transparently within the new 
 kernel image and resuming system workloads where 
 they were left.
 
 Even complex external state like TCP socket state and 
 graphics state can be preserved over an 

Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Jiri Kosina

[ added live-patching@ ML as well, in consistency with Josh ]

On Sun, 22 Feb 2015, Ingo Molnar wrote:

 It's all still tons of work to pull off a 'live kernel upgrade' on 
 native hardware, but IMHO it's tons of very useful work that helps a 
 dozen non-competing projects, literally.

Yes, I agree, it might be nice-to-have feature. The only issue with that 
is that it's solving a completely different problem than live patching.

Guys working on criu have made quite a few steps in that direction of 
already course; modulo bugs and current implementation limitations, you 
should be able to checkpoint your userspace, kexec to a new kernel, and 
restart your userspace.

But if you ask the folks who are hungry for live bug patching, they 
wouldn't care.

You mentioned 10 seconds, that's more or less equal to infinity to them. 
And frankly, even 10 seconds is something we can't really guarantee. We 
could optimize the kernel the craziest way we can, but hardware takes its 
time to reinitialize. And in most cases, you'd really need to reinitalize 
it; I don't see a way how you could safely suspend it somehow in the old 
kernel and resume it in a new one, because the driver suspending the 
device might be completely different than the driver resuming the device. 
How are you able to provide hard guarantees that this is going to work?

So all in all, if you ask me -- yes, live kernel upgrades from v3.20 to 
v3.21, pretty cool feature. Is it related to the problem we are after with 
live bug patching? I very much don't think so.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Josh Poimboeuf
On Sun, Feb 22, 2015 at 08:37:58AM -0600, Josh Poimboeuf wrote:
 On Sun, Feb 22, 2015 at 10:46:39AM +0100, Ingo Molnar wrote:
   - the whole 'consistency model' talk both projects employ 
 reminds me of how we grew 'security modules': where 
 people running various mediocre projects would in the 
 end not seek to create a superior upstream project, but 
 would seek the 'consensus' in the form of cross-acking 
 each others' patches as long as their own code got 
 upstream as well ...
 
 That's just not the case.  The consistency models were used to describe
 the features and the pros and cons of the different approaches.
 
 The RFC is not a compromise to get cross-acks.  IMO it's an
 improvement on both kpatch and kGraft.  See the RFC cover letter [1] and
 the original consistency model discussion [2] for more details.

BTW, I proposed that with my RFC we only need a _single_ consistency
model.

Yes, there have been some suggestions that we should support multiple
consistency models, but I haven't heard any good reasons that would
justify the added complexity.

-- 
Josh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Jiri Kosina
On Sun, 22 Feb 2015, Josh Poimboeuf wrote:

 Yes, there have been some suggestions that we should support multiple 
 consistency models, but I haven't heard any good reasons that would 
 justify the added complexity.

I tend to agree, consistency models were just a temporary idea that seems 
to likely become unnecessary given all the ideas on the unified solution 
that have been presented so far.

(Well, with a small exception to this -- I still think we should be able 
to fire and forget for patches where it's guaranteed that no 
housekeeping is necessary -- my favorite example is again fixing out of 
bounds access in a certain syscall entry ... i.e. the super-simple 
consistency model).

-- 
Jiri Kosina
SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Andrew Morton
On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina jkos...@suse.cz wrote:

 But if you ask the folks who are hungry for live bug patching, they 
 wouldn't care.
 
 You mentioned 10 seconds, that's more or less equal to infinity to them. 

"10 seconds outage is unacceptable, but we're running our service on a
single machine with no failover".  Who is doing this??


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Vojtech Pavlik
On Sun, Feb 22, 2015 at 03:01:48PM -0800, Andrew Morton wrote:

 On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina jkos...@suse.cz wrote:
 
  But if you ask the folks who are hungry for live bug patching, they 
  wouldn't care.
  
  You mentioned 10 seconds, that's more or less equal to infinity to them. 
 
 "10 seconds outage is unacceptable, but we're running our service on a
 single machine with no failover".  Who is doing this??

This is the most common argument that's raised when live patching is
discussed. Why do we need live patching when we have redundancy?

People who are asking for live patching typically do have failover in
place, but prefer not to have to use it when they don't have to.

In many cases, the failover just can't be made transparent to the
outside world and there is a short outage. Examples would be legacy
applications which can't run in an active-active cluster and need to be
restarted on failover. Or trading systems, where the calculations must
be strictly serialized and response times are counted in tens of
microseconds. 

Another usecase is large HPC clusters, where all nodes have to run
carefully synchronized. Once one gets behind in a calculation cycle,
others have to wait for the results and the efficiency of the whole
cluster goes down. There are people who run realtime on them for
that reason. Dumping all data and restarting the HPC cluster takes a lot
of time and many nodes (out of tens of thousands) may not come back up,
making the restore from media difficult. Doing a rolling upgrade causes
the nodes to stall, one by one, for 10+ seconds, which times 10k is a
long time, too.

And even the case where you have a perfect setup with everything
redundant and with instant failover does benefit from live patching.
Since you have to plan for failure, you have to plan for failure while
patching, too. With live patching you need 2 servers minimum (or N+1),
without it you need 3 (or N+2), as one will be offline during the
upgrade process.

10 seconds of outage may be acceptable in a disaster scenario. Not
necessarily for a regular update scenario.

The value of live patching is in near zero disruption.

-- 
Vojtech Pavlik
Director SUSE Labs


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Dave Airlie
On 23 February 2015 at 09:01, Andrew Morton a...@linux-foundation.org wrote:
 On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina jkos...@suse.cz wrote:

 But if you ask the folks who are hungry for live bug patching, they
 wouldn't care.

 You mentioned 10 seconds, that's more or less equal to infinity to them.

 "10 seconds outage is unacceptable, but we're running our service on a
 single machine with no failover".  Who is doing this??

if I had to guess, telcos generally, you've only got one wire between a phone
and the exchange and if the switch on the end needs patching it better be fast.

Dave.


Re: live kernel upgrades (was: live kernel patching design)

2015-02-22 Thread Arjan van de Ven
There's failover, there's running the core services in VMs (which can
migrate)...
I think 10 seconds is Ingo exaggerating a bit, since you can boot a
full system in a lot less time than that, and more so if you know more
about the system (e.g. you don't need to spin down and then discover
and spin up disks). If you're talking about inside a VM, it's even more
extreme than that.


Now, live patching sounds great as an ideal, but it may end up being
(mostly) similar to hardware hotplug: Everyone wants it, but nobody
wants to use it
(and just waits for a maintenance window instead). In the hotplug
case, while people say they want it, they're also aware that hardware
hotplug is fundamentally messy, and then nobody wants to do it on that
mission critical piece of hardware outside the maintenance window.
(hotswap drives seem to have been the exception to this; that seems to
have been worked out well enough, but that's replace-with-the-same).
I would be very afraid that hot kernel patching ends up in the same
space: The super-mission-critical folks are what it's aimed at, while
those are the exact same folks that would rather wait for the
maintenance window.

There are a lot of logistical issues (can you patch a patched system...
if live patching is a first-class citizen you end up with dozens and
dozens of live patches applied, some out of sequence, etc. etc.).
There's the "which patches do I have" question, and if the first patch
for a security hole was not complete, how do I cope by applying number
two? There's the "which of my 50,000 servers have which patch applied"
logistics.
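
To get a feel for the bookkeeping involved, imagine each patched
function carrying a stack of replacements, newest on top (a simplified
sketch, not the actual kpatch or kGraft code):

#include <linux/list.h>

/* Simplified sketch: calls to a patched function get redirected to
 * whatever replacement is currently on top of its stack. */
struct func_impl {
        struct list_head node;
        unsigned long    addr;          /* address of this replacement */
};

struct patched_func {
        unsigned long    orig_addr;     /* the original function */
        struct list_head impl_stack;    /* struct func_impl, newest first */
};

static unsigned long resolve_call(struct patched_func *pf)
{
        struct func_impl *top;

        top = list_first_entry_or_null(&pf->impl_stack,
                                       struct func_impl, node);
        return top ? top->addr : pf->orig_addr;
}

Applying a patch pushes onto that stack; reverting one has to pop the
right entry -- and that ordering has to stay straight for every
affected function on every one of those 50,000 machines.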

And Ingo is absolutely right: The scope is very fuzzy. Today's bugfix
is tomorrow's "oh oops, it turns out exploitable".

I will throw a different hat in the ring: Maybe we don't want a full
kernel update as step one, maybe we want this at the kernel module
level: hot-swap of kernel modules, where a kernel module makes itself
go quiet and serializes its state (suspend, pretty much), then gets
swapped out (hot) by its replacement, which then deserializes the
state and continues.
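
Roughly, the interface could look something like this (names and hooks
are entirely made up, nothing like this exists in the kernel today):

#include <linux/types.h>

/* Hypothetical hooks a hot-swappable module would implement. */
struct module_swap_ops {
        /* Stop accepting new work and drain in-flight requests. */
        int     (*quiesce)(void);
        /* Serialize versioned state into a buffer the kernel keeps
         * across the swap; returns bytes written or -errno. */
        ssize_t (*save_state)(void *buf, size_t len);
        /* Called in the replacement module with the saved state. */
        int     (*restore_state)(const void *buf, size_t len);
        /* Resume normal operation. */
        void    (*resume)(void);
};

The hard part, as with suspend/resume, is that the saved state format
becomes an ABI between the old and the new version of the module.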

If we can do this on a module level, then the next step is treating
more components of the kernel as modules, which is a fundamental
modularity thing.



On Sun, Feb 22, 2015 at 4:18 PM, Dave Airlie airl...@gmail.com wrote:
 On 23 February 2015 at 09:01, Andrew Morton a...@linux-foundation.org wrote:
 On Sun, 22 Feb 2015 20:13:28 +0100 (CET) Jiri Kosina jkos...@suse.cz wrote:

 But if you ask the folks who are hungry for live bug patching, they
 wouldn't care.

 You mentioned 10 seconds, that's more or less equal to infinity to them.

 "10 seconds outage is unacceptable, but we're running our service on a
 single machine with no failover".  Who is doing this??

 if I had to guess, telcos generally, you've only got one wire between a phone
 and the exchange and if the switch on the end needs patching it better be 
 fast.

 Dave.