Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 11:32:15AM +0200, Pavel Machek wrote:
>Hi!
>
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>
>Yes, I noticed. And what I'm saying is that perhaps you should follow
>the rules more strictly.

Again, this was stated many times by Greg and others, the rules are not
there to be strictly followed.

>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously corrent, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>Actually, I was surprised with PTI going to stable. That was clearly
>against the rules. Maybe the security bug was ugly enough to warrant
>that.
>
>But please don't use it as an argument for applying any random
>patches...

How about this: if a -stable maintainer has concerns with how I follow
the -stable rules, he's more than welcome to reject my patches. Sounds
like a plan?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 11:32:15AM +0200, Pavel Machek wrote:
>Hi!
>
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>
>Yes, I noticed. And what I'm saying is that perhaps you should follow
>the rules more strictly.

Again, this was stated many times by Greg and others, the rules are not
there to be strictly followed.

>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously corrent, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>Actually, I was surprised with PTI going to stable. That was clearly
>against the rules. Maybe the security bug was ugly enough to warrant
>that.
>
>But please don't use it as an argument for applying any random
>patches...

How about this: if a -stable maintainer has concerns with how I follow
the -stable rules, he's more than welcome to reject my patches. Sounds
like a plan?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 11:36:51AM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> >> We may be able to guesstimate the 'regression chance', but there's no
>> >> way we can guess the 'annoyance' once. There are so many different use
>> >> cases that we just can't even guess how many people would get "annoyed"
>> >> by something.
>> >
>> >As a maintainer, I hope I have reasonable idea what are common use cases
>> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>> >know all of the use cases so people doing unusual stuff hit more bugs and
>> >have to report them to get fixes included in -stable. But for me this is a
>> >preferable tradeoff over the risk of regression so this is the rule I use
>> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>> >with "those who do the work decide" principle so pick whatever patches you
>> >think are appropriate, I just wanted explain why I don't think more patches
>> >in stable are necessarily good.
>>
>> The AUTOSEL story is different for subsystems that don't do -stable, and
>> subsystems that are actually doing the work (like yourself).
>>
>> I'm not trying to override active maintainers, I'm trying to help them
>> make decisions.
>
>Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
>stuff from autosel work?

Curiousity got me, and I had to see what these subsystems do as far as
stable commits:

$ git log --oneline --grep 'stable@vger' --since="01-01-2016" kernel/power 
drivers/leds drivers/media/i2c/et8ek8 drivers/media/i2c/ad5820.c 
arch/x86/kernel/acpi/ | wc -l
7

Which got me a bit surprised: maybe indeed leds is mostly fine, but
hibernation is definitely tricky, I've been stung by it a few times...

So why not pick something an actual user reported, and see how that was
dealt with?

Googling first showed this:

https://bugzilla.kernel.org/show_bug.cgi?id=97201

Which was fixed by:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdbc98abb3aa323f6323b11db39c740e6f8fc5b1

But that's not in any -stable tree. Hmm.. ok..

Next one on google was:

https://bugzilla.kernel.org/show_bug.cgi?id=117971

Which, in turn, was fixed by:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b3f249c94ce1f46bacd9814385b0ee2d1ae52f3

Oh look at that, it's not in -stable either...

So seeing how you have concerns with my selection of -stable commits,
maybe you could explain to me why these commits didn't end up in
-stable?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 11:36:51AM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> >> We may be able to guesstimate the 'regression chance', but there's no
>> >> way we can guess the 'annoyance' once. There are so many different use
>> >> cases that we just can't even guess how many people would get "annoyed"
>> >> by something.
>> >
>> >As a maintainer, I hope I have reasonable idea what are common use cases
>> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>> >know all of the use cases so people doing unusual stuff hit more bugs and
>> >have to report them to get fixes included in -stable. But for me this is a
>> >preferable tradeoff over the risk of regression so this is the rule I use
>> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>> >with "those who do the work decide" principle so pick whatever patches you
>> >think are appropriate, I just wanted explain why I don't think more patches
>> >in stable are necessarily good.
>>
>> The AUTOSEL story is different for subsystems that don't do -stable, and
>> subsystems that are actually doing the work (like yourself).
>>
>> I'm not trying to override active maintainers, I'm trying to help them
>> make decisions.
>
>Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
>stuff from autosel work?

Curiousity got me, and I had to see what these subsystems do as far as
stable commits:

$ git log --oneline --grep 'stable@vger' --since="01-01-2016" kernel/power 
drivers/leds drivers/media/i2c/et8ek8 drivers/media/i2c/ad5820.c 
arch/x86/kernel/acpi/ | wc -l
7

Which got me a bit surprised: maybe indeed leds is mostly fine, but
hibernation is definitely tricky, I've been stung by it a few times...

So why not pick something an actual user reported, and see how that was
dealt with?

Googling first showed this:

https://bugzilla.kernel.org/show_bug.cgi?id=97201

Which was fixed by:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdbc98abb3aa323f6323b11db39c740e6f8fc5b1

But that's not in any -stable tree. Hmm.. ok..

Next one on google was:

https://bugzilla.kernel.org/show_bug.cgi?id=117971

Which, in turn, was fixed by:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b3f249c94ce1f46bacd9814385b0ee2d1ae52f3

Oh look at that, it's not in -stable either...

So seeing how you have concerns with my selection of -stable commits,
maybe you could explain to me why these commits didn't end up in
-stable?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 11:47:24AM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
>> >On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> >> So I think that Linus's claim that users come first applies here as
>> >> well. If there's a user that cares about a particular feature being
>> >> broken, then we go ahead and fix his bug rather then ignoring him.
>> >
>> >So one extreme is fixing -stable *iff* users actually do report an issue.
>> >
>> >The other extreme is backporting everything that potentially looks like a
>> >potential fix of "something" (according to some arbitrary metric),
>> >pro-actively.
>> >
>> >The former voilates the "users first" rule, the latter has a very, very
>> >high risk of regressions.
>> >
>> >So this whole debate is about finding a compromise.
>> >
>> >My gut feeling always was that the statement in
>> >
>> >Documentation/process/stable-kernel-rules.rst
>> >
>> >is very reasonable, but making the process way more "aggresive" when
>> >backporting patches is breaking much of its original spirit for me.
>>
>> I agree that as an enterprise distro taking everything from -stable
>> isn't the best idea. Ideally you'd want to be close to the first
>
>Original purpose of -stable was "to be common base of enterprise
>distros" and our documentation still says it is.

I guess that the world changes?

At this point calling enterprise distros a niche wouldn't be too far
from the truth. Furthermore, some enterprise distros (as stated
earlier in this thread) don't even follow -stable anymore and cherry
pick their own commits.

So no, the main driving force behind -stable is not traditional
enterprise distributions.

>> I think that we can agree that it's impossible to expect every single
>> Linux user to go on LKML and complain about a bug he encountered, so the
>> rule quickly becomes "It must fix a real bug that can bother
>> people".
>
>I think you are playing dangerous word games.
>
>> My "aggressiveness" comes from the whole "bother" part: it doesn't have
>> to be critical, it doesn't have to cause data corruption, it doesn't
>> have to be a security issue. It's enough that the bug actually affects a
>> user in a way he didn't expect it to (if a user doesn't have
>> expectations, it would fall under the "This could be a problem..."
>> exception.
>
>And it seems documentation says you should be less aggressive and
>world tells you they expect to be less aggressive. So maybe that's
>what you should do?

Who is this "world" you're referring to?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 11:47:24AM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
>> >On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> >> So I think that Linus's claim that users come first applies here as
>> >> well. If there's a user that cares about a particular feature being
>> >> broken, then we go ahead and fix his bug rather then ignoring him.
>> >
>> >So one extreme is fixing -stable *iff* users actually do report an issue.
>> >
>> >The other extreme is backporting everything that potentially looks like a
>> >potential fix of "something" (according to some arbitrary metric),
>> >pro-actively.
>> >
>> >The former voilates the "users first" rule, the latter has a very, very
>> >high risk of regressions.
>> >
>> >So this whole debate is about finding a compromise.
>> >
>> >My gut feeling always was that the statement in
>> >
>> >Documentation/process/stable-kernel-rules.rst
>> >
>> >is very reasonable, but making the process way more "aggresive" when
>> >backporting patches is breaking much of its original spirit for me.
>>
>> I agree that as an enterprise distro taking everything from -stable
>> isn't the best idea. Ideally you'd want to be close to the first
>
>Original purpose of -stable was "to be common base of enterprise
>distros" and our documentation still says it is.

I guess that the world changes?

At this point calling enterprise distros a niche wouldn't be too far
from the truth. Furthermore, some enterprise distros (as stated
earlier in this thread) don't even follow -stable anymore and cherry
pick their own commits.

So no, the main driving force behind -stable is not traditional
enterprise distributions.

>> I think that we can agree that it's impossible to expect every single
>> Linux user to go on LKML and complain about a bug he encountered, so the
>> rule quickly becomes "It must fix a real bug that can bother
>> people".
>
>I think you are playing dangerous word games.
>
>> My "aggressiveness" comes from the whole "bother" part: it doesn't have
>> to be critical, it doesn't have to cause data corruption, it doesn't
>> have to be a security issue. It's enough that the bug actually affects a
>> user in a way he didn't expect it to (if a user doesn't have
>> expectations, it would fall under the "This could be a problem..."
>> exception.
>
>And it seems documentation says you should be less aggressive and
>world tells you they expect to be less aggressive. So maybe that's
>what you should do?

Who is this "world" you're referring to?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 12:04:41PM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>> >On Tue, 17 Apr 2018, Sasha Levin wrote:
>> >
>> >> How do I get the XFS folks to send their stuff to -stable? (we have
>> >> quite a few customers who use XFS)
>> >
>> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>> >maintainers to deal with stable, we just have to accept that and find an
>> >answer to that.
>>
>> This is exactly what I'm doing. Many subsystems don't have enough
>> manpower to deal with -stable, so I'm trying to help.
>
>...and the torrent of spams from the AUTOSEL subsystem actually makes
>that worse.
>
>And when you are told particular fix to LEDs is not that important
>after all, you start arguing about nuclear power plants (without
>really knowing how critical subsystems work).

Obviously your knowledge far surpasses mine.

>If you want cooperation with maintainers to work, the rules need to be
>clear, first. They are documented, so follow them. If you think rules
>are wrong, lets talk about changing the rules; but arguing "every bug
>is important because someone may be hitting it" is not ok.

I'm sorry but you're just unfamiliar with the process. I'd point out
that all my AUTOSEL commits go through Greg, who wrote the rules, and
accepts my patches.

The rules are there as a guideline to allow us to not take certain
patches, they're not there as a strict set of rules we must follow at
all times.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin
On Thu, May 03, 2018 at 12:04:41PM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>> >On Tue, 17 Apr 2018, Sasha Levin wrote:
>> >
>> >> How do I get the XFS folks to send their stuff to -stable? (we have
>> >> quite a few customers who use XFS)
>> >
>> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>> >maintainers to deal with stable, we just have to accept that and find an
>> >answer to that.
>>
>> This is exactly what I'm doing. Many subsystems don't have enough
>> manpower to deal with -stable, so I'm trying to help.
>
>...and the torrent of spams from the AUTOSEL subsystem actually makes
>that worse.
>
>And when you are told particular fix to LEDs is not that important
>after all, you start arguing about nuclear power plants (without
>really knowing how critical subsystems work).

Obviously your knowledge far surpasses mine.

>If you want cooperation with maintainers to work, the rules need to be
>clear, first. They are documented, so follow them. If you think rules
>are wrong, lets talk about changing the rules; but arguing "every bug
>is important because someone may be hitting it" is not ok.

I'm sorry but you're just unfamiliar with the process. I'd point out
that all my AUTOSEL commits go through Greg, who wrote the rules, and
accepts my patches.

The rules are there as a guideline to allow us to not take certain
patches, they're not there as a strict set of rules we must follow at
all times.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
> >On Tue, 17 Apr 2018, Sasha Levin wrote:
> >
> >> How do I get the XFS folks to send their stuff to -stable? (we have
> >> quite a few customers who use XFS)
> >
> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
> >maintainers to deal with stable, we just have to accept that and find an
> >answer to that.
> 
> This is exactly what I'm doing. Many subsystems don't have enough
> manpower to deal with -stable, so I'm trying to help.

...and the torrent of spams from the AUTOSEL subsystem actually makes
that worse.

And when you are told particular fix to LEDs is not that important
after all, you start arguing about nuclear power plants (without
really knowing how critical subsystems work).

If you want cooperation with maintainers to work, the rules need to be
clear, first. They are documented, so follow them. If you think rules
are wrong, lets talk about changing the rules; but arguing "every bug
is important because someone may be hitting it" is not ok.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
> >On Tue, 17 Apr 2018, Sasha Levin wrote:
> >
> >> How do I get the XFS folks to send their stuff to -stable? (we have
> >> quite a few customers who use XFS)
> >
> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
> >maintainers to deal with stable, we just have to accept that and find an
> >answer to that.
> 
> This is exactly what I'm doing. Many subsystems don't have enough
> manpower to deal with -stable, so I'm trying to help.

...and the torrent of spams from the AUTOSEL subsystem actually makes
that worse.

And when you are told particular fix to LEDs is not that important
after all, you start arguing about nuclear power plants (without
really knowing how critical subsystems work).

If you want cooperation with maintainers to work, the rules need to be
clear, first. They are documented, so follow them. If you think rules
are wrong, lets talk about changing the rules; but arguing "every bug
is important because someone may be hitting it" is not ok.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
> >On Mon, 16 Apr 2018, Sasha Levin wrote:
> >
> >> So I think that Linus's claim that users come first applies here as
> >> well. If there's a user that cares about a particular feature being
> >> broken, then we go ahead and fix his bug rather then ignoring him.
> >
> >So one extreme is fixing -stable *iff* users actually do report an issue.
> >
> >The other extreme is backporting everything that potentially looks like a
> >potential fix of "something" (according to some arbitrary metric),
> >pro-actively.
> >
> >The former voilates the "users first" rule, the latter has a very, very
> >high risk of regressions.
> >
> >So this whole debate is about finding a compromise.
> >
> >My gut feeling always was that the statement in
> >
> > Documentation/process/stable-kernel-rules.rst
> >
> >is very reasonable, but making the process way more "aggresive" when
> >backporting patches is breaking much of its original spirit for me.
> 
> I agree that as an enterprise distro taking everything from -stable
> isn't the best idea. Ideally you'd want to be close to the first

Original purpose of -stable was "to be common base of enterprise
distros" and our documentation still says it is.

> I think that we can agree that it's impossible to expect every single
> Linux user to go on LKML and complain about a bug he encountered, so the
> rule quickly becomes "It must fix a real bug that can bother
> people".

I think you are playing dangerous word games.

> My "aggressiveness" comes from the whole "bother" part: it doesn't have
> to be critical, it doesn't have to cause data corruption, it doesn't
> have to be a security issue. It's enough that the bug actually affects a
> user in a way he didn't expect it to (if a user doesn't have
> expectations, it would fall under the "This could be a problem..."
> exception.

And it seems documentation says you should be less aggressive and
world tells you they expect to be less aggressive. So maybe that's
what you should do?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
> >On Mon, 16 Apr 2018, Sasha Levin wrote:
> >
> >> So I think that Linus's claim that users come first applies here as
> >> well. If there's a user that cares about a particular feature being
> >> broken, then we go ahead and fix his bug rather then ignoring him.
> >
> >So one extreme is fixing -stable *iff* users actually do report an issue.
> >
> >The other extreme is backporting everything that potentially looks like a
> >potential fix of "something" (according to some arbitrary metric),
> >pro-actively.
> >
> >The former voilates the "users first" rule, the latter has a very, very
> >high risk of regressions.
> >
> >So this whole debate is about finding a compromise.
> >
> >My gut feeling always was that the statement in
> >
> > Documentation/process/stable-kernel-rules.rst
> >
> >is very reasonable, but making the process way more "aggresive" when
> >backporting patches is breaking much of its original spirit for me.
> 
> I agree that as an enterprise distro taking everything from -stable
> isn't the best idea. Ideally you'd want to be close to the first

Original purpose of -stable was "to be common base of enterprise
distros" and our documentation still says it is.

> I think that we can agree that it's impossible to expect every single
> Linux user to go on LKML and complain about a bug he encountered, so the
> rule quickly becomes "It must fix a real bug that can bother
> people".

I think you are playing dangerous word games.

> My "aggressiveness" comes from the whole "bother" part: it doesn't have
> to be critical, it doesn't have to cause data corruption, it doesn't
> have to be a security issue. It's enough that the bug actually affects a
> user in a way he didn't expect it to (if a user doesn't have
> expectations, it would fall under the "This could be a problem..."
> exception.

And it seems documentation says you should be less aggressive and
world tells you they expect to be less aggressive. So maybe that's
what you should do?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> >> We may be able to guesstimate the 'regression chance', but there's no
> >> way we can guess the 'annoyance' once. There are so many different use
> >> cases that we just can't even guess how many people would get "annoyed"
> >> by something.
> >
> >As a maintainer, I hope I have reasonable idea what are common use cases
> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
> >know all of the use cases so people doing unusual stuff hit more bugs and
> >have to report them to get fixes included in -stable. But for me this is a
> >preferable tradeoff over the risk of regression so this is the rule I use
> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
> >with "those who do the work decide" principle so pick whatever patches you
> >think are appropriate, I just wanted explain why I don't think more patches
> >in stable are necessarily good.
> 
> The AUTOSEL story is different for subsystems that don't do -stable, and
> subsystems that are actually doing the work (like yourself).
> 
> I'm not trying to override active maintainers, I'm trying to help them
> make decisions.

Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
stuff from autosel work?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> >> We may be able to guesstimate the 'regression chance', but there's no
> >> way we can guess the 'annoyance' once. There are so many different use
> >> cases that we just can't even guess how many people would get "annoyed"
> >> by something.
> >
> >As a maintainer, I hope I have reasonable idea what are common use cases
> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
> >know all of the use cases so people doing unusual stuff hit more bugs and
> >have to report them to get fixes included in -stable. But for me this is a
> >preferable tradeoff over the risk of regression so this is the rule I use
> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
> >with "those who do the work decide" principle so pick whatever patches you
> >think are appropriate, I just wanted explain why I don't think more patches
> >in stable are necessarily good.
> 
> The AUTOSEL story is different for subsystems that don't do -stable, and
> subsystems that are actually doing the work (like yourself).
> 
> I'm not trying to override active maintainers, I'm trying to help them
> make decisions.

Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
stuff from autosel work?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
Hi!

> >- It must be obviously correct and tested.
> >
> >If it introduces new bug, it is not correct, and certainly not
> >obviously correct.
> 
> As you might have noticed, we don't strictly follow the rules.

Yes, I noticed. And what I'm saying is that perhaps you should follow
the rules more strictly.

> Take a look at the whole PTI story as an example. It's way more than 100
> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> on, and yet it went in -stable!
> 
> Would you argue we shouldn't have backported PTI to -stable?

Actually, I was surprised with PTI going to stable. That was clearly
against the rules. Maybe the security bug was ugly enough to warrant
that.

But please don't use it as an argument for applying any random
patches...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek
Hi!

> >- It must be obviously correct and tested.
> >
> >If it introduces new bug, it is not correct, and certainly not
> >obviously correct.
> 
> As you might have noticed, we don't strictly follow the rules.

Yes, I noticed. And what I'm saying is that perhaps you should follow
the rules more strictly.

> Take a look at the whole PTI story as an example. It's way more than 100
> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> on, and yet it went in -stable!
> 
> Would you argue we shouldn't have backported PTI to -stable?

Actually, I was surprised with PTI going to stable. That was clearly
against the rules. Maybe the security bug was ugly enough to warrant
that.

But please don't use it as an argument for applying any random
patches...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 04:22:22PM +0200, Greg KH wrote:
> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > Sasha Levin  wrote:
> > > > > > 
> > > > > > > One of the things Greg is pushing strongly for is "bug 
> > > > > > > compatibility":
> > > > > > > we want the kernel to behave the same way between mainline and 
> > > > > > > stable.
> > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > 
> > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > broken as mainline?
> > > > > 
> > > > > This just means that if there is a fix that went in mainline, and the
> > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > 
> > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > what. We'd rather have the same thing broken between mainline and
> > > > > stable.
> > > > > 
> > > > 
> > > > Yeah, but _intentionally_ breaking existing setups to stay "bug 
> > > > compatible"
> > > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > > supported distro. Because end-users dont care about upstream breaking
> > > > stuff... its the distro that takes the heat for that...
> > > > 
> > > > Something "already broken" is not a regression...
> > > > 
> > > > As distro maintainer that means one now have to review _every_ patch 
> > > > that
> > > > carries "AUTOSEL", follow all the mail threads that comes up about it, 
> > > > then
> > > > track if it landed in -stable queue, and read every response and 
> > > > possible
> > > > objection to all patches in the -stable queue a second time around... 
> > > > then
> > > > check if it still got included in final stable point relase and then 
> > > > either
> > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > needed...
> > > > 
> > > > Just to avoid being "bug compatible with master"
> > > 
> > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > has in the past, so you had better also be reviewing all of my normal
> > > commits as well :)
> > > 
> > > Anyway, we are trying not to do this, but it does, and will,
> > > occasionally happen.
> > 
> > Sure, that's understood. So this was just misunderstanding. Sasha's
> > original comment really sounded like "bug compatibility" with current
> > master is desirable and that made me go "Are you serious?" as well...
> 
> As I said before in this thread, yes, sometimes I do this on purpose.
> 
> As an specific example, see a recent bluetooth patch that caused a
> regression on some chromebook devices.  The chromeos developers
> rightfully complainied, and I left the commit in there to provide the
> needed "leverage" on the upstream developers to fix this properly.
> Otherwise if I had reverted the stable patch, when people move to a
> newer kernel version, things break, and no one remembers why.
> 
> I also wrote a long response as to _why_ I do this, and even though it
> does happen, why it still is worth taking the stable updates.  Please
> see the archives for the full details.  I don't want to duplicate this
> again here.

And to be more specific, let's always take this as a case-by-case basis.
The last time this happened was the bluetooth bug and it was a fix for a
reported problem, but then the fix caused a regression so upstream
reverted it and I reverted it in the stable trees.  No matter what I
chose to do, someone would be upset so I followed what upstream did.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 04:22:22PM +0200, Greg KH wrote:
> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > Sasha Levin  wrote:
> > > > > > 
> > > > > > > One of the things Greg is pushing strongly for is "bug 
> > > > > > > compatibility":
> > > > > > > we want the kernel to behave the same way between mainline and 
> > > > > > > stable.
> > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > 
> > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > broken as mainline?
> > > > > 
> > > > > This just means that if there is a fix that went in mainline, and the
> > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > 
> > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > what. We'd rather have the same thing broken between mainline and
> > > > > stable.
> > > > > 
> > > > 
> > > > Yeah, but _intentionally_ breaking existing setups to stay "bug 
> > > > compatible"
> > > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > > supported distro. Because end-users dont care about upstream breaking
> > > > stuff... its the distro that takes the heat for that...
> > > > 
> > > > Something "already broken" is not a regression...
> > > > 
> > > > As distro maintainer that means one now have to review _every_ patch 
> > > > that
> > > > carries "AUTOSEL", follow all the mail threads that comes up about it, 
> > > > then
> > > > track if it landed in -stable queue, and read every response and 
> > > > possible
> > > > objection to all patches in the -stable queue a second time around... 
> > > > then
> > > > check if it still got included in final stable point relase and then 
> > > > either
> > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > needed...
> > > > 
> > > > Just to avoid being "bug compatible with master"
> > > 
> > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > has in the past, so you had better also be reviewing all of my normal
> > > commits as well :)
> > > 
> > > Anyway, we are trying not to do this, but it does, and will,
> > > occasionally happen.
> > 
> > Sure, that's understood. So this was just misunderstanding. Sasha's
> > original comment really sounded like "bug compatibility" with current
> > master is desirable and that made me go "Are you serious?" as well...
> 
> As I said before in this thread, yes, sometimes I do this on purpose.
> 
> As an specific example, see a recent bluetooth patch that caused a
> regression on some chromebook devices.  The chromeos developers
> rightfully complainied, and I left the commit in there to provide the
> needed "leverage" on the upstream developers to fix this properly.
> Otherwise if I had reverted the stable patch, when people move to a
> newer kernel version, things break, and no one remembers why.
> 
> I also wrote a long response as to _why_ I do this, and even though it
> does happen, why it still is worth taking the stable updates.  Please
> see the archives for the full details.  I don't want to duplicate this
> again here.

And to be more specific, let's always take this as a case-by-case basis.
The last time this happened was the bluetooth bug and it was a fix for a
reported problem, but then the fix caused a regression so upstream
reverted it and I reverted it in the stable trees.  No matter what I
chose to do, someone would be upset so I followed what upstream did.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 18:57, skrev Greg KH:

On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:

Den 19.04.2018 kl. 17:22, skrev Greg KH:

On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:

On Thu 19-04-18 15:59:43, Greg KH wrote:

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
_is_ a _regression_ you _really_ _dont_ want in a stable
supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch that
carries "AUTOSEL", follow all the mail threads that comes up about it, then
track if it landed in -stable queue, and read every response and possible
objection to all patches in the -stable queue a second time around... then
check if it still got included in final stable point relase and then either
revert them in distro kernel or go track down all the follow-up fixes
needed...

Just to avoid being "bug compatible with master"


I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.


Sure, that's understood. So this was just misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...


As I said before in this thread, yes, sometimes I do this on purpose.



And I guess this is the one that gets people the feeling that
"stable is not as stable as it used to be" ...


It's always been this way, it's just that no one noticed :)



:)



As an specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complainied, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.


I do understand what you are trying to do...

But from my distro hat on I have to treat things differently (and I dont
think I'm alone doing it this way...)

Known breakages gets reverted even before it hits QA, so endusers wont see
the issue at all...

So the only ones to see the issue are those building with latest upstream
without own patches applied...



I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.


And we do use latest stable (with some delay as I dont want to overload QA &
endusers with a new kernel every week :))


You need to automate your QA :)



Yeah, some can be automated... but that means having a lot of different 
hw to test on... emulators/vms can only test so much...


users part of QA test on a variety of hw with various installs/setups 
that exposes fun things with some hw :)




We just revert known broken (or add known fixes) before releasing them to
our users


That's great, and is what you should be doing, nothing wrong there.

thanks,

greg k-h



--
Thomas



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 18:57, skrev Greg KH:

On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:

Den 19.04.2018 kl. 17:22, skrev Greg KH:

On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:

On Thu 19-04-18 15:59:43, Greg KH wrote:

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
_is_ a _regression_ you _really_ _dont_ want in a stable
supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch that
carries "AUTOSEL", follow all the mail threads that comes up about it, then
track if it landed in -stable queue, and read every response and possible
objection to all patches in the -stable queue a second time around... then
check if it still got included in final stable point relase and then either
revert them in distro kernel or go track down all the follow-up fixes
needed...

Just to avoid being "bug compatible with master"


I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.


Sure, that's understood. So this was just misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...


As I said before in this thread, yes, sometimes I do this on purpose.



And I guess this is the one that gets people the feeling that
"stable is not as stable as it used to be" ...


It's always been this way, it's just that no one noticed :)



:)



As an specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complainied, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.


I do understand what you are trying to do...

But from my distro hat on I have to treat things differently (and I dont
think I'm alone doing it this way...)

Known breakages gets reverted even before it hits QA, so endusers wont see
the issue at all...

So the only ones to see the issue are those building with latest upstream
without own patches applied...



I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.


And we do use latest stable (with some delay as I dont want to overload QA &
endusers with a new kernel every week :))


You need to automate your QA :)



Yeah, some can be automated... but that means having a lot of different 
hw to test on... emulators/vms can only test so much...


users part of QA test on a variety of hw with various installs/setups 
that exposes fun things with some hw :)




We just revert known broken (or add known fixes) before releasing them to
our users


That's great, and is what you should be doing, nothing wrong there.

thanks,

greg k-h



--
Thomas



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 18:09, skrev Sasha Levin:

On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:

Den 19.04.2018 kl. 16:59, skrev Greg KH:

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.


Yeah, but having to test stuff with known breakages is no fun, so we
try to avoid that


Known breakages are easier to deal with than unknown ones :)



well, if a system worked before the update, but not after...
Guess wich one we want...




I think that that "bug compatability" is basically a policy on *which*
regressions you'll see vs *if* you'll see a regression.




No. Intentionally breaking known working code in a stable branch is 
never ok.


As I said before... something that never worked is not a regression,
but breaking a previously working setup is...

That goes for security fixes too... there is not much point in a 
security fix, if it basically turns into a local DOS when the system 
stops working...


People will just boot older code and there you have it...



We'll never pull in a commit that introduces a bug but doesn't fix
another one, right? So if you have to deal with a regression anyway,
might as well deal with a regression that is also seen on mainline, so
that when you upgrade your stable kernel you'll keep the same set of
regressions to deal with.




Here I actually like the comment Linus posted about API breakage earlier 
in this thread...



If you break user workflows, NOTHING ELSE MATTERS.

Even security is secondary to "people don't use the end result,
because it doesn't work for them any more".


_This_ same statement should be aknowledged / enforced in stable trees 
too IMHO...


Because this is what will happend...

simple logic... if it does not work, the enduser will boot an earlier 
kernel... missing "all the good fixes" (including security ones) just

because one fix is bad.

For example in this AUTOSEL round there is 161 fixes of wich the enduser
never gets the 160 "supposedly good ones" when one is "bad"...


How is that a "good thing" ?


And trying to tell those that get hit "this will force upstream to fix 
it faster, so you get a working setup in some days/weeks/months..." is

not going to work...


Heh, This even reminds me that this is just as annoying as when MS
started to "bundle monthly security updates" and you get 95% installed
just to realize that the last 5% does not work (or install at all) and
you have to rollback to something working thus missing the needed
security fixes...

Same flawed logic...

Thnakfully we as distro maintainers can avoid some of the breakage for 
our enduses...


--
Thomas


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 18:09, skrev Sasha Levin:

On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:

Den 19.04.2018 kl. 16:59, skrev Greg KH:

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.


Yeah, but having to test stuff with known breakages is no fun, so we
try to avoid that


Known breakages are easier to deal with than unknown ones :)



well, if a system worked before the update, but not after...
Guess wich one we want...




I think that that "bug compatability" is basically a policy on *which*
regressions you'll see vs *if* you'll see a regression.




No. Intentionally breaking known working code in a stable branch is 
never ok.


As I said before... something that never worked is not a regression,
but breaking a previously working setup is...

That goes for security fixes too... there is not much point in a 
security fix, if it basically turns into a local DOS when the system 
stops working...


People will just boot older code and there you have it...



We'll never pull in a commit that introduces a bug but doesn't fix
another one, right? So if you have to deal with a regression anyway,
might as well deal with a regression that is also seen on mainline, so
that when you upgrade your stable kernel you'll keep the same set of
regressions to deal with.




Here I actually like the comment Linus posted about API breakage earlier 
in this thread...



If you break user workflows, NOTHING ELSE MATTERS.

Even security is secondary to "people don't use the end result,
because it doesn't work for them any more".


_This_ same statement should be aknowledged / enforced in stable trees 
too IMHO...


Because this is what will happend...

simple logic... if it does not work, the enduser will boot an earlier 
kernel... missing "all the good fixes" (including security ones) just

because one fix is bad.

For example in this AUTOSEL round there is 161 fixes of wich the enduser
never gets the 160 "supposedly good ones" when one is "bad"...


How is that a "good thing" ?


And trying to tell those that get hit "this will force upstream to fix 
it faster, so you get a working setup in some days/weeks/months..." is

not going to work...


Heh, This even reminds me that this is just as annoying as when MS
started to "bundle monthly security updates" and you get 95% installed
just to realize that the last 5% does not work (or install at all) and
you have to rollback to something working thus missing the needed
security fixes...

Same flawed logic...

Thnakfully we as distro maintainers can avoid some of the breakage for 
our enduses...


--
Thomas


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:
> Den 19.04.2018 kl. 17:22, skrev Greg KH:
> > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > > Sasha Levin  wrote:
> > > > > > > 
> > > > > > > > One of the things Greg is pushing strongly for is "bug 
> > > > > > > > compatibility":
> > > > > > > > we want the kernel to behave the same way between mainline and 
> > > > > > > > stable.
> > > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > > 
> > > > > > > Wait! What does that mean? What's the purpose of stable if it is 
> > > > > > > as
> > > > > > > broken as mainline?
> > > > > > 
> > > > > > This just means that if there is a fix that went in mainline, and 
> > > > > > the
> > > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > > 
> > > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > > what. We'd rather have the same thing broken between mainline and
> > > > > > stable.
> > > > > > 
> > > > > 
> > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug 
> > > > > compatible"
> > > > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > > > supported distro. Because end-users dont care about upstream breaking
> > > > > stuff... its the distro that takes the heat for that...
> > > > > 
> > > > > Something "already broken" is not a regression...
> > > > > 
> > > > > As distro maintainer that means one now have to review _every_ patch 
> > > > > that
> > > > > carries "AUTOSEL", follow all the mail threads that comes up about 
> > > > > it, then
> > > > > track if it landed in -stable queue, and read every response and 
> > > > > possible
> > > > > objection to all patches in the -stable queue a second time around... 
> > > > > then
> > > > > check if it still got included in final stable point relase and then 
> > > > > either
> > > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > > needed...
> > > > > 
> > > > > Just to avoid being "bug compatible with master"
> > > > 
> > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > > has in the past, so you had better also be reviewing all of my normal
> > > > commits as well :)
> > > > 
> > > > Anyway, we are trying not to do this, but it does, and will,
> > > > occasionally happen.
> > > 
> > > Sure, that's understood. So this was just misunderstanding. Sasha's
> > > original comment really sounded like "bug compatibility" with current
> > > master is desirable and that made me go "Are you serious?" as well...
> > 
> > As I said before in this thread, yes, sometimes I do this on purpose.
> > 
> 
> And I guess this is the one that gets people the feeling that
> "stable is not as stable as it used to be" ...

It's always been this way, it's just that no one noticed :)

> > As an specific example, see a recent bluetooth patch that caused a
> > regression on some chromebook devices.  The chromeos developers
> > rightfully complainied, and I left the commit in there to provide the
> > needed "leverage" on the upstream developers to fix this properly.
> > Otherwise if I had reverted the stable patch, when people move to a
> > newer kernel version, things break, and no one remembers why.
> 
> I do understand what you are trying to do...
> 
> But from my distro hat on I have to treat things differently (and I dont
> think I'm alone doing it this way...)
> 
> Known breakages gets reverted even before it hits QA, so endusers wont see
> the issue at all...
> 
> So the only ones to see the issue are those building with latest upstream
> without own patches applied...
> 
> > 
> > I also wrote a long response as to _why_ I do this, and even though it
> > does happen, why it still is worth taking the stable updates.  Please
> > see the archives for the full details.  I don't want to duplicate this
> > again here.
> 
> And we do use latest stable (with some delay as I dont want to overload QA &
> endusers with a new kernel every week :))

You need to automate your QA :)

> We just revert known broken (or add known fixes) before releasing them to
> our users

That's great, and is what you should be doing, nothing wrong there.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:
> Den 19.04.2018 kl. 17:22, skrev Greg KH:
> > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > > Sasha Levin  wrote:
> > > > > > > 
> > > > > > > > One of the things Greg is pushing strongly for is "bug 
> > > > > > > > compatibility":
> > > > > > > > we want the kernel to behave the same way between mainline and 
> > > > > > > > stable.
> > > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > > 
> > > > > > > Wait! What does that mean? What's the purpose of stable if it is 
> > > > > > > as
> > > > > > > broken as mainline?
> > > > > > 
> > > > > > This just means that if there is a fix that went in mainline, and 
> > > > > > the
> > > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > > 
> > > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > > what. We'd rather have the same thing broken between mainline and
> > > > > > stable.
> > > > > > 
> > > > > 
> > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug 
> > > > > compatible"
> > > > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > > > supported distro. Because end-users dont care about upstream breaking
> > > > > stuff... its the distro that takes the heat for that...
> > > > > 
> > > > > Something "already broken" is not a regression...
> > > > > 
> > > > > As distro maintainer that means one now have to review _every_ patch 
> > > > > that
> > > > > carries "AUTOSEL", follow all the mail threads that comes up about 
> > > > > it, then
> > > > > track if it landed in -stable queue, and read every response and 
> > > > > possible
> > > > > objection to all patches in the -stable queue a second time around... 
> > > > > then
> > > > > check if it still got included in final stable point relase and then 
> > > > > either
> > > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > > needed...
> > > > > 
> > > > > Just to avoid being "bug compatible with master"
> > > > 
> > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > > has in the past, so you had better also be reviewing all of my normal
> > > > commits as well :)
> > > > 
> > > > Anyway, we are trying not to do this, but it does, and will,
> > > > occasionally happen.
> > > 
> > > Sure, that's understood. So this was just misunderstanding. Sasha's
> > > original comment really sounded like "bug compatibility" with current
> > > master is desirable and that made me go "Are you serious?" as well...
> > 
> > As I said before in this thread, yes, sometimes I do this on purpose.
> > 
> 
> And I guess this is the one that gets people the feeling that
> "stable is not as stable as it used to be" ...

It's always been this way, it's just that no one noticed :)

> > As an specific example, see a recent bluetooth patch that caused a
> > regression on some chromebook devices.  The chromeos developers
> > rightfully complainied, and I left the commit in there to provide the
> > needed "leverage" on the upstream developers to fix this properly.
> > Otherwise if I had reverted the stable patch, when people move to a
> > newer kernel version, things break, and no one remembers why.
> 
> I do understand what you are trying to do...
> 
> But from my distro hat on I have to treat things differently (and I dont
> think I'm alone doing it this way...)
> 
> Known breakages gets reverted even before it hits QA, so endusers wont see
> the issue at all...
> 
> So the only ones to see the issue are those building with latest upstream
> without own patches applied...
> 
> > 
> > I also wrote a long response as to _why_ I do this, and even though it
> > does happen, why it still is worth taking the stable updates.  Please
> > see the archives for the full details.  I don't want to duplicate this
> > again here.
> 
> And we do use latest stable (with some delay as I dont want to overload QA &
> endusers with a new kernel every week :))

You need to automate your QA :)

> We just revert known broken (or add known fixes) before releasing them to
> our users

That's great, and is what you should be doing, nothing wrong there.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 17:22, skrev Greg KH:

On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:

On Thu 19-04-18 15:59:43, Greg KH wrote:

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
_is_ a _regression_ you _really_ _dont_ want in a stable
supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch that
carries "AUTOSEL", follow all the mail threads that comes up about it, then
track if it landed in -stable queue, and read every response and possible
objection to all patches in the -stable queue a second time around... then
check if it still got included in final stable point relase and then either
revert them in distro kernel or go track down all the follow-up fixes
needed...

Just to avoid being "bug compatible with master"


I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.


Sure, that's understood. So this was just misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...


As I said before in this thread, yes, sometimes I do this on purpose.



And I guess this is the one that gets people the feeling that
"stable is not as stable as it used to be" ...


As an specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complainied, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.


I do understand what you are trying to do...

But from my distro hat on I have to treat things differently (and I dont 
think I'm alone doing it this way...)


Known breakages gets reverted even before it hits QA, so endusers wont 
see the issue at all...


So the only ones to see the issue are those building with latest 
upstream without own patches applied...




I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.


And we do use latest stable (with some delay as I dont want to overload 
QA & endusers with a new kernel every week :))


We just revert known broken (or add known fixes) before releasing them 
to our users


--
Thomas





Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 17:22, skrev Greg KH:

On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:

On Thu 19-04-18 15:59:43, Greg KH wrote:

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
_is_ a _regression_ you _really_ _dont_ want in a stable
supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch that
carries "AUTOSEL", follow all the mail threads that comes up about it, then
track if it landed in -stable queue, and read every response and possible
objection to all patches in the -stable queue a second time around... then
check if it still got included in final stable point relase and then either
revert them in distro kernel or go track down all the follow-up fixes
needed...

Just to avoid being "bug compatible with master"


I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.


Sure, that's understood. So this was just misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...


As I said before in this thread, yes, sometimes I do this on purpose.



And I guess this is the one that gets people the feeling that
"stable is not as stable as it used to be" ...


As an specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complainied, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.


I do understand what you are trying to do...

But from my distro hat on I have to treat things differently (and I dont 
think I'm alone doing it this way...)


Known breakages gets reverted even before it hits QA, so endusers wont 
see the issue at all...


So the only ones to see the issue are those building with latest 
upstream without own patches applied...




I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.


And we do use latest stable (with some delay as I dont want to overload 
QA & endusers with a new kernel every week :))


We just revert known broken (or add known fixes) before releasing them 
to our users


--
Thomas





Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Sasha Levin
On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:
>Den 19.04.2018 kl. 16:59, skrev Greg KH:
>>Anyway, we are trying not to do this, but it does, and will,
>>occasionally happen.  Look, we just did that for one platform for
>>4.9.94!  And the key to all of this is good testing, which we are now
>>doing, and hopefully you are also doing as well.
>
>Yeah, but having to test stuff with known breakages is no fun, so we 
>try to avoid that

Known breakages are easier to deal with than unknown ones :)

I think that that "bug compatability" is basically a policy on *which*
regressions you'll see vs *if* you'll see a regression.

We'll never pull in a commit that introduces a bug but doesn't fix
another one, right? So if you have to deal with a regression anyway,
might as well deal with a regression that is also seen on mainline, so
that when you upgrade your stable kernel you'll keep the same set of
regressions to deal with.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Sasha Levin
On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:
>Den 19.04.2018 kl. 16:59, skrev Greg KH:
>>Anyway, we are trying not to do this, but it does, and will,
>>occasionally happen.  Look, we just did that for one platform for
>>4.9.94!  And the key to all of this is good testing, which we are now
>>doing, and hopefully you are also doing as well.
>
>Yeah, but having to test stuff with known breakages is no fun, so we 
>try to avoid that

Known breakages are easier to deal with than unknown ones :)

I think that that "bug compatability" is basically a policy on *which*
regressions you'll see vs *if* you'll see a regression.

We'll never pull in a commit that introduces a bug but doesn't fix
another one, right? So if you have to deal with a regression anyway,
might as well deal with a regression that is also seen on mainline, so
that when you upgrade your stable kernel you'll keep the same set of
regressions to deal with.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 16:59, skrev Greg KH:

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
_is_ a _regression_ you _really_ _dont_ want in a stable
supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch that
carries "AUTOSEL", follow all the mail threads that comes up about it, then
track if it landed in -stable queue, and read every response and possible
objection to all patches in the -stable queue a second time around... then
check if it still got included in final stable point relase and then either
revert them in distro kernel or go track down all the follow-up fixes
needed...

Just to avoid being "bug compatible with master"


I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)



Yeah, I do... and same goes there ... if there is a known issue, then 
same procedure... Either revert, or try to track down fixes...




Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.


Yeah, but having to test stuff with known breakages is no fun, so we try 
to avoid that


--
Thomas



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 19.04.2018 kl. 16:59, skrev Greg KH:

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
_is_ a _regression_ you _really_ _dont_ want in a stable
supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch that
carries "AUTOSEL", follow all the mail threads that comes up about it, then
track if it landed in -stable queue, and read every response and possible
objection to all patches in the -stable queue a second time around... then
check if it still got included in final stable point relase and then either
revert them in distro kernel or go track down all the follow-up fixes
needed...

Just to avoid being "bug compatible with master"


I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)



Yeah, I do... and same goes there ... if there is a known issue, then 
same procedure... Either revert, or try to track down fixes...




Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.


Yeah, but having to test stuff with known breakages is no fun, so we try 
to avoid that


--
Thomas



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> On Thu 19-04-18 15:59:43, Greg KH wrote:
> > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > Sasha Levin  wrote:
> > > > > 
> > > > > > One of the things Greg is pushing strongly for is "bug 
> > > > > > compatibility":
> > > > > > we want the kernel to behave the same way between mainline and 
> > > > > > stable.
> > > > > > If the code is broken, it should be broken in the same way.
> > > > > 
> > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > broken as mainline?
> > > > 
> > > > This just means that if there is a fix that went in mainline, and the
> > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > 
> > > > In this scenario, *something* will be broken, it's just a matter of
> > > > what. We'd rather have the same thing broken between mainline and
> > > > stable.
> > > > 
> > > 
> > > Yeah, but _intentionally_ breaking existing setups to stay "bug 
> > > compatible"
> > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > supported distro. Because end-users dont care about upstream breaking
> > > stuff... its the distro that takes the heat for that...
> > > 
> > > Something "already broken" is not a regression...
> > > 
> > > As distro maintainer that means one now have to review _every_ patch that
> > > carries "AUTOSEL", follow all the mail threads that comes up about it, 
> > > then
> > > track if it landed in -stable queue, and read every response and possible
> > > objection to all patches in the -stable queue a second time around... then
> > > check if it still got included in final stable point relase and then 
> > > either
> > > revert them in distro kernel or go track down all the follow-up fixes
> > > needed...
> > > 
> > > Just to avoid being "bug compatible with master"
> > 
> > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > has in the past, so you had better also be reviewing all of my normal
> > commits as well :)
> > 
> > Anyway, we are trying not to do this, but it does, and will,
> > occasionally happen.
> 
> Sure, that's understood. So this was just misunderstanding. Sasha's
> original comment really sounded like "bug compatibility" with current
> master is desirable and that made me go "Are you serious?" as well...

As I said before in this thread, yes, sometimes I do this on purpose.

As an specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complainied, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.

I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> On Thu 19-04-18 15:59:43, Greg KH wrote:
> > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > Sasha Levin  wrote:
> > > > > 
> > > > > > One of the things Greg is pushing strongly for is "bug 
> > > > > > compatibility":
> > > > > > we want the kernel to behave the same way between mainline and 
> > > > > > stable.
> > > > > > If the code is broken, it should be broken in the same way.
> > > > > 
> > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > broken as mainline?
> > > > 
> > > > This just means that if there is a fix that went in mainline, and the
> > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > 
> > > > In this scenario, *something* will be broken, it's just a matter of
> > > > what. We'd rather have the same thing broken between mainline and
> > > > stable.
> > > > 
> > > 
> > > Yeah, but _intentionally_ breaking existing setups to stay "bug 
> > > compatible"
> > > _is_ a _regression_ you _really_ _dont_ want in a stable
> > > supported distro. Because end-users dont care about upstream breaking
> > > stuff... its the distro that takes the heat for that...
> > > 
> > > Something "already broken" is not a regression...
> > > 
> > > As distro maintainer that means one now have to review _every_ patch that
> > > carries "AUTOSEL", follow all the mail threads that comes up about it, 
> > > then
> > > track if it landed in -stable queue, and read every response and possible
> > > objection to all patches in the -stable queue a second time around... then
> > > check if it still got included in final stable point relase and then 
> > > either
> > > revert them in distro kernel or go track down all the follow-up fixes
> > > needed...
> > > 
> > > Just to avoid being "bug compatible with master"
> > 
> > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > has in the past, so you had better also be reviewing all of my normal
> > commits as well :)
> > 
> > Anyway, we are trying not to do this, but it does, and will,
> > occasionally happen.
> 
> Sure, that's understood. So this was just misunderstanding. Sasha's
> original comment really sounded like "bug compatibility" with current
> master is desirable and that made me go "Are you serious?" as well...

As I said before in this thread, yes, sometimes I do this on purpose.

As an specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complainied, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.

I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Jan Kara
On Thu 19-04-18 15:59:43, Greg KH wrote:
> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > Sasha Levin  wrote:
> > > > 
> > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > If the code is broken, it should be broken in the same way.
> > > > 
> > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > broken as mainline?
> > > 
> > > This just means that if there is a fix that went in mainline, and the
> > > fix is broken somehow, we'd rather take the broken fix than not.
> > > 
> > > In this scenario, *something* will be broken, it's just a matter of
> > > what. We'd rather have the same thing broken between mainline and
> > > stable.
> > > 
> > 
> > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > _is_ a _regression_ you _really_ _dont_ want in a stable
> > supported distro. Because end-users dont care about upstream breaking
> > stuff... its the distro that takes the heat for that...
> > 
> > Something "already broken" is not a regression...
> > 
> > As distro maintainer that means one now have to review _every_ patch that
> > carries "AUTOSEL", follow all the mail threads that comes up about it, then
> > track if it landed in -stable queue, and read every response and possible
> > objection to all patches in the -stable queue a second time around... then
> > check if it still got included in final stable point relase and then either
> > revert them in distro kernel or go track down all the follow-up fixes
> > needed...
> > 
> > Just to avoid being "bug compatible with master"
> 
> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> has in the past, so you had better also be reviewing all of my normal
> commits as well :)
> 
> Anyway, we are trying not to do this, but it does, and will,
> occasionally happen.

Sure, that's understood. So this was just misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Jan Kara
On Thu 19-04-18 15:59:43, Greg KH wrote:
> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > Sasha Levin  wrote:
> > > > 
> > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > If the code is broken, it should be broken in the same way.
> > > > 
> > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > broken as mainline?
> > > 
> > > This just means that if there is a fix that went in mainline, and the
> > > fix is broken somehow, we'd rather take the broken fix than not.
> > > 
> > > In this scenario, *something* will be broken, it's just a matter of
> > > what. We'd rather have the same thing broken between mainline and
> > > stable.
> > > 
> > 
> > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > _is_ a _regression_ you _really_ _dont_ want in a stable
> > supported distro. Because end-users dont care about upstream breaking
> > stuff... its the distro that takes the heat for that...
> > 
> > Something "already broken" is not a regression...
> > 
> > As distro maintainer that means one now have to review _every_ patch that
> > carries "AUTOSEL", follow all the mail threads that comes up about it, then
> > track if it landed in -stable queue, and read every response and possible
> > objection to all patches in the -stable queue a second time around... then
> > check if it still got included in final stable point relase and then either
> > revert them in distro kernel or go track down all the follow-up fixes
> > needed...
> > 
> > Just to avoid being "bug compatible with master"
> 
> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> has in the past, so you had better also be reviewing all of my normal
> commits as well :)
> 
> Anyway, we are trying not to do this, but it does, and will,
> occasionally happen.

Sure, that's understood. So this was just misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > On Mon, 16 Apr 2018 16:02:03 +
> > > Sasha Levin  wrote:
> > > 
> > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > we want the kernel to behave the same way between mainline and stable.
> > > > If the code is broken, it should be broken in the same way.
> > > 
> > > Wait! What does that mean? What's the purpose of stable if it is as
> > > broken as mainline?
> > 
> > This just means that if there is a fix that went in mainline, and the
> > fix is broken somehow, we'd rather take the broken fix than not.
> > 
> > In this scenario, *something* will be broken, it's just a matter of
> > what. We'd rather have the same thing broken between mainline and
> > stable.
> > 
> 
> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> _is_ a _regression_ you _really_ _dont_ want in a stable
> supported distro. Because end-users dont care about upstream breaking
> stuff... its the distro that takes the heat for that...
> 
> Something "already broken" is not a regression...
> 
> As distro maintainer that means one now have to review _every_ patch that
> carries "AUTOSEL", follow all the mail threads that comes up about it, then
> track if it landed in -stable queue, and read every response and possible
> objection to all patches in the -stable queue a second time around... then
> check if it still got included in final stable point relase and then either
> revert them in distro kernel or go track down all the follow-up fixes
> needed...
> 
> Just to avoid being "bug compatible with master"

I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH
On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > On Mon, 16 Apr 2018 16:02:03 +
> > > Sasha Levin  wrote:
> > > 
> > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > we want the kernel to behave the same way between mainline and stable.
> > > > If the code is broken, it should be broken in the same way.
> > > 
> > > Wait! What does that mean? What's the purpose of stable if it is as
> > > broken as mainline?
> > 
> > This just means that if there is a fix that went in mainline, and the
> > fix is broken somehow, we'd rather take the broken fix than not.
> > 
> > In this scenario, *something* will be broken, it's just a matter of
> > what. We'd rather have the same thing broken between mainline and
> > stable.
> > 
> 
> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> _is_ a _regression_ you _really_ _dont_ want in a stable
> supported distro. Because end-users dont care about upstream breaking
> stuff... its the distro that takes the heat for that...
> 
> Something "already broken" is not a regression...
> 
> As distro maintainer that means one now have to review _every_ patch that
> carries "AUTOSEL", follow all the mail threads that comes up about it, then
> track if it landed in -stable queue, and read every response and possible
> objection to all patches in the -stable queue a second time around... then
> check if it still got included in final stable point relase and then either
> revert them in distro kernel or go track down all the follow-up fixes
> needed...
> 
> Just to avoid being "bug compatible with master"

I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug 
compatible" _is_ a _regression_ you _really_ _dont_ want in a stable

supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch 
that carries "AUTOSEL", follow all the mail threads that comes up about 
it, then track if it landed in -stable queue, and read every response 
and possible objection to all patches in the -stable queue a second time 
around... then check if it still got included in final stable point 
relase and then either revert them in distro kernel or go track down all 
the follow-up fixes needed...


Just to avoid being "bug compatible with master"

--
Thomas


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund

Den 16-04-2018 kl. 19:19, skrev Sasha Levin:

On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:

On Mon, 16 Apr 2018 16:02:03 +
Sasha Levin  wrote:


One of the things Greg is pushing strongly for is "bug compatibility":
we want the kernel to behave the same way between mainline and stable.
If the code is broken, it should be broken in the same way.


Wait! What does that mean? What's the purpose of stable if it is as
broken as mainline?


This just means that if there is a fix that went in mainline, and the
fix is broken somehow, we'd rather take the broken fix than not.

In this scenario, *something* will be broken, it's just a matter of
what. We'd rather have the same thing broken between mainline and
stable.



Yeah, but _intentionally_ breaking existing setups to stay "bug 
compatible" _is_ a _regression_ you _really_ _dont_ want in a stable

supported distro. Because end-users dont care about upstream breaking
stuff... its the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now have to review _every_ patch 
that carries "AUTOSEL", follow all the mail threads that comes up about 
it, then track if it landed in -stable queue, and read every response 
and possible objection to all patches in the -stable queue a second time 
around... then check if it still got included in final stable point 
relase and then either revert them in distro kernel or go track down all 
the follow-up fixes needed...


Just to avoid being "bug compatible with master"

--
Thomas


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-18 Thread Petr Mladek
On Tue 2018-04-17 13:45:59, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
> >Back to the trend. Last week I got autosel mails even for
> >patches that were still being discussed, had issues, and
> >were far from upstream:
> >
> > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> >
> >It might be a good idea if the mail asked to add Fixes: tag
> >or stable mailing list. But the mail suggested to add the
> >unfinished patch into stable branch directly (even before
> >upstreaming?).
> 
> I obviously didn't suggest that this patch will go in -stable before
> it's upstream.
> 
> I've started doing those because some folks can't be arsed to reply to a
> review request for a patch that is months old. I found that if I send
> these mails while the discussion is still going on I'd get a much better
> response rate from people.

I see. It makes sense.

> If you think any of these patches should go in stable there were two
> ways about it:
>
>  - You end up adding the -stable tag yourself, and it would follow the
>usual route where Greg picks it up.
>  - You reply to that mail, and the patch would wait in a list until my
>script notices it made it upstream, at which point it would get
>queued for stable.

It would be great if the options are described in the mail.

I wonder if it would make sense to add also a tag that would
say that the commit is not suitable for stable. It might
help both sides. The maintainers will be able to share
their opinion and eventually reduce mails from autosel.
You would get feedback that maintainers considered
the patch for stable. It might be even useful for
teaching the AI.


> >Now, there are only hand full of printk patches in each
> >release, so it is still doable. I just do not understand
> >how other maintainers, from much more busy subsystems,
> >could cope with this trend.
> 
> So yes, I'm aware that the volume of patches is huge, but there's not
> much I can do about it because it's just a subset of the kernel's patch
> volume and since the kernel gets more and more patches each release, the
> volume of stable commits is bound to grow as well.

Yes, but the grow in the stable is much faster than the grow
in maintain at the moment. It might be fine if it was caused
just by engaging subsystems that ignored stable so far. But
I am not sure if it is the case. Also I am not sure about
your plans.

Anyway, I am surprised that the patches might go into stable
so easily (no response -> accepted). While it is pretty
hard to get through the review process for mainline.

Of course, many patches go into mainline without review
as well. But the difference is that they are pushed by
people that are familiar and responsible for the affected
area.

I could understand the pain. There are surely people that
do not care about stable, because it takes time, it is hard
to make decisions, flashbacks to the old code are painful,
etc. Well, this is the reason why the maintenance support
is and should be limited.

Anyway, I think that it cannot be done reasonably without
maintainers. You should be careful so that even the currently
cooperating maintainers will not start considering autosel
mails as a spam. (It is not my case. printk is small thing.
But I could imagine that it might stop being bearable
in bigger subsystems. As is already the case with xfs.)

Best Regards,
Petr


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-18 Thread Petr Mladek
On Tue 2018-04-17 13:45:59, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
> >Back to the trend. Last week I got autosel mails even for
> >patches that were still being discussed, had issues, and
> >were far from upstream:
> >
> > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> >
> >It might be a good idea if the mail asked to add Fixes: tag
> >or stable mailing list. But the mail suggested to add the
> >unfinished patch into stable branch directly (even before
> >upstreaming?).
> 
> I obviously didn't suggest that this patch will go in -stable before
> it's upstream.
> 
> I've started doing those because some folks can't be arsed to reply to a
> review request for a patch that is months old. I found that if I send
> these mails while the discussion is still going on I'd get a much better
> response rate from people.

I see. It makes sense.

> If you think any of these patches should go in stable there were two
> ways about it:
>
>  - You end up adding the -stable tag yourself, and it would follow the
>usual route where Greg picks it up.
>  - You reply to that mail, and the patch would wait in a list until my
>script notices it made it upstream, at which point it would get
>queued for stable.

It would be great if the options are described in the mail.

I wonder if it would make sense to add also a tag that would
say that the commit is not suitable for stable. It might
help both sides. The maintainers will be able to share
their opinion and eventually reduce mails from autosel.
You would get feedback that maintainers considered
the patch for stable. It might be even useful for
teaching the AI.


> >Now, there are only hand full of printk patches in each
> >release, so it is still doable. I just do not understand
> >how other maintainers, from much more busy subsystems,
> >could cope with this trend.
> 
> So yes, I'm aware that the volume of patches is huge, but there's not
> much I can do about it because it's just a subset of the kernel's patch
> volume and since the kernel gets more and more patches each release, the
> volume of stable commits is bound to grow as well.

Yes, but the grow in the stable is much faster than the grow
in maintain at the moment. It might be fine if it was caused
just by engaging subsystems that ignored stable so far. But
I am not sure if it is the case. Also I am not sure about
your plans.

Anyway, I am surprised that the patches might go into stable
so easily (no response -> accepted). While it is pretty
hard to get through the review process for mainline.

Of course, many patches go into mainline without review
as well. But the difference is that they are pushed by
people that are familiar and responsible for the affected
area.

I could understand the pain. There are surely people that
do not care about stable, because it takes time, it is hard
to make decisions, flashbacks to the old code are painful,
etc. Well, this is the reason why the maintenance support
is and should be limited.

Anyway, I think that it cannot be done reasonably without
maintainers. You should be careful so that even the currently
cooperating maintainers will not start considering autosel
mails as a spam. (It is not my case. printk is small thing.
But I could imagine that it might stop being bearable
in bigger subsystems. As is already the case with xfs.)

Best Regards,
Petr


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 07:57:54PM +0200, Jan Kara wrote:
>Actually I was careful enough to include only commits that got merged as
>part of the stable process into 4.14.x but got later reverted in 4.14.y.
>That's where the 0.4% number came from. So I believe all of those cases
>(13 in absolute numbers) were user visible regressions during the stable
>process.

I looked at them, and there are 2 things in play here:

 - Quite a few of those reverts are because of the PTI work. I'm not
   sure how we treat it, but yes - it skews statistics here.
 - 2 of them were reverts for device tree changes for a device that
   didn't exist in 4.14, and shouldn't have had any user visible
   changes.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 07:57:54PM +0200, Jan Kara wrote:
>Actually I was careful enough to include only commits that got merged as
>part of the stable process into 4.14.x but got later reverted in 4.14.y.
>That's where the 0.4% number came from. So I believe all of those cases
>(13 in absolute numbers) were user visible regressions during the stable
>process.

I looked at them, and there are 2 things in play here:

 - Quite a few of those reverts are because of the PTI work. I'm not
   sure how we treat it, but yes - it skews statistics here.
 - 2 of them were reverts for device tree changes for a device that
   didn't exist in 4.14, and shouldn't have had any user visible
   changes.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 14:36:44, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 13:39:33, Sasha Levin wrote:
> >[...]
> >> But mm/ commits don't come only from these people. Here's a concrete
> >> example we can discuss:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
> >
> >I would be really careful. Because that reqiures to audit all callers to
> >be compliant with the change. This is just _too_ easy to backport
> >without noticing a failure. Now consider the other side. Is there any
> >real bug report backing this? This behavior was like that for quite some
> >time but I do not remember any actual bug report and the changelog
> >doesn't mention one either. It is about theoretical problem.
> 
> https://lkml.org/lkml/2018/3/19/430
> 
> There's even a fun little reproducer that allowed me to confirm it's an
> issue (at least) on 4.15.
> 
> Heck, it might even qualify as a CVE.
> 
> >So if this was to be merged to stable then the changelog should contain
> >a big fat warning about the existing users and how they should be
> >checked.
> 
> So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
> additional work backporting this, but what I'm saying is that this
> didn't happen at all.

Do not ask me. I wasn't involved. But I would _guess_ that the original
bug is not all that serious because it requires some specific privileges
and it is quite unlikely that somebody privileged would want to shoot
its feet. But this is just my wild guess.

Anyway, I am pretty sure that if the triggering BUG was serious enough
then it would be much safer to remove it for stable backports.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 14:36:44, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 13:39:33, Sasha Levin wrote:
> >[...]
> >> But mm/ commits don't come only from these people. Here's a concrete
> >> example we can discuss:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
> >
> >I would be really careful. Because that reqiures to audit all callers to
> >be compliant with the change. This is just _too_ easy to backport
> >without noticing a failure. Now consider the other side. Is there any
> >real bug report backing this? This behavior was like that for quite some
> >time but I do not remember any actual bug report and the changelog
> >doesn't mention one either. It is about theoretical problem.
> 
> https://lkml.org/lkml/2018/3/19/430
> 
> There's even a fun little reproducer that allowed me to confirm it's an
> issue (at least) on 4.15.
> 
> Heck, it might even qualify as a CVE.
> 
> >So if this was to be merged to stable then the changelog should contain
> >a big fat warning about the existing users and how they should be
> >checked.
> 
> So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
> additional work backporting this, but what I'm saying is that this
> didn't happen at all.

Do not ask me. I wasn't involved. But I would _guess_ that the original
bug is not all that serious because it requires some specific privileges
and it is quite unlikely that somebody privileged would want to shoot
its feet. But this is just my wild guess.

Anyway, I am pretty sure that if the triggering BUG was serious enough
then it would be much safer to remove it for stable backports.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara
On Tue 17-04-18 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >> Even regression chance is tricky, look at the commits I've linked
> >> earlier in the thread. Even the most trivial looking commits that end up
> >> in stable have a chance for regression.
> >
> >Sure, you can never be certain and I think people (including me)
> >underestimate the chance of regressions for "trivial" patches. But you just
> >estimate a chance, you may be lucky, you may not...
> >
> >> >Another point I wanted to make is that if chance a patch causes a
> >> >regression is about 2% as you said somewhere else in a thread, then by
> >> >adding 20 patches that "may fix a bug that is annoying for someone" you've
> >> >just increased a chance there's a regression in the release by 34%. And
> >>
> >> So I've said that the rejection rate is less than 2%. This includes
> >> all commits that I have proposed for -stable, but didn't end up being
> >> included in -stable.
> >>
> >> This includes commits that the author/maintainers NACKed, commits that
> >> didn't do anything on older kernels, commits that were buggy but were
> >> caught before the kernel was released, commits that failed to build on
> >> an arch I didn't test it on originally and so on.
> >>
> >> After thousands of merged AUTOSEL patches I can count the number of
> >> times a commit has caused a regression and had to be removed on one
> >> hand.
> >>
> >> >this is not just a math game, this also roughly matches a real experience
> >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight 
> >> >such
> >> >regression chance? And I also note that for a regression to get reported 
> >> >so
> >> >that it gets included into your 2% estimate of a patch regression rate,
> >> >someone must be bothered enough by it to triage it and send an email
> >> >somewhere so that already falls into a category of "serious" stuff to me.
> >>
> >> It is indeed a numbers game, but the regression rate isn't 2%, it's
> >> closer to 0.05%.
> >
> >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14
> >stable tree suggests some 13 commits were reverted from stable due to bugs.
> >That's some 0.4% and that doesn't count fixes that were applied to
> >fix other regressions.
> 
> 0.05% is for commits that were merged in stable but later fixed or
> reverted because they introduced a regression. By grepping for reverts
> you also include things such as:
> 
>  - Reverts of commits that were in the corresponding mainline tree
>  - Reverts of commits that didn't introduce regressions

Actually I was careful enough to include only commits that got merged as
part of the stable process into 4.14.x but got later reverted in 4.14.y.
That's where the 0.4% number came from. So I believe all of those cases
(13 in absolute numbers) were user visible regressions during the stable
process.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara
On Tue 17-04-18 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >> Even regression chance is tricky, look at the commits I've linked
> >> earlier in the thread. Even the most trivial looking commits that end up
> >> in stable have a chance for regression.
> >
> >Sure, you can never be certain and I think people (including me)
> >underestimate the chance of regressions for "trivial" patches. But you just
> >estimate a chance, you may be lucky, you may not...
> >
> >> >Another point I wanted to make is that if chance a patch causes a
> >> >regression is about 2% as you said somewhere else in a thread, then by
> >> >adding 20 patches that "may fix a bug that is annoying for someone" you've
> >> >just increased a chance there's a regression in the release by 34%. And
> >>
> >> So I've said that the rejection rate is less than 2%. This includes
> >> all commits that I have proposed for -stable, but didn't end up being
> >> included in -stable.
> >>
> >> This includes commits that the author/maintainers NACKed, commits that
> >> didn't do anything on older kernels, commits that were buggy but were
> >> caught before the kernel was released, commits that failed to build on
> >> an arch I didn't test it on originally and so on.
> >>
> >> After thousands of merged AUTOSEL patches I can count the number of
> >> times a commit has caused a regression and had to be removed on one
> >> hand.
> >>
> >> >this is not just a math game, this also roughly matches a real experience
> >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight 
> >> >such
> >> >regression chance? And I also note that for a regression to get reported 
> >> >so
> >> >that it gets included into your 2% estimate of a patch regression rate,
> >> >someone must be bothered enough by it to triage it and send an email
> >> >somewhere so that already falls into a category of "serious" stuff to me.
> >>
> >> It is indeed a numbers game, but the regression rate isn't 2%, it's
> >> closer to 0.05%.
> >
> >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14
> >stable tree suggests some 13 commits were reverted from stable due to bugs.
> >That's some 0.4% and that doesn't count fixes that were applied to
> >fix other regressions.
> 
> 0.05% is for commits that were merged in stable but later fixed or
> reverted because they introduced a regression. By grepping for reverts
> you also include things such as:
> 
>  - Reverts of commits that were in the corresponding mainline tree
>  - Reverts of commits that didn't introduce regressions

Actually I was careful enough to include only commits that got merged as
part of the stable process into 4.14.x but got later reverted in 4.14.y.
That's where the 0.4% number came from. So I believe all of those cases
(13 in absolute numbers) were user visible regressions during the stable
process.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>On Tue, 17 Apr 2018, Sasha Levin wrote:
>
>> How do I get the XFS folks to send their stuff to -stable? (we have
>> quite a few customers who use XFS)
>
>If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>maintainers to deal with stable, we just have to accept that and find an
>answer to that.

This is exactly what I'm doing. Many subsystems don't have enough
manpower to deal with -stable, so I'm trying to help.

>If XFS folks claim that they don't have enough mental capacity to
>create/verify XFS backports, I totally don't see how any kind of AI would
>have.

Because creating backports is not all about mental capacity!

A lot of time gets wasted on going through the list of commits,
backporting each of those commits into every -stable tree we have,
building it, running tests, etc.

So it's not all about pure mental capacity, but more about the time
per-patch it takes to get -stable done.

If I can cut down on that, by suggesting a list of commits, doing builds
and tests, what's the problem?

>If your business relies on XFS (and so does ours, BTW) or any other
>subsystem that doesn't have enough manpower to care for stable, the proper
>solution (and contribution) would be just bringing more people into the
>XFS community.

Microsoft's business relies on quite a few kernel subsystems. While we
try to bring more people in the kernel (we're hiring!), as you might
know it's not easy getting kernel folks.

So just "get more people" isn't a good solution. It doesn't scale
either.

>To put it simply -- I don't think the simple lack of actual human
>brainpower can be reasonably resolved in other way than bringing more of
>it in.
>
>Thanks,
>
>-- 
>Jiri Kosina
>SUSE Labs
>

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>On Tue, 17 Apr 2018, Sasha Levin wrote:
>
>> How do I get the XFS folks to send their stuff to -stable? (we have
>> quite a few customers who use XFS)
>
>If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>maintainers to deal with stable, we just have to accept that and find an
>answer to that.

This is exactly what I'm doing. Many subsystems don't have enough
manpower to deal with -stable, so I'm trying to help.

>If XFS folks claim that they don't have enough mental capacity to
>create/verify XFS backports, I totally don't see how any kind of AI would
>have.

Because creating backports is not all about mental capacity!

A lot of time gets wasted on going through the list of commits,
backporting each of those commits into every -stable tree we have,
building it, running tests, etc.

So it's not all about pure mental capacity, but more about the time
per-patch it takes to get -stable done.

If I can cut down on that, by suggesting a list of commits, doing builds
and tests, what's the problem?

>If your business relies on XFS (and so does ours, BTW) or any other
>subsystem that doesn't have enough manpower to care for stable, the proper
>solution (and contribution) would be just bringing more people into the
>XFS community.

Microsoft's business relies on quite a few kernel subsystems. While we
try to bring more people in the kernel (we're hiring!), as you might
know it's not easy getting kernel folks.

So just "get more people" isn't a good solution. It doesn't scale
either.

>To put it simply -- I don't think the simple lack of actual human
>brainpower can be reasonably resolved in other way than bringing more of
>it in.
>
>Thanks,
>
>-- 
>Jiri Kosina
>SUSE Labs
>

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Mike Galbraith
On Tue, 2018-04-17 at 17:52 +0200, Jiri Kosina wrote:
> On Tue, 17 Apr 2018, Sasha Levin wrote:
> 
> > How do I get the XFS folks to send their stuff to -stable? (we have
> > quite a few customers who use XFS)
> 
> If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
> maintainers to deal with stable, we just have to accept that and find an 
> answer to that.
> 
> If XFS folks claim that they don't have enough mental capacity to 
> create/verify XFS backports, I totally don't see how any kind of AI would 
> have.
> 
> If your business relies on XFS (and so does ours, BTW) or any other 
> subsystem that doesn't have enough manpower to care for stable, the proper 
> solution (and contribution) would be just bringing more people into the 
> XFS community.
> 
> To put it simply -- I don't think the simple lack of actual human 
> brainpower can be reasonably resolved in other way than bringing more of 
> it in.

Not to worry... soon enough it'll be submitting properly massaged
backports of the stuff it submitted upstream :)

-Mike


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Mike Galbraith
On Tue, 2018-04-17 at 17:52 +0200, Jiri Kosina wrote:
> On Tue, 17 Apr 2018, Sasha Levin wrote:
> 
> > How do I get the XFS folks to send their stuff to -stable? (we have
> > quite a few customers who use XFS)
> 
> If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
> maintainers to deal with stable, we just have to accept that and find an 
> answer to that.
> 
> If XFS folks claim that they don't have enough mental capacity to 
> create/verify XFS backports, I totally don't see how any kind of AI would 
> have.
> 
> If your business relies on XFS (and so does ours, BTW) or any other 
> subsystem that doesn't have enough manpower to care for stable, the proper 
> solution (and contribution) would be just bringing more people into the 
> XFS community.
> 
> To put it simply -- I don't think the simple lack of actual human 
> brainpower can be reasonably resolved in other way than bringing more of 
> it in.

Not to worry... soon enough it'll be submitting properly massaged
backports of the stuff it submitted upstream :)

-Mike


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> We may be able to guesstimate the 'regression chance', but there's no
>> way we can guess the 'annoyance' once. There are so many different use
>> cases that we just can't even guess how many people would get "annoyed"
>> by something.
>
>As a maintainer, I hope I have reasonable idea what are common use cases
>for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>know all of the use cases so people doing unusual stuff hit more bugs and
>have to report them to get fixes included in -stable. But for me this is a
>preferable tradeoff over the risk of regression so this is the rule I use
>when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>with "those who do the work decide" principle so pick whatever patches you
>think are appropriate, I just wanted explain why I don't think more patches
>in stable are necessarily good.

The AUTOSEL story is different for subsystems that don't do -stable, and
subsystems that are actually doing the work (like yourself).

I'm not trying to override active maintainers, I'm trying to help them
make decisions.

The AUTOSEL bot will attempt to apply any patch it deems as -stable for
on all -stable branches, finding possible dependencies, build them, and
run any tests that you might deem necessary.

You would be able to start your analysis without "wasting" time on doing
a bunch of "manual labor".

There's a big difference between subsystems like yours and most of the
rest of the kernel.

>> Even regression chance is tricky, look at the commits I've linked
>> earlier in the thread. Even the most trivial looking commits that end up
>> in stable have a chance for regression.
>
>Sure, you can never be certain and I think people (including me)
>underestimate the chance of regressions for "trivial" patches. But you just
>estimate a chance, you may be lucky, you may not...
>
>> >Another point I wanted to make is that if chance a patch causes a
>> >regression is about 2% as you said somewhere else in a thread, then by
>> >adding 20 patches that "may fix a bug that is annoying for someone" you've
>> >just increased a chance there's a regression in the release by 34%. And
>>
>> So I've said that the rejection rate is less than 2%. This includes
>> all commits that I have proposed for -stable, but didn't end up being
>> included in -stable.
>>
>> This includes commits that the author/maintainers NACKed, commits that
>> didn't do anything on older kernels, commits that were buggy but were
>> caught before the kernel was released, commits that failed to build on
>> an arch I didn't test it on originally and so on.
>>
>> After thousands of merged AUTOSEL patches I can count the number of
>> times a commit has caused a regression and had to be removed on one
>> hand.
>>
>> >this is not just a math game, this also roughly matches a real experience
>> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such
>> >regression chance? And I also note that for a regression to get reported so
>> >that it gets included into your 2% estimate of a patch regression rate,
>> >someone must be bothered enough by it to triage it and send an email
>> >somewhere so that already falls into a category of "serious" stuff to me.
>>
>> It is indeed a numbers game, but the regression rate isn't 2%, it's
>> closer to 0.05%.
>
>Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14
>stable tree suggests some 13 commits were reverted from stable due to bugs.
>That's some 0.4% and that doesn't count fixes that were applied to
>fix other regressions.

0.05% is for commits that were merged in stable but later fixed or
reverted because they introduced a regression. By grepping for reverts
you also include things such as:

 - Reverts of commits that were in the corresponding mainline tree
 - Reverts of commits that didn't introduce regressions

>But the actual numbers don't really matter that much, in principle the more
>patches you add the higher is the chance of regression. You can't change
>that so you better have a good reason to include a patch...

You increase the chance of regressions, but you also increase the chance
of fixing bugs that affect users.

My claim is that the chance to fix bugs increases far more significantly
than the chance to introduce regressions.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> We may be able to guesstimate the 'regression chance', but there's no
>> way we can guess the 'annoyance' once. There are so many different use
>> cases that we just can't even guess how many people would get "annoyed"
>> by something.
>
>As a maintainer, I hope I have reasonable idea what are common use cases
>for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>know all of the use cases so people doing unusual stuff hit more bugs and
>have to report them to get fixes included in -stable. But for me this is a
>preferable tradeoff over the risk of regression so this is the rule I use
>when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>with "those who do the work decide" principle so pick whatever patches you
>think are appropriate, I just wanted explain why I don't think more patches
>in stable are necessarily good.

The AUTOSEL story is different for subsystems that don't do -stable, and
subsystems that are actually doing the work (like yourself).

I'm not trying to override active maintainers, I'm trying to help them
make decisions.

The AUTOSEL bot will attempt to apply any patch it deems as -stable for
on all -stable branches, finding possible dependencies, build them, and
run any tests that you might deem necessary.

You would be able to start your analysis without "wasting" time on doing
a bunch of "manual labor".

There's a big difference between subsystems like yours and most of the
rest of the kernel.

>> Even regression chance is tricky, look at the commits I've linked
>> earlier in the thread. Even the most trivial looking commits that end up
>> in stable have a chance for regression.
>
>Sure, you can never be certain and I think people (including me)
>underestimate the chance of regressions for "trivial" patches. But you just
>estimate a chance, you may be lucky, you may not...
>
>> >Another point I wanted to make is that if chance a patch causes a
>> >regression is about 2% as you said somewhere else in a thread, then by
>> >adding 20 patches that "may fix a bug that is annoying for someone" you've
>> >just increased a chance there's a regression in the release by 34%. And
>>
>> So I've said that the rejection rate is less than 2%. This includes
>> all commits that I have proposed for -stable, but didn't end up being
>> included in -stable.
>>
>> This includes commits that the author/maintainers NACKed, commits that
>> didn't do anything on older kernels, commits that were buggy but were
>> caught before the kernel was released, commits that failed to build on
>> an arch I didn't test it on originally and so on.
>>
>> After thousands of merged AUTOSEL patches I can count the number of
>> times a commit has caused a regression and had to be removed on one
>> hand.
>>
>> >this is not just a math game, this also roughly matches a real experience
>> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such
>> >regression chance? And I also note that for a regression to get reported so
>> >that it gets included into your 2% estimate of a patch regression rate,
>> >someone must be bothered enough by it to triage it and send an email
>> >somewhere so that already falls into a category of "serious" stuff to me.
>>
>> It is indeed a numbers game, but the regression rate isn't 2%, it's
>> closer to 0.05%.
>
>Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14
>stable tree suggests some 13 commits were reverted from stable due to bugs.
>That's some 0.4% and that doesn't count fixes that were applied to
>fix other regressions.

0.05% is for commits that were merged in stable but later fixed or
reverted because they introduced a regression. By grepping for reverts
you also include things such as:

 - Reverts of commits that were in the corresponding mainline tree
 - Reverts of commits that didn't introduce regressions

>But the actual numbers don't really matter that much, in principle the more
>patches you add the higher is the chance of regression. You can't change
>that so you better have a good reason to include a patch...

You increase the chance of regressions, but you also increase the chance
of fixing bugs that affect users.

My claim is that the chance to fix bugs increases far more significantly
than the chance to introduce regressions.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara
On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
> >On Mon 16-04-18 17:23:30, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
> >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >> >> >On Mon, 16 Apr 2018 16:19:14 +
> >> >> >Sasha Levin  wrote:
> >> >> >
> >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >> >> >broken as mainline?
> >> >> >>
> >> >> >> This just means that if there is a fix that went in mainline, and the
> >> >> >> fix is broken somehow, we'd rather take the broken fix than not.
> >> >> >>
> >> >> >> In this scenario, *something* will be broken, it's just a matter of
> >> >> >> what. We'd rather have the same thing broken between mainline and
> >> >> >> stable.
> >> >> >
> >> >> >Honestly, I think that removes all value of the stable series. I
> >> >> >remember when the stable series were first created. People were saying
> >> >> >that it wouldn't even get to more than 5 versions, because the bar for
> >> >> >backporting was suppose to be very high. Today it's just a fork of the
> >> >> >kernel at a given version. No more features, but we will be OK with
> >> >> >regressions. I'm struggling to see what the benefit of it is suppose to
> >> >> >be?
> >> >>
> >> >> It's not "OK with regressions".
> >> >>
> >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> >> >> a broken printf() behaviour so that when you:
> >> >>
> >> >> pr_err("%d", 5)
> >> >>
> >> >> Would print:
> >> >>
> >> >> "Microsoft Rulez"
> >> >>
> >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> >> >> might expect. But alas, with your patch, running:
> >> >>
> >> >> pr_err("%s", "hi!")
> >> >>
> >> >> Would show a cat picture for 5 seconds.
> >> >>
> >> >> Should we take your patch in -stable or not? If we don't, we're stuck
> >> >> with the original issue while the mainline kernel will behave
> >> >> differently, but if we do - we introduce a new regression.
> >> >
> >> >Of course not.
> >> >
> >> >- It must be obviously correct and tested.
> >> >
> >> >If it introduces new bug, it is not correct, and certainly not
> >> >obviously correct.
> >>
> >> As you might have noticed, we don't strictly follow the rules.
> >>
> >> Take a look at the whole PTI story as an example. It's way more than 100
> >> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> >> on, and yet it went in -stable!
> >>
> >> Would you argue we shouldn't have backported PTI to -stable?
> >
> >So I agree with that being backported. But I think this nicely demostrates
> >a point some people are trying to make in this thread. We do take fixes
> >with high risk or regression if they fix serious enough issue. Also we do
> >take fixes to non-serious stuff (such as addition of device ID) if the
> >chances of regression are really low.
> >
> >So IMHO the metric for including the fix is not solely "how annoying to
> >user this can be" but rather something like:
> >
> >score = (how annoying the bug is) * ((1 / (chance of regression due to
> > including this)) - 1)^3
> >
> >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
> >and 'regression chance' parts are subjective and sometimes difficult to
> >estimate so don't take the formula too seriously but it demonstrates the
> >point. I think we all agree we want to fix annoying stuff and we don't want
> >regressions. But you need to somehow weight this over your expected
> >userbase - and this is where your argument "but someone might be annoyed by
> >LEDs not working so let's include it" has problems - it should rather be
> >"is the annoyance of non-working leds over expected user base high enough
> >to risk a regression due to this patch for someone in the expected user
> >base"? The answer to this second question is not clear at all to a casual
> >reviewer and that's why we IMHO have CC stable tag as maintainer is
> >supposed to have at least a bit better clue.
> 
> We may be able to guesstimate the 'regression chance', but there's no
> way we can guess the 'annoyance' once. There are so many different use
> cases that we just can't even guess how many people would get "annoyed"
> by something.

As a maintainer, I hope I have reasonable idea what are common use cases
for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
know all of the use cases so people doing unusual stuff hit more bugs and
have to report them to get fixes included in -stable. But for me this is a
preferable tradeoff over the risk of regression so this is the rule I use
when tagging for stable. Now I'm not a -stable maintainer and I fully agree
with "those who do the work decide" principle so pick whatever patches you
think are appropriate, I just wanted explain 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara
On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
> >On Mon 16-04-18 17:23:30, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
> >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >> >> >On Mon, 16 Apr 2018 16:19:14 +
> >> >> >Sasha Levin  wrote:
> >> >> >
> >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >> >> >broken as mainline?
> >> >> >>
> >> >> >> This just means that if there is a fix that went in mainline, and the
> >> >> >> fix is broken somehow, we'd rather take the broken fix than not.
> >> >> >>
> >> >> >> In this scenario, *something* will be broken, it's just a matter of
> >> >> >> what. We'd rather have the same thing broken between mainline and
> >> >> >> stable.
> >> >> >
> >> >> >Honestly, I think that removes all value of the stable series. I
> >> >> >remember when the stable series were first created. People were saying
> >> >> >that it wouldn't even get to more than 5 versions, because the bar for
> >> >> >backporting was suppose to be very high. Today it's just a fork of the
> >> >> >kernel at a given version. No more features, but we will be OK with
> >> >> >regressions. I'm struggling to see what the benefit of it is suppose to
> >> >> >be?
> >> >>
> >> >> It's not "OK with regressions".
> >> >>
> >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> >> >> a broken printf() behaviour so that when you:
> >> >>
> >> >> pr_err("%d", 5)
> >> >>
> >> >> Would print:
> >> >>
> >> >> "Microsoft Rulez"
> >> >>
> >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> >> >> might expect. But alas, with your patch, running:
> >> >>
> >> >> pr_err("%s", "hi!")
> >> >>
> >> >> Would show a cat picture for 5 seconds.
> >> >>
> >> >> Should we take your patch in -stable or not? If we don't, we're stuck
> >> >> with the original issue while the mainline kernel will behave
> >> >> differently, but if we do - we introduce a new regression.
> >> >
> >> >Of course not.
> >> >
> >> >- It must be obviously correct and tested.
> >> >
> >> >If it introduces new bug, it is not correct, and certainly not
> >> >obviously correct.
> >>
> >> As you might have noticed, we don't strictly follow the rules.
> >>
> >> Take a look at the whole PTI story as an example. It's way more than 100
> >> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> >> on, and yet it went in -stable!
> >>
> >> Would you argue we shouldn't have backported PTI to -stable?
> >
> >So I agree with that being backported. But I think this nicely demostrates
> >a point some people are trying to make in this thread. We do take fixes
> >with high risk or regression if they fix serious enough issue. Also we do
> >take fixes to non-serious stuff (such as addition of device ID) if the
> >chances of regression are really low.
> >
> >So IMHO the metric for including the fix is not solely "how annoying to
> >user this can be" but rather something like:
> >
> >score = (how annoying the bug is) * ((1 / (chance of regression due to
> > including this)) - 1)^3
> >
> >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
> >and 'regression chance' parts are subjective and sometimes difficult to
> >estimate so don't take the formula too seriously but it demonstrates the
> >point. I think we all agree we want to fix annoying stuff and we don't want
> >regressions. But you need to somehow weight this over your expected
> >userbase - and this is where your argument "but someone might be annoyed by
> >LEDs not working so let's include it" has problems - it should rather be
> >"is the annoyance of non-working leds over expected user base high enough
> >to risk a regression due to this patch for someone in the expected user
> >base"? The answer to this second question is not clear at all to a casual
> >reviewer and that's why we IMHO have CC stable tag as maintainer is
> >supposed to have at least a bit better clue.
> 
> We may be able to guesstimate the 'regression chance', but there's no
> way we can guess the 'annoyance' once. There are so many different use
> cases that we just can't even guess how many people would get "annoyed"
> by something.

As a maintainer, I hope I have reasonable idea what are common use cases
for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
know all of the use cases so people doing unusual stuff hit more bugs and
have to report them to get fixes included in -stable. But for me this is a
preferable tradeoff over the risk of regression so this is the rule I use
when tagging for stable. Now I'm not a -stable maintainer and I fully agree
with "those who do the work decide" principle so pick whatever patches you
think are appropriate, I just wanted explain why I don't think more patches
in 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jiri Kosina
On Tue, 17 Apr 2018, Sasha Levin wrote:

> How do I get the XFS folks to send their stuff to -stable? (we have
> quite a few customers who use XFS)

If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
maintainers to deal with stable, we just have to accept that and find an 
answer to that.

If XFS folks claim that they don't have enough mental capacity to 
create/verify XFS backports, I totally don't see how any kind of AI would 
have.

If your business relies on XFS (and so does ours, BTW) or any other 
subsystem that doesn't have enough manpower to care for stable, the proper 
solution (and contribution) would be just bringing more people into the 
XFS community.

To put it simply -- I don't think the simple lack of actual human 
brainpower can be reasonably resolved in other way than bringing more of 
it in.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jiri Kosina
On Tue, 17 Apr 2018, Sasha Levin wrote:

> How do I get the XFS folks to send their stuff to -stable? (we have
> quite a few customers who use XFS)

If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
maintainers to deal with stable, we just have to accept that and find an 
answer to that.

If XFS folks claim that they don't have enough mental capacity to 
create/verify XFS backports, I totally don't see how any kind of AI would 
have.

If your business relies on XFS (and so does ours, BTW) or any other 
subsystem that doesn't have enough manpower to care for stable, the proper 
solution (and contribution) would be just bringing more people into the 
XFS community.

To put it simply -- I don't think the simple lack of actual human 
brainpower can be reasonably resolved in other way than bringing more of 
it in.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 04:36:31PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:04:36, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>> >On Tue 17-04-18 12:39:36, Greg KH wrote:
>> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >> >
>> >> > > I agree that as an enterprise distro taking everything from -stable
>> >> > > isn't the best idea. Ideally you'd want to be close to the first
>> >> > > extreme you've mentioned and only take commits if customers are asking
>> >> > > you to do so.
>> >> > >
>> >> > > I think that the rule we're trying to agree upon is the "It must fix
>> >> > > a real bug that bothers people".
>> >> > >
>> >> > > I think that we can agree that it's impossible to expect every single
>> >> > > Linux user to go on LKML and complain about a bug he encountered, so 
>> >> > > the
>> >> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >> >
>> >> > So is there a reason why stable couldn't become some hybrid-form union 
>> >> > of
>> >> >
>> >> > - really critical issues (data corruption, boot issues, severe security
>> >> >   issues) taken from bleeding edge upstream
>> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >> >   (based on that very -stable release), as that's apparently what people
>> >> >   are hitting in the real world with that particular kernel
>> >>
>> >> It already is that :)
>> >>
>> >> The problem Sasha is trying to solve here is that for many subsystems,
>> >> maintainers do not mark patches for stable at all.
>> >
>> >The way he is trying to do that is just wrong. Generate a pressure on
>> >those subsystems by referring to bug reports and unhappy users and I am
>> >pretty sure they will try harder... You cannot solve the problem by
>> >bypassing them without having deep understanding of the specific
>> >subsytem. Once you have it, just make sure you are part of the review
>> >process and make sure to mark patches before they are merged.
>>
>> I think we just don't agree on how we should "pressure".
>>
>> Look at the discussion I had with the XFS folks who just don't want to
>> deal with this -stable thing because they have to much work upstream.
>
>So do you really think that you or any script decide without them? My
>recollection from that discussion was quite opposite. Dave was quite
>clear that most of fixes are quite hard to evaluate and most of them
>are simply not worth risking the backport.

No, *some* fixes are hard, not most.

I'm not trying to decide for them, I'm trying to help them decide.

>> There wasn't a single patch in -stable coming from XFS for the past 6+
>> months. I'm aware of more than one way to corrupt an XFS volume for any
>> distro that uses a kernel older than 4.15.
>
>Then try to poke/bribe somebody to have it fixed. But applying
>_something_ is just not a solution. You should also evaluate whether "I
>am able to corrupt" is something that "people see in the wild". Sure
>there are zillions of bugs hidden in the large code base like the
>kernel. People just do not tend to hit them and this will likely not
>change very much in the future.

We can't ignore bugs just because people don't notice.

Data corruption bugs in particular are a pain to report as well, the
corruption might have happened months before and there's not much to
report at that point.

There's quite a few bug classes like that.

>> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
>> better about it, but I don't see this changing.
>
>I can surely have one or two and discuss this. I am pretty sure xfs guys
>are not going to pretend older kernels do not exist.
>
>> The solution to this, in my opinion, is to automate the whole selection
>> and review process. We do selection using AI, and we run every possible
>> test that's relevant to that subsystem.
>>
>> At which point, the amount of work a human needs to do to review a patch
>> shrinks into something far more managable for some maintainers.
>
>I really disagree. I am pretty sure maintainers are very well aware of
>how the patch is important. Some do no care about stable and I agree you
>should poke those. But some have really good reasons to not throw many
>patches that direction because they do not feel the patch is important
>enough.
>
>Remember this is not about numbers. The more is not always better.

So what is "important"? Look at the XFS issues, they were important
enough to get fixed upstream, and have an appropriate test added to
xfstests.

Why didn't they go back to -stable?

>> >> So real bugfixes
>> >> that do hit people are not getting to those kernels, which force the
>> >> distros to do extra work to triage a bug, dig through upstream kernels,
>> >> find and apply the patch.
>> >
>> >I would say that this is the primary role of the distro. To hide the
>> >jungle of the upstream work and provide the additional of bug filtering

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 04:36:31PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:04:36, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>> >On Tue 17-04-18 12:39:36, Greg KH wrote:
>> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >> >
>> >> > > I agree that as an enterprise distro taking everything from -stable
>> >> > > isn't the best idea. Ideally you'd want to be close to the first
>> >> > > extreme you've mentioned and only take commits if customers are asking
>> >> > > you to do so.
>> >> > >
>> >> > > I think that the rule we're trying to agree upon is the "It must fix
>> >> > > a real bug that bothers people".
>> >> > >
>> >> > > I think that we can agree that it's impossible to expect every single
>> >> > > Linux user to go on LKML and complain about a bug he encountered, so 
>> >> > > the
>> >> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >> >
>> >> > So is there a reason why stable couldn't become some hybrid-form union 
>> >> > of
>> >> >
>> >> > - really critical issues (data corruption, boot issues, severe security
>> >> >   issues) taken from bleeding edge upstream
>> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >> >   (based on that very -stable release), as that's apparently what people
>> >> >   are hitting in the real world with that particular kernel
>> >>
>> >> It already is that :)
>> >>
>> >> The problem Sasha is trying to solve here is that for many subsystems,
>> >> maintainers do not mark patches for stable at all.
>> >
>> >The way he is trying to do that is just wrong. Generate a pressure on
>> >those subsystems by referring to bug reports and unhappy users and I am
>> >pretty sure they will try harder... You cannot solve the problem by
>> >bypassing them without having deep understanding of the specific
>> >subsytem. Once you have it, just make sure you are part of the review
>> >process and make sure to mark patches before they are merged.
>>
>> I think we just don't agree on how we should "pressure".
>>
>> Look at the discussion I had with the XFS folks who just don't want to
>> deal with this -stable thing because they have to much work upstream.
>
>So do you really think that you or any script decide without them? My
>recollection from that discussion was quite opposite. Dave was quite
>clear that most of fixes are quite hard to evaluate and most of them
>are simply not worth risking the backport.

No, *some* fixes are hard, not most.

I'm not trying to decide for them, I'm trying to help them decide.

>> There wasn't a single patch in -stable coming from XFS for the past 6+
>> months. I'm aware of more than one way to corrupt an XFS volume for any
>> distro that uses a kernel older than 4.15.
>
>Then try to poke/bribe somebody to have it fixed. But applying
>_something_ is just not a solution. You should also evaluate whether "I
>am able to corrupt" is something that "people see in the wild". Sure
>there are zillions of bugs hidden in the large code base like the
>kernel. People just do not tend to hit them and this will likely not
>change very much in the future.

We can't ignore bugs just because people don't notice.

Data corruption bugs in particular are a pain to report as well, the
corruption might have happened months before and there's not much to
report at that point.

There's quite a few bug classes like that.

>> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
>> better about it, but I don't see this changing.
>
>I can surely have one or two and discuss this. I am pretty sure xfs guys
>are not going to pretend older kernels do not exist.
>
>> The solution to this, in my opinion, is to automate the whole selection
>> and review process. We do selection using AI, and we run every possible
>> test that's relevant to that subsystem.
>>
>> At which point, the amount of work a human needs to do to review a patch
>> shrinks into something far more managable for some maintainers.
>
>I really disagree. I am pretty sure maintainers are very well aware of
>how the patch is important. Some do no care about stable and I agree you
>should poke those. But some have really good reasons to not throw many
>patches that direction because they do not feel the patch is important
>enough.
>
>Remember this is not about numbers. The more is not always better.

So what is "important"? Look at the XFS issues, they were important
enough to get fixed upstream, and have an appropriate test added to
xfstests.

Why didn't they go back to -stable?

>> >> So real bugfixes
>> >> that do hit people are not getting to those kernels, which force the
>> >> distros to do extra work to triage a bug, dig through upstream kernels,
>> >> find and apply the patch.
>> >
>> >I would say that this is the primary role of the distro. To hide the
>> >jungle of the upstream work and provide the additional of bug filtering

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 13:39:33, Sasha Levin wrote:
>[...]
>> But mm/ commits don't come only from these people. Here's a concrete
>> example we can discuss:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
>
>I would be really careful. Because that reqiures to audit all callers to
>be compliant with the change. This is just _too_ easy to backport
>without noticing a failure. Now consider the other side. Is there any
>real bug report backing this? This behavior was like that for quite some
>time but I do not remember any actual bug report and the changelog
>doesn't mention one either. It is about theoretical problem.

https://lkml.org/lkml/2018/3/19/430

There's even a fun little reproducer that allowed me to confirm it's an
issue (at least) on 4.15.

Heck, it might even qualify as a CVE.

>So if this was to be merged to stable then the changelog should contain
>a big fat warning about the existing users and how they should be
>checked.

So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
additional work backporting this, but what I'm saying is that this
didn't happen at all.

>Besides that I can see Reviewed-by: akpm and Andrew is usually very
>careful about stable backports so there probably _was_ a reson to
>exclude stable.
>-- 
>Michal Hocko
>SUSE Labs

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 13:39:33, Sasha Levin wrote:
>[...]
>> But mm/ commits don't come only from these people. Here's a concrete
>> example we can discuss:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
>
>I would be really careful. Because that reqiures to audit all callers to
>be compliant with the change. This is just _too_ easy to backport
>without noticing a failure. Now consider the other side. Is there any
>real bug report backing this? This behavior was like that for quite some
>time but I do not remember any actual bug report and the changelog
>doesn't mention one either. It is about theoretical problem.

https://lkml.org/lkml/2018/3/19/430

There's even a fun little reproducer that allowed me to confirm it's an
issue (at least) on 4.15.

Heck, it might even qualify as a CVE.

>So if this was to be merged to stable then the changelog should contain
>a big fat warning about the existing users and how they should be
>checked.

So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
additional work backporting this, but what I'm saying is that this
didn't happen at all.

>Besides that I can see Reviewed-by: akpm and Andrew is usually very
>careful about stable backports so there probably _was_ a reson to
>exclude stable.
>-- 
>Michal Hocko
>SUSE Labs

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 14:04:36, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 12:39:36, Greg KH wrote:
> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
> >> >
> >> > > I agree that as an enterprise distro taking everything from -stable
> >> > > isn't the best idea. Ideally you'd want to be close to the first
> >> > > extreme you've mentioned and only take commits if customers are asking
> >> > > you to do so.
> >> > >
> >> > > I think that the rule we're trying to agree upon is the "It must fix
> >> > > a real bug that bothers people".
> >> > >
> >> > > I think that we can agree that it's impossible to expect every single
> >> > > Linux user to go on LKML and complain about a bug he encountered, so 
> >> > > the
> >> > > rule quickly becomes "It must fix a real bug that can bother people".
> >> >
> >> > So is there a reason why stable couldn't become some hybrid-form union of
> >> >
> >> > - really critical issues (data corruption, boot issues, severe security
> >> >   issues) taken from bleeding edge upstream
> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
> >> >   (based on that very -stable release), as that's apparently what people
> >> >   are hitting in the real world with that particular kernel
> >>
> >> It already is that :)
> >>
> >> The problem Sasha is trying to solve here is that for many subsystems,
> >> maintainers do not mark patches for stable at all.
> >
> >The way he is trying to do that is just wrong. Generate a pressure on
> >those subsystems by referring to bug reports and unhappy users and I am
> >pretty sure they will try harder... You cannot solve the problem by
> >bypassing them without having deep understanding of the specific
> >subsytem. Once you have it, just make sure you are part of the review
> >process and make sure to mark patches before they are merged.
> 
> I think we just don't agree on how we should "pressure".
> 
> Look at the discussion I had with the XFS folks who just don't want to
> deal with this -stable thing because they have to much work upstream.

So do you really think that you or any script decide without them? My
recollection from that discussion was quite opposite. Dave was quite
clear that most of fixes are quite hard to evaluate and most of them
are simply not worth risking the backport.

> There wasn't a single patch in -stable coming from XFS for the past 6+
> months. I'm aware of more than one way to corrupt an XFS volume for any
> distro that uses a kernel older than 4.15.

Then try to poke/bribe somebody to have it fixed. But applying
_something_ is just not a solution. You should also evaluate whether "I
am able to corrupt" is something that "people see in the wild". Sure
there are zillions of bugs hidden in the large code base like the
kernel. People just do not tend to hit them and this will likely not
change very much in the future.

> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
> better about it, but I don't see this changing.

I can surely have one or two and discuss this. I am pretty sure xfs guys
are not going to pretend older kernels do not exist.

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more managable for some maintainers.

I really disagree. I am pretty sure maintainers are very well aware of
how the patch is important. Some do no care about stable and I agree you
should poke those. But some have really good reasons to not throw many
patches that direction because they do not feel the patch is important
enough.

Remember this is not about numbers. The more is not always better.

> >> So real bugfixes
> >> that do hit people are not getting to those kernels, which force the
> >> distros to do extra work to triage a bug, dig through upstream kernels,
> >> find and apply the patch.
> >
> >I would say that this is the primary role of the distro. To hide the
> >jungle of the upstream work and provide the additional of bug filtering
> >and forwarding them the right direction.
> 
> More often than triaging, you'll just be asked to upgrade to the latest
> version. What sort of user experience does that provide?
> 
> [snip]
> 
> >> So nothing "new" is happening here, EXCEPT we are actually starting to
> >> get a better kernel-wide coverage for stable fixes, which we have not
> >> had in the past.  That's a good thing!  The number of patches applied to
> >> stable is still a very very very tiny % compared to mainline, so nothing
> >> new is happening here.
> >
> >yes I do agree, the stable process is not very much different from the
> >past and I would tend both processes broken because they explicitly try
> >to avoid 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 14:04:36, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 12:39:36, Greg KH wrote:
> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
> >> >
> >> > > I agree that as an enterprise distro taking everything from -stable
> >> > > isn't the best idea. Ideally you'd want to be close to the first
> >> > > extreme you've mentioned and only take commits if customers are asking
> >> > > you to do so.
> >> > >
> >> > > I think that the rule we're trying to agree upon is the "It must fix
> >> > > a real bug that bothers people".
> >> > >
> >> > > I think that we can agree that it's impossible to expect every single
> >> > > Linux user to go on LKML and complain about a bug he encountered, so 
> >> > > the
> >> > > rule quickly becomes "It must fix a real bug that can bother people".
> >> >
> >> > So is there a reason why stable couldn't become some hybrid-form union of
> >> >
> >> > - really critical issues (data corruption, boot issues, severe security
> >> >   issues) taken from bleeding edge upstream
> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
> >> >   (based on that very -stable release), as that's apparently what people
> >> >   are hitting in the real world with that particular kernel
> >>
> >> It already is that :)
> >>
> >> The problem Sasha is trying to solve here is that for many subsystems,
> >> maintainers do not mark patches for stable at all.
> >
> >The way he is trying to do that is just wrong. Generate a pressure on
> >those subsystems by referring to bug reports and unhappy users and I am
> >pretty sure they will try harder... You cannot solve the problem by
> >bypassing them without having deep understanding of the specific
> >subsytem. Once you have it, just make sure you are part of the review
> >process and make sure to mark patches before they are merged.
> 
> I think we just don't agree on how we should "pressure".
> 
> Look at the discussion I had with the XFS folks who just don't want to
> deal with this -stable thing because they have to much work upstream.

So do you really think that you or any script decide without them? My
recollection from that discussion was quite opposite. Dave was quite
clear that most of fixes are quite hard to evaluate and most of them
are simply not worth risking the backport.

> There wasn't a single patch in -stable coming from XFS for the past 6+
> months. I'm aware of more than one way to corrupt an XFS volume for any
> distro that uses a kernel older than 4.15.

Then try to poke/bribe somebody to have it fixed. But applying
_something_ is just not a solution. You should also evaluate whether "I
am able to corrupt" is something that "people see in the wild". Sure
there are zillions of bugs hidden in the large code base like the
kernel. People just do not tend to hit them and this will likely not
change very much in the future.

> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
> better about it, but I don't see this changing.

I can surely have one or two and discuss this. I am pretty sure xfs guys
are not going to pretend older kernels do not exist.

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more managable for some maintainers.

I really disagree. I am pretty sure maintainers are very well aware of
how the patch is important. Some do no care about stable and I agree you
should poke those. But some have really good reasons to not throw many
patches that direction because they do not feel the patch is important
enough.

Remember this is not about numbers. The more is not always better.

> >> So real bugfixes
> >> that do hit people are not getting to those kernels, which force the
> >> distros to do extra work to triage a bug, dig through upstream kernels,
> >> find and apply the patch.
> >
> >I would say that this is the primary role of the distro. To hide the
> >jungle of the upstream work and provide the additional of bug filtering
> >and forwarding them the right direction.
> 
> More often than triaging, you'll just be asked to upgrade to the latest
> version. What sort of user experience does that provide?
> 
> [snip]
> 
> >> So nothing "new" is happening here, EXCEPT we are actually starting to
> >> get a better kernel-wide coverage for stable fixes, which we have not
> >> had in the past.  That's a good thing!  The number of patches applied to
> >> stable is still a very very very tiny % compared to mainline, so nothing
> >> new is happening here.
> >
> >yes I do agree, the stable process is not very much different from the
> >past and I would tend both processes broken because they explicitly try
> >to avoid 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Tue, Apr 17, 2018 at 10:15:02AM -0400, Steven Rostedt wrote:
> On Tue, 17 Apr 2018 14:04:36 +
> Sasha Levin  wrote:
> 
> > The solution to this, in my opinion, is to automate the whole selection
> > and review process. We do selection using AI, and we run every possible
> > test that's relevant to that subsystem.
> > 
> > At which point, the amount of work a human needs to do to review a patch
> > shrinks into something far more managable for some maintainers.
> 
> I guess the real question is, who are the stable kernels for? Is it just
> a place to look at to see what distros should think about. A superset
> of what distros would take. Then distros would have a nice place to
> look to find what patches they should look at. But the stable tree
> itself wont be used. But it's not being used today by major distros
> (Red Hat and SuSE). Debian may be using it, but that's because the
> stable maintainer for its kernels is also the Debian maintainer.
> 
> Who are the customers of the stable trees? They are the ones that
> should be determining the "equation" for what goes into it.

The "customers" of the stable trees are anyone who uses Linux.

Right now, it's estimated that only about 1/3 of the kernels running out
there, at the best, are an "enterprise" kernel/distro.  2/3 of the world
either run a kernel.org-based release + their own patches, or Debian.
And Debian piggy-backs on the stable kernel releases pretty regularily.

So the majority of the Linux users out there are what we are doing this
for.  Those that do not pay for a company to sift through things and
only cherry-pick what they want to pick out (hint, they almost always
miss things, some do this better than others...)

That's who this is all for, which is why we are doing our best to keep
on top of the avalanche of patches going into upstream to get the needed
fixes (both security and "normal" fixes) out to users as soon as
possible.

So again, if you are a subsystem maintainer, tag your patches for
stable.  If you do not, you will get automated emails asking you about
patches that should be applied (like the one that started this thread).
If you want to just have us ignore your subsystem entirely, we will be
glad to do so, and we will tell the world to not use your subsystem if
at all possible (see previous comments about xfs, and I would argue IB
right now...)

> Personally, I use stable as a one off from mainline. Like I mentioned
> in another email. I'm currently on 4.15.x and will probably move to
> 4.16.x next. Unless there's some critical bug announcement, I update my
> machines once a month. I originally just used mainline, but that was a
> bit too unstable for my work machines ;-)

That's great, you are a user of these trees then.  So you benifit
directly, along with everyone else who relies on them.

And again, I'm working with the SoC vendors to directly incorporate
these trees into their device trees, and I've already seen some devices
in the wild push out updated 4.4.y kernels a few weeks after they are
released.  That's the end goal here, to have the world's devices in a
much more secure and stable shape by relying on these kernels.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Tue, Apr 17, 2018 at 10:15:02AM -0400, Steven Rostedt wrote:
> On Tue, 17 Apr 2018 14:04:36 +
> Sasha Levin  wrote:
> 
> > The solution to this, in my opinion, is to automate the whole selection
> > and review process. We do selection using AI, and we run every possible
> > test that's relevant to that subsystem.
> > 
> > At which point, the amount of work a human needs to do to review a patch
> > shrinks into something far more managable for some maintainers.
> 
> I guess the real question is, who are the stable kernels for? Is it just
> a place to look at to see what distros should think about. A superset
> of what distros would take. Then distros would have a nice place to
> look to find what patches they should look at. But the stable tree
> itself wont be used. But it's not being used today by major distros
> (Red Hat and SuSE). Debian may be using it, but that's because the
> stable maintainer for its kernels is also the Debian maintainer.
> 
> Who are the customers of the stable trees? They are the ones that
> should be determining the "equation" for what goes into it.

The "customers" of the stable trees are anyone who uses Linux.

Right now, it's estimated that only about 1/3 of the kernels running out
there, at the best, are an "enterprise" kernel/distro.  2/3 of the world
either run a kernel.org-based release + their own patches, or Debian.
And Debian piggy-backs on the stable kernel releases pretty regularily.

So the majority of the Linux users out there are what we are doing this
for.  Those that do not pay for a company to sift through things and
only cherry-pick what they want to pick out (hint, they almost always
miss things, some do this better than others...)

That's who this is all for, which is why we are doing our best to keep
on top of the avalanche of patches going into upstream to get the needed
fixes (both security and "normal" fixes) out to users as soon as
possible.

So again, if you are a subsystem maintainer, tag your patches for
stable.  If you do not, you will get automated emails asking you about
patches that should be applied (like the one that started this thread).
If you want to just have us ignore your subsystem entirely, we will be
glad to do so, and we will tell the world to not use your subsystem if
at all possible (see previous comments about xfs, and I would argue IB
right now...)

> Personally, I use stable as a one off from mainline. Like I mentioned
> in another email. I'm currently on 4.15.x and will probably move to
> 4.16.x next. Unless there's some critical bug announcement, I update my
> machines once a month. I originally just used mainline, but that was a
> bit too unstable for my work machines ;-)

That's great, you are a user of these trees then.  So you benifit
directly, along with everyone else who relies on them.

And again, I'm working with the SoC vendors to directly incorporate
these trees into their device trees, and I've already seen some devices
in the wild push out updated 4.4.y kernels a few weeks after they are
released.  That's the end goal here, to have the world's devices in a
much more secure and stable shape by relying on these kernels.

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 13:39:33, Sasha Levin wrote:
[...]
> But mm/ commits don't come only from these people. Here's a concrete
> example we can discuss:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

I would be really careful. Because that reqiures to audit all callers to
be compliant with the change. This is just _too_ easy to backport
without noticing a failure. Now consider the other side. Is there any
real bug report backing this? This behavior was like that for quite some
time but I do not remember any actual bug report and the changelog
doesn't mention one either. It is about theoretical problem. 

So if this was to be merged to stable then the changelog should contain
a big fat warning about the existing users and how they should be
checked.

Besides that I can see Reviewed-by: akpm and Andrew is usually very
careful about stable backports so there probably _was_ a reson to
exclude stable.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 13:39:33, Sasha Levin wrote:
[...]
> But mm/ commits don't come only from these people. Here's a concrete
> example we can discuss:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

I would be really careful. Because that reqiures to audit all callers to
be compliant with the change. This is just _too_ easy to backport
without noticing a failure. Now consider the other side. Is there any
real bug report backing this? This behavior was like that for quite some
time but I do not remember any actual bug report and the changelog
doesn't mention one either. It is about theoretical problem. 

So if this was to be merged to stable then the changelog should contain
a big fat warning about the existing users and how they should be
checked.

Besides that I can see Reviewed-by: akpm and Andrew is usually very
careful about stable backports so there probably _was_ a reson to
exclude stable.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Steven Rostedt
On Tue, 17 Apr 2018 14:04:36 +
Sasha Levin  wrote:

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more managable for some maintainers.

I guess the real question is, who are the stable kernels for? Is it just
a place to look at to see what distros should think about. A superset
of what distros would take. Then distros would have a nice place to
look to find what patches they should look at. But the stable tree
itself wont be used. But it's not being used today by major distros
(Red Hat and SuSE). Debian may be using it, but that's because the
stable maintainer for its kernels is also the Debian maintainer.

Who are the customers of the stable trees? They are the ones that
should be determining the "equation" for what goes into it.

Personally, I use stable as a one off from mainline. Like I mentioned
in another email. I'm currently on 4.15.x and will probably move to
4.16.x next. Unless there's some critical bug announcement, I update my
machines once a month. I originally just used mainline, but that was a
bit too unstable for my work machines ;-)

-- Steve


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Steven Rostedt
On Tue, 17 Apr 2018 14:04:36 +
Sasha Levin  wrote:

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more managable for some maintainers.

I guess the real question is, who are the stable kernels for? Is it just
a place to look at to see what distros should think about. A superset
of what distros would take. Then distros would have a nice place to
look to find what patches they should look at. But the stable tree
itself wont be used. But it's not being used today by major distros
(Red Hat and SuSE). Debian may be using it, but that's because the
stable maintainer for its kernels is also the Debian maintainer.

Who are the customers of the stable trees? They are the ones that
should be determining the "equation" for what goes into it.

Personally, I use stable as a one off from mainline. Like I mentioned
in another email. I'm currently on 4.15.x and will probably move to
4.16.x next. Unless there's some critical bug announcement, I update my
machines once a month. I originally just used mainline, but that was a
bit too unstable for my work machines ;-)

-- Steve


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 12:39:36, Greg KH wrote:
>> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> > > I agree that as an enterprise distro taking everything from -stable
>> > > isn't the best idea. Ideally you'd want to be close to the first
>> > > extreme you've mentioned and only take commits if customers are asking
>> > > you to do so.
>> > >
>> > > I think that the rule we're trying to agree upon is the "It must fix
>> > > a real bug that bothers people".
>> > >
>> > > I think that we can agree that it's impossible to expect every single
>> > > Linux user to go on LKML and complain about a bug he encountered, so the
>> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >
>> > So is there a reason why stable couldn't become some hybrid-form union of
>> >
>> > - really critical issues (data corruption, boot issues, severe security
>> >   issues) taken from bleeding edge upstream
>> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >   (based on that very -stable release), as that's apparently what people
>> >   are hitting in the real world with that particular kernel
>>
>> It already is that :)
>>
>> The problem Sasha is trying to solve here is that for many subsystems,
>> maintainers do not mark patches for stable at all.
>
>The way he is trying to do that is just wrong. Generate a pressure on
>those subsystems by referring to bug reports and unhappy users and I am
>pretty sure they will try harder... You cannot solve the problem by
>bypassing them without having deep understanding of the specific
>subsytem. Once you have it, just make sure you are part of the review
>process and make sure to mark patches before they are merged.

I think we just don't agree on how we should "pressure".

Look at the discussion I had with the XFS folks who just don't want to
deal with this -stable thing because they have to much work upstream.

There wasn't a single patch in -stable coming from XFS for the past 6+
months. I'm aware of more than one way to corrupt an XFS volume for any
distro that uses a kernel older than 4.15.

Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
better about it, but I don't see this changing.

The solution to this, in my opinion, is to automate the whole selection
and review process. We do selection using AI, and we run every possible
test that's relevant to that subsystem.

At which point, the amount of work a human needs to do to review a patch
shrinks into something far more managable for some maintainers.

>> So real bugfixes
>> that do hit people are not getting to those kernels, which force the
>> distros to do extra work to triage a bug, dig through upstream kernels,
>> find and apply the patch.
>
>I would say that this is the primary role of the distro. To hide the
>jungle of the upstream work and provide the additional of bug filtering
>and forwarding them the right direction.

More often than triaging, you'll just be asked to upgrade to the latest
version. What sort of user experience does that provide?

[snip]

>> So nothing "new" is happening here, EXCEPT we are actually starting to
>> get a better kernel-wide coverage for stable fixes, which we have not
>> had in the past.  That's a good thing!  The number of patches applied to
>> stable is still a very very very tiny % compared to mainline, so nothing
>> new is happening here.
>
>yes I do agree, the stable process is not very much different from the
>past and I would tend both processes broken because they explicitly try
>to avoid maintainers which is just wrong.

Avoid maintainers?! We send so much "spam" trying to get maintainers
more involved in the process. How is that avoiding them?

If you're a maintainer who has specific requirements for the -stable
flow, or you have any automated testing you'd like to be run on these
commits, or you want these mails to come in a different format, or
pretty much anything else at all just shoot me a mail!

It's been almost impossible to get maintainers involved in this process.

We don't sneak anything past maintainers, there are multiple mails over
multiple weeks for each commit that would go in. You don't have to
review it right away either, just reply with "please don't merge until
I'm done reviewing" and it'll get removed from the queue.

>> Oh, and if you do want to complain about huge new features being
>> backported, look at the mess that Spectre and Meltdown has caused in the
>> stable trees.  I don't see anyone complaining about those massive
>> changes :)
>
>Are you serious? Are you going the compare the biggest PITA that the
>community had to undergo because of HW issues with random pattern
>matching in changelog/diffs? Come on!

HW Issues are irrelevant here. You had a bug that allowed arbitrary
kernel memory access. I can easily list quite a few commits, that are
not tagged for 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 12:39:36, Greg KH wrote:
>> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> > > I agree that as an enterprise distro taking everything from -stable
>> > > isn't the best idea. Ideally you'd want to be close to the first
>> > > extreme you've mentioned and only take commits if customers are asking
>> > > you to do so.
>> > >
>> > > I think that the rule we're trying to agree upon is the "It must fix
>> > > a real bug that bothers people".
>> > >
>> > > I think that we can agree that it's impossible to expect every single
>> > > Linux user to go on LKML and complain about a bug he encountered, so the
>> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >
>> > So is there a reason why stable couldn't become some hybrid-form union of
>> >
>> > - really critical issues (data corruption, boot issues, severe security
>> >   issues) taken from bleeding edge upstream
>> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >   (based on that very -stable release), as that's apparently what people
>> >   are hitting in the real world with that particular kernel
>>
>> It already is that :)
>>
>> The problem Sasha is trying to solve here is that for many subsystems,
>> maintainers do not mark patches for stable at all.
>
>The way he is trying to do that is just wrong. Generate a pressure on
>those subsystems by referring to bug reports and unhappy users and I am
>pretty sure they will try harder... You cannot solve the problem by
>bypassing them without having deep understanding of the specific
>subsytem. Once you have it, just make sure you are part of the review
>process and make sure to mark patches before they are merged.

I think we just don't agree on how we should "pressure".

Look at the discussion I had with the XFS folks who just don't want to
deal with this -stable thing because they have to much work upstream.

There wasn't a single patch in -stable coming from XFS for the past 6+
months. I'm aware of more than one way to corrupt an XFS volume for any
distro that uses a kernel older than 4.15.

Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
better about it, but I don't see this changing.

The solution to this, in my opinion, is to automate the whole selection
and review process. We do selection using AI, and we run every possible
test that's relevant to that subsystem.

At which point, the amount of work a human needs to do to review a patch
shrinks into something far more managable for some maintainers.

>> So real bugfixes
>> that do hit people are not getting to those kernels, which force the
>> distros to do extra work to triage a bug, dig through upstream kernels,
>> find and apply the patch.
>
>I would say that this is the primary role of the distro. To hide the
>jungle of the upstream work and provide the additional of bug filtering
>and forwarding them the right direction.

More often than triaging, you'll just be asked to upgrade to the latest
version. What sort of user experience does that provide?

[snip]

>> So nothing "new" is happening here, EXCEPT we are actually starting to
>> get a better kernel-wide coverage for stable fixes, which we have not
>> had in the past.  That's a good thing!  The number of patches applied to
>> stable is still a very very very tiny % compared to mainline, so nothing
>> new is happening here.
>
>yes I do agree, the stable process is not very much different from the
>past and I would tend both processes broken because they explicitly try
>to avoid maintainers which is just wrong.

Avoid maintainers?! We send so much "spam" trying to get maintainers
more involved in the process. How is that avoiding them?

If you're a maintainer who has specific requirements for the -stable
flow, or you have any automated testing you'd like to be run on these
commits, or you want these mails to come in a different format, or
pretty much anything else at all just shoot me a mail!

It's been almost impossible to get maintainers involved in this process.

We don't sneak anything past maintainers, there are multiple mails over
multiple weeks for each commit that would go in. You don't have to
review it right away either, just reply with "please don't merge until
I'm done reviewing" and it'll get removed from the queue.

>> Oh, and if you do want to complain about huge new features being
>> backported, look at the mess that Spectre and Meltdown has caused in the
>> stable trees.  I don't see anyone complaining about those massive
>> changes :)
>
>Are you serious? Are you going the compare the biggest PITA that the
>community had to undergo because of HW issues with random pattern
>matching in changelog/diffs? Come on!

HW Issues are irrelevant here. You had a bug that allowed arbitrary
kernel memory access. I can easily list quite a few commits, that are
not tagged for 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
>Back to the trend. Last week I got autosel mails even for
>patches that were still being discussed, had issues, and
>were far from upstream:
>
> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>
>It might be a good idea if the mail asked to add Fixes: tag
>or stable mailing list. But the mail suggested to add the
>unfinished patch into stable branch directly (even before
>upstreaming?).

I obviously didn't suggest that this patch will go in -stable before
it's upstream.

I've started doing those because some folks can't be arsed to reply to a
review request for a patch that is months old. I found that if I send
these mails while the discussion is still going on I'd get a much better
response rate from people.

If you think any of these patches should go in stable there were two
ways about it:

 - You end up adding the -stable tag yourself, and it would follow the
   usual route where Greg picks it up.
 - You reply to that mail, and the patch would wait in a list until my
   script notices it made it upstream, at which point it would get
   queued for stable.

>Now, there are only hand full of printk patches in each
>release, so it is still doable. I just do not understand
>how other maintainers, from much more busy subsystems,
>could cope with this trend.
>
>By other words. If you want to automatize patch nomination,
>you might need to automatize also patch review. Or you need
>to keep the patch rate low. This might mean to nominate
>only important and rather trivial fixes.

I also have an effort to help review the patches. See what I'm working
on for the xfs folks:

https://lkml.org/lkml/2018/3/29/1113

Where in addition to build tests I'd also run each commit, for each
stable kernel through a set of xfstests and provide them along with the
mail.

So yes, I'm aware that the volume of patches is huge, but there's not
much I can do about it because it's just a subset of the kernel's patch
volume and since the kernel gets more and more patches each release, the
volume of stable commits is bound to grow as well.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
>Back to the trend. Last week I got autosel mails even for
>patches that were still being discussed, had issues, and
>were far from upstream:
>
> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>
>It might be a good idea if the mail asked to add Fixes: tag
>or stable mailing list. But the mail suggested to add the
>unfinished patch into stable branch directly (even before
>upstreaming?).

I obviously didn't suggest that this patch will go in -stable before
it's upstream.

I've started doing those because some folks can't be arsed to reply to a
review request for a patch that is months old. I found that if I send
these mails while the discussion is still going on I'd get a much better
response rate from people.

If you think any of these patches should go in stable there were two
ways about it:

 - You end up adding the -stable tag yourself, and it would follow the
   usual route where Greg picks it up.
 - You reply to that mail, and the patch would wait in a list until my
   script notices it made it upstream, at which point it would get
   queued for stable.

>Now, there are only hand full of printk patches in each
>release, so it is still doable. I just do not understand
>how other maintainers, from much more busy subsystems,
>could cope with this trend.
>
>By other words. If you want to automatize patch nomination,
>you might need to automatize also patch review. Or you need
>to keep the patch rate low. This might mean to nominate
>only important and rather trivial fixes.

I also have an effort to help review the patches. See what I'm working
on for the xfs folks:

https://lkml.org/lkml/2018/3/29/1113

Where in addition to build tests I'd also run each commit, for each
stable kernel through a set of xfstests and provide them along with the
mail.

So yes, I'm aware that the volume of patches is huge, but there's not
much I can do about it because it's just a subset of the kernel's patch
volume and since the kernel gets more and more patches each release, the
volume of stable commits is bound to grow as well.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 02:49:24PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:24:54, Petr Mladek wrote:
>[...]
>> Back to the trend. Last week I got autosel mails even for
>> patches that were still being discussed, had issues, and
>> were far from upstream:
>>
>> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>>
>> It might be a good idea if the mail asked to add Fixes: tag
>> or stable mailing list. But the mail suggested to add the
>> unfinished patch into stable branch directly (even before
>> upstreaming?).
>
>Well, I think that poking subsystems which ignore stable trees with such
>emails early during review might be quite helpful. Maybe people start
>marking for stable and we do not need the guessing later. I wouldn't
>bother poking those who are known to mark stable patches though.

Yup, mm/ needs far less poking that XFS (for example).

What makes mm/ so good about this is that it's a rather small set of
devs who are good at marking things for stable. As long as the commit
came from one of these "core" mm/ folks it's almost guaranteed to have
proper stable tags.

But mm/ commits don't come only from these people. Here's a concrete
example we can discuss:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

This was merged in a few days ago, and seems relevant for older kernel
trees as well. Should it not have a stable tag?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 02:49:24PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:24:54, Petr Mladek wrote:
>[...]
>> Back to the trend. Last week I got autosel mails even for
>> patches that were still being discussed, had issues, and
>> were far from upstream:
>>
>> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>>
>> It might be a good idea if the mail asked to add Fixes: tag
>> or stable mailing list. But the mail suggested to add the
>> unfinished patch into stable branch directly (even before
>> upstreaming?).
>
>Well, I think that poking subsystems which ignore stable trees with such
>emails early during review might be quite helpful. Maybe people start
>marking for stable and we do not need the guessing later. I wouldn't
>bother poking those who are known to mark stable patches though.

Yup, mm/ needs far less poking that XFS (for example).

What makes mm/ so good about this is that it's a rather small set of
devs who are good at marking things for stable. As long as the commit
came from one of these "core" mm/ folks it's almost guaranteed to have
proper stable tags.

But mm/ commits don't come only from these people. Here's a concrete
example we can discuss:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

This was merged in a few days ago, and seems relevant for older kernel
trees as well. Should it not have a stable tag?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
>On Mon 16-04-18 17:23:30, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
>> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
>> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
>> >> >On Mon, 16 Apr 2018 16:19:14 +
>> >> >Sasha Levin  wrote:
>> >> >
>> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
>> >> >> >broken as mainline?
>> >> >>
>> >> >> This just means that if there is a fix that went in mainline, and the
>> >> >> fix is broken somehow, we'd rather take the broken fix than not.
>> >> >>
>> >> >> In this scenario, *something* will be broken, it's just a matter of
>> >> >> what. We'd rather have the same thing broken between mainline and
>> >> >> stable.
>> >> >
>> >> >Honestly, I think that removes all value of the stable series. I
>> >> >remember when the stable series were first created. People were saying
>> >> >that it wouldn't even get to more than 5 versions, because the bar for
>> >> >backporting was suppose to be very high. Today it's just a fork of the
>> >> >kernel at a given version. No more features, but we will be OK with
>> >> >regressions. I'm struggling to see what the benefit of it is suppose to
>> >> >be?
>> >>
>> >> It's not "OK with regressions".
>> >>
>> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
>> >> a broken printf() behaviour so that when you:
>> >>
>> >>   pr_err("%d", 5)
>> >>
>> >> Would print:
>> >>
>> >>   "Microsoft Rulez"
>> >>
>> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
>> >> might expect. But alas, with your patch, running:
>> >>
>> >>   pr_err("%s", "hi!")
>> >>
>> >> Would show a cat picture for 5 seconds.
>> >>
>> >> Should we take your patch in -stable or not? If we don't, we're stuck
>> >> with the original issue while the mainline kernel will behave
>> >> differently, but if we do - we introduce a new regression.
>> >
>> >Of course not.
>> >
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>>
>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously corrent, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>So I agree with that being backported. But I think this nicely demostrates
>a point some people are trying to make in this thread. We do take fixes
>with high risk or regression if they fix serious enough issue. Also we do
>take fixes to non-serious stuff (such as addition of device ID) if the
>chances of regression are really low.
>
>So IMHO the metric for including the fix is not solely "how annoying to
>user this can be" but rather something like:
>
>score = (how annoying the bug is) * ((1 / (chance of regression due to
>   including this)) - 1)^3
>
>(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
>and 'regression chance' parts are subjective and sometimes difficult to
>estimate so don't take the formula too seriously but it demonstrates the
>point. I think we all agree we want to fix annoying stuff and we don't want
>regressions. But you need to somehow weight this over your expected
>userbase - and this is where your argument "but someone might be annoyed by
>LEDs not working so let's include it" has problems - it should rather be
>"is the annoyance of non-working leds over expected user base high enough
>to risk a regression due to this patch for someone in the expected user
>base"? The answer to this second question is not clear at all to a casual
>reviewer and that's why we IMHO have CC stable tag as maintainer is
>supposed to have at least a bit better clue.

We may be able to guesstimate the 'regression chance', but there's no
way we can guess the 'annoyance' once. There are so many different use
cases that we just can't even guess how many people would get "annoyed"
by something.

Even regression chance is tricky, look at the commits I've linked
earlier in the thread. Even the most trivial looking commits that end up
in stable have a chance for regression.

>Another point I wanted to make is that if chance a patch causes a
>regression is about 2% as you said somewhere else in a thread, then by
>adding 20 patches that "may fix a bug that is annoying for someone" you've
>just increased a chance there's a regression in the release by 34%. And

So I've said that the rejection rate is less than 2%. This includes
all commits that I have proposed for -stable, but didn't end up being
included in -stable.

This includes commits that the author/maintainers NACKed, commits that
didn't do anything on older kernels, commits that were buggy but were
caught before the kernel was released, commits 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin
On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
>On Mon 16-04-18 17:23:30, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
>> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
>> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
>> >> >On Mon, 16 Apr 2018 16:19:14 +
>> >> >Sasha Levin  wrote:
>> >> >
>> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
>> >> >> >broken as mainline?
>> >> >>
>> >> >> This just means that if there is a fix that went in mainline, and the
>> >> >> fix is broken somehow, we'd rather take the broken fix than not.
>> >> >>
>> >> >> In this scenario, *something* will be broken, it's just a matter of
>> >> >> what. We'd rather have the same thing broken between mainline and
>> >> >> stable.
>> >> >
>> >> >Honestly, I think that removes all value of the stable series. I
>> >> >remember when the stable series were first created. People were saying
>> >> >that it wouldn't even get to more than 5 versions, because the bar for
>> >> >backporting was suppose to be very high. Today it's just a fork of the
>> >> >kernel at a given version. No more features, but we will be OK with
>> >> >regressions. I'm struggling to see what the benefit of it is suppose to
>> >> >be?
>> >>
>> >> It's not "OK with regressions".
>> >>
>> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
>> >> a broken printf() behaviour so that when you:
>> >>
>> >>   pr_err("%d", 5)
>> >>
>> >> Would print:
>> >>
>> >>   "Microsoft Rulez"
>> >>
>> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
>> >> might expect. But alas, with your patch, running:
>> >>
>> >>   pr_err("%s", "hi!")
>> >>
>> >> Would show a cat picture for 5 seconds.
>> >>
>> >> Should we take your patch in -stable or not? If we don't, we're stuck
>> >> with the original issue while the mainline kernel will behave
>> >> differently, but if we do - we introduce a new regression.
>> >
>> >Of course not.
>> >
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>>
>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously corrent, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>So I agree with that being backported. But I think this nicely demostrates
>a point some people are trying to make in this thread. We do take fixes
>with high risk or regression if they fix serious enough issue. Also we do
>take fixes to non-serious stuff (such as addition of device ID) if the
>chances of regression are really low.
>
>So IMHO the metric for including the fix is not solely "how annoying to
>user this can be" but rather something like:
>
>score = (how annoying the bug is) * ((1 / (chance of regression due to
>   including this)) - 1)^3
>
>(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
>and 'regression chance' parts are subjective and sometimes difficult to
>estimate so don't take the formula too seriously but it demonstrates the
>point. I think we all agree we want to fix annoying stuff and we don't want
>regressions. But you need to somehow weight this over your expected
>userbase - and this is where your argument "but someone might be annoyed by
>LEDs not working so let's include it" has problems - it should rather be
>"is the annoyance of non-working leds over expected user base high enough
>to risk a regression due to this patch for someone in the expected user
>base"? The answer to this second question is not clear at all to a casual
>reviewer and that's why we IMHO have CC stable tag as maintainer is
>supposed to have at least a bit better clue.

We may be able to guesstimate the 'regression chance', but there's no
way we can guess the 'annoyance' once. There are so many different use
cases that we just can't even guess how many people would get "annoyed"
by something.

Even regression chance is tricky, look at the commits I've linked
earlier in the thread. Even the most trivial looking commits that end up
in stable have a chance for regression.

>Another point I wanted to make is that if chance a patch causes a
>regression is about 2% as you said somewhere else in a thread, then by
>adding 20 patches that "may fix a bug that is annoying for someone" you've
>just increased a chance there's a regression in the release by 34%. And

So I've said that the rejection rate is less than 2%. This includes
all commits that I have proposed for -stable, but didn't end up being
included in -stable.

This includes commits that the author/maintainers NACKed, commits that
didn't do anything on older kernels, commits that were buggy but were
caught before the kernel was released, commits that failed to build on
an 

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 14:24:54, Petr Mladek wrote:
[...]
> Back to the trend. Last week I got autosel mails even for
> patches that were still being discussed, had issues, and
> were far from upstream:
> 
> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> 
> It might be a good idea if the mail asked to add Fixes: tag
> or stable mailing list. But the mail suggested to add the
> unfinished patch into stable branch directly (even before
> upstreaming?).

Well, I think that poking subsystems which ignore stable trees with such
emails early during review might be quite helpful. Maybe people start
marking for stable and we do not need the guessing later. I wouldn't
bother poking those who are known to mark stable patches though.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 14:24:54, Petr Mladek wrote:
[...]
> Back to the trend. Last week I got autosel mails even for
> patches that were still being discussed, had issues, and
> were far from upstream:
> 
> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> 
> It might be a good idea if the mail asked to add Fixes: tag
> or stable mailing list. But the mail suggested to add the
> unfinished patch into stable branch directly (even before
> upstreaming?).

Well, I think that poking subsystems which ignore stable trees with such
emails early during review might be quite helpful. Maybe people start
marking for stable and we do not need the guessing later. I wouldn't
bother poking those who are known to mark stable patches though.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Petr Mladek
On Tue 2018-04-17 12:46:37, Greg KH wrote:
> Oh, I know why, suddenly subsystems that never were taking the time to
> mark patches for stable are getting patches backported and are getting
> nervous.

Yes, I am getting nervous because of this. The number of printk fixes
nominated for stable is increasing exponentially (just my feeling)
during last few months.

The problem is that I want to be responsible and think about possible
regressions. Sometimes it requires checking the state of the
particular kernel release. The older code base the more complicated
the decision is.

You might argue that backporting the fixes helps to get the same code
in all supported code bases. But it is not true. It never will be
the same.


Anyway, in the past the "automatically" nominated printk fixes
were trivial. They did not cause harm. But they also were not
worth it, IMHO. They fixed corner cases that were there for ages.
Most of these fixes were found by code review when working on
a feature. They were not backed by bug reports.


Last week, autosel nominated pretty non-trivial patch (started
this thread). It partly solved a problem we tried to fix last few
years.

On one side, this was an annoying problem that motivated several
people spend a lot of time on it. This might be a motivation
for a backport.

On the other hand, it took many years to come somewhere. The main
problem was the fear of regressions. We fixed/improved many things
in the mean time. It shows that the problem really is not trivial.
The same is true for the fix. We did our best to avoid regressions.
But it does not mean that there are none. Also it does not mean
that it will really give better results in all situations.

I really do not see a reason to hurry and backport this to
the older kernel releases. It means to spread the fix but also
eventual problems. It is easy to miss a dependant patch.
The less trivial fix, the more possible problems are there.



Back to the trend. Last week I got autosel mails even for
patches that were still being discussed, had issues, and
were far from upstream:

https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com

It might be a good idea if the mail asked to add Fixes: tag
or stable mailing list. But the mail suggested to add the
unfinished patch into stable branch directly (even before
upstreaming?).


Now, there are only hand full of printk patches in each
release, so it is still doable. I just do not understand
how other maintainers, from much more busy subsystems,
could cope with this trend.

By other words. If you want to automatize patch nomination,
you might need to automatize also patch review. Or you need
to keep the patch rate low. This might mean to nominate
only important and rather trivial fixes.


Best Regards,
Petr


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Petr Mladek
On Tue 2018-04-17 12:46:37, Greg KH wrote:
> Oh, I know why, suddenly subsystems that never were taking the time to
> mark patches for stable are getting patches backported and are getting
> nervous.

Yes, I am getting nervous because of this. The number of printk fixes
nominated for stable is increasing exponentially (just my feeling)
during last few months.

The problem is that I want to be responsible and think about possible
regressions. Sometimes it requires checking the state of the
particular kernel release. The older code base the more complicated
the decision is.

You might argue that backporting the fixes helps to get the same code
in all supported code bases. But it is not true. It never will be
the same.


Anyway, in the past the "automatically" nominated printk fixes
were trivial. They did not cause harm. But they also were not
worth it, IMHO. They fixed corner cases that were there for ages.
Most of these fixes were found by code review when working on
a feature. They were not backed by bug reports.


Last week, autosel nominated pretty non-trivial patch (started
this thread). It partly solved a problem we tried to fix last few
years.

On one side, this was an annoying problem that motivated several
people spend a lot of time on it. This might be a motivation
for a backport.

On the other hand, it took many years to come somewhere. The main
problem was the fear of regressions. We fixed/improved many things
in the mean time. It shows that the problem really is not trivial.
The same is true for the fix. We did our best to avoid regressions.
But it does not mean that there are none. Also it does not mean
that it will really give better results in all situations.

I really do not see a reason to hurry and backport this to
the older kernel releases. It means to spread the fix but also
eventual problems. It is easy to miss a dependant patch.
The less trivial fix, the more possible problems are there.



Back to the trend. Last week I got autosel mails even for
patches that were still being discussed, had issues, and
were far from upstream:

https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com

It might be a good idea if the mail asked to add Fixes: tag
or stable mailing list. But the mail suggested to add the
unfinished patch into stable branch directly (even before
upstreaming?).


Now, there are only hand full of printk patches in each
release, so it is still doable. I just do not understand
how other maintainers, from much more busy subsystems,
could cope with this trend.

By other words. If you want to automatize patch nomination,
you might need to automatize also patch review. Or you need
to keep the patch rate low. This might mean to nominate
only important and rather trivial fixes.


Best Regards,
Petr


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara
On Mon 16-04-18 17:23:30, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >> >On Mon, 16 Apr 2018 16:19:14 +
> >> >Sasha Levin  wrote:
> >> >
> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >> >broken as mainline?
> >> >>
> >> >> This just means that if there is a fix that went in mainline, and the
> >> >> fix is broken somehow, we'd rather take the broken fix than not.
> >> >>
> >> >> In this scenario, *something* will be broken, it's just a matter of
> >> >> what. We'd rather have the same thing broken between mainline and
> >> >> stable.
> >> >
> >> >Honestly, I think that removes all value of the stable series. I
> >> >remember when the stable series were first created. People were saying
> >> >that it wouldn't even get to more than 5 versions, because the bar for
> >> >backporting was suppose to be very high. Today it's just a fork of the
> >> >kernel at a given version. No more features, but we will be OK with
> >> >regressions. I'm struggling to see what the benefit of it is suppose to
> >> >be?
> >>
> >> It's not "OK with regressions".
> >>
> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> >> a broken printf() behaviour so that when you:
> >>
> >>pr_err("%d", 5)
> >>
> >> Would print:
> >>
> >>"Microsoft Rulez"
> >>
> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> >> might expect. But alas, with your patch, running:
> >>
> >>pr_err("%s", "hi!")
> >>
> >> Would show a cat picture for 5 seconds.
> >>
> >> Should we take your patch in -stable or not? If we don't, we're stuck
> >> with the original issue while the mainline kernel will behave
> >> differently, but if we do - we introduce a new regression.
> >
> >Of course not.
> >
> >- It must be obviously correct and tested.
> >
> >If it introduces new bug, it is not correct, and certainly not
> >obviously correct.
> 
> As you might have noticed, we don't strictly follow the rules.
> 
> Take a look at the whole PTI story as an example. It's way more than 100
> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> on, and yet it went in -stable!
> 
> Would you argue we shouldn't have backported PTI to -stable?

So I agree with that being backported. But I think this nicely demostrates
a point some people are trying to make in this thread. We do take fixes
with high risk or regression if they fix serious enough issue. Also we do
take fixes to non-serious stuff (such as addition of device ID) if the
chances of regression are really low.

So IMHO the metric for including the fix is not solely "how annoying to
user this can be" but rather something like:

score = (how annoying the bug is) * ((1 / (chance of regression due to
including this)) - 1)^3

(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
and 'regression chance' parts are subjective and sometimes difficult to
estimate so don't take the formula too seriously but it demonstrates the
point. I think we all agree we want to fix annoying stuff and we don't want
regressions. But you need to somehow weight this over your expected
userbase - and this is where your argument "but someone might be annoyed by
LEDs not working so let's include it" has problems - it should rather be
"is the annoyance of non-working leds over expected user base high enough
to risk a regression due to this patch for someone in the expected user
base"? The answer to this second question is not clear at all to a casual
reviewer and that's why we IMHO have CC stable tag as maintainer is
supposed to have at least a bit better clue.

Another point I wanted to make is that if chance a patch causes a
regression is about 2% as you said somewhere else in a thread, then by
adding 20 patches that "may fix a bug that is annoying for someone" you've
just increased a chance there's a regression in the release by 34%. And
this is not just a math game, this also roughly matches a real experience
with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such
regression chance? And I also note that for a regression to get reported so
that it gets included into your 2% estimate of a patch regression rate,
someone must be bothered enough by it to triage it and send an email
somewhere so that already falls into a category of "serious" stuff to me.

So these are the reasons why I think that merging tons of patches into
stable isn't actually very good. 

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara
On Mon 16-04-18 17:23:30, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >> >On Mon, 16 Apr 2018 16:19:14 +
> >> >Sasha Levin  wrote:
> >> >
> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >> >broken as mainline?
> >> >>
> >> >> This just means that if there is a fix that went in mainline, and the
> >> >> fix is broken somehow, we'd rather take the broken fix than not.
> >> >>
> >> >> In this scenario, *something* will be broken, it's just a matter of
> >> >> what. We'd rather have the same thing broken between mainline and
> >> >> stable.
> >> >
> >> >Honestly, I think that removes all value of the stable series. I
> >> >remember when the stable series were first created. People were saying
> >> >that it wouldn't even get to more than 5 versions, because the bar for
> >> >backporting was suppose to be very high. Today it's just a fork of the
> >> >kernel at a given version. No more features, but we will be OK with
> >> >regressions. I'm struggling to see what the benefit of it is suppose to
> >> >be?
> >>
> >> It's not "OK with regressions".
> >>
> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> >> a broken printf() behaviour so that when you:
> >>
> >>pr_err("%d", 5)
> >>
> >> Would print:
> >>
> >>"Microsoft Rulez"
> >>
> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> >> might expect. But alas, with your patch, running:
> >>
> >>pr_err("%s", "hi!")
> >>
> >> Would show a cat picture for 5 seconds.
> >>
> >> Should we take your patch in -stable or not? If we don't, we're stuck
> >> with the original issue while the mainline kernel will behave
> >> differently, but if we do - we introduce a new regression.
> >
> >Of course not.
> >
> >- It must be obviously correct and tested.
> >
> >If it introduces new bug, it is not correct, and certainly not
> >obviously correct.
> 
> As you might have noticed, we don't strictly follow the rules.
> 
> Take a look at the whole PTI story as an example. It's way more than 100
> lines, it's not obviously corrent, it fixed more than 1 thing, and so
> on, and yet it went in -stable!
> 
> Would you argue we shouldn't have backported PTI to -stable?

So I agree with that being backported. But I think this nicely demostrates
a point some people are trying to make in this thread. We do take fixes
with high risk or regression if they fix serious enough issue. Also we do
take fixes to non-serious stuff (such as addition of device ID) if the
chances of regression are really low.

So IMHO the metric for including the fix is not solely "how annoying to
user this can be" but rather something like:

score = (how annoying the bug is) * ((1 / (chance of regression due to
including this)) - 1)^3

(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
and 'regression chance' parts are subjective and sometimes difficult to
estimate so don't take the formula too seriously but it demonstrates the
point. I think we all agree we want to fix annoying stuff and we don't want
regressions. But you need to somehow weight this over your expected
userbase - and this is where your argument "but someone might be annoyed by
LEDs not working so let's include it" has problems - it should rather be
"is the annoyance of non-working leds over expected user base high enough
to risk a regression due to this patch for someone in the expected user
base"? The answer to this second question is not clear at all to a casual
reviewer and that's why we IMHO have CC stable tag as maintainer is
supposed to have at least a bit better clue.

Another point I wanted to make is that if chance a patch causes a
regression is about 2% as you said somewhere else in a thread, then by
adding 20 patches that "may fix a bug that is annoying for someone" you've
just increased a chance there's a regression in the release by 34%. And
this is not just a math game, this also roughly matches a real experience
with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such
regression chance? And I also note that for a regression to get reported so
that it gets included into your 2% estimate of a patch regression rate,
someone must be bothered enough by it to triage it and send an email
somewhere so that already falls into a category of "serious" stuff to me.

So these are the reasons why I think that merging tons of patches into
stable isn't actually very good. 

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jiri Kosina
On Tue, 17 Apr 2018, Greg KH wrote:

> It already is that :)

I have a question: I guess a stable team has an idea who they are 
preparing the tree for, IOW who is the target consumer. Who is it?

Certainly it's not major distros, as both RH and SUSE already stated that 
they are either not basing off the stable kernel (only cherry-pick fixes 
from it) (RH), or are quite far in the process of moving away from stable 
tree towards combination of what RH is doing + semi-automated evaluation 
of Fixes: tag (SUSE).

If the target audience is somewhere else, that's perfectly fine, but then 
it'd have to be stated explicitly I guess.

I can't speak for RH, but for us (at least me personally), the pace of 
patches flowing into -stable is way too high for us to keep control of 
what is landing in our tree.

In some sense, stability should be equivalent to "minimal necessary amout 
of super-critical changes". That's not what this "let's backport 
proactively almost everything that has word 'fixes' in changelog" (I'm a 
bit exaggerating of course) seems to be about.

Again, the rules stated out in

Documentation/process/stable-kernel-rules.rst

are very nice, and are exactly something at least we would be very happy 
about. They have the nice hidden asumption in them, that someone actually 
has to actively invest human brain power to think about the fix, its 
consequences, prerequisities, etc. Not just doing a big dump of all 
commits that "might fix something".

How many of the actual patches flowing into -stable would satisfy those 
criteria these days?

IOW, I'm pretty sure our users are much happier with us supplying them 
reactive fixes than pro-active uncertainity.

> The problem Sasha is trying to solve here is that for many subsystems,
> maintainers do not mark patches for stable at all.

The pressure on those subsystems should be coming from unhappy users 
(being it end-users or vendors redistributing the tree) of the stable 
tree, who would be complaining about missing fixes for those subsystems. 
Is this actually happening? Where?

> Oh, and if you do want to complain about huge new features being 
> backported, look at the mess that Spectre and Meltdown has caused in the 
> stable trees.  I don't see anyone complaining about those massive 
> changes :)

Umm, sorry, how is this related?

There simply was no other way, and I took it for given that this is seen 
by everybody involved as an absolute exception, due to the nature of the 
issue and of the massive changes that were needed.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jiri Kosina
On Tue, 17 Apr 2018, Greg KH wrote:

> It already is that :)

I have a question: I guess a stable team has an idea who they are 
preparing the tree for, IOW who is the target consumer. Who is it?

Certainly it's not major distros, as both RH and SUSE already stated that 
they are either not basing off the stable kernel (only cherry-pick fixes 
from it) (RH), or are quite far in the process of moving away from stable 
tree towards combination of what RH is doing + semi-automated evaluation 
of Fixes: tag (SUSE).

If the target audience is somewhere else, that's perfectly fine, but then 
it'd have to be stated explicitly I guess.

I can't speak for RH, but for us (at least me personally), the pace of 
patches flowing into -stable is way too high for us to keep control of 
what is landing in our tree.

In some sense, stability should be equivalent to "minimal necessary amout 
of super-critical changes". That's not what this "let's backport 
proactively almost everything that has word 'fixes' in changelog" (I'm a 
bit exaggerating of course) seems to be about.

Again, the rules stated out in

Documentation/process/stable-kernel-rules.rst

are very nice, and are exactly something at least we would be very happy 
about. They have the nice hidden asumption in them, that someone actually 
has to actively invest human brain power to think about the fix, its 
consequences, prerequisities, etc. Not just doing a big dump of all 
commits that "might fix something".

How many of the actual patches flowing into -stable would satisfy those 
criteria these days?

IOW, I'm pretty sure our users are much happier with us supplying them 
reactive fixes than pro-active uncertainity.

> The problem Sasha is trying to solve here is that for many subsystems,
> maintainers do not mark patches for stable at all.

The pressure on those subsystems should be coming from unhappy users 
(being it end-users or vendors redistributing the tree) of the stable 
tree, who would be complaining about missing fixes for those subsystems. 
Is this actually happening? Where?

> Oh, and if you do want to complain about huge new features being 
> backported, look at the mess that Spectre and Meltdown has caused in the 
> stable trees.  I don't see anyone complaining about those massive 
> changes :)

Umm, sorry, how is this related?

There simply was no other way, and I took it for given that this is seen 
by everybody involved as an absolute exception, due to the nature of the 
issue and of the massive changes that were needed.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 12:39:36, Greg KH wrote:
> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> > On Mon, 16 Apr 2018, Sasha Levin wrote:
> > 
> > > I agree that as an enterprise distro taking everything from -stable
> > > isn't the best idea. Ideally you'd want to be close to the first
> > > extreme you've mentioned and only take commits if customers are asking
> > > you to do so.
> > > 
> > > I think that the rule we're trying to agree upon is the "It must fix
> > > a real bug that bothers people".
> > > 
> > > I think that we can agree that it's impossible to expect every single
> > > Linux user to go on LKML and complain about a bug he encountered, so the
> > > rule quickly becomes "It must fix a real bug that can bother people".
> > 
> > So is there a reason why stable couldn't become some hybrid-form union of
> > 
> > - really critical issues (data corruption, boot issues, severe security 
> >   issues) taken from bleeding edge upstream
> > - [reviewed] cherry-picks of functional fixes from major distro kernels 
> >   (based on that very -stable release), as that's apparently what people 
> >   are hitting in the real world with that particular kernel
> 
> It already is that :)
> 
> The problem Sasha is trying to solve here is that for many subsystems,
> maintainers do not mark patches for stable at all.

The way he is trying to do that is just wrong. Generate a pressure on
those subsystems by referring to bug reports and unhappy users and I am
pretty sure they will try harder... You cannot solve the problem by
bypassing them without having deep understanding of the specific
subsytem. Once you have it, just make sure you are part of the review
process and make sure to mark patches before they are merged.

> So real bugfixes
> that do hit people are not getting to those kernels, which force the
> distros to do extra work to triage a bug, dig through upstream kernels,
> find and apply the patch.

I would say that this is the primary role of the distro. To hide the
jungle of the upstream work and provide the additional of bug filtering
and forwarding them the right direction.
 
> By identifying the patches that should have been marked for stable,
> based on the ways that the changelog text is written and the logic in
> the patch itself, we circumvent that extra annoyance of users hitting
> problems and complaining, or ignoring them and hoping they go away if
> they reboot.

Well, but this is a two edge sword. You are not only backporting obvious
bug fixes but also pulling many patch out of the context they were
merged to and double checking all the assumptions are still true is a
non-trivial task to do. I am still not convinced any script or AI can do
that right now.

> I've been doing this "by hand" for many years now, with no complaints so
> far.

Really? I remember quite some complains about broken stable releases and
also many discussions on KS how the current workflow doesn't really work
for some users (e.g. distributions).

> Sasha has taken it to the next level as I don't scale and has
> started to automate it using some really nice tools.  That's all, this
> isn't crazy new features being backported, it's just patches that are
> obviously fixes being added to the stable tree.

I have yet to see a tool which can recognize an "obvious fix".
Seriously! Matching keywords in the changelog and some pattern
recognition in the diff can help to do some pre filtering _can_ help a
lot but there is still a human interaction needed to do sanity checking.
And that really requires deep subsystem knowledge. I really fail to see
how that can work without relevant people involvement. Pretending that
you can do stable without maintainers will simply not work IMNHO.

> Yes, sometimes those fixes need additional fixes, and that's fine,
> normal stable-marked patches need that all the time.  I don't see anyone
> complaining about that, right?
> 
> So nothing "new" is happening here, EXCEPT we are actually starting to
> get a better kernel-wide coverage for stable fixes, which we have not
> had in the past.  That's a good thing!  The number of patches applied to
> stable is still a very very very tiny % compared to mainline, so nothing
> new is happening here.

yes I do agree, the stable process is not very much different from the
past and I would tend both processes broken because they explicitly try
to avoid maintainers which is just wrong.

> Oh, and if you do want to complain about huge new features being
> backported, look at the mess that Spectre and Meltdown has caused in the
> stable trees.  I don't see anyone complaining about those massive
> changes :)

Are you serious? Are you going the compare the biggest PITA that the
community had to undergo because of HW issues with random pattern
matching in changelog/diffs? Come on!

-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko
On Tue 17-04-18 12:39:36, Greg KH wrote:
> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> > On Mon, 16 Apr 2018, Sasha Levin wrote:
> > 
> > > I agree that as an enterprise distro taking everything from -stable
> > > isn't the best idea. Ideally you'd want to be close to the first
> > > extreme you've mentioned and only take commits if customers are asking
> > > you to do so.
> > > 
> > > I think that the rule we're trying to agree upon is the "It must fix
> > > a real bug that bothers people".
> > > 
> > > I think that we can agree that it's impossible to expect every single
> > > Linux user to go on LKML and complain about a bug he encountered, so the
> > > rule quickly becomes "It must fix a real bug that can bother people".
> > 
> > So is there a reason why stable couldn't become some hybrid-form union of
> > 
> > - really critical issues (data corruption, boot issues, severe security 
> >   issues) taken from bleeding edge upstream
> > - [reviewed] cherry-picks of functional fixes from major distro kernels 
> >   (based on that very -stable release), as that's apparently what people 
> >   are hitting in the real world with that particular kernel
> 
> It already is that :)
> 
> The problem Sasha is trying to solve here is that for many subsystems,
> maintainers do not mark patches for stable at all.

The way he is trying to do that is just wrong. Generate a pressure on
those subsystems by referring to bug reports and unhappy users and I am
pretty sure they will try harder... You cannot solve the problem by
bypassing them without having deep understanding of the specific
subsytem. Once you have it, just make sure you are part of the review
process and make sure to mark patches before they are merged.

> So real bugfixes
> that do hit people are not getting to those kernels, which force the
> distros to do extra work to triage a bug, dig through upstream kernels,
> find and apply the patch.

I would say that this is the primary role of the distro. To hide the
jungle of the upstream work and provide the additional of bug filtering
and forwarding them the right direction.
 
> By identifying the patches that should have been marked for stable,
> based on the ways that the changelog text is written and the logic in
> the patch itself, we circumvent that extra annoyance of users hitting
> problems and complaining, or ignoring them and hoping they go away if
> they reboot.

Well, but this is a two edge sword. You are not only backporting obvious
bug fixes but also pulling many patch out of the context they were
merged to and double checking all the assumptions are still true is a
non-trivial task to do. I am still not convinced any script or AI can do
that right now.

> I've been doing this "by hand" for many years now, with no complaints so
> far.

Really? I remember quite some complains about broken stable releases and
also many discussions on KS how the current workflow doesn't really work
for some users (e.g. distributions).

> Sasha has taken it to the next level as I don't scale and has
> started to automate it using some really nice tools.  That's all, this
> isn't crazy new features being backported, it's just patches that are
> obviously fixes being added to the stable tree.

I have yet to see a tool which can recognize an "obvious fix".
Seriously! Matching keywords in the changelog and some pattern
recognition in the diff can help to do some pre filtering _can_ help a
lot but there is still a human interaction needed to do sanity checking.
And that really requires deep subsystem knowledge. I really fail to see
how that can work without relevant people involvement. Pretending that
you can do stable without maintainers will simply not work IMNHO.

> Yes, sometimes those fixes need additional fixes, and that's fine,
> normal stable-marked patches need that all the time.  I don't see anyone
> complaining about that, right?
> 
> So nothing "new" is happening here, EXCEPT we are actually starting to
> get a better kernel-wide coverage for stable fixes, which we have not
> had in the past.  That's a good thing!  The number of patches applied to
> stable is still a very very very tiny % compared to mainline, so nothing
> new is happening here.

yes I do agree, the stable process is not very much different from the
past and I would tend both processes broken because they explicitly try
to avoid maintainers which is just wrong.

> Oh, and if you do want to complain about huge new features being
> backported, look at the mess that Spectre and Meltdown has caused in the
> stable trees.  I don't see anyone complaining about those massive
> changes :)

Are you serious? Are you going the compare the biggest PITA that the
community had to undergo because of HW issues with random pattern
matching in changelog/diffs? Come on!

-- 
Michal Hocko
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Mon, Apr 16, 2018 at 06:54:51PM +0200, Pavel Machek wrote:
> On Mon 2018-04-16 16:45:16, Sasha Levin wrote:
> > On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote:
> > >On Mon 2018-04-16 16:39:20, Sasha Levin wrote:
> > >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote:
> > >> >
> > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't
> > >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe
> > >> >> >> designate some subsystems that are more "important" than others, 
> > >> >> >> but
> > >> >> >> what's the point?
> > >> >> >
> > >> >> >There's a tradeoff.
> > >> >> >
> > >> >> >You want to fix serious bugs in stable, and you really don't want
> > >> >> >regressions in stable. And ... stable not having 1000s of patches
> > >> >> >would be nice, too.
> > >> >>
> > >> >> I don't think we should use a number cap here, but rather look at the
> > >> >> regression rate: how many patches broke something?
> > >> >>
> > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we 
> > >> >> were
> > >> >> seeing before AUTOSEL, what's the problem it's causing?
> > >> >
> > >> >Regression rate should not be the only criteria.
> > >> >
> > >> >More patches mean bigger chance customer's patches will have a
> > >> >conflict with something in -stable, for example.
> > >>
> > >> Out of tree patches can't be a consideration here. There are no
> > >> guarantees for out of tree code, ever.
> > >
> > >Out of tree code is not consideration for mainline, agreed. Stable
> > >should be different.
> > 
> > This is a discussion we could have with in right forum, but FYI stable
> > doesn't even guarantee KABI compatibility between minor versions at this
> > point.
> 
> Stable should be useful base for distributions. They carry out of tree
> patches, and yes, you should try to make their lives easy.

How do you know I already don't do that?

But no, in the end, it's not my job to make their life easier if they
are off in their own corner never providing me feedback or help.  For
those companies/distros/SoCs that do provide that feedback, I am glad to
work with them.

As proof of that, there are at least 3 "major" SoC vendors that have
been merging every one of the stable releases into their internal trees
for the past 6+ months now.  I get reports when they do the merge and
test, and so far, we have only had 1 regression.  And that regression
was because that SoC vendor forgot to upstream a patch that they had in
their internal tree (i.e. they fixed it a while ago but forgot to tell
anyone else, nothing we can do about that.)

So if you are a distro/company/whatever that takes stable releases, and
have run into problems, please let me know, and I will be glad to work
with you.

If you are not that, then please don't attempt to speak for them...

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Mon, Apr 16, 2018 at 06:54:51PM +0200, Pavel Machek wrote:
> On Mon 2018-04-16 16:45:16, Sasha Levin wrote:
> > On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote:
> > >On Mon 2018-04-16 16:39:20, Sasha Levin wrote:
> > >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote:
> > >> >
> > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't
> > >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe
> > >> >> >> designate some subsystems that are more "important" than others, 
> > >> >> >> but
> > >> >> >> what's the point?
> > >> >> >
> > >> >> >There's a tradeoff.
> > >> >> >
> > >> >> >You want to fix serious bugs in stable, and you really don't want
> > >> >> >regressions in stable. And ... stable not having 1000s of patches
> > >> >> >would be nice, too.
> > >> >>
> > >> >> I don't think we should use a number cap here, but rather look at the
> > >> >> regression rate: how many patches broke something?
> > >> >>
> > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we 
> > >> >> were
> > >> >> seeing before AUTOSEL, what's the problem it's causing?
> > >> >
> > >> >Regression rate should not be the only criteria.
> > >> >
> > >> >More patches mean bigger chance customer's patches will have a
> > >> >conflict with something in -stable, for example.
> > >>
> > >> Out of tree patches can't be a consideration here. There are no
> > >> guarantees for out of tree code, ever.
> > >
> > >Out of tree code is not consideration for mainline, agreed. Stable
> > >should be different.
> > 
> > This is a discussion we could have with in right forum, but FYI stable
> > doesn't even guarantee KABI compatibility between minor versions at this
> > point.
> 
> Stable should be useful base for distributions. They carry out of tree
> patches, and yes, you should try to make their lives easy.

How do you know I already don't do that?

But no, in the end, it's not my job to make their life easier if they
are off in their own corner never providing me feedback or help.  For
those companies/distros/SoCs that do provide that feedback, I am glad to
work with them.

As proof of that, there are at least 3 "major" SoC vendors that have
been merging every one of the stable releases into their internal trees
for the past 6+ months now.  I get reports when they do the merge and
test, and so far, we have only had 1 regression.  And that regression
was because that SoC vendor forgot to upstream a patch that they had in
their internal tree (i.e. they fixed it a while ago but forgot to tell
anyone else, nothing we can do about that.)

So if you are a distro/company/whatever that takes stable releases, and
have run into problems, please let me know, and I will be glad to work
with you.

If you are not that, then please don't attempt to speak for them...

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Mon, Apr 16, 2018 at 07:00:10PM +0200, Pavel Machek wrote:
> Hi!
> 
> > >> Let me ask my wife (who is happy using Linux as a regular desktop user)
> > >> how comfortable she would be with triaging kernel bugs...
> > >
> > >That's really up to the distribution, not the main kernel stable. Does
> > >she download and compile the kernels herself? Does she use LEDs?
> > >
> > >The point is, stable is to keep what was working continued working.
> > >If we don't care about introducing a regression, and just want to keep
> > >regressions the same as mainline, why not just go to mainline? That way
> > >you can also get the new features? Mainline already has the mantra to
> > >not break user space. When I work on new features, I sometimes stumble
> > >on bugs with the current features. And some of those fixes require a
> > >rewrite. It was "good enough" before, but every so often could cause a
> > >bug that the new feature would trigger more often. Do we back port that
> > >rewrite? Do we backport fixes to old code that are more likely to be
> > >triggered by new features?
> > >
> > >Ideally, we should be working on getting to no regressions to stable.
> > 
> > This is exactly what we're doing.
> > 
> > If a fix for a bug in -stable introduces a different regression,
> > should we take it or not?
> 
> If a fix for bug introduces regression, would you call it "obviously
> correct"?

I honestly can't believe you all are arguing about this.  We backport
bugfixes to the stable tree.  If those fixes also are buggy we either
apply the fix for that problem that ended up in Linus's tree, or we
revert the patch.  If the fix is not in Linus's tree, sometimes we leave
the "bug" in stable for a bit to apply some pressure on the
developer/maintainer to get it fixed in Linus's tree (that's what I mean
by being "bug compatible".)

This is exactly what we have been doing for over a decade now, why are
people suddenly getting upset?

Oh, I know why, suddenly subsystems that never were taking the time to
mark patches for stable are getting patches backported and are getting
nervous.  The simple way to stop that from happening is to PROPERLY MARK
PATCHES FOR STABLE IN THE FIRST PLACE!

If you do that, then, no "automated" patches will get selected as you
already handled them all.  Or if there are some automated patches
picked, you can easily NAK them (like xfs does as they know better than
everyone else, and honestly, I trust them, and don't run xfs myself), or
do like what I do when it happens to me and go "hey, nice, I missed that
one!"

There, problem solved, if you do that, no more worrying by you at all,
and this thread can properly die.

ugh,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Mon, Apr 16, 2018 at 07:00:10PM +0200, Pavel Machek wrote:
> Hi!
> 
> > >> Let me ask my wife (who is happy using Linux as a regular desktop user)
> > >> how comfortable she would be with triaging kernel bugs...
> > >
> > >That's really up to the distribution, not the main kernel stable. Does
> > >she download and compile the kernels herself? Does she use LEDs?
> > >
> > >The point is, stable is to keep what was working continued working.
> > >If we don't care about introducing a regression, and just want to keep
> > >regressions the same as mainline, why not just go to mainline? That way
> > >you can also get the new features? Mainline already has the mantra to
> > >not break user space. When I work on new features, I sometimes stumble
> > >on bugs with the current features. And some of those fixes require a
> > >rewrite. It was "good enough" before, but every so often could cause a
> > >bug that the new feature would trigger more often. Do we back port that
> > >rewrite? Do we backport fixes to old code that are more likely to be
> > >triggered by new features?
> > >
> > >Ideally, we should be working on getting to no regressions to stable.
> > 
> > This is exactly what we're doing.
> > 
> > If a fix for a bug in -stable introduces a different regression,
> > should we take it or not?
> 
> If a fix for bug introduces regression, would you call it "obviously
> correct"?

I honestly can't believe you all are arguing about this.  We backport
bugfixes to the stable tree.  If those fixes also are buggy we either
apply the fix for that problem that ended up in Linus's tree, or we
revert the patch.  If the fix is not in Linus's tree, sometimes we leave
the "bug" in stable for a bit to apply some pressure on the
developer/maintainer to get it fixed in Linus's tree (that's what I mean
by being "bug compatible".)

This is exactly what we have been doing for over a decade now, why are
people suddenly getting upset?

Oh, I know why, suddenly subsystems that never were taking the time to
mark patches for stable are getting patches backported and are getting
nervous.  The simple way to stop that from happening is to PROPERLY MARK
PATCHES FOR STABLE IN THE FIRST PLACE!

If you do that, then, no "automated" patches will get selected as you
already handled them all.  Or if there are some automated patches
picked, you can easily NAK them (like xfs does as they know better than
everyone else, and honestly, I trust them, and don't run xfs myself), or
do like what I do when it happens to me and go "hey, nice, I missed that
one!"

There, problem solved, if you do that, no more worrying by you at all,
and this thread can properly die.

ugh,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> On Mon, 16 Apr 2018, Sasha Levin wrote:
> 
> > I agree that as an enterprise distro taking everything from -stable
> > isn't the best idea. Ideally you'd want to be close to the first
> > extreme you've mentioned and only take commits if customers are asking
> > you to do so.
> > 
> > I think that the rule we're trying to agree upon is the "It must fix
> > a real bug that bothers people".
> > 
> > I think that we can agree that it's impossible to expect every single
> > Linux user to go on LKML and complain about a bug he encountered, so the
> > rule quickly becomes "It must fix a real bug that can bother people".
> 
> So is there a reason why stable couldn't become some hybrid-form union of
> 
> - really critical issues (data corruption, boot issues, severe security 
>   issues) taken from bleeding edge upstream
> - [reviewed] cherry-picks of functional fixes from major distro kernels 
>   (based on that very -stable release), as that's apparently what people 
>   are hitting in the real world with that particular kernel

It already is that :)

The problem Sasha is trying to solve here is that for many subsystems,
maintainers do not mark patches for stable at all.  So real bugfixes
that do hit people are not getting to those kernels, which force the
distros to do extra work to triage a bug, dig through upstream kernels,
find and apply the patch.

By identifying the patches that should have been marked for stable,
based on the ways that the changelog text is written and the logic in
the patch itself, we circumvent that extra annoyance of users hitting
problems and complaining, or ignoring them and hoping they go away if
they reboot.

I've been doing this "by hand" for many years now, with no complaints so
far.  Sasha has taken it to the next level as I don't scale and has
started to automate it using some really nice tools.  That's all, this
isn't crazy new features being backported, it's just patches that are
obviously fixes being added to the stable tree.

Yes, sometimes those fixes need additional fixes, and that's fine,
normal stable-marked patches need that all the time.  I don't see anyone
complaining about that, right?

So nothing "new" is happening here, EXCEPT we are actually starting to
get a better kernel-wide coverage for stable fixes, which we have not
had in the past.  That's a good thing!  The number of patches applied to
stable is still a very very very tiny % compared to mainline, so nothing
new is happening here.

Oh, and if you do want to complain about huge new features being
backported, look at the mess that Spectre and Meltdown has caused in the
stable trees.  I don't see anyone complaining about those massive
changes :)

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH
On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> On Mon, 16 Apr 2018, Sasha Levin wrote:
> 
> > I agree that as an enterprise distro taking everything from -stable
> > isn't the best idea. Ideally you'd want to be close to the first
> > extreme you've mentioned and only take commits if customers are asking
> > you to do so.
> > 
> > I think that the rule we're trying to agree upon is the "It must fix
> > a real bug that bothers people".
> > 
> > I think that we can agree that it's impossible to expect every single
> > Linux user to go on LKML and complain about a bug he encountered, so the
> > rule quickly becomes "It must fix a real bug that can bother people".
> 
> So is there a reason why stable couldn't become some hybrid-form union of
> 
> - really critical issues (data corruption, boot issues, severe security 
>   issues) taken from bleeding edge upstream
> - [reviewed] cherry-picks of functional fixes from major distro kernels 
>   (based on that very -stable release), as that's apparently what people 
>   are hitting in the real world with that particular kernel

It already is that :)

The problem Sasha is trying to solve here is that for many subsystems,
maintainers do not mark patches for stable at all.  So real bugfixes
that do hit people are not getting to those kernels, which force the
distros to do extra work to triage a bug, dig through upstream kernels,
find and apply the patch.

By identifying the patches that should have been marked for stable,
based on the ways that the changelog text is written and the logic in
the patch itself, we circumvent that extra annoyance of users hitting
problems and complaining, or ignoring them and hoping they go away if
they reboot.

I've been doing this "by hand" for many years now, with no complaints so
far.  Sasha has taken it to the next level as I don't scale and has
started to automate it using some really nice tools.  That's all, this
isn't crazy new features being backported, it's just patches that are
obviously fixes being added to the stable tree.

Yes, sometimes those fixes need additional fixes, and that's fine,
normal stable-marked patches need that all the time.  I don't see anyone
complaining about that, right?

So nothing "new" is happening here, EXCEPT we are actually starting to
get a better kernel-wide coverage for stable fixes, which we have not
had in the past.  That's a good thing!  The number of patches applied to
stable is still a very very very tiny % compared to mainline, so nothing
new is happening here.

Oh, and if you do want to complain about huge new features being
backported, look at the mess that Spectre and Meltdown has caused in the
stable trees.  I don't see anyone complaining about those massive
changes :)

thanks,

greg k-h


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Jiri Kosina
On Mon, 16 Apr 2018, Sasha Levin wrote:

> I agree that as an enterprise distro taking everything from -stable
> isn't the best idea. Ideally you'd want to be close to the first
> extreme you've mentioned and only take commits if customers are asking
> you to do so.
> 
> I think that the rule we're trying to agree upon is the "It must fix
> a real bug that bothers people".
> 
> I think that we can agree that it's impossible to expect every single
> Linux user to go on LKML and complain about a bug he encountered, so the
> rule quickly becomes "It must fix a real bug that can bother people".

So is there a reason why stable couldn't become some hybrid-form union of

- really critical issues (data corruption, boot issues, severe security 
  issues) taken from bleeding edge upstream
- [reviewed] cherry-picks of functional fixes from major distro kernels 
  (based on that very -stable release), as that's apparently what people 
  are hitting in the real world with that particular kernel

?

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Jiri Kosina
On Mon, 16 Apr 2018, Sasha Levin wrote:

> I agree that as an enterprise distro taking everything from -stable
> isn't the best idea. Ideally you'd want to be close to the first
> extreme you've mentioned and only take commits if customers are asking
> you to do so.
> 
> I think that the rule we're trying to agree upon is the "It must fix
> a real bug that bothers people".
> 
> I think that we can agree that it's impossible to expect every single
> Linux user to go on LKML and complain about a bug he encountered, so the
> rule quickly becomes "It must fix a real bug that can bother people".

So is there a reason why stable couldn't become some hybrid-form union of

- really critical issues (data corruption, boot issues, severe security 
  issues) taken from bleeding edge upstream
- [reviewed] cherry-picks of functional fixes from major distro kernels 
  (based on that very -stable release), as that's apparently what people 
  are hitting in the real world with that particular kernel

?

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Steven Rostedt
On Mon, 16 Apr 2018 13:17:24 -0700
Linus Torvalds  wrote:

> On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt  wrote:
> >
> > But this is going way off topic to what we were discussing. The
> > discussion is about what gets backported. Is automating the process
> > going to make stable better? Or is it likely to add more regressions.
> >
> > Sasha's response has been that his automated process has the same rate
> > of regressions as what gets tagged by authors. My argument is that
> > perhaps authors should tag less to stable.  
> 
> The ones who should matter most for that discussion is the distros,
> since they are the actual users of stable (as well as the people doing
> the work, of course - ie Sasha and Greg and the rest of the stable
> gang).

That was actually my final conclusion before we started out
discussion ;-)

http://lkml.kernel.org/r/20180416143510.79ba5...@gandalf.local.home

> 
> And I suspect that they actually do want all the noise, and all the
> stuff that isn't "critical". That's often the _easy_ choice. It's the
> stuff that I suspect the stable maintainers go "this I don't even have
> to think about", because it's a new driver ID or something.

Although Red Hat doesn't base off of the stable kernel. At least it
didn't when I was there. They may look at the stable kernel, but they
make their own decisions.

If we want the distros to use stable as the base, it should be the
least common factor among them. Otherwise, if stable includes commits
that a distro would rather not backport, then they wont use stable.

> 
> Because the bulk of stable tends to be driver updates, afaik. Which
> distros very much tend to want.
> 
> Will developers think that their patches matter so much that they
> should go to stable? Yes they will. Will they overtag as a result?
> Probably. But the reverse likely also happens, where people simply
> don't think about stable at all, and just want to fix a bug.
> 
> In many ways "Fixes" is likely a better thing to check for in stable
> backports, but that doesn't always exist either.
> 
> And just judging by the amount of stable email I get - and by how
> excited _I_ would be about stable work, I think "automated process" is
> simply not an option. It's a _requirement_. You'd go completely crazy
> if you didn't automate 99% of all the stable work.
> 
> So can you trust the "Cc: stable" as being perfect? Hell no. But
> what's your alternative? Manually selecting things for stable? Asking
> the developers separately?
> 
> Because "criticality" definitely isn't what determines it. If it was,
> we'd never add driver ID's etc to stable - they're clearly not
> "critical".

True. But I believe the driver ID's was given the "exception".


> 
> Yet it feels like that's sometimes those driver things are the _bulk_
> of it, and it is usually fairly safe (not quite as obviously safe as
> you'd think, because a driver ID addition has occasionally meant not
> just "now it's supported", but instead "now the generic driver doesn't
> trigger for it any more", so it can actually break things).
> 
> So I think - and _hope_ - that 99% of stable should be the
> non-critical stuff that people don't even need to think about.
> 
> The critical stuff is hopefully a tiny tiny percentage.

Well, I'm not sure that's really the case.

$ git log --oneline v4.14.33..v4.14.34 | head -20
ffebeb0d7c37 Linux 4.14.34
fdae5b620566 net/mlx4_core: Fix memory leak while delete slave's resources
9fdeb33e1913 vhost_net: add missing lock nesting notation
8c316b625705 team: move dev_mc_sync after master_upper_dev_link in team_port_add
233ba28e1862 route: check sysctl_fib_multipath_use_neigh earlier than hash
2f8aa659d4c0 vhost: validate log when IOTLB is enabled
72b880f43990 net/mlx5e: Fix traffic being dropped on VF representor
9408bceb0649 net/mlx4_en: Fix mixed PFC and Global pause user control requests
477c73abf26a strparser: Fix sign of err codes
1c71bfe84deb net/sched: fix NULL dereference on the error path of 
tcf_skbmod_init()
a19024a3f343 net/sched: fix NULL dereference in the error path of 
tunnel_key_init()
e096c8bf4fb8 net/mlx5e: Sync netdev vxlan ports at open
baab1f0c4885 net/mlx5e: Don't override vport admin link state in switchdev mode
1ec7966ab7db ipv6: sr: fix seg6 encap performances with TSO enabled
e52a45bb392f nfp: use full 40 bits of the NSP buffer address
ddf79878f1e0 net/mlx5e: Fix memory usage issues in offloading TC flows
9282181c1cc5 net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh 
update path
b9c6ddda3805 vti6: better validate user provided tunnel names
109dce20c6ed ip6_tunnel: better validate user provided tunnel names
72363c63b070 ip6_gre: better validate user provided tunnel names

The majority of those appear to be on the critical side.

-- Steve


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Steven Rostedt
On Mon, 16 Apr 2018 13:17:24 -0700
Linus Torvalds  wrote:

> On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt  wrote:
> >
> > But this is going way off topic to what we were discussing. The
> > discussion is about what gets backported. Is automating the process
> > going to make stable better? Or is it likely to add more regressions.
> >
> > Sasha's response has been that his automated process has the same rate
> > of regressions as what gets tagged by authors. My argument is that
> > perhaps authors should tag less to stable.  
> 
> The ones who should matter most for that discussion is the distros,
> since they are the actual users of stable (as well as the people doing
> the work, of course - ie Sasha and Greg and the rest of the stable
> gang).

That was actually my final conclusion before we started out
discussion ;-)

http://lkml.kernel.org/r/20180416143510.79ba5...@gandalf.local.home

> 
> And I suspect that they actually do want all the noise, and all the
> stuff that isn't "critical". That's often the _easy_ choice. It's the
> stuff that I suspect the stable maintainers go "this I don't even have
> to think about", because it's a new driver ID or something.

Although Red Hat doesn't base off of the stable kernel. At least it
didn't when I was there. They may look at the stable kernel, but they
make their own decisions.

If we want the distros to use stable as the base, it should be the
least common factor among them. Otherwise, if stable includes commits
that a distro would rather not backport, then they wont use stable.

> 
> Because the bulk of stable tends to be driver updates, afaik. Which
> distros very much tend to want.
> 
> Will developers think that their patches matter so much that they
> should go to stable? Yes they will. Will they overtag as a result?
> Probably. But the reverse likely also happens, where people simply
> don't think about stable at all, and just want to fix a bug.
> 
> In many ways "Fixes" is likely a better thing to check for in stable
> backports, but that doesn't always exist either.
> 
> And just judging by the amount of stable email I get - and by how
> excited _I_ would be about stable work, I think "automated process" is
> simply not an option. It's a _requirement_. You'd go completely crazy
> if you didn't automate 99% of all the stable work.
> 
> So can you trust the "Cc: stable" as being perfect? Hell no. But
> what's your alternative? Manually selecting things for stable? Asking
> the developers separately?
> 
> Because "criticality" definitely isn't what determines it. If it was,
> we'd never add driver ID's etc to stable - they're clearly not
> "critical".

True. But I believe the driver ID's was given the "exception".


> 
> Yet it feels like that's sometimes those driver things are the _bulk_
> of it, and it is usually fairly safe (not quite as obviously safe as
> you'd think, because a driver ID addition has occasionally meant not
> just "now it's supported", but instead "now the generic driver doesn't
> trigger for it any more", so it can actually break things).
> 
> So I think - and _hope_ - that 99% of stable should be the
> non-critical stuff that people don't even need to think about.
> 
> The critical stuff is hopefully a tiny tiny percentage.

Well, I'm not sure that's really the case.

$ git log --oneline v4.14.33..v4.14.34 | head -20
ffebeb0d7c37 Linux 4.14.34
fdae5b620566 net/mlx4_core: Fix memory leak while delete slave's resources
9fdeb33e1913 vhost_net: add missing lock nesting notation
8c316b625705 team: move dev_mc_sync after master_upper_dev_link in team_port_add
233ba28e1862 route: check sysctl_fib_multipath_use_neigh earlier than hash
2f8aa659d4c0 vhost: validate log when IOTLB is enabled
72b880f43990 net/mlx5e: Fix traffic being dropped on VF representor
9408bceb0649 net/mlx4_en: Fix mixed PFC and Global pause user control requests
477c73abf26a strparser: Fix sign of err codes
1c71bfe84deb net/sched: fix NULL dereference on the error path of 
tcf_skbmod_init()
a19024a3f343 net/sched: fix NULL dereference in the error path of 
tunnel_key_init()
e096c8bf4fb8 net/mlx5e: Sync netdev vxlan ports at open
baab1f0c4885 net/mlx5e: Don't override vport admin link state in switchdev mode
1ec7966ab7db ipv6: sr: fix seg6 encap performances with TSO enabled
e52a45bb392f nfp: use full 40 bits of the NSP buffer address
ddf79878f1e0 net/mlx5e: Fix memory usage issues in offloading TC flows
9282181c1cc5 net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh 
update path
b9c6ddda3805 vti6: better validate user provided tunnel names
109dce20c6ed ip6_tunnel: better validate user provided tunnel names
72363c63b070 ip6_gre: better validate user provided tunnel names

The majority of those appear to be on the critical side.

-- Steve


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Sasha Levin
On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
>On Mon, 16 Apr 2018, Sasha Levin wrote:
>
>> So I think that Linus's claim that users come first applies here as
>> well. If there's a user that cares about a particular feature being
>> broken, then we go ahead and fix his bug rather then ignoring him.
>
>So one extreme is fixing -stable *iff* users actually do report an issue.
>
>The other extreme is backporting everything that potentially looks like a
>potential fix of "something" (according to some arbitrary metric),
>pro-actively.
>
>The former voilates the "users first" rule, the latter has a very, very
>high risk of regressions.
>
>So this whole debate is about finding a compromise.
>
>My gut feeling always was that the statement in
>
>   Documentation/process/stable-kernel-rules.rst
>
>is very reasonable, but making the process way more "aggresive" when
>backporting patches is breaking much of its original spirit for me.

I agree that as an enterprise distro taking everything from -stable
isn't the best idea. Ideally you'd want to be close to the first
extreme you've mentioned and only take commits if customers are asking
you to do so.

I think that the rule we're trying to agree upon is the "It must fix
a real bug that bothers people".

I think that we can agree that it's impossible to expect every single
Linux user to go on LKML and complain about a bug he encountered, so the
rule quickly becomes "It must fix a real bug that can bother people".

My "aggressiveness" comes from the whole "bother" part: it doesn't have
to be critical, it doesn't have to cause data corruption, it doesn't
have to be a security issue. It's enough that the bug actually affects a
user in a way he didn't expect it to (if a user doesn't have
expectations, it would fall under the "This could be a problem..."
exception.

We can go into a discussion about what exactly "bothering" is, but on
the flip side, the whole -stable tag is just a way for folks to indicate
they want a given patch reviewed for stable, it's not actually a
guarantee of whether the patch will go in to -stable or not.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Sasha Levin
On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
>On Mon, 16 Apr 2018, Sasha Levin wrote:
>
>> So I think that Linus's claim that users come first applies here as
>> well. If there's a user that cares about a particular feature being
>> broken, then we go ahead and fix his bug rather then ignoring him.
>
>So one extreme is fixing -stable *iff* users actually do report an issue.
>
>The other extreme is backporting everything that potentially looks like a
>potential fix of "something" (according to some arbitrary metric),
>pro-actively.
>
>The former voilates the "users first" rule, the latter has a very, very
>high risk of regressions.
>
>So this whole debate is about finding a compromise.
>
>My gut feeling always was that the statement in
>
>   Documentation/process/stable-kernel-rules.rst
>
>is very reasonable, but making the process way more "aggresive" when
>backporting patches is breaking much of its original spirit for me.

I agree that as an enterprise distro taking everything from -stable
isn't the best idea. Ideally you'd want to be close to the first
extreme you've mentioned and only take commits if customers are asking
you to do so.

I think that the rule we're trying to agree upon is the "It must fix
a real bug that bothers people".

I think that we can agree that it's impossible to expect every single
Linux user to go on LKML and complain about a bug he encountered, so the
rule quickly becomes "It must fix a real bug that can bother people".

My "aggressiveness" comes from the whole "bother" part: it doesn't have
to be critical, it doesn't have to cause data corruption, it doesn't
have to be a security issue. It's enough that the bug actually affects a
user in a way he didn't expect it to (if a user doesn't have
expectations, it would fall under the "This could be a problem..."
exception.

We can go into a discussion about what exactly "bothering" is, but on
the flip side, the whole -stable tag is just a way for folks to indicate
they want a given patch reviewed for stable, it's not actually a
guarantee of whether the patch will go in to -stable or not.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Jiri Kosina
On Mon, 16 Apr 2018, Sasha Levin wrote:

> So if a user is operating a nuclear power plant, and has 2 leds: green 
> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and 
> once in a blue moon a race condition is causing the red one to go on and 
> cause panic in the little province he lives in, we should tell that user 
> to fuck off?
> 
> LEDs may not be critical for you, but they can be critical for someone
> else. Think of all the different users we have and the wildly different
> ways they use the kernel.

I am pretty sure that for almost every fix there is a person on a planet 
that'd rate it "critical". We can't really use this as an argument for 
inclusion of code into -stable, as that'd mean that -stable and Linus' 
tree would have to be basically the same.

-- 
Jiri Kosina
SUSE Labs


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-16 Thread Jiri Kosina
On Mon, 16 Apr 2018, Sasha Levin wrote:

> So if a user is operating a nuclear power plant, and has 2 leds: green 
> one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and 
> once in a blue moon a race condition is causing the red one to go on and 
> cause panic in the little province he lives in, we should tell that user 
> to fuck off?
> 
> LEDs may not be critical for you, but they can be critical for someone
> else. Think of all the different users we have and the wildly different
> ways they use the kernel.

I am pretty sure that for almost every fix there is a person on a planet 
that'd rate it "critical". We can't really use this as an argument for 
inclusion of code into -stable, as that'd mean that -stable and Linus' 
tree would have to be basically the same.

-- 
Jiri Kosina
SUSE Labs


  1   2   3   >