Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin

On Thu, May 03, 2018 at 11:32:15AM +0200, Pavel Machek wrote:
>Hi!
>
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>
>Yes, I noticed. And what I'm saying is that perhaps you should follow
>the rules more strictly.

Again, this was stated many times by Greg and others, the rules are not
there to be strictly followed.

>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously correct, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>Actually, I was surprised with PTI going to stable. That was clearly
>against the rules. Maybe the security bug was ugly enough to warrant
>that.
>
>But please don't use it as an argument for applying any random
>patches...

How about this: if a -stable maintainer has concerns with how I follow
the -stable rules, he's more than welcome to reject my patches. Sounds
like a plan?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin

On Thu, May 03, 2018 at 11:36:51AM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> >> We may be able to guesstimate the 'regression chance', but there's no
>> >> way we can guess the 'annoyance' one. There are so many different use
>> >> cases that we just can't even guess how many people would get "annoyed"
>> >> by something.
>> >
>> >As a maintainer, I hope I have reasonable idea what are common use cases
>> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>> >know all of the use cases so people doing unusual stuff hit more bugs and
>> >have to report them to get fixes included in -stable. But for me this is a
>> >preferable tradeoff over the risk of regression so this is the rule I use
>> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>> >with "those who do the work decide" principle so pick whatever patches you
>> >think are appropriate, I just wanted explain why I don't think more patches
>> >in stable are necessarily good.
>>
>> The AUTOSEL story is different for subsystems that don't do -stable, and
>> subsystems that are actually doing the work (like yourself).
>>
>> I'm not trying to override active maintainers, I'm trying to help them
>> make decisions.
>
>Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
>stuff from autosel work?

Curiosity got me, and I had to see what these subsystems do as far as
stable commits:

$ git log --oneline --grep 'stable@vger' --since="01-01-2016" kernel/power \
    drivers/leds drivers/media/i2c/et8ek8 drivers/media/i2c/ad5820.c \
    arch/x86/kernel/acpi/ | wc -l
7

Which got me a bit surprised: maybe indeed leds is mostly fine, but
hibernation is definitely tricky, I've been stung by it a few times...

So why not pick something an actual user reported, and see how that was
dealt with?

Googling first showed this:

https://bugzilla.kernel.org/show_bug.cgi?id=97201

Which was fixed by:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdbc98abb3aa323f6323b11db39c740e6f8fc5b1

But that's not in any -stable tree. Hmm.. ok..

Next one on google was:

https://bugzilla.kernel.org/show_bug.cgi?id=117971

Which, in turn, was fixed by:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b3f249c94ce1f46bacd9814385b0ee2d1ae52f3

Oh look at that, it's not in -stable either...

So seeing how you have concerns with my selection of -stable commits,
maybe you could explain to me why these commits didn't end up in
-stable?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin

On Thu, May 03, 2018 at 11:47:24AM +0200, Pavel Machek wrote:
>On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
>> >On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> >> So I think that Linus's claim that users come first applies here as
>> >> well. If there's a user that cares about a particular feature being
>> >> broken, then we go ahead and fix his bug rather than ignoring him.
>> >
>> >So one extreme is fixing -stable *iff* users actually do report an issue.
>> >
>> >The other extreme is backporting everything that potentially looks like a
>> >potential fix of "something" (according to some arbitrary metric),
>> >pro-actively.
>> >
>> >The former violates the "users first" rule, the latter has a very, very
>> >high risk of regressions.
>> >
>> >So this whole debate is about finding a compromise.
>> >
>> >My gut feeling always was that the statement in
>> >
>> >Documentation/process/stable-kernel-rules.rst
>> >
>> >is very reasonable, but making the process way more "aggressive" when
>> >backporting patches is breaking much of its original spirit for me.
>>
>> I agree that as an enterprise distro taking everything from -stable
>> isn't the best idea. Ideally you'd want to be close to the first
>
>Original purpose of -stable was "to be common base of enterprise
>distros" and our documentation still says it is.

I guess that the world changes?

At this point calling enterprise distros a niche wouldn't be too far
from the truth. Furthermore, some enterprise distros (as stated
earlier in this thread) don't even follow -stable anymore and cherry
pick their own commits.

So no, the main driving force behind -stable is not traditional
enterprise distributions.

>> I think that we can agree that it's impossible to expect every single
>> Linux user to go on LKML and complain about a bug he encountered, so the
>> rule quickly becomes "It must fix a real bug that can bother
>> people".
>
>I think you are playing dangerous word games.
>
>> My "aggressiveness" comes from the whole "bother" part: it doesn't have
>> to be critical, it doesn't have to cause data corruption, it doesn't
>> have to be a security issue. It's enough that the bug actually affects a
>> user in a way he didn't expect it to (if a user doesn't have
>> expectations, it would fall under the "This could be a problem..."
>> exception).
>
>And it seems documentation says you should be less aggressive and
>world tells you they expect to be less aggressive. So maybe that's
>what you should do?

Who is this "world" you're referring to?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Sasha Levin

On Thu, May 03, 2018 at 12:04:41PM +0200, Pavel Machek wrote:
>On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>> >On Tue, 17 Apr 2018, Sasha Levin wrote:
>> >
>> >> How do I get the XFS folks to send their stuff to -stable? (we have
>> >> quite a few customers who use XFS)
>> >
>> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>> >maintainers to deal with stable, we just have to accept that and find an
>> >answer to that.
>>
>> This is exactly what I'm doing. Many subsystems don't have enough
>> manpower to deal with -stable, so I'm trying to help.
>
>...and the torrent of spams from the AUTOSEL subsystem actually makes
>that worse.
>
>And when you are told particular fix to LEDs is not that important
>after all, you start arguing about nuclear power plants (without
>really knowing how critical subsystems work).

Obviously your knowledge far surpasses mine.

>If you want cooperation with maintainers to work, the rules need to be
>clear, first. They are documented, so follow them. If you think rules
>are wrong, let's talk about changing the rules; but arguing "every bug
>is important because someone may be hitting it" is not ok.

I'm sorry but you're just unfamiliar with the process. I'd point out
that all my AUTOSEL commits go through Greg, who wrote the rules, and
accepts my patches.

The rules are there as a guideline to allow us to not take certain
patches, they're not there as a strict set of rules we must follow at
all times.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek

On Tue 2018-04-17 16:06:29, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
> >On Tue, 17 Apr 2018, Sasha Levin wrote:
> >
> >> How do I get the XFS folks to send their stuff to -stable? (we have
> >> quite a few customers who use XFS)
> >
> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
> >maintainers to deal with stable, we just have to accept that and find an
> >answer to that.
> 
> This is exactly what I'm doing. Many subsystems don't have enough
> manpower to deal with -stable, so I'm trying to help.

...and the torrent of spams from the AUTOSEL subsystem actually makes
that worse.

And when you are told particular fix to LEDs is not that important
after all, you start arguing about nuclear power plants (without
really knowing how critical subsystems work).

If you want cooperation with maintainers to work, the rules need to be
clear, first. They are documented, so follow them. If you think rules
are wrong, let's talk about changing the rules; but arguing "every bug
is important because someone may be hitting it" is not ok.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek

On Mon 2018-04-16 21:18:47, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote:
> >On Mon, 16 Apr 2018, Sasha Levin wrote:
> >
> >> So I think that Linus's claim that users come first applies here as
> >> well. If there's a user that cares about a particular feature being
> >> broken, then we go ahead and fix his bug rather than ignoring him.
> >
> >So one extreme is fixing -stable *iff* users actually do report an issue.
> >
> >The other extreme is backporting everything that potentially looks like a
> >potential fix of "something" (according to some arbitrary metric),
> >pro-actively.
> >
> >The former violates the "users first" rule, the latter has a very, very
> >high risk of regressions.
> >
> >So this whole debate is about finding a compromise.
> >
> >My gut feeling always was that the statement in
> >
> > Documentation/process/stable-kernel-rules.rst
> >
> >is very reasonable, but making the process way more "aggressive" when
> >backporting patches is breaking much of its original spirit for me.
> 
> I agree that as an enterprise distro taking everything from -stable
> isn't the best idea. Ideally you'd want to be close to the first

Original purpose of -stable was "to be common base of enterprise
distros" and our documentation still says it is.

> I think that we can agree that it's impossible to expect every single
> Linux user to go on LKML and complain about a bug he encountered, so the
> rule quickly becomes "It must fix a real bug that can bother
> people".

I think you are playing dangerous word games.

> My "aggressiveness" comes from the whole "bother" part: it doesn't have
> to be critical, it doesn't have to cause data corruption, it doesn't
> have to be a security issue. It's enough that the bug actually affects a
> user in a way he didn't expect it to (if a user doesn't have
> expectations, it would fall under the "This could be a problem..."
> exception).

And it seems documentation says you should be less aggressive and
world tells you they expect to be less aggressive. So maybe that's
what you should do?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek

On Tue 2018-04-17 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> >> We may be able to guesstimate the 'regression chance', but there's no
> >> way we can guess the 'annoyance' one. There are so many different use
> >> cases that we just can't even guess how many people would get "annoyed"
> >> by something.
> >
> >As a maintainer, I hope I have reasonable idea what are common use cases
> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
> >know all of the use cases so people doing unusual stuff hit more bugs and
> >have to report them to get fixes included in -stable. But for me this is a
> >preferable tradeoff over the risk of regression so this is the rule I use
> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree
> >with "those who do the work decide" principle so pick whatever patches you
> >think are appropriate, I just wanted explain why I don't think more patches
> >in stable are necessarily good.
> 
> The AUTOSEL story is different for subsystems that don't do -stable, and
> subsystems that are actually doing the work (like yourself).
> 
> I'm not trying to override active maintainers, I'm trying to help them
> make decisions.

Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900
stuff from autosel work?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-05-03 Thread Pavel Machek

Hi!

> >- It must be obviously correct and tested.
> >
> >If it introduces new bug, it is not correct, and certainly not
> >obviously correct.
> 
> As you might have noticed, we don't strictly follow the rules.

Yes, I noticed. And what I'm saying is that perhaps you should follow
the rules more strictly.

> Take a look at the whole PTI story as an example. It's way more than 100
> lines, it's not obviously correct, it fixed more than 1 thing, and so
> on, and yet it went in -stable!
> 
> Would you argue we shouldn't have backported PTI to -stable?

Actually, I was surprised with PTI going to stable. That was clearly
against the rules. Maybe the security bug was ugly enough to warrant
that.

But please don't use it as an argument for applying any random
patches...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH

On Thu, Apr 19, 2018 at 04:22:22PM +0200, Greg KH wrote:
> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin:
> > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > Sasha Levin  wrote:
> > > > > > 
> > > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > 
> > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > broken as mainline?
> > > > > 
> > > > > This just means that if there is a fix that went in mainline, and the
> > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > 
> > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > what. We'd rather have the same thing broken between mainline and
> > > > > stable.
> > > > > 
> > > > 
> > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > > _is_ a _regression_ you _really_ _don't_ want in a stable
> > > > supported distro. Because end-users don't care about upstream breaking
> > > > stuff... it's the distro that takes the heat for that...
> > > > 
> > > > Something "already broken" is not a regression...
> > > > 
> > > > As distro maintainer that means one now have to review _every_ patch 
> > > > that
> > > > carries "AUTOSEL", follow all the mail threads that comes up about it, 
> > > > then
> > > > track if it landed in -stable queue, and read every response and 
> > > > possible
> > > > objection to all patches in the -stable queue a second time around... 
> > > > then
> > > > check if it still got included in final stable point relase and then 
> > > > either
> > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > needed...
> > > > 
> > > > Just to avoid being "bug compatible with master"
> > > 
> > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > has in the past, so you had better also be reviewing all of my normal
> > > commits as well :)
> > > 
> > > Anyway, we are trying not to do this, but it does, and will,
> > > occasionally happen.
> > 
> > Sure, that's understood. So this was just misunderstanding. Sasha's
> > original comment really sounded like "bug compatibility" with current
> > master is desirable and that made me go "Are you serious?" as well...
> 
> As I said before in this thread, yes, sometimes I do this on purpose.
> 
> As a specific example, see a recent bluetooth patch that caused a
> regression on some chromebook devices.  The chromeos developers
> rightfully complained, and I left the commit in there to provide the
> needed "leverage" on the upstream developers to fix this properly.
> Otherwise if I had reverted the stable patch, when people move to a
> newer kernel version, things break, and no one remembers why.
> 
> I also wrote a long response as to _why_ I do this, and even though it
> does happen, why it still is worth taking the stable updates.  Please
> see the archives for the full details.  I don't want to duplicate this
> again here.

And to be more specific, let's always take this on a case-by-case basis.
The last time this happened was the bluetooth bug and it was a fix for a
reported problem, but then the fix caused a regression so upstream
reverted it and I reverted it in the stable trees.  No matter what I
chose to do, someone would be upset so I followed what upstream did.

thanks,

greg k-h

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund


On 19.04.2018 at 18:57, Greg KH wrote:
> On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:
> > On 19.04.2018 at 17:22, Greg KH wrote:
> > > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > > > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > > > On 16-04-2018 at 19:19, Sasha Levin wrote:
> > > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > > > Sasha Levin  wrote:
> > > > > > > >
> > > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > > >
> > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > > > broken as mainline?
> > > > > > >
> > > > > > > This just means that if there is a fix that went in mainline, and the
> > > > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > > >
> > > > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > > > what. We'd rather have the same thing broken between mainline and
> > > > > > > stable.
> > > > > >
> > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > > > > _is_ a _regression_ you _really_ _don't_ want in a stable
> > > > > > supported distro. Because end-users don't care about upstream breaking
> > > > > > stuff... it's the distro that takes the heat for that...
> > > > > >
> > > > > > Something "already broken" is not a regression...
> > > > > >
> > > > > > As distro maintainer that means one now has to review _every_ patch that
> > > > > > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > > > > > track if it landed in -stable queue, and read every response and possible
> > > > > > objection to all patches in the -stable queue a second time around... then
> > > > > > check if it still got included in final stable point release and then either
> > > > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > > > needed...
> > > > > >
> > > > > > Just to avoid being "bug compatible with master"
> > > > >
> > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > > > has in the past, so you had better also be reviewing all of my normal
> > > > > commits as well :)
> > > > >
> > > > > Anyway, we are trying not to do this, but it does, and will,
> > > > > occasionally happen.
> > > >
> > > > Sure, that's understood. So this was just misunderstanding. Sasha's
> > > > original comment really sounded like "bug compatibility" with current
> > > > master is desirable and that made me go "Are you serious?" as well...
> > >
> > > As I said before in this thread, yes, sometimes I do this on purpose.
> >
> > And I guess this is the one that gets people the feeling that
> > "stable is not as stable as it used to be" ...
>
> It's always been this way, it's just that no one noticed :)

:)

> > > As a specific example, see a recent bluetooth patch that caused a
> > > regression on some chromebook devices.  The chromeos developers
> > > rightfully complained, and I left the commit in there to provide the
> > > needed "leverage" on the upstream developers to fix this properly.
> > > Otherwise if I had reverted the stable patch, when people move to a
> > > newer kernel version, things break, and no one remembers why.
> >
> > I do understand what you are trying to do...
> >
> > But from my distro hat on I have to treat things differently (and I don't
> > think I'm alone doing it this way...)
> >
> > Known breakages get reverted even before they hit QA, so endusers won't see
> > the issue at all...
> >
> > So the only ones to see the issue are those building with latest upstream
> > without own patches applied...
> >
> > > I also wrote a long response as to _why_ I do this, and even though it
> > > does happen, why it still is worth taking the stable updates.  Please
> > > see the archives for the full details.  I don't want to duplicate this
> > > again here.
> >
> > And we do use latest stable (with some delay as I don't want to overload QA &
> > endusers with a new kernel every week :))
>
> You need to automate your QA :)

Yeah, some can be automated... but that means having a lot of different
hw to test on... emulators/vms can only test so much...

Users who are part of QA test on a variety of hw with various installs/setups,
which exposes fun things with some hw :)

> > We just revert known broken (or add known fixes) before releasing them to
> > our users
>
> That's great, and is what you should be doing, nothing wrong there.
>
> thanks,
>
> greg k-h

--
Thomas

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund


On 19.04.2018 at 18:09, Sasha Levin wrote:
> On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:
> > On 19.04.2018 at 16:59, Greg KH wrote:
> > > Anyway, we are trying not to do this, but it does, and will,
> > > occasionally happen.  Look, we just did that for one platform for
> > > 4.9.94!  And the key to all of this is good testing, which we are now
> > > doing, and hopefully you are also doing as well.
> >
> > Yeah, but having to test stuff with known breakages is no fun, so we
> > try to avoid that
>
> Known breakages are easier to deal with than unknown ones :)

Well, if a system worked before the update, but not after...
Guess which one we want...

> I think that that "bug compatibility" is basically a policy on *which*
> regressions you'll see vs *if* you'll see a regression.

No. Intentionally breaking known working code in a stable branch is
never ok.

As I said before... something that never worked is not a regression,
but breaking a previously working setup is...

That goes for security fixes too... there is not much point in a
security fix, if it basically turns into a local DoS when the system
stops working...

People will just boot older code and there you have it...

> We'll never pull in a commit that introduces a bug but doesn't fix
> another one, right? So if you have to deal with a regression anyway,
> might as well deal with a regression that is also seen on mainline, so
> that when you upgrade your stable kernel you'll keep the same set of
> regressions to deal with.

Here I actually like the comment Linus posted about API breakage earlier
in this thread...

    If you break user workflows, NOTHING ELSE MATTERS.

    Even security is secondary to "people don't use the end result,
    because it doesn't work for them any more".

_This_ same statement should be acknowledged / enforced in stable trees
too IMHO...

Because this is what will happen...

Simple logic... if it does not work, the enduser will boot an earlier
kernel... missing "all the good fixes" (including security ones) just
because one fix is bad.

For example in this AUTOSEL round there are 161 fixes, of which the enduser
never gets the 160 "supposedly good ones" when one is "bad"...

How is that a "good thing"?

And trying to tell those that get hit "this will force upstream to fix
it faster, so you get a working setup in some days/weeks/months..." is
not going to work...

Heh, this even reminds me that this is just as annoying as when MS
started to "bundle monthly security updates" and you get 95% installed
just to realize that the last 5% does not work (or install at all) and
you have to roll back to something working, thus missing the needed
security fixes...

Same flawed logic...

Thankfully we as distro maintainers can avoid some of the breakage for
our endusers...

--
Thomas

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH

On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote:
> On 19.04.2018 at 17:22, Greg KH wrote:
> > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > > On 16-04-2018 at 19:19, Sasha Levin wrote:
> > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > > Sasha Levin  wrote:
> > > > > > > 
> > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > > > If the code is broken, it should be broken in the same way.
> > > > > > >
> > > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > > broken as mainline?
> > > > > >
> > > > > > This just means that if there is a fix that went in mainline, and the
> > > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > > > 
> > > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > > what. We'd rather have the same thing broken between mainline and
> > > > > > stable.
> > > > > > 
> > > > > 
> > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > > > _is_ a _regression_ you _really_ _don't_ want in a stable
> > > > > supported distro. Because end-users don't care about upstream breaking
> > > > > stuff... it's the distro that takes the heat for that...
> > > > >
> > > > > Something "already broken" is not a regression...
> > > > >
> > > > > As distro maintainer that means one now has to review _every_ patch that
> > > > > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > > > > track if it landed in -stable queue, and read every response and possible
> > > > > objection to all patches in the -stable queue a second time around... then
> > > > > check if it still got included in final stable point release and then either
> > > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > > needed...
> > > > > 
> > > > > Just to avoid being "bug compatible with master"
> > > > 
> > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > > has in the past, so you had better also be reviewing all of my normal
> > > > commits as well :)
> > > > 
> > > > Anyway, we are trying not to do this, but it does, and will,
> > > > occasionally happen.
> > > 
> > > Sure, that's understood. So this was just misunderstanding. Sasha's
> > > original comment really sounded like "bug compatibility" with current
> > > master is desirable and that made me go "Are you serious?" as well...
> > 
> > As I said before in this thread, yes, sometimes I do this on purpose.
> > 
> 
> And I guess this is the one that gets people the feeling that
> "stable is not as stable as it used to be" ...

It's always been this way, it's just that no one noticed :)

> > As a specific example, see a recent bluetooth patch that caused a
> > regression on some chromebook devices.  The chromeos developers
> > rightfully complained, and I left the commit in there to provide the
> > needed "leverage" on the upstream developers to fix this properly.
> > Otherwise if I had reverted the stable patch, when people move to a
> > newer kernel version, things break, and no one remembers why.
> 
> I do understand what you are trying to do...
> 
> But from my distro hat on I have to treat things differently (and I don't
> think I'm alone doing it this way...)
>
> Known breakages get reverted even before they hit QA, so endusers won't see
> the issue at all...
> 
> So the only ones to see the issue are those building with latest upstream
> without own patches applied...
> 
> > 
> > I also wrote a long response as to _why_ I do this, and even though it
> > does happen, why it still is worth taking the stable updates.  Please
> > see the archives for the full details.  I don't want to duplicate this
> > again here.
> 
> And we do use latest stable (with some delay as I don't want to overload QA &
> endusers with a new kernel every week :))

You need to automate your QA :)

> We just revert known broken (or add known fixes) before releasing them to
> our users

That's great, and is what you should be doing, nothing wrong there.

thanks,

greg k-h

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund


On 19.04.2018 at 17:22, Greg KH wrote:
> On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> > On Thu 19-04-18 15:59:43, Greg KH wrote:
> > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > > On 16-04-2018 at 19:19, Sasha Levin wrote:
> > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > > Sasha Levin  wrote:
> > > > > >
> > > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > > If the code is broken, it should be broken in the same way.
> > > > > >
> > > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > > broken as mainline?
> > > > >
> > > > > This just means that if there is a fix that went in mainline, and the
> > > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > >
> > > > > In this scenario, *something* will be broken, it's just a matter of
> > > > > what. We'd rather have the same thing broken between mainline and
> > > > > stable.
> > > >
> > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > > _is_ a _regression_ you _really_ _don't_ want in a stable
> > > > supported distro. Because end-users don't care about upstream breaking
> > > > stuff... it's the distro that takes the heat for that...
> > > >
> > > > Something "already broken" is not a regression...
> > > >
> > > > As distro maintainer that means one now has to review _every_ patch that
> > > > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > > > track if it landed in -stable queue, and read every response and possible
> > > > objection to all patches in the -stable queue a second time around... then
> > > > check if it still got included in final stable point release and then either
> > > > revert them in distro kernel or go track down all the follow-up fixes
> > > > needed...
> > > >
> > > > Just to avoid being "bug compatible with master"
> > >
> > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > > has in the past, so you had better also be reviewing all of my normal
> > > commits as well :)
> > >
> > > Anyway, we are trying not to do this, but it does, and will,
> > > occasionally happen.
> >
> > Sure, that's understood. So this was just misunderstanding. Sasha's
> > original comment really sounded like "bug compatibility" with current
> > master is desirable and that made me go "Are you serious?" as well...
>
> As I said before in this thread, yes, sometimes I do this on purpose.

And I guess this is the one that gets people the feeling that
"stable is not as stable as it used to be" ...

> As a specific example, see a recent bluetooth patch that caused a
> regression on some chromebook devices.  The chromeos developers
> rightfully complained, and I left the commit in there to provide the
> needed "leverage" on the upstream developers to fix this properly.
> Otherwise if I had reverted the stable patch, when people move to a
> newer kernel version, things break, and no one remembers why.

I do understand what you are trying to do...

But from my distro hat on I have to treat things differently (and I don't
think I'm alone doing it this way...)

Known breakages get reverted even before they hit QA, so endusers won't
see the issue at all...

So the only ones to see the issue are those building with latest
upstream without own patches applied...

> I also wrote a long response as to _why_ I do this, and even though it
> does happen, why it still is worth taking the stable updates.  Please
> see the archives for the full details.  I don't want to duplicate this
> again here.

And we do use latest stable (with some delay as I don't want to overload
QA & endusers with a new kernel every week :))


We just revert known broken (or add known fixes) before releasing them 
to our users


--
Thomas

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Sasha Levin

On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote:
>On 19.04.2018 at 16:59, Greg KH wrote:
>>Anyway, we are trying not to do this, but it does, and will,
>>occasionally happen.  Look, we just did that for one platform for
>>4.9.94!  And the key to all of this is good testing, which we are now
>>doing, and hopefully you are also doing as well.
>
>Yeah, but having to test stuff with known breakages is no fun, so we 
>try to avoid that

Known breakages are easier to deal with than unknown ones :)

I think that "bug compatibility" is basically a policy on *which*
regressions you'll see vs *if* you'll see a regression.

We'll never pull in a commit that introduces a bug but doesn't fix
another one, right? So if you have to deal with a regression anyway,
might as well deal with a regression that is also seen on mainline, so
that when you upgrade your stable kernel you'll keep the same set of
regressions to deal with.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund


On 19.04.2018 at 16:59, Greg KH wrote:
> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > On 16-04-2018 at 19:19, Sasha Levin wrote:
> > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > Sasha Levin  wrote:
> > > >
> > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > If the code is broken, it should be broken in the same way.
> > > >
> > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > broken as mainline?
> > >
> > > This just means that if there is a fix that went in mainline, and the
> > > fix is broken somehow, we'd rather take the broken fix than not.
> > >
> > > In this scenario, *something* will be broken, it's just a matter of
> > > what. We'd rather have the same thing broken between mainline and
> > > stable.
> >
> > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > _is_ a _regression_ you _really_ _don't_ want in a stable
> > supported distro. Because end-users don't care about upstream breaking
> > stuff... it's the distro that takes the heat for that...
> >
> > Something "already broken" is not a regression...
> >
> > As distro maintainer that means one now has to review _every_ patch that
> > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > track if it landed in -stable queue, and read every response and possible
> > objection to all patches in the -stable queue a second time around... then
> > check if it still got included in final stable point release and then either
> > revert them in distro kernel or go track down all the follow-up fixes
> > needed...
> >
> > Just to avoid being "bug compatible with master"
>
> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> has in the past, so you had better also be reviewing all of my normal
> commits as well :)

Yeah, I do... and same goes there ... if there is a known issue, then
same procedure... Either revert, or try to track down fixes...

> Anyway, we are trying not to do this, but it does, and will,
> occasionally happen.  Look, we just did that for one platform for
> 4.9.94!  And the key to all of this is good testing, which we are now
> doing, and hopefully you are also doing as well.

Yeah, but having to test stuff with known breakages is no fun, so we try
to avoid that


--
Thomas

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH

On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote:
> On Thu 19-04-18 15:59:43, Greg KH wrote:
> > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > > On 16-04-2018 at 19:19, Sasha Levin wrote:
> > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > > Sasha Levin  wrote:
> > > > > 
> > > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > > If the code is broken, it should be broken in the same way.
> > > > > 
> > > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > > broken as mainline?
> > > > 
> > > > This just means that if there is a fix that went in mainline, and the
> > > > fix is broken somehow, we'd rather take the broken fix than not.
> > > > 
> > > > In this scenario, *something* will be broken, it's just a matter of
> > > > what. We'd rather have the same thing broken between mainline and
> > > > stable.
> > > > 
> > > 
> > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > > _is_ a _regression_ you _really_ _don't_ want in a stable
> > > supported distro. Because end-users don't care about upstream breaking
> > > stuff... it's the distro that takes the heat for that...
> > > 
> > > Something "already broken" is not a regression...
> > > 
> > > As distro maintainer that means one now has to review _every_ patch that
> > > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > > track if it landed in -stable queue, and read every response and possible
> > > objection to all patches in the -stable queue a second time around... then
> > > check if it still got included in final stable point release and then either
> > > revert them in distro kernel or go track down all the follow-up fixes
> > > needed...
> > > 
> > > Just to avoid being "bug compatible with master"
> > 
> > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> > has in the past, so you had better also be reviewing all of my normal
> > commits as well :)
> > 
> > Anyway, we are trying not to do this, but it does, and will,
> > occasionally happen.
> 
> Sure, that's understood. So this was just a misunderstanding. Sasha's
> original comment really sounded like "bug compatibility" with current
> master is desirable and that made me go "Are you serious?" as well...

As I said before in this thread, yes, sometimes I do this on purpose.

As a specific example, see a recent bluetooth patch that caused a
regression on some chromebook devices.  The chromeos developers
rightfully complained, and I left the commit in there to provide the
needed "leverage" on the upstream developers to fix this properly.
Otherwise if I had reverted the stable patch, when people move to a
newer kernel version, things break, and no one remembers why.

I also wrote a long response as to _why_ I do this, and even though it
does happen, why it still is worth taking the stable updates.  Please
see the archives for the full details.  I don't want to duplicate this
again here.

thanks,

greg k-h

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Jan Kara

On Thu 19-04-18 15:59:43, Greg KH wrote:
> On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> > On 16-04-2018 at 19:19, Sasha Levin wrote:
> > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > > On Mon, 16 Apr 2018 16:02:03 +
> > > > Sasha Levin  wrote:
> > > > 
> > > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > > we want the kernel to behave the same way between mainline and stable.
> > > > > If the code is broken, it should be broken in the same way.
> > > > 
> > > > Wait! What does that mean? What's the purpose of stable if it is as
> > > > broken as mainline?
> > > 
> > > This just means that if there is a fix that went in mainline, and the
> > > fix is broken somehow, we'd rather take the broken fix than not.
> > > 
> > > In this scenario, *something* will be broken, it's just a matter of
> > > what. We'd rather have the same thing broken between mainline and
> > > stable.
> > > 
> > 
> > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> > _is_ a _regression_ you _really_ _don't_ want in a stable
> > supported distro. Because end-users don't care about upstream breaking
> > stuff... it's the distro that takes the heat for that...
> > 
> > Something "already broken" is not a regression...
> > 
> > As distro maintainer that means one now has to review _every_ patch that
> > carries "AUTOSEL", follow all the mail threads that come up about it, then
> > track if it landed in -stable queue, and read every response and possible
> > objection to all patches in the -stable queue a second time around... then
> > check if it still got included in final stable point release and then either
> > revert them in distro kernel or go track down all the follow-up fixes
> > needed...
> > 
> > Just to avoid being "bug compatible with master"
> 
> I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
> has in the past, so you had better also be reviewing all of my normal
> commits as well :)
> 
> Anyway, we are trying not to do this, but it does, and will,
> occasionally happen.

Sure, that's understood. So this was just a misunderstanding. Sasha's
original comment really sounded like "bug compatibility" with current
master is desirable and that made me go "Are you serious?" as well...

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Greg KH

On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote:
> On 16-04-2018 at 19:19, Sasha Levin wrote:
> > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > > On Mon, 16 Apr 2018 16:02:03 +
> > > Sasha Levin  wrote:
> > > 
> > > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > > we want the kernel to behave the same way between mainline and stable.
> > > > If the code is broken, it should be broken in the same way.
> > > 
> > > Wait! What does that mean? What's the purpose of stable if it is as
> > > broken as mainline?
> > 
> > This just means that if there is a fix that went in mainline, and the
> > fix is broken somehow, we'd rather take the broken fix than not.
> > 
> > In this scenario, *something* will be broken, it's just a matter of
> > what. We'd rather have the same thing broken between mainline and
> > stable.
> > 
> 
> Yeah, but _intentionally_ breaking existing setups to stay "bug compatible"
> _is_ a _regression_ you _really_ _don't_ want in a stable
> supported distro. Because end-users don't care about upstream breaking
> stuff... it's the distro that takes the heat for that...
> 
> Something "already broken" is not a regression...
> 
> As distro maintainer that means one now has to review _every_ patch that
> carries "AUTOSEL", follow all the mail threads that come up about it, then
> track if it landed in -stable queue, and read every response and possible
> objection to all patches in the -stable queue a second time around... then
> check if it still got included in final stable point release and then either
> revert them in distro kernel or go track down all the follow-up fixes
> needed...
> 
> Just to avoid being "bug compatible with master"

I've done this "bug compatible" "breakage" more than the AUTOSEL stuff
has in the past, so you had better also be reviewing all of my normal
commits as well :)

Anyway, we are trying not to do this, but it does, and will,
occasionally happen.  Look, we just did that for one platform for
4.9.94!  And the key to all of this is good testing, which we are now
doing, and hopefully you are also doing as well.

thanks,

greg k-h

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-19 Thread Thomas Backlund


On 16-04-2018 at 19:19, Sasha Levin wrote:
> On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote:
> > On Mon, 16 Apr 2018 16:02:03 +
> > Sasha Levin  wrote:
> >
> > > One of the things Greg is pushing strongly for is "bug compatibility":
> > > we want the kernel to behave the same way between mainline and stable.
> > > If the code is broken, it should be broken in the same way.
> >
> > Wait! What does that mean? What's the purpose of stable if it is as
> > broken as mainline?
>
> This just means that if there is a fix that went in mainline, and the
> fix is broken somehow, we'd rather take the broken fix than not.
>
> In this scenario, *something* will be broken, it's just a matter of
> what. We'd rather have the same thing broken between mainline and
> stable.

Yeah, but _intentionally_ breaking existing setups to stay "bug
compatible" _is_ a _regression_ you _really_ _don't_ want in a stable
supported distro. Because end-users don't care about upstream breaking
stuff... it's the distro that takes the heat for that...

Something "already broken" is not a regression...

As distro maintainer that means one now has to review _every_ patch
that carries "AUTOSEL", follow all the mail threads that come up about
it, then track if it landed in -stable queue, and read every response
and possible objection to all patches in the -stable queue a second time
around... then check if it still got included in final stable point
release and then either revert them in distro kernel or go track down all
the follow-up fixes needed...

Just to avoid being "bug compatible with master"

--
Thomas

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-18 Thread Petr Mladek

On Tue 2018-04-17 13:45:59, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
> >Back to the trend. Last week I got autosel mails even for
> >patches that were still being discussed, had issues, and
> >were far from upstream:
> >
> > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> >
> >It might be a good idea if the mail asked to add Fixes: tag
> >or stable mailing list. But the mail suggested to add the
> >unfinished patch into stable branch directly (even before
> >upstreaming?).
> 
> I obviously didn't suggest that this patch will go in -stable before
> it's upstream.
> 
> I've started doing those because some folks can't be arsed to reply to a
> review request for a patch that is months old. I found that if I send
> these mails while the discussion is still going on I'd get a much better
> response rate from people.

I see. It makes sense.

> If you think any of these patches should go in stable there were two
> ways about it:
>
>  - You end up adding the -stable tag yourself, and it would follow the
>    usual route where Greg picks it up.
>  - You reply to that mail, and the patch would wait in a list until my
>    script notices it made it upstream, at which point it would get
>    queued for stable.

It would be great if the options were described in the mail.

I wonder if it would make sense to also add a tag that would
say that the commit is not suitable for stable. It might
help both sides. The maintainers will be able to share
their opinion and eventually reduce mails from autosel.
You would get feedback that maintainers considered
the patch for stable. It might be even useful for
teaching the AI.

> >Now, there are only a handful of printk patches in each
> >release, so it is still doable. I just do not understand
> >how other maintainers, from much more busy subsystems,
> >could cope with this trend.
> 
> So yes, I'm aware that the volume of patches is huge, but there's not
> much I can do about it because it's just a subset of the kernel's patch
> volume and since the kernel gets more and more patches each release, the
> volume of stable commits is bound to grow as well.

Yes, but the growth in stable is much faster than the growth
in mainline at the moment. It might be fine if it was caused
just by engaging subsystems that ignored stable so far. But
I am not sure if it is the case. Also I am not sure about
your plans.

Anyway, I am surprised that the patches might go into stable
so easily (no response -> accepted). While it is pretty
hard to get through the review process for mainline.

Of course, many patches go into mainline without review
as well. But the difference is that they are pushed by
people that are familiar and responsible for the affected
area.

I could understand the pain. There are surely people that
do not care about stable, because it takes time, it is hard
to make decisions, flashbacks to the old code are painful,
etc. Well, this is the reason why the maintenance support
is and should be limited.

Anyway, I think that it cannot be done reasonably without
maintainers. You should be careful so that even the currently
cooperating maintainers will not start considering autosel
mails as spam. (It is not my case. printk is a small thing.
But I could imagine that it might stop being bearable
in bigger subsystems. As is already the case with xfs.)

Best Regards,
Petr

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 07:57:54PM +0200, Jan Kara wrote:
>Actually I was careful enough to include only commits that got merged as
>part of the stable process into 4.14.x but got later reverted in 4.14.y.
>That's where the 0.4% number came from. So I believe all of those cases
>(13 in absolute numbers) were user visible regressions during the stable
>process.

I looked at them, and there are 2 things in play here:

 - Quite a few of those reverts are because of the PTI work. I'm not
   sure how we treat it, but yes - it skews statistics here.
 - 2 of them were reverts for device tree changes for a device that
   didn't exist in 4.14, and shouldn't have had any user visible
   changes.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko

On Tue 17-04-18 14:36:44, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 13:39:33, Sasha Levin wrote:
> >[...]
> >> But mm/ commits don't come only from these people. Here's a concrete
> >> example we can discuss:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
> >
> >I would be really careful. Because that requires to audit all callers to
> >be compliant with the change. This is just _too_ easy to backport
> >without noticing a failure. Now consider the other side. Is there any
> >real bug report backing this? This behavior was like that for quite some
> >time but I do not remember any actual bug report and the changelog
> >doesn't mention one either. It is about a theoretical problem.
> 
> https://lkml.org/lkml/2018/3/19/430
> 
> There's even a fun little reproducer that allowed me to confirm it's an
> issue (at least) on 4.15.
> 
> Heck, it might even qualify as a CVE.
> 
> >So if this was to be merged to stable then the changelog should contain
> >a big fat warning about the existing users and how they should be
> >checked.
> 
> So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
> additional work backporting this, but what I'm saying is that this
> didn't happen at all.

Do not ask me. I wasn't involved. But I would _guess_ that the original
bug is not all that serious because it requires some specific privileges
and it is quite unlikely that somebody privileged would want to shoot
themselves in the foot. But this is just my wild guess.

Anyway, I am pretty sure that if the triggering BUG was serious enough
then it would be much safer to remove it for stable backports.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara

On Tue 17-04-18 16:19:35, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
> >> Even regression chance is tricky, look at the commits I've linked
> >> earlier in the thread. Even the most trivial looking commits that end up
> >> in stable have a chance for regression.
> >
> >Sure, you can never be certain and I think people (including me)
> >underestimate the chance of regressions for "trivial" patches. But you just
> >estimate a chance, you may be lucky, you may not...
> >
> >> >Another point I wanted to make is that if chance a patch causes a
> >> >regression is about 2% as you said somewhere else in a thread, then by
> >> >adding 20 patches that "may fix a bug that is annoying for someone" you've
> >> >just increased a chance there's a regression in the release by 34%. And
> >>
> >> So I've said that the rejection rate is less than 2%. This includes
> >> all commits that I have proposed for -stable, but didn't end up being
> >> included in -stable.
> >>
> >> This includes commits that the author/maintainers NACKed, commits that
> >> didn't do anything on older kernels, commits that were buggy but were
> >> caught before the kernel was released, commits that failed to build on
> >> an arch I didn't test it on originally and so on.
> >>
> >> After thousands of merged AUTOSEL patches I can count the number of
> >> times a commit has caused a regression and had to be removed on one
> >> hand.
> >>
> >> >this is not just a math game, this also roughly matches a real experience
> >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweigh such
> >> >regression chance? And I also note that for a regression to get reported so
> >> >that it gets included into your 2% estimate of a patch regression rate,
> >> >someone must be bothered enough by it to triage it and send an email
> >> >somewhere so that already falls into a category of "serious" stuff to me.
> >>
> >> It is indeed a numbers game, but the regression rate isn't 2%, it's
> >> closer to 0.05%.
> >
> >Honestly, I think 0.05% is too optimistic :) Quick grepping of 4.14
> >stable tree suggests some 13 commits were reverted from stable due to bugs.
> >That's some 0.4% and that doesn't count fixes that were applied to
> >fix other regressions.
> 
> 0.05% is for commits that were merged in stable but later fixed or
> reverted because they introduced a regression. By grepping for reverts
> you also include things such as:
> 
>  - Reverts of commits that were in the corresponding mainline tree
>  - Reverts of commits that didn't introduce regressions

Actually I was careful enough to include only commits that got merged as
part of the stable process into 4.14.x but got later reverted in 4.14.y.
That's where the 0.4% number came from. So I believe all of those cases
(13 in absolute numbers) were user visible regressions during the stable
process.

Honza
-- 
Jan Kara 
SUSE Labs, CR
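
The figures traded in the exchange above can be checked with a few lines of arithmetic. The sketch below is only an illustration and assumes that each patch regresses independently with the same probability, which the thread does not establish; the function names are hypothetical. Under that assumption, 20 patches at a 2% per-patch rate give roughly a 33% chance of at least one regression, close to the 34% quoted, and 13 reverts at a 0.4% rate imply about 3250 stable commits in total.

```python
def release_regression_chance(per_patch_rate: float, patches: int) -> float:
    """Chance that at least one of `patches` patches causes a regression.

    Assumes (for illustration only) that patches regress independently
    and all share the same per-patch regression rate.
    """
    return 1.0 - (1.0 - per_patch_rate) ** patches


def implied_total_commits(reverted: int, revert_rate: float) -> float:
    """Total commit count implied by a revert count and a revert rate."""
    return reverted / revert_rate


if __name__ == "__main__":
    # The three per-patch rates mentioned in the thread.
    for label, rate in [("2%", 0.02), ("0.4%", 0.004), ("0.05%", 0.0005)]:
        chance = release_regression_chance(rate, 20)
        print(f"per-patch rate {label}: {chance:.1%} chance across 20 patches")

    # 13 reverts described as "some 0.4%" of the 4.14 stable commits.
    print(f"implied total commits: {implied_total_commits(13, 0.004):.0f}")
```

With the other two rates from the thread, the same 20 patches give roughly an 8% chance at 0.4% per patch and roughly 1% at 0.05% per patch, which is why the disagreement over the per-patch rate matters more than the exact formula.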

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote:
>On Tue, 17 Apr 2018, Sasha Levin wrote:
>
>> How do I get the XFS folks to send their stuff to -stable? (we have
>> quite a few customers who use XFS)
>
>If XFS (or *any* other subsystem) doesn't have enough manpower of upstream
>maintainers to deal with stable, we just have to accept that and find an
>answer to that.

This is exactly what I'm doing. Many subsystems don't have enough
manpower to deal with -stable, so I'm trying to help.

>If XFS folks claim that they don't have enough mental capacity to
>create/verify XFS backports, I totally don't see how any kind of AI would
>have.

Because creating backports is not all about mental capacity!

A lot of time gets wasted on going through the list of commits,
backporting each of those commits into every -stable tree we have,
building it, running tests, etc.

So it's not all about pure mental capacity, but more about the time
per-patch it takes to get -stable done.

If I can cut down on that, by suggesting a list of commits, doing builds
and tests, what's the problem?

>If your business relies on XFS (and so does ours, BTW) or any other
>subsystem that doesn't have enough manpower to care for stable, the proper
>solution (and contribution) would be just bringing more people into the
>XFS community.

Microsoft's business relies on quite a few kernel subsystems. While we
try to bring more people in the kernel (we're hiring!), as you might
know it's not easy getting kernel folks.

So just "get more people" isn't a good solution. It doesn't scale
either.

>To put it simply -- I don't think the simple lack of actual human
>brainpower can be reasonably resolved in other way than bringing more of
>it in.
>
>Thanks,
>
>-- 
>Jiri Kosina
>SUSE Labs
>

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Mike Galbraith

On Tue, 2018-04-17 at 17:52 +0200, Jiri Kosina wrote:
> On Tue, 17 Apr 2018, Sasha Levin wrote:
> 
> > How do I get the XFS folks to send their stuff to -stable? (we have
> > quite a few customers who use XFS)
> 
> If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
> maintainers to deal with stable, we just have to accept that and find an 
> answer to that.
> 
> If XFS folks claim that they don't have enough mental capacity to 
> create/verify XFS backports, I totally don't see how any kind of AI would 
> have.
> 
> If your business relies on XFS (and so does ours, BTW) or any other 
> subsystem that doesn't have enough manpower to care for stable, the proper 
> solution (and contribution) would be just bringing more people into the 
> XFS community.
> 
> To put it simply -- I don't think the simple lack of actual human 
> brainpower can be reasonably resolved in other way than bringing more of 
> it in.

Not to worry... soon enough it'll be submitting properly massaged
backports of the stuff it submitted upstream :)

-Mike

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> We may be able to guesstimate the 'regression chance', but there's no
>> way we can guess the 'annoyance' one. There are so many different use
>> cases that we just can't even guess how many people would get "annoyed"
>> by something.
>
>As a maintainer, I hope I have reasonable idea what are common use cases
>for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>know all of the use cases so people doing unusual stuff hit more bugs and
>have to report them to get fixes included in -stable. But for me this is a
>preferable tradeoff over the risk of regression so this is the rule I use
>when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>with "those who do the work decide" principle so pick whatever patches you
>think are appropriate, I just wanted explain why I don't think more patches
>in stable are necessarily good.

The AUTOSEL story is different for subsystems that don't do -stable, and
subsystems that are actually doing the work (like yourself).

I'm not trying to override active maintainers, I'm trying to help them
make decisions.

The AUTOSEL bot will attempt to apply any patch it deems -stable material
to all -stable branches, find possible dependencies, build them, and
run any tests that you might deem necessary.

You would be able to start your analysis without "wasting" time on doing
a bunch of "manual labor".

There's a big difference between subsystems like yours and most of the
rest of the kernel.

>> Even regression chance is tricky, look at the commits I've linked
>> earlier in the thread. Even the most trivial looking commits that end up
>> in stable have a chance for regression.
>
>Sure, you can never be certain and I think people (including me)
>underestimate the chance of regressions for "trivial" patches. But you just
>estimate a chance, you may be lucky, you may not...
>
>> >Another point I wanted to make is that if chance a patch causes a
>> >regression is about 2% as you said somewhere else in a thread, then by
>> >adding 20 patches that "may fix a bug that is annoying for someone" you've
>> >just increased a chance there's a regression in the release by 34%. And
>>
>> So I've said that the rejection rate is less than 2%. This includes
>> all commits that I have proposed for -stable, but didn't end up being
>> included in -stable.
>>
>> This includes commits that the author/maintainers NACKed, commits that
>> didn't do anything on older kernels, commits that were buggy but were
>> caught before the kernel was released, commits that failed to build on
>> an arch I didn't test it on originally and so on.
>>
>> After thousands of merged AUTOSEL patches I can count the number of
>> times a commit has caused a regression and had to be removed on one
>> hand.
>>
>> >this is not just a math game, this also roughly matches a real experience
>> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweigh such
>> >regression chance? And I also note that for a regression to get reported so
>> >that it gets included into your 2% estimate of a patch regression rate,
>> >someone must be bothered enough by it to triage it and send an email
>> >somewhere so that already falls into a category of "serious" stuff to me.
>>
>> It is indeed a numbers game, but the regression rate isn't 2%, it's
>> closer to 0.05%.
>
>Honestly, I think 0.05% is too optimistic :) Quick grepping of 4.14
>stable tree suggests some 13 commits were reverted from stable due to bugs.
>That's some 0.4% and that doesn't count fixes that were applied to
>fix other regressions.

0.05% is for commits that were merged in stable but later fixed or
reverted because they introduced a regression. By grepping for reverts
you also include things such as:

 - Reverts of commits that were in the corresponding mainline tree
 - Reverts of commits that didn't introduce regressions

>But the actual numbers don't really matter that much, in principle the more
>patches you add the higher is the chance of regression. You can't change
>that so you better have a good reason to include a patch...

You increase the chance of regressions, but you also increase the chance
of fixing bugs that affect users.

My claim is that the chance to fix bugs increases far more significantly
than the chance to introduce regressions.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote:
>On Tue 17-04-18 13:31:51, Sasha Levin wrote:
>> We may be able to guesstimate the 'regression chance', but there's no
>> way we can guess the 'annoyance' one. There are so many different use
>> cases that we just can't even guess how many people would get "annoyed"
>> by something.
>
>As a maintainer, I hope I have reasonable idea what are common use cases
>for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
>know all of the use cases so people doing unusual stuff hit more bugs and
>have to report them to get fixes included in -stable. But for me this is a
>preferable tradeoff over the risk of regression so this is the rule I use
>when tagging for stable. Now I'm not a -stable maintainer and I fully agree
>with "those who do the work decide" principle so pick whatever patches you
>think are appropriate, I just wanted explain why I don't think more patches
>in stable are necessarily good.

The AUTOSEL story is different for subsystems that don't do -stable, and
subsystems that are actually doing the work (like yourself).

I'm not trying to override active maintainers, I'm trying to help them
make decisions.

The AUTOSEL bot will attempt to apply any patch it deems as -stable for
on all -stable branches, finding possible dependencies, build them, and
run any tests that you might deem necessary.

You would be able to start your analysis without "wasting" time on doing
a bunch of "manual labor".

There's a big difference between subsystems like yours and most of the
rest of the kernel.

>> Even regression chance is tricky, look at the commits I've linked
>> earlier in the thread. Even the most trivial looking commits that end up
>> in stable have a chance for regression.
>
>Sure, you can never be certain and I think people (including me)
>underestimate the chance of regressions for "trivial" patches. But you just
>estimate a chance, you may be lucky, you may not...
>
>> >Another point I wanted to make is that if chance a patch causes a
>> >regression is about 2% as you said somewhere else in a thread, then by
>> >adding 20 patches that "may fix a bug that is annoying for someone" you've
>> >just increased a chance there's a regression in the release by 34%. And
>>
>> So I've said that the rejection rate is less than 2%. This includes
>> all commits that I have proposed for -stable, but didn't end up being
>> included in -stable.
>>
>> This includes commits that the author/maintainers NACKed, commits that
>> didn't do anything on older kernels, commits that were buggy but were
>> caught before the kernel was released, commits that failed to build on
>> an arch I didn't test it on originally and so on.
>>
>> After thousands of merged AUTOSEL patches I can count the number of
>> times a commit has caused a regression and had to be removed on one
>> hand.
>>
>> >this is not just a math game, this also roughly matches a real experience
>> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such
>> >regression chance? And I also note that for a regression to get reported so
>> >that it gets included into your 2% estimate of a patch regression rate,
>> >someone must be bothered enough by it to triage it and send an email
>> >somewhere so that already falls into a category of "serious" stuff to me.
>>
>> It is indeed a numbers game, but the regression rate isn't 2%, it's
>> closer to 0.05%.
>
>Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14
>stable tree suggests some 13 commits were reverted from stable due to bugs.
>That's some 0.4% and that doesn't count fixes that were applied to
>fix other regressions.

0.05% is for commits that were merged in stable but later fixed or
reverted because they introduced a regression. By grepping for reverts
you also include things such as:

 - Reverts of commits that were in the corresponding mainline tree
 - Reverts of commits that didn't introduce regressions

>But the actual numbers don't really matter that much, in principle the more
>patches you add the higher is the chance of regression. You can't change
>that so you better have a good reason to include a patch...

You increase the chance of regressions, but you also increase the chance
of fixing bugs that affect users.

My claim is that the chance to fix bugs increases far more significantly
than the chance to introduce regressions.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jan Kara

On Tue 17-04-18 13:31:51, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
> >On Mon 16-04-18 17:23:30, Sasha Levin wrote:
> >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
> >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
> >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
> >> >> >On Mon, 16 Apr 2018 16:19:14 +
> >> >> >Sasha Levin  wrote:
> >> >> >
> >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
> >> >> >> >broken as mainline?
> >> >> >>
> >> >> >> This just means that if there is a fix that went in mainline, and the
> >> >> >> fix is broken somehow, we'd rather take the broken fix than not.
> >> >> >>
> >> >> >> In this scenario, *something* will be broken, it's just a matter of
> >> >> >> what. We'd rather have the same thing broken between mainline and
> >> >> >> stable.
> >> >> >
> >> >> >Honestly, I think that removes all value of the stable series. I
> >> >> >remember when the stable series were first created. People were saying
> >> >> >that it wouldn't even get to more than 5 versions, because the bar for
> >> >> >backporting was suppose to be very high. Today it's just a fork of the
> >> >> >kernel at a given version. No more features, but we will be OK with
> >> >> >regressions. I'm struggling to see what the benefit of it is suppose to
> >> >> >be?
> >> >>
> >> >> It's not "OK with regressions".
> >> >>
> >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
> >> >> a broken printf() behaviour so that when you:
> >> >>
> >> >> pr_err("%d", 5)
> >> >>
> >> >> Would print:
> >> >>
> >> >> "Microsoft Rulez"
> >> >>
> >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
> >> >> might expect. But alas, with your patch, running:
> >> >>
> >> >> pr_err("%s", "hi!")
> >> >>
> >> >> Would show a cat picture for 5 seconds.
> >> >>
> >> >> Should we take your patch in -stable or not? If we don't, we're stuck
> >> >> with the original issue while the mainline kernel will behave
> >> >> differently, but if we do - we introduce a new regression.
> >> >
> >> >Of course not.
> >> >
> >> >- It must be obviously correct and tested.
> >> >
> >> >If it introduces new bug, it is not correct, and certainly not
> >> >obviously correct.
> >>
> >> As you might have noticed, we don't strictly follow the rules.
> >>
> >> Take a look at the whole PTI story as an example. It's way more than 100
> >> lines, it's not obviously correct, it fixed more than 1 thing, and so
> >> on, and yet it went in -stable!
> >>
> >> Would you argue we shouldn't have backported PTI to -stable?
> >
> >So I agree with that being backported. But I think this nicely demonstrates
> >a point some people are trying to make in this thread. We do take fixes
> >with high risk of regression if they fix a serious enough issue. Also we do
> >take fixes to non-serious stuff (such as addition of device ID) if the
> >chances of regression are really low.
> >
> >So IMHO the metric for including the fix is not solely "how annoying to
> >user this can be" but rather something like:
> >
> >score = (how annoying the bug is) * ((1 / (chance of regression due to
> > including this)) - 1)^3
> >
> >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying'
> >and 'regression chance' parts are subjective and sometimes difficult to
> >estimate so don't take the formula too seriously but it demonstrates the
> >point. I think we all agree we want to fix annoying stuff and we don't want
> >regressions. But you need to somehow weight this over your expected
> >userbase - and this is where your argument "but someone might be annoyed by
> >LEDs not working so let's include it" has problems - it should rather be
> >"is the annoyance of non-working leds over expected user base high enough
> >to risk a regression due to this patch for someone in the expected user
> >base"? The answer to this second question is not clear at all to a casual
> >reviewer and that's why we IMHO have CC stable tag as maintainer is
> >supposed to have at least a bit better clue.
> 
> We may be able to guesstimate the 'regression chance', but there's no
> way we can guess the 'annoyance' one. There are so many different use
> cases that we just can't even guess how many people would get "annoyed"
> by something.
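
The scoring heuristic quoted above can be written out as a small function. This is a sketch of the formula exactly as quoted, with its admittedly arbitrary constants; the function name, the argument scales, and the sample inputs are assumptions made up for illustration, not anything used by the -stable process.

```python
def backport_score(annoyance: float, regression_chance: float) -> float:
    """score = annoyance * ((1 / regression_chance) - 1) ** 3

    `annoyance` is a subjective, unitless estimate of how much the bug hurts
    the expected user base; `regression_chance` is a probability in (0, 1].
    Both are guesses, as the quoted text itself points out.
    """
    if not 0.0 < regression_chance <= 1.0:
        raise ValueError("regression_chance must be in (0, 1]")
    return annoyance * (1.0 / regression_chance - 1.0) ** 3


if __name__ == "__main__":
    # Hypothetical inputs: a minor but very safe fix versus a serious but risky one.
    print("low annoyance, low risk :", backport_score(annoyance=1.0, regression_chance=0.001))
    print("high annoyance, high risk:", backport_score(annoyance=100.0, regression_chance=0.2))
```

Because the risk term is cubed, the score rises steeply as the estimated regression chance falls, so a trivial fix with a very low chance of regression can outscore a serious fix that is risky. A patch that is certain to regress scores zero no matter how annoying the bug is.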

As a maintainer, I hope I have reasonable idea what are common use cases
for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't
know all of the use cases so people doing unusual stuff hit more bugs and
have to report them to get fixes included in -stable. But for me this is a
preferable tradeoff over the risk of regression so this is the rule I use
when tagging for stable. Now I'm not a -stable maintainer and I fully agree
with "those who do the work decide" principle so pick whatever patches you
think are appropriate, I just wanted to explain why I don't think more patches
in

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Jiri Kosina

On Tue, 17 Apr 2018, Sasha Levin wrote:

> How do I get the XFS folks to send their stuff to -stable? (we have
> quite a few customers who use XFS)

If XFS (or *any* other subsystem) doesn't have enough manpower of upstream 
maintainers to deal with stable, we just have to accept that and find an 
answer to that.

If XFS folks claim that they don't have enough mental capacity to 
create/verify XFS backports, I totally don't see how any kind of AI would 
have.

If your business relies on XFS (and so does ours, BTW) or any other 
subsystem that doesn't have enough manpower to care for stable, the proper 
solution (and contribution) would be just bringing more people into the 
XFS community.

To put it simply -- I don't think the simple lack of actual human 
brainpower can be reasonably resolved in other way than bringing more of 
it in.

Thanks,

-- 
Jiri Kosina
SUSE Labs

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 04:36:31PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:04:36, Sasha Levin wrote:
>> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>> >On Tue 17-04-18 12:39:36, Greg KH wrote:
>> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >> >
>> >> > > I agree that as an enterprise distro taking everything from -stable
>> >> > > isn't the best idea. Ideally you'd want to be close to the first
>> >> > > extreme you've mentioned and only take commits if customers are asking
>> >> > > you to do so.
>> >> > >
>> >> > > I think that the rule we're trying to agree upon is the "It must fix
>> >> > > a real bug that bothers people".
>> >> > >
>> >> > > I think that we can agree that it's impossible to expect every single
>> >> > > Linux user to go on LKML and complain about a bug he encountered, so the
>> >> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >> >
>> >> > So is there a reason why stable couldn't become some hybrid-form union of
>> >> >
>> >> > - really critical issues (data corruption, boot issues, severe security
>> >> >   issues) taken from bleeding edge upstream
>> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >> >   (based on that very -stable release), as that's apparently what people
>> >> >   are hitting in the real world with that particular kernel
>> >>
>> >> It already is that :)
>> >>
>> >> The problem Sasha is trying to solve here is that for many subsystems,
>> >> maintainers do not mark patches for stable at all.
>> >
>> >The way he is trying to do that is just wrong. Generate a pressure on
>> >those subsystems by referring to bug reports and unhappy users and I am
>> >pretty sure they will try harder... You cannot solve the problem by
>> >bypassing them without having deep understanding of the specific
>> >subsystem. Once you have it, just make sure you are part of the review
>> >process and make sure to mark patches before they are merged.
>>
>> I think we just don't agree on how we should "pressure".
>>
>> Look at the discussion I had with the XFS folks who just don't want to
>> deal with this -stable thing because they have too much work upstream.
>
>So do you really think that you or any script decide without them? My
>recollection from that discussion was quite opposite. Dave was quite
>clear that most of fixes are quite hard to evaluate and most of them
>are simply not worth risking the backport.

No, *some* fixes are hard, not most.

I'm not trying to decide for them, I'm trying to help them decide.

>> There wasn't a single patch in -stable coming from XFS for the past 6+
>> months. I'm aware of more than one way to corrupt an XFS volume for any
>> distro that uses a kernel older than 4.15.
>
>Then try to poke/bribe somebody to have it fixed. But applying
>_something_ is just not a solution. You should also evaluate whether "I
>am able to corrupt" is something that "people see in the wild". Sure
>there are zillions of bugs hidden in the large code base like the
>kernel. People just do not tend to hit them and this will likely not
>change very much in the future.

We can't ignore bugs just because people don't notice.

Data corruption bugs in particular are a pain to report as well, the
corruption might have happened months before and there's not much to
report at that point.

There's quite a few bug classes like that.

>> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
>> better about it, but I don't see this changing.
>
>I can surely have one or two and discuss this. I am pretty sure xfs guys
>are not going to pretend older kernels do not exist.
>
>> The solution to this, in my opinion, is to automate the whole selection
>> and review process. We do selection using AI, and we run every possible
>> test that's relevant to that subsystem.
>>
>> At which point, the amount of work a human needs to do to review a patch
>> shrinks into something far more manageable for some maintainers.
>
>I really disagree. I am pretty sure maintainers are very well aware of
>how the patch is important. Some do not care about stable and I agree you
>should poke those. But some have really good reasons to not throw many
>patches that direction because they do not feel the patch is important
>enough.
>
>Remember this is not about numbers. The more is not always better.

So what is "important"? Look at the XFS issues, they were important
enough to get fixed upstream, and have an appropriate test added to
xfstests.

Why didn't they go back to -stable?

>> >> So real bugfixes
>> >> that do hit people are not getting to those kernels, which force the
>> >> distros to do extra work to triage a bug, dig through upstream kernels,
>> >> find and apply the patch.
>> >
>> >I would say that this is the primary role of the distro. To hide the
>> >jungle of the upstream work and provide the additional of bug filtering

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 13:39:33, Sasha Levin wrote:
>[...]
>> But mm/ commits don't come only from these people. Here's a concrete
>> example we can discuss:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d
>
>I would be really careful. Because that requires to audit all callers to
>be compliant with the change. This is just _too_ easy to backport
>without noticing a failure. Now consider the other side. Is there any
>real bug report backing this? This behavior was like that for quite some
>time but I do not remember any actual bug report and the changelog
>doesn't mention one either. It is about theoretical problem.

https://lkml.org/lkml/2018/3/19/430

There's even a fun little reproducer that allowed me to confirm it's an
issue (at least) on 4.15.

Heck, it might even qualify as a CVE.

>So if this was to be merged to stable then the changelog should contain
>a big fat warning about the existing users and how they should be
>checked.

So what I'm asking is why *wasn't* it sent to stable? Yes, it requires
additional work backporting this, but what I'm saying is that this
didn't happen at all.

>Besides that I can see Reviewed-by: akpm and Andrew is usually very
>careful about stable backports so there probably _was_ a reason to
>exclude stable.
>-- 
>Michal Hocko
>SUSE Labs

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko

On Tue 17-04-18 14:04:36, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
> >On Tue 17-04-18 12:39:36, Greg KH wrote:
> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
> >> > On Mon, 16 Apr 2018, Sasha Levin wrote:
> >> >
> >> > > I agree that as an enterprise distro taking everything from -stable
> >> > > isn't the best idea. Ideally you'd want to be close to the first
> >> > > extreme you've mentioned and only take commits if customers are asking
> >> > > you to do so.
> >> > >
> >> > > I think that the rule we're trying to agree upon is the "It must fix
> >> > > a real bug that bothers people".
> >> > >
> >> > > I think that we can agree that it's impossible to expect every single
> >> > > Linux user to go on LKML and complain about a bug he encountered, so the
> >> > > rule quickly becomes "It must fix a real bug that can bother people".
> >> >
> >> > So is there a reason why stable couldn't become some hybrid-form union of
> >> >
> >> > - really critical issues (data corruption, boot issues, severe security
> >> >   issues) taken from bleeding edge upstream
> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels
> >> >   (based on that very -stable release), as that's apparently what people
> >> >   are hitting in the real world with that particular kernel
> >>
> >> It already is that :)
> >>
> >> The problem Sasha is trying to solve here is that for many subsystems,
> >> maintainers do not mark patches for stable at all.
> >
> >The way he is trying to do that is just wrong. Generate a pressure on
> >those subsystems by referring to bug reports and unhappy users and I am
> >pretty sure they will try harder... You cannot solve the problem by
> >bypassing them without having deep understanding of the specific
> >subsystem. Once you have it, just make sure you are part of the review
> >process and make sure to mark patches before they are merged.
> 
> I think we just don't agree on how we should "pressure".
> 
> Look at the discussion I had with the XFS folks who just don't want to
> deal with this -stable thing because they have too much work upstream.

So do you really think that you or any script decide without them? My
recollection from that discussion was quite opposite. Dave was quite
clear that most of fixes are quite hard to evaluate and most of them
are simply not worth risking the backport.

> There wasn't a single patch in -stable coming from XFS for the past 6+
> months. I'm aware of more than one way to corrupt an XFS volume for any
> distro that uses a kernel older than 4.15.

Then try to poke/bribe somebody to have it fixed. But applying
_something_ is just not a solution. You should also evaluate whether "I
am able to corrupt" is something that "people see in the wild". Sure
there are zillions of bugs hidden in the large code base like the
kernel. People just do not tend to hit them and this will likely not
change very much in the future.

> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
> better about it, but I don't see this changing.

I can surely have one or two and discuss this. I am pretty sure xfs guys
are not going to pretend older kernels do not exist.

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more manageable for some maintainers.

I really disagree. I am pretty sure maintainers are very well aware of
how important a patch is. Some do not care about stable and I agree you
should poke those. But some have really good reasons not to throw many
patches in that direction because they do not feel the patch is important
enough.

Remember this is not about numbers. More is not always better.

> >> So real bugfixes
> >> that do hit people are not getting to those kernels, which force the
> >> distros to do extra work to triage a bug, dig through upstream kernels,
> >> find and apply the patch.
> >
> >I would say that this is the primary role of the distro. To hide the
> >jungle of the upstream work and provide the additional service of bug
> >filtering and forwarding them in the right direction.
> 
> More often than triaging, you'll just be asked to upgrade to the latest
> version. What sort of user experience does that provide?
> 
> [snip]
> 
> >> So nothing "new" is happening here, EXCEPT we are actually starting to
> >> get a better kernel-wide coverage for stable fixes, which we have not
> >> had in the past.  That's a good thing!  The number of patches applied to
> >> stable is still a very very very tiny % compared to mainline, so nothing
> >> new is happening here.
> >
> >Yes, I do agree, the stable process is not very much different from the
> >past and I would tend to call both processes broken because they explicitly try
> >to avoid

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Greg KH

On Tue, Apr 17, 2018 at 10:15:02AM -0400, Steven Rostedt wrote:
> On Tue, 17 Apr 2018 14:04:36 +
> Sasha Levin  wrote:
> 
> > The solution to this, in my opinion, is to automate the whole selection
> > and review process. We do selection using AI, and we run every possible
> > test that's relevant to that subsystem.
> > 
> > At which point, the amount of work a human needs to do to review a patch
> > shrinks into something far more manageable for some maintainers.
> 
> I guess the real question is, who are the stable kernels for? Is it just
> a place to look at to see what distros should think about. A superset
> of what distros would take. Then distros would have a nice place to
> look to find what patches they should look at. But the stable tree
> itself won't be used. But it's not being used today by major distros
> (Red Hat and SuSE). Debian may be using it, but that's because the
> stable maintainer for its kernels is also the Debian maintainer.
> 
> Who are the customers of the stable trees? They are the ones that
> should be determining the "equation" for what goes into it.

The "customers" of the stable trees are anyone who uses Linux.

Right now, it's estimated that only about 1/3 of the kernels running out
there, at best, are an "enterprise" kernel/distro.  2/3 of the world
either run a kernel.org-based release + their own patches, or Debian.
And Debian piggy-backs on the stable kernel releases pretty regularly.

So the majority of the Linux users out there are what we are doing this
for.  Those that do not pay for a company to sift through things and
only cherry-pick what they want to pick out (hint, they almost always
miss things, some do this better than others...)

That's who this is all for, which is why we are doing our best to keep
on top of the avalanche of patches going into upstream to get the needed
fixes (both security and "normal" fixes) out to users as soon as
possible.

So again, if you are a subsystem maintainer, tag your patches for
stable.  If you do not, you will get automated emails asking you about
patches that should be applied (like the one that started this thread).
If you want to just have us ignore your subsystem entirely, we will be
glad to do so, and we will tell the world to not use your subsystem if
at all possible (see previous comments about xfs, and I would argue IB
right now...)
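
For readers unfamiliar with the tagging referred to above: the kernel's
stable rules documentation describes adding a "Cc: stable@vger.kernel.org"
trailer to the commit message, optionally followed by a "# 4.14.x"-style
version hint. A minimal sketch of how a script might detect that trailer
follows. The function name stable_tag and the regular expression are
illustrative assumptions, not the tooling the stable maintainers actually
use.

```python
import re

# Assumed pattern: a "Cc: stable@vger.kernel.org" trailer on its own line,
# with optional angle brackets and an optional "# <version hint>" suffix.
STABLE_CC = re.compile(
    r"^Cc:[ \t]*<?stable@vger\.kernel\.org>?[ \t]*(?:#[ \t]*(?P<hint>.+))?$",
    re.IGNORECASE | re.MULTILINE,
)


def stable_tag(commit_message):
    """Return (tagged, hint) for a commit message (hypothetical helper).

    tagged is True when a stable trailer is present; hint is the text
    after '#', or None when no version hint was given.
    """
    match = STABLE_CC.search(commit_message)
    if match is None:
        return (False, None)
    hint = match.group("hint")
    return (True, hint.strip() if hint else None)
```

A commit without the trailer yields (False, None), which is the case where
an automated nomination mail would be the only prompt the maintainer gets.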

> Personally, I use stable as a one off from mainline. Like I mentioned
> in another email. I'm currently on 4.15.x and will probably move to
> 4.16.x next. Unless there's some critical bug announcement, I update my
> machines once a month. I originally just used mainline, but that was a
> bit too unstable for my work machines ;-)

That's great, you are a user of these trees then.  So you benefit
directly, along with everyone else who relies on them.

And again, I'm working with the SoC vendors to directly incorporate
these trees into their device trees, and I've already seen some devices
in the wild push out updated 4.4.y kernels a few weeks after they are
released.  That's the end goal here, to have the world's devices in a
much more secure and stable shape by relying on these kernels.

thanks,

greg k-h

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko

On Tue 17-04-18 13:39:33, Sasha Levin wrote:
[...]
> But mm/ commits don't come only from these people. Here's a concrete
> example we can discuss:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

I would be really careful, because that requires auditing all callers to
make sure they are compliant with the change. This is just _too_ easy to
backport without noticing a failure. Now consider the other side. Is there any
real bug report backing this? This behavior was like that for quite some
time but I do not remember any actual bug report and the changelog
doesn't mention one either. It is about a theoretical problem.

So if this was to be merged to stable then the changelog should contain
a big fat warning about the existing users and how they should be
checked.

Besides that I can see Reviewed-by: akpm and Andrew is usually very
careful about stable backports so there probably _was_ a reason to
exclude stable.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Steven Rostedt

On Tue, 17 Apr 2018 14:04:36 +
Sasha Levin  wrote:

> The solution to this, in my opinion, is to automate the whole selection
> and review process. We do selection using AI, and we run every possible
> test that's relevant to that subsystem.
> 
> At which point, the amount of work a human needs to do to review a patch
> shrinks into something far more manageable for some maintainers.

I guess the real question is, who are the stable kernels for? Is it just
a place to look at to see what distros should think about. A superset
of what distros would take. Then distros would have a nice place to
look to find what patches they should look at. But the stable tree
itself won't be used. But it's not being used today by major distros
(Red Hat and SuSE). Debian may be using it, but that's because the
stable maintainer for its kernels is also the Debian maintainer.

Who are the customers of the stable trees? They are the ones that
should be determining the "equation" for what goes into it.

Personally, I use stable as a one off from mainline. Like I mentioned
in another email. I'm currently on 4.15.x and will probably move to
4.16.x next. Unless there's some critical bug announcement, I update my
machines once a month. I originally just used mainline, but that was a
bit too unstable for my work machines ;-)

-- Steve

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 12:39:36, Greg KH wrote:
>> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote:
>> > On Mon, 16 Apr 2018, Sasha Levin wrote:
>> >
>> > > I agree that as an enterprise distro taking everything from -stable
>> > > isn't the best idea. Ideally you'd want to be close to the first
>> > > extreme you've mentioned and only take commits if customers are asking
>> > > you to do so.
>> > >
>> > > I think that the rule we're trying to agree upon is the "It must fix
>> > > a real bug that bothers people".
>> > >
>> > > I think that we can agree that it's impossible to expect every single
>> > > Linux user to go on LKML and complain about a bug he encountered, so the
>> > > rule quickly becomes "It must fix a real bug that can bother people".
>> >
>> > So is there a reason why stable couldn't become some hybrid-form union of
>> >
>> > - really critical issues (data corruption, boot issues, severe security
>> >   issues) taken from bleeding edge upstream
>> > - [reviewed] cherry-picks of functional fixes from major distro kernels
>> >   (based on that very -stable release), as that's apparently what people
>> >   are hitting in the real world with that particular kernel
>>
>> It already is that :)
>>
>> The problem Sasha is trying to solve here is that for many subsystems,
>> maintainers do not mark patches for stable at all.
>
>The way he is trying to do that is just wrong. Generate pressure on
>those subsystems by referring to bug reports and unhappy users and I am
>pretty sure they will try harder... You cannot solve the problem by
>bypassing them without having deep understanding of the specific
>subsystem. Once you have it, just make sure you are part of the review
>process and make sure to mark patches before they are merged.

I think we just don't agree on how we should "pressure".

Look at the discussion I had with the XFS folks who just don't want to
deal with this -stable thing because they have too much work upstream.

There wasn't a single patch in -stable coming from XFS for the past 6+
months. I'm aware of more than one way to corrupt an XFS volume for any
distro that uses a kernel older than 4.15.

Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be
better about it, but I don't see this changing.

The solution to this, in my opinion, is to automate the whole selection
and review process. We do selection using AI, and we run every possible
test that's relevant to that subsystem.

At which point, the amount of work a human needs to do to review a patch
shrinks into something far more manageable for some maintainers.

>> So real bugfixes
>> that do hit people are not getting to those kernels, which force the
>> distros to do extra work to triage a bug, dig through upstream kernels,
>> find and apply the patch.
>
>I would say that this is the primary role of the distro. To hide the
>jungle of the upstream work and provide the additional service of bug
>filtering and forwarding them in the right direction.

More often than triaging, you'll just be asked to upgrade to the latest
version. What sort of user experience does that provide?

[snip]

>> So nothing "new" is happening here, EXCEPT we are actually starting to
>> get a better kernel-wide coverage for stable fixes, which we have not
>> had in the past.  That's a good thing!  The number of patches applied to
>> stable is still a very very very tiny % compared to mainline, so nothing
>> new is happening here.
>
>Yes, I do agree, the stable process is not very much different from the
>past and I would tend to call both processes broken because they explicitly try
>to avoid maintainers which is just wrong.

Avoid maintainers?! We send so much "spam" trying to get maintainers
more involved in the process. How is that avoiding them?

If you're a maintainer who has specific requirements for the -stable
flow, or you have any automated testing you'd like to be run on these
commits, or you want these mails to come in a different format, or
pretty much anything else at all just shoot me a mail!

It's been almost impossible to get maintainers involved in this process.

We don't sneak anything past maintainers, there are multiple mails over
multiple weeks for each commit that would go in. You don't have to
review it right away either, just reply with "please don't merge until
I'm done reviewing" and it'll get removed from the queue.

>> Oh, and if you do want to complain about huge new features being
>> backported, look at the mess that Spectre and Meltdown has caused in the
>> stable trees.  I don't see anyone complaining about those massive
>> changes :)
>
>Are you serious? Are you going to compare the biggest PITA that the
>community had to undergo because of HW issues with random pattern
>matching in changelog/diffs? Come on!

HW issues are irrelevant here. You had a bug that allowed arbitrary
kernel memory access. I can easily list quite a few commits, that are
not tagged for

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
>Back to the trend. Last week I got autosel mails even for
>patches that were still being discussed, had issues, and
>were far from upstream:
>
> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>
>It might be a good idea if the mail asked to add Fixes: tag
>or stable mailing list. But the mail suggested to add the
>unfinished patch into stable branch directly (even before
>upstreaming?).

I obviously didn't suggest that this patch will go in -stable before
it's upstream.

I've started doing those because some folks can't be arsed to reply to a
review request for a patch that is months old. I found that if I send
these mails while the discussion is still going on I'd get a much better
response rate from people.

If you think any of these patches should go in stable, there are two
ways to go about it:

 - You end up adding the -stable tag yourself, and it would follow the
   usual route where Greg picks it up.
 - You reply to that mail, and the patch would wait in a list until my
   script notices it made it upstream, at which point it would get
   queued for stable.
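
The second route above can be pictured with a small sketch. This is only
an illustration of the "wait in a list until it is upstream" idea, not the
actual script: the function name poll_pending and the use of plain string
identifiers for patches are assumptions made for the example.

```python
def poll_pending(pending, upstream):
    """Split acked patches into those ready to queue and those still waiting.

    pending:  ordered list of patch identifiers a maintainer replied to
              (hypothetical identifiers, e.g. a subject line or message-id)
    upstream: set of identifiers already found in mainline
    """
    ready = [patch for patch in pending if patch in upstream]
    waiting = [patch for patch in pending if patch not in upstream]
    return ready, waiting
```

Called periodically, the "ready" list is what would be queued for stable,
while "waiting" is carried over to the next run.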

>Now, there are only a handful of printk patches in each
>release, so it is still doable. I just do not understand
>how other maintainers, from much busier subsystems,
>could cope with this trend.
>
>In other words, if you want to automate patch nomination,
>you might also need to automate patch review. Or you need
>to keep the patch rate low. This might mean nominating
>only important and rather trivial fixes.

I also have an effort to help review the patches. See what I'm working
on for the xfs folks:

https://lkml.org/lkml/2018/3/29/1113

Where, in addition to build tests, I'd also run each commit, for each
stable kernel, through a set of xfstests and provide the results along
with the mail.
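
To give a sense of the workload this implies, here is a rough sketch of the
commit-by-kernel matrix such a run would cover. The function name
build_run_matrix and the branch and commit labels are made up for
illustration; the real harness is not shown in this thread.

```python
from itertools import product


def build_run_matrix(commits, stable_branches):
    """List every (branch, commit) pair needing a build + xfstests run.

    Both arguments are lists of hypothetical labels; the result is ordered
    branch-major so that all commits for one stable kernel are grouped.
    """
    return [(branch, commit) for branch, commit in product(stable_branches, commits)]
```

The number of runs is len(commits) * len(stable_branches), which is why the
volume grows quickly as more stable kernels are maintained.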

So yes, I'm aware that the volume of patches is huge, but there's not
much I can do about it because it's just a subset of the kernel's patch
volume and since the kernel gets more and more patches each release, the
volume of stable commits is bound to grow as well.

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 02:49:24PM +0200, Michal Hocko wrote:
>On Tue 17-04-18 14:24:54, Petr Mladek wrote:
>[...]
>> Back to the trend. Last week I got autosel mails even for
>> patches that were still being discussed, had issues, and
>> were far from upstream:
>>
>> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
>>
>> It might be a good idea if the mail asked to add a Fixes: tag
>> or the stable mailing list. But the mail suggested adding the
>> unfinished patch to the stable branch directly (even before
>> upstreaming?).
>
>Well, I think that poking subsystems which ignore stable trees with such
>emails early during review might be quite helpful. Maybe people will start
>marking for stable and we will not need the guessing later. I wouldn't
>bother poking those who are known to mark stable patches though.

Yup, mm/ needs far less poking than XFS (for example).

What makes mm/ so good about this is that it's a rather small set of
devs who are good at marking things for stable. As long as the commit
came from one of these "core" mm/ folks it's almost guaranteed to have
proper stable tags.

But mm/ commits don't come only from these people. Here's a concrete
example we can discuss:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d

This was merged a few days ago, and seems relevant for older kernel
trees as well. Should it not have a stable tag?

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Sasha Levin

On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote:
>On Mon 16-04-18 17:23:30, Sasha Levin wrote:
>> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote:
>> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote:
>> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote:
>> >> >On Mon, 16 Apr 2018 16:19:14 +
>> >> >Sasha Levin wrote:
>> >> >
>> >> >> >Wait! What does that mean? What's the purpose of stable if it is as
>> >> >> >broken as mainline?
>> >> >>
>> >> >> This just means that if there is a fix that went in mainline, and the
>> >> >> fix is broken somehow, we'd rather take the broken fix than not.
>> >> >>
>> >> >> In this scenario, *something* will be broken, it's just a matter of
>> >> >> what. We'd rather have the same thing broken between mainline and
>> >> >> stable.
>> >> >
>> >> >Honestly, I think that removes all value of the stable series. I
>> >> >remember when the stable series was first created. People were saying
>> >> >that it wouldn't even get to more than 5 versions, because the bar for
>> >> >backporting was supposed to be very high. Today it's just a fork of the
>> >> >kernel at a given version. No more features, but we will be OK with
>> >> >regressions. I'm struggling to see what the benefit of it is supposed to
>> >> >be?
>> >>
>> >> It's not "OK with regressions".
>> >>
>> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has
>> >> a broken printf() behaviour so that when you:
>> >>
>> >>   pr_err("%d", 5)
>> >>
>> >> Would print:
>> >>
>> >>   "Microsoft Rulez"
>> >>
>> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you
>> >> might expect. But alas, with your patch, running:
>> >>
>> >>   pr_err("%s", "hi!")
>> >>
>> >> Would show a cat picture for 5 seconds.
>> >>
>> >> Should we take your patch in -stable or not? If we don't, we're stuck
>> >> with the original issue while the mainline kernel will behave
>> >> differently, but if we do - we introduce a new regression.
>> >
>> >Of course not.
>> >
>> >- It must be obviously correct and tested.
>> >
>> >If it introduces new bug, it is not correct, and certainly not
>> >obviously correct.
>>
>> As you might have noticed, we don't strictly follow the rules.
>>
>> Take a look at the whole PTI story as an example. It's way more than 100
>> lines, it's not obviously correct, it fixed more than 1 thing, and so
>> on, and yet it went in -stable!
>>
>> Would you argue we shouldn't have backported PTI to -stable?
>
>So I agree with that being backported. But I think this nicely demonstrates
>a point some people are trying to make in this thread. We do take fixes
>with a high risk of regression if they fix a serious enough issue. Also we do
>take fixes to non-serious stuff (such as the addition of a device ID) if the
>chances of regression are really low.
>
>So IMHO the metric for including the fix is not solely "how annoying to
>user this can be" but rather something like:
>
>score = (how annoying the bug is) * ((1 / (chance of regression due to
>   including this)) - 1)^3
>
>(the constants are somewhat arbitrary and subject to tuning ;). Now both the
>'annoying' and 'regression chance' parts are subjective and sometimes difficult
>to estimate, so don't take the formula too seriously, but it demonstrates the
>point. I think we all agree we want to fix annoying stuff and we don't want
>regressions. But you need to somehow weigh this over your expected
>userbase - and this is where your argument "but someone might be annoyed by
>LEDs not working so let's include it" has problems - it should rather be
>"is the annoyance of non-working LEDs over the expected user base high enough
>to risk a regression due to this patch for someone in the expected user
>base"? The answer to this second question is not clear at all to a casual
>reviewer, and that's why we IMHO have the CC stable tag, as the maintainer is
>supposed to have at least a bit better clue.
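
As an aside, the scoring heuristic quoted above can be written down as a
minimal Python sketch. The function name `backport_score` and both
parameter names are made up for illustration and appear nowhere in the
thread; the cubic exponent is the quoted "somewhat arbitrary" constant,
not a tuned value.

```python
def backport_score(annoyance: float, regression_chance: float) -> float:
    """Quoted heuristic: annoyance * ((1 / regression_chance) - 1) ** 3.

    annoyance         -- subjective weight of the bug (hypothetical scale)
    regression_chance -- estimated probability, in (0, 1], that the
                         backport itself introduces a regression
    """
    if not 0.0 < regression_chance <= 1.0:
        raise ValueError("regression_chance must be in (0, 1]")
    return annoyance * ((1.0 / regression_chance) - 1.0) ** 3


# A mildly annoying bug whose fix is very unlikely to regress ...
low_risk = backport_score(annoyance=1.0, regression_chance=0.01)
# ... versus a very annoying bug whose fix is a coin flip.
high_risk = backport_score(annoyance=10.0, regression_chance=0.5)
print(low_risk > high_risk)
```

With these made-up inputs the low-risk fix scores far higher than the
risky one, which is the trade-off the quoted formula is meant to show.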

We may be able to guesstimate the 'regression chance', but there's no
way we can guess the 'annoyance' one. There are so many different use
cases that we just can't even guess how many people would get "annoyed"
by something.

Even regression chance is tricky: look at the commits I've linked
earlier in the thread. Even the most trivial looking commits that end up
in stable have a chance of regression.

>Another point I wanted to make is that if the chance that a patch causes a
>regression is about 2%, as you said somewhere else in the thread, then by
>adding 20 patches that "may fix a bug that is annoying for someone" you've
>just increased the chance there's a regression in the release by 34%. And

So I've said that the rejection rate is less than 2%. This includes
all commits that I have proposed for -stable, but didn't end up being
included in -stable.
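
For reference, the arithmetic behind the quoted 20-patch figure can be
checked with a short sketch. It assumes, as the quoted text does, that
every patch independently carries the same regression chance; the
function name `chance_of_any_regression` is hypothetical. Under that
assumption 20 patches at 2% each give roughly a 33% chance of at least
one regression, close to the quoted 34%. The 2% here follows the quoted
reading; as said above, that number is a rejection rate, not a measured
regression rate.

```python
def chance_of_any_regression(per_patch_chance: float, patch_count: int) -> float:
    """Probability that at least one of `patch_count` patches regresses.

    Assumes every patch has the same, independent chance of regressing,
    so the chance that none regress is (1 - p) ** n.
    """
    if not 0.0 <= per_patch_chance <= 1.0:
        raise ValueError("per_patch_chance must be in [0, 1]")
    if patch_count < 0:
        raise ValueError("patch_count must be non-negative")
    return 1.0 - (1.0 - per_patch_chance) ** patch_count


# The quoted scenario: 20 patches, 2% each.
print(f"{chance_of_any_regression(0.02, 20):.1%}")
```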

This includes commits that the author/maintainers NACKed, commits that
didn't do anything on older kernels, commits that were buggy but were
caught before the kernel was released, commits that failed to build on
an

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Michal Hocko

On Tue 17-04-18 14:24:54, Petr Mladek wrote:
[...]
> Back to the trend. Last week I got autosel mails even for
> patches that were still being discussed, had issues, and
> were far from upstream:
> 
> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> 
> It might be a good idea if the mail asked to add a Fixes: tag
> or the stable mailing list. But the mail suggested adding the
> unfinished patch to the stable branch directly (even before
> upstreaming?).

Well, I think that poking subsystems which ignore stable trees with such
emails early during review might be quite helpful. Maybe people will start
marking for stable and we will not need the guessing later. I wouldn't
bother poking those who are known to mark stable patches though.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-17 Thread Petr Mladek

On Tue 2018-04-17 12:46:37, Greg KH wrote:
> Oh, I know why, suddenly subsystems that never were taking the time to
> mark patches for stable are getting patches backported and are getting
> nervous.

Yes, I am getting nervous because of this. The number of printk fixes
nominated for stable has been increasing exponentially (just my feeling)
during the last few months.

The problem is that I want to be responsible and think about possible
regressions. Sometimes it requires checking the state of the
particular kernel release. The older the code base, the more complicated
the decision is.

You might argue that backporting the fixes helps to get the same code
in all supported code bases. But it is not true. It never will be
the same.

Anyway, in the past the "automatically" nominated printk fixes
were trivial. They did not cause harm. But they also were not
worth it, IMHO. They fixed corner cases that were there for ages.
Most of these fixes were found by code review when working on
a feature. They were not backed by bug reports.

Last week, autosel nominated a pretty non-trivial patch (the one that
started this thread). It partly solved a problem we have tried to fix
for the last few years.

On one hand, this was an annoying problem that motivated several
people to spend a lot of time on it. This might be a motivation
for a backport.

On the other hand, it took many years to get somewhere. The main
problem was the fear of regressions. We fixed/improved many things
in the meantime. It shows that the problem really is not trivial.
The same is true for the fix. We did our best to avoid regressions.
But it does not mean that there are none. Also it does not mean
that it will really give better results in all situations.

I really do not see a reason to hurry and backport this to
the older kernel releases. It means spreading the fix but also
any potential problems. It is easy to miss a dependent patch.
The less trivial the fix, the more potential problems there are.

Back to the trend. Last week I got autosel mails even for
patches that were still being discussed, had issues, and
were far from upstream:

https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com

It might be a good idea if the mail asked to add a Fixes: tag
or the stable mailing list. But the mail suggested adding the
unfinished patch to the stable branch directly (even before
upstreaming?).

Now, there are only a handful of printk patches in each
release, so it is still doable. I just do not understand
how other maintainers, from much busier subsystems,
could cope with this trend.

In other words, if you want to automate patch nomination,
you might also need to automate patch review. Or you need
to keep the patch rate low. This might mean nominating
only important and rather trivial fixes.

Best Regards,
Petr