Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 11:32:15AM +0200, Pavel Machek wrote: >Hi! > >> >- It must be obviously correct and tested. >> > >> >If it introduces new bug, it is not correct, and certainly not >> >obviously correct. >> >> As you might have noticed, we don't strictly follow the rules. > >Yes, I noticed. And what I'm saying is that perhaps you should follow >the rules more strictly. Again, this was stated many times by Greg and others, the rules are not there to be strictly followed. >> Take a look at the whole PTI story as an example. It's way more than 100 >> lines, it's not obviously corrent, it fixed more than 1 thing, and so >> on, and yet it went in -stable! >> >> Would you argue we shouldn't have backported PTI to -stable? > >Actually, I was surprised with PTI going to stable. That was clearly >against the rules. Maybe the security bug was ugly enough to warrant >that. > >But please don't use it as an argument for applying any random >patches... How about this: if a -stable maintainer has concerns with how I follow the -stable rules, he's more than welcome to reject my patches. Sounds like a plan?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 11:32:15AM +0200, Pavel Machek wrote: >Hi! > >> >- It must be obviously correct and tested. >> > >> >If it introduces new bug, it is not correct, and certainly not >> >obviously correct. >> >> As you might have noticed, we don't strictly follow the rules. > >Yes, I noticed. And what I'm saying is that perhaps you should follow >the rules more strictly. Again, this was stated many times by Greg and others, the rules are not there to be strictly followed. >> Take a look at the whole PTI story as an example. It's way more than 100 >> lines, it's not obviously corrent, it fixed more than 1 thing, and so >> on, and yet it went in -stable! >> >> Would you argue we shouldn't have backported PTI to -stable? > >Actually, I was surprised with PTI going to stable. That was clearly >against the rules. Maybe the security bug was ugly enough to warrant >that. > >But please don't use it as an argument for applying any random >patches... How about this: if a -stable maintainer has concerns with how I follow the -stable rules, he's more than welcome to reject my patches. Sounds like a plan?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 11:36:51AM +0200, Pavel Machek wrote: >On Tue 2018-04-17 16:19:35, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: >> >On Tue 17-04-18 13:31:51, Sasha Levin wrote: >> >> We may be able to guesstimate the 'regression chance', but there's no >> >> way we can guess the 'annoyance' once. There are so many different use >> >> cases that we just can't even guess how many people would get "annoyed" >> >> by something. >> > >> >As a maintainer, I hope I have reasonable idea what are common use cases >> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't >> >know all of the use cases so people doing unusual stuff hit more bugs and >> >have to report them to get fixes included in -stable. But for me this is a >> >preferable tradeoff over the risk of regression so this is the rule I use >> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree >> >with "those who do the work decide" principle so pick whatever patches you >> >think are appropriate, I just wanted explain why I don't think more patches >> >in stable are necessarily good. >> >> The AUTOSEL story is different for subsystems that don't do -stable, and >> subsystems that are actually doing the work (like yourself). >> >> I'm not trying to override active maintainers, I'm trying to help them >> make decisions. > >Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900 >stuff from autosel work? Curiousity got me, and I had to see what these subsystems do as far as stable commits: $ git log --oneline --grep 'stable@vger' --since="01-01-2016" kernel/power drivers/leds drivers/media/i2c/et8ek8 drivers/media/i2c/ad5820.c arch/x86/kernel/acpi/ | wc -l 7 Which got me a bit surprised: maybe indeed leds is mostly fine, but hibernation is definitely tricky, I've been stung by it a few times... So why not pick something an actual user reported, and see how that was dealt with? Googling first showed this: https://bugzilla.kernel.org/show_bug.cgi?id=97201 Which was fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdbc98abb3aa323f6323b11db39c740e6f8fc5b1 But that's not in any -stable tree. Hmm.. ok.. Next one on google was: https://bugzilla.kernel.org/show_bug.cgi?id=117971 Which, in turn, was fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b3f249c94ce1f46bacd9814385b0ee2d1ae52f3 Oh look at that, it's not in -stable either... So seeing how you have concerns with my selection of -stable commits, maybe you could explain to me why these commits didn't end up in -stable?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 11:36:51AM +0200, Pavel Machek wrote: >On Tue 2018-04-17 16:19:35, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: >> >On Tue 17-04-18 13:31:51, Sasha Levin wrote: >> >> We may be able to guesstimate the 'regression chance', but there's no >> >> way we can guess the 'annoyance' once. There are so many different use >> >> cases that we just can't even guess how many people would get "annoyed" >> >> by something. >> > >> >As a maintainer, I hope I have reasonable idea what are common use cases >> >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't >> >know all of the use cases so people doing unusual stuff hit more bugs and >> >have to report them to get fixes included in -stable. But for me this is a >> >preferable tradeoff over the risk of regression so this is the rule I use >> >when tagging for stable. Now I'm not a -stable maintainer and I fully agree >> >with "those who do the work decide" principle so pick whatever patches you >> >think are appropriate, I just wanted explain why I don't think more patches >> >in stable are necessarily good. >> >> The AUTOSEL story is different for subsystems that don't do -stable, and >> subsystems that are actually doing the work (like yourself). >> >> I'm not trying to override active maintainers, I'm trying to help them >> make decisions. > >Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900 >stuff from autosel work? Curiousity got me, and I had to see what these subsystems do as far as stable commits: $ git log --oneline --grep 'stable@vger' --since="01-01-2016" kernel/power drivers/leds drivers/media/i2c/et8ek8 drivers/media/i2c/ad5820.c arch/x86/kernel/acpi/ | wc -l 7 Which got me a bit surprised: maybe indeed leds is mostly fine, but hibernation is definitely tricky, I've been stung by it a few times... So why not pick something an actual user reported, and see how that was dealt with? Googling first showed this: https://bugzilla.kernel.org/show_bug.cgi?id=97201 Which was fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdbc98abb3aa323f6323b11db39c740e6f8fc5b1 But that's not in any -stable tree. Hmm.. ok.. Next one on google was: https://bugzilla.kernel.org/show_bug.cgi?id=117971 Which, in turn, was fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b3f249c94ce1f46bacd9814385b0ee2d1ae52f3 Oh look at that, it's not in -stable either... So seeing how you have concerns with my selection of -stable commits, maybe you could explain to me why these commits didn't end up in -stable?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 11:47:24AM +0200, Pavel Machek wrote: >On Mon 2018-04-16 21:18:47, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: >> >On Mon, 16 Apr 2018, Sasha Levin wrote: >> > >> >> So I think that Linus's claim that users come first applies here as >> >> well. If there's a user that cares about a particular feature being >> >> broken, then we go ahead and fix his bug rather then ignoring him. >> > >> >So one extreme is fixing -stable *iff* users actually do report an issue. >> > >> >The other extreme is backporting everything that potentially looks like a >> >potential fix of "something" (according to some arbitrary metric), >> >pro-actively. >> > >> >The former voilates the "users first" rule, the latter has a very, very >> >high risk of regressions. >> > >> >So this whole debate is about finding a compromise. >> > >> >My gut feeling always was that the statement in >> > >> >Documentation/process/stable-kernel-rules.rst >> > >> >is very reasonable, but making the process way more "aggresive" when >> >backporting patches is breaking much of its original spirit for me. >> >> I agree that as an enterprise distro taking everything from -stable >> isn't the best idea. Ideally you'd want to be close to the first > >Original purpose of -stable was "to be common base of enterprise >distros" and our documentation still says it is. I guess that the world changes? At this point calling enterprise distros a niche wouldn't be too far from the truth. Furthermore, some enterprise distros (as stated earlier in this thread) don't even follow -stable anymore and cherry pick their own commits. So no, the main driving force behind -stable is not traditional enterprise distributions. >> I think that we can agree that it's impossible to expect every single >> Linux user to go on LKML and complain about a bug he encountered, so the >> rule quickly becomes "It must fix a real bug that can bother >> people". > >I think you are playing dangerous word games. > >> My "aggressiveness" comes from the whole "bother" part: it doesn't have >> to be critical, it doesn't have to cause data corruption, it doesn't >> have to be a security issue. It's enough that the bug actually affects a >> user in a way he didn't expect it to (if a user doesn't have >> expectations, it would fall under the "This could be a problem..." >> exception. > >And it seems documentation says you should be less aggressive and >world tells you they expect to be less aggressive. So maybe that's >what you should do? Who is this "world" you're referring to?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 11:47:24AM +0200, Pavel Machek wrote: >On Mon 2018-04-16 21:18:47, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: >> >On Mon, 16 Apr 2018, Sasha Levin wrote: >> > >> >> So I think that Linus's claim that users come first applies here as >> >> well. If there's a user that cares about a particular feature being >> >> broken, then we go ahead and fix his bug rather then ignoring him. >> > >> >So one extreme is fixing -stable *iff* users actually do report an issue. >> > >> >The other extreme is backporting everything that potentially looks like a >> >potential fix of "something" (according to some arbitrary metric), >> >pro-actively. >> > >> >The former voilates the "users first" rule, the latter has a very, very >> >high risk of regressions. >> > >> >So this whole debate is about finding a compromise. >> > >> >My gut feeling always was that the statement in >> > >> >Documentation/process/stable-kernel-rules.rst >> > >> >is very reasonable, but making the process way more "aggresive" when >> >backporting patches is breaking much of its original spirit for me. >> >> I agree that as an enterprise distro taking everything from -stable >> isn't the best idea. Ideally you'd want to be close to the first > >Original purpose of -stable was "to be common base of enterprise >distros" and our documentation still says it is. I guess that the world changes? At this point calling enterprise distros a niche wouldn't be too far from the truth. Furthermore, some enterprise distros (as stated earlier in this thread) don't even follow -stable anymore and cherry pick their own commits. So no, the main driving force behind -stable is not traditional enterprise distributions. >> I think that we can agree that it's impossible to expect every single >> Linux user to go on LKML and complain about a bug he encountered, so the >> rule quickly becomes "It must fix a real bug that can bother >> people". > >I think you are playing dangerous word games. > >> My "aggressiveness" comes from the whole "bother" part: it doesn't have >> to be critical, it doesn't have to cause data corruption, it doesn't >> have to be a security issue. It's enough that the bug actually affects a >> user in a way he didn't expect it to (if a user doesn't have >> expectations, it would fall under the "This could be a problem..." >> exception. > >And it seems documentation says you should be less aggressive and >world tells you they expect to be less aggressive. So maybe that's >what you should do? Who is this "world" you're referring to?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 12:04:41PM +0200, Pavel Machek wrote: >On Tue 2018-04-17 16:06:29, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: >> >On Tue, 17 Apr 2018, Sasha Levin wrote: >> > >> >> How do I get the XFS folks to send their stuff to -stable? (we have >> >> quite a few customers who use XFS) >> > >> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream >> >maintainers to deal with stable, we just have to accept that and find an >> >answer to that. >> >> This is exactly what I'm doing. Many subsystems don't have enough >> manpower to deal with -stable, so I'm trying to help. > >...and the torrent of spams from the AUTOSEL subsystem actually makes >that worse. > >And when you are told particular fix to LEDs is not that important >after all, you start arguing about nuclear power plants (without >really knowing how critical subsystems work). Obviously your knowledge far surpasses mine. >If you want cooperation with maintainers to work, the rules need to be >clear, first. They are documented, so follow them. If you think rules >are wrong, lets talk about changing the rules; but arguing "every bug >is important because someone may be hitting it" is not ok. I'm sorry but you're just unfamiliar with the process. I'd point out that all my AUTOSEL commits go through Greg, who wrote the rules, and accepts my patches. The rules are there as a guideline to allow us to not take certain patches, they're not there as a strict set of rules we must follow at all times.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, May 03, 2018 at 12:04:41PM +0200, Pavel Machek wrote: >On Tue 2018-04-17 16:06:29, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: >> >On Tue, 17 Apr 2018, Sasha Levin wrote: >> > >> >> How do I get the XFS folks to send their stuff to -stable? (we have >> >> quite a few customers who use XFS) >> > >> >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream >> >maintainers to deal with stable, we just have to accept that and find an >> >answer to that. >> >> This is exactly what I'm doing. Many subsystems don't have enough >> manpower to deal with -stable, so I'm trying to help. > >...and the torrent of spams from the AUTOSEL subsystem actually makes >that worse. > >And when you are told particular fix to LEDs is not that important >after all, you start arguing about nuclear power plants (without >really knowing how critical subsystems work). Obviously your knowledge far surpasses mine. >If you want cooperation with maintainers to work, the rules need to be >clear, first. They are documented, so follow them. If you think rules >are wrong, lets talk about changing the rules; but arguing "every bug >is important because someone may be hitting it" is not ok. I'm sorry but you're just unfamiliar with the process. I'd point out that all my AUTOSEL commits go through Greg, who wrote the rules, and accepts my patches. The rules are there as a guideline to allow us to not take certain patches, they're not there as a strict set of rules we must follow at all times.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 16:06:29, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: > >On Tue, 17 Apr 2018, Sasha Levin wrote: > > > >> How do I get the XFS folks to send their stuff to -stable? (we have > >> quite a few customers who use XFS) > > > >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream > >maintainers to deal with stable, we just have to accept that and find an > >answer to that. > > This is exactly what I'm doing. Many subsystems don't have enough > manpower to deal with -stable, so I'm trying to help. ...and the torrent of spams from the AUTOSEL subsystem actually makes that worse. And when you are told particular fix to LEDs is not that important after all, you start arguing about nuclear power plants (without really knowing how critical subsystems work). If you want cooperation with maintainers to work, the rules need to be clear, first. They are documented, so follow them. If you think rules are wrong, lets talk about changing the rules; but arguing "every bug is important because someone may be hitting it" is not ok. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 16:06:29, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: > >On Tue, 17 Apr 2018, Sasha Levin wrote: > > > >> How do I get the XFS folks to send their stuff to -stable? (we have > >> quite a few customers who use XFS) > > > >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream > >maintainers to deal with stable, we just have to accept that and find an > >answer to that. > > This is exactly what I'm doing. Many subsystems don't have enough > manpower to deal with -stable, so I'm trying to help. ...and the torrent of spams from the AUTOSEL subsystem actually makes that worse. And when you are told particular fix to LEDs is not that important after all, you start arguing about nuclear power plants (without really knowing how critical subsystems work). If you want cooperation with maintainers to work, the rules need to be clear, first. They are documented, so follow them. If you think rules are wrong, lets talk about changing the rules; but arguing "every bug is important because someone may be hitting it" is not ok. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon 2018-04-16 21:18:47, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: > >On Mon, 16 Apr 2018, Sasha Levin wrote: > > > >> So I think that Linus's claim that users come first applies here as > >> well. If there's a user that cares about a particular feature being > >> broken, then we go ahead and fix his bug rather then ignoring him. > > > >So one extreme is fixing -stable *iff* users actually do report an issue. > > > >The other extreme is backporting everything that potentially looks like a > >potential fix of "something" (according to some arbitrary metric), > >pro-actively. > > > >The former voilates the "users first" rule, the latter has a very, very > >high risk of regressions. > > > >So this whole debate is about finding a compromise. > > > >My gut feeling always was that the statement in > > > > Documentation/process/stable-kernel-rules.rst > > > >is very reasonable, but making the process way more "aggresive" when > >backporting patches is breaking much of its original spirit for me. > > I agree that as an enterprise distro taking everything from -stable > isn't the best idea. Ideally you'd want to be close to the first Original purpose of -stable was "to be common base of enterprise distros" and our documentation still says it is. > I think that we can agree that it's impossible to expect every single > Linux user to go on LKML and complain about a bug he encountered, so the > rule quickly becomes "It must fix a real bug that can bother > people". I think you are playing dangerous word games. > My "aggressiveness" comes from the whole "bother" part: it doesn't have > to be critical, it doesn't have to cause data corruption, it doesn't > have to be a security issue. It's enough that the bug actually affects a > user in a way he didn't expect it to (if a user doesn't have > expectations, it would fall under the "This could be a problem..." > exception. And it seems documentation says you should be less aggressive and world tells you they expect to be less aggressive. So maybe that's what you should do? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon 2018-04-16 21:18:47, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: > >On Mon, 16 Apr 2018, Sasha Levin wrote: > > > >> So I think that Linus's claim that users come first applies here as > >> well. If there's a user that cares about a particular feature being > >> broken, then we go ahead and fix his bug rather then ignoring him. > > > >So one extreme is fixing -stable *iff* users actually do report an issue. > > > >The other extreme is backporting everything that potentially looks like a > >potential fix of "something" (according to some arbitrary metric), > >pro-actively. > > > >The former voilates the "users first" rule, the latter has a very, very > >high risk of regressions. > > > >So this whole debate is about finding a compromise. > > > >My gut feeling always was that the statement in > > > > Documentation/process/stable-kernel-rules.rst > > > >is very reasonable, but making the process way more "aggresive" when > >backporting patches is breaking much of its original spirit for me. > > I agree that as an enterprise distro taking everything from -stable > isn't the best idea. Ideally you'd want to be close to the first Original purpose of -stable was "to be common base of enterprise distros" and our documentation still says it is. > I think that we can agree that it's impossible to expect every single > Linux user to go on LKML and complain about a bug he encountered, so the > rule quickly becomes "It must fix a real bug that can bother > people". I think you are playing dangerous word games. > My "aggressiveness" comes from the whole "bother" part: it doesn't have > to be critical, it doesn't have to cause data corruption, it doesn't > have to be a security issue. It's enough that the bug actually affects a > user in a way he didn't expect it to (if a user doesn't have > expectations, it would fall under the "This could be a problem..." > exception. And it seems documentation says you should be less aggressive and world tells you they expect to be less aggressive. So maybe that's what you should do? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 16:19:35, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: > >On Tue 17-04-18 13:31:51, Sasha Levin wrote: > >> We may be able to guesstimate the 'regression chance', but there's no > >> way we can guess the 'annoyance' once. There are so many different use > >> cases that we just can't even guess how many people would get "annoyed" > >> by something. > > > >As a maintainer, I hope I have reasonable idea what are common use cases > >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't > >know all of the use cases so people doing unusual stuff hit more bugs and > >have to report them to get fixes included in -stable. But for me this is a > >preferable tradeoff over the risk of regression so this is the rule I use > >when tagging for stable. Now I'm not a -stable maintainer and I fully agree > >with "those who do the work decide" principle so pick whatever patches you > >think are appropriate, I just wanted explain why I don't think more patches > >in stable are necessarily good. > > The AUTOSEL story is different for subsystems that don't do -stable, and > subsystems that are actually doing the work (like yourself). > > I'm not trying to override active maintainers, I'm trying to help them > make decisions. Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900 stuff from autosel work? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 16:19:35, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: > >On Tue 17-04-18 13:31:51, Sasha Levin wrote: > >> We may be able to guesstimate the 'regression chance', but there's no > >> way we can guess the 'annoyance' once. There are so many different use > >> cases that we just can't even guess how many people would get "annoyed" > >> by something. > > > >As a maintainer, I hope I have reasonable idea what are common use cases > >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't > >know all of the use cases so people doing unusual stuff hit more bugs and > >have to report them to get fixes included in -stable. But for me this is a > >preferable tradeoff over the risk of regression so this is the rule I use > >when tagging for stable. Now I'm not a -stable maintainer and I fully agree > >with "those who do the work decide" principle so pick whatever patches you > >think are appropriate, I just wanted explain why I don't think more patches > >in stable are necessarily good. > > The AUTOSEL story is different for subsystems that don't do -stable, and > subsystems that are actually doing the work (like yourself). > > I'm not trying to override active maintainers, I'm trying to help them > make decisions. Ok, cool. Can you exclude LED subsystem, Hibernation and Nokia N900 stuff from autosel work? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Hi! > >- It must be obviously correct and tested. > > > >If it introduces new bug, it is not correct, and certainly not > >obviously correct. > > As you might have noticed, we don't strictly follow the rules. Yes, I noticed. And what I'm saying is that perhaps you should follow the rules more strictly. > Take a look at the whole PTI story as an example. It's way more than 100 > lines, it's not obviously corrent, it fixed more than 1 thing, and so > on, and yet it went in -stable! > > Would you argue we shouldn't have backported PTI to -stable? Actually, I was surprised with PTI going to stable. That was clearly against the rules. Maybe the security bug was ugly enough to warrant that. But please don't use it as an argument for applying any random patches... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Hi! > >- It must be obviously correct and tested. > > > >If it introduces new bug, it is not correct, and certainly not > >obviously correct. > > As you might have noticed, we don't strictly follow the rules. Yes, I noticed. And what I'm saying is that perhaps you should follow the rules more strictly. > Take a look at the whole PTI story as an example. It's way more than 100 > lines, it's not obviously corrent, it fixed more than 1 thing, and so > on, and yet it went in -stable! > > Would you argue we shouldn't have backported PTI to -stable? Actually, I was surprised with PTI going to stable. That was clearly against the rules. Maybe the security bug was ugly enough to warrant that. But please don't use it as an argument for applying any random patches... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 04:22:22PM +0200, Greg KH wrote: > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > > On Thu 19-04-18 15:59:43, Greg KH wrote: > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > > > Sasha Levinwrote: > > > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug > > > > > > > compatibility": > > > > > > > we want the kernel to behave the same way between mainline and > > > > > > > stable. > > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > > > broken as mainline? > > > > > > > > > > This just means that if there is a fix that went in mainline, and the > > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > > what. We'd rather have the same thing broken between mainline and > > > > > stable. > > > > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug > > > > compatible" > > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > > supported distro. Because end-users dont care about upstream breaking > > > > stuff... its the distro that takes the heat for that... > > > > > > > > Something "already broken" is not a regression... > > > > > > > > As distro maintainer that means one now have to review _every_ patch > > > > that > > > > carries "AUTOSEL", follow all the mail threads that comes up about it, > > > > then > > > > track if it landed in -stable queue, and read every response and > > > > possible > > > > objection to all patches in the -stable queue a second time around... > > > > then > > > > check if it still got included in final stable point relase and then > > > > either > > > > revert them in distro kernel or go track down all the follow-up fixes > > > > needed... > > > > > > > > Just to avoid being "bug compatible with master" > > > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > > has in the past, so you had better also be reviewing all of my normal > > > commits as well :) > > > > > > Anyway, we are trying not to do this, but it does, and will, > > > occasionally happen. > > > > Sure, that's understood. So this was just misunderstanding. Sasha's > > original comment really sounded like "bug compatibility" with current > > master is desirable and that made me go "Are you serious?" as well... > > As I said before in this thread, yes, sometimes I do this on purpose. > > As an specific example, see a recent bluetooth patch that caused a > regression on some chromebook devices. The chromeos developers > rightfully complainied, and I left the commit in there to provide the > needed "leverage" on the upstream developers to fix this properly. > Otherwise if I had reverted the stable patch, when people move to a > newer kernel version, things break, and no one remembers why. > > I also wrote a long response as to _why_ I do this, and even though it > does happen, why it still is worth taking the stable updates. Please > see the archives for the full details. I don't want to duplicate this > again here. And to be more specific, let's always take this as a case-by-case basis. The last time this happened was the bluetooth bug and it was a fix for a reported problem, but then the fix caused a regression so upstream reverted it and I reverted it in the stable trees. No matter what I chose to do, someone would be upset so I followed what upstream did. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 04:22:22PM +0200, Greg KH wrote: > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > > On Thu 19-04-18 15:59:43, Greg KH wrote: > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > > > Sasha Levin wrote: > > > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug > > > > > > > compatibility": > > > > > > > we want the kernel to behave the same way between mainline and > > > > > > > stable. > > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > > > broken as mainline? > > > > > > > > > > This just means that if there is a fix that went in mainline, and the > > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > > what. We'd rather have the same thing broken between mainline and > > > > > stable. > > > > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug > > > > compatible" > > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > > supported distro. Because end-users dont care about upstream breaking > > > > stuff... its the distro that takes the heat for that... > > > > > > > > Something "already broken" is not a regression... > > > > > > > > As distro maintainer that means one now have to review _every_ patch > > > > that > > > > carries "AUTOSEL", follow all the mail threads that comes up about it, > > > > then > > > > track if it landed in -stable queue, and read every response and > > > > possible > > > > objection to all patches in the -stable queue a second time around... > > > > then > > > > check if it still got included in final stable point relase and then > > > > either > > > > revert them in distro kernel or go track down all the follow-up fixes > > > > needed... > > > > > > > > Just to avoid being "bug compatible with master" > > > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > > has in the past, so you had better also be reviewing all of my normal > > > commits as well :) > > > > > > Anyway, we are trying not to do this, but it does, and will, > > > occasionally happen. > > > > Sure, that's understood. So this was just misunderstanding. Sasha's > > original comment really sounded like "bug compatibility" with current > > master is desirable and that made me go "Are you serious?" as well... > > As I said before in this thread, yes, sometimes I do this on purpose. > > As an specific example, see a recent bluetooth patch that caused a > regression on some chromebook devices. The chromeos developers > rightfully complainied, and I left the commit in there to provide the > needed "leverage" on the upstream developers to fix this properly. > Otherwise if I had reverted the stable patch, when people move to a > newer kernel version, things break, and no one remembers why. > > I also wrote a long response as to _why_ I do this, and even though it > does happen, why it still is worth taking the stable updates. Please > see the archives for the full details. I don't want to duplicate this > again here. And to be more specific, let's always take this as a case-by-case basis. The last time this happened was the bluetooth bug and it was a fix for a reported problem, but then the fix caused a regression so upstream reverted it and I reverted it in the stable trees. No matter what I chose to do, someone would be upset so I followed what upstream did. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 18:57, skrev Greg KH: On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote: Den 19.04.2018 kl. 17:22, skrev Greg KH: On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: On Thu 19-04-18 15:59:43, Greg KH wrote: On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levinwrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Anyway, we are trying not to do this, but it does, and will, occasionally happen. Sure, that's understood. So this was just misunderstanding. Sasha's original comment really sounded like "bug compatibility" with current master is desirable and that made me go "Are you serious?" as well... As I said before in this thread, yes, sometimes I do this on purpose. And I guess this is the one that gets people the feeling that "stable is not as stable as it used to be" ... It's always been this way, it's just that no one noticed :) :) As an specific example, see a recent bluetooth patch that caused a regression on some chromebook devices. The chromeos developers rightfully complainied, and I left the commit in there to provide the needed "leverage" on the upstream developers to fix this properly. Otherwise if I had reverted the stable patch, when people move to a newer kernel version, things break, and no one remembers why. I do understand what you are trying to do... But from my distro hat on I have to treat things differently (and I dont think I'm alone doing it this way...) Known breakages gets reverted even before it hits QA, so endusers wont see the issue at all... So the only ones to see the issue are those building with latest upstream without own patches applied... I also wrote a long response as to _why_ I do this, and even though it does happen, why it still is worth taking the stable updates. Please see the archives for the full details. I don't want to duplicate this again here. And we do use latest stable (with some delay as I dont want to overload QA & endusers with a new kernel every week :)) You need to automate your QA :) Yeah, some can be automated... but that means having a lot of different hw to test on... emulators/vms can only test so much... users part of QA test on a variety of hw with various installs/setups that exposes fun things with some hw :) We just revert known broken (or add known fixes) before releasing them to our users That's great, and is what you should be doing, nothing wrong there. thanks, greg k-h -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 18:57, skrev Greg KH: On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote: Den 19.04.2018 kl. 17:22, skrev Greg KH: On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: On Thu 19-04-18 15:59:43, Greg KH wrote: On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levin wrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Anyway, we are trying not to do this, but it does, and will, occasionally happen. Sure, that's understood. So this was just misunderstanding. Sasha's original comment really sounded like "bug compatibility" with current master is desirable and that made me go "Are you serious?" as well... As I said before in this thread, yes, sometimes I do this on purpose. And I guess this is the one that gets people the feeling that "stable is not as stable as it used to be" ... It's always been this way, it's just that no one noticed :) :) As an specific example, see a recent bluetooth patch that caused a regression on some chromebook devices. The chromeos developers rightfully complainied, and I left the commit in there to provide the needed "leverage" on the upstream developers to fix this properly. Otherwise if I had reverted the stable patch, when people move to a newer kernel version, things break, and no one remembers why. I do understand what you are trying to do... But from my distro hat on I have to treat things differently (and I dont think I'm alone doing it this way...) Known breakages gets reverted even before it hits QA, so endusers wont see the issue at all... So the only ones to see the issue are those building with latest upstream without own patches applied... I also wrote a long response as to _why_ I do this, and even though it does happen, why it still is worth taking the stable updates. Please see the archives for the full details. I don't want to duplicate this again here. And we do use latest stable (with some delay as I dont want to overload QA & endusers with a new kernel every week :)) You need to automate your QA :) Yeah, some can be automated... but that means having a lot of different hw to test on... emulators/vms can only test so much... users part of QA test on a variety of hw with various installs/setups that exposes fun things with some hw :) We just revert known broken (or add known fixes) before releasing them to our users That's great, and is what you should be doing, nothing wrong there. thanks, greg k-h -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 18:09, skrev Sasha Levin: On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote: Den 19.04.2018 kl. 16:59, skrev Greg KH: Anyway, we are trying not to do this, but it does, and will, occasionally happen. Look, we just did that for one platform for 4.9.94! And the key to all of this is good testing, which we are now doing, and hopefully you are also doing as well. Yeah, but having to test stuff with known breakages is no fun, so we try to avoid that Known breakages are easier to deal with than unknown ones :) well, if a system worked before the update, but not after... Guess wich one we want... I think that that "bug compatability" is basically a policy on *which* regressions you'll see vs *if* you'll see a regression. No. Intentionally breaking known working code in a stable branch is never ok. As I said before... something that never worked is not a regression, but breaking a previously working setup is... That goes for security fixes too... there is not much point in a security fix, if it basically turns into a local DOS when the system stops working... People will just boot older code and there you have it... We'll never pull in a commit that introduces a bug but doesn't fix another one, right? So if you have to deal with a regression anyway, might as well deal with a regression that is also seen on mainline, so that when you upgrade your stable kernel you'll keep the same set of regressions to deal with. Here I actually like the comment Linus posted about API breakage earlier in this thread... If you break user workflows, NOTHING ELSE MATTERS. Even security is secondary to "people don't use the end result, because it doesn't work for them any more". _This_ same statement should be aknowledged / enforced in stable trees too IMHO... Because this is what will happend... simple logic... if it does not work, the enduser will boot an earlier kernel... missing "all the good fixes" (including security ones) just because one fix is bad. For example in this AUTOSEL round there is 161 fixes of wich the enduser never gets the 160 "supposedly good ones" when one is "bad"... How is that a "good thing" ? And trying to tell those that get hit "this will force upstream to fix it faster, so you get a working setup in some days/weeks/months..." is not going to work... Heh, This even reminds me that this is just as annoying as when MS started to "bundle monthly security updates" and you get 95% installed just to realize that the last 5% does not work (or install at all) and you have to rollback to something working thus missing the needed security fixes... Same flawed logic... Thnakfully we as distro maintainers can avoid some of the breakage for our enduses... -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 18:09, skrev Sasha Levin: On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote: Den 19.04.2018 kl. 16:59, skrev Greg KH: Anyway, we are trying not to do this, but it does, and will, occasionally happen. Look, we just did that for one platform for 4.9.94! And the key to all of this is good testing, which we are now doing, and hopefully you are also doing as well. Yeah, but having to test stuff with known breakages is no fun, so we try to avoid that Known breakages are easier to deal with than unknown ones :) well, if a system worked before the update, but not after... Guess wich one we want... I think that that "bug compatability" is basically a policy on *which* regressions you'll see vs *if* you'll see a regression. No. Intentionally breaking known working code in a stable branch is never ok. As I said before... something that never worked is not a regression, but breaking a previously working setup is... That goes for security fixes too... there is not much point in a security fix, if it basically turns into a local DOS when the system stops working... People will just boot older code and there you have it... We'll never pull in a commit that introduces a bug but doesn't fix another one, right? So if you have to deal with a regression anyway, might as well deal with a regression that is also seen on mainline, so that when you upgrade your stable kernel you'll keep the same set of regressions to deal with. Here I actually like the comment Linus posted about API breakage earlier in this thread... If you break user workflows, NOTHING ELSE MATTERS. Even security is secondary to "people don't use the end result, because it doesn't work for them any more". _This_ same statement should be aknowledged / enforced in stable trees too IMHO... Because this is what will happend... simple logic... if it does not work, the enduser will boot an earlier kernel... missing "all the good fixes" (including security ones) just because one fix is bad. For example in this AUTOSEL round there is 161 fixes of wich the enduser never gets the 160 "supposedly good ones" when one is "bad"... How is that a "good thing" ? And trying to tell those that get hit "this will force upstream to fix it faster, so you get a working setup in some days/weeks/months..." is not going to work... Heh, This even reminds me that this is just as annoying as when MS started to "bundle monthly security updates" and you get 95% installed just to realize that the last 5% does not work (or install at all) and you have to rollback to something working thus missing the needed security fixes... Same flawed logic... Thnakfully we as distro maintainers can avoid some of the breakage for our enduses... -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote: > Den 19.04.2018 kl. 17:22, skrev Greg KH: > > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > > > On Thu 19-04-18 15:59:43, Greg KH wrote: > > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > > > > Sasha Levinwrote: > > > > > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug > > > > > > > > compatibility": > > > > > > > > we want the kernel to behave the same way between mainline and > > > > > > > > stable. > > > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is > > > > > > > as > > > > > > > broken as mainline? > > > > > > > > > > > > This just means that if there is a fix that went in mainline, and > > > > > > the > > > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > > > what. We'd rather have the same thing broken between mainline and > > > > > > stable. > > > > > > > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug > > > > > compatible" > > > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > > > supported distro. Because end-users dont care about upstream breaking > > > > > stuff... its the distro that takes the heat for that... > > > > > > > > > > Something "already broken" is not a regression... > > > > > > > > > > As distro maintainer that means one now have to review _every_ patch > > > > > that > > > > > carries "AUTOSEL", follow all the mail threads that comes up about > > > > > it, then > > > > > track if it landed in -stable queue, and read every response and > > > > > possible > > > > > objection to all patches in the -stable queue a second time around... > > > > > then > > > > > check if it still got included in final stable point relase and then > > > > > either > > > > > revert them in distro kernel or go track down all the follow-up fixes > > > > > needed... > > > > > > > > > > Just to avoid being "bug compatible with master" > > > > > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > > > has in the past, so you had better also be reviewing all of my normal > > > > commits as well :) > > > > > > > > Anyway, we are trying not to do this, but it does, and will, > > > > occasionally happen. > > > > > > Sure, that's understood. So this was just misunderstanding. Sasha's > > > original comment really sounded like "bug compatibility" with current > > > master is desirable and that made me go "Are you serious?" as well... > > > > As I said before in this thread, yes, sometimes I do this on purpose. > > > > And I guess this is the one that gets people the feeling that > "stable is not as stable as it used to be" ... It's always been this way, it's just that no one noticed :) > > As an specific example, see a recent bluetooth patch that caused a > > regression on some chromebook devices. The chromeos developers > > rightfully complainied, and I left the commit in there to provide the > > needed "leverage" on the upstream developers to fix this properly. > > Otherwise if I had reverted the stable patch, when people move to a > > newer kernel version, things break, and no one remembers why. > > I do understand what you are trying to do... > > But from my distro hat on I have to treat things differently (and I dont > think I'm alone doing it this way...) > > Known breakages gets reverted even before it hits QA, so endusers wont see > the issue at all... > > So the only ones to see the issue are those building with latest upstream > without own patches applied... > > > > > I also wrote a long response as to _why_ I do this, and even though it > > does happen, why it still is worth taking the stable updates. Please > > see the archives for the full details. I don't want to duplicate this > > again here. > > And we do use latest stable (with some delay as I dont want to overload QA & > endusers with a new kernel every week :)) You need to automate your QA :) > We just revert known broken (or add known fixes) before releasing them to > our users That's great, and is what you should be doing, nothing wrong there. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 06:16:26PM +0300, Thomas Backlund wrote: > Den 19.04.2018 kl. 17:22, skrev Greg KH: > > On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > > > On Thu 19-04-18 15:59:43, Greg KH wrote: > > > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > > > > Sasha Levin wrote: > > > > > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug > > > > > > > > compatibility": > > > > > > > > we want the kernel to behave the same way between mainline and > > > > > > > > stable. > > > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is > > > > > > > as > > > > > > > broken as mainline? > > > > > > > > > > > > This just means that if there is a fix that went in mainline, and > > > > > > the > > > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > > > what. We'd rather have the same thing broken between mainline and > > > > > > stable. > > > > > > > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug > > > > > compatible" > > > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > > > supported distro. Because end-users dont care about upstream breaking > > > > > stuff... its the distro that takes the heat for that... > > > > > > > > > > Something "already broken" is not a regression... > > > > > > > > > > As distro maintainer that means one now have to review _every_ patch > > > > > that > > > > > carries "AUTOSEL", follow all the mail threads that comes up about > > > > > it, then > > > > > track if it landed in -stable queue, and read every response and > > > > > possible > > > > > objection to all patches in the -stable queue a second time around... > > > > > then > > > > > check if it still got included in final stable point relase and then > > > > > either > > > > > revert them in distro kernel or go track down all the follow-up fixes > > > > > needed... > > > > > > > > > > Just to avoid being "bug compatible with master" > > > > > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > > > has in the past, so you had better also be reviewing all of my normal > > > > commits as well :) > > > > > > > > Anyway, we are trying not to do this, but it does, and will, > > > > occasionally happen. > > > > > > Sure, that's understood. So this was just misunderstanding. Sasha's > > > original comment really sounded like "bug compatibility" with current > > > master is desirable and that made me go "Are you serious?" as well... > > > > As I said before in this thread, yes, sometimes I do this on purpose. > > > > And I guess this is the one that gets people the feeling that > "stable is not as stable as it used to be" ... It's always been this way, it's just that no one noticed :) > > As an specific example, see a recent bluetooth patch that caused a > > regression on some chromebook devices. The chromeos developers > > rightfully complainied, and I left the commit in there to provide the > > needed "leverage" on the upstream developers to fix this properly. > > Otherwise if I had reverted the stable patch, when people move to a > > newer kernel version, things break, and no one remembers why. > > I do understand what you are trying to do... > > But from my distro hat on I have to treat things differently (and I dont > think I'm alone doing it this way...) > > Known breakages gets reverted even before it hits QA, so endusers wont see > the issue at all... > > So the only ones to see the issue are those building with latest upstream > without own patches applied... > > > > > I also wrote a long response as to _why_ I do this, and even though it > > does happen, why it still is worth taking the stable updates. Please > > see the archives for the full details. I don't want to duplicate this > > again here. > > And we do use latest stable (with some delay as I dont want to overload QA & > endusers with a new kernel every week :)) You need to automate your QA :) > We just revert known broken (or add known fixes) before releasing them to > our users That's great, and is what you should be doing, nothing wrong there. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 17:22, skrev Greg KH: On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: On Thu 19-04-18 15:59:43, Greg KH wrote: On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levinwrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Anyway, we are trying not to do this, but it does, and will, occasionally happen. Sure, that's understood. So this was just misunderstanding. Sasha's original comment really sounded like "bug compatibility" with current master is desirable and that made me go "Are you serious?" as well... As I said before in this thread, yes, sometimes I do this on purpose. And I guess this is the one that gets people the feeling that "stable is not as stable as it used to be" ... As an specific example, see a recent bluetooth patch that caused a regression on some chromebook devices. The chromeos developers rightfully complainied, and I left the commit in there to provide the needed "leverage" on the upstream developers to fix this properly. Otherwise if I had reverted the stable patch, when people move to a newer kernel version, things break, and no one remembers why. I do understand what you are trying to do... But from my distro hat on I have to treat things differently (and I dont think I'm alone doing it this way...) Known breakages gets reverted even before it hits QA, so endusers wont see the issue at all... So the only ones to see the issue are those building with latest upstream without own patches applied... I also wrote a long response as to _why_ I do this, and even though it does happen, why it still is worth taking the stable updates. Please see the archives for the full details. I don't want to duplicate this again here. And we do use latest stable (with some delay as I dont want to overload QA & endusers with a new kernel every week :)) We just revert known broken (or add known fixes) before releasing them to our users -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 17:22, skrev Greg KH: On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: On Thu 19-04-18 15:59:43, Greg KH wrote: On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levin wrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Anyway, we are trying not to do this, but it does, and will, occasionally happen. Sure, that's understood. So this was just misunderstanding. Sasha's original comment really sounded like "bug compatibility" with current master is desirable and that made me go "Are you serious?" as well... As I said before in this thread, yes, sometimes I do this on purpose. And I guess this is the one that gets people the feeling that "stable is not as stable as it used to be" ... As an specific example, see a recent bluetooth patch that caused a regression on some chromebook devices. The chromeos developers rightfully complainied, and I left the commit in there to provide the needed "leverage" on the upstream developers to fix this properly. Otherwise if I had reverted the stable patch, when people move to a newer kernel version, things break, and no one remembers why. I do understand what you are trying to do... But from my distro hat on I have to treat things differently (and I dont think I'm alone doing it this way...) Known breakages gets reverted even before it hits QA, so endusers wont see the issue at all... So the only ones to see the issue are those building with latest upstream without own patches applied... I also wrote a long response as to _why_ I do this, and even though it does happen, why it still is worth taking the stable updates. Please see the archives for the full details. I don't want to duplicate this again here. And we do use latest stable (with some delay as I dont want to overload QA & endusers with a new kernel every week :)) We just revert known broken (or add known fixes) before releasing them to our users -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote: >Den 19.04.2018 kl. 16:59, skrev Greg KH: >>Anyway, we are trying not to do this, but it does, and will, >>occasionally happen. Look, we just did that for one platform for >>4.9.94! And the key to all of this is good testing, which we are now >>doing, and hopefully you are also doing as well. > >Yeah, but having to test stuff with known breakages is no fun, so we >try to avoid that Known breakages are easier to deal with than unknown ones :) I think that that "bug compatability" is basically a policy on *which* regressions you'll see vs *if* you'll see a regression. We'll never pull in a commit that introduces a bug but doesn't fix another one, right? So if you have to deal with a regression anyway, might as well deal with a regression that is also seen on mainline, so that when you upgrade your stable kernel you'll keep the same set of regressions to deal with.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 06:04:26PM +0300, Thomas Backlund wrote: >Den 19.04.2018 kl. 16:59, skrev Greg KH: >>Anyway, we are trying not to do this, but it does, and will, >>occasionally happen. Look, we just did that for one platform for >>4.9.94! And the key to all of this is good testing, which we are now >>doing, and hopefully you are also doing as well. > >Yeah, but having to test stuff with known breakages is no fun, so we >try to avoid that Known breakages are easier to deal with than unknown ones :) I think that that "bug compatability" is basically a policy on *which* regressions you'll see vs *if* you'll see a regression. We'll never pull in a commit that introduces a bug but doesn't fix another one, right? So if you have to deal with a regression anyway, might as well deal with a regression that is also seen on mainline, so that when you upgrade your stable kernel you'll keep the same set of regressions to deal with.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 16:59, skrev Greg KH: On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levinwrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Yeah, I do... and same goes there ... if there is a known issue, then same procedure... Either revert, or try to track down fixes... Anyway, we are trying not to do this, but it does, and will, occasionally happen. Look, we just did that for one platform for 4.9.94! And the key to all of this is good testing, which we are now doing, and hopefully you are also doing as well. Yeah, but having to test stuff with known breakages is no fun, so we try to avoid that -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 19.04.2018 kl. 16:59, skrev Greg KH: On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levin wrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Yeah, I do... and same goes there ... if there is a known issue, then same procedure... Either revert, or try to track down fixes... Anyway, we are trying not to do this, but it does, and will, occasionally happen. Look, we just did that for one platform for 4.9.94! And the key to all of this is good testing, which we are now doing, and hopefully you are also doing as well. Yeah, but having to test stuff with known breakages is no fun, so we try to avoid that -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > On Thu 19-04-18 15:59:43, Greg KH wrote: > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > > Sasha Levinwrote: > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug > > > > > > compatibility": > > > > > > we want the kernel to behave the same way between mainline and > > > > > > stable. > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > > broken as mainline? > > > > > > > > This just means that if there is a fix that went in mainline, and the > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > what. We'd rather have the same thing broken between mainline and > > > > stable. > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug > > > compatible" > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > supported distro. Because end-users dont care about upstream breaking > > > stuff... its the distro that takes the heat for that... > > > > > > Something "already broken" is not a regression... > > > > > > As distro maintainer that means one now have to review _every_ patch that > > > carries "AUTOSEL", follow all the mail threads that comes up about it, > > > then > > > track if it landed in -stable queue, and read every response and possible > > > objection to all patches in the -stable queue a second time around... then > > > check if it still got included in final stable point relase and then > > > either > > > revert them in distro kernel or go track down all the follow-up fixes > > > needed... > > > > > > Just to avoid being "bug compatible with master" > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > has in the past, so you had better also be reviewing all of my normal > > commits as well :) > > > > Anyway, we are trying not to do this, but it does, and will, > > occasionally happen. > > Sure, that's understood. So this was just misunderstanding. Sasha's > original comment really sounded like "bug compatibility" with current > master is desirable and that made me go "Are you serious?" as well... As I said before in this thread, yes, sometimes I do this on purpose. As an specific example, see a recent bluetooth patch that caused a regression on some chromebook devices. The chromeos developers rightfully complainied, and I left the commit in there to provide the needed "leverage" on the upstream developers to fix this properly. Otherwise if I had reverted the stable patch, when people move to a newer kernel version, things break, and no one remembers why. I also wrote a long response as to _why_ I do this, and even though it does happen, why it still is worth taking the stable updates. Please see the archives for the full details. I don't want to duplicate this again here. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 04:05:45PM +0200, Jan Kara wrote: > On Thu 19-04-18 15:59:43, Greg KH wrote: > > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > > Sasha Levin wrote: > > > > > > > > > > > One of the things Greg is pushing strongly for is "bug > > > > > > compatibility": > > > > > > we want the kernel to behave the same way between mainline and > > > > > > stable. > > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > > broken as mainline? > > > > > > > > This just means that if there is a fix that went in mainline, and the > > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > > what. We'd rather have the same thing broken between mainline and > > > > stable. > > > > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug > > > compatible" > > > _is_ a _regression_ you _really_ _dont_ want in a stable > > > supported distro. Because end-users dont care about upstream breaking > > > stuff... its the distro that takes the heat for that... > > > > > > Something "already broken" is not a regression... > > > > > > As distro maintainer that means one now have to review _every_ patch that > > > carries "AUTOSEL", follow all the mail threads that comes up about it, > > > then > > > track if it landed in -stable queue, and read every response and possible > > > objection to all patches in the -stable queue a second time around... then > > > check if it still got included in final stable point relase and then > > > either > > > revert them in distro kernel or go track down all the follow-up fixes > > > needed... > > > > > > Just to avoid being "bug compatible with master" > > > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > > has in the past, so you had better also be reviewing all of my normal > > commits as well :) > > > > Anyway, we are trying not to do this, but it does, and will, > > occasionally happen. > > Sure, that's understood. So this was just misunderstanding. Sasha's > original comment really sounded like "bug compatibility" with current > master is desirable and that made me go "Are you serious?" as well... As I said before in this thread, yes, sometimes I do this on purpose. As an specific example, see a recent bluetooth patch that caused a regression on some chromebook devices. The chromeos developers rightfully complainied, and I left the commit in there to provide the needed "leverage" on the upstream developers to fix this properly. Otherwise if I had reverted the stable patch, when people move to a newer kernel version, things break, and no one remembers why. I also wrote a long response as to _why_ I do this, and even though it does happen, why it still is worth taking the stable updates. Please see the archives for the full details. I don't want to duplicate this again here. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu 19-04-18 15:59:43, Greg KH wrote: > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > Sasha Levinwrote: > > > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > > we want the kernel to behave the same way between mainline and stable. > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > broken as mainline? > > > > > > This just means that if there is a fix that went in mainline, and the > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > what. We'd rather have the same thing broken between mainline and > > > stable. > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > > _is_ a _regression_ you _really_ _dont_ want in a stable > > supported distro. Because end-users dont care about upstream breaking > > stuff... its the distro that takes the heat for that... > > > > Something "already broken" is not a regression... > > > > As distro maintainer that means one now have to review _every_ patch that > > carries "AUTOSEL", follow all the mail threads that comes up about it, then > > track if it landed in -stable queue, and read every response and possible > > objection to all patches in the -stable queue a second time around... then > > check if it still got included in final stable point relase and then either > > revert them in distro kernel or go track down all the follow-up fixes > > needed... > > > > Just to avoid being "bug compatible with master" > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > has in the past, so you had better also be reviewing all of my normal > commits as well :) > > Anyway, we are trying not to do this, but it does, and will, > occasionally happen. Sure, that's understood. So this was just misunderstanding. Sasha's original comment really sounded like "bug compatibility" with current master is desirable and that made me go "Are you serious?" as well... Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu 19-04-18 15:59:43, Greg KH wrote: > On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > > On Mon, 16 Apr 2018 16:02:03 + > > > > Sasha Levin wrote: > > > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > > we want the kernel to behave the same way between mainline and stable. > > > > > If the code is broken, it should be broken in the same way. > > > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > > broken as mainline? > > > > > > This just means that if there is a fix that went in mainline, and the > > > fix is broken somehow, we'd rather take the broken fix than not. > > > > > > In this scenario, *something* will be broken, it's just a matter of > > > what. We'd rather have the same thing broken between mainline and > > > stable. > > > > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > > _is_ a _regression_ you _really_ _dont_ want in a stable > > supported distro. Because end-users dont care about upstream breaking > > stuff... its the distro that takes the heat for that... > > > > Something "already broken" is not a regression... > > > > As distro maintainer that means one now have to review _every_ patch that > > carries "AUTOSEL", follow all the mail threads that comes up about it, then > > track if it landed in -stable queue, and read every response and possible > > objection to all patches in the -stable queue a second time around... then > > check if it still got included in final stable point relase and then either > > revert them in distro kernel or go track down all the follow-up fixes > > needed... > > > > Just to avoid being "bug compatible with master" > > I've done this "bug compatible" "breakage" more than the AUTOSEL stuff > has in the past, so you had better also be reviewing all of my normal > commits as well :) > > Anyway, we are trying not to do this, but it does, and will, > occasionally happen. Sure, that's understood. So this was just misunderstanding. Sasha's original comment really sounded like "bug compatibility" with current master is desirable and that made me go "Are you serious?" as well... Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > On Mon, 16 Apr 2018 16:02:03 + > > > Sasha Levinwrote: > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > we want the kernel to behave the same way between mainline and stable. > > > > If the code is broken, it should be broken in the same way. > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > broken as mainline? > > > > This just means that if there is a fix that went in mainline, and the > > fix is broken somehow, we'd rather take the broken fix than not. > > > > In this scenario, *something* will be broken, it's just a matter of > > what. We'd rather have the same thing broken between mainline and > > stable. > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > _is_ a _regression_ you _really_ _dont_ want in a stable > supported distro. Because end-users dont care about upstream breaking > stuff... its the distro that takes the heat for that... > > Something "already broken" is not a regression... > > As distro maintainer that means one now have to review _every_ patch that > carries "AUTOSEL", follow all the mail threads that comes up about it, then > track if it landed in -stable queue, and read every response and possible > objection to all patches in the -stable queue a second time around... then > check if it still got included in final stable point relase and then either > revert them in distro kernel or go track down all the follow-up fixes > needed... > > Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Anyway, we are trying not to do this, but it does, and will, occasionally happen. Look, we just did that for one platform for 4.9.94! And the key to all of this is good testing, which we are now doing, and hopefully you are also doing as well. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Thu, Apr 19, 2018 at 02:41:33PM +0300, Thomas Backlund wrote: > Den 16-04-2018 kl. 19:19, skrev Sasha Levin: > > On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: > > > On Mon, 16 Apr 2018 16:02:03 + > > > Sasha Levin wrote: > > > > > > > One of the things Greg is pushing strongly for is "bug compatibility": > > > > we want the kernel to behave the same way between mainline and stable. > > > > If the code is broken, it should be broken in the same way. > > > > > > Wait! What does that mean? What's the purpose of stable if it is as > > > broken as mainline? > > > > This just means that if there is a fix that went in mainline, and the > > fix is broken somehow, we'd rather take the broken fix than not. > > > > In this scenario, *something* will be broken, it's just a matter of > > what. We'd rather have the same thing broken between mainline and > > stable. > > > > Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" > _is_ a _regression_ you _really_ _dont_ want in a stable > supported distro. Because end-users dont care about upstream breaking > stuff... its the distro that takes the heat for that... > > Something "already broken" is not a regression... > > As distro maintainer that means one now have to review _every_ patch that > carries "AUTOSEL", follow all the mail threads that comes up about it, then > track if it landed in -stable queue, and read every response and possible > objection to all patches in the -stable queue a second time around... then > check if it still got included in final stable point relase and then either > revert them in distro kernel or go track down all the follow-up fixes > needed... > > Just to avoid being "bug compatible with master" I've done this "bug compatible" "breakage" more than the AUTOSEL stuff has in the past, so you had better also be reviewing all of my normal commits as well :) Anyway, we are trying not to do this, but it does, and will, occasionally happen. Look, we just did that for one platform for 4.9.94! And the key to all of this is good testing, which we are now doing, and hopefully you are also doing as well. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levinwrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
Den 16-04-2018 kl. 19:19, skrev Sasha Levin: On Mon, Apr 16, 2018 at 12:12:24PM -0400, Steven Rostedt wrote: On Mon, 16 Apr 2018 16:02:03 + Sasha Levin wrote: One of the things Greg is pushing strongly for is "bug compatibility": we want the kernel to behave the same way between mainline and stable. If the code is broken, it should be broken in the same way. Wait! What does that mean? What's the purpose of stable if it is as broken as mainline? This just means that if there is a fix that went in mainline, and the fix is broken somehow, we'd rather take the broken fix than not. In this scenario, *something* will be broken, it's just a matter of what. We'd rather have the same thing broken between mainline and stable. Yeah, but _intentionally_ breaking existing setups to stay "bug compatible" _is_ a _regression_ you _really_ _dont_ want in a stable supported distro. Because end-users dont care about upstream breaking stuff... its the distro that takes the heat for that... Something "already broken" is not a regression... As distro maintainer that means one now have to review _every_ patch that carries "AUTOSEL", follow all the mail threads that comes up about it, then track if it landed in -stable queue, and read every response and possible objection to all patches in the -stable queue a second time around... then check if it still got included in final stable point relase and then either revert them in distro kernel or go track down all the follow-up fixes needed... Just to avoid being "bug compatible with master" -- Thomas
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 13:45:59, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote: > >Back to the trend. Last week I got autosel mails even for > >patches that were still being discussed, had issues, and > >were far from upstream: > > > > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > > > >It might be a good idea if the mail asked to add Fixes: tag > >or stable mailing list. But the mail suggested to add the > >unfinished patch into stable branch directly (even before > >upstreaming?). > > I obviously didn't suggest that this patch will go in -stable before > it's upstream. > > I've started doing those because some folks can't be arsed to reply to a > review request for a patch that is months old. I found that if I send > these mails while the discussion is still going on I'd get a much better > response rate from people. I see. It makes sense. > If you think any of these patches should go in stable there were two > ways about it: > > - You end up adding the -stable tag yourself, and it would follow the >usual route where Greg picks it up. > - You reply to that mail, and the patch would wait in a list until my >script notices it made it upstream, at which point it would get >queued for stable. It would be great if the options are described in the mail. I wonder if it would make sense to add also a tag that would say that the commit is not suitable for stable. It might help both sides. The maintainers will be able to share their opinion and eventually reduce mails from autosel. You would get feedback that maintainers considered the patch for stable. It might be even useful for teaching the AI. > >Now, there are only hand full of printk patches in each > >release, so it is still doable. I just do not understand > >how other maintainers, from much more busy subsystems, > >could cope with this trend. > > So yes, I'm aware that the volume of patches is huge, but there's not > much I can do about it because it's just a subset of the kernel's patch > volume and since the kernel gets more and more patches each release, the > volume of stable commits is bound to grow as well. Yes, but the grow in the stable is much faster than the grow in maintain at the moment. It might be fine if it was caused just by engaging subsystems that ignored stable so far. But I am not sure if it is the case. Also I am not sure about your plans. Anyway, I am surprised that the patches might go into stable so easily (no response -> accepted). While it is pretty hard to get through the review process for mainline. Of course, many patches go into mainline without review as well. But the difference is that they are pushed by people that are familiar and responsible for the affected area. I could understand the pain. There are surely people that do not care about stable, because it takes time, it is hard to make decisions, flashbacks to the old code are painful, etc. Well, this is the reason why the maintenance support is and should be limited. Anyway, I think that it cannot be done reasonably without maintainers. You should be careful so that even the currently cooperating maintainers will not start considering autosel mails as a spam. (It is not my case. printk is small thing. But I could imagine that it might stop being bearable in bigger subsystems. As is already the case with xfs.) Best Regards, Petr
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 13:45:59, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote: > >Back to the trend. Last week I got autosel mails even for > >patches that were still being discussed, had issues, and > >were far from upstream: > > > > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > > > >It might be a good idea if the mail asked to add Fixes: tag > >or stable mailing list. But the mail suggested to add the > >unfinished patch into stable branch directly (even before > >upstreaming?). > > I obviously didn't suggest that this patch will go in -stable before > it's upstream. > > I've started doing those because some folks can't be arsed to reply to a > review request for a patch that is months old. I found that if I send > these mails while the discussion is still going on I'd get a much better > response rate from people. I see. It makes sense. > If you think any of these patches should go in stable there were two > ways about it: > > - You end up adding the -stable tag yourself, and it would follow the >usual route where Greg picks it up. > - You reply to that mail, and the patch would wait in a list until my >script notices it made it upstream, at which point it would get >queued for stable. It would be great if the options are described in the mail. I wonder if it would make sense to add also a tag that would say that the commit is not suitable for stable. It might help both sides. The maintainers will be able to share their opinion and eventually reduce mails from autosel. You would get feedback that maintainers considered the patch for stable. It might be even useful for teaching the AI. > >Now, there are only hand full of printk patches in each > >release, so it is still doable. I just do not understand > >how other maintainers, from much more busy subsystems, > >could cope with this trend. > > So yes, I'm aware that the volume of patches is huge, but there's not > much I can do about it because it's just a subset of the kernel's patch > volume and since the kernel gets more and more patches each release, the > volume of stable commits is bound to grow as well. Yes, but the grow in the stable is much faster than the grow in maintain at the moment. It might be fine if it was caused just by engaging subsystems that ignored stable so far. But I am not sure if it is the case. Also I am not sure about your plans. Anyway, I am surprised that the patches might go into stable so easily (no response -> accepted). While it is pretty hard to get through the review process for mainline. Of course, many patches go into mainline without review as well. But the difference is that they are pushed by people that are familiar and responsible for the affected area. I could understand the pain. There are surely people that do not care about stable, because it takes time, it is hard to make decisions, flashbacks to the old code are painful, etc. Well, this is the reason why the maintenance support is and should be limited. Anyway, I think that it cannot be done reasonably without maintainers. You should be careful so that even the currently cooperating maintainers will not start considering autosel mails as a spam. (It is not my case. printk is small thing. But I could imagine that it might stop being bearable in bigger subsystems. As is already the case with xfs.) Best Regards, Petr
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 07:57:54PM +0200, Jan Kara wrote: >Actually I was careful enough to include only commits that got merged as >part of the stable process into 4.14.x but got later reverted in 4.14.y. >That's where the 0.4% number came from. So I believe all of those cases >(13 in absolute numbers) were user visible regressions during the stable >process. I looked at them, and there are 2 things in play here: - Quite a few of those reverts are because of the PTI work. I'm not sure how we treat it, but yes - it skews statistics here. - 2 of them were reverts for device tree changes for a device that didn't exist in 4.14, and shouldn't have had any user visible changes.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 07:57:54PM +0200, Jan Kara wrote: >Actually I was careful enough to include only commits that got merged as >part of the stable process into 4.14.x but got later reverted in 4.14.y. >That's where the 0.4% number came from. So I believe all of those cases >(13 in absolute numbers) were user visible regressions during the stable >process. I looked at them, and there are 2 things in play here: - Quite a few of those reverts are because of the PTI work. I'm not sure how we treat it, but yes - it skews statistics here. - 2 of them were reverts for device tree changes for a device that didn't exist in 4.14, and shouldn't have had any user visible changes.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 14:36:44, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote: > >On Tue 17-04-18 13:39:33, Sasha Levin wrote: > >[...] > >> But mm/ commits don't come only from these people. Here's a concrete > >> example we can discuss: > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d > > > >I would be really careful. Because that reqiures to audit all callers to > >be compliant with the change. This is just _too_ easy to backport > >without noticing a failure. Now consider the other side. Is there any > >real bug report backing this? This behavior was like that for quite some > >time but I do not remember any actual bug report and the changelog > >doesn't mention one either. It is about theoretical problem. > > https://lkml.org/lkml/2018/3/19/430 > > There's even a fun little reproducer that allowed me to confirm it's an > issue (at least) on 4.15. > > Heck, it might even qualify as a CVE. > > >So if this was to be merged to stable then the changelog should contain > >a big fat warning about the existing users and how they should be > >checked. > > So what I'm asking is why *wasn't* it sent to stable? Yes, it requires > additional work backporting this, but what I'm saying is that this > didn't happen at all. Do not ask me. I wasn't involved. But I would _guess_ that the original bug is not all that serious because it requires some specific privileges and it is quite unlikely that somebody privileged would want to shoot its feet. But this is just my wild guess. Anyway, I am pretty sure that if the triggering BUG was serious enough then it would be much safer to remove it for stable backports. -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 14:36:44, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote: > >On Tue 17-04-18 13:39:33, Sasha Levin wrote: > >[...] > >> But mm/ commits don't come only from these people. Here's a concrete > >> example we can discuss: > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d > > > >I would be really careful. Because that reqiures to audit all callers to > >be compliant with the change. This is just _too_ easy to backport > >without noticing a failure. Now consider the other side. Is there any > >real bug report backing this? This behavior was like that for quite some > >time but I do not remember any actual bug report and the changelog > >doesn't mention one either. It is about theoretical problem. > > https://lkml.org/lkml/2018/3/19/430 > > There's even a fun little reproducer that allowed me to confirm it's an > issue (at least) on 4.15. > > Heck, it might even qualify as a CVE. > > >So if this was to be merged to stable then the changelog should contain > >a big fat warning about the existing users and how they should be > >checked. > > So what I'm asking is why *wasn't* it sent to stable? Yes, it requires > additional work backporting this, but what I'm saying is that this > didn't happen at all. Do not ask me. I wasn't involved. But I would _guess_ that the original bug is not all that serious because it requires some specific privileges and it is quite unlikely that somebody privileged would want to shoot its feet. But this is just my wild guess. Anyway, I am pretty sure that if the triggering BUG was serious enough then it would be much safer to remove it for stable backports. -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 16:19:35, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: > >> Even regression chance is tricky, look at the commits I've linked > >> earlier in the thread. Even the most trivial looking commits that end up > >> in stable have a chance for regression. > > > >Sure, you can never be certain and I think people (including me) > >underestimate the chance of regressions for "trivial" patches. But you just > >estimate a chance, you may be lucky, you may not... > > > >> >Another point I wanted to make is that if chance a patch causes a > >> >regression is about 2% as you said somewhere else in a thread, then by > >> >adding 20 patches that "may fix a bug that is annoying for someone" you've > >> >just increased a chance there's a regression in the release by 34%. And > >> > >> So I've said that the rejection rate is less than 2%. This includes > >> all commits that I have proposed for -stable, but didn't end up being > >> included in -stable. > >> > >> This includes commits that the author/maintainers NACKed, commits that > >> didn't do anything on older kernels, commits that were buggy but were > >> caught before the kernel was released, commits that failed to build on > >> an arch I didn't test it on originally and so on. > >> > >> After thousands of merged AUTOSEL patches I can count the number of > >> times a commit has caused a regression and had to be removed on one > >> hand. > >> > >> >this is not just a math game, this also roughly matches a real experience > >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight > >> >such > >> >regression chance? And I also note that for a regression to get reported > >> >so > >> >that it gets included into your 2% estimate of a patch regression rate, > >> >someone must be bothered enough by it to triage it and send an email > >> >somewhere so that already falls into a category of "serious" stuff to me. > >> > >> It is indeed a numbers game, but the regression rate isn't 2%, it's > >> closer to 0.05%. > > > >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14 > >stable tree suggests some 13 commits were reverted from stable due to bugs. > >That's some 0.4% and that doesn't count fixes that were applied to > >fix other regressions. > > 0.05% is for commits that were merged in stable but later fixed or > reverted because they introduced a regression. By grepping for reverts > you also include things such as: > > - Reverts of commits that were in the corresponding mainline tree > - Reverts of commits that didn't introduce regressions Actually I was careful enough to include only commits that got merged as part of the stable process into 4.14.x but got later reverted in 4.14.y. That's where the 0.4% number came from. So I believe all of those cases (13 in absolute numbers) were user visible regressions during the stable process. Honza -- Jan KaraSUSE Labs, CR
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 16:19:35, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: > >> Even regression chance is tricky, look at the commits I've linked > >> earlier in the thread. Even the most trivial looking commits that end up > >> in stable have a chance for regression. > > > >Sure, you can never be certain and I think people (including me) > >underestimate the chance of regressions for "trivial" patches. But you just > >estimate a chance, you may be lucky, you may not... > > > >> >Another point I wanted to make is that if chance a patch causes a > >> >regression is about 2% as you said somewhere else in a thread, then by > >> >adding 20 patches that "may fix a bug that is annoying for someone" you've > >> >just increased a chance there's a regression in the release by 34%. And > >> > >> So I've said that the rejection rate is less than 2%. This includes > >> all commits that I have proposed for -stable, but didn't end up being > >> included in -stable. > >> > >> This includes commits that the author/maintainers NACKed, commits that > >> didn't do anything on older kernels, commits that were buggy but were > >> caught before the kernel was released, commits that failed to build on > >> an arch I didn't test it on originally and so on. > >> > >> After thousands of merged AUTOSEL patches I can count the number of > >> times a commit has caused a regression and had to be removed on one > >> hand. > >> > >> >this is not just a math game, this also roughly matches a real experience > >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight > >> >such > >> >regression chance? And I also note that for a regression to get reported > >> >so > >> >that it gets included into your 2% estimate of a patch regression rate, > >> >someone must be bothered enough by it to triage it and send an email > >> >somewhere so that already falls into a category of "serious" stuff to me. > >> > >> It is indeed a numbers game, but the regression rate isn't 2%, it's > >> closer to 0.05%. > > > >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14 > >stable tree suggests some 13 commits were reverted from stable due to bugs. > >That's some 0.4% and that doesn't count fixes that were applied to > >fix other regressions. > > 0.05% is for commits that were merged in stable but later fixed or > reverted because they introduced a regression. By grepping for reverts > you also include things such as: > > - Reverts of commits that were in the corresponding mainline tree > - Reverts of commits that didn't introduce regressions Actually I was careful enough to include only commits that got merged as part of the stable process into 4.14.x but got later reverted in 4.14.y. That's where the 0.4% number came from. So I believe all of those cases (13 in absolute numbers) were user visible regressions during the stable process. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: >On Tue, 17 Apr 2018, Sasha Levin wrote: > >> How do I get the XFS folks to send their stuff to -stable? (we have >> quite a few customers who use XFS) > >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream >maintainers to deal with stable, we just have to accept that and find an >answer to that. This is exactly what I'm doing. Many subsystems don't have enough manpower to deal with -stable, so I'm trying to help. >If XFS folks claim that they don't have enough mental capacity to >create/verify XFS backports, I totally don't see how any kind of AI would >have. Because creating backports is not all about mental capacity! A lot of time gets wasted on going through the list of commits, backporting each of those commits into every -stable tree we have, building it, running tests, etc. So it's not all about pure mental capacity, but more about the time per-patch it takes to get -stable done. If I can cut down on that, by suggesting a list of commits, doing builds and tests, what's the problem? >If your business relies on XFS (and so does ours, BTW) or any other >subsystem that doesn't have enough manpower to care for stable, the proper >solution (and contribution) would be just bringing more people into the >XFS community. Microsoft's business relies on quite a few kernel subsystems. While we try to bring more people in the kernel (we're hiring!), as you might know it's not easy getting kernel folks. So just "get more people" isn't a good solution. It doesn't scale either. >To put it simply -- I don't think the simple lack of actual human >brainpower can be reasonably resolved in other way than bringing more of >it in. > >Thanks, > >-- >Jiri Kosina >SUSE Labs >
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 05:52:30PM +0200, Jiri Kosina wrote: >On Tue, 17 Apr 2018, Sasha Levin wrote: > >> How do I get the XFS folks to send their stuff to -stable? (we have >> quite a few customers who use XFS) > >If XFS (or *any* other subsystem) doesn't have enough manpower of upstream >maintainers to deal with stable, we just have to accept that and find an >answer to that. This is exactly what I'm doing. Many subsystems don't have enough manpower to deal with -stable, so I'm trying to help. >If XFS folks claim that they don't have enough mental capacity to >create/verify XFS backports, I totally don't see how any kind of AI would >have. Because creating backports is not all about mental capacity! A lot of time gets wasted on going through the list of commits, backporting each of those commits into every -stable tree we have, building it, running tests, etc. So it's not all about pure mental capacity, but more about the time per-patch it takes to get -stable done. If I can cut down on that, by suggesting a list of commits, doing builds and tests, what's the problem? >If your business relies on XFS (and so does ours, BTW) or any other >subsystem that doesn't have enough manpower to care for stable, the proper >solution (and contribution) would be just bringing more people into the >XFS community. Microsoft's business relies on quite a few kernel subsystems. While we try to bring more people in the kernel (we're hiring!), as you might know it's not easy getting kernel folks. So just "get more people" isn't a good solution. It doesn't scale either. >To put it simply -- I don't think the simple lack of actual human >brainpower can be reasonably resolved in other way than bringing more of >it in. > >Thanks, > >-- >Jiri Kosina >SUSE Labs >
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 2018-04-17 at 17:52 +0200, Jiri Kosina wrote: > On Tue, 17 Apr 2018, Sasha Levin wrote: > > > How do I get the XFS folks to send their stuff to -stable? (we have > > quite a few customers who use XFS) > > If XFS (or *any* other subsystem) doesn't have enough manpower of upstream > maintainers to deal with stable, we just have to accept that and find an > answer to that. > > If XFS folks claim that they don't have enough mental capacity to > create/verify XFS backports, I totally don't see how any kind of AI would > have. > > If your business relies on XFS (and so does ours, BTW) or any other > subsystem that doesn't have enough manpower to care for stable, the proper > solution (and contribution) would be just bringing more people into the > XFS community. > > To put it simply -- I don't think the simple lack of actual human > brainpower can be reasonably resolved in other way than bringing more of > it in. Not to worry... soon enough it'll be submitting properly massaged backports of the stuff it submitted upstream :) -Mike
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 2018-04-17 at 17:52 +0200, Jiri Kosina wrote: > On Tue, 17 Apr 2018, Sasha Levin wrote: > > > How do I get the XFS folks to send their stuff to -stable? (we have > > quite a few customers who use XFS) > > If XFS (or *any* other subsystem) doesn't have enough manpower of upstream > maintainers to deal with stable, we just have to accept that and find an > answer to that. > > If XFS folks claim that they don't have enough mental capacity to > create/verify XFS backports, I totally don't see how any kind of AI would > have. > > If your business relies on XFS (and so does ours, BTW) or any other > subsystem that doesn't have enough manpower to care for stable, the proper > solution (and contribution) would be just bringing more people into the > XFS community. > > To put it simply -- I don't think the simple lack of actual human > brainpower can be reasonably resolved in other way than bringing more of > it in. Not to worry... soon enough it'll be submitting properly massaged backports of the stuff it submitted upstream :) -Mike
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: >On Tue 17-04-18 13:31:51, Sasha Levin wrote: >> We may be able to guesstimate the 'regression chance', but there's no >> way we can guess the 'annoyance' once. There are so many different use >> cases that we just can't even guess how many people would get "annoyed" >> by something. > >As a maintainer, I hope I have reasonable idea what are common use cases >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't >know all of the use cases so people doing unusual stuff hit more bugs and >have to report them to get fixes included in -stable. But for me this is a >preferable tradeoff over the risk of regression so this is the rule I use >when tagging for stable. Now I'm not a -stable maintainer and I fully agree >with "those who do the work decide" principle so pick whatever patches you >think are appropriate, I just wanted explain why I don't think more patches >in stable are necessarily good. The AUTOSEL story is different for subsystems that don't do -stable, and subsystems that are actually doing the work (like yourself). I'm not trying to override active maintainers, I'm trying to help them make decisions. The AUTOSEL bot will attempt to apply any patch it deems as -stable for on all -stable branches, finding possible dependencies, build them, and run any tests that you might deem necessary. You would be able to start your analysis without "wasting" time on doing a bunch of "manual labor". There's a big difference between subsystems like yours and most of the rest of the kernel. >> Even regression chance is tricky, look at the commits I've linked >> earlier in the thread. Even the most trivial looking commits that end up >> in stable have a chance for regression. > >Sure, you can never be certain and I think people (including me) >underestimate the chance of regressions for "trivial" patches. But you just >estimate a chance, you may be lucky, you may not... > >> >Another point I wanted to make is that if chance a patch causes a >> >regression is about 2% as you said somewhere else in a thread, then by >> >adding 20 patches that "may fix a bug that is annoying for someone" you've >> >just increased a chance there's a regression in the release by 34%. And >> >> So I've said that the rejection rate is less than 2%. This includes >> all commits that I have proposed for -stable, but didn't end up being >> included in -stable. >> >> This includes commits that the author/maintainers NACKed, commits that >> didn't do anything on older kernels, commits that were buggy but were >> caught before the kernel was released, commits that failed to build on >> an arch I didn't test it on originally and so on. >> >> After thousands of merged AUTOSEL patches I can count the number of >> times a commit has caused a regression and had to be removed on one >> hand. >> >> >this is not just a math game, this also roughly matches a real experience >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such >> >regression chance? And I also note that for a regression to get reported so >> >that it gets included into your 2% estimate of a patch regression rate, >> >someone must be bothered enough by it to triage it and send an email >> >somewhere so that already falls into a category of "serious" stuff to me. >> >> It is indeed a numbers game, but the regression rate isn't 2%, it's >> closer to 0.05%. > >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14 >stable tree suggests some 13 commits were reverted from stable due to bugs. >That's some 0.4% and that doesn't count fixes that were applied to >fix other regressions. 0.05% is for commits that were merged in stable but later fixed or reverted because they introduced a regression. By grepping for reverts you also include things such as: - Reverts of commits that were in the corresponding mainline tree - Reverts of commits that didn't introduce regressions >But the actual numbers don't really matter that much, in principle the more >patches you add the higher is the chance of regression. You can't change >that so you better have a good reason to include a patch... You increase the chance of regressions, but you also increase the chance of fixing bugs that affect users. My claim is that the chance to fix bugs increases far more significantly than the chance to introduce regressions.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 05:55:49PM +0200, Jan Kara wrote: >On Tue 17-04-18 13:31:51, Sasha Levin wrote: >> We may be able to guesstimate the 'regression chance', but there's no >> way we can guess the 'annoyance' once. There are so many different use >> cases that we just can't even guess how many people would get "annoyed" >> by something. > >As a maintainer, I hope I have reasonable idea what are common use cases >for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't >know all of the use cases so people doing unusual stuff hit more bugs and >have to report them to get fixes included in -stable. But for me this is a >preferable tradeoff over the risk of regression so this is the rule I use >when tagging for stable. Now I'm not a -stable maintainer and I fully agree >with "those who do the work decide" principle so pick whatever patches you >think are appropriate, I just wanted explain why I don't think more patches >in stable are necessarily good. The AUTOSEL story is different for subsystems that don't do -stable, and subsystems that are actually doing the work (like yourself). I'm not trying to override active maintainers, I'm trying to help them make decisions. The AUTOSEL bot will attempt to apply any patch it deems as -stable for on all -stable branches, finding possible dependencies, build them, and run any tests that you might deem necessary. You would be able to start your analysis without "wasting" time on doing a bunch of "manual labor". There's a big difference between subsystems like yours and most of the rest of the kernel. >> Even regression chance is tricky, look at the commits I've linked >> earlier in the thread. Even the most trivial looking commits that end up >> in stable have a chance for regression. > >Sure, you can never be certain and I think people (including me) >underestimate the chance of regressions for "trivial" patches. But you just >estimate a chance, you may be lucky, you may not... > >> >Another point I wanted to make is that if chance a patch causes a >> >regression is about 2% as you said somewhere else in a thread, then by >> >adding 20 patches that "may fix a bug that is annoying for someone" you've >> >just increased a chance there's a regression in the release by 34%. And >> >> So I've said that the rejection rate is less than 2%. This includes >> all commits that I have proposed for -stable, but didn't end up being >> included in -stable. >> >> This includes commits that the author/maintainers NACKed, commits that >> didn't do anything on older kernels, commits that were buggy but were >> caught before the kernel was released, commits that failed to build on >> an arch I didn't test it on originally and so on. >> >> After thousands of merged AUTOSEL patches I can count the number of >> times a commit has caused a regression and had to be removed on one >> hand. >> >> >this is not just a math game, this also roughly matches a real experience >> >with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such >> >regression chance? And I also note that for a regression to get reported so >> >that it gets included into your 2% estimate of a patch regression rate, >> >someone must be bothered enough by it to triage it and send an email >> >somewhere so that already falls into a category of "serious" stuff to me. >> >> It is indeed a numbers game, but the regression rate isn't 2%, it's >> closer to 0.05%. > >Honestly, I think 0.05% is too optimististic :) Quick grepping of 4.14 >stable tree suggests some 13 commits were reverted from stable due to bugs. >That's some 0.4% and that doesn't count fixes that were applied to >fix other regressions. 0.05% is for commits that were merged in stable but later fixed or reverted because they introduced a regression. By grepping for reverts you also include things such as: - Reverts of commits that were in the corresponding mainline tree - Reverts of commits that didn't introduce regressions >But the actual numbers don't really matter that much, in principle the more >patches you add the higher is the chance of regression. You can't change >that so you better have a good reason to include a patch... You increase the chance of regressions, but you also increase the chance of fixing bugs that affect users. My claim is that the chance to fix bugs increases far more significantly than the chance to introduce regressions.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 13:31:51, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote: > >On Mon 16-04-18 17:23:30, Sasha Levin wrote: > >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: > >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: > >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: > >> >> >On Mon, 16 Apr 2018 16:19:14 + > >> >> >Sasha Levinwrote: > >> >> > > >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as > >> >> >> >broken as mainline? > >> >> >> > >> >> >> This just means that if there is a fix that went in mainline, and the > >> >> >> fix is broken somehow, we'd rather take the broken fix than not. > >> >> >> > >> >> >> In this scenario, *something* will be broken, it's just a matter of > >> >> >> what. We'd rather have the same thing broken between mainline and > >> >> >> stable. > >> >> > > >> >> >Honestly, I think that removes all value of the stable series. I > >> >> >remember when the stable series were first created. People were saying > >> >> >that it wouldn't even get to more than 5 versions, because the bar for > >> >> >backporting was suppose to be very high. Today it's just a fork of the > >> >> >kernel at a given version. No more features, but we will be OK with > >> >> >regressions. I'm struggling to see what the benefit of it is suppose to > >> >> >be? > >> >> > >> >> It's not "OK with regressions". > >> >> > >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has > >> >> a broken printf() behaviour so that when you: > >> >> > >> >> pr_err("%d", 5) > >> >> > >> >> Would print: > >> >> > >> >> "Microsoft Rulez" > >> >> > >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you > >> >> might expect. But alas, with your patch, running: > >> >> > >> >> pr_err("%s", "hi!") > >> >> > >> >> Would show a cat picture for 5 seconds. > >> >> > >> >> Should we take your patch in -stable or not? If we don't, we're stuck > >> >> with the original issue while the mainline kernel will behave > >> >> differently, but if we do - we introduce a new regression. > >> > > >> >Of course not. > >> > > >> >- It must be obviously correct and tested. > >> > > >> >If it introduces new bug, it is not correct, and certainly not > >> >obviously correct. > >> > >> As you might have noticed, we don't strictly follow the rules. > >> > >> Take a look at the whole PTI story as an example. It's way more than 100 > >> lines, it's not obviously corrent, it fixed more than 1 thing, and so > >> on, and yet it went in -stable! > >> > >> Would you argue we shouldn't have backported PTI to -stable? > > > >So I agree with that being backported. But I think this nicely demostrates > >a point some people are trying to make in this thread. We do take fixes > >with high risk or regression if they fix serious enough issue. Also we do > >take fixes to non-serious stuff (such as addition of device ID) if the > >chances of regression are really low. > > > >So IMHO the metric for including the fix is not solely "how annoying to > >user this can be" but rather something like: > > > >score = (how annoying the bug is) * ((1 / (chance of regression due to > > including this)) - 1)^3 > > > >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' > >and 'regression chance' parts are subjective and sometimes difficult to > >estimate so don't take the formula too seriously but it demonstrates the > >point. I think we all agree we want to fix annoying stuff and we don't want > >regressions. But you need to somehow weight this over your expected > >userbase - and this is where your argument "but someone might be annoyed by > >LEDs not working so let's include it" has problems - it should rather be > >"is the annoyance of non-working leds over expected user base high enough > >to risk a regression due to this patch for someone in the expected user > >base"? The answer to this second question is not clear at all to a casual > >reviewer and that's why we IMHO have CC stable tag as maintainer is > >supposed to have at least a bit better clue. > > We may be able to guesstimate the 'regression chance', but there's no > way we can guess the 'annoyance' once. There are so many different use > cases that we just can't even guess how many people would get "annoyed" > by something. As a maintainer, I hope I have reasonable idea what are common use cases for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't know all of the use cases so people doing unusual stuff hit more bugs and have to report them to get fixes included in -stable. But for me this is a preferable tradeoff over the risk of regression so this is the rule I use when tagging for stable. Now I'm not a -stable maintainer and I fully agree with "those who do the work decide" principle so pick whatever patches you think are appropriate, I just wanted explain
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 13:31:51, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote: > >On Mon 16-04-18 17:23:30, Sasha Levin wrote: > >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: > >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: > >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: > >> >> >On Mon, 16 Apr 2018 16:19:14 + > >> >> >Sasha Levin wrote: > >> >> > > >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as > >> >> >> >broken as mainline? > >> >> >> > >> >> >> This just means that if there is a fix that went in mainline, and the > >> >> >> fix is broken somehow, we'd rather take the broken fix than not. > >> >> >> > >> >> >> In this scenario, *something* will be broken, it's just a matter of > >> >> >> what. We'd rather have the same thing broken between mainline and > >> >> >> stable. > >> >> > > >> >> >Honestly, I think that removes all value of the stable series. I > >> >> >remember when the stable series were first created. People were saying > >> >> >that it wouldn't even get to more than 5 versions, because the bar for > >> >> >backporting was suppose to be very high. Today it's just a fork of the > >> >> >kernel at a given version. No more features, but we will be OK with > >> >> >regressions. I'm struggling to see what the benefit of it is suppose to > >> >> >be? > >> >> > >> >> It's not "OK with regressions". > >> >> > >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has > >> >> a broken printf() behaviour so that when you: > >> >> > >> >> pr_err("%d", 5) > >> >> > >> >> Would print: > >> >> > >> >> "Microsoft Rulez" > >> >> > >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you > >> >> might expect. But alas, with your patch, running: > >> >> > >> >> pr_err("%s", "hi!") > >> >> > >> >> Would show a cat picture for 5 seconds. > >> >> > >> >> Should we take your patch in -stable or not? If we don't, we're stuck > >> >> with the original issue while the mainline kernel will behave > >> >> differently, but if we do - we introduce a new regression. > >> > > >> >Of course not. > >> > > >> >- It must be obviously correct and tested. > >> > > >> >If it introduces new bug, it is not correct, and certainly not > >> >obviously correct. > >> > >> As you might have noticed, we don't strictly follow the rules. > >> > >> Take a look at the whole PTI story as an example. It's way more than 100 > >> lines, it's not obviously corrent, it fixed more than 1 thing, and so > >> on, and yet it went in -stable! > >> > >> Would you argue we shouldn't have backported PTI to -stable? > > > >So I agree with that being backported. But I think this nicely demostrates > >a point some people are trying to make in this thread. We do take fixes > >with high risk or regression if they fix serious enough issue. Also we do > >take fixes to non-serious stuff (such as addition of device ID) if the > >chances of regression are really low. > > > >So IMHO the metric for including the fix is not solely "how annoying to > >user this can be" but rather something like: > > > >score = (how annoying the bug is) * ((1 / (chance of regression due to > > including this)) - 1)^3 > > > >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' > >and 'regression chance' parts are subjective and sometimes difficult to > >estimate so don't take the formula too seriously but it demonstrates the > >point. I think we all agree we want to fix annoying stuff and we don't want > >regressions. But you need to somehow weight this over your expected > >userbase - and this is where your argument "but someone might be annoyed by > >LEDs not working so let's include it" has problems - it should rather be > >"is the annoyance of non-working leds over expected user base high enough > >to risk a regression due to this patch for someone in the expected user > >base"? The answer to this second question is not clear at all to a casual > >reviewer and that's why we IMHO have CC stable tag as maintainer is > >supposed to have at least a bit better clue. > > We may be able to guesstimate the 'regression chance', but there's no > way we can guess the 'annoyance' once. There are so many different use > cases that we just can't even guess how many people would get "annoyed" > by something. As a maintainer, I hope I have reasonable idea what are common use cases for my subsystem. Those I cater to when estimating 'annoyance'. Sure I don't know all of the use cases so people doing unusual stuff hit more bugs and have to report them to get fixes included in -stable. But for me this is a preferable tradeoff over the risk of regression so this is the rule I use when tagging for stable. Now I'm not a -stable maintainer and I fully agree with "those who do the work decide" principle so pick whatever patches you think are appropriate, I just wanted explain why I don't think more patches in
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 17 Apr 2018, Sasha Levin wrote: > How do I get the XFS folks to send their stuff to -stable? (we have > quite a few customers who use XFS) If XFS (or *any* other subsystem) doesn't have enough manpower of upstream maintainers to deal with stable, we just have to accept that and find an answer to that. If XFS folks claim that they don't have enough mental capacity to create/verify XFS backports, I totally don't see how any kind of AI would have. If your business relies on XFS (and so does ours, BTW) or any other subsystem that doesn't have enough manpower to care for stable, the proper solution (and contribution) would be just bringing more people into the XFS community. To put it simply -- I don't think the simple lack of actual human brainpower can be reasonably resolved in other way than bringing more of it in. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 17 Apr 2018, Sasha Levin wrote: > How do I get the XFS folks to send their stuff to -stable? (we have > quite a few customers who use XFS) If XFS (or *any* other subsystem) doesn't have enough manpower of upstream maintainers to deal with stable, we just have to accept that and find an answer to that. If XFS folks claim that they don't have enough mental capacity to create/verify XFS backports, I totally don't see how any kind of AI would have. If your business relies on XFS (and so does ours, BTW) or any other subsystem that doesn't have enough manpower to care for stable, the proper solution (and contribution) would be just bringing more people into the XFS community. To put it simply -- I don't think the simple lack of actual human brainpower can be reasonably resolved in other way than bringing more of it in. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 04:36:31PM +0200, Michal Hocko wrote: >On Tue 17-04-18 14:04:36, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: >> >On Tue 17-04-18 12:39:36, Greg KH wrote: >> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: >> >> > On Mon, 16 Apr 2018, Sasha Levin wrote: >> >> > >> >> > > I agree that as an enterprise distro taking everything from -stable >> >> > > isn't the best idea. Ideally you'd want to be close to the first >> >> > > extreme you've mentioned and only take commits if customers are asking >> >> > > you to do so. >> >> > > >> >> > > I think that the rule we're trying to agree upon is the "It must fix >> >> > > a real bug that bothers people". >> >> > > >> >> > > I think that we can agree that it's impossible to expect every single >> >> > > Linux user to go on LKML and complain about a bug he encountered, so >> >> > > the >> >> > > rule quickly becomes "It must fix a real bug that can bother people". >> >> > >> >> > So is there a reason why stable couldn't become some hybrid-form union >> >> > of >> >> > >> >> > - really critical issues (data corruption, boot issues, severe security >> >> > issues) taken from bleeding edge upstream >> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels >> >> > (based on that very -stable release), as that's apparently what people >> >> > are hitting in the real world with that particular kernel >> >> >> >> It already is that :) >> >> >> >> The problem Sasha is trying to solve here is that for many subsystems, >> >> maintainers do not mark patches for stable at all. >> > >> >The way he is trying to do that is just wrong. Generate a pressure on >> >those subsystems by referring to bug reports and unhappy users and I am >> >pretty sure they will try harder... You cannot solve the problem by >> >bypassing them without having deep understanding of the specific >> >subsytem. Once you have it, just make sure you are part of the review >> >process and make sure to mark patches before they are merged. >> >> I think we just don't agree on how we should "pressure". >> >> Look at the discussion I had with the XFS folks who just don't want to >> deal with this -stable thing because they have to much work upstream. > >So do you really think that you or any script decide without them? My >recollection from that discussion was quite opposite. Dave was quite >clear that most of fixes are quite hard to evaluate and most of them >are simply not worth risking the backport. No, *some* fixes are hard, not most. I'm not trying to decide for them, I'm trying to help them decide. >> There wasn't a single patch in -stable coming from XFS for the past 6+ >> months. I'm aware of more than one way to corrupt an XFS volume for any >> distro that uses a kernel older than 4.15. > >Then try to poke/bribe somebody to have it fixed. But applying >_something_ is just not a solution. You should also evaluate whether "I >am able to corrupt" is something that "people see in the wild". Sure >there are zillions of bugs hidden in the large code base like the >kernel. People just do not tend to hit them and this will likely not >change very much in the future. We can't ignore bugs just because people don't notice. Data corruption bugs in particular are a pain to report as well, the corruption might have happened months before and there's not much to report at that point. There's quite a few bug classes like that. >> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be >> better about it, but I don't see this changing. > >I can surely have one or two and discuss this. I am pretty sure xfs guys >are not going to pretend older kernels do not exist. > >> The solution to this, in my opinion, is to automate the whole selection >> and review process. We do selection using AI, and we run every possible >> test that's relevant to that subsystem. >> >> At which point, the amount of work a human needs to do to review a patch >> shrinks into something far more managable for some maintainers. > >I really disagree. I am pretty sure maintainers are very well aware of >how the patch is important. Some do no care about stable and I agree you >should poke those. But some have really good reasons to not throw many >patches that direction because they do not feel the patch is important >enough. > >Remember this is not about numbers. The more is not always better. So what is "important"? Look at the XFS issues, they were important enough to get fixed upstream, and have an appropriate test added to xfstests. Why didn't they go back to -stable? >> >> So real bugfixes >> >> that do hit people are not getting to those kernels, which force the >> >> distros to do extra work to triage a bug, dig through upstream kernels, >> >> find and apply the patch. >> > >> >I would say that this is the primary role of the distro. To hide the >> >jungle of the upstream work and provide the additional of bug filtering
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 04:36:31PM +0200, Michal Hocko wrote: >On Tue 17-04-18 14:04:36, Sasha Levin wrote: >> On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: >> >On Tue 17-04-18 12:39:36, Greg KH wrote: >> >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: >> >> > On Mon, 16 Apr 2018, Sasha Levin wrote: >> >> > >> >> > > I agree that as an enterprise distro taking everything from -stable >> >> > > isn't the best idea. Ideally you'd want to be close to the first >> >> > > extreme you've mentioned and only take commits if customers are asking >> >> > > you to do so. >> >> > > >> >> > > I think that the rule we're trying to agree upon is the "It must fix >> >> > > a real bug that bothers people". >> >> > > >> >> > > I think that we can agree that it's impossible to expect every single >> >> > > Linux user to go on LKML and complain about a bug he encountered, so >> >> > > the >> >> > > rule quickly becomes "It must fix a real bug that can bother people". >> >> > >> >> > So is there a reason why stable couldn't become some hybrid-form union >> >> > of >> >> > >> >> > - really critical issues (data corruption, boot issues, severe security >> >> > issues) taken from bleeding edge upstream >> >> > - [reviewed] cherry-picks of functional fixes from major distro kernels >> >> > (based on that very -stable release), as that's apparently what people >> >> > are hitting in the real world with that particular kernel >> >> >> >> It already is that :) >> >> >> >> The problem Sasha is trying to solve here is that for many subsystems, >> >> maintainers do not mark patches for stable at all. >> > >> >The way he is trying to do that is just wrong. Generate a pressure on >> >those subsystems by referring to bug reports and unhappy users and I am >> >pretty sure they will try harder... You cannot solve the problem by >> >bypassing them without having deep understanding of the specific >> >subsytem. Once you have it, just make sure you are part of the review >> >process and make sure to mark patches before they are merged. >> >> I think we just don't agree on how we should "pressure". >> >> Look at the discussion I had with the XFS folks who just don't want to >> deal with this -stable thing because they have to much work upstream. > >So do you really think that you or any script decide without them? My >recollection from that discussion was quite opposite. Dave was quite >clear that most of fixes are quite hard to evaluate and most of them >are simply not worth risking the backport. No, *some* fixes are hard, not most. I'm not trying to decide for them, I'm trying to help them decide. >> There wasn't a single patch in -stable coming from XFS for the past 6+ >> months. I'm aware of more than one way to corrupt an XFS volume for any >> distro that uses a kernel older than 4.15. > >Then try to poke/bribe somebody to have it fixed. But applying >_something_ is just not a solution. You should also evaluate whether "I >am able to corrupt" is something that "people see in the wild". Sure >there are zillions of bugs hidden in the large code base like the >kernel. People just do not tend to hit them and this will likely not >change very much in the future. We can't ignore bugs just because people don't notice. Data corruption bugs in particular are a pain to report as well, the corruption might have happened months before and there's not much to report at that point. There's quite a few bug classes like that. >> Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be >> better about it, but I don't see this changing. > >I can surely have one or two and discuss this. I am pretty sure xfs guys >are not going to pretend older kernels do not exist. > >> The solution to this, in my opinion, is to automate the whole selection >> and review process. We do selection using AI, and we run every possible >> test that's relevant to that subsystem. >> >> At which point, the amount of work a human needs to do to review a patch >> shrinks into something far more managable for some maintainers. > >I really disagree. I am pretty sure maintainers are very well aware of >how the patch is important. Some do no care about stable and I agree you >should poke those. But some have really good reasons to not throw many >patches that direction because they do not feel the patch is important >enough. > >Remember this is not about numbers. The more is not always better. So what is "important"? Look at the XFS issues, they were important enough to get fixed upstream, and have an appropriate test added to xfstests. Why didn't they go back to -stable? >> >> So real bugfixes >> >> that do hit people are not getting to those kernels, which force the >> >> distros to do extra work to triage a bug, dig through upstream kernels, >> >> find and apply the patch. >> > >> >I would say that this is the primary role of the distro. To hide the >> >jungle of the upstream work and provide the additional of bug filtering
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote: >On Tue 17-04-18 13:39:33, Sasha Levin wrote: >[...] >> But mm/ commits don't come only from these people. Here's a concrete >> example we can discuss: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d > >I would be really careful. Because that reqiures to audit all callers to >be compliant with the change. This is just _too_ easy to backport >without noticing a failure. Now consider the other side. Is there any >real bug report backing this? This behavior was like that for quite some >time but I do not remember any actual bug report and the changelog >doesn't mention one either. It is about theoretical problem. https://lkml.org/lkml/2018/3/19/430 There's even a fun little reproducer that allowed me to confirm it's an issue (at least) on 4.15. Heck, it might even qualify as a CVE. >So if this was to be merged to stable then the changelog should contain >a big fat warning about the existing users and how they should be >checked. So what I'm asking is why *wasn't* it sent to stable? Yes, it requires additional work backporting this, but what I'm saying is that this didn't happen at all. >Besides that I can see Reviewed-by: akpm and Andrew is usually very >careful about stable backports so there probably _was_ a reson to >exclude stable. >-- >Michal Hocko >SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 04:22:46PM +0200, Michal Hocko wrote: >On Tue 17-04-18 13:39:33, Sasha Levin wrote: >[...] >> But mm/ commits don't come only from these people. Here's a concrete >> example we can discuss: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d > >I would be really careful. Because that reqiures to audit all callers to >be compliant with the change. This is just _too_ easy to backport >without noticing a failure. Now consider the other side. Is there any >real bug report backing this? This behavior was like that for quite some >time but I do not remember any actual bug report and the changelog >doesn't mention one either. It is about theoretical problem. https://lkml.org/lkml/2018/3/19/430 There's even a fun little reproducer that allowed me to confirm it's an issue (at least) on 4.15. Heck, it might even qualify as a CVE. >So if this was to be merged to stable then the changelog should contain >a big fat warning about the existing users and how they should be >checked. So what I'm asking is why *wasn't* it sent to stable? Yes, it requires additional work backporting this, but what I'm saying is that this didn't happen at all. >Besides that I can see Reviewed-by: akpm and Andrew is usually very >careful about stable backports so there probably _was_ a reson to >exclude stable. >-- >Michal Hocko >SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 14:04:36, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: > >On Tue 17-04-18 12:39:36, Greg KH wrote: > >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > >> > On Mon, 16 Apr 2018, Sasha Levin wrote: > >> > > >> > > I agree that as an enterprise distro taking everything from -stable > >> > > isn't the best idea. Ideally you'd want to be close to the first > >> > > extreme you've mentioned and only take commits if customers are asking > >> > > you to do so. > >> > > > >> > > I think that the rule we're trying to agree upon is the "It must fix > >> > > a real bug that bothers people". > >> > > > >> > > I think that we can agree that it's impossible to expect every single > >> > > Linux user to go on LKML and complain about a bug he encountered, so > >> > > the > >> > > rule quickly becomes "It must fix a real bug that can bother people". > >> > > >> > So is there a reason why stable couldn't become some hybrid-form union of > >> > > >> > - really critical issues (data corruption, boot issues, severe security > >> > issues) taken from bleeding edge upstream > >> > - [reviewed] cherry-picks of functional fixes from major distro kernels > >> > (based on that very -stable release), as that's apparently what people > >> > are hitting in the real world with that particular kernel > >> > >> It already is that :) > >> > >> The problem Sasha is trying to solve here is that for many subsystems, > >> maintainers do not mark patches for stable at all. > > > >The way he is trying to do that is just wrong. Generate a pressure on > >those subsystems by referring to bug reports and unhappy users and I am > >pretty sure they will try harder... You cannot solve the problem by > >bypassing them without having deep understanding of the specific > >subsytem. Once you have it, just make sure you are part of the review > >process and make sure to mark patches before they are merged. > > I think we just don't agree on how we should "pressure". > > Look at the discussion I had with the XFS folks who just don't want to > deal with this -stable thing because they have to much work upstream. So do you really think that you or any script decide without them? My recollection from that discussion was quite opposite. Dave was quite clear that most of fixes are quite hard to evaluate and most of them are simply not worth risking the backport. > There wasn't a single patch in -stable coming from XFS for the past 6+ > months. I'm aware of more than one way to corrupt an XFS volume for any > distro that uses a kernel older than 4.15. Then try to poke/bribe somebody to have it fixed. But applying _something_ is just not a solution. You should also evaluate whether "I am able to corrupt" is something that "people see in the wild". Sure there are zillions of bugs hidden in the large code base like the kernel. People just do not tend to hit them and this will likely not change very much in the future. > Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be > better about it, but I don't see this changing. I can surely have one or two and discuss this. I am pretty sure xfs guys are not going to pretend older kernels do not exist. > The solution to this, in my opinion, is to automate the whole selection > and review process. We do selection using AI, and we run every possible > test that's relevant to that subsystem. > > At which point, the amount of work a human needs to do to review a patch > shrinks into something far more managable for some maintainers. I really disagree. I am pretty sure maintainers are very well aware of how the patch is important. Some do no care about stable and I agree you should poke those. But some have really good reasons to not throw many patches that direction because they do not feel the patch is important enough. Remember this is not about numbers. The more is not always better. > >> So real bugfixes > >> that do hit people are not getting to those kernels, which force the > >> distros to do extra work to triage a bug, dig through upstream kernels, > >> find and apply the patch. > > > >I would say that this is the primary role of the distro. To hide the > >jungle of the upstream work and provide the additional of bug filtering > >and forwarding them the right direction. > > More often than triaging, you'll just be asked to upgrade to the latest > version. What sort of user experience does that provide? > > [snip] > > >> So nothing "new" is happening here, EXCEPT we are actually starting to > >> get a better kernel-wide coverage for stable fixes, which we have not > >> had in the past. That's a good thing! The number of patches applied to > >> stable is still a very very very tiny % compared to mainline, so nothing > >> new is happening here. > > > >yes I do agree, the stable process is not very much different from the > >past and I would tend both processes broken because they explicitly try > >to avoid
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 14:04:36, Sasha Levin wrote: > On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: > >On Tue 17-04-18 12:39:36, Greg KH wrote: > >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > >> > On Mon, 16 Apr 2018, Sasha Levin wrote: > >> > > >> > > I agree that as an enterprise distro taking everything from -stable > >> > > isn't the best idea. Ideally you'd want to be close to the first > >> > > extreme you've mentioned and only take commits if customers are asking > >> > > you to do so. > >> > > > >> > > I think that the rule we're trying to agree upon is the "It must fix > >> > > a real bug that bothers people". > >> > > > >> > > I think that we can agree that it's impossible to expect every single > >> > > Linux user to go on LKML and complain about a bug he encountered, so > >> > > the > >> > > rule quickly becomes "It must fix a real bug that can bother people". > >> > > >> > So is there a reason why stable couldn't become some hybrid-form union of > >> > > >> > - really critical issues (data corruption, boot issues, severe security > >> > issues) taken from bleeding edge upstream > >> > - [reviewed] cherry-picks of functional fixes from major distro kernels > >> > (based on that very -stable release), as that's apparently what people > >> > are hitting in the real world with that particular kernel > >> > >> It already is that :) > >> > >> The problem Sasha is trying to solve here is that for many subsystems, > >> maintainers do not mark patches for stable at all. > > > >The way he is trying to do that is just wrong. Generate a pressure on > >those subsystems by referring to bug reports and unhappy users and I am > >pretty sure they will try harder... You cannot solve the problem by > >bypassing them without having deep understanding of the specific > >subsytem. Once you have it, just make sure you are part of the review > >process and make sure to mark patches before they are merged. > > I think we just don't agree on how we should "pressure". > > Look at the discussion I had with the XFS folks who just don't want to > deal with this -stable thing because they have to much work upstream. So do you really think that you or any script decide without them? My recollection from that discussion was quite opposite. Dave was quite clear that most of fixes are quite hard to evaluate and most of them are simply not worth risking the backport. > There wasn't a single patch in -stable coming from XFS for the past 6+ > months. I'm aware of more than one way to corrupt an XFS volume for any > distro that uses a kernel older than 4.15. Then try to poke/bribe somebody to have it fixed. But applying _something_ is just not a solution. You should also evaluate whether "I am able to corrupt" is something that "people see in the wild". Sure there are zillions of bugs hidden in the large code base like the kernel. People just do not tend to hit them and this will likely not change very much in the future. > Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be > better about it, but I don't see this changing. I can surely have one or two and discuss this. I am pretty sure xfs guys are not going to pretend older kernels do not exist. > The solution to this, in my opinion, is to automate the whole selection > and review process. We do selection using AI, and we run every possible > test that's relevant to that subsystem. > > At which point, the amount of work a human needs to do to review a patch > shrinks into something far more managable for some maintainers. I really disagree. I am pretty sure maintainers are very well aware of how the patch is important. Some do no care about stable and I agree you should poke those. But some have really good reasons to not throw many patches that direction because they do not feel the patch is important enough. Remember this is not about numbers. The more is not always better. > >> So real bugfixes > >> that do hit people are not getting to those kernels, which force the > >> distros to do extra work to triage a bug, dig through upstream kernels, > >> find and apply the patch. > > > >I would say that this is the primary role of the distro. To hide the > >jungle of the upstream work and provide the additional of bug filtering > >and forwarding them the right direction. > > More often than triaging, you'll just be asked to upgrade to the latest > version. What sort of user experience does that provide? > > [snip] > > >> So nothing "new" is happening here, EXCEPT we are actually starting to > >> get a better kernel-wide coverage for stable fixes, which we have not > >> had in the past. That's a good thing! The number of patches applied to > >> stable is still a very very very tiny % compared to mainline, so nothing > >> new is happening here. > > > >yes I do agree, the stable process is not very much different from the > >past and I would tend both processes broken because they explicitly try > >to avoid
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 10:15:02AM -0400, Steven Rostedt wrote: > On Tue, 17 Apr 2018 14:04:36 + > Sasha Levinwrote: > > > The solution to this, in my opinion, is to automate the whole selection > > and review process. We do selection using AI, and we run every possible > > test that's relevant to that subsystem. > > > > At which point, the amount of work a human needs to do to review a patch > > shrinks into something far more managable for some maintainers. > > I guess the real question is, who are the stable kernels for? Is it just > a place to look at to see what distros should think about. A superset > of what distros would take. Then distros would have a nice place to > look to find what patches they should look at. But the stable tree > itself wont be used. But it's not being used today by major distros > (Red Hat and SuSE). Debian may be using it, but that's because the > stable maintainer for its kernels is also the Debian maintainer. > > Who are the customers of the stable trees? They are the ones that > should be determining the "equation" for what goes into it. The "customers" of the stable trees are anyone who uses Linux. Right now, it's estimated that only about 1/3 of the kernels running out there, at the best, are an "enterprise" kernel/distro. 2/3 of the world either run a kernel.org-based release + their own patches, or Debian. And Debian piggy-backs on the stable kernel releases pretty regularily. So the majority of the Linux users out there are what we are doing this for. Those that do not pay for a company to sift through things and only cherry-pick what they want to pick out (hint, they almost always miss things, some do this better than others...) That's who this is all for, which is why we are doing our best to keep on top of the avalanche of patches going into upstream to get the needed fixes (both security and "normal" fixes) out to users as soon as possible. So again, if you are a subsystem maintainer, tag your patches for stable. If you do not, you will get automated emails asking you about patches that should be applied (like the one that started this thread). If you want to just have us ignore your subsystem entirely, we will be glad to do so, and we will tell the world to not use your subsystem if at all possible (see previous comments about xfs, and I would argue IB right now...) > Personally, I use stable as a one off from mainline. Like I mentioned > in another email. I'm currently on 4.15.x and will probably move to > 4.16.x next. Unless there's some critical bug announcement, I update my > machines once a month. I originally just used mainline, but that was a > bit too unstable for my work machines ;-) That's great, you are a user of these trees then. So you benifit directly, along with everyone else who relies on them. And again, I'm working with the SoC vendors to directly incorporate these trees into their device trees, and I've already seen some devices in the wild push out updated 4.4.y kernels a few weeks after they are released. That's the end goal here, to have the world's devices in a much more secure and stable shape by relying on these kernels. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 10:15:02AM -0400, Steven Rostedt wrote: > On Tue, 17 Apr 2018 14:04:36 + > Sasha Levin wrote: > > > The solution to this, in my opinion, is to automate the whole selection > > and review process. We do selection using AI, and we run every possible > > test that's relevant to that subsystem. > > > > At which point, the amount of work a human needs to do to review a patch > > shrinks into something far more managable for some maintainers. > > I guess the real question is, who are the stable kernels for? Is it just > a place to look at to see what distros should think about. A superset > of what distros would take. Then distros would have a nice place to > look to find what patches they should look at. But the stable tree > itself wont be used. But it's not being used today by major distros > (Red Hat and SuSE). Debian may be using it, but that's because the > stable maintainer for its kernels is also the Debian maintainer. > > Who are the customers of the stable trees? They are the ones that > should be determining the "equation" for what goes into it. The "customers" of the stable trees are anyone who uses Linux. Right now, it's estimated that only about 1/3 of the kernels running out there, at the best, are an "enterprise" kernel/distro. 2/3 of the world either run a kernel.org-based release + their own patches, or Debian. And Debian piggy-backs on the stable kernel releases pretty regularily. So the majority of the Linux users out there are what we are doing this for. Those that do not pay for a company to sift through things and only cherry-pick what they want to pick out (hint, they almost always miss things, some do this better than others...) That's who this is all for, which is why we are doing our best to keep on top of the avalanche of patches going into upstream to get the needed fixes (both security and "normal" fixes) out to users as soon as possible. So again, if you are a subsystem maintainer, tag your patches for stable. If you do not, you will get automated emails asking you about patches that should be applied (like the one that started this thread). If you want to just have us ignore your subsystem entirely, we will be glad to do so, and we will tell the world to not use your subsystem if at all possible (see previous comments about xfs, and I would argue IB right now...) > Personally, I use stable as a one off from mainline. Like I mentioned > in another email. I'm currently on 4.15.x and will probably move to > 4.16.x next. Unless there's some critical bug announcement, I update my > machines once a month. I originally just used mainline, but that was a > bit too unstable for my work machines ;-) That's great, you are a user of these trees then. So you benifit directly, along with everyone else who relies on them. And again, I'm working with the SoC vendors to directly incorporate these trees into their device trees, and I've already seen some devices in the wild push out updated 4.4.y kernels a few weeks after they are released. That's the end goal here, to have the world's devices in a much more secure and stable shape by relying on these kernels. thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 13:39:33, Sasha Levin wrote: [...] > But mm/ commits don't come only from these people. Here's a concrete > example we can discuss: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d I would be really careful. Because that reqiures to audit all callers to be compliant with the change. This is just _too_ easy to backport without noticing a failure. Now consider the other side. Is there any real bug report backing this? This behavior was like that for quite some time but I do not remember any actual bug report and the changelog doesn't mention one either. It is about theoretical problem. So if this was to be merged to stable then the changelog should contain a big fat warning about the existing users and how they should be checked. Besides that I can see Reviewed-by: akpm and Andrew is usually very careful about stable backports so there probably _was_ a reson to exclude stable. -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 13:39:33, Sasha Levin wrote: [...] > But mm/ commits don't come only from these people. Here's a concrete > example we can discuss: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d I would be really careful. Because that reqiures to audit all callers to be compliant with the change. This is just _too_ easy to backport without noticing a failure. Now consider the other side. Is there any real bug report backing this? This behavior was like that for quite some time but I do not remember any actual bug report and the changelog doesn't mention one either. It is about theoretical problem. So if this was to be merged to stable then the changelog should contain a big fat warning about the existing users and how they should be checked. Besides that I can see Reviewed-by: akpm and Andrew is usually very careful about stable backports so there probably _was_ a reson to exclude stable. -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 17 Apr 2018 14:04:36 + Sasha Levinwrote: > The solution to this, in my opinion, is to automate the whole selection > and review process. We do selection using AI, and we run every possible > test that's relevant to that subsystem. > > At which point, the amount of work a human needs to do to review a patch > shrinks into something far more managable for some maintainers. I guess the real question is, who are the stable kernels for? Is it just a place to look at to see what distros should think about. A superset of what distros would take. Then distros would have a nice place to look to find what patches they should look at. But the stable tree itself wont be used. But it's not being used today by major distros (Red Hat and SuSE). Debian may be using it, but that's because the stable maintainer for its kernels is also the Debian maintainer. Who are the customers of the stable trees? They are the ones that should be determining the "equation" for what goes into it. Personally, I use stable as a one off from mainline. Like I mentioned in another email. I'm currently on 4.15.x and will probably move to 4.16.x next. Unless there's some critical bug announcement, I update my machines once a month. I originally just used mainline, but that was a bit too unstable for my work machines ;-) -- Steve
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 17 Apr 2018 14:04:36 + Sasha Levin wrote: > The solution to this, in my opinion, is to automate the whole selection > and review process. We do selection using AI, and we run every possible > test that's relevant to that subsystem. > > At which point, the amount of work a human needs to do to review a patch > shrinks into something far more managable for some maintainers. I guess the real question is, who are the stable kernels for? Is it just a place to look at to see what distros should think about. A superset of what distros would take. Then distros would have a nice place to look to find what patches they should look at. But the stable tree itself wont be used. But it's not being used today by major distros (Red Hat and SuSE). Debian may be using it, but that's because the stable maintainer for its kernels is also the Debian maintainer. Who are the customers of the stable trees? They are the ones that should be determining the "equation" for what goes into it. Personally, I use stable as a one off from mainline. Like I mentioned in another email. I'm currently on 4.15.x and will probably move to 4.16.x next. Unless there's some critical bug announcement, I update my machines once a month. I originally just used mainline, but that was a bit too unstable for my work machines ;-) -- Steve
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: >On Tue 17-04-18 12:39:36, Greg KH wrote: >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: >> > On Mon, 16 Apr 2018, Sasha Levin wrote: >> > >> > > I agree that as an enterprise distro taking everything from -stable >> > > isn't the best idea. Ideally you'd want to be close to the first >> > > extreme you've mentioned and only take commits if customers are asking >> > > you to do so. >> > > >> > > I think that the rule we're trying to agree upon is the "It must fix >> > > a real bug that bothers people". >> > > >> > > I think that we can agree that it's impossible to expect every single >> > > Linux user to go on LKML and complain about a bug he encountered, so the >> > > rule quickly becomes "It must fix a real bug that can bother people". >> > >> > So is there a reason why stable couldn't become some hybrid-form union of >> > >> > - really critical issues (data corruption, boot issues, severe security >> > issues) taken from bleeding edge upstream >> > - [reviewed] cherry-picks of functional fixes from major distro kernels >> > (based on that very -stable release), as that's apparently what people >> > are hitting in the real world with that particular kernel >> >> It already is that :) >> >> The problem Sasha is trying to solve here is that for many subsystems, >> maintainers do not mark patches for stable at all. > >The way he is trying to do that is just wrong. Generate a pressure on >those subsystems by referring to bug reports and unhappy users and I am >pretty sure they will try harder... You cannot solve the problem by >bypassing them without having deep understanding of the specific >subsytem. Once you have it, just make sure you are part of the review >process and make sure to mark patches before they are merged. I think we just don't agree on how we should "pressure". Look at the discussion I had with the XFS folks who just don't want to deal with this -stable thing because they have to much work upstream. There wasn't a single patch in -stable coming from XFS for the past 6+ months. I'm aware of more than one way to corrupt an XFS volume for any distro that uses a kernel older than 4.15. Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be better about it, but I don't see this changing. The solution to this, in my opinion, is to automate the whole selection and review process. We do selection using AI, and we run every possible test that's relevant to that subsystem. At which point, the amount of work a human needs to do to review a patch shrinks into something far more managable for some maintainers. >> So real bugfixes >> that do hit people are not getting to those kernels, which force the >> distros to do extra work to triage a bug, dig through upstream kernels, >> find and apply the patch. > >I would say that this is the primary role of the distro. To hide the >jungle of the upstream work and provide the additional of bug filtering >and forwarding them the right direction. More often than triaging, you'll just be asked to upgrade to the latest version. What sort of user experience does that provide? [snip] >> So nothing "new" is happening here, EXCEPT we are actually starting to >> get a better kernel-wide coverage for stable fixes, which we have not >> had in the past. That's a good thing! The number of patches applied to >> stable is still a very very very tiny % compared to mainline, so nothing >> new is happening here. > >yes I do agree, the stable process is not very much different from the >past and I would tend both processes broken because they explicitly try >to avoid maintainers which is just wrong. Avoid maintainers?! We send so much "spam" trying to get maintainers more involved in the process. How is that avoiding them? If you're a maintainer who has specific requirements for the -stable flow, or you have any automated testing you'd like to be run on these commits, or you want these mails to come in a different format, or pretty much anything else at all just shoot me a mail! It's been almost impossible to get maintainers involved in this process. We don't sneak anything past maintainers, there are multiple mails over multiple weeks for each commit that would go in. You don't have to review it right away either, just reply with "please don't merge until I'm done reviewing" and it'll get removed from the queue. >> Oh, and if you do want to complain about huge new features being >> backported, look at the mess that Spectre and Meltdown has caused in the >> stable trees. I don't see anyone complaining about those massive >> changes :) > >Are you serious? Are you going the compare the biggest PITA that the >community had to undergo because of HW issues with random pattern >matching in changelog/diffs? Come on! HW Issues are irrelevant here. You had a bug that allowed arbitrary kernel memory access. I can easily list quite a few commits, that are not tagged for
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 01:07:17PM +0200, Michal Hocko wrote: >On Tue 17-04-18 12:39:36, Greg KH wrote: >> On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: >> > On Mon, 16 Apr 2018, Sasha Levin wrote: >> > >> > > I agree that as an enterprise distro taking everything from -stable >> > > isn't the best idea. Ideally you'd want to be close to the first >> > > extreme you've mentioned and only take commits if customers are asking >> > > you to do so. >> > > >> > > I think that the rule we're trying to agree upon is the "It must fix >> > > a real bug that bothers people". >> > > >> > > I think that we can agree that it's impossible to expect every single >> > > Linux user to go on LKML and complain about a bug he encountered, so the >> > > rule quickly becomes "It must fix a real bug that can bother people". >> > >> > So is there a reason why stable couldn't become some hybrid-form union of >> > >> > - really critical issues (data corruption, boot issues, severe security >> > issues) taken from bleeding edge upstream >> > - [reviewed] cherry-picks of functional fixes from major distro kernels >> > (based on that very -stable release), as that's apparently what people >> > are hitting in the real world with that particular kernel >> >> It already is that :) >> >> The problem Sasha is trying to solve here is that for many subsystems, >> maintainers do not mark patches for stable at all. > >The way he is trying to do that is just wrong. Generate a pressure on >those subsystems by referring to bug reports and unhappy users and I am >pretty sure they will try harder... You cannot solve the problem by >bypassing them without having deep understanding of the specific >subsytem. Once you have it, just make sure you are part of the review >process and make sure to mark patches before they are merged. I think we just don't agree on how we should "pressure". Look at the discussion I had with the XFS folks who just don't want to deal with this -stable thing because they have to much work upstream. There wasn't a single patch in -stable coming from XFS for the past 6+ months. I'm aware of more than one way to corrupt an XFS volume for any distro that uses a kernel older than 4.15. Sure, please buy them a beer at LSF/MM (I'll pay) and ask them to be better about it, but I don't see this changing. The solution to this, in my opinion, is to automate the whole selection and review process. We do selection using AI, and we run every possible test that's relevant to that subsystem. At which point, the amount of work a human needs to do to review a patch shrinks into something far more managable for some maintainers. >> So real bugfixes >> that do hit people are not getting to those kernels, which force the >> distros to do extra work to triage a bug, dig through upstream kernels, >> find and apply the patch. > >I would say that this is the primary role of the distro. To hide the >jungle of the upstream work and provide the additional of bug filtering >and forwarding them the right direction. More often than triaging, you'll just be asked to upgrade to the latest version. What sort of user experience does that provide? [snip] >> So nothing "new" is happening here, EXCEPT we are actually starting to >> get a better kernel-wide coverage for stable fixes, which we have not >> had in the past. That's a good thing! The number of patches applied to >> stable is still a very very very tiny % compared to mainline, so nothing >> new is happening here. > >yes I do agree, the stable process is not very much different from the >past and I would tend both processes broken because they explicitly try >to avoid maintainers which is just wrong. Avoid maintainers?! We send so much "spam" trying to get maintainers more involved in the process. How is that avoiding them? If you're a maintainer who has specific requirements for the -stable flow, or you have any automated testing you'd like to be run on these commits, or you want these mails to come in a different format, or pretty much anything else at all just shoot me a mail! It's been almost impossible to get maintainers involved in this process. We don't sneak anything past maintainers, there are multiple mails over multiple weeks for each commit that would go in. You don't have to review it right away either, just reply with "please don't merge until I'm done reviewing" and it'll get removed from the queue. >> Oh, and if you do want to complain about huge new features being >> backported, look at the mess that Spectre and Meltdown has caused in the >> stable trees. I don't see anyone complaining about those massive >> changes :) > >Are you serious? Are you going the compare the biggest PITA that the >community had to undergo because of HW issues with random pattern >matching in changelog/diffs? Come on! HW Issues are irrelevant here. You had a bug that allowed arbitrary kernel memory access. I can easily list quite a few commits, that are not tagged for
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote: >Back to the trend. Last week I got autosel mails even for >patches that were still being discussed, had issues, and >were far from upstream: > > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > >It might be a good idea if the mail asked to add Fixes: tag >or stable mailing list. But the mail suggested to add the >unfinished patch into stable branch directly (even before >upstreaming?). I obviously didn't suggest that this patch will go in -stable before it's upstream. I've started doing those because some folks can't be arsed to reply to a review request for a patch that is months old. I found that if I send these mails while the discussion is still going on I'd get a much better response rate from people. If you think any of these patches should go in stable there were two ways about it: - You end up adding the -stable tag yourself, and it would follow the usual route where Greg picks it up. - You reply to that mail, and the patch would wait in a list until my script notices it made it upstream, at which point it would get queued for stable. >Now, there are only hand full of printk patches in each >release, so it is still doable. I just do not understand >how other maintainers, from much more busy subsystems, >could cope with this trend. > >By other words. If you want to automatize patch nomination, >you might need to automatize also patch review. Or you need >to keep the patch rate low. This might mean to nominate >only important and rather trivial fixes. I also have an effort to help review the patches. See what I'm working on for the xfs folks: https://lkml.org/lkml/2018/3/29/1113 Where in addition to build tests I'd also run each commit, for each stable kernel through a set of xfstests and provide them along with the mail. So yes, I'm aware that the volume of patches is huge, but there's not much I can do about it because it's just a subset of the kernel's patch volume and since the kernel gets more and more patches each release, the volume of stable commits is bound to grow as well.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote: >Back to the trend. Last week I got autosel mails even for >patches that were still being discussed, had issues, and >were far from upstream: > > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > >It might be a good idea if the mail asked to add Fixes: tag >or stable mailing list. But the mail suggested to add the >unfinished patch into stable branch directly (even before >upstreaming?). I obviously didn't suggest that this patch will go in -stable before it's upstream. I've started doing those because some folks can't be arsed to reply to a review request for a patch that is months old. I found that if I send these mails while the discussion is still going on I'd get a much better response rate from people. If you think any of these patches should go in stable there were two ways about it: - You end up adding the -stable tag yourself, and it would follow the usual route where Greg picks it up. - You reply to that mail, and the patch would wait in a list until my script notices it made it upstream, at which point it would get queued for stable. >Now, there are only hand full of printk patches in each >release, so it is still doable. I just do not understand >how other maintainers, from much more busy subsystems, >could cope with this trend. > >By other words. If you want to automatize patch nomination, >you might need to automatize also patch review. Or you need >to keep the patch rate low. This might mean to nominate >only important and rather trivial fixes. I also have an effort to help review the patches. See what I'm working on for the xfs folks: https://lkml.org/lkml/2018/3/29/1113 Where in addition to build tests I'd also run each commit, for each stable kernel through a set of xfstests and provide them along with the mail. So yes, I'm aware that the volume of patches is huge, but there's not much I can do about it because it's just a subset of the kernel's patch volume and since the kernel gets more and more patches each release, the volume of stable commits is bound to grow as well.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 02:49:24PM +0200, Michal Hocko wrote: >On Tue 17-04-18 14:24:54, Petr Mladek wrote: >[...] >> Back to the trend. Last week I got autosel mails even for >> patches that were still being discussed, had issues, and >> were far from upstream: >> >> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com >> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com >> >> It might be a good idea if the mail asked to add Fixes: tag >> or stable mailing list. But the mail suggested to add the >> unfinished patch into stable branch directly (even before >> upstreaming?). > >Well, I think that poking subsystems which ignore stable trees with such >emails early during review might be quite helpful. Maybe people start >marking for stable and we do not need the guessing later. I wouldn't >bother poking those who are known to mark stable patches though. Yup, mm/ needs far less poking that XFS (for example). What makes mm/ so good about this is that it's a rather small set of devs who are good at marking things for stable. As long as the commit came from one of these "core" mm/ folks it's almost guaranteed to have proper stable tags. But mm/ commits don't come only from these people. Here's a concrete example we can discuss: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d This was merged in a few days ago, and seems relevant for older kernel trees as well. Should it not have a stable tag?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 02:49:24PM +0200, Michal Hocko wrote: >On Tue 17-04-18 14:24:54, Petr Mladek wrote: >[...] >> Back to the trend. Last week I got autosel mails even for >> patches that were still being discussed, had issues, and >> were far from upstream: >> >> https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com >> https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com >> >> It might be a good idea if the mail asked to add Fixes: tag >> or stable mailing list. But the mail suggested to add the >> unfinished patch into stable branch directly (even before >> upstreaming?). > >Well, I think that poking subsystems which ignore stable trees with such >emails early during review might be quite helpful. Maybe people start >marking for stable and we do not need the guessing later. I wouldn't >bother poking those who are known to mark stable patches though. Yup, mm/ needs far less poking that XFS (for example). What makes mm/ so good about this is that it's a rather small set of devs who are good at marking things for stable. As long as the commit came from one of these "core" mm/ folks it's almost guaranteed to have proper stable tags. But mm/ commits don't come only from these people. Here's a concrete example we can discuss: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c61611f70958d86f659bca25c02ae69413747a8d This was merged in a few days ago, and seems relevant for older kernel trees as well. Should it not have a stable tag?
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote: >On Mon 16-04-18 17:23:30, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: >> >> >On Mon, 16 Apr 2018 16:19:14 + >> >> >Sasha Levinwrote: >> >> > >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as >> >> >> >broken as mainline? >> >> >> >> >> >> This just means that if there is a fix that went in mainline, and the >> >> >> fix is broken somehow, we'd rather take the broken fix than not. >> >> >> >> >> >> In this scenario, *something* will be broken, it's just a matter of >> >> >> what. We'd rather have the same thing broken between mainline and >> >> >> stable. >> >> > >> >> >Honestly, I think that removes all value of the stable series. I >> >> >remember when the stable series were first created. People were saying >> >> >that it wouldn't even get to more than 5 versions, because the bar for >> >> >backporting was suppose to be very high. Today it's just a fork of the >> >> >kernel at a given version. No more features, but we will be OK with >> >> >regressions. I'm struggling to see what the benefit of it is suppose to >> >> >be? >> >> >> >> It's not "OK with regressions". >> >> >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has >> >> a broken printf() behaviour so that when you: >> >> >> >> pr_err("%d", 5) >> >> >> >> Would print: >> >> >> >> "Microsoft Rulez" >> >> >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you >> >> might expect. But alas, with your patch, running: >> >> >> >> pr_err("%s", "hi!") >> >> >> >> Would show a cat picture for 5 seconds. >> >> >> >> Should we take your patch in -stable or not? If we don't, we're stuck >> >> with the original issue while the mainline kernel will behave >> >> differently, but if we do - we introduce a new regression. >> > >> >Of course not. >> > >> >- It must be obviously correct and tested. >> > >> >If it introduces new bug, it is not correct, and certainly not >> >obviously correct. >> >> As you might have noticed, we don't strictly follow the rules. >> >> Take a look at the whole PTI story as an example. It's way more than 100 >> lines, it's not obviously corrent, it fixed more than 1 thing, and so >> on, and yet it went in -stable! >> >> Would you argue we shouldn't have backported PTI to -stable? > >So I agree with that being backported. But I think this nicely demostrates >a point some people are trying to make in this thread. We do take fixes >with high risk or regression if they fix serious enough issue. Also we do >take fixes to non-serious stuff (such as addition of device ID) if the >chances of regression are really low. > >So IMHO the metric for including the fix is not solely "how annoying to >user this can be" but rather something like: > >score = (how annoying the bug is) * ((1 / (chance of regression due to > including this)) - 1)^3 > >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' >and 'regression chance' parts are subjective and sometimes difficult to >estimate so don't take the formula too seriously but it demonstrates the >point. I think we all agree we want to fix annoying stuff and we don't want >regressions. But you need to somehow weight this over your expected >userbase - and this is where your argument "but someone might be annoyed by >LEDs not working so let's include it" has problems - it should rather be >"is the annoyance of non-working leds over expected user base high enough >to risk a regression due to this patch for someone in the expected user >base"? The answer to this second question is not clear at all to a casual >reviewer and that's why we IMHO have CC stable tag as maintainer is >supposed to have at least a bit better clue. We may be able to guesstimate the 'regression chance', but there's no way we can guess the 'annoyance' once. There are so many different use cases that we just can't even guess how many people would get "annoyed" by something. Even regression chance is tricky, look at the commits I've linked earlier in the thread. Even the most trivial looking commits that end up in stable have a chance for regression. >Another point I wanted to make is that if chance a patch causes a >regression is about 2% as you said somewhere else in a thread, then by >adding 20 patches that "may fix a bug that is annoying for someone" you've >just increased a chance there's a regression in the release by 34%. And So I've said that the rejection rate is less than 2%. This includes all commits that I have proposed for -stable, but didn't end up being included in -stable. This includes commits that the author/maintainers NACKed, commits that didn't do anything on older kernels, commits that were buggy but were caught before the kernel was released, commits
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, Apr 17, 2018 at 01:41:44PM +0200, Jan Kara wrote: >On Mon 16-04-18 17:23:30, Sasha Levin wrote: >> On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: >> >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: >> >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: >> >> >On Mon, 16 Apr 2018 16:19:14 + >> >> >Sasha Levin wrote: >> >> > >> >> >> >Wait! What does that mean? What's the purpose of stable if it is as >> >> >> >broken as mainline? >> >> >> >> >> >> This just means that if there is a fix that went in mainline, and the >> >> >> fix is broken somehow, we'd rather take the broken fix than not. >> >> >> >> >> >> In this scenario, *something* will be broken, it's just a matter of >> >> >> what. We'd rather have the same thing broken between mainline and >> >> >> stable. >> >> > >> >> >Honestly, I think that removes all value of the stable series. I >> >> >remember when the stable series were first created. People were saying >> >> >that it wouldn't even get to more than 5 versions, because the bar for >> >> >backporting was suppose to be very high. Today it's just a fork of the >> >> >kernel at a given version. No more features, but we will be OK with >> >> >regressions. I'm struggling to see what the benefit of it is suppose to >> >> >be? >> >> >> >> It's not "OK with regressions". >> >> >> >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has >> >> a broken printf() behaviour so that when you: >> >> >> >> pr_err("%d", 5) >> >> >> >> Would print: >> >> >> >> "Microsoft Rulez" >> >> >> >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you >> >> might expect. But alas, with your patch, running: >> >> >> >> pr_err("%s", "hi!") >> >> >> >> Would show a cat picture for 5 seconds. >> >> >> >> Should we take your patch in -stable or not? If we don't, we're stuck >> >> with the original issue while the mainline kernel will behave >> >> differently, but if we do - we introduce a new regression. >> > >> >Of course not. >> > >> >- It must be obviously correct and tested. >> > >> >If it introduces new bug, it is not correct, and certainly not >> >obviously correct. >> >> As you might have noticed, we don't strictly follow the rules. >> >> Take a look at the whole PTI story as an example. It's way more than 100 >> lines, it's not obviously corrent, it fixed more than 1 thing, and so >> on, and yet it went in -stable! >> >> Would you argue we shouldn't have backported PTI to -stable? > >So I agree with that being backported. But I think this nicely demostrates >a point some people are trying to make in this thread. We do take fixes >with high risk or regression if they fix serious enough issue. Also we do >take fixes to non-serious stuff (such as addition of device ID) if the >chances of regression are really low. > >So IMHO the metric for including the fix is not solely "how annoying to >user this can be" but rather something like: > >score = (how annoying the bug is) * ((1 / (chance of regression due to > including this)) - 1)^3 > >(constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' >and 'regression chance' parts are subjective and sometimes difficult to >estimate so don't take the formula too seriously but it demonstrates the >point. I think we all agree we want to fix annoying stuff and we don't want >regressions. But you need to somehow weight this over your expected >userbase - and this is where your argument "but someone might be annoyed by >LEDs not working so let's include it" has problems - it should rather be >"is the annoyance of non-working leds over expected user base high enough >to risk a regression due to this patch for someone in the expected user >base"? The answer to this second question is not clear at all to a casual >reviewer and that's why we IMHO have CC stable tag as maintainer is >supposed to have at least a bit better clue. We may be able to guesstimate the 'regression chance', but there's no way we can guess the 'annoyance' once. There are so many different use cases that we just can't even guess how many people would get "annoyed" by something. Even regression chance is tricky, look at the commits I've linked earlier in the thread. Even the most trivial looking commits that end up in stable have a chance for regression. >Another point I wanted to make is that if chance a patch causes a >regression is about 2% as you said somewhere else in a thread, then by >adding 20 patches that "may fix a bug that is annoying for someone" you've >just increased a chance there's a regression in the release by 34%. And So I've said that the rejection rate is less than 2%. This includes all commits that I have proposed for -stable, but didn't end up being included in -stable. This includes commits that the author/maintainers NACKed, commits that didn't do anything on older kernels, commits that were buggy but were caught before the kernel was released, commits that failed to build on an
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 14:24:54, Petr Mladek wrote: [...] > Back to the trend. Last week I got autosel mails even for > patches that were still being discussed, had issues, and > were far from upstream: > > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > > It might be a good idea if the mail asked to add Fixes: tag > or stable mailing list. But the mail suggested to add the > unfinished patch into stable branch directly (even before > upstreaming?). Well, I think that poking subsystems which ignore stable trees with such emails early during review might be quite helpful. Maybe people start marking for stable and we do not need the guessing later. I wouldn't bother poking those who are known to mark stable patches though. -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 14:24:54, Petr Mladek wrote: [...] > Back to the trend. Last week I got autosel mails even for > patches that were still being discussed, had issues, and > were far from upstream: > > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com > > It might be a good idea if the mail asked to add Fixes: tag > or stable mailing list. But the mail suggested to add the > unfinished patch into stable branch directly (even before > upstreaming?). Well, I think that poking subsystems which ignore stable trees with such emails early during review might be quite helpful. Maybe people start marking for stable and we do not need the guessing later. I wouldn't bother poking those who are known to mark stable patches though. -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 12:46:37, Greg KH wrote: > Oh, I know why, suddenly subsystems that never were taking the time to > mark patches for stable are getting patches backported and are getting > nervous. Yes, I am getting nervous because of this. The number of printk fixes nominated for stable is increasing exponentially (just my feeling) during last few months. The problem is that I want to be responsible and think about possible regressions. Sometimes it requires checking the state of the particular kernel release. The older code base the more complicated the decision is. You might argue that backporting the fixes helps to get the same code in all supported code bases. But it is not true. It never will be the same. Anyway, in the past the "automatically" nominated printk fixes were trivial. They did not cause harm. But they also were not worth it, IMHO. They fixed corner cases that were there for ages. Most of these fixes were found by code review when working on a feature. They were not backed by bug reports. Last week, autosel nominated pretty non-trivial patch (started this thread). It partly solved a problem we tried to fix last few years. On one side, this was an annoying problem that motivated several people spend a lot of time on it. This might be a motivation for a backport. On the other hand, it took many years to come somewhere. The main problem was the fear of regressions. We fixed/improved many things in the mean time. It shows that the problem really is not trivial. The same is true for the fix. We did our best to avoid regressions. But it does not mean that there are none. Also it does not mean that it will really give better results in all situations. I really do not see a reason to hurry and backport this to the older kernel releases. It means to spread the fix but also eventual problems. It is easy to miss a dependant patch. The less trivial fix, the more possible problems are there. Back to the trend. Last week I got autosel mails even for patches that were still being discussed, had issues, and were far from upstream: https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com It might be a good idea if the mail asked to add Fixes: tag or stable mailing list. But the mail suggested to add the unfinished patch into stable branch directly (even before upstreaming?). Now, there are only hand full of printk patches in each release, so it is still doable. I just do not understand how other maintainers, from much more busy subsystems, could cope with this trend. By other words. If you want to automatize patch nomination, you might need to automatize also patch review. Or you need to keep the patch rate low. This might mean to nominate only important and rather trivial fixes. Best Regards, Petr
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 2018-04-17 12:46:37, Greg KH wrote: > Oh, I know why, suddenly subsystems that never were taking the time to > mark patches for stable are getting patches backported and are getting > nervous. Yes, I am getting nervous because of this. The number of printk fixes nominated for stable is increasing exponentially (just my feeling) during last few months. The problem is that I want to be responsible and think about possible regressions. Sometimes it requires checking the state of the particular kernel release. The older code base the more complicated the decision is. You might argue that backporting the fixes helps to get the same code in all supported code bases. But it is not true. It never will be the same. Anyway, in the past the "automatically" nominated printk fixes were trivial. They did not cause harm. But they also were not worth it, IMHO. They fixed corner cases that were there for ages. Most of these fixes were found by code review when working on a feature. They were not backed by bug reports. Last week, autosel nominated pretty non-trivial patch (started this thread). It partly solved a problem we tried to fix last few years. On one side, this was an annoying problem that motivated several people spend a lot of time on it. This might be a motivation for a backport. On the other hand, it took many years to come somewhere. The main problem was the fear of regressions. We fixed/improved many things in the mean time. It shows that the problem really is not trivial. The same is true for the fix. We did our best to avoid regressions. But it does not mean that there are none. Also it does not mean that it will really give better results in all situations. I really do not see a reason to hurry and backport this to the older kernel releases. It means to spread the fix but also eventual problems. It is easy to miss a dependant patch. The less trivial fix, the more possible problems are there. Back to the trend. Last week I got autosel mails even for patches that were still being discussed, had issues, and were far from upstream: https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com It might be a good idea if the mail asked to add Fixes: tag or stable mailing list. But the mail suggested to add the unfinished patch into stable branch directly (even before upstreaming?). Now, there are only hand full of printk patches in each release, so it is still doable. I just do not understand how other maintainers, from much more busy subsystems, could cope with this trend. By other words. If you want to automatize patch nomination, you might need to automatize also patch review. Or you need to keep the patch rate low. This might mean to nominate only important and rather trivial fixes. Best Regards, Petr
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon 16-04-18 17:23:30, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: > >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: > >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: > >> >On Mon, 16 Apr 2018 16:19:14 + > >> >Sasha Levinwrote: > >> > > >> >> >Wait! What does that mean? What's the purpose of stable if it is as > >> >> >broken as mainline? > >> >> > >> >> This just means that if there is a fix that went in mainline, and the > >> >> fix is broken somehow, we'd rather take the broken fix than not. > >> >> > >> >> In this scenario, *something* will be broken, it's just a matter of > >> >> what. We'd rather have the same thing broken between mainline and > >> >> stable. > >> > > >> >Honestly, I think that removes all value of the stable series. I > >> >remember when the stable series were first created. People were saying > >> >that it wouldn't even get to more than 5 versions, because the bar for > >> >backporting was suppose to be very high. Today it's just a fork of the > >> >kernel at a given version. No more features, but we will be OK with > >> >regressions. I'm struggling to see what the benefit of it is suppose to > >> >be? > >> > >> It's not "OK with regressions". > >> > >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has > >> a broken printf() behaviour so that when you: > >> > >>pr_err("%d", 5) > >> > >> Would print: > >> > >>"Microsoft Rulez" > >> > >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you > >> might expect. But alas, with your patch, running: > >> > >>pr_err("%s", "hi!") > >> > >> Would show a cat picture for 5 seconds. > >> > >> Should we take your patch in -stable or not? If we don't, we're stuck > >> with the original issue while the mainline kernel will behave > >> differently, but if we do - we introduce a new regression. > > > >Of course not. > > > >- It must be obviously correct and tested. > > > >If it introduces new bug, it is not correct, and certainly not > >obviously correct. > > As you might have noticed, we don't strictly follow the rules. > > Take a look at the whole PTI story as an example. It's way more than 100 > lines, it's not obviously corrent, it fixed more than 1 thing, and so > on, and yet it went in -stable! > > Would you argue we shouldn't have backported PTI to -stable? So I agree with that being backported. But I think this nicely demostrates a point some people are trying to make in this thread. We do take fixes with high risk or regression if they fix serious enough issue. Also we do take fixes to non-serious stuff (such as addition of device ID) if the chances of regression are really low. So IMHO the metric for including the fix is not solely "how annoying to user this can be" but rather something like: score = (how annoying the bug is) * ((1 / (chance of regression due to including this)) - 1)^3 (constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' and 'regression chance' parts are subjective and sometimes difficult to estimate so don't take the formula too seriously but it demonstrates the point. I think we all agree we want to fix annoying stuff and we don't want regressions. But you need to somehow weight this over your expected userbase - and this is where your argument "but someone might be annoyed by LEDs not working so let's include it" has problems - it should rather be "is the annoyance of non-working leds over expected user base high enough to risk a regression due to this patch for someone in the expected user base"? The answer to this second question is not clear at all to a casual reviewer and that's why we IMHO have CC stable tag as maintainer is supposed to have at least a bit better clue. Another point I wanted to make is that if chance a patch causes a regression is about 2% as you said somewhere else in a thread, then by adding 20 patches that "may fix a bug that is annoying for someone" you've just increased a chance there's a regression in the release by 34%. And this is not just a math game, this also roughly matches a real experience with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such regression chance? And I also note that for a regression to get reported so that it gets included into your 2% estimate of a patch regression rate, someone must be bothered enough by it to triage it and send an email somewhere so that already falls into a category of "serious" stuff to me. So these are the reasons why I think that merging tons of patches into stable isn't actually very good. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon 16-04-18 17:23:30, Sasha Levin wrote: > On Mon, Apr 16, 2018 at 07:06:04PM +0200, Pavel Machek wrote: > >On Mon 2018-04-16 16:37:56, Sasha Levin wrote: > >> On Mon, Apr 16, 2018 at 12:30:19PM -0400, Steven Rostedt wrote: > >> >On Mon, 16 Apr 2018 16:19:14 + > >> >Sasha Levin wrote: > >> > > >> >> >Wait! What does that mean? What's the purpose of stable if it is as > >> >> >broken as mainline? > >> >> > >> >> This just means that if there is a fix that went in mainline, and the > >> >> fix is broken somehow, we'd rather take the broken fix than not. > >> >> > >> >> In this scenario, *something* will be broken, it's just a matter of > >> >> what. We'd rather have the same thing broken between mainline and > >> >> stable. > >> > > >> >Honestly, I think that removes all value of the stable series. I > >> >remember when the stable series were first created. People were saying > >> >that it wouldn't even get to more than 5 versions, because the bar for > >> >backporting was suppose to be very high. Today it's just a fork of the > >> >kernel at a given version. No more features, but we will be OK with > >> >regressions. I'm struggling to see what the benefit of it is suppose to > >> >be? > >> > >> It's not "OK with regressions". > >> > >> Let's look at a hypothetical example: You have a 4.15.1 kernel that has > >> a broken printf() behaviour so that when you: > >> > >>pr_err("%d", 5) > >> > >> Would print: > >> > >>"Microsoft Rulez" > >> > >> Bad, right? So you went ahead and fixed it, and now it prints "5" as you > >> might expect. But alas, with your patch, running: > >> > >>pr_err("%s", "hi!") > >> > >> Would show a cat picture for 5 seconds. > >> > >> Should we take your patch in -stable or not? If we don't, we're stuck > >> with the original issue while the mainline kernel will behave > >> differently, but if we do - we introduce a new regression. > > > >Of course not. > > > >- It must be obviously correct and tested. > > > >If it introduces new bug, it is not correct, and certainly not > >obviously correct. > > As you might have noticed, we don't strictly follow the rules. > > Take a look at the whole PTI story as an example. It's way more than 100 > lines, it's not obviously corrent, it fixed more than 1 thing, and so > on, and yet it went in -stable! > > Would you argue we shouldn't have backported PTI to -stable? So I agree with that being backported. But I think this nicely demostrates a point some people are trying to make in this thread. We do take fixes with high risk or regression if they fix serious enough issue. Also we do take fixes to non-serious stuff (such as addition of device ID) if the chances of regression are really low. So IMHO the metric for including the fix is not solely "how annoying to user this can be" but rather something like: score = (how annoying the bug is) * ((1 / (chance of regression due to including this)) - 1)^3 (constants are somewhat arbitrary subject to tuning ;). Now both 'annoying' and 'regression chance' parts are subjective and sometimes difficult to estimate so don't take the formula too seriously but it demonstrates the point. I think we all agree we want to fix annoying stuff and we don't want regressions. But you need to somehow weight this over your expected userbase - and this is where your argument "but someone might be annoyed by LEDs not working so let's include it" has problems - it should rather be "is the annoyance of non-working leds over expected user base high enough to risk a regression due to this patch for someone in the expected user base"? The answer to this second question is not clear at all to a casual reviewer and that's why we IMHO have CC stable tag as maintainer is supposed to have at least a bit better clue. Another point I wanted to make is that if chance a patch causes a regression is about 2% as you said somewhere else in a thread, then by adding 20 patches that "may fix a bug that is annoying for someone" you've just increased a chance there's a regression in the release by 34%. And this is not just a math game, this also roughly matches a real experience with maintaining our enterprise kernels. Do 20 "maybe" fixes outweight such regression chance? And I also note that for a regression to get reported so that it gets included into your 2% estimate of a patch regression rate, someone must be bothered enough by it to triage it and send an email somewhere so that already falls into a category of "serious" stuff to me. So these are the reasons why I think that merging tons of patches into stable isn't actually very good. Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 17 Apr 2018, Greg KH wrote: > It already is that :) I have a question: I guess a stable team has an idea who they are preparing the tree for, IOW who is the target consumer. Who is it? Certainly it's not major distros, as both RH and SUSE already stated that they are either not basing off the stable kernel (only cherry-pick fixes from it) (RH), or are quite far in the process of moving away from stable tree towards combination of what RH is doing + semi-automated evaluation of Fixes: tag (SUSE). If the target audience is somewhere else, that's perfectly fine, but then it'd have to be stated explicitly I guess. I can't speak for RH, but for us (at least me personally), the pace of patches flowing into -stable is way too high for us to keep control of what is landing in our tree. In some sense, stability should be equivalent to "minimal necessary amout of super-critical changes". That's not what this "let's backport proactively almost everything that has word 'fixes' in changelog" (I'm a bit exaggerating of course) seems to be about. Again, the rules stated out in Documentation/process/stable-kernel-rules.rst are very nice, and are exactly something at least we would be very happy about. They have the nice hidden asumption in them, that someone actually has to actively invest human brain power to think about the fix, its consequences, prerequisities, etc. Not just doing a big dump of all commits that "might fix something". How many of the actual patches flowing into -stable would satisfy those criteria these days? IOW, I'm pretty sure our users are much happier with us supplying them reactive fixes than pro-active uncertainity. > The problem Sasha is trying to solve here is that for many subsystems, > maintainers do not mark patches for stable at all. The pressure on those subsystems should be coming from unhappy users (being it end-users or vendors redistributing the tree) of the stable tree, who would be complaining about missing fixes for those subsystems. Is this actually happening? Where? > Oh, and if you do want to complain about huge new features being > backported, look at the mess that Spectre and Meltdown has caused in the > stable trees. I don't see anyone complaining about those massive > changes :) Umm, sorry, how is this related? There simply was no other way, and I took it for given that this is seen by everybody involved as an absolute exception, due to the nature of the issue and of the massive changes that were needed. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue, 17 Apr 2018, Greg KH wrote: > It already is that :) I have a question: I guess a stable team has an idea who they are preparing the tree for, IOW who is the target consumer. Who is it? Certainly it's not major distros, as both RH and SUSE already stated that they are either not basing off the stable kernel (only cherry-pick fixes from it) (RH), or are quite far in the process of moving away from stable tree towards combination of what RH is doing + semi-automated evaluation of Fixes: tag (SUSE). If the target audience is somewhere else, that's perfectly fine, but then it'd have to be stated explicitly I guess. I can't speak for RH, but for us (at least me personally), the pace of patches flowing into -stable is way too high for us to keep control of what is landing in our tree. In some sense, stability should be equivalent to "minimal necessary amout of super-critical changes". That's not what this "let's backport proactively almost everything that has word 'fixes' in changelog" (I'm a bit exaggerating of course) seems to be about. Again, the rules stated out in Documentation/process/stable-kernel-rules.rst are very nice, and are exactly something at least we would be very happy about. They have the nice hidden asumption in them, that someone actually has to actively invest human brain power to think about the fix, its consequences, prerequisities, etc. Not just doing a big dump of all commits that "might fix something". How many of the actual patches flowing into -stable would satisfy those criteria these days? IOW, I'm pretty sure our users are much happier with us supplying them reactive fixes than pro-active uncertainity. > The problem Sasha is trying to solve here is that for many subsystems, > maintainers do not mark patches for stable at all. The pressure on those subsystems should be coming from unhappy users (being it end-users or vendors redistributing the tree) of the stable tree, who would be complaining about missing fixes for those subsystems. Is this actually happening? Where? > Oh, and if you do want to complain about huge new features being > backported, look at the mess that Spectre and Meltdown has caused in the > stable trees. I don't see anyone complaining about those massive > changes :) Umm, sorry, how is this related? There simply was no other way, and I took it for given that this is seen by everybody involved as an absolute exception, due to the nature of the issue and of the massive changes that were needed. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 12:39:36, Greg KH wrote: > On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > > On Mon, 16 Apr 2018, Sasha Levin wrote: > > > > > I agree that as an enterprise distro taking everything from -stable > > > isn't the best idea. Ideally you'd want to be close to the first > > > extreme you've mentioned and only take commits if customers are asking > > > you to do so. > > > > > > I think that the rule we're trying to agree upon is the "It must fix > > > a real bug that bothers people". > > > > > > I think that we can agree that it's impossible to expect every single > > > Linux user to go on LKML and complain about a bug he encountered, so the > > > rule quickly becomes "It must fix a real bug that can bother people". > > > > So is there a reason why stable couldn't become some hybrid-form union of > > > > - really critical issues (data corruption, boot issues, severe security > > issues) taken from bleeding edge upstream > > - [reviewed] cherry-picks of functional fixes from major distro kernels > > (based on that very -stable release), as that's apparently what people > > are hitting in the real world with that particular kernel > > It already is that :) > > The problem Sasha is trying to solve here is that for many subsystems, > maintainers do not mark patches for stable at all. The way he is trying to do that is just wrong. Generate a pressure on those subsystems by referring to bug reports and unhappy users and I am pretty sure they will try harder... You cannot solve the problem by bypassing them without having deep understanding of the specific subsytem. Once you have it, just make sure you are part of the review process and make sure to mark patches before they are merged. > So real bugfixes > that do hit people are not getting to those kernels, which force the > distros to do extra work to triage a bug, dig through upstream kernels, > find and apply the patch. I would say that this is the primary role of the distro. To hide the jungle of the upstream work and provide the additional of bug filtering and forwarding them the right direction. > By identifying the patches that should have been marked for stable, > based on the ways that the changelog text is written and the logic in > the patch itself, we circumvent that extra annoyance of users hitting > problems and complaining, or ignoring them and hoping they go away if > they reboot. Well, but this is a two edge sword. You are not only backporting obvious bug fixes but also pulling many patch out of the context they were merged to and double checking all the assumptions are still true is a non-trivial task to do. I am still not convinced any script or AI can do that right now. > I've been doing this "by hand" for many years now, with no complaints so > far. Really? I remember quite some complains about broken stable releases and also many discussions on KS how the current workflow doesn't really work for some users (e.g. distributions). > Sasha has taken it to the next level as I don't scale and has > started to automate it using some really nice tools. That's all, this > isn't crazy new features being backported, it's just patches that are > obviously fixes being added to the stable tree. I have yet to see a tool which can recognize an "obvious fix". Seriously! Matching keywords in the changelog and some pattern recognition in the diff can help to do some pre filtering _can_ help a lot but there is still a human interaction needed to do sanity checking. And that really requires deep subsystem knowledge. I really fail to see how that can work without relevant people involvement. Pretending that you can do stable without maintainers will simply not work IMNHO. > Yes, sometimes those fixes need additional fixes, and that's fine, > normal stable-marked patches need that all the time. I don't see anyone > complaining about that, right? > > So nothing "new" is happening here, EXCEPT we are actually starting to > get a better kernel-wide coverage for stable fixes, which we have not > had in the past. That's a good thing! The number of patches applied to > stable is still a very very very tiny % compared to mainline, so nothing > new is happening here. yes I do agree, the stable process is not very much different from the past and I would tend both processes broken because they explicitly try to avoid maintainers which is just wrong. > Oh, and if you do want to complain about huge new features being > backported, look at the mess that Spectre and Meltdown has caused in the > stable trees. I don't see anyone complaining about those massive > changes :) Are you serious? Are you going the compare the biggest PITA that the community had to undergo because of HW issues with random pattern matching in changelog/diffs? Come on! -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Tue 17-04-18 12:39:36, Greg KH wrote: > On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > > On Mon, 16 Apr 2018, Sasha Levin wrote: > > > > > I agree that as an enterprise distro taking everything from -stable > > > isn't the best idea. Ideally you'd want to be close to the first > > > extreme you've mentioned and only take commits if customers are asking > > > you to do so. > > > > > > I think that the rule we're trying to agree upon is the "It must fix > > > a real bug that bothers people". > > > > > > I think that we can agree that it's impossible to expect every single > > > Linux user to go on LKML and complain about a bug he encountered, so the > > > rule quickly becomes "It must fix a real bug that can bother people". > > > > So is there a reason why stable couldn't become some hybrid-form union of > > > > - really critical issues (data corruption, boot issues, severe security > > issues) taken from bleeding edge upstream > > - [reviewed] cherry-picks of functional fixes from major distro kernels > > (based on that very -stable release), as that's apparently what people > > are hitting in the real world with that particular kernel > > It already is that :) > > The problem Sasha is trying to solve here is that for many subsystems, > maintainers do not mark patches for stable at all. The way he is trying to do that is just wrong. Generate a pressure on those subsystems by referring to bug reports and unhappy users and I am pretty sure they will try harder... You cannot solve the problem by bypassing them without having deep understanding of the specific subsytem. Once you have it, just make sure you are part of the review process and make sure to mark patches before they are merged. > So real bugfixes > that do hit people are not getting to those kernels, which force the > distros to do extra work to triage a bug, dig through upstream kernels, > find and apply the patch. I would say that this is the primary role of the distro. To hide the jungle of the upstream work and provide the additional of bug filtering and forwarding them the right direction. > By identifying the patches that should have been marked for stable, > based on the ways that the changelog text is written and the logic in > the patch itself, we circumvent that extra annoyance of users hitting > problems and complaining, or ignoring them and hoping they go away if > they reboot. Well, but this is a two edge sword. You are not only backporting obvious bug fixes but also pulling many patch out of the context they were merged to and double checking all the assumptions are still true is a non-trivial task to do. I am still not convinced any script or AI can do that right now. > I've been doing this "by hand" for many years now, with no complaints so > far. Really? I remember quite some complains about broken stable releases and also many discussions on KS how the current workflow doesn't really work for some users (e.g. distributions). > Sasha has taken it to the next level as I don't scale and has > started to automate it using some really nice tools. That's all, this > isn't crazy new features being backported, it's just patches that are > obviously fixes being added to the stable tree. I have yet to see a tool which can recognize an "obvious fix". Seriously! Matching keywords in the changelog and some pattern recognition in the diff can help to do some pre filtering _can_ help a lot but there is still a human interaction needed to do sanity checking. And that really requires deep subsystem knowledge. I really fail to see how that can work without relevant people involvement. Pretending that you can do stable without maintainers will simply not work IMNHO. > Yes, sometimes those fixes need additional fixes, and that's fine, > normal stable-marked patches need that all the time. I don't see anyone > complaining about that, right? > > So nothing "new" is happening here, EXCEPT we are actually starting to > get a better kernel-wide coverage for stable fixes, which we have not > had in the past. That's a good thing! The number of patches applied to > stable is still a very very very tiny % compared to mainline, so nothing > new is happening here. yes I do agree, the stable process is not very much different from the past and I would tend both processes broken because they explicitly try to avoid maintainers which is just wrong. > Oh, and if you do want to complain about huge new features being > backported, look at the mess that Spectre and Meltdown has caused in the > stable trees. I don't see anyone complaining about those massive > changes :) Are you serious? Are you going the compare the biggest PITA that the community had to undergo because of HW issues with random pattern matching in changelog/diffs? Come on! -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 06:54:51PM +0200, Pavel Machek wrote: > On Mon 2018-04-16 16:45:16, Sasha Levin wrote: > > On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote: > > >On Mon 2018-04-16 16:39:20, Sasha Levin wrote: > > >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote: > > >> > > > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't > > >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe > > >> >> >> designate some subsystems that are more "important" than others, > > >> >> >> but > > >> >> >> what's the point? > > >> >> > > > >> >> >There's a tradeoff. > > >> >> > > > >> >> >You want to fix serious bugs in stable, and you really don't want > > >> >> >regressions in stable. And ... stable not having 1000s of patches > > >> >> >would be nice, too. > > >> >> > > >> >> I don't think we should use a number cap here, but rather look at the > > >> >> regression rate: how many patches broke something? > > >> >> > > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we > > >> >> were > > >> >> seeing before AUTOSEL, what's the problem it's causing? > > >> > > > >> >Regression rate should not be the only criteria. > > >> > > > >> >More patches mean bigger chance customer's patches will have a > > >> >conflict with something in -stable, for example. > > >> > > >> Out of tree patches can't be a consideration here. There are no > > >> guarantees for out of tree code, ever. > > > > > >Out of tree code is not consideration for mainline, agreed. Stable > > >should be different. > > > > This is a discussion we could have with in right forum, but FYI stable > > doesn't even guarantee KABI compatibility between minor versions at this > > point. > > Stable should be useful base for distributions. They carry out of tree > patches, and yes, you should try to make their lives easy. How do you know I already don't do that? But no, in the end, it's not my job to make their life easier if they are off in their own corner never providing me feedback or help. For those companies/distros/SoCs that do provide that feedback, I am glad to work with them. As proof of that, there are at least 3 "major" SoC vendors that have been merging every one of the stable releases into their internal trees for the past 6+ months now. I get reports when they do the merge and test, and so far, we have only had 1 regression. And that regression was because that SoC vendor forgot to upstream a patch that they had in their internal tree (i.e. they fixed it a while ago but forgot to tell anyone else, nothing we can do about that.) So if you are a distro/company/whatever that takes stable releases, and have run into problems, please let me know, and I will be glad to work with you. If you are not that, then please don't attempt to speak for them... thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 06:54:51PM +0200, Pavel Machek wrote: > On Mon 2018-04-16 16:45:16, Sasha Levin wrote: > > On Mon, Apr 16, 2018 at 06:42:30PM +0200, Pavel Machek wrote: > > >On Mon 2018-04-16 16:39:20, Sasha Levin wrote: > > >> On Mon, Apr 16, 2018 at 06:28:50PM +0200, Pavel Machek wrote: > > >> > > > >> >> >> Is there a reason not to take LED fixes if they fix a bug and don't > > >> >> >> cause a regression? Sure, we can draw some arbitrary line, maybe > > >> >> >> designate some subsystems that are more "important" than others, > > >> >> >> but > > >> >> >> what's the point? > > >> >> > > > >> >> >There's a tradeoff. > > >> >> > > > >> >> >You want to fix serious bugs in stable, and you really don't want > > >> >> >regressions in stable. And ... stable not having 1000s of patches > > >> >> >would be nice, too. > > >> >> > > >> >> I don't think we should use a number cap here, but rather look at the > > >> >> regression rate: how many patches broke something? > > >> >> > > >> >> Since the rate we're seeing now with AUTOSEL is similar to what we > > >> >> were > > >> >> seeing before AUTOSEL, what's the problem it's causing? > > >> > > > >> >Regression rate should not be the only criteria. > > >> > > > >> >More patches mean bigger chance customer's patches will have a > > >> >conflict with something in -stable, for example. > > >> > > >> Out of tree patches can't be a consideration here. There are no > > >> guarantees for out of tree code, ever. > > > > > >Out of tree code is not consideration for mainline, agreed. Stable > > >should be different. > > > > This is a discussion we could have with in right forum, but FYI stable > > doesn't even guarantee KABI compatibility between minor versions at this > > point. > > Stable should be useful base for distributions. They carry out of tree > patches, and yes, you should try to make their lives easy. How do you know I already don't do that? But no, in the end, it's not my job to make their life easier if they are off in their own corner never providing me feedback or help. For those companies/distros/SoCs that do provide that feedback, I am glad to work with them. As proof of that, there are at least 3 "major" SoC vendors that have been merging every one of the stable releases into their internal trees for the past 6+ months now. I get reports when they do the merge and test, and so far, we have only had 1 regression. And that regression was because that SoC vendor forgot to upstream a patch that they had in their internal tree (i.e. they fixed it a while ago but forgot to tell anyone else, nothing we can do about that.) So if you are a distro/company/whatever that takes stable releases, and have run into problems, please let me know, and I will be glad to work with you. If you are not that, then please don't attempt to speak for them... thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 07:00:10PM +0200, Pavel Machek wrote: > Hi! > > > >> Let me ask my wife (who is happy using Linux as a regular desktop user) > > >> how comfortable she would be with triaging kernel bugs... > > > > > >That's really up to the distribution, not the main kernel stable. Does > > >she download and compile the kernels herself? Does she use LEDs? > > > > > >The point is, stable is to keep what was working continued working. > > >If we don't care about introducing a regression, and just want to keep > > >regressions the same as mainline, why not just go to mainline? That way > > >you can also get the new features? Mainline already has the mantra to > > >not break user space. When I work on new features, I sometimes stumble > > >on bugs with the current features. And some of those fixes require a > > >rewrite. It was "good enough" before, but every so often could cause a > > >bug that the new feature would trigger more often. Do we back port that > > >rewrite? Do we backport fixes to old code that are more likely to be > > >triggered by new features? > > > > > >Ideally, we should be working on getting to no regressions to stable. > > > > This is exactly what we're doing. > > > > If a fix for a bug in -stable introduces a different regression, > > should we take it or not? > > If a fix for bug introduces regression, would you call it "obviously > correct"? I honestly can't believe you all are arguing about this. We backport bugfixes to the stable tree. If those fixes also are buggy we either apply the fix for that problem that ended up in Linus's tree, or we revert the patch. If the fix is not in Linus's tree, sometimes we leave the "bug" in stable for a bit to apply some pressure on the developer/maintainer to get it fixed in Linus's tree (that's what I mean by being "bug compatible".) This is exactly what we have been doing for over a decade now, why are people suddenly getting upset? Oh, I know why, suddenly subsystems that never were taking the time to mark patches for stable are getting patches backported and are getting nervous. The simple way to stop that from happening is to PROPERLY MARK PATCHES FOR STABLE IN THE FIRST PLACE! If you do that, then, no "automated" patches will get selected as you already handled them all. Or if there are some automated patches picked, you can easily NAK them (like xfs does as they know better than everyone else, and honestly, I trust them, and don't run xfs myself), or do like what I do when it happens to me and go "hey, nice, I missed that one!" There, problem solved, if you do that, no more worrying by you at all, and this thread can properly die. ugh, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 07:00:10PM +0200, Pavel Machek wrote: > Hi! > > > >> Let me ask my wife (who is happy using Linux as a regular desktop user) > > >> how comfortable she would be with triaging kernel bugs... > > > > > >That's really up to the distribution, not the main kernel stable. Does > > >she download and compile the kernels herself? Does she use LEDs? > > > > > >The point is, stable is to keep what was working continued working. > > >If we don't care about introducing a regression, and just want to keep > > >regressions the same as mainline, why not just go to mainline? That way > > >you can also get the new features? Mainline already has the mantra to > > >not break user space. When I work on new features, I sometimes stumble > > >on bugs with the current features. And some of those fixes require a > > >rewrite. It was "good enough" before, but every so often could cause a > > >bug that the new feature would trigger more often. Do we back port that > > >rewrite? Do we backport fixes to old code that are more likely to be > > >triggered by new features? > > > > > >Ideally, we should be working on getting to no regressions to stable. > > > > This is exactly what we're doing. > > > > If a fix for a bug in -stable introduces a different regression, > > should we take it or not? > > If a fix for bug introduces regression, would you call it "obviously > correct"? I honestly can't believe you all are arguing about this. We backport bugfixes to the stable tree. If those fixes also are buggy we either apply the fix for that problem that ended up in Linus's tree, or we revert the patch. If the fix is not in Linus's tree, sometimes we leave the "bug" in stable for a bit to apply some pressure on the developer/maintainer to get it fixed in Linus's tree (that's what I mean by being "bug compatible".) This is exactly what we have been doing for over a decade now, why are people suddenly getting upset? Oh, I know why, suddenly subsystems that never were taking the time to mark patches for stable are getting patches backported and are getting nervous. The simple way to stop that from happening is to PROPERLY MARK PATCHES FOR STABLE IN THE FIRST PLACE! If you do that, then, no "automated" patches will get selected as you already handled them all. Or if there are some automated patches picked, you can easily NAK them (like xfs does as they know better than everyone else, and honestly, I trust them, and don't run xfs myself), or do like what I do when it happens to me and go "hey, nice, I missed that one!" There, problem solved, if you do that, no more worrying by you at all, and this thread can properly die. ugh, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > On Mon, 16 Apr 2018, Sasha Levin wrote: > > > I agree that as an enterprise distro taking everything from -stable > > isn't the best idea. Ideally you'd want to be close to the first > > extreme you've mentioned and only take commits if customers are asking > > you to do so. > > > > I think that the rule we're trying to agree upon is the "It must fix > > a real bug that bothers people". > > > > I think that we can agree that it's impossible to expect every single > > Linux user to go on LKML and complain about a bug he encountered, so the > > rule quickly becomes "It must fix a real bug that can bother people". > > So is there a reason why stable couldn't become some hybrid-form union of > > - really critical issues (data corruption, boot issues, severe security > issues) taken from bleeding edge upstream > - [reviewed] cherry-picks of functional fixes from major distro kernels > (based on that very -stable release), as that's apparently what people > are hitting in the real world with that particular kernel It already is that :) The problem Sasha is trying to solve here is that for many subsystems, maintainers do not mark patches for stable at all. So real bugfixes that do hit people are not getting to those kernels, which force the distros to do extra work to triage a bug, dig through upstream kernels, find and apply the patch. By identifying the patches that should have been marked for stable, based on the ways that the changelog text is written and the logic in the patch itself, we circumvent that extra annoyance of users hitting problems and complaining, or ignoring them and hoping they go away if they reboot. I've been doing this "by hand" for many years now, with no complaints so far. Sasha has taken it to the next level as I don't scale and has started to automate it using some really nice tools. That's all, this isn't crazy new features being backported, it's just patches that are obviously fixes being added to the stable tree. Yes, sometimes those fixes need additional fixes, and that's fine, normal stable-marked patches need that all the time. I don't see anyone complaining about that, right? So nothing "new" is happening here, EXCEPT we are actually starting to get a better kernel-wide coverage for stable fixes, which we have not had in the past. That's a good thing! The number of patches applied to stable is still a very very very tiny % compared to mainline, so nothing new is happening here. Oh, and if you do want to complain about huge new features being backported, look at the mess that Spectre and Meltdown has caused in the stable trees. I don't see anyone complaining about those massive changes :) thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 11:28:44PM +0200, Jiri Kosina wrote: > On Mon, 16 Apr 2018, Sasha Levin wrote: > > > I agree that as an enterprise distro taking everything from -stable > > isn't the best idea. Ideally you'd want to be close to the first > > extreme you've mentioned and only take commits if customers are asking > > you to do so. > > > > I think that the rule we're trying to agree upon is the "It must fix > > a real bug that bothers people". > > > > I think that we can agree that it's impossible to expect every single > > Linux user to go on LKML and complain about a bug he encountered, so the > > rule quickly becomes "It must fix a real bug that can bother people". > > So is there a reason why stable couldn't become some hybrid-form union of > > - really critical issues (data corruption, boot issues, severe security > issues) taken from bleeding edge upstream > - [reviewed] cherry-picks of functional fixes from major distro kernels > (based on that very -stable release), as that's apparently what people > are hitting in the real world with that particular kernel It already is that :) The problem Sasha is trying to solve here is that for many subsystems, maintainers do not mark patches for stable at all. So real bugfixes that do hit people are not getting to those kernels, which force the distros to do extra work to triage a bug, dig through upstream kernels, find and apply the patch. By identifying the patches that should have been marked for stable, based on the ways that the changelog text is written and the logic in the patch itself, we circumvent that extra annoyance of users hitting problems and complaining, or ignoring them and hoping they go away if they reboot. I've been doing this "by hand" for many years now, with no complaints so far. Sasha has taken it to the next level as I don't scale and has started to automate it using some really nice tools. That's all, this isn't crazy new features being backported, it's just patches that are obviously fixes being added to the stable tree. Yes, sometimes those fixes need additional fixes, and that's fine, normal stable-marked patches need that all the time. I don't see anyone complaining about that, right? So nothing "new" is happening here, EXCEPT we are actually starting to get a better kernel-wide coverage for stable fixes, which we have not had in the past. That's a good thing! The number of patches applied to stable is still a very very very tiny % compared to mainline, so nothing new is happening here. Oh, and if you do want to complain about huge new features being backported, look at the mess that Spectre and Meltdown has caused in the stable trees. I don't see anyone complaining about those massive changes :) thanks, greg k-h
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, 16 Apr 2018, Sasha Levin wrote: > I agree that as an enterprise distro taking everything from -stable > isn't the best idea. Ideally you'd want to be close to the first > extreme you've mentioned and only take commits if customers are asking > you to do so. > > I think that the rule we're trying to agree upon is the "It must fix > a real bug that bothers people". > > I think that we can agree that it's impossible to expect every single > Linux user to go on LKML and complain about a bug he encountered, so the > rule quickly becomes "It must fix a real bug that can bother people". So is there a reason why stable couldn't become some hybrid-form union of - really critical issues (data corruption, boot issues, severe security issues) taken from bleeding edge upstream - [reviewed] cherry-picks of functional fixes from major distro kernels (based on that very -stable release), as that's apparently what people are hitting in the real world with that particular kernel ? Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, 16 Apr 2018, Sasha Levin wrote: > I agree that as an enterprise distro taking everything from -stable > isn't the best idea. Ideally you'd want to be close to the first > extreme you've mentioned and only take commits if customers are asking > you to do so. > > I think that the rule we're trying to agree upon is the "It must fix > a real bug that bothers people". > > I think that we can agree that it's impossible to expect every single > Linux user to go on LKML and complain about a bug he encountered, so the > rule quickly becomes "It must fix a real bug that can bother people". So is there a reason why stable couldn't become some hybrid-form union of - really critical issues (data corruption, boot issues, severe security issues) taken from bleeding edge upstream - [reviewed] cherry-picks of functional fixes from major distro kernels (based on that very -stable release), as that's apparently what people are hitting in the real world with that particular kernel ? Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, 16 Apr 2018 13:17:24 -0700 Linus Torvaldswrote: > On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt wrote: > > > > But this is going way off topic to what we were discussing. The > > discussion is about what gets backported. Is automating the process > > going to make stable better? Or is it likely to add more regressions. > > > > Sasha's response has been that his automated process has the same rate > > of regressions as what gets tagged by authors. My argument is that > > perhaps authors should tag less to stable. > > The ones who should matter most for that discussion is the distros, > since they are the actual users of stable (as well as the people doing > the work, of course - ie Sasha and Greg and the rest of the stable > gang). That was actually my final conclusion before we started out discussion ;-) http://lkml.kernel.org/r/20180416143510.79ba5...@gandalf.local.home > > And I suspect that they actually do want all the noise, and all the > stuff that isn't "critical". That's often the _easy_ choice. It's the > stuff that I suspect the stable maintainers go "this I don't even have > to think about", because it's a new driver ID or something. Although Red Hat doesn't base off of the stable kernel. At least it didn't when I was there. They may look at the stable kernel, but they make their own decisions. If we want the distros to use stable as the base, it should be the least common factor among them. Otherwise, if stable includes commits that a distro would rather not backport, then they wont use stable. > > Because the bulk of stable tends to be driver updates, afaik. Which > distros very much tend to want. > > Will developers think that their patches matter so much that they > should go to stable? Yes they will. Will they overtag as a result? > Probably. But the reverse likely also happens, where people simply > don't think about stable at all, and just want to fix a bug. > > In many ways "Fixes" is likely a better thing to check for in stable > backports, but that doesn't always exist either. > > And just judging by the amount of stable email I get - and by how > excited _I_ would be about stable work, I think "automated process" is > simply not an option. It's a _requirement_. You'd go completely crazy > if you didn't automate 99% of all the stable work. > > So can you trust the "Cc: stable" as being perfect? Hell no. But > what's your alternative? Manually selecting things for stable? Asking > the developers separately? > > Because "criticality" definitely isn't what determines it. If it was, > we'd never add driver ID's etc to stable - they're clearly not > "critical". True. But I believe the driver ID's was given the "exception". > > Yet it feels like that's sometimes those driver things are the _bulk_ > of it, and it is usually fairly safe (not quite as obviously safe as > you'd think, because a driver ID addition has occasionally meant not > just "now it's supported", but instead "now the generic driver doesn't > trigger for it any more", so it can actually break things). > > So I think - and _hope_ - that 99% of stable should be the > non-critical stuff that people don't even need to think about. > > The critical stuff is hopefully a tiny tiny percentage. Well, I'm not sure that's really the case. $ git log --oneline v4.14.33..v4.14.34 | head -20 ffebeb0d7c37 Linux 4.14.34 fdae5b620566 net/mlx4_core: Fix memory leak while delete slave's resources 9fdeb33e1913 vhost_net: add missing lock nesting notation 8c316b625705 team: move dev_mc_sync after master_upper_dev_link in team_port_add 233ba28e1862 route: check sysctl_fib_multipath_use_neigh earlier than hash 2f8aa659d4c0 vhost: validate log when IOTLB is enabled 72b880f43990 net/mlx5e: Fix traffic being dropped on VF representor 9408bceb0649 net/mlx4_en: Fix mixed PFC and Global pause user control requests 477c73abf26a strparser: Fix sign of err codes 1c71bfe84deb net/sched: fix NULL dereference on the error path of tcf_skbmod_init() a19024a3f343 net/sched: fix NULL dereference in the error path of tunnel_key_init() e096c8bf4fb8 net/mlx5e: Sync netdev vxlan ports at open baab1f0c4885 net/mlx5e: Don't override vport admin link state in switchdev mode 1ec7966ab7db ipv6: sr: fix seg6 encap performances with TSO enabled e52a45bb392f nfp: use full 40 bits of the NSP buffer address ddf79878f1e0 net/mlx5e: Fix memory usage issues in offloading TC flows 9282181c1cc5 net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh update path b9c6ddda3805 vti6: better validate user provided tunnel names 109dce20c6ed ip6_tunnel: better validate user provided tunnel names 72363c63b070 ip6_gre: better validate user provided tunnel names The majority of those appear to be on the critical side. -- Steve
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, 16 Apr 2018 13:17:24 -0700 Linus Torvalds wrote: > On Mon, Apr 16, 2018 at 1:02 PM, Steven Rostedt wrote: > > > > But this is going way off topic to what we were discussing. The > > discussion is about what gets backported. Is automating the process > > going to make stable better? Or is it likely to add more regressions. > > > > Sasha's response has been that his automated process has the same rate > > of regressions as what gets tagged by authors. My argument is that > > perhaps authors should tag less to stable. > > The ones who should matter most for that discussion is the distros, > since they are the actual users of stable (as well as the people doing > the work, of course - ie Sasha and Greg and the rest of the stable > gang). That was actually my final conclusion before we started out discussion ;-) http://lkml.kernel.org/r/20180416143510.79ba5...@gandalf.local.home > > And I suspect that they actually do want all the noise, and all the > stuff that isn't "critical". That's often the _easy_ choice. It's the > stuff that I suspect the stable maintainers go "this I don't even have > to think about", because it's a new driver ID or something. Although Red Hat doesn't base off of the stable kernel. At least it didn't when I was there. They may look at the stable kernel, but they make their own decisions. If we want the distros to use stable as the base, it should be the least common factor among them. Otherwise, if stable includes commits that a distro would rather not backport, then they wont use stable. > > Because the bulk of stable tends to be driver updates, afaik. Which > distros very much tend to want. > > Will developers think that their patches matter so much that they > should go to stable? Yes they will. Will they overtag as a result? > Probably. But the reverse likely also happens, where people simply > don't think about stable at all, and just want to fix a bug. > > In many ways "Fixes" is likely a better thing to check for in stable > backports, but that doesn't always exist either. > > And just judging by the amount of stable email I get - and by how > excited _I_ would be about stable work, I think "automated process" is > simply not an option. It's a _requirement_. You'd go completely crazy > if you didn't automate 99% of all the stable work. > > So can you trust the "Cc: stable" as being perfect? Hell no. But > what's your alternative? Manually selecting things for stable? Asking > the developers separately? > > Because "criticality" definitely isn't what determines it. If it was, > we'd never add driver ID's etc to stable - they're clearly not > "critical". True. But I believe the driver ID's was given the "exception". > > Yet it feels like that's sometimes those driver things are the _bulk_ > of it, and it is usually fairly safe (not quite as obviously safe as > you'd think, because a driver ID addition has occasionally meant not > just "now it's supported", but instead "now the generic driver doesn't > trigger for it any more", so it can actually break things). > > So I think - and _hope_ - that 99% of stable should be the > non-critical stuff that people don't even need to think about. > > The critical stuff is hopefully a tiny tiny percentage. Well, I'm not sure that's really the case. $ git log --oneline v4.14.33..v4.14.34 | head -20 ffebeb0d7c37 Linux 4.14.34 fdae5b620566 net/mlx4_core: Fix memory leak while delete slave's resources 9fdeb33e1913 vhost_net: add missing lock nesting notation 8c316b625705 team: move dev_mc_sync after master_upper_dev_link in team_port_add 233ba28e1862 route: check sysctl_fib_multipath_use_neigh earlier than hash 2f8aa659d4c0 vhost: validate log when IOTLB is enabled 72b880f43990 net/mlx5e: Fix traffic being dropped on VF representor 9408bceb0649 net/mlx4_en: Fix mixed PFC and Global pause user control requests 477c73abf26a strparser: Fix sign of err codes 1c71bfe84deb net/sched: fix NULL dereference on the error path of tcf_skbmod_init() a19024a3f343 net/sched: fix NULL dereference in the error path of tunnel_key_init() e096c8bf4fb8 net/mlx5e: Sync netdev vxlan ports at open baab1f0c4885 net/mlx5e: Don't override vport admin link state in switchdev mode 1ec7966ab7db ipv6: sr: fix seg6 encap performances with TSO enabled e52a45bb392f nfp: use full 40 bits of the NSP buffer address ddf79878f1e0 net/mlx5e: Fix memory usage issues in offloading TC flows 9282181c1cc5 net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh update path b9c6ddda3805 vti6: better validate user provided tunnel names 109dce20c6ed ip6_tunnel: better validate user provided tunnel names 72363c63b070 ip6_gre: better validate user provided tunnel names The majority of those appear to be on the critical side. -- Steve
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: >On Mon, 16 Apr 2018, Sasha Levin wrote: > >> So I think that Linus's claim that users come first applies here as >> well. If there's a user that cares about a particular feature being >> broken, then we go ahead and fix his bug rather then ignoring him. > >So one extreme is fixing -stable *iff* users actually do report an issue. > >The other extreme is backporting everything that potentially looks like a >potential fix of "something" (according to some arbitrary metric), >pro-actively. > >The former voilates the "users first" rule, the latter has a very, very >high risk of regressions. > >So this whole debate is about finding a compromise. > >My gut feeling always was that the statement in > > Documentation/process/stable-kernel-rules.rst > >is very reasonable, but making the process way more "aggresive" when >backporting patches is breaking much of its original spirit for me. I agree that as an enterprise distro taking everything from -stable isn't the best idea. Ideally you'd want to be close to the first extreme you've mentioned and only take commits if customers are asking you to do so. I think that the rule we're trying to agree upon is the "It must fix a real bug that bothers people". I think that we can agree that it's impossible to expect every single Linux user to go on LKML and complain about a bug he encountered, so the rule quickly becomes "It must fix a real bug that can bother people". My "aggressiveness" comes from the whole "bother" part: it doesn't have to be critical, it doesn't have to cause data corruption, it doesn't have to be a security issue. It's enough that the bug actually affects a user in a way he didn't expect it to (if a user doesn't have expectations, it would fall under the "This could be a problem..." exception. We can go into a discussion about what exactly "bothering" is, but on the flip side, the whole -stable tag is just a way for folks to indicate they want a given patch reviewed for stable, it's not actually a guarantee of whether the patch will go in to -stable or not.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, Apr 16, 2018 at 10:43:28PM +0200, Jiri Kosina wrote: >On Mon, 16 Apr 2018, Sasha Levin wrote: > >> So I think that Linus's claim that users come first applies here as >> well. If there's a user that cares about a particular feature being >> broken, then we go ahead and fix his bug rather then ignoring him. > >So one extreme is fixing -stable *iff* users actually do report an issue. > >The other extreme is backporting everything that potentially looks like a >potential fix of "something" (according to some arbitrary metric), >pro-actively. > >The former voilates the "users first" rule, the latter has a very, very >high risk of regressions. > >So this whole debate is about finding a compromise. > >My gut feeling always was that the statement in > > Documentation/process/stable-kernel-rules.rst > >is very reasonable, but making the process way more "aggresive" when >backporting patches is breaking much of its original spirit for me. I agree that as an enterprise distro taking everything from -stable isn't the best idea. Ideally you'd want to be close to the first extreme you've mentioned and only take commits if customers are asking you to do so. I think that the rule we're trying to agree upon is the "It must fix a real bug that bothers people". I think that we can agree that it's impossible to expect every single Linux user to go on LKML and complain about a bug he encountered, so the rule quickly becomes "It must fix a real bug that can bother people". My "aggressiveness" comes from the whole "bother" part: it doesn't have to be critical, it doesn't have to cause data corruption, it doesn't have to be a security issue. It's enough that the bug actually affects a user in a way he didn't expect it to (if a user doesn't have expectations, it would fall under the "This could be a problem..." exception. We can go into a discussion about what exactly "bothering" is, but on the flip side, the whole -stable tag is just a way for folks to indicate they want a given patch reviewed for stable, it's not actually a guarantee of whether the patch will go in to -stable or not.
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, 16 Apr 2018, Sasha Levin wrote: > So if a user is operating a nuclear power plant, and has 2 leds: green > one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and > once in a blue moon a race condition is causing the red one to go on and > cause panic in the little province he lives in, we should tell that user > to fuck off? > > LEDs may not be critical for you, but they can be critical for someone > else. Think of all the different users we have and the wildly different > ways they use the kernel. I am pretty sure that for almost every fix there is a person on a planet that'd rate it "critical". We can't really use this as an argument for inclusion of code into -stable, as that'd mean that -stable and Linus' tree would have to be basically the same. -- Jiri Kosina SUSE Labs
Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes
On Mon, 16 Apr 2018, Sasha Levin wrote: > So if a user is operating a nuclear power plant, and has 2 leds: green > one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and > once in a blue moon a race condition is causing the red one to go on and > cause panic in the little province he lives in, we should tell that user > to fuck off? > > LEDs may not be critical for you, but they can be critical for someone > else. Think of all the different users we have and the wildly different > ways they use the kernel. I am pretty sure that for almost every fix there is a person on a planet that'd rate it "critical". We can't really use this as an argument for inclusion of code into -stable, as that'd mean that -stable and Linus' tree would have to be basically the same. -- Jiri Kosina SUSE Labs