Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Tue, Jan 08, 2008 at 08:10:42PM -0800, Andrew Morton wrote: [...] > I must say that the number of bugs which actually go away when the user > stops using nvidia/fglrx/ndiswrapper/etc is a small minority. [...] > But people who think that removing the nvidia driver will > magically fix that khubd-got-stuck-in-D-state bug are urinating up an > incline. > > > Facts: > > - lots of people use nvidia/etc > > - most bugs they report aren't caused by nvidia/etc > > - we need lots of testers > > draw you own conclusions. Thanks Andrew for this demonstration. At least now I know I'm not the only one to think that. And no, I do not have any nvidia/etc. It's just that I value their users' reports as much as the other ones just because otherwise we would only track some elite's bugs, thus reducing the amount of information we have to understand the circumstances under which it happens. Cheers, Willy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Tue, 08 Jan 2008 23:01:07 -0500 [EMAIL PROTECTED] wrote: > On Mon, 07 Jan 2008 16:19:30 MST, Matthew Wilcox said: > > On Mon, Jan 07, 2008 at 06:04:25PM -0500, [EMAIL PROTECTED] wrote: > > > Theoretically, at least. Sometimes, in the real world, other constraints > > > enter into it... > > > > So you're saying that you can't find reliable ways to reproduce problems > > on demand? Those are some of the lower quality bug reports, so I don't > > think we're losing much by having you not report them. > > I'm sure that *everybody* on this list would *love* to know how you find > a reliable way to reproduce all the bugs that start off with "after X days of > uptime". But when you're chasing what might be a race condition with a > very small timing hole, you may need an event to happen several million times > before the accumulated chance of hitting it becomes appreciable. > I must say that the number of bugs which actually go away when the user stops using nvidia/fglrx/ndiswrapper/etc is a small minority. And you can usually tell beforehand too: if the user reports bad_page warnings or pte table scroggage or whatever and they're using nvidia I just hit 'd'. But people who think that removing the nvidia driver will magically fix that khubd-got-stuck-in-D-state bug are urinating up an incline. Facts: - lots of people use nvidia/etc - most bugs they report aren't caused by nvidia/etc - we need lots of testers draw you own conclusions. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Mon, 07 Jan 2008 16:19:30 MST, Matthew Wilcox said: > So you're saying that you can't find reliable ways to reproduce problems > on demand? Those are some of the lower quality bug reports, so I don't > think we're losing much by having you not report them. And in the next e-mail in my lkml folder we see: On Mon, 07 Jan 2008 18:21:45 EST, Parag Warudkar said: > BTW, I have so far tested 2.6.24-rc4/5/6/7 and 2.6.23.12 - all of > which have this problem. > > Yesterday I went back to using 2.6.22.15 and after a day's uptime it > has not reproduced with the same config. > > Time for git-bisect I suppose? (the only problem is that this takes > anywhere between 20 minutes to 8 hrs to confirm reliably.) Are you saying that we're not losing much if Parag says "screw it" and doesn't report the problem? pgpcIDYr8VWbg.pgp Description: PGP signature
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Mon, 07 Jan 2008 16:19:30 MST, Matthew Wilcox said: > On Mon, Jan 07, 2008 at 06:04:25PM -0500, [EMAIL PROTECTED] wrote: > > Theoretically, at least. Sometimes, in the real world, other constraints > > enter into it... > > So you're saying that you can't find reliable ways to reproduce problems > on demand? Those are some of the lower quality bug reports, so I don't > think we're losing much by having you not report them. I'm sure that *everybody* on this list would *love* to know how you find a reliable way to reproduce all the bugs that start off with "after X days of uptime". But when you're chasing what might be a race condition with a very small timing hole, you may need an event to happen several million times before the accumulated chance of hitting it becomes appreciable. pgpIzTII30SXP.pgp Description: PGP signature
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > These things *are* fairly rare (most bugs by _far_ are of the trivial > stupid kind), but some of those things can stay around for a long > time, and it can take months of different people reporting similar > problems until somebody finally puts two and two together and sees the > pattern. one common pattern i've noticed is bug dependency. In some areas we need to fix a series of 2-3 increasingly less trivial bugs to get enough test exposure, tester confidence and developer attention to trigger (and fix) the _truly_ bad bugs. That's why agressive regression elimination (and prevention) is so important IMHO - trivial regressions can totally block testing of certain areas of code, and with an agressive 90 days release schedule every day counts. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Tue, 8 Jan 2008, Stefan Richter wrote: > Matthew Wilcox wrote: > > So you're saying that you can't find reliable ways to reproduce problems > > on demand? Those are some of the lower quality bug reports, > > Or those are the more difficult problems. Indeed. If it's some race condition, or dependent on memory pressure at just the right time, or a use-after-free that corrupts some memory that normally nobody will even notice (it's freed, after all, and not necessarily re-allocated), reproduction can be really very hard. It's happily not exactly *common*, but it's certainly not unheard of either, when you need to run some specific workload for hours to trigger the bug - and then when it doesn't happen, you have to ask yourself: "was I just lunky, punk?" Some of those things also go away magically between kernel versions or subtly different configurations. A use-after-free problem might be obvious in one config, but then another configuration might change the size of a structure, and suddenly the two kmalloc's that used to be in the same slab (and made the problem more visible) end up in different slabs, and now you suddenly cannot reproduce it with that particular load at all any more! These things *are* fairly rare (most bugs by _far_ are of the trivial stupid kind), but some of those things can stay around for a long time, and it can take months of different people reporting similar problems until somebody finally puts two and two together and sees the pattern. When we get a good bug-reporter that is willing and able to reproduce and test kernels, that's wonderful, but when we get some "background noise" of bad bug-reports, that's usually good too - even if it's good only in the long run (ie sometimes we just have to accept that the bug-report didn't contain enough information for us to really do anythign about it, and just let it be - and hope that future events will clarify things) Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
* James Bottomley <[EMAIL PROTECTED]> wrote: > > On Sun, 2008-01-06 at 17:12 +0100, Ingo Molnar wrote: > > * James Bottomley <[EMAIL PROTECTED]> wrote: > > > > > > The reproducer came to you via Peter Osterlund who has never > > > > authored a single drivers/scsi/ commit before (according to git-log) > > > > and who (and here i'm out on a limb guessing it) does not even > > > > follow [EMAIL PROTECTED] > > > > > > > > this bug was obscure and hidden on [EMAIL PROTECTED] > > > > for _months_, (it is a rarely visited and rarely read mailing > > > > list) and there was just not enough "critical mass" to get this > > > > issue fixed. > > > > > > If I were you, I'd actually make a cursory effort to get my facts > > > straight before spouting off. > > > > > > This bug was actually hidden in bugzilla for ages, where Matthew > > > Wilcox was trying to deal with it on his own. [...] > > > > Huh? The bugzilla just tracked a bug reported to lkml. The very > > description of the bugzilla says: > > > > Subject : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end > > of device > > Submitter : Thomas Meyer <[EMAIL PROTECTED]> > > References : http://lkml.org/lkml/2007/11/13/250 > > > > so no, it was evidently not "hidden in bugzilla for ages" - all the > > important action happened on lkml. > > ... and your original accusation was "this bug was obscure and hidden > on [EMAIL PROTECTED] for _months_" which I was pointing out > wasn't true. you are right - let me rephrase it as: "this issue was mainly hidden due to the unhealthy ping-pong between lkml (which you said you didnt read), linux-scsi and bugzilla". > Even the original lkml report was obscured by sweeping the report into > bugzilla and forgetting about it, so in fact, no action happened, even > on lkml. all the "action" already happened on the first day of reporting the bug. (the wrong commit was identified, but that's besides the point - it all sat inactive after that point. I pinged the bugzilla to get the lkml discussion active again, not to debug it there.) what got movement into it all again was the revert. > Can we stop it with the recriminations and blame shifting now. [...] what "blame shifting" ??? all i'm worried about here is the long latency for a bugfix which very apparently (to me) happened due to the isolation of linux-scsi and the resulting bug processing inefficiencies. Bugs happen and nobody is to be "blamed" for the bug itself - but the bug processing flow was broken and i've pointed that out. (If you see similar cases for code i maintain, and if you can pinpoint the reason why you think it happened and how to improve that, then please point it out to me as well.) Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
Matthew Wilcox wrote: > So you're saying that you can't find reliable ways to reproduce problems > on demand? Those are some of the lower quality bug reports, Or those are the more difficult problems. -- Stefan Richter -=-==--- ---= -=--- http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Matthew Wilcox wrote: So you're saying that you can't find reliable ways to reproduce problems on demand? Those are some of the lower quality bug reports, Or those are the more difficult problems. -- Stefan Richter -=-==--- ---= -=--- http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
* James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-01-06 at 17:12 +0100, Ingo Molnar wrote: * James Bottomley [EMAIL PROTECTED] wrote: The reproducer came to you via Peter Osterlund who has never authored a single drivers/scsi/ commit before (according to git-log) and who (and here i'm out on a limb guessing it) does not even follow [EMAIL PROTECTED] this bug was obscure and hidden on [EMAIL PROTECTED] for _months_, (it is a rarely visited and rarely read mailing list) and there was just not enough critical mass to get this issue fixed. If I were you, I'd actually make a cursory effort to get my facts straight before spouting off. This bug was actually hidden in bugzilla for ages, where Matthew Wilcox was trying to deal with it on his own. [...] Huh? The bugzilla just tracked a bug reported to lkml. The very description of the bugzilla says: Subject : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device Submitter : Thomas Meyer [EMAIL PROTECTED] References : http://lkml.org/lkml/2007/11/13/250 so no, it was evidently not hidden in bugzilla for ages - all the important action happened on lkml. ... and your original accusation was this bug was obscure and hidden on [EMAIL PROTECTED] for _months_ which I was pointing out wasn't true. you are right - let me rephrase it as: this issue was mainly hidden due to the unhealthy ping-pong between lkml (which you said you didnt read), linux-scsi and bugzilla. Even the original lkml report was obscured by sweeping the report into bugzilla and forgetting about it, so in fact, no action happened, even on lkml. all the action already happened on the first day of reporting the bug. (the wrong commit was identified, but that's besides the point - it all sat inactive after that point. I pinged the bugzilla to get the lkml discussion active again, not to debug it there.) what got movement into it all again was the revert. Can we stop it with the recriminations and blame shifting now. [...] what blame shifting ??? all i'm worried about here is the long latency for a bugfix which very apparently (to me) happened due to the isolation of linux-scsi and the resulting bug processing inefficiencies. Bugs happen and nobody is to be blamed for the bug itself - but the bug processing flow was broken and i've pointed that out. (If you see similar cases for code i maintain, and if you can pinpoint the reason why you think it happened and how to improve that, then please point it out to me as well.) Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Tue, 8 Jan 2008, Stefan Richter wrote: Matthew Wilcox wrote: So you're saying that you can't find reliable ways to reproduce problems on demand? Those are some of the lower quality bug reports, Or those are the more difficult problems. Indeed. If it's some race condition, or dependent on memory pressure at just the right time, or a use-after-free that corrupts some memory that normally nobody will even notice (it's freed, after all, and not necessarily re-allocated), reproduction can be really very hard. It's happily not exactly *common*, but it's certainly not unheard of either, when you need to run some specific workload for hours to trigger the bug - and then when it doesn't happen, you have to ask yourself: was I just lunky, punk? Some of those things also go away magically between kernel versions or subtly different configurations. A use-after-free problem might be obvious in one config, but then another configuration might change the size of a structure, and suddenly the two kmalloc's that used to be in the same slab (and made the problem more visible) end up in different slabs, and now you suddenly cannot reproduce it with that particular load at all any more! These things *are* fairly rare (most bugs by _far_ are of the trivial stupid kind), but some of those things can stay around for a long time, and it can take months of different people reporting similar problems until somebody finally puts two and two together and sees the pattern. When we get a good bug-reporter that is willing and able to reproduce and test kernels, that's wonderful, but when we get some background noise of bad bug-reports, that's usually good too - even if it's good only in the long run (ie sometimes we just have to accept that the bug-report didn't contain enough information for us to really do anythign about it, and just let it be - and hope that future events will clarify things) Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
* Linus Torvalds [EMAIL PROTECTED] wrote: These things *are* fairly rare (most bugs by _far_ are of the trivial stupid kind), but some of those things can stay around for a long time, and it can take months of different people reporting similar problems until somebody finally puts two and two together and sees the pattern. one common pattern i've noticed is bug dependency. In some areas we need to fix a series of 2-3 increasingly less trivial bugs to get enough test exposure, tester confidence and developer attention to trigger (and fix) the _truly_ bad bugs. That's why agressive regression elimination (and prevention) is so important IMHO - trivial regressions can totally block testing of certain areas of code, and with an agressive 90 days release schedule every day counts. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Mon, 07 Jan 2008 16:19:30 MST, Matthew Wilcox said: On Mon, Jan 07, 2008 at 06:04:25PM -0500, [EMAIL PROTECTED] wrote: Theoretically, at least. Sometimes, in the real world, other constraints enter into it... So you're saying that you can't find reliable ways to reproduce problems on demand? Those are some of the lower quality bug reports, so I don't think we're losing much by having you not report them. I'm sure that *everybody* on this list would *love* to know how you find a reliable way to reproduce all the bugs that start off with after X days of uptime. But when you're chasing what might be a race condition with a very small timing hole, you may need an event to happen several million times before the accumulated chance of hitting it becomes appreciable. pgpIzTII30SXP.pgp Description: PGP signature
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Mon, 07 Jan 2008 16:19:30 MST, Matthew Wilcox said: So you're saying that you can't find reliable ways to reproduce problems on demand? Those are some of the lower quality bug reports, so I don't think we're losing much by having you not report them. And in the next e-mail in my lkml folder we see: On Mon, 07 Jan 2008 18:21:45 EST, Parag Warudkar said: BTW, I have so far tested 2.6.24-rc4/5/6/7 and 2.6.23.12 - all of which have this problem. Yesterday I went back to using 2.6.22.15 and after a day's uptime it has not reproduced with the same config. Time for git-bisect I suppose? (the only problem is that this takes anywhere between 20 minutes to 8 hrs to confirm reliably.) Are you saying that we're not losing much if Parag says screw it and doesn't report the problem? pgpcIDYr8VWbg.pgp Description: PGP signature
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Tue, 08 Jan 2008 23:01:07 -0500 [EMAIL PROTECTED] wrote: On Mon, 07 Jan 2008 16:19:30 MST, Matthew Wilcox said: On Mon, Jan 07, 2008 at 06:04:25PM -0500, [EMAIL PROTECTED] wrote: Theoretically, at least. Sometimes, in the real world, other constraints enter into it... So you're saying that you can't find reliable ways to reproduce problems on demand? Those are some of the lower quality bug reports, so I don't think we're losing much by having you not report them. I'm sure that *everybody* on this list would *love* to know how you find a reliable way to reproduce all the bugs that start off with after X days of uptime. But when you're chasing what might be a race condition with a very small timing hole, you may need an event to happen several million times before the accumulated chance of hitting it becomes appreciable. I must say that the number of bugs which actually go away when the user stops using nvidia/fglrx/ndiswrapper/etc is a small minority. And you can usually tell beforehand too: if the user reports bad_page warnings or pte table scroggage or whatever and they're using nvidia I just hit 'd'. But people who think that removing the nvidia driver will magically fix that khubd-got-stuck-in-D-state bug are urinating up an incline. Facts: - lots of people use nvidia/etc - most bugs they report aren't caused by nvidia/etc - we need lots of testers draw you own conclusions. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Tue, Jan 08, 2008 at 08:10:42PM -0800, Andrew Morton wrote: [...] I must say that the number of bugs which actually go away when the user stops using nvidia/fglrx/ndiswrapper/etc is a small minority. [...] But people who think that removing the nvidia driver will magically fix that khubd-got-stuck-in-D-state bug are urinating up an incline. Facts: - lots of people use nvidia/etc - most bugs they report aren't caused by nvidia/etc - we need lots of testers draw you own conclusions. Thanks Andrew for this demonstration. At least now I know I'm not the only one to think that. And no, I do not have any nvidia/etc. It's just that I value their users' reports as much as the other ones just because otherwise we would only track some elite's bugs, thus reducing the amount of information we have to understand the circumstances under which it happens. Cheers, Willy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Mon, Jan 07, 2008 at 06:04:25PM -0500, [EMAIL PROTECTED] wrote: > Theoretically, at least. Sometimes, in the real world, other constraints > enter into it... So you're saying that you can't find reliable ways to reproduce problems on demand? Those are some of the lower quality bug reports, so I don't think we're losing much by having you not report them. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Mon, 07 Jan 2008 14:37:17 MST, Matthew Wilcox said: > If you can reproduce a bug reliably, you can reproduce it without the > nvidia module loaded. Theoretically, at least. Sometimes, in the real world, other constraints enter into it... You're welcome to stop by and figure out why (I've sunk multiple tens of hours into trying already) the NVidia binary driver can handle the Gateway monitor hanging off my docking station and the open-source one selects a non-functional sync rate, or pay for a replacement (management doesn't want to shell out $$ to replace a monitor that works just fine when the manufacturer's drivers/config are installed - surprising, that). Or I could get behind the docking station, unplug the ergonomic keyboard, the power brick, the trackball, and the cat-5, set the laptop up on the desk, find some way to arrange the laptop, cat-5, trackball, power brick, and keyboard so things are reasonable to use for a long day of typing, and wait for it to *maybe* happen again, and then put everything back when it's safe to use the monitor again. Oh, and the open source driver has some *horrid* color-banding issues on the laptop's LCD, as well, you're welcome to fix those while you're at it.. ;) And no, I'm *not* just winging it with the laptop for the day - I have enough carpal tunnel issues *with* an ergonomic keyboard and a trackball, I'm not going to press my luck with the onboard keyboard and mouse. Or I can just say "screw it" and not bother reporting any bugs that aren't easily repeatable before GDM gets started. Guess which one ends up happening in the real world? pgpE61JYJLZdN.pgp Description: PGP signature
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
> Personally, I've lost count of the number of times I've *not* reported a > bug/oops just because of the "NVidia users this means you" statement, only > to have the exact same issue reported several weeks/months later by somebody > who's able to replicate it with an untainted kernel. If you can reproduce a bug reliably, you can reproduce it without the nvidia module loaded. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
> Personally, I've lost count of the number of times I've *not* reported a > bug/oops just because of the "NVidia users this means you" statement, only > to have the exact same issue reported several weeks/months later by somebody > who's able to replicate it with an untainted kernel. And I've lost count of the number of bugs reported that turned out to be the Nvidia modules. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 06 Jan 2008 22:08:13 +0100, Willy Tarreau said: > even slightly annoying, we never get it. Have you noticed the number of > "me too" on the list ? Users find any sort of excuse for not having filed > a report in the first time, but are still willing to confirm another > one's bug. That's normal, they're just humans after all. Personally, I've lost count of the number of times I've *not* reported a bug/oops just because of the "NVidia users this means you" statement, only to have the exact same issue reported several weeks/months later by somebody who's able to replicate it with an untainted kernel. > 'lsmod|more' in a shell window. If you do not know how to run without > those drivers, please still file the report, it may still be considered > valuable, but you may be recontacted and guided to unload them." Exactly. pgpXPPAjCPM0l.pgp Description: PGP signature
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
> "Stefan" == Stefan Richter <[EMAIL PROTECTED]> writes: Stefan> John Stoffel wrote: >> The question to me really revolves around how do you automate the >> process in a transparent manner so that people don't have to change >> much how they interact with lkml. Stefan> I think the more important questions are: Stefan> - Are there people who know how to get reports to developers? Stefan> (We can't train all potential reporters to get reports right.) Sure, but telling them "email to [EMAIL PROTECTED]" isn't tough. Stefan> - Are there enough developers to work towards bug fixes? Nope, never enough. Stefan> The technical means to capture reports, let alone the Stefan> bugtracking tools, are of secondary importance. The technical means are secondary, it's the human factors of how bug reports are captured which is of prime concern. It's obvious people aren't happy with bugzilla. Most, but not all, are happy with LKML, but a dumb mailing list isn't ideal for tracking bug reports. As a SysAdmin, I love being able to tell me user 'just email help' to open a ticket. As a SysAdmin, I can then interact with the tickets which are created and managed using 'RT' (http://www.fsck.org/rt) from inside my email client without having to think about it too much. Again, transparency to the end-user AND tothe developers is key. If neither finds if easy to use, it won't be. So we need to come up with a work flow which works with us (read you, I just read and contribute bug reports myself... :-) and for us. But which never gets in the way if at all possible. John -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
John Stoffel wrote: > The question to me really revolves around how do you automate the > process in a transparent manner so that people don't have to change > much how they interact with lkml. I think the more important questions are: - Are there people who know how to get reports to developers? (We can't train all potential reporters to get reports right.) - Are there enough developers to work towards bug fixes? The technical means to capture reports, let alone the bugtracking tools, are of secondary importance. -- Stefan Richter -=-==--- ---= --=== http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
I'll agree with what Willy wrote here, Bugzilla is a pain to use, you can't just dump an email into it and have it captured. I think we should be looking at something more like 'WebRT' which is an *issue* tracker software. But that too might be too heavy weight and too noisy as well. And suddenly it would starting putting ticket numbers onto all the non-problem conversations we have here on lkml as well, which I'm not sure we really want. Does it mean that we needs to have a '[EMAIL PROTECTED]' email for people to report bugs/problems/issues, while lkml remains (but is copied for all '[EMAIL PROTECTED]' emails) primarily the developer point of contact? The question to me really revolves around how do you automate the process in a transparent manner so that people don't have to change much how they interact with lkml. John -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
I'll agree with what Willy wrote here, Bugzilla is a pain to use, you can't just dump an email into it and have it captured. I think we should be looking at something more like 'WebRT' which is an *issue* tracker software. But that too might be too heavy weight and too noisy as well. And suddenly it would starting putting ticket numbers onto all the non-problem conversations we have here on lkml as well, which I'm not sure we really want. Does it mean that we needs to have a '[EMAIL PROTECTED]' email for people to report bugs/problems/issues, while lkml remains (but is copied for all '[EMAIL PROTECTED]' emails) primarily the developer point of contact? The question to me really revolves around how do you automate the process in a transparent manner so that people don't have to change much how they interact with lkml. John -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
John Stoffel wrote: The question to me really revolves around how do you automate the process in a transparent manner so that people don't have to change much how they interact with lkml. I think the more important questions are: - Are there people who know how to get reports to developers? (We can't train all potential reporters to get reports right.) - Are there enough developers to work towards bug fixes? The technical means to capture reports, let alone the bugtracking tools, are of secondary importance. -- Stefan Richter -=-==--- ---= --=== http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Stefan == Stefan Richter [EMAIL PROTECTED] writes: Stefan John Stoffel wrote: The question to me really revolves around how do you automate the process in a transparent manner so that people don't have to change much how they interact with lkml. Stefan I think the more important questions are: Stefan - Are there people who know how to get reports to developers? Stefan (We can't train all potential reporters to get reports right.) Sure, but telling them email to [EMAIL PROTECTED] isn't tough. Stefan - Are there enough developers to work towards bug fixes? Nope, never enough. Stefan The technical means to capture reports, let alone the Stefan bugtracking tools, are of secondary importance. The technical means are secondary, it's the human factors of how bug reports are captured which is of prime concern. It's obvious people aren't happy with bugzilla. Most, but not all, are happy with LKML, but a dumb mailing list isn't ideal for tracking bug reports. As a SysAdmin, I love being able to tell me user 'just email help' to open a ticket. As a SysAdmin, I can then interact with the tickets which are created and managed using 'RT' (http://www.fsck.org/rt) from inside my email client without having to think about it too much. Again, transparency to the end-user AND tothe developers is key. If neither finds if easy to use, it won't be. So we need to come up with a work flow which works with us (read you, I just read and contribute bug reports myself... :-) and for us. But which never gets in the way if at all possible. John -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 06 Jan 2008 22:08:13 +0100, Willy Tarreau said: even slightly annoying, we never get it. Have you noticed the number of me too on the list ? Users find any sort of excuse for not having filed a report in the first time, but are still willing to confirm another one's bug. That's normal, they're just humans after all. Personally, I've lost count of the number of times I've *not* reported a bug/oops just because of the NVidia users this means you statement, only to have the exact same issue reported several weeks/months later by somebody who's able to replicate it with an untainted kernel. 'lsmod|more' in a shell window. If you do not know how to run without those drivers, please still file the report, it may still be considered valuable, but you may be recontacted and guided to unload them. Exactly. pgpXPPAjCPM0l.pgp Description: PGP signature
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Personally, I've lost count of the number of times I've *not* reported a bug/oops just because of the NVidia users this means you statement, only to have the exact same issue reported several weeks/months later by somebody who's able to replicate it with an untainted kernel. And I've lost count of the number of bugs reported that turned out to be the Nvidia modules. Alan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Personally, I've lost count of the number of times I've *not* reported a bug/oops just because of the NVidia users this means you statement, only to have the exact same issue reported several weeks/months later by somebody who's able to replicate it with an untainted kernel. If you can reproduce a bug reliably, you can reproduce it without the nvidia module loaded. -- Intel are signing my paycheques ... these opinions are still mine Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Mon, 07 Jan 2008 14:37:17 MST, Matthew Wilcox said: If you can reproduce a bug reliably, you can reproduce it without the nvidia module loaded. Theoretically, at least. Sometimes, in the real world, other constraints enter into it... You're welcome to stop by and figure out why (I've sunk multiple tens of hours into trying already) the NVidia binary driver can handle the Gateway monitor hanging off my docking station and the open-source one selects a non-functional sync rate, or pay for a replacement (management doesn't want to shell out $$ to replace a monitor that works just fine when the manufacturer's drivers/config are installed - surprising, that). Or I could get behind the docking station, unplug the ergonomic keyboard, the power brick, the trackball, and the cat-5, set the laptop up on the desk, find some way to arrange the laptop, cat-5, trackball, power brick, and keyboard so things are reasonable to use for a long day of typing, and wait for it to *maybe* happen again, and then put everything back when it's safe to use the monitor again. Oh, and the open source driver has some *horrid* color-banding issues on the laptop's LCD, as well, you're welcome to fix those while you're at it.. ;) And no, I'm *not* just winging it with the laptop for the day - I have enough carpal tunnel issues *with* an ergonomic keyboard and a trackball, I'm not going to press my luck with the onboard keyboard and mouse. Or I can just say screw it and not bother reporting any bugs that aren't easily repeatable before GDM gets started. Guess which one ends up happening in the real world? pgpE61JYJLZdN.pgp Description: PGP signature
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Mon, Jan 07, 2008 at 06:04:25PM -0500, [EMAIL PROTECTED] wrote: Theoretically, at least. Sometimes, in the real world, other constraints enter into it... So you're saying that you can't find reliable ways to reproduce problems on demand? Those are some of the lower quality bug reports, so I don't think we're losing much by having you not report them. -- Intel are signing my paycheques ... these opinions are still mine Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 10:08:13PM +0100, Willy Tarreau wrote: > On Sun, Jan 06, 2008 at 09:58:02PM +0200, Adrian Bunk wrote: > > On Sun, Jan 06, 2008 at 08:10:44PM +0100, Willy Tarreau wrote: > > > I, as an end user of ntpd, have been harrassed to use it to get an > > > ntp bug reported "because by mail it would get lost". What complicated > > > > Noone knows how many thousand bug reports have never reached lkml > > since majordomo silently dropped them. > > Since none of my mails has been dropped yet, I think that the false > positives are rather rare. Yes, sometimes someone complains about a > mail getting repeatedly killed. But that's not *that* much frequent > IMO and people are already used to re-post when mailing their friends, > coworkers or customers. It's no different here. Still, I agree that > it is a problem. If someone works in a company where the default MUA setting is to also attach the text as HTML and to add a vCard to all emails people might try once to submit their bug report, not notice that it didn't arrive on the list, and then simply give up. > > This is the price for having lkml relatively spam-free. > > yes and I think it works fairly well. > > > > an interface it is when you don't know it ! I remember I wanted to > > > attach a patch and it didn't even get through the first time. I did > > > it wrong. Blame me if you want, but an interface which need training > > > for proper use is certainly not for casual end users. > > > > What exactly is the problem with attaching files? > > What is "it didn't even get through the first time" exactly? > > Well, it's quite old in my memories, it may be 2 years ago. IIRC, when > I wanted to attach files, I got brought to another page for this, and > once done there was some confusing indication about how to complete the > filing or get back to terminate the report. Sorry for not being much > precise on this one, it's too far ago. Currently the page for attaching a file has a "Submit" button. > > > Also, it's very annoying to have to create an account somewhere, leaving > > > there one of the passwords you use on many other sites, just to help a > > > random developer fix a bug in his code. You quickly wonder if someone > > > else will report it and have more patience. > > > > If you already lack patience at this point, > > Well, it took me 2 minutes to send my patch to the maintainer by then, > he very politely told me that the only way was through bugzilla, and > then it took me half an hour if not more to create a bugzilla account, > find how to use it, attach the files and put a description in a text-area. People reporting bugs together with a patch to fix it are not the usual case but an exception. > Also, I remember that the ongoing mail exchanges through bugzilla > systematically removed the mail history, which made it very hard to > follow a discussion, because all mails I received were almost single-lined > looking like "how did that happen ?" or "in what circumstances ?" without > any history... Maybe this is configurable though. This depends on how people answer in Bugzilla. But an advantage of Bugzilla is that each email contains a link to the Bugzilla bug containing all discussions in the bug. > > would you be willing to > > bisect which requires more than a dozen kernel recompiles and reboots? > > Certainly not! But I would like kernel people to become less egocentric > and understand that what they routinely do all the day appears very > complicated and time-consuming to many users, and that by imposing them > complex and/or costly methods to report bugs, they filter a lot of > reports out. Sure, there are still a bunch of them doing everything > up to and including the git bisects. But what percentage ? If you report a regression in the kernel and are not willing to bisect the probability of the bug being resolved becomes _much_ smaller. Partially due to this requiring much more developer time, but also partially due to the fact that many regressions are undebuggable without a bisection. > People who encounter problems at work will not do that to start with, > because they cannot delay all their work to spend half a day building > kernels when their boss or customers are waiting for their work. Others > will report the problems they encounter at a customer's and will not > even be able to git-bisect because the customer's mail server is not > like a notebook they have everywhere with them and can reboot at will. > Some of them are more free of their time and will probably enjoy the > experience, but they are a minority IMHO. People tend to report bugs if and only they have no other choice (like some workaround). So when they report a bug they really need a fix for their problem. And have you ever worked in a company that pays a seven digit amount of Euros each year to Oracle for licences and support for their database? I have. It's not that spending some man days on debugging or working around
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 09:58:02PM +0200, Adrian Bunk wrote: > On Sun, Jan 06, 2008 at 08:10:44PM +0100, Willy Tarreau wrote: > > I, as an end user of ntpd, have been harrassed to use it to get an > > ntp bug reported "because by mail it would get lost". What complicated > > Noone knows how many thousand bug reports have never reached lkml > since majordomo silently dropped them. Since none of my mails has been dropped yet, I think that the false positives are rather rare. Yes, sometimes someone complains about a mail getting repeatedly killed. But that's not *that* much frequent IMO and people are already used to re-post when mailing their friends, coworkers or customers. It's no different here. Still, I agree that it is a problem. > This is the price for having lkml relatively spam-free. yes and I think it works fairly well. > > an interface it is when you don't know it ! I remember I wanted to > > attach a patch and it didn't even get through the first time. I did > > it wrong. Blame me if you want, but an interface which need training > > for proper use is certainly not for casual end users. > > What exactly is the problem with attaching files? > What is "it didn't even get through the first time" exactly? Well, it's quite old in my memories, it may be 2 years ago. IIRC, when I wanted to attach files, I got brought to another page for this, and once done there was some confusing indication about how to complete the filing or get back to terminate the report. Sorry for not being much precise on this one, it's too far ago. > > Also, it's very annoying to have to create an account somewhere, leaving > > there one of the passwords you use on many other sites, just to help a > > random developer fix a bug in his code. You quickly wonder if someone > > else will report it and have more patience. > > If you already lack patience at this point, Well, it took me 2 minutes to send my patch to the maintainer by then, he very politely told me that the only way was through bugzilla, and then it took me half an hour if not more to create a bugzilla account, find how to use it, attach the files and put a description in a text-area. Also, I remember that the ongoing mail exchanges through bugzilla systematically removed the mail history, which made it very hard to follow a discussion, because all mails I received were almost single-lined looking like "how did that happen ?" or "in what circumstances ?" without any history... Maybe this is configurable though. > would you be willing to > bisect which requires more than a dozen kernel recompiles and reboots? Certainly not! But I would like kernel people to become less egocentric and understand that what they routinely do all the day appears very complicated and time-consuming to many users, and that by imposing them complex and/or costly methods to report bugs, they filter a lot of reports out. Sure, there are still a bunch of them doing everything up to and including the git bisects. But what percentage ? People who encounter problems at work will not do that to start with, because they cannot delay all their work to spend half a day building kernels when their boss or customers are waiting for their work. Others will report the problems they encounter at a customer's and will not even be able to git-bisect because the customer's mail server is not like a notebook they have everywhere with them and can reboot at will. Some of them are more free of their time and will probably enjoy the experience, but they are a minority IMHO. If we had stats on the periods git bisects are run on, I suspect that night and week-ends are the most frequent moments, simply because it's when people have time. IMHO, git bisect is excellent for kernel people. Not for random users. They first have to install git, find free space, *clone the kernel tree* and start discovering the beast. > > Another recent example: a coworker recently told me he installed the > > latest beta from ubuntu, and that he had some problems with his WIFI > > randomly hanging. I asked him if he filed a bug, he replied me "no, > > it's too much boring, I'm not the only one with this hardware, others > > have certainly already done it". When the release went out, he insisted > > telling me he was right not filing the bug because indeed it was fixed ! > > He wouldn't have sent a bug report no matter how to report it. I don't agree. It's a matter of effort vs expected advantage. Just sending a 5-lines mail from work presents a lower entry barrier than having to create an account and discover a new tool. In fact, from the user's perspective, filing a kernel bug report is seen as something annoying and useless, simply because the kernel is so much used that someone else will file the same bug anyway. They act just like microsoft users. Do you know anyone in your relatives who has *ever* filed a bug to microsoft ? Probably zero. They passively wait for the next patch, and just whine if their bug does
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
* Stefan Richter <[EMAIL PROTECTED]> wrote: > Ingo Molnar wrote: > > If lkml traffic is too big then i'd suggest to set up email > > filters to separate out mails that have 'SCSI' in their subject line > > Good idea. Minor flaw: If somebody forgets to Cc LSML, he might also > forget to put SCSI or scsi into the Subject. yeah, that's very frequent, but i dont think it's a big issue: > > or body. > > This yields false positives whenever a .config is posted. these two patterns: "^CONFIG_" "^# " will exclude .config-ish lines. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 08:10:44PM +0100, Willy Tarreau wrote: > On Sun, Jan 06, 2008 at 08:56:25PM +0200, Adrian Bunk wrote: > > On Sun, Jan 06, 2008 at 07:34:02PM +0100, Willy Tarreau wrote: > > >... > > > As to using bugzilla for bug tracking... Well, I agree that bug > > > tracking is important when you work on multiple problems at once. > > > But bugzilla should be the developer's tool, not the end user's. > > > That means that it should only be our job to create entries there > > > if end users find it too difficult, and that we should just invite > > > them to *try* to report there to save us some time. > > > > Where does this opinion end users would find Bugzilla too difficult come > > from? Many other projects use Bugzilla without big problems. > > > > Is this just plain FUD or based on real reports from end users? > > In the latter case, please give pointers to them so that whatever > > problems exist can be fixed. > > I, as an end user of ntpd, have been harrassed to use it to get an > ntp bug reported "because by mail it would get lost". What complicated Noone knows how many thousand bug reports have never reached lkml since majordomo silently dropped them. This is the price for having lkml relatively spam-free. > an interface it is when you don't know it ! I remember I wanted to > attach a patch and it didn't even get through the first time. I did > it wrong. Blame me if you want, but an interface which need training > for proper use is certainly not for casual end users. What exactly is the problem with attaching files? What is "it didn't even get through the first time" exactly? > Also, it's very annoying to have to create an account somewhere, leaving > there one of the passwords you use on many other sites, just to help a > random developer fix a bug in his code. You quickly wonder if someone > else will report it and have more patience. If you already lack patience at this point, would you be willing to bisect which requires more than a dozen kernel recompiles and reboots? > Another recent example: a coworker recently told me he installed the > latest beta from ubuntu, and that he had some problems with his WIFI > randomly hanging. I asked him if he filed a bug, he replied me "no, > it's too much boring, I'm not the only one with this hardware, others > have certainly already done it". When the release went out, he insisted > telling me he was right not filing the bug because indeed it was fixed ! He wouldn't have sent a bug report no matter how to report it. > We must accept that end users : > 1) do not like creating accounts (remember or divulgate passwords, > and risk of getting spam) Send _one_ email to lkml and you'll get forever spam to this address. With one email addresses of mine exactly that happened. > 2) do not know how to classify their problem, and are not even > sure it's a real bug. On the first page, when uncertain they > would probably click "Other". Adding doubt in the reporter's > mind is counter-productive as it will refrain him from being > precise about what he did to get the problem. If the bug ends at Other/Other that's not a problem - this usually gets fixed within a few hours. > 3) are not familiar with our vocabulary : > - "Tree" : mainline? mm? mjb? ac? what's that ? > - "Component" : Configuration? LSM? Modules? Other? Then let's improve that. > => finally, I'm not sure I had to click "Other" in the first place, > I want to choose something else, I click "Back" and I get back > to the login page! Bye bye. Works fine here. Did you disable cookies? > Also : > "No binary modules - NVIDIA users this means YOU!" >=> about half the reporters will wonder if they should stop here > or not. Most of those with an NVidia chipset and/or graphics > card will wonder, while the bug may still interest us. Then sugggest a better text. > At least, on the mailing list, there's no real rules, the mail will > be posted anyway. And if the user gets flamed, at least we have the > report. >... If majordomo didn't drop it. And if it didn't get ignored and forgotten. > Regards, > Willy cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 08:56:25PM +0200, Adrian Bunk wrote: > On Sun, Jan 06, 2008 at 07:34:02PM +0100, Willy Tarreau wrote: > >... > > As to using bugzilla for bug tracking... Well, I agree that bug > > tracking is important when you work on multiple problems at once. > > But bugzilla should be the developer's tool, not the end user's. > > That means that it should only be our job to create entries there > > if end users find it too difficult, and that we should just invite > > them to *try* to report there to save us some time. > > Where does this opinion end users would find Bugzilla too difficult come > from? Many other projects use Bugzilla without big problems. > > Is this just plain FUD or based on real reports from end users? > In the latter case, please give pointers to them so that whatever > problems exist can be fixed. I, as an end user of ntpd, have been harrassed to use it to get an ntp bug reported "because by mail it would get lost". What complicated an interface it is when you don't know it ! I remember I wanted to attach a patch and it didn't even get through the first time. I did it wrong. Blame me if you want, but an interface which need training for proper use is certainly not for casual end users. Also, it's very annoying to have to create an account somewhere, leaving there one of the passwords you use on many other sites, just to help a random developer fix a bug in his code. You quickly wonder if someone else will report it and have more patience. Another recent example: a coworker recently told me he installed the latest beta from ubuntu, and that he had some problems with his WIFI randomly hanging. I asked him if he filed a bug, he replied me "no, it's too much boring, I'm not the only one with this hardware, others have certainly already done it". When the release went out, he insisted telling me he was right not filing the bug because indeed it was fixed ! We must accept that end users : 1) do not like creating accounts (remember or divulgate passwords, and risk of getting spam) 2) do not know how to classify their problem, and are not even sure it's a real bug. On the first page, when uncertain they would probably click "Other". Adding doubt in the reporter's mind is counter-productive as it will refrain him from being precise about what he did to get the problem. 3) are not familiar with our vocabulary : - "Tree" : mainline? mm? mjb? ac? what's that ? - "Component" : Configuration? LSM? Modules? Other? => finally, I'm not sure I had to click "Other" in the first place, I want to choose something else, I click "Back" and I get back to the login page! Bye bye. Also : "No binary modules - NVIDIA users this means YOU!" => about half the reporters will wonder if they should stop here or not. Most of those with an NVidia chipset and/or graphics card will wonder, while the bug may still interest us. At least, on the mailing list, there's no real rules, the mail will be posted anyway. And if the user gets flamed, at least we have the report. > And different to LKML, you don't run into problems like majordomo > silently dropping your email because it contains HTML or a vCard. That's true. But do we have statistics on the ratio of client IP addresses which go as far as the login page and which have finally not filed their bug (either stopped at the login page or given up after logging in) ? It should be very interesting... Regards, Willy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 07:34:02PM +0100, Willy Tarreau wrote: >... > As to using bugzilla for bug tracking... Well, I agree that bug > tracking is important when you work on multiple problems at once. > But bugzilla should be the developer's tool, not the end user's. > That means that it should only be our job to create entries there > if end users find it too difficult, and that we should just invite > them to *try* to report there to save us some time. Where does this opinion end users would find Bugzilla too difficult come from? Many other projects use Bugzilla without big problems. Is this just plain FUD or based on real reports from end users? In the latter case, please give pointers to them so that whatever problems exist can be fixed. And different to LKML, you don't run into problems like majordomo silently dropping your email because it contains HTML or a vCard. > Willy cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 2008-01-06 at 10:44 -0800, Linus Torvalds wrote: > > On Sun, 6 Jan 2008, Linus Torvalds wrote: > > > > That said: > > > > > pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... > > > > I'm not sure if it should be mucking with the size or not, but it > > definitely shouldn't be mucking with the block-size, because that can > > indeed cause huge problems. > > Hmm. Looking closer, it's probably ok in that case, because it does do a > "bd_claim()" to make sure that it has exclusive access, so while there may > be other openers around, at least those other openers won't be filesystem > mounts or anything that opened with O_EXCL. > > So changing the blocksize is probably ok in this case. > > That still leaves the question whether pktcdvd *should* muck with the base > device at all, and I'm not at all sure about that. But I'm no longer sure > that the pktcdvd code is necessarily *clearly* broken, now it's more of a > "should it really do that?" thing. > > So I still suspect that this: > > > - set_capacity(pd->disk, lba << 2); > > - set_capacity(pd->bdev->bd_disk, lba << 2); > > - bd_set_size(pd->bdev, (loff_t)lba << 11); > > + set_capacity(pd->disk, get_capacity(pd->bdev->bd_disk)); > > is likely a good thing to do (in conjunction with my patch that made > i_size be "reliable" after an open), but there may be some reason why > pktcdvd really wants to control the size rather than be on the receiving > end of the size. > > Peter, this is your decision. Apparently my one-liner fixes the immediate > bug (but it's not really a regression either - I think the i_size issue > has been there since pretty much day #1), and what pktcdvd does is > somewhat less critical an issue? I think perhaps the true bug lies in the way we handle layered devices like this. pktcdvd holds the underlying device open, so its refcount never drops to zero. This is what causes the gendisk/block layer never to update the sizes, and what lead to pktcdvd doing it instead. However, what perhaps really needs to happen is that pktcdvd needs to take over the media change events as well. That way it could see the disk change and invalidate and reread its own setting of the block size (and possibly re set the size of the underlying device). I agree, though, this isn't a regression. It's probably obscure enough in reproduction to warrant not holding up 2.6.24 --- especially as I think the true fix will do small perturbations to a lot of subsystems. If this were a product and I were the release manager, I'd be updating the release notes with a note about having to break pktcdvd binding across media changes to work around this bug and a promise to fix it in the next release. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 6 Jan 2008, Linus Torvalds wrote: > > That said: > > > pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... > > I'm not sure if it should be mucking with the size or not, but it > definitely shouldn't be mucking with the block-size, because that can > indeed cause huge problems. Hmm. Looking closer, it's probably ok in that case, because it does do a "bd_claim()" to make sure that it has exclusive access, so while there may be other openers around, at least those other openers won't be filesystem mounts or anything that opened with O_EXCL. So changing the blocksize is probably ok in this case. That still leaves the question whether pktcdvd *should* muck with the base device at all, and I'm not at all sure about that. But I'm no longer sure that the pktcdvd code is necessarily *clearly* broken, now it's more of a "should it really do that?" thing. So I still suspect that this: > - set_capacity(pd->disk, lba << 2); > - set_capacity(pd->bdev->bd_disk, lba << 2); > - bd_set_size(pd->bdev, (loff_t)lba << 11); > + set_capacity(pd->disk, get_capacity(pd->bdev->bd_disk)); is likely a good thing to do (in conjunction with my patch that made i_size be "reliable" after an open), but there may be some reason why pktcdvd really wants to control the size rather than be on the receiving end of the size. Peter, this is your decision. Apparently my one-liner fixes the immediate bug (but it's not really a regression either - I think the i_size issue has been there since pretty much day #1), and what pktcdvd does is somewhat less critical an issue? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 11:36:23AM -0600, James Bottomley wrote: > On Sun, 2008-01-06 at 10:11 -0700, Matthew Wilcox wrote: > > If there's willingness to change, I'm willing to put some effort into > > moving us to a bug tracking system that fits our workflow better than > > bugzilla. But if that effort will be disregarded, then let me know now > > so I don't waste my time. > > Well, I'd say yes, certainly, and I'll support the effort ... but the > problem is that I'm not one of the powers that be who control our > current bugzilla ... that's the constituency we need to convince. As I > keep saying, just getting new SCSI bug reports tipped onto the SCSI > mailing list will give us about 90% of what we need, but I haven't even > managed to get that to happen. As long as the process will be that much complicated, it will never correctly work. The primary requirement for a useful bug reporting workflow is *not to put the burden on the reporter*. I agree with Ingo here. How can a user know where to post ? He has a problem with his Linux distro. He finally understands or gets told that the strange numbers on the screen indicate a kernel oops and that he must report it if he wants it to be fixed. He then googles for how/where to report a bug, and the first reply says : http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html in which you can read : "If you are totally stumped as to whom to send the report, send it to linux-kernel@vger.kernel.org". So *this* is the official central entry point, like it or not. And in fact it works, given the number of off-topic reports we get. It proves that end users know how to report their problems there. Other lists should be used when the problem is clearly qualified. And most of the time, it's not up to the end user to qualify his problem, but to *us*. Our mission is not to blindly write lines of code, but to spend some time getting user feedback, and bug reports are part of this feedback. In my opinion, the most important reader contribution to LKML is just reading bugs reports and redirecting them to the proper list if obviously required. People do this all the time and it has always worked. In parallel, bug entries may be added into bugzilla, either by the reporter once suggested to do so, or by the person redirecting him to the proper list. So a working workflow would look like this : 1) from: user to: lkml subject: help needed with my CD burner 2) from: any LKML reader to: user cc: lkml, linux-scsi subject: re: help needed with my CD burner Message forwarded to linux-scsi. You may accelerate resolution by describing your problem in bugzilla [url, mail...] => user knows his problem is being considered (very important) => user is connected with the proper list and possibly with a bugzilla entry. As long as the bug is not 100% sure relevant to linux-scsi, lkml should remain CCed so that other readers may occasionally spot the problem. I would also like to make a parallel with how support works in commercial products. Generally, you as the end user don't know anything about the vendor's internal process. You don't even know if you have an account on your vendor's support site (another blocking factor of bugzilla BTW). So what you do is call any of your contacts overthere, quickly describe your problem, and most of the time he gives you all the indications required to report a fine bug. And if he understands it will be too hard for you (classically because of missing account), he will initiate it by himself. At this point the bug is tracked and followed by the appropriate persons. LKML *is* the contact for any Linux Kernel problem or question. As long as we will try to bypass it and create parallel ways, it will only maintain confusion and lead to bugs getting dropped with user frustration. IMHO, the only missing indication in "reporting-bugs.html" above is : "if you don't get any reply within 2 days, surely your mail has not been noticed, repost it, if possible with a more appropriate subject". We will *never* be able to educate newcomers, but we can (and must) educate regular readers to help newcomers report bugs. If only 100 regular readers randomly forward 2 mails a month, those are 200 bugs which get properly handled. Just don't forget to change your reply-to to lkml if you don't want to get polluted by the discussion. As to using bugzilla for bug tracking... Well, I agree that bug tracking is important when you work on multiple problems at once. But bugzilla should be the developer's tool, not the end user's. That means that it should only be our job to create entries there if end users find it too difficult, and that we should just invite them to *try* to report there to save us some time. Willy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 6 Jan 2008, James Bottomley wrote: > > index 993f78c..6a20da9 100644 > > --- a/fs/block_dev.c > > +++ b/fs/block_dev.c > > @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct > > file *file, int for_part) > > } > > if (bdev->bd_invalidated) > > rescan_partitions(bdev->bd_disk, bdev); > > + bd_inode->i_size = (loff_t)get_capacity(disk)<<9; > > } > > } > > bdev->bd_openers++; > > Actually, I think that code is fine ... the block size shouldn't change > across positive values of bd_openers. The block size should indeed not change, and that's why we must *not* do bd_set_size() there (because that changes the block size too, not just the size of the device). But I would argue that if we have had a device invalidation event (ie media change), then we should indeed change the actual *size* of the block device as seen by anybody who opens the file again. And yes, it will affect old openers of the same inode too (since the size is in the inode, not the file descriptor), but considering that this really should only happen for media change events, I think that's better than what we used to have. Now, we could have made it even more clear by moving the "i_size" setting into the same "if (dbev->bd_invalidated)" conditional, but the thing is, we only set bd_invalidated for devices that have minors, so things like floppies etc that don't have partition support also don't have bd_invalidated set (in effect, bd_invalidated really *is* just a flag for the partitioning code, not for disk change in general). That said: > pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... I'm not sure if it should be mucking with the size or not, but it definitely shouldn't be mucking with the block-size, because that can indeed cause huge problems. So removing the bd_set_size() is almost certainly the right thing to do (because it's always incorrect to do at an "inner" open), and the real size should have already been set by the regular open. But this should be tested by somebody who uses the dang thing. My personal favourite for a real fix would actually be to always make the capacity of the pktcdvd device match the capacity of the underlying device, ie the diff would look something like the appended (untested as usual), but that may be a bit too extensive a change. But regardless, I think the i_size change makes sense - at least in some form. It doesn't necessarily have to be the explicit i_size setting: I also considered just changing the block_dev.c do_open() make bd_set_size() _unconditionally_, and then move the "if (!bdev->bd_openers)" test into bd_set_size(), so that people couldn't even change the blocksize by mistake! I'd still like to hear comments from Al in particular, if he has something to say. Linus --- drivers/block/pktcdvd.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 3535ef8..1b23681 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -2342,9 +2342,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, int write) goto out_unclaim; } - set_capacity(pd->disk, lba << 2); - set_capacity(pd->bdev->bd_disk, lba << 2); - bd_set_size(pd->bdev, (loff_t)lba << 11); + set_capacity(pd->disk, get_capacity(pd->bdev->bd_disk)); q = bdev_get_queue(pd->bdev); if (write) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 2008-01-06 at 10:11 -0700, Matthew Wilcox wrote: > On Sun, Jan 06, 2008 at 09:20:45AM -0600, James Bottomley wrote: > > This bug was actually hidden in bugzilla for ages, where Matthew Wilcox > > was trying to deal with it on his own. The first I heard of it (apart > > from a linux-scsi question on 13 November, when regrettably, I was busy > > with other things) was on 18 Dec when Natalie added me to the bugzilla > > cc list. The first thing I did on that date was finger pktcdvd and add > > Jens to the cc list ... however, since there was no mailing list thread > > to follow he ended up asking for context which no-one provided. > > Not true. I sent my response to the bug to linux-scsi and cc'd bugzilla. > Unfortunately, the bug reporter sent the information requested to bugzilla > instead of linux-scsi. So it got lost and overlooked. > > If there's willingness to change, I'm willing to put some effort into > moving us to a bug tracking system that fits our workflow better than > bugzilla. But if that effort will be disregarded, then let me know now > so I don't waste my time. Well, I'd say yes, certainly, and I'll support the effort ... but the problem is that I'm not one of the powers that be who control our current bugzilla ... that's the constituency we need to convince. As I keep saying, just getting new SCSI bug reports tipped onto the SCSI mailing list will give us about 90% of what we need, but I haven't even managed to get that to happen. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
Ingo Molnar wrote: > If lkml traffic is too big then i'd suggest to set up email > filters to separate out mails that have 'SCSI' in their subject line Good idea. Minor flaw: If somebody forgets to Cc LSML, he might also forget to put SCSI or scsi into the Subject. > or body. This yields false positives whenever a .config is posted. Also, filtering based on message body contents is costlier than filtering by headers. I for one use Sieve to sort mails into different IMAP folders at my mail provider's server, and I think Sieve doesn't offer tests for patterns in message bodies at all. -- Stefan Richter -=-==--- ---= --==- http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 09:20:45AM -0600, James Bottomley wrote: > This bug was actually hidden in bugzilla for ages, where Matthew Wilcox > was trying to deal with it on his own. The first I heard of it (apart > from a linux-scsi question on 13 November, when regrettably, I was busy > with other things) was on 18 Dec when Natalie added me to the bugzilla > cc list. The first thing I did on that date was finger pktcdvd and add > Jens to the cc list ... however, since there was no mailing list thread > to follow he ended up asking for context which no-one provided. Not true. I sent my response to the bug to linux-scsi and cc'd bugzilla. Unfortunately, the bug reporter sent the information requested to bugzilla instead of linux-scsi. So it got lost and overlooked. If there's willingness to change, I'm willing to put some effort into moving us to a bug tracking system that fits our workflow better than bugzilla. But if that effort will be disregarded, then let me know now so I don't waste my time. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 2008-01-06 at 17:12 +0100, Ingo Molnar wrote: > * James Bottomley <[EMAIL PROTECTED]> wrote: > > > > The reproducer came to you via Peter Osterlund who has never > > > authored a single drivers/scsi/ commit before (according to git-log) > > > and who (and here i'm out on a limb guessing it) does not even > > > follow [EMAIL PROTECTED] > > > > > > this bug was obscure and hidden on [EMAIL PROTECTED] for > > > _months_, (it is a rarely visited and rarely read mailing list) and > > > there was just not enough "critical mass" to get this issue fixed. > > > > If I were you, I'd actually make a cursory effort to get my facts > > straight before spouting off. > > > > This bug was actually hidden in bugzilla for ages, where Matthew > > Wilcox was trying to deal with it on his own. [...] > > Huh? The bugzilla just tracked a bug reported to lkml. The very > description of the bugzilla says: > > Subject : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of > device > Submitter : Thomas Meyer <[EMAIL PROTECTED]> > References : http://lkml.org/lkml/2007/11/13/250 > > so no, it was evidently not "hidden in bugzilla for ages" - all the > important action happened on lkml. ... and your original accusation was "this bug was obscure and hidden on [EMAIL PROTECTED] for _months_" which I was pointing out wasn't true. Even the original lkml report was obscured by sweeping the report into bugzilla and forgetting about it, so in fact, no action happened, even on lkml. The history is that I was made aware of the bug on 18 Dec. I suggested then it was a pktcdvd problem and actually cc'd the wrong maintainer ... again an error which is compounded by following this single person workflow bugzilla requires. I also told you in no uncertain terms that it wasn't caused by the commit you were trying to blame, but that didn't stop the crusade to affix blame and close the bug without doing proper root cause analysis. Can we stop it with the recriminations and blame shifting now. The problem we need to solve is fixing our broken bug resolution workflow. My suggestion (for actually the third time in various fora) is: to get the best of both worlds, file a bugzilla and note the bugid. Then email a complete report to the relevant list, but add [BUG ] to the subject line and cc [EMAIL PROTECTED] If you do this, bugzilla will keep track of the entire discussion as it progresses and allow those who track bugs through bugzilla to get a pretty accurate idea of the status. You should never need to touch bugzilla again once the initial bug report is filed: all future information flow is via the mailing lists. I don't give a toss what you recommend the default list to be ... use lkml if you wish. There are a lot of people who will vector it back on to linux-scsi and keep lkml in the cc. I also think that bug reports need to be complete, so copy the information from the mailing list, don't just give a URL ... one of the other enforced breaks in workflow is that the URL in the original bugzilla wasn't working when we all tried to look at the original email on 18 Dec, that's why you get several comments asking for more information and where the original thread is. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 02:55:07PM +0100, Thomas Meyer wrote: > This is the correct setup to trigger the bug. And the commit > 6f5391c283d7fdcf24bf40786ea79061919d1e1d has nothing to do with this > bug. It was bad luck that i couldn't trigger the bug with said commit > reverted (in my tests, the second boot with the reverted commit missed > the first mount/umount smaller cd step, so...) Great. Linus, can you revert the revert now? -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 2008-01-06 at 18:19 +0200, Boaz Harrosh wrote: > On Sun, Jan 06 2008 at 5:43 +0200, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > > This all still leaves the question unanswered why that commit > > 6f5391c283d7fdcf24bf40786ea79061919d1e1d changed any behaviour at all. > > Because the thing that Peter is describing has nothing to do with any > > low-level drivers what-so-ever. > > > > Linus > > > > James Matthew. > I have a (very) wild guess at what maybe have changed with the cmnd->done > patch: > > Do you remember the effective loop in scsi_lib:scsi_end_request() where > if bufflen was smaller then original request size, do to truncation > of bufflen by ULD, then the remaining of the request is re-queued again > as a new scsi-command. Well I think that the old system would call > cmnd->done for every iteration, and the new system, since the done is > called by the block-Q, does not see the resubmit of the new command. Actually, this is cmnd->done, not req->done we're removing. cmnd->done() isn't seen by the block layer; all its uses are in the SCSI mid-layer. > I have not followed all code path of the matter, but I know that sr does > alters bufflen in some cases. > All this is not a bug in itself, but it is a change in behavior that might > cause the current sr hack to fail. It's a good thought. You're right, the old code calls done for every iteration. However, it calls it in scsi_finish_completion. The new code will actually call drv->done() in that same spot for every iteration as well. The requeue is done via scsi_requeue_request which calls blk_requeue_request, which resets the START flag and sends the command right back through the system (including the prep function because scsi_requeue_request unpreps the command), so even with the new code we'll go back through all the same done paths. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06 2008 at 5:43 +0200, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > This all still leaves the question unanswered why that commit > 6f5391c283d7fdcf24bf40786ea79061919d1e1d changed any behaviour at all. > Because the thing that Peter is describing has nothing to do with any > low-level drivers what-so-ever. > > Linus > James Matthew. I have a (very) wild guess at what maybe have changed with the cmnd->done patch: Do you remember the effective loop in scsi_lib:scsi_end_request() where if bufflen was smaller then original request size, do to truncation of bufflen by ULD, then the remaining of the request is re-queued again as a new scsi-command. Well I think that the old system would call cmnd->done for every iteration, and the new system, since the done is called by the block-Q, does not see the resubmit of the new command. I have not followed all code path of the matter, but I know that sr does alters bufflen in some cases. All this is not a bug in itself, but it is a change in behavior that might cause the current sr hack to fail. Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
* James Bottomley <[EMAIL PROTECTED]> wrote: > > The reproducer came to you via Peter Osterlund who has never > > authored a single drivers/scsi/ commit before (according to git-log) > > and who (and here i'm out on a limb guessing it) does not even > > follow [EMAIL PROTECTED] > > > > this bug was obscure and hidden on [EMAIL PROTECTED] for > > _months_, (it is a rarely visited and rarely read mailing list) and > > there was just not enough "critical mass" to get this issue fixed. > > If I were you, I'd actually make a cursory effort to get my facts > straight before spouting off. > > This bug was actually hidden in bugzilla for ages, where Matthew > Wilcox was trying to deal with it on his own. [...] Huh? The bugzilla just tracked a bug reported to lkml. The very description of the bugzilla says: Subject : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device Submitter : Thomas Meyer <[EMAIL PROTECTED]> References : http://lkml.org/lkml/2007/11/13/250 so no, it was evidently not "hidden in bugzilla for ages" - all the important action happened on lkml. > The whole problem with this bug was generated precisely because it was > kept in bugzilla where too few people actually looked at it. You're > the one who annotated the bugzilla entries with trite little homilies > asking why there was no action *without* ever notifying any mailing > list, I might add. again, this bugzilla entry originated from lkml. I did ping the bugzilla because i saw that the suspected commit's author was Cc:-ed already. Why should every bug reporter and debugger be fully aware of the absolutely SILLY little details and preferences of maintainers about how and whom to report bugs? YOU made the workflow fragile in the first place, by going away to linux-scsi and ignoring lkml reports. Or in your own words, in the bugzilla, comment #9: http://bugzilla.kernel.org/show_bug.cgi?id=9370#c9 Reply-To: [EMAIL PROTECTED] [...] Erm, actually no ... this is the first I've heard of it, except as a passing question from matthew. It's usually safe to assume if it's ^^^ not on linux-scsi I haven't seen it. that's fundamentally flawed. For testers there should be only one, simple as possible rule: "if you have a problem with the Linux kernel, then report it to lkml" [ or report it to your distro or bugzilla.kernel.org, where it will be propagated towards lkml by others. ] not to "report it to one of the 100+ lists listed here - good luck getting it right": L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] (subscribers-only) L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] (subscribers-only) L: http://lists.twibble.org/mailman/listinfo/dc395x/ L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] (subscribers-only) L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] (subscribers-only) L: [EMAIL PROTECTED] (subscribers-only) L: [EMAIL PROTECTED] (subscribers-only) L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] (subscription required) L: [EMAIL PROTECTED], [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] (subscribers-only) L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: linux-kernel@vger.kernel.org L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] (in Japanese) L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L: [EMAIL PROTECTED] L:
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 2008-01-06 at 17:45 +0200, Adrian Bunk wrote: > Bugzilla for tracking and mailing lists for discussing are not mutually > exclusive. I didn't ever say they were. What I said was we needed the work flow on the mailing lists. > What about asking the Bugzilla admins to set the default owner of new > SCSI bugs to [EMAIL PROTECTED] > > This way all SCSI bugs submitted in Bugzilla will automatically be > forwarded to the linux-scsi mailing list. Well, I've only been asking for this for the last two years. If you can actually make it happen, I'd be eternally grateful. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, Jan 06, 2008 at 09:20:45AM -0600, James Bottomley wrote: > > On Sun, 2008-01-06 at 15:47 +0100, Ingo Molnar wrote: > > * James Bottomley <[EMAIL PROTECTED]> wrote: > > > > > > I can repeat this bug, both with and without the scsi patch that is > > > > claimed to make a difference, both with an external USB drive and an > > > > internal IDE drive. > > > > > > > > To repeat: > > > > > > > > 1. Start with an empty drive. > > > > 2. pktsetup 0 /dev/scd0 > > > > 3. Insert a CD containing an isofs filesystem. > > > > 4. mount /dev/pktcdvd/0 /mnt/tmp > > > > 5. umount /mnt/tmp > > > > 6. Press the eject button. > > > > 7. Insert a DVD containing a non-writable filesystem. > > > > 8. mount /dev/scd0 /mnt/tmp > > > > 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null > > > > 10. If the DVD contains data beyond the physical size of a CD, you > > > > get I/O errors in the terminal, and dmesg reports lots of > > > > "attempt to access beyond end of device" errors. > > > > > > Brilliant! I can confirm the reproduction of the bug too (that's with > > > the originally fingered commit reverted). > > > > may i point out the obvious at this stage? The thing that finally got > > movement into this bug was ... : > > > >exposure on lkml > > I won't disagree with that. That's why my philosophy is to try to force > all bug reports out of bugzilla and on to the relevant mailing list > because of the many eyes approach this engenders. The problem is that mailing lists are far too often equivalent to /dev/null for many bug reports. Tracking e.g. helps with not missing regressions and getting more of them fixed. And another of the advantages of using Bugzilla is that it gives us numbers how bad we are in terms of introducing regressions and having unfixed bugs, so developers are no longer able to tell we didn't have a problem in this area... > > The reproducer came to you via Peter Osterlund who has never authored a > > single drivers/scsi/ commit before (according to git-log) and who (and > > here i'm out on a limb guessing it) does not even follow > > [EMAIL PROTECTED] > > > > this bug was obscure and hidden on [EMAIL PROTECTED] for > > _months_, (it is a rarely visited and rarely read mailing list) and > > there was just not enough "critical mass" to get this issue fixed. > > If I were you, I'd actually make a cursory effort to get my facts > straight before spouting off. > > This bug was actually hidden in bugzilla for ages, where Matthew Wilcox > was trying to deal with it on his own. The first I heard of it (apart > from a linux-scsi question on 13 November, when regrettably, I was busy > with other things) was on 18 Dec when Natalie added me to the bugzilla > cc list. The first thing I did on that date was finger pktcdvd and add > Jens to the cc list ... however, since there was no mailing list thread > to follow he ended up asking for context which no-one provided. > > The whole problem with this bug was generated precisely because it was > kept in bugzilla where too few people actually looked at it. You're the > one who annotated the bugzilla entries with trite little homilies asking > why there was no action *without* ever notifying any mailing list, I > might add. > > The fault lies in our bug processing methodology. Bugzilla is a fine > tracking tool, but it's a bloody useless workflow one for actually > solving problems because, as you say, and I agree, the mailing lists are > where we produce the solutions. >... Bugzilla for tracking and mailing lists for discussing are not mutually exclusive. What about asking the Bugzilla admins to set the default owner of new SCSI bugs to [EMAIL PROTECTED] This way all SCSI bugs submitted in Bugzilla will automatically be forwarded to the linux-scsi mailing list. > James cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 2008-01-06 at 15:47 +0100, Ingo Molnar wrote: > * James Bottomley <[EMAIL PROTECTED]> wrote: > > > > I can repeat this bug, both with and without the scsi patch that is > > > claimed to make a difference, both with an external USB drive and an > > > internal IDE drive. > > > > > > To repeat: > > > > > > 1. Start with an empty drive. > > > 2. pktsetup 0 /dev/scd0 > > > 3. Insert a CD containing an isofs filesystem. > > > 4. mount /dev/pktcdvd/0 /mnt/tmp > > > 5. umount /mnt/tmp > > > 6. Press the eject button. > > > 7. Insert a DVD containing a non-writable filesystem. > > > 8. mount /dev/scd0 /mnt/tmp > > > 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null > > > 10. If the DVD contains data beyond the physical size of a CD, you > > > get I/O errors in the terminal, and dmesg reports lots of > > > "attempt to access beyond end of device" errors. > > > > Brilliant! I can confirm the reproduction of the bug too (that's with > > the originally fingered commit reverted). > > may i point out the obvious at this stage? The thing that finally got > movement into this bug was ... : > >exposure on lkml I won't disagree with that. That's why my philosophy is to try to force all bug reports out of bugzilla and on to the relevant mailing list because of the many eyes approach this engenders. > The reproducer came to you via Peter Osterlund who has never authored a > single drivers/scsi/ commit before (according to git-log) and who (and > here i'm out on a limb guessing it) does not even follow > [EMAIL PROTECTED] > > this bug was obscure and hidden on [EMAIL PROTECTED] for > _months_, (it is a rarely visited and rarely read mailing list) and > there was just not enough "critical mass" to get this issue fixed. If I were you, I'd actually make a cursory effort to get my facts straight before spouting off. This bug was actually hidden in bugzilla for ages, where Matthew Wilcox was trying to deal with it on his own. The first I heard of it (apart from a linux-scsi question on 13 November, when regrettably, I was busy with other things) was on 18 Dec when Natalie added me to the bugzilla cc list. The first thing I did on that date was finger pktcdvd and add Jens to the cc list ... however, since there was no mailing list thread to follow he ended up asking for context which no-one provided. The whole problem with this bug was generated precisely because it was kept in bugzilla where too few people actually looked at it. You're the one who annotated the bugzilla entries with trite little homilies asking why there was no action *without* ever notifying any mailing list, I might add. The fault lies in our bug processing methodology. Bugzilla is a fine tracking tool, but it's a bloody useless workflow one for actually solving problems because, as you say, and I agree, the mailing lists are where we produce the solutions. > _THAT_ is the power of lkml. People who are not generally interested in > your subsystem come and help. There is extra noise, but it's manageable. > > so may i at this point suggest that you as the SCSI maintainer start > reading SCSI bugreports on lkml and start participating in SCSI topics > there, without extra prompting? It _is_ an important aggregation mailing > list for development, just like -mm or the upstream kernel is an > aggregation point of all things Linux. > > I believe the "I only read linux-scsi, please post bugs there" approach > is harmful. If lkml traffic is too big then i'd suggest to set up email > filters to separate out mails that have 'SCSI' in their subject line or > body. Fortunately it's a really easy key to filter on. [ Scheduler mails > are much harder to filter out :-/ ] In fact i'd suggest to do all SCSI > development on lkml. We've got one upstream kernel codebase, one git > stream of commits and hence we should use one lkml feed to discuss > things on. Oh good grief ... we do add other mailing lists to the cc list (most often lkml) when it becomes evident that it's not a SCSI problem. Most bug reports actually start off going to a set of lists (including lkml) anyway, so there's usually full context. You may love drinking from the firehose ... I find it makes me want to pee a lot. Forcing your work habits on everyone else isn't really a very community way of solving anything. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 6 Jan 2008, James Bottomley wrote: On Sat, 2008-01-05 at 19:43 -0800, Linus Torvalds wrote: diff --git a/fs/block_dev.c b/fs/block_dev.c index 993f78c..6a20da9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev->bd_invalidated) rescan_partitions(bdev->bd_disk, bdev); + bd_inode->i_size = (loff_t)get_capacity(disk)<<9; } } bdev->bd_openers++; Actually, I think that code is fine ... the block size shouldn't change across positive values of bd_openers. I think the true fix is below: pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... it can change its own layered device capacity, certainly, but it shouldn't be mucking with the capacity that was already set in the sr_probe routine. I'm in two minds as to whether the set capacity on the underlying device should be removed as well ... I think it should, but it is harmless as the sr_probe will also reset it. However, I'll defer to what Al wants to do. James diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 3535ef8..10f3692 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -2344,7 +2344,6 @@ static int pkt_open_dev(struct pktcdvd_device *pd, int write) set_capacity(pd->disk, lba << 2); set_capacity(pd->bdev->bd_disk, lba << 2); - bd_set_size(pd->bdev, (loff_t)lba << 11); q = bdev_get_queue(pd->bdev); if (write) { That code was added to fix a similar bug, see: http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-08/4288.html Maybe that fix was wrong and should have just set bd_inode->i_size instead of calling bd_set_size(). -- Peter Osterlund - [EMAIL PROTECTED] http://web.telia.com/~u89404340 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
* James Bottomley <[EMAIL PROTECTED]> wrote: > > I can repeat this bug, both with and without the scsi patch that is > > claimed to make a difference, both with an external USB drive and an > > internal IDE drive. > > > > To repeat: > > > > 1. Start with an empty drive. > > 2. pktsetup 0 /dev/scd0 > > 3. Insert a CD containing an isofs filesystem. > > 4. mount /dev/pktcdvd/0 /mnt/tmp > > 5. umount /mnt/tmp > > 6. Press the eject button. > > 7. Insert a DVD containing a non-writable filesystem. > > 8. mount /dev/scd0 /mnt/tmp > > 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null > > 10. If the DVD contains data beyond the physical size of a CD, you > > get I/O errors in the terminal, and dmesg reports lots of > > "attempt to access beyond end of device" errors. > > Brilliant! I can confirm the reproduction of the bug too (that's with > the originally fingered commit reverted). may i point out the obvious at this stage? The thing that finally got movement into this bug was ... : exposure on lkml The reproducer came to you via Peter Osterlund who has never authored a single drivers/scsi/ commit before (according to git-log) and who (and here i'm out on a limb guessing it) does not even follow [EMAIL PROTECTED] this bug was obscure and hidden on [EMAIL PROTECTED] for _months_, (it is a rarely visited and rarely read mailing list) and there was just not enough "critical mass" to get this issue fixed. _THAT_ is the power of lkml. People who are not generally interested in your subsystem come and help. There is extra noise, but it's manageable. so may i at this point suggest that you as the SCSI maintainer start reading SCSI bugreports on lkml and start participating in SCSI topics there, without extra prompting? It _is_ an important aggregation mailing list for development, just like -mm or the upstream kernel is an aggregation point of all things Linux. I believe the "I only read linux-scsi, please post bugs there" approach is harmful. If lkml traffic is too big then i'd suggest to set up email filters to separate out mails that have 'SCSI' in their subject line or body. Fortunately it's a really easy key to filter on. [ Scheduler mails are much harder to filter out :-/ ] In fact i'd suggest to do all SCSI development on lkml. We've got one upstream kernel codebase, one git stream of commits and hence we should use one lkml feed to discuss things on. Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sat, 2008-01-05 at 19:43 -0800, Linus Torvalds wrote: > fs/block_dev.c |1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 993f78c..6a20da9 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct > file *file, int for_part) > } > if (bdev->bd_invalidated) > rescan_partitions(bdev->bd_disk, bdev); > + bd_inode->i_size = (loff_t)get_capacity(disk)<<9; > } > } > bdev->bd_openers++; Actually, I think that code is fine ... the block size shouldn't change across positive values of bd_openers. I think the true fix is below: pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... it can change its own layered device capacity, certainly, but it shouldn't be mucking with the capacity that was already set in the sr_probe routine. I'm in two minds as to whether the set capacity on the underlying device should be removed as well ... I think it should, but it is harmless as the sr_probe will also reset it. However, I'll defer to what Al wants to do. James diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 3535ef8..10f3692 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -2344,7 +2344,6 @@ static int pkt_open_dev(struct pktcdvd_device *pd, int write) set_capacity(pd->disk, lba << 2); set_capacity(pd->bdev->bd_disk, lba << 2); - bd_set_size(pd->bdev, (loff_t)lba << 11); q = bdev_get_queue(pd->bdev); if (write) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sat, 2008-01-05 at 19:43 -0800, Linus Torvalds wrote: > This all still leaves the question unanswered why that commit > 6f5391c283d7fdcf24bf40786ea79061919d1e1d changed any behaviour at > all. > Because the thing that Peter is describing has nothing to do with any > low-level drivers what-so-ever. It isn't even a secondary effect like I thought. This commit genuinely didn't have anything to do with the bug, it was purely accidental. It came about because if you look at the reporter's recipe to reproduce, which all of us tried without success, it's missing several steps. To get the bug, these steps must have been done somehow, but I bet by pure chance they weren't when reverting 6f5391c283d7fdcf24bf40786ea79061919d1e1d which led to wrongly fingering this commit. Now, if only someone who understood the mechanics of what the commit was doing tried to stop you reverting it we could have saved a lot of trouble ... James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sun, 2008-01-06 at 03:55 +0100, Peter Osterlund wrote: > Linus Torvalds <[EMAIL PROTECTED]> writes: > > > On Wed, 2 Jan 2008, James Bottomley wrote: > > > > > Look at the taxonomy of the bug. This is the form of the error: > > > > > > buffer I/O error on device sr0, logical block 20304 > > > attempt to access beyond end of device > > > sr0: rw=0, want=81224, limit=40944 > > > > > > The last limit is the most suggestive, that comes straight from > > > bdev->bd_inode->i_size>>9 and is supposed to be the size of the block > > > device in 512 byte blocks. For a 4.7GB DVD, it's a little small. > > > Nothing in the sr code sets this directly (although it does come from > > > get_blkdev() for the first opener). pktcdvd does set it, though ... and > > > probably wrongly if the drive in question isn't UDF formatted. > > pktcdvd sets it when opening the /dev/pktcdvd device, but when the > drive is later opened as /dev/scd0, there is nothing that sets it > back. (Btw, 40944 is possible if the disk is a CDRW that was formatted > with "cdrwtool -m 10236".) > > The problem is that pktcdvd opens the cd device in non-blocking mode > when pktsetup is run, and doesn't close it again until pktsetup -d > is run. The effect is that if you meanwhile open the cd device, > blkdev.c:do_open() doesn't call bd_set_size() because bdev->bd_openers > is non-zero. > > I don't know the correct way to fix this. Maybe adding bd_set_size() > to sr.c:get_sectorsize() which already does set_capacity() would > work. Could be ... this is deep viro magic, though; I've added him to the Cc list to get his input. > > .. but you're ignoring the fact that if pktcdvd sets it wrong, then it > > should be visible with the pre-commit kernel *also*. > > I can repeat this bug, both with and without the scsi patch that is > claimed to make a difference, both with an external USB drive and an > internal IDE drive. > > To repeat: > > 1. Start with an empty drive. > 2. pktsetup 0 /dev/scd0 > 3. Insert a CD containing an isofs filesystem. > 4. mount /dev/pktcdvd/0 /mnt/tmp > 5. umount /mnt/tmp > 6. Press the eject button. > 7. Insert a DVD containing a non-writable filesystem. > 8. mount /dev/scd0 /mnt/tmp > 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null > 10. If the DVD contains data beyond the physical size of a CD, you > get I/O errors in the terminal, and dmesg reports lots of > "attempt to access beyond end of device" errors. Brilliant! I can confirm the reproduction of the bug too (that's with the originally fingered commit reverted). James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
Hi, I confirm Peter's observations: > To repeat: > > 1. Start with an empty drive. > 2. pktsetup 0 /dev/scd0 > 3. Insert a CD containing an isofs filesystem. > 4. mount /dev/pktcdvd/0 /mnt/tmp > 5. umount /mnt/tmp > 6. Press the eject button. > 7. Insert a DVD containing a non-writable filesystem. > 8. mount /dev/scd0 /mnt/tmp > 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null > 10. If the DVD contains data beyond the physical size of a CD, you > get I/O errors in the terminal, and dmesg reports lots of > "attempt to access beyond end of device" errors. This is the correct setup to trigger the bug. And the commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d has nothing to do with this bug. It was bad luck that i couldn't trigger the bug with said commit reverted (in my tests, the second boot with the reverted commit missed the first mount/umount smaller cd step, so...) So sorry for all the mess and stress. I'm glad that Peter found the real nature of this bug. mfg thomas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
Linus Torvalds <[EMAIL PROTECTED]> writes: > Does a patch like this change the behaviour you see at all? > + bd_inode->i_size = (loff_t)get_capacity(disk)<<9; It does fix my scenario, with the trivial fix of adding bdev-> at the beginning of that line, ie: diff --git a/fs/block_dev.c b/fs/block_dev.c index 993f78c..a8ed344 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev->bd_invalidated) rescan_partitions(bdev->bd_disk, bdev); + bdev->bd_inode->i_size = (loff_t)get_capacity(disk)<<9; } } bdev->bd_openers++; -- Peter Osterlund - [EMAIL PROTECTED] http://web.telia.com/~u89404340 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Linus Torvalds [EMAIL PROTECTED] writes: Does a patch like this change the behaviour you see at all? + bd_inode-i_size = (loff_t)get_capacity(disk)9; It does fix my scenario, with the trivial fix of adding bdev- at the beginning of that line, ie: diff --git a/fs/block_dev.c b/fs/block_dev.c index 993f78c..a8ed344 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev-bd_invalidated) rescan_partitions(bdev-bd_disk, bdev); + bdev-bd_inode-i_size = (loff_t)get_capacity(disk)9; } } bdev-bd_openers++; -- Peter Osterlund - [EMAIL PROTECTED] http://web.telia.com/~u89404340 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Hi, I confirm Peter's observations: To repeat: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum /dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of attempt to access beyond end of device errors. This is the correct setup to trigger the bug. And the commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d has nothing to do with this bug. It was bad luck that i couldn't trigger the bug with said commit reverted (in my tests, the second boot with the reverted commit missed the first mount/umount smaller cd step, so...) So sorry for all the mess and stress. I'm glad that Peter found the real nature of this bug. mfg thomas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 2008-01-06 at 03:55 +0100, Peter Osterlund wrote: Linus Torvalds [EMAIL PROTECTED] writes: On Wed, 2 Jan 2008, James Bottomley wrote: Look at the taxonomy of the bug. This is the form of the error: buffer I/O error on device sr0, logical block 20304 attempt to access beyond end of device sr0: rw=0, want=81224, limit=40944 The last limit is the most suggestive, that comes straight from bdev-bd_inode-i_size9 and is supposed to be the size of the block device in 512 byte blocks. For a 4.7GB DVD, it's a little small. Nothing in the sr code sets this directly (although it does come from get_blkdev() for the first opener). pktcdvd does set it, though ... and probably wrongly if the drive in question isn't UDF formatted. pktcdvd sets it when opening the /dev/pktcdvd device, but when the drive is later opened as /dev/scd0, there is nothing that sets it back. (Btw, 40944 is possible if the disk is a CDRW that was formatted with cdrwtool -m 10236.) The problem is that pktcdvd opens the cd device in non-blocking mode when pktsetup is run, and doesn't close it again until pktsetup -d is run. The effect is that if you meanwhile open the cd device, blkdev.c:do_open() doesn't call bd_set_size() because bdev-bd_openers is non-zero. I don't know the correct way to fix this. Maybe adding bd_set_size() to sr.c:get_sectorsize() which already does set_capacity() would work. Could be ... this is deep viro magic, though; I've added him to the Cc list to get his input. .. but you're ignoring the fact that if pktcdvd sets it wrong, then it should be visible with the pre-commit kernel *also*. I can repeat this bug, both with and without the scsi patch that is claimed to make a difference, both with an external USB drive and an internal IDE drive. To repeat: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum /dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of attempt to access beyond end of device errors. Brilliant! I can confirm the reproduction of the bug too (that's with the originally fingered commit reverted). James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sat, 2008-01-05 at 19:43 -0800, Linus Torvalds wrote: This all still leaves the question unanswered why that commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d changed any behaviour at all. Because the thing that Peter is describing has nothing to do with any low-level drivers what-so-ever. It isn't even a secondary effect like I thought. This commit genuinely didn't have anything to do with the bug, it was purely accidental. It came about because if you look at the reporter's recipe to reproduce, which all of us tried without success, it's missing several steps. To get the bug, these steps must have been done somehow, but I bet by pure chance they weren't when reverting 6f5391c283d7fdcf24bf40786ea79061919d1e1d which led to wrongly fingering this commit. Now, if only someone who understood the mechanics of what the commit was doing tried to stop you reverting it we could have saved a lot of trouble ... James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sat, 2008-01-05 at 19:43 -0800, Linus Torvalds wrote: fs/block_dev.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 993f78c..6a20da9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev-bd_invalidated) rescan_partitions(bdev-bd_disk, bdev); + bd_inode-i_size = (loff_t)get_capacity(disk)9; } } bdev-bd_openers++; Actually, I think that code is fine ... the block size shouldn't change across positive values of bd_openers. I think the true fix is below: pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... it can change its own layered device capacity, certainly, but it shouldn't be mucking with the capacity that was already set in the sr_probe routine. I'm in two minds as to whether the set capacity on the underlying device should be removed as well ... I think it should, but it is harmless as the sr_probe will also reset it. However, I'll defer to what Al wants to do. James diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 3535ef8..10f3692 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -2344,7 +2344,6 @@ static int pkt_open_dev(struct pktcdvd_device *pd, int write) set_capacity(pd-disk, lba 2); set_capacity(pd-bdev-bd_disk, lba 2); - bd_set_size(pd-bdev, (loff_t)lba 11); q = bdev_get_queue(pd-bdev); if (write) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
* James Bottomley [EMAIL PROTECTED] wrote: I can repeat this bug, both with and without the scsi patch that is claimed to make a difference, both with an external USB drive and an internal IDE drive. To repeat: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum /dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of attempt to access beyond end of device errors. Brilliant! I can confirm the reproduction of the bug too (that's with the originally fingered commit reverted). may i point out the obvious at this stage? The thing that finally got movement into this bug was ... : exposure on lkml The reproducer came to you via Peter Osterlund who has never authored a single drivers/scsi/ commit before (according to git-log) and who (and here i'm out on a limb guessing it) does not even follow [EMAIL PROTECTED] this bug was obscure and hidden on [EMAIL PROTECTED] for _months_, (it is a rarely visited and rarely read mailing list) and there was just not enough critical mass to get this issue fixed. _THAT_ is the power of lkml. People who are not generally interested in your subsystem come and help. There is extra noise, but it's manageable. so may i at this point suggest that you as the SCSI maintainer start reading SCSI bugreports on lkml and start participating in SCSI topics there, without extra prompting? It _is_ an important aggregation mailing list for development, just like -mm or the upstream kernel is an aggregation point of all things Linux. I believe the I only read linux-scsi, please post bugs there approach is harmful. If lkml traffic is too big then i'd suggest to set up email filters to separate out mails that have 'SCSI' in their subject line or body. Fortunately it's a really easy key to filter on. [ Scheduler mails are much harder to filter out :-/ ] In fact i'd suggest to do all SCSI development on lkml. We've got one upstream kernel codebase, one git stream of commits and hence we should use one lkml feed to discuss things on. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 6 Jan 2008, James Bottomley wrote: On Sat, 2008-01-05 at 19:43 -0800, Linus Torvalds wrote: diff --git a/fs/block_dev.c b/fs/block_dev.c index 993f78c..6a20da9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev-bd_invalidated) rescan_partitions(bdev-bd_disk, bdev); + bd_inode-i_size = (loff_t)get_capacity(disk)9; } } bdev-bd_openers++; Actually, I think that code is fine ... the block size shouldn't change across positive values of bd_openers. I think the true fix is below: pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... it can change its own layered device capacity, certainly, but it shouldn't be mucking with the capacity that was already set in the sr_probe routine. I'm in two minds as to whether the set capacity on the underlying device should be removed as well ... I think it should, but it is harmless as the sr_probe will also reset it. However, I'll defer to what Al wants to do. James diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 3535ef8..10f3692 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -2344,7 +2344,6 @@ static int pkt_open_dev(struct pktcdvd_device *pd, int write) set_capacity(pd-disk, lba 2); set_capacity(pd-bdev-bd_disk, lba 2); - bd_set_size(pd-bdev, (loff_t)lba 11); q = bdev_get_queue(pd-bdev); if (write) { That code was added to fix a similar bug, see: http://linux.derkeiler.com/Mailing-Lists/Kernel/2004-08/4288.html Maybe that fix was wrong and should have just set bd_inode-i_size instead of calling bd_set_size(). -- Peter Osterlund - [EMAIL PROTECTED] http://web.telia.com/~u89404340 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 2008-01-06 at 15:47 +0100, Ingo Molnar wrote: * James Bottomley [EMAIL PROTECTED] wrote: I can repeat this bug, both with and without the scsi patch that is claimed to make a difference, both with an external USB drive and an internal IDE drive. To repeat: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum /dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of attempt to access beyond end of device errors. Brilliant! I can confirm the reproduction of the bug too (that's with the originally fingered commit reverted). may i point out the obvious at this stage? The thing that finally got movement into this bug was ... : exposure on lkml I won't disagree with that. That's why my philosophy is to try to force all bug reports out of bugzilla and on to the relevant mailing list because of the many eyes approach this engenders. The reproducer came to you via Peter Osterlund who has never authored a single drivers/scsi/ commit before (according to git-log) and who (and here i'm out on a limb guessing it) does not even follow [EMAIL PROTECTED] this bug was obscure and hidden on [EMAIL PROTECTED] for _months_, (it is a rarely visited and rarely read mailing list) and there was just not enough critical mass to get this issue fixed. If I were you, I'd actually make a cursory effort to get my facts straight before spouting off. This bug was actually hidden in bugzilla for ages, where Matthew Wilcox was trying to deal with it on his own. The first I heard of it (apart from a linux-scsi question on 13 November, when regrettably, I was busy with other things) was on 18 Dec when Natalie added me to the bugzilla cc list. The first thing I did on that date was finger pktcdvd and add Jens to the cc list ... however, since there was no mailing list thread to follow he ended up asking for context which no-one provided. The whole problem with this bug was generated precisely because it was kept in bugzilla where too few people actually looked at it. You're the one who annotated the bugzilla entries with trite little homilies asking why there was no action *without* ever notifying any mailing list, I might add. The fault lies in our bug processing methodology. Bugzilla is a fine tracking tool, but it's a bloody useless workflow one for actually solving problems because, as you say, and I agree, the mailing lists are where we produce the solutions. _THAT_ is the power of lkml. People who are not generally interested in your subsystem come and help. There is extra noise, but it's manageable. so may i at this point suggest that you as the SCSI maintainer start reading SCSI bugreports on lkml and start participating in SCSI topics there, without extra prompting? It _is_ an important aggregation mailing list for development, just like -mm or the upstream kernel is an aggregation point of all things Linux. I believe the I only read linux-scsi, please post bugs there approach is harmful. If lkml traffic is too big then i'd suggest to set up email filters to separate out mails that have 'SCSI' in their subject line or body. Fortunately it's a really easy key to filter on. [ Scheduler mails are much harder to filter out :-/ ] In fact i'd suggest to do all SCSI development on lkml. We've got one upstream kernel codebase, one git stream of commits and hence we should use one lkml feed to discuss things on. Oh good grief ... we do add other mailing lists to the cc list (most often lkml) when it becomes evident that it's not a SCSI problem. Most bug reports actually start off going to a set of lists (including lkml) anyway, so there's usually full context. You may love drinking from the firehose ... I find it makes me want to pee a lot. Forcing your work habits on everyone else isn't really a very community way of solving anything. James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, Jan 06, 2008 at 09:20:45AM -0600, James Bottomley wrote: On Sun, 2008-01-06 at 15:47 +0100, Ingo Molnar wrote: * James Bottomley [EMAIL PROTECTED] wrote: I can repeat this bug, both with and without the scsi patch that is claimed to make a difference, both with an external USB drive and an internal IDE drive. To repeat: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum /dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of attempt to access beyond end of device errors. Brilliant! I can confirm the reproduction of the bug too (that's with the originally fingered commit reverted). may i point out the obvious at this stage? The thing that finally got movement into this bug was ... : exposure on lkml I won't disagree with that. That's why my philosophy is to try to force all bug reports out of bugzilla and on to the relevant mailing list because of the many eyes approach this engenders. The problem is that mailing lists are far too often equivalent to /dev/null for many bug reports. Tracking e.g. helps with not missing regressions and getting more of them fixed. And another of the advantages of using Bugzilla is that it gives us numbers how bad we are in terms of introducing regressions and having unfixed bugs, so developers are no longer able to tell we didn't have a problem in this area... The reproducer came to you via Peter Osterlund who has never authored a single drivers/scsi/ commit before (according to git-log) and who (and here i'm out on a limb guessing it) does not even follow [EMAIL PROTECTED] this bug was obscure and hidden on [EMAIL PROTECTED] for _months_, (it is a rarely visited and rarely read mailing list) and there was just not enough critical mass to get this issue fixed. If I were you, I'd actually make a cursory effort to get my facts straight before spouting off. This bug was actually hidden in bugzilla for ages, where Matthew Wilcox was trying to deal with it on his own. The first I heard of it (apart from a linux-scsi question on 13 November, when regrettably, I was busy with other things) was on 18 Dec when Natalie added me to the bugzilla cc list. The first thing I did on that date was finger pktcdvd and add Jens to the cc list ... however, since there was no mailing list thread to follow he ended up asking for context which no-one provided. The whole problem with this bug was generated precisely because it was kept in bugzilla where too few people actually looked at it. You're the one who annotated the bugzilla entries with trite little homilies asking why there was no action *without* ever notifying any mailing list, I might add. The fault lies in our bug processing methodology. Bugzilla is a fine tracking tool, but it's a bloody useless workflow one for actually solving problems because, as you say, and I agree, the mailing lists are where we produce the solutions. ... Bugzilla for tracking and mailing lists for discussing are not mutually exclusive. What about asking the Bugzilla admins to set the default owner of new SCSI bugs to [EMAIL PROTECTED] This way all SCSI bugs submitted in Bugzilla will automatically be forwarded to the linux-scsi mailing list. James cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 2008-01-06 at 17:45 +0200, Adrian Bunk wrote: Bugzilla for tracking and mailing lists for discussing are not mutually exclusive. I didn't ever say they were. What I said was we needed the work flow on the mailing lists. What about asking the Bugzilla admins to set the default owner of new SCSI bugs to [EMAIL PROTECTED] This way all SCSI bugs submitted in Bugzilla will automatically be forwarded to the linux-scsi mailing list. Well, I've only been asking for this for the last two years. If you can actually make it happen, I'd be eternally grateful. James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 2008-01-06 at 18:19 +0200, Boaz Harrosh wrote: On Sun, Jan 06 2008 at 5:43 +0200, Linus Torvalds [EMAIL PROTECTED] wrote: This all still leaves the question unanswered why that commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d changed any behaviour at all. Because the thing that Peter is describing has nothing to do with any low-level drivers what-so-ever. Linus James Matthew. I have a (very) wild guess at what maybe have changed with the cmnd-done patch: Do you remember the effective loop in scsi_lib:scsi_end_request() where if bufflen was smaller then original request size, do to truncation of bufflen by ULD, then the remaining of the request is re-queued again as a new scsi-command. Well I think that the old system would call cmnd-done for every iteration, and the new system, since the done is called by the block-Q, does not see the resubmit of the new command. Actually, this is cmnd-done, not req-done we're removing. cmnd-done() isn't seen by the block layer; all its uses are in the SCSI mid-layer. I have not followed all code path of the matter, but I know that sr does alters bufflen in some cases. All this is not a bug in itself, but it is a change in behavior that might cause the current sr hack to fail. It's a good thought. You're right, the old code calls done for every iteration. However, it calls it in scsi_finish_completion. The new code will actually call drv-done() in that same spot for every iteration as well. The requeue is done via scsi_requeue_request which calls blk_requeue_request, which resets the START flag and sends the command right back through the system (including the prep function because scsi_requeue_request unpreps the command), so even with the new code we'll go back through all the same done paths. James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, Jan 06, 2008 at 02:55:07PM +0100, Thomas Meyer wrote: This is the correct setup to trigger the bug. And the commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d has nothing to do with this bug. It was bad luck that i couldn't trigger the bug with said commit reverted (in my tests, the second boot with the reverted commit missed the first mount/umount smaller cd step, so...) Great. Linus, can you revert the revert now? -- Intel are signing my paycheques ... these opinions are still mine Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 2008-01-06 at 17:12 +0100, Ingo Molnar wrote: * James Bottomley [EMAIL PROTECTED] wrote: The reproducer came to you via Peter Osterlund who has never authored a single drivers/scsi/ commit before (according to git-log) and who (and here i'm out on a limb guessing it) does not even follow [EMAIL PROTECTED] this bug was obscure and hidden on [EMAIL PROTECTED] for _months_, (it is a rarely visited and rarely read mailing list) and there was just not enough critical mass to get this issue fixed. If I were you, I'd actually make a cursory effort to get my facts straight before spouting off. This bug was actually hidden in bugzilla for ages, where Matthew Wilcox was trying to deal with it on his own. [...] Huh? The bugzilla just tracked a bug reported to lkml. The very description of the bugzilla says: Subject : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device Submitter : Thomas Meyer [EMAIL PROTECTED] References : http://lkml.org/lkml/2007/11/13/250 so no, it was evidently not hidden in bugzilla for ages - all the important action happened on lkml. ... and your original accusation was this bug was obscure and hidden on [EMAIL PROTECTED] for _months_ which I was pointing out wasn't true. Even the original lkml report was obscured by sweeping the report into bugzilla and forgetting about it, so in fact, no action happened, even on lkml. The history is that I was made aware of the bug on 18 Dec. I suggested then it was a pktcdvd problem and actually cc'd the wrong maintainer ... again an error which is compounded by following this single person workflow bugzilla requires. I also told you in no uncertain terms that it wasn't caused by the commit you were trying to blame, but that didn't stop the crusade to affix blame and close the bug without doing proper root cause analysis. Can we stop it with the recriminations and blame shifting now. The problem we need to solve is fixing our broken bug resolution workflow. My suggestion (for actually the third time in various fora) is: to get the best of both worlds, file a bugzilla and note the bugid. Then email a complete report to the relevant list, but add [BUG bugid] to the subject line and cc [EMAIL PROTECTED] If you do this, bugzilla will keep track of the entire discussion as it progresses and allow those who track bugs through bugzilla to get a pretty accurate idea of the status. You should never need to touch bugzilla again once the initial bug report is filed: all future information flow is via the mailing lists. I don't give a toss what you recommend the default list to be ... use lkml if you wish. There are a lot of people who will vector it back on to linux-scsi and keep lkml in the cc. I also think that bug reports need to be complete, so copy the information from the mailing list, don't just give a URL ... one of the other enforced breaks in workflow is that the URL in the original bugzilla wasn't working when we all tried to look at the original email on 18 Dec, that's why you get several comments asking for more information and where the original thread is. James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, Jan 06, 2008 at 09:20:45AM -0600, James Bottomley wrote: This bug was actually hidden in bugzilla for ages, where Matthew Wilcox was trying to deal with it on his own. The first I heard of it (apart from a linux-scsi question on 13 November, when regrettably, I was busy with other things) was on 18 Dec when Natalie added me to the bugzilla cc list. The first thing I did on that date was finger pktcdvd and add Jens to the cc list ... however, since there was no mailing list thread to follow he ended up asking for context which no-one provided. Not true. I sent my response to the bug to linux-scsi and cc'd bugzilla. Unfortunately, the bug reporter sent the information requested to bugzilla instead of linux-scsi. So it got lost and overlooked. If there's willingness to change, I'm willing to put some effort into moving us to a bug tracking system that fits our workflow better than bugzilla. But if that effort will be disregarded, then let me know now so I don't waste my time. -- Intel are signing my paycheques ... these opinions are still mine Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Ingo Molnar wrote: If lkml traffic is too big then i'd suggest to set up email filters to separate out mails that have 'SCSI' in their subject line Good idea. Minor flaw: If somebody forgets to Cc LSML, he might also forget to put SCSI or scsi into the Subject. or body. This yields false positives whenever a .config is posted. Also, filtering based on message body contents is costlier than filtering by headers. I for one use Sieve to sort mails into different IMAP folders at my mail provider's server, and I think Sieve doesn't offer tests for patterns in message bodies at all. -- Stefan Richter -=-==--- ---= --==- http://arcgraph.de/sr/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 2008-01-06 at 10:11 -0700, Matthew Wilcox wrote: On Sun, Jan 06, 2008 at 09:20:45AM -0600, James Bottomley wrote: This bug was actually hidden in bugzilla for ages, where Matthew Wilcox was trying to deal with it on his own. The first I heard of it (apart from a linux-scsi question on 13 November, when regrettably, I was busy with other things) was on 18 Dec when Natalie added me to the bugzilla cc list. The first thing I did on that date was finger pktcdvd and add Jens to the cc list ... however, since there was no mailing list thread to follow he ended up asking for context which no-one provided. Not true. I sent my response to the bug to linux-scsi and cc'd bugzilla. Unfortunately, the bug reporter sent the information requested to bugzilla instead of linux-scsi. So it got lost and overlooked. If there's willingness to change, I'm willing to put some effort into moving us to a bug tracking system that fits our workflow better than bugzilla. But if that effort will be disregarded, then let me know now so I don't waste my time. Well, I'd say yes, certainly, and I'll support the effort ... but the problem is that I'm not one of the powers that be who control our current bugzilla ... that's the constituency we need to convince. As I keep saying, just getting new SCSI bug reports tipped onto the SCSI mailing list will give us about 90% of what we need, but I haven't even managed to get that to happen. James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 6 Jan 2008, James Bottomley wrote: index 993f78c..6a20da9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev-bd_invalidated) rescan_partitions(bdev-bd_disk, bdev); + bd_inode-i_size = (loff_t)get_capacity(disk)9; } } bdev-bd_openers++; Actually, I think that code is fine ... the block size shouldn't change across positive values of bd_openers. The block size should indeed not change, and that's why we must *not* do bd_set_size() there (because that changes the block size too, not just the size of the device). But I would argue that if we have had a device invalidation event (ie media change), then we should indeed change the actual *size* of the block device as seen by anybody who opens the file again. And yes, it will affect old openers of the same inode too (since the size is in the inode, not the file descriptor), but considering that this really should only happen for media change events, I think that's better than what we used to have. Now, we could have made it even more clear by moving the i_size setting into the same if (dbev-bd_invalidated) conditional, but the thing is, we only set bd_invalidated for devices that have minors, so things like floppies etc that don't have partition support also don't have bd_invalidated set (in effect, bd_invalidated really *is* just a flag for the partitioning code, not for disk change in general). That said: pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... I'm not sure if it should be mucking with the size or not, but it definitely shouldn't be mucking with the block-size, because that can indeed cause huge problems. So removing the bd_set_size() is almost certainly the right thing to do (because it's always incorrect to do at an inner open), and the real size should have already been set by the regular open. But this should be tested by somebody who uses the dang thing. My personal favourite for a real fix would actually be to always make the capacity of the pktcdvd device match the capacity of the underlying device, ie the diff would look something like the appended (untested as usual), but that may be a bit too extensive a change. But regardless, I think the i_size change makes sense - at least in some form. It doesn't necessarily have to be the explicit i_size setting: I also considered just changing the block_dev.c do_open() make bd_set_size() _unconditionally_, and then move the if (!bdev-bd_openers) test into bd_set_size(), so that people couldn't even change the blocksize by mistake! I'd still like to hear comments from Al in particular, if he has something to say. Linus --- drivers/block/pktcdvd.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c index 3535ef8..1b23681 100644 --- a/drivers/block/pktcdvd.c +++ b/drivers/block/pktcdvd.c @@ -2342,9 +2342,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, int write) goto out_unclaim; } - set_capacity(pd-disk, lba 2); - set_capacity(pd-bdev-bd_disk, lba 2); - bd_set_size(pd-bdev, (loff_t)lba 11); + set_capacity(pd-disk, get_capacity(pd-bdev-bd_disk)); q = bdev_get_queue(pd-bdev); if (write) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, Jan 06, 2008 at 11:36:23AM -0600, James Bottomley wrote: On Sun, 2008-01-06 at 10:11 -0700, Matthew Wilcox wrote: If there's willingness to change, I'm willing to put some effort into moving us to a bug tracking system that fits our workflow better than bugzilla. But if that effort will be disregarded, then let me know now so I don't waste my time. Well, I'd say yes, certainly, and I'll support the effort ... but the problem is that I'm not one of the powers that be who control our current bugzilla ... that's the constituency we need to convince. As I keep saying, just getting new SCSI bug reports tipped onto the SCSI mailing list will give us about 90% of what we need, but I haven't even managed to get that to happen. As long as the process will be that much complicated, it will never correctly work. The primary requirement for a useful bug reporting workflow is *not to put the burden on the reporter*. I agree with Ingo here. How can a user know where to post ? He has a problem with his Linux distro. He finally understands or gets told that the strange numbers on the screen indicate a kernel oops and that he must report it if he wants it to be fixed. He then googles for how/where to report a bug, and the first reply says : http://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html in which you can read : If you are totally stumped as to whom to send the report, send it to linux-kernel@vger.kernel.org. So *this* is the official central entry point, like it or not. And in fact it works, given the number of off-topic reports we get. It proves that end users know how to report their problems there. Other lists should be used when the problem is clearly qualified. And most of the time, it's not up to the end user to qualify his problem, but to *us*. Our mission is not to blindly write lines of code, but to spend some time getting user feedback, and bug reports are part of this feedback. In my opinion, the most important reader contribution to LKML is just reading bugs reports and redirecting them to the proper list if obviously required. People do this all the time and it has always worked. In parallel, bug entries may be added into bugzilla, either by the reporter once suggested to do so, or by the person redirecting him to the proper list. So a working workflow would look like this : 1) from: user to: lkml subject: help needed with my CD burner 2) from: any LKML reader to: user cc: lkml, linux-scsi subject: re: help needed with my CD burner Message forwarded to linux-scsi. You may accelerate resolution by describing your problem in bugzilla [url, mail...] = user knows his problem is being considered (very important) = user is connected with the proper list and possibly with a bugzilla entry. As long as the bug is not 100% sure relevant to linux-scsi, lkml should remain CCed so that other readers may occasionally spot the problem. I would also like to make a parallel with how support works in commercial products. Generally, you as the end user don't know anything about the vendor's internal process. You don't even know if you have an account on your vendor's support site (another blocking factor of bugzilla BTW). So what you do is call any of your contacts overthere, quickly describe your problem, and most of the time he gives you all the indications required to report a fine bug. And if he understands it will be too hard for you (classically because of missing account), he will initiate it by himself. At this point the bug is tracked and followed by the appropriate persons. LKML *is* the contact for any Linux Kernel problem or question. As long as we will try to bypass it and create parallel ways, it will only maintain confusion and lead to bugs getting dropped with user frustration. IMHO, the only missing indication in reporting-bugs.html above is : if you don't get any reply within 2 days, surely your mail has not been noticed, repost it, if possible with a more appropriate subject. We will *never* be able to educate newcomers, but we can (and must) educate regular readers to help newcomers report bugs. If only 100 regular readers randomly forward 2 mails a month, those are 200 bugs which get properly handled. Just don't forget to change your reply-to to lkml if you don't want to get polluted by the discussion. As to using bugzilla for bug tracking... Well, I agree that bug tracking is important when you work on multiple problems at once. But bugzilla should be the developer's tool, not the end user's. That means that it should only be our job to create entries there if end users find it too difficult, and that we should just invite them to *try* to report there to save us some time. Willy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 6 Jan 2008, Linus Torvalds wrote: That said: pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... I'm not sure if it should be mucking with the size or not, but it definitely shouldn't be mucking with the block-size, because that can indeed cause huge problems. Hmm. Looking closer, it's probably ok in that case, because it does do a bd_claim() to make sure that it has exclusive access, so while there may be other openers around, at least those other openers won't be filesystem mounts or anything that opened with O_EXCL. So changing the blocksize is probably ok in this case. That still leaves the question whether pktcdvd *should* muck with the base device at all, and I'm not at all sure about that. But I'm no longer sure that the pktcdvd code is necessarily *clearly* broken, now it's more of a should it really do that? thing. So I still suspect that this: - set_capacity(pd-disk, lba 2); - set_capacity(pd-bdev-bd_disk, lba 2); - bd_set_size(pd-bdev, (loff_t)lba 11); + set_capacity(pd-disk, get_capacity(pd-bdev-bd_disk)); is likely a good thing to do (in conjunction with my patch that made i_size be reliable after an open), but there may be some reason why pktcdvd really wants to control the size rather than be on the receiving end of the size. Peter, this is your decision. Apparently my one-liner fixes the immediate bug (but it's not really a regression either - I think the i_size issue has been there since pretty much day #1), and what pktcdvd does is somewhat less critical an issue? Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, 2008-01-06 at 10:44 -0800, Linus Torvalds wrote: On Sun, 6 Jan 2008, Linus Torvalds wrote: That said: pktcdvd shouldn't be mucking with the size of the underlying CD/DVD ... I'm not sure if it should be mucking with the size or not, but it definitely shouldn't be mucking with the block-size, because that can indeed cause huge problems. Hmm. Looking closer, it's probably ok in that case, because it does do a bd_claim() to make sure that it has exclusive access, so while there may be other openers around, at least those other openers won't be filesystem mounts or anything that opened with O_EXCL. So changing the blocksize is probably ok in this case. That still leaves the question whether pktcdvd *should* muck with the base device at all, and I'm not at all sure about that. But I'm no longer sure that the pktcdvd code is necessarily *clearly* broken, now it's more of a should it really do that? thing. So I still suspect that this: - set_capacity(pd-disk, lba 2); - set_capacity(pd-bdev-bd_disk, lba 2); - bd_set_size(pd-bdev, (loff_t)lba 11); + set_capacity(pd-disk, get_capacity(pd-bdev-bd_disk)); is likely a good thing to do (in conjunction with my patch that made i_size be reliable after an open), but there may be some reason why pktcdvd really wants to control the size rather than be on the receiving end of the size. Peter, this is your decision. Apparently my one-liner fixes the immediate bug (but it's not really a regression either - I think the i_size issue has been there since pretty much day #1), and what pktcdvd does is somewhat less critical an issue? I think perhaps the true bug lies in the way we handle layered devices like this. pktcdvd holds the underlying device open, so its refcount never drops to zero. This is what causes the gendisk/block layer never to update the sizes, and what lead to pktcdvd doing it instead. However, what perhaps really needs to happen is that pktcdvd needs to take over the media change events as well. That way it could see the disk change and invalidate and reread its own setting of the block size (and possibly re set the size of the underlying device). I agree, though, this isn't a regression. It's probably obscure enough in reproduction to warrant not holding up 2.6.24 --- especially as I think the true fix will do small perturbations to a lot of subsystems. If this were a product and I were the release manager, I'd be updating the release notes with a note about having to break pktcdvd binding across media changes to work around this bug and a promise to fix it in the next release. James -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, Jan 06, 2008 at 08:56:25PM +0200, Adrian Bunk wrote: On Sun, Jan 06, 2008 at 07:34:02PM +0100, Willy Tarreau wrote: ... As to using bugzilla for bug tracking... Well, I agree that bug tracking is important when you work on multiple problems at once. But bugzilla should be the developer's tool, not the end user's. That means that it should only be our job to create entries there if end users find it too difficult, and that we should just invite them to *try* to report there to save us some time. Where does this opinion end users would find Bugzilla too difficult come from? Many other projects use Bugzilla without big problems. Is this just plain FUD or based on real reports from end users? In the latter case, please give pointers to them so that whatever problems exist can be fixed. I, as an end user of ntpd, have been harrassed to use it to get an ntp bug reported because by mail it would get lost. What complicated an interface it is when you don't know it ! I remember I wanted to attach a patch and it didn't even get through the first time. I did it wrong. Blame me if you want, but an interface which need training for proper use is certainly not for casual end users. Also, it's very annoying to have to create an account somewhere, leaving there one of the passwords you use on many other sites, just to help a random developer fix a bug in his code. You quickly wonder if someone else will report it and have more patience. Another recent example: a coworker recently told me he installed the latest beta from ubuntu, and that he had some problems with his WIFI randomly hanging. I asked him if he filed a bug, he replied me no, it's too much boring, I'm not the only one with this hardware, others have certainly already done it. When the release went out, he insisted telling me he was right not filing the bug because indeed it was fixed ! We must accept that end users : 1) do not like creating accounts (remember or divulgate passwords, and risk of getting spam) 2) do not know how to classify their problem, and are not even sure it's a real bug. On the first page, when uncertain they would probably click Other. Adding doubt in the reporter's mind is counter-productive as it will refrain him from being precise about what he did to get the problem. 3) are not familiar with our vocabulary : - Tree : mainline? mm? mjb? ac? what's that ? - Component : Configuration? LSM? Modules? Other? = finally, I'm not sure I had to click Other in the first place, I want to choose something else, I click Back and I get back to the login page! Bye bye. Also : No binary modules - NVIDIA users this means YOU! = about half the reporters will wonder if they should stop here or not. Most of those with an NVidia chipset and/or graphics card will wonder, while the bug may still interest us. At least, on the mailing list, there's no real rules, the mail will be posted anyway. And if the user gets flamed, at least we have the report. And different to LKML, you don't run into problems like majordomo silently dropping your email because it contains HTML or a vCard. That's true. But do we have statistics on the ratio of client IP addresses which go as far as the login page and which have finally not filed their bug (either stopped at the login page or given up after logging in) ? It should be very interesting... Regards, Willy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, Jan 06, 2008 at 08:10:44PM +0100, Willy Tarreau wrote: On Sun, Jan 06, 2008 at 08:56:25PM +0200, Adrian Bunk wrote: On Sun, Jan 06, 2008 at 07:34:02PM +0100, Willy Tarreau wrote: ... As to using bugzilla for bug tracking... Well, I agree that bug tracking is important when you work on multiple problems at once. But bugzilla should be the developer's tool, not the end user's. That means that it should only be our job to create entries there if end users find it too difficult, and that we should just invite them to *try* to report there to save us some time. Where does this opinion end users would find Bugzilla too difficult come from? Many other projects use Bugzilla without big problems. Is this just plain FUD or based on real reports from end users? In the latter case, please give pointers to them so that whatever problems exist can be fixed. I, as an end user of ntpd, have been harrassed to use it to get an ntp bug reported because by mail it would get lost. What complicated Noone knows how many thousand bug reports have never reached lkml since majordomo silently dropped them. This is the price for having lkml relatively spam-free. an interface it is when you don't know it ! I remember I wanted to attach a patch and it didn't even get through the first time. I did it wrong. Blame me if you want, but an interface which need training for proper use is certainly not for casual end users. What exactly is the problem with attaching files? What is it didn't even get through the first time exactly? Also, it's very annoying to have to create an account somewhere, leaving there one of the passwords you use on many other sites, just to help a random developer fix a bug in his code. You quickly wonder if someone else will report it and have more patience. If you already lack patience at this point, would you be willing to bisect which requires more than a dozen kernel recompiles and reboots? Another recent example: a coworker recently told me he installed the latest beta from ubuntu, and that he had some problems with his WIFI randomly hanging. I asked him if he filed a bug, he replied me no, it's too much boring, I'm not the only one with this hardware, others have certainly already done it. When the release went out, he insisted telling me he was right not filing the bug because indeed it was fixed ! He wouldn't have sent a bug report no matter how to report it. We must accept that end users : 1) do not like creating accounts (remember or divulgate passwords, and risk of getting spam) Send _one_ email to lkml and you'll get forever spam to this address. With one email addresses of mine exactly that happened. 2) do not know how to classify their problem, and are not even sure it's a real bug. On the first page, when uncertain they would probably click Other. Adding doubt in the reporter's mind is counter-productive as it will refrain him from being precise about what he did to get the problem. If the bug ends at Other/Other that's not a problem - this usually gets fixed within a few hours. 3) are not familiar with our vocabulary : - Tree : mainline? mm? mjb? ac? what's that ? - Component : Configuration? LSM? Modules? Other? Then let's improve that. = finally, I'm not sure I had to click Other in the first place, I want to choose something else, I click Back and I get back to the login page! Bye bye. Works fine here. Did you disable cookies? Also : No binary modules - NVIDIA users this means YOU! = about half the reporters will wonder if they should stop here or not. Most of those with an NVidia chipset and/or graphics card will wonder, while the bug may still interest us. Then sugggest a better text. At least, on the mailing list, there's no real rules, the mail will be posted anyway. And if the user gets flamed, at least we have the report. ... If majordomo didn't drop it. And if it didn't get ignored and forgotten. Regards, Willy cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
* Stefan Richter [EMAIL PROTECTED] wrote: Ingo Molnar wrote: If lkml traffic is too big then i'd suggest to set up email filters to separate out mails that have 'SCSI' in their subject line Good idea. Minor flaw: If somebody forgets to Cc LSML, he might also forget to put SCSI or scsi into the Subject. yeah, that's very frequent, but i dont think it's a big issue: or body. This yields false positives whenever a .config is posted. these two patterns: ^CONFIG_ ^# will exclude .config-ish lines. Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sun, Jan 06, 2008 at 09:58:02PM +0200, Adrian Bunk wrote: On Sun, Jan 06, 2008 at 08:10:44PM +0100, Willy Tarreau wrote: I, as an end user of ntpd, have been harrassed to use it to get an ntp bug reported because by mail it would get lost. What complicated Noone knows how many thousand bug reports have never reached lkml since majordomo silently dropped them. Since none of my mails has been dropped yet, I think that the false positives are rather rare. Yes, sometimes someone complains about a mail getting repeatedly killed. But that's not *that* much frequent IMO and people are already used to re-post when mailing their friends, coworkers or customers. It's no different here. Still, I agree that it is a problem. This is the price for having lkml relatively spam-free. yes and I think it works fairly well. an interface it is when you don't know it ! I remember I wanted to attach a patch and it didn't even get through the first time. I did it wrong. Blame me if you want, but an interface which need training for proper use is certainly not for casual end users. What exactly is the problem with attaching files? What is it didn't even get through the first time exactly? Well, it's quite old in my memories, it may be 2 years ago. IIRC, when I wanted to attach files, I got brought to another page for this, and once done there was some confusing indication about how to complete the filing or get back to terminate the report. Sorry for not being much precise on this one, it's too far ago. Also, it's very annoying to have to create an account somewhere, leaving there one of the passwords you use on many other sites, just to help a random developer fix a bug in his code. You quickly wonder if someone else will report it and have more patience. If you already lack patience at this point, Well, it took me 2 minutes to send my patch to the maintainer by then, he very politely told me that the only way was through bugzilla, and then it took me half an hour if not more to create a bugzilla account, find how to use it, attach the files and put a description in a text-area. Also, I remember that the ongoing mail exchanges through bugzilla systematically removed the mail history, which made it very hard to follow a discussion, because all mails I received were almost single-lined looking like how did that happen ? or in what circumstances ? without any history... Maybe this is configurable though. would you be willing to bisect which requires more than a dozen kernel recompiles and reboots? Certainly not! But I would like kernel people to become less egocentric and understand that what they routinely do all the day appears very complicated and time-consuming to many users, and that by imposing them complex and/or costly methods to report bugs, they filter a lot of reports out. Sure, there are still a bunch of them doing everything up to and including the git bisects. But what percentage ? People who encounter problems at work will not do that to start with, because they cannot delay all their work to spend half a day building kernels when their boss or customers are waiting for their work. Others will report the problems they encounter at a customer's and will not even be able to git-bisect because the customer's mail server is not like a notebook they have everywhere with them and can reboot at will. Some of them are more free of their time and will probably enjoy the experience, but they are a minority IMHO. If we had stats on the periods git bisects are run on, I suspect that night and week-ends are the most frequent moments, simply because it's when people have time. IMHO, git bisect is excellent for kernel people. Not for random users. They first have to install git, find free space, *clone the kernel tree* and start discovering the beast. Another recent example: a coworker recently told me he installed the latest beta from ubuntu, and that he had some problems with his WIFI randomly hanging. I asked him if he filed a bug, he replied me no, it's too much boring, I'm not the only one with this hardware, others have certainly already done it. When the release went out, he insisted telling me he was right not filing the bug because indeed it was fixed ! He wouldn't have sent a bug report no matter how to report it. I don't agree. It's a matter of effort vs expected advantage. Just sending a 5-lines mail from work presents a lower entry barrier than having to create an account and discover a new tool. In fact, from the user's perspective, filing a kernel bug report is seen as something annoying and useless, simply because the kernel is so much used that someone else will file the same bug anyway. They act just like microsoft users. Do you know anyone in your relatives who has *ever* filed a bug to microsoft ? Probably zero. They passively wait for the next patch, and just whine if their bug does not get magically fixed. We must understand that our
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Sat, 6 Jan 2008, Peter Osterlund wrote: > > The problem is that pktcdvd opens the cd device in non-blocking mode > when pktsetup is run, and doesn't close it again until pktsetup -d > is run. The effect is that if you meanwhile open the cd device, > blkdev.c:do_open() doesn't call bd_set_size() because bdev->bd_openers > is non-zero. > > I don't know the correct way to fix this. Maybe adding bd_set_size() > to sr.c:get_sectorsize() which already does set_capacity() would > work. Hmm.. I wouldn't say that's the correct way to fix it. The thing is, if somebody has explicitly set the size of the device, than that *is* the size. The kernel should do what it is told, and it very much on purpose does the size probing only on the first open: exactly because other open fd's in progress may be there explicitly to fix up something, or may simply depend on some size it already knew about (ie we don't want to change the size behind its back). And in particular, setting the size with bd_set_size also sets the block-size, and while most things migt react fairly well to the pure *size* changing, they will react very badly indeed to the blocksize changing! So no, doing a "bd_set_size()" in any but the outermost opener simply isn't acceptable. It would cause serious problems and total chaos for the block cache (we do not handle aliasing of multiple different blocksizes). So we are very careful indeed to only call bd_set_size() when bd_openers is zero. That said, at least in your scenario: > 1. Start with an empty drive. > 2. pktsetup 0 /dev/scd0 > 3. Insert a CD containing an isofs filesystem. > 4. mount /dev/pktcdvd/0 /mnt/tmp > 5. umount /mnt/tmp > 6. Press the eject button. > 7. Insert a DVD containing a non-writable filesystem. > 8. mount /dev/scd0 /mnt/tmp > 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null > 10. If the DVD contains data beyond the physical size of a CD, you > get I/O errors in the terminal, and dmesg reports lots of > "attempt to access beyond end of device" errors. part of the problem here seems to be that the "media change" notification that /dev/scd0 hopefully handled correctly and caused the "set_capacity()" hass not made it to the i_size thing (that the block device layer checks). So the way things are *supposed* to work is that the media-change function ("revalidate_disk()") should have triggered as part of the media change, and that *should* have already done the set_capacity(), and that in turn is the thing that should do it all (sets the disk capacity *without* changing the blocksize!) But since "set_capacity()" doesn't actually change "i_size", only dev->capacity (which is correct - there may be many inodes associated with one device), we actually ended up having this subtle dependency on calling bd_set_size() (which we can *only* do on the first open, due to the blocksize issues). Which means that media-change won't fix these things like it is supposed to. And I suspect we've had this bug (well, it *appears* to be a bug) for a while, simply because all *normal* uses will open and close the device properly. Maybe a patch something like this might work out. I haven't really thought it through entirely - but it basically just sets the size, without doing the whole blocksize thing. Jens? Al? Comments? Does a patch like this change the behaviour you see at all? This all still leaves the question unanswered why that commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d changed any behaviour at all. Because the thing that Peter is describing has nothing to do with any low-level drivers what-so-ever. Linus --- fs/block_dev.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 993f78c..6a20da9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev->bd_invalidated) rescan_partitions(bdev->bd_disk, bdev); + bd_inode->i_size = (loff_t)get_capacity(disk)<<9; } } bdev->bd_openers++; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
Linus Torvalds <[EMAIL PROTECTED]> writes: > On Wed, 2 Jan 2008, James Bottomley wrote: > > > Look at the taxonomy of the bug. This is the form of the error: > > > > buffer I/O error on device sr0, logical block 20304 > > attempt to access beyond end of device > > sr0: rw=0, want=81224, limit=40944 > > > > The last limit is the most suggestive, that comes straight from > > bdev->bd_inode->i_size>>9 and is supposed to be the size of the block > > device in 512 byte blocks. For a 4.7GB DVD, it's a little small. > > Nothing in the sr code sets this directly (although it does come from > > get_blkdev() for the first opener). pktcdvd does set it, though ... and > > probably wrongly if the drive in question isn't UDF formatted. pktcdvd sets it when opening the /dev/pktcdvd device, but when the drive is later opened as /dev/scd0, there is nothing that sets it back. (Btw, 40944 is possible if the disk is a CDRW that was formatted with "cdrwtool -m 10236".) The problem is that pktcdvd opens the cd device in non-blocking mode when pktsetup is run, and doesn't close it again until pktsetup -d is run. The effect is that if you meanwhile open the cd device, blkdev.c:do_open() doesn't call bd_set_size() because bdev->bd_openers is non-zero. I don't know the correct way to fix this. Maybe adding bd_set_size() to sr.c:get_sectorsize() which already does set_capacity() would work. > .. but you're ignoring the fact that if pktcdvd sets it wrong, then it > should be visible with the pre-commit kernel *also*. I can repeat this bug, both with and without the scsi patch that is claimed to make a difference, both with an external USB drive and an internal IDE drive. To repeat: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of "attempt to access beyond end of device" errors. -- Peter Osterlund - [EMAIL PROTECTED] http://web.telia.com/~u89404340 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
Linus Torvalds [EMAIL PROTECTED] writes: On Wed, 2 Jan 2008, James Bottomley wrote: Look at the taxonomy of the bug. This is the form of the error: buffer I/O error on device sr0, logical block 20304 attempt to access beyond end of device sr0: rw=0, want=81224, limit=40944 The last limit is the most suggestive, that comes straight from bdev-bd_inode-i_size9 and is supposed to be the size of the block device in 512 byte blocks. For a 4.7GB DVD, it's a little small. Nothing in the sr code sets this directly (although it does come from get_blkdev() for the first opener). pktcdvd does set it, though ... and probably wrongly if the drive in question isn't UDF formatted. pktcdvd sets it when opening the /dev/pktcdvd device, but when the drive is later opened as /dev/scd0, there is nothing that sets it back. (Btw, 40944 is possible if the disk is a CDRW that was formatted with cdrwtool -m 10236.) The problem is that pktcdvd opens the cd device in non-blocking mode when pktsetup is run, and doesn't close it again until pktsetup -d is run. The effect is that if you meanwhile open the cd device, blkdev.c:do_open() doesn't call bd_set_size() because bdev-bd_openers is non-zero. I don't know the correct way to fix this. Maybe adding bd_set_size() to sr.c:get_sectorsize() which already does set_capacity() would work. .. but you're ignoring the fact that if pktcdvd sets it wrong, then it should be visible with the pre-commit kernel *also*. I can repeat this bug, both with and without the scsi patch that is claimed to make a difference, both with an external USB drive and an internal IDE drive. To repeat: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum /dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of attempt to access beyond end of device errors. -- Peter Osterlund - [EMAIL PROTECTED] http://web.telia.com/~u89404340 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert [SCSI] Get rid of scsi_cmnd-done
On Sat, 6 Jan 2008, Peter Osterlund wrote: The problem is that pktcdvd opens the cd device in non-blocking mode when pktsetup is run, and doesn't close it again until pktsetup -d is run. The effect is that if you meanwhile open the cd device, blkdev.c:do_open() doesn't call bd_set_size() because bdev-bd_openers is non-zero. I don't know the correct way to fix this. Maybe adding bd_set_size() to sr.c:get_sectorsize() which already does set_capacity() would work. Hmm.. I wouldn't say that's the correct way to fix it. The thing is, if somebody has explicitly set the size of the device, than that *is* the size. The kernel should do what it is told, and it very much on purpose does the size probing only on the first open: exactly because other open fd's in progress may be there explicitly to fix up something, or may simply depend on some size it already knew about (ie we don't want to change the size behind its back). And in particular, setting the size with bd_set_size also sets the block-size, and while most things migt react fairly well to the pure *size* changing, they will react very badly indeed to the blocksize changing! So no, doing a bd_set_size() in any but the outermost opener simply isn't acceptable. It would cause serious problems and total chaos for the block cache (we do not handle aliasing of multiple different blocksizes). So we are very careful indeed to only call bd_set_size() when bd_openers is zero. That said, at least in your scenario: 1. Start with an empty drive. 2. pktsetup 0 /dev/scd0 3. Insert a CD containing an isofs filesystem. 4. mount /dev/pktcdvd/0 /mnt/tmp 5. umount /mnt/tmp 6. Press the eject button. 7. Insert a DVD containing a non-writable filesystem. 8. mount /dev/scd0 /mnt/tmp 9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum /dev/null 10. If the DVD contains data beyond the physical size of a CD, you get I/O errors in the terminal, and dmesg reports lots of attempt to access beyond end of device errors. part of the problem here seems to be that the media change notification that /dev/scd0 hopefully handled correctly and caused the set_capacity() hass not made it to the i_size thing (that the block device layer checks). So the way things are *supposed* to work is that the media-change function (revalidate_disk()) should have triggered as part of the media change, and that *should* have already done the set_capacity(), and that in turn is the thing that should do it all (sets the disk capacity *without* changing the blocksize!) But since set_capacity() doesn't actually change i_size, only dev-capacity (which is correct - there may be many inodes associated with one device), we actually ended up having this subtle dependency on calling bd_set_size() (which we can *only* do on the first open, due to the blocksize issues). Which means that media-change won't fix these things like it is supposed to. And I suspect we've had this bug (well, it *appears* to be a bug) for a while, simply because all *normal* uses will open and close the device properly. Maybe a patch something like this might work out. I haven't really thought it through entirely - but it basically just sets the size, without doing the whole blocksize thing. Jens? Al? Comments? Does a patch like this change the behaviour you see at all? This all still leaves the question unanswered why that commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d changed any behaviour at all. Because the thing that Peter is describing has nothing to do with any low-level drivers what-so-ever. Linus --- fs/block_dev.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 993f78c..6a20da9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -1191,6 +1191,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part) } if (bdev-bd_invalidated) rescan_partitions(bdev-bd_disk, bdev); + bd_inode-i_size = (loff_t)get_capacity(disk)9; } } bdev-bd_openers++; -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2 Jan 2008, James Bottomley wrote: > > > > To say that another way: > > > > "the code is functionally equivalent, EXCEPT IT ISN'T, and it's > > known to be broken". > > > > wouldn't you say my version is more honest and correct? > > No. Just because a bug appears when a particular piece of code is in > and disappears when it is reverted doesn't automatically equate to the > code in question being buggy. But it *DOES* mean that it's not equivalent. > Look at the taxonomy of the bug. This is the form of the error: > > buffer I/O error on device sr0, logical block 20304 > attempt to access beyond end of device > sr0: rw=0, want=81224, limit=40944 > > The last limit is the most suggestive, that comes straight from > bdev->bd_inode->i_size>>9 and is supposed to be the size of the block > device in 512 byte blocks. For a 4.7GB DVD, it's a little small. > Nothing in the sr code sets this directly (although it does come from > get_blkdev() for the first opener). pktcdvd does set it, though ... and > probably wrongly if the drive in question isn't UDF formatted. .. but you're ignoring the fact that if pktcdvd sets it wrong, then it should be visible with the pre-commit kernel *also*. In other words, you continue to ignore the fact that BEHAVIOUR CHANGED. Why? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2008-01-02 at 12:45 -0800, Linus Torvalds wrote: > > On Wed, 2 Jan 2008, James Bottomley wrote: > > > > OK ... I'll revert it. However, I still think it's the wrong course of > > action, because as far as my analysis goes, this code is functionally > > equivalent to what went before with the exception that we now rely on > > the request->cmd_type information in the post processing (previously we > > just relied on the cmnd->done pointer). > > To say that another way: > > "the code is functionally equivalent, EXCEPT IT ISN'T, and it's > known to be broken". > > wouldn't you say my version is more honest and correct? No. Just because a bug appears when a particular piece of code is in and disappears when it is reverted doesn't automatically equate to the code in question being buggy. We seem to get a lot of these second order effect type things; sometimes its just a problem caused by a particular routine compiling to a longer byte sequence and pushing something else out. Do give us credit for thinking "functional equivalency problem" when this bug report first came in ... I've had myself and several other people over the code. If there's an inequivalency somewhere I'm damned if I can spot it. The most promising other failure mode we tried was request type changes over the lifetime of the command, but we can't make that one fly either. Look at the taxonomy of the bug. This is the form of the error: buffer I/O error on device sr0, logical block 20304 attempt to access beyond end of device sr0: rw=0, want=81224, limit=40944 The last limit is the most suggestive, that comes straight from bdev->bd_inode->i_size>>9 and is supposed to be the size of the block device in 512 byte blocks. For a 4.7GB DVD, it's a little small. Nothing in the sr code sets this directly (although it does come from get_blkdev() for the first opener). pktcdvd does set it, though ... and probably wrongly if the drive in question isn't UDF formatted. I have also tried on many occasions to reproduce this without success (there's a simple recipe in the bug report, but it just doesn't work for me). My setup is with an aic94xxx->expander->SATAPI DVD, whereas the original reporter is ata_piix -> PATA DVD, so it could be stack differences---but again, if it is, the bug itself can't be a simple one in the generic code. The fact that there are no other reporters of problems like this also indicate to me that it isn't a widespread problem (again, pointing to something more specific in the setup of the reporter). The unreproduceability coupled with the lack of other error reports leads me to be about 90% confident the problem isn't in the code you want reverted. However, I grant that we cannot seem to find the root cause, and reverting the code will cause our bug metrics to go down by one (at least until something else causes it to reappear), so it is the corporate thing to do, I suppose. I'll send in a reversion with the sr_mod removal bug fix. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, Jan 02, 2008 at 12:49:32PM -0800, Linus Torvalds wrote: > Maybe it's not that one suspicious test. Maybe it's somethign else. But > that commit was confirmed to break something, almost two months ago. You > guys seem to be in denial, and saying "it didn't change anything". > > And no, waiting for more reporters when one reporter has already narrowed > it down to the exact (smallish) commit, is simply not good. Either you can > fix it by looking at the source, or it gets reverted. > > I was hoping somebody in SCSI-land would actually look at the commit and > try to find out what's wrong, instead of all of you apparently trying to > say "nothing is wrong". I've spent about a week of time looking at it over the last couple of months. I haven't been able to figure it out. That's why I'm calling for it to be reverted, not because I'm "in denial". -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2 Jan 2008, Christoph Hellwig wrote: > > I think you misunderstood Matthew here. REQ_TYPE_BLOCK_PC is indeed > used by any kind of SG_IO or similar passthrough no matter where it > originates. And exactly because REQ_TYPE_BLOCK_PC are entirely passthru > the actual driver (sd, sr or sg) is not doing the actual command > completion but it's handled in the scsi layer because it's exactly > the same no matter what driver it came on. You say that, but you then ignore that something *did* change. Maybe it's not that one suspicious test. Maybe it's somethign else. But that commit was confirmed to break something, almost two months ago. You guys seem to be in denial, and saying "it didn't change anything". And no, waiting for more reporters when one reporter has already narrowed it down to the exact (smallish) commit, is simply not good. Either you can fix it by looking at the source, or it gets reverted. I was hoping somebody in SCSI-land would actually look at the commit and try to find out what's wrong, instead of all of you apparently trying to say "nothing is wrong". I hate the excuses of "but, but, but.. it *should* work". It doesn't. Face that, *then* you can argue about why. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2 Jan 2008, James Bottomley wrote: > > OK ... I'll revert it. However, I still think it's the wrong course of > action, because as far as my analysis goes, this code is functionally > equivalent to what went before with the exception that we now rely on > the request->cmd_type information in the post processing (previously we > just relied on the cmnd->done pointer). To say that another way: "the code is functionally equivalent, EXCEPT IT ISN'T, and it's known to be broken". wouldn't you say my version is more honest and correct? The old code did a per-command callback. The new one doesn't. The code was *supposed* to be equivalent, but it clearly isn't. Why argue the point? And no, maybe it's not that REQ_TYPE_BLOCK_PC should be calling ->done, maybe it's that some REQ_TYPE_FS commands should *not* be calling ->done. Or maybe we somehow got the wrong ->done in the first place, because we now get it from a different source. I don't know, but what I'm arguing (very strongly) against is this attitude of "we don't know what's wrong, but wë́'ll leave it broken because we can't be bothered to figure it out". That is exactly what reverting is there for. It doesn't matter one *whit* if the new code is cleaner and prettier, if it doesn't work. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, Jan 02, 2008 at 11:57:10AM -0800, Linus Torvalds wrote: > On Wed, 2 Jan 2008, Matthew Wilcox wrote: > > > > sd_done and sr_done are called for REQ_TYPE_FS -- if the request comes > > in through one of the SG interfaces, we call scsi_setup_blk_pc_cmnd() > > which sets the ->done callback to scsi_blk_pc_done. > > Why do you think that REQ_TYPE_BLOCK_PC has anything to do with SG? > > It has *nothing* to do with SG, and anybody who uses SG in this day and > age on a block device is just crazy. > > The way you do generic SCSI commands (on perfectly normal block device > nodes) is using the SCSI ioctl() interfaces. That's how you are supposed > to do things like burn DVD's or do any kind of special ops. > > So REQ_TYPE_BLOCK_PC does quite commonly happen on perfectly regular block > devices, it's how all commands that aren't pure reads or writes done by > the kernel behave. I spoke imprecisely; a raw command one through SG or ioctl is REQ_TYPE_BLOCK_PC and did not go through sd_done or sr_done before this patch. > If you actually use /dev/sg*, you will be using the SG driver, and if you > don't want that to have a ->done callback, then just set "done" to NULL > for sg_driver. Unless you think that we should see different behaviour when using /dev/sg* and /dev/sd*, the aspect of the patch you criticised was correct. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, Jan 02, 2008 at 11:57:10AM -0800, Linus Torvalds wrote: > > > On Wed, 2 Jan 2008, Matthew Wilcox wrote: > > > > sd_done and sr_done are called for REQ_TYPE_FS -- if the request comes > > in through one of the SG interfaces, we call scsi_setup_blk_pc_cmnd() > > which sets the ->done callback to scsi_blk_pc_done. > > Why do you think that REQ_TYPE_BLOCK_PC has anything to do with SG? > > It has *nothing* to do with SG, and anybody who uses SG in this day and > age on a block device is just crazy. I think you misunderstood Matthew here. REQ_TYPE_BLOCK_PC is indeed used by any kind of SG_IO or similar passthrough no matter where it originates. And exactly because REQ_TYPE_BLOCK_PC are entirely passthru the actual driver (sd, sr or sg) is not doing the actual command completion but it's handled in the scsi layer because it's exactly the same no matter what driver it came on. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2008-01-02 at 12:40 -0700, Matthew Wilcox wrote: > On Wed, Jan 02, 2008 at 11:19:14AM -0800, Linus Torvalds wrote: > > It's totally immaterial if we have one reporter or many. The fact is, that > > thing has been outstanding for almost two months now. The root cause is > > almost certainly known (and Matthew is apparently even aware of it), but > > I really don't think the root cause is known. > > > By now, it needs to either have a patch, or to be reverted. It's too late > > to make excuses. > > I think reverting it is the correct thing to do. OK ... I'll revert it. However, I still think it's the wrong course of action, because as far as my analysis goes, this code is functionally equivalent to what went before with the exception that we now rely on the request->cmd_type information in the post processing (previously we just relied on the cmnd->done pointer). As far as I understand what's going on, the reporter is misusing the interface. He sets up a packet mapping and then accesses the underlying device, which is wrong (you're supposed to access now via the packet device). The one thing the packet mapping does is to alter the blocksize, which could lead to the errors reported (because the underlying fs block size and the actual block size differ) ... however, I think it should do this all the time, so what I'm unable to explain is why it doesn't write past the end of the device with this commit reverted. But all of the above is the hallmark of an existing bug that this commit exposes ... revert it, and we'll cover the underlying problem up again. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2 Jan 2008, Matthew Wilcox wrote: > > sd_done and sr_done are called for REQ_TYPE_FS -- if the request comes > in through one of the SG interfaces, we call scsi_setup_blk_pc_cmnd() > which sets the ->done callback to scsi_blk_pc_done. Why do you think that REQ_TYPE_BLOCK_PC has anything to do with SG? It has *nothing* to do with SG, and anybody who uses SG in this day and age on a block device is just crazy. The way you do generic SCSI commands (on perfectly normal block device nodes) is using the SCSI ioctl() interfaces. That's how you are supposed to do things like burn DVD's or do any kind of special ops. So REQ_TYPE_BLOCK_PC does quite commonly happen on perfectly regular block devices, it's how all commands that aren't pure reads or writes done by the kernel behave. If you actually use /dev/sg*, you will be using the SG driver, and if you don't want that to have a ->done callback, then just set "done" to NULL for sg_driver. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, Jan 02, 2008 at 11:19:14AM -0800, Linus Torvalds wrote: > It's totally immaterial if we have one reporter or many. The fact is, that > thing has been outstanding for almost two months now. The root cause is > almost certainly known (and Matthew is apparently even aware of it), but I really don't think the root cause is known. > By now, it needs to either have a patch, or to be reverted. It's too late > to make excuses. I think reverting it is the correct thing to do. > And no, it doesn't really need any more reporters or ways to reproduce it, > all it needs is looking at the patch and seeing what the differences are > AT A SOURCE level. And quite frankly, I see one huge honking difference, > which is that new test for > > if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) > > which will disable all the crud that sd_done() does. sd_done and sr_done are called for REQ_TYPE_FS -- if the request comes in through one of the SG interfaces, we call scsi_setup_blk_pc_cmnd() which sets the ->done callback to scsi_blk_pc_done. > So I think you are making totally idiotic excuses. I may not know the SCSI > layer all that intimately, and maybe I'm wrong and there is some other > cause for this, but quite frankly, I doubt it. And I can look at just the > patch and have a damn good idea of why something is broken, but I can't > actually fix it myself because I don't know how to look up a a > "scsi_driver" from a "scsi_cmnd" sanely. That's in the patch. But it would be the wrong thing to do because SG commands should not be molested by the sr/sd driver. -- Intel are signing my paycheques ... these opinions are still mine "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2 Jan 2008, James Bottomley wrote: > > I disagree with this. We only have one reporter of a problem and it > appears to be some type of obscure interaction with pktdvd which no-one > can track down (although it's not really helped by the reporter not > being very responsive). It's totally immaterial if we have one reporter or many. The fact is, that thing has been outstanding for almost two months now. The root cause is almost certainly known (and Matthew is apparently even aware of it), but it has been dicked-around-with by having totally weak excuses ("is it an ide-scsi thing?") > The correct thing to do is root cause the problem and fix it at source, > since it's very likely that this is a pre-existing bug that was simply > uncovered by the patch you're recommending we revert. No, the correct thing would have been to fix it a month ago. By now, it needs to either have a patch, or to be reverted. It's too late to make excuses. And no, it doesn't really need any more reporters or ways to reproduce it, all it needs is looking at the patch and seeing what the differences are AT A SOURCE level. And quite frankly, I see one huge honking difference, which is that new test for if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) which will disable all the crud that sd_done() does. And guess what? Look at what sd_done() does: it checks whether the request was partially filled, and tries to handle that end-of-medium case nicely. Which is *exactly* what seems to have broken. So I think you are making totally idiotic excuses. I may not know the SCSI layer all that intimately, and maybe I'm wrong and there is some other cause for this, but quite frankly, I doubt it. And I can look at just the patch and have a damn good idea of why something is broken, but I can't actually fix it myself because I don't know how to look up a a "scsi_driver" from a "scsi_cmnd" sanely. But almost two months after the fact, we should have had a patch that does that, or we should revert it. NO MORE BOGUS EXCUSES! Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] scsi: revert "[SCSI] Get rid of scsi_cmnd->done"
On Wed, 2008-01-02 at 17:25 +0100, Ingo Molnar wrote: > revert commit: > > commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d > Author: Matthew Wilcox <[EMAIL PROTECTED]> > Date: Tue Sep 25 12:42:04 2007 -0400 > > [SCSI] Get rid of scsi_cmnd->done > > this is a supposed-to-be-cleanup commit, but apparently it causes > regressions: > > Bug 9370 - v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device > http://bugzilla.kernel.org/show_bug.cgi?id=9370 > > this patch should be reintroduced in a more split-up form to make > testing of it easier. I disagree with this. We only have one reporter of a problem and it appears to be some type of obscure interaction with pktdvd which no-one can track down (although it's not really helped by the reporter not being very responsive). The correct thing to do is root cause the problem and fix it at source, since it's very likely that this is a pre-existing bug that was simply uncovered by the patch you're recommending we revert. Unfortunately, I suspect it won't get fixed until someone else actually manages to reproduce it (which I haven't been able to so far). James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/