Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
On Wednesday, 2 of January 2008, James Bottomley wrote:
> On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
> > On Tue, 1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9674
> > > Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
> >
> > Guys, this is a very recent regression. Could you please take a look, see if it's due to mmc, block or scsi changes?
>
> There's not a lot of information to go on. The stack trace looks bogus, so I guess the kernel is compiled without a frame pointer.

The bug report has been updated with a stack trace from a kernel compiled with a frame pointer. Please have a look.
Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
On Wed, 2008-01-02 at 13:21 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 2 of January 2008, James Bottomley wrote:
> > On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
> > > On Tue, 1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=9674
> > > > Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
> > >
> > > Guys, this is a very recent regression. Could you please take a look, see if it's due to mmc, block or scsi changes?
> >
> > There's not a lot of information to go on. The stack trace looks bogus, so I guess the kernel is compiled without a frame pointer.
>
> The bug report has been updated with a stack trace from a kernel compiled with a frame pointer. Please have a look.

Please, please don't do this. Filing something in bugzilla is tantamount to putting it in the "file and forget" folder. The reason I cc'd the SCSI mailing list and asked for more details is so that we get the email flow that might trigger direct interaction between the reporter and someone on the list who recognises the symptoms.

Let me say again, categorically, that if you want to give a bug the best chance of being fixed, the correct flow of information is: file a bugzilla entry and note the bug ID, then email a complete report to the relevant list with [BUG bugid] added to the subject line, and cc [EMAIL PROTECTED] If you do this, bugzilla will keep track of the entire discussion as it progresses and allow those who track bugs through bugzilla to get a pretty accurate idea of the status. You should never need to touch bugzilla again once the initial bug report is filed: all future information flow is via the mailing lists.

Also, using URLs for anything other than historical reference is a killer. Many people travel, and their MO is to download the email and read it on the plane/train/whatever. If you embed a URL containing critical information, the email gets marked as read, but since I can't get to the information, nothing happens. Then it gets forgotten.
This is the relevant piece of information that should have been on the mailing list:

[ 101.359083] Unable to handle kernel paging request at 88021cc0 RIP:
[ 101.359092] [88021cc0]
[ 101.359099] PGD 203067 PUD 207063 PMD 3d34a067 PTE 0
[ 101.359108] Oops: 0010 [1] PREEMPT SMP
[ 101.359115] CPU 0
[ 101.359118] Modules linked in: sr_mod tcp_westwood ipt_REJECT xt_state iptable_filter ipt_owner ipt_MASQUERADE xt_tcpudp xt_multiport iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables iwl3945 ricoh_mmc cdrom
[ 101.359150] Pid: 4496, comm: modprobe Not tainted 2.6.24-rc6-git7 #5
[ 101.359154] RIP: 0010:[88021cc0] [88021cc0]
[ 101.359159] RSP: 0018:81002b457970 EFLAGS: 00010086
[ 101.359163] RAX: 88021cc0 RBX: 81003f1627e0 RCX: 81003f023b38
[ 101.359167] RDX: RSI: 810030efd000 RDI: 81003f1627e0
[ 101.359171] RBP: 81002b4579b8 R08: 0001 R09: 0001
[ 101.359175] R10: R11: R12: 810030efd000
[ 101.359179] R13: 81002b457988 R14: 0010 R15: 00010010
[ 101.359185] FS: 2adcf7ea0b00() GS:80733000() knlGS:
[ 101.359189] CS: 0010 DS: ES: CR0: 8005003b
[ 101.359193] CR2: 88021cc0 CR3: 2b497000 CR4: 06e0
[ 101.359197] DR0: DR1: DR2:
[ 101.359201] DR3: DR6: 0ff0 DR7: 0400
[ 101.359206] Process modprobe (pid: 4496, threadinfo 81002b456000, task 81003de1ef50)
[ 101.359210] Stack: 80333a98 0086 81003f023af8 810030efd000
[ 101.359221] 81003f1627e0 81003f023800 81003e31c000 81003f1627e0
[ 101.359230] 81002b457a08 803fe7e1 81003f023b38
[ 101.359237] Call Trace:
[ 101.359248] [80333a98] elv_next_request+0xe8/0x180
[ 101.359256] [803fe7e1] scsi_request_fn+0x71/0x380
[ 101.359264] [803375b8] __generic_unplug_device+0x28/0x30
[ 101.359270] [80337623] blk_execute_rq_nowait+0x63/0xb0
[ 101.359276] [80339113] blk_execute_rq+0x73/0xe0
[ 101.359283] [80337775] get_request_wait+0x25/0x120
[ 101.359288] [80337896] blk_get_request+0x26/0x80
[ 101.359296] [803fe5b2] scsi_execute+0xe2/0x110
[ 101.359301] [803fe661] scsi_execute_req+0x81/0xf0
[ 101.359312] [8800d713] :sr_mod:sr_probe+0x1e3/0x630
[ 101.359323] [803c8d01] driver_probe_device+0xa1/0x1c0
[ 101.359329] [803c8ff5] __driver_attach+0xe5/0xf0
[ 101.359334] [803c8f10] __driver_attach+0x0/0xf0
[ 101.359342] [803c7ee3] bus_for_each_dev+0x53/0x80
[ 101.359348] [803c8b3c] driver_attach+0x1c/0x20
[ 101.359353] [803c8305]
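Two details in this oops are consistent with a call through a stale function pointer: RIP, RAX and CR2 (the faulting address) all hold 88021cc0, and the page walk ends in PTE 0, i.e. the target page is not mapped. The "Oops: 0010" field is the x86 page-fault error code, printed in hex. The small decoder below is an illustrative sketch based on the architecture's documented low bits of that code; the names PF_*, decode_pf, print_pf and check_oops_0010 are made up for this example and are not kernel identifiers.

```c
#include <assert.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (printed after "Oops:"). */
#define PF_PROT  0x01u /* set: protection violation; clear: page not present */
#define PF_WRITE 0x02u /* set: write access; clear: read access */
#define PF_USER  0x04u /* set: fault in user mode; clear: kernel mode */
#define PF_RSVD  0x08u /* set: reserved bit found in a paging entry */
#define PF_INSTR 0x10u /* set: the fault was an instruction fetch */

struct pf_info {
    int protection, write, user, reserved, ifetch;
};

/* Split an error code into its individual flags. */
static struct pf_info decode_pf(unsigned int err)
{
    struct pf_info i;
    i.protection = (err & PF_PROT) != 0;
    i.write      = (err & PF_WRITE) != 0;
    i.user       = (err & PF_USER) != 0;
    i.reserved   = (err & PF_RSVD) != 0;
    i.ifetch     = (err & PF_INSTR) != 0;
    return i;
}

/* Print a one-line, human-readable description of an error code. */
static void print_pf(unsigned int err)
{
    struct pf_info i = decode_pf(err);
    printf("%s during %s in %s mode%s\n",
           i.protection ? "protection violation" : "page not present",
           i.ifetch ? "instruction fetch" : (i.write ? "write" : "read"),
           i.user ? "user" : "kernel",
           i.reserved ? " (reserved bit set)" : "");
}

/* "Oops: 0010": a kernel-mode instruction fetch from an unmapped page. */
static void check_oops_0010(void)
{
    struct pf_info i = decode_pf(0x0010);
    assert(!i.protection && i.ifetch && !i.user && !i.write);
}
```

For example, print_pf(0x0010) prints "page not present during instruction fetch in kernel mode", which fits the picture of the kernel jumping to code that is no longer loaded.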
Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
On Wed, 2008-01-02 at 10:49 -0500, Pete Wyckoff wrote:
> [EMAIL PROTECTED] wrote on Tue, 01 Jan 2008 21:24 -0600:
> > On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
> > > On Tue, 1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=9674
> > > > Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
> > >
> > > Guys, this is a very recent regression. Could you please take a look, see if it's due to mmc, block or scsi changes?
> >
> > There's not a lot of information to go on. The stack trace looks bogus, so I guess the kernel is compiled without a frame pointer. However, it does look like the initial insertion of sr_mod is going through and it generates a command which gets into scsi_request_fn and then indirects through a bogus queuecommand pointer.
>
> Bogus prep_rq_fn actually.
>
> > What's the actual underlying device the cdrom is attached to?
> >
> > There are no real changes to SCSI in this area since 2.6.24-rc4 ... however, the reinsertion is suggestive: it's as if the removal is retriggering a module request for some reason.
>
> Here's a guess. When sr_mod is removed, it looks like the request queue prep_rq_fn is still pointing to the now nonexistent sr_prep_fn. This may have been due to a commit that went in early 2.6.24:
>
> commit 7f9a6bc4e9d59e7fcf03ed23f60cd81ca5d80b65
> Author: James Bottomley [EMAIL PROTECTED]
> Date: Sat Aug 4 10:06:25 2007 -0500
>
>     [SCSI] move ULD attachment into the prep function
>
>     One of the intents of the block prep function was to allow ULDs to use it for preprocessing. The original SCSI model was to have a single prep function and add a pointer indirect filter to build the necessary commands. This patch reverses that, does away with the init_command field of the scsi_driver structure and makes ULDs attach directly to the prep function instead.
>
>     The value is really that it allows us to begin to separate the ULDs from the SCSI mid layer (as long as they don't use any core functions---which is hard at the moment---a ULD doesn't even need SCSI to bind).
>
>     Acked-by: Jens Axboe [EMAIL PROTECTED]
>     Signed-off-by: James Bottomley [EMAIL PROTECTED]
>
> When the module is re-inserted, it does a few SCSI commands before setting up the new prep_rq_fn, presumably hitting this bogus pointer. One fix would be to have sr remember the original prep function and restore it in sr_kref_release. Sd and a few other drivers have this issue. Ide-cd bothers to set prep_rq_fn to NULL as it releases the device.

Bingo. That's why we ask the list, thanks Pete!

I don't think that fix is the correct one, but I've attached what I think is the correct fix (basically, there's a bus callback we can use to ensure the right thing always gets done rather than relying on drivers doing it in their own release methods; that way they can't forget).

The reason it was showing up in -rc4, I suspect, is that something was structurally altering the module stack and the address at which sr_mod was loaded changed, so the prep_fn, as Pete said, was pointing into bogus address space.

The way to trigger this bug 100% of the time is to rmmod sr_mod and then send an inquiry (or another command) to the device using the sg node.
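The second half of that recipe (sending an INQUIRY through the sg node after rmmod sr_mod) can be done from userspace with the Linux SG_IO ioctl from <scsi/sg.h>. The sketch below is illustrative only: the device path is an assumption (check which /dev/sgN is the CD-ROM, e.g. with lsscsi -g), and the helper names build_inquiry_cdb and send_inquiry are made up for this example. On an affected kernel we would expect the command to reach the stale prep function and oops; on a fixed kernel it should simply complete.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

#define INQUIRY_OPCODE    0x12
#define INQUIRY_CDB_LEN   6
#define INQUIRY_REPLY_LEN 96

/* Build a standard 6-byte INQUIRY CDB (EVPD = 0, page code = 0). */
static void build_inquiry_cdb(unsigned char *cdb, unsigned char alloc_len)
{
    assert(cdb != NULL);
    memset(cdb, 0, INQUIRY_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[4] = alloc_len;                 /* allocation length, low byte */
}

/* Send one INQUIRY through an sg node such as "/dev/sg0".
 * Returns 0 if the command completed without error, -1 otherwise. */
static int send_inquiry(const char *sg_path)
{
    unsigned char cdb[INQUIRY_CDB_LEN];
    unsigned char reply[INQUIRY_REPLY_LEN];
    unsigned char sense[32];
    sg_io_hdr_t io;
    int fd, rc;

    fd = open(sg_path, O_RDWR);
    if (fd < 0) {
        perror(sg_path);
        return -1;
    }

    build_inquiry_cdb(cdb, sizeof(reply));
    memset(reply, 0, sizeof(reply));
    memset(&io, 0, sizeof(io));
    io.interface_id = 'S';              /* SG v3 interface */
    io.cmd_len = sizeof(cdb);
    io.cmdp = cdb;
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.dxfer_len = sizeof(reply);
    io.dxferp = reply;
    io.mx_sb_len = sizeof(sense);
    io.sbp = sense;
    io.timeout = 5000;                  /* milliseconds */

    rc = ioctl(fd, SG_IO, &io);
    close(fd);
    if (rc < 0) {
        perror("SG_IO");
        return -1;
    }
    if ((io.info & SG_INFO_OK_MASK) != SG_INFO_OK)
        return -1;

    /* Bytes 8..15 of standard INQUIRY data hold the vendor identification. */
    printf("%s: vendor \"%.8s\"\n", sg_path, (const char *)&reply[8]);
    return 0;
}
```

Usage would be something like rmmod sr_mod, followed by calling send_inquiry("/dev/sg1") from a small main(), with the node name adjusted to the system at hand.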
James

---

Index: BUILD-2.6/drivers/scsi/scsi_lib.c
===
--- BUILD-2.6.orig/drivers/scsi/scsi_lib.c	2008-01-01 10:13:33.0 -0600
+++ BUILD-2.6/drivers/scsi/scsi_lib.c	2008-01-02 10:17:51.0 -0600
@@ -1324,7 +1324,7 @@ int scsi_prep_return(struct request_queu
 }
 EXPORT_SYMBOL(scsi_prep_return);
 
-static int scsi_prep_fn(struct request_queue *q, struct request *req)
+int scsi_prep_fn(struct request_queue *q, struct request *req)
 {
 	struct scsi_device *sdev = q->queuedata;
 	int ret = BLKPREP_KILL;
Index: BUILD-2.6/drivers/scsi/scsi_priv.h
===
--- BUILD-2.6.orig/drivers/scsi/scsi_priv.h	2007-11-03 09:08:46.0 -0500
+++ BUILD-2.6/drivers/scsi/scsi_priv.h	2008-01-02 10:20:09.0 -0600
@@ -74,6 +74,9 @@ extern struct request_queue *scsi_alloc_
 extern void scsi_free_queue(struct request_queue *q);
 extern int scsi_init_queue(void);
 extern void scsi_exit_queue(void);
+struct request_queue;
+struct request;
+extern int scsi_prep_fn(struct request_queue *, struct request *);
 
 /* scsi_proc.c */
 #ifdef CONFIG_SCSI_PROC_FS
Index: BUILD-2.6/drivers/scsi/scsi_sysfs.c
===
--- BUILD-2.6.orig/drivers/scsi/scsi_sysfs.c	2007-11-03 10:08:02.0 -0500
+++ BUILD-2.6/drivers/scsi/scsi_sysfs.c	2008-01-02 10:31:33.0 -0600
@@ -373,12 +373,24 @@ static int scsi_bus_resume(struct device
 	return err;
 }
 
+static int
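The patch text is cut off at "+static int", so the body of the new bus callback is not visible here. What the visible part does is make scsi_prep_fn non-static and declare it in scsi_priv.h, presumably so that bus-level code can point a queue back at the midlayer default. The userspace model below contrasts the two approaches discussed in the thread: each ULD restoring the previous hook in its own release path, versus the bus layer resetting it whenever any ULD unbinds. Everything in it (the struct layouts, saved_prep_fn, uld_attach, uld_release_restore, bus_unbind_reset, dispatch) is hypothetical and only mimics the pointer handling; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins, NOT the kernel's real structures. */
struct request { unsigned char cmd; };
struct request_queue;
typedef int prep_fn_t(struct request_queue *q, struct request *req);

struct request_queue {
    prep_fn_t *prep_rq_fn;     /* consulted for every request dispatched */
    prep_fn_t *saved_prep_fn;  /* hypothetical: used only by approach A */
};

#define PREP_OK 0

/* Plays the role of the midlayer default, scsi_prep_fn(). */
static int midlayer_prep(struct request_queue *q, struct request *req)
{
    (void)q; (void)req;
    return PREP_OK;
}

/* Plays the role of a ULD hook such as sr's prep function. In the kernel
 * this code lives in the module and disappears on rmmod. */
static int uld_prep(struct request_queue *q, struct request *req)
{
    (void)q; (void)req;
    return PREP_OK;
}

/* The ULD installs its own prep function when it binds to the device. */
static void uld_attach(struct request_queue *q)
{
    q->saved_prep_fn = q->prep_rq_fn;
    q->prep_rq_fn = uld_prep;
}

/* Approach A (per-driver): each ULD restores the previous hook in its
 * own release path. Every ULD has to remember to do this. */
static void uld_release_restore(struct request_queue *q)
{
    q->prep_rq_fn = q->saved_prep_fn;
}

/* Approach B (bus callback): the bus layer resets the hook to the
 * midlayer default whenever any ULD unbinds, so no driver can forget. */
static void bus_unbind_reset(struct request_queue *q)
{
    q->prep_rq_fn = midlayer_prep;
}

/* Dispatch path: in the oops, the faulting call came from
 * elv_next_request(), which is where prep_rq_fn gets invoked. */
static int dispatch(struct request_queue *q, struct request *req)
{
    assert(q->prep_rq_fn != NULL);
    return q->prep_rq_fn(q, req);
}
```

Both approaches leave the queue pointing at midlayer_prep after unbinding; the design argument in the thread is that approach B needs to be written only once, in the bus code, instead of in every ULD.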
Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
> On Tue, 1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=9674
> > Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core
>
> Guys, this is a very recent regression. Could you please take a look, see if it's due to mmc, block or scsi changes?

There's not a lot of information to go on. The stack trace looks bogus, so I guess the kernel is compiled without a frame pointer. However, it does look like the initial insertion of sr_mod is going through and it generates a command which gets into scsi_request_fn and then indirects through a bogus queuecommand pointer.

What's the actual underlying device the cdrom is attached to?

There are no real changes to SCSI in this area since 2.6.24-rc4 ... however, the reinsertion is suggestive: it's as if the removal is retriggering a module request for some reason.

James