Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core

2008-01-02 Thread Rafael J. Wysocki
On Wednesday, 2 of January 2008, James Bottomley wrote:
 
 On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
  On Tue,  1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
  
   http://bugzilla.kernel.org/show_bug.cgi?id=9674
   
  Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, 
   ricoh_mmc,
   mmc_core
  
  Guys, this is a very recent regression.  Could you please take a look, see
  if it's due to mmc, block or scsi changes?
 
 There's not a lot of information to go on.  The stack trace looks bogus,
 so I guess the kernel is compiled without a frame pointer.

The bug report has been updated with a stack trace from a kernel compiled
with a frame pointer.  Please have a look.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core

2008-01-02 Thread James Bottomley

On Wed, 2008-01-02 at 13:21 +0100, Rafael J. Wysocki wrote:
 On Wednesday, 2 of January 2008, James Bottomley wrote:
  
  On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
   On Tue,  1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
   
    http://bugzilla.kernel.org/show_bug.cgi?id=9674
    
   Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc,
   mmc_core
   
   Guys, this is a very recent regression.  Could you please take a look, see
   if it's due to mmc, block or scsi changes?
  
  There's not a lot of information to go on.  The stack trace looks bogus,
  so I guess the kernel is compiled without a frame pointer.
 
 The bug report has been updated with a stack trace from a kernel compiled
 with a frame pointer.  Please have a look.

Please, please don't do this.  Filing something in bugzilla is
tantamount to putting it in the file and forget folder.  The reason I
cc'd the SCSI mailing list and asked for more details is so that we get
the email flow that might trigger direct interaction between the
reporter and someone on the list who recognised the symptoms.  

Let me say again, categorically, that if you want to give a bug the best
chance of being fixed, the correct flow of information is:

File a bugzilla and note the bugid.
Then email a complete report to the relevant list, but add [BUG bugid]
to the subject line and cc [EMAIL PROTECTED]  If you do
this, bugzilla will keep track of the entire discussion as it progresses
and allow those who track bugs through bugzilla to get a pretty accurate
idea of the status.  You should never need to touch bugzilla again once
the initial bug report is filed: all future information flow is via the
mailing lists.

Also, using URLs for anything other than historical reference is a killer.  Many
people travel, and their MO is to download the email and read it on the
plane/train/whatever.  If you embed a url containing critical
information, the email gets marked as read, but since I can't get to the
information, nothing happens.  Then it gets forgotten.

This is the relevant piece of information that should have been on the
mailing list:


[  101.359083] Unable to handle kernel paging request at 88021cc0 RIP:
[  101.359092]  [88021cc0]
[  101.359099] PGD 203067 PUD 207063 PMD 3d34a067 PTE 0
[  101.359108] Oops: 0010 [1] PREEMPT SMP
[  101.359115] CPU 0
[  101.359118] Modules linked in: sr_mod tcp_westwood ipt_REJECT xt_state
iptable_filter ipt_owner ipt_MASQUERADE xt_tcpudp xt_multiport iptable_nat
nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables iwl3945 ricoh_mmc
cdrom
[  101.359150] Pid: 4496, comm: modprobe Not tainted 2.6.24-rc6-git7 #5
[  101.359154] RIP: 0010:[88021cc0]  [88021cc0]
[  101.359159] RSP: 0018:81002b457970  EFLAGS: 00010086
[  101.359163] RAX: 88021cc0 RBX: 81003f1627e0 RCX:
81003f023b38
[  101.359167] RDX:  RSI: 810030efd000 RDI:
81003f1627e0
[  101.359171] RBP: 81002b4579b8 R08: 0001 R09:
0001
[  101.359175] R10:  R11:  R12:
810030efd000
[  101.359179] R13: 81002b457988 R14: 0010 R15:
00010010
[  101.359185] FS:  2adcf7ea0b00() GS:80733000()
knlGS:
[  101.359189] CS:  0010 DS:  ES:  CR0: 8005003b
[  101.359193] CR2: 88021cc0 CR3: 2b497000 CR4:
06e0
[  101.359197] DR0:  DR1:  DR2:

[  101.359201] DR3:  DR6: 0ff0 DR7:
0400
[  101.359206] Process modprobe (pid: 4496, threadinfo 81002b456000, task
81003de1ef50)
[  101.359210] Stack:  80333a98 0086 81003f023af8
810030efd000
[  101.359221]  81003f1627e0 81003f023800 81003e31c000
81003f1627e0
[  101.359230]   81002b457a08 803fe7e1
81003f023b38
[  101.359237] Call Trace:
[  101.359248]  [80333a98] elv_next_request+0xe8/0x180
[  101.359256]  [803fe7e1] scsi_request_fn+0x71/0x380
[  101.359264]  [803375b8] __generic_unplug_device+0x28/0x30
[  101.359270]  [80337623] blk_execute_rq_nowait+0x63/0xb0
[  101.359276]  [80339113] blk_execute_rq+0x73/0xe0
[  101.359283]  [80337775] get_request_wait+0x25/0x120
[  101.359288]  [80337896] blk_get_request+0x26/0x80
[  101.359296]  [803fe5b2] scsi_execute+0xe2/0x110
[  101.359301]  [803fe661] scsi_execute_req+0x81/0xf0
[  101.359312]  [8800d713] :sr_mod:sr_probe+0x1e3/0x630
[  101.359323]  [803c8d01] driver_probe_device+0xa1/0x1c0
[  101.359329]  [803c8ff5] __driver_attach+0xe5/0xf0
[  101.359334]  [803c8f10] __driver_attach+0x0/0xf0
[  101.359342]  [803c7ee3] bus_for_each_dev+0x53/0x80
[  101.359348]  [803c8b3c] driver_attach+0x1c/0x20
[  101.359353]  [803c8305] 

Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core

2008-01-02 Thread James Bottomley

On Wed, 2008-01-02 at 10:49 -0500, Pete Wyckoff wrote:
 [EMAIL PROTECTED] wrote on Tue, 01 Jan 2008 21:24 -0600:
  
  On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
   On Tue,  1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
   
    http://bugzilla.kernel.org/show_bug.cgi?id=9674
    
   Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc,
   mmc_core
   
   Guys, this is a very recent regression.  Could you please take a look, see
   if it's due to mmc, block or scsi changes?
  
  There's not a lot of information to go on.  The stack trace looks bogus,
  so I guess the kernel is compiled without a frame pointer.  However, it
  does look like the initial insertion of sr_mod is going through and it
  generates a command which gets into scsi_request_fn and then indirects
  through a bogus queuecommand pointer.
 
 Bogus prep_rq_fn actually.
 
  What's the actual underlying device the cdrom is attached to?
  
  There's no real changes to SCSI in this area from 2.6.24-rc4 ...
  however, the reinsertion is suggestive, it's like the removal is
  retriggering a module request for some reason.
 
 Here's a guess.  When sr_mod is removed, it looks like the request
 queue prep_rq_fn is still pointing to the now nonexistent
 sr_prep_fn.  This may have been due to a commit that went in early
 2.6.24:
 
 commit 7f9a6bc4e9d59e7fcf03ed23f60cd81ca5d80b65
 Author: James Bottomley [EMAIL PROTECTED]
 Date:   Sat Aug 4 10:06:25 2007 -0500
 
 [SCSI] move ULD attachment into the prep function
 
 One of the intents of the block prep function was to allow ULDs to use
 it for preprocessing.  The original SCSI model was to have a single prep
 function and add a pointer indirect filter to build the necessary
 commands.  This patch reverses that, does away with the init_command
 field of the scsi_driver structure and makes ULDs attach directly to the
 prep function instead.  The value is really that it allows us to begin
 to separate the ULDs from the SCSI mid layer (as long as they don't use
 any core functions---which is hard at the moment---a ULD doesn't even
 need SCSI to bind).
 
 Acked-by: Jens Axboe [EMAIL PROTECTED]
 Signed-off-by: James Bottomley [EMAIL PROTECTED]
 
 When the module is re-inserted, it does a few SCSI commands before
 setting up the new prep_rq_fn, presumably hitting this bogus
 pointer.
 
 One fix would be to have sr remember the original prep function and
 restore it in sr_kref_release.  Sd and a few other drivers have this
 issue.  Ide-cd bothers to set prep_rq_fn to NULL as it releases
 the device.

Bingo .. that's why we ask the list, thanks Pete!

I don't think the fix is the correct one, but I've attached what I think
is the correct fix (basically, there's a bus callback we can use to
ensure the right thing always gets done rather than relying on drivers
doing it in their own release methods, that way they can't forget).

I suspect the reason it was showing up in -rc4 is that something
structurally altered the module stack and changed the address at which
sr_mod was loaded, so the prep_fn, as Pete said, was pointing into bogus
address space.
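
To make the idea concrete, here is a minimal userspace sketch, not kernel
code. It assumes the bus callback simply runs the driver's remove hook and
then resets the queue's prep function to the mid-layer default; the patch as
archived here is cut off before that function, so this is an assumption.
struct queue, default_prep, uld_prep, uld_probe and bus_remove are
hypothetical stand-ins for struct request_queue, scsi_prep_fn, sr_prep_fn,
sr_probe and the bus remove callback.

```c
/* Userspace model only: every name here is a hypothetical stand-in,
 * not the kernel's struct request_queue or the real SCSI bus code. */
#include <stddef.h>

struct request;                         /* opaque in this sketch */
struct queue;
typedef int (*prep_fn)(struct queue *q, struct request *req);

struct queue {
	prep_fn prep;                   /* models q->prep_rq_fn */
};

/* Stand-in for the mid-layer default (scsi_prep_fn in the thread). */
int default_prep(struct queue *q, struct request *req)
{
	(void)q;
	(void)req;
	return 0;
}

/* Stand-in for a ULD prep function such as sr_prep_fn. */
int uld_prep(struct queue *q, struct request *req)
{
	(void)q;
	(void)req;
	return 1;
}

/* ULD probe: the driver attaches its own prep function to the queue. */
void uld_probe(struct queue *q)
{
	q->prep = uld_prep;
}

/* Bus-level remove: run the driver's own remove hook if it has one,
 * then restore the default unconditionally, so the queue never keeps
 * a pointer into a module that is about to be unloaded. */
void bus_remove(struct queue *q, void (*driver_remove)(struct queue *))
{
	if (driver_remove)
		driver_remove(q);
	q->prep = default_prep;
}
```

Because the reset lives in the bus-level remove path, a driver that forgets
to undo its own prep function still leaves the queue in a safe state.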

The way to trigger this bug 100% of the time is to rmmod sr_mod and then
send an inquiry (or another command) to the device using the sg node.
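
For reference, sending an INQUIRY through an sg node can be sketched with
the SG_IO ioctl. The function name send_inquiry, the 5 second timeout and
the example node /dev/sg0 are assumptions for illustration; on an affected
kernel this is expected to oops once sr_mod has been removed, so only try it
on a test machine.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/* Send a standard 6-byte INQUIRY through an sg node (e.g. /dev/sg0).
 * Returns 0 if the ioctl was accepted, -1 if the node cannot be opened
 * or the ioctl fails. The SCSI status itself is not checked here. */
int send_inquiry(const char *sg_path, unsigned char *resp,
		 unsigned char resp_len)
{
	/* Opcode 0x12 is INQUIRY; byte 4 is the allocation length. */
	unsigned char cdb[6] = { 0x12, 0, 0, 0, resp_len, 0 };
	unsigned char sense[32];
	sg_io_hdr_t hdr;
	int fd, rc;

	fd = open(sg_path, O_RDWR);
	if (fd < 0)
		return -1;

	memset(&hdr, 0, sizeof(hdr));
	hdr.interface_id = 'S';                 /* required by the sg driver */
	hdr.cmdp = cdb;
	hdr.cmd_len = sizeof(cdb);
	hdr.dxfer_direction = SG_DXFER_FROM_DEV;
	hdr.dxferp = resp;
	hdr.dxfer_len = resp_len;
	hdr.sbp = sense;
	hdr.mx_sb_len = sizeof(sense);
	hdr.timeout = 5000;                     /* milliseconds */

	rc = ioctl(fd, SG_IO, &hdr);
	close(fd);
	return rc < 0 ? -1 : 0;
}
```

A caller would pass a 36-byte buffer for the standard INQUIRY data, for
example send_inquiry("/dev/sg0", buf, sizeof(buf)).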

James

---

Index: BUILD-2.6/drivers/scsi/scsi_lib.c
===
--- BUILD-2.6.orig/drivers/scsi/scsi_lib.c  2008-01-01 10:13:33.0 -0600
+++ BUILD-2.6/drivers/scsi/scsi_lib.c   2008-01-02 10:17:51.0 -0600
@@ -1324,7 +1324,7 @@ int scsi_prep_return(struct request_queu
 }
 EXPORT_SYMBOL(scsi_prep_return);
 
-static int scsi_prep_fn(struct request_queue *q, struct request *req)
+int scsi_prep_fn(struct request_queue *q, struct request *req)
 {
 	struct scsi_device *sdev = q->queuedata;
 	int ret = BLKPREP_KILL;
Index: BUILD-2.6/drivers/scsi/scsi_priv.h
===
--- BUILD-2.6.orig/drivers/scsi/scsi_priv.h 2007-11-03 09:08:46.0 -0500
+++ BUILD-2.6/drivers/scsi/scsi_priv.h  2008-01-02 10:20:09.0 -0600
@@ -74,6 +74,9 @@ extern struct request_queue *scsi_alloc_
 extern void scsi_free_queue(struct request_queue *q);
 extern int scsi_init_queue(void);
 extern void scsi_exit_queue(void);
+struct request_queue;
+struct request;
+extern int scsi_prep_fn(struct request_queue *, struct request *);
 
 /* scsi_proc.c */
 #ifdef CONFIG_SCSI_PROC_FS
Index: BUILD-2.6/drivers/scsi/scsi_sysfs.c
===
--- BUILD-2.6.orig/drivers/scsi/scsi_sysfs.c 2007-11-03 10:08:02.0 -0500
+++ BUILD-2.6/drivers/scsi/scsi_sysfs.c 2008-01-02 10:31:33.0 -0600
@@ -373,12 +373,24 @@ static int scsi_bus_resume(struct device
 	return err;
 }
 
+static int 

Re: [Bugme-new] [Bug 9674] New: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc, mmc_core

2008-01-01 Thread James Bottomley

On Tue, 2008-01-01 at 18:10 -0800, Andrew Morton wrote:
 On Tue,  1 Jan 2008 14:55:45 -0800 (PST) [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=9674
  
 Summary: Oops during rmmod'ing modeuls sdhci, sr_mod, ricoh_mmc,
  mmc_core
 
 Guys, this is a very recent regression.  Could you please take a look, see
 if it's due to mmc, block or scsi changes?

There's not a lot of information to go on.  The stack trace looks bogus,
so I guess the kernel is compiled without a frame pointer.  However, it
does look like the initial insertion of sr_mod is going through and it
generates a command which gets into scsi_request_fn and then indirects
through a bogus queuecommand pointer.

What's the actual underlying device the cdrom is attached to?

There's no real changes to SCSI in this area from 2.6.24-rc4 ...
however, the reinsertion is suggestive, it's like the removal is
retriggering a module request for some reason.

James

