> -----Original Message-----
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of bugzilla-dae...@bugzilla.kernel.org
> Sent: Tuesday, 23 September, 2014 4:56 PM
> To: linux-scsi@vger.kernel.org
> Subject: [Bug 81861] Oops by mvsas v0.8.16: sas: ataX: end_device-Y:0:Z: dev
> error handler -> general protection fault, RIP: mvs_task_prep_ata+0x80/0x3a0
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=81861
> 
> --- Comment #16 from linux-...@crashplan.pro ---
> When line-by-line dumping the called constants/vars from:
> 469        del_q = TXQ_MODE_I | tag |
> 470            (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
> 471            (MVS_PHY_ID << TXQ_PHY_SHIFT) |
> 472            (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
> 
> using the prepended statements:
>         printk("slot=%p ", slot);
>         printk(KERN_INFO "TXQ_MODE_I=%d ", TXQ_MODE_I);
>         printk(KERN_INFO "tag=%d ", tag);
>         printk(KERN_INFO "TXQ_CMD_STP=%d ", TXQ_CMD_STP);
>         printk(KERN_INFO "TXQ_CMD_SHIFT=%d ", TXQ_CMD_SHIFT);
>         printk(KERN_INFO "MVS_PHY_ID=%d ", MVS_PHY_ID);
>         printk(KERN_INFO "TXQ_PHY_SHIFT=%d ", TXQ_PHY_SHIFT);
>         del_q = TXQ_MODE_I | tag |
>                 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
>                 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
>                 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
> 
> the kernel crash occurs after printing "TXQ_CMD_SHIFT" or when trying to
> output
> the value of "MVS_PHY_ID":
> [  529.113152] sas: DONE DISCOVERY on port 0, pid:133, result:0
> [  529.114313] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> [  529.115460] sas: ata7: end_device-6:0:28: dev error handler
> [  529.115522] sas: ata8: end_device-6:0:29: dev error handler
> [  529.118706] sas: ata9: end_device-6:0:30: dev error handler
> [  529.119840] sas: ata10: end_device-6:0:31: dev error handler
> [  529.271634] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36836a0 tag=0
> slot=ffff8800d36a55b8
> [  529.271753] TXQ_MODE_I=268435456 tag=0
> [  529.272679] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
> [  529.273618] MVS_PHY_ID=32768 TXQ_PHY_SHIFT=12 tx_prod=44]
> [  529.276091] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1
> slot=ffff8800d36a5610
> [  529.276207] TXQ_MODE_I=268435456 tag=1
> [  529.277095] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
> [  529.278038] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=46]
> [  529.280271] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1
> slot=ffff8800d36a5610
> [  529.280385] TXQ_MODE_I=268435456 tag=1
> [  529.281445] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
> [  529.282562] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=48]
> [  529.284894] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36837b0 tag=2
> slot=ffff8800d36a5668
> [  529.285010] TXQ_MODE_I=268435456 tag=2
> [  529.286248] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
> [  529.287555] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000257
> [  529.290225] IP: [<ffffffffa02888bb>] mvs_task_prep+0x7cb/0xe50 [mvsas]
> [  529.291686] PGD 0
> [  529.293141] Oops: 0000 [#1] SMP
> [  529.294630] Modules linked in: mvsas(OF) libsas scsi_transport_sas
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel cryptd serio_raw lpc_ich i915 mei_me mei
> drm_kms_helper video netconsole drm configfs mac_hid i2c_algo_bit psmouse
> r8169
> ahci mii libahci
> 
> Any suggestions why accessing "MVS_PHY_ID" leads to the kernel NULL pointer
> dereference oops?

1. Although MVS_PHY_ID looks like a constant, it's really not. It is a
macro that dereferences the local variable sas_phy each time it is used:
#define MVS_PHY_ID (1U << sas_phy->id)

2. This fault:
[   32.271218] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000255
(the address is printed in hex, so this is 0x255, not decimal 255,
which would be 0xff)

at this line:
   0xffffffffa01c481e <+1838>:  mov    0x254(%rbx),%ecx

implies that rbx contains 1, so 0x254 + 1 = 0x255.

3. pahole drivers/scsi/mvsas/mv_sas.o
shows there are two structures with fields at offset 596 (0x254):
* asd_sas_phy.id
* asd_sas_port.sas_addr[8]

4. objdump -drS drivers/scsi/mvsas/mv_sas.o
shows only a few lines with 0x254(%something), one of which
is the del_q line you've identified:

mvs_task_prep_ata(struct mvs_info *mvi, struct mvs_task_exec_info *tei):
        struct sas_ha_struct *sha = mvi->sas;
        struct sas_task *task = tei->task;
        struct domain_device *dev = task->dev;
        struct sas_phy *sphy = dev->phy;
        struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];

        ...
        del_q = TXQ_MODE_I | tag |
                (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
                (MVS_PHY_ID << TXQ_PHY_SHIFT) |
                (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
        mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

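Expanding the macro and substituting each local variable in turn:
MVS_PHY_ID = 1U << sas_phy->id, where
sas_phy->id =
sha->sas_phy[sphy->number]->id =
mvi->sas->sas_phy[dev->phy->number]->id =
mvi->sas->sas_phy[task->dev->phy->number]->id =
mvi->sas->sas_phy[tei->task->dev->phy->number]->id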

Looking at the offsets reported by pahole (in decimal), that means:
%rdi->56->344[%rsi->0->0->56->688]->596
where the final 596 is the 0x254 seen in the faulting instruction.

mvi->sas->sas_phy is a pointer to a pointer:
struct sas_ha_struct {
...
        struct asd_sas_phy * *     sas_phy;              /*   344     8 */
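
The address arithmetic behind points 2 and 3 can be checked in user
space. This is only a sketch: mock_sas_phy, id_fault_address() and
implied_pointer() are hypothetical names, and the only thing modelled
is the offset of "id" (596 = 0x254) that pahole reported. Nothing is
dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct asd_sas_phy. Only the position of
 * `id` (offset 596 == 0x254, per the pahole output) is modelled. */
struct mock_sas_phy {
    unsigned char pad[0x254];
    int id;
};

static_assert(offsetof(struct mock_sas_phy, id) == 0x254,
              "id must sit at offset 0x254 (596 decimal)");

/* Address the CPU would touch when reading ->id through the pointer
 * value `ptr`. Pure arithmetic; no memory is accessed. */
static inline uintptr_t id_fault_address(uintptr_t ptr)
{
    return ptr + offsetof(struct mock_sas_phy, id);
}

/* Inverse: the pointer value implied by a reported fault address,
 * assuming the faulting access was a read of ->id. */
static inline uintptr_t implied_pointer(uintptr_t fault_addr)
{
    return fault_addr - offsetof(struct mock_sas_phy, id);
}
```

With these definitions, id_fault_address(1) evaluates to 0x255, which
matches the fault above. If the 0x257 in the quoted trace comes from
the same field access (an assumption), implied_pointer(0x257) gives 3.
Either way the "pointer" looks like a small integer rather than an
address.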

You might look for somewhere that could accidentally
be setting sas_phy[something] to a for loop index,
with a typecast hiding the problem from the compiler.
Or, the phy->number value being passed might be
out of range; if there were discovery errors, something
might not have been initialized like this function expects.
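
While hunting for the real cause, a defensive lookup along these lines
could at least turn the oops into a detectable failure. This is a
user-space sketch, not the mvsas fix: mock_phy, mock_ha,
mock_lookup_phy() and MOCK_LOWEST_VALID_ADDR are hypothetical names,
the num_phys count is assumed to be available, and treating anything
below 4096 as invalid is an assumption about the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the libsas structures. */
struct mock_phy {
    int id;
};

struct mock_ha {
    struct mock_phy **sas_phy; /* array of pointers, like sas_ha_struct.sas_phy */
    int num_phys;              /* assumed element count of that array */
};

/* Assumption: nothing valid lives in the first page, so a "pointer"
 * below this value is NULL or a stray small integer. */
#define MOCK_LOWEST_VALID_ADDR ((uintptr_t)4096)

/* Return the phy for `number`, or NULL if the index is out of range or
 * the stored entry cannot be a real pointer. */
static inline struct mock_phy *mock_lookup_phy(const struct mock_ha *ha,
                                               int number)
{
    struct mock_phy *phy;

    assert(ha != NULL); /* no host adapter at all is a caller bug */

    if (ha->sas_phy == NULL)
        return NULL;
    if (number < 0 || number >= ha->num_phys)
        return NULL; /* index out of range */

    phy = ha->sas_phy[number];
    if ((uintptr_t)phy < MOCK_LOWEST_VALID_ADDR)
        return NULL; /* NULL, or an integer stored where a pointer belongs */

    return phy;
}
```

A caller would check the result for NULL and fail the command before
evaluating anything like MVS_PHY_ID. Logging the rejected index and
entry value at that point would also show which of the two theories
above applies.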


---
Rob Elliott    HP Server Storage




