Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-05 Thread Kai Liu

On 2020/06/05 Fri 21:00, Chandrakanth Patil wrote:

Hi Kai Liu,

Tomcat (Device ID: 0017) belongs to the Gen3.5 controllers (Ventura family),
so this issue applies to it.
As this is OEM-specific firmware, please contact the Broadcom support team
to get the correct firmware image.


Thanks for your help, Chandrakanth.

Best regards,
Kai


RE: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-05 Thread Chandrakanth Patil
>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused
>by JBOD
>
>On 2020/06/05 Fri 01:05, Chandrakanth Patil wrote:
>>
>>Hi Kai Liu,
>>
>>Gen3 (Invader) and Gen3.5 (Ventura/Aero) generations of controllers are
>>affected.
>
>Hi Chandrakanth,
>
>My card is not one of these but it's also problematic:
>
># lspci -nn|grep 3408
>02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID Tri-Mode SAS3408 [1000:0017] (rev 01)
>
>According to megaraid_sas.h it's Tomcat:
>
>#define PCI_DEVICE_ID_LSI_TOMCAT 0x0017
>
>According to the product information on broadcom.com, the card model is
>9440-8i, so I tried to upgrade to the latest firmware version, 51.13.0-3223,
>but I got this error:
>
># ./storcli64 /c0 download file=9440-8i_nopad.rom
>Download Completed.
>Flashing image to adapter...
>CLI Version = 007.1316.. Mar 12, 2020
>Operating system = Linux 5.3.18-0.g6748ac9-default
>Controller = 0
>Status = Failure
>Description = image corrupted
>
>I tried a few more versions from the Broadcom website; they all failed with
>the same "image corrupted" error.
>
>Here is the controller information:
>
># ./storcli64 /c0 show
>Generating detailed summary of the adapter, it may take a while to complete.
>
>CLI Version = 007.1316.. Mar 12, 2020
>Operating system = Linux 5.3.18-0.g6748ac9-default
>Controller = 0
>Status = Success
>Description = None
>
>Product Name = SAS3408
>Serial Number = 033FAT10K8000236
>SAS Address =  57c1cf15516f4000
>PCI Address = 00:02:00:00
>System Time = 06/05/2020 12:36:59
>Mfg. Date = 00/00/00
>Controller Time = 06/05/2020 04:36:58
>FW Package Build = 50.6.3-0109
>BIOS Version = 7.06.02.2_0x07060502
>FW Version = 5.060.01-2262
>Driver Name = megaraid_sas
>Driver Version = 07.713.01.00-rc1
>Vendor Id = 0x1000
>Device Id = 0x17
>SubVendor Id = 0x19E5
>SubDevice Id = 0xD213
>Host Interface = PCI-E
>Device Interface = SAS-12G
>Bus Number = 2
>Device Number = 0
>Function Number = 0
>Domain ID = 0
>Drive Groups = 3
>
>
>Thanks,
>Kai Liu

Hi Kai Liu,

Tomcat (Device ID: 0017) belongs to the Gen3.5 controllers (Ventura family),
so this issue applies to it.
As this is OEM-specific firmware, please contact the Broadcom support team
to get the correct firmware image.

-Chandrakanth Patil


Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-04 Thread Kai Liu

On 2020/06/05 Fri 01:05, Chandrakanth Patil wrote:


Hi Kai Liu,

Gen3 (Invader) and Gen3.5 (Ventura/Aero) generations of controllers are
affected.


Hi Chandrakanth,

My card is not one of these but it's also problematic:

# lspci -nn|grep 3408
02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID Tri-Mode SAS3408 [1000:0017] (rev 01)

According to megaraid_sas.h it's Tomcat:

#define PCI_DEVICE_ID_LSI_TOMCAT 0x0017

According to the product information on broadcom.com, the card model is
9440-8i, so I tried to upgrade to the latest firmware version,
51.13.0-3223, but I got this error:


# ./storcli64 /c0 download file=9440-8i_nopad.rom
Download Completed.
Flashing image to adapter...
CLI Version = 007.1316.. Mar 12, 2020
Operating system = Linux 5.3.18-0.g6748ac9-default
Controller = 0
Status = Failure
Description = image corrupted

I tried a few more versions from the Broadcom website; they all failed with
the same "image corrupted" error.


Here is the controller information:

# ./storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.1316.. Mar 12, 2020
Operating system = Linux 5.3.18-0.g6748ac9-default
Controller = 0
Status = Success
Description = None

Product Name = SAS3408
Serial Number = 033FAT10K8000236
SAS Address =  57c1cf15516f4000
PCI Address = 00:02:00:00
System Time = 06/05/2020 12:36:59
Mfg. Date = 00/00/00
Controller Time = 06/05/2020 04:36:58
FW Package Build = 50.6.3-0109
BIOS Version = 7.06.02.2_0x07060502
FW Version = 5.060.01-2262
Driver Name = megaraid_sas
Driver Version = 07.713.01.00-rc1
Vendor Id = 0x1000
Device Id = 0x17
SubVendor Id = 0x19E5
SubDevice Id = 0xD213
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 2
Device Number = 0
Function Number = 0
Domain ID = 0
Drive Groups = 3


Thanks,
Kai Liu


RE: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-04 Thread Chandrakanth Patil
>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused
>by JBOD
>
>On 2020/06/04 Thu 16:39, Chandrakanth Patil wrote:
>>
>>Hi Martin, Xiaoming Gao, Kai Liu,
>>
>>It is a known firmware issue and has been fixed. Please update to the
>>latest firmware available on the Broadcom support website.
>>Please let me know if you need any further information.
>
>Hi Chandrakanth,
>
>Could you let me know which MegaRAID-based controllers are affected by this
>issue? All of them, or only some models or generations?
>
>Best regards,
>Kai Liu

Hi Kai Liu,

Gen3 (Invader) and Gen3.5 (Ventura/Aero) generations of controllers are
affected.

Thanks,
Chandrakanth Patil


Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-04 Thread Kai Liu

On 2020/06/04 Thu 16:39, Chandrakanth Patil wrote:


Hi Martin, Xiaoming Gao, Kai Liu,

It is a known firmware issue and has been fixed. Please update to the
latest firmware available on the Broadcom support website.
Please let me know if you need any further information.


Hi Chandrakanth,

Could you let me know which MegaRAID-based controllers are affected by
this issue? All of them, or only some models or generations?


Best regards,
Kai Liu


RE: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-04 Thread Chandrakanth Patil
>Subject: RE: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung
>caused by JBOD
>
>>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung
>>caused by JBOD
>>
>>
>>> When the kernel crashes and kexecs into the kdump kernel, megaraid_sas
>>> hangs and prints the following error logs:
>>>
>>> [ 24.148590] sd 0:0:6:0: [sda] tag#809 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
>>> [ 24.186717] sd 0:0:6:0: [sda] tag#861 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
>>> [ 24.205419] sd 0:0:6:0: [sda] tag#861 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>>> [ 24.254971] blk_update_request: I/O error, dev sda, sector 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
>>> [ 21.275279] buffer_io_error: 2 callbacks suppressed
>>> [ 21.275273] Buffer I/O error on dev sda, logical block 117212064, async page read
>>>
>>> This bug is caused by commit 59db5a931bbe73f ("scsi: megaraid_sas:
>>> Handle sequence JBOD map failure at driver level") and can be fixed by
>>> not taking the JBOD sequence-number path when reset_devices is set.
>>
>>Broadcom: Please review.
>>
>>Thanks!
>>
>>--
>>Martin K. Petersen  Oracle Linux Engineering
>
>We are working on it and will update you at the earliest.
>
>Thanks,
>Chandrakanth Patil

Hi Martin, Xiaoming Gao, Kai Liu,

It is a known firmware issue and has been fixed. Please update to the
latest firmware available on the Broadcom support website.
Please let me know if you need any further information.

Thanks,
Chandrakanth Patil


RE: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-03 Thread Chandrakanth Patil
>Subject: Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung
>caused by JBOD
>
>
>> When the kernel crashes and kexecs into the kdump kernel, megaraid_sas
>> hangs and prints the following error logs:
>>
>> [ 24.148590] sd 0:0:6:0: [sda] tag#809 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
>> [ 24.186717] sd 0:0:6:0: [sda] tag#861 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
>> [ 24.205419] sd 0:0:6:0: [sda] tag#861 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> [ 24.254971] blk_update_request: I/O error, dev sda, sector 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
>> [ 21.275279] buffer_io_error: 2 callbacks suppressed
>> [ 21.275273] Buffer I/O error on dev sda, logical block 117212064, async page read
>>
>> This bug is caused by commit 59db5a931bbe73f ("scsi: megaraid_sas:
>> Handle sequence JBOD map failure at driver level") and can be fixed by
>> not taking the JBOD sequence-number path when reset_devices is set.
>
>Broadcom: Please review.
>
>Thanks!
>
>--
>Martin K. Petersen Oracle Linux Engineering

We are working on it and will update you at the earliest.

Thanks,
Chandrakanth Patil


Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-02 Thread Martin K. Petersen


> When the kernel crashes and kexecs into the kdump kernel, megaraid_sas
> hangs and prints the following error logs:
>
> [ 24.148590] sd 0:0:6:0: [sda] tag#809 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
> [ 24.186717] sd 0:0:6:0: [sda] tag#861 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
> [ 24.205419] sd 0:0:6:0: [sda] tag#861 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 24.254971] blk_update_request: I/O error, dev sda, sector 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
> [ 21.275279] buffer_io_error: 2 callbacks suppressed
> [ 21.275273] Buffer I/O error on dev sda, logical block 117212064, async page read
>
> This bug is caused by commit 59db5a931bbe73f ("scsi: megaraid_sas:
> Handle sequence JBOD map failure at driver level") and can be fixed by
> not taking the JBOD sequence-number path when reset_devices is set.

Broadcom: Please review.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-02 Thread Kai Liu

On 2020/05/28 Thu 15:31, xiakaixu1...@gmail.com wrote:

From: Xiaoming Gao 

When the kernel crashes and kexecs into the kdump kernel, megaraid_sas hangs
and prints the following error logs:

[ 24.148590] sd 0:0:6:0: [sda] tag#809 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[ 24.186717] sd 0:0:6:0: [sda] tag#861 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[ 24.205419] sd 0:0:6:0: [sda] tag#861 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 24.254971] blk_update_request: I/O error, dev sda, sector 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
[ 21.275279] buffer_io_error: 2 callbacks suppressed
[ 21.275273] Buffer I/O error on dev sda, logical block 117212064, async page read

This bug is caused by commit 59db5a931bbe73f ("scsi: megaraid_sas: Handle
sequence JBOD map failure at driver level") and can be fixed by not taking
the JBOD sequence-number path when reset_devices is set.


I've recently run into this exact issue on an arm64 machine with an Avago
3408 controller. This patch fixed the issue. Thank you.


Tested-by: Kai Liu 

Best regards,
Kai



Signed-off-by: Xiaoming Gao 
---
drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index b2ad965..24e7f1b 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -3127,7 +3127,7 @@ static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance,
 			<< MR_RAID_CTX_RAID_FLAGS_IO_SUB_TYPE_SHIFT;
 
 	/* If FW supports PD sequence number */
-	if (instance->support_seqnum_jbod_fp) {
+	if (!reset_devices && instance->support_seqnum_jbod_fp) {
 		if (instance->use_seqnum_jbod_fp &&
 			instance->pd_list[pd_index].driveType == TYPE_DISK) {

--
1.8.3.1





[PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-05-28 Thread xiakaixu1987
From: Xiaoming Gao 

When the kernel crashes and kexecs into the kdump kernel, megaraid_sas hangs
and prints the following error logs:

[ 24.148590] sd 0:0:6:0: [sda] tag#809 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[ 24.186717] sd 0:0:6:0: [sda] tag#861 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[ 24.205419] sd 0:0:6:0: [sda] tag#861 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 24.254971] blk_update_request: I/O error, dev sda, sector 937782912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class
[ 21.275279] buffer_io_error: 2 callbacks suppressed
[ 21.275273] Buffer I/O error on dev sda, logical block 117212064, async page read

This bug is caused by commit 59db5a931bbe73f ("scsi: megaraid_sas: Handle
sequence JBOD map failure at driver level") and can be fixed by not taking
the JBOD sequence-number path when reset_devices is set.

Signed-off-by: Xiaoming Gao 
---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index b2ad965..24e7f1b 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -3127,7 +3127,7 @@ static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance,
 			<< MR_RAID_CTX_RAID_FLAGS_IO_SUB_TYPE_SHIFT;
 
 	/* If FW supports PD sequence number */
-	if (instance->support_seqnum_jbod_fp) {
+	if (!reset_devices && instance->support_seqnum_jbod_fp) {
 		if (instance->use_seqnum_jbod_fp &&
 			instance->pd_list[pd_index].driveType == TYPE_DISK) {
 
-- 
1.8.3.1