Re: [PATCH v3 0/7] Enhance libsas hotplug feature
Hi,

I'm sorry to say that I have to stop the libsas hotplug improvement work. I am resigning from Huawei, so I will have neither the time nor the hardware to continue working on this issue. John is very familiar with this work and has provided a lot of good suggestions, so if John is willing, I would be glad if he could join in working on these issues. My colleague Jason Yan can also help.

Thanks!
Yijing.

On 2017/7/10 15:06, Yijing Wang wrote:
> This patchset is based on Johannes's patch
> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>
> libsas hotplug currently has some issues. Dan Williams reported
> a similar bug here before:
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>
> The issues we have found:
> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>    may be lost because the same sas event is already pending, so the
>    libsas topology may end up different from the hardware.
> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>    remove devices. It first deletes the sas port, then puts a destruction
>    discovery event in a new work and queues it at the tail of the
>    workqueue. Once the sas port is deleted, its child devices are deleted
>    too, so when the destruction work starts it finds that the target
>    device has already been removed and reports a sysfs warning.
> 3. Since a hotplug process is divided into several works, a phy-up sas
>    event may be inserted between the phy-down works, like
>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
>    which is not what we expect, and issues occur.
>
> The first patch fixes the loss of sas events, and the second one
> introduces wait-complete to fix the hotplug ordering issues.
>
> v2->v3: some code improvements suggested by Johannes and John,
>         split v2 patch 2 into several small patches.
> v1->v2: some code improvements suggested by John Garry
>
> Yijing Wang (7):
>   libsas: Use static sas event pool to appease sas event lost
>   libsas: remove unused port_gone_completion
>   libsas: Use new workqueue to run sas event
>   libsas: add sas event wait-complete support
>   libsas: add a new workqueue to run probe/destruct discovery event
>   libsas: add wait-complete support to sync discovery event
>   libsas: release disco mutex during waiting in sas_ex_discover_end_dev
>
>  drivers/scsi/libsas/sas_discover.c | 58 +++---
>  drivers/scsi/libsas/sas_event.c    | 212 -
>  drivers/scsi/libsas/sas_expander.c | 22 +++-
>  drivers/scsi/libsas/sas_init.c     | 21 ++--
>  drivers/scsi/libsas/sas_internal.h | 64 +++
>  drivers/scsi/libsas/sas_phy.c      | 48 +++--
>  drivers/scsi/libsas/sas_port.c     | 22 ++--
>  include/scsi/libsas.h              | 27 +++--
>  8 files changed, 373 insertions(+), 101 deletions(-)
>
Re: [PATCH v3 0/7] Enhance libsas hotplug feature
On 2017/7/13 16:08, John Garry wrote:
> On 13/07/2017 02:37, wangyijing wrote:
>>> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.
>> Oh, did I make a mistake? So the hotplug test with this patchset
>> applied was fine?
>>
>> Thanks!
>> Yijing.
>
> Well basic hotplug is fine, as below. I did not do any robust testing.
>

OK, thanks. I tested with and without fio running, and the results were fine in both cases.

Thanks!
Yijing.

> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [ 180.147676] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
> [ 180.216558] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [ 180.280548] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [ 180.352556] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [ 180.432495] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [ 180.508492] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [ 180.527577] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [ 180.532728] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 180.541591] sd 0:0:1:0: [sdb] Stopping disk
> [ 180.545767] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 180.612491] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [ 180.696452] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [ 180.703221] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
>
> root@(none)$ echo 1 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [ 185.937831] hisi_sas_v2_hw HISI0162:01: phyup: phy7 link_rate=11
> [ 185.996575] scsi 0:0:8:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
> [ 187.059642] ata2.00: ATA-8: HGST HUS724040ALA640, MFAOA8B0, max UDMA/133
> [ 187.066341] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [ 187.073278] ata2.00: ATA Identify Device Log not supported
> [ 187.078755] ata2.00: Security Log not supported
> [ 187.085239] ata2.00: ATA Identify Device Log not supported
> [ 187.090715] ata2.00: Security Log not supported
> [ 187.095236] ata2.00: configured for UDMA/133
> [ 187.136917] scsi 0:0:9:0: Direct-Access ATA HGST HUS724040AL A8B0 PQ: 0 ANSI: 5
> [ 187.187612] sd 0:0:9:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
> [ 187.195365] sd 0:0:9:0: [sdb] Write Protect is off
> [ 187.200161] sd 0:0:9:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [ 187.223844] sd 0:0:9:0: [sdb] Attached SCSI disk
> [ 187.225498] scsi 0:0:10:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
> [ 187.243864] sd 0:0:8:0: [sda] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [ 187.285879] sd 0:0:8:0: [sda] Write Protect is off
> [ 187.367898] sd 0:0:8:0: [sda] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [ 187.524043] scsi 0:0:11:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
> [ 187.701505] sd 0:0:10:0: [sdc] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [ 187.743547] sd 0:0:10:0: [sdc] Write Protect is off
> [ 187.822546] scsi 0:0:12:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
> [ 187.825531] sd 0:0:10:0: [sdc] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [ 188.000167] sd 0:0:11:0: [sdd] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [ 188.042205] sd 0:0:11:0: [sdd] Write Protect is off
> [ 188.121527] scsi 0:0:13:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
> [ 188.124274] sd 0:0:11:0: [sdd] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [ 188.298942] sd 0:0:12:0: [sde] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [ 188.340960] sd 0:0:12:0: [sde] Write Protect is off
> [ 188.420023] scsi 0:0:14:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
> [ 188.422969] sd 0:0:12:0: [sde] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [ 188.597501] sd 0:0:13:0: [sdf] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [ 188.605069] sd 0:0:8:0: [sda] Attached SCSI disk
> [ 188.639520] sd 0:0:13:0: [sdf] Write Protect is off
> [ 188.682445] scsi 0:0:15:0: Enclosure 12G SAS Expander RevB PQ: 0 ANSI: 6
> [ 188.721540] sd 0:0:13:0: [sdf] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [ 188.896399] sd 0:0:14:0: [sdg] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [ 188.938445] sd 0:0:14:0: [sdg] Write Protect is off
> [ 189.020444] sd 0:0:14:0: [sdg] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [ 189.060608] sd 0:0:10:0: [sdc] Attached SCSI disk
> [ 189.359073] sd 0:0:11:0: [sdd] Attached SCSI disk
> [ 189.657643] sd 0:0:12:0: [sde] Attached SCSI disk
> [ 189.956585] sd 0:0:13:0: [sdf] Attached SCSI disk
> [ 190.255148] sd 0:0:14:0: [sdg] Attached SCSI disk
>
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [
Re: [PATCH v3 0/7] Enhance libsas hotplug feature
On 13/07/2017 02:37, wangyijing wrote:
>> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.
> Oh, did I make a mistake? So the hotplug test with this patchset
> applied was fine?
>
> Thanks!
> Yijing.

Well basic hotplug is fine, as below. I did not do any robust testing.

root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [ 180.147676] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
[ 180.216558] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
[ 180.280548] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
[ 180.352556] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
[ 180.432495] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
[ 180.508492] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
[ 180.527577] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 180.532728] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[ 180.541591] sd 0:0:1:0: [sdb] Stopping disk
[ 180.545767] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[ 180.612491] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
[ 180.696452] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
[ 180.703221] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone

root@(none)$ echo 1 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [ 185.937831] hisi_sas_v2_hw HISI0162:01: phyup: phy7 link_rate=11
[ 185.996575] scsi 0:0:8:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
[ 187.059642] ata2.00: ATA-8: HGST HUS724040ALA640, MFAOA8B0, max UDMA/133
[ 187.066341] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 187.073278] ata2.00: ATA Identify Device Log not supported
[ 187.078755] ata2.00: Security Log not supported
[ 187.085239] ata2.00: ATA Identify Device Log not supported
[ 187.090715] ata2.00: Security Log not supported
[ 187.095236] ata2.00: configured for UDMA/133
[ 187.136917] scsi 0:0:9:0: Direct-Access ATA HGST HUS724040AL A8B0 PQ: 0 ANSI: 5
[ 187.187612] sd 0:0:9:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[ 187.195365] sd 0:0:9:0: [sdb] Write Protect is off
[ 187.200161] sd 0:0:9:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 187.223844] sd 0:0:9:0: [sdb] Attached SCSI disk
[ 187.225498] scsi 0:0:10:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
[ 187.243864] sd 0:0:8:0: [sda] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[ 187.285879] sd 0:0:8:0: [sda] Write Protect is off
[ 187.367898] sd 0:0:8:0: [sda] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 187.524043] scsi 0:0:11:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
[ 187.701505] sd 0:0:10:0: [sdc] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[ 187.743547] sd 0:0:10:0: [sdc] Write Protect is off
[ 187.822546] scsi 0:0:12:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
[ 187.825531] sd 0:0:10:0: [sdc] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 188.000167] sd 0:0:11:0: [sdd] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[ 188.042205] sd 0:0:11:0: [sdd] Write Protect is off
[ 188.121527] scsi 0:0:13:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
[ 188.124274] sd 0:0:11:0: [sdd] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 188.298942] sd 0:0:12:0: [sde] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[ 188.340960] sd 0:0:12:0: [sde] Write Protect is off
[ 188.420023] scsi 0:0:14:0: Direct-Access SanDisk LT0200MO P404 PQ: 0 ANSI: 6
[ 188.422969] sd 0:0:12:0: [sde] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 188.597501] sd 0:0:13:0: [sdf] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[ 188.605069] sd 0:0:8:0: [sda] Attached SCSI disk
[ 188.639520] sd 0:0:13:0: [sdf] Write Protect is off
[ 188.682445] scsi 0:0:15:0: Enclosure 12G SAS Expander RevB PQ: 0 ANSI: 6
[ 188.721540] sd 0:0:13:0: [sdf] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 188.896399] sd 0:0:14:0: [sdg] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[ 188.938445] sd 0:0:14:0: [sdg] Write Protect is off
[ 189.020444] sd 0:0:14:0: [sdg] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 189.060608] sd 0:0:10:0: [sdc] Attached SCSI disk
[ 189.359073] sd 0:0:11:0: [sdd] Attached SCSI disk
[ 189.657643] sd 0:0:12:0: [sde] Attached SCSI disk
[ 189.956585] sd 0:0:13:0: [sdf] Attached SCSI disk
[ 190.255148] sd 0:0:14:0: [sdg] Attached SCSI disk

root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [ 192.895718] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
[ 192.964671] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
[ 193.032744] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
[ 193.096755] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
[ 193.157072] hisi_sas_v2_hw HISI0162:01: found
Re: [PATCH v3 0/7] Enhance libsas hotplug feature
On 2017/7/12 17:59, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> This patchset is based on Johannes's patch
>> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>>
>> libsas hotplug currently has some issues. Dan Williams reported
>> a similar bug here before:
>> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>>
>> The issues we have found:
>> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>>    may be lost because the same sas event is already pending, so the
>>    libsas topology may end up different from the hardware.
>> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>>    remove devices. It first deletes the sas port, then puts a destruction
>>    discovery event in a new work and queues it at the tail of the
>>    workqueue. Once the sas port is deleted, its child devices are deleted
>>    too, so when the destruction work starts it finds that the target
>>    device has already been removed and reports a sysfs warning.
>> 3. Since a hotplug process is divided into several works, a phy-up sas
>>    event may be inserted between the phy-down works, like
>>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
>>    which is not what we expect, and issues occur.
>>
>> The first patch fixes the loss of sas events, and the second one
>> introduces wait-complete to fix the hotplug ordering issues.
>>
>
> I quickly tested this for basic hotplug.
>
> Before:
> root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
> root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
> root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
> root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [ 102.570694] sysfs group 'power' not found for kobject '0:0:7:0'
> [ 102.577250] [ cut here ]
> [ 102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 sysfs_remove_group+0x8c/0x94
> [ 102.590110] Modules linked in:
> [ 102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 4.12.0-rc1-00032-g3ab81fc #1907
> [ 102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 UEFI Nemo 1.7 RC3 06/23/2017
> [ 102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
> [ 102.615822] task: 8017d4793400 task.stack: 8017b7e7
> [ 102.621728] PC is at sysfs_remove_group+0x8c/0x94
> [ 102.626419] LR is at sysfs_remove_group+0x8c/0x94
> [ 102.631109] pc : [] lr : [] pstate: 6045
> [ 102.638490] sp : 8017b7e73b80
> [ 102.641791] x29: 8017b7e73b80 x28: 8017db010800
> [ 102.647091] x27: 08e27000 x26: 8017d43e6600
> [ 102.652390] x25: 8017b828 x24: 0003
> [ 102.657689] x23: 8017b78864b0 x22: 8017b784c988
> [ 102.662988] x21: 8017b7886410 x20: 08ee9dd0
> [ 102.668288] x19: x18: 08a1b678
> [ 102.673587] x17: 000e x16: 0007
> [ 102.678886] x15: x14: 00a3
> [ 102.684185] x13: 0033 x12: 0028
> [ 102.689484] x11: 08f3be58 x10:
> [ 102.694783] x9 : 043c x8 : 6f6b20726f662064
> [ 102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
> [ 102.705382] x5 : x4 :
> [ 102.710681] x3 : x2 : 08e427e0
> [ 102.715980] x1 : x0 : 0033
> [ 102.721279] ---[ end trace c216cc1451d5f7ec ]---
> [ 102.725882] Call trace:
> [ 102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
> [ 102.734742] 39a0: 0001
> [ 102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050
> [ 102.750372] 39e0: 8017b78864b0 0003 8017b828 8017d43e6600
> [ 102.758188] 3a00: 08e27000 8017db010800 8017d4793400
> [ 102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 ffc8
> [ 102.773818] 3a40: 8017b7e73a70 0810c12c 0033
> [ 102.781633] 3a60: 08e427e0
> [ 102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 043c
> [ 102.797264] 3aa0: 08f3be58 0028 0033
> [ 102.805079] 3ac0: 00a3 0007 000e
> [ 102.812895] [] sysfs_remove_group+0x8c/0x94
> [ 102.818628] [] dpm_sysfs_remove+0x58/0x68
> [ 102.824188] []
Re: [PATCH v3 0/7] Enhance libsas hotplug feature
On Wed, Jul 12, 2017 at 10:59:27AM +0100, John Garry wrote:
> After:
> ...
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [ 446.193336] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
> [ 446.249205] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [ 446.325201] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [ 446.373189] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [ 446.421187] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [ 446.457232] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [ 446.477151] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [ 446.482373] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 446.491238] sd 0:0:1:0: [sdb] Stopping disk
> [ 446.495419] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
> [ 446.525227] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [ 446.569249] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [ 446.576872] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
>
> root@(none)$
>
> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.

This is awesome. I hope I will have some time to review the patches themselves soon.

Johannes
--
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v3 0/7] Enhance libsas hotplug feature
On 10/07/2017 08:06, Yijing Wang wrote:
> This patchset is based on Johannes's patch
> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>
> libsas hotplug currently has some issues. Dan Williams reported
> a similar bug here before:
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>
> The issues we have found:
> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>    may be lost because the same sas event is already pending, so the
>    libsas topology may end up different from the hardware.
> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>    remove devices. It first deletes the sas port, then puts a destruction
>    discovery event in a new work and queues it at the tail of the
>    workqueue. Once the sas port is deleted, its child devices are deleted
>    too, so when the destruction work starts it finds that the target
>    device has already been removed and reports a sysfs warning.
> 3. Since a hotplug process is divided into several works, a phy-up sas
>    event may be inserted between the phy-down works, like
>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
>    which is not what we expect, and issues occur.
>
> The first patch fixes the loss of sas events, and the second one
> introduces wait-complete to fix the hotplug ordering issues.
>

I quickly tested this for basic hotplug.

Before:
root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [ 102.570694] sysfs group 'power' not found for kobject '0:0:7:0'
[ 102.577250] [ cut here ]
[ 102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 sysfs_remove_group+0x8c/0x94
[ 102.590110] Modules linked in:
[ 102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 4.12.0-rc1-00032-g3ab81fc #1907
[ 102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 UEFI Nemo 1.7 RC3 06/23/2017
[ 102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
[ 102.615822] task: 8017d4793400 task.stack: 8017b7e7
[ 102.621728] PC is at sysfs_remove_group+0x8c/0x94
[ 102.626419] LR is at sysfs_remove_group+0x8c/0x94
[ 102.631109] pc : [] lr : [] pstate: 6045
[ 102.638490] sp : 8017b7e73b80
[ 102.641791] x29: 8017b7e73b80 x28: 8017db010800
[ 102.647091] x27: 08e27000 x26: 8017d43e6600
[ 102.652390] x25: 8017b828 x24: 0003
[ 102.657689] x23: 8017b78864b0 x22: 8017b784c988
[ 102.662988] x21: 8017b7886410 x20: 08ee9dd0
[ 102.668288] x19: x18: 08a1b678
[ 102.673587] x17: 000e x16: 0007
[ 102.678886] x15: x14: 00a3
[ 102.684185] x13: 0033 x12: 0028
[ 102.689484] x11: 08f3be58 x10:
[ 102.694783] x9 : 043c x8 : 6f6b20726f662064
[ 102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
[ 102.705382] x5 : x4 :
[ 102.710681] x3 : x2 : 08e427e0
[ 102.715980] x1 : x0 : 0033
[ 102.721279] ---[ end trace c216cc1451d5f7ec ]---
[ 102.725882] Call trace:
[ 102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
[ 102.734742] 39a0: 0001
[ 102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050
[ 102.750372] 39e0: 8017b78864b0 0003 8017b828 8017d43e6600
[ 102.758188] 3a00: 08e27000 8017db010800 8017d4793400
[ 102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 ffc8
[ 102.773818] 3a40: 8017b7e73a70 0810c12c 0033
[ 102.781633] 3a60: 08e427e0
[ 102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 043c
[ 102.797264] 3aa0: 08f3be58 0028 0033
[ 102.805079] 3ac0: 00a3 0007 000e
[ 102.812895] [] sysfs_remove_group+0x8c/0x94
[ 102.818628] [] dpm_sysfs_remove+0x58/0x68
[ 102.824188] [] device_del+0xf8/0x2d0
[ 102.829312] [] device_unregister+0x14/0x2c
[ 102.834959] [] bsg_unregister_queue+0x60/0x98
[ 102.840866] [] __scsi_remove_device+0xa0/0xbc
[ 151.331854] 3bc0: 081f21ac 803370c0
[ 151.336718] []
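
The warning in the "Before" log is raised from sas_destruct_devices running on scsi_wq_0, which looks like issue 2 of the cover letter: the port and its children are already gone by the time the queued destruction work runs. A minimal userspace sketch of that ordering problem follows. It is a toy model in Python, not libsas code; the function name hot_remove, the wait_for_destruct flag and the single child device "0:0:7:0" are assumptions made only for illustration.

```python
from collections import deque


def hot_remove(wait_for_destruct):
    """Replay one phy-down on a toy FIFO workqueue and return the warnings."""
    devices = {"0:0:7:0"}  # hypothetical child device hanging off the port
    warnings = []
    workqueue = deque()

    def destruct_work():
        # Models the destruction discovery work: unregister the device,
        # or complain if it has already been removed by someone else.
        if "0:0:7:0" in devices:
            devices.remove("0:0:7:0")
        else:
            warnings.append("sysfs group 'power' not found for kobject '0:0:7:0'")

    def phy_down_work():
        if wait_for_destruct:
            destruct_work()                  # finish destruction before going on
        else:
            workqueue.append(destruct_work)  # deferred to the tail of the queue
        devices.clear()                      # deleting the port drops its children

    workqueue.append(phy_down_work)
    while workqueue:
        workqueue.popleft()()                # run works strictly in FIFO order
    return warnings


print(hot_remove(wait_for_destruct=False))
print(hot_remove(wait_for_destruct=True))
```

With wait_for_destruct=False the deferred work finds nothing left to remove and records the warning; with wait_for_destruct=True the destruction completes before the port is deleted and no warning is recorded. How the patches actually implement the waiting is not shown here.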
[PATCH v3 0/7] Enhance libsas hotplug feature
This patchset is based on Johannes's patch
"scsi: sas: scsi_queue_work can fail, so make callers aware"

libsas hotplug currently has some issues. Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because the same sas event is already pending, so the
   libsas topology may end up different from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to
   remove devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy-up sas
   event may be inserted between the phy-down works, like
   destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
   which is not what we expect, and issues occur.

The first patch fixes the loss of sas events, and the second one
introduces wait-complete to fix the hotplug ordering issues.

v2->v3: some code improvements suggested by Johannes and John,
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Yijing Wang (7):
  libsas: Use static sas event pool to appease sas event lost
  libsas: remove unused port_gone_completion
  libsas: Use new workqueue to run sas event
  libsas: add sas event wait-complete support
  libsas: add a new workqueue to run probe/destruct discovery event
  libsas: add wait-complete support to sync discovery event
  libsas: release disco mutex during waiting in sas_ex_discover_end_dev

 drivers/scsi/libsas/sas_discover.c | 58 +++---
 drivers/scsi/libsas/sas_event.c    | 212 -
 drivers/scsi/libsas/sas_expander.c | 22 +++-
 drivers/scsi/libsas/sas_init.c     | 21 ++--
 drivers/scsi/libsas/sas_internal.h | 64 +++
 drivers/scsi/libsas/sas_phy.c      | 48 +++--
 drivers/scsi/libsas/sas_port.c     | 22 ++--
 include/scsi/libsas.h              | 27 +++--
 8 files changed, 373 insertions(+), 101 deletions(-)

--
2.5.0
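
To make issue 1 concrete, here is a small userspace sketch in Python of how a per-event "pending" flag can drop notifications during a burst, next to a variant where every notification takes its own slot from a fixed pool. It is only a model of the idea described in the cover letter, not the libsas implementation; the class names PendingBitNotifier and PooledNotifier, the replay helper, the event strings and the pool size are all assumptions made for illustration.

```python
from collections import deque


class PendingBitNotifier:
    """One work item per event type, guarded by a pending flag."""

    def __init__(self):
        self.pending = set()
        self.queue = deque()
        self.lost = 0

    def notify(self, event):
        if event in self.pending:
            self.lost += 1          # same event already queued: dropped
            return False
        self.pending.add(event)
        self.queue.append(event)
        return True

    def run_one(self):
        event = self.queue.popleft()
        self.pending.discard(event)
        return event


class PooledNotifier:
    """Every notification takes its own slot from a fixed-size pool."""

    def __init__(self, pool_size):
        self.free = pool_size
        self.queue = deque()
        self.lost = 0

    def notify(self, event):
        if self.free == 0:
            self.lost += 1          # pool exhausted: still a loss, but bounded
            return False
        self.free -= 1
        self.queue.append(event)
        return True

    def run_one(self):
        event = self.queue.popleft()
        self.free += 1              # slot goes back to the pool
        return event


def replay(notifier, burst):
    """Deliver the whole burst before any work runs, then drain the queue."""
    for event in burst:
        notifier.notify(event)
    handled = []
    while notifier.queue:
        handled.append(notifier.run_one())
    return handled


burst = ["phy_up", "phy_down", "phy_up"]   # the hardware ends with the phy up
print(replay(PendingBitNotifier(), burst))
print(replay(PooledNotifier(pool_size=8), burst))
```

In this model the pending-flag variant handles only the first phy_up and the phy_down, so its last handled event says the phy is down while the hardware has it up. The pooled variant replays all three events in order, and can only lose events once the pool itself is exhausted.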