Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-14 Thread wangyijing
Hi, I'm sorry to say that I have to stop the libsas hotplug improvement work.
I am resigning from Huawei, so I will have neither the time nor the hardware
to continue working on this issue. John is very familiar with this work and
has provided a lot of good suggestions, so if John is willing, I would be glad
for him to take it over. My colleague Jason Yan can also help.


Thanks!
Yijing.


On 2017/7/10 15:06, Yijing Wang wrote:
> This patchset is based on Johannes's patch
> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>
> libsas hotplug currently has some issues. Dan Williams reported
> a similar bug here before:
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>
> The issues we have found:
> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>    may be lost because the same sas event is already pending, so the
>    libsas topology may end up different from the hardware.
> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>    remove devices. It first deletes the sas port, then puts a destruction
>    discovery event in a new work item and queues it at the tail of the
>    workqueue. Once the sas port is deleted, its child devices are deleted
>    too, so when the destruction work starts, it finds that the target
>    device has already been removed and reports a sysfs warning.
> 3. Since a hotplug process is divided into several work items, a phy-up
>    sas event may be inserted between the phy-down work items, like
>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
>    which is not what we expect, and issues occur.
>
> The first patch fixes the lost sas events, and the second one introduces
> wait-complete to fix the hotplug ordering issues.
>
> v2->v3: some code improvements suggested by Johannes and John,
>         split v2 patch 2 into several small patches.
> v1->v2: some code improvements suggested by John Garry
> 
> Yijing Wang (7):
>   libsas: Use static sas event pool to appease sas event lost
>   libsas: remove unused port_gone_completion
>   libsas: Use new workqueue to run sas event
>   libsas: add sas event wait-complete support
>   libsas: add a new workqueue to run probe/destruct discovery event
>   libsas: add wait-complete support to sync discovery event
>   libsas: release disco mutex during waiting in sas_ex_discover_end_dev
> 
>  drivers/scsi/libsas/sas_discover.c |  58 +++---
>  drivers/scsi/libsas/sas_event.c    | 212 -
>  drivers/scsi/libsas/sas_expander.c |  22 +++-
>  drivers/scsi/libsas/sas_init.c     |  21 ++--
>  drivers/scsi/libsas/sas_internal.h |  64 +++
>  drivers/scsi/libsas/sas_phy.c      |  48 +++--
>  drivers/scsi/libsas/sas_port.c     |  22 ++--
>  include/scsi/libsas.h              |  27 +++--
>  8 files changed, 373 insertions(+), 101 deletions(-)
> 



Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-13 Thread wangyijing


On 2017/7/13 16:08, John Garry wrote:
> On 13/07/2017 02:37, wangyijing wrote:
>>> > So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.
>> Oh, did I misunderstand? Was the hotplug result fine when you tested with
>> this patchset applied?
>>
>> Thanks!
>> Yijing.
> 
> Well basic hotplug is fine, as below. I did not do any robust testing.
> 

OK, thanks. I tested with and without fio running, and the results are fine in both cases.

Thanks!
Yijing.

> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  180.147676] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
> [  180.216558] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [  180.280548] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [  180.352556] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [  180.432495] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [  180.508492] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [  180.527577] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [  180.532728] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
> [  180.541591] sd 0:0:1:0: [sdb] Stopping disk
> [  180.545767] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
> [  180.612491] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [  180.696452] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [  180.703221] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
> 
> root@(none)$ echo 1 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  185.937831] hisi_sas_v2_hw HISI0162:01: phyup: phy7 link_rate=11
> [  185.996575] scsi 0:0:8:0: Direct-Access SanDisk  LT0200MO P404 PQ: 0 ANSI: 6
> [  187.059642] ata2.00: ATA-8: HGST HUS724040ALA640, MFAOA8B0, max UDMA/133
> [  187.066341] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [  187.073278] ata2.00: ATA Identify Device Log not supported
> [  187.078755] ata2.00: Security Log not supported
> [  187.085239] ata2.00: ATA Identify Device Log not supported
> [  187.090715] ata2.00: Security Log not supported
> [  187.095236] ata2.00: configured for UDMA/133
> [  187.136917] scsi 0:0:9:0: Direct-Access ATA  HGST HUS724040AL A8B0 PQ: 0 ANSI: 5
> [  187.187612] sd 0:0:9:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
> [  187.195365] sd 0:0:9:0: [sdb] Write Protect is off
> [  187.200161] sd 0:0:9:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [  187.223844] sd 0:0:9:0: [sdb] Attached SCSI disk
> [  187.225498] scsi 0:0:10:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
> [  187.243864] sd 0:0:8:0: [sda] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [  187.285879] sd 0:0:8:0: [sda] Write Protect is off
> [  187.367898] sd 0:0:8:0: [sda] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [  187.524043] scsi 0:0:11:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
> [  187.701505] sd 0:0:10:0: [sdc] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [  187.743547] sd 0:0:10:0: [sdc] Write Protect is off
> [  187.822546] scsi 0:0:12:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
> [  187.825531] sd 0:0:10:0: [sdc] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [  188.000167] sd 0:0:11:0: [sdd] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [  188.042205] sd 0:0:11:0: [sdd] Write Protect is off
> [  188.121527] scsi 0:0:13:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
> [  188.124274] sd 0:0:11:0: [sdd] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [  188.298942] sd 0:0:12:0: [sde] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [  188.340960] sd 0:0:12:0: [sde] Write Protect is off
> [  188.420023] scsi 0:0:14:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
> [  188.422969] sd 0:0:12:0: [sde] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [  188.597501] sd 0:0:13:0: [sdf] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [  188.605069] sd 0:0:8:0: [sda] Attached SCSI disk
> [  188.639520] sd 0:0:13:0: [sdf] Write Protect is off
> [  188.682445] scsi 0:0:15:0: Enclosure 12G SAS  Expander  RevB PQ: 0 ANSI: 6
> [  188.721540] sd 0:0:13:0: [sdf] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [  188.896399] sd 0:0:14:0: [sdg] 390721968 512-byte logical blocks: (200 GB/186 GiB)
> [  188.938445] sd 0:0:14:0: [sdg] Write Protect is off
> [  189.020444] sd 0:0:14:0: [sdg] Write cache: disabled, read cache: disabled, supports DPO and FUA
> [  189.060608] sd 0:0:10:0: [sdc] Attached SCSI disk
> [  189.359073] sd 0:0:11:0: [sdd] Attached SCSI disk
> [  189.657643] sd 0:0:12:0: [sde] Attached SCSI disk
> [  189.956585] sd 0:0:13:0: [sdf] Attached SCSI disk
> [  190.255148] sd 0:0:14:0: [sdg] Attached SCSI disk
> 
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-13 Thread John Garry

On 13/07/2017 02:37, wangyijing wrote:
>> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.
> Oh, did I misunderstand? Was the hotplug result fine when you tested with
> this patchset applied?
>
> Thanks!
> Yijing.

Well, basic hotplug is fine, as below. I did not do any robust testing.

root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  180.147676] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
[  180.216558] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
[  180.280548] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
[  180.352556] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
[  180.432495] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
[  180.508492] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
[  180.527577] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[  180.532728] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[  180.541591] sd 0:0:1:0: [sdb] Stopping disk
[  180.545767] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[  180.612491] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
[  180.696452] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
[  180.703221] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone

root@(none)$ echo 1 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  185.937831] hisi_sas_v2_hw HISI0162:01: phyup: phy7 link_rate=11
[  185.996575] scsi 0:0:8:0: Direct-Access SanDisk  LT0200MO P404 PQ: 0 ANSI: 6
[  187.059642] ata2.00: ATA-8: HGST HUS724040ALA640, MFAOA8B0, max UDMA/133
[  187.066341] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  187.073278] ata2.00: ATA Identify Device Log not supported
[  187.078755] ata2.00: Security Log not supported
[  187.085239] ata2.00: ATA Identify Device Log not supported
[  187.090715] ata2.00: Security Log not supported
[  187.095236] ata2.00: configured for UDMA/133
[  187.136917] scsi 0:0:9:0: Direct-Access ATA  HGST HUS724040AL A8B0 PQ: 0 ANSI: 5
[  187.187612] sd 0:0:9:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[  187.195365] sd 0:0:9:0: [sdb] Write Protect is off
[  187.200161] sd 0:0:9:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  187.223844] sd 0:0:9:0: [sdb] Attached SCSI disk
[  187.225498] scsi 0:0:10:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
[  187.243864] sd 0:0:8:0: [sda] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[  187.285879] sd 0:0:8:0: [sda] Write Protect is off
[  187.367898] sd 0:0:8:0: [sda] Write cache: disabled, read cache: disabled, supports DPO and FUA
[  187.524043] scsi 0:0:11:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
[  187.701505] sd 0:0:10:0: [sdc] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[  187.743547] sd 0:0:10:0: [sdc] Write Protect is off
[  187.822546] scsi 0:0:12:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
[  187.825531] sd 0:0:10:0: [sdc] Write cache: disabled, read cache: disabled, supports DPO and FUA
[  188.000167] sd 0:0:11:0: [sdd] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[  188.042205] sd 0:0:11:0: [sdd] Write Protect is off
[  188.121527] scsi 0:0:13:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
[  188.124274] sd 0:0:11:0: [sdd] Write cache: disabled, read cache: disabled, supports DPO and FUA
[  188.298942] sd 0:0:12:0: [sde] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[  188.340960] sd 0:0:12:0: [sde] Write Protect is off
[  188.420023] scsi 0:0:14:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 ANSI: 6
[  188.422969] sd 0:0:12:0: [sde] Write cache: disabled, read cache: disabled, supports DPO and FUA
[  188.597501] sd 0:0:13:0: [sdf] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[  188.605069] sd 0:0:8:0: [sda] Attached SCSI disk
[  188.639520] sd 0:0:13:0: [sdf] Write Protect is off
[  188.682445] scsi 0:0:15:0: Enclosure 12G SAS  Expander  RevB PQ: 0 ANSI: 6
[  188.721540] sd 0:0:13:0: [sdf] Write cache: disabled, read cache: disabled, supports DPO and FUA
[  188.896399] sd 0:0:14:0: [sdg] 390721968 512-byte logical blocks: (200 GB/186 GiB)
[  188.938445] sd 0:0:14:0: [sdg] Write Protect is off
[  189.020444] sd 0:0:14:0: [sdg] Write cache: disabled, read cache: disabled, supports DPO and FUA
[  189.060608] sd 0:0:10:0: [sdc] Attached SCSI disk
[  189.359073] sd 0:0:11:0: [sdd] Attached SCSI disk
[  189.657643] sd 0:0:12:0: [sde] Attached SCSI disk
[  189.956585] sd 0:0:13:0: [sdf] Attached SCSI disk
[  190.255148] sd 0:0:14:0: [sdg] Attached SCSI disk
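
The disable/enable cycle shown above could also be driven from a small C helper instead of echo. This is only a sketch: set_phy_enable() and cycle_phy() are hypothetical names, and the caller is assumed to pass the path of the phy's "enable" attribute (for example the ./phy-0:7/sas_phy/phy-0:7/enable path used above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "1" or "0" to an enable attribute, as `echo 1 > path` would.
 * Hypothetical helper for illustration. Returns 0 on success, -1 on failure.
 */
static int set_phy_enable(const char *enable_path, int on)
{
    assert(enable_path != NULL);

    FILE *f = fopen(enable_path, "w");
    if (!f)
        return -1;

    int ok = fputs(on ? "1\n" : "0\n", f) >= 0;

    /* fclose() flushes the buffer, so a failed write can surface here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* One disable-then-enable cycle on the same attribute. */
static int cycle_phy(const char *enable_path)
{
    if (set_phy_enable(enable_path, 0) != 0)
        return -1;
    return set_phy_enable(enable_path, 1);
}
```

Checking the return value matters in a test loop, because a failed write would otherwise look like a phy that did not react.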

root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  192.895718] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
[  192.964671] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
[  193.032744] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
[  193.096755] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
[  193.157072] hisi_sas_v2_hw HISI0162:01: found 

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread wangyijing


On 2017/7/12 17:59, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> This patchset is based on Johannes's patch
>> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>>
>> libsas hotplug currently has some issues. Dan Williams reported
>> a similar bug here before:
>> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>>
>> The issues we have found:
>> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>>    may be lost because the same sas event is already pending, so the
>>    libsas topology may end up different from the hardware.
>> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>>    remove devices. It first deletes the sas port, then puts a destruction
>>    discovery event in a new work item and queues it at the tail of the
>>    workqueue. Once the sas port is deleted, its child devices are deleted
>>    too, so when the destruction work starts, it finds that the target
>>    device has already been removed and reports a sysfs warning.
>> 3. Since a hotplug process is divided into several work items, a phy-up
>>    sas event may be inserted between the phy-down work items, like
>>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
>>    which is not what we expect, and issues occur.
>>
>> The first patch fixes the lost sas events, and the second one introduces
>> wait-complete to fix the hotplug ordering issues.
>>
> 
> I quickly tested this for basic hotplug.
> 
> Before:
> root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
> root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
> root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
> root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  102.570694] sysfs group 'power' not found for kobject '0:0:7:0'
> [  102.577250] ------------[ cut here ]------------
> [  102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 sysfs_remove_group+0x8c/0x94
> [  102.590110] Modules linked in:
> [  102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 4.12.0-rc1-00032-g3ab81fc #1907
> [  102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 UEFI Nemo 1.7 RC3 06/23/2017
> [  102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
> [  102.615822] task: 8017d4793400 task.stack: 8017b7e7
> [  102.621728] PC is at sysfs_remove_group+0x8c/0x94
> [  102.626419] LR is at sysfs_remove_group+0x8c/0x94
> [  102.631109] pc : [] lr : [] pstate: 6045
> [  102.638490] sp : 8017b7e73b80
> [  102.641791] x29: 8017b7e73b80 x28: 8017db010800
> [  102.647091] x27: 08e27000 x26: 8017d43e6600
> [  102.652390] x25: 8017b828 x24: 0003
> [  102.657689] x23: 8017b78864b0 x22: 8017b784c988
> [  102.662988] x21: 8017b7886410 x20: 08ee9dd0
> [  102.668288] x19:  x18: 08a1b678
> [  102.673587] x17: 000e x16: 0007
> [  102.678886] x15:  x14: 00a3
> [  102.684185] x13: 0033 x12: 0028
> [  102.689484] x11: 08f3be58 x10: 
> [  102.694783] x9 : 043c x8 : 6f6b20726f662064
> [  102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
> [  102.705382] x5 :  x4 : 
> [  102.710681] x3 :  x2 : 08e427e0
> [  102.715980] x1 :  x0 : 0033
> [  102.721279] ---[ end trace c216cc1451d5f7ec ]---
> [  102.725882] Call trace:
> [  102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
> [  102.734742] 39a0:    
> 0001
> [  102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050 
> 
> [  102.750372] 39e0: 8017b78864b0 0003 8017b828 
> 8017d43e6600
> [  102.758188] 3a00: 08e27000 8017db010800 8017d4793400 
> 
> [  102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 
> ffc8
> [  102.773818] 3a40: 8017b7e73a70 0810c12c 0033 
> 
> [  102.781633] 3a60: 08e427e0   
> 
> [  102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 
> 043c
> [  102.797264] 3aa0:  08f3be58 0028 
> 0033
> [  102.805079] 3ac0: 00a3  0007 
> 000e
> [  102.812895] [] sysfs_remove_group+0x8c/0x94
> [  102.818628] [] dpm_sysfs_remove+0x58/0x68
> [  102.824188] [] 

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread Johannes Thumshirn
On Wed, Jul 12, 2017 at 10:59:27AM +0100, John Garry wrote:
> After:
> ...
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  446.193336] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
> [  446.249205] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [  446.325201] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [  446.373189] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [  446.421187] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [  446.457232] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [  446.477151] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [  446.482373] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
> [  446.491238] sd 0:0:1:0: [sdb] Stopping disk
> [  446.495419] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
> [  446.525227] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [  446.569249] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [  446.576872] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
> 
> root@(none)$
> 
> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.

This is awesome. I hope I will have some time to review the patches themselves
soon.

Johannes

-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread John Garry

On 10/07/2017 08:06, Yijing Wang wrote:

> This patchset is based on Johannes's patch
> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>
> libsas hotplug currently has some issues. Dan Williams reported
> a similar bug here before:
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>
> The issues we have found:
> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>    may be lost because the same sas event is already pending, so the
>    libsas topology may end up different from the hardware.
> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>    remove devices. It first deletes the sas port, then puts a destruction
>    discovery event in a new work item and queues it at the tail of the
>    workqueue. Once the sas port is deleted, its child devices are deleted
>    too, so when the destruction work starts, it finds that the target
>    device has already been removed and reports a sysfs warning.
> 3. Since a hotplug process is divided into several work items, a phy-up
>    sas event may be inserted between the phy-down work items, like
>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
>    which is not what we expect, and issues occur.
>
> The first patch fixes the lost sas events, and the second one introduces
> wait-complete to fix the hotplug ordering issues.



I quickly tested this for basic hotplug.

Before:
root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  102.570694] sysfs group 'power' not found for kobject '0:0:7:0'
[  102.577250] ------------[ cut here ]------------
[  102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 sysfs_remove_group+0x8c/0x94
[  102.590110] Modules linked in:
[  102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 4.12.0-rc1-00032-g3ab81fc #1907
[  102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 UEFI Nemo 1.7 RC3 06/23/2017
[  102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
[  102.615822] task: 8017d4793400 task.stack: 8017b7e7
[  102.621728] PC is at sysfs_remove_group+0x8c/0x94
[  102.626419] LR is at sysfs_remove_group+0x8c/0x94
[  102.631109] pc : [] lr : [] pstate: 6045
[  102.638490] sp : 8017b7e73b80
[  102.641791] x29: 8017b7e73b80 x28: 8017db010800
[  102.647091] x27: 08e27000 x26: 8017d43e6600
[  102.652390] x25: 8017b828 x24: 0003
[  102.657689] x23: 8017b78864b0 x22: 8017b784c988
[  102.662988] x21: 8017b7886410 x20: 08ee9dd0
[  102.668288] x19:  x18: 08a1b678
[  102.673587] x17: 000e x16: 0007
[  102.678886] x15:  x14: 00a3
[  102.684185] x13: 0033 x12: 0028
[  102.689484] x11: 08f3be58 x10: 
[  102.694783] x9 : 043c x8 : 6f6b20726f662064
[  102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
[  102.705382] x5 :  x4 : 
[  102.710681] x3 :  x2 : 08e427e0
[  102.715980] x1 :  x0 : 0033
[  102.721279] ---[ end trace c216cc1451d5f7ec ]---
[  102.725882] Call trace:
[  102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
[  102.734742] 39a0:    
0001
[  102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050 

[  102.750372] 39e0: 8017b78864b0 0003 8017b828 
8017d43e6600
[  102.758188] 3a00: 08e27000 8017db010800 8017d4793400 

[  102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 
ffc8
[  102.773818] 3a40: 8017b7e73a70 0810c12c 0033 

[  102.781633] 3a60: 08e427e0   

[  102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 
043c
[  102.797264] 3aa0:  08f3be58 0028 
0033
[  102.805079] 3ac0: 00a3  0007 
000e

[  102.812895] [] sysfs_remove_group+0x8c/0x94
[  102.818628] [] dpm_sysfs_remove+0x58/0x68
[  102.824188] [] device_del+0xf8/0x2d0
[  102.829312] [] device_unregister+0x14/0x2c
[  102.834959] [] bsg_unregister_queue+0x60/0x98
[  102.840866] [] __scsi_remove_device+0xa0/0xbc



[  151.331854] 3bc0: 081f21ac 803370c0
[  151.336718] [] 

[PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-10 Thread Yijing Wang
This patchset is based on Johannes's patch
"scsi: sas: scsi_queue_work can fail, so make callers aware"

libsas hotplug currently has some issues. Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because the same sas event is already pending, so the
   libsas topology may end up different from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to
   remove devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work item and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts, it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several work items, a phy-up
   sas event may be inserted between the phy-down work items, like
   destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
   which is not what we expect, and issues occur.
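
Issue 1 can be illustrated with a small user-space model. This is only a sketch and not libsas code: the struct names, notify_pending_bit(), notify_pool(), and POOL_SIZE are all hypothetical. It assumes that the old path drops a notification while an event of the same type is still pending, whereas a fixed pool of event slots lets each notification be queued separately until the pool runs out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8 /* hypothetical pool size for this model */

/* Model A: a single "pending" flag per event type. */
struct pending_bit_queue {
    bool pending;   /* an event of this type is already queued */
    size_t queued;  /* notifications that were queued */
    size_t dropped; /* notifications that were lost */
};

static bool notify_pending_bit(struct pending_bit_queue *q)
{
    assert(q != NULL);
    if (q->pending) {
        q->dropped++; /* same event already pending: this one is lost */
        return false;
    }
    q->pending = true;
    q->queued++;
    return true;
}

/* The worker handled the pending event, so the flag is cleared. */
static void handle_pending_bit(struct pending_bit_queue *q)
{
    assert(q != NULL);
    q->pending = false;
}

/* Model B: a static pool of event slots. */
struct pool_queue {
    bool in_use[POOL_SIZE];
    size_t queued;
    size_t dropped;
};

static bool notify_pool(struct pool_queue *q)
{
    assert(q != NULL);
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!q->in_use[i]) {
            q->in_use[i] = true; /* each notification gets its own slot */
            q->queued++;
            return true;
        }
    }
    q->dropped++; /* pool exhausted */
    return false;
}
```

In this model a burst of four notifications queues one event and drops three with the pending flag, but queues all four with the pool. The pool can still overflow, so it reduces event loss rather than ruling it out.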

The first patch fixes the lost sas events, and the second one introduces
wait-complete to fix the hotplug ordering issues.
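
The ordering problem can be sketched with a single-threaded model of a FIFO work queue. Everything here is an assumption made for illustration (enum work, struct fifo, run_queue(), and the wait_complete flag are hypothetical, and the real series uses workqueues and completions rather than this loop). Without waiting, the destruct work queued by a phy-down handler runs after a phy-up event that was already in the queue; with waiting, it finishes before the next event is taken.

```c
#include <assert.h>
#include <stddef.h>

enum work { PHY_DOWN, PHY_UP, DESTRUCT };

#define QUEUE_CAP 16

/* Minimal FIFO: items are appended at tail and consumed from head. */
struct fifo {
    enum work items[QUEUE_CAP];
    size_t head, tail;
};

static int fifo_push(struct fifo *q, enum work w)
{
    assert(q != NULL);
    if (q->tail == QUEUE_CAP)
        return -1;
    q->items[q->tail++] = w;
    return 0;
}

/*
 * Drain the queue and record the order in which work is handled.
 * If wait_complete is nonzero, the destruct step triggered by PHY_DOWN
 * is finished before the next queued item is taken; otherwise it is
 * appended to the tail of the queue. Returns the number of entries
 * written to out.
 */
static size_t run_queue(struct fifo *q, int wait_complete,
                        enum work *out, size_t out_cap)
{
    size_t n = 0;

    assert(q != NULL && out != NULL);
    while (q->head < q->tail && n < out_cap) {
        enum work w = q->items[q->head++];

        out[n++] = w;
        if (w != PHY_DOWN)
            continue;
        if (wait_complete) {
            if (n < out_cap)
                out[n++] = DESTRUCT;
        } else {
            (void)fifo_push(q, DESTRUCT);
        }
    }
    return n;
}
```

With PHY_DOWN followed by PHY_UP in the queue, the model handles PHY_DOWN, PHY_UP, DESTRUCT when it does not wait, and PHY_DOWN, DESTRUCT, PHY_UP when it does.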

v2->v3: some code improvements suggested by Johannes and John,
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Yijing Wang (7):
  libsas: Use static sas event pool to appease sas event lost
  libsas: remove unused port_gone_completion
  libsas: Use new workqueue to run sas event
  libsas: add sas event wait-complete support
  libsas: add a new workqueue to run probe/destruct discovery event
  libsas: add wait-complete support to sync discovery event
  libsas: release disco mutex during waiting in sas_ex_discover_end_dev

 drivers/scsi/libsas/sas_discover.c |  58 +++---
 drivers/scsi/libsas/sas_event.c    | 212 -
 drivers/scsi/libsas/sas_expander.c |  22 +++-
 drivers/scsi/libsas/sas_init.c     |  21 ++--
 drivers/scsi/libsas/sas_internal.h |  64 +++
 drivers/scsi/libsas/sas_phy.c      |  48 +++--
 drivers/scsi/libsas/sas_port.c     |  22 ++--
 include/scsi/libsas.h              |  27 +++--
 8 files changed, 373 insertions(+), 101 deletions(-)

-- 
2.5.0