Re: Hotplug

2018-04-11 Thread Jens Axboe

On 4/11/18 10:14 AM, Jan Kara wrote:
> Hello,
> 
> On Wed 11-04-18 08:11:13, Jens Axboe wrote:
>> On 4/11/18 7:58 AM, Jan Kara wrote:
>>> On Tue 10-04-18 11:17:46, Jens Axboe wrote:
>>>> Been running some tests and I keep running into issues with hotplug.
>>>> This looks similar to what Bart posted the other day, but it looks
>>>> more deeply rooted than just having to protect the queue in
>>>> generic_make_request_checks(). The test run is blktests,
>>>> block/001. Current -git doesn't survive it. I've seen at least two
>>>> different oopses, pasted below.
>>>>
>>>> [  102.163442] NULL pointer dereference at 0010
>>>> [  102.163444] PGD 0 P4D 0 
>>>> [  102.163447] Oops:  [#1] PREEMPT SMP
>>>> [  102.163449] Modules linked in:
>>>> [  102.175540] sr 12:0:0:0: [sr2] scsi-1 drive
>>>> [  102.180112]  scsi_debug crc_t10dif crct10dif_generic crct10dif_common 
>>>> nvme nvme_core sb_edac xl
>>>> [  102.186934] sr 12:0:0:0: Attached scsi CD-ROM sr2
>>>> [  102.191896]  sr_mod cdrom btrfs xor zstd_decompress zstd_compress 
>>>> xxhash lzo_compress zlib_defc
>>>> [  102.197169] sr 12:0:0:0: Attached scsi generic sg7 type 5
>>>> [  102.203475]  igb ahci libahci i2c_algo_bit libata dca [last unloaded: 
>>>> crc_t10dif]
>>>> [  102.203484] CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ 
>>>> #650
>>>> [  102.203487] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 
>>>> 11/09/2016
>>>> [  102.350882] RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
>>>> [  102.358299] RSP: 0018:883ff357bb58 EFLAGS: 00010292
>>>> [  102.364734] RAX: a00b07d0 RBX: 883ff3058000 RCX: 
>>>> 883ff357bb66
>>>> [  102.373220] RDX: 0003 RSI: 7530 RDI: 
>>>> 881fea631000
>>>> [  102.381705] RBP:  R08: 881fe4d38400 R09: 
>>>> 
>>>> [  102.390185] R10:  R11: 01b6 R12: 
>>>> 085d
>>>> [  102.398671] R13: 085d R14: 883ffd9b3790 R15: 
>>>> 
>>>> [  102.407156] FS:  7f7dc8e6d8c0() GS:883fff34() 
>>>> knlGS:
>>>> [  102.417138] CS:  0010 DS:  ES:  CR0: 80050033
>>>> [  102.424066] CR2: 0010 CR3: 003ffda98005 CR4: 
>>>> 003606e0
>>>> [  102.432545] DR0:  DR1:  DR2: 
>>>> 
>>>> [  102.441024] DR3:  DR6: fffe0ff0 DR7: 
>>>> 0400
>>>> [  102.449502] Call Trace:
>>>> [  102.452744]  ? __invalidate_device+0x48/0x60
>>>> [  102.458022]  check_disk_change+0x4c/0x60
>>>> [  102.462900]  sr_block_open+0x16/0xd0 [sr_mod]
>>>> [  102.468270]  __blkdev_get+0xb9/0x450
>>>> [  102.472774]  ? iget5_locked+0x1c0/0x1e0
>>>> [  102.477568]  blkdev_get+0x11e/0x320
>>>> [  102.481969]  ? bdget+0x11d/0x150
>>>> [  102.486083]  ? _raw_spin_unlock+0xa/0x20
>>>> [  102.490968]  ? bd_acquire+0xc0/0xc0
>>>> [  102.495368]  do_dentry_open+0x1b0/0x320
>>>> [  102.500159]  ? inode_permission+0x24/0xc0
>>>> [  102.505140]  path_openat+0x4e6/0x1420
>>>> [  102.509741]  ? cpumask_any_but+0x1f/0x40
>>>> [  102.514630]  ? flush_tlb_mm_range+0xa0/0x120
>>>> [  102.519903]  do_filp_open+0x8c/0xf0
>>>> [  102.524305]  ? __seccomp_filter+0x28/0x230
>>>> [  102.529389]  ? _raw_spin_unlock+0xa/0x20
>>>> [  102.534283]  ? __handle_mm_fault+0x7d6/0x9b0
>>>> [  102.539559]  ? list_lru_add+0xa8/0xc0
>>>> [  102.544157]  ? _raw_spin_unlock+0xa/0x20
>>>> [  102.549047]  ? __alloc_fd+0xaf/0x160
>>>> [  102.553549]  ? do_sys_open+0x1a6/0x230
>>>> [  102.558244]  do_sys_open+0x1a6/0x230
>>>> [  102.562742]  do_syscall_64+0x5a/0x100
>>>> [  102.567336]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>>
>>> Interesting. Thinking out loud: This is cd->device dereference I guess
>>> which means disk->private_data was NULL. That gets set in sr_probe()
>>> together with disk->fops which are certainly set as they must have led us
>>> to the crashing function sr_block_revalidate_disk(). So likely
>>> disk->private_data got already cl

Re: Hotplug

2018-04-11 Thread Jan Kara

Hello,

On Wed 11-04-18 08:11:13, Jens Axboe wrote:
> On 4/11/18 7:58 AM, Jan Kara wrote:
> > On Tue 10-04-18 11:17:46, Jens Axboe wrote:
> >> Been running some tests and I keep running into issues with hotplug.
> >> This looks similar to what Bart posted the other day, but it looks
> >> more deeply rooted than just having to protect the queue in
> >> generic_make_request_checks(). The test run is blktests,
> >> block/001. Current -git doesn't survive it. I've seen at least two
> >> different oopses, pasted below.
> >>
> >> [  102.163442] NULL pointer dereference at 0010
> >> [  102.163444] PGD 0 P4D 0 
> >> [  102.163447] Oops:  [#1] PREEMPT SMP
> >> [  102.163449] Modules linked in:
> >> [  102.175540] sr 12:0:0:0: [sr2] scsi-1 drive
> >> [  102.180112]  scsi_debug crc_t10dif crct10dif_generic crct10dif_common 
> >> nvme nvme_core sb_edac xl
> >> [  102.186934] sr 12:0:0:0: Attached scsi CD-ROM sr2
> >> [  102.191896]  sr_mod cdrom btrfs xor zstd_decompress zstd_compress 
> >> xxhash lzo_compress zlib_defc
> >> [  102.197169] sr 12:0:0:0: Attached scsi generic sg7 type 5
> >> [  102.203475]  igb ahci libahci i2c_algo_bit libata dca [last unloaded: 
> >> crc_t10dif]
> >> [  102.203484] CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ 
> >> #650
> >> [  102.203487] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 
> >> 11/09/2016
> >> [  102.350882] RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
> >> [  102.358299] RSP: 0018:883ff357bb58 EFLAGS: 00010292
> >> [  102.364734] RAX: a00b07d0 RBX: 883ff3058000 RCX: 
> >> 883ff357bb66
> >> [  102.373220] RDX: 0003 RSI: 7530 RDI: 
> >> 881fea631000
> >> [  102.381705] RBP:  R08: 881fe4d38400 R09: 
> >> 
> >> [  102.390185] R10:  R11: 01b6 R12: 
> >> 085d
> >> [  102.398671] R13: 085d R14: 883ffd9b3790 R15: 
> >> 
> >> [  102.407156] FS:  7f7dc8e6d8c0() GS:883fff34() 
> >> knlGS:
> >> [  102.417138] CS:  0010 DS:  ES:  CR0: 80050033
> >> [  102.424066] CR2: 0010 CR3: 003ffda98005 CR4: 
> >> 003606e0
> >> [  102.432545] DR0:  DR1:  DR2: 
> >> 
> >> [  102.441024] DR3:  DR6: fffe0ff0 DR7: 
> >> 0400
> >> [  102.449502] Call Trace:
> >> [  102.452744]  ? __invalidate_device+0x48/0x60
> >> [  102.458022]  check_disk_change+0x4c/0x60
> >> [  102.462900]  sr_block_open+0x16/0xd0 [sr_mod]
> >> [  102.468270]  __blkdev_get+0xb9/0x450
> >> [  102.472774]  ? iget5_locked+0x1c0/0x1e0
> >> [  102.477568]  blkdev_get+0x11e/0x320
> >> [  102.481969]  ? bdget+0x11d/0x150
> >> [  102.486083]  ? _raw_spin_unlock+0xa/0x20
> >> [  102.490968]  ? bd_acquire+0xc0/0xc0
> >> [  102.495368]  do_dentry_open+0x1b0/0x320
> >> [  102.500159]  ? inode_permission+0x24/0xc0
> >> [  102.505140]  path_openat+0x4e6/0x1420
> >> [  102.509741]  ? cpumask_any_but+0x1f/0x40
> >> [  102.514630]  ? flush_tlb_mm_range+0xa0/0x120
> >> [  102.519903]  do_filp_open+0x8c/0xf0
> >> [  102.524305]  ? __seccomp_filter+0x28/0x230
> >> [  102.529389]  ? _raw_spin_unlock+0xa/0x20
> >> [  102.534283]  ? __handle_mm_fault+0x7d6/0x9b0
> >> [  102.539559]  ? list_lru_add+0xa8/0xc0
> >> [  102.544157]  ? _raw_spin_unlock+0xa/0x20
> >> [  102.549047]  ? __alloc_fd+0xaf/0x160
> >> [  102.553549]  ? do_sys_open+0x1a6/0x230
> >> [  102.558244]  do_sys_open+0x1a6/0x230
> >> [  102.562742]  do_syscall_64+0x5a/0x100
> >> [  102.567336]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > 
> > Interesting. Thinking out loud: This is cd->device dereference I guess
> > which means disk->private_data was NULL. That gets set in sr_probe()
> > together with disk->fops which are certainly set as they must have led us
> > to the crashing function sr_block_revalidate_disk(). So likely
> > disk->private_data got already cleared. That happens in sr_kref_release()
> > and the fact that that function got called means struct scsi_cd went away -
> > so sr_remove() must have been called as well. That all seems possible like:
> > 
> > CPU1                          CPU2
> > sr_probe()
> >

Re: Hotplug

2018-04-11 Thread Jens Axboe

On 4/11/18 7:58 AM, Jan Kara wrote:
> Hi,
> 
> On Tue 10-04-18 11:17:46, Jens Axboe wrote:
>> Been running some tests and I keep running into issues with hotplug.
>> This looks similar to what Bart posted the other day, but it looks
>> more deeply rooted than just having to protect the queue in
>> generic_make_request_checks(). The test run is blktests,
>> block/001. Current -git doesn't survive it. I've seen at least two
>> different oopses, pasted below.
>>
>> [  102.163442] NULL pointer dereference at 0010
>> [  102.163444] PGD 0 P4D 0 
>> [  102.163447] Oops:  [#1] PREEMPT SMP
>> [  102.163449] Modules linked in:
>> [  102.175540] sr 12:0:0:0: [sr2] scsi-1 drive
>> [  102.180112]  scsi_debug crc_t10dif crct10dif_generic crct10dif_common 
>> nvme nvme_core sb_edac xl
>> [  102.186934] sr 12:0:0:0: Attached scsi CD-ROM sr2
>> [  102.191896]  sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash 
>> lzo_compress zlib_defc
>> [  102.197169] sr 12:0:0:0: Attached scsi generic sg7 type 5
>> [  102.203475]  igb ahci libahci i2c_algo_bit libata dca [last unloaded: 
>> crc_t10dif]
>> [  102.203484] CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
>> [  102.203487] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 
>> 11/09/2016
>> [  102.350882] RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
>> [  102.358299] RSP: 0018:883ff357bb58 EFLAGS: 00010292
>> [  102.364734] RAX: a00b07d0 RBX: 883ff3058000 RCX: 
>> 883ff357bb66
>> [  102.373220] RDX: 0003 RSI: 7530 RDI: 
>> 881fea631000
>> [  102.381705] RBP:  R08: 881fe4d38400 R09: 
>> 
>> [  102.390185] R10:  R11: 01b6 R12: 
>> 085d
>> [  102.398671] R13: 085d R14: 883ffd9b3790 R15: 
>> 
>> [  102.407156] FS:  7f7dc8e6d8c0() GS:883fff34() 
>> knlGS:
>> [  102.417138] CS:  0010 DS:  ES:  CR0: 80050033
>> [  102.424066] CR2: 0010 CR3: 003ffda98005 CR4: 
>> 003606e0
>> [  102.432545] DR0:  DR1:  DR2: 
>> 
>> [  102.441024] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>> [  102.449502] Call Trace:
>> [  102.452744]  ? __invalidate_device+0x48/0x60
>> [  102.458022]  check_disk_change+0x4c/0x60
>> [  102.462900]  sr_block_open+0x16/0xd0 [sr_mod]
>> [  102.468270]  __blkdev_get+0xb9/0x450
>> [  102.472774]  ? iget5_locked+0x1c0/0x1e0
>> [  102.477568]  blkdev_get+0x11e/0x320
>> [  102.481969]  ? bdget+0x11d/0x150
>> [  102.486083]  ? _raw_spin_unlock+0xa/0x20
>> [  102.490968]  ? bd_acquire+0xc0/0xc0
>> [  102.495368]  do_dentry_open+0x1b0/0x320
>> [  102.500159]  ? inode_permission+0x24/0xc0
>> [  102.505140]  path_openat+0x4e6/0x1420
>> [  102.509741]  ? cpumask_any_but+0x1f/0x40
>> [  102.514630]  ? flush_tlb_mm_range+0xa0/0x120
>> [  102.519903]  do_filp_open+0x8c/0xf0
>> [  102.524305]  ? __seccomp_filter+0x28/0x230
>> [  102.529389]  ? _raw_spin_unlock+0xa/0x20
>> [  102.534283]  ? __handle_mm_fault+0x7d6/0x9b0
>> [  102.539559]  ? list_lru_add+0xa8/0xc0
>> [  102.544157]  ? _raw_spin_unlock+0xa/0x20
>> [  102.549047]  ? __alloc_fd+0xaf/0x160
>> [  102.553549]  ? do_sys_open+0x1a6/0x230
>> [  102.558244]  do_sys_open+0x1a6/0x230
>> [  102.562742]  do_syscall_64+0x5a/0x100
>> [  102.567336]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> Interesting. Thinking out loud: This is cd->device dereference I guess
> which means disk->private_data was NULL. That gets set in sr_probe()
> together with disk->fops which are certainly set as they must have led us
> to the crashing function sr_block_revalidate_disk(). So likely
> disk->private_data got already cleared. That happens in sr_kref_release()
> and the fact that that function got called means struct scsi_cd went away -
> so sr_remove() must have been called as well. That all seems possible like:
> 
> CPU1                          CPU2
> sr_probe()
>                               __blkdev_get()
>                                 disk = bdev_get_gendisk();
> 
> sr_remove()
>   del_gendisk()
>   ...
>   kref_put(&cd->kref, sr_kref_release);
>     disk->private_data = NULL;
>     put_disk(disk);
>     kfree(cd);
>                                 if (disk->fops->open) {
>                                   ret = disk->fops->open(bdev, mode); => sr_block_open
>                                     check_disk_change(bdev);
>

Re: Hotplug

2018-04-11 Thread Jens Axboe

On 4/11/18 8:06 AM, Bart Van Assche wrote:
> On Wed, 2018-04-11 at 15:58 +0200, Jan Kara wrote:
>> I'm not really sure where this is crashing and 'Code' line is incomplete to
>> tell me.
> 
> Hello Jan,
> 
> The following patch should fix this crash:
> https://www.mail-archive.com/linux-block@vger.kernel.org/msg20209.html.

Yeah, I forgot the link in my reply, thanks.

-- 
Jens Axboe

Re: Hotplug

2018-04-11 Thread Bart Van Assche

On Wed, 2018-04-11 at 15:58 +0200, Jan Kara wrote:
> I'm not really sure where this is crashing and 'Code' line is incomplete to
> tell me.

Hello Jan,

The following patch should fix this crash:
https://www.mail-archive.com/linux-block@vger.kernel.org/msg20209.html.

Thanks,

Bart.

Re: Hotplug

2018-04-11 Thread Jan Kara

Hi,

On Tue 10-04-18 11:17:46, Jens Axboe wrote:
> Been running some tests and I keep running into issues with hotplug.
> This looks similar to what Bart posted the other day, but it looks
> more deeply rooted than just having to protect the queue in
> generic_make_request_checks(). The test run is blktests,
> block/001. Current -git doesn't survive it. I've seen at least two
> different oopses, pasted below.
> 
> [  102.163442] NULL pointer dereference at 0010
> [  102.163444] PGD 0 P4D 0 
> [  102.163447] Oops:  [#1] PREEMPT SMP
> [  102.163449] Modules linked in:
> [  102.175540] sr 12:0:0:0: [sr2] scsi-1 drive
> [  102.180112]  scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme 
> nvme_core sb_edac xl
> [  102.186934] sr 12:0:0:0: Attached scsi CD-ROM sr2
> [  102.191896]  sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash 
> lzo_compress zlib_defc
> [  102.197169] sr 12:0:0:0: Attached scsi generic sg7 type 5
> [  102.203475]  igb ahci libahci i2c_algo_bit libata dca [last unloaded: 
> crc_t10dif]
> [  102.203484] CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
> [  102.203487] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 
> 11/09/2016
> [  102.350882] RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
> [  102.358299] RSP: 0018:883ff357bb58 EFLAGS: 00010292
> [  102.364734] RAX: a00b07d0 RBX: 883ff3058000 RCX: 
> 883ff357bb66
> [  102.373220] RDX: 0003 RSI: 7530 RDI: 
> 881fea631000
> [  102.381705] RBP:  R08: 881fe4d38400 R09: 
> 
> [  102.390185] R10:  R11: 01b6 R12: 
> 085d
> [  102.398671] R13: 085d R14: 883ffd9b3790 R15: 
> 
> [  102.407156] FS:  7f7dc8e6d8c0() GS:883fff34() 
> knlGS:
> [  102.417138] CS:  0010 DS:  ES:  CR0: 80050033
> [  102.424066] CR2: 0010 CR3: 003ffda98005 CR4: 
> 003606e0
> [  102.432545] DR0:  DR1:  DR2: 
> 
> [  102.441024] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  102.449502] Call Trace:
> [  102.452744]  ? __invalidate_device+0x48/0x60
> [  102.458022]  check_disk_change+0x4c/0x60
> [  102.462900]  sr_block_open+0x16/0xd0 [sr_mod]
> [  102.468270]  __blkdev_get+0xb9/0x450
> [  102.472774]  ? iget5_locked+0x1c0/0x1e0
> [  102.477568]  blkdev_get+0x11e/0x320
> [  102.481969]  ? bdget+0x11d/0x150
> [  102.486083]  ? _raw_spin_unlock+0xa/0x20
> [  102.490968]  ? bd_acquire+0xc0/0xc0
> [  102.495368]  do_dentry_open+0x1b0/0x320
> [  102.500159]  ? inode_permission+0x24/0xc0
> [  102.505140]  path_openat+0x4e6/0x1420
> [  102.509741]  ? cpumask_any_but+0x1f/0x40
> [  102.514630]  ? flush_tlb_mm_range+0xa0/0x120
> [  102.519903]  do_filp_open+0x8c/0xf0
> [  102.524305]  ? __seccomp_filter+0x28/0x230
> [  102.529389]  ? _raw_spin_unlock+0xa/0x20
> [  102.534283]  ? __handle_mm_fault+0x7d6/0x9b0
> [  102.539559]  ? list_lru_add+0xa8/0xc0
> [  102.544157]  ? _raw_spin_unlock+0xa/0x20
> [  102.549047]  ? __alloc_fd+0xaf/0x160
> [  102.553549]  ? do_sys_open+0x1a6/0x230
> [  102.558244]  do_sys_open+0x1a6/0x230
> [  102.562742]  do_syscall_64+0x5a/0x100
> [  102.567336]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Interesting. Thinking out loud: This is cd->device dereference I guess
which means disk->private_data was NULL. That gets set in sr_probe()
together with disk->fops which are certainly set as they must have led us
to the crashing function sr_block_revalidate_disk(). So likely
disk->private_data got already cleared. That happens in sr_kref_release()
and the fact that that function got called means struct scsi_cd went away -
so sr_remove() must have been called as well. That all seems possible like:

CPU1                            CPU2
sr_probe()
                                __blkdev_get()
                                  disk = bdev_get_gendisk();

sr_remove()
  del_gendisk()
  ...
  kref_put(&cd->kref, sr_kref_release);
    disk->private_data = NULL;
    put_disk(disk);
    kfree(cd);
                                  if (disk->fops->open) {
                                    ret = disk->fops->open(bdev, mode); => sr_block_open
                                      check_disk_change(bdev);
                                        sr_block_revalidate_disk()
                                          CRASH

And I think the problem is in sr_block_revalidate_disk() itself: the
scsi_cd() call does not guarantee that the caller holds a reference to 'cd',
so 'cd' can disappear under it. IMHO it needs to use scsi_cd_get() to get
struct scsi_cd from gendisk. Am I missing something?
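
To make the suggestion concrete, here is a minimal userspace sketch in C of
the "take a reference under a lock before dereferencing" pattern. It is not
the sr driver code: demo_cd, demo_disk, demo_ref_lock, demo_cd_get(),
demo_cd_put() and demo_revalidate() are all hypothetical stand-ins, and a
pthread mutex with a plain counter stands in for whatever locking and kref
handling the driver actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct scsi_cd. */
struct demo_cd {
	int refcount;		/* protected by demo_ref_lock */
	int device_id;
};

/* Hypothetical stand-in for struct gendisk. */
struct demo_disk {
	struct demo_cd *private_data;	/* cleared by the final put */
};

static pthread_mutex_t demo_ref_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference, or return NULL if the object is already torn down. */
static struct demo_cd *demo_cd_get(struct demo_disk *disk)
{
	struct demo_cd *cd;

	pthread_mutex_lock(&demo_ref_lock);
	cd = disk->private_data;
	if (cd)
		cd->refcount++;
	pthread_mutex_unlock(&demo_ref_lock);
	return cd;
}

/* Drop a reference; the last put clears private_data and frees 'cd'. */
static void demo_cd_put(struct demo_disk *disk, struct demo_cd *cd)
{
	pthread_mutex_lock(&demo_ref_lock);
	assert(cd->refcount > 0);
	if (--cd->refcount == 0) {
		disk->private_data = NULL;
		free(cd);
	}
	pthread_mutex_unlock(&demo_ref_lock);
}

/*
 * Revalidate that tolerates a concurrent remove: it returns -1 instead
 * of dereferencing a NULL or freed 'cd'.
 */
static int demo_revalidate(struct demo_disk *disk)
{
	struct demo_cd *cd = demo_cd_get(disk);
	int id;

	if (!cd)
		return -1;
	id = cd->device_id;
	demo_cd_put(disk, cd);
	return id;
}
```

In this sketch the reader holds its own reference for the duration of
demo_revalidate(), so a remove path that drops the probe-time reference
cannot free 'cd' in the middle of the call; once the object is gone the
reader sees NULL and backs out.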

> and this one, similar to Bart's:
> 
> [ 4676.738069] NULL pointer dereference at

Hotplug

2018-04-10 Thread Jens Axboe

Hi,

Been running some tests and I keep running into issues with hotplug.
This looks similar to what Bart posted the other day, but it looks
more deeply rooted than just having to protect the queue in
generic_make_request_checks(). The test run is blktests,
block/001. Current -git doesn't survive it. I've seen at least two
different oopses, pasted below.

[  102.163442] NULL pointer dereference at 0010
[  102.163444] PGD 0 P4D 0 
[  102.163447] Oops:  [#1] PREEMPT SMP
[  102.163449] Modules linked in:
[  102.175540] sr 12:0:0:0: [sr2] scsi-1 drive
[  102.180112]  scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme 
nvme_core sb_edac xl
[  102.186934] sr 12:0:0:0: Attached scsi CD-ROM sr2
[  102.191896]  sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash 
lzo_compress zlib_defc
[  102.197169] sr 12:0:0:0: Attached scsi generic sg7 type 5
[  102.203475]  igb ahci libahci i2c_algo_bit libata dca [last unloaded: 
crc_t10dif]
[  102.203484] CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
[  102.203487] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 
11/09/2016
[  102.350882] RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
[  102.358299] RSP: 0018:883ff357bb58 EFLAGS: 00010292
[  102.364734] RAX: a00b07d0 RBX: 883ff3058000 RCX: 883ff357bb66
[  102.373220] RDX: 0003 RSI: 7530 RDI: 881fea631000
[  102.381705] RBP:  R08: 881fe4d38400 R09: 
[  102.390185] R10:  R11: 01b6 R12: 085d
[  102.398671] R13: 085d R14: 883ffd9b3790 R15: 
[  102.407156] FS:  7f7dc8e6d8c0() GS:883fff34() 
knlGS:
[  102.417138] CS:  0010 DS:  ES:  CR0: 80050033
[  102.424066] CR2: 0010 CR3: 003ffda98005 CR4: 003606e0
[  102.432545] DR0:  DR1:  DR2: 
[  102.441024] DR3:  DR6: fffe0ff0 DR7: 0400
[  102.449502] Call Trace:
[  102.452744]  ? __invalidate_device+0x48/0x60
[  102.458022]  check_disk_change+0x4c/0x60
[  102.462900]  sr_block_open+0x16/0xd0 [sr_mod]
[  102.468270]  __blkdev_get+0xb9/0x450
[  102.472774]  ? iget5_locked+0x1c0/0x1e0
[  102.477568]  blkdev_get+0x11e/0x320
[  102.481969]  ? bdget+0x11d/0x150
[  102.486083]  ? _raw_spin_unlock+0xa/0x20
[  102.490968]  ? bd_acquire+0xc0/0xc0
[  102.495368]  do_dentry_open+0x1b0/0x320
[  102.500159]  ? inode_permission+0x24/0xc0
[  102.505140]  path_openat+0x4e6/0x1420
[  102.509741]  ? cpumask_any_but+0x1f/0x40
[  102.514630]  ? flush_tlb_mm_range+0xa0/0x120
[  102.519903]  do_filp_open+0x8c/0xf0
[  102.524305]  ? __seccomp_filter+0x28/0x230
[  102.529389]  ? _raw_spin_unlock+0xa/0x20
[  102.534283]  ? __handle_mm_fault+0x7d6/0x9b0
[  102.539559]  ? list_lru_add+0xa8/0xc0
[  102.544157]  ? _raw_spin_unlock+0xa/0x20
[  102.549047]  ? __alloc_fd+0xaf/0x160
[  102.553549]  ? do_sys_open+0x1a6/0x230
[  102.558244]  do_sys_open+0x1a6/0x230
[  102.562742]  do_syscall_64+0x5a/0x100
[  102.567336]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

and this one, similar to Bart's:

[ 4676.738069] NULL pointer dereference at 0154
[ 4676.738071] PGD 0 P4D 0 
[ 4676.738073] Oops:  [#1] PREEMPT SMP
[ 4676.738075] Modules linked in: scsi_debug crc_t10dif nvme nvme_core loop 
configfs crct10dif_ge]
[ 4676.859272] CPU: 10 PID: 16598 Comm: systemd-udevd Not tainted 4.16.0+ #651
[ 4676.867525] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 
11/09/2016
[ 4676.876765] RIP: 0010:blk_throtl_bio+0x45/0x9b0
[ 4676.882296] RSP: 0018:881ff0c8bb38 EFLAGS: 00010246
[ 4676.888610] RAX:  RBX: 881ffa273a40 RCX: 
[ 4676.897059] RDX: 881ffa273a40 RSI:  RDI: 
[ 4676.905507] RBP:  R08: 881ffa273ac0 R09: 881ff123f458
[ 4676.913955] R10: 881ff0c8bc70 R11: 1000 R12: 
[ 4676.922402] R13: 82600980 R14: 88208113 R15: 
[ 4676.930856] FS:  7fe63e5228c0() GS:881fff74() 
knlGS:
[ 4676.940773] CS:  0010 DS:  ES:  CR0: 80050033
[ 4676.947667] CR2: 0154 CR3: 001fed2a0003 CR4: 003606e0
[ 4676.956116] DR0:  DR1:  DR2: 
[ 4676.964568] DR3:  DR6: fffe0ff0 DR7: 0400
[ 4676.973021] Call Trace:
[ 4676.976229]  generic_make_request_checks+0x640/0x660
[ 4676.982245]  ? kmem_cache_alloc+0x37/0x190
[ 4676.987295]  generic_make_request+0x29/0x2f0
[ 4676.992534]  ? submit_bio+0x5c/0x110
[ 4676.996998]  ? bio_alloc_bioset+0x99/0x1c0
[ 4677.002050]  submit_bio+0x5c/0x110
[ 4677.006317]  ? guard_bio_eod+0x42/0x120
[ 4677.011074]  submit_bh_wbc.isra.49+0x132/0x160
[ 4677.016517]  ? bh_uptodate_or_lock+0x90/0x90
[ 4677.021757

Re: [PATCH v5 0/7] Enhance libsas hotplug feature

2018-01-03 Thread Martin K. Petersen


John,

> At this point we feel that we have a decent solution to the
> long-standing libsas hotplug issues.
>
> Hannes has kindly reviewed the series.
>
> Can you let us know what else you require for acceptance? More
> independent review or testing?

According to my notes, Hannes had some concerns wrt. one of the
patches. That's why I didn't merge the series.

Please address the comments and we'll take it from there.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v5 0/7] Enhance libsas hotplug feature

2018-01-02 Thread John Garry


On 08/12/2017 09:42, Jason Yan wrote:

The libsas hotplug currently has some issues; Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html



Hi Martin, James,

At this point we feel that we have a decent solution to the 
long-standing libsas hotplug issues.


Hannes has kindly reviewed the series.

Can you let us know what else you require for acceptance? More 
independent review or testing?


Thanks,
John


The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, so the
   libsas topology may end up differing from the hardware.
2. On receiving a phy down sas event, libsas calls sas_deform_port to remove
   devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the workqueue.
   Once the sas port is deleted, its child devices are deleted too, so
   when the destruction work starts it finds that the target device has
   already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy up
   sas event can be inserted between the phy-down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot remove flow is then broken by the PORTE_BYTES_DMAED event, which
   is not what we expect, and issues occur.
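
Issue 1 can be illustrated with a small userspace sketch in C. It is only a
model, not libsas code: demo_static_queue, demo_dynamic_queue and the
demo_* functions are hypothetical names. The static scheme keeps one
pending flag per event type and drops a repeated event, while the dynamic
scheme allocates a node per notification so that every event is kept in
arrival order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { DEMO_PHY_UP, DEMO_PHY_DOWN, DEMO_NUM_EVENTS };

/* Static scheme: one pending flag per event type. */
struct demo_static_queue {
	bool pending[DEMO_NUM_EVENTS];
	int queued;
};

/* Returns false when the event is dropped because it is already pending. */
static bool demo_static_notify(struct demo_static_queue *q, int event)
{
	if (q->pending[event])
		return false;
	q->pending[event] = true;
	q->queued++;
	return true;
}

/* Dynamic scheme: one allocated node per notification, kept in a FIFO. */
struct demo_event {
	int event;
	struct demo_event *next;
};

struct demo_dynamic_queue {
	struct demo_event *head;
	struct demo_event *tail;
	int queued;
};

/* Returns false only if the allocation fails. */
static bool demo_dynamic_notify(struct demo_dynamic_queue *q, int event)
{
	struct demo_event *ev = malloc(sizeof(*ev));

	if (!ev)
		return false;
	ev->event = event;
	ev->next = NULL;
	if (q->tail)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
	q->queued++;
	return true;
}

/* Pops the oldest event, or returns -1 when the queue is empty. */
static int demo_dynamic_pop(struct demo_dynamic_queue *q)
{
	struct demo_event *ev = q->head;
	int event;

	if (!ev)
		return -1;
	assert(q->queued > 0);
	event = ev->event;
	q->head = ev->next;
	if (!q->head)
		q->tail = NULL;
	q->queued--;
	free(ev);
	return event;
}
```

With the sequence phy up, phy down, phy up, the static model keeps only two
events and loses the second phy up, whereas the dynamic model keeps all
three in order. An unbounded dynamic queue is presumably why the series
also adds an event threshold.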

v4->v5: - process only one expander's revalidation in sas_ex_revalidate_domain()
        - notify event PORTE_BROADCAST_RCVD in sas_enable_revalidation()
v3->v4: - use dynamically allocated work and support shutting down the phy if
          active events reach the threshold
        - use flush_workqueue instead of wait-completion to process discover
          events synchronously
        - call the probe and destruct functions directly
v2->v3: some code improvements suggested by Johannes and John;
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Jason Yan (7):
  scsi: libsas: Use dynamic alloced work to avoid sas event lost
  scsi: libsas: shut down the PHY if events reached the threshold
  scsi: libsas: make the event threshold configurable
  scsi: libsas: Use new workqueue to run sas event and disco event
  scsi: libsas: use flush_workqueue to process disco events
    synchronously
  scsi: libsas: direct call probe and destruct
  scsi: libsas: notify event PORTE_BROADCAST_RCVD in
    sas_enable_revalidation()

 drivers/scsi/hisi_sas/hisi_sas_main.c |   6 ++
 drivers/scsi/libsas/sas_ata.c |   1 -
 drivers/scsi/libsas/sas_discover.c|  34 ++-
 drivers/scsi/libsas/sas_event.c   |  86 ---
 drivers/scsi/libsas/sas_expander.c|   8 +--
 drivers/scsi/libsas/sas_init.c| 107 +-
 drivers/scsi/libsas/sas_internal.h|   7 +++
 drivers/scsi/libsas/sas_phy.c |  69 +++---
 drivers/scsi/libsas/sas_port.c|  25 
 include/scsi/libsas.h |  30 +++---
 include/scsi/scsi_transport_sas.h |   1 +
 11 files changed, 277 insertions(+), 97 deletions(-)

[PATCH v2 24/30] scsi: aacraid: Use hotplug handling function in place of scsi_scan_host

2017-12-26 Thread Raghava Aditya Renukunta

The driver uses scsi_scan_host to add new devices in the driver init path,
which adds all the devices exposed by the FW. The driver then resorts to
queue command checks to block out commands to _hidden_ devices.

Use the hotplug handler code to add new devices during driver init and
in other areas. This applies only to safw; for ARC, scsi_scan_host still
applies.

Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>

---
Changes in V2:
None

 drivers/scsi/aacraid/aachba.c  |  4 
 drivers/scsi/aacraid/aacraid.h |  1 +
 drivers/scsi/aacraid/commsup.c | 18 +++---
 drivers/scsi/aacraid/linit.c   |  5 +++--
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 4ad9d3f..426c61a 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -2150,10 +2150,6 @@ int aac_get_adapter_info(struct aac_dev* dev)
dev->maximum_num_channels = le32_to_cpu(bus_info->BusCount);
}
 
-   if (!dev->sync_mode && dev->sa_firmware &&
-   dev->supplement_adapter_info.virt_device_bus != 0x)
-   rcode = aac_setup_safw_adapter(dev, AAC_INIT);
-
if (!dev->in_reset) {
char buffer[16];
tmp = le32_to_cpu(dev->adapter_info.kernelrev);
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index c70c998..ba84d99 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -2719,6 +2719,7 @@ static inline int aac_supports_2T(struct aac_dev *dev)
return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
 }
 
+int aac_scan_host(struct aac_dev *dev, int rescan);
 char * get_container_type(unsigned type);
 extern int numacb;
 extern char aac_driver_version[];
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 491e633..4e2687c 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1964,6 +1964,19 @@ static int aac_update_safw_host_devices(struct aac_dev *dev, int rescan)
return rcode;
 }
 
+int aac_scan_host(struct aac_dev *dev, int rescan)
+{
+   int rcode = 0;
+
+   mutex_lock(&dev->scan_mutex);
+   if (dev->sa_firmware)
+       rcode = aac_update_safw_host_devices(dev, rescan);
+   else
+       scsi_scan_host(dev->scsi_host_ptr);
+   mutex_unlock(&dev->scan_mutex);
+   return rcode;
+}
+
 /**
  * aac_handle_sa_aif   Handle a message from the firmware
  * @dev: Which adapter this fib is from
@@ -1997,9 +2010,8 @@ static void aac_handle_sa_aif(struct aac_dev *dev, struct fib *fibptr)
case SA_AIF_LDEV_CHANGE:
case SA_AIF_BPCFG_CHANGE:
 
-   mutex_lock(&dev->scan_mutex);
-   aac_update_safw_host_devices(dev, AAC_RESCAN);
-   mutex_unlock(&dev->scan_mutex);
+   aac_scan_host(dev, AAC_RESCAN);
+
break;
 
case SA_AIF_BPSTAT_CHANGE:
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 2c862cd..7ea7b2c 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1787,7 +1787,8 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
error = scsi_add_host(shost, &pdev->dev);
if (error)
goto out_deinit;
-   scsi_scan_host(shost);
+
+   aac_scan_host(aac, AAC_INIT);
 
pci_enable_pcie_error_reporting(pdev);
pci_save_state(pdev);
@@ -2071,7 +2072,7 @@ static void aac_pci_resume(struct pci_dev *pdev)
if (sdev->sdev_state == SDEV_OFFLINE)
sdev->sdev_state = SDEV_RUNNING;
scsi_unblock_requests(aac->scsi_host_ptr);
-   scsi_scan_host(aac->scsi_host_ptr);
+   aac_scan_host(aac, AAC_RESCAN);
pci_save_state(pdev);
 
dev_err(&pdev->dev, "aacraid: PCI error - resume\n");
-- 
2.9.4

[PATCH v2 23/30] scsi: aacraid: Block concurrent hotplug event handling

2017-12-26 Thread Raghava Aditya Renukunta

Currently the driver attempts to process hotplug events concurrently, based
on the FW interrupt.

Protect the safw update function with a scan mutex.

Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>

---
Changes in V2:
None

 drivers/scsi/aacraid/aacraid.h | 1 +
 drivers/scsi/aacraid/commsup.c | 2 ++
 drivers/scsi/aacraid/linit.c   | 1 +
 3 files changed, 4 insertions(+)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index a8fe1e1..c70c998 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1565,6 +1565,7 @@ struct aac_dev
spinlock_t  fib_lock;
 
struct mutex ioctl_mutex;
+   struct mutex scan_mutex;
struct aac_queue_block *queues;
/*
 *  The user API will use an IOCTL to register itself to receive
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 34155b1..491e633 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1997,7 +1997,9 @@ static void aac_handle_sa_aif(struct aac_dev *dev, struct fib *fibptr)
case SA_AIF_LDEV_CHANGE:
case SA_AIF_BPCFG_CHANGE:
 
+   mutex_lock(&dev->scan_mutex);
aac_update_safw_host_devices(dev, AAC_RESCAN);
+   mutex_unlock(&dev->scan_mutex);
break;
 
case SA_AIF_BPSTAT_CHANGE:
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index b2273e3..2c862cd 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1683,6 +1683,7 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
spin_lock_init(&aac->fib_lock);
 
mutex_init(&aac->ioctl_mutex);
+   mutex_init(&aac->scan_mutex);
/*
 *  Map in the registers from the adapter.
 */
-- 
2.9.4

[PATCH 23/29] scsi: aacraid: Block concurrent hotplug event handling

2017-12-21 Thread Raghava Aditya Renukunta

Currently the driver attempts to process hotplug events concurrently, based
on the FW interrupt.

Protect the safw update function with a scan mutex.

Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>
---
 drivers/scsi/aacraid/aacraid.h | 1 +
 drivers/scsi/aacraid/commsup.c | 2 ++
 drivers/scsi/aacraid/linit.c   | 1 +
 3 files changed, 4 insertions(+)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index a8fe1e1..c70c998 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1565,6 +1565,7 @@ struct aac_dev
spinlock_t  fib_lock;
 
struct mutex ioctl_mutex;
+   struct mutex scan_mutex;
struct aac_queue_block *queues;
/*
 *  The user API will use an IOCTL to register itself to receive
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index f781076..698c049 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -2001,7 +2001,9 @@ static void aac_handle_sa_aif(struct aac_dev *dev, struct fib *fibptr)
case SA_AIF_LDEV_CHANGE:
case SA_AIF_BPCFG_CHANGE:
 
+   mutex_lock(&dev->scan_mutex);
aac_update_safw_host_devices(dev, AAC_RESCAN);
+   mutex_unlock(&dev->scan_mutex);
break;
 
case SA_AIF_BPSTAT_CHANGE:
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index b2273e3..2c862cd 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1683,6 +1683,7 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
spin_lock_init(&aac->fib_lock);
 
mutex_init(&aac->ioctl_mutex);
+   mutex_init(&aac->scan_mutex);
/*
 *  Map in the registers from the adapter.
 */
-- 
2.9.4
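
The locking pattern in the patch above can be modelled outside the kernel. What follows is a minimal userspace sketch, assuming POSIX threads in place of the kernel mutex API. `struct fake_dev`, `fake_update_host_devices()` and `run_concurrent_events()` are hypothetical stand-ins for illustration and are not aacraid code.

```c
#include <assert.h>
#include <pthread.h>

#define FAKE_MAX_THREADS 8
#define FAKE_EVENTS_PER_THREAD 1000

/* Hypothetical stand-in for struct aac_dev; only the scan lock matters. */
struct fake_dev {
    pthread_mutex_t scan_mutex;
    int scans_in_flight;    /* rescans running at this instant */
    int max_in_flight;      /* highest value scans_in_flight reached */
    int scans_done;         /* total completed rescans */
};

static void fake_dev_init(struct fake_dev *dev)
{
    pthread_mutex_init(&dev->scan_mutex, NULL);
    dev->scans_in_flight = 0;
    dev->max_in_flight = 0;
    dev->scans_done = 0;
}

/* Stand-in for the device list update; not safe to run concurrently. */
static void fake_update_host_devices(struct fake_dev *dev)
{
    dev->scans_in_flight++;
    if (dev->scans_in_flight > dev->max_in_flight)
        dev->max_in_flight = dev->scans_in_flight;
    dev->scans_done++;
    dev->scans_in_flight--;
}

/* Models one event source: every event takes the lock around the update. */
static void *fake_event_handler(void *arg)
{
    struct fake_dev *dev = arg;
    int i;

    for (i = 0; i < FAKE_EVENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&dev->scan_mutex);
        fake_update_host_devices(dev);
        pthread_mutex_unlock(&dev->scan_mutex);
    }
    return NULL;
}

/* Runs nthreads event sources in parallel; returns 0 on success. */
static int run_concurrent_events(struct fake_dev *dev, int nthreads)
{
    pthread_t tids[FAKE_MAX_THREADS];
    int started = 0, rc = 0, i;

    if (nthreads < 1 || nthreads > FAKE_MAX_THREADS)
        return -1;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, fake_event_handler, dev) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```

With the lock held around every update, `max_in_flight` stays at 1 however many handlers run, which is the property the scan mutex is meant to give the safw update function.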

[PATCH 24/29] scsi: aacraid: Use hotplug handling function in place of scsi_scan_host

2017-12-21 Thread Raghava Aditya Renukunta

The driver uses scsi_scan_host to add new devices in the driver init path,
which adds all the FW-exposed devices. The driver then resorts to queue
command checks to block commands to _hidden_ devices.

Use the hotplug handler code to add new devices during driver init and
in other areas. This applies only to safw; for ARC, scsi_scan_host still
applies.

Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>
---
 drivers/scsi/aacraid/aachba.c  |  4 
 drivers/scsi/aacraid/aacraid.h |  1 +
 drivers/scsi/aacraid/commsup.c | 18 +++---
 drivers/scsi/aacraid/linit.c   |  5 +++--
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 74f1dd2..eef7322 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -2161,10 +2161,6 @@ int aac_get_adapter_info(struct aac_dev* dev)
dev->maximum_num_channels = le32_to_cpu(bus_info->BusCount);
}
 
-   if (!dev->sync_mode && dev->sa_firmware &&
-   dev->supplement_adapter_info.virt_device_bus != 0x)
-   rcode = aac_setup_safw_adapter(dev, AAC_INIT);
-
if (!dev->in_reset) {
char buffer[16];
tmp = le32_to_cpu(dev->adapter_info.kernelrev);
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index c70c998..ba84d99 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -2719,6 +2719,7 @@ static inline int aac_supports_2T(struct aac_dev *dev)
return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
 }
 
+int aac_scan_host(struct aac_dev *dev, int rescan);
 char * get_container_type(unsigned type);
 extern int numacb;
 extern char aac_driver_version[];
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 698c049..46ee7ba 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1968,6 +1968,19 @@ static int aac_update_safw_host_devices(struct aac_dev *dev, int rescan)
return rcode;
 }
 
+int aac_scan_host(struct aac_dev *dev, int rescan)
+{
+   int rcode = 0;
+
+   mutex_lock(&dev->scan_mutex);
+   if (dev->sa_firmware)
+       rcode = aac_update_safw_host_devices(dev, rescan);
+   else
+       scsi_scan_host(dev->scsi_host_ptr);
+   mutex_unlock(&dev->scan_mutex);
+   return rcode;
+}
+
 /**
  * aac_handle_sa_aif   Handle a message from the firmware
  * @dev: Which adapter this fib is from
@@ -2001,9 +2014,8 @@ static void aac_handle_sa_aif(struct aac_dev *dev, struct fib *fibptr)
case SA_AIF_LDEV_CHANGE:
case SA_AIF_BPCFG_CHANGE:
 
-   mutex_lock(&dev->scan_mutex);
-   aac_update_safw_host_devices(dev, AAC_RESCAN);
-   mutex_unlock(&dev->scan_mutex);
+   aac_scan_host(dev, AAC_RESCAN);
+
break;
 
case SA_AIF_BPSTAT_CHANGE:
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 2c862cd..7ea7b2c 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1787,7 +1787,8 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
error = scsi_add_host(shost, &pdev->dev);
if (error)
goto out_deinit;
-   scsi_scan_host(shost);
+
+   aac_scan_host(aac, AAC_INIT);
 
pci_enable_pcie_error_reporting(pdev);
pci_save_state(pdev);
@@ -2071,7 +2072,7 @@ static void aac_pci_resume(struct pci_dev *pdev)
if (sdev->sdev_state == SDEV_OFFLINE)
sdev->sdev_state = SDEV_RUNNING;
scsi_unblock_requests(aac->scsi_host_ptr);
-   scsi_scan_host(aac->scsi_host_ptr);
+   aac_scan_host(aac, AAC_RESCAN);
pci_save_state(pdev);
 
dev_err(&pdev->dev, "aacraid: PCI error - resume\n");
-- 
2.9.4
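
The dispatch done by the new helper can be sketched in isolation. This is a userspace model under stated assumptions: `struct fake_adapter`, the `fake_*` functions and the `FAKE_INIT`/`FAKE_RESCAN` values are hypothetical, and the lock the patch takes around the branch is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode values; the real AAC_INIT/AAC_RESCAN live in the driver. */
enum fake_scan_mode { FAKE_INIT, FAKE_RESCAN };

/* Hypothetical stand-in for struct aac_dev, with counters for inspection. */
struct fake_adapter {
    bool sa_firmware;               /* selects the safw path */
    int safw_result;                /* what the safw update will return */
    int safw_calls;
    int generic_calls;
    enum fake_scan_mode last_mode;
};

/* Stand-in for the safw hotplug handler, which reports a status code. */
static int fake_update_safw(struct fake_adapter *a, enum fake_scan_mode mode)
{
    a->safw_calls++;
    a->last_mode = mode;
    return a->safw_result;
}

/* Stand-in for the generic host scan, which returns nothing. */
static void fake_generic_scan(struct fake_adapter *a)
{
    a->generic_calls++;
}

/* One entry point for init, rescan and resume, as in the patch. */
static int fake_scan_host(struct fake_adapter *a, enum fake_scan_mode mode)
{
    int rcode = 0;

    /* The patch holds scan_mutex across this whole branch. */
    if (a->sa_firmware)
        rcode = fake_update_safw(a, mode);
    else
        fake_generic_scan(a);
    return rcode;
}
```

One consequence visible in the sketch: on the non-safw path the return value is always 0, because the generic scan has no status to report.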

[PATCH v5 0/7] Enhance libsas hotplug feature

2017-12-08 Thread Jason Yan

libsas hotplug currently has some issues. Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, and the
   libsas topology may end up differing from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to
   remove devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy-up sas
   event may be inserted between the phy-down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
   which is not what we expect, and issues occur.

v4->v5: -process only one expander's revalidation in sas_ex_revalidate_domain()
        -notify event PORTE_BROADCAST_RCVD in sas_enable_revalidation()
v3->v4: -use dynamically allocated work and support shutting down the phy
         if the active events reach the threshold
        -use flush_workqueue instead of wait-completion to process
         discover events synchronously
        -call the probe and destruct functions directly
v2->v3: some code improvements suggested by Johannes and John,
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Jason Yan (7):
  scsi: libsas: Use dynamic alloced work to avoid sas event lost
  scsi: libsas: shut down the PHY if events reached the threshold
  scsi: libsas: make the event threshold configurable
  scsi: libsas: Use new workqueue to run sas event and disco event
  scsi: libsas: use flush_workqueue to process disco events
synchronously
  scsi: libsas: direct call probe and destruct
  scsi: libsas: notify event PORTE_BROADCAST_RCVD in
sas_enable_revalidation()

 drivers/scsi/hisi_sas/hisi_sas_main.c |   6 ++
 drivers/scsi/libsas/sas_ata.c |   1 -
 drivers/scsi/libsas/sas_discover.c|  34 ++-
 drivers/scsi/libsas/sas_event.c   |  86 ---
 drivers/scsi/libsas/sas_expander.c|   8 +--
 drivers/scsi/libsas/sas_init.c| 107 +-
 drivers/scsi/libsas/sas_internal.h|   7 +++
 drivers/scsi/libsas/sas_phy.c |  69 +++---
 drivers/scsi/libsas/sas_port.c|  25 
 include/scsi/libsas.h |  30 +++---
 include/scsi/scsi_transport_sas.h |   1 +
 11 files changed, 277 insertions(+), 97 deletions(-)

-- 
2.9.5
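
Issue 1 in the cover letter above comes down to how a pre-allocated work item behaves: the kernel workqueue will not queue a work item that is already pending, so a second identical event arriving in the meantime is dropped. Below is a minimal userspace sketch of that difference, assuming a plain flag and a linked list in place of the workqueue. All names (`static_work`, `dyn_queue`, `queue_static`, `queue_dynamic`, `drain_dynamic`) are hypothetical and are not libsas code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One pre-allocated work item per event type, like a static work_struct. */
struct static_work {
    bool pending;
};

/* Returns false when the item is already queued: the new event is lost. */
static bool queue_static(struct static_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Dynamically allocated work: every event gets its own node. */
struct dyn_work {
    int event;
    struct dyn_work *next;
};

struct dyn_queue {
    struct dyn_work *head;
    struct dyn_work *tail;
    int len;
};

/* Appends at the tail so events are handled in arrival order. */
static bool queue_dynamic(struct dyn_queue *q, int event)
{
    struct dyn_work *w = malloc(sizeof(*w));

    if (!w)
        return false;
    w->event = event;
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
    q->len++;
    return true;
}

/* Handles and frees every queued event; returns how many were handled. */
static int drain_dynamic(struct dyn_queue *q)
{
    int handled = 0;

    while (q->head) {
        struct dyn_work *w = q->head;

        q->head = w->next;
        free(w);
        handled++;
    }
    q->tail = NULL;
    q->len = 0;
    return handled;
}
```

A per-event allocation can grow without bound under an event storm, which is presumably why the series also shuts down the PHY once the number of events reaches a threshold.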

[RESEND PATCH v4 0/6] Enhance libsas hotplug feature

2017-09-19 Thread Jason Yan

Thanks to Martin K. Petersen for applying some of the tidy-up patches, so
I do not have to maintain them out of tree. I will only send the rest
of them in the next few days if needed.

libsas hotplug currently has some issues. Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, and the
   libsas topology may end up differing from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to
   remove devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy-up sas
   event may be inserted between the phy-down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
   which is not what we expect, and issues occur.

v3->v4: -use dynamically allocated work and support shutting down the phy
         if the active events reach the threshold
        -use flush_workqueue instead of wait-completion to process
         discover events synchronously
        -call the probe and destruct functions directly
v2->v3: some code improvements suggested by Johannes and John,
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Jason Yan (6):
  libsas: Use dynamic alloced work to avoid sas event lost
  libsas: shut down the PHY if events reached the threshold
  libsas: make the event threshold configurable
  libsas: Use new workqueue to run sas event and disco event
  libsas: libsas: use flush_workqueue to process disco events
synchronously
  libsas: direct call probe and destruct

 drivers/scsi/hisi_sas/hisi_sas_main.c |  6 +++
 drivers/scsi/libsas/sas_ata.c |  1 -
 drivers/scsi/libsas/sas_discover.c| 36 --
 drivers/scsi/libsas/sas_event.c   | 79 ++
 drivers/scsi/libsas/sas_expander.c|  2 +-
 drivers/scsi/libsas/sas_init.c| 91 +--
 drivers/scsi/libsas/sas_internal.h|  7 +++
 drivers/scsi/libsas/sas_phy.c | 73 ++--
 drivers/scsi/libsas/sas_port.c| 25 ++
 include/scsi/libsas.h | 29 ---
 include/scsi/scsi_transport_sas.h |  1 +
 11 files changed, 258 insertions(+), 92 deletions(-)

-- 
2.5.0

Re: [PATCH v4 00/11] Enhance libsas hotplug feature

2017-09-18 Thread John Garry


On 06/09/2017 10:15, Jason Yan wrote:

Hello all, Yijing Wang handed over this topic to me. We have been
working on it for the last two months and have tested the patchset for
a long time. Here is the new version.

libsas hotplug currently has some issues. Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, and the
   libsas topology may end up differing from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to
   remove devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy-up sas
   event may be inserted between the phy-down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
   which is not what we expect, and issues occur.



I have tried to verify PM suspend/resume feature with this patchset to 
ensure that it is not broken.


Since our hisi_sas driver does not support PM yet, I have got my hands 
on an adaptec card which uses pm8001 driver (it's vendor 9065, device 
8088) to test.


So in the PM resume, sometimes I find the console locks up with and 
without this patchset. In these cases, I find this log:

[   59.344266] pm80xx pm80xx_chip_init 1098:Firmware is not ready!

Is this a known issue? Should we have a chip reset in all cases in the 
resume code, marked ***:

static int pm8001_pci_resume(struct pci_dev *pdev)
{
	...

	rc = pci_go_44(pdev);
	if (rc)
		goto err_out_disable;

	/* chip soft rst only for spc */
	if (pm8001_ha->chip_id == chip_8001) {	/* *** HERE */
		PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
		PM8001_INIT_DBG(pm8001_ha,
			pm8001_printk("chip soft reset successful\n"));
	}
	rc = PM8001_CHIP_DISP->chip_init(pm8001_ha);
	if (rc)
		goto err_out_disable;

	...
}

John


v3->v4: -get rid of unused ha event and do some cleanup
        -use dynamically allocated work and support shutting down the phy
         if the active events reach the threshold
        -use flush_workqueue instead of wait-completion to process
         discover events synchronously
        -call the probe and destruct functions directly
        -other small code improvements
v2->v3: some code improvements suggested by Johannes and John,
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Jason Yan (10):
  libsas: kill useless ha_event and do some cleanup
  libsas: remove the numbering for each event enum
  libsas: remove unused port_gone_completion and DISCE_PORT_GONE
  libsas: rename notify_port_event() for consistency
  libsas: Use dynamic alloced work to avoid sas event lost
  libsas: shut down the PHY if events reached the threshold
  libsas: make the event threshold configurable
  libsas: Use new workqueue to run sas event and disco event
  libsas: libsas: use flush_workqueue to process disco events
synchronously
  libsas: direct call probe and destruct

chenxiang (1):
  libsas: add event to defer list tail instead of head when draining

 drivers/scsi/aic94xx/aic94xx_hwi.c|   3 -
 drivers/scsi/hisi_sas/hisi_sas_main.c |   7 ++-
 drivers/scsi/libsas/sas_ata.c |   1 -
 drivers/scsi/libsas/sas_discover.c|  36 +++-
 drivers/scsi/libsas/sas_dump.c|  10 
 drivers/scsi/libsas/sas_dump.h|   1 -
 drivers/scsi/libsas/sas_event.c   |  97 +++-
 drivers/scsi/libsas/sas_expander.c|   2 +-
 drivers/scsi/libsas/sas_init.c| 101 +-
 drivers/scsi/libsas/sas_internal.h|   7 +++
 drivers/scsi/libsas/sas_phy.c |  73 
 drivers/scsi/libsas/sas_port.c|  25 +
 include/scsi/libsas.h |  81 ---
 include/scsi/scsi_transport_sas.h |   1 +
 14 files changed, 270 insertions(+), 175 deletions(-)

Re: [PATCH v4 00/11] Enhance libsas hotplug feature

2017-09-15 Thread Martin K. Petersen


Jason,

> Yijing Wang handed over this topic to me. We have been working on it
> for the last two months and have tested the patchset for a long time.
> Here is the new version.

Applied patches 1-4 and 11 to 4.15/scsi-queue. I suggest you resubmit
the rest to get them back on people's radar.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v4 00/11] Enhance libsas hotplug feature

2017-09-06 Thread Jason Yan




On 2017/9/6 21:22, Christoph Hellwig wrote:

On Wed, Sep 06, 2017 at 02:07:57PM +0100, John Garry wrote:

Regardless of the fate of the rest of the patches in this series, I think
patches 1,2,3,4,11/11 can be taken in isolation (subject to review, of
course). It would save maintaining them out-of-tree.


I did a quick review of those and they all look fine to me.

I'll try to find some time to review the real changes in the next
days.

.



Thank you very much. I'm looking forward to your suggestions on
the real changes.

Re: [PATCH v4 00/11] Enhance libsas hotplug feature

2017-09-06 Thread Christoph Hellwig

On Wed, Sep 06, 2017 at 02:07:57PM +0100, John Garry wrote:
> Regardless of the fate of the rest of the patches in this series, I think 
> patches 1,2,3,4,11/11 can be taken in isolation (subject to review, of 
> course). It would save maintaining them out-of-tree.

I did a quick review of those and they all look fine to me.

I'll try to find some time to review the real changes in the next
days.

Re: [PATCH v4 00/11] Enhance libsas hotplug feature

2017-09-06 Thread John Garry


On 06/09/2017 10:15, Jason Yan wrote:

Hello all, Yijing Wang handed over this topic to me. We have been
working on it for the last two months and have tested the patchset for
a long time. Here is the new version.

libsas hotplug currently has some issues. Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, and the
   libsas topology may end up differing from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to
   remove devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy-up sas
   event may be inserted between the phy-down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
   which is not what we expect, and issues occur.

v3->v4: -get rid of unused ha event and do some cleanup
        -use dynamically allocated work and support shutting down the phy
         if the active events reach the threshold
        -use flush_workqueue instead of wait-completion to process
         discover events synchronously
        -call the probe and destruct functions directly
        -other small code improvements
v2->v3: some code improvements suggested by Johannes and John,
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Jason Yan (10):
  libsas: kill useless ha_event and do some cleanup
  libsas: remove the numbering for each event enum
  libsas: remove unused port_gone_completion and DISCE_PORT_GONE
  libsas: rename notify_port_event() for consistency
  libsas: Use dynamic alloced work to avoid sas event lost
  libsas: shut down the PHY if events reached the threshold
  libsas: make the event threshold configurable
  libsas: Use new workqueue to run sas event and disco event
  libsas: libsas: use flush_workqueue to process disco events
synchronously
  libsas: direct call probe and destruct

chenxiang (1):
  libsas: add event to defer list tail instead of head when draining



Regardless of the fate of the rest of the patches in this series, I 
think patches 1,2,3,4,11/11 can be taken in isolation (subject to 
review, of course). It would save maintaining them out-of-tree.


John


 drivers/scsi/aic94xx/aic94xx_hwi.c|   3 -
 drivers/scsi/hisi_sas/hisi_sas_main.c |   7 ++-
 drivers/scsi/libsas/sas_ata.c |   1 -
 drivers/scsi/libsas/sas_discover.c|  36 +++-
 drivers/scsi/libsas/sas_dump.c|  10 
 drivers/scsi/libsas/sas_dump.h|   1 -
 drivers/scsi/libsas/sas_event.c   |  97 +++-
 drivers/scsi/libsas/sas_expander.c|   2 +-
 drivers/scsi/libsas/sas_init.c| 101 +-
 drivers/scsi/libsas/sas_internal.h|   7 +++
 drivers/scsi/libsas/sas_phy.c |  73 
 drivers/scsi/libsas/sas_port.c|  25 +
 include/scsi/libsas.h |  81 ---
 include/scsi/scsi_transport_sas.h |   1 +
 14 files changed, 270 insertions(+), 175 deletions(-)

[PATCH v4 00/11] Enhance libsas hotplug feature

2017-09-06 Thread Jason Yan

Hello all, Yijing Wang handed over this topic to me. We have been
working on it for the last two months and have tested the patchset for
a long time. Here is the new version.

libsas hotplug currently has some issues. Dan Williams reported
a similar bug here before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, and the
   libsas topology may end up differing from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to
   remove devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy-up sas
   event may be inserted between the phy-down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event,
   which is not what we expect, and issues occur.

v3->v4: -get rid of unused ha event and do some cleanup
        -use dynamically allocated work and support shutting down the phy
         if the active events reach the threshold
        -use flush_workqueue instead of wait-completion to process
         discover events synchronously
        -call the probe and destruct functions directly
        -other small code improvements
v2->v3: some code improvements suggested by Johannes and John,
        split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Jason Yan (10):
  libsas: kill useless ha_event and do some cleanup
  libsas: remove the numbering for each event enum
  libsas: remove unused port_gone_completion and DISCE_PORT_GONE
  libsas: rename notify_port_event() for consistency
  libsas: Use dynamic alloced work to avoid sas event lost
  libsas: shut down the PHY if events reached the threshold
  libsas: make the event threshold configurable
  libsas: Use new workqueue to run sas event and disco event
  libsas: libsas: use flush_workqueue to process disco events
synchronously
  libsas: direct call probe and destruct

chenxiang (1):
  libsas: add event to defer list tail instead of head when draining

 drivers/scsi/aic94xx/aic94xx_hwi.c|   3 -
 drivers/scsi/hisi_sas/hisi_sas_main.c |   7 ++-
 drivers/scsi/libsas/sas_ata.c |   1 -
 drivers/scsi/libsas/sas_discover.c|  36 +++-
 drivers/scsi/libsas/sas_dump.c|  10 
 drivers/scsi/libsas/sas_dump.h|   1 -
 drivers/scsi/libsas/sas_event.c   |  97 +++-
 drivers/scsi/libsas/sas_expander.c|   2 +-
 drivers/scsi/libsas/sas_init.c| 101 +-
 drivers/scsi/libsas/sas_internal.h|   7 +++
 drivers/scsi/libsas/sas_phy.c |  73 
 drivers/scsi/libsas/sas_port.c|  25 +
 include/scsi/libsas.h |  81 ---
 include/scsi/scsi_transport_sas.h |   1 +
 14 files changed, 270 insertions(+), 175 deletions(-)

-- 
2.5.0

[PATCH] mptsas: Fixup device hotplug for VMWare ESXi

2017-08-24 Thread Hannes Reinecke

VMWare ESXi emulates an mptsas HBA, but exposes all drives as
direct-attached SAS drives.
This is not how the driver originally envisioned things; SAS drives
were supposed to be connected via an expander, and only SATA drives
would be direct attached.
As such any hotplug event for direct-attach SAS drives was silently
ignored, and the guest failed to detect new drives from within a
VMWare ESXi environment.

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1030850
Signed-off-by: Hannes Reinecke <h...@suse.com>
---
 drivers/message/fusion/mptsas.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index f6308ad..b9bd6aa 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -4352,11 +4352,10 @@ static void mptsas_expander_delete(MPT_ADAPTER *ioc,
return;
 
phy_info = mptsas_refreshing_device_handles(ioc, &sas_device);
-   /* Only For SATA Device ADD */
-   if (!phy_info && (sas_device.device_info &
-   MPI_SAS_DEVICE_INFO_SATA_DEVICE)) {
+   /* Device hostplug */
+   if (!phy_info) {
devtprintk(ioc, printk(MYIOC_s_DEBUG_FMT
-   "%s %d SATA HOT PLUG: "
+   "%s %d HOT PLUG: "
"parent handle of device %x\n", ioc->name,
__func__, __LINE__, sas_device.handle_parent));
port_info = mptsas_find_portinfo_by_handle(ioc,
-- 
1.8.5.6

Re: Spurious DISK_EVENT_MEDIA_CHANGE on USB DVD hotplug?

2017-08-17 Thread Joe Lawrence

On 08/14/2017 02:30 PM, Tejun Heo wrote:
> Hello, Joe.
> 
> On Thu, Aug 10, 2017 at 10:45:54AM -0400, Joe Lawrence wrote:
>> In the case of my USB DVD -> laptop example, there is no media in my
>> device, however I still see the DISK_EVENT_MEDIA_CHANGE event.  This is
>> a bit confusing, and I was wondering if there was an explanation for
>> the following:
>>
>> drivers/scsi/sr.c :: sr_probe()
>>
>> disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST;
>> ...
>> cd->media_present = 1;
>>
>> DISK_EVENT_MEDIA_CHANGE events will pass through to userspace and
>> for some reason cd->media_present defaults to 1?  More on that below...
> 
> I don't have any concrete ideas but I think the only thing it's trying
> to do is to always generate at least one changed event no matter what.
> 
> ...
>> sr_check_events() compares the previous (in this case, default)
>> media_present value with what the TUR returns.  If it has changed, then
>> turn on the DISK_EVENT_MEDIA_CHANGE event bit.
>>
>> In my laptop USB DVD case, !scsi_status_is_good and sshdr.asc == 0x3a,
>> so last_present (1) and cd->media_present (0) mis-compare and the change
>> event is set.  That does not seem intuitive to me, is this a bug?
> 
> It's not incorrect.  We can try to change the behavior to avoid double
> notifications but given that this has been like this for a really long
> time and that it isn't technically incorrect, I'm not sure changing it
> is a good idea.  It might as well break other things.

Without a definition of DISK_EVENT_MEDIA_CHANGE or its udev
DISK_MEDIA_CHANGE counterpart it's kinda hard to say :)  But I agree
that changing this behavior could have inadvertent effects.

>> Bringing this back to the reported BMC case, which presumably does have
>> "media" present in the virtual device... is it reasonable to expect a
>> DISK_EVENT_MEDIA_CHANGE even for a new device that contains media?  (I
>> haven't verified, but in this case GET_EVENT_STATUS_NOTIFICATION might
>> be enough to set media present.)
> 
> Yeah, I think so.
> 
>> If there is documentation that explains DISK_EVENT_MEDIA_CHANGE conditions
>> somewhere, feel free to point me there.
> 
> AFAIK, there isn't any.  The only thing it tries to do is generating
> at least one event after media change.  Given that media is reported
> present after the last notification, I think userspace should be able
> to do the right thing (and must have been doing the right thing until
> recently).

I have no idea if udev or other userspace has changed in this respect,
or this is simply a timing window that this particular user has fallen
into.  This is a simulated device, so it might be faster/slower than any
real hardware.

Thanks for chiming in,

-- Joe

Re: Spurious DISK_EVENT_MEDIA_CHANGE on USB DVD hotplug?

2017-08-14 Thread Tejun Heo

Hello, Joe.

On Thu, Aug 10, 2017 at 10:45:54AM -0400, Joe Lawrence wrote:
> In the case of my USB DVD -> laptop example, there is no media in my
> device, however I still see the DISK_EVENT_MEDIA_CHANGE event.  This is
> a bit confusing, and I was wondering if there was an explanation for
> the following:
> 
> drivers/scsi/sr.c :: sr_probe()
> 
> disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST;
> ...
> cd->media_present = 1;
> 
> DISK_EVENT_MEDIA_CHANGE events will pass through to userspace and
> for some reason cd->media_present defaults to 1?  More on that below...

I don't have any concrete ideas but I think the only thing it's trying
to do is to always generate at least one changed event no matter what.

...
> sr_check_events() compares the previous (in this case, default)
> media_present value with what the TUR returns.  If it has changed, then
> turn on the DISK_EVENT_MEDIA_CHANGE event bit.
> 
> In my laptop USB DVD case, !scsi_status_is_good and sshdr.asc == 0x3a,
> so last_present (1) and cd->media_present (0) mis-compare and the change
> event is set.  That does not seem intuitive to me, is this a bug?

It's not incorrect.  We can try to change the behavior to avoid double
notifications but given that this has been like this for a really long
time and that it isn't technically incorrect, I'm not sure changing it
is a good idea.  It might as well break other things.

> Bringing this back to the reported BMC case, which presumably does have
> "media" present in the virtual device... is it reasonable to expect a
> DISK_EVENT_MEDIA_CHANGE even for a new device that contains media?  (I
> haven't verified, but in this case GET_EVENT_STATUS_NOTIFICATION might
> be enough to set media present.)

Yeah, I think so.

> If there is documentation that explains DISK_EVENT_MEDIA_CHANGE conditions
> somewhere, feel free to point me there.

AFAIK, there isn't any.  The only thing it tries to do is generating
at least one event after media change.  Given that media is reported
present after the last notification, I think userspace should be able
to do the right thing (and must have been doing the right thing until
recently).

Thanks.

-- 
tejun

Spurious DISK_EVENT_MEDIA_CHANGE on USB DVD hotplug?

2017-08-10 Thread Joe Lawrence

Hi Tejun, Kay,

I'm investigating a customer report which manifests itself all the way
up in gnome-session when a BMC hotplug-adds a simulated DVD device.  The
user logs into their server's BMC and enables "media redirection", an
emulated DVD device + .iso is dynamically added to the bus... in the
past this has worked well, however, they are now noticing a timing
condition on RHEL7 that prevents gnome from successfully auto-mounting
the DVD media.

With Harald's help, I've done some debugging and we've found out that on
hotplug-add, the kernel sends two uevents (ADD, CHANGE) in short
succession.

(Example with an ordinary, physical USB DVD device on my laptop, is
very similar):

% udevadm monitor -k -e

  KERNEL[2409061.130338] add      /devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1 (block)
  ACTION=add
  DEVNAME=/dev/sr1
  DEVPATH=/devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1
  DEVTYPE=disk
  MAJOR=11
  MINOR=1
  SEQNUM=5885
  SUBSYSTEM=block

  ...

  KERNEL[2409061.134076] change   /devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1 (block)
  ACTION=change
  DEVNAME=/dev/sr1
  DEVPATH=/devices/pci:00/:00:14.0/usb3/3-9/3-9.3/3-9.3:1.0/host20/target20:0:0/20:0:0:0/block/sr1
  DEVTYPE=disk
> DISK_MEDIA_CHANGE=1
  MAJOR=11
  MINOR=1
  SEQNUM=5889
  SUBSYSTEM=block

(Both of these events trigger a call out to the 'cdrom_id' userspace
program, the latter of which interferes with the gnome-session
auto-mounting feature.)

With a systemtap probe, I can also see that there are four userspace
openers of the cdrom when it is added:

  (parent)            -> (child)             : system-tap probe-point
  ---------------------------------------------------------------------
> systemd-udevd(849)  -> systemd-udevd(6783) : module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980")
  systemd-udevd(6783) -> cdrom_id(6791)      : module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980")
  systemd-udevd(849)  -> systemd-udevd(6783) : module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980")
  systemd-udevd(6783) -> cdrom_id(6794)      : module("cdrom").function("cdrom_open@drivers/cdrom/cdrom.c:980")

where on the first opener, the kernel eventually invokes
sr_mod::sr_check_events() and gets a DISK_EVENT_MEDIA_CHANGE return
code.

In the case of my USB DVD -> laptop example, there is no media in my
device, however I still see the DISK_EVENT_MEDIA_CHANGE event.  This is
a bit confusing, and I was wondering if there was an explanation for
the following:

drivers/scsi/sr.c :: sr_probe()

disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST;
...
cd->media_present = 1;

DISK_EVENT_MEDIA_CHANGE events will pass through to userspace and
for some reason cd->media_present defaults to 1?  More on that below...


drivers/scsi/sr.c :: sr_check_events()

...
do_tur:
	/* let's see whether the media is there with TUR */
	last_present = cd->media_present;
	ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);

	/*
	 * Media is considered to be present if TUR succeeds or fails with
	 * sense data indicating something other than media-not-present
	 * (ASC 0x3a).
	 */
	cd->media_present = scsi_status_is_good(ret) ||
		(scsi_sense_valid(&sshdr) && sshdr.asc != 0x3a);

	if (last_present != cd->media_present)
		cd->device->changed = 1;

	if (cd->device->changed) {
		events |= DISK_EVENT_MEDIA_CHANGE;
		cd->device->changed = 0;
		cd->tur_changed = true;
	}
...

sr_check_events() compares the previous (in this case, default)
media_present value with what the TUR returns.  If it has changed, then
turn on the DISK_EVENT_MEDIA_CHANGE event bit.

In my laptop USB DVD case, !scsi_status_is_good and sshdr.asc == 0x3a,
so last_present (1) and cd->media_present (0) mis-compare and the change
event is set.  That does not seem intuitive to me, is this a bug?
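
The comparison described above can be reproduced in a small userspace model. This is only a sketch of the quoted logic: `struct fake_cd` and `fake_check_media_change()` are hypothetical names, the TUR result is passed in as plain arguments, and any other place where the real driver sets `device->changed` is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define ASC_MEDIUM_NOT_PRESENT 0x3a

/* Hypothetical stand-in for the two fields the quoted code touches. */
struct fake_cd {
    bool media_present;     /* sr_probe() starts this at 1 */
    bool changed;           /* models cd->device->changed */
};

/*
 * Models the do_tur block: returns true when a media change event
 * would be reported for this poll.
 */
static bool fake_check_media_change(struct fake_cd *cd, bool tur_good,
                                    bool sense_valid, unsigned char asc)
{
    bool last_present = cd->media_present;

    /* Present if TUR succeeds, or fails with sense other than ASC 0x3a. */
    cd->media_present = tur_good ||
        (sense_valid && asc != ASC_MEDIUM_NOT_PRESENT);

    if (last_present != cd->media_present)
        cd->changed = true;

    if (cd->changed) {
        cd->changed = false;
        return true;
    }
    return false;
}
```

In this model an empty drive reports one change event on the first poll, because the default of 1 is compared against a TUR result of "not present", and no event on later polls. A drive that already holds media reports nothing from this comparison alone.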

Bringing this back to the reported BMC case, which presumably does have
"media" present in the virtual device... is it reasonable to expect a
DISK_EVENT_MEDIA_CHANGE even for a new device that contains media?  (I
haven't verified, but in this case GET_EVENT_STATUS_NOTIFICATION might
be enough to set media present.)

If there is documentation that explains DISK_EVENT_MEDIA_CHANGE conditions
somewhere, feel free to point me there.

Thanks,

-- Joe

Re: [patch 0/5] scsi/bnx2*: Plug hotplug race, correct locking and simplify hotplug code

2017-07-26 Thread Martin K. Petersen


Thomas,

> The conversion of the cpu hotplug locking to a percpu rwsem no longer
> allows recursive locking of the hotplug lock.
>
> The BNX2I and BNX2FC drivers install/remove hotplug states with the
> hotplug lock held. The install/removal code acquired the hotplug lock
> as well.
>
> While looking into this, I noticed an interesting hotplug race in the
> BNX2FC driver, which could result in dereferencing a NULL pointer or
> freed and potentially reused memory.
>
> The following series addresses these problems and as a final step on
> top it simplifies the hotplug code in both drivers.

Applied to 4.13/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [patch 0/5] scsi/bnx2*: Plug hotplug race, correct locking and simplify hotplug code

2017-07-25 Thread Chad Dupuis


On Mon, 24 Jul 2017, 6:52am, Thomas Gleixner wrote:

> The conversion of the cpu hotplug locking to a percpu rwsem no longer
> allows recursive locking of the hotplug lock.
> 
> The BNX2I and BNX2FC drivers install/remove hotplug states with the hotplug
> lock held. The install/removal code acquired the hotplug lock as well.
> 
> While looking into this, I noticed an interesting hotplug race in the
> BNX2FC driver, which could result in dereferencing a NULL pointer or freed
> and potentially reused memory.
> 
> The following series addresses these problems and as a final step on top it
> simplifies the hotplug code in both drivers.
> 
> Thanks,
> 
>   tglx
> 
> 
>  drivers/scsi/bnx2fc/bnx2fc_fcoe.c |   68 
> --
>  drivers/scsi/bnx2fc/bnx2fc_hwi.c  |   45 -
>  drivers/scsi/bnx2i/bnx2i_init.c   |   64 ---
>  include/linux/cpuhotplug.h|2 -
>  4 files changed, 53 insertions(+), 126 deletions(-)
> 

We tested the series and everything was fine.  Ack to the series.

Acked-by: Chad Dupuis <chad.dup...@cavium.com>

[patch 4/5] scsi/bnx2fc: Simplify CPU hotplug code

2017-07-24 Thread Thomas Gleixner

The CPU hotplug related code of this driver can be simplified by:

1) Consolidating the callbacks into a single state. The CPU thread can be
   torn down on the CPU which goes offline. There is no point in delaying
   that to the CPU dead state.

2) Letting the core code invoke the online/offline callbacks and removing
   the extra for_each_online_cpu() loops.

Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c |   69 --
 include/linux/cpuhotplug.h|1 
 2 files changed, 15 insertions(+), 55 deletions(-)

--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -2624,12 +2624,11 @@ static struct fcoe_transport bnx2fc_tran
 };
 
 /**
- * bnx2fc_percpu_thread_create - Create a receive thread for an
- *  online CPU
+ * bnx2fc_cpu_online - Create a receive thread for an  online CPU
  *
  * @cpu: cpu index for the online cpu
  */
-static void bnx2fc_percpu_thread_create(unsigned int cpu)
+static int bnx2fc_cpu_online(unsigned int cpu)
 {
struct bnx2fc_percpu_s *p;
struct task_struct *thread;
@@ -2639,15 +2638,17 @@ static void bnx2fc_percpu_thread_create(
thread = kthread_create_on_node(bnx2fc_percpu_io_thread,
(void *)p, cpu_to_node(cpu),
"bnx2fc_thread/%d", cpu);
+   if (IS_ERR(thread))
+   return PTR_ERR(thread);
+
/* bind thread to the cpu */
-   if (likely(!IS_ERR(thread))) {
-   kthread_bind(thread, cpu);
-   p->iothread = thread;
-   wake_up_process(thread);
-   }
+   kthread_bind(thread, cpu);
+   p->iothread = thread;
+   wake_up_process(thread);
+   return 0;
 }
 
-static void bnx2fc_percpu_thread_destroy(unsigned int cpu)
+static int bnx2fc_cpu_offline(unsigned int cpu)
 {
struct bnx2fc_percpu_s *p;
struct task_struct *thread;
@@ -2661,7 +2662,6 @@ static void bnx2fc_percpu_thread_destroy
thread = p->iothread;
p->iothread = NULL;
 
-
/* Free all work in the list */
list_for_each_entry_safe(work, tmp, &p->work_list, list) {
list_del_init(&work->list);
@@ -2673,20 +2673,6 @@ static void bnx2fc_percpu_thread_destroy
 
if (thread)
kthread_stop(thread);
-}
-
-
-static int bnx2fc_cpu_online(unsigned int cpu)
-{
-   printk(PFX "CPU %x online: Create Rx thread\n", cpu);
-   bnx2fc_percpu_thread_create(cpu);
-   return 0;
-}
-
-static int bnx2fc_cpu_dead(unsigned int cpu)
-{
-   printk(PFX "CPU %x offline: Remove Rx thread\n", cpu);
-   bnx2fc_percpu_thread_destroy(cpu);
return 0;
 }
 
@@ -2761,31 +2747,16 @@ static int __init bnx2fc_mod_init(void)
spin_lock_init(&p->fp_work_lock);
}
 
-   get_online_cpus();
-
-   for_each_online_cpu(cpu)
-   bnx2fc_percpu_thread_create(cpu);
-
-   rc = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
- "scsi/bnx2fc:online",
- bnx2fc_cpu_online, NULL);
+   rc = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "scsi/bnx2fc:online",
+  bnx2fc_cpu_online, bnx2fc_cpu_offline);
if (rc < 0)
-   goto stop_threads;
+   goto stop_thread;
bnx2fc_online_state = rc;
 
-   cpuhp_setup_state_nocalls_cpuslocked(CPUHP_SCSI_BNX2FC_DEAD,
-"scsi/bnx2fc:dead",
-NULL, bnx2fc_cpu_dead);
-   put_online_cpus();
-
cnic_register_driver(CNIC_ULP_FCOE, &bnx2fc_cnic_cb);
-
return 0;
 
-stop_threads:
-   for_each_online_cpu(cpu)
-   bnx2fc_percpu_thread_destroy(cpu);
-   put_online_cpus();
+stop_thread:
kthread_stop(l2_thread);
 free_wq:
destroy_workqueue(bnx2fc_wq);
@@ -2804,7 +2775,6 @@ static void __exit bnx2fc_mod_exit(void)
struct fcoe_percpu_s *bg;
struct task_struct *l2_thread;
struct sk_buff *skb;
-   unsigned int cpu = 0;
 
/*
 * NOTE: Since cnic calls register_driver routine rtnl_lock,
@@ -2845,16 +2815,7 @@ static void __exit bnx2fc_mod_exit(void)
if (l2_thread)
kthread_stop(l2_thread);
 
-   get_online_cpus();
-   /* Destroy per cpu threads */
-   for_each_online_cpu(cpu) {
-   bnx2fc_percpu_thread_destroy(cpu);
-   }
-
-   cpuhp_remove_state_nocalls_cpuslocked(bnx2fc_online_state);
-   cpuhp_remove_state_nocalls_cpuslocked(CPUHP_SCSI_BNX2FC_DEAD);
-
-   put_online_cpus();
+   cpuhp_remove_state(bnx2fc_online_state);
 
destroy_workqueue(bnx2fc_wq);
/*
--- a/include/linux/cpuhotplug.h
+++ b/in

[patch 0/5] scsi/bnx2*: Plug hotplug race, correct locking and simplify hotplug code

2017-07-24 Thread Thomas Gleixner

The conversion of the cpu hotplug locking to a percpu rwsem no longer
allows recursive locking of the hotplug lock.

The BNX2I and BNX2FC drivers install/remove hotplug states with the hotplug
lock held. The install/removal code acquired the hotplug lock as well.

While looking into this, I noticed an interesting hotplug race in the
BNX2FC driver, which could result in dereferencing a NULL pointer or freed
and potentially reused memory.

The following series addresses these problems and as a final step on top it
simplifies the hotplug code in both drivers.

Thanks,

tglx


 drivers/scsi/bnx2fc/bnx2fc_fcoe.c |   68 --
 drivers/scsi/bnx2fc/bnx2fc_hwi.c  |   45 -
 drivers/scsi/bnx2i/bnx2i_init.c   |   64 ---
 include/linux/cpuhotplug.h|2 -
 4 files changed, 53 insertions(+), 126 deletions(-)

[patch 5/5] scsi/bnx2i: Simplify cpu hotplug code

2017-07-24 Thread Thomas Gleixner

The CPU hotplug related code of this driver can be simplified by:

1) Consolidating the callbacks into a single state. The CPU thread can be
   torn down on the CPU which goes offline. There is no point in delaying
   that to the CPU dead state.

2) Letting the core code invoke the online/offline callbacks and removing
   the extra for_each_online_cpu() loops.

Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 drivers/scsi/bnx2i/bnx2i_init.c |   65 +---
 include/linux/cpuhotplug.h  |1 
 2 files changed, 15 insertions(+), 51 deletions(-)

--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -404,12 +404,11 @@ int bnx2i_get_stats(void *handle)
 
 
 /**
- * bnx2i_percpu_thread_create - Create a receive thread for an
- * online CPU
+ * bnx2i_cpu_online - Create a receive thread for an online CPU
  *
  * @cpu:   cpu index for the online cpu
  */
-static void bnx2i_percpu_thread_create(unsigned int cpu)
+static int bnx2i_cpu_online(unsigned int cpu)
 {
struct bnx2i_percpu_s *p;
struct task_struct *thread;
@@ -419,16 +418,17 @@ static void bnx2i_percpu_thread_create(u
thread = kthread_create_on_node(bnx2i_percpu_io_thread, (void *)p,
cpu_to_node(cpu),
"bnx2i_thread/%d", cpu);
+   if (IS_ERR(thread))
+   return PTR_ERR(thread);
+
/* bind thread to the cpu */
-   if (likely(!IS_ERR(thread))) {
-   kthread_bind(thread, cpu);
-   p->iothread = thread;
-   wake_up_process(thread);
-   }
+   kthread_bind(thread, cpu);
+   p->iothread = thread;
+   wake_up_process(thread);
+   return 0;
 }
 
-
-static void bnx2i_percpu_thread_destroy(unsigned int cpu)
+static int bnx2i_cpu_offline(unsigned int cpu)
 {
struct bnx2i_percpu_s *p;
struct task_struct *thread;
@@ -451,19 +451,6 @@ static void bnx2i_percpu_thread_destroy(
spin_unlock_bh(&p->p_work_lock);
if (thread)
kthread_stop(thread);
-}
-
-static int bnx2i_cpu_online(unsigned int cpu)
-{
-   pr_info("bnx2i: CPU %x online: Create Rx thread\n", cpu);
-   bnx2i_percpu_thread_create(cpu);
-   return 0;
-}
-
-static int bnx2i_cpu_dead(unsigned int cpu)
-{
-   pr_info("CPU %x offline: Remove Rx thread\n", cpu);
-   bnx2i_percpu_thread_destroy(cpu);
return 0;
 }
 
@@ -511,28 +498,14 @@ static int __init bnx2i_mod_init(void)
p->iothread = NULL;
}
 
-   get_online_cpus();
-
-   for_each_online_cpu(cpu)
-   bnx2i_percpu_thread_create(cpu);
-
-   err = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
-  "scsi/bnx2i:online",
-  bnx2i_cpu_online, NULL);
+   err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "scsi/bnx2i:online",
+   bnx2i_cpu_online, bnx2i_cpu_offline);
if (err < 0)
-   goto remove_threads;
+   goto unreg_driver;
bnx2i_online_state = err;
-
-   cpuhp_setup_state_nocalls_cpuslocked(CPUHP_SCSI_BNX2I_DEAD,
-"scsi/bnx2i:dead",
-NULL, bnx2i_cpu_dead);
-   put_online_cpus();
return 0;
 
-remove_threads:
-   for_each_online_cpu(cpu)
-   bnx2i_percpu_thread_destroy(cpu);
-   put_online_cpus();
+unreg_driver:
cnic_unregister_driver(CNIC_ULP_ISCSI);
 unreg_xport:
iscsi_unregister_transport(&bnx2i_iscsi_transport);
@@ -552,7 +525,6 @@ static int __init bnx2i_mod_init(void)
 static void __exit bnx2i_mod_exit(void)
 {
struct bnx2i_hba *hba;
-   unsigned cpu = 0;
 
mutex_lock(&bnx2i_dev_lock);
while (!list_empty(&adapter_list)) {
@@ -570,14 +542,7 @@ static void __exit bnx2i_mod_exit(void)
}
mutex_unlock(&bnx2i_dev_lock);
 
-   get_online_cpus();
-
-   for_each_online_cpu(cpu)
-   bnx2i_percpu_thread_destroy(cpu);
-
-   cpuhp_remove_state_nocalls_cpuslocked(bnx2i_online_state);
-   cpuhp_remove_state_nocalls_cpuslocked(CPUHP_SCSI_BNX2I_DEAD);
-   put_online_cpus();
+   cpuhp_remove_state(bnx2i_online_state);
 
iscsi_unregister_transport(&bnx2i_iscsi_transport);
cnic_unregister_driver(CNIC_ULP_ISCSI);
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -39,7 +39,6 @@ enum cpuhp_state {
CPUHP_PCI_XGENE_DEAD,
CPUHP_IOMMU_INTEL_DEAD,
CPUHP_LUSTRE_CFS_DEAD,
-   CPUHP_SCSI_BNX2I_DEAD,
CPUHP_WORKQUEUE_PREP,
CPUHP_POWER_NUMA_PREPARE,
CPUHP_HRTIMERS_PREPARE,
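
The changelog above relies on the core code invoking the online/offline
callbacks for every online CPU, which is what makes the open-coded
for_each_online_cpu() loops unnecessary.  The following user-space C
sketch models that behaviour under simplifying assumptions.  All model_*
names are hypothetical, the online mask is a fixed array, and the model
returns 0 on success, whereas cpuhp_setup_state() with CPUHP_AP_ONLINE_DYN
returns the allocated state number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

typedef int (*model_cb)(unsigned int cpu);

struct model_state {
	model_cb online;
	model_cb offline;
};

/* Hypothetical system state: CPU 2 is offline. */
bool model_cpu_online[MODEL_NR_CPUS] = { true, true, false, true };
bool model_thread_running[MODEL_NR_CPUS];
int model_fail_cpu = -1;	/* CPU whose online callback fails, -1 for none */

/* Run the online callback on every online CPU; undo on failure. */
int model_setup_state(struct model_state *st, model_cb online, model_cb offline)
{
	unsigned int cpu;
	int ret;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!model_cpu_online[cpu])
			continue;
		ret = online(cpu);
		if (ret) {
			/* Roll back the CPUs that were already set up. */
			while (cpu-- > 0) {
				if (model_cpu_online[cpu])
					offline(cpu);
			}
			return ret;
		}
	}
	st->online = online;
	st->offline = offline;
	return 0;
}

/* Run the offline callback on every online CPU, then forget the state. */
void model_remove_state(struct model_state *st)
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (model_cpu_online[cpu])
			st->offline(cpu);
	}
	st->online = NULL;
	st->offline = NULL;
}

/* Stand-ins for the per-CPU thread create and destroy callbacks. */
int model_thread_start(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	if ((int)cpu == model_fail_cpu)
		return -1;
	model_thread_running[cpu] = true;
	return 0;
}

int model_thread_stop(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_thread_running[cpu] = false;
	return 0;
}
```

In this model the driver registers one pair of callbacks and never walks
the CPU mask itself, and a failing online callback leaves no threads
behind, so no separate error-path cleanup loop is needed.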

[patch 1/5] scsi/bnx2fc: Plug CPU hotplug race

2017-07-24 Thread Thomas Gleixner

bnx2fc_process_new_cqes() has protection against CPU hotplug, which relies
on the per cpu thread pointer. This protection is racy because it happens
only partially with the per cpu fp_work_lock held.

If the CPU is unplugged after the lock is dropped, the wakeup code can
dereference a NULL pointer or access freed and potentially reused memory.

Restructure the code so the thread check and wakeup happen with the
fp_work_lock held.

Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 drivers/scsi/bnx2fc/bnx2fc_hwi.c |   45 +++
 1 file changed, 23 insertions(+), 22 deletions(-)

--- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
@@ -1008,6 +1008,28 @@ static struct bnx2fc_work *bnx2fc_alloc_
return work;
 }
 
+/* Pending work request completion */
+static void bnx2fc_pending_work(struct bnx2fc_rport *tgt, unsigned int wqe)
+{
+   unsigned int cpu = wqe % num_possible_cpus();
+   struct bnx2fc_percpu_s *fps;
+   struct bnx2fc_work *work;
+
+   fps = &per_cpu(bnx2fc_percpu, cpu);
+   spin_lock_bh(&fps->fp_work_lock);
+   if (fps->iothread) {
+   work = bnx2fc_alloc_work(tgt, wqe);
+   if (work) {
+   list_add_tail(&work->list, &fps->work_list);
+   wake_up_process(fps->iothread);
+   spin_unlock_bh(&fps->fp_work_lock);
+   return;
+   }
+   }
+   spin_unlock_bh(&fps->fp_work_lock);
+   bnx2fc_process_cq_compl(tgt, wqe);
+}
+
 int bnx2fc_process_new_cqes(struct bnx2fc_rport *tgt)
 {
struct fcoe_cqe *cq;
@@ -1042,28 +1064,7 @@ int bnx2fc_process_new_cqes(struct bnx2f
/* Unsolicited event notification */
bnx2fc_process_unsol_compl(tgt, wqe);
} else {
-   /* Pending work request completion */
-   struct bnx2fc_work *work = NULL;
-   struct bnx2fc_percpu_s *fps = NULL;
-   unsigned int cpu = wqe % num_possible_cpus();
-
-   fps = &per_cpu(bnx2fc_percpu, cpu);
-   spin_lock_bh(&fps->fp_work_lock);
-   if (unlikely(!fps->iothread))
-   goto unlock;
-
-   work = bnx2fc_alloc_work(tgt, wqe);
-   if (work)
-   list_add_tail(&work->list,
- &fps->work_list);
-unlock:
-   spin_unlock_bh(&fps->fp_work_lock);
-
-   /* Pending work request completion */
-   if (fps->iothread && work)
-   wake_up_process(fps->iothread);
-   else
-   bnx2fc_process_cq_compl(tgt, wqe);
+   bnx2fc_pending_work(tgt, wqe);
num_free_sqes++;
}
cqe++;
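
The restructuring described in the changelog of this patch (check the
thread pointer and issue the wakeup while fp_work_lock is still held) can
be modeled in user space.  The sketch below is not driver code:
struct model_percpu, model_pending_work() and model_cpu_offline() are
hypothetical names, a pthread mutex stands in for fp_work_lock, and plain
counters stand in for the work list and for wake_up_process().  It also
assumes that the offline path clears the thread pointer under the same
lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_percpu {
	pthread_mutex_t lock;	/* models fp_work_lock */
	bool iothread_alive;	/* models fps->iothread != NULL */
	int queued;		/* models the length of work_list */
	int wakeups;		/* counts simulated wake_up_process() calls */
};

/*
 * Returns true if the work was queued to the per-CPU thread, false if the
 * caller has to process the completion inline.  The liveness check, the
 * queueing and the wakeup all happen inside one critical section, so the
 * thread cannot disappear between the check and the wakeup.
 */
bool model_pending_work(struct model_percpu *p, bool alloc_ok)
{
	bool queued = false;

	pthread_mutex_lock(&p->lock);
	if (p->iothread_alive && alloc_ok) {
		p->queued++;
		p->wakeups++;	/* wake the thread while still holding the lock */
		queued = true;
	}
	pthread_mutex_unlock(&p->lock);
	return queued;
}

/* Models the offline path: clear the thread and drop queued work. */
void model_cpu_offline(struct model_percpu *p)
{
	pthread_mutex_lock(&p->lock);
	p->iothread_alive = false;
	p->queued = 0;
	pthread_mutex_unlock(&p->lock);
}
```

The code removed by the patch re-read fps->iothread after dropping the
lock.  In the model, that would correspond to testing iothread_alive
after pthread_mutex_unlock(), where a concurrent model_cpu_offline() could
already have cleared it.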

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-14 Thread wangyijing

Hi, I'm sorry to say that I have to stop the libsas hotplug improvement work.
I am resigning from Huawei, so I will have neither the time nor the hardware
to continue working on this issue. John is very familiar with this work and
has provided a lot of good suggestions, so if John is willing, I would be glad
for him to take it over. My colleague Jason Yan could also help.


Thanks!
Yijing.


On 2017/7/10 15:06, Yijing Wang wrote:
> This patchset is based on Johannes's patch
> "scsi: sas: scsi_queue_work can fail, so make callers aware"
> 
> The libsas hotplug code currently has some issues. Dan Williams reported
> a similar bug here before:
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
> 
> The issues we have found:
> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>    may be lost because the same sas event is already pending, so the
>    libsas topology may end up differing from the hardware.
> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>    remove devices. It first deletes the sas port, then puts a destruction
>    discovery event in a new work item and queues it at the tail of the
>    workqueue. Once the sas port is deleted, its child devices are deleted
>    too, so when the destruction work starts, it finds that the target
>    device has already been removed and reports a sysfs warning.
> 3. Since a hotplug process is divided into several work items, a phy-up
>    sas event can be inserted between the phy-down work items, like
>    destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event, which
>    is not what we expect, and issues occur.
> 
> The first patch fixes the lost sas events, and the second one introduces
> wait-complete to fix the hotplug ordering issues.
> 
> v2->v3: some code improvements suggested by Johannes and John,
>   split v2 patch 2 into several small patches.
> v1->v2: some code improvements suggested by John Garry
> 
> Yijing Wang (7):
>   libsas: Use static sas event pool to appease sas event lost
>   libsas: remove unused port_gone_completion
>   libsas: Use new workqueue to run sas event
>   libsas: add sas event wait-complete support
>   libsas: add a new workqueue to run probe/destruct discovery event
>   libsas: add wait-complete support to sync discovery event
>   libsas: release disco mutex during waiting in sas_ex_discover_end_dev
> 
>  drivers/scsi/libsas/sas_discover.c |  58 +++---
>  drivers/scsi/libsas/sas_event.c| 212 
> -
>  drivers/scsi/libsas/sas_expander.c |  22 +++-
>  drivers/scsi/libsas/sas_init.c |  21 ++--
>  drivers/scsi/libsas/sas_internal.h |  64 +++
>  drivers/scsi/libsas/sas_phy.c  |  48 +++--
>  drivers/scsi/libsas/sas_port.c |  22 ++--
>  include/scsi/libsas.h  |  27 +++--
>  8 files changed, 373 insertions(+), 101 deletions(-)
>
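
Issue 1 in the cover letter quoted above (events lost because the same
event is already pending) can be illustrated with a small user-space
model.  The sketch assumes one pending bit per event type, with a
notification dropped whenever its bit is already set.  All model_* names
are hypothetical and this is not the libsas implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum model_event { MODEL_PHY_DOWN = 0, MODEL_PHY_UP = 1, MODEL_NUM_EVENTS };

struct model_phy {
	unsigned long pending;			  /* one bit per event type */
	enum model_event queue[MODEL_NUM_EVENTS]; /* at most one entry per type */
	int queued;
	bool link_up;				  /* the software view of the phy */
};

/* Returns false when the event is dropped because it is already pending. */
bool model_notify(struct model_phy *phy, enum model_event ev)
{
	unsigned long bit;

	assert(ev < MODEL_NUM_EVENTS);
	bit = 1UL << ev;
	if (phy->pending & bit)
		return false;
	phy->pending |= bit;
	phy->queue[phy->queued++] = ev;
	return true;
}

/* Process the queued events in order and clear their pending bits. */
void model_run_worker(struct model_phy *phy)
{
	for (int i = 0; i < phy->queued; i++) {
		enum model_event ev = phy->queue[i];

		phy->pending &= ~(1UL << ev);
		phy->link_up = (ev == MODEL_PHY_UP);
	}
	phy->queued = 0;
}
```

If the hardware reports down, up, down before the worker runs, the second
down notification is dropped in this model.  After the worker has run, the
software view says the phy is up while the hardware is down, which is the
kind of topology mismatch the cover letter describes.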

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-13 Thread wangyijing



On 2017/7/13 16:08, John Garry wrote:
> On 13/07/2017 02:37, wangyijing wrote:
>>> > So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.
>> Oh, was I mistaken? So the hotplug test with this patchset applied went
>> fine?
>>
>> Thanks!
>> Yijing.
> 
> Well basic hotplug is fine, as below. I did not do any robust testing.
> 

OK, thanks. I tested with and without fio running, and the results are fine in both cases.

Thanks!
Yijing.

> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  180.147676] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is gone
> [  180.216558] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [  180.280548] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [  180.352556] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [  180.432495] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [  180.508492] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [  180.527577] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [  180.532728] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: 
> hostbyte=0x04 driverbyte=0x00
> [  180.541591] sd 0:0:1:0: [sdb] Stopping disk
> [  180.545767] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: 
> hostbyte=0x04 driverbyte=0x00
> [  180.612491] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [  180.696452] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [  180.703221] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
> 
> root@(none)$ echo 1 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  185.937831] hisi_sas_v2_hw HISI0162:01: phyup: phy7 
> link_rate=11
> [  185.996575] scsi 0:0:8:0: Direct-Access SanDisk  LT0200MO P404 PQ: 0 
> ANSI: 6
> [  187.059642] ata2.00: ATA-8: HGST HUS724040ALA640, MFAOA8B0, max UDMA/133
> [  187.066341] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [  187.073278] ata2.00: ATA Identify Device Log not supported
> [  187.078755] ata2.00: Security Log not supported
> [  187.085239] ata2.00: ATA Identify Device Log not supported
> [  187.090715] ata2.00: Security Log not supported
> [  187.095236] ata2.00: configured for UDMA/133
> [  187.136917] scsi 0:0:9:0: Direct-Access ATA  HGST HUS724040AL A8B0 
> PQ: 0 ANSI: 5
> [  187.187612] sd 0:0:9:0: [sdb] 7814037168 512-byte logical blocks: (4.00 
> TB/3.64 TiB)
> [  187.195365] sd 0:0:9:0: [sdb] Write Protect is off
> [  187.200161] sd 0:0:9:0: [sdb] Write cache: enabled, read cache: enabled, 
> doesn't support DPO or FUA
> [  187.223844] sd 0:0:9:0: [sdb] Attached SCSI disk
> [  187.225498] scsi 0:0:10:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  187.243864] sd 0:0:8:0: [sda] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  187.285879] sd 0:0:8:0: [sda] Write Protect is off
> [  187.367898] sd 0:0:8:0: [sda] Write cache: disabled, read cache: disabled, 
> supports DPO and FUA
> [  187.524043] scsi 0:0:11:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  187.701505] sd 0:0:10:0: [sdc] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  187.743547] sd 0:0:10:0: [sdc] Write Protect is off
> [  187.822546] scsi 0:0:12:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  187.825531] sd 0:0:10:0: [sdc] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.000167] sd 0:0:11:0: [sdd] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.042205] sd 0:0:11:0: [sdd] Write Protect is off
> [  188.121527] scsi 0:0:13:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  188.124274] sd 0:0:11:0: [sdd] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.298942] sd 0:0:12:0: [sde] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.340960] sd 0:0:12:0: [sde] Write Protect is off
> [  188.420023] scsi 0:0:14:0: Direct-Access SanDisk  LT0200MO  P404 PQ: 0 
> ANSI: 6
> [  188.422969] sd 0:0:12:0: [sde] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.597501] sd 0:0:13:0: [sdf] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.605069] sd 0:0:8:0: [sda] Attached SCSI disk
> [  188.639520] sd 0:0:13:0: [sdf] Write Protect is off
> [  188.682445] scsi 0:0:15:0: Enclosure 12G SAS  Expander  RevB PQ: 0 
> ANSI: 6
> [  188.721540] sd 0:0:13:0: [sdf] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  188.896399] sd 0:0:14:0: [sdg] 390721968 512-byte logical blocks: (200 
> GB/186 GiB)
> [  188.938445] sd 0:0:14:0: [sdg] Write Protect is off
> [  189.020444] sd 0:0:14:0: [sdg] Write cache: disabled, read cache: 
> disabled, supports DPO and FUA
> [  189.060608] sd 0:0:10:0: [sdc] Attached

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-13 Thread John Garry


On 13/07/2017 02:37, wangyijing wrote:

>> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.
>
> Oh, was I mistaken? So the hotplug test with this patchset applied went
> fine?
>
> Thanks!
> Yijing.


Well basic hotplug is fine, as below. I did not do any robust testing.

root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  180.147676] hisi_sas_v2_hw HISI0162:01: found dev[8:1] 
is gone

[  180.216558] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
[  180.280548] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
[  180.352556] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
[  180.432495] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
[  180.508492] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
[  180.527577] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[  180.532728] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: 
hostbyte=0x04 driverbyte=0x00

[  180.541591] sd 0:0:1:0: [sdb] Stopping disk
[  180.545767] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result: 
hostbyte=0x04 driverbyte=0x00

[  180.612491] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
[  180.696452] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
[  180.703221] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone

root@(none)$ echo 1 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  185.937831] hisi_sas_v2_hw HISI0162:01: phyup: phy7 
link_rate=11
[  185.996575] scsi 0:0:8:0: Direct-Access SanDisk  LT0200MO 
P404 PQ: 0 ANSI: 6

[  187.059642] ata2.00: ATA-8: HGST HUS724040ALA640, MFAOA8B0, max UDMA/133
[  187.066341] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  187.073278] ata2.00: ATA Identify Device Log not supported
[  187.078755] ata2.00: Security Log not supported
[  187.085239] ata2.00: ATA Identify Device Log not supported
[  187.090715] ata2.00: Security Log not supported
[  187.095236] ata2.00: configured for UDMA/133
[  187.136917] scsi 0:0:9:0: Direct-Access ATA  HGST HUS724040AL 
A8B0 PQ: 0 ANSI: 5
[  187.187612] sd 0:0:9:0: [sdb] 7814037168 512-byte logical blocks: 
(4.00 TB/3.64 TiB)

[  187.195365] sd 0:0:9:0: [sdb] Write Protect is off
[  187.200161] sd 0:0:9:0: [sdb] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA

[  187.223844] sd 0:0:9:0: [sdb] Attached SCSI disk
[  187.225498] scsi 0:0:10:0: Direct-Access SanDisk  LT0200MO 
 P404 PQ: 0 ANSI: 6
[  187.243864] sd 0:0:8:0: [sda] 390721968 512-byte logical blocks: (200 
GB/186 GiB)

[  187.285879] sd 0:0:8:0: [sda] Write Protect is off
[  187.367898] sd 0:0:8:0: [sda] Write cache: disabled, read cache: 
disabled, supports DPO and FUA
[  187.524043] scsi 0:0:11:0: Direct-Access SanDisk  LT0200MO 
 P404 PQ: 0 ANSI: 6
[  187.701505] sd 0:0:10:0: [sdc] 390721968 512-byte logical blocks: 
(200 GB/186 GiB)

[  187.743547] sd 0:0:10:0: [sdc] Write Protect is off
[  187.822546] scsi 0:0:12:0: Direct-Access SanDisk  LT0200MO 
 P404 PQ: 0 ANSI: 6
[  187.825531] sd 0:0:10:0: [sdc] Write cache: disabled, read cache: 
disabled, supports DPO and FUA
[  188.000167] sd 0:0:11:0: [sdd] 390721968 512-byte logical blocks: 
(200 GB/186 GiB)

[  188.042205] sd 0:0:11:0: [sdd] Write Protect is off
[  188.121527] scsi 0:0:13:0: Direct-Access SanDisk  LT0200MO 
 P404 PQ: 0 ANSI: 6
[  188.124274] sd 0:0:11:0: [sdd] Write cache: disabled, read cache: 
disabled, supports DPO and FUA
[  188.298942] sd 0:0:12:0: [sde] 390721968 512-byte logical blocks: 
(200 GB/186 GiB)

[  188.340960] sd 0:0:12:0: [sde] Write Protect is off
[  188.420023] scsi 0:0:14:0: Direct-Access SanDisk  LT0200MO 
 P404 PQ: 0 ANSI: 6
[  188.422969] sd 0:0:12:0: [sde] Write cache: disabled, read cache: 
disabled, supports DPO and FUA
[  188.597501] sd 0:0:13:0: [sdf] 390721968 512-byte logical blocks: 
(200 GB/186 GiB)

[  188.605069] sd 0:0:8:0: [sda] Attached SCSI disk
[  188.639520] sd 0:0:13:0: [sdf] Write Protect is off
[  188.682445] scsi 0:0:15:0: Enclosure 12G SAS  Expander 
 RevB PQ: 0 ANSI: 6
[  188.721540] sd 0:0:13:0: [sdf] Write cache: disabled, read cache: 
disabled, supports DPO and FUA
[  188.896399] sd 0:0:14:0: [sdg] 390721968 512-byte logical blocks: 
(200 GB/186 GiB)

[  188.938445] sd 0:0:14:0: [sdg] Write Protect is off
[  189.020444] sd 0:0:14:0: [sdg] Write cache: disabled, read cache: 
disabled, supports DPO and FUA

[  189.060608] sd 0:0:10:0: [sdc] Attached SCSI disk
[  189.359073] sd 0:0:11:0: [sdd] Attached SCSI disk
[  189.657643] sd 0:0:12:0: [sde] Attached SCSI disk
[  189.956585] sd 0:0:13:0: [sdf] Attached SCSI disk
[  190.255148] sd 0:0:14:0: [sdg] Attached SCSI disk

root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  192.895718] hisi_sas_v2_hw HISI0162:01: found dev[8:1] 
is gone

[  192.964671] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
[  193.032744] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
[  193.096755] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
[  193.157072] hisi_sas_v2_hw HISI0162:01:

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread wangyijing



On 2017/7/12 17:59, John Garry wrote:
> On 10/07/2017 08:06, Yijing Wang wrote:
>> This patchset is based on Johannes's patch
>> "scsi: sas: scsi_queue_work can fail, so make callers aware"
>>
>> The libsas hotplug code currently has some issues. Dan Williams reported
>> a similar bug here before:
>> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>>
>> The issues we have found:
>> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>>    may be lost because the same sas event is already pending, so the
>>    libsas topology may end up differing from the hardware.
>> 2. On receiving a phy-down sas event, libsas calls sas_deform_port to
>>    remove devices. It first deletes the sas port, then puts a destruction
>>    discovery event in a new work item and queues it at the tail of the
>>    workqueue. Once the sas port is deleted, its child devices are deleted
>>    too, so when the destruction work starts, it finds that the target
>>    device has already been removed and reports a sysfs warning.
>> 3. Since a hotplug process is divided into several work items, a phy-up
>>    sas event can be inserted between the phy-down work items, like
>>    destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event, which
>>    is not what we expect, and issues occur.
>>
>> The first patch fixes the lost sas events, and the second one introduces
>> wait-complete to fix the hotplug ordering issues.
>>
> 
> I quickly tested this for basic hotplug.
> 
> Before:
> root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
> root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
> root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
> root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
> root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
> root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  102.570694] sysfs group 'power' not found for kobject 
> '0:0:7:0'
> [  102.577250] [ cut here ]
> [  102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 
> sysfs_remove_group+0x8c/0x94
> [  102.590110] Modules linked in:
> [  102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 
> 4.12.0-rc1-00032-g3ab81fc #1907
> [  102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 
> UEFI Nemo 1.7 RC3 06/23/2017
> [  102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
> [  102.615822] task: 8017d4793400 task.stack: 8017b7e7
> [  102.621728] PC is at sysfs_remove_group+0x8c/0x94
> [  102.626419] LR is at sysfs_remove_group+0x8c/0x94
> [  102.631109] pc : [] lr : [] pstate: 
> 6045
> [  102.638490] sp : 8017b7e73b80
> [  102.641791] x29: 8017b7e73b80 x28: 8017db010800
> [  102.647091] x27: 08e27000 x26: 8017d43e6600
> [  102.652390] x25: 8017b828 x24: 0003
> [  102.657689] x23: 8017b78864b0 x22: 8017b784c988
> [  102.662988] x21: 8017b7886410 x20: 08ee9dd0
> [  102.668288] x19:  x18: 08a1b678
> [  102.673587] x17: 000e x16: 0007
> [  102.678886] x15:  x14: 00a3
> [  102.684185] x13: 0033 x12: 0028
> [  102.689484] x11: 08f3be58 x10: 
> [  102.694783] x9 : 043c x8 : 6f6b20726f662064
> [  102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
> [  102.705382] x5 :  x4 : 
> [  102.710681] x3 :  x2 : 08e427e0
> [  102.715980] x1 :  x0 : 0033
> [  102.721279] ---[ end trace c216cc1451d5f7ec ]---
> [  102.725882] Call trace:
> [  102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
> [  102.734742] 39a0:    
> 0001
> [  102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050 
> 
> [  102.750372] 39e0: 8017b78864b0 0003 8017b828 
> 8017d43e6600
> [  102.758188] 3a00: 08e27000 8017db010800 8017d4793400 
> 
> [  102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 
> ffc8
> [  102.773818] 3a40: 8017b7e73a70 0810c12c 0033 
> 
> [  102.781633] 3a60: 08e427e0   
> 
> [  102.7

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread Johannes Thumshirn

On Wed, Jul 12, 2017 at 10:59:27AM +0100, John Garry wrote:
> After:
> ...
> root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
> root@(none)$ [  446.193336] hisi_sas_v2_hw HISI0162:01: found dev[8:1] is
> gone
> [  446.249205] hisi_sas_v2_hw HISI0162:01: found dev[7:1] is gone
> [  446.325201] hisi_sas_v2_hw HISI0162:01: found dev[6:1] is gone
> [  446.373189] hisi_sas_v2_hw HISI0162:01: found dev[5:1] is gone
> [  446.421187] hisi_sas_v2_hw HISI0162:01: found dev[4:1] is gone
> [  446.457232] hisi_sas_v2_hw HISI0162:01: found dev[3:1] is gone
> [  446.477151] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [  446.482373] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result:
> hostbyte=0x04 driverbyte=0x00
> [  446.491238] sd 0:0:1:0: [sdb] Stopping disk
> [  446.495419] sd 0:0:1:0: [sdb] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
> [  446.525227] hisi_sas_v2_hw HISI0162:01: found dev[2:5] is gone
> [  446.569249] hisi_sas_v2_hw HISI0162:01: found dev[1:1] is gone
> [  446.576872] hisi_sas_v2_hw HISI0162:01: found dev[0:2] is gone
> 
> root@(none)$
> 
> So much nicer. BTW, /dev/sdb is a SATA disk, the rest are SAS.

This is awesome. I hope I'll have some time to review the patches themselves
soon.

Johannes

-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-12 Thread John Garry


On 10/07/2017 08:06, Yijing Wang wrote:

> This patchset is based on Johannes's patch
> "scsi: sas: scsi_queue_work can fail, so make callers aware".
>
> libsas hotplug currently has some issues. Dan Williams reported
> a similar bug earlier:
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html
>
> The issues we have found:
> 1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
>    may be lost because an identical sas event is already pending, so the
>    libsas topology may end up differing from the hardware.
> 2. On a phy-down sas event, libsas calls sas_deform_port to remove the
>    devices. It first deletes the sas port, then puts a destruction
>    discovery event in a new work item and queues it at the tail of the
>    workqueue. Once the sas port is deleted, its child devices are deleted
>    too, so when the destruction work starts it finds that the target
>    device has already been removed and reports a sysfs warning.
> 3. Since a hotplug process is divided into several work items, a phy-up
>    sas event can be inserted between the phy-down work items, like
>    destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
>    The hot-remove flow is then broken by the PORTE_BYTES_DMAED event, which
>    is not what we expect, and issues occur.
>
> The first patch fixes the lost sas events, and the second one introduces
> wait-complete to fix the hotplug ordering issues.



I quickly tested this for basic hotplug.

Before:
root@(none)$ echo 0 > ./phy-0:6/sas_phy/phy-0:6/enable
root@(none)$ echo 0 > ./phy-0:5/sas_phy/phy-0:5/enable
root@(none)$ echo 0 > ./phy-0:4/sas_phy/phy-0:4/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:3/enable
root@(none)$ echo 0 > ./phy-0:3/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:2/sas_phy/phy-0:2/enable
root@(none)$ echo 0 > ./phy-0:1/sas_phy/phy-0:1/enable
root@(none)$ echo 0 > ./phy-0:0/sas_phy/phy-0:0/enable
root@(none)$ echo 0 > ./phy-0:7/sas_phy/phy-0:7/enable
root@(none)$ [  102.570694] sysfs group 'power' not found for kobject 
'0:0:7:0'

[  102.577250] ------------[ cut here ]------------
[  102.581861] WARNING: CPU: 3 PID: 1740 at fs/sysfs/group.c:237 
sysfs_remove_group+0x8c/0x94

[  102.590110] Modules linked in:
[  102.593154] CPU: 3 PID: 1740 Comm: kworker/u128:2 Not tainted 
4.12.0-rc1-00032-g3ab81fc #1907
[  102.601664] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 UEFI Nemo 1.7 RC3 06/23/2017

[  102.610784] Workqueue: scsi_wq_0 sas_destruct_devices
[  102.615822] task: 8017d4793400 task.stack: 8017b7e7
[  102.621728] PC is at sysfs_remove_group+0x8c/0x94
[  102.626419] LR is at sysfs_remove_group+0x8c/0x94
[  102.631109] pc : [] lr : [] 
pstate: 6045

[  102.638490] sp : 8017b7e73b80
[  102.641791] x29: 8017b7e73b80 x28: 8017db010800
[  102.647091] x27: 08e27000 x26: 8017d43e6600
[  102.652390] x25: 8017b828 x24: 0003
[  102.657689] x23: 8017b78864b0 x22: 8017b784c988
[  102.662988] x21: 8017b7886410 x20: 08ee9dd0
[  102.668288] x19:  x18: 08a1b678
[  102.673587] x17: 000e x16: 0007
[  102.678886] x15:  x14: 00a3
[  102.684185] x13: 0033 x12: 0028
[  102.689484] x11: 08f3be58 x10: 
[  102.694783] x9 : 043c x8 : 6f6b20726f662064
[  102.700082] x7 : 08e29e08 x6 : 8017fbe34c50
[  102.705382] x5 :  x4 : 
[  102.710681] x3 :  x2 : 08e427e0
[  102.715980] x1 :  x0 : 0033
[  102.721279] ---[ end trace c216cc1451d5f7ec ]---
[  102.725882] Call trace:
[  102.728316] Exception stack(0x8017b7e739b0 to 0x8017b7e73ae0)
[  102.734742] 39a0:    
0001
[  102.742557] 39c0: 8017b7e73b80 08267c44 08bfa050 

[  102.750372] 39e0: 8017b78864b0 0003 8017b828 
8017d43e6600
[  102.758188] 3a00: 08e27000 8017db010800 8017d4793400 

[  102.766003] 3a20: 8017b7e73b80 8017b7e73b80 8017b7e73b40 
ffc8
[  102.773818] 3a40: 8017b7e73a70 0810c12c 0033 

[  102.781633] 3a60: 08e427e0   

[  102.789449] 3a80: 8017fbe34c50 08e29e08 6f6b20726f662064 
043c
[  102.797264] 3aa0:  08f3be58 0028 
0033
[  102.805079] 3ac0: 00a3  0007 
000e

[  102.812895] [] sysfs_remove_group+0x8c/0x94
[  102.818628] [] dpm_sysfs_remove+0x58/0x68
[  102.824188] [] device_del+0xf8/0x2d0
[  102.829312] [] device_unregister+0x14/0x2c
[  102.834959] [] bsg_unregister_queue+0x60/0x98
[  102.840866] [] __scsi_remove_device+0xa0/0xbc



[  151.331854] 3bc0: 081f21ac 803370c

[PATCH v3 0/7] Enhance libsas hotplug feature

2017-07-10 Thread Yijing Wang

This patchset is based on Johannes's patch
"scsi: sas: scsi_queue_work can fail, so make callers aware".

libsas hotplug currently has some issues. Dan Williams reported
a similar bug earlier:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, so the
   libsas topology may end up differing from the hardware.
2. On a phy-down sas event, libsas calls sas_deform_port to remove the
   devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work item and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several work items, a phy-up
   sas event can be inserted between the phy-down work items, like
   destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event, which
   is not what we expect, and issues occur.

The first patch fixes the lost sas events, and the second one introduces
wait-complete to fix the hotplug ordering issues.
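
To illustrate issue 1, here is a small user-space model of the event loss. It is only a sketch and not libsas code: every name in it (static_slots, pooled_queue, the demo_* functions) is made up for the example, and the fixed-size pool is an assumption about the general idea rather than the implementation in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types; the names only echo the events discussed above. */
enum demo_event { DEMO_PHY_DOWN, DEMO_PHY_UP, DEMO_NUM_EVENTS };

/* Model A: one static slot per event type, guarded by a pending flag. */
struct static_slots {
    bool pending[DEMO_NUM_EVENTS];
    int lost;                       /* notifications dropped so far */
};

/* Returns true if the notification was queued, false if it was dropped. */
bool static_notify(struct static_slots *s, enum demo_event ev)
{
    assert(ev < DEMO_NUM_EVENTS);
    if (s->pending[ev]) {           /* same event still waiting: drop it */
        s->lost++;
        return false;
    }
    s->pending[ev] = true;
    return true;
}

/* Model B: every notification takes its own entry from a fixed pool. */
#define DEMO_POOL_SIZE 8

struct pooled_queue {
    enum demo_event entries[DEMO_POOL_SIZE];
    size_t count;
    int lost;                       /* only non-zero if the pool runs dry */
};

bool pooled_notify(struct pooled_queue *q, enum demo_event ev)
{
    assert(ev < DEMO_NUM_EVENTS);
    if (q->count == DEMO_POOL_SIZE) {
        q->lost++;
        return false;
    }
    q->entries[q->count++] = ev;
    return true;
}

/* A down/up/down/up burst arrives before the worker has run at all. */
int demo_burst_static(void)
{
    struct static_slots s = {0};

    static_notify(&s, DEMO_PHY_DOWN);   /* queued */
    static_notify(&s, DEMO_PHY_UP);     /* queued */
    static_notify(&s, DEMO_PHY_DOWN);   /* dropped: still pending */
    static_notify(&s, DEMO_PHY_UP);     /* dropped: still pending */
    return s.lost;
}

int demo_burst_pooled(void)
{
    struct pooled_queue q = {0};

    pooled_notify(&q, DEMO_PHY_DOWN);
    pooled_notify(&q, DEMO_PHY_UP);
    pooled_notify(&q, DEMO_PHY_DOWN);
    pooled_notify(&q, DEMO_PHY_UP);
    return q.lost;
}
```

With the pending flag, the second down/up pair is silently dropped, so the software view can disagree with the hardware. With one entry per notification, events are lost only if the pool itself is exhausted.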

v2->v3: some code improvements suggested by Johannes and John,
split v2 patch 2 into several small patches.
v1->v2: some code improvements suggested by John Garry

Yijing Wang (7):
  libsas: Use static sas event pool to appease sas event lost
  libsas: remove unused port_gone_completion
  libsas: Use new workqueue to run sas event
  libsas: add sas event wait-complete support
  libsas: add a new workqueue to run probe/destruct discovery event
  libsas: add wait-complete support to sync discovery event
  libsas: release disco mutex during waiting in sas_ex_discover_end_dev

 drivers/scsi/libsas/sas_discover.c |  58 +++---
 drivers/scsi/libsas/sas_event.c| 212 -
 drivers/scsi/libsas/sas_expander.c |  22 +++-
 drivers/scsi/libsas/sas_init.c |  21 ++--
 drivers/scsi/libsas/sas_internal.h |  64 +++
 drivers/scsi/libsas/sas_phy.c  |  48 +++--
 drivers/scsi/libsas/sas_port.c |  22 ++--
 include/scsi/libsas.h  |  27 +++--
 8 files changed, 373 insertions(+), 101 deletions(-)

-- 
2.5.0

Re: [PATCH v2 2/2] libsas: Enhance libsas hotplug

2017-06-14 Thread wangyijing

>> In this patch, we try to solve these issues in the following steps:
>> 1. Create a new workqueue to run sas event work, instead of the scsi host
>>    workqueue. Because sas event work may now block, it must not block
>>    normal scsi work. When libsas receives a phy-down event,
>>    sas_deform_port is called, and it now blocks until the destruction
>>    work finishes. In sas_destruct_devices we may wait for the ata error
>>    handler, which can take a long time, so doing everything in the scsi
>>    host workqueue could block other scsi work for too long.
>> 2. Create a new workqueue to run sas discovery event work, instead of the
>>    scsi host workqueue. In some cases, e.g. a revalidate domain event, we
>>    may unregister a sas device and discover a new one; we must
>>    synchronize the execution and wait for the removal to finish before
>>    starting a new discovery. So the probe and destruct discovery events
>>    must run in a new workqueue to avoid deadlock.
>> 3. Introduce an asd_sas_port-level wait-complete and a
>>    sas_discovery-level wait-complete. The former makes the processing of
>>    a sas event atomic, and the latter makes sas discovery synchronous.
>> 4. Remove disco_mutex from sas_revalidate_domain. Since
>>    sas_revalidate_domain now synchronizes the execution of the destruct
>>    discovery event, there is no need to take disco_mutex there.
> 
> The way you've written the changelog suggests this patch should be split
> into 4 patches, each one taking care of one of your change items.

I will split it in next version.

Thanks!
Yijing.


>

Re: [PATCH v2 2/2] libsas: Enhance libsas hotplug

2017-06-14 Thread Johannes Thumshirn

On 06/14/2017 09:33 AM, Yijing Wang wrote:
> libsas completes a hotplug event reported by the LLDD in several work
> items. For example, when libsas receives PHYE_LOSS_OF_SIGNAL, it is
> processed in the following steps:
> 
> notify_phy_event                  [interrupt context]
>   sas_queue_event                 [queue work on shost->work_q]
>     sas_phye_loss_of_signal       [running in shost->work_q]
>       sas_deform_port             [remove sas port]
>         sas_unregister_dev
>           sas_discover_event      [queue destruct work on shost->work_q tail]
> 
> In the case above, the whole hotplug is completed in two work items: the
> sas port is removed first, then the destruction of the device is put in
> another work item and queued at the tail of the workqueue. Since the sas
> port is the parent of the child rphy device, removing the sas port first
> also deletes the child rphy device. When the destruction work runs, it
> finds that the target has already been removed and reports a sysfs
> warning call trace.
> 
> queue tail                                                    queue head
> DISCE_DESTRUCT ---> PORTE_BYTES_DMAED event ---> PHYE_LOSS_OF_SIGNAL[running]
> 
> There are other hotplug issues in the current framework. In the case
> above, if a hot-add sas event is queued between the hot-remove work
> items, the hotplug order is broken and unexpected issues happen.
> 
> In this patch, we try to solve these issues in the following steps:
> 1. Create a new workqueue to run sas event work, instead of the scsi host
>    workqueue. Because sas event work may now block, it must not block
>    normal scsi work. When libsas receives a phy-down event,
>    sas_deform_port is called, and it now blocks until the destruction
>    work finishes. In sas_destruct_devices we may wait for the ata error
>    handler, which can take a long time, so doing everything in the scsi
>    host workqueue could block other scsi work for too long.
> 2. Create a new workqueue to run sas discovery event work, instead of the
>    scsi host workqueue. In some cases, e.g. a revalidate domain event, we
>    may unregister a sas device and discover a new one; we must
>    synchronize the execution and wait for the removal to finish before
>    starting a new discovery. So the probe and destruct discovery events
>    must run in a new workqueue to avoid deadlock.
> 3. Introduce an asd_sas_port-level wait-complete and a
>    sas_discovery-level wait-complete. The former makes the processing of
>    a sas event atomic, and the latter makes sas discovery synchronous.
> 4. Remove disco_mutex from sas_revalidate_domain. Since
>    sas_revalidate_domain now synchronizes the execution of the destruct
>    discovery event, there is no need to take disco_mutex there.

The way you've written the changelog suggests this patch should be split
into 4 patches, each one taking care of one of your change items.

-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

[Resend][PATCH v2 2/2] libsas: Enhance libsas hotplug

2017-06-14 Thread Yijing Wang

libsas completes a hotplug event reported by the LLDD in several work
items. For example, when libsas receives PHYE_LOSS_OF_SIGNAL, it is
processed in the following steps:

notify_phy_event                  [interrupt context]
  sas_queue_event                 [queue work on shost->work_q]
    sas_phye_loss_of_signal       [running in shost->work_q]
      sas_deform_port             [remove sas port]
        sas_unregister_dev
          sas_discover_event      [queue destruct work on shost->work_q tail]

In the case above, the whole hotplug is completed in two work items: the
sas port is removed first, then the destruction of the device is put in
another work item and queued at the tail of the workqueue. Since the sas
port is the parent of the child rphy device, removing the sas port first
also deletes the child rphy device. When the destruction work runs, it
finds that the target has already been removed and reports a sysfs
warning call trace.

queue tail                                                    queue head
DISCE_DESTRUCT ---> PORTE_BYTES_DMAED event ---> PHYE_LOSS_OF_SIGNAL[running]

There are other hotplug issues in the current framework. In the case
above, if a hot-add sas event is queued between the hot-remove work
items, the hotplug order is broken and unexpected issues happen.

In this patch, we try to solve these issues in the following steps:
1. Create a new workqueue to run sas event work, instead of the scsi host
   workqueue. Because sas event work may now block, it must not block
   normal scsi work. When libsas receives a phy-down event,
   sas_deform_port is called, and it now blocks until the destruction
   work finishes. In sas_destruct_devices we may wait for the ata error
   handler, which can take a long time, so doing everything in the scsi
   host workqueue could block other scsi work for too long.
2. Create a new workqueue to run sas discovery event work, instead of the
   scsi host workqueue. In some cases, e.g. a revalidate domain event, we
   may unregister a sas device and discover a new one; we must
   synchronize the execution and wait for the removal to finish before
   starting a new discovery. So the probe and destruct discovery events
   must run in a new workqueue to avoid deadlock.
3. Introduce an asd_sas_port-level wait-complete and a
   sas_discovery-level wait-complete. The former makes the processing of
   a sas event atomic, and the latter makes sas discovery synchronous.
4. Remove disco_mutex from sas_revalidate_domain. Since
   sas_revalidate_domain now synchronizes the execution of the destruct
   discovery event, there is no need to take disco_mutex there.
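
The ordering problem and the effect of waiting for the destruct work can be shown with a tiny single-threaded model. This is a sketch under simplifying assumptions: the FIFO stands in for a workqueue, the "wait" is modelled by draining a second queue in place, and all names (run_hotplug, fifo, the step values) are invented for the example. The real code would use separate workqueues and a completion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum step { STEP_PHY_DOWN, STEP_PHY_UP, STEP_DESTRUCT };

#define MAX_STEPS 8

/* A trivial FIFO standing in for a workqueue. */
struct fifo {
    enum step items[MAX_STEPS];
    size_t head, tail;
};

static void fifo_push(struct fifo *q, enum step s)
{
    assert(q->tail < MAX_STEPS);
    q->items[q->tail++] = s;
}

static int fifo_pop(struct fifo *q, enum step *out)
{
    if (q->head == q->tail)
        return 0;
    *out = q->items[q->head++];
    return 1;
}

/*
 * Runs a phy-down followed by a phy-up and records the execution order in
 * trace: 'D' = phy-down handler, 'U' = phy-up handler, 'X' = destruction.
 * With wait_for_destruct == 0 the destruct work goes to the tail of the
 * same queue; otherwise it runs on a second queue and the phy-down
 * handler waits for it before returning.
 */
void run_hotplug(int wait_for_destruct, char *trace, size_t trace_len)
{
    struct fifo event_q = {0};
    struct fifo disco_q = {0};
    enum step s, d;
    size_t n = 0;

    assert(trace_len >= 4);             /* three steps plus the NUL */

    /* The LLDD reports phy down, then phy up, before any work has run. */
    fifo_push(&event_q, STEP_PHY_DOWN);
    fifo_push(&event_q, STEP_PHY_UP);

    while (fifo_pop(&event_q, &s)) {
        switch (s) {
        case STEP_PHY_DOWN:
            trace[n++] = 'D';
            if (wait_for_destruct) {
                fifo_push(&disco_q, STEP_DESTRUCT);
                while (fifo_pop(&disco_q, &d))  /* "wait" for completion */
                    trace[n++] = 'X';
            } else {
                fifo_push(&event_q, STEP_DESTRUCT);
            }
            break;
        case STEP_PHY_UP:
            trace[n++] = 'U';
            break;
        case STEP_DESTRUCT:
            trace[n++] = 'X';
            break;
        }
    }
    trace[n] = '\0';
}
```

Without the wait, the recorded order is down, up, destruct, which is the broken sequence from the diagram above. With the wait, destruction finishes before the phy-up event is handled.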

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 drivers/scsi/libsas/sas_discover.c | 58 ++--
 drivers/scsi/libsas/sas_event.c|  2 +-
 drivers/scsi/libsas/sas_expander.c |  9 +-
 drivers/scsi/libsas/sas_init.c | 23 +-
 drivers/scsi/libsas/sas_internal.h | 61 ++
 drivers/scsi/libsas/sas_port.c |  4 +++
 include/scsi/libsas.h  |  9 ++
 7 files changed, 148 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
index 60de662..43e8a1e 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct 
*work)
struct domain_device *ddev = port->port_dev;
 
/* prevent revalidation from finding sata links in recovery */
-   mutex_lock(&ha->disco_mutex);
if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
port->id, task_pid_nr(current));
-   goto out;
+   return;
}
 
clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
@@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct 
*work)
 
SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
port->id, task_pid_nr(current), res);
- out:
-   mutex_unlock(&ha->disco_mutex);
+}
+
+static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
+   [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
+   [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
+   [DISCE_PROBE] = sas_probe_devices,
+   [DISCE_SUSPEND] = sas_suspend_devices,
+   [DISCE_RESUME] = sas_resume_devices,
+   [DISCE_DESTRUCT] = sas_destruct_devices,
+};
+
+/* a simple wrapper for sas discover event funtions */
+static void sas_discover_common_fn(struct work_struct *work)
+{
+   struct sas_discovery_event *ev = to_sas_discovery_event(work);
+   struct asd_sas_port *port = ev->port;
+
+   sas_event_fns[ev->type](work);
+   sas_unbusy_port(port);
 }
 
 /* -- Events -- */
 
 stat

[Resend][PATCH v2 0/2] Enhance libsas hotplug feature

2017-06-14 Thread Yijing Wang

libsas hotplug currently has some issues. Dan Williams reported
a similar bug earlier:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, so the
   libsas topology may end up differing from the hardware.
2. On a phy-down sas event, libsas calls sas_deform_port to remove the
   devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work item and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several work items, a phy-up
   sas event can be inserted between the phy-down work items, like
   destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event, which
   is not what we expect, and issues occur.

The first patch fixes the lost sas events, and the second one introduces
wait-complete to fix the hotplug ordering issues.

v1->v2: some code improvements suggested by John Garry

Yijing Wang (2):
  libsas: Don't process sas events in static works
  libsas: Enhance libsas hotplug

 drivers/scsi/libsas/sas_discover.c | 58 +---
 drivers/scsi/libsas/sas_event.c| 90 ++
 drivers/scsi/libsas/sas_expander.c |  9 +++-
 drivers/scsi/libsas/sas_init.c | 29 +---
 drivers/scsi/libsas/sas_internal.h | 64 +++
 drivers/scsi/libsas/sas_phy.c  | 45 ---
 drivers/scsi/libsas/sas_port.c | 22 +-
 include/scsi/libsas.h  | 19 
 8 files changed, 232 insertions(+), 104 deletions(-)

-- 
2.5.0

[PATCH v2 2/2] libsas: Enhance libsas hotplug

2017-06-14 Thread Yijing Wang

libsas completes a hotplug event reported by the LLDD in several work
items. For example, when libsas receives PHYE_LOSS_OF_SIGNAL, it is
processed in the following steps:

notify_phy_event                  [interrupt context]
  sas_queue_event                 [queue work on shost->work_q]
    sas_phye_loss_of_signal       [running in shost->work_q]
      sas_deform_port             [remove sas port]
        sas_unregister_dev
          sas_discover_event      [queue destruct work on shost->work_q tail]

In the case above, the whole hotplug is completed in two work items: the
sas port is removed first, then the destruction of the device is put in
another work item and queued at the tail of the workqueue. Since the sas
port is the parent of the child rphy device, removing the sas port first
also deletes the child rphy device. When the destruction work runs, it
finds that the target has already been removed and reports a sysfs
warning call trace.

queue tail                                                    queue head
DISCE_DESTRUCT ---> PORTE_BYTES_DMAED event ---> PHYE_LOSS_OF_SIGNAL[running]

There are other hotplug issues in the current framework. In the case
above, if a hot-add sas event is queued between the hot-remove work
items, the hotplug order is broken and unexpected issues happen.

In this patch, we try to solve these issues in the following steps:
1. Create a new workqueue to run sas event work, instead of the scsi host
   workqueue. Because sas event work may now block, it must not block
   normal scsi work. When libsas receives a phy-down event,
   sas_deform_port is called, and it now blocks until the destruction
   work finishes. In sas_destruct_devices we may wait for the ata error
   handler, which can take a long time, so doing everything in the scsi
   host workqueue could block other scsi work for too long.
2. Create a new workqueue to run sas discovery event work, instead of the
   scsi host workqueue. In some cases, e.g. a revalidate domain event, we
   may unregister a sas device and discover a new one; we must
   synchronize the execution and wait for the removal to finish before
   starting a new discovery. So the probe and destruct discovery events
   must run in a new workqueue to avoid deadlock.
3. Introduce an asd_sas_port-level wait-complete and a
   sas_discovery-level wait-complete. The former makes the processing of
   a sas event atomic, and the latter makes sas discovery synchronous.
4. Remove disco_mutex from sas_revalidate_domain. Since
   sas_revalidate_domain now synchronizes the execution of the destruct
   discovery event, there is no need to take disco_mutex there.

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 drivers/scsi/libsas/sas_discover.c | 58 ++--
 drivers/scsi/libsas/sas_event.c|  2 +-
 drivers/scsi/libsas/sas_expander.c |  9 +-
 drivers/scsi/libsas/sas_init.c | 23 +-
 drivers/scsi/libsas/sas_internal.h | 61 ++
 drivers/scsi/libsas/sas_port.c |  4 +++
 include/scsi/libsas.h  |  9 ++
 7 files changed, 148 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
index 60de662..43e8a1e 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct 
*work)
struct domain_device *ddev = port->port_dev;
 
/* prevent revalidation from finding sata links in recovery */
-   mutex_lock(&ha->disco_mutex);
if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
port->id, task_pid_nr(current));
-   goto out;
+   return;
}
 
clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
@@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct 
*work)
 
SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
port->id, task_pid_nr(current), res);
- out:
-   mutex_unlock(&ha->disco_mutex);
+}
+
+static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
+   [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
+   [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
+   [DISCE_PROBE] = sas_probe_devices,
+   [DISCE_SUSPEND] = sas_suspend_devices,
+   [DISCE_RESUME] = sas_resume_devices,
+   [DISCE_DESTRUCT] = sas_destruct_devices,
+};
+
+/* a simple wrapper for sas discover event funtions */
+static void sas_discover_common_fn(struct work_struct *work)
+{
+   struct sas_discovery_event *ev = to_sas_discovery_event(work);
+   struct asd_sas_port *port = ev->port;
+
+   sas_event_fns[ev->type](work);
+   sas_unbusy_port(port);
 }
 
 /* -- Events -- */
 
 stat
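
The dispatch-table wrapper in the hunk above follows a common pattern: look up the handler by event type, run it, then release the port. Below is a stand-alone sketch of that pattern. It is not the libsas code: the demo_* names are hypothetical, and it assumes that sas_unbusy_port drops some per-port busy count, which the excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types mirroring a few of the DISCE_* values. */
enum demo_disc_event {
    DEMO_DISCOVER,
    DEMO_REVALIDATE,
    DEMO_PROBE,
    DEMO_DESTRUCT,
    DEMO_NUM
};

struct demo_port {
    int busy;           /* assumed meaning: events queued but not finished */
    int last_handled;   /* records which handler ran, for the example only */
};

struct demo_event {
    enum demo_disc_event type;
    struct demo_port *port;
};

typedef void (*demo_fn)(struct demo_event *ev);

static void demo_discover(struct demo_event *ev)   { ev->port->last_handled = DEMO_DISCOVER; }
static void demo_revalidate(struct demo_event *ev) { ev->port->last_handled = DEMO_REVALIDATE; }
static void demo_probe(struct demo_event *ev)      { ev->port->last_handled = DEMO_PROBE; }
static void demo_destruct(struct demo_event *ev)   { ev->port->last_handled = DEMO_DESTRUCT; }

/* Designated initializers keep the table in sync with the enum. */
static const demo_fn demo_fns[DEMO_NUM] = {
    [DEMO_DISCOVER]   = demo_discover,
    [DEMO_REVALIDATE] = demo_revalidate,
    [DEMO_PROBE]      = demo_probe,
    [DEMO_DESTRUCT]   = demo_destruct,
};

/* Mark the port busy when an event is queued for it. */
void demo_queue(struct demo_event *ev)
{
    ev->port->busy++;
}

/* Common wrapper: run the type-specific handler, then release the port. */
void demo_common_fn(struct demo_event *ev)
{
    assert(ev->type < DEMO_NUM && demo_fns[ev->type] != NULL);
    demo_fns[ev->type](ev);
    ev->port->busy--;
}
```

Routing every handler through one wrapper means the release step cannot be forgotten when a new event type is added; only the table needs a new entry.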

[PATCH v2 0/2] Enhance libsas hotplug feature

2017-06-14 Thread Yijing Wang

libsas hotplug currently has some issues. Dan Williams reported
a similar bug earlier:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because an identical sas event is already pending, so the
   libsas topology may end up differing from the hardware.
2. On a phy-down sas event, libsas calls sas_deform_port to remove the
   devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work item and queues it at the tail of the
   workqueue. Once the sas port is deleted, its child devices are deleted
   too, so when the destruction work starts it finds that the target
   device has already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several work items, a phy-up
   sas event can be inserted between the phy-down work items, like
   destruction work ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event, which
   is not what we expect, and issues occur.

The first patch fixes the lost sas events, and the second one introduces
wait-complete to fix the hotplug ordering issues.

v1->v2: some code improvements suggested by John Garry

Yijing Wang (2):
  libsas: Don't process sas events in static works
  libsas: Enhance libsas hotplug

 drivers/scsi/libsas/sas_discover.c | 58 +---
 drivers/scsi/libsas/sas_event.c| 90 ++
 drivers/scsi/libsas/sas_expander.c |  9 +++-
 drivers/scsi/libsas/sas_init.c | 29 +---
 drivers/scsi/libsas/sas_internal.h | 64 +++
 drivers/scsi/libsas/sas_phy.c  | 45 ---
 drivers/scsi/libsas/sas_port.c | 22 +-
 include/scsi/libsas.h  | 19 
 8 files changed, 232 insertions(+), 104 deletions(-)

-- 
2.5.0

Re: [PATCH] scsi: lpfc: Fix crash on PCI hotplug remove path

2017-05-31 Thread James Smart

Actually, I think we solved this in a better manner in this patch in the
11.4.0.0 patch set:
  [PATCH 10/15] lpfc: Fix crash on powering off BFS VM with passthrough
  device

  http://marc.info/?l=linux-scsi&m=149621070910290&w=2

See if the above patch fixes your error.

-- james



On 5/29/2017 4:11 PM, James Smart wrote:

looks good

Signed-off-by: James Smart  <james.sm...@broadcom.com>

-- james



On 5/28/2017 2:45 PM, Guilherme G. Piccoli wrote:

During a PCI hotplug remove event we could have a NULL pointer
dereference on lpfc_sli_abort_iocb(), if pring is NULL. This
patch adds a check for this case and is able to circumvent the
failure and continue the hotplug remove process with success.

This issue was introduced after the driver refactor made on
commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications").

Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
Reported-by: Naresh Bannoth <nbann...@in.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpicc...@linux.vnet.ibm.com>
---
This patch was rebased against Martin's 4.12/scsi-fixes.

Re: [PATCH] scsi: lpfc: Fix crash on PCI hotplug remove path

2017-05-29 Thread James Smart


looks good

Signed-off-by: James Smart  <james.sm...@broadcom.com>

-- james



On 5/28/2017 2:45 PM, Guilherme G. Piccoli wrote:

During a PCI hotplug remove event we could have a NULL pointer
dereference on lpfc_sli_abort_iocb(), if pring is NULL. This
patch adds a check for this case and is able to circumvent the
failure and continue the hotplug remove process with success.

This issue was introduced after the driver refactor made on
commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications").

Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
Reported-by: Naresh Bannoth <nbann...@in.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpicc...@linux.vnet.ibm.com>
---
This patch was rebased against Martin's 4.12/scsi-fixes.

Re: [PATCH] scsi: lpfc: Fix crash on PCI hotplug remove path

2017-05-29 Thread Raphael Philipe Mendes da Silva

On Mon, May 29, 2017 at 09:56:09AM +0200, Johannes Thumshirn wrote:
> On 05/28/2017 11:45 PM, Guilherme G. Piccoli wrote:
> > During a PCI hotplug remove event we could have a NULL pointer
> > dereference on lpfc_sli_abort_iocb(), if pring is NULL. This
> > patch adds a check for this case and is able to circumvent the
> > failure and continue the hotplug remove process with success.
> > 
> > This issue was introduced after the driver refactor made on
> > commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications").
> > 
> > Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
> > Reported-by: Naresh Bannoth <nbann...@in.ibm.com>
> > Signed-off-by: Guilherme G. Piccoli <gpicc...@linux.vnet.ibm.com>
> > ---
> 
> Looks good,
> Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>

Tested-by: Raphael Silva <rapha...@linux.vnet.ibm.com>

Re: [PATCH] scsi: lpfc: Fix crash on PCI hotplug remove path

2017-05-29 Thread Johannes Thumshirn

On 05/28/2017 11:45 PM, Guilherme G. Piccoli wrote:
> During a PCI hotplug remove event we could have a NULL pointer
> dereference on lpfc_sli_abort_iocb(), if pring is NULL. This
> patch adds a check for this case and is able to circumvent the
> failure and continue the hotplug remove process with success.
> 
> This issue was introduced after the driver refactor made on
> commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications").
> 
> Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
> Reported-by: Naresh Bannoth <nbann...@in.ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpicc...@linux.vnet.ibm.com>
> ---

Looks good,
Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumsh...@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

[PATCH] scsi: lpfc: Fix crash on PCI hotplug remove path

2017-05-28 Thread Guilherme G. Piccoli

During a PCI hotplug remove event we could have a NULL pointer
dereference on lpfc_sli_abort_iocb(), if pring is NULL. This
patch adds a check for this case and is able to circumvent the
failure and continue the hotplug remove process with success.

This issue was introduced after the driver refactor made on
commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications").

Fixes: 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications")
Reported-by: Naresh Bannoth <nbann...@in.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpicc...@linux.vnet.ibm.com>
---
This patch was rebased against Martin's 4.12/scsi-fixes.

 drivers/scsi/lpfc/lpfc_sli.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index d6b184839bc2..134c60a66fb8 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -11003,9 +11003,13 @@ lpfc_sli_abort_iocb(struct lpfc_vport *vport, struct lpfc_sli_ring *pring,
 
 		/* Setup callback routine and issue the command. */
 		abtsiocb->iocb_cmpl = lpfc_sli_abort_fcp_cmpl;
-		ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
-					      abtsiocb, 0);
-		if (ret_val == IOCB_ERROR) {
+
+		/* In PCI hotplug remove path, pring might be NULL */
+		if (pring)
+			ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
+						      abtsiocb, 0);
+
+		if (!pring || ret_val == IOCB_ERROR) {
 			lpfc_sli_release_iocbq(phba, abtsiocb);
 			errcnt++;
 			continue;
-- 
2.12.0.rc0
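
The guard added by this patch can be illustrated outside the kernel. What follows is a minimal user-space sketch, not driver code: struct ring, issue_abort() and abort_all() are hypothetical stand-ins for struct lpfc_sli_ring, lpfc_sli_issue_iocb() and the loop in lpfc_sli_abort_iocb(). It only shows the idea that a NULL ring is treated like a failed issue, so the request is released and counted rather than dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lpfc_sli_ring; not the driver type. */
struct ring {
	int ringno;
};

enum { ISSUE_OK = 0, ISSUE_ERROR = 1 };

/* Pretend to issue an abort on ring 'ringno'; a negative number fails. */
static int issue_abort(int ringno)
{
	return ringno >= 0 ? ISSUE_OK : ISSUE_ERROR;
}

/*
 * Try to abort 'count' requests on 'pring' and return how many could
 * not be issued. '*released' counts requests handed back on failure.
 */
static int abort_all(const struct ring *pring, int count, int *released)
{
	int errcnt = 0;

	assert(released != NULL);

	for (int i = 0; i < count; i++) {
		int ret = ISSUE_ERROR;

		/* On the hotplug remove path the ring may already be gone. */
		if (pring)
			ret = issue_abort(pring->ringno);

		if (!pring || ret == ISSUE_ERROR) {
			(*released)++;	/* give the request back, do not leak it */
			errcnt++;
			continue;
		}
	}
	return errcnt;
}
```

Calling abort_all() with a valid ring reports no errors, while calling it with NULL counts every request as an error and releases each one, which is the behaviour the patch aims for on the remove path.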

Re: [PATCH 2/2] libsas: Enhance libsas hotplug

2017-05-25 Thread wangyijing

Hi John, thanks for your review and comments!

On 2017/5/25 17:04, John Garry wrote:
> Hi,
> 
> There are some comments, inline.
> 
> In general, if it works, it looks ok.
> 
> Other reviews would be greatly appreciated - Hannes, Christoph, Johannes, Dan 
> - please.
> 
>> Libsas complete a hotplug event notified by LLDD in several works,
>> for example, if libsas receive a PHYE_LOSS_OF_SIGNAL, we process it
>> in following steps:
>>
>> notify_phy_event[interrupt context]
>> sas_queue_event[queue work on shost->work_q]
>> sas_phye_loss_of_signal[running in shost->work_q]
>> sas_deform_port[remove sas port]
>> sas_unregister_dev
>> sas_discover_event    [queue destruct work on 
>> shost->work_q tail]
>>
>> In above case, complete whole hotplug in two works, remove sas port first, 
>> then
>> put the destruction of device in another work and queue it on in the tail of
>> workqueue, since sas port is the parent of the children rphy device, so if 
>> remove
>> sas port first, the children rphy device would also be deleted, when the 
>> destruction
>> work coming, it would find the target has been removed already, and report a
>> sysfs warning calltrace.
>>
>> queue tail queue head
>> DISCE_DESTRUCT> PORTE_BYTES_DMAED event 
>> ->PHYE_LOSS_OF_SIGNAL[running]
>>
>> There are other hotplug issues in current framework, in above case, if there 
>> is
>> hotadd sas event queued between hotremove works, the hotplug order would be 
>> broken
>> and unexpected issues would happen.
>>
>> In this patch, we try to solve these issues in following steps:
>> 1. create a new workqueue used to run sas event work, instead of scsi host 
>> workqueue,
>>because we may block sas event work, we cannot block the normal scsi 
>> works.
> 
> What do we block the event work for?

When libsas receives a phy down event, sas_deform_port is called. With this
patch, sas_deform_port blocks and waits for the destruction work to finish.
In sas_destruct_devices we may wait for the ata error handler, which can take
a long time, so if everything ran in the scsi host workqueue, libsas could
block other scsi works for too long.

> 
>> 2. create a new workqueue used to run sas discovery events work, instead of 
>> scsi host
>>workqueue, because in some cases, eg. in revalidate domain event, we may 
>> unregister
>>a sas device and discover new one, we must sync the execution, wait the 
>> remove process
>>finish, then start a new discovery. So we must put the probe and destruct 
>> discovery
>>events in a new workqueue to avoid deadlock.
>> 3. introudce a asd_sas_port level wait-complete and a sas_discovery level 
>> wait-complete
>>we use former wait-complete to achieve a sas event atomic process and use 
>> latter to
>>make a sas discovery sync.
>> 4. remove disco_mutex in sas_revalidate_domain, since now 
>> sas_revalidate_domain sync
>>the destruct discovery event execution, it's no need to lock disco mutex 
>> there.
>>
>> Signed-off-by: Yijing Wang <wangyij...@huawei.com>
>> ---
>>  drivers/scsi/libsas/sas_discover.c | 58 
>> --
>>  drivers/scsi/libsas/sas_event.c|  2 +-
>>  drivers/scsi/libsas/sas_expander.c |  9 +-
>>  drivers/scsi/libsas/sas_init.c | 31 +++-
>>  drivers/scsi/libsas/sas_internal.h | 50 
>>  drivers/scsi/libsas/sas_port.c |  4 +++
>>  include/scsi/libsas.h  | 11 +++-
>>  7 files changed, 146 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c 
>> b/drivers/scsi/libsas/sas_discover.c
>> index 60de662..43e8a1e 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct 
>> *work)
>>  struct domain_device *ddev = port->port_dev;
>>
>>  /* prevent revalidation from finding sata links in recovery */
>> -mutex_lock(&ha->disco_mutex);
>>  if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
>>  SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
>>  port->id, task_pid_nr(current));
>> -goto out;
>> +return;
>>  }
>>
>>  clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
>> @@ -5

Re: [PATCH 2/2] libsas: Enhance libsas hotplug

2017-05-25 Thread John Garry


Hi,

There are some comments, inline.

In general, if it works, it looks ok.

Other reviews would be greatly appreciated - Hannes, Christoph, 
Johannes, Dan - please.


> Libsas complete a hotplug event notified by LLDD in several works,
> for example, if libsas receive a PHYE_LOSS_OF_SIGNAL, we process it
> in following steps:
>
> notify_phy_event          [interrupt context]
> sas_queue_event           [queue work on shost->work_q]
> sas_phye_loss_of_signal   [running in shost->work_q]
> sas_deform_port           [remove sas port]
> sas_unregister_dev
> sas_discover_event        [queue destruct work on shost->work_q tail]
>
> In above case, complete whole hotplug in two works, remove sas port first, then
> put the destruction of device in another work and queue it on in the tail of
> workqueue, since sas port is the parent of the children rphy device, so if remove
> sas port first, the children rphy device would also be deleted, when the destruction
> work coming, it would find the target has been removed already, and report a
> sysfs warning calltrace.
>
> queue tail queue head
> DISCE_DESTRUCT> PORTE_BYTES_DMAED event ->PHYE_LOSS_OF_SIGNAL[running]
>
> There are other hotplug issues in current framework, in above case, if there is
> hotadd sas event queued between hotremove works, the hotplug order would be broken
> and unexpected issues would happen.
>
> In this patch, we try to solve these issues in following steps:
> 1. create a new workqueue used to run sas event work, instead of scsi host workqueue,
>    because we may block sas event work, we cannot block the normal scsi works.


What do we block the event work for?

> 2. create a new workqueue used to run sas discovery events work, instead of scsi host
>    workqueue, because in some cases, eg. in revalidate domain event, we may unregister
>    a sas device and discover new one, we must sync the execution, wait the remove process
>    finish, then start a new discovery. So we must put the probe and destruct discovery
>    events in a new workqueue to avoid deadlock.
> 3. introudce a asd_sas_port level wait-complete and a sas_discovery level wait-complete
>    we use former wait-complete to achieve a sas event atomic process and use latter to
>    make a sas discovery sync.
> 4. remove disco_mutex in sas_revalidate_domain, since now sas_revalidate_domain sync
>    the destruct discovery event execution, it's no need to lock disco mutex there.

>
> Signed-off-by: Yijing Wang <wangyij...@huawei.com>
> ---
>  drivers/scsi/libsas/sas_discover.c | 58 
--

>  drivers/scsi/libsas/sas_event.c|  2 +-
>  drivers/scsi/libsas/sas_expander.c |  9 +-
>  drivers/scsi/libsas/sas_init.c | 31 +++-
>  drivers/scsi/libsas/sas_internal.h | 50 
>  drivers/scsi/libsas/sas_port.c |  4 +++
>  include/scsi/libsas.h  | 11 +++-
>  7 files changed, 146 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c

> index 60de662..43e8a1e 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct 
work_struct *work)

>  struct domain_device *ddev = port->port_dev;
>
>  /* prevent revalidation from finding sata links in recovery */
> -mutex_lock(&ha->disco_mutex);
>  if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
>  SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
>  port->id, task_pid_nr(current));
> -goto out;
> +return;
>  }
>
>  clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
> @@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct 
work_struct *work)

>
>  SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 
0x%x\n",

>  port->id, task_pid_nr(current), res);
> - out:
> -mutex_unlock(&ha->disco_mutex);
> +}
> +
> +static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
> +[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
> +[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
> +[DISCE_PROBE] = sas_probe_devices,
> +[DISCE_SUSPEND] = sas_suspend_devices,
> +[DISCE_RESUME] = sas_resume_devices,
> +[DISCE_DESTRUCT] = sas_destruct_devices,
> +};
> +
> +/* a simple wrapper for sas discover event funtions */
> +static void sas_discover_common_fn(struct work_struct *work)
> +{
> +struct sas_discovery_event *ev = to_sa

[PATCH 2/2] libsas: Enhance libsas hotplug

2017-05-20 Thread Yijing Wang

Libsas completes a hotplug event notified by the LLDD in several works.
For example, when libsas receives a PHYE_LOSS_OF_SIGNAL, it is processed
in the following steps:

notify_phy_event          [interrupt context]
sas_queue_event           [queue work on shost->work_q]
sas_phye_loss_of_signal   [running in shost->work_q]
sas_deform_port           [remove sas port]
sas_unregister_dev
sas_discover_event        [queue destruct work on shost->work_q tail]

In the above case the whole hotplug is completed in two works: the sas port is
removed first, then the destruction of the device is put in another work and
queued at the tail of the workqueue. Since the sas port is the parent of the
child rphy devices, removing the sas port first also deletes the child rphy
devices. When the destruction work runs, it finds that the target has already
been removed and reports a sysfs warning calltrace.

queue tail                                                  queue head
DISCE_DESTRUCT -> PORTE_BYTES_DMAED event -> PHYE_LOSS_OF_SIGNAL [running]

There are other hotplug issues in the current framework. In the above case, if
a hot-add sas event is queued between the hot-remove works, the hotplug order
is broken and unexpected issues can happen.

In this patch, we try to solve these issues in the following steps:
1. Create a new workqueue to run sas event work instead of the scsi host
   workqueue. Because we may block sas event work, and we cannot block the
   normal scsi works.
2. Create a new workqueue to run sas discovery event work instead of the scsi
   host workqueue. In some cases, e.g. in a revalidate domain event, we may
   unregister a sas device and discover a new one, so we must sync the
   execution: wait for the remove process to finish, then start a new
   discovery. The probe and destruct discovery events must therefore run in a
   new workqueue to avoid deadlock.
3. Introduce an asd_sas_port level wait-complete and a sas_discovery level
   wait-complete. The former makes the processing of a sas event atomic and
   the latter makes sas discovery synchronous.
4. Remove disco_mutex in sas_revalidate_domain. Since sas_revalidate_domain
   now syncs the execution of the destruct discovery event, there is no need
   to lock disco_mutex there.
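
The ordering problem described above can be reproduced with a toy FIFO in user space. This is only a sketch under simplifying assumptions: struct fifo, fifo_push(), fifo_pop() and run_queue() are hypothetical names, and waiting for the destruct work to complete is modelled by running the destruct step inline rather than with a second workqueue and a completion as the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* The three works from the description above. */
enum work { LOSS_OF_SIGNAL, BYTES_DMAED, DESTRUCT };

#define CAP 8

/* Toy FIFO standing in for a workqueue; hypothetical, not a kernel type. */
struct fifo {
	enum work item[CAP];
	int head, tail;
};

static void fifo_push(struct fifo *q, enum work w)
{
	assert(q->tail < CAP);
	q->item[q->tail++] = w;
}

static bool fifo_pop(struct fifo *q, enum work *w)
{
	if (q->head == q->tail)
		return false;
	*w = q->item[q->head++];
	return true;
}

/*
 * Process a phy-down followed by a phy-up and record the execution
 * order in 'order'. With 'sync_destruct' the phy-down handler finishes
 * the destruct step before returning; without it the destruct work is
 * queued at the tail, behind the already queued phy-up.
 */
static int run_queue(bool sync_destruct, enum work order[CAP])
{
	struct fifo q = { .head = 0, .tail = 0 };
	enum work w;
	int n = 0;

	fifo_push(&q, LOSS_OF_SIGNAL);
	fifo_push(&q, BYTES_DMAED);

	while (fifo_pop(&q, &w)) {
		order[n++] = w;
		if (w == LOSS_OF_SIGNAL) {
			if (sync_destruct)
				order[n++] = DESTRUCT;
			else
				fifo_push(&q, DESTRUCT);
		}
	}
	return n;
}
```

In this model the unsynchronized run executes LOSS_OF_SIGNAL, BYTES_DMAED, DESTRUCT, so the hot-add lands in the middle of the hot-remove, while the synchronized run executes LOSS_OF_SIGNAL, DESTRUCT, BYTES_DMAED.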

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 drivers/scsi/libsas/sas_discover.c | 58 --
 drivers/scsi/libsas/sas_event.c|  2 +-
 drivers/scsi/libsas/sas_expander.c |  9 +-
 drivers/scsi/libsas/sas_init.c | 31 +++-
 drivers/scsi/libsas/sas_internal.h | 50 
 drivers/scsi/libsas/sas_port.c |  4 +++
 include/scsi/libsas.h  | 11 +++-
 7 files changed, 146 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
index 60de662..43e8a1e 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct 
*work)
struct domain_device *ddev = port->port_dev;
 
/* prevent revalidation from finding sata links in recovery */
-   mutex_lock(&ha->disco_mutex);
if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
port->id, task_pid_nr(current));
-   goto out;
+   return;
}
 
clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
@@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct 
*work)
 
SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
port->id, task_pid_nr(current), res);
- out:
-   mutex_unlock(&ha->disco_mutex);
+}
+
+static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
+   [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
+   [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
+   [DISCE_PROBE] = sas_probe_devices,
+   [DISCE_SUSPEND] = sas_suspend_devices,
+   [DISCE_RESUME] = sas_resume_devices,
+   [DISCE_DESTRUCT] = sas_destruct_devices,
+};
+
+/* a simple wrapper for sas discover event funtions */
+static void sas_discover_common_fn(struct work_struct *work)
+{
+   struct sas_discovery_event *ev = to_sas_discovery_event(work);
+   struct asd_sas_port *port = ev->port;
+
+   sas_event_fns[ev->type](work);
+   sas_unbusy_port(port);
 }
 
 /* -- Events -- */
 
 static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
 {
+   int ret;
+   struct sas_discovery_event *ev = to_sas_discovery_event(&sw->work);
+   struct asd_sas_port *port = ev->port;
+
/* chained work is not subject to SA_HA_DRAINING or
 * SAS_HA_REGISTERED, because it is e

[PATCH 0/2] Enhance libsas hotplug feature

2017-05-20 Thread Yijing Wang

The libsas hotplug code currently has some issues. Dan Williams reported
a similar bug before:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because the same sas event is already pending, so the libsas
   topology may end up different from the hardware.
2. On receiving a phy down sas event, libsas calls sas_deform_port to remove
   devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the workqueue.
   Once the sas port is deleted, its child devices are deleted too, so
   when the destruction work starts, it finds that the target device has
   already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy up
   sas event can be inserted between the phy down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---> PHYE_LOSS_OF_SIGNAL
   The hot remove flow is then broken by the PORTE_BYTES_DMAED event, which
   is not what we expect, and issues occur.

The first patch fixes the lost sas events, and the second one introduces
wait-complete to fix the hotplug order issues.
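
Issue 1 can be modelled in a few lines of user-space C. The sketch below is an illustration of the description only, since patch 1/2 itself is not reproduced here: struct static_events, struct event_queue, notify_static() and notify_queued() are hypothetical names. It contrasts one pending bit per event type, where a repeated notification is dropped, with one queue entry per notification.

```c
#include <assert.h>
#include <stdbool.h>

enum phy_event { PHY_UP, PHY_DOWN, NUM_EVENTS };

/* Static-work model: one pending bit per event type. */
struct static_events {
	unsigned int pending;
};

/* Returns false when the same event type is still pending (event lost). */
static bool notify_static(struct static_events *s, enum phy_event ev)
{
	unsigned int bit;

	assert(ev < NUM_EVENTS);
	bit = 1u << ev;
	if (s->pending & bit)
		return false;
	s->pending |= bit;
	return true;
}

#define QUEUE_CAP 16

/* Per-notification model: every event gets its own queue slot. */
struct event_queue {
	enum phy_event ev[QUEUE_CAP];
	int len;
};

/* Returns false only when the queue itself is full. */
static bool notify_queued(struct event_queue *q, enum phy_event ev)
{
	assert(ev < NUM_EVENTS);
	if (q->len == QUEUE_CAP)
		return false;
	q->ev[q->len++] = ev;
	return true;
}
```

For a burst of up, down, up, down that arrives before any work has run, the static model keeps only the first two notifications, whereas the queued model keeps all four in order.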

Yijing Wang (2):
  libsas: Don't process sas events in static works
  libsas: Enhance libsas hotplug

 drivers/scsi/libsas/sas_discover.c | 58 +---
 drivers/scsi/libsas/sas_event.c| 90 ++
 drivers/scsi/libsas/sas_expander.c |  9 +++-
 drivers/scsi/libsas/sas_init.c | 37 +---
 drivers/scsi/libsas/sas_internal.h | 53 ++
 drivers/scsi/libsas/sas_phy.c  | 45 ---
 drivers/scsi/libsas/sas_port.c | 22 +-
 include/scsi/libsas.h  | 21 +
 8 files changed, 230 insertions(+), 105 deletions(-)

-- 
2.5.0

Re: [PATCH] aacraid: Fixed expander hotplug for SMART family

2017-02-23 Thread Martin K. Petersen

>>>>> "Raghava" == Raghava Aditya Renukunta 
>>>>> <raghavaaditya.renuku...@microsemi.com> writes:

Raghava> Current driver Hotplug processing code skips over Enclosure
Raghava> channel, therefore any addition/removal of expander enclosure
Raghava> is not processed.  Additionally device addition code relies on
Raghava> older device type, which prevents the hotplug of adapter
Raghava> expanders.

Raghava> Fixed by removing code that skips over Enclosure channels and
Raghava> using the latest device type for addition or removal or
Raghava> enclosure expanders.

Applied to 4.11/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering

RE: [PATCH] aacraid: Fixed expander hotplug for SMART family

2017-02-23 Thread Dave Carroll

> -Original Message-
> From: Raghava Aditya Renukunta
> [mailto:raghavaaditya.renuku...@microsemi.com]
> Sent: Wednesday, February 22, 2017 8:23 AM
> To: j...@linux.vnet.ibm.com; martin.peter...@oracle.com; linux-
> s...@vger.kernel.org
> Cc: Dave Carroll; Gana Sridaran; Scott Benesh
> Subject: [PATCH] aacraid: Fixed expander hotplug for SMART family
> 
> Current driver Hotplug processing code skips over Enclosure channel,
> therefore any addition/removal of expander enclosure is not processed.
> Additionally  device addition code relies on older device type, which
> prevents the hotplug of adapter expanders.
> 
> Fixed by removing code that skips over Enclosure channels and using the
> latest device type for addition or removal or enclosure expanders.
> 
> Fixes: 6223a39fe6fbbeef (scsi: aacraid: Added support for hotplug)
> Signed-off-by: Raghava Aditya Renukunta
> <raghavaaditya.renuku...@microsemi.com>
> ---
>  drivers/scsi/aacraid/commsup.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)

Reviewed-by: Dave Carroll <david.carr...@microsemi.com>

[PATCH] aacraid: Fixed expander hotplug for SMART family

2017-02-22 Thread Raghava Aditya Renukunta

The current driver hotplug processing code skips over the Enclosure channel,
so any addition or removal of an expander enclosure is not processed.
Additionally, the device addition code relies on the older device type, which
prevents the hotplug of adapter expanders.

Fix this by removing the code that skips over Enclosure channels and by using
the latest device type for addition or removal of enclosure expanders.

Fixes: 6223a39fe6fbbeef (scsi: aacraid: Added support for hotplug)
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>
---
 drivers/scsi/aacraid/commsup.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index a2ea70d..1994c74 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1908,9 +1908,6 @@ static void aac_resolve_luns(struct aac_dev *dev)
 	for (bus = 0; bus < AAC_MAX_BUSES; bus++) {
 		for (target = 0; target < AAC_MAX_TARGETS; target++) {
 
-			if (aac_phys_to_logical(bus) == ENCLOSURE_CHANNEL)
-				continue;
-
 			if (bus == CONTAINER_CHANNEL)
 				channel = CONTAINER_CHANNEL;
 			else
@@ -1922,7 +1919,7 @@ static void aac_resolve_luns(struct aac_dev *dev)
 			sdev = scsi_device_lookup(dev->scsi_host_ptr, channel,
 					target, 0);
 
-			if (!sdev && devtype)
+			if (!sdev && new_devtype)
 				scsi_add_device(dev->scsi_host_ptr, channel,
 						target, 0);
 			else if (sdev && new_devtype != devtype)
-- 
2.7.4
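
Why the old test missed a freshly plugged expander can be seen from a small decision function. This is a hedged user-space sketch, not the driver: resolve_lun() and enum lun_action are hypothetical, device types are plain integers with 0 meaning "nothing there", and what the driver actually does in the changed branch is not shown in the hunk above, so it is only labelled LUN_CHANGED here.

```c
#include <assert.h>
#include <stdbool.h>

enum lun_action { LUN_NONE, LUN_ADD, LUN_CHANGED };

/*
 * Decide what to do for one bus/target slot.
 * 'present'     - a scsi device already exists for the slot
 * 'devtype'     - type recorded at the previous scan (0 = none)
 * 'new_devtype' - type reported by the latest scan (0 = none)
 * 'fixed'       - use the post-patch condition for the add branch
 */
static enum lun_action resolve_lun(bool present, int devtype,
				   int new_devtype, bool fixed)
{
	int add_key;

	assert(devtype >= 0 && new_devtype >= 0);

	/* The patch switches the add test from devtype to new_devtype. */
	add_key = fixed ? new_devtype : devtype;

	if (!present && add_key)
		return LUN_ADD;
	if (present && new_devtype != devtype)
		return LUN_CHANGED;
	return LUN_NONE;
}
```

For a slot that was empty at the previous scan (devtype 0) and now reports a device (new_devtype non-zero), the pre-patch condition yields LUN_NONE and the post-patch condition yields LUN_ADD.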

RE: [bug report] scsi: aacraid: Added support for hotplug

2017-02-14 Thread Raghava Aditya Renukunta

Hi Dan,

> -Original Message-
> From: Dan Carpenter [mailto:dan.carpen...@oracle.com]
> Sent: Monday, February 13, 2017 11:44 PM
> To: Raghava Aditya Renukunta
> <raghavaaditya.renuku...@microsemi.com>
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: [bug report] scsi: aacraid: Added support for hotplug
> 
> EXTERNAL EMAIL
> 
> 
> On Mon, Feb 13, 2017 at 07:39:15PM +, Raghava Aditya Renukunta
> wrote:
> > Hi Dan,
> >
> > > -Original Message-
> > > From: Dan Carpenter [mailto:dan.carpen...@oracle.com]
> > > Sent: Monday, February 13, 2017 10:47 AM
> > > To: Raghava Aditya Renukunta
> > > <raghavaaditya.renuku...@microsemi.com>
> > > Cc: linux-scsi@vger.kernel.org
> > > Subject: [bug report] scsi: aacraid: Added support for hotplug
> > >
> > > EXTERNAL EMAIL
> > >
> > >
> > > Hello Raghava Aditya Renukunta,
> > >
> > > The patch 6223a39fe6fb: "scsi: aacraid: Added support for hotplug"
> > > from Feb 2, 2017, leads to the following static checker warning:
> > >
> > > drivers/scsi/aacraid/commsup.c:2243 aac_process_events()
> > > error: double unlock 'spin_lock:t_lock'
> > >
> > > drivers/scsi/aacraid/commsup.c
> > >   2130  spin_lock_irqsave(t_lock, flags);
> > > 
> > >   2131
> > >   2132  while (!list_empty(&(dev->queues-
> > > >queue[HostNormCmdQueue].cmdq))) {
> > >   2133  struct list_head *entry;
> > >   2134  struct aac_aifcmd *aifcmd;
> > >   2135  unsigned int  num;
> > >   2136  struct hw_fib **hw_fib_pool, **hw_fib_p;
> > >   2137  struct fib **fib_pool, **fib_p;
> > >   2138
> > >   2139  set_current_state(TASK_RUNNING);
> > >   2140
> > >   2141  entry = dev->queues-
> > > >queue[HostNormCmdQueue].cmdq.next;
> > >   2142  list_del(entry);
> > >   2143
> > >   2144  t_lock = dev->queues-
> >queue[HostNormCmdQueue].lock;
> > >   2145  spin_unlock_irqrestore(t_lock, flags);
> > > ^
> > >   2146
> > >   2147  fib = list_entry(entry, struct fib, fiblink);
> > >   2148  hw_fib = fib->hw_fib_va;
> > >   2149  if (dev->sa_firmware) {
> > >   2150  /* Thor AIF */
> > >   2151  aac_handle_sa_aif(dev, fib);
> > >   2152  aac_fib_adapter_complete(fib, 
> > > (u16)sizeof(u32));
> > >   2153  continue;
> > >
> > > The locking isn't right here.  We should re-take the spinlock before
> > > continuing.
> >
> > The intention here is to protect the retrieval of entry from the queues.
> > Or do you mean that we should just protect the whole while loop with one
> spin_lock (t_lock)?
> >
> 
> This is a static checker warning that says we call
> spin_unlock_irqrestore(t_lock, flags); at the end of the loop but
> sometimes we're not holding the lock.
> 
> This is a Smatch warning and it doesn't handle loops correctly.  It
> should also warn that on line 2145 we might not be holding the lock
> either but it misses that bug.
> 
> There is no way this continue is correct with regards to locking.
> 
> regards,
> dan carpenter

I agree, I will fix it shortly.

Thank you.

> > Thank you,
> > Raghava Aditya
> >
> > >   2154  }
> > >   2155  /*
> > >   2156   *  We will process the FIB here or pass it 
> > > to a
> > >   2157   *  worker thread that is TBD. We Really can't
> > >   2158   *  do anything at this point since we don't 
> > > have
> > >   2159   *  anything defined for this thread to do.
> > >   2160   */
> > >
> > > [ snip ]
> > >
> > >   2221  free_mem:
> > >     /* Free up the remaining resources */
> > >   2223  hw_fib_p = hw_fib_pool;
> > >   2224  fib_p = fib_pool;
> > >   2225  while (hw_fib_p < _fib_pool[num]) {
> > >   2226

Re: [bug report] scsi: aacraid: Added support for hotplug

2017-02-13 Thread Dan Carpenter

On Mon, Feb 13, 2017 at 07:39:15PM +, Raghava Aditya Renukunta wrote:
> Hi Dan,
> 
> > -Original Message-
> > From: Dan Carpenter [mailto:dan.carpen...@oracle.com]
> > Sent: Monday, February 13, 2017 10:47 AM
> > To: Raghava Aditya Renukunta
> > <raghavaaditya.renuku...@microsemi.com>
> > Cc: linux-scsi@vger.kernel.org
> > Subject: [bug report] scsi: aacraid: Added support for hotplug
> > 
> > EXTERNAL EMAIL
> > 
> > 
> > Hello Raghava Aditya Renukunta,
> > 
> > The patch 6223a39fe6fb: "scsi: aacraid: Added support for hotplug"
> > from Feb 2, 2017, leads to the following static checker warning:
> > 
> > drivers/scsi/aacraid/commsup.c:2243 aac_process_events()
> > error: double unlock 'spin_lock:t_lock'
> > 
> > drivers/scsi/aacraid/commsup.c
> >   2130  spin_lock_irqsave(t_lock, flags);
> > 
> >   2131
> >   2132  while (!list_empty(&(dev->queues-
> > >queue[HostNormCmdQueue].cmdq))) {
> >   2133  struct list_head *entry;
> >   2134  struct aac_aifcmd *aifcmd;
> >   2135  unsigned int  num;
> >   2136  struct hw_fib **hw_fib_pool, **hw_fib_p;
> >   2137  struct fib **fib_pool, **fib_p;
> >   2138
> >   2139  set_current_state(TASK_RUNNING);
> >   2140
> >   2141  entry = dev->queues-
> > >queue[HostNormCmdQueue].cmdq.next;
> >   2142  list_del(entry);
> >   2143
> >   2144  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
> >   2145  spin_unlock_irqrestore(t_lock, flags);
> > ^
> >   2146
> >   2147  fib = list_entry(entry, struct fib, fiblink);
> >   2148  hw_fib = fib->hw_fib_va;
> >   2149  if (dev->sa_firmware) {
> >   2150  /* Thor AIF */
> >   2151  aac_handle_sa_aif(dev, fib);
> >   2152  aac_fib_adapter_complete(fib, 
> > (u16)sizeof(u32));
> >   2153  continue;
> > 
> > The locking isn't right here.  We should re-take the spinlock before
> > continuing.
> 
> The intention here is to protect the retrieval of entry from the queues.
> Or do you mean that we should just protect the whole while loop with one 
> spin_lock (t_lock)?
> 

This is a static checker warning that says we call
spin_unlock_irqrestore(t_lock, flags); at the end of the loop but
sometimes we're not holding the lock.

This is a Smatch warning and it doesn't handle loops correctly.  It
should also warn that on line 2145 we might not be holding the lock
either but it misses that bug.

There is no way this continue is correct with regards to locking.

regards,
dan carpenter

> Thank you,
> Raghava Aditya
> 
> >   2154  }
> >   2155  /*
> >   2156   *  We will process the FIB here or pass it to a
> >   2157   *  worker thread that is TBD. We Really can't
> >   2158   *  do anything at this point since we don't 
> > have
> >   2159   *  anything defined for this thread to do.
> >   2160   */
> > 
> > [ snip ]
> > 
> >   2221  free_mem:
> >     /* Free up the remaining resources */
> >   2223  hw_fib_p = hw_fib_pool;
> >   2224  fib_p = fib_pool;
> >   2225  while (hw_fib_p < _fib_pool[num]) {
> >   2226  kfree(*hw_fib_p);
> >   2227  kfree(*fib_p);
> >   2228  ++fib_p;
> >   2229  ++hw_fib_p;
> >   2230  }
> >   2231  kfree(fib_pool);
> >   2232  free_hw_fib_pool:
> >   2233  kfree(hw_fib_pool);
> >   2234  free_fib:
> >   2235  kfree(fib);
> >   2236  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
> >   2237  spin_lock_irqsave(t_lock, flags);
> >   2238  }
> >   2239  /*
> >   2240   *  There are no more AIF's
> >   2241   */
> >   2242  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
> >   2243  spin_unlock_irqrestore(t_lock, flags);
> > 
> > Otherwise it is a double unlock bug.
> > 
> >   2244  }
> > 
> > 
> > regards,
> > dan carpenter

RE: [bug report] scsi: aacraid: Added support for hotplug

2017-02-13 Thread Raghava Aditya Renukunta

Hi Dan,

> -Original Message-
> From: Dan Carpenter [mailto:dan.carpen...@oracle.com]
> Sent: Monday, February 13, 2017 10:47 AM
> To: Raghava Aditya Renukunta
> <raghavaaditya.renuku...@microsemi.com>
> Cc: linux-scsi@vger.kernel.org
> Subject: [bug report] scsi: aacraid: Added support for hotplug
> 
> EXTERNAL EMAIL
> 
> 
> Hello Raghava Aditya Renukunta,
> 
> The patch 6223a39fe6fb: "scsi: aacraid: Added support for hotplug"
> from Feb 2, 2017, leads to the following static checker warning:
> 
> drivers/scsi/aacraid/commsup.c:2243 aac_process_events()
> error: double unlock 'spin_lock:t_lock'
> 
> drivers/scsi/aacraid/commsup.c
>   2130  spin_lock_irqsave(t_lock, flags);
> 
>   2131
>   2132  while (!list_empty(&(dev->queues-
> >queue[HostNormCmdQueue].cmdq))) {
>   2133  struct list_head *entry;
>   2134  struct aac_aifcmd *aifcmd;
>   2135  unsigned int  num;
>   2136  struct hw_fib **hw_fib_pool, **hw_fib_p;
>   2137  struct fib **fib_pool, **fib_p;
>   2138
>   2139  set_current_state(TASK_RUNNING);
>   2140
>   2141  entry = dev->queues-
> >queue[HostNormCmdQueue].cmdq.next;
>   2142  list_del(entry);
>   2143
>   2144  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
>   2145  spin_unlock_irqrestore(t_lock, flags);
> ^
>   2146
>   2147  fib = list_entry(entry, struct fib, fiblink);
>   2148  hw_fib = fib->hw_fib_va;
>   2149  if (dev->sa_firmware) {
>   2150  /* Thor AIF */
>   2151  aac_handle_sa_aif(dev, fib);
>   2152  aac_fib_adapter_complete(fib, 
> (u16)sizeof(u32));
>   2153  continue;
> 
> The locking isn't right here.  We should re-take the spinlock before
> continuing.

The intention here is to protect the retrieval of entry from the queues.
Or do you mean that we should just protect the whole while loop with one 
spin_lock (t_lock)?

Thank you,
Raghava Aditya

>   2154  }
>   2155  /*
>   2156   *  We will process the FIB here or pass it to a
>   2157   *  worker thread that is TBD. We Really can't
>   2158   *  do anything at this point since we don't have
>   2159   *  anything defined for this thread to do.
>   2160   */
> 
> [ snip ]
> 
>   2221  free_mem:
>     /* Free up the remaining resources */
>   2223  hw_fib_p = hw_fib_pool;
>   2224  fib_p = fib_pool;
>   2225  while (hw_fib_p < _fib_pool[num]) {
>   2226  kfree(*hw_fib_p);
>   2227  kfree(*fib_p);
>   2228  ++fib_p;
>   2229  ++hw_fib_p;
>   2230  }
>   2231  kfree(fib_pool);
>   2232  free_hw_fib_pool:
>   2233  kfree(hw_fib_pool);
>   2234  free_fib:
>   2235  kfree(fib);
>   2236  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
>   2237  spin_lock_irqsave(t_lock, flags);
>   2238  }
>   2239  /*
>   2240   *  There are no more AIF's
>   2241   */
>   2242  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
>   2243  spin_unlock_irqrestore(t_lock, flags);
> 
> Otherwise it is a double unlock bug.
> 
>   2244  }
> 
> 
> regards,
> dan carpenter

[bug report] scsi: aacraid: Added support for hotplug

2017-02-13 Thread Dan Carpenter

Hello Raghava Aditya Renukunta,

The patch 6223a39fe6fb: "scsi: aacraid: Added support for hotplug"
from Feb 2, 2017, leads to the following static checker warning:

drivers/scsi/aacraid/commsup.c:2243 aac_process_events()
error: double unlock 'spin_lock:t_lock'

drivers/scsi/aacraid/commsup.c
  2130  spin_lock_irqsave(t_lock, flags);

  2131  
  2132  while (!list_empty(&(dev->queues->queue[HostNormCmdQueue].cmdq))) {
  2133  struct list_head *entry;
  2134  struct aac_aifcmd *aifcmd;
  2135  unsigned int  num;
  2136  struct hw_fib **hw_fib_pool, **hw_fib_p;
  2137  struct fib **fib_pool, **fib_p;
  2138  
  2139  set_current_state(TASK_RUNNING);
  2140  
  2141  entry = dev->queues->queue[HostNormCmdQueue].cmdq.next;
  2142  list_del(entry);
  2143  
  2144  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
  2145  spin_unlock_irqrestore(t_lock, flags);
^
  2146  
  2147  fib = list_entry(entry, struct fib, fiblink);
  2148  hw_fib = fib->hw_fib_va;
  2149  if (dev->sa_firmware) {
  2150  /* Thor AIF */
  2151  aac_handle_sa_aif(dev, fib);
  2152  aac_fib_adapter_complete(fib, (u16)sizeof(u32));
  2153  continue;

The locking isn't right here.  We should re-take the spinlock before
continuing.

  2154  }
  2155  /*
  2156   *  We will process the FIB here or pass it to a
  2157   *  worker thread that is TBD. We Really can't
  2158   *  do anything at this point since we don't have
  2159   *  anything defined for this thread to do.
  2160   */

[ snip ]

  2221  free_mem:
  2222  /* Free up the remaining resources */
  2223  hw_fib_p = hw_fib_pool;
  2224  fib_p = fib_pool;
  2225  while (hw_fib_p < &hw_fib_pool[num]) {
  2226  kfree(*hw_fib_p);
  2227  kfree(*fib_p);
  2228  ++fib_p;
  2229  ++hw_fib_p;
  2230  }
  2231  kfree(fib_pool);
  2232  free_hw_fib_pool:
  2233  kfree(hw_fib_pool);
  2234  free_fib:
  2235  kfree(fib);
  2236  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
  2237  spin_lock_irqsave(t_lock, flags);
  2238  }
  2239  /*
  2240   *  There are no more AIF's
  2241   */
  2242  t_lock = dev->queues->queue[HostNormCmdQueue].lock;
  2243  spin_unlock_irqrestore(t_lock, flags);

Otherwise it is a double unlock bug.

  2244  }


regards,
dan carpenter
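
The lock imbalance reported above can be checked with a counting model in user space. This is a sketch under stated assumptions rather than the driver loop: struct fake_lock, fake_lock(), fake_unlock() and drain() are hypothetical, the 'fast_path' flag stands for the dev->sa_firmware branch, and 'retake' applies the suggested fix of re-taking the lock before the continue.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the lock is released while it is not held. */
struct fake_lock {
	int held;
	int bad_unlocks;
};

static void fake_lock(struct fake_lock *l)
{
	l->held++;
}

static void fake_unlock(struct fake_lock *l)
{
	if (l->held == 0)
		l->bad_unlocks++;
	else
		l->held--;
}

/*
 * Drain 'n' queue entries using the lock pattern from the listing and
 * return the number of unlocks that happened without the lock held.
 */
static int drain(int n, bool fast_path, bool retake)
{
	struct fake_lock l = { 0, 0 };

	assert(n >= 0);

	fake_lock(&l);			/* line 2130 */
	while (n > 0) {
		n--;			/* entry removed under the lock */
		fake_unlock(&l);	/* line 2145 */

		if (fast_path) {
			if (retake)
				fake_lock(&l);	/* suggested fix */
			continue;	/* line 2153 */
		}

		/* ... slow path processing without the lock ... */
		fake_lock(&l);		/* line 2237 */
	}
	fake_unlock(&l);		/* line 2243 */
	return l.bad_unlocks;
}
```

With two entries on the fast path and no re-take, the model records two unlocks without the lock held (the second pass through line 2145 and the final line 2243); with the re-take, or on the slow path, it records none.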

[PATCH V2 14/24] aacraid: Added support for hotplug

2017-01-25 Thread Raghava Aditya Renukunta

Added support for drive hotplug add and removal

Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>
Signed-off-by: Dave Carroll <david.carr...@microsemi.com>


---
 Changes in  V2:
  None

 drivers/scsi/aacraid/aachba.c  |  13 ++--
 drivers/scsi/aacraid/aacraid.h |  17 +-
 drivers/scsi/aacraid/commsup.c | 136 +
 3 files changed, 159 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 69f53e6..d134c8d 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -1601,7 +1601,7 @@ int aac_issue_bmic_identify(struct aac_dev *dev, u32 bus, u32 target)
  * Update our hba map with the information gathered from the FW
  */
 void aac_update_hba_map(struct aac_dev *dev,
-   struct aac_ciss_phys_luns_resp *phys_luns)
+   struct aac_ciss_phys_luns_resp *phys_luns, int rescan)
 {
/* ok and extended reporting */
u32 lun_count, nexus;
@@ -1646,7 +1646,10 @@ void aac_update_hba_map(struct aac_dev *dev,
dev->hba_map[bus][target].qd_limit = 32;
 
 update_devtype:
-   dev->hba_map[bus][target].devtype = devtype;
+   if (rescan == AAC_INIT)
+   dev->hba_map[bus][target].devtype = devtype;
+   else
+   dev->hba_map[bus][target].new_devtype = devtype;
}
 }
 
@@ -1658,7 +1661,7 @@ void aac_update_hba_map(struct aac_dev *dev,
  * Execute a CISS REPORT PHYS LUNS and process the results into
  * the current hba_map.
  */
-int aac_report_phys_luns(struct aac_dev *dev, struct fib *fibptr)
+int aac_report_phys_luns(struct aac_dev *dev, struct fib *fibptr, int rescan)
 {
int fibsize, datasize;
struct aac_ciss_phys_luns_resp *phys_luns;
@@ -1718,7 +1721,7 @@ int aac_report_phys_luns(struct aac_dev *dev, struct fib *fibptr)
/* analyse data */
if (rcode >= 0 && phys_luns->resp_flag == 2) {
/* ok and extended reporting */
-   aac_update_hba_map(dev, phys_luns);
+   aac_update_hba_map(dev, phys_luns, rescan);
}
 
pci_free_consistent(dev->pdev, datasize, (void *) phys_luns, addr);
@@ -1831,7 +1834,7 @@ int aac_get_adapter_info(struct aac_dev* dev)
if (!dev->sync_mode && dev->sa_firmware &&
dev->supplement_adapter_info.VirtDeviceBus != 0xffff) {
/* Thor SA Firmware -> CISS_REPORT_PHYSICAL_LUNS */
-   rcode = aac_report_phys_luns(dev, fibptr);
+   rcode = aac_report_phys_luns(dev, fibptr, AAC_INIT);
}
 
if (!dev->in_reset) {
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 23c00ab..65e84cf 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -74,7 +74,7 @@ enum {
 #define AAC_NUM_IO_FIB (1024 - AAC_NUM_MGT_FIB)
 #define AAC_NUM_FIB    (AAC_NUM_IO_FIB + AAC_NUM_MGT_FIB)
 
-#define AAC_MAX_LUN    (256)
+#define AAC_MAX_LUN    256
 
 #define AAC_MAX_HOSTPHYSMEMPAGES (0xf)
 #define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)256)
@@ -87,6 +87,14 @@ enum {
 #define AAC_MAX_TARGETS        256
 #define AAC_MAX_NATIVE_SIZE    2048
 
+/* Thor AIF events */
+#define SA_AIF_HOTPLUG         (1<<1)
+#define SA_AIF_HARDWARE        (1<<2)
+#define SA_AIF_PDEV_CHANGE     (1<<4)
+#define SA_AIF_LDEV_CHANGE     (1<<5)
+#define SA_AIF_BPSTAT_CHANGE   (1<<30)
+#define SA_AIF_BPCFG_CHANGE    (1<<31)
+
 #define CISS_REPORT_PHYSICAL_LUNS      0xc3
 #define WRITE_HOST_WELLNESS            0xa5
 #define CISS_IDENTIFY_PHYSICAL_DEVICE  0x15
@@ -198,6 +206,7 @@ struct aac_ciss_identify_pd {
 #define CONTAINER_TO_CHANNEL(cont)     (CONTAINER_CHANNEL)
 #define CONTAINER_TO_ID(cont)          (cont)
 #define CONTAINER_TO_LUN(cont)         (0)
+#define ENCLOSURE_CHANNEL              (3)
 
 #define PMC_DEVICE_S6  0x28b
 #define PMC_DEVICE_S7  0x28c
@@ -1102,6 +1111,9 @@ struct fib {
    u32 hbacmd_size;        /* cmd size for native */
 };
 
+#define AAC_INIT       0
+#define AAC_RESCAN     1
+
 #define AAC_DEVTYPE_RAID_MEMBER        1
 #define AAC_DEVTYPE_ARC_RAW            2
 #define AAC_DEVTYPE_NATIVE_RAW         3
@@ -1111,6 +1123,7 @@ struct fib {
 struct aac_hba_map_info {
    __le32  rmw_nexus;      /* nexus for native HBA devices */
    u8      devtype;        /* device type */
+   u8      new_devtype;
    u8      reset_state;    /* 0 - no reset, 1..x - */
                            /* after xth TM LUN reset */
    u16     qd_limit;
@@ -2321,7 +2334,7 @@ static inline unsigned int cap_to_cyls(
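
The aachba.c hunks above record the device type in devtype during
initialisation and in new_devtype during a rescan. The code that compares
the two fields lives in the commsup.c part of the patch, which is cut off
here. The sketch below is therefore only a guess at how such a comparison
could drive add and remove decisions. reconcile(), enum hotplug_action and
the convention that type 0 means "no device" are assumptions, not taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define AAC_INIT	0
#define AAC_RESCAN	1

/* Trimmed-down model of struct aac_hba_map_info. */
struct hba_map_entry {
	uint8_t devtype;	/* what the SCSI layer currently knows */
	uint8_t new_devtype;	/* what the latest rescan reported */
};

/* Mirrors the update_devtype hunk in aac_update_hba_map(). */
static void record_devtype(struct hba_map_entry *e, uint8_t devtype,
			   int rescan)
{
	if (rescan == AAC_INIT)
		e->devtype = devtype;
	else
		e->new_devtype = devtype;
}

/* Hypothetical reconcile step; assumes type 0 means "no device". */
enum hotplug_action { HP_NONE, HP_ADD, HP_REMOVE, HP_REPLACE };

static enum hotplug_action reconcile(struct hba_map_entry *e)
{
	enum hotplug_action act;

	if (e->devtype == e->new_devtype)
		return HP_NONE;

	if (e->devtype == 0)
		act = HP_ADD;		/* slot was empty, device appeared */
	else if (e->new_devtype == 0)
		act = HP_REMOVE;	/* device disappeared */
	else
		act = HP_REPLACE;	/* type changed in place */

	e->devtype = e->new_devtype;	/* rescan result becomes current */
	return act;
}
```

Keeping the rescan result in a separate field means the current view is
not overwritten before the difference has been acted on.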

[PATCH 14/24] aacraid: Added support for hotplug

2017-01-23 Thread Raghava Aditya Renukunta

Added support for drive hotplug add and removal

Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>
Signed-off-by: Dave Carroll <david.carr...@microsemi.com>
---
 drivers/scsi/aacraid/aachba.c  |  13 ++--
 drivers/scsi/aacraid/aacraid.h |  17 +-
 drivers/scsi/aacraid/commsup.c | 136 +
 3 files changed, 159 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 10a26046..237d68c 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -1601,7 +1601,7 @@ int aac_issue_bmic_identify(struct aac_dev *dev, u32 bus, u32 target)
  * Update our hba map with the information gathered from the FW
  */
 void aac_update_hba_map(struct aac_dev *dev,
-   struct aac_ciss_phys_luns_resp *phys_luns)
+   struct aac_ciss_phys_luns_resp *phys_luns, int rescan)
 {
/* ok and extended reporting */
u32 lun_count, nexus;
@@ -1646,7 +1646,10 @@ void aac_update_hba_map(struct aac_dev *dev,
dev->hba_map[bus][target].qd_limit = 32;
 
 update_devtype:
-   dev->hba_map[bus][target].devtype = devtype;
+   if (rescan == AAC_INIT)
+   dev->hba_map[bus][target].devtype = devtype;
+   else
+   dev->hba_map[bus][target].new_devtype = devtype;
}
 }
 
@@ -1658,7 +1661,7 @@ void aac_update_hba_map(struct aac_dev *dev,
  * Execute a CISS REPORT PHYS LUNS and process the results into
  * the current hba_map.
  */
-int aac_report_phys_luns(struct aac_dev *dev, struct fib *fibptr)
+int aac_report_phys_luns(struct aac_dev *dev, struct fib *fibptr, int rescan)
 {
int fibsize, datasize;
struct aac_ciss_phys_luns_resp *phys_luns;
@@ -1715,7 +1718,7 @@ int aac_report_phys_luns(struct aac_dev *dev, struct fib *fibptr)
/* analyse data */
if (rcode >= 0 && phys_luns->resp_flag == 2) {
/* ok and extended reporting */
-   aac_update_hba_map(dev, phys_luns);
+   aac_update_hba_map(dev, phys_luns, rescan);
}
}
 
@@ -1828,7 +1831,7 @@ int aac_get_adapter_info(struct aac_dev* dev)
if (!dev->sync_mode && dev->sa_firmware &&
dev->supplement_adapter_info.VirtDeviceBus != 0xffff) {
/* Thor SA Firmware -> CISS_REPORT_PHYSICAL_LUNS */
-   rcode = aac_report_phys_luns(dev, fibptr);
+   rcode = aac_report_phys_luns(dev, fibptr, AAC_INIT);
}
 
if (!dev->in_reset) {
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 23c00ab..6709de4 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -74,7 +74,7 @@ enum {
 #define AAC_NUM_IO_FIB (1024 - AAC_NUM_MGT_FIB)
 #define AAC_NUM_FIB    (AAC_NUM_IO_FIB + AAC_NUM_MGT_FIB)
 
-#define AAC_MAX_LUN    (256)
+#define AAC_MAX_LUN    256
 
 #define AAC_MAX_HOSTPHYSMEMPAGES (0xf)
 #define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)256)
@@ -87,6 +87,14 @@ enum {
 #define AAC_MAX_TARGETS        256
 #define AAC_MAX_NATIVE_SIZE    2048
 
+/* Thor AIF events */
+#define SA_AIF_HOTPLUG         (1<<1)
+#define SA_AIF_HARDWARE        (1<<2)
+#define SA_AIF_PDEV_CHANGE     (1<<4)
+#define SA_AIF_LDEV_CHANGE     (1<<5)
+#define SA_AIF_BPSTAT_CHANGE   (1<<30)
+#define SA_AIF_BPCFG_CHANGE    (1<<31)
+
 #define CISS_REPORT_PHYSICAL_LUNS      0xc3
 #define WRITE_HOST_WELLNESS            0xa5
 #define CISS_IDENTIFY_PHYSICAL_DEVICE  0x15
@@ -198,6 +206,7 @@ struct aac_ciss_identify_pd {
 #define CONTAINER_TO_CHANNEL(cont)     (CONTAINER_CHANNEL)
 #define CONTAINER_TO_ID(cont)          (cont)
 #define CONTAINER_TO_LUN(cont)         (0)
+#define ENCLOSURE_CHANNEL              (3)
 
 #define PMC_DEVICE_S6  0x28b
 #define PMC_DEVICE_S7  0x28c
@@ -1102,6 +1111,9 @@ struct fib {
    u32 hbacmd_size;        /* cmd size for native */
 };
 
+#define AAC_INIT       0
+#define AAC_RESCAN     1
+
 #define AAC_DEVTYPE_RAID_MEMBER        1
 #define AAC_DEVTYPE_ARC_RAW            2
 #define AAC_DEVTYPE_NATIVE_RAW         3
@@ -1111,6 +1123,7 @@ struct fib {
 struct aac_hba_map_info {
    __le32  rmw_nexus;      /* nexus for native HBA devices */
    u8      devtype;        /* device type */
+   u8      new_devtype;
    u8      reset_state;    /* 0 - no reset, 1..x - */
                            /* after xth TM LUN reset */
    u16     qd_limit;
@@ -2321,7 +2334,7 @@ static inline unsigned int cap_to_cyls(sector_t capacity, unsigned divisor)
 
 int
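
The SA_AIF_* defines in the aacraid.h hunk are bit flags. Below is a small
userspace sketch of testing them against an event word. The constants are
written as unsigned here because shifting a signed 1 into bit 31 is
undefined in ISO C. aif_needs_rescan() and its choice of which events
trigger a rescan are hypothetical; the handler itself is in the part of
the patch that is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit positions as the patch, written as unsigned constants. */
#define SA_AIF_HOTPLUG		(1u << 1)
#define SA_AIF_HARDWARE		(1u << 2)
#define SA_AIF_PDEV_CHANGE	(1u << 4)
#define SA_AIF_LDEV_CHANGE	(1u << 5)
#define SA_AIF_BPSTAT_CHANGE	(1u << 30)
#define SA_AIF_BPCFG_CHANGE	(1u << 31)

/*
 * Hypothetical policy, not taken from the patch: every event except
 * SA_AIF_BPSTAT_CHANGE is treated as a reason to rescan.
 */
static int aif_needs_rescan(uint32_t events)
{
	const uint32_t rescan_mask = SA_AIF_HOTPLUG | SA_AIF_HARDWARE |
				     SA_AIF_PDEV_CHANGE | SA_AIF_LDEV_CHANGE |
				     SA_AIF_BPCFG_CHANGE;

	return (events & rescan_mask) != 0;
}
```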

[patch 04/10] scsi/bnx2i: Convert to hotplug state machine

2016-12-21 Thread Thomas Gleixner

From: Sebastian Andrzej Siewior <bige...@linutronix.de>

Install the callbacks via the state machine. No functional change.

This is the minimal fixup so we can remove the hotplug notifier mess
completely.

The real rework of this driver to use work queues is still stuck in
review/testing on the SCSI mailing list.
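
For readers unfamiliar with the state machine, here is a toy userspace
model of the registration pattern the patch uses. It is a sketch under
simplifying assumptions, not the kernel implementation: setup_state_nocalls(),
STATE_DYN, STATE_RX_DEAD and the rx_thread[] array are hypothetical
stand-ins. It shows two things the patch relies on. A dynamic request
returns the slot that was picked, which has to be saved for removal later.
The "nocalls" variant does not run the callback for CPUs that are already
online, which is why the for_each_online_cpu() loop stays in the init path.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS		4
#define NR_STATES	8
#define STATE_DYN	(-1)	/* "pick a free slot", like CPUHP_AP_ONLINE_DYN */
#define STATE_RX_DEAD	0	/* fixed slot, like CPUHP_SCSI_BNX2I_DEAD */

typedef int (*cpu_cb)(unsigned int cpu);

struct hp_state {
	const char *name;
	cpu_cb startup;		/* run when a CPU comes up */
	cpu_cb teardown;	/* run when a CPU goes down */
};

static struct hp_state states[NR_STATES];
static int cpu_is_online[NR_CPUS];
static int rx_thread[NR_CPUS];	/* 1 if the per-CPU Rx thread exists */

/*
 * Toy counterpart of cpuhp_setup_state_nocalls(): registers the
 * callbacks without invoking them for CPUs that are already online.
 * Returns the chosen slot for a dynamic request, 0 for a fixed slot,
 * or -1 if no slot is free.
 */
static int setup_state_nocalls(int state, const char *name,
			       cpu_cb startup, cpu_cb teardown)
{
	int dynamic = (state == STATE_DYN);

	if (dynamic) {
		for (state = 1; state < NR_STATES; state++)
			if (!states[state].name)
				break;
		if (state == NR_STATES)
			return -1;
	}
	assert(state >= 0 && state < NR_STATES);
	states[state].name = name;
	states[state].startup = startup;
	states[state].teardown = teardown;
	return dynamic ? state : 0;
}

static void cpu_up(unsigned int cpu)
{
	int s;

	assert(cpu < NR_CPUS);
	for (s = 0; s < NR_STATES; s++)
		if (states[s].startup)
			states[s].startup(cpu);
	cpu_is_online[cpu] = 1;
}

static void cpu_down(unsigned int cpu)
{
	int s;

	assert(cpu < NR_CPUS);
	for (s = NR_STATES - 1; s >= 0; s--)
		if (states[s].teardown)
			states[s].teardown(cpu);
	cpu_is_online[cpu] = 0;
}

/* Stand-ins for bnx2i_cpu_online() and bnx2i_cpu_dead(). */
static int rx_online(unsigned int cpu) { rx_thread[cpu] = 1; return 0; }
static int rx_dead(unsigned int cpu)   { rx_thread[cpu] = 0; return 0; }
```

The model ignores where in the bring-up or tear-down sequence each state
sits; in the kernel the DEAD state runs only after the CPU is gone.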

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
Cc: Martin K. Petersen <martin.peter...@oracle.com>
Cc: James E.J. Bottomley <j...@linux.vnet.ibm.com>
Cc: linux-scsi@vger.kernel.org
Cc: Chad Dupuis <chad.dup...@qlogic.com>
Cc: qlogic-storage-upstr...@qlogic.com
Cc: Johannes Thumshirn <j...@kernel.org>
Cc: Christoph Hellwig <h...@lst.de>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>

---
 drivers/scsi/bnx2i/bnx2i_init.c |   80 +++-
 include/linux/cpuhotplug.h  |1 
 2 files changed, 32 insertions(+), 49 deletions(-)

--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -70,14 +70,6 @@ u64 iscsi_error_mask = 0x00;
 
 DEFINE_PER_CPU(struct bnx2i_percpu_s, bnx2i_percpu);
 
-static int bnx2i_cpu_callback(struct notifier_block *nfb,
- unsigned long action, void *hcpu);
-/* notification function for CPU hotplug events */
-static struct notifier_block bnx2i_cpu_notifier = {
-   .notifier_call = bnx2i_cpu_callback,
-};
-
-
 /**
  * bnx2i_identify_device - identifies NetXtreme II device type
  * @hba:   Adapter structure pointer
@@ -461,41 +453,21 @@ static void bnx2i_percpu_thread_destroy(
kthread_stop(thread);
 }
 
-
-/**
- * bnx2i_cpu_callback - Handler for CPU hotplug events
- *
- * @nfb:   The callback data block
- * @action:The event triggering the callback
- * @hcpu:  The index of the CPU that the event is for
- *
- * This creates or destroys per-CPU data for iSCSI
- *
- * Returns NOTIFY_OK always.
- */
-static int bnx2i_cpu_callback(struct notifier_block *nfb,
- unsigned long action, void *hcpu)
+static int bnx2i_cpu_online(unsigned int cpu)
 {
-   unsigned cpu = (unsigned long)hcpu;
+   pr_info("bnx2i: CPU %x online: Create Rx thread\n", cpu);
+   bnx2i_percpu_thread_create(cpu);
+   return 0;
+}
 
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   printk(KERN_INFO "bnx2i: CPU %x online: Create Rx thread\n",
-   cpu);
-   bnx2i_percpu_thread_create(cpu);
-   break;
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   printk(KERN_INFO "CPU %x offline: Remove Rx thread\n", cpu);
-   bnx2i_percpu_thread_destroy(cpu);
-   break;
-   default:
-   break;
-   }
-   return NOTIFY_OK;
+static int bnx2i_cpu_dead(unsigned int cpu)
+{
+   pr_info("CPU %x offline: Remove Rx thread\n", cpu);
+   bnx2i_percpu_thread_destroy(cpu);
+   return 0;
 }
 
+static enum cpuhp_state bnx2i_online_state;
 
 /**
  * bnx2i_mod_init - module init entry point
@@ -539,18 +511,28 @@ static int __init bnx2i_mod_init(void)
p->iothread = NULL;
}
 
-   cpu_notifier_register_begin();
+   get_online_cpus();
 
for_each_online_cpu(cpu)
bnx2i_percpu_thread_create(cpu);
 
-   /* Initialize per CPU interrupt thread */
-   __register_hotcpu_notifier(&bnx2i_cpu_notifier);
-
-   cpu_notifier_register_done();
-
+   err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+  "scsi/bnx2i:online",
+  bnx2i_cpu_online, NULL);
+   if (err < 0)
+   goto remove_threads;
+   bnx2i_online_state = err;
+
+   cpuhp_setup_state_nocalls(CPUHP_SCSI_BNX2I_DEAD, "scsi/bnx2i:dead",
+ NULL, bnx2i_cpu_dead);
+   put_online_cpus();
return 0;
 
+remove_threads:
+   for_each_online_cpu(cpu)
+   bnx2i_percpu_thread_destroy(cpu);
+   put_online_cpus();
+   cnic_unregister_driver(CNIC_ULP_ISCSI);
 unreg_xport:
iscsi_unregister_transport(&bnx2i_iscsi_transport);
 out:
@@ -587,14 +569,14 @@ static void __exit bnx2i_mod_exit(void)
}
mutex_unlock(&bnx2i_dev_lock);
 
-   cpu_notifier_register_begin();
+   get_online_cpus();
 
for_each_online_cpu(cpu)
bnx2i_percpu_thread_destroy(cpu);
 
-   __unregister_hotcpu_notifier(&bnx2i_cpu_notifier);
-
-   cpu_notifier_register_done();
+   cpuhp_remove_state_nocalls(bnx2i_online_state);
+   cpuhp_remove_state_nocalls(CPUHP_SCSI_BNX2I_DEAD);
+   put_online_cpus();
 
iscsi_unregister_transport(&bnx2i_iscsi_transport);
cnic_unregister_driver(CNIC_ULP_ISCSI);
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -42,6 +42,7

[patch 03/10] scsi/bnx2fc: Convert to hotplug state machine

2016-12-21 Thread Thomas Gleixner

From: Sebastian Andrzej Siewior <bige...@linutronix.de>

Install the callbacks via the state machine. No functional change.

This is the minimal fixup so we can remove the hotplug notifier mess
completely.

The real rework of this driver to use work queues is still stuck in
review/testing on the SCSI mailing list.

Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
Cc: Martin K. Petersen <martin.peter...@oracle.com>
Cc: James E.J. Bottomley <j...@linux.vnet.ibm.com>
Cc: linux-scsi@vger.kernel.org
Cc: Chad Dupuis <chad.dup...@qlogic.com>
Cc: qlogic-storage-upstr...@qlogic.com
Cc: Johannes Thumshirn <j...@kernel.org>
Cc: Christoph Hellwig <h...@lst.de>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>

---
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c |   81 +++---
 include/linux/cpuhotplug.h|1 
 2 files changed, 35 insertions(+), 47 deletions(-)

--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -127,13 +127,6 @@ module_param_named(log_fka, bnx2fc_log_f
 MODULE_PARM_DESC(log_fka, " Print message to kernel log when fcoe is "
"initiating a FIP keep alive when debug logging is enabled.");
 
-static int bnx2fc_cpu_callback(struct notifier_block *nfb,
-unsigned long action, void *hcpu);
-/* notification function for CPU hotplug events */
-static struct notifier_block bnx2fc_cpu_notifier = {
-   .notifier_call = bnx2fc_cpu_callback,
-};
-
 static inline struct net_device *bnx2fc_netdev(const struct fc_lport *lport)
 {
return ((struct bnx2fc_interface *)
@@ -2622,37 +2615,19 @@ static void bnx2fc_percpu_thread_destroy
kthread_stop(thread);
 }
 
-/**
- * bnx2fc_cpu_callback - Handler for CPU hotplug events
- *
- * @nfb:The callback data block
- * @action: The event triggering the callback
- * @hcpu:   The index of the CPU that the event is for
- *
- * This creates or destroys per-CPU data for fcoe
- *
- * Returns NOTIFY_OK always.
- */
-static int bnx2fc_cpu_callback(struct notifier_block *nfb,
-unsigned long action, void *hcpu)
+
+static int bnx2fc_cpu_online(unsigned int cpu)
 {
-   unsigned cpu = (unsigned long)hcpu;
+   printk(PFX "CPU %x online: Create Rx thread\n", cpu);
+   bnx2fc_percpu_thread_create(cpu);
+   return 0;
+}
 
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   printk(PFX "CPU %x online: Create Rx thread\n", cpu);
-   bnx2fc_percpu_thread_create(cpu);
-   break;
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   printk(PFX "CPU %x offline: Remove Rx thread\n", cpu);
-   bnx2fc_percpu_thread_destroy(cpu);
-   break;
-   default:
-   break;
-   }
-   return NOTIFY_OK;
+static int bnx2fc_cpu_dead(unsigned int cpu)
+{
+   printk(PFX "CPU %x offline: Remove Rx thread\n", cpu);
+   bnx2fc_percpu_thread_destroy(cpu);
+   return 0;
 }
 
 static int bnx2fc_slave_configure(struct scsi_device *sdev)
@@ -2664,6 +2639,8 @@ static int bnx2fc_slave_configure(struct
return 0;
 }
 
+static enum cpuhp_state bnx2fc_online_state;
+
 /**
  * bnx2fc_mod_init - module init entry point
  *
@@ -2724,21 +2701,31 @@ static int __init bnx2fc_mod_init(void)
spin_lock_init(&p->fp_work_lock);
}
 
-   cpu_notifier_register_begin();
+   get_online_cpus();
 
-   for_each_online_cpu(cpu) {
+   for_each_online_cpu(cpu)
bnx2fc_percpu_thread_create(cpu);
-   }
 
-   /* Initialize per CPU interrupt thread */
-   __register_hotcpu_notifier(&bnx2fc_cpu_notifier);
-
-   cpu_notifier_register_done();
+   rc = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+  "scsi/bnx2fc:online",
+  bnx2fc_cpu_online, NULL);
+   if (rc < 0)
+   goto stop_threads;
+   bnx2fc_online_state = rc;
+
+   cpuhp_setup_state_nocalls(CPUHP_SCSI_BNX2FC_DEAD, "scsi/bnx2fc:dead",
+ NULL, bnx2fc_cpu_dead);
+   put_online_cpus();
 
cnic_register_driver(CNIC_ULP_FCOE, &bnx2fc_cnic_cb);
 
return 0;
 
+stop_threads:
+   for_each_online_cpu(cpu)
+   bnx2fc_percpu_thread_destroy(cpu);
+   put_online_cpus();
+   kthread_stop(l2_thread);
 free_wq:
destroy_workqueue(bnx2fc_wq);
 release_bt:
@@ -2797,16 +2784,16 @@ static void __exit bnx2fc_mod_exit(void)
if (l2_thread)
kthread_stop(l2_thread);
 
-   cpu_notifier_register_begin();
-
+   get_online_cpus();
/* Destroy per cpu threads */
for_each_online_cpu(cpu) {
bnx2fc_percpu_thread_destroy(cpu);
}
 
-   __unregister_hotcpu_notif

Re: [RFC][PATCH v2 0/2] Improve libsas hotplug

2016-11-11 Thread wangyijing

Hi James, sorry to bother you. These two patches try to fix several issues
in libsas. Dan Williams and John Garry also found similar issues and posted
some patches before. Dan Williams's solution fixes the sysfs warning calltrace,
but may introduce a new flutter issue.

In these two patches, we introduce a new workqueue to fix the flutter issue.
Do you have time to look at these patches? Your comments are important to us;
they help us know whether we are heading in the right direction to fix these issues.

Thanks!
Yijing.

On 2016/9/27 11:15, Yijing Wang wrote:
> v1-v2: Fix memory allocation issue in interrupt context.
> 
> Yijing Wang (2):
>   libsas: Alloc dynamic work to avoid missing sas events
>   libsas: Fix hotplug issue in libsas
> 
>  drivers/scsi/libsas/sas_ata.c   |  34 ++---
>  drivers/scsi/libsas/sas_discover.c  | 245 ++--
>  drivers/scsi/libsas/sas_event.c |  61 +
>  drivers/scsi/libsas/sas_expander.c  |  54 ++--
>  drivers/scsi/libsas/sas_init.c  |  31 -
>  drivers/scsi/libsas/sas_internal.h  |  45 ++-
>  drivers/scsi/libsas/sas_phy.c   |  50 +++-
>  drivers/scsi/libsas/sas_port.c  |  35 --
>  drivers/scsi/libsas/sas_scsi_host.c |  23 
>  include/scsi/libsas.h   |  13 +-
>  include/scsi/sas_ata.h  |   4 +-
>  11 files changed, 404 insertions(+), 191 deletions(-)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC][PATCH v2 0/2] Improve libsas hotplug

2016-09-26 Thread Yijing Wang

v1-v2: Fix memory allocation issue in interrupt context.

Yijing Wang (2):
  libsas: Alloc dynamic work to avoid missing sas events
  libsas: Fix hotplug issue in libsas

 drivers/scsi/libsas/sas_ata.c   |  34 ++---
 drivers/scsi/libsas/sas_discover.c  | 245 ++--
 drivers/scsi/libsas/sas_event.c |  61 +
 drivers/scsi/libsas/sas_expander.c  |  54 ++--
 drivers/scsi/libsas/sas_init.c  |  31 -
 drivers/scsi/libsas/sas_internal.h  |  45 ++-
 drivers/scsi/libsas/sas_phy.c   |  50 +++-
 drivers/scsi/libsas/sas_port.c  |  35 --
 drivers/scsi/libsas/sas_scsi_host.c |  23 
 include/scsi/libsas.h   |  13 +-
 include/scsi/sas_ata.h  |   4 +-
 11 files changed, 404 insertions(+), 191 deletions(-)

-- 
2.5.0


[RFC][PATCH v2 2/2] libsas: Fix hotplug issue in libsas

2016-09-26 Thread Yijing Wang

The libsas hotplug code currently has some issues. Dan Williams reported
a similar bug here:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The root cause is that we use one workqueue (shost->work_q) to
process libsas events, and we split a hot-add or hot-remove flow into several
events. E.g. in sas_deform_port() we start a new work item and queue it on the
same workqueue to remove the child devices after
the sas port. So if a hot-add event arrives between removing the sas port
and destructing the child devices, unexpected errors can
result.

This patch modifies the hotplug event processing mechanism to solve the
hotplug problems in libsas. We move the device add/del operations to
a new workqueue (named sas_dev_wq).

We also replace the sas_port_alloc function with sas_port_alloc_num,
because when discovery runs concurrently with device
addition or destruction, the old sas port resource may not have been
completely deleted when a new sas port resource of the same name
is created, and this causes a calltrace about the sysfs
device node.
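
The ordering problem described above can be reproduced with a toy FIFO.
This is only an illustration of the race, not the patch's mechanism: the
event names, struct fifo and run() are hypothetical, and the "dedicated
queue" case assumes the device queue is drained before the next port event
is handled.

```c
#include <assert.h>

enum ev { EV_NONE, EV_DEFORM_PORT, EV_DESTRUCT_CHILDREN, EV_FORM_PORT };

#define QLEN 8

struct fifo {
	enum ev slot[QLEN];
	int head, tail;
};

static void push(struct fifo *q, enum ev e)
{
	assert(q->tail < QLEN);
	q->slot[q->tail++] = e;
}

static enum ev pop(struct fifo *q)
{
	return q->head < q->tail ? q->slot[q->head++] : EV_NONE;
}

struct port_state {
	int port_present;
	int children_present;
	int conflicts;	/* hot-add seen while old children still exist */
};

/* Handle one event; follow-up device work is pushed to "deferred". */
static void handle(struct port_state *p, enum ev e, struct fifo *deferred)
{
	switch (e) {
	case EV_DEFORM_PORT:
		p->port_present = 0;
		push(deferred, EV_DESTRUCT_CHILDREN);
		break;
	case EV_DESTRUCT_CHILDREN:
		p->children_present = 0;
		break;
	case EV_FORM_PORT:
		if (p->children_present)
			p->conflicts++;
		p->port_present = 1;
		p->children_present = 1;
		break;
	default:
		break;
	}
}

/* Returns the number of conflicts seen for a remove followed by an add. */
static int run(int shared_queue)
{
	struct fifo events = { {0}, 0, 0 }, devq = { {0}, 0, 0 };
	struct port_state p = { 1, 1, 0 };
	enum ev e;

	push(&events, EV_DEFORM_PORT);	/* hot-remove */
	push(&events, EV_FORM_PORT);	/* hot-add queued right behind it */

	while ((e = pop(&events)) != EV_NONE) {
		handle(&p, e, shared_queue ? &events : &devq);
		/* with a dedicated queue, device work finishes first */
		while ((e = pop(&devq)) != EV_NONE)
			handle(&p, e, &devq);
	}
	return p.conflicts;
}
```

With a shared queue the child teardown lands behind the hot-add, so the
add runs against stale children. With the dedicated queue it does not.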

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
Signed-off-by: Yousong He <heyous...@huawei.com>
Signed-off-by: Qilin Chen <chenqil...@huawei.com>
---
 drivers/scsi/libsas/sas_ata.c   |  34 ++---
 drivers/scsi/libsas/sas_discover.c  | 245 ++--
 drivers/scsi/libsas/sas_expander.c  |  54 ++--
 drivers/scsi/libsas/sas_init.c  |  26 +++-
 drivers/scsi/libsas/sas_internal.h  |  46 ++-
 drivers/scsi/libsas/sas_port.c  |  12 +-
 drivers/scsi/libsas/sas_scsi_host.c |  23 
 include/scsi/libsas.h   |   5 +-
 include/scsi/sas_ata.h  |   4 +-
 9 files changed, 340 insertions(+), 109 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 763f012..877efa8 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -619,32 +619,22 @@ static int sas_get_ata_command_set(struct domain_device *dev)
return ata_dev_classify(&tf);
 }
 
-void sas_probe_sata(struct asd_sas_port *port)
+void sas_probe_sata_device(struct domain_device *dev)
 {
-   struct domain_device *dev, *n;
-
-   mutex_lock(&port->ha->disco_mutex);
-   list_for_each_entry(dev, &port->disco_list, disco_list_node) {
-   if (!dev_is_sata(dev))
-   continue;
-
-   ata_sas_async_probe(dev->sata_dev.ap);
-   }
-   mutex_unlock(&port->ha->disco_mutex);
+   struct asd_sas_port *port = dev->port;
 
-   list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node) {
-   if (!dev_is_sata(dev))
-   continue;
+   if (!port || !port->ha || !dev_is_sata(dev))
+   return;
 
-   sas_ata_wait_eh(dev);
+   ata_sas_async_probe(dev->sata_dev.ap);
 
-   /* if libata could not bring the link up, don't surface
-* the device
-*/
-   if (ata_dev_disabled(sas_to_ata_dev(dev)))
-   sas_fail_probe(dev, __func__, -ENODEV);
-   }
+   sas_ata_wait_eh(dev);
 
+   /* if libata could not bring the link up, don't surface
+* the device
+*/
+   if (ata_dev_disabled(sas_to_ata_dev(dev)))
+   sas_fail_probe(dev, __func__, -ENODEV);
 }
 
 static void sas_ata_flush_pm_eh(struct asd_sas_port *port, const char *func)
@@ -729,7 +719,7 @@ int sas_discover_sata(struct domain_device *dev)
if (res)
return res;
 
-   sas_discover_event(dev->port, DISCE_PROBE);
+   sas_notify_device_event(dev, SAS_DEVICE_ADD);
return 0;
 }
 
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de662..ea57c66 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -34,6 +34,12 @@
 #include 
 #include "../scsi_sas_internal.h"
 
+
+static void sas_unregister_common_dev(struct asd_sas_port *port,
+   struct domain_device *dev);
+static void sas_unregister_fail_dev(struct asd_sas_port *port,
+   struct domain_device *dev);
+
 /* -- Basic task processing for discovery purposes -- */
 
 void sas_init_dev(struct domain_device *dev)
@@ -158,11 +164,8 @@ static int sas_get_port_device(struct asd_sas_port *port)
 
if (dev_is_sata(dev) || dev->dev_type == SAS_END_DEVICE)
list_add_tail(&dev->disco_list_node, &port->disco_list);
-   else {
-   spin_lock_irq(&port->dev_list_lock);
-   list_add_tail(&dev->dev_list_node, &port->dev_list);
-   spin_unlock_irq(&port->dev_list_lock);
-   }
+   else
+   list_add_tail(&dev->dev_list_node, &port->expander_list);
 
spin_lock_irq(&port->phy_list_lock);
list_for_each_entry(phy, &port->phy_list, port_phy_el)
@@ -212,34 +215,83 @@ void sas_not

[RFC][PATCH v1 2/2] libsas: Fix hotplug issue in libsas

2016-09-12 Thread Yijing Wang

The libsas hotplug code currently has some issues. Dan Williams reported
a similar bug here:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The root cause is that we use one workqueue (shost->work_q) to
process libsas events, and we split a hot-add or hot-remove flow into several
events. E.g. in sas_deform_port() we start a new work item and queue it on the
same workqueue to remove the child devices after
the sas port. So if a hot-add event arrives between removing the sas port
and destructing the child devices, unexpected errors can
result.

This patch modifies the hotplug event processing mechanism to solve the
hotplug problems in libsas. We move the device add/del operations to
a new workqueue (named sas_dev_wq).

We also replace the sas_port_alloc function with sas_port_alloc_num,
because when discovery runs concurrently with device
addition or destruction, the old sas port resource may not have been
completely deleted when a new sas port resource of the same name
is created, and this causes a calltrace about the sysfs
device node.
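
The name clash in the last paragraph can be shown with a toy registry. The
sketch assumes that a port's sysfs name is derived from its number (the
"port-%d:%d" format below is an assumption) and that a counter-based
allocator never reuses a number. struct sysfs_model and the helper names
are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES	8
#define NAME_LEN	32

struct sysfs_model {
	char names[MAX_NODES][NAME_LEN];
	int count;
	int next_port_id;	/* only ever incremented, never reused */
};

/* Returns 0 on success, -1 if the name is already registered. */
static int register_name(struct sysfs_model *s, const char *name)
{
	int i;

	for (i = 0; i < s->count; i++)
		if (strcmp(s->names[i], name) == 0)
			return -1;	/* the sysfs warning case */
	assert(s->count < MAX_NODES);
	snprintf(s->names[s->count], NAME_LEN, "%s", name);
	s->count++;
	return 0;
}

/* Fixed numbering: the name depends only on the caller's port id. */
static int port_alloc_fixed(struct sysfs_model *s, int host, int port_id)
{
	char name[NAME_LEN];

	snprintf(name, sizeof(name), "port-%d:%d", host, port_id);
	return register_name(s, name);
}

/* Counter-based numbering: every allocation gets a fresh index. */
static int port_alloc_num(struct sysfs_model *s, int host)
{
	char name[NAME_LEN];

	snprintf(name, sizeof(name), "port-%d:%d", host, s->next_port_id++);
	return register_name(s, name);
}
```

If the old entry has not been removed yet, the fixed scheme fails on the
second allocation while the counter scheme still gets a unique name.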

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
Signed-off-by: Yousong He <heyous...@huawei.com>
Signed-off-by: Qilin Chen <chenqil...@huawei.com>
---
 drivers/scsi/libsas/sas_ata.c   |   34 ++---
 drivers/scsi/libsas/sas_discover.c  |  245 +-
 drivers/scsi/libsas/sas_expander.c  |   54 ++--
 drivers/scsi/libsas/sas_init.c  |   26 -
 drivers/scsi/libsas/sas_internal.h  |   46 ++-
 drivers/scsi/libsas/sas_port.c  |   12 ++-
 drivers/scsi/libsas/sas_scsi_host.c |   23 
 include/scsi/libsas.h   |5 +-
 include/scsi/sas_ata.h  |4 +-
 9 files changed, 340 insertions(+), 109 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 763f012..877efa8 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -619,32 +619,22 @@ static int sas_get_ata_command_set(struct domain_device *dev)
return ata_dev_classify(&tf);
 }
 
-void sas_probe_sata(struct asd_sas_port *port)
+void sas_probe_sata_device(struct domain_device *dev)
 {
-   struct domain_device *dev, *n;
-
-   mutex_lock(&port->ha->disco_mutex);
-   list_for_each_entry(dev, &port->disco_list, disco_list_node) {
-   if (!dev_is_sata(dev))
-   continue;
-
-   ata_sas_async_probe(dev->sata_dev.ap);
-   }
-   mutex_unlock(&port->ha->disco_mutex);
+   struct asd_sas_port *port = dev->port;
 
-   list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node) {
-   if (!dev_is_sata(dev))
-   continue;
+   if (!port || !port->ha || !dev_is_sata(dev))
+   return;
 
-   sas_ata_wait_eh(dev);
+   ata_sas_async_probe(dev->sata_dev.ap);
 
-   /* if libata could not bring the link up, don't surface
-* the device
-*/
-   if (ata_dev_disabled(sas_to_ata_dev(dev)))
-   sas_fail_probe(dev, __func__, -ENODEV);
-   }
+   sas_ata_wait_eh(dev);
 
+   /* if libata could not bring the link up, don't surface
+* the device
+*/
+   if (ata_dev_disabled(sas_to_ata_dev(dev)))
+   sas_fail_probe(dev, __func__, -ENODEV);
 }
 
 static void sas_ata_flush_pm_eh(struct asd_sas_port *port, const char *func)
@@ -729,7 +719,7 @@ int sas_discover_sata(struct domain_device *dev)
if (res)
return res;
 
-   sas_discover_event(dev->port, DISCE_PROBE);
+   sas_notify_device_event(dev, SAS_DEVICE_ADD);
return 0;
 }
 
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de662..ea57c66 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -34,6 +34,12 @@
 #include 
 #include "../scsi_sas_internal.h"
 
+
+static void sas_unregister_common_dev(struct asd_sas_port *port,
+   struct domain_device *dev);
+static void sas_unregister_fail_dev(struct asd_sas_port *port,
+   struct domain_device *dev);
+
 /* -- Basic task processing for discovery purposes -- */
 
 void sas_init_dev(struct domain_device *dev)
@@ -158,11 +164,8 @@ static int sas_get_port_device(struct asd_sas_port *port)
 
if (dev_is_sata(dev) || dev->dev_type == SAS_END_DEVICE)
list_add_tail(&dev->disco_list_node, &port->disco_list);
-   else {
-   spin_lock_irq(&port->dev_list_lock);
-   list_add_tail(&dev->dev_list_node, &port->dev_list);
-   spin_unlock_irq(&port->dev_list_lock);
-   }
+   else
+   list_add_tail(&dev->dev_list_node, &port->expander_list);
 
spin_lock_irq(&port->phy_list_lock);
list_for_each_entry(phy, &port->phy_list, port_phy_el)
@@ -212,34 +215,83 @@ void

[PATCH 10/21] virtio scsi: Convert to hotplug state machine

2016-09-06 Thread Sebastian Andrzej Siewior

Install the callbacks via the state machine. It uses the multi instance
infrastructure of the hotplug code to handle each interface.

virtscsi_set_affinity() is removed from virtscsi_init() because
virtscsi_cpu_notif_add() (the function which registers the instance) is invoked
right after it and the cpuhp_state_add_instance() function invokes the startup
callback on all online CPUs.

The same thing cannot be applied to virtscsi_cpu_notif_remove() because
virtscsi_remove_vqs() invokes virtscsi_set_affinity() with affinity = false as
argument but the old CPU_DEAD state invoked the function with affinity = true
(which does not match the DEAD callback).
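
A toy userspace model of the multi instance idea may help here. One state
holds a list of instances, the callback receives the embedded list node
and recovers the surrounding device structure, and adding an instance runs
the startup callback once per online CPU. Everything below is a
hypothetical stand-in (struct fake_vscsi, struct multi_state,
state_add_instance()), not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct hnode {
	struct hnode *next;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-device structure, loosely modelled on virtio_scsi. */
struct fake_vscsi {
	int affinity_updates;
	struct hnode node;
};

typedef int (*instance_cb)(unsigned int cpu, struct hnode *node);

struct multi_state {
	instance_cb startup;
	struct hnode *instances;
};

static int cpu_online_map[NR_CPUS] = { 1, 1, 0, 0 };

/*
 * Toy counterpart of cpuhp_state_add_instance(): runs the startup
 * callback for this instance on every online CPU, then links it in.
 */
static int state_add_instance(struct multi_state *st, struct hnode *node)
{
	unsigned int cpu;
	int ret;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cpu_online_map[cpu])
			continue;
		ret = st->startup(cpu, node);
		if (ret)
			return ret;
	}
	node->next = st->instances;
	st->instances = node;
	return 0;
}

/* A CPU coming online runs the callback once per registered instance. */
static void cpu_up(struct multi_state *st, unsigned int cpu)
{
	struct hnode *n;

	assert(cpu < NR_CPUS);
	cpu_online_map[cpu] = 1;
	for (n = st->instances; n; n = n->next)
		st->startup(cpu, n);
}

/* Stand-in for virtscsi_cpu_online(). */
static int vscsi_online(unsigned int cpu, struct hnode *node)
{
	struct fake_vscsi *v = container_of(node, struct fake_vscsi, node);

	(void)cpu;
	v->affinity_updates++;	/* models __virtscsi_set_affinity(v, true) */
	return 0;
}
```

Because adding an instance already runs the callback on the online CPUs,
a separate affinity call during init becomes redundant in this model.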

Cc: "James E.J. Bottomley" <j...@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: linux-scsi@vger.kernel.org
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: virtualizat...@lists.linux-foundation.org
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
---
 drivers/scsi/virtio_scsi.c | 76 ++
 include/linux/cpuhotplug.h |  1 +
 2 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 7dbbb29d24c6..deefab3a94d0 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -107,8 +107,8 @@ struct virtio_scsi {
/* If the affinity hint is set for virtqueues */
    bool affinity_hint_set;
 
-   /* CPU hotplug notifier */
-   struct notifier_block nb;
+   struct hlist_node node;
+   struct hlist_node node_dead;
 
/* Protected by event_vq lock */
bool stop_events;
@@ -118,6 +118,7 @@ struct virtio_scsi {
struct virtio_scsi_vq req_vqs[];
 };
 
+static enum cpuhp_state virtioscsi_online;
 static struct kmem_cache *virtscsi_cmd_cache;
 static mempool_t *virtscsi_cmd_pool;
 
@@ -852,21 +853,33 @@ static void virtscsi_set_affinity(struct virtio_scsi *vscsi, bool affinity)
put_online_cpus();
 }
 
-static int virtscsi_cpu_callback(struct notifier_block *nfb,
-unsigned long action, void *hcpu)
+static int virtscsi_cpu_online(unsigned int cpu, struct hlist_node *node)
 {
-   struct virtio_scsi *vscsi = container_of(nfb, struct virtio_scsi, nb);
-   switch(action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   __virtscsi_set_affinity(vscsi, true);
-   break;
-   default:
-   break;
-   }
-   return NOTIFY_OK;
+   struct virtio_scsi *vscsi = hlist_entry_safe(node, struct virtio_scsi,
+node);
+   __virtscsi_set_affinity(vscsi, true);
+   return 0;
+}
+
+static int virtscsi_cpu_notif_add(struct virtio_scsi *vi)
+{
+   int ret;
+
+   ret = cpuhp_state_add_instance(virtioscsi_online, &vi->node);
+   if (ret)
+   return ret;
+
+   ret = cpuhp_state_add_instance(CPUHP_VIRT_SCSI_DEAD, &vi->node_dead);
+   if (ret)
+   cpuhp_state_remove_instance(virtioscsi_online, &vi->node);
+   return ret;
+}
+
+static void virtscsi_cpu_notif_remove(struct virtio_scsi *vi)
+{
+   cpuhp_state_remove_instance_nocalls(virtioscsi_online, &vi->node);
+   cpuhp_state_remove_instance_nocalls(CPUHP_VIRT_SCSI_DEAD,
+   &vi->node_dead);
 }
 
 static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
@@ -929,8 +942,6 @@ static int virtscsi_init(struct virtio_device *vdev,
virtscsi_init_vq(&vscsi->req_vqs[i - VIRTIO_SCSI_VQ_BASE],
 vqs[i]);
 
-   virtscsi_set_affinity(vscsi, true);
-
virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
 
@@ -987,12 +998,9 @@ static int virtscsi_probe(struct virtio_device *vdev)
if (err)
goto virtscsi_init_failed;
 
-   vscsi->nb.notifier_call = &virtscsi_cpu_callback;
-   err = register_hotcpu_notifier(&vscsi->nb);
-   if (err) {
-   pr_err("registering cpu notifier failed\n");
+   err = virtscsi_cpu_notif_add(vscsi);
+   if (err)
goto scsi_add_host_failed;
-   }
 
cmd_per_lun = virtscsi_config_get(vdev, cmd_per_lun) ?: 1;
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
@@ -1049,7 +1057,7 @@ static void virtscsi_remove(struct virtio_device *vdev)
 
scsi_remove_host(shost);
 
-   unregister_hotcpu_notifier(&vscsi->nb);
+   virtscsi_cpu_notif_remove(vscsi);
 
virtscsi_remove_vqs(vdev);
scsi_host_put(shost);
@@ -1061,7 +1069,7 @@ static int virtscsi_freeze(struct virtio_device *vdev)
struct Scsi_Host *sh = virtio_scsi_host(vdev);
struct virtio_scsi *vscsi = shost_priv(sh);
 
-   unregister

Re: [PATCH V2 resend] libata:fix kernel panic when hotplug

2016-06-22 Thread dingxiang



Hi all,

> Hello,
>
> On Mon, Jun 20, 2016 at 06:46:55PM -0700, Dan Williams wrote:
> > On Mon, Jun 20, 2016 at 6:22 PM, Martin K. Petersen
> > <martin.peter...@oracle.com> wrote:
> > >>>>>> "Tejun" == Tejun Heo <t...@kernel.org> writes:
> > >
> > >>> In fact,we don't need libata to deal with hotplug in sas environment.
> > >>> So we can't run ata hotplug task when ata port is sas host.
> > >
> > > Tejun> Martin, can you please confirm whether the above is true.  If so,
> > > Tejun> I'll route the patch through libata w/ stable cc'd.
> > >
> > > Not exactly a libsas expert. James? Dan?
> >
> > While it is true that libsas itself handles adding / removing devices
> > we have historically avoided this conflict because
> > ATA_PFLAG_SCSI_HOTPLUG is never set for libsas ata_ports.  So the bug
> > / behavior change is that  ATA_PFLAG_SCSI_HOTPLUG gets set in the
> > first place.  Ignoring it is a band-aid / not the real fix afaics.
>
> I see.  I'll hold off for now then.  Ding Xiang, can you find out
> where that flag is getting set?
>
> Thanks!

There are two places in libata-eh.c that set the ATA_PFLAG_SCSI_HOTPLUG flag.
I think both places should be protected. Here is my suggestion.
Thanks~

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 61dc7a9..2bee041 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1385,7 +1385,8 @@ void ata_eh_detach_dev(struct ata_device *dev)

if (ata_scsi_offline_dev(dev)) {
dev->flags |= ATA_DFLAG_DETACHED;
-   ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
+   if (!(ap->pflags & ATA_FLAG_SAS_HOST))
+   ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
}

/* clear per-dev EH info */
@@ -3299,7 +3300,8 @@ static int ata_eh_revalidate_and_attach(struct 
ata_link *link,

}

spin_lock_irqsave(ap->lock, flags);
-   ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
+   if (!(ap->pflags & ATA_FLAG_SAS_HOST))
+   ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
spin_unlock_irqrestore(ap->lock, flags);

/* new device discovered, configure xfermode */
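
For reference, the guard suggested in the two hunks above can be modelled in a few lines of user-space C. This is only a sketch under assumptions: the struct, the sketch_* names and the bit values are made up for illustration. It also assumes that, as in include/linux/libata.h, ATA_FLAG_SAS_HOST is a bit in ap->flags while ATA_PFLAG_SCSI_HOTPLUG is a bit in ap->pflags, so the sketch tests the flags word rather than pflags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values for this sketch; the real definitions live in
 * include/linux/libata.h. */
#define SKETCH_FLAG_SAS_HOST      (1u << 24) /* models ATA_FLAG_SAS_HOST (flags) */
#define SKETCH_PFLAG_SCSI_HOTPLUG (1u << 6)  /* models ATA_PFLAG_SCSI_HOTPLUG (pflags) */

/* Minimal stand-in for the two flag words of struct ata_port. */
struct sketch_port {
    unsigned int flags;  /* static port properties */
    unsigned int pflags; /* dynamic port state */
};

/* Request a SCSI hotplug pass unless the port belongs to a SAS host. */
static void sketch_request_hotplug(struct sketch_port *ap)
{
    assert(ap != NULL);
    if (!(ap->flags & SKETCH_FLAG_SAS_HOST))
        ap->pflags |= SKETCH_PFLAG_SCSI_HOTPLUG;
}

/* True when a hotplug pass has been requested for this port. */
static bool sketch_hotplug_pending(const struct sketch_port *ap)
{
    return (ap->pflags & SKETCH_PFLAG_SCSI_HOTPLUG) != 0;
}
```

With this model a plain SATA port ends up with the hotplug bit set, while a port marked as SAS host never gets it, which is the behaviour Dan describes as the historical one.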


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2 resend] libata:fix kernel panic when hotplug

2016-06-21 Thread Tejun Heo

Hello,

On Mon, Jun 20, 2016 at 06:46:55PM -0700, Dan Williams wrote:
> On Mon, Jun 20, 2016 at 6:22 PM, Martin K. Petersen
> <martin.peter...@oracle.com> wrote:
> >>>>>> "Tejun" == Tejun Heo <t...@kernel.org> writes:
> >
> >>> In fact,we don't need libata to deal with hotplug in sas environment.
> >>> So we can't run ata hotplug task when ata port is sas host.
> >
> > Tejun> Martin, can you please confirm whether the above is true.  If so,
> > Tejun> I'll route the patch through libata w/ stable cc'd.
> >
> > Not exactly a libsas expert. James? Dan?
> 
> While it is true that libsas itself handles adding / removing devices
> we have historically avoided this conflict because
> ATA_PFLAG_SCSI_HOTPLUG is never set for libsas ata_ports.  So the bug
> / behavior change is that  ATA_PFLAG_SCSI_HOTPLUG gets set in the
> first place.  Ignoring it is a band-aid / not the real fix afaics.

I see.  I'll hold off for now then.  Ding Xiang, can you find out
where that flag is getting set?

Thanks!

-- 
tejun

Re: [PATCH V2 resend] libata:fix kernel panic when hotplug

2016-06-20 Thread Dan Williams

On Mon, Jun 20, 2016 at 6:22 PM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Tejun" == Tejun Heo <t...@kernel.org> writes:
>
>>> In fact,we don't need libata to deal with hotplug in sas environment.
>>> So we can't run ata hotplug task when ata port is sas host.
>
> Tejun> Martin, can you please confirm whether the above is true.  If so,
> Tejun> I'll route the patch through libata w/ stable cc'd.
>
> Not exactly a libsas expert. James? Dan?

While it is true that libsas itself handles adding / removing devices
we have historically avoided this conflict because
ATA_PFLAG_SCSI_HOTPLUG is never set for libsas ata_ports.  So the bug
/ behavior change is that  ATA_PFLAG_SCSI_HOTPLUG gets set in the
first place.  Ignoring it is a band-aid / not the real fix afaics.

Re: [PATCH V2 resend] libata:fix kernel panic when hotplug

2016-06-20 Thread Martin K. Petersen

>>>>> "Tejun" == Tejun Heo <t...@kernel.org> writes:

>> In fact,we don't need libata to deal with hotplug in sas environment.
>> So we can't run ata hotplug task when ata port is sas host.

Tejun> Martin, can you please confirm whether the above is true.  If so,
Tejun> I'll route the patch through libata w/ stable cc'd.

Not exactly a libsas expert. James? Dan?

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH V2 resend] libata:fix kernel panic when hotplug

2016-06-16 Thread Tejun Heo

On Thu, Jun 16, 2016 at 12:45:40PM +0800, DingXiang wrote:
...
> In fact,we don't need libata to deal with hotplug in sas environment.
> So we can't run ata hotplug task when ata port is sas host.

Martin, can you please confirm whether the above is true.  If so, I'll
route the patch through libata w/ stable cc'd.

Thanks.

-- 
tejun

Re: [PATCH V2] libata:fix kernel panic when hotplug

2016-06-16 Thread kbuild test robot

Hi,

[auto build test WARNING on tj-libata/for-next]
[also build test WARNING on v4.7-rc3 next-20160616]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/DingXiang/libata-fix-kernel-panic-when-hotplug/20160616-105155
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/libata for-next
config: x86_64-randconfig-s5-06161418 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/linkage.h:4:0,
from include/linux/kernel.h:6,
from drivers/ata/libata-eh.c:35:
   drivers/ata/libata-eh.c: In function 'ata_scsi_port_error_handler':
   drivers/ata/libata-eh.c:820:19: error: 'ATA_PFLAG_SAS_HOST' undeclared 
(first use in this function)
   !(ap->pflags & ATA_PFLAG_SAS_HOST))
  ^
   include/linux/compiler.h:151:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/ata/libata-eh.c:819:7: note: in expansion of macro 'if'
 else if ((ap->pflags & ATA_PFLAG_SCSI_HOTPLUG) &&
  ^~
   drivers/ata/libata-eh.c:820:19: note: each undeclared identifier is reported 
only once for each function it appears in
   !(ap->pflags & ATA_PFLAG_SAS_HOST))
  ^
   include/linux/compiler.h:151:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/ata/libata-eh.c:819:7: note: in expansion of macro 'if'
 else if ((ap->pflags & ATA_PFLAG_SCSI_HOTPLUG) &&
  ^~

vim +/if +819 drivers/ata/libata-eh.c

   803  ap->ops->end_eh(ap);
   804  
   805  spin_unlock_irqrestore(ap->lock, flags);
   806  ata_eh_release(ap);
   807  } else {
   808  WARN_ON(ata_qc_from_tag(ap, ap->link.active_tag) == NULL);
   809  ap->ops->eng_timeout(ap);
   810  }
   811  
   812  scsi_eh_flush_done_q(&ap->eh_done_q);
   813  
   814  /* clean up */
   815  spin_lock_irqsave(ap->lock, flags);
   816  
   817  if (ap->pflags & ATA_PFLAG_LOADING)
   818  ap->pflags &= ~ATA_PFLAG_LOADING;
 > 819  else if ((ap->pflags & ATA_PFLAG_SCSI_HOTPLUG) &&
   820   !(ap->pflags & ATA_PFLAG_SAS_HOST))
   821  schedule_delayed_work(&ap->hotplug_task, 0);
   822  
   823  if (ap->pflags & ATA_PFLAG_RECOVERED)
   824  ata_port_info(ap, "EH complete\n");
   825  
   826  ap->pflags &= ~(ATA_PFLAG_SCSI_HOTPLUG | ATA_PFLAG_RECOVERED);
   827  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



[PATCH V2 resend] libata:fix kernel panic when hotplug

2016-06-15 Thread DingXiang

Under normal conditions, if we use the sas protocol and hotplug a
sata disk on a port, the sas driver will send the event
"PORTE_BYTES_DMAED" and call the function "sas_porte_bytes_dmaed".
But if a sata disk is running io when it is unplugged, and a new
sata disk is then plugged in, this may cause a kernel panic like this:

[ 2366.923208] Unable to handle kernel NULL pointer dereference
at virtual address 07b8
...
[ 2368.766334] Call trace:
[ 2368.781712] [] sas_find_dev_by_rphy+0x48/0x118
[ 2368.800394] [] sas_target_alloc+0x28/0x98
[ 2368.817975] [] scsi_alloc_target+0x248/0x308
[ 2368.835570] [] __scsi_add_device+0xb8/0x160
[ 2368.853034] [] ata_scsi_scan_host+0x190/0x230
[ 2368.871614] [] ata_scsi_hotplug+0xc8/0xe8
[ 2368.889152] [] process_one_work+0x164/0x438
[ 2368.908003] [] worker_thread+0x144/0x4b0
[ 2368.924613] [] kthread+0xfc/0x110

This is because "dev_to_shost" in "sas_find_dev_by_rphy" returns
a NULL pointer, and SHOST_TO_SAS_HA uses it, so the kernel panics.

Why does dev_to_shost return a NULL pointer?
Because in "__scsi_add_device",
struct device *parent = &shost->shost_gendev,
and in "scsi_alloc_target", "*parent" is assigned to
"starget->dev.parent". Then "sas_target_alloc" gets the
"struct sas_rphy" from "starget->dev.parent", and
"sas_find_dev_by_rphy" gets the "struct Scsi_Host *shost"
from "rphy->dev.parent". Here we find that
rphy->dev.parent = shost->shost_gendev.parent, and shost_gendev.parent
is "ap->tdev"; there is no further parent, so "dev_to_shost"
returns a NULL pointer.

When does the panic happen?
When libata is handling an error and adds hotplug_task to the workqueue,
if a new sata disk is plugged in at this moment, the libata hotplug task
will run and the panic will happen.

In fact,we don't need libata to deal with hotplug in sas environment.
So we can't run ata hotplug task when ata port is sas host.

Signed-off-by: Ding Xiang <dingxi...@huawei.com>
---
 drivers/ata/libata-eh.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 61dc7a9..4428a7c 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -816,7 +816,8 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, 
struct ata_port *ap)
 
if (ap->pflags & ATA_PFLAG_LOADING)
ap->pflags &= ~ATA_PFLAG_LOADING;
-   else if (ap->pflags & ATA_PFLAG_SCSI_HOTPLUG)
+   else if ((ap->pflags & ATA_PFLAG_SCSI_HOTPLUG) &&
+!(ap->pflags & ATA_FLAG_SAS_HOST))
schedule_delayed_work(&ap->hotplug_task, 0);
 
if (ap->pflags & ATA_PFLAG_RECOVERED)
-- 
2.5.0
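
The NULL return described in the changelog above follows from how the host lookup walks the device hierarchy: it climbs the parent chain until it meets a host device and gives up when the chain ends. A minimal user-space sketch of that walk follows; the struct and the sketch_* names are hypothetical stand-ins, not the kernel's struct device or dev_to_shost().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device: only the parent link and a
 * marker that models scsi_is_host_device(). */
struct sketch_device {
    struct sketch_device *parent;
    bool is_host;
};

/* Climb the parent chain until a host device is found.
 * Returns NULL when the chain ends without meeting one. */
static struct sketch_device *sketch_dev_to_shost(struct sketch_device *dev)
{
    while (dev != NULL && !dev->is_host)
        dev = dev->parent;
    return dev;
}
```

A target whose chain reaches a host resolves to that host. A target hanging off a device with no host above it, like the ap->tdev case in the changelog, resolves to NULL, and any caller that dereferences the result without checking will oops.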


Re: [PATCH V2] libata:fix kernel panic when hotplug

2016-06-15 Thread kbuild test robot

Hi,

[auto build test ERROR on tj-libata/for-next]
[also build test ERROR on v4.7-rc3 next-20160615]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/DingXiang/libata-fix-kernel-panic-when-hotplug/20160616-105155
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/libata for-next
config: x86_64-randconfig-s5-06161042 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/ata/libata-eh.c: In function 'ata_scsi_port_error_handler':
>> drivers/ata/libata-eh.c:820:19: error: 'ATA_PFLAG_SAS_HOST' undeclared 
>> (first use in this function)
   !(ap->pflags & ATA_PFLAG_SAS_HOST))
  ^~
   drivers/ata/libata-eh.c:820:19: note: each undeclared identifier is reported 
only once for each function it appears in

vim +/ATA_PFLAG_SAS_HOST +820 drivers/ata/libata-eh.c

   814  /* clean up */
   815  spin_lock_irqsave(ap->lock, flags);
   816  
   817  if (ap->pflags & ATA_PFLAG_LOADING)
   818  ap->pflags &= ~ATA_PFLAG_LOADING;
   819  else if ((ap->pflags & ATA_PFLAG_SCSI_HOTPLUG) &&
 > 820   !(ap->pflags & ATA_PFLAG_SAS_HOST))
   821  schedule_delayed_work(&ap->hotplug_task, 0);
   822  
   823  if (ap->pflags & ATA_PFLAG_RECOVERED)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



[PATCH V2] libata:fix kernel panic when hotplug

2016-06-15 Thread DingXiang

From: Ding Xiang <dingxi...@huawei.com>

Under normal conditions, if we use the sas protocol and hotplug a
sata disk on a port, the sas driver will send the event
"PORTE_BYTES_DMAED" and call the function "sas_porte_bytes_dmaed".
But if a sata disk is running io when it is unplugged, and a new
sata disk is then plugged in, this may cause a kernel panic like this:

[ 2366.923208] Unable to handle kernel NULL pointer dereference
at virtual address 07b8
[ 2366.949253] pgd = ffc00121d000
[ 2366.971164] [07b8] *pgd=0027df893003, *pud=0027df893003,
*pmd=0027df894003, *pte=00606d000707
[ 2367.022822] Internal error: Oops: 9605 [#1] SMP
[ 2367.048490] Modules linked in: dm_mirror(E) dm_region_hash(E) dm_log(E)
dm_mod(E) crc32_arm64(E) aes_ce_blk(E) ablk_helper(E) cry ptd(E)
aes_ce_cipher(E) ghash_ce(E) sha2_ce(E) sha1_ce(E) ses(E) enclosure(E)
shpchp(E) marvell(E)
[ 2367.144808] CPU: 16 PID: 710 Comm: kworker/16:1 Tainted: GE
4.1.23-next.aarch64 #1
[ 2367.180161] Hardware name: Huawei Taishan 2280 /BC11SPCC,
BIOS 1.28 05/14/2016
[ 2367.213305] Workqueue: events ata_scsi_hotplug
[ 2367.244296] task: ffe7db9b5e00 ti: ffe7db1a
task.ti: ffe7db1a
[ 2367.279949] PC is at sas_find_dev_by_rphy+0x48/0x118
[ 2367.312045] LR is at sas_find_dev_by_rphy+0x40/0x118
[ 2367.341970] pc : [] lr : []
pstate: 0145
...
[ 2368.766334] Call trace:
[ 2368.781712] [] sas_find_dev_by_rphy+0x48/0x118
[ 2368.800394] [] sas_target_alloc+0x28/0x98
[ 2368.817975] [] scsi_alloc_target+0x248/0x308
[ 2368.835570] [] __scsi_add_device+0xb8/0x160
[ 2368.853034] [] ata_scsi_scan_host+0x190/0x230
[ 2368.871614] [] ata_scsi_hotplug+0xc8/0xe8
[ 2368.889152] [] process_one_work+0x164/0x438
[ 2368.908003] [] worker_thread+0x144/0x4b0
[ 2368.924613] [] kthread+0xfc/0x110
[ 2368.940923] Code: aa1303e0 97ff5deb 3480 d1082273 (f943de76)

This is because "dev_to_shost" in "sas_find_dev_by_rphy" returns
a NULL pointer, and SHOST_TO_SAS_HA uses it, so the kernel panics.

Why does dev_to_shost return a NULL pointer?
Because in "__scsi_add_device",
struct device *parent = &shost->shost_gendev,
and in "scsi_alloc_target", "*parent" is assigned to
"starget->dev.parent". Then "sas_target_alloc" gets the
"struct sas_rphy" from "starget->dev.parent", and
"sas_find_dev_by_rphy" gets the "struct Scsi_Host *shost"
from "rphy->dev.parent". Here we find that
rphy->dev.parent = shost->shost_gendev.parent, and shost_gendev.parent
is "ap->tdev"; there is no further parent, so "dev_to_shost"
returns a NULL pointer.

When does the panic happen?
When libata is handling an error and adds hotplug_task to the workqueue,
if a new sata disk is plugged in at this moment, the libata hotplug task
will run and the panic will happen.

In fact,we don't need libata to deal with hotplug in sas environment.
So we can't run ata hotplug task when ata port is sas host.

Signed-off-by: Ding Xiang <dingxi...@huawei.com>
---
 drivers/ata/libata-eh.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 61dc7a9..4428a7c 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -816,7 +816,8 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, 
struct ata_port *ap)
 
if (ap->pflags & ATA_PFLAG_LOADING)
ap->pflags &= ~ATA_PFLAG_LOADING;
-   else if (ap->pflags & ATA_PFLAG_SCSI_HOTPLUG)
+   else if ((ap->pflags & ATA_PFLAG_SCSI_HOTPLUG) &&
+!(ap->pflags & ATA_PFLAG_SAS_HOST))
schedule_delayed_work(&ap->hotplug_task, 0);
 
if (ap->pflags & ATA_PFLAG_RECOVERED)
-- 
2.5.0


[PATCH 10/11] scsi: bnx2fc: fix hotplug race in bnx2fc_process_new_cqes()

2016-03-11 Thread Sebastian Andrzej Siewior

The ->iothread is accessed without holding the lock. Take this:

 CPU A                                   CPU B
 -----                                   -----
bnx2fc_process_new_cqes()                bnx2fc_percpu_thread_destroy()
 spin_lock_bh(fp_work_lock);
 fps->iothread != NULL
 list_add_tail(work)
 spin_unlock_bh(&fps->fp_work_lock);     spin_lock_bh(&fps->fp_work_lock);
                                         fps->iothread = NULL
 if (fps->iothread && work)
 ...
 else
  bnx2fc_process_cq_compl(work)          bnx2fc_process_cq_compl(work);

CPU A will process the wqe despite having added it to the work list of CPU
B, which will at the same time clean up the queued wqe.

The fix is to add the item to the list and wake up the thread while still
holding the lock. If the item was not added to the list then the
`process' variable is still true, in which case we have to process it
manually.

Cc: qlogic-storage-upstr...@qlogic.com
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: Christoph Hellwig 
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/scsi/bnx2fc/bnx2fc_hwi.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_hwi.c b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
index 28c671b609b2..1427062e86f0 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_hwi.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_hwi.c
@@ -1045,6 +1045,7 @@ int bnx2fc_process_new_cqes(struct bnx2fc_rport *tgt)
struct bnx2fc_work *work = NULL;
struct bnx2fc_percpu_s *fps = NULL;
unsigned int cpu = wqe % num_possible_cpus();
+   bool process = true;
 
fps = &per_cpu(bnx2fc_percpu, cpu);
spin_lock_bh(&fps->fp_work_lock);
@@ -1052,16 +1053,16 @@ int bnx2fc_process_new_cqes(struct bnx2fc_rport *tgt)
goto unlock;
 
work = bnx2fc_alloc_work(tgt, wqe);
-   if (work)
+   if (work) {
list_add_tail(&work->list,
  &fps->work_list);
+   wake_up_process(fps->iothread);
+   process = false;
+   }
 unlock:
spin_unlock_bh(&fps->fp_work_lock);
 
-   /* Pending work request completion */
-   if (fps->iothread && work)
-   wake_up_process(fps->iothread);
-   else
+   if (process)
bnx2fc_process_cq_compl(tgt, wqe);
num_free_sqes++;
}
-- 
2.7.0
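
The pattern in this patch, deciding under the lock whether the entry was handed to the I/O thread and acting on that decision only after unlocking, can be sketched in single-threaded user-space C. Everything here is an assumption made for illustration: the struct, the sketch_* names, and the lock_depth counter that merely models holding fp_work_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-CPU state touched by the patch. */
struct sketch_percpu {
    int lock_depth;    /* models fp_work_lock: 1 while "held" */
    bool has_iothread; /* models fps->iothread != NULL */
    int queued;        /* entries on the work list */
    int wakeups;       /* calls that model wake_up_process() */
};

/* Returns true when the caller has to process the entry inline.
 * alloc_ok models whether allocating the work item succeeded. */
static bool sketch_queue_work(struct sketch_percpu *fps, bool alloc_ok)
{
    bool process = true;

    fps->lock_depth++;                /* spin_lock_bh() */
    if (fps->has_iothread && alloc_ok) {
        assert(fps->lock_depth == 1); /* queue and wake while still locked */
        fps->queued++;                /* list_add_tail() */
        fps->wakeups++;               /* wake_up_process() */
        process = false;
    }
    fps->lock_depth--;                /* spin_unlock_bh() */

    return process;
}
```

Because the outcome is captured in a local variable while the lock is held, nothing reads the thread pointer after unlocking, so a concurrent teardown cannot turn a queued entry into one that is also processed inline.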


[v2 PATCH 2/3] scsi:stex.c Add hotplug support

2016-02-22 Thread Charles Chiou

From: Charles <charles.ch...@tw.promise.com>

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for
   all commands which were sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send
   commands to the device after it has been unplugged.

3. Add new device statuses: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced.
   MU_STATE_NOCONNECT represents that the device has been unplugged from
   the host.

4. Use return_abnormal_state() to replace part of the code in stex_do_reset.

Signed-off-by: Charles Chiou <charles.ch...@tw.promise.com>
Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>
---
 drivers/scsi/stex.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 495d632..1994603 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -84,6 +84,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,
 
MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -537,6 +539,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg *req, 
u16 tag)
readl(hba->mmio_base + YH2I_REQ); /* flush */
 }
 
+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_config(struct scsi_device *sdev)
 {
@@ -560,8 +583,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void 
(*done)(struct scsi_cmnd *))
id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
-   if (unlikely(hba->mu_status == MU_STATE_RESETTING))
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
+   if (unlikely(hba->mu_status != MU_STATE_STARTED))
return SCSI_MLQUEUE_HOST_BUSY;
 
switch (cmd->cmnd[0]) {
@@ -1260,10 +1287,8 @@ static void stex_ss_reset(struct st_hba *hba)
 
 static int stex_do_reset(struct st_hba *hba)
 {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;
 
spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1297,20 +1322,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);
 
-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);
 
if (stex_handshake(hba) == 0)
return 0;
@@ -1771,9 +1784,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);
 
+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);
 
-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);
 
stex_hba_free(hba);
 
-- 
1.9.1
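
return_abnormal_state() stores "status << 16" in the command result, which places the DID_* code in the host byte of the result word. The sketch below shows that encoding in user-space C, assuming the usual midlayer layout where the host byte occupies bits 16 to 23; the sketch_* names are made up, and the two constants mirror the DID_NO_CONNECT and DID_RESET values from the SCSI headers.

```c
#include <assert.h>

/* Mirrors of the midlayer host byte codes used in the patch. */
#define SKETCH_DID_NO_CONNECT 0x01
#define SKETCH_DID_RESET      0x08

/* Build a result word that carries only a host byte. */
static int sketch_host_result(int host_code)
{
    return host_code << 16;
}

/* Extract the host byte (bits 16..23) from a result word. */
static int sketch_host_byte(int result)
{
    return (result >> 16) & 0xff;
}
```

Under this layout a value stored without the shift decodes to host byte 0, so if I read the stex_queuecommand_lck hunk correctly, the plain "cmd->result = DID_NO_CONNECT" assignment there would not be seen as DID_NO_CONNECT by a reader that extracts the host byte this way.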


[Resend PATCH 2/3] scsi:stex.c Add hotplug support

2016-02-04 Thread Charles Chiou



From 60e14c245c18cbe0300cfa244334e2850a52a381 Mon Sep 17 00:00:00 2001
From: Charles <charles.ch...@tw.promise.com>
Date: Wed, 2 Sep 2015 20:48:55 +0800
Subject: [PATCH 2/3] scsi:stex.c Add hotplug support

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for
   all commands which were sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send
   commands to the device after it has been unplugged.

3. Add new device statuses: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced.
   MU_STATE_NOCONNECT represents that the device has been unplugged from
   the host.

4. Use return_abnormal_state() to replace part of the code in stex_do_reset.


V2: N/A

Signed-off-by: Charles Chiou <charles.ch...@tw.promise.com>
Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>

---
  drivers/scsi/stex.c | 53 
++---

  1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 0c93f1f..4ef0c80 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -83,6 +83,8 @@ enum {
  MU_STATE_STARTED= 2,
  MU_STATE_RESETTING= 3,
  MU_STATE_FAILED= 4,
+MU_STATE_STOP= 5,
+MU_STATE_NOCONNECT= 6,

  MU_MAX_DELAY= 120,
  MU_HANDSHAKE_SIGNATURE= 0x5555,
@@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg 
*req,

u16 tag)
  readl(hba->mmio_base + YH2I_REQ); /* flush */
  }

+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+struct st_ccb *ccb;
+unsigned long flags;
+u16 tag;
+
+spin_lock_irqsave(hba->host->host_lock, flags);
+for (tag = 0; tag < hba->host->can_queue; tag++) {
+ccb = &hba->ccb[tag];
+if (ccb->req == NULL)
+continue;
+ccb->req = NULL;
+if (ccb->cmd) {
+scsi_dma_unmap(ccb->cmd);
+ccb->cmd->result = status << 16;
+ccb->cmd->scsi_done(ccb->cmd);
+ccb->cmd = NULL;
+}
+}
+spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
  static int
  stex_slave_config(struct scsi_device *sdev)
  {
@@ -567,8 +590,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void
(*done)(struct scsi_cmnd *))
  id = cmd->device->id;
  lun = cmd->device->lun;
  hba = (struct st_hba *) &host->hostdata[0];
-
-if (unlikely(hba->mu_status == MU_STATE_RESETTING))
+if (hba->mu_status == MU_STATE_NOCONNECT) {
+cmd->result = DID_NO_CONNECT;
+done(cmd);
+return 0;
+}
+if (unlikely(hba->mu_status != MU_STATE_STARTED))
  return SCSI_MLQUEUE_HOST_BUSY;

  switch (cmd->cmnd[0]) {
@@ -1267,10 +1294,8 @@ static void stex_ss_reset(struct st_hba *hba)

  static int stex_do_reset(struct st_hba *hba)
  {
-struct st_ccb *ccb;
  unsigned long flags;
  unsigned int mu_status = MU_STATE_RESETTING;
-u16 tag;

  spin_lock_irqsave(hba->host->host_lock, flags);
  if (hba->mu_status == MU_STATE_STARTING) {
@@ -1304,20 +1329,8 @@ static int stex_do_reset(struct st_hba *hba)
  else if (hba->cardtype == st_yel)
  stex_ss_reset(hba);

-spin_lock_irqsave(hba->host->host_lock, flags);
-for (tag = 0; tag < hba->host->can_queue; tag++) {
-ccb = >ccb[tag];
-if (ccb->req == NULL)
-continue;
-ccb->req = NULL;
-if (ccb->cmd) {
-scsi_dma_unmap(ccb->cmd);
-ccb->cmd->result = DID_RESET << 16;
-ccb->cmd->scsi_done(ccb->cmd);
-ccb->cmd = NULL;
-}
-}
-spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+return_abnormal_state(hba, DID_RESET);

  if (stex_handshake(hba) == 0)
  return 0;
@@ -1786,9 +1799,11 @@ static void stex_remove(struct pci_dev *pdev)
  {
  struct st_hba *hba = pci_get_drvdata(pdev);

+hba->mu_status = MU_STATE_NOCONNECT;
+return_abnormal_state(hba, DID_NO_CONNECT);
  scsi_remove_host(hba->host);

-stex_hba_stop(hba);
+scsi_block_requests(hba->host);

  stex_hba_free(hba);
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Warning Calltrace when hotplug sas disk

2016-02-04 Thread wangyijing

Hi list, I tried to hotplug a disk in my machine, but when I hot-removed the
disk, I got a warning call trace.

When we unplug a disk, the LLDD reports a loss_of_signal event to sas, so:

sas_deform_port
    sas_unregister_domain_devices
        sas_unregister_dev
            queue the destruct to scsi work queue
    sas_port_delete
        device_del(port)  // the port device is the parent of the phy and end
                          // device, so in this case we first delete the
                          // parent kobj and only then the child devices
..
sas_destruct_devices
    sas_rphy_delete
...

It seems to be caused by deleting the parent device before the child devices.
This is only my personal idea; if anyone could comment on it, I would
appreciate it very much, thanks.
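
The ordering problem suspected above can be modelled with a toy hierarchy in user-space C. This is a sketch under assumptions, not the sysfs implementation: it assumes that removing a parent directory also removes everything below it, so a later removal of a child finds its entries already gone, which is the situation the warning reports. The struct and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kobject with a sysfs directory. */
struct sketch_kobj {
    struct sketch_kobj *parent;
    bool in_sysfs; /* directory and its groups still present */
};

/* True when anc is somewhere above k in the hierarchy. */
static bool sketch_is_descendant(const struct sketch_kobj *k,
                                 const struct sketch_kobj *anc)
{
    for (k = k->parent; k != NULL; k = k->parent)
        if (k == anc)
            return true;
    return false;
}

/* Remove victim and, recursively, everything below it.
 * Returns false when victim's entries were already gone, which models
 * the "sysfs group not found" warning. */
static bool sketch_device_del(struct sketch_kobj *victim,
                              struct sketch_kobj *all[], size_t n)
{
    bool clean = victim->in_sysfs;

    victim->in_sysfs = false;
    for (size_t i = 0; i < n; i++)
        if (sketch_is_descendant(all[i], victim))
            all[i]->in_sysfs = false;
    return clean;
}
```

In this model, deleting the port first and the end device afterwards makes the second call report the warning, while deleting children before parents stays clean.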


WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:224 sysfs_remove_group+0xa0/0xa4()
kobj 8013e8389410 sysfs group (power)80a2dbe8 not found for kobject 
'0:0:1:0'
Modules linked in:
CPU: 2 PID: 6 Comm: kworker/u64:0 Not tainted 4.1.6+ #160
Hardware name: Hisilicon PhosphorV660 Development Board (DT)
Workqueue: scsi_wq_0 sas_destruct_devices
Call trace:
[] dump_backtrace+0x0/0x124
[] show_stack+0x10/0x1c
[] dump_stack+0x78/0x98
[] warn_slowpath_common+0x98/0xd0
[] warn_slowpath_fmt+0x4c/0x58
[] sysfs_remove_group+0x9c/0xa4
[] dpm_sysfs_remove+0x54/0x94
[] device_del+0x58/0x24c
[] device_unregister+0x10/0x2c
[] bsg_unregister_queue+0xbc/0xf8
[] __scsi_remove_device+0x9c/0xbc
[] scsi_remove_device+0x44/0x64
[] scsi_remove_target+0x198/0x258
[] sas_rphy_remove+0x8c/0xb4
[] sas_rphy_delete+0x34/0x54
[] sas_destruct_devices+0x60/0x98
[] process_one_work+0x13c/0x344
[] worker_thread+0x13c/0x494
[] kthread+0xd8/0xf0
---[ end trace b69dffc64eb59f96 ]---






Re: [v2 PATCH 2/3] scsi:stex.c Add hotplug support

2016-01-29 Thread Charles Chiou


Hi all, ping?
Does this patch have any other issues that need fixing? Thank you.
Charles

On 09/03/2015 10:01 PM, Johannes Thumshirn wrote:

Charles Chiou <ch1102ch...@gmail.com> writes:


 From 60e14c245c18cbe0300cfa244334e2850a52a381 Mon Sep 17 00:00:00 2001
From: Charles <charles.ch...@tw.promise.com>
Date: Wed, 2 Sep 2015 20:48:55 +0800
Subject: [PATCH 2/3] scsi:stex.c Add hotplug support

1. Add hotplug support. Pegasus support surprise removal. To this end, I
use return_abnormal_state function to return DID_NO_CONNECT for all
   commands which sent to driver.

2. Remove stex_hba_stop in stex_remove because we cannot send command to
device after hotplug.

3. Add new device status:  MU_STATE_STOP, MU_STATE_NOCONNECT,
MU_STATE_STOP. MU_STATE_STOP is currently not referenced.
MU_STATE_NOCONNECT represent that device is plugged out from the host.

4. Use return_abnormal_function() to substitute part of code in stex_do_reset.

V2: N/A

Signed-off-by: Charles Chiou <charles.ch...@tw.promise.com>
---
  drivers/scsi/stex.c | 53 ++---
  1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 0c93f1f..4ef0c80 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -83,6 +83,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,

MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg *req,
u16 tag)
readl(hba->mmio_base + YH2I_REQ); /* flush */
  }

+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
  static int
  stex_slave_config(struct scsi_device *sdev)
  {
@@ -567,8 +590,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void
(*done)(struct scsi_cmnd *))
id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
-   if (unlikely(hba->mu_status == MU_STATE_RESETTING))
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
+   if (unlikely(hba->mu_status != MU_STATE_STARTED))
return SCSI_MLQUEUE_HOST_BUSY;

switch (cmd->cmnd[0]) {
@@ -1267,10 +1294,8 @@ static void stex_ss_reset(struct st_hba *hba)

  static int stex_do_reset(struct st_hba *hba)
  {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;

spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1304,20 +1329,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);

-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);

if (stex_handshake(hba) == 0)
return 0;
@@ -1786,9 +1799,11 @@ static void stex_remove(struct pci_dev *pdev)
  {
struct st_hba *hba = pci_get_drvdata(pdev);

+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);

-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);

stex_hba_free(hba);


Looks OK to me, so
Reviewed-by: Johan

Re: [v2 PATCH 2/3] scsi:stex.c Add hotplug support

2015-09-03 Thread Johannes Thumshirn

Charles Chiou <ch1102ch...@gmail.com> writes:

> From 60e14c245c18cbe0300cfa244334e2850a52a381 Mon Sep 17 00:00:00 2001
> From: Charles <charles.ch...@tw.promise.com>
> Date: Wed, 2 Sep 2015 20:48:55 +0800
> Subject: [PATCH 2/3] scsi:stex.c Add hotplug support
>
> 1. Add hotplug support. Pegasus supports surprise removal. To this end, I
>    use the return_abnormal_state() function to return DID_NO_CONNECT for
>    all commands that have been sent to the driver.
>
> 2. Remove stex_hba_stop() from stex_remove() because we cannot send
>    commands to the device after it has been hot-unplugged.
>
> 3. Add new device states: MU_STATE_STOP and MU_STATE_NOCONNECT.
>    MU_STATE_STOP is currently not referenced. MU_STATE_NOCONNECT
>    indicates that the device has been unplugged from the host.
>
> 4. Use return_abnormal_state() to replace part of the code in
>    stex_do_reset().
>
> V2: N/A
>
> Signed-off-by: Charles Chiou <charles.ch...@tw.promise.com>
> ---
>  drivers/scsi/stex.c | 53 
> ++---
>  1 file changed, 34 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
> index 0c93f1f..4ef0c80 100644
> --- a/drivers/scsi/stex.c
> +++ b/drivers/scsi/stex.c
> @@ -83,6 +83,8 @@ enum {
>   MU_STATE_STARTED= 2,
>   MU_STATE_RESETTING  = 3,
>   MU_STATE_FAILED = 4,
> + MU_STATE_STOP   = 5,
> + MU_STATE_NOCONNECT  = 6,
>
>   MU_MAX_DELAY= 120,
>   MU_HANDSHAKE_SIGNATURE  = 0x5555,
> @@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg *req,
> u16 tag)
>   readl(hba->mmio_base + YH2I_REQ); /* flush */
>  }
>
> +static void return_abnormal_state(struct st_hba *hba, int status)
> +{
> + struct st_ccb *ccb;
> + unsigned long flags;
> + u16 tag;
> +
> + spin_lock_irqsave(hba->host->host_lock, flags);
> + for (tag = 0; tag < hba->host->can_queue; tag++) {
> + ccb = &hba->ccb[tag];
> + if (ccb->req == NULL)
> + continue;
> + ccb->req = NULL;
> + if (ccb->cmd) {
> + scsi_dma_unmap(ccb->cmd);
> + ccb->cmd->result = status << 16;
> + ccb->cmd->scsi_done(ccb->cmd);
> + ccb->cmd = NULL;
> + }
> + }
> + spin_unlock_irqrestore(hba->host->host_lock, flags);
> +}
>  static int
>  stex_slave_config(struct scsi_device *sdev)
>  {
> @@ -567,8 +590,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void
> (*done)(struct scsi_cmnd *))
>   id = cmd->device->id;
>   lun = cmd->device->lun;
>   hba = (struct st_hba *) &host->hostdata[0];
> -
> - if (unlikely(hba->mu_status == MU_STATE_RESETTING))
> + if (hba->mu_status == MU_STATE_NOCONNECT) {
> + cmd->result = DID_NO_CONNECT;
> + done(cmd);
> + return 0;
> + }
> + if (unlikely(hba->mu_status != MU_STATE_STARTED))
>   return SCSI_MLQUEUE_HOST_BUSY;
>
>   switch (cmd->cmnd[0]) {
> @@ -1267,10 +1294,8 @@ static void stex_ss_reset(struct st_hba *hba)
>
>  static int stex_do_reset(struct st_hba *hba)
>  {
> - struct st_ccb *ccb;
>   unsigned long flags;
>   unsigned int mu_status = MU_STATE_RESETTING;
> - u16 tag;
>
>   spin_lock_irqsave(hba->host->host_lock, flags);
>   if (hba->mu_status == MU_STATE_STARTING) {
> @@ -1304,20 +1329,8 @@ static int stex_do_reset(struct st_hba *hba)
>   else if (hba->cardtype == st_yel)
>   stex_ss_reset(hba);
>
> - spin_lock_irqsave(hba->host->host_lock, flags);
> - for (tag = 0; tag < hba->host->can_queue; tag++) {
> - ccb = &hba->ccb[tag];
> - if (ccb->req == NULL)
> - continue;
> - ccb->req = NULL;
> - if (ccb->cmd) {
> - scsi_dma_unmap(ccb->cmd);
> - ccb->cmd->result = DID_RESET << 16;
> - ccb->cmd->scsi_done(ccb->cmd);
> - ccb->cmd = NULL;
> - }
> - }
> - spin_unlock_irqrestore(hba->host->host_lock, flags);
> +
> + return_abnormal_state(hba, DID_RESET);
>
>   if (stex_handshake(hba) == 0)
>   return 0;
> @@ -1786,9 +1799,11 @@ static void stex_remove(struct pci_dev *pdev)
>  {
>   struct

[v2 PATCH 2/3] scsi:stex.c Add hotplug support

2015-09-03 Thread Charles Chiou


From 60e14c245c18cbe0300cfa244334e2850a52a381 Mon Sep 17 00:00:00 2001
From: Charles <charles.ch...@tw.promise.com>
Date: Wed, 2 Sep 2015 20:48:55 +0800
Subject: [PATCH 2/3] scsi:stex.c Add hotplug support

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for
   all commands that have been sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send
   commands to the device after it has been hot-unplugged.

3. Add new device states: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced. MU_STATE_NOCONNECT
   indicates that the device has been unplugged from the host.

4. Use return_abnormal_state() to replace part of the code in
   stex_do_reset().


V2: N/A

Signed-off-by: Charles Chiou <charles.ch...@tw.promise.com>
---
 drivers/scsi/stex.c | 53 
++---

 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 0c93f1f..4ef0c80 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -83,6 +83,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,

MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg 
*req, u16 tag)

readl(hba->mmio_base + YH2I_REQ); /* flush */
 }

+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_config(struct scsi_device *sdev)
 {
@@ -567,8 +590,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void 
(*done)(struct scsi_cmnd *))

id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
-   if (unlikely(hba->mu_status == MU_STATE_RESETTING))
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
+   if (unlikely(hba->mu_status != MU_STATE_STARTED))
return SCSI_MLQUEUE_HOST_BUSY;

switch (cmd->cmnd[0]) {
@@ -1267,10 +1294,8 @@ static void stex_ss_reset(struct st_hba *hba)

 static int stex_do_reset(struct st_hba *hba)
 {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;

spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1304,20 +1329,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);

-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);

if (stex_handshake(hba) == 0)
return 0;
@@ -1786,9 +1799,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);

+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);

-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);

stex_hba_free(hba);

--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
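
For readers following the series, the core of return_abnormal_state() can be modelled outside the kernel. The sketch below is a minimal user-space illustration, not the driver code: all model_* names are hypothetical, locking and DMA unmapping are left out, and the two MODEL_DID_* values are assumed to mirror DID_NO_CONNECT (0x01) and DID_RESET (0x08) from the kernel's SCSI headers. It shows every outstanding slot being released and each attached command being completed exactly once, with the host byte stored in bits 16-23 of the result.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror the kernel's host-byte codes; hypothetical names. */
#define MODEL_DID_NO_CONNECT 0x01
#define MODEL_DID_RESET      0x08
#define MODEL_CAN_QUEUE      4

struct model_cmd {
    int result;     /* host byte lives in bits 16..23 */
    int done_calls; /* how many times the completion ran */
};

struct model_ccb {
    void *req;             /* non-NULL while a request is outstanding */
    struct model_cmd *cmd; /* may be NULL for driver-internal requests */
};

struct model_hba {
    struct model_ccb ccb[MODEL_CAN_QUEUE];
};

/* Stand-in for the command's completion callback. */
static void model_done(struct model_cmd *cmd)
{
    cmd->done_calls++;
}

/*
 * Walk every tag, release outstanding slots, and complete any attached
 * command with the given host status. Returns the number of commands
 * completed. The real helper does this under the host lock.
 */
static int model_return_abnormal_state(struct model_hba *hba, int status)
{
    int completed = 0;

    assert(hba != NULL);
    for (int tag = 0; tag < MODEL_CAN_QUEUE; tag++) {
        struct model_ccb *ccb = &hba->ccb[tag];

        if (ccb->req == NULL)
            continue;
        ccb->req = NULL;
        if (ccb->cmd) {
            ccb->cmd->result = status << 16;
            model_done(ccb->cmd);
            ccb->cmd = NULL;
            completed++;
        }
    }
    return completed;
}

/* Extract the host byte from a result word. */
static int model_host_byte(int result)
{
    return (result >> 16) & 0xff;
}
```

In this model a second call completes nothing, because the first call has already cleared every slot.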

[PATCH 2/4] scsi:stex.c Add hotplug support

2015-09-02 Thread Charles Chiou



From 9f6cf26367419ed746c6c0f4e80fad6066b99d06 Mon Sep 17 00:00:00 2001
From: Charles <charles.ch...@tw.promise.com>
Date: Wed, 2 Sep 2015 20:48:55 +0800
Subject: [PATCH 2/3] scsi:stex.c Add hotplug support

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for
   all commands that have been sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send
   commands to the device after it has been hot-unplugged.

3. Add new device states: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced. MU_STATE_NOCONNECT
   indicates that the device has been unplugged from the host.

4. Use return_abnormal_state() to replace part of the code in
   stex_do_reset().


Signed-off-by: Charles Chiou <charles.ch...@tw.promise.com>
---
 drivers/scsi/stex.c | 53 
++---

 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 657e3ae..6578f3d 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -83,6 +83,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,

MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg 
*req, u16 tag)

readl(hba->mmio_base + YH2I_REQ); /* flush */
 }

+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_config(struct scsi_device *sdev)
 {
@@ -567,8 +590,12 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void 
(*done)(struct scsi_cmnd *))

id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
-   if (unlikely(hba->mu_status == MU_STATE_RESETTING))
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
+   if (unlikely(hba->mu_status != MU_STATE_STARTED))
return SCSI_MLQUEUE_HOST_BUSY;

switch (cmd->cmnd[0]) {
@@ -1267,10 +1294,8 @@ static void stex_ss_reset(struct st_hba *hba)

 static int stex_do_reset(struct st_hba *hba)
 {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;

spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1304,20 +1329,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);

-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);

if (stex_handshake(hba) == 0)
return 0;
@@ -1789,9 +1802,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);

+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);

-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);

stex_hba_free(hba);

--
1.9.1


[PATCH 2/4 v4] scsi:stex.c Add hotplug support

2014-12-15 Thread Charles Chiou



From 901f2c1b2d1ae2991182f0f62cedc70f87ea49bc Mon Sep 17 00:00:00 2001
From: Charles Chiou <charles.ch...@tw.promise.com>
Date: Wed, 5 Nov 2014 17:18:37 +0800
Subject: [PATCH 2/4] scsi:stex.c Add hotplug support

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for
   all commands that have been sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send
   commands to the device after it has been hot-unplugged.

3. Add new device states: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced. MU_STATE_NOCONNECT
   indicates that the device has been unplugged from the host.

4. Use return_abnormal_state() to replace part of the code in
   stex_do_reset().

Signed-off-by: charles.ch...@tw.promise.com
---
 drivers/scsi/stex.c | 51 
+--

 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 08e7bc8..7dc6afe 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -83,6 +83,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,

MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg 
*req, u16 tag)

readl(hba->mmio_base + YH2I_REQ); /* flush */
 }

+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_alloc(struct scsi_device *sdev)
 {
@@ -585,7 +608,11 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void 
(*done)(struct scsi_cmnd *))

id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
if (unlikely(hba->mu_status == MU_STATE_RESETTING))
return SCSI_MLQUEUE_HOST_BUSY;

@@ -1287,10 +1314,8 @@ static void stex_ss_reset(struct st_hba *hba)

 static int stex_do_reset(struct st_hba *hba)
 {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;

spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1324,20 +1349,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);

-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);

if (stex_handshake(hba) == 0)
return 0;
@@ -1808,9 +1821,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);

+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);

-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);

stex_hba_free(hba);

--
1.9.1
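
The queuecommand check is the one place where the postings of this patch differ in behaviour: the 2014 postings keep the existing `== MU_STATE_RESETTING` test after the new MU_STATE_NOCONNECT branch, while the 2015 postings replace it with `!= MU_STATE_STARTED`. The sketch below is a hypothetical user-space model of the two gates, not driver code. The model_* names are invented for illustration, the enum values are taken from the hunk above, and MU_STATE_STARTING = 1 is an assumption because that value does not appear in the diff.

```c
#include <assert.h>

/* State values as listed in the patch's enum. */
enum model_mu_state {
    MODEL_MU_STARTING  = 1, /* assumed; not shown in the hunk */
    MODEL_MU_STARTED   = 2,
    MODEL_MU_RESETTING = 3,
    MODEL_MU_FAILED    = 4,
    MODEL_MU_STOP      = 5,
    MODEL_MU_NOCONNECT = 6,
};

/* What the model's queuecommand decides to do with a new command. */
enum model_verdict {
    MODEL_DISPATCH,            /* continue to the normal command switch */
    MODEL_COMPLETE_NO_CONNECT, /* complete at once, device is gone */
    MODEL_HOST_BUSY,           /* ask the mid-layer to retry later */
};

/* Gate as in the 2014 postings: only RESETTING defers commands. */
static enum model_verdict model_gate_2014(enum model_mu_state s)
{
    assert(s >= MODEL_MU_STARTING && s <= MODEL_MU_NOCONNECT);
    if (s == MODEL_MU_NOCONNECT)
        return MODEL_COMPLETE_NO_CONNECT;
    if (s == MODEL_MU_RESETTING)
        return MODEL_HOST_BUSY;
    return MODEL_DISPATCH;
}

/* Gate as in the 2015 postings: anything other than STARTED defers. */
static enum model_verdict model_gate_2015(enum model_mu_state s)
{
    assert(s >= MODEL_MU_STARTING && s <= MODEL_MU_NOCONNECT);
    if (s == MODEL_MU_NOCONNECT)
        return MODEL_COMPLETE_NO_CONNECT;
    if (s != MODEL_MU_STARTED)
        return MODEL_HOST_BUSY;
    return MODEL_DISPATCH;
}
```

The two gates agree for STARTED, RESETTING and NOCONNECT. They differ for the remaining states: with the 2014 gate a command arriving in the FAILED state would still be dispatched, whereas the 2015 gate reports the host as busy.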


[V3 PATCH 2/4] scsi:stex.c Add hotplug support

2014-12-09 Thread Charles Chiou



From 901f2c1b2d1ae2991182f0f62cedc70f87ea49bc Mon Sep 17 00:00:00 2001
From: Charles Chiou <charles.ch...@tw.promise.com>
Date: Wed, 5 Nov 2014 17:18:37 +0800
Subject: [PATCH 2/4] scsi:stex.c Add hotplug support

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for
   all commands that have been sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send
   commands to the device after it has been hot-unplugged.

3. Add new device states: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently not referenced. MU_STATE_NOCONNECT
   indicates that the device has been unplugged from the host.

4. Use return_abnormal_state() to replace part of the code in
   stex_do_reset().

Signed-off-by: charles.ch...@tw.promise.com
---
 drivers/scsi/stex.c | 51 
+--

 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index 08e7bc8..7dc6afe 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -83,6 +83,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,

MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg 
*req, u16 tag)

readl(hba->mmio_base + YH2I_REQ); /* flush */
 }

+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_alloc(struct scsi_device *sdev)
 {
@@ -585,7 +608,11 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void 
(*done)(struct scsi_cmnd *))

id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
if (unlikely(hba->mu_status == MU_STATE_RESETTING))
return SCSI_MLQUEUE_HOST_BUSY;

@@ -1287,10 +1314,8 @@ static void stex_ss_reset(struct st_hba *hba)

 static int stex_do_reset(struct st_hba *hba)
 {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;

spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1324,20 +1349,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);

-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);

if (stex_handshake(hba) == 0)
return 0;
@@ -1808,9 +1821,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);

+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);

-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);

stex_hba_free(hba);

--
1.9.1


[PATCH 2/4] scsi:stex.c Add hotplug support

2014-11-11 Thread Charles Chiou



From 070dfd671f4cefb2d54563b77b9c80a8c82f260a Mon Sep 17 00:00:00 2001
From: Charles Chiou <charles.ch...@tw.promise.com>
Date: Wed, 5 Nov 2014 17:18:37 +0800
Subject: [PATCH 2/4] scsi:stex.c Add hotplug support

1. Add hotplug support. Pegasus supports surprise removal. To this end, I
   use the return_abnormal_state() function to return DID_NO_CONNECT for
   all commands that have been sent to the driver.

2. Remove stex_hba_stop() from stex_remove() because we cannot send
   commands to the device after it has been hot-unplugged.

3. Add new device states: MU_STATE_STOP and MU_STATE_NOCONNECT.
   MU_STATE_STOP is currently unused. MU_STATE_NOCONNECT indicates
   that the device has been unplugged from the host.

4. In stex_do_reset(), replace part of the code with a call to
   return_abnormal_state().

Signed-off-by: Charles Chiou <charles.ch...@tw.promise.com>

---
 drivers/scsi/stex.c | 51 
+--

 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/stex.c b/drivers/scsi/stex.c
index f52f1de..c0d 100644
--- a/drivers/scsi/stex.c
+++ b/drivers/scsi/stex.c
@@ -83,6 +83,8 @@ enum {
MU_STATE_STARTED= 2,
MU_STATE_RESETTING  = 3,
MU_STATE_FAILED = 4,
+   MU_STATE_STOP   = 5,
+   MU_STATE_NOCONNECT  = 6,

MU_MAX_DELAY= 120,
MU_HANDSHAKE_SIGNATURE  = 0x5555,
@@ -544,6 +546,27 @@ stex_ss_send_cmd(struct st_hba *hba, struct req_msg 
*req, u16 tag)

readl(hba->mmio_base + YH2I_REQ); /* flush */
 }

+static void return_abnormal_state(struct st_hba *hba, int status)
+{
+   struct st_ccb *ccb;
+   unsigned long flags;
+   u16 tag;
+
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   for (tag = 0; tag < hba->host->can_queue; tag++) {
+   ccb = &hba->ccb[tag];
+   if (ccb->req == NULL)
+   continue;
+   ccb->req = NULL;
+   if (ccb->cmd) {
+   scsi_dma_unmap(ccb->cmd);
+   ccb->cmd->result = status << 16;
+   ccb->cmd->scsi_done(ccb->cmd);
+   ccb->cmd = NULL;
+   }
+   }
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+}
 static int
 stex_slave_alloc(struct scsi_device *sdev)
 {
@@ -585,7 +608,11 @@ stex_queuecommand_lck(struct scsi_cmnd *cmd, void 
(*done)(struct scsi_cmnd *))

id = cmd->device->id;
lun = cmd->device->lun;
hba = (struct st_hba *) &host->hostdata[0];
-
+   if (hba->mu_status == MU_STATE_NOCONNECT) {
+   cmd->result = DID_NO_CONNECT;
+   done(cmd);
+   return 0;
+   }
if (unlikely(hba->mu_status == MU_STATE_RESETTING))
return SCSI_MLQUEUE_HOST_BUSY;

@@ -1287,10 +1314,8 @@ static void stex_ss_reset(struct st_hba *hba)

 static int stex_do_reset(struct st_hba *hba)
 {
-   struct st_ccb *ccb;
unsigned long flags;
unsigned int mu_status = MU_STATE_RESETTING;
-   u16 tag;

spin_lock_irqsave(hba->host->host_lock, flags);
if (hba->mu_status == MU_STATE_STARTING) {
@@ -1324,20 +1349,8 @@ static int stex_do_reset(struct st_hba *hba)
else if (hba->cardtype == st_yel)
stex_ss_reset(hba);

-   spin_lock_irqsave(hba->host->host_lock, flags);
-   for (tag = 0; tag < hba->host->can_queue; tag++) {
-   ccb = &hba->ccb[tag];
-   if (ccb->req == NULL)
-   continue;
-   ccb->req = NULL;
-   if (ccb->cmd) {
-   scsi_dma_unmap(ccb->cmd);
-   ccb->cmd->result = DID_RESET << 16;
-   ccb->cmd->scsi_done(ccb->cmd);
-   ccb->cmd = NULL;
-   }
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+   return_abnormal_state(hba, DID_RESET);

if (stex_handshake(hba) == 0)
return 0;
@@ -1802,9 +1815,11 @@ static void stex_remove(struct pci_dev *pdev)
 {
struct st_hba *hba = pci_get_drvdata(pdev);

+   hba->mu_status = MU_STATE_NOCONNECT;
+   return_abnormal_state(hba, DID_NO_CONNECT);
scsi_remove_host(hba->host);

-   stex_hba_stop(hba);
+   scsi_block_requests(hba->host);

stex_hba_free(hba);

--
1.9.1


[PATCH v3 35/52] scsi, fcoe: Fix CPU hotplug callback registration

2014-03-10 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();


Fix the fcoe code in scsi by using this latter form of callback registration.

Cc: Robert Love <robert.w.l...@intel.com>
Cc: James E.J. Bottomley <jbottom...@parallels.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: fcoe-de...@open-fcoe.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.b...@linux.vnet.ibm.com>
---

 drivers/scsi/fcoe/fcoe.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index f317000..d5e105b 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -2633,14 +2633,18 @@ static int __init fcoe_init(void)
skb_queue_head_init(&p->fcoe_rx_list);
}
 
+   cpu_notifier_register_begin();
+
for_each_online_cpu(cpu)
fcoe_percpu_thread_create(cpu);
 
/* Initialize per CPU interrupt thread */
-   rc = register_hotcpu_notifier(&fcoe_cpu_notifier);
+   rc = __register_hotcpu_notifier(&fcoe_cpu_notifier);
if (rc)
goto out_free;
 
+   cpu_notifier_register_done();
+
/* Setup link change notification */
fcoe_dev_setup();
 
@@ -2655,6 +2659,9 @@ out_free:
for_each_online_cpu(cpu) {
fcoe_percpu_thread_destroy(cpu);
}
+
+   cpu_notifier_register_done();
+
mutex_unlock(&fcoe_config_mutex);
destroy_workqueue(fcoe_wq);
return rc;
@@ -2687,11 +2694,15 @@ static void __exit fcoe_exit(void)
}
rtnl_unlock();
 
-   unregister_hotcpu_notifier(&fcoe_cpu_notifier);
+   cpu_notifier_register_begin();
 
for_each_online_cpu(cpu)
fcoe_percpu_thread_destroy(cpu);
 
+   __unregister_hotcpu_notifier(&fcoe_cpu_notifier);
+
+   cpu_notifier_register_done();
+
mutex_unlock(&fcoe_config_mutex);
 
/*


[PATCH v3 34/52] scsi, bnx2fc: Fix CPU hotplug callback registration

2014-03-10 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();


Fix the bnx2fc code in scsi by using this latter form of callback
registration.

Cc: Eddie Wai <eddie@broadcom.com>
Cc: James E.J. Bottomley <jbottom...@parallels.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.b...@linux.vnet.ibm.com>
---

 drivers/scsi/bnx2fc/bnx2fc_fcoe.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c 
b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index 9b94850..c4ec235 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -2586,12 +2586,16 @@ static int __init bnx2fc_mod_init(void)
spin_lock_init(&p->fp_work_lock);
}
 
+   cpu_notifier_register_begin();
+
for_each_online_cpu(cpu) {
bnx2fc_percpu_thread_create(cpu);
}
 
/* Initialize per CPU interrupt thread */
-   register_hotcpu_notifier(&bnx2fc_cpu_notifier);
+   __register_hotcpu_notifier(&bnx2fc_cpu_notifier);
+
+   cpu_notifier_register_done();
 
cnic_register_driver(CNIC_ULP_FCOE, &bnx2fc_cnic_cb);
 
@@ -2656,13 +2660,17 @@ static void __exit bnx2fc_mod_exit(void)
if (l2_thread)
kthread_stop(l2_thread);
 
-   unregister_hotcpu_notifier(&bnx2fc_cpu_notifier);
+   cpu_notifier_register_begin();
 
/* Destroy per cpu threads */
for_each_online_cpu(cpu) {
bnx2fc_percpu_thread_destroy(cpu);
}
 
+   __unregister_hotcpu_notifier(&bnx2fc_cpu_notifier);
+
+   cpu_notifier_register_done();
+
destroy_workqueue(bnx2fc_wq);
/*
 * detach from scsi transport


[PATCH v3 33/52] scsi, bnx2i: Fix CPU hotplug callback registration

2014-03-10 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();


Fix the bnx2i code in scsi by using this latter form of callback registration.

Cc: Eddie Wai <eddie@broadcom.com>
Cc: James E.J. Bottomley <jbottom...@parallels.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.b...@linux.vnet.ibm.com>
---

 drivers/scsi/bnx2i/bnx2i_init.c |   12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
index 34c294b..80c03b4 100644
--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -537,11 +537,15 @@ static int __init bnx2i_mod_init(void)
p->iothread = NULL;
}
 
+   cpu_notifier_register_begin();
+
for_each_online_cpu(cpu)
bnx2i_percpu_thread_create(cpu);
 
/* Initialize per CPU interrupt thread */
-   register_hotcpu_notifier(&bnx2i_cpu_notifier);
+   __register_hotcpu_notifier(&bnx2i_cpu_notifier);
+
+   cpu_notifier_register_done();
 
return 0;
 
@@ -581,11 +585,15 @@ static void __exit bnx2i_mod_exit(void)
}
mutex_unlock(&bnx2i_dev_lock);
 
-   unregister_hotcpu_notifier(&bnx2i_cpu_notifier);
+   cpu_notifier_register_begin();
 
for_each_online_cpu(cpu)
bnx2i_percpu_thread_destroy(cpu);
 
+   __unregister_hotcpu_notifier(&bnx2i_cpu_notifier);
+
+   cpu_notifier_register_done();
+
iscsi_unregister_transport(bnx2i_iscsi_transport);
cnic_unregister_driver(CNIC_ULP_ISCSI);
 }

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 35/52] scsi, fcoe: Fix CPU hotplug callback registration

2014-02-14 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

    cpu_notifier_register_begin();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_notifier_register_done();


Fix the fcoe code in scsi by using this latter form of callback registration.

Cc: Robert Love robert.w.l...@intel.com
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: Ingo Molnar mi...@kernel.org
Cc: fcoe-de...@open-fcoe.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 drivers/scsi/fcoe/fcoe.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index f317000..d5e105b 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -2633,14 +2633,18 @@ static int __init fcoe_init(void)
         skb_queue_head_init(&p->fcoe_rx_list);
     }
 
+   cpu_notifier_register_begin();
+
     for_each_online_cpu(cpu)
         fcoe_percpu_thread_create(cpu);
 
     /* Initialize per CPU interrupt thread */
-   rc = register_hotcpu_notifier(&fcoe_cpu_notifier);
+   rc = __register_hotcpu_notifier(&fcoe_cpu_notifier);
     if (rc)
         goto out_free;
 
+   cpu_notifier_register_done();
+
     /* Setup link change notification */
     fcoe_dev_setup();
 
@@ -2655,6 +2659,9 @@ out_free:
     for_each_online_cpu(cpu) {
         fcoe_percpu_thread_destroy(cpu);
     }
+
+   cpu_notifier_register_done();
+
     mutex_unlock(&fcoe_config_mutex);
     destroy_workqueue(fcoe_wq);
     return rc;
@@ -2687,11 +2694,15 @@ static void __exit fcoe_exit(void)
     }
     rtnl_unlock();
 
-   unregister_hotcpu_notifier(&fcoe_cpu_notifier);
+   cpu_notifier_register_begin();
 
     for_each_online_cpu(cpu)
         fcoe_percpu_thread_destroy(cpu);
 
+   __unregister_hotcpu_notifier(&fcoe_cpu_notifier);
+
+   cpu_notifier_register_done();
+
     mutex_unlock(&fcoe_config_mutex);
 
     /*


[PATCH v2 33/52] scsi, bnx2i: Fix CPU hotplug callback registration

2014-02-14 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

    cpu_notifier_register_begin();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_notifier_register_done();


Fix the bnx2i code in scsi by using this latter form of callback registration.

Cc: Eddie Wai eddie@broadcom.com
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: Ingo Molnar mi...@kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 drivers/scsi/bnx2i/bnx2i_init.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
index 34c294b..80c03b4 100644
--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -537,11 +537,15 @@ static int __init bnx2i_mod_init(void)
         p->iothread = NULL;
     }
 
+   cpu_notifier_register_begin();
+
     for_each_online_cpu(cpu)
         bnx2i_percpu_thread_create(cpu);
 
     /* Initialize per CPU interrupt thread */
-   register_hotcpu_notifier(&bnx2i_cpu_notifier);
+   __register_hotcpu_notifier(&bnx2i_cpu_notifier);
+
+   cpu_notifier_register_done();
 
     return 0;
 
@@ -581,11 +585,15 @@ static void __exit bnx2i_mod_exit(void)
     }
     mutex_unlock(&bnx2i_dev_lock);
 
-   unregister_hotcpu_notifier(&bnx2i_cpu_notifier);
+   cpu_notifier_register_begin();
 
     for_each_online_cpu(cpu)
         bnx2i_percpu_thread_destroy(cpu);
 
+   __unregister_hotcpu_notifier(&bnx2i_cpu_notifier);
+
+   cpu_notifier_register_done();
+
     iscsi_unregister_transport(&bnx2i_iscsi_transport);
     cnic_unregister_driver(CNIC_ULP_ISCSI);
 }


[PATCH 33/51] scsi, fcoe: Fix CPU hotplug callback registration

2014-02-05 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

    cpu_maps_update_begin();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_maps_update_done();


Fix the fcoe code in scsi by using this latter form of callback registration.

Cc: Robert Love robert.w.l...@intel.com
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: fcoe-de...@open-fcoe.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 drivers/scsi/fcoe/fcoe.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index f317000..1c299de 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -2633,14 +2633,18 @@ static int __init fcoe_init(void)
         skb_queue_head_init(&p->fcoe_rx_list);
     }
 
+   cpu_maps_update_begin();
+
     for_each_online_cpu(cpu)
         fcoe_percpu_thread_create(cpu);
 
     /* Initialize per CPU interrupt thread */
-   rc = register_hotcpu_notifier(&fcoe_cpu_notifier);
+   rc = __register_hotcpu_notifier(&fcoe_cpu_notifier);
     if (rc)
         goto out_free;
 
+   cpu_maps_update_done();
+
     /* Setup link change notification */
     fcoe_dev_setup();
 
@@ -2655,6 +2659,9 @@ out_free:
     for_each_online_cpu(cpu) {
         fcoe_percpu_thread_destroy(cpu);
     }
+
+   cpu_maps_update_done();
+
     mutex_unlock(&fcoe_config_mutex);
     destroy_workqueue(fcoe_wq);
     return rc;
@@ -2687,11 +2694,15 @@ static void __exit fcoe_exit(void)
     }
     rtnl_unlock();
 
-   unregister_hotcpu_notifier(&fcoe_cpu_notifier);
+   cpu_maps_update_begin();
 
     for_each_online_cpu(cpu)
         fcoe_percpu_thread_destroy(cpu);
 
+   __unregister_hotcpu_notifier(&fcoe_cpu_notifier);
+
+   cpu_maps_update_done();
+
     mutex_unlock(&fcoe_config_mutex);
 
     /*


[PATCH 31/51] scsi, bnx2i: Fix CPU hotplug callback registration

2014-02-05 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

    cpu_maps_update_begin();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_maps_update_done();


Fix the bnx2i code in scsi by using this latter form of callback registration.

Cc: Eddie Wai eddie@broadcom.com
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 drivers/scsi/bnx2i/bnx2i_init.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
index 34c294b..5c4e413 100644
--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -537,11 +537,15 @@ static int __init bnx2i_mod_init(void)
         p->iothread = NULL;
     }
 
+   cpu_maps_update_begin();
+
     for_each_online_cpu(cpu)
         bnx2i_percpu_thread_create(cpu);
 
     /* Initialize per CPU interrupt thread */
-   register_hotcpu_notifier(&bnx2i_cpu_notifier);
+   __register_hotcpu_notifier(&bnx2i_cpu_notifier);
+
+   cpu_maps_update_done();
 
     return 0;
 
@@ -581,11 +585,15 @@ static void __exit bnx2i_mod_exit(void)
     }
     mutex_unlock(&bnx2i_dev_lock);
 
-   unregister_hotcpu_notifier(&bnx2i_cpu_notifier);
+   cpu_maps_update_begin();
 
     for_each_online_cpu(cpu)
         bnx2i_percpu_thread_destroy(cpu);
 
+   __unregister_hotcpu_notifier(&bnx2i_cpu_notifier);
+
+   cpu_maps_update_done();
+
     iscsi_unregister_transport(&bnx2i_iscsi_transport);
     cnic_unregister_driver(CNIC_ULP_ISCSI);
 }


[PATCH 32/51] scsi, bnx2fc: Fix CPU hotplug callback registration

2014-02-05 Thread Srivatsa S. Bhat

Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

    get_online_cpus();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

    cpu_maps_update_begin();

    for_each_online_cpu(cpu)
        init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_maps_update_done();


Fix the bnx2fc code in scsi by using this latter form of callback
registration.

Cc: Eddie Wai eddie@broadcom.com
Cc: James E.J. Bottomley jbottom...@parallels.com
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 drivers/scsi/bnx2fc/bnx2fc_fcoe.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index 9b94850..f6c10c5 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -2586,12 +2586,16 @@ static int __init bnx2fc_mod_init(void)
         spin_lock_init(&p->fp_work_lock);
     }
 
+   cpu_maps_update_begin();
+
     for_each_online_cpu(cpu) {
         bnx2fc_percpu_thread_create(cpu);
     }
 
     /* Initialize per CPU interrupt thread */
-   register_hotcpu_notifier(&bnx2fc_cpu_notifier);
+   __register_hotcpu_notifier(&bnx2fc_cpu_notifier);
+
+   cpu_maps_update_done();
 
     cnic_register_driver(CNIC_ULP_FCOE, &bnx2fc_cnic_cb);
 
@@ -2656,13 +2660,17 @@ static void __exit bnx2fc_mod_exit(void)
     if (l2_thread)
         kthread_stop(l2_thread);
 
-   unregister_hotcpu_notifier(&bnx2fc_cpu_notifier);
+   cpu_maps_update_begin();
 
     /* Destroy per cpu threads */
     for_each_online_cpu(cpu) {
         bnx2fc_percpu_thread_destroy(cpu);
     }
 
+   __unregister_hotcpu_notifier(&bnx2fc_cpu_notifier);
+
+   cpu_maps_update_done();
+
     destroy_workqueue(bnx2fc_wq);
     /*
      * detach from scsi transport


Re: [PATCH 0/9] PCI: Eliminate race conditions between hotplug and sysfs rescan/remove (Was: Re: [PATCH v2 04/10] PCI: Destroy pci dev only once)

2014-01-15 Thread Bjorn Helgaas

On Fri, Jan 10, 2014 at 03:20:44PM +0100, Rafael J. Wysocki wrote:
> [Cc: adding linux-scsi for the MPT changes, Ben for powerpc, Matthew for
>  platform/x86 and Konrad for Xen]
>
> On Friday, December 06, 2013 02:21:50 AM Rafael J. Wysocki wrote:
>
> [...]
>
> >
> > OK
> >
> > To be a bit more constructive, as the next step I'd try to use
> > pci_remove_rescan_mutex to serialize all PCI hotplug operations (as I said
> > above) without making the other changes made by my patch.  Does that sound
> > reasonable?
>
> Well, no answer here, so as a followup, a series implementing that idea
> follows.
>
> I *hope* I found all of the places that need to be synchronized vs the bus
> rescan and device removal that can be triggered via sysfs, but I might overlook
> something.  Also in some cases I wasn't quite sure how much stuff to put under
> the lock, because said stuff is not exactly straightforward.

I applied this series to my pci/locking branch for v3.14.  It should appear
in -next tomorrow.

Note that this touches some areas that are not strictly PCI, so speak
up if I'm treading on your toes:

 arch/powerpc/kernel/eeh_driver.c       |   19 --
 drivers/acpi/pci_root.c                |    6
 drivers/message/fusion/mptbase.c       |    2 -
 drivers/pci/hotplug/acpiphp.h          |    5 +++
 drivers/pci/hotplug/acpiphp_core.c     |    2 -
 drivers/pci/hotplug/acpiphp_glue.c     |   43 +
 drivers/pci/hotplug/cpci_hotplug_pci.c |   14 +-
 drivers/pci/hotplug/cpqphp_pci.c       |    8 +-
 drivers/pci/hotplug/ibmphp_core.c      |   13 -
 drivers/pci/hotplug/pciehp_pci.c       |   17 +
 drivers/pci/hotplug/rpadlpar_core.c    |   19 ++
 drivers/pci/hotplug/rpaphp_core.c      |    4 +++
 drivers/pci/hotplug/s390_pci_hpc.c     |    4 ++-
 drivers/pci/hotplug/sgi_hotplug.c      |    5 +++
 drivers/pci/hotplug/shpchp_pci.c       |   18 ++---
 drivers/pci/pci-sysfs.c                |   19 +-
 drivers/pci/probe.c                    |   18 +
 drivers/pci/remove.c                   |   11
 drivers/pci/xen-pcifront.c             |    8 ++
 drivers/pcmcia/cardbus.c               |    7 +
 drivers/platform/x86/asus-wmi.c        |    2 +
 drivers/platform/x86/eeepc-laptop.c    |    2 +
 drivers/scsi/mpt2sas/mpt2sas_base.c    |    2 -
 drivers/scsi/mpt3sas/mpt3sas_base.c    |    2 -
 include/linux/pci.h                    |    3 ++


[PATCH 0/9] PCI: Eliminate race conditions between hotplug and sysfs rescan/remove (Was: Re: [PATCH v2 04/10] PCI: Destroy pci dev only once)

2014-01-10 Thread Rafael J. Wysocki

[Cc: adding linux-scsi for the MPT changes, Ben for powerpc, Matthew for
 platform/x86 and Konrad for Xen]

On Friday, December 06, 2013 02:21:50 AM Rafael J. Wysocki wrote:

[...]

 
> OK
>
> To be a bit more constructive, as the next step I'd try to use
> pci_remove_rescan_mutex to serialize all PCI hotplug operations (as I said
> above) without making the other changes made by my patch.  Does that sound
> reasonable?

Well, no answer here, so as a followup, a series implementing that idea
follows.

I *hope* I found all of the places that need to be synchronized vs the bus
rescan and device removal that can be triggered via sysfs, but I might overlook
something.  Also in some cases I wasn't quite sure how much stuff to put under
the lock, because said stuff is not exactly straightforward.

Enjoy!

Rafael

