Re: [PATCH v2] scsi: qla2xxx: Spinlock recursion in qla_target

2018-06-12 Thread Madhani, Himanshu
Hi Mikail, 

> On Jun 12, 2018, at 9:08 AM, m.maly...@yadro.com wrote:
> 
> From: Mikhail Malygin 
> 
> This patch addresses issue causing spinlock recursion in qla_target.c:
> 1. qlt_handle_login takes vha->hw->tgt.sess_lock, then calls 
> qlt_schedule_sess_for_deletion
> where it tries to take spinlock again.

We had posted patch to serialize session deletion with following patches

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?id=1ae634eb28533b82f9777a47c1ade44cb8c0182b

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?id=d8630bb95f46ea118dede63bd75533faa64f9612

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?id=d8630bb95f46ea118dede63bd75533faa64f9612

However, this patch looks like introduced regression,

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?id=1c6cacf4ea6c04a58a0e3057f5ed60c24a4ffeff

can you use work_lock as it was before the change and see if that helps with 
both issue 1 and 2. 

something like this 

diff --git a/drivers/scsi/qla2xxx/qla_target.c 
b/drivers/scsi/qla2xxx/qla_target.c
index 7ed47800c660..d649f85d9657 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1239,16 +1239,16 @@ void qlt_schedule_sess_for_deletion(struct fc_port 
*sess)
return;
}

-   spin_lock_irqsave(>tgt.sess_lock, flags);
if (sess->deleted == QLA_SESS_DELETED)
sess->logout_on_delete = 0;

+   spin_lock_irqsave(>vha->work_lock, flags);
if (sess->deleted == QLA_SESS_DELETION_IN_PROGRESS) {
-   spin_unlock_irqrestore(>tgt.sess_lock, flags);
+   spin_unlock_irqrestore(>vha->work_lock, flags);
return;
}
sess->deleted = QLA_SESS_DELETION_IN_PROGRESS;
-   spin_unlock_irqrestore(>tgt.sess_lock, flags);
+   spin_unlock_irqrestore(>vha->work_lock, flags);

sess->disc_state = DSC_DELETE_PEND;

> 2. qlt_reset takes the same lock, then calls qlt_schedule_sess_for_deleteion 
> via qlt_clear_tgt_db 
> 
> Stacktrace for qlt_handle_login
> 
> BUG: spinlock lockup suspected on CPU#0, swapper/0/0
> lock: 0xc0c07aa8bec0, .magic: dead4ead, .owner: swapper/0/0, .owner_cpu: 0
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OE   NX 
> 4.4.132-ttln.24-debug #1
> Call Trace:
> [c0dfff6d7830] [c08060c0] dump_stack+0xb0/0xf0 (unreliable)
> [c0dfff6d7870] [c07ff6ec] spin_dump+0xa8/0xc4
> [c0dfff6d78e0] [c0128320] do_raw_spin_lock+0x140/0x1d0
> [c0dfff6d7920] [c07f7354] _raw_spin_lock_irqsave+0x34/0x50
> [c0dfff6d7950] [d0001edf3220] 
> qlt_schedule_sess_for_deletion+0x90/0x250 [qla2xxx]
> [c0dfff6d79c0] [d0001edf6b08] 
> qlt_find_sess_invalidate_other+0x1d8/0x230 [qla2xxx]
> [c0dfff6d7a70] [d0001edf710c] qlt_handle_login+0x5ac/0x760 [qla2xxx]
> [c0dfff6d7b10] [d0001edf7ccc] qlt_handle_imm_notify+0xa0c/0x10b0 
> [qla2xxx]
> [c0dfff6d7c00] [d0001edf85f0] qlt_24xx_atio_pkt+0x280/0x400 [qla2xxx]
> [c0dfff6d7ca0] [d0001edfa9d8] qlt_24xx_process_atio_queue+0x368/0x7d0 
> [qla2xxx]
> [c0dfff6d7d80] [d0001edfb898] qla83xx_msix_atio_q+0x58/0x90 [qla2xxx]
> [c0dfff6d7dc0] [c0133cd0] __handle_irq_event_percpu+0xa0/0x2f0
> [c0dfff6d7e80] [c0133f5c] handle_irq_event_percpu+0x3c/0x90
> [c0dfff6d7ec0] [c0134018] handle_irq_event+0x68/0xb0
> [c0dfff6d7f00] [c0139278] handle_fasteoi_irq+0xf8/0x260
> [c0dfff6d7f40] [c0132e80] generic_handle_irq+0x50/0x80
> [c0dfff6d7f60] [c0014c44] __do_irq+0x84/0x1d0
> [c0dfff6d7f90] [c0027924] call_do_irq+0x14/0x24
> [c0f13a20] [c0014e30] do_IRQ+0xa0/0x120
> [c0f13a70] [c0002694] hardware_interrupt_common+0x114/0x180
> --- interrupt: 501 at snooze_loop+0xc4/0x1a0
>LR = snooze_loop+0x16c/0x1a0
> [c0f13d60] [c063b41c] nap_loop+0x5c/0x120 (unreliable)
> [c0f13da0] [c0637f9c] cpuidle_enter_state+0xbc/0x3d0
> [c0f13e00] [c011db10] call_cpuidle+0x50/0x80
> [c0f13e20] [c011e138] cpu_startup_entry+0x388/0x490
> [c0f13ee0] [c000c260] rest_init+0xb0/0xd0
> [c0f13f00] [c0aa4070] start_kernel+0x55c/0x578
> [c0f13f90] [c0008e6c] start_here_common+0x20/0xb4
> nvme nvme0: I/O 782 QID 9 timeout, completion polled
> nvme nvme0: I/O 99 QID 12 timeout, completion polled
> nvme nvme0: I/O 925 QID 4 timeout, completio
> 
> 
> Stacktrace for qlt_reset:
> 
> BUG: spinlock recursion on CPU#0, swapper/0/0
> lock: 0xc0207d5ffec0, .magic: dead4ead, .owner: swapper/0/0, .owner_cpu: 0
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OE   NX 
> 4.4.132-ttln.25-debug #1
> Call Trace:
> [c03fff71b8d0] [c08060c0] dump_stack+0xb0/0xf0 (unreliable)
> [c03fff71b910] [c07ff6ec] spin_dump+0xa8/0xc4
> [c03fff71b980] 

[PATCH v2] scsi: qla2xxx: Spinlock recursion in qla_target

2018-06-12 Thread m.malygin
From: Mikhail Malygin 

This patch addresses issue causing spinlock recursion in qla_target.c:
1. qlt_handle_login takes vha->hw->tgt.sess_lock, then calls 
qlt_schedule_sess_for_deletion
where it tries to take spinlock again.
2. qlt_reset takes the same lock, then calls qlt_schedule_sess_for_deleteion 
via qlt_clear_tgt_db 

Stacktrace for qlt_handle_login

BUG: spinlock lockup suspected on CPU#0, swapper/0/0
 lock: 0xc0c07aa8bec0, .magic: dead4ead, .owner: swapper/0/0, .owner_cpu: 0
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OE   NX 
4.4.132-ttln.24-debug #1
Call Trace:
[c0dfff6d7830] [c08060c0] dump_stack+0xb0/0xf0 (unreliable)
[c0dfff6d7870] [c07ff6ec] spin_dump+0xa8/0xc4
[c0dfff6d78e0] [c0128320] do_raw_spin_lock+0x140/0x1d0
[c0dfff6d7920] [c07f7354] _raw_spin_lock_irqsave+0x34/0x50
[c0dfff6d7950] [d0001edf3220] qlt_schedule_sess_for_deletion+0x90/0x250 
[qla2xxx]
[c0dfff6d79c0] [d0001edf6b08] 
qlt_find_sess_invalidate_other+0x1d8/0x230 [qla2xxx]
[c0dfff6d7a70] [d0001edf710c] qlt_handle_login+0x5ac/0x760 [qla2xxx]
[c0dfff6d7b10] [d0001edf7ccc] qlt_handle_imm_notify+0xa0c/0x10b0 
[qla2xxx]
[c0dfff6d7c00] [d0001edf85f0] qlt_24xx_atio_pkt+0x280/0x400 [qla2xxx]
[c0dfff6d7ca0] [d0001edfa9d8] qlt_24xx_process_atio_queue+0x368/0x7d0 
[qla2xxx]
[c0dfff6d7d80] [d0001edfb898] qla83xx_msix_atio_q+0x58/0x90 [qla2xxx]
[c0dfff6d7dc0] [c0133cd0] __handle_irq_event_percpu+0xa0/0x2f0
[c0dfff6d7e80] [c0133f5c] handle_irq_event_percpu+0x3c/0x90
[c0dfff6d7ec0] [c0134018] handle_irq_event+0x68/0xb0
[c0dfff6d7f00] [c0139278] handle_fasteoi_irq+0xf8/0x260
[c0dfff6d7f40] [c0132e80] generic_handle_irq+0x50/0x80
[c0dfff6d7f60] [c0014c44] __do_irq+0x84/0x1d0
[c0dfff6d7f90] [c0027924] call_do_irq+0x14/0x24
[c0f13a20] [c0014e30] do_IRQ+0xa0/0x120
[c0f13a70] [c0002694] hardware_interrupt_common+0x114/0x180
--- interrupt: 501 at snooze_loop+0xc4/0x1a0
LR = snooze_loop+0x16c/0x1a0
[c0f13d60] [c063b41c] nap_loop+0x5c/0x120 (unreliable)
[c0f13da0] [c0637f9c] cpuidle_enter_state+0xbc/0x3d0
[c0f13e00] [c011db10] call_cpuidle+0x50/0x80
[c0f13e20] [c011e138] cpu_startup_entry+0x388/0x490
[c0f13ee0] [c000c260] rest_init+0xb0/0xd0
[c0f13f00] [c0aa4070] start_kernel+0x55c/0x578
[c0f13f90] [c0008e6c] start_here_common+0x20/0xb4
nvme nvme0: I/O 782 QID 9 timeout, completion polled
nvme nvme0: I/O 99 QID 12 timeout, completion polled
nvme nvme0: I/O 925 QID 4 timeout, completio


Stacktrace for qlt_reset:

BUG: spinlock recursion on CPU#0, swapper/0/0
 lock: 0xc0207d5ffec0, .magic: dead4ead, .owner: swapper/0/0, .owner_cpu: 0
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OE   NX 
4.4.132-ttln.25-debug #1
Call Trace:
[c03fff71b8d0] [c08060c0] dump_stack+0xb0/0xf0 (unreliable)
[c03fff71b910] [c07ff6ec] spin_dump+0xa8/0xc4
[c03fff71b980] [c01283a4] do_raw_spin_lock+0x1c4/0x1d0
[c03fff71b9c0] [c07f7354] _raw_spin_lock_irqsave+0x34/0x50
[c03fff71b9f0] [d000128933b4] qlt_schedule_sess_for_deletion+0x44/0x80 
[qla2xxx]
[c03fff71ba30] [d00012893454] qlt_clear_tgt_db+0x64/0x90 [qla2xxx]
[c03fff71ba60] [d00012893604] qlt_reset+0x184/0x1f0 [qla2xxx]
[c03fff71bb10] [d00012897a2c] qlt_handle_imm_notify+0x74c/0x10b0 
[qla2xxx]
[c03fff71bc00] [d00012898610] qlt_24xx_atio_pkt+0x280/0x400 [qla2xxx]
[c03fff71bca0] [d0001289a9f8] qlt_24xx_process_atio_queue+0x368/0x7d0 
[qla2xxx]
[c03fff71bd80] [d0001289b8b8] qla83xx_msix_atio_q+0x58/0x90 [qla2xxx]
[c03fff71bdc0] [c0133cd0] __handle_irq_event_percpu+0xa0/0x2f0
[c03fff71be80] [c0133f5c] handle_irq_event_percpu+0x3c/0x90
[c03fff71bec0] [c0134018] handle_irq_event+0x68/0xb0
[c03fff71bf00] [c0139278] handle_fasteoi_irq+0xf8/0x260
[c03fff71bf40] [c0132e80] generic_handle_irq+0x50/0x80
[c03fff71bf60] [c0014c44] __do_irq+0x84/0x1d0
[c03fff71bf90] [c0027924] call_do_irq+0x14/0x24
[c0f13a40] [c0014e30] do_IRQ+0xa0/0x120
[c0f13a90] [c0002694] hardware_interrupt_common+0x114/0x180
--- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
LR = arch_local_irq_restore+0x40/0x90
[c0f13d80] [c0f1] init_thread_union+0x0/0x2000 (unreliable)
[c0f13da0] [c0637fec] cpuidle_enter_state+0x10c/0x3d0
[c0f13e00] [c011db10] call_cpuidle+0x50/0x80
[c0f13e20] [c011e138] cpu_startup_entry+0x388/0x490
[c0f13ee0] [c000c260] rest_init+0xb0/0xd0
[c0f13f00] [c0aa4070] start_kernel+0x55c/0x578
[c0f13f90] [c0008e6c] start_here_common+0x20/0xb4

Steps to reproduce:
1. 

Re: [PATCH 0/2 REPOST] remove unneded irq save

2018-06-12 Thread Dan Williams
On Tue, Jun 12, 2018 at 8:04 AM, John Garry  wrote:
> On 12/06/2018 15:31, Sebastian Andrzej Siewior wrote:
>>
>> On 2018-06-12 13:54:36 [+0100], John Garry wrote:
>>>
>>> +Dan
>>>
>>> On 11/06/2018 19:23, Sebastian Andrzej Siewior wrote:

 On 2018-06-11 18:12:55 [+0100], John Garry wrote:
>
> On 11/06/2018 15:40, Sebastian Andrzej Siewior wrote:
>>
>> This is the repost of the two patches I posted in earlier this month:
>>
>> - [PATCH 1/2] libsas: remove irq save in sas_ata_qc_issue()
>>   Received feedback but nothing really changed. I explained that this
>> is
>>   not about "local_irqsave() + spin_lock()" *but* "local_irq_save() +
>>   spin_unlock()". This seemed to have been overseen twice.
>>   Also there were two opinions about the TODO comment:
>>  /* TODO: audit callers to ensure they are ready for qc_issue to
>>   * unconditionally re-enable interrupts
>>It is unclear to me if this comment should be removed because it
>>makes no sense or if the intention was indeed to audit callers code
>>for a possible "spin_unlock_irq(ap->lock);".
>
>
> Hi,
>
> As I said previously, since it is not clear now what the comment meant,
> then
> removing the irq save/restore calls will only make it even less clear,
> and
> should be fixed.


 how should it be fixed? The discussion went forth and back. The comment
 as of now does not match the code. It disables interrupts which are
 already disabled. It does not enable them at any point.
>>>
>>>
>>> As I see, at the point we release the lock, the question is if we can
>>> just
>>> re-enable interrupts as probably we disabled interrupts earlier for
>>> taking
>>> the same lock.
>>>
>>> However, as mentioned, we should not need to disable interrupts as they
>>> should have been already disabled.
>>
>>
>> The irq_save + unlock combo got first introduced in commit ce4f75def399
>> ("isci: Free host lock for SATA/STP abort escalation at submission
>> time."). It then got moved forth and back until it ended where it is
>> today with the comment Dan put there. Back then the code path was:
>> - ata_exec_internal_sg()
>>   spin_lock_irqsave(ap->lock, flags);
>>   -> ata_qc_issue() (saying LOCKING:spin_lock_irqsave(host lock)
>> -> qc->err_mask |= ap->ops->qc_issue(qc); => sas_ata_qc_issue()
>>-> i->dft->lldd_execute_task(task, 1, GFP_ATOMIC); =>
>> isci_task_execute_task()
>>  ->isci_request_execute(ihost, task, , gfp_flags);
>>if (dev_is_sata(task->dev)) {
>> /* Since we are still in the submit path, and since
>>  * libsas takes the host lock on behalf of SATA
>>  * devices before I/O starts, we need to unlock
>>  * before we can put the task in the error path.
>>  */
>>  raw_local_irq_save(flags);
>>  spin_unlock(isci_host->shost->host_lock);
>>  sas_task_abort(task)
>>
>> So. This when the mistake was introduced - it shouldn't get in there
>> like that.
>>
>>> Personally I would rather not change the code. But, if you must, I
>>> suggest
>>
>> of course.
>>
>>> update the comment to read something like this:
>>> - TODO: It may be possible to unconditionally re-enable interrupts for
>>> period of having the lock released. Audit callers to check.
>>
>>
>> We had this comment for 6 years or so and nothing happend. What makes
>> you think that an updated version of that comment will motivate someone
>> to make change here in the near future?
>
>
> Updating the comment is not in itself something to motivate someone to
> change, but we should keep the comment reasonably accurate or get rid.
>
>> It looks to me like a stale comment which won't change a thing because
>> it does not point out the benefit of doing so (re-enabling interrupts
>> while dropping the lock) and the price, that is paid for not doing so
>> (keeping the code as it is) is small enough to not bother.
>>
>> So if updating the comment as suggested instead of keeping it as-is or
>> removing it is the blocker *here* then I can send an updated version.
>> Any comments?
>
>
> I'd prefer an updated comment.
>

I think we should try to remove the unlock completely. I agree with
Sebastian that the audit is never coming. As it is libsas is the only
ata_port_operations implementation that drops the host_lock while
running ->qc_issue().


Re: [PATCH] mpt3sas: Fix calltrace observed while running IO & host reset

2018-06-12 Thread Bart Van Assche
On Tue, 2018-06-12 at 09:17 -0400, Chaitra P B wrote:
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 23902ad..96e523a 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -1489,7 +1489,7 @@ struct scsi_cmnd *
>   scmd = scsi_host_find_tag(ioc->shost, unique_tag);
>   if (scmd) {
>   st = scsi_cmd_priv(scmd);
> - if (st->cb_idx == 0xFF)
> + if (st->cb_idx == 0xFF || st->smid == 0)
>   scmd = NULL;
>   }
>   }

What guarantees that st->smid won't change after it has been checked and
before scmd is used?

Thanks,

Bart.






Re: [PATCH 0/2 REPOST] remove unneded irq save

2018-06-12 Thread John Garry

On 12/06/2018 15:31, Sebastian Andrzej Siewior wrote:

On 2018-06-12 13:54:36 [+0100], John Garry wrote:

+Dan

On 11/06/2018 19:23, Sebastian Andrzej Siewior wrote:

On 2018-06-11 18:12:55 [+0100], John Garry wrote:

On 11/06/2018 15:40, Sebastian Andrzej Siewior wrote:

This is the repost of the two patches I posted in earlier this month:

- [PATCH 1/2] libsas: remove irq save in sas_ata_qc_issue()
  Received feedback but nothing really changed. I explained that this is
  not about "local_irqsave() + spin_lock()" *but* "local_irq_save() +
  spin_unlock()". This seemed to have been overseen twice.
  Also there were two opinions about the TODO comment:
 /* TODO: audit callers to ensure they are ready for qc_issue to
  * unconditionally re-enable interrupts
   It is unclear to me if this comment should be removed because it
   makes no sense or if the intention was indeed to audit callers code
   for a possible "spin_unlock_irq(ap->lock);".


Hi,

As I said previously, since it is not clear now what the comment meant, then
removing the irq save/restore calls will only make it even less clear, and
should be fixed.


how should it be fixed? The discussion went forth and back. The comment
as of now does not match the code. It disables interrupts which are
already disabled. It does not enable them at any point.


As I see, at the point we release the lock, the question is if we can just
re-enable interrupts as probably we disabled interrupts earlier for taking
the same lock.

However, as mentioned, we should not need to disable interrupts as they
should have been already disabled.


The irq_save + unlock combo got first introduced in commit ce4f75def399
("isci: Free host lock for SATA/STP abort escalation at submission
time."). It then got moved forth and back until it ended where it is
today with the comment Dan put there. Back then the code path was:
- ata_exec_internal_sg()
  spin_lock_irqsave(ap->lock, flags);
  -> ata_qc_issue() (saying LOCKING:spin_lock_irqsave(host lock)
-> qc->err_mask |= ap->ops->qc_issue(qc); => sas_ata_qc_issue()
   -> i->dft->lldd_execute_task(task, 1, GFP_ATOMIC); => 
isci_task_execute_task()
 ->isci_request_execute(ihost, task, , gfp_flags);
   if (dev_is_sata(task->dev)) {
/* Since we are still in the submit path, and since
 * libsas takes the host lock on behalf of SATA
 * devices before I/O starts, we need to unlock
 * before we can put the task in the error path.
 */
 raw_local_irq_save(flags);
 spin_unlock(isci_host->shost->host_lock);
 sas_task_abort(task)

So. This when the mistake was introduced - it shouldn't get in there
like that.


Personally I would rather not change the code. But, if you must, I suggest

of course.


update the comment to read something like this:
- TODO: It may be possible to unconditionally re-enable interrupts for
period of having the lock released. Audit callers to check.


We had this comment for 6 years or so and nothing happend. What makes
you think that an updated version of that comment will motivate someone
to make change here in the near future?


Updating the comment is not in itself something to motivate someone to 
change, but we should keep the comment reasonably accurate or get rid.



It looks to me like a stale comment which won't change a thing because
it does not point out the benefit of doing so (re-enabling interrupts
while dropping the lock) and the price, that is paid for not doing so
(keeping the code as it is) is small enough to not bother.

So if updating the comment as suggested instead of keeping it as-is or
removing it is the blocker *here* then I can send an updated version.
Any comments?


I'd prefer an updated comment.

Cheers,
John



Sebastian

.






aacraid woes with kernel 4.14.48 to 4.16.14

2018-06-12 Thread Emmanuel Florac
Hi all,

Running plain vanilla kernels on Debian stable, on 2 different machines
with different hardware setups: 

 * production system: EPYC 16 cores, 64 GB RAM, ASR8805 (latest
   firmware): doesn't work at all with 4.14.48 and up
 * test system : Opteron dual core, 2GB RAM, ASR7805, mostly works, but
   problems with 4.16.14

Both system run fine with kernel up to 4.13.16 (plain vanilla). With
4.14.48 and up, the aacraid driver doesn't work on the EPYC system. From
dmesg:

[   61.069190] Adaptec aacraid driver 1.2.1[50834]-custom
[   61.069527] aacraid :21:00.0: SME is active, device will require DMA 
bounce buffers
[   61.076949] SME is active and system is using DMA bounce buffers
[   61.076954] aacraid: Comm Interface type2 enabled


Nothing else happens. Attached devices are unavailable. "arcconf" find
no controller. "rmmod aacraid" doesn't work (device in use). 

Running kernel 4.16.14 is exactly the same:

[   40.380488] Adaptec aacraid driver 1.2.1[50877]-custom
[   40.380871] aacraid :21:00.0: SME is active, device will require DMA 
bounce buffers
[   40.388991] SME is active and system is using DMA bounce buffers
[   40.388995] aacraid: Comm Interface type2 enabled

Contrast with kernel 4.13.16 (dmesg ) :

   25.286437] Adaptec aacraid driver 1.2.1[50834]-custom
[   25.293694] aacraid: Comm Interface type2 enabled
[   25.300799] AAC0: kernel 7.13-0[33263] Mar 16 2018
[   25.300801] AAC0: monitor 7.13-0[33263]
[   25.300802] AAC0: bios 7.13-0[33263]
[   25.300804] AAC0: serial 6A46639462D
[   25.300804] AAC0: Non-DASD support enabled.
[   25.300805] AAC0: 64bit support enabled.
[   25.300807] aacraid :21:00.0: 64 Bit DAC enabled
...
[   27.307013] scsi host16: aacraid
[   27.307228] scsi 16:0:0:0: Direct-Access ASR8805  LogicalDrv 0 V1.0 
PQ: 0 ANSI: 2
[   27.307364] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   27.307376] sd 16:0:0:0: [sdm] 11714670592 512-byte logical blocks: (6.00 
TB/5.45 TiB)
[   27.307385] sd 16:0:0:0: [sdm] Write Protect is off
[   27.307386] sd 16:0:0:0: [sdm] Mode Sense: 12 00 10 08
[   27.307403] sd 16:0:0:0: [sdm] Write cache: disabled, read cache: enabled, 
supports DPO and FUA
[   27.307411] sd 16:0:0:0: Attached scsi generic sg12 type 0
[   27.307507] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   27.307830] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   27.307855] sd 16:0:0:0: [sdm] Attached SCSI removable disk
[   27.332731] scsi 16:1:0:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.355431] scsi 16:1:0:0: Attached scsi generic sg13 type 0
[   27.355830] scsi 16:1:1:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.385377] scsi 16:1:1:0: Attached scsi generic sg14 type 0
[   27.385788] scsi 16:1:2:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.413412] scsi 16:1:2:0: Attached scsi generic sg15 type 0
[   27.413813] scsi 16:1:3:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.437420] scsi 16:1:3:0: Attached scsi generic sg16 type 0
[   27.437836] scsi 16:1:4:0: Direct-Access ATA  WDC WD10TPVT-00H 1A01 
PQ: 1 ANSI: 6
[   27.636675] scsi 16:1:4:0: Attached scsi generic sg17 type 0
[   27.637077] scsi 16:1:5:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.664591] scsi 16:1:5:0: Attached scsi generic sg18 type 0
[   27.664996] scsi 16:1:6:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.697619] scsi 16:1:6:0: Attached scsi generic sg19 type 0
[   27.698027] scsi 16:1:7:0: Direct-Access ATA  WDC WD10TPVT-00U 1A01 
PQ: 1 ANSI: 6
[   27.732645] scsi 16:1:7:0: Attached scsi generic sg20 type 0

And 4.9.108 (different driver but same output):

[   11.220997] Adaptec aacraid driver 1.2-1[41066]-ms
[   11.225669] AAC0: kernel 7.13-0[33263] Mar 16 2018
[   11.225671] AAC0: monitor 7.13-0[33263]
[   11.225672] AAC0: bios 7.13-0[33263]
[   11.225674] AAC0: serial 6A46639462D
[   11.225674] AAC0: Non-DASD support enabled.
[   11.225675] AAC0: 64bit support enabled.
[   11.225675] AAC0: 64 Bit DAC enabled
[   13.220045] scsi host16: aacraid
[   13.220082] aacraid :21:00.0: DDR cache data recovered successfully
[   13.220286] scsi 16:0:0:0: Direct-Access ASR8805  LogicalDrv 0 V1.0 
PQ: 0 ANSI: 2
[   13.220371] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   13.220385] sd 16:0:0:0: [sdm] 11714670592 512-byte logical blocks: (6.00 
TB/5.45 TiB)
[   13.220393] sd 16:0:0:0: [sdm] Write Protect is off
[   13.220395] sd 16:0:0:0: [sdm] Mode Sense: 12 00 10 08
[   13.220415] sd 16:0:0:0: [sdm] Write cache: disabled, read cache: enabled, 
supports DPO and FUA
[   13.220499] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   13.220737] sd 16:0:0:0: [sdm] Very big device. Trying to use READ 
CAPACITY(16).
[   13.220774] sd 16:0:0:0: [sdm] Attached SCSI removable 

Re: [PATCH 0/2 REPOST] remove unneded irq save

2018-06-12 Thread Sebastian Andrzej Siewior
On 2018-06-12 13:54:36 [+0100], John Garry wrote:
> +Dan
> 
> On 11/06/2018 19:23, Sebastian Andrzej Siewior wrote:
> > On 2018-06-11 18:12:55 [+0100], John Garry wrote:
> > > On 11/06/2018 15:40, Sebastian Andrzej Siewior wrote:
> > > > This is the repost of the two patches I posted in earlier this month:
> > > > 
> > > > - [PATCH 1/2] libsas: remove irq save in sas_ata_qc_issue()
> > > >   Received feedback but nothing really changed. I explained that this is
> > > >   not about "local_irqsave() + spin_lock()" *but* "local_irq_save() +
> > > >   spin_unlock()". This seemed to have been overseen twice.
> > > >   Also there were two opinions about the TODO comment:
> > > >  /* TODO: audit callers to ensure they are ready for qc_issue to
> > > >   * unconditionally re-enable interrupts
> > > >It is unclear to me if this comment should be removed because it
> > > >makes no sense or if the intention was indeed to audit callers code
> > > >for a possible "spin_unlock_irq(ap->lock);".
> > > 
> > > Hi,
> > > 
> > > As I said previously, since it is not clear now what the comment meant, 
> > > then
> > > removing the irq save/restore calls will only make it even less clear, and
> > > should be fixed.
> > 
> > how should it be fixed? The discussion went forth and back. The comment
> > as of now does not match the code. It disables interrupts which are
> > already disabled. It does not enable them at any point.
> 
> As I see, at the point we release the lock, the question is if we can just
> re-enable interrupts as probably we disabled interrupts earlier for taking
> the same lock.
>
> However, as mentioned, we should not need to disable interrupts as they
> should have been already disabled.

The irq_save + unlock combo got first introduced in commit ce4f75def399
("isci: Free host lock for SATA/STP abort escalation at submission
time."). It then got moved forth and back until it ended where it is
today with the comment Dan put there. Back then the code path was:
- ata_exec_internal_sg()
  spin_lock_irqsave(ap->lock, flags);
  -> ata_qc_issue() (saying LOCKING:spin_lock_irqsave(host lock)
-> qc->err_mask |= ap->ops->qc_issue(qc); => sas_ata_qc_issue()
   -> i->dft->lldd_execute_task(task, 1, GFP_ATOMIC); => 
isci_task_execute_task()
 ->isci_request_execute(ihost, task, , gfp_flags);
   if (dev_is_sata(task->dev)) {
/* Since we are still in the submit path, and since
 * libsas takes the host lock on behalf of SATA
 * devices before I/O starts, we need to unlock
 * before we can put the task in the error path.
 */
 raw_local_irq_save(flags);
 spin_unlock(isci_host->shost->host_lock);
 sas_task_abort(task)

So. This when the mistake was introduced - it shouldn't get in there
like that.

> Personally I would rather not change the code. But, if you must, I suggest
of course.

> update the comment to read something like this:
> - TODO: It may be possible to unconditionally re-enable interrupts for
> period of having the lock released. Audit callers to check.

We had this comment for 6 years or so and nothing happend. What makes
you think that an updated version of that comment will motivate someone
to make change here in the near future?
It looks to me like a stale comment which won't change a thing because
it does not point out the benefit of doing so (re-enabling interrupts
while dropping the lock) and the price, that is paid for not doing so
(keeping the code as it is) is small enough to not bother.

So if updating the comment as suggested instead of keeping it as-is or
removing it is the blocker *here* then I can send an updated version.
Any comments?

> Cheers,
> John
 
Sebastian


[PATCH] mpt3sas: Fix calltrace observed while running IO & host reset

2018-06-12 Thread Chaitra P B
Below kernel BUG was observed while running IOs with host reset
(issued from application),

mpt3sas_cm0: diag reset: SUCCESS
[ cut here ]
WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 
mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag 
netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 
tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle 
ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw 
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat 
fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm 
irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper 
ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad 
ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs 
libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas 
usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul 
crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas 
pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: GW  
   3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Call Trace:
 [] dump_stack+0x19/0x1b
 [] __warn+0xd8/0x100
 [] warn_slowpath_null+0x1d/0x20
 [] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
 [] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas]
 [] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas]
 [] ? vprintk_default+0x29/0x40
 [] ? printk+0x60/0x77
 [] ? _base_diag_reset+0x238/0x340 [mpt3sas]
 [] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas]
 [] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
 [] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
 [] ? do_sync_write+0x93/0xe0
 [] _ctl_ioctl+0x1a/0x20 [mpt3sas]
 [] do_vfs_ioctl+0x350/0x560
 [] ? __sb_end_write+0x31/0x60
 [] SyS_ioctl+0xa1/0xc0
 [] ? system_call_after_swapgs+0xa2/0x146
 [] system_call_fastpath+0x1c/0x21
 [] ? system_call_after_swapgs+0xae/0x146
---[ end trace 5dac5b98d89aaa3c ]---
[ cut here ]
kernel BUG at block/blk-core.c:1476!
invalid opcode:  [#1] SMP
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag 
netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 
tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle 
ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw 
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat 
fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm 
irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper 
ablk_helper cryptd iTCO_wdt iTCO_vendor_support
 dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad 
ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs 
libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas 
usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul 
crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas 
pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: GW  
   3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
task: 903fc96e0fd0 ti: 903fb1eec000 task.ti: 903fb1eec000
RIP: 0010:[]  [] 
blk_requeue_request+0x90/0xa0
RSP: 0018:903c6b783dc0  EFLAGS: 00010087
RAX: 903bb67026d0 RBX: 903b7d6a6140 RCX: dead0200
RDX: 903bb67026d0 RSI: 903bb6702580 RDI: 903bb67026d0
RBP: 903c6b783dd8 R08: 903bb67026d0 R09: d97e8000
R10: 903c658bac00 R11:  R12: 903bb6702580
R13: 903fa9a292f0 R14: 0246 R15: 1057
FS:  7f7026f5b740() GS:903c6b78() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f298877c004 CR3: caf36000 CR4: 000607e0
Call Trace:
 
 [] __scsi_queue_insert+0xbf/0x110
 [] scsi_io_completion+0x5da/0x6a0
 [] scsi_finish_command+0xdc/0x140
 [] 

Re: [PATCH 0/2 REPOST] remove unneded irq save

2018-06-12 Thread John Garry

+Dan

On 11/06/2018 19:23, Sebastian Andrzej Siewior wrote:

On 2018-06-11 18:12:55 [+0100], John Garry wrote:

On 11/06/2018 15:40, Sebastian Andrzej Siewior wrote:

This is the repost of the two patches I posted in earlier this month:

- [PATCH 1/2] libsas: remove irq save in sas_ata_qc_issue()
  Received feedback but nothing really changed. I explained that this is
  not about "local_irqsave() + spin_lock()" *but* "local_irq_save() +
  spin_unlock()". This seemed to have been overseen twice.
  Also there were two opinions about the TODO comment:
 /* TODO: audit callers to ensure they are ready for qc_issue to
  * unconditionally re-enable interrupts
   It is unclear to me if this comment should be removed because it
   makes no sense or if the intention was indeed to audit callers code
   for a possible "spin_unlock_irq(ap->lock);".


Hi,

As I said previously, since it is not clear now what the comment meant, then
removing the irq save/restore calls will only make it even less clear, and
should be fixed.


how should it be fixed? The discussion went forth and back. The comment
as of now does not match the code. It disables interrupts which are
already disabled. It does not enable them at any point.


As I see, at the point we release the lock, the question is if we can 
just re-enable interrupts as probably we disabled interrupts earlier for 
taking the same lock.


However, as mentioned, we should not need to disable interrupts as they 
should have been already disabled.


Personally I would rather not change the code. But, if you must, I 
suggest update the comment to read something like this:
- TODO: It may be possible to unconditionally re-enable interrupts for 
period of having the lock released. Audit callers to check.


Cheers,
John



Sebastian

.






Re: RAID6: "Bad block number requested"

2018-06-12 Thread Bryan Gurney
On Mon, Jun 11, 2018 at 6:09 PM, James Bottomley
 wrote:
> On Mon, 2018-06-11 at 17:56 -0400, Bryan Gurney wrote:
>> On Mon, Jun 11, 2018 at 1:00 PM, Anthony Youngman
>>  wrote:
>> > On 11/06/18 16:06, James Bottomley wrote:
>> > > Well, this is the problem: a 4k logical (presumably 4k physical)
>> > > drive cannot be addressed in block sectors that are not divisible
>> > > by 8.  This type of drive configuration is very unusual (although
>> > > it was something we tested years ago before the industry realised
>> > > it had to ship drives with 4k physical but 512 byte logical
>> > > sectors because of the legacy problem).
>> >
>> > I understood these drives were now becoming much more common,
>> > especially enterprise-grade drives. I know there were problems
>> > switching from 512/512 drives to 512/4096, but as you say I thought
>> > they were pretty much addressed.
>>
>> As soon as I saw the model number "HGST HUH721010AL", and did a
>> search, I said, "Oh, it's _this_ drive."
>>
>> The HGST Ultrastar He10 has both "512e Format" and "4K Native Format"
>> part numbers, so it's easy to potentially buy the wrong type of drive
>> (e.g.: accidentally buy a 4K Native drive, and discover some obscure
>> I/O failures).
>>
>> FYI, in my experience, when an application sends a
>> smaller-than-4096-bytes I/O to a 4096-bytes block device, the usual
>> error code that's sent by the driver is EINVAL (or "Invalid
>> argument"), so see if there's a log message citing that error code.
>
> We've done the work to make this function.  However, it was a while ago
> and I don't believe anyone tests regularly now (particularly with the
> corner cases) so errors can creep back into the stack.

Ah, okay.  I was thinking more in the context of the error itself
being relatively obscure to find, since the program trying to perform
the I/O operation may report the error in a way that makes it look as
though an invalid argument to a command was received.

(At least that's how I discovered this, when I was wondering why I was
seeing "invalid argument" after trying a command that should have
worked, but failed; a blktrace run revealed a less-than-4096-byte read
that was being attempted, but failed with EINVAL.)

>> > I think it must be a couple of years ago now though, that I heard
>> > (on LWN) enterprise drives were apparently switching over to
>> > 4096/4096. With NO 512 emulation fall-back.
>>
>> Some drive manufacturers seem to be more eager than others, but
>> there's still work to be done.  For example, try this with a 4K-
>> native drive:
>>
>> 1. Write an ISO image to the drive with the command "dd
>> if=isofile.iso of=/dev/testdevice bs=4096 oflag=direct"
>>
>> 2. Create a test directory (for example, "/mnt/testdir"), then
>> attempt to mount the device with "mount /dev/testdevice /mnt/testdir"
>
> This is a textbook case of something that can never work: The
> requirement for a 4k drive is that the stack must be aligned, meaning
> 4k or multiple of 4k block size all the way up and down.  The isofs
> you're copying only has a 2k block size.  You get the same failure with
> any non 4k multiple filesystem block size.  Fortunately most modern
> filesystems have had 4k, or multiple thereof, block sizes for a while
> now, so you're unlikely to see this on your old ext4 devices but, in
> principle, it could happen.
>
> James

Then I hope that drive manufacturers don't start making 4K-native USB
flash drives; otherwise, we'll have a confusing situation on our
hands.


Bryan


>
>> When I tried it on RHEL 7.5, I saw this: "kernel: isofs_fill_super:
>> bread failed, dev=testdevice, iso_blknum=17, block=-2147483648"
>>
>> Note that ISO filesystems have a 2048-byte block size (maximum), but
>> in this test, it's stored on a block device with a block size of 4096
>> bytes.
>>
>> There may be more issues out there, but they have to be found first.
>> And finding the issues is difficult, due to the obscurity of the
>> error messages seen.
>>
>>
>> Thanks,
>>
>> Bryan
>>
>