Commit f3ddac1918fe963bcbf8d407a3a3c0881b47248b ([SCSI] qla2xxx:
Disable adapter when we encounter a PCI disconnect.) introduced code
that disables the board, releasing some resources, when reading
0x.
In case this happens when there is an EEH, this read will trigger EEH
detection
need to track
down other mails (btw, thanks for the detailed patch header but it
enabled me to be skeptical of your request to revert):
You're welcome. If it's been useful for rejecting this patch and
getting a better one later, it's worth it. :)
Kind regards,
--
Mauricio Faria de Oliveira
IBM
):0713 SCSI layer issued Device Reset (0, 0) return x2002
<...>
lpfc 0006:01:00.4: 4:(0):0723 SCSI layer issued Target Reset (1, 0) return x2002
<...>
lpfc 0006:01:00.4: 4:(0):0714 SCSI layer issued Bus Reset Data: x2002
<...>
lpfc 0006:01:00.4: 4:(0):31
on next-20160601.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/scsi/lpfc/lpfc_attr.c | 8 +--
drivers/scsi/lpfc/lpfc_hw4.h | 1 +
drivers/scsi/lpfc/lpfc_init.c | 54 ++-
drivers/scsi/lpfc/lpfc_scsi.c | 3 ++-
4
on some systems).
While in there, include the CPU number in the debug message, which
helps when reading it on systems with many CPUs.
This depends on commit 'powerpc: export cpu_to_core_id()' (submitted
to the linuxppc-dev mailing list). Tested on next-20160601 w/ commit.
Signed-off-by: Mauricio
(topology information),
which has server processors with many cores/threads and per-core caches.
Although the series includes bits for PowerPC64, the per-core scheduling patch
is architecture independent.
Tested on next-20160601 (with an extra commit for patch 1/2, see commit msg).
Mauricio Faria de
technical
problems, for example.
Thanks for the review/comments (Christoph too),
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
closely.
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
On 06/01/2016 05:43 PM, Mauricio Faria de Oliveira wrote:
Tested on next-20160601 (with an extra commit for patch 1/2, see commit msg).
FYI, that commit has been accepted into powerpc next [1].
[1] https://git.kernel.org/powerpc/c/f8ab481066e7246e4b272233aa
--
Mauricio Faria de Oliveira
IBM
where ppc64/le
usually runs, on which it would be easier to adapt this relatively
small change than moving forward w/ blk-mq/scsi-mq, for example --
even if the latter is clearly a superior approach.
[1] http://lists.infradead.org/pipermail/linux-nvme/2016-June/005012.html
--
Mauricio Faria de
,
and it happens in normal scenarios (eg SCSI EH), it seems appropriate.
Thanks,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
, please feel free to change the sign-off line as
appropriate here.
Thanks,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
xxx]
qla2x00_abort_isp+0xef/0x690 [qla2xxx]
qla2x00_do_dpc+0x36c/0x880 [qla2xxx]
kthread+0x10c/0x140
Note: this patch is a slight change of the original patch
sent by Bart, submitted by request of mkp.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Reported
- qla2xxx_eh_abort(GET_CMD_SP(sp));
+ qla2xxx_eh_abort(scmd);
spin_lock_irqsave(>hardware_lock, flags);
}
req->outstanding_cmds[cnt] = NULL;
Signed-off-by: Mauricio Faria de Olive
et_from_kernel_thread+0x5c/0xbc
<...>
Cc: sta...@vger.kernel.org # v4.8
Fixes: 22466da5b4b7 ("lpfc: Fix possible NULL pointer dereference")
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/scsi/lpfc/lpfc_sli.c | 14 --
1 file
Due credit; an oversight.
On 11/23/2016 10:33 AM, Mauricio Faria de Oliveira wrote:
Reported-by: Harsha Thyagaraja <hathy...@in.ibm.com>
Cc: sta...@vger.kernel.org # v4.8
Fixes: 22466da5b4b7 ("lpfc: Fix possible NULL pointer dereference")
Signed-off-by: Mauricio Faria de
On 11/23/2016 12:12 PM, Johannes Thumshirn wrote:
Looks good and sorry for the bug,
Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>
Thanks for the quick review. Not a problem!
This problem turned out to be a good learning exercise. :)
--
Mauricio Faria de Oliveira
IBM Linux Tech
orry for this oversight.)
With it applied, both PCI device remove and EEH recovery work fine.
Fixes: 1535aa75a3d8 ("scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove")
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
or 17096824
Links:
[1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
[2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Signed-off-by: Brahadamba
[sda] tag#0 8 sectors total, 4096 bytes done.
[...] sd 0:0:0:0: tag#0 0 sectors total, 0 bytes done.
Apologies for the ridiculously long commit message with description and
test-cases, but this problem has been relatively difficult to reproduce
and understand, so I thought the documentation/instr
f how the I/O is
being broken up into frames at the transport level and at which offset
the transfer was interrupted.
Christoph, Hannes, Martin,
Thank you all for your comments and pointers to the documentation/spec.
I'll carry it on with the HBA and storage folks.
cheers,
--
Mauricio F
that more properly, set the initial power state
value to '-1' (i.e., uninitialized) instead of '1' (power 'on'),
and check for it in that callback, which may do a direct access
to the field value _if_ a callback function is not defined.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux
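For reference, here is a minimal userspace sketch of the logic described above. It uses simplified, hypothetical types (`struct component`, `struct enclosure_ops`, `power_status_show()`), not the actual enclosure driver code. The field starts at -1 ("unknown"), and the show path refuses to report that sentinel. Returning -EIO when a callback exists but did not update the field, and -ENOTTY when there is no callback, are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an enclosure component. */
struct component {
    int power_status; /* -1: uninitialized, 0: off, 1: on */
};

/* Hypothetical driver ops; the callback is optional (may be NULL). */
struct enclosure_ops {
    int (*get_power_status)(struct component *comp);
};

static void component_init(struct component *comp)
{
    comp->power_status = -1; /* unknown, rather than assuming 'on' (1) */
}

/*
 * Write "on\n" or "off\n" into buf and return its length,
 * or return a negative errno if the state is still unknown.
 */
static int power_status_show(const struct enclosure_ops *ops,
                             struct component *comp, char *buf, size_t len)
{
    assert(ops != NULL && comp != NULL);

    if (ops->get_power_status) {
        int ret = ops->get_power_status(comp);
        if (ret)
            return ret;
    }

    /* Direct field access: never report the uninitialized sentinel. */
    if (comp->power_status == -1)
        return ops->get_power_status ? -EIO : -ENOTTY;

    return snprintf(buf, len, "%s\n", comp->power_status ? "on" : "off");
}

/* Example callbacks: one updates the field, one (buggy) does not. */
static int cb_reports_on(struct component *comp) { comp->power_status = 1; return 0; }
static int cb_no_update(struct component *comp) { (void)comp; return 0; }
```

The -1 sentinel lets the direct-access case tell "never set" apart from a real "off" (0) or "on" (1) value.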
wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:2:7:0 sdaf 65:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
`- 1:2:7:0 sdh 8:112 active undef running
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.i
This is the PATCH v2. Sorry for the wrong subject line.
On 04/11/2017 11:46 AM, Mauricio Faria de Oliveira wrote:
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Acked-by: Brian King <brk...@linux.vnet.ibm.com>
---
v2:
- use the scsi_cmd local variable
o back to the ipr_cmd
to get the pointer, so could be:
Thanks for catching that oversight.
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
), in
which case the 'valid_states' information is not printed. That
is for the following patch too.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 43 ++
1 file changed, 32 insertions(+), 11 del
ciated to this port group.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 12
1 file changed, 12 insertions(+)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c
b/drivers/scsi/device_handler/scsi_dh_
Mauricio Faria de Oliveira (4):
scsi: scsi_dh_alua: allow I/O in the target port unavailable state
scsi: scsi_dh_alua: create alua_rtpg_print() for alua_rtpg() sdev_printk
scsi: scsi_dh_alua: print changes to RTPG state of other PGs too
scsi: scsi_dh_alua: do not print target port g
gt;state can be updated properly
(and further SCSI IO error messages then silenced through alua_prep_fn()).
Once a path checker eventually detects an active state again, the port
group state will be updated by the path activation call, alua_activate(),
as it schedules an alua_rtpg() check.
Signed
.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 26 --
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c
b/drivers/scsi/device_h
On 04/10/2017 10:17 PM, Mauricio Faria de Oliveira wrote:
For documentation purposes, I'll reply to this cover letter with the analysis
of such cases of this problem, and the accompanying messages from kernel logs.
Here it goes, for anyone interested.
Scenario: 4 LUNs, 2 target port groups
Hi Martin and Junichi,
On 04/03/2017 11:10 PM, Junichi Nomura wrote:
On 04/04/17 06:53, Mauricio Faria de Oliveira wrote:
On 03/28/2017 11:29 PM, Junichi Nomura wrote:
Since commit 895427bd012c ("scsi: lpfc: NVME Initiator: Base modifications"),
"rmmod lpfc" sta
d making this patch
a one-liner. :-)
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
FO, LOG_INIT,
"2821 initialize iocb list %d.\n",
phba->cfg_iocb_cnt*1024);
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
ower state
value to '-1' (i.e., uninitialized) instead of '1' (power 'on'),
and check for it in that callback, which may do a direct access
to the field value _if_ a callback function is not defined.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Fixes: 0802488
On 04/05/2017 11:41 AM, Dan Williams wrote:
On Wed, Apr 5, 2017 at 6:13 AM, Mauricio Faria de Oliveira
<mauri...@linux.vnet.ibm.com> wrote:
1) imagine .get_power_status couldn't update the 'power_status' field
(it's a bit unlikely with the in-tree ses driver, but in th
On 04/05/2017 01:23 PM, Song Liu wrote:
Reviewed-by: Song Liu <songliubrav...@fb.com>
Thanks for reviewing, Song Liu.
It's good to know this patch doesn't break anything for you.
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:2:7:0 sdaf 65:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
`- 1:2:7:0 sdh 8:112 active undef running
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
received partially
updated WQE data.
Add the memory barrier after updating the WQE memory.
Reviewed-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
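Here is a minimal sketch of the ordering the fix is about. It assumes a hypothetical work queue (`struct work_queue`, `wq_post()`). The WQE is fully written first, then a barrier is issued, and only then is the doorbell/index published to tell the consumer to read it. C11 fences only model CPU-to-CPU ordering here. A kernel driver talking to a DMA-capable adapter would use the kernel's own barrier primitives (e.g., wmb()) instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define WQ_ENTRIES 8
#define WQE_WORDS 16

/* Hypothetical work queue shared between a producer and a consumer. */
struct work_queue {
    uint32_t ring[WQ_ENTRIES][WQE_WORDS]; /* entries read by the consumer */
    uint32_t host_index;                  /* next slot the producer fills */
    _Atomic uint32_t doorbell;            /* index published to the consumer */
};

static void wq_post(struct work_queue *wq, const uint32_t wqe[WQE_WORDS])
{
    uint32_t idx = wq->host_index;

    assert(idx < WQ_ENTRIES);

    /* 1. Copy the complete WQE into queue memory. */
    memcpy(wq->ring[idx], wqe, sizeof(wq->ring[idx]));

    /* 2. Barrier: all WQE stores become visible before the doorbell store. */
    atomic_thread_fence(memory_order_release);

    /* 3. Only now publish the new index. */
    wq->host_index = (idx + 1) % WQ_ENTRIES;
    atomic_store_explicit(&wq->doorbell, wq->host_index, memory_order_relaxed);
}
```

The consumer is assumed to read the doorbell with an acquire load before reading the WQE. Without the fence, it could observe the new index while still seeing partially updated WQE data.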
Martin, could you please flag this patch for stable?
Thank you,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
Hi Martin and James,
On 02/12/2017 07:52 PM, James Smart wrote:
Correct WQ creation for pagesize
Reviewed-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Please flag this patch for stable.
This patch resolves a serious problem on IBM Power systems at least.
An (appa
4dac35e..6a4f75a 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
 	unsigned	rc_basis: 2;
 	unsigned	zoned: 2;
 	unsigned	urswrz : 1;
+	unsigned	medium_access_reset : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
On 03/13/2017 11:48 AM, Hannes Reinecke wrote:
This is assuming that we're always running on a scsi_disk, and that
scsi_disk is the only one implementing 'eh_action'.
Neither of which is necessarily true.
Ah, OK. Thanks for explaining.
--
Mauricio Faria de Oliveira
IBM Linux Technology
On 02/12/2017 07:49 PM, Anton Blanchard wrote:
We see lpfc devices regularly fail during kexec. Fix this by adding
a shutdown method which mirrors the remove method.
Reviewed-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Tested-by: Mauricio Faria de Oliveira
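To illustrate the "shutdown mirrors remove" idea, here is a minimal userspace model. It uses hypothetical names (`struct driver_ops`, `demo_probe()`, `demo_remove()`), not the kernel's `struct pci_driver`. The same teardown function is wired to both hooks. Clearing the pointer after freeing it, so that a second teardown is harmless, is an assumption of this sketch and not necessarily what lpfc does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device state. */
struct device_state {
    int *queues;    /* resource allocated at probe time */
    int hw_enabled; /* 1 while the adapter is active */
};

/* Hypothetical driver hooks, loosely modeled on probe/remove/shutdown. */
struct driver_ops {
    int  (*probe)(struct device_state *dev);
    void (*remove)(struct device_state *dev);
    void (*shutdown)(struct device_state *dev);
};

static int demo_probe(struct device_state *dev)
{
    dev->queues = calloc(4, sizeof(*dev->queues));
    if (!dev->queues)
        return -1;
    dev->hw_enabled = 1;
    return 0;
}

/* Teardown shared by remove and shutdown; safe to call more than once. */
static void demo_remove(struct device_state *dev)
{
    assert(dev != NULL);
    dev->hw_enabled = 0; /* quiesce the adapter before freeing memory */
    free(dev->queues);
    dev->queues = NULL;  /* a second call frees NULL, which is a no-op */
}

static const struct driver_ops demo_driver = {
    .probe    = demo_probe,
    .remove   = demo_remove,
    .shutdown = demo_remove, /* shutdown mirrors remove */
};
```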
present/ask for consideration too.
I think I should have included this in the tested-by tag email, for
documentation/evidence: no regression observed in system shutdown path.
Thanks,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
; I missed checking the right tree. Thanks for the pointers.
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
e if they see fit/required.
[1] http://www.spinics.net/lists/linux-scsi/msg105886.html
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
idx]->pring = pring;
commit 85e8a23936ab ("scsi: lpfc: Add shutdown method for kexec") made
this more likely as lpfc_pci_remove_one() is called on driver shutdown
(e.g., modprobe -r / rmmod).
(this patch is partially based on a different patch suggested by Johannes,
thus adding a Suggested-by ta
t sent
([PATCH] lpfc: fix double free of bound CQ/WQ ring pointer) resolves it?
I don't have a setup to test it handy right now.
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
go through that function.
(and it occurred to me that the state-change check of patch 3 can
be done there, simpler.)
cheers,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
On 07/11/2017 12:32 PM, Mauricio Faria de Oliveira wrote:
Also, it seems the Unavailable/Standby states would not be logged
without a recheck from alua_check_sense(), since the only callers
of alua_rtpg_queue() are alua_activate() and alua_check[_sense]()
Well, actually it does get logged
Insert sdev_dbg() calls in the function path which may queue
alua_rtpg_work() past initialization, for debugging purposes:
- alua_activate()
- alua_check_sense()
- alua_rtpg_queue()
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/scsi/device_h
in unavailable/standby
are not logged - only changes are.
Patch 4 adds a few sdev_dbg() calls to track the path to alua_rtpg_work()
Tested on v4.12+ (commit b4b8cbf679c4).
Mauricio Faria de Oliveira (4):
scsi: scsi_dh_alua: allow I/O in target port unavailable and standby states
scsi
for the current PG.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
v2:
- use lockdep_assert_held() instead of documenting locking conventions
(Bart Van Assche <bart.vanass...@sandisk.com>)
- define two functions (with/without supported states information)
scheduled in alua_check_sense() to update PG state.
So, do not print such a message if the unavailable/standby state remains
(i.e., the PG did not transition to/from such states). All other cases
continue to be printed.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
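Here is the rule above as a small sketch. It uses a hypothetical state enum and helper (`enum pg_state`, `rtpg_should_print()`). The real handler works with the SCSI access-state values, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ALUA port group states (names are illustrative). */
enum pg_state {
    PG_ACTIVE_OPTIMIZED,
    PG_ACTIVE_NONOPTIMIZED,
    PG_STANDBY,
    PG_UNAVAILABLE,
    PG_TRANSITIONING,
};

static bool pg_state_is_quiet(enum pg_state s)
{
    return s == PG_STANDBY || s == PG_UNAVAILABLE;
}

/*
 * Decide whether an RTPG result should be printed. Repeated rechecks
 * while a PG stays in unavailable/standby are suppressed. Every other
 * case, including transitions to or from those states, is printed.
 */
static bool rtpg_should_print(enum pg_state old_state, enum pg_state new_state)
{
    if (old_state == new_state && pg_state_is_quiet(new_state))
        return false;
    return true;
}
```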
ated on path activation (alua_activate(),
as it schedules a recheck), thus I/O requests are no longer failed.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Reported-by: Naresh Bannoth <nbann...@in.ibm.com>
---
v2:
- also add support for standby state
On 07/10/2017 07:47 PM, Mauricio Faria de Oliveira wrote:
This patchset addresses that problem, and adds a few improvements
to the logging of PG state changes.
Here are some kernel log snippets with the patchset, if that helps.
The 2 port groups temporarily gone into unavailable state
On 01/31/2018 08:50 PM, Bart Van Assche wrote:
I think it would be useful to have some variant of the above code in the kernel
tree. Are you familiar with the fault injection framework (see also
and Documentation/fault-injection/fault-injection.txt)?
No, not yet. That's very interesting.
On 01/31/2018 08:59 PM, Bart Van Assche wrote:
On Wed, 2018-01-31 at 17:48 -0200, Mauricio Faria de Oliveira wrote:
On 01/31/2018 05:06 PM, Bart Van Assche wrote:
Sorry but I think this patch introduces new race conditions. Have you
Can you detail the race conditions? As far as I can see
Bart,
Thanks for reviewing.
On 01/31/2018 05:06 PM, Bart Van Assche wrote:
Sorry but I think this patch introduces new race conditions. Have you
Can you detail the race conditions? As far as I can see, the only race
condition would be when an error handler is invoked very close in time
to
reset and target reset handlers do not cause oopses,
but print a misleading 'host reset in progress' message, so
fix those too.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 28
1 file changed, 2
The test-case results with PATCH v2.
scsih_abort()
=
Without patch:
[ 362.669743] setting logging_level(0x1000)
[ 362.705074] mpt3sas_cm0: skip free_smid/scsi_done scmd(c01fd4f2bd40)
[ 363.956579] sd 16:0:1:0: [sdf] Synchronizing SCSI cache
[ 363.956844]
elp, so still go for the changes.
Also, this might help to prevent similar errors in the future,
in case code changes and possibly tries to access freed stuff.
Note the fix in scsih_host_reset() is still important anyway.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
v2:
This patch can be verified with this simple test-case,
which inserts a wait loop at the bottom of 'scsih_shutdown()'
and forces SCSI commands to timeout (skip 'scmd->scsi_done()').
It abuses the 'ioc->logging_level' parameter to do that, with:
- 0x1000: wait loop on scsih_shutdown() and skip
s already used in many other points in the code,
for the same reasons (exit early before the code attempts to use stuff
that might be released).
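Here is a minimal sketch of that early-exit pattern. It assumes a hypothetical adapter model (`struct adapter`, the `remove_host` flag, `eh_abort()`, `RESULT_NO_CONNECT`). The handler checks the flag first and completes the command without touching resources that shutdown/unload may already have released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; field and constant names are illustrative only. */
enum eh_result { EH_SUCCESS, EH_FAILED };

struct adapter {
    bool remove_host;   /* set at the start of shutdown/unload */
    int *task_mgmt_ctx; /* released during shutdown; NULL afterwards */
};

struct scsi_cmd {
    int result;
    bool completed;
};

#define RESULT_NO_CONNECT 1 /* stand-in for a "no connection" result */

static enum eh_result eh_abort(struct adapter *ioc, struct scsi_cmd *cmd)
{
    /* Exit early: the resources used below may already be released. */
    if (ioc->remove_host) {
        cmd->result = RESULT_NO_CONNECT;
        cmd->completed = true;
        return EH_SUCCESS;
    }

    /* Normal path: adapter resources are still valid here. */
    assert(ioc->task_mgmt_ctx != NULL);
    (*ioc->task_mgmt_ctx)++; /* e.g., count a task management request */
    cmd->completed = true;
    return EH_SUCCESS;
}
```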
Thanks again,
--
Mauricio Faria de Oliveira
IBM Linux Technology Center
Hi Sreekanth,
On 02/15/2018 03:48 AM, Sreekanth Reddy wrote:
During the shutdown time, I don't want the outstanding IOs to timeout due to
disabling of interrupts and go the TM path. So I wanted to clear out all the
Outstanding IOs in the shutdown path itself instead of clearing them in TM
path.
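Here is a rough sketch of the approach Sreekanth describes, under hypothetical names (`struct adapter`, `flush_outstanding()`, `adapter_shutdown()`, `RESULT_SHUTDOWN`). After interrupts are disabled, the shutdown path itself completes every outstanding command and clears its slot, so none is left to time out into the task management path.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OUTSTANDING 8
#define RESULT_SHUTDOWN 2 /* hypothetical completion status */

/* Hypothetical command and adapter model. */
struct io_cmd {
    int result;
    int done;
};

struct adapter {
    struct io_cmd *outstanding[MAX_OUTSTANDING];
    int interrupts_enabled;
};

/* Complete every outstanding command and clear its slot; return the count. */
static int flush_outstanding(struct adapter *ioc)
{
    int flushed = 0;

    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        struct io_cmd *cmd = ioc->outstanding[i];

        if (!cmd)
            continue;
        cmd->result = RESULT_SHUTDOWN;
        cmd->done = 1;              /* stands in for the completion callback */
        ioc->outstanding[i] = NULL; /* slot cannot be completed twice */
        flushed++;
    }
    return flushed;
}

static void adapter_shutdown(struct adapter *ioc)
{
    assert(ioc != NULL);
    ioc->interrupts_enabled = 0; /* no more completions from hardware */
    flush_outstanding(ioc);      /* nothing is left to time out */
}
```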
for the changes.
Also, this might help to prevent similar errors in the future,
in case code changes and possibly tries to access freed stuff.
Note the fix in scsih_host_reset() is still important anyway.
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
.
Mauricio Faria de Oliveira (2):
scsi: mpt3sas: fix oops in error handlers after shutdown/unload
scsi: mpt3sas: wait for and flush running commands on shutdown/unload
drivers/scsi/mpt3sas/mpt3sas_base.c | 8
drivers/scsi/mpt3sas/mpt3sas_base.h | 3 +++
drivers/scsi/mpt3sas
.com>
Tested-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
Signed-off-by: Sreekanth Reddy <sreekanth.re...@broadcom.com>
[mauricfo: introduced something in commit message.]
Signed-off-by: Mauricio Faria de Oliveira <mauri...@linux.vnet.ibm.com>
---
drivers/
Martin, James,
On 02/22/2018 01:07 AM, Martin K. Petersen wrote:
The first patch prevents the SCSI error handlers from running once the
shutdown/unload path starts. This avoids an oops at least in the host
reset handler, on kernels with a recent patch, and also in the abort
handler on kernels