RE: [PATCH 1/3] scsi: aacraid: Check for PCI state of device in a generic way

2017-11-17 Thread Dave Carroll
> Commit 16ae9dd35d37 ("scsi: aacraid: Fix for excessive prints on EEH")
> introduced checks of the device state before PCI operations in the
> driver. Basically, this prevents the driver from performing PCI accesses
> while the device is recovering from a PCI error. On PowerPC this
> mechanism is called EEH, and the aforementioned commit introduced
> checks based on EEH-specific primitives.
> 
> There are three potential problems with this approach. First, these
> checks are "locked" to powerpc only, while other archs could have error
> recovery methods too, like AER on Intel. Second, the powerpc primitives
> perform expensive FW accesses to validate the precise PCI state of a
> device. Finally, the code becomes more complicated and needs ifdef
> guards based on the arch config.
> 
> So, this patch makes use of generic PCI state checks, which are
> lightweight and independent of arch configs; it also makes the code
> cleaner.
> 
> Signed-off-by: Guilherme G. Piccoli 
> ---

Reviewed-by: Dave Carroll 


Re: Seagate External SMR drive USB resets (XHCI transfer error, not timeout)

2017-11-17 Thread Jérôme Carretero
Hi,

On Thu, 16 Nov 2017 14:42:51 -0500 (EST)
Alan Stern  wrote:

> On Wed, 15 Nov 2017, Jérôme Carretero wrote:
> 
> > I performed a usbmon capture extract, centered around the event
> > (a few hundred MBs had to be written for this to happen):
> > 
> >  Nov 15 22:16:33 Bidule kernel: usb 6-4.3.2.1: reset SuperSpeed USB device number 8 using xhci_hcd
> > 
> > I can see that the computer is sending a write request and gets
> > -EPROTO in answer (capture attached), so scratch the timeout
> > issue (and, come to think of it, this matches what UAS was saying,
> > except that UAS was taking ages to recover).
> > 
> > I looked for EPROTO in the USB code and found a dynamic debug printf
> > in XHCI; after enabling it:
> > 
> >  Nov 15 22:45:03 Bidule kernel: xhci_hcd :07:00.0: Transfer error for slot 13 ep 3 on endpoint
> >  Nov 15 22:45:03 Bidule kernel: xhci_hcd :07:00.0: Transfer error for slot 12 ep 3 on endpoint
> >  Nov 15 22:45:03 Bidule kernel: usb 6-4.3.3.1: reset SuperSpeed USB device number 9 using xhci_hcd
> >  Nov 15 22:45:03 Bidule kernel: usb 6-4.3.2.1: reset SuperSpeed USB device number 8 using xhci_hcd
> > 
> > First, I understand that a bad USB device could flood the kernel
> > log, but shouldn't that xhci_dbg() (and others, e.g. babble) be at
> > least an xhci_info() (I saw 2a9227a5)?  
> 
> I suspect that if every USB error got printed in the kernel log,
> people would be upset at how much useless information was added.

So it turns out that one of the 2 drives that produced most of these
errors died overnight (the kernel first reported a failure at READ DMA
EXT, with SMART seeing 6k Current_Pending_Sector / Offline_Uncorrectable);
then the drive just lost it, and now it won't even complete USB
enumeration.

IMHO too much information is perhaps better than not enough, and I bet
that people would reconsider purchasing low-quality hardware if they
noticed these (unless they can happen for no reason).

> 
> > Then... I don't know enough to attribute the issue to the upstream USB
> > hub(s), the drive endpoint not behaving properly, or the
> > kernel... what should I do with these messages?  
> 
> Here's the error:
> 
> b5251480 0.505661 S Bo:6:008:2 -115 196608 = 540a2813 1a33dd99 ab76840c bf72fc6b 60f9fcaf 4d61822c c007ff4e ab72d022
> b5251480 0.506280 C Bo:6:008:2 -71 86016 >

Out of curiosity, which tool produced this condensed output?

> This means the kernel tried to write 196608 bytes to the drive.  After
> 86016 had been transferred, the drive did not reply correctly to the
> next output transaction, causing the kernel to perform a reset.  
> That's what happened, according to the viewpoint of the xhci-hcd 
> driver.
> 
> In theory it's possible that the drive did respond correctly and the
> information got messed up on the USB cable or on the computer's end.
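The two usbmon lines Alan quotes can be decoded mechanically. In usbmon's text format the address word Bo:6:008:2 reads as bulk ('B'), output ('o'), bus 6, device 8, endpoint 2; the S (submit) line carries the requested length with the usual -115 (-EINPROGRESS) status, and the C (completion) line reports -71 (-EPROTO on Linux) plus the bytes actually transferred. A minimal C sketch that extracts those leading fields, assuming the condensed tag/timestamp/event/address/status/length layout shown above; parse_usbmon() and struct usbmon_event are illustrative names, not part of usbmon or any capture tool:

```c
#include <assert.h>
#include <stdio.h>

/* Leading fields of one usbmon-style text line (illustrative layout). */
struct usbmon_event {
	char type;   /* 'S' = submit, 'C' = completion callback */
	char xfer;   /* 'B' bulk, 'I' interrupt, 'C' control, 'Z' isochronous */
	char dir;    /* 'o' = host to device, 'i' = device to host */
	int bus, dev, ep;
	int status;  /* negative errno-style value */
	int length;  /* requested (S) or actually transferred (C) */
};

/* Returns 1 if all leading fields were parsed, 0 otherwise.
 * Assumes: <hex tag> <timestamp> <event> <Xd:bus:dev:ep> <status> <length> */
static int parse_usbmon(const char *line, struct usbmon_event *ev)
{
	unsigned long tag;  /* URB tag, only used to pair S and C lines */
	double ts;          /* timestamp as printed in the condensed output */
	int n;

	assert(line != NULL && ev != NULL);
	n = sscanf(line, "%lx %lf %c %c%c:%d:%d:%d %d %d",
		   &tag, &ts, &ev->type, &ev->xfer, &ev->dir,
		   &ev->bus, &ev->dev, &ev->ep, &ev->status, &ev->length);
	return n == 10;
}
```

Applied to the two captured lines, s.length - c.length is 110592, i.e. the part of the 196608-byte write that had not been transferred when the -EPROTO completion came back.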

Wow, that sucks.
I had a mental image where the transactions used FEC, so that it would
obviously be possible to differentiate between cable/hub/endpoint errors.


> Since we can't see what signals were actually sent on the USB bus,
> there's no way to be certain.  But it seems most likely that the drive
> (or rather, its USB interface) was at fault.

I would speculate (with high confidence) that the drive itself is doing
unexpected stuff, because of that bugzilla issue showing that these SMR
drives also behave strangely when connected on SATA.

I have had 10 of these 8 TB SMR drives in circulation, 1 SATA and 9 USB,
and all of them generate unexpected kernel logging to some extent
when subjected to write-intensive loads.
2 are from 2015 and SMART says they're all good; the rest have been in
service for 10 days: one was DOA (very early SMART bad sectors), and
tonight's failure has an S/N consecutive to that first DOA one, which
smells a little.


> > I'm still filling the drives, will perform a scrub after, to see if
> > the issue causes data loss...  

To be continued... Since it looks like there's no fundamental issue
with the kernel itself and this is turning into a rant about hardware,
I'll direct follow-up e-mails to the ML only; tell me if you want
to stay in CC.


Thanks again,

-- 
Jérôme
=== START OF INFORMATION SECTION ===
Device Model:     ST8000DM004-2CX188
Serial Number:
LU WWN Device Id: XX
Firmware Version: 0001
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Nov 16 23:36:32 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)	Offline data c

RE: [PATCH 2/3] scsi: aacraid: Perform initialization reset only once

2017-11-17 Thread Raghava Aditya Renukunta


> -Original Message-
> From: Guilherme G. Piccoli [mailto:gpicc...@linux.vnet.ibm.com]
> Sent: Friday, November 17, 2017 1:15 PM
> To: dl-esc-Aacraid Linux Driver ; linux-
> s...@vger.kernel.org
> Cc: gpicc...@linux.vnet.ibm.com; Dave Carroll
> ; Raghava Aditya Renukunta
> ; gpicc...@protonmail.ch
> Subject: [PATCH 2/3] scsi: aacraid: Perform initialization reset only once
> 
> Currently the driver accepts two ways of requesting an initialization
> reset on the adapter: the aac_reset_devices module parameter or the
> generic reset_devices kernel parameter.
> 
> This works as intended, but if we end up hitting a SCSI hang and the
> SCSI EH mechanism kicks in, aacraid performs resets as part of the
> SCSI error recovery procedure. These EH routines might reinitialize
> the device, and if either reset parameter was given on the kernel
> command line, we perform an "initialization" reset again.
> 
> So, to avoid these duplicate resets on the SCSI EH path, this patch
> adds a field to struct aac_dev that tracks the init reset request per
> adapter: once the reset is done, we set it to false and don't
> proactively reset anymore on reinitialization.
> 
> Signed-off-by: Guilherme G. Piccoli 
> ---

Reviewed-by: Raghava Aditya Renukunta 


RE: [PATCH 3/3] scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path

2017-11-17 Thread Raghava Aditya Renukunta


> -Original Message-
> From: Guilherme G. Piccoli [mailto:gpicc...@linux.vnet.ibm.com]
> Sent: Friday, November 17, 2017 1:15 PM
> To: dl-esc-Aacraid Linux Driver ; linux-
> s...@vger.kernel.org
> Cc: gpicc...@linux.vnet.ibm.com; Dave Carroll
> ; Raghava Aditya Renukunta
> ; gpicc...@protonmail.ch
> Subject: [PATCH 3/3] scsi: aacraid: Prevent crash in case of free interrupt
> during scsi EH path
> 
> As part of the scsi EH path, aacraid performs a reinitialization of
> the adapter, which encompasses freeing resources and IRQs, NULLifying
> lots of pointers, and then initializing it all over again.
> We've identified a problem during the free IRQ portion of this path
> if CONFIG_DEBUG_SHIRQ is enabled in the kernel config.
> 
> It happens that, when this option is set, right after free_irq()
> effectively releases the interrupt, it checks whether it was requested
> as IRQF_SHARED. If so, it performs another call to the driver's IRQ
> handler. The problem is that aacraid currently frees some resources
> *before* freeing the IRQ, so once the free_irq() path calls the
> handler again (due to CONFIG_DEBUG_SHIRQ), aacraid crashes with a
> NULL pointer dereference and the following trace:
> 
>   aac_src_intr_message+0xf8/0x740 [aacraid]
>   __free_irq+0x33c/0x4a0
>   free_irq+0x78/0xb0
>   aac_free_irq+0x13c/0x150 [aacraid]
>   aac_reset_adapter+0x2e8/0x970 [aacraid]
>   aac_eh_reset+0x3a8/0x5d0 [aacraid]
>   scsi_try_host_reset+0x74/0x180
>   scsi_eh_ready_devs+0xc70/0x1510
>   scsi_error_handler+0x624/0xa20
> 
> This patch prevents the crash by changing the order of the
> deinitialization in this aacraid path: first we free the IRQ,
> then the other resources. No functional change intended.
> 
> Signed-off-by: Guilherme G. Piccoli 
> ---

Thank you for the fix. Much appreciated. 

Reviewed-by: Raghava Aditya Renukunta 


[PATCH 0/3] Some fixes to aacraid

2017-11-17 Thread Guilherme G. Piccoli
This series presents 3 small fixes for the aacraid driver.
The most important one, IMHO, is the crash prevention.

Tested them against v4.14.

Guilherme G. Piccoli (3):
  scsi: aacraid: Check for PCI state of device in a generic way
  scsi: aacraid: Perform initialization reset only once
  scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path

 drivers/scsi/aacraid/aacraid.h |  1 +
 drivers/scsi/aacraid/commsup.c | 35 +++
 drivers/scsi/aacraid/linit.c   |  3 +++
 drivers/scsi/aacraid/rx.c  | 15 ++-
 drivers/scsi/aacraid/src.c | 20 ++--
 5 files changed, 31 insertions(+), 43 deletions(-)

-- 
2.15.0



[PATCH 2/3] scsi: aacraid: Perform initialization reset only once

2017-11-17 Thread Guilherme G. Piccoli
Currently the driver accepts two ways of requesting an initialization
reset on the adapter: the aac_reset_devices module parameter or the
generic reset_devices kernel parameter.

This works as intended, but if we end up hitting a SCSI hang and the
SCSI EH mechanism kicks in, aacraid performs resets as part of the
SCSI error recovery procedure. These EH routines might reinitialize
the device, and if either reset parameter was given on the kernel
command line, we perform an "initialization" reset again.

So, to avoid these duplicate resets on the SCSI EH path, this patch
adds a field to struct aac_dev that tracks the init reset request per
adapter: once the reset is done, we set it to false and don't
proactively reset anymore on reinitialization.

Signed-off-by: Guilherme G. Piccoli 
---
 drivers/scsi/aacraid/aacraid.h |  1 +
 drivers/scsi/aacraid/linit.c   |  3 +++
 drivers/scsi/aacraid/rx.c  | 15 ++-
 drivers/scsi/aacraid/src.c | 20 ++--
 4 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 403a639574e5..6e3d81969a77 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1673,6 +1673,7 @@ struct aac_dev
struct aac_hba_map_info hba_map[AAC_MAX_BUSES][AAC_MAX_TARGETS];
u8  adapter_shutdown;
u32 handle_pci_error;
+   bool init_reset;
 };
 
 #define aac_adapter_interrupt(dev) \
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index c9252b138c1f..bdf127aaab41 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -1680,6 +1680,9 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
aac->cardtype = index;
INIT_LIST_HEAD(&aac->entry);
 
+   if (aac_reset_devices || reset_devices)
+   aac->init_reset = true;
+
aac->fibs = kzalloc(sizeof(struct fib) * (shost->can_queue + AAC_NUM_MGT_FIB), GFP_KERNEL);
if (!aac->fibs)
goto out_free_host;
diff --git a/drivers/scsi/aacraid/rx.c b/drivers/scsi/aacraid/rx.c
index 93ef7c37e568..ff2af06e7dd9 100644
--- a/drivers/scsi/aacraid/rx.c
+++ b/drivers/scsi/aacraid/rx.c
@@ -561,11 +561,16 @@ int _aac_rx_init(struct aac_dev *dev)
dev->a_ops.adapter_sync_cmd = rx_sync_cmd;
dev->a_ops.adapter_enable_int = aac_rx_disable_interrupt;
dev->OIMR = status = rx_readb (dev, MUnit.OIMR);
-   if ((((status & 0x0c) != 0x0c) || aac_reset_devices || reset_devices) &&
- !aac_rx_restart_adapter(dev, 0, IOP_HWSOFT_RESET))
-   /* Make sure the Hardware FIFO is empty */
-   while ((++restart < 512) &&
- (rx_readl(dev, MUnit.OutboundQueue) != 0xffffffffL));
+
+   if (((status & 0x0c) != 0x0c) || dev->init_reset) {
+   dev->init_reset = false;
+   if (!aac_rx_restart_adapter(dev, 0, IOP_HWSOFT_RESET)) {
+   /* Make sure the Hardware FIFO is empty */
+   while ((++restart < 512) &&
+ (rx_readl(dev, MUnit.OutboundQueue) != 0xffffffffL));
+   }
+   }
+
/*
 *  Check to see if the board panic'd while booting.
 */
diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 0c9361c87ec8..fde6b6aa86e3 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -868,9 +868,13 @@ int aac_src_init(struct aac_dev *dev)
/* Failure to reset here is an option ... */
dev->a_ops.adapter_sync_cmd = src_sync_cmd;
dev->a_ops.adapter_enable_int = aac_src_disable_interrupt;
-   if ((aac_reset_devices || reset_devices) &&
-   !aac_src_restart_adapter(dev, 0, IOP_HWSOFT_RESET))
-   ++restart;
+
+   if (dev->init_reset) {
+   dev->init_reset = false;
+   if (!aac_src_restart_adapter(dev, 0, IOP_HWSOFT_RESET))
+   ++restart;
+   }
+
/*
 *  Check to see if the board panic'd while booting.
 */
@@ -1014,9 +1018,13 @@ int aac_srcv_init(struct aac_dev *dev)
/* Failure to reset here is an option ... */
dev->a_ops.adapter_sync_cmd = src_sync_cmd;
dev->a_ops.adapter_enable_int = aac_src_disable_interrupt;
-   if ((aac_reset_devices || reset_devices) &&
-   !aac_src_restart_adapter(dev, 0, IOP_HWSOFT_RESET))
-   ++restart;
+
+   if (dev->init_reset) {
+   dev->init_reset = false;
+   if (!aac_src_restart_adapter(dev, 0, IOP_HWSOFT_RESET))
+   ++restart;
+   }
+
/*
 *  Check to see if flash update is running.
 *  Wait for the adapter to be up and running. Wait up to 5 minutes
-- 
2.15.0



[PATCH 3/3] scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path

2017-11-17 Thread Guilherme G. Piccoli
As part of the scsi EH path, aacraid performs a reinitialization of
the adapter, which encompasses freeing resources and IRQs, NULLifying
lots of pointers, and then initializing it all over again.
We've identified a problem during the free IRQ portion of this path
if CONFIG_DEBUG_SHIRQ is enabled in the kernel config.

It happens that, when this option is set, right after free_irq()
effectively releases the interrupt, it checks whether it was requested
as IRQF_SHARED. If so, it performs another call to the driver's IRQ
handler. The problem is that aacraid currently frees some resources
*before* freeing the IRQ, so once the free_irq() path calls the
handler again (due to CONFIG_DEBUG_SHIRQ), aacraid crashes with a
NULL pointer dereference and the following trace:

  aac_src_intr_message+0xf8/0x740 [aacraid]
  __free_irq+0x33c/0x4a0
  free_irq+0x78/0xb0
  aac_free_irq+0x13c/0x150 [aacraid]
  aac_reset_adapter+0x2e8/0x970 [aacraid]
  aac_eh_reset+0x3a8/0x5d0 [aacraid]
  scsi_try_host_reset+0x74/0x180
  scsi_eh_ready_devs+0xc70/0x1510
  scsi_error_handler+0x624/0xa20

This patch prevents the crash by changing the order of the
deinitialization in this aacraid path: first we free the IRQ,
then the other resources. No functional change intended.

Signed-off-by: Guilherme G. Piccoli 
---
 drivers/scsi/aacraid/commsup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 2abe8fd83494..bec9f3193f60 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -1554,6 +1554,7 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
 * will ensure that i/o is queisced and the card is flushed in that
 * case.
 */
+   aac_free_irq(aac);
aac_fib_map_free(aac);
dma_free_coherent(&aac->pdev->dev, aac->comm_size, aac->comm_addr,
  aac->comm_phys);
@@ -1561,7 +1562,6 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type)
aac->comm_phys = 0;
kfree(aac->queues);
aac->queues = NULL;
-   aac_free_irq(aac);
kfree(aac->fsa_dev);
aac->fsa_dev = NULL;
 
-- 
2.15.0
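
Why the ordering matters can be shown with a small userspace model. Here mock_free_irq() imitates free_irq() on an IRQF_SHARED line with CONFIG_DEBUG_SHIRQ=y: it unregisters the handler and then invokes it one extra time. The handler, like the driver's interrupt path in the trace, assumes driver state is still allocated. Instead of actually dereferencing NULL, the mock records where the real driver would oops. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mock_queue {
	int pending;
};

struct mock_aac {
	struct mock_queue *queues; /* freed during the adapter reset */
	bool irq_registered;
	bool would_oops;           /* set where the real driver hits NULL */
};

/* Hypothetical interrupt handler: assumes queues is still valid. */
static void mock_intr_handler(struct mock_aac *aac)
{
	if (aac->queues == NULL) {
		aac->would_oops = true;
		return;
	}
	aac->queues->pending = 0;
}

/* Unregister first; with CONFIG_DEBUG_SHIRQ, call the handler once more. */
static void mock_free_irq(struct mock_aac *aac, bool debug_shirq)
{
	assert(aac != NULL);
	aac->irq_registered = false;
	if (debug_shirq)
		mock_intr_handler(aac);
}

/* Order before the patch: resources first, IRQ last. */
static void mock_reset_old(struct mock_aac *aac, bool debug_shirq)
{
	free(aac->queues);
	aac->queues = NULL;
	mock_free_irq(aac, debug_shirq);
}

/* Order after the patch: IRQ first, then resources. */
static void mock_reset_new(struct mock_aac *aac, bool debug_shirq)
{
	mock_free_irq(aac, debug_shirq);
	free(aac->queues);
	aac->queues = NULL;
}
```

Once the IRQ is gone, the handler can no longer run against half-torn-down state, whether the extra call comes from the debug option or from another device sharing the line.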



[PATCH 1/3] scsi: aacraid: Check for PCI state of device in a generic way

2017-11-17 Thread Guilherme G. Piccoli
Commit 16ae9dd35d37 ("scsi: aacraid: Fix for excessive prints on EEH")
introduced checks of the device state before PCI operations in the
driver. Basically, this prevents the driver from performing PCI accesses
while the device is recovering from a PCI error. On PowerPC this
mechanism is called EEH, and the aforementioned commit introduced
checks based on EEH-specific primitives.

There are three potential problems with this approach. First, these
checks are "locked" to powerpc only, while other archs could have error
recovery methods too, like AER on Intel. Second, the powerpc primitives
perform expensive FW accesses to validate the precise PCI state of a
device. Finally, the code becomes more complicated and needs ifdef
guards based on the arch config.

So, this patch makes use of generic PCI state checks, which are
lightweight and independent of arch configs; it also makes the code
cleaner.

Signed-off-by: Guilherme G. Piccoli 
---
 drivers/scsi/aacraid/commsup.c | 33 ++---
 1 file changed, 2 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 525a652dab48..2abe8fd83494 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -467,35 +467,6 @@ int aac_queue_get(struct aac_dev * dev, u32 * index, u32 qid, struct hw_fib * hw
return 0;
 }
 
-#ifdef CONFIG_EEH
-static inline int aac_check_eeh_failure(struct aac_dev *dev)
-{
-   /* Check for an EEH failure for the given
-* device node. Function eeh_dev_check_failure()
-* returns 0 if there has not been an EEH error
-* otherwise returns a non-zero value.
-*
-* Need to be called before any PCI operation,
-* i.e.,before aac_adapter_check_health()
-*/
-   struct eeh_dev *edev = pci_dev_to_eeh_dev(dev->pdev);
-
-   if (eeh_dev_check_failure(edev)) {
-   /* The EEH mechanisms will handle this
-* error and reset the device if
-* necessary.
-*/
-   return 1;
-   }
-   return 0;
-}
-#else
-static inline int aac_check_eeh_failure(struct aac_dev *dev)
-{
-   return 0;
-}
-#endif
-
 /*
  * Define the highest level of host to adapter communication routines.
  * These routines will support host to adapter FS commuication. These
@@ -701,7 +672,7 @@ int aac_fib_send(u16 command, struct fib *fibptr, unsigned long size,
return -ETIMEDOUT;
}
 
-   if (aac_check_eeh_failure(dev))
+   if (unlikely(pci_channel_offline(dev->pdev)))
return -EFAULT;
 
if ((blink = aac_adapter_check_health(dev)) > 0) {
@@ -801,7 +772,7 @@ int aac_hba_send(u8 command, struct fib *fibptr, fib_callback callback,
 
spin_unlock_irqrestore(&fibptr->event_lock, flags);
 
-   if (aac_check_eeh_failure(dev))
+   if (unlikely(pci_channel_offline(dev->pdev)))
return -EFAULT;
 
fibptr->flags |= FIB_CONTEXT_FLAG_WAIT;
-- 
2.15.0



[PATCH] scsi_dh: add new rdac devices

2017-11-17 Thread Xose Vazquez Perez
Add IBM 3542 and 3552, arrays: FAStT200 and FAStT500.
Add full STK OPENstorage family, arrays: 9176, D173, D178, D210, D220, D240 and D280.
Add STK BladeCtlr family, arrays: B210, B220, B240 and B280.

These changes were made in multipath-tools some time ago.

Cc: NetApp RDAC team 
Cc: Hannes Reinecke 
Cc: Christophe Varoqui 
Cc: Martin K. Petersen 
Cc: James E.J. Bottomley 
Cc: SCSI ML 
Cc: device-mapper development 
Signed-off-by: Xose Vazquez Perez 
---
 drivers/scsi/scsi_dh.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_dh.c b/drivers/scsi/scsi_dh.c
index 2b785d09d5bd..b88b5dbbc444 100644
--- a/drivers/scsi/scsi_dh.c
+++ b/drivers/scsi/scsi_dh.c
@@ -56,10 +56,13 @@ static const struct scsi_dh_blist scsi_dh_blist[] = {
{"IBM", "1815", "rdac", },
{"IBM", "1818", "rdac", },
{"IBM", "3526", "rdac", },
+   {"IBM", "3542", "rdac", },
+   {"IBM", "3552", "rdac", },
{"SGI", "TP9",  "rdac", },
{"SGI", "IS",   "rdac", },
-   {"STK", "OPENstorage D280", "rdac", },
+   {"STK", "OPENstorage",  "rdac", },
{"STK", "FLEXLINE 380", "rdac", },
+   {"STK", "BladeCtlr","rdac", },
{"SUN", "CSM",  "rdac", },
{"SUN", "LCSM100",  "rdac", },
{"SUN", "STK6580_6780", "rdac", },
-- 
2.14.3
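
Replacing "OPENstorage D280" with "OPENstorage" covers the whole family because, as far as I can tell, scsi_dh_find_driver() compares each table entry against the device's INQUIRY vendor and model as a prefix (strncmp() bounded by strlen() of the table string). A userspace sketch of that lookup over a subset of the new table; the mock_* names are hypothetical, and product strings such as "OPENstorage D173" are assumptions based on the array names in the changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mock_dh_blist {
	const char *vendor;
	const char *model;
	const char *driver;
};

/* A subset of the rdac entries after this patch; NULL-terminated. */
static const struct mock_dh_blist mock_blist[] = {
	{"IBM", "3542", "rdac"},
	{"IBM", "3552", "rdac"},
	{"STK", "OPENstorage", "rdac"},
	{"STK", "BladeCtlr", "rdac"},
	{NULL, NULL, NULL},
};

/* Prefix lookup: only strlen(entry) characters are compared, so the
 * space padding of INQUIRY fields and any model suffix are ignored. */
static const char *mock_find_driver(const char *vendor, const char *model)
{
	const struct mock_dh_blist *b;

	assert(vendor != NULL && model != NULL);
	for (b = mock_blist; b->vendor; b++) {
		if (!strncmp(vendor, b->vendor, strlen(b->vendor)) &&
		    !strncmp(model, b->model, strlen(b->model)))
			return b->driver;
	}
	return NULL;
}
```

The trade-off of a short prefix is that it would also match any future STK model whose name starts with "OPENstorage".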



[PATCH try #2] scsi_devinfo: apply to HP-rebranded the same flags as Hitachi

2017-11-17 Thread Xose Vazquez Perez
627511e3e modified some Hitachi entries:

Four models, OPEN-/DF400/DF500/DISK-SUBSYSTEM, can handle REPORT_LUN,
and the BLIST_REPORTLUN2 flag needs to be set. And DF600 doesn't require
any flags because it returns ANSI 03h (SPC).
~~~

The same should also have been done for the HP counterparts.

Cc: Takahiro Yasui 
Cc: Mike Christie 
Cc: Matthias Rudolph 
Cc: Martin K. Petersen 
Cc: James E.J. Bottomley 
Cc: SCSI ML 
Signed-off-by: Xose Vazquez Perez 
---
 drivers/scsi/scsi_devinfo.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 2464569..44d8cfb 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -186,9 +186,8 @@ static struct {
{"HP", "C1557A", NULL, BLIST_FORCELUN},
{"HP", "C3323-300", "4269", BLIST_NOTQ},
{"HP", "C5713A", NULL, BLIST_NOREPORTLUN},
-   {"HP", "DF400", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-   {"HP", "DF500", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-   {"HP", "DF600", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
+   {"HP", "DF400", "*", BLIST_REPORTLUN2},
+   {"HP", "DF500", "*", BLIST_REPORTLUN2},
{"HP", "OP-C-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
{"HP", "3380-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
{"HP", "3390-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-- 
2.10.1



[PATCH] scsi_devinfo: apply to HP XP the same flags as Hitachi VSP

2017-11-17 Thread Xose Vazquez Perez
56f3d383f modified some Hitachi entries:

   HITACHI is always supporting VPD pages, even though it's claiming to
   support SCSI Revision 3 only.
~~~

The same should also have been done for the HP-rebranded arrays.

Cc: Hannes Reinecke 
Cc: Takahiro Yasui 
Cc: Matthias Rudolph 
Cc: Martin K. Petersen 
Cc: James E.J. Bottomley 
Cc: SCSI ML 
Signed-off-by: Xose Vazquez Perez 
---
 drivers/scsi/scsi_devinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 320d0a81358d..3ab4d337bf55 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -182,7 +182,7 @@ static struct {
{"HITACHI", "6586-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
{"HITACHI", "6588-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
{"HP", "A6189A", NULL, BLIST_SPARSELUN | BLIST_LARGELUN},   /* HP VA7400 */
-   {"HP", "OPEN-", "*", BLIST_REPORTLUN2}, /* HP XP Arrays */
+   {"HP", "OPEN-", "*", BLIST_REPORTLUN2 | BLIST_TRY_VPD_PAGES}, /* HP XP Arrays */
{"HP", "NetRAID-4M", NULL, BLIST_FORCELUN},
{"HP", "HSV100", NULL, BLIST_REPORTLUN2 | BLIST_NOSTARTONADD},
{"HP", "C1557A", NULL, BLIST_FORCELUN},
-- 
2.14.3



[Bug 197877] arcmsr fails to initialize Areca ARC-1110/ARC-1120 on some systems

2017-11-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=197877

--- Comment #4 from k...@sognnes.no ---
Building the latest Areca driver (1.40.00.02 from
http://www.areca.us/support/s_linux/driver/Source%20Code/arcmsr-1.40.00.02-source-only.dkms.tar.gz)
against kernel 3.17.8 introduces the bug, so it's definitely the driver rather
than some other issue with the kernel.

The latest driver that works on the affected systems seems to be
1.20.0X.15-130619
(ftp://ftp.areca.com.tw/RaidCards/AP_Drivers/Linux/DRIVER/SourceCode/arcmsr.1.20.0X.15-130619.zip),
which unfortunately doesn't compile against recent kernels.



Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal

2017-11-17 Thread Bart Van Assche
On Fri, 2017-11-17 at 18:10 +0100, Jack Wang wrote:
> It's a production server; I was too late to gather more information.
> The kernel is 4.4.36/4.4.50.
> Request mode for both multipath and SCSI, no multiqueue involvement.

Hello Jack,

I haven't seen any lockups with multipath + SRP and the legacy block
and SCSI layers in a long time. So either something went wrong while
backporting upstream fixes, an upstream fix has not been backported to kernel
4.4.36, or the lockup is caused by the SCSI LLD in your setup, e.g. a missing
->scsi_done() call. Since the srp-test software passes on your setup, the
latter may be the most likely.

Bart.

Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal

2017-11-17 Thread Jack Wang
2017-11-17 18:01 GMT+01:00 Bart Van Assche :
> On Fri, 2017-11-17 at 16:14 +0100, Jack Wang wrote:
>> I suspect it could be a missing queue run or lost I/O; IMHO it's unlikely
>> that the disk probing fix below fixes the bug.
>
> If the system is still in this state or if you can reproduce this issue,
> please collect and analyze the information under /sys/kernel/debug/block.
> That's the only way I know of to verify whether or not a lockup has been
> caused by a missing queue run. If the following command resolves the lockup
> then the root cause is definitely a missing queue run:

It's a production server; I was too late to gather more information.
The kernel is 4.4.36/4.4.50.
Request mode for both multipath and SCSI, no multiqueue involvement.

I found a thread from 2012 where you also reported this problem on 3.2:
https://lkml.org/lkml/2012/1/3/163

It might be a very old bug.

I will try harder to reproduce.

Thanks,
Jack




Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal

2017-11-17 Thread Bart Van Assche
On Fri, 2017-11-17 at 16:14 +0100, Jack Wang wrote:
> I suspect it could be a missing queue run or lost I/O; IMHO it's unlikely
> that the disk probing fix below fixes the bug.

If the system is still in this state or if you can reproduce this issue,
please collect and analyze the information under /sys/kernel/debug/block.
That's the only way I know of to verify whether or not a lockup has been
caused by a missing queue run. If the following command resolves the lockup
then the root cause is definitely a missing queue run:

for f in /sys/kernel/debug/block/*; do echo kick >$f/state; done

When analyzing queue lockups it's important to also have information about
requests that have been queued but that have not yet been started. I'm using
the following patch locally (will split this patch and submit it properly when
I have the time):

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 29e8451931ff..3c9d64793865 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -408,8 +408,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 {
const struct show_busy_params *params = data;
 
-   if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
-   test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+   if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx)
__blk_mq_debugfs_rq_show(params->m,
 list_entry_rq(&rq->queuelist));
 }
diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 01f08c03f2c1..41d1e3a01786 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -7,10 +7,14 @@
 void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
-   int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
-   char buf[80];
+   int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
+   int timeout_ms = jiffies_to_msecs(rq->timeout);
+   const u8 *const cdb = READ_ONCE(cmd->cmnd);
+   char buf[80] = "(?)";
 
-   __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
-   seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
-  cmd->retries, msecs / 1000, msecs % 1000);
+   if ((cmd->flags & SCMD_INITIALIZED) && cdb)
+   __scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
+   seq_printf(m, ", .cmd=%s, .retries=%d, .timeout=%d.%03d, allocated %d.%03d s ago",
+  buf, cmd->retries, timeout_ms / 1000, timeout_ms % 1000,
+  alloc_ms / 1000, alloc_ms % 1000);
 }
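
The new .timeout field is printed with the same "%d.%03d" split already used for the allocation age: rq->timeout is in jiffies, jiffies_to_msecs() converts it, and the milliseconds are shown as whole seconds plus a zero-padded fraction. A userspace sketch of just the formatting step; format_secs() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print a millisecond count as "<s>.<ms>", zero-padding the fraction,
 * mirroring the seq_printf() arguments in the patched scsi_show_rq(). */
static void format_secs(char *buf, size_t len, int ms)
{
	/* Negative input would print oddly, e.g. -1234 gives "-1.-234". */
	assert(ms >= 0);
	snprintf(buf, len, "%d.%03d", ms / 1000, ms % 1000);
}
```

With the usual 30 s sd command timeout, a request would therefore show up as .timeout=30.000.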

Re: [PATCH 0/2] sd: Fix a deadlock between event checking and device removal

2017-11-17 Thread Jack Wang
2017-11-14 18:33 GMT+01:00 Bart Van Assche :
> On Tue, 2017-11-14 at 18:01 +0100, Jack Wang wrote:
>> I suspect we ran into the same bug you were trying to fix in this patch
>> set. We're running v4.4.50.
>>
>> I was trying to reproduce it, but no luck yet. Do you still have your
>> reproducer?
>
> Hello Jack,
>
> I can reproduce this in about every fifth run of test one of the srp-test
> software, with the SRP initiator and target drivers of what will become
> kernel v4.15-rc1, by switching the ib_srpt driver from non-SRQ to SRQ
> mode while the initiator is logging in. I'm currently analyzing where in the
> block layer a queue run is missing. The patch below for the sd driver does
> not fix the root cause but seems to help.
>
> Bart.
>
Thanks Bart,

You're always kind and helpful.
I tried srp-test on my test machine, but still no luck.

In my case, we had a storage failure, which led the SCSI error handler to
offline both devices, so both paths went down and raid1 failed one leg (the
dm device).
During incident recovery, when we tried to delete the broken SCSI device,
there were processes in D state:

[Mon Nov  6 09:55:19 2017] sysrq: SysRq : Show Blocked State
[Mon Nov  6 09:55:19 2017]   task                        PC stack   pid father
[Mon Nov  6 09:55:19 2017] kworker/40:2    D 883004327a98     0 65322      2 0x
[Mon Nov  6 09:55:19 2017] Workqueue: events_freezable_power_ disk_events_workfn
[Mon Nov  6 09:55:19 2017]  883004327a98 881804973400 882fe6e4b400 883004327ad0
[Mon Nov  6 09:55:19 2017]  0282 883004328000 00011c546eb0 883004327ad0
[Mon Nov  6 09:55:19 2017]  883007c0cd80 0028 883004327ab0 81800540
[Mon Nov  6 09:55:19 2017] Call Trace:
[Mon Nov  6 09:55:19 2017]  [] schedule+0x30/0x80
[Mon Nov  6 09:55:19 2017]  [] schedule_timeout+0x14a/0x290
[Mon Nov  6 09:55:19 2017]  [] ? trace_raw_output_itimer_expire+0x80/0x80
[Mon Nov  6 09:55:19 2017]  [] io_schedule_timeout+0xb6/0x130
[Mon Nov  6 09:55:19 2017]  [] wait_for_completion_io_timeout+0x9b/0x110
[Mon Nov  6 09:55:19 2017]  [] ? wake_up_q+0x70/0x70
[Mon Nov  6 09:55:19 2017]  [] blk_execute_rq+0x92/0x110
[Mon Nov  6 09:55:19 2017]  [] ? wait_woken+0x80/0x80
[Mon Nov  6 09:55:19 2017]  [] ? blk_get_request+0x7d/0xe0
[Mon Nov  6 09:55:19 2017]  [] scsi_execute+0x143/0x5f0 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [] scsi_execute_req_flags+0x94/0x100 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [] scsi_test_unit_ready+0x7b/0x2a0 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [] 0xa0644971 sd_check_events
[Mon Nov  6 09:55:19 2017]  [] disk_check_events+0x54/0x130
[Mon Nov  6 09:55:19 2017]  [] disk_events_workfn+0x11/0x20
[Mon Nov  6 09:55:19 2017]  [] process_one_work+0x148/0x410
[Mon Nov  6 09:55:19 2017]  [] worker_thread+0x61/0x450
[Mon Nov  6 09:55:19 2017]  [] ? rescuer_thread+0x2e0/0x2e0
[Mon Nov  6 09:55:19 2017]  [] ? rescuer_thread+0x2e0/0x2e0
[Mon Nov  6 09:55:19 2017]  [] kthread+0xd6/0xf0
[Mon Nov  6 09:55:19 2017]  [] ? kthread_park+0x50/0x50
[Mon Nov  6 09:55:19 2017]  [] ret_from_fork+0x3f/0x70
[Mon Nov  6 09:55:19 2017]  [] ? kthread_park+0x50/0x50
[Mon Nov  6 09:55:19 2017] repair_md_dev   D 8807a258fa28     0 18979  18978 0x
[Mon Nov  6 09:55:19 2017]  8807a258fa28 88180497 8807f11eb400 882807d146e0
[Mon Nov  6 09:55:19 2017]  882807d146e0 8807a259 7fff 8807a258fb70
[Mon Nov  6 09:55:19 2017]  8807f11eb400 8807f11eb400 8807a258fa40 81800540
[Mon Nov  6 09:55:19 2017] Call Trace:
[Mon Nov  6 09:55:19 2017]  [] schedule+0x30/0x80
[Mon Nov  6 09:55:19 2017]  [] schedule_timeout+0x220/0x290
[Mon Nov  6 09:55:19 2017]  [] ? physflat_send_IPI_mask+0x9/0x10
[Mon Nov  6 09:55:19 2017]  [] ? try_to_wake_up+0x4f/0x3c0
[Mon Nov  6 09:55:19 2017]  [] wait_for_completion+0x9a/0xf0
[Mon Nov  6 09:55:19 2017]  [] ? wake_up_q+0x70/0x70
[Mon Nov  6 09:55:19 2017]  [] flush_work+0xe5/0x140
[Mon Nov  6 09:55:19 2017]  [] ? destroy_worker+0x90/0x90
[Mon Nov  6 09:55:19 2017]  [] __cancel_work_timer+0x97/0x1b0
[Mon Nov  6 09:55:19 2017]  [] ? kernfs_put+0xf3/0x1a0
[Mon Nov  6 09:55:19 2017]  [] cancel_delayed_work_sync+0xe/0x10
[Mon Nov  6 09:55:19 2017]  [] disk_block_events+0x8d/0x90
[Mon Nov  6 09:55:19 2017]  [] del_gendisk+0x35/0x230
[Mon Nov  6 09:55:19 2017]  [] 0xa0643b79
[Mon Nov  6 09:55:19 2017]  [] __device_release_driver+0x91/0x130
[Mon Nov  6 09:55:19 2017]  [] device_release_driver+0x27/0x40
[Mon Nov  6 09:55:19 2017]  [] bus_remove_device+0xfc/0x170
[Mon Nov  6 09:55:19 2017]  [] device_del+0x123/0x240
[Mon Nov  6 09:55:19 2017]  [] ? sysfs_remove_file_ns+0x10/0x20
[Mon Nov  6 09:55:19 2017]  [] __scsi_remove_device+0xc9/0xd0 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [] scsi_remove_device+0x2a/0x80 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [] scsi_remove_device+0x6b/0x80 [scsi_mod]
[Mon Nov  6 09:55:19 2017]  [] dev_attr_store+0x13/0x20
[Mon Nov  6 09:55:19 2017]  [] sysfs_kf_write+0x32/0x

Re: [PATCH] scsi: csiostor: remove unneeded DRIVER_LICENSE #define

2017-11-17 Thread Philippe Ombredanne
On Fri, Nov 17, 2017 at 3:21 PM, Greg Kroah-Hartman wrote:
> There is no need to #define the license of the driver, just put it in
> the MODULE_LICENSE() line directly as a text string.
>
> This allows tools that check that the module license matches the source
> code license to work properly, as there is no need to unwind the
> unneeded dereference, especially when the string is defined in a .h file
> far away from the .c file it is used in.
>
> Cc: "James E.J. Bottomley" 
> Cc: "Martin K. Petersen" 
> Cc: Varun Prakash 
> Reported-by: Philippe Ombredanne 
> Signed-off-by: Greg Kroah-Hartman 


Reviewed-by: Philippe Ombredanne 
-- 
Cordially
Philippe Ombredanne


[PATCH] scsi: csiostor: remove unneeded DRIVER_LICENSE #define

2017-11-17 Thread Greg Kroah-Hartman
There is no need to #define the license of the driver, just put it in
the MODULE_LICENSE() line directly as a text string.

This allows tools that check that the module license matches the source
code license to work properly, as there is no need to unwind the
unneeded dereference, especially when the string is defined in a .h file
far away from the .c file it is used in.

Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: Varun Prakash 
Reported-by: Philippe Ombredanne 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/scsi/csiostor/csio_init.c | 2 +-
 drivers/scsi/csiostor/csio_init.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/csiostor/csio_init.c b/drivers/scsi/csiostor/csio_init.c
index 28a9c7d706cb..abd71379bce3 100644
--- a/drivers/scsi/csiostor/csio_init.c
+++ b/drivers/scsi/csiostor/csio_init.c
@@ -1255,7 +1255,7 @@ module_init(csio_init);
 module_exit(csio_exit);
 MODULE_AUTHOR(CSIO_DRV_AUTHOR);
 MODULE_DESCRIPTION(CSIO_DRV_DESC);
-MODULE_LICENSE(CSIO_DRV_LICENSE);
+MODULE_LICENSE("Dual BSD/GPL");
 MODULE_DEVICE_TABLE(pci, csio_pci_tbl);
 MODULE_VERSION(CSIO_DRV_VERSION);
 MODULE_FIRMWARE(FW_FNAME_T5);
diff --git a/drivers/scsi/csiostor/csio_init.h b/drivers/scsi/csiostor/csio_init.h
index 96b31e5af91e..20244254325a 100644
--- a/drivers/scsi/csiostor/csio_init.h
+++ b/drivers/scsi/csiostor/csio_init.h
@@ -48,7 +48,6 @@
 #include "csio_hw.h"
 
 #define CSIO_DRV_AUTHOR    "Chelsio Communications"
-#define CSIO_DRV_LICENSE   "Dual BSD/GPL"
 #define CSIO_DRV_DESC  "Chelsio FCoE driver"
 #define CSIO_DRV_VERSION   "1.0.0-ko"
 
-- 
2.15.0
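To illustrate the commit message's argument, here is a sketch of the kind of single-file check such a license tool might perform. `extract_module_license()` is hypothetical and deliberately naive (it inspects one line of text, not a preprocessed translation unit): it can read the license when `MODULE_LICENSE()` takes a string literal, but has to give up on a macro such as `CSIO_DRV_LICENSE`, whose value lives in a header elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, naive checker: find MODULE_LICENSE( on a line and copy a
 * string-literal argument into out. Returns false if the argument is not a
 * literal (e.g. a #define from another file) or the line has no such call.
 */
static bool extract_module_license(const char *line, char *out, size_t len)
{
    static const char prefix[] = "MODULE_LICENSE(";
    const char *p, *start, *end;
    size_t n;

    assert(line != NULL && out != NULL);
    p = strstr(line, prefix);
    if (!p || len == 0)
        return false;
    p += sizeof(prefix) - 1;
    if (*p != '"')          /* a macro name: its value would have to be resolved elsewhere */
        return false;
    start = p + 1;
    end = strchr(start, '"');
    if (!end)
        return false;
    n = (size_t)(end - start);
    if (n >= len)           /* truncate to fit, keeping room for the NUL */
        n = len - 1;
    memcpy(out, start, n);
    out[n] = '\0';
    return true;
}
```

After the patch, the csio_init.c line `MODULE_LICENSE("Dual BSD/GPL");` yields `Dual BSD/GPL`, whereas the old `MODULE_LICENSE(CSIO_DRV_LICENSE);` cannot be resolved from that file alone.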



Re: [REGRESSION][v4.13.y][v4.14.y] scsi: libsas: allow async aborts

2017-11-17 Thread Hannes Reinecke
On 11/16/2017 11:08 PM, Joseph Salisbury wrote:
> Hi Christoph,
> 
> A kernel bug report was opened against Ubuntu [0].  After a kernel
> bisect, it was found that reverting the following commit resolved this bug:
> 
> 909657615d9b ("scsi: libsas: allow async aborts")
> 
>  
> The regression was introduced as of v4.12-rc1, and it still exists in
> 4.14 mainline.
> 
> I was hoping to get your feedback, since you are the patch author.  Do
> you think gathering any additional data will help diagnose this issue,
> or would it be best to submit a revert request?
> 
I'll be checking what's going on there.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke              Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: mvsas sata drives with high ioerr_cnt and long stalls

2017-11-17 Thread Pasi Kärkkäinen
On Thu, Nov 16, 2017 at 08:55:07PM -0500, Larkin Lowrey wrote:
> It just got worse: under heavy sequential write load, all of the drives in
> the array were thrown out of the array and I got a stack trace.
> 
> I know I've been able to stress this array without issue but the last time I
> did that was several kernels ago.
> 

IIRC, some people reported earlier that their cards were overheating, and that
adding more cooling, either to the case or as a fan on the controllers, helped.
I don't know if that might be your issue as well.

Other than that, my general feeling is that the mvsas driver is quite buggy, and
unfortunately lots of people have had issues with it.


-- Pasi

> --Larkin
> 
> [ 5723.912580] ata14.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [ 5723.920002] ata14.00: failed command: SMART
> [ 5723.924428] ata14.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 21 pio 512 in
> [ 5723.924428]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [ 5723.939938] ata14.00: status: { DRDY }
> [ 5723.943975] ata14: hard resetting link
> [ 5731.517360] ata14.00: qc timeout (cmd 0xec)
> [ 5731.521889] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [ 5731.528411] ata14.00: revalidation failed (errno=-5)
> [ 5731.533696] ata14: hard resetting link
> [ 5739.197688] ata14.00: qc timeout (cmd 0x27)
> [ 5739.202312] ata14.00: failed to read native max address (err_mask=0x4)
> [ 5739.209165] ata14.00: HPA support seems broken, skipping HPA handling
> [ 5739.215921] ata14.00: revalidation failed (errno=-5)
> [ 5739.221185] ata14: hard resetting link
> [ 5757.118460] ata14.00: qc timeout (cmd 0xef)
> [ 5757.122940] ata14.00: failed to set xfermode (err_mask=0x4)
> [ 5757.128864] ata14.00: disabled
> [ 5757.132218] ata14: hard resetting link
> [ 5759.510588] ata14: EH complete
> [ 5759.514162] sd 8:0:7:0: [sdm] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [ 5759.514267] sd 8:0:7:0: [sdm] Read Capacity(16) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [ 5759.514271] sd 8:0:7:0: [sdm] Sense not available.
> [ 5759.514313] sd 8:0:7:0: [sdm] Read Capacity(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [ 5759.514316] sd 8:0:7:0: [sdm] Sense not available.
> [ 5759.514349] sd 8:0:7:0: [sdm] 0 512-byte logical blocks: (0 B/0 B)
> [ 5759.514352] sd 8:0:7:0: [sdm] 4096-byte physical blocks
> [ 5759.514440] sdm: detected capacity change from 4000787030016 to 0
> [ 5759.572801] sd 8:0:7:0: [sdm] tag#2 CDB: ATA command pass through(12)/Blank a1 06 20 00 00 00 00 00 00 e5 00 00
> [ 5766.988546] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [ 5766.995950] ata8.00: failed command: SMART
> [ 5767.000362] ata8.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 21 pio 512 in
> [ 5767.000362]  res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
> [ 5767.015812] ata8.00: status: { DRDY }
> [ 5767.019730] ata8: hard resetting link
> [ 5769.431098] ata8.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
> [ 5769.439017] ata8.00: revalidation failed (errno=-5)
> [ 5774.527196] ata8: hard resetting link
> [ 5774.687313] [ cut here ]
> [ 5774.692252] WARNING: CPU: 4 PID: 11543 at drivers/ata/libata-core.c:5137 __ata_qc_complete+0x103/0x150
> [ 5774.702108] Modules linked in: fuse binfmt_misc vhost_net vhost macvtap macvlan tap xt_CHECKSUM iptable_mangle tun xt_nat veth ebtable_filter ebtables 8021q garp mrp xfs ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 xt_addrtype nf_nat br_netfilter bridge stp llc bonding cfg80211 rfkill nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables jc42 joydev amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_si ipmi_devintf ipmi_msghandler sp5100_tco i2c_piix4 k10temp fam15h_power tpm_tis tpm_tis_core acpi_cpufreq tpm shpchp dm_thin_pool dm_persistent_data dm_bio_prison bcache raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor async_tx
> [ 5774.776476]  raid0 btrfs xor raid6_pq raid10 drm_kms_helper igb ttm mvsas uas ptp drm crc32c_intel serio_raw mpt3sas libsas usb_storage pps_core be2net raid_class dca scsi_transport_sas i2c_algo_bit
> [ 5774.794981] CPU: 4 PID: 11543 Comm: kworker/u65:5 Not tainted 4.13.12-100.fc25.x86_64 #1
> [ 5774.803581] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0a   07/26/2013
> [ 5774.812273] Workqueue: events_unbound async_run_entry_fn
> [ 5774.817896] task: 90e7ca602640 task.stack: b070caf5
> [ 5774.824105] RIP: 0010:__ata_qc_complete+0x103/0x150
> [ 5774.829250] RSP: 0018:b070caf539a8 EFLAGS: 00010046
> [ 5774.834790] RAX: 0