Re: scsi: memory leak in sg_start_req

2018-01-10 Thread Douglas Gilbert

On 2018-01-09 11:05 AM, Dmitry Vyukov wrote:

Hello,

syzkaller has found the following memory leak:

unreferenced object 0x88004c19 (size 8328):
   comm "syz-executor", pid 4627, jiffies 4294749150 (age 45.507s)
   hex dump (first 32 bytes):
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 20 00 00 00 22 01 00 00 00 00 00 00 04 00 00 00   ..."...
   backtrace:
 [<5955b5a9>] kmalloc_order+0x59/0x80 mm/slab_common.c:1124
 [<43ae006e>] kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1133
 [] kmalloc_large include/linux/slab.h:433 [inline]
 [] __kmalloc+0x2c4/0x340 mm/slub.c:3751
 [] kmalloc include/linux/slab.h:504 [inline]
 [] bio_alloc_bioset+0x4d5/0x7e0 block/bio.c:450
 [] bio_kmalloc include/linux/bio.h:410 [inline]
 [] bio_copy_user_iov+0x2be/0xcb0 block/bio.c:1226
 [<1d0b79ed>] __blk_rq_map_user_iov block/blk-map.c:67 [inline]
 [<1d0b79ed>] blk_rq_map_user_iov+0x2b6/0x7d0 block/blk-map.c:136
 [<4200a869>] blk_rq_map_user+0x11e/0x170 block/blk-map.c:166
 [<8f21739e>] sg_start_req drivers/scsi/sg.c:1794 [inline]
 [<8f21739e>] sg_common_write.isra.16+0x14df/0x1ed0
drivers/scsi/sg.c:777
 [<093f61e3>] sg_write+0x8a7/0xd7b drivers/scsi/sg.c:677
 [] __vfs_write+0x10d/0x8f0 fs/read_write.c:480
 [<0638f16f>] vfs_write+0x1fd/0x570 fs/read_write.c:544
 [<6a7e6867>] SYSC_write fs/read_write.c:589 [inline]
 [<6a7e6867>] SyS_write+0xfa/0x250 fs/read_write.c:581

can be reproduced with the following program:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 

int main()
{
   int fd = open("/dev/sg1", O_RDWR);
   const char *data =
  "\xb6\x3d\xb8\x5e\x1e\x8d\x22\x00\x00\x00\x00\x00\x00\x08\xaf\xd6\x1d"
  "\xcc\x43\x6a\xed\x5e\xd2\xbc\x70\x18\xce\xbc\x9b\x97\xae\x21\x91\x4d"
  "\x87\x2c\x67\x8c\xe2\x2c\x9b\x16\x0e\x96\xaa\x1f\xae\x1a";
   write(fd, data, 0x30);
   return 0;
}

if executed in a loop, memory consumption grows infinitely.

on upstream commit b2cd1df66037e7c4697c7e40496bf7e4a5e16a2d


The seemingly random data that program is sending is asking for
a buffer of 2,264,314 bytes which the sg driver procures and waits
for the caller to either issue a read() or close() the file
or shutdown the program. The test program does none of those
expected operations, it simply asks for the same resources again.

In my version of your test code (attached), that happens 1,021 times
at which point the file handles in that process are exhausted and
all subsequent open()s fail with EBADF (as do the write()s). The
output from my program was this on one run:

# ./sg_syzk_grow
First errno=9 [Bad file descriptor] index=1021
done_count=5, err_count=48979, last_errno=9 [Bad file descriptor]

# lsscsi -gs
[0:0:0:0]  disk  Linux  scsi_debug  0186  /dev/sda  /dev/sg0  2.14GB

Monitoring that program with 'free' from another terminal I see
about 2.5 GBytes of ram "swallowed" almost immediately when the test
program runs. When the program exits (about 50 seconds later) as far
as I can see all that ram is given back.


If you used the same program and wrote to a regular file rather than
a sg device, then that program would eventually fill any file system,
at the rate of 48 bytes per iteration (given enough file descriptor
resources). The sg driver, using its original 1994 interface,
deprecated for around 18 years, just gets a system to resource
exhaustion quicker.

Doug Gilbert




// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static const char * sg_dev = "/dev/sg0";

int main()
{
  const char *data =
 "\xb6\x3d\xb8\x5e\x1e\x8d\x22\x00\x00\x00\x00\x00\x00\x08\xaf\xd6\x1d"
 "\xcc\x43\x6a\xed\x5e\xd2\xbc\x70\x18\xce\xbc\x9b\x97\xae\x21\x91\x4d"
 "\x87\x2c\x67\x8c\xe2\x2c\x9b\x16\x0e\x96\xaa\x1f\xae\x1a";
int k, res, first_errno, prev_errno, first_err_index;
int err_count = 0;
int done_count = 0;
bool got_1st_errno = false;

for (k = 0; k < 1; ++k) {
int fd = open(sg_dev, O_RDWR);

  	res = write(fd, data, 0x30);
	if (res < 0) {
	if (! got_1st_errno) {
		got_1st_errno = true;
		first_errno = errno;
		first_err_index = done_count + k;
		printf("First errno=%d [%s] index=%d\n", first_errno,
		   strerror(first_errno), first_err_index);
	}
	++err_count;
	}
}
done_count += k;
sleep(10);
for (k = 0; k < 1; ++k) {
int fd = open(sg_dev, O_RDWR);

  	res = write(fd, data, 0x30);
	if (res < 0) {
	if (! got_1st_errno) {
		got_1st_errno = true;
		first_errno = errno;
		printf("First errno=%d [%s] index=%d\n", first_errno,
		   

Re: [PATCH 00/14] megaraid_sas: driver updates

2018-01-10 Thread Martin K. Petersen

Shivasharan,

> Shivasharan S (14):
>   megaraid_sas: zero out IOC INIT and stream detection memory
>   megaraid_sas: memset IOC INIT frame using correct size
>   megaraid_sas: Return the DCMD status from megasas_get_seq_num
>   megaraid_sas: Reset ldio_outstanding in megasas_resume
>   megaraid_sas: Error handling for invalid ldcount provided by firmware
> in RAID map
>   megaraid_sas: unload flag should be set after scsi_remove_host is
> called
>   megaraid_sas: Avoid firing DCMDs while OCR is in progress
>   megaraid_sas: Use megasas_wait_for_adapter_operational to detect
> controller state in IOCTL path
>   megaraid_sas: Update LD map after populating drv_map driver map copy
>   megaraid_sas: Selectively apply stream detection based on IO type
>   megaraid_sas: Expose fw_cmds_outstanding through sysfs
>   megaraid_sas: re-work DCMD refire code
>   megaraid_sas: NVME passthru command support
>   megaraid_sas: driver version upgrade

Applied to 1-12,14 to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] Change type of third __scsi_queue_insert() argument from int into bool

2018-01-10 Thread Martin K. Petersen

Bart,

> This patch does not change any functionality but makes the SCSI core
> source code slightly easier to read.

Applied to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/2] scsi: aacraid: Get correct lun count

2018-01-10 Thread Martin K. Petersen

Raghava,

> The correct lun count needs to be divided by 24, missed it in the previous
> patch set.

Peculiar number.

Applied to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 2/2] scsi: aacraid: Delay for rescan worker needs to be 10 seconds

2018-01-10 Thread Martin K. Petersen

Raghava,

> The delay for the rescan worker needs to 10 seconds, missed the HZ in
> there.

Applied to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCHv2] scsi_dh_alua: skip RTPG for devices only supporting active/optimized

2018-01-10 Thread Martin K. Petersen

Hannes,

> For hardware only supporting active/optimized there's no point in
> ever re-issuing RTPG as the only new state we can possibly read is
> active/optimized.
> This avoid spurious errors during path failover on such arrays.

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[no subject]

2018-01-10 Thread Active lender@
Schöne Grüße,
  Ich bin Frau Adrian Irene von Active Lenders Darlehensfirma bekannt 
als Active Lending Loan®. Wir bieten alle Arten von Darlehen bei 1% Zinssatz. 
Wenn Sie ein Darlehen benötigen, kontaktieren Sie uns bitte mit den folgenden 
Informationen.

Bitte füllen Sie das untenstehende Formular aus und senden Sie es so schnell 
wie möglich zurück.

Benötigte Menge: .
Laufzeit des Darlehens: 
Der Grund für das Darlehen: .

Wir freuen uns darauf, Ihnen zu helfen. Kontaktieren Sie uns per E-Mail: 
cont...@activeslendinggroup.com
Deine
  Frau Adrian Irene


Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

2018-01-10 Thread Ming Lei
On Wed, Jan 10, 2018 at 10:18:16AM -0800, Bart Van Assche wrote:
> Several SCSI transport and LLD drivers surround code that does not
> tolerate concurrent calls of .queuecommand() with scsi_target_block() /
> scsi_target_unblock(). These last two functions use
> blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
> queues to prevent concurrent .queuecommand() calls. However, that is

Actually blk_mq_quiesce_queue() is supposed to disable and drain dispatch,
not for prevent concurrent .queuecommand() calls.

> not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().

Given it is error handling, do we need to prevent the .queuecommand() call
in scsi_send_eh_cmnd()? Could you share us what the actual issue
observed is from user view?

> Hence surround the .queuecommand() call from the SCSI error handler with
> blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().
> 
> Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
> code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
> option since scsi_send_eh_cmnd() can be called if all requests are
> allocated and if no requests will make progress without aborting any
> of these requests.

If we need to prevent the .queuecommand() in scsi_send_eh_cmnd(), what
do you think of the approach by requeuing the EH command via
scsi_mq_requeue_cmd()/scsi_requeue_cmd()? And the EH request will be
dispatched finally when the queue becomes unquiesced or the STOPPED
is cleared.

-- 
Ming


[PATCH 22/38] scsi: Define usercopy region in scsi_sense_cache slab cache

2018-01-10 Thread Kees Cook
From: David Windsor 

SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
contained in the scsi_sense_cache slab cache, need to be copied to/from
userspace.

cache object allocation:
drivers/scsi/scsi_lib.c:
scsi_select_sense_cache(...):
return ... ? scsi_sense_isadma_cache : scsi_sense_cache

scsi_alloc_sense_buffer(...):
return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);

scsi_init_request(...):
...
cmd->sense_buffer = scsi_alloc_sense_buffer(...);
...
cmd->req.sense = cmd->sense_buffer

example usage trace:

block/scsi_ioctl.c:
(inline from sg_io)
blk_complete_sghdr_rq(...):
struct scsi_request *req = scsi_req(rq);
...
copy_to_user(..., req->sense, len)

scsi_cmd_ioctl(...):
sg_io(...);

In support of usercopy hardening, this patch defines a region in
the scsi_sense_cache slab cache in which userspace copy operations
are allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

Signed-off-by: David Windsor 
[kees: adjust commit log, provide usage trace]
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook 
---
 drivers/scsi/scsi_lib.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1cbc497e00bd..164d062c4d94 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -79,14 +79,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
if (shost->unchecked_isa_dma) {
scsi_sense_isadma_cache =
kmem_cache_create("scsi_sense_cache(DMA)",
-   SCSI_SENSE_BUFFERSIZE, 0,
-   SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
+   SCSI_SENSE_BUFFERSIZE, 0,
+   SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
if (!scsi_sense_isadma_cache)
ret = -ENOMEM;
} else {
scsi_sense_cache =
-   kmem_cache_create("scsi_sense_cache",
-   SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN, NULL);
+   kmem_cache_create_usercopy("scsi_sense_cache",
+   SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN,
+   0, SCSI_SENSE_BUFFERSIZE, NULL);
if (!scsi_sense_cache)
ret = -ENOMEM;
}
-- 
2.7.4



Re: HP ProLiant DL360p Gen8 hangs with Linux 4.13+.

2018-01-10 Thread Vinson Lee
On Fri, Jan 5, 2018 at 8:32 AM, Bart Van Assche  wrote:
> On Thu, 2018-01-04 at 14:32 -0800, Vinson Lee wrote:
>> HP ProLiant DL360p Gen8 with Smart Array P420i boots to the login
>> prompt and hangs with Linux 4.13 or later. I cannot log in on console
>> or SSH into the machine. Linux 4.12 and older boot fine.
>>
>> I see these messages on the console.
>>
>> [  242.843206] INFO: task scsi_eh_2:465 blocked for more than 120 seconds.
>> [  242.877835]   Not tainted 4.15.0-041500rc6-generic #201712312330
>
> It seems like something got stuck in the block layer. The traditional way to
> debug this is to analyze the information that is available under
> /sys/kernel/debug/block. However, since login is not possible we can't use
> that approach. Would it be possible for you to check whether this has been
> resolved in kernel v4.15-rc6, and if not, bisect this?
>
> Thanks,
>
> Bart.

Hi.

The machine still hangs with Linux 4.15-rc6.

I did a bisect. The hang is introduced with Linux 4.13-rc1 commit
c5cb83bb337c25caae995d992d1cdf9b317f83de "genirq/cpuhotplug: Handle
managed IRQs on CPU hotplug".

There is a startup script that disables hyperthreading by offlining
sibling CPUs.

for CPU in $(cut -s -d, -f2
$SYS_PATH/cpu*/topology/thread_siblings_list | sort -un); do
echo 0 > /sys/devices/system/cpu/cpu$CPU/online
done

If the above script is not run, the machine does not hang with Linux 4.13.

Cheers,
Vinson


[PATCH] Change type of third __scsi_queue_insert() argument from int into bool

2018-01-10 Thread Bart Van Assche
This patch does not change any functionality but makes the SCSI core
source code slightly easier to read.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/scsi/scsi_lib.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1c4d885a1d9e..dad450ccd57b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -164,7 +164,7 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
  * for a requeue after completion, which should only occur in this
  * file.
  */
-static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
+static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy)
 {
struct scsi_device *device = cmd->device;
struct request_queue *q = device->request_queue;
@@ -220,7 +220,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int 
reason, int unbusy)
  */
 void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 {
-   __scsi_queue_insert(cmd, reason, 1);
+   __scsi_queue_insert(cmd, reason, true);
 }
 
 
@@ -1015,11 +1015,11 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
break;
case ACTION_RETRY:
/* Retry the same command immediately */
-   __scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY, 0);
+   __scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY, false);
break;
case ACTION_DELAYED_RETRY:
/* Retry the same command after a delay */
-   __scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY, 0);
+   __scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY, false);
break;
}
 }
-- 
2.15.1



RE: [PATCH 13/14] megaraid_sas: NVME passthru command support

2018-01-10 Thread Sathya Prakash Veerichetty
Bart et al,
Broadcom's Tri-mode HBAs and MegaRAID controllers are capable of
connecting with SAS, SATA, NVMe drives,  SAS expanders and PCIe switches
(with NVMe drives connected behind that) and are capable of creating RAID
volumes on top of similar family of drives.  In the case of RAID
controllers, all of those drives and RAID volumes are exposed to the OS as
generic SCSI devices and in the case of HBA only for SAS and SATA the
topology is exposed to OS through SAS transport layer and NVMe drives are
exposed as generic SCSI devices.  The SCSI CDB to specific packet (SATA
frames, SSP frames or NVMe) translation occurs in the hardware/firmware.
For the OS driver, the interface to interact is common across all the type
of devices and it is MPI SCSI IO Request.

The NVMe passthru support added in this patch is only for management
purpose and will let Broadcom specific management applications to send
some direct NVMe commands to the hardware/firmware solely for management
purpose.  For normal READ/WRITE I/O the preferred path is to issue SCSI
command to our hardware/firmware and let it translate to the NVMe.

We have many architectural constraints to directly expose NVMe drives to
NVMe subsystem for normal I/O usage and management usage and hence we
prefer not to go down the path.

This patch is just addition of new feature for our management applications
(which are common across many OSes) to access a specific type of MPI
command to manage NVMe drives connected behind our HBAs (which are
non-standard) in a vendor specific way, hence we think this patch is valid
to be accepted to megaraid driver.  Please let us know if more details are
required on the tri-mode controllers.

Thanks
Sathya


-Original Message-
From: Linux-nvme [mailto:linux-nvme-boun...@lists.infradead.org] On Behalf
Of Douglas Gilbert
Sent: Wednesday, January 10, 2018 1:06 PM
To: Bart Van Assche; h...@infradead.org; kashyap.de...@broadcom.com;
shivasharan.srikanteshw...@broadcom.com
Cc: sumit.sax...@broadcom.com; peter.riv...@broadcom.com;
linux-n...@lists.infradead.org; linux-scsi@vger.kernel.org
Subject: Re: [PATCH 13/14] megaraid_sas: NVME passthru command support

On 2018-01-10 11:22 AM, Bart Van Assche wrote:
> On Tue, 2018-01-09 at 22:07 +0530, Kashyap Desai wrote:
>> Overall NVME support behind MR controller is really a SCSI device. On
>> top of that, for MegaRaid, NVME device can be part of Virtual Disk
>> and those drive will not be exposed to the driver. User application
>> may like to talk to hidden NVME devices (part of VDs). This patch
>> will extend the existing interface for megaraid product in the same
>> way it is currently supported for other protocols like SMP, SATA
pass-through.
>
> It seems to me like there is a contradiction in the above paragraph:
> if some NVMe devices are not exposed to the driver, how can a user
> space application ever send NVMe commands to it?

I think that he meant that the NVMe physical devices (e.g. SSDs) are not
exposed to the upper layers (e.g. the SCSI mid-layer and above). The SCSI
subsystem has a no_uld_attach device flag that lets a LLD attach physical
devices but the sd driver and hence the block layer do not "see" them. The
idea is that maintenance programs like smartmontools can use them via the
bsg or sg drivers. The Megaraid driver code does not seem to use
no_uld_attach. Does the NVMe subsystem have similar "generic" (i.e.
non-block) devices accessible to the user space?

> Anyway, has it been considered to implement the NVMe support as an
> NVMe transport driver? The upstream kernel already supports NVMe
> communication with NVMe PCI devices, NVMe over RDMA and NVMe over FC.
> If communication to the NVMe devices behind the MegaRaid controller
> would be implemented as an NVMe transport driver then all
> functionality of the Linux NVMe driver could be reused, including its
sysfs entries.

Broadcom already sell "SAS" HBAs that have "tri-mode" phys. That is a phy
that can connect to a SAS device (e.g. a SAS expander), a SATA device or a
NVMe device. Now if I was Broadcom designing a 24 Gbps SAS-4 next
generation expander I would be thinking of using those tri-mode phys on
it. But then there is a problem, SAS currently supports 3 protocols: SSP
(for SCSI storage and enclosure management (SES)), STP (for SATA storage )
and SMP (for expander management). The problem is how those NVMe commands,
status and data cross the wire between the OS HBA (or MegaRaid type
controller) and an expander. Solving that might need some lateral
thinking.

On one hand the NVM Express folks seem to have shelved the idea of a SCSI
to NVMe Translation Layer (SNTL) and have not updated an old white paper
on the subject. Currently there is no SNTL on Linux (there was but it was
removed) or FreeBSD but there is one on Windows.

On the other hand I'm informed that recently the same body accepted the
SES-3 standard pretty much as-is. That is done with the addition of SES
Send and SES Receive commands to 

[PATCH 0/1] scsi_debug: delay stress fix

2018-01-10 Thread Douglas Gilbert
Bart Van Assche reported that when the scsi_debug driver was being
stress tested with fio, changing the delay paremeter via sysfs
caused a cascade of oops-es. The fix presented reads the driver
wide delay values (jiffies or nanoseconds) once and remembers in
the sdebug_defer object which defer method is used and which 
method has been initialized. This simplifies handling when command
aborts occur.

This causes a minor changes in semantic: a SCSI command "in
flight" is no longer impacted by changing the delay option
after it has been scheduled (i.e. while it is waiting for a work
queue or a hr timer to exhaust).

Douglas Gilbert (1):
  scsi_debug: delay fix

 drivers/scsi/scsi_debug.c | 72 ++-
 1 file changed, 46 insertions(+), 26 deletions(-)

-- 
2.14.1



[PATCH 1/1] scsi_debug: delay stress fix

2018-01-10 Thread Douglas Gilbert
Introduce a state enum into sdebug_defer objects to indicate which,
if any, defer method has been used with the associated command.
Also add 2 bools to indicate which of the defer methods has been
initialized. Those objects are re-used but the initialization
only needs to be done once. This simplifies command cancellation
handling.

Now the delay associated with a deferred response of a command
cannot be changed (once started) by changing the delay (and ndelay)
parameters in sysfs. Command aborts and driver shutdown are still
honoured immediately when received.

Signed-off-by: Douglas Gilbert 
---
 drivers/scsi/scsi_debug.c | 72 ++-
 1 file changed, 46 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index e4f037f0f38b..f5d0098b5a6a 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -263,12 +263,18 @@ struct sdebug_host_info {
 #define to_sdebug_host(d)  \
container_of(d, struct sdebug_host_info, dev)
 
+enum sdeb_defer_type {SDEB_DEFER_NONE = 0, SDEB_DEFER_HRT = 1,
+ SDEB_DEFER_WQ = 2};
+
 struct sdebug_defer {
struct hrtimer hrt;
struct execute_work ew;
int sqa_idx;/* index of sdebug_queue array */
int qc_idx; /* index of sdebug_queued_cmd array within sqa_idx */
int issuing_cpu;
+   bool init_hrt;
+   bool init_wq;
+   enum sdeb_defer_type defer_t;
 };
 
 struct sdebug_queued_cmd {
@@ -3495,6 +3501,7 @@ static void sdebug_q_cmd_complete(struct sdebug_defer 
*sd_dp)
struct scsi_cmnd *scp;
struct sdebug_dev_info *devip;
 
+   sd_dp->defer_t = SDEB_DEFER_NONE;
qc_idx = sd_dp->qc_idx;
sqp = sdebug_q_arr + sd_dp->sqa_idx;
if (sdebug_statistics) {
@@ -3678,13 +3685,14 @@ static void scsi_debug_slave_destroy(struct scsi_device 
*sdp)
}
 }
 
-static void stop_qc_helper(struct sdebug_defer *sd_dp)
+static void stop_qc_helper(struct sdebug_defer *sd_dp,
+  enum sdeb_defer_type defer_t)
 {
if (!sd_dp)
return;
-   if ((sdebug_jdelay > 0) || (sdebug_ndelay > 0))
+   if (defer_t == SDEB_DEFER_HRT)
hrtimer_cancel(_dp->hrt);
-   else if (sdebug_jdelay < 0)
+   else if (defer_t == SDEB_DEFER_WQ)
cancel_work_sync(_dp->ew.work);
 }
 
@@ -3694,6 +3702,7 @@ static bool stop_queued_cmnd(struct scsi_cmnd *cmnd)
 {
unsigned long iflags;
int j, k, qmax, r_qmax;
+   enum sdeb_defer_type l_defer_t;
struct sdebug_queue *sqp;
struct sdebug_queued_cmd *sqcp;
struct sdebug_dev_info *devip;
@@ -3717,8 +3726,13 @@ static bool stop_queued_cmnd(struct scsi_cmnd *cmnd)
atomic_dec(>num_in_q);
sqcp->a_cmnd = NULL;
sd_dp = sqcp->sd_dp;
+   if (sd_dp) {
+   l_defer_t = sd_dp->defer_t;
+   sd_dp->defer_t = SDEB_DEFER_NONE;
+   } else
+   l_defer_t = SDEB_DEFER_NONE;
spin_unlock_irqrestore(>qc_lock, iflags);
-   stop_qc_helper(sd_dp);
+   stop_qc_helper(sd_dp, l_defer_t);
clear_bit(k, sqp->in_use_bm);
return true;
}
@@ -3733,6 +3747,7 @@ static void stop_all_queued(void)
 {
unsigned long iflags;
int j, k;
+   enum sdeb_defer_type l_defer_t;
struct sdebug_queue *sqp;
struct sdebug_queued_cmd *sqcp;
struct sdebug_dev_info *devip;
@@ -3751,8 +3766,13 @@ static void stop_all_queued(void)
atomic_dec(>num_in_q);
sqcp->a_cmnd = NULL;
sd_dp = sqcp->sd_dp;
+   if (sd_dp) {
+   l_defer_t = sd_dp->defer_t;
+   sd_dp->defer_t = SDEB_DEFER_NONE;
+   } else
+   l_defer_t = SDEB_DEFER_NONE;
spin_unlock_irqrestore(>qc_lock, iflags);
-   stop_qc_helper(sd_dp);
+   stop_qc_helper(sd_dp, l_defer_t);
clear_bit(k, sqp->in_use_bm);
spin_lock_irqsave(>qc_lock, iflags);
}
@@ -4003,7 +4023,7 @@ static void setup_inject(struct sdebug_queue *sqp,
  * SCSI_MLQUEUE_HOST_BUSY if temporarily out of resources.
  */
 static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
-int scsi_result, int 

Re: [PATCHv2] scsi_dh_alua: skip RTPG for devices only supporting active/optimized

2018-01-10 Thread Bart Van Assche
On Fri, 2017-12-22 at 12:52 +0100, Hannes Reinecke wrote:
> From: Hannes Reinecke 
> 
> For hardware only supporting active/optimized there's no point in
> ever re-issuing RTPG as the only new state we can possibly read is
> active/optimized.
> This avoid spurious errors during path failover on such arrays.

Reviewed-by: Bart Van Assche 



Re: [PATCH] scsi: bfa: use ARRAY_SIZE for array sizing calculation on array __pciids

2018-01-10 Thread Martin K. Petersen

Colin,

> Use the ARRAY_SIZE macro on array __pciids to determine size of the array.
> Improvement suggested by coccinelle.

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH][scsi-next] scsi: qla2xxx: remove redundant assignment of d

2018-01-10 Thread Martin K. Petersen

Colin,

> The initialization of d is redundant as this value is never read
> and it is overwritten inside the subsequent for-loop.  Remove this
> redundant assignment.

Applied to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices

2018-01-10 Thread Martin K. Petersen

Hannes,

> Handling CD-ROM devices from libsas is decidedly odd, as libata relies
> on SCSI EH to be started to figure out that no medium is present.  So
> we cannot do asynchronous aborts for SATA devices.

Applied to 4.15/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [RESEND PATCH 2/2] scsi: qedi: Use zeroing allocator instead of allocator/memset

2018-01-10 Thread Martin K. Petersen

Himanshu,

> Use dma_zalloc_coherent instead of dma_alloc_coherent followed by memset
> 0.

Applied to 4.16/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [RESEND PATCH 1/2] scsi: bnx2fc: Use zeroing allocator rather than allocator/memset

2018-01-10 Thread Martin K. Petersen

Himanshu,

> Use dma_zalloc_coherent instead of dma_alloc_coherent followed by
> memset 0.

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH 2/2] scsi: aacraid: Delay for rescan worker needs to be 10 seconds

2018-01-10 Thread Raghava Aditya Renukunta
The delay for the rescan worker needs to 10 seconds, missed the HZ in
there.

Fixes: a1367e4adee207fe (scsi: aacraid: Reschedule host scan in case of failure)
Signed-off-by: Raghava Aditya Renukunta 
---
 drivers/scsi/aacraid/aacraid.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 3e8bfcf..3ab3231 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1340,7 +1340,7 @@ struct fib {
 #define AAC_DEVTYPE_ARC_RAW2
 #define AAC_DEVTYPE_NATIVE_RAW 3
 
-#define AAC_SAFW_RESCAN_DELAY  10
+#define AAC_SAFW_RESCAN_DELAY  (10 * HZ)
 
 struct aac_hba_map_info {
__le32  rmw_nexus;  /* nexus for native HBA devices */
-- 
2.9.4



[PATCH 1/2] scsi: aacraid: Get correct lun count

2018-01-10 Thread Raghava Aditya Renukunta
The correct lun count needs to be divided by 24, missed it in the previous
patch set.

Fixes: 4b00022753550055 (scsi: aacraid: Create helper functions to get lun info)
Signed-off-by: Raghava Aditya Renukunta 
---
 drivers/scsi/aacraid/aachba.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index a2bdd79..4c65991 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -1868,7 +1868,7 @@ static int aac_get_safw_ciss_luns(struct aac_dev *dev)
 
 static inline u32 aac_get_safw_phys_lun_count(struct aac_dev *dev)
 {
-   return get_unaligned_be32(>safw_phys_luns->list_length[0]);
+   return get_unaligned_be32(>safw_phys_luns->list_length[0])/24;
 }
 
 static inline u32 aac_get_safw_phys_bus(struct aac_dev *dev, int lun)
-- 
2.9.4



[PATCH v2] storvsc: do not assume SG list is continuous when doing bounce buffers (for 4.1 and prior stable kernels)

2018-01-10 Thread Long Li
From: Long Li 

The original patch was made for stable 4.1 and was Acked on 08/22/2017, but for
some reason it never made it to the stable tree.

Change from v1:
Changed comment that this patch is for linux-stable 4.1 and all prior stable
kernels.

storvsc checks the SG list for gaps before passing them to Hyper-v device.
If there are gaps, data is copied to a bounce buffer and a continuous data
buffer is passed to Hyper-V.

The check on gaps assumes SG list is continuous, and not chained. This is
 not always true. Failing the check may result in incorrect I/O data
passed to the Hyper-v device.

This code path is not used post Linux 4.1.

Signed-off-by: Long Li 
Acked-by: Martin K. Petersen 

---
 drivers/scsi/storvsc_drv.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 6c52d14..14dc5c6 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -584,17 +584,18 @@ static int do_bounce_buffer(struct scatterlist *sgl, 
unsigned int sg_count)
for (i = 0; i < sg_count; i++) {
if (i == 0) {
/* make sure 1st one does not have hole */
-   if (sgl[i].offset + sgl[i].length != PAGE_SIZE)
+   if (sgl->offset + sgl->length != PAGE_SIZE)
return i;
} else if (i == sg_count - 1) {
/* make sure last one does not have hole */
-   if (sgl[i].offset != 0)
+   if (sgl->offset != 0)
return i;
} else {
/* make sure no hole in the middle */
-   if (sgl[i].length != PAGE_SIZE || sgl[i].offset != 0)
+   if (sgl->length != PAGE_SIZE || sgl->offset != 0)
return i;
}
+   sgl = sg_next(sgl);
}
return -1;
 }
-- 
2.7.4



Re: [PATCH 13/14] megaraid_sas: NVME passthru command support

2018-01-10 Thread Douglas Gilbert

On 2018-01-10 11:22 AM, Bart Van Assche wrote:

On Tue, 2018-01-09 at 22:07 +0530, Kashyap Desai wrote:

Overall NVME support behind MR controller is really a SCSI device. On top
of that, for MegaRaid, NVME device can be part of Virtual Disk and those
drive will not be exposed to the driver. User application may like to talk
to hidden NVME devices (part of VDs). This patch will extend the existing
interface for megaraid product in the same way it is currently supported
for other protocols like SMP, SATA pass-through.


It seems to me like there is a contradiction in the above paragraph: if some
NVMe devices are not exposed to the driver, how can a user space application
ever send NVMe commands to it?


I think that he meant that the NVMe physical devices (e.g. SSDs) are not
exposed to the upper layers (e.g. the SCSI mid-layer and above). The
SCSI subsystem has a no_uld_attach device flag that lets a LLD attach
physical devices but the sd driver and hence the block layer do not
"see" them. The idea is that maintenance programs like smartmontools
can use them via the bsg or sg drivers. The Megaraid driver code does
not seem to use no_uld_attach. Does the NVMe subsystem have similar
"generic" (i.e. non-block) devices accessible to the user space?


Anyway, has it been considered to implement the NVMe support as an NVMe
transport driver? The upstream kernel already supports NVMe communication
with NVMe PCI devices, NVMe over RDMA and NVMe over FC. If communication to
the NVMe devices behind the MegaRaid controller would be implemented as an
NVMe transport driver then all functionality of the Linux NVMe driver could
be reused, including its sysfs entries.


Broadcom already sell "SAS" HBAs that have "tri-mode" phys. That is a phy
that can connect to a SAS device (e.g. a SAS expander), a SATA device or a
NVMe device. Now if I was Broadcom designing a 24 Gbps SAS-4 next generation
expander I would be thinking of using those tri-mode phys on it. But then
there is a problem, SAS currently supports 3 protocols: SSP (for SCSI
storage and enclosure management (SES)), STP (for SATA storage ) and SMP
(for expander management). The problem is how those NVMe commands, status
and data cross the wire between the OS HBA (or MegaRaid type controller) and
an expander. Solving that might need some lateral thinking.

On one hand the NVM Express folks seem to have shelved the idea of a SCSI
to NVMe Translation Layer (SNTL) and have not updated an old white paper
on the subject. Currently there is no SNTL on Linux (there was but it was
removed) or FreeBSD but there is one on Windows.

On the other hand I'm informed that recently the same body accepted the
SES-3 standard pretty much as-is. That is done with the addition of SES
Send and SES Receive commands to NVME-MI. The library under sg_ses has
already been modified to use them (by implementing a specialized SNTL).

Doug Gilbert



Re: [PATCH] uas: Unblock scsi-requests on failure to alloc streams in post_reset

2018-01-10 Thread Hans de Goede

Hi,

On 10-01-18 16:23, Oliver Neukum wrote:

Am Mittwoch, den 10.01.2018, 08:13 +0100 schrieb Hans de Goede:

If we return 1 from our post_reset handler, then our disconnect handler
will be called immediately afterwards. Since pre_reset blocks all scsi
requests our disconnect handler will then hang in the scsi_remove_host
call.


Hi Hans,

it seems to me that the diagnosis is spot on. But why do we
keep different code paths at all in this case? I do not see
the point of not reporting the reset to the SCSI subsystem,
even if we are not operational afterwards.
So how about something like this?


Sure, works for me :)

Regards,

Hans




 From 4d1e26154bc5d09913bfba34d7adc39cce98d20a Mon Sep 17 00:00:00 2001
From: Oliver Neukum 
Date: Wed, 10 Jan 2018 16:16:03 +0100
Subject: [PATCH] usb: uas: unconditionally bring back host after reset

Quoting Hans:

If we return 1 from our post_reset handler, then our disconnect handler
will be called immediately afterwards. Since pre_reset blocks all scsi
requests our disconnect handler will then hang in the scsi_remove_host
call.

This is esp. bad because our disconnect handler hanging for ever also
stops the USB subsys from enumerating any new USB devices, causes
commands
like lsusb to hang, etc.

In practice this happens when unplugging some uas devices because the
hub
code may see the device as needing a warm-reset and calls
usb_reset_device
before seeing the disconnect. In this case uas_configure_endpoints
fails
with -ENODEV. We do not want to print an error for this, so this commit
also silences the shost_printk for -ENODEV.

ENDQUOTE

However, if we do that we better drop any unconditional execution
and report to the SCSI subsystem that we have undergone a reset
but we are not operational now.

Signed-off-by: Oliver Neukum 
Reported-by: Hans de Goede 
---
  Makefile  | 2 +-
  drivers/usb/storage/uas.c | 7 +++
  2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 5d04c40ee40a..3b1b9695177a 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -1076,20 +1076,19 @@ static int uas_post_reset(struct usb_interface
*intf)
    return 0;
  
  	err = uas_configure_endpoints(devinfo);

-   if (err) {
+   if (err && err != ENODEV)
    shost_printk(KERN_ERR, shost,
     "%s: alloc streams error %d after reset",
     __func__, err);
-   return 1;
-   }
  
+	/* we must unblock the host in every case lest we deadlock */

    spin_lock_irqsave(shost->host_lock, flags);
    scsi_report_bus_reset(shost, 0);
    spin_unlock_irqrestore(shost->host_lock, flags);
  
  	scsi_unblock_requests(shost);
  
-	return 0;

+   return err ? 1 : 0;
  }
  
  static int uas_suspend(struct usb_interface *intf, pm_message_t

message)
--
2.13.6



Re: [PATCH] tcmu: Allow reconfig to handle multiple attributes

2018-01-10 Thread Mike Christie
On 01/04/2018 10:11 AM, Bryant G. Ly wrote:
> This patch allows for multiple attributes to be reconfigured
> and handled all in one call as compared to multiple netlinks.
> 
> Example:
> set attribute dev_reconfig=dev_config=fbo//home/path:dev_size=2147483648
> 

I know I suggested this, but I think I was wrong. If we have to support
other apps that work in reverse to targetcli+tcmu-runner where the
tcmu-runner equivalent app sets things up then contacts the kernel,
let's just not do passthrough operations like this for reconfig. There
is no need to bring in the kernel.

For the initial config we can still do it since we have to maintain
compat, but for major reconfigs like this let's just have targetcli
contact tcmu-runner, then runner can update the kernel if needed.

So in targetcli and runner copy the check_config stuff, and add a
reconfig callout that works like it. We then do not have this weird
kernel passthrough and do not have to worry about the non
targetcl-tcmu-runner type of model.



> Signed-off-by: Bryant G. Ly 
> ---
>  drivers/target/target_core_user.c | 92 
> ++-
>  include/uapi/linux/target_core_user.h |  1 +
>  2 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 4824abf92ed79..619fae5e865f1 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -152,6 +152,8 @@ struct tcmu_dev {
>   char dev_config[TCMU_CONFIG_LEN];
>  
>   int nl_reply_supported;
> +
> + char dev_reconfig[TCMU_CONFIG_LEN * 2];
>  };
>  
>  #define TCMU_DEV(_se_dev) container_of(_se_dev, struct tcmu_dev, se_dev)
> @@ -1452,6 +1454,10 @@ static int tcmu_netlink_event(struct tcmu_dev *udev, 
> enum tcmu_genl_cmd cmd,
>   ret = nla_put_u8(skb, reconfig_attr,
> *((u8 *)reconfig_data));
>   break;
> + case TCMU_ATTR_DEV_RECFG:
> + pr_err("Put string into netlink and send it\n");
> + ret = nla_put_string(skb, reconfig_attr, reconfig_data);
> + break;
>   default:
>   BUG();
>   }
> @@ -1637,7 +1643,7 @@ static void tcmu_destroy_device(struct se_device *dev)
>  
>  enum {
>   Opt_dev_config, Opt_dev_size, Opt_hw_block_size, Opt_hw_max_sectors,
> - Opt_nl_reply_supported, Opt_err,
> + Opt_nl_reply_supported, Opt_dev_reconfig, Opt_err,
>  };
>  
>  static match_table_t tokens = {
> @@ -1646,6 +1652,7 @@ static match_table_t tokens = {
>   {Opt_hw_block_size, "hw_block_size=%u"},
>   {Opt_hw_max_sectors, "hw_max_sectors=%u"},
>   {Opt_nl_reply_supported, "nl_reply_supported=%d"},
> + {Opt_dev_reconfig, "dev_reconfig=%s"},
>   {Opt_err, NULL}
>  };
>  
> @@ -1731,6 +1738,14 @@ static ssize_t tcmu_set_configfs_dev_params(struct 
> se_device *dev,
>   if (ret < 0)
>   pr_err("kstrtoint() failed for 
> nl_reply_supported=\n");
>   break;
> + case Opt_dev_reconfig:
> + arg_p = match_strdup([0]);
> + if (!arg_p) {
> + ret = -ENOMEM;
> + break;
> + }
> + kfree(arg_p);
> + break;
>   default:
>   break;
>   }
> @@ -1945,12 +1960,87 @@ static ssize_t tcmu_emulate_write_cache_store(struct 
> config_item *item,
>  }
>  CONFIGFS_ATTR(tcmu_, emulate_write_cache);
>  
> +static ssize_t tcmu_dev_reconfig_show(struct config_item *item, char *page)
> +{
> + struct se_dev_attrib *da = container_of(to_config_group(item),
> + struct se_dev_attrib, da_group);
> + struct tcmu_dev *udev = TCMU_DEV(da->da_dev);
> +
> + return snprintf(page, PAGE_SIZE, "%s\n", udev->dev_reconfig);
> +}
> +
> +static ssize_t tcmu_dev_reconfig_store(struct config_item *item,
> +const char *page,
> +size_t count)
> +{
> + struct se_dev_attrib *da = container_of(to_config_group(item),
> + struct se_dev_attrib, da_group);
> + struct tcmu_dev *udev = TCMU_DEV(da->da_dev);
> + int token, ret;
> + char *orig, *ptr, *opts, *arg_p;
> + substring_t args[MAX_OPT_ARGS];
> +
> + /* Check if device has been configured before */
> + if (tcmu_dev_configured(udev)) {
> + ret = tcmu_netlink_event(udev, TCMU_CMD_RECONFIG_DEVICE,
> +  TCMU_ATTR_DEV_RECFG, page);
> + if (ret) {
> + pr_err("Unable to reconfigure device\n");
> + return ret;
> + }
> +
> + opts = 

[PATCH v2 2/4] block: Introduce blk_start_wait_if_quiesced() and blk_finish_wait_if_quiesced()

2018-01-10 Thread Bart Van Assche
Introduce functions that allow block drivers to wait while a request
queue is in the quiesced state (blk-mq) or in the stopped state (legacy
block layer). The next patch will add calls to these functions in the
SCSI core.

Signed-off-by: Bart Van Assche 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: Ming Lei 
---
 block/blk-core.c   |  1 +
 block/blk-mq.c | 64 ++
 include/linux/blk-mq.h |  2 ++
 3 files changed, 67 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index c10b4ce95248..06eaea15bae9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -287,6 +287,7 @@ void blk_start_queue(struct request_queue *q)
WARN_ON_ONCE(q->mq_ops);
 
queue_flag_clear(QUEUE_FLAG_STOPPED, q);
+   wake_up_all(>mq_wq);
__blk_run_queue(q);
 }
 EXPORT_SYMBOL(blk_start_queue);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a05ea7e9b415..87455977ad34 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -247,11 +247,75 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
spin_unlock_irqrestore(q->queue_lock, flags);
 
+   wake_up_all(>mq_wq);
+
/* dispatch requests which are inserted during quiescing */
blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
+/**
+ * blk_start_wait_if_quiesced() - wait if a queue is quiesced (blk-mq) or 
stopped (legacy block layer)
+ * @q: Request queue pointer.
+ *
+ * Some block drivers, e.g. the SCSI core, can bypass the block layer core
+ * request submission mechanism. Surround such code with
+ * blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced() to avoid that
+ * request submission can happen while a queue is quiesced or stopped.
+ *
+ * Returns with the RCU read lock held (blk-mq) or with q->queue_lock held
+ * (legacy block layer).
+ *
+ * Notes:
+ * - Every call of this function must be followed by a call of
+ *   blk_finish_wait_if_quiesced().
+ * - This function does not support block drivers whose .queue_rq()
+ *   implementation can sleep (BLK_MQ_F_BLOCKING).
+ */
+int blk_start_wait_if_quiesced(struct request_queue *q)
+{
+   struct blk_mq_hw_ctx *hctx;
+   unsigned int i;
+
+   might_sleep();
+
+   if (q->mq_ops) {
+   queue_for_each_hw_ctx(q, hctx, i)
+   WARN_ON(hctx->flags & BLK_MQ_F_BLOCKING);
+
+   rcu_read_lock();
+   while (!blk_queue_dying(q) && blk_queue_quiesced(q)) {
+   rcu_read_unlock();
+   wait_event(q->mq_wq, blk_queue_dying(q) ||
+  !blk_queue_quiesced(q));
+   rcu_read_lock();
+   }
+   } else {
+   spin_lock_irq(q->queue_lock);
+   wait_event_lock_irq(q->mq_wq,
+   blk_queue_dying(q) || !blk_queue_stopped(q),
+   *q->queue_lock);
+   q->request_fn_active++;
+   }
+   return blk_queue_dying(q) ? -ENODEV : 0;
+}
+EXPORT_SYMBOL(blk_start_wait_if_quiesced);
+
+/**
+ * blk_finish_wait_if_quiesced() - counterpart of blk_start_wait_if_quiesced()
+ * @q: Request queue pointer.
+ */
+void blk_finish_wait_if_quiesced(struct request_queue *q)
+{
+   if (q->mq_ops) {
+   rcu_read_unlock();
+   } else {
+   q->request_fn_active--;
+   spin_unlock_irq(q->queue_lock);
+   }
+}
+EXPORT_SYMBOL(blk_finish_wait_if_quiesced);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49796a3..15912cd348b5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -267,6 +267,8 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx 
*hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
 void blk_mq_unquiesce_queue(struct request_queue *q);
+int blk_start_wait_if_quiesced(struct request_queue *q);
+void blk_finish_wait_if_quiesced(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long 
msecs);
 bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_run_hw_queues(struct request_queue *q, bool async);
-- 
2.15.1



[PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

2018-01-10 Thread Bart Van Assche
Several SCSI transport and LLD drivers surround code that does not
tolerate concurrent calls of .queuecommand() with scsi_target_block() /
scsi_target_unblock(). These last two functions use
blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request
queues to prevent concurrent .queuecommand() calls. However, that is
not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd().
Hence surround the .queuecommand() call from the SCSI error handler with
blk_start_wait_if_quiesced() / blk_finish_wait_if_quiesced().

Note: converting the .queuecommand() call in scsi_send_eh_cmnd() into
code that calls blk_get_request(), e.g. scsi_execute_req(), is not an
option since scsi_send_eh_cmnd() can be called if all requests are
allocated and if no requests will make progress without aborting any
of these requests.

Signed-off-by: Bart Van Assche 
Reviewed-by: Hannes Reinecke 
Cc: Martin K. Petersen 
Cc: Christoph Hellwig 
Cc: Johannes Thumshirn 
Cc: Ming Lei 
---
 drivers/scsi/scsi_error.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 62b56de38ae8..f7154ea86715 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1016,6 +1016,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
 {
struct scsi_device *sdev = scmd->device;
struct Scsi_Host *shost = sdev->host;
+   struct request_queue *q = sdev->request_queue;
DECLARE_COMPLETION_ONSTACK(done);
unsigned long timeleft = timeout;
struct scsi_eh_save ses;
@@ -1028,7 +1029,9 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
unsigned char *cmnd,
 
scsi_log_send(scmd);
scmd->scsi_done = scsi_eh_done;
+   blk_start_wait_if_quiesced(q);
rtn = shost->hostt->queuecommand(shost, scmd);
+   blk_finish_wait_if_quiesced(q);
if (rtn) {
if (timeleft > stall_for) {
scsi_eh_restore_cmnd(scmd, );
-- 
2.15.1



[PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug

2018-01-10 Thread Bart Van Assche
The previous two patches guarantee that srp_queuecommand() does not get
invoked while reconnecting occurs. Hence remove the code from
srp_queuecommand() that prevents command queueing while reconnecting.
This patch avoids that the following can appear in the kernel log:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 5600, name: scsi_eh_9
1 lock held by scsi_eh_9/5600:
 #0:  (rcu_read_lock){}, at: [] 
__blk_mq_run_hw_queue+0xf1/0x1e0
Preemption disabled at:
[<139badf2>] __blk_mq_delay_run_hw_queue+0x78/0xf0
CPU: 9 PID: 5600 Comm: scsi_eh_9 Tainted: GW4.15.0-rc4-dbg+ #1
Hardware name: Dell Inc. PowerEdge R720/0VWT90, BIOS 2.5.4 01/22/2016
Call Trace:
 dump_stack+0x67/0x99
 ___might_sleep+0x16a/0x250 [ib_srp]
 __mutex_lock+0x46/0x9d0
 srp_queuecommand+0x356/0x420 [ib_srp]
 scsi_dispatch_cmd+0xf6/0x3f0
 scsi_queue_rq+0x4a8/0x5f0
 blk_mq_dispatch_rq_list+0x73/0x440
 blk_mq_sched_dispatch_requests+0x109/0x1a0
 __blk_mq_run_hw_queue+0x131/0x1e0
 __blk_mq_delay_run_hw_queue+0x9a/0xf0
 blk_mq_run_hw_queue+0xc0/0x1e0
 blk_mq_start_hw_queues+0x2c/0x40
 scsi_run_queue+0x18e/0x2d0
 scsi_run_host_queues+0x22/0x40
 scsi_error_handler+0x18d/0x5f0
 kthread+0x11c/0x140
 ret_from_fork+0x24/0x30

Signed-off-by: Bart Van Assche 
Reviewed-by: Hannes Reinecke 
Cc: Jason Gunthorpe 
Cc: Doug Ledford 
---
 drivers/infiniband/ulp/srp/ib_srp.c | 21 ++---
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 972d4b3c5223..670f187ecb91 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2149,7 +2149,6 @@ static void srp_handle_qp_err(struct ib_cq *cq, struct 
ib_wc *wc,
 static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 {
struct srp_target_port *target = host_to_target(shost);
-   struct srp_rport *rport = target->rport;
struct srp_rdma_ch *ch;
struct srp_request *req;
struct srp_iu *iu;
@@ -2159,16 +2158,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, 
struct scsi_cmnd *scmnd)
u32 tag;
u16 idx;
int len, ret;
-   const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
-
-   /*
-* The SCSI EH thread is the only context from which srp_queuecommand()
-* can get invoked for blocked devices (SDEV_BLOCK /
-* SDEV_CREATED_BLOCK). Avoid racing with srp_reconnect_rport() by
-* locking the rport mutex if invoked from inside the SCSI EH.
-*/
-   if (in_scsi_eh)
-   mutex_lock(>mutex);
 
scmnd->result = srp_chkready(target->rport);
if (unlikely(scmnd->result))
@@ -2230,13 +2219,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, 
struct scsi_cmnd *scmnd)
goto err_unmap;
}
 
-   ret = 0;
-
-unlock_rport:
-   if (in_scsi_eh)
-   mutex_unlock(>mutex);
-
-   return ret;
+   return 0;
 
 err_unmap:
srp_unmap_data(scmnd, ch, req);
@@ -2258,7 +2241,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, 
struct scsi_cmnd *scmnd)
ret = SCSI_MLQUEUE_HOST_BUSY;
}
 
-   goto unlock_rport;
+   return ret;
 }
 
 /*
-- 
2.15.1



[PATCH v2 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq

2018-01-10 Thread Bart Van Assche
Rename a waitqueue in struct request_queue since the next patch will
add code that uses this waitqueue outside the request queue freezing
implementation.

Signed-off-by: Bart Van Assche 
Reviewed-by: Hannes Reinecke 
Cc: Christoph Hellwig 
Cc: Johannes Thumshirn 
Cc: Ming Lei 
---
 block/blk-core.c   | 10 +-
 block/blk-mq.c | 10 +-
 include/linux/blkdev.h |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fa1cb95f7f6a..c10b4ce95248 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -377,7 +377,7 @@ void blk_clear_preempt_only(struct request_queue *q)
 
spin_lock_irqsave(q->queue_lock, flags);
queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
-   wake_up_all(>mq_freeze_wq);
+   wake_up_all(>mq_wq);
spin_unlock_irqrestore(q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
@@ -648,7 +648,7 @@ void blk_set_queue_dying(struct request_queue *q)
}
 
/* Make blk_queue_enter() reexamine the DYING flag. */
-   wake_up_all(>mq_freeze_wq);
+   wake_up_all(>mq_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -851,7 +851,7 @@ int blk_queue_enter(struct request_queue *q, 
blk_mq_req_flags_t flags)
 */
smp_rmb();
 
-   ret = wait_event_interruptible(q->mq_freeze_wq,
+   ret = wait_event_interruptible(q->mq_wq,
(atomic_read(>mq_freeze_depth) == 0 &&
 (preempt || !blk_queue_preempt_only(q))) ||
blk_queue_dying(q));
@@ -872,7 +872,7 @@ static void blk_queue_usage_counter_release(struct 
percpu_ref *ref)
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   wake_up_all(>mq_freeze_wq);
+   wake_up_all(>mq_wq);
 }
 
 static void blk_rq_timed_out_timer(struct timer_list *t)
@@ -948,7 +948,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
q->bypass_depth = 1;
__set_bit(QUEUE_FLAG_BYPASS, >queue_flags);
 
-   init_waitqueue_head(>mq_freeze_wq);
+   init_waitqueue_head(>mq_wq);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 29f140b4dbf7..a05ea7e9b415 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -137,16 +137,16 @@ EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
 void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
-   wait_event(q->mq_freeze_wq, percpu_ref_is_zero(>q_usage_counter));
+   wait_event(q->mq_wq, percpu_ref_is_zero(>q_usage_counter));
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 unsigned long timeout)
 {
-   return wait_event_timeout(q->mq_freeze_wq,
-   percpu_ref_is_zero(>q_usage_counter),
-   timeout);
+   return wait_event_timeout(q->mq_wq,
+ percpu_ref_is_zero(>q_usage_counter),
+ timeout);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
 
@@ -185,7 +185,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
percpu_ref_reinit(>q_usage_counter);
-   wake_up_all(>mq_freeze_wq);
+   wake_up_all(>mq_wq);
}
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50fb1c18ec54..2c74c03a9d5f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -638,7 +638,7 @@ struct request_queue {
struct throtl_data *td;
 #endif
struct rcu_head rcu_head;
-   wait_queue_head_t   mq_freeze_wq;
+   wait_queue_head_t   mq_wq;
struct percpu_ref   q_usage_counter;
struct list_headall_q_node;
 
-- 
2.15.1



[PATCH v2 0/4] Make SCSI transport recovery more robust

2018-01-10 Thread Bart Van Assche
Hello Jens,

A longstanding issue with the SCSI core is that several SCSI transport drivers
use scsi_target_block() and scsi_target_unblock() to avoid concurrent
.queuecommand() calls during e.g. transport recovery but that this is not
sufficient to protect from such calls. Hence this patch series. An additional
benefit of this patch series is that it allows to remove an ugly hack from
the SRP initiator driver. Please consider this patch series for kernel v4.16.

Thanks,

Bart.

Changes compared to v1:
- Renamed blk_wait_if_quiesced() into blk_start_wait_if_quiesced().
- Mentioned in the comment above blk_start_wait_if_quiesced() that every call
  of this function has to be followed by a call of
  blk_finish_wait_if_quiesced().

Bart Van Assche (4):
  blk-mq: Rename request_queue.mq_freeze_wq into mq_wq
  block: Introduce blk_start_wait_if_quiesced() and
blk_finish_wait_if_quiesced()
  scsi: Avoid that .queuecommand() gets called for a quiesced SCSI
device
  IB/srp: Fix a sleep-in-invalid-context bug

 block/blk-core.c| 11 +++---
 block/blk-mq.c  | 74 ++---
 drivers/infiniband/ulp/srp/ib_srp.c | 21 +--
 drivers/scsi/scsi_error.c   |  3 ++
 include/linux/blk-mq.h  |  2 +
 include/linux/blkdev.h  |  2 +-
 6 files changed, 83 insertions(+), 30 deletions(-)

-- 
2.15.1



Re: scsi: Always retry internal target error

2018-01-10 Thread Xose Vazquez Perez
> On 10/17/2017 09:11 AM, Hannes Reinecke wrote:
>> EMC Symmetrix is returning 'internal target error' for a variety
>> of conditions, most of which will be transient.
>> So we should always retry it, even with failfast set.
>> Otherwise we'd get spurious path flaps with multipath.
>> 
>> Signed-off-by: Hannes Reinecke 
>> ---
>>  drivers/scsi/scsi_error.c | 13 +
>>  1 file changed, 13 insertions(+)
>> 
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 5086489..5722c2e 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -497,6 +497,16 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
>>  if (sshdr.asc == 0x10) /* DIF */
>>  return SUCCESS;
>>  
>> +if (!strncmp(scmd->device->vendor, "EMC", 3) &&
>> +!strncmp(scmd->device->model, "SYMMETRIX", 9) &&
>> +(sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) {
>> +/*
>> + * EMC Symmetrix returns 'Internal target failure'
>> + * for a variety of internal issues, all of which
>> + * can be recovered by retry.
>> + */
>> +return ADD_TO_MLQUEUE;
>> +}
>>  return NEEDS_RETRY;
>>  case NOT_READY:
>>  case UNIT_ATTENTION:
>> @@ -1846,6 +1856,9 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
>>  rtn = scsi_check_sense(scmd);
>>  if (rtn == NEEDS_RETRY)
>>  goto maybe_retry;
>> +else if (rtn == ADD_TO_MLQUEUE)
>> +/* Always enforce a retry for ADD_TO_MLQUEUE */
>> +rtn = NEEDS_RETRY;
>>  /* if rtn == FAILED, we have no sense information;
>>   * returning FAILED will wake the error handler thread
>>   * to collect the sense and redo the decide
>> 

On 10/18/2017 07:27 AM, Hannes Reinecke wrote:
>> Yeah, you are right.
>> Will be adding a blacklist entry for this.


Is there a new patch ongoing?


Re: [PATCH 13/14] megaraid_sas: NVME passthru command support

2018-01-10 Thread Bart Van Assche
On Tue, 2018-01-09 at 22:07 +0530, Kashyap Desai wrote:
> Overall NVME support behind MR controller is really a SCSI device. On top
> of that, for MegaRaid, NVME device can be part of Virtual Disk and those
> drive will not be exposed to the driver. User application may like to talk
> to hidden NVME devices (part of VDs). This patch will extend the existing
> interface for megaraid product in the same way it is currently supported
> for other protocols like SMP, SATA pass-through.

It seems to me like there is a contradiction in the above paragraph: if some
NVMe devices are not exposed to the driver, how can a user space application
ever send NVMe commands to it?

Anyway, has it been considered to implement the NVMe support as an NVMe
transport driver? The upstream kernel already supports NVMe communication
with NVMe PCI devices, NVMe over RDMA and NVMe over FC. If communication to
the NVMe devices behind the MegaRaid controller would be implemented as an
NVMe transport driver then all functionality of the Linux NVMe driver could
be reused, including its sysfs entries.

Bart.

Re: [PATCH] uas: Unblock scsi-requests on failure to alloc streams in post_reset

2018-01-10 Thread Oliver Neukum
Am Mittwoch, den 10.01.2018, 08:13 +0100 schrieb Hans de Goede:
> If we return 1 from our post_reset handler, then our disconnect handler
> will be called immediately afterwards. Since pre_reset blocks all scsi
> requests our disconnect handler will then hang in the scsi_remove_host
> call.

Hi Hans,

it seems to me that the diagnosis is spot on. But why do we
keep different code paths at all in this case? I do not see
the point of not reporting the reset to the SCSI subsystem,
even if we are not operational afterwards.
So how about something like this?

>From 4d1e26154bc5d09913bfba34d7adc39cce98d20a Mon Sep 17 00:00:00 2001
From: Oliver Neukum 
Date: Wed, 10 Jan 2018 16:16:03 +0100
Subject: [PATCH] usb: uas: unconditionally bring back host after reset

Quoting Hans:

If we return 1 from our post_reset handler, then our disconnect handler
will be called immediately afterwards. Since pre_reset blocks all scsi
requests our disconnect handler will then hang in the scsi_remove_host
call.

This is esp. bad because our disconnect handler hanging for ever also
stops the USB subsys from enumerating any new USB devices, causes
commands
like lsusb to hang, etc.

In practice this happens when unplugging some uas devices because the
hub
code may see the device as needing a warm-reset and calls
usb_reset_device
before seeing the disconnect. In this case uas_configure_endpoints
fails
with -ENODEV. We do not want to print an error for this, so this commit
also silences the shost_printk for -ENODEV.

ENDQUOTE

However, if we do that we better drop any unconditional execution
and report to the SCSI subsystem that we have undergone a reset
but we are not operational now.

Signed-off-by: Oliver Neukum 
Reported-by: Hans de Goede 
---
 Makefile  | 2 +-
 drivers/usb/storage/uas.c | 7 +++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 5d04c40ee40a..3b1b9695177a 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -1076,20 +1076,19 @@ static int uas_post_reset(struct usb_interface
*intf)
    return 0;
 
    err = uas_configure_endpoints(devinfo);
-   if (err) {
+   if (err && err != ENODEV)
    shost_printk(KERN_ERR, shost,
     "%s: alloc streams error %d after reset",
     __func__, err);
-   return 1;
-   }
 
+   /* we must unblock the host in every case lest we deadlock */
    spin_lock_irqsave(shost->host_lock, flags);
    scsi_report_bus_reset(shost, 0);
    spin_unlock_irqrestore(shost->host_lock, flags);
 
    scsi_unblock_requests(shost);
 
-   return 0;
+   return err ? 1 : 0;
 }
 
 static int uas_suspend(struct usb_interface *intf, pm_message_t
message)
-- 
2.13.6


Re: [-next PATCH 2/4] treewide: Use DEVICE_ATTR_RW

2018-01-10 Thread Peter Ujfalusi


On 2017-12-20 12:54, Jarkko Nikula wrote:
> On Wed, Dec 20, 2017 at 10:32:11AM +0100, Greg Kroah-Hartman wrote:
>> On Wed, Dec 20, 2017 at 01:24:44AM -0800, Joe Perches wrote:
>>> On Wed, 2017-12-20 at 10:34 +0200, Jarkko Nikula wrote:
 On Tue, Dec 19, 2017 at 10:15:07AM -0800, Joe Perches wrote:
> Convert DEVICE_ATTR uses to DEVICE_ATTR_RW where possible.
>>> [] 
> diff --git a/sound/soc/omap/mcbsp.c b/sound/soc/omap/mcbsp.c
>>> []
> @@ -854,7 +854,7 @@ static ssize_t dma_op_mode_store(struct device *dev,
>   return size;
>  }
>  
> -static DEVICE_ATTR(dma_op_mode, 0644, dma_op_mode_show, 
> dma_op_mode_store);
> +static DEVICE_ATTR_RW(dma_op_mode);
>  

 While I can ack this part here if it helps generic cleanup effort I
 don't understart would it improve code readability in general? Mode 644
 is clear and don't need any grepping but for DEVICE_ATTR_RW() I had to go
 through all of these files in order to see what does it mean:
>>
>> Yeah, 644 is "clear", but _RW() is even more clear.  Ideally I want to
>> get rid of all of the "non-standard" users that set random modes of
>> sysfs files, as we get it wrong too many times.  Using the "defaults" is
>> much better.
>>
> Fair enough. For the sound/soc/omap/ (Acked-by was missing from my
> previous reply):
> 
> Acked-by: Jarkko Nikula 

And from me to the same file (sound/soc/omap/mcbsp.c):

Acked-by: Peter Ujfalusi 


- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [RESEND PATCH 1/2] scsi: bnx2fc: Use zeroing allocator rather than allocator/memset

2018-01-10 Thread Chad Dupuis

On Tue, 9 Jan 2018, 4:06am, Himanshu Jha wrote:

> Use dma_zalloc_coherent instead of dma_alloc_coherent followed by
> memset 0.
> 
> Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
> 
> Suggested-by: Luis R. Rodriguez 
> Signed-off-by: Himanshu Jha 
> ---
>  drivers/scsi/bnx2fc/bnx2fc_hwi.c | 60 
> +---
>  drivers/scsi/bnx2fc/bnx2fc_tgt.c | 51 +++---
>  2 files changed, 47 insertions(+), 64 deletions(-)
> 

Sorry, didn't realize I needed to ack the resend.

Acked-by: Chad Dupuis 


RE: [PATCH 13/14] megaraid_sas: NVME passthru command support

2018-01-10 Thread Kashyap Desai
> -Original Message-
> From: Douglas Gilbert [mailto:dgilb...@interlog.com]
> Sent: Wednesday, January 10, 2018 2:21 AM
> To: Christoph Hellwig; Kashyap Desai
> Cc: Shivasharan Srikanteshwara; linux-scsi@vger.kernel.org; Sumit Saxena;
> linux-n...@lists.infradead.org; Peter Rivera
> Subject: Re: [PATCH 13/14] megaraid_sas: NVME passthru command support
>
> On 2018-01-09 11:45 AM, Christoph Hellwig wrote:
> > On Tue, Jan 09, 2018 at 10:07:28PM +0530, Kashyap Desai wrote:
> >> Chris -
> >>
> >> Overall NVME support behind MR controller is really a SCSI device. On
> >> top of that, for MegaRaid, NVME device can be part of Virtual Disk
> >> and those drive will not be exposed to the driver. User application
> >> may like to talk to hidden NVME devices (part of VDs). This patch
> >> will extend the existing interface for megaraid product in the same
> >> way it is currently supported for other protocols like SMP, SATA pass-
> through.
> >>
> >> Example - Current smartmon is using megaraid.h (MFI headers) to send
> >> SATA pass-through.
> >>
> >> https://github.com/mirror/smartmontools/blob/master/megaraid.h
> >
> > And that is exactly the example of why we should have never allowed
> > megaraid any private passthrough ioctls to start with.
>
> Christoph,
> Have you tried to do any serious work with  and say
> compared it with FreeBSD and Microsoft's approach? No prize for guessing
> which one is worst (and least extensible). Looks like the Linux
> pass-through
> was at the end of a ToDo list and was "designed"
> at 5 a.m in the morning.
>
> RAID cards need a pass-through that allows them to address one of many
> physical disks behind the virtual disk presented to OS.
> Pass-throughs need to have uncommited room for extra parameters that will
> be passed through as-is to the RAID LLD.

Doug - As you mentioned, I notice the same. This type of issue is common for
all RAID controllers vendors.
Whatever Christoph mentioned about NVMe type API to be used is possible, but
may need extra hit in firmware side to convert Linux NVME API to FW specific
OR deal the same in driver.
It may come with it's own pros/cons.  Also may not fulfil the end goal. For
other platforms, we still have to depend upon specialized pass-through code.
So having said that Firmware of RAID cannot use only one interface for
pass-through and they have to choose specialized pass-through code.

NVME-CLI interface is designed for NVME drives attached to block layer.
MegaRaid product is design to keep NVME protocol abstracted (much like SATA
drives behind SAS controller) and attach those drives/virtual disk to SCSI
layer.

>
> So until Christoph gives an example of how that can be done with
>  then I would like to see Christoph's objection
> ignored.
>
>
> And as a maintainer of smartmontools, I would like to point out that
> pretty
> well all supported RAIDs, on all platforms need specialized pass-through
> code.

If upstream community like to enhance nvme-cli type interface in
megaraid_sas driver, we may have to come up with one more layer in
megaraid_sas driver to convert NVME-API to specialized pass-through code.
It is really not simple to fit into existing design as NVME-CLI/API is
considering NVME drive associated with nvme.ko modules (/dev/nvmeX). Also we
don't have many sysfs entries nvme-cli is looking for NVME device etc.. We
don't have way to talk to Physical disks which is part of VD etc..

Specialized pass-through code is better to extend in application like
smartmontools etc.

> Start by looking at os_linux.cpp and then at the other OSes. And now
> smartmontools supports NVMe on most platforms and at the pass-through
> level, it is just another one, and not a particularly clean one.
>
> IMO Intel had their chance on the pass-through front, and blew it.
> It is now too late to fix it and that job (impossible ?) should not fall
> to
> MegaRaid maintainers.
>
> Douglas Gilbert


RE: [PATCH 13/14] megaraid_sas: NVME passthru command support

2018-01-10 Thread Kashyap Desai
> -Original Message-
> From: Keith Busch [mailto:keith.bu...@intel.com]
> Sent: Wednesday, January 10, 2018 4:53 AM
> To: Douglas Gilbert
> Cc: Christoph Hellwig; Kashyap Desai; Shivasharan Srikanteshwara; Sumit
> Saxena; Peter Rivera; linux-n...@lists.infradead.org; linux-
> s...@vger.kernel.org
> Subject: Re: [PATCH 13/14] megaraid_sas: NVME passthru command support
>
> On Tue, Jan 09, 2018 at 03:50:44PM -0500, Douglas Gilbert wrote:
> > Have you tried to do any serious work with  and
> > say compared it with FreeBSD and Microsoft's approach? No prize for
> > guessing which one is worst (and least extensible). Looks like the
> > Linux pass-through was at the end of a ToDo list and was "designed"
> > at 5 a.m in the morning.
>
> What the heck are you talking about? FreeBSD's NVMe passthrough is near
> identical to Linux, and Linux's existed years prior.
>
> You're not even touching the nvme subsystem, so why are you copying the
> linux-nvme mailing list to help you with a non-NVMe device? Please take
your
> ignorant and dubious claims elsewhere.

Keith -

As we discussed for mpt3sas driver NVME driver support, there was request
to add linux-n...@lists.infradead.org for NVME related discussion.
https://marc.info/?l=linux-kernel=149874673729467=2

As you mentioned, we are not touching NVME subsystem, we can skip to add
NVME mailing list for future submission w.r.t NVME drive behind MR
(megaraid_sas) and HBA (mpt3sas).
All the NVME drives behind MegaRaid controller is SCSI device irrespective
of transport.

Kashyap


Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices

2018-01-10 Thread Hannes Reinecke
On 01/10/2018 09:15 AM, Christoph Hellwig wrote:
> Looks fine to me for 4.15 and -stable:
> 
> Reviewed-by: Christoph Hellwig 
> 
> But we really need to fix this properly in the long run.
> 
Looked into it, but I'm not sure if we can due to the fundamental
differences between SCSI and libata EH.

And there really is no way on how we can do async aborts on SATA; as
soon as we're submitting NCQ commands _all_ commands will be aborted on
error and we have to pick up the pieces.

We might be handling things a tad better than now (we're always punting
the abort to a workqueue, only to figure out from the workqueue function
that we should've invoked SCSI EH proper).
But I can't really see a better way here.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCHv2] libsas: Disable asynchronous aborts for SATA devices

2018-01-10 Thread Hannes Reinecke
Handling CD-ROM devices from libsas is decidedly odd, as libata
relies on SCSI EH to be started to figure out that no medium is
present.
So we cannot do asynchronous aborts for SATA devices.

Fixes: 909657615d9 ("scsi: libsas: allow async aborts")
Cc:  # 4.12+
Signed-off-by: Hannes Reinecke 
Reviewed-by: Christoph Hellwig 
Tested-by: Yves-Alexis Perez 
---
 drivers/scsi/libsas/sas_scsi_host.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 6267272..6de9681 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -487,15 +487,28 @@ static int sas_queue_reset(struct domain_device *dev, int 
reset_type,
 
 int sas_eh_abort_handler(struct scsi_cmnd *cmd)
 {
-   int res;
+   int res = TMF_RESP_FUNC_FAILED;
struct sas_task *task = TO_SAS_TASK(cmd);
struct Scsi_Host *host = cmd->device->host;
+   struct domain_device *dev = cmd_to_domain_dev(cmd);
struct sas_internal *i = to_sas_internal(host->transportt);
+   unsigned long flags;
 
if (!i->dft->lldd_abort_task)
return FAILED;
 
-   res = i->dft->lldd_abort_task(task);
+   spin_lock_irqsave(host->host_lock, flags);
+   /* We cannot do async aborts for SATA devices */
+   if (dev_is_sata(dev) && !host->host_eh_scheduled) {
+   spin_unlock_irqrestore(host->host_lock, flags);
+   return FAILED;
+   }
+   spin_unlock_irqrestore(host->host_lock, flags);
+
+   if (task)
+   res = i->dft->lldd_abort_task(task);
+   else
+   SAS_DPRINTK("no task to abort\n");
if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
return SUCCESS;
 
-- 
1.8.5.6



Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices

2018-01-10 Thread Yves-Alexis Perez
On Tue, 2018-01-09 at 16:43 +0100, Hannes Reinecke wrote:
> Handling CD-ROM devices from libsas is decidedly odd, as libata
> relies on SCSI EH to be started to figure out that no medium is
> present.
> So we cannot do asynchronous aborts for SATA devices.

The box boots fine with this change, thanks!

Tested-by: Yves-Alexis Perez 
-- 
Yves-Alexis

signature.asc
Description: This is a digitally signed message part


Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices

2018-01-10 Thread Christoph Hellwig
Looks fine to me for 4.15 and -stable:

Reviewed-by: Christoph Hellwig 

But we really need to fix this properly in the long run.