Re: scsi: sg: assorted memory corruptions

2018-01-31 Thread Dmitry Vyukov
On Thu, Feb 1, 2018 at 7:03 AM, Douglas Gilbert  wrote:
> On 2018-01-30 07:22 AM, Dmitry Vyukov wrote:
>>
>> Uh, I've answered this a week ago, but did not notice that Doug
>> dropped everybody from CC. Reporting to all.
>>
>> On Mon, Jan 22, 2018 at 8:16 PM, Douglas Gilbert 
>> wrote:
>>>
>>> On 2018-01-22 02:06 PM, Dmitry Vyukov wrote:


 On Mon, Jan 22, 2018 at 7:57 PM, Douglas Gilbert 
>>>
>>> Please show me the output of 'lsscsi -g' on your test machine.
>>> /dev/sg0 is often associated with /dev/sda which is often a SATA
>>> SSD (or a virtualized one) that holds the root file system.
>>> With the sg pass-through driver it is relatively easy to write
>>> random (user provided data) over the root file system which will
>>> almost certainly "root" the system.
>>
>>
>>
>> This is pretty standard qemu vm started with:
>>
>> qemu-system-x86_64 -hda wheezy.img -net user,host=10.0.2.10 -net nic
>> -nographic -kernel arch/x86/boot/bzImage -append "console=ttyS0
>> root=/dev/sda earlyprintk=serial " -m 2G -smp 4
>>
>> # lsscsi -g
>> [0:0:0:0]diskATA  QEMU HARDDISK0 /dev/sda   /dev/sg0
>
>
> With lk 4.15.0-rc9 I can run your test program (with some additions, see
> attachment) for 30 minutes against a scsi_debug simulated disk. You can
> easily replicate this test just run 'modprobe scsi_debug' and a third
> line should appear in your lsscsi output. The new device will most likely
> be /dev/sg2 .
>
> With lk 4.15.0 (release) running against a SAS SSD (SEAGATE ST200FM0073),
> the test has  been running 20 minutes and counting without problems. That
> is using a LSI HBA with the mpt3sas driver.
>
>> [1:0:0:0]cd/dvd  QEMU QEMU DVD-ROM 2.0.  /dev/sr0   /dev/sg1
>>
>> # readlink /sys/class/scsi_generic/sg0
>>
>> ../../devices/pci:00/:00:01.1/ata1/host0/target0:0:0/0:0:0:0/scsi_generic/sg0
>>
>> # cat /sys/class/scsi_generic/sg0/device/vendor
>> ATA
>
>
> ^
> That subsystem is the culprit IMO, most likely libata.
>
> Until you can show this test failing on something other than an
> ATA disk, then I will treat this issue as closed.

Hi Doug,

Why is bug in ATA not a bug? Is it long unused by everybody? I've got
it by running qemu with default flags...


[PATCH] qedi: Fix truncation of name and secret

2018-01-31 Thread Nilesh Javali
Adjust the NULL byte added by snprintf.

Signed-off-by: Nilesh Javali 
---
 drivers/scsi/qedi/qedi_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 34a..cf8badb 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1840,7 +1840,7 @@ static ssize_t qedi_show_boot_ini_info(void *data, int 
type, char *buf)
 
switch (type) {
case ISCSI_BOOT_INI_INITIATOR_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
+   rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN + 1, "%s",
  initiator->initiator_name.byte);
break;
default:
@@ -1908,7 +1908,7 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
int type)
 
switch (type) {
case ISCSI_BOOT_TGT_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
+   rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN + 1, "%s",
  block->target[idx].target_name.byte);
break;
case ISCSI_BOOT_TGT_IP_ADDR:
@@ -1930,19 +1930,19 @@ static umode_t qedi_ini_get_attr_visibility(void *data, 
int type)
  block->target[idx].lun.value[0]);
break;
case ISCSI_BOOT_TGT_CHAP_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n",
+   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN + 1, "%s",
  chap_name);
break;
case ISCSI_BOOT_TGT_CHAP_SECRET:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n",
+   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN + 1, "%s",
  chap_secret);
break;
case ISCSI_BOOT_TGT_REV_CHAP_NAME:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n",
+   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN + 1, "%s",
  mchap_name);
break;
case ISCSI_BOOT_TGT_REV_CHAP_SECRET:
-   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n",
+   rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN + 1, "%s",
  mchap_secret);
break;
case ISCSI_BOOT_TGT_FLAGS:
-- 
1.8.3.1



Re: scsi: sg: assorted memory corruptions

2018-01-31 Thread Douglas Gilbert

On 2018-01-30 07:22 AM, Dmitry Vyukov wrote:

Uh, I've answered this a week ago, but did not notice that Doug
dropped everybody from CC. Reporting to all.

On Mon, Jan 22, 2018 at 8:16 PM, Douglas Gilbert  wrote:

On 2018-01-22 02:06 PM, Dmitry Vyukov wrote:


On Mon, Jan 22, 2018 at 7:57 PM, Douglas Gilbert 

Please show me the output of 'lsscsi -g' on your test machine.
/dev/sg0 is often associated with /dev/sda which is often a SATA
SSD (or a virtualized one) that holds the root file system.
With the sg pass-through driver it is relatively easy to write
random (user provided data) over the root file system which will
almost certainly "root" the system.



This is pretty standard qemu vm started with:

qemu-system-x86_64 -hda wheezy.img -net user,host=10.0.2.10 -net nic
-nographic -kernel arch/x86/boot/bzImage -append "console=ttyS0
root=/dev/sda earlyprintk=serial " -m 2G -smp 4

# lsscsi -g
[0:0:0:0]diskATA  QEMU HARDDISK0 /dev/sda   /dev/sg0


With lk 4.15.0-rc9 I can run your test program (with some additions, see
attachment) for 30 minutes against a scsi_debug simulated disk. You can
easily replicate this test just run 'modprobe scsi_debug' and a third
line should appear in your lsscsi output. The new device will most likely
be /dev/sg2 .

With lk 4.15.0 (release) running against a SAS SSD (SEAGATE ST200FM0073),
the test has  been running 20 minutes and counting without problems. That
is using a LSI HBA with the mpt3sas driver.


[1:0:0:0]cd/dvd  QEMU QEMU DVD-ROM 2.0.  /dev/sr0   /dev/sg1

# readlink /sys/class/scsi_generic/sg0
../../devices/pci:00/:00:01.1/ata1/host0/target0:0:0/0:0:0:0/scsi_generic/sg0

# cat /sys/class/scsi_generic/sg0/device/vendor
ATA


^
That subsystem is the culprit IMO, most likely libata.

Until you can show this test failing on something other than an
ATA disk, then I will treat this issue as closed.

Doug Gilbert



Perhaps it misbehaves when it
gets a SCSI command in the T10 range (i.e. not vendor specific) with
a 9 byte cdb length. As far as I'm aware T10 (and the Ansi committee
before it) have never defined a cdb with an odd length.

For those that are not aware, the sg driver is a relatively thin
shim over the block layer, the SCSI mid-level, and a low-level
driver which may have another kernel driver stack underneath it
(e.g. UAS (USB attached SCSI)). The previous report from syzkaller
on the sg driver ("scsi: memory leak in sg_start_req") has resulted
in one accepted patch on the block layer with probably more to
come in the same area.

Testing the patch Dmitry gave (with some added error checks which
reported no problems) with the scsi_debug driver supplying /dev/sg0
I have not seen any problems running that test program. Again
there might be a very slow memory leak, but if there is I don't
believe it is in the sg driver.



Did you run it in a loop? First runs pass just fine for me too.



Is thirty minutes long enough ??



Yes, it certainly should be enough. Here is what I see:


# while ./a.out; do echo RUN; done
RUN
RUN
RUN
RUN
RUN
RUN
RUN
[  371.977266] 
==
[  371.980158] BUG: KASAN: double-free or invalid-free in
__put_task_struct+0x1e7/0x5c0



Here is full execution trace of the write call if that will be of any help:
https://gist.githubusercontent.com/dvyukov/14ae64c3e753dedf9ab2608676ecf0b9/raw/9803d52bb1e317a9228e362236d042aaf0fa9d69/gistfile1.txt

This is on upstream commit 0d665e7b109d512b7cae3ccef6e8654714887844.
Also attaching my config just in case.



// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 

#define SG_NEXT_CMD_LEN 0x2283

static const char * usage = "sg_syzk_next_cdb # (e.g. '/dev/sg3') ";

int main(int argc, const char * argv[])
{
  int res, err;
  int fd;
  long len = 9;
  char* p = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x47\x00\x00\x24\x00"
"\x00\x00\x00\x00\x00\x1c\xbb\xac\x14\x00\xaa\xe0\x00\x00\x01"
"\x00\x07\x07\x00\x00\x59\x08\x00\x00\x00\x80\xfe\x7f\x00\x00\x01";
const char * dev_name;
struct stat a_stat;

if (argc < 2) {
fprintf(stderr, "Usage: %s\n", usage);
return 1;
}
dev_name = argv[1];
if (0 != stat(dev_name, _stat)) {
err = errno;
fprintf(stderr, "Unable to stat %s, err: %s\n", dev_name,
strerror(err));
return 1;
}
if ((a_stat.st_mode & S_IFMT) != S_IFCHR) {
fprintf(stderr, "Expected %s, to be sg device\n", dev_name);
return 1;
}
  fd = open(dev_name, O_RDWR);
  if (fd < 0) {
	err = errno;
	fprintf(stderr, "open(%s) failed: %s [%d]\n", dev_name, strerror(err),
		err);
  }
  res = ioctl(fd, SG_NEXT_CMD_LEN, );
  if (res < 0) {
	err = errno;
	fprintf(stderr, "ioctl failed: %s [%d]\n", strerror(err), err);
  }
  res = write(fd, p, 

Re: [LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Ming Lei
On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote:
> On 2018-01-31 01:23, Ming Lei wrote:
> > Hi All,
> > 
> > After KPTI is merged, there is extra load introduced to context switch
> > between user space and kernel space. It is observed on my laptop that
> > one
> > syscall takes extra ~0.15us[1] compared with 'nopti'.
> > 
> > IO performance is affected too, it is observed that IOPS drops by 32% in
> > my test[2] on null_blk compared with 'nopti':
> > 
> > randread IOPS on latest linus tree:
> > -
> > | randread IOPS | randread IOPS with 'nopti'|
> > 
> > | 928K  | 1372K |
> > 
> > 
> > 
> 
> Do you know if your CPU has PCID? It would be interesting to see these tests
> on older CPUs or older kernels without PCID support.

BTW, I also saw test data in case of vCPU without PCID, and it is said the 
syscall
time can be close to ~30X compared with nopti, and the test should be setup 
easily
by adjust CPU model of Qemu.

So in case of no PCID, KPTI effect on IO performance should be much bigger than 
the
above data.

Thanks,
Ming


Re: [LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Ming Lei
Hi Scotty,

On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote:
> On 2018-01-31 01:23, Ming Lei wrote:
> > Hi All,
> > 
> > After KPTI is merged, there is extra load introduced to context switch
> > between user space and kernel space. It is observed on my laptop that
> > one
> > syscall takes extra ~0.15us[1] compared with 'nopti'.
> > 
> > IO performance is affected too, it is observed that IOPS drops by 32% in
> > my test[2] on null_blk compared with 'nopti':
> > 
> > randread IOPS on latest linus tree:
> > -
> > | randread IOPS | randread IOPS with 'nopti'|
> > 
> > | 928K  | 1372K |
> > 
> > 
> > 
> 
> Do you know if your CPU has PCID? It would be interesting to see these tests
> on older CPUs or older kernels without PCID support.

My CPU has PCID, which can be retrieved via /proc/cpuinfo.

And the above test is run on same kernel binary, and the result is just done
between 'nopti' and no 'nopti' in kernel command line.

Thanks,
Ming


Re: [PATCH] scsi_debug: implement IMMED bit

2018-01-31 Thread Bart Van Assche
On Wed, 2018-01-31 at 17:40 -0500, Douglas Gilbert wrote:
> On 2018-01-31 05:05 PM, Bart Van Assche wrote:
> > On Wed, 2018-01-31 at 15:26 -0500, Douglas Gilbert wrote:
> > > On 2018-01-31 12:06 PM, Bart Van Assche wrote:
> > > > On 01/29/18 21:54, Douglas Gilbert wrote:
> > > > > +static const struct opcode_info_t sync_cache_iarr[] = {
> > > > > +{0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL,
> > > > > +{16,  0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > > > 
> > > > ^^^
> > > > Can you clarify the choice of "0x7" in the above? After having had a 
> > > > look at
> > > > SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other 
> > > > bits in
> > > > that command byte are either reserved or obsolete.
> > > 
> > > As a general rule when you see "obsolete" that means that field was used
> > > in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So
> > > application clients complying with earlier versions of SBC might set those
> > > bits. If they are set and the mask is being enforced I choose to not fail
> > > the command as an illegal request. Basically accept and ignore.
> > 
> > I agree with setting bits for obsolete flags. The reason I was asking about 
> > that
> > mask is because I found the following in SBC-4 for byte 1 of SYNCHRONIZE 
> > CACHE(16):
> > * Bits 7..3: reserved.
> > * Bit 2: obsolete.
> > * Bit 1: IMMED.
> > * Bit 0: reserved.
> 
> I did see the discrepancy between byte 1 bit 0 in SYNCHRONIZE CACHE(10)
> where it is "obsolete" and SYNCHRONIZE CACHE(16) where it is "reserved"
> initially thinking it might be a typo. So I went for the more permissive.
> I'm now guessing (because I don't have the drafts between SBC(-1) and
> SBC-2 and for some reason T10 wants you to be a member to download older
> drafts) that RELADDR was removed before SYNCHRONIZE CACHE(16) was added
> (as there was no 64 LBA support in SBC-1). So if there never was a
> publicly available draft where SYNCHRONIZE CACHE(16) had a RELADDR bit
> defined, then why mark it as obsolete? Reserved means a later
> draft/standard may use that bit/field.
> 
> BTW There is no requirement for a device server to enforce the mask and
> most don't (from my testing). The mask is provided as an aid to the
> application client and is required by the implementation of REPORT
> SUPPORTED OPERATION CODES which is why I have it in scsi_debug. By default
> the scsi_debug driver does not enforce the mask (but does when strict=1).
> 
> Anyway, aren't we splitting hairs here? Does this really justify a new
> patch being rolled?

People trust you as a SCSI expert and may consider to copy code from the
scsi_debug.c driver in their own GPL drivers if they need similar functionality.
Do you think that is sufficient as a motivation to edit and repost this patch?

Thanks,

Bart.

Re: [PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload

2018-01-31 Thread Bart Van Assche
On Wed, 2018-01-31 at 17:48 -0200, Mauricio Faria de Oliveira wrote:
> On 01/31/2018 05:06 PM, Bart Van Assche wrote:
> > Sorry but I think this patch introduces new race conditions. Have you
> 
> Can you detail the race conditions? As far as I can see, the only race
> condition would be when an error handler is invoked very close in time
> to the shutdown/unload handler, and as such miss the 'ioc->remove_host'
> assignment (thus it continues running anyway, and hit the oops).

That's indeed the race I was referring to. BTW, are you aware that your patch
does not seem to apply on Martin's tree (the fixes branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git):

$ patch -p1 < 
~/\[PATCH\]_scsi\:_mpt3sas\:_fix_oops_in_error_handlers_after_shutdown_unload.mbox
   
patching file drivers/scsi/mpt3sas/mpt3sas_scsih.c
Hunk #1 FAILED at 3007.
Hunk #2 succeeded at 2970 (offset -99 lines).
Hunk #3 succeeded at 2977 (offset -161 lines).
Hunk #4 succeeded at 3033 (offset -160 lines).
1 out of 4 hunks FAILED -- saving rejects to file 
drivers/scsi/mpt3sas/mpt3sas_scsih.c.rej

Can you rebase this patch on top of Martin's tree?

Thanks,

Bart.

Re: [PATCH] test-case

2018-01-31 Thread Bart Van Assche
On Wed, 2018-01-31 at 17:24 -0200, Mauricio Faria de Oliveira wrote:
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 3c4e47c..611cee33 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -2997,6 +2997,12 @@ int mpt3sas_scsih_issue_locked_tm(struct 
> MPT3SAS_ADAPTER *ioc, u16 handle,
>   "attempting task abort! scmd(%p)\n", scmd);
>   _scsih_tm_display_info(ioc, scmd);
>  
> + if (ioc->logging_level & 0x0100) {
> + pr_info(MPT3SAS_FMT "fail task abort scmd(%p)\n", ioc->name, 
> scmd);
> + r = FAILED;
> + goto out;
> + }
> +
>   sas_device_priv_data = scmd->device->hostdata;
>   if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
>   sdev_printk(KERN_INFO, scmd->device,
> @@ -5584,6 +5590,11 @@ static int _scsih_set_satl_pending(struct scsi_cmnd 
> *scmd, bool pending)
>  
>   scsi_dma_unmap(scmd);
>  
> + if (ioc->logging_level & 0x1000 && scmd->cmnd[0] != 0x35) {
> + pr_info(MPT3SAS_FMT "skip scsi_done scmd(%p)\n", ioc->name, 
> scmd);
> + return 1;
> + }
> +
>   scmd->scsi_done(scmd);
>   return 1;
>  }
> @@ -10016,6 +10027,11 @@ static void scsih_remove(struct pci_dev *pdev)
>  
>   _scsih_ir_shutdown(ioc);
>   mpt3sas_base_detach(ioc);
> +
> + while (ioc->logging_level & 0x1000) {
> + pr_info(MPT3SAS_FMT "sleep on shutdown\n", ioc->name);
> + ssleep(1);
> + }
>  }

Hello Mauricio,

I think it would be useful to have some variant of the above code in the kernel
tree. Are you familiar with the fault injection framework (see also
 and Documentation/fault-injection/fault-injection.txt)?
Do you think that framework would be appropriate for controlling whether or not
the above code gets executed?

Thanks,

Bart.

Re: [PATCH] scsi_debug: implement IMMED bit

2018-01-31 Thread Douglas Gilbert

On 2018-01-31 05:05 PM, Bart Van Assche wrote:

On Wed, 2018-01-31 at 15:26 -0500, Douglas Gilbert wrote:

On 2018-01-31 12:06 PM, Bart Van Assche wrote:

On 01/29/18 21:54, Douglas Gilbert wrote:

+static const struct opcode_info_t sync_cache_iarr[] = {
+{0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL,
+{16,  0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,


^^^
Can you clarify the choice of "0x7" in the above? After having had a look at
SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other bits in
that command byte are either reserved or obsolete.


As a general rule when you see "obsolete" that means that field was used
in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So
application clients complying with earlier versions of SBC might set those
bits. If they are set and the mask is being enforced I choose to not fail
the command as an illegal request. Basically accept and ignore.


I agree with setting bits for obsolete flags. The reason I was asking about that
mask is because I found the following in SBC-4 for byte 1 of SYNCHRONIZE 
CACHE(16):
* Bits 7..3: reserved.
* Bit 2: obsolete.
* Bit 1: IMMED.
* Bit 0: reserved.


I did see the discrepancy between byte 1 bit 0 in SYNCHRONIZE CACHE(10)
where it is "obsolete" and SYNCHRONIZE CACHE(16) where it is "reserved"
initially thinking it might be a typo. So I went for the more permissive.
I'm now guessing (because I don't have the drafts between SBC(-1) and
SBC-2 and for some reason T10 wants you to be a member to download older
drafts) that RELADDR was removed before SYNCHRONIZE CACHE(16) was added
(as there was no 64 LBA support in SBC-1). So if there never was a
publicly available draft where SYNCHRONIZE CACHE(16) had a RELADDR bit
defined, then why mark it as obsolete? Reserved means a later
draft/standard may use that bit/field.

BTW There is no requirement for a device server to enforce the mask and
most don't (from my testing). The mask is provided as an aid to the
application client and is required by the implementation of REPORT
SUPPORTED OPERATION CODES which is why I have it in scsi_debug. By default
the scsi_debug driver does not enforce the mask (but does when strict=1).

Anyway, aren't we splitting hairs here? Does this really justify a new
patch being rolled?

Doug Gilbert


-return 0;
+return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */


Shouldn't the mask 0x2 be used to check for the IMMED bit?


That comment needs a little more context:


Sorry, I confused byte 1 of the START STOP command with that of the SYNCHRONIZE
CACHE command so please ignore the above comment.

Bart.





Re: [PATCH] scsi_debug: implement IMMED bit

2018-01-31 Thread Bart Van Assche
On Wed, 2018-01-31 at 15:26 -0500, Douglas Gilbert wrote:
> On 2018-01-31 12:06 PM, Bart Van Assche wrote:
> > On 01/29/18 21:54, Douglas Gilbert wrote:
> > > +static const struct opcode_info_t sync_cache_iarr[] = {
> > > +{0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL,
> > > +{16,  0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > 
> >^^^
> > Can you clarify the choice of "0x7" in the above? After having had a look 
> > at 
> > SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other bits 
> > in 
> > that command byte are either reserved or obsolete.
> 
> As a general rule when you see "obsolete" that means that field was used
> in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So
> application clients complying with earlier versions of SBC might set those
> bits. If they are set and the mask is being enforced I choose to not fail
> the command as an illegal request. Basically accept and ignore.

I agree with setting bits for obsolete flags. The reason I was asking about that
mask is because I found the following in SBC-4 for byte 1 of SYNCHRONIZE 
CACHE(16):
* Bits 7..3: reserved.
* Bit 2: obsolete.
* Bit 1: IMMED.
* Bit 0: reserved.

> > > -return 0;
> > > +return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit 
> > > */
> > 
> > Shouldn't the mask 0x2 be used to check for the IMMED bit?
> 
> That comment needs a little more context:

Sorry, I confused byte 1 of the START STOP command with that of the SYNCHRONIZE
CACHE command so please ignore the above comment.

Bart.

RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests

2018-01-31 Thread Long Li
> Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a
> channel for I/O requests
> 
> > From: Long Li
> > Sent: Wednesday, January 31, 2018 12:23 PM
> > To: Michael Kelley (EOSG) ; KY
> > Srinivasan ; Stephen Hemminger
> > ; martin.peter...@oracle.com;
> > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org;
> > linux-scsi@vger.kernel.org; James E . J . Bottomley
> > 
> > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking
> > a channel for I/O requests
> >
> > > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when
> > > picking a channel for I/O requests
> > >
> > > Updated/corrected two email addresses ...
> > >
> > > > -Original Message-
> > > > From: Michael Kelley (EOSG)
> > > > Sent: Wednesday, January 24, 2018 2:14 PM
> > > > To: KY Srinivasan ; Stephen Hemminger
> > > > ; martin.peter...@oracle.com;
> > > > lo...@microsoft.com; jbottom...@odin.com;
> > > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org;
> > > > linux-scsi@vger.kernel.org
> > > > Cc: Michael Kelley (EOSG) 
> > > > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking
> > > > a channel for I/O requests
> > > >
> > > > Update the algorithm in storvsc_do_io to look for a channel
> > > > starting with the current CPU + 1 and wrap around (within the
> > > > current NUMA node). This spreads VMbus interrupts more evenly
> > > > across CPUs. Previous code always started with first CPU in the
> > > > current NUMA node, skewing the interrupt load to that CPU.
> > > >
> > > > Signed-off-by: Michael Kelley 

Reviewed-by: Long Li 

> > > > ---
> > > >  drivers/scsi/storvsc_drv.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/scsi/storvsc_drv.c
> > > > b/drivers/scsi/storvsc_drv.c index e07907d..f3264c4 100644
> > > > --- a/drivers/scsi/storvsc_drv.c
> > > > +++ b/drivers/scsi/storvsc_drv.c
> > > > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device
> *device,
> > > >  */
> > > > cpumask_and(_mask, _device-
> alloced_cpus,
> > > >
> > > cpumask_of_node(cpu_to_node(q_num)));
> > > > -   for_each_cpu(tgt_cpu, _mask) {
> > > > +   for_each_cpu_wrap(tgt_cpu, _mask,
> > > > +   outgoing_channel->target_cpu + 
> > > > 1) {
> >
> > Does it work when target_cpu is the last CPU on the system?
> >
> > Otherwise, looking good.
> 
> Yes, it works.  for_each_cpu_wrap() correctly wraps in the case where the
> 3rd parameter ('start') is one past the end of the mask.  Arguably, we
> shouldn't rely on that, and should do the wrap to 0 before calling
> for_each_cpu_wrap().
> 
> >
> > > > if (tgt_cpu != 
> > > > outgoing_channel->target_cpu)
> > > {
> > > > outgoing_channel =
> > > > stor_device->stor_chns[tgt_cpu];
> > > > --
> > > > 1.8.3.1


RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests

2018-01-31 Thread Michael Kelley (EOSG)
> From: Long Li
> Sent: Wednesday, January 31, 2018 12:23 PM
> To: Michael Kelley (EOSG) ; KY Srinivasan
> ; Stephen Hemminger ;
> martin.peter...@oracle.com; de...@linuxdriverproject.org; 
> linux-ker...@vger.kernel.org;
> linux-scsi@vger.kernel.org; James E . J . Bottomley 
> Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a 
> channel for I/O
> requests
> 
> > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a
> > channel for I/O requests
> >
> > Updated/corrected two email addresses ...
> >
> > > -Original Message-
> > > From: Michael Kelley (EOSG)
> > > Sent: Wednesday, January 24, 2018 2:14 PM
> > > To: KY Srinivasan ; Stephen Hemminger
> > > ; martin.peter...@oracle.com;
> > > lo...@microsoft.com; jbottom...@odin.com;
> > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org;
> > > linux-scsi@vger.kernel.org
> > > Cc: Michael Kelley (EOSG) 
> > > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a
> > > channel for I/O requests
> > >
> > > Update the algorithm in storvsc_do_io to look for a channel starting
> > > with the current CPU + 1 and wrap around (within the current NUMA
> > > node). This spreads VMbus interrupts more evenly across CPUs. Previous
> > > code always started with first CPU in the current NUMA node, skewing
> > > the interrupt load to that CPU.
> > >
> > > Signed-off-by: Michael Kelley 
> > > ---
> > >  drivers/scsi/storvsc_drv.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index e07907d..f3264c4 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device,
> > >*/
> > >   cpumask_and(_mask, _device-
> > >alloced_cpus,
> > >
> > cpumask_of_node(cpu_to_node(q_num)));
> > > - for_each_cpu(tgt_cpu, _mask) {
> > > + for_each_cpu_wrap(tgt_cpu, _mask,
> > > + outgoing_channel->target_cpu + 1) {
> 
> Does it work when target_cpu is the last CPU on the system?
> 
> Otherwise, looking good.

Yes, it works.  for_each_cpu_wrap() correctly wraps in the case where
the 3rd parameter ('start') is one past the end of the mask.  Arguably,
we shouldn't rely on that, and should do the wrap to 0 before calling
for_each_cpu_wrap().

> 
> > >   if (tgt_cpu != outgoing_channel->target_cpu)
> > {
> > >   outgoing_channel =
> > >   stor_device->stor_chns[tgt_cpu];
> > > --
> > > 1.8.3.1


Re: [PATCH] scsi_debug: implement IMMED bit

2018-01-31 Thread Douglas Gilbert

On 2018-01-31 12:06 PM, Bart Van Assche wrote:

On 01/29/18 21:54, Douglas Gilbert wrote:

+static const struct opcode_info_t sync_cache_iarr[] = {
+    {0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL,
+    {16,  0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,

   ^^^
Can you clarify the choice of "0x7" in the above? After having had a look at 
SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other bits in 
that command byte are either reserved or obsolete.


As a general rule when you see "obsolete" that means that field was used
in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So
application clients complying with earlier versions of SBC might set those
bits. If they are set and the mask is being enforced I choose to not fail
the command as an illegal request. Basically accept and ignore.


-    return 0;
+    return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */


Shouldn't the mask 0x2 be used to check for the IMMED bit?


That comment needs a little more context:

@@ -1597,7 +1614,7 @@ static int resp_start_stop(struct scsi_cmnd * scp,
}
stop = !(cmd[4] & 1);
atomic_xchg(>stopped, stop);
-   return 0;
+   return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */


For START STOP UNIT the IMMED flag is byte 1, bit 0. So that code
is correct IMO.


And for SYNCHRONIZE CACHE(10 and 16) the IMMED flag is byte 1 bit 1
so the corresponding return statement is:

+   return (cmd[1] & 0x2) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */


Doug Gilbert


RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests

2018-01-31 Thread Long Li
> Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a
> channel for I/O requests
> 
> Updated/corrected two email addresses ...
> 
> > -Original Message-
> > From: Michael Kelley (EOSG)
> > Sent: Wednesday, January 24, 2018 2:14 PM
> > To: KY Srinivasan ; Stephen Hemminger
> > ; martin.peter...@oracle.com;
> > lo...@microsoft.com; jbottom...@odin.com;
> > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org;
> > linux-scsi@vger.kernel.org
> > Cc: Michael Kelley (EOSG) 
> > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a
> > channel for I/O requests
> >
> > Update the algorithm in storvsc_do_io to look for a channel starting
> > with the current CPU + 1 and wrap around (within the current NUMA
> > node). This spreads VMbus interrupts more evenly across CPUs. Previous
> > code always started with first CPU in the current NUMA node, skewing
> > the interrupt load to that CPU.
> >
> > Signed-off-by: Michael Kelley 
> > ---
> >  drivers/scsi/storvsc_drv.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index e07907d..f3264c4 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device,
> >  */
> > cpumask_and(_mask, _device-
> >alloced_cpus,
> >
> cpumask_of_node(cpu_to_node(q_num)));
> > -   for_each_cpu(tgt_cpu, _mask) {
> > +   for_each_cpu_wrap(tgt_cpu, _mask,
> > +   outgoing_channel->target_cpu + 1) {

Does it work when target_cpu is the last CPU on the system?

Otherwise, looking good.

> > if (tgt_cpu != outgoing_channel->target_cpu)
> {
> > outgoing_channel =
> > stor_device->stor_chns[tgt_cpu];
> > --
> > 1.8.3.1


Re: [GIT PULL] first round of SCSI updates for the 4.14+ merge window

2018-01-31 Thread James Bottomley
On Wed, 2018-01-31 at 11:29 -0800, Linus Torvalds wrote:
> On Wed, Jan 31, 2018 at 9:42 AM, James Bottomley
>  wrote:
> > 
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-
> > misc
> 
> Ok, now I did indeed get that
> 
>   gpg: Can't check signature: unknown pubkey algorithm
> 
> because of your fancy new key. But making git use gpg2 indeed fixes
> it, since I have a recent enough version.
> 
> Let's see if anybody else even notices. I don't think lots of people
> look at the signatures.

Great! thanks for being the guinea pig.  At least we know the theory is
now tested.

Regards,

James



Re: [PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload

2018-01-31 Thread Mauricio Faria de Oliveira

Bart,

Thanks for reviewing.

On 01/31/2018 05:06 PM, Bart Van Assche wrote:

Sorry but I think this patch introduces new race conditions. Have you


Can you detail the race conditions? As far as I can see, the only race
condition would be when an error handler is invoked very close in time
to the shutdown/unload handler, and as such miss the 'ioc->remove_host'
assignment (thus it continues running anyway, and hit the oops).

But this problem has never been reported, so I'd think that case would
be rare.  Not that it does not need to be perfectly solved, but _if_
that is the only problem with the patch (and _if_ I got that right),
then it would still be better than the current code in oops.

> considered to modify the mpt3sas driver such that interrupts are only
> disabled after all commands have finished?

Yes, I considered introducing waits in several points, but since the
driver is tearing down, the only point which it seem to make sense to
insert is right at the beginning of shutdown/unload -- but it seemed
not sufficient: imagine not all commands finish, which would block that
path, so we need a time out mechanism, but that does not guarantee
the call back won't occur later on (say, a while after we gave up),
so we would still need a guard at the callbacks.

I also considered checking for the particular pointer which is set
to NULL and cause the Oops, but that is a series of 10ish (small)
patches which must check 'ioc->scsi_lookup' in a spinlock, so it
seemed a bit bigger than what was needed.. and would still be prone
to hitting an oops due to another pointer at a later time.

So after thinking about those, I went with the simplest approach.

I'd be happy to consider the race conditions you thought of,
provided more detail.

thanks,
Mauricio



Re: [GIT PULL] first round of SCSI updates for the 4.14+ merge window

2018-01-31 Thread Linus Torvalds
On Wed, Jan 31, 2018 at 9:42 AM, James Bottomley
 wrote:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc

Ok, now I did indeed get that

  gpg: Can't check signature: unknown pubkey algorithm

because of your fancy new key. But making git use gpg2 indeed fixes
it, since I have a recent enough version.

Let's see if anybody else even notices. I don't think lots of people
look at the signatures.

  Linus


[PATCH] test-case

2018-01-31 Thread Mauricio Faria de Oliveira
This patch can be verified with this simple test-case,
which inserts a wait loop at the bottom of 'scsih_shutdown()'
and forces SCSI commands to timeout (skip 'scmd->scsi_done()').

It abuses the 'ioc->logging_level' parameter do to that, with:
- 0x1000: wait loop on scsih_shutdown() and skip scsi_done()
- 0x0100: force scsih_abort() to return FAILED early,
  so to run device/target/host reset.

Oops in scsih_abort()
=

Test-case:

# echo 0x1000 > /sys/module/mpt3sas/parameters/logging_level
# dd if=/dev/sdf of=/dev/null count=1 iflag=direct &
# kexec --force --append="$(cat /proc/cmdline)" 
--initrd=/boot/initrd.img-4.15.0 /boot/vmlinux-4.15.0

Without patch:

[  141.936251] setting logging_level(0x1000)
[  141.977920] mpt3sas_cm0: skip scsi_done scmd(a85f0166)
[  147.927561] sd 16:0:1:0: [sdf] Synchronizing SCSI cache
[  147.927831] sd 16:0:0:0: [sde] Synchronizing SCSI cache
[  147.928090] mpt3sas_cm0: sending diag reset !!
[  149.041346] mpt3sas_cm0: diag reset: SUCCESS
[  149.057985] mpt3sas_cm0: sleep on shutdown
[  150.098619] mpt3sas_cm0: sleep on shutdown
[  151.138571] mpt3sas_cm0: sleep on shutdown
...
[  171.938245] mpt3sas_cm0: sleep on shutdown
[  172.678231] sd 16:0:1:0: attempting task abort! 
scmd(a85f0166)
...
[  172.678545] Unable to handle kernel paging request for data at 
address 0x0008
[  172.678600] Faulting instruction address: 0xd0001789e8c0
[  172.678648] Oops: Kernel access of bad area, sig: 11 [#1]
...
[  172.679804] NIP [d0001789e8c0] scsih_abort+0xc0/0x290 [mpt3sas]
[  172.679854] LR [d0001789e8a8] scsih_abort+0xa8/0x290 [mpt3sas]
[  172.679903] Call Trace:
[  172.679926] [c01fed68fbc0] [d0001789e8a8] 
scsih_abort+0xa8/0x290 [mpt3sas] (unreliable)
[  172.679994] [c01fed68fc50] [c075a274] 
scmd_eh_abort_handler+0xc4/0x1a0
[  172.680053] [c01fed68fc90] [c00fea88] 
process_one_work+0x188/0x450
[  172.680109] [c01fed68fd20] [c00fede8] 
worker_thread+0x98/0x550
[  172.680157] [c01fed68fdc0] [c0107344] kthread+0x164/0x1b0
[  172.680206] [c01fed68fe30] [c000b6e0] 
ret_from_kernel_thread+0x5c/0x7c
[  172.680261] Instruction dump:
...

With patch:

[  233.259952] setting logging_level(0x1000)
[  233.290008] mpt3sas_cm0: skip scsi_done scmd(7ec97dda)
[  234.600934] sd 16:0:1:0: [sdf] Synchronizing SCSI cache
[  234.601222] sd 16:0:0:0: [sde] Synchronizing SCSI cache
[  234.601470] mpt3sas_cm0: sending diag reset !!
[  235.718433] mpt3sas_cm0: diag reset: SUCCESS
[  235.734534] mpt3sas_cm0: sleep on shutdown
[  236.805704] mpt3sas_cm0: sleep on shutdown
[  237.845708] mpt3sas_cm0: sleep on shutdown
...
[  263.845781] mpt3sas_cm0: sleep on shutdown
[  264.185782] sd 16:0:1:0: attempting task abort! 
scmd(7ec97dda)
...
[  264.186104] sd 16:0:1:0: task abort: SUCCESS scmd(7ec97dda)
[  264.225788] sd 16:0:1:0: [sdf] tag#0 FAILED Result: 
hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  264.225910] sd 16:0:1:0: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 00 
00 00 01 00
[  264.225969] print_req_error: I/O error, dev sdf, sector 0

Oops in scsih_host_reset()
==

Test-case:

# echo 0x1100 > /sys/module/mpt3sas/parameters/logging_level
# dd if=/dev/sdf of=/dev/null count=1 iflag=direct &
# kexec --force --append="$(cat /proc/cmdline)" 
--initrd=/boot/initrd.img-4.15.0 /boot/vmlinux-4.15.0

Without patch:

[  241.734670] setting logging_level(0x1100)
[  251.587765] mpt3sas_cm0: skip scsi_done scmd(60a524f9)
[  252.771054] sd 16:0:1:0: [sdf] Synchronizing SCSI cache
[  252.771335] sd 16:0:0:0: [sde] Synchronizing SCSI cache
[  252.771582] mpt3sas_cm0: sending diag reset !!
[  253.889275] mpt3sas_cm0: diag reset: SUCCESS
[  253.906315] mpt3sas_cm0: sleep on shutdown
[  254.966487] mpt3sas_cm0: sleep on shutdown
[  256.006417] mpt3sas_cm0: sleep on shutdown
...
[  282.005452] mpt3sas_cm0: sleep on shutdown
[  282.105416] sd 16:0:1:0: attempting task abort! 
scmd(60a524f9)
...
[  282.105707] mpt3sas_cm0: fail task abort scmd(60a524f9)
[  282.105754] sd 16:0:1:0: task abort: FAILED scmd(60a524f9)
[  282.105811] sd 16:0:1:0: attempting device reset! 
scmd(60a524f9)
...
[  282.106087] mpt3sas_scsih_issue_tm: mpt3sas_cm0: host reset in 
progress!
[  282.106136] sd 16:0:1:0: device reset: FAILED 

Re: [LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Scotty Bauer

On 2018-01-31 01:23, Ming Lei wrote:

Hi All,

After KPTI is merged, there is extra load introduced to context switch
between user space and kernel space. It is observed on my laptop that 
one

syscall takes extra ~0.15us[1] compared with 'nopti'.

IO performance is affected too, it is observed that IOPS drops by 32% 
in

my test[2] on null_blk compared with 'nopti':

randread IOPS on latest linus tree:
-
| randread IOPS | randread IOPS with 'nopti'|

| 928K  | 1372K |





Do you know if your CPU has PCID? It would be interesting to see these 
tests on older CPUs or older kernels without PCID support.


Re: [PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload

2018-01-31 Thread Bart Van Assche
On Wed, 2018-01-31 at 17:00 -0200, Mauricio Faria de Oliveira wrote:
> This problem was observed with kexec on a system with a mpt3sas
> based adapter and an infiniband adapter which takes long enough
> to shutdown.  The mpt3sas driver finished shuttting down, which
> disabled interruption handling, thus some running commands have
> not finished and timed out.  Since the system was still running,
> waiting for the infiniband adapter to shut down, the scsi error
> handler for task abort of mpt3sas was invoked, and hit the oops
> because 'ioc->scsi_lookup' was NULL.

Sorry but I think this patch introduces new race conditions. Have you
considered to modify the mpt3sas driver such that interrupts are only
disabled after all commands have finished?

Thanks,

Bart.

[PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload

2018-01-31 Thread Mauricio Faria de Oliveira
This patch adds checks for 'ioc->remove_host' in the SCSI error
handlers, so not to access pointers/resources potentially freed
in the PCI shutdown/module unload path.  The error handlers may
be invoked after shutdown/unload, depending on other components.

This problem was observed with kexec on a system with a mpt3sas
based adapter and an infiniband adapter which takes long enough
to shutdown.  The mpt3sas driver finished shuttting down, which
disabled interruption handling, thus some running commands have
not finished and timed out.  Since the system was still running,
waiting for the infiniband adapter to shut down, the scsi error
handler for task abort of mpt3sas was invoked, and hit the oops
because 'ioc->scsi_lookup' was NULL.

During patch testing, a different oops happens if host reset is
reached (when it eventually calls 'mpt3sas_base_get_iocstate()').

The device reset and target reset handlers do not cause oopses,
but print a misleading message of host reset in progress, thus
fix those too.

Signed-off-by: Mauricio Faria de Oliveira 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b258f21..3c4e47c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3007,6 +3007,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER 
*ioc, u16 handle,
goto out;
}
 
+   if (ioc->remove_host) {
+   scmd->result = DID_NO_CONNECT << 16;
+   scmd->scsi_done(scmd);
+   r = SUCCESS;
+   goto out;
+   }
+
/* search for the command */
smid = _scsih_scsi_lookup_find_by_scmd(ioc, scmd);
if (!smid) {
@@ -3069,6 +3076,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER 
*ioc, u16 handle,
goto out;
}
 
+   if (ioc->remove_host) {
+   scmd->result = DID_NO_CONNECT << 16;
+   scmd->scsi_done(scmd);
+   r = SUCCESS;
+   goto out;
+   }
+
/* for hidden raid components obtain the volume_handle */
handle = 0;
if (sas_device_priv_data->sas_target->flags &
@@ -3131,6 +3145,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER 
*ioc, u16 handle,
goto out;
}
 
+   if (ioc->remove_host) {
+   scmd->result = DID_NO_CONNECT << 16;
+   scmd->scsi_done(scmd);
+   r = SUCCESS;
+   goto out;
+   }
+
/* for hidden raid components obtain the volume_handle */
handle = 0;
if (sas_device_priv_data->sas_target->flags &
@@ -3179,6 +3200,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER 
*ioc, u16 handle,
ioc->name, scmd);
scsi_print_command(scmd);
 
+   if (ioc->remove_host) {
+   scmd->result = DID_NO_CONNECT << 16;
+   scmd->scsi_done(scmd);
+   r = FAILED;
+   goto out;
+   }
+
if (ioc->is_driver_loading) {
pr_info(MPT3SAS_FMT "Blocking the host reset\n",
ioc->name);
-- 
1.8.3.1



RE: [PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed devices

2018-01-31 Thread KY Srinivasan


> -Original Message-
> From: Michael Kelley (EOSG)
> Sent: Wednesday, January 24, 2018 2:50 PM
> To: KY Srinivasan ; Stephen Hemminger
> ; martin.peter...@oracle.com; Long Li
> ; j...@linux.vnet.ibm.com;
> de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; linux-
> s...@vger.kernel.org
> Cc: Michael Kelley (EOSG) 
> Subject: [PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed
> devices
> 
> Increase cmd_per_lun to allow more I/Os in progress per device,
> particularly for NVMe's.  The Hyper-V host side can handle the
> higher count with no issues.
> 
> Signed-off-by: Michael Kelley 

Reviewed-by: K. Y. Srinivasan 
Acked-by: K. Y. Srinivasan 

Thanks,

K. Y
> ---
>  drivers/scsi/storvsc_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index f3264c4..6205107 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1657,7 +1657,7 @@ static int storvsc_queuecommand(struct Scsi_Host
> *host, struct scsi_cmnd *scmnd)
>   .eh_timed_out = storvsc_eh_timed_out,
>   .slave_alloc =  storvsc_device_alloc,
>   .slave_configure =  storvsc_device_configure,
> - .cmd_per_lun =  255,
> + .cmd_per_lun =  2048,
>   .this_id =  -1,
>   .use_clustering =   ENABLE_CLUSTERING,
>   /* Make sure we dont get a sg segment crosses a page boundary */
> --
> 1.8.3.1


Re: [PATCH resend 3/6] cdrom: wait for tray to close

2018-01-31 Thread Michal Suchánek
On Mon, 29 Jan 2018 17:05:47 +
Bart Van Assche  wrote:

> On Fri, 2018-01-26 at 17:58 +0100, Michal Suchanek wrote:
> > +static int cdrom_tray_close(struct cdrom_device_info *cdi)
> > +{
> > +   int ret;
> > +
> > +   ret = cdi->ops->tray_move(cdi, 0);
> > +   if (ret || !cdi->ops->drive_status)
> > +   return ret;
> > +
> > +   return poll_event_interruptible(CDS_TRAY_OPEN !=
> > +   cdi->ops->drive_status(cdi, CDSL_CURRENT),
> > 500); +}
> > +
> >  static
> >  int open_for_common(struct cdrom_device_info *cdi, tracktype
> > *tracks) {
> > @@ -1048,7 +1062,9 @@ int open_for_common(struct cdrom_device_info
> > *cdi, tracktype *tracks) if (CDROM_CAN(CDC_CLOSE_TRAY) &&
> > cdi->options & CDO_AUTO_CLOSE) {
> > cd_dbg(CD_OPEN, "trying to close
> > the tray\n");
> > -   ret = cdo->tray_move(cdi, 0);
> > +   ret = cdrom_tray_close(cdi);
> > +   if (ret == -ERESTARTSYS)
> > +   return ret;
> > if (ret) {
> > cd_dbg(CD_OPEN, "bummer.
> > tried to close the tray but failed.\n"); /* Ignore the error from
> > the low @@ -2312,7 +2328,8 @@ static int
> > cdrom_ioctl_closetray(struct cdrom_device_info *cdi) 
> > if (!CDROM_CAN(CDC_CLOSE_TRAY))
> > return -ENOSYS;
> > -   return cdi->ops->tray_move(cdi, 0);
> > +
> > +   return cdrom_tray_close(cdi);
> >  }  
> 
> So this patch changes code that does not wait into code that
> potentially waits forever? Sorry but I don't think that's ideal.
> Please make sure that after a certain time (a few seconds?) the loop
> finishes.

The problem is that few seconds does not cut it. We are waiting for a
mechanical tray or CD changer to move. On non-broken hardware the tray
either closes or an error is reported within a few seconds. For the
timeout to not race with the event we are waiting for it must be much
longer, though.

Also note that this code is only invoked when the user specifically
requested that the tray gets closed automatically which is off by
default.

Thanks

Michal


Sorry about all the patch dropped email noise

2018-01-31 Thread James Bottomley
If you received it, just ignore it.  I didn't initialize my fixes tree
correctly before pulling in Martin's branch, so it wrongly sent a patch
dropped email for every patch in the misc tree (which has already been
routed to Linus).

James



[GIT PULL] first round of SCSI updates for the 4.14+ merge window

2018-01-31 Thread James Bottomley
This is mostly updates of the usual driver suspects: arcmsr,
scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
hisi_sas.  We also have a rework of the libsas hotplug handling to make
it more robust, a slew of 32 bit time conversions and fixes, and a host
of the usual minor updates and style changes.  The biggest potential
for regressions is the libsas hotplug changes, but so far they seem
stable under testing.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc

The short changelog is:

Andy Shevchenko (3):
  scsi: mptfusion: Use snprintf() instead of open coded divisions
  scsi: hpsa: Use vsnprintf extension %phN
  scsi: libsas: remove private hex2bin() implementation

Anil Gurumurthy (1):
  scsi: qla2xxx: Add XCB counters to debugfs

Arnd Bergmann (16):
  scsi: arcmsr: avoid do_gettimeofday
  scsi: fas216: fix sense buffer initialization
  scsi: megaraid: use ktime_get_real for firmware time
  scsi: fnic: use 64-bit timestamps
  scsi: bfa: convert to strlcpy/strlcat
  scsi: scsi_debug: remove jiffies_to_timespec
  scsi: 3w-9xxx: rework lock timeouts
  scsi: 3ware: use 64-bit times for FW time sync
  scsi: 3ware: fix 32-bit time calculations
  scsi: bfa: use 64-bit times in bfa_aen_entry_s ABI
  scsi: bfa: try to sanitize vendor netlink events
  scsi: bfa: replace bfa_get_log_time() with ktime_get_real_seconds()
  scsi: bfa: document overflow of io_profile_start_time
  scsi: bfa: improve bfa_ioc_send_enable/disable data
  scsi: bfa: use proper time accessor for stats_reset_time
  scsi: bfa: use ktime_get_real_ts64 for firmware timestamp

Bart Van Assche (9):
  scsi: core: Change third __scsi_queue_insert() argument from int to bool
  scsi: qla2xxx: Suppress gcc 7 fall-through warnings
  scsi: scsi_debug: Add support for injecting SCSI_MLQUEUE_HOST_BUSY
  scsi: dh: Remove scsi_dh_remove_device()
  scsi: core: Unexport scsi_initialize_rq()
  scsi: core: Introduce scsi_devinfo_key enumeration type
  scsi: core: scsi_get_device_flags_keyed(): Always return device flags
  scsi: core: Convert a source code comment into a runtime check
  scsi: core: Ensure that the SCSI error handler gets woken up

Bryant G. Ly (1):
  scsi: ibmvscsis: add DRC indices to debug statements

Chaitra P B (1):
  scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag.

Ching Huang (23):
  scsi: arcmsr: simplify arcmsr_request_device_map routine
  scsi: arcmsr: simplify all arcmsr_hbaX_get_config routine by call a new 
get_adapter_config function
  scsi: arcmsr: simplify arcmsr_hbaE_get_config function
  scsi: arcmsr: waiting for iop firmware ready before issue get_config 
command to iop
  scsi: arcmsr: simplify arcmsr_hbaC_get_config function
  scsi: arcmsr: Fix command result for CHECK_CONDITION
  scsi: arcmsr: Update driver version to v1.40.00.04-20171130
  scsi: arcmsr: Add driver module parameter msix_enable
  scsi: arcmsr: Add driver module parameter msi_enable
  scsi: arcmsr: Fix grammar
  scsi: arcmsr: Adjust whitespace
  scsi: arcmsr: Spin off duplicate code
  scsi: arcmsr: Fix clear doorbell queue on ACB_ADAPTER_TYPE_B
  scsi: arcmsr: Add a function to set date and time to firmware
  scsi: arcmsr: Add ACB_F_MSG_GET_CONFIG to acb->acb_flags
  scsi: arcmsr: Add driver option cmd_per_lun
  scsi: arcmsr: Replace constant ARCMSR_MAX_OUTSTANDING_CMD
  scsi: arcmsr: Add driver option host_can_queue
  scsi: arcmsr: replace constant ARCMSR_MAX_FREECCB_NUM
  scsi: arcmsr: Increase host controller command queue depth
  scsi: arcmsr: Add code for ACB_ADAPTER_TYPE_E
  scsi: arcmsr: simplify arcmsr_iop_init function
  scsi: arcmsr: Redefine ACB_ADAPTER_TYPE_A, _B, _C, _D

Christopher Díaz Riveros (1):
  scsi: ibmvfc: Remove unneeded semicolons

Colin Ian King (15):
  scsi: mptsas: remove duplicated assignment to pointer head
  scsi: mpt3sas: make function _get_st_from_smid static
  scsi: bfa: use ARRAY_SIZE for array sizing calculation on array __pciids
  scsi: qla2xxx: remove redundant assignment of d
  scsi: aacraid: remove redundant setting of variable c
  scsi: lpfc: fix a couple of minor indentation issues
  scsi: lpfc: don't dereference localport before it has been null checked
  scsi: arcmsr: remove redundant check for secs < 0
  scsi: fusion: clean up some indentations
  scsi: ipr: fix incorrect indentation of assignment statement
  scsi: csiostor: fix spelling mistake: "Couldnt" -> "Couldn't"
  scsi: bnx2fc: fix spelling mistake: "Couldnt" -> "Couldn't"
  scsi: wd719x: make card_types static const, shrinks object size
  scsi: bfa: remove unused pointer 'port'
  scsi: aacraid: remove unused variable managed_request_id

Dan Carpenter (2):
  scsi: storvsc: missing error 

Re: [PATCH] scsi_debug: implement IMMED bit

2018-01-31 Thread Bart Van Assche

On 01/29/18 21:54, Douglas Gilbert wrote:

+static const struct opcode_info_t sync_cache_iarr[] = {
+   {0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL,
+   {16,  0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,

  ^^^
Can you clarify the choice of "0x7" in the above? After having had a 
look at SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all 
other bits in that command byte are either reserved or obsolete.



-   return 0;
+   return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */


Shouldn't the mask 0x2 be used to check for the IMMED bit?

Thanks,

Bart.


Re: [LSF/MM TOPIC] Patch Submission process and Handling Internal Conflict

2018-01-31 Thread Eric Sandeen
On 1/24/18 5:43 PM, Darrick J. Wong wrote:
> On Wed, Jan 24, 2018 at 01:36:00PM -0800, James Bottomley wrote:
>> On Wed, 2018-01-24 at 11:20 -0800, Mike Kravetz wrote:
>>> On 01/24/2018 11:05 AM, James Bottomley wrote:

 I've got two community style topics, which should probably be
 discussed
 in the plenary

 1. Patch Submission Process

 Today we don't have a uniform patch submission process across
 Storage, Filesystems and MM.  The question is should we (or at
 least should we adhere to some minimal standards).  The standard
 we've been trying to hold to in SCSI is one review per accepted
 non-trivial patch.  For us, it's useful because it encourages
 driver writers to review each other's patches rather than just
 posting and then complaining their patch hasn't gone in.  I can
 certainly think of a couple of bugs I've had to chase in mm where
 the underlying patches would have benefited from review, so I'd
 like to discuss making the one review per non-trival patch our base
 minimum standard across the whole of LSF/MM; it would certainly
 serve to improve our Reviewed-by statistics.
>>>
>>> Well, the mm track at least has some discussion of this last year:
>>> https://lwn.net/Articles/718212/
>>
>> The pushback in your session was mandating reviews would mean slowing
>> patch acceptance or possibly causing the dropping of patches that
>> couldn't get reviewed.  Michal did say that XFS didn't have the
>> problem, however there not being XFS people in the room, discussion
>> stopped there.
> 
> I actually /was/ lurking in the session, but a year later I have more
> thoughts:
> 
> Now that I've been maintainer for more than a year I feel more confident
> in actually talking about our review processes, though I can only speak
> about my own experiences and hope the other xfs developers chime in if
> they choose.



Mandating reviews certainly can slow down patch acceptance, though I'd
expect that any good maintainer will be doing at least cursory review
before commit; when the maintainer writes patches themselves, they /are/
then at the mercy of others for an RVB: tag.  That hasn't in general
been a huge problem for us, though things do sometimes require a bit of
poking and prodding.  But I think that's a feature not a bug.  Obtaining
at least one meaningful review means that someone else has at least
some familiarity with the new code.

In the XFS community, in reality we have only about 4 kernelspace
reviewers, with a /very/ long tail of onesey-twosies; since v4.12:


  2 Reviewed-by: Allison Henderson 
  2 Reviewed-by: Eric Sandeen 
  3 Reviewed-by: Amir Goldstein 
  4 Reviewed-by: Ross Zwisler 
  6 Reviewed-by: Andy Shevchenko 
  6 Reviewed-by: Jan Kara 
 10 Reviewed-by: Carlos Maiolino 
 60 Reviewed-by: Christoph Hellwig 
104 Reviewed-by: Dave Chinner 
109 Reviewed-by: Brian Foster 
208 Reviewed-by: Darrick J. Wong 

In userspace things look a little different in the same time period:

  1 Reviewed-by: Allison Henderson 
  1 Reviewed-by: Bill O'Donnell 
  1 Reviewed-by: Eric Sandeen 
  3 Reviewed-by: Dave Chinner 
 11 Reviewed-by: Carlos Maiolino 
 12 Reviewed-by: Christoph Hellwig 
 25 Reviewed-by: Brian Foster 
 37 Reviewed-by: Darrick J. Wong 
 44 Reviewed-by: Eric Sandeen 

Unsurprisingly(?) the maintainers still bear a lot of the review burden, but
the same workhorse rock stars are clearly present.  In reality it's something
we need to work on, to try to get more people participating in meaningful 
review,
both to speed up the cycle and to grow community knowledge.

Another thing that Darrick and I have bounced around a little bit is
the adequacy of email for significant review of large feature patchsets.
On the one hand, we'd like centralized review with archives, because
that's useful to future code archaeologists.  On the other hand, I can't
help but think that something like Github's ability to mark up 
comments line by line would have some advantages, particularly for
brand new code.



As for the question of conflict, I'm not sure what to say...  The XFS
development team has been lucky(?) to have been living in relative peace
and harmony for the past few years.  Speaking for myself, I try to
be aware of getting too nitpicky or enforcing preferences vs. requirements,
and I make an effort to reach out and check in with patch 

Re: [PATCH] scsi_debug: Simplify request tag decoding

2018-01-31 Thread Douglas Gilbert

On 2018-01-26 11:52 AM, Bart Van Assche wrote:

Since commit 64d513ac31bd ("scsi: use host wide tags by default")
all SCSI requests have a tag, whether or not scsi-mq is enabled.
Additionally, it is safe to use blk_mq_unique_tag() and
blk_mq_unique_tag_to_hwq() for legacy SCSI queues. Since this
means that the sdebug_mq_active variable is superfluous, remove it.

Signed-off-by: Bart Van Assche 
Cc: Douglas Gilbert 


Ack-ed by Douglas Gilbert 


Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
  drivers/scsi/scsi_debug.c | 37 +++--
  1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index a5986dae9020..7d2ce0cc915a 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -649,7 +649,6 @@ static bool sdebug_any_injecting_opt;
  static bool sdebug_verbose;
  static bool have_dif_prot;
  static bool sdebug_statistics = DEF_STATISTICS;
-static bool sdebug_mq_active;
  
  static unsigned int sdebug_store_sectors;

  static sector_t sdebug_capacity;  /* in sectors */
@@ -3727,20 +3726,13 @@ static int resp_xdwriteread_10(struct scsi_cmnd *scp,
  
  static struct sdebug_queue *get_queue(struct scsi_cmnd *cmnd)

  {
-   struct sdebug_queue *sqp = sdebug_q_arr;
+   u32 tag = blk_mq_unique_tag(cmnd->request);
+   u16 hwq = blk_mq_unique_tag_to_hwq(tag);
  
-	if (sdebug_mq_active) {

-   u32 tag = blk_mq_unique_tag(cmnd->request);
-   u16 hwq = blk_mq_unique_tag_to_hwq(tag);
-
-   if (unlikely(hwq >= submit_queues)) {
-   pr_warn("Unexpected hwq=%d, apply modulo\n", hwq);
-   hwq %= submit_queues;
-   }
-   pr_debug("tag=%u, hwq=%d\n", tag, hwq);
-   return sqp + hwq;
-   } else
-   return sqp;
+   pr_debug("tag=%#x, hwq=%d\n", tag, hwq);
+   if (WARN_ON_ONCE(hwq >= submit_queues))
+   hwq = 0;
+   return sdebug_q_arr + hwq;
  }
  
  /* Queued (deferred) command completions converge here. */

@@ -4587,9 +4579,8 @@ static int scsi_debug_show_info(struct seq_file *m, 
struct Scsi_Host *host)
   num_host_resets);
seq_printf(m, "dix_reads=%d, dix_writes=%d, dif_errors=%d\n",
   dix_reads, dix_writes, dif_errors);
-   seq_printf(m, "usec_in_jiffy=%lu, %s=%d, mq_active=%d\n",
-  TICK_NSEC / 1000, "statistics", sdebug_statistics,
-  sdebug_mq_active);
+   seq_printf(m, "usec_in_jiffy=%lu, statistics=%d\n", TICK_NSEC / 1000,
+  sdebug_statistics);
seq_printf(m, "cmnd_count=%d, completions=%d, %s=%d, a_tsf=%d\n",
   atomic_read(_cmnd_count),
   atomic_read(_completions),
@@ -5612,13 +5603,8 @@ static int scsi_debug_queuecommand(struct Scsi_Host 
*shost,
n += scnprintf(b + n, sb - n, "%02x ",
   (u32)cmd[k]);
}
-   if (sdebug_mq_active)
-   sdev_printk(KERN_INFO, sdp, "%s: tag=%u, cmd %s\n",
-   my_name, blk_mq_unique_tag(scp->request),
-   b);
-   else
-   sdev_printk(KERN_INFO, sdp, "%s: cmd %s\n", my_name,
-   b);
+   sdev_printk(KERN_INFO, sdp, "%s: tag=%#x, cmd %s\n", my_name,
+   blk_mq_unique_tag(scp->request), b);
}
if (fake_host_busy(scp))
return SCSI_MLQUEUE_HOST_BUSY;
@@ -5782,8 +5768,7 @@ static int sdebug_driver_probe(struct device * dev)
}
/* Decide whether to tell scsi subsystem that we want mq */
/* Following should give the same answer for each host */
-   sdebug_mq_active = shost_use_blk_mq(hpnt) && (submit_queues > 1);
-   if (sdebug_mq_active)
+   if (shost_use_blk_mq(hpnt))
hpnt->nr_hw_queues = submit_queues;
  
  	sdbg_host->shost = hpnt;






[LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Ming Lei
Hi All,

After KPTI is merged, there is extra load introduced to context switch
between user space and kernel space. It is observed on my laptop that one
syscall takes extra ~0.15us[1] compared with 'nopti'.

IO performance is affected too, it is observed that IOPS drops by 32% in
my test[2] on null_blk compared with 'nopti':

randread IOPS on latest linus tree:
-
| randread IOPS | randread IOPS with 'nopti'|   

| 928K  | 1372K |   



Two paths are affected, one is IO submission(read, write,... syscall),
another is the IO completion path in which interrupt may be triggered
from user space, and context switch is needed.

So is there something we can do for decreasing the effect on IO performance?

This effect may make Hannes's issue[3] worse, and maybe 'irq poll' should be
used more widely for all high performance IO device, even some optimization
should be considered for KPTI's effect.


[1] http://people.redhat.com/minlei/tests/tools/syscall_speed.c
[2] http://people.redhat.com/minlei/tests/tools/null_perf
[3] [LSF/MM TOPIC] irq affinity handling for high CPU count machines
https://marc.info/?t=15172215682=1=2

Thanks,
Ming