Re: scsi: sg: assorted memory corruptions
On Thu, Feb 1, 2018 at 7:03 AM, Douglas Gilbertwrote: > On 2018-01-30 07:22 AM, Dmitry Vyukov wrote: >> >> Uh, I've answered this a week ago, but did not notice that Doug >> dropped everybody from CC. Reporting to all. >> >> On Mon, Jan 22, 2018 at 8:16 PM, Douglas Gilbert >> wrote: >>> >>> On 2018-01-22 02:06 PM, Dmitry Vyukov wrote: On Mon, Jan 22, 2018 at 7:57 PM, Douglas Gilbert >>> >>> Please show me the output of 'lsscsi -g' on your test machine. >>> /dev/sg0 is often associated with /dev/sda which is often a SATA >>> SSD (or a virtualized one) that holds the root file system. >>> With the sg pass-through driver it is relatively easy to write >>> random (user provided data) over the root file system which will >>> almost certainly "root" the system. >> >> >> >> This is pretty standard qemu vm started with: >> >> qemu-system-x86_64 -hda wheezy.img -net user,host=10.0.2.10 -net nic >> -nographic -kernel arch/x86/boot/bzImage -append "console=ttyS0 >> root=/dev/sda earlyprintk=serial " -m 2G -smp 4 >> >> # lsscsi -g >> [0:0:0:0]diskATA QEMU HARDDISK0 /dev/sda /dev/sg0 > > > With lk 4.15.0-rc9 I can run your test program (with some additions, see > attachment) for 30 minutes against a scsi_debug simulated disk. You can > easily replicate this test just run 'modprobe scsi_debug' and a third > line should appear in your lsscsi output. The new device will most likely > be /dev/sg2 . > > With lk 4.15.0 (release) running against a SAS SSD (SEAGATE ST200FM0073), > the test has been running 20 minutes and counting without problems. That > is using a LSI HBA with the mpt3sas driver. > >> [1:0:0:0]cd/dvd QEMU QEMU DVD-ROM 2.0. /dev/sr0 /dev/sg1 >> >> # readlink /sys/class/scsi_generic/sg0 >> >> ../../devices/pci:00/:00:01.1/ata1/host0/target0:0:0/0:0:0:0/scsi_generic/sg0 >> >> # cat /sys/class/scsi_generic/sg0/device/vendor >> ATA > > > ^ > That subsystem is the culprit IMO, most likely libata. > > Until you can show this test failing on something other than an > ATA disk, then I will treat this issue as closed. Hi Doug, Why is bug in ATA not a bug? Is it long unused by everybody? I've got it by running qemu with default flags...
[PATCH] qedi: Fix truncation of name and secret
Adjust the NULL byte added by snprintf. Signed-off-by: Nilesh Javali--- drivers/scsi/qedi/qedi_main.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c index 34a..cf8badb 100644 --- a/drivers/scsi/qedi/qedi_main.c +++ b/drivers/scsi/qedi/qedi_main.c @@ -1840,7 +1840,7 @@ static ssize_t qedi_show_boot_ini_info(void *data, int type, char *buf) switch (type) { case ISCSI_BOOT_INI_INITIATOR_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n", + rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN + 1, "%s", initiator->initiator_name.byte); break; default: @@ -1908,7 +1908,7 @@ static umode_t qedi_ini_get_attr_visibility(void *data, int type) switch (type) { case ISCSI_BOOT_TGT_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n", + rc = snprintf(str, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN + 1, "%s", block->target[idx].target_name.byte); break; case ISCSI_BOOT_TGT_IP_ADDR: @@ -1930,19 +1930,19 @@ static umode_t qedi_ini_get_attr_visibility(void *data, int type) block->target[idx].lun.value[0]); break; case ISCSI_BOOT_TGT_CHAP_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n", + rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN + 1, "%s", chap_name); break; case ISCSI_BOOT_TGT_CHAP_SECRET: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n", + rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN + 1, "%s", chap_secret); break; case ISCSI_BOOT_TGT_REV_CHAP_NAME: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN, "%s\n", + rc = snprintf(str, NVM_ISCSI_CFG_CHAP_NAME_MAX_LEN + 1, "%s", mchap_name); break; case ISCSI_BOOT_TGT_REV_CHAP_SECRET: - rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN, "%s\n", + rc = snprintf(str, NVM_ISCSI_CFG_CHAP_PWD_MAX_LEN + 1, "%s", mchap_secret); break; case ISCSI_BOOT_TGT_FLAGS: -- 1.8.3.1
Re: scsi: sg: assorted memory corruptions
On 2018-01-30 07:22 AM, Dmitry Vyukov wrote: Uh, I've answered this a week ago, but did not notice that Doug dropped everybody from CC. Reporting to all. On Mon, Jan 22, 2018 at 8:16 PM, Douglas Gilbertwrote: On 2018-01-22 02:06 PM, Dmitry Vyukov wrote: On Mon, Jan 22, 2018 at 7:57 PM, Douglas Gilbert Please show me the output of 'lsscsi -g' on your test machine. /dev/sg0 is often associated with /dev/sda which is often a SATA SSD (or a virtualized one) that holds the root file system. With the sg pass-through driver it is relatively easy to write random (user provided data) over the root file system which will almost certainly "root" the system. This is pretty standard qemu vm started with: qemu-system-x86_64 -hda wheezy.img -net user,host=10.0.2.10 -net nic -nographic -kernel arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda earlyprintk=serial " -m 2G -smp 4 # lsscsi -g [0:0:0:0]diskATA QEMU HARDDISK0 /dev/sda /dev/sg0 With lk 4.15.0-rc9 I can run your test program (with some additions, see attachment) for 30 minutes against a scsi_debug simulated disk. You can easily replicate this test just run 'modprobe scsi_debug' and a third line should appear in your lsscsi output. The new device will most likely be /dev/sg2 . With lk 4.15.0 (release) running against a SAS SSD (SEAGATE ST200FM0073), the test has been running 20 minutes and counting without problems. That is using a LSI HBA with the mpt3sas driver. [1:0:0:0]cd/dvd QEMU QEMU DVD-ROM 2.0. /dev/sr0 /dev/sg1 # readlink /sys/class/scsi_generic/sg0 ../../devices/pci:00/:00:01.1/ata1/host0/target0:0:0/0:0:0:0/scsi_generic/sg0 # cat /sys/class/scsi_generic/sg0/device/vendor ATA ^ That subsystem is the culprit IMO, most likely libata. Until you can show this test failing on something other than an ATA disk, then I will treat this issue as closed. Doug Gilbert Perhaps it misbehaves when it gets a SCSI command in the T10 range (i.e. not vendor specific) with a 9 byte cdb length. As far as I'm aware T10 (and the Ansi committee before it) have never defined a cdb with an odd length. For those that are not aware, the sg driver is a relatively thin shim over the block layer, the SCSI mid-level, and a low-level driver which may have another kernel driver stack underneath it (e.g. UAS (USB attached SCSI)). The previous report from syzkaller on the sg driver ("scsi: memory leak in sg_start_req") has resulted in one accepted patch on the block layer with probably more to come in the same area. Testing the patch Dmitry gave (with some added error checks which reported no problems) with the scsi_debug driver supplying /dev/sg0 I have not seen any problems running that test program. Again there might be a very slow memory leak, but if there is I don't believe it is in the sg driver. Did you run it in a loop? First runs pass just fine for me too. Is thirty minutes long enough ?? Yes, it certainly should be enough. Here is what I see: # while ./a.out; do echo RUN; done RUN RUN RUN RUN RUN RUN RUN [ 371.977266] == [ 371.980158] BUG: KASAN: double-free or invalid-free in __put_task_struct+0x1e7/0x5c0 Here is full execution trace of the write call if that will be of any help: https://gist.githubusercontent.com/dvyukov/14ae64c3e753dedf9ab2608676ecf0b9/raw/9803d52bb1e317a9228e362236d042aaf0fa9d69/gistfile1.txt This is on upstream commit 0d665e7b109d512b7cae3ccef6e8654714887844. Also attaching my config just in case. // autogenerated by syzkaller (http://github.com/google/syzkaller) #include #include #include #include #include #include #include #include #define SG_NEXT_CMD_LEN 0x2283 static const char * usage = "sg_syzk_next_cdb # (e.g. '/dev/sg3') "; int main(int argc, const char * argv[]) { int res, err; int fd; long len = 9; char* p = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x47\x00\x00\x24\x00" "\x00\x00\x00\x00\x00\x1c\xbb\xac\x14\x00\xaa\xe0\x00\x00\x01" "\x00\x07\x07\x00\x00\x59\x08\x00\x00\x00\x80\xfe\x7f\x00\x00\x01"; const char * dev_name; struct stat a_stat; if (argc < 2) { fprintf(stderr, "Usage: %s\n", usage); return 1; } dev_name = argv[1]; if (0 != stat(dev_name, _stat)) { err = errno; fprintf(stderr, "Unable to stat %s, err: %s\n", dev_name, strerror(err)); return 1; } if ((a_stat.st_mode & S_IFMT) != S_IFCHR) { fprintf(stderr, "Expected %s, to be sg device\n", dev_name); return 1; } fd = open(dev_name, O_RDWR); if (fd < 0) { err = errno; fprintf(stderr, "open(%s) failed: %s [%d]\n", dev_name, strerror(err), err); } res = ioctl(fd, SG_NEXT_CMD_LEN, ); if (res < 0) { err = errno; fprintf(stderr, "ioctl failed: %s [%d]\n", strerror(err), err); } res = write(fd, p,
Re: [LSF/MM TOPIC] KPTI effect on IO performance
On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote: > On 2018-01-31 01:23, Ming Lei wrote: > > Hi All, > > > > After KPTI is merged, there is extra load introduced to context switch > > between user space and kernel space. It is observed on my laptop that > > one > > syscall takes extra ~0.15us[1] compared with 'nopti'. > > > > IO performance is affected too, it is observed that IOPS drops by 32% in > > my test[2] on null_blk compared with 'nopti': > > > > randread IOPS on latest linus tree: > > - > > | randread IOPS | randread IOPS with 'nopti'| > > > > | 928K | 1372K | > > > > > > > > Do you know if your CPU has PCID? It would be interesting to see these tests > on older CPUs or older kernels without PCID support. BTW, I also saw test data in case of vCPU without PCID, and it is said the syscall time can be close to ~30X compared with nopti, and the test should be setup easily by adjust CPU model of Qemu. So in case of no PCID, KPTI effect on IO performance should be much bigger than the above data. Thanks, Ming
Re: [LSF/MM TOPIC] KPTI effect on IO performance
Hi Scotty, On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote: > On 2018-01-31 01:23, Ming Lei wrote: > > Hi All, > > > > After KPTI is merged, there is extra load introduced to context switch > > between user space and kernel space. It is observed on my laptop that > > one > > syscall takes extra ~0.15us[1] compared with 'nopti'. > > > > IO performance is affected too, it is observed that IOPS drops by 32% in > > my test[2] on null_blk compared with 'nopti': > > > > randread IOPS on latest linus tree: > > - > > | randread IOPS | randread IOPS with 'nopti'| > > > > | 928K | 1372K | > > > > > > > > Do you know if your CPU has PCID? It would be interesting to see these tests > on older CPUs or older kernels without PCID support. My CPU has PCID, which can be retrieved via /proc/cpuinfo. And the above test is run on same kernel binary, and the result is just done between 'nopti' and no 'nopti' in kernel command line. Thanks, Ming
Re: [PATCH] scsi_debug: implement IMMED bit
On Wed, 2018-01-31 at 17:40 -0500, Douglas Gilbert wrote: > On 2018-01-31 05:05 PM, Bart Van Assche wrote: > > On Wed, 2018-01-31 at 15:26 -0500, Douglas Gilbert wrote: > > > On 2018-01-31 12:06 PM, Bart Van Assche wrote: > > > > On 01/29/18 21:54, Douglas Gilbert wrote: > > > > > +static const struct opcode_info_t sync_cache_iarr[] = { > > > > > +{0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL, > > > > > +{16, 0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, > > > > > > > > ^^^ > > > > Can you clarify the choice of "0x7" in the above? After having had a > > > > look at > > > > SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other > > > > bits in > > > > that command byte are either reserved or obsolete. > > > > > > As a general rule when you see "obsolete" that means that field was used > > > in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So > > > application clients complying with earlier versions of SBC might set those > > > bits. If they are set and the mask is being enforced I choose to not fail > > > the command as an illegal request. Basically accept and ignore. > > > > I agree with setting bits for obsolete flags. The reason I was asking about > > that > > mask is because I found the following in SBC-4 for byte 1 of SYNCHRONIZE > > CACHE(16): > > * Bits 7..3: reserved. > > * Bit 2: obsolete. > > * Bit 1: IMMED. > > * Bit 0: reserved. > > I did see the discrepancy between byte 1 bit 0 in SYNCHRONIZE CACHE(10) > where it is "obsolete" and SYNCHRONIZE CACHE(16) where it is "reserved" > initially thinking it might be a typo. So I went for the more permissive. > I'm now guessing (because I don't have the drafts between SBC(-1) and > SBC-2 and for some reason T10 wants you to be a member to download older > drafts) that RELADDR was removed before SYNCHRONIZE CACHE(16) was added > (as there was no 64 LBA support in SBC-1). So if there never was a > publicly available draft where SYNCHRONIZE CACHE(16) had a RELADDR bit > defined, then why mark it as obsolete? Reserved means a later > draft/standard may use that bit/field. > > BTW There is no requirement for a device server to enforce the mask and > most don't (from my testing). The mask is provided as an aid to the > application client and is required by the implementation of REPORT > SUPPORTED OPERATION CODES which is why I have it in scsi_debug. By default > the scsi_debug driver does not enforce the mask (but does when strict=1). > > Anyway, aren't we splitting hairs here? Does this really justify a new > patch being rolled? People trust you as a SCSI expert and may consider to copy code from the scsi_debug.c driver in their own GPL drivers if they need similar functionality. Do you think that is sufficient as a motivation to edit and repost this patch? Thanks, Bart.
Re: [PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload
On Wed, 2018-01-31 at 17:48 -0200, Mauricio Faria de Oliveira wrote: > On 01/31/2018 05:06 PM, Bart Van Assche wrote: > > Sorry but I think this patch introduces new race conditions. Have you > > Can you detail the race conditions? As far as I can see, the only race > condition would be when an error handler is invoked very close in time > to the shutdown/unload handler, and as such miss the 'ioc->remove_host' > assignment (thus it continues running anyway, and hit the oops). That's indeed the race I was referring to. BTW, are you aware that your patch does not seem to apply on Martin's tree (the fixes branch of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git): $ patch -p1 < ~/\[PATCH\]_scsi\:_mpt3sas\:_fix_oops_in_error_handlers_after_shutdown_unload.mbox patching file drivers/scsi/mpt3sas/mpt3sas_scsih.c Hunk #1 FAILED at 3007. Hunk #2 succeeded at 2970 (offset -99 lines). Hunk #3 succeeded at 2977 (offset -161 lines). Hunk #4 succeeded at 3033 (offset -160 lines). 1 out of 4 hunks FAILED -- saving rejects to file drivers/scsi/mpt3sas/mpt3sas_scsih.c.rej Can you rebase this patch on top of Martin's tree? Thanks, Bart.
Re: [PATCH] test-case
On Wed, 2018-01-31 at 17:24 -0200, Mauricio Faria de Oliveira wrote: > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index 3c4e47c..611cee33 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -2997,6 +2997,12 @@ int mpt3sas_scsih_issue_locked_tm(struct > MPT3SAS_ADAPTER *ioc, u16 handle, > "attempting task abort! scmd(%p)\n", scmd); > _scsih_tm_display_info(ioc, scmd); > > + if (ioc->logging_level & 0x0100) { > + pr_info(MPT3SAS_FMT "fail task abort scmd(%p)\n", ioc->name, > scmd); > + r = FAILED; > + goto out; > + } > + > sas_device_priv_data = scmd->device->hostdata; > if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { > sdev_printk(KERN_INFO, scmd->device, > @@ -5584,6 +5590,11 @@ static int _scsih_set_satl_pending(struct scsi_cmnd > *scmd, bool pending) > > scsi_dma_unmap(scmd); > > + if (ioc->logging_level & 0x1000 && scmd->cmnd[0] != 0x35) { > + pr_info(MPT3SAS_FMT "skip scsi_done scmd(%p)\n", ioc->name, > scmd); > + return 1; > + } > + > scmd->scsi_done(scmd); > return 1; > } > @@ -10016,6 +10027,11 @@ static void scsih_remove(struct pci_dev *pdev) > > _scsih_ir_shutdown(ioc); > mpt3sas_base_detach(ioc); > + > + while (ioc->logging_level & 0x1000) { > + pr_info(MPT3SAS_FMT "sleep on shutdown\n", ioc->name); > + ssleep(1); > + } > } Hello Mauricio, I think it would be useful to have some variant of the above code in the kernel tree. Are you familiar with the fault injection framework (see also and Documentation/fault-injection/fault-injection.txt)? Do you think that framework would be appropriate for controlling whether or not the above code gets executed? Thanks, Bart.
Re: [PATCH] scsi_debug: implement IMMED bit
On 2018-01-31 05:05 PM, Bart Van Assche wrote: On Wed, 2018-01-31 at 15:26 -0500, Douglas Gilbert wrote: On 2018-01-31 12:06 PM, Bart Van Assche wrote: On 01/29/18 21:54, Douglas Gilbert wrote: +static const struct opcode_info_t sync_cache_iarr[] = { +{0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL, +{16, 0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, ^^^ Can you clarify the choice of "0x7" in the above? After having had a look at SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other bits in that command byte are either reserved or obsolete. As a general rule when you see "obsolete" that means that field was used in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So application clients complying with earlier versions of SBC might set those bits. If they are set and the mask is being enforced I choose to not fail the command as an illegal request. Basically accept and ignore. I agree with setting bits for obsolete flags. The reason I was asking about that mask is because I found the following in SBC-4 for byte 1 of SYNCHRONIZE CACHE(16): * Bits 7..3: reserved. * Bit 2: obsolete. * Bit 1: IMMED. * Bit 0: reserved. I did see the discrepancy between byte 1 bit 0 in SYNCHRONIZE CACHE(10) where it is "obsolete" and SYNCHRONIZE CACHE(16) where it is "reserved" initially thinking it might be a typo. So I went for the more permissive. I'm now guessing (because I don't have the drafts between SBC(-1) and SBC-2 and for some reason T10 wants you to be a member to download older drafts) that RELADDR was removed before SYNCHRONIZE CACHE(16) was added (as there was no 64 LBA support in SBC-1). So if there never was a publicly available draft where SYNCHRONIZE CACHE(16) had a RELADDR bit defined, then why mark it as obsolete? Reserved means a later draft/standard may use that bit/field. BTW There is no requirement for a device server to enforce the mask and most don't (from my testing). The mask is provided as an aid to the application client and is required by the implementation of REPORT SUPPORTED OPERATION CODES which is why I have it in scsi_debug. By default the scsi_debug driver does not enforce the mask (but does when strict=1). Anyway, aren't we splitting hairs here? Does this really justify a new patch being rolled? Doug Gilbert -return 0; +return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */ Shouldn't the mask 0x2 be used to check for the IMMED bit? That comment needs a little more context: Sorry, I confused byte 1 of the START STOP command with that of the SYNCHRONIZE CACHE command so please ignore the above comment. Bart.
Re: [PATCH] scsi_debug: implement IMMED bit
On Wed, 2018-01-31 at 15:26 -0500, Douglas Gilbert wrote: > On 2018-01-31 12:06 PM, Bart Van Assche wrote: > > On 01/29/18 21:54, Douglas Gilbert wrote: > > > +static const struct opcode_info_t sync_cache_iarr[] = { > > > +{0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL, > > > +{16, 0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, > > > >^^^ > > Can you clarify the choice of "0x7" in the above? After having had a look > > at > > SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other bits > > in > > that command byte are either reserved or obsolete. > > As a general rule when you see "obsolete" that means that field was used > in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So > application clients complying with earlier versions of SBC might set those > bits. If they are set and the mask is being enforced I choose to not fail > the command as an illegal request. Basically accept and ignore. I agree with setting bits for obsolete flags. The reason I was asking about that mask is because I found the following in SBC-4 for byte 1 of SYNCHRONIZE CACHE(16): * Bits 7..3: reserved. * Bit 2: obsolete. * Bit 1: IMMED. * Bit 0: reserved. > > > -return 0; > > > +return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit > > > */ > > > > Shouldn't the mask 0x2 be used to check for the IMMED bit? > > That comment needs a little more context: Sorry, I confused byte 1 of the START STOP command with that of the SYNCHRONIZE CACHE command so please ignore the above comment. Bart.
RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests
> Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > channel for I/O requests > > > From: Long Li > > Sent: Wednesday, January 31, 2018 12:23 PM > > To: Michael Kelley (EOSG); KY > > Srinivasan ; Stephen Hemminger > > ; martin.peter...@oracle.com; > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > linux-scsi@vger.kernel.org; James E . J . Bottomley > > > > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking > > a channel for I/O requests > > > > > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when > > > picking a channel for I/O requests > > > > > > Updated/corrected two email addresses ... > > > > > > > -Original Message- > > > > From: Michael Kelley (EOSG) > > > > Sent: Wednesday, January 24, 2018 2:14 PM > > > > To: KY Srinivasan ; Stephen Hemminger > > > > ; martin.peter...@oracle.com; > > > > lo...@microsoft.com; jbottom...@odin.com; > > > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > > > linux-scsi@vger.kernel.org > > > > Cc: Michael Kelley (EOSG) > > > > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking > > > > a channel for I/O requests > > > > > > > > Update the algorithm in storvsc_do_io to look for a channel > > > > starting with the current CPU + 1 and wrap around (within the > > > > current NUMA node). This spreads VMbus interrupts more evenly > > > > across CPUs. Previous code always started with first CPU in the > > > > current NUMA node, skewing the interrupt load to that CPU. > > > > > > > > Signed-off-by: Michael Kelley Reviewed-by: Long Li > > > > --- > > > > drivers/scsi/storvsc_drv.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/scsi/storvsc_drv.c > > > > b/drivers/scsi/storvsc_drv.c index e07907d..f3264c4 100644 > > > > --- a/drivers/scsi/storvsc_drv.c > > > > +++ b/drivers/scsi/storvsc_drv.c > > > > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device > *device, > > > > */ > > > > cpumask_and(_mask, _device- > alloced_cpus, > > > > > > > cpumask_of_node(cpu_to_node(q_num))); > > > > - for_each_cpu(tgt_cpu, _mask) { > > > > + for_each_cpu_wrap(tgt_cpu, _mask, > > > > + outgoing_channel->target_cpu + > > > > 1) { > > > > Does it work when target_cpu is the last CPU on the system? > > > > Otherwise, looking good. > > Yes, it works. for_each_cpu_wrap() correctly wraps in the case where the > 3rd parameter ('start') is one past the end of the mask. Arguably, we > shouldn't rely on that, and should do the wrap to 0 before calling > for_each_cpu_wrap(). > > > > > > > if (tgt_cpu != > > > > outgoing_channel->target_cpu) > > > { > > > > outgoing_channel = > > > > stor_device->stor_chns[tgt_cpu]; > > > > -- > > > > 1.8.3.1
RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests
> From: Long Li > Sent: Wednesday, January 31, 2018 12:23 PM > To: Michael Kelley (EOSG); KY Srinivasan > ; Stephen Hemminger ; > martin.peter...@oracle.com; de...@linuxdriverproject.org; > linux-ker...@vger.kernel.org; > linux-scsi@vger.kernel.org; James E . J . Bottomley > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > channel for I/O > requests > > > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > > channel for I/O requests > > > > Updated/corrected two email addresses ... > > > > > -Original Message- > > > From: Michael Kelley (EOSG) > > > Sent: Wednesday, January 24, 2018 2:14 PM > > > To: KY Srinivasan ; Stephen Hemminger > > > ; martin.peter...@oracle.com; > > > lo...@microsoft.com; jbottom...@odin.com; > > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > > linux-scsi@vger.kernel.org > > > Cc: Michael Kelley (EOSG) > > > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > > > channel for I/O requests > > > > > > Update the algorithm in storvsc_do_io to look for a channel starting > > > with the current CPU + 1 and wrap around (within the current NUMA > > > node). This spreads VMbus interrupts more evenly across CPUs. Previous > > > code always started with first CPU in the current NUMA node, skewing > > > the interrupt load to that CPU. > > > > > > Signed-off-by: Michael Kelley > > > --- > > > drivers/scsi/storvsc_drv.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > > index e07907d..f3264c4 100644 > > > --- a/drivers/scsi/storvsc_drv.c > > > +++ b/drivers/scsi/storvsc_drv.c > > > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device, > > >*/ > > > cpumask_and(_mask, _device- > > >alloced_cpus, > > > > > cpumask_of_node(cpu_to_node(q_num))); > > > - for_each_cpu(tgt_cpu, _mask) { > > > + for_each_cpu_wrap(tgt_cpu, _mask, > > > + outgoing_channel->target_cpu + 1) { > > Does it work when target_cpu is the last CPU on the system? > > Otherwise, looking good. Yes, it works. for_each_cpu_wrap() correctly wraps in the case where the 3rd parameter ('start') is one past the end of the mask. Arguably, we shouldn't rely on that, and should do the wrap to 0 before calling for_each_cpu_wrap(). > > > > if (tgt_cpu != outgoing_channel->target_cpu) > > { > > > outgoing_channel = > > > stor_device->stor_chns[tgt_cpu]; > > > -- > > > 1.8.3.1
Re: [PATCH] scsi_debug: implement IMMED bit
On 2018-01-31 12:06 PM, Bart Van Assche wrote: On 01/29/18 21:54, Douglas Gilbert wrote: +static const struct opcode_info_t sync_cache_iarr[] = { + {0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL, + {16, 0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, ^^^ Can you clarify the choice of "0x7" in the above? After having had a look at SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other bits in that command byte are either reserved or obsolete. As a general rule when you see "obsolete" that means that field was used in an earlier standard (bit 0 was RELADDR and bit 2 was SYNC_NV). So application clients complying with earlier versions of SBC might set those bits. If they are set and the mask is being enforced I choose to not fail the command as an illegal request. Basically accept and ignore. - return 0; + return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */ Shouldn't the mask 0x2 be used to check for the IMMED bit? That comment needs a little more context: @@ -1597,7 +1614,7 @@ static int resp_start_stop(struct scsi_cmnd * scp, } stop = !(cmd[4] & 1); atomic_xchg(>stopped, stop); - return 0; + return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */ For START STOP UNIT the IMMED flag is byte 1, bit 0. So that code is correct IMO. And for SYNCHRONIZE CACHE(10 and 16) the IMMED flag is byte 1 bit 1 so the corresponding return statement is: + return (cmd[1] & 0x2) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */ Doug Gilbert
RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests
> Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > channel for I/O requests > > Updated/corrected two email addresses ... > > > -Original Message- > > From: Michael Kelley (EOSG) > > Sent: Wednesday, January 24, 2018 2:14 PM > > To: KY Srinivasan; Stephen Hemminger > > ; martin.peter...@oracle.com; > > lo...@microsoft.com; jbottom...@odin.com; > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > linux-scsi@vger.kernel.org > > Cc: Michael Kelley (EOSG) > > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > > channel for I/O requests > > > > Update the algorithm in storvsc_do_io to look for a channel starting > > with the current CPU + 1 and wrap around (within the current NUMA > > node). This spreads VMbus interrupts more evenly across CPUs. Previous > > code always started with first CPU in the current NUMA node, skewing > > the interrupt load to that CPU. > > > > Signed-off-by: Michael Kelley > > --- > > drivers/scsi/storvsc_drv.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > index e07907d..f3264c4 100644 > > --- a/drivers/scsi/storvsc_drv.c > > +++ b/drivers/scsi/storvsc_drv.c > > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device, > > */ > > cpumask_and(_mask, _device- > >alloced_cpus, > > > cpumask_of_node(cpu_to_node(q_num))); > > - for_each_cpu(tgt_cpu, _mask) { > > + for_each_cpu_wrap(tgt_cpu, _mask, > > + outgoing_channel->target_cpu + 1) { Does it work when target_cpu is the last CPU on the system? Otherwise, looking good. > > if (tgt_cpu != outgoing_channel->target_cpu) > { > > outgoing_channel = > > stor_device->stor_chns[tgt_cpu]; > > -- > > 1.8.3.1
Re: [GIT PULL] first round of SCSI updates for the 4.14+ merge window
On Wed, 2018-01-31 at 11:29 -0800, Linus Torvalds wrote: > On Wed, Jan 31, 2018 at 9:42 AM, James Bottomley >wrote: > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi- > > misc > > Ok, now I did indeed get that > > gpg: Can't check signature: unknown pubkey algorithm > > because of your fancy new key. But making git use gpg2 indeed fixes > it, since I have a recent enough version. > > Let's see if anybody else even notices. I don't think lots of people > look at the signatures. Great! thanks for being the guinea pig. At least we know the theory is now tested. Regards, James
Re: [PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload
Bart, Thanks for reviewing. On 01/31/2018 05:06 PM, Bart Van Assche wrote: Sorry but I think this patch introduces new race conditions. Have you Can you detail the race conditions? As far as I can see, the only race condition would be when an error handler is invoked very close in time to the shutdown/unload handler, and as such miss the 'ioc->remove_host' assignment (thus it continues running anyway, and hit the oops). But this problem has never been reported, so I'd think that case would be rare. Not that it does not need to be perfectly solved, but _if_ that is the only problem with the patch (and _if_ I got that right), then it would still be better than the current code in oops. > considered to modify the mpt3sas driver such that interrupts are only > disabled after all commands have finished? Yes, I considered introducing waits in several points, but since the driver is tearing down, the only point which it seem to make sense to insert is right at the beginning of shutdown/unload -- but it seemed not sufficient: imagine not all commands finish, which would block that path, so we need a time out mechanism, but that does not guarantee the call back won't occur later on (say, a while after we gave up), so we would still need a guard at the callbacks. I also considered checking for the particular pointer which is set to NULL and cause the Oops, but that is a series of 10ish (small) patches which must check 'ioc->scsi_lookup' in a spinlock, so it seemed a bit bigger than what was needed.. and would still be prone to hitting an oops due to another pointer at a later time. So after thinking about those, I went with the simplest approach. I'd be happy to consider the race conditions you thought of, provided more detail. thanks, Mauricio
Re: [GIT PULL] first round of SCSI updates for the 4.14+ merge window
On Wed, Jan 31, 2018 at 9:42 AM, James Bottomleywrote: > > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc Ok, now I did indeed get that gpg: Can't check signature: unknown pubkey algorithm because of your fancy new key. But making git use gpg2 indeed fixes it, since I have a recent enough version. Let's see if anybody else even notices. I don't think lots of people look at the signatures. Linus
[PATCH] test-case
This patch can be verified with this simple test-case, which inserts a wait loop at the bottom of 'scsih_shutdown()' and forces SCSI commands to timeout (skip 'scmd->scsi_done()'). It abuses the 'ioc->logging_level' parameter do to that, with: - 0x1000: wait loop on scsih_shutdown() and skip scsi_done() - 0x0100: force scsih_abort() to return FAILED early, so to run device/target/host reset. Oops in scsih_abort() = Test-case: # echo 0x1000 > /sys/module/mpt3sas/parameters/logging_level # dd if=/dev/sdf of=/dev/null count=1 iflag=direct & # kexec --force --append="$(cat /proc/cmdline)" --initrd=/boot/initrd.img-4.15.0 /boot/vmlinux-4.15.0 Without patch: [ 141.936251] setting logging_level(0x1000) [ 141.977920] mpt3sas_cm0: skip scsi_done scmd(a85f0166) [ 147.927561] sd 16:0:1:0: [sdf] Synchronizing SCSI cache [ 147.927831] sd 16:0:0:0: [sde] Synchronizing SCSI cache [ 147.928090] mpt3sas_cm0: sending diag reset !! [ 149.041346] mpt3sas_cm0: diag reset: SUCCESS [ 149.057985] mpt3sas_cm0: sleep on shutdown [ 150.098619] mpt3sas_cm0: sleep on shutdown [ 151.138571] mpt3sas_cm0: sleep on shutdown ... [ 171.938245] mpt3sas_cm0: sleep on shutdown [ 172.678231] sd 16:0:1:0: attempting task abort! scmd(a85f0166) ... [ 172.678545] Unable to handle kernel paging request for data at address 0x0008 [ 172.678600] Faulting instruction address: 0xd0001789e8c0 [ 172.678648] Oops: Kernel access of bad area, sig: 11 [#1] ... [ 172.679804] NIP [d0001789e8c0] scsih_abort+0xc0/0x290 [mpt3sas] [ 172.679854] LR [d0001789e8a8] scsih_abort+0xa8/0x290 [mpt3sas] [ 172.679903] Call Trace: [ 172.679926] [c01fed68fbc0] [d0001789e8a8] scsih_abort+0xa8/0x290 [mpt3sas] (unreliable) [ 172.679994] [c01fed68fc50] [c075a274] scmd_eh_abort_handler+0xc4/0x1a0 [ 172.680053] [c01fed68fc90] [c00fea88] process_one_work+0x188/0x450 [ 172.680109] [c01fed68fd20] [c00fede8] worker_thread+0x98/0x550 [ 172.680157] [c01fed68fdc0] [c0107344] kthread+0x164/0x1b0 [ 172.680206] [c01fed68fe30] [c000b6e0] ret_from_kernel_thread+0x5c/0x7c [ 172.680261] Instruction dump: ... With patch: [ 233.259952] setting logging_level(0x1000) [ 233.290008] mpt3sas_cm0: skip scsi_done scmd(7ec97dda) [ 234.600934] sd 16:0:1:0: [sdf] Synchronizing SCSI cache [ 234.601222] sd 16:0:0:0: [sde] Synchronizing SCSI cache [ 234.601470] mpt3sas_cm0: sending diag reset !! [ 235.718433] mpt3sas_cm0: diag reset: SUCCESS [ 235.734534] mpt3sas_cm0: sleep on shutdown [ 236.805704] mpt3sas_cm0: sleep on shutdown [ 237.845708] mpt3sas_cm0: sleep on shutdown ... [ 263.845781] mpt3sas_cm0: sleep on shutdown [ 264.185782] sd 16:0:1:0: attempting task abort! scmd(7ec97dda) ... [ 264.186104] sd 16:0:1:0: task abort: SUCCESS scmd(7ec97dda) [ 264.225788] sd 16:0:1:0: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 264.225910] sd 16:0:1:0: [sdf] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 01 00 [ 264.225969] print_req_error: I/O error, dev sdf, sector 0 Oops in scsih_host_reset() == Test-case: # echo 0x1100 > /sys/module/mpt3sas/parameters/logging_level # dd if=/dev/sdf of=/dev/null count=1 iflag=direct & # kexec --force --append="$(cat /proc/cmdline)" --initrd=/boot/initrd.img-4.15.0 /boot/vmlinux-4.15.0 Without patch: [ 241.734670] setting logging_level(0x1100) [ 251.587765] mpt3sas_cm0: skip scsi_done scmd(60a524f9) [ 252.771054] sd 16:0:1:0: [sdf] Synchronizing SCSI cache [ 252.771335] sd 16:0:0:0: [sde] Synchronizing SCSI cache [ 252.771582] mpt3sas_cm0: sending diag reset !! [ 253.889275] mpt3sas_cm0: diag reset: SUCCESS [ 253.906315] mpt3sas_cm0: sleep on shutdown [ 254.966487] mpt3sas_cm0: sleep on shutdown [ 256.006417] mpt3sas_cm0: sleep on shutdown ... [ 282.005452] mpt3sas_cm0: sleep on shutdown [ 282.105416] sd 16:0:1:0: attempting task abort! scmd(60a524f9) ... [ 282.105707] mpt3sas_cm0: fail task abort scmd(60a524f9) [ 282.105754] sd 16:0:1:0: task abort: FAILED scmd(60a524f9) [ 282.105811] sd 16:0:1:0: attempting device reset! scmd(60a524f9) ... [ 282.106087] mpt3sas_scsih_issue_tm: mpt3sas_cm0: host reset in progress! [ 282.106136] sd 16:0:1:0: device reset: FAILED
Re: [LSF/MM TOPIC] KPTI effect on IO performance
On 2018-01-31 01:23, Ming Lei wrote: Hi All, After KPTI is merged, there is extra load introduced to context switch between user space and kernel space. It is observed on my laptop that one syscall takes extra ~0.15us[1] compared with 'nopti'. IO performance is affected too, it is observed that IOPS drops by 32% in my test[2] on null_blk compared with 'nopti': randread IOPS on latest linus tree: - | randread IOPS | randread IOPS with 'nopti'| | 928K | 1372K | Do you know if your CPU has PCID? It would be interesting to see these tests on older CPUs or older kernels without PCID support.
Re: [PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload
On Wed, 2018-01-31 at 17:00 -0200, Mauricio Faria de Oliveira wrote: > This problem was observed with kexec on a system with a mpt3sas > based adapter and an infiniband adapter which takes long enough > to shutdown. The mpt3sas driver finished shuttting down, which > disabled interruption handling, thus some running commands have > not finished and timed out. Since the system was still running, > waiting for the infiniband adapter to shut down, the scsi error > handler for task abort of mpt3sas was invoked, and hit the oops > because 'ioc->scsi_lookup' was NULL. Sorry but I think this patch introduces new race conditions. Have you considered to modify the mpt3sas driver such that interrupts are only disabled after all commands have finished? Thanks, Bart.
[PATCH] scsi: mpt3sas: fix oops in error handlers after shutdown/unload
This patch adds checks for 'ioc->remove_host' in the SCSI error handlers, so not to access pointers/resources potentially freed in the PCI shutdown/module unload path. The error handlers may be invoked after shutdown/unload, depending on other components. This problem was observed with kexec on a system with a mpt3sas based adapter and an infiniband adapter which takes long enough to shutdown. The mpt3sas driver finished shuttting down, which disabled interruption handling, thus some running commands have not finished and timed out. Since the system was still running, waiting for the infiniband adapter to shut down, the scsi error handler for task abort of mpt3sas was invoked, and hit the oops because 'ioc->scsi_lookup' was NULL. During patch testing, a different oops happens if host reset is reached (when it eventually calls 'mpt3sas_base_get_iocstate()'). The device reset and target reset handlers do not cause oopses, but print a misleading message of host reset in progress, thus fix those too. Signed-off-by: Mauricio Faria de Oliveira--- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 28 1 file changed, 28 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index b258f21..3c4e47c 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3007,6 +3007,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, goto out; } + if (ioc->remove_host) { + scmd->result = DID_NO_CONNECT << 16; + scmd->scsi_done(scmd); + r = SUCCESS; + goto out; + } + /* search for the command */ smid = _scsih_scsi_lookup_find_by_scmd(ioc, scmd); if (!smid) { @@ -3069,6 +3076,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, goto out; } + if (ioc->remove_host) { + scmd->result = DID_NO_CONNECT << 16; + scmd->scsi_done(scmd); + r = SUCCESS; + goto out; + } + /* for hidden raid components obtain the volume_handle */ handle = 0; if (sas_device_priv_data->sas_target->flags & @@ -3131,6 +3145,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, goto out; } + if (ioc->remove_host) { + scmd->result = DID_NO_CONNECT << 16; + scmd->scsi_done(scmd); + r = SUCCESS; + goto out; + } + /* for hidden raid components obtain the volume_handle */ handle = 0; if (sas_device_priv_data->sas_target->flags & @@ -3179,6 +3200,13 @@ int mpt3sas_scsih_issue_locked_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, ioc->name, scmd); scsi_print_command(scmd); + if (ioc->remove_host) { + scmd->result = DID_NO_CONNECT << 16; + scmd->scsi_done(scmd); + r = FAILED; + goto out; + } + if (ioc->is_driver_loading) { pr_info(MPT3SAS_FMT "Blocking the host reset\n", ioc->name); -- 1.8.3.1
RE: [PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed devices
> -Original Message- > From: Michael Kelley (EOSG) > Sent: Wednesday, January 24, 2018 2:50 PM > To: KY Srinivasan; Stephen Hemminger > ; martin.peter...@oracle.com; Long Li > ; j...@linux.vnet.ibm.com; > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; linux- > s...@vger.kernel.org > Cc: Michael Kelley (EOSG) > Subject: [PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed > devices > > Increase cmd_per_lun to allow more I/Os in progress per device, > particularly for NVMe's. The Hyper-V host side can handle the > higher count with no issues. > > Signed-off-by: Michael Kelley Reviewed-by: K. Y. Srinivasan Acked-by: K. Y. Srinivasan Thanks, K. Y > --- > drivers/scsi/storvsc_drv.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index f3264c4..6205107 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1657,7 +1657,7 @@ static int storvsc_queuecommand(struct Scsi_Host > *host, struct scsi_cmnd *scmnd) > .eh_timed_out = storvsc_eh_timed_out, > .slave_alloc = storvsc_device_alloc, > .slave_configure = storvsc_device_configure, > - .cmd_per_lun = 255, > + .cmd_per_lun = 2048, > .this_id = -1, > .use_clustering = ENABLE_CLUSTERING, > /* Make sure we dont get a sg segment crosses a page boundary */ > -- > 1.8.3.1
Re: [PATCH resend 3/6] cdrom: wait for tray to close
On Mon, 29 Jan 2018 17:05:47 + Bart Van Asschewrote: > On Fri, 2018-01-26 at 17:58 +0100, Michal Suchanek wrote: > > +static int cdrom_tray_close(struct cdrom_device_info *cdi) > > +{ > > + int ret; > > + > > + ret = cdi->ops->tray_move(cdi, 0); > > + if (ret || !cdi->ops->drive_status) > > + return ret; > > + > > + return poll_event_interruptible(CDS_TRAY_OPEN != > > + cdi->ops->drive_status(cdi, CDSL_CURRENT), > > 500); +} > > + > > static > > int open_for_common(struct cdrom_device_info *cdi, tracktype > > *tracks) { > > @@ -1048,7 +1062,9 @@ int open_for_common(struct cdrom_device_info > > *cdi, tracktype *tracks) if (CDROM_CAN(CDC_CLOSE_TRAY) && > > cdi->options & CDO_AUTO_CLOSE) { > > cd_dbg(CD_OPEN, "trying to close > > the tray\n"); > > - ret = cdo->tray_move(cdi, 0); > > + ret = cdrom_tray_close(cdi); > > + if (ret == -ERESTARTSYS) > > + return ret; > > if (ret) { > > cd_dbg(CD_OPEN, "bummer. > > tried to close the tray but failed.\n"); /* Ignore the error from > > the low @@ -2312,7 +2328,8 @@ static int > > cdrom_ioctl_closetray(struct cdrom_device_info *cdi) > > if (!CDROM_CAN(CDC_CLOSE_TRAY)) > > return -ENOSYS; > > - return cdi->ops->tray_move(cdi, 0); > > + > > + return cdrom_tray_close(cdi); > > } > > So this patch changes code that does not wait into code that > potentially waits forever? Sorry but I don't think that's ideal. > Please make sure that after a certain time (a few seconds?) the loop > finishes. The problem is that few seconds does not cut it. We are waiting for a mechanical tray or CD changer to move. On non-broken hardware the tray either closes or an error is reported within a few seconds. For the timeout to not race with the event we are waiting for it must be much longer, though. Also note that this code is only invoked when the user specifically requested that the tray gets closed automatically which is off by default. Thanks Michal
Sorry about all the patch dropped email noise
If you received it, just ignore it. I didn't initialize my fixes tree correctly before pulling in Martin's branch, so it wrongly sent a patch dropped email for every patch in the misc tree (which has already been routed to Linus). James
[GIT PULL] first round of SCSI updates for the 4.14+ merge window
This is mostly updates of the usual driver suspects: arcmsr, scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas, hisi_sas. We also have a rework of the libsas hotplug handling to make it more robust, a slew of 32 bit time conversions and fixes, and a host of the usual minor updates and style changes. The biggest potential for regressions is the libsas hotplug changes, but so far they seem stable under testing. The patch is available here: git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc The short changelog is: Andy Shevchenko (3): scsi: mptfusion: Use snprintf() instead of open coded divisions scsi: hpsa: Use vsnprintf extension %phN scsi: libsas: remove private hex2bin() implementation Anil Gurumurthy (1): scsi: qla2xxx: Add XCB counters to debugfs Arnd Bergmann (16): scsi: arcmsr: avoid do_gettimeofday scsi: fas216: fix sense buffer initialization scsi: megaraid: use ktime_get_real for firmware time scsi: fnic: use 64-bit timestamps scsi: bfa: convert to strlcpy/strlcat scsi: scsi_debug: remove jiffies_to_timespec scsi: 3w-9xxx: rework lock timeouts scsi: 3ware: use 64-bit times for FW time sync scsi: 3ware: fix 32-bit time calculations scsi: bfa: use 64-bit times in bfa_aen_entry_s ABI scsi: bfa: try to sanitize vendor netlink events scsi: bfa: replace bfa_get_log_time() with ktime_get_real_seconds() scsi: bfa: document overflow of io_profile_start_time scsi: bfa: improve bfa_ioc_send_enable/disable data scsi: bfa: use proper time accessor for stats_reset_time scsi: bfa: use ktime_get_real_ts64 for firmware timestamp Bart Van Assche (9): scsi: core: Change third __scsi_queue_insert() argument from int to bool scsi: qla2xxx: Suppress gcc 7 fall-through warnings scsi: scsi_debug: Add support for injecting SCSI_MLQUEUE_HOST_BUSY scsi: dh: Remove scsi_dh_remove_device() scsi: core: Unexport scsi_initialize_rq() scsi: core: Introduce scsi_devinfo_key enumeration type scsi: core: scsi_get_device_flags_keyed(): Always return device flags scsi: core: Convert a source code comment into a runtime check scsi: core: Ensure that the SCSI error handler gets woken up Bryant G. Ly (1): scsi: ibmvscsis: add DRC indices to debug statements Chaitra P B (1): scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag. Ching Huang (23): scsi: arcmsr: simplify arcmsr_request_device_map routine scsi: arcmsr: simplify all arcmsr_hbaX_get_config routine by call a new get_adapter_config function scsi: arcmsr: simplify arcmsr_hbaE_get_config function scsi: arcmsr: waiting for iop firmware ready before issue get_config command to iop scsi: arcmsr: simplify arcmsr_hbaC_get_config function scsi: arcmsr: Fix command result for CHECK_CONDITION scsi: arcmsr: Update driver version to v1.40.00.04-20171130 scsi: arcmsr: Add driver module parameter msix_enable scsi: arcmsr: Add driver module parameter msi_enable scsi: arcmsr: Fix grammar scsi: arcmsr: Adjust whitespace scsi: arcmsr: Spin off duplicate code scsi: arcmsr: Fix clear doorbell queue on ACB_ADAPTER_TYPE_B scsi: arcmsr: Add a function to set date and time to firmware scsi: arcmsr: Add ACB_F_MSG_GET_CONFIG to acb->acb_flags scsi: arcmsr: Add driver option cmd_per_lun scsi: arcmsr: Replace constant ARCMSR_MAX_OUTSTANDING_CMD scsi: arcmsr: Add driver option host_can_queue scsi: arcmsr: replace constant ARCMSR_MAX_FREECCB_NUM scsi: arcmsr: Increase host controller command queue depth scsi: arcmsr: Add code for ACB_ADAPTER_TYPE_E scsi: arcmsr: simplify arcmsr_iop_init function scsi: arcmsr: Redefine ACB_ADAPTER_TYPE_A, _B, _C, _D Christopher Díaz Riveros (1): scsi: ibmvfc: Remove unneeded semicolons Colin Ian King (15): scsi: mptsas: remove duplicated assignment to pointer head scsi: mpt3sas: make function _get_st_from_smid static scsi: bfa: use ARRAY_SIZE for array sizing calculation on array __pciids scsi: qla2xxx: remove redundant assignment of d scsi: aacraid: remove redundant setting of variable c scsi: lpfc: fix a couple of minor indentation issues scsi: lpfc: don't dereference localport before it has been null checked scsi: arcmsr: remove redundant check for secs < 0 scsi: fusion: clean up some indentations scsi: ipr: fix incorrect indentation of assignment statement scsi: csiostor: fix spelling mistake: "Couldnt" -> "Couldn't" scsi: bnx2fc: fix spelling mistake: "Couldnt" -> "Couldn't" scsi: wd719x: make card_types static const, shrinks object size scsi: bfa: remove unused pointer 'port' scsi: aacraid: remove unused variable managed_request_id Dan Carpenter (2): scsi: storvsc: missing error
Re: [PATCH] scsi_debug: implement IMMED bit
On 01/29/18 21:54, Douglas Gilbert wrote: +static const struct opcode_info_t sync_cache_iarr[] = { + {0, 0x91, 0, F_LONG_DELAY | F_M_ACCESS, resp_sync_cache, NULL, + {16, 0x7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, ^^^ Can you clarify the choice of "0x7" in the above? After having had a look at SBC-4 I was expecting 0x2 (the mask for the IMMED bit) since all other bits in that command byte are either reserved or obsolete. - return 0; + return (cmd[1] & 0x1) ? SDEG_RES_IMMED_MASK : 0; /* check IMMED bit */ Shouldn't the mask 0x2 be used to check for the IMMED bit? Thanks, Bart.
Re: [LSF/MM TOPIC] Patch Submission process and Handling Internal Conflict
On 1/24/18 5:43 PM, Darrick J. Wong wrote: > On Wed, Jan 24, 2018 at 01:36:00PM -0800, James Bottomley wrote: >> On Wed, 2018-01-24 at 11:20 -0800, Mike Kravetz wrote: >>> On 01/24/2018 11:05 AM, James Bottomley wrote: I've got two community style topics, which should probably be discussed in the plenary 1. Patch Submission Process Today we don't have a uniform patch submission process across Storage, Filesystems and MM. The question is should we (or at least should we adhere to some minimal standards). The standard we've been trying to hold to in SCSI is one review per accepted non-trivial patch. For us, it's useful because it encourages driver writers to review each other's patches rather than just posting and then complaining their patch hasn't gone in. I can certainly think of a couple of bugs I've had to chase in mm where the underlying patches would have benefited from review, so I'd like to discuss making the one review per non-trival patch our base minimum standard across the whole of LSF/MM; it would certainly serve to improve our Reviewed-by statistics. >>> >>> Well, the mm track at least has some discussion of this last year: >>> https://lwn.net/Articles/718212/ >> >> The pushback in your session was mandating reviews would mean slowing >> patch acceptance or possibly causing the dropping of patches that >> couldn't get reviewed. Michal did say that XFS didn't have the >> problem, however there not being XFS people in the room, discussion >> stopped there. > > I actually /was/ lurking in the session, but a year later I have more > thoughts: > > Now that I've been maintainer for more than a year I feel more confident > in actually talking about our review processes, though I can only speak > about my own experiences and hope the other xfs developers chime in if > they choose. Mandating reviews certainly can slow down patch acceptance, though I'd expect that any good maintainer will be doing at least cursory review before commit; when the maintainer writes patches themselves, they /are/ then at the mercy of others for an RVB: tag. That hasn't in general been a huge problem for us, though things do sometimes require a bit of poking and prodding. But I think that's a feature not a bug. Obtaining at least one meaningful review means that someone else has at least some familiarity with the new code. In the XFS community, in reality we have only about 4 kernelspace reviewers, with a /very/ long tail of onesey-twosies; since v4.12: 2 Reviewed-by: Allison Henderson2 Reviewed-by: Eric Sandeen 3 Reviewed-by: Amir Goldstein 4 Reviewed-by: Ross Zwisler 6 Reviewed-by: Andy Shevchenko 6 Reviewed-by: Jan Kara 10 Reviewed-by: Carlos Maiolino 60 Reviewed-by: Christoph Hellwig 104 Reviewed-by: Dave Chinner 109 Reviewed-by: Brian Foster 208 Reviewed-by: Darrick J. Wong In userspace things look a little different in the same time period: 1 Reviewed-by: Allison Henderson 1 Reviewed-by: Bill O'Donnell 1 Reviewed-by: Eric Sandeen 3 Reviewed-by: Dave Chinner 11 Reviewed-by: Carlos Maiolino 12 Reviewed-by: Christoph Hellwig 25 Reviewed-by: Brian Foster 37 Reviewed-by: Darrick J. Wong 44 Reviewed-by: Eric Sandeen Unsurprisingly(?) the maintainers still bear a lot of the review burden, but the same workhorse rock stars are clearly present. In reality it's something we need to work on, to try to get more people participating in meaningful review, both to speed up the cycle and to grow community knowledge. Another thing that Darrick and I have bounced around a little bit is the adequacy of email for significant review of large feature patchsets. On the one hand, we'd like centralized review with archives, because that's useful to future code archaeologists. On the other hand, I can't help but think that something like Github's ability to mark up comments line by line would have some advantages, particularly for brand new code. As for the question of conflict, I'm not sure what to say... The XFS development team has been lucky(?) to have been living in relative peace and harmony for the past few years. Speaking for myself, I try to be aware of getting too nitpicky or enforcing preferences vs. requirements, and I make an effort to reach out and check in with patch
Re: [PATCH] scsi_debug: Simplify request tag decoding
On 2018-01-26 11:52 AM, Bart Van Assche wrote: Since commit 64d513ac31bd ("scsi: use host wide tags by default") all SCSI requests have a tag, whether or not scsi-mq is enabled. Additionally, it is safe to use blk_mq_unique_tag() and blk_mq_unique_tag_to_hwq() for legacy SCSI queues. Since this means that the sdebug_mq_active variable is superfluous, remove it. Signed-off-by: Bart Van AsscheCc: Douglas Gilbert Ack-ed by Douglas Gilbert Cc: Christoph Hellwig Cc: Hannes Reinecke --- drivers/scsi/scsi_debug.c | 37 +++-- 1 file changed, 11 insertions(+), 26 deletions(-) diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index a5986dae9020..7d2ce0cc915a 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -649,7 +649,6 @@ static bool sdebug_any_injecting_opt; static bool sdebug_verbose; static bool have_dif_prot; static bool sdebug_statistics = DEF_STATISTICS; -static bool sdebug_mq_active; static unsigned int sdebug_store_sectors; static sector_t sdebug_capacity; /* in sectors */ @@ -3727,20 +3726,13 @@ static int resp_xdwriteread_10(struct scsi_cmnd *scp, static struct sdebug_queue *get_queue(struct scsi_cmnd *cmnd) { - struct sdebug_queue *sqp = sdebug_q_arr; + u32 tag = blk_mq_unique_tag(cmnd->request); + u16 hwq = blk_mq_unique_tag_to_hwq(tag); - if (sdebug_mq_active) { - u32 tag = blk_mq_unique_tag(cmnd->request); - u16 hwq = blk_mq_unique_tag_to_hwq(tag); - - if (unlikely(hwq >= submit_queues)) { - pr_warn("Unexpected hwq=%d, apply modulo\n", hwq); - hwq %= submit_queues; - } - pr_debug("tag=%u, hwq=%d\n", tag, hwq); - return sqp + hwq; - } else - return sqp; + pr_debug("tag=%#x, hwq=%d\n", tag, hwq); + if (WARN_ON_ONCE(hwq >= submit_queues)) + hwq = 0; + return sdebug_q_arr + hwq; } /* Queued (deferred) command completions converge here. */ @@ -4587,9 +4579,8 @@ static int scsi_debug_show_info(struct seq_file *m, struct Scsi_Host *host) num_host_resets); seq_printf(m, "dix_reads=%d, dix_writes=%d, dif_errors=%d\n", dix_reads, dix_writes, dif_errors); - seq_printf(m, "usec_in_jiffy=%lu, %s=%d, mq_active=%d\n", - TICK_NSEC / 1000, "statistics", sdebug_statistics, - sdebug_mq_active); + seq_printf(m, "usec_in_jiffy=%lu, statistics=%d\n", TICK_NSEC / 1000, + sdebug_statistics); seq_printf(m, "cmnd_count=%d, completions=%d, %s=%d, a_tsf=%d\n", atomic_read(_cmnd_count), atomic_read(_completions), @@ -5612,13 +5603,8 @@ static int scsi_debug_queuecommand(struct Scsi_Host *shost, n += scnprintf(b + n, sb - n, "%02x ", (u32)cmd[k]); } - if (sdebug_mq_active) - sdev_printk(KERN_INFO, sdp, "%s: tag=%u, cmd %s\n", - my_name, blk_mq_unique_tag(scp->request), - b); - else - sdev_printk(KERN_INFO, sdp, "%s: cmd %s\n", my_name, - b); + sdev_printk(KERN_INFO, sdp, "%s: tag=%#x, cmd %s\n", my_name, + blk_mq_unique_tag(scp->request), b); } if (fake_host_busy(scp)) return SCSI_MLQUEUE_HOST_BUSY; @@ -5782,8 +5768,7 @@ static int sdebug_driver_probe(struct device * dev) } /* Decide whether to tell scsi subsystem that we want mq */ /* Following should give the same answer for each host */ - sdebug_mq_active = shost_use_blk_mq(hpnt) && (submit_queues > 1); - if (sdebug_mq_active) + if (shost_use_blk_mq(hpnt)) hpnt->nr_hw_queues = submit_queues; sdbg_host->shost = hpnt;
[LSF/MM TOPIC] KPTI effect on IO performance
Hi All, After KPTI is merged, there is extra load introduced to context switch between user space and kernel space. It is observed on my laptop that one syscall takes extra ~0.15us[1] compared with 'nopti'. IO performance is affected too, it is observed that IOPS drops by 32% in my test[2] on null_blk compared with 'nopti': randread IOPS on latest linus tree: - | randread IOPS | randread IOPS with 'nopti'| | 928K | 1372K | Two paths are affected, one is IO submission(read, write,... syscall), another is the IO completion path in which interrupt may be triggered from user space, and context switch is needed. So is there something we can do for decreasing the effect on IO performance? This effect may make Hannes's issue[3] worse, and maybe 'irq poll' should be used more widely for all high performance IO device, even some optimization should be considered for KPTI's effect. [1] http://people.redhat.com/minlei/tests/tools/syscall_speed.c [2] http://people.redhat.com/minlei/tests/tools/null_perf [3] [LSF/MM TOPIC] irq affinity handling for high CPU count machines https://marc.info/?t=15172215682=1=2 Thanks, Ming