Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce force_blk_mq

2018-02-10 Thread Ming Lei
On Sat, Feb 10, 2018 at 09:00:57AM +0800, Ming Lei wrote:
> Hi Kashyap,
> 
> On Fri, Feb 09, 2018 at 02:12:16PM +0530, Kashyap Desai wrote:
> > > -Original Message-
> > > From: Ming Lei [mailto:ming@redhat.com]
> > > Sent: Friday, February 9, 2018 11:01 AM
> > > To: Kashyap Desai
> > > Cc: Hannes Reinecke; Jens Axboe; linux-bl...@vger.kernel.org; Christoph
> > > Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun Easi; Omar
> > Sandoval;
> > > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> > Peter
> > > Rivera; Paolo Bonzini; Laurence Oberman
> > > Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags & introduce
> > > force_blk_mq
> > >
> > > On Fri, Feb 09, 2018 at 10:28:23AM +0530, Kashyap Desai wrote:
> > > > > -Original Message-
> > > > > From: Ming Lei [mailto:ming@redhat.com]
> > > > > Sent: Thursday, February 8, 2018 10:23 PM
> > > > > To: Hannes Reinecke
> > > > > Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org;
> > > > > Christoph Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org; Arun
> > > > > Easi; Omar
> > > > Sandoval;
> > > > > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> > > > Peter
> > > > > Rivera; Paolo Bonzini; Laurence Oberman
> > > > > Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags &
> > > > > introduce force_blk_mq
> > > > >
> > > > > On Thu, Feb 08, 2018 at 08:00:29AM +0100, Hannes Reinecke wrote:
> > > > > > On 02/07/2018 03:14 PM, Kashyap Desai wrote:
> > > > > > >> -Original Message-
> > > > > > >> From: Ming Lei [mailto:ming@redhat.com]
> > > > > > >> Sent: Wednesday, February 7, 2018 5:53 PM
> > > > > > >> To: Hannes Reinecke
> > > > > > >> Cc: Kashyap Desai; Jens Axboe; linux-bl...@vger.kernel.org;
> > > > > > >> Christoph Hellwig; Mike Snitzer; linux-scsi@vger.kernel.org;
> > > > > > >> Arun Easi; Omar
> > > > > > > Sandoval;
> > > > > > >> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don
> > > > > > >> Brace;
> > > > > > > Peter
> > > > > > >> Rivera; Paolo Bonzini; Laurence Oberman
> > > > > > >> Subject: Re: [PATCH 0/5] blk-mq/scsi-mq: support global tags &
> > > > > > >> introduce force_blk_mq
> > > > > > >>
> > > > > > >> On Wed, Feb 07, 2018 at 07:50:21AM +0100, Hannes Reinecke
> > wrote:
> > > > > > >>> Hi all,
> > > > > > >>>
> > > > > > >>> [ .. ]
> > > > > > >
> > > > > > > Could you share us your patch for enabling global_tags/MQ on
> > > > > >  megaraid_sas
> > > > > > > so that I can reproduce your test?
> > > > > > >
> > > > > > >> See below perf top data. "bt_iter" is consuming 4 times
> > > > > > >> more
> > > > CPU.
> > > > > > >
> > > > > > > Could you share us what the IOPS/CPU utilization effect is
> > > > > > > after
> > > > > >  applying the
> > > > > > > patch V2? And your test script?
> > > > > >  Regarding CPU utilization, I need to test one more time.
> > > > > >  Currently system is in used.
> > > > > > 
> > > > > >  I run below fio test on total 24 SSDs expander attached.
> > > > > > 
> > > > > >  numactl -N 1 fio jbod.fio --rw=randread --iodepth=64 --bs=4k
> > > > > >  --ioengine=libaio --rw=randread
> > > > > > 
> > > > > >  Performance dropped from 1.6 M IOPs to 770K IOPs.
> > > > > > 
> > > > > > >>> This is basically what we've seen with earlier iterations.
> > > > > > >>
> > > > > > >> Hi Hannes,
> > > > > > >>
> > > > > > >> As I mentioned in another mail[1], Kashyap's patch has a big
> > > > > > >> issue,
> > > > > > > which
> > > > > > >> causes only reply queue 0 used.
> > > > > > >>
> > > > > > >> [1] https://marc.info/?l=linux-scsi=151793204014631=2
> > > > > > >>
> > > > > > >> So could you guys run your performance test again after fixing
> > > > > > >> the
> > > > > > > patch?
> > > > > > >
> > > > > > > Ming -
> > > > > > >
> > > > > > > I tried after change you requested.  Performance drop is still
> > > > unresolved.
> > > > > > > From 1.6 M IOPS to 770K IOPS.
> > > > > > >
> > > > > > > See below data. All 24 reply queue is in used correctly.
> > > > > > >
> > > > > > > IRQs / 1 second(s)
> > > > > > > IRQ#  TOTAL  NODE0   NODE1  NAME
> > > > > > >  360  16422  0   16422  IR-PCI-MSI 70254653-edge megasas
> > > > > > >  364  15980  0   15980  IR-PCI-MSI 70254657-edge megasas
> > > > > > >  362  15979  0   15979  IR-PCI-MSI 70254655-edge megasas
> > > > > > >  345  15696  0   15696  IR-PCI-MSI 70254638-edge megasas
> > > > > > >  341  15659  0   15659  IR-PCI-MSI 70254634-edge megasas
> > > > > > >  369  15656  0   15656  IR-PCI-MSI 70254662-edge megasas
> > > > > > >  359  15650  0   15650  IR-PCI-MSI 70254652-edge megasas
> > > > > > >  358  15596  0   15596  IR-PCI-MSI 70254651-edge megasas
> > > > > > >  350  15574  0   15574  IR-PCI-MSI 70254643-edge megasas
> > > > > > >  342  15532  0   15532  IR-PCI-MSI 70254635-edge megasas
> > > > > > >  344  15527  0   15527  

Re: scsi: sg: assorted memory corruptions

2018-02-10 Thread Eric Biggers
On Sun, Feb 04, 2018 at 12:10:58PM +0100, Dmitry Vyukov wrote:
> >
> > To get memory corruption it's actually sufficient just to submit "1-byte" 
> > reads;
> > there's no need for the SG_NEXT_CMD_LEN ioctl or anything:
> >
> > #include 
> > #include 
> >
> > int main()
> > {
> > int fd = open("/dev/sg0", O_RDWR);
> > char buf[43] = { [36] = 0x08 /* READ_6 */ };
> >
> > for (;;)
> > write(fd, buf, sizeof(buf));
> > }
> >
> > (where /dev/sg0 is the default QEMU disk type, "82371SB PIIX3 IDE")
> >
> > The SCSI command descriptor block is the 6 bytes at indices 36-41, so index 
> > 42
> > is the only data byte.
> >
> > Also this is a different bug from the crash in ata_bmdma_fill_sg() which is
> > fixed by "libata: fix length validation of ATAPI-relayed SCSI commands".
> >
> > I'm guessing the driver is DMA'ing to somewhere it shouldn't be...
> 
> It would be good to add KASAN checks to the DMA code that issues
> transfers. This is another case where a silent memory corruption
> causes dozens of assorted crashes all over the kernel. If we add
> checks, KASAN would pinpoint the exact stack that issues the bad
> command. This may be the simplest way to debug this bug as well. I've
> filed https://bugzilla.kernel.org/show_bug.cgi?id=198661 for this.

It seems the problem is related to the fact that in the PRD (Physical Region
Descriptor) list for the DMA transfer for "BMDMA" ATA disks, the disk (emulated
in QEMU here: https://github.com/qemu/qemu/blob/master/hw/ide/pci.c#L89) ignores
the low bit in the lengths, causing a length of 1 (byte) to be interpreted as 0.
But, at the same time there is also a special case where a length of 0 is
interpreted as 65536 bytes.

So the disk will DMA up to 65536 bytes into a 1-byte buffer, causing massive
memory corruption.

I'm not sure what the best fix is, but probably it needs to be required that the
lengths in the sglist have the alignment needed for the disk.

KASAN would not have helped here unfortunately.  Even if there were KASAN checks
when mapping the sglist for DMA or when filling in the PRD list, the kernel
would not have known that the disk would actually interpret "1" as "65536".

- Eric


Re: [PATCH] scsi: scsi_debug: Fix pointer stying issues

2018-02-10 Thread John Pittman
Ha! Thanks for catching the typo Douglas.

On Fri, Feb 9, 2018 at 9:53 PM, Douglas Gilbert  wrote:
> As per the subject line, I wouldn't mind putting these changes in with
> the pigs :-)
>
> On 2018-02-09 09:12 PM, John Pittman wrote:
>>
>> Pointer styling issues exposed by checkpatch.pl in scsi_debug.c:
>>
>> ERROR: "foo * bar" should be "foo *bar"
>>
>> Fixed 37 total errors reported.
>>
>> Signed-off-by: John Pittman 
>
>
> Ack-ed by: Douglas Gilbert 
>
>
>> ---
>>   drivers/scsi/scsi_debug.c | 72
>> +++
>>   1 file changed, 36 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
>> index a5986da..a1f867f 100644
>> --- a/drivers/scsi/scsi_debug.c
>> +++ b/drivers/scsi/scsi_debug.c
>> @@ -1155,8 +1155,8 @@ static int inquiry_vpd_84(unsigned char *arr)
>>   static int inquiry_vpd_85(unsigned char *arr)
>>   {
>> int num = 0;
>> -   const char * na1 = "https://www.kernel.org/config;;
>> -   const char * na2 = "http://www.kernel.org/log;;
>> +   const char *na1 = "https://www.kernel.org/config;;
>> +   const char *na2 = "http://www.kernel.org/log;;
>> int plen, olen;
>> arr[num++] = 0x1;   /* lu, storage config */
>> @@ -1372,7 +1372,7 @@ static int inquiry_vpd_b2(unsigned char *arr)
>>   static int resp_inquiry(struct scsi_cmnd *scp, struct sdebug_dev_info
>> *devip)
>>   {
>> unsigned char pq_pdt;
>> -   unsigned char * arr;
>> +   unsigned char *arr;
>> unsigned char *cmd = scp->cmnd;
>> int alloc_len, n, ret;
>> bool have_wlun, is_disk;
>> @@ -1523,10 +1523,10 @@ static int resp_inquiry(struct scsi_cmnd *scp,
>> struct sdebug_dev_info *devip)
>>   static unsigned char iec_m_pg[] = {0x1c, 0xa, 0x08, 0, 0, 0, 0, 0,
>>0, 0, 0x0, 0x0};
>>   -static int resp_requests(struct scsi_cmnd * scp,
>> -struct sdebug_dev_info * devip)
>> +static int resp_requests(struct scsi_cmnd *scp,
>> +struct sdebug_dev_info *devip)
>>   {
>> -   unsigned char * sbuff;
>> +   unsigned char *sbuff;
>> unsigned char *cmd = scp->cmnd;
>> unsigned char arr[SCSI_SENSE_BUFFERSIZE];
>> bool dsense;
>> @@ -1584,8 +1584,8 @@ static int resp_requests(struct scsi_cmnd * scp,
>> return fill_from_dev_buffer(scp, arr, len);
>>   }
>>   -static int resp_start_stop(struct scsi_cmnd * scp,
>> -  struct sdebug_dev_info * devip)
>> +static int resp_start_stop(struct scsi_cmnd *scp,
>> +  struct sdebug_dev_info *devip)
>>   {
>> unsigned char *cmd = scp->cmnd;
>> int power_cond, stop;
>> @@ -1612,8 +1612,8 @@ static sector_t get_sdebug_capacity(void)
>>   }
>> #define SDEBUG_READCAP_ARR_SZ 8
>> -static int resp_readcap(struct scsi_cmnd * scp,
>> -   struct sdebug_dev_info * devip)
>> +static int resp_readcap(struct scsi_cmnd *scp,
>> +   struct sdebug_dev_info *devip)
>>   {
>> unsigned char arr[SDEBUG_READCAP_ARR_SZ];
>> unsigned int capac;
>> @@ -1631,8 +1631,8 @@ static int resp_readcap(struct scsi_cmnd * scp,
>>   }
>> #define SDEBUG_READCAP16_ARR_SZ 32
>> -static int resp_readcap16(struct scsi_cmnd * scp,
>> - struct sdebug_dev_info * devip)
>> +static int resp_readcap16(struct scsi_cmnd *scp,
>> + struct sdebug_dev_info *devip)
>>   {
>> unsigned char *cmd = scp->cmnd;
>> unsigned char arr[SDEBUG_READCAP16_ARR_SZ];
>> @@ -1670,11 +1670,11 @@ static int resp_readcap16(struct scsi_cmnd * scp,
>> #define SDEBUG_MAX_TGTPGS_ARR_SZ 1412
>>   -static int resp_report_tgtpgs(struct scsi_cmnd * scp,
>> - struct sdebug_dev_info * devip)
>> +static int resp_report_tgtpgs(struct scsi_cmnd *scp,
>> + struct sdebug_dev_info *devip)
>>   {
>> unsigned char *cmd = scp->cmnd;
>> -   unsigned char * arr;
>> +   unsigned char *arr;
>> int host_no = devip->sdbg_host->shost->host_no;
>> int n, ret, alen, rlen;
>> int port_group_a, port_group_b, port_a, port_b;
>> @@ -1926,7 +1926,7 @@ static int resp_rsup_tmfs(struct scsi_cmnd *scp,
>> /* <> */
>>   -static int resp_err_recov_pg(unsigned char * p, int pcontrol, int
>> target)
>> +static int resp_err_recov_pg(unsigned char *p, int pcontrol, int target)
>>   { /* Read-Write Error Recovery page for mode_sense */
>> unsigned char err_recov_pg[] = {0x1, 0xa, 0xc0, 11, 240, 0, 0, 0,
>> 5, 0, 0xff, 0xff};
>> @@ -1937,7 +1937,7 @@ static int resp_err_recov_pg(unsigned char * p, int
>> pcontrol, int target)
>> return sizeof(err_recov_pg);
>>   }
>>   -static int resp_disconnect_pg(unsigned char * p, int