[PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-17 Thread Steffen Maier
Signed-off-by: Steffen Maier 
---
 tests/scsi/004 |   59 
 tests/scsi/004.out |3 ++
 2 files changed, 62 insertions(+), 0 deletions(-)
 create mode 100755 tests/scsi/004
 create mode 100644 tests/scsi/004.out

diff --git a/tests/scsi/004 b/tests/scsi/004
new file mode 100755
index 000..4852efc
--- /dev/null
+++ b/tests/scsi/004
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Ensure repeated SAM_STAT_TASK_SET_FULL results in EIO on timing out command.
+#
+# Regression test for commit cbe095e2b584 ("Revert "scsi: core: return
+# BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"")
+#
+# Found independently of corresponding commit mail threads while
+# experimenting with storage mirroring. This test is a storage-independent
+# reproducer for the error case I ran into.
+#
+# Copyright IBM Corp. 2018
+# Author: Steffen Maier 
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. common/scsi_debug
+
+DESCRIPTION="ensure repeated TASK SET FULL results in EIO on timing out 
command"
+
+requires() {
+   _have_scsi_debug
+}
+
+test() {
+   echo "Running ${TEST_NAME}"
+
+   if ! _init_scsi_debug add_host=1 max_luns=1 statistics=1 every_nth=1; 
then
+   return 1
+   fi
+   # every_nth RW with full queue gets SAM_STAT_TASK_SET_FULL
+   echo 0x800 > /sys/bus/pseudo/drivers/scsi_debug/opts
+   # delay to reduce response repetition: around 1..10sec depending on HZ
+   echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay
+   # a single command fills device queue to satisfy 0x800 opts condition
+   echo 1 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/queue_depth"
+   dd if="/dev/${SCSI_DEBUG_DEVICES[0]}" iflag=direct of=/dev/null bs=512 
count=1 |& grep -o "Input/output error"
+   # stop injection
+   echo 0 > /sys/bus/pseudo/drivers/scsi_debug/opts
+   # dd closing SCSI disk causes implicit TUR also being delayed once
+   while grep -q -F "in_use_bm BUSY:" 
"/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do
+   sleep 1
+   done
+   echo 1 > /sys/bus/pseudo/drivers/scsi_debug/delay
+   _exit_scsi_debug
+
+   echo "Test complete"
+}
diff --git a/tests/scsi/004.out b/tests/scsi/004.out
new file mode 100644
index 000..b1126fb
--- /dev/null
+++ b/tests/scsi/004.out
@@ -0,0 +1,3 @@
+Running scsi/004
+Input/output error
+Test complete
-- 
1.7.1



Re: MPT Fusion - ext4 delayed allocation failed errors on 4.14!

2018-04-17 Thread James Bottomley
On Mon, 2018-04-16 at 23:25 -0400, Martin K. Petersen wrote:
> Nikola,
> 
> > thanks for explanation. but disabling write same for now is safe,
> > right?
> 
> I was hoping we'd be able to disable it for SATA devices only.
> 
> > root@siv-70140:~ # sg_vpd /dev/sda
> > Supported VPD pages VPD page:
> >   Supported VPD pages [sv]
> >   Device identification [di]
> >   Supported VPD pages [sv]
> >   Supported VPD pages [sv]
> 
> However, it doesn't look like we have much to work with since there
> is no ATA Information VPD page exposed.

it's the fat firmware problem again.  However, the driver does know, so
could we persuade a modification of
mpt3sas_scsih.c:_scsih_display_sata_capabilities() to populate a VPD
page?

James



Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-17 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko

Hi.

17.04.2018 05:12, Kees Cook wrote:
Turning off HARDENED_USERCOPY and turning on KASAN, I see the same 
report:


[   38.274106] BUG: KASAN: slab-out-of-bounds in 
_copy_to_user+0x42/0x60
[   38.274841] Read of size 22 at addr 8800122b8c4b by task 
smartctl/1064

[   38.275630]
[   38.275818] CPU: 2 PID: 1064 Comm: smartctl Not tainted 
4.17.0-rc1-ARCH+ #266

[   38.276631] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   38.277690] Call Trace:
[   38.277988]  dump_stack+0x71/0xab
[   38.278397]  ? _copy_to_user+0x42/0x60
[   38.278833]  print_address_description+0x6a/0x270
[   38.279368]  ? _copy_to_user+0x42/0x60
[   38.279800]  kasan_report+0x243/0x360
[   38.280221]  _copy_to_user+0x42/0x60
[   38.280635]  sg_io+0x459/0x660
...

Though we get slightly more details (some we already knew):

[   38.301330] Allocated by task 329:
[   38.301734]  kmem_cache_alloc_node+0xca/0x220
[   38.302239]  scsi_mq_init_request+0x64/0x130 [scsi_mod]
[   38.302821]  blk_mq_alloc_rqs+0x2cf/0x370
[   38.303265]  blk_mq_sched_alloc_tags.isra.4+0x7d/0xb0
[   38.303820]  blk_mq_init_sched+0xf0/0x220
[   38.304268]  elevator_switch+0x17a/0x2c0
[   38.304705]  elv_iosched_store+0x173/0x220
[   38.305171]  queue_attr_store+0x72/0xb0
[   38.305602]  kernfs_fop_write+0x188/0x220
[   38.306049]  __vfs_write+0xb6/0x330
[   38.306436]  vfs_write+0xe9/0x240
[   38.306804]  ksys_write+0x98/0x110
[   38.307181]  do_syscall_64+0x6d/0x1d0
[   38.307590]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   38.308142]
[   38.308316] Freed by task 0:
[   38.308652] (stack is not available)
[   38.309060]
[   38.309243] The buggy address belongs to the object at 
8800122b8c00

[   38.309243]  which belongs to the cache scsi_sense_cache of size 96
[   38.310625] The buggy address is located 75 bytes inside of
[   38.310625]  96-byte region [8800122b8c00, 8800122b8c60)


With a hardware watchpoint, I've isolated the corruption to here:

bfq_dispatch_request+0x2be/0x1610:
__bfq_dispatch_request at block/bfq-iosched.c:3902
3900if (rq) {
3901inc_in_driver_start_rq:
3902bfqd->rq_in_driver++;
3903start_rq:
3904rq->rq_flags |= RQF_STARTED;
3905}

Through some race condition(?), rq_in_driver is also sense_buffer, and
it can get incremented.
…
I still haven't figured this out, though... any have a moment to look 
at this?


By any chance, have you tried to simplify the reproducer environment, or 
it still needs my complex layout to trigger things even with KASAN?


Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread James Bottomley
On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote:
> I still haven't figured this out, though... any have a moment to look
> at this?

Just to let you know you're not alone ... but I can't make any sense of
this either.  The bfdq is the elevator_data, which is initialised when
the scheduler is attached, so it shouldn't change.  Is it possible to
set a data break point on elevator_data after it's initialised and see
if it got changed by something?

James



Re: [RESEND PATCH v1 1/2] trace: events: scsi: Add tag in SCSI trace events

2018-04-17 Thread Bean Huo (beanhuo)
Hi, Steve 
Right, Please see below portion log from ftrace and blktrace,
There is no any impact on blktrace. 
>
>Looking at the code from
>git://git.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git
>
>It appears that it does not rely on the ftrace ring buffers.
>
>So I'm guessing blktrace is not affected by this patch.
>
>-- Steve


#/blktrace -d /dev/block/sdd14 -o - | ./blkparse -i -  
  8,62   3 1162 0.213039583  4116  P   N [iozone]
  8,62   3 1163 0.213041146  4116  U   N [iozone] 1
  8,62   3 1164 0.213042708  4116  I  WS 39573632 + 128 [iozone]
  8,62   3 1165 0.213044791  4116  D  WS 39573632 + 128 [iozone]
  8,48   3 1166 0.213585416  4116  A  WS 39573760 + 128 <- (8,62) 342272
  8,62   3 1167 0.213590104  4116  Q  WS 39573760 + 128 [iozone]
  8,62   3 1168 0.213592187  4116  G  WS 39573760 + 128 [iozone]
  8,62   3 1169 0.213594271  4116  P   N [iozone]
  8,62   3 1170 0.213595833  4116  U   N [iozone] 1
  8,62   3 1171 0.213597396  4116  I  WS 39573760 + 128 [iozone]
  8,62   3 1172 0.21360  4116  D  WS 39573760 + 128 [iozone]
  8,48   3 1173 0.214092187  4116  A  WS 39573888 + 128 <- (8,62) 342400
  8,62   3 1174 0.214094791  4116  Q  WS 39573888 + 128 [iozone]
  8,62   3 1175 0.214097916  4116  G  WS 39573888 + 128 [iozone]
  8,62   3 1176 0.214102604  4116  P   N [iozone]
  8,62   3 1177 0.214105208  4116  U   N [iozone] 1
  8,62   3 1178 0.214106771  4116  I  WS 39573888 + 128 [iozone]
  8,62   3 1179 0.214111458  4116  D  WS 39573888 + 128 [iozone]
  8,48   3 1180 0.216531771  4115  A  WS 39546496 + 128 <- (8,62) 315008
  8,62   3 1181 0.216534896  4115  Q  WS 39546496 + 128 [iozone]
  8,62   3 1182 0.216538021  4115  G  WS 39546496 + 128 [iozone]
  8,62   3 1183 0.216540625  4115  P   N [iozone]
  8,62   3 1184 0.216543229  4115  U   N [iozone] 1
  8,62   3 1185 0.216544791  4115  I  WS 39546496 + 128 [iozone]
  8,62   3 1186 0.216547396  4115  D  WS 39546496 + 128 [iozone]
  8,48   3 1187 0.217146354  4115  A  WS 39546624 + 128 <- (8,62) 315136
  8,62   3 1188 0.217148437  4115  Q  WS 39546624 + 128 [iozone]
  8,62   3 1189 0.217151041  4115  G  WS 39546624 + 128 [iozone]
  8,62   3 1190 0.217153646  4115  P   N [iozone]
  8,62   3 1191 0.217155208  4115  U   N [iozone] 1
  8,62   3 1192 0.217156771  4115  I  WS 39546624 + 128 [iozone]
  8,62   3 1193 0.217159896  4115  D  WS 39546624 + 128 [iozone]
  8,48   3 1194 0.217712500  4115  A  WS 39546752 + 128 <- (8,62) 315264
  8,62   3 1195 0.217714583  4115  Q  WS 39546752 + 128 [iozone]
  8,62   3 1196 0.217717708  4115  G  WS 39546752 + 128 [iozone]
  8,62   3 1197 0.217722396  4115  P   N [iozone]
  8,62   3 1198 0.217724479  4115  U   N [iozone] 1
  8,62   3 1199 0.217726041  4115  I  WS 39546752 + 128 [iozone]
  8,62   3 1200 0.217728646  4115  D  WS 39546752 + 128 [iozone]
  8,62   5 1150 0.198047916 0  C  WS 39569920 + 128 [0]
  8,62   4 1104 0.198104687 0  C  WS 39576448 + 128 [0]
  8,48   5 1151 0.198182291  4116  A  WS 39570048 + 128 <- (8,62) 338560
  8,62   5 1152 0.19818  4116  Q  WS 39570048 + 128 [iozone]
  8,62   5 1153 0.198184375  4116  G  WS 39570048 + 128 [iozone]
  8,62   5 1154 0.198185937  4116  P   N [iozone]
  8,62   5 1155 0.198186458  4116  U   N [iozone] 1
  8,62   5 1156 0.198186979  4116  I  WS 39570048 + 128 [iozone]
  8,62   5 1157 0.198188541  4116  D  WS 39570048 + 128 [iozone]
  8,48   4 1105 0.198218229  4114  A  WS 39576576 + 128 <- (8,62) 345088
  8,62   4 1106 0.198219271  4114  Q  WS 39576576 + 128 [iozone]
  8,62   4 1107 0.198220312  4114  G  WS 39576576 + 128 [iozone]
  8,62   4 1108 0.198221354  4114  P   N [iozone]
  8,62   4 1109 0.198221875  4114  U   N [iozone] 1
  8,62   4 1110 0.198222396  4114  I  WS 39576576 + 128 [iozone]
  8,62   4  0.198223437  4114  D  WS 39576576 + 128 [iozone]
  8,62   4 1112 0.198245312  4114  C  WS 39594368 + 128 [0]
  8,62   5 1158 0.198336979 0  C  WS 39570048 + 128 [0]
  8,62   4 1113 0.198409375 0  C  WS 39576576 + 128 [0]
  8,48   5 1159 0.199219791  4113  A  WS 39594624 + 128 <- (8,62) 363136
  8,62   5 1160 0.199220833  4113  Q  WS 39594624 + 128 [iozone]
  8,62   5 1161 0.199222396  4113  G  WS 39594624 + 128 [iozone]
  8,62   5 1162 0.199223437  4113  P   N [iozone]
  8,62   5 1163 0.199224479  4113  U   N [iozone] 1
  8,62   5 1164 0.199225000  4113  I  WS 39594624 + 128 [iozone]
  8,62   5 1165 0.199226562  4113  D  WS 39594624 + 128 [iozone]
  8,48   4 1114 0.199235937  4116  A  WS 39570176 + 128 <- 

Re: Add udev-md-raid-safe-timeouts.rules

2018-04-17 Thread Austin S. Hemmelgarn

On 2018-04-16 13:10, Chris Murphy wrote:

Adding linux-usb@ and linux-scsi@
(This email does contain the thread initiating email, but some replies
are on the other lists.)

On Mon, Apr 16, 2018 at 5:43 AM, Austin S. Hemmelgarn
 wrote:

On 2018-04-15 21:04, Chris Murphy wrote:


I just ran into this:

https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec

This solution is inadequate, can it be made more generic? This isn't
an md specific problem, it affects Btrfs and LVM as well. And in fact
raid0, and even none raid setups.

There is no good reason to prevent deep recovery, which is what
happens with the default command timer of 30 seconds, with this class
of drive. Basically that value is going to cause data loss for the
single device and also raid0 case, where the reset happens before deep
recovery has a chance. And even if deep recovery fails to return user
data, what we need to see is the proper error message: read error UNC,
rather than a link reset message which just obfuscates the problem.



This has been discussed at least once here before (probably more times, hard
to be sure since it usually comes up as a side discussion in an only
marginally related thread).  Last I knew, the consensus here was that it
needs to be changed upstream in the kernel, not by adding a udev rule
because while the value is technically system policy, the default policy is
brain-dead for anything but the original disks it was i9ntended for (30
seconds works perfectly fine for actual SCSI devices because they behave
sanely in the face of media errors, but it's horribly inadequate for ATA
devices).

To re-iterate what I've said before on the subject:

For ATA drives it should probably be 150 seconds.  That's 30 seconds beyond
the typical amount of time most consumer drives will keep retrying a sector,
so even if it goes the full time to try and recover a sector this shouldn't
trigger.  The only people this change should negatively impact are those who
have failing drives which support SCT ERC and have it enabled, but aren't
already adjusting this timeout.

For physical SCSI devices, it should continue to be 30 seconds.  SCSI disks
are sensible here and don't waste your time trying to recover a sector.  For
PV-SCSI devices, it should probably be adjusted too, but I don't know what a
reasonable value is.

For USB devices it should probably be higher than 30 seconds, but again I
have no idea what a reasonable value is.


I don't know how all of this is designed but it seems like there's
only one location for the command timer, and the SCSI driver owns it,
and then everyone else (ATA and USB and for all I know SAN) are on top
of that and lack any ability to have separate timeouts.
On the note of SAN, iSCSI is part of the SCSI subsystem, so it gets 
applied directly there.  I'm pretty sure NBD has it's own thing, and I 
think the same is true of ATAoE.


As far as USB, UMS is essentially a stripped down version of SCSI with 
it's own limitations, and UAS _is_ SCSI, with both of those having 
pretty much always been routed through the SCSI subsystem.


The nice thing about the udev rule is that it tests for SCT ERC before
making a change. There certainly are enterprise and almost enterprise
"NAS" SATA drives that have short SCT ERC times enabled out of the box
- and the udev method makes them immune to the change.
The kernel could just as easily look for that too though.  From what 
I've seen however, other failure sources that wouldn't trigger SCT ERC 
on SATA drives are really rare, usually it means a bad cable, bad drive 
electronics, or a bad storage controller, so i don't think having it set 
really high for SCT ERC enabled drives is likely to be much of an issue 
most of the time.


[PATCH 1/1] scsi: sd: improved drive sanitize error handling

2018-04-17 Thread Mahesh Rajashekhara
During the drive sanitization, when the sd driver issues TEST UNIT READY (TUR),
drive reports Sense Key: NOT_READY, ASC: 0x4 and ASCQ: 0x1b.

Sd driver issuing START_STOP command to spin up the drive.
This causes a hung and call trace occurred.

sd driver should take note of Sense Key: NOT_READY, ASC: 0x4 and ASCQ: 0x1b and
should not send START_STOP command until that condition clears.

Excerpt of dmesg:
[  206.998697] sd 6:0:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  206.998714] sd 6:0:0:0: [sda] tag#1 Sense Key : Not Ready [current]
[  206.998720] sd 6:0:0:0: [sda] tag#1 Add. Sense: Logical unit not ready, 
sanitize in progress
[  206.998727] sd 6:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 04 00 00 04 
00
[  206.998732] print_req_error: I/O error, dev sda, sector 4
[  206.998787] Buffer I/O error on dev sda, logical block 1, async page read
[  206.999786] sd 6:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  206.999799] sd 6:0:0:0: [sda] tag#0 Sense Key : Not Ready [current]
[  206.999806] sd 6:0:0:0: [sda] tag#0 Add. Sense: Logical unit not ready, 
sanitize in progress
[  206.999812] sd 6:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 04 
00
[  206.999816] print_req_error: I/O error, dev sda, sector 0
[  207.000127] Buffer I/O error on dev sda, logical block 0, async page read
[  207.000761] sd 6:0:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  207.000775] sd 6:0:0:0: [sda] tag#1 Sense Key : Not Ready [current]
[  207.000780] sd 6:0:0:0: [sda] tag#1 Add. Sense: Logical unit not ready, 
sanitize in progress
[  207.000786] sd 6:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 00 00 00 04 00 00 04 
00
[  207.000789] print_req_error: I/O error, dev sda, sector 4
[  207.000835] Buffer I/O error on dev sda, logical block 1, async page read
[  207.013461] Dev sda: unable to read RDB block 0
[  207.041607]  sda: unable to read partition table
[  207.051796] sd 6:0:0:0: [sda] Spinning up disk...
[  208.091576] ..
[  242.874325] INFO: task systemd-udevd:613 blocked for more than 120 seconds.
[  242.874391]   Tainted: G   OE4.16.0-rc1+ #1
[  242.874445] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  242.874507] systemd-udevd   D0   613478 0x8104
[  242.874512] Call Trace:
[  242.874524]  ? __schedule+0x297/0x870
[  242.874532]  ? _cond_resched+0x15/0x40
[  242.874537]  schedule+0x28/0x80
[  242.874542]  async_synchronize_cookie_domain+0x96/0x140
[  242.874547]  ? wait_woken+0x80/0x80
[  242.874553]  do_init_module+0xbc/0x201
[  242.874557]  load_module+0x1989/0x1f10
[  242.874565]  ? SYSC_finit_module+0xe9/0x110
[  242.874569]  SYSC_finit_module+0xe9/0x110
[  242.874577]  do_syscall_64+0x71/0x130
[  242.874584]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[  242.874588] RIP: 0033:0x7fed93532a49
[  242.874591] RSP: 002b:7fff9467b108 EFLAGS: 0246 ORIG_RAX: 
0139
[  242.874595] RAX: ffda RBX: 55a3462a9380 RCX: 7fed93532a49
[  242.874597] RDX:  RSI: 7fed9321e1c5 RDI: 000f
[  242.874600] RBP: 7fed9321e1c5 R08:  R09: 
[  242.874602] R10: 000f R11: 0246 R12: 
[  242.874604] R13: 55a3462a7510 R14: 0002 R15: 55a3462a9380
[  243.290259] 
...not responding...
[  307.290719] sd 6:0:0:0: [sda] Attached SCSI disk

Signed-off-by: Mahesh Rajashekhara 
---
 drivers/scsi/sd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a6201e6..9421d98 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2121,6 +2121,8 @@ sd_spinup_disk(struct scsi_disk *sdkp)
break;  /* standby */
if (sshdr.asc == 4 && sshdr.ascq == 0xc)
break;  /* unavailable */
+   if (sshdr.asc == 4 && sshdr.ascq == 0x1b)
+   break;  /* sanitize in progress */
/*
 * Issue command to spin up drive when not ready
 */
-- 
2.7.4



[PATCH v9 5/5] arm64: defconfig: enable f2fs and squashfs

2018-04-17 Thread Li Wei
Partitions in HiKey960 are formatted as f2fs and squashfs.
f2fs is for userdata; squashfs is for system. Both partitions are required
by Android.

Signed-off-by: Li Wei 
Signed-off-by: Zhangfei Gao 
Signed-off-by: Guodong Xu 
---
 arch/arm64/configs/defconfig | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index d42b1ecaf490..e8036cddb272 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -613,6 +613,7 @@ CONFIG_ACPI_APEI_MEMORY_FAILURE=y
 CONFIG_ACPI_APEI_EINJ=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
+CONFIG_F2FS_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_BTRFS_FS=m
 CONFIG_BTRFS_FS_POSIX_ACL=y
@@ -628,6 +629,13 @@ CONFIG_HUGETLBFS=y
 CONFIG_CONFIGFS_FS=y
 CONFIG_EFIVAR_FS=y
 CONFIG_SQUASHFS=y
+CONFIG_SQUASHFS_FILE_DIRECT=y
+CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
+CONFIG_SQUASHFS_XATTR=y
+CONFIG_SQUASHFS_LZ4=y
+CONFIG_SQUASHFS_LZO=y
+CONFIG_SQUASHFS_XZ=y
+CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
 CONFIG_NFS_FS=y
 CONFIG_NFS_V4=y
 CONFIG_NFS_V4_1=y
-- 
2.15.0



[PATCH v9 0/5] scsi: ufs: add ufs driver code for Hisilicon Hi3660 SoC

2018-04-17 Thread Li Wei
This patchset adds driver support for UFS for Hi3660 SoC. It is verified on 
HiKey960 board.

Li Wei (5):
  scsi: ufs: add Hisilicon ufs driver code
  dt-bindings: scsi: ufs: add document for hisi-ufs
  arm64: dts: add ufs dts node
  arm64: defconfig: enable configs for Hisilicon ufs
  arm64: defconfig: enable f2fs and squashfs

 Documentation/devicetree/bindings/ufs/ufs-hisi.txt |  29 +
 .../devicetree/bindings/ufs/ufshcd-pltfrm.txt  |  10 +-
 arch/arm64/boot/dts/hisilicon/hi3660.dtsi  |  20 +
 arch/arm64/configs/defconfig   |  11 +
 drivers/scsi/ufs/Kconfig   |   9 +
 drivers/scsi/ufs/Makefile  |   1 +
 drivers/scsi/ufs/ufs-hisi.c| 621 +
 drivers/scsi/ufs/ufs-hisi.h| 114 
 8 files changed, 812 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/ufs/ufs-hisi.txt
 create mode 100644 drivers/scsi/ufs/ufs-hisi.c
 create mode 100644 drivers/scsi/ufs/ufs-hisi.h

Major changes in v9:
 - solve review comments from Rob Herring.
   *remove clocks/clock-names/freq-table-hz from ufs-hisi.txt binding file.
   *Move the rst to the ufshcd_pltfm.txt common binding file.
   *Modify the member "assert" of UFS host structure to "arst". 
--
Major changes in v8:
 - solve review comments from zhangfei.
   *Add Version history.
 - solve review comments from Rob Herring.
   *remove freq-table-hz.
 -  solve review comments from Riku Voipio.
   *Add MODULE_DEVICE_TABLE for ufs driver.
-- 
Major changes in v7:
 - solve review comments from Philippe Ombredanne.
   *use the new SPDX license ids instead of the GNU General Public License.
-- 
2.15.0



[PATCH v2] target: Fix Fortify_panic kernel exception

2018-04-17 Thread Bryant G. Ly
The bug exists in the memcmp in which the length passed in must
be guaranteed to be 1. This bug currently exists because
the second pointer passed in, can be smaller than the
cmd->data_length, which causes a fortify_panic.

The fix is to use memchr_inv instead to find whether or not
a 0 exists instead of using memcmp. This way you dont have to
worry about buffer overflow which is the reason for the
fortify_panic.

The bug was found by running a block backstore via LIO.

[  496.212958] Call Trace:
[  496.212960] [c007e58e3800] [c0cbbefc] fortify_panic+0x24/0x38 
(unreliable)
[  496.212965] [c007e58e3860] [df150c28] 
iblock_execute_write_same+0x3b8/0x3c0 [target_core_iblock]
[  496.212976] [c007e58e3910] [d6c737d4] 
__target_execute_cmd+0x54/0x150 [target_core_mod]
[  496.212982] [c007e58e3940] [d6d32ce4] 
ibmvscsis_write_pending+0x74/0xe0 [ibmvscsis]
[  496.212991] [c007e58e39b0] [d6c74fc8] 
transport_generic_new_cmd+0x318/0x370 [target_core_mod]
[  496.213001] [c007e58e3a30] [d6c75084] 
transport_handle_cdb_direct+0x64/0xd0 [target_core_mod]
[  496.213011] [c007e58e3aa0] [d6c75298] 
target_submit_cmd_map_sgls+0x1a8/0x320 [target_core_mod]
[  496.213021] [c007e58e3b30] [d6c75458] 
target_submit_cmd+0x48/0x60 [target_core_mod]
[  496.213026] [c007e58e3bd0] [d6d34c20] 
ibmvscsis_scheduler+0x370/0x600 [ibmvscsis]
[  496.213031] [c007e58e3c90] [c013135c] 
process_one_work+0x1ec/0x580
[  496.213035] [c007e58e3d20] [c0131798] worker_thread+0xa8/0x600
[  496.213039] [c007e58e3dc0] [c013a468] kthread+0x168/0x1b0
[  496.213044] [c007e58e3e30] [c000b528] 
ret_from_kernel_thread+0x5c/0xb4

Fixes: 2237498f0b5c ("target/iblock: Convert WRITE_SAME to 
blkdev_issue_zeroout")
Signed-off-by: Bryant G. Ly 
Reviewed-by: Steven Royer 
Tested-by: Taylor Jakobson 
Cc: Christoph Hellwig 
Cc: Nicholas Bellinger 
Cc: 
---
 drivers/target/target_core_iblock.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/target/target_core_iblock.c 
b/drivers/target/target_core_iblock.c
index 07c814c..6042901 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -427,8 +427,8 @@ iblock_execute_zero_out(struct block_device *bdev, struct 
se_cmd *cmd)
 {
struct se_device *dev = cmd->se_dev;
struct scatterlist *sg = >t_data_sg[0];
-   unsigned char *buf, zero = 0x00, *p = 
-   int rc, ret;
+   unsigned char *buf, *not_zero;
+   int ret;
 
buf = kmap(sg_page(sg)) + sg->offset;
if (!buf)
@@ -437,10 +437,10 @@ iblock_execute_zero_out(struct block_device *bdev, struct 
se_cmd *cmd)
 * Fall back to block_execute_write_same() slow-path if
 * incoming WRITE_SAME payload does not contain zeros.
 */
-   rc = memcmp(buf, p, cmd->data_length);
+   not_zero = memchr_inv(buf, 0x00, cmd->data_length);
kunmap(sg_page(sg));
 
-   if (rc)
+   if (not_zero)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 
ret = blkdev_issue_zeroout(bdev,
-- 
2.7.2



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook  wrote:
> With a hardware watchpoint, I've isolated the corruption to here:
>
> bfq_dispatch_request+0x2be/0x1610:
> __bfq_dispatch_request at block/bfq-iosched.c:3902
> 3900if (rq) {
> 3901inc_in_driver_start_rq:
> 3902bfqd->rq_in_driver++;
> 3903start_rq:
> 3904rq->rq_flags |= RQF_STARTED;
> 3905}

FWIW, the stacktrace here (removing the ? lines) is:

[   34.311980] RIP: 0010:bfq_dispatch_request+0x2be/0x1610
[   34.452491]  blk_mq_do_dispatch_sched+0x1d9/0x260
[   34.454561]  blk_mq_sched_dispatch_requests+0x3da/0x4b0
[   34.458789]  __blk_mq_run_hw_queue+0xae/0x130
[   34.460001]  __blk_mq_delay_run_hw_queue+0x192/0x280
[   34.460823]  blk_mq_run_hw_queue+0x10b/0x1b0
[   34.463240]  blk_mq_sched_insert_request+0x3bd/0x4d0
[   34.467342]  blk_execute_rq+0xcf/0x140
[   34.468483]  sg_io+0x2f7/0x730

Can anyone tell me more about the memory allocation layout of the
various variables here? It looks like struct request is a header in
front of struct scsi_request? How do struct elevator_queue, struct
blk_mq_ctx, and struct blk_mq_hw_ctx overlap these?

Regardless, I'll check for elevator data changing too...

-Kees

-- 
Kees Cook
Pixel Security


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 3:02 AM, James Bottomley
 wrote:
> On Mon, 2018-04-16 at 20:12 -0700, Kees Cook wrote:
>> I still haven't figured this out, though... any have a moment to look
>> at this?
>
> Just to let you know you're not alone ... but I can't make any sense of
> this either.  The bfdq is the elevator_data, which is initialised when
> the scheduler is attached, so it shouldn't change.  Is it possible to
> set a data break point on elevator_data after it's initialised and see
> if it got changed by something?

Yeah, it seems like some pointer chain is getting overwritten outside
of a lock or rcu or ?. I don't know this code well enough to guess at
where to check, though. What I find so strange is that the structure
offsets are different between bfpd's rq_in_driver field and
scsi_request's sense field, so even THAT doesn't look to be clear-cut
either:

struct bfq_data {
struct request_queue * queue;/* 0 8 */
struct list_head   dispatch; /* 816 */
struct bfq_group * root_group;   /*24 8 */
struct rb_root queue_weights_tree;   /*32 8 */
struct rb_root group_weights_tree;   /*40 8 */
intbusy_queues;  /*48 4 */
intwr_busy_queues;   /*52 4 */
intqueued;   /*56 4 */
intrq_in_driver; /*60 4 */
...

struct scsi_request {
unsigned char  __cmd[16];/* 016 */
unsigned char *cmd;  /*16 8 */
short unsigned int cmd_len;  /*24 2 */

/* XXX 2 bytes hole, try to pack */

intresult;   /*28 4 */
unsigned int   sense_len;/*32 4 */
unsigned int   resid_len;/*36 4 */
intretries;  /*40 4 */

/* XXX 4 bytes hole, try to pack */

void * sense;/*48 8 */
...

This is _so_ weird. :P

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] target: fix crash with iscsi target and dvd

2018-04-17 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 

Btw, seems like someone needs to volunteer for putting together a pull
request with target fixes for Linus - I haven't heard from Nic for
a while, and we've got quite a few fixes out on the list.


Re: [PATCHv3] tcmu: allow userspace to reset netlink

2018-04-17 Thread Mike Christie
Nick,

Ignore this v3 too. It will not work for deletion.

On 04/16/2018 07:43 AM, xiu...@redhat.com wrote:
> From: Xiubo Li 
> 
> This patch adds 1 tcmu attr to reset and complete all the blocked
> netlink waiting threads. It's used when the userspace daemon like
> tcmu-runner has crashed or forced to shutdown just before the
> netlink requests be replied to the kernel, then the netlink requeting
> threads will get stuck forever. We must reboot the machine to recover
> from it and by this the rebootng is not a must then.
> 
> The Call Trace will be something like:
> ==
> INFO: task targetctl:22655 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> targetctl   D 880169718fd0 0 22655  17249 0x0080
> Call Trace:
>  [] schedule+0x29/0x70
>  [] schedule_timeout+0x239/0x2c0
>  [] ? skb_release_data+0xf2/0x140
>  [] wait_for_completion+0xfd/0x140
>  [] ? wake_up_state+0x20/0x20
>  [] tcmu_netlink_event+0x26a/0x3a0 [target_core_user]
>  [] ? wake_up_atomic_t+0x30/0x30
>  [] tcmu_configure_device+0x236/0x350 [target_core_user]
>  [] target_configure_device+0x3f/0x3b0 [target_core_mod]
>  [] target_core_store_dev_enable+0x2c/0x60 [target_core_mod]
>  [] target_core_dev_store+0x24/0x40 [target_core_mod]
>  [] configfs_write_file+0xc4/0x130
>  [] vfs_write+0xbd/0x1e0
>  [] SyS_write+0x7f/0xe0
>  [] system_call_fastpath+0x16/0x1b
> ==
> 
> Signed-off-by: Xiubo Li 
> ---
> Changes since v1(suggested by Mike Christie):
> v2: - Makes the reset per device.
> v3: - Remove nl_cmd->complete, use status instead
> - Fix lock issue 
> - Check if a nl command is even waiting before trying to wake up
> 
>  drivers/target/target_core_user.c | 60 
> +--
>  1 file changed, 51 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 4ad89ea..a831f5b7 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -104,8 +104,8 @@ struct tcmu_hba {
>  #define TCMU_CONFIG_LEN 256
>  
>  struct tcmu_nl_cmd {
> - /* wake up thread waiting for reply */
> - struct completion complete;
> + unsigned int waiter;
> +
>   int cmd;
>   int status;
>  };
> @@ -159,9 +159,12 @@ struct tcmu_dev {
>  
>   spinlock_t nl_cmd_lock;
>   struct tcmu_nl_cmd curr_nl_cmd;
> - /* wake up threads waiting on curr_nl_cmd */
> + /* wake up threads waiting on nl_cmd_wq */
>   wait_queue_head_t nl_cmd_wq;
>  
> + /* complete thread waiting complete_wq */
> + wait_queue_head_t complete_wq;
> +
>   char dev_config[TCMU_CONFIG_LEN];
>  
>   int nl_reply_supported;
> @@ -307,11 +310,13 @@ static int tcmu_genl_cmd_done(struct genl_info *info, 
> int completed_cmd)
>   nl_cmd->status = rc;
>   }
>  
> - spin_unlock(>nl_cmd_lock);
>   if (!is_removed)
>target_undepend_item(>dev_group.cg_item);
> - if (!ret)
> - complete(_cmd->complete);
> + if (!ret && nl_cmd->waiter) {
> + nl_cmd->waiter--;
> + wake_up(>complete_wq);
> + }
> + spin_unlock(>nl_cmd_lock);
>   return ret;
>  }
>  
> @@ -1258,6 +1263,7 @@ static struct se_device *tcmu_alloc_device(struct 
> se_hba *hba, const char *name)
>   timer_setup(>cmd_timer, tcmu_cmd_timedout, 0);
>  
>   init_waitqueue_head(>nl_cmd_wq);
> + init_waitqueue_head(>complete_wq);
>   spin_lock_init(>nl_cmd_lock);
>  
>   INIT_RADIX_TREE(>data_blocks, GFP_KERNEL);
> @@ -1554,8 +1560,8 @@ static void tcmu_init_genl_cmd_reply(struct tcmu_dev 
> *udev, int cmd)
>   }
>  
>   memset(nl_cmd, 0, sizeof(*nl_cmd));
> + nl_cmd->status = 1;
>   nl_cmd->cmd = cmd;
> - init_completion(_cmd->complete);
>  
>   spin_unlock(>nl_cmd_lock);
>  }
> @@ -1572,13 +1578,16 @@ static int tcmu_wait_genl_cmd_reply(struct tcmu_dev 
> *udev)
>   if (udev->nl_reply_supported <= 0)
>   return 0;
>  
> + spin_lock(>nl_cmd_lock);
> + nl_cmd->waiter++;
> + spin_unlock(>nl_cmd_lock);
> +
>   pr_debug("sleeping for nl reply\n");
> - wait_for_completion(_cmd->complete);
> + wait_event(udev->complete_wq, nl_cmd->status != 1);
>  
>   spin_lock(>nl_cmd_lock);
>   nl_cmd->cmd = TCMU_CMD_UNSPEC;
>   ret = nl_cmd->status;
> - nl_cmd->status = 0;
>   spin_unlock(>nl_cmd_lock);
>  
>   wake_up_all(>nl_cmd_wq);
> @@ -2323,6 +2332,38 @@ static ssize_t tcmu_block_dev_store(struct config_item 
> *item, const char *page,
>  }
>  CONFIGFS_ATTR(tcmu_, block_dev);
>  
> +static ssize_t tcmu_reset_netlink_store(struct config_item *item, const char 
> *page,
> + size_t count)
> +{
> + struct se_device *se_dev = container_of(to_config_group(item),
> + struct se_device,
> +  

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 10:42 AM, Kees Cook wrote:
> On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook  wrote:
>> With a hardware watchpoint, I've isolated the corruption to here:
>>
>> bfq_dispatch_request+0x2be/0x1610:
>> __bfq_dispatch_request at block/bfq-iosched.c:3902
>> 3900if (rq) {
>> 3901inc_in_driver_start_rq:
>> 3902bfqd->rq_in_driver++;
>> 3903start_rq:
>> 3904rq->rq_flags |= RQF_STARTED;
>> 3905}
> 
> FWIW, the stacktrace here (removing the ? lines) is:
> 
> [   34.311980] RIP: 0010:bfq_dispatch_request+0x2be/0x1610
> [   34.452491]  blk_mq_do_dispatch_sched+0x1d9/0x260
> [   34.454561]  blk_mq_sched_dispatch_requests+0x3da/0x4b0
> [   34.458789]  __blk_mq_run_hw_queue+0xae/0x130
> [   34.460001]  __blk_mq_delay_run_hw_queue+0x192/0x280
> [   34.460823]  blk_mq_run_hw_queue+0x10b/0x1b0
> [   34.463240]  blk_mq_sched_insert_request+0x3bd/0x4d0
> [   34.467342]  blk_execute_rq+0xcf/0x140
> [   34.468483]  sg_io+0x2f7/0x730
> 
> Can anyone tell me more about the memory allocation layout of the
> various variables here? It looks like struct request is a header in
> front of struct scsi_request? How do struct elevator_queue, struct
> blk_mq_ctx, and struct blk_mq_hw_ctx overlap these?

The scsi_request is a payload item for the block request, it's
located right after the request in memory. These are persistent
allocations, we don't allocate/free them per IO.

blk_mq_ctx are the blk-mq software queues, they are percpu and
allocated when the queue is setup.

blk_mq_hw_ctx is the hardware queue. You probably have just one,
it's allocated when the queue is setup.

struct elevator_queue is allocated when the scheduler is attached
to the queue. This can get freed and allocated if you switch
the scheduler on a queue, otherwise it persists until the queue
is torn down (and the scheduler data is freed).

> Regardless, I'll check for elevator data changing too...

It should not change unless you switch IO schedulers. If you're
using BFQ and not switching, then it won't change.

-- 
Jens Axboe



Re: [PATCH] target: Fix Fortify_panic kernel exception

2018-04-17 Thread Bryant G. Ly

On 4/13/18 11:44 AM, Christoph Hellwig wrote:

> The patch looks fine, but in general I think descriptions of what
> you fixed in the code or more important than starting out with
> a backtrace.
>
> E.g. please explain what was wrong, how you fixed it and only after
> that mention how it was caught.  (preferably without the whole trace)
>
I will put the trace at the end next time. Do you want me to re-submit
with it moved?

-Bryant



Re: [PATCH] target: Fix Fortify_panic kernel exception

2018-04-17 Thread Christoph Hellwig
On Tue, Apr 17, 2018 at 10:18:17AM -0500, Bryant G. Ly wrote:
> 
> On 4/13/18 11:44 AM, Christoph Hellwig wrote:
> 
> > The patch looks fine, but in general I think descriptions of what
> > you fixed in the code or more important than starting out with
> > a backtrace.
> >
> > E.g. please explain what was wrong, how you fixed it and only after
> > that mention how it was caught.  (preferably without the whole trace)
> >
> I will put the trace at the end next time. Do you want me to re-submit
> with it moved?

Please do.


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 2:19 AM, Oleksandr Natalenko
 wrote:
> By any chance, have you tried to simplify the reproducer environment, or it
> still needs my complex layout to trigger things even with KASAN?

I haven't tried minimizing the reproducer yet, no. Now that I have a
specific place to watch in the kernel for the corruption, though, that
might help. If I get stuck again today, I'll try it.

-Kees

-- 
Kees Cook
Pixel Security


[PATCH v9 3/5] arm64: dts: add ufs dts node

2018-04-17 Thread Li Wei
arm64: dts: add ufs node for Hisilicon.

Signed-off-by: Li Wei 
---
 arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi 
b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
index ec3eb8e33a3a..e9013d40a613 100644
--- a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
@@ -892,6 +892,26 @@
reset-gpios = < 1 0 >;
};
 
+   /* UFS */
+   ufs: ufs@ff3b {
+   compatible = "hisilicon,hi3660-ufs", "jedec,ufs-1.1";
+   /* 0: HCI standard */
+   /* 1: UFS SYS CTRL */
+   reg = <0x0 0xff3b 0x0 0x1000>,
+   <0x0 0xff3b1000 0x0 0x1000>;
+   interrupt-parent = <>;
+   interrupts = ;
+   clocks = <_ctrl HI3660_CLK_GATE_UFSIO_REF>,
+   <_ctrl HI3660_CLK_GATE_UFSPHY_CFG>;
+   clock-names = "ref_clk", "phy_clk";
+   freq-table-hz = <0 0>, <0 0>;
+   /* offset: 0x84; bit: 12 */
+   /* offset: 0x84; bit: 7  */
+   resets = <_rst 0x84 12>,
+   <_rst 0x84 7>;
+   reset-names = "rst", "arst";
+   };
+
/* SD */
dwmmc1: dwmmc1@ff37f000 {
#address-cells = <1>;
-- 
2.15.0



[PATCH v9 1/5] scsi: ufs: add Hisilicon ufs driver code

2018-04-17 Thread Li Wei
add Hisilicon ufs driver code.

Signed-off-by: Li Wei 
Signed-off-by: Geng Jianfeng 
Signed-off-by: Zang Leigang 
Signed-off-by: Yu Jianfeng 
---
 drivers/scsi/ufs/Kconfig|   9 +
 drivers/scsi/ufs/Makefile   |   1 +
 drivers/scsi/ufs/ufs-hisi.c | 621 
 drivers/scsi/ufs/ufs-hisi.h | 114 
 4 files changed, 745 insertions(+)
 create mode 100644 drivers/scsi/ufs/ufs-hisi.c
 create mode 100644 drivers/scsi/ufs/ufs-hisi.h

diff --git a/drivers/scsi/ufs/Kconfig b/drivers/scsi/ufs/Kconfig
index e27b4d4e6ae2..10c3e60d86a5 100644
--- a/drivers/scsi/ufs/Kconfig
+++ b/drivers/scsi/ufs/Kconfig
@@ -100,3 +100,12 @@ config SCSI_UFS_QCOM
 
  Select this if you have UFS controller on QCOM chipset.
  If unsure, say N.
+
+config SCSI_UFS_HISI
+   tristate "Hisilicon specific hooks to UFS controller platform driver"
+   depends on (ARCH_HISI || COMPILE_TEST) && SCSI_UFSHCD_PLATFORM
+   help
+ This selects the Hisilicon specific additions to UFSHCD platform 
driver.
+
+ Select this if you have UFS controller on Hisilicon chipset.
+ If unsure, say N.
diff --git a/drivers/scsi/ufs/Makefile b/drivers/scsi/ufs/Makefile
index 918f5791202d..2c50f03d8c4a 100644
--- a/drivers/scsi/ufs/Makefile
+++ b/drivers/scsi/ufs/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_SCSI_UFSHCD) += ufshcd-core.o
 ufshcd-core-objs := ufshcd.o ufs-sysfs.o
 obj-$(CONFIG_SCSI_UFSHCD_PCI) += ufshcd-pci.o
 obj-$(CONFIG_SCSI_UFSHCD_PLATFORM) += ufshcd-pltfrm.o
+obj-$(CONFIG_SCSI_UFS_HISI) += ufs-hisi.o
diff --git a/drivers/scsi/ufs/ufs-hisi.c b/drivers/scsi/ufs/ufs-hisi.c
new file mode 100644
index ..bb9b1cc198db
--- /dev/null
+++ b/drivers/scsi/ufs/ufs-hisi.c
@@ -0,0 +1,621 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * HiSilicon Hi UFS Driver
+ *
+ * Copyright (c) 2016-2017 Linaro Ltd.
+ * Copyright (c) 2016-2017 HiSilicon Technologies Co., Ltd.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ufshcd.h"
+#include "ufshcd-pltfrm.h"
+#include "unipro.h"
+#include "ufs-hisi.h"
+#include "ufshci.h"
+
+static int ufs_hisi_check_hibern8(struct ufs_hba *hba)
+{
+   int err = 0;
+   u32 tx_fsm_val_0 = 0;
+   u32 tx_fsm_val_1 = 0;
+   unsigned long timeout = jiffies + msecs_to_jiffies(HBRN8_POLL_TOUT_MS);
+
+   do {
+   err = ufshcd_dme_get(hba, UIC_ARG_MIB_SEL(MPHY_TX_FSM_STATE, 0),
+ _fsm_val_0);
+   err |= ufshcd_dme_get(hba,
+   UIC_ARG_MIB_SEL(MPHY_TX_FSM_STATE, 1), _fsm_val_1);
+   if (err || (tx_fsm_val_0 == TX_FSM_HIBERN8 &&
+   tx_fsm_val_1 == TX_FSM_HIBERN8))
+   break;
+
+   /* sleep for max. 200us */
+   usleep_range(100, 200);
+   } while (time_before(jiffies, timeout));
+
+   /*
+* we might have scheduled out for long during polling so
+* check the state again.
+*/
+   if (time_after(jiffies, timeout)) {
+   err = ufshcd_dme_get(hba, UIC_ARG_MIB_SEL(MPHY_TX_FSM_STATE, 0),
+_fsm_val_0);
+   err |= ufshcd_dme_get(hba,
+UIC_ARG_MIB_SEL(MPHY_TX_FSM_STATE, 1), _fsm_val_1);
+   }
+
+   if (err) {
+   dev_err(hba->dev, "%s: unable to get TX_FSM_STATE, err %d\n",
+   __func__, err);
+   } else if (tx_fsm_val_0 != TX_FSM_HIBERN8 ||
+tx_fsm_val_1 != TX_FSM_HIBERN8) {
+   err = -1;
+   dev_err(hba->dev, "%s: invalid TX_FSM_STATE, lane0 = %d, lane1 
= %d\n",
+   __func__, tx_fsm_val_0, tx_fsm_val_1);
+   }
+
+   return err;
+}
+
+static void ufs_hi3660_clk_init(struct ufs_hba *hba)
+{
+   struct ufs_hisi_host *host = ufshcd_get_variant(hba);
+
+   ufs_sys_ctrl_clr_bits(host, BIT_SYSCTRL_REF_CLOCK_EN, PHY_CLK_CTRL);
+   if (ufs_sys_ctrl_readl(host, PHY_CLK_CTRL) & BIT_SYSCTRL_REF_CLOCK_EN)
+   mdelay(1);
+   /* use abb clk */
+   ufs_sys_ctrl_clr_bits(host, BIT_UFS_REFCLK_SRC_SEl, UFS_SYSCTRL);
+   ufs_sys_ctrl_clr_bits(host, BIT_UFS_REFCLK_ISO_EN, PHY_ISO_EN);
+   /* open mphy ref clk */
+   ufs_sys_ctrl_set_bits(host, BIT_SYSCTRL_REF_CLOCK_EN, PHY_CLK_CTRL);
+}
+
+static void ufs_hi3660_soc_init(struct ufs_hba *hba)
+{
+   struct ufs_hisi_host *host = ufshcd_get_variant(hba);
+   u32 reg;
+
+   if (!IS_ERR(host->rst))
+   reset_control_assert(host->rst);
+
+   /* HC_PSW powerup */
+   ufs_sys_ctrl_set_bits(host, BIT_UFS_PSW_MTCMOS_EN, PSW_POWER_CTRL);
+   udelay(10);
+   /* notify PWR ready */
+   ufs_sys_ctrl_set_bits(host, BIT_SYSCTRL_PWR_READY, HC_LP_CTRL);
+   ufs_sys_ctrl_writel(host, MASK_UFS_DEVICE_RESET | 0,
+  

[PATCH v9 4/5] arm64: defconfig: enable configs for Hisilicon ufs

2018-04-17 Thread Li Wei
This enable configs for Hisilicon Hi UFS driver.

Signed-off-by: Li Wei 
Signed-off-by: Zhangfei Gao 
Signed-off-by: Guodong Xu 
---
 arch/arm64/configs/defconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index ecf613761e78..d42b1ecaf490 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -187,6 +187,9 @@ CONFIG_BLK_DEV_SD=y
 CONFIG_SCSI_SAS_ATA=y
 CONFIG_SCSI_HISI_SAS=y
 CONFIG_SCSI_HISI_SAS_PCI=y
+CONFIG_SCSI_UFSHCD=y
+CONFIG_SCSI_UFSHCD_PLATFORM=y
+CONFIG_SCSI_UFS_HISI=y
 CONFIG_ATA=y
 CONFIG_SATA_AHCI=y
 CONFIG_SATA_AHCI_PLATFORM=y
-- 
2.15.0



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Mon, Apr 16, 2018 at 8:12 PM, Kees Cook  wrote:
> With a hardware watchpoint, I've isolated the corruption to here:
>
> bfq_dispatch_request+0x2be/0x1610:
> __bfq_dispatch_request at block/bfq-iosched.c:3902
> 3900if (rq) {
> 3901inc_in_driver_start_rq:
> 3902bfqd->rq_in_driver++;
> 3903start_rq:
> 3904rq->rq_flags |= RQF_STARTED;
> 3905}

This state continues to appear to be "impossible".

[   68.845979] watchpoint triggered
[   68.846462] sense before:8b8f9f6aae00 after:8b8f9f6aae01
[   68.847196] elevator before:8b8f9a2c2000 after:8b8f9a2c2000
[   68.847905] elevator_data before:8b8f9a2c0400 after:8b8f9a2c0400
...
[   68.856925] RIP: 0010:bfq_dispatch_request+0x99/0xad0
[   68.857553] RSP: 0018:900280c63a58 EFLAGS: 0082
[   68.858253] RAX: 8b8f9aefbe80 RBX: 8b8f9a2c0400 RCX: 8b8f9a2c0408
[   68.859201] RDX: 8b8f9a2c0408 RSI: 900280c63b34 RDI: 0001
[   68.860147] RBP:  R08: 000f0204 R09: 
[   68.861122] R10: 900280c63af0 R11: 0040 R12: 8b8f9aefbe00
[   68.862089] R13: 8b8f9a221950 R14:  R15: 8b8f9a2c0770

Here we can see that sense buffer has, as we've seen, been
incremented. However, the "before" values for elevator and
elevator_data still match their expected values. As such, this should
be "impossible", since:

static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
{
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
...
if (rq) {
inc_in_driver_start_rq:
bfqd->rq_in_driver++;
start_rq:
rq->rq_flags |= RQF_STARTED;
}
exit:
return rq;
}

For rq_in_driver++ to touch sense, bfqd must be equal to scsi_request
- 12 bytes (rq_in_driver is 60 byte offset from struct bfq_data, and
sense is 48 bytes offset from struct scsi_request).

The above bfq_dispatch_request+0x99/0xad0 is still
__bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
removed. 0x99 is 153 decimal:

(gdb) disass bfq_dispatch_request
Dump of assembler code for function bfq_dispatch_request:
...
   0x8134b2ad <+141>:   test   %rax,%rax
   0x8134b2b0 <+144>:   je 0x8134b2bd

   0x8134b2b2 <+146>:   addl   $0x1,0x100(%rax)
   0x8134b2b9 <+153>:   addl   $0x1,0x3c(%rbx)
   0x8134b2bd <+157>:   orl$0x2,0x18(%r12)
   0x8134b2c3 <+163>:   test   %ebp,%ebp
   0x8134b2c5 <+165>:   je 0x8134b2ce

   0x8134b2c7 <+167>:   mov0x108(%r14),%rax
   0x8134b2ce <+174>:   mov%r15,%rdi
   0x8134b2d1 <+177>:   callq  0x81706f90 <_raw_spin_unlock_irq>

Just as a sanity-check, at +157 %r12 should be rq, rq_flags is 0x18
offset from, $0x2 is RQF_STARTED, so that maps to "rq->rq_flags |=
RQF_STARTED", the next C statement. I don't know what +146 is, though?
An increment of something 256 bytes offset? There's a lot of inline
fun and reordering happening here, so I'm ignoring that for the
moment.

So at +153 %rbx should be bfqd (i.e. elevator_data), but this is the
tripped instruction. The watchpoint dump shows RBX as 8b8f9a2c0400
... which matches elevator_data.

So, what can this be? Some sort of cache issue? By the time the
watchpoint handler captures the register information, it's already
masked the problem? I'm stumped again. :(

-Kees

-- 
Kees Cook
Pixel Security


[PATCH] bsg referencing bus driver module

2018-04-17 Thread Anatoliy Glagolev
Description: bsg_release may crash while decrementing reference to the
parent device with the following stack:

[16834.636216,07] Call Trace:
 ...   scsi_proc_hostdir_rm
[16834.641944,07]  [] scsi_host_dev_release+0x3f/0x130
[16834.647740,07]  [] device_release+0x32/0xa0
[16834.653423,07]  [] kobject_cleanup+0x77/0x190
[16834.659002,07]  [] kobject_put+0x25/0x50
[16834.664430,07]  [] put_device+0x17/0x20
[16834.669740,07]  [] bsg_kref_release_function+0x24/0x30
[16834.675007,07]  [] bsg_release+0x166/0x1d0
[16834.680148,07]  [] __fput+0xcb/0x1d0
[16834.685156,07]  [] fput+0xe/0x10
[16834.690017,07]  [] task_work_run+0x86/0xb0
[16834.694781,07]  [] exit_to_usermode_loop+0x6b/0x9a
[16834.699466,07]  [] syscall_return_slowpath+0x55/0x60
[16834.704110,07]  [] int_ret_from_sys_call+0x25/0x9f

if the parent driver implementing the device has unloaded. To address
the problem, taking a reference to the parent driver module.

Note: this is a follow-up to earlier discussion "[PATCH] Waiting for
scsi_host_template release".

---
 block/bsg.c  | 31 +++
 drivers/scsi/scsi_transport_fc.c |  3 ++-
 include/linux/bsg.h  |  5 +
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index b9a5361..0aa7d74 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -798,7 +798,8 @@ found:
  return bd;
 }

-static struct bsg_device *bsg_get_device(struct inode *inode, struct
file *file)
+static struct bsg_device *bsg_get_device(struct inode *inode, struct
file *file,
+ struct bsg_class_device **pbcd)
 {
  struct bsg_device *bd;
  struct bsg_class_device *bcd;
@@ -814,6 +815,7 @@ static struct bsg_device *bsg_get_device(struct
inode *inode, struct file *file)

  if (!bcd)
  return ERR_PTR(-ENODEV);
+ *pbcd = bcd;

  bd = __bsg_get_device(iminor(inode), bcd->queue);
  if (bd)
@@ -829,22 +831,34 @@ static struct bsg_device *bsg_get_device(struct
inode *inode, struct file *file)
 static int bsg_open(struct inode *inode, struct file *file)
 {
  struct bsg_device *bd;
+ struct bsg_class_device *bcd;

- bd = bsg_get_device(inode, file);
+ bd = bsg_get_device(inode, file, );

  if (IS_ERR(bd))
  return PTR_ERR(bd);

  file->private_data = bd;
+ if (bcd->parent_module) {
+ if (!try_module_get(bcd->parent_module)) {
+ bsg_put_device(bd);
+ return -ENODEV;
+ }
+ }
  return 0;
 }

 static int bsg_release(struct inode *inode, struct file *file)
 {
+ int ret;
  struct bsg_device *bd = file->private_data;
+ struct module *parent_module = bd->queue->bsg_dev.parent_module;

  file->private_data = NULL;
- return bsg_put_device(bd);
+ ret = bsg_put_device(bd);
+ if (parent_module)
+ module_put(parent_module);
+ return ret;
 }

 static unsigned int bsg_poll(struct file *file, poll_table *wait)
@@ -977,6 +991,14 @@ EXPORT_SYMBOL_GPL(bsg_unregister_queue);
 int bsg_register_queue(struct request_queue *q, struct device *parent,
const char *name, void (*release)(struct device *))
 {
+ return bsg_register_queue_ex(q, parent, name, release, NULL);
+}
+EXPORT_SYMBOL_GPL(bsg_register_queue);
+
+int bsg_register_queue_ex(struct request_queue *q, struct device *parent,
+  const char *name, void (*release)(struct device *),
+  struct module *parent_module)
+{
  struct bsg_class_device *bcd;
  dev_t dev;
  int ret;
@@ -1012,6 +1034,7 @@ int bsg_register_queue(struct request_queue *q,
struct device *parent,
  bcd->queue = q;
  bcd->parent = get_device(parent);
  bcd->release = release;
+ bcd->parent_module = parent_module;
  kref_init(>ref);
  dev = MKDEV(bsg_major, bcd->minor);
  class_dev = device_create(bsg_class, parent, dev, NULL, "%s", devname);
@@ -1039,7 +1062,7 @@ unlock:
  mutex_unlock(_mutex);
  return ret;
 }
-EXPORT_SYMBOL_GPL(bsg_register_queue);
+EXPORT_SYMBOL_GPL(bsg_register_queue_ex);

 static struct cdev bsg_cdev;

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 24eaaf6..c153f80 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -4064,7 +4064,8 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct
fc_host_attrs *fc_host)
  blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
  blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);

- err = bsg_register_queue(q, dev, bsg_name, NULL);
+ err = bsg_register_queue_ex(q, dev, bsg_name, NULL,
+ shost->hostt->module);
  if (err) {
  printk(KERN_ERR "fc_host%d: bsg interface failed to "
  "initialize - register queue\n",
diff --git a/include/linux/bsg.h b/include/linux/bsg.h
index 7173f6e..fe41e83 100644
--- a/include/linux/bsg.h
+++ b/include/linux/bsg.h
@@ -12,11 +12,16 @@ struct bsg_class_device {
  struct request_queue *queue;
  struct kref ref;
  void (*release)(struct device *);
+ struct module *parent_module;
 };

 extern int bsg_register_queue(struct request_queue *q,
   struct device *parent, const char *name,
   void (*release)(struct device *));
+extern int 

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook  wrote:
> The above bfq_dispatch_request+0x99/0xad0 is still
> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
> removed. 0x99 is 153 decimal:
>
> (gdb) disass bfq_dispatch_request
> Dump of assembler code for function bfq_dispatch_request:
> ...
>0x8134b2ad <+141>:   test   %rax,%rax
>0x8134b2b0 <+144>:   je 0x8134b2bd
> 
>0x8134b2b2 <+146>:   addl   $0x1,0x100(%rax)
>0x8134b2b9 <+153>:   addl   $0x1,0x3c(%rbx)
>0x8134b2bd <+157>:   orl$0x2,0x18(%r12)
>0x8134b2c3 <+163>:   test   %ebp,%ebp
>0x8134b2c5 <+165>:   je 0x8134b2ce
> 
>0x8134b2c7 <+167>:   mov0x108(%r14),%rax
>0x8134b2ce <+174>:   mov%r15,%rdi
>0x8134b2d1 <+177>:   callq  0x81706f90 
> <_raw_spin_unlock_irq>
>
> Just as a sanity-check, at +157 %r12 should be rq, rq_flags is 0x18
> offset from, $0x2 is RQF_STARTED, so that maps to "rq->rq_flags |=
> RQF_STARTED", the next C statement. I don't know what +146 is, though?
> An increment of something 256 bytes offset? There's a lot of inline
> fun and reordering happening here, so I'm ignoring that for the
> moment.

No -- I'm reading this wrong. The RIP is the IP _after_ the trap, so
+146 is the offender.

[   29.284746] watchpoint @ 95d41a0fe580 triggered
[   29.285349] sense before:95d41f45f700 after:95d41f45f701 (@95d41a
0fe580)
[   29.286176] elevator before:95d419419c00 after:95d419419c00
[   29.286847] elevator_data before:95d419418c00 after:95d419418c00
...
[   29.295069] RIP: 0010:bfq_dispatch_request+0x99/0xbb0
[   29.295622] RSP: 0018:b26e01707a40 EFLAGS: 0002
[   29.296181] RAX: 95d41a0fe480 RBX: 95d419418c00 RCX: 95d419418c08

RAX is 95d41a0fe480 and sense is stored at 95d41a0fe580,
exactly 0x100 away.

WTF is this addl?

-Kees

-- 
Kees Cook
Pixel Security


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook  wrote:
> On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook  wrote:
>> The above bfq_dispatch_request+0x99/0xad0 is still
>> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
>> removed. 0x99 is 153 decimal:
>>
>> (gdb) disass bfq_dispatch_request
>> Dump of assembler code for function bfq_dispatch_request:
>> ...
>>0x8134b2ad <+141>:   test   %rax,%rax
>>0x8134b2b0 <+144>:   je 0x8134b2bd
>> 
>>0x8134b2b2 <+146>:   addl   $0x1,0x100(%rax)
>>0x8134b2b9 <+153>:   addl   $0x1,0x3c(%rbx)
>>0x8134b2bd <+157>:   orl$0x2,0x18(%r12)
>>0x8134b2c3 <+163>:   test   %ebp,%ebp
>>0x8134b2c5 <+165>:   je 0x8134b2ce
>> 
>>0x8134b2c7 <+167>:   mov0x108(%r14),%rax
>>0x8134b2ce <+174>:   mov%r15,%rdi
>>0x8134b2d1 <+177>:   callq  0x81706f90 
>> <_raw_spin_unlock_irq>
>>
>> Just as a sanity-check, at +157 %r12 should be rq, rq_flags is 0x18
>> offset from, $0x2 is RQF_STARTED, so that maps to "rq->rq_flags |=
>> RQF_STARTED", the next C statement. I don't know what +146 is, though?
>> An increment of something 256 bytes offset? There's a lot of inline
>> fun and reordering happening here, so I'm ignoring that for the
>> moment.
>
> No -- I'm reading this wrong. The RIP is the IP _after_ the trap, so
> +146 is the offender.
>
> [   29.284746] watchpoint @ 95d41a0fe580 triggered
> [   29.285349] sense before:95d41f45f700 after:95d41f45f701 
> (@95d41a
> 0fe580)
> [   29.286176] elevator before:95d419419c00 after:95d419419c00
> [   29.286847] elevator_data before:95d419418c00 after:95d419418c00
> ...
> [   29.295069] RIP: 0010:bfq_dispatch_request+0x99/0xbb0
> [   29.295622] RSP: 0018:b26e01707a40 EFLAGS: 0002
> [   29.296181] RAX: 95d41a0fe480 RBX: 95d419418c00 RCX: 
> 95d419418c08
>
> RAX is 95d41a0fe480 and sense is stored at 95d41a0fe580,
> exactly 0x100 away.
>
> WTF is this addl?

What are the chances? :P Two ++ statements in a row separate by a
collapsed goto. FML. :)

...
bfqq->dispatched++;
goto inc_in_driver_start_rq;
...
inc_in_driver_start_rq:
bfqd->rq_in_driver++;
...

And there's the 0x100 (256):

struct bfq_queue {
...
intdispatched;   /*   256 4 */

So bfqq is corrupted somewhere... I'll keep digging. I hope you're all
enjoying my live debugging transcript. ;)

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH 3/3] sd_zbc: Avoid that resetting a zone fails sporadically

2018-04-17 Thread Damien Le Moal
On 2018/04/16 18:04, Bart Van Assche wrote:
> Since SCSI scanning occurs asynchronously, since sd_revalidate_disk()
> is called from sd_probe_async() and since sd_revalidate_disk() calls
> sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
> concurrently with blkdev_report_zones() and/or blkdev_reset_zones().
> That can cause these functions to fail with -EIO because
> sd_zbc_read_zones() e.g. sets q->nr_zones to zero before restoring it
> to the actual value, even if no drive characteristics have changed.
> Avoid that this can happen by making the following changes:
> - Protect the code that updates zone information with blk_queue_enter()
>   and blk_queue_exit().
> - Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
>   these functions do not modify struct scsi_disk before all zone
>   information has been obtained.
> 
> Note: since commit 055f6e18e08f ("block: Make q_usage_counter also
> track legacy requests"; kernel v4.15) the request queue freezing
> mechanism also affects legacy request queues.
> 
> Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
> Signed-off-by: Bart Van Assche 
> Cc: Jens Axboe 
> Cc: Damien Le Moal 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: sta...@vger.kernel.org # v4.10
> ---
>  drivers/scsi/sd_zbc.c  | 140 
> +
>  include/linux/blkdev.h |   5 ++
>  2 files changed, 87 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index 2d0c06f7db3e..323e3dc4bc59 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -390,8 +390,10 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, 
> unsigned char *buf)
>   *
>   * Check that all zones of the device are equal. The last zone can however
>   * be smaller. The zone size must also be a power of two number of LBAs.
> + *
> + * Returns the zone size in bytes upon success or an error code upon failure.
>   */
> -static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
> +static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp)
>  {
>   u64 zone_blocks = 0;
>   sector_t block = 0;
> @@ -402,8 +404,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
>   int ret;
>   u8 same;
>  
> - sdkp->zone_blocks = 0;
> -
>   /* Get a buffer */
>   buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
>   if (!buf)
> @@ -435,16 +435,17 @@ static int sd_zbc_check_zone_size(struct scsi_disk 
> *sdkp)
>  
>   /* Parse zone descriptors */
>   while (rec < buf + buf_len) {
> - zone_blocks = get_unaligned_be64([8]);
> - if (sdkp->zone_blocks == 0) {
> - sdkp->zone_blocks = zone_blocks;
> - } else if (zone_blocks != sdkp->zone_blocks &&
> -(block + zone_blocks < sdkp->capacity
> - || zone_blocks > sdkp->zone_blocks)) {
> - zone_blocks = 0;
> + u64 this_zone_blocks = get_unaligned_be64([8]);
> +
> + if (zone_blocks == 0) {
> + zone_blocks = this_zone_blocks;
> + } else if (this_zone_blocks != zone_blocks &&
> +(block + this_zone_blocks < sdkp->capacity
> + || this_zone_blocks > zone_blocks)) {
> + this_zone_blocks = 0;
>   goto out;
>   }
> - block += zone_blocks;
> + block += this_zone_blocks;
>   rec += 64;
>   }
>  
> @@ -457,8 +458,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
>  
>   } while (block < sdkp->capacity);
>  
> - zone_blocks = sdkp->zone_blocks;
> -
>  out:
>   if (!zone_blocks) {
>   if (sdkp->first_scan)
> @@ -478,8 +477,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
> "Zone size too large\n");
>   ret = -ENODEV;
>   } else {
> - sdkp->zone_blocks = zone_blocks;
> - sdkp->zone_shift = ilog2(zone_blocks);
> + ret = zone_blocks;
>   }
>  
>  out_free:
> @@ -490,15 +488,14 @@ static int sd_zbc_check_zone_size(struct scsi_disk 
> *sdkp)
>  
>  /**
>   * sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone).
> - * @sdkp: The disk of the bitmap
> + * @nr_zones: Number of zones to allocate space for.
> + * @numa_node: NUMA node to allocate the memory from.
>   */
> -static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
> +static inline unsigned long *
> +sd_zbc_alloc_zone_bitmap(u32 nr_zones, int numa_node)
>  {
> - struct request_queue *q = sdkp->disk->queue;
> -
> - 

Re: [PATCH 1/1] scsi/ufs: qcom: Don't enable PHY_QCOM_UFS by default

2018-04-17 Thread Bjorn Andersson
On Mon 09 Apr 23:31 PDT 2018, Vivek Gautam wrote:

> 
> 
> On 4/10/2018 1:39 AM, Bjorn Andersson wrote:
> > On Mon 09 Apr 10:38 PDT 2018, Vivek Gautam wrote:
> > > On 4/9/2018 10:21 PM, Bjorn Andersson wrote:
> > > > On Mon 09 Apr 06:24 PDT 2018, Vivek Gautam wrote:
> > [..]
> > > > > diff --git a/include/linux/phy/phy-qcom-ufs.h 
> > > > > b/include/linux/phy/phy-qcom-ufs.h
> > > > > index 0a2c18a9771d..1388c2a2965e 100644
> > > > > --- a/include/linux/phy/phy-qcom-ufs.h
> > > > > +++ b/include/linux/phy/phy-qcom-ufs.h
> > > > > @@ -31,8 +31,21 @@ void ufs_qcom_phy_enable_dev_ref_clk(struct phy 
> > > > > *phy);
> > > > > */
> > > > >void ufs_qcom_phy_disable_dev_ref_clk(struct phy *phy);
> > > > > +#if IS_ENABLED(CONFIG_PHY_QCOM_UFS)
> > > > >int ufs_qcom_phy_set_tx_lane_enable(struct phy *phy, u32 tx_lanes);
> > > > >void ufs_qcom_phy_save_controller_version(struct phy *phy,
> > > > > - u8 major, u16 minor, u16 step);
> > > > > +   u8 major, u16 minor, u16 
> > > > > step);
> > > > > +#else
> > > > > +static inline int ufs_qcom_phy_set_tx_lane_enable(struct phy *phy, 
> > > > > u32 tx_lanes)
> > > > > +{
> > > > > + return -ENOSYS;
> > > > > +}
> > > > > +
> > > > > +static inline void ufs_qcom_phy_save_controller_version(struct phy 
> > > > > *phy,
> > > > > + u8 major, u16 
> > > > > minor,
> > > > > + u16 step)
> > > > > +{
> > > > > +}
> > > > > +#endif /* PHY_QCOM_UFS */
> > > > What's the timeline for getting rid of the references to these
> > > > functions? I presume that code depending on these being here will
> > > > compile but won't actually work?
> > > Yes, these inline definitions are just to keep ufs-qcom happy with the
> > > direct
> > > calls that it makes to these functions.
> > > As you would know these couple of functions are just used by the 20nm phy.
> > > However, we don't have any platform yet in the upstream that enables this
> > > phy.
> > > I am hoping that we will eventually get rid of these functions when we
> > > further
> > > clean up ufs-qcom driver.
> > > 
> > I see, but that means that we're calling this function with a struct phy
> > that might not be a struct ufs_qcom_phy and as such a defconfig with
> > both enabled will have undefined outcome for the migrated phys.
> 
> No, we will have to add support for separate phys as sdm845 has phy per each
> lane,
> and the older struct phy will exist alongside.
> We will call this function only with the older phy pointer.
> 
> > In particular we do expect that the same kernel will boot on db820c and
> > sdm845-mtp, so we will have to enable support for the 14nm & 20nm phy
> > driver (and we don't want random crashes because someone happened to
> > enable it).
> 
> Right, so we create new struct phy while keeping older one intact to keep
> the
> ufs-qcom work with both - ufs_qcom_phy and qmp_phy.
> Some of the controller drivers, such as usb/dwc3/ keep support for old and
> new phys,
> although there the difference is between generic phy and the usb-phy.
> So, I am assuming that if we want to keep ufs-qcom on platforms using 20nm,
> 14nm and 10nm phys happy, we will have to keep the phys separately for
> sometime.
> What do you say about it?
> 

My concern is only that the UFS HCI driver doesn't have a way to know if
it's the new or old "type" of phy, but if you can get that working then
I don't have any objections about doing so for a transitional period.

But, you may not use kernel config options to handle this, the same
Image should boot on msm8916, msm8996 and sdm845 (with appropriate dtb
for each one).

> On db820c, we can still work with the ufs_qcom_phy.
> 

I do not have an issue with that.

Regards,
Bjorn


Re: [PATCH 1/3] sd_zbc: Change the type of the ZBC fields into u32

2018-04-17 Thread Damien Le Moal
On 2018/04/16 18:04, Bart Van Assche wrote:
> This patch does not change any functionality but makes it clear
> that it is on purpose that these fields are 32 bits wide.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Damien Le Moal 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> ---
>  drivers/scsi/sd.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 0d663b5e45bb..392c7d078ae3 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -74,12 +74,12 @@ struct scsi_disk {
>   struct gendisk  *disk;
>   struct opal_dev *opal_dev;
>  #ifdef CONFIG_BLK_DEV_ZONED
> - unsigned intnr_zones;
> - unsigned intzone_blocks;
> - unsigned intzone_shift;
> - unsigned intzones_optimal_open;
> - unsigned intzones_optimal_nonseq;
> - unsigned intzones_max_open;
> + u32 nr_zones;
> + u32 zone_blocks;
> + u32 zone_shift;
> + u32 zones_optimal_open;
> + u32 zones_optimal_nonseq;
> + u32 zones_max_open;
>  #endif
>   atomic_topeners;
>   sector_tcapacity;   /* size in logical blocks */
> 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research

Re: [PATCH 2/3] sd_zbc: Let the SCSI core handle ILLEGAL REQUEST / ASC 0x21

2018-04-17 Thread Damien Le Moal
On 2018/04/16 18:04, Bart Van Assche wrote:
> scsi_io_completion() translates the sense key ILLEGAL REQUEST / ASC
> 0x21 into ACTION_FAIL. That means that setting cmd->allowed to zero
> in sd_zbc_complete() for this sense code / ASC combination is not
> necessary. Hence remove the code that resets cmd->allowed from
> sd_zbc_complete().
> 
> Signed-off-by: Bart Van Assche 
> Cc: Damien Le Moal 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> ---
>  drivers/scsi/sd_zbc.c | 10 --
>  1 file changed, 10 deletions(-)
> 
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index 41df75eea57b..2d0c06f7db3e 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -299,16 +299,6 @@ void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int 
> good_bytes,
>   case REQ_OP_WRITE:
>   case REQ_OP_WRITE_ZEROES:
>   case REQ_OP_WRITE_SAME:
> -
> - if (result &&
> - sshdr->sense_key == ILLEGAL_REQUEST &&
> - sshdr->asc == 0x21)
> - /*
> -  * INVALID ADDRESS FOR WRITE error: It is unlikely that
> -  * retrying write requests failed with any kind of
> -  * alignement error will result in success. So don't.
> -  */
> - cmd->allowed = 0;
>   break;
>  
>   case REQ_OP_ZONE_REPORT:
> 

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 3:48 PM, Jens Axboe wrote:
> On 4/17/18 3:47 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe  wrote:
>>> On 4/17/18 3:25 PM, Kees Cook wrote:
 On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  wrote:
> I see elv.priv[1] assignments made in a few places -- is it possible
> there is some kind of uninitialized-but-not-NULL state that can leak
> in there?

 Got it. This fixes it for me:

 diff --git a/block/blk-mq.c b/block/blk-mq.c
 index 0dc9e341c2a7..859df3160303 100644
 --- a/block/blk-mq.c
 +++ b/block/blk-mq.c
 @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
 request_queue *q,

 rq = blk_mq_rq_ctx_init(data, tag, op);
 if (!op_is_flush(op)) {
 -   rq->elv.icq = NULL;
 +   memset(>elv, 0, sizeof(rq->elv));
 if (e && e->type->ops.mq.prepare_request) {
 if (e->type->icq_cache && rq_ioc(bio))
 blk_mq_sched_assign_ioc(rq, bio);
 @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
 e->type->ops.mq.finish_request(rq);
 if (rq->elv.icq) {
 put_io_context(rq->elv.icq->ioc);
 -   rq->elv.icq = NULL;
 +   memset(>elv, 0, sizeof(rq->elv));
 }
 }
>>>
>>> This looks like a BFQ problem, this should not be necessary. Paolo,
>>> you're calling your own prepare request handler from the insert
>>> as well, and your prepare request does nothing if rq->elv.icq == NULL.
>>
>> I sent the patch anyway, since it's kind of a robustness improvement,
>> I'd hope. If you fix BFQ also, please add:
> 
> It's also a memset() in the hot path, would prefer to avoid that...
> The issue here is really the convoluted bfq usage of insert/prepare,
> I'm sure Paolo can take it from here.

Does this fix it?

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index f0ecd98509d8..d883469a1582 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4934,8 +4934,11 @@ static void bfq_prepare_request(struct request *rq, 
struct bio *bio)
bool new_queue = false;
bool bfqq_already_existing = false, split = false;
 
-   if (!rq->elv.icq)
+   if (!rq->elv.icq) {
+   rq->elv.priv[0] = rq->elv.priv[1] = NULL;
return;
+   }
+
bic = icq_to_bic(rq->elv.icq);
 
spin_lock_irq(>lock);

-- 
Jens Axboe



Re: [PATCH] blk-mq: Clear out elevator private data

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 2:45 PM, Jens Axboe  wrote:
> On 4/17/18 3:42 PM, Kees Cook wrote:
>> Some elevators may not correctly check rq->rq_flags & RQF_ELVPRIV, and
>> may attempt to read rq->elv fields. When requests got reused, this
>> caused BFQ to think it already had a bfqq (rq->elv.priv[1]) allocated.
>> This could lead to odd behaviors like having the sense buffer address
>> slowly start incrementing. This eventually tripped HARDENED_USERCOPY
>> and KASAN.
>>
>> This patch wipes all of rq->elv instead of just rq->elv.icq. While
>> it shouldn't technically be needed, this ends up being a robustness
>> improvement that should lead to at least finding bugs in elevators faster.
>
> Comments from the other email still apply, we should not need to do this
> full memset() for every request. From a quick look, BFQ needs to straighten
> out its usage of prepare request and interactions with insert_request.

Sure, understood. I would point out, FWIW, that memset() gets unrolled
by the compiler and this is just two more XORs in the same cacheline
(the two words following icq). (And there is SO much more being
cleared during alloc, it didn't seem like hardly any extra cost vs the
robustness it provided.)

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] blk-mq: Clear out elevator private data

2018-04-17 Thread Jens Axboe
On 4/17/18 4:57 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 2:45 PM, Jens Axboe  wrote:
>> On 4/17/18 3:42 PM, Kees Cook wrote:
>>> Some elevators may not correctly check rq->rq_flags & RQF_ELVPRIV, and
>>> may attempt to read rq->elv fields. When requests got reused, this
>>> caused BFQ to think it already had a bfqq (rq->elv.priv[1]) allocated.
>>> This could lead to odd behaviors like having the sense buffer address
>>> slowly start incrementing. This eventually tripped HARDENED_USERCOPY
>>> and KASAN.
>>>
>>> This patch wipes all of rq->elv instead of just rq->elv.icq. While
>>> it shouldn't technically be needed, this ends up being a robustness
>>> improvement that should lead to at least finding bugs in elevators faster.
>>
>> Comments from the other email still apply, we should not need to do this
>> full memset() for every request. From a quick look, BFQ needs to straighten
>> out its usage of prepare request and interactions with insert_request.
> 
> Sure, understood. I would point out, FWIW, that memset() gets unrolled
> by the compiler and this is just two more XORs in the same cacheline
> (the two words following icq). (And there is SO much more being
> cleared during alloc, it didn't seem like hardly any extra cost vs the
> robustness it provided.)

Yeah, it's not super pricey, but it's not needed. BFQ is the user of
the members, and the one that assigns them. You're saying leftover
assignments, since it doesn't always assign them. Hence I think that's
a better fix, just sent out a test patch a few minutes ago.

You did all the hard work, I'm just coasting on your findings.

-- 
Jens Axboe



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe  wrote:
> On 4/17/18 3:48 PM, Jens Axboe wrote:
>> On 4/17/18 3:47 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe  wrote:
 On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  wrote:
>> I see elv.priv[1] assignments made in a few places -- is it possible
>> there is some kind of uninitialized-but-not-NULL state that can leak
>> in there?
>
> Got it. This fixes it for me:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0dc9e341c2a7..859df3160303 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
> request_queue *q,
>
> rq = blk_mq_rq_ctx_init(data, tag, op);
> if (!op_is_flush(op)) {
> -   rq->elv.icq = NULL;
> +   memset(>elv, 0, sizeof(rq->elv));
> if (e && e->type->ops.mq.prepare_request) {
> if (e->type->icq_cache && rq_ioc(bio))
> blk_mq_sched_assign_ioc(rq, bio);
> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
> e->type->ops.mq.finish_request(rq);
> if (rq->elv.icq) {
> put_io_context(rq->elv.icq->ioc);
> -   rq->elv.icq = NULL;
> +   memset(>elv, 0, sizeof(rq->elv));
> }
> }

 This looks like a BFQ problem, this should not be necessary. Paolo,
 you're calling your own prepare request handler from the insert
 as well, and your prepare request does nothing if rq->elv.icq == NULL.
>>>
>>> I sent the patch anyway, since it's kind of a robustness improvement,
>>> I'd hope. If you fix BFQ also, please add:
>>
>> It's also a memset() in the hot path, would prefer to avoid that...
>> The issue here is really the convoluted bfq usage of insert/prepare,
>> I'm sure Paolo can take it from here.
>
> Does this fix it?
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index f0ecd98509d8..d883469a1582 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -4934,8 +4934,11 @@ static void bfq_prepare_request(struct request *rq, 
> struct bio *bio)
> bool new_queue = false;
> bool bfqq_already_existing = false, split = false;
>
> -   if (!rq->elv.icq)
> +   if (!rq->elv.icq) {
> +   rq->elv.priv[0] = rq->elv.priv[1] = NULL;
> return;
> +   }
> +
> bic = icq_to_bic(rq->elv.icq);
>
> spin_lock_irq(>lock);

It does! Excellent. :)

-Kees

-- 
Kees Cook
Pixel Security


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 5:06 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 3:57 PM, Jens Axboe  wrote:
>> On 4/17/18 3:48 PM, Jens Axboe wrote:
>>> On 4/17/18 3:47 PM, Kees Cook wrote:
 On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe  wrote:
> On 4/17/18 3:25 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  wrote:
>>> I see elv.priv[1] assignments made in a few places -- is it possible
>>> there is some kind of uninitialized-but-not-NULL state that can leak
>>> in there?
>>
>> Got it. This fixes it for me:
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 0dc9e341c2a7..859df3160303 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
>> request_queue *q,
>>
>> rq = blk_mq_rq_ctx_init(data, tag, op);
>> if (!op_is_flush(op)) {
>> -   rq->elv.icq = NULL;
>> +   memset(>elv, 0, sizeof(rq->elv));
>> if (e && e->type->ops.mq.prepare_request) {
>> if (e->type->icq_cache && rq_ioc(bio))
>> blk_mq_sched_assign_ioc(rq, bio);
>> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>> e->type->ops.mq.finish_request(rq);
>> if (rq->elv.icq) {
>> put_io_context(rq->elv.icq->ioc);
>> -   rq->elv.icq = NULL;
>> +   memset(>elv, 0, sizeof(rq->elv));
>> }
>> }
>
> This looks like a BFQ problem, this should not be necessary. Paolo,
> you're calling your own prepare request handler from the insert
> as well, and your prepare request does nothing if rq->elv.icq == NULL.

 I sent the patch anyway, since it's kind of a robustness improvement,
 I'd hope. If you fix BFQ also, please add:
>>>
>>> It's also a memset() in the hot path, would prefer to avoid that...
>>> The issue here is really the convoluted bfq usage of insert/prepare,
>>> I'm sure Paolo can take it from here.
>>
>> Does this fix it?
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index f0ecd98509d8..d883469a1582 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -4934,8 +4934,11 @@ static void bfq_prepare_request(struct request *rq, 
>> struct bio *bio)
>> bool new_queue = false;
>> bool bfqq_already_existing = false, split = false;
>>
>> -   if (!rq->elv.icq)
>> +   if (!rq->elv.icq) {
>> +   rq->elv.priv[0] = rq->elv.priv[1] = NULL;
>> return;
>> +   }
>> +
>> bic = icq_to_bic(rq->elv.icq);
>>
>> spin_lock_irq(>lock);
> 
> It does! Excellent. :)

Sweet! I'll add a comment and queue it up for 4.17 and mark for stable, with
your annotations too.

-- 
Jens Axboe



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 3:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  wrote:
>> I see elv.priv[1] assignments made in a few places -- is it possible
>> there is some kind of uninitialized-but-not-NULL state that can leak
>> in there?
> 
> Got it. This fixes it for me:
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0dc9e341c2a7..859df3160303 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
> request_queue *q,
> 
> rq = blk_mq_rq_ctx_init(data, tag, op);
> if (!op_is_flush(op)) {
> -   rq->elv.icq = NULL;
> +   memset(>elv, 0, sizeof(rq->elv));
> if (e && e->type->ops.mq.prepare_request) {
> if (e->type->icq_cache && rq_ioc(bio))
> blk_mq_sched_assign_ioc(rq, bio);
> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
> e->type->ops.mq.finish_request(rq);
> if (rq->elv.icq) {
> put_io_context(rq->elv.icq->ioc);
> -   rq->elv.icq = NULL;
> +   memset(>elv, 0, sizeof(rq->elv));
> }
> }

This looks like a BFQ problem, this should not be necessary. Paolo,
you're calling your own prepare request handler from the insert
as well, and your prepare request does nothing if rq->elv.icq == NULL.

-- 
Jens Axboe



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 3:47 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe  wrote:
>> On 4/17/18 3:25 PM, Kees Cook wrote:
>>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  wrote:
 I see elv.priv[1] assignments made in a few places -- is it possible
 there is some kind of uninitialized-but-not-NULL state that can leak
 in there?
>>>
>>> Got it. This fixes it for me:
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 0dc9e341c2a7..859df3160303 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
>>> request_queue *q,
>>>
>>> rq = blk_mq_rq_ctx_init(data, tag, op);
>>> if (!op_is_flush(op)) {
>>> -   rq->elv.icq = NULL;
>>> +   memset(>elv, 0, sizeof(rq->elv));
>>> if (e && e->type->ops.mq.prepare_request) {
>>> if (e->type->icq_cache && rq_ioc(bio))
>>> blk_mq_sched_assign_ioc(rq, bio);
>>> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>>> e->type->ops.mq.finish_request(rq);
>>> if (rq->elv.icq) {
>>> put_io_context(rq->elv.icq->ioc);
>>> -   rq->elv.icq = NULL;
>>> +   memset(>elv, 0, sizeof(rq->elv));
>>> }
>>> }
>>
>> This looks like a BFQ problem, this should not be necessary. Paolo,
>> you're calling your own prepare request handler from the insert
>> as well, and your prepare request does nothing if rq->elv.icq == NULL.
> 
> I sent the patch anyway, since it's kind of a robustness improvement,
> I'd hope. If you fix BFQ also, please add:

It's also a memset() in the hot path, would prefer to avoid that...
The issue here is really the convoluted bfq usage of insert/prepare,
I'm sure Paolo can take it from here.

-- 
Jens Axboe



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Jens Axboe
On 4/17/18 2:25 PM, Kees Cook wrote:
> On Tue, Apr 17, 2018 at 1:20 PM, Kees Cook  wrote:
>> On Tue, Apr 17, 2018 at 1:03 PM, Kees Cook  wrote:
>>> The above bfq_dispatch_request+0x99/0xad0 is still
>>> __bfq_dispatch_request at block/bfq-iosched.c:3902, just with KASAN
>>> removed. 0x99 is 153 decimal:
>>>
>>> (gdb) disass bfq_dispatch_request
>>> Dump of assembler code for function bfq_dispatch_request:
>>> ...
>>>0x8134b2ad <+141>:   test   %rax,%rax
>>>0x8134b2b0 <+144>:   je 0x8134b2bd
>>> 
>>>0x8134b2b2 <+146>:   addl   $0x1,0x100(%rax)
>>>0x8134b2b9 <+153>:   addl   $0x1,0x3c(%rbx)
>>>0x8134b2bd <+157>:   orl$0x2,0x18(%r12)
>>>0x8134b2c3 <+163>:   test   %ebp,%ebp
>>>0x8134b2c5 <+165>:   je 0x8134b2ce
>>> 
>>>0x8134b2c7 <+167>:   mov0x108(%r14),%rax
>>>0x8134b2ce <+174>:   mov%r15,%rdi
>>>0x8134b2d1 <+177>:   callq  0x81706f90 
>>> <_raw_spin_unlock_irq>
>>>
>>> Just as a sanity-check, at +157 %r12 should be rq, rq_flags is 0x18
>>> offset from, $0x2 is RQF_STARTED, so that maps to "rq->rq_flags |=
>>> RQF_STARTED", the next C statement. I don't know what +146 is, though?
>>> An increment of something 256 bytes offset? There's a lot of inline
>>> fun and reordering happening here, so I'm ignoring that for the
>>> moment.
>>
>> No -- I'm reading this wrong. The RIP is the IP _after_ the trap, so
>> +146 is the offender.
>>
>> [   29.284746] watchpoint @ 95d41a0fe580 triggered
>> [   29.285349] sense before:95d41f45f700 after:95d41f45f701 
>> (@95d41a
>> 0fe580)
>> [   29.286176] elevator before:95d419419c00 after:95d419419c00
>> [   29.286847] elevator_data before:95d419418c00 after:95d419418c00
>> ...
>> [   29.295069] RIP: 0010:bfq_dispatch_request+0x99/0xbb0
>> [   29.295622] RSP: 0018:b26e01707a40 EFLAGS: 0002
>> [   29.296181] RAX: 95d41a0fe480 RBX: 95d419418c00 RCX: 
>> 95d419418c08
>>
>> RAX is 95d41a0fe480 and sense is stored at 95d41a0fe580,
>> exactly 0x100 away.
>>
>> WTF is this addl?
> 
> What are the chances? :P Two ++ statements in a row separate by a
> collapsed goto. FML. :)
> 
> ...
> bfqq->dispatched++;
> goto inc_in_driver_start_rq;
> ...
> inc_in_driver_start_rq:
> bfqd->rq_in_driver++;
> ...
> 
> And there's the 0x100 (256):
> 
> struct bfq_queue {
> ...
> intdispatched;   /*   256 4 */
> 
> So bfqq is corrupted somewhere... I'll keep digging. I hope you're all
> enjoying my live debugging transcript. ;)

It has to be the latter bfqq->dispatched increment, as those are
transient (and bfqd is not).

Adding Paolo.

-- 
Jens Axboe



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:28 PM, Jens Axboe  wrote:
> It has to be the latter bfqq->dispatched increment, as those are
> transient (and bfqd is not).

Yeah, and I see a lot of comments around the lifetime of rq and bfqq,
so I assume something is not being locked correctly.

#define RQ_BFQQ(rq) ((rq)->elv.priv[1])

static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
{
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
struct request *rq = NULL;
struct bfq_queue *bfqq = NULL;

if (!list_empty(>dispatch)) {
rq = list_first_entry(>dispatch, struct request,
  queuelist);
list_del_init(>queuelist);

bfqq = RQ_BFQQ(rq);

if (bfqq) {
/*
 * Increment counters here, because this
 * dispatch does not follow the standard
 * dispatch flow (where counters are
 * incremented)
 */
bfqq->dispatched++;
...

I see elv.priv[1] assignments made in a few places -- is it possible
there is some kind of uninitialized-but-not-NULL state that can leak
in there?

bfq_prepare_request() assigns elv.priv[1], and bfq_insert_request()
only checks that it's non-NULL (if at all) in one case. Can
bfq_insert_request() get called without bfq_prepare_request() being
called first?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] blk-mq: Clear out elevator private data

2018-04-17 Thread Jens Axboe
On 4/17/18 3:42 PM, Kees Cook wrote:
> Some elevators may not correctly check rq->rq_flags & RQF_ELVPRIV, and
> may attempt to read rq->elv fields. When requests got reused, this
> caused BFQ to think it already had a bfqq (rq->elv.priv[1]) allocated.
> This could lead to odd behaviors like having the sense buffer address
> slowly start incrementing. This eventually tripped HARDENED_USERCOPY
> and KASAN.
> 
> This patch wipes all of rq->elv instead of just rq->elv.icq. While
> it shouldn't technically be needed, this ends up being a robustness
> improvement that should lead to at least finding bugs in elevators faster.

Comments from the other email still apply, we should not need to do this
full memset() for every request. From a quick look, BFQ needs to straighten
out its usage of prepare request and interactions with insert_request.

> Reported-by: Oleksandr Natalenko 
> Fixes: bd166ef183c26 ("blk-mq-sched: add framework for MQ capable IO 
> schedulers")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
> In theory, BFQ needs to also check the RQF_ELVPRIV flag, but I'll leave that
> to Paolo to figure out. Also, my Fixes line is kind of a best-guess. This
> is where icq was originally wiped, so it seemed as good a commit as any.

Yeah, that's probably a bit too broad for fixes :-)

-- 
Jens Axboe



Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko

Hi.

17.04.2018 23:47, Kees Cook wrote:

I sent the patch anyway, since it's kind of a robustness improvement,
I'd hope. If you fix BFQ also, please add:

Reported-by: Oleksandr Natalenko 
Root-caused-by: Kees Cook 

:) I gotta task-switch to other things!

Thanks for the pointers, and thank you Oleksandr for providing the 
reproducer!


That was a great fun to read. Thank you for digging into it.

Regards,
  Oleksandr


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  wrote:
> I see elv.priv[1] assignments made in a few places -- is it possible
> there is some kind of uninitialized-but-not-NULL state that can leak
> in there?

Got it. This fixes it for me:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dc9e341c2a7..859df3160303 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
request_queue *q,

rq = blk_mq_rq_ctx_init(data, tag, op);
if (!op_is_flush(op)) {
-   rq->elv.icq = NULL;
+   memset(>elv, 0, sizeof(rq->elv));
if (e && e->type->ops.mq.prepare_request) {
if (e->type->icq_cache && rq_ioc(bio))
blk_mq_sched_assign_ioc(rq, bio);
@@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
e->type->ops.mq.finish_request(rq);
if (rq->elv.icq) {
put_io_context(rq->elv.icq->ioc);
-   rq->elv.icq = NULL;
+   memset(>elv, 0, sizeof(rq->elv));
}
}



-- 
Kees Cook
Pixel Security


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Kees Cook
On Tue, Apr 17, 2018 at 2:39 PM, Jens Axboe  wrote:
> On 4/17/18 3:25 PM, Kees Cook wrote:
>> On Tue, Apr 17, 2018 at 1:46 PM, Kees Cook  wrote:
>>> I see elv.priv[1] assignments made in a few places -- is it possible
>>> there is some kind of uninitialized-but-not-NULL state that can leak
>>> in there?
>>
>> Got it. This fixes it for me:
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 0dc9e341c2a7..859df3160303 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct
>> request_queue *q,
>>
>> rq = blk_mq_rq_ctx_init(data, tag, op);
>> if (!op_is_flush(op)) {
>> -   rq->elv.icq = NULL;
>> +   memset(>elv, 0, sizeof(rq->elv));
>> if (e && e->type->ops.mq.prepare_request) {
>> if (e->type->icq_cache && rq_ioc(bio))
>> blk_mq_sched_assign_ioc(rq, bio);
>> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>> e->type->ops.mq.finish_request(rq);
>> if (rq->elv.icq) {
>> put_io_context(rq->elv.icq->ioc);
>> -   rq->elv.icq = NULL;
>> +   memset(>elv, 0, sizeof(rq->elv));
>> }
>> }
>
> This looks like a BFQ problem, this should not be necessary. Paolo,
> you're calling your own prepare request handler from the insert
> as well, and your prepare request does nothing if rq->elv.icq == NULL.

I sent the patch anyway, since it's kind of a robustness improvement,
I'd hope. If you fix BFQ also, please add:

Reported-by: Oleksandr Natalenko 
Root-caused-by: Kees Cook 

:) I gotta task-switch to other things!

Thanks for the pointers, and thank you Oleksandr for providing the reproducer!

-Kees

-- 
Kees Cook
Pixel Security


Re: MPT Fusion - ext4 delayed allocation failed errors on 4.14!

2018-04-17 Thread Martin K. Petersen

James,

>> However, it doesn't look like we have much to work with since there
>> is no ATA Information VPD page exposed.
>
> it's the fat firmware problem again.  However, the driver does know, so
> could we persuade a modification of
> mpt3sas_scsih.c:_scsih_display_sata_capabilities() to populate a VPD
> page?

This is gen 1 MPT. I'm a bit irked about disabling WRITE SAME for
attached SAS devices. Given how old this controller is, however, it
hardly seems worth the hassle to fix this properly.

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH v3 1/6] ilog2: create truly constant version for sparse

2018-04-17 Thread Martin Wilck
Sparse emits errors about ilog2() in array indices because of the use of
__ilog2_32() and __ilog2_64(), rightly so
(https://www.spinics.net/lists/linux-sparse/msg03471.html).

Create a const_ilog2() variant that works with sparse for this
scenario.

(Note: checkpatch.pl complains about missing parentheses, but that appears
to be a false positive. I can get rid of the warning simply by inserting
whitespace, making checkpatch "see" the whole macro).

Signed-off-by: Martin Wilck 
---
 include/linux/log2.h | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/include/linux/log2.h b/include/linux/log2.h
index 41a1ae0..2af7f778 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -72,16 +72,13 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 }
 
 /**
- * ilog2 - log base 2 of 32-bit or a 64-bit unsigned value
+ * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
  * @n: parameter
  *
- * constant-capable log of base 2 calculation
- * - this can be used to initialise global variables from constant data, hence
- * the massive ternary operator construction
- *
- * selects the appropriately-sized optimised version depending on sizeof(n)
+ * Use this where sparse expects a true constant expression, e.g. for array
+ * indices.
  */
-#define ilog2(n)   \
+#define const_ilog2(n) \
 (  \
__builtin_constant_p(n) ? ( \
(n) < 2 ? 0 :   \
@@ -147,10 +144,26 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
(n) & (1ULL <<  4) ?  4 :   \
(n) & (1ULL <<  3) ?  3 :   \
(n) & (1ULL <<  2) ?  2 :   \
-   1 ) :   \
-   (sizeof(n) <= 4) ?  \
-   __ilog2_u32(n) :\
-   __ilog2_u64(n)  \
+   1) :\
+   -1)
+
+/**
+ * ilog2 - log base 2 of 32-bit or a 64-bit unsigned value
+ * @n: parameter
+ *
+ * constant-capable log of base 2 calculation
+ * - this can be used to initialise global variables from constant data, hence
+ * the massive ternary operator construction
+ *
+ * selects the appropriately-sized optimised version depending on sizeof(n)
+ */
+#define ilog2(n) \
+( \
+   __builtin_constant_p(n) ?   \
+   const_ilog2(n) :\
+   (sizeof(n) <= 4) ?  \
+   __ilog2_u32(n) :\
+   __ilog2_u64(n)  \
  )
 
 /**
-- 
2.16.1



[PATCH v3 0/6] scsi: handle special return codes for ABORTED COMMAND

2018-04-17 Thread Martin Wilck
Here is another attempt to handle the special return codes for ABORTED COMMAND
for certain SCSI devices. Following MKP's recommendation, I've created two
new BLIST flags, simplifying the code in scsi_error.c compared to the previous
versions. Rather than using "free" bits, I increased the length of
blist_flag_t to 64 bit, and used previously unused bits. I also added checking
for obsolete and unused bits.

For the blist_flag_t size increase, I used sparse to try and avoid regressions;
that necessitated fixing sparse's errors for the current code first.

Martin Wilck (6):
  ilog2: create truly constant version for sparse
  scsi: use const_ilog2 for array indices
  scsi: devinfo: change blist_flag_t to 64bit
  scsi: devinfo: warn on undefined blist flags
  scsi: devinfo: add BLIST_RETRY_ITF for EMC Symmetrix
  scsi: devinfo: BLIST_RETRY_ASC_C1 for Fujitsu ETERNUS

 drivers/scsi/Makefile   |  2 +-
 drivers/scsi/scsi_debugfs.c |  2 +-
 drivers/scsi/scsi_devinfo.c | 28 +
 drivers/scsi/scsi_error.c   |  7 +
 drivers/scsi/scsi_sysfs.c   |  2 +-
 include/linux/log2.h| 35 ++---
 include/scsi/scsi_device.h  |  2 +-
 include/scsi/scsi_devinfo.h | 75 ++---
 8 files changed, 107 insertions(+), 46 deletions(-)

-- 
2.16.1



Re: [PATCH v3 1/6] ilog2: create truly constant version for sparse

2018-04-17 Thread Linus Torvalds
On Tue, Apr 17, 2018 at 4:35 PM, Martin Wilck  wrote:
> Sparse emits errors about ilog2() in array indices because of the use of
> __ilog2_32() and __ilog2_64(),

If sparse warns about it, then presumably gcc with -Wvla warns about it too?

And if thats the case, then __builtin_constant_p() and a ternary
operation isn't good enough, because gcc -Wvla is even more anal than
sparse is.

Which is why we have that __is_constexpr() magic for min/max().

So I suspect that what you'd want is

  #define ilog2(n) \
__builtin_choose_expr(__is_constexpr(n), \
const_ilog2(n), \
__builtin_choose_expr(sizeof(n) <= 4, \
__ilog2_u32(n), \
__ilog2_u64(n)))

or something. Hmm?

   Linus


Re: [PATCHv3] tcmu: allow userspace to reset netlink

2018-04-17 Thread Xiubo Li

On 2018/4/18 0:40, Mike Christie wrote:

Nick,

Ignore this v3 too. It will not work for deletion.
Yeah, for deletion the vfs will lock the file, so there is no any new 
access could in, will redesign and make it module wide.



On 04/16/2018 07:43 AM, xiu...@redhat.com wrote:

From: Xiubo Li 

This patch adds 1 tcmu attr to reset and complete all the blocked
netlink waiting threads. It's used when the userspace daemon like
tcmu-runner has crashed or forced to shutdown just before the
netlink requests be replied to the kernel, then the netlink requeting
threads will get stuck forever. We must reboot the machine to recover
from it and by this the rebootng is not a must then.

The Call Trace will be something like:
==
INFO: task targetctl:22655 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
targetctl   D 880169718fd0 0 22655  17249 0x0080
Call Trace:
  [] schedule+0x29/0x70
  [] schedule_timeout+0x239/0x2c0
  [] ? skb_release_data+0xf2/0x140
  [] wait_for_completion+0xfd/0x140
  [] ? wake_up_state+0x20/0x20
  [] tcmu_netlink_event+0x26a/0x3a0 [target_core_user]
  [] ? wake_up_atomic_t+0x30/0x30
  [] tcmu_configure_device+0x236/0x350 [target_core_user]
  [] target_configure_device+0x3f/0x3b0 [target_core_mod]
  [] target_core_store_dev_enable+0x2c/0x60 [target_core_mod]
  [] target_core_dev_store+0x24/0x40 [target_core_mod]
  [] configfs_write_file+0xc4/0x130
  [] vfs_write+0xbd/0x1e0
  [] SyS_write+0x7f/0xe0
  [] system_call_fastpath+0x16/0x1b
==

Signed-off-by: Xiubo Li 
---
Changes since v1(suggested by Mike Christie):
v2: - Makes the reset per device.
v3: - Remove nl_cmd->complete, use status instead
 - Fix lock issue
 - Check if a nl command is even waiting before trying to wake up

  drivers/target/target_core_user.c | 60 +--
  1 file changed, 51 insertions(+), 9 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 4ad89ea..a831f5b7 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -104,8 +104,8 @@ struct tcmu_hba {
  #define TCMU_CONFIG_LEN 256
  
  struct tcmu_nl_cmd {

-   /* wake up thread waiting for reply */
-   struct completion complete;
+   unsigned int waiter;
+
int cmd;
int status;
  };
@@ -159,9 +159,12 @@ struct tcmu_dev {
  
  	spinlock_t nl_cmd_lock;

struct tcmu_nl_cmd curr_nl_cmd;
-   /* wake up threads waiting on curr_nl_cmd */
+   /* wake up threads waiting on nl_cmd_wq */
wait_queue_head_t nl_cmd_wq;
  
+	/* complete thread waiting complete_wq */

+   wait_queue_head_t complete_wq;
+
char dev_config[TCMU_CONFIG_LEN];
  
  	int nl_reply_supported;

@@ -307,11 +310,13 @@ static int tcmu_genl_cmd_done(struct genl_info *info, int 
completed_cmd)
nl_cmd->status = rc;
}
  
-	spin_unlock(>nl_cmd_lock);

if (!is_removed)
 target_undepend_item(>dev_group.cg_item);
-   if (!ret)
-   complete(_cmd->complete);
+   if (!ret && nl_cmd->waiter) {
+   nl_cmd->waiter--;
+   wake_up(>complete_wq);
+   }
+   spin_unlock(>nl_cmd_lock);
return ret;
  }
  
@@ -1258,6 +1263,7 @@ static struct se_device *tcmu_alloc_device(struct se_hba *hba, const char *name)

timer_setup(>cmd_timer, tcmu_cmd_timedout, 0);
  
  	init_waitqueue_head(>nl_cmd_wq);

+   init_waitqueue_head(>complete_wq);
spin_lock_init(>nl_cmd_lock);
  
  	INIT_RADIX_TREE(>data_blocks, GFP_KERNEL);

@@ -1554,8 +1560,8 @@ static void tcmu_init_genl_cmd_reply(struct tcmu_dev 
*udev, int cmd)
}
  
  	memset(nl_cmd, 0, sizeof(*nl_cmd));

+   nl_cmd->status = 1;
nl_cmd->cmd = cmd;
-   init_completion(_cmd->complete);
  
  	spin_unlock(>nl_cmd_lock);

  }
@@ -1572,13 +1578,16 @@ static int tcmu_wait_genl_cmd_reply(struct tcmu_dev 
*udev)
if (udev->nl_reply_supported <= 0)
return 0;
  
+	spin_lock(>nl_cmd_lock);

+   nl_cmd->waiter++;
+   spin_unlock(>nl_cmd_lock);
+
pr_debug("sleeping for nl reply\n");
-   wait_for_completion(_cmd->complete);
+   wait_event(udev->complete_wq, nl_cmd->status != 1);
  
  	spin_lock(>nl_cmd_lock);

nl_cmd->cmd = TCMU_CMD_UNSPEC;
ret = nl_cmd->status;
-   nl_cmd->status = 0;
spin_unlock(>nl_cmd_lock);
  
  	wake_up_all(>nl_cmd_wq);

@@ -2323,6 +2332,38 @@ static ssize_t tcmu_block_dev_store(struct config_item 
*item, const char *page,
  }
  CONFIGFS_ATTR(tcmu_, block_dev);
  
+static ssize_t tcmu_reset_netlink_store(struct config_item *item, const char *page,

+   size_t count)
+{
+   struct se_device *se_dev = container_of(to_config_group(item),
+   struct se_device,
+

[PATCH v3 3/6] scsi: devinfo: change blist_flag_t to 64bit

2018-04-17 Thread Martin Wilck
Space for SCSI blist flags is gradually running out. Change the
type to __u64. And fix a checkpatch complaint about symbolic
mode flags in scsi_devinfo.c.

Make checkpatch happy by replacing simple_strtoul() with kstrtoull().

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_devinfo.c | 18 +++-
 drivers/scsi/scsi_sysfs.c   |  2 +-
 include/scsi/scsi_device.h  |  2 +-
 include/scsi/scsi_devinfo.h | 50 ++---
 4 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index e5370d7..fc6b755 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -361,8 +361,16 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
scsi_strcpy_devinfo("model", devinfo->model, sizeof(devinfo->model),
model, compatible);
 
-   if (strflags)
-   flags = (__force blist_flags_t)simple_strtoul(strflags, NULL, 
0);
+   if (strflags) {
+   unsigned long long val;
+   int ret = kstrtoull(strflags, 0, );
+
+   if (ret != 0) {
+   kfree(devinfo);
+   return ret;
+   }
+   flags = (__force blist_flags_t)val;
+   }
devinfo->flags = flags;
devinfo->compatible = compatible;
 
@@ -615,7 +623,7 @@ static int devinfo_seq_show(struct seq_file *m, void *v)
devinfo_table->name)
seq_printf(m, "[%s]:\n", devinfo_table->name);
 
-   seq_printf(m, "'%.8s' '%.16s' 0x%x\n",
+   seq_printf(m, "'%.8s' '%.16s' 0x%llx\n",
   devinfo->vendor, devinfo->model, devinfo->flags);
return 0;
 }
@@ -734,9 +742,9 @@ MODULE_PARM_DESC(dev_flags,
 " list entries for vendor and model with an integer value of flags"
 " to the scsi device info list");
 
-module_param_named(default_dev_flags, scsi_default_dev_flags, int, 
S_IRUGO|S_IWUSR);
+module_param_named(default_dev_flags, scsi_default_dev_flags, ullong, 0644);
 MODULE_PARM_DESC(default_dev_flags,
-"scsi default device flag integer value");
+"scsi default device flag uint64_t value");
 
 /**
  * scsi_exit_devinfo - remove /proc/scsi/device_info & the scsi_dev_info_list
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index feacd7a..43c27ce 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -968,7 +968,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute 
*attr,
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
 #define BLIST_FLAG_NAME(name)  \
-   [const_ilog2((__force unsigned int)BLIST_##name)] = #name
+   [const_ilog2((__force __u64)BLIST_##name)] = #name
 static const char *const sdev_bflags_name[] = {
 #include "scsi_devinfo_tbl.c"
 };
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7ae177c..4c36af6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -15,7 +15,7 @@ struct scsi_cmnd;
 struct scsi_lun;
 struct scsi_sense_hdr;
 
-typedef unsigned int __bitwise blist_flags_t;
+typedef __u64 __bitwise blist_flags_t;
 
 struct scsi_mode_data {
__u32   length;
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index ea67c32..e206d29 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -6,55 +6,55 @@
  */
 
 /* Only scan LUN 0 */
-#define BLIST_NOLUN((__force blist_flags_t)(1 << 0))
+#define BLIST_NOLUN((__force blist_flags_t)(1ULL << 0))
 /* Known to have LUNs, force scanning.
  * DEPRECATED: Use max_luns=N */
-#define BLIST_FORCELUN ((__force blist_flags_t)(1 << 1))
+#define BLIST_FORCELUN ((__force blist_flags_t)(1ULL << 1))
 /* Flag for broken handshaking */
-#define BLIST_BORKEN   ((__force blist_flags_t)(1 << 2))
+#define BLIST_BORKEN   ((__force blist_flags_t)(1ULL << 2))
 /* unlock by special command */
-#define BLIST_KEY  ((__force blist_flags_t)(1 << 3))
+#define BLIST_KEY  ((__force blist_flags_t)(1ULL << 3))
 /* Do not use LUNs in parallel */
-#define BLIST_SINGLELUN((__force blist_flags_t)(1 << 4))
+#define BLIST_SINGLELUN((__force blist_flags_t)(1ULL << 4))
 /* Buggy Tagged Command Queuing */
-#define BLIST_NOTQ ((__force blist_flags_t)(1 << 5))
+#define BLIST_NOTQ ((__force blist_flags_t)(1ULL << 5))
 /* Non consecutive LUN numbering */
-#define BLIST_SPARSELUN((__force blist_flags_t)(1 << 6))
+#define BLIST_SPARSELUN((__force blist_flags_t)(1ULL << 6))
 /* Avoid LUNS >= 5 */
-#define BLIST_MAX5LUN  ((__force blist_flags_t)(1 << 7))
+#define BLIST_MAX5LUN  ((__force blist_flags_t)(1ULL << 7))
 /* Treat as (removable) CD-ROM */
-#define BLIST_ISROM

[PATCH v3 4/6] scsi: devinfo: warn on undefined blist flags

2018-04-17 Thread Martin Wilck
Warn if a device (or the user) sets blist flags which are unknown
or have been removed. This should enable us to reuse freed blist
bits in later releases.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/Makefile   |  2 +-
 drivers/scsi/scsi_devinfo.c |  6 ++
 include/scsi/scsi_devinfo.h | 21 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index d5135ef..fd901a5 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -190,7 +190,7 @@ $(obj)/53c700.o $(MODVERDIR)/$(obj)/53c700.ver: 
$(obj)/53c700_d.h
 $(obj)/scsi_sysfs.o: $(obj)/scsi_devinfo_tbl.c
 
 quiet_cmd_bflags = GEN $@
-   cmd_bflags = sed -n 's/.*BLIST_\([A-Z0-9_]*\) 
*.*/BLIST_FLAG_NAME(\1),/p' $< > $@
+   cmd_bflags = sed -n 's/.*define *BLIST_\([A-Z0-9_]*\) 
*.*/BLIST_FLAG_NAME(\1),/p' $< > $@
 
 $(obj)/scsi_devinfo_tbl.c: include/scsi/scsi_devinfo.h
$(call if_changed,bflags)
diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index fc6b755..c05843a 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -371,6 +371,12 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
}
flags = (__force blist_flags_t)val;
}
+   if (flags & __BLIST_UNUSED_MASK) {
+   pr_err("scsi_devinfo (%s:%s): unsupported flags 0x%llx",
+  vendor, model, flags & __BLIST_UNUSED_MASK);
+   kfree(devinfo);
+   return -EINVAL;
+   }
devinfo->flags = flags;
devinfo->compatible = compatible;
 
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index e206d29..3434e14 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -28,8 +28,13 @@
 #define BLIST_LARGELUN ((__force blist_flags_t)(1ULL << 9))
 /* override additional length field */
 #define BLIST_INQUIRY_36   ((__force blist_flags_t)(1ULL << 10))
+#define __BLIST_UNUSED_11  ((__force blist_flags_t)(1ULL << 11))
 /* do not do automatic start on add */
 #define BLIST_NOSTARTONADD ((__force blist_flags_t)(1ULL << 12))
+#define __BLIST_UNUSED_13  ((__force blist_flags_t)(1ULL << 13))
+#define __BLIST_UNUSED_14  ((__force blist_flags_t)(1ULL << 14))
+#define __BLIST_UNUSED_15  ((__force blist_flags_t)(1ULL << 15))
+#define __BLIST_UNUSED_16  ((__force blist_flags_t)(1ULL << 16))
 /* try REPORT_LUNS even for SCSI-2 devs (if HBA supports more than 8 LUNs) */
 #define BLIST_REPORTLUN2   ((__force blist_flags_t)(1ULL << 17))
 /* don't try REPORT_LUNS scan (SCSI-3 devs) */
@@ -44,10 +49,12 @@
 #define BLIST_RETRY_HWERROR((__force blist_flags_t)(1ULL << 22))
 /* maximum 512 sector cdb length */
 #define BLIST_MAX_512  ((__force blist_flags_t)(1ULL << 23))
+#define __BLIST_UNUSED_24  ((__force blist_flags_t)(1ULL << 24))
 /* Disable T10 PI (DIF) */
 #define BLIST_NO_DIF   ((__force blist_flags_t)(1ULL << 25))
 /* Ignore SBC-3 VPD pages */
 #define BLIST_SKIP_VPD_PAGES   ((__force blist_flags_t)(1ULL << 26))
+#define __BLIST_UNUSED_27  ((__force blist_flags_t)(1ULL << 27))
 /* Attempt to read VPD pages */
 #define BLIST_TRY_VPD_PAGES((__force blist_flags_t)(1ULL << 28))
 /* don't try to issue RSOC */
@@ -57,4 +64,18 @@
 /* Use UNMAP limit for WRITE SAME */
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
 
+#define __BLIST_LAST_USED BLIST_UNMAP_LIMIT_WS
+
+#define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
+  (__force blist_flags_t) \
+  ((__force __u64)__BLIST_LAST_USED - 1ULL)))
+#define __BLIST_UNUSED_MASK (__BLIST_UNUSED_11 | \
+__BLIST_UNUSED_13 | \
+__BLIST_UNUSED_14 | \
+__BLIST_UNUSED_15 | \
+__BLIST_UNUSED_16 | \
+__BLIST_UNUSED_24 | \
+__BLIST_UNUSED_27 | \
+__BLIST_HIGH_UNUSED)
+
 #endif
-- 
2.16.1



[PATCH v3 6/6] scsi: devinfo: BLIST_RETRY_ASC_C1 for Fujitsu ETERNUS

2018-04-17 Thread Martin Wilck
On Fujitsu ETERNUS systems, sense code ABORTED COMMAND with ASC/Q C1/01 is
used to indicate temporary condition where the storage-internal path to a
target is switched from one controller to another. SCSI commands that
return with this error code must be retried unconditionally (i.e. without
the "maybe_retry" logic in scsi_decide_disposition); otherwise dm-multipath
might initiate a failover from a healthy path e.g. for REQ_FAILFAST_DEV
commands.

Introduce a new blist flag for this case.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_devinfo.c | 1 +
 drivers/scsi/scsi_error.c   | 3 +++
 include/scsi/scsi_devinfo.h | 4 +++-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index f7b94c1..e75a50f 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -168,6 +168,7 @@ static struct {
{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "F8", NULL, BLIST_NOREPORTLUN},
{"FSC", "CentricStor", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
+   {"FUJITSU", "ETERNUS_DXM", "*", BLIST_RETRY_ASC_C1},
{"Generic", "USB SD Reader", "1.00", BLIST_FORCELUN | BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0180", BLIST_FORCELUN | 
BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0207", BLIST_FORCELUN | 
BLIST_INQUIRY_36},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1dee91f..896b991 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -527,6 +527,9 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 
if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
return ADD_TO_MLQUEUE;
+   if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
+   sdev->sdev_bflags & BLIST_RETRY_ASC_C1)
+   return ADD_TO_MLQUEUE;
 
return NEEDS_RETRY;
case NOT_READY:
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 91a327e..3fdb322 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -65,8 +65,10 @@
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
 /* Always retry ABORTED_COMMAND with Internal Target Failure */
 #define BLIST_RETRY_ITF((__force blist_flags_t)(1ULL << 32))
+/* Always retry ABORTED_COMMAND with ASC 0xc1 */
+#define BLIST_RETRY_ASC_C1 ((__force blist_flags_t)(1ULL << 33))
 
-#define __BLIST_LAST_USED BLIST_RETRY_ITF
+#define __BLIST_LAST_USED BLIST_RETRY_ASC_C1
 
 #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
   (__force blist_flags_t) \
-- 
2.16.1



[PATCH v3 2/6] scsi: use const_ilog2 for array indices

2018-04-17 Thread Martin Wilck
Use the just introduced const_ilog2() macro to avoid sparse
errors.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_debugfs.c | 2 +-
 drivers/scsi/scsi_sysfs.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index b784002..c5a8756 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -4,7 +4,7 @@
 #include 
 #include "scsi_debugfs.h"
 
-#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
+#define SCSI_CMD_FLAG_NAME(name)[const_ilog2(SCMD_##name)] = #name
 static const char *const scsi_cmd_flags[] = {
SCSI_CMD_FLAG_NAME(TAGGED),
SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e56a4ac..feacd7a 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -968,7 +968,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute 
*attr,
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
 #define BLIST_FLAG_NAME(name)  \
-   [ilog2((__force unsigned int)BLIST_##name)] = #name
+   [const_ilog2((__force unsigned int)BLIST_##name)] = #name
 static const char *const sdev_bflags_name[] = {
 #include "scsi_devinfo_tbl.c"
 };
-- 
2.16.1



[PATCH v3 5/6] scsi: devinfo: add BLIST_RETRY_ITF for EMC Symmetrix

2018-04-17 Thread Martin Wilck
EMC Symmetrix returns 'internal target error' for a variety
of conditions, most of which will be transient.
So we should always retry it, even with failfast set.
Otherwise we'd get spurious path flaps with multipath.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_devinfo.c | 3 ++-
 drivers/scsi/scsi_error.c   | 4 
 include/scsi/scsi_devinfo.h | 4 +++-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index c05843a..f7b94c1 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -161,7 +161,8 @@ static struct {
{"DGC", "RAID", NULL, BLIST_SPARSELUN}, /* EMC CLARiiON, storage on LUN 
0 */
{"DGC", "DISK", NULL, BLIST_SPARSELUN}, /* EMC CLARiiON, no storage on 
LUN 0 */
{"EMC",  "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | 
BLIST_REPORTLUN2},
+   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN |
+BLIST_REPORTLUN2 | BLIST_RETRY_ITF},
{"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN},
{"easyRAID", "16P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ac3b1c3..1dee91f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "scsi_priv.h"
@@ -524,6 +525,9 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
if (sshdr.asc == 0x10) /* DIF */
return SUCCESS;
 
+   if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
+   return ADD_TO_MLQUEUE;
+
return NEEDS_RETRY;
case NOT_READY:
case UNIT_ATTENTION:
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 3434e14..91a327e 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -63,8 +63,10 @@
 #define BLIST_MAX_1024 ((__force blist_flags_t)(1ULL << 30))
 /* Use UNMAP limit for WRITE SAME */
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
+/* Always retry ABORTED_COMMAND with Internal Target Failure */
+#define BLIST_RETRY_ITF((__force blist_flags_t)(1ULL << 32))
 
-#define __BLIST_LAST_USED BLIST_UNMAP_LIMIT_WS
+#define __BLIST_LAST_USED BLIST_RETRY_ITF
 
 #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
   (__force blist_flags_t) \
-- 
2.16.1