Re: [PATCH] scsi: aacraid: fix io drop during the reset

2017-12-14 Thread Martin K. Petersen

Prasad,

> "FIB_CONTEXT_FLAG_TIMEDOUT" flag is set in aac_eh_abort to
> indicate command timeout, using the same flag in reset handler
> causes the command to timeout and the IO's were droped.

Applied to 4.15/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3] Use blist_flags_t consistently

2017-12-14 Thread Martin K. Petersen

Bart,

> Use the type blist_flags_t for all variables that represent blacklist
> flags. Additionally, suppress recently introduced sparse warnings
> related to blacklist flags.

Applied to 4.15/scsi-fixes, thanks!

PS. Had to tweak the commit ID.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: bloated workload of SCSI commands at attachment

2017-12-14 Thread Martin K. Petersen

Doug,

> What seems to be to be new is the three repeated sequence involving
> the "a3 0c" command. That is REPORT SUPPORTED OPERATION CODES.
> Why is the same sequence repeated 3 times?

One invocation is probably partition scanning.

sd_probe calls revalidate twice. It used to be a somewhat tricky dance
of doing it once to get basic device properties, then allocating the
gendisk, and finally doing another revalidate pass to fill out the
remaining fields that had a dependency on a gendisk being instantiated.

Given some of the recent block changes, I am not entirely sure two calls
are required. Would be interesting to test...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 5/4] scsi: arcmsr: simplify arcmsr_request_device_map routine

2017-12-14 Thread Martin K. Petersen

Ching,

> simplify arcmsr_request_device_map routine

Applied to 4.16/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2 5/5] scsi_debug: add resp_write_scat function

2017-12-14 Thread Douglas Gilbert

On 2017-12-14 05:57 PM, Bart Van Assche wrote:

On Wed, 2017-12-13 at 20:40 -0500, Douglas Gilbert wrote:

+   {0, 0x7f, 0x11, F_SA_HIGH | F_D_OUT | FF_MEDIA_IO, resp_write_scat,
+   NULL, {32,  0xc7, 0, 0, 0, 0, 0x3f, 0x18, 0x0, 0x11, 0xf8,
+  0, 0xff, 0xff, 0x0, 0x0} },  /* WRITE SCATTERED(32) */


I know this is consistent with other command declarations in the scsi_debug
driver, but it seems weird to me to report to the initiator that only some
bits of the ADDITIONAL CDB LENGTH fields are supported (0x18)? From SPC-5:

The CDB USAGE DATA field contains information about the CDB for the command
being queried. The first byte of the CDB USAGE DATA field shall contain the
operation code for the command being queried. If the command being queried
contains a service action, then that service action code shall be placed in
the CDB USAGE DATA field in the same location as the SERVICE ACTION field of
the command CDB. All other bytes of the CDB USAGE DATA field shall contain a
usage map for bits in the CDB for the command being queried.


Given the opcode and service action for WRITE SCATTERED(32), if the
ADDITIONAL CDB LENGTH is other than 0x18, then the command is in error.
The mask with the best coverage on that is 0x18. It will catch any
bits set that are clear in the corresponding mask position but not catch
bits that are clear where the mask is set. So 0x19 will get caught as
invalid but 0x10 won't.

Attached is some output from a Seagate SAS SSD using the sg_opcodes
utility. Note that the 5 variable length cdb_s are all 32 bytes long
and have a mask of 0x18 for the ADDITIONAL CDB LENGTH.


+   lrdp = kzalloc(lbdof_blen, GFP_ATOMIC);
+   if (lrdp == NULL) {
+   mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC,
+   INSUFF_RES_ASCQ);
+   return illegal_condition_result;
+   }


It seems weird to me to report a fatal error to the initiator if a memory
allocation fails? I think other SCSI LLDs return SCSI_MLQUEUE_HOST_BUSY if a
memory allocation fails.


The scsi_debug driver plays the role of SCSI initiator and target
connected via a virtual transport. If the underlying OS reports a
resource error than I believe it is the implementer's choice whether
to treat it as a initiator error or a target (logical unit) error.
I chose the latter because the failure is in code that is roughly
equivalent to the device server (inside a LU inside a target).

The scsi_debug driver reports SCSI_MLQUEUE_HOST_BUSY in other contexts,
both real out-of-resources situations and via error injection.

Doug Gilbert
  SEAGATE   ST200FM0073   CB02
  Peripheral device type: disk

Opcode  ServiceCDBName
(hex)   action(h)  size   
---
 00  6Test Unit Ready
cdb usage: 00 00 00 00 00 00 
 01  6Rezero Unit
cdb usage: 01 00 01 00 ff 00 
 03  6Request Sense
cdb usage: 03 e1 00 00 ff 00 
 04  6Format Unit
cdb usage: 04 ff 00 ff ff 00 
 07  6Reassign Blocks
cdb usage: 07 03 00 00 00 00 
 08  6Read(6)
cdb usage: 08 1f ff ff ff 00 
 0a  6Write(6)
cdb usage: 0a 1f ff ff ff 00 
 0b  6Seek(6)
cdb usage: 0b 1f ff ff 00 00 
 12  6Inquiry
cdb usage: 12 e1 ff ff ff 00 
 15  6Mode select(6)
cdb usage: 15 11 00 00 ff 00 
 16  6Reserve(6)
cdb usage: 16 1e ff ff ff 00 
 17  6Release(6)
cdb usage: 17 1e ff 00 00 00 
 1a  6Mode sense(6)
cdb usage: 1a 08 ff ff ff 00 
 1b  6Start stop unit
cdb usage: 1b 01 00 0f f5 00 
 1c  6Receive diagnostic results
cdb usage: 1c 01 ff ff ff 00 
 1d  6Send diagnostic
cdb usage: 1d f7 00 ff ff 00 
 25 10Read capacity(10)
cdb usage: 25 00 7f ff ff ff 00 00 01 00 
 28 10Read(10)
cdb usage: 28 fa ff ff ff ff 00 ff ff 00 
 2a 10Write(10)
cdb usage: 2a fa ff ff ff ff 00 ff ff 00 
 2b 10Seek(10)
cdb usage: 2b 00 ff ff ff ff 00 00 00 00 
 2e 10Write and verify(10)
cdb usage: 2e f2 ff ff ff ff 00 ff ff 00 
 2f 10Verify(10)
cdb usage: 2f f2 ff ff ff ff 00 ff ff 00 
 35 10Synchronize cache(10)
cdb usage: 35 06 ff ff ff ff 00 ff ff 00 
 37 10Read defect data(10)
cdb usage: 37 00 1f 00 00 00 00 ff ff 00 
 3b010Write buffer, combined header and data [or multiple 
modes]
cdb usage: 3b 00 ff ff ff ff ff ff ff 00 
 3b210Write buffer, data
cdb usage: 3b 02 ff ff ff ff ff ff ff 

Re: [PATCH 0/4] scsi: arcmsr: simplify hba_get_config routine

2017-12-14 Thread Martin K. Petersen

Ching,

> patch 1: simplify arcmsr_hbaC_get_config function.
>
> patch 2: wait iop firmware ready before issue get_config command to iop.
>
> patch 3: simplify arcmsr_hbaE_get_config function.
>
> patch 4: simplify all arcmsr_hbaX_get_config routine by call a new 
> get_adapter_config function

Applied to 4.16/scsi-queue, thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 0/9] lpfc updates for 11.4.0.6

2017-12-14 Thread Martin K. Petersen

James,

> This patch set provides a number of bug fixes and 1 addition to
> the driver.

Applied to 4.16/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 00/19] hisi_sas: PM, RAS, and other misc changes

2017-12-14 Thread Martin K. Petersen

John,

> This patchset contains support for some new
> features, and also some modifications and other
> fixes.
>
> Headline changes include:
> - v3 hw Suspend and Resume support
> - v3 hw RAS (PCI AER) support
> - v2 hw HW port error handling support
> - other misc fixes and tidy-up

Applied to 4.16/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2 5/5] scsi_debug: add resp_write_scat function

2017-12-14 Thread Bart Van Assche
On Wed, 2017-12-13 at 20:40 -0500, Douglas Gilbert wrote:
> + {0, 0x7f, 0x11, F_SA_HIGH | F_D_OUT | FF_MEDIA_IO, resp_write_scat,
> + NULL, {32,  0xc7, 0, 0, 0, 0, 0x3f, 0x18, 0x0, 0x11, 0xf8,
> +0, 0xff, 0xff, 0x0, 0x0} },  /* WRITE SCATTERED(32) */

I know this is consistent with other command declarations in the scsi_debug
driver, but it seems weird to me to report to the initiator that only some
bits of the ADDITIONAL CDB LENGTH fields are supported (0x18)? From SPC-5:

The CDB USAGE DATA field contains information about the CDB for the command
being queried. The first byte of the CDB USAGE DATA field shall contain the
operation code for the command being queried. If the command being queried
contains a service action, then that service action code shall be placed in
the CDB USAGE DATA field in the same location as the SERVICE ACTION field of
the command CDB. All other bytes of the CDB USAGE DATA field shall contain a
usage map for bits in the CDB for the command being queried.
 
> + lrdp = kzalloc(lbdof_blen, GFP_ATOMIC);
> + if (lrdp == NULL) {
> + mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC,
> + INSUFF_RES_ASCQ);
> + return illegal_condition_result;
> + }

It seems weird to me to report a fatal error to the initiator if a memory
allocation fails? I think other SCSI LLDs return SCSI_MLQUEUE_HOST_BUSY if a
memory allocation fails.

Thanks,

Bart.

Re: [PATCH v2 4/5] scsi_debug: ARRAY_SIZE() and FF_MEDIA_IO

2017-12-14 Thread Bart Van Assche
On Wed, 2017-12-13 at 20:40 -0500, Douglas Gilbert wrote:
> Reviewer suggested using the ARRAY_SIZE macro. That reduced one of the subtle
> inter-dependencies in the parser's tables.
> 
> It is important that commands which simulate media access, indicate this in 
> the
> flags for that command. The flag to do that was FF_DIRECT_IO. On reflection
> FF_MEDIA_IO seems a more accurate description. 

Thanks!

Reviewed-by: Bart Van Assche 



Re: [PATCH v2 3/5] scsi_debug: do_device_access, add sg offset argument

2017-12-14 Thread Bart Van Assche
On Wed, 2017-12-13 at 20:40 -0500, Douglas Gilbert wrote:
> WRITE SCATTERED needs to take several "bites" out of the data-out buffer.
> Expand the do_device_access() function to take a sg_skip argument.

Reviewed-by: Bart Van Assche 



Re: [PATCH v2 2/5] scsi_debug: fix group_number mask

2017-12-14 Thread Bart Van Assche
On Wed, 2017-12-13 at 20:40 -0500, Douglas Gilbert wrote:
>  static const struct opcode_info_t reserve_iarr[1] = {
> @@ -508,11 +508,11 @@ static const struct opcode_info_t 
> opcode_info_arr[SDEB_I_LAST_ELEMENT + 1] = {
>0, 0} },
>   {3, 0x88, 0, F_D_IN | FF_DIRECT_IO, resp_read_dt0, read_iarr,
>   {16,  0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> -  0xff, 0xff, 0xff, 0x9f, 0xc7} },   /* READ(16) */
> +  0xff, 0xff, 0xff, 0xff, 0xc7} },   /* READ(16) */
>  /* 10 */
>   {3, 0x8a, 0, F_D_OUT | FF_DIRECT_IO, resp_write_dt0, write_iarr,
>   {16,  0xfa, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> -  0xff, 0xff, 0xff, 0x9f, 0xc7} },   /* WRITE(16) */
> +  0xff, 0xff, 0xff, 0xff, 0xc7} },   /* WRITE(16) */

Please mention all changes in the patch description. I think the READ(16) and
WRITE(16) changes not only change the GROUP NUMBER field from 5 to 6 bits but
also declare the DLD0 bit as supported. Anyway:

Reviewed-by: Bart Van Assche 



Re: [PATCH v2 1/5] scsi_debug: tab, kstrto changes

2017-12-14 Thread Bart Van Assche
On Wed, 2017-12-13 at 20:40 -0500, Douglas Gilbert wrote:
> Some of my development tools tend to add spaces (my preference) rather
> than tabs (kernel convention). Running unexpand to clean these spaces
> up found more of them than checkpatch.pl did. Then checkpatch.pl
> complained about other style violations in those newly tabbed lines.

Reviewed-by: Bart Van Assche 



Re: [PATCH 4/4] sd: use async_probe cookie to avoid deadlocks

2017-12-14 Thread Bart Van Assche
On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
> With the current design we're waiting for all async probes to
> finish when removing any sd device.
> This might lead to a livelock where the 'remove' call is blocking
> for any probe calls to finish, and the probe calls are waiting for
> a response, which will never be processes as the thread handling
> the responses is waiting for the remove call to finish.
> Which is completely pointless as we only _really_ care for the
> probe on _this_ device to be completed; any other probing can
> happily continue for all we care.
> So save the async probing cookie in the structure and only wait
> if this specific probe is still active.

From async_synchronize_cookie_domain():

wait_event(async_done, lowest_in_progress(domain) >= cookie);

So async_synchronize_cookie_domain() also waits for multiple asynchronous
probes to finish. Does this patch have any advantages over the patch I
posted (https://marc.info/?l=linux-scsi&m=151275368714540)?

Thanks,

Bart.

Re: [PATCH] scsi: check for device state in __scsi_remove_target()

2017-12-14 Thread Ewan D. Milne
On Thu, 2017-12-14 at 10:02 +0100, Hannes Reinecke wrote:
> On 12/14/2017 09:05 AM, Jason Yan wrote:
> > 
> > On 2017/12/14 6:23, Bart Van Assche wrote:
> >> On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
> >>> As it turned out device_get() doesn't use kref_get_unless_zero(),
> >>> so we will be always getting a device pointer.
> >>> So we need to check for the device state in __scsi_remove_target()
> >>> to avoid tripping over deleted objects.
> >>>
> >>> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
> >>
> >> How about adding Reported-by: Jason Yan? See also
> >> https://www.spinics.net/lists/linux-scsi/msg115295.html
> >>
> >> Anyway:
> >>
> >> Reviewed-by: Bart Van Assche 
> >>
> > 
> > Seems the same as my patch.So how do we plan to fix this issue,
> > pick this approach up or the approach James Bottomley suggested?
> > I have sent a patch to change get_device() but Greg seems do not
> > like this way.
> > 
> This is actually a real regression, which can be trivially exercised by
> eg logging out from two connections to an iSCSI target.
> (Our QA tripped across that one).
> So I'd rather have to have it fixed reasonably soon.
> 
> While 'get_device' is IMO the 'correct' solution it surely warrants a
> broader discussion, plus one would need to audit all callers to check
> the return value. If we were going down that route we should probably
> add a __must_check to get_device(), too.
> But again, this will probably drag out for quite some time, and I'd
> prefer to have the fix in the meantime.
> 
> Cheers,
> 
> Hannes

We have 2 reproducible test cases, this patch fixes one of them,
which was a continually oscillating FC target port w/short dev_loss_tmo.
I'm still waiting for a report on the iSCSI test.  The code looks good.
We need to get some kind of fix for this sooner rather than later.

Reviewed-by: Ewan D. Milne 



Re: [PATCH 3/4] scsi: add missing get_device() return value checks

2017-12-14 Thread Bart Van Assche
On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
>   sdev->sdev_gendev.parent = get_device(&starget->dev);
> + if (!sdev->sdev_gendev.parent) {
> + kfree(sdev);
> + return NULL;
> + }

Are you sure that get_device() can return NULL?

Thanks,

Bart.

Re: [PATCH 2/4] sd: Check if parent is still present before proceeding with probing

2017-12-14 Thread Bart Van Assche
On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ab75ebd..228b0b62 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3399,6 +3399,10 @@ static int sd_probe(struct device *dev)
>   }
>  
>   device_initialize(&sdkp->dev);
> +
> + if (!get_device(dev))
> + goto out_free_index;

get_device(dev) returns the value of the 'dev' argument. Are you sure the
if-test is useful? Did you perhaps want to check sdp->sdev_state like
scsi_device_get() does?

>   sdkp->dev.parent = dev;
>   sdkp->dev.class = &sd_disk_class;
>   dev_set_name(&sdkp->dev, "%s", dev_name(dev));
> @@ -3407,7 +3411,6 @@ static int sd_probe(struct device *dev)
>   if (error)
>   goto out_free_index;
>  
> - get_device(dev);
>   dev_set_drvdata(dev, sdkp);
>  
>   get_device(&sdkp->dev); /* prevent release before async_schedule */

Since the get_device() call has been moved up, are there any error paths that
have to be adjusted?

Bart.

Re: [PATCH 1/4] block: check for GENHD_FL_UP in del_gendisk()

2017-12-14 Thread Bart Van Assche
On Wed, 2017-12-13 at 08:36 +0100, Hannes Reinecke wrote:
> On 12/12/2017 05:57 PM, Bart Van Assche wrote:
> > On Tue, 2017-12-12 at 09:57 +0100, Hannes Reinecke wrote:
> > > From: Hannes Reinecke 
> > > 
> > > When a device is probed asynchronously del_gendisk() might be called
> > > before the async probing was run, causing del_gendisk() to crash
> > > due to uninitialized sysfs objects.
> > > 
> > > Signed-off-by: Hannes Reinecke 
> > > ---
> > >  block/genhd.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/block/genhd.c b/block/genhd.c
> > > index c2223f1..cc40d95 100644
> > > --- a/block/genhd.c
> > > +++ b/block/genhd.c
> > > @@ -697,6 +697,9 @@ void del_gendisk(struct gendisk *disk)
> > >   struct disk_part_iter piter;
> > >   struct hd_struct *part;
> > >  
> > > + if (!(disk->flags & GENHD_FL_UP))
> > > + return;
> > > +
> > >   blk_integrity_del(disk);
> > >   disk_del_events(disk);
> > 
> > Hello Hannes,
> > 
> > Thank you for having published your approach for increasing disk probing
> > concurrency. Your approach looks interesting to me. However, I don't think
> > that patches 1/4..3/4 are sufficient to avoid races between e.g.
> > device_add_disk() and del_gendisk(). As far as I know no locks are held
> > around the device_add_disk() and del_gendisk() calls. Does that mean that
> > del_gendisk() can call e.g. blk_integrity_del() before device_add_disk() has
> > called blk_integrity_add()?
> 
> In principle, yes. However, the overall idea here is that device_add_disk()
> and del_gendisk() are enclosed within upper layer procedures, which themselves
> provide additional locking. In our case the sd driver provided synchronisation
> guarantees ensuring that device_add_disk() and del_gendisk() doesn't run
> concurrently.
> 
> if one is really concerned we could convert disk->flags to a bitmask, and use
> atomic bitmask modification; that should avoid any concurrency issues.

Hello Hannes,

Regarding the scenario explained in a previous e-mail: what guarantees that the
device_add_disk() call in sd_probe_async() does not happen concurrently with the
device_unregister() call from __scsi_remove_device()?

Thanks,

Bart.

[PATCH 5/6] Documentetion: cdrom: introduce CDS_DRIVE_ERROR

2017-12-14 Thread Michal Suchanek

CDS_DRIVE_NOT_READY is used for the state in which CDROM is 'becoming
ready' (typically analyzing the disc) but also as the fallback when
nothing else applies. Introduce CDS_DRIVE_ERROR for the fallback case.

Signed-off-by: Michal Suchanek 
---
 Documentation/cdrom/cdrom-standard.tex | 8 +++-
 Documentation/cdrom/ide-cd | 6 ++
 Documentation/ioctl/cdrom.txt  | 1 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/cdrom/cdrom-standard.tex 
b/Documentation/cdrom/cdrom-standard.tex
index 8f85b0e41046..018284ba696a 100644
--- a/Documentation/cdrom/cdrom-standard.tex
+++ b/Documentation/cdrom/cdrom-standard.tex
@@ -371,11 +371,17 @@ $$
 CDS_NO_INFO& no information available\cr
 CDS_NO_DISC& no disc is inserted, tray is closed\cr
 CDS_TRAY_OPEN& tray is opened\cr
-CDS_DRIVE_NOT_READY& something is wrong, tray is moving?\cr
+CDS_DRIVE_NOT_READY& tray just closed?\cr
 CDS_DISC_OK& a disc is loaded and everything is fine\cr
+CDS_DRIVE_ERROR& something is wrong\cr
 }
 $$
 
+Note: The IDE and SCSI cdroms have a status code 'drive becoming ready' which
+is typically returned when the drive has just closed and is analyzing the disc.
+For other cdrom types this state is not reported by the hardware or not
+implemented by the driver.
+
 \subsection{$Int\ media_changed(struct\ cdrom_device_info * cdi, int\ 
disc_nr)$}
 
 This function is very similar to the original function in $struct\ 
diff --git a/Documentation/cdrom/ide-cd b/Documentation/cdrom/ide-cd
index a5f2a7f1ff46..9324a8fd9a39 100644
--- a/Documentation/cdrom/ide-cd
+++ b/Documentation/cdrom/ide-cd
@@ -455,6 +455,9 @@ main (int argc, char **argv)
case CDS_DRIVE_NOT_READY:
printf ("Drive Not Ready.\n");
break;
+   case CDS_DRIVE_ERROR:
+   printf ("Drive problem.\n");
+   break;
default:
printf ("This Should not happen!\n");
break;
@@ -481,6 +484,9 @@ main (int argc, char **argv)
case CDS_NO_INFO:
printf ("No Information available.");
break;
+   case CDS_DRIVE_ERROR:
+   printf ("Drive problem.\n");
+   break;
default:
printf ("This Should not happen!\n");
break;
diff --git a/Documentation/ioctl/cdrom.txt b/Documentation/ioctl/cdrom.txt
index a4d62a9d6771..7720d11807c3 100644
--- a/Documentation/ioctl/cdrom.txt
+++ b/Documentation/ioctl/cdrom.txt
@@ -700,6 +700,7 @@ CDROM_DRIVE_STATUS  Get tray position, etc.
CDS_TRAY_OPEN
CDS_DRIVE_NOT_READY
CDS_DISC_OK
+   CDS_DRIVE_ERROR
-1  error
 
error returns:
-- 
2.13.6



[PATCH 0/6] Fix cdrom autoclose

2017-12-14 Thread Michal Suchanek
Hello,

there is cdrom autoclose feature that is supposed to close the tray, wait for
the disc to become ready, and then open the device.

This used to work in ancient times. Then in old times there was a hack in
util-linux which worked around the breakage which probably resulted from
switching to scsi emulation.

Currently util-linux maintainer refuses to merge another hack on the basis that
kernel still has the feature so it should be fixed there. Indeed, to implement
this feature effectively from userspace one would need to know when the CD-ROM
is in the "drive becoming ready" state which is knowledge that never leaves the
hardware-specific driver and is passed neither to userspace nor the generic
cdrom driver.

So this patchset fixes the kernel autoclose implementation in cdrom.c and to
do so reports the "drive becoming ready" state from the harware specific
drivers.

Michal Suchanek (6):
  delay: add poll_event_interruptible
  cdrom: factor out common open_for_* code
  cdrom: wait for tray to close
  cdrom: introduce CDS_DRIVE_ERROR
  Documentetion: cdrom: introduce CDS_DRIVE_ERROR
  cdrom: wait for drive to become ready

 Documentation/cdrom/cdrom-standard.tex |   8 ++-
 Documentation/cdrom/ide-cd |   6 ++
 Documentation/ioctl/cdrom.txt  |   1 +
 drivers/block/paride/pcd.c |   2 +-
 drivers/cdrom/cdrom.c  | 124 -
 drivers/cdrom/gdrom.c  |   2 +-
 drivers/ide/ide-cd_ioctl.c |  12 ++--
 drivers/scsi/sr_ioctl.c|   2 +-
 include/linux/delay.h  |  12 
 include/uapi/linux/cdrom.h |   1 +
 10 files changed, 99 insertions(+), 71 deletions(-)

-- 
2.13.6



[PATCH 3/3] virtio_scsi: Implement 'native LUN' feature

2017-12-14 Thread Hannes Reinecke
The 'native LUN' feature allows virtio-scsi to pass in the LUN
numbers from the underlying storage directly, without having
to modify the LUN number itself.
It works by shifting the existing LUN number down by 8 bytes,
and add the virtio-specific 8-byte LUN steering header.
With that virtio doesn't have to mangle the LUN number, allowing
us to pass the 'real' LUN number to the guest.
Of course, we do cut off the last 8 bytes of the 'real' LUN number,
but I'm not aware of any array utilizing that, so the impact should
be negligible.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/virtio_scsi.c   | 62 ++--
 include/uapi/linux/virtio_scsi.h |  1 +
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index f925fbd..63c2c85 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -356,8 +356,12 @@ static void virtscsi_handle_transport_reset(struct 
virtio_scsi *vscsi,
struct scsi_device *sdev;
struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
unsigned int target = event->lun[1];
-   unsigned int lun = (event->lun[2] << 8) | event->lun[3];
+   u64 lun;
 
+   if (virtio_has_feature(vscsi->vdev, VIRTIO_SCSI_F_NATIVE_LUN))
+   lun = scsilun_to_int((struct scsi_lun *)event->lun) >> 16;
+   else
+   lun = (event->lun[2] << 8) | event->lun[3];
switch (virtio32_to_cpu(vscsi->vdev, event->reason)) {
case VIRTIO_SCSI_EVT_RESET_RESCAN:
scsi_add_device(shost, 0, target, lun);
@@ -368,7 +372,7 @@ static void virtscsi_handle_transport_reset(struct 
virtio_scsi *vscsi,
scsi_remove_device(sdev);
scsi_device_put(sdev);
} else {
-   pr_err("SCSI device %d 0 %d %d not found\n",
+   pr_err("SCSI device %d 0 %d %llu not found\n",
shost->host_no, target, lun);
}
break;
@@ -383,13 +387,17 @@ static void virtscsi_handle_param_change(struct 
virtio_scsi *vscsi,
struct scsi_device *sdev;
struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
unsigned int target = event->lun[1];
-   unsigned int lun = (event->lun[2] << 8) | event->lun[3];
+   u64 lun;
u8 asc = virtio32_to_cpu(vscsi->vdev, event->reason) & 255;
u8 ascq = virtio32_to_cpu(vscsi->vdev, event->reason) >> 8;
 
+   if (virtio_has_feature(vscsi->vdev, VIRTIO_SCSI_F_NATIVE_LUN))
+   lun = scsilun_to_int((struct scsi_lun *)event->lun) >> 16;
+   else
+   lun = (event->lun[2] << 8) | event->lun[3];
sdev = scsi_device_lookup(shost, 0, target, lun);
if (!sdev) {
-   pr_err("SCSI device %d 0 %d %d not found\n",
+   pr_err("SCSI device %d 0 %d %llu not found\n",
shost->host_no, target, lun);
return;
}
@@ -524,10 +532,16 @@ static void virtio_scsi_init_hdr(struct virtio_device 
*vdev,
 int target_id,
 struct scsi_cmnd *sc)
 {
-   cmd->lun[0] = 1;
-   cmd->lun[1] = target_id;
-   cmd->lun[2] = (sc->device->lun >> 8) | 0x40;
-   cmd->lun[3] = sc->device->lun & 0xff;
+   if (virtio_has_feature(vdev, VIRTIO_SCSI_F_NATIVE_LUN)) {
+   u64 lun = sc->device->lun << 16;
+   lun |= ((u64)1 << 8) | (u64)target_id;
+   int_to_scsilun(lun, (struct scsi_lun *)&cmd->lun);
+   } else {
+   cmd->lun[0] = 1;
+   cmd->lun[1] = target_id;
+   cmd->lun[2] = (sc->device->lun >> 8) | 0x40;
+   cmd->lun[3] = sc->device->lun & 0xff;
+   }
cmd->tag = cpu_to_virtio64(vdev, (unsigned long)sc);
cmd->task_attr = VIRTIO_SCSI_S_SIMPLE;
cmd->prio = 0;
@@ -776,11 +790,17 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
.type = VIRTIO_SCSI_T_TMF,
.subtype = cpu_to_virtio32(vscsi->vdev,
 
VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET),
-   .lun[0] = 1,
-   .lun[1] = target_id,
-   .lun[2] = (sc->device->lun >> 8) | 0x40,
-   .lun[3] = sc->device->lun & 0xff,
};
+   if (virtio_has_feature(vscsi->vdev, VIRTIO_SCSI_F_NATIVE_LUN)) {
+   u64 lun = sc->device->lun << 16;
+   lun |= ((u64)1 << 8) | (u64)target_id;
+   int_to_scsilun(lun, (struct scsi_lun *)&cmd->req.tmf.lun);
+   } else {
+   cmd->req.tmf.lun[0] = 1;
+   cmd->req.tmf.lun[1] = target_id;
+   cmd->req.tmf.lun[2] = (sc->device->lun >> 8) | 0x40;
+   cmd->req.tmf.lun[3] = sc->device->lun & 0xff;
+   }
return virtscsi_tmf(vscsi, cmd);
 }
 
@@ -851,10 +871,18 @@ static int virtsc

[PATCH 2/3] virtio-scsi: Add FC transport class

2017-12-14 Thread Hannes Reinecke
When a device announces an 'FC' protocol we should be pulling
in the FC transport class to have the rports etc setup correctly.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/virtio_scsi.c | 323 ++---
 1 file changed, 277 insertions(+), 46 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index a561e90..f925fbd 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -25,11 +25,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -91,6 +93,12 @@ struct virtio_scsi_vq {
  * an atomic_t.
  */
 struct virtio_scsi_target_state {
+   struct list_head list;
+   struct fc_rport *rport;
+   struct virtio_scsi *vscsi;
+   int target_id;
+   bool removed;
+
seqcount_t tgt_seq;
 
/* Count of outstanding requests. */
@@ -117,8 +125,12 @@ struct virtio_scsi {
/* Protected by event_vq lock */
bool stop_events;
 
+   int protocol;
int next_target_id;
+   u64 wwnn;
+   u64 wwpn;
struct work_struct rescan_work;
+   struct list_head target_list;
spinlock_t rescan_lock;
 
struct virtio_scsi_vq ctrl_vq;
@@ -128,6 +140,7 @@ struct virtio_scsi {
 
 static struct kmem_cache *virtscsi_cmd_cache;
 static mempool_t *virtscsi_cmd_pool;
+static struct scsi_transport_template *virtio_transport_template;
 
 static inline struct Scsi_Host *virtio_scsi_host(struct virtio_device *vdev)
 {
@@ -156,15 +169,21 @@ static void virtscsi_compute_resid(struct scsi_cmnd *sc, 
u32 resid)
 static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 {
struct virtio_scsi_cmd *cmd = buf;
+   struct fc_rport *rport;
struct scsi_cmnd *sc = cmd->sc;
struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
-   struct virtio_scsi_target_state *tgt =
-   scsi_target(sc->device)->hostdata;
+   struct virtio_scsi_target_state *tgt;
 
dev_dbg(&sc->device->sdev_gendev,
"cmd %p response %u status %#02x sense_len %u\n",
sc, resp->response, resp->status, resp->sense_len);
 
+   rport = starget_to_rport(scsi_target(sc->device));
+   if (!rport)
+   tgt = scsi_target(sc->device)->hostdata;
+   else
+   tgt = rport->dd_data;
+
sc->result = resp->status;
virtscsi_compute_resid(sc, virtio32_to_cpu(vscsi->vdev, resp->resid));
switch (resp->response) {
@@ -502,10 +521,11 @@ static int virtscsi_kick_cmd(struct virtio_scsi_vq *vq,
 
 static void virtio_scsi_init_hdr(struct virtio_device *vdev,
 struct virtio_scsi_cmd_req *cmd,
+int target_id,
 struct scsi_cmnd *sc)
 {
cmd->lun[0] = 1;
-   cmd->lun[1] = sc->device->id;
+   cmd->lun[1] = target_id;
cmd->lun[2] = (sc->device->lun >> 8) | 0x40;
cmd->lun[3] = sc->device->lun & 0xff;
cmd->tag = cpu_to_virtio64(vdev, (unsigned long)sc);
@@ -517,12 +537,14 @@ static void virtio_scsi_init_hdr(struct virtio_device 
*vdev,
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 static void virtio_scsi_init_hdr_pi(struct virtio_device *vdev,
struct virtio_scsi_cmd_req_pi *cmd_pi,
+   int target_id,
struct scsi_cmnd *sc)
 {
struct request *rq = sc->request;
struct blk_integrity *bi;
 
-   virtio_scsi_init_hdr(vdev, (struct virtio_scsi_cmd_req *)cmd_pi, sc);
+   virtio_scsi_init_hdr(vdev, (struct virtio_scsi_cmd_req *)cmd_pi,
+target_id, sc);
 
if (!rq || !scsi_prot_sg_count(sc))
return;
@@ -542,6 +564,7 @@ static void virtio_scsi_init_hdr_pi(struct virtio_device 
*vdev,
 
 static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
 struct virtio_scsi_vq *req_vq,
+int target_id,
 struct scsi_cmnd *sc)
 {
struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
@@ -564,13 +587,15 @@ static int virtscsi_queuecommand(struct virtio_scsi 
*vscsi,
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
if (virtio_has_feature(vscsi->vdev, VIRTIO_SCSI_F_T10_PI)) {
-   virtio_scsi_init_hdr_pi(vscsi->vdev, &cmd->req.cmd_pi, sc);
+   virtio_scsi_init_hdr_pi(vscsi->vdev, &cmd->req.cmd_pi,
+   target_id, sc);
memcpy(cmd->req.cmd_pi.cdb, sc->cmnd, sc->cmd_len);
req_size = sizeof(cmd->req.cmd_pi);
} else
 #endif
{
-   virtio_scsi_init_hdr(vscsi->vdev, &cmd->req.cmd, sc);
+   virtio_scsi_init_hdr(vscsi->vdev, &cmd->req.cmd,
+target_id, sc);
memc

[PATCH 0/3] virtio-vfc implementation

2017-12-14 Thread Hannes Reinecke
Hi all,

here's my attempt to implement a 'Virtual FC' emulation for virtio-scsi,
based on the presentation at the KVM Forum in Prague.

This doesn't so much implement the FC protocol per se, it just uses the
port information to hook in the FC transport class, making the device
to look like a FC device.

This patchset has a complementary patchset for the qemu virtio-scsi driver,
which will be posted to the qemu-devel list.

As usual, comments and reviews are welcome.

Hannes Reinecke (3):
  virtio-scsi: implement target rescan
  virtio-scsi: Add FC transport class
  virtio_scsi: Implement 'native LUN' feature

 drivers/scsi/virtio_scsi.c   | 582 ---
 include/uapi/linux/virtio_scsi.h |  16 ++
 2 files changed, 554 insertions(+), 44 deletions(-)

-- 
1.8.5.6



[PATCH 1/3] virtio-scsi: implement target rescan

2017-12-14 Thread Hannes Reinecke
Implement the 'rescan' virtio-scsi feature. Rescanning works by
sending a 'rescan' virtio-scsi command with the next requested
target id to the backend. The backend will respond with the next
used target id or '-1' if no more targets are found.
This avoids scanning all possible targets.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/virtio_scsi.c   | 239 ++-
 include/uapi/linux/virtio_scsi.h |  15 +++
 2 files changed, 250 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 7c28e8d..a561e90 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -46,12 +46,14 @@ struct virtio_scsi_cmd {
struct virtio_scsi_cmd_req_picmd_pi;
struct virtio_scsi_ctrl_tmf_req  tmf;
struct virtio_scsi_ctrl_an_req   an;
+   struct virtio_scsi_rescan_reqrescan;
} req;
union {
struct virtio_scsi_cmd_resp  cmd;
struct virtio_scsi_ctrl_tmf_resp tmf;
struct virtio_scsi_ctrl_an_resp  an;
struct virtio_scsi_event evt;
+   struct virtio_scsi_rescan_resp   rescan;
} resp;
 } cacheline_aligned_in_smp;
 
@@ -115,6 +117,10 @@ struct virtio_scsi {
/* Protected by event_vq lock */
bool stop_events;
 
+   int next_target_id;
+   struct work_struct rescan_work;
+   spinlock_t rescan_lock;
+
struct virtio_scsi_vq ctrl_vq;
struct virtio_scsi_vq event_vq;
struct virtio_scsi_vq req_vqs[];
@@ -318,6 +324,11 @@ static void virtscsi_cancel_event_work(struct virtio_scsi 
*vscsi)
 
for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++)
cancel_work_sync(&vscsi->event_list[i].work);
+
+   spin_lock_irq(&vscsi->rescan_lock);
+   vscsi->next_target_id = -1;
+   spin_unlock_irq(&vscsi->rescan_lock);
+   cancel_work_sync(&vscsi->rescan_work);
 }
 
 static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
@@ -805,6 +816,168 @@ static enum blk_eh_timer_return 
virtscsi_eh_timed_out(struct scsi_cmnd *scmnd)
return BLK_EH_RESET_TIMER;
 }
 
+static void virtscsi_rescan_work(struct work_struct *work)
+{
+   struct virtio_scsi *vscsi =
+   container_of(work, struct virtio_scsi, rescan_work);
+   struct Scsi_Host *sh = virtio_scsi_host(vscsi->vdev);
+   int target_id, ret;
+   struct virtio_scsi_cmd *cmd;
+   DECLARE_COMPLETION_ONSTACK(comp);
+
+   spin_lock_irq(&vscsi->rescan_lock);
+   target_id = vscsi->next_target_id;
+   if (target_id == -1) {
+   shost_printk(KERN_INFO, sh, "rescan: terminated\n");
+   spin_unlock_irq(&vscsi->rescan_lock);
+   return;
+   }
+   spin_unlock_irq(&vscsi->rescan_lock);
+
+   cmd = mempool_alloc(virtscsi_cmd_pool, GFP_NOIO);
+   if (!cmd) {
+   shost_printk(KERN_INFO, sh, "rescan: no memory\n");
+   goto scan_host;
+   }
+   shost_printk(KERN_INFO, sh, "rescan: next target %d\n", target_id);
+   memset(cmd, 0, sizeof(*cmd));
+   cmd->comp = ∁
+   cmd->sc = NULL;
+   cmd->req.rescan = (struct virtio_scsi_rescan_req){
+   .type = VIRTIO_SCSI_T_RESCAN,
+   .next_id = cpu_to_virtio32(vscsi->vdev, target_id),
+   };
+
+   ret = virtscsi_kick_cmd(&vscsi->ctrl_vq, cmd, sizeof(cmd->req.rescan),
+   sizeof(cmd->resp.rescan));
+   if (ret < 0) {
+   mempool_free(cmd, virtscsi_cmd_pool);
+   goto scan_host;
+   }
+
+   wait_for_completion(&comp);
+   target_id = virtio32_to_cpu(vscsi->vdev, cmd->resp.rescan.id);
+   if (target_id != -1) {
+   int transport = virtio32_to_cpu(vscsi->vdev,
+   cmd->resp.rescan.transport);
+   spin_lock_irq(&vscsi->rescan_lock);
+   vscsi->next_target_id = target_id + 1;
+   spin_unlock_irq(&vscsi->rescan_lock);
+   shost_printk(KERN_INFO, sh,
+"found %s target %d (WWN %*phN)\n",
+transport == SCSI_PROTOCOL_FCP ? "FC" : "SAS",
+target_id, 8,
+cmd->resp.rescan.port_wwn);
+   scsi_scan_target(&sh->shost_gendev, 0, target_id,
+SCAN_WILD_CARD, SCSI_SCAN_INITIAL);
+   queue_work(system_freezable_wq, &vscsi->rescan_work);
+   } else {
+   shost_printk(KERN_INFO, sh,
+"rescan: no more targets\n");
+   spin_lock_irq(&vscsi->rescan_lock);
+   vscsi->next_target_id = -1;
+   spin_unlock_irq(&vscsi->rescan_lock);
+   }
+   mempool_free(cmd, virtscsi_cmd_pool);
+   return;
+scan_host:
+   spin_lock_irq(&vscsi->rescan_lock);
+

Re: [PATCH] driver core: Make it safe to use get_device() if the reference count is zero

2017-12-14 Thread Jason Yan


On 2017/12/14 16:56, Greg KH wrote:

On Thu, Dec 14, 2017 at 04:27:56PM +0800, Jason Yan wrote:



On 2017/12/14 16:10, Greg KH wrote:

On Thu, Dec 14, 2017 at 03:56:46PM +0800, Jason Yan wrote:


On 2017/12/14 15:42, Greg KH wrote:

On Thu, Dec 14, 2017 at 11:39:36AM +0800, Jason Yan wrote:

Some driviers may have the chance to increase a reference count that
has dropped to zero when using get_device() because of their design.

Then those drivers are broken :)


We have met such a issue with scsi:
https://www.spinics.net/lists/linux-scsi/msg115295.html

The scsi core will keep the scsi device object in the host list after
it has been deleted and the iterator can still find it. All of the
places where need iterating have to check the state of the scsi device
and this makes a lot of code redundancy and complexity.

Provide a safe mechanism in get_device() by using
kobject_get_unless_zero().

Suggested-by: Bart Van Assche 
Signed-off-by: Jason Yan 
CC: Greg Kroah-Hartman 
CC: Bart Van Assche 
CC: Ewan D. Milne 
CC: James E.J. Bottomley 
CC: Christoph Hellwig 
---
drivers/base/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 12ebd05..cc74810 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
 */
struct device *get_device(struct device *dev)
{
-   return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
+   return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;

I really don't want to do this, the bus the device is on should prevent
this from happening.

Also, once that reference count drops to zero, the memory should be
freed, so you really have a stale pointer here, and this code would fail
if you had slab debugging enabled anyway.


Actually I don't want this either. But the design of scsi core will leave
the scsi device on the host list after it is deleted, and it can  be
found later and the refcount have a very big chance to increase from
zero again. And after a lot of discussion it seems that the scsi layer
is difficult to change the situation in the near future.


Keeping a 'struct device' reference counted chunk of memory on a list
that has a different lifetime rule from that device itself, is crazy.



The lifetime rule is the same. That device itself will delete from the
host list in the destructor, scsi_device_dev_release_usercontext(), and
the memory will be freed after that. That's why this issue came out.


And yes, I remember how all of this came about, but I really don't have
the time to work on it myself...


So I don't even think this fixes the issue you think it fixes :)


This issue is very easy to reproduce on my machine and I have tested the
patch and it really fixes the issue.


Even with slab debugging enabled?  If so, what is keeping that memory
from being freed once the reference count drops to 0?



As above, the memory is not freed yet when we increasing the refconunt from
zero, so it's nothing to do with slab debugging enabled or not. If
we want to free it, we have to grab host lock first to delete it from
the list, so if we are grabing the host lock, we can increase the
refcount safely from 0 to 1.


Wait, what?  Once that refcount goes to 0, the release callback will be
called, and the memory had better be freed.  Any pointer you might still
have to the structure is now invalid.  The fact that you are somehow
still keeping that pointer around is not ok, and slab debugging should
have caused the memory to be overwritten and garbage would result if you
tried to make this call again.



I don't know why scsi have this design. Anyone who is familiar with the
history of this design can explain this?


thanks,

greg k-h

.





[Bug 198161] incremental patch-4.9.51-52 on sg.c break Nero 4 nero burning rom application

2017-12-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=198161

Johannes Thumshirn (jthumsh...@suse.de) changed:

   What|Removed |Added

 CC||jthumsh...@suse.de

--- Comment #2 from Johannes Thumshirn (jthumsh...@suse.de) ---
(In reply to Jean-Luc from comment #0)
> Hi
> 
> Nero 4 Linux can no longer detect devices since the changes in code in sg.c.
> It displays the following message: "Warning: no device detected"...
> 
> Regards

Can you please re-test with a more recent kernel (e.g. v4.14) ? I have had
submitted a fix for exactly this issue with Nero and believe it is gone.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[PATCH] scsi: ufs: add trace event for ufs upiu

2017-12-14 Thread Ohad Sharabi
Add UFS Protocol Information Units(upiu) trace events for ufs driver,
used to trace various ufs transaction types- command, task-management
and device management.
The trace-point format is generic and can be easily adapted to trace
other upius if needed.
Currently tracing ufs transaction of type 'device management', which
this patch introduce, cannot be obtained from any other trace.
Device management transactions are used for communication with the
device such as reading and writing descriptor or attributes etc.

Signed-off-by: Ohad Sharabi 
---
 drivers/scsi/ufs/ufshcd.c  | 48 ++
 include/trace/events/ufs.h | 28 +++
 2 files changed, 76 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a355d98..6d79c99 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -274,6 +274,47 @@ static inline void ufshcd_remove_non_printable(char *val)
*val = ' ';
 }
 
+static void ufshcd_add_upiu_trace(struct ufs_hba *hba, unsigned int tag,
+   const char *str)
+{
+   struct utp_task_req_desc *descp;
+   struct utp_upiu_task_req *task_req;
+   struct utp_upiu_req *rq;
+   u8 tx_code, *hdr, *tsf;
+   int off;
+
+   if (!trace_ufshcd_upiu_enabled())
+   return;
+
+   off = (int)tag - hba->nutrs;
+   if (off < 0) {
+   rq = hba->lrb[tag].ucd_req_ptr;
+   hdr = (u8 *)&rq->header;
+   } else {
+   descp = &hba->utmrdl_base_addr[off];
+   task_req = (struct utp_upiu_task_req *)descp->task_req_upiu;
+   hdr = (u8 *)&task_req->header;
+   }
+
+   tx_code = hdr[0] & 0x3f;
+   switch (hdr[0] & 0x3f) {
+   case UPIU_TRANSACTION_COMMAND:
+   tsf = (u8 *)&rq->sc.cdb;
+   break;
+   case UPIU_TRANSACTION_TASK_REQ:
+   tsf = (u8 *)&task_req->input_param1;
+   break;
+   case UPIU_TRANSACTION_QUERY_REQ:
+   tsf = (u8 *)&rq->qr;
+   break;
+   default:
+   return;
+   }
+
+   /* trace UPIU header and Transaction Specific Fields (TSF) */
+   trace_ufshcd_upiu(dev_name(hba->dev), str, hdr, tsf);
+}
+
 static void ufshcd_add_command_trace(struct ufs_hba *hba,
unsigned int tag, const char *str)
 {
@@ -283,6 +324,9 @@ static void ufshcd_add_command_trace(struct ufs_hba *hba,
struct ufshcd_lrb *lrbp;
int transfer_len = -1;
 
+   /* trace UPIU also */
+   ufshcd_add_upiu_trace(hba, tag, str);
+
if (!trace_ufshcd_command_enabled())
return;
 
@@ -5462,11 +5506,14 @@ static int ufshcd_issue_tm_cmd(struct ufs_hba *hba, int 
lun_id, int task_id,
 
spin_unlock_irqrestore(host->host_lock, flags);
 
+   ufshcd_add_upiu_trace(hba, task_tag, "tm_send");
+
/* wait until the task management command is completed */
err = wait_event_timeout(hba->tm_wq,
test_bit(free_slot, &hba->tm_condition),
msecs_to_jiffies(TM_CMD_TIMEOUT));
if (!err) {
+   ufshcd_add_upiu_trace(hba, task_tag, "tm_complete_err");
dev_err(hba->dev, "%s: task management cmd 0x%.2x timed-out\n",
__func__, tm_function);
if (ufshcd_clear_tm_cmd(hba, free_slot))
@@ -5475,6 +5522,7 @@ static int ufshcd_issue_tm_cmd(struct ufs_hba *hba, int 
lun_id, int task_id,
err = -ETIMEDOUT;
} else {
err = ufshcd_task_req_compl(hba, free_slot, tm_response);
+   ufshcd_add_upiu_trace(hba, task_tag, "tm_complete");
}
 
clear_bit(free_slot, &hba->tm_condition);
diff --git a/include/trace/events/ufs.h b/include/trace/events/ufs.h
index bf6f826..0b2ff5d 100644
--- a/include/trace/events/ufs.h
+++ b/include/trace/events/ufs.h
@@ -257,6 +257,34 @@
)
 );
 
+TRACE_EVENT(ufshcd_upiu,
+   TP_PROTO(const char *dev_name, const char *str, unsigned char *hdr,
+   unsigned char *tsf),
+
+   TP_ARGS(dev_name, str, hdr, tsf),
+
+   TP_STRUCT__entry(
+   __string(dev_name, dev_name)
+   __string(str, str)
+   __array(unsigned char, hdr, 12)
+   __array(unsigned char, tsf, 16)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev_name, dev_name);
+   __assign_str(str, str);
+   memcpy(__entry->hdr, hdr, sizeof(__entry->hdr));
+   memcpy(__entry->tsf, tsf, sizeof(__entry->tsf));
+   ),
+
+   TP_printk(
+   "%s: %s: HDR:%s, CDB:%s",
+   __get_str(str), __get_str(dev_name),
+   __print_hex(__entry->hdr, sizeof(__entry->hdr)),
+   __print_hex(__entry->tsf, sizeof(__entry->tsf))
+   )
+);
+
 #endif /* if !defined(_TRACE_UFS_H) || defined(TRACE_HEADER_MULTI_READ) */
 
 /* This part must

Re: [PATCH] scsi: check for device state in __scsi_remove_target()

2017-12-14 Thread Hannes Reinecke
On 12/14/2017 09:05 AM, Jason Yan wrote:
> 
> On 2017/12/14 6:23, Bart Van Assche wrote:
>> On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:
>>> As it turned out device_get() doesn't use kref_get_unless_zero(),
>>> so we will be always getting a device pointer.
>>> So we need to check for the device state in __scsi_remove_target()
>>> to avoid tripping over deleted objects.
>>>
>>> Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")
>>
>> How about adding Reported-by: Jason Yan? See also
>> https://www.spinics.net/lists/linux-scsi/msg115295.html
>>
>> Anyway:
>>
>> Reviewed-by: Bart Van Assche 
>>
> 
> Seems the same as my patch.So how do we plan to fix this issue,
> pick this approach up or the approach James Bottomley suggested?
> I have sent a patch to change get_device() but Greg seems do not
> like this way.
> 
This is actually a real regression, which can be trivially exercised by
eg logging out from two connections to an iSCSI target.
(Our QA tripped across that one).
So I'd rather have to have it fixed reasonably soon.

While 'get_device' is IMO the 'correct' solution it surely warrants a
broader discussion, plus one would need to audit all callers to check
the return value. If we were going down that route we should probably
add a __must_check to get_device(), too.
But again, this will probably drag out for quite some time, and I'd
prefer to have the fix in the meantime.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.com  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[bug report] scsi: qla2xxx: Fix memory leak in dual/target mode

2017-12-14 Thread Dan Carpenter
Hello himanshu.madh...@cavium.com,

This is a semi-automatic email about new static checker warnings.

The patch 7867b98dceb7: "scsi: qla2xxx: Fix memory leak in
dual/target mode" from Dec 4, 2017, leads to the following Smatch
complaint:

drivers/scsi/qla2xxx/qla_mid.c:586 qla25xx_delete_req_que()
error: we previously assumed 'req' could be null (see line 580)

drivers/scsi/qla2xxx/qla_mid.c:602 qla25xx_delete_rsp_que()
error: we previously assumed 'rsp' could be null (see line 596)


drivers/scsi/qla2xxx/qla_mid.c
   579  
   580  if (req && vha->flags.qpairs_req_created) {
^^^
Check for NULL

   581  req->options |= BIT_0;
   582  ret = qla25xx_init_req_que(vha, req);
   583  if (ret != QLA_SUCCESS)
   584  return QLA_FUNCTION_FAILED;
   585  }
   586  qla25xx_free_req_que(vha, req);
  ^^^
Unchecked dereference inside function.

   587  
   588  return ret;

[ snip ]

   595  
   596  if (rsp && vha->flags.qpairs_rsp_created) {
^^^ 
  597   rsp->options |= BIT_0;
   598  ret = qla25xx_init_rsp_que(vha, rsp);
   599  if (ret != QLA_SUCCESS)
   600  return QLA_FUNCTION_FAILED;
   601  }
   602  qla25xx_free_rsp_que(vha, rsp);
  ^^^
   603  
   604  return ret;

regards,
dan carpenter

regards,
dan carpenter


Re: [PATCH] driver core: Make it safe to use get_device() if the reference count is zero

2017-12-14 Thread Greg KH
On Thu, Dec 14, 2017 at 04:27:56PM +0800, Jason Yan wrote:
> 
> 
> On 2017/12/14 16:10, Greg KH wrote:
> > On Thu, Dec 14, 2017 at 03:56:46PM +0800, Jason Yan wrote:
> > > 
> > > On 2017/12/14 15:42, Greg KH wrote:
> > > > On Thu, Dec 14, 2017 at 11:39:36AM +0800, Jason Yan wrote:
> > > > > Some driviers may have the chance to increase a reference count that
> > > > > has dropped to zero when using get_device() because of their design.
> > > > Then those drivers are broken :)
> > > > 
> > > > > We have met such a issue with scsi:
> > > > > https://www.spinics.net/lists/linux-scsi/msg115295.html
> > > > > 
> > > > > The scsi core will keep the scsi device object in the host list after
> > > > > it has been deleted and the iterator can still find it. All of the
> > > > > places where need iterating have to check the state of the scsi device
> > > > > and this makes a lot of code redundancy and complexity.
> > > > > 
> > > > > Provide a safe mechanism in get_device() by using
> > > > > kobject_get_unless_zero().
> > > > > 
> > > > > Suggested-by: Bart Van Assche 
> > > > > Signed-off-by: Jason Yan 
> > > > > CC: Greg Kroah-Hartman 
> > > > > CC: Bart Van Assche 
> > > > > CC: Ewan D. Milne 
> > > > > CC: James E.J. Bottomley 
> > > > > CC: Christoph Hellwig 
> > > > > ---
> > > > >drivers/base/core.c | 2 +-
> > > > >1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > > > index 12ebd05..cc74810 100644
> > > > > --- a/drivers/base/core.c
> > > > > +++ b/drivers/base/core.c
> > > > > @@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
> > > > > */
> > > > >struct device *get_device(struct device *dev)
> > > > >{
> > > > > - return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
> > > > > + return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;
> > > > I really don't want to do this, the bus the device is on should prevent
> > > > this from happening.
> > > > 
> > > > Also, once that reference count drops to zero, the memory should be
> > > > freed, so you really have a stale pointer here, and this code would fail
> > > > if you had slab debugging enabled anyway.
> > > 
> > > Actually I don't want this either. But the design of scsi core will leave
> > > the scsi device on the host list after it is deleted, and it can  be
> > > found later and the refcount have a very big chance to increase from
> > > zero again. And after a lot of discussion it seems that the scsi layer
> > > is difficult to change the situation in the near future.
> > 
> > Keeping a 'struct device' reference counted chunk of memory on a list
> > that has a different lifetime rule from that device itself, is crazy.
> > 
> 
> The lifetime rule is the same. That device itself will delete from the
> host list in the destructor, scsi_device_dev_release_usercontext(), and
> the memory will be freed after that. That's why this issue came out.
> 
> > And yes, I remember how all of this came about, but I really don't have
> > the time to work on it myself...
> > 
> > > > So I don't even think this fixes the issue you think it fixes :)
> > > 
> > > This issue is very easy to reproduce on my machine and I have tested the
> > > patch and it really fixes the issue.
> > 
> > Even with slab debugging enabled?  If so, what is keeping that memory
> > from being freed once the reference count drops to 0?
> > 
> 
> As above, the memory is not freed yet when we increasing the refconunt from
> zero, so it's nothing to do with slab debugging enabled or not. If
> we want to free it, we have to grab host lock first to delete it from
> the list, so if we are grabing the host lock, we can increase the
> refcount safely from 0 to 1.

Wait, what?  Once that refcount goes to 0, the release callback will be
called, and the memory had better be freed.  Any pointer you might still
have to the structure is now invalid.  The fact that you are somehow
still keeping that pointer around is not ok, and slab debugging should
have caused the memory to be overwritten and garbage would result if you
tried to make this call again.

thanks,

greg k-h


Re: [PATCH] driver core: Make it safe to use get_device() if the reference count is zero

2017-12-14 Thread Jason Yan



On 2017/12/14 16:10, Greg KH wrote:

On Thu, Dec 14, 2017 at 03:56:46PM +0800, Jason Yan wrote:


On 2017/12/14 15:42, Greg KH wrote:

On Thu, Dec 14, 2017 at 11:39:36AM +0800, Jason Yan wrote:

Some driviers may have the chance to increase a reference count that
has dropped to zero when using get_device() because of their design.

Then those drivers are broken :)


We have met such a issue with scsi:
https://www.spinics.net/lists/linux-scsi/msg115295.html

The scsi core will keep the scsi device object in the host list after
it has been deleted and the iterator can still find it. All of the
places where need iterating have to check the state of the scsi device
and this makes a lot of code redundancy and complexity.

Provide a safe mechanism in get_device() by using
kobject_get_unless_zero().

Suggested-by: Bart Van Assche 
Signed-off-by: Jason Yan 
CC: Greg Kroah-Hartman 
CC: Bart Van Assche 
CC: Ewan D. Milne 
CC: James E.J. Bottomley 
CC: Christoph Hellwig 
---
   drivers/base/core.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 12ebd05..cc74810 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
*/
   struct device *get_device(struct device *dev)
   {
-   return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
+   return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;

I really don't want to do this, the bus the device is on should prevent
this from happening.

Also, once that reference count drops to zero, the memory should be
freed, so you really have a stale pointer here, and this code would fail
if you had slab debugging enabled anyway.


Actually I don't want this either. But the design of scsi core will leave
the scsi device on the host list after it is deleted, and it can  be
found later and the refcount have a very big chance to increase from
zero again. And after a lot of discussion it seems that the scsi layer
is difficult to change the situation in the near future.


Keeping a 'struct device' reference counted chunk of memory on a list
that has a different lifetime rule from that device itself, is crazy.



The lifetime rule is the same. That device itself will delete from the
host list in the destructor, scsi_device_dev_release_usercontext(), and
the memory will be freed after that. That's why this issue came out.


And yes, I remember how all of this came about, but I really don't have
the time to work on it myself...


So I don't even think this fixes the issue you think it fixes :)


This issue is very easy to reproduce on my machine and I have tested the
patch and it really fixes the issue.


Even with slab debugging enabled?  If so, what is keeping that memory
from being freed once the reference count drops to 0?



As above, the memory is not freed yet when we increasing the refconunt 
from zero, so it's nothing to do with slab debugging enabled or not. If

we want to free it, we have to grab host lock first to delete it from
the list, so if we are grabing the host lock, we can increase the
refcount safely from 0 to 1.


I think you are just papering over the real issue here, which is one
reason I really do not like the get_unless_zero() function at all.

thanks,

greg k-h

.





Re: [PATCH] driver core: Make it safe to use get_device() if the reference count is zero

2017-12-14 Thread Greg KH
On Thu, Dec 14, 2017 at 03:56:46PM +0800, Jason Yan wrote:
> 
> On 2017/12/14 15:42, Greg KH wrote:
> > On Thu, Dec 14, 2017 at 11:39:36AM +0800, Jason Yan wrote:
> > > Some driviers may have the chance to increase a reference count that
> > > has dropped to zero when using get_device() because of their design.
> > Then those drivers are broken :)
> > 
> > > We have met such a issue with scsi:
> > > https://www.spinics.net/lists/linux-scsi/msg115295.html
> > > 
> > > The scsi core will keep the scsi device object in the host list after
> > > it has been deleted and the iterator can still find it. All of the
> > > places where need iterating have to check the state of the scsi device
> > > and this makes a lot of code redundancy and complexity.
> > > 
> > > Provide a safe mechanism in get_device() by using
> > > kobject_get_unless_zero().
> > > 
> > > Suggested-by: Bart Van Assche 
> > > Signed-off-by: Jason Yan 
> > > CC: Greg Kroah-Hartman 
> > > CC: Bart Van Assche 
> > > CC: Ewan D. Milne 
> > > CC: James E.J. Bottomley 
> > > CC: Christoph Hellwig 
> > > ---
> > >   drivers/base/core.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 12ebd05..cc74810 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
> > >*/
> > >   struct device *get_device(struct device *dev)
> > >   {
> > > - return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
> > > + return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;
> > I really don't want to do this, the bus the device is on should prevent
> > this from happening.
> > 
> > Also, once that reference count drops to zero, the memory should be
> > freed, so you really have a stale pointer here, and this code would fail
> > if you had slab debugging enabled anyway.
> 
> Actually I don't want this either. But the design of scsi core will leave
> the scsi device on the host list after it is deleted, and it can  be
> found later and the refcount have a very big chance to increase from
> zero again. And after a lot of discussion it seems that the scsi layer
> is difficult to change the situation in the near future.

Keeping a 'struct device' reference counted chunk of memory on a list
that has a different lifetime rule from that device itself, is crazy.

And yes, I remember how all of this came about, but I really don't have
the time to work on it myself...

> > So I don't even think this fixes the issue you think it fixes :)
> 
> This issue is very easy to reproduce on my machine and I have tested the
> patch and it really fixes the issue.

Even with slab debugging enabled?  If so, what is keeping that memory
from being freed once the reference count drops to 0?

I think you are just papering over the real issue here, which is one
reason I really do not like the get_unless_zero() function at all.

thanks,

greg k-h


[bug report] scsi: lpfc: Correct driver deregistrations with host nvme transport

2017-12-14 Thread Dan Carpenter
Hello James Smart,

This is a semi-automatic email about new static checker warnings.

The patch add9d6be3d65: "scsi: lpfc: Correct driver deregistrations
with host nvme transport" from Nov 20, 2017, leads to the following
Smatch complaint:

drivers/scsi/lpfc/lpfc_nvme.c:969 lpfc_nvme_io_cmd_wqe_cmpl()
error: we previously assumed 'ndlp' could be null (see line 938)

drivers/scsi/lpfc/lpfc_nvme.c
   937  
   938  if (ndlp && NLP_CHK_NODE_ACT(ndlp))

Existing code assumes ndlp can be NULL.

   939  atomic_dec(&ndlp->cmd_pending);
   940  
   941  /* Update stats and complete the IO.  There is
   942   * no need for dma unprep because the nvme_transport
   943   * owns the dma address.
   944   */
   945  #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
   946  if (lpfc_ncmd->ts_cmd_start) {
   947  lpfc_ncmd->ts_isr_cmpl = pwqeIn->isr_timestamp;
   948  lpfc_ncmd->ts_data_nvme = ktime_get_ns();
   949  phba->ktime_last_cmd = lpfc_ncmd->ts_data_nvme;
   950  lpfc_nvme_ktime(phba, lpfc_ncmd);
   951  }
   952  if (phba->cpucheck_on & LPFC_CHECK_NVME_IO) {
   953  if (lpfc_ncmd->cpu != smp_processor_id())
   954  lpfc_printf_vlog(vport, KERN_ERR, 
LOG_NVME_IOERR,
   955   "6701 CPU Check cmpl: "
   956   "cpu %d expect %d\n",
   957   smp_processor_id(), 
lpfc_ncmd->cpu);
   958  if (lpfc_ncmd->cpu < LPFC_CHECK_CPU_CNT)
   959  phba->cpucheck_cmpl_io[lpfc_ncmd->cpu]++;
   960  }
   961  #endif
   962  freqpriv = nCmd->private;
   963  freqpriv->nvme_buf = NULL;
   964  
   965  /* NVME targets need completion held off until the abort 
exchange
   966   * completes unless the NVME Rport is getting unregistered.
   967   */
   968  if (!(lpfc_ncmd->flags & LPFC_SBUF_XBUSY) ||
   969  ndlp->upcall_flags & NLP_WAIT_FOR_UNREG) {
^
The patch adds an unchecked dereference.

   970  /* Clear the XBUSY flag to prevent double completions.
   971   * The nvme rport is getting unregistered and there is

regards,
dan carpenter


Re: [PATCH] scsi: check for device state in __scsi_remove_target()

2017-12-14 Thread Jason Yan


On 2017/12/14 6:23, Bart Van Assche wrote:

On Wed, 2017-12-13 at 14:21 +0100, Hannes Reinecke wrote:

As it turned out device_get() doesn't use kref_get_unless_zero(),
so we will be always getting a device pointer.
So we need to check for the device state in __scsi_remove_target()
to avoid tripping over deleted objects.

Fixes: fbce4d9 ("scsi: fixup kernel warning during rmmod()")


How about adding Reported-by: Jason Yan? See also
https://www.spinics.net/lists/linux-scsi/msg115295.html

Anyway:

Reviewed-by: Bart Van Assche 



Seems the same as my patch.So how do we plan to fix this issue,
pick this approach up or the approach James Bottomley suggested?
I have sent a patch to change get_device() but Greg seems do not
like this way.