Re: [PATCH] IB/srp: Fix a memory leak

2015-11-20 Thread Sagi Grimberg



On 19/11/2015 01:00, Bart Van Assche wrote:

If srp_connect_ch() returns a positive value then that is considered
by its caller as a connection failure but this does not result in a
scsi_host_put() call and additionally causes the srp_create_target()
function to return a positive value while it should return a negative
value. Avoid all this confusion and additionally fix a memory leak by
ensuring that srp_connect_ch() always returns a value that is <= 0.
This patch avoids that a rejected login triggers the following memory
leak:

unreferenced object 0x88021b24a220 (size 8):
   comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
   hex dump (first 8 bytes):
 68 6f 73 74 35 38 00 a5  host58..
   backtrace:
 [] kmemleak_alloc+0x7a/0xc0
 [] __kmalloc_track_caller+0xfe/0x160
 [] kvasprintf+0x5b/0x90
 [] kvasprintf_const+0x8d/0xb0
 [] kobject_set_name_vargs+0x3c/0xa0
 [] dev_set_name+0x3c/0x40
 [] scsi_host_alloc+0x327/0x4b0
 [] srp_create_target+0x4e/0x8a0 [ib_srp]
 [] dev_attr_store+0x1b/0x20
 [] sysfs_kf_write+0x4a/0x60
 [] kernfs_fop_write+0x14e/0x180
 [] __vfs_write+0x2f/0xf0
 [] vfs_write+0xa4/0x100
 [] SyS_write+0x54/0xc0
 [] entry_SYSCALL_64_fastpath+0x12/0x6f

Signed-off-by: Bart Van Assche 
Cc: Sagi Grimberg 
Cc: Sebastian Parschauer 
Cc: Christoph Hellwig 
Cc: stable 


Looks good,

Reviewed-by: Sagi Grimberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Finn Thain

Hi Ondrej,

On Fri, 20 Nov 2015, Ondrej Zary wrote:

> On Friday 20 November 2015 02:41:19 Finn Thain wrote:
> > 
> > 
> > My tests involved 3 different scsi targets (two disks and a CD-ROM) 
> > but none of these send a SDTR. Your log says the driver correctly 
> > rejected the SDTR message but that doesn't mean the target actually 
> > went to MSG IN phase and got the message. Do you have any older 
> > targets you can test?
> 
> Another disk, without patches:
> 
> [   84.481582] pnp 01:01.00: activated
> [   84.489650] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
> n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, 
> this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, USLEEP_POLL 3, USLEEP_WAITLONG 
> 1250, options { AUTOPROBE_IRQ PSEUDO_DMA NCR53C400 }
> [   84.953332] scsi 2:0:1:0: Direct-Access QUANTUM  LP240S GM240S01X 4.6  
> PQ: 0 ANSI: 2 CCS
> [   86.786475] sd 2:0:1:0: Attached scsi generic sg1 type 0
> [   86.793753] sd 2:0:1:0: [sdb] 479350 512-byte logical blocks: (245 MB/234 
> MiB)
> [   86.998555] sd 2:0:1:0: [sdb] Write Protect is off
> [   87.406068] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
> doesn't support DPO or FUA
> [  118.888271] sd 2:0:1:0: [sdb] aborting command
> [  118.888738] sd 2:0:1:0: [sdb] aborting command
> 
> With patches:
> 
> [  258.473748] pnp 01:01.00: activated
> [  258.483592] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
> n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, 
> this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, options { AUTOPROBE_IRQ 
> PSEUDO_DMA }
> [  261.347632] scsi 2:0:1:0: Direct-Access QUANTUM  LP240S GM240S01X 4.6  
> PQ: 0 ANSI: 2 CCS
> [  275.560451] sd 2:0:1:0: Attached scsi generic sg1 type 0
> [  275.632519] sd 2:0:1:0: [sdb] 479350 512-byte logical blocks: (245 MB/234 
> MiB)
> [  275.635533] sd 2:0:1:0: [sdb] Write Protect is off
> [  275.642315] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
> doesn't support DPO or FUA
> [  469.076347] sd 2:0:1:0: [sdb] FAILED Result: hostbyte=DID_TIME_OUT 
> driverbyte=DRIVER_SENSE
> [  469.076613] sd 2:0:1:0: [sdb] Sense Key : Aborted Command [current]
> [  469.076851] sd 2:0:1:0: [sdb] Add. Sense: No additional sense information
> [  469.077086] sd 2:0:1:0: [sdb] CDB: Read(10) 28 00 00 00 00 02 00 00 02 00
> [  469.077306] blk_update_request: I/O error, dev sdb, sector 2
> [  469.077522] Buffer I/O error on dev sdb, logical block 1, async page read
> [  480.108255] INFO: task kworker/u2:2:60 blocked for more than 120 seconds.
> [  480.109773]   Not tainted 4.3.0-rc1+ #74
> [  480.109973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  480.110179] kworker/u2:2D 0040 060  2 0x
> [  480.110671] Workqueue: events_unbound async_run_entry_fn
> [  480.110999]  cf9e8780 0046 2eff25f7 0040 c117f111 2ee82733 
> 0040 0016fec4
> [  480.112390]   cfaa6000  7fff c139c7d2 c139c504 
> 7fff c139d9d3
> [  480.113661]  0040 cfaa5cfc c106f460 00161108  c648 
> 2106dcce 0040
> [  480.114893] Call Trace:
> [  480.115124]  [] ? blk_queue_bio+0x1e8/0x1fb
> [  480.115344]  [] ? bit_wait_io_timeout+0x3d/0x3d
> [  480.115564]  [] ? schedule+0x5b/0x67
> [  480.115794]  [] ? schedule_timeout+0x13/0xc5
> [  480.116007]  [] ? timekeeping_get_ns+0x10/0x69
> [  480.116406]  [] ? bit_wait_io_timeout+0x3d/0x3d
> [  480.116636]  [] ? ktime_get+0x38/0x48
> [  480.116843]  [] ? io_schedule_timeout+0x83/0xd7
> [  480.117062]  [] ? bit_wait_io+0x21/0x26
> [  480.117256]  [] ? __wait_on_bit+0x2f/0x5a
> [  480.117486]  [] ? blkdev_readpages+0x15/0x15
> [  480.117704]  [] ? wait_on_page_bit+0x57/0x5e
> [  480.117942]  [] ? wake_atomic_t_function+0x2a/0x2a
> [  480.118151]  [] ? wait_on_page_read+0xf/0x2a
> [  480.118373]  [] ? do_read_cache_page+0x8e/0x116
> [  480.118587]  [] ? blkdev_readpages+0x15/0x15
> [  480.118809]  [] ? read_cache_page+0x14/0x18
> [  480.119008]  [] ? read_dev_sector+0x25/0x57
> [  480.119222]  [] ? adfspart_check_ICS+0x30/0x1ac
> [  480.119438]  [] ? vsnprintf+0x78/0x25d
> [  480.119671]  [] ? snprintf+0x16/0x18
> [  480.119874]  [] ? check_partition+0xd7/0x165
> [  480.120253]  [] ? rescan_partitions+0x95/0x283
> [  480.120443]  [] ? scsi_block_when_processing_errors+0x13/0xae
> [  480.120693]  [] ? mutex_lock+0x9/0x21
> [  480.120915]  [] ? __blkdev_get+0x155/0x2f6
> [  480.121133]  [] ? blkdev_get+0x148/0x258
> [  480.121350]  [] ? unlock_new_inode+0x36/0x3c
> [  480.121570]  [] ? bdget+0xdc/0xe6
> [  480.121761]  [] ? add_disk+0x221/0x368
> [  480.121996]  [] ? sd_probe_async+0xed/0x157
> [  480.122214]  [] ? async_run_entry_fn+0x2c/0xad
> [  480.122437]  [] ? process_one_work+0x130/0x21f
> [  480.122639]  [] ? worker_thread+0x18a/0x247
> [  480.122854]  [] ? process_scheduled_works+0x1d/0x1d
> [  480.123069]  [] ? kthread+0x7c/0x81
> [  480.123288]  [] ? ret_from_kernel_th

[PATCH 0/3] Badblock tracking for gendisks

2015-11-20 Thread Vishal Verma
Patch 1 copies badblock management code into a header of its own,
making it generally available. It follows common libraries of code
such as linked lists, where anyone may embed a core data structure
in another place, and use the provided accessor functions to
manipulate the data.

Patch 2 adds badblock tracking to gendisks (in preparation for use
by NVDIMM devices). Right now, it is turned on unconditionally - I'd
appreciate comments on if that is the right path.

Patch 3 converts md over to use the new badblocks 'library'. I have
done some pretty simple testing on this - created a raid 1 device,
made sure the sysfs entries show up, and can be used to add and view
badblocks. A closer look by the md folks would be nice here.

Vishal Verma (3):
  badblocks: Add core badblock management code
  block: Add badblock management for gendisks
  md: convert to use the generic badblocks code

 block/genhd.c |  64 ++
 drivers/md/md.c   | 495 ++--
 drivers/md/md.h   |  31 +--
 include/linux/badblocks.h | 512 ++
 include/linux/genhd.h |   6 +
 5 files changed, 603 insertions(+), 505 deletions(-)
 create mode 100644 include/linux/badblocks.h

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] md: convert to use the generic badblocks code

2015-11-20 Thread Vishal Verma
Retain badblocks as part of rdev, but use the accessor functions from
include/linux/badblocks for all manipulation.

Signed-off-by: Vishal Verma 
---
 drivers/md/md.c | 495 +++-
 drivers/md/md.h |  31 +---
 2 files changed, 21 insertions(+), 505 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c702de1..82994d7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -34,6 +34,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1358,8 +1359,6 @@ static __le32 calc_sb_1_csum(struct mdp_superblock_1 *sb)
return cpu_to_le32(csum);
 }
 
-static int md_set_badblocks(struct badblocks *bb, sector_t s, int sectors,
-   int acknowledged);
 static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int 
minor_version)
 {
struct mdp_superblock_1 *sb;
@@ -1484,7 +1483,7 @@ static int super_1_load(struct md_rdev *rdev, struct 
md_rdev *refdev, int minor_
count <<= sb->bblog_shift;
if (bb + 1 == 0)
break;
-   if (md_set_badblocks(&rdev->badblocks,
+   if (badblocks_set(&rdev->badblocks,
 sector, count, 1) == 0)
return -EINVAL;
}
@@ -2226,7 +2225,7 @@ repeat:
rdev_for_each(rdev, mddev) {
if (rdev->badblocks.changed) {
rdev->badblocks.changed = 0;
-   md_ack_all_badblocks(&rdev->badblocks);
+   ack_all_badblocks(&rdev->badblocks);
md_error(mddev, rdev);
}
clear_bit(Blocked, &rdev->flags);
@@ -2352,7 +2351,7 @@ repeat:
clear_bit(Blocked, &rdev->flags);
 
if (any_badblocks_changed)
-   md_ack_all_badblocks(&rdev->badblocks);
+   ack_all_badblocks(&rdev->badblocks);
clear_bit(BlockedBadBlocks, &rdev->flags);
wake_up(&rdev->blocked_wait);
}
@@ -2944,11 +2943,17 @@ static ssize_t recovery_start_store(struct md_rdev 
*rdev, const char *buf, size_
 static struct rdev_sysfs_entry rdev_recovery_start =
 __ATTR(recovery_start, S_IRUGO|S_IWUSR, recovery_start_show, 
recovery_start_store);
 
-static ssize_t
-badblocks_show(struct badblocks *bb, char *page, int unack);
-static ssize_t
-badblocks_store(struct badblocks *bb, const char *page, size_t len, int unack);
-
+/* sysfs access to bad-blocks list.
+ * We present two files.
+ * 'bad-blocks' lists sector numbers and lengths of ranges that
+ *are recorded as bad.  The list is truncated to fit within
+ *the one-page limit of sysfs.
+ *Writing "sector length" to this file adds an acknowledged
+ *bad block list.
+ * 'unacknowledged-bad-blocks' lists bad blocks that have not yet
+ *been acknowledged.  Writing to this file adds bad blocks
+ *without acknowledging them.  This is largely for testing.
+ */
 static ssize_t bb_show(struct md_rdev *rdev, char *page)
 {
return badblocks_show(&rdev->badblocks, page, 0);
@@ -8348,253 +8353,7 @@ void md_finish_reshape(struct mddev *mddev)
 }
 EXPORT_SYMBOL(md_finish_reshape);
 
-/* Bad block management.
- * We can record which blocks on each device are 'bad' and so just
- * fail those blocks, or that stripe, rather than the whole device.
- * Entries in the bad-block table are 64bits wide.  This comprises:
- * Length of bad-range, in sectors: 0-511 for lengths 1-512
- * Start of bad-range, sector offset, 54 bits (allows 8 exbibytes)
- *  A 'shift' can be set so that larger blocks are tracked and
- *  consequently larger devices can be covered.
- * 'Acknowledged' flag - 1 bit. - the most significant bit.
- *
- * Locking of the bad-block table uses a seqlock so md_is_badblock
- * might need to retry if it is very unlucky.
- * We will sometimes want to check for bad blocks in a bi_end_io function,
- * so we use the write_seqlock_irq variant.
- *
- * When looking for a bad block we specify a range and want to
- * know if any block in the range is bad.  So we binary-search
- * to the last range that starts at-or-before the given endpoint,
- * (or "before the sector after the target range")
- * then see if it ends after the given start.
- * We return
- *  0 if there are no known bad blocks in the range
- *  1 if there are known bad block which are all acknowledged
- * -1 if there are bad blocks which have not yet been acknowledged in metadata.
- * plus the start/length of the first bad section we overlap.
- */
-int md_is_badblock(struct badblocks *bb, sector_t s, int sectors,
-  sector_t *first_bad, int *bad_sectors)
-{
-   int hi;
-   int lo;
-   u64 *p = bb->pag

[PATCH 1/3] badblocks: Add core badblock management code

2015-11-20 Thread Vishal Verma
Take the core badblocks implementation from md, and make it generally
available. This follows the same style as kernel implementations of
linked lists, rb-trees etc, where you can have a structure that can be
embedded anywhere, and accessor functions to manipulate the data.

The only changes in this copy of the code are ones to generalize
function/variable names from md-specific ones. Also add init and free
functions.

Signed-off-by: Vishal Verma 
---
 include/linux/badblocks.h | 512 ++
 1 file changed, 512 insertions(+)
 create mode 100644 include/linux/badblocks.h

diff --git a/include/linux/badblocks.h b/include/linux/badblocks.h
new file mode 100644
index 000..94fa348
--- /dev/null
+++ b/include/linux/badblocks.h
@@ -0,0 +1,512 @@
+#ifndef _LINUX_BADBLOCKS_H
+#define _LINUX_BADBLOCKS_H
+
+#include 
+#include 
+#include 
+
+#define BB_LEN_MASK(0x01FFULL)
+#define BB_OFFSET_MASK (0x7E00ULL)
+#define BB_ACK_MASK(0x8000ULL)
+#define BB_MAX_LEN 512
+#define BB_OFFSET(x)   (((x) & BB_OFFSET_MASK) >> 9)
+#define BB_LEN(x)  (((x) & BB_LEN_MASK) + 1)
+#define BB_ACK(x)  (!!((x) & BB_ACK_MASK))
+#define BB_MAKE(a, l, ack) (((a)<<9) | ((l)-1) | ((u64)(!!(ack)) << 63))
+
+/* Bad block numbers are stored sorted in a single page.
+ * 64bits is used for each block or extent.
+ * 54 bits are sector number, 9 bits are extent size,
+ * 1 bit is an 'acknowledged' flag.
+ */
+#define MAX_BADBLOCKS  (PAGE_SIZE/8)
+
+struct badblocks {
+   int count;  /* count of bad blocks */
+   int unacked_exist;  /* there probably are unacknowledged
+* bad blocks.  This is only cleared
+* when a read discovers none
+*/
+   int shift;  /* shift from sectors to block size
+* a -ve shift means badblocks are
+* disabled.*/
+   u64 *page;  /* badblock list */
+   int changed;
+   seqlock_t lock;
+   sector_t sector;
+   sector_t size;  /* in sectors */
+};
+
+/* Bad block management.
+ * We can record which blocks on each device are 'bad' and so just
+ * fail those blocks, or that stripe, rather than the whole device.
+ * Entries in the bad-block table are 64bits wide.  This comprises:
+ * Length of bad-range, in sectors: 0-511 for lengths 1-512
+ * Start of bad-range, sector offset, 54 bits (allows 8 exbibytes)
+ *  A 'shift' can be set so that larger blocks are tracked and
+ *  consequently larger devices can be covered.
+ * 'Acknowledged' flag - 1 bit. - the most significant bit.
+ *
+ * Locking of the bad-block table uses a seqlock so badblocks_check
+ * might need to retry if it is very unlucky.
+ * We will sometimes want to check for bad blocks in a bi_end_io function,
+ * so we use the write_seqlock_irq variant.
+ *
+ * When looking for a bad block we specify a range and want to
+ * know if any block in the range is bad.  So we binary-search
+ * to the last range that starts at-or-before the given endpoint,
+ * (or "before the sector after the target range")
+ * then see if it ends after the given start.
+ * We return
+ *  0 if there are no known bad blocks in the range
+ *  1 if there are known bad block which are all acknowledged
+ * -1 if there are bad blocks which have not yet been acknowledged in metadata.
+ * plus the start/length of the first bad section we overlap.
+ */
+static inline int badblocks_check(struct badblocks *bb, sector_t s, int 
sectors,
+  sector_t *first_bad, int *bad_sectors)
+{
+   int hi;
+   int lo;
+   u64 *p = bb->page;
+   int rv;
+   sector_t target = s + sectors;
+   unsigned seq;
+
+   if (bb->shift > 0) {
+   /* round the start down, and the end up */
+   s >>= bb->shift;
+   target += (1>= bb->shift;
+   sectors = target - s;
+   }
+   /* 'target' is now the first block after the bad range */
+
+retry:
+   seq = read_seqbegin(&bb->lock);
+   lo = 0;
+   rv = 0;
+   hi = bb->count;
+
+   /* Binary search between lo and hi for 'target'
+* i.e. for the last range that starts before 'target'
+*/
+   /* INVARIANT: ranges before 'lo' and at-or-after 'hi'
+* are known not to be the last range before target.
+* VARIANT: hi-lo is the number of possible
+* ranges, and decreases until it reaches 1
+*/
+   while (hi - lo > 1) {
+   int mid = (lo + hi) / 2;
+   sector_t a = BB_OFFSET(p[mid]);
+   if (a < target)
+   /* This could still be the one, earlier ranges
+* could not. */
+   lo = mid;
+   else
+   /* This and later ranges are de

[PATCH 2/3] block: Add badblock management for gendisks

2015-11-20 Thread Vishal Verma
NVDIMM devices, which can behave more like DRAM rather than block
devices, may develop bad cache lines, or 'poison'. A block device
exposed by the pmem driver can then consume poison via a read (or
write), and cause a machine check. On platforms without machine
check recovery features, this would mean a crash.

The block device maintaining a runtime list of all known sectors that
have poison can directly avoid this, and also provide a path forward
to enable proper handling/recovery for DAX faults on such a device.

Use the new badblock management interfaces to add a badblocks list to
gendisks.

Signed-off-by: Vishal Verma 
---
 block/genhd.c | 64 +++
 include/linux/genhd.h |  6 +
 2 files changed, 70 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 0c706f3..4209c32 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -505,6 +506,20 @@ static int exact_lock(dev_t devt, void *data)
return 0;
 }
 
+static void disk_alloc_badblocks(struct gendisk *disk)
+{
+   disk->bb = kzalloc(sizeof(disk->bb), GFP_KERNEL);
+   if (!disk->bb) {
+   pr_warn("%s: failed to allocate space for badblocks\n",
+   disk->disk_name);
+   return;
+   }
+
+   if (badblocks_init(disk->bb, 1))
+   pr_warn("%s: failed to initialize badblocks\n",
+   disk->disk_name);
+}
+
 static void register_disk(struct gendisk *disk)
 {
struct device *ddev = disk_to_dev(disk);
@@ -609,6 +624,7 @@ void add_disk(struct gendisk *disk)
disk->first_minor = MINOR(devt);
 
disk_alloc_events(disk);
+   disk_alloc_badblocks(disk);
 
/* Register BDI before referencing it from bdev */
bdi = &disk->queue->backing_dev_info;
@@ -657,6 +673,9 @@ void del_gendisk(struct gendisk *disk)
blk_unregister_queue(disk);
blk_unregister_region(disk_devt(disk), disk->minors);
 
+   badblocks_free(disk->bb);
+   kfree(disk->bb);
+
part_stat_set_all(&disk->part0, 0);
disk->part0.stamp = 0;
 
@@ -670,6 +689,48 @@ void del_gendisk(struct gendisk *disk)
 }
 EXPORT_SYMBOL(del_gendisk);
 
+/*
+ * The gendisk usage of badblocks does not track acknowledgements for
+ * badblocks. We always assume they are acknowledged.
+ */
+int disk_check_badblocks(struct gendisk *disk, sector_t s, int sectors,
+  sector_t *first_bad, int *bad_sectors)
+{
+   return badblocks_check(disk->bb, s, sectors, first_bad, bad_sectors);
+}
+EXPORT_SYMBOL(disk_check_badblocks);
+
+int disk_set_badblocks(struct gendisk *disk, sector_t s, int sectors)
+{
+   return badblocks_set(disk->bb, s, sectors, 1);
+}
+EXPORT_SYMBOL(disk_set_badblocks);
+
+int disk_clear_badblocks(struct gendisk *disk, sector_t s, int sectors)
+{
+   return badblocks_clear(disk->bb, s, sectors);
+}
+EXPORT_SYMBOL(disk_clear_badblocks);
+
+/* sysfs access to bad-blocks list. */
+static ssize_t disk_badblocks_show(struct device *dev,
+   struct device_attribute *attr,
+   char *page)
+{
+   struct gendisk *disk = dev_to_disk(dev);
+
+   return badblocks_show(disk->bb, page, 0);
+}
+
+static ssize_t disk_badblocks_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *page, size_t len)
+{
+   struct gendisk *disk = dev_to_disk(dev);
+
+   return badblocks_store(disk->bb, page, len, 0);
+}
+
 /**
  * get_gendisk - get partitioning information for a given device
  * @devt: device to get partitioning information for
@@ -988,6 +1049,8 @@ static DEVICE_ATTR(discard_alignment, S_IRUGO, 
disk_discard_alignment_show,
 static DEVICE_ATTR(capability, S_IRUGO, disk_capability_show, NULL);
 static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
 static DEVICE_ATTR(inflight, S_IRUGO, part_inflight_show, NULL);
+static DEVICE_ATTR(badblocks, S_IRUGO | S_IWUSR, disk_badblocks_show,
+   disk_badblocks_store);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 static struct device_attribute dev_attr_fail =
__ATTR(make-it-fail, S_IRUGO|S_IWUSR, part_fail_show, part_fail_store);
@@ -1009,6 +1072,7 @@ static struct attribute *disk_attrs[] = {
&dev_attr_capability.attr,
&dev_attr_stat.attr,
&dev_attr_inflight.attr,
+   &dev_attr_badblocks.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
&dev_attr_fail.attr,
 #endif
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 2adbfa6..5563bde 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -162,6 +162,7 @@ struct disk_part_tbl {
 };
 
 struct disk_events;
+struct badblocks;
 
 struct gendisk {
/* major, first_minor and minors are input parameters only,
@@ -201,6 +202,7 @@ struct gendisk {
struct blk_integrity *

Re: [PATCH 00/18] ALUA device handler update, part 1

2015-11-20 Thread Bart Van Assche

On 11/20/2015 02:52 AM, Hannes Reinecke wrote:

One thing, though: I don't really agree with Barts objection that
moving to a workqueue would tie in too many resources.
Thing is, I'm not convinces that using a work queue is allocating
too many resources (we're speaking of 460 vs 240 bytes here).
Also we have to retry commands for quite some time (cite the
infamous NetApp takeover/giveback, which can take minutes).
If we were to handle that without workqueue we'd have to initiate
the retry from the end_io callback, causing a quite deep stack
recursion. Which I'm not really fond of.


Hello Hannes,

Sorry if I wasn't clear enough in my previous e-mail about this topic 
but I'm more concerned about the additional memory needed for thread 
stacks and thread control data structures than about the additional 
memory needed for the workqueue. I'd like to see the ALUA device handler 
implementation scale to thousands of LUNs and target port groups. In 
case all connections between an initiator and a target port group fail, 
with a synchronous implementation of STPG we will either need a large 
number of threads (in case of one thread per STPG command) or the STPG 
commands will be serialized (if there are fewer threads than portal 
groups). Neither alternative looks attractive to me.


BTW, not all storage arrays need STPG retries. Some arrays are able to 
process an STPG command quickly (this means within a few seconds).


A previous discussion about this topic is available e.g. at 
http://thread.gmane.org/gmane.linux.scsi/105340/focus=105601.


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix a bdi reregistration race, v2

2015-11-20 Thread Bart Van Assche

On 11/20/2015 02:44 PM, Aaro Koskinen wrote:

I think you should squash the revert of v1 into this patch, and then
document the crash the original patch caused and how this new patch is
fixing that.


Hello Aaro,

I'd like to know the opinion of the SCSI maintainers about this. It's 
not impossible that they would prefer to submit the revert to Linus 
quickly and only send the reworked fix to Linus during the v4.5 merge 
window.


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix a bdi reregistration race, v2

2015-11-20 Thread Aaro Koskinen
Hi,

I think you should squash the revert of v1 into this patch, and then
document the crash the original patch caused and how this new patch is
fixing that.

A.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix a bdi reregistration race, v2

2015-11-20 Thread Bart Van Assche
Unregister and reregister BDI devices in the proper order. This patch
avoids that the following kernel warning can be triggered during
SCSI device reregistration:

WARNING: CPU: 7 PID: 203 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x68/0x80()
sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:32'
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[] dump_stack+0x4c/0x65
[] warn_slowpath_common+0x8a/0xc0
[] warn_slowpath_fmt+0x46/0x50
[] sysfs_warn_dup+0x68/0x80
[] sysfs_create_dir_ns+0x7e/0x90
[] kobject_add_internal+0xa8/0x320
[] kobject_add+0x60/0xb0
[] device_add+0x107/0x5e0
[] device_create_groups_vargs+0xd8/0x100
[] device_create_vargs+0x1c/0x20
[] bdi_register+0x63/0x2a0
[] bdi_register_dev+0x27/0x30
[] add_disk+0x1a9/0x4e0
[] sd_probe_async+0x119/0x1d0 [sd_mod]
[] async_run_entry_fn+0x4a/0x140
[] process_one_work+0x1d8/0x7c0
[] worker_thread+0x114/0x460
[] kthread+0xf8/0x110
[] ret_from_fork+0x3f/0x70

Signed-off-by: Bart Van Assche 
Cc: Jens Axboe 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Hannes Reinecke 
Cc: Aaro Koskinen 
Cc: 
---
 drivers/scsi/scsi_sysfs.c|  2 ++
 include/linux/backing-dev-defs.h |  1 +
 include/linux/backing-dev.h  |  1 +
 mm/backing-dev.c | 28 ++--
 4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index f5ace2b..8d64518 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1110,6 +,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
device_unregister(&sdev->sdev_dev);
transport_remove_device(dev);
scsi_dh_remove_device(sdev);
+   bdi_sysfs_del(&sdev->request_queue->backing_dev_info);
device_del(dev);
} else
put_device(&sdev->sdev_dev);
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 1b4d69f..1a42ecb 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -135,6 +135,7 @@ struct bdi_writeback {
 
 struct backing_dev_info {
struct list_head bdi_list;
+   bool is_visible;
unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */
unsigned int capabilities; /* Device capabilities */
congested_fn *congested_fn; /* Function pointer if device is md/dm */
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index c82794f..9004d90 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -24,6 +24,7 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
+void bdi_sysfs_del(struct backing_dev_info *bdi);
 void bdi_unregister(struct backing_dev_info *bdi);
 
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8ed2ffd..b56971f 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -774,6 +774,7 @@ int bdi_init(struct backing_dev_info *bdi)
int ret;
 
bdi->dev = NULL;
+   bdi->is_visible = false;
 
bdi->min_ratio = 0;
bdi->max_ratio = 100;
@@ -806,6 +807,7 @@ int bdi_register(struct backing_dev_info *bdi, struct 
device *parent,
return PTR_ERR(dev);
 
bdi->dev = dev;
+   bdi->is_visible = true;
 
bdi_debug_register(bdi, dev_name(dev));
set_bit(WB_registered, &bdi->wb.state);
@@ -837,6 +839,28 @@ static void bdi_remove_from_list(struct backing_dev_info 
*bdi)
synchronize_rcu_expedited();
 }
 
+/**
+ * bdi_sysfs_del - remove a BDI device from sysfs
+ * @bdi: BDI device pointer.
+ *
+ * It is safe to call this function more than once.
+ */
+void bdi_sysfs_del(struct backing_dev_info *bdi)
+{
+   bool is_visible = false;
+
+   spin_lock_bh(&bdi_lock);
+   swap(bdi->is_visible, is_visible);
+   spin_unlock_bh(&bdi_lock);
+
+   if (!is_visible)
+   return;
+
+   bdi_debug_unregister(bdi);
+   device_del(bdi->dev);
+}
+EXPORT_SYMBOL(bdi_sysfs_del);
+
 void bdi_unregister(struct backing_dev_info *bdi)
 {
/* make sure nobody finds us on the bdi_list anymore */
@@ -845,8 +869,8 @@ void bdi_unregister(struct backing_dev_info *bdi)
cgwb_bdi_destroy(bdi);
 
if (bdi->dev) {
-   bdi_debug_unregister(bdi);
-   device_unregister(bdi->dev);
+   bdi_sysfs_del(bdi);
+   put_device(bdi->dev);
bdi->dev = NULL;
}
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Revert "scsi: Fix a bdi reregistration race"

2015-11-20 Thread Bart Van Assche
The SCSI sd driver probes SCSI devices asynchronously. The sd_remove()
function, called indirectly by device_del(), waits until asynchronous
probing has finished. Since the block layer queue must only be cleaned
up after probing has finished, device_del() has to be called before
blk_cleanup_queue(). Hence revert commit bf2cf3baa20b.

Signed-off-by: Bart Van Assche 
Cc: Hannes Reinecke 
Cc: Aaro Koskinen 
Cc: 
---
 drivers/scsi/scsi_sysfs.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8d23122..f5ace2b 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1110,7 +1110,9 @@ void __scsi_remove_device(struct scsi_device *sdev)
device_unregister(&sdev->sdev_dev);
transport_remove_device(dev);
scsi_dh_remove_device(sdev);
-   }
+   device_del(dev);
+   } else
+   put_device(&sdev->sdev_dev);
 
/*
 * Stop accepting new requests and wait until all queuecommand() and
@@ -1121,16 +1123,6 @@ void __scsi_remove_device(struct scsi_device *sdev)
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);
 
-   /*
-* Remove the device after blk_cleanup_queue() has been called such
-* a possible bdi_register() call with the same name occurs after
-* blk_cleanup_queue() has called bdi_destroy().
-*/
-   if (sdev->is_visible)
-   device_del(dev);
-   else
-   put_device(&sdev->sdev_dev);
-
if (sdev->host->hostt->slave_destroy)
sdev->host->hostt->slave_destroy(sdev);
transport_destroy_device(dev);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] scsi_transport_fc: Introduce scsi_host_{get,put}()

2015-11-20 Thread Bart Van Assche
Use scsi_host_{get,put}() instead of open-coding these functions.
Compile-tested only.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: James Smart 
Cc: stable 
---
 drivers/scsi/scsi_transport_fc.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 24eaaf6..8a88226 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2586,7 +2586,7 @@ fc_rport_final_delete(struct work_struct *work)
transport_remove_device(dev);
device_del(dev);
transport_destroy_device(dev);
-   put_device(&shost->shost_gendev);   /* for fc_host->rport list */
+   scsi_host_put(shost);   /* for fc_host->rport list */
put_device(dev);/* for self-reference */
 }
 
@@ -2650,7 +2650,7 @@ fc_rport_create(struct Scsi_Host *shost, int channel,
else
rport->scsi_target_id = -1;
list_add_tail(&rport->peers, &fc_host->rports);
-   get_device(&shost->shost_gendev);   /* for fc_host->rport list */
+   scsi_host_get(shost);   /* for fc_host->rport list */
 
spin_unlock_irqrestore(shost->host_lock, flags);
 
@@ -2685,7 +2685,7 @@ delete_rport:
transport_destroy_device(dev);
spin_lock_irqsave(shost->host_lock, flags);
list_del(&rport->peers);
-   put_device(&shost->shost_gendev);   /* for fc_host->rport list */
+   scsi_host_put(shost);   /* for fc_host->rport list */
spin_unlock_irqrestore(shost->host_lock, flags);
put_device(dev->parent);
kfree(rport);
@@ -3383,7 +3383,7 @@ fc_vport_setup(struct Scsi_Host *shost, int channel, 
struct device *pdev,
fc_host->npiv_vports_inuse++;
vport->number = fc_host->next_vport_number++;
list_add_tail(&vport->peers, &fc_host->vports);
-   get_device(&shost->shost_gendev);   /* for fc_host->vport list */
+   scsi_host_get(shost);   /* for fc_host->vport list */
 
spin_unlock_irqrestore(shost->host_lock, flags);
 
@@ -3441,7 +3441,7 @@ delete_vport:
transport_destroy_device(dev);
spin_lock_irqsave(shost->host_lock, flags);
list_del(&vport->peers);
-   put_device(&shost->shost_gendev);   /* for fc_host->vport list */
+   scsi_host_put(shost);   /* for fc_host->vport list */
fc_host->npiv_vports_inuse--;
spin_unlock_irqrestore(shost->host_lock, flags);
put_device(dev->parent);
@@ -3504,7 +3504,7 @@ fc_vport_terminate(struct fc_vport *vport)
vport->flags |= FC_VPORT_DELETED;
list_del(&vport->peers);
fc_host->npiv_vports_inuse--;
-   put_device(&shost->shost_gendev);  /* for fc_host->vport list */
+   scsi_host_put(shost);   /* for fc_host->vport list */
}
spin_unlock_irqrestore(shost->host_lock, flags);
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Fix a memory leak in scsi_host_dev_release()

2015-11-20 Thread Bart Van Assche
Fix a memory leak that occurs if a SCSI LLD calls scsi_host_alloc()
and scsi_host_put() but neither scsi_host_add() nor scsi_host_remove().
This leak is fixed by ensuring that put_device(&shost->shost_dev) is
always called. This patch also removes the get_device() call from
scsi_add_host*() and the put_device() call from scsi_remove_host()
since these calls are no longer needed.

The following shell command triggers the scenario described above:

for ((i=0; i<2; i++)); do
  srp_daemon -oac |
  while read line; do
echo $line >/sys/class/infiniband_srp/srp-mlx4_0-1/add_target
  done
done

The kmemleak report is as follows:

unreferenced object 0x88021b24a220 (size 8):
  comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
  hex dump (first 8 bytes):
68 6f 73 74 35 38 00 a5  host58..
  backtrace:
[] kmemleak_alloc+0x7a/0xc0
[] __kmalloc_track_caller+0xfe/0x160
[] kvasprintf+0x5b/0x90
[] kvasprintf_const+0x8d/0xb0
[] kobject_set_name_vargs+0x3c/0xa0
[] dev_set_name+0x3c/0x40
[] scsi_host_alloc+0x327/0x4b0
[] srp_create_target+0x4e/0x8a0 [ib_srp]
[] dev_attr_store+0x1b/0x20
[] sysfs_kf_write+0x4a/0x60
[] kernfs_fop_write+0x14e/0x180
[] __vfs_write+0x2f/0xf0
[] vfs_write+0xa4/0x100
[] SyS_write+0x54/0xc0
[] entry_SYSCALL_64_fastpath+0x12/0x6f

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: stable 
---
 drivers/scsi/hosts.c | 57 +---
 include/scsi/scsi_host.h |  5 +
 2 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 323982f..b9fa1f9 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -45,16 +45,6 @@
 static atomic_t scsi_host_next_hn = ATOMIC_INIT(0);/* host_no for next new 
host */
 
 
-static void scsi_host_cls_release(struct device *dev)
-{
-   put_device(&class_to_shost(dev)->shost_gendev);
-}
-
-static struct class shost_class = {
-   .name   = "scsi_host",
-   .dev_release= scsi_host_cls_release,
-};
-
 /**
  * scsi_host_set_state - Take the given host through the host state model.
  * @shost: scsi host to change the state of.
@@ -180,7 +170,7 @@ void scsi_remove_host(struct Scsi_Host *shost)
spin_unlock_irqrestore(shost->host_lock, flags);
 
transport_unregister_device(&shost->shost_gendev);
-   device_unregister(&shost->shost_dev);
+   device_del(&shost->shost_dev);
device_del(&shost->shost_gendev);
 }
 EXPORT_SYMBOL(scsi_remove_host);
@@ -263,8 +253,6 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
if (error)
goto out_del_gendev;
 
-   get_device(&shost->shost_gendev);
-
if (shost->transportt->host_size) {
shost->shost_data = kzalloc(shost->transportt->host_size,
 GFP_KERNEL);
@@ -311,10 +299,10 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, 
struct device *dev,
 }
 EXPORT_SYMBOL(scsi_add_host_with_dma);
 
-static void scsi_host_dev_release(struct device *dev)
+static void scsi_host_free(struct kref *kref)
 {
-   struct Scsi_Host *shost = dev_to_shost(dev);
-   struct device *parent = dev->parent;
+   struct Scsi_Host *shost = container_of(kref, typeof(*shost), kref2);
+   struct device *parent = shost->shost_gendev.parent;
struct request_queue *q;
void *queuedata;
 
@@ -349,6 +337,35 @@ static void scsi_host_dev_release(struct device *dev)
kfree(shost);
 }
 
+/* Called if shost_gendev refcnt drops to zero. */
+static void scsi_host_dev_release(struct device *dev)
+{
+   struct Scsi_Host *shost = dev_to_shost(dev);
+
+   kref_put(&shost->kref2, scsi_host_free);
+}
+
+/* Called if shost_dev refcnt drops to zero. */
+static void scsi_host_cls_release(struct device *dev)
+{
+   struct Scsi_Host *shost = class_to_shost(dev);
+
+   kref_put(&shost->kref2, scsi_host_free);
+}
+
+static struct class shost_class = {
+   .name   = "scsi_host",
+   .dev_release= scsi_host_cls_release,
+};
+
+static void scsi_host_release(struct kref *kref)
+{
+   struct Scsi_Host *shost = container_of(kref, typeof(*shost), kref1);
+
+   put_device(&shost->shost_gendev);
+   put_device(&shost->shost_dev);
+}
+
 static int shost_eh_deadline = -1;
 
 module_param_named(eh_deadline, shost_eh_deadline, int, S_IRUGO|S_IWUSR);
@@ -467,6 +484,10 @@ struct Scsi_Host *scsi_host_alloc(struct 
scsi_host_template *sht, int privsize)
 
shost->use_blk_mq = scsi_use_blk_mq && !shost->hostt->disable_blk_mq;
 
+   kref_init(&shost->kref1);
+   kref_init(&shost->kref2);
+   kref_get(&shost->kref2);
+
device_initialize(&shost->shost_gendev);
dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
shost->shost_gendev.bus = &scsi_bus_type;
@@ -571,7 +592,7 @@ EXPORT_SYMBOL(scsi_host_lookup);
 struct Scsi_Host *scsi_hos

[PATCH 0/2] Fix a scsi_host_dev_release() memory leak

2015-11-20 Thread Bart Van Assche

This patch series consists of two patches:
- A first patch that converts the code that accesses shost->shost_gendev 
to manipulate the shost reference count into scsi_host_{get,put}() calls.

- A second patch that fixes the actual memory leak.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux kernel security issuse scsi_report_lun_scan report

2015-11-20 Thread Linus Torvalds
On Fri, Nov 20, 2015 at 12:57 PM, James Bottomley
 wrote:
>
> It's done under the scan mutex, so there can only be one thread in that
> code at once.

Hmm. Looking at the call chain seems to confirm that.

But looking at the call chain I _also_ see that we have
scsi_free_host_dev() there, which seems to be some stale frame data
from a previous scan.

I'm wondering if that is a clue.  I find exactly two callers of that
functions, both in the gdth driver.

Maybe this is some odd refcount bug, brought on by reuse of a sdev.
Would that make more sense?

Why is that scsi_free_host_dev() used only by that one driver, and
nobody else wants it or needs it?

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux kernel security issuse scsi_report_lun_scan report

2015-11-20 Thread James Bottomley
On Fri, 2015-11-20 at 12:33 -0800, Linus Torvalds wrote:
> On Fri, Nov 20, 2015 at 10:26 AM, James Bottomley
>  wrote:
> >
> > We can look at it, but the analysis shouldn't be correct.
> 
> Just take the five seconds to check the freeing path, please. The last
> free is this:

I did ... it can only be something freed the device while we were
scanning it because it can't be a device we allocated (unless there's an
unbalanced put somewhere).  I can't see how it can happen, but moving
the put to after the if should fix it.  I'd just like confirmation it
does in case this is actually caused by something else entirely.

James

> > INFO: Freed in scsi_device_dev_release_usercontext+0x23d/0x2d7 age=4 cpu=0 
> > pid=1
> >   free_debug_processing+0x188/0x207 mm/slub.c:1108
> >   scsi_device_dev_release_usercontext+0x23d/0x2d7 
> > drivers/scsi/scsi_sysfs.c:429
> >   __slab_free+0x4a/0x27a mm/slub.c:2587
> >   mempool_free_slab+0x0/0x13 mm/mempool.c:453
> >   ida_remove+0xc6/0x159 lib/idr.c:1042
> >   __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e ??:?
> >   __read_once_size include/linux/compiler.h:218
> >   atomic_read ./arch/x86/include/asm/atomic.h:27
> >   __rcu_is_watching+0x18/0x1f kernel/rcu/tree.c:987
> >   scsi_device_dev_release_usercontext+0x23d/0x2d7 
> > drivers/scsi/scsi_sysfs.c:429
> >   scsi_device_dev_release_usercontext+0x23d/0x2d7 
> > drivers/scsi/scsi_sysfs.c:429
> >   scsi_device_dev_release_usercontext+0x0/0x2d7 
> > drivers/scsi/scsi_sysfs.c:438
> >   execute_in_process_context+0x24/0x82 kernel/workqueue.c:2969
> >   device_release+0x44/0xde drivers/base/core.c:247
> >   kobject_cleanup lib/kobject.c:631
> >   kobject_release lib/kobject.c:660
> >   kref_sub include/linux/kref.h:74
> >   kref_put include/linux/kref.h:99
> >   kobject_put+0xbc/0xcf lib/kobject.c:677
> >   scsi_report_lun_scan+0x27f/0x434 drivers/scsi/scsi_scan.c:1458
> >   scsi_report_lun_scan+0x0/0x434 drivers/scsi/scsi_scan.c:1053
> 
> so clearly the device *was* freed by scsi_report_lun_scan() doing the
> scsi_device_put().
> 
> >  This device
> > is the one we first used to issue the report lun scan.  Either it's an
> > existing device, or we created it specially for the purpose.  If it's an
> > existing one, that put just releases our reference, but the core still
> > has one  (there'd have to be a very unusual scan destroy race for the
> > core to be releasing a reference to an object it was in process of
> > scanning).
> 
> A race it may be. Or maybe it's even a kasan bug. But so far, those
> kasan reports have been pretty much on the money.
> 
> It may be that a possible race is widened by kasan itself slowing things down.
> 
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux kernel security issuse scsi_report_lun_scan report

2015-11-20 Thread James Bottomley
On Fri, 2015-11-20 at 12:54 -0800, Linus Torvalds wrote:
> On Fri, Nov 20, 2015 at 10:26 AM, James Bottomley
>  wrote:
> >
> > We can look at it, but the analysis shouldn't be correct.  This device
> > is the one we first used to issue the report lun scan.  Either it's an
> > existing device, or we created it specially for the purpose.  If it's an
> > existing one, that put just releases our reference, but the core still
> > has one  (there'd have to be a very unusual scan destroy race for the
> > core to be releasing a reference to an object it was in process of
> > scanning).
> 
> Side note: that whole "if it's an existing one" looks fundamentally racy.
> 
> What if two threads have that existing one, and both threads decide
> "there's no device there", so they'll both decide to do that
> __scsi_remove_device()?

It's done under the scan mutex, so there can only be one thread in that
code at once.

James


> In fact, one of the threads might have created the device, so it looks
> like it's sufficient that just one thread has that
> "scsi_device_lookup_by_target()" case..
> 
> I don't see any serialization around this.
> 
> Now, I do agree that it's odd that this happens during early kernel
> initcalls, but the scsi layer is one of the things that uses async
> stuff, so if that do_scan_async() ever ends up using the same target
> twice, that would explain it. Can that happen some way?
> 
>  Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux kernel security issuse scsi_report_lun_scan report

2015-11-20 Thread Linus Torvalds
On Fri, Nov 20, 2015 at 10:26 AM, James Bottomley
 wrote:
>
> We can look at it, but the analysis shouldn't be correct.  This device
> is the one we first used to issue the report lun scan.  Either it's an
> existing device, or we created it specially for the purpose.  If it's an
> existing one, that put just releases our reference, but the core still
> has one  (there'd have to be a very unusual scan destroy race for the
> core to be releasing a reference to an object it was in process of
> scanning).

Side note: that whole "if it's an existing one" looks fundamentally racy.

What if two threads have that existing one, and both threads decide
"there's no device there", so they'll both decide to do that
__scsi_remove_device()?

In fact, one of the threads might have created the device, so it looks
like it's sufficient that just one thread has that
"scsi_device_lookup_by_target()" case..

I don't see any serialization around this.

Now, I do agree that it's odd that this happens during early kernel
initcalls, but the scsi layer is one of the things that uses async
stuff, so if that do_scan_async() ever ends up using the same target
twice, that would explain it. Can that happen some way?

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux kernel security issuse scsi_report_lun_scan report

2015-11-20 Thread Linus Torvalds
On Fri, Nov 20, 2015 at 10:26 AM, James Bottomley
 wrote:
>
> We can look at it, but the analysis shouldn't be correct.

Just take the five seconds to check the freeing path, please. The last
free is this:

> INFO: Freed in scsi_device_dev_release_usercontext+0x23d/0x2d7 age=4 cpu=0 
> pid=1
>   free_debug_processing+0x188/0x207 mm/slub.c:1108
>   scsi_device_dev_release_usercontext+0x23d/0x2d7 
> drivers/scsi/scsi_sysfs.c:429
>   __slab_free+0x4a/0x27a mm/slub.c:2587
>   mempool_free_slab+0x0/0x13 mm/mempool.c:453
>   ida_remove+0xc6/0x159 lib/idr.c:1042
>   __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e ??:?
>   __read_once_size include/linux/compiler.h:218
>   atomic_read ./arch/x86/include/asm/atomic.h:27
>   __rcu_is_watching+0x18/0x1f kernel/rcu/tree.c:987
>   scsi_device_dev_release_usercontext+0x23d/0x2d7 
> drivers/scsi/scsi_sysfs.c:429
>   scsi_device_dev_release_usercontext+0x23d/0x2d7 
> drivers/scsi/scsi_sysfs.c:429
>   scsi_device_dev_release_usercontext+0x0/0x2d7 drivers/scsi/scsi_sysfs.c:438
>   execute_in_process_context+0x24/0x82 kernel/workqueue.c:2969
>   device_release+0x44/0xde drivers/base/core.c:247
>   kobject_cleanup lib/kobject.c:631
>   kobject_release lib/kobject.c:660
>   kref_sub include/linux/kref.h:74
>   kref_put include/linux/kref.h:99
>   kobject_put+0xbc/0xcf lib/kobject.c:677
>   scsi_report_lun_scan+0x27f/0x434 drivers/scsi/scsi_scan.c:1458
>   scsi_report_lun_scan+0x0/0x434 drivers/scsi/scsi_scan.c:1053

so clearly the device *was* freed by scsi_report_lun_scan() doing the
scsi_device_put().

>  This device
> is the one we first used to issue the report lun scan.  Either it's an
> existing device, or we created it specially for the purpose.  If it's an
> existing one, that put just releases our reference, but the core still
> has one  (there'd have to be a very unusual scan destroy race for the
> core to be releasing a reference to an object it was in process of
> scanning).

A race it may be. Or maybe it's even a kasan bug. But so far, those
kasan reports have been pretty much on the money.

It may be that a possible race is widened by kasan itself slowing things down.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Ondrej Zary
On Friday 20 November 2015 02:41:19 Finn Thain wrote:
> 
> On Thu, 19 Nov 2015, Ondrej Zary wrote:
> 
> > On Thursday 19 November 2015 03:24:56 Finn Thain wrote:
> >
> > > On Wed, 18 Nov 2015, Ondrej Zary wrote:
> > >
> > > >
> > > > I have some NCR5380 ISA cards and can test them.
> > >
> > > Thanks Ondrej. I've no idea which ISA drivers are presently working in 
> > > mainline. Finding regressions may be more difficult than usual ;-)
> > 
> > You're right... looks very broken:
> > 
> > [   62.577194] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
> > n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, 
> > sg_tablesize 128, this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, USLEEP_POLL 
> > 3, USLEEP_WAITLONG 1250, options { AUTOPROBE_IRQ PSEUDO_DMA 
> > NCR53C400 }
> > [   62.796635] scsi 2:0:0:0: Direct-Access IBM  0663 e  
> >   PQ: 0 ANSI: 2
> > [   63.878494] sd 2:0:0:0: Attached scsi generic sg1 type 0
> > [   95.848260] sd 2:0:0:0: aborting command
> > 
> > And the system hangs completely.
> > 
> 
> Yes. That was the usual failure mode. The old EH abort routine is fatal. 
> Up until I disabled PDMA by default for mac_scsi (in v3.19), that driver 
> would do the same thing.
> 
> > It's much better with your patches, but still not great :)
> > 
> 
> Pleased to hear it :)
> 
> > [   93.963264] pnp 01:01.00: [io  0x0240-0x025f]
> > [   93.963493] pnp 01:01.00: [irq 5]
> > [   93.965768] pnp 01:01.00: activated
> > [   93.977147] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
> > n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, 
> > sg_tablesize 128, this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, options { 
> > AUTOPROBE_IRQ PSEUDO_DMA }
> > [   93.987527] scsi host2: rejecting message
> > [   93.987647] Synchronous Data Transfer Request period = 100 ns offset = 12
> > [   94.001219] scsi 2:0:0:0: Direct-Access IBM  0663 e  
> >   PQ: 0 ANSI: 2
> > [  113.000794] sd 2:0:0:0: Attached scsi generic sg1 type 0
> 
> I'd be interested to know what commands were in play in that 19 second 
> interval. Might need to use scsi_logging_level to figure that out.
> 
> My tests involved 3 different scsi targets (two disks and a CD-ROM) but 
> none of these send a SDTR. Your log says the driver correctly rejected the 
> SDTR message but that doesn't mean the target actually went to MSG IN 
> phase and got the message. Do you have any older targets you can test?

Another disk, without patches:

[   84.481582] pnp 01:01.00: activated
[   84.489650] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, 
this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, USLEEP_POLL 3, USLEEP_WAITLONG 
1250, options { AUTOPROBE_IRQ PSEUDO_DMA NCR53C400 }
[   84.953332] scsi 2:0:1:0: Direct-Access QUANTUM  LP240S GM240S01X 4.6  
PQ: 0 ANSI: 2 CCS
[   86.786475] sd 2:0:1:0: Attached scsi generic sg1 type 0
[   86.793753] sd 2:0:1:0: [sdb] 479350 512-byte logical blocks: (245 MB/234 
MiB)
[   86.998555] sd 2:0:1:0: [sdb] Write Protect is off
[   87.406068] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[  118.888271] sd 2:0:1:0: [sdb] aborting command
[  118.888738] sd 2:0:1:0: [sdb] aborting command

With patches:

[  258.473748] pnp 01:01.00: activated
[  258.483592] scsi host2: Generic NCR5380/NCR53C400 SCSI, io_port 0x240, 
n_io_port 16, base 0x0, irq 0, can_queue 16, cmd_per_lun 2, sg_tablesize 128, 
this_id 7, flags { DTC3181E NO_PSEUDO_DMA }, options { AUTOPROBE_IRQ PSEUDO_DMA 
}
[  261.347632] scsi 2:0:1:0: Direct-Access QUANTUM  LP240S GM240S01X 4.6  
PQ: 0 ANSI: 2 CCS
[  275.560451] sd 2:0:1:0: Attached scsi generic sg1 type 0
[  275.632519] sd 2:0:1:0: [sdb] 479350 512-byte logical blocks: (245 MB/234 
MiB)
[  275.635533] sd 2:0:1:0: [sdb] Write Protect is off
[  275.642315] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[  469.076347] sd 2:0:1:0: [sdb] FAILED Result: hostbyte=DID_TIME_OUT 
driverbyte=DRIVER_SENSE
[  469.076613] sd 2:0:1:0: [sdb] Sense Key : Aborted Command [current]
[  469.076851] sd 2:0:1:0: [sdb] Add. Sense: No additional sense information
[  469.077086] sd 2:0:1:0: [sdb] CDB: Read(10) 28 00 00 00 00 02 00 00 02 00
[  469.077306] blk_update_request: I/O error, dev sdb, sector 2
[  469.077522] Buffer I/O error on dev sdb, logical block 1, async page read
[  480.108255] INFO: task kworker/u2:2:60 blocked for more than 120 seconds.
[  480.109773]   Not tainted 4.3.0-rc1+ #74
[  480.109973] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  480.110179] kworker/u2:2D 0040 060  2 0x
[  480.110671] Workqueue: events_unbound async_run_entry_fn
[  480.110999]  cf9e8780 0046 2eff25f7 0040 c117f111 2ee82733 0040 
0016fec4
[  480.112390]   cfaa6000  7fff c139c7d2 c139c504 7fff 
c139d9d3

Re: linux kernel security issuse scsi_report_lun_scan report

2015-11-20 Thread James Bottomley
On Fri, 2015-11-20 at 10:14 -0800, Linus Torvalds wrote:
> [ I don't know if the original reporter ended up actually sending this
> to the scsi list like Greg asked, so I'll forward it myself just in
> case ]

No, this is the first time we've seen this.

> There seems to be a very old use-after-free in the scsi code (git
> blame says the lines around this area are from 2005 and 2008) that
> kasan reports.
> 
> I've tried to clean up the formatting in the email a bit, but the
> executive summary seems to be that this:
> 
> drivers/scsi/scsi_scan.c, around line 1459:
> 
> scsi_device_put(sdev);
> if (scsi_device_created(sdev))
> 
> is just wrong, because the "scsi_device_put()" may be freeing the
> sdev, so then doing "scsi_device_created(sdev)" after it is bogus.

We can look at it, but the analysis shouldn't be correct.  This device
is the one we first used to issue the report lun scan.  Either it's an
existing device, or we created it specially for the purpose.  If it's an
existing one, that put just releases our reference, but the core still
has one  (there'd have to be a very unusual scan destroy race for the
core to be releasing a reference to an object it was in process of
scanning).  If we create it, then it releases our reference taken in
line 1331 but it's also not a final put because alloc has to be paired
with remove.  However, it should also be harmless to move the put to
after the if() clause ... can we ask the reporter to check if that's the
fix?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: linux kernel security issuse scsi_report_lun_scan report

2015-11-20 Thread Linus Torvalds
[ I don't know if the original reporter ended up actually sending this
to the scsi list like Greg asked, so I'll forward it myself just in
case ]

There seems to be a very old use-after-free in the scsi code (git
blame says the lines around this area are from 2005 and 2008) that
kasan reports.

I've tried to clean up the formatting in the email a bit, but the
executive summary seems to be that this:

drivers/scsi/scsi_scan.c, around line 1459:

scsi_device_put(sdev);
if (scsi_device_created(sdev))

is just wrong, because the "scsi_device_put()" may be freeing the
sdev, so then doing "scsi_device_created(sdev)" after it is bogus.

 Linus

On Wed, Nov 18, 2015 at 5:15 AM, 程君(成淼)  wrote:
>
> Dear all:
>  we find one security issuse in kernel 4.3.0,aslo check the
> lastest code,please check , thanks.
>
> 1.user-after-free in scsi_report_lun_scan
>
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G  B  4.3.0+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
> Call Trace:
>  scsi_device_created include/scsi/scsi_device.h:460
>  scsi_report_lun_scan+0x28b/0x434 drivers/scsi/scsi_scan.c:1459
>  device_release+0x44/0xde drivers/base/core.c:247
>  scsi_device_created include/scsi/scsi_device.h:460
>  scsi_report_lun_scan+0x28b/0x434 drivers/scsi/scsi_scan.c:1459
>  scsi_probe_and_add_lun+0xe4f/0xe4f drivers/scsi/scsi_scan.c:1053
>  scsi_free_host_dev+0x4d/0x4d drivers/scsi/scsi_scan.c:1921
>  __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e ??:?
>  __scsi_scan_target+0x16f/0x293 drivers/scsi/scsi_scan.c:1563
>  scsi_add_device+0x20/0x20 drivers/scsi/scsi_scan.c:1513
>  __pm_runtime_idle+0x5c/0x5c drivers/base/power/runtime.c:904
>  __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e ??:?
>  scsi_scan_channel+0x81/0x8f drivers/scsi/scsi_scan.c:1641
>  scsi_scan_host_selected+0x144/0x161 drivers/scsi/scsi_scan.c:1669
>  scsi_scan_host+0xa5/0x21d drivers/scsi/scsi_scan.c:1837
>  virtscsi_probe+0x4c8/0x50b drivers/scsi/virtio_scsi.c:1032
>  virtscsi_init+0x392/0x392 drivers/scsi/virtio_scsi.c:908
>  kernfs_type include/linux/kernfs.h:239
>  kernfs_leftmost_descendant+0x12/0x36 fs/kernfs/dir.c:970
>  kernfs_activate+0x30/0xea fs/kernfs/dir.c:1036
>  mutex_clear_owner kernel/locking/mutex.h:27
>  mutex_unlock+0x12/0x2a kernel/locking/mutex.c:435
>  kernfs_add_one+0x20d/0x21f fs/kernfs/dir.c:647
>  __virtio_clear_bit include/linux/virtio_config.h:135
>  vring_transport_features+0x3f/0x55 drivers/virtio/virtio_ring.c:786
>  vp_finalize_features+0x49/0x4e drivers/virtio/virtio_pci_legacy.c:44
>  virtio_dev_probe+0x163/0x2d0 drivers/virtio/virtio.c:235
>  really_probe drivers/base/dd.c:316
>  driver_probe_device+0x25d/0x378 drivers/base/dd.c:429
>  really_probe drivers/base/dd.c:368
>  driver_probe_device+0x378/0x378 drivers/base/dd.c:429
>  __driver_attach+0x6d/0xae drivers/base/dd.c:642
>  bus_for_each_dev+0x106/0x10a drivers/base/bus.c:314
>  next_device+0x24/0x24 drivers/base/bus.c:280
>  __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e ??:?
>  bus_add_driver+0x269/0x2bc drivers/base/bus.c:708
>  initcall_blacklist+0xbe/0xbe init/main.c:726
>  driver_register+0x103/0x144 drivers/base/driver.c:168
>  scsi_init_sysctl+0x1d/0x1d drivers/scsi/scsi_sysctl.c:46
>  init+0xaf/0xb9 net/ipv4/netfilter/nf_nat_h323.c:590
>  scsi_init_sysctl+0x1d/0x1d drivers/scsi/scsi_sysctl.c:46
>  do_one_initcall+0x9d/0x1f4 init/main.c:794
>  try_to_run_init_process+0x2f/0x2f init/main.c:928
>  parse_args+0x4b/0x3ab kernel/params.c:234
> Memory state around the buggy address:
> 88006984d000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 88006984d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>88006984d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ^
> 88006984d180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 88006984d200: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>
> INFO: Allocated in scsi_alloc_sdev+0x7c/0x56e age=4 cpu=0 pid=1
>   set_track+0x6d/0x108 mm/slub.c:528
>   alloc_debug_processing+0xaf/0x142 mm/slub.c:1049
>   __slab_alloc+0x3e0/0x4b8 mm/slub.c:2402
>   kmalloc include/linux/slab.h:445
>   kzalloc include/linux/slab.h:593
>   scsi_alloc_sdev+0x7c/0x56e drivers/scsi/scsi_scan.c:218
>   init_object+0x2d/0x5e mm/slub.c:681
>   __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e ??:?
>   scsi_probe_and_add_lun+0xe3d/0xe4f drivers/scsi/scsi_scan.c:1178
>   slab_alloc_node mm/slub.c:2470
>   slab_alloc mm/slub.c:2512
>   __kmalloc+0x84/0x169 mm/slub.c:3417
>   slab_alloc_node mm/slub.c:2470
>   slab_alloc mm/slub.c:2512
>   __kmalloc+0x84/0x169 mm/slub.c:3417
>   kmalloc include/linux/slab.h:445
>   kzalloc include/linux/slab.h:593
>   scsi_alloc_sdev+0x7c/0x56e drivers/scsi/scsi_scan.c:218
>   scsi_report_lun_scan+0x17f/0x434 drivers/scsi/scsi_scan.c:1328
>   scsi_report_lun_scan+0x0/0x434 drivers/scsi/scsi_scan.c:1

Re: [PATCH] Fix a memory leak in scsi_host_dev_release()

2015-11-20 Thread Bart Van Assche

On 11/20/2015 03:52 AM, Christoph Hellwig wrote:

the memory leak looks real, and your fix looks corret, but I still
don't like it.

I think it's reasonable for SCSI to assume that the final put_device
fully frees the struct device including the name pointer that is
assigned entirely behind the back of the caller.

So I think the fix for this probably should be in the driver core.


Hello Christoph,

Thanks for the feedback. However, I'm not sure this can be fixed by 
modifying the driver core. If scsi_host_remove() is not called the SCSI 
core doesn't call put_device(&shost->shost_dev). I will post a second 
version of this patch that ensures that the SCSI core always calls 
put_device(&shost->shost_dev).


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-20 Thread Bart Van Assche
On 11/20/2015 02:16 AM, Christoph Hellwig wrote:
> On Wed, Nov 18, 2015 at 10:20:14AM -0800, Bart Van Assche wrote:
>> Are you perhaps referring to the sysfs CPU mask that allows to control
>> workqueue affinity ?
> 
> I think he is referring to the defintion of WQ_UNBOUND:
> 
>WQ_UNBOUND
> 
>   Work items queued to an unbound wq are served by the special
>   woker-pools which host workers which are not bound to any
>   specific CPU.  This makes the wq behave as a simple execution
>   context provider without concurrency management.  The unbound
>   worker-pools try to start execution of work items as soon as
>   possible.  Unbound wq sacrifices locality but is useful for
>   the following cases.
> 
>   * Wide fluctuation in the concurrency level requirement is
> expected and using bound wq may end up creating large number
> of mostly unused workers across different CPUs as the issuer
> hops through different CPUs.
> 
>   * Long running CPU intensive workloads which can be better
> managed by the system scheduler.
 
Hello Christoph,

The comment about locality in the above quote is interesting. How about
modifying patch 2/9 as indicated below ? The modification below does not
change the behavior of this patch if ib_cq.w.cpu is not modified. And it
allows users who care about locality and who want to skip the scheduler
overhead by setting ib_cq.w.cpu to the index of the CPU they want the
work to be processed on.

Thanks,

Bart.

---
 drivers/infiniband/core/cq.c | 11 ++-
 include/rdma/ib_verbs.h  |  5 -
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bf2a079..4d80d8c 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -94,18 +94,18 @@ static void ib_cq_completion_softirq(struct ib_cq *cq, void 
*private)
 
 static void ib_cq_poll_work(struct work_struct *work)
 {
-   struct ib_cq *cq = container_of(work, struct ib_cq, work);
+   struct ib_cq *cq = container_of(work, struct ib_cq, w.work);
int completed;
 
completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
-   queue_work(ib_comp_wq, &cq->work);
+   queue_work_on(cq->w.cpu, ib_comp_wq, &cq->w.work);
 }
 
 static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
 {
-   queue_work(ib_comp_wq, &cq->work);
+   queue_work_on(cq->w.cpu, ib_comp_wq, &cq->w.work);
 }
 
 /**
@@ -159,7 +159,8 @@ struct ib_cq *ib_alloc_cq(struct ib_device *dev, void 
*private,
break;
case IB_POLL_WORKQUEUE:
cq->comp_handler = ib_cq_completion_workqueue;
-   INIT_WORK(&cq->work, ib_cq_poll_work);
+   INIT_WORK(&cq->w.work, ib_cq_poll_work);
+   cq->w.cpu = WORK_CPU_UNBOUND;
ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
break;
default:
@@ -195,7 +196,7 @@ void ib_free_cq(struct ib_cq *cq)
irq_poll_disable(&cq->iop);
break;
case IB_POLL_WORKQUEUE:
-   flush_work(&cq->work);
+   flush_work(&cq->w.work);
break;
default:
WARN_ON_ONCE(1);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f59a8d3..b1344f8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1291,7 +1291,10 @@ struct ib_cq {
struct ib_wc*wc;
union {
struct irq_poll iop;
-   struct work_struct  work;
+   struct {
+   struct work_struct  work;
+   int cpu;
+   } w;
};
 };
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: hpsa: select CONFIG_SCSI_SAS_ATTR

2015-11-20 Thread Martin K. Petersen
> "Arnd" == Arnd Bergmann  writes:

Arnd> The hpsa driver recently started using the sas transport class,
Arnd> but it does not ensure that the corresponding code is actually
Arnd> built, which may lead to a link error:

Applied.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RESEND 2] SCSI: initio: remove duplicate module device table

2015-11-20 Thread Martin K. Petersen
> "Arnd" == Arnd Bergmann  writes:

Arnd> The initio driver has for many years had two copies of the same
Arnd> module device table. One of them is also used for registering the
Arnd> other driver, the other one is entirely useless after the large
Arnd> scale cleanup that Alan Cox did back in 2007.

Applied.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] scsi: advansys needs ISA dma api for ISA support

2015-11-20 Thread Martin K. Petersen
> "Arnd" == Arnd Bergmann  writes:

Arnd> The advansys drvier uses the request_dma function that is used on
Arnd> ISA machines for the internal DMA controller, which causes build
Arnd> errors on platforms that have ISA slots but do not provide the ISA
Arnd> DMA API:

Applied.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] scsi: use sector_div instead of do_div

2015-11-20 Thread Arnd Bergmann
do_div is the wrong way to divide a sector_t, as it is less
efficient when sector_t is 32-bit wide. With the upcoming
do_div optimizations, the kernel starts warning about this:

drivers/scsi/scsi_debug.c: In function 'dif_store':
include/asm-generic/div64.h:207:28: warning: comparison of distinct pointer 
types lacks a cast

This changes the code to use sector_div instead, which always
produces optimal code.

Signed-off-by: Arnd Bergmann 
---
Found on the ARM randconfig build today, after I merged Nico's patches
for linux-next

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index dfcc45bb03b1..ec622ba9864a 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -678,7 +678,7 @@ static void *fake_store(unsigned long long lba)
 
 static struct sd_dif_tuple *dif_store(sector_t sector)
 {
-   sector = do_div(sector, sdebug_store_sectors);
+   sector = sector_div(sector, sdebug_store_sectors);
 
return dif_storep + sector;
 }
@@ -2780,7 +2780,7 @@ static unsigned long lba_to_map_index(sector_t lba)
lba += scsi_debug_unmap_granularity -
scsi_debug_unmap_alignment;
}
-   do_div(lba, scsi_debug_unmap_granularity);
+   sector_div(lba, scsi_debug_unmap_granularity);
 
return lba;
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: hpsa: select CONFIG_SCSI_SAS_ATTR

2015-11-20 Thread James Bottomley
On Fri, 2015-11-20 at 10:09 -0600, Don Brace wrote:
> Signed-off-by: Don Brace 

You mean acked-by ... it can only be signed-off-by if *you* resend the
patch because signoffs track the patch transmission path.

James

> And...Thanks!
> 
> 
> On 11/19/2015 04:04 PM, Arnd Bergmann wrote:
> > The hpsa driver recently started using the sas transport class, but it
> > does not ensure that the corresponding code is actually built, which
> > may lead to a link error:
> >
> > drivers/built-in.o: In function `hpsa_free_sas_phy':
> > (.text+0x1ce874): undefined reference to `sas_port_delete_phy'
> > (.text+0x1ce87c): undefined reference to `sas_phy_free'
> > drivers/built-in.o: In function `hpsa_alloc_sas_port':
> > (.text+0x1ceb9c): undefined reference to `sas_port_alloc_num'
> > (.text+0x1ceba8): undefined reference to `sas_port_add'
> > drivers/built-in.o: In function `hpsa_init':
> > (.init.text+0x8838): undefined reference to `sas_attach_transport'
> > (.init.text+0x8868): undefined reference to `sas_release_transport
> >
> > This adds 'select SCSI_SAS_ATTR' in the Kconfig entry, just like we do
> > for all other drivers using those functions.
> >
> > Signed-off-by: Arnd Bergmann 
> > Fixes: d04e62b9d63a ("hpsa: add in sas transport class")
> > ---
> > This showed up on the ARM randconfig builds today for the first time.
> >
> > diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> > index 04a720686516..64eed87d34a8 100644
> > --- a/drivers/scsi/Kconfig
> > +++ b/drivers/scsi/Kconfig
> > @@ -364,6 +364,7 @@ config SCSI_HPSA
> > tristate "HP Smart Array SCSI driver"
> > depends on PCI && SCSI
> > select CHECK_SIGNATURE
> > +   select SCSI_SAS_ATTRS
> > help
> >   This driver supports HP Smart Array Controllers (circa 2009).
> >   It is a SCSI alternative to the cciss driver, which is a block
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: hpsa: select CONFIG_SCSI_SAS_ATTR

2015-11-20 Thread Don Brace

Signed-off-by: Don Brace 

And...Thanks!


On 11/19/2015 04:04 PM, Arnd Bergmann wrote:

The hpsa driver recently started using the sas transport class, but it
does not ensure that the corresponding code is actually built, which
may lead to a link error:

drivers/built-in.o: In function `hpsa_free_sas_phy':
(.text+0x1ce874): undefined reference to `sas_port_delete_phy'
(.text+0x1ce87c): undefined reference to `sas_phy_free'
drivers/built-in.o: In function `hpsa_alloc_sas_port':
(.text+0x1ceb9c): undefined reference to `sas_port_alloc_num'
(.text+0x1ceba8): undefined reference to `sas_port_add'
drivers/built-in.o: In function `hpsa_init':
(.init.text+0x8838): undefined reference to `sas_attach_transport'
(.init.text+0x8868): undefined reference to `sas_release_transport

This adds 'select SCSI_SAS_ATTR' in the Kconfig entry, just like we do
for all other drivers using those functions.

Signed-off-by: Arnd Bergmann 
Fixes: d04e62b9d63a ("hpsa: add in sas transport class")
---
This showed up on the ARM randconfig builds today for the first time.

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 04a720686516..64eed87d34a8 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -364,6 +364,7 @@ config SCSI_HPSA
tristate "HP Smart Array SCSI driver"
depends on PCI && SCSI
select CHECK_SIGNATURE
+   select SCSI_SAS_ATTRS
help
  This driver supports HP Smart Array Controllers (circa 2009).
  It is a SCSI alternative to the cciss driver, which is a block

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Ewan Milne
On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> Can't we have a joint effort here?
> I've been spending a _LOT_ of time trying to debug things here, but
> none of the ideas I've come up with have been able to fix anything.

Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.

> 
> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> by one and be done with ...
> 

That might not fix it if it is a problem with the merge code, though.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Hannes Reinecke
On 11/20/2015 03:38 PM, Ewan Milne wrote:
> On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
>> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
>>> It's pretty much guaranteed a block layer bug, most likely in the
>>> merge bios to request infrastucture where we don't obey the merging
>>> limits properly.
>>>
>>> Does either of you have a known good and first known bad kernel?
>>
>> Well, I have been fighting a similar issue for several months now,
>> albeit with multipath enabled. Haven't had much progress with this,
>> sadly.
>> Seeing that this is our distro kernel it might or might not be
>> related; however, as the symptoms are identical there still is a
>> chance that this is actually a generic block-layer problem.
>>
>> Cheers,
>>
>> Hannes
> 
> We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
> blk_rq_map_sg() returned 4.)  I was suspicious of the patch:
> 
> bio: modify __bio_add_page() to accept pages that don't start a new segment
> 
> But we put some debugging code in and didn't hit it.  We haven't
> found the problem yet, either, though.  We're still looking.
> 
Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.

I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...

> As Christoph said, it would seem to be a problem with the block layer
> merging.
> 
> The API for this seems defective, in that blk_rq_map_sg() should
> never be returning a value indicating that it overwrote past the
> end of the supplied SG array and depend on the caller to check it.
> (We could get data corruption on another I/O if it used adjacent
> memory for a different SG list, for example.)
> 
Yeah, the API is bloody useless.
By the time you hit the BUG_ON you've already caused a memory
corruption, so no way you can recover there.

At the very least we should be passing in the sg list count into
blk_map_rq_sg(), but that's a core blocklayer API and changes here
would require changes by quite a set of drivers. Plus it wouldn't
help me for a distribution kernel ...

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Ewan Milne
On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> > It's pretty much guaranteed a block layer bug, most likely in the
> > merge bios to request infrastucture where we don't obey the merging
> > limits properly.
> > 
> > Does either of you have a known good and first known bad kernel?
> 
> Well, I have been fighting a similar issue for several months now,
> albeit with multipath enabled. Haven't had much progress with this,
> sadly.
> Seeing that this is our distro kernel it might or might not be
> related; however, as the symptoms are identical there still is a
> chance that this is actually a generic block-layer problem.
> 
> Cheers,
> 
> Hannes

We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
blk_rq_map_sg() returned 4.)  I was suspicious of the patch:

bio: modify __bio_add_page() to accept pages that don't start a new segment

But we put some debugging code in and didn't hit it.  We haven't
found the problem yet, either, though.  We're still looking.

As Christoph said, it would seem to be a problem with the block layer
merging.

The API for this seems defective, in that blk_rq_map_sg() should
never be returning a value indicating that it overwrote past the
end of the supplied SG array and depend on the caller to check it.
(We could get data corruption on another I/O if it used adjacent
memory for a different SG list, for example.)

-Ewan





--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Mark Salter
On Fri, 2015-11-20 at 13:56 +0100, Laurent Dufour wrote:
> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > 
> > > It's pretty much guaranteed a block layer bug, most likely in the
> > > merge bios to request infrastucture where we don't obey the merging
> > > limits properly.
> > > 
> > > Does either of you have a known good and first known bad kernel?
> > 
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> > 
> > Laurent, can you narrow it down at all?
> 
> I'm trying bisect it but it's very time consuming, and I think I missed
> some bad kernel because I didn't let enough time to the system to panic.
> 
> But I'm still on...
> 
I usually hit it at boot time. Maybe one out of five times. I'll take a stab
at bisecting today...

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Laurent Dufour
On 20/11/2015 13:10, Michael Ellerman wrote:
> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> 
>> It's pretty much guaranteed a block layer bug, most likely in the
>> merge bios to request infrastucture where we don't obey the merging
>> limits properly.
>>
>> Does either of you have a known good and first known bad kernel?
> 
> Not me, I've only hit it one or two times. All I can say is I have hit it in
> 4.4-rc1.
> 
> Laurent, can you narrow it down at all?

I'm trying bisect it but it's very time consuming, and I think I missed
some bad kernel because I didn't let enough time to the system to panic.

But I'm still on...

Cheers

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Ondrej Zary
On Friday 20 November 2015, Geert Uytterhoeven wrote:
> On Fri, Nov 20, 2015 at 12:40 PM, Ondrej Zary
>  wrote:
> > Working ISA means more testing possibilities. It's much easier to get an 
> > ISA card than a Sun or Atari. Also faster CPU (such as 1 GHz P3) means 
> > quicker testing.
> 
> Faster PCs without ISA slots? ;-)

Faster but not too fast, you have to be careful :)
There are many boards for Pentium 3 or Ahlon XP CPUs with ISA slots.

-- 
Ondrej Zary
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH] lpfc: Add lockdep assertions

2015-11-20 Thread Johannes Thumshirn
Several functions in lpfc have comments stating that the function must be
called with the hbalock (or hostlock, or ringlock) held. Add
lockdep_assert_held() annotations to these functions, so one can actually
verify the locks are held.

Signed-off-by: Johannes Thumshirn 
---

I'm not sure if this is actually helpfull upstream but it helped me to debug a 
downstream bug.

 drivers/scsi/lpfc/lpfc_hbadisc.c |  5 +
 drivers/scsi/lpfc/lpfc_sli.c | 43 
 2 files changed, 48 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index bfc2442..96de2ab 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1314,6 +1315,8 @@ __lpfc_update_fcf_record_pri(struct lpfc_hba *phba, 
uint16_t fcf_index,
 {
struct lpfc_fcf_pri *fcf_pri;
 
+   lockdep_assert_held(&phba->hbalock);
+
fcf_pri = &phba->fcf.fcf_pri[fcf_index];
fcf_pri->fcf_rec.fcf_index = fcf_index;
/* FCF record priority */
@@ -1398,6 +1401,8 @@ __lpfc_update_fcf_record(struct lpfc_hba *phba, struct 
lpfc_fcf_rec *fcf_rec,
   struct fcf_record *new_fcf_record, uint32_t addr_mode,
   uint16_t vlan_id, uint32_t flag)
 {
+   lockdep_assert_held(&phba->hbalock);
+
/* Copy the fields from the HBA's FCF record */
lpfc_copy_fcf_record(fcf_rec, new_fcf_record);
/* Update other fields of driver FCF record */
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index f9585cd..6d84c77 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -576,6 +577,8 @@ __lpfc_sli_get_iocbq(struct lpfc_hba *phba)
struct list_head *lpfc_iocb_list = &phba->lpfc_iocb_list;
struct lpfc_iocbq * iocbq = NULL;
 
+   lockdep_assert_held(&phba->hbalock);
+
list_remove_head(lpfc_iocb_list, iocbq, struct lpfc_iocbq, list);
if (iocbq)
phba->iocb_cnt++;
@@ -797,6 +800,7 @@ int
 lpfc_test_rrq_active(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp,
uint16_t  xritag)
 {
+   lockdep_assert_held(&phba->hbalock);
if (!ndlp)
return 0;
if (!ndlp->active_rrqs_xri_bitmap)
@@ -914,6 +918,8 @@ __lpfc_sli_get_sglq(struct lpfc_hba *phba, struct 
lpfc_iocbq *piocbq)
struct lpfc_nodelist *ndlp;
int found = 0;
 
+   lockdep_assert_held(&phba->hbalock);
+
if (piocbq->iocb_flag &  LPFC_IO_FCP) {
lpfc_cmd = (struct lpfc_scsi_buf *) piocbq->context1;
ndlp = lpfc_cmd->rdata->pnode;
@@ -1003,6 +1009,8 @@ __lpfc_sli_release_iocbq_s4(struct lpfc_hba *phba, struct 
lpfc_iocbq *iocbq)
unsigned long iflag = 0;
struct lpfc_sli_ring *pring = &phba->sli.ring[LPFC_ELS_RING];
 
+   lockdep_assert_held(&phba->hbalock);
+
if (iocbq->sli4_xritag == NO_XRI)
sglq = NULL;
else
@@ -1058,6 +1066,7 @@ __lpfc_sli_release_iocbq_s3(struct lpfc_hba *phba, struct 
lpfc_iocbq *iocbq)
 {
size_t start_clean = offsetof(struct lpfc_iocbq, iocb);
 
+   lockdep_assert_held(&phba->hbalock);
 
/*
 * Clean all volatile data fields, preserve iotag and node struct.
@@ -1080,6 +1089,8 @@ __lpfc_sli_release_iocbq_s3(struct lpfc_hba *phba, struct 
lpfc_iocbq *iocbq)
 static void
 __lpfc_sli_release_iocbq(struct lpfc_hba *phba, struct lpfc_iocbq *iocbq)
 {
+   lockdep_assert_held(&phba->hbalock);
+
phba->__lpfc_sli_release_iocbq(phba, iocbq);
phba->iocb_cnt--;
 }
@@ -1310,6 +1321,8 @@ static int
 lpfc_sli_ringtxcmpl_put(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
struct lpfc_iocbq *piocb)
 {
+   lockdep_assert_held(&phba->hbalock);
+
list_add_tail(&piocb->list, &pring->txcmplq);
piocb->iocb_flag |= LPFC_IO_ON_TXCMPLQ;
 
@@ -1344,6 +1357,8 @@ lpfc_sli_ringtx_get(struct lpfc_hba *phba, struct 
lpfc_sli_ring *pring)
 {
struct lpfc_iocbq *cmd_iocb;
 
+   lockdep_assert_held(&phba->hbalock);
+
list_remove_head((&pring->txq), cmd_iocb, struct lpfc_iocbq, list);
return cmd_iocb;
 }
@@ -1367,6 +1382,9 @@ lpfc_sli_next_iocb_slot (struct lpfc_hba *phba, struct 
lpfc_sli_ring *pring)
 {
struct lpfc_pgp *pgp = &phba->port_gp[pring->ringno];
uint32_t  max_cmd_idx = pring->sli.sli3.numCiocb;
+
+   lockdep_assert_held(&phba->hbalock);
+
if ((pring->sli.sli3.next_cmdidx == pring->sli.sli3.cmdidx) &&
   (++pring->sli.sli3.next_cmdidx >= max_cmd_idx))
pring->sli.sli3.next_cmdidx = 0;
@@ -1497,6 +1515,7 @@ static void
 lpfc_sli_submit_iocb(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
IOCB_t *iocb, struct lpfc_iocbq *nextiocb

Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Geert Uytterhoeven
On Fri, Nov 20, 2015 at 12:40 PM, Ondrej Zary
 wrote:
> Working ISA means more testing possibilities. It's much easier to get an ISA 
> card than a Sun or Atari. Also faster CPU (such as 1 GHz P3) means quicker 
> testing.

Faster PCs without ISA slots? ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-20 Thread Michael Ellerman
On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:

> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Not me, I've only hit it one or two times. All I can say is I have hit it in
4.4-rc1.

Laurent, can you narrow it down at all?

cheers

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix a memory leak in scsi_host_dev_release()

2015-11-20 Thread Christoph Hellwig
Hi Bart,

the memory leak looks real, and your fix looks corret, but I still
don't like it.

I think it's reasonable for SCSI to assume that the final put_device
fully frees the struct device including the name pointer that is
assigned entirely behind the back of the caller.

So I think the fix for this probably should be in the driver core.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Christoph Hellwig
On Fri, Nov 20, 2015 at 12:40:03PM +0100, Ondrej Zary wrote:
> > > I'd love to be able to get rid of the ISA drivers to be honest.
> > 
> > Is that because of their use of scsi_module.c or their general decrepitude 
> > or something else?
> 
> scsi_module.c usage shouldn't be hard to fix. I can do that after finding a 
> working setup.

It's the general state of them.

> 
> > > Given that they appear to be gravely broken before your cleanups this 
> > > might be an opportunity to get rid of them.
> > 
> > At this stage, that's unclear (to me). It could be that g_NCR5380.c is not 
> > broken. It could be that the core driver can't handle certain targets. I 
> > think we need to do more testing.
> 
> Maybe I was just unlucky and tested a drive that never worked with this 
> driver.
> 
> Working ISA means more testing possibilities. It's much easier to get an ISA 
> card than a Sun or Atari. Also faster CPU (such as 1 GHz P3) means quicker 
> testing.


Well, if you volunteer to bring the NCR5380 ISA drivers up to date and
maintain them it's obvuously fine to keep them around.

I'm more worried about all the unmaintained ISA drivers in horrible
shape.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Ondrej Zary
On Friday 20 November 2015, Finn Thain wrote:
> 
> On Fri, 20 Nov 2015, Christoph Hellwig wrote:
> 
> > On Fri, Nov 20, 2015 at 07:19:21PM +1100, Finn Thain wrote:
> >
> > > Yes. I didn't do that conversion because I don't have ISA hardware and 
> > > I don't understand ISA probing.
> > > 
> > > The present patch set doesn't seek to resurrect the ISA drivers. But I 
> > > am trying to avoid regressions.
> > > 
> > > I have mixed feelings about the ISA drivers. ISA DMA support 
> > > complicates things (it was never completed) and DMA seems to be the 
> > > main obstacle to merging the two core driver forks.
> > 
> > I'd love to be able to get rid of the ISA drivers to be honest.
> 
> Is that because of their use of scsi_module.c or their general decrepitude 
> or something else?

scsi_module.c usage shouldn't be hard to fix. I can do that after finding a 
working setup.

> > Given that they appear to be gravely broken before your cleanups this 
> > might be an opportunity to get rid of them.
> 
> At this stage, that's unclear (to me). It could be that g_NCR5380.c is not 
> broken. It could be that the core driver can't handle certain targets. I 
> think we need to do more testing.

Maybe I was just unlucky and tested a drive that never worked with this driver.

Working ISA means more testing possibilities. It's much easier to get an ISA 
card than a Sun or Atari. Also faster CPU (such as 1 GHz P3) means quicker 
testing.

-- 
Ondrej Zary
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/18] ALUA device handler update, part 1

2015-11-20 Thread Hannes Reinecke
On 11/20/2015 11:54 AM, Christoph Hellwig wrote:
> On Fri, Nov 20, 2015 at 11:52:21AM +0100, Hannes Reinecke wrote:
>> One thing, though: I don't really agree with Barts objection that
>> moving to a workqueue would tie in too many resources.
>> Thing is, I'm not convinces that using a work queue is allocating
>> too many resources (we're speaking of 460 vs 240 bytes here).
>> Also we have to retry commands for quite some time (cite the
>> infamous NetApp takeover/giveback, which can take minutes).
>> If we were to handle that without workqueue we'd have to initiate
>> the retry from the end_io callback, causing a quite deep stack
>> recursion. Which I'm not really fond of.
>>
>> But if anyone has a better idea on how to handle retries without the
>> need for workqueues I'm all ears :-)
> 
> I tend to agree with you, but you better warm up that discussion again
> on the old thread that actually has Bart on Cc.  Or just resend once
> Martin has merged this patch, and have discussion around that version.
> 
Fully agree. So I'm waiting for Martin to pick up the patches first.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_debug: fix prevent_allow+verify regressions

2015-11-20 Thread Ruediger Meier
Hi,

what happened to this patch? It's still not merged into the kernel. 
That's a pity.

cu,
Rudi

On Monday 06 April 2015, Douglas Gilbert wrote:
> Ruediger Meier observed a regression with the PREVENT ALLOW
> MEDIUM REMOVAL command in lk 3.19:
>http://www.spinics.net/lists/util-linux-ng/msg11448.html
>
> Inspection indicated the same regression with VERIFY(10).
>
> The patch is against lk 3.19.3 and also works with lk 4.0.0-rc6.
> With this patch both commands are accepted and do nothing.
> VERIFY(10) could be implemented since it shares logic with
> COMPARE AND WRITE which is implemented; however that would
> be an enhancement.
>
> ChangeLog:
>- fix the lk 3.19 regression so that the PREVENT ALLOW
>  MEDIUM REMOVAL command is supported once again
>- same fix for VERIFY(10)
>
> Signed-off-by: Douglas Gilbert 


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/18] ALUA device handler update, part 1

2015-11-20 Thread Christoph Hellwig
On Fri, Nov 20, 2015 at 11:52:21AM +0100, Hannes Reinecke wrote:
> One thing, though: I don't really agree with Barts objection that
> moving to a workqueue would tie in too many resources.
> Thing is, I'm not convinces that using a work queue is allocating
> too many resources (we're speaking of 460 vs 240 bytes here).
> Also we have to retry commands for quite some time (cite the
> infamous NetApp takeover/giveback, which can take minutes).
> If we were to handle that without workqueue we'd have to initiate
> the retry from the end_io callback, causing a quite deep stack
> recursion. Which I'm not really fond of.
> 
> But if anyone has a better idea on how to handle retries without the
> need for workqueues I'm all ears :-)

I tend to agree with you, but you better warm up that discussion again
on the old thread that actually has Bart on Cc.  Or just resend once
Martin has merged this patch, and have discussion around that version.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Finn Thain

On Fri, 20 Nov 2015, Christoph Hellwig wrote:

> On Fri, Nov 20, 2015 at 07:19:21PM +1100, Finn Thain wrote:
>
> > Yes. I didn't do that conversion because I don't have ISA hardware and 
> > I don't understand ISA probing.
> > 
> > The present patch set doesn't seek to resurrect the ISA drivers. But I 
> > am trying to avoid regressions.
> > 
> > I have mixed feelings about the ISA drivers. ISA DMA support 
> > complicates things (it was never completed) and DMA seems to be the 
> > main obstacle to merging the two core driver forks.
> 
> I'd love to be able to get rid of the ISA drivers to be honest.

Is that because of their use of scsi_module.c or their general decrepitude 
or something else?

> Given that they appear to be gravely broken before your cleanups this 
> might be an opportunity to get rid of them.

At this stage, that's unclear (to me). It could be that g_NCR5380.c is not 
broken. It could be that the core driver can't handle certain targets. I 
think we need to do more testing.

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/18] ALUA device handler update, part 1

2015-11-20 Thread Hannes Reinecke
On 11/20/2015 11:47 AM, Christoph Hellwig wrote:
> On Mon, Nov 09, 2015 at 04:08:05PM +0100, Hannes Reinecke wrote:
>> Hi all,
>>
>> here's the first part of my ALUA device handler update.
>> It's mainly bugfixes and minor improvements; the two important
>> things are the addition of VPD parsing functions scsi_vpd_lun_id()
>> and scsi_vpd_tpg_id().
>> This series has been split off from the original 'Asynchronous ALUA'
>> patchset, as these bits are pretty uncontroversial and have a good
>> chance of being merged reasonably soon.
> 
> I've already reviewed most patches, but the other ones looks fine as
> well:
> 
> Reviewed-by: Christoph Hellwig 
> 
> Let's get this into the 4.5 queue ASAP an then tackle the actually
> interesting bits next!
> 
I'm all for it.

One thing, though: I don't really agree with Barts objection that
moving to a workqueue would tie in too many resources.
Thing is, I'm not convinces that using a work queue is allocating
too many resources (we're speaking of 460 vs 240 bytes here).
Also we have to retry commands for quite some time (cite the
infamous NetApp takeover/giveback, which can take minutes).
If we were to handle that without workqueue we'd have to initiate
the retry from the end_io callback, causing a quite deep stack
recursion. Which I'm not really fond of.

But if anyone has a better idea on how to handle retries without the
need for workqueues I'm all ears :-)

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/18] ALUA device handler update, part 1

2015-11-20 Thread Christoph Hellwig
On Mon, Nov 09, 2015 at 04:08:05PM +0100, Hannes Reinecke wrote:
> Hi all,
> 
> here's the first part of my ALUA device handler update.
> It's mainly bugfixes and minor improvements; the two important
> things are the addition of VPD parsing functions scsi_vpd_lun_id()
> and scsi_vpd_tpg_id().
> This series has been split off from the original 'Asynchronous ALUA'
> patchset, as these bits are pretty uncontroversial and have a good
> chance of being merged reasonably soon.

I've already reviewed most patches, but the other ones looks fine as
well:

Reviewed-by: Christoph Hellwig 

Let's get this into the 4.5 queue ASAP an then tackle the actually
interesting bits next!
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: hpsa: select CONFIG_SCSI_SAS_ATTR

2015-11-20 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] scsi: advansys needs ISA dma api for ISA support

2015-11-20 Thread Christoph Hellwig
Loosk good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/9] srpt: chain RDMA READ/WRITE requests

2015-11-20 Thread Christoph Hellwig
On Wed, Nov 18, 2015 at 08:32:59AM -0800, Bart Van Assche wrote:
> As you know events like a cable pull can cause some of the RDMA work 
> requests to succeed and others to fail. It is essential that all RDMA work 
> requests related to the same SCSI command have finished before the buffers 
> these requests operate upon are reused. The purpose of the SRPT_RDMA_ABORT 
> request is to wait for the RDMA requests that were posted without 
> IB_SEND_SIGNALED and for which no error completion will be received. BTW, I 
> think this consideration applies to all SCSI target drivers and not only to 
> SRP target drivers.

I think everyone understand the theroetical issue, but we'd like to
see a practical case that the implementation in isert and my proposed
srpt one don't handle.

Given that chained WRs must not be reordered the HCA must also give
us the completions in the order we submitted them.  Because of that
the previous WRs must have been completed by the time we get the
notification for the last one which usually does the cleanup.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-20 Thread Christoph Hellwig
On Wed, Nov 18, 2015 at 10:20:14AM -0800, Bart Van Assche wrote:
> Are you perhaps referring to the sysfs CPU mask that allows to control 
> workqueue affinity ?

I think he is referring to the defintion of WQ_UNBOUND:

  WQ_UNBOUND

Work items queued to an unbound wq are served by the special
woker-pools which host workers which are not bound to any
specific CPU.  This makes the wq behave as a simple execution
context provider without concurrency management.  The unbound
worker-pools try to start execution of work items as soon as
possible.  Unbound wq sacrifices locality but is useful for
the following cases.

* Wide fluctuation in the concurrency level requirement is
  expected and using bound wq may end up creating large number
  of mostly unused workers across different CPUs as the issuer
  hops through different CPUs.

* Long running CPU intensive workloads which can be better
  managed by the system scheduler.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/srp: Fix a memory leak

2015-11-20 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RESEND 2] SCSI: initio: remove duplicate module device table

2015-11-20 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Christoph Hellwig
On Fri, Nov 20, 2015 at 07:19:21PM +1100, Finn Thain wrote:
> Yes. I didn't do that conversion because I don't have ISA hardware and I 
> don't understand ISA probing.
> 
> The present patch set doesn't seek to resurrect the ISA drivers. But I am 
> trying to avoid regressions.
> 
> I have mixed feelings about the ISA drivers. ISA DMA support complicates 
> things (it was never completed) and DMA seems to be the main obstacle to 
> merging the two core driver forks.

I'd love to be able to get rid of the ISA drivers to be honest.  Given
that they appear to be gravely broken before your cleanups this might
be an opportunity to get rid of them.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Ondrej Zary
On Friday 20 November 2015, Finn Thain wrote:
> 
> On Thu, 19 Nov 2015, Christoph Hellwig wrote:
> 
> > On Fri, Nov 20, 2015 at 06:21:06PM +1100, Finn Thain wrote:
> >
> > > > Not sure what module was being probed here. I presume it was 
> > > > g_NCR5380 or g_NCR5380_mmio. Neither of these calls 
> > > > 'scsi_scan_host'. I'm not sure what the implications are (?)
> > > 
> > > Nevermind. The call is in scsi_module.c.
> > 
> > Which, btw really need to go away.  If you want to resurrect the
> > ISA drivers they need to be converted to proper probing.
> 
> Yes. I didn't do that conversion because I don't have ISA hardware and I 
> don't understand ISA probing.
> 
> The present patch set doesn't seek to resurrect the ISA drivers. But I am 
> trying to avoid regressions.
> 
> I have mixed feelings about the ISA drivers. ISA DMA support complicates 
> things (it was never completed) and DMA seems to be the main obstacle to 
> merging the two core driver forks.

IIRC, my ISA cards can't do DMA either.

-- 
Ondrej Zary
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/71] More fixes, cleanup and modernization for NCR5380 drivers

2015-11-20 Thread Finn Thain

On Thu, 19 Nov 2015, Christoph Hellwig wrote:

> On Fri, Nov 20, 2015 at 06:21:06PM +1100, Finn Thain wrote:
>
> > > Not sure what module was being probed here. I presume it was 
> > > g_NCR5380 or g_NCR5380_mmio. Neither of these calls 
> > > 'scsi_scan_host'. I'm not sure what the implications are (?)
> > 
> > Nevermind. The call is in scsi_module.c.
> 
> Which, btw really need to go away.  If you want to resurrect the
> ISA drivers they need to be converted to proper probing.

Yes. I didn't do that conversion because I don't have ISA hardware and I 
don't understand ISA probing.

The present patch set doesn't seek to resurrect the ISA drivers. But I am 
trying to avoid regressions.

I have mixed feelings about the ISA drivers. ISA DMA support complicates 
things (it was never completed) and DMA seems to be the main obstacle to 
merging the two core driver forks.

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html