Re: [PATCH v2 7/7] [RFC] nvme: Fix a race condition

2016-10-11 Thread Bart Van Assche

On 10/11/16 09:46, Christoph Hellwig wrote:

On Wed, Sep 28, 2016 at 05:01:45PM -0700, Bart Van Assche wrote:

Ensure that nvme_queue_rq() is no longer running when nvme_stop_queues()
returns. Untested.

Signed-off-by: Bart Van Assche 
Cc: Keith Busch 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 
---
 drivers/nvme/host/core.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d791fba..98f1f29 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,13 +201,9 @@ fail:

 void nvme_requeue_req(struct request *req)
 {
-   unsigned long flags;
-
blk_mq_requeue_request(req);
-   spin_lock_irqsave(req->q->queue_lock, flags);
-   if (!blk_mq_queue_stopped(req->q))
-   blk_mq_kick_requeue_list(req->q);
-   spin_unlock_irqrestore(req->q->queue_lock, flags);
+   WARN_ON_ONCE(blk_mq_queue_stopped(req->q));
+   blk_mq_kick_requeue_list(req->q);
 }
 EXPORT_SYMBOL_GPL(nvme_requeue_req);


Can we just add a 'bool kick' argument to blk_mq_requeue_request and
move all this handling to the core?


Hello Christoph,

That sounds like a good idea to me. Thanks also for the other review 
comments you posted on this patch series. I will rework patch 6/7 such 
that the code for waiting is moved into the SCSI core.
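For illustration only, one possible shape of that interface change could look
like the sketch below (the names and final form are assumptions, not what was
eventually merged):

	/* the blk-mq core takes over the "kick the requeue list" decision */
	void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);

	/* nvme_requeue_req() would then collapse to a single call */
	void nvme_requeue_req(struct request *req)
	{
		blk_mq_requeue_request(req, true);
	}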


Bart.


Re: [PATCHv3 13/41] truncate: make sure invalidate_mapping_pages() can discard huge pages

2016-10-11 Thread Kirill A. Shutemov
On Tue, Oct 11, 2016 at 05:58:15PM +0200, Jan Kara wrote:
> On Thu 15-09-16 14:54:55, Kirill A. Shutemov wrote:
> > invalidate_inode_page() has an expectation about the page_count() of the
> > page -- if it's not 2 (one for the caller, one for the radix-tree), the
> > page will not be dropped. That condition is almost never met for THPs --
> > tail pages are pinned to the pagevec.
> > 
> > Let's drop them, before calling invalidate_inode_page().
> > 
> > Signed-off-by: Kirill A. Shutemov 
> > ---
> >  mm/truncate.c | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index a01cce450a26..ce904e4b1708 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -504,10 +504,21 @@ unsigned long invalidate_mapping_pages(struct 
> > address_space *mapping,
> > /* 'end' is in the middle of THP */
> > if (index ==  round_down(end, HPAGE_PMD_NR))
> > continue;
> > +   /*
> > +* invalidate_inode_page() expects
> > +* page_count(page) == 2 to drop page from page
> > +* cache -- drop tail pages references.
> > +*/
> > +   get_page(page);
> > +   pagevec_release(&pvec);
> 
> I'm not quite sure why this is needed. When you have multiorder entry in
> the radix tree for your huge page, then you should not get more entries in
> the pagevec for your huge page. What do I miss?

For compatibility reasons, find_get_entries() (which is called by
pagevec_lookup_entries()) collects all subpages of a huge page in the range
(head/tails). See patch [07/41].

So a huge page that is fully within the range will be pinned up to
PAGEVEC_SIZE times.

-- 
 Kirill A. Shutemov


Re: [PATCHv3 14/41] filemap: allocate huge page in page_cache_read(), if allowed

2016-10-11 Thread Kirill A. Shutemov
On Tue, Oct 11, 2016 at 06:15:45PM +0200, Jan Kara wrote:
> On Thu 15-09-16 14:54:56, Kirill A. Shutemov wrote:
> > This patch adds basic functionality to put a huge page into the page cache.
> > 
> > At the moment we only put huge pages into the radix-tree if the range covered
> > by the huge page is empty.
> > 
> > We ignore shadow entries for now, just remove them from the tree before
> > inserting the huge page.
> > 
> > Later we can add logic to accumulate information from shadow entries to
> > return to the caller (average eviction time?).
> > 
> > Signed-off-by: Kirill A. Shutemov 
> > ---
> >  include/linux/fs.h  |   5 ++
> >  include/linux/pagemap.h |  21 ++-
> >  mm/filemap.c| 148 
> > +++-
> >  3 files changed, 157 insertions(+), 17 deletions(-)
> > 
> ...
> > @@ -663,16 +663,55 @@ static int __add_to_page_cache_locked(struct page 
> > *page,
> > page->index = offset;
> >  
> > spin_lock_irq(&mapping->tree_lock);
> > -   error = page_cache_tree_insert(mapping, page, shadowp);
> > +   if (PageTransHuge(page)) {
> > +   struct radix_tree_iter iter;
> > +   void **slot;
> > +   void *p;
> > +
> > +   error = 0;
> > +
> > +   /* Wipe shadow entries */
> > +   radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, 
> > offset) {
> > +   if (iter.index >= offset + HPAGE_PMD_NR)
> > +   break;
> > +
> > +   p = radix_tree_deref_slot_protected(slot,
> > +   &mapping->tree_lock);
> > +   if (!p)
> > +   continue;
> > +
> > +   if (!radix_tree_exception(p)) {
> > +   error = -EEXIST;
> > +   break;
> > +   }
> > +
> > +   mapping->nrexceptional--;
> > +   rcu_assign_pointer(*slot, NULL);
> 
> I think you also need something like workingset_node_shadows_dec(node)
> here. It would be even better if you used something like
> clear_exceptional_entry() to have the logic in one place (you obviously
> need to factor out only part of clear_exceptional_entry() first).

Good point. Will do.

> > +   }
> > +
> > +   if (!error)
> > +   error = __radix_tree_insert(&mapping->page_tree, offset,
> > +   compound_order(page), page);
> > +
> > +   if (!error) {
> > +   count_vm_event(THP_FILE_ALLOC);
> > +   mapping->nrpages += HPAGE_PMD_NR;
> > +   *shadowp = NULL;
> > +   __inc_node_page_state(page, NR_FILE_THPS);
> > +   }
> > +   } else {
> > +   error = page_cache_tree_insert(mapping, page, shadowp);
> > +   }
> 
> And I'd prefer to have this logic moved to page_cache_tree_insert() because
> logically it IMHO belongs there - it is simply another case of handling
> the radix tree used for the page cache.

Okay.

-- 
 Kirill A. Shutemov


Re: [PATCHv3 12/41] thp: handle write-protection faults for file THP

2016-10-11 Thread Kirill A. Shutemov
On Tue, Oct 11, 2016 at 05:47:50PM +0200, Jan Kara wrote:
> On Thu 15-09-16 14:54:54, Kirill A. Shutemov wrote:
> > For filesystems that want to be write-notified (have mkwrite), we will
> > encounter write-protection faults for huge PMDs in shared mappings.
> > 
> > The easiest way to handle them is to clear the PMD and let it refault as
> > writable.
> > 
> > Signed-off-by: Kirill A. Shutemov 
> > ---
> >  mm/memory.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 83be99d9d8a1..aad8d5c6311f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3451,8 +3451,17 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t 
> > orig_pmd)
> > return fe->vma->vm_ops->pmd_fault(fe->vma, fe->address, fe->pmd,
> > fe->flags);
> >  
> > +   if (fe->vma->vm_flags & VM_SHARED) {
> > +   /* Clear PMD */
> > +   zap_page_range_single(fe->vma, fe->address,
> > +   HPAGE_PMD_SIZE, NULL);
> > +   VM_BUG_ON(!pmd_none(*fe->pmd));
> > +
> > +   /* Refault to establish writable PMD */
> > +   return 0;
> > +   }
> > +
> 
> Since we want to write-protect the page table entry on each page writeback
> and write-enable it on the next write, this is relatively expensive.
> Would it be that complicated to handle this fully in ->pmd_fault handler
> like we do for DAX?
> 
> Maybe it doesn't have to be done now but longer term I guess it might make
> sense.

Right. This approach is just simpler to implement. We can rework it if it
shows up in traces.

> Otherwise the patch looks good so feel free to add:
> 
> Reviewed-by: Jan Kara 

Thanks!

-- 
 Kirill A. Shutemov


Re: [PATCHv3 11/41] thp: try to free page's buffers before attempt split

2016-10-11 Thread Kirill A. Shutemov
On Tue, Oct 11, 2016 at 05:40:31PM +0200, Jan Kara wrote:
> On Thu 15-09-16 14:54:53, Kirill A. Shutemov wrote:
> > We want the page to be isolated from the rest of the system before splitting
> > it. We rely on the page count being 2 for file pages to make sure nobody
> > uses the page: one pin for the caller, one for the radix-tree.
> > 
> > Filesystems with backing storage can have the page count increased if the
> > page has buffers.
> > 
> > Let's try to free them before attempting the split. And remove one guarding
> > VM_BUG_ON_PAGE().
> > 
> > Signed-off-by: Kirill A. Shutemov 
> ...
> > @@ -2041,6 +2041,23 @@ int split_huge_page_to_list(struct page *page, 
> > struct list_head *list)
> > goto out;
> > }
> >  
> > +   /* Try to free buffers before attempting split */
> > +   if (!PageSwapBacked(head) && PagePrivate(page)) {
> > +   /*
> > +* We cannot trigger writeback from here due to possible
> > +* recursion if triggered from vmscan, only wait.
> > +*
> > +* Caller can trigger writeback on its own, if safe.
> > +*/
> > +   wait_on_page_writeback(head);
> > +
> > +   if (page_has_buffers(head) &&
> > +   !try_to_free_buffers(head)) {
> > +   ret = -EBUSY;
> > +   goto out;
> > +   }
> 
> Shouldn't you rather use try_to_release_page() here? Because filesystems
> have their ->releasepage() callbacks for freeing data associated with a
> page. It is not guaranteed that page private data are buffers, although it
> is true for ext4...

Fair enough. Will fix this.

-- 
 Kirill A. Shutemov


[PATCH v3 0/2] Enabling ATA Command Priorities

2016-10-11 Thread Adam Manzanares
This patch builds ATA commands with high priority if the iocontext
of a process is set to real time. The goal of the patch is to
improve tail latencies of workloads that use higher queue depths.

This patch has been tested with an Ultrastar HE8 HDD and cuts the
p99.99 tail latency of foreground IO from 2s down to 72ms when
using the deadline scheduler. This patch works independently of the
scheduler, so it can be used with all of the currently available
request-based schedulers.

Foreground IO, for the previously described results, is an async fio job 
submitting 4K read requests at a QD of 1 to the HDD. The foreground IO is set 
with the iopriority class of real time. The background workload is another fio
job submitting read requests at a QD of 32 to the same HDD with default 
iopriority.

This feature is enabled by setting a queue flag that is exposed as a sysfs
entry named rq_ioc_prio. If this feature is enabled, and the submission
iocontext exists, and the bio_prio is not valid, then the request ioprio is
set to the iocontext prio.
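
As a usage illustration (the device name and fio job below are placeholders,
not taken from the patches):

  # enable iocontext-to-request priority passing for the device
  echo 1 > /sys/block/sdX/queue/rq_ioc_prio

  # run the latency-sensitive job in the real-time I/O priority class
  ionice -c 1 fio --name=fg --filename=/dev/sdX --direct=1 --rw=randread \
      --bs=4k --iodepth=1 --runtime=60 --time_based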

v3:
 - Removed null dereference issue in blk-core
 - Renamed queue sysfs entries for clarity
 - Added documentation for sysfs queue entry

v2:
 - Add queue flag to set iopriority going to the request
 - If queue flag set, send iopriority class to ata_build_rw_tf
 - Remove redundant code in ata_ncq_prio_enabled function.


Adam Manzanares (2):
  block: Add iocontext priority to request
  ata: Enabling ATA Command Priorities

 Documentation/block/queue-sysfs.txt | 12 
 block/blk-core.c|  5 +
 block/blk-sysfs.c   | 32 
 drivers/ata/libata-core.c   | 35 ++-
 drivers/ata/libata-scsi.c   | 10 +-
 drivers/ata/libata.h|  2 +-
 include/linux/ata.h |  6 ++
 include/linux/blkdev.h  |  3 +++
 include/linux/libata.h  | 18 ++
 9 files changed, 120 insertions(+), 3 deletions(-)

-- 
2.1.4



[PATCH v3 2/2] ata: Enabling ATA Command Priorities

2016-10-11 Thread Adam Manzanares
This patch checks whether an ATA device supports NCQ command priorities.
If so, and the user has specified an iocontext that indicates IOPRIO_CLASS_RT,
and request priorities are enabled in the block queue, then we build a taskfile
with a high-priority command.

This patch depends on patch block-Add-iocontext-priority-to-request

Signed-off-by: Adam Manzanares 
---
 drivers/ata/libata-core.c | 35 ++-
 drivers/ata/libata-scsi.c | 10 +-
 drivers/ata/libata.h  |  2 +-
 include/linux/ata.h   |  6 ++
 include/linux/libata.h| 18 ++
 5 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 223a770..181b530 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -739,6 +739,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  * @n_block: Number of blocks
  * @tf_flags: RW/FUA etc...
  * @tag: tag
+ * @class: IO priority class
  *
  * LOCKING:
  * None.
@@ -753,7 +754,7 @@ u64 ata_tf_read_block(const struct ata_taskfile *tf, struct 
ata_device *dev)
  */
 int ata_build_rw_tf(struct ata_taskfile *tf, struct ata_device *dev,
u64 block, u32 n_block, unsigned int tf_flags,
-   unsigned int tag)
+   unsigned int tag, int class)
 {
tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
tf->flags |= tf_flags;
@@ -785,6 +786,12 @@ int ata_build_rw_tf(struct ata_taskfile *tf, struct 
ata_device *dev,
tf->device = ATA_LBA;
if (tf->flags & ATA_TFLAG_FUA)
tf->device |= 1 << 7;
+
+   if (ata_ncq_prio_enabled(dev)) {
+   if (class == IOPRIO_CLASS_RT)
+   tf->hob_nsect |= ATA_PRIO_HIGH <<
+ATA_SHIFT_PRIO;
+   }
} else if (dev->flags & ATA_DFLAG_LBA) {
tf->flags |= ATA_TFLAG_LBA;
 
@@ -2156,6 +2163,30 @@ static void ata_dev_config_ncq_non_data(struct 
ata_device *dev)
}
 }
 
+static void ata_dev_config_ncq_prio(struct ata_device *dev)
+{
+   struct ata_port *ap = dev->link->ap;
+   unsigned int err_mask;
+
+   err_mask = ata_read_log_page(dev,
+ATA_LOG_SATA_ID_DEV_DATA,
+ATA_LOG_SATA_SETTINGS,
+ap->sector_buf,
+1);
+   if (err_mask) {
+   ata_dev_dbg(dev,
+   "failed to get Identify Device data, Emask 0x%x\n",
+   err_mask);
+   return;
+   }
+
+   if (ap->sector_buf[ATA_LOG_NCQ_PRIO_OFFSET] & BIT(3))
+   dev->flags |= ATA_DFLAG_NCQ_PRIO;
+   else
+   ata_dev_dbg(dev, "SATA page does not support priority\n");
+
+}
+
 static int ata_dev_config_ncq(struct ata_device *dev,
   char *desc, size_t desc_sz)
 {
@@ -2205,6 +2236,8 @@ static int ata_dev_config_ncq(struct ata_device *dev,
ata_dev_config_ncq_send_recv(dev);
if (ata_id_has_ncq_non_data(dev->id))
ata_dev_config_ncq_non_data(dev);
+   if (ata_id_has_ncq_prio(dev->id))
+   ata_dev_config_ncq_prio(dev);
}
 
return 0;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e207b33..4304694 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "libata.h"
 #include "libata-transport.h"
@@ -1757,6 +1758,8 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
 {
struct scsi_cmnd *scmd = qc->scsicmd;
const u8 *cdb = scmd->cmnd;
+   struct request *rq = scmd->request;
+   int class = 0;
unsigned int tf_flags = 0;
u64 block;
u32 n_block;
@@ -1822,8 +1825,13 @@ static unsigned int ata_scsi_rw_xlat(struct 
ata_queued_cmd *qc)
qc->flags |= ATA_QCFLAG_IO;
qc->nbytes = n_block * scmd->device->sector_size;
 
+   /* If queue supports req prio pass it onto the task file */
+   if (blk_queue_rq_ioc_prio(rq->q))
+   class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq));
+
rc = ata_build_rw_tf(&qc->tf, qc->dev, block, n_block, tf_flags,
-qc->tag);
+qc->tag, class);
+
if (likely(rc == 0))
return 0;
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 3b301a4..8f3a559 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -66,7 +66,7 @@ extern u64 ata_tf_to_lba48(const struct ata_taskfile *tf);
 extern struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev, int tag);
 extern int ata_build_rw_tf(struct 

[PATCH v3 1/2] block: Add iocontext priority to request

2016-10-11 Thread Adam Manzanares
This patch adds an association between the iocontext ioprio and the ioprio of
a request. This feature is only enabled if a queue flag is set to
indicate that requests should have an ioprio associated with them. The
queue flag is exposed as the rq_ioc_prio queue sysfs entry.

Signed-off-by: Adam Manzanares 
---
 Documentation/block/queue-sysfs.txt | 12 
 block/blk-core.c|  5 +
 block/blk-sysfs.c   | 32 
 include/linux/blkdev.h  |  3 +++
 4 files changed, 52 insertions(+)

diff --git a/Documentation/block/queue-sysfs.txt 
b/Documentation/block/queue-sysfs.txt
index 2a39040..3ca4e8f 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -144,6 +144,18 @@ For storage configurations that need to maximize 
distribution of completion
 processing setting this option to '2' forces the completion to run on the
 requesting cpu (bypassing the "group" aggregation logic).
 
+rq_ioc_prio (RW)
+
+If this option is '1', and there is a valid iocontext associated with the
+issuing context, and the bio we are processing does not have a valid
+prio, then we save the prio value from the iocontext with the request.
+
+This feature can be combined with device drivers that are aware of prio
+values in order to handle prio accordingly. An example would be if the ata
+layer recognizes prio and creates ata commands with high priority and sends
+them to the device. If the hardware supports priorities for commands then
+this has the potential to speed up response times for high priority IO.
+
 scheduler (RW)
 --
 When read, this file will display the current and available IO schedulers
diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..2e740c4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -1648,6 +1649,7 @@ unsigned int blk_plug_queued_count(struct request_queue 
*q)
 
 void init_request_from_bio(struct request *req, struct bio *bio)
 {
+   struct io_context *ioc = rq_ioc(bio);
req->cmd_type = REQ_TYPE_FS;
 
req->cmd_flags |= bio->bi_opf & REQ_COMMON_MASK;
@@ -1657,6 +1659,9 @@ void init_request_from_bio(struct request *req, struct 
bio *bio)
req->errors = 0;
req->__sector = bio->bi_iter.bi_sector;
req->ioprio = bio_prio(bio);
+   if (blk_queue_rq_ioc_prio(req->q) && !ioprio_valid(req->ioprio) && ioc)
+   req->ioprio = ioc->ioprio;
+
blk_rq_bio_prep(req->q, req, bio);
 }
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..a9c5105 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -384,6 +384,31 @@ static ssize_t queue_dax_show(struct request_queue *q, 
char *page)
return queue_var_show(blk_queue_dax(q), page);
 }
 
+static ssize_t queue_rq_ioc_prio_show(struct request_queue *q, char *page)
+{
+   return queue_var_show(blk_queue_rq_ioc_prio(q), page);
+}
+
+static ssize_t queue_rq_ioc_prio_store(struct request_queue *q,
+  const char *page, size_t count)
+{
+   unsigned long rq_ioc_prio_on;
+   ssize_t ret;
+
+   ret = queue_var_store(&rq_ioc_prio_on, page, count);
+   if (ret < 0)
+   return ret;
+
+   spin_lock_irq(q->queue_lock);
+   if (rq_ioc_prio_on)
+   queue_flag_set(QUEUE_FLAG_RQ_IOC_PRIO, q);
+   else
+   queue_flag_clear(QUEUE_FLAG_RQ_IOC_PRIO, q);
+   spin_unlock_irq(q->queue_lock);
+
+   return ret;
+}
+
 static struct queue_sysfs_entry queue_requests_entry = {
.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
.show = queue_requests_show,
@@ -526,6 +551,12 @@ static struct queue_sysfs_entry queue_dax_entry = {
.show = queue_dax_show,
 };
 
+static struct queue_sysfs_entry queue_rq_ioc_prio_entry = {
+   .attr = {.name = "rq_ioc_prio", .mode = S_IRUGO | S_IWUSR },
+   .show = queue_rq_ioc_prio_show,
+   .store = queue_rq_ioc_prio_store,
+};
+
 static struct attribute *default_attrs[] = {
	&queue_requests_entry.attr,
	&queue_ra_entry.attr,
@@ -553,6 +584,7 @@ static struct attribute *default_attrs[] = {
	&queue_poll_entry.attr,
	&queue_wc_entry.attr,
	&queue_dax_entry.attr,
+	&queue_rq_ioc_prio_entry.attr,
NULL,
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..63b842a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@ struct request_queue {
#define QUEUE_FLAG_FUA 24   /* device supports FUA writes */
#define QUEUE_FLAG_FLUSH_NQ 25  /* flush not queueuable */
 #define QUEUE_FLAG_DAX 26  /* device supports DAX */
+#define QUEUE_FLAG_RQ_IOC_PRIO 27  /* Use iocontext ioprio */
 
 #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\
 

Re: [PATCH 2/6] ipr: use pci_irq_allocate_vectors

2016-10-11 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> Switch the ipr driver to use pci_alloc_irq_vectors.  We need
Christoph> two calls to pci_alloc_irq_vectors as ipr only supports
Christoph> multiple MSI-X vectors, but not multiple MSI vectors.

Christoph> Otherwise this cleans up a lot of cruft and allows us to use a
Christoph> common request_irq loop for irq types, which happens to only
Christoph> iterate over a single line in the non MSI-X case.
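
For readers unfamiliar with the new API, the two-call pattern plus common
request_irq loop described above looks roughly like the sketch below (an
illustration only, not the actual ipr conversion; handler and data names are
placeholders):

	/* prefer multiple MSI-X vectors ... */
	nvec = pci_alloc_irq_vectors(pdev, 1, max_vec, PCI_IRQ_MSIX);
	if (nvec < 0)
		/* ... otherwise fall back to a single MSI or legacy vector */
		nvec = pci_alloc_irq_vectors(pdev, 1, 1,
					     PCI_IRQ_MSI | PCI_IRQ_LEGACY);

	/* common request_irq loop; iterates only once in the non-MSI-X case */
	for (i = 0; i < nvec; i++)
		rc = request_irq(pci_irq_vector(pdev, i), driver_interrupt,
				 0, "driver-name", driver_data);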

Applied to 4.10/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/6] arcmsr: use pci_alloc_irq_vectors

2016-10-11 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> Switch the arcmsr driver to use pci_alloc_irq_vectors.  We
Christoph> need two calls to pci_alloc_irq_vectors as arcmsr only
Christoph> supports multiple MSI-X vectors, but not multiple MSI
Christoph> vectors.

Christoph> Otherwise this cleans up a lot of cruft and allows us to use a
Christoph> common request_irq loop for irq types, which happens to only
Christoph> iterate over a single line in the non MSI-X case.

Applied to 4.10/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCHv3 13/41] truncate: make sure invalidate_mapping_pages() can discard huge pages

2016-10-11 Thread Jan Kara
On Thu 15-09-16 14:54:55, Kirill A. Shutemov wrote:
> invalidate_inode_page() has an expectation about the page_count() of the
> page -- if it's not 2 (one for the caller, one for the radix-tree), the
> page will not be dropped. That condition is almost never met for THPs --
> tail pages are pinned to the pagevec.
> 
> Let's drop them, before calling invalidate_inode_page().
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  mm/truncate.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/truncate.c b/mm/truncate.c
> index a01cce450a26..ce904e4b1708 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -504,10 +504,21 @@ unsigned long invalidate_mapping_pages(struct 
> address_space *mapping,
>   /* 'end' is in the middle of THP */
>   if (index ==  round_down(end, HPAGE_PMD_NR))
>   continue;
> + /*
> +  * invalidate_inode_page() expects
> +  * page_count(page) == 2 to drop page from page
> +  * cache -- drop tail pages references.
> +  */
> + get_page(page);
> + pagevec_release(&pvec);

I'm not quite sure why this is needed. When you have multiorder entry in
the radix tree for your huge page, then you should not get more entries in
the pagevec for your huge page. What do I miss?

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCHv3 11/41] thp: try to free page's buffers before attempt split

2016-10-11 Thread Jan Kara
On Thu 15-09-16 14:54:53, Kirill A. Shutemov wrote:
> We want the page to be isolated from the rest of the system before splitting
> it. We rely on the page count being 2 for file pages to make sure nobody
> uses the page: one pin for the caller, one for the radix-tree.
> 
> Filesystems with backing storage can have the page count increased if the
> page has buffers.
> 
> Let's try to free them before attempting the split. And remove one guarding
> VM_BUG_ON_PAGE().
> 
> Signed-off-by: Kirill A. Shutemov 
...
> @@ -2041,6 +2041,23 @@ int split_huge_page_to_list(struct page *page, struct 
> list_head *list)
>   goto out;
>   }
>  
> + /* Try to free buffers before attempting split */
> + if (!PageSwapBacked(head) && PagePrivate(page)) {
> + /*
> +  * We cannot trigger writeback from here due to possible
> +  * recursion if triggered from vmscan, only wait.
> +  *
> +  * Caller can trigger writeback on its own, if safe.
> +  */
> + wait_on_page_writeback(head);
> +
> + if (page_has_buffers(head) &&
> + !try_to_free_buffers(head)) {
> + ret = -EBUSY;
> + goto out;
> + }

Shouldn't you rather use try_to_release_page() here? Because filesystems
have their ->releasepage() callbacks for freeing data associated with a
page. It is not guaranteed that page private data are buffers, although it
is true for ext4...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCHv3 12/41] thp: handle write-protection faults for file THP

2016-10-11 Thread Jan Kara
On Thu 15-09-16 14:54:54, Kirill A. Shutemov wrote:
> For filesystems that want to be write-notified (have mkwrite), we will
> encounter write-protection faults for huge PMDs in shared mappings.
> 
> The easiest way to handle them is to clear the PMD and let it refault as
> writable.
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  mm/memory.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 83be99d9d8a1..aad8d5c6311f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3451,8 +3451,17 @@ static int wp_huge_pmd(struct fault_env *fe, pmd_t 
> orig_pmd)
>   return fe->vma->vm_ops->pmd_fault(fe->vma, fe->address, fe->pmd,
>   fe->flags);
>  
> + if (fe->vma->vm_flags & VM_SHARED) {
> + /* Clear PMD */
> + zap_page_range_single(fe->vma, fe->address,
> + HPAGE_PMD_SIZE, NULL);
> + VM_BUG_ON(!pmd_none(*fe->pmd));
> +
> + /* Refault to establish writable PMD */
> + return 0;
> + }
> +

Since we want to write-protect the page table entry on each page writeback
and write-enable it on the next write, this is relatively expensive.
Would it be that complicated to handle this fully in ->pmd_fault handler
like we do for DAX?

Maybe it doesn't have to be done now but longer term I guess it might make
sense.

Otherwise the patch looks good so feel free to add:

Reviewed-by: Jan Kara 

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCHv3 14/41] filemap: allocate huge page in page_cache_read(), if allowed

2016-10-11 Thread Jan Kara
On Thu 15-09-16 14:54:56, Kirill A. Shutemov wrote:
> This patch adds basic functionality to put a huge page into the page cache.
> 
> At the moment we only put huge pages into the radix-tree if the range covered
> by the huge page is empty.
> 
> We ignore shadow entries for now, just remove them from the tree before
> inserting the huge page.
> 
> Later we can add logic to accumulate information from shadow entries to
> return to the caller (average eviction time?).
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  include/linux/fs.h  |   5 ++
>  include/linux/pagemap.h |  21 ++-
>  mm/filemap.c| 148 
> +++-
>  3 files changed, 157 insertions(+), 17 deletions(-)
> 
...
> @@ -663,16 +663,55 @@ static int __add_to_page_cache_locked(struct page *page,
>   page->index = offset;
>  
>   spin_lock_irq(&mapping->tree_lock);
> - error = page_cache_tree_insert(mapping, page, shadowp);
> + if (PageTransHuge(page)) {
> + struct radix_tree_iter iter;
> + void **slot;
> + void *p;
> +
> + error = 0;
> +
> + /* Wipe shadow entries */
> + radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, 
> offset) {
> + if (iter.index >= offset + HPAGE_PMD_NR)
> + break;
> +
> + p = radix_tree_deref_slot_protected(slot,
> + &mapping->tree_lock);
> + if (!p)
> + continue;
> +
> + if (!radix_tree_exception(p)) {
> + error = -EEXIST;
> + break;
> + }
> +
> + mapping->nrexceptional--;
> + rcu_assign_pointer(*slot, NULL);

I think you also need something like workingset_node_shadows_dec(node)
here. It would be even better if you used something like
clear_exceptional_entry() to have the logic in one place (you obviously
need to factor out only part of clear_exceptional_entry() first).

> + }
> +
> + if (!error)
> + error = __radix_tree_insert(&mapping->page_tree, offset,
> + compound_order(page), page);
> +
> + if (!error) {
> + count_vm_event(THP_FILE_ALLOC);
> + mapping->nrpages += HPAGE_PMD_NR;
> + *shadowp = NULL;
> + __inc_node_page_state(page, NR_FILE_THPS);
> + }
> + } else {
> + error = page_cache_tree_insert(mapping, page, shadowp);
> + }

And I'd prefer to have this logic moved to page_cache_tree_insert() because
logically it IMHO belongs there - it is simply another case of handling
the radix tree used for the page cache.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq

2016-10-11 Thread Christoph Hellwig
On Wed, Oct 05, 2016 at 02:51:50PM -0700, Bart Van Assche wrote:
> There are multiple direct blk_*() calls in other SCSI transport drivers. So 
> my proposal is to hold off on moving this code into scsi_lib.c until there is 
> a second user of this code.

I still don't think these low-level differences between blk-mq and legacy
requests belong in a SCSI LLDD.  So I concur with Sagi that this
should go into the core SCSI code.

In fact I suspect we should just call it directly from
scsi_internal_device_block, and maybe even scsi_internal_device_unblock
for the case of setting the device offline.
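
A rough sketch of that suggestion (the wait helper name is hypothetical here
and stands in for whatever the reworked patch 6/7 adds to scsi_lib.c):

	static int scsi_internal_device_block(struct scsi_device *sdev)
	{
		int err = scsi_device_set_state(sdev, SDEV_BLOCK);

		if (err)
			return err;

		/* existing code that stops the blk-mq / legacy queue goes here */

		/* then wait until every ongoing queue_rq()/request_fn() call
		 * for this device has returned, without draining requests */
		scsi_wait_for_queuecommand(sdev);	/* hypothetical helper */

		return 0;
	}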


Re: [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped()

2016-10-11 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 0/9] Introduce blk_quiesce_queue() and blk_resume_queue()

2016-10-11 Thread Laurence Oberman


- Original Message -
> From: "Bart Van Assche" 
> To: "Jens Axboe" 
> Cc: "Christoph Hellwig" , "James Bottomley" 
> , "Martin K. Petersen"
> , "Mike Snitzer" , "Doug 
> Ledford" , "Keith
> Busch" , linux-block@vger.kernel.org, 
> linux-s...@vger.kernel.org, linux-r...@vger.kernel.org,
> linux-n...@lists.infradead.org
> Sent: Monday, September 26, 2016 2:25:54 PM
> Subject: [PATCH 0/9] Introduce blk_quiesce_queue() and blk_resume_queue()
> 
> Hello Jens,
> 
> Multiple block drivers need the functionality to stop a request queue
> and to wait until all ongoing request_fn() / queue_rq() calls have
> finished without waiting until all outstanding requests have finished.
> Hence this patch series that introduces the blk_quiesce_queue() and
> blk_resume_queue() functions. The dm-mq, SRP and nvme patches in this
> patch series are three examples of where these functions are useful.
> These patches apply on top of the September 21 version of your
> for-4.9/block branch. The individual patches in this series are:
> 
> 0001-blk-mq-Introduce-blk_mq_queue_stopped.patch
> 0002-dm-Fix-a-race-condition-related-to-stopping-and-star.patch
> 0003-RFC-nvme-Use-BLK_MQ_S_STOPPED-instead-of-QUEUE_FLAG_.patch
> 0004-block-Move-blk_freeze_queue-and-blk_unfreeze_queue-c.patch
> 0005-block-Extend-blk_freeze_queue_start-to-the-non-blk-m.patch
> 0006-block-Rename-mq_freeze_wq-and-mq_freeze_depth.patch
> 0007-blk-mq-Introduce-blk_quiesce_queue-and-blk_resume_qu.patch
> 0008-SRP-transport-Port-srp_wait_for_queuecommand-to-scsi.patch
> 0009-RFC-nvme-Fix-a-race-condition.patch
> 
> Thanks,
> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
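
A minimal sketch of the usage pattern described in the quoted cover letter
(the function names are the ones proposed there; the surrounding driver code
is invented for illustration):

	/* block new request_fn()/queue_rq() invocations and wait for the
	 * calls already in progress, without draining outstanding requests */
	blk_quiesce_queue(q);

	/* reconfigure the device, switch paths, etc. */

	blk_resume_queue(q);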
Hello

I took Bart's latest patches from his tree and ran all of the SRP/RDMA tests
and as many of the nvme tests as possible.

Everything is passing my tests, including SRP port resets etc.
The nvme tests were all on a small intel nvme card.

Tested-by: Laurence Oberman 


[PATCH 35/44] block: add reference counting for struct bsg_job

2016-10-11 Thread Johannes Thumshirn
Add reference counting to 'struct bsg_job' so we can implement a request
timeout handler for bsg_jobs, which is needed for Fibre Channel.

Signed-off-by: Johannes Thumshirn 
---
 block/bsg-lib.c | 7 +--
 include/linux/bsg-lib.h | 2 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 650f427..632fb40 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -32,8 +32,10 @@
  * bsg_destroy_job - routine to teardown/delete a bsg job
  * @job: bsg_job that is to be torn down
  */
-static void bsg_destroy_job(struct bsg_job *job)
+static void bsg_destroy_job(struct kref *kref)
 {
+   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
+
put_device(job->dev);   /* release reference for the request */
 
kfree(job->request_payload.sg_list);
@@ -84,7 +86,7 @@ static void bsg_softirq_done(struct request *rq)
struct bsg_job *job = rq->special;
 
blk_end_request_all(rq, rq->errors);
-   bsg_destroy_job(job);
+   kref_put(&job->kref, bsg_destroy_job);
 }
 
 static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
@@ -142,6 +144,7 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
job->dev = dev;
/* take a reference for the request */
get_device(job->dev);
+   kref_init(&job->kref);
return 0;
 
 failjob_rls_rqst_payload:
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index a226652..58e0717 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -40,6 +40,8 @@ struct bsg_job {
struct device *dev;
struct request *req;
 
+   struct kref kref;
+
/* Transport/driver specific request/reply structs */
void *request;
void *reply;
-- 
1.8.5.6



[PATCH 42/44] block: add bsg_job_put() and bsg_job_get()

2016-10-11 Thread Johannes Thumshirn
Add bsg_job_put() and bsg_job_get() so we don't need to export
bsg_destroy_job() any more.

Signed-off-by: Johannes Thumshirn 
---
 block/bsg-lib.c  | 17 ++---
 drivers/scsi/scsi_transport_fc.c |  2 +-
 include/linux/bsg-lib.h  |  3 ++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 5d24d25..4bf3a98 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -32,7 +32,7 @@
  * bsg_destroy_job - routine to teardown/delete a bsg job
  * @job: bsg_job that is to be torn down
  */
-void bsg_destroy_job(struct kref *kref)
+static void bsg_destroy_job(struct kref *kref)
 {
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
 
@@ -42,7 +42,18 @@ void bsg_destroy_job(struct kref *kref)
kfree(job->reply_payload.sg_list);
kfree(job);
 }
-EXPORT_SYMBOL_GPL(bsg_destroy_job);
+
+void bsg_job_put(struct bsg_job *job)
+{
+   kref_put(&job->kref, bsg_destroy_job);
+}
+EXPORT_SYMBOL_GPL(bsg_job_put);
+
+void bsg_job_get(struct bsg_job *job)
+{
+   kref_get(&job->kref);
+}
+EXPORT_SYMBOL_GPL(bsg_job_get);
 
 /**
  * bsg_job_done - completion routine for bsg requests
@@ -87,7 +98,7 @@ void bsg_softirq_done(struct request *rq)
struct bsg_job *job = rq->special;
 
blk_end_request_all(rq, rq->errors);
-   kref_put(>kref, bsg_destroy_job);
+   bsg_job_put(job);
 }
 EXPORT_SYMBOL_GPL(bsg_softirq_done);
 
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 720ddc9..34652e2 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3577,7 +3577,7 @@ fc_bsg_job_timeout(struct request *req)
/* call LLDD to abort the i/o as it has timed out */
err = i->f->bsg_timeout(job);
if (err == -EAGAIN) {
-   kref_put(&job->kref, bsg_destroy_job);
+   bsg_job_put(job);
return BLK_EH_RESET_TIMER;
} else if (err)
printk(KERN_ERR "ERROR: FC BSG request timeout - LLD "
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index 09f3044..267d7ee 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -69,7 +69,8 @@ void bsg_job_done(struct bsg_job *job, int result,
 int bsg_setup_queue(struct device *dev, struct request_queue *q, char *name,
bsg_job_fn *job_fn, int dd_job_size);
 void bsg_request_fn(struct request_queue *q);
-void bsg_destroy_job(struct kref *kref);
 void bsg_softirq_done(struct request *rq);
+void bsg_job_put(struct bsg_job *job);
+void bsg_job_get(struct bsg_job *job);
 
 #endif
-- 
1.8.5.6



[PATCH 37/44] block: export bsg_destroy_job

2016-10-11 Thread Johannes Thumshirn
Export bsg_destroy_job so we can use it from clients of bsg-lib.

Signed-off-by: Johannes Thumshirn 
---
 block/bsg-lib.c | 3 ++-
 include/linux/bsg-lib.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 632fb40..6b99c7f 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -32,7 +32,7 @@
  * bsg_destroy_job - routine to teardown/delete a bsg job
  * @job: bsg_job that is to be torn down
  */
-static void bsg_destroy_job(struct kref *kref)
+void bsg_destroy_job(struct kref *kref)
 {
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
 
@@ -42,6 +42,7 @@ static void bsg_destroy_job(struct kref *kref)
kfree(job->reply_payload.sg_list);
kfree(job);
 }
+EXPORT_SYMBOL_GPL(bsg_destroy_job);
 
 /**
  * bsg_job_done - completion routine for bsg requests
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index 58e0717..67f7de6 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -69,5 +69,6 @@ void bsg_job_done(struct bsg_job *job, int result,
 int bsg_setup_queue(struct device *dev, struct request_queue *q, char *name,
bsg_job_fn *job_fn, int dd_job_size);
 void bsg_request_fn(struct request_queue *q);
+void bsg_destroy_job(struct kref *kref);
 
 #endif
-- 
1.8.5.6



[PATCH 44/44] block: unexport bsg_softirq_done() again

2016-10-11 Thread Johannes Thumshirn
Unexport bsg_softirq_done() again, we don't need it outside of bsg-lib.c
anymore now that scsi_transport_fc is a pure bsg-lib client.

Signed-off-by: Johannes Thumshirn 
---
 block/bsg-lib.c | 3 +--
 include/linux/bsg-lib.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 4bf3a98..71f3865 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -93,14 +93,13 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
  * bsg_softirq_done - softirq done routine for destroying the bsg requests
  * @rq: BSG request that holds the job to be destroyed
  */
-void bsg_softirq_done(struct request *rq)
+static void bsg_softirq_done(struct request *rq)
 {
struct bsg_job *job = rq->special;
 
blk_end_request_all(rq, rq->errors);
bsg_job_put(job);
 }
-EXPORT_SYMBOL_GPL(bsg_softirq_done);
 
 static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
 {
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index 267d7ee..a458d36 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -69,7 +69,6 @@ void bsg_job_done(struct bsg_job *job, int result,
 int bsg_setup_queue(struct device *dev, struct request_queue *q, char *name,
bsg_job_fn *job_fn, int dd_job_size);
 void bsg_request_fn(struct request_queue *q);
-void bsg_softirq_done(struct request *rq);
 void bsg_job_put(struct bsg_job *job);
 void bsg_job_get(struct bsg_job *job);
 
-- 
1.8.5.6



Re: [PATCH][V3] nbd: add multi-connection support

2016-10-11 Thread Sagi Grimberg



NBD can become contended on its single connection.  We have to serialize all
writes and we can only process one read response at a time.  Fix this by
allowing userspace to provide multiple connections to a single nbd device.  This
coupled with block-mq drastically increases performance in multi-process cases.
Thanks,


Hey Josef,

I gave this patch a try and I'm getting a kernel paging request failure when
running a multi-threaded write workload [1].

I have 2 VMs on my laptop: each is assigned 2 CPUs. I connected
the client to the server via 2 connections and ran:
fio --group_reporting --rw=randwrite --bs=4k --numjobs=2 --iodepth=128 
--runtime=60 --time_based --loops=1 --ioengine=libaio --direct=1 
--invalidate=1 --randrepeat=1 --norandommap --exitall --name task_nbd0 
--filename=/dev/nbd0


The server backend is null_blk btw:
./nbd-server 1022 /dev/nullb0

nbd-client:
./nbd-client -C 2 192.168.100.3 1022 /dev/nbd0

[1]:
[  171.813649] BUG: unable to handle kernel paging request at 
000235363130

[  171.816015] IP: [] nbd_queue_rq+0x319/0x580 [nbd]
[  171.816015] PGD 7a080067 PUD 0
[  171.816015] Oops:  [#1] SMP
[  171.816015] Modules linked in: nbd(O) rpcsec_gss_krb5 nfsv4 ib_iser 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
snd_hda_codec_generic ppdev kvm_intel cirrus snd_hda_intel ttm kvm 
irqbypass drm_kms_helper snd_hda_codec drm snd_hda_core snd_hwdep joydev 
input_leds fb_sys_fops snd_pcm serio_raw syscopyarea snd_timer 
sysfillrect snd sysimgblt soundcore i2c_piix4 nfsd ib_umad parport_pc 
auth_rpcgss nfs_acl rdma_ucm nfs rdma_cm iw_cm lockd grace ib_cm 
configfs sunrpc ib_uverbs mac_hid fscache ib_core lp parport psmouse 
floppy e1000 pata_acpi [last unloaded: nbd]
[  171.816015] CPU: 0 PID: 196 Comm: kworker/0:1H Tainted: G   O 
   4.8.0-rc4+ #61
[  171.816015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS Bochs 01/01/2011

[  171.816015] Workqueue: kblockd blk_mq_run_work_fn
[  171.816015] task: 8f0b37b23280 task.stack: 8f0b37bf
[  171.816015] RIP: 0010:[]  [] 
nbd_queue_rq+0x319/0x580 [nbd]

[  171.816015] RSP: 0018:8f0b37bf3c20  EFLAGS: 00010206
[  171.816015] RAX: 000235363130 RBX:  RCX: 
0200
[  171.816015] RDX: 0200 RSI: 8f0b37b23b48 RDI: 
8f0b37b23280
[  171.816015] RBP: 8f0b37bf3cc8 R08: 0001 R09: 

[  171.816015] R10:  R11: 8f0b37f21000 R12: 
23536303
[  171.816015] R13:  R14: 23536313 R15: 
8f0b37f21000
[  171.816015] FS:  () GS:8f0b3d20() 
knlGS:

[  171.816015] CS:  0010 DS:  ES:  CR0: 80050033
[  171.816015] CR2: 000235363130 CR3: 789b7000 CR4: 
06f0
[  171.816015] DR0:  DR1:  DR2: 

[  171.816015] DR3:  DR6: fffe0ff0 DR7: 
0400

[  171.816015] Stack:
[  171.816015]  8f0b 8f0b37a79480 8f0b378513c8 
0282
[  171.816015]  8f0b37b28428 8f0b37a795f0 8f0b37f21500 
0a0023536313
[  171.816015]  ea0001c69080  8f0b37b28280 
1395602537b23280

[  171.816015] Call Trace:
[  171.816015]  [] __blk_mq_run_hw_queue+0x260/0x390
[  171.816015]  [] blk_mq_run_work_fn+0x12/0x20
[  171.816015]  [] process_one_work+0x1f1/0x6b0
[  171.816015]  [] ? process_one_work+0x172/0x6b0
[  171.816015]  [] worker_thread+0x4e/0x490
[  171.816015]  [] ? process_one_work+0x6b0/0x6b0
[  171.816015]  [] ? process_one_work+0x6b0/0x6b0
[  171.816015]  [] kthread+0x101/0x120
[  171.816015]  [] ret_from_fork+0x1f/0x40
[  171.816015]  [] ? kthread_create_on_node+0x250/0x250


Re: [PATCH] softirq: Display IRQ_POLL for irq-poll statistics

2016-10-11 Thread Johannes Thumshirn
On Mon, Oct 10, 2016 at 03:10:51PM +0300, Sagi Grimberg wrote:
> This library was moved to the generic area and was
> renamed to irq-poll. Hence, update proc/softirqs output accordingly.
> 
> Signed-off-by: Sagi Grimberg 
> ---

Looks good,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850