Re: [PATCH 1/3] fs: move documentation for thaw_super() where appropriate

2018-04-20 Thread Randy Dunlap
On 04/20/18 17:07, Luis R. Rodriguez wrote:
> On Sat, Apr 21, 2018 at 12:01:31AM +, Bart Van Assche wrote:
>> On Fri, 2018-04-20 at 16:59 -0700, Luis R. Rodriguez wrote:
>>> +/**
>>> + * thaw_super -- unlock filesystem
>>> + * @sb: the super to thaw
>>> + *
>>> + * Unlocks the filesystem and marks it writeable again after 
>>> freeze_super().
>>> + */
>>
>> Have you verified the output generated by scripts/kernel-doc? Last
>> time I checked the output for headers containing a double dash looked
>> weird.
> 
> No, I just moved the block of kdoc crap. Randy would have this memorized I'm
> sure though so I should just fix the kdoc if its bad while at it.
> 
>   Luis
> 

"--" does sound odd.  I've never seen it used on purpose.
FWIW, I just go by what is in Documentation/doc-guide/kernel-doc.rst:

Function documentation
--

The general format of a function and function-like macro kernel-doc comment is::

  /**
   * function_name() - Brief description of function.
   * @arg1: Describe the first argument.
   * @arg2: Describe the second argument.
   *One can provide multiple line descriptions
   *for arguments.



-- 
~Randy


Re: [PATCH 1/3] fs: move documentation for thaw_super() where appropriate

2018-04-20 Thread Luis R. Rodriguez
On Sat, Apr 21, 2018 at 12:01:31AM +, Bart Van Assche wrote:
> On Fri, 2018-04-20 at 16:59 -0700, Luis R. Rodriguez wrote:
> > +/**
> > + * thaw_super -- unlock filesystem
> > + * @sb: the super to thaw
> > + *
> > + * Unlocks the filesystem and marks it writeable again after 
> > freeze_super().
> > + */
> 
> Have you verified the output generated by scripts/kernel-doc? Last
> time I checked the output for headers containing a double dash looked
> weird.

No, I just moved the block of kdoc crap. Randy would have this memorized I'm
sure though so I should just fix the kdoc if its bad while at it.

  Luis


Re: [PATCH 0/3] fs: minor fs thaw fixes and adjustments

2018-04-20 Thread Luis R. Rodriguez
On Fri, Apr 20, 2018 at 04:59:01PM -0700, Luis R. Rodriguez wrote:
> Here's a few minor fs thaw fixes and adjustments I ran accross
> as I started to refresh my development for to use freeze_fs on
> suspend/hibernation [0]. These are just prep commits for the real
> work, I'm still reviewing feedback and adjusting the code for
> that.

And I forgot to provide the URL, for those that missed the old series
I was working on:

[0] https://lkml.kernel.org/r/20171129232356.28296-1-mcg...@kernel.org

  Luis
> 
> I've tested the patches with generic/085 on xfs and found no regressions.
> 
> Luis R. Rodriguez (3):
>   fs: move documentation for thaw_super() where appropriate
>   fs: make thaw_super_locked() really just a helper
>   fs: fix corner case race on freeze_bdev() when sb disappears
> 
>  fs/block_dev.c |  4 +++-
>  fs/super.c | 39 ++-
>  2 files changed, 29 insertions(+), 14 deletions(-)
> 
> -- 
> 2.16.3
> 
> 

-- 
Do not panic


Re: [PATCH 1/3] fs: move documentation for thaw_super() where appropriate

2018-04-20 Thread Bart Van Assche
On Fri, 2018-04-20 at 16:59 -0700, Luis R. Rodriguez wrote:
> +/**
> + * thaw_super -- unlock filesystem
> + * @sb: the super to thaw
> + *
> + * Unlocks the filesystem and marks it writeable again after freeze_super().
> + */

Have you verified the output generated by scripts/kernel-doc? Last
time I checked the output for headers containing a double dash looked
weird.

Bart.





[PATCH 2/3] fs: make thaw_super_locked() really just a helper

2018-04-20 Thread Luis R. Rodriguez
thaw_super_locked() was added via commit 08fdc8a0138a ("buffer.c: call
thaw_super during emergency thaw") merged on v4.17 to help with the
ability so that the caller can take charge of handling the s_umount lock,
however, it has left all* of the failure handling including unlocking
lock of s_umount inside thaw_super_locked().

This does not make thaw_super_locked() flexible. For instance we may
later want to use it with the abilty to handle unfolding of the locks
ourselves.

Change thaw_super_locked() to really just be a helper, and let the
callers deal with all the error handling.

This commit introeuces no functional changes.

Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 9d0eb5e20a1f..82bc74a16f06 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -937,10 +937,15 @@ void emergency_remount(void)
 
 static void do_thaw_all_callback(struct super_block *sb)
 {
+   int error;
+
down_write(>s_umount);
if (sb->s_root && sb->s_flags & MS_BORN) {
emergency_thaw_bdev(sb);
-   thaw_super_locked(sb);
+   error = thaw_super_locked(sb);
+   if (error)
+   up_write(>s_umount);
+   deactivate_locked_super(sb);
} else {
up_write(>s_umount);
}
@@ -1532,14 +1537,13 @@ int freeze_super(struct super_block *sb)
 }
 EXPORT_SYMBOL(freeze_super);
 
+/* Caller takes the sb->s_umount rw_semaphore lock and handles active count */
 static int thaw_super_locked(struct super_block *sb)
 {
int error;
 
-   if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
-   up_write(>s_umount);
+   if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
return -EINVAL;
-   }
 
if (sb_rdonly(sb)) {
sb->s_writers.frozen = SB_UNFROZEN;
@@ -1554,7 +1558,6 @@ static int thaw_super_locked(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
lockdep_sb_freeze_release(sb);
-   up_write(>s_umount);
return error;
}
}
@@ -1563,7 +1566,6 @@ static int thaw_super_locked(struct super_block *sb)
sb_freeze_unlock(sb);
 out:
wake_up(>s_writers.wait_unfrozen);
-   deactivate_locked_super(sb);
return 0;
 }
 
@@ -1575,7 +1577,18 @@ static int thaw_super_locked(struct super_block *sb)
  */
 int thaw_super(struct super_block *sb)
 {
+   int error;
+
down_write(>s_umount);
-   return thaw_super_locked(sb);
+   error = thaw_super_locked(sb);
+   if (error) {
+   up_write(>s_umount);
+   goto out;
+   }
+
+   deactivate_locked_super(sb);
+
+out:
+   return error;
 }
 EXPORT_SYMBOL(thaw_super);
-- 
2.16.3



[PATCH 1/3] fs: move documentation for thaw_super() where appropriate

2018-04-20 Thread Luis R. Rodriguez
On commit 08fdc8a0138a ("buffer.c: call thaw_super during emergency thaw")
Mateusz added thaw_super_locked() and made thaw_super() use it, but
forgot to move the documentation.

Signed-off-by: Luis R. Rodriguez 
---
 fs/super.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 5fa9a8d8d865..9d0eb5e20a1f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1532,12 +1532,6 @@ int freeze_super(struct super_block *sb)
 }
 EXPORT_SYMBOL(freeze_super);
 
-/**
- * thaw_super -- unlock filesystem
- * @sb: the super to thaw
- *
- * Unlocks the filesystem and marks it writeable again after freeze_super().
- */
 static int thaw_super_locked(struct super_block *sb)
 {
int error;
@@ -1573,6 +1567,12 @@ static int thaw_super_locked(struct super_block *sb)
return 0;
 }
 
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
 int thaw_super(struct super_block *sb)
 {
down_write(>s_umount);
-- 
2.16.3



Re: [RFC PATCH 00/79] Generic page write protection and a solution to page waitqueue

2018-04-20 Thread Tim Chen
On 04/20/2018 03:19 PM, Jerome Glisse wrote:
> On Fri, Apr 20, 2018 at 12:57:41PM -0700, Tim Chen wrote:
>> On 04/04/2018 12:17 PM, jgli...@redhat.com wrote:
>>
>>
>> Your approach seems useful if there are lots of locked pages sharing
>> the same wait queue.  
>>
>> That said, in the original workload from our customer with the long wait 
>> queue
>> problem, there was a single super hot page getting migrated, and it
>> is being accessed by all threads which caused the big log jam while they 
>> wait for
>> the migration to get completed.  
>> With your approach, we will still likely end up with a long queue 
>> in that workload even if we have per page wait queue.
>>
>> Thanks.
> 
> Ok so i re-read the thread, i was writting this cover letter from memory
> and i had bad recollection of your issue, so sorry.
> 
> First, do you have a way to reproduce the issue ? Something easy would
> be nice :)

Unfortunately it is a customer workload that they guard closely and wouldn't 
let us
look at the source code.  We have to profile and backtrace its behavior.

Mel made a quick attempt to reproduce the behavior with a hot page migration, 
but he wasn't quite able to duplicate the pathologic behavior.

> 
> So what i am proposing for per page wait queue would only marginaly help
> you (it might not even be mesurable in your workload). It would certainly
> make the code smaller and easier to understand i believe.

In certain cases if we have lots of pages sharing a page wait queue,
your solution would help, and we wouldn't be wasting time checking
waiters not waiting on the page that's being unlocked.  Though I
don't have a specific workload that has such behavior.

> 
> Now that i have look back at your issue i think there is 2 things we
> should do. First keep migration page map read only, this would at least
> avoid CPU read fault. In trace you captured i wasn't able to ascertain
> if this were read or write fault.
> 
> Second idea i have is about NUMA, everytime we NUMA migrate a page we
> could attach a temporary struct to the page (using page->mapping). So
> if we scan that page again we can inspect information about previous
> migration and see if we are not over migrating that page (ie bouncing
> it all over). If so we can mark the page (maybe with a page flag if we
> can find one) to protect it from further migration. That temporary
> struct would be remove after a while, ie autonuma would preallocate a
> bunch of those and keep an LRU of them and recycle the oldest when it
> needs a new one to migrate another page.

The goal to migrate a hot page with care, or avoid bouncing it around 
frequently makes sense.  If it is a hot page shared by many threads
running on different NUMA nodes, and moving it will only mildly improve NUMA
locality, we should avoid the migration.

Tim

> 
> 
> LSF/MM slots:
> 
> Michal can i get 2 slots to talk about this ? MM only discussion, one
> to talk about doing migration with page map read only but write
> protected while migration is happening. The other one to talk about
> attaching auto NUMA tracking struct to page.
> 
> Cheers,
> Jérôme
> 



Re: [PATCH v2] target: Fix Fortify_panic kernel exception

2018-04-20 Thread Martin K. Petersen

Bryant,

> The bug exists in the memcmp in which the length passed in must be
> guaranteed to be 1. This bug currently exists because the second
> pointer passed in, can be smaller than the cmd->data_length, which
> causes a fortify_panic.
>
> The fix is to use memchr_inv instead to find whether or not a 0 exists
> instead of using memcmp. This way you dont have to worry about buffer
> overflow which is the reason for the fortify_panic.

Clarified the commit description a bit and applied the patch
4.17/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] bsg referencing bus driver module

2018-04-20 Thread Anatoliy Glagolev
 
> This patch isn't applyable because your mailer has changed all the tabs
> to spaces.
> 
> I also think there's no need to do it this way.  I think what we need
> is for fc_bsg_remove() to wait until the bsg queue is drained.  It does
> look like the author thought this happened otherwise the code wouldn't
> have the note.  If we fix it that way we can do the same thing in all
> the other transport classes that use bsg (which all have a similar
> issue).
> 
> James
> 

Thanks, James. Sorry about the tabs; re-sending.

On fc_bsg_remove()...: are you suggesting to implement the whole fix
in scsi_transport_fc.c? That would be nice, but I do not see how that
is possible. Even with the queue drained bsg still holds a reference
to the Scsi_Host via bsg_class_device; bsg_class_device itself is
referenced on bsg_open and kept around while a user-mode process keeps
a handle to bsg.
Even if we somehow implement the waiting the call may be stuck
forever if the user-mode process keeps the handle.
I think handling it via a rererence to the module is more consistent
with the way things are done in Linux. You suggested the approach
youself back in "Waiting for scsi_host_template release" discussion.


>From df939b80d02bf37b21efaaef8ede86cfd39b0cb8 Mon Sep 17 00:00:00 2001
From: Anatoliy Glagolev 
Date: Thu, 19 Apr 2018 15:06:06 -0600
Subject: [PATCH] bsg referencing parent module

Fixing a bug when bsg layer holds the last reference to device
when the device's module has been unloaded. Upon dropping the
reference the device's release function may touch memory of
the unloaded module.
---
 block/bsg-lib.c  | 24 ++--
 block/bsg.c  | 29 ++---
 drivers/scsi/scsi_transport_fc.c |  8 ++--
 include/linux/bsg-lib.h  |  4 
 include/linux/bsg.h  |  5 +
 5 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index fc2e5ff..90c28fd 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -309,6 +309,25 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
const char *name,
bsg_job_fn *job_fn, int dd_job_size,
void (*release)(struct device *))
 {
+   return bsg_setup_queue_ex(dev, name, job_fn, dd_job_size, release,
+   NULL);
+}
+EXPORT_SYMBOL_GPL(bsg_setup_queue);
+
+/**
+ * bsg_setup_queue - Create and add the bsg hooks so we can receive requests
+ * @dev: device to attach bsg device to
+ * @name: device to give bsg device
+ * @job_fn: bsg job handler
+ * @dd_job_size: size of LLD data needed for each job
+ * @release: @dev release function
+ * @dev_module: @dev's module
+ */
+struct request_queue *bsg_setup_queue_ex(struct device *dev, const char *name,
+   bsg_job_fn *job_fn, int dd_job_size,
+   void (*release)(struct device *),
+   struct module *dev_module)
+{
struct request_queue *q;
int ret;
 
@@ -331,7 +350,8 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
const char *name,
blk_queue_softirq_done(q, bsg_softirq_done);
blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
 
-   ret = bsg_register_queue(q, dev, name, _transport_ops, release);
+   ret = bsg_register_queue_ex(q, dev, name, _transport_ops, release,
+   dev_module);
if (ret) {
printk(KERN_ERR "%s: bsg interface failed to "
   "initialize - register queue\n", dev->kobj.name);
@@ -343,4 +363,4 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
const char *name,
blk_cleanup_queue(q);
return ERR_PTR(ret);
 }
-EXPORT_SYMBOL_GPL(bsg_setup_queue);
+EXPORT_SYMBOL_GPL(bsg_setup_queue_ex);
diff --git a/block/bsg.c b/block/bsg.c
index defa06c..6920c5b 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -750,7 +750,8 @@ static struct bsg_device *__bsg_get_device(int minor, 
struct request_queue *q)
return bd;
 }
 
-static struct bsg_device *bsg_get_device(struct inode *inode, struct file 
*file)
+static struct bsg_device *bsg_get_device(struct inode *inode, struct file 
*file,
+   struct bsg_class_device **pbcd)
 {
struct bsg_device *bd;
struct bsg_class_device *bcd;
@@ -766,6 +767,7 @@ static struct bsg_device *bsg_get_device(struct inode 
*inode, struct file *file)
 
if (!bcd)
return ERR_PTR(-ENODEV);
+   *pbcd = bcd;
 
bd = __bsg_get_device(iminor(inode), bcd->queue);
if (bd)
@@ -781,22 +783,34 @@ static struct bsg_device *bsg_get_device(struct inode 
*inode, struct file *file)
 static int bsg_open(struct inode *inode, struct file *file)
 {
struct bsg_device *bd;
+   struct bsg_class_device *bcd;
 
-   bd = bsg_get_device(inode, file);
+   bd = bsg_get_device(inode, file, );
 
if (IS_ERR(bd))
return PTR_ERR(bd);
 
file->private_data = bd;
+   if 

Re: [RFC PATCH 00/79] Generic page write protection and a solution to page waitqueue

2018-04-20 Thread Jerome Glisse
On Fri, Apr 20, 2018 at 12:57:41PM -0700, Tim Chen wrote:
> On 04/04/2018 12:17 PM, jgli...@redhat.com wrote:
> > From: Jérôme Glisse 
> > 
> > https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
> > 
> > This is an RFC for LSF/MM discussions. It impacts the file subsystem,
> > the block subsystem and the mm subsystem. Hence it would benefit from
> > a cross sub-system discussion.
> > 
> > Patchset is not fully bake so take it with a graint of salt. I use it
> > to illustrate the fact that it is doable and now that i did it once i
> > believe i have a better and cleaner plan in my head on how to do this.
> > I intend to share and discuss it at LSF/MM (i still need to write it
> > down). That plan lead to quite different individual steps than this
> > patchset takes and his also easier to split up in more manageable
> > pieces.
> > 
> > I also want to apologize for the size and number of patches (and i am
> > not even sending them all).
> > 
> > --
> > The Why ?
> > 
> > I have two objectives: duplicate memory read only accross nodes and or
> > devices and work around PCIE atomic limitations. More on each of those
> > objective below. I also want to put forward that it can solve the page
> > wait list issue ie having each page with its own wait list and thus
> > avoiding long wait list traversale latency recently reported [1].
> > 
> > It does allow KSM for file back pages (truely generic KSM even between
> > both anonymous and file back page). I am not sure how useful this can
> > be, this was not an objective i did pursue, this is just a for free
> > feature (see below).
> > 
> > [1] https://groups.google.com/forum/#!topic/linux.kernel/Iit1P5BNyX8
> > 
> > --
> > Per page wait list, so long page_waitqueue() !
> > 
> > Not implemented in this RFC but below is the logic and pseudo code
> > at bottom of this email.
> > 
> > When there is a contention on struct page lock bit, the caller which
> > is trying to lock the page will add itself to a waitqueue. The issues
> > here is that multiple pages share the same wait queue and on large
> > system with a lot of ram this means we can quickly get to a long list
> > of waiters for differents pages (or for the same page) on the same
> > list [1].
> 
> Your approach seems useful if there are lots of locked pages sharing
> the same wait queue.  
> 
> That said, in the original workload from our customer with the long wait queue
> problem, there was a single super hot page getting migrated, and it
> is being accessed by all threads which caused the big log jam while they wait 
> for
> the migration to get completed.  
> With your approach, we will still likely end up with a long queue 
> in that workload even if we have per page wait queue.
> 
> Thanks.

Ok so i re-read the thread, i was writting this cover letter from memory
and i had bad recollection of your issue, so sorry.

First, do you have a way to reproduce the issue ? Something easy would
be nice :)

So what i am proposing for per page wait queue would only marginaly help
you (it might not even be mesurable in your workload). It would certainly
make the code smaller and easier to understand i believe.

Now that i have look back at your issue i think there is 2 things we
should do. First keep migration page map read only, this would at least
avoid CPU read fault. In trace you captured i wasn't able to ascertain
if this were read or write fault.

Second idea i have is about NUMA, everytime we NUMA migrate a page we
could attach a temporary struct to the page (using page->mapping). So
if we scan that page again we can inspect information about previous
migration and see if we are not over migrating that page (ie bouncing
it all over). If so we can mark the page (maybe with a page flag if we
can find one) to protect it from further migration. That temporary
struct would be remove after a while, ie autonuma would preallocate a
bunch of those and keep an LRU of them and recycle the oldest when it
needs a new one to migrate another page.


LSF/MM slots:

Michal can i get 2 slots to talk about this ? MM only discussion, one
to talk about doing migration with page map read only but write
protected while migration is happening. The other one to talk about
attaching auto NUMA tracking struct to page.

Cheers,
Jérôme


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-20 Thread Oleksandr Natalenko

Hi.

On 20.04.2018 22:23, Kees Cook wrote:

I don't know the "how", I only found the "what". :) If you want, grab
the reproducer VM linked to earlier in this thread; it'll hit the
problem within about 30 seconds of running the reproducer.


Just to avoid a possible confusion I should note that I've removed the 
reproducer from my server, but I can re-upload it if needed.


--
  Oleksandr Natalenko (post-factum)


Re: usercopy whitelist woe in scsi_sense_cache

2018-04-20 Thread Kees Cook
On Thu, Apr 19, 2018 at 2:32 AM, Paolo Valente  wrote:
> I'm missing something here.  When the request gets completed in the
> first place, the hook bfq_finish_requeue_request gets called, and that
> hook clears both ->elv.priv elements (as the request has a non-null
> elv.icq).  So, when bfq gets the same request again, those elements
> must be NULL.  What am I getting wrong?
>
> I have some more concern on this point, but I'll stick to this for the
> moment, to not create more confusion.

I don't know the "how", I only found the "what". :) If you want, grab
the reproducer VM linked to earlier in this thread; it'll hit the
problem within about 30 seconds of running the reproducer.

-Kees

-- 
Kees Cook
Pixel Security


Re: [RFC PATCH 00/79] Generic page write protection and a solution to page waitqueue

2018-04-20 Thread Tim Chen
On 04/04/2018 12:17 PM, jgli...@redhat.com wrote:
> From: Jérôme Glisse 
> 
> https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
> 
> This is an RFC for LSF/MM discussions. It impacts the file subsystem,
> the block subsystem and the mm subsystem. Hence it would benefit from
> a cross sub-system discussion.
> 
> Patchset is not fully bake so take it with a graint of salt. I use it
> to illustrate the fact that it is doable and now that i did it once i
> believe i have a better and cleaner plan in my head on how to do this.
> I intend to share and discuss it at LSF/MM (i still need to write it
> down). That plan lead to quite different individual steps than this
> patchset takes and his also easier to split up in more manageable
> pieces.
> 
> I also want to apologize for the size and number of patches (and i am
> not even sending them all).
> 
> --
> The Why ?
> 
> I have two objectives: duplicate memory read only accross nodes and or
> devices and work around PCIE atomic limitations. More on each of those
> objective below. I also want to put forward that it can solve the page
> wait list issue ie having each page with its own wait list and thus
> avoiding long wait list traversale latency recently reported [1].
> 
> It does allow KSM for file back pages (truely generic KSM even between
> both anonymous and file back page). I am not sure how useful this can
> be, this was not an objective i did pursue, this is just a for free
> feature (see below).
> 
> [1] https://groups.google.com/forum/#!topic/linux.kernel/Iit1P5BNyX8
> 
> --
> Per page wait list, so long page_waitqueue() !
> 
> Not implemented in this RFC but below is the logic and pseudo code
> at bottom of this email.
> 
> When there is a contention on struct page lock bit, the caller which
> is trying to lock the page will add itself to a waitqueue. The issues
> here is that multiple pages share the same wait queue and on large
> system with a lot of ram this means we can quickly get to a long list
> of waiters for differents pages (or for the same page) on the same
> list [1].

Your approach seems useful if there are lots of locked pages sharing
the same wait queue.  

That said, in the original workload from our customer with the long wait queue
problem, there was a single super hot page getting migrated, and it
is being accessed by all threads which caused the big log jam while they wait 
for
the migration to get completed.  
With your approach, we will still likely end up with a long queue 
in that workload even if we have per page wait queue.

Thanks.

Tim



Re: [LSF/MM] Ride sharing

2018-04-20 Thread Theodore Y. Ts'o
On Fri, Apr 20, 2018 at 09:59:23AM +1000, Dave Chinner wrote:
> On Thu, Apr 19, 2018 at 01:34:32PM -0700, Matthew Wilcox wrote:
> > I hate renting unnecessary cars, and the various transportation companies
> > offer a better deal if multiple people book at once.
> > 
> > I'm scheduled to arrive on Sunday at 3:18pm local time if anyone wants to
> > share transport. Does anyone have a wiki we can use to coordinate this?
> 
> Arriving 4.15pm sunday, so if you want to wait around for a bit I'm
> happy to share...

I'm arriving at 4:39pm on Sunday and will be renting a car.  If you
are interested in a ride, send me e-mail or send me a Google Hangouts
message at theodore@gmail.com

- Ted


Re: [PATCH 3/3] lightnvm: pblk: fix smeta write error path

2018-04-20 Thread Javier Gonzalez
> On 19 Apr 2018, at 09.39, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> Smeta write errors were previously ignored. Skip these
> lines instead and throw them back on the free
> list, so the chunks will go through a reset cycle
> before we attempt to use the line again.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-core.c | 7 ---
> 1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index f6135e4..485fe8c 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -849,9 +849,10 @@ static int pblk_line_submit_smeta_io(struct pblk *pblk, 
> struct pblk_line *line,
>   atomic_dec(>inflight_io);
> 
>   if (rqd.error) {
> - if (dir == PBLK_WRITE)
> + if (dir == PBLK_WRITE) {
>   pblk_log_write_err(pblk, );
> - else if (dir == PBLK_READ)
> + ret = 1;
> + } else if (dir == PBLK_READ)
>   pblk_log_read_err(pblk, );
>   }
> 
> @@ -1120,7 +1121,7 @@ static int pblk_line_init_bb(struct pblk *pblk, struct 
> pblk_line *line,
> 
>   if (init && pblk_line_submit_smeta_io(pblk, line, off, PBLK_WRITE)) {
>   pr_debug("pblk: line smeta I/O failed. Retry\n");
> - return 1;
> + return 0;
>   }
> 
>   bitmap_copy(line->invalid_bitmap, line->map_bitmap, lm->sec_per_line);
> --
> 2.7.4

Looks good to me..

Reviewed-by: Javier González 




signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 2/3] lightnvm: pblk: garbage collect lines with failed writes

2018-04-20 Thread Javier Gonzalez
> On 19 Apr 2018, at 09.39, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> Write failures should not happen under normal circumstances,
> so in order to bring the chunk back into a known state as soon
> as possible, evacuate all the valid data out of the line and let the
> fw judge if the block can be written to in the next reset cycle.
> 
> Do this by introducing a new gc list for lines with failed writes,
> and ensure that the rate limiter allocates a small portion of
> the write bandwidth to get the job done.
> 
> The lba list is saved in memory for use during gc as we
> cannot gurantee that the emeta data is readable if a write
> error occurred.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-core.c  | 43 +--
> drivers/lightnvm/pblk-gc.c| 79 +++
> drivers/lightnvm/pblk-init.c  | 39 ++---
> drivers/lightnvm/pblk-rl.c| 29 +---
> drivers/lightnvm/pblk-sysfs.c | 15 ++--
> drivers/lightnvm/pblk-write.c |  2 ++
> drivers/lightnvm/pblk.h   | 25 +++---
> 7 files changed, 178 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
> index 7762e89..f6135e4 100644
> --- a/drivers/lightnvm/pblk-core.c
> +++ b/drivers/lightnvm/pblk-core.c
> @@ -373,7 +373,13 @@ struct list_head *pblk_line_gc_list(struct pblk *pblk, 
> struct pblk_line *line)
> 
>   lockdep_assert_held(>lock);
> 
> - if (!vsc) {
> + if (line->w_err_gc->has_write_err) {
> + if (line->gc_group != PBLK_LINEGC_WERR) {
> + line->gc_group = PBLK_LINEGC_WERR;
> + move_list = _mg->gc_werr_list;
> + pblk_rl_werr_line_in(>rl);
> + }
> + } else if (!vsc) {
>   if (line->gc_group != PBLK_LINEGC_FULL) {
>   line->gc_group = PBLK_LINEGC_FULL;
>   move_list = _mg->gc_full_list;
> @@ -1603,8 +1609,13 @@ static void __pblk_line_put(struct pblk *pblk, struct 
> pblk_line *line)
>   line->state = PBLK_LINESTATE_FREE;
>   line->gc_group = PBLK_LINEGC_NONE;
>   pblk_line_free(line);
> - spin_unlock(>lock);
> 
> + if (line->w_err_gc->has_write_err) {
> + pblk_rl_werr_line_out(>rl);
> + line->w_err_gc->has_write_err = 0;
> + }
> +
> + spin_unlock(>lock);
>   atomic_dec(>pipeline_gc);
> 
>   spin_lock(_mg->free_lock);
> @@ -1767,11 +1778,32 @@ void pblk_line_close_meta(struct pblk *pblk, struct 
> pblk_line *line)
> 
>   spin_lock(_mg->close_lock);
>   spin_lock(>lock);
> +
> + /* Update the in-memory start address for emeta, in case it has
> +  * shifted due to write errors
> +  */
> + if (line->emeta_ssec != line->cur_sec)
> + line->emeta_ssec = line->cur_sec;
> +
>   list_add_tail(>list, _mg->emeta_list);
>   spin_unlock(>lock);
>   spin_unlock(_mg->close_lock);
> 
>   pblk_line_should_sync_meta(pblk);
> +
> +
> +}
> +
> +static void pblk_save_lba_list(struct pblk *pblk, struct pblk_line *line)
> +{
> + struct pblk_line_meta *lm = >lm;
> + unsigned int lba_list_size = lm->emeta_len[2];
> + struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
> + struct pblk_emeta *emeta = line->emeta;
> +
> + w_err_gc->lba_list = kmalloc(lba_list_size, GFP_KERNEL);
> + memcpy(w_err_gc->lba_list, emeta_to_lbas(pblk, emeta->buf),
> + lba_list_size);
> }
> 
> void pblk_line_close_ws(struct work_struct *work)
> @@ -1780,6 +1812,13 @@ void pblk_line_close_ws(struct work_struct *work)
>   ws);
>   struct pblk *pblk = line_ws->pblk;
>   struct pblk_line *line = line_ws->line;
> + struct pblk_w_err_gc *w_err_gc = line->w_err_gc;
> +
> + /* Write errors makes the emeta start address stored in smeta invalid,
> +  * so keep a copy of the lba list until we've gc'd the line
> +  */
> + if (w_err_gc->has_write_err)
> + pblk_save_lba_list(pblk, line);
> 
>   pblk_line_close(pblk, line);
>   mempool_free(line_ws, pblk->gen_ws_pool);
> diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c
> index b0cc277..62f0548 100644
> --- a/drivers/lightnvm/pblk-gc.c
> +++ b/drivers/lightnvm/pblk-gc.c
> @@ -138,10 +138,10 @@ static void pblk_gc_line_prepare_ws(struct work_struct 
> *work)
>   struct pblk_line_mgmt *l_mg = >l_mg;
>   struct pblk_line_meta *lm = >lm;
>   struct pblk_gc *gc = >gc;
> - struct line_emeta *emeta_buf;
> + struct line_emeta *emeta_buf = NULL;
>   struct pblk_line_ws *gc_rq_ws;
>   struct pblk_gc_rq *gc_rq;
> - __le64 *lba_list;
> + __le64 *lba_list = NULL;
>   unsigned long *invalid_bitmap;
>   int sec_left, nr_secs, bit;
>   int ret;

Re: [PATCH 03/11] fs: add frozen sb state helpers

2018-04-20 Thread Luis R. Rodriguez
On Tue, Apr 17, 2018 at 05:59:36PM -0700, Luis R. Rodriguez wrote:
> On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > Hello,
> > 
> > I think I owe you a reply here... Sorry that it took so long.
> 
> Took me just as long :)
> 
> > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > 
> > > I'll note that its still not perfectly clear if really the semantics 
> > > behind
> > > freeze_bdev() match what I described above fully. That still needs to be
> > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > will really create a holy fucking mess.
> > 
> > I believe nobody really thought about mixing those two interfaces to fs
> > freezing and so the behavior is basically defined by the implementation.
> > That is:
> > 
> > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > 
> > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> 
> Phew, so this is what we want for the in-kernel freezing so we're good
> and *can* combine these then.

I double checked, and I don't see where you get EINVAL for this case.
We *do* keep the sb frozen though, which is good, and the worst fear
I had was that we did not. However we return 0 if there was already
a prior freeze_bdev() or ioctl_fsfreeze() other than the context that
started the prior freeze (--bdev->bd_fsfreeze_count > 0).

The -EINVAL is only returned currently if there were no freezers.

int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
int error = -EINVAL;

mutex_lock(>bd_fsfreeze_mutex);
if (!bdev->bd_fsfreeze_count)
goto out;

error = 0;
if (--bdev->bd_fsfreeze_count > 0)
goto out;
...
out:
mutex_unlock(>bd_fsfreeze_mutex);
return error;
}

  Luis


Re: [PATCH 09/12] PCI: remove CONFIG_PCI_BUS_ADDR_T_64BIT

2018-04-20 Thread Bjorn Helgaas
On Sun, Apr 15, 2018 at 04:59:44PM +0200, Christoph Hellwig wrote:
> This symbol is now always identical to CONFIG_ARCH_DMA_ADDR_T_64BIT, so
> remove it.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Bjorn Helgaas 

Please merge this along with the rest of the series; let me know if you
need anything more from me.

> ---
>  drivers/pci/Kconfig | 4 
>  drivers/pci/bus.c   | 4 ++--
>  include/linux/pci.h | 2 +-
>  3 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 34b56a8f8480..29a487f31dae 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -5,10 +5,6 @@
>  
>  source "drivers/pci/pcie/Kconfig"
>  
> -config PCI_BUS_ADDR_T_64BIT
> - def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
> - depends on PCI
> -
>  config PCI_MSI
>   bool "Message Signaled Interrupts (MSI and MSI-X)"
>   depends on PCI
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index bc2ded4c451f..35b7fc87eac5 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -120,7 +120,7 @@ int devm_request_pci_bus_resources(struct device *dev,
>  EXPORT_SYMBOL_GPL(devm_request_pci_bus_resources);
>  
>  static struct pci_bus_region pci_32_bit = {0, 0xULL};
> -#ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>  static struct pci_bus_region pci_64_bit = {0,
>   (pci_bus_addr_t) 0xULL};
>  static struct pci_bus_region pci_high = {(pci_bus_addr_t) 0x1ULL,
> @@ -230,7 +230,7 @@ int pci_bus_alloc_resource(struct pci_bus *bus, struct 
> resource *res,
> resource_size_t),
>   void *alignf_data)
>  {
> -#ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>   int rc;
>  
>   if (res->flags & IORESOURCE_MEM_64) {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 73178a2fcee0..55371cb827ad 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -670,7 +670,7 @@ int raw_pci_read(unsigned int domain, unsigned int bus, 
> unsigned int devfn,
>  int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> int reg, int len, u32 val);
>  
> -#ifdef CONFIG_PCI_BUS_ADDR_T_64BIT
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>  typedef u64 pci_bus_addr_t;
>  #else
>  typedef u32 pci_bus_addr_t;
> -- 
> 2.17.0
> 


Re: [PATCH 1/3] lightnvm: pblk: rework write error recovery path

2018-04-20 Thread Javier Gonzalez
> On 19 Apr 2018, at 09.39, Hans Holmberg  
> wrote:
> 
> From: Hans Holmberg 
> 
> The write error recovery path is incomplete, so rework
> the write error recovery handling to do resubmits directly
> from the write buffer.
> 
> When a write error occurs, the remaining sectors in the chunk are
> mapped out and invalidated and the request inserted in a resubmit list.
> 
> The writer thread checks if there are any requests to resubmit,
> scans and invalidates any lbas that have been overwritten by later
> writes and resubmits the failed entries.
> 
> Signed-off-by: Hans Holmberg 
> ---
> drivers/lightnvm/pblk-init.c |   2 +
> drivers/lightnvm/pblk-recovery.c |  91 ---
> drivers/lightnvm/pblk-write.c| 241 ---
> drivers/lightnvm/pblk.h  |   8 +-
> 4 files changed, 180 insertions(+), 162 deletions(-)
> 
> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
> index bfc488d..6f06727 100644
> --- a/drivers/lightnvm/pblk-init.c
> +++ b/drivers/lightnvm/pblk-init.c
> @@ -426,6 +426,7 @@ static int pblk_core_init(struct pblk *pblk)
>   goto free_r_end_wq;
> 
>   INIT_LIST_HEAD(>compl_list);
> + INIT_LIST_HEAD(>resubmit_list);
> 
>   return 0;
> 
> @@ -1185,6 +1186,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct 
> gendisk *tdisk,
>   pblk->state = PBLK_STATE_RUNNING;
>   pblk->gc.gc_enabled = 0;
> 
> + spin_lock_init(>resubmit_lock);
>   spin_lock_init(>trans_lock);
>   spin_lock_init(>lock);
> 
> diff --git a/drivers/lightnvm/pblk-recovery.c 
> b/drivers/lightnvm/pblk-recovery.c
> index 9cb6d5d..5983428 100644
> --- a/drivers/lightnvm/pblk-recovery.c
> +++ b/drivers/lightnvm/pblk-recovery.c
> @@ -16,97 +16,6 @@
> 
> #include "pblk.h"
> 
> -void pblk_submit_rec(struct work_struct *work)
> -{
> - struct pblk_rec_ctx *recovery =
> - container_of(work, struct pblk_rec_ctx, ws_rec);
> - struct pblk *pblk = recovery->pblk;
> - struct nvm_rq *rqd = recovery->rqd;
> - struct pblk_c_ctx *c_ctx = nvm_rq_to_pdu(rqd);
> - struct bio *bio;
> - unsigned int nr_rec_secs;
> - unsigned int pgs_read;
> - int ret;
> -
> - nr_rec_secs = bitmap_weight((unsigned long int *)>ppa_status,
> - NVM_MAX_VLBA);
> -
> - bio = bio_alloc(GFP_KERNEL, nr_rec_secs);
> -
> - bio->bi_iter.bi_sector = 0;
> - bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
> - rqd->bio = bio;
> - rqd->nr_ppas = nr_rec_secs;
> -
> - pgs_read = pblk_rb_read_to_bio_list(>rwb, bio, >failed,
> - nr_rec_secs);

Please, remove functions that are not longer used. Doing a pass on the
rest of the removed functions would be a good idea.

> - if (pgs_read != nr_rec_secs) {
> - pr_err("pblk: could not read recovery entries\n");
> - goto err;
> - }
> -
> - if (pblk_setup_w_rec_rq(pblk, rqd, c_ctx)) {

Same here

> - pr_err("pblk: could not setup recovery request\n");
> - goto err;
> - }
> -
> -#ifdef CONFIG_NVM_DEBUG
> - atomic_long_add(nr_rec_secs, >recov_writes);
> -#endif

Can you add this debug counter to the new path? I see you added other
counters, if it is a rename, can you put it on a separate patch?

> -
> - ret = pblk_submit_io(pblk, rqd);
> - if (ret) {
> - pr_err("pblk: I/O submission failed: %d\n", ret);
> - goto err;
> - }
> -
> - mempool_free(recovery, pblk->rec_pool);
> - return;
> -
> -err:
> - bio_put(bio);
> - pblk_free_rqd(pblk, rqd, PBLK_WRITE);
> -}
> -
> -int pblk_recov_setup_rq(struct pblk *pblk, struct pblk_c_ctx *c_ctx,
> - struct pblk_rec_ctx *recovery, u64 *comp_bits,
> - unsigned int comp)
> -{
> - struct nvm_rq *rec_rqd;
> - struct pblk_c_ctx *rec_ctx;
> - int nr_entries = c_ctx->nr_valid + c_ctx->nr_padded;
> -
> - rec_rqd = pblk_alloc_rqd(pblk, PBLK_WRITE);
> - rec_ctx = nvm_rq_to_pdu(rec_rqd);
> -
> - /* Copy completion bitmap, but exclude the first X completed entries */
> - bitmap_shift_right((unsigned long int *)_rqd->ppa_status,
> - (unsigned long int *)comp_bits,
> - comp, NVM_MAX_VLBA);
> -
> - /* Save the context for the entries that need to be re-written and
> -  * update current context with the completed entries.
> -  */
> - rec_ctx->sentry = pblk_rb_wrap_pos(>rwb, c_ctx->sentry + comp);
> - if (comp >= c_ctx->nr_valid) {
> - rec_ctx->nr_valid = 0;
> - rec_ctx->nr_padded = nr_entries - comp;
> -
> - c_ctx->nr_padded = comp - c_ctx->nr_valid;
> - } else {
> - rec_ctx->nr_valid = c_ctx->nr_valid - comp;
> - 

Re: [LSF/MM] Ride sharing

2018-04-20 Thread Eric Sandeen
On 4/19/18 3:34 PM, Matthew Wilcox wrote:
> I hate renting unnecessary cars, and the various transportation companies
> offer a better deal if multiple people book at once.
> 
> I'm scheduled to arrive on Sunday at 3:18pm local time if anyone wants to
> share transport.  Does anyone have a wiki we can use to coordinate this?

I get in 1:08pm local time on Sunday.  I'd be game to coordinate if possible,
but it may be too late for that.

FWIW I spoke with http://canyontransport.com and although the website says
minimum 2 adults, they will actually take just one, so you don't /have/ to
find another just to book with them.  The price is per-person regardless
of how many are in the van.

-Eric


Re: [PATCH] block: mq: Add some minor doc for core structs

2018-04-20 Thread Jens Axboe
On 4/20/18 2:29 AM, Linus Walleij wrote:
> As it came up in discussion on the mailing list that the semantic
> meaning of 'blk_mq_ctx' and 'blk_mq_hw_ctx' isn't completely
> obvious to everyone, let's add some minimal kerneldoc for a
> starter.

Thanks Linus, applied.

-- 
Jens Axboe



Re: question about request merge

2018-04-20 Thread Jens Axboe
On 4/19/18 9:51 PM, Zhengyuan Liu wrote:
> Hi, Shaohua
> 
> I found it indeed doesn't do front merge when two threads flush plug list  
> concurrently.   To 
> reappear , I prepared two IO threads , named a0.io and a1.io .
> Thread a1.io  uses libaio to write 5 requests : 
> sectors: 16 + 8, 40 + 8, 64 + 8, 88 + 8, 112 + 8
> Thread a0.io  uses libaio to write other 5 requests : 
> sectors: 8+ 8, 32 + 8, 56 + 8, 80 + 8, 104 + 8

I'm cutting some of the below.

Thanks for the detailed email. It's mostly on purpose that we don't
spend cycles and memory on maintaining a separate front merge hash,
since it's generally not something that happens very often. If you have
a thread pool doing IO and split sequential IO such that you would
benefit a lot from front merging, then I would generally claim that
you're not writing your app in the most optimal manner.

So I'm curious, what's the big interest in front merging?

-- 
Jens Axboe



Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-20 Thread Bart Van Assche
On Fri, 2018-04-20 at 14:55 +0800, jianchao.wang wrote:
> Hi Bart
> 
> On 04/20/2018 12:43 AM, Bart Van Assche wrote:
> > Use the deadline instead of the request generation to detect whether
> >   or not a request timer fired after reinitialization of a request
> 
> Maybe use deadline to do this is not suitable.
> 
> Let's think of the following scenario.
> 
> T1/T2 times in nano seconds
> J1/J2 times in jiffies
> 
> rq is started at T1,J1
> blk_mq_timeout_work
>   -> blk_mq_check_expired
> -> save rq->__deadline (J1 + MQ_RQ_IN_FLIGHT)
> 
>  rq is completed and freed 
>  rq is allocated and started 
> again on T2
>  rq->__deadline = J2 + 
> MQ_RQ_IN_FLIGHT
>   -> synchronize srcu/rcu
>   -> blk_mq_terminate_expired
> -> rq->__deadline (J2 + MQ_RQ_IN_FLIGHT)
> 
> 1. deadline = rq->__deadline & ~RQ_STATE_MASK (0x3)
>if J2-J1 < 4 jiffies, we will get the same deadline value.
> 
> 2. even if we do some bit shift when save and get deadline
>if T2 - T1 < time of 1 jiffies, we sill get the same deadline.

Hello Jianchao,

How about using the upper 16 or 32 bits of rq->__deadline to store a generation
number? I don't know any block driver for which the product of (deadline in
seconds) and HZ exceeds (1 << 32).

Thanks,

Bart.





Re: [PATCH 2/3] xen netback: add fault injection facility

2018-04-20 Thread staskins

On 04/20/18 15:00, Juergen Gross wrote:

On 20/04/18 14:52, stask...@amazon.com wrote:

On 04/20/18 13:25, Juergen Gross wrote:

On 20/04/18 12:47, Stanislav Kinsburskii wrote:

+    for (fi = 0; fi < XENVIF_FI_MAX; fi++) {
+    vfi->faults[fi] = xen_fi_dir_add(vfi->dir,
+    xenvif_fi_names[fi]);

How does this work? xenvif_fi_names[] is an empty array and this is the
only reference to it. Who is allocating the memory for that array?

Well, it works in the way one adds a var to enum (which is used as a key
later) and a corresponding string into the array (which is used as a
name for the fault directory in sysfs).

Then you should size the array via XENVIF_FI_MAX.


Makes sense.
Thanks!

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


Re: [PATCH 2/3] xen netback: add fault injection facility

2018-04-20 Thread Juergen Gross
On 20/04/18 14:52, stask...@amazon.com wrote:
> On 04/20/18 13:25, Juergen Gross wrote:
>> On 20/04/18 12:47, Stanislav Kinsburskii wrote:
>>> +    for (fi = 0; fi < XENVIF_FI_MAX; fi++) {
>>> +    vfi->faults[fi] = xen_fi_dir_add(vfi->dir,
>>> +    xenvif_fi_names[fi]);
>> How does this work? xenvif_fi_names[] is an empty array and this is the
>> only reference to it. Who is allocating the memory for that array?
> 
> Well, it works in the way one adds a var to enum (which is used as a key
> later) and a corresponding string into the array (which is used as a
> name for the fault directory in sysfs).

Then you should size the array via XENVIF_FI_MAX.


Juergen


Re: [PATCH 2/3] xen netback: add fault injection facility

2018-04-20 Thread staskins

On 04/20/18 13:25, Juergen Gross wrote:

On 20/04/18 12:47, Stanislav Kinsburskii wrote:

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8918466..5cc9acd 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -465,6 +465,14 @@ config XEN_NETDEV_BACKEND
  compile this driver as a module, chose M here: the module
  will be called xen-netback.
  
+config XEN_NETDEV_BACKEND_FAULT_INJECTION

+ bool "Xen net-device backend driver fault injection"
+ depends on XEN_NETDEV_BACKEND
+ depends on XEN_FAULT_INJECTION
+ default n
+ help
+   Allow to inject errors to Xen backend network driver
+
  config VMXNET3
tristate "VMware VMXNET3 ethernet driver"
depends on PCI && INET
diff --git a/drivers/net/xen-netback/Makefile b/drivers/net/xen-netback/Makefile
index d49798a..28abcdc 100644
--- a/drivers/net/xen-netback/Makefile
+++ b/drivers/net/xen-netback/Makefile
@@ -1,3 +1,4 @@
  obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
  
  xen-netback-y := netback.o xenbus.o interface.o hash.o rx.o

+xen-netback-$(CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION) += netback_fi.o
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index a46a1e9..30d676d 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -286,6 +286,9 @@ struct xenvif {
  
  #ifdef CONFIG_DEBUG_FS

struct dentry *xenvif_dbg_root;
+#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
+   void *fi_info;
+#endif
  #endif
  
  	struct xen_netif_ctrl_back_ring ctrl;

diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index a27daa2..ecc416e 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -33,6 +33,7 @@
   */
  
  #include "common.h"

+#include "netback_fi.h"
  
  #include 

  #include 
@@ -1649,6 +1650,7 @@ static int __init netback_init(void)
PTR_ERR(xen_netback_dbg_root));
  #endif /* CONFIG_DEBUG_FS */
  
+	(void) xen_netbk_fi_init();

This is the only usage of xen_netbk_fi_init(). Why don't you make it
return void from the beginning?


Well, I could do so, of course.
My intention was to treat this as an error. But then it doesn't 
correlate to ignored debugfs directory creation error above.



return 0;
  
  failed_init:

@@ -1659,6 +1661,7 @@ module_init(netback_init);
  
  static void __exit netback_fini(void)

  {
+   xen_netbk_fi_fini();
  #ifdef CONFIG_DEBUG_FS
if (!IS_ERR_OR_NULL(xen_netback_dbg_root))
debugfs_remove_recursive(xen_netback_dbg_root);
diff --git a/drivers/net/xen-netback/netback_fi.c 
b/drivers/net/xen-netback/netback_fi.c
new file mode 100644
index 000..47541d0
--- /dev/null
+++ b/drivers/net/xen-netback/netback_fi.c
@@ -0,0 +1,119 @@
+/*
+ * Fault injection interface for Xen backend network driver
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:

SPDX again.


Will fix.




+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "common.h"
+
+#include 
+
+#include 
+#include "netback_fi.h"
+
+static struct dentry *vif_fi_dir;
+
+static const char *xenvif_fi_names[] = {
+};
+
+struct xenvif_fi {
+   struct dentry *dir;
+   struct xen_fi *faults[XENVIF_FI_MAX];
+};
+
+int xen_netbk_fi_init(void)
+{
+   vif_fi_dir = xen_fi_dir_create("xen-netback");
+   if (!vif_fi_dir)
+   return -ENOMEM;
+   return 0;
+}
+
+void xen_netbk_fi_fini(void)
+{
+   debugfs_remove_recursive(vif_fi_dir);
+}
+
+void xenvif_fi_fini(struct xenvif *vif)
+{
+   struct 

Re: [PATCH 3/3] xen blkback: add fault injection facility

2018-04-20 Thread staskins

On 04/20/18 13:28, Juergen Gross wrote:

On 20/04/18 12:47, Stanislav Kinsburskii wrote:

This patch adds wrapper helpers around generic Xen fault inject
facility.
The major reason is to keep all the module fault injection directories
in a dedicated subdirectory instead of Xen fault inject root.

IOW, when using these helpers, per-device and named by device name
fault injection control directories will appear under the following
directory:
- /sys/kernel/debug/xen/fault_inject/xen-blkback/
instead of:
- /sys/kernel/debug/xen/fault_inject/

Signed-off-by: Stanislav Kinsburskii 
CC: Jens Axboe 
CC: Konrad Rzeszutek Wilk 
CC: "Roger Pau Monné" 
CC: linux-ker...@vger.kernel.org
CC: linux-block@vger.kernel.org
CC: xen-de...@lists.xenproject.org
CC: Stanislav Kinsburskii 
CC: David Woodhouse 

This is an exact copy of the netback patch apart from the names.

I don't like adding multiple copies of the same coding to the tree.


Sure, will dedupplicate this.



Juergen



Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


Re: [PATCH 1/3] xen: add generic fault injection facility

2018-04-20 Thread staskins

On 04/20/18 12:59, Juergen Gross wrote:

On 20/04/18 12:47, Stanislav Kinsburskii wrote:

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index c1f98f3..483fc16 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -77,3 +77,10 @@ config XEN_PVH
bool "Support for running as a PVH guest"
depends on XEN && XEN_PVHVM && ACPI
def_bool n
+
+config XEN_FAULT_INJECTION
+   bool "Enable Xen fault injection"
+   depends on FAULT_INJECTION_DEBUG_FS
+   default n
+   help
+ Enable Xen fault injection facility

Why for x86 only? I'd rather add this under drivers/xen


Sure.




diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index d83cb54..3158fe1 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_XEN_DOM0)+= vga.o
  obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
  obj-$(CONFIG_XEN_EFI) += efi.o
  obj-$(CONFIG_XEN_PVH) += xen-pvh.o
+obj-$(CONFIG_XEN_FAULT_INJECTION)  += fault_inject.o
diff --git a/arch/x86/xen/fault_inject.c b/arch/x86/xen/fault_inject.c
new file mode 100644
index 000..ecf0f7c
--- /dev/null
+++ b/arch/x86/xen/fault_inject.c
@@ -0,0 +1,109 @@
+/*
+ * Fauit injection interface for Xen virtual block devices
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:

Please use the appropriate SPDX header instead of the full GPL2
boilerplate.


Ditto.




Juergen



Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


Re: [PATCH 2/3] xen netback: add fault injection facility

2018-04-20 Thread Juergen Gross
On 20/04/18 12:47, Stanislav Kinsburskii wrote:
> This patch adds wrapper helpers around generic Xen fault inject facility.
> The major reason is to keep all the module fault injection directories
> in a dedicated subdirectory instead of Xen fault inject root.
> 
> IOW, when using these helpers, per-device and named by device name fault
> injection control directories will appear under the following directory:
> - /sys/kernel/debug/xen/fault_inject/xen-netback/
> instead of:
> - /sys/kernel/debug/xen/fault_inject/
> 
> Signed-off-by: Stanislav Kinsburskii 
> CC: Wei Liu 
> CC: Paul Durrant 
> CC: "David S. Miller" 
> CC: Matteo Croce 
> CC: Stefan Hajnoczi 
> CC: Daniel Borkmann 
> CC: Gerard Garcia 
> CC: David Ahern 
> CC: Juergen Gross 
> CC: Amir Levy 
> CC: Jakub Kicinski 
> CC: linux-ker...@vger.kernel.org
> CC: net...@vger.kernel.org
> CC: xen-de...@lists.xenproject.org
> CC: Stanislav Kinsburskii 
> CC: David Woodhouse 
> ---
>  drivers/net/Kconfig  |8 ++
>  drivers/net/xen-netback/Makefile |1 
>  drivers/net/xen-netback/common.h |3 +
>  drivers/net/xen-netback/netback.c|3 +
>  drivers/net/xen-netback/netback_fi.c |  119 
> ++
>  drivers/net/xen-netback/netback_fi.h |   35 ++
>  drivers/net/xen-netback/xenbus.c |6 ++
>  7 files changed, 175 insertions(+)
>  create mode 100644 drivers/net/xen-netback/netback_fi.c
>  create mode 100644 drivers/net/xen-netback/netback_fi.h
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 8918466..5cc9acd 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -465,6 +465,14 @@ config XEN_NETDEV_BACKEND
> compile this driver as a module, chose M here: the module
> will be called xen-netback.
>  
> +config XEN_NETDEV_BACKEND_FAULT_INJECTION
> +   bool "Xen net-device backend driver fault injection"
> +   depends on XEN_NETDEV_BACKEND
> +   depends on XEN_FAULT_INJECTION
> +   default n
> +   help
> + Allow to inject errors to Xen backend network driver
> +
>  config VMXNET3
>   tristate "VMware VMXNET3 ethernet driver"
>   depends on PCI && INET
> diff --git a/drivers/net/xen-netback/Makefile 
> b/drivers/net/xen-netback/Makefile
> index d49798a..28abcdc 100644
> --- a/drivers/net/xen-netback/Makefile
> +++ b/drivers/net/xen-netback/Makefile
> @@ -1,3 +1,4 @@
>  obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
>  
>  xen-netback-y := netback.o xenbus.o interface.o hash.o rx.o
> +xen-netback-$(CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION) += netback_fi.o
> diff --git a/drivers/net/xen-netback/common.h 
> b/drivers/net/xen-netback/common.h
> index a46a1e9..30d676d 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -286,6 +286,9 @@ struct xenvif {
>  
>  #ifdef CONFIG_DEBUG_FS
>   struct dentry *xenvif_dbg_root;
> +#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
> + void *fi_info;
> +#endif
>  #endif
>  
>   struct xen_netif_ctrl_back_ring ctrl;
> diff --git a/drivers/net/xen-netback/netback.c 
> b/drivers/net/xen-netback/netback.c
> index a27daa2..ecc416e 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -33,6 +33,7 @@
>   */
>  
>  #include "common.h"
> +#include "netback_fi.h"
>  
>  #include 
>  #include 
> @@ -1649,6 +1650,7 @@ static int __init netback_init(void)
>   PTR_ERR(xen_netback_dbg_root));
>  #endif /* CONFIG_DEBUG_FS */
>  
> + (void) xen_netbk_fi_init();

This is the only usage of xen_netbk_fi_init(). Why don't you make it
return void from the beginning?

>   return 0;
>  
>  failed_init:
> @@ -1659,6 +1661,7 @@ module_init(netback_init);
>  
>  static void __exit netback_fini(void)
>  {
> + xen_netbk_fi_fini();
>  #ifdef CONFIG_DEBUG_FS
>   if (!IS_ERR_OR_NULL(xen_netback_dbg_root))
>   debugfs_remove_recursive(xen_netback_dbg_root);
> diff --git a/drivers/net/xen-netback/netback_fi.c 
> b/drivers/net/xen-netback/netback_fi.c
> new file mode 100644
> index 000..47541d0
> --- /dev/null
> +++ b/drivers/net/xen-netback/netback_fi.c
> @@ -0,0 +1,119 @@
> +/*
> + * Fault injection interface for Xen backend network driver
> + *
> + * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the 

Re: [PATCH 1/3] xen: add generic fault injection facility

2018-04-20 Thread Juergen Gross
On 20/04/18 12:47, Stanislav Kinsburskii wrote:
> The overall idea of this patch is to add a generic fault injection facility
> to Xen, which later can be used in various places by different Xen parts.
> 
> Core implementation ideas:
> 
> - The facility build is controlled by boolean config
>   CONFIG_XEN_FAULT_INJECTION option ("N" by default).
> 
> - All fault injection logic is located in an optionally compiled separated
>   file.
> 
> - Fault injection attribute and control directory creation and destruction
>   are wrapped with helpers, producing and accepting a pointer to an opaque
>   object thus making all the rest of code independent on fault injection
>   engine.
> 
> When enabled Xen root fault injection directory appears:
> 
> - /sys/kernel/debug/xen/fault_inject/
> 
> The falicity provides the following helpers (exported to be accessible in
> modules):
> 
> - xen_fi_add(name) - adds fault injection control directory "name" to Xen
>   root fault injection directory
> 
> - xen_fi_dir_create(name) - allows to create a subdirectory "name" in Xen
>   root fault injection directory.
> 
> - xen_fi_dir_add(dir, name) - adds fault injection control directory "name"
>   to directory "dir"
> 
> - xen_should_fail(fi) - check whether fi hav to fail.
> 
> Signed-off-by: Stanislav Kinsburskii 
> CC: Boris Ostrovsky 
> CC: Juergen Gross 
> CC: Thomas Gleixner 
> CC: Ingo Molnar 
> CC: "H. Peter Anvin" 
> CC: x...@kernel.org
> CC: xen-de...@lists.xenproject.org
> CC: linux-ker...@vger.kernel.org
> CC: Stanislav Kinsburskii 
> CC: David Woodhouse 
> ---
>  arch/x86/xen/Kconfig|7 +++
>  arch/x86/xen/Makefile   |1 
>  arch/x86/xen/fault_inject.c |  109 
> +++
>  include/xen/fault_inject.h  |   45 ++
>  4 files changed, 162 insertions(+)
>  create mode 100644 arch/x86/xen/fault_inject.c
>  create mode 100644 include/xen/fault_inject.h
> 
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index c1f98f3..483fc16 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -77,3 +77,10 @@ config XEN_PVH
>   bool "Support for running as a PVH guest"
>   depends on XEN && XEN_PVHVM && ACPI
>   def_bool n
> +
> +config XEN_FAULT_INJECTION
> + bool "Enable Xen fault injection"
> + depends on FAULT_INJECTION_DEBUG_FS
> + default n
> + help
> +   Enable Xen fault injection facility

Why for x86 only? I'd rather add this under drivers/xen

> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index d83cb54..3158fe1 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -34,3 +34,4 @@ obj-$(CONFIG_XEN_DOM0)  += vga.o
>  obj-$(CONFIG_SWIOTLB_XEN)+= pci-swiotlb-xen.o
>  obj-$(CONFIG_XEN_EFI)+= efi.o
>  obj-$(CONFIG_XEN_PVH)+= xen-pvh.o
> +obj-$(CONFIG_XEN_FAULT_INJECTION)+= fault_inject.o
> diff --git a/arch/x86/xen/fault_inject.c b/arch/x86/xen/fault_inject.c
> new file mode 100644
> index 000..ecf0f7c
> --- /dev/null
> +++ b/arch/x86/xen/fault_inject.c
> @@ -0,0 +1,109 @@
> +/*
> + * Fauit injection interface for Xen virtual block devices
> + *
> + * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation; or, when distributed
> + * separately from the Linux kernel or incorporated into other
> + * software packages, subject to the following license:

Please use the appropriate SPDX header instead of the full GPL2
boilerplate.


Juergen


[PATCH 1/3] xen: add generic fault injection facility

2018-04-20 Thread Stanislav Kinsburskii
The overall idea of this patch is to add a generic fault injection facility
to Xen, which later can be used in various places by different Xen parts.

Core implementation ideas:

- The facility build is controlled by boolean config
  CONFIG_XEN_FAULT_INJECTION option ("N" by default).

- All fault injection logic is located in an optionally compiled separated
  file.

- Fault injection attribute and control directory creation and destruction
  are wrapped with helpers, producing and accepting a pointer to an opaque
  object thus making all the rest of code independent on fault injection
  engine.

When enabled Xen root fault injection directory appears:

- /sys/kernel/debug/xen/fault_inject/

The falicity provides the following helpers (exported to be accessible in
modules):

- xen_fi_add(name) - adds fault injection control directory "name" to Xen
  root fault injection directory

- xen_fi_dir_create(name) - allows to create a subdirectory "name" in Xen
  root fault injection directory.

- xen_fi_dir_add(dir, name) - adds fault injection control directory "name"
  to directory "dir"

- xen_should_fail(fi) - check whether fi hav to fail.

Signed-off-by: Stanislav Kinsburskii 
CC: Boris Ostrovsky 
CC: Juergen Gross 
CC: Thomas Gleixner 
CC: Ingo Molnar 
CC: "H. Peter Anvin" 
CC: x...@kernel.org
CC: xen-de...@lists.xenproject.org
CC: linux-ker...@vger.kernel.org
CC: Stanislav Kinsburskii 
CC: David Woodhouse 
---
 arch/x86/xen/Kconfig|7 +++
 arch/x86/xen/Makefile   |1 
 arch/x86/xen/fault_inject.c |  109 +++
 include/xen/fault_inject.h  |   45 ++
 4 files changed, 162 insertions(+)
 create mode 100644 arch/x86/xen/fault_inject.c
 create mode 100644 include/xen/fault_inject.h

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index c1f98f3..483fc16 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -77,3 +77,10 @@ config XEN_PVH
bool "Support for running as a PVH guest"
depends on XEN && XEN_PVHVM && ACPI
def_bool n
+
+config XEN_FAULT_INJECTION
+   bool "Enable Xen fault injection"
+   depends on FAULT_INJECTION_DEBUG_FS
+   default n
+   help
+ Enable Xen fault injection facility
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index d83cb54..3158fe1 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_XEN_DOM0)+= vga.o
 obj-$(CONFIG_SWIOTLB_XEN)  += pci-swiotlb-xen.o
 obj-$(CONFIG_XEN_EFI)  += efi.o
 obj-$(CONFIG_XEN_PVH)  += xen-pvh.o
+obj-$(CONFIG_XEN_FAULT_INJECTION)  += fault_inject.o
diff --git a/arch/x86/xen/fault_inject.c b/arch/x86/xen/fault_inject.c
new file mode 100644
index 000..ecf0f7c
--- /dev/null
+++ b/arch/x86/xen/fault_inject.c
@@ -0,0 +1,109 @@
+/*
+ * Fauit injection interface for Xen virtual block devices
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include 
+
+#include "debugfs.h"
+
+static struct dentry *d_fi_debug;
+
+static DECLARE_FAULT_ATTR(template_attr);
+
+struct xen_fi {
+   struct dentry   *dir;
+   struct fault_attr   attr;
+};
+
+struct xen_fi *xen_fi_dir_add(struct dentry *parent, const char *name)
+{
+   struct xen_fi *fi;
+   struct dentry *dir;
+
+   fi = kzalloc(sizeof(*fi), GFP_KERNEL);
+   if (!fi)
+

[PATCH 0/3] Introduce Xen fault injection facility

2018-04-20 Thread Stanislav Kinsburskii
This series adds a facility, which can be used to instrument Xen code with
fault injections.
It is based "Fault injection capabilities infrastructure" described here:
- Documentation/fault-injection/fault-injection.txt

First patch adds a generic facility to use anywhere in Xen.
When using it all the fault injection user land control directories (if
any) will appear here:
- /sys/kernel/debug/xen/fault_inject/

To distinguish with generic (or core) Xen fault injections, next two
patches add additional directories to the root path above for blkback and
netback drivers respectively:
- /sys/kernel/debug/xen/fault_inject/xen-blkback/
- /sys/kernel/debug/xen/fault_inject/xen-netback/

---

Stanislav Kinsburskii (3):
  xen: add generic fault injection facility
  xen netback: add fault injection facility
  xen blkback: add fault injection facility


 arch/x86/xen/Kconfig   |7 ++
 arch/x86/xen/Makefile  |1 
 arch/x86/xen/fault_inject.c|  109 +
 drivers/block/Kconfig  |7 ++
 drivers/block/xen-blkback/Makefile |1 
 drivers/block/xen-blkback/blkback.c|9 ++
 drivers/block/xen-blkback/blkback_fi.c |  116 +++
 drivers/block/xen-blkback/blkback_fi.h |   37 ++
 drivers/block/xen-blkback/common.h |3 +
 drivers/block/xen-blkback/xenbus.c |5 +
 drivers/net/Kconfig|8 ++
 drivers/net/xen-netback/Makefile   |1 
 drivers/net/xen-netback/common.h   |3 +
 drivers/net/xen-netback/netback.c  |3 +
 drivers/net/xen-netback/netback_fi.c   |  119 
 drivers/net/xen-netback/netback_fi.h   |   35 +
 drivers/net/xen-netback/xenbus.c   |6 ++
 include/xen/fault_inject.h |   45 
 18 files changed, 515 insertions(+)
 create mode 100644 arch/x86/xen/fault_inject.c
 create mode 100644 drivers/block/xen-blkback/blkback_fi.c
 create mode 100644 drivers/block/xen-blkback/blkback_fi.h
 create mode 100644 drivers/net/xen-netback/netback_fi.c
 create mode 100644 drivers/net/xen-netback/netback_fi.h
 create mode 100644 include/xen/fault_inject.h
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


[PATCH 3/3] xen blkback: add fault injection facility

2018-04-20 Thread Stanislav Kinsburskii
This patch adds wrapper helpers around generic Xen fault inject
facility.
The major reason is to keep all the module fault injection directories
in a dedicated subdirectory instead of Xen fault inject root.

IOW, when using these helpers, per-device and named by device name
fault injection control directories will appear under the following
directory:
- /sys/kernel/debug/xen/fault_inject/xen-blkback/
instead of:
- /sys/kernel/debug/xen/fault_inject/

Signed-off-by: Stanislav Kinsburskii 
CC: Jens Axboe 
CC: Konrad Rzeszutek Wilk 
CC: "Roger Pau Monné" 
CC: linux-ker...@vger.kernel.org
CC: linux-block@vger.kernel.org
CC: xen-de...@lists.xenproject.org
CC: Stanislav Kinsburskii 
CC: David Woodhouse 
---
 drivers/block/Kconfig  |7 ++
 drivers/block/xen-blkback/Makefile |1 
 drivers/block/xen-blkback/blkback.c|9 ++
 drivers/block/xen-blkback/blkback_fi.c |  116 
 drivers/block/xen-blkback/blkback_fi.h |   37 ++
 drivers/block/xen-blkback/common.h |3 +
 drivers/block/xen-blkback/xenbus.c |5 +
 7 files changed, 178 insertions(+)
 create mode 100644 drivers/block/xen-blkback/blkback_fi.c
 create mode 100644 drivers/block/xen-blkback/blkback_fi.h

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index ad9b687..0ce05fe 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -436,6 +436,13 @@ config XEN_BLKDEV_BACKEND
  compile this driver as a module, chose M here: the module
  will be called xen-blkback.
 
+config XEN_BLKDEV_BACKEND_FAULT_INJECTION
+ bool "Xen block-device backend driver fault injection"
+ depends on XEN_BLKDEV_BACKEND
+ depends on XEN_FAULT_INJECTION
+ default n
+ help
+   Allow to inject errors to Xen backend block driver
 
 config VIRTIO_BLK
tristate "Virtio block driver"
diff --git a/drivers/block/xen-blkback/Makefile 
b/drivers/block/xen-blkback/Makefile
index e491c1b..035d134 100644
--- a/drivers/block/xen-blkback/Makefile
+++ b/drivers/block/xen-blkback/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_XEN_BLKDEV_BACKEND) := xen-blkback.o
 
 xen-blkback-y  := blkback.o xenbus.o
+xen-blkback-$(CONFIG_XEN_BLKDEV_BACKEND_FAULT_INJECTION) += blkback_fi.o
diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 987d665..2933bec 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include "common.h"
+#include "blkback_fi.h"
 
 /*
  * Maximum number of unused free pages to keep in the internal buffer.
@@ -1491,11 +1492,19 @@ static int __init xen_blkif_init(void)
if (rc)
goto failed_init;
 
+   (void) xen_blkbk_fi_init();
  failed_init:
return rc;
 }
 
 module_init(xen_blkif_init);
 
+static void __exit xen_blkif_fini(void)
+{
+   xen_blkbk_fi_fini();
+}
+
+module_exit(xen_blkif_fini);
+
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("xen-backend:vbd");
diff --git a/drivers/block/xen-blkback/blkback_fi.c 
b/drivers/block/xen-blkback/blkback_fi.c
new file mode 100644
index 000..b7abaea
--- /dev/null
+++ b/drivers/block/xen-blkback/blkback_fi.c
@@ -0,0 +1,116 @@
+/*
+ * Fault injection interface for Xen virtual block devices
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+
+#include 
+
+#include "blkback_fi.h"

[PATCH 2/3] xen netback: add fault injection facility

2018-04-20 Thread Stanislav Kinsburskii
This patch adds wrapper helpers around generic Xen fault inject facility.
The major reason is to keep all the module fault injection directories
in a dedicated subdirectory instead of Xen fault inject root.

IOW, when using these helpers, per-device and named by device name fault
injection control directories will appear under the following directory:
- /sys/kernel/debug/xen/fault_inject/xen-netback/
instead of:
- /sys/kernel/debug/xen/fault_inject/

Signed-off-by: Stanislav Kinsburskii 
CC: Wei Liu 
CC: Paul Durrant 
CC: "David S. Miller" 
CC: Matteo Croce 
CC: Stefan Hajnoczi 
CC: Daniel Borkmann 
CC: Gerard Garcia 
CC: David Ahern 
CC: Juergen Gross 
CC: Amir Levy 
CC: Jakub Kicinski 
CC: linux-ker...@vger.kernel.org
CC: net...@vger.kernel.org
CC: xen-de...@lists.xenproject.org
CC: Stanislav Kinsburskii 
CC: David Woodhouse 
---
 drivers/net/Kconfig  |8 ++
 drivers/net/xen-netback/Makefile |1 
 drivers/net/xen-netback/common.h |3 +
 drivers/net/xen-netback/netback.c|3 +
 drivers/net/xen-netback/netback_fi.c |  119 ++
 drivers/net/xen-netback/netback_fi.h |   35 ++
 drivers/net/xen-netback/xenbus.c |6 ++
 7 files changed, 175 insertions(+)
 create mode 100644 drivers/net/xen-netback/netback_fi.c
 create mode 100644 drivers/net/xen-netback/netback_fi.h

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8918466..5cc9acd 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -465,6 +465,14 @@ config XEN_NETDEV_BACKEND
  compile this driver as a module, chose M here: the module
  will be called xen-netback.
 
+config XEN_NETDEV_BACKEND_FAULT_INJECTION
+ bool "Xen net-device backend driver fault injection"
+ depends on XEN_NETDEV_BACKEND
+ depends on XEN_FAULT_INJECTION
+ default n
+ help
+   Allow to inject errors to Xen backend network driver
+
 config VMXNET3
tristate "VMware VMXNET3 ethernet driver"
depends on PCI && INET
diff --git a/drivers/net/xen-netback/Makefile b/drivers/net/xen-netback/Makefile
index d49798a..28abcdc 100644
--- a/drivers/net/xen-netback/Makefile
+++ b/drivers/net/xen-netback/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
 
 xen-netback-y := netback.o xenbus.o interface.o hash.o rx.o
+xen-netback-$(CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION) += netback_fi.o
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index a46a1e9..30d676d 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -286,6 +286,9 @@ struct xenvif {
 
 #ifdef CONFIG_DEBUG_FS
struct dentry *xenvif_dbg_root;
+#ifdef CONFIG_XEN_NETDEV_BACKEND_FAULT_INJECTION
+   void *fi_info;
+#endif
 #endif
 
struct xen_netif_ctrl_back_ring ctrl;
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index a27daa2..ecc416e 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -33,6 +33,7 @@
  */
 
 #include "common.h"
+#include "netback_fi.h"
 
 #include 
 #include 
@@ -1649,6 +1650,7 @@ static int __init netback_init(void)
PTR_ERR(xen_netback_dbg_root));
 #endif /* CONFIG_DEBUG_FS */
 
+   (void) xen_netbk_fi_init();
return 0;
 
 failed_init:
@@ -1659,6 +1661,7 @@ module_init(netback_init);
 
 static void __exit netback_fini(void)
 {
+   xen_netbk_fi_fini();
 #ifdef CONFIG_DEBUG_FS
if (!IS_ERR_OR_NULL(xen_netback_dbg_root))
debugfs_remove_recursive(xen_netback_dbg_root);
diff --git a/drivers/net/xen-netback/netback_fi.c 
b/drivers/net/xen-netback/netback_fi.c
new file mode 100644
index 000..47541d0
--- /dev/null
+++ b/drivers/net/xen-netback/netback_fi.c
@@ -0,0 +1,119 @@
+/*
+ * Fault injection interface for Xen backend network driver
+ *
+ * Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software 

[PATCH] A tentative patch to bypass REQ_META in blk throttle

2018-04-20 Thread Benlong Zhang
One problem with cgwb is how fs should treat metadata bios.
For example in xfs, the log might be partially stuck in one
group, leaving threads in other groups waiting for too long.
Please refer to the linux-xfs discussion:
"[PATCH V2] xfs: implement cgroup writeback support"

One approach is to correctly tag bio->bi_css from within the
filesystems (for xfs log should be blkcg_root), and the other
is to skip them, but relies on REQ_META being set.

It works with xfs on a cgwb porting implementation to 3.10.0.
But really not sure about other filesystems...
---
 block/blk-throttle.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index c5a1316..e2abf6e 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2159,6 +2159,10 @@ bool blk_throtl_bio(struct request_queue *q, struct 
blkcg_gq *blkg,
if (bio_flagged(bio, BIO_THROTTLED) || !tg->has_rules[rw])
goto out;
 
+   /* bypass REQ_META for fs global resource */
+   if (bio->bi_opf & REQ_META)
+   goto out;
+
spin_lock_irq(q->queue_lock);
 
throtl_update_latency_buckets(td);
-- 
1.8.3.1



[BUG REPORT] WARNING at __blk_mq_complete_request+0x105/0x180

2018-04-20 Thread Michael Moese
Hello,
For kernel 4.16.1, I encountered a bug triggering a WARNING using the test code 
at:
https://github.com/ChaitanayaKulkarni/nvmftests/blob/master/tests/test_nvmf_run_host_traffic.py

NVMF run host traffic and disable target namespace(s):
1. From the config file create Target.
2. From the config file create host and connect to target.
3. Start host traffic parallely on all host ns, disable target ns.
4. Delete Host.
5. Delete Target.

I can trigger a kernel warning. 

The resulting error:
[  294.559404] loop: module loaded
[  304.601864] loop: module loaded
[  305.710676] nvmet: adding nsid 1 to subsystem testnqn1
[  305.745667] nvmet: creating controller 1 for subsystem testnqn1 for NQN 
nqn.2014-08.org.nvmexpress:uuid:8b5e4215-fd40-4bac-9438-3878073e98ad.
[  305.745729] nvme nvme1: creating 4 I/O queues.
[  305.746155] nvme nvme1: new ctrl: "testnqn1"
[  314.025761] WARNING: CPU: 3 PID: 2178 at ../block/blk-mq.c:534 
__blk_mq_complete_request+0x105/0x180
[  314.025763] Modules linked in: nvme_loop nvme_fabrics nvmet loop configfs 
af_packet xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT 
nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge 
stp llc iscsi_ibft iscsi_boot_sysfs ip6table_nat nf_conntrack_ipv6 
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables 
ip6table_filter ip6_tables iptable_filter ip_tables x_tables 
snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel pcbc 8139too qxl ttm snd_hwdep aesni_intel 
aes_x86_64 8139cp drm_kms_helper crypto_simd glue_helper pcspkr cryptd
[  314.025789]  drm fb_sys_fops syscopyarea virtio_balloon sysfillrect snd_pcm 
snd_timer snd soundcore qemu_fw_cfg button sysimgblt mii i2c_piix4 btrfs xor 
zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom ata_generic ata_piix 
uhci_hcd ehci_pci ehci_hcd crc32c_intel serio_raw floppy usbcore sg [last 
unloaded: loop]
[  314.025804] CPU: 3 PID: 2178 Comm: dd Not tainted 4.16.1-1-default #1 
openSUSE Tumbleweed (unreleased)
[  314.025805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.0.0-prebuilt.qemu-project.org 04/01/2014
[  314.025806] RIP: 0010:__blk_mq_complete_request+0x105/0x180
[  314.025807] RSP: 0018:beb900d83a00 EFLAGS: 00010297
[  314.025808] RAX:  RBX: 9bd97895c300 RCX: 0003
[  314.025809] RDX: 0003 RSI: beb900d83a24 RDI: 9bd97895c300
[  314.025810] RBP: deb8ffdb2380 R08: 9bd9e9a92300 R09: 0009
[  314.025810] R10: beb900d83b60 R11: 0001 R12: 9bd9e989d680
[  314.025811] R13: 9bd97895c4b8 R14: 9bd9e786eaa8 R15: 9bd97895c340
[  314.025812] FS:  7f26b2c09540() GS:9bd9ffd8() 
knlGS:
[  314.025813] CS:  0010 DS:  ES:  CR0: 80050033
[  314.025814] CR2: 55eaf5b2 CR3: b14d4005 CR4: 003606e0
[  314.025817] DR0:  DR1:  DR2: 
[  314.025817] DR3:  DR6: fffe0ff0 DR7: 0400
[  314.025818] Call Trace:
[  314.025829]  blk_mq_complete_request+0x53/0x70
[  314.025832]  nvmet_req_init+0x85/0x130 [nvmet]
[  314.025835]  nvme_loop_queue_rq+0x87/0x210 [nvme_loop]
[  314.025837]  blk_mq_dispatch_rq_list+0x87/0x500
[  314.025838]  ? flush_busy_ctx+0x60/0x90
[  314.025840]  ? blk_mq_flush_busy_ctxs+0xc5/0xe0
[  314.025842]  blk_mq_sched_dispatch_requests+0x167/0x170
[  314.025844]  __blk_mq_run_hw_queue+0x53/0xb0
[  314.025846]  __blk_mq_delay_run_hw_queue+0x83/0xa0
[  314.025847]  blk_mq_run_hw_queue+0x6c/0xd0
[  314.025849]  blk_mq_flush_plug_list+0x1d2/0x250
[  314.025852]  blk_flush_plug_list+0xcb/0x260
[  314.025854]  blk_finish_plug+0x27/0x40
[  314.025857]  __do_page_cache_readahead+0x1b6/0x260
[  314.025860]  ? ondemand_readahead+0x117/0x2c0
[  314.025862]  ondemand_readahead+0x117/0x2c0
[  314.025864]  generic_file_read_iter+0x6bc/0x930
[  314.025867]  ? aa_file_perm+0x196/0x310
[  314.025870]  __vfs_read+0xdb/0x140
[  314.025872]  vfs_read+0x89/0x130
[  314.025874]  SyS_read+0x42/0x90
[  314.025876]  do_syscall_64+0x76/0x140
[  314.025878]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[  314.025880] RIP: 0033:0x7f26b271dbc1
[  314.025881] RSP: 002b:7fffb1fea8a8 EFLAGS: 0246 ORIG_RAX: 

[  314.025882] RAX: ffda RBX: 55eaf47fe400 RCX: 7f26b271dbc1
[  314.025883] RDX: 1000 RSI: 55eaf5b2 RDI: 
[  314.025883] RBP: 1000 R08: 0003 R09: 007f2074a008
[  314.025884] R10: 08c6 R11: 0246 R12: 55eaf5b2
[  314.025884] R13:  R14:  R15: 55eaf5b2
[  314.025886] 

Re: [PATCH] bsg referencing bus driver module

2018-04-20 Thread James Bottomley
On Thu, 2018-04-19 at 15:10 -0700, Anatoliy Glagolev wrote:
> Updated: rebased on recent Linux, cc-ed maintainers per instructions
> in MAINTAINERS file
> 
> From df939b80d02bf37b21efaaef8ede86cfd39b0cb8 Mon Sep 17 00:00:00
> 2001
> From: Anatoliy Glagolev 
> Date: Thu, 19 Apr 2018 15:06:06 -0600
> Subject: [PATCH] bsg referencing parent module
> 
> Fixing a bug when bsg layer holds the last reference to device
> when the device's module has been unloaded. Upon dropping the
> reference the device's release function may touch memory of
> the unloaded module.
> ---
>  block/bsg-lib.c  | 24 ++--
>  block/bsg.c  | 29 ++---
>  drivers/scsi/scsi_transport_fc.c |  8 ++--
>  include/linux/bsg-lib.h  |  4 
>  include/linux/bsg.h  |  5 +
>  5 files changed, 63 insertions(+), 7 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index fc2e5ff..90c28fd 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -309,6 +309,25 @@ struct request_queue *bsg_setup_queue(struct
> device *dev, const char *name,
>   bsg_job_fn *job_fn, int dd_job_size,
>   void (*release)(struct device *))
>  {
> + return bsg_setup_queue_ex(dev, name, job_fn, dd_job_size, release,
> + NULL);
> +}
> +EXPORT_SYMBOL_GPL(bsg_setup_queue);
> +
> +/**
> + * bsg_setup_queue - Create and add the bsg hooks so we can receive
> requests
> + * @dev: device to attach bsg device to
> + * @name: device to give bsg device
> + * @job_fn: bsg job handler
> + * @dd_job_size: size of LLD data needed for each job
> + * @release: @dev release function
> + * @dev_module: @dev's module
> + */
> +struct request_queue *bsg_setup_queue_ex(struct device *dev, const
> char *name,
> + bsg_job_fn *job_fn, int dd_job_size,
> + void (*release)(struct device *),
> + struct module *dev_module)

This patch isn't applyable because your mailer has changed all the tabs
to spaces.

I also think there's no need to do it this way.  I think what we need
is for fc_bsg_remove() to wait until the bsg queue is drained.  It does
look like the author thought this happened otherwise the code wouldn't
have the note.  If we fix it that way we can do the same thing in all
the other transport classes that use bsg (which all have a similar
issue).

James



[PATCH] block: mq: Add some minor doc for core structs

2018-04-20 Thread Linus Walleij
As it came up in discussion on the mailing list that the semantic
meaning of 'blk_mq_ctx' and 'blk_mq_hw_ctx' isn't completely
obvious to everyone, let's add some minimal kerneldoc for a
starter.

Signed-off-by: Linus Walleij 
---
 block/blk-mq.h | 3 +++
 include/linux/blk-mq.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 88c558f71819..89b5cd3a6c70 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -7,6 +7,9 @@
 
 struct blk_mq_tag_set;
 
+/**
+ * struct blk_mq_ctx - State for a software queue facing the submitting CPUs
+ */
 struct blk_mq_ctx {
struct {
spinlock_t  lock;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e3986f4b3461..ebc34a5686dc 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -9,6 +9,9 @@
 struct blk_mq_tags;
 struct blk_flush_queue;
 
+/**
+ * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block 
device
+ */
 struct blk_mq_hw_ctx {
struct {
spinlock_t  lock;
-- 
2.14.3



[PATCH 3/3] scsi: avoid to hold host-wide counter of host_busy for scsi_mq

2018-04-20 Thread Ming Lei
It isn't necessary to check the host depth in scsi_queue_rq() any more
since it has been respected by blk-mq before calling scsi_queue_rq() via
getting driver tag.

Lots of LUNs may attach to same host, and per-host IOPS may reach millions
level, so we should avoid to this expensive atomic operations on the
hostwide counter in IO path.

This patch implemens scsi_host_busy() via blk_mq_tagset_busy_iter() for
reading the count of busy IOs for scsi_mq.

It is observed that IOPS is increased by 15% in IO test on scsi_debug
(32 LUNs, 32 submit queues, 1024 can_queue, libaio/dio) in one
dual-socket system.

Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace 
Cc: Kashyap Desai 
Cc: Mike Snitzer 
Cc: Hannes Reinecke 
Cc: Laurence Oberman 
Signed-off-by: Ming Lei 
---
 drivers/scsi/hosts.c| 24 +++-
 drivers/scsi/scsi_lib.c | 23 +--
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 69beb30205f1..ad56e2b10ac8 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -564,13 +564,35 @@ struct Scsi_Host *scsi_host_get(struct Scsi_Host *shost)
 }
 EXPORT_SYMBOL(scsi_host_get);
 
+struct scsi_host_mq_in_flight {
+   int cnt;
+};
+
+static void scsi_host_check_in_flight(struct request *rq, void *data,
+   bool reserved)
+{
+   struct scsi_host_mq_in_flight *in_flight = data;
+
+   if (blk_mq_request_started(rq))
+   in_flight->cnt++;
+}
+
 /**
  * scsi_host_busy - Return the host busy counter
  * @shost: Pointer to Scsi_Host to inc.
  **/
 int scsi_host_busy(struct Scsi_Host *shost)
 {
-   return atomic_read(>host_busy);
+   struct scsi_host_mq_in_flight in_flight = {
+   .cnt = 0,
+   };
+
+   if (!shost->use_blk_mq)
+   return atomic_read(>host_busy);
+
+   blk_mq_tagset_busy_iter(>tag_set, scsi_host_check_in_flight,
+   _flight);
+   return in_flight.cnt;
 }
 EXPORT_SYMBOL(scsi_host_busy);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0dfec0dedd5e..dc437c642934 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -345,7 +345,8 @@ static void scsi_dec_host_busy(struct Scsi_Host *shost)
unsigned long flags;
 
rcu_read_lock();
-   atomic_dec(>host_busy);
+   if (!shost->use_blk_mq)
+   atomic_dec(>host_busy);
if (unlikely(scsi_host_in_recovery(shost))) {
spin_lock_irqsave(shost->host_lock, flags);
if (shost->host_failed || shost->host_eh_scheduled)
@@ -444,7 +445,12 @@ static inline bool scsi_target_is_busy(struct scsi_target 
*starget)
 
 static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
 {
-   if (shost->can_queue > 0 &&
+   /*
+* blk-mq can handle host queue busy efficiently via host-wide driver
+* tag allocation
+*/
+
+   if (!shost->use_blk_mq && shost->can_queue > 0 &&
atomic_read(>host_busy) >= shost->can_queue)
return true;
if (atomic_read(>host_blocked) > 0)
@@ -1539,9 +1545,12 @@ static inline int scsi_host_queue_ready(struct 
request_queue *q,
if (scsi_host_in_recovery(shost))
return 0;
 
-   busy = atomic_inc_return(>host_busy) - 1;
+   if (!shost->use_blk_mq)
+   busy = atomic_inc_return(>host_busy) - 1;
+   else
+   busy = 0;
if (atomic_read(>host_blocked) > 0) {
-   if (busy)
+   if (busy || scsi_host_busy(shost))
goto starved;
 
/*
@@ -1555,7 +1564,7 @@ static inline int scsi_host_queue_ready(struct 
request_queue *q,
 "unblocking host at zero depth\n"));
}
 
-   if (shost->can_queue > 0 && busy >= shost->can_queue)
+   if (!shost->use_blk_mq && shost->can_queue > 0 && busy >= 
shost->can_queue)
goto starved;
if (shost->host_self_blocked)
goto starved;
@@ -1641,7 +1650,9 @@ static void scsi_kill_request(struct request *req, struct 
request_queue *q)
 * with the locks as normal issue path does.
 */
atomic_inc(>device_busy);
-   atomic_inc(>host_busy);
+
+   if (!shost->use_blk_mq)
+   atomic_inc(>host_busy);
if (starget->can_queue > 0)
atomic_inc(>target_busy);
 
-- 
2.9.5



[PATCH 1/3] scsi: introduce scsi_host_busy()

2018-04-20 Thread Ming Lei
This patch introduces SCSI middle layer API of scsi_host_busy() for
drivers to read the host-wide counter of scsi_host->host_busy.

Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace 
Cc: Kashyap Desai 
Cc: Mike Snitzer 
Cc: Hannes Reinecke 
Cc: Laurence Oberman 
Signed-off-by: Ming Lei 
---
 drivers/scsi/hosts.c | 10 ++
 include/scsi/scsi_host.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 7649d63a1b8d..69beb30205f1 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -565,6 +565,16 @@ struct Scsi_Host *scsi_host_get(struct Scsi_Host *shost)
 EXPORT_SYMBOL(scsi_host_get);
 
 /**
+ * scsi_host_busy - Return the host busy counter
+ * @shost: Pointer to Scsi_Host to inc.
+ **/
+int scsi_host_busy(struct Scsi_Host *shost)
+{
+   return atomic_read(>host_busy);
+}
+EXPORT_SYMBOL(scsi_host_busy);
+
+/**
  * scsi_host_put - dec a Scsi_Host ref count
  * @shost: Pointer to Scsi_Host to dec.
  **/
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 12f454cb6f61..44ab89268f30 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -758,6 +758,7 @@ extern void scsi_scan_host(struct Scsi_Host *);
 extern void scsi_rescan_device(struct device *);
 extern void scsi_remove_host(struct Scsi_Host *);
 extern struct Scsi_Host *scsi_host_get(struct Scsi_Host *);
+extern int scsi_host_busy(struct Scsi_Host *shost);
 extern void scsi_host_put(struct Scsi_Host *t);
 extern struct Scsi_Host *scsi_host_lookup(unsigned short);
 extern const char *scsi_host_state_name(enum scsi_host_state);
-- 
2.9.5



[PATCH 2/3] scsi: read host_busy via scsi_host_busy()

2018-04-20 Thread Ming Lei
No functional change, just replace the direct read of scsi_host->host_busy
with scsi_host_busy().

Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace 
Cc: Kashyap Desai 
Cc: Mike Snitzer 
Cc: Hannes Reinecke 
Cc: Laurence Oberman 
Signed-off-by: Ming Lei 
---
 drivers/scsi/advansys.c   | 8 
 drivers/scsi/libsas/sas_scsi_host.c   | 4 ++--
 drivers/scsi/megaraid/megaraid_sas_base.c | 2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c   | 4 ++--
 drivers/scsi/qlogicpti.c  | 2 +-
 drivers/scsi/scsi.c   | 2 +-
 drivers/scsi/scsi_error.c | 6 +++---
 drivers/scsi/scsi_sysfs.c | 2 +-
 8 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/advansys.c b/drivers/scsi/advansys.c
index 24e57e770432..38ee024cbfc7 100644
--- a/drivers/scsi/advansys.c
+++ b/drivers/scsi/advansys.c
@@ -2416,8 +2416,8 @@ static void asc_prt_scsi_host(struct Scsi_Host *s)
struct asc_board *boardp = shost_priv(s);
 
printk("Scsi_Host at addr 0x%p, device %s\n", s, dev_name(boardp->dev));
-   printk(" host_busy %u, host_no %d,\n",
-  atomic_read(>host_busy), s->host_no);
+   printk(" host_busy %d, host_no %d,\n",
+  scsi_host_busy(s), s->host_no);
 
printk(" base 0x%lx, io_port 0x%lx, irq %d,\n",
   (ulong)s->base, (ulong)s->io_port, boardp->irq);
@@ -3182,8 +3182,8 @@ static void asc_prt_driver_conf(struct seq_file *m, 
struct Scsi_Host *shost)
shost->host_no);
 
seq_printf(m,
-  " host_busy %u, max_id %u, max_lun %llu, max_channel %u\n",
-  atomic_read(>host_busy), shost->max_id,
+  " host_busy %d, max_id %u, max_lun %llu, max_channel %u\n",
+  scsi_host_busy(shost), shost->max_id,
   shost->max_lun, shost->max_channel);
 
seq_printf(m,
diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index ceab5e5c41c2..33229348dcb6 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -759,7 +759,7 @@ void sas_scsi_recover_host(struct Scsi_Host *shost)
spin_unlock_irq(shost->host_lock);
 
SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
-   __func__, atomic_read(>host_busy), 
shost->host_failed);
+   __func__, scsi_host_busy(shost), shost->host_failed);
/*
 * Deal with commands that still have SAS tasks (i.e. they didn't
 * complete via the normal sas_task completion mechanism),
@@ -801,7 +801,7 @@ void sas_scsi_recover_host(struct Scsi_Host *shost)
goto retry;
 
SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
-   __func__, atomic_read(>host_busy),
+   __func__, scsi_host_busy(shost),
shost->host_failed, tries);
 }
 
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c 
b/drivers/scsi/megaraid/megaraid_sas_base.c
index b89c6e6c0589..008a3bdfa948 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -2822,7 +2822,7 @@ static int megasas_reset_bus_host(struct scsi_cmnd *scmd)
"SCSI command pointer: (%p)\t SCSI host state: %d\t"
" SCSI host busy: %d\t FW outstanding: %d\n",
scmd, scmd->device->host->shost_state,
-   atomic_read((atomic_t *)>device->host->host_busy),
+   scsi_host_busy(scmd->device->host),
atomic_read(>fw_outstanding));
 
/*
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 61f93a134956..b7325355df8e 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3260,7 +3260,7 @@ _base_recovery_check(struct MPT3SAS_ADAPTER *ioc)
 * See _wait_for_commands_to_complete() call with regards to this code.
 */
if (ioc->shost_recovery && ioc->pending_io_count) {
-   ioc->pending_io_count = atomic_read(>shost->host_busy);
+   ioc->pending_io_count = scsi_host_busy(ioc->shost);
if (ioc->pending_io_count == 0)
wake_up(>reset_wq);
}
@@ -6658,7 +6658,7 @@ mpt3sas_wait_for_commands_to_complete(struct 
MPT3SAS_ADAPTER *ioc)
return;
 
/* pending command count */
-   ioc->pending_io_count = atomic_read(>shost->host_busy);
+   ioc->pending_io_count = scsi_host_busy(ioc->shost);
 
if (!ioc->pending_io_count)
return;
diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index 

[PATCH 0/3] scsi: scsi-mq: don't hold host_busy in IO path

2018-04-20 Thread Ming Lei
Hi,

This patches removes the expensive atomic opeation on host-wide counter
of .host_busy for scsi-mq, and it is observed that IOPS can be increased by
15% with this change in IO test over scsi_debug.


Ming Lei (3):
  scsi: introduce scsi_host_busy()
  scsi: read host_busy via scsi_host_busy()
  scsi: avoid to hold host-wide counter of host_busy for scsi_mq

 drivers/scsi/advansys.c   |  8 
 drivers/scsi/hosts.c  | 32 +++
 drivers/scsi/libsas/sas_scsi_host.c   |  4 ++--
 drivers/scsi/megaraid/megaraid_sas_base.c |  2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c   |  4 ++--
 drivers/scsi/qlogicpti.c  |  2 +-
 drivers/scsi/scsi.c   |  2 +-
 drivers/scsi/scsi_error.c |  6 +++---
 drivers/scsi/scsi_lib.c   | 23 --
 drivers/scsi/scsi_sysfs.c |  2 +-
 include/scsi/scsi_host.h  |  1 +
 11 files changed, 65 insertions(+), 21 deletions(-)


Cc: Omar Sandoval ,
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace 
Cc: Kashyap Desai 
Cc: Mike Snitzer 
Cc: Hannes Reinecke 
Cc: Laurence Oberman 

-- 
2.9.5



Re: [PATCH v6] blk-mq: Avoid that a completion can be ignored for BLK_EH_RESET_TIMER

2018-04-20 Thread jianchao.wang
Hi Bart

On 04/20/2018 12:43 AM, Bart Van Assche wrote:
> Use the deadline instead of the request generation to detect whether
>   or not a request timer fired after reinitialization of a request

Maybe use deadline to do this is not suitable.

Let's think of the following scenario.

T1/T2 times in nano seconds
J1/J2 times in jiffies

rq is started at T1,J1
blk_mq_timeout_work
  -> blk_mq_check_expired
-> save rq->__deadline (J1 + MQ_RQ_IN_FLIGHT)

 rq is completed and freed 
 rq is allocated and started again 
on T2
 rq->__deadline = J2 + 
MQ_RQ_IN_FLIGHT
  -> synchronize srcu/rcu
  -> blk_mq_terminate_expired
-> rq->__deadline (J2 + MQ_RQ_IN_FLIGHT)

1. deadline = rq->__deadline & ~RQ_STATE_MASK (0x3)
   if J2-J1 < 4 jiffies, we will get the same deadline value.

2. even if we do some bit shift when save and get deadline
   if T2 - T1 < time of 1 jiffies, we sill get the same deadline.

Thanks
Jianchao