Re: [PATCH v2 6/6] staging: lustre: mdc: use large xattr buffers for old servers

2018-05-31 Thread Dilger, Andreas
On May 31, 2018, at 18:54, Greg Kroah-Hartman  
wrote:
> 
> On Tue, May 29, 2018 at 10:21:45AM -0400, James Simmons wrote:
>> From: "John L. Hammond" 
>> 
>> Pre 2.10.1 MDTs will crash when they receive a listxattr (MDS_GETXATTR
>> with OBD_MD_FLXATTRLS) RPC for an orphan or dead object. So for
>> clients connected to these older MDTs, try to avoid sending listxattr
>> RPCs by making the bulk getxattr (MDS_GETXATTR with OBD_MD_FLXATTRALL)
>> more likely to succeed and thereby reducing the chances of falling
>> back to listxattr.
>> 
>> +#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
> 
> Why are you adding pointless version checks to mainline?  Please don't
> add new ones of these, you need to be working on removing the existing
> ones.

These are not Linux kernel version checks, but rather Lustre release version
checks.  This allows us to remove workarounds like this in the future when
they are no longer needed, rather than accumulating cruft forever.  It's like
the separation of NFSv2 vs NFSv3 vs NFSv4.
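
For anyone unfamiliar with the convention, here is a minimal sketch of such a
release-gated workaround.  The LUSTRE_VERSION_CODE / OBD_OCD_VERSION() macros
are the ones quoted above; the variable and the buffer sizing are only
illustrative, not the actual mdc code:

/* Gate a client-side workaround on the Lustre release, not the kernel
 * version.  Once servers older than the fixed release are no longer
 * supported, the whole block can simply be deleted.
 */
#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
        /* Old MDTs may crash on listxattr for orphan objects, so request a
         * large bulk getxattr buffer to make the listxattr fallback rare.
         * 'buf_size' is a placeholder name, not a real variable.
         */
        buf_size = max_t(u32, buf_size, XATTR_SIZE_MAX);
#endif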

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

2018-05-16 Thread Dilger, Andreas
On May 16, 2018, at 02:00, Dan Carpenter  wrote:
> 
> On Tue, May 15, 2018 at 04:02:55PM +0100, James Simmons wrote:
>> 
/*
 * Allocate new object. This may result in rather complicated
 * operations, including fld queries, inode loading, etc.
 */
o = lu_object_alloc(env, dev, f, conf);
 -  if (IS_ERR(o))
 +  if (unlikely(IS_ERR(o)))
return o;
 
>>> 
>>> This is unrelated and totally pointless.  likely/unlikely annotations
>>> hurt readability, and they should only be added if it's something which
>>> is going to show up in benchmarking.  lu_object_alloc() is already too
>>> slow for the unlikely() to make a difference and anyway IS_ERR() has an
>>> unlikely built in so it's duplicative...
>> 
>> Sounds like a good checkpatch case to test for :-)
> 
> The likely/unlikely annotations have their place in fast paths so a
> checkpatch warning would get annoying...

I think James was suggesting a check for unlikely(IS_ERR()), or possibly
a check for unlikely() on something that is already unlikely() after CPP
expansion.
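
For reference, the mainline definitions (lightly abridged from
include/linux/err.h; exact wording varies by kernel version) show why the
extra annotation is a no-op:

#define MAX_ERRNO       4095

/* IS_ERR_VALUE() already carries the unlikely() hint ... */
#define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

/* ... so IS_ERR() is already "unlikely", and unlikely(IS_ERR(o)) merely
 * expands to a nested unlikely(unlikely(...)).
 */
static inline bool __must_check IS_ERR(__force const void *ptr)
{
        return IS_ERR_VALUE((unsigned long)ptr);
}

A checkpatch rule could therefore flag likely()/unlikely() wrapped around
helpers that already expand to one, which is exactly the case above.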

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH] staging: lustre: Fix an error handling path in 'client_common_fill_super()'

2018-05-12 Thread Dilger, Andreas
On May 12, 2018, at 00:33, Christophe JAILLET  
wrote:
> 
> According to error handling path before and after this one, we should go
> to 'out_md_fid' here, instead of 'out_md', if 'obd_connect()' fails.
> 
> Signed-off-by: Christophe JAILLET 

Good catch.

Reviewed-by: Andreas Dilger 

> ---
> The last goto 'out_lock_cn_cb' looks spurious but is correct.
> In case of error, 'd_make_root()' performs an 'iput()', so skipping it in
> the error handling path looks fine to me.
> ---
> drivers/staging/lustre/lustre/llite/llite_lib.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c 
> b/drivers/staging/lustre/lustre/llite/llite_lib.c
> index 60dbe888e336..83eb2da2c9ad 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_lib.c
> +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
> @@ -400,11 +400,11 @@ static int client_common_fill_super(struct super_block 
> *sb, char *md, char *dt)
>   LCONSOLE_ERROR_MSG(0x150,
>  "An OST (dt %s) is performing recovery, of 
> which this client is not a part.  Please wait for recovery to complete, 
> abort, or time out.\n",
>  dt);
> - goto out_md;
> + goto out_md_fid;
>   } else if (err) {
>   CERROR("%s: Cannot connect to %s: rc = %d\n",
>  sbi->ll_dt_exp->exp_obd->obd_name, dt, err);
> - goto out_md;
> + goto out_md_fid;
>   }
> 
>   sbi->ll_dt_exp->exp_connect_data = *data;
> -- 
> 2.17.0
> 
> ___
> lustre-devel mailing list
> lustre-de...@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH] staging: lustre: fix spelling mistake: "req_ulinked" -> "req_unlinked"

2018-05-11 Thread Dilger, Andreas
On May 11, 2018, at 07:38, Colin King  wrote:
> 
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in DEBUG_REQ message text
> 
> Signed-off-by: Colin Ian King 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ptlrpc/client.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c 
> b/drivers/staging/lustre/lustre/ptlrpc/client.c
> index eeb281266413..a51feaeb7734 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/client.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
> @@ -2514,7 +2514,7 @@ static int ptlrpc_unregister_reply(struct 
> ptlrpc_request *request, int async)
>   }
> 
>   DEBUG_REQ(D_WARNING, request,
> -   "Unexpectedly long timeout receiving_reply=%d 
> req_ulinked=%d reply_unlinked=%d",
> +   "Unexpectedly long timeout receiving_reply=%d 
> req_unlinked=%d reply_unlinked=%d",
> request->rq_receiving_reply,
> request->rq_req_unlinked,
> request->rq_reply_unlinked);
> -- 
> 2.17.0

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH v2] staging: lustre: llite: fix potential missing-check bug when copying lumv

2018-05-04 Thread Dilger, Andreas
On May 3, 2018, at 22:19, Wenwen Wang  wrote:
> 
> On Tue, May 1, 2018 at 3:46 AM, Dan Carpenter  
> wrote:
>> On Mon, Apr 30, 2018 at 05:56:10PM -0500, Wenwen Wang wrote:
>>> However, given that the user data resides in the user space, a malicious
>>> user-space process can race to change the data between the two copies. By
>>> doing so, the attacker can provide a data with an inconsistent version,
>>> e.g., v1 version + v3 data. This can lead to logical errors in the
>>> following execution in ll_dir_setstripe(), which performs different actions
>>> according to the version specified by the field lmm_magic.
>> 
>> This part is misleading.  The fix is to improve readability and make
>> static checkers happy.  You're over dramatizing it to make people think
>> it has a security impact when it doesn't.
>> 
>> If the user wants to specify v1 data they can just say that on the first
>> read.  They don't need to do funny tricks and race between the two
>> reads.  It's allowed.
>> 
>> In other words this allows the user to do something in a very
>> complicated way which they are already allowed to do in a very simple
>> straight forward way.
>> 
>> regards,
>> dan carpenter
> 
> Thanks for your comment, Dan! How about this:
> 
> However, given that the user data resides in the user space, a
> malicious user-space process can race to change the data between the
> two copies. By doing so, the attacker can provide a data with an
> inconsistent version, e.g., v1 version + v3 data. The current kernel
> can handle such inconsistent data. But, it may pose a potential
> security risk for future implementations. Also, to improve code
> readability and make static analysis tools happy, which will warn
> about read-verify-re-read type bugs, this issue should be fixed.

There is nothing preventing the user from using struct lov_mds_md_v3 but
filling in lmm_magic = LOV_MAGIC_V1 from the beginning, no need for a race.
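
To make that concrete, a user-space sketch of the point (the ioctl and struct
names follow the Lustre UAPI headers; 'dir_fd' is assumed to be an open
directory on a Lustre mount, and error handling is omitted):

        struct lov_user_md_v3 lum = { 0 };

        lum.lmm_magic = LOV_USER_MAGIC_V1;      /* V1 magic ...            */
        lum.lmm_stripe_count = 4;               /* ... with V1 fields set  */
        /* The buffer is V3-sized, but that is already legal today: with a
         * V1 magic the kernel never looks at the trailing lmm_pool bytes.
         */
        ioctl(dir_fd, LL_IOC_LOV_SETSTRIPE, &lum);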

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 1/4] staging: lustre: obdclass: change spinlock of key to rwlock

2018-05-03 Thread Dilger, Andreas
On May 3, 2018, at 07:50, David Laight  wrote:
> 
> From: James Simmons
>> Sent: 02 May 2018 19:22
>> From: Li Xi 
>> 
>> Most of the time, keys are never changed. So rwlock might be
>> better for the concurrency of key read.
> 
> OTOH unless there is contention on the spin lock during reads the
> additional cost of a rwlock (probably double that of a spinlock)
> will hurt performance.
> 
> ...
>> -spin_lock(&lu_keys_guard);
>> +read_lock(&lu_keys_guard);
>>  atomic_inc(&lu_key_initing_cnt);
>> -spin_unlock(&lu_keys_guard);
>> +read_unlock(&lu_keys_guard);
> 
> WTF, seems unlikely that you need to hold any kind of lock
> over an atomic_inc().
> 
> If this is just ensuring that no code holds the lock then
> it would need to request the write_lock().
> (and would need a comment)

There was a fair amount of benchmarking done for this that shows the
performance is significantly improved with the patch, which can be
seen in the ticket that was referenced in the original commit comment:

https://jira.hpdd.intel.com/browse/LU-6800?focusedCommentId=121776#comment-121776

That said, it might be good to include this information into the
commit comment itself.
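
For readers following along, a generic sketch of the read-mostly pattern the
patch adopts (names here are illustrative, not the actual lu_context symbols):

static DEFINE_RWLOCK(example_keys_guard);
static struct example_key *example_keys[32];

/* Hot path: many concurrent readers, no longer serialized on one spinlock. */
struct example_key *example_key_get(int idx)
{
        struct example_key *key;

        read_lock(&example_keys_guard);
        key = example_keys[idx];
        read_unlock(&example_keys_guard);
        return key;
}

/* Cold path: registration is rare and takes the lock exclusively. */
void example_key_register(int idx, struct example_key *key)
{
        write_lock(&example_keys_guard);
        example_keys[idx] = key;
        write_unlock(&example_keys_guard);
}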

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup

2018-05-01 Thread Dilger, Andreas
On Apr 30, 2018, at 21:52, NeilBrown  wrote:
> 
> The current retry logic, to wait when a 'dying' object is found,
> spans multiple functions.  The process is attached to a waitqueue
> and set TASK_UNINTERRUPTIBLE in htable_lookup, and this status
> is passed back through lu_object_find_try() to lu_object_find_at()
> where schedule() is called and the process is removed from the queue.
> 
> This can be simplified by moving all the logic (including
> hashtable locking) inside htable_lookup(), which now never returns
> EAGAIN.
> 
> Note that htable_lookup() is called with the hash bucket lock
> held, and will drop and retake it if it needs to schedule.
> 
> I made this a 'goto' loop rather than a 'while(1)' loop as the
> diff is easier to read.
> 
> Signed-off-by: NeilBrown 
> ---
> drivers/staging/lustre/lustre/obdclass/lu_object.c |   73 +++-
> 1 file changed, 27 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c 
> b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> index 2bf089817157..93daa52e2535 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> @@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print);
> static struct lu_object *htable_lookup(struct lu_site *s,

It's probably a good idea to add a comment for this function that it may
drop and re-acquire the hash bucket lock internally.

>  struct cfs_hash_bd *bd,
>  const struct lu_fid *f,
> -wait_queue_entry_t *waiter,
>  __u64 *version)
> {
> + struct cfs_hash *hs = s->ls_obj_hash;
>   struct lu_site_bkt_data *bkt;
>   struct lu_object_header *h;
>   struct hlist_node   *hnode;
> - __u64  ver = cfs_hash_bd_version_get(bd);
> + __u64 ver;
> + wait_queue_entry_t waiter;
> 
> - if (*version == ver)
> +retry:
> + ver = cfs_hash_bd_version_get(bd);
> +
> + if (*version == ver) {
>   return ERR_PTR(-ENOENT);
> + }

(style) we don't need the {} around a single-line if statement

>   *version = ver;
>   bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
> @@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site 
> *s,
>* drained), and moreover, lookup has to wait until object is freed.
>*/
> 
> - init_waitqueue_entry(waiter, current);
> - add_wait_queue(>lsb_marche_funebre, waiter);
> + init_waitqueue_entry(, current);
> + add_wait_queue(>lsb_marche_funebre, );
>   set_current_state(TASK_UNINTERRUPTIBLE);
>   lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
> - return ERR_PTR(-EAGAIN);
> + cfs_hash_bd_unlock(hs, bd, 1);

This looks like it isn't unlocking and locking the hash bucket in the same
manner that it was done in the caller.  Here excl = 1, but in the caller
you changed it to excl = 0?
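
Purely to illustrate that point (this is not a hunk from the patch): threading
the caller's flag through htable_lookup() would keep the pair symmetric, using
the same cfs_hash_bd_{lock,unlock}() calls quoted above, e.g.:

        /* 'excl' would be a new htable_lookup() parameter carrying the
         * caller's lock mode; the rest matches the quoted hunk.
         */
        cfs_hash_bd_unlock(hs, bd, excl);       /* drop in the caller's mode */
        schedule();
        remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
        cfs_hash_bd_lock(hs, bd, excl);         /* retake in the same mode   */
        goto retry;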

> + schedule();
> + remove_wait_queue(>lsb_marche_funebre, );

Is it worthwhile to use your new helper function here to get the wq from "s"?

> + cfs_hash_bd_lock(hs, bd, 1);
> + goto retry;
> }
> 
> /**
> @@ -693,13 +702,14 @@ static struct lu_object *lu_object_new(const struct 
> lu_env *env,
> }
> 
> /**
> - * Core logic of lu_object_find*() functions.
> + * Much like lu_object_find(), but top level device of object is specifically
> + * \a dev rather than top level device of the site. This interface allows
> + * objects of different "stacking" to be created within the same site.
>  */
> -static struct lu_object *lu_object_find_try(const struct lu_env *env,
> - struct lu_device *dev,
> - const struct lu_fid *f,
> - const struct lu_object_conf *conf,
> - wait_queue_entry_t *waiter)
> +struct lu_object *lu_object_find_at(const struct lu_env *env,
> + struct lu_device *dev,
> + const struct lu_fid *f,
> + const struct lu_object_conf *conf)
> {
>   struct lu_object  *o;
>   struct lu_object  *shadow;
> @@ -725,17 +735,16 @@ static struct lu_object *lu_object_find_try(const 
> struct lu_env *env,
>* It is unnecessary to perform lookup-alloc-lookup-insert, instead,
>* just alloc and insert directly.
>*
> -  * If dying object is found during index search, add @waiter to the
> -  * site wait-queue and return ERR_PTR(-EAGAIN).
>*/
>   if (conf && conf->loc_flags & LOC_F_NEW)
>   return lu_object_new(env, dev, f, conf);
> 
>   s  = dev->ld_site;
>   hs = s->ls_obj_hash;
> - cfs_hash_bd_get_and_lock(hs, (void *)f, , 1);
> - o = htable_lookup(s, , f, waiter, );
> - cfs_hash_bd_unlock(hs, , 

Re: [PATCH 03/10] staging: lustre: lu_object: discard extra lru count.

2018-04-30 Thread Dilger, Andreas
On Apr 30, 2018, at 21:52, NeilBrown  wrote:
> 
> lu_object maintains 2 lru counts.
> One is a per-bucket lsb_lru_len.
> The other is the per-cpu ls_lru_len_counter.
> 
> The only times the per-bucket counters are used are:
> - a debug message when an object is added
> - in lu_site_stats_get when all the counters are combined.
> 
> The debug message is not essential, and the per-cpu counter
> can be used to get the combined total.
> 
> So discard the per-bucket lsb_lru_len.
> 
> Signed-off-by: NeilBrown 

Looks reasonable, though it would also be possible to fix the percpu
functions rather than adding a workaround in this code.

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/obdclass/lu_object.c |   24 
> 1 file changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c 
> b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> index 2a8a25d6edb5..2bf089817157 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> @@ -57,10 +57,6 @@
> #include 
> 
> struct lu_site_bkt_data {
> - /**
> -  * number of object in this bucket on the lsb_lru list.
> -  */
> - longlsb_lru_len;
>   /**
>* LRU list, updated on each access to object. Protected by
>* bucket lock of lu_site::ls_obj_hash.
> @@ -187,10 +183,9 @@ void lu_object_put(const struct lu_env *env, struct 
> lu_object *o)
>   if (!lu_object_is_dying(top)) {
>   LASSERT(list_empty(>loh_lru));
>   list_add_tail(>loh_lru, >lsb_lru);
> - bkt->lsb_lru_len++;
>   percpu_counter_inc(>ls_lru_len_counter);
> - CDEBUG(D_INODE, "Add %p to site lru. hash: %p, bkt: %p, 
> lru_len: %ld\n",
> -o, site->ls_obj_hash, bkt, bkt->lsb_lru_len);
> + CDEBUG(D_INODE, "Add %p to site lru. hash: %p, bkt: %p\n",
> +o, site->ls_obj_hash, bkt);
>   cfs_hash_bd_unlock(site->ls_obj_hash, , 1);
>   return;
>   }
> @@ -238,7 +233,6 @@ void lu_object_unhash(const struct lu_env *env, struct 
> lu_object *o)
> 
>   list_del_init(>loh_lru);
>   bkt = cfs_hash_bd_extra_get(obj_hash, );
> - bkt->lsb_lru_len--;
>   percpu_counter_dec(>ls_lru_len_counter);
>   }
>   cfs_hash_bd_del_locked(obj_hash, , >loh_hash);
> @@ -422,7 +416,6 @@ int lu_site_purge_objects(const struct lu_env *env, 
> struct lu_site *s,
>   cfs_hash_bd_del_locked(s->ls_obj_hash,
>  , >loh_hash);
>   list_move(>loh_lru, );
> - bkt->lsb_lru_len--;
>   percpu_counter_dec(>ls_lru_len_counter);
>   if (did_sth == 0)
>   did_sth = 1;
> @@ -621,7 +614,6 @@ static struct lu_object *htable_lookup(struct lu_site *s,
>   lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
>   if (!list_empty(>loh_lru)) {
>   list_del_init(>loh_lru);
> - bkt->lsb_lru_len--;
>   percpu_counter_dec(>ls_lru_len_counter);
>   }
>   return lu_object_top(h);
> @@ -1834,19 +1826,21 @@ struct lu_site_stats {
>   unsigned intlss_busy;
> };
> 
> -static void lu_site_stats_get(struct cfs_hash *hs,
> +static void lu_site_stats_get(const struct lu_site *s,
> struct lu_site_stats *stats, int populated)
> {
> + struct cfs_hash *hs = s->ls_obj_hash;
>   struct cfs_hash_bd bd;
>   unsigned int i;
> + /* percpu_counter_read_positive() won't accept a const pointer */
> + struct lu_site *s2 = (struct lu_site *)s;

It would seem worthwhile to change the percpu_counter_read_positive() and
percpu_counter_read() arguments to be "const struct percpu_counter *fbc",
rather than doing this cast here.  I can't see any reason that would be bad,
since both implementations just access fbc->count, and do not modify anything.
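
Concretely, the suggestion amounts to constifying the two helpers, roughly as
below (bodies abridged from include/linux/percpu_counter.h; details vary by
kernel version and SMP configuration):

static inline s64 percpu_counter_read(const struct percpu_counter *fbc)
{
        return fbc->count;
}

static inline s64 percpu_counter_read_positive(const struct percpu_counter *fbc)
{
        s64 ret = fbc->count;

        barrier();      /* prevent reloads of fbc->count */
        return ret >= 0 ? ret : 0;
}

With that change the 'struct lu_site *s2' cast above would no longer be needed.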

> + stats->lss_busy += cfs_hash_size_get(hs) -
> + percpu_counter_read_positive(>ls_lru_len_counter);
>   cfs_hash_for_each_bucket(hs, , i) {
> - struct lu_site_bkt_data *bkt = cfs_hash_bd_extra_get(hs, );
>   struct hlist_head   *hhead;
> 
>   cfs_hash_bd_lock(hs, , 1);
> - stats->lss_busy  +=
> - cfs_hash_bd_count_get() - bkt->lsb_lru_len;
>   stats->lss_total += cfs_hash_bd_count_get();
>   stats->lss_max_search = max((int)stats->lss_max_search,
>   cfs_hash_bd_depmax_get());
> @@ -2039,7 +2033,7 @@ int lu_site_stats_print(const struct lu_site *s, struct 
> seq_file *m)
>   struct lu_site_stats stats;
> 
>   memset(, 0, 

Re: [lustre-devel] [PATCH 02/10] staging: lustre: make struct lu_site_bkt_data private

2018-04-30 Thread Dilger, Andreas
On Apr 30, 2018, at 21:52, NeilBrown  wrote:
> 
> This data structure only needs to be public so that
> various modules can access a wait queue to wait for object
> destruction.
> If we provide a function to get the wait queue, rather than the
> whole bucket, the structure can be made private.
> 
> Signed-off-by: NeilBrown 

Nice cleanup.

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/include/lu_object.h  |   36 +-
> drivers/staging/lustre/lustre/llite/lcommon_cl.c   |8 ++-
> drivers/staging/lustre/lustre/lov/lov_object.c |8 ++-
> drivers/staging/lustre/lustre/obdclass/lu_object.c |   50 +---
> 4 files changed, 54 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/lu_object.h 
> b/drivers/staging/lustre/lustre/include/lu_object.h
> index c3b0ed518819..f29bbca5af65 100644
> --- a/drivers/staging/lustre/lustre/include/lu_object.h
> +++ b/drivers/staging/lustre/lustre/include/lu_object.h
> @@ -549,31 +549,7 @@ struct lu_object_header {
> };
> 
> struct fld;
> -
> -struct lu_site_bkt_data {
> - /**
> -  * number of object in this bucket on the lsb_lru list.
> -  */
> - longlsb_lru_len;
> - /**
> -  * LRU list, updated on each access to object. Protected by
> -  * bucket lock of lu_site::ls_obj_hash.
> -  *
> -  * "Cold" end of LRU is lu_site::ls_lru.next. Accessed object are
> -  * moved to the lu_site::ls_lru.prev (this is due to the non-existence
> -  * of list_for_each_entry_safe_reverse()).
> -  */
> - struct list_headlsb_lru;
> - /**
> -  * Wait-queue signaled when an object in this site is ultimately
> -  * destroyed (lu_object_free()). It is used by lu_object_find() to
> -  * wait before re-trying when object in the process of destruction is
> -  * found in the hash table.
> -  *
> -  * \see htable_lookup().
> -  */
> - wait_queue_head_t  lsb_marche_funebre;
> -};
> +struct lu_site_bkt_data;
> 
> enum {
>   LU_SS_CREATED= 0,
> @@ -642,14 +618,8 @@ struct lu_site {
>   struct percpu_counterls_lru_len_counter;
> };
> 
> -static inline struct lu_site_bkt_data *
> -lu_site_bkt_from_fid(struct lu_site *site, struct lu_fid *fid)
> -{
> - struct cfs_hash_bd bd;
> -
> - cfs_hash_bd_get(site->ls_obj_hash, fid, );
> - return cfs_hash_bd_extra_get(site->ls_obj_hash, );
> -}
> +wait_queue_head_t *
> +lu_site_wq_from_fid(struct lu_site *site, struct lu_fid *fid);
> 
> static inline struct seq_server_site *lu_site2seq(const struct lu_site *s)
> {
> diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c 
> b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
> index df5c0c0ae703..d5b42fb1d601 100644
> --- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
> +++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
> @@ -211,12 +211,12 @@ static void cl_object_put_last(struct lu_env *env, 
> struct cl_object *obj)
> 
>   if (unlikely(atomic_read(>loh_ref) != 1)) {
>   struct lu_site *site = obj->co_lu.lo_dev->ld_site;
> - struct lu_site_bkt_data *bkt;
> + wait_queue_head_t *wq;
> 
> - bkt = lu_site_bkt_from_fid(site, >loh_fid);
> + wq = lu_site_wq_from_fid(site, >loh_fid);
> 
>   init_waitqueue_entry(, current);
> - add_wait_queue(>lsb_marche_funebre, );
> + add_wait_queue(wq, );
> 
>   while (1) {
>   set_current_state(TASK_UNINTERRUPTIBLE);
> @@ -226,7 +226,7 @@ static void cl_object_put_last(struct lu_env *env, struct 
> cl_object *obj)
>   }
> 
>   set_current_state(TASK_RUNNING);
> - remove_wait_queue(>lsb_marche_funebre, );
> + remove_wait_queue(wq, );
>   }
> 
>   cl_object_put(env, obj);
> diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c 
> b/drivers/staging/lustre/lustre/lov/lov_object.c
> index f7c69680cb7d..adc90f310fd7 100644
> --- a/drivers/staging/lustre/lustre/lov/lov_object.c
> +++ b/drivers/staging/lustre/lustre/lov/lov_object.c
> @@ -370,7 +370,7 @@ static void lov_subobject_kill(const struct lu_env *env, 
> struct lov_object *lov,
>   struct cl_object*sub;
>   struct lov_layout_raid0 *r0;
>   struct lu_site*site;
> - struct lu_site_bkt_data *bkt;
> + wait_queue_head_t *wq;
>   wait_queue_entry_t*waiter;
> 
>   r0  = >u.raid0;
> @@ -378,7 +378,7 @@ static void lov_subobject_kill(const struct lu_env *env, 
> struct lov_object *lov,
> 
>   sub  = lovsub2cl(los);
>   site = sub->co_lu.lo_dev->ld_site;
> - bkt  = lu_site_bkt_from_fid(site, >co_lu.lo_header->loh_fid);
> + wq   = lu_site_wq_from_fid(site, >co_lu.lo_header->loh_fid);
> 
>   cl_object_kill(env, sub);
>   /* release a reference to the sub-object and ... */
> @@ -391,7 +391,7 @@ static void lov_subobject_kill(const 

Re: [PATCH v2] staging: lustre: llite: fix potential missing-check bug when copying lumv

2018-04-30 Thread Dilger, Andreas
On Apr 30, 2018, at 16:56, Wenwen Wang  wrote:
> 
> In ll_dir_ioctl(), the object lumv3 is firstly copied from the user space
> using its address, i.e., lumv1 = &lumv3. If the lmm_magic field of lumv3 is
> LOV_USER_MAGIC_V3, lumv3 will be modified by the second copy from the user
> space. The second copy is necessary, because the two versions (i.e.,
> lov_user_md_v1 and lov_user_md_v3) have different data formats and lengths.
> However, given that the user data resides in the user space, a malicious
> user-space process can race to change the data between the two copies. By
> doing so, the attacker can provide a data with an inconsistent version,
> e.g., v1 version + v3 data. This can lead to logical errors in the
> following execution in ll_dir_setstripe(), which performs different actions
> according to the version specified by the field lmm_magic.
> 
> This patch rechecks the version field lmm_magic in the second copy.  If the
> version is not as expected, i.e., LOV_USER_MAGIC_V3, an error code will be
> returned: -EINVAL.
> 
> Signed-off-by: Wenwen Wang 

Thanks for the updated patch.

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/llite/dir.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/dir.c 
> b/drivers/staging/lustre/lustre/llite/dir.c
> index d10d272..80d44ca 100644
> --- a/drivers/staging/lustre/lustre/llite/dir.c
> +++ b/drivers/staging/lustre/lustre/llite/dir.c
> @@ -1185,6 +1185,8 @@ static long ll_dir_ioctl(struct file *file, unsigned 
> int cmd, unsigned long arg)
>   if (lumv1->lmm_magic == LOV_USER_MAGIC_V3) {
>   if (copy_from_user(&lumv3, lumv3p, sizeof(lumv3)))
>   return -EFAULT;
> + if (lumv3.lmm_magic != LOV_USER_MAGIC_V3)
> + return -EINVAL;
>   }
> 
>   if (is_root_inode(inode))
> -- 
> 2.7.4
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 01/10] staging: lustre: ldlm: store name directly in namespace.

2018-04-30 Thread Dilger, Andreas
On Apr 30, 2018, at 21:52, NeilBrown  wrote:
> 
> Rather than storing the name of a namespace in the
> hash table, store it directly in the namespace.
> This will allow the hashtable to be changed to use
> rhashtable.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/include/lustre_dlm.h |5 -
> drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |5 +
> 2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h 
> b/drivers/staging/lustre/lustre/include/lustre_dlm.h
> index d668d86423a4..b3532adac31c 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
> @@ -362,6 +362,9 @@ struct ldlm_namespace {
>   /** Flag indicating if namespace is on client instead of server */
>   enum ldlm_side  ns_client;
> 
> + /** name of this namespace */
> + char*ns_name;
> +
>   /** Resource hash table for namespace. */
>   struct cfs_hash *ns_rs_hash;
> 
> @@ -878,7 +881,7 @@ static inline bool ldlm_has_layout(struct ldlm_lock *lock)
> static inline char *
> ldlm_ns_name(struct ldlm_namespace *ns)
> {
> - return ns->ns_rs_hash->hs_name;
> + return ns->ns_name;
> }
> 
> static inline struct ldlm_namespace *
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c 
> b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
> index 6c615b6e9bdc..43bbc5fd94cc 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
> @@ -688,6 +688,9 @@ struct ldlm_namespace *ldlm_namespace_new(struct 
> obd_device *obd, char *name,
>   ns->ns_obd  = obd;
>   ns->ns_appetite = apt;
>   ns->ns_client   = client;
> + ns->ns_name = kstrdup(name, GFP_KERNEL);
> + if (!ns->ns_name)
> + goto out_hash;
> 
>   INIT_LIST_HEAD(>ns_list_chain);
>   INIT_LIST_HEAD(>ns_unused_list);
> @@ -730,6 +733,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct 
> obd_device *obd, char *name,
>   ldlm_namespace_sysfs_unregister(ns);
>   ldlm_namespace_cleanup(ns, 0);
> out_hash:
> + kfree(ns->ns_name);
>   cfs_hash_putref(ns->ns_rs_hash);
> out_ns:
>   kfree(ns);
> @@ -993,6 +997,7 @@ void ldlm_namespace_free_post(struct ldlm_namespace *ns)
>   ldlm_namespace_debugfs_unregister(ns);
>   ldlm_namespace_sysfs_unregister(ns);
>   cfs_hash_putref(ns->ns_rs_hash);
> + kfree(ns->ns_name);
>   /* Namespace \a ns should be not on list at this time, otherwise
>* this will cause issues related to using freed \a ns in poold
>* thread.
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH] staging: luster: llite: fix a potential missing-check bug when copying lumv

2018-04-30 Thread Dilger, Andreas
On Apr 29, 2018, at 07:20, Greg Kroah-Hartman <gre...@linuxfoundation.org> 
wrote:
> 
> On Sat, Apr 28, 2018 at 04:04:25PM +0000, Dilger, Andreas wrote:
>> On Apr 27, 2018, at 17:45, Wenwen Wang <wang6...@umn.edu> wrote:
>>> [PATCH] staging: luster: llite: fix potential missing-check bug when 
>>> copying lumv
>> 
>> (typo) s/luster/lustre/
>> 
>>> In ll_dir_ioctl(), the object lumv3 is firstly copied from the user space
>>> using its address, i.e., lumv1 = &lumv3. If the lmm_magic field of lumv3 is
>>> LOV_USER_MAGIV_V3, lumv3 will be modified by the second copy from the user
>> 
>> (typo) s/MAGIV/MAGIC/
>> 
>>> space. The second copy is necessary, because the two versions (i.e.,
>>> lov_user_md_v1 and lov_user_md_v3) have different data formats and lengths.
>>> However, given that the user data resides in the user space, a malicious
>>> user-space process can race to change the data between the two copies. By
>>> doing so, the attacker can provide a data with an inconsistent version,
>>> e.g., v1 version + v3 data. This can lead to logical errors in the
>>> following execution in ll_dir_setstripe(), which performs different actions
>>> according to the version specified by the field lmm_magic.
>> 
>> This isn't a serious bug in the end.  The LOV_USER_MAGIC_V3 check just copies
>> a bit more data from userspace (the lmm_pool field).  It would be more of a
>> problem if the reverse was possible (copy smaller V1 buffer, but change the
>> magic to LOV_USER_MAGIC_V3 afterward), but this isn't possible since the 
>> second
>> copy is not done if there is a V1 magic.  If the user changes from V3 magic
>> to V1 in a racy manner it means less data will be used than copied, which
>> is harmless.
>> 
>>> This patch rechecks the version field lmm_magic in the second copy.  If the
>>> version is not as expected, i.e., LOV_USER_MAGIC_V3, an error code will be
>>> returned: -EINVAL.
>> 
>> This isn't a bad idea in any case, since it verifies the data copied from
>> userspace is still valid.
> 
> So you agree with this patch?  Or do not?
> 
> confused,

I don't think it fixes a real bug, but it makes the code a bit more clear,
so I'm OK to land it (with minor corrections to commit message per above).

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH] staging: luster: llite: fix a potential missing-check bug when copying lumv

2018-04-28 Thread Dilger, Andreas
On Apr 27, 2018, at 17:45, Wenwen Wang  wrote:
> [PATCH] staging: luster: llite: fix potential missing-check bug when copying 
> lumv

(typo) s/luster/lustre/

> In ll_dir_ioctl(), the object lumv3 is firstly copied from the user space
> using Its address, i.e., lumv1 = &lumv3. If the lmm_magic field of lumv3 is
> LOV_USER_MAGIV_V3, lumv3 will be modified by the second copy from the user

(typo) s/MAGIV/MAGIC/

> space. The second copy is necessary, because the two versions (i.e.,
> lov_user_md_v1 and lov_user_md_v3) have different data formats and lengths.
> However, given that the user data resides in the user space, a malicious
> user-space process can race to change the data between the two copies. By
> doing so, the attacker can provide a data with an inconsistent version,
> e.g., v1 version + v3 data. This can lead to logical errors in the
> following execution in ll_dir_setstripe(), which performs different actions
> according to the version specified by the field lmm_magic.

This isn't a serious bug in the end.  The LOV_USER_MAGIC_V3 check just copies
a bit more data from userspace (the lmm_pool field).  It would be more of a
problem if the reverse was possible (copy smaller V1 buffer, but change the
magic to LOV_USER_MAGIC_V3 afterward), but this isn't possible since the second
copy is not done if there is a V1 magic.  If the user changes from V3 magic
to V1 in a racy manner it means less data will be used than copied, which
is harmless.

> This patch rechecks the version field lmm_magic in the second copy.  If the
> version is not as expected, i.e., LOV_USER_MAGIC_V3, an error code will be
> returned: -EINVAL.

This isn't a bad idea in any case, since it verifies the data copied from
userspace is still valid.

Cheers, Andreas

> Signed-off-by: Wenwen Wang 
> ---
> drivers/staging/lustre/lustre/llite/dir.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/dir.c 
> b/drivers/staging/lustre/lustre/llite/dir.c
> index d10d272..80d44ca 100644
> --- a/drivers/staging/lustre/lustre/llite/dir.c
> +++ b/drivers/staging/lustre/lustre/llite/dir.c
> @@ -1185,6 +1185,8 @@ static long ll_dir_ioctl(struct file *file, unsigned 
> int cmd, unsigned long arg)
>   if (lumv1->lmm_magic == LOV_USER_MAGIC_V3) {
>   if (copy_from_user(&lumv3, lumv3p, sizeof(lumv3)))
>   return -EFAULT;
> + if (lumv3.lmm_magic != LOV_USER_MAGIC_V3)
> + return -EINVAL;
>   }
> 
>   if (is_root_inode(inode))
> -- 
> 2.7.4
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-16 Thread Dilger, Andreas
On Apr 16, 2018, at 16:48, Doug Oucharek  wrote:
> 
>> 
>> On Apr 16, 2018, at 3:42 PM, James Simmons  wrote:
>> 
>> 
>>> James,
>>> 
>>> If I understand correctly, you're saying you want to be able to build 
>>> without debug support...?  I'm not convinced that building a client without 
>>> debug support is interesting or useful.  In fact, I think it would be 
>>> harmful, and we shouldn't open up the possibility - this is switchable 
>>> debug with very low overhead when not actually "on".  It would be really 
>>> awful to get a problem on a running system and discover there's no debug 
>>> support - that you can't even enable debug without a reinstall.
>>> 
>>> If I've understood you correctly, then I would want to see proof of a 
>>> significant performance cost when debug is built but *off* before agreeing 
>>> to even exposing this option.  (I know it's a choice they'd have to make, 
>>> but if it's not really useful with a side order of potentially harmful, we 
>>> shouldn't even give people the choice.)
>> 
>> I'm not saying add the option today but this is more for the long game.
>> While the Intel lustre developers deeply love lustre's debugging 
>> infrastructure I see a future where something better will come along to
>> replace it. When that day comes we will have a period where both
>> debugging infrastructurs will exist and some deployers of lustre will
>> want to turn off the old debugging infrastructure and just use the new.
>> That is what I have in mind. A switch to flip between options.
> 
> Yes please!!  An option for users which says “no, you do not have the right 
> to panic my system via LASSERT whenever you like” would be a blessing.

Note that LASSERT() itself does not panic the system, unless you configure it
with panic_on_lbug=1.  Otherwise, it just blocks that thread (though this can
also have an impact on other threads if you are holding locks at that time).

That said, the LASSERT() should not be hit unless there is bad code, data
corruption, or the LASSERT() itself is incorrect (essentially bad code also).

So "whenever you like" is "whenever the system is about to corrupt your data",
and people are not very forgiving if a filesystem corrupts their data...
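
To make that concrete, the behaviour being described is conceptually the
following (a simplified sketch, not the actual libcfs code; "panic_on_lbug"
here stands for the tunable mentioned above):

        static int panic_on_lbug;       /* 0 by default */

        void lbug_sketch(const char *expr, const char *file, int line)
        {
                pr_emerg("ASSERTION( %s ) failed: %s:%d\n", expr, file, line);
                if (panic_on_lbug)
                        panic("LBUG");  /* take the whole node down */
                /* default: park only the thread that tripped the assertion */
                for (;;) {
                        set_current_state(TASK_UNINTERRUPTIBLE);
                        schedule();
                }
        }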

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH 11/17] staging: lustre: libcfs: discard cfs_time_shift().

2018-04-04 Thread Dilger, Andreas
On Apr 2, 2018, at 16:26, NeilBrown <ne...@suse.com> wrote:
> On Mon, Apr 02 2018, Dilger, Andreas wrote:
>> On Mar 30, 2018, at 13:02, James Simmons <jsimm...@infradead.org> wrote:
>>>> This function simply multiplies by HZ and adds jiffies.
>>>> This is simple enough to be opencoded, and doing so
>>>> makes the code easier to read.
>>>> 
>>>> Same for cfs_time_shift_64()
>>> 
>>> Reviewed-by: James Simmons <jsimm...@infradead.org>
>> 
>> Hmm, I thought we were trying to get rid of direct HZ usage in modules,
>> because of tickless systems, and move to e.g. msecs_to_jiffies() or similar?
> 
> Are we?  I hadn't heard but I could easily have missed it.
> Documentation/scheduler/completion.txt does say
> 
>Timeouts are preferably calculated with
>msecs_to_jiffies() or usecs_to_jiffies().
> 
> but is isn't clear what they are preferred to.  Do you remember where
> you heard? or have a reference?

I thought the goal was to avoid hard-coding the HZ value so that kernels
could have variable clock rates in the future.

Cheers, Andreas

> $ git grep ' \* *HZ'  |wc
>   2244   15679  170016
> $ git grep msecs_to_jiffies | wc
>   3301   13151  276725
> 
> so msecs_to_jiffies is slightly more popular than "* HZ" (even if you add
> in "HZ *").  But that could just be a preference for using milliseconds
> over using seconds.
> 
> $ git grep msecs_to_jiffies   | grep -c '[0-9]000'
> 587
> 
> so there are only 587 places that msecs_to_jiffies is clearly used in
> place of multiplying by HZ.
> 
> If we were to pursue this, I would want to add secs_to_jiffies() to
> include/linux/jiffies.h and use that.
> 
> Thanks,
> NeilBrown
> ___
> lustre-devel mailing list
> lustre-de...@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
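
(For the record, the secs_to_jiffies() helper suggested above would presumably
be a trivial wrapper along these lines; a sketch only, it is not in
include/linux/jiffies.h at the time of writing:)

        /* hypothetical helper, mirroring msecs_to_jiffies()/usecs_to_jiffies() */
        static inline unsigned long secs_to_jiffies(unsigned int secs)
        {
                return msecs_to_jiffies(secs * MSEC_PER_SEC);
        }

That would let converted call sites read e.g.
"jiffies + secs_to_jiffies(IBLND_POOL_DEADLINE)" instead of multiplying by HZ
directly.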

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 13/17] staging: lustre: libcfs: remove cfs_timeout_cap()

2018-04-02 Thread Dilger, Andreas
On Mar 28, 2018, at 22:26, NeilBrown  wrote:
> 
> This wrapper is only used once, so open-code it as max().
> 
> This allows us to remove the libcfs_time.h include file.
> 
> Signed-off-by: NeilBrown 
> ---
> .../staging/lustre/include/linux/libcfs/libcfs.h   |1 
> .../lustre/include/linux/libcfs/libcfs_time.h  |   50 
> .../lustre/include/linux/libcfs/linux/linux-time.h |2 -
> drivers/staging/lustre/lustre/ptlrpc/import.c  |4 +-
> 4 files changed, 3 insertions(+), 54 deletions(-)
> delete mode 100644 drivers/staging/lustre/include/linux/libcfs/libcfs_time.h
> 
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h 
> b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> index 3b751c436b3d..3d3fa52858e5 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> @@ -43,7 +43,6 @@
> #include 
> #include 
> #include 
> -#include 
> #include 
> #include 
> #include 
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_time.h 
> b/drivers/staging/lustre/include/linux/libcfs/libcfs_time.h
> deleted file mode 100644
> index 172a8872e3f3..
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_time.h
> +++ /dev/null
> @@ -1,50 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * GPL HEADER START
> - *
> - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 only,
> - * as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> - * General Public License version 2 for more details (a copy is included
> - * in the LICENSE file that accompanied this code).
> - *
> - * You should have received a copy of the GNU General Public License
> - * version 2 along with this program; If not, see
> - * http://www.gnu.org/licenses/gpl-2.0.html
> - *
> - * GPL HEADER END
> - */
> -/*
> - * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights 
> reserved.
> - * Use is subject to license terms.
> - */
> -/*
> - * This file is part of Lustre, http://www.lustre.org/
> - * Lustre is a trademark of Sun Microsystems, Inc.
> - *
> - * libcfs/include/libcfs/libcfs_time.h
> - *
> - * Time functions.
> - *
> - */
> -
> -#ifndef __LIBCFS_TIME_H__
> -#define __LIBCFS_TIME_H__
> -/*
> - * return valid time-out based on user supplied one. Currently we only check
> - * that time-out is not shorted than allowed.
> - */
> -static inline long cfs_timeout_cap(long timeout)
> -{
> - if (timeout < CFS_TICK)
> - timeout = CFS_TICK;
> - return timeout;
> -}
> -
> -#endif
> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h 
> b/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h
> index ff3aae2f1231..ecb2126a9e6f 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h
> @@ -78,7 +78,7 @@ static inline int cfs_time_beforeq_64(u64 t1, u64 t2)
> /*
>  * One jiffy
>  */
> -#define CFS_TICK (1)
> +#define CFS_TICK (1UL)

It seems like CFS_TICK is mostly useless as well and could just be dropped?
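
i.e. the one remaining use in ptlrpc_disconnect_import() could presumably just
become (sketch, given that CFS_TICK is simply one jiffy):

        if (wait_event_idle_timeout(imp->imp_recovery_waitq,
                                    !ptlrpc_import_in_recovery(imp),
                                    max(timeout, 1UL)) == 0)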

> #define CFS_DURATION_T  "%ld"
> 
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c 
> b/drivers/staging/lustre/lustre/ptlrpc/import.c
> index 4a9d1f189d01..dd4fd54128dd 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/import.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
> @@ -1486,7 +1486,7 @@ int ptlrpc_disconnect_import(struct obd_import *imp, 
> int noclose)
>   }
> 
>   if (ptlrpc_import_in_recovery(imp)) {
> - long timeout;
> + unsigned long timeout;
> 
>   if (AT_OFF) {
>   if (imp->imp_server_timeout)
> @@ -1501,7 +1501,7 @@ int ptlrpc_disconnect_import(struct obd_import *imp, 
> int noclose)
> 
>   if (wait_event_idle_timeout(imp->imp_recovery_waitq,
>   !ptlrpc_import_in_recovery(imp),
> - cfs_timeout_cap(timeout)) == 0)
> + max(timeout, CFS_TICK)) == 0)
>   l_wait_event_abortable(
>   imp->imp_recovery_waitq,
>   !ptlrpc_import_in_recovery(imp));
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 11/17] staging: lustre: libcfs: discard cfs_time_shift().

2018-04-02 Thread Dilger, Andreas

> On Mar 30, 2018, at 13:02, James Simmons  wrote:
> 
> 
>> This function simply multiplies by HZ and adds jiffies.
>> This is simple enough to be opencoded, and doing so
>> makes the code easier to read.
>> 
>> Same for cfs_time_shift_64()
> 
> Reviewed-by: James Simmons 

Hmm, I thought we were trying to get rid of direct HZ usage in modules,
because of tickless systems, and move to e.g. msecs_to_jiffies() or similar?
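
For example, for the fpo_deadline assignment in the hunks below (only an
illustration of the alternative, not a request to respin):

        /* open-coded HZ, as in the patch: */
        fpo->fpo_deadline = jiffies + IBLND_POOL_DEADLINE * HZ;

        /* alternative that keeps HZ out of the module source: */
        fpo->fpo_deadline = jiffies +
                            msecs_to_jiffies(IBLND_POOL_DEADLINE * MSEC_PER_SEC);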

Cheers, Andreas

> 
>> Signed-off-by: NeilBrown 
>> ---
>> .../lustre/include/linux/libcfs/libcfs_time.h  |5 
>> .../lustre/include/linux/libcfs/linux/linux-time.h |5 
>> .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c|   12 +
>> .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c |4 ++-
>> .../staging/lustre/lnet/klnds/socklnd/socklnd.c|4 ++-
>> .../staging/lustre/lnet/klnds/socklnd/socklnd_cb.c |   12 +
>> drivers/staging/lustre/lnet/lnet/net_fault.c   |   26 
>> ++--
>> drivers/staging/lustre/lnet/lnet/router.c  |2 +-
>> drivers/staging/lustre/lustre/ldlm/ldlm_request.c  |2 +-
>> drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |2 +-
>> drivers/staging/lustre/lustre/llite/llite_lib.c|4 ++-
>> drivers/staging/lustre/lustre/llite/lproc_llite.c  |   12 +
>> drivers/staging/lustre/lustre/llite/statahead.c|2 +-
>> drivers/staging/lustre/lustre/lmv/lmv_obd.c|2 +-
>> drivers/staging/lustre/lustre/lov/lov_obd.c|2 +-
>> drivers/staging/lustre/lustre/mdc/mdc_request.c|2 +-
>> .../lustre/lustre/obdclass/lprocfs_status.c|   12 +
>> .../staging/lustre/lustre/obdclass/obd_config.c|2 +-
>> drivers/staging/lustre/lustre/osc/osc_request.c|2 +-
>> drivers/staging/lustre/lustre/ptlrpc/pinger.c  |6 ++---
>> drivers/staging/lustre/lustre/ptlrpc/service.c |2 +-
>> 21 files changed, 56 insertions(+), 66 deletions(-)
>> 
>> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_time.h 
>> b/drivers/staging/lustre/include/linux/libcfs/libcfs_time.h
>> index 7b41a129f041..0ebbde4ec8e8 100644
>> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_time.h
>> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_time.h
>> @@ -50,11 +50,6 @@ static inline int cfs_time_aftereq(unsigned long t1, 
>> unsigned long t2)
>>  return time_before_eq(t2, t1);
>> }
>> 
>> -static inline unsigned long cfs_time_shift(int seconds)
>> -{
>> -return jiffies + seconds * HZ;
>> -}
>> -
>> /*
>>  * return valid time-out based on user supplied one. Currently we only check
>>  * that time-out is not shorted than allowed.
>> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h 
>> b/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h
>> index b3a80531bd71..ff3aae2f1231 100644
>> --- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h
>> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-time.h
>> @@ -65,11 +65,6 @@ static inline long cfs_duration_sec(long d)
>>  return d / msecs_to_jiffies(MSEC_PER_SEC);
>> }
>> 
>> -static inline u64 cfs_time_shift_64(int seconds)
>> -{
>> -return get_jiffies_64() + (u64)seconds * HZ;
>> -}
>> -
>> static inline int cfs_time_before_64(u64 t1, u64 t2)
>> {
>>  return (__s64)t2 - (__s64)t1 > 0;
>> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c 
>> b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
>> index 7df07f39b849..276bf486f64b 100644
>> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
>> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
>> @@ -1446,7 +1446,7 @@ static int kiblnd_create_fmr_pool(struct 
>> kib_fmr_poolset *fps,
>>  if (rc)
>>  goto out_fpo;
>> 
>> -fpo->fpo_deadline = cfs_time_shift(IBLND_POOL_DEADLINE);
>> +fpo->fpo_deadline = jiffies + IBLND_POOL_DEADLINE * HZ;
>>  fpo->fpo_owner = fps;
>>  *pp_fpo = fpo;
>> 
>> @@ -1619,7 +1619,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, 
>> struct kib_tx *tx,
>>  spin_lock(>fps_lock);
>>  version = fps->fps_version;
>>  list_for_each_entry(fpo, >fps_pool_list, fpo_list) {
>> -fpo->fpo_deadline = cfs_time_shift(IBLND_POOL_DEADLINE);
>> +fpo->fpo_deadline = jiffies + IBLND_POOL_DEADLINE * HZ;
>>  fpo->fpo_map_count++;
>> 
>>  if (fpo->fpo_is_fmr) {
>> @@ -1743,7 +1743,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, 
>> struct kib_tx *tx,
>>  fps->fps_version++;
>>  list_add_tail(>fpo_list, >fps_pool_list);
>>  } else {
>> -fps->fps_next_retry = cfs_time_shift(IBLND_POOL_RETRY);
>> +fps->fps_next_retry = jiffies + IBLND_POOL_RETRY * HZ;
>>  }
>>  spin_unlock(>fps_lock);
>> 
>> @@ -1764,7 +1764,7 @@ static void kiblnd_init_pool(struct kib_poolset *ps, 
>> struct kib_pool *pool, int
>> 
>>  memset(pool, 0, sizeof(*pool));
>>  

Re: [lustre-devel] [PATCH v2] staging: lustre: Remove VLA usage

2018-03-09 Thread Dilger, Andreas
On Mar 7, 2018, at 13:54, Kees Cook  wrote:
> 
> The kernel would like to have all stack VLA usage removed[1]. This switches
> to a simple kasprintf() instead, and in the process fixes an off-by-one
> between the allocation and the sprintf (allocation did not include NULL
> byte in calculation).
> 
> [1] https://lkml.org/lkml/2018/3/7/621
> 
> Signed-off-by: Kees Cook 
> Reviewed-by: Rasmus Villemoes 

This seems better than the VLA_SAFE() macro, at the cost of an extra kmalloc.
I don't think these code paths are super performance critical.
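
(For the record, the off-by-one the commit message refers to: the old on-stack
buffer was sized strlen(prefix) + strlen(name) + 1, while
sprintf(fullname, "%s%s\n", ...) writes strlen(prefix) + strlen(name) + 2 bytes
once the '\n' and the terminating NUL are both counted, e.g. prefix "user." (5)
plus name "foo" (3) needs 10 bytes but only got 9.  kasprintf() sizes its own
allocation, so that mismatch disappears.)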

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/llite/xattr.c | 19 +--
> 1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
> b/drivers/staging/lustre/lustre/llite/xattr.c
> index 532384c91447..ff6fe81a4ddb 100644
> --- a/drivers/staging/lustre/lustre/llite/xattr.c
> +++ b/drivers/staging/lustre/lustre/llite/xattr.c
> @@ -87,10 +87,10 @@ ll_xattr_set_common(const struct xattr_handler *handler,
>   const char *name, const void *value, size_t size,
>   int flags)
> {
> - char fullname[strlen(handler->prefix) + strlen(name) + 1];
>   struct ll_sb_info *sbi = ll_i2sbi(inode);
>   struct ptlrpc_request *req = NULL;
>   const char *pv = value;
> + char *fullname;
>   __u64 valid;
>   int rc;
> 
> @@ -141,10 +141,13 @@ ll_xattr_set_common(const struct xattr_handler *handler,
>   return -EPERM;
>   }
> 
> - sprintf(fullname, "%s%s\n", handler->prefix, name);
> + fullname = kasprintf(GFP_KERNEL, "%s%s\n", handler->prefix, name);
> + if (!fullname)
> + return -ENOMEM;
>   rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
>valid, fullname, pv, size, 0, flags,
>ll_i2suppgid(inode), );
> + kfree(fullname);
>   if (rc) {
>   if (rc == -EOPNOTSUPP && handler->flags == XATTR_USER_T) {
>   LCONSOLE_INFO("Disabling user_xattr feature because it 
> is not supported on the server\n");
> @@ -364,11 +367,11 @@ static int ll_xattr_get_common(const struct 
> xattr_handler *handler,
>  struct dentry *dentry, struct inode *inode,
>  const char *name, void *buffer, size_t size)
> {
> - char fullname[strlen(handler->prefix) + strlen(name) + 1];
>   struct ll_sb_info *sbi = ll_i2sbi(inode);
> #ifdef CONFIG_FS_POSIX_ACL
>   struct ll_inode_info *lli = ll_i2info(inode);
> #endif
> + char *fullname;
>   int rc;
> 
>   CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p)\n",
> @@ -411,9 +414,13 @@ static int ll_xattr_get_common(const struct 
> xattr_handler *handler,
>   if (handler->flags == XATTR_ACL_DEFAULT_T && !S_ISDIR(inode->i_mode))
>   return -ENODATA;
> #endif
> - sprintf(fullname, "%s%s\n", handler->prefix, name);
> - return ll_xattr_list(inode, fullname, handler->flags, buffer, size,
> -  OBD_MD_FLXATTR);
> + fullname = kasprintf(GFP_KERNEL, "%s%s\n", handler->prefix, name);
> + if (!fullname)
> + return -ENOMEM;
> + rc = ll_xattr_list(inode, fullname, handler->flags, buffer, size,
> +OBD_MD_FLXATTR);
> + kfree(fullname);
> + return rc;
> }
> 
> static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t 
> buf_size)
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security
> ___
> lustre-devel mailing list
> lustre-de...@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 17/17] Revert "staging: Disable lustre file system for MIPS, SH, and XTENSA"

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> This reverts commit 16f1eeb660bd2bfd223704ee6350706b39c55a7a.
> 
> The reason for this patch was that lustre used copy_from_user_page.
> Commit 76133e66b141 ("staging/lustre: Replace jobid acquiring with per
> node setting") removed that usage.
> So the arch restrictions can go.

I don't think these platforms will be used with Lustre any time soon,
but who knows... :-)

In any case, thanks for this patch series, definitely a lot of good
cleanups in there.

Reviewed-by: Andreas Dilger 

> Signed-off-by: NeilBrown 
> ---
> drivers/staging/lustre/lustre/Kconfig |1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/Kconfig 
> b/drivers/staging/lustre/lustre/Kconfig
> index c669c8fa0cc6..ccb78a945995 100644
> --- a/drivers/staging/lustre/lustre/Kconfig
> +++ b/drivers/staging/lustre/lustre/Kconfig
> @@ -1,6 +1,5 @@
> config LUSTRE_FS
>   tristate "Lustre file system client support"
> - depends on !MIPS && !XTENSA && !SUPERH
>   depends on LNET
>   select CRYPTO
>   select CRYPTO_CRC32
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 16/17] staging: lustre: allow monolithic builds

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> Remove restriction the lustre must be built
> as modules.  It now works as a monolithic build.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lnet/Kconfig   |2 +-
> drivers/staging/lustre/lustre/Kconfig |2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lnet/Kconfig 
> b/drivers/staging/lustre/lnet/Kconfig
> index 6bcb53d0c6f4..ad049e6f24e4 100644
> --- a/drivers/staging/lustre/lnet/Kconfig
> +++ b/drivers/staging/lustre/lnet/Kconfig
> @@ -1,6 +1,6 @@
> config LNET
>   tristate "Lustre networking subsystem (LNet)"
> - depends on INET && m
> + depends on INET
>   help
> The Lustre network layer, also known as LNet, is a networking 
> abstaction
> level API that was initially created to allow Lustre Filesystem to 
> utilize
> diff --git a/drivers/staging/lustre/lustre/Kconfig 
> b/drivers/staging/lustre/lustre/Kconfig
> index 90d826946c6a..c669c8fa0cc6 100644
> --- a/drivers/staging/lustre/lustre/Kconfig
> +++ b/drivers/staging/lustre/lustre/Kconfig
> @@ -1,6 +1,6 @@
> config LUSTRE_FS
>   tristate "Lustre file system client support"
> - depends on m && !MIPS && !XTENSA && !SUPERH
> + depends on !MIPS && !XTENSA && !SUPERH
>   depends on LNET
>   select CRYPTO
>   select CRYPTO_CRC32
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 15/17] staging: lustre: ptlrpc: move thread creation out of module initialization

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> When the ptlrpc module is loaded, it starts the pinger thread and
> calls LNetNIInit which starts various threads.
> 
> We don't need these threads until the module is actually being
> used, such as when a lustre filesystem is mounted.
> 
> So move the thread creation into new ptlrpc_inc_ref() (modeled on
> ptlrpcd_inc_ref()), and call that when needed, such as at mount time.

It looks like this is still done early enough in the mount sequence, so the
earlier "[06/17] get entropy from nid when nid set" action is still done
before the client UUID is generated in ll_init_sbi().

Reviewed-by: Andreas Dilger 
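
(For readers skimming: the quoted patch below is truncated by the archive, but
the new ptlrpc_inc_ref()/ptlrpc_dec_ref() pair is a refcount-guarded start/stop,
conceptually along the lines of the sketch below; simplified, with
ptlrpc_start_pinger() standing in for whatever used to run unconditionally at
module load:)

        static DEFINE_MUTEX(ptlrpc_startup);
        static int ptlrpc_active;

        int ptlrpc_inc_ref(void)
        {
                int rc = 0;

                mutex_lock(&ptlrpc_startup);
                if (ptlrpc_active++ == 0) {
                        /* first user: only now start the pinger etc. */
                        rc = ptlrpc_start_pinger();
                        if (rc)
                                ptlrpc_active--;
                }
                mutex_unlock(&ptlrpc_startup);
                return rc;
        }

        void ptlrpc_dec_ref(void)
        {
                mutex_lock(&ptlrpc_startup);
                if (--ptlrpc_active == 0)
                        ptlrpc_stop_pinger();   /* last user is gone */
                mutex_unlock(&ptlrpc_startup);
        }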

> Signed-off-by: NeilBrown 
> ---
> drivers/staging/lustre/lustre/include/lustre_net.h |3 +
> drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c|   12 
> drivers/staging/lustre/lustre/llite/llite_lib.c|   18 +-
> .../staging/lustre/lustre/ptlrpc/ptlrpc_module.c   |   56 +---
> 4 files changed, 65 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h 
> b/drivers/staging/lustre/lustre/include/lustre_net.h
> index 108683c54127..d35ae0cda8d2 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_net.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_net.h
> @@ -1804,6 +1804,9 @@ int ptlrpc_register_rqbd(struct 
> ptlrpc_request_buffer_desc *rqbd);
>  */
> void ptlrpc_request_committed(struct ptlrpc_request *req, int force);
> 
> +int ptlrpc_inc_ref(void);
> +void ptlrpc_dec_ref(void);
> +
> void ptlrpc_init_client(int req_portal, int rep_portal, char *name,
>   struct ptlrpc_client *);
> struct ptlrpc_connection *ptlrpc_uuid_to_connection(struct obd_uuid *uuid);
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c 
> b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
> index 58913e628124..c772c68e5a49 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
> @@ -869,6 +869,10 @@ int ldlm_get_ref(void)
> {
>   int rc = 0;
> 
> + rc = ptlrpc_inc_ref();
> + if (rc)
> + return rc;
> +
>   mutex_lock(_ref_mutex);
>   if (++ldlm_refcount == 1) {
>   rc = ldlm_setup();
> @@ -877,14 +881,18 @@ int ldlm_get_ref(void)
>   }
>   mutex_unlock(_ref_mutex);
> 
> + if (rc)
> + ptlrpc_dec_ref();
> +
>   return rc;
> }
> 
> void ldlm_put_ref(void)
> {
> + int rc = 0;
>   mutex_lock(_ref_mutex);
>   if (ldlm_refcount == 1) {
> - int rc = ldlm_cleanup();
> + rc = ldlm_cleanup();
> 
>   if (rc)
>   CERROR("ldlm_cleanup failed: %d\n", rc);
> @@ -894,6 +902,8 @@ void ldlm_put_ref(void)
>   ldlm_refcount--;
>   }
>   mutex_unlock(_ref_mutex);
> + if (!rc)
> + ptlrpc_dec_ref();
> }
> 
> static ssize_t cancel_unused_locks_before_replay_show(struct kobject *kobj,
> diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c 
> b/drivers/staging/lustre/lustre/llite/llite_lib.c
> index 844182ad7dd7..706b14bf8981 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_lib.c
> +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
> @@ -879,9 +879,15 @@ int ll_fill_super(struct super_block *sb)
> 
>   CDEBUG(D_VFSTRACE, "VFS Op: sb %p\n", sb);
> 
> + err = ptlrpc_inc_ref();
> + if (err)
> + return err;
> +
>   cfg = kzalloc(sizeof(*cfg), GFP_NOFS);
> - if (!cfg)
> - return -ENOMEM;
> + if (!cfg) {
> + err = -ENOMEM;
> + goto out_put;
> + }
> 
>   try_module_get(THIS_MODULE);
> 
> @@ -891,7 +897,8 @@ int ll_fill_super(struct super_block *sb)
>   if (!sbi) {
>   module_put(THIS_MODULE);
>   kfree(cfg);
> - return -ENOMEM;
> + err = -ENOMEM;
> + goto out_put;
>   }
> 
>   err = ll_options(lsi->lsi_lmd->lmd_opts, >ll_flags);
> @@ -958,6 +965,9 @@ int ll_fill_super(struct super_block *sb)
>   LCONSOLE_WARN("Mounted %s\n", profilenm);
> 
>   kfree(cfg);
> +out_put:
> + if (err)
> + ptlrpc_dec_ref();
>   return err;
> } /* ll_fill_super */
> 
> @@ -1028,6 +1038,8 @@ void ll_put_super(struct super_block *sb)
>   cl_env_cache_purge(~0);
> 
>   module_put(THIS_MODULE);
> +
> + ptlrpc_dec_ref();
> } /* client_put_super */
> 
> struct inode *ll_inode_from_resource_lock(struct ldlm_lock *lock)
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c 
> b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
> index 131fc6d9646e..38923418669f 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/ptlrpc_module.c
> @@ -45,6 +45,42 @@ extern spinlock_t ptlrpc_last_xid_lock;
> extern spinlock_t ptlrpc_rs_debug_lock;
> #endif
> 
> +DEFINE_MUTEX(ptlrpc_startup);
> +static int ptlrpc_active = 

Re: [PATCH 14/17] staging: lustre: change sai_thread to sai_task.

2018-03-08 Thread Dilger, Andreas

> On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> Rather than allocating a ptlrpc_thread for the
> stat-ahead thread, just use the task_struct provided
> by kthreads directly.
> 
> As nothing ever waits for the sai_task, it must call do_exit()
> directly rather than simply return from the function.
> Also it cannot use kthread_should_stop() to know when to stop.
> 
> There is one caller which can ask it to stop so we need a simple
> signaling mechanism.  I've chosen to set ->sai_task to NULL
> when the thread should finish up.  The thread notices this and
> cleans up and exits.
> lli_sa_lock is used to avoid races between waking up the process
> and the process exiting.
> 
> Signed-off-by: NeilBrown 

CC'd Fan Yong, who is the author of this code.

Reviewed-by: Andreas Dilger 
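
A minimal sketch of the stop/wake handshake the commit message describes: the
only external control is clearing ->sai_task under lli_sa_lock, and the waker
holds that same lock while waking.  The struct and field names below are
stand-ins; only the locking pattern is taken from the patch:

#include <linux/sched.h>
#include <linux/spinlock.h>

struct sa_ctl {
        spinlock_t               sa_lock;       /* stands in for lli_sa_lock */
        struct task_struct      *sa_task;       /* stands in for sai_task */
};

/* Ask the stat-ahead style thread to finish up and exit. */
static void sa_stop(struct sa_ctl *ctl)
{
        spin_lock(&ctl->sa_lock);
        if (ctl->sa_task) {
                struct task_struct *task = ctl->sa_task;

                ctl->sa_task = NULL;    /* the thread treats NULL as "stop" */
                wake_up_process(task);  /* done under sa_lock: the thread also
                                         * checks sa_task under sa_lock before
                                         * sleeping or exiting, so the wakeup is
                                         * neither lost nor aimed at a task that
                                         * has already gone away
                                         */
        }
        spin_unlock(&ctl->sa_lock);
}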

> ---
> .../staging/lustre/lustre/llite/llite_internal.h   |2 
> drivers/staging/lustre/lustre/llite/statahead.c|  118 +---
> 2 files changed, 54 insertions(+), 66 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h 
> b/drivers/staging/lustre/lustre/llite/llite_internal.h
> index 0c2d717fd526..d46bcf71b273 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_internal.h
> +++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
> @@ -1070,7 +1070,7 @@ struct ll_statahead_info {
>   sai_agl_valid:1,/* AGL is valid for the dir */
>   sai_in_readpage:1;/* statahead in readdir() */
>   wait_queue_head_t   sai_waitq;  /* stat-ahead wait queue */
> - struct ptlrpc_threadsai_thread; /* stat-ahead thread */
> + struct task_struct *sai_task;   /* stat-ahead thread */
>   struct task_struct *sai_agl_task;   /* AGL thread */
>   struct list_headsai_interim_entries; /* entries which got async
> * stat reply, but not
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c 
> b/drivers/staging/lustre/lustre/llite/statahead.c
> index 39241b952bf4..155ce3cf6f60 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -267,7 +267,7 @@ sa_kill(struct ll_statahead_info *sai, struct sa_entry 
> *entry)
> 
> /* called by scanner after use, sa_entry will be killed */
> static void
> -sa_put(struct ll_statahead_info *sai, struct sa_entry *entry)
> +sa_put(struct ll_statahead_info *sai, struct sa_entry *entry, struct 
> ll_inode_info *lli)
> {
>   struct sa_entry *tmp, *next;
> 
> @@ -295,7 +295,11 @@ sa_put(struct ll_statahead_info *sai, struct sa_entry 
> *entry)
>   sa_kill(sai, tmp);
>   }
> 
> - wake_up(>sai_thread.t_ctl_waitq);
> + spin_lock(>lli_sa_lock);
> + if (sai->sai_task)
> + wake_up_process(sai->sai_task);
> + spin_unlock(>lli_sa_lock);
> +
> }
> 
> /*
> @@ -403,7 +407,6 @@ static struct ll_statahead_info *ll_sai_alloc(struct 
> dentry *dentry)
>   sai->sai_max = LL_SA_RPC_MIN;
>   sai->sai_index = 1;
>   init_waitqueue_head(>sai_waitq);
> - init_waitqueue_head(>sai_thread.t_ctl_waitq);
> 
>   INIT_LIST_HEAD(>sai_interim_entries);
>   INIT_LIST_HEAD(>sai_entries);
> @@ -465,7 +468,7 @@ static void ll_sai_put(struct ll_statahead_info *sai)
>   lli->lli_sai = NULL;
>   spin_unlock(>lli_sa_lock);
> 
> - LASSERT(thread_is_stopped(>sai_thread));
> + LASSERT(sai->sai_task == NULL);
>   LASSERT(sai->sai_agl_task == NULL);
>   LASSERT(sai->sai_sent == sai->sai_replied);
>   LASSERT(!sa_has_callback(sai));
> @@ -646,7 +649,6 @@ static int ll_statahead_interpret(struct ptlrpc_request 
> *req,
>   struct ll_inode_info *lli = ll_i2info(dir);
>   struct ll_statahead_info *sai = lli->lli_sai;
>   struct sa_entry *entry = (struct sa_entry *)minfo->mi_cbdata;
> - wait_queue_head_t *waitq = NULL;
>   __u64 handle = 0;
> 
>   if (it_disposition(it, DISP_LOOKUP_NEG))
> @@ -657,7 +659,6 @@ static int ll_statahead_interpret(struct ptlrpc_request 
> *req,
>* sai should be always valid, no need to refcount
>*/
>   LASSERT(sai);
> - LASSERT(!thread_is_stopped(>sai_thread));
>   LASSERT(entry);
> 
>   CDEBUG(D_READA, "sa_entry %.*s rc %d\n",
> @@ -681,8 +682,9 @@ static int ll_statahead_interpret(struct ptlrpc_request 
> *req,
>   spin_lock(>lli_sa_lock);
>   if (rc) {
>   if (__sa_make_ready(sai, entry, rc))
> - waitq = >sai_waitq;
> + wake_up(>sai_waitq);
>   } else {
> + int first = 0;
>   entry->se_minfo = minfo;
>   entry->se_req = ptlrpc_request_addref(req);
>   /*
> @@ -693,14 +695,15 @@ static int ll_statahead_interpret(struct ptlrpc_request 
> *req,
>*/
>   entry->se_handle = handle;
>   if 

Re: [PATCH 13/17] staging: lustre: remove 'ptlrpc_thread usage' for sai_agl_thread

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> Lustre has a 'struct ptlrpc_thread' which provides
> control functionality wrapped around kthreads.
> None of the functionality used in statahead.c requires
> ptlrpc_thread - it can all be done directly with kthreads.
> 
> So discard the ptlrpc_thread and just use a task_struct directly.
> 
> One particular change worth noting is that in the current
> code, the thread performs some start-up actions and then
> signals that it is ready to go.  In the new code, the thread
> is first created, then the startup actions are performed, then
> the thread is woken up.  This means there is no need to wait
> any more than kthread_create() already waits.
> 
> Signed-off-by: NeilBrown 

Looks reasonable, but one minor comment inline below.  Not enough to
make me think the patch is bad, just a minor inefficiency...

Reviewed-by: Andreas Dilger 

I've also CC'd Fan Yong, who is the author of this code in case
he has any comments.
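
One way to picture the "create first, publish state, then wake" ordering the
commit message describes is the sketch below.  kthread_create() leaves the new
task stopped until wake_up_process(), so everything the thread will read can
be set up in between.  The context structure and helper names here are
invented for the sketch, not taken from the patch:

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

struct agl_ctx {
        struct task_struct      *agl_task;
        unsigned int             owner_pid;
        /* ... whatever state the thread will read once it runs ... */
};

static int agl_thread_fn(void *arg);            /* thread body, elided */
static void agl_ctx_get(struct agl_ctx *ctx);   /* take the thread's reference */

static int start_agl(struct agl_ctx *ctx)
{
        struct task_struct *task;

        task = kthread_create(agl_thread_fn, ctx, "ll_agl_%u", ctx->owner_pid);
        if (IS_ERR(task))
                return PTR_ERR(task);

        ctx->agl_task = task;   /* published before the thread ever runs */
        agl_ctx_get(ctx);       /* grab the reference on the thread's behalf */
        wake_up_process(task);  /* only now does agl_thread_fn() start */
        return 0;
}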

> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c 
> b/drivers/staging/lustre/lustre/llite/statahead.c
> index ba00881a5745..39241b952bf4 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -861,35 +860,13 @@ static int ll_agl_thread(void *arg)
>   struct inode *dir= d_inode(parent);
>   struct ll_inode_info *plli   = ll_i2info(dir);
>   struct ll_inode_info *clli;
> - struct ll_sb_info   *sbi= ll_i2sbi(dir);
> - struct ll_statahead_info *sai;
> - struct ptlrpc_thread *thread;
> + /* We already own this reference, so it is safe to take it without a 
> lock. */
> + struct ll_statahead_info *sai = plli->lli_sai;
> 
> - sai = ll_sai_get(dir);

Here we used to grab a reference to "sai" from the directory, but you
get it in the calling thread now...

> @@ -937,16 +917,22 @@ static void ll_start_agl(struct dentry *parent, struct 
> ll_statahead_info *sai)
>  sai, parent);
> 
>   plli = ll_i2info(d_inode(parent));
> + task = kthread_create(ll_agl_thread, parent, "ll_agl_%u",
> +   plli->lli_opendir_pid);
>   if (IS_ERR(task)) {
>   CERROR("can't start ll_agl thread, rc: %ld\n", PTR_ERR(task));
>   return;
>   }
> 
> + sai->sai_agl_task = task;
> + atomic_inc(_i2sbi(d_inode(parent))->ll_agl_total);
> + spin_lock(>lli_agl_lock);
> + sai->sai_agl_valid = 1;
> + spin_unlock(>lli_agl_lock);
> + /* Get an extra reference that the thread holds */
> + ll_sai_get(d_inode(parent));

Here you get the extra reference, but we already have the pointer to
"sai", do going through "parent->d_inode->lli->lli_sai" to get "sai"
again seems convoluted.  One option is atomic_inc(>sai_refcount),
but given that this is done only once per "ls" call I don't think it
is a huge deal, and not more work than was done before.

Cheers, Andreas

> +
> + wake_up_process(task);
> }
> 
> /* statahead thread main function */
> @@ -958,7 +944,6 @@ static int ll_statahead_thread(void *arg)
>   struct ll_sb_info   *sbi= ll_i2sbi(dir);
>   struct ll_statahead_info *sai;
>   struct ptlrpc_thread *sa_thread;
> - struct ptlrpc_thread *agl_thread;
>   struct page   *page = NULL;
>   __u64pos= 0;
>   intfirst  = 0;
> @@ -967,7 +952,6 @@ static int ll_statahead_thread(void *arg)
> 
>   sai = ll_sai_get(dir);
>   sa_thread = >sai_thread;
> - agl_thread = >sai_agl_thread;
>   sa_thread->t_pid = current_pid();
>   CDEBUG(D_READA, "statahead thread starting: sai %p, parent %pd\n",
>  sai, parent);
> @@ -1129,21 +1113,13 @@ static int ll_statahead_thread(void *arg)
>   sa_handle_callback(sai);
>   }
> out:
> - if (sai->sai_agl_valid) {
> - spin_lock(>lli_agl_lock);
> - thread_set_flags(agl_thread, SVC_STOPPING);
> - spin_unlock(>lli_agl_lock);
> - wake_up(_thread->t_ctl_waitq);
> + if (sai->sai_agl_task) {
> + kthread_stop(sai->sai_agl_task);
> 
>   CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n",
> -sai, (unsigned int)agl_thread->t_pid);
> - wait_event_idle(agl_thread->t_ctl_waitq,
> - thread_is_stopped(agl_thread));
> - } else {
> - /* Set agl_thread flags anyway. */
> - thread_set_flags(agl_thread, SVC_STOPPED);
> +sai, (unsigned int)sai->sai_agl_task->pid);
> + sai->sai_agl_task = NULL;
>   }
> -
>   /*
>* wait for inflight statahead RPCs to finish, and then we can free sai
>* safely because statahead RPC will access sai data
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 12/17] staging: lustre: remove unused flag from ptlrpc_thread

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> SVC_EVENT is no longer used.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/include/lustre_net.h |   11 ---
> 1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h 
> b/drivers/staging/lustre/lustre/include/lustre_net.h
> index 5a4434e7c85a..108683c54127 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_net.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_net.h
> @@ -1259,7 +1259,6 @@ enum {
>   SVC_STOPPING= 1 << 1,
>   SVC_STARTING= 1 << 2,
>   SVC_RUNNING = 1 << 3,
> - SVC_EVENT   = 1 << 4,
> };
> 
> #define PTLRPC_THR_NAME_LEN   32
> @@ -1302,11 +1301,6 @@ struct ptlrpc_thread {
>   chart_name[PTLRPC_THR_NAME_LEN];
> };
> 
> -static inline int thread_is_init(struct ptlrpc_thread *thread)
> -{
> - return thread->t_flags == 0;
> -}
> -
> static inline int thread_is_stopped(struct ptlrpc_thread *thread)
> {
>   return !!(thread->t_flags & SVC_STOPPED);
> @@ -1327,11 +1321,6 @@ static inline int thread_is_running(struct 
> ptlrpc_thread *thread)
>   return !!(thread->t_flags & SVC_RUNNING);
> }
> 
> -static inline int thread_is_event(struct ptlrpc_thread *thread)
> -{
> - return !!(thread->t_flags & SVC_EVENT);
> -}
> -
> static inline void thread_clear_flags(struct ptlrpc_thread *thread, __u32 
> flags)
> {
>   thread->t_flags &= ~flags;
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 11/17] staging: lustre: ptlrpc: use workqueue for pinger

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> lustre has a "Pinger" kthread which periodically pings peers
> to ensure all hosts are functioning.
> 
> This can more easily be done using a work queue.
> 
> As maintaining contact with other peers is important for
> keeping the filesystem running, and as the filesystem might
> be involved in freeing memory, it is safest to have a
> separate WQ_MEM_RECLAIM workqueue.
> 
> The SVC_EVENT functionality to wake up the thread can be
> replaced with mod_delayed_work().
> 
> Also use round_jiffies_up_relative() rather than setting a
> minimum of 1 second delay.  The PING_INTERVAL is measured in
> seconds, so this meets the need while allowing the workqueue to
> keep wakeups synchronized.
> 
> Signed-off-by: NeilBrown 

Looks reasonable.  Fortunately, pinging the server does not need
to be very accurate since it is only done occasionally when the
client is otherwise idle, so it shouldn't matter if the workqueue
operation is delayed by a few seconds.

Reviewed-by: Andreas Dilger 
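
For anyone comparing with the old code, a minimal sketch of the conversion
pattern: a dedicated WQ_MEM_RECLAIM workqueue, a delayed work item that
re-queues itself, and mod_delayed_work() standing in for the old SVC_EVENT
"wake the pinger now" path.  The names and the interval are placeholders:

#include <linux/jiffies.h>
#include <linux/timer.h>                /* round_jiffies_up_relative() */
#include <linux/workqueue.h>

#define PING_SECONDS    25              /* stand-in for PING_INTERVAL */

static struct workqueue_struct *ping_wq;
static void ping_fn(struct work_struct *ws);
static DECLARE_DELAYED_WORK(ping_dwork, ping_fn);

static void ping_fn(struct work_struct *ws)
{
        /* ... ping the peers that are due ... */
        queue_delayed_work(ping_wq, &ping_dwork,
                           round_jiffies_up_relative(PING_SECONDS * HZ));
}

static int ping_start(void)
{
        ping_wq = alloc_workqueue("ping_example", WQ_MEM_RECLAIM, 1);
        if (!ping_wq)
                return -ENOMEM;
        queue_delayed_work(ping_wq, &ping_dwork, 0);
        return 0;
}

static void ping_kick(void)     /* replacement for the SVC_EVENT wakeup */
{
        mod_delayed_work(ping_wq, &ping_dwork, 0);
}

static void ping_stop(void)
{
        cancel_delayed_work_sync(&ping_dwork);  /* safe even though the work
                                                 * re-queues itself */
        destroy_workqueue(ping_wq);
        ping_wq = NULL;
}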

> ---
> drivers/staging/lustre/lustre/ptlrpc/pinger.c |   81 +++--
> 1 file changed, 24 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/pinger.c 
> b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> index b5f3cfee8e75..0775b7a048bb 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/pinger.c
> @@ -217,21 +217,18 @@ static void ptlrpc_pinger_process_import(struct 
> obd_import *imp,
>   }
> }
> 
> -static int ptlrpc_pinger_main(void *arg)
> -{
> - struct ptlrpc_thread *thread = arg;
> -
> - /* Record that the thread is running */
> - thread_set_flags(thread, SVC_RUNNING);
> - wake_up(>t_ctl_waitq);
> +static struct workqueue_struct *pinger_wq;
> +static void ptlrpc_pinger_main(struct work_struct *ws);
> +static DECLARE_DELAYED_WORK(ping_work, ptlrpc_pinger_main);
> 
> - /* And now, loop forever, pinging as needed. */
> - while (1) {
> - unsigned long this_ping = cfs_time_current();
> - long time_to_next_wake;
> - struct timeout_item *item;
> - struct obd_import *imp;
> +static void ptlrpc_pinger_main(struct work_struct *ws)
> +{
> + unsigned long this_ping = cfs_time_current();
> + long time_to_next_wake;
> + struct timeout_item *item;
> + struct obd_import *imp;
> 
> + do {
>   mutex_lock(_mutex);
>   list_for_each_entry(item, _list, ti_chain) {
>   item->ti_cb(item, item->ti_cb_data);
> @@ -260,50 +257,24 @@ static int ptlrpc_pinger_main(void *arg)
>  time_to_next_wake,
>  cfs_time_add(this_ping,
>   PING_INTERVAL * HZ));
> - if (time_to_next_wake > 0) {
> - wait_event_idle_timeout(thread->t_ctl_waitq,
> - thread_is_stopping(thread) ||
> - thread_is_event(thread),
> - max_t(long, time_to_next_wake, 
> HZ));
> - if (thread_test_and_clear_flags(thread, SVC_STOPPING))
> - break;
> - /* woken after adding import to reset timer */
> - thread_test_and_clear_flags(thread, SVC_EVENT);
> - }
> - }
> + } while (time_to_next_wake <= 0);
> 
> - thread_set_flags(thread, SVC_STOPPED);
> - wake_up(>t_ctl_waitq);
> -
> - CDEBUG(D_NET, "pinger thread exiting, process %d\n", current_pid());
> - return 0;
> + queue_delayed_work(pinger_wq, _work,
> +round_jiffies_up_relative(time_to_next_wake));
> }
> 
> -static struct ptlrpc_thread pinger_thread;
> -
> int ptlrpc_start_pinger(void)
> {
> - struct task_struct *task;
> - int rc;
> -
> - if (!thread_is_init(_thread) &&
> - !thread_is_stopped(_thread))
> + if (pinger_wq)
>   return -EALREADY;
> 
> - init_waitqueue_head(_thread.t_ctl_waitq);
> -
> - strcpy(pinger_thread.t_name, "ll_ping");
> -
> - task = kthread_run(ptlrpc_pinger_main, _thread,
> -pinger_thread.t_name);
> - if (IS_ERR(task)) {
> - rc = PTR_ERR(task);
> - CERROR("cannot start pinger thread: rc = %d\n", rc);
> - return rc;
> + pinger_wq = alloc_workqueue("ptlrpc_pinger", WQ_MEM_RECLAIM, 1);
> + if (!pinger_wq) {
> + CERROR("cannot start pinger workqueue\n");
> + return -ENOMEM;
>   }
> - wait_event_idle(pinger_thread.t_ctl_waitq,
> - thread_is_running(_thread));
> 
> + queue_delayed_work(pinger_wq, _work, 0);
>   return 0;
> }
> 
> @@ -313,16 +284,13 @@ int ptlrpc_stop_pinger(void)
> {
>   int rc = 0;
> 
> - if (thread_is_init(_thread) ||
> - thread_is_stopped(_thread))
> + if 

Re: [PATCH 10/17] staging: lustre: ptlrpc: use delayed_work in sec_gc

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> The garbage collection for security contexts currently
> has a dedicated kthread which wakes up every 30 minutes
> to discard old garbage.
> 
> Replace this with a simple delayed_work item on the
> system work queue.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ptlrpc/sec_gc.c |   90 -
> 1 file changed, 28 insertions(+), 62 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c 
> b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
> index 48f1a72afd77..2c8bad7b7877 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/sec_gc.c
> @@ -55,7 +55,6 @@ static spinlock_t sec_gc_list_lock;
> static LIST_HEAD(sec_gc_ctx_list);
> static spinlock_t sec_gc_ctx_list_lock;
> 
> -static struct ptlrpc_thread sec_gc_thread;
> static atomic_t sec_gc_wait_del = ATOMIC_INIT(0);
> 
> void sptlrpc_gc_add_sec(struct ptlrpc_sec *sec)
> @@ -139,86 +138,53 @@ static void sec_do_gc(struct ptlrpc_sec *sec)
>   sec->ps_gc_next = ktime_get_real_seconds() + sec->ps_gc_interval;
> }
> 
> -static int sec_gc_main(void *arg)
> -{
> - struct ptlrpc_thread *thread = arg;
> -
> - unshare_fs_struct();
> +static void sec_gc_main(struct work_struct *ws);
> +static DECLARE_DELAYED_WORK(sec_gc_work, sec_gc_main);
> 
> - /* Record that the thread is running */
> - thread_set_flags(thread, SVC_RUNNING);
> - wake_up(>t_ctl_waitq);
> -
> - while (1) {
> - struct ptlrpc_sec *sec;
> +static void sec_gc_main(struct work_struct *ws)
> +{
> + struct ptlrpc_sec *sec;
> 
> - sec_process_ctx_list();
> + sec_process_ctx_list();
> again:
> - /* go through sec list do gc.
> -  * FIXME here we iterate through the whole list each time which
> -  * is not optimal. we perhaps want to use balanced binary tree
> -  * to trace each sec as order of expiry time.
> -  * another issue here is we wakeup as fixed interval instead of
> -  * according to each sec's expiry time
> + /* go through sec list do gc.
> +  * FIXME here we iterate through the whole list each time which
> +  * is not optimal. we perhaps want to use balanced binary tree
> +  * to trace each sec as order of expiry time.
> +  * another issue here is we wakeup as fixed interval instead of
> +  * according to each sec's expiry time
> +  */
> + mutex_lock(_gc_mutex);
> + list_for_each_entry(sec, _gc_list, ps_gc_list) {
> + /* if someone is waiting to be deleted, let it
> +  * proceed as soon as possible.
>*/
> - mutex_lock(_gc_mutex);
> - list_for_each_entry(sec, _gc_list, ps_gc_list) {
> - /* if someone is waiting to be deleted, let it
> -  * proceed as soon as possible.
> -  */
> - if (atomic_read(_gc_wait_del)) {
> - CDEBUG(D_SEC, "deletion pending, start over\n");
> - mutex_unlock(_gc_mutex);
> - goto again;
> - }
> -
> - sec_do_gc(sec);
> + if (atomic_read(_gc_wait_del)) {
> + CDEBUG(D_SEC, "deletion pending, start over\n");
> + mutex_unlock(_gc_mutex);
> + goto again;
>   }
> - mutex_unlock(_gc_mutex);
> -
> - /* check ctx list again before sleep */
> - sec_process_ctx_list();
> - wait_event_idle_timeout(thread->t_ctl_waitq,
> - thread_is_stopping(thread),
> - SEC_GC_INTERVAL * HZ);
> 
> - if (thread_test_and_clear_flags(thread, SVC_STOPPING))
> - break;
> + sec_do_gc(sec);
>   }
> + mutex_unlock(_gc_mutex);
> 
> - thread_set_flags(thread, SVC_STOPPED);
> - wake_up(>t_ctl_waitq);
> - return 0;
> + /* check ctx list again before sleep */
> + sec_process_ctx_list();
> + schedule_delayed_work(_gc_work, SEC_GC_INTERVAL * HZ);
> }
> 
> int sptlrpc_gc_init(void)
> {
> - struct task_struct *task;
> -
>   mutex_init(_gc_mutex);
>   spin_lock_init(_gc_list_lock);
>   spin_lock_init(_gc_ctx_list_lock);
> 
> - /* initialize thread control */
> - memset(_gc_thread, 0, sizeof(sec_gc_thread));
> - init_waitqueue_head(_gc_thread.t_ctl_waitq);
> -
> - task = kthread_run(sec_gc_main, _gc_thread, "sptlrpc_gc");
> - if (IS_ERR(task)) {
> - CERROR("can't start gc thread: %ld\n", PTR_ERR(task));
> - return PTR_ERR(task);
> - }
> -
> - wait_event_idle(sec_gc_thread.t_ctl_waitq,
> - thread_is_running(_gc_thread));
> + 

Re: [PATCH 09/17] staging: lustre: ldlm: use delayed_work for pools_recalc

2018-03-08 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> ldlm currently has a kthread which wakes up every so often
> and calls ldlm_pools_recalc().
> The thread is started and stopped, but no other external interactions
> happen.
> 
> This can trivially be replaced by a delayed_work if we have
> ldlm_pools_recalc() reschedule the work rather than just report
> when to do that.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ldlm/ldlm_pool.c |   99 +++-
> 1 file changed, 11 insertions(+), 88 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c 
> b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
> index a0e486b57e08..53b8f33e54b5 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_pool.c
> @@ -784,9 +784,6 @@ static int ldlm_pool_granted(struct ldlm_pool *pl)
>   return atomic_read(>pl_granted);
> }
> 
> -static struct ptlrpc_thread *ldlm_pools_thread;
> -static struct completion ldlm_pools_comp;
> -
> /*
>  * count locks from all namespaces (if possible). Returns number of
>  * cached locks.
> @@ -899,8 +896,12 @@ static unsigned long ldlm_pools_cli_scan(struct shrinker 
> *s,
>  sc->gfp_mask);
> }
> 
> -static int ldlm_pools_recalc(enum ldlm_side client)
> +static void ldlm_pools_recalc(struct work_struct *ws);
> +static DECLARE_DELAYED_WORK(ldlm_recalc_pools, ldlm_pools_recalc);
> +
> +static void ldlm_pools_recalc(struct work_struct *ws)
> {
> + enum ldlm_side client = LDLM_NAMESPACE_CLIENT;
>   struct ldlm_namespace *ns;
>   struct ldlm_namespace *ns_old = NULL;
>   /* seconds of sleep if no active namespaces */
> @@ -982,92 +983,19 @@ static int ldlm_pools_recalc(enum ldlm_side client)
>   /* Wake up the blocking threads from time to time. */
>   ldlm_bl_thread_wakeup();
> 
> - return time;
> -}
> -
> -static int ldlm_pools_thread_main(void *arg)
> -{
> - struct ptlrpc_thread *thread = (struct ptlrpc_thread *)arg;
> - int c_time;
> -
> - thread_set_flags(thread, SVC_RUNNING);
> - wake_up(>t_ctl_waitq);
> -
> - CDEBUG(D_DLMTRACE, "%s: pool thread starting, process %d\n",
> -"ldlm_poold", current_pid());
> -
> - while (1) {
> - /*
> -  * Recal all pools on this tick.
> -  */
> - c_time = ldlm_pools_recalc(LDLM_NAMESPACE_CLIENT);
> -
> - /*
> -  * Wait until the next check time, or until we're
> -  * stopped.
> -  */
> - wait_event_idle_timeout(thread->t_ctl_waitq,
> - thread_is_stopping(thread) ||
> - thread_is_event(thread),
> - c_time * HZ);
> -
> - if (thread_test_and_clear_flags(thread, SVC_STOPPING))
> - break;
> - thread_test_and_clear_flags(thread, SVC_EVENT);
> - }
> -
> - thread_set_flags(thread, SVC_STOPPED);
> - wake_up(>t_ctl_waitq);
> -
> - CDEBUG(D_DLMTRACE, "%s: pool thread exiting, process %d\n",
> -"ldlm_poold", current_pid());
> -
> - complete_and_exit(_pools_comp, 0);
> + schedule_delayed_work(_recalc_pools, time * HZ);
> }
> 
> static int ldlm_pools_thread_start(void)
> {
> - struct task_struct *task;
> -
> - if (ldlm_pools_thread)
> - return -EALREADY;
> -
> - ldlm_pools_thread = kzalloc(sizeof(*ldlm_pools_thread), GFP_NOFS);
> - if (!ldlm_pools_thread)
> - return -ENOMEM;
> -
> - init_completion(_pools_comp);
> - init_waitqueue_head(_pools_thread->t_ctl_waitq);
> + schedule_delayed_work(_recalc_pools, 0);
> 
> - task = kthread_run(ldlm_pools_thread_main, ldlm_pools_thread,
> -"ldlm_poold");
> - if (IS_ERR(task)) {
> - CERROR("Can't start pool thread, error %ld\n", PTR_ERR(task));
> - kfree(ldlm_pools_thread);
> - ldlm_pools_thread = NULL;
> - return PTR_ERR(task);
> - }
> - wait_event_idle(ldlm_pools_thread->t_ctl_waitq,
> - thread_is_running(ldlm_pools_thread));
>   return 0;
> }
> 
> static void ldlm_pools_thread_stop(void)
> {
> - if (!ldlm_pools_thread)
> - return;
> -
> - thread_set_flags(ldlm_pools_thread, SVC_STOPPING);
> - wake_up(_pools_thread->t_ctl_waitq);
> -
> - /*
> -  * Make sure that pools thread is finished before freeing @thread.
> -  * This fixes possible race and oops due to accessing freed memory
> -  * in pools thread.
> -  */
> - wait_for_completion(_pools_comp);
> - kfree(ldlm_pools_thread);
> - ldlm_pools_thread = NULL;
> + cancel_delayed_work_sync(_recalc_pools);
> }
> 
> static struct shrinker ldlm_pools_cli_shrinker = {
> @@ -1081,20 +1009,15 @@ int ldlm_pools_init(void)
>   int rc;
> 
>   rc = 

Re: [PATCH 08/17] staging: lustre: obdclass: use workqueue for zombie management.

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> obdclass currently maintains two lists of data structures
> (imports and exports), and a kthread which will free
> anything on either list.  The thread is woken whenever
> anything is added to either list.
> 
> This is exactly the sort of thing that workqueues exist for.
> 
> So discard the zombie kthread and the lists and locks, and
> create a single workqueue.  Each obd_import and obd_export
> gets a work_struct to attach to this workqueue.
> 
> This requires a small change to import_sec_validate_get()
> which was testing if an obd_import was on the zombie
> list.  This cannot have ever safely found it to be
> on the list (as it could be freed asynchronously)
> so it must be dead code.
> 
> We could use system_wq instead of creating a dedicated
> zombie_wq, but as we occasionally want to flush all pending
> work, it is a little nicer to only have to wait for our own
> work items.

Nice cleanup.  Lustre definitely has too many threads, but
kernel work queues didn't exist in the dark ages.

I CC'd Alexey, since he wrote this code initially, in case
there is anything special to be aware of.

Reviewed-by: Andreas Dilger 
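
A minimal sketch of the per-object work pattern being applied here: each
object embeds a work_struct, "free it later" becomes queue_work() on the
dedicated queue, and "wait for all zombies" becomes flush_workqueue().  The
object type and names are illustrative, not the real obd_export/obd_import:

#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *zombie_wq;

struct zombie_obj {
        struct work_struct       zombie_work;
        /* ... payload ... */
};

static void zombie_cull(struct work_struct *ws)
{
        struct zombie_obj *obj = container_of(ws, struct zombie_obj,
                                              zombie_work);

        kfree(obj);             /* the real code calls its destroy helper */
}

static int zombie_init(void)
{
        zombie_wq = alloc_workqueue("zombie_example", 0, 0);
        return zombie_wq ? 0 : -ENOMEM;
}

static struct zombie_obj *zombie_alloc(void)
{
        struct zombie_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

        if (obj)
                INIT_WORK(&obj->zombie_work, zombie_cull);
        return obj;
}

static void zombie_release(struct zombie_obj *obj)
{
        queue_work(zombie_wq, &obj->zombie_work);       /* free asynchronously */
}

static void zombie_barrier(void)
{
        flush_workqueue(zombie_wq);     /* waits only for our own work items */
}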

> Signed-off-by: NeilBrown 
> ---
> .../staging/lustre/lustre/include/lustre_export.h  |2 
> .../staging/lustre/lustre/include/lustre_import.h  |4 
> drivers/staging/lustre/lustre/obdclass/genops.c|  193 ++--
> drivers/staging/lustre/lustre/ptlrpc/sec.c |6 -
> 4 files changed, 30 insertions(+), 175 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h 
> b/drivers/staging/lustre/lustre/include/lustre_export.h
> index 66ac9dc7302a..40cd168ed2ea 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_export.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_export.h
> @@ -87,6 +87,8 @@ struct obd_export {
>   struct obd_uuidexp_client_uuid;
>   /** To link all exports on an obd device */
>   struct list_headexp_obd_chain;
> + /** work_struct for destruction of export */
> + struct work_struct  exp_zombie_work;
>   struct hlist_node exp_uuid_hash; /** uuid-export hash*/
>   /** Obd device of this export */
>   struct obd_device   *exp_obd;
> diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h 
> b/drivers/staging/lustre/lustre/include/lustre_import.h
> index ea158e0630e2..1731048f1ff2 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_import.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_import.h
> @@ -162,8 +162,8 @@ struct obd_import {
>   struct ptlrpc_client *imp_client;
>   /** List element for linking into pinger chain */
>   struct list_headimp_pinger_chain;
> - /** List element for linking into chain for destruction */
> - struct list_headimp_zombie_chain;
> + /** work struct for destruction of import */
> + struct work_struct  imp_zombie_work;
> 
>   /**
>* Lists of requests that are retained for replay, waiting for a reply,
> diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c 
> b/drivers/staging/lustre/lustre/obdclass/genops.c
> index 8f776a4058a9..63ccbabb4c5a 100644
> --- a/drivers/staging/lustre/lustre/obdclass/genops.c
> +++ b/drivers/staging/lustre/lustre/obdclass/genops.c
> @@ -48,10 +48,7 @@ struct kmem_cache *obdo_cachep;
> EXPORT_SYMBOL(obdo_cachep);
> static struct kmem_cache *import_cachep;
> 
> -static struct list_head  obd_zombie_imports;
> -static struct list_head  obd_zombie_exports;
> -static spinlock_t  obd_zombie_impexp_lock;
> -static void obd_zombie_impexp_notify(void);
> +static struct workqueue_struct *zombie_wq;
> static void obd_zombie_export_add(struct obd_export *exp);
> static void obd_zombie_import_add(struct obd_import *imp);
> 
> @@ -701,6 +698,13 @@ void class_export_put(struct obd_export *exp)
> }
> EXPORT_SYMBOL(class_export_put);
> 
> +static void obd_zombie_exp_cull(struct work_struct *ws)
> +{
> + struct obd_export *export = container_of(ws, struct obd_export, 
> exp_zombie_work);
> +
> + class_export_destroy(export);
> +}
> +
> /* Creates a new export, adds it to the hash table, and returns a
>  * pointer to it. The refcount is 2: one for the hash reference, and
>  * one for the pointer returned by this function.
> @@ -741,6 +745,7 @@ struct obd_export *class_new_export(struct obd_device 
> *obd,
>   INIT_HLIST_NODE(>exp_uuid_hash);
>   spin_lock_init(>exp_bl_list_lock);
>   INIT_LIST_HEAD(>exp_bl_list);
> + INIT_WORK(>exp_zombie_work, obd_zombie_exp_cull);
> 
>   export->exp_sp_peer = LUSTRE_SP_ANY;
>   export->exp_flvr.sf_rpc = SPTLRPC_FLVR_INVALID;
> @@ -862,7 +867,6 @@ EXPORT_SYMBOL(class_import_get);
> 
> void class_import_put(struct obd_import *imp)
> {
> - LASSERT(list_empty(>imp_zombie_chain));
>   LASSERT_ATOMIC_GT_LT(>imp_refcount, 0, LI_POISON);
> 
>   

Re: [PATCH 07/17] staging: lustre: ptlrpc: change GFP_NOFS to GFP_KERNEL

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> These allocations are performed during initialization,
> so they don't need GFP_NOFS.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c |2 +-
> drivers/staging/lustre/lustre/ptlrpc/service.c  |4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c 
> b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
> index 577c5822b823..625b9520d78f 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/sec_bulk.c
> @@ -377,7 +377,7 @@ static inline void enc_pools_alloc(void)
>   page_pools.epp_pools =
>   kvzalloc(page_pools.epp_max_pools *
>   sizeof(*page_pools.epp_pools),
> - GFP_NOFS);
> + GFP_KERNEL);
> }
> 
> static inline void enc_pools_free(void)
> diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c 
> b/drivers/staging/lustre/lustre/ptlrpc/service.c
> index 49417228b621..f37364e00dfe 100644
> --- a/drivers/staging/lustre/lustre/ptlrpc/service.c
> +++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
> @@ -2046,7 +2046,7 @@ static int ptlrpc_main(void *arg)
>   goto out;
>   }
> 
> - env = kzalloc(sizeof(*env), GFP_NOFS);
> + env = kzalloc(sizeof(*env), GFP_KERNEL);
>   if (!env) {
>   rc = -ENOMEM;
>   goto out_srv_fini;
> @@ -2072,7 +2072,7 @@ static int ptlrpc_main(void *arg)
>   }
> 
>   /* Alloc reply state structure for this one */
> - rs = kvzalloc(svc->srv_max_reply_size, GFP_NOFS);
> + rs = kvzalloc(svc->srv_max_reply_size, GFP_KERNEL);
>   if (!rs) {
>   rc = -ENOMEM;
>   goto out_srv_fini;
> 
> 
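
A side note on the distinction, as a hedged sketch rather than Lustre code: GFP_NOFS only matters where reclaim could recurse into the filesystem (allocations made under fs locks or in the writeback path). Init-time allocations like the ones above can simply use GFP_KERNEL, and where a whole region must avoid fs recursion the scoped memalloc_nofs_save()/memalloc_nofs_restore() API is generally preferred over sprinkling GFP_NOFS on individual call sites.

#include <linux/slab.h>
#include <linux/sched/mm.h>

/* init path: no fs locks held, so full reclaim is allowed */
static void *init_time_alloc(size_t size)
{
	return kzalloc(size, GFP_KERNEL);
}

/* a region that really must not recurse into the filesystem */
static void *alloc_under_fs_lock(size_t size)
{
	unsigned int nofs = memalloc_nofs_save();
	void *p = kzalloc(size, GFP_KERNEL);	/* behaves like GFP_NOFS here */

	memalloc_nofs_restore(nofs);
	return p;
}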

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 06/17] staging: lustre: get entropy from nid when nid set.

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> When the 'lustre' module is loaded, it gets a list of
> net devices and uses the node ids to  add entropy
> to the prng.  This means that the network interfaces need
> to be configured before the module is loaded, which prevents
> the module from being compiled into a monolithic kernel.
> 
> So move this entropy addition to the moment when
> the interface is imported to LNet and the node id is first known.

It took me a while to convince myself this is correct, but this is
moving the entropy addition earlier in the startup sequence, and
that is a good thing.  The important factor is to ensure that the
client UUID (generated at mount time) is unique across all clients,
and adding the node address to the entropy ensures this, even if many
thousands of identical diskless nodes boot and mount simultaneously.
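
For illustration, the mechanism boils down to something like the sketch below; the helper name and the u32-sized seed are assumptions that mirror the hunk quoted underneath.

#include <linux/random.h>
#include <linux/types.h>

/*
 * Fold a freshly configured interface address into the entropy pool as
 * soon as it is known, so UUIDs generated later differ even across large
 * numbers of identically imaged diskless clients.
 */
static void seed_entropy_from_addr(u32 addr)
{
	u32 seed = addr;	/* e.g. the low 32 bits of the NID */

	add_device_randomness(&seed, sizeof(seed));
}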

Reviewed-by: Andreas Dilger 

> Signed-off-by: NeilBrown 
> ---
> drivers/staging/lustre/lnet/lnet/api-ni.c |7 +++
> drivers/staging/lustre/lustre/llite/super25.c |   17 +
> 2 files changed, 8 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c 
> b/drivers/staging/lustre/lnet/lnet/api-ni.c
> index 48d25ccadbb3..90266be0132d 100644
> --- a/drivers/staging/lustre/lnet/lnet/api-ni.c
> +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
> @@ -1214,6 +1214,7 @@ lnet_startup_lndni(struct lnet_ni *ni, struct 
> lnet_ioctl_config_data *conf)
>   struct lnet_lnd *lnd;
>   struct lnet_tx_queue *tq;
>   int i;
> + u32 seed;
> 
>   lnd_type = LNET_NETTYP(LNET_NIDNET(ni->ni_nid));
> 
> @@ -1352,6 +1353,12 @@ lnet_startup_lndni(struct lnet_ni *ni, struct 
> lnet_ioctl_config_data *conf)
>   tq->tq_credits = lnet_ni_tq_credits(ni);
>   }
> 
> + /* Nodes with small feet have little entropy. The NID for this
> +  * node gives the most entropy in the low bits.
> +  */
> + seed = LNET_NIDADDR(ni->ni_nid);
> + add_device_randomness(&seed, sizeof(seed));
> +
>   CDEBUG(D_LNI, "Added LNI %s [%d/%d/%d/%d]\n",
>  libcfs_nid2str(ni->ni_nid), ni->ni_peertxcredits,
>  lnet_ni_tq_credits(ni) * LNET_CPT_NUMBER,
> diff --git a/drivers/staging/lustre/lustre/llite/super25.c 
> b/drivers/staging/lustre/lustre/llite/super25.c
> index 9b0bb3541a84..861e7a60f408 100644
> --- a/drivers/staging/lustre/lustre/llite/super25.c
> +++ b/drivers/staging/lustre/lustre/llite/super25.c
> @@ -85,8 +85,7 @@ MODULE_ALIAS_FS("lustre");
> 
> static int __init lustre_init(void)
> {
> - struct lnet_process_id lnet_id;
> - int i, rc;
> + int rc;
> 
>   BUILD_BUG_ON(sizeof(LUSTRE_VOLATILE_HDR) !=
>LUSTRE_VOLATILE_HDR_LEN + 1);
> @@ -125,20 +124,6 @@ static int __init lustre_init(void)
>   goto out_debugfs;
>   }
> 
> - /* Nodes with small feet have little entropy. The NID for this
> -  * node gives the most entropy in the low bits
> -  */
> - for (i = 0;; i++) {
> - u32 seed;
> -
> - if (LNetGetId(i, &lnet_id) == -ENOENT)
> - break;
> - if (LNET_NETTYP(LNET_NIDNET(lnet_id.nid)) != LOLND) {
> -  seed = LNET_NIDADDR(lnet_id.nid);
> -  add_device_randomness(&seed, sizeof(seed));
> - }
> - }
> -
>   rc = vvp_global_init();
>   if (rc != 0)
>   goto out_sysfs;
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 05/17] staging: lustre: lnet: keep ln_nportals consistent

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> ln_nportals should be zero when no portals have
> been allocated.  This ensures that memory allocation failure
> is handled correctly elsewhere.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lnet/lnet/lib-ptl.c |5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c 
> b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> index 471f2f6c86f4..fc47379c5938 100644
> --- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> +++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c
> @@ -841,6 +841,7 @@ lnet_portals_destroy(void)
> 
>   cfs_array_free(the_lnet.ln_portals);
>   the_lnet.ln_portals = NULL;
> + the_lnet.ln_nportals = 0;
> }
> 
> int
> @@ -851,12 +852,12 @@ lnet_portals_create(void)
> 
>   size = offsetof(struct lnet_portal, ptl_mt_maps[LNET_CPT_NUMBER]);
> 
> - the_lnet.ln_nportals = MAX_PORTALS;
> - the_lnet.ln_portals = cfs_array_alloc(the_lnet.ln_nportals, size);
> + the_lnet.ln_portals = cfs_array_alloc(MAX_PORTALS, size);
>   if (!the_lnet.ln_portals) {
>   CERROR("Failed to allocate portals table\n");
>   return -ENOMEM;
>   }
> + the_lnet.ln_nportals = MAX_PORTALS;
> 
>   for (i = 0; i < the_lnet.ln_nportals; i++) {
>   if (lnet_ptl_setup(the_lnet.ln_portals[i], i)) {
> 
> 
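
The general rule the fix follows, shown as a small sketch with made-up names: publish the element count only after the backing allocation has succeeded, and clear it again on teardown, so error and cleanup paths always see a consistent (possibly empty) table.

#include <linux/slab.h>
#include <linux/errno.h>

static void **table;
static int ntable;

static int table_create(int n)
{
	table = kcalloc(n, sizeof(*table), GFP_KERNEL);
	if (!table)
		return -ENOMEM;	/* ntable stays 0: cleanup paths see an empty table */
	ntable = n;
	return 0;
}

static void table_destroy(void)
{
	kfree(table);
	table = NULL;
	ntable = 0;
}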

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 04/17] staging: lustre: obdclass: don't require lct_owner to be non-NULL.

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> Some places in lu_object.c allow lct_owner to be NULL, implying
> that the code is built in to the kernel (not a module), but
> two places don't.  This prevents us from building lustre into
> the kernel.
> 
> So remove the requirement and always allow lct_owner to be NULL.
> 
> This requires removing an "assert" that the module count is positive,
> but this is redundant as module_put() already does the necessary test.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/obdclass/lu_object.c |7 +--
> 1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c 
> b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> index cca688175d2d..880800e78c52 100644
> --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
> +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
> @@ -1380,12 +1380,8 @@ static void key_fini(struct lu_context *ctx, int index)
>   lu_ref_del(&key->lct_reference, "ctx", ctx);
>   atomic_dec(&key->lct_used);
> 
> - if ((ctx->lc_tags & LCT_NOREF) == 0) {
> -#ifdef CONFIG_MODULE_UNLOAD
> - LINVRNT(module_refcount(key->lct_owner) > 0);
> -#endif
> + if ((ctx->lc_tags & LCT_NOREF) == 0)
>   module_put(key->lct_owner);
> - }
>   ctx->lc_value[index] = NULL;
>   }
> }
> @@ -1619,7 +1615,6 @@ static int keys_fill(struct lu_context *ctx)
>   LINVRNT(key->lct_init);
>   LINVRNT(key->lct_index == i);
> 
> - LASSERT(key->lct_owner);
>   if (!(ctx->lc_tags & LCT_NOREF) &&
>   !try_module_get(key->lct_owner)) {
>   /* module is unloading, skip this key */
> 
> 
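
A short sketch of why the assert was redundant (names invented for illustration): both module helpers already treat a NULL owner, i.e. built-in code, as a no-op, so callers need neither a NULL check nor a refcount assertion of their own.

#include <linux/module.h>
#include <linux/errno.h>

static int take_owner_ref(struct module *owner)
{
	/* try_module_get(NULL) simply returns true for built-in code */
	if (!try_module_get(owner))
		return -ENODEV;	/* a real module is in the middle of unloading */
	return 0;
}

static void drop_owner_ref(struct module *owner)
{
	module_put(owner);	/* NULL-safe, no extra check needed */
}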

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 03/17] staging: lustre: statahead: remove incorrect test on agl_list_empty()

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> Including agl_list_empty() in the wait_event_idle() condition
> is pointless as the body of the loop doesn't do anything
> about the agl list.
> So if the list wasn't empty, the while loop would spin
> indefinitely.
> 
> The test was removed in the lustre-release commit
> 672ab0e00d61 ("LU-3270 statahead: small fixes and cleanup"),
> but not in the Linux commit 5231f7651c55 ("staging: lustre:
> statahead: small fixes and cleanup").
> 
> Fixes: 5231f7651c55 ("staging: lustre: statahead: small fixes and cleanup")
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/llite/statahead.c |1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/statahead.c 
> b/drivers/staging/lustre/lustre/llite/statahead.c
> index 6052bfd7ff05..ba00881a5745 100644
> --- a/drivers/staging/lustre/lustre/llite/statahead.c
> +++ b/drivers/staging/lustre/lustre/llite/statahead.c
> @@ -1124,7 +1124,6 @@ static int ll_statahead_thread(void *arg)
>   while (thread_is_running(sa_thread)) {
>   wait_event_idle(sa_thread->t_ctl_waitq,
>   sa_has_callback(sai) ||
> - !agl_list_empty(sai) ||
>   !thread_is_running(sa_thread));
> 
>   sa_handle_callback(sai);
> 
> 
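
The rule being applied here, reduced to a minimal sketch (struct and field names invented; real code would protect the flags with a lock): every condition that can wake the loop must be serviced by the loop body, otherwise a wake-up the body ignores leaves the loop spinning.

#include <linux/wait.h>
#include <linux/types.h>

struct sa_loop {
	wait_queue_head_t	waitq;
	bool			running;
	bool			have_work;
};

static void service_loop(struct sa_loop *s)
{
	while (s->running) {
		/* list only the conditions the body below actually handles */
		wait_event_idle(s->waitq, s->have_work || !s->running);

		if (s->have_work) {
			/* ... process the queued callbacks ... */
			s->have_work = false;
		}
	}
}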

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 02/17] staging: lustre: fix bug in osc_enter_cache_try

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> The lustre-release patch commit bdc5bb52c554 ("LU-4933 osc:
> Automatically increase the max_dirty_mb") changed
> 
> -   if (cli->cl_dirty + PAGE_CACHE_SIZE <= cli->cl_dirty_max &&
> +   if (cli->cl_dirty_pages < cli->cl_dirty_max_pages &&
> 
> When this patch landed in Linux a couple of years later, it landed as
> 
> -   if (cli->cl_dirty + PAGE_SIZE <= cli->cl_dirty_max &&
> +   if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&
> 
> which is clearly different ('<=' vs '<'), and allows cl_dirty_pages to
> increase beyond cl_dirty_max_pages - which causes a later assertion
> to fail.
> 
> Fixes: 3147b268400a ("staging: lustre: osc: Automatically increase the 
> max_dirty_mb")
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/include/obd.h   |2 +-
> drivers/staging/lustre/lustre/osc/osc_cache.c |2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/obd.h 
> b/drivers/staging/lustre/lustre/include/obd.h
> index 4368f4e9f208..f1233ca7d337 100644
> --- a/drivers/staging/lustre/lustre/include/obd.h
> +++ b/drivers/staging/lustre/lustre/include/obd.h
> @@ -191,7 +191,7 @@ struct client_obd {
>   struct sptlrpc_flavor    cl_flvr_mgc;   /* fixed flavor of mgc->mgs */
> 
>   /* the grant values are protected by loi_list_lock below */
> - unsigned long    cl_dirty_pages;    /* all _dirty_ in 
> pahges */
> + unsigned long    cl_dirty_pages;    /* all _dirty_ in pages 
> */
>   unsigned long    cl_dirty_max_pages;    /* allowed w/o rpc */
>   unsigned long    cl_dirty_transit;  /* dirty synchronous */
>   unsigned long    cl_avail_grant;    /* bytes of credit for 
> ost */
> diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c 
> b/drivers/staging/lustre/lustre/osc/osc_cache.c
> index 1c70a504ee89..459503727ce3 100644
> --- a/drivers/staging/lustre/lustre/osc/osc_cache.c
> +++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
> @@ -1529,7 +1529,7 @@ static int osc_enter_cache_try(struct client_obd *cli,
>   if (rc < 0)
>   return 0;
> 
> - if (cli->cl_dirty_pages <= cli->cl_dirty_max_pages &&
> + if (cli->cl_dirty_pages < cli->cl_dirty_max_pages &&
>   atomic_long_read(&obd_dirty_pages) + 1 <= obd_max_dirty_pages) {
>   osc_consume_write_grant(cli, &oap->oap_brw_page);
>   if (transient) {
> 
> 
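
The off-by-one in plain form, as a tiny sketch: the caller is about to account one more page, so admission has to be refused once the counter has reached the cap; with '<=' the counter can end up one past the cap and trip the later assertion.

#include <linux/types.h>

static bool try_reserve_page(unsigned long *dirty, unsigned long max)
{
	if (*dirty < max) {	/* with '<=', *dirty could step to max + 1 */
		(*dirty)++;
		return true;
	}
	return false;
}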

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 01/17] staging: lustre: obd_mount: use correct niduuid suffix.

2018-03-07 Thread Dilger, Andreas
On Mar 1, 2018, at 16:31, NeilBrown  wrote:
> 
> Commit 4f016420d368 ("Staging: lustre: obdclass: Use kasprintf") moved
> some sprintf() calls earlier in the code to combine them with
> memory allocation and create kasprintf() calls.
> 
> In one case, this code movement moved the sprintf to a location where the
> values being formatted were different.
> In particular
>   sprintf(niduuid, "%s_%x", mgcname, i);
> was moved from *after* the line
>   i = 0;
> to a location where the value of 'i' was at least 1.
> 
> This caused the wrong name to be formatted, and triggers
> 
>   CERROR("del MDC UUID %s failed: rc = %d\n",
>  niduuid, rc);
> 
> at unmount time.
> 
> So use '0' instead of 'i'.
> 
> Fixes: 4f016420d368 ("Staging: lustre: obdclass: Use kasprintf")
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/obdclass/obd_mount.c |2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/obd_mount.c 
> b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
> index acc1ea773c9c..f5e8214ac37b 100644
> --- a/drivers/staging/lustre/lustre/obdclass/obd_mount.c
> +++ b/drivers/staging/lustre/lustre/obdclass/obd_mount.c
> @@ -243,7 +243,7 @@ int lustre_start_mgc(struct super_block *sb)
>   libcfs_nid2str_r(nid, nidstr, sizeof(nidstr));
>   mgcname = kasprintf(GFP_NOFS,
>   "%s%s", LUSTRE_MGC_OBDNAME, nidstr);
> - niduuid = kasprintf(GFP_NOFS, "%s_%x", mgcname, i);
> + niduuid = kasprintf(GFP_NOFS, "%s_%x", mgcname, 0);
>   if (!mgcname || !niduuid) {
>   rc = -ENOMEM;
>   goto out_free;
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH] staging: lustre: llite: replace variable length array

2018-01-29 Thread Dilger, Andreas
On Jan 27, 2018, at 14:42, Sven Dziadek  wrote:
> 
> The functionality of the removed variable length array is already
> implemented by the function xattr_full_name in fs/xattr.c
> 
> This fixes the sparse warning:
> warning: Variable length array is used.
> 
> Signed-off-by: Sven Dziadek 
> ---
> drivers/staging/lustre/lustre/llite/xattr.c | 12 
> 1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
> b/drivers/staging/lustre/lustre/llite/xattr.c
> index 532384c91447..4fd28213c6a1 100644
> --- a/drivers/staging/lustre/lustre/llite/xattr.c
> +++ b/drivers/staging/lustre/lustre/llite/xattr.c
> @@ -87,7 +87,6 @@ ll_xattr_set_common(const struct xattr_handler *handler,
>   const char *name, const void *value, size_t size,
>   int flags)
> {
> - char fullname[strlen(handler->prefix) + strlen(name) + 1];
>   struct ll_sb_info *sbi = ll_i2sbi(inode);
>   struct ptlrpc_request *req = NULL;
>   const char *pv = value;
> @@ -141,9 +140,8 @@ ll_xattr_set_common(const struct xattr_handler *handler,
>   return -EPERM;
>   }
> 
> - sprintf(fullname, "%s%s\n", handler->prefix, name);
> - rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
> -  valid, fullname, pv, size, 0, flags,
> + rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode), valid,
> +  xattr_full_name(handler, name), pv, size, 0, flags,
>   ll_i2suppgid(inode), &req);

Hi Sven,
thanks for the patch.

Looking at the details of "xattr_full_name()", this seems quite risky.  This
is essentially returning the pointer _before_ "name" on the assumption that
it contains the full "prefix.name" string.  IMHO, that is not necessarily a
safe assumption to make several layers down in the code.

James, I thought you had a patch for this to use kasprintf() instead of the
on-stack "fullname" declaration?
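
For comparison, the kasprintf() variant mentioned above would look roughly like the sketch below (not the actual patch; error handling is left to the caller):

#include <linux/slab.h>
#include <linux/xattr.h>

/*
 * Build the full "prefix<name>" string on the heap instead of a VLA, and
 * without assuming that 'name' always sits directly behind its prefix in
 * memory.
 */
static char *build_xattr_name(const struct xattr_handler *handler,
			      const char *name)
{
	return kasprintf(GFP_NOFS, "%s%s", handler->prefix, name);
	/* caller checks for NULL and kfree()s the result when done */
}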

Cheers, Andreas

>   if (rc) {
>   if (rc == -EOPNOTSUPP && handler->flags == XATTR_USER_T) {
> @@ -364,7 +362,6 @@ static int ll_xattr_get_common(const struct xattr_handler 
> *handler,
>  struct dentry *dentry, struct inode *inode,
>  const char *name, void *buffer, size_t size)
> {
> - char fullname[strlen(handler->prefix) + strlen(name) + 1];
>   struct ll_sb_info *sbi = ll_i2sbi(inode);
> #ifdef CONFIG_FS_POSIX_ACL
>   struct ll_inode_info *lli = ll_i2info(inode);
> @@ -411,9 +408,8 @@ static int ll_xattr_get_common(const struct xattr_handler 
> *handler,
>   if (handler->flags == XATTR_ACL_DEFAULT_T && !S_ISDIR(inode->i_mode))
>   return -ENODATA;
> #endif
> - sprintf(fullname, "%s%s\n", handler->prefix, name);
> - return ll_xattr_list(inode, fullname, handler->flags, buffer, size,
> -  OBD_MD_FLXATTR);
> + return ll_xattr_list(inode, xattr_full_name(handler, name),
> +  handler->flags, buffer, size, OBD_MD_FLXATTR);
> }
> 
> static ssize_t ll_getxattr_lov(struct inode *inode, void *buf, size_t 
> buf_size)
> -- 
> 2.11.0
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH] staging: lustre: lnet: return of an error code should be negative

2018-01-29 Thread Dilger, Andreas
On Jan 27, 2018, at 22:24, Sumit Pundir  wrote:
> 
> Return value of error codes should typically be negative.
> Issue reported by checkpatch.pl
> 
> Signed-off-by: Sumit Pundir 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lnet/selftest/framework.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/lustre/lnet/selftest/framework.c 
> b/drivers/staging/lustre/lnet/selftest/framework.c
> index c7697f6..0ca1e3a 100644
> --- a/drivers/staging/lustre/lnet/selftest/framework.c
> +++ b/drivers/staging/lustre/lnet/selftest/framework.c
> @@ -187,7 +187,7 @@ sfw_del_session_timer(void)
>   return 0;
>   }
> 
> - return EBUSY; /* racing with sfw_session_expired() */
> + return -EBUSY; /* racing with sfw_session_expired() */
> }
> 
> static void
> -- 
> 2.7.4
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH] staging: lustre: lnet/selftest: fix compile error on UP build

2018-01-26 Thread Dilger, Andreas
On Jan 22, 2018, at 23:27, NeilBrown  wrote:
> 
> When compiled without CONFIG_SMP, we get a compile error
> as ->ctb_parts is not defined.
> 
> There is already a function, cfs_cpt_cpumask(), which will get the
> cpumask we need, and which handles the UP case by returning a NULL pointer.
> So use that and handle NULL.
> Also avoid the #ifdef by allocating a cpumask_var and copying
> into it, rather than sharing the mask.
> 
> Reported-by: kbuild test robot 
> Fixes: 6106c0f82481 ("staging: lustre: lnet: convert selftest to use 
> workqueues")
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lnet/selftest/module.c | 17 -
> 1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lnet/selftest/module.c 
> b/drivers/staging/lustre/lnet/selftest/module.c
> index 31a74b48c995..7359aa56d9b3 100644
> --- a/drivers/staging/lustre/lnet/selftest/module.c
> +++ b/drivers/staging/lustre/lnet/selftest/module.c
> @@ -110,7 +110,8 @@ lnet_selftest_init(void)
>   lst_init_step = LST_INIT_WI_TEST;
>   for (i = 0; i < nscheds; i++) {
>   int nthrs = cfs_cpt_weight(lnet_cpt_table(), i);
> - struct workqueue_attrs attrs;
> + struct workqueue_attrs attrs = {0};
> + cpumask_var_t *mask = cfs_cpt_cpumask(lnet_cpt_table(), i);
> 
>   /* reserve at least one CPU for LND */
>   nthrs = max(nthrs - 1, 1);
> @@ -121,14 +122,12 @@ lnet_selftest_init(void)
>   rc = -ENOMEM;
>   goto error;
>   }
> - attrs.nice = 0;
> - #ifdef CONFIG_CPUMASK_OFFSTACK
> - attrs.cpumask = lnet_cpt_table()->ctb_parts[i].cpt_cpumask;
> - #else
> - cpumask_copy(attrs.cpumask, 
> lnet_cpt_table()->ctb_parts[i].cpt_cpumask);
> - #endif
> - attrs.no_numa = false;
> - apply_workqueue_attrs(lst_test_wq[i], &attrs);
> +
> + if (mask && alloc_cpumask_var(&attrs.cpumask, GFP_KERNEL)) {
> + cpumask_copy(attrs.cpumask, *mask);
> + apply_workqueue_attrs(lst_test_wq[i], &attrs);
> + free_cpumask_var(attrs.cpumask);
> + }
>   }
> 
>   rc = srpc_startup();
> -- 
> 2.14.0.rc0.dirty
> 
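
The shape of the fix, restated as a standalone sketch: the NULL-on-UP behaviour of cfs_cpt_cpumask() is taken from the patch description above; everything else (the helper name, taking a plain cpumask pointer) is illustrative.

#include <linux/workqueue.h>
#include <linux/cpumask.h>
#include <linux/slab.h>

/*
 * Apply a CPU mask to a workqueue only when a mask exists (SMP) and a
 * private copy can be allocated; on UP builds this quietly does nothing.
 */
static void bind_wq_to_mask(struct workqueue_struct *wq,
			    const struct cpumask *mask)
{
	struct workqueue_attrs attrs = { .nice = 0 };

	if (mask && alloc_cpumask_var(&attrs.cpumask, GFP_KERNEL)) {
		cpumask_copy(attrs.cpumask, mask);
		apply_workqueue_attrs(wq, &attrs);
		free_cpumask_var(attrs.cpumask);
	}
}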

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH v4] staging: lustre: separate a connection destroy from free struct kib_conn

2018-01-25 Thread Dilger, Andreas
On Jan 25, 2018, at 06:51, Eremin, Dmitry  wrote:
> 
> The logic of the original commit 4d99b2581eff ("staging: lustre: avoid
> intensive reconnecting for ko2iblnd") was assumed conditional free of
> struct kib_conn if the second argument free_conn in function
> kiblnd_destroy_conn(struct kib_conn *conn, bool free_conn) is true.
> But this hunk of code was dropped from the original commit. As a result the
> logic is wrong and the current code uses struct kib_conn after it has been freed.
> 
>> drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
>> 3317  kiblnd_destroy_conn(conn, !peer);
>>   Freed always (but should be conditionally)
>> 3318
>> 3319  spin_lock_irqsave(lock, flags);
>> 3320  if (!peer)
>> 3321  continue;
>> 3322
>> 3323  conn->ibc_peer = peer;
>>  ^^ Use after free
>> 3324  if (peer->ibp_reconnected < KIB_RECONN_HIGH_RACE)
>> 3325  list_add_tail(&conn->ibc_list,
>> ^^ Use after free
>> 3326  &kiblnd_data.kib_reconn_list);
>> 3327  else
>> 3328  list_add_tail(&conn->ibc_list,
>> ^^ Use after free
>> 3329  &kiblnd_data.kib_reconn_wait);
> 
> To avoid confusion this fix moved the freeing a struct kib_conn outside of
> the function kiblnd_destroy_conn() and free as it was intended in original
> commit.
> 
> Cc:  # v4.6
> Fixes: 4d99b2581eff ("staging: lustre: avoid intensive reconnecting for 
> ko2iblnd")
> Signed-off-by: Dmitry Eremin 

Reviewed-by: Andreas Dilger 

> ---
> Changes in v4:
>- fixed the issue with use after free by moving the freeing a struct
>  kib_conn outside of the function kiblnd_destroy_conn()
> 
> drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c| 7 +++
> drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h| 2 +-
> drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 --
> 3 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c 
> b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> index 2ebc484385b3..ec84edfda271 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
> @@ -824,14 +824,15 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer 
> *peer, struct rdma_cm_id *cm
>   return conn;
> 
>  failed_2:
> - kiblnd_destroy_conn(conn, true);
> + kiblnd_destroy_conn(conn);
> + kfree(conn);
>  failed_1:
>   kfree(init_qp_attr);
>  failed_0:
>   return NULL;
> }
> 
> -void kiblnd_destroy_conn(struct kib_conn *conn, bool free_conn)
> +void kiblnd_destroy_conn(struct kib_conn *conn)
> {
>   struct rdma_cm_id *cmid = conn->ibc_cmid;
>   struct kib_peer *peer = conn->ibc_peer;
> @@ -889,8 +890,6 @@ void kiblnd_destroy_conn(struct kib_conn *conn, bool 
> free_conn)
>   rdma_destroy_id(cmid);
>   atomic_dec(&net->ibn_nconns);
>   }
> -
> - kfree(conn);
> }
> 
> int kiblnd_close_peer_conns_locked(struct kib_peer *peer, int why)
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h 
> b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> index 171eced213f8..b18911d09e9a 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> @@ -1016,7 +1016,7 @@ int  kiblnd_close_stale_conns_locked(struct kib_peer 
> *peer,
> struct kib_conn *kiblnd_create_conn(struct kib_peer *peer,
>   struct rdma_cm_id *cmid,
>   int state, int version);
> -void kiblnd_destroy_conn(struct kib_conn *conn, bool free_conn);
> +void kiblnd_destroy_conn(struct kib_conn *conn);
> void kiblnd_close_conn(struct kib_conn *conn, int error);
> void kiblnd_close_conn_locked(struct kib_conn *conn, int error);
> 
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 
> b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> index 9b3328c5d1e7..b3e7f28eb978 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> @@ -3314,11 +3314,13 @@ static int kiblnd_resolve_addr(struct rdma_cm_id 
> *cmid,
>   spin_unlock_irqrestore(lock, flags);
>   dropped_lock = 1;
> 
> - kiblnd_destroy_conn(conn, !peer);
> + kiblnd_destroy_conn(conn);
> 
>   spin_lock_irqsave(lock, flags);
> - if (!peer)
> + if (!peer) {
> + kfree(conn);
>   continue;
> + }
> 
>   conn->ibc_peer = peer;
>   if (peer->ibp_reconnected < KIB_RECONN_HIGH_RACE)
> -- 
> 1.8.3.1
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 6/8] staging: lustre: Fix overlong lines

2018-01-18 Thread Dilger, Andreas
On Jan 11, 2018, at 10:17, Fabian Huegel  wrote:
> 
> Fixed four lines that went over the 80 character limit
> to reduce checkpatch warnings.
> 
> Signed-off-by: Fabian Huegel 
> Signed-off-by: Christoph Volkert 
> ---
> drivers/staging/lustre/lustre/include/obd_class.h | 14 ++
> 1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/obd_class.h 
> b/drivers/staging/lustre/lustre/include/obd_class.h
> index d195866..06f825b 100644
> --- a/drivers/staging/lustre/lustre/include/obd_class.h
> +++ b/drivers/staging/lustre/lustre/include/obd_class.h
> @@ -850,7 +850,9 @@ static inline int obd_pool_del(struct obd_device *obd, 
> char *poolname)
>   return rc;
> }
> 
> -static inline int obd_pool_add(struct obd_device *obd, char *poolname, char 
> *ostname)
> +static inline int obd_pool_add(struct obd_device *obd,
> +char *poolname,
> +char *ostname)

This only needs a single field moved onto the next line, like:

+static inline int obd_pool_add(struct obd_device *obd, char *poolname,
+  char *ostname)


> @@ -861,7 +863,9 @@ static inline int obd_pool_add(struct obd_device *obd, 
> char *poolname, char *ost
>   return rc;
> }
> 
> -static inline int obd_pool_rem(struct obd_device *obd, char *poolname, char 
> *ostname)
> +static inline int obd_pool_rem(struct obd_device *obd,
> +char *poolname,
> +char *ostname)

Same.

> @@ -997,7 +1001,8 @@ static inline int obd_statfs(const struct lu_env *env, 
> struct obd_export *exp,
>   spin_unlock(&obd->obd_osfs_lock);
>   }
>   } else {
> - CDEBUG(D_SUPER, "%s: use %p cache blocks %llu/%llu objects 
> %llu/%llu\n",
> + CDEBUG(D_SUPER,
> +"%s: use %p cache blocks %llu/%llu objects %llu/%llu\n",
>  obd->obd_name, &obd->obd_osfs,
>  obd->obd_osfs.os_bavail, obd->obd_osfs.os_blocks,
>  obd->obd_osfs.os_ffree, obd->obd_osfs.os_files);
> @@ -1579,7 +1584,8 @@ int class_procfs_init(void);
> int class_procfs_clean(void);
> 
> /* prng.c */
> -#define ll_generate_random_uuid(uuid_out) get_random_bytes(uuid_out, 
> sizeof(class_uuid_t))
> +#define ll_generate_random_uuid(uuid_out) \
> + get_random_bytes(uuid_out, sizeof(class_uuid_t))

This looks like it would be better to replace ll_generate_random_uuid()
callers with generate_random_uuid().

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH] staging: lustre: Fix avoid intensive reconnecting for ko2iblnd patch

2018-01-16 Thread Dilger, Andreas

> On Jan 16, 2018, at 09:56, Greg Kroah-Hartman  
> wrote:
> 
> On Tue, Jan 16, 2018 at 03:01:49PM +, Eremin, Dmitry wrote:
>> In the original commit 4d99b2581effe115376402e710fbcb1c3c073769
> 
> Please use the documented way to write this:
>   4d99b2581eff ("staging: lustre: avoid intensive reconnecting for 
> ko2iblnd")
> 

>> was missed one hunk. Added it now to avoid issue with use after free.
> 
> And I do not understand this commit message at all.
> 
>> Signed-off-by: Dmitry Eremin 
>> ---
>> drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c 
>> b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
>> index 2ebc484..a15a625 100644
>> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
>> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
>> @@ -890,7 +890,8 @@ void kiblnd_destroy_conn(struct kib_conn *conn, bool 
>> free_conn)
>>  atomic_dec(&net->ibn_nconns);
>>  }
>> 
>> -kfree(conn);
>> +if (free_conn)
>> +kfree(conn);
> 
> This looks really odd, don't you think?

I'm not sure what the objection is here?  There is an argument to this
this function named "free_conn" which determines if the structure should
be freed, or if the network connection is just being torn down and
reconnected.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 1/2] staging: lustre: use strim instead of cfs_trimwhite.

2017-12-19 Thread Dilger, Andreas
On Dec 18, 2017, at 16:01, NeilBrown  wrote:
> 
> Linux lib provides identical functionality to cfs_trimwhite,
> so discard that code and use the standard.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> .../lustre/include/linux/libcfs/libcfs_string.h|1 -
> drivers/staging/lustre/lnet/libcfs/libcfs_string.c |   20 
> .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   |8 
> drivers/staging/lustre/lnet/lnet/config.c  |   10 +-
> drivers/staging/lustre/lnet/lnet/router_proc.c |2 +-
> 5 files changed, 10 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_string.h 
> b/drivers/staging/lustre/include/linux/libcfs/libcfs_string.h
> index c1375733ff31..66463477074a 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_string.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_string.h
> @@ -73,7 +73,6 @@ struct cfs_expr_list {
>   struct list_headel_exprs;
> };
> 
> -char *cfs_trimwhite(char *str);
> int cfs_gettok(struct cfs_lstr *next, char delim, struct cfs_lstr *res);
> int cfs_str2num_check(char *str, int nob, unsigned int *num,
> unsigned int min, unsigned int max);
> diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c 
> b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> index b1d8faa3f7aa..442889a3d729 100644
> --- a/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> +++ b/drivers/staging/lustre/lnet/libcfs/libcfs_string.c
> @@ -137,26 +137,6 @@ char *cfs_firststr(char *str, size_t size)
> }
> EXPORT_SYMBOL(cfs_firststr);
> 
> -char *
> -cfs_trimwhite(char *str)
> -{
> - char *end;
> -
> - while (isspace(*str))
> - str++;
> -
> - end = str + strlen(str);
> - while (end > str) {
> - if (!isspace(end[-1]))
> - break;
> - end--;
> - }
> -
> - *end = 0;
> - return str;
> -}
> -EXPORT_SYMBOL(cfs_trimwhite);
> -
> /**
>  * Extracts tokens from strings.
>  *
> diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c 
> b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> index e9156bf05ed4..d30650f8dcb4 100644
> --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> @@ -818,7 +818,7 @@ cfs_cpt_table_create_pattern(char *pattern)
>   int c;
>   int i;
> 
> - str = cfs_trimwhite(pattern);
> + str = strim(pattern);
>   if (*str == 'n' || *str == 'N') {
>   pattern = str + 1;
>   if (*pattern != '\0') {
> @@ -870,7 +870,7 @@ cfs_cpt_table_create_pattern(char *pattern)
> 
>   high = node ? MAX_NUMNODES - 1 : nr_cpu_ids - 1;
> 
> - for (str = cfs_trimwhite(pattern), c = 0;; c++) {
> + for (str = strim(pattern), c = 0;; c++) {
>   struct cfs_range_expr *range;
>   struct cfs_expr_list *el;
>   char *bracket = strchr(str, '[');
> @@ -905,7 +905,7 @@ cfs_cpt_table_create_pattern(char *pattern)
>   goto failed;
>   }
> 
> - str = cfs_trimwhite(str + n);
> + str = strim(str + n);
>   if (str != bracket) {
>   CERROR("Invalid pattern %s\n", str);
>   goto failed;
> @@ -945,7 +945,7 @@ cfs_cpt_table_create_pattern(char *pattern)
>   goto failed;
>   }
> 
> - str = cfs_trimwhite(bracket + 1);
> + str = strim(bracket + 1);
>   }
> 
>   return cptab;
> diff --git a/drivers/staging/lustre/lnet/lnet/config.c 
> b/drivers/staging/lustre/lnet/lnet/config.c
> index fd53c74766a7..44eeca63f458 100644
> --- a/drivers/staging/lustre/lnet/lnet/config.c
> +++ b/drivers/staging/lustre/lnet/lnet/config.c
> @@ -269,7 +269,7 @@ lnet_parse_networks(struct list_head *nilist, char 
> *networks)
> 
>   if (comma)
>   *comma++ = 0;
> - net = libcfs_str2net(cfs_trimwhite(str));
> + net = libcfs_str2net(strim(str));
> 
>   if (net == LNET_NIDNET(LNET_NID_ANY)) {
>   LCONSOLE_ERROR_MSG(0x113,
> @@ -292,7 +292,7 @@ lnet_parse_networks(struct list_head *nilist, char 
> *networks)
>   }
> 
>   *bracket = 0;
> - net = libcfs_str2net(cfs_trimwhite(str));
> + net = libcfs_str2net(strim(str));
>   if (net == LNET_NIDNET(LNET_NID_ANY)) {
>   tmp = str;
>   goto failed_syntax;
> @@ -322,7 +322,7 @@ lnet_parse_networks(struct list_head *nilist, char 
> *networks)
>   if (comma)
>   *comma++ = 0;
> 
> - iface = cfs_trimwhite(iface);
> + iface = strim(iface);
>   if (!*iface) {
> 

Re: [PATCH 2/2] staging: lustre: disable preempt while sampling processor id.

2017-12-19 Thread Dilger, Andreas
On Dec 18, 2017, at 16:01, NeilBrown  wrote:
> 
> Calling smp_processor_id() without disabling preemption
> triggers a warning (if CONFIG_DEBUG_PREEMPT).
> I think the result of cfs_cpt_current() is only used as a hint for
> load balancing, rather than as a precise and stable indicator of
> the current CPU.  So it doesn't need to be called with
> preemption disabled.
> 
> So disable preemption inside cfs_cpt_current() to silence the warning.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   |   13 +++--
> 1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c 
> b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> index d30650f8dcb4..ca8518b8a3e0 100644
> --- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> +++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
> @@ -517,19 +517,20 @@ EXPORT_SYMBOL(cfs_cpt_spread_node);
> int
> cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
> {
> - int cpu = smp_processor_id();
> - int cpt = cptab->ctb_cpu2cpt[cpu];
> + int cpu;
> + int cpt;
> 
> - if (cpt < 0) {
> - if (!remap)
> - return cpt;
> + preempt_disable();
> + cpu = smp_processor_id();
> + cpt = cptab->ctb_cpu2cpt[cpu];
> 
> + if (cpt < 0 && remap) {
>   /* don't return negative value for safety of upper layer,
>* instead we shadow the unknown cpu to a valid partition ID
>*/
>   cpt = cpu % cptab->ctb_nparts;
>   }
> -
> + preempt_enable();
>   return cpt;
> }
> EXPORT_SYMBOL(cfs_cpt_current);
> 
> 
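
As an aside, not part of this patch: when the value really is only a
load-balancing hint, another common idiom is raw_smp_processor_id(), which
skips the CONFIG_DEBUG_PREEMPT check without touching the preempt count.
A sketch under that assumption (the helper name is made up, the fields come
from the patch above):

	#include <linux/smp.h>

	static int cpt_current_hint(struct cfs_cpt_table *cptab)
	{
		int cpu = raw_smp_processor_id();  /* no DEBUG_PREEMPT check */
		int cpt = cptab->ctb_cpu2cpt[cpu];

		if (cpt < 0)
			cpt = cpu % cptab->ctb_nparts;
		return cpt;
	}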

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH 02/16] staging: lustre: replace simple cases of l_wait_event() with wait_event().

2017-12-19 Thread Dilger, Andreas
On Dec 18, 2017, at 11:03, Patrick Farrell  wrote:
> 
> The wait calls in ll_statahead_thread are done in a service thread, and
> should probably *not* contribute to load.
> 
> The one in osc_extent_wait is perhaps tough - It is called both from user
> threads & daemon threads depending on the situation.  The effect of adding
> that to load average could be significant for some activities, even when
> no user threads are busy.  Thoughts from other Lustre people would be
> welcome here.

The main reasons we started using l_wait_event() were:
- it is used by the request handling threads, and wait_event() caused the
  load average to always be == number of service threads, which was
  wrong if those threads were idle waiting for requests to arrive.
  That is mostly a server problem, but a couple of request handlers are
  on the client also (DLM lock cancellation threads, etc.) that shouldn't
  contribute to load.  It looks like there is a better solution for this
  today with TASK_IDLE.
- we want the userspace threads to be interruptible if the server is not
  functional, but the client should at least get a chance to complete the
  RPC if the server is just busy.  Since Lustre needs to work in systems
  with 10,000+ clients pounding a server, the server response time is not
  necessarily fast.  The l_wait_event() behavior is that it blocks signals
  until the RPC timeout, which will normally succeed, but after the timeout
  the signals are unblocked and the user thread can be interrupted if the
  user wants, but it will keep waiting for the RPC to finish if not.  This
  is half-way between NFS soft and hard mounts.  I don't think there is an
  equivalent wait_event_* for that behaviour.
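
To make the TASK_IDLE remark concrete, a minimal sketch using only the
standard wait-queue API (nothing here is taken from a Lustre patch; "waitq"
and "cond" are placeholders):

	#include <linux/sched.h>
	#include <linux/wait.h>

	/* wait without contributing to the load average:
	 * TASK_IDLE == TASK_UNINTERRUPTIBLE | TASK_NOLOAD */
	static void wait_idle_for(wait_queue_head_t *waitq, bool (*cond)(void))
	{
		DEFINE_WAIT(wait);

		for (;;) {
			prepare_to_wait(waitq, &wait, TASK_IDLE);
			if (cond())
				break;
			schedule();
		}
		finish_wait(waitq, &wait);
	}

The second point can be approximated with standard primitives too: wait
uninterruptibly until the RPC deadline and only then let signals through.
Again a sketch; "req_waitq" (a wait_queue_head_t), request_done() and
rpc_timeout (in jiffies) are placeholders:

	if (!wait_event_timeout(req_waitq, request_done(req), rpc_timeout))
		rc = wait_event_interruptible(req_waitq, request_done(req));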

Cheers, Andreas

> Similar issues for osc_object_invalidate.
> 
> (If no one else speaks up, my vote is no contribution to load for those
> OSC waits.)
> 
> Otherwise this one looks good...
> 
> On 12/18/17, 1:17 AM, "lustre-devel on behalf of NeilBrown"
>  wrote:
> 
>> @@ -968,7 +964,6 @@ static int ll_statahead_thread(void *arg)
>>  intfirst  = 0;
>>  intrc = 0;
>>  struct md_op_data *op_data;
>> -struct l_wait_info  lwi= { 0 };
>>  sai = ll_sai_get(dir);
>>  sa_thread = >sai_thread;
>> @@ -1069,12 +1064,11 @@ static int ll_statahead_thread(void *arg)
>>  /* wait for spare statahead window */
>>  do {
>> -l_wait_event(sa_thread->t_ctl_waitq,
>> - !sa_sent_full(sai) ||
>> - sa_has_callback(sai) ||
>> - !list_empty(>sai_agls) ||
>> - !thread_is_running(sa_thread),
>> - );
>> +wait_event(sa_thread->t_ctl_waitq,
>> +   !sa_sent_full(sai) ||
>> +   sa_has_callback(sai) ||
>> +   !list_empty(>sai_agls) ||
>> +   !thread_is_running(sa_thread));
>>  sa_handle_callback(sai);
>>  spin_lock(>lli_agl_lock);
>> @@ -1128,11 +1122,10 @@ static int ll_statahead_thread(void *arg)
>>   * for file release to stop me.
>>   */
>>  while (thread_is_running(sa_thread)) {
>> -l_wait_event(sa_thread->t_ctl_waitq,
>> - sa_has_callback(sai) ||
>> - !agl_list_empty(sai) ||
>> - !thread_is_running(sa_thread),
>> - );
>> +wait_event(sa_thread->t_ctl_waitq,
>> +   sa_has_callback(sai) ||
>> +   !agl_list_empty(sai) ||
>> +   !thread_is_running(sa_thread));
>>  sa_handle_callback(sai);
>>  }
>> @@ -1145,9 +1138,8 @@ static int ll_statahead_thread(void *arg)
>>  CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n",
>> sai, (unsigned int)agl_thread->t_pid);
>> -l_wait_event(agl_thread->t_ctl_waitq,
>> - thread_is_stopped(agl_thread),
>> - );
>> +wait_event(agl_thread->t_ctl_waitq,
>> +   thread_is_stopped(agl_thread));
>>  } else {
>>  /* Set agl_thread flags anyway. */
>>  thread_set_flags(agl_thread, SVC_STOPPED);
>> @@ -1159,8 +1151,8 @@ static int ll_statahead_thread(void *arg)
>>   */
>>  while (sai->sai_sent != sai->sai_replied) {
>>  /* in case we're not woken up, timeout wait */
>> -lwi = LWI_TIMEOUT(msecs_to_jiffies(MSEC_PER_SEC >> 3),
>> -  NULL, NULL);
>> +struct l_wait_info lwi = 
>> LWI_TIMEOUT(msecs_to_jiffies(MSEC_PER_SEC >>
>> 3),
>> +   

Re: [lustre-devel] [PATCH SERIES 4: 0/4] staging: lustre: use standard prng

2017-12-18 Thread Dilger, Andreas
On Dec 17, 2017, at 18:41, NeilBrown  wrote:
> 
> Lustre has its own internal PRNG code.
> This adds nothing of value to the Linux standard prng code,
> so switch over to using the standard interfaces.
> This adds a few callers to add_device_randomness(), which
> helps everyone, and removes unnecessary code.

Neil,
Thanks for the patches.  I'll run them through our testing system, but
they look good at first glance.

An interesting anecdote as this code is removed...  When it was first
added, we were running Lustre on a single-threaded runtime environment
without any local storage, interrupts, local clock, or h/w RNG (Catamount,
on the ASCI Red Storm https://en.wikipedia.org/wiki/Red_Storm_(computing)
supercomputer) and since there were thousands of nodes booting up and
mounting Lustre, there were often some with identical random number
states/seeds after boot, so we had to fold in the only unique state that
we had on each node - the network address.

That system is long gone, and it is good to clean up this code in a
more portable manner.
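
For readers following the series, the substitutions map roughly as follows.
This is a sketch of the pattern with placeholder names ("nid", "nr_targets",
"buf", "len"), not lines lifted from the patches:

	#include <linux/random.h>
	#include <linux/types.h>

	static unsigned int prng_replacements(__u64 nid, unsigned int nr_targets,
					      void *buf, int len)
	{
		/* cfs_srand(seed1, seed2): feed local uniqueness (e.g. the
		 * NID) into the shared entropy pool, not a private seed */
		add_device_randomness(&nid, sizeof(nid));

		/* cfs_get_random_bytes(buf, len) becomes get_random_bytes() */
		get_random_bytes(buf, len);

		/* cfs_rand() % nr_targets becomes prandom_u32_max(nr_targets) */
		return prandom_u32_max(nr_targets);
	}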

Cheers, Andreas


> ---
> 
> NeilBrown (4):
>  staging: lustre: replace cfs_rand() with prandom_u32_max()
>  staging: lustre: replace cfs_srand() calls with add_device_randomness().
>  staging: lustre: replace cfs_get_random_bytes calls with 
> get_random_byte()
>  staging: lustre: libcfs: remove prng
> 
> 
> .../staging/lustre/include/linux/libcfs/libcfs.h   |   10 -
> drivers/staging/lustre/lnet/libcfs/Makefile|2 
> drivers/staging/lustre/lnet/libcfs/fail.c  |2 
> drivers/staging/lustre/lnet/libcfs/prng.c  |  137 
> drivers/staging/lustre/lnet/lnet/net_fault.c   |   38 +++---
> drivers/staging/lustre/lnet/lnet/router.c  |   19 +--
> drivers/staging/lustre/lustre/include/obd_class.h  |2 
> drivers/staging/lustre/lustre/llite/super25.c  |   17 +-
> drivers/staging/lustre/lustre/mgc/mgc_request.c|4 -
> .../lustre/lustre/obdclass/lustre_handles.c|9 -
> drivers/staging/lustre/lustre/ptlrpc/client.c  |2 
> 11 files changed, 42 insertions(+), 200 deletions(-)
> delete mode 100644 drivers/staging/lustre/lnet/libcfs/prng.c
> 
> --
> Signature
> 
> ___
> lustre-devel mailing list
> lustre-de...@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH] staging: lustre: Fix sparse, using plain integer as NULL pointer in lov_object_fiemap()

2017-12-04 Thread Dilger, Andreas

> On Nov 30, 2017, at 11:30, Andrii Vladyka  wrote:
> 
> Change 0 to NULL in lov_object_fiemap() in order to fix warning produced by 
> sparse
> 
> Signed-off-by: Andrii Vladyka 

Patches should be inline rather than in an attachment.

That said, the patch looks correct, so you can add:

Signed-off-by: Andreas Dilger 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [lustre-devel] [PATCH 9/9] staging: lustre: ldlm: remove unused field 'fwd_generation'

2017-10-27 Thread Dilger, Andreas
On Oct 22, 2017, at 18:53, NeilBrown  wrote:
> 
> With this field gone, we don't need local variables 'imp' or 'obd'
> any more.
> 
> Signed-off-by: NeilBrown 

Thanks for the patches.

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ldlm/ldlm_flock.c |   21 +++--
> 1 file changed, 3 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c 
> b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> index 774d8667769a..9c0e9cd0 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> @@ -311,7 +311,6 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
> 
> struct ldlm_flock_wait_data {
>   struct ldlm_lock *fwd_lock;
> - intfwd_generation;
> };
> 
> static void
> @@ -342,11 +341,9 @@ int
> ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 flags, void *data)
> {
>   struct file_lock*getlk = lock->l_ast_data;
> - struct obd_device *obd;
> - struct obd_import *imp = NULL;
> - struct ldlm_flock_wait_data fwd;
> - struct l_wait_infolwi;
> - int  rc = 0;
> + struct ldlm_flock_wait_data fwd;
> + struct l_wait_info  lwi;
> + int rc = 0;
> 
>   OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_CP_CB_WAIT2, 4);
>   if (OBD_FAIL_PRECHECK(OBD_FAIL_LDLM_CP_CB_WAIT3)) {
> @@ -374,18 +371,6 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 
> flags, void *data)
> 
>   LDLM_DEBUG(lock, "client-side enqueue returned a blocked lock, 
> sleeping");
>   fwd.fwd_lock = lock;
> - obd = class_exp2obd(lock->l_conn_export);
> -
> - /* if this is a local lock, there is no import */
> - if (obd)
> - imp = obd->u.cli.cl_import;
> -
> - if (imp) {
> - spin_lock(>imp_lock);
> - fwd.fwd_generation = imp->imp_generation;
> - spin_unlock(>imp_lock);
> - }
> -
>   lwi = LWI_TIMEOUT_INTR(0, NULL, ldlm_flock_interrupted_wait, );
> 
>   /* Go to sleep until the lock is granted. */
> 
> 
> ___
> lustre-devel mailing list
> lustre-de...@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 8/9] staging: lustre: ldlm: remove unnecessary 'ownlocks' variable.

2017-10-27 Thread Dilger, Andreas
On Oct 22, 2017, at 18:53, NeilBrown  wrote:
> 
> Now that the code has been simplified, 'ownlocks' is not
> necessary.
> 
> The loop which sets it exits with 'lock' having the same value as
> 'ownlocks', or pointing to the head of the list if ownlocks is NULL.
> 
> The current code then tests ownlocks and sets 'lock' to exactly the
> value that it currently has.
> 
> So discard 'ownlocks'.
> 
> Also remove unnecessary initialization of 'lock'.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ldlm/ldlm_flock.c |   15 +++
> 1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c 
> b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> index 0bf6dce1c5b1..774d8667769a 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> @@ -115,8 +115,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
>   struct ldlm_resource *res = req->l_resource;
>   struct ldlm_namespace *ns = ldlm_res_to_ns(res);
>   struct ldlm_lock *tmp;
> - struct ldlm_lock *ownlocks = NULL;
> - struct ldlm_lock *lock = NULL;
> + struct ldlm_lock *lock;
>   struct ldlm_lock *new = req;
>   struct ldlm_lock *new2 = NULL;
>   enum ldlm_mode mode = req->l_req_mode;
> @@ -140,22 +139,14 @@ static int ldlm_process_flock_lock(struct ldlm_lock 
> *req)
>   /* This loop determines where this processes locks start
>* in the resource lr_granted list.
>*/
> - list_for_each_entry(lock, >lr_granted, l_res_link) {
> - if (ldlm_same_flock_owner(lock, req)) {
> - ownlocks = lock;
> + list_for_each_entry(lock, >lr_granted, l_res_link)
> + if (ldlm_same_flock_owner(lock, req))
>   break;
> - }
> - }
> 
>   /* Scan the locks owned by this process to find the insertion point
>* (as locks are ordered), and to handle overlaps.
>* We may have to merge or split existing locks.
>*/
> - if (ownlocks)
> - lock = ownlocks;
> - else
> - lock = list_entry(>lr_granted,
> -   struct ldlm_lock, l_res_link);
>   list_for_each_entry_safe_from(lock, tmp, >lr_granted, l_res_link) {
> 
>   if (!ldlm_same_flock_owner(lock, new))
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 7/9] staging: lustre: ldlm: tidy list walking in ldlm_flock()

2017-10-27 Thread Dilger, Andreas
On Oct 22, 2017, at 18:53, NeilBrown  wrote:
> 
> Use list_for_each_entry variants to
> avoid the explicit list_entry() calls.
> This allows us to use list_for_each_entry_safe_from()
> instead of adding a local list-walking macro.
> 
> Also improve some comments so that it is more obvious
> that the locks are sorted per-owner and that we need
> to find the insertion point.
> 
> Signed-off-by: NeilBrown 

The conversion looks a bit tricky, but appears to be correct.
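
For anyone unfamiliar with the _from variant: iteration starts at the current
value of the cursor rather than at the list head, which is what makes the
conversion work.  A generic sketch with placeholder names (struct foo, "done",
prune_from), not Lustre code:

	#include <linux/list.h>
	#include <linux/slab.h>

	struct foo {
		struct list_head list;
		bool done;
	};

	/* "pos" must already point either at a real entry or at the head
	 * cast to the entry type; in the latter case the body never runs.
	 * "n" holds the next entry, so "pos" may be freed inside the loop. */
	static void prune_from(struct foo *pos, struct list_head *head)
	{
		struct foo *n;

		list_for_each_entry_safe_from(pos, n, head, list) {
			if (pos->done)
				break;
			list_del(&pos->list);
			kfree(pos);
		}
	}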

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ldlm/ldlm_flock.c |   45 ++-
> 1 file changed, 19 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c 
> b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> index 1bf56892fcf5..0bf6dce1c5b1 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> @@ -59,17 +59,6 @@
> #include 
> #include "ldlm_internal.h"
> 
> -/**
> - * list_for_remaining_safe - iterate over the remaining entries in a list
> - * and safeguard against removal of a list entry.
> - * \param pos   the  list_head to use as a loop counter. pos MUST
> - * have been initialized prior to using it in this macro.
> - * \param n another  list_head to use as temporary storage
> - * \param head  the head for your list.
> - */
> -#define list_for_remaining_safe(pos, n, head) \
> - for (n = pos->next; pos != (head); pos = n, n = pos->next)
> -
> static inline int
> ldlm_same_flock_owner(struct ldlm_lock *lock, struct ldlm_lock *new)
> {
> @@ -125,8 +114,8 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
> {
>   struct ldlm_resource *res = req->l_resource;
>   struct ldlm_namespace *ns = ldlm_res_to_ns(res);
> - struct list_head *tmp;
> - struct list_head *ownlocks = NULL;
> + struct ldlm_lock *tmp;
> + struct ldlm_lock *ownlocks = NULL;
>   struct ldlm_lock *lock = NULL;
>   struct ldlm_lock *new = req;
>   struct ldlm_lock *new2 = NULL;
> @@ -151,23 +140,23 @@ static int ldlm_process_flock_lock(struct ldlm_lock 
> *req)
>   /* This loop determines where this processes locks start
>* in the resource lr_granted list.
>*/
> - list_for_each(tmp, >lr_granted) {
> - lock = list_entry(tmp, struct ldlm_lock,
> -   l_res_link);
> + list_for_each_entry(lock, >lr_granted, l_res_link) {
>   if (ldlm_same_flock_owner(lock, req)) {
> - ownlocks = tmp;
> + ownlocks = lock;
>   break;
>   }
>   }
> 
> - /* Scan the locks owned by this process that overlap this request.
> + /* Scan the locks owned by this process to find the insertion point
> +  * (as locks are ordered), and to handle overlaps.
>* We may have to merge or split existing locks.
>*/
> - if (!ownlocks)
> - ownlocks = >lr_granted;
> -
> - list_for_remaining_safe(ownlocks, tmp, >lr_granted) {
> - lock = list_entry(ownlocks, struct ldlm_lock, l_res_link);
> + if (ownlocks)
> + lock = ownlocks;
> + else
> + lock = list_entry(>lr_granted,
> +   struct ldlm_lock, l_res_link);
> + list_for_each_entry_safe_from(lock, tmp, >lr_granted, l_res_link) {
> 
>   if (!ldlm_same_flock_owner(lock, new))
>   break;
> @@ -295,7 +284,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
>lock->l_granted_mode);
> 
>   /* insert new2 at lock */
> - ldlm_resource_add_lock(res, ownlocks, new2);
> + ldlm_resource_add_lock(res, >l_res_link, new2);
>   LDLM_LOCK_RELEASE(new2);
>   break;
>   }
> @@ -309,8 +298,12 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
> 
>   if (!added) {
>   list_del_init(>l_res_link);
> - /* insert new lock before ownlocks in list. */
> - ldlm_resource_add_lock(res, ownlocks, req);
> + /* insert new lock before "lock", which might be the
> +  * next lock for this owner, or might be the first
> +  * lock for the next owner, or might not be a lock at
> +  * all, but instead points at the head of the list
> +  */
> + ldlm_resource_add_lock(res, >l_res_link, req);
>   }
> 
>   /* In case we're reprocessing the requested lock we can't destroy
> 
> 

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation









Re: [PATCH 6/9] staging: lustre: ldlm: remove 'flags' arg from ldlm_flock_destroy()

2017-10-27 Thread Dilger, Andreas
On Oct 22, 2017, at 18:53, NeilBrown  wrote:
> 
> The only value ever passed is LDLM_FL_WAIT_NOREPROC, so assume that
> instead of passing it.
> 
> Signed-off-by: NeilBrown 

Reviewed-by: Andreas Dilger 

> ---
> drivers/staging/lustre/lustre/ldlm/ldlm_flock.c |   36 ++-
> 1 file changed, 16 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c 
> b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> index d5a5742a1171..1bf56892fcf5 100644
> --- a/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_flock.c
> @@ -88,24 +88,23 @@ ldlm_flocks_overlap(struct ldlm_lock *lock, struct 
> ldlm_lock *new)
> }
> 
> static inline void
> -ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode, __u64 flags)
> +ldlm_flock_destroy(struct ldlm_lock *lock, enum ldlm_mode mode)
> {
> - LDLM_DEBUG(lock, "%s(mode: %d, flags: 0x%llx)",
> -__func__, mode, flags);
> + LDLM_DEBUG(lock, "%s(mode: %d)",
> +__func__, mode);
> 
>   /* Safe to not lock here, since it should be empty anyway */
>   LASSERT(hlist_unhashed(>l_exp_flock_hash));
> 
>   list_del_init(>l_res_link);
> - if (flags == LDLM_FL_WAIT_NOREPROC) {
> - /* client side - set a flag to prevent sending a CANCEL */
> - lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_CBPENDING;
> 
> - /* when reaching here, it is under lock_res_and_lock(). Thus,
> -  * need call the nolock version of ldlm_lock_decref_internal
> -  */
> - ldlm_lock_decref_internal_nolock(lock, mode);
> - }
> + /* client side - set a flag to prevent sending a CANCEL */
> + lock->l_flags |= LDLM_FL_LOCAL_ONLY | LDLM_FL_CBPENDING;
> +
> + /* when reaching here, it is under lock_res_and_lock(). Thus,
> +  * need call the nolock version of ldlm_lock_decref_internal
> +  */
> + ldlm_lock_decref_internal_nolock(lock, mode);
> 
>   ldlm_lock_destroy_nolock(lock);
> }
> @@ -208,8 +207,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
>   }
> 
>   if (added) {
> - ldlm_flock_destroy(lock, mode,
> -LDLM_FL_WAIT_NOREPROC);
> + ldlm_flock_destroy(lock, mode);
>   } else {
>   new = lock;
>   added = 1;
> @@ -233,8 +231,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
>   new->l_policy_data.l_flock.end + 1;
>   break;
>   }
> - ldlm_flock_destroy(lock, lock->l_req_mode,
> -LDLM_FL_WAIT_NOREPROC);
> + ldlm_flock_destroy(lock, lock->l_req_mode);
>   continue;
>   }
>   if (new->l_policy_data.l_flock.end >=
> @@ -265,8 +262,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
>   NULL, 0, LVB_T_NONE);
>   lock_res_and_lock(req);
>   if (IS_ERR(new2)) {
> - ldlm_flock_destroy(req, lock->l_granted_mode,
> -LDLM_FL_WAIT_NOREPROC);
> + ldlm_flock_destroy(req, lock->l_granted_mode);
>   return LDLM_ITER_STOP;
>   }
>   goto reprocess;
> @@ -323,7 +319,7 @@ static int ldlm_process_flock_lock(struct ldlm_lock *req)
>* could be freed before the completion AST can be sent.
>*/
>   if (added)
> - ldlm_flock_destroy(req, mode, LDLM_FL_WAIT_NOREPROC);
> + ldlm_flock_destroy(req, mode);
> 
>   ldlm_resource_dump(D_INFO, res);
>   return LDLM_ITER_CONTINUE;
> @@ -477,7 +473,7 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 
> flags, void *data)
>   LDLM_DEBUG(lock, "client-side enqueue deadlock 
> received");
>   rc = -EDEADLK;
>   }
> - ldlm_flock_destroy(lock, mode, LDLM_FL_WAIT_NOREPROC);
> + ldlm_flock_destroy(lock, mode);
>   unlock_res_and_lock(lock);
> 
>   /* Need to wake up the waiter if we were evicted */
> @@ -498,7 +494,7 @@ ldlm_flock_completion_ast(struct ldlm_lock *lock, __u64 
> flags, void *data)
>* in the lock changes we can decref the appropriate refcount.
>*/
>   LASSERT(ldlm_is_test_lock(lock));
> - ldlm_flock_destroy(lock, getlk->fl_type, LDLM_FL_WAIT_NOREPROC);
> + ldlm_flock_destroy(lock, getlk->fl_type);
>   switch (lock->l_granted_mode) {
>   case LCK_PR:
>   
