Re: [PATCH 01/11] staging: lustre: simplify use of interval-tree.

2018-07-05 Thread James Simmons


> >> Lustre has a private interval-tree implementation.  This
> >> implementation (inexplicably) refuses to insert an interval if an
> >> identical interval already exists.  It is OK with all sorts of
> >> overlapping intervals, but identical intervals are rejected.
> >
> > I talked to Oleg about this since this changes the behavior. He is worried
> > about having identical items that would end up being merged.  If we can 
> > guarantee by some other means there are no identical nodes, we are 
> > probably fine with the interval tree code allowing this. Oleg can explain
> > this better than I can.
> 
> I don't think this is a change in behaviour.
> In the driver/staging client code, interval tree is being used in two
> places and both of them have clumsy work-arounds for the fact that they
> cannot insert duplicates in the interval tree.
> The patch just cleans this up.
> 
> However if I have missed something, please provide details.
> What "identical items" might get merged?

Oleg, could you fill in the details of what your concerns are?
 
> >  
> >> Both users of interval-tree in lustre would be simpler if this was not
> >> the case.  They need to store all intervals, even if some are
> >> identical.
> >> 
> >> llite/range_lock.c adds a rl_next_lock list_head to each lock.
> >> If it cannot insert a new lock because the range is in use, it
> >> attaches the new lock to the existing lock using rl_next_lock.
> >> This requires extra code to iterate over the rl_next_lock lists when
> >> iterating over locks, and to update the list when deleting a lock from
> >> the tree.
> >> 
> >> ldlm_extent allocates a separate ldlm_interval which has a list of
> >> ldlm_locks which share the same interval.  This is linked together
> >> by over-loading the l_sl_policy which, for non-extent locks, is used
> >> for linking together locks with the same policy.
> >> This requires not only extra code, but also an extra memory
> >> allocation.
> >> 
> >> This patch removes all that complexity.
> >> - interval_insert() now never fails.
> >
> > It's not really a failure. What it does is, if it finds an already existing
> > node with the requested range, it returns the existing node pointer.
> > If not, it just creates a new node and returns NULL. Sometimes
> > identical requests can happen. A good example of this is with HSM requests
> > on the MDS server. In that case we sometimes get identical progress
> > reports, which we want to filter out so as not to add the same data.
> 
> This example is server-side code which is not a focus at present.
> Having a quick look, it looks like it would be easy enough to do a
> lookup first and then only insert if the lookup failed.
> I think this is a much nicer approach than never allowing duplicates in
> the interval tree.
> 
> Thanks,
> NeilBrown
> 
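
For reference, a minimal sketch of the lookup-then-insert approach
suggested above, against the new never-failing interval_insert().
interval_lookup() is a hypothetical exact-match helper standing in for
whatever search the server code would use; it is not an existing
interval-tree function:

	/* Sketch only: filter identical HSM progress reports before insert. */
	static int hsm_progress_record(struct interval_node *node,
				       struct interval_node **root)
	{
		if (interval_lookup(node, root))	/* hypothetical helper */
			return -EEXIST;	/* identical interval already stored */

		interval_insert(node, root);	/* never fails with the new API */
		return 0;
	}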


Re: [PATCH 01/11] staging: lustre: simplify use of interval-tree.

2018-06-15 Thread James Simmons


> Lustre has a private interval-tree implementation.  This
> implementation (inexplicably) refuses to insert an interval if an
> identical interval already exists.  It is OK with all sorts of
> overlapping intervals, but identical intervals are rejected.

I talked to Oleg about this since this changes the behavior. He is worried
about having identical items that would end up being merged.  If we can 
guarantee by some other means there are no identical nodes, we are 
probably fine with the interval tree code allowing this. Oleg can explain
this better than I can.
 
> Both users of interval-tree in lustre would be simpler if this was not
> the case.  They need to store all intervals, even if some are
> identical.
> 
> llite/range_lock.c adds a rl_next_lock list_head to each lock.
> If it cannot insert a new lock because the range is in use, it
> attaches the new lock to the existing lock using rl_next_lock.
> This requires extra code to iterate over the rl_next_lock lists when
> iterating over locks, and to update the list when deleting a lock from
> the tree.
> 
> ldlm_extent allocates a separate ldlm_interval which has a list of
> ldlm_locks which share the same interval.  This is linked together
> by over-loading the l_sl_policy which, for non-extent locks, is used
> for linking together locks with the same policy.
> This requires not only extra code, but also an extra memory
> allocation.
> 
> This patch removes all that complexity.
> - interval_insert() now never fails.

It's not really a failure. What it does is, if it finds an already existing
node with the requested range, it returns the existing node pointer.
If not, it just creates a new node and returns NULL. Sometimes
identical requests can happen. A good example of this is with HSM requests
on the MDS server. In that case we sometimes get identical progress
reports, which we want to filter out so as not to add the same data.

> - consequently rl_next_lock is always empty and
>   rl_lock_count is always zero, so they are removed
> - every ldlm_lock is linked directly into the
>   interval tree, so each has an embedded interval_node
>   rather than a pointer to a 'struct ldlm_interval'
> - ldlm_interval is now unused, so it is gone, as is
>   the kmemcache from which they were allocated.
> - the various functions for allocating an ldlm_interval
>   and attaching to a lock or detaching from a lock
>   are also gone.
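
For reference, a rough before/after sketch of the range_lock insertion
summarized above (the names rl_node, rl_next_lock, rl_lock_count and
rlt_root follow the description and are illustrative rather than copied
from the driver):

	/* Before: an exact-duplicate range made interval_insert() return
	 * the existing node, so the new lock was chained off it instead. */
	node = interval_insert(&lock->rl_node, &tree->rlt_root);
	if (node) {
		struct range_lock *overlap =
			container_of(node, struct range_lock, rl_node);

		list_add(&lock->rl_next_lock, &overlap->rl_next_lock);
		overlap->rl_lock_count++;
	}

	/* After: interval_insert() never fails, so every lock goes straight
	 * into the tree and rl_next_lock / rl_lock_count disappear. */
	interval_insert(&lock->rl_node, &tree->rlt_root);
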
> 
> Signed-off-by: NeilBrown 
> ---
>  .../staging/lustre/lustre/include/interval_tree.h  |4 +
>  drivers/staging/lustre/lustre/include/lustre_dlm.h |   12 ---
>  drivers/staging/lustre/lustre/ldlm/interval_tree.c |   13 +--
>  drivers/staging/lustre/lustre/ldlm/ldlm_extent.c   |   76 
> ++--
>  drivers/staging/lustre/lustre/ldlm/ldlm_internal.h |   17 
>  drivers/staging/lustre/lustre/ldlm/ldlm_lock.c |   25 +--
>  drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c|9 --
>  drivers/staging/lustre/lustre/llite/range_lock.c   |   59 +---
>  drivers/staging/lustre/lustre/llite/range_lock.h   |8 --
>  9 files changed, 17 insertions(+), 206 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/include/interval_tree.h 
> b/drivers/staging/lustre/lustre/include/interval_tree.h
> index 7d119c1a0469..bcda74fc7875 100644
> --- a/drivers/staging/lustre/lustre/include/interval_tree.h
> +++ b/drivers/staging/lustre/lustre/include/interval_tree.h
> @@ -100,8 +100,8 @@ static inline int interval_set(struct interval_node *node,
>  typedef enum interval_iter (*interval_callback_t)(struct interval_node *node,
> void *args);
>  
> -struct interval_node *interval_insert(struct interval_node *node,
> -   struct interval_node **root);
> +void interval_insert(struct interval_node *node,
> +  struct interval_node **root);
>  void interval_erase(struct interval_node *node, struct interval_node **root);
>  
>  /*
> diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h 
> b/drivers/staging/lustre/lustre/include/lustre_dlm.h
> index 2c55241258cc..baeb8c63352b 100644
> --- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
> +++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
> @@ -513,16 +513,6 @@ struct ldlm_glimpse_work {
>  /** The ldlm_glimpse_work is allocated on the stack and should not be freed. 
> */
>  #define LDLM_GL_WORK_NOFREE 0x1
>  
> -/** Interval node data for each LDLM_EXTENT lock. */
> -struct ldlm_interval {
> - struct interval_nodeli_node;  /* node for tree management */
> - struct list_headli_group; /* the locks which have the same
> -* policy - group of the policy
> -*/
> -};
> -
> -#define to_ldlm_interval(n) container_of(n, struct ldlm_interval, li_node)
> -
>  /**
>   * Interval tree for extent locks.
>   * The interval tree must be accessed under the resource 

Re: [PATCH 08/11] staging: lustre: obdclass: move linux/linux-foo.c to foo.c

2018-06-13 Thread James Simmons


> As lustre is now linux-only, having this linux sub-directory
> with files named "linux-something" is just noise.  Move them
> to more friendly names.

Reviewed-by: James Simmons 
 
> Signed-off-by: NeilBrown 
> ---
>  drivers/staging/lustre/lustre/obdclass/Makefile|2 
>  .../lustre/lustre/obdclass/linux/linux-module.c|  514 
> 
>  .../lustre/lustre/obdclass/linux/linux-sysctl.c|  162 --
>  drivers/staging/lustre/lustre/obdclass/module.c|  514 
> 
>  drivers/staging/lustre/lustre/obdclass/sysctl.c|  162 ++
>  5 files changed, 677 insertions(+), 677 deletions(-)
>  delete mode 100644 
> drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
>  delete mode 100644 
> drivers/staging/lustre/lustre/obdclass/linux/linux-sysctl.c
>  create mode 100644 drivers/staging/lustre/lustre/obdclass/module.c
>  create mode 100644 drivers/staging/lustre/lustre/obdclass/sysctl.c
> 
> diff --git a/drivers/staging/lustre/lustre/obdclass/Makefile 
> b/drivers/staging/lustre/lustre/obdclass/Makefile
> index e3fa9acff4c4..e36ba2167d10 100644
> --- a/drivers/staging/lustre/lustre/obdclass/Makefile
> +++ b/drivers/staging/lustre/lustre/obdclass/Makefile
> @@ -4,7 +4,7 @@ subdir-ccflags-y += 
> -I$(srctree)/drivers/staging/lustre/lustre/include
>  
>  obj-$(CONFIG_LUSTRE_FS) += obdclass.o
>  
> -obdclass-y := linux/linux-module.o linux/linux-sysctl.o \
> +obdclass-y := module.o sysctl.o \
> llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o debug.o \
> genops.o uuid.o lprocfs_status.o lprocfs_counters.o \
> lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \
> diff --git a/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c 
> b/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
> deleted file mode 100644
> index 9c800580053b..
> --- a/drivers/staging/lustre/lustre/obdclass/linux/linux-module.c
> +++ /dev/null
> @@ -1,514 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * GPL HEADER START
> - *
> - * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 only,
> - * as published by the Free Software Foundation.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> - * General Public License version 2 for more details (a copy is included
> - * in the LICENSE file that accompanied this code).
> - *
> - * You should have received a copy of the GNU General Public License
> - * version 2 along with this program; If not, see
> - * http://www.gnu.org/licenses/gpl-2.0.html
> - *
> - * GPL HEADER END
> - */
> -/*
> - * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights 
> reserved.
> - * Use is subject to license terms.
> - *
> - * Copyright (c) 2011, 2012, Intel Corporation.
> - */
> -/*
> - * This file is part of Lustre, http://www.lustre.org/
> - * Lustre is a trademark of Sun Microsystems, Inc.
> - *
> - * lustre/obdclass/linux/linux-module.c
> - *
> - * Object Devices Class Driver
> - * These are the only exported functions, they provide some generic
> - * infrastructure for managing object devices
> - */
> -
> -#define DEBUG_SUBSYSTEM S_CLASS
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#define OBD_MAX_IOCTL_BUFFER 8192
> -
> -static int obd_ioctl_is_invalid(struct obd_ioctl_data *data)
> -{
> - if (data->ioc_len > BIT(30)) {
> - CERROR("OBD ioctl: ioc_len larger than 1<<30\n");
> - return 1;
> - }
> -
> - if (data->ioc_inllen1 > BIT(30)) {
> - CERROR("OBD ioctl: ioc_inllen1 larger than 1<<30\n");
> - return 1;
> - }
> -
> - if (data->ioc_inllen2 > BIT(30)) {
> - CERROR("OBD ioctl: ioc_inllen2 larger than 1<<30\n");
> - return 1;
> - }
> -
> - if (data->ioc_inllen3 > BIT(30)) {
> - CERROR("OBD ioctl: ioc_inllen3 larger than 1<<3

Re: [PATCH 09/11] staging: lustre: discard WIRE_ATTR

2018-06-13 Thread James Simmons


> This macro adds nothing of value, and makes the code harder
> to read for new readers.

Reviewed-by: James Simmons 
 
> Signed-off-by: NeilBrown 
> ---
>  .../staging/lustre/include/linux/lnet/socklnd.h|8 ++-
>  .../lustre/include/uapi/linux/lnet/lnet-types.h|   28 +---
>  .../lustre/include/uapi/linux/lnet/lnetst.h|4 +-
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h|   22 +
>  drivers/staging/lustre/lnet/selftest/rpc.h |   48 
> ++--
>  5 files changed, 54 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/lnet/socklnd.h 
> b/drivers/staging/lustre/include/linux/lnet/socklnd.h
> index 6bd1bca190a3..9f69257e000b 100644
> --- a/drivers/staging/lustre/include/linux/lnet/socklnd.h
> +++ b/drivers/staging/lustre/include/linux/lnet/socklnd.h
> @@ -50,7 +50,7 @@ struct ksock_hello_msg {
>   __u32   kshm_ctype; /* connection type */
>   __u32   kshm_nips;  /* # IP addrs */
>   __u32   kshm_ips[0];/* IP addrs */
> -} WIRE_ATTR;
> +} __packed;
>  
>  struct ksock_lnet_msg {
>   struct lnet_hdr ksnm_hdr;   /* lnet hdr */
> @@ -61,7 +61,7 @@ struct ksock_lnet_msg {
>* structure definitions. lnet payload will be stored just after
>* the body of structure ksock_lnet_msg_t
>*/
> -} WIRE_ATTR;
> +} __packed;
>  
>  struct ksock_msg {
>   __u32   ksm_type;   /* type of socklnd message */
> @@ -71,8 +71,8 @@ struct ksock_msg {
>   struct ksock_lnet_msg lnetmsg; /* lnet message, it's empty if
>   * it's NOOP
>   */
> - } WIRE_ATTR ksm_u;
> -} WIRE_ATTR;
> + } __packed ksm_u;
> +} __packed;
>  
>  #define KSOCK_MSG_NOOP   0xC0/* ksm_u empty */
>  #define KSOCK_MSG_LNET   0xC1/* lnet msg */
> diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h 
> b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h
> index 1be9b7aa7326..f97e7d9d881f 100644
> --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h
> +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h
> @@ -112,14 +112,12 @@ static inline __u32 LNET_MKNET(__u32 type, __u32 num)
>   return (type << 16) | num;
>  }
>  
> -#define WIRE_ATTR__packed
> -
>  /* Packed version of lnet_process_id to transfer via network */
>  struct lnet_process_id_packed {
>   /* node id / process id */
>   lnet_nid_t  nid;
>   lnet_pid_t  pid;
> -} WIRE_ATTR;
> +} __packed;
>  
>  /*
>   * The wire handle's interface cookie only matches one network interface in
> @@ -130,7 +128,7 @@ struct lnet_process_id_packed {
>  struct lnet_handle_wire {
>   __u64   wh_interface_cookie;
>   __u64   wh_object_cookie;
> -} WIRE_ATTR;
> +} __packed;
>  
>  enum lnet_msg_type {
>   LNET_MSG_ACK = 0,
> @@ -150,7 +148,7 @@ struct lnet_ack {
>   struct lnet_handle_wire dst_wmd;
>   __u64   match_bits;
>   __u32   mlength;
> -} WIRE_ATTR;
> +} __packed;
>  
>  struct lnet_put {
>   struct lnet_handle_wire ack_wmd;
> @@ -158,7 +156,7 @@ struct lnet_put {
>   __u64   hdr_data;
>   __u32   ptl_index;
>   __u32   offset;
> -} WIRE_ATTR;
> +} __packed;
>  
>  struct lnet_get {
>   struct lnet_handle_wire return_wmd;
> @@ -166,16 +164,16 @@ struct lnet_get {
>   __u32   ptl_index;
>   __u32   src_offset;
>   __u32   sink_length;
> -} WIRE_ATTR;
> +} __packed;
>  
>  struct lnet_reply {
>   struct lnet_handle_wire dst_wmd;
> -} WIRE_ATTR;
> +} __packed;
>  
>  struct lnet_hello {
>   __u64   incarnation;
>   __u32   type;
> -} WIRE_ATTR;
> +} __packed;
>  
>  struct lnet_hdr {
>   lnet_nid_t  dest_nid;
> @@ -192,7 +190,7 @@ struct lnet_hdr {
>   struct lnet_reply   reply;
>   struct lnet_hello   hello;
>   } msg;
> -} WIRE_ATTR;
> +} __packed;
>  
>  /*
>   * A HELLO message contains a magic number and protocol version
> @@ -208,7 +206,7 @@ struct lnet_magicversion {
>   __u32   magic;  /* LNET_PROTO_TCP_MAGIC */
>   __u16   version_major;  /* increment on incompatible change */
>   __u16   version_minor;  /* increment on compatible change */
> -} WIRE_ATTR;
> +} __packed;
>  
>  /* PROTO MAGIC for LNDs */
>  #define LNET_P

Re: [PATCH 07/11] staging: lustre: fold lprocfs_call_handler functionality into lnet_debugfs_*

2018-06-13 Thread James Simmons


> The calling convention for ->proc_handler is rather clumsy,
> as a comment in fs/proc/proc_sysctl.c confirms.
> lustre has copied this convention to lnet_debugfs_{read,write},
> and then provided a wrapper for handlers - lprocfs_call_handler -
> to work around the clumsiness.
> 
> It is cleaner to just fold the functionality of lprocfs_call_handler()
> into lnet_debugfs_* and let them call the final handler directly.
> 
> If these files were ever moved to /proc/sys (which seems unlikely), the
> handling in fs/proc/proc_sysctl.c would need to be fixed too, but
> that would not be a bad thing.
> 
> So modify all the functions that did use the wrapper to not need it
> now that a more sane calling convention is available.
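
For reference, a minimal sketch of a handler in the new shape;
do_the_work() is a hypothetical stand-in for one of the old __proc_*()
bodies, and the real conversions are in the diff below:

	static int proc_example(struct ctl_table *table, int write,
				void __user *buffer, size_t *lenp, loff_t *ppos)
	{
		size_t nob = *lenp;
		loff_t pos = *ppos;
		int rc;

		rc = do_the_work(table->data, write, pos, buffer, nob);
		if (rc < 0)
			return rc;

		/* What lprocfs_call_handler() used to do now happens in place. */
		if (write) {
			*ppos += nob;
		} else {
			*lenp = rc;
			*ppos += rc;
		}
		return 0;
	}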

Reviewed-by: James Simmons 
 
> Signed-off-by: NeilBrown 
> ---
>  .../staging/lustre/include/linux/libcfs/libcfs.h   |4 -
>  drivers/staging/lustre/lnet/libcfs/module.c|   84 
> +++-
>  drivers/staging/lustre/lnet/lnet/router_proc.c |   41 +++---
>  3 files changed, 41 insertions(+), 88 deletions(-)
> 
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h 
> b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> index edc7ed0dcb94..7ac609328256 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
> @@ -57,10 +57,6 @@ int libcfs_setup(void);
>  extern struct workqueue_struct *cfs_rehash_wq;
>  
>  void lustre_insert_debugfs(struct ctl_table *table);
> -int lprocfs_call_handler(void *data, int write, loff_t *ppos,
> -  void __user *buffer, size_t *lenp,
> -  int (*handler)(void *data, int write, loff_t pos,
> - void __user *buffer, int len));
>  
>  /*
>   * Memory
> diff --git a/drivers/staging/lustre/lnet/libcfs/module.c 
> b/drivers/staging/lustre/lnet/libcfs/module.c
> index 5dc7de9e6478..02c404c6738e 100644
> --- a/drivers/staging/lustre/lnet/libcfs/module.c
> +++ b/drivers/staging/lustre/lnet/libcfs/module.c
> @@ -290,33 +290,15 @@ static struct miscdevice libcfs_dev = {
>  
>  static int libcfs_dev_registered;
>  
> -int lprocfs_call_handler(void *data, int write, loff_t *ppos,
> -  void __user *buffer, size_t *lenp,
> -  int (*handler)(void *data, int write, loff_t pos,
> - void __user *buffer, int len))
> -{
> - int rc = handler(data, write, *ppos, buffer, *lenp);
> -
> - if (rc < 0)
> - return rc;
> -
> - if (write) {
> - *ppos += *lenp;
> - } else {
> - *lenp = rc;
> - *ppos += rc;
> - }
> - return 0;
> -}
> -EXPORT_SYMBOL(lprocfs_call_handler);
> -
> -static int __proc_dobitmasks(void *data, int write,
> -  loff_t pos, void __user *buffer, int nob)
> +static int proc_dobitmasks(struct ctl_table *table, int write,
> +void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
>   const int tmpstrlen = 512;
>   char *tmpstr;
>   int rc;
> - unsigned int *mask = data;
> + size_t nob = *lenp;
> + loff_t pos = *ppos;
> + unsigned int *mask = table->data;
>   int is_subsys = (mask == _subsystem_debug) ? 1 : 0;
>   int is_printk = (mask == _printk) ? 1 : 0;
>  
> @@ -351,32 +333,23 @@ static int __proc_dobitmasks(void *data, int write,
>   return rc;
>  }
>  
> -static int proc_dobitmasks(struct ctl_table *table, int write,
> -void __user *buffer, size_t *lenp, loff_t *ppos)
> +static int proc_dump_kernel(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> - return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
> - __proc_dobitmasks);
> -}
> + size_t nob = *lenp;
>  
> -static int __proc_dump_kernel(void *data, int write,
> -   loff_t pos, void __user *buffer, int nob)
> -{
>   if (!write)
>   return 0;
>  
>   return cfs_trace_dump_debug_buffer_usrstr(buffer, nob);
>  }
>  
> -static int proc_dump_kernel(struct ctl_table *table, int write,
> +static int proc_daemon_file(struct ctl_table *table, int write,
>   void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> - return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
> - __proc_dump_kernel);
> -}
> + size_t nob = *lenp;
> + loff_t pos = *ppos;
>  
> -static int __proc_daem

Re: [PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling

2018-06-13 Thread James Simmons


> > With the cleanup of the libcfs SMP handling all UMP handling
> > was removed. In the process now various NULL pointers and
> > empty fields are returned in the UMP case, which causes lustre
> > to crash hard. Restore the proper UMP handling so Lustre can
> > properly function.
> 
> Can't we just get lustre to handle the NULL pointer?
> In most cases, the pointer is accessed through an accessor function, and
> on !CONFIG_SMP, that can be a static inline that doesn't even look at
> the pointer.

Lots of NULL pointer checks for a structure allocated at libcfs module
start and only cleaned up at libcfs removal are not a clean approach.
So I have thought about it, and I have to ask why we allocate a global
struct cfs_cpt_table at all. It could be made static and filled in, which
would avoid the whole NULL pointer issue. Also, for the UMP case, why
allocate a new cfs_cpt_table with cfs_cpt_table_alloc() when it would be
exactly like the default UMP cfs_cpt_table? Instead we could just return
the pointer to the static default cfs_cpt_tab every time. We still have
the NULL ctb_cpumask field to deal with. Does that sound like a better
solution to you? Doug, what do you think?
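
A minimal sketch of what that might look like (the field names and the
cfs_cpt_table_alloc() signature here are assumptions based on the
discussion, not the actual libcfs code):

	#ifndef CONFIG_SMP
	/* One static table shared by everyone in the UMP case. */
	static struct cfs_cpt_table cfs_cpt_table_default = {
		.ctb_nparts	= 1,
	};

	struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt)
	{
		/* UMP: hand back the static table instead of allocating a
		 * duplicate that would look identical anyway. */
		return &cfs_cpt_table_default;
	}

	void cfs_cpt_table_free(struct cfs_cpt_table *cptab)
	{
		/* Nothing to free for the static table. */
	}
	#endif /* !CONFIG_SMP */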
 
> I really think this is a step backwards.  If you can identify specific
> problems caused by the current code, I'm sure we can fix them.
> 
> >
> > Signed-off-by: James Simmons 
> > Signed-off-by: Amir Shehata 
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
> 
> This bug doesn't seem to mention this patch at all
> 
> > Reviewed-on: http://review.whamcloud.com/18916
> 
> Nor does this review.

Yeah, it's mutated so much from what is in the Intel tree.
I do believe it was the last patch to touch this.


Re: [PATCH 11/11] staging: lustre: centralize setting of subdir-ccflags-y

2018-06-13 Thread James Simmons


> We don't need to set subdir-ccflags-y in every Makefile.
> The whole point of the "subdir-" bit is that the setting
> can go once in the top-level Makefile.
> 

Nak: When attempting to build individual components I get:

~/lustre-upstream$ make SUBDIRS=drivers/staging/lustre/lustre/lmv modules 
-j 16

  WARNING: Symbol version dump ./Module.symvers
   is missing; modules will have no dependencies and modversions.

  CC [M]  drivers/staging/lustre/lustre/lmv/lmv_obd.o
  CC [M]  drivers/staging/lustre/lustre/lmv/lmv_intent.o
  CC [M]  drivers/staging/lustre/lustre/lmv/lmv_fld.o
  CC [M]  drivers/staging/lustre/lustre/lmv/lproc_lmv.o
drivers/staging/lustre/lustre/lmv/lproc_lmv.c:38:28: fatal error: 
lprocfs_status.h: No such file or directory
 #include <lprocfs_status.h>
^
compilation terminated.


> Signed-off-by: NeilBrown 
> ---
>  drivers/staging/lustre/Makefile|3 +++
>  drivers/staging/lustre/lnet/klnds/o2iblnd/Makefile |2 --
>  drivers/staging/lustre/lnet/klnds/socklnd/Makefile |2 --
>  drivers/staging/lustre/lnet/libcfs/Makefile|2 --
>  drivers/staging/lustre/lnet/lnet/Makefile  |2 --
>  drivers/staging/lustre/lnet/selftest/Makefile  |2 --
>  drivers/staging/lustre/lustre/fid/Makefile |2 --
>  drivers/staging/lustre/lustre/fld/Makefile |2 --
>  drivers/staging/lustre/lustre/llite/Makefile   |2 --
>  drivers/staging/lustre/lustre/lmv/Makefile |2 --
>  drivers/staging/lustre/lustre/lov/Makefile |2 --
>  drivers/staging/lustre/lustre/mdc/Makefile |2 --
>  drivers/staging/lustre/lustre/mgc/Makefile |2 --
>  drivers/staging/lustre/lustre/obdclass/Makefile|2 --
>  drivers/staging/lustre/lustre/obdecho/Makefile |2 --
>  drivers/staging/lustre/lustre/osc/Makefile |2 --
>  drivers/staging/lustre/lustre/ptlrpc/Makefile  |2 --
>  17 files changed, 3 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/staging/lustre/Makefile b/drivers/staging/lustre/Makefile
> index 95ffe337a80a..a44086fa8668 100644
> --- a/drivers/staging/lustre/Makefile
> +++ b/drivers/staging/lustre/Makefile
> @@ -1,2 +1,5 @@
> +subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
> +subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
> +
>  obj-$(CONFIG_LNET)   += lnet/
>  obj-$(CONFIG_LUSTRE_FS)  += lustre/
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/Makefile 
> b/drivers/staging/lustre/lnet/klnds/o2iblnd/Makefile
> index 4affe1d79948..e1a05ece130c 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/Makefile
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/Makefile
> @@ -1,5 +1,3 @@
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
>  
>  obj-$(CONFIG_LNET_XPRT_IB) += ko2iblnd.o
>  ko2iblnd-y := o2iblnd.o o2iblnd_cb.o o2iblnd_modparams.o
> diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/Makefile 
> b/drivers/staging/lustre/lnet/klnds/socklnd/Makefile
> index a7da1abfc804..4d03cad997c1 100644
> --- a/drivers/staging/lustre/lnet/klnds/socklnd/Makefile
> +++ b/drivers/staging/lustre/lnet/klnds/socklnd/Makefile
> @@ -1,5 +1,3 @@
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
>  
>  obj-$(CONFIG_LNET) += ksocklnd.o
>  
> diff --git a/drivers/staging/lustre/lnet/libcfs/Makefile 
> b/drivers/staging/lustre/lnet/libcfs/Makefile
> index 6a1b232da495..3d6b99c6e883 100644
> --- a/drivers/staging/lustre/lnet/libcfs/Makefile
> +++ b/drivers/staging/lustre/lnet/libcfs/Makefile
> @@ -1,6 +1,4 @@
>  # SPDX-License-Identifier: GPL-2.0
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
>  
>  obj-$(CONFIG_LNET) += libcfs.o
>  
> diff --git a/drivers/staging/lustre/lnet/lnet/Makefile 
> b/drivers/staging/lustre/lnet/lnet/Makefile
> index 0a9d70924fe0..ba33e90e47ec 100644
> --- a/drivers/staging/lustre/lnet/lnet/Makefile
> +++ b/drivers/staging/lustre/lnet/lnet/Makefile
> @@ -1,6 +1,4 @@
>  # SPDX-License-Identifier: GPL-2.0
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
>  
>  obj-$(CONFIG_LNET) += lnet.o
>  
> diff --git a/drivers/staging/lustre/lnet/selftest/Makefile 
> b/drivers/staging/lustre/lnet/selftest/Makefile
> index 3ccc8966b566..16f8efcd1531 100644
> --- a/drivers/staging/lustre/lnet/selftest/Makefile
> +++ b/drivers/staging/lustre/lnet/selftest/Makefile
> @@ -1,5 +1,3 @@
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
> -subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
>  
>  obj-$(CONFIG_LNET_SELFTEST) := lnet_selftest.o
>  
> diff --git 

[PATCH v2 4/6] staging: lustre: acl: increase ACL entries limitation

2018-05-29 Thread James Simmons
From: Fan Yong 

Originally, the limit on ACL entries was 32, which is not enough for
some use cases. In fact, the ACL entry count is restricted mainly so
that the RPC reply buffer which receives the ACL data can be prepared
in advance. So we cannot make the ACL entry count unlimited, but we
can enlarge the RPC reply buffer to hold more ACL entries. On the
other hand, the MDT backend filesystem has its own EA size limit.
For example, in the ldiskfs case, if large EA is enabled the max ACL
size is 1048492 bytes; otherwise it is 4012 bytes. For the ZFS
backend, the value is 32768 bytes. With such a hard limit, we can
calculate how many ACL entries we can have at most. This patch
increases the RPC reply buffer to match that hard limit. For an old
client, to avoid buffer overflow caused by large ACL data (more than
32 ACL entries), the MDT will forbid the old client from accessing
files with large ACL data. As for how to tell whether a client is old
or new, a new connection flag OBD_CONNECT_LARGE_ACL is used for that.
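
For reference, a sketch of that calculation (the macro below is
illustrative and not part of the patch; the byte limits are those
quoted above, and the 4-byte header / 8-byte entry sizes are those of
the standard posix_acl_xattr structures):

	#include <linux/posix_acl_xattr.h>

	/* Illustrative only: max entries that fit in a given EA size. */
	#define LUSTRE_POSIX_ACL_MAX_ENTRIES_FROM(ea_size)		\
		(((ea_size) - sizeof(struct posix_acl_xattr_header)) /	\
		 sizeof(struct posix_acl_xattr_entry))

	/*
	 * ldiskfs, large EA enabled:  (1048492 - 4) / 8 = 131061 entries
	 * ldiskfs, large EA disabled:    (4012 - 4) / 8 =    501 entries
	 * ZFS:                          (32768 - 4) / 8 =   4095 entries
	 */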

Signed-off-by: Fan Yong 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7473
Reviewed-on: https://review.whamcloud.com/19790
Reviewed-by: Andreas Dilger 
Reviewed-by: Li Xi 
Reviewed-by: Lai Siyao 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---

Changelog:

v1) Initial patch
v2) Rebased patch. No changes

 drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h | 2 +-
 drivers/staging/lustre/lustre/include/lustre_acl.h| 7 ++-
 drivers/staging/lustre/lustre/llite/llite_lib.c   | 3 ++-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 6 ++
 drivers/staging/lustre/lustre/mdc/mdc_reint.c | 2 ++
 drivers/staging/lustre/lustre/mdc/mdc_request.c   | 4 
 drivers/staging/lustre/lustre/ptlrpc/layout.c | 4 +---
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c   | 4 ++--
 8 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h 
b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index aac98db..8778c6f 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -615,7 +615,7 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT_REQPORTAL   0x40ULL /*Separate non-IO req portal */
 #define OBD_CONNECT_ACL 0x80ULL /*access control lists 
*/
 #define OBD_CONNECT_XATTR  0x100ULL /*client use extended attr */
-#define OBD_CONNECT_CROW   0x200ULL /*MDS+OST create obj on write*/
+#define OBD_CONNECT_LARGE_ACL  0x200ULL /* more than 32 ACL entries */
 #define OBD_CONNECT_TRUNCLOCK  0x400ULL /*locks on server for punch */
 #define OBD_CONNECT_TRANSNO0x800ULL /*replay sends init transno */
 #define OBD_CONNECT_IBITS 0x1000ULL /*support for inodebits locks*/
diff --git a/drivers/staging/lustre/lustre/include/lustre_acl.h 
b/drivers/staging/lustre/lustre/include/lustre_acl.h
index 35ff61c..e7575a1 100644
--- a/drivers/staging/lustre/lustre/include/lustre_acl.h
+++ b/drivers/staging/lustre/lustre/include/lustre_acl.h
@@ -36,11 +36,16 @@
 
 #include 
 #include 
+#ifdef CONFIG_FS_POSIX_ACL
 #include 
 
 #define LUSTRE_POSIX_ACL_MAX_ENTRIES   32
-#define LUSTRE_POSIX_ACL_MAX_SIZE					\
+#define LUSTRE_POSIX_ACL_MAX_SIZE_OLD					\
	(sizeof(struct posix_acl_xattr_header) +			\
	 LUSTRE_POSIX_ACL_MAX_ENTRIES * sizeof(struct posix_acl_xattr_entry))
 
+#else /* ! CONFIG_FS_POSIX_ACL */
+#define LUSTRE_POSIX_ACL_MAX_SIZE_OLD 0
+#endif /* CONFIG_FS_POSIX_ACL */
+
 #endif
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c 
b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 1bc0782..36066c8 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -199,7 +199,8 @@ static int client_common_fill_super(struct super_block *sb, 
char *md, char *dt)
if (sbi->ll_flags & LL_SBI_LRU_RESIZE)
data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE;
 #ifdef CONFIG_FS_POSIX_ACL
-   data->ocd_connect_flags |= OBD_CONNECT_ACL | OBD_CONNECT_UMASK;
+   data->ocd_connect_flags |= OBD_CONNECT_ACL | OBD_CONNECT_UMASK |
+  OBD_CONNECT_LARGE_ACL;
 #endif
 
if (OBD_FAIL_CHECK(OBD_FAIL_MDC_LIGHTWEIGHT))
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c 
b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 253a545..65a5341 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -308,6 +308,8 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 
req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER,

[PATCH v2 1/6] staging: lustre: llite: create acl.c file

2018-05-29 Thread James Simmons
Move ll_get_acl() to its own file, acl.c, just like all the other
Linux filesystems do.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6142
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch to add acl.c file which contains acl handling
for lustre

 drivers/staging/lustre/lustre/llite/Makefile   |  2 +
 drivers/staging/lustre/lustre/llite/acl.c  | 51 ++
 drivers/staging/lustre/lustre/llite/file.c | 13 --
 .../staging/lustre/lustre/llite/llite_internal.h   |  5 +++
 4 files changed, 58 insertions(+), 13 deletions(-)
 create mode 100644 drivers/staging/lustre/lustre/llite/acl.c

diff --git a/drivers/staging/lustre/lustre/llite/Makefile 
b/drivers/staging/lustre/lustre/llite/Makefile
index 519fd74..5200924 100644
--- a/drivers/staging/lustre/lustre/llite/Makefile
+++ b/drivers/staging/lustre/lustre/llite/Makefile
@@ -9,3 +9,5 @@ lustre-y := dcache.o dir.o file.o llite_lib.o llite_nfs.o \
super25.o statahead.o glimpse.o lcommon_cl.o lcommon_misc.o \
vvp_dev.o vvp_page.o vvp_lock.o vvp_io.o vvp_object.o \
lproc_llite.o
+
+lustre-$(CONFIG_FS_POSIX_ACL) += acl.o
diff --git a/drivers/staging/lustre/lustre/llite/acl.c 
b/drivers/staging/lustre/lustre/llite/acl.c
new file mode 100644
index 000..d7c3bf9
--- /dev/null
+++ b/drivers/staging/lustre/lustre/llite/acl.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2002, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * Copyright (c) 2011, 2015, Intel Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ *
+ * lustre/llite/acl.c
+ */
+
+#define DEBUG_SUBSYSTEM S_LLITE
+
+#include "llite_internal.h"
+
+struct posix_acl *ll_get_acl(struct inode *inode, int type)
+{
+   struct ll_inode_info *lli = ll_i2info(inode);
+   struct posix_acl *acl = NULL;
+
+   spin_lock(&lli->lli_lock);
+   /* VFS' acl_permission_check->check_acl will release the refcount */
+   acl = posix_acl_dup(lli->lli_posix_acl);
+   spin_unlock(&lli->lli_lock);
+
+   return acl;
+}
diff --git a/drivers/staging/lustre/lustre/llite/file.c 
b/drivers/staging/lustre/lustre/llite/file.c
index a77cadc..ccbf91b 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3030,19 +3030,6 @@ static int ll_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
return rc;
 }
 
-struct posix_acl *ll_get_acl(struct inode *inode, int type)
-{
-   struct ll_inode_info *lli = ll_i2info(inode);
-   struct posix_acl *acl = NULL;
-
-   spin_lock(&lli->lli_lock);
-   /* VFS' acl_permission_check->check_acl will release the refcount */
-   acl = posix_acl_dup(lli->lli_posix_acl);
-   spin_unlock(&lli->lli_lock);
-
-   return acl;
-}
-
 int ll_inode_permission(struct inode *inode, int mask)
 {
struct ll_sb_info *sbi;
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h 
b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 379d88e..bdb1564 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -754,7 +754,12 @@ enum ldlm_mode ll_take_md_lock(struct inode *inode, __u64 
bits,
 int ll_md_real_close(struct inode *inode, fmode_t fmode);
 int ll_getattr(const struct path *path, struct kstat *stat,
   u32 request_mask, unsigned int flags);
+#ifdef CONFIG_FS_POSIX_ACL
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
+#else
+#define ll_get_acl NULL
+#endif /* CONFIG_FS_POSIX_ACL */
+
 int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
   const char *name, int namelen);
 int ll_get_fid_by_name(struct inode *parent, const char *name,
-- 
1.8.3.1



[PATCH v2 2/6] staging: lustre: llite: add support set_acl method in inode operations

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Linux kernel v3.14 added the set_acl method to inode operations.
This patch adds support for it to Lustre for proper ACL management.
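
As an illustration only (this table is a sketch, not taken from the
patch; the real inode operation tables live elsewhere in llite),
wiring the callbacks into the inode operations is what lets the VFS
reach them: ->get_acl from acl_permission_check() and ->set_acl when
an ACL is updated, while the CONFIG_FS_POSIX_ACL stubs keep the
initializers valid when ACL support is compiled out:

static const struct inode_operations ll_file_inode_ops_sketch = {
	.get_acl	= ll_get_acl,	/* NULL when !CONFIG_FS_POSIX_ACL */
	.set_acl	= ll_set_acl,	/* NULL when !CONFIG_FS_POSIX_ACL */
};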

Signed-off-by: Dmitry Eremin 
Signed-off-by: John L. Hammond 
Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/25965
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10541
Reviewed-on: https://review.whamcloud.com/31588
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10926
Reviewed-on: https://review.whamcloud.com/32045
Reviewed-by: Bob Glossman 
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Reviewed-by: Dmitry Eremin 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial ported patch

v2) Updated patch with fixes that address issues pointed out by
   Dan Carpenter

v3) Rebased to contain new code in acl.c

 drivers/staging/lustre/lustre/llite/acl.c  | 57 ++
 .../staging/lustre/lustre/llite/llite_internal.h   |  2 +
 2 files changed, 59 insertions(+)

diff --git a/drivers/staging/lustre/lustre/llite/acl.c 
b/drivers/staging/lustre/lustre/llite/acl.c
index d7c3bf9..de1499b 100644
--- a/drivers/staging/lustre/lustre/llite/acl.c
+++ b/drivers/staging/lustre/lustre/llite/acl.c
@@ -49,3 +49,60 @@ struct posix_acl *ll_get_acl(struct inode *inode, int type)
 
return acl;
 }
+
+int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+{
+   struct ll_sb_info *sbi = ll_i2sbi(inode);
+   struct ptlrpc_request *req = NULL;
+   const char *name = NULL;
+   size_t value_size = 0;
+   char *value = NULL;
+   int rc = 0;
+
+   switch (type) {
+   case ACL_TYPE_ACCESS:
+   name = XATTR_NAME_POSIX_ACL_ACCESS;
+   if (acl)
+   rc = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+   break;
+
+   case ACL_TYPE_DEFAULT:
+   name = XATTR_NAME_POSIX_ACL_DEFAULT;
+   if (!S_ISDIR(inode->i_mode))
+   rc = acl ? -EACCES : 0;
+   break;
+
+   default:
+   rc = -EINVAL;
+   break;
+   }
+   if (rc)
+   return rc;
+
+   if (acl) {
+   value_size = posix_acl_xattr_size(acl->a_count);
+   value = kmalloc(value_size, GFP_NOFS);
+   if (!value) {
+   rc = -ENOMEM;
+   goto out;
+   }
+
+   rc = posix_acl_to_xattr(&init_user_ns, acl, value, value_size);
+   if (rc < 0)
+   goto out_value;
+   }
+
+   rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
+value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
+name, value, value_size, 0, 0, 0, &req);
+
+   ptlrpc_req_finished(req);
+out_value:
+   kfree(value);
+out:
+   if (rc)
+   forget_cached_acl(inode, type);
+   else
+   set_cached_acl(inode, type, acl);
+   return rc;
+}
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h 
b/drivers/staging/lustre/lustre/llite/llite_internal.h
index bdb1564..c08a6e1 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -756,8 +756,10 @@ int ll_getattr(const struct path *path, struct kstat *stat,
   u32 request_mask, unsigned int flags);
 #ifdef CONFIG_FS_POSIX_ACL
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
+int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type);
 #else
 #define ll_get_acl NULL
+#define ll_set_acl NULL
 #endif /* CONFIG_FS_POSIX_ACL */
 
 int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
-- 
1.8.3.1



[PATCH v2 6/6] staging: lustre: mdc: use large xattr buffers for old servers

2018-05-29 Thread James Simmons
From: "John L. Hammond" 

Pre-2.10.1 MDTs will crash when they receive a listxattr (MDS_GETXATTR
with OBD_MD_FLXATTRLS) RPC for an orphan or dead object. So for
clients connected to these older MDTs, try to avoid sending listxattr
RPCs by making the bulk getxattr (MDS_GETXATTR with OBD_MD_FLXATTRALL)
more likely to succeed, thereby reducing the chances of falling
back to listxattr.
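
A condensed, hedged sketch of the sizing rule described above (the
helper below is hypothetical; the patch open-codes the same logic in
the getxattr intent packing path shown in the diff):

/* Hypothetical helper: choose the getxattr reply buffer size. */
static u32 getxattr_buf_size(struct obd_export *exp)
{
	u32 size = GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM;

	/* Old servers oops on the listxattr fallback, so offer the
	 * largest EA buffer the server advertised to avoid -ERANGE.
	 */
	if (exp->exp_connect_data.ocd_version < OBD_OCD_VERSION(2, 10, 1, 0))
		size = max_t(u32, size, exp->exp_connect_data.ocd_max_easize);

	return size;
}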

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10912
Reviewed-on: https://review.whamcloud.com/31990
Reviewed-by: Andreas Dilger 
Reviewed-by: Fan Yong 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes

 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 31 +--
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c 
b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index a8aa0fa..b991c6f 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -326,8 +326,10 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 {
struct ptlrpc_request   *req;
struct ldlm_intent  *lit;
+   u32 min_buf_size = 0;
int rc, count = 0;
LIST_HEAD(cancels);
+   u32 buf_size = 0;
 
req = ptlrpc_request_alloc(class_exp2cliimp(exp),
   &RQF_LDLM_INTENT_GETXATTR);
@@ -344,18 +346,33 @@ static void mdc_realloc_openmsg(struct ptlrpc_request 
*req,
lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
lit->opc = IT_GETXATTR;
 
+#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
+   /* If the supplied buffer is too small then the server will
+* return -ERANGE and llite will fallback to using non cached
+* xattr operations. On servers before 2.10.1 a (non-cached)
+* listxattr RPC for an orphan or dead file causes an oops. So
+* let's try to avoid sending too small a buffer to too old a
+* server. This is effectively undoing the memory conservation
+* of LU-9417 when it would be *more* likely to crash the
+* server. See LU-9856.
+*/
+   if (exp->exp_connect_data.ocd_version < OBD_OCD_VERSION(2, 10, 1, 0))
+   min_buf_size = exp->exp_connect_data.ocd_max_easize;
+#endif
+   buf_size = max_t(u32, min_buf_size,
+GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);
+
/* pack the intended request */
-   mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid,
- GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM, -1, 0);
+   mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid, buf_size,
+ -1, 0);

-   req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER,
-GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);
+   req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER, buf_size);

-   req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER,
-GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);
+   req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER, buf_size);

req_capsule_set_size(&req->rq_pill, &RMF_EAVALS_LENS, RCL_SERVER,
-sizeof(u32) * GA_DEFAULT_EA_NUM);
+max_t(u32, min_buf_size,
+  sizeof(u32) * GA_DEFAULT_EA_NUM));

req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0);
 
-- 
1.8.3.1



[PATCH v2 3/6] staging: lustre: llite: remove unused parameters from md_{get,set}xattr()

2018-05-29 Thread James Simmons
From: "John L. Hammond" 

md_getxattr() and md_setxattr() each have several unused
parameters. Remove them and improve the naming of the remaining
parameters.
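
For example, a hedged sketch of a caller using the new md_getxattr()
signature (the function and local names below are illustrative; only
the argument list comes from the header change in this patch):

/* Hypothetical caller showing the reduced argument list. */
static int example_getxattr(struct inode *inode, size_t buf_size)
{
	struct ll_sb_info *sbi = ll_i2sbi(inode);
	struct ptlrpc_request *req = NULL;
	int rc;

	/* exp, fid, OBD_MD_* valid flag, xattr name, reply buffer size, req */
	rc = md_getxattr(sbi->ll_md_exp, ll_inode2fid(inode), OBD_MD_FLXATTR,
			 XATTR_NAME_POSIX_ACL_ACCESS, buf_size, &req);
	ptlrpc_req_finished(req);
	return rc;
}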

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10792
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch

v2) Rebased to new parent patch

v3) Rebased againt to new parent patch using acl.c file

 drivers/staging/lustre/lustre/include/obd.h   |  7 ++---
 drivers/staging/lustre/lustre/include/obd_class.h | 21 ++
 drivers/staging/lustre/lustre/llite/acl.c |  2 +-
 drivers/staging/lustre/lustre/llite/file.c|  3 +-
 drivers/staging/lustre/lustre/llite/xattr.c   |  6 ++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c   | 22 +++
 drivers/staging/lustre/lustre/mdc/mdc_request.c   | 34 +--
 7 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h 
b/drivers/staging/lustre/lustre/include/obd.h
index da99a0f..b1907bb 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -940,12 +940,11 @@ struct md_ops {
  struct ptlrpc_request **);
 
int (*setxattr)(struct obd_export *, const struct lu_fid *,
-   u64, const char *, const char *, int, int, int, __u32,
-   struct ptlrpc_request **);
+   u64, const char *, const void *, size_t, unsigned int,
+   u32, struct ptlrpc_request **);
 
int (*getxattr)(struct obd_export *, const struct lu_fid *,
-   u64, const char *, const char *, int, int, int,
-   struct ptlrpc_request **);
+   u64, const char *, size_t, struct ptlrpc_request **);
 
int (*init_ea_size)(struct obd_export *, u32, u32);
 
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h 
b/drivers/staging/lustre/lustre/include/obd_class.h
index a3b1465..fc9c772 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1385,29 +1385,26 @@ static inline int md_merge_attr(struct obd_export *exp,
 }
 
 static inline int md_setxattr(struct obd_export *exp, const struct lu_fid *fid,
- u64 valid, const char *name,
- const char *input, int input_size,
- int output_size, int flags, __u32 suppgid,
+ u64 obd_md_valid, const char *name,
+ const char *value, size_t value_size,
+ unsigned int xattr_flags, u32 suppgid,
  struct ptlrpc_request **request)
 {
EXP_CHECK_MD_OP(exp, setxattr);
EXP_MD_COUNTER_INCREMENT(exp, setxattr);
-   return MDP(exp->exp_obd, setxattr)(exp, fid, valid, name, input,
-  input_size, output_size, flags,
+   return MDP(exp->exp_obd, setxattr)(exp, fid, obd_md_valid, name,
+  value, value_size, xattr_flags,
   suppgid, request);
 }
 
 static inline int md_getxattr(struct obd_export *exp, const struct lu_fid *fid,
- u64 valid, const char *name,
- const char *input, int input_size,
- int output_size, int flags,
- struct ptlrpc_request **request)
+ u64 obd_md_valid, const char *name,
+ size_t buf_size, struct ptlrpc_request **req)
 {
EXP_CHECK_MD_OP(exp, getxattr);
EXP_MD_COUNTER_INCREMENT(exp, getxattr);
-   return MDP(exp->exp_obd, getxattr)(exp, fid, valid, name, input,
-  input_size, output_size, flags,
-  request);
+   return MDP(exp->exp_obd, getxattr)(exp, fid, obd_md_valid, name,
+  buf_size, req);
 }
 
 static inline int md_set_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/llite/acl.c 
b/drivers/staging/lustre/lustre/llite/acl.c
index de1499b..2ee9ff9 100644
--- a/drivers/staging/lustre/lustre/llite/acl.c
+++ b/drivers/staging/lustre/lustre/llite/acl.c
@@ -94,7 +94,7 @@ int ll_set_acl(struct inode *inode, struct posix_acl *acl, 
int type)
 
rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
 value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
-name, value, value_size, 0, 0, 0, &req);
+name, value, value_size, 0, 0, &req);
 
ptlrpc_req_finished(req);
 out_value:
diff --git a/drivers/staging/lustre/lustr

[PATCH v2 01/25] staging: lustre: libcfs: restore UMP handling

2018-05-29 Thread James Simmons
With the cleanup of the libcfs SMP handling, all UMP handling
was removed. As a result, various NULL pointers and empty fields
are now returned in the UMP case, which causes Lustre to crash
hard. Restore proper UMP handling so Lustre can function
correctly.
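
As a usage sketch only (the caller below is made up), the !CONFIG_SMP
paths restored by this patch hand callers a real single-partition
table instead of NULL pointers:

/* Hypothetical UP-only caller of the restored libcfs CPT API. */
static int example_ump_cpt_use(void)
{
	struct cfs_cpt_table *cptab = cfs_cpt_table_alloc(1);
	cpumask_var_t *mask;
	nodemask_t *nodes;

	if (!cptab)
		return -ENOMEM;

	mask = cfs_cpt_cpumask(cptab, 0);	/* CPU 0 only, never NULL */
	nodes = cfs_cpt_nodemask(cptab, 0);	/* node 0 only, never NULL */

	cfs_cpt_table_free(cptab);
	return 0;
}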

Signed-off-by: James Simmons 
Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) New patch to handle the disappearance of UMP support

 .../lustre/include/linux/libcfs/libcfs_cpu.h   | 87 --
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c|  4 -
 drivers/staging/lustre/lnet/libcfs/module.c|  4 +
 3 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 61641c4..2ad12a6 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -74,6 +74,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 /* any CPU partition */
@@ -89,10 +90,11 @@ struct cfs_cpu_partition {
/* spread rotor for NUMA allocator */
unsigned intcpt_spread_rotor;
 };
-
+#endif /* CONFIG_SMP */
 
 /** descriptor for CPU partitions */
 struct cfs_cpt_table {
+#ifdef CONFIG_SMP
/* version, reserved for hotplug */
unsigned intctb_version;
/* spread rotor for NUMA allocator */
@@ -103,14 +105,26 @@ struct cfs_cpt_table {
struct cfs_cpu_partition*ctb_parts;
/* shadow HW CPU to CPU partition ID */
int *ctb_cpu2cpt;
-   /* all cpus in this partition table */
-   cpumask_var_t   ctb_cpumask;
/* all nodes in this partition table */
nodemask_t  *ctb_nodemask;
+#else
+   nodemask_t  ctb_nodemask;
+#endif /* CONFIG_SMP */
+   /* all cpus in this partition table */
+   cpumask_var_t   ctb_cpumask;
 };
 
 extern struct cfs_cpt_table*cfs_cpt_tab;
 
+#ifdef CONFIG_SMP
+/**
+ * destroy a CPU partition table
+ */
+void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
+/**
+ * create a cfs_cpt_table with \a ncpt number of partitions
+ */
+struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
 /**
  * return cpumask of CPU partition \a cpt
  */
@@ -208,20 +222,52 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
 void cfs_cpu_fini(void);
 
 #else /* !CONFIG_SMP */
-struct cfs_cpt_table;
-#define cfs_cpt_tab ((struct cfs_cpt_table *)NULL)
 
-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+static inline void cfs_cpt_table_free(struct cfs_cpt_table *cptab)
 {
-   return NULL;
+   kfree(cptab);
 }
 
-static inline int
-cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
+static inline struct cfs_cpt_table *cfs_cpt_table_alloc(int ncpt)
 {
-   return 0;
+   struct cfs_cpt_table *cptab;
+
+   if (ncpt != 1)
+   return NULL;
+
+   cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
+   if (!cptab)
+   return NULL;
+
+   if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS)) {
+   kfree(cptab);
+   return NULL;
+   }
+   cpumask_set_cpu(0, cptab->ctb_cpumask);
+   node_set(0, cptab->ctb_nodemask);
+
+   return cptab;
+}
+
+static inline int cfs_cpt_table_print(struct cfs_cpt_table *cptab,
+ char *buf, int len)
+{
+   int rc;
+
+   rc = snprintf(buf, len, "0\t: 0\n");
+   len -= rc;
+   if (len <= 0)
+   return -EFBIG;
+
+   return rc;
 }
+
+static inline cpumask_var_t *
+cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+{
+   return &cptab->ctb_cpumask;
+}
+
 static inline int
 cfs_cpt_number(struct cfs_cpt_table *cptab)
 {
@@ -243,7 +289,7 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
 static inline nodemask_t *
 cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
 {
-   return NULL;
+   return &cptab->ctb_nodemask;
 }
 
 static inline int
@@ -328,24 +374,21 @@ void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
 static inline int
 cfs_cpu_init(void)
 {
-   return 0;
+   cfs_cpt_tab = cfs_cpt_table_alloc(1);
+
+   return cfs_cpt_tab ? 0 : -1;
 }
 
 static inline void cfs_cpu_fini(void)
 {
+   if (cfs_cpt_tab) {
+   cfs_cpt_table_free(cfs_cpt_tab);
+   cfs_cpt_tab = NULL;
+   }
 }
 
 #endif /* CONFIG_SMP */
 
-/**
- * destroy a CPU partition table
- */
-void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
-/**
- * create a cfs_cpt_table with \a ncpt number of partitions
- */
-struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int n

[PATCH v2 04/25] staging: lustre: libcfs: properly handle failure cases in SMP code

2018-05-29 Thread James Simmons
While the SMP work was being pushed, Dan Carpenter pointed out
some bugs in the code. Because of the single err label in
cfs_cpu_init() and cfs_cpt_table_alloc(), a few items that were
never initialized were being cleaned up, which can lead to
crashes and other problems. In those initialization functions,
introduce individual labels to jump to so that only the things
that were actually initialized get freed on failure.
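
A generic sketch of the unwind pattern being introduced (the structure
and names below are illustrative, not from the patch): each allocation
gets its own label, so the failure path frees only what was set up
before the failing step.

#include <linux/slab.h>

struct two_bufs {
	void *a;
	void *b;
};

static struct two_bufs *two_bufs_alloc(void)
{
	struct two_bufs *tb;

	tb = kzalloc(sizeof(*tb), GFP_KERNEL);
	if (!tb)
		goto failed_alloc_tb;

	tb->a = kzalloc(32, GFP_KERNEL);
	if (!tb->a)
		goto failed_alloc_a;

	tb->b = kzalloc(64, GFP_KERNEL);
	if (!tb->b)
		goto failed_alloc_b;

	return tb;

failed_alloc_b:
	kfree(tb->a);	/* only what was actually allocated gets freed */
failed_alloc_a:
	kfree(tb);
failed_alloc_tb:
	return NULL;
}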

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10932
Reviewed-on: https://review.whamcloud.com/32085
Reviewed-by: Dmitry Eremin 
Reviewed-by: Andreas Dilger 
Signed-off-by: James Simmons 
---
Changelog:

v1) New patch to make libcfs SMP code handle failure paths correctly.

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 72 ++---
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 34df7ed..b67a60c 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -81,17 +81,19 @@ struct cfs_cpt_table *
 
cptab->ctb_nparts = ncpt;
 
+   if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS))
+   goto failed_alloc_cpumask;
+
cptab->ctb_nodemask = kzalloc(sizeof(*cptab->ctb_nodemask),
  GFP_NOFS);
-   if (!zalloc_cpumask_var(&cptab->ctb_cpumask, GFP_NOFS) ||
-   !cptab->ctb_nodemask)
-   goto failed;
+   if (!cptab->ctb_nodemask)
+   goto failed_alloc_nodemask;
 
cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
sizeof(cptab->ctb_cpu2cpt[0]),
GFP_KERNEL);
if (!cptab->ctb_cpu2cpt)
-   goto failed;
+   goto failed_alloc_cpu2cpt;
 
memset(cptab->ctb_cpu2cpt, -1,
   num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
@@ -99,22 +101,41 @@ struct cfs_cpt_table *
cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
  GFP_KERNEL);
if (!cptab->ctb_parts)
-   goto failed;
+   goto failed_alloc_ctb_parts;
+
+   memset(cptab->ctb_parts, -1, ncpt * sizeof(cptab->ctb_parts[0]));
 
for (i = 0; i < ncpt; i++) {
struct cfs_cpu_partition *part = &cptab->ctb_parts[i];

+   if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS))
+   goto failed_setting_ctb_parts;
+
part->cpt_nodemask = kzalloc(sizeof(*part->cpt_nodemask),
 GFP_NOFS);
-   if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
-   !part->cpt_nodemask)
-   goto failed;
+   if (!part->cpt_nodemask)
+   goto failed_setting_ctb_parts;
}
 
return cptab;
 
- failed:
-   cfs_cpt_table_free(cptab);
+failed_setting_ctb_parts:
+   while (i-- >= 0) {
+   struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
+
+   kfree(part->cpt_nodemask);
+   free_cpumask_var(part->cpt_cpumask);
+   }
+
+   kvfree(cptab->ctb_parts);
+failed_alloc_ctb_parts:
+   kvfree(cptab->ctb_cpu2cpt);
+failed_alloc_cpu2cpt:
+   kfree(cptab->ctb_nodemask);
+failed_alloc_nodemask:
+   free_cpumask_var(cptab->ctb_cpumask);
+failed_alloc_cpumask:
+   kfree(cptab);
return NULL;
 }
 EXPORT_SYMBOL(cfs_cpt_table_alloc);
@@ -940,7 +961,7 @@ static int cfs_cpu_dead(unsigned int cpu)
 int
 cfs_cpu_init(void)
 {
-   int ret = 0;
+   int ret;
 
LASSERT(!cfs_cpt_tab);
 
@@ -949,23 +970,23 @@ static int cfs_cpu_dead(unsigned int cpu)
"staging/lustre/cfe:dead", NULL,
cfs_cpu_dead);
if (ret < 0)
-   goto failed;
+   goto failed_cpu_dead;
+
ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
"staging/lustre/cfe:online",
cfs_cpu_online, NULL);
if (ret < 0)
-   goto failed;
+   goto failed_cpu_online;
+
lustre_cpu_online = ret;
 #endif
-   ret = -EINVAL;
-
get_online_cpus();
if (*cpu_pattern) {
char *cpu_pattern_dup = kstrdup(cpu_pattern, GFP_KERNEL);
 
if (!cpu_pattern_dup) {
CERROR("Failed to duplicate cpu_pattern\n");
-   goto failed;
+   goto failed_alloc_table;
}
 
cfs_cpt_tab = cfs_cpt_table_create_pattern(cpu_pattern_dup);
@@ -973,7 +994,7 @@ static int cfs_

[PATCH v2 00/25] staging: lustre: libcfs: SMP rework

2018-05-29 Thread James Simmons
From: James Simmons 

Recently Lustre support has been expanded to extreme machines with
as many as 1000+ cores. At the other end, Lustre has also been
ported to platforms like ARM and KNL which have unique NUMA and
core setups; for example, some devices have NUMA nodes with no
cores. These new platforms exposed the limitations of Lustre's
SMP code, so a lot of work was needed. The result is this patch
set, which has been tested on these platforms.

Amir Shehata (8):
  staging: lustre: libcfs: replace MAX_NUMNODES with nr_node_ids
  staging: lustre: libcfs: remove excess space
  staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids
  staging: lustre: libcfs: NUMA support
  staging: lustre: libcfs: add cpu distance handling
  staging: lustre: libcfs: use distance in cpu and node handling
  staging: lustre: libcfs: provide debugfs files for distance handling
  staging: lustre: libcfs: invert error handling for cfs_cpt_table_print

Dmitry Eremin (15):
  staging: lustre: libcfs: remove useless CPU partition code
  staging: lustre: libcfs: rename variable i to cpu
  staging: lustre: libcfs: fix libcfs_cpu coding style
  staging: lustre: libcfs: use int type for CPT identification.
  staging: lustre: libcfs: rename i to node for cfs_cpt_set_nodemask
  staging: lustre: libcfs: rename i to cpu for cfs_cpt_bind
  staging: lustre: libcfs: rename cpumask_var_t variables to *_mask
  staging: lustre: libcfs: rename goto label in cfs_cpt_table_print
  staging: lustre: libcfs: update debug messages
  staging: lustre: libcfs: make tolerant to offline CPUs and empty NUMA nodes
  staging: lustre: libcfs: report NUMA node instead of just node
  staging: lustre: libcfs: update debug messages in CPT code
  staging: lustre: libcfs: rework CPU pattern parsing code
  staging: lustre: libcfs: change CPT estimate algorithm
  staging: lustre: ptlrpc: use current CPU instead of hardcoded 0

James Simmons (2):
  staging: lustre: libcfs: restore UMP handling
  staging: lustre: libcfs: properly handle failure cases in SMP code

 .../lustre/include/linux/libcfs/libcfs_cpu.h   | 225 +++--
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 965 +++--
 drivers/staging/lustre/lnet/libcfs/module.c|  57 ++
 drivers/staging/lustre/lnet/lnet/lib-msg.c |   2 +
 drivers/staging/lustre/lustre/ptlrpc/service.c |  11 +-
 5 files changed, 728 insertions(+), 532 deletions(-)

-- 
1.8.3.1



[PATCH v2 07/25] staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids

2018-05-29 Thread James Simmons
From: Amir Shehata 

Move from num_possible_cpus() to nr_cpu_ids.
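
For context, nr_cpu_ids is the exclusive upper bound on possible CPU
ids, while num_possible_cpus() only counts the bits set in
cpu_possible_mask and can be smaller when the possible ids are sparse.
A minimal sketch (illustrative only, not libcfs code) of why an array
indexed by CPU id has to be sized by nr_cpu_ids:

#include <linux/cpumask.h>
#include <linux/mm.h>
#include <linux/errno.h>

static int *example_cpu2cpt;

static int example_alloc_cpu2cpt(void)
{
	int cpu;

	/* one slot per possible CPU id; ids run from 0 to nr_cpu_ids - 1 */
	example_cpu2cpt = kvmalloc_array(nr_cpu_ids,
					 sizeof(example_cpu2cpt[0]),
					 GFP_KERNEL);
	if (!example_cpu2cpt)
		return -ENOMEM;

	for_each_possible_cpu(cpu)
		example_cpu2cpt[cpu] = -1;

	return 0;
}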

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. Same code

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index d9d1388..3f855a8 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -89,14 +89,14 @@ struct cfs_cpt_table *
if (!cptab->ctb_nodemask)
goto failed_alloc_nodemask;
 
-   cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
+   cptab->ctb_cpu2cpt = kvmalloc_array(nr_cpu_ids,
sizeof(cptab->ctb_cpu2cpt[0]),
GFP_KERNEL);
if (!cptab->ctb_cpu2cpt)
goto failed_alloc_cpu2cpt;
 
memset(cptab->ctb_cpu2cpt, -1,
-  num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
+  nr_cpu_ids * sizeof(cptab->ctb_cpu2cpt[0]));
 
cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
  GFP_KERNEL);
-- 
1.8.3.1



[PATCH v2 08/25] staging: lustre: libcfs: NUMA support

2018-05-29 Thread James Simmons
From: Amir Shehata 

This patch adds NUMA node support. NUMA node information is stored
in the CPT table. A NUMA node mask is maintained for the entire
table as well as for each CPT to track the NUMA nodes related to
each of the CPTs. Add new function cfs_cpt_of_node() which returns
the CPT of a particular NUMA node.
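
A minimal usage sketch (hypothetical caller, not part of this patch,
assuming the libcfs_cpu.h declarations added here) mapping the NUMA
node of the calling context to its CPU partition:

#include <linux/topology.h>

static int example_current_node_cpt(struct cfs_cpt_table *cptab)
{
	/* cfs_cpt_of_node() returns CFS_CPT_ANY for out-of-range nodes */
	return cfs_cpt_of_node(cptab, numa_node_id());
}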

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

 .../lustre/include/linux/libcfs/libcfs_cpu.h| 11 +++
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 21 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 3626969..487625d 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -103,6 +103,8 @@ struct cfs_cpt_table {
struct cfs_cpu_partition*ctb_parts;
/* shadow HW CPU to CPU partition ID */
int *ctb_cpu2cpt;
+   /* shadow HW node to CPU partition ID */
+   int *ctb_node2cpt;
/* all nodes in this partition table */
nodemask_t  *ctb_nodemask;
 #else
@@ -157,6 +159,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu);
 /**
+ * shadow HW node ID \a NODE to CPU-partition ID by \a cptab
+ */
+int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node);
+/**
  * bind current thread on a CPU-partition \a cpt of \a cptab
  */
 int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt);
@@ -345,6 +351,11 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table 
*cptab,
return 0;
 }
 
+static inline int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
+{
+   return 0;
+}
+
 static inline int
 cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
 {
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 3f855a8..f616073 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -98,6 +98,15 @@ struct cfs_cpt_table *
memset(cptab->ctb_cpu2cpt, -1,
   nr_cpu_ids * sizeof(cptab->ctb_cpu2cpt[0]));
 
+   cptab->ctb_node2cpt = kvmalloc_array(nr_node_ids,
+sizeof(cptab->ctb_node2cpt[0]),
+GFP_KERNEL);
+   if (!cptab->ctb_node2cpt)
+   goto failed_alloc_node2cpt;
+
+   memset(cptab->ctb_node2cpt, -1,
+  nr_node_ids * sizeof(cptab->ctb_node2cpt[0]));
+
cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
  GFP_KERNEL);
if (!cptab->ctb_parts)
@@ -129,6 +138,8 @@ struct cfs_cpt_table *
 
kvfree(cptab->ctb_parts);
 failed_alloc_ctb_parts:
+   kvfree(cptab->ctb_node2cpt);
+failed_alloc_node2cpt:
kvfree(cptab->ctb_cpu2cpt);
 failed_alloc_cpu2cpt:
kfree(cptab->ctb_nodemask);
@@ -146,6 +157,7 @@ struct cfs_cpt_table *
int i;
 
kvfree(cptab->ctb_cpu2cpt);
+   kvfree(cptab->ctb_node2cpt);
 
for (i = 0; cptab->ctb_parts && i < cptab->ctb_nparts; i++) {
		struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
@@ -511,6 +523,15 @@ struct cfs_cpt_table *
 }
 EXPORT_SYMBOL(cfs_cpt_of_cpu);
 
+int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node)
+{
+   if (node < 0 || node > nr_node_ids)
+   return CFS_CPT_ANY;
+
+   return cptab->ctb_node2cpt[node];
+}
+EXPORT_SYMBOL(cfs_cpt_of_node);
+
 int
 cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
 {
-- 
1.8.3.1



[PATCH v2 03/25] staging: lustre: libcfs: rename variable i to cpu

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Change the name of the variable i used for for_each_cpu() to cpu
for code readability.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23303
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased to handle recent cleanups in libcfs

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 951a9ca..34df7ed 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -340,7 +340,7 @@ struct cfs_cpt_table *
 cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
const cpumask_t *mask)
 {
-   int i;
+   int cpu;
 
if (!cpumask_weight(mask) ||
cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
@@ -349,8 +349,8 @@ struct cfs_cpt_table *
return 0;
}
 
-   for_each_cpu(i, mask) {
-   if (!cfs_cpt_set_cpu(cptab, cpt, i))
+   for_each_cpu(cpu, mask) {
+   if (!cfs_cpt_set_cpu(cptab, cpt, cpu))
return 0;
}
 
@@ -362,10 +362,10 @@ struct cfs_cpt_table *
 cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
  const cpumask_t *mask)
 {
-   int i;
+   int cpu;
 
-   for_each_cpu(i, mask)
-   cfs_cpt_unset_cpu(cptab, cpt, i);
+   for_each_cpu(cpu, mask)
+   cfs_cpt_unset_cpu(cptab, cpt, cpu);
 }
 EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
 
-- 
1.8.3.1



[PATCH v2 09/25] staging: lustre: libcfs: add cpu distance handling

2018-05-29 Thread James Simmons
From: Amir Shehata 

Add functionality to calculate the distance between two CPTs.
Expose those distances in debugfs so people deploying a setup
can debug what is being created for the CPTs.
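
A hypothetical consumer (not part of this patch) of the new
cfs_cpt_distance() helper, picking the partition with the smallest
NUMA distance from a given CPT:

#include <linux/kernel.h>

static int example_nearest_cpt(struct cfs_cpt_table *cptab, int cpt)
{
	unsigned int best = UINT_MAX;
	int nearest = cpt;
	int i;

	for (i = 0; i < cfs_cpt_number(cptab); i++) {
		unsigned int dist = cfs_cpt_distance(cptab, cpt, i);

		if (i != cpt && dist < best) {
			best = dist;
			nearest = i;
		}
	}

	return nearest;
}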

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

 .../lustre/include/linux/libcfs/libcfs_cpu.h   | 31 +++
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 61 ++
 2 files changed, 92 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 487625d..d5237d0 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -87,6 +87,8 @@ struct cfs_cpu_partition {
cpumask_var_t   cpt_cpumask;
/* nodes mask for this partition */
nodemask_t  *cpt_nodemask;
+   /* NUMA distance between CPTs */
+   unsigned int*cpt_distance;
/* spread rotor for NUMA allocator */
unsigned intcpt_spread_rotor;
 };
@@ -97,6 +99,8 @@ struct cfs_cpt_table {
 #ifdef CONFIG_SMP
/* spread rotor for NUMA allocator */
unsigned intctb_spread_rotor;
+   /* maximum NUMA distance between all nodes in table */
+   unsigned intctb_distance;
/* # of CPU partitions */
unsigned intctb_nparts;
/* partitions tables */
@@ -134,6 +138,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
 /**
+ * print distance information of cpt-table
+ */
+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len);
+/**
  * return total number of CPU partitions in \a cptab
  */
 int
@@ -163,6 +171,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node);
 /**
+ * NUMA distance between \a cpt1 and \a cpt2 in \a cptab
+ */
+unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2);
+/**
  * bind current thread on a CPU-partition \a cpt of \a cptab
  */
 int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt);
@@ -257,6 +269,19 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table 
*cptab,
return rc;
 }
 
+static inline int cfs_cpt_distance_print(struct cfs_cpt_table *cptab,
+char *buf, int len)
+{
+   int rc;
+
+   rc = snprintf(buf, len, "0\t: 0:1\n");
+   len -= rc;
+   if (len <= 0)
+   return -EFBIG;
+
+   return rc;
+}
+
 static inline cpumask_var_t *
 cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
 {
@@ -287,6 +312,12 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table 
*cptab,
	return &cptab->ctb_nodemask;
 }
 
+static inline unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab,
+   int cpt1, int cpt2)
+{
+   return 1;
+}
+
 static inline int
 cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
 {
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index f616073..2a74e51 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -124,6 +124,15 @@ struct cfs_cpt_table *
 GFP_NOFS);
if (!part->cpt_nodemask)
goto failed_setting_ctb_parts;
+
+   part->cpt_distance = kvmalloc_array(cptab->ctb_nparts,
+   sizeof(part->cpt_distance[0]),
+   GFP_KERNEL);
+   if (!part->cpt_distance)
+   goto failed_setting_ctb_parts;
+
+   memset(part->cpt_distance, -1,
+  cptab->ctb_nparts * sizeof(part->cpt_distance[0]));
}
 
return cptab;
@@ -134,6 +143,7 @@ struct cfs_cpt_table *
 
kfree(part->cpt_nodemask);
free_cpumask_var(part->cpt_cpumask);
+   kvfree(part->cpt_distance);
}
 
kvfree(cptab->ctb_parts);
@@ -164,6 +174,7 @@ struct cfs_cpt_table *
 
kfree(part->cpt_nodemask);
free_cpumask_var(part->cpt_cpumask);
+   kvfree(part->cpt_distance);
}
 
kvfree(cptab->ctb_parts);
@@ -218,6 +229,44 @@ struct cfs_cpt_table *
 }
 EXPORT_SYMBOL(cfs_cpt_table_print);
 
+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
+{
+   char *tmp = buf;
+   int r

[PATCH v2 10/25] staging: lustre: libcfs: use distance in cpu and node handling

2018-05-29 Thread James Simmons
From: Amir Shehata 

Take into consideration the location of NUMA nodes and cores
when calling cfs_cpt_[un]set_cpu() and cfs_cpt_[un]set_node().
This enables lustre to function properly on platforms with
hundreds of cores and NUMA nodes.
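
A hypothetical configuration snippet (not from this patch) showing the
caller-visible side: whole NUMA nodes are assigned to partitions with
cfs_cpt_set_node(), and the node and distance bookkeeping is assumed
to happen inside the cfs_cpt_add_cpu()/cfs_cpt_add_node() helpers
introduced below:

#include <linux/nodemask.h>
#include <linux/errno.h>

static int example_partition_by_node(struct cfs_cpt_table *cptab)
{
	int cpt = 0;
	int node;

	for_each_online_node(node) {
		/* libcfs convention: 1 on success, 0 on failure */
		if (!cfs_cpt_set_node(cptab, cpt, node))
			return -EINVAL;

		cpt = (cpt + 1) % cfs_cpt_number(cptab);
	}

	return 0;
}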

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 192 ++--
 1 file changed, 143 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 2a74e51..9ff9fe9 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -330,11 +330,134 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table 
*cptab, int cpt1, int cpt2)
 }
 EXPORT_SYMBOL(cfs_cpt_distance);
 
+/*
+ * Calculate the maximum NUMA distance between all nodes in the
+ * from_mask and all nodes in the to_mask.
+ */
+static unsigned int cfs_cpt_distance_calculate(nodemask_t *from_mask,
+  nodemask_t *to_mask)
+{
+   unsigned int maximum;
+   unsigned int distance;
+   int from;
+   int to;
+
+   maximum = 0;
+   for_each_node_mask(from, *from_mask) {
+   for_each_node_mask(to, *to_mask) {
+   distance = node_distance(from, to);
+   if (maximum < distance)
+   maximum = distance;
+   }
+   }
+   return maximum;
+}
+
+static void cfs_cpt_add_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+   cptab->ctb_cpu2cpt[cpu] = cpt;
+
+   cpumask_set_cpu(cpu, cptab->ctb_cpumask);
+   cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+}
+
+static void cfs_cpt_del_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+   cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+   cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
+
+   cptab->ctb_cpu2cpt[cpu] = -1;
+}
+
+static void cfs_cpt_add_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+   struct cfs_cpu_partition *part;
+
+   if (!node_isset(node, *cptab->ctb_nodemask)) {
+   unsigned int dist;
+
+   /* first time node is added to the CPT table */
+   node_set(node, *cptab->ctb_nodemask);
+   cptab->ctb_node2cpt[node] = cpt;
+
+   dist = cfs_cpt_distance_calculate(cptab->ctb_nodemask,
+ cptab->ctb_nodemask);
+   cptab->ctb_distance = dist;
+   }
+
+   part = &cptab->ctb_parts[cpt];
+   if (!node_isset(node, *part->cpt_nodemask)) {
+   int cpt2;
+
+   /* first time node is added to this CPT */
+   node_set(node, *part->cpt_nodemask);
+   for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+   struct cfs_cpu_partition *part2;
+   unsigned int dist;
+
+   part2 = &cptab->ctb_parts[cpt2];
+   dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+ part2->cpt_nodemask);
+   part->cpt_distance[cpt2] = dist;
+   dist = cfs_cpt_distance_calculate(part2->cpt_nodemask,
+ part->cpt_nodemask);
+   part2->cpt_distance[cpt] = dist;
+   }
+   }
+}
+
+static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+   struct cfs_cpu_partition *part = &cptab->ctb_parts[cpt];
+   int cpu;
+
+   for_each_cpu(cpu, part->cpt_cpumask) {
+   /* this CPT has other CPU belonging to this node? */
+   if (cpu_to_node(cpu) == node)
+   break;
+   }
+
+   if (cpu >= nr_cpu_ids && node_isset(node,  *part->cpt_nodemask)) {
+   int cpt2;
+
+   /* No more CPUs in the node for this CPT. */
+   node_clear(node, *part->cpt_nodemask);
+   for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+   struct cfs_cpu_partition *part2;
+   unsigned int dist;
+
+   part2 = &cptab->ctb_parts[cpt2];
+   if (node_isset(node, *part2->cpt_nodemask))
+   cptab->ctb_node2cpt[node] = cpt2;
+
+   dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+ part2->cpt_nodemask);
+  

[PATCH v2 05/25] staging: lustre: libcfs: replace MAX_NUMNODES with nr_node_ids

2018-05-29 Thread James Simmons
From: Amir Shehata 

Replace MAX_NUMNODES, which is considered deprecated, with
nr_node_ids. Looking at mm/page_alloc.c you will see that
nr_node_ids defaults to MAX_NUMNODES, and MAX_NUMNODES itself
is actually set up with Kconfig.
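
A small sketch (illustrative only) of the bound the code now checks
against; nr_node_ids is the runtime limit, one past the highest
possible node id, and never exceeds MAX_NUMNODES:

#include <linux/types.h>
#include <linux/nodemask.h>

static bool example_node_id_valid(int node)
{
	return node >= 0 && node < nr_node_ids;
}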

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Same code but added in more details in commit message

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index b67a60c..d3017e8 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -395,7 +395,7 @@ struct cfs_cpt_table *
 {
const cpumask_t *mask;
 
-   if (node < 0 || node >= MAX_NUMNODES) {
+   if (node < 0 || node >= nr_node_ids) {
CDEBUG(D_INFO,
   "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
return 0;
@@ -412,7 +412,7 @@ struct cfs_cpt_table *
 {
const cpumask_t *mask;
 
-   if (node < 0 || node >= MAX_NUMNODES) {
+   if (node < 0 || node >= nr_node_ids) {
CDEBUG(D_INFO,
   "Invalid NUMA id %d for CPU partition %d\n", node, cpt);
return;
@@ -836,7 +836,7 @@ struct cfs_cpt_table *
return cptab;
}
 
-   high = node ? MAX_NUMNODES - 1 : nr_cpu_ids - 1;
+   high = node ? nr_node_ids - 1 : nr_cpu_ids - 1;
 
for (str = strim(pattern), c = 0;; c++) {
struct cfs_range_expr *range;
-- 
1.8.3.1



[PATCH v2 11/25] staging: lustre: libcfs: provide debugfs files for distance handling

2018-05-29 Thread James Simmons
From: Amir Shehata 

On systems with a large number of NUMA nodes and cores it is easy
to configure their use with Lustre incorrectly. Provide debugfs
files which can help track down any issues.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No code changes from original patch

 drivers/staging/lustre/lnet/libcfs/module.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/drivers/staging/lustre/lnet/libcfs/module.c 
b/drivers/staging/lustre/lnet/libcfs/module.c
index b438d456..d2dfc29 100644
--- a/drivers/staging/lustre/lnet/libcfs/module.c
+++ b/drivers/staging/lustre/lnet/libcfs/module.c
@@ -468,6 +468,53 @@ static int proc_cpt_table(struct ctl_table *table, int 
write,
__proc_cpt_table);
 }
 
+static int __proc_cpt_distance(void *data, int write,
+  loff_t pos, void __user *buffer, int nob)
+{
+   char *buf = NULL;
+   int len = 4096;
+   int rc = 0;
+
+   if (write)
+   return -EPERM;
+
+   LASSERT(cfs_cpt_tab);
+
+   while (1) {
+   buf = kzalloc(len, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   rc = cfs_cpt_distance_print(cfs_cpt_tab, buf, len);
+   if (rc >= 0)
+   break;
+
+   if (rc == -EFBIG) {
+   kfree(buf);
+   len <<= 1;
+   continue;
+   }
+   goto out;
+   }
+
+   if (pos >= rc) {
+   rc = 0;
+   goto out;
+   }
+
+   rc = cfs_trace_copyout_string(buffer, nob, buf + pos, NULL);
+out:
+   kfree(buf);
+   return rc;
+}
+
+static int proc_cpt_distance(struct ctl_table *table, int write,
+void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
+   __proc_cpt_distance);
+}
+
 static struct ctl_table lnet_table[] = {
{
.procname = "debug",
@@ -497,6 +544,12 @@ static int proc_cpt_table(struct ctl_table *table, int 
write,
	.proc_handler = &proc_cpt_table,
},
{
+   .procname = "cpu_partition_distance",
+   .maxlen   = 128,
+   .mode = 0444,
+   .proc_handler = &proc_cpt_distance,
+   },
+   {
.procname = "debug_log_upcall",
.data = lnet_debug_log_upcall,
.maxlen   = sizeof(lnet_debug_log_upcall),
-- 
1.8.3.1



[PATCH v2 12/25] staging: lustre: libcfs: invert error handling for cfs_cpt_table_print

2018-05-29 Thread James Simmons
From: Amir Shehata 

Instead of setting rc to -EFBIG for several cases in the loop, let's
just go to the out label on error, which returns -E2BIG directly.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
Changelog:

v1) New patch to replace several patches. Went crazy for the one
change per patch approach.

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 9ff9fe9..bf41ba3 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -190,29 +190,26 @@ struct cfs_cpt_table *
 cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
 {
char *tmp = buf;
-   int rc = 0;
+   int rc;
int i;
int j;
 
for (i = 0; i < cptab->ctb_nparts; i++) {
-   if (len > 0) {
-   rc = snprintf(tmp, len, "%d\t:", i);
-   len -= rc;
-   }
+   if (len <= 0)
+   goto out;
+
+   rc = snprintf(tmp, len, "%d\t:", i);
+   len -= rc;
 
-   if (len <= 0) {
-   rc = -EFBIG;
+   if (len <= 0)
goto out;
-   }
 
tmp += rc;
for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
rc = snprintf(tmp, len, "%d ", j);
len -= rc;
-   if (len <= 0) {
-   rc = -EFBIG;
+   if (len <= 0)
goto out;
-   }
tmp += rc;
}
 
@@ -221,11 +218,9 @@ struct cfs_cpt_table *
len--;
}
 
- out:
-   if (rc < 0)
-   return rc;
-
return tmp - buf;
+out:
+   return -E2BIG;
 }
 EXPORT_SYMBOL(cfs_cpt_table_print);
 
-- 
1.8.3.1



[PATCH v2 15/25] staging: lustre: libcfs: rename i to node for cfs_cpt_set_nodemask

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Rename variable i to node to make code easier to understand.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 14d5791..bac5601 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -575,10 +575,10 @@ void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int 
cpt, int node)
 int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt,
 const nodemask_t *mask)
 {
-   int i;
+   int node;
 
-   for_each_node_mask(i, *mask) {
-   if (!cfs_cpt_set_node(cptab, cpt, i))
+   for_each_node_mask(node, *mask) {
+   if (!cfs_cpt_set_node(cptab, cpt, node))
return 0;
}
 
@@ -589,10 +589,10 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int 
cpt,
 void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt,
const nodemask_t *mask)
 {
-   int i;
+   int node;
 
-   for_each_node_mask(i, *mask)
-   cfs_cpt_unset_node(cptab, cpt, i);
+   for_each_node_mask(node, *mask)
+   cfs_cpt_unset_node(cptab, cpt, node);
 }
 EXPORT_SYMBOL(cfs_cpt_unset_nodemask);
 
-- 
1.8.3.1



[PATCH v2 13/25] staging: lustre: libcfs: fix libcfs_cpu coding style

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

This patch brings the lustre CPT code into alignment with the
Linux kernel coding style.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23304
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

 .../lustre/include/linux/libcfs/libcfs_cpu.h   | 76 --
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 92 --
 2 files changed, 66 insertions(+), 102 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index d5237d0..2c97adf 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -144,8 +144,7 @@ struct cfs_cpt_table {
 /**
  * return total number of CPU partitions in \a cptab
  */
-int
-cfs_cpt_number(struct cfs_cpt_table *cptab);
+int cfs_cpt_number(struct cfs_cpt_table *cptab);
 /**
  * return number of HW cores or hyper-threadings in a CPU partition \a cpt
  */
@@ -207,25 +206,24 @@ void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab,
  * remove all cpus in NUMA node \a node from CPU partition \a cpt
  */
 void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node);
-
 /**
  * add all cpus in node mask \a mask to CPU partition \a cpt
  * return 1 if successfully set all CPUs, otherwise return 0
  */
 int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
-int cpt, nodemask_t *mask);
+int cpt, const nodemask_t *mask);
 /**
  * remove all cpus in node mask \a mask from CPU partition \a cpt
  */
 void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
-   int cpt, nodemask_t *mask);
+   int cpt, const nodemask_t *mask);
 /**
  * convert partition id \a cpt to numa node id, if there are more than one
  * nodes in this partition, it might return a different node id each time.
  */
 int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);
 
-int  cfs_cpu_init(void);
+int cfs_cpu_init(void);
 void cfs_cpu_fini(void);
 
 #else /* !CONFIG_SMP */
@@ -282,32 +280,29 @@ static inline int cfs_cpt_distance_print(struct 
cfs_cpt_table *cptab,
return rc;
 }
 
-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+static inline cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab,
+int cpt)
 {
	return &cptab->ctb_cpumask;
 }
 
-static inline int
-cfs_cpt_number(struct cfs_cpt_table *cptab)
+static inline int cfs_cpt_number(struct cfs_cpt_table *cptab)
 {
return 1;
 }
 
-static inline int
-cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
+static inline int cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
 {
return 1;
 }
 
-static inline int
-cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
+static inline int cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
 {
return 1;
 }
 
-static inline nodemask_t *
-cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
+static inline nodemask_t *cfs_cpt_nodemask(struct cfs_cpt_table *cptab,
+  int cpt)
 {
	return &cptab->ctb_nodemask;
 }
@@ -318,66 +313,61 @@ static inline unsigned int cfs_cpt_distance(struct 
cfs_cpt_table *cptab,
return 1;
 }
 
-static inline int
-cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+static inline int cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt,
+ int cpu)
 {
return 1;
 }
 
-static inline void
-cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+static inline void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt,
+int cpu)
 {
 }
 
-static inline int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
-   const cpumask_t *mask)
+static inline int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
 {
return 1;
 }
 
-static inline void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
- const cpumask_t *mask)
+static inline void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+const cpumask_t *mask)
 {
 }
 
-static inline int
-cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
+static inline int cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt,
+  int node)
 {
return 1;
 }
 
-static inline void
-cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
+static inline void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, i

[PATCH v2 06/25] staging: lustre: libcfs: remove excess space

2018-05-29 Thread James Simmons
From: Amir Shehata 

The function cfs_cpt_table_print() was adding two spaces
to the string buffer. Just add it once.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. Same code

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index d3017e8..d9d1388 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -173,7 +173,7 @@ struct cfs_cpt_table *
 
for (i = 0; i < cptab->ctb_nparts; i++) {
if (len > 0) {
-   rc = snprintf(tmp, len, "%d\t: ", i);
+   rc = snprintf(tmp, len, "%d\t:", i);
len -= rc;
}
 
-- 
1.8.3.1



[PATCH v2 16/25] staging: lustre: libcfs: rename i to cpu for cfs_cpt_bind

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Rename variable i to cpu to make code easier to understand.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index bac5601..1c10529 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -669,8 +669,8 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
 {
cpumask_var_t *cpumask;
nodemask_t *nodemask;
+   int cpu;
int rc;
-   int i;
 
LASSERT(cpt == CFS_CPT_ANY || (cpt >= 0 && cpt < cptab->ctb_nparts));
 
@@ -688,8 +688,8 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
return -EINVAL;
}
 
-   for_each_online_cpu(i) {
-   if (cpumask_test_cpu(i, *cpumask))
+   for_each_online_cpu(cpu) {
+   if (cpumask_test_cpu(cpu, *cpumask))
continue;
 
rc = set_cpus_allowed_ptr(current, *cpumask);
-- 
1.8.3.1



[PATCH v2 20/25] staging: lustre: libcfs: make tolerant to offline CPUs and empty NUMA nodes

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Rework the CPU partition code to make it more tolerant of offline
CPUs and empty NUMA nodes.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 .../lustre/include/linux/libcfs/libcfs_cpu.h   |   2 +
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 132 +
 drivers/staging/lustre/lnet/lnet/lib-msg.c |   2 +
 3 files changed, 60 insertions(+), 76 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 9f4ba9d..c0aa0b3 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -91,6 +91,8 @@ struct cfs_cpu_partition {
unsigned int*cpt_distance;
/* spread rotor for NUMA allocator */
int cpt_spread_rotor;
+   /* NUMA node if cpt_nodemask is empty */
+   int cpt_node;
 };
 #endif /* CONFIG_SMP */
 
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 7f1061e..99a9494 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -457,8 +457,16 @@ int cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, 
int cpu)
return 0;
}
 
-   LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_cpumask));
-   LASSERT(!cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask));
+   if (cpumask_test_cpu(cpu, cptab->ctb_cpumask)) {
+   CDEBUG(D_INFO, "CPU %d is already in cpumask\n", cpu);
+   return 0;
+   }
+
+   if (cpumask_test_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask)) {
+   CDEBUG(D_INFO, "CPU %d is already in partition %d cpumask\n",
+  cpu, cptab->ctb_cpu2cpt[cpu]);
+   return 0;
+   }
 
cfs_cpt_add_cpu(cptab, cpt, cpu);
cfs_cpt_add_node(cptab, cpt, cpu_to_node(cpu));
@@ -527,8 +535,10 @@ void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, 
int cpt,
 {
int cpu;
 
-   for_each_cpu(cpu, mask)
-   cfs_cpt_unset_cpu(cptab, cpt, cpu);
+   for_each_cpu(cpu, mask) {
+   cfs_cpt_del_cpu(cptab, cpt, cpu);
+   cfs_cpt_del_node(cptab, cpt, cpu_to_node(cpu));
+   }
 }
 EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
 
@@ -579,10 +589,8 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int 
cpt,
 {
int node;
 
-   for_each_node_mask(node, *mask) {
-   if (!cfs_cpt_set_node(cptab, cpt, node))
-   return 0;
-   }
+   for_each_node_mask(node, *mask)
+   cfs_cpt_set_node(cptab, cpt, node);
 
return 1;
 }
@@ -603,7 +611,7 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int 
cpt)
nodemask_t *mask;
int weight;
int rotor;
-   int node;
+   int node = 0;
 
/* convert CPU partition ID to HW node id */
 
@@ -613,20 +621,20 @@ int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int 
cpt)
} else {
mask = cptab->ctb_parts[cpt].cpt_nodemask;
rotor = cptab->ctb_parts[cpt].cpt_spread_rotor++;
+   node  = cptab->ctb_parts[cpt].cpt_node;
}
 
weight = nodes_weight(*mask);
-   LASSERT(weight > 0);
-
-   rotor %= weight;
+   if (weight > 0) {
+   rotor %= weight;
 
-   for_each_node_mask(node, *mask) {
-   if (!rotor--)
-   return node;
+   for_each_node_mask(node, *mask) {
+   if (!rotor--)
+   return node;
+   }
}
 
-   LBUG();
-   return 0;
+   return node;
 }
 EXPORT_SYMBOL(cfs_cpt_spread_node);
 
@@ -719,17 +727,21 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table 
*cptab, int cpt,
cpumask_var_t core_mask;
int rc = 0;
int cpu;
+   int i;
 
LASSERT(number > 0);
 
if (number >= cpumask_weight(node_mask)) {
while (!cpumask_empty(node_mask)) {
cpu = cpumask_first(node_mask);
+   cpumask_clear_cpu(cpu, node_mask);
+
+   if (!cpu_online(cpu))
+   continue;
 
rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
if (!rc)
return -EINVAL;
-   cpumask_clear_cpu(cpu, node_mask);
}
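
For context, a hypothetical caller-side sketch (the helper name and
allocation site are illustrative only, not part of the patch) of why
removing the LASSERT()/LBUG() in cfs_cpt_spread_node() above matters:
callers feed the returned node id straight into NUMA-aware allocators,
so a partition with an empty nodemask previously tripped an assertion
instead of simply falling back to the recorded cpt_node.

	/* sketch only: allocate a per-partition buffer on the NUMA node
	 * suggested by the CPT table */
	static void *cpt_alloc_sketch(struct cfs_cpt_table *cptab, int cpt,
				      size_t size)
	{
		return kzalloc_node(size, GFP_NOFS,
				    cfs_cpt_spread_node(cptab, cpt));
	}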

[PATCH v2 21/25] staging: lustre: libcfs: report NUMA node instead of just node

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Reporting "HW nodes" is too generic. It really is reporting
"HW NUMA nodes". Update the debug message.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23306
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Reviewed-by: Patrick Farrell 
Reviewed-by: Olaf Weber 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 99a9494..0fc102c 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -1138,7 +1138,7 @@ int cfs_cpu_init(void)
 
put_online_cpus();
 
-   LCONSOLE(0, "HW nodes: %d, HW CPU cores: %d, npartitions: %d\n",
+   LCONSOLE(0, "HW NUMA nodes: %d, HW CPU cores: %d, npartitions: %d\n",
 num_online_nodes(), num_online_cpus(),
 cfs_cpt_number(cfs_cpt_tab));
return 0;
-- 
1.8.3.1



[PATCH v2 19/25] staging: lustre: libcfs: update debug messages

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

For cfs_cpt_bind() change the CERROR to a CDEBUG. Make the debug
message in cfs_cpt_table_create_pattern() easier to understand.
Report the rc value when cfs_cpt_table_create() fails.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index e12d337..7f1061e 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -480,7 +480,8 @@ void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int 
cpt, int cpu)
/* caller doesn't know the partition ID */
cpt = cptab->ctb_cpu2cpt[cpu];
if (cpt < 0) { /* not set in this CPT-table */
-   CDEBUG(D_INFO, "Try to unset cpu %d which is not in 
CPT-table %p\n",
+   CDEBUG(D_INFO,
+  "Try to unset cpu %d which is not in CPT-table 
%p\n",
   cpt, cptab);
return;
}
@@ -506,7 +507,8 @@ int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int 
cpt,
 
if (!cpumask_weight(mask) ||
cpumask_any_and(mask, cpu_online_mask) >= nr_cpu_ids) {
-   CDEBUG(D_INFO, "No online CPU is found in the CPU mask for CPU 
partition %d\n",
+   CDEBUG(D_INFO,
+  "No online CPU is found in the CPU mask for CPU 
partition %d\n",
   cpt);
return 0;
}
@@ -683,7 +685,8 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
}
 
if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
-   CERROR("No online CPU found in CPU partition %d, did someone do 
CPU hotplug on system? You might need to reload Lustre modules to keep system 
working well.\n",
+   CDEBUG(D_INFO,
+  "No online CPU found in CPU partition %d, did someone do 
CPU hotplug on system? You might need to reload Lustre modules to keep system 
working well.\n",
   cpt);
return -EINVAL;
}
@@ -914,8 +917,8 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
 failed_mask:
free_cpumask_var(node_mask);
 failed:
-   CERROR("Failed to setup CPU-partition-table with %d CPU-partitions, 
online HW nodes: %d, HW cpus: %d.\n",
-  ncpt, num_online_nodes(), num_online_cpus());
+   CERROR("Failed (rc = %d) to setup CPU partition table with %d 
partitions, online HW NUMA nodes: %d, HW CPU cores: %d.\n",
+  rc, ncpt, num_online_nodes(), num_online_cpus());
 
if (cptab)
cfs_cpt_table_free(cptab);
@@ -1030,7 +1033,7 @@ static struct cfs_cpt_table 
*cfs_cpt_table_create_pattern(char *pattern)
 
bracket = strchr(str, ']');
if (!bracket) {
-   CERROR("missing right bracket for cpt %d, %s\n",
+   CERROR("Missing right bracket for partition %d, %s\n",
   cpt, str);
goto failed;
}
-- 
1.8.3.1



[PATCH v2 14/25] staging: lustre: libcfs: use int type for CPT identification.

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Use the int type for CPT identification to match the Linux kernel's
CPU identification.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23304
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch to handle recent libcfs changes

 drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h |  8 
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c  | 14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 2c97adf..9f4ba9d 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -90,7 +90,7 @@ struct cfs_cpu_partition {
/* NUMA distance between CPTs */
unsigned int*cpt_distance;
/* spread rotor for NUMA allocator */
-   unsigned intcpt_spread_rotor;
+   int cpt_spread_rotor;
 };
 #endif /* CONFIG_SMP */
 
@@ -98,11 +98,11 @@ struct cfs_cpu_partition {
 struct cfs_cpt_table {
 #ifdef CONFIG_SMP
/* spread rotor for NUMA allocator */
-   unsigned intctb_spread_rotor;
+   int ctb_spread_rotor;
/* maximum NUMA distance between all nodes in table */
unsigned intctb_distance;
/* # of CPU partitions */
-   unsigned intctb_nparts;
+   int ctb_nparts;
/* partitions tables */
struct cfs_cpu_partition*ctb_parts;
/* shadow HW CPU to CPU partition ID */
@@ -128,7 +128,7 @@ struct cfs_cpt_table {
 /**
  * create a cfs_cpt_table with \a ncpt number of partitions
  */
-struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
+struct cfs_cpt_table *cfs_cpt_table_alloc(int ncpt);
 /**
  * return cpumask of CPU partition \a cpt
  */
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index fab6675..14d5791 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -69,7 +69,7 @@
 module_param(cpu_pattern, charp, 0444);
 MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");
 
-struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt)
+struct cfs_cpt_table *cfs_cpt_table_alloc(int ncpt)
 {
struct cfs_cpt_table *cptab;
int i;
@@ -784,13 +784,13 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table 
*cptab, int cpt,
return rc;
 }
 
-#define CPT_WEIGHT_MIN  4u
+#define CPT_WEIGHT_MIN 4
 
-static unsigned int cfs_cpt_num_estimate(void)
+static int cfs_cpt_num_estimate(void)
 {
-   unsigned int nnode = num_online_nodes();
-   unsigned int ncpu = num_online_cpus();
-   unsigned int ncpt;
+   int nnode = num_online_nodes();
+   int ncpu = num_online_cpus();
+   int ncpt;
 
if (ncpu <= CPT_WEIGHT_MIN) {
ncpt = 1;
@@ -820,7 +820,7 @@ static unsigned int cfs_cpt_num_estimate(void)
/* config many CPU partitions on 32-bit system could consume
 * too much memory
 */
-   ncpt = min(2U, ncpt);
+   ncpt = min(2, ncpt);
 #endif
while (ncpu % ncpt)
ncpt--; /* worst case is 1 */
-- 
1.8.3.1



[PATCH v2 18/25] staging: lustre: libcfs: rename goto label in cfs_cpt_table_print

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Change the goto label from 'out' to 'err'.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index fb27dac..e12d337 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -193,20 +193,20 @@ int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char 
*buf, int len)
 
for (i = 0; i < cptab->ctb_nparts; i++) {
if (len <= 0)
-   goto out;
+   goto err;
 
rc = snprintf(tmp, len, "%d\t:", i);
len -= rc;
 
if (len <= 0)
-   goto out;
+   goto err;
 
tmp += rc;
for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
rc = snprintf(tmp, len, "%d ", j);
len -= rc;
if (len <= 0)
-   goto out;
+   goto err;
tmp += rc;
}
 
@@ -216,7 +216,7 @@ int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char 
*buf, int len)
}
 
return tmp - buf;
-out:
+err:
return -E2BIG;
 }
 EXPORT_SYMBOL(cfs_cpt_table_print);
-- 
1.8.3.1



[PATCH v2 17/25] staging: lustre: libcfs: rename cpumask_var_t variables to *_mask

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Because we handle both CPU masks and core identifiers, the two can
easily be confused. To avoid this, rename the various cpumask_var_t
variables to have *_mask appended to their names.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23222
Reviewed-by: Amir Shehata 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 62 -
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 1c10529..fb27dac 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -710,23 +710,23 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
  * We always prefer to choose CPU in the same core/socket.
  */
 static int cfs_cpt_choose_ncpus(struct cfs_cpt_table *cptab, int cpt,
-   cpumask_t *node, int number)
+   cpumask_t *node_mask, int number)
 {
-   cpumask_var_t socket;
-   cpumask_var_t core;
+   cpumask_var_t socket_mask;
+   cpumask_var_t core_mask;
int rc = 0;
int cpu;
 
LASSERT(number > 0);
 
-   if (number >= cpumask_weight(node)) {
-   while (!cpumask_empty(node)) {
-   cpu = cpumask_first(node);
+   if (number >= cpumask_weight(node_mask)) {
+   while (!cpumask_empty(node_mask)) {
+   cpu = cpumask_first(node_mask);
 
rc = cfs_cpt_set_cpu(cptab, cpt, cpu);
if (!rc)
return -EINVAL;
-   cpumask_clear_cpu(cpu, node);
+   cpumask_clear_cpu(cpu, node_mask);
}
return 0;
}
@@ -736,34 +736,34 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table 
*cptab, int cpt,
 * As we cannot initialize a cpumask_var_t, we need
 * to alloc both before we can risk trying to free either
 */
-   if (!zalloc_cpumask_var(, GFP_NOFS))
+   if (!zalloc_cpumask_var(_mask, GFP_NOFS))
rc = -ENOMEM;
-   if (!zalloc_cpumask_var(, GFP_NOFS))
+   if (!zalloc_cpumask_var(_mask, GFP_NOFS))
rc = -ENOMEM;
if (rc)
goto out;
 
-   while (!cpumask_empty(node)) {
-   cpu = cpumask_first(node);
+   while (!cpumask_empty(node_mask)) {
+   cpu = cpumask_first(node_mask);
 
/* get cpumask for cores in the same socket */
-   cpumask_copy(socket, topology_core_cpumask(cpu));
-   cpumask_and(socket, socket, node);
+   cpumask_copy(socket_mask, topology_core_cpumask(cpu));
+   cpumask_and(socket_mask, socket_mask, node_mask);
 
-   LASSERT(!cpumask_empty(socket));
+   LASSERT(!cpumask_empty(socket_mask));
 
-   while (!cpumask_empty(socket)) {
+   while (!cpumask_empty(socket_mask)) {
int i;
 
/* get cpumask for hts in the same core */
-   cpumask_copy(core, topology_sibling_cpumask(cpu));
-   cpumask_and(core, core, node);
+   cpumask_copy(core_mask, topology_sibling_cpumask(cpu));
+   cpumask_and(core_mask, core_mask, node_mask);
 
-   LASSERT(!cpumask_empty(core));
+   LASSERT(!cpumask_empty(core_mask));
 
-   for_each_cpu(i, core) {
-   cpumask_clear_cpu(i, socket);
-   cpumask_clear_cpu(i, node);
+   for_each_cpu(i, core_mask) {
+   cpumask_clear_cpu(i, socket_mask);
+   cpumask_clear_cpu(i, node_mask);
 
rc = cfs_cpt_set_cpu(cptab, cpt, i);
if (!rc) {
@@ -774,13 +774,13 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table 
*cptab, int cpt,
if (!--number)
goto out;
}
-   cpu = cpumask_first(socket);
+   cpu = cpumask_first(socket_mask);
}
}
 
 out:
-   free_cpumask_var(socket);
-   free_cpumask_var(core);
+   free_cpumask_var(socket_mask);
+   free_cpumask_var(core_mask);
return rc;
 }
 
@@ -831,7 +831,7 @@ static int cfs_cpt_num_estimate(void)
 static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
 {
struct cfs_cpt_table *cptab = NULL;
-   cpumask_v

[PATCH v2 24/25] staging: lustre: libcfs: change CPT estimate algorithm

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

The main idea behind having more CPU partitions is based on KNL
experience. When a thread submits IO for network communication, one
of the threads from the current CPT is used for the network stack.
With high parallelization many threads become involved in network
submission, but with fewer CPU partitions they must wait for a single
thread to process them from the network queue. So with a small number
of CPU partitions the bottleneck simply moves into the network layer.
My experiments showed that the best performance was achieved with one
network thread for each IO thread. This condition can be provided by
having 2 real HW cores (without hyper-threads) per CPT, which is
exactly what this patch implements.

Change the CPT estimate algorithm from 2 * (N - 1)^2 < NCPUS <= 2 * N^2
to 2 HW cores per CPT. This is critical for machines whose core count
is not a power of two.

Current algorithm splits CPTs in KNL:
LNet: HW CPU cores: 272, npartitions: 16
cpu_partition_table=
0   : 0-4,68-71,136-139,204-207
1   : 5-9,73-76,141-144,209-212
2   : 10-14,78-81,146-149,214-217
3   : 15-17,72,77,83-85,140,145,151-153,208,219-221
4   : 18-21,82,86-88,150,154-156,213,218,222-224
5   : 22-26,90-93,158-161,226-229
6   : 27-31,95-98,163-166,231-234
7   : 32-35,89,100-103,168-171,236-239
8   : 36-38,94,99,104-105,157,162,167,172-173,225,230,235,240-241
9   : 39-43,107-110,175-178,243-246
10  : 44-48,112-115,180-183,248-251
11  : 49-51,106,111,117-119,174,179,185-187,242,253-255
12  : 52-55,116,120-122,184,188-190,247,252,256-258
13  : 56-60,124-127,192-195,260-263
14  : 61-65,129-132,197-200,265-268
15  : 66-67,123,128,133-135,191,196,201-203,259,264,269-271

New algorithm will split CPTs in KNL:
LNet: HW CPU cores: 272, npartitions: 34
cpu_partition_table=
0   : 0-1,68-69,136-137,204-205
1   : 2-3,70-71,138-139,206-207
2   : 4-5,72-73,140-141,208-209
3   : 6-7,74-75,142-143,210-211
4   : 8-9,76-77,144-145,212-213
5   : 10-11,78-79,146-147,214-215
6   : 12-13,80-81,148-149,216-217
7   : 14-15,82-83,150-151,218-219
8   : 16-17,84-85,152-153,220-221
9   : 18-19,86-87,154-155,222-223
10  : 20-21,88-89,156-157,224-225
11  : 22-23,90-91,158-159,226-227
12  : 24-25,92-93,160-161,228-229
13  : 26-27,94-95,162-163,230-231
14  : 28-29,96-97,164-165,232-233
15  : 30-31,98-99,166-167,234-235
16  : 32-33,100-101,168-169,236-237
17  : 34-35,102-103,170-171,238-239
18  : 36-37,104-105,172-173,240-241
19  : 38-39,106-107,174-175,242-243
20  : 40-41,108-109,176-177,244-245
21  : 42-43,110-111,178-179,246-247
22  : 44-45,112-113,180-181,248-249
23  : 46-47,114-115,182-183,250-251
24  : 48-49,116-117,184-185,252-253
25  : 50-51,118-119,186-187,254-255
26  : 52-53,120-121,188-189,256-257
27  : 54-55,122-123,190-191,258-259
28  : 56-57,124-125,192-193,260-261
29  : 58-59,126-127,194-195,262-263
30  : 60-61,128-129,196-197,264-265
31  : 62-63,130-131,198-199,266-267
32  : 64-65,132-133,200-201,268-269
33  : 66-67,134-135,202-203,270-271

The 'N' pattern does not always work well on KNL:
in flat mode it produces a single CPT with all CPUs inside.

In SNC-4 mode it produces:
cpu_partition_table=
0   : 0-17,68-85,136-153,204-221
1   : 18-35,86-103,154-171,222-239
2   : 36-51,104-119,172-187,240-255
3   : 52-67,120-135,188-203,256-271

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/24304
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 30 +
 1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index aed48de..ff752d5 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -798,34 +798,14 @@ static int cfs_cpt_choose_ncpus(struct cfs_cpt_table 
*cptab, int cpt,
 
 static int cfs_cpt_num_estimate(void)
 {
-   int nnode = num_online_nodes();
+   int nthr = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));
int ncpu = num_online_cpus();
-   int ncpt;
+   int ncpt = 1;
 
-   if (ncpu <= CPT_WEIGHT_MIN) {
-   ncpt = 1;
-   goto out;
-   }
-
-   /* generate reasonable number of CPU partitions based on total number
-* of CPUs, Preferred N should be power2 and match this condition:
-* 2 * (N - 1)^
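
The hunk above is cut off in this archive. Purely as an illustration of
the "2 HW cores per CPT" estimate described in the commit message (a
sketch with an invented helper name, not the actual patch body):

	/* sketch: one CPU partition per two physical cores, where nthr is
	 * the number of hyper-thread siblings per core */
	static int cpt_estimate_sketch(int ncpu, int nthr)
	{
		int ncpt = 1;

		if (ncpu > nthr * 2)
			ncpt = ncpu / (nthr * 2);

		return ncpt;	/* KNL example above: 272 / (4 * 2) = 34 */
	}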

[PATCH v2 25/25] staging: lustre: ptlrpc: use current CPU instead of hardcoded 0

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Fix a crash that occurs when CPU 0 is disabled.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8710
Reviewed-on: https://review.whamcloud.com/23305
Reviewed-by: Doug Oucharek 
Reviewed-by: Andreas Dilger 
Signed-off-by: James Simmons 
---
Changelog:

v1) New patch to address crash in ptlrpc

 drivers/staging/lustre/lustre/ptlrpc/service.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c 
b/drivers/staging/lustre/lustre/ptlrpc/service.c
index 3fd8c74..8e74a45 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
@@ -421,7 +421,7 @@ static void ptlrpc_at_timer(struct timer_list *t)
 * there are.
 */
/* weight is # of HTs */
-   if (cpumask_weight(topology_sibling_cpumask(0)) > 1) {
+   if 
(cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1) {
/* depress thread factor for hyper-thread */
factor = factor - (factor >> 1) + (factor >> 3);
}
@@ -2221,15 +2221,16 @@ static int ptlrpc_hr_main(void *arg)
struct ptlrpc_hr_thread *hrt = arg;
struct ptlrpc_hr_partition *hrp = hrt->hrt_partition;
LIST_HEAD(replies);
-   char threadname[20];
int rc;
 
-   snprintf(threadname, sizeof(threadname), "ptlrpc_hr%02d_%03d",
-hrp->hrp_cpt, hrt->hrt_id);
unshare_fs_struct();
 
rc = cfs_cpt_bind(ptlrpc_hr.hr_cpt_table, hrp->hrp_cpt);
if (rc != 0) {
+   char threadname[20];
+
+   snprintf(threadname, sizeof(threadname), "ptlrpc_hr%02d_%03d",
+hrp->hrp_cpt, hrt->hrt_id);
CWARN("Failed to bind %s on CPT %d of CPT table %p: rc = %d\n",
  threadname, hrp->hrp_cpt, ptlrpc_hr.hr_cpt_table, rc);
}
@@ -2528,7 +2529,7 @@ int ptlrpc_hr_init(void)
 
init_waitqueue_head(_hr.hr_waitq);
 
-   weight = cpumask_weight(topology_sibling_cpumask(0));
+   weight = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));
 
cfs_percpt_for_each(hrp, i, ptlrpc_hr.hr_partitions) {
hrp->hrp_cpt = i;
-- 
1.8.3.1



[PATCH v2 23/25] staging: lustre: libcfs: rework CPU pattern parsing code

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Currently the module parameter string for the CPU pattern can be
modified, which is wrong. Rewrite the CPU pattern parsing code so that
the passed-in buffer is never changed. This change also lets us
propagate real error codes back to the calling functions.

Signed-off-by: Dmitry Eremin 
Signed-off-by: Amir Shehata 
Signed-off-by: Andreas Dilger 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23306
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9715
Reviewed-on: https://review.whamcloud.com/27872
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Reviewed-by: Patrick Farrell 
Reviewed-by: Olaf Weber 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 .../lustre/include/linux/libcfs/libcfs_cpu.h   |   2 +-
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 146 -
 2 files changed, 87 insertions(+), 61 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index c0aa0b3..12ed0a9 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -393,7 +393,7 @@ static inline int cfs_cpu_init(void)
 
 static inline void cfs_cpu_fini(void)
 {
-   if (cfs_cpt_tab) {
+   if (!IS_ERR_OR_NULL(cfs_cpt_tab)) {
cfs_cpt_table_free(cfs_cpt_tab);
cfs_cpt_tab = NULL;
}
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 649f7f9..aed48de 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -692,11 +692,11 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
}
 
-   if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
+   if (!cpumask_intersects(*cpumask, cpu_online_mask)) {
CDEBUG(D_INFO,
   "No online CPU found in CPU partition %d, did someone do 
CPU hotplug on system? You might need to reload Lustre modules to keep system 
working well.\n",
   cpt);
-   return -EINVAL;
+   return -ENODEV;
}
 
for_each_online_cpu(cpu) {
@@ -860,11 +860,13 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int 
ncpt)
cptab = cfs_cpt_table_alloc(ncpt);
if (!cptab) {
CERROR("Failed to allocate CPU map(%d)\n", ncpt);
+   rc = -ENOMEM;
goto failed;
}
 
if (!zalloc_cpumask_var(_mask, GFP_NOFS)) {
CERROR("Failed to allocate scratch cpumask\n");
+   rc = -ENOMEM;
goto failed;
}
 
@@ -879,8 +881,10 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
 
rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask,
  num - ncpu);
-   if (rc < 0)
+   if (rc < 0) {
+   rc = -EINVAL;
goto failed_mask;
+   }
 
ncpu = cpumask_weight(part->cpt_cpumask);
if (ncpu == num + !!(rem > 0)) {
@@ -903,37 +907,51 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int 
ncpt)
if (cptab)
cfs_cpt_table_free(cptab);
 
-   return NULL;
+   return ERR_PTR(rc);
 }
 
-static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
+static struct cfs_cpt_table *cfs_cpt_table_create_pattern(const char *pattern)
 {
struct cfs_cpt_table *cptab;
+   char *pattern_dup;
+   char *bracket;
char *str;
int node = 0;
-   int high;
int ncpt = 0;
-   int cpt;
+   int cpt = 0;
+   int high;
int rc;
int c;
int i;
 
-   str = strim(pattern);
+   pattern_dup = kstrdup(pattern, GFP_KERNEL);
+   if (!pattern_dup) {
+   CERROR("Failed to duplicate pattern '%s'\n", pattern);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   str = strim(pattern_dup);
if (*str == 'n' || *str == 'N') {
-   pattern = str + 1;
-   if (*pattern != '\0') {
-   node = 1;
-   } else { /* shortcut to create CPT from NUMA & CPU topology */
+   str++; /* skip 'N' char */
+   node = 1; /* NUMA pattern */
+   if (*str == '\0') {
node = -1;
-   ncpt = num_online_nodes();
+   for_each_online_node(i) {
+   if (!cpumask_empty(cpu
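
As a usage illustration (example values only, not taken from the
patch), the libcfs cpu_pattern module parameter that this parser
consumes looks roughly like:

	options libcfs cpu_pattern="0[0-3] 1[4-7]" # partition 0 <- CPUs 0-3, partition 1 <- CPUs 4-7
	options libcfs cpu_pattern="N 0[0] 1[1]"   # with the 'N' prefix the bracketed values are NUMA node ids
	options libcfs cpu_pattern="N"             # shortcut: one partition per online NUMA node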

[PATCH v2 22/25] staging: lustre: libcfs: update debug messages in CPT code

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Update the debug messages for the CPT table creation code. Place
the passed-in string in quotes to make it clear what it is.
Capitalize CPU in the debug strings.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23306
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Reviewed-by: Patrick Farrell 
Reviewed-by: Olaf Weber 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 0fc102c..649f7f9 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -496,7 +496,7 @@ void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int 
cpt, int cpu)
 
} else if (cpt != cptab->ctb_cpu2cpt[cpu]) {
CDEBUG(D_INFO,
-  "CPU %d is not in cpu-partition %d\n", cpu, cpt);
+  "CPU %d is not in CPU partition %d\n", cpu, cpt);
return;
}
 
@@ -940,14 +940,14 @@ static struct cfs_cpt_table 
*cfs_cpt_table_create_pattern(char *pattern)
if (!ncpt ||
(node && ncpt > num_online_nodes()) ||
(!node && ncpt > num_online_cpus())) {
-   CERROR("Invalid pattern %s, or too many partitions %d\n",
+   CERROR("Invalid pattern '%s', or too many partitions %d\n",
   pattern, ncpt);
return NULL;
}
 
cptab = cfs_cpt_table_alloc(ncpt);
if (!cptab) {
-   CERROR("Failed to allocate cpu partition table\n");
+   CERROR("Failed to allocate CPU partition table\n");
return NULL;
}
 
@@ -978,11 +978,11 @@ static struct cfs_cpt_table 
*cfs_cpt_table_create_pattern(char *pattern)
 
if (!bracket) {
if (*str) {
-   CERROR("Invalid pattern %s\n", str);
+   CERROR("Invalid pattern '%s'\n", str);
goto failed;
}
if (c != ncpt) {
-   CERROR("expect %d partitions but found %d\n",
+   CERROR("Expect %d partitions but found %d\n",
   ncpt, c);
goto failed;
}
@@ -990,7 +990,7 @@ static struct cfs_cpt_table 
*cfs_cpt_table_create_pattern(char *pattern)
}
 
if (sscanf(str, "%d%n", , ) < 1) {
-   CERROR("Invalid cpu pattern %s\n", str);
+   CERROR("Invalid CPU pattern '%s'\n", str);
goto failed;
}
 
@@ -1007,20 +1007,20 @@ static struct cfs_cpt_table 
*cfs_cpt_table_create_pattern(char *pattern)
 
str = strim(str + n);
if (str != bracket) {
-   CERROR("Invalid pattern %s\n", str);
+   CERROR("Invalid pattern '%s'\n", str);
goto failed;
}
 
bracket = strchr(str, ']');
if (!bracket) {
-   CERROR("Missing right bracket for partition %d, %s\n",
+   CERROR("Missing right bracket for partition %d in 
'%s'\n",
   cpt, str);
goto failed;
}
 
if (cfs_expr_list_parse(str, (bracket - str) + 1,
0, high, )) {
-   CERROR("Can't parse number range: %s\n", str);
+   CERROR("Can't parse number range in '%s'\n", str);
goto failed;
}
 
-- 
1.8.3.1



[PATCH v2 25/25] staging: lustre: ptlrpc: use current CPU instead of hardcoded 0

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Fix a crash when CPU 0 is disabled.
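
For readers outside the Lustre tree: topology_sibling_cpumask(0) assumes CPU 0
exists and is online, and the sibling mask of an offline CPU 0 cannot be relied
on, which is presumably what triggered the reported crash. The CPU the code is
currently executing on is online by definition, so the HT-weight calculations
below switch to it. A minimal sketch of the change (taken from the diff below):

	/* before: hard-coded CPU 0, breaks when CPU 0 is offline */
	weight = cpumask_weight(topology_sibling_cpumask(0));

	/* after: query the CPU we are running on right now */
	weight = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));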

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8710
Reviewed-on: https://review.whamcloud.com/23305
Reviewed-by: Doug Oucharek 
Reviewed-by: Andreas Dilger 
Signed-off-by: James Simmons 
---
Changelog:

v1) New patch to address crash in ptlrpc

 drivers/staging/lustre/lustre/ptlrpc/service.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c 
b/drivers/staging/lustre/lustre/ptlrpc/service.c
index 3fd8c74..8e74a45 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
@@ -421,7 +421,7 @@ static void ptlrpc_at_timer(struct timer_list *t)
 * there are.
 */
/* weight is # of HTs */
-   if (cpumask_weight(topology_sibling_cpumask(0)) > 1) {
+   if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1) {
/* depress thread factor for hyper-thread */
factor = factor - (factor >> 1) + (factor >> 3);
}
@@ -2221,15 +2221,16 @@ static int ptlrpc_hr_main(void *arg)
struct ptlrpc_hr_thread *hrt = arg;
struct ptlrpc_hr_partition *hrp = hrt->hrt_partition;
LIST_HEAD(replies);
-   char threadname[20];
int rc;
 
-   snprintf(threadname, sizeof(threadname), "ptlrpc_hr%02d_%03d",
-hrp->hrp_cpt, hrt->hrt_id);
unshare_fs_struct();
 
rc = cfs_cpt_bind(ptlrpc_hr.hr_cpt_table, hrp->hrp_cpt);
if (rc != 0) {
+   char threadname[20];
+
+   snprintf(threadname, sizeof(threadname), "ptlrpc_hr%02d_%03d",
+hrp->hrp_cpt, hrt->hrt_id);
CWARN("Failed to bind %s on CPT %d of CPT table %p: rc = %d\n",
  threadname, hrp->hrp_cpt, ptlrpc_hr.hr_cpt_table, rc);
}
@@ -2528,7 +2529,7 @@ int ptlrpc_hr_init(void)
 
init_waitqueue_head(&ptlrpc_hr.hr_waitq);
 
-   weight = cpumask_weight(topology_sibling_cpumask(0));
+   weight = cpumask_weight(topology_sibling_cpumask(smp_processor_id()));
 
cfs_percpt_for_each(hrp, i, ptlrpc_hr.hr_partitions) {
hrp->hrp_cpt = i;
-- 
1.8.3.1



[PATCH v2 23/25] staging: lustre: libcfs: rework CPU pattern parsing code

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

Currently the module parameter string for the CPU pattern can be
modified, which is wrong. Rewrite the CPU pattern parsing code so
that the passed-in buffer is never changed. This change also lets
us add real error propagation to the caller functions.
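
The core of the rework, visible in the diff below, is to parse a private copy
of the string instead of the module parameter buffer itself. A rough sketch of
the pattern (error paths trimmed, names as in the patch):

	static struct cfs_cpt_table *cfs_cpt_table_create_pattern(const char *pattern)
	{
		struct cfs_cpt_table *cptab = NULL;
		char *pattern_dup;
		char *str;

		/* duplicate so the module_param() buffer is never written to */
		pattern_dup = kstrdup(pattern, GFP_KERNEL);
		if (!pattern_dup)
			return ERR_PTR(-ENOMEM);	/* real error propagation */

		str = strim(pattern_dup);	/* trims the copy, not the parameter */
		/* ... parse str, build cptab ... */

		kfree(pattern_dup);
		return cptab;
	}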

Signed-off-by: Dmitry Eremin 
Signed-off-by: Amir Shehata 
Signed-off-by: Andreas Dilger 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23306
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9715
Reviewed-on: https://review.whamcloud.com/27872
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Reviewed-by: Patrick Farrell 
Reviewed-by: Olaf Weber 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased patch. No changes in code from earlier patch

 .../lustre/include/linux/libcfs/libcfs_cpu.h   |   2 +-
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 146 -
 2 files changed, 87 insertions(+), 61 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index c0aa0b3..12ed0a9 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -393,7 +393,7 @@ static inline int cfs_cpu_init(void)
 
 static inline void cfs_cpu_fini(void)
 {
-   if (cfs_cpt_tab) {
+   if (!IS_ERR_OR_NULL(cfs_cpt_tab)) {
cfs_cpt_table_free(cfs_cpt_tab);
cfs_cpt_tab = NULL;
}
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 649f7f9..aed48de 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -692,11 +692,11 @@ int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
nodemask = cptab->ctb_parts[cpt].cpt_nodemask;
}
 
-   if (cpumask_any_and(*cpumask, cpu_online_mask) >= nr_cpu_ids) {
+   if (!cpumask_intersects(*cpumask, cpu_online_mask)) {
CDEBUG(D_INFO,
   "No online CPU found in CPU partition %d, did someone do 
CPU hotplug on system? You might need to reload Lustre modules to keep system 
working well.\n",
   cpt);
-   return -EINVAL;
+   return -ENODEV;
}
 
for_each_online_cpu(cpu) {
@@ -860,11 +860,13 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int 
ncpt)
cptab = cfs_cpt_table_alloc(ncpt);
if (!cptab) {
CERROR("Failed to allocate CPU map(%d)\n", ncpt);
+   rc = -ENOMEM;
goto failed;
}
 
if (!zalloc_cpumask_var(&node_mask, GFP_NOFS)) {
CERROR("Failed to allocate scratch cpumask\n");
+   rc = -ENOMEM;
goto failed;
}
 
@@ -879,8 +881,10 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int ncpt)
 
rc = cfs_cpt_choose_ncpus(cptab, cpt, node_mask,
  num - ncpu);
-   if (rc < 0)
+   if (rc < 0) {
+   rc = -EINVAL;
goto failed_mask;
+   }
 
ncpu = cpumask_weight(part->cpt_cpumask);
if (ncpu == num + !!(rem > 0)) {
@@ -903,37 +907,51 @@ static struct cfs_cpt_table *cfs_cpt_table_create(int 
ncpt)
if (cptab)
cfs_cpt_table_free(cptab);
 
-   return NULL;
+   return ERR_PTR(rc);
 }
 
-static struct cfs_cpt_table *cfs_cpt_table_create_pattern(char *pattern)
+static struct cfs_cpt_table *cfs_cpt_table_create_pattern(const char *pattern)
 {
struct cfs_cpt_table *cptab;
+   char *pattern_dup;
+   char *bracket;
char *str;
int node = 0;
-   int high;
int ncpt = 0;
-   int cpt;
+   int cpt = 0;
+   int high;
int rc;
int c;
int i;
 
-   str = strim(pattern);
+   pattern_dup = kstrdup(pattern, GFP_KERNEL);
+   if (!pattern_dup) {
+   CERROR("Failed to duplicate pattern '%s'\n", pattern);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   str = strim(pattern_dup);
if (*str == 'n' || *str == 'N') {
-   pattern = str + 1;
-   if (*pattern != '\0') {
-   node = 1;
-   } else { /* shortcut to create CPT from NUMA & CPU topology */
+   str++; /* skip 'N' char */
+   node = 1; /* NUMA pattern */
+   if (*str == '\0') {
node = -1;
-   ncpt = num_online_nodes();
+   for_each_online_node(i) {
+   if (!cpumask_empty(cpu

[PATCH v2 5/6] staging: lustre: mdc: excessive memory consumption by the xattr cache

2018-05-29 Thread James Simmons
From: Andrew Perepechko 

The refill operation of the xattr cache does not know the
reply size in advance, so it makes a guess based on
the maxeasize value returned by the MDS.

In practice, it allocates 16 KiB for the common case and
4 MiB for the large xattr case. However, a typical reply
is just a few hundred bytes.

If we follow the conservative approach, we can prepare a
single memory page for the reply. It is large enough for
any reasonable xattr set and, at the same time, it does
not require multiple page memory reclaim, which can be
costly.

If, for a specific file, the reply is larger than a single
page, the client is prepared to handle that and will fall back
to non-cached xattr code. Indeed, if this happens often and
xattrs are often used to store large values, it makes sense to
disable the xattr cache entirely, since it wasn't designed for
such [mis]use.
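
As a rough sanity check on the single-page claim: with the defaults introduced
in the diff below (GA_DEFAULT_EA_NAME_LEN = 20, GA_DEFAULT_EA_VAL_LEN = 250,
GA_DEFAULT_EA_NUM = 10), the per-entry reservations work out to about
10 * 20 + 10 * 250 + 10 * 4 = 200 + 2500 + 40 = 2740 bytes, comfortably inside
one 4 KiB page, versus the 16 KiB (or 4 MiB) worst-case guesses used before.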

Signed-off-by: Andrew Perepechko 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9417
Reviewed-on: https://review.whamcloud.com/26887
Reviewed-by: Fan Yong 
Reviewed-by: Ben Evans 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) rebased patch. No changes

 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c 
b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 65a5341..a8aa0fa 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -315,6 +315,10 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
return req;
 }
 
+#define GA_DEFAULT_EA_NAME_LEN 20
+#define GA_DEFAULT_EA_VAL_LEN  250
+#define GA_DEFAULT_EA_NUM  10
+
 static struct ptlrpc_request *
 mdc_intent_getxattr_pack(struct obd_export *exp,
 struct lookup_intent *it,
@@ -323,7 +327,6 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
struct ptlrpc_request   *req;
struct ldlm_intent  *lit;
int rc, count = 0;
-   u32 maxdata;
LIST_HEAD(cancels);
 
req = ptlrpc_request_alloc(class_exp2cliimp(exp),
@@ -341,20 +344,20 @@ static void mdc_realloc_openmsg(struct ptlrpc_request 
*req,
lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
lit->opc = IT_GETXATTR;
 
-   maxdata = class_exp2cliimp(exp)->imp_connect_data.ocd_max_easize;
-
/* pack the intended request */
-   mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid, maxdata, -1,
 - 0);
+   mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid,
+ GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM, -1, 0);

-   req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER, maxdata);
+   req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER,
+GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);

-   req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER, maxdata);
+   req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER,
+GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);

-   req_capsule_set_size(&req->rq_pill, &RMF_EAVALS_LENS,
-RCL_SERVER, maxdata);
+   req_capsule_set_size(&req->rq_pill, &RMF_EAVALS_LENS, RCL_SERVER,
+sizeof(u32) * GA_DEFAULT_EA_NUM);

-   req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, maxdata);
+   req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0);
 
ptlrpc_request_set_replen(req);
 
-- 
1.8.3.1



[PATCH v2 02/25] staging: lustre: libcfs: remove useless CPU partition code

2018-05-29 Thread James Simmons
From: Dmitry Eremin 

* remove the scratch buffer and the mutex which guards it.
* remove the global cpumask and the spinlock which guards it.
* remove cpt_version, which was used to detect CPU state changes
  during setup; CPU state changes are now simply disabled while
  the table is set up.
* remove the whole global struct cfs_cpt_data cpt_data.
* remove a few unused APIs.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23303
Reviewed-on: https://review.whamcloud.com/25048
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: Andreas Dilger 
Reviewed-by: Olaf Weber 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch
v2) Rebased to handle recent cleanups in libcfs

 .../lustre/include/linux/libcfs/libcfs_cpu.h   |  32 ++
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 115 +++--
 2 files changed, 22 insertions(+), 125 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h 
b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 2ad12a6..3626969 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -95,8 +95,6 @@ struct cfs_cpu_partition {
 /** descriptor for CPU partitions */
 struct cfs_cpt_table {
 #ifdef CONFIG_SMP
-   /* version, reserved for hotplug */
-   unsigned intctb_version;
/* spread rotor for NUMA allocator */
unsigned intctb_spread_rotor;
/* # of CPU partitions */
@@ -176,12 +174,12 @@ struct cfs_cpt_table {
  * return 1 if successfully set all CPUs, otherwise return 0
  */
 int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab,
-   int cpt, cpumask_t *mask);
+   int cpt, const cpumask_t *mask);
 /**
  * remove all cpus in \a mask from CPU partition \a cpt
  */
 void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab,
-  int cpt, cpumask_t *mask);
+  int cpt, const cpumask_t *mask);
 /**
  * add all cpus in NUMA node \a node to CPU partition \a cpt
  * return 1 if successfully set all CPUs, otherwise return 0
@@ -204,20 +202,11 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
 void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
int cpt, nodemask_t *mask);
 /**
- * unset all cpus for CPU partition \a cpt
- */
-void cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt);
-/**
  * convert partition id \a cpt to numa node id, if there are more than one
  * nodes in this partition, it might return a different node id each time.
  */
 int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);
 
-/**
- * return number of HTs in the same core of \a cpu
- */
-int cfs_cpu_ht_nsiblings(int cpu);
-
 int  cfs_cpu_init(void);
 void cfs_cpu_fini(void);
 
@@ -304,13 +293,15 @@ static inline int cfs_cpt_table_print(struct 
cfs_cpt_table *cptab,
 }
 
 static inline int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt,
+   const cpumask_t *mask)
 {
return 1;
 }
 
 static inline void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
 {
 }
 
@@ -336,11 +327,6 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table 
*cptab,
 {
 }
 
-static inline void
-cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
-{
-}
-
 static inline int
 cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
 {
@@ -348,12 +334,6 @@ static inline int cfs_cpt_table_print(struct cfs_cpt_table 
*cptab,
 }
 
 static inline int
-cfs_cpu_ht_nsiblings(int cpu)
-{
-   return 1;
-}
-
-static inline int
 cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
 {
return 0;
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c 
b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 803fc58..951a9ca 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -69,19 +69,6 @@
 module_param(cpu_pattern, charp, 0444);
 MODULE_PARM_DESC(cpu_pattern, "CPU partitions pattern");
 
-static struct cfs_cpt_data {
-   /* serialize hotplug etc */
-   spinlock_t  cpt_lock;
-   /* reserved for hotplug */
-   unsigned long   cpt_version;
-   /* mutex to protect cpt_cpumask */
-   struct mutexcpt_mutex;
-   /* scratch buffer for set/unset_node */
-   cpumask_var_t   cpt_cpumask;
-} cpt_data;
-
-#define CFS_CPU_VERSION_MAGIC 0xbabecafe
-
 struct cfs_cpt_table *
 cfs_cpt_table_alloc(unsigned int ncpt)
 {
@@ -124,11 +111,6 @@ struct cfs_cpt_table *
goto failed;
}
 
-   spin_lock(_dat

[PATCH v2 0/6] staging: lustre: llite: remaining xattr patches

2018-05-29 Thread James Simmons
From: James Simmons 

Fixed the bugs in the set_acl patch pointed out by Dan Carpenter.
Rebased the next patch 'remove unused parameters...' on the parent
patch. Created a new acl.c file to match what other Linux kernel file
systems do. Added newer xattr fixes that were recently pushed.

Andrew Perepechko (1):
  staging: lustre: mdc: excessive memory consumption by the xattr cache

Dmitry Eremin (1):
  staging: lustre: llite: add support set_acl method in inode operations

Fan Yong (1):
  staging: lustre: acl: increase ACL entries limitation

James Simmons (1):
  staging: lustre: llite: create acl.c file

John L. Hammond (2):
  staging: lustre: llite: remove unused parameters from md_{get,set}xattr()
  staging: lustre: mdc: use large xattr buffers for old servers

 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |   2 +-
 drivers/staging/lustre/lustre/include/lustre_acl.h |   7 +-
 drivers/staging/lustre/lustre/include/obd.h|   7 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  21 ++--
 drivers/staging/lustre/lustre/llite/Makefile   |   2 +
 drivers/staging/lustre/lustre/llite/acl.c  | 108 +
 drivers/staging/lustre/lustre/llite/file.c |  16 +--
 .../staging/lustre/lustre/llite/llite_internal.h   |   7 ++
 drivers/staging/lustre/lustre/llite/llite_lib.c|   3 +-
 drivers/staging/lustre/lustre/llite/xattr.c|   6 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c|  22 ++---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  |  42 ++--
 drivers/staging/lustre/lustre/mdc/mdc_reint.c  |   2 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c|  38 +---
 drivers/staging/lustre/lustre/ptlrpc/layout.c  |   4 +-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c|   4 +-
 16 files changed, 214 insertions(+), 77 deletions(-)
 create mode 100644 drivers/staging/lustre/lustre/llite/acl.c
--
Changelog:

v1) Initial patch set with fixes to address issues pointed by Dan.
v2) Created new acl.c file and rebased the patches due to that change

-- 
1.8.3.1



Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

2018-05-16 Thread James Simmons

> > > Anyway, I understand that Intel has been ignoring kernel.org instead of
> > > sending forwarding their patches properly so you're doing a difficult
> > > and thankless job...  Thanks for that.  I'm sure it's frustrating to
> > > look at these patches for you as well.
> > 
> > Thank you for the complement. Also thank you for taking time to review
> > these patches. Your feedback is most welcomed and benefitical to the
> > health of the lustre client.
> > 
> > Sadly its not just Intel but other vendors that don't directly contribute
> > to the linux lustre client. I have spoke to the vendors about contributing 
> > and they all say the same thing. No working with drivers in the staging 
> > tree. Sadly all the parties involved are very interested in the success 
> > of the lustre client. No one has ever told me directly why they don't get
> > involved but I suspect it has to deal with 2 reasons. One is that staging
> > drivers are not normally enabled by distributions so their clients 
> > normally will never deal with the staging lustre client. Secondly vendors
> > just lack the man power to contribute in a meanful way.
> 
> If staging is hurting you, why is it in staging at all?  Why not just
> drop it, go off and spend a few months to clean up all the issues in
> your own tree (with none of those pesky requirements of easy-to-review
> patches) and then submit a "clean" filesystem for inclusion in the
> "real" part of the kernel tree?
> 
> It doesn't sound like anyone is actually using this code in the tree
> as-is, so why even keep it here?

I never said being in staging is hurting the progression of Lustre. In
fact it is the exact opposite, otherwise I wouldn't be active in this work.
What I was pointing out to Dan was that many vendors are reluctant to
participate in broader open source development of this type.

The whole point of this is to evolve Lustre into a proper open source 
project not dependent on vendors for survival. Several years ago Lustre 
changed hands several times and the HPC community was worried about its
survival. Various institutions banded together to raise the resources to
keep it alive. Over time Lustre has been migrating to a more open source 
community effort. An awesome example is the work the University of Indiana 
did for the sptlrpc layer. Now we see efforts expanding into the realm of 
the linux lustre client. Actually HPC sites that are community members are 
testing and running the linux client. In spite of the lack of vendor 
involvement the linux lustre client is making excellent progress. How 
often do you see style patches anymore? The headers are properly split
between userspace UAPI headers and kernel space. One of the major barriers
to leaving staging was the lack of a strong presence to continue moving
the lustre client forward. That is no longer the case. The finish line is
in view.


Re: [PATCH 4/4] staging: lustre: obdclass: change object lookup to no wait mode

2018-05-15 Thread James Simmons

> > /*
> >  * Allocate new object. This may result in rather complicated
> >  * operations, including fld queries, inode loading, etc.
> >  */
> > o = lu_object_alloc(env, dev, f, conf);
> > -   if (IS_ERR(o))
> > +   if (unlikely(IS_ERR(o)))
> > return o;
> >  
> 
> This is an unrelated and totally pointless.  likely/unlikely annotations
> hurt readability, and they should only be added if it's something which
> is going to show up in benchmarking.  lu_object_alloc() is already too
> slow for the unlikely() to make a difference and anyway IS_ERR() has an
> unlikely built in so it's duplicative...

Sounds like a good checkpatch case to test for :-) Some people like to try
and milk every cycle they can. Personally, I never use those
annotations; with modern processors I'm skeptical of their benefits.
I do clean up the patches to some extent to make them compliant with
kernel standards, but leave the core code in place for people to comment
on.
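
For context, IS_ERR() in mainline already has the hint built in, which is why
wrapping it again buys nothing; roughly, from include/linux/err.h of that era:

	#define MAX_ERRNO	4095
	#define IS_ERR_VALUE(x)	unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

	static inline bool __must_check IS_ERR(const void *ptr)
	{
		return IS_ERR_VALUE((unsigned long)ptr);
	}

so "if (unlikely(IS_ERR(o)))" generates the same branch hint as plain
"if (IS_ERR(o))".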

> Anyway, I understand that Intel has been ignoring kernel.org instead of
> sending forwarding their patches properly so you're doing a difficult
> and thankless job...  Thanks for that.  I'm sure it's frustrating to
> look at these patches for you as well.

Thank you for the compliment. Also thank you for taking the time to review
these patches. Your feedback is most welcome and beneficial to the
health of the lustre client.

Sadly it's not just Intel but other vendors that don't directly contribute
to the linux lustre client. I have spoken to the vendors about contributing
and they all say the same thing: no working with drivers in the staging
tree. Sadly, all the parties involved are very interested in the success
of the lustre client. No one has ever told me directly why they don't get
involved, but I suspect it comes down to two reasons. One is that staging
drivers are not normally enabled by distributions, so their clients
normally will never deal with the staging lustre client. Secondly, vendors
just lack the manpower to contribute in a meaningful way.


Re: [PATCH v2 1/5] staging: lustre: llite: add support set_acl method in inode operations

2018-05-15 Thread James Simmons

> On Mon, May 14 2018, James Simmons wrote:
> 
> > From: Dmitry Eremin 
> >
> > Linux kernel v3.14 adds set_acl method to inode operations.
> > This patch adds support to Lustre for proper acl management.
> >
> > Signed-off-by: Dmitry Eremin 
> > Signed-off-by: John L. Hammond 
> > Signed-off-by: James Simmons 
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
> > Reviewed-on: https://review.whamcloud.com/25965
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10541
> > Reviewed-on: https://review.whamcloud.com/31588
> > Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10926
> > Reviewed-on: https://review.whamcloud.com/32045
> > Reviewed-by: Bob Glossman 
> > Reviewed-by: James Simmons 
> > Reviewed-by: Andreas Dilger 
> > Reviewed-by: Dmitry Eremin 
> > Reviewed-by: Oleg Drokin 
> > Signed-off-by: James Simmons 
> > ---
> > Changelog:
> >
> > v1) Initial patch ported to staging tree
> > v2) Fixed up goto handling and avoid BUG() when calling
> > forget_cached_acl()with invalid type as pointed out by Dan Carpenter
> >
> >  drivers/staging/lustre/lustre/llite/file.c | 62 
> > ++
> >  .../staging/lustre/lustre/llite/llite_internal.h   |  4 ++
> >  drivers/staging/lustre/lustre/llite/namei.c| 10 +++-
> >  3 files changed, 74 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/staging/lustre/lustre/llite/file.c 
> > b/drivers/staging/lustre/lustre/llite/file.c
> > index 0026fde..64a5698 100644
> > --- a/drivers/staging/lustre/lustre/llite/file.c
> > +++ b/drivers/staging/lustre/lustre/llite/file.c
> > @@ -3030,6 +3030,7 @@ static int ll_fiemap(struct inode *inode, struct 
> > fiemap_extent_info *fieinfo,
> > return rc;
> >  }
> >  
> > +#ifdef CONFIG_FS_POSIX_ACL
> 
> Using #ifdef in  .c files is generally discouraged.
> The "standard" approach here is:
> - put the acl code in a separate file (acl.c)
> - optionally include it via the Make file
>  lustre-$(CONFIG_FS_POSIX_ACL) += acl.o
> 
> - in the header where ll_get_acl and ll_set_acl are declared have
>  #ifdef CONFIG_FS_POSIX_ACL
>declare the functions
>  #else
>  #define ll_get_acl NULL
>  #define ll_set_acl NULL
>  #endif
> 
> Now as this is staging and that is (presumably) an upstream patch
> lightly improved it is probably fine to include the patch as-is,
> but in that case we will probably want to fix it up later.

So you want Lustre to be like everyone else :-)
Okay I will fix up and send a new patch series.
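
Spelled out, the pattern being suggested above looks roughly like this (a
sketch using the ll_get_acl/ll_set_acl names from the patch, not the final
code):

	# llite/Makefile
	lustre-$(CONFIG_FS_POSIX_ACL) += acl.o

	/* llite_internal.h */
	#ifdef CONFIG_FS_POSIX_ACL
	struct posix_acl *ll_get_acl(struct inode *inode, int type);
	int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type);
	#else
	#define ll_get_acl NULL
	#define ll_set_acl NULL
	#endif

	/* file.c: the inode_operations initializers then need no #ifdef */
	.get_acl	= ll_get_acl,
	.set_acl	= ll_set_acl,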

> Thanks,
> NeilBrown
> 
> >  struct posix_acl *ll_get_acl(struct inode *inode, int type)
> >  {
> > struct ll_inode_info *lli = ll_i2info(inode);
> > @@ -3043,6 +3044,64 @@ struct posix_acl *ll_get_acl(struct inode *inode, 
> > int type)
> > return acl;
> >  }
> >  
> > +int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > +{
> > +   struct ll_sb_info *sbi = ll_i2sbi(inode);
> > +   struct ptlrpc_request *req = NULL;
> > +   const char *name = NULL;
> > +   size_t value_size = 0;
> > +   char *value = NULL;
> > +   int rc;
> > +
> > +   switch (type) {
> > +   case ACL_TYPE_ACCESS:
> > +   name = XATTR_NAME_POSIX_ACL_ACCESS;
> > +   if (acl)
> > +   rc = posix_acl_update_mode(inode, >i_mode, );
> > +   break;
> > +
> > +   case ACL_TYPE_DEFAULT:
> > +   name = XATTR_NAME_POSIX_ACL_DEFAULT;
> > +   if (!S_ISDIR(inode->i_mode))
> > +   rc = acl ? -EACCES : 0;
> > +   break;
> > +
> > +   default:
> > +   rc = -EINVAL;
> > +   break;
> > +   }
> > +   if (rc)
> > +   return rc;
> > +
> > +   if (acl) {
> > +   value_size = posix_acl_xattr_size(acl->a_count);
> > +   value = kmalloc(value_size, GFP_NOFS);
> > +   if (!value) {
> > +   rc = -ENOMEM;
> > +   goto out;
> > +   }
> > +
> > +   rc = posix_acl_to_xattr(_user_ns, acl, value, value_size);
> > +   if (rc < 0)
> > +   goto out_value;
> > +   }
> > +
> > +   rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
> > +value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
> > +name, value, value_size, 0, 0, 0, );
> > +
> > +   ptlrpc_req_finished(req);
> > +out_value:

Re: [PATCH v2 1/5] staging: lustre: llite: add support set_acl method in inode operations

2018-05-15 Thread James Simmons

> On Mon, May 14, 2018 at 10:16:59PM -0400, James Simmons wrote:
> > +#ifdef CONFIG_FS_POSIX_ACL
> >  struct posix_acl *ll_get_acl(struct inode *inode, int type)
> >  {
> > struct ll_inode_info *lli = ll_i2info(inode);
> > @@ -3043,6 +3044,64 @@ struct posix_acl *ll_get_acl(struct inode *inode, 
> > int type)
> > return acl;
> >  }
> >  
> > +int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> > +{
> > +   struct ll_sb_info *sbi = ll_i2sbi(inode);
> > +   struct ptlrpc_request *req = NULL;
> > +   const char *name = NULL;
> > +   size_t value_size = 0;
> > +   char *value = NULL;
> > +   int rc;
> 
> "rc" needs to be initialized to zero.  It's disapppointing that GCC
> doesn't catch this.

Thanks Dan. Will fix.
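
(The obvious follow-up, not shown in this thread, is a one-line change in
ll_set_acl() so the paths that never assign rc fall through cleanly:

	-	int rc;
	+	int rc = 0;

which covers ACL_TYPE_ACCESS with a NULL acl and ACL_TYPE_DEFAULT on
directories.)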
 
> > +
> > +   switch (type) {
> > +   case ACL_TYPE_ACCESS:
> > +   name = XATTR_NAME_POSIX_ACL_ACCESS;
> > +   if (acl)
> > +   rc = posix_acl_update_mode(inode, >i_mode, );
> > +   break;
> > +
> > +   case ACL_TYPE_DEFAULT:
> > +   name = XATTR_NAME_POSIX_ACL_DEFAULT;
> > +   if (!S_ISDIR(inode->i_mode))
> > +   rc = acl ? -EACCES : 0;
> > +   break;
> > +
> > +   default:
> > +   rc = -EINVAL;
> > +   break;
> > +   }
> > +   if (rc)
> > +   return rc;
> 
> Otherwise rc can be uninitialized here.
> 
> regards,
> dan carpenter
> 
> 


[PATCH v2 1/5] staging: lustre: llite: add support set_acl method in inode operations

2018-05-14 Thread James Simmons
From: Dmitry Eremin 

Linux kernel v3.14 added the set_acl method to inode operations.
This patch adds support to Lustre for proper ACL management.
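
For context (not part of the patch): once .set_acl is present in
inode_operations, the generic POSIX ACL xattr handler and posix_acl_chmod()
route ACL updates through it; without it the VFS simply refuses, roughly:

	/* fs/posix_acl.c, paraphrased */
	if (!inode->i_op->set_acl)
		return -EOPNOTSUPP;
	...
	return inode->i_op->set_acl(inode, acl, type);

which is why a filesystem needs to provide .set_acl for ACL updates to keep
working once the generic handlers are in use.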

Signed-off-by: Dmitry Eremin 
Signed-off-by: John L. Hammond 
Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/25965
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10541
Reviewed-on: https://review.whamcloud.com/31588
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10926
Reviewed-on: https://review.whamcloud.com/32045
Reviewed-by: Bob Glossman 
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Reviewed-by: Dmitry Eremin 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch ported to staging tree
v2) Fixed up goto handling and avoid BUG() when calling
forget_cached_acl()with invalid type as pointed out by Dan Carpenter

 drivers/staging/lustre/lustre/llite/file.c | 62 ++
 .../staging/lustre/lustre/llite/llite_internal.h   |  4 ++
 drivers/staging/lustre/lustre/llite/namei.c| 10 +++-
 3 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c 
b/drivers/staging/lustre/lustre/llite/file.c
index 0026fde..64a5698 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3030,6 +3030,7 @@ static int ll_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
return rc;
 }
 
+#ifdef CONFIG_FS_POSIX_ACL
 struct posix_acl *ll_get_acl(struct inode *inode, int type)
 {
struct ll_inode_info *lli = ll_i2info(inode);
@@ -3043,6 +3044,64 @@ struct posix_acl *ll_get_acl(struct inode *inode, int 
type)
return acl;
 }
 
+int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+{
+   struct ll_sb_info *sbi = ll_i2sbi(inode);
+   struct ptlrpc_request *req = NULL;
+   const char *name = NULL;
+   size_t value_size = 0;
+   char *value = NULL;
+   int rc;
+
+   switch (type) {
+   case ACL_TYPE_ACCESS:
+   name = XATTR_NAME_POSIX_ACL_ACCESS;
+   if (acl)
+			rc = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+   break;
+
+   case ACL_TYPE_DEFAULT:
+   name = XATTR_NAME_POSIX_ACL_DEFAULT;
+   if (!S_ISDIR(inode->i_mode))
+   rc = acl ? -EACCES : 0;
+   break;
+
+   default:
+   rc = -EINVAL;
+   break;
+   }
+   if (rc)
+   return rc;
+
+   if (acl) {
+   value_size = posix_acl_xattr_size(acl->a_count);
+   value = kmalloc(value_size, GFP_NOFS);
+   if (!value) {
+   rc = -ENOMEM;
+   goto out;
+   }
+
+		rc = posix_acl_to_xattr(&init_user_ns, acl, value, value_size);
+   if (rc < 0)
+   goto out_value;
+   }
+
+   rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
+value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
+			 name, value, value_size, 0, 0, 0, &req);
+
+   ptlrpc_req_finished(req);
+out_value:
+   kfree(value);
+out:
+   if (rc)
+   forget_cached_acl(inode, type);
+   else
+   set_cached_acl(inode, type, acl);
+   return rc;
+}
+#endif /* CONFIG_FS_POSIX_ACL */
+
 int ll_inode_permission(struct inode *inode, int mask)
 {
struct ll_sb_info *sbi;
@@ -3164,7 +3223,10 @@ int ll_inode_permission(struct inode *inode, int mask)
.permission = ll_inode_permission,
.listxattr  = ll_listxattr,
.fiemap = ll_fiemap,
+#ifdef CONFIG_FS_POSIX_ACL
.get_acl= ll_get_acl,
+   .set_acl= ll_set_acl,
+#endif
 };
 
 /* dynamic ioctl number support routines */
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h 
b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 6504850..2280327 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -754,7 +754,11 @@ enum ldlm_mode ll_take_md_lock(struct inode *inode, __u64 
bits,
 int ll_md_real_close(struct inode *inode, fmode_t fmode);
 int ll_getattr(const struct path *path, struct kstat *stat,
   u32 request_mask, unsigned int flags);
+#ifdef CONFIG_FS_POSIX_ACL
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
+int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+#endif /* CONFIG_FS_POSIX_ACL */
+
 int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
   const char *name, int namelen);
 int ll_get_fid_by_name(struct inode *parent, const char *name,
diff --git a/drivers/staging/lustre/lustre/llite/namei.c 
b/drivers/staging/lustre/lustre/llite/namei.c
index 9ac7f09..b41f189 100644
--- a/drivers

[PATCH 3/5] staging: lustre: acl: increase ACL entries limitation

2018-05-14 Thread James Simmons
From: Fan Yong 

Originally, the limit on ACL entries was 32, which is not
enough for some use cases. The ACL entry count is restricted
mainly so that the RPC reply buffer that receives the ACL
data can be sized in advance. So we cannot make the ACL entry
count unlimited, but we can enlarge the RPC reply buffer to
hold more entries. On the other hand, the MDT backend
filesystem has its own EA size limit: for ldiskfs with large
EAs enabled the maximum ACL size is 1048492 bytes, otherwise
it is 4012 bytes; for a ZFS backend it is 32768 bytes. From
that hard limit we can calculate the maximum number of ACL
entries. This patch enlarges the RPC reply buffer to match
that limit. To avoid overflowing the reply buffer of an old
client with large ACL data (more than 32 ACL entries), the
MDT will forbid old clients from accessing files with large
ACLs. A new connection flag, OBD_CONNECT_LARGE_ACL, is used
to distinguish new clients from old ones.
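
As a rough sanity check of those numbers (not part of the patch), the
entry counts implied by the backend limits can be computed with a small
userspace sketch; it assumes the usual Linux POSIX ACL xattr layout of
a 4-byte header followed by 8-byte entries:

/* Hypothetical helper: how many ACL entries fit in a given EA size
 * limit, assuming sizeof(posix_acl_xattr_header) == 4 and
 * sizeof(posix_acl_xattr_entry) == 8.
 */
#include <stdio.h>
#include <stdint.h>

static unsigned int max_acl_entries(uint64_t ea_limit)
{
	if (ea_limit <= 4)
		return 0;
	return (ea_limit - 4) / 8;
}

int main(void)
{
	printf("ldiskfs, no large EA: %u\n", max_acl_entries(4012));
	printf("ZFS:                  %u\n", max_acl_entries(32768));
	printf("ldiskfs, large EA:    %u\n", max_acl_entries(1048492));
	return 0;
}

That works out to roughly 501, 4095 and 131061 entries respectively,
far beyond the old limit of 32.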

Signed-off-by: Fan Yong 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7473
Reviewed-on: https://review.whamcloud.com/19790
Reviewed-by: Andreas Dilger 
Reviewed-by: Li Xi 
Reviewed-by: Lai Siyao 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h | 2 +-
 drivers/staging/lustre/lustre/include/lustre_acl.h| 7 ++-
 drivers/staging/lustre/lustre/llite/llite_lib.c   | 3 ++-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 6 ++
 drivers/staging/lustre/lustre/mdc/mdc_reint.c | 2 ++
 drivers/staging/lustre/lustre/mdc/mdc_request.c   | 4 
 drivers/staging/lustre/lustre/ptlrpc/layout.c | 4 +---
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c   | 4 ++--
 8 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h 
b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
index aac98db..8778c6f 100644
--- a/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lustre/lustre_idl.h
@@ -615,7 +615,7 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT_REQPORTAL   0x40ULL /*Separate non-IO req portal */
 #define OBD_CONNECT_ACL 0x80ULL /*access control lists 
*/
 #define OBD_CONNECT_XATTR  0x100ULL /*client use extended attr */
-#define OBD_CONNECT_CROW   0x200ULL /*MDS+OST create obj on write*/
+#define OBD_CONNECT_LARGE_ACL  0x200ULL /* more than 32 ACL entries */
 #define OBD_CONNECT_TRUNCLOCK  0x400ULL /*locks on server for punch */
 #define OBD_CONNECT_TRANSNO0x800ULL /*replay sends init transno */
 #define OBD_CONNECT_IBITS 0x1000ULL /*support for inodebits locks*/
diff --git a/drivers/staging/lustre/lustre/include/lustre_acl.h 
b/drivers/staging/lustre/lustre/include/lustre_acl.h
index 35ff61c..e7575a1 100644
--- a/drivers/staging/lustre/lustre/include/lustre_acl.h
+++ b/drivers/staging/lustre/lustre/include/lustre_acl.h
@@ -36,11 +36,16 @@
 
 #include 
 #include 
+#ifdef CONFIG_FS_POSIX_ACL
 #include 
 
 #define LUSTRE_POSIX_ACL_MAX_ENTRIES   32
-#define LUSTRE_POSIX_ACL_MAX_SIZE					\
+#define LUSTRE_POSIX_ACL_MAX_SIZE_OLD					\
 	(sizeof(struct posix_acl_xattr_header) +			\
 	 LUSTRE_POSIX_ACL_MAX_ENTRIES * sizeof(struct posix_acl_xattr_entry))
 
+#else /* ! CONFIG_FS_POSIX_ACL */
+#define LUSTRE_POSIX_ACL_MAX_SIZE_OLD 0
+#endif /* CONFIG_FS_POSIX_ACL */
+
 #endif
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c 
b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 83eb2da..b5c287b 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -198,7 +198,8 @@ static int client_common_fill_super(struct super_block *sb, 
char *md, char *dt)
if (sbi->ll_flags & LL_SBI_LRU_RESIZE)
data->ocd_connect_flags |= OBD_CONNECT_LRU_RESIZE;
 #ifdef CONFIG_FS_POSIX_ACL
-   data->ocd_connect_flags |= OBD_CONNECT_ACL | OBD_CONNECT_UMASK;
+   data->ocd_connect_flags |= OBD_CONNECT_ACL | OBD_CONNECT_UMASK |
+  OBD_CONNECT_LARGE_ACL;
 #endif
 
if (OBD_FAIL_CHECK(OBD_FAIL_MDC_LIGHTWEIGHT))
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c 
b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 253a545..65a5341 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -308,6 +308,8 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 
 	req_capsule_set_size(&req->rq_pill, &RMF_MDT_MD, RCL_SERVER,
 obddev->u.cli.cl_max_mds_easize);
+   req_capsule_s

[PATCH 5/5] staging: lustre: mdc: use large xattr buffers for old servers

2018-05-14 Thread James Simmons
From: "John L. Hammond" <john.hamm...@intel.com>

Pre 2.10.1 MDTs will crash when they receive a listxattr (MDS_GETXATTR
with OBD_MD_FLXATTRLS) RPC for an orphan or dead object. So for
clients connected to these older MDTs, try to avoid sending listxattr
RPCs by making the bulk getxattr (MDS_GETXATTR with OBD_MD_FLXATTRALL)
more likely to succeed and thereby reducing the chances of falling
back to listxattr.
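
For readers unfamiliar with the version check used in the code below:
OBD_OCD_VERSION() packs the release components into a single integer so
that ordinary comparisons order releases correctly, which is how
"server older than 2.10.1" is detected. The macro definition in this
sketch is an assumption matching Lustre's usual one-byte-per-component
encoding and should be checked against the real header:

/* Sketch only: assumed OBD_OCD_VERSION encoding, one byte per field. */
#include <stdio.h>

#define OBD_OCD_VERSION(major, minor, patch, fix) \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

int main(void)
{
	unsigned int server = OBD_OCD_VERSION(2, 9, 0, 0); /* example MDT */

	/* same comparison the patch uses below */
	if (server < OBD_OCD_VERSION(2, 10, 1, 0))
		printf("old server: request at least ocd_max_easize\n");
	else
		printf("new server: default GA_* buffer sizes are enough\n");
	return 0;
}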

Signed-off-by: John L. Hammond <john.hamm...@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10912
Reviewed-on: https://review.whamcloud.com/31990
Reviewed-by: Andreas Dilger <andreas.dil...@intel.com>
Reviewed-by: Fan Yong <fan.y...@intel.com>
Reviewed-by: Oleg Drokin <oleg.dro...@intel.com>
Signed-off-by: James Simmons <jsimm...@infradead.org>
---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 31 +--
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c 
b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index a8aa0fa..b991c6f 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -326,8 +326,10 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 {
struct ptlrpc_request   *req;
struct ldlm_intent  *lit;
+   u32 min_buf_size = 0;
int rc, count = 0;
LIST_HEAD(cancels);
+   u32 buf_size = 0;
 
	req = ptlrpc_request_alloc(class_exp2cliimp(exp),
				   &RQF_LDLM_INTENT_GETXATTR);
@@ -344,18 +346,33 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
	lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
lit->opc = IT_GETXATTR;
 
+#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
+   /* If the supplied buffer is too small then the server will
+* return -ERANGE and llite will fallback to using non cached
+* xattr operations. On servers before 2.10.1 a (non-cached)
+* listxattr RPC for an orphan or dead file causes an oops. So
+* let's try to avoid sending too small a buffer to too old a
+* server. This is effectively undoing the memory conservation
+* of LU-9417 when it would be *more* likely to crash the
+* server. See LU-9856.
+*/
+   if (exp->exp_connect_data.ocd_version < OBD_OCD_VERSION(2, 10, 1, 0))
+   min_buf_size = exp->exp_connect_data.ocd_max_easize;
+#endif
+   buf_size = max_t(u32, min_buf_size,
+GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);
+
 	/* pack the intended request */
-	mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid,
-		      GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM, -1, 0);
+	mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid, buf_size,
+		      -1, 0);

-	req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER,
-			     GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);
+	req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER, buf_size);

-	req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER,
-			     GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);
+	req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER, buf_size);

 	req_capsule_set_size(&req->rq_pill, &RMF_EAVALS_LENS, RCL_SERVER,
-			     sizeof(u32) * GA_DEFAULT_EA_NUM);
+			     max_t(u32, min_buf_size,
+				   sizeof(u32) * GA_DEFAULT_EA_NUM));

 	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0);
 
-- 
1.8.3.1



[PATCH 4/5] staging: lustre: mdc: excessive memory consumption by the xattr cache

2018-05-14 Thread James Simmons
From: Andrew Perepechko <c17...@cray.com>

The refill operation of the xattr cache does not know the
reply size in advance, so it makes a guess based on
the maxeasize value returned by the MDS.

In practice, it allocates 16 KiB for the common case and
4 MiB for the large xattr case. However, a typical reply
is just a few hundred bytes.

If we follow the conservative approach, we can prepare a
single memory page for the reply. It is large enough for
any reasonable xattr set and, at the same time, it does
not require multiple-page memory reclaim, which can be
costly.

If, for a specific file, the reply is larger than a single
page, the client is prepared to handle that and will fall back
to non-cached xattr code. Indeed, if this happens often and
xattrs are often used to store large values, it makes sense to
disable the xattr cache entirely, since it wasn't designed for
such [mis]use.
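
To put numbers on that: with the GA_* defaults introduced below, the
preallocated reply space is only a few KiB and fits comfortably inside
one 4 KiB page. A small sketch of the arithmetic (the constants are the
ones defined in this patch; the per-buffer breakdown is an illustration
of the sizing, not the exact wire layout):

/* Illustrative only: totals the default getxattr reply buffers. */
#include <stdio.h>

#define GA_DEFAULT_EA_NAME_LEN 20
#define GA_DEFAULT_EA_VAL_LEN  250
#define GA_DEFAULT_EA_NUM      10

int main(void)
{
	unsigned int names = GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM;
	unsigned int vals  = GA_DEFAULT_EA_VAL_LEN * GA_DEFAULT_EA_NUM;
	unsigned int lens  = 4 * GA_DEFAULT_EA_NUM;	/* sizeof(u32) */

	printf("names %u + values %u + lengths %u = %u bytes (page: 4096)\n",
	       names, vals, lens, names + vals + lens);
	return 0;
}

That is 200 + 2500 + 40 = 2740 bytes, versus the 16 KiB (or 4 MiB)
guesses made before this patch.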

Signed-off-by: Andrew Perepechko <c17...@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9417
Reviewed-on: https://review.whamcloud.com/26887
Reviewed-by: Fan Yong <fan.y...@intel.com>
Reviewed-by: Ben Evans <bev...@cray.com>
Reviewed-by: Oleg Drokin <oleg.dro...@intel.com>
Signed-off-by: James Simmons <jsimm...@infradead.org>
---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c 
b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 65a5341..a8aa0fa 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -315,6 +315,10 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
return req;
 }
 
+#define GA_DEFAULT_EA_NAME_LEN 20
+#define GA_DEFAULT_EA_VAL_LEN  250
+#define GA_DEFAULT_EA_NUM  10
+
 static struct ptlrpc_request *
 mdc_intent_getxattr_pack(struct obd_export *exp,
 struct lookup_intent *it,
@@ -323,7 +327,6 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
struct ptlrpc_request   *req;
struct ldlm_intent  *lit;
int rc, count = 0;
-   u32 maxdata;
LIST_HEAD(cancels);
 
req = ptlrpc_request_alloc(class_exp2cliimp(exp),
@@ -341,20 +344,20 @@ static void mdc_realloc_openmsg(struct ptlrpc_request *req,
 	lit = req_capsule_client_get(&req->rq_pill, &RMF_LDLM_INTENT);
lit->opc = IT_GETXATTR;
 
-   maxdata = class_exp2cliimp(exp)->imp_connect_data.ocd_max_easize;
-
/* pack the intended request */
-	mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid, maxdata, -1,
-		      0);
+	mdc_pack_body(req, &op_data->op_fid1, op_data->op_valid,
+		      GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM, -1, 0);

-	req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER, maxdata);
+	req_capsule_set_size(&req->rq_pill, &RMF_EADATA, RCL_SERVER,
+			     GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);

-	req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER, maxdata);
+	req_capsule_set_size(&req->rq_pill, &RMF_EAVALS, RCL_SERVER,
+			     GA_DEFAULT_EA_NAME_LEN * GA_DEFAULT_EA_NUM);

-	req_capsule_set_size(&req->rq_pill, &RMF_EAVALS_LENS,
-			     RCL_SERVER, maxdata);
+	req_capsule_set_size(&req->rq_pill, &RMF_EAVALS_LENS, RCL_SERVER,
+			     sizeof(u32) * GA_DEFAULT_EA_NUM);

-	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, maxdata);
+	req_capsule_set_size(&req->rq_pill, &RMF_ACL, RCL_SERVER, 0);
 
ptlrpc_request_set_replen(req);
 
-- 
1.8.3.1



[PATCH v2 2/5] staging: lustre: llite: remove unused parameters from md_{get,set}xattr()

2018-05-14 Thread James Simmons
From: "John L. Hammond" 

md_getxattr() and md_setxattr() each have several unused
parameters. Remove them and improve the naming of the
remaining parameters.
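
For illustration, the call-site change looks roughly like this (a toy,
self-contained sketch with stub types and hypothetical helper names;
the real prototypes are in the obd.h/obd_class.h hunks below):

/* Sketch: dropping parameters that callers only ever passed as 0/NULL. */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct ptlrpc_request;	/* stand-in for the real type */

/* old shape: input, input_size and flags were effectively unused */
static int getxattr_old(uint64_t valid, const char *name, const char *input,
			int input_size, int output_size, int flags,
			struct ptlrpc_request **request)
{
	(void)valid; (void)input; (void)input_size; (void)flags; (void)request;
	printf("old: %s, reply buffer %d bytes\n", name, output_size);
	return 0;
}

/* new shape: only what the callers actually use remains */
static int getxattr_new(uint64_t obd_md_valid, const char *name,
			size_t buf_size, struct ptlrpc_request **req)
{
	(void)obd_md_valid; (void)req;
	printf("new: %s, reply buffer %zu bytes\n", name, buf_size);
	return 0;
}

int main(void)
{
	struct ptlrpc_request *req = NULL;

	getxattr_old(0, "user.foo", NULL, 0, 4096, 0, &req);
	getxattr_new(0, "user.foo", 4096, &req);
	return 0;
}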

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10792
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Signed-off-by: James Simmons 
---
Changelog:

v1) Initial patch ported to staging tree
v2) Rebased on fixed parent patch

 drivers/staging/lustre/lustre/include/obd.h   |  7 ++---
 drivers/staging/lustre/lustre/include/obd_class.h | 21 ++
 drivers/staging/lustre/lustre/llite/file.c|  5 ++--
 drivers/staging/lustre/lustre/llite/xattr.c   |  6 ++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c   | 22 +++
 drivers/staging/lustre/lustre/mdc/mdc_request.c   | 34 +--
 6 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h 
b/drivers/staging/lustre/lustre/include/obd.h
index fe21987..a69564d 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -940,12 +940,11 @@ struct md_ops {
  struct ptlrpc_request **);
 
int (*setxattr)(struct obd_export *, const struct lu_fid *,
-   u64, const char *, const char *, int, int, int, __u32,
-   struct ptlrpc_request **);
+   u64, const char *, const void *, size_t, unsigned int,
+   u32, struct ptlrpc_request **);
 
int (*getxattr)(struct obd_export *, const struct lu_fid *,
-   u64, const char *, const char *, int, int, int,
-   struct ptlrpc_request **);
+   u64, const char *, size_t, struct ptlrpc_request **);
 
int (*init_ea_size)(struct obd_export *, u32, u32);
 
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h 
b/drivers/staging/lustre/lustre/include/obd_class.h
index a76f016..0081578 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1385,29 +1385,26 @@ static inline int md_merge_attr(struct obd_export *exp,
 }
 
 static inline int md_setxattr(struct obd_export *exp, const struct lu_fid *fid,
- u64 valid, const char *name,
- const char *input, int input_size,
- int output_size, int flags, __u32 suppgid,
+ u64 obd_md_valid, const char *name,
+ const char *value, size_t value_size,
+ unsigned int xattr_flags, u32 suppgid,
  struct ptlrpc_request **request)
 {
EXP_CHECK_MD_OP(exp, setxattr);
EXP_MD_COUNTER_INCREMENT(exp, setxattr);
-   return MDP(exp->exp_obd, setxattr)(exp, fid, valid, name, input,
-  input_size, output_size, flags,
+   return MDP(exp->exp_obd, setxattr)(exp, fid, obd_md_valid, name,
+  value, value_size, xattr_flags,
   suppgid, request);
 }
 
 static inline int md_getxattr(struct obd_export *exp, const struct lu_fid *fid,
- u64 valid, const char *name,
- const char *input, int input_size,
- int output_size, int flags,
- struct ptlrpc_request **request)
+ u64 obd_md_valid, const char *name,
+ size_t buf_size, struct ptlrpc_request **req)
 {
EXP_CHECK_MD_OP(exp, getxattr);
EXP_MD_COUNTER_INCREMENT(exp, getxattr);
-   return MDP(exp->exp_obd, getxattr)(exp, fid, valid, name, input,
-  input_size, output_size, flags,
-  request);
+   return MDP(exp->exp_obd, getxattr)(exp, fid, obd_md_valid, name,
+  buf_size, req);
 }
 
 static inline int md_set_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/llite/file.c 
b/drivers/staging/lustre/lustre/llite/file.c
index 64a5698..de30df2 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3088,7 +3088,7 @@ int ll_set_acl(struct inode *inode, struct posix_acl 
*acl, int type)
 
rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
 value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
-			 name, value, value_size, 0, 0, 0, &req);
+			 name, value, value_size, 0, 0, &req);
 
ptlrpc_req_finished(req);
 out_value:
@@ -3400,8 +3400,7 @@ static int ll_layout_fetch(struct inode *inode, struct 
ldlm_lock *lock)
rc = 

[PATCH 0/5] staging: lustre: llite: remaining xattr fixes

2018-05-14 Thread James Simmons
Fixed the bugs in the set_acl patch pointed out by Dan Carpenter.
Rebased the next patch "remove unused parameters..." on the parent
patch. Added newer xattr fixes that were recently pushed.

Andrew Perepechko (1):
  staging: lustre: mdc: excessive memory consumption by the xattr cache

Dmitry Eremin (1):
  staging: lustre: llite: add support set_acl method in inode operations

Fan Yong (1):
  staging: lustre: acl: increase ACL entries limitation

John L. Hammond (2):
  staging: lustre: llite: remove unused parameters from md_{get,set}xattr()
  staging: lustre: mdc: use large xattr buffers for old servers

 .../lustre/include/uapi/linux/lustre/lustre_idl.h  |  2 +-
 drivers/staging/lustre/lustre/include/lustre_acl.h |  7 ++-
 drivers/staging/lustre/lustre/include/obd.h|  7 +--
 drivers/staging/lustre/lustre/include/obd_class.h  | 21 +++
 drivers/staging/lustre/lustre/llite/file.c | 65 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |  4 ++
 drivers/staging/lustre/lustre/llite/llite_lib.c|  3 +-
 drivers/staging/lustre/lustre/llite/namei.c| 10 +++-
 drivers/staging/lustre/lustre/llite/xattr.c|  6 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c| 22 
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  | 42 +++---
 drivers/staging/lustre/lustre/mdc/mdc_reint.c  |  2 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c| 38 -
 drivers/staging/lustre/lustre/ptlrpc/layout.c  |  4 +-
 drivers/staging/lustre/lustre/ptlrpc/wiretest.c|  4 +-
 15 files changed, 171 insertions(+), 66 deletions(-)

-- 
1.8.3.1


