Re: [lustre-discuss] Lustre traffic slow on OPA fabric network

2018-07-10 Thread Robin Humble
Hi Kurt,

On Tue, Jul 03, 2018 at 02:59:22PM -0400, Kurt Strosahl wrote:
>   I've been seeing a great deal of slowness from clients on an OPA network
> accessing lustre through lnet routers.  The nodes take very long to complete
> things like lfs df, and show lots of dropped / reestablished connections.
> The OSS systems show this as well, and occasionally will report that all
> routes are down to a host on the omnipath fabric.  They also show large
> numbers of bulk callback errors.  The lnet router show large numbers of
> PUT_NACK messages, as well as Abort reconnection messages for nodes on the
> OPA fabric.

I don't suppose you're talking to a super-old Lustre version via the
lnet routers?

we see excellent performance OPA to IB via lnet routers with 2.10.x
clients and 2.9 servers, but when we try to talk to IEEL 2.5.41
servers we see pretty much exactly the symptoms you describe.
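
if it helps, a quick sanity check of what each side is actually running
(just a sketch, the hostnames are made up):

  # compare Lustre versions on a client, an lnet router, and a server
  for h in client01 router01 oss01; do
    echo -n "$h: "; ssh $h lctl get_param -n version
  done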

strangely, direct mounts of the old lustre on new clients over IB work ok, but
not via lnet routers to OPA. old lustre to new clients over tcp networks
is also ok. lnet self tests OPA to IB work fine too, it's just when we do
the actual mounts...
anyway, we are going to try to resolve the problem by updating the
IEEL servers to 2.9 or 2.10.
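
for reference, the sort of lnet self test we ran across the routers, in case
you want to compare (a rough sketch only, the NIDs below are made up):

  modprobe lnet_selftest
  export LST_SESSION=$$
  lst new_session opa_ib_check
  lst add_group servers 172.16.0.[1-2]@o2ib       # IB side OSSes
  lst add_group clients 10.10.0.[1-4]@o2ib1       # OPA side clients
  lst add_batch bulk
  lst add_test --batch bulk --from clients --to servers brw read size=1M
  lst run bulk
  lst stat clients servers     # let it run for a bit, then ctrl-c
  lst stop bulk
  lst end_session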

hmm, now that I think of it, we did have to tweak the ko2iblnd options
a lot on the lnet routers to get things this stable. I forget the symptoms
we were seeing though, sorry.
we found the lowest common denominator settings between the IB network
and the OPA network, and tuned ko2iblnd on the lnet routers down to that. if
Lustre finds even one OPA card it imposes an aggressive OPA config on all
IB networks, which made our mlx4 cards on an ipath/qib fabric unhappy.

FWIW, for our hardware combo, ko2iblnd options are

  options ko2iblnd-opa peer_credits=8 peer_credits_hiw=0 credits=256 \
    concurrent_sends=0 ntx=512 map_on_demand=0 fmr_pool_size=512 \
    fmr_flush_trigger=384 fmr_cache=1 conns_per_peer=1

I don't know what most of these do, so please take with a grain of salt.
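
if you want to double check what a router actually ended up loading, the
module parameters are visible in sysfs (again, just a sketch):

  # on the lnet router, once ko2iblnd is loaded
  for p in peer_credits peer_credits_hiw credits concurrent_sends ntx \
           map_on_demand fmr_pool_size fmr_flush_trigger fmr_cache; do
    echo -n "$p = "; cat /sys/module/ko2iblnd/parameters/$p
  done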

cheers,
robin


Re: [lustre-discuss] flock vs localflock

2018-07-10 Thread Robin Humble
Hi Darby,

On Thu, Jul 05, 2018 at 09:26:36PM +, Vicker, Darby (JSC-EG311) wrote:
>Also, the ldlm processes lead us to looking at flock vs localflock.  On
>previous generations of our LFS's, we used localflock.  But on the current
>LFS, we decided to try flock instead.  This LFS has been in production for a
>couple years with no obvious problems due to flock but we decided to drop back
>to localflock as a precaution for now.  We need to do a more controlled test
>but this does seem to help.  What are other sites using for locking parameters?

we use flock for /home and the large scratch filesystem. have done for
probably 10 years. localflock for the read-only software installs in
/apps, and no locking for the OS image (overlayfs with ramdisk upper,
read-only Lustre lower).
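
for concreteness, those are just the client mount options, something like
(NIDs and fs names made up; noflock is the default):

  mount -t lustre -o flock      10.0.0.1@o2ib:/home     /home
  mount -t lustre -o localflock 10.0.0.1@o2ib:/apps     /apps
  mount -t lustre -o noflock    10.0.0.1@o2ib:/osimage  /osimage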

we are all ZFS and 2.10.4 too.

I don't think we have many user codes that actually use flock, so I can't
recall any issues along those lines.

the most common MDS-abusing load we see is jobs across multiple nodes
appending to the same (by definition rubbish) output file. the write
lock bounces between nodes and causes high MDS load, poor performance
for those client nodes, and makes things a bit slower for everyone. I look
for these simply with 'lsof' and correlate across nodes.
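
a rough sketch of that, with pdsh and a made-up file path:

  # which nodes have a suspect shared output file open?
  pdsh -w node[001-064] 'lsof /scratch/jobdir/output.log 2>/dev/null' \
    | cut -d: -f1 | sort -u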

HTH

cheers,
robin