Re: [lustre-discuss] 2.12.2 mds problems

2019-11-26 Thread Stephane Thiell
Hi Alastair,

The first thing to do is to upgrade your servers to 2.12.3, as many bugs have 
been fixed.

http://wiki.lustre.org/Lustre_2.12.3_Changelog
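
Before and after the upgrade it is also worth confirming exactly what the MDS is running, and decoding the debug dump the watchdog already wrote for you. Something along these lines should work (the input path is the one from your console log; the output filename is just an example):

lctl get_param version            # Lustre version currently loaded on the MDS
rpm -qa | grep -i lustre          # installed Lustre and kernel packages
lctl debug_file /tmp/lustre-log.1574259903.39057 /tmp/lustre-log.1574259903.39057.txt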

Stephane

> On Nov 20, 2019, at 7:29 AM, BASDEN, ALASTAIR G. wrote:
> 
> Hi,
> 
> We have a new 2.12.2 system, and are seeing fairly frequent lockups on the 
> primary mds.  We get messages such as:
> 
> Nov 20 14:24:12 c6mds1 kernel: LustreError: 
> 38853:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer 
> expired after 150s: evicting client at 172.18.122.165@o2ib  ns: 
> mdt-cos6-MDT_UUID lock: 92596372cec0/0x2efa065d0bb180f3 lrc: 3/0,0 
> mode: PW/PW res: [0x27a26:0x14:0x0].0x0 bits 0x40/0x0 rrc: 50 type: 
> IBT flags: 0x6020040020 nid: 172.18.122.165@o2ib remote: 
> 0x37bce663787828ed expref: 11 pid: 39074 timeout: 4429040 lvb_type: 0
> Nov 20 14:25:03 c6mds1 kernel: LNet: Service thread pid 39057 was inactive 
> for 200.67s. The thread might be hung, or it might only be slow and will 
> resume later. Dumping the stack trace for debugging purposes:
> Nov 20 14:25:03 c6mds1 kernel: Pid: 39057, comm: mdt00_045 
> 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
> Nov 20 14:25:03 c6mds1 kernel: Call Trace:
> Nov 20 14:25:03 c6mds1 kernel: [] 
> ldlm_completion_ast+0x5b1/0x920 [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> mdt_object_local_lock+0x50b/0xb20 [mdt]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> mdt_object_lock_internal+0x70/0x360 [mdt]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> mdt_object_lock+0x20/0x30 [mdt]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> mdt_brw_enqueue+0x44b/0x760 [mdt]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> mdt_intent_brw+0x1f/0x30 [mdt]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> mdt_intent_policy+0x2e8/0xd00 [mdt]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] tgt_enqueue+0x62/0x210 
> [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> tgt_request_handle+0xaea/0x1580 [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] 
> ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
> Nov 20 14:25:03 c6mds1 kernel: [] kthread+0xd1/0xe0
> Nov 20 14:25:03 c6mds1 kernel: [] 
> ret_from_fork_nospec_begin+0x7/0x21
> Nov 20 14:25:03 c6mds1 kernel: [] 0x
> Nov 20 14:25:03 c6mds1 kernel: LustreError: dumping log to 
> /tmp/lustre-log.1574259903.39057
> 
> Nov 20 14:25:03 c6mds1 kernel: LNet: Service thread pid 39024 was inactive 
> for 201.36s. Watchdog stack traces are limited to 3 per 300 seconds, 
> skipping this one.
> Nov 20 14:25:52 c6mds1 kernel: LustreError: 
> 38853:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer 
> expired after 100s: evicting client at 172.18.122.167@o2ib  ns: 
> mdt-cos6-MDT_UUID lock: 922fb4238480/0x2efa065d0bb1817f lrc: 3/0,0 
> mode: PW/PW res: [0x27a26:0x14:0x0].0x0 bits 0x40/0x0 rrc: 53 type: 
> IBT flags: 0x6020040020 nid: 172.18.122.167@o2ib remote: 
> 0x1c35b518c55069d8 expref: 15 pid: 39076 timeout: 4429140 lvb_type: 0
> Nov 20 14:25:52 c6mds1 kernel: LNet: Service thread pid 39054 completed 
> after 249.98s. This indicates the system was overloaded (too many service 
> threads, or there were not enough hardware resources).
> Nov 20 14:25:52 c6mds1 kernel: LustreError: 
> 39074:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed 
> export 924c828ec000 ns: mdt-cos6-MDT_UUID lock: 
> 92596372c140/0x2efa065d0bb186db lrc: 3/0,0 mode: PR/PR res: 
> [0x27a26:0x14:0x0].0x0 bits 0x20/0x0 rrc: 17 type: IBT flags: 
> 0x502000 nid: 172.18.122.165@o2ib remote: 0x37bce663787829b8 
> expref: 2 pid: 39074 timeout: 0 lvb_type: 0
> Nov 20 14:25:52 c6mds1 kernel: LustreError: 
> 39074:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous 
> similar message
> Nov 20 14:25:52 c6mds1 kernel: LNet: Skipped 7 previous similar messages
> 
> 
> 
> 
> Any suggestions?  It's an ldiskfs backend for the MDS (the OSSs are ZFS).
> 
> Thanks,
> Alastair.


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lnet Self Test

2019-11-26 Thread Pinkesh Valdria
Hello All,

I created a new Lustre cluster on CentOS 7.6 and I am running lnet_selftest_wrapper.sh to measure throughput on the network.  The nodes are connected to each other with 25 Gbps Ethernet, so the theoretical maximum is 25 Gbps * 125 MB/s per Gbps = 3125 MB/s.  Using iperf3, I get 22 Gbps (2750 MB/s) between the nodes.
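
For reference, the iperf3 baseline is a plain TCP test between the same two nodes, roughly along these lines (8 parallel streams shown just as an example; the addresses are the same ones used for lst below):

iperf3 -s                        # on 10.0.3.6 (the "lto" node)
iperf3 -c 10.0.3.6 -P 8 -t 30    # on 10.0.3.7: 30-second run, 8 parallel streams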


[root@lustre-client-2 ~]# for c in 1 2 4 8 12 16 20 24 ; do echo $c ; \
    ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S) CN=$c SZ=1M TM=30 BRW=write \
    CKSUM=simple LFROM="10.0.3.7@tcp1" LTO="10.0.3.6@tcp1" \
    /root/lnet_selftest_wrapper.sh ; done

When I run lnet_selftest_wrapper.sh (from the Lustre wiki) between two nodes, I get a maximum of 2055.31 MiB/s (roughly 17.2 Gbps).  Is that expected at the LNet level, or can I further tune the network and OS kernel (the tuning I applied is below) to get better throughput?
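
For what it is worth, my understanding is that the wrapper essentially drives an lst session like the sketch below, so the numbers should be reproducible without it (NIDs, 1M transfer size and simple checksums are taken from my command line above; a concurrency of 8 is just one example value from the loop):

modprobe lnet_selftest                     # needed on both nodes
export LST_SESSION=$$
lst new_session brw_write_test
lst add_group lfrom 10.0.3.7@tcp1
lst add_group lto 10.0.3.6@tcp1
lst add_batch bulk_write
lst add_test --batch bulk_write --concurrency 8 --from lfrom --to lto \
    brw write check=simple size=1M
lst run bulk_write
lst stat lfrom lto &                       # prints the [LNet Rates]/[LNet Bandwidth] blocks
sleep 30
kill $!
lst stop bulk_write
lst end_session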


Result snippet from lnet_selftest_wrapper.sh:

[LNet Rates of lfrom]
[R] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
[W] Avg: 4112 RPC/s Min: 4112 RPC/s Max: 4112 RPC/s
[LNet Bandwidth of lfrom]
[R] Avg: 0.31 MiB/s Min: 0.31 MiB/s Max: 0.31 MiB/s
[W] Avg: 2055.30 MiB/s Min: 2055.30 MiB/s Max: 2055.30 MiB/s
[LNet Rates of lto]
[R] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
[W] Avg: 4136 RPC/s Min: 4136 RPC/s Max: 4136 RPC/s
[LNet Bandwidth of lto]
[R] Avg: 2055.31 MiB/s Min: 2055.31 MiB/s Max: 2055.31 MiB/s
[W] Avg: 0.32 MiB/s Min: 0.32 MiB/s Max: 0.32 MiB/s

Tuning applied:

Ethernet NICs:
ip link set dev ens3 mtu 9000
ethtool -G ens3 rx 2047 tx 2047 rx-jumbo 8191
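
The applied values can be confirmed on each node with standard ip/ethtool queries (ens3 is just the interface name on these hosts):

ip link show ens3    # MTU should report 9000
ethtool -g ens3      # current ring sizes vs. hardware maximums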


less /etc/sysctl.conf
net.core.wmem_max=16777216
net.core.rmem_max=16777216
net.core.wmem_default=16777216
net.core.rmem_default=16777216
net.core.optmem_max=16777216
net.core.netdev_max_backlog=27000
kernel.sysrq=1
kernel.shmmax=18446744073692774399
net.core.somaxconn=8192
net.ipv4.tcp_adv_win_scale=2
net.ipv4.tcp_low_latency=1
net.ipv4.tcp_rmem = 212992 87380 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 212992 65536 16777216
vm.min_free_kbytes = 65536
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_no_metrics_save = 0

(Note: tcp_timestamps and tcp_congestion_control each appear twice above; since the file is applied in order, the later values, 0 and htcp, are the ones that end up in effect.)
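
The values can be reloaded and spot-checked after editing the file (standard sysctl usage; the keys below are just a sample from the list above):

sysctl -p /etc/sysctl.conf
sysctl net.ipv4.tcp_congestion_control net.core.rmem_max net.core.wmem_max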


echo "#
# tuned configuration
#
[main]
summary=Broadly applicable tuning that provides excellent performance across a variety of common server workloads

[disk]
devices=!dm-*, !sda1, !sda2, !sda3
readahead=>4096

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
transparent_huge_pages=never

[sysctl]
kernel.sched_min_granularity_ns = 1000
kernel.sched_wakeup_granularity_ns = 1500
vm.dirty_ratio = 30
vm.dirty_background_ratio = 10
vm.swappiness=30
" > lustre-performance/tuned.conf

tuned-adm profile lustre-performance
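
and, assuming the file above ends up under /etc/tuned/lustre-performance/, the active profile can be confirmed with:

tuned-adm active
tuned-adm verify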


Thanks,

Pinkesh Valdria

 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org