Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-20 Thread Czauz, Kamil
Here is the output of waiters on 2 hosts that were having the issue today:

HOST 1
 [2020-11-20 09:07:53 root@nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters

=== mmdiag: waiters ===
Waiting 0.0035 sec since 09:08:07, monitored, thread 135497 
FileBlockReadFetchHandlerThread: on ThCond 0x7F615C152468 (MsgRecordCondvar), 
reason 'RPC wait' for NSD I/O completion on node 10.64.44.180 
Waiting 0.0036 sec since 09:08:07, monitored, thread 139228 
PrefetchWorkerThread: on ThCond 0x7F627000D5D8 (MsgRecordCondvar), reason 'RPC 
wait' for NSD I/O completion on node 10.64.44.181 

[2020-11-20 09:08:07 root@nyzls149m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters

=== mmdiag: waiters ===

HOST 2
[2020-11-20 09:08:49 root@nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters

=== mmdiag: waiters ===
Waiting 0.0034 sec since 09:08:50, monitored, thread 345318 
SharedHashTabFetchHandlerThread: on ThCond 0x7F049C001F08 (MsgRecordCondvar), 
reason 'RPC wait' for NSD I/O completion on node 10.64.44.133 

[2020-11-20 09:08:50 root@nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters

=== mmdiag: waiters ===

[2020-11-20 09:08:52 root@nyzls150m ~]# /usr/lpp/mmfs/bin/mmdiag --waiters

=== mmdiag: waiters ===


You can see the waiters go from 0 to 1-2, but they are hardly blocking.

Yes, there are separate metadata pools for all of the filesystems here.


I did another trace today when the problem was happening - this time I was able 
to get a longer trace using the following command:
/usr/lpp/mmfs/bin/mmtracectl --start --trace=io --trace-file-size=512M 
--tracedev-write-mode=blocking --tracedev-buffer-size=64M -N nyzls149m
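(For reference, the trace was then stopped and the trcrpt summarized roughly like this - a sketch only; the trcrpt file name is the one shown further below, and its location under /tmp/mmfs is just the default mentioned earlier in the thread:
/usr/lpp/mmfs/bin/mmtracectl --stop -N nyzls149m
awk -f /usr/lpp/mmfs/samples/debugtools/trsum.awk details=0 /tmp/mmfs/trcrpt.2020-11-20_09.20.38.283986.nyzls149m > trsum.out )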


This is what the trsum output looks like:

Elapsed trace time:   62.412092000 seconds
Elapsed trace time from first VFS call to last:   62.412091999
Time idle between VFS calls:   0.002913000 seconds

Operations stats:  total time(s)     count    avg-usecs    wait-time(s)    avg-usecs
  readpage         0.003487000          9       387.444
  rdwr             0.273721000        183      1495.743
  read_inode2      0.007304000        325        22.474
  follow_link      0.013952000         58       240.552
  pagein           0.025974000         66       393.545
  getattr          0.002792000         26       107.385
  revalidate       0.009406000       2172         4.331
  create          66.194479000          3  22064826.333
  open             1.725505000         88     19608.011
  unlink          18.685099000          1  18685099.000
  setattr          0.011627000         14       830.500
  lookup        2379.215514000        502   4739473.135
  delete_inode     0.015553000        328        47.418
  rename          98.099073000          5  19619814.600
  release          0.050574000         89       568.247
  permission       0.007454000         73       102.110
  getxattr         0.002346000         32        73.312
  statfs           0.000081000          6        13.500
  mmap             0.049809000         18      2767.167
  removexattr      0.000827000         14        59.071
  llseek           0.000441000         47         9.383
  readdir          0.002667000         34        78.441
Ops  4093 Secs  62.409178999  Ops/Sec   65.583

MaxFilesToCache is set to 12000:
[common]
maxFilesToCache 12000


I only see gpfs_i_lookup in the tracefile, no gpfs_v_lookups
#  grep gpfs_i_lookup trcrpt.2020-11-20_09.20.38.283986.nyzls149m |wc -l
1097

They mostly look like this -

  62.346560 238895 TRACE_VNODE: gpfs_i_lookup exit: new inode 
0x922178971A40 iNum 21980113 (0x14F63D1) cnP 0x922178971C88 retP 0x0 
code 0 rc 0
  62.346955 238895 TRACE_VNODE: gpfs_i_lookup enter: diP 0x91A8A4991E00 
dentryP 0x92C545A93500 name '20170323.txt' d_flags 0x80 d_count 1 unhashed 1
  62.367701 218442 TRACE_VNODE: gpfs_i_lookup exit: new inode 
0x92207130 iNum 29629892 (0x1C41DC4) cnP 0x922071300248 retP 0x0 
code 0 rc 0
  62.367734 218444 TRACE_VNODE: gpfs_i_lookup enter: diP 0x9193CF457800 
dentryP 0x9229527A89C0 name 'node.py' d_flags 0x80 d_count 1 unhashed 1
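
In case it is useful, here is a rough awk sketch (not an IBM tool; it assumes the unwrapped trcrpt line format above, with the timestamp in field 1 and the thread id in field 2) that pairs the gpfs_i_lookup enter/exit records per thread and lists the slowest lookups together with the name from the enter record:

awk '
/TRACE_VNODE: gpfs_i_lookup enter:/ {
    # remember the enter timestamp and the looked-up name, keyed by thread id ($2)
    t[$2] = $1
    for (i = 1; i < NF; i++) if ($i == "name") n[$2] = $(i + 1)
}
/TRACE_VNODE: gpfs_i_lookup exit:/ && ($2 in t) {
    # print elapsed seconds, thread id, name
    printf "%.6f %s %s\n", $1 - t[$2], $2, n[$2]
    delete t[$2]; delete n[$2]
}' trcrpt.2020-11-20_09.20.38.283986.nyzls149m | sort -rn | head -20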


-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org 
 On Behalf Of Uwe Falke
Sent: Monday, November 16, 2020 8:46 AM
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of 
mmfsd process

Hi,
while the other nodes can well block the local one, as Frederick suggests,  
there should at least be something visible locally waiting for these other 
nodes.
Looking at all waiters might be a good thing, but this case looks strange in 
other ways. Mind your statement that there are almost no local waiters and that 
none of them gets older than 10 ms.

I am no developer nor do I have the code, so don't expect too much.  Can you 
tell

Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-16 Thread Uwe Falke
Hi, 
while the other nodes can well block the local one, as Frederick suggests, 
 there should at least be something visible locally waiting for these 
other nodes. 
Looking at all waiters might be a good thing, but this case looks strange 
in other ways. Mind your statement that there are almost no local waiters 
and that none of them gets older than 10 ms.

I am no developer nor do I have the code, so don't expect too much.  Can 
you tell what lookups you see (check in the trcrpt file, could be like 
gpfs_i_lookup or gpfs_v_lookup)? 
Lookups are metadata ops; do you have a separate pool for your metadata? 
How is that pool set up (down to the physical block devices)?
Your trsum output revealed 36 lookups, each one on average taking >30 ms. That 
is a lot (albeit the respective waiters won't show up at first glance as 
suspicious ...). 
So, which waiters did you see (I hope you saved them; if not, do it next 
time).

What are the node you see this on, and the whole cluster, used for? What is 
the MaxFilesToCache setting (for that node and for others)? What HW is 
that, and how big are your nodes (memory, CPU)?
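E.g., something like this on the affected client should show the relevant cache settings (just a quick sketch):
/usr/lpp/mmfs/bin/mmdiag --config | grep -iE 'maxFilesToCache|maxStatCache'
/usr/lpp/mmfs/bin/mmlsconfig | grep -i maxFilesToCache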
To check the unreasonably short trace capture time: how large are the 
trcrpt files you obtain?


Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   "Czauz, Kamil" 
To: gpfsug main discussion list 
Date:   13/11/2020 14:33
Subject:    [EXTERNAL] Re: [gpfsug-discuss] Poor client performance 
with high cpu   usage   of  mmfsd process
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi Uwe -

Regarding your previous message - waiters were coming and going, with just 
1-2 waiters whenever I ran the mmdiag command, and very low wait times 
(<0.01s).

We are running version 4.2.3

I did another capture today while the client is functioning normally and 
this was the header result:

Overwrite trace parameters:
buffer size: 134217728
64 kernel trace streams, indices 0-63 (selected by low bits of processor 
ID)
128 daemon trace streams, indices 64-191 (selected by low bits of thread 
ID)
Interval for calibrating clock rate was 25.996957 seconds and 67592121252 
cycles
Measured cycle count update rate to be 261271 per second < using 
this value
OS reported cycle count update rate as 259000 per second
Trace milestones:
kernel trace enabled Fri Nov 13 08:20:01.800558000 2020 (TOD 
1605273601.800558, cycles 20807897445779444)
daemon trace enabled Fri Nov 13 08:20:01.910017000 2020 (TOD 
1605273601.910017, cycles 20807897730372442)
all streams included Fri Nov 13 08:20:26.423085049 2020 (TOD 
1605273626.423085, cycles 20807961464381068) < useful part of trace 
extends from here
trace quiesced Fri Nov 13 08:20:27.797515000 2020 (TOD 1605273627.000797, 
cycles 20807965037900696) < to here
Approximate number of times the trace buffer was filled: 14.631

Still a very small capture (1.3s), but the trsum.awk output was not filled 
with lookup commands / large lookup times. Can you help debug what those 
long lookup operations mean?

Unfinished operations:

27967 * pagein ** 1.362382116
27967 * readpage ** 1.362381516
139130 1.362448448 * Unfinished IO: buffer/disk 3002F67 
20:107498951168^\archive_data_16
104686 1.362022068 * Unfinished IO: buffer/disk 50011878000 
1:47169618944^\archive_data_1
0 0.0 * Unfinished IO: buffer/disk 5003CEB8000 
4:23073390592^\FFFE
0 0.0 * Unfinished IO: buffer/disk 5003CEB8000 
4:23073390592^\
0 0.0 * Unfinished IO: buffer/disk 2000EE78000 
5:47631127040^\FFFE
341710 1.362423815 * Unfinished IO: buffer/disk 20022218000 
19:107498951680^\archive_data_15
0 0.0 * Unfinished IO: buffer/disk 2000EE78000 
5:47631127040^\
0 0.0 * Unfinished IO: buffer/disk 206B000 
18:3452986648^\FFFE
0 0.0 * Unfinished IO: buffer/disk 206B000 
18:3452986648^\
0 0.0 * Unfinished IO: buffer/disk 206B000 
18:3452986648^\
139150 1.361122006 * Unfinished IO: buffer/disk 50012018000 
2:47169622016^\archive_data_2
0 0.0 * Unfinished IO: buffer/disk 5003CEB8000 
4:23073390592^\
95782 1.361112791 * Unfinished IO: buffer/disk 4001630 
20:107498950656^\archive_data_16
0 0.0 * Unfinished IO: buffer/disk 2000EE78000 
5:47631127040^\
271076 1.361579585 * Unfinished IO: buffer/disk 20023DB8000 
4:47169606656^\archive_data_4
341676 1.

Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-13 Thread Frederick Stock
49468 13.72% 86.28% 3175480 0.00278 0.0 100.00% 0.00% 1336006 0.01170 0.250020470 0.00% 100.00% 1644777 0.00367 0.250149757 0.00% 100.00% 6189680 0.02717 0.06518 29.42% 70.58% 1184839 0.03001 0.250144214 0.00% 100.00% 35145858 0.00687 0.0 100.00% 0.00% 1333972 0.218656404 0.43897 99.98% 0.02% 4334691 0.187695040 0.000295117 99.84% 0.16% 25
# total App-read/write = 7 Average duration = 0.000123672 sec
# time(sec) count % %ile read write avgBytesR avgBytesW
0.000500 7 1.00 1.00 7 0 1172 0

-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org On Behalf Of Uwe Falke
Sent: Wednesday, November 11, 2020 8:57 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-13 Thread Czauz, Kamil
58832110 MSG FRep: nsdMsgReadExt msg_id 631945542 Rduration 
415591.871 us Rlen 262144 Hduration 415597.056 + us


Elapsed trace time:   1.374423981 seconds
Elapsed trace time from first VFS call to last:   1.374423980
Time idle between VFS calls:   0.001603738 seconds

Operations stats:  total time(s)   count   avg-usecs   wait-time(s)   avg-usecs
  readpage         1.151660085      1874     614.546
  rdwr             0.431456904       581     742.611
  read_inode2      0.001180648       934       1.264
  follow_link      0.000029502         7       4.215
  getattr          0.000048413         9       5.379
  revalidate       0.000007080        67       0.106
  pagein           1.149699537      1877     612.520
  create           0.007664829         9     851.648
  open             0.001032657        19      54.350
  unlink           0.002563726        14     183.123
  delete_inode     0.000764598       826       0.926
  lookup           0.312847947       953     328.277
  setattr          0.020651226       824      25.062
  permission       0.000015018         1      15.018
  rename           0.000529023         4     132.256
  release          0.001613800        22      73.355
  getxattr         0.000030494         6       5.082
  mmap             0.000054767         1      54.767
  llseek           0.000001130         4       0.283
  readdir          0.000033947         2      16.973
  removexattr      0.002119736       820       2.585

User thread stats:  GPFS-time(sec)   Appl-time   GPFS-%   Appl-%   Ops
 42625   0.00138   0.31017   0.44%  99.56%   3
 42378   0.000586959   0.011596801   4.82%  95.18%  32
 42627   0.00272   0.13421   1.99%  98.01%   2
 42641   0.003284590   0.012593594  20.69%  79.31%  35
 42628   0.001522335   0.02748  99.82%   0.18%   2
 25464   0.003462795   0.500281914   0.69%  99.31%  12
301420   0.16711   0.052848218   0.03%  99.97%  38
 95103   0.00544   0.0 100.00%   0.00%   1
145858   0.00659   0.000794896   0.08%  99.92%   2
 42221   0.11484   0.39445  22.55%  77.45%   5
371718   0.00707   0.001805425   0.04%  99.96%   2
 95109   0.00880   0.008998763   0.01%  99.99%   2
 95337   0.10330   0.503057866   0.00% 100.00%   8
 42700   0.002442175   0.012504429  16.34%  83.66%  35
189680   0.003466450   0.500128627   0.69%  99.31%   9
 42681   0.006685396   0.000391575  94.47%   5.53%  16
 42702   0.48203   0.00500  98.97%   1.03%   2
 42703   0.33280   0.140102087   0.02%  99.98%   9
224423   0.00195   0.0 100.00%   0.00%   1
 42706   0.000541098   0.14713  97.35%   2.65%   3
106275   0.00456   0.0 100.00%   0.00%   1
 42721   0.000372857   0.0 100.00%   0.00%   1


-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org 
 On Behalf Of Uwe Falke
Sent: Friday, November 13, 2020 4:37 AM
To: gpfsug main discussion list 
Subject: Re: [gpfsug-discuss] Poor client performance with high cpu usage of 
mmfsd process

Hi Kamil,
in my mail just a few minutes ago I'd overlooked that the buffer size in your 
trace was indeed 128M (I suppose the trace buffer adapts to that size if not 
set explicitly). That is very strange: even under high load, the trace 
should then capture a longer period than 10 secs and, most of all, it should 
contain many more activities than just the few you had.
That is very mysterious.
I am out of ideas for the moment, and a bit short of time to dig here.

To check your tracing, you could run a trace like before but while everything is 
normal and check that out - you should see many records, and trsum.awk should 
list just a small portion of unfinished ops at the end. If that is fine, 
then the tracing itself is affected by your critical condition (never 
experienced that before - usually GPFS grinds to a halt rather than the trace 
being abandoned), and that might well be worth a support ticket.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl Sitz der Ge

Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-13 Thread Uwe Falke
Hi Kamil, 
in my mail just a few minutes ago I'd overlooked that the buffer size in 
your trace was indeed 128M (I suppose the trace buffer adapts to that size 
if not set explicitly). That is very strange: even under high load, the 
trace should then capture a longer period than 10 secs and, most of 
all, it should contain many more activities than just the few you had. 
That is very mysterious. 
I am out of ideas for the moment, and a bit short of time to dig here.

To check your tracing, you could run a trace like before but while 
everything is normal and check that out - you should see many records, and 
trsum.awk should list just a small portion of unfinished ops at the end. 
If that is fine, then the tracing itself is affected by your critical 
condition (never experienced that before - usually GPFS grinds to a halt 
rather than the trace being abandoned), and that might well be worth a 
support ticket.

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   Uwe Falke/Germany/IBM
To: gpfsug main discussion list 
Date:   13/11/2020 10:21
Subject:Re: [EXTERNAL] Re: [gpfsug-discuss] Poor client 
performance with high cpu usage of      mmfsd process


Hi, Kamil, 
looks like your trace file setting has been too low: 
all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 
1605232699.950515, cycles 20701552715873212) < useful part of trace 
extends from here
trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, 
cycles 20701553190681534) < to here
means you effectively captured a period of about 5ms only ... you can't 
see much from that. 

I'd assumed the default trace file size would be sufficient here, but it 
doesn't seem to be. 
Try running with something like 
 mmtracectl --start --trace-file-size=512M --trace=io 
--tracedev-write-mode=overwrite -N . 

However, if you say "no major waiters" - how many waiters did you see at 
any time? What kind of waiters were the oldest, and how long had they waited?
It could indeed well be that some job is just creating a killer workload. 
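If the waiters fluctuate that quickly, a crude loop around mmdiag (nothing more 
than plain shell, just a sketch) can help to catch them while the problem is 
happening, e.g.:
while sleep 2; do date; /usr/lpp/mmfs/bin/mmdiag --waiters; done > waiters.log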

The very short cycle time of the trace points, OTOH, to high activity; OTOH, 
the trace file setting appears quite low (trace=io doesn't collect much 
trace info, just basic IO stuff). 
If I might ask: what version of GPFS are you running?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122




From:   "Czauz, Kamil" 
To: gpfsug main discussion list 
Date:   13/11/2020 03:33
Subject:    [EXTERNAL] Re: [gpfsug-discuss] Poor client performance 
with high cpu usage of  mmfsd process
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi Uwe -

I hit the issue again today, no major waiters, and nothing useful from the 
iohist report. Nothing interesting in the logs either.

I was able to get a trace today while the issue was happening. I took 2 
traces a few min apart.

The beginning of the traces looks something like this:
Overwrite trace parameters:
buffer size: 134217728
64 kernel trace streams, indices 0-63 (selected by low bits of processor 
ID)
128 daemon trace streams, indices 64-191 (selected by low bits of thread 
ID)
Interval for calibrating clock rate was 100.019054 seconds and 
260049296314 cycles
Measured cycle count update rate to be 257559 per second < using 
this value
OS reported cycle count update rate as 259000 per second
Trace milestones:
kernel trace enabled Thu Nov 12 20:56:40.11408 2020 (TOD 
1605232600.114080, cycles 20701293141385220)
daemon trace enabled Thu Nov 12 20:56:40.24743 2020 (TOD 
1605232600.247430, cycles 20701293488095152)
all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 
1605232699.950515, cycles 20701552715873212) < useful part of trace 
extends from here
trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, 
cycles 20701553190681534) < to here
Approximate number of times the trace buffer was filled: 553.529


Here is the output of 'trsum.awk details=0'.
I'm not quite sure what to make of it - can you help me decipher it? The 
'lookup' operations are taking a hell of a long time; what does that mean?

Capture 1

Unf

Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-13 Thread Uwe Falke
Hi, Kamil, 
looks like your trace file setting has been too low: 
all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 
1605232699.950515, cycles 20701552715873212) < useful part of trace 
extends from here
trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, 
cycles 20701553190681534) < to here
means you effectively captured a period of about 5ms only ... you can't 
see much from that. 

I'd assumed the default trace file size would be sufficient here, but it 
doesn't seem to be. 
Try running with something like 
 mmtracectl --start --trace-file-size=512M --trace=io 
--tracedev-write-mode=overwrite -N . 

However, if you say "no major waiters" - how many waiters did you see at 
any time? What kind of waiters were the oldest, and how long had they waited?
It could indeed well be that some job is just creating a killer workload. 

The very short cycle time of the trace points, OTOH, to high activity; OTOH, 
the trace file setting appears quite low (trace=io doesn't collect much 
trace info, just basic IO stuff). 
If I might ask: what version of GPFS are you running?

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   "Czauz, Kamil" 
To: gpfsug main discussion list 
Date:   13/11/2020 03:33
Subject:    [EXTERNAL] Re: [gpfsug-discuss] Poor client performance 
with high cpu usage of  mmfsd process
Sent by:gpfsug-discuss-boun...@spectrumscale.org



Hi Uwe -

I hit the issue again today, no major waiters, and nothing useful from the 
iohist report. Nothing interesting in the logs either.

I was able to get a trace today while the issue was happening. I took 2 
traces a few min apart.

The beginning of the traces looks something like this:
Overwrite trace parameters:
buffer size: 134217728
64 kernel trace streams, indices 0-63 (selected by low bits of processor 
ID)
128 daemon trace streams, indices 64-191 (selected by low bits of thread 
ID)
Interval for calibrating clock rate was 100.019054 seconds and 
260049296314 cycles
Measured cycle count update rate to be 257559 per second < using 
this value
OS reported cycle count update rate as 259000 per second
Trace milestones:
kernel trace enabled Thu Nov 12 20:56:40.11408 2020 (TOD 
1605232600.114080, cycles 20701293141385220)
daemon trace enabled Thu Nov 12 20:56:40.24743 2020 (TOD 
1605232600.247430, cycles 20701293488095152)
all streams included Thu Nov 12 20:58:19.950515266 2020 (TOD 
1605232699.950515, cycles 20701552715873212) < useful part of trace 
extends from here
trace quiesced Thu Nov 12 20:58:20.133134000 2020 (TOD 1605232700.000133, 
cycles 20701553190681534) < to here
Approximate number of times the trace buffer was filled: 553.529


Here is the output of 'trsum.awk details=0'.
I'm not quite sure what to make of it - can you help me decipher it? The 
'lookup' operations are taking a hell of a long time; what does that mean?

Capture 1

Unfinished operations:

21234 * lookup ** 0.165851604
290020 * lookup ** 0.151032241
302757 * lookup ** 0.168723402
301677 * lookup ** 0.070016530
230983 * lookup ** 0.127699082
21233 * lookup ** 0.060357257
309046 * lookup ** 0.157124551
301643 * lookup ** 0.165543982
304042 * lookup ** 0.172513838
167794 * lookup ** 0.056056815
189680 * lookup ** 0.062022237
362216 * lookup ** 0.072063619
406314 * lookup ** 0.114121838
167776 * lookup ** 0.114899642
303016 * lookup ** 0.144491120
290021 * lookup ** 0.142311603
167762 * lookup ** 0.144240366
248530 * lookup ** 0.168728131
0 0.0 * Unfinished IO: buffer/disk 30018014000 
14:48493092752^\
0 0.0 * Unfinished IO: buffer/disk 30018014000 
14:48493092752^\FFFE
0 0.0 * Unfinished IO: buffer/disk 206B000 
2:6058336^\
0 0.0 * Unfinished IO: buffer/disk 30018014000 
14:48493092752^\
0 0.0 * Unfinished IO: buffer/disk 206B000 
2:6058336^\FFFE
0 0.0 * Unfinished IO: buffer/disk 206B000 
2:6058336^\
0 0.000

Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-12 Thread Luis Bolinches
In addition to what Uwe mentioned
 
I am a bit concerned that you need to look into killing processes to "fix" it.
 
You might want to look into QoS fine-stats to get per-PID I/O information for the node.
 
http://files.gpfsug.org/presentations/2018/London/14_LuisBolinches_GPFSUG.pdf
 
Slide 15, and the Knowledge Center:
 
https://www.ibm.com/support/knowledgecenter/STXKQY_5.1.0/com.ibm.spectrum.scale.v5r10.doc/bl1adm_mmqos.htm
 
It consumes some memory per node. There is no need to set any QoS limits - just enable QoS.
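A minimal sketch, assuming a file system called gpfs01 and a 60-second fine-stats window (the exact option names can differ between releases - please check the mmchqos/mmlsqos pages linked above):
/usr/lpp/mmfs/bin/mmchqos gpfs01 --enable --fine-stats 60
/usr/lpp/mmfs/bin/mmlsqos gpfs01 --fine-stats 60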
 
--
Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
Luis Bolinches
Consultant IT Specialist
IBM Spectrum Scale development
Mobile Phone: +358503112585
 
https://www.youracclaim.com/user/luis-bolinches
 
Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland
"If you always give you will always have" -- Anonymous
 
 
 
- Original message -
From: "Uwe Falke"
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug main discussion list
Cc:
Subject: [EXTERNAL] Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process
Date: Thu, Nov 12, 2020 03:56

Re: [gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-11 Thread Uwe Falke
Hi, Kamil, 
I suppose you'd rather not see such an issue than pursue the ugly 
work-around to kill off processes. 
In such situations, the first looks should be for the GPFS log (on the 
client, on the cluster manager, and maybe on the file system manager) and 
for the current waiters (that is the list of currently waiting threads) on 
the hanging client. 

-> /var/adm/ras/mmfs.log.latest
mmdiag --waiters 

That might give you a first idea what is taking long and which components 
are involved. 

Also, 
mmdiag --iohist 
shows you the last IOs and some stats (service time, size) for them. 

Either that clue is already sufficient, or you go on (if you see DIO 
somewhere, direct IO is used which might slow down things, for example). 
GPFS has a nice tracing which you can configure or just run the default 
trace. 

Running a dedicated (low-level) io trace can be achieved by 
mmtracectl --start --trace=io  --tracedev-write-mode=overwrite -N 

then, when the issue is seen, stop the trace by 
mmtracectl --stop   -N 

Do not wait  to stop the trace once you've seen the issue, the trace file 
cyclically overwrites its output. If the issue lasts some time you could 
also start the trace while you see it, run the trace for say 20 secs and 
stop again. On stopping the trace, the output gets converted into an ASCII 
trace file named trcrpt.*(usually in /tmp/mmfs, check the command output). 


There you should see lines with  FIO which carry the inode of the related 
file after the "tag" keyword.
example: 
0.000745100  25123 TRACE_IO: FIO: read data tag 248415 43466 ioVecSize 8 
1st buf 0x299E89BC000 disk 8D0 da 154:2083875440 nSectors 128 err 0 
finishTime 1563473283.135212150

-> inode is 248415

there is a utility , tsfindinode, to translate that into the file path. 
you need to build this first if not yet done: 
cd /usr/lpp/mmfs/samples/util ; make
, then run 
./tsfindinode -i  
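
E.g., a quick sketch to see which inodes show up most often in the FIO records and then map the busiest one back to a path (the inode number and the mount point below are just placeholders taken from the example above):
grep 'TRACE_IO: FIO:' /tmp/mmfs/trcrpt.* | awk '{ for (i = 1; i < NF; i++) if ($i == "tag") print $(i + 1) }' | sort | uniq -c | sort -rn | head
cd /usr/lpp/mmfs/samples/util && make
./tsfindinode -i 248415 /gpfs/fs1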

For the IO trace analysis there is an older tool : 
/usr/lpp/mmfs/samples/debugtools/trsum.awk. 

Then there is some new stuff I've not yet used in 
/usr/lpp/mmfs/samples/traceanz/ (always check the README)

Hope that helps a bit. 

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation 
Services
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services

IBM Data Privacy Statement

IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Sven Schooss, Stefan Hierl
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122



From:   "Czauz, Kamil" 
To: "gpfsug-discuss@spectrumscale.org" 

Date:   11/11/2020 23:36
Subject:    [EXTERNAL] [gpfsug-discuss] Poor client performance with 
high cpu usage of   mmfsd process
Sent by:gpfsug-discuss-boun...@spectrumscale.org



We regularly run into performance issues on our clients where the client 
seems to hang when accessing any gpfs mount - even something simple like an 
ls could take a few minutes to complete. This affects every gpfs mount 
on the client, but other clients are working just fine. Also the mmfsd 
process at this point is spinning at something like 300-500% cpu.
 
The only way I have found to solve this is by killing processes that may 
be doing heavy i/o to the gpfs mounts - but this is more of an art than a 
science.  I often end up killing many processes before finding the 
offending one.
 
My question is really about finding the offending process more easily.  Is 
there something similar to iotop, or a trace that I can enable, that can 
tell me which files/processes are being heavily used by the mmfsd process 
on the client?
 
-Kamil





___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] Poor client performance with high cpu usage of mmfsd process

2020-11-11 Thread Czauz, Kamil
We regularly run into performance issues on our clients where the client seems 
to hang when accessing any gpfs mount - even something simple like an ls could 
take a few minutes to complete. This affects every gpfs mount on the client, 
but other clients are working just fine. Also the mmfsd process at this point 
is spinning at something like 300-500% cpu.

The only way I have found to solve this is by killing processes that may be 
doing heavy i/o to the gpfs mounts - but this is more of an art than a science. 
 I often end up killing many processes before finding the offending one.

My question is really about finding the offending process more easily.  Is there 
something similar to iotop, or a trace that I can enable, that can tell me which 
files/processes are being heavily used by the mmfsd process on the client?

-Kamil
Confidentiality Note: This e-mail and any attachments are confidential and may 
be protected by legal privilege. If you are not the intended recipient, be 
aware that any disclosure, copying, distribution or use of this e-mail or any 
attachment is prohibited. If you have received this e-mail in error, please 
notify us immediately by returning it to the sender and delete this copy from 
your system. We will use any personal information you give to us in accordance 
with our Privacy Policy which can be found in the Data Protection section on 
our corporate website www.squarepoint-capital.com. Please note that e-mails may 
be monitored for regulatory and compliance purposes. Thank you for your 
cooperation.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss