Hmm... It looks like there is something wrong with these kernel threads.
I don't know what their role is.

Googling "kernel migration threads high cpu usage" returns many pages...
Or ask a kernel expert, if you have one.

Keep me updated,

Regards,
Thomas

On 04/21/15 11:40, Carmelo Ponti (CSCS) wrote:
Hi Thomas

Checking the load with top, I can see a lot of processes called migration/[0-31] which are loading the servers:

    7 root      RT   0     0    0    0 S 55.0  0.0   1691:40 migration/1
    3 root      RT   0     0    0    0 S 34.7  0.0   1595:18 migration/0
   39 root      RT   0     0    0    0 S 34.2  0.0   1696:14 migration/9
   35 root      RT   0     0    0    0 S 29.7  0.0   1576:14 migration/8
  103 root      RT   0     0    0    0 S 26.0  0.0 735:58.52 migration/25
   71 root      RT   0     0    0    0 S 21.4  0.0 789:06.28 migration/17
   11 root      RT   0     0    0    0 S 20.2  0.0   1058:41 migration/2
   51 root      RT   0     0    0    0 S 19.8  0.0 465:53.14 migration/12
  115 root      RT   0     0    0    0 S 19.7  0.0 285:58.55 migration/28
   47 root      RT   0     0    0    0 S 18.6  0.0 783:47.47 migration/11
   99 root      RT   0     0    0    0 S 18.0  0.0 648:41.81 migration/24
   43 root      RT   0     0    0    0 S 17.6  0.0   1154:51 migration/10
  111 root      RT   0     0    0    0 S 17.0  0.0 401:44.51 migration/27
  107 root      RT   0     0    0    0 S 15.7  0.0 579:55.17 migration/26
   67 root      RT   0     0    0    0 S 14.9  0.0 698:39.89 migration/16
   75 root      RT   0     0    0    0 S 13.1  0.0 621:44.06 migration/18
   83 root      RT   0     0    0    0 S 11.7  0.0 271:49.81 migration/20
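
To read a listing like the one above more easily, the per-thread figures can be summed. The following is only a minimal sketch: it assumes the default top column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), and the function name and the SAMPLE constant are hypothetical. SAMPLE holds just the first three lines of the listing; the full listing can be pasted in the same way.

```python
import re

# First three lines of the top listing above (assumption: default top columns).
SAMPLE = """\
    7 root      RT   0     0    0    0 S 55.0  0.0   1691:40 migration/1
    3 root      RT   0     0    0    0 S 34.7  0.0   1595:18 migration/0
   39 root      RT   0     0    0    0 S 34.2  0.0   1696:14 migration/9
"""


def migration_cpu(top_text):
    """Return {thread_name: cpu_percent} for migration/N kernel threads."""
    usage = {}
    for line in top_text.splitlines():
        fields = line.split()
        # 12 columns expected; %CPU is column 9, COMMAND is column 12.
        if len(fields) == 12 and re.fullmatch(r"migration/\d+", fields[11]):
            usage[fields[11]] = float(fields[8])
    return usage


if __name__ == "__main__":
    usage = migration_cpu(SAMPLE)
    total = sum(usage.values())
    # 100% corresponds to one fully busy core in top's default (Irix) mode.
    print(f"{len(usage)} migration threads, {total:.1f}% CPU in total")
```

Applied to all 17 lines shown above, the %CPU column adds up to roughly 387%, that is, close to four cores' worth of time spent in the migration threads.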

These processes are heavily used only once I start robinhood. On other robinhood installations where the Lustre load is not so high, the migration processes are always at 0%. Currently /scratch/daint is used by two CRAY clusters, for a total of ca. 6400 compute nodes. Maybe our robinhood HW is too small. Do you think changing the HW could help?

Carmelo



On Tue, 2015-04-21 at 11:14 +0200, LEIBOVICI Thomas wrote:
Hi,

Reports indicate the DB backend is not overloaded, so you don't have to
change your DB tunings or db access method.
It looks like access to the filesystem is very slow: the GET_INFO_FS stage
takes about 127ms, which means filesystem calls take 127ms, which is too
long.
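
One way to cross-check the filesystem latency outside robinhood is to time a batch of metadata calls directly. This is only a rough sketch, assuming that plain lstat() calls are a reasonable stand-in for what GET_INFO_FS does (the stage may issue other, Lustre-specific calls as well). The function name and the max_entries parameter are hypothetical.

```python
import os
import sys
import time


def mean_lstat_latency_ms(directory, max_entries=200):
    """Average wall-clock time of os.lstat() over up to max_entries entries, in ms."""
    names = os.listdir(directory)[:max_entries]
    if not names:
        return 0.0
    start = time.perf_counter()
    for name in names:
        try:
            os.lstat(os.path.join(directory, name))
        except OSError:
            pass  # entry vanished or is unreadable; the call was still timed
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / len(names)


if __name__ == "__main__":
    # Pass a directory on the filesystem under test as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: {mean_lstat_latency_ms(target):.2f} ms per lstat")
```

Attributes may already be cached on the client, so a second run on the same directory can look much faster than the first. Comparing a directory on the Lustre filesystem with one on local disk gives a rough baseline.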

As MySQL and robinhood are almost idle (0% CPU), do you know what causes
the server load of 30?
This load may explain why accesses are slow.

Thomas


_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support
