Hi Thomas
checking the load with top I can see a lot of processes called
migration/[0-31] which are loading the servers:
7 root RT 0 0 0 0 S 55.0 0.0 1691:40
migration/1
3 root RT 0 0 0 0 S 34.7 0.0 1595:18
migration/0
39 root RT 0 0 0 0 S 34.2 0.0 1696:14
migration/9
35 root RT 0 0 0 0 S 29.7 0.0 1576:14
migration/8
103 root RT 0 0 0 0 S 26.0 0.0 735:58.52
migration/25
71 root RT 0 0 0 0 S 21.4 0.0 789:06.28
migration/17
11 root RT 0 0 0 0 S 20.2 0.0 1058:41
migration/2
51 root RT 0 0 0 0 S 19.8 0.0 465:53.14
migration/12
115 root RT 0 0 0 0 S 19.7 0.0 285:58.55
migration/28
47 root RT 0 0 0 0 S 18.6 0.0 783:47.47
migration/11
99 root RT 0 0 0 0 S 18.0 0.0 648:41.81
migration/24
43 root RT 0 0 0 0 S 17.6 0.0 1154:51
migration/10
111 root RT 0 0 0 0 S 17.0 0.0 401:44.51
migration/27
107 root RT 0 0 0 0 S 15.7 0.0 579:55.17
migration/26
67 root RT 0 0 0 0 S 14.9 0.0 698:39.89
migration/16
75 root RT 0 0 0 0 S 13.1 0.0 621:44.06
migration/18
83 root RT 0 0 0 0 S 11.7 0.0 271:49.81
migration/20
These processes are heavily used only once I start robinhood. On other
robinhood installations where the load of lustre is not so hight,
migration processes are always 0%. Currently /scratch/daint is using by
two CRAY clusters for a total of ca. 6400 compute nodes. Maybe our
robinhood HW is to small. Do you think changing the HW could help?
Carmelo
On Tue, 2015-04-21 at 11:14 +0200, LEIBOVICI Thomas wrote:
> Hi,
>
> Reports indicate the DB backend is not overloaded, so you don't have to
> change your DB tunings or db access method.
> It looks the access to the filesystem is very slow: GET_INFO_FS stage
> takes about 127ms, which means filesystem calls take 127ms which is too
> long.
>
> As mysql and robinhood are almost idle (0% CPU), do you know what causes
> the server load of 30 ??
> This load my explain why accesses are slow.
>
> Thomas
>
>
> On 04/21/15 10:23, Carmelo Ponti (CSCS) wrote:
> > Hi Thomas
> >
> > Thank you very much for your prompt answer.
> >
> >> * An overview of your robinhood pipeline would help determining where it
> >> spends time.
> >> grep STATS /var/log/robinhood.daint.log
> >
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | ======= Resource
> > Monitor stats ======
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | Trigger #0
> > (global_usage) : purge running.
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | ============ Purge
> > stats ============
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | idle purge
> > threads = 0
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | purge operations
> > pending = 255
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | purge status:
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | successfully
> > purged = 76350
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | accessed since last
> > update = 16
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | moved or deleted
> > since last update = 1050
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS |
> > whitelisted/ignored = 1748614
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | total purged volume =
> > 3576313036800 (3.25 TB)
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last file submitted 0
> > s ago
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last file handled 0
> > s ago
> > 2015/04/21 09:56:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last file purged 0
> > s ago
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | ====================
> > Dumping stats at 2015/04/21 09:57:33 =====================
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | ======== General
> > statistics =========
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | Daemon start time:
> > 2015/04/20 15:57:20
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | Started modules:
> > log_reader,purge
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | ChangeLog reader #0:
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | fs_name = snx11026
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | mdt_name = MDT0000
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | reader_id = cl1
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | records read
> > = 10909648
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | interesting records
> > = 8816971
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | suppressed records
> > = 2092677
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | records pending
> > = 741
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last
> > received = 2015/04/21 09:56:39
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last read record
> > time = 2015/04/20 17:21:01.102867
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last read record
> > id = 5448947569
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last pushed record
> > id = 5448946682
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last committed
> > record id = 5448831700
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | last cleared record
> > id = 5448830961
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | read
> > speed = 6.60 record/sec
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | processing speed
> > ratio = 0.03
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS |
> > status = busy
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | ChangeLog stats:
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | MARK: 0, CREAT:
> > 2425674, MKDIR: 60555, HLINK: 82, SLINK: 2671, MKNOD: 0, UNLNK: 3115664
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | RMDIR: 47692, RENME:
> > 538602, RNMTO: 538602, OPEN: 0, CLOSE: 0, LYOUT: 0, TRUNC: 0
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | SATTR: 4145515,
> > XATTR: 9103, HSM: 0, MTIME: 25484, CTIME: 4, ATIME: 0
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | ==== EntryProcessor
> > Pipeline Stats ===
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | Idle threads: 0
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | Id constraints count:
> > 93223 (hash min=0/max=99/avg=5.8)
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | Name constraints count:
> > 82374 (hash min=0/max=1068/avg=5.1)
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | Stage |
> > Wait | Curr | Done | Total | ms/op |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 0: GET_FID |
> > 0 | 0 | 0 | 0 | 0.00 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 1: GET_INFO_DB
> > |99977 | 0 | 0 | 9254855 | 0.46 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 2: GET_INFO_FS |
> > 3 | 16 | 4 | 5060149 | 127.36 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 3: REPORTING |
> > 0 | 0 | 0 | 1621148 | 0.00 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 4: PRE_APPLY |
> > 0 | 0 | 0 | 2329020 | 0.00 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 5: DB_APPLY |
> > 0 | 0 | 0 | 2329020 | 0.90 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 6: CHGLOG_CLR |
> > 0 | 0 | 0 | 8742336 | 0.02 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | 7: RM_OLD_ENTRIES |
> > 0 | 0 | 0 | 0 | 0.00 |
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | DB ops:
> > get=1464691/ins=395683/upd=1224593/rm=707447
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | --- Pipeline stage
> > details ---
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | GET_INFO_DB : first:
> > changelog record #5448831728, fid=[0x259aef470:0x49a9:0x0], status=waiting
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | GET_INFO_DB : last:
> > changelog record #5448946681, fid=[0x257f8da18:0xce40:0x0], status=waiting
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | GET_INFO_FS : first:
> > changelog record #5448831701, fid=[0x25847ab0d:0x9e8a:0x0],
> > status=processing
> > 2015/04/21 09:57:33 robinhood@daintrbh01
> > <mailto:robinhood@daintrbh01>[26268/1] STATS | GET_INFO_FS : last:
> > changelog record #5448831727, fid=[0x259aef470:0x49a4:0x0], status=done
> >
> >
> >> * Running "sar -d" on you robinhood host for the DB storage disk, is the
> >> last column (%util) close to 100%?
> >
> > Currently we are using an SSD disk and in this moment last column
> > (%util) seems OK:
> >
> > 12:10:05 AM dev8-16 9.97 0.00 1150.94 115.40
> > 0.04 4.35 0.78 0.78
> > 12:20:01 AM dev8-16 10.23 0.00 1085.88 106.14
> > 0.04 4.29 0.63 0.64
> > 12:30:05 AM dev8-16 10.59 0.00 1252.76 118.35
> > 0.05 4.38 0.89 0.94
> > 12:40:01 AM dev8-16 23.02 36.74 2117.12 93.58
> > 0.11 4.57 0.57 1.32
> > 12:50:02 AM dev8-16 82.62 37.86 6054.04 73.73
> > 0.42 5.08 0.35 2.92
> > 01:00:04 AM dev8-16 11.73 0.05 1038.67 88.54
> > 0.06 4.72 0.58 0.67
> > 01:10:04 AM dev8-16 14.87 0.59 1267.59 85.27
> > 0.07 4.93 0.50 0.74
> > 01:20:02 AM dev8-16 12.44 0.28 1138.93 91.59
> > 0.06 4.89 0.57 0.71
> > 01:30:02 AM dev8-16 11.79 0.00 1010.65 85.74
> > 0.06 4.80 0.54 0.64
> > 01:40:01 AM dev8-16 12.83 0.00 1214.26 94.68
> > 0.06 4.85 0.71 0.91
> > 01:50:02 AM dev8-16 31.71 159.38 2330.52 78.53
> > 0.15 4.79 0.46 1.46
> > 02:00:04 AM dev8-16 19.18 0.11 1718.61 89.61
> > 0.10 5.09 0.66 1.27
> > 02:10:02 AM dev8-16 22.84 1.94 2110.76 92.49
> > 0.12 5.11 0.60 1.36
> > 02:20:01 AM dev8-16 11.16 0.00 1346.71 120.64
> > 0.05 4.61 0.88 0.98
> > 02:30:01 AM dev8-16 11.25 0.05 1348.59 119.86
> > 0.05 4.23 0.69 0.78
> > 02:40:02 AM dev8-16 20.44 0.05 1873.43 91.64
> > 0.10 4.74 0.55 1.13
> > 02:50:03 AM dev8-16 11.58 0.00 1357.64 117.25
> > 0.05 4.33 0.81 0.94
> > 03:00:01 AM dev8-16 235.51 686.79 18075.37 79.67
> > 1.38 5.87 0.37 8.65
> > 03:10:03 AM dev8-16 131.17 23.25 9851.24 75.28
> > 0.85 6.50 0.42 5.54
> > 03:20:02 AM dev8-16 23.08 3.06 1567.65 68.05
> > 0.14 5.97 0.68 1.57
> > 03:30:02 AM dev8-16 21.13 1.02 1667.58 78.97
> > 0.12 5.88 0.62 1.32
> > 03:40:01 AM dev8-16 11.54 0.00 1340.91 116.18
> > 0.06 4.77 0.96 1.11
> > 03:50:02 AM dev8-16 11.33 0.00 1365.30 120.48
> > 0.06 5.09 0.95 1.07
> > 04:00:03 AM dev8-16 13.15 0.21 1451.96 110.40
> > 0.07 5.12 0.88 1.16
> > 04:10:02 AM dev8-16 11.42 0.00 1345.78 117.85
> > 0.06 4.86 0.97 1.10
> > 04:20:02 AM dev8-16 10.77 0.05 1336.48 124.11
> > 0.05 4.67 0.92 0.99
> > 04:30:01 AM dev8-16 11.35 0.48 1356.11 119.55
> > 0.06 4.93 0.90 1.02
> > 04:40:07 AM dev8-16 26.63 59.74 1958.26 75.79
> > 0.13 4.93 0.45 1.21
> > 04:50:02 AM dev8-16 11.90 0.00 1093.49 91.91
> > 0.05 4.61 0.59 0.70
> > 05:00:06 AM dev8-16 12.20 2.13 1014.56 83.32
> > 0.06 4.80 0.56 0.68
> > 05:10:02 AM dev8-16 12.40 0.54 1080.64 87.21
> > 0.05 4.43 0.52 0.64
> > 05:20:03 AM dev8-16 10.76 0.00 914.27 85.00
> > 0.05 4.40 0.60 0.65
> > 05:30:04 AM dev8-16 10.85 0.11 951.95 87.78
> > 0.05 4.57 0.65 0.71
> > 05:40:06 AM dev8-16 12.02 1.07 1009.64 84.08
> > 0.06 4.86 0.75 0.90
> > 05:50:04 AM dev8-16 11.65 1.07 971.08 83.45
> > 0.06 4.75 0.54 0.63
> > 06:00:06 AM dev8-16 10.87 0.75 865.84 79.69
> > 0.05 4.50 0.63 0.68
> > 06:10:02 AM dev8-16 11.72 0.11 1034.79 88.30
> > 0.06 4.82 0.69 0.81
> > 06:20:02 AM dev8-16 12.79 0.16 1080.31 84.50
> > 0.06 4.73 0.51 0.65
> > 06:30:03 AM dev8-16 10.99 1.39 928.56 84.59
> > 0.05 4.55 0.55 0.61
> > 06:40:06 AM dev8-16 10.74 5.51 893.18 83.69
> > 0.05 4.56 0.65 0.69
> > 06:50:02 AM dev8-16 11.73 0.92 1014.51 86.59
> > 0.05 4.60 0.52 0.61
> > 07:00:01 AM dev8-16 167.35 1058.15 11826.19 76.99
> > 0.88 5.24 0.36 6.10
> > 07:10:01 AM dev8-16 77.19 236.47 5595.96 75.56
> > 0.43 5.55 0.47 3.63
> > 07:20:01 AM dev8-16 219.20 459.52 17196.77 80.55
> > 1.34 6.13 0.40 8.70
> > 07:30:02 AM dev8-16 153.37 109.63 10959.45 72.17
> > 0.86 5.62 0.34 5.21
> > 07:40:10 AM dev8-16 13.91 0.32 1147.46 82.49
> > 0.11 8.02 0.70 0.97
> > 07:50:01 AM dev8-16 13.30 0.00 1175.87 88.44
> > 0.07 5.56 0.64 0.85
> > 08:00:01 AM dev8-16 148.46 515.57 10966.84 77.34
> > 0.88 5.94 0.36 5.35
> > 08:10:05 AM dev8-16 124.70 40.06 9220.48 74.27
> > 0.71 5.72 0.43 5.32
> > 08:20:02 AM dev8-16 24.91 10.71 1595.07 64.47
> > 0.15 5.85 0.72 1.80
> > 08:30:04 AM dev8-16 41.70 207.69 2723.45 70.29
> > 0.23 5.54 0.59 2.45
> > 08:40:02 AM dev8-16 37.94 20.20 2680.39 71.17
> > 0.21 5.49 0.56 2.12
> > 08:50:03 AM dev8-16 44.98 15.02 3061.40 68.40
> > 0.27 6.01 0.58 2.59
> > 09:00:02 AM dev8-16 41.42 10.29 2806.15 67.99
> > 0.23 5.51 0.52 2.15
> > 09:10:01 AM dev8-16 27.64 11.86 1878.74 68.40
> > 0.16 5.61 0.59 1.64
> > 09:20:03 AM dev8-16 21.89 23.48 1452.68 67.44
> > 0.13 5.88 0.65 1.42
> > 09:30:02 AM dev8-16 27.67 65.13 1865.97 69.78
> > 0.14 5.23 0.55 1.52
> > 09:40:02 AM dev8-16 31.31 57.76 2381.06 77.90
> > 0.16 5.15 0.59 1.83
> > 09:50:02 AM dev8-16 27.29 39.61 1859.79 69.60
> > 0.15 5.48 0.55 1.50
> > 10:00:04 AM dev8-16 40.77 48.04 2970.01 74.03
> > 0.21 5.20 0.53 2.16
> > Average: dev8-16 37.74 65.47 2940.89 79.66
> > 0.21 5.54 0.48 1.82
> >
> >
> >> * If you DB disk backend is not fast, I suggest you use DB request
> >> batching. In robinhood config (EntryProcessor block):
> >>
> >> max_batch_size = 1000;
> >>
> >> Or you can try increasing processing threads, in EntryProcessor block:
> >>
> >> nb_threads = 32 ;
> >
> > We are using an SSD disk as DB backend so I'm going to increase the
> > number of threads to 32 and monitor the system.
> >
> >> * Looking at "top", what is the memory and CPU usage of mysqld and
> >> robinhood under full load?
> >
> > The load of the server in this moment is ~30 (load average: 32.30,
> > 34.40, 35.24) and top of robinhood and mysql give me the following values:
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 26268 root 20 0 4373m 1.8g 1428 S 0.0 1.5 316:44.06 robinhood
> > 3217 mysql 20 0 37.6g 1.3g 4320 S 0.0 1.0 1443:00 mysqld
> >
> > Thank you for the DB tuning advice. I'll try them and let you know.
> >
> > regards
> > Carmelo
>
--
----------------------------------------------------------------------
Carmelo Ponti System Engineer
CSCS Swiss Center for Scientific Computing
Via Trevano 131 Email: [email protected]
CH-6900 Lugano http://www.cscs.ch
Phone: +41 91 610 82 15/Fax: +41 91 610 82 82
----------------------------------------------------------------------
------------------------------------------------------------------------------
BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
Develop your own process in accordance with the BPMN 2 standard
Learn Process modeling best practices with Bonita BPM through live exercises
http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual- event?utm_
source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support