Hi,

We've observed what looks like a memory leak in the MGR since upgrading to 
Nautilus. 
Yesterday I upgraded our cluster to the latest 14.2.10, and the problem still 
seems reproducible. According to the monitoring chart (memory usage of the 
active mgr node), memory consumption started to climb noticeably faster at 
about 21:40. The mgr's resident memory is currently about 8.3 GB according to 
top. I also checked the mgr log and found that, starting at 21:38:40, the 
message "client.0 ms_handle_reset on v2:10.3.1.3:6800/6" was produced every 
second:


2020-06-29 21:38:24.173 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9979: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 8.1 MiB/s rd, 21 MiB/s wr, 673 op/s
2020-06-29 21:38:26.180 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9980: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 9.6 MiB/s rd, 26 MiB/s wr, 764 op/s
2020-06-29 21:38:28.183 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9981: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 9.3 MiB/s rd, 23 MiB/s wr, 667 op/s
2020-06-29 21:38:30.186 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9982: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 9.3 MiB/s rd, 22 MiB/s wr, 661 op/s
2020-06-29 21:38:32.191 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9983: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 8.1 MiB/s rd, 21 MiB/s wr, 683 op/s
2020-06-29 21:38:34.195 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9984: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 4.0 MiB/s rd, 17 MiB/s wr, 670 op/s
2020-06-29 21:38:36.200 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9985: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 2.4 MiB/s rd, 15 MiB/s wr, 755 op/s
2020-06-29 21:38:38.203 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9986: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 1.2 MiB/s rd, 12 MiB/s wr, 668 op/s
2020-06-29 21:38:40.207 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9987: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 1.1 MiB/s rd, 12 MiB/s wr, 681 op/s
2020-06-29 21:38:40.887 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:41.887 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:42.213 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9988: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 1.2 MiB/s rd, 13 MiB/s wr, 735 op/s
2020-06-29 21:38:42.887 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:43.887 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:44.216 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9989: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 982 KiB/s rd, 12 MiB/s wr, 687 op/s
2020-06-29 21:38:44.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:45.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:46.222 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9990: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 1.2 MiB/s rd, 17 MiB/s wr, 789 op/s
2020-06-29 21:38:46.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:47.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:48.225 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9991: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 1.2 MiB/s rd, 15 MiB/s wr, 684 op/s
2020-06-29 21:38:48.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:49.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:50.228 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9992: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 940 KiB/s rd, 16 MiB/s wr, 674 op/s
2020-06-29 21:38:50.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:51.888 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:52.235 7f2ecab7a700  0 log_channel(cluster) log [DBG] : pgmap 
v9993: 1824 pgs: 3 active+clean+scrubbing+deep, 1821 active+clean; 54 TiB data, 
169 TiB used, 153 TiB / 322 TiB avail; 1.3 MiB/s rd, 19 MiB/s wr, 752 op/s
2020-06-29 21:38:52.889 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6
2020-06-29 21:38:53.889 7f2edbec3700  0 client.0 ms_handle_reset on 
v2:10.3.1.3:6800/6


I enabled debug logging on another cluster that exhibits the same problem; 
that log is attached.
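
In case anyone wants to reproduce the memory measurement without a monitoring 
stack, something like the following sketch tracks the same number top shows 
(it assumes exactly one ceph-mgr process is running on the host):

#!/usr/bin/env python3
# Sample the resident set size (VmRSS) of ceph-mgr once a minute so the
# growth rate can be lined up with log timestamps. Assumes a single
# ceph-mgr process on this host.
import subprocess
import time

def mgr_rss_kb():
    pid = subprocess.check_output(["pidof", "ceph-mgr"]).split()[0].decode()
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is in kB
    return 0

while True:
    print(time.strftime("%Y-%m-%d %H:%M:%S"), mgr_rss_kb(), "kB", flush=True)
    time.sleep(60)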

Any thoughts about this issue?

BR,
Xu Yun
