For a few weeks, one of our clusters has been experiencing large, simultaneous CPU spikes across most or all nodes. Starting at the same time, our metrics-collection job has intermittently seen long response times when calling the following API endpoints:
  /nifi-api/process-groups/root
  /nifi-api/flow/process-groups/root/status?recursive=False&nodewise=True
  /nifi-api/system-diagnostics?nodewise=true

There has been no change in response time for these endpoints:

  /nifi-api/access/token
  /nifi-api/flow/cluster/summary
  /nifi-api/tenants/users/
  /nifi-api/flow/process-groups/root/controller-services?includeAncestorGroups=False&includeDescendantGroups=True

All calls are node-local, so response times are usually sub-second. But when the slowness sets in, the problem calls may take as long as 45 to 60 seconds. Sometimes just one call is slow, sometimes more than one (they are issued synchronously, back to back).

Any thoughts on properties/settings that we could look at, or component logging that we could enable, to help troubleshoot the slowness?

We have a second, geographically isolated cluster that runs (to the best of our knowledge) the exact same jobs with the exact same data. That cluster has no issues.

Thanks,
Peter
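P.S. In case it helps anyone reproduce the measurement: a minimal sketch of how we time each call individually so slow responses can be attributed to a specific endpoint rather than the whole batch. The host, port, token, and the 5-second "slow" threshold are placeholders, not values from our actual job.

```python
# Sketch: time each NiFi REST call separately and flag the slow ones.
# Host/port/token below are hypothetical placeholders.
import time
import urllib.request

ENDPOINTS = [
    "/nifi-api/process-groups/root",
    "/nifi-api/flow/process-groups/root/status?recursive=False&nodewise=True",
    "/nifi-api/system-diagnostics?nodewise=true",
]

SLOW_THRESHOLD_S = 5.0  # arbitrary cutoff for flagging a call as slow


def flag_slow(samples, threshold=SLOW_THRESHOLD_S):
    """Given (endpoint, elapsed_seconds) pairs, return the slow endpoints."""
    return [ep for ep, elapsed in samples if elapsed > threshold]


def time_endpoint(base_url, endpoint, token):
    """Issue one GET against a node-local endpoint; return (endpoint, elapsed)."""
    req = urllib.request.Request(
        base_url + endpoint,
        headers={"Authorization": "Bearer " + token},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=90) as resp:
        resp.read()  # drain the body so the full round trip is measured
    return endpoint, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder host and token; calls are made back to back, as in our job.
    samples = [time_endpoint("https://nifi-node-1:8443", ep, "TOKEN")
               for ep in ENDPOINTS]
    for ep in flag_slow(samples):
        print("slow:", ep)
```

Logging the per-endpoint timings this way is what let us see that the slowness hits some calls in the batch and not others.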