Hong,
Nice link to the parallel threads issue - very timely and useful, as we just
put in the replicaSet workaround (3 replicas) yesterday to fix an issue with
running on only one core.
I will look more into the logstash config as well - the issue is that we now
baseline at 30 logs/sec on an idle system - so CPU usage is unavoidable - a
more granular VM cluster will help to a point.
The replicaCount:3 fix will not be sufficient on its own - it would also need
an autoscaler and a CPU limit in the yaml.
A better fix is a switch to a DaemonSet - 1 per VM - this has been in review
since last night.
https://jira.onap.org/browse/LOG-376
https://jira.onap.org/browse/LOG-181
https://gerrit.onap.org/r/#/c/48139/
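To make the DaemonSet-plus-CPU-limit idea concrete, here is a minimal Python sketch that builds such a manifest as a dict and prints it as JSON (which kubectl also accepts). It is not the content of the gerrit change above: the resource name, labels, image, and CPU values are all hypothetical placeholders, and the apiVersion is an assumption that may need to be a beta API group on older clusters.

```python
import json


def logstash_daemonset(cpu_request="500m", cpu_limit="1",
                       image="example/logstash:placeholder"):
    """Build a DaemonSet manifest: one CPU-capped logstash pod per node.

    All names, the image, and the CPU values are hypothetical placeholders.
    """
    labels = {"app": "log-logstash"}
    return {
        # Assumption: older clusters may need a beta API group here.
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "log-logstash", "namespace": "onap"},
        "spec": {
            # A DaemonSet has no replica count: the scheduler places
            # one pod on every eligible node.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "logstash",
                        "image": image,
                        "resources": {
                            # The limit stops one pod from taking all cores.
                            "requests": {"cpu": cpu_request},
                            "limits": {"cpu": cpu_limit},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(logstash_daemonset(), indent=2))
```

The point of the sketch is the shape: no replicas field, and a resources.limits.cpu entry on the container so a single pod cannot consume a whole host.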
This hogging of all available CPUs on a particular host is also a problem
for a couple of other applications in onap - each one will require resource
tuning similar to what is currently under way for the log pods.
My main CD cluster is still 4 x 64g but a move to 9 x 16g also helps with
the cpu granularity of the pods - the same layout Gary's CD system runs.
https://git.onap.org/logging-analytics/tree/deploy/rancher
https://git.onap.org/integration/tree/deployment/heat/onap-oom
The nodejs issue is separate from this though, right?
Thank you
/michael
From: GUAN, HONG [mailto:[email protected]]
Sent: Friday, May 18, 2018 9:16 AM
To: Michael O'Brien <[email protected]>; [email protected];
[email protected]
Subject: RE: OOM Beijing CPU utilization
FYI
Below is what we found out about CPU management of Logstash.
https://discuss.elastic.co/t/cpu-management-of-logstash/99487
Before deploying 'log' (CPU 6%)
[centos@server-k8s-cluster-1node-kubernetes-master-host-afxat7 kubernetes]$ kubectl top node
NAME                                                     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
server-k8s-cluster-1node-kubernetes-node-host-645o52     312m         3%     12273Mi         77%
server-k8s-cluster-1node-kubernetes-node-host-s891z4     1586m        19%    4082Mi          25%
server-k8s-cluster-1node-kubernetes-node-host-6v5ip2     531m         6%     2278Mi          14%
server-k8s-cluster-1node-kubernetes-master-host-afxat7   124m         1%     2933Mi          18%
server-k8s-cluster-1node-kubernetes-node-host-vpsi6z     197m         2%     12344Mi         78%
After deploying 'log' (CPU 97%)
[centos@server-k8s-cluster-1node-kubernetes-master-host-afxat7 kubernetes]$ kubectl get pod -n onap -o wide
NAME                                         READY   STATUS    RESTARTS   AGE   IP           NODE
onap-appc-appc-0                             2/2     Running   0          15h   10.47.0.8    server-k8s-cluster-1node-kubernetes-node-host-645o52
onap-appc-appc-cdt-7878d75dd8-nmhld          1/1     Running   0          15h   10.36.0.3    server-k8s-cluster-1node-kubernetes-node-host-s891z4
onap-appc-appc-db-0                          2/2     Running   0          15h   10.42.0.4    server-k8s-cluster-1node-kubernetes-node-host-6v5ip2
onap-appc-appc-dgbuilder-989bc9898-prbzg     1/1     Running   0          15h   10.36.0.4    server-k8s-cluster-1node-kubernetes-node-host-s891z4
onap-consul-consul-6d9946f754-2qv8g          1/1     Running   0          15h   10.42.0.5    server-k8s-cluster-1node-kubernetes-node-host-6v5ip2
onap-consul-consul-server-0                  1/1     Running   0          15h   10.36.0.5    server-k8s-cluster-1node-kubernetes-node-host-s891z4
onap-consul-consul-server-1                  1/1     Running   0          15h   10.42.0.6    server-k8s-cluster-1node-kubernetes-node-host-6v5ip2
onap-consul-consul-server-2                  1/1     Running   0          15h   10.47.0.9    server-k8s-cluster-1node-kubernetes-node-host-645o52
onap-log-log-elasticsearch-f4cdbb4b8-d8kgd   1/1     Running   0          5m    10.36.0.8    server-k8s-cluster-1node-kubernetes-node-host-s891z4
onap-log-log-kibana-9f8768474-pps9r          1/1     Running   0          5m    10.42.0.8    server-k8s-cluster-1node-kubernetes-node-host-6v5ip2
onap-log-log-logstash-7dd49fd4d-7vhhs        1/1     Running   0          5m    10.42.0.9    server-k8s-cluster-1node-kubernetes-node-host-6v5ip2
onap-log-log-logstash-7dd49fd4d-l5thf        1/1     Running   0          5m    10.36.0.7    server-k8s-cluster-1node-kubernetes-node-host-s891z4
onap-log-log-logstash-7dd49fd4d-sllqv        1/1     Running   0          5m    10.47.0.11   server-k8s-cluster-1node-kubernetes-node-host-645o52
onap-msb-kube2msb-69b4cfb74d-sxc47           1/1     Running   0          15h   10.42.0.3    server-k8s-cluster-1node-kubernetes-node-host-6v5ip2
onap-msb-msb-consul-b946c8486-dcbm9          1/1     Running   0          15h   10.36.0.1    server-k8s-cluster-1node-kubernetes-node-host-s891z4
[centos@server-k8s-cluster-1node-kubernetes-master-host-afxat7 kubernetes]$ kubectl top node
NAME                                                     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
server-k8s-cluster-1node-kubernetes-node-host-645o52     971m         12%    12452Mi         78%
server-k8s-cluster-1node-kubernetes-node-host-s891z4     825m         10%    5182Mi          32%
server-k8s-cluster-1node-kubernetes-node-host-6v5ip2     7807m        97%    4354Mi          27%
server-k8s-cluster-1node-kubernetes-master-host-afxat7   158m         1%     2952Mi          18%
server-k8s-cluster-1node-kubernetes-node-host-vpsi6z     213m         2%     12461Mi         78%
[centos@server-k8s-cluster-1node-kubernetes-master-host-afxat7 kubernetes]$
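For anyone who wants to spot a saturated node without reading the table by eye, the following is a minimal Python sketch that parses `kubectl top node` text output and lists the nodes at or above a CPU threshold. The function names and the 90% threshold are illustrative assumptions; the sample text is the "after" output above, written one node per line as kubectl prints it.

```python
SAMPLE = """\
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
server-k8s-cluster-1node-kubernetes-node-host-645o52 971m 12% 12452Mi 78%
server-k8s-cluster-1node-kubernetes-node-host-s891z4 825m 10% 5182Mi 32%
server-k8s-cluster-1node-kubernetes-node-host-6v5ip2 7807m 97% 4354Mi 27%
server-k8s-cluster-1node-kubernetes-master-host-afxat7 158m 1% 2952Mi 18%
server-k8s-cluster-1node-kubernetes-node-host-vpsi6z 213m 2% 12461Mi 78%
"""


def parse_top_node(text):
    """Parse `kubectl top node` text into a list of per-node dicts."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 5-column data row.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, cpu, cpu_pct, mem, mem_pct = fields
        rows.append({
            "node": name,
            "cpu_millicores": int(cpu[:-1]),   # drop trailing "m"
            "cpu_pct": int(cpu_pct[:-1]),      # drop trailing "%"
            "mem_mi": int(mem[:-2]),           # drop trailing "Mi"
            "mem_pct": int(mem_pct[:-1]),
        })
    return rows


def hot_nodes(rows, threshold=90):
    """Return the names of nodes whose CPU% is at or above the threshold."""
    return [r["node"] for r in rows if r["cpu_pct"] >= threshold]


if __name__ == "__main__":
    print(hot_nodes(parse_top_node(SAMPLE)))
    # prints ['server-k8s-cluster-1node-kubernetes-node-host-6v5ip2']
```

On this sample only the 6v5ip2 node is flagged, which is the node running both the kibana pod and one of the logstash pods.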
Thanks,
Hong
From: [email protected] [mailto:[email protected]] On Behalf Of OBRIEN, FRANK MICHAEL
Sent: Friday, May 18, 2018 12:03 AM
To: [email protected]; [email protected]
Subject: Re: [onap-discuss] OOM Beijing CPU utilization
Hi,
I have seen this 3 times from Dec to March - I am tracking this nodejs issue
via OOM-834 (not an OOM issue) - I last saw it on 27th March under 1.8.10
(current version), but while running helm 2.6.1 (current version 2.8.2).
https://jira.onap.org/browse/OOM-834
Something in the infrastructure is causing this - as I have seen it on an
idle kubernetes cluster (no onap pods installed)
Will look again through the k8s jiras
You are correct - it is not the .ru crypto miner that targets 10250/pods or
the new one that targets a cluster without oauth lockdown
Tracking anti-crypto here
https://jira.onap.org/browse/LOG-353
I think I will ask for 5 min to go over the lockdown of clusters with the
security subcommittee - the oauth lockdown will cover off 10249-10255 as well.
/michael
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Thursday, May 17, 2018 7:28 PM
To: [email protected]
Subject: [onap-discuss] OOM Beijing CPU utilization
Hi,
I have a running OOM ONAP Beijing deployment on 2 nodes.
After a few days running OK, I noticed around 100% CPU on all 16 vCPUs on the
1st node.
I see a nodejs process running at 815% CPU, as shown below.
What is this process doing?
I checked for mining and found none, I have port 10250 blocked, and I don't
see any suspicious processes.
I had to kill the nodejs process in order to regain interactivity with my onap
deployment.
Thanks.
root@olc-oom-bjng:~# top
top - 22:58:14 up 13 days, 1:28, 1 user, load average: 53.66, 49.26, 48.69
Tasks: 1181 total, 1 running, 1175 sleeping, 0 stopped, 5 zombie
%Cpu(s): 84.0 us, 14.9 sy, 0.1 ni, 0.4 id, 0.0 wa, 0.0 hi, 0.2 si, 0.3 st
KiB Mem : 10474657+total, 1037308 free, 88390000 used, 15319272 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 14242952 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20119 root      20   0 1431420  65876   1124 S 815.4  0.1  34465:04 nodjs -c /bin/config.json
_________________________________________________________________________________________________________________________
This message and its attachments may contain confidential or privileged
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been
modified, changed or falsified.
Thank you.
This message and the information contained herein is proprietary and
confidential and subject to the Amdocs policy statement,
you may review at
https://www.amdocs.com/about/email-disclaimer
_______________________________________________
onap-discuss mailing list
[email protected]
https://lists.onap.org/mailman/listinfo/onap-discuss