All,

I realize this is a very long thread (apologies for that), but I really 
need to find a solution to the high CPU usage.  Please let me know if I 
can provide any additional information that would help you help me.

On Friday, January 21, 2022 at 7:27:18 AM UTC-5 Dimitri Yioulos wrote:

> That's a good question.  The machine running the node_exporter whose 
> pprof output you saw was just rebuilt, so the output is from a fresh, 
> basic install of node_exporter.  This is the systemd node_exporter 
> service:
>
> [Unit]
> Description=Node Exporter
> After=network.target
>
> [Service]
> User=node_exporter
> Group=node_exporter
> Type=simple
> ExecStart=/usr/local/bin/node_exporter
>
> [Install]
> WantedBy=multi-user.target
>
> and the Prometheus target:
>
>   - job_name: 'myserver1'
>     scrape_interval: 5s
>     static_configs:
>       - targets: ['myserver1:9100']
>         labels:
>           env: prod
>           alias: myserver1
>
> I'm not sure what else to look at.
>
> On Friday, January 21, 2022 at 3:06:43 AM UTC-5 Brian Candler wrote:
>
>> The question is, why are systemd collector and process collector still in 
>> that graph?
>>
>> On Friday, 21 January 2022 at 00:27:14 UTC dyio...@gmail.com wrote:
>>
>>> Attached is the pprof output in text format, which may be easier to read.
>>>
>>> On Thursday, January 20, 2022 at 6:30:25 PM UTC-5 Dimitri Yioulos wrote:
>>>
>>>> I ran pprof (attached).  I'll have to work on /proc/<pid>/stat (even 
>>>> with the much appreciated reference :-) ).
>>>>
>>>> On Thursday, January 20, 2022 at 11:54:33 AM UTC-5 Brian Candler wrote:
>>>>
>>>>> So now go back to the original suggestion: run pprof with 
>>>>> node_exporter running the way you *want* to be running it.
>>>>>
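>>>>> For example, something along these lines against the live daemon 
>>>>> (assuming the built-in /debug/pprof endpoint is enabled, which it 
>>>>> appears to be since you produced a graph earlier):
>>>>>
>>>>> go tool pprof 'http://localhost:9100/debug/pprof/profile?seconds=30'
>>>>>
>>>>> That samples 30 seconds of CPU from the running process while 
>>>>> Prometheus is scraping it, which is the workload you actually care 
>>>>> about.
>>>>>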
>>>>> > [root@myhost1 ~]# time for ((i=1;i<=1000;i++)); do node_exporter 
>>>>> >/dev/null 2>&1; done
>>>>>
>>>>> That's meaningless.  node_exporter is a daemon, not something you can 
>>>>> run one-shot like that.  If you remove the ">/dev/null 2>&1" you'll see 
>>>>> lots of startup messages, probably ending with
>>>>>
>>>>> ts=2022-01-20T16:49:07.433Z caller=node_exporter.go:202 level=error 
>>>>> err="listen tcp :9100: bind: address already in use"
>>>>>
>>>>> and then node_exporter terminating.  So you're not seeing the CPU 
>>>>> overhead of any node_exporter scrape jobs, only its startup overhead.
>>>>>
>>>>> If the system is idle apart from running node_exporter, then "top" 
>>>>> will show you system time and cpu time.  More accurately, find the 
>>>>> process 
>>>>> ID of node_exporter then look in /proc/<pid>/stat
>>>>>
>>>>> https://stackoverflow.com/questions/16726779/how-do-i-get-the-total-cpu-usage-of-an-application-from-proc-pid-stat
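>>>>>
>>>>> A rough sketch of reading it (per proc(5), utime and stime are fields 
>>>>> 14 and 15, counted in clock ticks):
>>>>>
>>>>> pid=$(pgrep -x node_exporter)
>>>>> awk -v tck=$(getconf CLK_TCK) \
>>>>>     '{printf "user %.2fs sys %.2fs\n", $14/tck, $15/tck}' /proc/$pid/stat
>>>>>
>>>>> Take two readings a minute or so apart; the deltas are the CPU 
>>>>> node_exporter itself burned in that interval.  If sysstat is 
>>>>> installed, "pidstat -u -p $pid 5" will show %usr versus %system live.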
>>>>>
>>>>> On Thursday, 20 January 2022 at 12:33:06 UTC dyio...@gmail.com wrote:
>>>>>
>>>>>> Brian,
>>>>>>
>>>>>> Originally, I had not activated any additional collectors.  Then I 
>>>>>> read somewhere that I should add the systemd and process collectors 
>>>>>> (still learning here); that's why you saw them in the pprof graph.  I 
>>>>>> then circled back and removed them.  However, high CPU usage has 
>>>>>> *always* been an issue, on every system where I have node_exporter 
>>>>>> running.  While a few are test machines I care less about, for 
>>>>>> production machines it's a real problem.
>>>>>>
>>>>>> Here's some time output for node_exporter, though I'm not good at 
>>>>>> interpreting the results:
>>>>>>
>>>>>> [root@myhost1 ~]# time for ((i=1;i<=1000;i++)); do node_exporter 
>>>>>> >/dev/null 2>&1; done
>>>>>>
>>>>>> real        0m6.103s
>>>>>> user        0m3.658s
>>>>>> sys        0m3.151s
>>>>>>
>>>>>> So, if the above is a good way to measure node_exporter's user versus 
>>>>>> system time, they're about equal.  If you have a better way to measure 
>>>>>> this, I'd appreciate your sharing it.  And once that's determined, if 
>>>>>> the system-to-user ratio is out of whack, how do I remediate it?
>>>>>>
>>>>>> Many thanks.
>>>>>>
>>>>>> On Thursday, January 20, 2022 at 3:46:35 AM UTC-5 Brian Candler wrote:
>>>>>>
>>>>>>> So the systemd and process collectors aren't active.  I wonder why 
>>>>>>> they appeared in your pprof graph then?  Was it exactly the same binary 
>>>>>>> you 
>>>>>>> were running?
>>>>>>>
>>>>>>> 20% CPU usage from a once-every-five-second scrape implies each 
>>>>>>> scrape should take about 1 CPU-second (0.20 CPU x 5 s), but all the 
>>>>>>> collectors seem very fast.  The top five use between 0.01 and 0.018 
>>>>>>> seconds - and that's wall clock time, not CPU time.
>>>>>>>
>>>>>>> node_scrape_collector_duration_seconds{collector="cpu"} 0.010873961
>>>>>>> node_scrape_collector_duration_seconds{collector="diskstats"} 0.01727642
>>>>>>> node_scrape_collector_duration_seconds{collector="hwmon"} 0.014143617
>>>>>>> node_scrape_collector_duration_seconds{collector="netclass"} 0.013852102
>>>>>>> node_scrape_collector_duration_seconds{collector="thermal_zone"} 0.010936983
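>>>>>>>
>>>>>>> If you want to track that total over time, a query along these lines 
>>>>>>> should work from the Prometheus side (using the 'myserver1' job from 
>>>>>>> your config):
>>>>>>>
>>>>>>> sum by (instance) (node_scrape_collector_duration_seconds{job="myserver1"})
>>>>>>>
>>>>>>> For the numbers you posted, that sums to under 0.2 s of collector 
>>>>>>> time per scrape - nowhere near 1 CPU-second.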
>>>>>>>
>>>>>>> Something weird is going on.  Next you might want to drill down into 
>>>>>>> node_exporter's user versus system time.  Is the usage mostly system 
>>>>>>> time?  That might point you somewhere, although the implication then 
>>>>>>> is that the high CPU usage is in some part of node_exporter outside 
>>>>>>> of the individual collectors.
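>>>>>>>
>>>>>>> One other cross-check worth trying: node_exporter exports its own 
>>>>>>> process metrics, so (assuming the default Go process collector is 
>>>>>>> still enabled in your build) you can chart the daemon's real CPU use 
>>>>>>> straight from Prometheus:
>>>>>>>
>>>>>>> rate(process_cpu_seconds_total{job="myserver1"}[5m])
>>>>>>>
>>>>>>> If that hovers around 0.2 it agrees with the ~20% you see in top; if 
>>>>>>> it's near zero, the CPU is being burned somewhere other than 
>>>>>>> node_exporter itself.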
>>>>>>>
>>>>>>> On Wednesday, 19 January 2022 at 23:27:40 UTC dyio...@gmail.com 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> [root@myhost1 ~]# curl -Ss localhost:9100/metrics | grep -i collector
>>>>>>>> # HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
>>>>>>>> # TYPE node_scrape_collector_duration_seconds gauge
>>>>>>>> node_scrape_collector_duration_seconds{collector="arp"} 0.002911805
>>>>>>>> node_scrape_collector_duration_seconds{collector="bcache"} 1.4571e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="bonding"} 0.000112308
>>>>>>>> node_scrape_collector_duration_seconds{collector="btrfs"} 0.001308192
>>>>>>>> node_scrape_collector_duration_seconds{collector="conntrack"} 0.002750716
>>>>>>>> node_scrape_collector_duration_seconds{collector="cpu"} 0.010873961
>>>>>>>> node_scrape_collector_duration_seconds{collector="cpufreq"} 0.008559194
>>>>>>>> node_scrape_collector_duration_seconds{collector="diskstats"} 0.01727642
>>>>>>>> node_scrape_collector_duration_seconds{collector="dmi"} 0.000971785
>>>>>>>> node_scrape_collector_duration_seconds{collector="edac"} 0.006972343
>>>>>>>> node_scrape_collector_duration_seconds{collector="entropy"} 0.001360089
>>>>>>>> node_scrape_collector_duration_seconds{collector="fibrechannel"} 2.8256e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="filefd"} 0.000739988
>>>>>>>> node_scrape_collector_duration_seconds{collector="filesystem"} 0.00554684
>>>>>>>> node_scrape_collector_duration_seconds{collector="hwmon"} 0.014143617
>>>>>>>> node_scrape_collector_duration_seconds{collector="infiniband"} 1.3484e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="ipvs"} 7.5532e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="loadavg"} 0.004074291
>>>>>>>> node_scrape_collector_duration_seconds{collector="mdadm"} 0.000974966
>>>>>>>> node_scrape_collector_duration_seconds{collector="meminfo"} 0.004201816
>>>>>>>> node_scrape_collector_duration_seconds{collector="netclass"} 0.013852102
>>>>>>>> node_scrape_collector_duration_seconds{collector="netdev"} 0.006993921
>>>>>>>> node_scrape_collector_duration_seconds{collector="netstat"} 0.007896151
>>>>>>>> node_scrape_collector_duration_seconds{collector="nfs"} 0.000125062
>>>>>>>> node_scrape_collector_duration_seconds{collector="nfsd"} 3.6075e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="nvme"} 0.001064067
>>>>>>>> node_scrape_collector_duration_seconds{collector="os"} 0.005645435
>>>>>>>> node_scrape_collector_duration_seconds{collector="powersupplyclass"} 0.001394135
>>>>>>>> node_scrape_collector_duration_seconds{collector="pressure"} 0.001466664
>>>>>>>> node_scrape_collector_duration_seconds{collector="rapl"} 0.00226622
>>>>>>>> node_scrape_collector_duration_seconds{collector="schedstat"} 0.006677493
>>>>>>>> node_scrape_collector_duration_seconds{collector="sockstat"} 0.000970676
>>>>>>>> node_scrape_collector_duration_seconds{collector="softnet"} 0.002014497
>>>>>>>> node_scrape_collector_duration_seconds{collector="stat"} 0.004216999
>>>>>>>> node_scrape_collector_duration_seconds{collector="tapestats"} 1.0296e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="textfile"} 5.2573e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="thermal_zone"} 0.010936983
>>>>>>>> node_scrape_collector_duration_seconds{collector="time"} 0.00568072
>>>>>>>> node_scrape_collector_duration_seconds{collector="timex"} 3.3662e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="udp_queues"} 0.004138555
>>>>>>>> node_scrape_collector_duration_seconds{collector="uname"} 1.3713e-05
>>>>>>>> node_scrape_collector_duration_seconds{collector="vmstat"} 0.005691152
>>>>>>>> node_scrape_collector_duration_seconds{collector="xfs"} 0.008633677
>>>>>>>> node_scrape_collector_duration_seconds{collector="zfs"} 2.8179e-05
>>>>>>>> # HELP node_scrape_collector_success node_exporter: Whether a 
>>>>>>>> collector succeeded.
>>>>>>>> # TYPE node_scrape_collector_success gauge
>>>>>>>> node_scrape_collector_success{collector="arp"} 1
>>>>>>>> node_scrape_collector_success{collector="bcache"} 1
>>>>>>>> node_scrape_collector_success{collector="bonding"} 0
>>>>>>>> node_scrape_collector_success{collector="btrfs"} 1
>>>>>>>> node_scrape_collector_success{collector="conntrack"} 1
>>>>>>>> node_scrape_collector_success{collector="cpu"} 1
>>>>>>>> node_scrape_collector_success{collector="cpufreq"} 1
>>>>>>>> node_scrape_collector_success{collector="diskstats"} 1
>>>>>>>> node_scrape_collector_success{collector="dmi"} 1
>>>>>>>> node_scrape_collector_success{collector="edac"} 1
>>>>>>>> node_scrape_collector_success{collector="entropy"} 1
>>>>>>>> node_scrape_collector_success{collector="fibrechannel"} 0
>>>>>>>> node_scrape_collector_success{collector="filefd"} 1
>>>>>>>> node_scrape_collector_success{collector="filesystem"} 1
>>>>>>>> node_scrape_collector_success{collector="hwmon"} 1
>>>>>>>> node_scrape_collector_success{collector="infiniband"} 0
>>>>>>>> node_scrape_collector_success{collector="ipvs"} 0
>>>>>>>> node_scrape_collector_success{collector="loadavg"} 1
>>>>>>>> node_scrape_collector_success{collector="mdadm"} 1
>>>>>>>> node_scrape_collector_success{collector="meminfo"} 1
>>>>>>>> node_scrape_collector_success{collector="netclass"} 1
>>>>>>>> node_scrape_collector_success{collector="netdev"} 1
>>>>>>>> node_scrape_collector_success{collector="netstat"} 1
>>>>>>>> node_scrape_collector_success{collector="nfs"} 0
>>>>>>>> node_scrape_collector_success{collector="nfsd"} 0
>>>>>>>> node_scrape_collector_success{collector="nvme"} 0
>>>>>>>> node_scrape_collector_success{collector="os"} 1
>>>>>>>> node_scrape_collector_success{collector="powersupplyclass"} 1
>>>>>>>> node_scrape_collector_success{collector="pressure"} 0
>>>>>>>> node_scrape_collector_success{collector="rapl"} 1
>>>>>>>> node_scrape_collector_success{collector="schedstat"} 1
>>>>>>>> node_scrape_collector_success{collector="sockstat"} 1
>>>>>>>> node_scrape_collector_success{collector="softnet"} 1
>>>>>>>> node_scrape_collector_success{collector="stat"} 1
>>>>>>>> node_scrape_collector_success{collector="tapestats"} 0
>>>>>>>> node_scrape_collector_success{collector="textfile"} 1
>>>>>>>> node_scrape_collector_success{collector="thermal_zone"} 1
>>>>>>>> node_scrape_collector_success{collector="time"} 1
>>>>>>>> node_scrape_collector_success{collector="timex"} 1
>>>>>>>> node_scrape_collector_success{collector="udp_queues"} 1
>>>>>>>> node_scrape_collector_success{collector="uname"} 1
>>>>>>>> node_scrape_collector_success{collector="vmstat"} 1
>>>>>>>> node_scrape_collector_success{collector="xfs"} 1
>>>>>>>> node_scrape_collector_success{collector="zfs"} 0
>>>>>>>>
>>>>>>>> On Tuesday, January 18, 2022 at 1:12:04 PM UTC-5 Brian Candler 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Can you show the output of:
>>>>>>>>>
>>>>>>>>> curl -Ss localhost:9100/metrics | grep -i collector
>>>>>>>>>
>>>>>>>>> On Tuesday, 18 January 2022 at 14:33:25 UTC dyio...@gmail.com 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> [root@myhost1 ~]# ps auxwww | grep node_exporter
>>>>>>>>>> node_ex+ 4143664 12.5  0.0 725828 22668 ?        Ssl  09:29   0:06 /usr/local/bin/node_exporter --no-collector.wifi
>>>>>>>>>>
>>>>>>>>>> On Saturday, January 15, 2022 at 11:23:43 AM UTC-5 Brian Candler 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Friday, 14 January 2022 at 14:12:02 UTC dyio...@gmail.com 
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> @Brian Candler  I'm using the node_exporter defaults, as 
>>>>>>>>>>>> described here - https://github.com/prometheus/node_exporter.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Are you *really*?  Can you show the *exact* command line that 
>>>>>>>>>>> node_exporter is running with?  e.g.
>>>>>>>>>>>
>>>>>>>>>>> ps auxwww | grep node_exporter
>>>>>>>>>>>
>>>>>>>>>>
