The question is: why are the systemd and process collectors still in that graph?
On Friday, 21 January 2022 at 00:27:14 UTC dyio...@gmail.com wrote:

> The attached is pprof output in text format, which may be easier to read.
>
> On Thursday, January 20, 2022 at 6:30:25 PM UTC-5 Dimitri Yioulos wrote:
>
>> I ran pprof (attached). I'll have to work on /proc/<pid>/stat (even with
>> the much appreciated reference :-) ).
>>
>> On Thursday, January 20, 2022 at 11:54:33 AM UTC-5 Brian Candler wrote:
>>
>>> So now go back to the original suggestion: run pprof with node_exporter
>>> running the way you *want* to be running it.
>>>
>>> > [root@myhost1 ~]# time for ((i=1;i<=1000;i++)); do node_exporter >/dev/null 2>&1; done
>>>
>>> That's meaningless. node_exporter is a daemon, not something you can run
>>> one-shot like that. If you remove the ">/dev/null 2>&1" you'll see lots
>>> of startup messages, probably ending with
>>>
>>> ts=2022-01-20T16:49:07.433Z caller=node_exporter.go:202 level=error err="listen tcp :9100: bind: address already in use"
>>>
>>> and then node_exporter terminating. So you're not seeing the CPU overhead
>>> of any node_exporter scrape jobs, only its startup overhead.
>>>
>>> If the system is idle apart from running node_exporter, then "top" will
>>> show you system time and CPU time. More accurately, find the process ID
>>> of node_exporter, then look in /proc/<pid>/stat:
>>>
>>> https://stackoverflow.com/questions/16726779/how-do-i-get-the-total-cpu-usage-of-an-application-from-proc-pid-stat
>>>
>>> On Thursday, 20 January 2022 at 12:33:06 UTC dyio...@gmail.com wrote:
>>>
>>>> Brian,
>>>>
>>>> Originally, I had not activated any additional collectors. Then I read
>>>> somewhere that I should add the systemd and process collectors. Still
>>>> learning here, so ... . That's why you saw them in the pprof graph. I
>>>> then circled back and removed them. However, high CPU usage has *always*
>>>> been an issue. That goes for every system on which I have node_exporter
>>>> running.
>>>> While a few are test machines, and I care a bit less about those, for
>>>> production machines it's an issue.
>>>>
>>>> Here's some time output for node_exporter, though I'm not good at
>>>> interpreting the results:
>>>>
>>>> [root@myhost1 ~]# time for ((i=1;i<=1000;i++)); do node_exporter >/dev/null 2>&1; done
>>>>
>>>> real 0m6.103s
>>>> user 0m3.658s
>>>> sys 0m3.151s
>>>>
>>>> So, if the above is a good way to measure node_exporter's user versus
>>>> system time, then they're about equal. If you have another means of
>>>> making such a measurement, I'd appreciate your sharing it. And once
>>>> that's determined, if system time versus user time is "out of whack",
>>>> how do I remediate?
>>>>
>>>> Many thanks.
>>>>
>>>> On Thursday, January 20, 2022 at 3:46:35 AM UTC-5 Brian Candler wrote:
>>>>
>>>>> So the systemd and process collectors aren't active. I wonder why they
>>>>> appeared in your pprof graph, then? Was it exactly the same binary you
>>>>> were running?
>>>>>
>>>>> 20% CPU usage from a once-every-five-seconds scrape implies that it
>>>>> should take about 1 CPU-second in total, but all the collectors seem
>>>>> very fast. The top five use between 0.01 and 0.02 seconds - and that's
>>>>> wall-clock time, not CPU time.
>>>>>
>>>>> node_scrape_collector_duration_seconds{collector="cpu"} 0.010873961
>>>>> node_scrape_collector_duration_seconds{collector="diskstats"} 0.01727642
>>>>> node_scrape_collector_duration_seconds{collector="hwmon"} 0.014143617
>>>>> node_scrape_collector_duration_seconds{collector="netclass"} 0.013852102
>>>>> node_scrape_collector_duration_seconds{collector="thermal_zone"} 0.010936983
>>>>>
>>>>> Something weird is going on. Next you might want to drill down into
>>>>> node_exporter's user versus system time. Is the usage mostly system
>>>>> time? That might point you somewhere, although the implication then is
>>>>> that the high CPU usage comes from some part of node_exporter outside
>>>>> the individual collectors.
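The user-versus-system split Brian suggests checking can be read straight out of /proc/<pid>/stat, per the Stack Overflow link above. A minimal sketch (assumes Linux; fields 14 and 15 of the stat line are utime and stime in clock ticks, and this simple field indexing only holds when the command name in field 2 contains no spaces, which is true for node_exporter; the script name and the default of measuring the current shell are illustrative only):

```shell
#!/bin/sh
# Print cumulative user vs. system CPU time for a process from /proc/<pid>/stat.
# Defaults to this shell's own PID ($$) so the example runs anywhere;
# in practice, pass node_exporter's PID as the first argument.
pid="${1:-$$}"
tck=$(getconf CLK_TCK)             # clock ticks per second, typically 100
set -- $(cat "/proc/$pid/stat")    # split the stat line into positional fields
utime=${14}                        # field 14: user-mode ticks
stime=${15}                        # field 15: kernel-mode ticks
echo "user: $((utime / tck))s  system: $((stime / tck))s"
```

Invoked as, say, `sh cputime.sh "$(pgrep -o node_exporter)"`, it gives the cumulative split since the process started. If system time dominates, the cost is likely kernel-side work (the syscalls behind reading /proc and /sys) rather than node_exporter's own Go code.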
>>>>> On Wednesday, 19 January 2022 at 23:27:40 UTC dyio...@gmail.com wrote:
>>>>>
>>>>>> [root@myhost1 ~]# curl -Ss localhost:9100/metrics | grep -i collector
>>>>>> # HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
>>>>>> # TYPE node_scrape_collector_duration_seconds gauge
>>>>>> node_scrape_collector_duration_seconds{collector="arp"} 0.002911805
>>>>>> node_scrape_collector_duration_seconds{collector="bcache"} 1.4571e-05
>>>>>> node_scrape_collector_duration_seconds{collector="bonding"} 0.000112308
>>>>>> node_scrape_collector_duration_seconds{collector="btrfs"} 0.001308192
>>>>>> node_scrape_collector_duration_seconds{collector="conntrack"} 0.002750716
>>>>>> node_scrape_collector_duration_seconds{collector="cpu"} 0.010873961
>>>>>> node_scrape_collector_duration_seconds{collector="cpufreq"} 0.008559194
>>>>>> node_scrape_collector_duration_seconds{collector="diskstats"} 0.01727642
>>>>>> node_scrape_collector_duration_seconds{collector="dmi"} 0.000971785
>>>>>> node_scrape_collector_duration_seconds{collector="edac"} 0.006972343
>>>>>> node_scrape_collector_duration_seconds{collector="entropy"} 0.001360089
>>>>>> node_scrape_collector_duration_seconds{collector="fibrechannel"} 2.8256e-05
>>>>>> node_scrape_collector_duration_seconds{collector="filefd"} 0.000739988
>>>>>> node_scrape_collector_duration_seconds{collector="filesystem"} 0.00554684
>>>>>> node_scrape_collector_duration_seconds{collector="hwmon"} 0.014143617
>>>>>> node_scrape_collector_duration_seconds{collector="infiniband"} 1.3484e-05
>>>>>> node_scrape_collector_duration_seconds{collector="ipvs"} 7.5532e-05
>>>>>> node_scrape_collector_duration_seconds{collector="loadavg"} 0.004074291
>>>>>> node_scrape_collector_duration_seconds{collector="mdadm"} 0.000974966
>>>>>> node_scrape_collector_duration_seconds{collector="meminfo"} 0.004201816
>>>>>> node_scrape_collector_duration_seconds{collector="netclass"} 0.013852102
>>>>>> node_scrape_collector_duration_seconds{collector="netdev"} 0.006993921
>>>>>> node_scrape_collector_duration_seconds{collector="netstat"} 0.007896151
>>>>>> node_scrape_collector_duration_seconds{collector="nfs"} 0.000125062
>>>>>> node_scrape_collector_duration_seconds{collector="nfsd"} 3.6075e-05
>>>>>> node_scrape_collector_duration_seconds{collector="nvme"} 0.001064067
>>>>>> node_scrape_collector_duration_seconds{collector="os"} 0.005645435
>>>>>> node_scrape_collector_duration_seconds{collector="powersupplyclass"} 0.001394135
>>>>>> node_scrape_collector_duration_seconds{collector="pressure"} 0.001466664
>>>>>> node_scrape_collector_duration_seconds{collector="rapl"} 0.00226622
>>>>>> node_scrape_collector_duration_seconds{collector="schedstat"} 0.006677493
>>>>>> node_scrape_collector_duration_seconds{collector="sockstat"} 0.000970676
>>>>>> node_scrape_collector_duration_seconds{collector="softnet"} 0.002014497
>>>>>> node_scrape_collector_duration_seconds{collector="stat"} 0.004216999
>>>>>> node_scrape_collector_duration_seconds{collector="tapestats"} 1.0296e-05
>>>>>> node_scrape_collector_duration_seconds{collector="textfile"} 5.2573e-05
>>>>>> node_scrape_collector_duration_seconds{collector="thermal_zone"} 0.010936983
>>>>>> node_scrape_collector_duration_seconds{collector="time"} 0.00568072
>>>>>> node_scrape_collector_duration_seconds{collector="timex"} 3.3662e-05
>>>>>> node_scrape_collector_duration_seconds{collector="udp_queues"} 0.004138555
>>>>>> node_scrape_collector_duration_seconds{collector="uname"} 1.3713e-05
>>>>>> node_scrape_collector_duration_seconds{collector="vmstat"} 0.005691152
>>>>>> node_scrape_collector_duration_seconds{collector="xfs"} 0.008633677
>>>>>> node_scrape_collector_duration_seconds{collector="zfs"} 2.8179e-05
>>>>>> # HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
>>>>>> # TYPE node_scrape_collector_success gauge
>>>>>> node_scrape_collector_success{collector="arp"} 1
>>>>>> node_scrape_collector_success{collector="bcache"} 1
>>>>>> node_scrape_collector_success{collector="bonding"} 0
>>>>>> node_scrape_collector_success{collector="btrfs"} 1
>>>>>> node_scrape_collector_success{collector="conntrack"} 1
>>>>>> node_scrape_collector_success{collector="cpu"} 1
>>>>>> node_scrape_collector_success{collector="cpufreq"} 1
>>>>>> node_scrape_collector_success{collector="diskstats"} 1
>>>>>> node_scrape_collector_success{collector="dmi"} 1
>>>>>> node_scrape_collector_success{collector="edac"} 1
>>>>>> node_scrape_collector_success{collector="entropy"} 1
>>>>>> node_scrape_collector_success{collector="fibrechannel"} 0
>>>>>> node_scrape_collector_success{collector="filefd"} 1
>>>>>> node_scrape_collector_success{collector="filesystem"} 1
>>>>>> node_scrape_collector_success{collector="hwmon"} 1
>>>>>> node_scrape_collector_success{collector="infiniband"} 0
>>>>>> node_scrape_collector_success{collector="ipvs"} 0
>>>>>> node_scrape_collector_success{collector="loadavg"} 1
>>>>>> node_scrape_collector_success{collector="mdadm"} 1
>>>>>> node_scrape_collector_success{collector="meminfo"} 1
>>>>>> node_scrape_collector_success{collector="netclass"} 1
>>>>>> node_scrape_collector_success{collector="netdev"} 1
>>>>>> node_scrape_collector_success{collector="netstat"} 1
>>>>>> node_scrape_collector_success{collector="nfs"} 0
>>>>>> node_scrape_collector_success{collector="nfsd"} 0
>>>>>> node_scrape_collector_success{collector="nvme"} 0
>>>>>> node_scrape_collector_success{collector="os"} 1
>>>>>> node_scrape_collector_success{collector="powersupplyclass"} 1
>>>>>> node_scrape_collector_success{collector="pressure"} 0
>>>>>> node_scrape_collector_success{collector="rapl"} 1
>>>>>> node_scrape_collector_success{collector="schedstat"} 1
>>>>>> node_scrape_collector_success{collector="sockstat"} 1
>>>>>> node_scrape_collector_success{collector="softnet"} 1
>>>>>> node_scrape_collector_success{collector="stat"} 1
>>>>>> node_scrape_collector_success{collector="tapestats"} 0
>>>>>> node_scrape_collector_success{collector="textfile"} 1
>>>>>> node_scrape_collector_success{collector="thermal_zone"} 1
>>>>>> node_scrape_collector_success{collector="time"} 1
>>>>>> node_scrape_collector_success{collector="timex"} 1
>>>>>> node_scrape_collector_success{collector="udp_queues"} 1
>>>>>> node_scrape_collector_success{collector="uname"} 1
>>>>>> node_scrape_collector_success{collector="vmstat"} 1
>>>>>> node_scrape_collector_success{collector="xfs"} 1
>>>>>> node_scrape_collector_success{collector="zfs"} 0
>>>>>>
>>>>>> On Tuesday, January 18, 2022 at 1:12:04 PM UTC-5 Brian Candler wrote:
>>>>>>
>>>>>>> Can you show the output of:
>>>>>>>
>>>>>>> curl -Ss localhost:9100/metrics | grep -i collector
>>>>>>>
>>>>>>> On Tuesday, 18 January 2022 at 14:33:25 UTC dyio...@gmail.com wrote:
>>>>>>>
>>>>>>>> [root@myhost1 ~]# ps auxwww | grep node_exporter
>>>>>>>> node_ex+ 4143664 12.5 0.0 725828 22668 ? Ssl 09:29 0:06 /usr/local/bin/node_exporter --no-collector.wifi
>>>>>>>>
>>>>>>>> On Saturday, January 15, 2022 at 11:23:43 AM UTC-5 Brian Candler wrote:
>>>>>>>>
>>>>>>>>> On Friday, 14 January 2022 at 14:12:02 UTC dyio...@gmail.com wrote:
>>>>>>>>>
>>>>>>>>>> @Brian Candler I'm using the node_exporter defaults, as described
>>>>>>>>>> here: https://github.com/prometheus/node_exporter.
>>>>>>>>>
>>>>>>>>> Are you *really*? Can you show the *exact* command line that
>>>>>>>>> node_exporter is running with? e.g.
>>>>>>>>>
>>>>>>>>> ps auxwww | grep node_exporter

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/4801c823-b980-404d-886a-f82565fdd974n%40googlegroups.com.
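One caveat on the /proc/<pid>/stat approach discussed above: the utime and stime counters are cumulative since process start, so the more telling number is the delta over a sampling window, ideally one matching the scrape interval. A rough sketch (assumes Linux; the awk field positions hold when the command name in field 2 has no spaces; it defaults to measuring the current shell so the example is runnable as-is, but in practice you would pass node_exporter's PID):

```shell
#!/bin/sh
# Show user vs. system CPU percentage for a process over a sampling window,
# by diffing utime/stime (fields 14/15 of /proc/<pid>/stat) between two reads.
pid="${1:-$$}"                    # in practice: node_exporter's PID
window="${2:-5}"                  # seconds; 5 matches the scrape interval discussed
tck=$(getconf CLK_TCK)            # clock ticks per second

snap() { awk '{print $14, $15}' "/proc/$pid/stat"; }

set -- $(snap); u0=$1; s0=$2
sleep "$window"
set -- $(snap); u1=$1; s1=$2

upct=$(( (u1 - u0) * 100 / (tck * window) ))
spct=$(( (s1 - s0) * 100 / (tck * window) ))
echo "over ${window}s: user=${upct}% system=${spct}%"
```

If the split is dominated by system time, that points at kernel-side work (the /proc and /sys reads the collectors perform) rather than user-space computation, which is the distinction Brian was asking about.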