Hi,

On 7/21/20 10:01 PM, Lakshman Savadamuthu wrote:
> just FYI, there are few other hosts in this cluster, where node_exporter
> is running just fine without any issues.
> We have started the process using systemctl command, here is the service
> file:
>
> # cat /etc/systemd/system/node_exporter.service
>
> [Unit]
> Description=Node Exporter
>
> [Service]
> User=prometheus
> ExecStart=/usr/local/bin/node_exporter --collector.filesystem
> --collector.netdev --collector.cpu --collector.diskstats
> --collector.mdadm --collector.loadavg --collector.time --collector.uname
> --collector.logind
> --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
> --collector.systemd
>
> [Install]
> WantedBy=default.target
>
> [

^^^ This looks truncated somehow?

> Also here is the stack trace:
>
> [root@mesosagent13 ~]# cat /proc/53547/stack
> [<ffffffffb8c9e04b>] do_exit+0x6bb/0xa40
> [<ffffffffb8c9e44f>] do_group_exit+0x3f/0xa0
> [<ffffffffb8caf24e>] get_signal_to_deliver+0x1ce/0x5e0
> [<ffffffffb8c2b527>] do_signal+0x57/0x6f0
> [<ffffffffb8c2bc32>] do_notify_resume+0x72/0xc0
> [<ffffffffb9375124>] int_signal+0x12/0x17
> [<ffffffffffffffff>] 0xffffffffffffffff
> [root@mesosagent13 ~]#

Sounds like a classic zombie process. This means the parent (i.e. systemd) is expected to clean it up. Not sure how this can happen with systemd. Maybe try restarting it (systemctl daemon-reexec).

Besides that, I suggest continuing the other tests, such as running node_exporter without systemd and with an increased debug level. This should be possible despite the zombie process.

Kind regards,
Christian
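
P.S. To confirm that the process really is a zombie and to see which parent is supposed to reap it, something like the following might help. It is only a rough sketch: it assumes a Linux /proc filesystem, and the names parse_status and is_zombie are made up for illustration. It reads the State and PPid fields from /proc/<pid>/status.

```python
import sys


def parse_status(text):
    """Return (state, ppid) parsed from the contents of /proc/<pid>/status."""
    state, ppid = None, None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "State":
            state = value.strip()        # for example "Z (zombie)"
        elif key == "PPid":
            ppid = int(value.strip())    # PID of the parent that must reap it
    return state, ppid


def is_zombie(state):
    """A state string beginning with 'Z' marks a zombie (defunct) process."""
    return state is not None and state.startswith("Z")


def main(argv):
    if len(argv) != 2:
        print("usage: zombie_check.py PID")   # hypothetical script name
        return 2
    pid = argv[1]
    with open(f"/proc/{pid}/status") as fh:
        state, ppid = parse_status(fh.read())
    print(f"pid={pid} state={state} ppid={ppid} zombie={is_zombie(state)}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run it with the PID from the stack trace (53547 in this thread). If the state starts with Z and the reported parent is PID 1, then systemd is the process that has not collected the exit status yet.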

