Re: [Ganglia-general] Configuration problem after failover
Well, I should defer to folks who have actually configured this stuff in the last year or three, but IIRC, this could happen because the admin node gmond that is acting as the collector for all the compute nodes is reporting statistics to itself. Its other gmond is also doing that.

The configuration file for the "compute collector" gmond should be set up not to collect any statistics on the local node. IIRC, that may require a bit more work than just making it "mute"; e.g., you may have to remove any parts that identify what to collect on that machine.

OTOH, if you haven't restarted the main admin gmetad (or the compute collector gmond), then your hypothesis about stale RRD files might be right. If you don't care about what's been collected so far, you can try just removing the files for those nodes within that cluster's directory and restarting gmetad.

-- ReC

On Thu, Mar 26, 2015 at 4:11 AM, Loris Bennett wrote:

> Rick Cobb writes:
>
> > Generally what you do is have all the compute nodes send to a gmond
> > server on the administrative nodes, and then have gmetad poll that
> > gmond. You use a unicast setup on the compute node gmonds to do this
> > (IIRC, you can have them send to more than one for redundancy); you
> > may as well make them "deaf" when you do that.
> >
> > If you want the administrative nodes to appear in a separate cluster,
> > run more than one gmond on them. One listens on the compute gmond port
> > and is "mute", the other one listens on the administrative gmond port
> > (which you'll have to assign differently than 8648 and 8649) and is
> > "normal" (neither deaf nor mute).
> >
> > -- ReC
>
> Thanks for all the help. I have now got this working. The only thing
> that isn't quite correct is that one of the admin nodes is listed along
> with the compute nodes, although no data has been collected here
> (however, the data *is* collected in the admin node data source).
>
> So is there still something wrong with my configuration or is this just
> an artefact caused by a past misconfiguration, which has left an entry
> for the admin node in the RRD files for the compute nodes?
>
> Cheers,
>
> Loris

--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
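For reference, a "compute collector" gmond that does not report on its own host might look roughly like the gmond.conf fragment below. This is only a sketch under assumptions: the cluster name and port are borrowed from the gmetad.conf quoted elsewhere in this thread, and whether leaving out the collection_group sections is actually needed on top of "mute" is, as said above, from memory and untested.

```
/* gmond.conf fragment for the "compute collector" gmond on an admin node.
   Sketch only; cluster name and port are assumptions taken from the thread. */
globals {
  mute = yes   /* do not send metrics about this host */
  deaf = no    /* do listen for metrics sent by the compute nodes */
}

cluster {
  name = "Compute_Nodes"
}

/* Receive the unicast metric packets from the compute nodes. */
udp_recv_channel {
  port = 8649
}

/* Answer gmetad's polls with the collected XML. */
tcp_accept_channel {
  port = 8649
}

/* Deliberately no udp_send_channel and no collection_group sections,
   so that nothing about the admin node itself is gathered or sent. */
```

After changing the file, the collector gmond and gmetad both need a restart before the web front end can be expected to reflect the change.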
Re: [Ganglia-general] Configuration problem after failover
Rick Cobb writes:

> Generally what you do is have all the compute nodes send to a gmond
> server on the administrative nodes, and then have gmetad poll that
> gmond. You use a unicast setup on the compute node gmonds to do this
> (IIRC, you can have them send to more than one for redundancy); you
> may as well make them "deaf" when you do that.
>
> If you want the administrative nodes to appear in a separate cluster,
> run more than one gmond on them. One listens on the compute gmond port
> and is "mute", the other one listens on the administrative gmond port
> (which you'll have to assign differently than 8648 and 8649) and is
> "normal" (neither deaf nor mute).
>
> -- ReC

Thanks for all the help. I have now got this working. The only thing that isn't quite correct is that one of the admin nodes is listed along with the compute nodes, although no data has been collected here (however, the data *is* collected in the admin node data source).

So is there still something wrong with my configuration or is this just an artefact caused by a past misconfiguration, which has left an entry for the admin node in the RRD files for the compute nodes?

Cheers,

Loris

--
This signature is currently under construction.
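One way to check the stale-RRD hypothesis is to look at where gmetad keeps its files. The gmetad.conf fragment below shows the relevant setting with the path that is usually the default; the directory layout in the comments is illustrative, and the angle-bracket names are placeholders rather than values from this thread.

```
# gmetad.conf (fragment)
# Directory under which gmetad stores its RRD files. This path is the
# usual default; check the value in your own gmetad.conf.
rrd_rootdir "/var/lib/ganglia/rrds"

# Typical layout below that directory (placeholders, not real names):
#   <rrd_rootdir>/<cluster name>/<host name>/<metric>.rrd
#
# A directory for an admin node underneath the compute cluster's
# directory, with RRD files that are no longer being updated, would be
# consistent with a leftover entry from an earlier configuration.
```

Comparing the modification times of the files in that host directory with those of a healthy compute node should show whether anything is still writing to it.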
Re: [Ganglia-general] Configuration problem after failover
Generally what you do is have all the compute nodes send to a gmond server on the administrative nodes, and then have gmetad poll that gmond. You use a unicast setup on the compute node gmonds to do this (IIRC, you can have them send to more than one for redundancy); you may as well make them "deaf" when you do that.

If you want the administrative nodes to appear in a separate cluster, run more than one gmond on them. One listens on the compute gmond port and is "mute", the other one listens on the administrative gmond port (which you'll have to assign differently than 8648 and 8649) and is "normal" (neither deaf nor mute).

-- ReC

On Tue, Mar 24, 2015 at 2:10 AM, Loris Bennett wrote:

> > On 03/20/2015 10:23 AM, Loris Bennett wrote:
> >> Hi,
> >>
> >> I have the following in my gmetad.conf
> >>
> >> data_source "Admin_Nodes" 10 admin:8648
> >> data_source "Compute_Nodes" 10 admin:8649
> >>
> >> and when I look at the ports in use, I have
> >>
> >> $ netstat -plane | egrep 'gmon|gme'
> >> tcp   0   0 0.0.0.0:8651   0.0.0.0:*   LISTEN   493   256095111   62544/gmetad
> >> tcp   0   0 0.0.0.0:8652   0.0.0.0:*   LISTEN   493   256095112   62544/gmetad
> >> unix  2   [ ]   DGRAM   256095117   62544/gmetad
> >>
> >> Should I expect to see gmetad listening on ports 8648 and 8649 as well?
> >>
> >> Cheers,
> >>
> >> Loris
>
> Vladimir Vuksan writes:
>
> > No. Gmetad listens to two ports by default
> >
> > 8651 and 8652
> >
> > 8648 and 8649 are ports for the gmond which gmetad is polling.
>
> OK, I think I have a general problem with my setup. I have:
>
> - 3 admin nodes, which during normal operation are always up
> - 100 compute nodes, any or all of which could be powered down during
>   normal operation
>
> Setting up the data source for the admin nodes seems straightforward,
> as they are normally all up. However, how should it be defined for the
> compute nodes? I would like to do something like
>
> data_source "Compute_Nodes" 10 node*.test.cluster:8649
>
> but this produces the error:
>
> we failed to resolve data source name node*.test.cluster
>
> I could add one of the admin nodes to the cluster of compute nodes, but
> then it would no longer be able to send its own data to the cluster of
> admin nodes.
>
> Is there a standard way of dealing with this case?
>
> Cheers,
>
> Loris
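The sending side of the layout described above could be sketched as the following two gmond.conf fragments. These are assumptions for illustration, not tested configurations: the host name "admin" and the ports follow the gmetad.conf quoted in this thread, "admin2" is a hypothetical second collector, and the second fragment would live in its own file used by a separately started gmond on the admin node.

```
/* ---- gmond.conf on each compute node (sketch) ---- */
globals {
  deaf = yes   /* send only; ignore metrics from other nodes */
}

cluster {
  name = "Compute_Nodes"
}

/* Unicast to the collector gmond on the admin node. */
udp_send_channel {
  host = admin
  port = 8649
}

/* Optional second target for redundancy; "admin2" is hypothetical. */
udp_send_channel {
  host = admin2
  port = 8649
}

/* ---- separate conf file for the "normal" gmond on an admin node (sketch) ---- */
globals {
  deaf = no
  mute = no
}

cluster {
  name = "Admin_Nodes"
}

udp_send_channel {
  host = admin
  port = 8648
}

udp_recv_channel {
  port = 8648
}

tcp_accept_channel {
  port = 8648
}

/* plus the usual collection_group sections for this gmond */
```

With this arrangement gmetad never needs to contact a compute node directly, so it does not matter how many of them are powered down at any given time.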
Re: [Ganglia-general] Configuration problem after failover
> On 03/20/2015 10:23 AM, Loris Bennett wrote:
>> Hi,
>>
>> I have the following in my gmetad.conf
>>
>> data_source "Admin_Nodes" 10 admin:8648
>> data_source "Compute_Nodes" 10 admin:8649
>>
>> and when I look at the ports in use, I have
>>
>> $ netstat -plane | egrep 'gmon|gme'
>> tcp   0   0 0.0.0.0:8651   0.0.0.0:*   LISTEN   493   256095111   62544/gmetad
>> tcp   0   0 0.0.0.0:8652   0.0.0.0:*   LISTEN   493   256095112   62544/gmetad
>> unix  2   [ ]   DGRAM   256095117   62544/gmetad
>>
>> Should I expect to see gmetad listening on ports 8648 and 8649 as well?
>>
>> Cheers,
>>
>> Loris

Vladimir Vuksan writes:

> No. Gmetad listens to two ports by default
>
> 8651 and 8652
>
> 8648 and 8649 are ports for the gmond which gmetad is polling.

OK, I think I have a general problem with my setup. I have:

- 3 admin nodes, which during normal operation are always up
- 100 compute nodes, any or all of which could be powered down during normal operation

Setting up the data source for the admin nodes seems straightforward, as they are normally all up. However, how should it be defined for the compute nodes? I would like to do something like

data_source "Compute_Nodes" 10 node*.test.cluster:8649

but this produces the error:

we failed to resolve data source name node*.test.cluster

I could add one of the admin nodes to the cluster of compute nodes, but then it would no longer be able to send its own data to the cluster of admin nodes.

Is there a standard way of dealing with this case?

Cheers,

Loris

--
This signature is currently under construction.
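As background to the error above: a data_source line takes explicit host names or addresses, each of which gmetad tries to resolve literally, so a wildcard is rejected. Several hosts can be listed on one line as alternative sources for the same cluster. The fragment below is a sketch; node001 to node003 are hypothetical names used only for illustration.

```
# gmetad.conf (fragment)

# Rejected: the host part is resolved as a literal name.
# data_source "Compute_Nodes" 10 node*.test.cluster:8649

# Accepted: an explicit list. gmetad treats the entries as alternative
# sources for the same cluster and falls back to a later one if an
# earlier one does not answer.
data_source "Compute_Nodes" 10 node001.test.cluster:8649 node002.test.cluster:8649 node003.test.cluster:8649
```

Such a list only helps if at least one listed host is up and holds the metrics of the whole cluster, which is hard to guarantee when any or all compute nodes may be powered down.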
Re: [Ganglia-general] Configuration problem after failover
No. Gmetad listens to two ports by default

8651 and 8652

8648 and 8649 are ports for the gmond which gmetad is polling.

On 03/20/2015 10:23 AM, Loris Bennett wrote:
> Hi,
>
> I have the following in my gmetad.conf
>
> data_source "Admin_Nodes" 10 admin:8648
> data_source "Compute_Nodes" 10 admin:8649
>
> and when I look at the ports in use, I have
>
> $ netstat -plane | egrep 'gmon|gme'
> tcp   0   0 0.0.0.0:8651   0.0.0.0:*   LISTEN   493   256095111   62544/gmetad
> tcp   0   0 0.0.0.0:8652   0.0.0.0:*   LISTEN   493   256095112   62544/gmetad
> unix  2   [ ]   DGRAM   256095117   62544/gmetad
>
> Should I expect to see gmetad listening on ports 8648 and 8649 as well?
>
> Cheers,
>
> Loris
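The distinction between the two kinds of port can be made explicit in gmetad.conf. The fragment below is a sketch that combines the data_source lines from the original question with the two listening-port settings at their default values; nothing else about the poster's configuration is assumed.

```
# gmetad.conf (fragment)

# Ports gmetad itself listens on (defaults shown).
# xml_port serves the complete XML tree to any client that connects.
xml_port 8651
# interactive_port accepts simple queries for parts of that tree.
interactive_port 8652

# Ports gmetad connects *to*: the TCP port of each gmond it polls.
# These never show up as LISTEN sockets owned by gmetad.
data_source "Admin_Nodes" 10 admin:8648
data_source "Compute_Nodes" 10 admin:8649
```

On the host "admin", a netstat filtered for gmond should therefore show 8648 and 8649 in the LISTEN state, owned by the gmond processes.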
[Ganglia-general] Configuration problem after failover
Hi,

I have the following in my gmetad.conf

data_source "Admin_Nodes" 10 admin:8648
data_source "Compute_Nodes" 10 admin:8649

and when I look at the ports in use, I have

$ netstat -plane | egrep 'gmon|gme'
tcp   0   0 0.0.0.0:8651   0.0.0.0:*   LISTEN   493   256095111   62544/gmetad
tcp   0   0 0.0.0.0:8652   0.0.0.0:*   LISTEN   493   256095112   62544/gmetad
unix  2   [ ]   DGRAM   256095117   62544/gmetad

Should I expect to see gmetad listening on ports 8648 and 8649 as well?

Cheers,

Loris

--
This signature is currently under construction.