Re: [Ganglia-general] Configuration problem after failover

2015-03-26 Thread Rick Cobb
Well, I should defer to folks who have actually configured this stuff in
the last year or three, but IIRC, this could happen because the admin node
gmond that is acting as the collector for all the compute nodes is
reporting statistics to itself.  Its other gmond is also doing that.

The configuration file for the "compute collector" gmond should be set up
not to collect any statistics on the local node. IIRC, that may require a
bit more work than just making it "mute"; e.g., you may have to remove any
parts that identify what to collect on that machine.
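
A hedged sketch of what that could look like in the collector's
gmond.conf, assuming Ganglia 3.x syntax. The metric shown is only an
illustrative entry of the kind found in a stock configuration, and
whether "mute" alone is sufficient is not confirmed here:

```
/* Collector gmond for the compute cluster (fragment, not a full file). */
globals {
  mute = yes   /* do not send metrics of its own */
  deaf = no    /* still receive metrics from the compute nodes */
}

/* Blocks like the following define what is collected locally; the
   suggestion above amounts to leaving all of them out of this file.

collection_group {
  collect_every = 20
  time_threshold = 90
  metric {
    name = "load_one"
  }
}
*/
```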

OTOH, if you haven't restarted the main admin gmetad (or the compute
collector gmond), then your hypothesis about stale RRD files might be
right.  If you don't care about what's been collected so far, you can try
just removing the files for those nodes within that cluster's directory and
restarting gmetad.

-- ReC

On Thu, Mar 26, 2015 at 4:11 AM, Loris Bennett wrote:

> Rick Cobb writes:
>
> > Generally what you do is have all the compute nodes send to a gmond
> > server on the administrative nodes, and then have gmetad poll that
> > gmond. You use a unicast setup on the compute node gmonds to do this
> > (IIRC, you can have them send to more than one for redundancy); you
> > may as well make them "deaf" when you do that.
> >
> > If you want the administrative nodes to appear in a separate cluster,
> > run more than one gmond on them. One listens on the compute gmond port
> > and is "mute", the other one listens on the administrative gmond port
> > (which you'll have to assign differently than 8648 and 8649) and is
> > "normal" (neither deaf nor mute).
> >
> > -- ReC
>
> Thanks for all the help.  I have now got this working.  The only thing
> that isn't quite correct is that one of the admin nodes is listed along
> with the compute nodes, although no data has been collected here
> (however, the data *is* collected in the admin node data source).
>
> So is there still something wrong with my configuration or is this just
> an artefact caused by a past misconfiguration, which has left an entry
> for the admin node in the RRD files for the compute nodes?
>
> Cheers,
>
> Loris
>
> --
> This signature is currently under construction.
--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


Re: [Ganglia-general] Configuration problem after failover

2015-03-26 Thread Loris Bennett
Rick Cobb writes:

> Generally what you do is have all the compute nodes send to a gmond
> server on the administrative nodes, and then have gmetad poll that
> gmond. You use a unicast setup on the compute node gmonds to do this
> (IIRC, you can have them send to more than one for redundancy); you
> may as well make them "deaf" when you do that.
>
> If you want the administrative nodes to appear in a separate cluster,
> run more than one gmond on them. One listens on the compute gmond port
> and is "mute", the other one listens on the administrative gmond port
> (which you'll have to assign differently than 8648 and 8649) and is
> "normal" (neither deaf nor mute).
>
> -- ReC

Thanks for all the help.  I have now got this working.  The only thing
that isn't quite correct is that one of the admin nodes is listed along
with the compute nodes, although no data has been collected here
(however, the data *is* collected in the admin node data source).

So is there still something wrong with my configuration or is this just
an artefact caused by a past misconfiguration, which has left an entry
for the admin node in the RRD files for the compute nodes?

Cheers,

Loris

-- 
This signature is currently under construction.




Re: [Ganglia-general] Configuration problem after failover

2015-03-24 Thread Rick Cobb
Generally what you do is have all the compute nodes send to a gmond server
on the administrative nodes, and then have gmetad poll that gmond.  You use
a unicast setup on the compute node gmonds to do this (IIRC, you can have
them send to more than one for redundancy); you may as well make them
"deaf" when you do that.
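
As a rough illustration, a compute node's gmond.conf might contain
something like the following fragment. The host names admin1 and admin2
are assumptions for the example; the cluster name and port 8649 are
taken from the gmetad.conf quoted in this thread.

```
globals {
  deaf = yes   /* send only; ignore metrics from other nodes */
  mute = no
}

cluster {
  name = "Compute_Nodes"
}

/* Unicast to the collector gmond on an admin node. */
udp_send_channel {
  host = admin1
  port = 8649
}

/* Optional second target for redundancy. */
udp_send_channel {
  host = admin2
  port = 8649
}
```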

If you want the administrative nodes to appear in a separate cluster, run
more than one gmond on them. One listens on the compute gmond port and is
"mute", the other one listens on the administrative gmond port (which
you'll have to assign differently than 8648 and 8649) and is "normal"
(neither deaf nor mute).
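
Sketched as two gmond.conf fragments for one admin node, this might look
as follows. The admin port 8650 is an arbitrary placeholder, the file
names and the host name admin1 are assumptions, and the cluster names
are borrowed from the gmetad.conf quoted in this thread.

```
/* gmond-compute.conf: collector for the compute cluster, "mute" */
globals {
  deaf = no
  mute = yes
}
cluster {
  name = "Compute_Nodes"
}
udp_recv_channel {
  port = 8649      /* compute nodes send here */
}
tcp_accept_channel {
  port = 8649      /* gmetad polls here */
}

/* gmond-admin.conf: "normal" gmond for the admin cluster */
globals {
  deaf = no
  mute = no
}
cluster {
  name = "Admin_Nodes"
}
udp_send_channel {
  host = admin1
  port = 8650
}
udp_recv_channel {
  port = 8650
}
tcp_accept_channel {
  port = 8650
}
```

Each gmond would then be started with its own file (gmond's -c option
selects the configuration file), and the admin data_source in
gmetad.conf would point at whichever port was chosen for the second
instance.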

-- ReC

On Tue, Mar 24, 2015 at 2:10 AM, Loris Bennett wrote:

> > On 03/20/2015 10:23 AM, Loris Bennett wrote:
> >> Hi,
> >>
> >> I have the following in my gmetad.conf
> >>
> >> data_source "Admin_Nodes" 10 admin:8648
> >> data_source "Compute_Nodes" 10 admin:8649
> >>
> >> and when I look at the ports in use, I have
> >>
> >> $ netstat -plane | egrep 'gmon|gme'
> >> tcp   0  0  0.0.0.0:8651  0.0.0.0:*  LISTEN  493  256095111  62544/gmetad
> >> tcp   0  0  0.0.0.0:8652  0.0.0.0:*  LISTEN  493  256095112  62544/gmetad
> >> unix  2  [ ]  DGRAM  256095117  62544/gmetad
> >>
> >> Should I expect to see gmetad listening on ports 8648 and 8649 as well?
> >>
> >> Cheers,
> >>
> >> Loris
> >>
> Vladimir Vuksan writes:
>
> > No. Gmetad listens to two ports by default
> >
> > 8651 and 8652
> >
> > 8648 and 8649 are ports for the gmond which gmetad is polling.
> >
>
> OK, I think I have a general problem with my setup.  I have:
>
> - 3 admin nodes, which during normal operation are always up
> - 100 compute nodes, any or all of which could be powered down during
>   normal operation
>
> Setting up the data source for the admin nodes seems straightforward,
> as they are normally all up.  However, how should it be defined for the
> compute nodes?  I would like to do something like
>
>   data_source "Compute_Nodes" 10 node*.test.cluster:8649
>
> but this produces the error:
>
>   we failed to resolve data source name node*.test.cluster
>
> I could add one of the admin nodes to the cluster of compute nodes, but
> then it would no longer be able to send its own data to the cluster of
> admin nodes.
>
> Is there a standard way of dealing with this case?
>
> Cheers,
>
> Loris
>
> --
> This signature is currently under construction.


Re: [Ganglia-general] Configuration problem after failover

2015-03-24 Thread Loris Bennett
> On 03/20/2015 10:23 AM, Loris Bennett wrote:
>> Hi,
>>
>> I have the following in my gmetad.conf
>>
>> data_source "Admin_Nodes" 10 admin:8648
>> data_source "Compute_Nodes" 10 admin:8649
>>
>> and when I look at the ports in use, I have
>>
>> $ netstat -plane | egrep 'gmon|gme'
>> tcp   0  0  0.0.0.0:8651  0.0.0.0:*  LISTEN  493  256095111  62544/gmetad
>> tcp   0  0  0.0.0.0:8652  0.0.0.0:*  LISTEN  493  256095112  62544/gmetad
>> unix  2  [ ]  DGRAM  256095117  62544/gmetad
>>
>> Should I expect to see gmetad listening on ports 8648 and 8649 as well?
>>
>> Cheers,
>>
>> Loris
>>
Vladimir Vuksan writes:

> No. Gmetad listens to two ports by default
>
> 8651 and 8652
>
> 8648 and 8649 are ports for the gmond which gmetad is polling.
>

OK, I think I have a general problem with my setup.  I have:

- 3 admin nodes, which during normal operation are always up
- 100 compute nodes, any or all of which could be powered down during
  normal operation

Setting up the data source for the admin nodes seems straightforward,
as they are normally all up.  However, how should it be defined for the
compute nodes?  I would like to do something like

  data_source "Compute_Nodes" 10 node*.test.cluster:8649

but this produces the error:

  we failed to resolve data source name node*.test.cluster

I could add one of the admin nodes to the cluster of compute nodes, but
then it would no longer be able to send its own data to the cluster of
admin nodes.

Is there a standard way of dealing with this case?

Cheers,

Loris

-- 
This signature is currently under construction.




Re: [Ganglia-general] Configuration problem after failover

2015-03-20 Thread Vladimir Vuksan
No. Gmetad listens to two ports by default

8651 and 8652

8648 and 8649 are ports for the gmond which gmetad is polling.
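
For reference, the corresponding gmetad.conf settings might look like
this sketch. The data_source lines are the ones from the original
question; the two port directives are shown with the default values
mentioned above.

```
# gmond instances that gmetad polls (gmetad connects out to these)
data_source "Admin_Nodes" 10 admin:8648
data_source "Compute_Nodes" 10 admin:8649

# ports gmetad itself listens on
xml_port 8651
interactive_port 8652
```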


On 03/20/2015 10:23 AM, Loris Bennett wrote:
> Hi,
>
> I have the following in my gmetad.conf
>
> data_source "Admin_Nodes" 10 admin:8648
> data_source "Compute_Nodes" 10 admin:8649
>
> and when I look at the ports in use, I have
>
> $ netstat -plane | egrep 'gmon|gme'
> tcp   0  0  0.0.0.0:8651  0.0.0.0:*  LISTEN  493  256095111  62544/gmetad
> tcp   0  0  0.0.0.0:8652  0.0.0.0:*  LISTEN  493  256095112  62544/gmetad
> unix  2  [ ]  DGRAM  256095117  62544/gmetad
>
> Should I expect to see gmetad listening on ports 8648 and 8649 as well?
>
> Cheers,
>
> Loris
>




[Ganglia-general] Configuration problem after failover

2015-03-20 Thread Loris Bennett
Hi,

I have the following in my gmetad.conf

data_source "Admin_Nodes" 10 admin:8648
data_source "Compute_Nodes" 10 admin:8649

and when I look at the ports in use, I have

$ netstat -plane | egrep 'gmon|gme'
tcp   0  0  0.0.0.0:8651  0.0.0.0:*  LISTEN  493  256095111  62544/gmetad
tcp   0  0  0.0.0.0:8652  0.0.0.0:*  LISTEN  493  256095112  62544/gmetad
unix  2  [ ]  DGRAM  256095117  62544/gmetad

Should I expect to see gmetad listening on ports 8648 and 8649 as well?

Cheers,

Loris

-- 
This signature is currently under construction.

