Re: [Ganglia-developers] gmond python module

2018-03-16 Thread Dmitry Akselrod
Thanks Robin!   That's exactly what I needed.

On Tue, Mar 13, 2018 at 3:12 AM Robin Humble  wrote:

> Hi Dmitry,
>
> On Fri, Mar 09, 2018 at 08:11:08PM +, Dmitry Akselrod wrote:
> >2.  As I am collecting the metrics for the remote hosts on my utility
> >hosts, the Ganglia website will show my utility host as the node name for
> >all the metrics.   That all makes sense since gmetad is polling the gmond
> >on my utility host and the gmond on my utility host is storing the
> >metrics.   Is there a way to override the hostname for the specific metric
> >I am collecting via SNMP?   I would like the Ganglia cluster to have a node
> >for each of the appliances I am polling via my SNMP module with its metrics
> >assigned to it.   It seems like it should be theoretically possible since
> >gmond can aggregate metrics from multiple hosts.   I am just not sure how
> >to get to this programmatically.
>
> rather than use a gmond python module, you could probably accomplish
> what you want using an external python program that gathers up all your
> SNMP data and then spoofs it into ganglia using gmetric.py.
>   https://github.com/ganglia/ganglia_contrib/tree/master/gmetric-python
>
> the data will appear to be coming from another host even though you
> are inserting it all into ganglia from your utility host.
>   https://github.com/ganglia/monitor-core/wiki/Gmetric-Spoofing
>
> eg.
>   g = gmetric.Gmetric( gmondHost, gmondPort, gmondProtocol )
>   spoofStr = ip + ':' + host
>   g.send( name, '%.2f' % d, 'float', unit, 'both', 60, 0, "", spoofStr )
>
> I do this for a bunch of 'out of band' data like node temps, fans,
> infiniband traffic, filesystem traffic, etc.
>
> the only quirk in doing it this way is that if a host is down then this
> spoof'd data will make it appear like it's still up. but for pure 'fake'
> hosts like it sounds like you have, then that's probably what you want.
>
> cheers,
> robin
>


Re: [Ganglia-developers] gmond python module

2018-03-13 Thread Robin Humble
Hi Dmitry,

On Fri, Mar 09, 2018 at 08:11:08PM +, Dmitry Akselrod wrote:
>2.  As I am collecting the metrics for the remote hosts on my utility
>hosts, the Ganglia website will show my utility host as the node name for
>all the metrics.   That all makes sense since gmetad is polling the gmond
>on my utility host and the gmond on my utility host is storing the
>metrics.   Is there a way to override the hostname for the specific metric
>I am collecting via SNMP?   I would like the Ganglia cluster to have a node
>for each of the appliances I am polling via my SNMP module with its metrics
>assigned to it.   It seems like it should be theoretically possible since
>gmond can aggregate metrics from multiple hosts.   I am just not sure how
>to get to this programmatically.

rather than use a gmond python module, you could probably accomplish
what you want using an external python program that gathers up all your
SNMP data and then spoofs it into ganglia using gmetric.py.
  https://github.com/ganglia/ganglia_contrib/tree/master/gmetric-python

the data will appear to be coming from another host even though you
are inserting it all into ganglia from your utility host.
  https://github.com/ganglia/monitor-core/wiki/Gmetric-Spoofing

eg.
  g = gmetric.Gmetric( gmondHost, gmondPort, gmondProtocol )
  spoofStr = ip + ':' + host
  g.send( name, '%.2f' % d, 'float', unit, 'both', 60, 0, "", spoofStr )
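
a fuller standalone sketch of the same idea, purely illustrative -- the
appliance list, metric name and fetch_snmp_value() are made up; only the
gmetric.Gmetric()/send() calls follow the snippet above:

  #!/usr/bin/env python
  # hypothetical SNMP poller that spoofs each appliance into ganglia
  import time
  import gmetric

  GMOND_HOST, GMOND_PORT = 'utility-host', 8649
  GMOND_PROTO = 'udp'   # or 'multicast', whatever your gmond listens on

  # appliances polled via SNMP; names/IPs are examples only
  APPLIANCES = [('appliance01', '10.0.0.11'),
                ('appliance02', '10.0.0.12')]

  def fetch_snmp_value(ip):
      # placeholder for a real SNMP GET (pysnmp, net-snmp bindings, ...)
      return 42.0

  g = gmetric.Gmetric(GMOND_HOST, GMOND_PORT, GMOND_PROTO)
  while True:
      for host, ip in APPLIANCES:
          d = fetch_snmp_value(ip)
          spoofStr = ip + ':' + host   # "IP:hostname" of the spoofed node
          g.send('appliance_metric', '%.2f' % d, 'float', '', 'both',
                 60, 0, "", spoofStr)
      time.sleep(30)

run something like that from cron or as a small daemon on the utility
host and each appliance shows up as its own node in the cluster.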

I do this for a bunch of 'out of band' data like node temps, fans,
infiniband traffic, filesystem traffic, etc.

the only quirk in doing it this way is that if a host is down then this
spoof'd data will make it appear like it's still up. but for pure 'fake'
hosts like it sounds like you have, then that's probably what you want.

cheers,
robin



Re: [Ganglia-developers] gmond python module interface

2009-02-01 Thread David Stainton
Hey,

Yeah I agree third party daemons are not the way to go long term.
And yeah, if the python module interface could be improved
then it would likely be easier and more fun to use...

Yeah my proposal wasn't really well thought out...
The idea seemed more attractive when the daemon was threaded... then
it would maybe scale well.
I hear tell Python threads have some limitations...
but for this application it probably wouldn't matter.
I think the main limitation I hear about Python threads has to do with
the Global Interpreter Lock...
Only one thread at a time has access to Python objects...

Anyway I might continue to write some modules using the gmond interface...
But for a simple module I might sometimes use a gmetric script or
possibly my proposed daemon.

Eventually I want to look at more code and figure out how to use
Ganglia better...


~ David


On Sat, Jan 31, 2009 at 9:48 PM, Spike Spiegel  wrote:
> Hi,
>
> provided that I haven't had the time to look at this part of the code
> yet and that I agree it would be much nicer to have a gmetric-like
> behavior,
>
> On Sun, Feb 1, 2009 at 12:21 AM, David Stainton  wrote:
>
>> I like using gmetric to monitor... so I wrote gmetric-daemon which
>> is my attempt at a forking standalone daemon
>> which runs Python metric modules and calls gmetric for each metric...
>
> in a previous email you call upon a "most scalable, most correct and
> most reliable/highly available design", which is certainly a valuable
> goal that I don't see met by this proposal. A gmetric-daemon as far as
> I understand gmetric would defy caching and directives like threshold
> and timeout, which are very important at least as far as scalability
> goes. Furthermore, as long as there are built-in plugins with
> collection groups and so on, a third party daemon sounds like the wrong
> approach to me, so, as much easier as it might be at first, I'd believe
> that the "most scalable, most correct and most reliable design" is the
> one Brad proposes, with the caveat that figuring it all out will take
> more time.
>
>> I wanted a slightly different multithreaded approach to monitoring...
>> but it turns out
>> that Python threads really suck.
>
> care to share in which way python threads really suck?
>
>> So I made this a forking daemon.
>> One process per module. Not very memory efficient. But then I don't
>> expect to need many modules...
>
> *I* don't? what if somebody else does? what if you do tomorrow/at
> another job? I don't see how you'd fix something like that at a later
> stage without having to throw everything away. And how does this meet
> the "most scalable" design goal?
>
> Don't get me wrong, I'm sure everybody agrees on the problems and
> appreciates the effort, I'm merely pointing out that from my
> perspective this proposal doesn't meet the design goals and is
> unlikely to get traction upstream or in the HPC community, even though it
> might be just perfect for you and other people. And just in case, I've
> no affiliation with ganglia and these are my own opinions, maybe
> upstream folks have completely different thoughts.
>
> time and skills permitting I'd be happy to help out with improving the
> python interface especially since it's something we'd like to heavily
> leverage at work.
>
> thanks
>
> --
> "Behind every great man there's a great backpack" - B.
>



Re: [Ganglia-developers] gmond python module interface

2009-01-31 Thread Spike Spiegel
Hi,

provided that I haven't had the time to look at this part of the code
yet and that I agree it would be much nicer to have a gmetric-like
behavior,

On Sun, Feb 1, 2009 at 12:21 AM, David Stainton  wrote:

> I like using gmetric to monitor... so I wrote gmetric-daemon which
> is my attempt at a forking standalone daemon
> which runs Python metric modules and calls gmetric for each metric...

in a previous email you call upon a "most scalable, most correct and
most reliable/highly available design", which is certainly a valuable
goal that I don't see met by this proposal. A gmetric-daemon as far as
I understand gmetric would defy caching and directives like threshold
and timeout, which are very important at least as far as scalability
goes. Furthermore, as long as there are built-in plugins with
collection groups and so on, a third party daemon sounds like the wrong
approach to me, so, as much easier as it might be at first, I'd believe
that the "most scalable, most correct and most reliable design" is the
one Brad proposes, with the caveat that figuring it all out will take
more time.

> I wanted a slightly different multithreaded approach to monitoring...
> but it turns out
> that Python threads really suck.

care to share in which way python threads really suck?

> So I made this a forking daemon.
> One process per module. Not very memory efficient. But then I don't
> expect to need many modules...

*I* don't? what if somebody else does? what if you do tomorrow/at
another job? I don't see how you'd fix something like that at a later
stage without having to throw everything away. And how does this meet
the "most scalable" design goal?

Don't get me wrong, I'm sure everybody agrees on the problems and
appreciates the effort, I'm merely pointing out that from my
perspective this proposal doesn't meet the design goals and is
unlikely to get traction upstream or in the HPC community, even though it
might be just perfect for you and other people. And just in case, I've
no affiliation with ganglia and these are my own opinions, maybe
upstream folks have completely different thoughts.

time and skills permitting I'd be happy to help out with improving the
python interface especially since it's something we'd like to heavily
leverage at work.

thanks

-- 
"Behind every great man there's a great backpack" - B.



Re: [Ganglia-developers] gmond python module interface

2009-01-31 Thread David Stainton
Hello,

I just realized gmond is even better than I thought.
It's threaded? I wrote a python plugin with a callback
that repeatedly sleeps forever... But this didn't stop
the other plugin callbacks from running. Ganglia makes me happy...

I like using gmetric to monitor... so I wrote gmetric-daemon which
is my attempt at a forking standalone daemon
which runs Python metric modules and calls gmetric for each metric...
I'm going to try and use this framework to monitor the cluster at work.

I'm fairly new to Ganglia but I suspect some other users probably
wrote something like this... or maybe use crontabs.

I wanted a slightly different multithreaded approach to monitoring...
but it turns out that Python threads really suck. So I made this a
forking daemon. One process per module. Not very memory efficient.
But then I don't expect to need many modules...
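
The shape of it is roughly this -- a stripped-down sketch, not the actual
repo code; the module layout, collect() contract and INTERVAL are invented
for illustration, only gmetric's --name/--value/--type/--units flags are real:

  # toy sketch of the fork-per-module idea (not the real gmetric-daemon code)
  import os
  import time
  import subprocess

  MODULE_NAMES = ['mysql_stats', 'disk_stats']   # example module names

  def run_module(name):
      # assumed contract: module exposes collect() -> [(name, value, type, units)]
      # and an INTERVAL in seconds
      mod = __import__(name)
      while True:
          for metric, value, vtype, units in mod.collect():
              subprocess.call(['gmetric', '--name', metric, '--value', str(value),
                               '--type', vtype, '--units', units])
          time.sleep(mod.INTERVAL)

  for name in MODULE_NAMES:
      if os.fork() == 0:        # one child process per module
          run_module(name)
          os._exit(0)

  for _ in MODULE_NAMES:
      os.wait()                 # parent just sits on the children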

I like writing Python scripts that call gmetric... it's easy.
I'll soon be writing some more metric monitors for my work.

Here is the GitHub repo if anyone is interested:

http://github.com/david415/gmetric-daemon/tree/master

by the way... it's a rough draft at best right now.

-- 
Operations Engineer Spinn3r.com
Location: San Francisco, CA
YIM: mrdavids22
Work: http://spinn3r.com
Blog: http://david415.wordpress.com


On Wed, Jan 28, 2009 at 2:28 PM, David Stainton  wrote:
> Greetings,
>
>
> Gilad, if you are going to rewrite your mysql python module to use threads...
> you might want to think more about the race conditions.
> I'll use your very useful mysql module as an example of how the python
> module interface is fundamentally flawed by design.
> Multiple metrics are provided by a single blocking query to mysqld
> (e.g. SHOW INNODB STATUS) therefore the programmer of
> the python module should be in control of data collection scheduling.
>
> With the current module interface one might think to spawn a collector
> thread to run continuously and populate a cache...
> and have each callback function read an element asynchronously from the cache.
> This design is flawed because now there are two schedulers!
> The collector thread must schedule its data collection and the
> .pyconf file must also
> tell the gmond scheduler when to collect data from the cache.
> It's easy to see how it would be difficult or impossible to keep these
> two parallel schedulers aligned
> so that there are no race conditions preventing a consistent view of the data.
>
> I do not think there is a way to utilize the current python module
> interface to write modules which correctly handle all
> edge cases for many real world problems (e.g. monitoring databases
> with blocking queries which return multiple metrics worth of data).
>
> I agree, writing a Python script that calls gmetric is easier.
> But I think the situation is deceptive and I'm looking to make things
> scale well and be highly available/reliable.
> It is not just inconvenient to have to have a .pyconf per module...
> It's also a design flaw because it implies two parallel schedulers.
>
> I'd suggest rewriting the python module interface.
> The programmer utilizing this interface that I'm imagining
> could easily write python modules that correctly handle multiple
> metrics and blocking calls.
>
> The user of this API would write a single data collection function
> which returns a tuple of metrics (metric meta data and metric value)
> and scheduling info (e.g. the number of seconds later that this
> function should be called).
> Gmond would spawn a thread for each module.
> Each module thread runs the collector function supplied to it.
> When the collector function returns... the thread should somehow (??) update
> the metric data structures (like using gmetric). The collector
> function also returns
> scheduling information... for example how long the module thread should sleep
> before calling the collector function again. Collector functions would
> measure how long data collection takes
> and use that information to schedule the next data collection.
>
> At this time it seems easier to write a daemon which makes calls to
> gmetric and correctly handles multiple blocking
> calls which collect data for multiple metrics. I'd make sure that the
> daemon spawns a thread for each blocking collector (e.g. module).
>
> The messier equivalent would be to write smaller scripts. Each script
> has a blocking collector which then reports all the metrics
> via calls to gmetric. Each script is run in parallel; via cron or
> whatever parallel execution scheduler...
>
> Spawning python threads is obviously more memory efficient than
> forking many procs...
> But my point here is the equivalent scheduling.
>
> I'm going to write the threading daemon (in python)
> because it seems like the easiest, most scalable, most correct and
> most reliable/highly available design I can think of.
> It should make writing a module very easy and quick which is what it should be.
> We shouldn't have to think about threading and race conditi

Re: [Ganglia-developers] gmond python module interface

2009-01-28 Thread David Stainton
Greetings,


Gilad, if you are going to rewrite your mysql python module to use threads...
you might want to think more about the race conditions.
I'll use your very useful mysql module as an example of how the python
module interface is fundamentally flawed by design.
Multiple metrics are provided by a single blocking query to mysqld
(e.g. SHOW INNODB STATUS) therefore the programmer of
the python module should be in control of data collection scheduling.

With the current module interface one might think to spawn a collector
thread to run continuously and populate a cache...
and have each callback function read an element asynchronously from the cache.
This design is flawed because now there are two schedulers!
The collector thread must schedule its data collection and the
.pyconf file must also
tell the gmond scheduler when to collect data from the cache.
It's easy to see how it would be difficult or impossible to keep these
two parallel schedulers aligned
so that there are no race conditions preventing a consistent view of the data.

I do not think there is a way to utilize the current python module
interface to write modules which correctly handle all
edge cases for many real world problems (e.g. monitoring databases
with blocking queries which return multiple metrics worth of data).

I agree, writing a Python script that calls gmetric is easier.
But I think the situation is deceptive and I'm looking to make things
scale well and be highly available/reliable.
It is not just inconvenient to have to have a .pyconf per module...
It's also a design flaw because it implies two parallel schedulers.

I'd suggest rewriting the python module interface.
The programmer utilizing this interface that I'm imagining
could easily write python modules that correctly handle multiple
metrics and blocking calls.

The user of this API would write a single data collection function
which returns a tuple of metrics (metric meta data and metric value)
and scheduling info (e.g. the number of seconds later that this
function should be called).
Gmond would spawn a thread for each module.
Each module thread runs the collector function supplied to it.
When the collector function returns... the thread should somehow (??) update
the metric data structures (like using gmetric). The collector
function also returns
scheduling information... for example how long the module thread should sleep
before calling the collector function again. Collector functions would
measure how long data collection takes
and use that information to schedule the next data collection.
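
To make that concrete, a module under the interface I'm imagining might
look something like this (a completely hypothetical sketch -- collect(),
update_metric() and the stubbed mysql queries don't exist in gmond today):

  # hypothetical module under the proposed interface: one blocking collector
  # returning (metrics, seconds_until_next_run) instead of per-metric callbacks
  import time

  def run_blocking_queries():
      # stand-in for the real blocking mysql queries (SHOW INNODB STATUS, ...)
      return {'threads_connected': 12, 'rows_read': 34567}

  def collect():
      start = time.time()
      stats = run_blocking_queries()
      metrics = [
          # (name, value, type, units)
          ('mysql_threads_connected', stats['threads_connected'], 'uint32', 'threads'),
          ('mysql_innodb_rows_read',  stats['rows_read'],         'uint32', 'rows'),
      ]
      elapsed = time.time() - start
      return metrics, max(0, 60 - elapsed)   # account for how long collection took

  # and roughly what gmond's per-module thread would do with it:
  def update_metric(name, value, vtype, units):
      # in gmond this would update the internal metric tables; stubbed here
      print '%s = %s (%s, %s)' % (name, value, vtype, units)

  def module_thread(module):
      while True:
          metrics, delay = module.collect()
          for name, value, vtype, units in metrics:
              update_metric(name, value, vtype, units)
          time.sleep(delay)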

At this time it seems easier to write a daemon which makes calls to
gmetric and correctly handles multiple blocking
calls which collect data for multiple metrics. I'd make sure that the
daemon spawns a thread for each blocking collector (e.g. module).

The messier equivalent would be to write smaller scripts. Each script
has a blocking collector which then reports all the metrics
via calls to gmetric. Each script is run in parallel; via cron or
whatever parallel execution scheduler...

Spawning python threads is obviously more memory efficient than
forking many procs...
But my point here is the equivalent scheduling.

I'm going to write the threading daemon (in python)
because it seems like the easiest, most scalable, most correct and
most reliable/highly available design I can think of.
It should make writing a module very easy and quick which is what it should be.
We shouldn't have to think about threading and race conditions if all
I want is a simple module.
Of course it'd be cleaner if I didn't have to popen gmetric.
But I'd rather have my work be reliable and able to scale to many modules
and metrics.

Thoughts anyone?

Cheers,

David Stainton

--
Operations Engineer Spinn3r.com
Location: San Francisco, CA
YIM: mrdavids22
Work: http://spinn3r.com
Blog: http://david415.wordpress.com



Re: [Ganglia-developers] gmond python module interface

2009-01-28 Thread Brad Nicholes
>>> On 1/28/2009 at 7:59 AM, in message <20090128145933.ga19...@imperial.ac.uk>,
Kostas Georgiou  wrote:
> On Wed, Jan 28, 2009 at 06:09:48AM -0800, Gilad Raphaelli wrote:
> 
>> 
>> I think this is a well thought out email and I'm a little surprised at
>> the lack of response to it.  Is it because no one is actually using
>> the gmond python module interface and hasn't had to make these types
>> of decisions?  I don't have a single gmetric script/module that only
>> collects one metric and have used both the multi-threaded collector
>> approach and the single-thread hopeful/naive approach (having written
>> the mysql metric module you mention).  I agree that the multi-threaded
>> design seems less prone to problems but in practice I haven't had any
>> problems either way.  That being said, I will be trying some of your
>> suggestions or moving to threads for that mysql module when time
>> permits.
> 
> You could also modify gmond to run each plugin in a new thread, not sure
> if it is easy/possible though since I haven't looked at that part of the
> code yet.
> 
>> My frustration with the gmond python module interface is that it's not
>> actually a complete replacement for gmetric scripts as I use them.
>> Needing to know all of the metrics that a module will report before
>> runtime makes for a lot of upfront work creating .pyconf files and
>> doesn't allow for adding new metrics without restarting gmond.  Being
>> able to deploy one gmetric script that conditionally reports gmetrics
>> based on what's running/hardware installed/etc on a box is a big
>> advantage for a gmetric script over conditionally generating pyconf
>> files and then still having to conditionally collect metrics in the
>> actual gmond module.  I expect most users just stick with the gmetric
>> script in this case and handle scheduling themselves?
> 
> I agree here, my suggestion in the "Wildcard Configuration" thread was to add a
> metric_autoconf function in the plugin and get gmond to use that to get
> the collection_group from there. In any case for the modules that I
> develop I call this function from main so I can do python module.py -t >
> module.pyconf with a similar effect.
> 
> This doesn't solve all problems though, I would like to be able to just
> start mysql in a host or add a new disk for example and have metrics for
> them without touching the configuration at all. Maybe extending the
> configuration so in a .pyconf you can write something like
> collection_group {
>   autoconf = 300
> }
> to have gmond interrogate the plugin every 5 mins to get a new collection
> group will be enough, I need to think about it a bit more...
> 

The issue is more about the internal hash tables and arrays that are 
initialized at startup directly from the information that is found in the .conf 
files.  In order to make gmond recognize metrics without them being explicitly 
configured in the .conf file, gmond needs to be changed so that the internal 
tables can be allocated on the fly.  It can certainly be done, it is just a 
matter of taking the time to figure it all out and make the changes compatible 
with the current functionality.


Brad




Re: [Ganglia-developers] gmond python module interface

2009-01-28 Thread Kostas Georgiou
On Wed, Jan 28, 2009 at 06:09:48AM -0800, Gilad Raphaelli wrote:

> 
> I think this is a well thought out email and I'm a little surprised at
> the lack of response to it.  Is it because no one is actually using
> the gmond python module interface and hasn't had to make these types
> of decisions?  I don't have a single gmetric script/module that only
> collects one metric and have used both the multi-threaded collector
> approach and the single-thread hopeful/naive approach (having written
> the mysql metric module you mention).  I agree that the multi-threaded
> design seems less prone to problems but in practice I haven't had any
> problems either way.  That being said, I will be trying some of your
> suggestions or moving to threads for that mysql module when time
> permits.

You could also modify gmond to run each plugin in a new thread, not sure
if it is easy/possible though since I haven't looked at that part of the
code yet.

> My frustration with the gmond python module interface is that it's not
> actually a complete replacement for gmetric scripts as I use them.
> Needing to know all of the metrics that a module will report before
> runtime makes for a lot of upfront work creating .pyconf files and
> doesn't allow for adding new metrics without restarting gmond.  Being
> able to deploy one gmetric script that conditionally reports gmetrics
> based on what's running/hardware installed/etc on a box is a big
> advantage for a gmetric script over conditionally generating pyconf
> files and then still having to conditionally collect metrics in the
> actual gmond module.  I expect most users just stick with the gmetric
> script in this case and handle scheduling themselves?

I agree here, my suggestion in the "Wildcard Configuration" thread was to add a
metric_autoconf function in the plugin and get gmond to use that to get
the collection_group from there. In any case for the modules that I
develop I call this function from main so I can do python module.py -t >
module.pyconf with a similar effect.

This doesn't solve all problems though, I would like to be able to just
start mysql in a host or add a new disk for example and have metrics for
them without touching the configuration at all. Maybe extending the
configuration so in a .pyconf you can write something like
collection_group {
  autoconf = 300
}
to have gmond interrogate the plugin every 5 mins to get a new collection
group will be enough, I need to think about it a bit more...
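
For what it's worth, the plugin side could be as small as this sketch
(metric_autoconf and the disk discovery are made up for illustration; only
the "python module.py -t > module.pyconf" trick matches what I do now):

  # sketch: a plugin that can describe its own metrics at runtime
  import sys
  import glob

  def metric_autoconf():
      # discover whatever exists right now (disks used purely as an example)
      disks = [d.split('/')[-1] for d in glob.glob('/sys/block/sd*')]
      return ['disk_%s_read_bytes' % d for d in disks]

  if __name__ == '__main__' and '-t' in sys.argv:
      # emit a collection_group stanza so "python module.py -t > module.pyconf" works
      print 'collection_group {'
      print '  collect_every = 30'
      print '  time_threshold = 60'
      for name in metric_autoconf():
          print '  metric { name = "%s" }' % name
      print '}'

gmond itself would then call metric_autoconf() every "autoconf" seconds
and rebuild the collection group from whatever comes back.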

Kostas




Re: [Ganglia-developers] gmond python module interface

2009-01-28 Thread Gilad Raphaelli

I think this is a well thought out email and I'm a little surprised at the lack 
of response to it.  Is it because no one is actually using the gmond python 
module interface and hasn't had to make these types of decisions?  I don't have 
a single gmetric script/module that only collects one metric and have used both 
the multi-threaded collector approach and the single-thread hopeful/naive 
approach (having written the mysql metric module you mention).  I agree that 
the multi-threaded design seems less prone to problems but in practice I 
haven't had any problems either way.  That being said, I will be trying some of 
your suggestions or moving to threads for that mysql module when time permits.

My frustration with the gmond python module interface is that it's not actually 
a complete replacement for gmetric scripts as I use them.  Needing to know all 
of the metrics that a module will report before runtime makes for a lot of 
upfront work creating .pyconf files and doesn't allow for adding new metrics 
without restarting gmond.  Being able to deploy one gmetric script that 
conditionally reports gmetrics based on what's running/hardware installed/etc 
on a box is a big advantage for a gmetric script over conditionally generating 
pyconf files and then still having to conditionally collect metrics in the 
actual gmond module.  I expect most users just stick with the gmetric script in 
this case and handle scheduling themselves?

-g



- Original Message 
> From: David Stainton 
> To: ganglia-developers@lists.sourceforge.net
> Sent: Friday, January 23, 2009 11:43:32 AM
> Subject: [Ganglia-developers] gmond python module interface
> 
> Hi,
> 
> I've been thinking about the python module interface and how best to use it.
> Gmond uses a single thread that executes the callback function for
> every metric of every module
> in a scheduled fashion...
> This seems like a brittle design that won't scale for many metrics.
> If a developer writes a module that takes too long, that would prevent other
> metric callbacks from being called.
> I was thinking the design should use threads to prevent a bad module
> from DOSing the rest of the modules.
> Either that or an enforced timer...
> 
> I maintain a largish cluster of mysql databases so I wanted to use Ganglia
> to get mysql stats. I found a python module for doing this :
> 
> http://g.raphaelli.com/2009/1/5/ganglia-mysql-metrics
> 
> This module provides a lot of useful mysql metrics from the output of
> about 5 mysql queries.
> I briefly audited the code and found interesting things.
> The callback function first calls update_stats() before returning the
> relevant metric.
> Obviously we only want update_stats() to cache data and only perform
> mysql queries
> after all the metric callbacks have read a metric from the cache.
> However, I noticed this at the beginning of update_stats():
> 
>     if time.time() - last_update < 15:
>         return True
>     else:
>         last_update = time.time()
> 
> This design assumes that the gmond metric scheduler will schedule all
> mysql metric callbacks within 15 seconds of the first mysql metric
> callback.
> This is probably a safe assumption but it still bothers me ;-)
> 
> I think gmond is unlikely to take longer than 15 seconds to call all
> the mysql metric callbacks.
> But a mysql database could easily take longer than 15 seconds to
> return results for the 5 queries.
> This would cause the callback function to execute mysql queries for each 
> metric.
> So a quick fix would be to measure the time after collecting the data
> from the mysql queries...
> 
> Or the module could be improved by removing the time measurement and instead
> marking each metric item as they are read by the callback function.
> When all metrics are finally marked as being read then the callback
> function will compute the metrics again.
> Mysql could block on a "SHOW SLAVE STATUS" which would then break my design
> by preventing gmond from running other callbacks for the other modules...
> 
> Another approach is the python threading method used in tcpconn.py:
> 
> - spawn a worker thread that caches data for many metrics
>   - worker thread uses a lock when updating the metric cache
> - metric callback function acquires the lock to read metric values from cache
>   - callback function blocks if lock is already acquired
> 
> In this case there would only be 2 threads competing for the lock so I
> guess it doesn't matter
> that the python Lock object
> (http://docs.python.org/library/threading.html) has no defined
> fairness scheduling...
> 
> I sort of like this approach. The callbacks can return immediately
> because they read data that is cached by a worker thread.
> I guess a problem with this design might be that gmond would
> schedule metric callbacks out of sync with
> the worker thread collecting data.
> This could cause a race condition where metric callbacks might return
> old values while others return new values.
> 
> The pros

Re: [Ganglia-developers] gmond python module: multidisk.py

2007-07-17 Thread Brad Nicholes
>>> On 7/16/2007 at 7:54 PM, in message
<[EMAIL PROTECTED]>, "Bernard Li"
<[EMAIL PROTECTED]> wrote:
> Hi Brad:
> 
> On 7/16/07, Brad Nicholes <[EMAIL PROTECTED]> wrote:
> 
>> Slurpfile is probably doing the right thing by reporting that the file 
>> doesn't exist when it goes to read it (however it should probably state which
>> file it can't read).  In this case metric_init() should probably stat() the
>> file before it calls slurpfile() and then assign the default value if stat()
>> fails.
> 
> Okay slurpfile now reports the filename that gave the error:
> 
> http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=816 
> 
> However, I noticed that in metrics.c, some invocations of slurpfile in
> metric_init() also returns the filename, but some don't -- seems to be
> all over the place currently.
> 
> And the fix for libmetrics/linux/metrics.c is inline:
> 
> Index: libmetrics/linux/metrics.c
> ===
> --- libmetrics/linux/metrics.c  (revision 814)
> +++ libmetrics/linux/metrics.c  (working copy)
> @@ -164,13 +164,25 @@
>  {
> g_val_t rval;
> char * dummy;
> +   struct stat struct_stat;
> 
> num_cpustates = num_cpustates_func();
> 
> -   cpufreq = 1;
> -   rval.int32 = slurpfile("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", sys_devices_system_cpu, BUFFSIZE);
> -   if ( rval.int32 == SYNAPSE_FAILURE )
> -      cpufreq = 0;
> +   /* /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq is only available on newer kernels, so stat to make sure it exists */
> +   /* before slurping the file */
> +   cpufreq = 0;
> +   if ( ! (stat("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", &struct_stat)) )
> +      {
> +         rval.int32 = slurpfile("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", sys_devices_system_cpu, BUFFSIZE);
> +         if ( rval.int32 == SYNAPSE_FAILURE )
> +            {
> +               err_msg("metric_init() got an error from slurpfile()");
> +            }
> +         else
> +            {
> +               cpufreq = 1;
> +            }
> +      }
> 
>     rval.int32 = slurpfile("/proc/cpuinfo", proc_cpuinfo, BUFFSIZE);
>     if ( rval.int32 == SYNAPSE_FAILURE )
> 
> Any volunteers with a newer kernel can test this before I apply?
> 

I don't think that it should write out another error message.  Slurpfile will 
be writing out a message already stating that the file can't be found.  One of 
the purposes for reading this file is to simply determine if cpu_speed needs to 
be  adjusted in the function cpu_speed_func().  If the file doesn't exist, it 
isn't really an error, it just means that cpu_speed doesn't need to be 
adjusted, so writing out an error message would probably be confusing to the 
user.  Also, and somebody can correct me if I am wrong, I am not sure that this 
file depends on a newer kernel.  I think that it depends on whether the 
hardware supports this functionality or not as to whether the file actually 
exists.

BTW, the note attached to the patch in SVN 
(http://ganglia.svn.sourceforge.net/viewvc/ganglia/trunk/monitor-core/libmetrics/linux/metrics.c?revision=741&view=markup)
 states that one of the purposes of adjusting the cpu_speed is to compensate 
for the fact that each CPU can't be tracked independently.  Tracking each CPU 
independently now exists in Ganglia 3.1.x in the gmond/modules/multicpu.c 
module.  The built-in cpu_speed_func() would still need to make this
adjustment, but if the user would rather report each cpu speed rather than an 
adjusted average, that can now be done.

Brad




Re: [Ganglia-developers] gmond python module: multidisk.py

2007-07-16 Thread Bernard Li
Hi Brad:

On 7/16/07, Brad Nicholes <[EMAIL PROTECTED]> wrote:

> Slurpfile is probably doing the right thing by reporting that the file 
> doesn't exist when it goes to read it (however it should probably state which 
> file it can't read).  In this case metric_init() should probably stat() the 
> file before it calls slurpfile() and then assign the default value if stat() 
> fails.

Okay slurpfile now reports the filename that gave the error:

http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=816

However, I noticed that in metrics.c, some invocations of slurpfile in
metric_init() also returns the filename, but some don't -- seems to be
all over the place currently.

And the fix for libmetrics/linux/metrics.c is inline:

Index: libmetrics/linux/metrics.c
===
--- libmetrics/linux/metrics.c  (revision 814)
+++ libmetrics/linux/metrics.c  (working copy)
@@ -164,13 +164,25 @@
 {
g_val_t rval;
char * dummy;
+   struct stat struct_stat;

num_cpustates = num_cpustates_func();

-   cpufreq = 1;
-   rval.int32 = slurpfile("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", sys_devices_system_cpu, BUFFSIZE);
-   if ( rval.int32 == SYNAPSE_FAILURE )
-      cpufreq = 0;
+   /* /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq is only available on newer kernels, so stat to make sure it exists */
+   /* before slurping the file */
+   cpufreq = 0;
+   if ( ! (stat("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", &struct_stat)) )
+      {
+         rval.int32 = slurpfile("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", sys_devices_system_cpu, BUFFSIZE);
+         if ( rval.int32 == SYNAPSE_FAILURE )
+            {
+               err_msg("metric_init() got an error from slurpfile()");
+            }
+         else
+            {
+               cpufreq = 1;
+            }
+      }

    rval.int32 = slurpfile("/proc/cpuinfo", proc_cpuinfo, BUFFSIZE);
    if ( rval.int32 == SYNAPSE_FAILURE )

Any volunteers with a newer kernel can test this before I apply?

Thanks,

Bernard



Re: [Ganglia-developers] gmond python module: multidisk.py

2007-07-16 Thread Brad Nicholes
>>> On 7/16/2007 at 11:57 AM, in message
<[EMAIL PROTECTED]>, "Bernard Li"
<[EMAIL PROTECTED]> wrote:
> Hi Brad:
> 
> I don't have this file in CentOS 4.4 either.  Looks like it is only
> available in newer kernels.
> 
> Have a look at this bugzilla bug:
> 
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=114 
> 
> Shouldn't we check for the presence of this file before we slurpfile()
> it?  Or should we modify slurpfile() to be more accommodating?
> 

Slurpfile is probably doing the right thing by reporting that the file doesn't 
exist when it goes to read it (however it should probably state which file it 
can't read).  In this case metric_init() should probably stat() the file before 
it calls slurpfile() and then assign the default value if stat() fails.

Brad




Re: [Ganglia-developers] gmond python module: multidisk.py

2007-07-16 Thread Bernard Li
Hi Brad:

I don't have this file in CentOS 4.4 either.  Looks like it is only
available in newer kernels.

Have a look at this bugzilla bug:

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=114

Shouldn't we check for the presence of this file before we slurpfile()
it?  Or should we modify slurpfile() to be more accommodating?

Cheers,

Bernard

On 7/16/07, Brad Nicholes <[EMAIL PROTECTED]> wrote:
> >>> On Fri, Jul 13, 2007 at  7:01 PM, in message
> <[EMAIL PROTECTED]>, "Bernard Li"
> <[EMAIL PROTECTED]> wrote:
> > Brad:
> >
> > multidisk.py has a line:
> >
> > print 'Discovered device %s' % line[1]
> >
> > Is this for debugging purposes?
> >
> > Right now you will get output similar to the following when you start
> > up gmond with -m or -d options:
> >
> > Discovered device /
> > Discovered device /boot
> > Discovered device /dev/shm
> > slurpfile() open() error: No such file or directory
> >
> > I guess it's okay with -d, but -m?  It somewhat clutters the output.
> >
> > BTW, any ideas where the slurpfile error is coming from?
> >
>
> The slurpfile error is coming from the call to metric_init() in 
> libmetrics/linux/metrics.c which tries to read the file 
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq.  At least on SUSE or 
> SUSE on my hardware, this file doesn't exist.  From the way that the code is 
> written, it looks like it doesn't necessarily have to.
>
> Brad
>
>



Re: [Ganglia-developers] gmond python module: multidisk.py

2007-07-16 Thread Brad Nicholes
>>> On Fri, Jul 13, 2007 at  7:01 PM, in message
<[EMAIL PROTECTED]>, "Bernard Li"
<[EMAIL PROTECTED]> wrote: 
> Brad:
> 
> multidisk.py has a line:
> 
> print 'Discovered device %s' % line[1]
> 
> Is this for debugging purposes?
> 
> Right now you will get output similar to the following when you start
> up gmond with -m or -d options:
> 
> Discovered device /
> Discovered device /boot
> Discovered device /dev/shm
> slurpfile() open() error: No such file or directory
> 
> I guess it's okay with -d, but -m?  It somewhat clutters the output.
> 
> BTW, any ideas where the slurpfile error is coming from?
> 

The slurpfile error is coming from the call to metric_init() in 
libmetrics/linux/metrics.c which tries to read the file 
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq.  At least on SUSE or 
SUSE on my hardware, this file doesn't exist.  From the way that the code is 
written, it looks like it doesn't necessarily have to.

Brad




Re: [Ganglia-developers] gmond python module: multidisk.py

2007-07-13 Thread Brad Nicholes
>>> On 7/13/2007 at 7:01 PM, in message
<[EMAIL PROTECTED]>, "Bernard Li"
<[EMAIL PROTECTED]> wrote:
> Brad:
> 
> multidisk.py has a line:
> 
> print 'Discovered device %s' % line[1]
> 
> Is this for debugging purposes?
> 
> Right now you will get output similar to the following when you start
> up gmond with -m or -d options:
> 
> Discovered device /
> Discovered device /boot
> Discovered device /dev/shm
> slurpfile() open() error: No such file or directory
> 
> I guess it's okay with -d, but -m?  It somewhat clutters the output.
> 
> BTW, any ideas where the slurpfile error is coming from?
> 

The print statement could be removed.  It doesn't really serve any purpose in a 
production environment.  I'm not exactly sure where the slurpfile error is 
coming from.  I looked into it briefly once before but never really got it 
resolved.  I'll look into it again.

Brad

