Re: [Ganglia-developers] [Ganglia-general] [ANNOUNCEMENT] Ganglia meetup Tue Oct 21 in San Francisco (Quantcast HQ)

2014-10-21 Thread Carlo Marcelo Arenas Belon
Also forgot to mention we have a code for FREE parking thanks to 
our friends at zirx[1], so if you are driving to SF for this 
meeting (like I am planning to do) all you really need to do is install 
their app, pick the right address for Quantcast on the map and hit
the price to enter your code: GANGLIA so that someone will be waiting 
for you at the door to take your car to a safe place.

see you all on the other side, and let's have fun

Carlo

[1] http://zirx.com/

PS. you need an iphone or android phone to use their app though

--
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] moving mod_multicpu out of ganglia to ganglia-modules-linux

2012-05-14 Thread Carlo Marcelo Arenas Belon
On Mon, May 14, 2012 at 01:17:19PM +0200, Daniel Pocock wrote:
 
 The mod_multicpu code in the main ganglia repo is Linux-only, while most
 of the other modules are cross-platform

I think it might also work for cygwin but haven't really tried lately, if
that is the case though it will remove this functionality from cygwin for 
no big gain IMHO.

Most of the python modules are linux specific though, so would guess your 
comment was about native modules instead.

 The version in ganglia-modules-linux is based on the same code, with
 some small enhancements (using arrays instead of string comparisons)

instead of having a forked version, why not make multi-cpu portable instead?
and if you think your linux version is better, why not import it instead?

having a mechanism to identify which OS is supported by each module was 
something that was missing in the modular architecture from the start 
(since it was modeled after apache, which doesn't have that requirement) and 
adding this functionality instead of hacking around the lack of it would 
be IMHO a better option, even though that would most likely require a 
binary incompatible change and therefore a different (at least minor) 
version of ganglia, which seems to be something we are fond of now anyway 
considering I've seen some code released as 3.4 already.

Carlo



Re: [Ganglia-developers] new script for automating releases

2012-04-21 Thread Carlo Marcelo Arenas Belon
On Fri, Apr 20, 2012 at 01:54:27AM +0200, Daniel Pocock wrote:
 
 I've generalised it for building just about anything that works with
 git/autotools and published it here:
 
   https://sourceforge.net/p/git2dist/code/?branch=ref%2Fmaster
 
 It's had two test runs today, flactag-2.0.1 and ganglia-3.3.7

there is either a bug in it, or a misunderstanding of ganglia's agreed 
workflow [1], if this is what happens after a remote update :

  $ git branch
  * master
  $ git describe --tags
  3.3.5-20-gb98de1e

if you are going to suggest that I should be looking at the release/3.3
branch for the current HEAD of development or that this would be fixed by 
merging that back to master sometime later (which implies 3.3 will be
abruptly EOL sometime in the future) then that should be spelled 
out and documented clearly before we find ourselves with a ganglia fork
on our hands or an even bigger repository mess with duplicated commits 
or even worse, lost bug fixes.

[1] http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works



Re: [Ganglia-developers] 3.3.5 released today

2012-04-13 Thread Carlo Marcelo Arenas Belon
On Thu, Apr 05, 2012 at 12:52:21AM +0200, Daniel Pocock wrote:
 
 A number of bugs were found during the testing of 3.3.5 and discussed on
 the mailing lists.

could a list of these bugs be published somewhere with the release,
so that everyone knows what to expect when upgrading? (most people are 
probably still using a patched 3.1.7, as that is what is provided by most 
distributions)

off the top of my head there are :

* 2 memory leaks (one probably only in deaf mode)
* gmetad hierarchical mode is broken

  In other words, anyone who is using 3.3.1 or 3.3.0 should not get any
 new bugs from upgrading to 3.3.5

considering that 3.3.5 doubled the in memory size of each metric, it is 
likely to make the memory leaking problems worse though

Carlo



Re: [Ganglia-developers] bootstrap / autotools (renamed)

2012-04-04 Thread Carlo Marcelo Arenas Belon
On Wed, Apr 04, 2012 at 06:51:27PM +0200, Daniel Pocock wrote:
 
 On 04/04/12 18:35, Dave Rawks wrote:
 
  be built only with specific versions from Debian current stable
  (squeeze) especially when the goal of the maintainers appears to be
  inclusion into testing/new-stable. Personally I would like for the
  entire tarball to have a nice sensible autotools based build install
  process ./configure && make && make install that can be quickly
  packaged up with a super minimal amount of distro specific munging.
 
 It does do that: someone who downloads the tarball should NOT run the
 bootstrap script (and should not have to run it).

and that is why, until recently, it wasn't included in the tarball, so no one 
would get confused.

Carlo



Re: [Ganglia-developers] autotools / git release branches / web version.php

2012-03-27 Thread Carlo Marcelo Arenas Belon
On Mon, Mar 26, 2012 at 04:57:31PM +0100, Daniel Pocock wrote:
 
 Having the option to work both ways may just continue to create traps 
 for people who know one half of the project and not so much about the other.

ironically, the main driver for the 3.3.x series was to import the new web 
frontend, and it used to (mostly) work for the first 2 releases of that 
series while keeping the possibility of building independently.

it would seem IMHO that all the extra hacking that was done to the build 
and release process, including the (mostly ignored) documentation, hasn't 
improved its reliability or clarity, as shown by the fact that the last 
package release just masquerades the obsolete version 3.3.1 as 3.3.5 for 
web.

Carlo



Re: [Ganglia-developers] release/3.3 branch created

2012-03-27 Thread Carlo Marcelo Arenas Belon
On Tue, Mar 27, 2012 at 10:46:51AM -0400, Vladimir Vuksan wrote:
 Therefore I'd like to dump branches for now and just stay on mainline.

+1, keep it simple; and unless my view of git log is incorrect no features 
(only some bugfixes) were added since 3.3.2 anyway, which might be the reason 
why the last release notes available (which already have a due date that is 
1 month old) haven't been updated : 

  https://github.com/ganglia/monitor-core/wiki/Release-Notes

Carlo



Re: [Ganglia-developers] 3.3.5 tagged

2012-03-27 Thread Carlo Marcelo Arenas Belon
On Mon, Mar 26, 2012 at 04:50:18PM +0100, Daniel Pocock wrote:
 
 Release 3.3.5
 
 The release has now been tagged in git
 commit = 9db9beea062c7ce5e5b4d10ed553c9b7cea7642e

wrong bundle :

  carenas@dell ~/src/git/ganglia $ git describe --tags
  3.3.5
  carenas@dell ~/src/git/ganglia $ cd web/
  carenas@dell ~/src/git/ganglia/web $ git describe --tags
  3.3.2-3

while web has since had a lot more fixes added as shown by :

  carenas@dell ~/src/git/ganglia-web $ git describe --tags
  3.3.4-14-g7383ed8
  carenas@dell ~/src/git/ganglia-web $ git diff --stat 3.3.2-3.. | cat
  Makefile                         |    2 +-
  api/host.php                     |    9 ++---
  cluster_view.php                 |    4 ++--
  functions.php                    |   15 +++
  graph.php                        |    5 +++--
  header.php                       |    1 +
  inspect_graph.php                |    4 ++--
  templates/default/views_view.tpl |   16 
  8 files changed, 42 insertions(+), 14 deletions(-)

Carlo



Re: [Ganglia-developers] autotools / git release branches / web version.php

2012-03-26 Thread Carlo Marcelo Arenas Belon
On Mon, Mar 26, 2012 at 04:23:46PM +0100, Daniel Pocock wrote:
 
 We need to get version into version.php

all you need to do is run make in that directory and it will be done
for you, if you follow the documentation.

yes, the releases up to 3.3.4 are missing that as well, because the 
procedure you followed to make the package was incomplete, but the wiki has 
been updated, and the file has been committed with the right version, so
hopefully the 3.3.5 package will be fine.

Carlo



Re: [Ganglia-developers] 3.3.3 tagged

2012-03-22 Thread Carlo Marcelo Arenas Belon
On Wed, Mar 21, 2012 at 07:59:16PM +0100, Daniel Pocock wrote:
 On 21/03/12 19:48, Vladimir Vuksan wrote:
  I agree with Alex. We are churning through too many versions. I would
  personally be OK with overriding the existing 3.3.2 tag and going with
  3.3.2 instead of 3.3.4.
 
 Having been involved in the releases between 3.1.2 and 3.1.7, I accept
 some of the responsibility if people did find it problematic
 
 That is why I put out a test tarball, only tagged 3.3.3dp1, before
 tagging 3.3.3 - so people did have 24 hours to evaluate

and that resulted (like in the 3.1.2 to 3.1.7 cycle) in a couple of 
obvious issues that were found after the release tag was made and 
therefore in a couple more releases.

which probably points to the fact (which keeps getting ignored) that the 
testing community for ganglia is very small (per sourceforge download 
statistics there were 10 downloads for each one of those prereleases) and not 
able to respond in the timeline you suggest.

especially when :

* no information about what has changed is provided, so no one knows where
  to look
* there is no standard test battery to run, nor enough time for testers
  to build their own packages and deploy them in some test cluster to see
  how they behave.
* the target audience for this product are sysadmins, and so providing 
  binaries and making broader announcements (also including the 
  ganglia-users) would be recommended so that prerelease testing is
  exhaustive.
* there has been obviously little testing before making the release tar 
  and so those few testers eventually get more tired as the releases keep 
  increasing and demanding they start from scratch each time.

the end result of course being that the quality of ganglia at release time 
is not what I am sure we all would like to see, and far from perfect.

usually package maintainers don't even bother to get involved with 
prereleases, but they would be IMHO an important part of that testing if 
we are to aim for quality final releases.

Carlo



Re: [Ganglia-developers] 3.3.3 tagged

2012-03-22 Thread Carlo Marcelo Arenas Belon
On Thu, Mar 22, 2012 at 01:52:04PM +0000, Daniel Pocock wrote:
 On 22/03/2012 13:33, Carlo Marcelo Arenas Belon wrote:
  On Wed, Mar 21, 2012 at 07:59:16PM +0100, Daniel Pocock wrote:
  On 21/03/12 19:48, Vladimir Vuksan wrote:
  I agree with Alex. We are churning through too many versions. I would
  personally be OK with overriding the existing 3.3.2 tag and going with
  3.3.2 instead of 3.3.4.
 
  Having been involved in the releases between 3.1.2 and 3.1.7, I accept
  some of the responsibility if people did find it problematic
 
  That is why I put out a test tarball, only tagged 3.3.3dp1, before
  tagging 3.3.3 - so people did have 24 hours to evaluate
 
  and that resulted (like in the 3.1.2 to 3.1.7 cycle) in a couple of
  obvious issues that were found after the release tag was made and
  therefore in a couple more releases.
 
 Let's not have a discussion about `obvious' issues: Ganglia is supported 
 on a large number of platforms, but I'm not sure if everyone here is 
 testing every platform.  I've never run it on AIX or a non-Intel based 
 Linux, for example.

obvious here means :

* does it have the right version?
* does it build?
* can you make a package out of it?
* does it require a flag day (compatibility or feature wise)?

all of those SHOULD be resolved without having to bump a version number, 
because it is something we should be able to test before we make a package 
that is meant for public consumption.

a version number change in a package is normally associated with changes 
in features or bug fixes, and therefore requires more focused testing than the
above.

  especially when :
 
  * no information about what has changed is provided, so no one knows where
 to look
 
 There was previously a change log, I'm not sure what happened to that
 
 Do we just rely on the git logs (maybe a script to extract them to the 
 web page too)?  Or should someone be obliged to make a proper report 
 with each release?

https://github.com/ganglia/monitor-core/wiki

  * the target audience for this product are sysadmins, and so providing
 binaries and making broader announcements (also including the
 ganglia-users) would be recommended so that prerelease testing is
 exhaustive.
 
 I deliberately avoided that, because it should not be seen as an 
 official release yet, and it could be tiresome helping less experienced 
 users evaluate it.  I would prefer to suggest that those sysadmins who 
 want to test bleeding edge stuff join the dev list.

you missed the point: it is not so much that they want to test bleeding-edge 
stuff, as that we want them to test our bleeding-edge stuff, so that when it 
gets released to the public all issues have been ironed out.

  usually package maintainers don't even bother to get involved with
  prereleases, but they would be IMHO an important part of that testing if
  we are to aim for quality final releases.
 
 I'm actually testing each of the 3.3.x series on OpenCSW.  I hope to 
 have binary packages in experimental very soon for people to try.

and I am sure debian has experimental, and fedora has rawhide and 
opensuse has tumbleweed, and there are plenty of other options we could 
be using to widen our test base if we would just meet their requirements 
and ask for their help.

Carlo

PS. I usually test in gentoo amd64, but not sure if ~amd64 would be the 
right place for the packages we put as prerelease



Re: [Ganglia-developers] releasing 3.3.2 today?

2012-03-20 Thread Carlo Marcelo Arenas Belon
On Tue, Mar 20, 2012 at 12:09:29PM +0000, Daniel Pocock wrote:
 
 Does anyone want to sneak in any last minute changes before I tag 3.3.2 
 and make the tarball available for testing?

there is already a published tag named 3.3.2; if you are not going to release
that, then it will be better if we skip that release number and aim for 3.3.3

would also recommend that for testing you tag it as 3.3.2pre1 or something 
like that.

Carlo



Re: [Ganglia-developers] releasing 3.3.2 today?

2012-03-20 Thread Carlo Marcelo Arenas Belon
On Tue, Mar 20, 2012 at 05:36:56PM +0000, Daniel Pocock wrote:
 On 20/03/2012 17:34, Bernard Li wrote:
 On Tue, Mar 20, 2012 at 10:03 AM, Daniel Pocock <dan...@pocock.com.au> wrote:
 
 I agree with that approach, with a slight variation - I'll tag it as
 3.3.3dp1 (after adding the ChangeLog file)
 
 Quick question -- does this prevent RPM upgrading? i.e. 3.3.3dp1 -> 3.3.3?
 
 It is just a tag to help us keep track of what we test, it is not
 intended for versioning a binary package

AFAIK the version of this release will be 3.3.2 since micro numbers no 
longer exist.
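
On the RPM upgrade question itself: as far as I know, rpm compares versions segment by segment and treats a version that still has segments left over as the newer one, so a package versioned 3.3.3dp1 would sort after 3.3.3 instead of before it. A minimal sketch of that ordering follows; vercmp_sketch is a hypothetical name, this is not rpm's code, and it ignores rpm's special handling of '~' and '^'.

```c
#include <ctype.h>
#include <string.h>

/*
 * Simplified sketch of segment-wise version ordering in the style of
 * rpm (hypothetical helper, not rpm's implementation).
 * Returns <0, 0 or >0 when a sorts before, equal to, or after b.
 */
static int
vercmp_sketch(const char *a, const char *b)
{
  while (*a || *b)
    {
      /* skip separators such as '.' or '-' */
      while (*a && !isalnum((unsigned char)*a)) a++;
      while (*b && !isalnum((unsigned char)*b)) b++;
      if (!*a || !*b)
        break;

      const char *sa = a, *sb = b;
      int numeric = isdigit((unsigned char)*a);

      /* take one segment of the same kind (digits or letters) from each */
      if (numeric)
        {
          while (isdigit((unsigned char)*a)) a++;
          while (isdigit((unsigned char)*b)) b++;
        }
      else
        {
          while (isalpha((unsigned char)*a)) a++;
          while (isalpha((unsigned char)*b)) b++;
        }

      /* segments of different kinds: the numeric one counts as newer */
      if (b == sb)
        return numeric ? 1 : -1;

      size_t la = (size_t)(a - sa), lb = (size_t)(b - sb);
      if (numeric)
        {
          /* drop leading zeros, then the longer number is the larger one */
          while (la > 1 && *sa == '0') { sa++; la--; }
          while (lb > 1 && *sb == '0') { sb++; lb--; }
          if (la != lb)
            return la > lb ? 1 : -1;
        }

      int c = memcmp(sa, sb, la < lb ? la : lb);
      if (c != 0)
        return c;
      if (la != lb)
        return la > lb ? 1 : -1;
    }

  /* all shared segments are equal: whoever has segments left is newer */
  if (!*a && !*b)
    return 0;
  return *a ? 1 : -1;
}
```

With this ordering vercmp_sketch("3.3.3dp1", "3.3.3") is positive, which is one more reason to keep names like 3.3.3dp1 as git tags only and out of binary package versions.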

Carlo



Re: [Ganglia-developers] Ganglia 3.3.1 configure.in broken, 3.3.2 needed

2012-03-11 Thread Carlo Marcelo Arenas Belon
On Sat, Mar 10, 2012 at 04:25:16PM -0500, Vladimir Vuksan wrote:
 I am not married to package-ganglia-release so anything that 
 helps us long term is a win.

I think the problems are not with the tools but with the process,
as was spelled out in the original list of bullet points.

agree though that simplifying the tools goes together with making
the process simpler and less prone to failures.

  On Sat, 10 Mar 2012, Daniel Pocock wrote:
  a) the tag doesn't cover the ganglia-web stuff

this is not correct, but it is made complicated by the fact that the tags are 
not static, not standard (3.3.2-3 might have been better called 3.3.2.3 IMHO),
and that they are not matching :

  $ git describe --tags
  3.3.2
  $ cd web/
  $ git describe --tags
  3.3.2-1

  b) the tag is created before testing (which is not necessary when using
  git, you can tag after you test, because a tag is just a checksum of
  what you tested)

more importantly, you could end up pushing that tag by mistake, and end up 
changing it later (something that wouldn't work, since you can't force every 
clone to delete and reacquire that tag).

if enforcing having a tag is going to be used in this way, it will be better 
to stick to names like 3.3.2pre1, or to use the old svn convention of not updating 
the main version until after it has been tested and releasing prereleases with 
versions like 3.1.1.x (where x used to be the svn revision, but now would have 
to be a monotonically increasing number)

I'd prefer using the 'pre' notation, which is why I submitted 74ddc9e, so hopefully
it will be more difficult to have a release like 3.3.1 where the version of 
the package (and associated libraries) was wrong.

the most important part of this being of course, that it is possible to do 
more coordinated and exhaustive testing before release.
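
As an illustration of the 3.1.1.x style without svn revisions, the commit count that `git describe --tags` appends could play the role of x. The sketch below is only an assumption about how that might look: describe_to_version is a hypothetical helper, not part of any ganglia script, and the count only increases along a single line of history since the last tag.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn `git describe --tags` output such as
 * "3.3.5-20-gb98de1e" into a prerelease-style version "3.3.5.20".
 * Output without a "-<count>-g<hash>" suffix (an exact tag such as
 * "3.3.2-3") is copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int
describe_to_version(const char *desc, char *out, size_t outlen)
{
  const char *g = strrchr(desc, '-');

  /* expect "-g" followed only by hex digits at the end of the string */
  if (g && g[1] == 'g' && isxdigit((unsigned char)g[2]))
    {
      const char *p = g + 2;
      while (isxdigit((unsigned char)*p)) p++;
      if (*p == '\0')
        {
          /* walk back over the commit count that precedes "-g<hash>" */
          const char *c = g;
          while (c > desc && isdigit((unsigned char)c[-1])) c--;
          if (c < g && c > desc && c[-1] == '-')
            {
              int n = snprintf(out, outlen, "%.*s.%.*s",
                               (int)(c - 1 - desc), desc, (int)(g - c), c);
              return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
            }
        }
    }

  /* exact tag, or something unexpected: keep it as is */
  int n = snprintf(out, outlen, "%s", desc);
  return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Note that a tag like 3.3.2-3 cannot be told apart from a tag plus a commit count by its shape alone, which is part of why non-standard tag names make this kind of automation harder.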

Carlo



Re: [Ganglia-developers] Ganglia 3.3.1 configure.in broken, 3.3.2 needed

2012-03-09 Thread Carlo Marcelo Arenas Belon
On Thu, Mar 08, 2012 at 04:34:19PM +0100, Daniel Pocock wrote:
 
 Michael, do you have write access on the wiki?  I think we need to get
 this distribution-specific stuff captured there along with the general
 notes I provided below.

having these instructions added to the codebase, just like README.WIN is, 
could help too, especially considering there is a fair amount of 
confusion now, with information (not all of it consistent) spread 
between the multiple wikis and the website.

Carlo



Re: [Ganglia-developers] Investigating feasibility of moving repo to Github

2011-07-11 Thread Carlo Marcelo Arenas Belon
On Sun, Jul 10, 2011 at 04:28:18PM -0400, Vladimir Vuksan wrote:
 
 Any thoughts on why we shouldn't make the Github our primary repository ?

as was explained a long time ago, when I proposed the same and it was rejected, 
we will also need to change our scripts so that they are able to make a 
package without subversion.

* the svn revision is used as the MICRO release number
* the changelog is created dynamically from svn log
* development tracks svn changelog numbers to keep track of merges from 
  trunk, since svn was (until recently) unable to do so automatically, but
  this last one is something that git would do automatically, and it will 
  maybe only be reflected in the way we do integrations and how they get 
  approved

using git allows for much more dynamic development, especially since there
are already significant codebases that are being maintained in parallel, and
using svn limits how easily they could be integrated back.

in that same line, it would most likely be a good idea to allow some 
time for those people who already have diverged trees using svn to either 
get those changes integrated there, or move them to their own trees using 
git, and so during that period keeping both trees in sync somehow would 
be needed.

Carlo



Re: [Ganglia-developers] Investigating feasibility of moving repo to Github

2011-07-11 Thread Carlo Marcelo Arenas Belon
On Sun, Jul 10, 2011 at 09:27:28PM -0400, Jesse Becker wrote:
 
 My only concern is with the import process itself.

any import process that I know of from svn to git, should at least preserve
the history, what is your concern specifically here?

 There is a lot of important metadata in the existing SVN repository.

are you referring to which files are executable or ASCII and stuff like that?
tools should be able to translate them most likely into their corresponding 
git flags

if you are talking about the external dependency on web/dwoo that was added 
in trunk and therefore now also in 3.2, that would need to be translated as 
well, but git submodules allow for that.

 I believe that
 this should be completely preserved, either directly within the git
 repository, or as a separate standalone (and frozen) SVN repository.
 The commit logs, test branches, and history is too important to lose.

the test branches that are no longer open (because they were already merged 
back) wouldn't need to be migrated IMHO; as for the other branches that were 
open but never merged back, they should probably be migrated over as well as 
topic branches, but later weeded out after their good parts have been merged 
back, to avoid confusion.

git allows you to have any number of local branches in your repository 
anyway, for whatever topics you feel like.

Carlo



Re: [Ganglia-developers] Investigating feasibility of moving repo to Github

2011-07-11 Thread Carlo Marcelo Arenas Belon
On Mon, Jul 11, 2011 at 01:05:47PM -0400, Jesse Becker wrote:
 On Mon, Jul 11, 2011 at 12:42, Carlo Marcelo Arenas Belon
 <care...@sajinet.com.pe> wrote:
  On Sun, Jul 10, 2011 at 09:27:28PM -0400, Jesse Becker wrote:
 
  I believe that
  this should be completely preserved, either directly within the git
  repository, or as a separate standalone (and frozen) SVN repository.
  The commit logs, test branches, and history is too important to lose.
 
  the test branches that are no longer open (because they were already 
  merged
  back) wouldn't need to be migrated IMHO, as for the other branches that were
  open but never merged back, they should probably be migrated over as well as
  topic branches but later weeded out after their good parts had been merged
  back, to avoid confusion.
 
 Closed branches can remain closed, but I still think they should be
 kept as a record, if nothing else.

For keeping a record of them, it would be easier to keep svn around in a 
read-only way, as it is (nearly) impossible to reconstruct the merges and 
the full history in svn anyway, since it was only recently that metadata was 
added for keeping track of the merges.

The equivalent in git of an svn merge operation without metadata (as was 
the default until very recently) is `git merge --squash`, which 
doesn't keep track of the development history, and so migrating those 
branches into git isn't very expressive and is instead just a waste.

Carlo



Re: [Ganglia-developers] Writing a Makefile for manpages in mans/

2011-03-01 Thread Carlo Marcelo Arenas Belon
On Mon, Feb 28, 2011 at 12:11:16PM -0800, Bernard Li wrote:
 On Sat, Feb 26, 2011 at 8:56 AM, Carlo Marcelo Arenas Belon
 <care...@sajinet.com.pe> wrote:
 
  and also requires some post processing for the right formatting :
 
  http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05612.html
 
 In the email, you said:
 
 after the binaries are build then pipe them to help2man, then some
 sed to replace the ` if I recall correctly.
 
 So do we want ` or not, because it looks like the code in trunk
 right now has `:

obviously I didn't recall correctly ;), especially considering that the
source of the backticks is actually gengetopt.

 I guess it all depends on the locale of your system.  On my RHEL6b2
 system, everything is single quote and there are no `.

which version of help2man?, which locale, and which version of the binaries
are you calling that don't have backticks in --help?

 Are you aware of any other post-processing that needs to be done?

* removing the version (--version-string=" " could help as a workaround)
* a descriptive name (-n"manual page for Ganglia Status Tool" for gstat)
* removing the misalignment that is added in the DESCRIPTION with the
  package name, version and the '.SS Purpose\n.IP' string which was removed
  with commit 1132

 I think the only thing left is slightly better formatting for AUTHORS
 and COPYING then we can use those as template for the manpages (in my
 Makefile I generate a help2man include template that has AUTHORS,
 COPYING and BUGS).

I presume that means first creating suitable include files for -i

 What do you think?

since I don't have such a Makefile, there is not much I can comment on, but
my attempt at generating an updated test man file for gstat showed the
formatting was really off compared with the original, so I hope you had better
luck :

  $ help2man --version-string=" " -N -n"manual page for Ganglia Status Tool" -i AUTHORS -i COPYING ./gstat > ../mans/gstat.1

Carlo

PS. using help2man 1.38.2



Re: [Ganglia-developers] Dynamically resizable buffer for slurpfile()

2011-02-27 Thread Carlo Marcelo Arenas Belon
On Thu, Feb 24, 2011 at 10:44:58PM +0000, Kostas Georgiou wrote:
 On Thu, Feb 24, 2011 at 12:25:16PM +0000, Carlo Marcelo Arenas Belon wrote:
 
  On Wed, Feb 23, 2011 at 09:42:56AM -0800, Bernard Li wrote:
   
   read:
      read_len = read(fd, db, buflen);
      if (read_len <= 0)
        {
          if (errno == EINTR)
             goto read;
          err_ret("slurpfile() read() error on file %s", filename);
          close(fd);
          return SYNAPSE_FAILURE;
        }
   132 }
  
  this code is not relevant as it is only called when EINTR is received
  because a signal interrupts the read call (very unlikely)
 
 Shouldn't this be if (read_len < 0), a return of zero from read is
 possible (EOF for example). If slurpfile is called with buffer=NULL and
 buffsize equal or a multiple of the file size then we get
 SYNAPSE_FAILURE. The errno check will be against an old value of errno
 in this which makes it more likely to hit (still very unlikely though :)
 and then we have an infinite loop...

good point, even though I would like to think that errno will be reset by
the read call and avoid the infinite loop anyway.

Committed revision 2494
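
For reference, a minimal sketch of the retry logic discussed above, treating 0 as EOF and only looking at errno when read() actually fails (POSIX only gives errno a defined meaning after a failing call, so it cannot be relied on to be reset on success). read_retry is a hypothetical name and this is not the code from revision 2494.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the committed ganglia code: read up to
 * buflen bytes from fd, retrying only when a signal interrupts the call.
 * Returns the number of bytes read, 0 on EOF, or -1 on a real error.
 */
static ssize_t
read_retry(int fd, char *buf, size_t buflen)
{
  ssize_t read_len;

  do
    {
      read_len = read(fd, buf, buflen);
    }
  /* errno is only inspected after a failure, so a stale EINTR left
     behind by an earlier call can never cause an endless retry */
  while (read_len < 0 && errno == EINTR);

  /* read_len == 0 means EOF, which is not an error for the caller */
  return read_len;
}
```

A caller that keeps reading into a growing buffer can then stop on 0 and report a failure only on -1.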

Carlo



Re: [Ganglia-developers] Dynamically resizable buffer for slurpfile()

2011-02-24 Thread Carlo Marcelo Arenas Belon
On Wed, Feb 23, 2011 at 05:12:03PM -0800, Bernard Li wrote:
 
  I tested under EL5 and EL6 and it wasn't able to get past the initial
  buffer size.  I believe what I did was:
 
 Correction.  It works on EL6, but not on EL5:

most likely the test is just giving inconsistent results, and that is why
it now works in EL5, while it didn't before.

 [CentOS 5.5 x86_64 with kernel 2.6.18-194.32.1.el5]
 
 read(3, "2.6.18-194.32.1.", 16) = 16
 read(3, "", 16) = 0
 
 [RHEL6b2 x86_64 with kernel 2.6.32-37.el6.x86_64]
 
 read(3, "2.6.32-37.el6.x8", 16) = 16
 read(3, "6_64\n", 16)   = 5
 
 The issue may be specific to files in /proc/sys, because I tried
 reading /proc/stat on CentOS 5.5 and it worked fine.

very unlikely, and considering that this is a code modification you
made and that only you have, the problem is most likely in your code anyway
(maybe even miscompiled)

Carlo



Re: [Ganglia-developers] Dynamically resizable buffer for slurpfile()

2011-02-24 Thread Carlo Marcelo Arenas Belon
On Wed, Feb 23, 2011 at 09:42:56AM -0800, Bernard Li wrote:
 
  what second pass?
 
     dummy = proc_sys_kernel_osrelease;
     rval.int32 = slurpfile("/proc/sys/kernel/osrelease", &dummy,
                            MAX_G_STRING_SIZE);
 
  why would anyone call slurpfile in a loop anyway?, and slurpfile
  doesn't call itself recursively but just reads as much data as it
  can into the buffer provided (second parameter).
 
 Sorry I wasn't clear, I meant the goto read loop:
 
 read:
   read_len = read(fd, db, buflen);
   if (read_len <= 0)
      {
         if (errno == EINTR)
            goto read;
         err_ret("slurpfile() read() error on file %s", filename);
         close(fd);
         return SYNAPSE_FAILURE;
      }

this code is not relevant as it is only called when EINTR is received
because a signal interrupts the read call (very unlikely)

the second conditional after that code is used to continue reading the
buffer after it is resized if that is possible and that works fine as
shown by your tests

if (read_len == buflen)
   {
      if (dynamic) {
         dynamic += buflen;
         db = realloc(*buffer, dynamic);
         *buffer = db;
         db = *buffer + dynamic - buflen;
         goto read;
      } else {
         --read_len;
         err_msg("slurpfile() read() buffer overflow on file %s", filename);
      }
   }

 When I straced the process, the first read() was able to read up to
 MAX_G_STRING, however, the second read() returns 0.  However, if I
 read a regular file (not in /proc filesystem), it was able to read the
 rest of the string in the second pass just fine.

this just sounds too strange, but I was able to replicate it after a lot of
guessing in a CentOS 5 VM (both 32bit and 64bit) as shown by:

# strace -e read dd if=/proc/sys/kernel/osrelease bs=16 > /dev/null
read(0, "2.6.18-164.9.1.e", 16) = 16
read(0, "", 16) = 0

so not a ganglia problem, and just a problem with the way you were trying
to use slurpfile and the way that specific sysctl handler is implemented
in that version of the kernel.

makes sense anyway to not worry about partial reads from a value that is
meant to be used whole anyway, but interestingly enough and as you reported
later it is no longer working that way with newer kernels.
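
As an illustration of the chunked-read pattern discussed here, a minimal
sketch under stated assumptions: slurp_all is a hypothetical name and not a
ganglia function, and the initial capacity of 64 bytes is arbitrary. It grows
the buffer with realloc() and stops at the first read() that returns 0. On a
kernel whose sysctl handler returns 0 on the second read, as in the strace
above, it would still only see what the first read() delivered, so the initial
buffer has to be large enough for the whole value:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical sketch: read a whole file into a heap buffer that grows
 * as needed.  On success returns the number of bytes read and stores a
 * NUL-terminated buffer in *out, which the caller must free().
 * Returns -1 on error.
 */
static ssize_t slurp_all(const char *filename, char **out)
{
    size_t cap = 64, len = 0;
    char *buf, *tmp;
    ssize_t n;
    int fd;

    fd = open(filename, O_RDONLY);
    if (fd < 0)
        return -1;

    buf = malloc(cap);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    for (;;) {
        /* keep one byte in reserve for the terminating NUL */
        n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            free(buf);
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                 /* EOF, or a handler that stops early */
        len += (size_t)n;
        if (len == cap - 1) {      /* buffer full: double it */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {     /* old buffer is still valid here */
                free(buf);
                close(fd);
                return -1;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    close(fd);
    buf[len] = '\0';
    *out = buf;
    return (ssize_t)len;
}
```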

 Regarding this particular bug -- how should we fix this?  There are
 currently two issues:
 
 1) The OS release is truncated in the web frontend

and that is to protect the gmond process against crashes

 2) The warning "slurpfile() read() buffer overflow on file
 /proc/sys/kernel/osrelease" is displayed multiple times during RPM
 installation (possibly because gmond was called to generate conf files
 etc.)

that was meant to be mostly informative, but the message might need to
be reworked to be more effective.

 Can we potentially increase MAX_G_STRING or have
 proc_sys_kernel_osrelease buffer size resize dynamically?

no

Carlo



Re: [Ganglia-developers] rpc/xdr.h missing #include <rpc/types.h> on Max OS X 10.6.4

2010-08-02 Thread Carlo Marcelo Arenas Belon
On Sat, Jul 31, 2010 at 04:37:37PM -0700, Bernard Li wrote:
 if it looks good I'll check it into trunk:

looks good, but I haven't tested it (don't have MacOS X anyway), with only the
following comments:

* probably better (as it will make for a faster configure) to have all
  AC_CHECK_HEADERS checks eventually together in one single macro instead
  of spread around.
* eventually (assuming they are made to work again) the same changes should
  be applied to xdr{client,server} in tests.

also (since it is related to the rpc support anyway), wouldn't anyone have
any objection on committing a fixed generated gm_protocol.h (and friends)
instead of relying on the local rpcgen to generate them at build time?

advantages will be that all hacks around at least cygwin's implementation
of it will be removed and all dependencies on rpc/rpc.h will be truly
dropped, but of course, the disadvantage would be (mostly philosophical)
that an intermediate file would have already been propagated into each supported
platform and so it would need to be tested to be valid and work correctly
in all of them without having to rely on local platform knowledge.

anyone patching gm_protocol.x could just regenerate it anyway, as long as
the makefile rule is left there (which then could be a problem for some
as well, if they have a clock skew problem that confuses make and forces
recreating those files for no reason) and so customizations shouldn't
be a concern AFAIK.

Carlo



Re: [Ganglia-developers] Bugzilla #264

2010-07-13 Thread Carlo Marcelo Arenas Belon
On Mon, Jul 12, 2010 at 05:55:26PM -0700, Bernard Li wrote:
 Hi Carlo:
 
 On Fri, Jul 9, 2010 at 1:04 AM, Carlo Marcelo Arenas Belon
 care...@sajinet.com.pe wrote:
 
  bug is invalid, as it is the result of not indicating the right paths
  to use for the dependencies, if installing them through ports.
 
  the proposed fix will only remove the WARNING from configure, which
  is just a red herring since the code has all dependent headers defined
  correctly regardless of the results from configure, but if you really want
  to commit it, it most likely wouldn't do harm either.
 
 Initially I thought configure actually bailed with the warnings,
 however, checking with the user again this does not appear to be the
 case.

there are only warnings and they are mostly just informative with the
current implementation.

 So are you saying that if you did ./configure --prefix=/usr/local then
 the WARNING would not show up?

no, the WARNINGs are not related to which flags are used in configure
at all.

in order to get a working build though, ./configure must be instructed
where to find the dependencies (unless in /usr as usually happens in
linux), hence why the report that it was failing to build is invalid.

 AFAIK that's what the user did too
 (even though it was not specified in the bug report), so I just wanted
 to confirm.  If the warning still shows up it might be a good idea to
 check in the code if it doesn't break anything since less warning is
 good IMHO.

as I said before already, it most likely wouldn't do any harm either, so feel
free to commit it so that a new bootstrapped snapshot could be tested
in all supported platforms.

Carlo



Re: [Ganglia-developers] Install gmond module config files, python modules and config files by default

2010-06-29 Thread Carlo Marcelo Arenas Belon
On Mon, Jun 28, 2010 at 12:17:03PM -0700, Bernard Li wrote:
 On Sat, Jun 26, 2010 at 4:56 AM, Carlo Marcelo Arenas Belon
 care...@sajinet.com.pe wrote:
 
  this would trigger gmond to segfault unless it was linked against
  libconfuse 2.7 or it also has a default gmond.conf file created.
 
 Actually, gmond would segfault with or without this patch

Right, forgot the problem was introduced and released with 3.1.7 when
pushing by default the configuration for modpython (unless built with
--disable-python)

 this is because the default configuration has the line:
 
 include ("$sysconfdir/conf.d/*.conf")
 
 which causes libconfuse to segfault.

not really; it is because the include is referencing that directory
AND there are files in that directory that then get imported and
inserted into the configuration, making libconfuse segfault.

if the directory would not exist or be empty (as it was before) then
gmond wouldn't crash.

 I was thinking that it does not make sense to include this in the
 default configuration file, which is used when no configuration file
 is found.  The reason is if no configuration file is found, you would
 not expect to have other configuration files lying around to be
 included.

make install is pushing configuration files and therefore there are
other files lying around even if no configuration was created.

 One way to fix this, is to only include this line when we are trying
 to output this to standard out via `gmond -t`.  This way gmond can
 still function without a configuration file, and the default
 configuration outputting still works as expected and users won't need
 libconfuse 2.7 to get gmond working the way it's supposed to be --
 what do you think?

interesting, but more of a hack around the problem than a solution.
it also has the side effect of changing the way `gmond -t` works and
making the internal configuration invisible, as there is no longer any way
to print it, and therefore it would most likely also need to be
documented clearly to avoid surprises.

agree though that it would at least remove the segfault by default and therefore
is worth considering, even if a similar result could probably be accomplished
by not pushing configurations by default (which, as I said, your patch was
encouraging instead)

  installing the example modules by default might not be a good idea, as
  they are just generating bogus metrics anyway.
 
 Agreed, but they are installed, but not turned on (note the
 pyconf.off extension).

but they are example modules (AKA meant for reading and learning from, not
running in a production setup) and I would rather see them pushed by a packager
into /usr/share/doc/ganglia/examples or something similar than in the place where
all other real modules are deployed.

  also, as you pointed out since these modules are linux specific and only
  needed on some setups they were intentionally not included in the default
  install as they are generally pulled as needed by the packager/sysadmins
  that are interested on them anyway.
 
 How about a new target like `make install_gmond_modules`?

probably more of a `make install_extra_linux_modules`, which then would also
pull the needed configurations and ensure that gmond can include and enable
them all without crashing.

 The reason why I decided to do this is because users who build from
 source may not know about these modules, or know where they are
 supposed to go.  So I thought it would be nice to make it easier for
 them to discover this.

this seems like something that would be better to correct through
documentation.

don't forget also that packagers (and sysadmins) are already pulling
through their packages whatever they find useful and so this change
will conflict downstream with their setups, while not providing the
information that would otherwise allow interested sysadmins
to make their own educated decisions.

Carlo



Re: [Ganglia-developers] Manpages in mans/ directory

2010-06-26 Thread Carlo Marcelo Arenas Belon
On Wed, Jun 23, 2010 at 03:23:52PM -0700, Bernard Li wrote:
 
 I'm trying to get some manpage related bug fixes in and was wondering
 if someone could tell me how the manpages in the mans/ directory of
 our source tree are generated.

after the binaries are built, pipe them to help2man, then use some sed
to replace the ` if I recall correctly.

 It looks like it's a combination of generation via help2man and
 manually adding some sections (like AUTHOR, BUGS, COPYRIGHT).

help2man -i can be used for adding those extra sections; never bothered
creating a Makefile though because the AUTHORS and COPYING files which
would have been the sources for it are not really well maintained (BUGS
was added and used as a source for example)

Carlo



Re: [Ganglia-developers] Install gmond module config files, python modules and config files by default

2010-06-26 Thread Carlo Marcelo Arenas Belon
On Fri, Jun 25, 2010 at 01:26:37PM -0700, Bernard Li wrote:
 
 The following patch will install the gmond module (including python)
 config files to the sysconfdir

this would trigger gmond to segfault unless it was linked against
libconfuse 2.7 or it also has a default gmond.conf file created.

 as well as the python modules to
 moduledir when `make install` is executed:

installing the example modules by default might not be a good idea, as
they are just generating bogus metrics anyway.

also, as you pointed out, since these modules are linux specific and only
needed on some setups they were intentionally not included in the default
install, as they are generally pulled as needed by the packagers/sysadmins
that are interested in them anyway.

Carlo



Re: [Ganglia-developers] bootstrapping for 3.1.X series and 3.2.X

2010-01-07 Thread Carlo Marcelo Arenas Belon
On Tue, Jan 05, 2010 at 02:42:28PM +0000, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 On Mon, Dec 28, 2009 at 10:51:51PM +0000, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 On Sun, Dec 06, 2009 at 09:28:04AM +0000, Daniel Pocock wrote:
 
 Ok, but if it is not locked down, let's consider some of the following:

 - document the version we expect
 
 agree, and that is what README.SVN is for, but first we have to decide
 which version to expect to begin with.

guess that if we are going to use lenny then the defaults for that distribution
should be documented then as a prerequisite here since anything older won't
be tested anyway?

  automake: 1.10 (1.10.1)
  autoconf: 2.61 
  libtool: 1.5 (1.5.26)

while the last official release (3.1.2) used instead :

  automake: 1.9 (1.9.6)
  autoconf: 2.59
  libtool: 1.5 (1.5.22)

and the de-facto standard (CentOS 4) used instead :

  automake: 1.9 (1.9.2)
  autoconf: 2.59
  libtool: 1.5 (1.5.6)

 - maybe add some check to configure that warns if a different
 version of autotools is detected?
 
 configure doesn't depend on autotools and so that would be the wrong place
 to put any checks, but configure.in does and that is where bootstrapping
 should be aborted using AC_PREREQ and friends if using the wrong versions.
 
 Ok, should we use AC_PREREQ for 3.1.6, are there any disadvantages?

 only if the macros will definitely break with an older version of autoconf
 as otherwise all we are doing is enforcing a recommendation and preventing
 anyone that might not have access to the newest version of autotools the
 possibility of getting their own bootstrap (not much of an issue if we also
 provide regular snapshots though).

 I've had a quick search for information on this - it appears that adding

 AC_PREREQ(2.61)

 would cause bootstrapping to fail on any older or newer version - only  
 2.61 would be supported.

will fail if any older version is used, but work also for newer versions

 I think this is the right way to go, as it will prevent us from running  
 in to the same issues again, and it will hopefully prevent people  
 building trunk with a different version of autotools and creating bugs  
 that no one else can re-produce.

not really, all we are doing is preventing some developer from generating their
own bootstrap of ganglia if they only have access to a version of autoconf
older than 2.61, even if it is very likely that 2.53 or older will be
all that is really needed.
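
To make the "fails for older, works for newer" semantics concrete, a small
sketch of a minimum-version comparison. This is only an illustration of the
idea and not how autoconf implements AC_PREREQ; version_at_least is a
hypothetical helper that only understands "major.minor" strings:

```c
#include <stdio.h>

/*
 * Hypothetical helper illustrating "minimum version" semantics:
 * returns 1 when the "major.minor" string `have` is at least `need`,
 * 0 otherwise (including on unparsable input).
 */
static int version_at_least(const char *have, const char *need)
{
    int hmaj, hmin, nmaj, nmin;

    if (sscanf(have, "%d.%d", &hmaj, &hmin) != 2 ||
        sscanf(need, "%d.%d", &nmaj, &nmin) != 2)
        return 0;

    if (hmaj != nmaj)
        return hmaj > nmaj;
    return hmin >= nmin;
}
```

With a requirement of 2.61, a tool at 2.59 is rejected while 2.61 and 2.63
are both accepted, which is the behaviour described above.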

if we are not providing periodic snapshots, then that developer won't be
able to do any of the work he wanted to do, unless he upgrades his tools
locally or gets hold of another system where he can get a bootstrap
(most likely installing debian 5 somewhere) and so we just made his life
more difficult and probably even discouraged him from scratching that itch.

bootstrapping doesn't mean releasing, and so I would expect release managers
to use the versions or environment we know works, but that is something
that can be done through process and documentation better than it can be
done by code, hence why I suggest r2174 gets reverted.

 I think Debian 5.0 (lenny) is the final decision then

 Debian 5.0 (lenny) x86 (32-bit) right?

 I'm using lenny amd64 (64 bit) most of the time now, especially since  
 the various browser plugins (e.g. Java) now support 64 bit Linux.

the problems with the bootstrap of 3.1.2 might have been because of using
a 64 bit bootstrap (as that was never seen when doing CentOS 4 x86), but
if Debian 5 doesn't have that problem (we would have to confirm that the
packages generated in x86 and amd64 are identical) then saying Debian
5.0 (lenny) should be enough to describe the suggested bootstrap
environment.

 any final objections/comments?

 the only one I can think of is that we sometimes used to provide RPMs with
 the releases but that would be IMHO not that important considering that
 fedora/EPEL might be the package most people would use anyway and at least
 for fedora that used to be released fairly quickly after the source package
 was posted on our site as the fedora packagers are also actively involved
 in the list.
   
 Providing RPMs is probably much less important than having a stable  
 bootstrap environment

agree

 However, it would be good for packaging activities to continue, and I  
 can't see why we can't script the release process so that it invokes the  
 rpmbuild commands on a Fedora box over ssh.

then you are going to need either 2 public resources for all release managers
to use consistently or a coordinated release process where the package is
generated and then independently binary packages are added to it before
the announcement (which also means we have to agree on what is going to be
used for building those RPM packages).

 Should we

 a) after fixing the other showstopper (fork issue), do we tag 3.1.6 
 and let people test a tarball from Debian 5 autotools?, or

 b) make another 3.1.5 tarball

Re: [Ganglia-developers] PATCH : Adding trends to Ganglia

2010-01-06 Thread Carlo Marcelo Arenas Belon
On Tue, Jan 05, 2010 at 10:46:34AM +0100, Sebastien Termeau wrote:
 On Mon, Jan 4, 2010 at 10:03 AM, Carlo Marcelo Arenas Belon 
 care...@sajinet.com.pe wrote:
 
  On Tue, Dec 29, 2009 at 02:49:28PM +0100, Sebastien Termeau wrote:
  
   OK, I will provide you with two new patches that include those remarks.
 
  BUG249 (the one about using tables for formatting of the host view) is IMHO
  already closed, unless you really meant it (as I expected and asked before,
  but got no confirmation) to be an enhancement that would be released
  with some 3.1 version (most likely 3.1.7).
 
  if that is the case, please update the target on the bugs or, if you can't do
  that, let me know and I will do so and track the corresponding backport for
  the release.
 
 I agree.
 How do I change the target version? It is the version number in the bug
 description?

no, it is the Version field in the details section, which now says Trunk
and should say 3.1.x instead; if you have problems changing that let me
know and I'll do the honors.

I assume 3.1.7 would be OK since we are almost at code freeze for 3.1.6; I
will then prepare a backport patch that can be pulled manually, if you don't
file one yourself (which you are probably using already anyway for your
local package).

  BUG250 will need an updated patch that can be applied cleanly to trunk so
  that it can be tested/enhanced further.
 
 I just submitted a new version of the patch.
 This one can be cleanly applied to trunk.
 I slightly modified the order in which things are done in graph.php in order
 to calculate the 'start' and 'end' values before calling the metric.php
 script.

cool, will check it and commit it to trunk then if it is working, but I
suspect someone with a better clue about UI design than I have might have a
word about it before it can get into 3.1

   I was also thinking of adding a third one with minimum, maximum and
  average.
   Do you think it might be interesting to have this graph also?
 
  AFAIK, those values are already in the metric graphs as numeric values, and
  the MAX is also graphed with a red line, is that what you were looking to
  add?
 
 Yes it is.
 I was thinking that maybe the normal graphs should not come with this max
 line.

OK, I think this was done before, where the red line was actually a per cluster
max or the real max (like 100% in a percentage metric)

there were also some patches flying around to add those values to the Y axis
which I am not sure got committed but which would be complementary to your idea.

 And instead, we can provide a new 'trend graph' with MIN, MAX and AVG drawn
 as lines.

in that case it would probably make more sense to have a checkbox in the bar
to toggle trending ON/OFF so that all graphs in the host view will be showing
either the normal or the trending graph, instead of having a link for each
graph.

Carlo



Re: [Ganglia-developers] template-based metric definition with PCRE

2010-01-04 Thread Carlo Marcelo Arenas Belon
On Mon, Dec 28, 2009 at 08:47:35PM +0000, Daniel Pocock wrote:
 Jesse Becker wrote:
  On Sat, Nov 28, 2009 at 08:42, Daniel Pocock dan...@pocock.com.au wrote:

  For those following trunk, you may need to bootstrap again, and make
  sure you have pcre available.
 
  I've linked gmond with libpcre so that it can dynamically match the
  metric names
 
  E.g., for the multicpu module, this is the only metric definition that
  needs to be given to enable all metrics on all cores:
 
   metric {
     name_match = "multicpu_([a-z]+)([0-9]+)"
     value_threshold = 1.0
     title = "CPU-\\2 \\1"
   }
 
  Oh, that's cool. +1 for me.
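
To show what the name_match/title pair above does to a metric name, a minimal
sketch under stated assumptions: it uses POSIX <regex.h> instead of PCRE so it
builds with libc alone, multicpu_title is a hypothetical function name, and the
^...$ anchors are an addition of this sketch, not part of the quoted
configuration:

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical sketch using POSIX <regex.h> instead of PCRE: match a
 * metric name against the pattern and build a title equivalent to
 * "CPU-\2 \1".  Returns 0 on a match, -1 otherwise.
 */
static int multicpu_title(const char *name, char *title, size_t len)
{
    regex_t re;
    regmatch_t m[3];
    int rc = -1;

    if (regcomp(&re, "^multicpu_([a-z]+)([0-9]+)$", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, name, 3, m, 0) == 0) {
        /* m[1] is the metric kind (e.g. "user"), m[2] the CPU number */
        snprintf(title, len, "CPU-%.*s %.*s",
                 (int)(m[2].rm_eo - m[2].rm_so), name + m[2].rm_so,
                 (int)(m[1].rm_eo - m[1].rm_so), name + m[1].rm_so);
        rc = 0;
    }

    regfree(&re);
    return rc;
}
```

A name such as multicpu_user0 yields the title "CPU-0 user", while a name
that does not match the pattern is left alone.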

 I've backported to 3.1,

that was a bad idea IMHO, not because the implementation is bad, but because
3.1.3^H4^H5^H6 has been delayed long enough that adding anything else to it
this late and therefore resetting the testing cycle would be unwise;
especially considering there are other fairly significant fixes/features
also waiting for backport.

there is also the fact that there was a valid (sorta, even if no code was
ever produced otherwise) comment on how this functionality should be made
optional (just like python is) and that wasn't discussed further (except
on this email after it was committed), neither corrected.

lastly, this code makes using multicpu so easy that it will be fairly obvious
the module never worked fine to begin with, and so it would make
more sense to also backport the needed fixes in r2116 (still incomplete), and
maybe even the configuration cleanup patches in r2118 which are also somehow
related, and also consider better ways to protect users of platforms other
than Linux and Cygwin from shooting themselves in the foot by trying to get
that module loaded, which is an even bigger issue.

 $ svn log -r2160
 
 r2160 | d_pocock | 2009-12-28 20:43:54 +0000 (Mon, 28 Dec 2009) | 1 line
 
 Patch for PCRE support (backport r2112 and r2119)

you are missing also r2150 and r2156 and some yet not existent patches
so that the dependency will be also in the RPM SPEC and documented in
the configuration man page and other needed places.

would suggest instead to revert this backport for now.

  I'd be interested in any feedback on the PCRE dependency.  If necessary,
  the feature can be made into a compile time option so that gmond can
  build without it.
 
  Yes, an optional compile time option is the way to do this.  Use it if
  present, but continue on without it if not present.

 Is PCRE not available on any platform that we want to support for 3.1?  

most likely available everywhere (just like python), but since not having
it would most likely only imply that the use of the corresponding
configuration wouldn't be possible it really makes sense to be considered
optional.

 If not, then I'll leave the patch as it is, too many #ifdefs can make 
 the code look messy.  The current implementation tries default locations 
 for pcre, or lets you specify your own version:
 
 ./configure --with-libpcre=/opt/pcre

ideally all that should be needed will be to also have a --enable-pcre or
equivalent flag to control how to disable support for this at compile time
just like it is possible for python (and that has proven to be really useful
for Solaris users AFAIK)

being able to then use autoconf-like #defines to enable a dummy
implementation of the missing functionality should be all that is needed
and shouldn't make the code that ugly (unless it needs refactoring anyway)
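
A rough sketch of what such a dummy implementation could look like. The names
here are assumptions for illustration only: HAVE_LIBPCRE,
name_match_supported() and name_match_status() are hypothetical and not taken
from the ganglia tree:

```c
/*
 * Hypothetical sketch: HAVE_LIBPCRE and the function names below are
 * made up for illustration, not taken from the ganglia sources.
 */
#ifdef HAVE_LIBPCRE
#include <pcre.h>
static int name_match_supported(void) { return 1; }
#else
/* dummy implementation used when configure did not find (or disabled) PCRE */
static int name_match_supported(void) { return 0; }
#endif

/* Callers stay free of #ifdefs: they only ask whether the feature exists. */
static const char *name_match_status(void)
{
    return name_match_supported() ? "name_match enabled"
                                  : "name_match ignored: built without PCRE";
}
```

The single #ifdef lives next to the feature, so the rest of the code does not
need to be littered with conditionals.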

but I understand if you are looking instead to get the feature initially
released without having this as a possibility.

Carlo



Re: [Ganglia-developers] bootstrapping for 3.1.X series and 3.2.X

2010-01-04 Thread Carlo Marcelo Arenas Belon
On Mon, Dec 28, 2009 at 10:51:51PM +0000, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
  On Sun, Dec 06, 2009 at 09:28:04AM +0000, Daniel Pocock wrote:

  Carlo Marcelo Arenas Belon wrote:
  
  On Wed, Nov 25, 2009 at 11:00:21AM +0000, Daniel Pocock wrote:

  b) should the choice of bootstrap environment be locked for all 
  3.1.X, and only changed when increasing the minor version number 
  (e.g. when we go from 3.1 to 3.2)?
  
  no, but since our build system is full of hacks and not completely
  reliable it might be a good idea to test no issues are introduced when
  looking at a new version.

  Ok, but if it is not locked down, let's consider some of the following:
 
  - document the version we expect
 
  agree, and that is what README.SVN is for, but first we have to decide which
  version to expect to begin with.

  - maybe add some check to configure that warns if a different version of  
  autotools is detected?
 
  configure doesn't depend on autotools and so that would be the wrong place
  to put any checks, but configure.in does and that is where bootstrapping
  should be aborted using AC_PREREQ and friends if using the wrong versions.

 Ok, should we use AC_PREREQ for 3.1.6, are there any disadvantages?

only if the macros will definitely break with an older version of autoconf
as otherwise all we are doing is enforcing a recommendation and preventing
anyone that might not have access to the newest version of autotools the
possibility of getting their own bootstrap (not much of an issue if we also
provide regular snapshots though).

  d) Can anyone volunteer to provide a stable bootstrap environment 
  (e.g. a virtual server) just for Ganglia?  Two such environments may 
  be needed, one for trunk and one for the current release branch.
  
  Matt did offer an EC2 instance if we could agree on an OS version :
 

  http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05271.html
 
  I suggested Debian 5.0 (more conservative) or Fedora 12 (to be updated
  more frequently) but as long as it is agreed, documented and reproducible
  anything should work.

  I prefer Debian 5.0 (lenny), that is what I have on my laptop, home PC  
  and various other infrastructure that I use. Elsewhere I am using 
  RHEL3/4/5.
 
  Debian 5.0 is also what is being used for bugzilla AFAIK and so that might
  be a good option for consolidation.
 
 Who controls access to the Bugzilla server?  I wouldn't mind having use 
 of that as a bootstrap environment.

Matt would know, but I suspect that shell access might be probably problematic
to get and therefore unless we are talking about some continuous build system
like cruisecontrol or hudson making snapshots, it might be problematic
otherwise.

to ease using one of those systems r2144 (still incomplete) was committed
but it would be nice to know which direction we are going anyway, and for now
it would seem there is not much dialogue going on about the alternatives.

  We also have access to the OpenCSW build farm, and they are willing to  
  consider applications for access by Ganglia developers, so we could look  
  at that as a bootstrap environment.
 
  Bootstrapping is done only once per package and so wouldn't make sense to
  also do bootstrapping in Solaris.

 No, I wasn't suggesting we bootstrap separately for Solaris.  I was just 
 suggesting that we use the OpenCSW machine to bootstrap for all platforms.
 
 However, we would be stuck with whatever version of autotools is current 
 in the OpenCSW environment, and any decision to change the version there 
 would be out of our control.
 
 I think Debian 5.0 (lenny) is the final decision then

Debian 5.0 (lenny) x86 (32-bit) right?

 any final objections/comments?

the only one I can think of is that we sometimes used to provide RPMs with
the releases but that would be IMHO not that important considering that
fedora/EPEL might be the package most people would use anyway and at least
for fedora that used to be released fairly quickly after the source package
was posted on our site as the fedora packagers are also actively involved
in the list.

debian/ubuntu is usually also well represented, and that shouldn't be an
issue for releases in debian 5 anyway.

 Should we
 
 a) after fixing the other showstopper (fork issue), do we tag 3.1.6 and 
 let people test a tarball from Debian 5 autotools?, or
 
 b) make another 3.1.5 tarball using Debian 5 autotools, and put it in a 
 separate location for people to test before we tag?

Using debian for this release will break Solaris (I have a fix ready but
not yet backported) and also AIX (which Michael is maintaining outside
our tree and with patches generated based on the bootstrapping used for
3.1.2) :

  http://www.perzl.org/ganglia/

As I said in the STATUS file for 3.1, it would be better IMHO to delay
this decision until 3.1.7 (which hopefully would also include support
for AIX

Re: [Ganglia-developers] [RFC] two step gmond initialization

2010-01-04 Thread Carlo Marcelo Arenas Belon
On Mon, Dec 28, 2009 at 11:05:36PM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 On Fri, Dec 18, 2009 at 04:18:16PM +, Daniel Pocock wrote:
   
 Carlo Marcelo Arenas Belon wrote:
 
 On Sun, Dec 13, 2009 at 10:49:00AM +, Daniel Pocock wrote:
   
 I could accept Brooks' solution, because it means gmond would 
 only fail  for something like out-of-memory, while any 
 configuration failure, port  in use, etc would cause it to fail 
 before detaching.
 
 If gmond still fails silently in some cases, you have not accomplished the
 objective that you were trying to obtain with r2025 anyway.
 
 I agree - it doesn't completely meet my goal, but it does at least   
 result in an error code for most types of bad configuration (or port 
 in  use)

 that part is OK, but you still have the added side effects of r2025 which
 would affect gmond in other interesting ways :

 * the metric (and module) initialization is now done by the parent and
   expected to be inherited by the child, this means for example that the
   parent will send (and receive) metric information (even before forking)
 * the suid is done by the parent and therefore the child isn't privileged
   (while the metric initialization was done as root), this would at least
   prevent anyone from binding gmond to privileged ports but could also
   result in complicated permission issues for metric collection scripts.

 as I said before I think the apr_poll issue with BSD should be taken as
 a warning of how the changes we were planning to do could have unintended
 side effects, and since moving the daemonization was only one way to solve
 the original problem, it makes more sense to instead revert this change and
 evaluate alternatives.
   
 It is this line of argument, rather than the concerns about APR, that  
 makes me think reverting the change completely might be the way to go  
 for now, although the reason for the change is still a legitimate issue  
 and can be tracked in bugzilla.

agree, and I have to admit I am surprised this (which was my main argument)
somehow wasn't made clear until now.

indeed, the proposed alternative implementation of a fix was published just
because I agree that this issue is a legitimate bug (even if there might not
be a bugzilla entry for it) which needed to be corrected anyway.

 Maybe this type of disruptive change will have to come in 3.2, there we  
 can look at the various phases of initialisation more closely, prompt  
 people to review their modules, etc.

I was looking forward to 3.2 being the windows native version, and therefore
if the problem with the initialization is solved in a windows incompatible
way then we are going to be left with no other option than to do this
disruptive change there anyway.

Carlo

--
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev 
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] PATCH : Adding trends to Ganglia

2009-12-29 Thread Carlo Marcelo Arenas Belon
On Mon, Dec 28, 2009 at 01:22:47PM +0100, Sebastien Termeau wrote:
 
  On Wed, Dec 16, 2009 at 1:18 PM, Carlo Marcelo Arenas Belon 
  care...@sajinet.com.pe wrote:
 
  On Tue, Dec 15, 2009 at 02:32:07PM +0100, Sebastien Termeau wrote:
   Dear Ganglia Developers,
  
   Please find below a patch that brings trends to Ganglia.
 
  Really interesting, would you mind filing an enhancement bug on
  www.ganglia.info?, that would be also a great place for attaching
  those images you said were also needed.
 
 Just to inform you that I have submitted 2 enhancement requests:

would assume you wanted them eventually included as part of some 3.1
release? (most likely 3.1.7), if so would be better to adjust the
target.

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=249

committed in r2161, but as explained in bugzilla the use of <TR></TR> is
invalid HTML4, and so unless we rely on browsers doing the right thing
(which seems to happen at least on Firefox 3.5) it will need to be patched
to do something a little more complicated or as ugly as the following hack
(which again will rely on browsers doing the Right Thing (tm) while
rendering) :

--- web/templates/default/host_view.tpl 2009-12-29 02:30:30.0 -0800
+++ web/templates/default/host_view.tpl 2009-12-29 02:56:05.0 -0800
@@ -129,7 +129,7 @@
 </A></TD>
 {new_row}
 <!-- END BLOCK : vol_metric_info -->
-</TR>
+<TD></TD></TR>
 </TABLE>
 </DIV>
 <!-- END BLOCK : vol_group_info -->

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=250

patch provided is broken as shown by :

  patch:  malformed patch at line 104: diff -ur ganglia_trunk_tables/host_view.php ganglia/host_view.php

but after massaging it slightly I still have the following comments :

* with_trends should be a bool instead (true/false)
* probably to avoid surprises it should be false in 3.1 when backported
* the same TR abuse issue from BUG249 applies here
* the use of magic constants for the prediction trends should be explained
  to allow for customization or not made configurable at all.
* the hardcoded doubling of the currently selected range should be made
  flexible somehow or at least explained in the graph to avoid surprises.
* I am using rrdtool 1.3.8 so I only got projection but the use of the
  icons for it seemed strange and not particularly good looking

looking forward to a newer version that could be more broadly tested though,
as the feature is definitely interesting.

Carlo



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-25 Thread Carlo Marcelo Arenas Belon
On Thu, Dec 24, 2009 at 12:10:51PM +, Daniel Pocock wrote:
 Vladimir Vuksan wrote:

 The issue is the value of this data. If these were financial transactions
 then no loss would be acceptable, however these are not. They are
 performance, trending data which get averaged down as time goes by
 so loss of a couple of hours or even days of data is not tragic.

 I agree - it doesn't have to be perfect.

still, the current implementation has a way to go and should most likely be
expanded for more data reliability, as long as it doesn't cost too much.

 To come back to my own requirement though, it is about horizontal  
 scalability.  Let's say you have a hypothetical big enterprise that has  
 just decided to adopt Ganglia as a universal solution on every node in  
 every data center globally, including subsidiary companies, etc.

 No one really wants to manually map individual servers to clusters and  
 gmetad servers.  They want plug-and-play.

the currently federated model of gmetad helps slightly in that respect,
as you would expect each one of the independent offices/units/datacenters
to have 1 gmetad locally (as long as it is big enough to handle the load)
to collect and aggregate data, and 1 central gmetad that connects to all
the leaves for the centralized view.

of course you can also have more than 1 gmetad (even 1 per cluster per
location) and make the gmetad hierarchy tree a little larger.
 
 They just want to allocate some storage and gmetad hardware in each main  
 data center, plug them in, and watch the graphs appear.  If the CPU or  
 IO load gets too high on some of the gmetad servers in a particular  
 location, they want to re-distribute the load over the others in that  
 location.  When the IO load gets too high on all of the gmetads, they  
 want to be able to scale horizontally - add an extra 1 or 2 gmetad  
 servers and see the load distributed between them.

horizontal scalability like this would be ideal, but again, the added
complexity cost might be difficult to justify.

 Maybe this sounds a little bit like a Christmas wish-list, but does  
 anyone else feel that this is a valid requirement?  Imagine something  
 even bigger - if a state or national government decided to deploy the  
 gmond agent throughout all their departments in an effort to gather  
 utilization data - would it scale?  Would it be easy enough for a  
 diverse range of IT departments to just plug it in?

with enough planning and assuming the cluster tree is somehow balanced
it should work fine IMHO, but for very large clusters or ones that span
multiple locations and can't be split logically (clouds) you would soon
run into scalability issues, including as well memory pressure in the
gmond collectors.

 Carlo also made some comments about RDBMS instead of RRD.  This raises a  
 few discussion points:

I meant RDBMS alongside RRDs, as RRDs were specially designed to allow
for efficient storage and summarization of metrics, which is what is
needed most of the time.

For special cases where you need to have all data without any distortion
for a long time, an ETL process with an RDBMS and some data warehouse
is a better fit.

The ETL could be as simple as scanning the RRDs periodically and importing
the records into a database, but it would be nice if this could be done
directly from gmetad by allowing for hooks at RRD write time.

This was, indeed, one of the reasons why the python gmetad in trunk had
a modular design, so that a module for doing that could be written if
someone had an interest in doing so.
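
As a rough illustration of the "simple" ETL described above, the sketch below
parses text in the layout that `rrdtool fetch` prints (a header line with the
data source names, then `timestamp: value ...` rows) and loads it into SQLite.
The `samples` table, the helper names, the host/metric labels and the sample
values are all made up for the example; no gmetad write hook is assumed, only
periodic scanning.

```python
import math
import sqlite3


def parse_fetch_output(text):
    """Parse `rrdtool fetch`-style text into (timestamp, ds_name, value) rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    ds_names = lines[0].split()          # header: one column per data source
    rows = []
    for line in lines[1:]:
        stamp, _, values = line.partition(":")
        for name, raw in zip(ds_names, values.split()):
            value = float(raw)           # float() also accepts "nan"
            if math.isnan(value):
                continue                 # unknown samples are skipped
            rows.append((int(stamp), name, value))
    return rows


def load_rows(conn, host, metric, rows):
    """Insert parsed rows into a hypothetical `samples` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "host TEXT, metric TEXT, ds TEXT, ts INTEGER, value REAL, "
        "PRIMARY KEY (host, metric, ds, ts))"
    )
    # OR REPLACE keeps overlapping periodic scans idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO samples VALUES (?, ?, ?, ?, ?)",
        [(host, metric, ds, ts, value) for ts, ds, value in rows],
    )
    conn.commit()


# Invented sample in the shape of `rrdtool fetch` output.
SAMPLE = """\
                          sum

1261612800: 1.2500000000e+00
1261612815: nan
1261612830: 1.5000000000e+00
"""

conn = sqlite3.connect(":memory:")
load_rows(conn, "node1", "load_one", parse_fetch_output(SAMPLE))
```

Keying on (host, metric, ds, ts) means a scan can safely re-read a window it
has already imported, which matters because the RRD itself will average older
rows down between scans.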

Carlo



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-20 Thread Carlo Marcelo Arenas Belon
On Sun, Dec 20, 2009 at 04:02:36PM +, Spike Spiegel wrote:
 On Mon, Dec 14, 2009 at 10:28 AM, Carlo Marcelo Arenas Belon
 care...@sajinet.com.pe wrote:
 
  b) you can afford to have duplicate storage - if your storage
  requirements are huge (retaining a lot of historic data or lots of data
  at short polling intervals), you may not want to duplicate everything
 
  if you are planning to store a lot of historic data then you should be
  using instead some sort of database, not RRDs and so I think this shouldn't
  be an issue unless you explode the RRAs and try to abuse the RRDs as an RDBMS
 
 I think there's a middle ground here that'd be interesting to explore,
 altho that's a different thread, but for kicks this is the gist: the
 common pattern for rrd storage is hour/day/month/year and I've always
 found it bogus.

I am sure the defaults provided were completely arbitrary (I think you missed
week) but they make sense given that they were the smallest time unit
of their kind and also that they fit the standard gmond polling rates, which
wouldn't accommodate 1 min or 1 sec.

 In many cases I've needed higher resolution (down to
 the second) for the last 5-20 minutes, then intervals of an hr to a
 couple hrs, then a day to three days and then a week to 3 weeks etc
 etc, which increases your storage requirements, but  is imho not an
 abuse of rrd and still retains the many advantages of rrd over having
 to maintain an RDBMS.

agree, and the fact that it is not easy enough to do or requires somewhat
intrusive maintenance is a bug, but it is still possible for the reasons you
explain.
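
To put numbers on the storage cost of such a layout, here is a small sketch
that turns (resolution, retention) tiers into `RRA:AVERAGE:xff:steps:rows`
definitions as accepted by `rrdtool create`. The tiers are only one possible
reading of the intervals quoted above, the 1 second step is an assumption
(stock gmond polling would not feed it), and the size estimate assumes roughly
8 bytes per stored value per data source, ignoring header overhead.

```python
def rra_plan(step, tiers):
    """Return (RRA definition, rows) for each (resolution_s, retention_s) tier."""
    plan = []
    for resolution, retention in tiers:
        if resolution % step:
            raise ValueError("resolution must be a multiple of the RRD step")
        steps_per_row = resolution // step
        rows = -(-retention // resolution)   # ceiling division
        plan.append(("RRA:AVERAGE:0.5:%d:%d" % (steps_per_row, rows), rows))
    return plan


# Hypothetical tiers: (resolution in seconds, retention in seconds)
TIERS = [
    (1, 20 * 60),          # 1 s samples for the last 20 minutes
    (60, 2 * 3600),        # 1 min averages for 2 hours
    (300, 3 * 86400),      # 5 min averages for 3 days
    (3600, 21 * 86400),    # 1 h averages for 3 weeks
]

plan = rra_plan(1, TIERS)
total_rows = sum(rows for _, rows in plan)
approx_bytes = total_rows * 8   # per data source, headers not counted
```

With these tiers the plan comes to 2688 rows, so about 21 KB per data source,
which supports the point that finer tiers grow storage without turning the RRD
into a database.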

  PS. I like the ideas on this thread, don't get me wrong, just that I agree
  with Vladimir that gmetad and RRDtool are probably not the sweet spot
  (cost wise) for scalability work even if I also agree that the vertical
  scalability of gmetad is suboptimal to say the least.
 
 sort of. If you're looking at where your resources go to compute and
 deal with large amount of data, I agree. If you look at what it costs
 you or if it's even possible to create a fully scalable and resilient
 ganglia based monitoring infrastructure, I disagree.

not sure what part you are quoting here, but I have the feeling we probably
agree ;)

putting on my ganglia developer hat, I dislike the fact that gmetad can't scale
horizontally like all well designed applications should, but the fact that
there is no solution for it yet suggests that the complexity involved
in making that change is probably not worth it in most (if not all) cases,
considering that hardware (at the levels needed most of the time) is cheap
anyway. I really hope there is no one out there running gmetad on some
big iron solution when some decent PC box with enough memory would do
mostly fine.

there are problems as well with the way federation currently works, which
requires more network bandwidth and CPU than should really be needed, and
I would guess we should tackle that first, especially considering the increase
of the XML sizes with 3.1 (which has also been worked around), but
(putting on my ganglia user hat) I would assume most big installations will
stick with 3.0 anyway for now.

Carlo



Re: [Ganglia-developers] [RFC] two step gmond initialization

2009-12-18 Thread Carlo Marcelo Arenas Belon
On Fri, Dec 18, 2009 at 04:18:16PM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 On Sun, Dec 13, 2009 at 10:49:00AM +, Daniel Pocock wrote:

 I could accept Brooks' solution, because it means gmond would only 
 fail  for something like out-of-memory, while any configuration 
 failure, port  in use, etc would cause it to fail before detaching.

 If gmond still fails silently in some cases, you have not accomplished the
 objective that you were trying to obtain with r2025 anyway.
   
 I agree - it doesn't completely meet my goal, but it does at least  
 result in an error code for most types of bad configuration (or port in  
 use)

that part is OK, but you still have the added side effects of r2025 which
would affect gmond in other interesting ways :

* the metric (and module) initialization is now done by the parent and
  expected to be inherited by the child, this means for example that the
  parent will send (and receive) metric information (even before forking)
* the suid is done by the parent and therefore the child isn't privileged
  (while the metric initialization was done as root), this would at least
  prevent anyone from binding gmond to privileged ports but could also
  result in complicated permission issues for metric collection scripts.

as I said before I think the apr_poll issue with BSD should be taken as
a warning of how the changes we were planning to do could have unintended
side effects, and since moving the daemonization was only one way to solve
the original problem, it makes more sense to instead revert this change and
evaluate alternatives.

 and it allows us to continue using apr (which some people have
 indicated a preference for).

the solution I proposed doesn't remove the apr dependency, it just doesn't
use apr for this specific case, because it is obvious it doesn't fit
what we need, and we otherwise gain nothing from it (unless we had
a windows native version of gmond)

it was also meant to be a temporary solution and the minimum change
needed so that we can have :

* 3.1.6 released quickly
* the bug you were trying to solve still fixed for 3.1.6

ideally we should be able to make this work through apr in the long run
(even if that means fixing apr), or if that is not possible rely on posix
itself for getting windows compatibility for this part whenever the time
comes to do that.

 The solution I proposed addresses the problem of reporting to the OS any
 failure during initialization (which was the original bug to fix anyway)
 in a straightforward way and is therefore the right way to correct this
 IMHO, without introducing any regressions by changing long relied upon
 semantics.
   
 Does anyone else have any feelings about this?  I think we can choose from:

 - Carlo's solution (implement apr_proc_detach ourselves, calling process  
 hangs around and uses socket to discover if daemon started successfully)

not a socket but a pipe.
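
For readers who want to see the shape of the pipe approach, here is a minimal
sketch in Python rather than the C of the actual patch (which is not reproduced
here). The `start_daemon` name, the one-byte status codes and the `init`/`run`
callables are all invented for the example, and a real daemon would also
double fork, chdir and close stdio, which is left out.

```python
import os


def start_daemon(init, run):
    """Fork, run init() in the child, and report its outcome over a pipe.

    Returns the exit status the calling process should use: 0 if the child
    initialized successfully, 1 otherwise.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: the future daemon
        os.close(read_fd)
        os.setsid()                   # detach from the controlling terminal
        try:
            init()                    # sockets, modules, privileges: may raise
        except Exception:
            os.write(write_fd, b"E")  # tell the caller that startup failed
            os._exit(1)
        os.write(write_fd, b"K")      # tell the caller that startup worked
        os.close(write_fd)
        try:
            run()                     # main loop of the daemon
        finally:
            os._exit(0)
    # calling process: hang around until the child reports, then return
    os.close(write_fd)
    status = os.read(read_fd, 1)      # b"" (EOF) if the child died silently
    os.close(read_fd)
    return 0 if status == b"K" else 1
```

A caller would end with something like `sys.exit(start_daemon(init, run))`, so
an init script sees a non-zero status for a bad configuration or a port in use
even though all initialization still happens in the detached child.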

 - Brooks' solution (prepare sockets before detaching, prepare pollsets  
 after detaching) - this allows us to continue using apr_proc_detach and  
 not have native UNIX code

this should work fine too (after all it was the proposed option 3), but it is
really a fix for the bug introduced with r2025, instead of a fix for the
original bug, which is why I don't really see how we can compare them both
side by side.

 - Revert my change completely

this was my suggestion for 3.1.6, so at least we will have a working
gmond faster and be able to stabilize (both trunk and 3.1) further.

since we haven't done this yet, testing any other changes in both
trunk and 3.1 is impossible on BSD, and we have therefore implicitly
dropped support for those platforms.

 I would like to make some kind of decision about what goes in 3.1.6  
 before Christmas, and maybe aim to tag 3.1.6 by 11 January, there is  
 also the possibility that we can try to push it out more quickly, maybe  
 tagging it 24 December and go GA in mid January?

timeline will of course depend on the amount of changes involved.
I am afraid there has also been almost no dialogue about the
other showstoppers for 3.1.6 (like the bootstrapping issue) so there
might be additional complications for this (I was indeed preparing
some more build fixes to prevent more regressions if the original
plan of using Fedora 9 with 3.1.5 is still in effect)

Carlo



Re: [Ganglia-developers] PATCH : Adding trends to Ganglia

2009-12-16 Thread Carlo Marcelo Arenas Belon
On Tue, Dec 15, 2009 at 02:32:07PM +0100, Sebastien Termeau wrote:
 Dear Ganglia Developers,
 
 Please find below a patch that brings trends to Ganglia.

Really interesting, would you mind filing an enhancement bug on
www.ganglia.info?, that would be also a great place for attaching
those images you said were also needed.

 It uses RRD's LSLSLOPE, LSLINT and PREDICT (requires RRD >= 1.4.0 ) to
 provide two kinds of trends.
 Trends can be disabled by modifying conf.php ($with_trends).

considering that RRD >= 1.4.0 isn't that common yet, I would assume it is
probably better to have this turned off by default for now

 I also modified the host_view template to use tables instead of sending a
 <BR> after n metrics.

would you mind doing this change in an independent patch that would be applied
before the one that adds your feature?, it would also be better if the
patch is done against svn trunk as it will be easier to integrate that
way, but it shouldn't be that difficult to correct that either if you don't
feel comfortable working with subversion or git (using git svn).

more comments inlined with the code.

 ---
 diff -ur ganglia.ori/conf.php ganglia/conf.php
 --- ganglia.ori/conf.php2009-12-14 15:19:23.0 +0100
 +++ ganglia/conf.php2009-12-15 12:05:49.0 +0100
 @@ -64,6 +64,18 @@
  #
  $show_meta_snapshot = "yes";
 
 +#
 +# Show trends icons next to each single metric graph
 +#
 +$with_trends = "yes";

have you tested this as false as well?, what about a version
of RRD that doesn't support this?, would recommend having this
off by default and adding a comment saying to turn it on only if
using the right version of rrdtool.
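
A sketch of the kind of guard being asked for, written in Python for brevity
(the web frontend itself is PHP). It mirrors the branch structure of the patch
below: a boolean switch that defaults to off, and a numeric version check that
decides between the PREDICT based trend and the plain projection. The function
names are invented, and the 1.4.0 threshold is taken from the patch
description, not verified here.

```python
PREDICT_MIN_VERSION = (1, 4, 0)   # per the patch description (assumption)


def parse_version(text):
    """Turn '1.3.8' into (1, 3, 8) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def trend_kind(rrdtool_version, with_trends=False):
    """Return None, 'projection' or 'predict' for the installed rrdtool."""
    if not with_trends:
        return None                       # off unless explicitly enabled
    if parse_version(rrdtool_version) >= PREDICT_MIN_VERSION:
        return "predict"
    return "projection"
```

Comparing tuples of integers avoids the string comparison trap where "1.10.0"
sorts before "1.4.0"; versions with non-numeric parts (release candidates, for
example) would need extra handling that this sketch does not attempt.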

 @@ -140,6 +142,12 @@
# Get_context makes start negative.
$start = $sourcetime + $start;
 }
 +
 +# For trends, we double the time range
 +if ($trend_type != ''){
 +  $rrdtool_graph['end']=$end + ($end-$start);
 +}
 +
  # Fix from Phil Radden, but step is not always 15 anymore.
  if ($range=="month")
 $rrdtool_graph['end'] = floor($rrdtool_graph['end'] / 672) * 672;

could you elaborate on why this is needed?, other than of course to allow for
a trend to be visible, should we allow the trend end target to be configurable
instead?
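
One way the end target could be made configurable, sketched in Python instead
of the PHP of the patch. The `trend_factor` parameter is hypothetical: 1.0
reproduces the hardcoded `$end + ($end-$start)` from the diff above, and 0
leaves the graph window untouched.

```python
def graph_end(start, end, trend_factor=1.0):
    """Extend the end of the graph window so a projected trend is visible.

    The window is extended by trend_factor times the selected range.
    """
    if end < start:
        raise ValueError("end must not be before start")
    if trend_factor < 0:
        raise ValueError("trend_factor must not be negative")
    return end + int((end - start) * trend_factor)
```

For a one hour range (start=1000, end=4600) the default gives 8200, while a
factor of 0.5 gives 6400, so the projection length becomes a setting instead
of a surprise.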

 diff -ur ganglia.ori/host_view.php ganglia/host_view.php
 --- ganglia.ori/host_view.php2009-12-14 15:19:23.0 +0100
 +++ ganglia/host_view.php2009-12-15 14:16:46.0 +0100
 @@ -161,10 +161,26 @@
   $tpl->newBlock("vol_metric_info");
   $tpl->assign("graphargs", $v['graph']);
   $tpl->assign("alt", "$hostname $name");
 +

huh?

   if (isset($v['description']))
 $tpl->assign("desc", $v['description']);
 - if ( !(++$i % $metriccols) )
 -$tpl->assign("br", "<BR>");
 + # PREDICT supported in 1.4.0
 +   if ($with_trends == 'yes'){
 + if( version_compare($version["rrdtool"], '1.4.5') >= 0) {

1.4.5?

 +   $tpl->newBlock("trend_predict");
 +   $tpl->assign("graphargs", $v['graph']);
 +   $tpl->assign("images","./templates/$template_name/images");
 + }
 + else {
 +   $tpl->newBlock("trend");
 +   $tpl->assign("graphargs", $v['graph']);
 +   $tpl->assign("images","./templates/$template_name/images");
 + }
 +   }
 + if ( !(++$i % $metriccols) ){
 +   $tpl->gotoBlock ("vol_metric_info");
 +$tpl->assign("new_row", "</TR><TR>");
 + }

who gets the last </TR> added to close the table?
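
To make the concern concrete: emitting a row separator after every n-th cell
leaves the final row's closing tag (or an empty row) to chance. A sketch of
the alternative, grouping cells into rows first so every row is opened and
closed in one place, is shown here in Python; the function name is invented,
the real template is TemplatePower/PHP, and padding the short last row is
optional.

```python
def render_rows(cells, cols):
    """Lay out cells in a table, cols per row, with every TR opened and closed."""
    if cols < 1:
        raise ValueError("cols must be at least 1")
    out = ["<TABLE>"]
    for i in range(0, len(cells), cols):
        row = cells[i:i + cols]
        row = row + [""] * (cols - len(row))   # pad the last, shorter row
        tds = "".join("<TD>%s</TD>" % cell for cell in row)
        out.append("<TR>" + tds + "</TR>")
    out.append("</TABLE>")
    return "\n".join(out)
```

Because rows are built from slices, a metric count that is an exact multiple
of the column count no longer produces a trailing empty row.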

Carlo



Re: [Ganglia-developers] gmetad and rrdtool scalability

2009-12-14 Thread Carlo Marcelo Arenas Belon
On Mon, Dec 14, 2009 at 09:26:01AM +, Daniel Pocock wrote:
 Vladimir Vuksan wrote:
  I think you guys are complicating much :-). Can't you simply have 
  multiple gmetads in different sites poll a single gmond. That way if 
  one gmetad fails data is still available and updated on the other 
  gmetads. That is what we used to do.
 
 That is a good solution under two conditions:
 
 a) you are only concerned with redundancy and not looking for 
 scalability - when I say scalability, I refer to the idea of maybe 3 or 
 more gmetads running in parallel collecting data from huge numbers of agents

what is the bottleneck here?, CPUs for polling or IO?, if IO, using memory
would most likely be all you really need (especially considering RAM is really
cheap and RRDs are very small), if CPUs then there might be some things we
can do to help with that, but vertical scalability is what gmetad has, and
that usually means going to a bigger box if you hit the limit on the
current one.

 b) you can afford to have duplicate storage - if your storage 
 requirements are huge (retaining a lot of historic data or lots of data 
 at short polling intervals), you may not want to duplicate everything

if you are planning to store a lot of historic data then you should be
using some sort of database instead, not RRDs, and so I think this shouldn't
be an issue unless you explode the RRAs and try to abuse the RRDs as an RDBMS

of course that means you have to add a process to gather your metric data
out of the RRDs to begin with and into your RDBMS, but there shouldn't be a
need to be concerned with RRD storage size when you are most likely going
to be spending a lot more on that RDBMS storage (including snapshots and
mirrors and all those things that make DBAs feel warm inside, regardless of
budget)

Carlo

PS. I like the ideas on this thread, don't get me wrong, just that I agree
with Vladimir that gmetad and RRDtool are probably not the sweet spot
(cost wise) for scalability work even if I also agree that the vertical
scalability of gmetad is suboptimal to say the least.

--
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] [RFC] two step gmond initialization

2009-12-13 Thread Carlo Marcelo Arenas Belon
On Sun, Dec 13, 2009 at 10:49:00AM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 On Fri, Dec 11, 2009 at 01:31:22PM -0600, Brooks Davis wrote:
   
 On Fri, Dec 11, 2009 at 04:56:51PM +, Carlo Marcelo Arenas Belon wrote:
 
 I presume the reason why you haven't seen this show up in the APR list, is
 because it makes probably more sense for the apache httpd list instead for
 help understanding how apache is able to work around the leakiness of
 apr_poll and that also requires some reading from apache's code (which I
 am not at least that familiar with, neither really interested)
   
 Looking at the prefork mpm, the pollsets are created and used only
 in child_main() and thus are created after the fork.  I suspect that
 changing the ganglia code to open all the sockets, but defer creation of
 the pollset until after fork is the right way to go.

 That is the way we did the initialization before r2025 so I guess that could
 explain why we weren't affected just like apache is not.
   
 Not quite - pre-r2025, we did this:

 a) detach
 b) socket init
 c) pollset init

 Post r2025:

 a) socket init
 b) pollset init
 c) detach

 Brooks' solution:

 a) socket init
 b) detach
 c) pollset init

 I could accept Brooks' solution, because it means gmond would only fail  
 for something like out-of-memory, while any configuration failure, port  
 in use, etc would cause it to fail before detaching.
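
 The third ordering above can be sketched as follows, in Python instead of
 gmond's C and with the selectors module standing in for apr pollsets. All
 names are invented and the detach is reduced to a single fork. The point
 shown is only the ordering: the socket is bound in the foreground, so a port
 in use is an ordinary error there, while the poll object is created in the
 child. For background, kqueue(2) documents that a queue is not inherited by
 a child created with fork(2), which is presumably related to the BSD
 breakage discussed in this thread.

```python
import os
import selectors
import socket


def open_listen_socket(port=0):
    """Step a): bind before detaching so 'port in use' fails in the foreground."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.setblocking(False)
    return sock


def child_main(sock):
    """Step c): the poll set is created only after the fork."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    # ... a real daemon would loop on sel.select() here ...
    sel.close()


def start(port=0):
    sock = open_listen_socket(port)   # a) socket init, may raise OSError
    pid = os.fork()                   # b) detach (simplified to one fork)
    if pid == 0:
        try:
            child_main(sock)          # c) pollset init, in the child
        finally:
            os._exit(0)
    sock.close()                      # the child keeps its own copy
    return pid
```

 Failures inside child_main() are still invisible to the caller with this
 ordering, which is the gap the reply below points at.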

If gmond still fails silently in some cases, you have not accomplished the
objective that you were trying to obtain with r2025 anyway.

The solution I proposed addresses the problem of reporting to the OS any
failure during initialization (which was the original bug to fix anyway)
in a straightforward way and is therefore the right way to correct this
IMHO, without introducing any regressions by changing long relied upon
semantics.

 Basically, we would have to split the code in  
 setup_listen_channels_pollset() into two functions, one that gets called  
 before detaching, and one that is called after detaching.

Why make the code more complicated, and are you really expecting to do that
in scope for getting it backported into 3.1.6 considering how intrusive that
would be?

Also be aware there are bugfixes on that code that hadn't yet been backported
and so you are going to either have to certify as well all those fixes or
cherry pick the changes needed and test all different combinations.

Carlo



Re: [Ganglia-developers] [RFC] two step gmond initialization

2009-12-12 Thread Carlo Marcelo Arenas Belon
On Fri, Dec 11, 2009 at 01:31:22PM -0600, Brooks Davis wrote:
 On Fri, Dec 11, 2009 at 04:56:51PM +, Carlo Marcelo Arenas Belon wrote:
 
  I presume the reason why you haven't seen this show up in the APR list, is
  because it makes probably more sense for the apache httpd list instead for
  help understanding how apache is able to work around the leakiness of
  apr_poll and that also requires some reading from apache's code (which I
  am not at least that familiar with, neither really interested)
 
 Looking at the prefork mpm, the pollsets are created and used only
 in child_main() and thus are created after the fork.  I suspect that
 changing the ganglia code to open all the sockets, but defer creation of
 the pollset until after fork is the right way to go.

That is the way we did the initialization before r2025 so I guess that could
explain why we weren't affected just like apache is not.

On the other hand though, that change was introduced to force gmond to report
to its parent in case there were problems creating those resources, which
would be silently ignored otherwise, and I guess apache either has that bug
as well or simply has a better way to code that notification, just like the
one that was proposed originally in this thread.

Carlo



Re: [Ganglia-developers] [RFC] two step gmond initialization

2009-12-12 Thread Carlo Marcelo Arenas Belon
On Fri, Dec 11, 2009 at 09:40:53AM -0800, Bernard Li wrote:
 
 Wow...  what a long thread...

Sorry about that boss, but also sent an executive summary in :

  
http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05398.html

Hope you don't mind reading that one instead, or my small-enough-for-a-tweet
comment on the STATUS page for 3.1

 IMHO, the best solution here is to look at apache's main loop
 implementation and adapt our code.  This way, (hopefully) we will get
 what we want (late initialization) without modifying any apr code.
 Carlo, since you seem to be on a roll here, could you please kindly:

I think I've made my point of view clear, including alternatives and a proof
of concept implementation of my preferred resolution, which still keeps a
fix for the problem this regression was meant to help with.

If you are interested on implementing alternative 3 (if that is even possible)
or 4 feel free to do so, but considering this is a showstopper for 3.1.6 and
that we (would assume) want to get that released without regressions and ASAP
then would recommend instead we focus for now in a fix for this regression
which will mean either :

1) revert the code and delay a solution for the child notification issues
2) implement the child notification using something we know works like the
   code proposed (at least for now) instead of reordering the initialization
   and breaking all the BSD (and who knows what else) with that.

Carlo



Re: [Ganglia-developers] [RFC] two step gmond initialization

2009-12-11 Thread Carlo Marcelo Arenas Belon
On Fri, Dec 11, 2009 at 08:59:56AM -0700, Brad Nicholes wrote:
 
 APR is designed to solve these problems in a cross platform way and we
 are proposing that we abandon the cross platform solution in favor of a
 platform specific solution.

Just want to clarify here that it is not a platform specific solution so
much as one based on fairly common UNIX standards, and it therefore most
likely supports every single one of the platforms we run on, including cygwin.

In any case, to simplify testing and proving me wrong, a snapshot from trunk
with the patch included is available from:

  http://sajino.sajinet.com.pe/ganglia/ganglia-3.2.0.0.tar.gz

There is no native windows ganglia yet (even if some work has already been
done, at least on trunk, towards a native version of libmetrics), nor a
novell network version (even though I would be interested in at least adding
it to libmetrics, I lack access to the development environment that would be
needed, and judging by the lack of interest in it, one is unlikely to exist
otherwise), so most of the portability of APR (which in this case is just a
small wrapper around fork) is not helping much yet.

 I know that httpd doesn't have these issues and they detach and run
 just fine across a wide variety of platforms including windows, BSD,
 solaris, etc.

right, and that is why alternative 3 in:

  http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05398.html

says: look at the apache httpd implementation of their main loop and make
ganglia's similar, so it will work with APR AS-IS.

 Why are we having these problems when httpd doesn't?  Is the real solution
 as simple as going to the APR mailing list and asking why this issue exists
 in APR and if there is a workaround?  I haven't really seen this issue show
 up on the APR mailing list so far or did I miss it?

it is obvious, as you explained before, that apache uses APR in a different
way than ganglia does, and that is why there is no bug to fix here in APR
(except that the implementation of apr_poll is leaky, as it is inconsistent
between platforms). If there is a bug, I would say it is in the BSD
implementation of kqueue with its non-inheritable file handles.

I presume the reason why you haven't seen this show up on the APR list is
that it probably makes more sense to ask on the apache httpd list for help
understanding how apache is able to work around the leakiness of apr_poll,
and that also requires some reading of apache's code (which I am not that
familiar with, nor really interested in).

 One of the problems that we already have with gmond is that there is
 already too much platform specific code in it which is why we have to
 rely on cygwin in order to run on windows.

ganglia is a monitoring application, and therefore it is very likely to have
to work with very platform specific stuff anyway (unlike apache). I agree,
though, that using cygwin for windows is not ideal, and I hope it will be
deprecated once a native windows version is available, but APR so far
hasn't helped much in that direction AFAIK.

 It is also the reason why gmetad doesn't really run on windows because
 it wasn't built on top of a cross platform solution.  My gut feel is that
 we should be moving ganglia more towards APR rather than away from it.

gmetad could be made to run on windows, and from time to time some pure soul
succeeds and then realizes why it was still reported as "don't even try".

AFAIK the original reason behind having an alternative python implementation
of gmetad was to avoid having to go through the pain of cleaning up that
code for portability, noting that it works almost reliably on linux at
best. Still, I agree APR would most likely be part of a portable gmetad
if needed.

Carlo



Re: [Ganglia-developers] [RFC] two step gmond initialization

2009-12-10 Thread Carlo Marcelo Arenas Belon
Greetings,

in case it wasn't obvious, and to celebrate the 1 week anniversary of this
email: RFC means Request for Comments, so if you have any comments about the
code (which I even sent with an obvious bug to encourage the usual
bikeshedding) or the design, a reply (better if to the original email, so it
can be referenced as part of the discussion for context) would be appreciated,
especially considering this is a showstopper for 3.1.3^H4^H5^H6 (made a little
more obvious with r2149).

in a nutshell, the proposed solution does the following:

1) get rid of the apr_proc_detach dependency which is useless anyway when
   all it does is to daemonize the process and we even have an implementation
   for that in our code that is now only used by gmetad.
2) implement the forking / IPC using plain standard unix calls instead for
   portability.
3) create a variable that will be used in all error paths to indicate
   initialization failure and communicate that from child to parent
   through a pipe so that the parent can report failure to the OS if
   needed.
4) patch all error paths to use the new semantics.
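
As a rough illustration of steps 2 and 3 (this is not the patch from the
original email, just a minimal sketch under stated assumptions), the child
can hand its initialization status to the parent over a pipe. The names
start_and_report, init_ok and init_fail are hypothetical, and the child
exits right after reporting so the sketch can be run repeatedly; a real
daemon would continue into its main loop and the parent would _exit() with
the reported status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical initialization callbacks: 0 means success, nonzero failure. */
int init_ok(void)   { return 0; }
int init_fail(void) { return 1; }

/*
 * Fork a child, let it run init(), and pass the child's initialization
 * status back to the parent through a pipe. Returns that status in the
 * parent, or -1 if pipe()/fork() failed or the child died before reporting.
 */
int start_and_report(int (*init)(void))
{
    int pipefd[2];
    int status = -1;
    pid_t cpid;

    if (pipe(pipefd) == -1)
        return -1;

    cpid = fork();
    if (cpid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (cpid == 0) {
        /* child: run the initialization, then report the result */
        int child_status;
        ssize_t n;

        close(pipefd[0]);
        child_status = init();
        n = write(pipefd[1], &child_status, sizeof(child_status));
        close(pipefd[1]);
        _exit(n == (ssize_t)sizeof(child_status) ? child_status : 1);
    }

    /* parent: block until the child reports or closes its end of the pipe */
    close(pipefd[1]);
    if (read(pipefd[0], &status, sizeof(status)) != (ssize_t)sizeof(status))
        status = -1;
    close(pipefd[0]);
    waitpid(cpid, NULL, 0); /* reap the child; a daemon's parent would not wait */
    return status;
}
```

Calling start_and_report(init_fail) in the parent yields the child's nonzero
status, which is the value an init script would then see as the exit code.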

the alternatives are, in order of preference:

1) revert the current implementation and delay a solution.
2) drop this feature and maybe hack it with some init script logic
3) reimplement gmond to be more apache like so that APR magically works
4) implement it using APR after fixing APR first if possible
5) ignore the problem and tell BSD users to run gmond in the foreground
   and deal with it.

Carlo



Re: [Ganglia-developers] bootstrapping for 3.1.X series and 3.2.X

2009-12-06 Thread Carlo Marcelo Arenas Belon
On Sun, Dec 06, 2009 at 09:28:04AM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 On Wed, Nov 25, 2009 at 11:00:21AM +, Daniel Pocock wrote:
   
 b) should the choice of bootstrap environment be locked for all 
 3.1.X, and only changed when increasing the minor version number 
 (e.g. when we go from 3.1 to 3.2)?

 no, but since our build system is full of hacks and not completely reliable
 it might be a good idea to test no issues are introduced when looking at
 a new version.
   
 Ok, but if it is not locked down, let's consider some of the following:

 - document the version we expect

agree, and that is what README.SVN is for, but first we have to decide which
version to expect to begin with.

 - maybe add some check to configure that warns if a different version of  
 autotools is detected?

configure doesn't depend on autotools, so that would be the wrong place to put
any checks, but configure.in does, and that is where bootstrapping should be
aborted, using AC_PREREQ and friends, if the wrong versions are used.

 c) what environment should be used to bootstrap 3.2.X/trunk?

 the same than 3.1 so that all improvements in the build system will be
 tested there and then backported for stability.
   
 Not necessarily - changes can be backported and then tested on the  
 release branch before it is frozen/tagged for a release candidate.

this will violate your rule of the same autotools per branch, but frankly I
don't care as long as we allocate the extra time that will be needed to
certify that the new bootstrap environment works.

 That would allow more aggressive changes to be implemented in trunk that are  
 not intended for backport.

trunk already has several changes that are not intended for backport, and they
are not intended for release either, or we would have a 3.2 branch already.

 d) Can anyone volunteer to provide a stable bootstrap environment 
 (e.g. a virtual server) just for Ganglia?  Two such environments may 
 be needed, one for trunk and one for the current release branch.

 Matt did offer an EC2 instance if we could agree on an OS version :

   
 http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05271.html

 I suggested Debian 5.0 (more conservative) or Fedora 12 (to be updated more
 frequently) but as far as it is agreed, documented and reproducible anything
 should work.
   
 I prefer Debian 5.0 (lenny), that is what I have on my laptop, home PC  
 and various other infrastructure that I use. Elsewhere I am using 
 RHEL3/4/5.

Debian 5.0 is also what is being used for bugzilla AFAIK and so that might
be a good option for consolidation.

 We also have access to the OpenCSW build farm, and they are willing to  
 consider applications for access by Ganglia developers, so we could look  
 at that as a bootstrap environment.

Bootstrapping is done only once per package and so wouldn't make sense to
also do bootstrapping in Solaris.

having the OpenCSW build farm as part of our test builds would be a great
way to ensure Solaris users are better supported though.

Carlo

--
Join us December 9, 2009 for the Red Hat Virtual Experience,
a free event focused on virtualization and cloud computing. 
Attend in-depth sessions from your desk. Your couch. Anywhere.
http://p.sf.net/sfu/redhat-sfdev2dev
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


[Ganglia-developers] [RFC] two step gmond initialization

2009-12-03 Thread Carlo Marcelo Arenas Belon
Greetings,

the following patch (which is never meant to be committed, and is therefore
very ugly on purpose) is a proof of concept for an alternative to the recent
problematic feature of returning a failure status from gmond, which
is part of 3.1.3^H4^H5.

it has been tested on Linux amd64 and OpenBSD amd64 and applies to trunk
(it includes reverting r2025 for simplicity).

it replaces apr_proc_detach with an inline implementation of it in plain
POSIX, which should most likely be just as portable (at least for the
platforms we care about). It intentionally doesn't include any error checking,
to make the functionality obvious, and it has been implemented by brute force
search and replace, so it is definitely missing several other interesting
failure paths.

Carlo

---
Index: lib/error_msg.c
===
--- lib/error_msg.c	(revision 2133)
+++ lib/error_msg.c	(working copy)
@@ -21,6 +21,7 @@
 int daemon_proc;/* set nonzero by daemon_init() */
 
 int ganglia_quiet_errors = 0;
+int gmond_status = 0;
 
 static void err_doit (int, int, const char *, va_list);
 
@@ -121,7 +122,8 @@
va_start (ap, fmt);
err_doit (0, LOG_ERR, fmt, ap);
va_end (ap);
-   exit (1);
+   gmond_status = 1;
+   exit (gmond_status);
 }
 
 /* Print a message and return to caller.
Index: gmond/gmond.c
===
--- gmond/gmond.c	(revision 2133)
+++ gmond/gmond.c	(working copy)
@@ -84,6 +84,9 @@
 /* The directory where DSO modules are located */
 char *module_dir = NULL;
 
+static int pipefd[2];
+extern int gmond_status;
+
 /* The array for outgoing UDP message channels */
 Ganglia_udp_send_channels udp_send_channels = NULL;
 
@@ -214,6 +217,13 @@
 char **gmond_argv;
 extern char **environ;
 
+void gmond_terminate()
+{
+  if (daemon_proc) {
+write(pipefd[1], &gmond_status, sizeof(gmond_status));
+  }
+}
+
 /* apr_socket_send can't assure all characters in buf been sent. */
 static apr_status_t
 socket_send(apr_socket_t *sock, const char *buf, apr_size_t *len)
@@ -263,7 +273,8 @@
   exit(0);
 #endif
   err_msg("execve failed to reload %s: %s", gmond_bin, strerror(errno));
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }
 
 /* this is just a temporary function */
@@ -317,9 +328,25 @@
   if(!args_info.foreground_flag && should_daemonize && !debug_level)
 {
   char *cwd;
+  pid_t cpid;
 
   apr_filepath_get(&cwd, 0, global_context);
-  apr_proc_detach(1);
+  pipe(pipefd);
+  cpid = fork();
+  if (cpid > 0) {
+  close(pipefd[1]);
+  read(pipefd[0], &gmond_status, sizeof(gmond_status));
+  close(pipefd[0]);
+  _exit(gmond_status);
+  }
+  atexit(gmond_terminate);
+  close(pipefd[0]);
+  chdir("/");
+  setsid();
+  setpgid(0, 0);
+  freopen("/dev/null", "r", stdin);
+  freopen("/dev/null", "w", stdout);
+  freopen("/dev/null", "w", stderr);
   apr_filepath_set(cwd, global_context);
 
   /* enable errmsg logging to syslog */
@@ -359,7 +386,8 @@
   if(deaf && mute)
 {
   err_msg("Configured to run both deaf and mute. Nothing to do. Exiting.\n");
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }
 }
 
@@ -404,7 +432,8 @@
   if(!acl)
 {
   err_msg("Unable to allocate memory for ACL. Exiting.\n");
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }
 
   default_action = cfg_getstr( acl_config, "default");
@@ -419,7 +448,8 @@
   else
 {
   err_msg("Invalid default ACL '%s'. Exiting.\n", default_action);
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }
 
   /* Create an array to hold each of the access instructions */
@@ -427,7 +457,8 @@
   if(!acl->access_array)
 {
   err_msg("Unable to malloc access array. Exiting.\n");
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }
   for(i=0; i< num_access; i++)
 {
@@ -440,7 +471,8 @@
   /* This shouldn't happen unless maybe acl is empty and
* the safest thing to do it exit */
   err_msg("Unable to process ACLs. Exiting.\n");
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }
 
   ip = cfg_getstr( access_config, "ip");
@@ -449,7 +481,8 @@
   if(!ip && !mask && !action)
 {
   err_msg("An access record requires an ip, mask and action. Exiting.\n");
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }
 
   /* Process the action first */
@@ -464,7 +497,8 @@
   else
 {
   err_msg("ACL access entry has action '%s'. Must be deny|allow. Exiting.\n", action);
-  exit(1);
+  gmond_status = 1;
+  exit(gmond_status);
 }  
 
   /* Create the subnet */
@@ -473,7 +507,8 @@
   if(status != APR_SUCCESS)
 {
   err_msg("ACL access entry has invalid ip('%s')/mask('%s'). Exiting.\n", ip, mask);

Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.5 beta ready for final testing

2009-12-03 Thread Carlo Marcelo Arenas Belon
On Wed, Dec 02, 2009 at 07:41:39PM +, Daniel Pocock wrote:
 
 Therefore, the approach might need to be some combination of the 
 solutions.  E.g. a configure option that allows people to choose the new 
 behaviour or the old behaviour.

-1, this will double our supported paths for almost no gain, knowing that at
least 50% of them are broken, and it still underscores the nature of the
problem, because changing the initialization would also affect (in a platform
specific way) things like threaded gmond modules and the resources they rely
on, just as an example.

 As we know the new behaviour works on Solaris and Linux

with the version of APR that it was tested with, which is also a moving
target.

 then the package can be built the new way on those 
 platforms by default.  On BSD, users could choose what they want by 
 setting a configure option.  If a user had an updated apr (provided such 
 update is feasible), they might compile with the new behaviour.

again, this is not a BSD specific problem (indeed I suspect that solaris
might be affected as well, especially in cases where APR was compiled to
use port_getn), because apr_poll_* has slightly different semantics than
poll and could therefore result in platform specific failures that might
not be as obvious as the kqueue one was for the BSDs.

the problem that we were trying to solve was just to propagate the status
correctly from the gmond daemon to the caller, and for a proof of concept
in that direction (as suggested before) refer to:

  http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05390.html

Carlo



[Ganglia-developers] [RFC] status update for removing ganglia release names from the code

2009-12-03 Thread Carlo Marcelo Arenas Belon
Jesse

There is a backport request for 3.1 labeled "build: remove ganglia release
name from the code" that has a veto from you, which I would like to see
reconsidered.

your objection refers to a thread[1] that includes the explanation of why
this backport proposal is consistent with the consensus at that time (which
has since changed[2]), as it only removes the name from the web
frontend configuration, where it wasn't being used (dead code):

  http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg04719.html

It is important to note that since the proposal has been stalled for a long
time it won't be able to cleanly be backported from trunk and so to simplify
the reviewing process a conflict free version of it is attached to this
email.

Carlo

[1] http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg04697.html
[2] http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05246.html

Property changes on: .
___
Modified: svn:mergeinfo
   Merged /trunk/monitor-core:r1703,1731

Index: configure.in
===
--- configure.in	(revision 2135)
+++ configure.in	(working copy)
@@ -84,7 +84,6 @@
 AC_SUBST(GANGLIA_MINOR_VERSION)
 AC_SUBST(GANGLIA_MICRO_VERSION)
 AC_SUBST(GANGLIA_VERSION)
-AC_SUBST(GANGLIA_RELEASE_NAME)
 AC_SUBST(REL)
 
 AC_SUBST(LIBGANGLIA_INTERFACE_AGE)
Index: web/version.php.in
===
--- web/version.php.in	(revision 2135)
+++ web/version.php.in	(working copy)
@@ -6,6 +6,5 @@
 $microversion = @GANGLIA_MICRO_VERSION@;
 
 $ganglia_version = @GANGLIA_VERSION@;
-$ganglia_release_name= @GANGLIA_RELEASE_NAME@;
 
 ?>


Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.5 beta ready for final testing

2009-12-02 Thread Carlo Marcelo Arenas Belon
On Wed, Dec 02, 2009 at 01:57:44AM +, Carlo Marcelo Arenas Belon wrote:
 On Tue, Dec 01, 2009 at 10:20:32PM +, Daniel Pocock wrote:
  - Can you easily re-compile APR with a different poll implementation?  I  
  think you can change it from configure.
 
 Which option?, --enable-other-child doesn't make a difference and considering
 how many different versions of APR are installed in all affected systems I
 would be surprised this to be an APR issue.

and surprised I am, as the problem goes away if APR is forced to use poll
instead of kqueue.

but that of course requires a patched version of apr (including bootstrapping)
and is probably not an option, unless we go back to the dark ages of including
all dependencies statically.

if anyone is interested I am attaching a patch for apr-1.3.9 which could be
used to fix this problem in {Free,Net,Open}BSD and which will also require
that ganglia be linked with the patched library by doing something like (using
/opt/ganglia to avoid clashing with the system provided packages and ignoring
the fact that you would need to be root with a bourne shell to execute the
following incantation, and that is very unlikely to be a good idea anyway) :

  # mkdir -p /opt/ganglia
  # tar -xvzf apr-1.3.9.tar.gz
  # cd apr-1.3.9
  # patch -p1 < apr-1.3.9-configure-disablekqueue.patch
  # ./configure --prefix=/opt/ganglia
  # make
  # make install
  # cd ..
  # tar -xvzf ganglia-3.1.5.tar.gz
  # cd ganglia-3.1.5
  # ./configure --prefix=/opt/ganglia --with-libapr=/opt/ganglia/bin/apr-1-config
  # make
  # make install
  # LD_LIBRARY_PATH=/opt/ganglia/lib /opt/ganglia/bin/gmond

Carlo

PS. DragonFlyBSD will be still affected and MacOS X was probably luckily not
--- apr-1.3.9/configure	Mon Sep 21 14:59:34 2009
+++ apr-1.3.9/configure	Wed Dec  2 01:45:45 2009
@@ -5762,6 +5762,10 @@
 ac_cv_o_nonblock_inherited=yes
   fi
 
+  if test -z "$ac_cv_func_kqueue"; then
+test "x$silent" != "xyes" && echo "  setting ac_cv_func_kqueue to \"no\""
+ac_cv_func_kqueue=no
+  fi
 	;;
 *-netbsd*)
 
@@ -5792,6 +5796,10 @@
 ac_cv_o_nonblock_inherited=yes
   fi
 
+  if test -z "$ac_cv_func_kqueue"; then
+test "x$silent" != "xyes" && echo "  setting ac_cv_func_kqueue to \"no\""
+ac_cv_func_kqueue=no
+  fi
 	;;
 *-freebsd*)
 
@@ -5838,15 +5846,12 @@
   fi
 
 fi
-# prevent use of KQueue before FreeBSD 4.8
-if test $os_version -lt 48; then
-
+# prevent use of KQueue
   if test -z "$ac_cv_func_kqueue"; then
 test "x$silent" != "xyes" && echo "  setting ac_cv_func_kqueue to \"no\""
 ac_cv_func_kqueue=no
   fi
 
-fi
 	;;
 *-k*bsd*-gnu)
 


Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.5 beta ready for final testing

2009-12-02 Thread Carlo Marcelo Arenas Belon
On Wed, Dec 02, 2009 at 11:17:26AM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 On Wed, Dec 02, 2009 at 10:36:02AM +, Daniel Pocock wrote:

 Can you try re-enabling kqueue and patching apr to use rfork()?

 Doesn't work, and fails now on sending of the metrics, because of course
 this time the parent process closes that socket and the child can't use it
 after that.

 The only viable solution I see is to delay the creation of all the sockets
 until daemonized as it was being done originally.
   
 The problem with that is that if another process is already listening on  
 one of the ports wanted by gmond, then the listener set up will fail,  
 but if the problem is only detected after daemonizing, then the caller  
 doesn't know about the failure.

but that is something that could be fixed at the caller level by just
checking if the port is already bound to something before calling gmond.

agree that is not elegant, but is better than the current situation where
you can't start gmond at all.
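
A caller-side check along those lines could look roughly like the sketch
below. It is only an illustration: tcp_port_is_free and listen_on_any_port
are hypothetical names, it probes TCP on IPv4 only (gmond also uses UDP
channels, which would need a SOCK_DGRAM probe), and it is inherently racy,
since another process can grab the port between the check and gmond's own
bind.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return 1 if a TCP socket can be bound to `port` on all IPv4 interfaces,
 * 0 if the bind fails (typically EADDRINUSE), -1 if no socket could be
 * created. The probe socket is closed again before returning.
 */
int tcp_port_is_free(unsigned short port)
{
    struct sockaddr_in addr;
    int fd, rc;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    rc = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    close(fd);
    return rc == 0 ? 1 : 0;
}

/*
 * Helper for trying the check out: listen on a kernel-chosen TCP port and
 * store its number in *port. Returns the listening descriptor, or -1.
 */
int listen_on_any_port(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 1) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}
```

An init script wrapper would call something like tcp_port_is_free(8649)
and refuse to start gmond when it returns 0.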

 If you really need to avoid having the parent report back on issues on that
 then you are going to keep the parent around and send the status back from
 the child until getting into the main loop through a unix socket or similar
 instead as you suggested originally was another option.
   
 That is not as easy to implement in apr as the apr_proc_detach() call.   

frankly I don't much like all the abstractions that apr_* provides, because
they make simple things like this more complicated (especially because of the
unintended side effects), but since apr_proc_detach is just calling fork
and reopening the 3 std filehandles, it shouldn't be that difficult to work
around.
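
For reference, a plain POSIX version of that kind of detach could be
sketched as follows. This is not APR's actual code; detach_sketch and
redirect_to_devnull are hypothetical names, and the sketch uses open() and
dup2() on the three standard descriptors, where freopen() on the stdio
streams would be an alternative.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Point descriptor `fd` at /dev/null, opened with `flags` (O_RDONLY or
 * O_WRONLY). Returns 0 on success, -1 on failure.
 */
int redirect_to_devnull(int fd, int flags)
{
    int nullfd = open("/dev/null", flags);

    if (nullfd == -1)
        return -1;
    if (nullfd != fd) {
        if (dup2(nullfd, fd) == -1) {
            close(nullfd);
            return -1;
        }
        close(nullfd);
    }
    return 0;
}

/*
 * Fork, let the parent exit, start a new session in the child and silence
 * the three standard descriptors. Returns 0 in the detached child and -1 on
 * error; it never returns in the parent.
 */
int detach_sketch(void)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid > 0)
        _exit(0); /* parent is done once the child exists */

    if (setsid() == -1)
        return -1;
    if (redirect_to_devnull(STDIN_FILENO, O_RDONLY) == -1 ||
        redirect_to_devnull(STDOUT_FILENO, O_WRONLY) == -1 ||
        redirect_to_devnull(STDERR_FILENO, O_WRONLY) == -1)
        return -1;
    return 0;
}
```

Note that the parent here exits with 0 unconditionally, which is exactly why
a failure in the child after this point never reaches the caller.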

 apr_proc_fork() is described as the only call in apr that is not  
 portable.  apr_proc_create() could be used to invoke another gmond  
 process, but I'm not sure that apr guarantees to preserve the file  
 descriptors and memory allocations across that call.

apr_proc_fork() is not called by apr_proc_detach() AFAIK, indeed I was
surprised to see it even existed when noticed that apr_proc_detach calls
fork() directly.

 Maybe the problem has something to do with the way detach recycles  
 stdin/stdout/stderr?  As a quick test, could you try modifying gmond.c  
 so that it calls fork() directly rather than calling apr_proc_detach()?

fork() doesn't work because the kqueue filehandle is not inherited; using
rfork() instead doesn't work either, because all filehandles are closed by
the exit(0) in the parent, so it fails in the same way as apr_proc_detach()
does when changed to use rfork().

Carlo



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.5 beta ready for final testing

2009-12-02 Thread Carlo Marcelo Arenas Belon
On Wed, Dec 02, 2009 at 11:48:51AM +, Daniel Pocock wrote:

 fork() doesn't work because the kqueue filehandle is not inherited; using
 rfork() instead doesn't either because all filehandles are closed by doing
 exit(0) in the parent and so fails in the same way that changing
 apr_proc_detach() does when changed to use rfork() instead.
   
 I'm not a BSD expert, do you know if there is any ioctl or something  
 that can be used to tell BSD to keep the file descriptors for the child  
 process?

not a BSD expert either, but I would think that would be very unlikely.

I would suggest reverting r2025 in trunk and starting to look for an
alternative solution, but it would probably be easier to just revert r2043
for 3.1 as well to solve the release blocker, with the possibility of adding
some logic to the init script to help with the case you were trying to
prevent with the original feature.

Carlo

PS. apache httpd must have a solution as they don't seem to have kqueue 
disabled, but that solution is probably just to delay the port binding
as was done originally (except that they manage better the failures)



Re: [Ganglia-developers] vc++ build

2009-12-01 Thread Carlo Marcelo Arenas Belon
On Mon, Nov 30, 2009 at 12:19:48PM -0500, Gladish, Jacob wrote:
 I believe this has come up in the past, but does anyone know if there's
 interest or any progress made on the native win32 build/port?

there has been some slow progress for win32 support in general but using
mingw instead (which is easier to work with than vc++ for an autotools
based project) but all of this is highly experimental and only available
on trunk (3.2).

Carlo



Re: [Ganglia-developers] Ganglia 3.1.5 beta ready for final testing

2009-12-01 Thread Carlo Marcelo Arenas Belon
On Tue, Dec 01, 2009 at 10:20:32PM +, Daniel Pocock wrote:

 - Could it be a security issue?  Can you try disabling setuid?  It  
 appears that listen channels are only set up after setuid, but maybe  
 there is something else.

still reproducible with setuid = off

 - Have you tried different versions of APR?  E.g. on RHEL5, I test with  
 the native apr-1.2.7, and on Debian I have 1.2.12-5

OpenBSD 4.5 comes with 1.2.11p2 but it also failed with a manually installed
1.3.8

 - Can you easily re-compile APR with a different poll implementation?  I  
 think you can change it from configure.

Which option? --enable-other-child doesn't make a difference, and considering
how many different versions of APR are installed in all affected systems, I
would be surprised if this were an APR issue.

 - If you take 3.1.2 or another release and apply this patch only, do you  
 see the same bug?

yes

Carlo



Re: [Ganglia-developers] Ganglia 3.1.5 beta ready for final testing

2009-11-30 Thread Carlo Marcelo Arenas Belon
On Mon, Nov 30, 2009 at 08:12:34AM +, Daniel Pocock wrote:

 Carlo Marcelo Arenas Belon wrote:
 On Sun, Nov 29, 2009 at 10:57:01AM +, Carlo Marcelo Arenas Belon wrote:
   
 On Tue, Nov 24, 2009 at 06:03:51PM -0800, Bernard Li wrote:
 
 Please help us test on as many OS/archs as possible, as this would go
 GA quite immediately ;-)
   
 FreeBSD is not able to return any XML data through TCP/8649 (tested with
 FreeBSD 8.0 amd64).

 the problem wasn't actually the TCP/8649 service but the fact that gmond
 was going into an infinite loop after sending the first metric update.

 the issue was tracked down to r2043 and a 3.1.5 development package with
 that patch reverted is available for testing from :

   http://sajino.sajinet.com.pe/ganglia/ganglia-3.1.5.2101.tar.gz
   
 Did you see this issue with 3.1.3 or 3.1.4?  They both contain the same  
 patch.

Both 3.1.3 and 3.1.4 should have the same problem, but I haven't been able to
test 3.1.3 since it is no longer available (FreeBSD 8 was just released a
couple of days ago anyway). 3.1.4 shows the same behavior at least there,
and the fixed package also seems to work fine with OpenBSD 4.4 amd64,
NetBSD 4 i386 and DragonFlyBSD 2.4.1 i386 and amd64 (after also being patched
with r2124 to work around BUG245).

 DragonFlyBSD fails to build but a 3.2 version of ganglia which includes
 fixes for that fails with the same TCP issue than FreeBSD and so this
 issue might be affecting other BSD as well.

 confirmed also to be affecting OpenBSD (tested with OpenBSD 4.5 amd64)
 but considering the nature of the fix wouldn't be surprised if other
 configurations were also affected.
   
 Are you proposing a fix or just revert the change?

Your call, even though a fix for this feature would probably be preferred, as
there is nothing special about the BSDs for them to be affected, and it might
be that the problem is therefore more generic.

At least a revert would be needed for 3.1, as this counts as a regression,
but I haven't done so yet; I am waiting for you to first revert it on trunk
and then decide how to proceed from there, depending on how critical this
feature was for the release.

 The change has been working on Linux, Solaris and Cygwin.

Other than just doing a manual bisect (using git instead of svn here would
have been useful) to find where the problem was introduced and validate that
reverting it corrects the problem haven't done much analysis of it, but the
fact that it broke in such a strange way (was indeed expecting the culprit
to be somewhere else, especially considering all recent changes in the
networking and the fact that it seemed originally to be triggered by a TCP
request) probably points to a bigger issue which just happens to have not
been visible on the configurations used to test Linux, Solaris and Cygwin,
especially considering how pervasive it was (broke all BSD I had access to
test, at least)
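
For anyone wanting to repeat the search, the manual bisect amounts to a
binary search over revisions.  A minimal sketch in Python, assuming a
hypothetical check (build that revision and see whether gmond loops) and a
hypothetical known-good starting revision; only r2043 and the 2101 snapshot
number come from this thread:

```python
def first_bad_revision(good, bad, is_bad):
    """Binary-search the revision range (good, bad] for the first failing one.

    Assumes `good` passes, `bad` fails, and that every revision after the
    culprit keeps failing (the usual bisect assumption).
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid      # culprit is at mid or earlier
        else:
            good = mid     # culprit is after mid
    return bad


# Stand-in for "check out rev, build, start gmond, see if it spins".
def gmond_loops(rev):
    return rev >= 2043


print(first_bad_revision(2000, 2101, gmond_loops))  # prints 2043
```

With about a hundred revisions in the range that is seven builds instead of
a hundred, which is what git bisect would automate.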

Carlo

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] Ganglia 3.1.5 beta ready for final testing

2009-11-30 Thread Carlo Marcelo Arenas Belon
On Mon, Nov 30, 2009 at 01:29:34PM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:

 Your call, even though a fix for this feature will probably be preferred as
 there is nothing special about the BSDs for them to be affected and it might
 be that the problem is therefore more generic.
   
 It may be that this bug is revealing a more serious issue in the way  
 initialisation is done, so I would prefer to know the real cause rather  
 than just revert the change that forces the problem to show itself.

agreed, and as I said before that is the reason why I didn't just revert it
on trunk or 3.1 as a fix, even if that seems to resolve the problem.

 At least a revert would be needed for 3.1 as this counts as a regression,
 but haven't done so either, waiting for you to first revert it on trunk and
 then decide on how to proceed from there depending on how critical this
 feature was for the release.
   
 I agree that it is a regression, but reverting it may cause the real
 culprit to remain hidden.  I'd rather hold the release while we look  
 more closely.

not sure if I understand what you meant here, since it would be obvious to
me that 3.1.5 can't be released if a fix (even if it is just reverting the
change) is committed.

are you saying you want to hold off on deciding whether to release 3.1.5 or
to see what will be in 3.1.6?, if the latter I would suggest also pulling
some other fixes and of course that would also require us to agree
on a bootstrapping environment for this release at least.

 The change has been working on Linux, Solaris and Cygwin.

 Other than just doing a manual bisect (using git instead of svn here would
 have been useful) to find where the problem was introduced and validate that
 reverting it corrects the problem haven't done much analysis of it, but the
 fact that it broke in such a strange way (was indeed expecting the culprit
 to be somewhere else, especially considering all recent changes in the
 networking and the fact that it seemed originally to be triggered by a TCP
 request) probably points to a bigger issue which just happens to have not
 been visible on the configurations used to test Linux, Solaris and Cygwin,
 especially considering how pervasive it was (broke all BSD I had access to
 test, at least)
   
 Can you provide output from strace/truss and also a stack trace from the  
 point where it is in the infinite loop?

filed BUG246 with the trace information (collected from OpenBSD 4.5 amd64)
using ktrace, but you got me there.

from the way the problem presents itself it isn't really obvious where the
offending code is, and it is difficult to debug as well since it disappears
when in debug mode or when not running as a daemon, which is the reason why
I haven't been able to capture a backtrace yet either.

 There is a good reason for moving the daemonize code the way I did - an  
 alternative would be to daemonize, but make the original process hang  
 around until the daemon process has entered the main loop.

OK, and I assume it is probably related to the cases where gmond suddenly
dies at startup without notification, but some clarification on what
problem you were trying to solve would probably be useful too.
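
To make sure we are talking about the same alternative, here is a minimal
sketch (Python for brevity, not gmond's actual C code) of the "original
process hangs around until the daemon is ready" approach.  The init callback
and the one-byte pipe protocol are assumptions of the sketch, and the child
exits where a real daemon would enter its main loop:

```python
import os


def start_with_handshake(init):
    """Fork a worker; return True only once the child reports init() succeeded."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a real daemon would also call setsid(), chdir("/"), etc.
        os.close(read_fd)
        try:
            init()                    # e.g. bind sockets, load modules
            os.write(write_fd, b"1")  # tell the parent we got this far
            status = 0
        except Exception:
            status = 1                # closing the pipe unwritten means failure
        os.close(write_fd)
        # A real daemon would enter its main loop here instead of exiting.
        os._exit(status)
    # Parent: block until the child reports readiness or closes the pipe.
    os.close(write_fd)
    ready = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    os.waitpid(pid, 0)                # only because the sketch's child exits
    return ready


def failing_init():
    raise RuntimeError("could not bind")


print(start_with_handshake(lambda: None))
print(start_with_handshake(failing_init))
```

The point of the design is that the parent's exit status can then tell an
init script whether startup really worked, instead of gmond dying silently
after the fork.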

Carlo



Re: [Ganglia-developers] Ganglia 3.1.5 beta ready for final testing

2009-11-29 Thread Carlo Marcelo Arenas Belon
On Tue, Nov 24, 2009 at 06:03:51PM -0800, Bernard Li wrote:
 
 Please help us test on as many OS/archs as possible, as this would go
 GA quite immediately ;-)

FreeBSD is not able to return any XML data through TCP/8649 (tested with
FreeBSD 8.0 amd64).

DragonFlyBSD fails to build but a 3.2 version of ganglia which includes
fixes for that fails with the same TCP issue as FreeBSD and so this
issue might be affecting other BSDs as well.

Carlo



[Ganglia-developers] bind and bind_hostname parameters in udp_send_channel

2009-11-29 Thread Carlo Marcelo Arenas Belon
Greetings,

As part of 3.1.3 (then 3.1.4 and now 3.1.5) two additional parameters were
added to the configuration for udp_send_channel which were not documented
but that are otherwise very useful.

after adding some basic documentation to trunk in r2122 and using them, I
found that the interface should be improved before it gets released, by
either :

* remove bind_hostname and overload that functionality on bind by defining
  a magic value which means (resolve default hostname) like .

* keep bind_hostname but convert it into a boolean so it can be set like all
  other flags in gmond.conf and better handle what to do when both
  parameters are provided (currently bind_hostname seems to silently
  override bind)
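
A rough sketch of the second option, in Python rather than gmond's C and
with made-up names (resolve_bind, lookup), where bind_hostname is a plain
boolean and giving both parameters is an error instead of a silent override:

```python
import socket


def resolve_bind(bind=None, bind_hostname=False, lookup=socket.gethostname):
    """Pick the source address for a send channel (hypothetical semantics).

    bind          -- explicit address from the config, or None
    bind_hostname -- boolean flag: use this host's own name as the source
    lookup        -- how to obtain the host name (injectable for testing)
    """
    if bind is not None and bind_hostname:
        # The proposal: refuse the ambiguous case rather than silently
        # letting one parameter win.
        raise ValueError("bind and bind_hostname cannot both be set")
    if bind_hostname:
        return lookup()
    return bind  # may be None, meaning "let the OS choose"


print(resolve_bind(bind="192.0.2.10"))
print(resolve_bind(bind_hostname=True, lookup=lambda: "node1.example.org"))
```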

Carlo

PS. backporting the documentation to 3.1 should be done also once it is
stabilized



[Ganglia-developers] RFC: release history in ganglia-3.1's STATUS file

2009-11-29 Thread Carlo Marcelo Arenas Belon
Greetings,

while looking at a fix for the broken gmond in at least some of the BSD
platforms that was reported here :

  
http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05366.html

I also noticed that the STATUS file for the 3.1 branch had some confusing
history, which I assume, based on our own wiki [1] and a sample file
from apache [2] (which was used as a basis for that), should instead be
something like :

Index: STATUS
===
--- STATUS  (revision 2122)
+++ STATUS  (working copy)
@@ -7,9 +7,10 @@
 
 Release history:
 
-3.1.5(hargrave)   : Released: In Development
-3.1.4(???): Released: Not released
-3.1.3(avenger): Released: Not released
+3.1.6(hargrave)   : In Development
+3.1.5(hargrave)   : Tagged: Nov 24, 2009
+3.1.4(hargrave)   : Tagged: Oct 26, 2009 (not released)
+3.1.3(avenger): Tagged: Sep 19, 2009 (not released)
 3.1.2(langley): Released: Feb 17, 2009
 3.1.1(wien)   : Released: Sep 10, 2008
 3.1.0(amelia) : Released: Jul 30, 2008

the main differences and points for discussion being :

* Until we get rid of the release names, and unless the release name
  changes it should be considered that the release name is the same (as
  reported during configure).
* 3.1.3, 3.1.4 and 3.1.5 were released at least as betas and therefore
  the last version of that file misses that information.
* 3.1.3 and 3.1.4 status of not released used to be no GA and that
  might be a better way to identify releases that went through beta cycles
  but never went to GA.
* 3.1.5 is either Released (included tagging and a beta package) or
  in development and since there was an announcement for testing I
  assume is at least in the same state as 3.1.3 and 3.1.4 were, and
  the fact that the GANGLIA_NANO_VERSION and GANGLIA_SNAPSHOT settings
  weren't updated to reflect that was probably just an oversight which
  has been corrected in r2123
* 3.1.6 has no commits yet but should be open for development at least
  for bugfixes for 3.1.5 (if that gets scrapped) or to include other
  features/bugfixes which had otherwise been on hold since the feature
  freeze for avenger was called.

in any case, looking forward to comments on this so that the fixes (if
needed) can be committed but especially so it is clear how to proceed
until the GA status for 3.1.5 is decided.

Carlo

[1] http://sourceforge.net/apps/trac/ganglia/wiki/how_project_works
[2] http://svn.apache.org/viewvc/httpd/httpd/branches/2.0.x/STATUS?revision=882861&view=markup



Re: [Ganglia-developers] Ganglia 3.1.5 beta ready for final testing

2009-11-29 Thread Carlo Marcelo Arenas Belon
On Sun, Nov 29, 2009 at 10:57:01AM +, Carlo Marcelo Arenas Belon wrote:
 On Tue, Nov 24, 2009 at 06:03:51PM -0800, Bernard Li wrote:
  
  Please help us test on as many OS/archs as possible, as this would go
  GA quite immediately ;-)
 
 FreeBSD is not able to return any XML data through TCP/8649 (tested with
 FreeBSD 8.0 amd64).

the problem wasn't actually the TCP/8649 service but the fact that gmond
was going into an infinite loop after sending the first metric update.

the issue was tracked down to r2043 and a 3.1.5 development package with
that patch reverted is available for testing from :

  http://sajino.sajinet.com.pe/ganglia/ganglia-3.1.5.2101.tar.gz

 DragonFlyBSD fails to build but a 3.2 version of ganglia which includes
 fixes for that fails with the same TCP issue as FreeBSD and so this
 issue might be affecting other BSDs as well.

confirmed also to be affecting OpenBSD (tested with OpenBSD 4.5 amd64)
but considering the nature of the fix wouldn't be surprised if other
configurations were also affected.

Carlo

CC Daniel as the release manager for 3.1.5 and author of the problematic
   feature.



Re: [Ganglia-developers] bootstrapping for 3.1.X series and 3.2.X

2009-11-28 Thread Carlo Marcelo Arenas Belon
On Wed, Nov 25, 2009 at 11:00:21AM +, Daniel Pocock wrote:
 
 a) is it preferred that we release 3.1.4 or that we release 3.1.5, or a 
 third option, roll a 3.1.6 tarball using the same environment where 
 3.1.2 was bootstrapped?

3.1.2 had a bootstrapping problem which resulted in it failing to build
by default on multilib amd64/i386 systems if both the 32bit and 64bit
versions of the dependencies (libapr, confuse) were installed.

3.1.4 used the same bootstrapping as 3.1.1 and so was IMHO better, but
because there were multiple 3.1.4 packages it is probably difficult to know
which one was validated, and that was AFAIK one of the reasons why it
wasn't eventually released.

 b) should the choice of bootstrap environment be locked for all 3.1.X, 
 and only changed when increasing the minor version number (e.g. when we 
 go from 3.1 to 3.2)?

no, but since our build system is full of hacks and not completely reliable
it might be a good idea to test that no issues are introduced when looking at
a new version.

 c) what environment should be used to bootstrap 3.2.X/trunk?

the same as 3.1 so that all improvements in the build system will be
tested there and then backported for stability.

 d) Can anyone volunteer to provide a stable bootstrap environment (e.g. 
 a virtual server) just for Ganglia?  Two such environments may be 
 needed, one for trunk and one for the current release branch.

Matt did offer an EC2 instance if we could agree on an OS version :

  
http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg05271.html

I suggested Debian 5.0 (more conservative) or Fedora 12 (to be updated more
frequently) but as long as it is agreed, documented and reproducible anything
should work.

Carlo



[Ganglia-developers] Using git for ganglia source code management (was Re: [Ganglia-general] Ganglia 3.1.4 beta ready for testing)

2009-11-03 Thread Carlo Marcelo Arenas Belon
Changing subject and list to better focus this thread

On Mon, Nov 02, 2009 at 10:57:57PM +, Daniel Pocock wrote:

 The discussions about bootstrapping and versioning brings me to another  
 issue - does anyone have any interest in using git instead of SVN?

+1, but beware that the automatic ChangeLog generation, as well as release
flows will need to be adjusted as they were designed around subversion.

 I notice it can do some handy tricks, like generating version numbers that  
 reflect the tag you are building in

it can also do some more interesting tricks, like pushing/pulling from a
subversion server and so there is really no need to force anyone to migrate
either.
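
For reference, the version trick relies on git describe, which prints either
a bare tag name or <tag>-<commits since tag>-g<abbreviated hash>.  A small
Python sketch of turning that output into version fields; the function name
is made up and so are the tag and hash in the demo:

```python
import re

# Matches the long form that git describe prints when HEAD is past a tag.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Split `git describe` output into (tag, commits_since_tag, short_hash)."""
    text = text.strip()
    match = DESCRIBE_RE.match(text)
    if match is None:
        # Exactly on a tag: git describe prints just the tag name.
        return (text, 0, None)
    return (match.group("tag"), int(match.group("count")), match.group("sha"))


print(parse_describe("3.1.4-12-g1a2b3c4\n"))
print(parse_describe("3.1.4"))
```

Something along these lines could feed a snapshot number into the build,
though how that would map onto the current configure settings is left open
here.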

 as well as all the benefits of distributed version control.

which is IMHO the biggest selling point, as it allows for more participation
as it is easier to contribute patches and maintain them even when not having
access to the main repository.

Carlo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-11-03 Thread Carlo Marcelo Arenas Belon
On Mon, Nov 02, 2009 at 03:05:32PM -0800, Bernard Li wrote:
 
 Can you please test this tarball bootstrapped on Fedora 9.

It works, but would invalidate all testing that was done for 3.1.3
and the original 3.1.4.

 If it works I will replace the original tarball with this:
 
 http://ganglia.info/testing/bootstrapped_on_fedora9/ganglia-3.1.4.tar.gz

-1

Changing the release package in the middle of a release is a bad idea;
indeed changing it without bumping the release version goes against our
release procedures, as it could result in different binary packages.  That
was the reason why the unofficial package I provided was published far
from the ganglia servers, to hopefully avoid any confusion and frustration
if someone later finds a bug which happens to be only reproducible in the
other version.

There is also the risk of introducing a bug (like the one in 3.1.2 from
bootstrapping in SuSE with automake 1.9.6 which prevented users that had
the 32bit libraries for apr installed on 64bit systems from getting a working
build) and so as much as I am excited about finally moving to some more
modern versions of autotools, this only makes sense as part of 3.1.5,
which will hopefully also allow for enough time to remove all needed hacks
and finally clean up the bootstrapping code.

Carlo



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-11-03 Thread Carlo Marcelo Arenas Belon
On Mon, Nov 02, 2009 at 11:09:40PM +, Daniel Pocock wrote:
 
 I note Paul is using gcc, whereas I'm building and testing with Sun 
 Studio on the OpenCSW build farm - Sun's compiler is now a free 
 download, and it is used to build all the CSW libraries (including those 
 used by Ganglia), so this is now the easiest solution to support - that, 
 and Solaris 8 support, led me to tweak the configure.in stuff for 
 Solaris - maybe it needs more tweaking to support gcc - would anyone 
 like to comment on the preferred gcc build environment to be supported?

IMHO any gcc should work, and indeed gcc was the originally supported
compiler for ganglia in Solaris (Sun Studio was added later in 3.1.1 when
it was made freely available with OpenSolaris).

while working on libmetrics (as can be seen in the corresponding metrics.c
file) the following versions of gcc were used (most of them using SUNWtoo
and other SUNW provided tools as part of the toolchain when possible) :

  Solaris 7 x86 (32-bit) with gcc-2.8.1 (this one used GNU binutils AFAIK)
  Solaris 8 (64-bit) with gcc-3.3.1
  Solaris 9 (64-bit) with gcc-3.4.4
  Solaris 10 SPARC (64-bit) and x86 (32-bit and 64-bit) with SUNWgcc

 On the issue of the gcc environment, we basically need a second version 
 of scripts/build-solaris.sh for gcc - this raises questions like should 
 the libraries (apr, confuse) be built with gcc too?  Which ld, ar, etc?

This is IMHO a packager's call; after all we don't provide binaries (well we
do but almost no one uses them) because as you pointed out the decision on
which toolchain to use needs to be made at the distribution or system
engineering level and so we are left to support them all the best we can.

In cases where there is some overlap (like in the case of the CSW packages,
where the package maintainers are also upstream contributors) or when it
helps to simplify maintenance on a specific platform (like the CentOS
4 RPMs or the Makefile.WiX recipes for Cygwin) then it makes sense to
have some additional code to help with it and also some more testing
or confidence about the resulting binaries working as expected, but that
should never be considered the only supported solution IMHO.

Carlo



Re: [Ganglia-developers] bootstrapping ganglia with modern autotool versions for release (was Re: Ganglia 3.1.4 beta ready for testing)

2009-11-02 Thread Carlo Marcelo Arenas Belon
On Fri, Oct 30, 2009 at 12:28:03PM -0700, Bernard Li wrote:
 
 I have a Fedora 9 VM that I can use to bootstrap in the future --
 would the autotools that come with that version work?

something with libtool 2.2 would probably be better, as well as something
that is still getting updates (in case there are bugs that need
to be fixed).

Fedora 12 is going to be released in a couple of weeks and
therefore Fedora 10 will go out of support a month after that,
leaving Fedora 9 EOL for more than 3 months already :

  https://www.redhat.com/archives/fedora-announce-list/2009-July/msg4.html

Carlo



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-10-29 Thread Carlo Marcelo Arenas Belon
On Tue, Oct 27, 2009 at 09:52:52AM +, Paul Sobey wrote:

 /usr/include/sys/feature_tests.h:336:2: error: #error Compiler or options 
 invalid; UNIX 03 and POSIX.1-2001 applications   require the use of 
 c99
 make[2]: *** [getopt1.o] Error 1
 
 Googling leads me to try compiling with CFLAGS=-std=gnu99 per:
 
 http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=215

this is a bug in the autoconf from CentOS 4 which is used to build the
release packages, therefore you can also work around the issue by
rebootstrapping the package or making your own with a better version
of the autotools.  for simplicity I've uploaded an unofficial release
package for 3.1.4 bootstrapped on fedora rawhide at :

  http://sajino.sajinet.com.pe/ganglia/ganglia-3.1.4.tar.gz

 If do that, compilation fails building against Python 2.6.2 (built with 
 same toolchain):

once you use -std=gnu99 it is no longer the same toolchain and therefore
building python with the same standard support should solve your problem.

Carlo



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-10-29 Thread Carlo Marcelo Arenas Belon
On Thu, Oct 29, 2009 at 04:44:59PM +, Paul Sobey wrote:
 I note from the Makefile Daniel posted:
 
 # Depends: some issues exist getting the Python support working on 
 Solaris,
 # Ganglia's configure.in needs to be further enhanced for this to work

I think this is a CSW specific problem, as I had no problem getting
python support compiled in Solaris 10u7 x86 using SUNWPython-devel, SUNWgcc,
SUNWlexpt and compiled versions of confuse and apr.

  $ PATH=$PATH:/usr/sfw/bin:/usr/ccs/bin
  $ ./configure CC="gcc -std=gnu99" --prefix=/usr/local \
      --with-libapr=/usr/local/apr/bin/apr-1-config --with-libconfuse=/usr/local
  $ make

Daniel, could you elaborate?

Carlo



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-10-29 Thread Carlo Marcelo Arenas Belon
On Thu, Oct 29, 2009 at 08:42:05PM +, Daniel Pocock wrote:
 
 Carlo Marcelo Arenas Belon wrote:
  On Thu, Oct 29, 2009 at 04:44:59PM +, Paul Sobey wrote:

  I note from the Makefile Daniel posted:
 
  # Depends: some issues exist getting the Python support working on 
  Solaris,
  # Ganglia's configure.in needs to be further enhanced for this to work
 
  Daniel, could you elaborate?
 
 Although I have described the Python module in the CSW Makefile, it is 
 not something I have properly tested.

OK and I haven't done any testing either, other than making sure it builds
and that a mod_example like module can be loaded, but my question was more
about the need to change configure.in to support python modules which you
were referring to in the Makefile as Paul noted.

 I am still working through some 
 core agent problems (e.g. see the discussion on csw-maintainers about 
 building a 64 bit version of everything: I've noticed that when running 
 a 32 bit binary on some 64 bit machines with lots of RAM, some kstat
 calls lead to a seg fault)

care to provide a link to the thread or any bug reports?, earlier releases
for 3.0 required 64bit binaries as they were reading kernel memory directly
to gather the statistics, but after those metrics were migrated to kstat
that shouldn't be an issue anymore, and I am running some 32-bit 3.0 agent
with solaris sparc with significant amount of memory as well, so there
might be a regression to track here.

Carlo



[Ganglia-developers] Solaris support for 3.1.4 (Was RE: Ganglia 3.1.4 beta ready for testing)

2009-10-29 Thread Carlo Marcelo Arenas Belon
Trimming CC and changing Subject to better focus this thread

On Thu, Oct 29, 2009 at 10:09:32PM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:
 
  my question was more
  about the need to change configure.in to support python modules which you
  were referring to in the Makefile as Paul noted.

 I definitely remember playing with configure.in to try and get the 
 Python support working with CSW, although I'm not certain what state I 
 left it in.

do you mean you have a modified 3.1 package that had extensions for CSW
python support?, are those modifications available somewhere?

 I've done a diff from 2017:HEAD on trunk/configure.in, it appears that 
 none of my changes for Python support on Solaris are in there.

should I assume then that neither trunk nor ganglia-3.1 had any python
support related patches committed from you?

 In one branch I started working on I tried setting up my own LDFLAGS for 
 Python, e.g. in configure.in:
 
 LDFLAGS_PYTHON=-lpython${PY_VERSION}
 or for static:
 LDFLAGS_PYTHON=/opt/csw/lib/python2.3/config/libpython${PY_VERSION}.a -lm
 
 and using LDFLAGS_PYTHON when linking the Python module.

the following should be all that is needed IMHO (not tested and assuming the
location/name of the python binary from your previous comments) :

  ./configure --with-python=/opt/csw/bin/python2.3 

 However, I don't think this is best practice for configure.in.  I can 
 have a go at making it work, but it would be useful to agree on the 
 compatibility requirements first: e.g. should compatibility with 
 CSWpython be the main goal, or do we want to set some other criteria?

not sure what you mean here, but AFAIK the objective was to be able to use
python2.3 or higher (just because CentOS 4 uses that)

Solaris 10 comes with python2.3 and python2.4 (Through SUNWPython) but in
theory any version of python should work if configure is pointed to it.

In Gentoo Linux 10.0 amd64, Fedora 12 or Ubuntu Karmic, which come with
python 2.6, all python modules should work even if the tcpconn.py module
might warn about deprecated use of popen (which means as soon as someone
moves to Python 3 that module at least will break)
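
If that warning ever needs fixing, the usual replacement is the subprocess
module, which exists in python 2.4 and later (so not in the 2.3 that CentOS 4
ships).  A sketch, not the actual tcpconn.py code; the command that module
runs is an assumption here, and the demo just runs the python interpreter
itself:

```python
import subprocess
import sys


def run_lines(argv):
    """Run a command without a shell and return its stdout as a list of lines."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("command failed with status %d" % proc.returncode)
    return out.splitlines()


# A module like tcpconn.py would pass something like ["netstat", "-an"] here
# (assumption); the interpreter is used so the example runs anywhere.
print(run_lines([sys.executable, "-c", "print('ESTABLISHED')"]))
```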

I think the core modpython should build with any python 2.x and maybe 1.x
as well, but I don't think anyone ever tested/needed that.

  I am still working through some 
  core agent problems (e.g. see the discussion on csw-maintainers about 
  building a 64 bit version of everything: I've noticed that when running 
  a 32 bit binary on some 64 bit machines with lots of RAM, some kstat
  calls lead to a seg fault)
 
  care to provide a link to the thread or any bug reports?, earlier releases
  for 3.0 required 64bit binaries as they were reading kernel memory directly
  to gather the statistics, but after those metrics were migrated to kstat
  that shouldn't be an issue anymore, and I am running some 32-bit 3.0 agent
  with solaris sparc with significant amount of memory as well, so there
  might be a regression to track here.

 I've been discussing the issue privately with Dago, it is easily 
 reproducible on the host called build8st in the CSW build farm.  All my 
 latest packages are on the box already so if you request an account, you 
 can try it.  I'll forward you the email.

OK, the problem might be Solaris8 specific then, since my Solaris 9 and 10
binaries didn't have that problem.  Hopefully will be able to figure out
how to get a CSW account then, but if you could get a core dump (better
if from an unstripped binary) or some backtraces, that could help in
debugging this issue.

 The more general discussion on building packages containing both 32 and 
 64 bit libraries started here:
 
 http://lists.opencsw.org/pipermail/maintainers/2009-October/004687.html

OK, do you have any references or documentation for the kstat requirement
on 64bit kernels?, at least on my Solaris 10u7 system vmstat is 32bit and
linked against a 32bit version of libkstat (even if a 64bit version is
also available)
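
For anyone wanting to check what a given binary is without file(1) at hand:
byte 4 of the ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit
objects.  A small Python sketch; the function name is made up and the demo
writes a fake header rather than reading a real vmstat:

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}  # EI_CLASS: ELFCLASS32 / ELFCLASS64


def elf_class(path):
    """Return "32-bit" or "64-bit" for an ELF file, or None if it is not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(bytearray(header)[4])


# Demo with a fake 5-byte header; on a real system pass the path to vmstat.
fd, demo_path = tempfile.mkstemp()
os.write(fd, ELF_MAGIC + b"\x01")
os.close(fd)
print(elf_class(demo_path))
os.remove(demo_path)
```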

Carlo



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-10-29 Thread Carlo Marcelo Arenas Belon
On Thu, Oct 29, 2009 at 01:10:14PM -0700, Bernard Li wrote:
 On Thu, Oct 29, 2009 at 12:01 PM, Carlo Marcelo Arenas Belon
 care...@sajinet.com.pe wrote:
 
  this is a bug in the autoconf from CentOS 4 which is used to build the
  release packages, therefore you can also work around the issue by
  rebootstrapping the package or making your own with a better version
  of the autotools.  for simplicity I've uploaded an unofficial release
  package for 3.1.4 bootstrapped on fedora rawhide at :
 
    http://sajino.sajinet.com.pe/ganglia/ganglia-3.1.4.tar.gz
 
 Do you have a link for the bug, and are you aware whether there are
 updates for CentOS 4 to fix the issue?

I am not aware of a CentOS or RHEL bug report, but considering that EL4
is in maintenance mode there won't be a fix anyway (2.59 was released
in 2003 and the last update to the package was in 2004)

 I guess I could start building on CentOS 5, provided that the autoconf
 does not have this bug.

CentOS 5 also uses autoconf 2.59 so it wouldn't help with this problem, but
might hopefully allow us to remove all the kludges that were added to work
around the libtool 1.5.6 bugs which were preventing DragonFlyBSD support.

Ideally, which platform is used to bootstrap shouldn't be relevant though,
and IMHO we should instead be aiming for the latest versions of the autotools
(either installed by hand or provided as part of the distribution if more
development focused), which on Linux usually means Fedora, Gentoo or Debian.

Carlo



[Ganglia-developers] bootstrapping ganglia with modern autotool versions for release (was Re: Ganglia 3.1.4 beta ready for testing)

2009-10-29 Thread Carlo Marcelo Arenas Belon
Trimming CC and changing Subject to reflect thread better

On Thu, Oct 29, 2009 at 04:54:17PM -0700, Bernard Li wrote:
 On Thu, Oct 29, 2009 at 4:47 PM, Carlo Marcelo Arenas Belon
 care...@sajinet.com.pe wrote:
 
  Ideally, which platform is used to bootstrap shouldn't be relevant though
  and IMHO we should be instead aiming to the latest versions of the autotools
  (either installed by hand or provided as part of the distribution if more
  development focused) and for that when on Linux usually means Fedora, Gentoo
  or Debian IMHO.
 
 I have no problem with this in theory, but would new version of
 autotools create a tarball that is backward compatible?

Curious about what a backward compatible tarball would look like, but if
by that you meant that you can use the resulting configure script and
supporting files on systems that are much older and that never had a package
for the autotools version that was used, then the answer is yes;
that is the whole point of autotools anyway: to support systems other than
the one that was originally used to build the code on, without requiring
any of the autotools themselves to be installed.

As for testing, is there any problem you had found in the unofficial
package that I posted and that was built on Fedora Rawhide x86 :

  http://sajino.sajinet.com.pe/ganglia/ganglia-3.1.4.tar.gz

Other than being able to work on Solaris and not asking for an unnecessary
C++ compiler I wouldn't expect it to be that different when used to build
ganglia binaries.

Carlo 



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-10-27 Thread Carlo Marcelo Arenas Belon
On Mon, Oct 26, 2009 at 04:51:33PM -0700, Bernard Li wrote:
 
 Ganglia 3.1.4 is ready for testing at:
 
 http://ganglia.info/testing/

DragonFlyBSD fails to build (tested with 2.4.0 32bit).

not a regression (a system header problem which also affects
3.1.2) and there are some trivial unrelated changes in trunk
which could help with that.

Carlo 



Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.4 beta ready for testing

2009-10-27 Thread Carlo Marcelo Arenas Belon
On Tue, Oct 27, 2009 at 10:15:15AM +, Daniel Pocock wrote:
 Carlo Marcelo Arenas Belon wrote:

 DragonFlyBSD fails to build (tested with 2.4.0 32bit).

 how do you propose we avoid missing stuff like this in future?

Not sure if this can be avoided, but there might be some things we could
do to mitigate like :

* periodic snapshot releases
* automated buildfarm
* census of developers/users which could help with testing per platform
* some sort of unit tests or other tests which could be automated

 3.1.3 was floating around for a while, and I tested it on RHEL3, RHEL4,  
 RHEL5, Solaris 8, Solaris 10, Debian lenny and Cygwin.

And that was very useful as it uncovered a problem in RHEL3 AFAIK.

 Do you think we need some checklist of supported platforms that must be  
 verified before we tag in future?

The release notes include a list of supported platforms which usually
reflects the ones that were reported as working fine during the testing
cycle.

My objective with this report was to let you know that DragonFlyBSD wasn't
one of them.  I wasn't implying that report was to be considered as a
showstopper for the release either.

 We may need to have some buy-in from  
 people willing to run the tests at short notice as a release date  
 approaches, as not many people are going to have easy access to every  
 supported platform.

And not all people will be available either.

 Maybe commits on the release branch need to be blocked while such  
 testing is done?

This already happens AFAIK even if there is no formal rule against such
commits, just to simplify the job of the release manager, who would
otherwise need to cherry-pick into a release branch.

Setting up a list of objectives before the release is tagged might help
clarify what is expected from the release (including which areas will
need testing or which features are to be evaluated) and also which
platforms are expected to be supported, probably based on feedback
provided from pre-release packages.

Carlo



Re: [Ganglia-developers] changes to trunk, backports

2009-08-10 Thread Carlo Marcelo Arenas Belon
On Mon, Aug 10, 2009 at 09:24:18PM +0100, Daniel Pocock wrote:
 
 I'm making some more changes to trunk over the next few days, some of 
 them impact the build system (configure.in, Makefile.am).

you mean more than the ones that were already committed from r2017 to
r2021?

what are the changes you are planning to commit for?

 I've been able to test on Linux (RHEL 4 and 5), Solaris 8 and 10 and 
 Cygwin - I realise people using other platforms may encounter different 
 results, so I will stagger most things by one or two weeks before 
 backporting to the 3.1 branch.

i think that r2021 is either a fix or a good part of a fix for BUG16,
feel free to reassign that bug to yourself and finish it ;)

 The last round of changes I made have been backported to the 3.1 branch 
 today, hopefully this will allow more people to evaluate it before the 
 next release.

probably will be a good idea to document them in the STATUS file so they
don't get lost for the future release notes.

Carlo



Re: [Ganglia-developers] Stack smashing in Linux gmond while reading long lines from /proc/mounts.

2009-07-31 Thread Carlo Marcelo Arenas Belon
On Thu, Jul 30, 2009 at 10:33:03AM -0400, Jason A. Smith wrote:
 On Thu, 2009-07-30 at 10:29 +, Carlo Marcelo Arenas Belon wrote:
  On Wed, Jul 29, 2009 at 03:42:05PM -0400, Jason A. Smith wrote:
   
   In gmond, the monitor-core/libmetrics/linux/metrics.c:find_disk_space()
   function, was not only using small character arrays, but the arrays for
   the sscanf after the fgets were smaller than the array for the line it
   just read in, which can lead to buffer overflows and the stack
   smashing problem that we were having.
  
  using fixed size arrays in the stack is never a good idea.
  in this case it could be theoretically possible to exploit this overflow
  with the help of a malicious NFS server (very unlikely though).
  
   To fix our problem and prevent the overflows, I made a patch to increase
   the size of the arrays and also make each of the arrays used in the
   sscanf the same size as the line buffer used in fgets, so there is no
   chance of another overflow.
  
  committed for trunk in r2007, but the new implementation might also
  generate segfaults on its own due to stack overflows when running with
  very small stacks as it requires a bigger stack.
  
  IMHO it would be better to migrate this function to use getmntent and
  friends as it was done already for Cygwin, Solaris and the BSD and that
  way avoid the use of local buffers and parsing of the /proc/mounts file
  directly.
 
 This is a good idea, I didn't think to check the libmetric code from the
 other OSes.  Besides a few minor differences for the remote_mounts and
 valid_mount_type functions and the seen_before part, the solaris and linux
 code look almost identical.

if I recall correctly, when I added disk metrics for Solaris it was indeed
modeled after the linux code except for the following differences (there
was a thread started at the time that got nowhere) :

* originally using GiB instead of GB (changed since)
* whitelist of filesystems instead of a blacklist
* using getmntent and friends
* avoid use of hashes, pointers or static buffers.

 Should the linux code be updated to look
 more like the solaris code?

It would be probably easier to change the implementation to use getmntent
instead of parsing the /proc/mounts file by hand and which might as well
help to simplify the code further.
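
As a rough illustration of that suggestion (a minimal sketch, not the code in
libmetrics): the function name sum_disk_space, the is_remote_fs filter and its
list of filesystem types are assumptions made up for this example, and the
seen_before de-duplication done by the real code is left out. It relies on the
glibc setmntent/getmntent/endmntent calls and on statvfs.

```c
#include <mntent.h>      /* setmntent, getmntent, endmntent (glibc) */
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h> /* statvfs */

/* Hypothetical filter: skip network filesystems (the list is illustrative). */
static int is_remote_fs(const char *type)
{
    return strncmp(type, "nfs", 3) == 0 || strcmp(type, "cifs") == 0;
}

/* Sum total and available bytes over the local mounts listed in `table`
 * (for example "/proc/mounts").  Returns the number of mounts counted,
 * or -1 if the table cannot be opened. */
int sum_disk_space(const char *table, double *total, double *avail)
{
    FILE *fp = setmntent(table, "r");
    struct mntent *ent;
    int counted = 0;

    if (fp == NULL)
        return -1;

    *total = 0.0;
    *avail = 0.0;
    /* getmntent does the parsing, so no line buffer or sscanf is needed here */
    while ((ent = getmntent(fp)) != NULL) {
        struct statvfs sv;

        if (is_remote_fs(ent->mnt_type))
            continue;
        if (statvfs(ent->mnt_dir, &sv) != 0)
            continue;
        *total += (double)sv.f_blocks * (double)sv.f_frsize;
        *avail += (double)sv.f_bavail * (double)sv.f_frsize;
        counted++;
    }
    endmntent(fp);
    return counted;
}
```

Calling sum_disk_space("/proc/mounts", &total, &avail) would give totals in
bytes; converting to GB and filtering duplicates is left to the caller.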

 Too bad it isn't possible to merge similar
 code instead of copying it.

there is no reason why we can't (if needed) write the code in 1 single
place and link it against the OS specific metric code later (as shown
by libmetrics/interface.c), but it is usually just easier to copy the code
as it will need to be adjusted to work properly for each supported OS (and
different versions of it as well, with different ABI/API)  and that way
avoid having to obscure the logic with portability constructs which 
are required otherwise.

considering though that getmntent is available in all the OS that are
implementing disk metrics AFAIK, this should be probably (as suggested
originally) a secure, portable alternative IMHO.
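
For code that still parses /proc/mounts by hand, as in the patch discussed at
the top of this thread, the overflow comes from unbounded %s conversions
writing into arrays smaller than the line that was read.  A minimal sketch of
bounding each conversion with a field width follows; parse_mount_line and
FIELD_LEN are names invented for this example and the size is not the one used
in gmond.

```c
#include <stdio.h>

#define FIELD_LEN 64   /* illustrative size, not the value used in gmond */

/* Parse "device mountpoint type" from one /proc/mounts-style line.
 * Each %63s stores at most FIELD_LEN - 1 characters plus the NUL, so
 * sscanf cannot write past the end of any of the three arrays.
 * Note that an over-long field is cut at 63 characters and its remainder
 * is read as the next field, so such lines parse safely but incorrectly.
 * Returns 1 if three fields were read, 0 otherwise. */
int parse_mount_line(const char *line,
                     char device[FIELD_LEN],
                     char mount[FIELD_LEN],
                     char type[FIELD_LEN])
{
    return sscanf(line, "%63s %63s %63s", device, mount, type) == 3;
}
```

This keeps the arrays small (no bigger stack needed) at the price of
truncating very long fields, which is a different trade-off from the
committed patch that enlarges the buffers instead.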

Carlo



Re: [Ganglia-developers] Replacing core metrics with Python metric modules

2009-07-24 Thread Carlo Marcelo Arenas Belon
On Wed, Jul 22, 2009 at 04:02:01PM -0500, Martin Hicks wrote:
 
 I was wondering if it is possible to write a Python metric module that
 could replace the core set of metrics that gmond usually collects on the
 compute node, and instead grab the data from PCP that is running on the
 head node.

this looks like a perfect place to use gmetric with spoofing instead
(assuming that you are not loading the core metrics in gmond, or not
even running gmond at all if all you need is already coming from PCP)

 Are there any real differences between the metrics that are normally
 collected by gmond, and those user-defined metrics collected by a Python
 module?

starting with 3.1 all metrics are the same

Carlo



Re: [Ganglia-developers] boottime and uptime (the saga continues)

2009-07-19 Thread Carlo Marcelo Arenas Belon
On Fri, Jul 17, 2009 at 07:10:35AM -0700, Ken Teague wrote:
 
 I'm using Ganglia v3.0.3 on openSUSE 10.3 which came pre-configured on a 
 Microway cluster.  It's slightly modified to add their Microway 
 Control stuff integrated which is basically a button from the Ganglia 
 homepage which leads to their TriCom/NodeWatch thermal monitoring web 
 page.  As such, I don't think that the issue I'm having has anything to 
 do with their customization, but I wanted this to be known beforehand in 
 case the possibility exists.

hope that customization is only in the web frontend because a solution to
your problem might require a patched gmond.

 The issue I'm facing is with incorrect boottime and uptime for my master 
 and all slave nodes.

seems like BUG169 that was fixed in ganglia 3.1.2 and will be part of 3.0.8

  http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=169

the problem, most likely, is that your /proc/stat is too big in the affected
servers.
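
To illustrate why the size of /proc/stat can matter (the bug report itself is
not quoted here, so this assumes the failure mode is a fixed-size read that
stops before the "btime" line is reached): a minimal sketch that scans the
file in small chunks instead, so its total size is irrelevant.  The function
name read_boottime is made up for this example and this is not the fix that
was committed.

```c
#include <stdio.h>
#include <string.h>

/* Scan a /proc/stat-style stream for the "btime" line.
 * Lines longer than the chunk are consumed piece by piece; only a chunk
 * that begins at the start of a line is checked for the keyword.
 * Returns the boot time in seconds since the epoch, or -1 if not found. */
long read_boottime(FILE *fp)
{
    char chunk[256];
    int at_line_start = 1;
    long btime;

    while (fgets(chunk, sizeof(chunk), fp) != NULL) {
        size_t n = strlen(chunk);

        if (at_line_start && sscanf(chunk, "btime %ld", &btime) == 1)
            return btime;
        /* the next chunk starts a new line only if this one ended in '\n' */
        at_line_start = (n > 0 && chunk[n - 1] == '\n');
    }
    return -1;
}
```

On Linux it would be called with a stream opened on /proc/stat; uptime is
then the current time minus the returned value.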

Carlo



Re: [Ganglia-developers] Building a ganglia interface into collectl

2009-04-12 Thread Carlo Marcelo Arenas Belon
On Wed, Apr 01, 2009 at 11:01:41AM +, Seger, Mark wrote:
 I hear what you're saying about using the API, but collectl is a perl script

there are perl bindings for libganglia (at least the metric generation) in :

  http://search.cpan.org/~hirose/Ganglia-Gmetric-XS-1.00/lib/Ganglia/Gmetric/XS.pm

beware though that you won't be able to use it with ganglia 3.1.x and will
essentially link a static version of ganglia's metric implementation into your
code, but at least it will avoid having to generate binary packets and reverse
engineer the protocol as it changes.

Carlo



Re: [Ganglia-developers] CVE

2009-01-25 Thread Carlo Marcelo Arenas Belon
On Fri, Jan 23, 2009 at 08:52:45AM -0700, Brad Nicholes wrote:
 
 Are we finished hashing this whole patch out yet?

haven't seen many comments from other testers of the simplified patch,
but considering that it has been included already in the 3.1.1 stable
package from Gentoo x86, I'd assume it is hashed out already.

Fedora and Debian are also testing patches for their packages AFAIK.

 Are we ready to apply the current patch to 3.1.2 and release or is there
 still more discussion going on?

guess it depends on how you define "current patch" as the backported
patch has still one hunk that was originally meant to be for gmetad's
multi request proposed feature that is still under discussion and hasn't
been committed yet (a second hunk was reverted already as it showed a
regression in the web frontend while testing the proposed Fedora package
update that was using it).

in any case, to avoid further delays (even if IMHO not ideal, but better
than the current situation), I committed the backported patch in r1959 for
ganglia-3.1.

also committed r1960 to make the newly introduced feature (returning an
empty response instead of the full tree if the request to the interactive
port is invalid) consistent.

Carlo



Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Carlo Marcelo Arenas Belon
On Sun, Jan 18, 2009 at 11:22:27AM +0800, Spike Spiegel wrote:

 the comment should be removed since the +1 is there:
 
 + /* +1 not needed as q-p is already accounting for that */
 + element = malloc(len + 1);

Committed revision 1950
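
For readers following the +1 discussion: when copying a path element of len
characters, the allocation needs len + 1 bytes so there is room for the
terminating NUL.  A minimal sketch of that pattern follows; it is not the
code from gmetad's server.c, and dup_path_element is a name made up for this
example (p and q mirror the pointer names in the quoted comment).

```c
#include <stdlib.h>
#include <string.h>

/* Copy the first element of `path` (the text up to the next '/' or the
 * end of the string, after an optional leading '/') into a newly
 * allocated, NUL-terminated string.  The caller frees the result.
 * Returns NULL if the allocation fails. */
char *dup_path_element(const char *path)
{
    const char *p = path;
    const char *q;
    size_t len;
    char *element;

    if (*p == '/')
        p++;
    q = strchr(p, '/');
    len = (q != NULL) ? (size_t)(q - p) : strlen(p);

    element = malloc(len + 1);   /* len characters plus the terminating NUL */
    if (element == NULL)
        return NULL;
    memcpy(element, p, len);
    element[len] = '\0';
    return element;
}
```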

 other than that looks good to me.

could you check the simplified one?, this problem was introduced in
2003 and therefore affects all versions of ganglia since then (including
2.5.7 which is not supported anymore and that will need to be patched by
the users of it which include Debian/Ubuntu, Novell/OpenSuSE and
probably others).

 Two things:
 1) How has this been tested? I did some myself and got to wonder how
 you guys did it, do you have any standardized approach?

sadly there is no test suite associated with ganglia code and therefore
there is no standardized approach other than applying the patch and
banging the resulting binary to see if it works reliably.

 2) you mention backports to 3.1 and then move on to 3.1.2, what about
 3.0? Some of us (quite a few?) are still running 3.0 and afaik kostas
 already applied the patch to that branch and ran some tests (and so
 did I - and server.c hasn't changed for a long time so it should be
 indeed a safe operation)

it will be included in 3.0.8 as well.

Carlo



Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Carlo Marcelo Arenas Belon
On Tue, Jan 13, 2009 at 11:41:19PM +0800, Spike Spiegel wrote:
 
 === DoS attacks
 1) Given REQUESTLEN=2048, and 3 characters to be the minimum to craft a valid
 and nonexistent path /x, with the above feature implemented it would be
 possible to trigger 2048/3 calls to process_path which would possibly lead to
 CPU overload.

this is not handled by any of the provided patches but since processing
is aborted as soon as the path is considered invalid, the depth of the
path is not relevant for CPU or bandwidth utilization

 2) extension to 1) - as it is ganglia returns the entire tree if an element is
 not found. with large trees 2048/3 requests could easily result in several GBs
 of data being transferred. Related to this if you look at gmetad/server.c
 lines 601:606 you'll see this:
  err_msg("Got a malformed path request from %s", remote_ip);
  /* Send them the entire tree to discourage attacks. */
  strcpy(request, "/");
 which leads to the same scenario as above.

the amount of data returned is not dependent on the depth of the path
because it will always be the full XML tree (once).

 What I propose is that for both cases, malformed request and non existent
 items, we log an error and bail out. This would solve 2) and most of 1) making
 the call possibly exit far quicker.

the proposed solution will result in a truncated XML which will then fail to
be parsed in the client and in an obscure error like "unable to write
XML tree info".

agree that returning the whole tree isn't the best way to signal a
syntax error, but returning a truncated XML will be more difficult to
handle on the client side as, depending on the implementation used, it
will fail to even load, with an exception.

because the connection to the client is getting severed when the request is
malformed it will also show strange errors like "unable to write root
preamble (DTD, etc)" or "Connection reset by peer" in the client.

Carlo



Re: [Ganglia-developers] patches for: [Sec] Gmetad server BoF and network overload + [Feature] multiple requests per conn on interactive port

2009-01-18 Thread Carlo Marcelo Arenas Belon
On Sun, Jan 18, 2009 at 09:53:32PM +0800, Spike Spiegel wrote:
 On Sun, Jan 18, 2009 at 7:35 PM, Carlo Marcelo Arenas Belon
 care...@sajinet.com.pe wrote:
  other than that looks good to me.
 
  could you check the simplified one?, this problem was introduced in
  2003 and therefore affects all versions of ganglia since then (including
  2.5.7 which is not supported anymore and that will need to be patched by
  the users of it which include Debian/Ubuntu, Novell/OpenSuSE and
  probably others).
 
 apologies but I lost you there, what do you mean with the simplified one?

http://bugzilla.ganglia.info/cgi-bin/bugzilla/attachment.cgi?id=188&action=view

that should apply cleanly to 3.1.1, 3.0.7 and 2.5.7

 Is that what you meant when you said "banging the resulting binary"?

partially; scripts would be able to help only after the testing
parameters had been defined, and at least for this test might be limited
by the fact that the interactive port is mainly used by the web
frontend.

Carlo



Re: [Ganglia-developers] cygwin build stuck on libintl

2009-01-12 Thread Carlo Marcelo Arenas Belon
On Fri, Jan 09, 2009 at 01:41:09PM -, daniel.poc...@barclayscapital.com wrote:
 One thing I notice about README.WIN is that it doesn't tell me which
 sections of the Cygwin setup to look in for each dependency, I realise
 some of them are obvious, but it would save time listing them for those
 of us who don't like clicking around the Cygwin setup GUI:
 
 apr1
 expat
 diffutils (Utils)
 python (Python)
 sharutils (Archive)
 sunrpc (Libs)
 bison (Devel)
 flex (Devel)
 libtool (Devel)

Committed revision 1943

 If there is a way to run the Cygwin setup.exe and request automatic
 installation of these packages from the command line, that would be
 useful to show in README.WIN

not that I am aware of, but using the Full View allows you to get an
alphabetically ordered list of packages which helps.

 configure gets stuck on libconfuse (libintl dependency issue)
 
 $ cd confuse-2.6
 $ make clean && ./configure && make && make install
  builds successfully 
 $ cd ../ganglia-trunk
 $ ./bootstrap
 $ ./configure --with-libconfuse=/usr/local --enable-static-build
  various messages 
 Checking for confuse
 Added -I/usr/local/include to CFLAGS
 Added -L/usr/local/lib to LDFLAGS
 checking for cfg_parse in -lconfuse... no
 Trying harder including gettext
 checking for cfg_parse in -lconfuse... no
 Trying harder including iconv
 checking for cfg_parse in -lconfuse... no
 libconfuse not found

all these extra checks were added to work around problems in libconfuse
when using a gettext implementation external to libc, but they are
obviously not able to work around cygwin if NLS is enabled.

since this is a problem in libconfuse, better disable NLS support in
it as it is not needed anyway.

 libiconv, gettext and gettext-devel are definitely installed in Cygwin

that is the problem.

updated the documentation to explicitly disable NLS support even if
those packages (which are not in the list of what is needed for this
reason) were installed and found at configure time.

Carlo



Re: [Ganglia-developers] Ganglia web cluster/grid name patch.

2009-01-11 Thread Carlo Marcelo Arenas Belon
On Fri, Jan 09, 2009 at 04:29:34PM -0500, Jason A. Smith wrote:
 Here is another patch.

Committed revision 1940.

Made some slight changes to keep indentation as is currently being used
in the affected files and convert tabs into spaces.

Carlo



Re: [Ganglia-developers] Web templates subversion access.

2009-01-11 Thread Carlo Marcelo Arenas Belon
On Sat, Jan 10, 2009 at 06:03:42PM -0500, Jason A. Smith wrote:
 On Sat, 2009-01-10 at 11:25 +, Carlo Marcelo Arenas Belon wrote:
  On Thu, Jan 08, 2009 at 04:13:13PM -0500, Jason A. Smith wrote:
  
   Recently I started testing the svn version of the web scripts and found
   a few bugs.
  
  could you elaborate on the bug this patch is fixing?, the code from
  trunk and the other 2 active branches seems similar enough to consider
  this a problem also outside of trunk.
 
 I have not tested nor looked at any other trunks unfortunately.  I have
 a working 3.0.7 installation, and this problem does not exist there.

AFAIK, it doesn't exist in trunk either as I am unable to reproduce it.
could it be that some patch you are adding is triggering the behaviour?

   Fixed graph zooming and make sure the default summary graph size
   overrides the size selected for the cluster graphs.
  
  graphargs should contain (in the host view) all host specific parameters
  that are needed to create the graphs like h (hostname), r (range) and
  st (start time) but doesn't have z (size) and therefore by moving it to the
  beginning of the generated URL all it changed is the order used, at
  least for the report graphs which are the ones that are being patched.
 
 The problem I noticed was mainly in the cluster view, when selecting a
 specific metric to look at or the size, it would also change the size of
 the top summary graphs in addition to the lower host graphs.  I assume
 this is an unintended consequence of probably a few patches interacting
 together since previous versions of ganglia did not do this, but I did
 not bother to track it down.

if a z variable ever gets into graphargs that would most likely break
the code the way you describe, but as I mentioned before it doesn't seem
to be doing that, or at least it doesn't seem to be doing that in a
clean checkout from trunk that I'd been testing for this bug.

 I think the problem also occurred in the
 host view, but I don't really remember.  The meta view already had the
 graph variable placed first in the arg list, so patching in this way
 also makes all three main graph views work the same.  The only change,
 as you say is the order of the graphargs to force the medium size to
 override what is in the variable.

I agree that the way it is coded is fragile and your patch is
just making all the references more consistent by changing the order,
but as I argue below, I think that relying on the variable to be
overridden and the order of the variables to be of significance (as your
patch suggests) is the wrong approach to solving the problem.

In any case, at least for now, Committed revision 1942.

  for the metrics graph, the order does (sadly) make a difference as the
  zoom relies on having z redefined to large through the template, but
  the patch doesn't apply to that section of the code.
 
 I am not sure what you mean here, the patch does apply to the zoom
 feature, since it does touch the graph image links also.  In addition to
 the summary graphs at the top being affected, I noticed that zooming was
 also broken.

I was commenting on the way the template for host view is constructed
and in the fact that for the metric graphs, the position for graphargs
was important as it shows z=medium as part of graphargs and then
relies in the template to override that with z=large for the zoom to
work.

Your patch changes the way the URLs for the report graphs are being
generated so it matches the way the ones for the metric graphs are
generated, but doesn't change the code there as it is already working,
even if in a hacky way.

  could we instead remove the hardcoded values and manage the URLs in a
  way that makes them not dependent on the order of the parameters so that
  variables are overridden?
 
 Possibly, this was just the easiest fix that I thought of though, since
 it keeps the graph args variable in the list, so they can share things
 like the time range, so they don't have to be managed separately, just
 the order was changed, to act like an override.

In host_view.php:54, graphargs is defined for each metric to have a
hardcoded size medium and that is then overridden in the HREF through
the template so that the link used when the graph is clicked has :

  c=$cluster&h=$hostname...z=medium...z=large

My argument was that it would IMHO be better if the graph size were
configurable and both sizes for the graph detached from the
graphargs variable, so that the code won't have to rely on a variable
being overridden or on the order of the arguments as it does now.

Carlo


Re: [Ganglia-developers] Web templates subversion access.

2009-01-10 Thread Carlo Marcelo Arenas Belon
On Thu, Jan 08, 2009 at 04:13:13PM -0500, Jason A. Smith wrote:

 Recently I started testing the svn version of the web scripts and found
 a few bugs.

could you elaborate on the bug this patch is fixing?, the code from
trunk and the other 2 active branches seems similar enough to consider
this a problem also outside of trunk.

 Fixed graph zooming and make sure the default summary graph size
 overrides the size selected for the cluster graphs.

graphargs should contain (in the host view) all host specific parameters
that are needed to create the graphs like h (hostname), r (range) and
st (start time) but doesn't have z (size) and therefore by moving it to the
beginning of the generated URL all it changed is the order used, at
least for the report graphs which are the ones that are being patched.

for the metrics graph, the order does (sadly) make a difference as the
zoom relies on having z redefined to large through the template, but
the patch doesn't apply to that section of the code.

could we instead remove the hardcoded values and manage the URLs in a
way that makes them not dependent on the order of the parameters so that
variables are overridden?

Carlo



Re: [Ganglia-developers] problems with cygwin build

2009-01-04 Thread Carlo Marcelo Arenas Belon
On Sun, Jan 04, 2009 at 06:17:56PM -0800, Jacob Gladish wrote:
 I'm trying to build the trunk from svn and getting errors generating the 
 configure script.

there is a file with instructions called README.SVN and a script that
does the bootstrapping for you.

haven't checked for a while in cygwin but it should work fine; if it
doesn't you might want to do the bootstrapping in Linux and generate a
release package instead.

 It looks like this is some autoconf version issue. Am I missing something?

most likely automake

Carlo



Re: [Ganglia-developers] [Ganglia-svn] SF.net SVN: ganglia:[1925] trunk/monitor-core

2008-12-07 Thread Carlo Marcelo Arenas Belon
On Mon, Dec 01, 2008 at 12:19:25PM -0700, Brad Nicholes wrote:
 I don't understand what you are trying to do with this patch.

as explained in the commit message (I apologize if it wasn't clear
enough), it corrects the definition of modules so that it is properly
tagged as a section that can appear multiple times in the configuration,
as per:

  http://www.nongnu.org/confuse/manual/confuse_8h.html#a3

of course, I also adjusted all in-tree code to use the correct syntax
to retrieve all module definitions and that way avoid any problems.

 Once libconfuse has finished reading and parsing the entire
 configuration file along with all of the individual modules sections,
 it automatically consolidates them into a single section.

since the modules section contains only other subsections (in this case
the module section, which is defined as appearing multiple times), all
those subsections end up linked to the first modules section created,
which is then accessible with a call to cfg_getsec(modules).

the problem with that, of course, is that we are then relying on an
unintended side effect of how the configuration structure is created,
and that will break if another non-section configuration is added later.

I'd also argue that if all modules sections are to be collapsed anyway,
wouldn't it be better to get rid of the modules configuration and just
list all modules as part of the root?

 There is no need to try to scan individual modules sections.

if the configuration section is defined as appearing multiple times, then
a call to cfg_getsec will only return one of the instances.

 This code was working correctly as it was.  Please revert this patch.

seems it was reverted already in r1931, so added some documentation of
the latent problems in r1933 until the compatibility issues raised could
be resolved.

either implementation will work for the current setup but if you are to
reconsider don't forget to revert r1933 as well.

fixing any external module that would have problems looking at the
module list (most likely useful for script handlers) shouldn't be
that difficult IMHO.

Carlo



Re: [Ganglia-developers] cpu_report and Windows/Cygwin nodes

2008-11-15 Thread Carlo Marcelo Arenas Belon
On Thu, Nov 13, 2008 at 08:57:13AM -, [EMAIL PROTECTED] wrote:
 We've tweaked some of the reports on an older version of the ganglia-web
 package so that they work with Windows nodes.

you mean a broken image, just a warning in the web log because the
rrdgraph command is somehow broken by the missing data, or the fact that
the report is bogus?

 For example, cpu_report
 needs to test if the file cpu_nice.rrd exists, and use a modified
 rrdgraph command if there is no cpu_nice.rrd

cpu_nice should exist and be reported in windows (even if it might be
bogus or a 0 value like in load)
 
 I was about to start updating these patches to trunk, but I'm curious
 about what else is going on in this area - is any similar work in the
 pipeline?  Is there any imminent plan to overhaul this part of the code,
 making such work un-necessary?

All work we do is publicly available in trunk, and the only work that has been
done AFAIK was released with 3.1.0 and was the modular graph framework.

Reporting always assumes that all metrics are available for all nodes and that
they are all relevant and somehow equivalent. As you mentioned before, that
wasn't really a safe assumption considering multiple platforms, the
introduction of platform-specific and custom metrics in 3.0 and now, with
3.1, the introduction of modular pluggable metrics.

As you said, there is IMHO an imminent need for an overhaul of this part of
the code and your proposal might be at least a start.

Carlo



Re: [Ganglia-developers] spec file from trunk, jscalendar

2008-11-15 Thread Carlo Marcelo Arenas Belon
On Wed, Nov 12, 2008 at 05:07:53PM -, [EMAIL PROTECTED] wrote:
 
 Just a quick note, ganglia.spec.in needs to have calendar.php added to
 the list of files for web, see below:

Committed revision 1921

 Also, what is the plan for packaging jscalendar?  Should it be packaged
 independently and deployed elsewhere, with a symbolic link created by
 the ganglia-web RPM, or should this code become part of Ganglia SVN, as
 proposed for other third party dependencies?

Timothy's proposal made it optional, so it might make more sense to have
it detached from the ganglia web frontend RPM.

This will also help with the already convoluted licensing mix that the web
frontend code has as jscalendar is LGPL.

TemplatePower is GPLv2+ and the rest would seem to be MIT, so adding LGPL
to the mix might not be the best way to try to help some user/distribution to
figure out what their legal rights are.

Carlo

PS. the glue code with jscalendar could be made a little more robust as well
from what I recall when reviewing this code originally for inclusion.



Re: [Ganglia-developers] ganglia-web: custom start and finish times in trunk

2008-11-15 Thread Carlo Marcelo Arenas Belon
On Wed, Nov 12, 2008 at 02:26:22AM +, Jesse Becker wrote:
 Anecdotally, I can say that the web frontend for trunk should work
 with 3.1.x

considering that the XML interface between the web frontend and gmetad
hasn't changed, it should work even with a 2.5 gmetad or older (up to maybe
the introduction of the interactive port)

 actually may even work with 3.0.x.  This isn't really
 supported though.

if by not supported you mean that it is not being tested then I agree; but
there shouldn't be any reason to recommend against running it in that
configuration, especially considering that there should be an upgrade path to
3.1 from 3.0 and 2.5 that will require this interface to be stable.

Carlo



Re: [Ganglia-developers] 3.1.x branch whitespace cleanup for gmond.conf

2008-11-07 Thread Carlo Marcelo Arenas Belon
On Wed, Nov 05, 2008 at 04:21:57PM -0800, Bernard Li wrote:
 
 While trying to upgrade my current 3.1.0 installation of Ganglia with
 the 3.1.1.1901 + spoofing RPMs I just built, I noticed that gmond.conf
 was created as gmond.conf.rpmnew.

 This doesn't surprise me much, as I know that there were minor
 configuration changes between 3.1.0 and 3.1.x branch.

there were changes from 3.1.0 to 3.1.1 because of the removal of the
module path as described in the release notes, and also changes from 3.1.1
to 3.1.2 because of the removal of support for clusterless gmond and
adding the option to filter out the extra metric metadata.

 However, when I ran diff, the result was that the *entire* file was
 completely different.

diff -b (assuming the version of diffutils from CentOS 4 supports that)
will let you see the relevant changes that will need porting.

 I double checked this and found out it is the same between 3.1.1 and
 3.1.1.1901 (i.e. the entire 3.1.1 gmond.conf file was different from
 3.1.1.1901 gmond.conf file).
 
 Turns out, it was because of this recent backport in 3.1.x branch:
 
 http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1882
 
 Is this really necessary?  While in general I am in favour of
 whitespace cleanup, in this case I think since it impacts usability, I
 think we should punt this until later.

doing it later just defers the problem, and since every release will most
likely have configuration changes that require updates, you are not
solving anything.

if the concern is really about usability then `gmond -r` could be extended
to aid in configuration migrations; but considering that this functionality
was broken in 3.1 until recently for migrations from 2.5 (fixes will be
released with 3.1.2), and that migrations from 3.0 were being handled by
some hacky patch and manual steps, all that should be needed in that case is
to update the procedures to use the right flags.

Carlo



Re: [Ganglia-developers] Comment on gmetric backport proposal in 3.1.x branch

2008-11-07 Thread Carlo Marcelo Arenas Belon
On Fri, Nov 07, 2008 at 02:52:45PM -0800, Bernard Li wrote:
 
 On Fri, Nov 7, 2008 at 2:41 PM, Carlo Marcelo Arenas Belon
 [EMAIL PROTECTED] wrote:
 
  Care to explain what your comment means?
 
  are you serious?, if so will take some time later in the night to reply with
 
 Yes I am.  Otherwise I wouldn't send the email to begin with.

I see, and seems you keep confusing my email address with the one from the
ganglia developers list.

considering how many people are going to have to read through this email or
at least delete it, even if it wasn't directed to them, this seems like a
waste of our collective energy, so I apologize in advance to everyone for
this reply.

 In case you need clarification on my question, I am confused about this:
 
 patched generated files
 
 Did you mean:
 
 The patch generated files

no.

 or
 
 Generated files were patched

yes; so it seems you were not that confused after all.

 Please explain what these generated files are.

a generated file is a file that is generated through some process (like a
compiled binary is generated by processing source code with a compiler).

in this case the files you patched (gmetric/cmdline.h, gmetric/cmdline.c,
mans/gmetric.1) are not meant to be patched directly because they are
generated through a process, as is clearly explained in the header of
each of them:

$ head gmetric/cmdline.h 
/** @file cmdline.h
 *  @brief The header file for the command line option parser
 *  generated by GNU Gengetopt version 2.22
 *  http://www.gnu.org/software/gengetopt.
 *  DO NOT modify this file, since it can be overwritten
 *  @author GNU Gengetopt by Lorenzo Bettini */

$ head gmetric/cmdline.c 
/*
  File autogenerated by gengetopt version 2.22
  generated with the following command:
  /usr/local/bin/gengetopt --input ./cmdline.sh 

  The developers of gengetopt consider the fixed text that goes in all
  gengetopt output files to be in the public domain:
  we make no copyright claims on it.
*/

and my favorite :

$ head mans/gmetric.1 
.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.36.
.TH GMETRIC "1" "March 2008" "gmetric" "User Commands"
.SH NAME
gmetric \- manual page for Ganglia Custom Metric Utility
.SH SYNOPSIS
.B gmetric
[\fIOPTIONS\fR]...
.SH DESCRIPTION
The Ganglia Metric Client (gmetric) announces a metric
on the list of defined send channels defined in a configuration file

the cleanup I refer to is to ensure that the changes are made in the original
sources, so that your changes won't be blown away next time, and then to
regenerate the files as was meant to be.

will also see if some Makefile rules could be added so that rebuilding them is
as simple as making an RPM, and we could prevent this kind of issue and
waste of energy in the future.

definitely not bad for being your first commit ever to gmetric, so feel free
to backport it, so our users will be able to use it instead of getting an
answer like :

  http://www.mail-archive.com/[EMAIL PROTECTED]/msg04159.html

which for some strange reason never had a reply and got Filippo in the right
direction as he had no more problems with ganglia since.

Carlo



Re: [Ganglia-developers] Spoofing patch...

2008-11-06 Thread Carlo Marcelo Arenas Belon
On Tue, Nov 04, 2008 at 06:13:47PM -0700, Brad Nicholes wrote:
 Carlo,
 In the STATUS file you commented that the spoofing patch needs more work.

my comment was about the spoofing feature needing more work in 3.1 as you
pointed out below.

the patch just makes the problem bigger by adding several changes (more than
500 lines) on top of an implementation that has open regressions and that
will need to be cleaned up first IMHO.

 Can you explain what work needs to be done.  Other than supporting the
 short commandline for spoofing a heartbeat, AFAICT everything is working
 as it should.

current 3.1, without the additional patch is missing the heartbeat spoofing
support as you pointed out, and is also exporting the spoofed data in the
XML which even if mostly harmless is a change of behaviour and should be
cleaned up as well (as it really adds no value and is inconsistent)

as for 3.1 with the patch added, there are the following problems :

* the added calls to toupper() in libgmond could result in undefined
  behaviour on platforms where char is signed and toupper is implemented
  using an array lookup (NetBSD, and probably HPUX and AIX)

* the proposed patch has some additional patches that will need to be
  added on top of it (some of them already proposed and approved like
  the modpython fixes, but some others still not even proposed)
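
The toupper() concern in the first item can be sketched as follows. This is
a minimal illustration and not the libgmond code: upcase_in_place is a
hypothetical helper. The C standard only defines toupper() for EOF and for
values representable as unsigned char, so a negative plain char (any byte
above 0x7f where char is signed) has to be cast before the call:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from libgmond): upper-case a string in place.
 * The cast to unsigned char keeps the argument inside the range that
 * toupper() is defined for, even where plain char is signed, so an
 * array-lookup implementation never gets a negative index. */
static void upcase_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        *s = (char)toupper((unsigned char)*s);
    }
}
```

Calling toupper(*s) directly would pass a negative int for such bytes, which
is where the undefined behaviour comes from.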

 I have committed a spoofing example python module in trunk that spoofs
 cpu_util, boottime, heartbeat and osname for three imaginary machines.
 This spoof example module runs under both trunk and the patched 3.1.x.

Sadly I haven't been able to make it work, with `gmond -m` showing :

# gmond -m
 (module python_module)
load_one        One minute load average (module load_module)
...

and python modules in general not working anymore (linked against amd64's
python 2.5.2 in Linux)

the C interface seems to work fine at least in 3.1 (in trunk it messes up
GMOND_STARTED as explained before)

 I have also created a patch for sending a heartbeat through the shortened
 commandline which has been proposed as a follow-on backport.

saw that, not sure about the needed dependency in the modular spoofing
support as IMHO a change to gmetric should be independent of that.

 If there is nothing else missing, can we get this one backported?

with the availability of a proposed backport patch that fixes the conflicts
and an example python spoofing module it should be easier to do so, but as I
pointed out before I am not yet sure enough to stamp a vote on it; I do agree
we should get this patch/feature released with the next release.

Carlo

PS. can the full list of patches needed for the backport be added to the list?
I suspect r1615 is missing, as that should be required for r1622, which I
added and which is included in your consolidated patch



Re: [Ganglia-developers] backport proposal: mcast_if support in gmond (BUG140)

2008-10-29 Thread Carlo Marcelo Arenas Belon
On Tue, Oct 28, 2008 at 12:42:28PM -0600, Brad Nicholes wrote:
 * libganglia: mcast_if support in gmond (BUG140)
 
 http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg03775.html
  
 http://ganglia.svn.sourceforge.net/viewvc/ganglia?view=rev&revision=1734
 +1: carenas
 -1: bnicholes
 bnicholes: The patch appears to be dependant on rev. 1478 in trunk which
   is not included in the backport proposal.
 carenas: no, otherwise would had been included; see BUG140 for patch
 
 Carlo,
 I don't understand your response.

Agreed, it is a little terse, as it was constrained by being in the STATUS
file, which is again a reason why I suggested we discuss the patches on the
list instead: email is a far more expressive medium and IMHO better suited to
interactive discussions than commits to a file.

Thanks for bringing this to the list, so we will hopefully have a faster way
to come to a conclusion on this issue.

  Patch 1734 makes a call to the function join_mcast() which doesn't exist 
 unless patch 1478 is also backported.

no; that is an svn merge problem which results in a conflict because, as you
said, in trunk that function has been renamed.

the code in patch 1734 itself that is added doesn't call or depend on that
function at all and so the resolution for that conflict is to keep the
intended change from the patch, and preserve the function name (which is
actually from a nearby function even if svn seems to think otherwise)

to simplify testing, and to provide a patch that can be applied directly to a
3.1 branch with the conflict already resolved, BUG140 was updated with
a patch file.

the attachment patch itself can be retrieved from :

  http://bugzilla.ganglia.info/cgi-bin/bugzilla/attachment.cgi?id=176&action=view

 So if you backport r1734 without also backporting r1478, the gmond code will 
 be left unbuildable with an unresolved function call to a non-existent 
 join_mcast() function.  Apparently r1478 renamed this function from 
 mcast_join() to join_mcast().

that shouldn't be an issue if using the patch provided, which has the right
resolution for the svn merge conflict and avoids pulling in this other
unrelated change, which is actually part of another backport proposal that
is not meant for this merge window at least.

Carlo



Re: [Ganglia-developers] spoofing heartbeats with gmetric broken in 3.1

2008-10-29 Thread Carlo Marcelo Arenas Belon
On Tue, Oct 28, 2008 at 12:25:18PM -0600, Brad Nicholes wrote:
  On 10/28/2008 at 3:39 AM, in message [EMAIL PROTECTED], Carlo
 Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
 
  2) is spoofing healthchecks really needed?, considering that the last update
 from the spoofed host will be updated anyway by the metric report?

the use here of healthcheck is incorrect, the issue is for heartbeats as
detailed in the subject.

 The health check needs to be there mainly so that the heartbeat metric shows 
 up for the spoofed box in the XML.

every time a spoofed metric is sent, the REPORTED value for the host that is
being spoofed will be updated, which is AFAIK the whole point of the
heartbeat message anyway.

the REPORTED value is tied to the host and not to any specific metric which
is where the term heartbeat metric doesn't really fit for a model where the
spoofing is a METRIC attribute instead.
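
The host-versus-metric distinction above can be sketched like this. It is a
simplified, hypothetical model (the struct and function names are made up and
are not gmond's), assuming only what is described above: REPORTED belongs to
the host and is refreshed by any metric received for it. The max_age check
is likewise just an assumption for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model: one record per host, real or spoofed. */
struct host_entry {
    const char *name;
    time_t reported;   /* last time anything was heard for this host */
};

/* Hypothetical handler: any metric received on behalf of a host, spoofed
 * or not, refreshes the host-level timestamp; the metric name plays no
 * role in that update. */
static void host_metric_received(struct host_entry *host,
                                 const char *metric_name, time_t now)
{
    assert(host != NULL && metric_name != NULL);
    host->reported = now;
}

/* Assumed liveness rule: something was reported within max_age seconds. */
static int host_is_alive(const struct host_entry *host, time_t now,
                         time_t max_age)
{
    return (now - host->reported) <= max_age;
}
```

In such a model a separate heartbeat metric adds nothing to the timestamp
that a regular spoofed metric doesn't already provide.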

 Once the module spoofing functionality has been accepted for backport, I have 
 an example python module that spoofs the base information such as heartbeat, 
 location, boottime, etc.  By just adding this module, you get all of the 
 spoofed base metrics.

interesting; could that be the reason why while testing gmetric spoofing in
trunk the GMOND_STARTED value was apparently getting updated?

haven't yet tracked that bug, as I wanted to focus on the 3.1 code first, but
that is also a regression as it will prevent anyone from identifying which
hosts are being spoofed through it (which was one of Yemi's concerns when this
was introduced around 3.0.4)

  3) even if using some METADATA with the metric code to indicate the 
  SPOOF_HOST between gmetric and gmond is that EXTRA_ELEMENT needed in the 
  gmond XML?
 
 I'm not sure I understand the question.  The EXTRA_ELEMENT XML tag is used 
 because spoofing is an extension to the standard metric data just like TITLE 
 and GROUP.

right; before there was no XML interface because spoofing was being done at
the XDR level, but my concern was directed at why the EXTRA_ELEMENT for
SPOOF_HOST was visible from the XML exported from gmond when it has been
already processed and it is indeed redundant.

it is also strange IMHO that the SPOOF_HEARTBEAT doesn't show if the intention
was to keep those EXTRA_ELEMENT in gmond after they were processed.

 The only way to do without the EXTRA_ELEMENT tag would be to rework the 
 standard tags to include some kind of spoofing attribute.

there are already standard XDR tags for spoofing (at least in 3.0) which could
most likely be reused for this if needed, but then I am confused about what the
rationale was for using the EXTRA_ELEMENT tag in 3.1 instead, if using XDR was
possible after the XDR refactoring.

Carlo



[Ganglia-developers] spoofing heartbeats with gmetric broken in 3.1

2008-10-28 Thread Carlo Marcelo Arenas Belon
Greetings,

while looking at the spoofing code in gmetric I noticed that the implementation
by Yemi, which allowed sending a spoofed heartbeat by running :

  gmetric -S ${IP}:${NAME} -H

is now considered invalid (since r882) and the implementation in trunk
is dropping the metric if -H is used to indicate a healthcheck should be
spoofed with :

  gmetric -n ${METRIC} -v ${VALUE} -t ${TYPE} -S ${IP}:${NAME} -H

considering this is a regression (even if I have to admit I am not sure how
serious, as spoofing is not something I've used other than for testing its
code), it would be great if someone who knows this feature better could
answer the following :

1) other than breaking some scripts by no longer supporting the format used
   in 3.0, is the longer format needed by 3.1 sufficient? (except of course
   it is odd to drop the information about the metric used just because
   -H is also included as in trunk)

2) is spoofing healthchecks really needed?, considering that the last update
   from the spoofed host will be updated anyway by the metric report?

3) even if using some METADATA with the metric code to indicate the SPOOF_HOST
   between gmetric and gmond is that EXTRA_ELEMENT needed in the gmond XML?

Carlo



Re: [Ganglia-developers] backport proposal for graph statistics, bugzilla 206

2008-10-18 Thread Carlo Marcelo Arenas Belon
On Sat, Oct 18, 2008 at 01:14:32PM -0400, Jesse Becker wrote:
 Just a note that I've added a backport proposal for bugzilla ID#206
 into the 3.1.x branch STATUS file.

great, since Timothy did most of the work for it, I am sure he will be
interested in commenting on it, so I am inlining the patch below for discussion

 I've
 consolidated several patches from trunk into a single patch, and
 posted it.  Please review, test and vote.

was just looking at that and I have to admit I am not sure why a consolidated
patch that diverges from trunk would be needed.

couldn't a merge of all relevant patches in trunk be used for the
backport? if there are a few minor textual changes, why not include them as
well to avoid later conflicts when trying to merge further stuff from
trunk?

if the changes that were skipped were not good for 3.1, then they are not good
for trunk either, and that could be fixed with further patches in trunk that
could then be added to the list of patches for backport to 3.1.

Carlo
---
Index: graph.d/mem_report.php
===
--- graph.d/mem_report.php  (revision 1868)
+++ graph.d/mem_report.php  (working copy)
@@ -30,30 +30,44 @@
 $rrdtool_graph['vertical-label'] = 'Bytes';
 $rrdtool_graph['extras'] = '--rigid --base 1024';
 
-$series = DEF:'mem_total'='${rrd_dir}/mem_total.rrd':'sum':AVERAGE 
-.CDEF:'bmem_total'=mem_total,1024,* 
-.DEF:'mem_shared'='${rrd_dir}/mem_shared.rrd':'sum':AVERAGE 
-.CDEF:'bmem_shared'=mem_shared,1024,* 
-.DEF:'mem_free'='${rrd_dir}/mem_free.rrd':'sum':AVERAGE 
-.CDEF:'bmem_free'=mem_free,1024,* 
-.DEF:'mem_cached'='${rrd_dir}/mem_cached.rrd':'sum':AVERAGE 
-.CDEF:'bmem_cached'=mem_cached,1024,* 
-.DEF:'mem_buffers'='${rrd_dir}/mem_buffers.rrd':'sum':AVERAGE 
-.CDEF:'bmem_buffers'=mem_buffers,1024,* 
-
.CDEF:'bmem_used'='bmem_total','bmem_shared',-,'bmem_free',-,'bmem_cached',-,'bmem_buffers',-
 
-.AREA:'bmem_used'#$mem_used_color:'Memory Used' 
-.STACK:'bmem_shared'#$mem_shared_color:'Memory Shared' 
-.STACK:'bmem_cached'#$mem_cached_color:'Memory Cached' 
-.STACK:'bmem_buffers'#$mem_buffered_color:'Memory Buffered' ;
+$fmt = '%.1lf';
 
+$series = 'DEF:mem_total=${rrd_dir}/mem_total.rrd:sum:AVERAGE' 
+. 'CDEF:bmem_total=mem_total,1024,*' 
+. 'DEF:mem_shared=${rrd_dir}/mem_shared.rrd:sum:AVERAGE' 
+. 'CDEF:bmem_shared=mem_shared,1024,*' 
+. 'DEF:mem_free=${rrd_dir}/mem_free.rrd:sum:AVERAGE' 
+. 'CDEF:bmem_free=mem_free,1024,*' 
+. 'DEF:mem_cached=${rrd_dir}/mem_cached.rrd:sum:AVERAGE' 
+. 'CDEF:bmem_cached=mem_cached,1024,*' 
+. 'DEF:mem_buffers=${rrd_dir}/mem_buffers.rrd:sum:AVERAGE' 
+. 'CDEF:bmem_buffers=mem_buffers,1024,*' 
+. 
'CDEF:bmem_used=bmem_total,bmem_shared,-,bmem_free,-,bmem_cached,-,bmem_buffers,-'
 
+. 'AREA:bmem_used#$mem_used_color:Used' 
+. 'GPRINT:bmem_used:AVERAGE:$fmt%S' 
+. 'STACK:bmem_shared#$mem_shared_color:Shared' 
+. 'GPRINT:bmem_shared:AVERAGE:$fmt%S' 
+. 'STACK:bmem_cached#$mem_cached_color:Cached' 
+. 'GPRINT:bmem_cached:AVERAGE:$fmt%S\\l' 
+. 'STACK:bmem_buffers#$mem_buffered_color:Buffered' 
+. 'GPRINT:bmem_buffers:AVERAGE:$fmt%S' ;
+
 if (file_exists($rrd_dir/swap_total.rrd)) {
-$series .= DEF:'swap_total'='${rrd_dir}/swap_total.rrd':'sum':AVERAGE 

-.DEF:'swap_free'='${rrd_dir}/swap_free.rrd':'sum':AVERAGE 
-.CDEF:'bmem_swapped'='swap_total','swap_free',-,1024,* 
-.STACK:'bmem_swapped'#$mem_swapped_color:'Memory Swapped' ;
+$series .= 'DEF:swap_total=${rrd_dir}/swap_total.rrd:sum:AVERAGE' 
+. 'DEF:swap_free=${rrd_dir}/swap_free.rrd:sum:AVERAGE' 
+. 'CDEF:bmem_swapped=swap_total,swap_free,-,1024,*' 
+. 'STACK:bmem_swapped#$mem_swapped_color:Swapped' 
+. 'GPRINT:bmem_swapped:AVERAGE:$fmt%S\\g' 
+. 'CDEF:bswap_total=swap_total,1024,*' 
+. 'GPRINT:bswap_total:AVERAGE:/$fmt%S\\g' 
+. 'CDEF:swap_util=swap_total,swap_free,-,swap_total,/,100,*' 
+. 'GPRINT:swap_util:AVERAGE: ($fmt%%)\\l' ;
 }
 
-$series .= LINE2:'bmem_total'#$cpu_num_color:'Total In-Core Memory' ;
+$series .= 'LINE2:bmem_total#$cpu_num_color:Total In-Core' ;
+$series .= 'GPRINT:bmem_total:AVERAGE:$fmt%S' 
+.  'CDEF:util=bmem_total,bmem_free,-,bmem_total,/,100,*' 
+.  'GPRINT:util:AVERAGE:($fmt%% Real Memory Used)\\l' ;
 
 $rrdtool_graph['series'] = $series;
 
Index: graph.d/load_report.php
===
--- graph.d/load_report.php (revision 1868)
+++ graph.d/load_report.php (working copy)
@@ -19,7 +19,7 @@
$hostname = strip_domainname($hostname);
 }
 
-

Re: [Ganglia-developers] backport proposal for graph statistics, bugzilla 206

2008-10-18 Thread Carlo Marcelo Arenas Belon
On Sat, Oct 18, 2008 at 01:49:37PM -0400, Jesse Becker wrote:
 On Sat, Oct 18, 2008 at 13:33, Carlo Marcelo Arenas Belon
 [EMAIL PROTECTED] wrote:
  On Sat, Oct 18, 2008 at 01:14:32PM -0400, Jesse Becker wrote:
 
  couldn't just a merge from all relevant patches in trunk be used for
  backport?, if there are few minor textual changes why not include them as
  well to avoid having later conflicts when trying to merge further stuff from
  trunk?

in the bug report you wrote the following :

Updated patch from trunk back to 3.1.x branch.  Consolidates most of r1844,
r1848, r1850, r1856, and r1857.

There are a few minor textual changes from the listed revisions not included
in this patch

 Because I figured it would be easier to review and test applying one
 patch, instead of the 4-5 that it takes otherwise.  The single patch
 was produced directly from a diff against trunk; no new code is
 included.

I am not asking why 1 patch was provided (which I agree is really useful
for testing) but why the STATUS change doesn't instead list all patches
that are needed (at least from my initial merge attempts based on your
instructions I think 1847 is missing)

  if the changes that were skipped were not good for 3.1, then they are not
  good for trunk either and that could be fixed with further patches in trunk
  that then could be added to the list of patches from 3.1 for backport.
 
 Nothing was skipped.  In fact, given that new stuff goes into trunk
 *before* it goes into 3.1, I don't really see how it could have been
 skipped.  As I said, there's no new code here.

as explained before, I never said that new code was being proposed; on the
contrary, my point was that some changes were probably missing, which in the
long run will result in a divergence between trunk and 3.1 that will affect
future merges.

Carlo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
___
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers


Re: [Ganglia-developers] [Ganglia-svn] SF.net SVN: ganglia:[1825]trunk/monitor-core/gmond/gmond.c

2008-10-04 Thread Carlo Marcelo Arenas Belon
On Thu, Sep 25, 2008 at 08:15:05AM -0600, Brad Nicholes wrote:
 [EMAIL PROTECTED] wrote:
 if (strcasecmp(cb->name,  metric_info[i].name) == 0) 
   {
  -  sprintf (modular_desc, "%s (module %s)", 
  metric_info[i].desc, cb->modp->module_name);
  +  snprintf (modular_desc, sizeof(modular_desc),
  +"%s (module %s)",
  +metric_info[i].desc,
  +cb->modp->module_name);
  +
 desc = (char*)modular_desc;
 break;
   }
 
 When copying into the buffer, shouldn't the length be sizeof(modular_desc)-1
 rather than the full length of the buffer?

the length is the maximum allowed number of bytes that will be written in the
buffer so sizeof(modular_desc) is a more natural fit since otherwise the
buffer will be artificially restricted by 1 byte.

 It needs to allow for a NULL terminator.

snprintf is defined by C99 and I don't have a specification handy, but the
man page for it in Linux says:

  The  functions  snprintf()  and  vsnprintf()  write  at most size bytes
   (including the trailing null byte ('\0')) to str.

so the NULL terminator should be included in the length requested, and a quick
test with gcc 4.1.2 and the following code shows that it is indeed terminating
the buffer and truncating the result as needed to do so.

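#include <string.h>
#include <stdio.h>

#define BUFFSIZE 4

int main(int argc, char *argv[])
{
  char buffer[BUFFSIZE];
  char *source = "test";
  int n;

  /* fill the buffer so there is no terminator before the call */
  for (n = 0; n < BUFFSIZE; n++)
    buffer[n] = 'A';
  /* bounded print, since buffer is not null-terminated yet */
  printf("%.*s\n", BUFFSIZE, buffer);
  snprintf(buffer, sizeof(buffer), "%s", source);
  printf("%s\n", buffer);
  /* last byte of the buffer, expected to be the terminator */
  printf("%d\n", buffer[BUFFSIZE - 1]);

  return (0);
}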

to avoid truncation modular_desc needs to be large enough, but it is currently
1024 bytes long, which is most likely big enough already (if anything,
probably too big)
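
If truncation ever needs to be detected rather than just tolerated, the return
value of snprintf can be checked. What follows is only a sketch, assuming C99
semantics (the return value is the length the full string would have had);
format_desc and its arguments are made-up names for illustration and are not
part of gmond.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of gmond): builds "desc (module name)"
 * into buf, passing the full buffer size to snprintf().
 * Returns 1 if the whole string fit, 0 if it was truncated or an
 * error occurred. With a non-zero size, C99 snprintf() always leaves
 * buf null-terminated.
 */
static int format_desc(char *buf, size_t size,
                       const char *desc, const char *module)
{
  int n = snprintf(buf, size, "%s (module %s)", desc, module);

  /* n is the length that would have been written without the limit */
  return (n >= 0 && (size_t)n < size);
}
```

A caller would pass sizeof(modular_desc) as the size, the same way the patch
does, and could log a warning when the function returns 0.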

Carlo



Re: [Ganglia-developers] gmond native Windows binary

2008-10-04 Thread Carlo Marcelo Arenas Belon
On Thu, Sep 25, 2008 at 10:14:14PM +0100, [EMAIL PROTECTED] wrote:
 
 I am aware of mingw32 - however, a Cygwin environment provides autoconf
 and friends.

there is also a similar environment for mingw called msys, but there is
no reason to use it (nor would I recommend it) when you can just as well
install gcc-mingw in cygwin and have both.

 Do you propose:
 a) running configure on Linux and then mingw on Windows, or
 b) running mingw32 within a Linux host to cross-compile for Windows, or 
 c) is there a complete way to build directly from a fresh SVN checkout
 on Windows using the mingw32 tools and native autotools executables?

not sure what you meant here, but as you explained before ./configure
works in cygwin using mingw :

  # CC="gcc -mno-cygwin" ./configure

and you can also cross-compile from linux using mingw if inclined, and last
I checked you could also bootstrap trunk in cygwin if you wanted to and
configure/build libmetrics either as a cygwin library or as a native windows
library.
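
Code that has to build both ways usually needs to know which flavor it is
being compiled for. A rough sketch, assuming the usual gcc predefined macros
(__CYGWIN__ for cygwin builds, __MINGW32__ for mingw builds such as the
-mno-cygwin one above, _WIN32 for other native Windows compilers);
build_flavor is a made-up name and not something that exists in libmetrics.

```c
/*
 * Illustrative only: reports which environment the compiler is
 * targeting, based on macros predefined by the toolchain.
 * build_flavor() is a made-up name, not a Ganglia function.
 */
static const char *build_flavor(void)
{
#if defined(__CYGWIN__)
  return "cygwin";   /* built against the cygwin runtime */
#elif defined(__MINGW32__)
  return "mingw";    /* native Windows build, e.g. gcc -mno-cygwin */
#elif defined(_WIN32)
  return "windows";  /* some other native Windows compiler */
#else
  return "unix";     /* anything else */
#endif
}
```

The cygwin check comes first so that a cygwin build is never mistaken for a
native one, whatever other macros happen to be defined.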

 Regarding the C++ issues and Posix thread issues: is there any intention
 to back port this to 3.1.x, or is there another release that these are
 targetted for?  I'm not too worried either way, I just want to be able
 to focus my efforts in the right versions.

the C++ issues and some of the fixes needed to build libmetrics as a native
windows library have been proposed for backport into 3.1 for some time already
but have no votes yet. I don't see any reason to keep them only on trunk, but
in any case all development happens in trunk, so that is where you have to
focus anyway. I would imagine this might be material for 3.2 once all the
pieces are in place, but since fixing DSO support for windows is in the
TODO list for 3.1, some of that code will have to be backported anyway.

no fixes have been implemented yet for the Posix thread vs native thread
issue, but it has been talked about several times before and I remember
you mentioning at least once, in one of the last threads, that you had done
work with some library that could be used to abstract the differences; I
haven't yet read any specifics I could share and I am too lazy to search
the mailing list for links.

 Regarding libConfuse: can you refer me to any previous postings on the
 issue of libConfuse static build on Windows?  Is this a limitation that
 originates upstream?

the libConfuse limitations come from upstream, which is why the only viable
solution for now is to link it statically (as is done in cygwin), unless the
upstream version gets fixed and we then rely on that not yet existent
version.

 I'm not a big fan of the srclib concept myself, but I don't see why
 there shouldn't be snapshots of essential dependencies in another part
 of the SVN repository, not directly under the Ganglia sub-tree.

because then you would need to pull all the pieces by hand to build it, and
you would never have a standalone package (snapshot or release) that
could be used.

in any case having the build use a srclib provided libconfuse statically
if none is available or when instructed by doing --with-libconfuse=internal
or something like that is far better than any alternative AFAIK.

 Also, I remember seeing something about problems running dynamically
 linked metric modules on Windows - is that the case?  If so, is it
 something that can be overcome if someone is willing to work on it?  The
 APR docs seem to suggest that it is intended to support Windows DLLs.

If building a native windows version, linked to a windows APR, maybe, but
not if building inside cygwin, as APR will then try to use the UNIX code and
fail.

this is IMHO an APR deficiency which shows also when trying to for example
build apache using DSO inside cygwin.

there is also the problem that in windows (like in AIX) all objects must be
resolved at link time and sadly our build process is not clean enough to
ensure that yet (even if it works in Linux and other platforms like
Solaris/BSD where the dynamic linker does lazy binding to cover for our build
process deficiencies)

Carlo



Re: [Ganglia-developers] Small bug fix for ganglia.spec

2008-10-01 Thread Carlo Marcelo Arenas Belon
On Tue, Sep 30, 2008 at 11:20:47PM +0200, Ulf wrote:
 Can you just add the missing %defattr(-,root,root,-) to the ganglia.spec.

Committed revision 1842.

Carlo


