Hi, Bernard:
Today when I was searching for information about "hardcoded limit of
supporting 4 GPUs", I found a booklet named "NVML API REFERENCE MANUAL
Version 3.295.45". In the first chapter, it lists all products the
NVML API supports. Unfortunately, the graphics cards in my cluster are
under limited support. I doubt whether this limitation would stop me from
working ...
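For what it's worth, that question is easy to probe directly. A minimal
sketch, assuming the nvidia-ml-py bindings are installed, which simply asks
NVML how many devices it can see and what they are:

from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetName)

nvmlInit()
try:
    count = nvmlDeviceGetCount()
    print("NVML reports %d GPU(s)" % count)
    for i in range(count):
        handle = nvmlDeviceGetHandleByIndex(i)
        # The name alone does not tell you the support level, but it confirms
        # the device is visible to the library.
        print("  GPU %d: %s" % (i, nvmlDeviceGetName(handle)))
finally:
    nvmlShutdown()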
Hi Md:
Thanks for your email.
You already have my email address so feel free to send me questions there.
I am proposing that students work on the following:
- Update plugin to support new metrics that can be collected by newer versions
of NVML (a rough sketch of what such an addition might look like follows below)
- Update web interface to support summarizing GPU graphs under Host Overview
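To make the first item concrete, here is a sketch of adding one hypothetical
new metric (GPU 0 power draw) through the standard gmond Python module
interface; the metric name, units and callback are illustrative, not the
plugin's actual code:

from pynvml import (nvmlInit, nvmlShutdown,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetPowerUsage)

def gpu0_power_handler(name):
    # NVML reports power in milliwatts; convert to watts for graphing.
    handle = nvmlDeviceGetHandleByIndex(0)
    return int(nvmlDeviceGetPowerUsage(handle) / 1000)

def metric_init(params):
    nvmlInit()
    return [{
        'name': 'gpu0_power_watts',      # hypothetical metric name
        'call_back': gpu0_power_handler,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'W',
        'slope': 'both',
        'format': '%u',
        'description': 'GPU 0 power draw',
        'groups': 'gpu',
    }]

def metric_cleanup():
    nvmlShutdown()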
Dear all:
Just a quick note letting you guys know that we now have a python
module for monitoring NVIDIA GPUs using the newly released Python
bindings for NVML:
https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia
If you are running a cluster with NVIDIA GPUs, please download the module
and give it a try.
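For reference, the kind of per-GPU readings involved can be pulled directly
from the nvidia-ml-py bindings; a minimal sketch (error handling omitted, and
which readings are available depends on the GPU model):

from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetUtilizationRates,
                    nvmlDeviceGetMemoryInfo, nvmlDeviceGetTemperature,
                    NVML_TEMPERATURE_GPU)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        h = nvmlDeviceGetHandleByIndex(i)
        util = nvmlDeviceGetUtilizationRates(h)   # .gpu / .memory, in percent
        mem = nvmlDeviceGetMemoryInfo(h)          # .total / .used / .free, bytes
        temp = nvmlDeviceGetTemperature(h, NVML_TEMPERATURE_GPU)
        print("gpu%d util=%d%% mem_used=%dMB temp=%dC"
              % (i, util.gpu, mem.used // (1024 * 1024), temp))
finally:
    nvmlShutdown()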
... metrics that the management library could detect for your GPU
are collected. For more information on which metrics are supported on which
models, please refer to the NVML documentation.
After following the above procedure and restarting the respective gmond and
gmetad services, I could not get the GPU metrics in Ganglia.
Thanks & Regards,
Hr
I'm trying to implement the instructions given
here
http://developer.nvidia.com/ganglia-monitoring-system
on one of our Rocks 5.4.2 clusters that has 2 GPU
cards in every compute node.
Part #1: Python bindings for the NVML
http://pypi.python.org/pypi/nvidia-ml-py/
This requires Python to be newer than 2.4 - following
Phil's instructions in a recent email, I got
Python 2.7 and 3.x to install, and used that to
get these Python bindings for NVML to install.
I then followed the instructions in 'Ganglia/gmond
python modules' ...
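As a quick sanity check of that step, something along these lines (assuming
the package installs as the pynvml module) confirms the interpreter version
and that NVML initialises:

import sys
print("Python %d.%d.%d" % sys.version_info[:3])

from pynvml import nvmlInit, nvmlShutdown, nvmlSystemGetDriverVersion

nvmlInit()
print("NVML driver version: %s" % nvmlSystemGetDriverVersion())
nvmlShutdown()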
Hi Robert,
sFlow is a very simple protocol - an sFlow agent periodically sends
XDR encoded structures over UDP. Each structure has a tag and a
length, making the protocol extensible.
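As a rough illustration of that tag/length/opaque-bytes pattern (not the
actual sFlow datagram layout), XDR-style big-endian packing looks like this;
the tag value and the counter payload below are made up:

import struct

def xdr_opaque_record(tag, payload):
    """Pack a (tag, length, payload) record, padding the payload to 4 bytes."""
    pad = (4 - len(payload) % 4) % 4
    return struct.pack("!II", tag, len(payload)) + payload + b"\x00" * pad

# Wrap a few fake NVML counters in a record with a made-up tag.
counters = struct.pack("!III", 43, 67, 1500)   # e.g. utilization, temp, MHz
record = xdr_opaque_record(0x1234, counters)
print("%d bytes: %r" % (len(record), record))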
In the short term, it would make sense to define an sFlow structure
to carry the current NVML metrics and tag ...
... on support?
Thanks,
Bernard
On Thursday, July 12, 2012, Robert Alexander wrote:
Hey,
A meeting may be a good idea. My schedule is mostly open next week. When are
others free? I will brush up on sflow by then.
NVML and the Python metric module are tested at NVIDIA on Windows and Linux,
but
Hi Dirk:
On Thursday, 6 March 2014, Dirk Luo wrote:
> To my knowledge, the NVML plug-in provides a variety of GPU metrics.
> With these metrics, the RRDtool/graphite draws graphs as defined by
> the parameters supplied on the command line. The parameters supplied
> are defined in a fil
Hey,
A meeting may be a good idea. My schedule is mostly open next week. When are
others free? I will brush up on sflow by then.
NVML and the Python metric module are tested at NVIDIA on Windows and Linux,
but not within Cygwin. The process will be easier/faster on the NVML side if
we
Hi Praful:
Thanks for your email.
For the GPU project, I propose the following work to be done:
- Update plugin to support new metrics that can be collected by newer versions
of NVML
- Update web interface to support summarizing GPU graphs under Host Overview
- Update web interface to better ...
... "HP group", and the third would be the "Appro group". Each node
within our Linux cluster may have 4, 8 or 16 GPUs. I'm currently using
the NVML Python NVIDIA module to gather various metrics for each GPU on each
of the 500 nodes in our cluster. Therefore, with ...
... metrics.
I take your point about re-using the existing GPU module and gmetric;
unfortunately, I don't have experience with Python. My plan is to write
something in C to export the NVML metrics, with various output options. We will
then decide whether to call this new code from the existing gmond 3.1 via
gmetric, new (if we get it working) gmond 3 ...
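The plan above is to do the exporter in C; purely to illustrate the
gmetric-injection path being described, the same idea looks roughly like this
in Python (gmetric assumed to be on PATH, metric names made up):

import subprocess
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetTemperature,
                    NVML_TEMPERATURE_GPU)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        h = nvmlDeviceGetHandleByIndex(i)
        temp = nvmlDeviceGetTemperature(h, NVML_TEMPERATURE_GPU)
        # One gmetric call per reading; gmond then treats it like any metric.
        subprocess.call(["gmetric",
                         "--name", "gpu%d_temp" % i,
                         "--value", str(temp),
                         "--type", "uint16",
                         "--units", "C"])
finally:
    nvmlShutdown()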
Hi, Bernard:
I am interested in working on the GPU project.
To my knowledge, the NVML plug-in provides a variety of GPU metrics.
With these metrics, the RRDtool/graphite draws graphs as defined by
the parameters supplied on the command line. The parameters supplied
are defined in a file similar to ...
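To illustrate the command-line parameters being referred to, a single rrdtool
graph call might look like the following, assuming a Ganglia-style RRD whose
data source is named "sum" (the path and colour are made up):

import subprocess

rrd = "/var/lib/ganglia/rrds/MyCluster/node01/gpu0_util.rrd"  # hypothetical path
subprocess.call([
    "rrdtool", "graph", "/tmp/gpu0_util.png",
    "--start", "-3600",                 # last hour
    "--title", "GPU 0 utilization",
    "DEF:util=%s:sum:AVERAGE" % rrd,    # read the averaged series
    "LINE2:util#FF0000:gpu0 util",
])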
>> ... Cygwin, my latest errors
>> are parsing gm_protocol_xdr.c. I don't know whether we should follow this
>> up; it would be nice to have a Windows gmond, but my only reason for
>> upgrading is the GPU metrics.
>>
>> I take your point about re-using the existing ...
Ganglia 3.1.7 Apache Web server
===============================
- OS Version: RedHat 5.5
- RRD files all stored on an NFS filesystem
  /nfs/data/ganglia/rrds.
- A single Apache web server running a single gmetad daemon
  which collects data from 4 different clusters (sketched below).
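For context, a gmetad.conf sketch matching that description might look like
the following; the cluster names and hosts are made up, and only the
rrd_rootdir path is taken from above:

data_source "cluster-a" node-a1:8649 node-a2:8649
data_source "cluster-b" node-b1:8649
data_source "cluster-c" node-c1:8649
data_source "cluster-d" node-d1:8649
rrd_rootdir "/nfs/data/ganglia/rrds"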
https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia
https://github.com/ganglia/ganglia_contrib
Longer term, it would make sense to extend Host sFlow to use the
C-based NVML API to extract and export metrics. This would be
straightforward - the Host sFlow agent uses native C APIs
> ... ifconfig
> eth0      Link encap:Ethernet  HWaddr B8:AC:6F:14:20:09
>           inet6 addr: fe80::baac:6fff:fe14:2009/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:30202937055 errors:0 dropped:48 overruns:0 frame:0
>           TX packets: ...