Thanks,
This is exactly what I was looking for.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 08, 2002 6:54 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: RE: [Oscar-users] Tuning info on OSCAR Cluster Documentation ?
Vinod,
A widely used measure of performance for supercomputers is the number of
floating point operations per second (flops) achieved for a real 64-bit
precision matrix LU factorization and solve problem. This may be done
on an MPI machine using the scalable linear algebra package ScaLAPACK, put
together by several institutions including the University of Tennessee,
Knoxville. The software package can be downloaded from many different
sites, but the main site is:
http://www.netlib.org/scalapack/
Two principal routines in ScaLAPACK, PDGETRF (double precision
factorization) and PDGETRS (double precision solve), are used for this
measure on a 32-bit precision machine. The speed of the factorization
and solve depends on the parameters chosen for the problem. The
package includes test drivers that let you vary these parameters,
including the size of the matrix factored and the number of
right-hand sides to solve.
For your own comparison, a Cray X-MP in 1987 could factor and solve a
1000x1000 matrix with 1000 right-hand sides at approximately 220 Mflops
(million floating point operations per second).
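To make the arithmetic behind a figure like this concrete, here is a
minimal sketch. It assumes the conventional leading-order operation
counts (about 2/3 n^3 for the LU factorization and 2 n^2 per right-hand
side for the solve); the ScaLAPACK test drivers may count operations
slightly differently, and the function names are made up for
illustration.

```python
def lu_solve_flops(n: int, nrhs: int) -> float:
    """Approximate flop count for an n x n LU factorization plus nrhs solves.

    Assumption: leading-order counts only (lower-order terms ignored).
    """
    factor = (2.0 / 3.0) * n ** 3   # factorization step (what PDGETRF does)
    solve = 2.0 * n ** 2 * nrhs     # forward + back substitution (PDGETRS)
    return factor + solve


def mflops(n: int, nrhs: int, seconds: float) -> float:
    """Achieved rate in Mflops, given a measured wall-clock time."""
    return lu_solve_flops(n, nrhs) / seconds / 1.0e6


def seconds_at(n: int, nrhs: int, rate_mflops: float) -> float:
    """Run time implied by a sustained rate in Mflops."""
    return lu_solve_flops(n, nrhs) / (rate_mflops * 1.0e6)


if __name__ == "__main__":
    n, nrhs = 1000, 1000
    print("flops:", lu_solve_flops(n, nrhs))
    print("seconds at 220 Mflops:", seconds_at(n, nrhs, 220.0))
```

Under this count, the 1000x1000 problem with 1000 right-hand sides is
roughly 2.7 billion operations, so 220 Mflops corresponds to about 12
seconds. Timing the same problem on your cluster and calling mflops()
gives a number you can compare directly.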
Paul
Paul Harten, Ph.D.
Team Leader - P2Tools Design & Development
Industrial Multimedia Branch
Sustainable Technology Division
National Risk Management Research Laboratory
U.S. Environmental Protection Agency
26 West Martin Luther King Drive
Cincinnati, Ohio 45268
T: 513-569-7045
F: 513-569-7471
E: [EMAIL PROTECTED]
From: VINOD <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]ceforge.net
Sent: 07/08/2002 04:59 AM
To: Kyndig Renshai <[EMAIL PROTECTED]>
Cc: oscar users <[EMAIL PROTECTED]>
Subject: RE: [Oscar-users] Tuning info on OSCAR Cluster Documentation ?
Thank you Ren,
What I understood from this is that performance tuning happens at
two different levels.
1. The parallel program that runs on the cluster. In fact I have
found a number of documents on fine tuning pvmpov with skyvase.pov.
2. The cluster itself. Could you give some info on how to tune my
Beowulf cluster? I suppose this will include individual tuning of
PBS, Maui, PVM, MPICH, etc. Is there any single utility or test
program for measuring the performance? If not a single one, where
can I find the individual tools for each package?
Thanks Again,
Vinod.
-----Original Message-----
From: Kyndig Renshai [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 04, 2002 7:53 PM
To: VINOD
Subject: Re: [Oscar-users] Tuning info on OSCAR Cluster Documentation ?
Ok you're going about this all wrong ... always research your
problem before implementing it ... I'm no expert so bear with me
while I try to say what I understand ...
1. A cluster is a group of computers that are linked by a logical
structure (it's a software solution for connectivity and
communication). That's all OSCAR does. OSCAR clusters use
schedulers (PBS and Maui) to launch batch jobs. Batch jobs are not
necessarily parallel, just compute intensive. Schedulers try to
optimize the turnaround time for these types of jobs by allocating
them to different members of the cluster to get them finished as
quickly as possible.
2. pvmpov is a parallel pgm. Think of it this way - it has a
master coordinator program on the master node that splits up the
job into smaller parts (divide and conquer algorithm) and passes
them off to the other processors. When each processor (or node)
is done, it passes its portion of the job back to the
coordinator, which collates the data to create the final
solution.
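The split / hand off / collate cycle described above can be sketched
in a few lines. This is only an illustration of the pattern: threads
on one machine stand in for cluster nodes, the "work" is a placeholder
(squaring numbers instead of rendering pixels), and none of the names
come from pvmpov or PVM.

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, n_parts):
    """Coordinator, step 1: cut the job into contiguous chunks."""
    size = max(1, -(-len(rows) // n_parts))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def render_chunk(chunk):
    """Worker: placeholder for the real work done on one chunk."""
    return [x * x for x in chunk]


def run_job(rows, n_workers):
    """Coordinator: split, hand chunks to workers, collate the results."""
    chunks = split(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in chunk order, so collation is a
        # simple concatenation.
        partials = list(pool.map(render_chunk, chunks))
    return [value for part in partials for value in part]


if __name__ == "__main__":
    print(run_job(list(range(10)), n_workers=3))
```

In a real PVM or MPI program the hand-off and the return trip are
explicit messages over the network, which is where the channel and
latency issues discussed in this thread come in.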
Your parallel pgm is what needs the tuning. I'm not saying that
clusters cannot be optimized.
Your parallel pgm (this particular style of processing -
coordinator/subordinate) is limited by a few factors:
the speed of the processors (if you mix processor speeds on the
nodes, for instance, the slowest processor can slow down the
overall throughput, unless the coordinator has algorithms to
load balance ...)
the channel speed - channel bandwidth and number of hops to the
coordinator all play a role in latency. (But we're talking local
clusters here.) There are a few models used in getting data out to
the nodes, and each has its own limitations. Whether it's totally
distributed, peer to peer, or some mixture (where there's messaging
between coordinator and nodes, node to node, or both) affects what
goes on in the channel and hence latency, and so the overall time
to complete the job.
For something like pvmpov, granularity (the size of the chunks being
sent into the channel) matters: the smaller the pieces, the more
sends; the larger the pieces, the more latency may increase
(queueing theory).
pvmpov is not load balancing either - so if you have a mix of
processors on the nodes, this might increase the turnaround
time, since the job waits on the slowest processor. Synchronous
sends, for instance (wait until all processors are done doing a
particular part of the mosaic of the job before continuing to the
next part), will also determine the overall time to completion of
the problem.
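A toy model makes the effect of mixed processor speeds and chunk size
easier to see. The sketch below is an assumption-laden simulation, not
a description of how pvmpov actually schedules work: it compares
handing out equal shares up front (no load balancing) with handing
the next chunk to whichever node is free first, and it lets you add a
fixed per-chunk overhead to stand in for send latency. All names and
numbers are hypothetical.

```python
import heapq


def static_makespan(speeds, total_work, n_chunks, overhead=0.0):
    """Equal shares dealt out round-robin up front (no load balancing)."""
    chunk_work = total_work / n_chunks
    counts = [0] * len(speeds)
    for c in range(n_chunks):
        counts[c % len(speeds)] += 1
    # The job is done only when the slowest node finishes its share.
    return max(k * (overhead + chunk_work / s) for k, s in zip(counts, speeds))


def dynamic_makespan(speeds, total_work, n_chunks, overhead=0.0):
    """Next chunk always goes to the node that becomes free first."""
    chunk_work = total_work / n_chunks
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_chunks):
        t, i = heapq.heappop(free_at)         # earliest-free node
        t += overhead + chunk_work / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


if __name__ == "__main__":
    speeds = [1.0, 1.0, 0.5]  # two fast nodes, one at half speed
    print("static :", static_makespan(speeds, 12.0, 12))
    print("dynamic:", dynamic_makespan(speeds, 12.0, 12))
    print("dynamic, coarse chunks:", dynamic_makespan(speeds, 12.0, 3))
```

With two full-speed nodes and one half-speed node, 12 units of work
in 12 chunks take 8.0 time units with equal shares and 5.0 with
first-free assignment. Cutting the same work into only 3 chunks brings
the first-free case back to 8.0, because the slow node again holds a
third of the job. Raising the overhead parameter shows the opposite
pressure: very small chunks pay the per-send cost many times.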
There is documentation on how parallel pgms work - start with
google.com; beowulf.org is also the first place to look, so
check out the few educational sources there. Ya I know you're not
quite interested in the academics of parallel pgmming and more
interested in how to optimize the hardware and OSCAR in
particular.
Understand the problem in both domains. For instance - if you
look at Condor (condor.wisc.edu or do a google) - you'll get a
completely different notion of how a cluster can be put together,
and a secondary notion of how clusters should work - it's not about
hardware so much as it is a logical configuration notion. All you
really want is for your pgm to be able to use the resources of
other computers.
There just happen to be a few problems with allowing this
kinda access across an entire network - Condor addresses these.
Also take a look at grid computing (www.globus.org) and for an
implementation grid-in-a-box (www.ncsa.uiuc.edu) (should be in
their downloads section).
OSCAR creates a generic cluster - it only sets up the most basic
infrastructure for cluster behavior of the beowulf style, using
component type integration to create the product. As you can see
by the many questions concerning problems, component based
software creation can be challenging - especially if the
designers do not also publish an architectural and design document
that aids the user in figuring out how and why things do the
things they do. The pgm is aimed at network administrators who
understand linux and shell scripting all of which I'm sure you're
probably pretty familiar with.
I hope this helps.
Ren
VINOD wrote:
Dear All,
I am a newbie, but I have been trying to get OSCAR working on my
cluster for the last few weeks. Finally I could set one up
successfully. Thanks to the detailed documentation provided along
with the software. Hats off to the team who worked on this.
I have a small request regarding the documentation. Nowhere does it
mention how to tune my cluster for better throughput.
As I posted earlier, I have done a benchmark with POVRAY
(skyvase.pov), and I got a rendering time of 8 seconds.
Fine. But how will I know that I have a good performer on my hands?
How do I tune it for better throughput? What are the
parameters to be tuned and taken care of?
Could somebody throw some light on these?
Thanks
Vinod.