RE: [Oscar-users] Tuning info on OSCAR Cluster Documentation ?

Harten . Paul Mon, 8 Jul 2002 08:25:12 -0500

Vinod,

A widely used measure of performance for supercomputers is the number of
floating point operations per second (flops) achieved for a real 64-bit
precision matrix LU factorization and solve problem.  This may be done
on an MPI machine using the scalable linear algebra package ScaLAPACK,
put together by several institutions including the University of
Tennessee, Knoxville.  The software package can be downloaded from many
different sites, but the main site is:


http://www.netlib.org/scalapack/

Two principal routines in ScaLAPACK, PDGETRF (double precision
factorization) and PDGETRS (double precision solve), are used for this
measure on a machine whose single precision is 32 bits (so that double
precision is 64 bits).  The speed of the factorization and solve depends
on the parameters used in the problem.  The package includes test
drivers that let you vary these parameters, including the size of the
matrix factored and the number of right-hand sides to solve.

For your own comparison, a Cray X-MP in 1987 could factor and solve a
1000x1000 matrix with 1000 right-hand sides at approximately 220 Mflops
(million floating point operations per second).
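
To turn a timing from the test drivers into a flops figure you need an
operation count.  A rough sketch in Python, assuming the usual
leading-order counts (about 2/3 n^3 for the LU factorization and about
2 n^2 per right-hand side for the triangular solves); the function names
are made up for illustration and are not part of ScaLAPACK:

```python
def lu_solve_flops(n, nrhs):
    """Leading-order flop count: LU factorization (2/3 n^3) plus
    forward/back substitution (2 n^2 per right-hand side)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2 * nrhs


def mflops(n, nrhs, seconds):
    """Achieved rate in millions of floating point operations per second."""
    return lu_solve_flops(n, nrhs) / seconds / 1.0e6


# Run time implied by the Cray X-MP figure above (n = 1000, nrhs = 1000).
ops = lu_solve_flops(1000, 1000)
implied_seconds = ops / 220.0e6
print(f"{ops:.3e} flops, about {implied_seconds:.1f} s at 220 Mflops")
```

Time your own factor-and-solve run for the same n and nrhs, pass the
wall-clock seconds to mflops(), and you have a number to set beside the
220 Mflops figure.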

Paul


Paul Harten, Ph.D.
Team Leader - P2Tools Design & Development
Industrial Multimedia Branch
Sustainable Technology Division
National Risk Management Research Laboratory
U.S. Environmental Protection Agency
26 West Martin Luther King Drive
Cincinnati, Ohio 45268

T: 513-569-7045
F: 513-569-7471
E: [EMAIL PROTECTED]


                                                                                       
                                             
From:     VINOD <[EMAIL PROTECTED]>
Sent by:  [EMAIL PROTECTED]ceforge.net
To:       Kyndig Renshai <[EMAIL PROTECTED]>
cc:       oscar users <[EMAIL PROTECTED]>
Subject:  RE: [Oscar-users] Tuning info on OSCAR Cluster Documentation ?
Date:     07/08/2002 04:59 AM




Thank you Ren,
            What I understood from these is that performance tuning
happens at two different levels.
    1. The parallel program that runs on the cluster. In fact I have
found a number of documents on fine tuning pvmpov with skyvase.pov.
    2. The cluster itself. Could you give some info on how to tune my
Beowulf cluster? I suppose this will include individual tuning of PBS,
Maui, PVM, MPICH, etc. Is there any single utility or test program for
measuring the performance? If not, where can I find the individual
tools for each package?

Thanks Again,
        Vinod.
-----Original Message-----
From: Kyndig Renshai [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 04, 2002 7:53 PM
To: VINOD
Subject: Re: [Oscar-users] Tuning info on OSCAR Cluster Documentation ?



      Ok you're going about this all wrong ... always research your
      problem before implementing it ...  I'm no expert so bear with me
      while I try to say what I understand ...


      1. A cluster is a group of computers that are linked by a logical
      structure (it's a software solution for connectivity and
      communication).  That's all OSCAR does.  OSCAR clusters use
      schedulers (PBS and Maui) to launch batch jobs.  Batch jobs are
      not necessarily parallel, just compute intensive.  Schedulers try
      to optimize the turnaround time for these types of jobs by
      allocating them to different members of the cluster to get them
      finished as quickly as possible.


      2. pvmpov is a parallel pgm.  Think of it this way - it has a
      coordinator program on the master node that splits up the job
      into smaller parts (a divide and conquer algorithm) and passes
      them off to the other processors.  When each processor (or node)
      is done, it passes its portion of the job back to the
      coordinator, which collates the data to create the final
      solution.
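
The coordinator/worker pattern described above can be sketched in a few
lines of Python.  This is only an illustration: threads stand in for the
PVM tasks, render_strip() is a made-up placeholder for the real ray
tracing, and none of the names come from pvmpov itself.

```python
from concurrent.futures import ThreadPoolExecutor


def render_strip(strip):
    """Stand-in for a worker rendering image rows [start, stop).
    Returns (start, rows) so the coordinator can restore the order."""
    start, stop, width = strip
    rows = [[(x * y) % 256 for x in range(width)] for y in range(start, stop)]
    return start, rows


def coordinate(height, width, strip_rows, workers):
    """Split the image into strips, farm them out, collate the results."""
    strips = [(y, min(y + strip_rows, height), width)
              for y in range(0, height, strip_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(render_strip, strips))
    image = []
    for _, rows in sorted(results, key=lambda r: r[0]):
        image.extend(rows)          # collate strips back into one image
    return image


image = coordinate(height=10, width=4, strip_rows=3, workers=4)
print(len(image), "rows collated")
```

The strip_rows argument is the granularity knob: it sets how much work
goes out to a worker in each piece.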


      Your parallel pgm is what needs the tuning.  I'm not saying that
      clusters cannot be optimized.


      Your parallel pgm (this particular style of processing -
      coordinator/subordinate) is limited by a few factors:


      the speed of the processors (if you mix processor speeds on the
      nodes, for instance, the slowest processor can slow down the
      overall throughput, unless the coordinator has algorithms to
      load balance ...)
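
To see how much one slow node can cost, here is a toy model in Python.
The task count and the speeds are invented numbers, and both functions
are simplified stand-ins rather than how any real coordinator behaves:
one hands every processor an equal share up front, the other gives the
next task to whichever processor is free first.

```python
import heapq


def static_makespan(n_tasks, speeds):
    """Equal share per processor: finish time is set by the slowest one."""
    share = n_tasks / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(n_tasks, speeds):
    """Each unit-cost task goes to whichever processor becomes free first."""
    # Heap entries: (time the processor becomes free, processor index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)
        t += 1.0 / speeds[i]        # time this processor needs for one task
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


speeds = [1.0, 1.0, 1.0, 0.5]       # hypothetical: one node at half speed
print("static :", static_makespan(120, speeds))
print("dynamic:", dynamic_makespan(120, speeds))
```

With the equal split, the half-speed node holds everyone up; with the
free-first rule the faster nodes pick up the slack.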


      the channel speed - channel bandwidth and the number of hops to
      the coordinator all play a role in latency.  (But we're talking
      local clusters here.)  There are a few models used for getting
      data out to the nodes, and each has its own limitations.  Whether
      the model is totally distributed, peer to peer, or some mixture
      (with messaging between coordinator and nodes, node to node, or
      both) affects what goes on in the channel, and hence the latency
      and the overall time to complete the job.


      For something like pvmpov, granularity (the size of the chunks
      being sent into the channel) matters: the smaller the pieces, the
      more sends; the larger the pieces, the more the latency may
      increase (queueing theory).  pvmpov is not load balancing either,
      so if you have a mix of processors on the nodes, the slowest
      processor may increase the turnaround time.  Synchronous sends,
      for instance (waiting until all processors have finished a
      particular part of the mosaic of the job before continuing to the
      next part), will also determine the overall time to completion of
      the problem.
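
The granularity trade-off can be put into a back-of-the-envelope model.
The sketch below is Python with invented numbers; the cost formula
(a fixed per-message cost plus compute time, chunks dealt out in
synchronous rounds across identical workers) is an assumption made for
illustration, not a measurement of pvmpov.

```python
import math


def run_time(work, chunk, workers, per_message_cost, speed=1.0):
    """Estimated completion time for `work` units split into chunks.

    Every chunk costs one message (per_message_cost) plus its compute
    time.  Chunks go out in rounds, and a round ends only when all
    workers are done.  The last chunk is charged at full size, so this
    slightly overestimates when chunk does not divide work evenly.
    """
    n_chunks = math.ceil(work / chunk)
    rounds = math.ceil(n_chunks / workers)
    return rounds * (per_message_cost + chunk / speed)


for chunk in (1, 10, 100, 1000):
    t = run_time(work=1000, chunk=chunk, workers=8, per_message_cost=0.5)
    print(f"chunk={chunk:5d}  time={t:8.1f}")
```

Very small chunks pay the message cost over and over, while one huge
chunk leaves seven of the eight workers idle; the best setting sits
somewhere in between and depends on your network and your scene.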


      There is documentation on how parallel pgms work - start with
      google.com.  beowulf.org is also a good first place to look;
      check out its educational sources.  Ya I know you're not that
      interested in the academics of parallel pgmming and more
      interested in how to optimize the hardware and OSCAR in
      particular.


      Understand the problem in both domains.  For instance - if you
      look at Condor (condor.wisc.edu or do a google) - you'll get a
      completely different notion of how a cluster can be put together,
      and a secondary notion of how clusters should work - it's not
      about hardware so much as it is about logical configuration.  All
      you really want is for your pgm to be able to use the resources
      of other computers.


      There just happen to be a few problems with allowing this kind
      of access across an entire network - Condor addresses these.
      Also take a look at grid computing (www.globus.org) and, for an
      implementation, grid-in-a-box (www.ncsa.uiuc.edu) (should be in
      their downloads section).


      OSCAR creates a generic cluster - it only sets up the most basic
      infrastructure for Beowulf-style cluster behavior, integrating
      existing components to create the product.  As you can see from
      the many questions about problems, component-based software
      creation can be challenging - especially if the designers do not
      also publish an architecture and design document that helps the
      user figure out how and why things do the things they do.  The
      pgm is aimed at network administrators who understand Linux and
      shell scripting, all of which I'm sure you're probably pretty
      familiar with.


      I hope this helps.


      Ren


       VINOD wrote:
       Dear All,
       I am a newbie, but I have been trying to get OSCAR working on my
       cluster for the last few weeks. Finally I could set one up
       successfully, thanks to the detailed documentation provided
       along with the software. Hats off to the team who worked on
       this.
       I have a small request regarding the documentation. Nowhere does
       it mention how to tune my cluster for better throughput.
       As I posted earlier, I have done a benchmark with POVRAY
       (skyvase.pov) and got a rendering time of 8 seconds.
       Fine. But how will I know that I have a good performer on my
       hands? How do I tune it for better throughput? What are the
       parameters to be tuned and taken care of?

       Could somebody throw some light on these?

       Thanks
       Vinod.












_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
