[Wien] Which mpi?

2008-01-18 Thread Masato YOSHIYA
Dear all:

My comments below are based not only on Wien2k but also on other 
ab-initio/first-principles codes, which consume relatively more memory 
and need fewer iterations than, say, molecular dynamics codes.

 With the introduction of an mpi benchmark (thanks Peter), I would like
 to start a thread on this which would help me and perhaps others. Some
 questions:
 1) Has anyone tested Intel's mpi to see how much better (if at all) it is?

In our case, three points need to be evaluated for this question: (1) 
stability, (2) the procedure for launching a run, and (3) speed.

When I tried Intel's MPI with another pseudo-potential code, I saw no 
remarkable difference on any of the three points above, and I have never 
heard opinions strongly in favor of Intel's over MPICH1, MPICH2, LAM, or 
OpenMPI. One advantage of Intel's over the free ones is that you receive 
support if you buy it.

 2) Is there much difference between mpich-1 and mpich-2?

In my experience, the difference between mpich-1 and mpich-2 is smaller 
than the statistical error, provided you do not hit bugs specific to a 
particular version. The exception is (2), the launch procedure: MPICH-1 
and OpenMPI do not require any daemon to be started prior to the actual 
parallel run, while MPICH-2 and LAM require a daemon to be booted before 
a parallel run is executed. On the other hand, with the ones that require 
the daemon, you can kill all processes of a parallel run safely.
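
To make the difference in launch procedure concrete, here is a minimal 
sketch that lists the shell commands typically issued for each 
implementation. The host-file names, process counts and program name are 
placeholders, not taken from any real setup, and the exact options can 
differ between versions, so check the documentation of your installation.

```python
# Sketch only: typical launch sequences for the MPI flavours compared above.
# hostfile, program and the counts are hypothetical placeholders.

def launch_commands(impl, nprocs, hostfile, program, nhosts=1):
    """Return the shell commands, in order, for one parallel run."""
    if impl == "mpich1":
        # No daemon: a single mpirun call is enough.
        return [f"mpirun -np {nprocs} -machinefile {hostfile} {program}"]
    if impl == "openmpi":
        # No daemon has to be started by hand.
        return [f"mpirun -np {nprocs} --hostfile {hostfile} {program}"]
    if impl == "mpich2":
        # MPICH-2 1.0.x: boot the mpd ring, run, then shut the ring down.
        return [
            f"mpdboot -n {nhosts} -f {hostfile}",
            f"mpiexec -n {nprocs} {program}",
            "mpdallexit",
        ]
    if impl == "lam":
        # LAM: boot the LAM daemons, run, then halt them.
        return [
            f"lamboot {hostfile}",
            f"mpirun -np {nprocs} {program}",
            "lamhalt",
        ]
    raise ValueError(f"unknown MPI implementation: {impl}")


if __name__ == "__main__":
    for impl in ("mpich1", "openmpi", "mpich2", "lam"):
        print(impl)
        for cmd in launch_commands(impl, 4, "hosts", "./a.out", nhosts=2):
            print("   ", cmd)
```

The daemon-based flavours need three steps instead of one, which is the 
only practical difference described above.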

I'm (still) using mpich-1 (ver. 1.2.6) and mpich-2 (ver. 1.0.5p4), 
depending on, well, the weather of the day.

 3) Is there much effect for 1) and 2) with ethernet versus myrinet or
 infiniband?

Sorry, I have no idea.

 4) Should one use rsh/ssh or something different for multiple CPU's on
 one computer?

If you execute a parallel run using mpiXX, you need to use either rsh 
or ssh, even if you are only using the other cores/CPUs of the same 
computer. However, configuring the routing table to use loopback rather 
than the NIC to reach the machine's own other CPUs greatly reduces the 
loss in communication speed.

Hope this helps.

Looking forward to hearing others' opinions on Wien2k, since I have only 
a little experience with parallel Wien2k.

Masato


[Wien] strange time using -it switch

2008-01-18 Thread Yongsheng Zhang

I think I am using $SCRATCH. In my .cshrc file, I have the line,
setenv SCRATCH ./

This machine is a shared memory machine with 4 CPUs in one node. Since
the communication between nodes is slow, I only use one node, in k-point
parallel mode, NOT MPI parallel.
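
For readers unfamiliar with this setup, here is a small sketch that 
writes the kind of .machines file usually used for k-point parallel 
runs on a single node. The layout (one "1:host" line per parallel job, 
plus the granularity and extrafine lines) is an assumption based on the 
usual WIEN2k conventions, not something quoted in this thread, and the 
host name and CPU count are examples only.

```python
# Sketch only: build a .machines file for k-point parallelism on one node.
# The file layout is assumed from common WIEN2k usage; verify it against
# the user's guide of your WIEN2k version.

def kpoint_machines(host, ncpu, granularity=1, extrafine=True):
    """Return .machines content with one k-point job per CPU on `host`."""
    lines = [f"granularity:{granularity}"]
    # One "weight:host" line per job; weight 1 means equal speed.
    lines += [f"1:{host}" for _ in range(ncpu)]
    if extrafine:
        lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: the 4-CPU shared memory node mentioned above.
    print(kpoint_machines("localhost", 4), end="")
```

With ncpu=4 this gives four "1:localhost" lines, so four lapw1/lapw2 
jobs run side by side, matching the four LAPW1 END / LAPW2 END lines in 
the log below.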

The same thing happens on our IBM Linux cluster, with 2 CPUs in one
node: the -it switch only works on the line without $para. It is not a
shared memory machine, and I use ssh for parallelization. Moreover, on
this machine the -it switch runs into another problem: the first full
diagonalization iteration is fine, and there is enough memory for the
calculation, but when it switches to -it in the second iteration, after
copying case.vector into case.vector_old correctly, it reports
insufficient virtual memory.

 LAPW0 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
LAPW2 - FERMI; weighs written
 LAPW2 END
 LAPW2 END
 LAPW2 END
 LAPW2 END
 SUMPARA END
 SUMPARA END
 CORE  END
 MIXER END
 LAPW0 END
forrtl: severe (41): insufficient virtual memory
Image      PC        Routine   Line     Source
lapw1  08548873  Unknown   Unknown  Unknown
lapw1  08547E93  Unknown   Unknown  Unknown
lapw1  0850C80E  Unknown   Unknown  Unknown
lapw1  084DBFB8  Unknown   Unknown  Unknown
lapw1  084F8832  Unknown   Unknown  Unknown
lapw1  08098779  Unknown   Unknown  Unknown
lapw1  08091A14  Unknown   Unknown  Unknown
lapw1  08055F8C  Unknown   Unknown  Unknown
lapw1  0807832E  Unknown   Unknown  Unknown
lapw1  0804EA59  Unknown   Unknown  Unknown
libc.so.6  400BE210  Unknown   Unknown  Unknown
lapw1  0804E981  Unknown   Unknown  Unknown
forrtl: severe (41): insufficient virtual memory
Image      PC        Routine   Line     Source
lapw1  08548873  Unknown   Unknown  Unknown
lapw1  08547E93  Unknown   Unknown  Unknown
lapw1  0850C80E  Unknown   Unknown  Unknown
lapw1  084DBFB8  Unknown   Unknown  Unknown
lapw1  084F8832  Unknown   Unknown  Unknown
lapw1  08098779  Unknown   Unknown  Unknown
lapw1  08091A14  Unknown   Unknown  Unknown
lapw1  08055F8C  Unknown   Unknown  Unknown
lapw1  0807832E  Unknown   Unknown  Unknown
lapw1  0804EA59  Unknown   Unknown  Unknown
libc.so.6  400BE210  Unknown   Unknown  Unknown
lapw1  0804E981  Unknown   Unknown  Unknown
forrtl: severe (41): insufficient virtual memory
.

Then I did a test with the -it switch turned off, and the job ran
smoothly.


Thank you very much
Zhang

Peter Blaha wrote:
 Are you using $SCRATCH ?

 Is this a shared memory machine, do you use ssh or rsh for parallelization ?

 execute vec2old_lapw $para on the commandline, eventually add the -x switch
 in the first line of the script.
   


-- 
-
Address:  Fritz-Haber-Institut, Abt. Theorie
  Faradayweg 4-6 D-14195 Berlin (Germany)
Phone:+49 30 8413 4818
Fax:  +49 30 8413 4701
Email:zhang at fhi-berlin.mpg.de
-




[Wien] strange time using -it switch

2008-01-18 Thread Peter Blaha
I can hardly help without more info. Anyway, without a local SCRATCH dir
it should be ok even without $para.

(Execute vec2old_lapw -p on the command line in this subdir.
What do you get? If necessary, change the switches in the first line of
the script to -fx.)


Yes, of course the iterative diagonalization needs some extra memory
(basically two times the vector files + some auxiliary arrays). So when
the full diag. just fits into memory, it is possible that -it will crash.
For such large cases I'd use the mpi-parallel version anyway!
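
As a rough illustration of this estimate, the sketch below adds twice 
the vector file size (plus an optional allowance for auxiliary arrays) 
on top of the full-diagonalization footprint and compares the result 
with the available memory. The function names and all numbers are 
hypothetical; only the "two times the vector files" rule comes from the 
text above.

```python
# Sketch only: back-of-the-envelope memory check for the -it switch.
# All sizes are in MB; the example values are made up.

def iterative_extra_mb(vector_file_mb, aux_mb=0):
    """Extra memory for -it: two times the vector file plus aux arrays."""
    return 2 * vector_file_mb + aux_mb


def fits_in_memory(full_diag_mb, vector_file_mb, available_mb,
                   aux_mb=0, iterative=True):
    """True if the lapw1 job is expected to fit into available memory."""
    needed = full_diag_mb
    if iterative:
        needed += iterative_extra_mb(vector_file_mb, aux_mb)
    return needed <= available_mb


if __name__ == "__main__":
    # Hypothetical case: full diag. just fits, -it does not.
    print(fits_in_memory(1800, 150, 2000, iterative=False))
    print(fits_in_memory(1800, 150, 2000, iterative=True))
```

In this made-up case the full diagonalization needs 1800 MB of 2000 MB, 
while the iterative step would need 1800 + 2*150 = 2100 MB, which is the 
situation where the first iteration runs and the second one crashes.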


-- 

   P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671 FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--