Re: [OMPI users] problem with .bashrc stetting of openmpi

2010-08-13 Thread Cristobal Navarro
Hello Sunita,

which Linux distribution is this?

On Fri, Aug 13, 2010 at 1:57 AM,  wrote:

> Dear Open-mpi users,
>
> I installed openmpi-1.4.1 in my user area and then set the path for
> openmpi in the .bashrc file as follows. However, I am still getting the following
> error message whenever I start the parallel molecular dynamics
> simulation using GROMACS. So every time I start the MD job, I need to
> source the .bashrc file again.
>
> Earlier, on another machine, I did the same thing and did not have any
> problem.
>
> Could you guys suggest what would be the problem?
>
> .bashrc
> #path for openmpi
> export PATH=$PATH:/home/sunitap/soft/openmpi/bin
> export CFLAGS="-I/home/sunitap/soft/openmpi/include"
> export LDFLAGS="-L/home/sunitap/soft/openmpi/lib"
> export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH
>
> == error message ==
> mdrun_mpi: error while loading shared libraries: libmpi.so.0: cannot open
> shared object file: No such file or directory
>
> 
>
> Thanks for any help.
> Best regards,
> Sunita
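
A likely explanation, though not confirmed in this thread: on many distributions ~/.bashrc starts with a guard that returns immediately for non-interactive shells, so exports placed after that guard are never seen by jobs launched over ssh or by a batch system. A minimal sketch of a guard-safe layout, reusing the paths quoted above:

# ~/.bashrc (sketch only) -- keep the Open MPI environment ABOVE any
# "return if not interactive" guard so ssh/batch shells also pick it up
export PATH=$PATH:/home/sunitap/soft/openmpi/bin
export LD_LIBRARY_PATH=/home/sunitap/soft/openmpi/lib:$LD_LIBRARY_PATH

# typical guard found in stock .bashrc files; everything below this line
# is skipped for non-interactive shells
case $- in
    *i*) ;;
      *) return;;
esac

Alternatively, mpirun's --prefix option (or building Open MPI with --enable-mpirun-prefix-by-default) avoids depending on the remote shell startup files altogether.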


Re: [OMPI users] openMPI shared with NFS, but says different version

2010-08-05 Thread Cristobal Navarro
I have good news.

After updating to a newer kernel on the Ubuntu server nodes, sm is no longer a problem
for the Nehalem CPUs!!!
My older kernel was
Linux 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64
GNU/Linux

and I upgraded to
Linux agua 2.6.32-24-server #39-Ubuntu SMP Wed Jul 28 06:21:40 UTC 2010
x86_64 GNU/Linux

That solved everything.
Gus, maybe the problem you had with Fedora can be solved in a similar way.

We should keep this on record.

regards
Cristobal






On Wed, Jul 28, 2010 at 6:45 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Cristobal Navarro wrote:
>
>> Gus
>> my kernel for all nodes is this one:
>> Linux 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64
>> GNU/Linux
>>
>>
> Kernel is not my league.
>
> However, it would be great if somebody clarified
> for good these issues with Nehalem/Westmere, HT,
> shared memory and what the kernel is doing,
> or how to make the kernel do the right thing.
> Maybe Intel could tell.
>
>
>  at least for the moment i will use this configuration, at least for
>> deveplopment/testing  of the parallel programs.
>> lag is minimum :)
>>
>> whenever i get another kernel update, i will test again to check if sm
>> works, would be good to know that suddenly another distribution supports
>> nehalem sm.
>>
>> best regards and thanks again
>> Cristobal
>> ps: guess what are the names of the other 2 nodes lol
>>
>
> Acatenango (I said that before), and Pacaya.
>
> Maybe: Santa Maria, Santiaguito, Atitlan, Toliman, San Pedro,
> Cerro de Oro ... too many volcanoes, and some are multithreaded ...
> You need to buy more nodes!
>
> Gus
>
>
>>
>>
>> On Wed, Jul 28, 2010 at 5:50 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>Hi Cristobal
>>
>>Please, read my answer (way down the message) below.
>>
>>Cristobal Navarro wrote:
>>
>>
>>
>>    On Wed, Jul 28, 2010 at 3:28 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>   Hi Cristobal
>>
>>   Cristobal Navarro wrote:
>>
>>
>>
>>   On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>  Hi Cristobal
>>
>>  In case you are not using full path name for
>>mpiexec/mpirun,
>>  what does "which mpirun" say?
>>
>>
>>   --> $which mpirun
>>/opt/openmpi-1.4.2
>>
>>
>>  Often times this is a source of confusion, old
>>versions may
>>  be first on the PATH.
>>
>>  Gus
>>
>>
>>   openMPI version problem is now gone, i can confirm that the
>>   version is consistent now :), thanks.
>>
>>
>>   This is good news.
>>
>>
>>   however, i keep getting this kernel crash randomnly when i
>>   execute with -np higher than 5
>>   these are Xeons, with Hyperthreading On, is that a problem??
>>
>>
>>   The problem may be with Hyperthreading, maybe not.
>>   Which Xeons?
>>
>>
>>--> they are not so old, not so new either
>>fcluster@agua:~$ cat /proc/cpuinfo | more
>>processor : 0
>>vendor_id : GenuineIntel
>>cpu family : 6
>>model : 26
>>model name : Intel(R) Xeon(R) CPU   E5520  @ 2.27GHz
>>stepping : 5
>>cpu MHz : 1596.000
>>cache size : 8192 KB
>>physical id : 0
>>siblings : 8
>>core id : 0
>>cpu cores : 4
>>apicid : 0
>>initial apicid : 0
>>fpu : yes
>>fpu_exception : yes
>>cpuid level : 11
>>wp : yes
>>flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>cmov p

Re: [OMPI users] Open MPI C++ class datatype

2010-08-04 Thread Cristobal Navarro
If your class has too many complex attributes,
it might be a good idea to send some sort of string or similar data
representing your class,
and then, on the receiving node, create the object based on that
information.

This works for some types of problems.
Best regards,
Cristobal

On Wed, Aug 4, 2010 at 3:53 AM, Riccardo Murri wrote:

> Hi Jack,
>
> On Wed, Aug 4, 2010 at 6:25 AM, Jack Bryan  wrote:
> > I need to transfer some data, which is C++ class with some vector
> > member data.
> > I want to use MPI_Bcast(buffer, count, datatype, root, comm);
> > May I use MPI_Datatype to define customized data structure that contain
> C++
> > class ?
>
> No, unless you have access to the implementation details of the
> std::vector class (which would render your code dependent on one
> particular implementation of the STL, and thus non-portable).
>
> Boost.MPI provides support for std C++ datatypes; if you want to keep
> to "plain MPI" calls, then your only choice is to use C-style arrays.
>
> Regards,
> Riccardo


Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
Gus
my kernel for all nodes is this one:
Linux 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64
GNU/Linux

At least for the moment I will use this configuration for
development/testing of the parallel programs.
The lag is minimal :)

Whenever I get another kernel update, I will test again to check whether sm
works; it would be good to know if another distribution suddenly supports
Nehalem sm.

best regards and thanks again
Cristobal
ps: guess what are the names of the other 2 nodes lol



On Wed, Jul 28, 2010 at 5:50 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Cristobal
>
> Please, read my answer (way down the message) below.
>
> Cristobal Navarro wrote:
>
>>
>>
>> On Wed, Jul 28, 2010 at 3:28 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>Hi Cristobal
>>
>>Cristobal Navarro wrote:
>>
>>
>>
>>On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>   Hi Cristobal
>>
>>   In case you are not using full path name for mpiexec/mpirun,
>>   what does "which mpirun" say?
>>
>>
>>--> $which mpirun
>> /opt/openmpi-1.4.2
>>
>>
>>   Often times this is a source of confusion, old versions may
>>   be first on the PATH.
>>
>>   Gus
>>
>>
>>openMPI version problem is now gone, i can confirm that the
>>version is consistent now :), thanks.
>>
>>
>>This is good news.
>>
>>
>>however, i keep getting this kernel crash randomnly when i
>>execute with -np higher than 5
>>these are Xeons, with Hyperthreading On, is that a problem??
>>
>>
>>The problem may be with Hyperthreading, maybe not.
>>Which Xeons?
>>
>>
>> --> they are not so old, not so new either
>> fcluster@agua:~$ cat /proc/cpuinfo | more
>> processor : 0
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 26
>> model name : Intel(R) Xeon(R) CPU   E5520  @ 2.27GHz
>> stepping : 5
>> cpu MHz : 1596.000
>> cache size : 8192 KB
>> physical id : 0
>> siblings : 8
>> core id : 0
>> cpu cores : 4
>> apicid : 0
>> initial apicid : 0
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 11
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
>> pse36 clflush dts acpi mmx fxsr sse sse2 ss h
>> t tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
>> xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_
>> cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida
>> tpr_shadow vnmi flexpriority ept vpid
>> bogomips : 4522.21
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 40 bits physical, 48 bits virtual
>> power management:
>> ...same for cpu1, 2, 3, ..., 15.
>>
>>
> AHA! Nehalems!
>
> Here they are E5540, just a different clock speed, I suppose.
>
>
>  information on how the cpu is distributed
>> fcluster@agua:~$ lstopo
>> System(7992MB)
>>  Socket#0 + L3(8192KB)
>>L2(256KB) + L1(32KB) + Core#0
>>  P#0
>>  P#8
>>L2(256KB) + L1(32KB) + Core#1
>>  P#2
>>  P#10
>>L2(256KB) + L1(32KB) + Core#2
>>  P#4
>>  P#12
>>L2(256KB) + L1(32KB) + Core#3
>>  P#6
>>  P#14
>>  Socket#1 + L3(8192KB)
>>L2(256KB) + L1(32KB) + Core#0
>>  P#1
>>  P#9
>>L2(256KB) + L1(32KB) + Core#1
>>  P#3
>>  P#11
>>L2(256KB) + L1(32KB) + Core#2
>>  P#5
>>  P#13
>>L2(256KB) + L1(32KB) + Core#3
>>  P#7
>>  P#15
>>
>>
>>
>>
>>If I remember right, the old hyperthreading on old Xeons was
>>problematic.
>>
>>OTOH, about 1-2 months ago I had trouble with OpenMPI on a
>>relatively new Xeon Nehalem machine with (the new) Hyperthreading
>>turned on,
>>and Fedora Core 13.
>>The machine would hang with the OpenMPI connectivity example.
>>I reported this to the list, you may find in the archives.
>>
>>
>> --i foudn the archives recently about an hour ago, was not sure if it was
>> the same problem but i removed HT for 

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
On Wed, Jul 28, 2010 at 3:28 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Cristobal
>
> Cristobal Navarro wrote:
>
>>
>>
>> On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>Hi Cristobal
>>
>>In case you are not using full path name for mpiexec/mpirun,
>>what does "which mpirun" say?
>>
>>
>> --> $which mpirun
>>  /opt/openmpi-1.4.2
>>
>>
>>Often times this is a source of confusion, old versions may
>>be first on the PATH.
>>
>>Gus
>>
>>
>> openMPI version problem is now gone, i can confirm that the version is
>> consistent now :), thanks.
>>
>>
> This is good news.
>
>
>  however, i keep getting this kernel crash randomnly when i execute with
>> -np higher than 5
>> these are Xeons, with Hyperthreading On, is that a problem??
>>
>>
> The problem may be with Hyperthreading, maybe not.
> Which Xeons?
>

--> they are not so old, not so new either
fcluster@agua:~$ cat /proc/cpuinfo | more
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU   E5520  @ 2.27GHz
stepping : 5
cpu MHz : 1596.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss h
t tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_
cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida
tpr_shadow vnmi flexpriority ept vpid
bogomips : 4522.21
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
...same for cpu1, 2, 3, ..., 15.

information on how the cpu is distributed
fcluster@agua:~$ lstopo
System(7992MB)
  Socket#0 + L3(8192KB)
L2(256KB) + L1(32KB) + Core#0
  P#0
  P#8
L2(256KB) + L1(32KB) + Core#1
  P#2
  P#10
L2(256KB) + L1(32KB) + Core#2
  P#4
  P#12
L2(256KB) + L1(32KB) + Core#3
  P#6
  P#14
  Socket#1 + L3(8192KB)
L2(256KB) + L1(32KB) + Core#0
  P#1
  P#9
L2(256KB) + L1(32KB) + Core#1
  P#3
  P#11
L2(256KB) + L1(32KB) + Core#2
  P#5
  P#13
L2(256KB) + L1(32KB) + Core#3
  P#7
  P#15





> If I remember right, the old hyperthreading on old Xeons was problematic.
>
> OTOH, about 1-2 months ago I had trouble with OpenMPI on a relatively new
> Xeon Nehalem machine with (the new) Hyperthreading turned on,
> and Fedora Core 13.
> The machine would hang with the OpenMPI connectivity example.
> I reported this to the list, you may find in the archives.
>

--I found the archives about an hour ago; I was not sure if it was
the same problem, but I removed HT for testing by setting the online flag
to 0 on the extra CPUs shown by lstopo. Unfortunately it also crashes, so
HT may not be the problem.
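
For the record, the "online flag" test described above is usually done through sysfs; a sketch (assuming, as in the lstopo output shown earlier, that P#8-P#15 are the Hyper-Threading siblings; this requires root):

# take the HT sibling logical CPUs offline
for cpu in $(seq 8 15); do
    echo 0 | sudo tee /sys/devices/system/cpu/cpu${cpu}/online
done

# and bring them back online afterwards
for cpu in $(seq 8 15); do
    echo 1 | sudo tee /sys/devices/system/cpu/cpu${cpu}/online
done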

> Apparently other people got everything (OpenMPI with HT on Nehalem)
> working in more stable distributions (CentOS, RHEL, etc).
>
> That problem was likely to be in the FC13 kernel,
> because even turning off HT I still had the machine hanging.
> Nothing worked with shared memory turned on,
> so I had to switch OpenMPI to use tcp instead,
> which is kind of ridiculous in a standalone machine.


--> very interesting, sm can be the problem


>
>
>
>  im trying to locate the kernel error on logs, but after rebooting a crash,
>> the error is not in the kern.log (neither kern.log.1).
>> all i remember is that it starts with "Kernel BUG..."
>> and somepart it mentions a certain CPU X, where that cpu can be any from 0
>> to 15 (im testing only in main node).  Someone knows where the log of kernel
>> error could be?
>>
>>
> Have you tried to turn off hyperthreading?
>

--> yes, tried, same crashes.


> In any case, depending on the application, it may not help much performance
> to have HT on.
>
> A more radical alternative is to try
> -mca btl tcp,self
> in the mpirun command line.
> That is what worked in the case I mentioned above.
>

Wow, this really worked :). You pointed out the problem: it was shared
memory.
I have 4 nodes, so there will be inter-node communication anyway; do you think I
can rely on working with -mca btl tcp,self? I don't mind a small lag.
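
For the record, the workaround can be applied per run or made the default; a sketch (the ^sm variant, which excludes only the shared-memory BTL, is an alternative not mentioned in the thread):

# one-off, on the command line:
mpirun -mca btl tcp,self --hostfile localhostfile -np 16 testMPI/hola

# or exclude only the sm BTL and keep everything else:
mpirun -mca btl ^sm --hostfile localhostfile -np 16 testMPI/hola

# or make it the default for every run:
mkdir -p ~/.openmpi
echo "btl = tcp,self" >> ~/.openmpi/mca-params.conf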

I just have one more question: is this a problem with the Ubuntu server
kernel? With the Nehalem CPUs? With Open MPI (I don't think so)?

And what does it depend on for sm to become possible in the future on the same
c

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
To clarify,

I can still run a hello world on all 16 threads, but after a few more repetitions
of the example the kernel crashes :(

fcluster@agua:~$ mpirun --hostfile localhostfile -np 16 testMPI/hola
Process 0 on agua out of 16
Process 2 on agua out of 16
Process 14 on agua out of 16
Process 8 on agua out of 16
Process 1 on agua out of 16
Process 7 on agua out of 16
Process 9 on agua out of 16
Process 3 on agua out of 16
Process 4 on agua out of 16
Process 10 on agua out of 16
Process 15 on agua out of 16
Process 5 on agua out of 16
Process 6 on agua out of 16
Process 11 on agua out of 16
Process 13 on agua out of 16
Process 12 on agua out of 16
fcluster@agua:~$



On Wed, Jul 28, 2010 at 2:47 PM, Cristobal Navarro <axisch...@gmail.com>wrote:

>
>
> On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <g...@ldeo.columbia.edu>wrote:
>
>> Hi Cristobal
>>
>> In case you are not using full path name for mpiexec/mpirun,
>> what does "which mpirun" say?
>>
>
> --> $which mpirun
>   /opt/openmpi-1.4.2
>
>>
>> Often times this is a source of confusion, old versions may
>> be first on the PATH.
>>
>> Gus
>>
>
> openMPI version problem is now gone, i can confirm that the version is
> consistent now :), thanks.
>
> however, i keep getting this kernel crash randomnly when i execute with -np
> higher than 5
> these are Xeons, with Hyperthreading On, is that a problem??
>
> im trying to locate the kernel error on logs, but after rebooting a crash,
> the error is not in the kern.log (neither kern.log.1).
> all i remember is that it starts with "Kernel BUG..."
> and somepart it mentions a certain CPU X, where that cpu can be any from 0
> to 15 (im testing only in main node).  Someone knows where the log of kernel
> error could be?
>
>>
>> Cristobal Navarro wrote:
>>
>>>
>>> On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>
>>>Hi Cristobal
>>>
>>>Does it run only on the head node alone?
>>>(Fuego? Agua? Acatenango?)
>>>Try to put only the head node on the hostfile and execute with
>>> mpiexec.
>>>
>>> --> i will try only with the head node, and post results back
>>>This may help sort out what is going on.
>>>Hopefully it will run on the head node.
>>>
>>>Also, do you have Infinband connecting the nodes?
>>>The error messages refer to the openib btl (i.e. Infiniband),
>>>and complains of
>>>
>>>
>>> no we are just using normal network 100MBit/s , since i am just testing
>>> yet.
>>>
>>>
>>>"perhaps a missing symbol, or compiled for a different
>>>version of Open MPI?".
>>>It sounds as a mixup of versions/builds.
>>>
>>>
>>> --> i agree, somewhere there must be the remains of the older version
>>>
>>>Did you configure/build OpenMPI from source, or did you install
>>>it with apt-get?
>>>It may be easier/less confusing to install from source.
>>>If you did, what configure options did you use?
>>>
>>>
>>> -->i installed from source, ./configure --prefix=/opt/openmpi-1.4.2
>>> --with-sge --without-xgid --disable--static
>>>
>>>Also, as for the OpenMPI runtime environment,
>>>it is not enough to set it on
>>>the command line, because it will be effective only on the head node.
>>>You need to either add them to the PATH and LD_LIBRARY_PATH
>>>on your .bashrc/.cshrc files (assuming these files and your home
>>>directory are *also* shared with the nodes via NFS),
>>>or use the --prefix option of mpiexec to point to the OpenMPI main
>>>directory.
>>>
>>>
>>> yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly inside
>>> the login scripts ( .bashrc in my case  )
>>>
>>>Needless to say, you need to check and ensure that the OpenMPI
>>>directory (and maybe your home directory, and your work directory)
>>>is (are)
>>>really mounted on the nodes.
>>>
>>>
>>> --> yes, doublechecked that they are
>>>
>>>I hope this helps,
>>>
>>>
>>> --> thanks really!
>>>
>>>Gus Correa
>>>
>>>Update: i just reinstalled openMPI, with the same parameters, and it
>>>seems that the problem has gone, i couldnt test entirely but when i
>>>get back to lab ill confirm.
>>>
>>> best regards! Cristobal
>>>
>>>
>>> 
>>>
>
>


Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
On Wed, Jul 28, 2010 at 11:09 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Cristobal
>
> In case you are not using full path name for mpiexec/mpirun,
> what does "which mpirun" say?
>

--> $which mpirun
  /opt/openmpi-1.4.2

>
> Often times this is a source of confusion, old versions may
> be first on the PATH.
>
> Gus
>

The Open MPI version problem is now gone; I can confirm that the version is
consistent now :), thanks.

However, I keep getting this kernel crash randomly when I execute with -np
higher than 5.
These are Xeons with Hyper-Threading on; is that a problem?

I am trying to locate the kernel error in the logs, but after rebooting from a crash,
the error is not in kern.log (nor in kern.log.1).
All I remember is that it starts with "Kernel BUG..."
and at some point it mentions a certain CPU X, where that CPU can be any from 0
to 15 (I am testing only on the main node). Does anyone know where the kernel
error could be logged?

>
> Cristobal Navarro wrote:
>
>>
>> On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>
>>Hi Cristobal
>>
>>Does it run only on the head node alone?
>>(Fuego? Agua? Acatenango?)
>>Try to put only the head node on the hostfile and execute with mpiexec.
>>
>> --> i will try only with the head node, and post results back
>>This may help sort out what is going on.
>>Hopefully it will run on the head node.
>>
>>Also, do you have Infinband connecting the nodes?
>>The error messages refer to the openib btl (i.e. Infiniband),
>>and complains of
>>
>>
>> no we are just using normal network 100MBit/s , since i am just testing
>> yet.
>>
>>
>>"perhaps a missing symbol, or compiled for a different
>>version of Open MPI?".
>>It sounds as a mixup of versions/builds.
>>
>>
>> --> i agree, somewhere there must be the remains of the older version
>>
>>Did you configure/build OpenMPI from source, or did you install
>>it with apt-get?
>>It may be easier/less confusing to install from source.
>>If you did, what configure options did you use?
>>
>>
>> -->i installed from source, ./configure --prefix=/opt/openmpi-1.4.2
>> --with-sge --without-xgid --disable--static
>>
>>Also, as for the OpenMPI runtime environment,
>>it is not enough to set it on
>>the command line, because it will be effective only on the head node.
>>You need to either add them to the PATH and LD_LIBRARY_PATH
>>on your .bashrc/.cshrc files (assuming these files and your home
>>directory are *also* shared with the nodes via NFS),
>>or use the --prefix option of mpiexec to point to the OpenMPI main
>>directory.
>>
>>
>> yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly inside
>> the login scripts ( .bashrc in my case  )
>>
>>Needless to say, you need to check and ensure that the OpenMPI
>>directory (and maybe your home directory, and your work directory)
>>is (are)
>>really mounted on the nodes.
>>
>>
>> --> yes, doublechecked that they are
>>
>>I hope this helps,
>>
>>
>> --> thanks really!
>>
>>Gus Correa
>>
>>Update: i just reinstalled openMPI, with the same parameters, and it
>>seems that the problem has gone, i couldnt test entirely but when i
>>get back to lab ill confirm.
>>
>> best regards! Cristobal
>>
>>
>> 
>>


Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-28 Thread Cristobal Navarro
Yes,

somehow after the second install, the installation is consistent.

I am only running into one issue; it might be MPI, I'm not sure.
These nodes each have 8 physical processors (2x Intel Xeon quad core)
and 16 virtual ones; I have Ubuntu Server 64-bit 10.04 installed on these
nodes.

The problem seems to be that whenever I try to use over 8 processes (making use of
the virtual ones), I get a horrible error message about a kernel error and a
certain CPU that crashed; the error hangs there for about a minute, then it
switches to another CPU and shows the same error. I have no other option than to
press the power-off button.

I'll try to copy the error and post it.



On Wed, Jul 28, 2010 at 7:39 AM, Jeff Squyres <jsquy...@cisco.com> wrote:

> This issue is usually caused by installing one version of Open MPI over an
> older version:
>
>http://www.open-mpi.org/faq/?category=building#install-overwrite
>
>
> On Jul 27, 2010, at 10:35 PM, Cristobal Navarro wrote:
>
> >
> > On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa <g...@ldeo.columbia.edu>
> wrote:
> > Hi Cristobal
> >
> > Does it run only on the head node alone?
> > (Fuego? Agua? Acatenango?)
> > Try to put only the head node on the hostfile and execute with mpiexec.
> > --> i will try only with the head node, and post results back
> > This may help sort out what is going on.
> > Hopefully it will run on the head node.
> >
> > Also, do you have Infinband connecting the nodes?
> > The error messages refer to the openib btl (i.e. Infiniband),
> > and complains of
> >
> > no we are just using normal network 100MBit/s , since i am just testing
> yet.
> >
> > "perhaps a missing symbol, or compiled for a different
> > version of Open MPI?".
> > It sounds as a mixup of versions/builds.
> >
> > --> i agree, somewhere there must be the remains of the older version
> >
> > Did you configure/build OpenMPI from source, or did you install
> > it with apt-get?
> > It may be easier/less confusing to install from source.
> > If you did, what configure options did you use?
> >
> > -->i installed from source,
> > ./configure --prefix=/opt/openmpi-1.4.2 --with-sge --without-xgid
> --disable--static
> >
> > Also, as for the OpenMPI runtime environment,
> > it is not enough to set it on
> > the command line, because it will be effective only on the head node.
> > You need to either add them to the PATH and LD_LIBRARY_PATH
> > on your .bashrc/.cshrc files (assuming these files and your home
> directory are *also* shared with the nodes via NFS),
> > or use the --prefix option of mpiexec to point to the OpenMPI main
> directory.
> >
> > yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly inside
> the login scripts ( .bashrc in my case  )
> >
> > Needless to say, you need to check and ensure that the OpenMPI directory
> (and maybe your home directory, and your work directory) is (are)
> > really mounted on the nodes.
> >
> > --> yes, doublechecked that they are
> >
> > I hope this helps,
> >
> > --> thanks really!
> >
> > Gus Correa
> >
> > Update: i just reinstalled openMPI, with the same parameters, and it
> seems that the problem has gone, i couldnt test entirely but when i get back
> to lab ill confirm.
> >
> > best regards!
> > Cristobal
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
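
The FAQ entry Jeff points to amounts to never installing a new Open MPI on top of an old one. A sketch of a clean reinstall, using the prefix quoted in this thread (adapt the configure options as needed):

# wipe the old installation tree first
rm -rf /opt/openmpi-1.4.2

# then rebuild from a clean source tree and reinstall
cd openmpi-1.4.2
make distclean        # only if this tree was built before
./configure --prefix=/opt/openmpi-1.4.2 --with-sge
make all
make install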


Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa  wrote:

> Hi Cristobal
>
> Does it run only on the head node alone?
> (Fuego? Agua? Acatenango?)
> Try to put only the head node on the hostfile and execute with mpiexec.
>
--> i will try only with the head node, and post results back

> This may help sort out what is going on.
> Hopefully it will run on the head node.
>
> Also, do you have Infinband connecting the nodes?
> The error messages refer to the openib btl (i.e. Infiniband),
> and complains of


No, we are just using a normal 100 Mbit/s network, since I am only testing for now.

>
> "perhaps a missing symbol, or compiled for a different
> version of Open MPI?".
> It sounds as a mixup of versions/builds.
>

--> i agree, somewhere there must be the remains of the older version

>
> Did you configure/build OpenMPI from source, or did you install
> it with apt-get?
> It may be easier/less confusing to install from source.
> If you did, what configure options did you use?
>

-->i installed from source,
./configure --prefix=/opt/openmpi-1.4.2 --with-sge --without-xgid
--disable--static

>
> Also, as for the OpenMPI runtime environment,
> it is not enough to set it on
> the command line, because it will be effective only on the head node.
> You need to either add them to the PATH and LD_LIBRARY_PATH
> on your .bashrc/.cshrc files (assuming these files and your home directory
> are *also* shared with the nodes via NFS),
> or use the --prefix option of mpiexec to point to the OpenMPI main
> directory.
>

yes, all nodes have their PATH and LD_LIBRARY_PATH set up properly inside
the login scripts ( .bashrc in my case  )

>
> Needless to say, you need to check and ensure that the OpenMPI directory
> (and maybe your home directory, and your work directory) is (are)
> really mounted on the nodes.
>

--> yes, doublechecked that they are

>
> I hope this helps,


--> thanks really!

>
> Gus Correa
>
> Update: i just reinstalled openMPI, with the same parameters, and it seems
> that the problem has gone, i couldnt test entirely but when i get back to
> lab ill confirm.
>

best regards!
Cristobal


Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
I compiled with the absolute path, just in case:
fcluster@agua:~$ /opt/openmpi-1.4.2/bin/mpicc testMPI/hello.c -o
testMPI/hola
fcluster@agua:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
[agua:03547] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03547] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03548] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03548] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03549] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03549] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03550] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03550] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03551] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:03551] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
--
mpirun noticed that process rank 4 with PID 3551 on node agua exited on
signal 11 (Segmentation fault).
--

and it segfaulted. The machine stopped and threw many errors on its screen;
I cannot copy them because they didn't show up over ssh.


On Tue, Jul 27, 2010 at 7:07 PM, Cristobal Navarro <axisch...@gmail.com>wrote:

> Thanks Gus,
>
> but i already had the paths
>
> fcluster@agua:~$ echo $PATH
>
> /opt/openmpi-1.4.2/bin:/opt/cfc/sge/bin/lx24-amd64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
> fcluster@agua:~$ echo $LD_LIBRARY_PATH
> /opt/openmpi-1.4.2/lib:
> fcluster@agua:~$
>
> even weird, errors come sometimes from the master node (agua)
>
>
> On Tue, Jul 27, 2010 at 6:59 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> Hi Cristobal
>>
>> Try using the --prefix option of mpiexec.
>> "man mpiexec" is your friend!
>>
>> Alternatively, append the OpenMPI directories to your
>> PATH *and* LD_LIBRARY_PATH on your .bashrc/.csrhc file
>> See this FAQ:
>> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>>
>> I hope it helps,
>> Gus Correa
>>
>> Cristobal Navarro wrote:
>>
>>> Hi,
>>> Even when executing a hello world openmpi, i get this error, which is
>>> then ignored.
>>> fcluster@fuego:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
>>> [agua:02357] mca: base: component_find: unable to open
>>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>>> compiled for a different version of Open MPI? (ignored)
>>> [agua:02354] mca: base: component_find: unable to open
>>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>>> compiled for a different version of Open MPI? (ignored)
>>> [agua:02356] mca: base: component_find: unable to open
>>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>>> compiled for a different version of Open MPI? (ignored)
>>> [agua:02358] mca: base: component_find: unable to open
>>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>>> compiled for a different version of Open MPI? (ignored)
>>> [agua:02355] mca: base: component_find: unable to open
>>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>>> compiled for a different version of Open MPI? (ignored)
>>> [agua:02358] mca: base: component_find: unable to open
>>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
>>> compiled for a different

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
Thanks Gus,

but I already had the paths set:

fcluster@agua:~$ echo $PATH
/opt/openmpi-1.4.2/bin:/opt/cfc/sge/bin/lx24-amd64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
fcluster@agua:~$ echo $LD_LIBRARY_PATH
/opt/openmpi-1.4.2/lib:
fcluster@agua:~$

Even weirder, the errors sometimes come from the master node (agua).


On Tue, Jul 27, 2010 at 6:59 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Cristobal
>
> Try using the --prefix option of mpiexec.
> "man mpiexec" is your friend!
>
> Alternatively, append the OpenMPI directories to your
> PATH *and* LD_LIBRARY_PATH on your .bashrc/.csrhc file
> See this FAQ:
> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>
> I hope it helps,
> Gus Correa
>
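>
Spelled out, Gus's two suggestions look roughly like this (a sketch using the install prefix from this thread):

# option 1: tell mpirun where Open MPI lives, so the remote daemons
# get the right bin/ and lib/ regardless of shell startup files:
mpirun --prefix /opt/openmpi-1.4.2 --hostfile myhostfile -np 5 testMPI/hola

# option 2: set the environment in ~/.bashrc on every node:
export PATH=/opt/openmpi-1.4.2/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-1.4.2/lib:$LD_LIBRARY_PATH
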
> Cristobal Navarro wrote:
>
>> Hi,
>> Even when executing a hello world openmpi, i get this error, which is then
>> ignored.
>> fcluster@fuego:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
>> [agua:02357] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02354] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02356] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02358] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02355] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02358] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02355] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02354] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02356] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> [agua:02357] mca: base: component_find: unable to open
>> /opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
>> compiled for a different version of Open MPI? (ignored)
>> Process 3 on agua out of 5
>> Process 4 on agua out of 5
>> Process 1 on agua out of 5
>> Process 2 on agua out of 5
>> Process 0 on agua out of 5
>>
>>
>> /opt/openmpi-1.4.2/ is shared through NFS.
>>
>> master node did had an older openmpi version before installing 1.4.2, but
>> i removed them all with
>> sudo apt-get --purge remove libopenmpi1 libopenmpi-dev openmpi-bin
>> openmpi-dev openmpi-common
>> i checked for /usr/lib64/openmpi   and for  /usr/lib/openmpi   and deleted
>> them.
>>
>> however, when compiling again i keep getting this error,
>> something must be remaining from the older version of openmpi, but i
>> really dont know where that remaining could be.
>> any help, welcome
>>
>>
>> 
>>
>


[OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
Hi,

Even when executing an Open MPI hello world, I get this error, which is then
ignored.
fcluster@fuego:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola
[agua:02357] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02354] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02356] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02358] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02355] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02358] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02355] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02354] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02356] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
[agua:02357] mca: base: component_find: unable to open
/opt/openmpi-1.4.2/lib/openmpi/mca_btl_openib: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)
Process 3 on agua out of 5
Process 4 on agua out of 5
Process 1 on agua out of 5
Process 2 on agua out of 5
Process 0 on agua out of 5


/opt/openmpi-1.4.2/ is shared through NFS.

The master node did have an older openmpi version before installing 1.4.2, but I
removed it all with
sudo apt-get --purge remove libopenmpi1 libopenmpi-dev openmpi-bin
openmpi-dev openmpi-common
and I checked for /usr/lib64/openmpi and for /usr/lib/openmpi and deleted
them.

However, when compiling again I keep getting this error;
something must remain from the older version of openmpi, but I
really don't know where that remnant could be.
Any help is welcome.
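
When hunting for leftovers of an older installation like this, a few quick checks show which Open MPI each node actually resolves; a sketch (host names are the ones used in this thread, and home directories are assumed to be NFS-shared):

# which binaries are first on the PATH, and what do they report?
which mpirun mpicc
mpirun --version
ompi_info | head

# which libmpi does the compiled program load at run time?
ldd testMPI/hola | grep -i mpi

# repeat on the compute nodes over ssh (non-interactive shells must
# also have PATH and LD_LIBRARY_PATH set for this to match):
for host in agua fuego; do
    ssh $host 'which mpirun; ldd testMPI/hola | grep -i mpi'
done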


Re: [OMPI users] Help on the big picture..

2010-07-24 Thread Cristobal Navarro
This is a mailing list; some of us are new, others are older and more
experienced. The new ones may not know the protocol commonly
used, but we should at least treat each other in a friendlier way, without
judging the interests of others ahead of time, because you are
wrong.

All the answers received were useful to me.
Thanks...
Cristobal





On Fri, Jul 23, 2010 at 10:27 PM, Tim Prince <n...@aol.com> wrote:
> On 7/22/2010 4:11 PM, Gus Correa wrote:
>>
>> Hi Cristobal
>>
>> Cristobal Navarro wrote:
>>>
>>> yes,
>>> i was aware of the big difference hehe.
>>>
>>> now that openMP and openMPI is in talk, i've alwyas wondered if its a
>>> good idea to model a solution on the following way, using both openMP
>>> and openMPI.
>>> suppose you have n nodes, each node has a quadcore, (so you have n*4
>>> processors)
>>> launch n proceses acorrding to the n nodes available.
>>> set a resource manager like SGE to fill the n*4 slots using round robin.
>>> on each process, make use of the other cores available on the node,
>>> with openMP.
>>>
>>> if this is possible, then on each one could make use fo the shared
>>> memory model locally at each node, evading unnecesary I/O through the
>>> nwetwork, what do you think?
>>>
> Before asking what we think about this, please check the many references
> posted on this subject over the last decade.  Then refine your question to
> what you are interested in hearing about; evidently you have no interest in
> much of this topic.
>>
>> Yes, it is possible, and many of the atmosphere/oceans/climate codes
>> that we run is written with this capability. In other areas of
>> science and engineering this is probably the case too.
>>
>> However, this is not necessarily better/faster/simpler than dedicate all
>> the cores to MPI processes.
>>
>> In my view, this is due to:
>>
>> 1) OpenMP has a different scope than MPI,
>> and to some extent is limited by more stringent requirements than MPI;
>>
>> 2) Most modern MPI implementations (and OpenMPI is an example) use shared
>> memory mechanisms to communicate between processes that reside
>> in a single physical node/computer;
>
> The shared memory communication of several MPI implementations does greatly
> improve efficiency of message passing among ranks assigned to the same node.
>  However, these ranks also communicate with ranks on other nodes, so there
> is a large potential advantage for hybrid MPI/OpenMP as the number of cores
> in use increases.  If you aren't interested in running on more than 8 nodes
> or so, perhaps you won't care about this.
>>
>> 3) Writing hybrid code with MPI and OpenMP requires more effort,
>> and much care so as not to let the two forms of parallelism step on
>> each other's toes.
>
> The MPI standard specifies the use of MPI_init_thread to indicate which
> combination of MPI and threading you intend to use, and to inquire whether
> that model is supported by the active MPI.
> In the case where there is only 1 MPI process per node (possibly using
> several cores via OpenMP threading) there is no requirement for special
> affinity support.
> If there is more than 1 FUNNELED rank per multiple CPU node, it becomes
> important to maintain cache locality for each rank.
>>
>> OpenMP operates mostly through compiler directives/pragmas interspersed
>> on the code.  For instance, you can parallelize inner loops in no time,
>> granted that there are no data dependencies across the commands within the
>> loop.  All it takes is to write one or two directive/pragma lines.
>> More than loop parallelization can be done with OpenMP, of course,
>> although not as much as can be done with MPI.
>> Still, with OpenMP, you are restricted to work in a shared memory
>> environment.
>>
>> By contrast, MPI requires more effort to program, but it takes advantage
>> of shared memory and networked environments
>> (and perhaps extended grids too).
>>
> 
> snipped tons of stuff rather than attempt to reconcile top postings
>
> --
> Tim Prince
>
>



Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
thanks

very clear,

I was not aware that Open MPI internally uses shared memory when two
processes reside on the same node,
which is perfect.

very complete explanations,
thanks really

On Thu, Jul 22, 2010 at 7:11 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Cristobal
>
> Cristobal Navarro wrote:
>>
>> yes,
>> i was aware of the big difference hehe.
>>
>> now that openMP and openMPI is in talk, i've alwyas wondered if its a
>> good idea to model a solution on the following way, using both openMP
>> and openMPI.
>> suppose you have n nodes, each node has a quadcore, (so you have n*4
>> processors)
>> launch n proceses acorrding to the n nodes available.
>> set a resource manager like SGE to fill the n*4 slots using round robin.
>> on each process, make use of the other cores available on the node,
>> with openMP.
>>
>> if this is possible, then on each one could make use fo the shared
>> memory model locally at each node, evading unnecesary I/O through the
>> nwetwork, what do you think?
>>
>
> Yes, it is possible, and many of the atmosphere/oceans/climate codes
> that we run are written with this capability. In other areas of
> science and engineering this is probably the case too.
>
> However, this is not necessarily better/faster/simpler than dedicate all the
> cores to MPI processes.
>
> In my view, this is due to:
>
> 1) OpenMP has a different scope than MPI,
> and to some extent is limited by more stringent requirements than MPI;
>
> 2) Most modern MPI implementations (and OpenMPI is an example) use shared
> memory mechanisms to communicate between processes that reside
> in a single physical node/computer;
>
> 3) Writing hybrid code with MPI and OpenMP requires more effort,
> and much care so as not to let the two forms of parallelism step on
> each other's toes.
>
> OpenMP operates mostly through compiler directives/pragmas interspersed
> on the code.  For instance, you can parallelize inner loops in no time,
> granted that there are no data dependencies across the commands within the
> loop.  All it takes is to write one or two directive/pragma lines.
> More than loop parallelization can be done with OpenMP, of course,
> although not as much as can be done with MPI.
> Still, with OpenMP, you are restricted to work in a shared memory
> environment.
>
> By contrast, MPI requires more effort to program, but it takes advantage
> of shared memory and networked environments
> (and perhaps extended grids too).
> On areas where MPI-based libraries and APIs (like PETSc) were developed,
> the effort of programming directly with MPI can be reduced,
> by using the library facilities.
>
> To answer your question in another email, I think
> in principle you can program with PETSc and MPI together.
>
> I hope this helps.
> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -
>
>>
>>
>> On Thu, Jul 22, 2010 at 5:27 PM, amjad ali <amja...@gmail.com> wrote:
>>>
>>> Hi Cristobal,
>>>
>>> Note that the pic in http://dl.dropbox.com/u/6380744/clusterLibs.png
>>> shows that Scalapack is based on what; it only shows which packages
>>> Scalapack uses; hence no OpenMP is there.
>>>
>>> Also be clear about the difference:
>>> "OpenMP" is for shared memory parallel programming, while
>>> "OpenMPI" is an implantation of MPI standard (this list is about OpenMPI
>>> obviously).
>>>
>>> best
>>> AA.
>>>
>>> On Thu, Jul 22, 2010 at 5:06 PM, Cristobal Navarro <axisch...@gmail.com>
>>> wrote:
>>>>
>>>> Thanks
>>>>
>>>> im looking at the manual, seems good.
>>>> i think now the picture is more clear.
>>>>
>>>> i have a very custom algorithm, local problem of research,
>>>> paralelizable, thats where openMPI enters.
>>>> then, at some point on the program, all the computation traduces to
>>>> numeric (double) matrix operations, eigenvalues and derivatives. thats
>>>> where a library like PETSc makes its appearance. or a lower level
>>>> solution would be GSL and manually implement paralelism with MPI.
>>>>
>>>> in case someone chooses, a highlevel library like PETSc and some low
>>>> level openMPI for its custom algorithms, is there a race for MPI
>>>> problem?
>>

Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
Yes,
I was aware of the big difference hehe.

Now that OpenMP and Open MPI are both in the conversation, I've always wondered whether it's a
good idea to model a solution in the following way, using both OpenMP
and Open MPI:
suppose you have n nodes, each node has a quad-core CPU (so you have n*4 processors);
launch n processes according to the n nodes available;
set a resource manager like SGE to fill the n*4 slots using round robin;
on each process, make use of the other cores available on the node
with OpenMP.

If this is possible, then each process could make use of the shared
memory model locally at each node, avoiding unnecessary I/O through the
network. What do you think?
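
One way to realize the layout described above with a plain Open MPI launch is sketched below; the host names, the OMP_NUM_THREADS value, and the program name are illustrative, and each MPI process would use OpenMP internally:

# hostfile: one MPI process per node, leaving the remaining cores
# for that process's OpenMP threads
cat > myhostfile <<'EOF'
node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1
EOF

# launch n=4 processes (one per node) and export the OpenMP thread
# count to all of them with mpirun's -x option:
mpirun --hostfile myhostfile -np 4 -x OMP_NUM_THREADS=4 ./hybrid_app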



On Thu, Jul 22, 2010 at 5:27 PM, amjad ali <amja...@gmail.com> wrote:
> Hi Cristobal,
>
> Note that the pic in http://dl.dropbox.com/u/6380744/clusterLibs.png
> shows that Scalapack is based on what; it only shows which packages
> Scalapack uses; hence no OpenMP is there.
>
> Also be clear about the difference:
> "OpenMP" is for shared memory parallel programming, while
> "OpenMPI" is an implantation of MPI standard (this list is about OpenMPI
> obviously).
>
> best
> AA.
>
> On Thu, Jul 22, 2010 at 5:06 PM, Cristobal Navarro <axisch...@gmail.com>
> wrote:
>>
>> Thanks
>>
>> im looking at the manual, seems good.
>> i think now the picture is more clear.
>>
>> i have a very custom algorithm, local problem of research,
>> paralelizable, thats where openMPI enters.
>> then, at some point on the program, all the computation traduces to
>> numeric (double) matrix operations, eigenvalues and derivatives. thats
>> where a library like PETSc makes its appearance. or a lower level
>> solution would be GSL and manually implement paralelism with MPI.
>>
>> in case someone chooses, a highlevel library like PETSc and some low
>> level openMPI for its custom algorithms, is there a race for MPI
>> problem?
>>
>> On Thu, Jul 22, 2010 at 3:42 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>> > Hi Cristobal
>> >
>> > You may want to take a look at PETSc,
>> > which has all the machinery for linear algebra that
>> > you need, can easily attach a variety of Linear Algebra packages,
>> > including those in the diagram you sent and more,
>> > builds on top of MPI, and can even build MPI for you, if you prefer.
>> > It has C and Fortran interfaces, and if I remember right,
>> > you can build it alternatively with a C++ interface.
>> > You can choose from real or complex scalars,
>> > depending on your target problem (e.g. if you are going to do
>> > signal/image processing with FFTs, you want complex scalars).
>> > I don't know if it has high level commands to deal with
>> > data structures (like trees that you mentioned), but it may.
>> >
>> > http://www.mcs.anl.gov/petsc/petsc-as/
>> >
>> > My $0.02
>> > Gus Correa
>> > -
>> > Gustavo Correa
>> > Lamont-Doherty Earth Observatory - Columbia University
>> > Palisades, NY, 10964-8000 - USA
>> > -
>> >
>> > Cristobal Navarro wrote:
>> >>
>> >> Hello,
>> >>
>> >> i am designing a solution to one of my programs, which mixes some tree
>> >> generation, matrix operatons, eigenvaluies, among other tasks.
>> >> i have to paralellize all of this for a cluster of 4 nodes (32 cores),
>> >> and what i first thought was MPI as a blind choice, but after looking
>> >> at this picture
>> >>
>> >> http://dl.dropbox.com/u/6380744/clusterLibs.png ( On the picture,
>> >> openMP is missing.)
>> >>
>> >> i decided to take a break and sit down, think what best suits to my
>> >> needs.
>> >> Adittionally, i am not familiar with Fortran, so i search for C/C++
>> >> libraries.
>> >>
>> >> what are your experiences, what aspects of your proyect do you
>> >> consider when choosing, is a good practice to mix these libraries in
>> >> one same proyect?
>


Re: [OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
Thanks

I'm looking at the manual; it seems good.
I think the picture is clearer now.

I have a very custom algorithm, a local research problem, that is
parallelizable; that's where Open MPI enters.
Then, at some point in the program, all the computation reduces to
numeric (double) matrix operations, eigenvalues and derivatives. That's
where a library like PETSc makes its appearance, or a lower-level
solution would be GSL, manually implementing parallelism with MPI.

In the case where someone chooses a high-level library like PETSc and some
low-level Open MPI for their custom algorithms, is there a problem with the
two racing for MPI?

On Thu, Jul 22, 2010 at 3:42 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Cristobal
>
> You may want to take a look at PETSc,
> which has all the machinery for linear algebra that
> you need, can easily attach a variety of Linear Algebra packages,
> including those in the diagram you sent and more,
> builds on top of MPI, and can even build MPI for you, if you prefer.
> It has C and Fortran interfaces, and if I remember right,
> you can build it alternatively with a C++ interface.
> You can choose from real or complex scalars,
> depending on your target problem (e.g. if you are going to do
> signal/image processing with FFTs, you want complex scalars).
> I don't know if it has high level commands to deal with
> data structures (like trees that you mentioned), but it may.
>
> http://www.mcs.anl.gov/petsc/petsc-as/
>
> My $0.02
> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -----
>
> Cristobal Navarro wrote:
>>
>> Hello,
>>
>> i am designing a solution to one of my programs, which mixes some tree
>> generation, matrix operatons, eigenvaluies, among other tasks.
>> i have to paralellize all of this for a cluster of 4 nodes (32 cores),
>> and what i first thought was MPI as a blind choice, but after looking
>> at this picture
>>
>> http://dl.dropbox.com/u/6380744/clusterLibs.png ( On the picture,
>> openMP is missing.)
>>
>> i decided to take a break and sit down, think what best suits to my needs.
>> Adittionally, i am not familiar with Fortran, so i search for C/C++
>> libraries.
>>
>> what are your experiences, what aspects of your proyect do you
>> consider when choosing, is a good practice to mix these libraries in
>> one same proyect?
>


[OMPI users] Help on the big picture..

2010-07-22 Thread Cristobal Navarro
Hello,

I am designing a solution to one of my programs, which mixes tree
generation, matrix operations, and eigenvalues, among other tasks.
I have to parallelize all of this for a cluster of 4 nodes (32 cores),
and what I first thought of was MPI as a blind choice, but after looking
at this picture

http://dl.dropbox.com/u/6380744/clusterLibs.png (On the picture,
openMP is missing.)

I decided to take a break, sit down, and think about what best suits my needs.
Additionally, I am not familiar with Fortran, so I am searching for C/C++ libraries.

What are your experiences? What aspects of your project do you
consider when choosing, and is it good practice to mix these libraries in
one and the same project?


[OMPI users] Question about tree generation (in parallel)

2010-06-02 Thread Cristobal Navarro
Hello,

i got an algorithm that generates trees, of different sizes, recursively. at
the moment i have the algorithm in its secuential version.

here we have 4 identical computers with Xeon 8-core in each node + 4gb ram.
they have HyperThreading so they count as 16-processors per node.
so i can launch a total of 64 parallel threads.

my question is, what could be the best approach when using MPI.???

assigning -np 64 maybe is not a good idea, because i would not be taking
advantage of the vecinity of cores which could improve memory tasks speeds,
i mean it might be better to have 4 mpi processes and each one of these
spawn 15 threads locally???...(can i mix MPI with local threads right? )

I don't have much experience with MPI; I have only programmed bigger algorithms
in CUDA, which is much easier.
Any suggestions or help are welcome.
Cristobal




--
Cristobal 
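
Below is a minimal, illustrative sketch of the hybrid layout asked about
above: one MPI process per node, OpenMP threads inside each process. The
thread count, the loop and the reduction are placeholders for the real
per-node work, and MPI_THREAD_FUNNELED (only the main thread calls MPI) is
just one reasonable choice of threading level:

// hybrid sketch: MPI across nodes, OpenMP threads within a node
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char **argv)
{
    int provided, rank, size;

    // ask for FUNNELED: worker threads only compute, the main thread
    // is the only one that touches MPI
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 0.0;
    // e.g. 8 threads for 8 real cores; whether the HyperThreading
    // "cores" help is workload dependent, so measure 8 vs 16
    #pragma omp parallel for reduction(+:local) num_threads(8)
    for (int i = 0; i < 1000000; i++) {
        local += 1.0 / (1.0 + i + rank);   // stand-in for real work
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("combined result from %d processes: %f\n", size, global);

    MPI_Finalize();
    return 0;
}

With Open MPI this would typically be built with mpicxx plus the compiler's
OpenMP flag (-fopenmp for GCC) and launched with something like
"mpirun -np 4 --bynode ./hybrid" so that one process lands on each node; the
threads then share that node's memory, which is the locality argument made
above.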


Re: [OMPI users] communicate C++ STL structures ??

2010-05-07 Thread Cristobal Navarro
Thanks for the answer,

I will look at them when I get to that point; I've heard good comments
about Boost.
Cristobal




On Fri, May 7, 2010 at 4:49 PM, Fernando Lemos <fernando...@gmail.com>wrote:

> On Fri, May 7, 2010 at 5:33 PM, Cristobal Navarro <axisch...@gmail.com>
> wrote:
> > Hello,
> >
> > my question is the following.
> >
> > is it possible to send and receive C++ objects or STL structures (for
> > example, send map<a,b> myMap) through openMPI SEND and RECEIVE functions?
> > at first glance i thought it was possible, but after reading some doc, im
> > not sure.
> > i dont have my source code at that stage for testing yet
>
> Not normally, you have to serialize it before sending and deserialize
> it after receiving. You could use Boost.MPI and Boost.Serialization too,
> that would probably be the best way to go.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
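
For the record, here is a rough sketch of the Boost.MPI + Boost.Serialization
route suggested above: rank 0 sends a std::map to rank 1 and Boost does the
serialization behind the scenes. It assumes Boost was built with its MPI and
Serialization libraries against the same Open MPI; the map contents are just
placeholders:

// sketch only: sending an STL container with Boost.MPI
#include <boost/mpi.hpp>
#include <boost/serialization/map.hpp>
#include <boost/serialization/string.hpp>
#include <map>
#include <string>
#include <iostream>

namespace mpi = boost::mpi;

int main(int argc, char **argv)
{
    mpi::environment  env(argc, argv);   // wraps MPI_Init / MPI_Finalize
    mpi::communicator world;             // wraps MPI_COMM_WORLD

    if (world.size() < 2)
        return 0;                        // needs at least two ranks

    if (world.rank() == 0) {
        std::map<std::string, int> m;
        m["alpha"] = 1;
        m["beta"]  = 2;
        world.send(1, 0, m);             // tag 0; the map is serialized
    } else if (world.rank() == 1) {
        std::map<std::string, int> m;
        world.recv(0, 0, m);             // deserialized back into a map
        std::cout << "received " << m.size() << " entries\n";
    }
    return 0;
}

Compiled with something like "mpicxx example.cpp -lboost_mpi
-lboost_serialization" and run under mpirun; the same pattern works for
vectors, strings and user-defined types once they provide a serialize()
function.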


[OMPI users] communicate C++ STL structures ??

2010-05-07 Thread Cristobal Navarro
Hello,

My question is the following.

Is it possible to send and receive C++ objects or STL structures (for
example, send map<a,b> myMap) through the openMPI SEND and RECEIVE functions?
At first glance I thought it was possible, but after reading some docs, I'm
not sure.
I don't have my source code at that stage for testing yet.


Cristobal


Re: [OMPI users] openmpi 1.4.1 and xgrid

2010-04-30 Thread Cristobal Navarro
This is strange, because some weeks ago I compiled openmpi 1.4.1 on a Mac
running 10.5.6,
and the --without-xgrid parameter worked fine.

Can you turn off Xgrid on the Macs you are working with? That might help.


Cristobal




On Fri, Apr 30, 2010 at 6:19 PM, Doug Reeder <d...@cox.net> wrote:

> Alan,
>
> I haven't tried to build 1.4.x on os x 10.6.x yet, but it sounds like the
> configure script has become too clever by half. Is there a configure
> argument to force no xgrid (e.g., --with-xgrid=no or --enable-xgrid=no).
>
> Doug Reeder
>
> On Apr 30, 2010, at 3:12 PM, Alan wrote:
>
> Hi guys, thanks,
>
> Well, I can assure there I have the right things as explained here:
>
> ompi 1.2.8 (apple)
> /usr/bin/ompi_info | grep xgrid
>  MCA ras: xgrid (MCA v1.0, API v1.3, Component v1.2.8)
>  MCA pls: xgrid (MCA v1.0, API v1.3, Component v1.2.8)
>
> ompi 1.3.3 (Fink)
> /sw/bin/ompi_info | grep xgrid
> "nothing"
>
> ompi 1.4.1 (mine, for Amber11)
> /Users/alan/Programmes/amber11/exe/ompi_info | grep xgrid
>   MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1)
>
> So, my problem is "simple", the formula I used to compile ompi without
> xgrid used to work, but it's simply not working anymore with ompi 1.4.1,
> even though I see in compilation:
>
> --- MCA component plm:xgrid (m4 configuration macro)
> checking for MCA component plm:xgrid compile mode... static
> checking if C and Objective C are link compatible... yes
> checking for XgridFoundation Framework... yes
> configure: WARNING: XGrid components must be built as DSOs.  Disabling
> checking if MCA component plm:xgrid can compile... no
>
> Any help helps.
>
> Thanks,
>
> Alan
>
> On Fri, Apr 30, 2010 at 20:32, Cristobal Navarro <axisch...@gmail.com>wrote:
>
>> Try launching mpirun -v and see what version it is picking up.
>> Maybe it's the included 1.2.x.
>>
>>
>> Cristobal
>>
>>
>>
>>
>>
>> On Fri, Apr 30, 2010 at 3:22 PM, Doug Reeder <d...@cox.net> wrote:
>>
>>> Alan,
>>>
>>> Are you sure that the ompi_info and mpirun that you are using are the
>>> 1.4.1 versions and not the Apple-supplied versions? I use modules to help
>>> ensure that I am using the openmpi that I built and not the Apple-supplied
>>> versions.
>>>
>>> Doug Reeder
>>> On Apr 30, 2010, at 12:14 PM, Alan wrote:
>>>
>>> Hi there,
>>>
>>> No matter what I do, I cannot disable Xgrid while compiling Open MPI. I tried:
>>>
>>> --without-xgrid --enable-shared --enable-static
>>>
>>> And still see with ompi_info:
>>>
>>>  MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1)
>>>
>>> And because of xgrid on ompi, I have:
>>>
>>> openmpi-1.4.1/examples% mpirun -c 2 hello_c
>>> [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in
>>> file src/plm_xgrid_module.m at line 119
>>> [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in
>>> file src/plm_xgrid_module.m at line 15
>>>
>>> Using mac SL 10.6.3
>>>
>>> Compiling 1.3.3, I didn't have any problem.
>>>
>>> Thanks in advance,
>>>
>>> Alan
>>>
>>> --
>>> Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
>>> Department of Biochemistry, University of Cambridge.
>>> 80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>> >>http://www.bio.cam.ac.uk/~awd28 <http://www.bio.cam.ac.uk/%7Eawd28><<
>>>  ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
> Department of Biochemistry, University of Cambridge.
> 80 Tennis Court Road, Cambridge CB2 1GA, UK.
> >>http://www.bio.cam.ac.uk/~awd28 <http://www.bio.cam.ac.uk/%7Eawd28><<
>  ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] openmpi 1.4.1 and xgrid

2010-04-30 Thread Cristobal Navarro
Try launching mpirun -v and see what version it is picking up.
Maybe it's the included 1.2.x.


Cristobal




On Fri, Apr 30, 2010 at 3:22 PM, Doug Reeder  wrote:

> Alan,
>
> Are you sure that the ompi_info and mpirun that you are using are the 1.4.1
> versions and not the Apple-supplied versions? I use modules to help ensure
> that I am using the openmpi that I built and not the Apple-supplied
> versions.
>
> Doug Reeder
> On Apr 30, 2010, at 12:14 PM, Alan wrote:
>
> Hi there,
>
> No matter what I do, I cannot disable Xgrid while compiling Open MPI. I tried:
>
> --without-xgrid --enable-shared --enable-static
>
> And still see with ompi_info:
>
>  MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1)
>
> And because of xgrid on ompi, I have:
>
> openmpi-1.4.1/examples% mpirun -c 2 hello_c
> [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in
> file src/plm_xgrid_module.m at line 119
> [amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1 in
> file src/plm_xgrid_module.m at line 15
>
> Using mac SL 10.6.3
>
> Compiling 1.3.3, I didn't have any problem.
>
> Thanks in advance,
>
> Alan
>
> --
> Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
> Department of Biochemistry, University of Cambridge.
> 80 Tennis Court Road, Cambridge CB2 1GA, UK.
> >>http://www.bio.cam.ac.uk/~awd28 <<
>  ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] run openMPI jobs with SGE,

2010-04-09 Thread Cristobal Navarro
Thanks,
now I get mixed hostnames in the output and everything seems to be working OK
with mixed MPI execution.

Is it normal that after receiving the results, the hosts remain busy for
something like 15 seconds?
For example:
master:common master$ qrsh -verbose -pe orte 10
/opt/openmpi-1.4.1/bin/mpirun -np 10 hostname
Your job 65 ("mpirun") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 65 has been successfully scheduled.
Establishing builtin session to host worker00.local ...
worker00.local
worker00.local
worker00.local
worker00.local
worker00.local
master.local
master.local
master.local
master.local
master.local
# after some seconds, I query the host status and the slots are still in use
master:common master$ qstat -f
queuename  qtype resv/used/tot. load_avg arch
 states
-
all.q@master.local BIP   0/5/16 0.02 darwin-x86
 65 0.55500 mpirun master   r 04/09/2010 17:44:36 5

-
all.q@worker00.local   BIP   0/5/16 0.01 darwin-x86
 65 0.55500 mpirun master   r 04/09/2010 17:44:36 5

master:common master$

But after waiting some more time, they become free again:
master:common master$ qstat -f
queuename  qtype resv/used/tot. load_avg arch
 states
-
all.q@master.local BIP   0/0/16 0.01 darwin-x86
-
all.q@worker00.local   BIP   0/0/16 0.01 darwin-x86

Anyway, these are just details; thanks to your help, the important aspects
are working.
Cristobal




On Fri, Apr 9, 2010 at 1:34 PM, Reuti <re...@staff.uni-marburg.de> wrote:

> Am 09.04.2010 um 18:57 schrieb Cristobal Navarro:
>
> > sorry the command was missing a number
> >
> > as you said it should be
> >
> > qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> > ---
> > this is my parallel enviroment
> > qconf -sp pempi
> > pe_namepempi
> > slots  210
> > user_lists NONE
> > xuser_listsNONE
> > start_proc_args/usr/bin/true
> > stop_proc_args /usr/bin/true
> > allocation_rule$pe_slots
>
> $pe_slots means that all slots must come from one and the same machine
> (e.g. for smp jobs). You can try $round_robin.
>
> -- Reuti
>
>
> > control_slaves TRUE
> > job_is_first_task  FALSE
> > urgency_slots  min
> > accounting_summary TRUE
> >
> > this is the queue
> > qconf -sq cola.q
> > qname cola.q
> > hostlist  @allhosts
> > seq_no0
> > load_thresholds   np_load_avg=1.75
> > suspend_thresholdsNONE
> > nsuspend  1
> > suspend_interval  00:05:00
> > priority  0
> > min_cpu_interval  00:05:00
> > processorsUNDEFINED
> > qtype BATCH INTERACTIVE
> > ckpt_list NONE
> > pe_list   make pempi
> > rerun FALSE
> > slots 2
> > tmpdir/tmp
> > shell /bin/csh
> >
> > i noticed that if i put 2 slots (since the queue has 2 slots) on the -pe
> pempi N   argument and also the full path to mpirun as you guys pointed, it
> works!!!
> > cristobal@neoideo:~$ qrsh -verbose -pe pempi 2
> /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 125 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 125 has been successfully scheduled.
> > Establishing builtin session to host ijorge.local ...
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > cristobal@neoideo:~$ qrsh -verbose -pe pempi 2
> /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 126 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 126 has been successfully scheduled.
> > Establishing builtin session to host neoideo ...
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > cristobal@neoideo:~$
> >
> > i just wonder why i didnt get mixed hostnames? like
> > neoideo
> > neoideo
> > 
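
As a side note to this thread: once the interactive qrsh tests work, the same
run is usually wrapped in a batch script and handed to qsub. A hedged sketch,
reusing the PE name, mpirun path and the slot count that actually worked in
this thread (the script and job names below are made up):

#!/bin/sh
#$ -N mpi_hostname_test
#$ -pe pempi 2
#$ -cwd
#$ -j y
# $NSLOTS is filled in by SGE from the -pe request above; with an Open MPI
# that was built with SGE support, mpirun takes the host/slot list straight
# from the scheduler, so no -hostfile is needed.
/opt/openmpi-1.4.1/bin/mpirun -np $NSLOTS hostname

Submitted with, say, "qsub myjob.sh" and monitored with qstat -f exactly as
in the transcripts above; with a $round_robin or $fill_up allocation rule
(and enough slots per queue) the granted slots, and therefore the MPI
processes, can come from several hosts.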

Re: [OMPI users] run openMPI jobs with SGE,

2010-04-09 Thread Cristobal Navarro
Sorry, the command was missing a number.

As you said, it should be:

qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
waiting for interactive job to be scheduled ...

Your "qrsh" request could not be scheduled, try again later.
---
This is my parallel environment:
qconf -sp pempi
pe_name            pempi
slots              210
user_lists         NONE
xuser_lists        NONE
start_proc_args    /usr/bin/true
stop_proc_args     /usr/bin/true
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

This is the queue:
qconf -sq cola.q
qname                 cola.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make pempi
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh

I noticed that if I request 2 slots (since the queue has 2 slots) with the
-pe pempi N argument and also use the full path to mpirun, as you guys
pointed out, it works!
cristobal@neoideo:~$ qrsh -verbose -pe pempi 2
/opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
Your job 125 ("mpirun") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 125 has been successfully scheduled.
Establishing builtin session to host ijorge.local ...
ijorge.local
ijorge.local
ijorge.local
ijorge.local
ijorge.local
ijorge.local
cristobal@neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun
-np 6 hostname
Your job 126 ("mpirun") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 126 has been successfully scheduled.
Establishing builtin session to host neoideo ...
neoideo
neoideo
neoideo
neoideo
neoideo
neoideo
cristobal@neoideo:~$

I just wonder why I didn't get mixed hostnames, like:
neoideo
neoideo
ijorge.local
ijorge.local
neoideo
ijorge.local

??

Thanks for the help already!
Cristobal




On Fri, Apr 9, 2010 at 8:58 AM, Huynh Thuc Cuoc <htc...@gmail.com> wrote:

> Dear friend,
> 1.
> I prefer to use sge qsub cmd, for examples:
>
> [huong@ioitg2 MyPhylo]$ qsub -pe orte 3 myphylo.qsub
> Your job 35 ("myphylo.qsub") has been submitted
> [huong@ioitg2 MyPhylo]$ qstat
> job-ID  prior   name   user state submit/start at
> queue  slots ja-task-ID
>
> -
>  35 0.55500 myphylo.qs huongr 04/09/2010 19:28:59
> al...@node2.ioit-grid.ac.vn3
> [huong@ioitg2 MyPhylo]$ qstat
> [huong@ioitg2 MyPhylo]$
>
> This job is running on node2 of my cluster.
> My softs as following:
> headnode: 4 CPUs. $GRAM, CentOS 5.4 + sge 6.2u4 (qmaster and also execd
> host) + openmpi 1.4.1
> nodes 4CPUs, 1GRAM, CentOS 5.4 + sgeexecd + openmpi1.4.1
> PE=orte and set to 4 slots.
> The app myphylo.qsub has the long cmd in the shell:
> /opt/openmpi/bin/mpirun -np 10 $HOME/MyPhylo/bin/par-phylo-builder --data .
> . . .
> Try to set PE as orte, use default PE = make instead.
>
> 2. I tested your cmd on my system as:
> a.
> [huong@ioitg2 MyPhylo]$ qrsh -verbose -pe make mpirun -np 6 hostname
> error: Numerical value invalid!
> The initial portion of string "mpirun" contains no decimal number
> [huong@ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 mpirun -np 6 hostname
> Your job 36 ("mpirun") has been submitted
>
> waiting for interactive job to be scheduled ...
> Your interactive job 36 has been successfully scheduled.
> Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> bash: mpirun: command not found
> [huong@ioitg2 MyPhylo]$
>
> ERROR ! So I try:
> [huong@ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 /opt/openmpi/bin/mpirun
> -np 6 hostname
> Your job 38 ("mpirun") has been submitted
>
> waiting for interactive job to be scheduled ...
> Your interactive job 38 has been successfully scheduled.
> Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> ioitg2.ioit-grid.ac.vn
> [huong@ioitg2 MyPhylo]$
>
> This OK.
> What matters is that the PATH points to where mpirun is located.
>
> TRY.
>
> Good chance
> HT Cuoc
>
>
> On Fri, Apr 9, 2010 at 11:02 AM, Cristobal Navarro <axisch...@gmail.com>wrote:
>
>> Hello,
>>
>> after some days of work and testing, i managed to install SGE on two
>> machines, also instal
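
The reason the six hostnames above all came from a single machine is the
allocation_rule $pe_slots in the PE, as Reuti points out in the reply that
appears earlier in this archive. For reference, a sketch of the same PE after
that change; only the allocation_rule line differs, everything else is taken
verbatim from this thread:

pe_name            pempi
slots              210
user_lists         NONE
xuser_lists        NONE
start_proc_args    /usr/bin/true
stop_proc_args     /usr/bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

The modification itself would be made with "qconf -mp pempi".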

Re: [OMPI users] openMPI on Xgrid

2010-03-31 Thread Cristobal Navarro
And how about Sun Grid Engine + openMPI, would that be a good idea?

I'm asking because I just found out that Mathematica 7 supports cluster
integration with SGE, which would be a plus apart from our C programs.


Cristobal




On Tue, Mar 30, 2010 at 4:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Craig Tierney wrote:
>
>> Jody Klymak wrote:
>>
>>> On Mar 30, 2010, at  11:12 AM, Cristobal Navarro wrote:
>>>
>>>  i just have some questions,
>>>> Torque requires moab, but from what i've read on the site you have to
>>>> buy moab right?
>>>>
>>> I am pretty sure you can download torque w/o moab.  I do not use moab,
>>> which I think is a higher-level scheduling layer on top of pbs. However,
>>> there are folks here who would know far more than I do about
>>> these sorts of things.
>>>
>>> Cheers,  Jody
>>>
>>>
>> Moab is a scheduler, which works with Torque and several other
>> products.  Torque comes with a basic scheduler, and Moab is not
>> required.  If you want more features but not pay for Moab, you
>> can look at Maui.
>>
>> Craig
>>
>>
>>
> Hi
>
> Just adding to what Craig and Jody said.
> Moab is not required for Torque.
>
> A small cluster with a few users can work well with
> the basic Torque/PBS scheduler (pbs_sched),
> and its first-in-first-out job policy.
> An alternative is to replace pbs_sched with the
> free Maui scheduler, if you need fine grained job control.
>
> You can install both Torque and Maui from source code (available here
> http://www.clusterresources.com/), but it takes some work.
>
> Some Linux distributions have Torque and Maui available as packages
> through yum, apt-get, etc.
> I would guess for the Mac you can get at least Torque through fink,
> or not?
>
> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -
>
>
>
>>
>>  --
>>> Jody Klymak
>>> http://web.uvic.ca/~jklymak/
>>>
>>>
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] openMPI on Xgrid

2010-03-30 Thread Cristobal Navarro
I looked at Torque and it looks good, indeed. I will give it a try for testing.

I just have some questions:
Torque requires Moab, but from what I've read on the site you have to buy
Moab, right?

I'm looking for a 100% free solution.
Cristobal




On Mon, Mar 29, 2010 at 3:48 PM, Jody Klymak <jkly...@uvic.ca> wrote:

>
> On Mar 29, 2010, at  12:39 PM, Ralph Castain wrote:
>
>
> On Mar 29, 2010, at 1:34 PM, Cristobal Navarro wrote:
>
> thanks for the information,
>
> but is it possible to make it work with xgrid or the 1.4.1 version just
> dont support it?
>
>
> FWIW, I've had excellent success with Torque and openmpi on OS-X 10.5
> Server.
>
> http://www.clusterresources.com/products/torque-resource-manager.php
>
> It doesn't have a nice dashboard, but the queue tools are more than
> adequate for my needs.
>
> Open MPI had a funny port issue on my setup that folks helped with
>
> From my notes:
>
> Edited /Network/Xgrid/openmpi/etc/openmpi-mca-params.conf to make sure
> that the right ports are used:
>
> 
> # set ports so that they are more valid than the default ones (see email
> from Ralph Castain)
> btl_tcp_port_min_v4 = 36900
> btl_tcp_port_range  = 32
> 
>
> Cheers,  Jody
>
>
>  --
> Jody Klymak
> http://web.uvic.ca/~jklymak/
>
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] openMPI on Xgrid

2010-03-29 Thread Cristobal Navarro
At least it would be a good exercise to complete the process with Xgrid +
openMPI, for the learning experience.

Cristobal




On Mon, Mar 29, 2010 at 4:11 PM, Cristobal Navarro <axisch...@gmail.com>wrote:

> i realized that xcode dev tools include openMPI 1.2.x
> should i keep trying??
> or do you recommend to completly abandon xgrid and go for another tool like
> Torque with openMPI?
>
>
>
>
> On Mon, Mar 29, 2010 at 3:48 PM, Jody Klymak <jkly...@uvic.ca> wrote:
>
>>
>> On Mar 29, 2010, at  12:39 PM, Ralph Castain wrote:
>>
>>
>> On Mar 29, 2010, at 1:34 PM, Cristobal Navarro wrote:
>>
>> thanks for the information,
>>
>> but is it possible to make it work with xgrid or the 1.4.1 version just
>> dont support it?
>>
>>
>> FWIW, I've had excellent success with Torque and openmpi on OS-X 10.5
>> Server.
>>
>> http://www.clusterresources.com/products/torque-resource-manager.php
>>
>> It doesn't have a nice dashboard, but the queue tools are more than
>> adequate for my needs.
>>
>> Open MPI had a funny port issue on my setup that folks helped with
>>
>> From my notes:
>>
>> Edited /Network/Xgrid/openmpi/etc/openmpi-mca-params.conf to make sure
>> that the right ports are used:
>>
>> 
>> # set ports so that they are more valid than the default ones (see email
>> from Ralph Castain)
>> btl_tcp_port_min_v4 = 36900
>> btl_tcp_port_range  = 32
>> 
>>
>> Cheers,  Jody
>>
>>
>>  --
>> Jody Klymak
>> http://web.uvic.ca/~jklymak/
>>
>>
>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>


Re: [OMPI users] openMPI on Xgrid

2010-03-29 Thread Cristobal Navarro
I realized that the Xcode dev tools include openMPI 1.2.x.
Should I keep trying, or do you recommend completely abandoning Xgrid and
going for another tool like Torque with openMPI?




On Mon, Mar 29, 2010 at 3:48 PM, Jody Klymak <jkly...@uvic.ca> wrote:

>
> On Mar 29, 2010, at  12:39 PM, Ralph Castain wrote:
>
>
> On Mar 29, 2010, at 1:34 PM, Cristobal Navarro wrote:
>
> thanks for the information,
>
> but is it possible to make it work with xgrid or the 1.4.1 version just
> dont support it?
>
>
> FWIW, I've had excellent success with Torque and openmpi on OS-X 10.5
> Server.
>
> http://www.clusterresources.com/products/torque-resource-manager.php
>
> It doesn't have a nice dashboard, but the queue tools are more than
> adequate for my needs.
>
> Open MPI had a funny port issue on my setup that folks helped with
>
> From my notes:
>
> Edited /Network/Xgrid/openmpi/etc/openmpi-mca-params.conf to make sure
> that the right ports are used:
>
> 
> # set ports so that they are more valid than the default ones (see email
> from Ralph Castain)
> btl_tcp_port_min_v4 = 36900
> btl_tcp_port_range  = 32
> 
>
> Cheers,  Jody
>
>
>  --
> Jody Klymak
> http://web.uvic.ca/~jklymak/
>
>
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] openMPI on Xgrid

2010-03-29 Thread Cristobal Navarro
Thanks for the information,

but is it possible to make it work with Xgrid, or does the 1.4.1 version just
not support it?




On Mon, Mar 29, 2010 at 3:07 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Our xgrid support has been broken for some time now due to lack of access
> to a test environment. So your system is using rsh/ssh instead.
>
> Until we get someone interested in xgrid, or at least willing to debug it
> and tell us what needs to be done, I'm afraid our xgrid support will be
> lacking.
>
>
> On Mar 29, 2010, at 12:56 PM, Cristobal Navarro wrote:
>
> Hello,
> I am new on this mailing list!
> I've read the other messages about configuring openMPI on Xgrid, but I
> haven't solved my problem yet and openMPI keeps running as if Xgrid didn't
> exist.
>
> I configured Xgrid properly, and can send simple C program jobs through the
> command line from my client, which is the same as the controller and the
> same as the agent for the moment.
> >> xgrid -h localhost -p pass -job run ./helloWorld
> I also installed Xgrid Admin for monitoring.
>
> Then I compiled openMPI 1.4.1 with these options:
>
> ./configure --prefix=/usr/local/openmpi/ --enable-shared --disable-static
> --with-xgrid
> sudo make
> sudo make install
>
> and I made a simple helloMPI example.
>
>
> /* MPI C Example */
> #include <stdio.h>
> #include <mpi.h>
>
> int main (int argc, char *argv[])
> {
>     int rank, size;
>
>     MPI_Init (&argc, &argv);               /* starts MPI */
>     MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
>     MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
>     printf( "Hello world from process %d of %d\n", rank, size );
>     MPI_Finalize();
>     return 0;
> }
>
>
> and compiled successfully:
>
> >> mpicc hellompi.c -o hellompi
>
> then I run it:
>
> >> mpirun -np 2 hellompi
> I am running on ijorge.local
> Hello World from process 0 of 2
> I am running on ijorge.local
> Hello World from process 1 of 2
>
> The results are correct, but when I check Xgrid Admin, I see that the
> execution didn't go through Xgrid, since there aren't any new jobs on the list.
> In the end, openMPI and Xgrid are not communicating with each other.
>
> What am I missing?
>
> My environment variables are these:
>
> >>echo $XGRID_CONTROLLER_HOSTNAME
> ijorge.local
> >>echo $XGRID_CONTROLLER_PASSWORD
> myPassword
>
>
> any help is welcome!!
> thanks in advance
>
> Cristobal
>
>
>  ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] openMPI on Xgrid

2010-03-29 Thread Cristobal Navarro
Hello,
I am new on this mailing list!
I've read the other messages about configuring openMPI on Xgrid, but I
haven't solved my problem yet and openMPI keeps running as if Xgrid didn't
exist.

I configured Xgrid properly, and can send simple C program jobs through the
command line from my client, which is the same as the controller and the
same as the agent for the moment.
>> xgrid -h localhost -p pass -job run ./helloWorld
I also installed Xgrid Admin for monitoring.

Then I compiled openMPI 1.4.1 with these options:

./configure --prefix=/usr/local/openmpi/ --enable-shared --disable-static
--with-xgrid
sudo make
sudo make install

and I made a simple helloMPI example.


/* MPI C Example */
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size;

    MPI_Init (&argc, &argv);               /* starts MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}


and compiled successfully:

>> mpicc hellompi.c -o hellompi

then I run it:

>> mpirun -np 2 hellompi
I am running on ijorge.local
Hello World from process 0 of 2
I am running on ijorge.local
Hello World from process 1 of 2

The results are correct, but when I check Xgrid Admin, I see that the
execution didn't go through Xgrid, since there aren't any new jobs on the list.
In the end, openMPI and Xgrid are not communicating with each other.

What am I missing?

My environment variables are these:

>>echo $XGRID_CONTROLLER_HOSTNAME
ijorge.local
>>echo $XGRID_CONTROLLER_PASSWORD
myPassword


Any help is welcome!
Thanks in advance.

Cristobal