Thanks for all the information.

What I meant by

mpirun --mca shmem_base_verbose 100 ...

is that you modify your actual mpirun command line (or your Torque script, if applicable) and add

--mca shmem_base_verbose 100

right after mpirun, keeping the rest of the command line (the executable and its arguments) as it is.
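
For example, assuming the program being launched is a hypothetical ./hello built from your demo and started on 2 processes, the modified line would look like

mpirun --mca shmem_base_verbose 100 -np 2 ./hello

with everything after the verbosity option being your usual executable and its arguments.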


Cheers,


Gilles


On 5/16/2017 3:59 AM, Ioannis Botsis wrote:
Hi Gilles

Thank you for your prompt response.

Here is some information about the system:

Ubuntu 16.04 server
Linux-4.4.0-75-generic-x86_64-with-Ubuntu-16.04-xenial

On an HP ProLiant DL320R05 Generation 5, 4 GB RAM, 4x120 GB RAID-1 HDD,
2 Ethernet ports 10/100/1000
HP StorageWorks 70 Modular Smart Array with 14x120 GB HDD (RAID-5)

44 HP ProLiant BL465c server blades, each with dual AMD Opteron Model 2218
(2.6 GHz, 2 MB, 95 W), 4 GB RAM, 2 NC370i Multifunction Gigabit Server
Adapters, 120 GB HDD

The users' area is shared with the nodes.

The ssh and Torque 6.0.2 services work fine.

Torque and Open MPI 2.1.0 were installed from tarballs; configure
--prefix=/storage/exp_soft/tuc was used for the deployment of Open MPI 2.1.0.
After make and make install, the binaries, libraries, and include files of
Open MPI 2.1.0 are located under /storage/exp_soft/tuc.
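
For reference, a minimal sketch of such a tarball build (the tarball file name and build options are assumptions, not necessarily the exact ones used here):

    tar xzf openmpi-2.1.0.tar.gz
    cd openmpi-2.1.0
    ./configure --prefix=/storage/exp_soft/tuc
    make
    make install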

/storage is a shared file system for all the nodes of the cluster

$PATH:
         /storage/exp_soft/tuc/bin
         /storage/exp_soft/tuc/sbin
         /storage/exp_soft/tuc/torque/bin
         /storage/exp_soft/tuc/torque/sbin
         /usr/local/sbin
         /usr/local/bin
         /usr/sbin
         /usr/bin
         /sbin
         /bin
         /snap/bin


LD_LIBRARY_PATH=/storage/exp_soft/tuc/lib

C_INCLUDE_PATH=/storage/exp_soft/tuc/include
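
These are typically exported in a shell startup file shared across the nodes; a sketch of what that could look like (the use of ~/.bashrc is an assumption):

    export PATH=/storage/exp_soft/tuc/bin:/storage/exp_soft/tuc/sbin:/storage/exp_soft/tuc/torque/bin:/storage/exp_soft/tuc/torque/sbin:$PATH
    export LD_LIBRARY_PATH=/storage/exp_soft/tuc/lib:$LD_LIBRARY_PATH
    export C_INCLUDE_PATH=/storage/exp_soft/tuc/include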

I also use JupyterHub (with the cluster tab enabled) as a user interface to the
cluster. After the installation of Python and some dependencies, MPICH and
Open MPI were also installed in the system directories.
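
Since several MPI stacks are now present (the tarball install under /storage/exp_soft/tuc plus the MPICH and Open MPI copies in the system directories), it can be worth confirming which mpirun the shell actually resolves; a generic check:

    which mpirun
    mpirun --version
    ompi_info | grep Prefix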

----------------------------------------------------------------------------
mpirun --allow-run-as-root --mca shmem_base_verbose 100 ...

[se01.grid.tuc.gr:19607] mca: base: components_register: registering
framework shmem components
[se01.grid.tuc.gr:19607] mca: base: components_register: found loaded
component sysv
[se01.grid.tuc.gr:19607] mca: base: components_register: component sysv
register function successful
[se01.grid.tuc.gr:19607] mca: base: components_register: found loaded
component posix
[se01.grid.tuc.gr:19607] mca: base: components_register: component posix
register function successful
[se01.grid.tuc.gr:19607] mca: base: components_register: found loaded
component mmap
[se01.grid.tuc.gr:19607] mca: base: components_register: component mmap
register function successful
[se01.grid.tuc.gr:19607] mca: base: components_open: opening shmem
components
[se01.grid.tuc.gr:19607] mca: base: components_open: found loaded component
sysv
[se01.grid.tuc.gr:19607] mca: base: components_open: component sysv open
function successful
[se01.grid.tuc.gr:19607] mca: base: components_open: found loaded component
posix
[se01.grid.tuc.gr:19607] mca: base: components_open: component posix open
function successful
[se01.grid.tuc.gr:19607] mca: base: components_open: found loaded component
mmap
[se01.grid.tuc.gr:19607] mca: base: components_open: component mmap open
function successful
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: Auto-selecting shmem
components
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Querying
component (run-time) [sysv]
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Query of
component [sysv] set priority to 30
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Querying
component (run-time) [posix]
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Query of
component [posix] set priority to 40
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Querying
component (run-time) [mmap]
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Query of
component [mmap] set priority to 50
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Selected
component [mmap]
[se01.grid.tuc.gr:19607] mca: base: close: unloading component sysv
[se01.grid.tuc.gr:19607] mca: base: close: unloading component posix
[se01.grid.tuc.gr:19607] shmem: base: best_runnable_component_name:
Searching for best runnable component.
[se01.grid.tuc.gr:19607] shmem: base: best_runnable_component_name: Found
best runnable component: (mmap).
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
       line parameter option (remember that mpirun interprets the first
       unrecognized command line token as the executable).

Node:       se01
Executable: ...
--------------------------------------------------------------------------
2 total processes failed to start
[se01.grid.tuc.gr:19607] mca: base: close: component mmap closed
[se01.grid.tuc.gr:19607] mca: base: close: unloading component mmap
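
Note that the "Executable: ..." line above indicates the literal "..." was taken as the program to launch. A sketch of a complete invocation, assuming a hypothetical binary ./hello built from the demo below:

mpirun --allow-run-as-root --mca shmem_base_verbose 100 -np 2 ./hello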


jb


-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
gil...@rist.or.jp
Sent: Monday, May 15, 2017 1:47 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] (no subject)

Ioannis,

### What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git
branch name and hash, etc.)



### Describe how Open MPI was installed (e.g., from a source/
distribution tarball, from a git clone, from an operating system
distribution package, etc.)



### Please describe the system on which you are running

* Operating system/version:
* Computer hardware:
* Network type:

Also, what happens if you run

mpirun --mca shmem_base_verbose 100 ...


Cheers,

Gilles
----- Original Message -----
Hi

I am trying to run the following simple demo on a cluster of two nodes:

----------------------------------------------------------------------
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    /* Initialize the MPI environment */
    MPI_Init(NULL, NULL);

    /* Number of processes in MPI_COMM_WORLD */
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Rank of this process */
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Name of the host this rank runs on */
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------------
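
For reference, a minimal sketch of how such a demo is typically built and launched with the Open MPI wrapper compiler (the file names and host file are assumptions):

    mpicc hello.c -o hello
    mpirun -np 2 --hostfile hosts ./hello
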
I always get the following message:

----------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

    opal_shmem_base_select failed
    --> Returned value -1 instead of OPAL_SUCCESS
----------------------------------------------------------------------
Any hint?

Ioannis Botsis


