Sorry, but I still don't quite get it.
Would you please provide a demo (in the context of my working command)?

Thanks.

On Sun, Aug 2, 2015 at 7:43 PM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> simply replace nwchem with hostname
>
> both hosts should be part of the output...
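>
> for example, taking your working command line and just swapping nwchem
> for hostname (same paths and hostfile as in your own command, shown
> here only as a sketch):
>
> <path>/mpirun --hostfile myhostfile -np 32 hostname
>
> if both cx1015 and cx1016 show up in the 32 lines of output, mpirun is
> really using both nodes.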
>
> Cheers,
>
> Gilles
>
> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>
>> Jeff, Gilles
>>
>> Here's my scenario again, after trying something different:
>> I've interactively booked 2 nodes (cx1015 and cx1016) and I'm working on the
>> "cx1015" node.
>> There I ran "module load openmpi" and "module load nwchem" (but I don't
>> know how to do "module load" on the other node).
>> Then I ran the Open MPI command: "<path>/mpirun --hostfile myhostfile
>> -np 32 <path>/nwchem my_code.nw"
>>
>> And AMAZINGLY it is working...
>>
>> But can you suggest a way for me to make sure that both of the booked
>> nodes are being used by mpirun, not just one?
>>
>> Thanks.
>>
>> On Sun, Aug 2, 2015 at 5:16 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> The initial error was ompi could not find orted on the second node, and
>>> that was fixed by using the full path for mpirun
>>>
>>> if you run under pbs, you should not need the hostfile option.
>>> just ask pbs to allocate 2 nodes and everything should run smoothly.
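>>>
>>> for example, a minimal pbs script could look like this (a sketch only:
>>> the resource line uses torque-style syntax, and the module and path
>>> names are taken from your earlier mails; adjust them to your site):
>>>
>>> #!/bin/bash
>>> #PBS -l nodes=2:ppn=16
>>> #PBS -l walltime=01:00:00
>>> cd $PBS_O_WORKDIR
>>> module load openmpi
>>> module load nwchem
>>> # no --hostfile needed: mpirun picks up the node list from the pbs
>>> # allocation (or you can pass --hostfile $PBS_NODEFILE explicitly)
>>> <path>/mpirun -np 32 <path>/nwchem my_code.nw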
>>>
>>> first, I recommend you run a non-MPI application:
>>> /.../bin/mpirun hostname
>>> and then nwchem
>>>
>>> if it still does not work, then run with the verbose plm option
>>> (--mca plm_base_verbose 100) and post the output
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>> wrote:
>>>
>>>> I'm on an HPC cluster, so openmpi-1.6.4 is installed here as a module.
>>>> In the .pbs script, before the line that runs my code, I'm loading both
>>>> the "nwchem" and "openmpi" modules.
>>>> It works very nicely when I stay on a single node (with 16
>>>> processors), but if I try to switch to multiple nodes with the "hostfile"
>>>> option, things start to crash.
>>>>
>>>> On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I have tried using full paths for both of them, but I am stuck with the
>>>>> same issue.
>>>>>
>>>>> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <
>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>>> Is ompi installed on the other node and at the same location ?
>>>>>> did you configure ompi with --enable-mpirun-prefix-by-default ?
>>>>>> (note that should not be necessary if you invoke mpirun with full
>>>>>> path )
>>>>>>
>>>>>> you can also try
>>>>>> /.../bin/mpirun --mca plm_base_verbose 100 ...
>>>>>>
>>>>>> and see if there is something wrong
>>>>>>
>>>>>> last but not least, can you try to use full path for both mpirun and
>>>>>> nwchem ?
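>>>>>>
>>>>>> for example (a sketch, with the <openmpi_prefix> and <nwchem_prefix>
>>>>>> placeholders standing for whatever "which mpirun" and "which nwchem"
>>>>>> report after you load the modules):
>>>>>>
>>>>>> <openmpi_prefix>/bin/mpirun --hostfile myhostfile -np 32 \
>>>>>>     <nwchem_prefix>/bin/nwchem filename.nw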
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, I have tried this and got the following error:
>>>>>>>
>>>>>>> *mpirun was unable to launch the specified application as it could
>>>>>>> not find an executable:*
>>>>>>>
>>>>>>> *Executable: nwchem*
>>>>>>> *Node: cx934*
>>>>>>>
>>>>>>> *while attempting to start process rank 16.*
>>>>>>>
>>>>>>> For context: I have to run my code with the "nwchem filename.nw" command.
>>>>>>> When I run the same thing on 1 node with 16 processors, it works
>>>>>>> fine (mpirun -np 16 nwchem filename.nw).
>>>>>>> I can't understand why I am having a problem when trying to go
>>>>>>> multinode.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <
>>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Can you try invoking mpirun with its full path instead?
>>>>>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Here is the other details,
>>>>>>>>>
>>>>>>>>> a. The Open MPI version is 1.6.4
>>>>>>>>>
>>>>>>>>> b. The error being generated is:
>>>>>>>>> *Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of
>>>>>>>>> known hosts.*
>>>>>>>>> *Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list
>>>>>>>>> of known hosts.*
>>>>>>>>> *orted: Command not found.*
>>>>>>>>> *orted: Command not found.*
>>>>>>>>>
>>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>>> *A daemon (pid 53580) died unexpectedly with status 1 while
>>>>>>>>> attempting*
>>>>>>>>> *to launch so we are aborting.*
>>>>>>>>>
>>>>>>>>> *There may be more information reported by the environment (see
>>>>>>>>> above).*
>>>>>>>>>
>>>>>>>>> *This may be because the daemon was unable to find all the needed
>>>>>>>>> shared*
>>>>>>>>> *libraries on the remote node. You may set your LD_LIBRARY_PATH to
>>>>>>>>> have the*
>>>>>>>>> *location of the shared libraries on the remote nodes and this
>>>>>>>>> will*
>>>>>>>>> *automatically be forwarded to the remote nodes.*
>>>>>>>>>
>>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>>>
>>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>>> *mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>> process*
>>>>>>>>> *that caused that situation.*
>>>>>>>>>
>>>>>>>>> *--------------------------------------------------------------------------*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm not able to understand why the "command not found" error is
>>>>>>>>> being raised.
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Would you please tell us:
>>>>>>>>>>
>>>>>>>>>> (a) what version of OMPI you are using
>>>>>>>>>>
>>>>>>>>>> (b) what error message you are getting when the job terminates
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <
>>>>>>>>>> abhisek.m...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I'm working on an Open MPI-enabled cluster. I'm trying to run a
>>>>>>>>>> job across 2 different nodes with 16 processors per node,
>>>>>>>>>> using this command:
>>>>>>>>>>
>>>>>>>>>> *mpirun -np 32 --hostfile myhostfile -loadbalance exe*
>>>>>>>>>>
>>>>>>>>>> The contents of myhostfile:
>>>>>>>>>>
>>>>>>>>>> *cx0937 slots=16    *
>>>>>>>>>> *cx0934 slots=16*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But the job gets terminated each time before the processes are
>>>>>>>>>> allocated across the two nodes as desired.
>>>>>>>>>>
>>>>>>>>>> So it would be very nice to get some suggestions about what
>>>>>>>>>> I'm missing.
>>>>>>>>>>
>>>>>>>>>> Thank you
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Abhisek Mondal
>>>>>>>>>>
>>>>>>>>>> *Research Fellow*
>>>>>>>>>>
>>>>>>>>>> *Structural Biology and Bioinformatics*
>>>>>>>>>> *Indian Institute of Chemical Biology*
>>>>>>>>>>
>>>>>>>>>> *Kolkata 700032*
>>>>>>>>>>
>>>>>>>>>> *INDIA*
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>> Searchable archives:
>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27367.php
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Abhisek Mondal
>>>>>>>>>
>>>>>>>>> *Research Fellow*
>>>>>>>>>
>>>>>>>>> *Structural Biology and Bioinformatics*
>>>>>>>>> *Indian Institute of Chemical Biology*
>>>>>>>>>
>>>>>>>>> *Kolkata 700032*
>>>>>>>>>
>>>>>>>>> *INDIA*
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27369.php
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Abhisek Mondal
>>>>>>>
>>>>>>> *Research Fellow*
>>>>>>>
>>>>>>> *Structural Biology and Bioinformatics*
>>>>>>> *Indian Institute of Chemical Biology*
>>>>>>>
>>>>>>> *Kolkata 700032*
>>>>>>>
>>>>>>> *INDIA*
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2015/08/27371.php
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Abhisek Mondal
>>>>>
>>>>> *Research Fellow*
>>>>>
>>>>> *Structural Biology and Bioinformatics*
>>>>> *Indian Institute of Chemical Biology*
>>>>>
>>>>> *Kolkata 700032*
>>>>>
>>>>> *INDIA*
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Abhisek Mondal
>>>>
>>>> *Research Fellow*
>>>>
>>>> *Structural Biology and Bioinformatics*
>>>> *Indian Institute of Chemical Biology*
>>>>
>>>> *Kolkata 700032*
>>>>
>>>> *INDIA*
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/08/27375.php
>>>
>>
>>
>>
>> --
>> Abhisek Mondal
>>
>> *Research Fellow*
>>
>> *Structural Biology and Bioinformatics*
>> *Indian Institute of Chemical Biology*
>>
>> *Kolkata 700032*
>>
>> *INDIA*
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27377.php
>



-- 
Abhisek Mondal

*Research Fellow*

*Structural Biology and Bioinformatics*
*Indian Institute of Chemical Biology*

*Kolkata 700032*

*INDIA*
