Sorry, but I can't quite get it. Could you please provide a demo (in the context of the working command)?
Thanks.
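A minimal sketch of the check Gilles suggests in the quoted thread below. The install paths /opt/openmpi-1.6.4 and /opt/nwchem are placeholders for whatever "module load" actually puts on the PATH on this cluster, and myhostfile lists the two interactively booked nodes:

    # hostfile listing the two booked nodes (placeholder node names from the thread)
    $ cat myhostfile
    cx1015 slots=16
    cx1016 slots=16

    # run a non-MPI program (hostname) exactly the way nwchem is launched;
    # if both cx1015 and cx1016 appear in the output, mpirun really is
    # using both nodes
    $ /opt/openmpi-1.6.4/bin/mpirun --hostfile myhostfile -np 32 hostname

    # once both node names show up, rerun the real job with the same options
    $ /opt/openmpi-1.6.4/bin/mpirun --hostfile myhostfile -np 32 \
          /opt/nwchem/bin/nwchem my_code.nw

With -np 32 and 16 slots per host, 32 lines should be printed in total, 16 per node.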
On Sun, Aug 2, 2015 at 7:43 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Simply replace nwchem with hostname.
>
> Both hosts should be part of the output...
>
> Cheers,
>
> Gilles
>
> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com> wrote:
>
>> Jeff, Gilles,
>>
>> Here is my scenario again, after trying something different:
>> I interactively booked 2 nodes (cx1015 and cx1016) and am working on the
>> "cx1015" node. There I ran "module load openmpi" and "module load nwchem"
>> (but I don't know how to "module load" on the other node).
>> I then launched with the Open MPI command:
>> "<path>/mpirun --hostfile myhostfile -np 32 <path>/nwchem my_code.nw"
>>
>> And, amazingly, it is working...
>>
>> But could you suggest a way to make sure that both of the booked nodes,
>> not just one, are being used by mpirun?
>>
>> Thanks.
>>
>> On Sun, Aug 2, 2015 at 5:16 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> The initial error was that ompi could not find orted on the second
>>> node, and that was fixed by using the full path for mpirun.
>>>
>>> If you run under PBS, you should not need the hostfile option.
>>> Just ask PBS to allocate 2 nodes and everything should run smoothly.
>>>
>>> At first, I recommend you run a non-MPI application:
>>> /.../bin/mpirun hostname
>>> and then nwchem.
>>>
>>> If it still does not work, then run with plm verbosity and post the
>>> output.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>> wrote:
>>>
>>>> I'm on an HPC cluster, so openmpi-1.6.4 is installed there as a module.
>>>> In the .pbs script, before the line that runs my code, I load both the
>>>> "nwchem" and "openmpi" modules.
>>>> It works very nicely when I stay on a single node (with 16 processors),
>>>> but when I try to switch to multiple nodes with the "hostfile" option,
>>>> things start to crash.
>>>>
>>>> On Sun, Aug 2, 2015 at 5:02 PM, abhisek Mondal <abhisek.m...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I have tried using full paths for both of them, but I am stuck on the
>>>>> same issue.
>>>>>
>>>>> On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet <
>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>
>>>>>> Is ompi installed on the other node, and at the same location?
>>>>>> Did you configure ompi with --enable-mpirun-prefix-by-default?
>>>>>> (Note that this should not be necessary if you invoke mpirun with
>>>>>> its full path.)
>>>>>>
>>>>>> You can also try
>>>>>> /.../bin/mpirun --mca plm_base_verbose 100 ...
>>>>>> and see if there is something wrong.
>>>>>>
>>>>>> Last but not least, can you try to use the full path for both mpirun
>>>>>> and nwchem?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, I have tried this and got the following error:
>>>>>>>
>>>>>>> mpirun was unable to launch the specified application as it could
>>>>>>> not find an executable:
>>>>>>>
>>>>>>> Executable: nwchem
>>>>>>> Node: cx934
>>>>>>>
>>>>>>> while attempting to start process rank 16.
>>>>>>>
>>>>>>> For context: I have to run my code with the "nwchem filename.nw"
>>>>>>> command. When I run the same thing on 1 node with 16 processors, it
>>>>>>> works fine (mpirun -np 16 nwchem filename.nw).
>>>>>>> I can't understand why I am having a problem when trying to go
>>>>>>> multi-node.
>>>>>>>
>>>>>>> Thanks.
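As a concrete illustration of the full-path and verbosity suggestions quoted above (the /opt/... locations are placeholders for wherever the modules actually install Open MPI and NWChem):

    # give absolute paths to BOTH mpirun and nwchem, so the launch on the
    # remote node does not depend on PATH or on "module load" having run there
    $ /opt/openmpi-1.6.4/bin/mpirun --hostfile myhostfile -np 32 \
          /opt/nwchem/bin/nwchem my_code.nw

    # if it still fails, turn on launcher (plm) verbosity and post the output
    $ /opt/openmpi-1.6.4/bin/mpirun --mca plm_base_verbose 100 \
          --hostfile myhostfile -np 32 /opt/nwchem/bin/nwchem my_code.nw

After "module load openmpi" and "module load nwchem", running "which mpirun" and "which nwchem" shows the real paths to substitute for the placeholders.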
>>>>>>> On Sun, Aug 2, 2015 at 3:41 PM, Gilles Gouaillardet <
>>>>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Can you try invoking mpirun with its full path instead?
>>>>>>>> e.g. /usr/local/bin/mpirun instead of mpirun
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>> On Sunday, August 2, 2015, abhisek Mondal <abhisek.m...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Here are the other details.
>>>>>>>>>
>>>>>>>>> a. The Open MPI version is 1.6.4.
>>>>>>>>>
>>>>>>>>> b. The error being generated is:
>>>>>>>>>
>>>>>>>>> Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of
>>>>>>>>> known hosts.
>>>>>>>>> Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list
>>>>>>>>> of known hosts.
>>>>>>>>> orted: Command not found.
>>>>>>>>> orted: Command not found.
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> A daemon (pid 53580) died unexpectedly with status 1 while
>>>>>>>>> attempting to launch so we are aborting.
>>>>>>>>>
>>>>>>>>> There may be more information reported by the environment (see
>>>>>>>>> above).
>>>>>>>>>
>>>>>>>>> This may be because the daemon was unable to find all the needed
>>>>>>>>> shared libraries on the remote node. You may set your
>>>>>>>>> LD_LIBRARY_PATH to have the location of the shared libraries on
>>>>>>>>> the remote nodes and this will automatically be forwarded to the
>>>>>>>>> remote nodes.
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>> process that caused that situation.
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> I am not able to understand why the "command not found" error is
>>>>>>>>> being raised.
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> On Sun, Aug 2, 2015 at 1:43 AM, Ralph Castain <r...@open-mpi.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Would you please tell us:
>>>>>>>>>>
>>>>>>>>>> (a) what version of OMPI you are using
>>>>>>>>>>
>>>>>>>>>> (b) what error message you are getting when the job terminates
>>>>>>>>>>
>>>>>>>>>> On Aug 1, 2015, at 12:22 PM, abhisek Mondal <
>>>>>>>>>> abhisek.m...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I'm working on an Open MPI enabled cluster. I'm trying to run a
>>>>>>>>>> job with 2 different nodes and 16 processors per node, using
>>>>>>>>>> this command:
>>>>>>>>>>
>>>>>>>>>> mpirun -np 32 --hostfile myhostfile -loadbalance exe
>>>>>>>>>>
>>>>>>>>>> The contents of myhostfile:
>>>>>>>>>>
>>>>>>>>>> cx0937 slots=16
>>>>>>>>>> cx0934 slots=16
>>>>>>>>>>
>>>>>>>>>> But the job is getting terminated each time, before the processes
>>>>>>>>>> are allocated the way I want.
>>>>>>>>>>
>>>>>>>>>> It would be very nice to get some suggestions about what I am
>>>>>>>>>> missing.
>>>>>>>>>>
>>>>>>>>>> Thank you
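A sketch of a PBS batch script along the lines of the advice in this thread. Node count, ppn, walltime, module names, and the /opt/... paths are placeholders; whether mpirun picks up the PBS node list automatically depends on Open MPI having been built with PBS (tm) support, otherwise keep the --hostfile option from above:

    #!/bin/bash
    #PBS -l nodes=2:ppn=16
    #PBS -l walltime=02:00:00
    cd $PBS_O_WORKDIR

    # load the environment on the submission node; the absolute paths below
    # keep the remote launch from depending on PATH being set there as well
    module load openmpi
    module load nwchem

    # with a PBS-aware Open MPI build, the two allocated nodes are detected
    # automatically and no --hostfile is needed (placeholder install paths)
    /opt/openmpi-1.6.4/bin/mpirun -np 32 /opt/nwchem/bin/nwchem my_code.nw

Invoking mpirun by its absolute path (or passing --prefix with the Open MPI install prefix) is what lets Open MPI set PATH and LD_LIBRARY_PATH for orted on the remote node, which is why the full path fixed the "orted: Command not found" error quoted above.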
--
Abhisek Mondal
Research Fellow
Structural Biology and Bioinformatics
Indian Institute of Chemical Biology
Kolkata 700032
INDIA