I have verified that disabling UAC does not fix the problem. xhlp.exe
starts, threads spin up on both machines, CPU usage is at 80-90% but
no progress is ever made.
From this state, Ctrl-break on the head node yields the following output:
[REMOTEMACHINE:02032] [[20816,1],0]-[[20816,0],0]
mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
[REMOTEMACHINE:05064] [[20816,1],1]-[[20816,0],0]
mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
[REMOTEMACHINE:05420] [[20816,1],2]-[[20816,0],0]
mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
[REMOTEMACHINE:03852] [[20816,1],3]-[[20816,0],0]
mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
[REMOTEMACHINE:05436] [[20816,1],4]-[[20816,0],0]
mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
[REMOTEMACHINE:04416] [[20816,1],5]-[[20816,0],0]
mca_oob_tcp_msg_recv: readv failed: Unknown error (108)
[REMOTEMACHINE:02032] [[20816,1],0] routed:binomial: Connection to
lifeline [[20816,0],0] lost
[REMOTEMACHINE:05064] [[20816,1],1] routed:binomial: Connection to
lifeline [[20816,0],0] lost
[REMOTEMACHINE:05420] [[20816,1],2] routed:binomial: Connection to
lifeline [[20816,0],0] lost
[REMOTEMACHINE:03852] [[20816,1],3] routed:binomial: Connection to
lifeline [[20816,0],0] lost
[REMOTEMACHINE:05436] [[20816,1],4] routed:binomial: Connection to
lifeline [[20816,0],0] lost
[REMOTEMACHINE:04416] [[20816,1],5] routed:binomial: Connection to
lifeline [[20816,0],0] lost
> From: users-requ...@open-mpi.org
> Subject: users Digest, Vol 1911, Issue 1
> To: us...@open-mpi.org
> Date: Fri, 20 May 2011 08:14:13 -0400
>
> Send users mailing list submissions to
> us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
> users-requ...@open-mpi.org
>
> You can reach the person managing the list at
> users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
> 1. Re: Error: Entry Point Not Found (Zhangping Wei)
> 2. Re: Problem with MPI_Request, MPI_Isend/recv and
> MPI_Wait/Test (George Bosilca)
> 3. Re: v1.5.3-x64 does not work on Windows 7 workgroup (Jeff Squyres)
> 4. Re: Error: Entry Point Not Found (Jeff Squyres)
> 5. Re: openmpi (1.2.8 or above) and Intel composer XE 2011 (aka
> 12.0) (Jeff Squyres)
> 6. Re: Openib with > 32 cores per node (Jeff Squyres)
> 7. Re: MPI_COMM_DUP freeze with OpenMPI 1.4.1 (Jeff Squyres)
> 8. Re: Trouble with MPI-IO (Jeff Squyres)
> 9. Re: Trouble with MPI-IO (Tom Rosmond)
> 10. Re: Problem with MPI_Request, MPI_Isend/recv and
> MPI_Wait/Test (David Büttner)
> 11. Re: Trouble with MPI-IO (Jeff Squyres)
> 12. Re: MPI_Alltoallv function crashes when np > 100 (Jeff Squyres)
> 13. Re: MPI_ERR_TRUNCATE with MPI_Allreduce() error, but only
> sometimes... (Jeff Squyres)
> 14. Re: Trouble with MPI-IO (Jeff Squyres)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 19 May 2011 09:13:53 -0700 (PDT)
> From: Zhangping Wei <zhangping_...@yahoo.com>
> Subject: Re: [OMPI users] Error: Entry Point Not Found
> To: us...@open-mpi.org
> Message-ID: <101342.7961...@web111818.mail.gq1.yahoo.com>
> Content-Type: text/plain; charset="gb2312"
>
> Dear Paul,
>
> I checked the 'mpirun -np N <cmd>' form you mentioned, but I hit the
> same problem.
>
> I guess it may be related to the system I used, because I have used it
> correctly on another 32-bit XP system.
>
> I look forward to more advice. Thanks.
>
> Zhangping
>
>
>
>
> ________________________________
> ???????? "users-requ...@open-mpi.org" <users-requ...@open-mpi.org>
> ???????? us...@open-mpi.org
> ?????????? 2011/5/19 (????) 11:00:02 ????
> ?? ???? users Digest, Vol 1910, Issue 2
>
> Send users mailing list submissions to
> us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
> users-requ...@open-mpi.org
>
> You can reach the person managing the list at
> users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
> 1. Re: Error: Entry Point Not Found (Paul van der Walt)
> 2. Re: Openib with > 32 cores per node (Robert Horton)
> 3. Re: Openib with > 32 cores per node (Samuel K. Gutierrez)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 19 May 2011 16:14:02 +0100
> From: Paul van der Walt <p...@denknerd.nl>
> Subject: Re: [OMPI users] Error: Entry Point Not Found
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <banlktinjz0cntchqjczyhfgsnr51jpu...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hi,
>
> On 19 May 2011 15:54, Zhangping Wei <zhangping_...@yahoo.com> wrote:
> > 4, I use command window to run it in this way: "mpirun -n 4
**.exe", then I
>
> Probably not the problem, but shouldn't that be 'mpirun -np N <cmd>' ?
>
> Paul
>
> --
> O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
>
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 19 May 2011 16:37:56 +0100
> From: Robert Horton <r.hor...@qmul.ac.uk>
> Subject: Re: [OMPI users] Openib with > 32 cores per node
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <1305819476.9663.148.camel@moelwyn>
> Content-Type: text/plain; charset="UTF-8"
>
> On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
> > Hi,
> >
> > Try the following QP parameters that only use shared receive queues.
> >
> > -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
> >
>
> Thanks for that. If I run the job over 2 x 48 cores it now works and the
> performance seems reasonable (I need to do some more tuning) but when I
> go up to 4 x 48 cores I'm getting the same problem:
>
>
[compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
> error creating qp errno says Cannot allocate memory
> [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job
will now abort)
>
> Any thoughts?
>
> Thanks,
> Rob
> --
> Robert Horton
> System Administrator (Research Support) - School of Mathematical
Sciences
> Queen Mary, University of London
> r.hor...@qmul.ac.uk - +44 (0) 20 7882 7345
>
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 19 May 2011 09:59:13 -0600
> From: "Samuel K. Gutierrez" <sam...@lanl.gov>
> Subject: Re: [OMPI users] Openib with > 32 cores per node
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <b3e83138-9af0-48c0-871c-dbbb2e712...@lanl.gov>
> Content-Type: text/plain; charset=us-ascii
>
> Hi,
>
> On May 19, 2011, at 9:37 AM, Robert Horton wrote
>
> > On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
> >> Hi,
> >>
> >> Try the following QP parameters that only use shared receive queues.
> >>
> >> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
> >>
> >
> > Thanks for that. If I run the job over 2 x 48 cores it now works
and the
> > performance seems reasonable (I need to do some more tuning) but
when I
> > go up to 4 x 48 cores I'm getting the same problem:
> >
>
>[compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
> >] error creating qp errno says Cannot allocate memory
> > [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> > [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> > [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> > [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job
will now
> >abort)
> >
> > Any thoughts?
>
> How much memory does each node have? Does this happen at startup?
>
> Try adding:
>
> -mca btl_openib_cpc_include rdmacm
>
> I'm not sure if your version of OFED supports this feature, but
maybe using XRC
> may help. I **think** other tweaks are needed to get this going, but
I'm not
> familiar with the details.
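
(For reference, the two suggestions in this thread could be combined on one
command line roughly as follows; the host file, process count and binary
name are placeholders, not from the thread:

    mpirun -np 192 --hostfile myhosts \
        -mca btl openib,sm,self \
        -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32 \
        -mca btl_openib_cpc_include rdmacm \
        ./my_app
)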
>
> Hope that helps,
>
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
>
> >
> > Thanks,
> > Rob
> > --
> > Robert Horton
> > System Administrator (Research Support) - School of Mathematical
Sciences
> > Queen Mary, University of London
> > r.hor...@qmul.ac.uk - +44 (0) 20 7882 7345
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1910, Issue 2
> **************************************
>
> ------------------------------
>
> Message: 2
> Date: Thu, 19 May 2011 08:48:03 -0800
> From: George Bosilca <bosi...@eecs.utk.edu>
> Subject: Re: [OMPI users] Problem with MPI_Request, MPI_Isend/recv and
> MPI_Wait/Test
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <fcac66f9-fdb5-48bb-a800-263d8a4f9...@eecs.utk.edu>
> Content-Type: text/plain; charset=iso-8859-1
>
> David,
>
> I do not see any mechanism restricting access to the requests to a
single thread. What is the thread model you're using?
>
> From an implementation perspective, your code is correct only if
you initialize the MPI library with MPI_THREAD_MULTIPLE and if the
library grants it. Otherwise, the assumption is that the
application is single threaded, and the MPI behavior is
implementation dependent. Please read the MPI standard regarding
MPI_Init_thread for more details.
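
(As a point of reference, a minimal sketch in C of requesting
MPI_THREAD_MULTIPLE and checking what the library actually grants; error
handling is kept to a bare minimum:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        /* Request full thread support; the library may grant less. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n",
                    provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        /* ... multi-threaded MPI work goes here ... */
        MPI_Finalize();
        return 0;
    }
)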
>
> Regards,
> george.
>
> On May 19, 2011, at 02:34, David Büttner wrote:
>
> > Hello,
> >
> > I am working on a hybrid MPI (OpenMPI 1.4.3) and Pthread code. I
am using MPI_Isend and MPI_Irecv for communication and
MPI_Test/MPI_Wait to check if it is done. I do this repeatedly in the
outer loop of my code. The MPI_Test is used in the inner loop to check
if some function can be called which depends on the received data.
> > The program regularly crashed (only when not using printf...) and
after debugging it I figured out the following problem:
> >
> > In MPI_Isend I have an invalid read of memory. I fixed the problem
with not re-using a
> >
> > MPI_Request req_s, req_r;
> >
> > but by using
> >
> > MPI_Request* req_s;
> > MPI_Request* req_r
> >
> > and re-allocating them before the MPI_Isend/recv.
> >
> > The documentation says that in MPI_Wait and MPI_Test (if
successful) the request objects are deallocated and set to
MPI_REQUEST_NULL.
> > It also says that MPI_Isend and MPI_Irecv allocate the
objects and associate them with the request handle.
> >
> > As I understand this, this either means I can use a pointer to
MPI_Request which I don't have to initialize for this (it doesn't work
but crashes), or that I can use a MPI_Request pointer which I have
initialized with malloc(sizeof(MPI_REQUEST)) (or passing the address
of a MPI_Request req), which is set and unset in the functions. But
this version crashes, too.
> > What works is using a pointer, which I allocate before the
MPI_Isend/recv and which I free after MPI_Wait in every iteration. In
other words: it only works if I don't reuse any kind of MPI_Request,
only if I recreate one every time.
> >
> > Is this what it should be like? I believe that reusing the
memory would be a lot more efficient (fewer calls to malloc...). Am I
missing something here? Or am I doing something wrong?
> >
> >
> > Let me provide some more detailed information about my problem:
> >
> > I am running the program on a 30 node infiniband cluster. Each
node has 4 single core Opteron CPUs. I am running 1 MPI Rank per node
and 4 threads per rank (-> one thread per core).
> > I am compiling with mpicc of OpenMPI using gcc below.
> > Some pseudo-code of the program can be found at the end of this
e-mail.
> >
> > I was able to reproduce the problem using different amount of
nodes and even using one node only. The problem does not arise when I
put printf-debugging information into the code. This pointed me into
the direction that I have some memory problem, where some write
accesses some memory it is not supposed to.
> > I ran the tests using valgrind with --leak-check=full and
--show-reachable=yes, which pointed me either to MPI_Isend or MPI_Wait
depending on whether I had the threads spin in a loop for MPI_Test to
return success or used MPI_Wait respectively.
> >
> > I would appreciate your help with this. Am I missing something
important here? Is there a way to re-use the request in the different
iterations other than I thought it should work?
> > Or is there a way to re-initialize the allocated memory before the
MPI_Isend/recv so that I at least don't have to call free and malloc
each time?
> >
> > Thank you very much for your help!
> > Kind regards,
> > David Büttner
> >
> > _____________________
> > Pseudo-Code of program:
> >
> > MPI_Request* req_s;
> > MPI_Request* req_w;
> > OUTER-LOOP
> > if(0 == threadid)
> > {
> > req_s = malloc(sizeof(MPI_Request));
> > req_r = malloc(sizeof(MPI_Request));
> > MPI_Isend(..., req_s)
> > MPI_Irecv(..., req_r)
> > }
> > pthread_barrier
> > INNER-LOOP (while NOT_DONE or RET)
> > if(TRYLOCK && NOT_DONE)
> > {
> > if(MPI_TEST(req_r))
> > {
> > Call_Function_A;
> > NOT_DONE = 0;
> > }
> >
> > }
> > RET = Call_Function_B;
> > }
> > pthread_barrier_wait
> > if(0 == threadid)
> > {
> > MPI_WAIT(req_s)
> > MPI_WAIT(req_r)
> > free(req_s);
> > free(req_r);
> > }
> > _____________
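
(For comparison, a minimal single-threaded sketch in C of the request-reuse
pattern the standard intends: the MPI_Request variables are plain automatic
variables, re-filled by each MPI_Isend/MPI_Irecv and reset to
MPI_REQUEST_NULL by MPI_Wait, with no malloc/free. NITER, COUNT, peer and
the buffers are placeholders:

    MPI_Request req_s, req_r;      /* reused every iteration, no malloc */
    MPI_Status  stat;
    int iter;
    for (iter = 0; iter < NITER; iter++) {
        MPI_Isend(sbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req_s);
        MPI_Irecv(rbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req_r);
        /* ... overlap computation here ... */
        MPI_Wait(&req_r, &stat);              /* req_r -> MPI_REQUEST_NULL */
        MPI_Wait(&req_s, MPI_STATUS_IGNORE);  /* req_s -> MPI_REQUEST_NULL */
    }
)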
> >
> >
> > --
> > David Büttner, Informatik, Technische Universität München
> > TUM I-10 - FMI 01.06.059 - Tel. 089 / 289-17676
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> "To preserve the freedom of the human mind then and freedom of the
press, every spirit should be ready to devote itself to martyrdom; for
as long as we may think as we will, and speak as we think, the
condition of man will proceed in improvement."
> -- Thomas Jefferson, 1799
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 19 May 2011 21:22:48 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] v1.5.3-x64 does not work on Windows 7
> workgroup
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <278274f0-bf00-4498-950f-9779e0083...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> Unfortunately, our Windows guy (Shiqing) is off getting married and
will be out for a little while. :-(
>
> All that I can cite is the README.WINDOWS.txt file in the top-level
directory. I'm afraid that I don't know much else about Windows. :-(
>
>
> On May 18, 2011, at 8:17 PM, Jason Mackay wrote:
>
> > Hi all,
> >
> > My thanks to all those involved for putting together this Windows
binary release of OpenMPI! I am hoping to use it in a small Windows
based OpenMPI cluster at home.
> >
> > Unfortunately my experience so far has not exactly been trouble
free. It seems that, due to the fact that this release is using WMI,
there are a number of settings that must be configured on the machines
in order to get this to work. These settings are not documented in the
distribution at all. I have been experimenting with it for over a week
on and off and as soon as I solve one problem, another one arises.
> >
> > Currently, after much searching, reading, and tinkering with DCOM
settings etc..., I can remotely start processes on all my machines
using mpirun but those processes cannot access network shares (e.g.
for binary distribution) and HPL (which works on any one node) does
not seem to work if I run it across multiple nodes, also indicating a
network issue (CPU sits at 100% in all processes with no network
traffic and never terminates). To eliminate permission issues that may
be caused by UAC I tried the same setup on two domain machines using
an administrative account to launch and the behavior was the same. I
have read that WMI processes cannot access network resources and I am
at a loss for a solution to this newest of problems. If anyone knows
how to make this work I would appreciate the help. I assume that
someone has gotten this working and has the answers.
> >
> > I have searched the mailing list archives and I found other users
with similar problems but no clear guidance on the threads. Some
threads make references to Microsoft KB articles but do not explicitly
tell the user what needs to be done, leaving each new user to
rediscover the tricks on their own. One thread made it appear that
testing had only been done on Windows XP. Needless to say, security
has changed dramatically in Windows since XP!
> >
> > I would like to see OpenMPI for Windows be usable by a newcomer
without all of this pain.
> >
> > What would be fantastic would be:
> > 1) a step-by-step procedure for how to get OpenMPI 1.5 working on
Windows
> > a) preferably in a bare Windows 7 workgroup environment with
nothing else (i.e. no Microsoft Cluster Compute Pack, no domain etc...)
> > 2) inclusion of these steps in the binary distribution
> > 3) bonus points for a script which accomplishes these things
automatically
> >
> > If someone can help with (1), I would happily volunteer my time to
work on (3).
> >
> > Regards,
> > Jason
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 19 May 2011 21:26:43 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Error: Entry Point Not Found
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <f830ec35-fc9b-4801-b2a3-50f54d215...@cisco.com>
> Content-Type: text/plain; charset=windows-1252
>
> On May 19, 2011, at 10:54 AM, Zhangping Wei wrote:
>
> > 4, I use command window to run it in this way: "mpirun -n 4 **.exe",
then I met the error: "entry point not found: the procedure entry
point inet_pton could not be located in the dynamic link library
WS2_32.dll"
>
> Unfortunately our Windows developer/maintainer is out for a little
while (he's getting married); he pretty much did the Windows stuff by
himself, so none of the rest of us know much about it. :(
>
> inet_pton is a standard function call relating to IP addresses that
we use in the internals of OMPI; I'm not sure why it wouldn't be found
on Windows XP (Shiqing did cite that the OMPI Windows port should work
on Windows XP).
>
> This post seems to imply that inet_ntop is only available on Vista
and above:
>
>
http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/e40465f2-41b7-4243-ad33-15ae9366f4e6/
>
> So perhaps Shiqing needs to put in some kind of portability
workaround for OMPI, and the current binaries won't actually work for
XP...?
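
(To illustrate the kind of portability workaround meant here -- purely a
sketch, compat_inet_pton is a made-up name and this is not what Open MPI
actually does: on pre-Vista Windows, where inet_pton/inet_ntop are missing
from ws2_32.dll, an IPv4-only fallback can be built on inet_addr(), which
has been available since the earliest Winsock versions:

    #include <winsock2.h>
    #include <ws2tcpip.h>
    #include <string.h>

    /* Hypothetical IPv4-only replacement for inet_pton() on old Windows. */
    static int compat_inet_pton(int af, const char *src, void *dst)
    {
        unsigned long a;
        if (af != AF_INET) {
            return -1;                 /* IPv6 not handled in this sketch */
        }
        a = inet_addr(src);
        if (a == INADDR_NONE && strcmp(src, "255.255.255.255") != 0) {
            return 0;                  /* not a valid dotted quad */
        }
        memcpy(dst, &a, sizeof(a));
        return 1;
    }
)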
>
> I can't say that for sure because I really know very little about
Windows; we'll unfortunately have to wait until he returns to get a
definitive answer. :-(
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 19 May 2011 21:37:49 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] openmpi (1.2.8 or above) and Intel composer
> XE 2011 (aka 12.0)
> To: Open MPI Users <us...@open-mpi.org>
> Cc: Giovanni Bracco <giovanni.bra...@enea.it>, Agostino Funel
> <agostino.fu...@enea.it>, Fiorenzo Ambrosino
> <fiorenzo.ambros...@enea.it>, Guido Guarnieri
> <guido.guarni...@enea.it>, Roberto Ciavarella
> <roberto.ciavare...@enea.it>, Salvatore Podda
> <salvatore.po...@enea.it>, Giovanni Ponti <giovanni.po...@enea.it>
> Message-ID: <45362608-b8b0-4ade-9959-b35c5690a...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> Sorry for the late reply.
>
> Other users have seen something similar but we have never been able
to reproduce it. Is this only when using IB? If you use "mpirun --mca
btl_openib_cpc_include rdmacm", does the problem go away?
>
>
> On May 11, 2011, at 6:00 PM, Marcus R. Epperson wrote:
>
> > I've seen the same thing when I build openmpi 1.4.3 with Intel 12,
but only when I have -O2 or -O3 in CFLAGS. If I drop it down to -O1
then the collectives hangs go away. I don't know what, if anything,
the higher optimization buys you when compiling openmpi, so I'm not
sure if that's an acceptable workaround or not.
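
(For reference, the workaround Marcus describes would amount to a build
along these lines; the install prefix is a placeholder and icc/icpc/ifort
are the usual Intel compiler driver names:

    ./configure CC=icc CXX=icpc F77=ifort FC=ifort \
        CFLAGS=-O1 CXXFLAGS=-O1 FFLAGS=-O1 FCFLAGS=-O1 \
        --prefix=/opt/openmpi-1.4.3
    make all install
)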
> >
> > My system is similar to yours - Intel X5570 with QDR Mellanox IB
running RHEL 5, Slurm, and these openmpi btls: openib,sm,self. I'm
using IMB 3.2.2 with a single iteration of Barrier to reproduce the
hang, and it happens 100% of the time for me when I invoke it like this:
> >
> > # salloc -N 9 orterun -n 65 ./IMB-MPI1 -npmin 64 -iter 1 barrier
> >
> > The hang happens on the first Barrier (64 ranks) and each of the
participating ranks have this backtrace:
> >
> > __poll (...)
> > poll_dispatch () from [instdir]/lib/libopen-pal.so.0
> > opal_event_loop () from [instdir]/lib/libopen-pal.so.0
> > opal_progress () from [instdir]/lib/libopen-pal.so.0
> > ompi_request_default_wait_all () from [instdir]/lib/libmpi.so.0
> > ompi_coll_tuned_sendrecv_actual () from [instdir]/lib/libmpi.so.0
> > ompi_coll_tuned_barrier_intra_recursivedoubling () from
[instdir]/lib/libmpi.so.0
> > ompi_coll_tuned_barrier_intra_dec_fixed () from
[instdir]/lib/libmpi.so.0
> > PMPI_Barrier () from [instdir]/lib/libmpi.so.0
> > IMB_barrier ()
> > IMB_init_buffers_iter ()
> > main ()
> >
> > The one non-participating rank has this backtrace:
> >
> > __poll (...)
> > poll_dispatch () from [instdir]/lib/libopen-pal.so.0
> > opal_event_loop () from [instdir]/lib/libopen-pal.so.0
> > opal_progress () from [instdir]/lib/libopen-pal.so.0
> > ompi_request_default_wait_all () from [instdir]/lib/libmpi.so.0
> > ompi_coll_tuned_sendrecv_actual () from [instdir]/lib/libmpi.so.0
> > ompi_coll_tuned_barrier_intra_bruck () from [instdir]/lib/libmpi.so.0
> > ompi_coll_tuned_barrier_intra_dec_fixed () from
[instdir]/lib/libmpi.so.0
> > PMPI_Barrier () from [instdir]/lib/libmpi.so.0
> > main ()
> >
> > If I use more nodes I can get it to hang with 1ppn, so that seems
to rule out the sm btl (or interactions with it) as a culprit at least.
> >
> > I can't reproduce this with openmpi 1.5.3, interestingly.
> >
> > -Marcus
> >
> >
> > On 05/10/2011 03:37 AM, Salvatore Podda wrote:
> >> Dear all,
> >>
> >> we succeed in building several version of openmpi from 1.2.8 to
1.4.3
> >> with Intel composer XE 2011 (aka 12.0).
> >> However we found a threshold in the number of cores (depending
on the
> >> application: IMB, xhpl or user applications,
> >> and on the number of required cores) above which the
application hangs
> >> (sort of deadlocks).
> >> The building of openmpi with 'gcc' and 'pgi' does not show the
same limits.
> >> Are there any known incompatibilities of openmpi with this
version of the
> >> Intel compilers?
> >>
> >> The characteristics of our computational infrastructure are:
> >>
> >> Intel processors E7330, E5345, E5530 e E5620
> >>
> >> CentOS 5.3, CentOS 5.5.
> >>
> >> Intel composer XE 2011
> >> gcc 4.1.2
> >> pgi 10.2-1
> >>
> >> Regards
> >>
> >> Salvatore Podda
> >>
> >> ENEA UTICT-HPC
> >> Department for Computer Science Development and ICT
> >> Facilities Laboratory for Science and High Performance Computing
> >> C.R. Frascati
> >> Via E. Fermi, 45
> >> PoBox 65
> >> 00044 Frascati (Rome)
> >> Italy
> >>
> >> Tel: +39 06 9400 5342
> >> Fax: +39 06 9400 5551
> >> Fax: +39 06 9400 5735
> >> E-mail: salvatore.po...@enea.it
> >> Home Page: www.cresco.enea.it
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 19 May 2011 22:01:00 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Openib with > 32 cores per node
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <c18c4827-d305-484a-9dae-290902d40...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> What Sam is alluding to is that the OpenFabrics driver code in OMPI
is sucking up oodles of memory for each IB connection that you're
using. The receive_queues param that he sent tells OMPI to use all
shared receive queues (instead of defaulting to one per-peer receive
queue and the rest shared receive queues -- the per-peer RQ sucks up
all the memory when you multiply it by N peers).
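
(If the shared-receive-queue setting turns out to help, it can be made the
default rather than passed on every command line, e.g. via the per-user MCA
parameter file; the value below is copied from Sam's suggestion earlier in
the thread:

    # $HOME/.openmpi/mca-params.conf
    btl_openib_receive_queues = S,12288,128,64,32:S,65536,128,64,32
)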
>
>
> On May 19, 2011, at 11:59 AM, Samuel K. Gutierrez wrote:
>
> > Hi,
> >
> > On May 19, 2011, at 9:37 AM, Robert Horton wrote
> >
> >> On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
> >>> Hi,
> >>>
> >>> Try the following QP parameters that only use shared receive queues.
> >>>
> >>> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
> >>>
> >>
> >> Thanks for that. If I run the job over 2 x 48 cores it now works
and the
> >> performance seems reasonable (I need to do some more tuning) but
when I
> >> go up to 4 x 48 cores I'm getting the same problem:
> >>
> >>
[compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
error creating qp errno says Cannot allocate memory
> >> [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> >> [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> >> [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> >> [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job
will now abort)
> >>
> >> Any thoughts?
> >
> > How much memory does each node have? Does this happen at startup?
> >
> > Try adding:
> >
> > -mca btl_openib_cpc_include rdmacm
> >
> > I'm not sure if your version of OFED supports this feature, but
maybe using XRC may help. I **think** other tweaks are needed to get
this going, but I'm not familiar with the details.
> >
> > Hope that helps,
> >
> > Samuel K. Gutierrez
> > Los Alamos National Laboratory
> >
> >
> >>
> >> Thanks,
> >> Rob
> >> --
> >> Robert Horton
> >> System Administrator (Research Support) - School of Mathematical
Sciences
> >> Queen Mary, University of London
> >> r.hor...@qmul.ac.uk - +44 (0) 20 7882 7345
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Thu, 19 May 2011 22:04:46 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <0dcf20b8-ca5c-4746-8187-a2dff39b1...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> On May 13, 2011, at 8:31 AM, francoise.r...@obs.ujf-grenoble.fr wrote:
>
> > Here is the MUMPS portion of code (in zmumps_part1.F file) where
the slaves call MPI_COMM_DUP , id%PAR and MASTER are initialized to 0
before :
> >
> > CALL MPI_COMM_SIZE(id%COMM, id%NPROCS, IERR )
>
> I re-indented so that I could read it better:
>
> CALL MPI_COMM_SIZE(id%COMM, id%NPROCS, IERR )
> IF ( id%PAR .eq. 0 ) THEN
> IF ( id%MYID .eq. MASTER ) THEN
> color = MPI_UNDEFINED
> ELSE
> color = 0
> END IF
> CALL MPI_COMM_SPLIT( id%COMM, color, 0,
> & id%COMM_NODES, IERR )
> id%NSLAVES = id%NPROCS - 1
> ELSE
> CALL MPI_COMM_DUP( id%COMM, id%COMM_NODES, IERR )
> id%NSLAVES = id%NPROCS
> END IF
>
> IF (id%PAR .ne. 0 .or. id%MYID .NE. MASTER) THEN
> CALL MPI_COMM_DUP( id%COMM_NODES, id%COMM_LOAD, IERR
> ENDIF
>
> That doesn't look right -- both MPI_COMM_SPLIT and MPI_COMM_DUP are
collective, meaning that all processes in the communicator must call
them. In the first case, only some processes are calling
MPI_COMM_SPLIT. Is there some other logic that forces the rest of the
processes to call MPI_COMM_SPLIT, too?
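
(To illustrate the point in C -- a sketch, not the MUMPS code: every rank of
the parent communicator has to make the MPI_Comm_split call; ranks that
should be left out pass MPI_UNDEFINED and simply get MPI_COMM_NULL back:

    int rank, color;
    MPI_Comm comm_nodes;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    color = (rank == 0) ? MPI_UNDEFINED : 0;   /* master opts out */
    /* Collective: called by *all* ranks of MPI_COMM_WORLD. */
    MPI_Comm_split(MPI_COMM_WORLD, color, 0, &comm_nodes);
    if (comm_nodes != MPI_COMM_NULL) {
        /* only the "slave" ranks get a real communicator here */
    }
)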
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 8
> Date: Thu, 19 May 2011 22:30:03 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Trouble with MPI-IO
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <eefb638f-72f1-4208-8ea2-4f25f610c...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> Props for that testio script. I think you win the award for "most
easy to reproduce test case." :-)
>
> I notice that some of the lines went over 72 columns, so I renamed
the file x.f90 and changed all the comments from "c" to "!" and joined
the two &-split lines. The error about implicit type for lenr went
away, but then when I enabled better type checking by using "use mpi"
instead of "include 'mpif.h'", I got the following:
>
> x.f90:99.77:
>
> call mpi_type_indexed(lenij,ijlena,ijdisp,mpi_real,ij_vector_type,ierr)
> 1
> Error: There is no specific subroutine for the generic
'mpi_type_indexed' at (1)
>
> I looked at our mpi F90 module and see the following:
>
> interface MPI_Type_indexed
> subroutine MPI_Type_indexed(count, array_of_blocklengths,
array_of_displacements, oldtype, newtype, ierr)
> integer, intent(in) :: count
> integer, dimension(*), intent(in) :: array_of_blocklengths
> integer, dimension(*), intent(in) :: array_of_displacements
> integer, intent(in) :: oldtype
> integer, intent(out) :: newtype
> integer, intent(out) :: ierr
> end subroutine MPI_Type_indexed
> end interface
>
> I don't quite grok the syntax of the "allocatable" type ijdisp, so
that might be the problem here...?
>
> Regardless, I'm not entirely sure if the problem is the >72
character lines, but then when that is gone, I'm not sure how the
allocatable stuff fits in... (I'm not enough of a Fortran programmer
to know)
>
>
>
>
> On May 10, 2011, at 7:14 PM, Tom Rosmond wrote:
>
> > I would appreciate someone with experience with MPI-IO taking a look at the
> > simple fortran program gzipped and attached to this note. It is
> > imbedded in a script so that all that is necessary to run it is do:
> > 'testio' from the command line. The program generates a small 2-D
input
> > array, sets up an MPI-IO environment, and writes a 2-D output array
> > twice, with the only difference being the displacement arrays used to
> > construct the indexed datatype. For the first write, simple
> > monotonically increasing displacements are used, for the second the
> > displacements are 'shuffled' in one dimension. They are printed during
> > the run.
> >
> > For the first case the file is written properly, but for the
second the
> > program hangs on MPI_FILE_WRITE_AT_ALL and must be aborted manually.
> > Although the program is compiled as an mpi program, I am running on a
> > single processor, which makes the problem more puzzling.
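
(For readers without the attachment, the overall pattern being exercised
looks roughly like this in C; the sizes, displacements and file name are
made up, and this is the monotonically increasing case that works, not the
shuffled one that hangs:

    float buf[8] = {0};
    int blocklens[4] = {2, 2, 2, 2};
    int disps[4]     = {0, 2, 4, 6};     /* monotonically increasing */
    MPI_Datatype ftype;
    MPI_File fh;

    MPI_Type_indexed(4, blocklens, disps, MPI_FLOAT, &ftype);
    MPI_Type_commit(&ftype);
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_FLOAT, ftype, "native", MPI_INFO_NULL);
    MPI_File_write_at_all(fh, 0, buf, 8, MPI_FLOAT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Type_free(&ftype);
)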
> >
> > The program should be relatively self-explanatory, but if more
> > information is needed, please ask. I am on an 8 core Xeon based Dell
> > workstation running Scientific Linux 5.5, Intel fortran 12.0.3, and
> > OpenMPI 1.5.3. I have also attached output from 'ompi_info'.
> >
> > T. Rosmond
> >
> >
> >
<testio.gz><info_ompi.gz>_______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 9
> Date: Thu, 19 May 2011 20:24:25 -0700
> From: Tom Rosmond <rosm...@reachone.com>
> Subject: Re: [OMPI users] Trouble with MPI-IO
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <1305861865.4284.104.ca...@cedar.reachone.com>
> Content-Type: text/plain
>
> Thanks for looking at my problem. Sounds like you did reproduce my
> problem. I have added some comments below
>
> On Thu, 2011-05-19 at 22:30 -0400, Jeff Squyres wrote:
> > Props for that testio script. I think you win the award for "most
easy to reproduce test case." :-)
> >
> > I notice that some of the lines went over 72 columns, so I renamed
the file x.f90 and changed all the comments from "c" to "!" and joined
the two &-split lines. The error about implicit type for lenr went
away, but then when I enabled better type checking by using "use mpi"
instead of "include 'mpif.h'", I got the following:
>
> What fortran compiler did you use?
>
> In the original script my Intel compile used the -132 option,
> allowing up to that many columns per line. I still think in
> F77 fortran much of the time, and use 'c' for comments out
> of habit. The change to '!' doesn't make any difference.
>
>
> > x.f90:99.77:
> >
> > call
mpi_type_indexed(lenij,ijlena,ijdisp,mpi_real,ij_vector_type,ierr)
> > 1
> > Error: There is no specific subroutine for the generic
'mpi_type_indexed' at (1)
>
> Hmmm, very strange, since I am looking right at the MPI standard
> documents with that routine documented. I too get this compile failure
> when I switch to 'use mpi'. Could that be a problem with the Open MPI
> fortran libraries???
> >
> > I looked at our mpi F90 module and see the following:
> >
> > interface MPI_Type_indexed
> > subroutine MPI_Type_indexed(count, array_of_blocklengths,
array_of_displacements, oldtype, newtype, ierr)
> > integer, intent(in) :: count
> > integer, dimension(*), intent(in) :: array_of_blocklengths
> > integer, dimension(*), intent(in) :: array_of_displacements
> > integer, intent(in) :: oldtype
> > integer, intent(out) :: newtype
> > integer, intent(out) :: ierr
> > end subroutine MPI_Type_indexed
> > end interface
> >
> > I don't quite grok the syntax of the "allocatable" type ijdisp, so
that might be the problem here...?
>
> Just a standard F90 'allocatable' statement. I've written thousands
> just like it.
> >
> > Regardless, I'm not entirely sure if the problem is the >72
character lines, but then when that is gone, I'm not sure how the
allocatable stuff fits in... (I'm not enough of a Fortran programmer
to know)
> >
> Anyone else out there who can comment?
>
>
> T. Rosmond
>
>
>
> >
> > On May 10, 2011, at 7:14 PM, Tom Rosmond wrote:
> >
> > > I would appreciate someone with experience with MPI-IO look at the
> > > simple fortran program gzipped and attached to this note. It is
> > > imbedded in a script so that all that is necessary to run it is do:
> > > 'testio' from the command line. The program generates a small
2-D input
> > > array, sets up an MPI-IO environment, and write a 2-D output array
> > > twice, with the only difference being the displacement arrays
used to
> > > construct the indexed datatype. For the first write, simple
> > > monotonically increasing displacements are used, for the second the
> > > displacements are 'shuffled' in one dimension. They are printed
during
> > > the run.
> > >
> > > For the first case the file is written properly, but for the
second the
> > > program hangs on MPI_FILE_WRITE_AT_ALL and must be aborted manually.
> > > Although the program is compiled as an mpi program, I am running
on a
> > > single processor, which makes the problem more puzzling.
> > >
> > > The program should be relatively self-explanatory, but if more
> > > information is needed, please ask. I am on an 8 core Xeon based Dell
> > > workstation running Scientific Linux 5.5, Intel fortran 12.0.3, and
> > > OpenMPI 1.5.3. I have also attached output from 'ompi_info'.
> > >
> > > T. Rosmond
> > >
> > >
> > >
<testio.gz><info_ompi.gz>_______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
>
>
>
> ------------------------------
>
> Message: 10
> Date: Fri, 20 May 2011 09:25:14 +0200
> From: David Büttner <david.buett...@in.tum.de>
> Subject: Re: [OMPI users] Problem with MPI_Request, MPI_Isend/recv and
> MPI_Wait/Test
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <4dd6175a.1080...@in.tum.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello,
>
> thanks for the quick answer. I am sorry that I forgot to mention
this: I
> did compile OpenMPI with MPI_THREAD_MULTIPLE support and test if
> required == provided after the MPI_Init_thread call.
>
> > I do not see any mechanism for protecting the accesses to the
requests to a single thread? What is the thread model you're using?
> >
> Again I am sorry that this was not clear: In the pseudo code below I
> wanted to indicate the access-protection I do by thread-id dependent
> calls if(0 == thread-id) and by using the trylock(...) (using
> pthread-mutexes). In the code all accesses concerning one MPI_Request
> (which are pthread-global-pointers in my case) are protected and called
> in sequential order, i.e. MPI_Isend/recv returns before any
thread is
> allowed to call the corresponding MPI_Test and no-one can call MPI_Test
> any more when a thread is allowed to call MPI_Wait.
> I did this in the same manner before with other MPI implementations,
but
> also on the same machine with the same (untouched) OpenMPI
> implementation, also using pthreads and MPI in combination, but I used
>
> MPI_Request req;
>
> instead of
>
> MPI_Request* req;
> (and later)
> req = (MPI_Request*)malloc(sizeof(MPI_Request));
>
>
> In my recent (problem) code, I also tried not using pointers, but got
> the same problem. Also, as I described in the first mail, I tried
> everything concerning the memory allocation of the MPI_Request objects.
> I tried not calling malloc. This I guessed wouldn't work, but the
> OpenMPI documentation says this:
>
> " Nonblocking calls allocate a communication request object and
> associate it with the request handle (the argument request). "
> [http://www.open-mpi.org/doc/v1.4/man3/MPI_Isend.3.php] and
>
> " [...] if the communication object was created by a nonblocking
send or
> receive, then it is deallocated and the request handle is set to
> MPI_REQUEST_NULL."
> [http://www.open-mpi.org/doc/v1.4/man3/MPI_Test.3.php] and (in slightly
> different words) [http://www.open-mpi.org/doc/v1.4/man3/MPI_Wait.3.php]
>
> So I thought that it might do some kind of optimized memory stuff
> internally.
>
> I also tried allocating req (for each used MPI_Request) once before the
> first use and deallocation after the last use (which I thought was the
> way it was supposed to work), but that crashes also.
>
> I tried replacing the pointers through global variables
>
> MPI_Request req;
>
> which didn't do the job...
>
> The only thing that seems to work is what I mentioned below: Allocate
> every time I am going to need it in the MPI_Isend/recv, use it in
> MPI_Test/Wait and after that deallocate it by hand each time.
> I don't think that this is supposed to be like this since I have to
do a
> call to malloc and free so often (for multiple MPI_Request objects in
> each iteration) that it will most likely limit performance...
>
> Anyway I still have the same problem and am still unclear on what kind
> of memory allocation I should be doing for the MPI_Requests. Is there
> anything else (besides MPI_THREAD_MULTIPLE support, thread access
> control, sequential order of MPI_Isend/recv, MPI_Test and MPI_Wait for
> one MPI_Request object) I need to take care of? If not, what could I do
> to find the source of my problem?
>
> Thanks again for any kind of help!
>
> Kind regards,
> David
>
>
>
> > > From an implementation perspective, your code is correct only if
you initialize the MPI library with MPI_THREAD_MULTIPLE and if the
library accepts. Otherwise, there is an assumption that the
application is single threaded, or that the MPI behavior is
implementation dependent. Please read the MPI standard regarding to
MPI_Init_thread for more details.
> >
> > Regards,
> > george.
> >
> > On May 19, 2011, at 02:34, David Büttner wrote:
> >
> >> Hello,
> >>
> >> I am working on a hybrid MPI (OpenMPI 1.4.3) and Pthread code. I
am using MPI_Isend and MPI_Irecv for communication and
MPI_Test/MPI_Wait to check if it is done. I do this repeatedly in the
outer loop of my code. The MPI_Test is used in the inner loop to check
if some function can be called which depends on the received data.
> >> The program regularly crashed (only when not using printf...) and
after debugging it I figured out the following problem:
> >>
> >> In MPI_Isend I have an invalid read of memory. I fixed the
problem with not re-using a
> >>
> >> MPI_Request req_s, req_r;
> >>
> >> but by using
> >>
> >> MPI_Request* req_s;
> >> MPI_Request* req_r
> >>
> >> and re-allocating them before the MPI_Isend/recv.
> >>
> >> The documentation says, that in MPI_Wait and MPI_Test (if
successful) the request-objects are deallocated and set to
MPI_REQUEST_NULL.
> >> It also says, that in MPI_Isend and MPI_Irecv, it allocates the
Objects and associates it with the request object.
> >>
> >> As I understand this, this either means I can use a pointer to
MPI_Request which I don't have to initialize for this (it doesn't work
but crashes), or that I can use a MPI_Request pointer which I have
initialized with malloc(sizeof(MPI_REQUEST)) (or passing the address
of a MPI_Request req), which is set and unset in the functions. But
this version crashes, too.
> >> What works is using a pointer, which I allocate before the
MPI_Isend/recv and which I free after MPI_Wait in every iteration. In
other words: It only uses if I don't reuse any kind of MPI_Request.
Only if I recreate one every time.
> >>
> >> Is this, what is should be like? I believe that a reuse of the
memory would be a lot more efficient (less calls to malloc...). Am I
missing something here? Or am I doing something wrong?
> >>
> >>
> >> Let me provide some more detailed information about my problem:
> >>
> >> I am running the program on a 30 node infiniband cluster. Each
node has 4 single core Opteron CPUs. I am running 1 MPI Rank per node
and 4 threads per rank (-> one thread per core).
> >> I am compiling with mpicc of OpenMPI using gcc below.
> >> Some pseudo-code of the program can be found at the end of this
e-mail.
> >>
> >> I was able to reproduce the problem using different amount of
nodes and even using one node only. The problem does not arise when I
put printf-debugging information into the code. This pointed me into
the direction that I have some memory problem, where some write
accesses some memory it is not supposed to.
> >> I ran the tests using valgrind with --leak-check=full and
--show-reachable=yes, which pointed me either to MPI_Isend or MPI_Wait
depending on whether I had the threads spin in a loop for MPI_Test to
return success or used MPI_Wait respectively.
> >>
> >> I would appreciate your help with this. Am I missing something
important here? Is there a way to re-use the request in the different
iterations other than I thought it should work?
> >> Or is there a way to re-initialize the allocated memory before
the MPI_Isend/recv so that I at least don't have to call free and
malloc each time?
> >>
> >> Thank you very much for your help!
> >> Kind regards,
> >> David Büttner
> >>
> >> _____________________
> >> Pseudo-Code of program:
> >>
> >> MPI_Request* req_s;
> >> MPI_Request* req_w;
> >> OUTER-LOOP
> >> if(0 == threadid)
> >> {
> >> req_s = malloc(sizeof(MPI_Request));
> >> req_r = malloc(sizeof(MPI_Request));
> >> MPI_Isend(..., req_s)
> >> MPI_Irecv(..., req_r)
> >> }
> >> pthread_barrier
> >> INNER-LOOP (while NOT_DONE or RET)
> >> if(TRYLOCK&& NOT_DONE)
> >> {
> >> if(MPI_TEST(req_r))
> >> {
> >> Call_Function_A;
> >> NOT_DONE = 0;
> >> }
> >>
> >> }
> >> RET = Call_Function_B;
> >> }
> >> pthread_barrier_wait
> >> if(0 == threadid)
> >> {
> >> MPI_WAIT(req_s)
> >> MPI_WAIT(req_r)
> >> free(req_s);
> >> free(req_r);
> >> }
> >> _____________
> >>
> >>
> >> --
> >> David Büttner, Informatik, Technische Universität München
> >> TUM I-10 - FMI 01.06.059 - Tel. 089 / 289-17676
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > "To preserve the freedom of the human mind then and freedom of the
press, every spirit should be ready to devote itself to martyrdom; for
as long as we may think as we will, and speak as we think, the
condition of man will proceed in improvement."
> > -- Thomas Jefferson, 1799
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> David Büttner, Informatik, Technische Universität München
> TUM I-10 - FMI 01.06.059 - Tel. 089 / 289-17676
>
>
>
> ------------------------------
>
> Message: 11
> Date: Fri, 20 May 2011 06:23:21 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Trouble with MPI-IO
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <a5b121e9-e664-49d0-ae54-2cfe52712...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> On May 19, 2011, at 11:24 PM, Tom Rosmond wrote:
>
> > What fortran compiler did you use?
>
> gfortran.
>
> > In the original script my Intel compile used the -132 option,
> > allowing up to that many columns per line.
>
> Gotcha.
>
> >> x.f90:99.77:
> >>
> >> call
mpi_type_indexed(lenij,ijlena,ijdisp,mpi_real,ij_vector_type,ierr)
> >> 1
> >> Error: There is no specific subroutine for the generic
'mpi_type_indexed' at (1)
> >
> > Hmmm, very strange, since I am looking right at the MPI standard
> > documents with that routine documented. I too get this compile failure
> > when I switch to 'use mpi'. Could that be a problem with the Open MPI
> > fortran libraries???
>
> I think that that error is telling us that there's a compile-time
mismatch -- that the signature of what you've passed doesn't match the
signature of OMPI's MPI_Type_indexed subroutine.
>
> >> I looked at our mpi F90 module and see the following:
> >>
> >> interface MPI_Type_indexed
> >> subroutine MPI_Type_indexed(count, array_of_blocklengths,
array_of_displacements, oldtype, newtype, ierr)
> >> integer, intent(in) :: count
> >> integer, dimension(*), intent(in) :: array_of_blocklengths
> >> integer, dimension(*), intent(in) :: array_of_displacements
> >> integer, intent(in) :: oldtype
> >> integer, intent(out) :: newtype
> >> integer, intent(out) :: ierr
> >> end subroutine MPI_Type_indexed
> >> end interface
>
> Shouldn't ijlena and ijdisp be 1D arrays, not 2D arrays?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 12
> Date: Fri, 20 May 2011 07:26:19 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] MPI_Alltoallv function crashes when np > 100
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <f9f71854-b9dd-459f-999d-8a8aef8d6...@cisco.com>
> Content-Type: text/plain; charset=GB2312
>
> I missed this email in my INBOX, sorry.
>
> Can you be more specific about what exact error is occurring? You
just say that the application crashes...? Please send all the
information listed here:
>
> http://www.open-mpi.org/community/help/
>
>
> On Apr 26, 2011, at 10:51 PM, ?????? wrote:
>
> > It seems that the constant SOMAXCONN used by the listen()
system call causes this problem. Can anybody help me resolve this
question?
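
(To make the hypothesis concrete -- this only illustrates what the poster
suspects, it is not confirmed Open MPI behaviour: a TCP listener's
pending-connection backlog is capped at SOMAXCONN (often 128), so ~150
peers connecting at nearly the same moment can overflow it and have
connection attempts dropped:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>

    int make_listener(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(port);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(fd, SOMAXCONN);   /* backlog limited to SOMAXCONN here */
        return fd;
    }
)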
> >
> > 2011/4/25 ?????? <xjun.m...@gmail.com>
> > Dear all,
> >
> > As I mentioned, when I ran an application under mpirun with the parameter
"np = 150 (or bigger)", the application that used the MPI_Alltoallv
function would crash. The problem would recur no matter how many nodes
we used.
> >
> > The edition of OpenMPI: 1.4.1 or 1.4.3
> > The OS: linux redhat 2.6.32
> >
> > BTW, my nodes had enough memory to run the application, and the
MPI_Alltoall function worked well in my environment.
> > Did anybody meet the same problem? Thanks.
> >
> >
> > Best Regards
> >
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 13
> Date: Fri, 20 May 2011 07:28:28 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] MPI_ERR_TRUNCATE with MPI_Allreduce() error,
> but only sometimes...
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <caef632e-757b-49ee-b545-5cccbc712...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> Sorry for the super-late reply. :-\
>
> Yes, ERR_TRUNCATE means that the receiver didn't have a large enough
buffer.
>
> Have you tried upgrading to a newer version of Open MPI? 1.4.3 is
the current stable release (I have a very dim and not guaranteed to be
correct recollection that we fixed something in the internals of
collectives somewhere with regards to ERR_TRUNCATE...?).
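
(As a point-to-point illustration of what MPI_ERR_TRUNCATE means -- a
sketch only; inside a collective the mismatched receive is posted
internally, but the effect is the same:

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int big[4] = {1, 2, 3, 4};
        MPI_Send(big, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int small[2];
        /* count 2 < the 4 ints sent above -> MPI_ERR_TRUNCATE */
        MPI_Recv(small, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
)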
>
>
> On Apr 25, 2011, at 4:44 PM, Wei Hao wrote:
>
> > Hi:
> >
> > I'm running openmpi 1.2.8. I'm working on a project where one part
involves communicating an integer, representing the number of data
points I'm keeping track of, to all the processors. The line is simple:
> >
> > MPI_Allreduce(&np,&geo_N,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);
> >
> > where np and geo_N are integers, np is the result of a local
calculation, and geo_N has been declared on all the processors. geo_N
is nondecreasing. This line works the first time I call it (geo_N goes
from 0 to some other integer), but if I call it later in the program,
I get the following error:
> >
> >
> > [woodhen-039:26189] *** An error occurred in MPI_Allreduce
> > [woodhen-039:26189] *** on communicator MPI_COMM_WORLD
> > [woodhen-039:26189] *** MPI_ERR_TRUNCATE: message truncated
> > [woodhen-039:26189] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >
> >
> > As I understand it, MPI_ERR_TRUNCATE means that the output buffer
is too small, but I'm not sure where I've made a mistake. It's
particularly frustrating because it seems to work fine the first time.
Does anyone have any thoughts?
> >
> > Thanks
> > Wei
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 14
> Date: Fri, 20 May 2011 08:14:07 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Trouble with MPI-IO
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <42db03b3-9cf4-4acb-aa20-b857e5f76...@cisco.com>
> Content-Type: text/plain; charset="us-ascii"
>
> On May 20, 2011, at 6:23 AM, Jeff Squyres wrote:
>
> > Shouldn't ijlena and ijdisp be 1D arrays, not 2D arrays?
>
> Ok, if I convert ijlena and ijdisp to 1D arrays, I don't get the
compile error (even though they're allocatable -- so allocate was a
red herring, sorry). That's all that "use mpi" is complaining about --
that the function signatures didn't match.
>
> use mpi is your friend -- even if you don't use F90 constructs much.
Compile-time checking is Very Good Thing (you were effectively
"getting lucky" by passing in the 2D arrays, I think).
>
> Attached is my final version. And with this version, I see the hang
when running it with the "T" parameter.
>
> That being said, I'm not an expert on the MPI IO stuff -- your code
*looks* right to me, but I could be missing something subtle in the
interpretation of MPI_FILE_SET_VIEW. I tried running your code with
MPICH 1.3.2p1 and it also hung.
>
> Rob (ROMIO guy) -- can you comment this code? Is it correct?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: x.f90
> Type: application/octet-stream
> Size: 3820 bytes
> Desc: not available
> URL:
<http://www.open-mpi.org/MailArchives/users/attachments/20110520/53a5461b/attachment.obj>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1911, Issue 1
> **************************************
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users