[OMPI users] (no subject)
Dear Sir/Madam, what are the MPI functions used for taking a checkpoint and restarting within the application in MPI programs, and where do I get these functions from? With regards, Mallikarjuna Shastry
Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers
I strongly suggest you take a look at boost::mpi, http://www.boost.org/doc/libs/1_39_0/doc/html/mpi.html

It handles serialization transparently and has some great natural extensions to the MPI C interface for C++, e.g.

    bool global = all_reduce(comm, local, std::logical_and<bool>());

This sets "global" to "local_0 && local_1 && ... && local_(N-1)".

Luis Vitorio Cargnini wrote:
> Thank you very much John, the explanation of &v[0] was the kind of thing that I was looking for. This kind of approach solves my problems.
>
> On 09-07-05, at 22:20, John Phillips wrote:
>> Luis Vitorio Cargnini wrote:
>>> Hi. So, after some explanation I started to use the C bindings inside my C++ code, and now comes my new doubt: how do I send an object through Send and Recv of MPI? Because the types are char, int, double, long double, and so on. Does someone have any suggestion? Thanks. Vitorio.
>>
>> Vitorio,
>>
>> If you are sending collections of built-in data types (ints, doubles, that sort of thing), then it may be easy, and it isn't awful. You want the data in a single stretch of contiguous memory. If you are using an STL vector, this is already true. If you are using some other container, then no guarantees are provided for whether the memory is contiguous.
>>
>> Imagine you are using a vector, and you know the number of entries in that vector. You want to send that vector to processor 2 on the world communicator with tag 0. Then, the code snippet would be:
>>
>>     std::vector<double> v;
>>     ... code that fills v with something ...
>>     int send_error;
>>     send_error = MPI_Send(&v[0], v.size(), MPI_DOUBLE, 2, 0, MPI_COMM_WORLD);
>>
>> The &v[0] part provides a pointer to the first member of the array that holds the data for the vector. If you know how long it will be, you could use that constant instead of calling the v.size() function. Knowing the length also simplifies the send, since the remote process also knows the length and doesn't need a separate send to provide that information. It is also possible to provide a pointer to the start of storage for the character array that makes up a string. Both of these legacy-friendly interfaces are part of the standard, and should be available on any reasonable implementation of the STL.
>>
>> If you are using a container that is not held in contiguous memory, and the data is all of a single built-in data type, then you need to first serialize the data into a block of contiguous memory before sending it. (If the data block is large, then you may actually have to divide it into pieces and send them separately.) If the data is not a block of all a single built-in type (it may include several built-in types, or it may be a custom data class with complex internal structure, for example), then the serialization problem gets harder. In this case, look at the MPI-provided facilities for dealing with complex data types and compare them to the Boost-provided facilities. There is an initial learning curve for the Boost facilities, but in the long run they may provide a substantial development-time savings if you need to transmit and receive several complex types. In most cases, the run-time cost of using the Boost facilities is small (according to the tests run during library development and documented with the library).
>>
>> John Phillips
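Complementing John's point that a known length lets the receiver avoid a separate size message, here is a minimal sketch of the opposite case: receiving a vector whose length is not known in advance, using MPI_Probe and MPI_Get_count. The wrapper function is illustrative, not from the thread, and it assumes the message is non-empty:

    #include <mpi.h>
    #include <vector>

    // Minimal sketch: receive a std::vector<double> of unknown length.
    std::vector<double> recv_vector(int source, int tag)
    {
        MPI_Status status;
        MPI_Probe(source, tag, MPI_COMM_WORLD, &status); // block until a message is pending
        int count = 0;
        MPI_Get_count(&status, MPI_DOUBLE, &count);      // how many doubles it carries
        std::vector<double> v(count);
        MPI_Recv(&v[0], count, MPI_DOUBLE, source, tag,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        return v;
    }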
[OMPI users] Parallel I/O Usage
Hi, do I understand the MPI-2 Parallel I/O correctly (C++)? After opening a file with MPI::File::Open, I can use Read_at on the returned file object. I give offsets in bytes and I can perform random-access reads from any process at any point of the file without violating correctness (although the performance might/should/will be better using views):

    MPI::File f = MPI::File::Open(MPI::COMM_WORLD, filename, MPI::MODE_RDONLY, MPI::INFO_NULL);
    // ...
    MPI::Offset pos_in_file = ...;
    // ...
    f.Read_at(pos_in_file, buffer, local_n + 1, MPI::INTEGER);
    // ...
    f.Close();

I have some problems with the program reading invalid data and want to make sure I am actually using parallel I/O the right way. Thanks, -- Manuel Holtgrewe
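Assuming the usage Manuel describes is intended, a self-contained sketch of the same pattern, with each rank reading its own disjoint byte slice, might look as follows (file name and partitioning are illustrative; with the default file view, the Read_at offset counts bytes):

    #include <mpi.h>
    #include <vector>

    // Minimal sketch: every rank reads an equal, disjoint byte slice of one shared file.
    int main(int argc, char** argv)
    {
        MPI::Init(argc, argv);
        int rank = MPI::COMM_WORLD.Get_rank();
        int size = MPI::COMM_WORLD.Get_size();

        MPI::File f = MPI::File::Open(MPI::COMM_WORLD, "data.bin",
                                      MPI::MODE_RDONLY, MPI::INFO_NULL);
        MPI::Offset chunk = f.Get_size() / size;   // bytes per rank (remainder ignored)
        std::vector<char> buffer(chunk);
        // With the default file view, the offset argument is measured in bytes.
        f.Read_at(rank * chunk, &buffer[0], (int)chunk, MPI::BYTE);
        f.Close();

        MPI::Finalize();
        return 0;
    }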
[OMPI users] Segmentation fault - Address not mapped
Dear all, I have recently started working on a project using OpenMPI. Basically, I have been given some C++ code, a cluster to play with and a deadline in order to make the C++ code run faster. The cluster was a bit crowded, so I started working on my laptop (g++ 4.3.3 -- Ubuntu repos, OpenMPI 1.3.2 -- compiled with no options) and after one week I actually had something that was running on my computer, so I decided to move to the cluster. Since the cluster is very old and was using g++ 3.2 and an old version of OpenMPI, I decided to install both of them from source in my home folder (g++ 4.4, OpenMPI 1.3.2). The issue is that when I run the program (after it compiled flawlessly on the machine), I get these error messages:

[denali:30134] *** Process received signal ***
[denali:30134] Signal: Segmentation fault (11)
[denali:30134] Signal code: Address not mapped (1)
[denali:30134] Failing at address: 0x18

(more in the attached file -- mpirun -np 4 ray-trace)

All this morning, I have gone through the mailing lists and found people experiencing my problems, but their solutions did not work for me. By using simple debugging (cout), I was able to determine where the error comes from:

    //Initialize step
    MPI_Init(&argc,&argv);
    //Here it breaks!!! Memory allocation issue!
    MPI_Comm_size(MPI_COMM_WORLD, &pool);
    std::cout<<"I'm here"<<std::endl;
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

When trying to debug via gdb, the problem seems to be:

    Program received signal SIGSEGV, Segmentation fault.
    0xb7524772 in ompi_comm_invalid (comm=Could not find the frame base
    for "ompi_comm_invalid".) at communicator.h:261
    261 communicator.h: No such file or directory.
    in communicator.h

which might indicate a problem with paths. For now, my LD_LIBRARY_PATH is set to "/users/cluster/cdavid/local/lib/" (the local folder in my home folder emulates the directory structure of the / folder). Moreover, I wanted to see if the installation is actually ok and I tried running this program: http://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program with exactly the same results; the code breaks when the memory address of variable pool is referenced. So, if you have any ideas or you think I might have missed something, please let me know. Thanks, Catalin
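For reference, a minimal test along the lines of the Wikipedia example Catalin mentions (a hypothetical reconstruction, not his actual code); if even this crashes inside MPI_Comm_size, the suspect is the installation rather than the application:

    #include <mpi.h>
    #include <iostream>

    // Smallest possible check: init, query size/rank, print, finalize.
    int main(int argc, char** argv)
    {
        int pool = 0, myid = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &pool);   // the call that segfaults in Catalin's runs
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        std::cout << "rank " << myid << " of " << pool << std::endl;
        MPI_Finalize();
        return 0;
    }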
Re: [OMPI users] (no subject)
The MPI standard does not define any functions for taking checkpoints from the application. The checkpoint/restart work in Open MPI is a command-line-driven, transparent solution. So the application does not have to change in any way, and the user (or scheduler) must initiate the checkpoint from the command line (on the same node as the mpirun process).

We have experimented with adding Open MPI-specific checkpoint/restart interfaces in the context of the MPI Forum. These prototypes have not made it to the Open MPI trunk. Some information about that particular development is at the link below:

https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Quiescence

Best,
Josh

On Jul 6, 2009, at 12:07 AM, Mallikarjuna Shastry wrote:
> Dear Sir/Madam, what are the MPI functions used for taking a checkpoint and restarting within the application in MPI programs, and where do I get these functions from? With regards, Mallikarjuna Shastry
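As a sketch of the command-line flow Josh describes, assuming an Open MPI 1.3 build configured with checkpoint/restart support (e.g. BLCR), the typical sequence is along these lines; the PID placeholder and the exact snapshot handle name vary by run and installation:

    mpirun -am ft-enable-cr -np 4 ./my_app
    ompi-checkpoint <pid_of_mpirun>               # run on the same node as mpirun
    ompi-restart ompi_global_snapshot_<pid>.ckpt  # restart from the printed handle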
Re: [OMPI users] Segmentation fault - Address not mapped
Hi,

> //Initialize step
> MPI_Init(&argc,&argv);
> //Here it breaks!!! Memory allocation issue!
> MPI_Comm_size(MPI_COMM_WORLD, &pool);
> std::cout<<"I'm here"<<std::endl;

and your PATH is also okay? (I see that you use plain mpicxx in the build) ...

> Moreover, I wanted to see if the installation is actually ok and I tried running this program: http://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program with exactly the same results; the code breaks when the memory address of variable pool is referenced. So, if you have any ideas or you think I might have missed something, please let me know. Thanks, Catalin
Re: [OMPI users] Segmentation fault - Address not mapped
On Mon, Jul 6, 2009 at 2:14 PM, Dorian Krause wrote:
> Hi,
>
>> //Initialize step
>> MPI_Init(&argc,&argv);
>> //Here it breaks!!! Memory allocation issue!
>> MPI_Comm_size(MPI_COMM_WORLD, &pool);
>> std::cout<<"I'm here"<<std::endl;
>> MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>>
>> When trying to debug via gdb, the problem seems to be:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0xb7524772 in ompi_comm_invalid (comm=Could not find the frame base
>> for "ompi_comm_invalid".) at communicator.h:261
>> 261 communicator.h: No such file or directory.
>> in communicator.h
>>
>> which might indicate a problem with paths. For now, my LD_LIBRARY_PATH
>> is set to "/users/cluster/cdavid/local/lib/" (the local folder in my
>> home folder emulates the directory structure of the / folder).
>
> and your PATH is also okay? (I see that you use plain mpicxx in the build)
> ...

Hi again!

This is the output of some commands in the terminal:

cdavid@denali:~$ which mpicxx
~/local/bin/mpicxx
cdavid@denali:~$ echo $PATH
/users/cluster/cdavid/local/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/scali/bin:/opt/scali/sbin:/opt/scali/contrib/pbs/bin:/users/cluster/cdavid/bin
cdavid@denali:~$ echo $LD_LIBRARY_PATH
/users/cluster/cdavid/local/lib/
cdavid@denali:~$ locate communicator.h
cdavid@denali:~$

I don't see anything wrong with the path (I added the first part in order to make it look there first). I even tried adding "-L/users/cluster/cdavid/local/lib -lmpi -I/users/cluster/cdavid/local/include" to the compiler invocation, in hope of an improvement. So far, nothing.

Regards,

Catalin

--
**
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen

Phone: +49-(0)1577-49-38-667

College Ring 4, #343
Bremen, 28759
Germany
**
Re: [OMPI users] Segmentation fault - Address not mapped
Hi
Are you also sure that you have the same version of Open MPI on every machine of your cluster, and that it is the mpicxx of this version that is called when you run your program? I ask because you mentioned that there was an old version of Open MPI present... did you remove this?

Jody

On Mon, Jul 6, 2009 at 3:24 PM, Catalin David wrote:
> [snip: Catalin's message of 3:24 PM, quoted in full above]
Re: [OMPI users] Segmentation fault - Address not mapped
On Mon, Jul 6, 2009 at 3:26 PM, jody wrote:
> Hi
> Are you also sure that you have the same version of Open MPI
> on every machine of your cluster, and that it is the mpicxx of this
> version that is called when you run your program?
> I ask because you mentioned that there was an old version of Open MPI
> present... did you remove this?
>
> Jody

Hi

I have just logged in to a few other boxes, and they all mount my home folder. When running `echo $LD_LIBRARY_PATH` and other commands, I get what I expect to get, but this might be because I have set these variables in the .bashrc file. So, I tried compiling/running like this: ~/local/bin/mpicxx [stuff] and ~/local/bin/mpirun -np 4 ray-trace, but I get the same errors.

As for the previous version, I don't have root access, therefore I was not able to remove it. I was just trying to override it by setting the $PATH variable to point first at my local installation.

Catalin

--
**
Catalin David
B.Sc. Computer Science 2010
Jacobs University Bremen

Phone: +49-(0)1577-49-38-667

College Ring 4, #343
Bremen, 28759
Germany
**
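A generic way to double-check which Open MPI a binary actually resolves at run time (standard diagnostics, not from the thread; the binary name matches Catalin's program):

cdavid@denali:~$ ldd ray-trace | grep mpi
cdavid@denali:~$ ~/local/bin/ompi_info | grep "Open MPI:"

The first shows which libmpi.so the dynamic loader picks up; the second reports the version string of the local installation.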
[OMPI users] any way to get serial time on head node?
Let total time on my slot 0 process be S+C+B+I = serial computations + communication + busy wait + idle. Is there a way to find out S? S+C would probably also be useful, since I assume C is low. The problem is that I = 0, roughly, and B is big. Since B is big, the usual process timing methods don't work. If B all went to "system" as opposed to "user" time I could use the latter, but I don't think that's the case. Can anyone confirm that? If S is big, I might be able to gain by parallelizing in a different way. By S I mean to refer to serial computation that is part of my algorithm, rather than the technical fact that all the computation is serial on a given slot. I'm running R/Rmpi. Thanks. Ross
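One standard way to approximate S is to bracket the serial sections with MPI_Wtime and accumulate. A minimal sketch in C++ follows; the work functions are hypothetical stand-ins, and Rmpi users would wrap their serial R code the same way:

    #include <mpi.h>
    #include <cstdio>

    void do_serial_work() { /* stand-in for the algorithm's serial part (S) */ }
    void exchange_with_workers() { /* stand-in for sends/receives/waits (C + B) */ }

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        double serial_time = 0.0;
        const int nsteps = 100;
        for (int step = 0; step < nsteps; ++step) {
            double t0 = MPI_Wtime();
            do_serial_work();
            serial_time += MPI_Wtime() - t0;  // accumulate S only
            exchange_with_workers();          // C and B happen here, untimed
        }
        std::printf("approx. S = %f s\n", serial_time);
        MPI_Finalize();
        return 0;
    }

Total elapsed time minus serial_time then estimates C+B+I.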
Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers
Just one additional question: if I have

    vector< vector<double> > x;

how do I use MPI_Send? Is it

    MPI_Send(&x[0][0], x[0].size(), MPI_DOUBLE, 2, 0, MPI_COMM_WORLD);

?

On 09-07-05, at 22:20, John Phillips wrote:
> [snip: John's explanation of sending std::vector data, quoted in full above]
Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers
Thanks, but I really do not want to use Boost. Is it easier? Certainly, but I want to do this using only MPI itself, not depending on a library, or on templates like the bulk of Boost, a huge set of templates and wrappers for different libraries, supplying a wrapper for C++. I admit Boost is a valuable tool, but in my case, the more independent I can be from additional libs, the better.

On 09-07-06, at 04:49, Number Cruncher wrote:
> I strongly suggest you take a look at boost::mpi, http://www.boost.org/doc/libs/1_39_0/doc/html/mpi.html
> It handles serialization transparently and has some great natural extensions to the MPI C interface for C++, e.g.
>
>     bool global = all_reduce(comm, local, std::logical_and<bool>());
>
> This sets "global" to "local_0 && local_1 && ... && local_(N-1)".
>
> [snip: the rest of the thread, quoted in full above]
[OMPI users] Segfault when using valgrind
Hi, I am attempting to debug a memory corruption in an MPI program using valgrind. However, when I run with valgrind I get semi-random segfaults and valgrind messages within the Open MPI library. Here is an example of such a segfault:

==6153==
==6153== Invalid read of size 8
==6153==    at 0x19102EA0: (within /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so)
==6153==    by 0x182ABACB: (within /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so)
==6153==    by 0x182A3040: (within /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so)
==6153==    by 0xB425DD3: PMPI_Isend (in /usr/lib/openmpi/lib/libmpi.so.0.0.0)
==6153==    by 0x7B83DA8: int Uintah::SFC::MergeExchange(int, std::vector, std::allocator > >&, std::vector, std::allocator > >&, std::vector, std::allocator > >&) (SFC.h:2989)
==6153==    by 0x7B84A8F: void Uintah::SFC::Batchers(std::vector, std::allocator > >&, std::vector, std::allocator > >&, std::vector, std::allocator > >&) (SFC.h:3730)
==6153==    by 0x7B8857B: void Uintah::SFC::Cleanup(std::vector, std::allocator > >&, std::vector, std::allocator > >&, std::vector, std::allocator > >&) (SFC.h:3695)
==6153==    by 0x7B88CC6: void Uintah::SFC::Parallel0<3, unsigned char>() (SFC.h:2928)
==6153==    by 0x7C00AAB: void Uintah::SFC::Parallel<3, unsigned char>() (SFC.h:1108)
==6153==    by 0x7C0EF39: void Uintah::SFC::GenerateDim<3>(int) (SFC.h:694)
==6153==    by 0x7C0F0F2: Uintah::SFC::GenerateCurve(int) (SFC.h:670)
==6153==    by 0x7B30CAC: Uintah::DynamicLoadBalancer::useSFC(Uintah::Handle const&, int*) (DynamicLoadBalancer.cc:429)
==6153== Address 0x10 is not stack'd, malloc'd or (recently) free'd

Thread "main" (pid 6153) caught signal SIGSEGV at address (nil) (segmentation violation)

Looking at the code for our Isend at SFC.h:2989, it does not seem to have any errors:

    MergeInfo myinfo, theirinfo;
    MPI_Request srequest, rrequest;
    MPI_Status status;
    myinfo.n = n;
    if (n != 0)
    {
      myinfo.min = sendbuf[0].bits;
      myinfo.max = sendbuf[n-1].bits;
    }
    //cout << rank << " n:" << n << " min:" << (int)myinfo.min << " max:" << (int)myinfo.max << endl;
    MPI_Isend(&myinfo, sizeof(MergeInfo), MPI_BYTE, to, 0, Comm, &srequest);

myinfo is a struct located on the stack, to is the rank of the processor that the message is being sent to, and srequest is also on the stack. When I don't run with valgrind, my program runs past this point just fine. I am currently using Open MPI 1.3 from the Debian unstable branch.
I also see the same type of segfault in a different portion of the code involving an MPI_Allgatherv, which can be seen below:

==22736== Use of uninitialised value of size 8
==22736==    at 0x19104775: mca_btl_sm_component_progress (opal_list.h:322)
==22736==    by 0x1382CE09: opal_progress (opal_progress.c:207)
==22736==    by 0xB404264: ompi_request_default_wait_all (condition.h:99)
==22736==    by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==22736==    by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck (coll_tuned_util.h:60)
==22736==    by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
==22736==    by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
==22736==    by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc:537)
==22736==    by 0x6465457: Uintah::Grid::problemSetup(Uintah::Handle const&, Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
==22736==    by 0x8345759: Uintah::SimulationController::gridSetup() (SimulationController.cc:243)
==22736==    by 0x834F418: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:117)
==22736==    by 0x4089AE: main (sus.cc:629)
==22736==
==22736== Invalid read of size 8
==22736==    at 0x19104775: mca_btl_sm_component_progress (opal_list.h:322)
==22736==    by 0x1382CE09: opal_progress (opal_progress.c:207)
==22736==    by 0xB404264: ompi_request_default_wait_all (condition.h:99)
==22736==    by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual (coll_tuned_util.c:55)
==22736==    by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck (coll_tuned_util.h:60)
==22736==    by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
==22736==    by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
==22736==    by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc:537)
==22736==    by 0x6465457: Uintah::Grid::problemSetup(Uintah::Handle const&, Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
==22736==    by 0x8345759: Uintah::SimulationController::gridSetup() (SimulationController.cc:243)
==22736==    by 0x834F418: Uintah::AMRSimulationController::run() (AMRSimulationController.cc:117)
==22736==    by 0x4089AE: main (sus.cc:629)

Are these problems with Open MPI, and are there any known workarounds?

Thanks,
Justin
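Not a confirmed fix for these particular traces, but a commonly suggested way to cut Open MPI-internal valgrind noise is the suppression file shipped with Open MPI, run under mpirun; the path below is typical but installation-dependent, and excluding the shared-memory BTL implicated in the traces is another frequent experiment:

mpirun -np 4 valgrind --suppressions=/usr/share/openmpi/openmpi-valgrind.supp ./sus
mpirun -np 4 --mca btl ^sm valgrind ./sus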
[OMPI users] MapReduce
Feels like déjà vu: http://www.linux-mag.com/cache/7407/1.html. Doesn't MapReduce do what MPI has been doing for a lot longer?
Re: [OMPI users] MPI and C++ (Boost)
Hi Luis,

Luis Vitorio Cargnini wrote:
> Thanks, but I really do not want to use Boost. Is it easier? Certainly, but I want to do this using only MPI itself, not depending on a library, or on templates like the bulk of Boost, a huge set of templates and wrappers for different libraries, supplying a wrapper for C++. I admit Boost is a valuable tool, but in my case, the more independent I can be from additional libs, the better.

I've used Boost MPI before and it really isn't that bad and shouldn't be seen as "just another library". Many parts of Boost are on their way to becoming part of the standard and are discussed and debated. And so, it isn't the same as going to some random person's web page and downloading their library/template. Of course, it takes time to make it into the standard and I'm not entirely sure if everything will (probably not). (One "annoying" thing about Boost MPI is that you have to compile it... if you are distributing your code, end-users might find that bothersome... oh, and serialization as well.)

One suggestion might be to make use of Boost and, once you've got your code working, start changing it back. At least you will have a working program to compare against. Kind of like writing a prototype first...

Ray
Re: [OMPI users] MPI and C++ (Boost)
Hi Raymond, thanks for your answer.

On 09-07-06, at 21:16, Raymond Wan wrote:
> I've used Boost MPI before and it really isn't that bad and shouldn't be seen as "just another library". Many parts of Boost are on their way to becoming part of the standard and are discussed and debated. And so, it isn't the same as going to some random person's web page and downloading their library/template. Of course, it takes time to make it into the standard and I'm not entirely sure if everything will (probably not). (One "annoying" thing about Boost MPI is that you have to compile it... if you are distributing your code, end-users might find that bothersome... oh, and serialization as well.)

We have a common factor: I'm not exactly distributing, but I'll be adding a dependency to my code, something that bothers me.

> One suggestion might be to make use of Boost and, once you've got your code working, start changing it back. At least you will have a working program to compare against. Kind of like writing a prototype first...
>
> Ray

Your suggestion is a great and interesting idea. My only fear is getting used to Boost and not being able to get rid of it anymore, because one thing is sure: the abstraction added by Boost is impressive; it makes things like MPI much less painful to implement in C++. The serialization that Boost::MPI already provides is astonishingly attractive, and so is the possibility of adding new types, like classes, so as to be able to send objects through Boost's MPI_Send. This is certainly attractive, but again, I do not want to become dependent on a library, as I said; that is my major concern.
[OMPI users] Configuration problem or network problem?
Hi all,

The system I use is a PS3 cluster, with 16 PS3s and a PowerPC as a head node, connected by a high-speed switch. There are point-to-point communication functions (MPI_Send and MPI_Recv) with a data size of about 40KB, and a lot of computation which consumes a long time (about 1 sec) in a loop. The co-processor in the PS3 can take care of the computation and the main processor takes care of the point-to-point communication, so computing and communication can overlap. The communication functions should return much faster than the computing function. My question is that after some cycles, the time consumed by the communication functions on a PS3 increases heavily, and the whole cluster's sync state breaks down. When I decrease the computing time, this situation just disappears. I am very confused about this. I think there is a mechanism in OpenMPI that causes this; has anyone seen this situation before? I use "mpirun --mca btl tcp,self -np 17 --hostfile ..."; is there something I should add?

Lin
Re: [OMPI users] Configuration problem or network problem?
Lin,

Try -np 16 and not running on the head node.

Doug Reeder

On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote:
> [snip: original message quoted in full above]
Re: [OMPI users] MPI and C++ - now Send and Receive of Classes and STL containers
Luis Vitorio Cargnini wrote:
> Just one additional question: if I have
>
>     vector< vector<double> > x;
>
> how do I use MPI_Send? Is it
>
>     MPI_Send(&x[0][0], x[0].size(), MPI_DOUBLE, 2, 0, MPI_COMM_WORLD);
>
> ?

Vitorio,

The standard provides no information on where the different parts of the data will be, relative to each other. Specifically, there is no reason to believe that the data in the different nested vectors will be contiguous. (In fact, I know of no platform where it will be.) That means trying to send the whole structure at once is problematic.

What you wrote will provide a pointer to the first element of the first nested vector to MPI_Send, along with the length of that nested vector. If that is what you intend, I expect it to work. (I have not tested it, so I may be misthinking something here.) The other nested vectors could be sent by themselves, using separate MPI_Send calls.

The only reliable way to send all of the data at once is to serialize it into a single vector or array for the send, then repack it into the structure after it is received.

John
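A minimal sketch of the serialize-and-repack approach John describes (buffer layout and function names are illustrative, not from the thread): send the row lengths first, then one flattened buffer, and mirror the receives to rebuild the nested structure.

    #include <mpi.h>
    #include <vector>

    // Sender: pack a ragged vector< vector<double> > into contiguous buffers.
    // (For brevity, assumes x is non-empty.)
    void send_nested(const std::vector< std::vector<double> >& x, int dest)
    {
        int nrows = (int)x.size();
        std::vector<int> lens(nrows);
        std::vector<double> flat;
        for (int i = 0; i < nrows; ++i) {
            lens[i] = (int)x[i].size();
            flat.insert(flat.end(), x[i].begin(), x[i].end());
        }
        MPI_Send(&nrows, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        MPI_Send(&lens[0], nrows, MPI_INT, dest, 1, MPI_COMM_WORLD);
        MPI_Send(&flat[0], (int)flat.size(), MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
    }

    // Receiver: mirror the three receives and rebuild the nested structure.
    std::vector< std::vector<double> > recv_nested(int src)
    {
        int nrows;
        MPI_Recv(&nrows, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::vector<int> lens(nrows);
        MPI_Recv(&lens[0], nrows, MPI_INT, src, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        int total = 0;
        for (int i = 0; i < nrows; ++i) total += lens[i];
        std::vector<double> flat(total);
        MPI_Recv(&flat[0], total, MPI_DOUBLE, src, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::vector< std::vector<double> > x(nrows);
        int off = 0;
        for (int i = 0; i < nrows; ++i) {
            x[i].assign(flat.begin() + off, flat.begin() + off + lens[i]);
            off += lens[i];
        }
        return x;
    }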
Re: [OMPI users] MPI and C++ (Boost)
Luis Vitorio Cargnini wrote:
> Your suggestion is a great and interesting idea. My only fear is getting used to Boost and not being able to get rid of it anymore, because one thing is sure: the abstraction added by Boost is impressive; it makes things like MPI much less painful to implement in C++. The serialization that Boost::MPI already provides is astonishingly attractive, and so is the possibility of adding new types, like classes, so as to be able to send objects through Boost's MPI_Send. This is certainly attractive, but again, I do not want to become dependent on a library, as I said; that is my major concern.

I'm having problems understanding your base argument here. It seems to be that you are afraid that Boost.MPI will make your prototype program so much better and easier to write that you won't want to remove it. Wouldn't this be exactly the reason why keeping it would be good?

I like and use Boost.MPI. I voted for inclusion during the review in the Boost developer community. However, what you should do in your program is use those tools that produce the right trade-off between the best performance, the easiest correct development, and the most maintainable program you can. If that means using Boost.MPI, then remember that questions about it are answered on the Boost Users mailing list. If your decision is that that does not include Boost.MPI, then you will have some other challenges to face, but experience shows that you can still produce a very high quality program.

Choose as you see fit, just be sure to understand your own reasons. (Whether any of the rest of us on this list understand them or not.)

John
Re: [OMPI users] MPI and C++ (Boost)
On Mon, 2009-07-06 at 23:09 -0400, John Phillips wrote:
> Luis Vitorio Cargnini wrote:
> >
> > Your suggestion is a great and interesting idea. My only fear is getting used to Boost and not being able to get rid of it anymore, because one thing is sure: the abstraction added by Boost is impressive; it makes things like MPI much less painful to implement in C++. The serialization that Boost::MPI already provides is astonishingly attractive, and so is the possibility of adding new types, like classes, so as to be able to send objects through Boost's MPI_Send. This is certainly attractive, but again, I do not want to become dependent on a library, as I said; that is my major concern.
>
> I'm having problems understanding your base argument here. It seems
> to be that you are afraid that Boost.MPI will make your prototype
> program so much better and easier to write that you won't want to remove
> it. Wouldn't this be exactly the reason why keeping it would be good?
>
> I like and use Boost.MPI. I voted for inclusion during the review in
> the Boost developer community. However, what you should do in your
> program is use those tools that produce the right trade-off between the
> best performance, the easiest correct development, and the most
> maintainable program you can. If that means using Boost.MPI, then
> remember that questions about it are answered on the Boost Users mailing
> list. If your decision is that that does not include Boost.MPI then you
> will have some other challenges to face, but experience shows that you
> can still produce a very high quality program.
>
> Choose as you see fit, just be sure to understand your own reasons.
> (Whether any of the rest of us on this list understand them or not.)

I understand Luis' position completely. He wants an MPI program, not a program that's written in some other environment, no matter how attractive that may be. It's like the difference between writing a numerical program in standard-conforming Fortran and writing it in the latest flavour-of-the-month interpreted language calling highly optimised libraries behind the scenes.

IF Boost is attached to MPI 3 (or whatever), AND it becomes part of the mainstream MPI implementations, THEN you can have the discussion again.

Ciao
Terry

--
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509    Skype: terry.frankcombe
Re: [OMPI users] Configuration problem or network problem?
Thank you for your suggestion. I tried this solution, but it doesn't work. In fact, the head node doesn't participate in the computing and communication; it only mallocs a large block of memory, and when the loop in every PS3 is over, the head node gathers the data from every PS3. The strange thing is that sometimes the program works well, but after rebooting the system, without any change to the program, it doesn't, so I think there should be some mechanism in OpenMPI that can be configured to let the program work reliably.

Lin

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Doug Reeder
Sent: July 7, 2009 10:49
To: Open MPI Users
Subject: Re: [OMPI users] Configuration problem or network problem?

> [snip: earlier messages quoted in full above]