I'm afraid I don't know enough about Boost to say. What this specific error message means is that you posted an MPI_Recv whose buffer was too small to hold an incoming message. It is permissible in MPI to post a receive that is *larger* than the corresponding incoming message, but it is defined as an error to post a receive with a buffer that is too small. In other words, the error is about the size of the buffer handed to the receive, not about MPI's internal buffering, so enlarging internal buffers will not make it go away.
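The rule above can be illustrated with a small, MPI-free C++ model. This is a sketch under stated assumptions, not MPI or Boost.MPI code: `truncate_error`, `model_recv`, and `model_probe_then_recv` are hypothetical names, and a `std::vector<char>` stands in for both the posted receive buffer and the incoming message. It shows that a receive buffer larger than the message is fine, a smaller one fails, and learning the incoming size first avoids the failure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for MPI_ERR_TRUNCATE; not part of MPI or Boost.MPI.
struct truncate_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Models a posted receive: `buf` is the buffer handed to the receive call and
// `msg` is the incoming message. A buffer at least as large as the message is
// fine (only msg.size() bytes are written); a smaller one is an error.
std::size_t model_recv(std::vector<char>& buf, const std::vector<char>& msg) {
    if (msg.size() > buf.size()) {
        throw truncate_error("message truncated: " + std::to_string(msg.size()) +
                             " bytes incoming, " + std::to_string(buf.size()) +
                             " bytes posted");
    }
    std::copy(msg.begin(), msg.end(), buf.begin());
    return msg.size();  // number of bytes actually received
}

// Models the probe-first pattern: learn the incoming size, then post a
// receive whose buffer is exactly that large, so truncation cannot occur.
std::vector<char> model_probe_then_recv(const std::vector<char>& msg) {
    const std::size_t incoming = msg.size();  // what a probe would report
    std::vector<char> buf(incoming);
    model_recv(buf, msg);
    return buf;
}
```

For example, receiving a 1024-byte message into a 4096-byte buffer returns 1024, while receiving it into a 512-byte buffer throws `truncate_error`. In real MPI code, the probe-first step corresponds to calling `MPI_Probe` and then `MPI_Get_count(&status, MPI_CHAR, &count)` to learn the incoming size before posting `MPI_Recv` with a buffer of at least that many elements.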
On May 3, 2010, at 6:18 PM, Pooja Varshneya wrote:

> Hi All,
>
> I have written a program where the MPI master sends and receives large
> amounts of data, i.e., from 1 KB to 1 MB per message.
> The amount of data sent with each call is different.
>
> The program runs fine with 5 slaves, but when I run the same program
> with 9 slaves, it fails with the error
> "MPI_Recv: MPI_ERR_TRUNCATE: message truncated".
>
> I am using the Boost.MPI and Boost.Serialization libraries to send the data.
> I understand that the internal buffers on the master are being overrun in
> this case. Is there a way I can increase the buffer sizes?
>
> Here is the output:
> -bash-3.2$ mpirun -np 9 --hostfile hostfile2 --rankfile rankfile2 $BENCHMARKS_ROOT/bin/boost_binomial_LB 10 5000_steps.txt 5000_homo_bytes.txt
> Master: Starting Binomial Option Price calculations for American call option
> Master: Current stock price: 110
> Master: Strike price: 100
> Master: Risk-free rate: 1.05
> Master: Volatility (annualized): 0.15
> Master: Time (years): 1
> Master: Number of calculations: 10
>
> Slave 1:Going to Received Skeleton: 1
> Slave 1:Received Skeleton: 1
> Slave 1:Gpoing to Received Payload: 1
> Slave 1:Received Payload: 1
> Master: Sent initial message
> Master: Sent initial message
> Master: Sent initial message
> Slave 2:Going to Received Skeleton: 2
> Slave 2:Received Skeleton: 2
> Slave 2:Gpoing to Received Payload: 2
> Slave 2:Received Payload: 2
> Slave 3:Going to Received Skeleton: 3
> Slave 3:Received Skeleton: 3
> Slave 3:Gpoing to Received Payload: 3
> Slave 3:Received Payload: 3
> Slave 4:Going to Received Skeleton: 4
> Slave 4:Received Skeleton: 4
> Slave 4:Gpoing to Received Payload: 4
> Slave 1: Sent Response Skeleton: 1
> Master: Sent initial message
> Slave 4:Received Payload: 4
> Slave 5:Going to Received Skeleton: 5
> terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
> what(): MPI_Recv: MPI_ERR_TRUNCATE: message truncated
> [rh5x64-u12:26987] *** Process received signal ***
> [rh5x64-u12:26987] Signal: Aborted (6)
> [rh5x64-u12:26987] Signal code: (-6)
> [rh5x64-u12:26987] [ 0] /lib64/libpthread.so.0 [0x3ba680e7c0]
> [rh5x64-u12:26987] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3ba5c30265]
> [rh5x64-u12:26987] [ 2] /lib64/libc.so.6(abort+0x110) [0x3ba5c31d10]
> [rh5x64-u12:26987] [ 3] /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114) [0x3bb7abec44]
> [rh5x64-u12:26987] [ 4] /usr/lib64/libstdc++.so.6 [0x3bb7abcdb6]
> [rh5x64-u12:26987] [ 5] /usr/lib64/libstdc++.so.6 [0x3bb7abcde3]
> [rh5x64-u12:26987] [ 6] /usr/lib64/libstdc++.so.6 [0x3bb7abceca]
> [rh5x64-u12:26987] [ 7] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(_ZN5boost15throw_exceptionINS_3mpi9exceptionEEEvRKT_+0x172) [0x4216a2]
> [rh5x64-u12:26987] [ 8] /usr/local/lib/libboost_mpi.so.1.42.0(_ZN5boost3mpi6detail19packed_archive_recvEP19ompi_communicator_tiiRNS0_15packed_iarchiveER20ompi_status_public_t+0x16b) [0x2b0317faa6b3]
> [rh5x64-u12:26987] [ 9] /usr/local/lib/libboost_mpi.so.1.42.0(_ZNK5boost3mpi12communicator4recvINS0_15packed_iarchiveEEENS0_6statusEiiRT_+0x40) [0x2b0317f9c72a]
> [rh5x64-u12:26987] [10] /usr/local/lib/libboost_mpi.so.1.42.0(_ZNK5boost3mpi12communicator4recvINS0_24packed_skeleton_iarchiveEEENS0_6statusEiiRT_+0x38) [0x2b0317f9c76c]
> [rh5x64-u12:26987] [11] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(_ZNK5boost3mpi12communicator4recvI31Binomial_Option_Pricing_RequestEENS0_6statusEiiRKNS0_14skeleton_proxyIT_EE+0x121) [0x4258c1]
> [rh5x64-u12:26987] [12] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(main+0x409) [0x41d369]
> [rh5x64-u12:26987] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ba5c1d994]
> [rh5x64-u12:26987] [14] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(__gxx_personality_v0+0x399) [0x419e69]
> [rh5x64-u12:26987] *** End of error message ***
> [rh5x64-u11.zlab.local][[47840,1],0][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> --------------------------------------------------------------------------
> mpirun noticed that process rank 5 with PID 26987 on node 172.10.0.112 exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
> Here is the program code:
>
> #include <iostream>
> #include <cstdlib>
> #include <ctime>
> #include <algorithm>
> #include <numeric>
> #include <functional>
> #include <iomanip>
> #include <cstdlib>
> #include <cmath>
> #include <limits>
> #include <vector>
> #include <sstream>
> #include <fstream>
> #include <streambuf>
>
> #include <mpi.h>
>
> #include <boost/mpi/environment.hpp>
> #include <boost/mpi/communicator.hpp>
> #include <boost/mpi/collectives.hpp>
> #include <boost/thread/barrier.hpp>
> #include <boost/thread/mutex.hpp>
> #include <boost/config.hpp>
> #include <boost/serialization/access.hpp>
> #include <boost/serialization/string.hpp>
> #include <boost/mpi/skeleton_and_content.hpp>
> #include <boost/mpi/datatype.hpp>
> #include <boost/archive/tmpdir.hpp>
> #include <boost/serialization/utility.hpp>
> #include <boost/serialization/base_object.hpp>
> #include <boost/mpi.hpp>
> #include <boost/tokenizer.hpp>
> #include <boost/archive/tmpdir.hpp>
> #include <boost/archive/binary_oarchive.hpp>
> #include <boost/serialization/export.hpp>
> #include <boost/serialization/base_object.hpp>
> #include <boost/serialization/utility.hpp>
> #include <boost/serialization/vector.hpp>
>
> #include "ace/OS_NS_sys_time.h"
> #include "ace/OS_NS_time.h"
> #include "ace/Profile_Timer.h"
>
> using namespace MPI;
> using std::scientific;
> using namespace std;
>
> namespace mpi = boost::mpi;
>
> #define STOPTAG 0
>
> std::ofstream output_file;
>
> static void master (int & n_calls,
>                     std::string &step_file_name,
>                     std::string &byte_file_name,
>                     mpi::communicator &world);
> static void slave (mpi::communicator &world);
>
> struct Binomial_Option_Pricing_Request
> {
>   double cur_stock_price;
>   double strike_price;
>   double risk_free_rate;
>   double volatility;
>   double t;
>   int n_steps;
>   double option_price;
>   std::vector<char> payload;
> };
>
> namespace boost
> {
>   namespace serialization
>   {
>     template<class Archive>
>     void serialize (Archive &ar,
>                     struct Binomial_Option_Pricing_Request &bopr,
>                     unsigned int version)
>     {
>       ar & bopr.cur_stock_price;
>       ar & bopr.strike_price;
>       ar & bopr.risk_free_rate;
>       ar & bopr.volatility;
>       ar & bopr.t;
>       ar & bopr.n_steps;
>       ar & bopr.option_price;
>       ar & bopr.payload;
>     }
>   }
> }
>
> BOOST_IS_MPI_DATATYPE (Binomial_Option_Pricing_Request);
>
> int
> main (int argc, char **argv)
> {
>   mpi::environment env (argc, argv);
>   mpi::communicator world;
>
>   std::string step_file_name;
>   int n_calls;
>
>   read_input (argv[1], &n_calls);
>   read_input (argv[2], &step_file_name);
>   std::string byte_file_name;
>   read_input (argv[3], &byte_file_name);
>
>   if (world.rank () == 0)
>     {
>       master (n_calls, step_file_name, byte_file_name, world);
>     }
>   else
>     {
>       slave (world);
>     }
>
>   MPI_Finalize ();
>   return 0;
> }
>
> static void
> master (int & n_calls,
>         std::string &step_file_name,
>         std::string &byte_file_name,
>         mpi::communicator &world)
> {
>   int n_tasks = world.size ();
>   int rank;
>
>   const double cur_stock_price = 110.0;
>   const double strike_price = 100.0;
>   const double risk_free_rate = 1.05; // Risk-free interest rate
>   const double volatility = 0.15;     // Annualized volatility
>   const double t = 1.0;               // In years
>
>   int request_count = 0;
>   int reply_count = 0;
>   int vector_count = 0;
>
>   std::vector<Binomial_Option_Pricing_Request> requests (n_calls);
>   std::vector<Binomial_Option_Pricing_Request> replies(n_calls);
>
>   for (std::vector<Binomial_Option_Pricing_Request>::iterator
>          it = requests.begin();
>        it != requests.end(); ++it)
>     {
>       Binomial_Option_Pricing_Request& r(*it);
>       r.cur_stock_price = cur_stock_price;
>       r.strike_price = strike_price;
>       r.risk_free_rate = risk_free_rate;
>       r.volatility = volatility;
>       r.t = t;
>       r.n_steps = step_vector[vector_count];
>
>       // resize vector for sending heterogenous payload
>       r.payload.resize (byte_vector[vector_count]);
>       // Initialize payload
>       std::for_each (r.payload.begin (),
>                      r.payload.end (),
>                      Initialize_Byte_Vector ());
>       ++vector_count;
>     }
>
>   for (rank = 1; rank < n_tasks; ++rank)
>     {
>       // send Binomial_Option_Pricing_Request skeleton
>       // resize vector for sending heterogenous payload
>       world.send (rank,
>                   request_count + 1,
>                   mpi::skeleton(requests[request_count]));
>
>       // send Binomial_Option_Pricing_Request data
>       world.send (rank,
>                   request_count + 1,
>                   mpi::get_content(requests[request_count]));
>
>       std::cout << "Master: Sent initial message" << std::endl;
>       requests[request_count].payload.resize(0);
>       ++request_count;
>     }
>   while (request_count < n_calls)
>     {
>       Binomial_Option_Pricing_Request bopr_reply_data;
>
>       mpi::status msg = world.probe ();
>
>       // Receive reply skeleton
>       world.recv (msg.source (), msg.tag (), mpi::skeleton (bopr_reply_data));
>
>       std::cout << "Master: Received reply skeleton from:"<<msg.source()<<"for"<<msg.tag ()<<std::endl;
>       // Receive reply
>       world.recv (msg.source (), msg.tag (), mpi::get_content (bopr_reply_data));
>
>       std::cout << "Master: Received reply from:"<<msg.source()<<"for "<< msg.tag() << std::endl;
>       bopr_reply_data.payload.resize (0);
>       replies.push_back (bopr_reply_data);
>       ++reply_count;
>
>       world.send (msg.source (),
>                   request_count + 1,
>                   mpi::skeleton(requests[request_count]));
>
>       std::cout << "Master:Sent message skeleton to :"<<msg.source()<<std::endl;
>       // send Binomial_Option_Pricing_Request data
>       world.send (msg.source (),
>                   request_count + 1,
>                   mpi::get_content(requests[request_count]));
>
>       std::cout << "Master:Sent message to :"<<msg.source()<<std::endl;
>       //requests[request_count].payload.resize(0);
>       ++request_count;
>       // store reply
>     }
>
>   while (reply_count < n_calls)
>     {
>       std::cout <<" Master Inside final loop" <<std::endl;
>       Binomial_Option_Pricing_Request bopr_reply_data;
>
>       mpi::status msg = world.probe ();
>
>       // Receive reply skeleton
>       world.recv (msg.source (), msg.tag (), mpi::skeleton (bopr_reply_data));
>       // Receive reply
>       world.recv (msg.source (), msg.tag (), mpi::get_content (bopr_reply_data));
>       bopr_reply_data.payload.resize (0);
>       // store reply
>       replies.push_back (bopr_reply_data);
>       ++reply_count;
>     }
>
>   for (int rank = 1; rank < n_tasks; ++rank)
>     {
>       Binomial_Option_Pricing_Request bopr_stop_data;
>       world.send (rank, STOPTAG, bopr_stop_data);
>     }
> }
>
> static void
> slave (mpi::communicator &world)
> {
>   int my_rank = world.rank ();
>   int count = 0;
>
>   while (1)
>     {
>       Binomial_Option_Pricing_Request bopr_call_data;
>
>       mpi::status msg = world.probe ();
>       if (msg.tag () == STOPTAG)
>         {
>           break;
>         }
>       else
>         {
>           world.recv (0, msg.tag (), mpi::skeleton (bopr_call_data));
>           std::cout << "Slave " << world.rank () << ":Received Skeleton: "<<msg.tag() << std::endl;
>
>           world.recv (0, msg.tag (), mpi::get_content (bopr_call_data));
>           std::cout << "Slave " << world.rank () << ":Received Payload: "<<msg.tag() << std::endl;
>
>           bopr_call_data.option_price =
>             option_price_call_american_binomial (bopr_call_data.cur_stock_price,
>                                                  bopr_call_data.strike_price,
>                                                  bopr_call_data.risk_free_rate,
>                                                  bopr_call_data.volatility,
>                                                  bopr_call_data.t,
>                                                  bopr_call_data.n_steps);
>
>           world.isend (0, msg.tag (), mpi::skeleton (bopr_call_data));
>           std::cout << "Slave " << world.rank () << ": Sent Response Skeleton: "<<msg.tag() << std::endl;
>           world.isend (0, msg.tag (), mpi::get_content (bopr_call_data));
>           std::cout << "Slave " << world.rank () << ": Sent Response Payload: "<<msg.tag() << std::endl;
>           ++count;
>         }
>     }
>   std::cout << "Slave: " << my_rank << " : "
>             << "Number of requests processed: " << count << std::endl;
> }
>
> Thanks,
> Pooja

-- 
Jeff Squyres
jsquy...@cisco.com