I'm afraid I don't know enough about Boost to say.

What the specific error message means is that you have posted an MPI_Recv that 
was too small to handle an incoming message.  It is permissible in MPI to post 
a receive that is *larger* than the corresponding incoming message, but it is 
defined as an error to post a receive with a buffer that is too small.
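
For example, a small standalone sketch like the one below (not taken from your
code; the buffer sizes are made up) will trigger the same error with most MPI
implementations, because the receiver only posts room for 4 ints while the
sender ships 8:

#include <mpi.h>

int main (int argc, char **argv)
{
    MPI_Init (&argc, &argv);

    int rank;
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        int send_buf[8] = { 0 };
        MPI_Send (send_buf, 8, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        /* Receive buffer is smaller than the matching send, so the library
           reports MPI_ERR_TRUNCATE rather than silently truncating. */
        int recv_buf[4];
        MPI_Recv (recv_buf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD,
                  MPI_STATUS_IGNORE);
    }

    MPI_Finalize ();
    return 0;
}

With the default error handler (MPI_ERRORS_ARE_FATAL), rank 1 aborts with the
same MPI_ERR_TRUNCATE message you are seeing.  So somewhere a receive is being
posted (by your code, or by Boost.MPI on its behalf) with a smaller buffer than
the message that actually arrives.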


On May 3, 2010, at 6:18 PM, Pooja Varshneya wrote:

> Hi All,
> 
> I have written a program where the MPI master sends and receives large
> amounts of data, i.e. from 1 KB to 1 MB per call.
> The amount of data to be sent with each call is different.
> 
> The program runs well with 5 slaves, but when I try to run the same
> program with 9 slaves, it gives me an
> MPI_Recv: MPI_ERR_TRUNCATE: message truncated error.
> 
> I am using the Boost.MPI and Boost.Serialization libraries for sending data.
> I understand that the internal buffers on the master are overrun in
> this case. Is there a way I can increase the buffer sizes?
> 
> Here is the output:
> -bash-3.2$ mpirun -np 9 --hostfile hostfile2 --rankfile rankfile2 
> $BENCHMARKS_ROOT/bin/boost_binomial_LB 10 5000_steps.txt 
> 5000_homo_bytes.txt
> Master: Starting Binomial Option Price calculations for American call 
> option
> Master: Current stock price: 110
> Master: Strike price: 100
> Master: Risk-free rate: 1.05
> Master: Volatility (annualized): 0.15
> Master: Time (years): 1
> Master: Number of calculations: 10
> 
> Slave 1:Going to Received Skeleton: 1
> Slave 1:Received Skeleton: 1
> Slave 1:Gpoing to Received Payload: 1
> Slave 1:Received Payload: 1
> Master: Sent initial message
> Master: Sent initial message
> Master: Sent initial message
> Slave 2:Going to Received Skeleton: 2
> Slave 2:Received Skeleton: 2
> Slave 2:Gpoing to Received Payload: 2
> Slave 2:Received Payload: 2
> Slave 3:Going to Received Skeleton: 3
> Slave 3:Received Skeleton: 3
> Slave 3:Gpoing to Received Payload: 3
> Slave 3:Received Payload: 3
> Slave 4:Going to Received Skeleton: 4
> Slave 4:Received Skeleton: 4
> Slave 4:Gpoing to Received Payload: 4
> Slave 1: Sent Response Skeleton: 1
> Master: Sent initial message
> Slave 4:Received Payload: 4
> Slave 5:Going to Received Skeleton: 5
> terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
>    what():  MPI_Recv: MPI_ERR_TRUNCATE: message truncated
> [rh5x64-u12:26987] *** Process received signal ***
> [rh5x64-u12:26987] Signal: Aborted (6)
> [rh5x64-u12:26987] Signal code:  (-6)
> [rh5x64-u12:26987] [ 0] /lib64/libpthread.so.0 [0x3ba680e7c0]
> [rh5x64-u12:26987] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3ba5c30265]
> [rh5x64-u12:26987] [ 2] /lib64/libc.so.6(abort+0x110) [0x3ba5c31d10]
> [rh5x64-u12:26987] [ 3] /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114) [0x3bb7abec44]
> [rh5x64-u12:26987] [ 4] /usr/lib64/libstdc++.so.6 [0x3bb7abcdb6]
> [rh5x64-u12:26987] [ 5] /usr/lib64/libstdc++.so.6 [0x3bb7abcde3]
> [rh5x64-u12:26987] [ 6] /usr/lib64/libstdc++.so.6 [0x3bb7abceca]
> [rh5x64-u12:26987] [ 7] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(_ZN5boost15throw_exceptionINS_3mpi9exceptionEEEvRKT_+0x172) [0x4216a2]
> [rh5x64-u12:26987] [ 8] /usr/local/lib/libboost_mpi.so.1.42.0(_ZN5boost3mpi6detail19packed_archive_recvEP19ompi_communicator_tiiRNS0_15packed_iarchiveER20ompi_status_public_t+0x16b) [0x2b0317faa6b3]
> [rh5x64-u12:26987] [ 9] /usr/local/lib/libboost_mpi.so.1.42.0(_ZNK5boost3mpi12communicator4recvINS0_15packed_iarchiveEEENS0_6statusEiiRT_+0x40) [0x2b0317f9c72a]
> [rh5x64-u12:26987] [10] /usr/local/lib/libboost_mpi.so.1.42.0(_ZNK5boost3mpi12communicator4recvINS0_24packed_skeleton_iarchiveEEENS0_6statusEiiRT_+0x38) [0x2b0317f9c76c]
> [rh5x64-u12:26987] [11] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(_ZNK5boost3mpi12communicator4recvI31Binomial_Option_Pricing_RequestEENS0_6statusEiiRKNS0_14skeleton_proxyIT_EE+0x121) [0x4258c1]
> [rh5x64-u12:26987] [12] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(main+0x409) [0x41d369]
> [rh5x64-u12:26987] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3ba5c1d994]
> [rh5x64-u12:26987] [14] /userdata/testing/benchmark_binaries/bin/boost_binomial_LB(__gxx_personality_v0+0x399) [0x419e69]
> [rh5x64-u12:26987] *** End of error message ***
> [rh5x64-u11.zlab.local][[47840,1],0][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> --------------------------------------------------------------------------
> mpirun noticed that process rank 5 with PID 26987 on node 172.10.0.112 
> exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
> 
> Here is the program code:
> 
> #include <iostream>
> #include <cstdlib>
> #include <ctime>
> #include <algorithm>
> #include <numeric>
> #include <functional>
> #include <iomanip>
> #include <cstdlib>
> #include <cmath>
> #include <limits>
> #include <vector>
> #include <sstream>
> #include <fstream>
> #include <streambuf>
> 
> #include <mpi.h>
> 
> #include <boost/mpi/environment.hpp>
> #include <boost/mpi/communicator.hpp>
> #include <boost/mpi/collectives.hpp>
> #include <boost/thread/barrier.hpp>
> #include <boost/thread/mutex.hpp>
> #include <boost/config.hpp>
> #include <boost/serialization/access.hpp>
> #include <boost/serialization/string.hpp>
> #include <boost/mpi/skeleton_and_content.hpp>
> #include <boost/mpi/datatype.hpp>
> #include <boost/archive/tmpdir.hpp>
> #include <boost/serialization/utility.hpp>
> #include <boost/serialization/base_object.hpp>
> #include <boost/mpi.hpp>
> #include <boost/tokenizer.hpp>
> #include <boost/archive/tmpdir.hpp>
> #include <boost/archive/binary_oarchive.hpp>
> #include <boost/serialization/export.hpp>
> #include <boost/serialization/base_object.hpp>
> #include <boost/serialization/utility.hpp>
> #include <boost/serialization/vector.hpp>
> 
> #include "ace/OS_NS_sys_time.h"
> #include "ace/OS_NS_time.h"
> #include "ace/Profile_Timer.h"
> 
> using namespace MPI;
> using std::scientific;
> using namespace std;
> 
> namespace mpi = boost::mpi;
> 
> #define STOPTAG 0
> 
> std::ofstream output_file;
> 
> static void master (int &n_calls,
>                     std::string &step_file_name,
>                     std::string &byte_file_name,
>                     mpi::communicator &world);
> static void slave (mpi::communicator &world);
> 
> struct Binomial_Option_Pricing_Request
> {
>         double cur_stock_price;
>         double strike_price;
>         double risk_free_rate;
>         double volatility;
>         double t;
>         int n_steps;
>         double option_price;
>         std::vector<char> payload;
> };
> 
> namespace boost
> {
>         namespace serialization
>         {
>                 template<class Archive>
>                 void serialize (Archive &ar,
>                                 struct Binomial_Option_Pricing_Request &bopr,
>                                 unsigned int version)
>                 {
>                         ar & bopr.cur_stock_price;
>                         ar & bopr.strike_price;
>                         ar & bopr.risk_free_rate;
>                         ar & bopr.volatility;
>                         ar & bopr.t;
>                         ar & bopr.n_steps;
>                         ar & bopr.option_price;
>                         ar & bopr.payload;
>                 }
>         }
> }
> 
> BOOST_IS_MPI_DATATYPE (Binomial_Option_Pricing_Request);
> 
> int
> main (int argc, char **argv)
> {
>         mpi::environment env (argc, argv);
>         mpi::communicator world;
> 
>         std::string step_file_name;
>         int n_calls;
> 
>         read_input (argv[1], &n_calls);
>         read_input (argv[2], &step_file_name);
>         std::string byte_file_name;
>         read_input (argv[3], &byte_file_name);
> 
>         if (world.rank () == 0)
>         {
>                 master (n_calls, step_file_name, byte_file_name, world);
>         }
>         else
>         {
>                 slave (world);
>         }
> 
>         MPI_Finalize ();
>         return 0;
> }
> 
> static void
> master (int &n_calls,
>         std::string &step_file_name,
>         std::string &byte_file_name,
>         mpi::communicator &world)
> {
>         int n_tasks = world.size ();
>         int rank;
> 
>         const double cur_stock_price = 110.0;
>         const double strike_price = 100.0;
>         const double risk_free_rate = 1.05; // Risk-free interest rate
>         const double volatility = 0.15; // Annualized volatility
>         const double t = 1.0; // In years
> 
> 
>         int request_count = 0;
>         int reply_count = 0;
>         int vector_count = 0;
> 
>         std::vector<Binomial_Option_Pricing_Request> requests (n_calls);
>         std::vector<Binomial_Option_Pricing_Request> replies(n_calls);
> 
>         for (std::vector<Binomial_Option_Pricing_Request>::iterator
>                          it = requests.begin();
>                          it != requests.end(); ++it)
>         {
>                 Binomial_Option_Pricing_Request& r(*it);
>                 r.cur_stock_price = cur_stock_price;
>                 r.strike_price = strike_price;
>                 r.risk_free_rate = risk_free_rate;
>                 r.volatility = volatility;
>                 r.t = t;
>                 r.n_steps = step_vector[vector_count];
> 
>                 // resize vector for sending heterogeneous payload
>                 r.payload.resize (byte_vector[vector_count]);
>                 // Initialize payload
>                 std::for_each (r.payload.begin (), r.payload.end (),
>                                Initialize_Byte_Vector ());
>                 ++vector_count;
>         }
> 
>         for (rank = 1; rank < n_tasks; ++rank)
>         {
>                 // send Binomial_Option_Pricing_Request skeleton
>                 world.send (rank, request_count + 1,
>                             mpi::skeleton (requests[request_count]));
> 
>                 // send Binomial_Option_Pricing_Request data
>                 world.send (rank, request_count + 1,
>                             mpi::get_content (requests[request_count]));
> 
>                 std::cout << "Master: Sent initial message" << std::endl;
>                 requests[request_count].payload.resize (0);
>                 ++request_count;
>         }
>         while (request_count < n_calls)
>         {
>                 Binomial_Option_Pricing_Request bopr_reply_data;
> 
>                 mpi::status msg = world.probe ();
> 
>                 // Receive reply skeleton
>                 world.recv (msg.source (), msg.tag (), mpi::skeleton (bopr_reply_data));
>                 std::cout << "Master: Received reply skeleton from:" << msg.source () << "for" << msg.tag () << std::endl;
> 
>                 // Receive reply
>                 world.recv (msg.source (), msg.tag (), mpi::get_content (bopr_reply_data));
>                 std::cout << "Master: Received reply from:" << msg.source () << "for " << msg.tag () << std::endl;
> 
>                 // store reply
>                 bopr_reply_data.payload.resize (0);
>                 replies.push_back (bopr_reply_data);
>                 ++reply_count;
> 
>                 world.send (msg.source (), request_count + 1,
>                             mpi::skeleton (requests[request_count]));
>                 std::cout << "Master:Sent message skeleton to :" << msg.source () << std::endl;
> 
>                 // send Binomial_Option_Pricing_Request data
>                 world.send (msg.source (), request_count + 1,
>                             mpi::get_content (requests[request_count]));
>                 std::cout << "Master:Sent message to :" << msg.source () << std::endl;
> 
>                 //requests[request_count].payload.resize(0);
>                 ++request_count;
>         }
> 
>         while (reply_count < n_calls)
>         {
>                 std::cout << " Master Inside final loop" << std::endl;
>                 Binomial_Option_Pricing_Request bopr_reply_data;
> 
>                 mpi::status msg = world.probe ();
> 
>                 // Receive reply skeleton
>                 world.recv (msg.source (), msg.tag (), mpi::skeleton (bopr_reply_data));
>                 // Receive reply
>                 world.recv (msg.source (), msg.tag (), mpi::get_content (bopr_reply_data));
>                 bopr_reply_data.payload.resize (0);
>                 // store reply
>                 replies.push_back (bopr_reply_data);
>                 ++reply_count;
>         }
>        
>         for (int rank = 1; rank < n_tasks; ++rank)
>         {
>                 Binomial_Option_Pricing_Request bopr_stop_data;
>                 world.send (rank, STOPTAG, bopr_stop_data);
>         }
> }
> 
> static void
> slave (mpi::communicator &world)
> {
>         int my_rank = world.rank ();
>         int count = 0;
> 
>         while (1)
>         {
>                 Binomial_Option_Pricing_Request bopr_call_data;
> 
>                 mpi::status msg = world.probe ();
>                 if (msg.tag () == STOPTAG)
>                 {
>                         break;
>                 }
>                 else
>                 {
>                         world.recv (0, msg.tag (), mpi::skeleton (bopr_call_data));
>                         std::cout << "Slave " << world.rank () << ":Received Skeleton: " << msg.tag () << std::endl;
> 
>                         world.recv (0, msg.tag (), mpi::get_content (bopr_call_data));
>                         std::cout << "Slave " << world.rank () << ":Received Payload: " << msg.tag () << std::endl;
> 
>                         bopr_call_data.option_price =
>                                 option_price_call_american_binomial (bopr_call_data.cur_stock_price,
>                                                                      bopr_call_data.strike_price,
>                                                                      bopr_call_data.risk_free_rate,
>                                                                      bopr_call_data.volatility,
>                                                                      bopr_call_data.t,
>                                                                      bopr_call_data.n_steps);
> 
>                         world.isend (0, msg.tag (), mpi::skeleton (bopr_call_data));
>                         std::cout << "Slave " << world.rank () << ": Sent Response Skeleton: " << msg.tag () << std::endl;
>                         world.isend (0, msg.tag (), mpi::get_content (bopr_call_data));
>                         std::cout << "Slave " << world.rank () << ": Sent Response Payload: " << msg.tag () << std::endl;
>                         ++count;
>                 }
>         }
>         std::cout << "Slave: " << my_rank << " : "
>                   << "Number of requests processed: " << count << std::endl;
> }
> 
> Thanks,
> Pooja
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

