On Wed, Jun 1, 2011 at 4:41 PM, Steven McCoy <[email protected]> wrote:
> On 2 June 2011 04:17, Ladan Gharai <[email protected]> wrote:
>
>> Hi:
>>
>> I am trying to use PUB/SUB with epgm, and what I am observing is that
>> sometimes one of the receivers stops receiving data.
>>
>> With Iperf and UDP, the boxes I am using can sustain 500Mbps with no
>> loss (or very little), but the epgm receiver is clunking out at 100Mbps
>> or even lower data rates.
>>
> The default rate limit is now 40Mbps; you can turn it off or raise it as
> required.

Yes, I have ZMQ_RATE set to 500Mbps.

>> I am using RHEL5 and zeromq-2.1.7.
>>
>> I've turned on the openpgm trace/debug messages. Afaict, once the epgm
>> receiver sustains "a lot" of packet loss it's just not able to start
>> over again.
>>
> Every time the receiver sees packet loss it closes the socket and
> schedules a new socket to be created to reconnect to the PGM stream.

I am not sure I understand this. Do you mean the zmq socket gets a new zmq
socket if the ePGM receiver experiences unrecoverable loss? (I don't see any
new socket opening; I just see the zmq recv not receiving anymore.)

>> My questions are:
>>
>> 1. Is there a way to reset the receiver once this happens?
>>
> Reconnects occur with the same engine as TCP reconnects.
>
>> 2. Has anyone experimented with changing the size of the rxw (it
>>    currently uses 33333) and the various timers NAK_RB_IVL, NAK_RPT_IVL
>>    and NAK_RDATA_IVL (something akin to TCP tuning)?
>>
> If you find PGM is non-productive you should investigate tightening the
> recovery settings so failure is raised sooner rather than later. The
> default settings are friendly towards 10Mb networks, so running at high
> speed on 1Gb networks may pose a problem with high data loss.
>
> For example, drop the retry count for DATA & NCF from the default 50 to 2.
>
> ~line 211 in pgm_socket.cpp:
>
>     nak_data_retries = 2,
>     nak_ncf_retries = 2;

Yes, this seems the most sensible approach, except now it crashes with a
segmentation fault once it falls into a long series of packet losses.

>> 3. Also, occasionally I see the following assertion fail some time
>>    after everything has gone to zero:
>>
>>    Assertion failed: pending_bytes == 0 (pgm_receiver.cpp:142)
>>
> Also raised a couple of weeks ago and there is a case in Github. Requires
> additional debugging to find the cause. The first step is to add an
> assertion to ensure "pending_bytes" is always positive.
>
> ~line 226 in pgm_receiver.cpp:
>
>     //  Push all the data to the decoder.
>     ssize_t processed = it->second.decoder->process_buffer (data, received);
>     assert (processed >= 0);    //  added
>     if (processed < received) {
>         //  Save some state so we can resume the decoding process later.
>         pending_bytes = received - processed;
>
> --
> Steve-o
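
The bookkeeping behind the pending_bytes assertion discussed above can be
tried in isolation. The following is only a sketch under stated assumptions:
mock_decoder_t and push_to_decoder are hypothetical stand-ins invented for
illustration, not the real zeromq classes, and std::ptrdiff_t is used in
place of ssize_t to keep the example portable. It shows why a negative
return from the decoder would corrupt the pending count, which is what the
suggested assert is meant to catch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

//  Hypothetical stand-in for the decoder: consumes at most `capacity`
//  bytes per call and returns the number of bytes it consumed.
struct mock_decoder_t
{
    std::ptrdiff_t capacity;

    std::ptrdiff_t process_buffer (const unsigned char *data,
                                   std::ptrdiff_t size)
    {
        (void) data;    //  Contents are irrelevant for this sketch.
        return std::min (size, capacity);
    }
};

//  Hypothetical helper mirroring the quoted fragment: pushes `received`
//  bytes to the decoder and returns how many bytes remain pending.
std::ptrdiff_t push_to_decoder (mock_decoder_t &decoder,
                                const unsigned char *data,
                                std::ptrdiff_t received)
{
    std::ptrdiff_t processed = decoder.process_buffer (data, received);

    //  Guard suggested in the thread: a negative result would make
    //  `received - processed` larger than what was actually received.
    assert (processed >= 0);

    //  Extra sanity check (an assumption, not from the thread): the
    //  decoder must not claim to consume more than it was given.
    assert (processed <= received);

    //  Zero when everything was consumed, positive otherwise.
    return received - processed;
}
```

With a 100-byte packet and a mock decoder capacity of 64, the helper returns
36 pending bytes; with a capacity of 256 it returns 0. Whether the real
decoder can ever return a negative value is exactly what the added assert
would reveal.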
_______________________________________________
zeromq-dev mailing list
[email protected]
http://lists.zeromq.org/mailman/listinfo/zeromq-dev
