Re: [zeromq-dev] PUB/SUB on an epgm socket stops receiving eventually …

Ladan Gharai Fri, 03 Jun 2011 12:49:53 -0700

On Wed, Jun 1, 2011 at 4:41 PM, Steven McCoy <[email protected]> wrote:


> On 2 June 2011 04:17, Ladan Gharai <[email protected]> wrote:
>
>> Hi:
>>
>>
>>
>> I am trying to use PUB/SUB with epgm – and what I am observing is that
>> sometimes one of the receivers stops receiving data.
>>
>>
>>
>> With Iperf and UDP,  the boxes I am using can sustain 500Mbps with no
>> loss (or very little) – but the epgm receiver is clunking out at 100Mbps or
>> even lower data rates
>>
>>
> The default rate limit is now 40 Mbps; you can turn it off or raise it as
> required.
>

    Yes, I have ZMQ_RATE set to 500Mbps
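
For reference, a minimal sketch of the unit conversion involved. It assumes that ZMQ_RATE in the 2.1.x series takes an int64_t expressed in kilobits per second, which is worth checking against the zmq_setsockopt man page for the installed version. The helper name mbps_to_zmq_rate is made up for illustration and is not part of zeromq.

```cpp
#include <cassert>
#include <cstdint>

//  Hypothetical helper: convert megabits per second to the kilobits per
//  second unit that ZMQ_RATE is assumed to use.
inline std::int64_t mbps_to_zmq_rate (std::int64_t mbps)
{
    assert (mbps >= 0);
    return mbps * 1000;
}

//  Intended use (not compiled here; needs <zmq.h> and an open socket `sub`):
//      int64_t rate = mbps_to_zmq_rate (500);
//      zmq_setsockopt (sub, ZMQ_RATE, &rate, sizeof (rate));
```

Under that assumption, 500 Mbps corresponds to an option value of 500000, and the option has to be set before the socket is connected.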

>
>
>>
>>
>> I am using RHEL5 and zeromq-2.1.7
>>
>>
>>
>> I’ve turned on the openpgm trace/debug messages – afaict once the epgm
>> receiver sustains “a lot” of packet loss it's just not able to start over
>> again
>>
>>
> Every time the receiver sees packet loss it closes the socket and schedules
> a new socket to be created to reconnect to the PGM stream.
>

   I am not sure I understand this - do you mean the zmq socket gets a new
zmq socket if the ePGM receiver experiences unrecoverable loss?  (I don't see
any new socket opening; I just see the zmq recv not receiving anymore.)

>
>
>>
>>
>> My questions are:
>>
>>    1.   Is there a way to reset the receiver once this happens?
>>
>>
> Reconnects occur with the same engine as TCP reconnects.
>
>
>>
>>    2. Has anyone experimented with changing the size of the rxw (it
>>    currently uses 33333) – and the various timers NAK_RB_IVL, NAK_RPT_IVL and
>>    NAK_RDATA_IVL  (something akin to TCP tuning?)
>>
>>
> If you find PGM is non-productive you should investigate tightening the
> recovery settings so failure is raised sooner rather than later.  The
> default settings are friendly towards 10 Mbps networks, so running at high
> speed on 1 Gbps networks may pose a problem with high data loss.
>
> For example, drop the retry count for DATA & NCF from the default 50 to 2.
>
> ~line 211 in pgm_socket.cpp:
>                   nak_data_retries = 2,
>                   nak_ncf_retries = 2;
>

    Yes - this seems the most sensible approach, except now it crashes -
Segmentation fault - once it falls into a long series of packet losses.
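
To put rough numbers on the retry suggestion quoted above, here is a back-of-the-envelope sketch. The window of 33333 sequence numbers, the 500 Mbps rate and the retry counts of 50 and 2 come from this thread; the 200 ms NAK interval and the 1500-byte TPDU are assumptions for illustration only. Both function names are hypothetical.

```cpp
#include <cassert>

//  Seconds of traffic that a receive window of `sqns` packets can hold
//  at a given line rate.
inline double window_seconds (long sqns, long tpdu_bytes, double rate_bps)
{
    assert (rate_bps > 0);
    return sqns * tpdu_bytes * 8.0 / rate_bps;
}

//  Rough upper bound, in milliseconds, on the time spent retrying in one
//  NAK state before loss is declared: retries times the retry interval.
inline long retry_budget_ms (int retries, long interval_ms)
{
    return retries * interval_ms;
}
```

Under these assumptions, window_seconds (33333, 1500, 500e6) is about 0.8 s, while retry_budget_ms (50, 200) is 10000 ms and retry_budget_ms (2, 200) is 400 ms. That suggests that with 50 retries the window can be overrun well before recovery gives up, whereas 2 retries fail within the span the window covers.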

>
>>    3. Also, occasionally I see the following assertion failure some time
>>    after everything has gone to zero:
>>
>>       Assertion failed: pending_bytes == 0 (pgm_receiver.cpp:142)
>>
>
> This was also raised a couple of weeks ago and there is a case on GitHub.
> It requires additional debugging to find the cause.  The first step is to
> add an assertion to ensure "*pending_bytes*" is always positive.
>
> ~line 226 in pgm_receiver.cpp:
>
>         //  Push all the data to the decoder.
>         ssize_t processed = it->second.decoder->process_buffer (data, received);
>         *assert (processed >= 0);*
>         if (processed < received) {
>             //  Save some state so we can resume the decoding process later.
>             pending_bytes = received - processed;
>
>
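
The bookkeeping in the excerpt above can be sketched in isolation as follows. This is a toy model, not the zeromq code: toy_decoder_t, toy_receiver_t and their members are hypothetical names, and the decoder simply consumes a fixed number of bytes per call. It shows why checking the processed count before the subtraction matters: a negative or oversized count would leave pending_bytes with a bogus value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

//  Hypothetical stand-in for the real decoder: it accepts at most
//  `capacity` bytes per call and reports how many it consumed.
struct toy_decoder_t
{
    std::size_t capacity;

    std::ptrdiff_t process_buffer (const unsigned char *data, std::size_t size)
    {
        (void) data;
        return static_cast<std::ptrdiff_t> (std::min (size, capacity));
    }
};

//  Hypothetical receiver that keeps "pending" state between calls.
struct toy_receiver_t
{
    toy_decoder_t decoder;
    const unsigned char *pending_ptr;
    std::size_t pending_bytes;

    explicit toy_receiver_t (std::size_t capacity) :
        pending_ptr (NULL), pending_bytes (0)
    {
        decoder.capacity = capacity;
    }

    //  New data may only arrive once earlier leftovers are drained.
    void in_event (const unsigned char *data, std::size_t received)
    {
        assert (pending_bytes == 0);
        push (data, received);
    }

    //  Retry the bytes saved by an earlier partial push.
    void resume ()
    {
        const unsigned char *data = pending_ptr;
        std::size_t size = pending_bytes;
        pending_ptr = NULL;
        pending_bytes = 0;
        push (data, size);
    }

private:
    void push (const unsigned char *data, std::size_t received)
    {
        std::ptrdiff_t processed = decoder.process_buffer (data, received);
        //  Check the count before subtracting so the remainder cannot wrap.
        assert (processed >= 0);
        assert (static_cast<std::size_t> (processed) <= received);
        if (static_cast<std::size_t> (processed) < received) {
            //  Save some state so decoding can resume later.
            pending_ptr = data + processed;
            pending_bytes = received - static_cast<std::size_t> (processed);
        }
    }
};
```

With a capacity of 4, pushing 10 bytes through in_event leaves 6 pending; two calls to resume then bring the count to 2 and finally 0, after which in_event may be called again.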
> --
> Steve-o
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>

