[ 
https://issues.apache.org/jira/browse/S4-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198935#comment-13198935
 ] 

Karthik Kambatla commented on S4-7:
-----------------------------------

Matthieu, thanks a lot for taking a close look. I saw the test not running at 
some point, but it ran for me the last few times I checked. I now see that the 
problem surfaces more often if I increase the number of partitions and/or the 
number of messages. I'll look into the issue and fix it as soon as possible. 

Just as you suggested, I had initially used a separate queue for in-transit 
messages. I can go back to it, but I suspect we would need synchronization 
across thread pools to keep multiple threads from processing the same queue. It 
might make sense to have a single thread send out all the messages. Then we 
don't need any synchronization at all; however, if the connection to one 
partition is lost, messages bound for other partitions will be stalled as well. 
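
To make the single-sender option concrete, here is a minimal sketch, not the 
code in the patch. SingleThreadSender and the transport callback are 
hypothetical names; the callback stands in for whatever blocking write the 
emitter would perform. Because only one thread ever calls the transport, 
messages go out in submission order without any locking, but a write that 
blocks on one partition also delays everything queued behind it. 

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Illustrative only: a single thread performs every outgoing write. */
class SingleThreadSender {
    // One worker thread, so submitted tasks run strictly one after another.
    private final ExecutorService sender = Executors.newSingleThreadExecutor();
    // Hypothetical transport: (partition, payload) -> possibly blocking write.
    private final BiConsumer<Integer, byte[]> transport;

    SingleThreadSender(BiConsumer<Integer, byte[]> transport) {
        this.transport = transport;
    }

    /** Enqueues the message and returns immediately; the worker sends it later. */
    void send(int partition, byte[] message) {
        // If this write blocks (e.g. the partition is unreachable), all later
        // messages wait behind it, including those for healthy partitions.
        sender.execute(() -> transport.accept(partition, message));
    }

    /** Stops accepting messages and waits up to five seconds for pending sends. */
    boolean shutdownAndWait() {
        sender.shutdown();
        try {
            return sender.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A per-partition queue avoids the stall, at the cost of the synchronization 
mentioned above. 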
                
> Netty to tolerate network glitches and connection loss
> ------------------------------------------------------
>
>                 Key: S4-7
>                 URL: https://issues.apache.org/jira/browse/S4-7
>             Project: Apache S4
>          Issue Type: Bug
>            Reporter: Leo Neumeyer
>            Assignee: Karthik Kambatla
>             Fix For: 0.5
>
>         Attachments: S4-7-Robust-TCPEmitter-asynchronous-ordered.patch, 
> s4-7.patch, s4-7.patch
>
>
> NettyEmitter connects to different partitions and creates channels over which 
> it communicates to other listeners.
> It suffers from the following issues -- 
> 1. If the underlying topology changes, the channels and the associated 
> connections are not updated.
> 2. If a connection gets disconnected, it stays disconnected.
> 3. If for any reason, a connection can't be made, send() drops the message to 
> be sent.
> The solution is to - 
> 1. Maintain a bounded messageQueue for each destination partition - if a 
> connection does not exist, the message should be queued.
> 2. Maintain a map of the channel used for each destination partition - update 
> this map on changes to topology, or on send() in case of disconnections.
> 3. Every time a (re-)connection is made, send the queued messages first.
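
The three steps of the proposed solution can be sketched roughly as follows. 
This is an illustration under stated assumptions, not the attached patch: 
Link is a hypothetical stand-in for a network channel (it is not the Netty 
Channel API), QueueingEmitter and its method names are invented for the 
example, and a coarse lock on the emitter is used only to keep the sketch 
short. 

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a network channel; not the Netty Channel API. */
interface Link {
    boolean isConnected();
    void write(byte[] message);
}

/** Illustrative emitter: bounded queue and channel map per destination partition. */
class QueueingEmitter {
    private final int capacity;
    // Step 1: one bounded queue of unsent messages per partition.
    private final Map<Integer, BlockingQueue<byte[]>> pending = new ConcurrentHashMap<>();
    // Step 2: the link currently used for each partition.
    private final Map<Integer, Link> links = new ConcurrentHashMap<>();

    QueueingEmitter(int capacity) {
        this.capacity = capacity;
    }

    private BlockingQueue<byte[]> queueFor(int partition) {
        return pending.computeIfAbsent(partition, p -> new ArrayBlockingQueue<>(capacity));
    }

    /** Sends or queues the message; returns false only if the queue is full. */
    synchronized boolean send(int partition, byte[] message) {
        Link link = links.get(partition);
        if (link != null && link.isConnected()) {
            flush(partition, link);   // older messages go out first
            link.write(message);
            return true;
        }
        // No usable connection: queue instead of dropping, up to the bound.
        return queueFor(partition).offer(message);
    }

    /** Step 3: call on (re-)connection or topology change. */
    synchronized void onConnected(int partition, Link link) {
        links.put(partition, link);
        flush(partition, link);
    }

    private void flush(int partition, Link link) {
        BlockingQueue<byte[]> queue = queueFor(partition);
        byte[] queued;
        while ((queued = queue.poll()) != null) {
            link.write(queued);
        }
    }
}
```

With a capacity of 2 and no connection, the first two send() calls queue their 
messages and the third returns false; once onConnected() is called, the queued 
messages are written in order before any new one. 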
