[ https://issues.apache.org/jira/browse/S4-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228240#comment-13228240 ]
Karthik Kambatla commented on S4-7:
-----------------------------------
I believe I have fixed the second issue as well. At least the last 20 runs have
not exposed any more bugs. The updated source code is at
https://github.com/kambatla/s4/tree/S4-7-FIX.
More details about the bug with NetworkGlitchTest:
Calling Channel.close() triggers a series of related operations, including
failing any in-flight message transfers. Each failed transfer invokes its
operationComplete() handler. As I mentioned earlier, that handler also closes
the channel so that subsequent sends can re-establish the connection. The two
overlapping calls to close() lead to a deadlock.
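To illustrate the re-entrant close described above, here is a minimal sketch of
one generic way to make a close path idempotent. It is not taken from the
S4-7-FIX branch and may differ from the actual fix. GuardedChannel,
onWriteFailed() and teardownCount are hypothetical names used only for this
example; the class stands in for a channel and does not use Netty's API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a network channel; not Netty's Channel API. */
class GuardedChannel {
    // Flips to true the first time close() is entered.
    private final AtomicBoolean closing = new AtomicBoolean(false);
    // Counts how many times the real teardown ran (for illustration only).
    final AtomicInteger teardownCount = new AtomicInteger();

    /** Closes the channel once; later or re-entrant calls return immediately. */
    void close() {
        // compareAndSet lets exactly one caller proceed to the teardown.
        if (!closing.compareAndSet(false, true)) {
            return;
        }
        teardownCount.incrementAndGet();
        // Teardown fails the in-flight write, which notifies its handler.
        onWriteFailed();
    }

    /** Plays the role of operationComplete() for a failed send. */
    void onWriteFailed() {
        // Closing again here is a no-op rather than a second teardown.
        close();
    }
}
```

With this guard, the close() issued from the failure handler returns at once
instead of competing with the close() that is already in progress.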
> Netty to tolerate network glitches and connection loss
> ------------------------------------------------------
>
> Key: S4-7
> URL: https://issues.apache.org/jira/browse/S4-7
> Project: Apache S4
> Issue Type: Bug
> Reporter: Leo Neumeyer
> Assignee: Karthik Kambatla
> Fix For: 0.5
>
> Attachments: S4-7.patch
>
>
> NettyEmitter connects to different partitions and creates channels over which
> it communicates with other listeners.
> It suffers from the following issues:
> 1. If the underlying topology changes, the channels and the associated
> connections are not updated.
> 2. If a connection gets disconnected, it stays disconnected.
> 3. If for any reason, a connection can't be made, send() drops the message to
> be sent.
> The proposed solution is to:
> 1. Maintain a bounded messageQueue for each destination partition - if a
> connection does not exist, the message should be queued.
> 2. Maintain a map of the channel used for each destination partition - update
> this map on changes to topology, or on send() in case of disconnections.
> 3. Every time a (re-)connection is made, send the queued messages first.
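The three-part proposal quoted above could be sketched roughly as follows. This
is an illustration under stated assumptions, not the code in S4-7.patch: Link,
QueueingEmitter, RecordingLink and onConnected() are hypothetical names, Link
stands in for a Netty channel, messages are plain strings, and synchronization
between send() and onConnected() is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical connection abstraction; stands in for a Netty channel. */
interface Link {
    boolean isOpen();
    void write(String message);
}

/** Sketch of an emitter that queues messages while a partition is unreachable. */
class QueueingEmitter {
    private final int capacity; // assumed per-partition queue bound
    private final Map<Integer, BlockingQueue<String>> queues = new ConcurrentHashMap<>();
    private final Map<Integer, Link> links = new ConcurrentHashMap<>();

    QueueingEmitter(int capacity) {
        this.capacity = capacity;
    }

    /** Writes immediately if connected, otherwise queues; false means the queue was full. */
    boolean send(int partition, String message) {
        Link link = links.get(partition);
        if (link != null && link.isOpen()) {
            link.write(message);
            return true;
        }
        // Forget a dead link so that a later (re-)connection replaces it.
        links.remove(partition);
        return queueFor(partition).offer(message);
    }

    /** Registers a (re-)connection and sends the queued messages first, in order. */
    void onConnected(int partition, Link link) {
        BlockingQueue<String> queue = queueFor(partition);
        String pending;
        while ((pending = queue.poll()) != null) {
            link.write(pending);
        }
        links.put(partition, link);
    }

    private BlockingQueue<String> queueFor(int partition) {
        return queues.computeIfAbsent(partition, p -> new ArrayBlockingQueue<>(capacity));
    }
}

/** Test double that records what was written. */
class RecordingLink implements Link {
    final List<String> written = new ArrayList<>();
    boolean open = true;

    public boolean isOpen() {
        return open;
    }

    public void write(String message) {
        written.add(message);
    }
}
```

In this sketch a full queue rejects the newest message. Whether to drop the
newest or the oldest message when the bound is reached is a policy choice that
the issue description leaves open.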