[jira] [Updated] (FLINK-4021) Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection
[ https://issues.apache.org/jira/browse/FLINK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-4021:
-------------------------------
    Component/s:     (was: Distributed Coordination)
                 Network

> Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-4021
>                 URL: https://issues.apache.org/jira/browse/FLINK-4021
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.0.2
>            Reporter: Zhijiang Wang
>            Assignee: Zhijiang Wang
>
> Several tasks share the same TCP connection for shuffling data.
> If a downstream task, say "A", has no available memory segment into which
> to read a Netty buffer from the network, it sets autoread to false for the
> channel. When task A fails or has available segments again, the Netty
> handler is notified to process the staged buffers first and then reset
> autoread to true. In some scenarios, however, autoread is never set back
> to true.
> This happens while the staged buffers are being processed: for each
> buffer, the handler first looks up the corresponding input channel. If the
> task owning that input channel has failed, the decodeMsg method in
> PartitionRequestClientHandler returns false, and autoread is then never
> reset to true.
> In summary: task "A" sets autoread to false because it has no available
> segments, which leaves some buffers staged. If another task "B", to which
> one of those staged buffers belongs, fails unexpectedly, then when task A
> tries to reset autoread to true, the process cannot complete because task B
> has failed.
> I have fixed this problem in our application by adding a boolean parameter
> to the decodeBufferOrEvent method that distinguishes whether it is invoked
> from the Netty IO thread's channel read or from the staged message handler
> task in PartitionRequestClientHandler.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
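The failure sequence in the report can be reproduced with a small, self-contained model. This is a sketch under stated assumptions and is not Flink's code. `HandlerModel`, `applyFix`, `noSegments`, `runScenario` and the simplified `channelRead`/`processStagedMessages` methods are hypothetical stand-ins. Only the name `decodeBufferOrEvent` and the idea of an extra boolean parameter come from the report. The model also assumes one specific behaviour for the fixed staged-message path: a buffer for a failed task is dropped and draining continues. The report does not spell out what the actual patch does with that buffer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class AutoReadSketch {

    /** Hypothetical, heavily simplified model of the client-side handler (not Flink's API). */
    static final class HandlerModel {
        final boolean applyFix;                            // false: behaviour in the report; true: assumed fix
        boolean autoRead = true;                           // stands in for the Netty channel's autoread flag
        int delivered = 0;                                 // buffers successfully handed to a task
        final Deque<String> staged = new ArrayDeque<>();   // staged buffers, identified by receiving task
        final Set<String> failed = new HashSet<>();        // tasks that have failed
        final Set<String> noSegments = new HashSet<>();    // tasks with no free memory segment

        HandlerModel(boolean applyFix) {
            this.applyFix = applyFix;
        }

        // Analogue of decodeBufferOrEvent with the extra boolean from the report.
        // Returns true if the buffer was consumed (delivered or dropped),
        // false if the caller has to stop and wait.
        boolean decodeBufferOrEvent(String receiver, boolean isStagedMessage) {
            if (failed.contains(receiver)) {
                // The buffer for a failed task is discarded. Unpatched, this still
                // reports false, which stalls the staged-message loop below.
                return applyFix && isStagedMessage;
            }
            if (noSegments.contains(receiver)) {
                return false;                              // must stay staged until a segment frees up
            }
            delivered++;
            return true;
        }

        // Models the Netty IO thread: stage behind existing staged buffers to keep ordering.
        void channelRead(String receiver) {
            if (!staged.isEmpty() || !decodeBufferOrEvent(receiver, false)) {
                staged.add(receiver);
                autoRead = false;
            }
        }

        // Models the staged-message handler task: drain staged buffers, then re-enable autoread.
        void processStagedMessages() {
            while (!staged.isEmpty()) {
                if (!decodeBufferOrEvent(staged.peek(), true)) {
                    return;                                // autoread is NOT reset on this path
                }
                staged.poll();
            }
            autoRead = true;
        }
    }

    // Scenario from the report: A runs out of segments, B fails, A recovers.
    static HandlerModel runScenario(boolean applyFix) {
        HandlerModel h = new HandlerModel(applyFix);
        h.noSegments.add("A");
        h.channelRead("A");      // no segment for A: staged, autoread turned off
        h.channelRead("B");      // staged behind A's buffer
        h.channelRead("A");      // staged as well
        h.failed.add("B");       // task B fails while its buffer is still staged
        h.noSegments.remove("A");
        h.processStagedMessages();
        return h;
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            HandlerModel h = runScenario(fix);
            System.out.printf("applyFix=%b autoRead=%b staged=%d delivered=%d%n",
                    fix, h.autoRead, h.staged.size(), h.delivered);
        }
    }
}
```

Running `main` should print `applyFix=false autoRead=false staged=2 delivered=1`, which shows the channel stuck with autoread off. It should then print `applyFix=true autoRead=true staged=0 delivered=2`. The boolean lets the staged-message path treat a failed receiver as "consumed", so one failed task no longer blocks every other task sharing the connection.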
[jira] [Updated] (FLINK-4021) Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection
[ https://issues.apache.org/jira/browse/FLINK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-4021:
-------------------------------
    Fix Version/s:     (was: 1.1.0)

> Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection
> ---------------------------------------------------------------------------------------------
>
>                 Key: FLINK-4021
>                 URL: https://issues.apache.org/jira/browse/FLINK-4021
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: 1.0.2
>            Reporter: Zhijiang Wang
>            Assignee: Zhijiang Wang