There should be an upstream container that was either killed by yarn/stram or exited on a run-time exception. Without such a condition you should not see the "Connection reset by peer" message in the downstream container.
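For illustration, here is a minimal standalone Java sketch (not Apex code; all class and variable names are made up) of why that message shows up: when the process at the other end of a TCP connection goes away without a clean shutdown, the side that is still reading gets exactly this kind of IOException.

// Standalone sketch only: reproduces the kind of "Connection reset"
// failure a buffer-server subscriber logs when the process on the other
// end of the TCP connection disappears abruptly instead of closing cleanly.
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ResetByPeerDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Socket writer = new Socket("localhost", server.getLocalPort());
            Socket reader = server.accept();

            // SO_LINGER with timeout 0 makes close() send a TCP RST,
            // mimicking a container that is killed rather than shut down cleanly.
            writer.setSoLinger(true, 0);
            writer.close();

            try {
                reader.getInputStream().read(); // blocks, then fails once the RST arrives
            } catch (IOException e) {
                // Typically a SocketException; the exact message ("Connection reset",
                // "Connection reset by peer", ...) depends on the OS and JDK.
                System.out.println("reader saw: " + e);
            }
        }
    }
}

Running it prints something like java.net.SocketException: Connection reset; the "Connection reset by peer" wording in the NIO read path (sun.nio.ch.FileDispatcherImpl.read0) is the same symptom, so the exception itself only tells you the peer went away, not why.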
There are multiple reasons why you may see the window id getting stuck behavior. One of them is outlined in the Jira I mentioned in my previous email.

Thank you,

Vlad

Sent from my iPhone

> On May 11, 2017, at 08:45, chiranjeevi vasupilli <chiru....@gmail.com> wrote:
>
> The HDFS write operator is partitioned and receives data from another
> partitioned data generator operator running in another container.
>
> What is the reason for the window id getting stuck in the 3.2 version?
>
>> On Wed, May 10, 2017 at 9:05 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:
>> Is the HDFS write operator partitioned? If not, in the 3.2 release Apex deploys
>> the unifier for Nx1 partitioning in a separate upstream container. Check that
>> container log.
>>
>> In any case my recommendation is to upgrade to 3.6.0, which has the fix for
>> APEXCORE-641.
>>
>> Thank you,
>>
>> Vlad
>>
>>> On 5/10/17 02:20, chiranjeevi vasupilli wrote:
>>> The upstream operator is processing fine; there are no exceptions and the
>>> window id keeps moving.
>>> Apex version: 3.2.2-incubating-SNAPSHOT
>>>
>>> Thanks
>>> Chiranjeevi V
>>>
>>>> On Tue, May 9, 2017 at 10:12 PM, Vlad Rozov <v.ro...@datatorrent.com>
>>>> wrote:
>>>> Sorry, I meant upstream. Can you also provide the Apex version?
>>>>
>>>> Thank you,
>>>>
>>>> Vlad
>>>>
>>>>> On 5/8/17 21:57, chiranjeevi vasupilli wrote:
>>>>> In the killed container we have one writer operator to HDFS and the default
>>>>> Unifier. The writer operator receives data from other upstream
>>>>> operators (32 partitions) running in separate containers. There is no
>>>>> functional processing happening in the blocked operators.
>>>>>
>>>>> There are no downstream operators.
>>>>>
>>>>> Please suggest.
>>>>>
>>>>>> On Mon, May 8, 2017 at 8:58 PM, Vlad Rozov <v.ro...@datatorrent.com>
>>>>>> wrote:
>>>>>> The exception below means that a downstream
>>>>>> operator abruptly disconnected. It is not an indication of a problem by
>>>>>> itself. Please check the downstream operator container log for exceptions
>>>>>> and error messages.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Vlad
>>>>>>
>>>>>>> On 5/8/17 07:12, Pramod Immaneni wrote:
>>>>>>> Hi Chiranjeevi,
>>>>>>>
>>>>>>> I am assuming the operator that is upstream and feeding data to this
>>>>>>> one is progressing properly. What is this operator doing? Is it doing
>>>>>>> any blocking operations, for example, communicating with some external
>>>>>>> systems in a blocking fashion? Do you see any other exceptions before
>>>>>>> the one you mentioned?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> On Mon, May 8, 2017 at 3:41 AM, chiranjeevi vasupilli
>>>>>>>> <chiru....@gmail.com> wrote:
>>>>>>>> Hi Team,
>>>>>>>>
>>>>>>>> In my use case, one of the operators' window id got stuck, and after the
>>>>>>>> timeout it is getting killed.
>>>>>>>> In the logs we can see:
>>>>>>>> [ProcessWideEventLoop] ERROR netlet.AbstractLengthPrependerClient
>>>>>>>> handleException - Disconnecting
>>>>>>>> Subscriber{id=tcp://hostname:57968/323.rsnOutput.1} because of an
>>>>>>>> exception.
>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>
>>>>>>>> Before the container gets killed, we see the above exception in the logs.
>>>>>>>>
>>>>>>>> Please suggest reasons for the window id getting stuck and how to debug
>>>>>>>> it further.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Chiranjeevi V
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> ur's
>>>>> chiru
>>>>
>>>
>>> --
>>> ur's
>>> chiru
>>
>
> --
> ur's
> chiru
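For reference, a rough sketch of how a fixed partition count is typically attached to an Apex operator in the DAG (assuming the stock StatelessPartitioner and the PARTITIONER attribute from the Apex libraries; HdfsWriteOperator and configure are placeholders, not the application code discussed in this thread). When the downstream operator is left unpartitioned, the 32 upstream partitions funnel through a default unifier for the Nx1 case, and on 3.2 that unifier runs in its own container, which is the container log to check.

// Sketch under the above assumptions, not the thread author's application code.
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.api.DAG;
import com.datatorrent.common.partitioner.StatelessPartitioner;
import com.datatorrent.common.util.BaseOperator;

public class PartitionSketch {
    // Placeholder operator, standing in for the real HDFS writer.
    public static class HdfsWriteOperator extends BaseOperator { }

    static void configure(DAG dag, HdfsWriteOperator writer) {
        // Run the writer with 32 static partitions. Without this attribute the
        // operator stays single-partition, and the default unifier that merges
        // the 32 upstream partitions is deployed separately (upstream of it).
        dag.setAttribute(writer, OperatorContext.PARTITIONER,
            new StatelessPartitioner<HdfsWriteOperator>(32));
    }
}

The same attribute can usually be set through the application properties instead of Java code, but the effect on unifier placement is the point here.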