Can you please attach the stack trace of the operator? You can also increase the TIMEOUT_WINDOW_COUNT attribute; the AppMaster uses it to decide when to kill a blocked operator.
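To make the effect of TIMEOUT_WINDOW_COUNT concrete, here is a rough sketch of the arithmetic. It is not the actual StreamingContainerManager code. It assumes the timeout is the window count multiplied by the streaming window size, and it assumes the Apex defaults of 120 windows and 500 ms per window; neither default is stated in this thread. The class and method names are made up for illustration.

```java
// Sketch only: illustrates the idea behind the AppMaster's blocked-operator
// check under the assumptions stated above; not the real Apex implementation.
final class BlockedOperatorTimeout {

    // Assumed Apex defaults (not confirmed in this thread).
    static final int DEFAULT_TIMEOUT_WINDOW_COUNT = 120;
    static final int DEFAULT_STREAMING_WINDOW_SIZE_MILLIS = 500;

    /** Time an operator may go without progress before it counts as blocked. */
    static long timeoutMillis(int timeoutWindowCount, int windowSizeMillis) {
        return (long) timeoutWindowCount * windowSizeMillis;
    }

    /** True when the reported blocked time exceeds the allowed timeout. */
    static boolean isBlocked(long blockedMillis, int timeoutWindowCount, int windowSizeMillis) {
        return blockedMillis > timeoutMillis(timeoutWindowCount, windowSizeMillis);
    }

    public static void main(String[] args) {
        long reported = 61905L; // "time 61905ms" from the AppMaster log in this thread

        // With the assumed defaults the limit is 60000 ms, so 61905 ms is over it.
        System.out.println(timeoutMillis(DEFAULT_TIMEOUT_WINDOW_COUNT,
                DEFAULT_STREAMING_WINDOW_SIZE_MILLIS));
        System.out.println(isBlocked(reported, DEFAULT_TIMEOUT_WINDOW_COUNT,
                DEFAULT_STREAMING_WINDOW_SIZE_MILLIS));

        // With a hypothetical value of 600 windows the limit is 300000 ms.
        System.out.println(isBlocked(reported, 600, DEFAULT_STREAMING_WINDOW_SIZE_MILLIS));
    }
}
```

If those assumptions hold, the 61905 ms in the log is just past the default limit, which would explain the kill. In the application itself the attribute can be raised per operator, for example with dag.setAttribute(operator, Context.OperatorContext.TIMEOUT_WINDOW_COUNT, 600) in populateDAG(); the value 600 is only an example and should be tuned.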
For instructions on taking a stack trace, see this blog post:
https://www.datatorrent.com/blog/getting-stack-traces-apache-apex-applications/

On Tue, Feb 28, 2017 at 12:59 PM Sunil Parmar <[email protected]> wrote:

> Ashwin,
> I don’t see such warning. I’ll PM you entire log file.
>
> On 2017-02-28 12:16 (-0800), Ashwin Chandra Putta <[email protected]> wrote:
> > Sunil,
> > This might be related to checkpointing. See:
> > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2211-L2217
> >
> > Also check this piece of code:
> > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2031-L2044
> >
> > Can you paste the output of the warning from the code above which starts
> > with 'Marking operator '
> >
> > Regards,
> > Ashwin.
> >
> > On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar <[email protected]> wrote:
> >
> > > That doesn’t seem to be the case. We do see window id moving in UI as
> > > well.
> > >
> > > On 2017-02-28 11:19 (-0800), Munagala Ramanath <[email protected]> wrote:
> > > > It likely means that that operator is taking too long to return from
> > > > one of the callbacks like beginWindow(), endWindow(), emitTuples(),
> > > > etc. Do you have any potentially blocking calls to external systems
> > > > in any of those callbacks?
> > > >
> > > > Ram
> > > >
> > > > On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar <[email protected]> wrote:
> > > >
> > > > > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager:
> > > > > Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container
> > > > > PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE]
> > > > > time 61905ms
> > > > > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService:
> > > > > Completed containerId=container_1487310232732_0027_02_000111,
> > > > > state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the
> > > > > ApplicationMaster.
> > > > > Container killed on request. Exit code is 143
> > > > > Container exited with a non-zero exit code 143
> > > > >
> > > > > Can anyone help understand this error? We see one of the operators
> > > > > keeps restarting the container; the above error is from AppMaster log.
> > > > >
> > > > > Thanks,
> > > > > Sunil
> > > >
> > > > --
> > > > Munagala V. Ramanath
> > > > Software Engineer
> > > > E: [email protected] | M: (408) 331-5034 | Twitter: @UnknownRam
> > > > www.datatorrent.com | apex.apache.org
> >
> > --
> > Regards,
> > Ashwin.
