Can you please attach a stack trace of the operator?

You can increase the TIMEOUT_WINDOW_COUNT attribute; the AppMaster uses it
to decide when to kill a blocked operator.
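
As a rough sketch of the arithmetic (not the engine's actual code): if I
assume the defaults are TIMEOUT_WINDOW_COUNT = 120 and a 500 ms streaming
window, the limit works out to about 60 seconds, which would line up with
the "time 61905ms" in your AppMaster log. The class and method names below
are hypothetical, and the default values are assumptions you should verify
against your Apex version.

```java
// Hypothetical helper, not part of Apex: relates a window count to a
// wall-clock timeout under assumed default settings.
public class BlockedOperatorTimeout {
    // Assumed defaults; please verify for your Apex version.
    public static final int ASSUMED_TIMEOUT_WINDOW_COUNT = 120;
    public static final int ASSUMED_WINDOW_SIZE_MILLIS = 500;

    // Time an operator may go without progress before it counts as blocked.
    public static long timeoutMillis(int timeoutWindowCount, int windowSizeMillis) {
        return (long) timeoutWindowCount * windowSizeMillis;
    }

    // Smallest window count that covers the desired timeout (rounds up).
    public static int windowCountFor(long desiredTimeoutMillis, int windowSizeMillis) {
        return (int) ((desiredTimeoutMillis + windowSizeMillis - 1) / windowSizeMillis);
    }

    public static void main(String[] args) {
        long blockedMillis = 61905L; // "time 61905ms" from the AppMaster log
        long limit = timeoutMillis(ASSUMED_TIMEOUT_WINDOW_COUNT, ASSUMED_WINDOW_SIZE_MILLIS);
        System.out.println("limit=" + limit + "ms, exceeded=" + (blockedMillis > limit));

        // Window count needed to tolerate roughly 5 minutes of blocking.
        System.out.println("window count for 5 min: "
                + windowCountFor(300_000L, ASSUMED_WINDOW_SIZE_MILLIS));
    }
}
```

The resulting count is what you would set as TIMEOUT_WINDOW_COUNT on the
operator. Raising it only postpones the kill, so the stack trace is still
needed to find where the operator is actually stuck.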

To take a stack trace, follow the instructions in this blog post:
https://www.datatorrent.com/blog/getting-stack-traces-apache-apex-applications/

On Tue, Feb 28, 2017 at 12:59 PM Sunil Parmar <[email protected]>
wrote:

> Ashwin,
> I don’t see such a warning. I’ll PM you the entire log file.
>
> On 2017-02-28 12:16 (-0800), Ashwin Chandra Putta <
> [email protected]> wrote:
> > Sunil,
> > This might be related to checkpointing. See:
> >
> https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2211-L2217
> >
> > Also check this piece of code:
> >
> https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2031-L2044
> >
> > Can you paste the output of the warning from the code above that starts
> > with 'Marking operator '?
> >
> > Regards,
> > Ashwin.
> >
> > On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar <[email protected]>
> > wrote:
> >
> > > That doesn’t seem to be the case. We do see the window id moving in
> > > the UI as well.
> > >
> > > On 2017-02-28 11:19 (-0800), Munagala Ramanath <[email protected]>
> > > wrote:
> > > > It likely means that the operator is taking too long to return from
> > > > one of the callbacks like beginWindow(), endWindow(),
> > > > emitTuples(), etc. Do you have any potentially blocking calls to
> > > > external systems in any of those callbacks?
> > > >
> > > > Ram
> > > >
> > > > On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar <[email protected]>
> > > > wrote:
> > > >
> > > > > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager:
> > > > > Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container
> > > > > PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE]
> > > > > time 61905ms
> > > > > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService:
> > > > > Completed containerId=container_1487310232732_0027_02_000111,
> > > > > state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the
> > > > > ApplicationMaster.
> > > > > Container killed on request. Exit code is 143
> > > > > Container exited with a non-zero exit code 143
> > > > >
> > > > >
> > > > > Can anyone help us understand this error? We see that one of the
> > > > > operators keeps restarting the container; the above error is from
> > > > > the AppMaster log.
> > > > >
> > > > > Thanks,
> > > > > Sunil
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > _______________________________________________________
> > > >
> > > > Munagala V. Ramanath
> > > >
> > > > Software Engineer
> > > >
> > > > E: [email protected] | M: (408) 331-5034 | Twitter: @UnknownRam
> > > >
> > > > www.datatorrent.com  |  apex.apache.org
> > > >
> > >
> >
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>