I think we figured out the issue. It was Cassandra: in that environment, one of
the nodes was making writes extremely slow. We fixed the cluster and now it's
much better.

On 2017-02-28 13:09 (-0800), Sandesh Hegde 
<[email protected]<mailto:[email protected]>> wrote:
> Can you please attach the stack trace of the operator?
>
> You can increase the attribute TIMEOUT_WINDOW_COUNT; the AppMaster uses it
> to decide when to kill a blocked operator.
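A rough sketch of the timeout arithmetic, assuming the blocked-operator budget is TIMEOUT_WINDOW_COUNT multiplied by the streaming window size, with defaults of 120 windows and 500 ms per window (verify both against your Apex version and the StreamingContainerManager code linked later in this thread). Under those assumptions the budget is 60,000 ms, which the 61905 ms in the AppMaster log just exceeds. This is plain Java with hypothetical names, not the Apex API.

```java
/**
 * Hypothetical helper: approximates the blocked-operator budget as
 * TIMEOUT_WINDOW_COUNT x streaming window size (an assumed relation).
 */
public class BlockedOperatorBudget {
    // Assumed defaults; check them against your Apex version.
    static final int DEFAULT_TIMEOUT_WINDOW_COUNT = 120;
    static final long DEFAULT_WINDOW_SIZE_MILLIS = 500L;

    /** Budget in milliseconds before an operator is treated as blocked. */
    static long budgetMillis(int timeoutWindowCount, long windowSizeMillis) {
        return timeoutWindowCount * windowSizeMillis;
    }

    /** Smallest window count whose budget covers the expected stall (rounds up). */
    static int windowsNeeded(long expectedStallMillis, long windowSizeMillis) {
        return (int) ((expectedStallMillis + windowSizeMillis - 1) / windowSizeMillis);
    }

    public static void main(String[] args) {
        long budget = budgetMillis(DEFAULT_TIMEOUT_WINDOW_COUNT, DEFAULT_WINDOW_SIZE_MILLIS);
        long observed = 61905L; // "time 61905ms" from the AppMaster log in this thread
        System.out.println("budget=" + budget + "ms, observed=" + observed
                + "ms, exceeded=" + (observed > budget));
        // Window count needed to tolerate, for example, a 3-minute stall
        System.out.println("windows for 180000ms: "
                + windowsNeeded(180_000L, DEFAULT_WINDOW_SIZE_MILLIS));
    }
}
```

If you do raise the attribute, it can be set per operator in populateDAG, e.g. `dag.setAttribute(operator, OperatorContext.TIMEOUT_WINDOW_COUNT, 360)` (check the signature in your Apex API version). A larger budget only hides a slow dependency, though; in this thread the root cause turned out to be a slow Cassandra node.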
>
> For instructions on taking a stack trace, see this blog post:
> https://www.datatorrent.com/blog/getting-stack-traces-apache-apex-applications/
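If the blog's approach is not available, a generic JVM-level alternative is to dump every thread's stack from inside the process with the standard `Thread.getAllStackTraces()` call (running `jstack` against the container's PID is another common option). This is a minimal sketch, not the method the blog describes, and the class name is hypothetical. A thread stuck in an external client call inside a callback would typically show that client's frames at the top of its stack.

```java
import java.util.Map;

/** Hypothetical utility: formats the stack of every live thread in this JVM. */
public class ThreadDump {

    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line per thread: quoted name plus its current state
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```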
>
> On Tue, Feb 28, 2017 at 12:59 PM Sunil Parmar 
> <[email protected]<mailto:[email protected]>>
> wrote:
>
> > Ashwin,
> > I don't see such a warning. I'll PM you the entire log file.
> >
> > On 2017-02-28 12:16 (-0800), Ashwin Chandra Putta <
> > [email protected]<mailto:[email protected]>> wrote:
> > > Sunil,
> > > This might be related to checkpointing. See:
> > >
> > > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2211-L2217
> > >
> > > Also check this piece of code:
> > >
> > > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2031-L2044
> > >
> > > Can you paste the output of the warning from the code above that starts
> > > with 'Marking operator'?
> > >
> > > Regards,
> > > Ashwin.
> > >
> > > On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar
> > > <[email protected]<mailto:[email protected]>> wrote:
> > >
> > > > That doesn't seem to be the case. We do see the window ID advancing in
> > > > the UI as well.
> > > >
> > > > On 2017-02-28 11:19 (-0800), Munagala Ramanath 
> > > > <[email protected]<mailto:[email protected]>>
> > > > wrote:
> > > > > It likely means that the operator is taking too long to return from one
> > > > > of the callbacks like beginWindow(), endWindow(), emitTuples(), etc. Do
> > > > > you have any potentially blocking calls to external systems in any of
> > > > > those callbacks?
> > > > >
> > > > > Ram
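One way to act on Ram's question, sketched in plain Java with hypothetical names: run the external call (for example a database write) on a worker thread and wait for it only up to a fixed budget, so the callback can return well before the AppMaster's blocked-operator timeout. This is a sketch under stated assumptions, not an Apex facility; what to do with a call that times out (retry in the next window, buffer, or fail the operator) depends on your delivery guarantees.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper that bounds how long a callback waits on an external system. */
public class BoundedExternalCall {
    // Daemon worker so a stuck external call cannot keep the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "external-call");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs externalCall on the worker thread and waits at most budgetMillis.
     * Returns true if it finished in time, false on timeout or interruption.
     * Note: a timed-out call keeps running, and later calls queue behind it.
     */
    public boolean runWithBudget(Runnable externalCall, long budgetMillis) {
        Future<?> future = executor.submit(externalCall);
        try {
            future.get(budgetMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // caller decides: retry later, buffer, or fail
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("external call failed", e.getCause());
        }
    }

    public void shutdown() {
        executor.shutdownNow(); // interrupts a still-running call
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BoundedExternalCall caller = new BoundedExternalCall();
        boolean fast = caller.runWithBudget(() -> { }, 1000);
        boolean slow = caller.runWithBudget(() -> sleepQuietly(2000), 100); // simulated slow node
        System.out.println("fast=" + fast + " slow=" + slow);
        caller.shutdown();
    }
}
```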
> > > > >
> > > > > On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar
> > > > > <[email protected]<mailto:[email protected]>> wrote:
> > > > >
> > > > > > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager: Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE] time 61905ms
> > > > > > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService: Completed containerId=container_1487310232732_0027_02_000111, state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the ApplicationMaster.
> > > > > > Container killed on request. Exit code is 143
> > > > > > Container exited with a non-zero exit code 143
> > > > > >
> > > > > >
> > > > > > Can anyone help me understand this error? One of our operators keeps
> > > > > > getting its container restarted; the log above is from the AppMaster.
> > > > > >
> > > > > > Thanks,
> > > > > > Sunil
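To see which operators hit the timeout across a long AppMaster log, a small regex over the "Blocked operator" lines is enough. This sketch assumes the line format shown in the log above stays the same across Apex versions; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner for "Blocked operator" lines in an AppMaster log. */
public class BlockedOperatorLogScanner {
    // Captures operator id, operator name, and blocked time in ms (assumed line format).
    private static final Pattern BLOCKED = Pattern.compile(
            "Blocked operator PTOperator\\[id=(\\d+),name=([^\\]]+)\\].*?time (\\d+)ms");

    /** Returns "name(id)=<millis>ms" for a matching line, or null if the line does not match. */
    static String summarize(String line) {
        Matcher m = BLOCKED.matcher(line);
        if (!m.find()) {
            return null;
        }
        long millis = Long.parseLong(m.group(3));
        return m.group(2) + "(" + m.group(1) + ")=" + millis + "ms";
    }

    public static void main(String[] args) {
        String[] lines = {
            "2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager: "
                + "Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container "
                + "PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE] time 61905ms",
            "Container exited with a non-zero exit code 143"
        };
        for (String line : lines) {
            String summary = summarize(line);
            if (summary != null) {
                System.out.println(summary);
            }
        }
    }
}
```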
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
>
