apex metrics integration into monitoring system

2017-02-28 Thread Mohammad Kargar
Is there any way to integrate Apex metrics/stats into an external
monitoring system (e.g. Graphite)? Also, what's the best way to enable
JMX for a submitted job?

Thanks


Re: Blocked operator PTOperator

2017-02-28 Thread Sandesh Hegde
Can you please attach the stacktrace of the operator?

You can increase the TIMEOUT_WINDOW_COUNT attribute; the AppMaster uses it
to decide when to kill a blocked operator.

To take a stack trace, follow the instructions in this blog post:
https://www.datatorrent.com/blog/getting-stack-traces-apache-apex-applications/
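
As a rough illustration of how the window count translates into wall-clock
time, here is a minimal Java sketch. It assumes a streaming window of 500 ms
and a TIMEOUT_WINDOW_COUNT of 120; both values are assumptions for the
example, so please check them against your Apex version and application
configuration. The class and method names are hypothetical and are not part
of the Apex API.

```java
public class BlockedOperatorTimeout {
    // Assumed values for illustration only; verify them for your setup.
    static final int ASSUMED_WINDOW_MILLIS = 500;
    static final int ASSUMED_TIMEOUT_WINDOW_COUNT = 120;

    /** Approximate time an operator may stall before it is treated as blocked. */
    public static long timeoutMillis(int timeoutWindowCount, int windowMillis) {
        return (long) timeoutWindowCount * windowMillis;
    }

    /** Smallest window count that tolerates a stall of the given length (rounds up). */
    public static int windowCountFor(long stallMillis, int windowMillis) {
        return (int) ((stallMillis + windowMillis - 1) / windowMillis);
    }

    public static void main(String[] args) {
        // 120 windows of 500 ms each
        System.out.println(timeoutMillis(ASSUMED_TIMEOUT_WINDOW_COUNT, ASSUMED_WINDOW_MILLIS)); // 60000
        // Window count needed to tolerate a 3-minute stall
        System.out.println(windowCountFor(180_000L, ASSUMED_WINDOW_MILLIS)); // 360
    }
}
```

Under those assumptions the timeout works out to about 60 seconds, which
would be consistent with the "time 61905ms" reported in the log. Raising the
count only buys the operator more time; it does not remove whatever is
blocking it.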

On Tue, Feb 28, 2017 at 12:59 PM Sunil Parmar 
wrote:

> Ashwin,
> I don’t see such warning. I’ll PM you entire log file.
>
> On 2017-02-28 12:16 (-0800), Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
> > Sunil,
> > This might be related to checkpointing. See:
> > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2211-L2217
> >
> > Also check this piece of code:
> > https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2031-L2044
> >
> > Can you paste the output of the warning from the code above which starts
> > with 'Marking operator '
> >
> > Regards,
> > Ashwin.
> >
> > On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar
> > wrote:
> >
> > > That doesn’t seems to be the case. We do see window id moving in UI as
> > > well.
> > >
> > > On 2017-02-28 11:19 (-0800), Munagala Ramanath
> > > wrote:
> > > > It likely means that that operator is taking too long to return from one of
> > > > the callbacks like beginWindow(), endWindow(),
> > > > emitTuples(), etc. Do you have any potentially blocking calls to external
> > > > systems in any of those callbacks ?
> > > >
> > > > Ram
> > > >
> > > > On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar <spar...@threatmetrix.com>
> > > > wrote:
> > > >
> > > > > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager:
> > > > > Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container
> > > > > PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE]
> > > > > time 61905ms
> > > > > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService:
> > > > > Completed containerId=container_1487310232732_0027_02_000111,
> > > > > state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the
> > > > > ApplicationMaster.
> > > > > Container killed on request. Exit code is 143
> > > > > Container exited with a non-zero exit code 143
> > > > >
> > > > >
> > > > > Can anyone help understand this error ? We see one of the operators keeps
> > > > > restarting the container; the above error is from AppMaster log.
> > > > >
> > > > > Thanks,
> > > > > Sunil
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > ___
> > > >
> > > > Munagala V. Ramanath
> > > >
> > > > Software Engineer
> > > >
> > > > E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
> > > >
> > > > www.datatorrent.com  |  apex.apache.org
> > > >
> > >
> >
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>


Re: Blocked operator PTOperator

2017-02-28 Thread Sunil Parmar
I don't see such a warning in the AppMaster log.

On 2017-02-28 12:16 (-0800), Ashwin Chandra Putta wrote:
> Sunil,
> This might be related to checkpointing. See:
> https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2211-L2217
>
> Also check this piece of code:
> https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2031-L2044
>
> Can you paste the output of the warning from the code above which starts
> with 'Marking operator '
>
> Regards,
> Ashwin.
>
> On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar
> wrote:
>
> > That doesn’t seems to be the case. We do see window id moving in UI as
> > well.
> >
> > On 2017-02-28 11:19 (-0800), Munagala Ramanath
> > wrote:
> > > It likely means that that operator is taking too long to return from one of
> > > the callbacks like beginWindow(), endWindow(),
> > > emitTuples(), etc. Do you have any potentially blocking calls to external
> > > systems in any of those callbacks ?
> > >
> > > Ram
> > >
> > > On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar
> > > wrote:
> > >
> > > > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager:
> > > > Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container
> > > > PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE]
> > > > time 61905ms
> > > > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService:
> > > > Completed containerId=container_1487310232732_0027_02_000111,
> > > > state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the
> > > > ApplicationMaster.
> > > > Container killed on request. Exit code is 143
> > > > Container exited with a non-zero exit code 143
> > > >
> > > >
> > > > Can anyone help understand this error ? We see one of the operators keeps
> > > > restarting the container; the above error is from AppMaster log.
> > > >
> > > > Thanks,
> > > > Sunil
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > ___
> > >
> > > Munagala V. Ramanath
> > >
> > > Software Engineer
> > >
> > > E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
> > >
> > > www.datatorrent.com  |  apex.apache.org
> > >
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>


Re: Blocked operator PTOperator

2017-02-28 Thread Ashwin Chandra Putta
Sunil,
This might be related to checkpointing. See:
https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2211-L2217

Also check this piece of code:
https://github.com/apache/apex-core/blob/master/engine/src/main/java/com/datatorrent/stram/StreamingContainerManager.java#L2031-L2044

Can you paste the output of the warning from the code above that starts
with 'Marking operator '?

Regards,
Ashwin.

On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar 
wrote:

> That doesn’t seems to be the case. We do see window id moving in UI as
> well.
>
> On 2017-02-28 11:19 (-0800), Munagala Ramanath 
> wrote:
> > It likely means that that operator is taking too long to return from one of
> > the callbacks like beginWindow(), endWindow(),
> > emitTuples(), etc. Do you have any potentially blocking calls to external
> > systems in any of those callbacks ?
> >
> > Ram
> >
> > On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar
> > wrote:
> >
> > > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager:
> > > Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container
> > > PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE]
> > > time 61905ms
> > > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService:
> > > Completed containerId=container_1487310232732_0027_02_000111,
> > > state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the
> > > ApplicationMaster.
> > > Container killed on request. Exit code is 143
> > > Container exited with a non-zero exit code 143
> > >
> > >
> > > Can anyone help understand this error ? We see one of the operators keeps
> > > restarting the container; the above error is from AppMaster log.
> > >
> > > Thanks,
> > > Sunil
> > >
> >
> >
> >
> > --
> >
> > ___
> >
> > Munagala V. Ramanath
> >
> > Software Engineer
> >
> > E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
> >
> > www.datatorrent.com  |  apex.apache.org
> >
>



-- 

Regards,
Ashwin.


Re: Blocked operator PTOperator

2017-02-28 Thread Sunil Parmar
That doesn't seem to be the case. We do see the window ID moving in the UI as well.

On 2017-02-28 11:19 (-0800), Munagala Ramanath wrote:
> It likely means that that operator is taking too long to return from one of
> the callbacks like beginWindow(), endWindow(),
> emitTuples(), etc. Do you have any potentially blocking calls to external
> systems in any of those callbacks ?
>
> Ram
>
> On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar
> wrote:
>
> > 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager:
> > Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container
> > PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE]
> > time 61905ms
> > 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService:
> > Completed containerId=container_1487310232732_0027_02_000111,
> > state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the
> > ApplicationMaster.
> > Container killed on request. Exit code is 143
> > Container exited with a non-zero exit code 143
> >
> >
> > Can anyone help understand this error ? We see one of the operators keeps
> > restarting the container; the above error is from AppMaster log.
> >
> > Thanks,
> > Sunil
> >
>
>
>
> --
>
> ___
>
> Munagala V. Ramanath
>
> Software Engineer
>
> E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
>
> www.datatorrent.com  |  apex.apache.org
>


Re: Scala Operators never leave Pending_Deploy state

2017-02-28 Thread Chris Benninger
OK, so I figured it out. It wasn't Scala exactly. The YARN container memory
configuration was at the default, which sits right on the threshold between
what a plain Java jar needs and what a Scala-based jar needs (the latter
bundles the Scala libraries on top of everything else). The Scala jobs were
just big enough that the YARN containers did not have enough RAM.

Thanks!

On Wed, Feb 22, 2017 at 10:59 PM, Tushar Gosavi 
wrote:

> Hi Chris,
>
> Can you provide logs to the container where Scala operators are running?
> what version of scala and apex are you using? If possible can you provide
> scala operator code for me to test.
>
> Thanks.
> - Tushar.
>
>
>
> On Wed, Feb 22, 2017 at 5:15 AM, Chris Benninger 
> wrote:
>
>> I will also mention that I followed these instructions:
>> https://www.datatorrent.com/blog/blog-writing-apache-apex-application-in-scala/
>>
>> On Tue, Feb 21, 2017 at 3:44 PM, Chris Benninger 
>> wrote:
>>
>>> I cannot get any scala operators to actually work. The Job starts and
>>> says running but any Scala Operators do not kick off. It shows them in
>>> Pending_Deploy state. I see nothing unusual in the logs.
>>>
>>>
>>>
>>
>


Re: Blocked operator PTOperator

2017-02-28 Thread Munagala Ramanath
It likely means that the operator is taking too long to return from one of
the callbacks such as beginWindow(), endWindow(), or emitTuples(). Do you
have any potentially blocking calls to external systems in any of those
callbacks?
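
If there is such a call, one common way to keep a callback from stalling is
to run the external call on a helper thread and give up after a fixed time
budget. The sketch below is plain Java and does not use the Apex API; the
class name, method name, and fallback behavior are all hypothetical, and
whether a fallback value is acceptable depends on your application.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class BoundedExternalCall {
    // Single daemon helper thread so a stuck call cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "external-call");
        t.setDaemon(true);
        return t;
    });

    /** Runs the call on the helper thread; returns fallback if it exceeds the budget. */
    public <T> T callWithTimeout(Supplier<T> externalCall, long timeoutMillis, T fallback) {
        Callable<T> task = externalCall::get;
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the helper thread; a call that ignores interrupts
            // will still occupy it until it returns.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    public void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        BoundedExternalCall caller = new BoundedExternalCall();
        // A fast call returns its own result.
        System.out.println(caller.callWithTimeout(() -> "looked-up-value", 1000, "fallback"));
        caller.shutdown();
    }
}
```

The caller never waits longer than the budget, so the callback can return
in time even when the external system is slow.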

Ram

On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar 
wrote:

> 2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager:
> Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container
> PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE]
> time 61905ms
> 2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService:
> Completed containerId=container_1487310232732_0027_02_000111,
> state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the
> ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
>
>
> Can anyone help understand this error ? We see one of the operators keeps
> restarting the container; the above error is from AppMaster log.
>
> Thanks,
> Sunil
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Blocked operator PTOperator

2017-02-28 Thread Sunil Parmar
2017-02-27 19:43:21,926 INFO com.datatorrent.stram.StreamingContainerManager: Blocked operator PTOperator[id=3,name=eventUpdatesFormatter] container PTContainer[id=1(container_1487310232732_0027_02_000111),state=ACTIVE] time 61905ms
2017-02-27 19:43:22,928 INFO com.datatorrent.stram.StreamingAppMasterService: Completed containerId=container_1487310232732_0027_02_000111, state=COMPLETE, exitStatus=-105, diagnostics=Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


Can anyone help us understand this error? We see that one of the operators
keeps restarting its container; the error above is from the AppMaster log.

Thanks,
Sunil


No Checkpointing when partitioning is used in the loop with Delay Operator.

2017-02-28 Thread Ambarish Pande
Hello All,

I have created an app in which I use the Delay Operator to add a
feedback loop to the DAG. I am also statically partitioning some of the
operators in the loop. When I run the app, the operators in the loop are not
being checkpointed.

To narrow down the cause of this, I experimented a bit with the app.

- App without partitioning of operators, with Delay Operator: checkpointing OK
- App with partitioning of operators, without Delay Operator: checkpointing OK
- App with partitioning of operators, with Delay Operator: NO checkpointing


I have reported an issue against APEX-CORE. Here is the link:
https://issues.apache.org/jira/browse/APEXCORE-654

Any help would be greatly appreciated. I really need a loop in my application.

Thank You.