RE: [EXT] Re: Correlate Processor ID in Logs

2017-08-22 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Pierre and Kevin,

Thanks for your suggestions. Based on your input, I may build a hybrid 
monitoring system that uses both the SiteToSite Reporting Task and Bulletins 
through REST calls.

-Karthik

-----Original Message-----
From: Pierre Villard [mailto:pierre.villard...@gmail.com] 
Sent: Tuesday, August 22, 2017 2:48 PM
To: dev 
Subject: [EXT] Re: Correlate Processor ID in Logs



Re: Correlate Processor ID in Logs

2017-08-22 Thread Pierre Villard
Hi,

I'd suggest using the SiteToSite Bulletin Reporting Task to monitor the
bulletins generated by NiFi. If your reporting task is scheduled frequently
enough, you shouldn't have any issues. Note that the "5 bulletins limit" is
per processor.
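On the receiving side, the reporting task delivers bulletins as JSON records to whatever input port it targets, so the consuming flow or script mostly has to filter and format them. The sketch below assumes a JSON array of records with field names such as bulletinLevel, bulletinSourceName, bulletinSourceId, bulletinGroupId and bulletinMessage; these are assumptions about the task's output schema and should be checked against the documentation for your NiFi version. The function name summarize_bulletins is hypothetical.

```python
import json
from typing import List

# Severity ordering used for filtering; unknown levels are treated as lowest.
LEVELS = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}


def summarize_bulletins(payload: str, min_level: str = "WARNING") -> List[str]:
    """Format bulletin records at or above min_level as one-line alerts.

    The field names are assumed from the reporting task's JSON output;
    verify them against your NiFi version before relying on them.
    """
    threshold = LEVELS[min_level]
    alerts = []
    for record in json.loads(payload):
        level = record.get("bulletinLevel", "INFO")
        if LEVELS.get(level, 0) < threshold:
            continue  # below the requested severity
        alerts.append(
            f"[{level}] {record.get('bulletinSourceName')} "
            f"(id={record.get('bulletinSourceId')}, "
            f"group={record.get('bulletinGroupId')}): "
            f"{record.get('bulletinMessage')}"
        )
    return alerts
```

If the records do carry the source ID and group ID as assumed, each alert can be placed in its process group without a separate REST lookup.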

Thanks!

2017-08-22 22:43 GMT+02:00 Kevin Doran :

> Hi Karthik,
>
> A processor's metadata, including its name and parent processor group ID,
> are accessible via the NiFi REST API [1] via GET /processors/{id}, which
> returns:
>
> {
> ...
> "component": {
> "id": "value",
> "parentGroupId": "value",
> "name": "value",
> "type": "value",
>... }
> }
>
> Of course, hitting the API for every log line doesn't scale, so one
> approach would be to build a local cache of processorId ->
> processorMetadata in whatever log line processing tool you are using, and
> use the cache in order to enrich each log line with the fields you require.
> You could build the cache lazily, i.e., start with an empty lookup table,
> and if the processor ID is not in the cache, hit the REST API to look it up.
>
> Regards,
> Kevin
>
> [1] https://nifi.apache.org/docs/nifi-docs/rest-api/
>
> On 8/22/17, 15:56, "Karthik Kothareddy (karthikk) [CONT - Type 2]" <
> karth...@micron.com> wrote:
>
> Hello All,
>
> I am trying to build a monitoring mechanism for our flows and I'm
> considering using the "nifi-app.log" as a primary source and filter them
> based on the messages. However, I see that a particular message only has
> Processor name and ID for example,
>
> ERROR [Timer-Driven Process Thread-36] 
> o.a.nifi.processors.standard.ExecuteSQL
> ExecuteSQL[id=015a1007-548f-1bf5-1836-e4e53164d184] Unable to execute SQL
> select query SELECT * FROM table WHERE comp_datetime <= '2017-01-31
> 23:59:59.813' ORDER BY datetime OFFSET 32400 ROWS FETCH NEXT 100
> ROWS ONLY for StandardFlowFileRecord[uuid=fc425c66-b83d-46d2-94bc-
> 332e43345960,claim=StandardContentClaim [resourceClaim=
> StandardResourceClaim[id=1499803802779-112000, container=default,
> section=384], offset=265042, length=114613],offset=53992,
> name=16290968101533439,size=167]
>
> Given the above Error message it is really hard to correlate the
> ProcessorName/ID to the actual name of the Processor or it's parent
> ProcessorGroup. Is there a way that I can correlate them easily?
>
> Also , I have considered using Bulletins as the source which is more
> fine grained to the actual processor and ProcessorGroup it belongs to but
> problem with this approach is the rest call only returns 5 bulletins back
> each time. And according to this post https://community.hortonworks.
> com/questions/72411/nifi-bulletinrepository-api-
> returns-maximum-5-bull.html  it is a fixed value and practically not
> feasible to capture all of them if the flow has multiple failures every
> second.
>
>
> Any thoughts around this are much appreciated.
>
> Thanks
> Karthik
>
>
>
>


Re: Correlate Processor ID in Logs

2017-08-22 Thread Kevin Doran
Hi Karthik,

A processor's metadata, including its name and parent process group ID, is 
accessible through the NiFi REST API [1] via GET /processors/{id}, which returns:

{
  ...
  "component": {
    "id": "value",
    "parentGroupId": "value",
    "name": "value",
    "type": "value",
    ...
  }
}
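To make the lookup concrete, here is a minimal Python sketch that calls this endpoint and keeps only the fields needed for enrichment. It assumes an unsecured NiFi instance at the hypothetical base URL http://localhost:8080/nifi-api; a secured instance would additionally need TLS configuration and an access token. The helper names fetch_processor_entity and processor_metadata are illustrative only.

```python
import json
import urllib.request

# Hypothetical base URL of an unsecured NiFi instance.
NIFI_API = "http://localhost:8080/nifi-api"


def fetch_processor_entity(processor_id: str, base_url: str = NIFI_API) -> dict:
    """Call GET /processors/{id} and return the decoded JSON entity."""
    url = f"{base_url}/processors/{processor_id}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def processor_metadata(entity: dict) -> dict:
    """Keep only the component fields that are useful for enriching log lines."""
    # Fall back to an empty dict if the entity has no "component" section.
    component = entity.get("component") or {}
    return {key: component.get(key) for key in ("id", "name", "parentGroupId", "type")}
```

For example, processor_metadata(fetch_processor_entity("015a1007-548f-1bf5-1836-e4e53164d184")) would return the name and parent group of the processor from the log line in the original question, assuming that ID exists on the instance being queried.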

Of course, hitting the API for every log line doesn't scale, so one approach 
would be to build a local cache of processorId -> processorMetadata in whatever 
log-processing tool you are using, and use that cache to enrich each log line 
with the fields you require. You could build the cache lazily, i.e., start with 
an empty lookup table and, whenever a processor ID is not yet in the cache, hit 
the REST API to look it up.
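The lazy cache described above can be sketched in a few lines of Python. This is a minimal illustration, not a specific tool's feature: ProcessorCache, enrich and the injected fetch callable are hypothetical names, and fetch stands in for whatever performs the GET /processors/{id} call.

```python
from typing import Callable, Dict


class ProcessorCache:
    """Lazily populated lookup table: processor ID -> processor metadata."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        # `fetch` performs the actual REST lookup (e.g. GET /processors/{id});
        # it is injected so the cache can be exercised without a NiFi instance.
        self._fetch = fetch
        self._entries: Dict[str, dict] = {}

    def get(self, processor_id: str) -> dict:
        if processor_id not in self._entries:
            # Cache miss: ask the API once and remember the answer.
            self._entries[processor_id] = self._fetch(processor_id)
        return self._entries[processor_id]


def enrich(line: str, processor_id: str, cache: ProcessorCache) -> str:
    """Append the processor's name and parent group ID to a raw log line."""
    meta = cache.get(processor_id)
    return f"{line} processorName={meta.get('name')} parentGroupId={meta.get('parentGroupId')}"
```

Each processor ID costs one API call for the lifetime of the cache. Because processors can be renamed or moved between groups, a real deployment might add a time-to-live or periodic refresh; that policy is left out here.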

Regards,
Kevin

[1] https://nifi.apache.org/docs/nifi-docs/rest-api/

On 8/22/17, 15:56, "Karthik Kothareddy (karthikk) [CONT - Type 2]" wrote:



Correlate Processor ID in Logs

2017-08-22 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello All,

I am trying to build a monitoring mechanism for our flows, and I'm considering 
using "nifi-app.log" as the primary source and filtering its entries based on 
their messages. However, I see that a given message only includes the processor 
name and ID, for example:

ERROR [Timer-Driven Process Thread-36] o.a.nifi.processors.standard.ExecuteSQL 
ExecuteSQL[id=015a1007-548f-1bf5-1836-e4e53164d184] Unable to execute SQL 
select query SELECT * FROM table WHERE comp_datetime <= '2017-01-31 
23:59:59.813' ORDER BY datetime OFFSET 32400 ROWS FETCH NEXT 100 ROWS 
ONLY for 
StandardFlowFileRecord[uuid=fc425c66-b83d-46d2-94bc-332e43345960,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1499803802779-112000, 
container=default, section=384], offset=265042, 
length=114613],offset=53992,name=16290968101533439,size=167]

Given the above error message, it is really hard to correlate the processor 
name/ID with the actual name of the processor or its parent process group. Is 
there a way that I can correlate them easily?
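Whatever lookup is used, the first step is pulling the processor ID out of the line. A minimal Python sketch, assuming the "<Label>[id=<uuid>]" pattern seen in the line above holds for other processors' messages; COMPONENT_RE and extract_component are hypothetical names. Requiring a UUID-shaped ID keeps the pattern from matching other bracketed IDs in the same line, such as the resource claim's.

```python
import re
from typing import Optional, Tuple

# Matches the "<Label>[id=<uuid>]" token that identifies the logging component,
# e.g. "ExecuteSQL[id=015a1007-548f-1bf5-1836-e4e53164d184]". The UUID shape
# excludes tokens like "StandardResourceClaim[id=1499803802779-112000".
COMPONENT_RE = re.compile(
    r"\b(\w+)\[id=([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\]"
)


def extract_component(line: str) -> Optional[Tuple[str, str]]:
    """Return (label, processor_id) for the first component token, or None."""
    match = COMPONENT_RE.search(line)
    return (match.group(1), match.group(2)) if match else None
```

For the error above this yields ("ExecuteSQL", "015a1007-548f-1bf5-1836-e4e53164d184"); the ID is the key to pass to any REST lookup or cache.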

I have also considered using bulletins as the source, since they are more 
precisely tied to the actual processor and the process group it belongs to, but 
the problem with this approach is that the REST call only returns 5 bulletins 
each time. According to this post 
https://community.hortonworks.com/questions/72411/nifi-bulletinrepository-api-returns-maximum-5-bull.html
it is a fixed value, so it is practically not feasible to capture all of them if 
the flow has multiple failures every second.
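Polling can at least avoid re-reading bulletins it has already seen. The NiFi REST API's GET /flow/bulletin-board endpoint accepts an "after" query parameter (a bulletin ID), so a poller can keep a high-water mark. The sketch below is illustrative only: the base URL and helper names are hypothetical, and the response shape (bulletinBoard.bulletins[], each entry with "id" and "bulletin") is assumed from the REST API reference and should be verified for your version. Because the 5-bulletin limit applies per component, a burst between polls can still evict some bulletins before they are read, so frequent polling narrows the gap rather than closing it.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

# Hypothetical base URL of an unsecured NiFi instance.
NIFI_API = "http://localhost:8080/nifi-api"


def bulletin_board_url(after_id: Optional[int] = None, base_url: str = NIFI_API) -> str:
    """Build GET /flow/bulletin-board, asking only for bulletins newer than after_id."""
    url = f"{base_url}/flow/bulletin-board"
    if after_id is not None:
        url += "?" + urllib.parse.urlencode({"after": after_id})
    return url


def new_bulletins(board: dict, last_seen: Optional[int] = None) -> Tuple[List[dict], Optional[int]]:
    """Return bulletins newer than last_seen plus the updated high-water mark."""
    entries = board.get("bulletinBoard", {}).get("bulletins", [])
    fresh = [
        entry["bulletin"]
        for entry in entries
        # Skip entries without a readable bulletin body, and anything already seen.
        if entry.get("bulletin") and (last_seen is None or entry["id"] > last_seen)
    ]
    ids = [entry["id"] for entry in entries]
    if last_seen is not None:
        ids.append(last_seen)  # never move the mark backwards
    return fresh, max(ids, default=None)


def poll_once(last_seen: Optional[int] = None) -> Tuple[List[dict], Optional[int]]:
    """One polling round against a live instance."""
    with urllib.request.urlopen(bulletin_board_url(last_seen), timeout=10) as response:
        return new_bulletins(json.load(response), last_seen)
```

A monitoring loop would call poll_once repeatedly, feeding the returned mark back in as last_seen on the next round.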


Any thoughts on this would be much appreciated.

Thanks
Karthik