[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-03-15 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
+1 LGTM, ran the unit tests and tried on a live NiFi instance with both 
Ambari and Record modes. The only thing I noticed (that should be its own Jira) 
is that the RecordWriter used by the reporting task does not report any 
referencing components.
Thanks for this great addition! Merging to master


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-03-13 Thread pvillard31
Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
Done @mattyb149, thanks!


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-03-07 Thread pvillard31
Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
@mattyb149 - pushed another commit with unit tests and doc


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-03-03 Thread pvillard31
Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
@mattyb149 - this is not the final version; I still have unit tests and 
the "additional details" doc to add, but I wanted to give you an update in case 
you want to have a look and already have feedback.

I've reworked the new reporting task as I suggested, with two additional 
properties. I also reworked the way metrics are constructed so that the Ambari 
format and the "record" format share as much code as possible. I also chose 
this implementation (using the JsonTreeReader) so that we can easily reuse the 
code for the other S2S reporting tasks. If it sounds OK to you, adding an 
optional Record Writer property to the other S2S reporting tasks should be 
really easy (best handled in a follow-up JIRA, I guess).


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-02-28 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
I like all those ideas!


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-02-28 Thread pvillard31
Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
What about adding a property to let the user decide the output format:
- "Output format" with two options: "Ambari Metrics Collector format" or 
"Record based format"
- "Record writer": this property being used if and only if the "Record 
based format" is selected

My primary objective was to make this reporting task usable in MiNiFi Java 
agents to send metrics to a NiFi cluster that would publish them to AMS. But we 
could definitely use this reporting task to get the metrics and store the data 
in another system such as Elasticsearch.

In any case, I'd add an "Additional details" page to provide information 
about the schemas.
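To make the proposal concrete, here is a minimal Python sketch of what the two output modes could produce (function names, metric names, and the exact record layout are illustrative assumptions, not the actual implementation):

```python
import time

def to_ambari_format(metrics, app_id, hostname):
    """Build a payload following the Ambari Metrics Collector spec:
    one entry per metric, values keyed by epoch-millisecond timestamps."""
    now = int(time.time() * 1000)
    return {
        "metrics": [
            {
                "metricname": name,
                "appid": app_id,
                "hostname": hostname,
                "timestamp": now,
                "starttime": now,
                "metrics": {str(now): value},
            }
            for name, value in metrics.items()
        ]
    }

def to_record_format(metrics, app_id, hostname):
    """Build a single flat record with a fixed schema (common metadata plus
    one field per metric), which a configured Record Writer can serialize."""
    record = {
        "appid": app_id,
        "hostname": hostname,
        "timestamp": int(time.time() * 1000),
    }
    record.update(metrics)
    return record
```

The "Record writer" property would then only come into play for the second shape, where every flow file shares one schema.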


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-02-28 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
Ugh, that's true; I'm not a fan of dynamic keys. Since the schema would be 
generated by the reporting task, we could create a schema for the example 
above, but then each flow file would have its own schema and, even worse, each 
flow file would contain only one metric, so at that point it's not really 
conducive to record processing. As an alternative, we could convert the output 
(before or after JSON conversion) to an altered spec with a consistent schema 
definition.

For the Ambari reporting task, since Ambari expects this format, that's fine. 
If we keep the same spec in this new reporting task for consistency, I'd hope 
to see a template on the Wiki that uses the new reporting task with a 
JoltTransformJSON processor to do the aforementioned transformation, along with 
an AvroSchemaRegistry containing the schema definition for the files coming out 
of the JoltTransformJSON processor. This would let us keep a consistent 
(standard) defined format (albeit non-schema-friendly) while offering a 
well-known solution for preparing the data for record-aware processors. 
Thoughts?
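The transformation described above, whether done in JoltTransformJSON or elsewhere, amounts to rewriting the dynamic timestamp keys into a fixed, schema-friendly shape. A minimal Python sketch of the idea (the function name and output layout are illustrative, not what the Wiki template would actually use):

```python
def flatten_ambari_metrics(payload):
    """Rewrite an Ambari-spec payload so the timestamp-keyed inner object
    becomes a list of {timestamp, value} pairs, giving every metric entry
    one fixed schema that record-aware processors can consume."""
    out = []
    for m in payload["metrics"]:
        out.append({
            "metricname": m["metricname"],
            "appid": m["appid"],
            "hostname": m["hostname"],
            "values": [
                {"timestamp": int(ts), "value": v}
                for ts, v in sorted(m["metrics"].items())
            ],
        })
    return out
```

An AvroSchemaRegistry entry for the flattened form is then straightforward, since `values` is just an array of a two-field record.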


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-02-28 Thread pvillard31
Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
Thanks for your comments @mattyb149 - I just pushed a commit that should 
address everything.

Regarding the record approach you suggested: even though I really like the 
idea, I'm not sure how to define a valid Avro schema for the specification 
used by the Ambari collector API 
(https://cwiki.apache.org/confluence/display/AMBARI/Metrics+Collector+API+Specification):

```json
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "ambari20-5.c.pramod-thangali.internal",
      "timestamp": 1432075898000,
      "starttime": 1432075898000,
      "metrics": {
        "1432075898000": 0.963781711428,
        "1432075899000": 1432075898000
      }
    }
  ]
}
```

How would we manage the 'metrics' part where field names are timestamps?


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-01-25 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
Reusing the Ambari format certainly makes this one easier to implement, and 
as you said the user can convert later with a record processor, but I'm 
thinking it might be best to be able to specify the RecordWriter in the 
reporting task. That way the conversion, filtering, etc. can be done while the 
records are already in object form (i.e. parsed), which saves the step of 
re-parsing and re-writing. Using a JsonRecordSetWriter with "Inherit Record 
Schema" would result in the same behavior as you propose here, or you could 
provide a different format and/or filter out fields, etc. by using a 
configured RecordSetWriter.

@markap14 suggested encoding the "input" schema as a file in 
src/main/resources and reading it in when the class/instance is created, using 
the available util method to convert it to a RecordSchema. To be honest, I 
would like the S2S Provenance reporting task to do this as well, but that's a 
bit more invasive since it already exists. Since the one in this PR is new, it 
would be nice to have it be the exemplar for creating reporting tasks in the 
future. What do you think?
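The resource-based schema idea can be sketched language-agnostically: the schema text ships as a bundled file and is parsed exactly once when the component is created. In the actual Java code this would go through NiFi's schema utilities; the Python below only illustrates the load-once pattern, with a hypothetical inline stand-in for the resource file:

```python
import json
from functools import lru_cache

# Stand-in for a file under src/main/resources; the real schema would
# describe the metrics record emitted by the reporting task.
SCHEMA_RESOURCE = """
{
  "type": "record",
  "name": "metrics",
  "fields": [
    {"name": "appid", "type": "string"},
    {"name": "hostname", "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
"""

@lru_cache(maxsize=1)
def load_input_schema():
    """Parse the bundled schema once; later calls reuse the cached result."""
    return json.loads(SCHEMA_RESOURCE)
```

Keeping the schema in a resource file, rather than building it in code, also makes it easy to surface in the "Additional details" documentation.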


---


[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask

2018-01-24 Thread pvillard31
Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2430
  
I'll need to update this PR if #2431 is merged first.


---