[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2430 +1 LGTM, ran the unit tests and tried on a live NiFi instance with both Ambari and Record modes. The only thing I noticed (that should be its own Jira) is that the RecordWriter used by the reporting task does not report any referencing components. Thanks for this great addition! Merging to master ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2430 Done @mattyb149, thanks! ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2430 @mattyb149 - pushed another commit with unit tests and doc ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2430 @mattyb149 - this is not the final version, I still have unit tests and "additional details" doc to add, but wanted to give you an update in case you want to have a look and if you already have feedbacks. I've reworked the new reporting task as I suggested with two additional properties. I also reworked the way metrics are constructed to have the ambari format and the "record" format with the minimal code duplication. I also chose this implementation (using the JsonTreeReader) so that we can easily reuse the code for the other S2S reporting task. If it sounds OK to you, adding an optional Record Writer property to the other S2S reporting tasks should be really easy (I guess best is to take care of it into a follow-up JIRA). ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2430 I like all those ideas! ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2430 What about adding a property to let the user decide the output format: - "Output format" with two options: "Ambari Metrics Collector format" or "Record based format" - "Record writer": this property being used if and only if the "Record based format" is selected My primary objective was to have this reporting task usable in MiNiFi java agents to send the metrics to a NiFi cluster that would publish the metrics to AMS. But we could definitely use this reporting task to get the metrics and store the data into another store such as Elasticsearch. In any case, I'd add an "Additional details" page to provide information about the schemas. ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2430 Ugh that's true, I'm not a fan of dynamic keys. Since the schema would be generated by the reporting task, then we could create a schema for the example above, but then each flow file would have its own schema, and even worse, each flow file would have only one metric, so at that point it's not really conducive to record processing. As an alternative we could convert the output (before or after JSON conversion) to an altered spec with a consistent schema definition. For the Ambari reporting task, since Ambari is expecting this format, then fine; if we keep the same spec in this new reporting task for consistency, then I'd hope to see a template on the Wiki using the new reporting task with a JoltTransformJSON processor to do the aforementioned transformation, along with including an AvroSchemaRegistry that contains the schema definition for the files coming out of the JoltTransformJSON processor. This would allow us to keep a consistent (standard) defined format (albeit non-schema-friendly), but offer a well-known solution to prepare the data for record-aware processors. Thoughts? ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2430 Thanks for your comments @mattyb149 - I just pushed a commit that should address everything. Regarding the record approach you suggested. Even though I really like the idea, I'm not sure to see how to define a valid avro schema for the specification used by the Ambari collector API (https://cwiki.apache.org/confluence/display/AMBARI/Metrics+Collector+API+Specification): json { "metrics": [ { "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric", "appid": "amssmoketestfake", "hostname": "ambari20-5.c.pramod-thangali.internal", "timestamp": 1432075898000, "starttime": 1432075898000, "metrics": { "1432075898000": 0.963781711428, "1432075899000": 1432075898000 } } ] } How would we manage the 'metrics' part where field names are timestamps? ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user mattyb149 commented on the issue: https://github.com/apache/nifi/pull/2430 Reusing the Ambari format certainly makes this one easier to implement, and as you said the user can convert later with a record processor, but I'm thinking it might be best to be able to specify the RecordWriter in the reporting task. That way the conversion, filtering, etc. can be done while the records are already in object format (i.e. parsed), which would save the step of re-parsing and re-writing. Using a JsonRecordSetWriter with "Inherit Record Schema" would result in the same behavior as you propose here, or you could provide a different format and/or filter out the fields, etc. by using a configured RecordSetWriter. @markap14 suggested to encode the "input" schema as a file in src/main/resources and read it in when the class/instance is created, using the available util method to convert it to a RecordSchema. To be honest I would like to have the S2S Provenance reporting task do this as well, but that's a bit more invasive since it already exists. Since the one in this PR is new, would be nice to have it be the exemplar for creating reporting services in the future. What do you think? ---
[GitHub] nifi issue #2430: NIFI-4809 - Implement a SiteToSiteMetricsReportingTask
Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2430 I'll need to update this PR if #2431 is merged first. ---