[jira] [Updated] (NIFI-2735) Add processor to perform simple aggregations

2023-12-06 Thread Matt Burgess (Jira)


 [ https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess updated NIFI-2735:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Resolving, since QueryRecord and/or NIFI-5291 cover this.

> Add processor to perform simple aggregations
> 
>
> Key: NIFI-2735
> URL: https://issues.apache.org/jira/browse/NIFI-2735
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Priority: Major
>
> This is a proposal for a new processor (AggregateValues, for example) that 
> can perform simple aggregation operations such as count, sum, average, min, 
> max, and concatenate, over a set of "related" flow files. For example, when a 
> JSON file is split on an array (using the SplitJson processor), the total 
> count of the splits, the index of each split, and the unique identifier 
> (shared by each split) are stored as attributes in each flow file sent to the 
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for 
> SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from 
> the original document, and when all documents from a split have been 
> processed, a flow file could be transferred to an "aggregate" relationship 
> containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation 
> operations) is that you can use the "aggregate" relationship as an event 
> trigger. For example, if you need to wait until all files from a group are 
> processed, you can use AggregateValues and the "aggregate" relationship to 
> indicate downstream that the entire group has been processed. If there is not 
> a Split processor upstream, then the attributes (fragment.*) would have to be 
> manipulated by the data flow designer, but this can be accomplished with 
> other processors (including the scripting processors if necessary). 
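
To make the proposal above concrete, here is a minimal sketch of the
bookkeeping such a processor might do. It is plain Python rather than NiFi
processor code, and nearly everything in it is an assumption made for
illustration: the FlowFile stand-in, the on_trigger method, the "value"
attribute holding the number to aggregate, and the "aggregate.*" output
attribute names do not come from the issue. The fragment.identifier,
fragment.index, and fragment.count names are assumed to be the "fragment.*"
attributes the description refers to.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi flow file; only attributes are modeled."""
    attributes: dict


@dataclass
class _Group:
    """Running totals for one set of related flow files."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")
    seen: set = field(default_factory=set)


class AggregateValues:
    """Sketch of the proposed processor (name taken from the issue text)."""

    def __init__(self, value_attribute):
        # Assumption: the value to aggregate is carried in a flow file attribute.
        self.value_attribute = value_attribute
        self._groups = {}

    def on_trigger(self, flow_file):
        """Fold one flow file into its group.

        Returns a flow file for the "aggregate" relationship once every
        fragment of the group has arrived, otherwise None.
        """
        attrs = flow_file.attributes
        group_id = attrs["fragment.identifier"]
        index = int(attrs["fragment.index"])
        expected = int(attrs["fragment.count"])
        value = float(attrs[self.value_attribute])

        group = self._groups.setdefault(group_id, _Group())
        if index in group.seen:
            return None  # same fragment delivered twice; do not count it again
        group.seen.add(index)
        group.count += 1
        group.total += value
        group.minimum = min(group.minimum, value)
        group.maximum = max(group.maximum, value)

        if group.count < expected:
            return None  # still waiting for the rest of the group

        # Whole group processed: drop the state and emit the aggregate.
        del self._groups[group_id]
        return FlowFile({
            "fragment.identifier": group_id,
            "aggregate.count": str(group.count),
            "aggregate.sum": str(group.total),
            "aggregate.min": str(group.minimum),
            "aggregate.max": str(group.maximum),
            "aggregate.average": str(group.total / group.count),
        })
```

In this sketch on_trigger returns None until the last fragment of a group
arrives, so the single non-None result doubles as the "entire group has been
processed" event described in the issue.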



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-2735) Add processor to perform simple aggregations

2016-09-15 Thread Matt Burgess (JIRA)

 [ https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess updated NIFI-2735:
---
Description: 
This is a proposal for a new processor (AggregateValues, for example) that can 
perform simple aggregation operations such as count, sum, average, min, max, 
and concatenate, over a set of "related" flow files. For example, when a JSON 
file is split on an array (using the SplitJson processor), the total count of 
the splits, the index of each split, and the unique identifier (shared by each 
split) are stored as attributes in each flow file sent to the "splits" 
relationship:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html

These attributes are the "fragment.*" attributes in the documentation for 
SplitText, SplitXml, and SplitJson, for example.

Such a processor could perform these operations for each flow file split from 
the original document, and when all documents from a split have been processed, 
a flow file could be transferred to an "aggregate" relationship containing 
attributes for the operation, aggregate value, etc.

An interesting application of this (besides the actual aggregation operations) 
is that you can use the "aggregate" relationship as an event trigger. For 
example, if you need to wait until all files from a group are processed, you can 
use AggregateValues and the "aggregate" relationship to indicate downstream 
that the entire group has been processed. If there is not a Split processor 
upstream, then the attributes (fragment.*) would have to be manipulated by the 
data flow designer, but this can be accomplished with other processors 
(including the scripting processors if necessary). 

  was:
This is a proposal for a new processor (AggregateValues, for example) that can 
perform simple aggregation operations such as count, sum, average, min, max, 
and concatenate, over a set of "related" flow files. For example, when a JSON 
file is split on an array (using the SplitJson processor), the total count of 
the splits, the index of each split, and the unique indentifier (shared by each 
split) are stored as attributes in each flow file sent to the "splits" 
relationship:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html

These attributes are the "fragment.*" attributes in the documentation for 
SplitText, SplitXml, and SplitJson, for example.

Such a processor could perform these operations for each flow file split from 
the original document, and when all documents from a split have been processed, 
a flow file could be transferred to an "aggregate" relationship containing 
attributes for the operation, aggregate value, etc.

An interesting application of this (besides the actual aggregation operations) 
is that you can use the "aggregate" relationship as an event trigger. For 
example if you need to wait until all files from a group are processed, you can 
use AggregateValues and the "aggregate" relationship to indicate downstream 
that the entire group has been processed. If there is not a Split processor 
upstream, then the attributes (fragment.*) would have to be manipulated by the 
data flow designer, but this can be accomplished with other processors 
(including the scripting processors if necessary). 


> Add processor to perform simple aggregations
> 
>
> Key: NIFI-2735
> URL: https://issues.apache.org/jira/browse/NIFI-2735
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
>
> This is a proposal for a new processor (AggregateValues, for example) that 
> can perform simple aggregation operations such as count, sum, average, min, 
> max, and concatenate, over a set of "related" flow files. For example, when a 
> JSON file is split on an array (using the SplitJson processor), the total 
> count of the splits, the index of each split, and the unique identifier 
> (shared by each split) are stored as attributes in each flow file sent to the 
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for 
> SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from 
> the original document, and when all documents from a split have been 
> processed, a flow file could be transferred to an "aggregate" relationship 
> containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation 
> operations) is that you can use the "aggregate" relationship as an event 
> trigger. For example, if