[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-03-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386725#comment-16386725
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2475
  
@jtstorck this looks good to me. There was a simple merge conflict in the
import statements, but I was able to address that. Otherwise, I think this is
all a great step forward. I do agree that we will likely need more PRs later to
further enrich the existing processors, but this lays the groundwork for it
all, so it makes sense to merge it in as-is. So +1, merged to master. Thanks
for getting this knocked out!


> NIFI component high resource usage annotation
> -
>
> Key: NIFI-4872
> URL: https://issues.apache.org/jira/browse/NIFI-4872
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework, Core UI
> Affects Versions: 1.5.0
> Reporter: Jeff Storck
> Assignee: Jeff Storck
> Priority: Critical
> Fix For: 1.6.0
>
>
> NiFi Processors currently have no means to relay whether or not they may be
> resource intensive. The idea here would be to introduce an Annotation that
> can be added to Processors to indicate that they may cause high memory, disk,
> CPU, or network usage. For instance, any Processor that reads the FlowFile
> contents into memory (like many XML Processors) may cause high memory usage.
> Whether there actually is high memory/disk/CPU/network usage will ultimately
> depend on the FlowFiles being processed. With many of these components in the
> dataflow, the risk of OutOfMemoryErrors and performance degradation
> increases.
> The annotation should support one value from a fixed list of: CPU, Disk,
> Memory, Network. It should also allow the developer to provide a custom
> description of the scenario in which the component would fall under the high
> usage category. The annotation should be able to be specified multiple
> times, once for each resource for which the component has the potential for
> high usage.
> By marking components with this new Annotation, we can update the generated
> Processor documentation to include this fact.
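
The requirements in the description above (one resource from a fixed list, an
optional description, and the ability to repeat the annotation) can be sketched
in plain Java. This is only an illustration under stated assumptions: the names
Resource, ResourceConsideration, ResourceConsiderations, ExampleComponent and
ResourceConsiderationDemo are hypothetical stand-ins declared locally so the
sketch compiles on its own. They are not the types added by the pull request,
whose diffs refer to SystemResourceConsideration and SystemResource.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical fixed list of resources (stand-in, not the NiFi enum).
enum Resource { CPU, DISK, MEMORY, NETWORK }

// Hypothetical repeatable annotation: one resource plus an optional description.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Repeatable(ResourceConsiderations.class)
@interface ResourceConsideration {
    Resource resource();
    String description() default "";
}

// Container annotation that @Repeatable requires.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ResourceConsiderations {
    ResourceConsideration[] value();
}

// Hypothetical component tagged once per resource it may use heavily.
@ResourceConsideration(resource = Resource.MEMORY,
        description = "Buffers the whole document in heap.")
@ResourceConsideration(resource = Resource.CPU)
class ExampleComponent { }

class ResourceConsiderationDemo {
    // Collects every consideration declared on a type, one per line.
    static String describe(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        for (ResourceConsideration c : type.getAnnotationsByType(ResourceConsideration.class)) {
            sb.append(c.resource()).append(": ").append(c.description()).append('\n');
        }
        return sb.toString();
    }
}
```

Calling ResourceConsiderationDemo.describe(ExampleComponent.class) returns one
line per declared consideration. Reading repeated annotations through
getAnnotationsByType is the same reflection call that appears in the
HtmlDocumentationWriter diff quoted later in this thread.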



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-03-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386721#comment-16386721
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2475




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379290#comment-16379290
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2475
  
Thinking about this a little more, I think that the DISK resource makes a
lot of sense to have, but we should document when to use it: it should be used
if the Processor would use the disk in a way that may not be intuitive. For
example, ConvertRecord perhaps does not need it, given that it reads the
records once and writes them once, which is what would be expected for
converting from one format to another.

However, QueryRecord is a great example of where this annotation would make
sense. This is because QueryRecord will read the data up to N times, where N is
the number of SQL statements supplied. DetectMimeType is also an interesting
example, because I would expect it to read through all of the FlowFile content,
but in some cases it is able to read only a few bytes, I believe, to determine
the content's MIME type.

Perhaps we should treat the NETWORK one the same way? Or potentially drop
it? I don't know of any cases off the top of my head that would use the network
in any unexpected way.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379269#comment-16379269
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/2475
  
@jtstorck I should probably have read through all the comments before
adding my own :) Sorry about that. I did notice, though, that you have
resources for "DISK" and "NETWORK" but they are not used anywhere. I would
imagine that any processor that changes the content of the FlowFile would get a
"DISK" one, which is a very large number of them. And perhaps even processors
that read the content? I wonder if that's actually necessary. Since the
Processor shows how much data is being read/written in the 5 minute stats, I
wonder if we could just drop that resource? Similarly, I think that NETWORK
utilization can be inferred in most cases: any processor that interacts with an
external service is likely to have high network utilization. But I'm not sure
it makes sense to label every single one of those. I would recommend that we
either remove those resources or add javadocs explaining exactly when we
recommend using them, if we are not going to use them for every processor that
touches FlowFile content or the network.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379266#comment-16379266
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r171065265
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitXml.java
 ---
@@ -82,6 +84,7 @@
             description = "The number of split FlowFiles generated from the parent FlowFile"),
     @WritesAttribute(attribute = "segment.original.filename ", description = "The filename of the parent FlowFile")
 })
+@SystemResourceConsideration(resource = SystemResource.MEMORY)
--- End diff --

In this particular context, we are buffering the entirety of the FlowFile's 
content (as a Document object, which can take approximately 10 times as much 
heap as the size of the XML - i.e., a 1 MB XML document may take 10 MB of 
heap), in addition to all of the generated FlowFile objects. A two-stage 
approach may well be necessary for lots of splits, but even then if the XML is 
large you could potentially run out of heap space.
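
The rule of thumb in the comment above can be turned into a rough
back-of-the-envelope check. The sketch below is an assumption-laden
illustration: the class XmlHeapEstimate and its methods are hypothetical, and
the factor of 10 is only the approximate figure quoted in the comment, not a
measured constant. It also ignores the heap taken by the generated FlowFile
objects.

```java
class XmlHeapEstimate {
    // Approximate DOM expansion factor quoted in the review comment (an assumption).
    static final long DOM_EXPANSION_FACTOR = 10;

    // Rough heap estimate, in bytes, for holding an XML payload as a Document.
    static long estimatedDomHeapBytes(long xmlBytes) {
        if (xmlBytes < 0) {
            throw new IllegalArgumentException("xmlBytes must not be negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(xmlBytes, DOM_EXPANSION_FACTOR);
    }

    // True when the estimate fits within the heap the caller is willing to spend.
    static boolean likelyFits(long xmlBytes, long availableHeapBytes) {
        return estimatedDomHeapBytes(xmlBytes) <= availableHeapBytes;
    }
}
```

With these assumptions, a 1 MB document (1,048,576 bytes) is estimated at
10,485,760 bytes of heap, and a 100 MB document does not fit in a 512 MB
budget.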




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379263#comment-16379263
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r171065023
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitText.java
 ---
@@ -87,6 +89,7 @@
     @WritesAttribute(attribute = "fragment.count", description = "The number of split FlowFiles generated from the parent FlowFile"),
     @WritesAttribute(attribute = "segment.original.filename ", description = "The filename of the parent FlowFile")})
 @SeeAlso(MergeContent.class)
+@SystemResourceConsideration(resource = SystemResource.MEMORY)
--- End diff --

I would again add a description here that indicates that it's not buffering 
the content in memory but rather just storing the FlowFile w/ its attributes in 
memory and that if generating too many splits, a two-phase approach may be 
necessary.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379261#comment-16379261
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r171064833
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitJson.java
 ---
@@ -80,7 +82,7 @@
             description = "The number of split FlowFiles generated from the parent FlowFile"),
     @WritesAttribute(attribute = "segment.original.filename ", description = "The filename of the parent FlowFile")
 })
-
+@SystemResourceConsideration(resource = SystemResource.MEMORY)
--- End diff --

In this particular context, we are buffering the entirety of the FlowFile's 
content (as a JsonNode object), in addition to all of the generated FlowFile 
objects. A two-stage approach may well be necessary for lots of splits, but 
even then if the JSON is extremely large you could potentially run out of heap 
space.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379259#comment-16379259
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r171064496
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/SplitContent.java
 ---
@@ -75,6 +77,7 @@
     @WritesAttribute(attribute = "fragment.count", description = "The number of split FlowFiles generated from the parent FlowFile"),
     @WritesAttribute(attribute = "segment.original.filename ", description = "The filename of the parent FlowFile")})
 @SeeAlso(MergeContent.class)
+@SystemResourceConsideration(resource = SystemResource.MEMORY)
--- End diff --

I would again add a description here that indicates that it's not buffering 
the content in memory but rather just storing the FlowFile w/ its attributes in 
memory and that if generating too many splits, a two-phase approach may be 
necessary.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379256#comment-16379256
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user markap14 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r171064049
  
--- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/MergeContent.java
 ---
@@ -131,6 +133,7 @@
     @WritesAttribute(attribute = "merge.bin.age", description = "The age of the bin, in milliseconds, when it was merged and output. Effectively "
         + "this is the greatest amount of time that any FlowFile in this bundle remained waiting in this processor before it was output") })
 @SeeAlso({SegmentContent.class, MergeRecord.class})
+@SystemResourceConsideration(resource = SystemResource.MEMORY)
--- End diff --

It would probably be helpful here to add a description that explains that 
the content itself is not stored in memory but rather the FlowFiles' attributes 
and that the configuration for max bin size, etc. will influence how much heap 
is used. Would also call out that if merging together many small FlowFiles, a 
two-stage approach may be necessary in order to avoid running out of memory.
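
To see why a two-stage approach helps, here is a simplified counting sketch. It
assumes, purely for illustration, that each merge step holds one bin of
FlowFiles at a time and that the first stage produces one merged FlowFile per
bin. The class TwoStageMergeSketch and its methods are hypothetical and do not
model MergeContent's real binning behaviour.

```java
class TwoStageMergeSketch {
    // Number of bins needed to hold `items` when each bin holds at most `binSize`.
    static long binsNeeded(long items, long binSize) {
        if (items < 0 || binSize <= 0) {
            throw new IllegalArgumentException("items must be >= 0 and binSize > 0");
        }
        return (items + binSize - 1) / binSize; // ceiling division
    }

    // Largest number of FlowFiles that either stage holds at once:
    // stage one holds at most one full bin, stage two holds the intermediates.
    static long peakHeldTwoStage(long items, long firstStageBinSize) {
        long intermediates = binsNeeded(items, firstStageBinSize);
        long firstStagePeak = Math.min(items, firstStageBinSize);
        return Math.max(firstStagePeak, intermediates);
    }
}
```

Under these assumptions, merging 1,000,000 small FlowFiles in a single step
means tracking 1,000,000 FlowFiles at once, while a first stage with bins of
1,000 followed by a second stage over the 1,000 intermediate results keeps the
peak at 1,000 per step.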




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370566#comment-16370566
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user joewitt commented on the issue:

https://github.com/apache/nifi/pull/2475
  
@jtstorck I'm supportive of it as is. I'm not in a position right now to
test/build. I can try later/tomorrow if someone else hasn't. I noticed the
travis-ci issues, but they appear unrelated.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370407#comment-16370407
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user jtstorck commented on the issue:

https://github.com/apache/nifi/pull/2475
  
I agree, @joewitt. I wanted to get the annotation and its integration in
place, tag the components that need the annotation, and sort through any issues
or changes to the annotation itself before diving too deeply into writing
specific descriptions. The annotation supports a description, but a wall of
text might not be the best way to convey a system resource consideration. It
might be a good time to look into supporting some formatting of the content in
the annotation's description (including Reads/WritesAttribute).




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370393#comment-16370393
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user joewitt commented on the issue:

https://github.com/apache/nifi/pull/2475
  
I think my only concern is that as-is we're labeling a bunch of things as
"CPU" or "MEMORY" but not giving descriptions. As a user I'd see that and
think 'well, how does this use memory?' For instance, does that mean each
flowfile's content is fully loaded in memory? Or does it mean part of one is?
Or all of a batch of them? Or if we say CPU usage for compression, how should I
think about the number of threads? Or in the case of CompressContent it might
be worth adding "MEMORY" and explaining that it is actually really efficient
and can handle large objects without ever loading much in memory. So in that
case the resource consideration is there to alleviate concerns. We're not
qualifying the usage consideration as good or bad in this approach, but merely
saying "Hey, here is a resource usage consideration you should or might have in
mind, and here is how this component works in that regard". Does this make
sense? So, in that sense I'd like to see us add descriptions to all these
things we're tagging. I'm not saying it is a must for the PR, but adding
"MEMORY" without explaining might just be alarming.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370388#comment-16370388
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user jtstorck commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r169409286
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-documentation/src/main/java/org/apache/nifi/documentation/html/HtmlDocumentationWriter.java
 ---
@@ -727,6 +729,41 @@ protected void writeLink(final XMLStreamWriter xmlStreamWriter, final String tex
         xmlStreamWriter.writeEndElement();
     }
 
+    /**
+     * Writes all the system resource considerations for this component
+     *
+     * @param configurableComponent the component to describe
+     * @param xmlStreamWriter the xml stream writer to use
+     * @throws XMLStreamException thrown if there was a problem writing the XML
+     */
+    private void writeSystemResourceConsiderationInfo(ConfigurableComponent configurableComponent, XMLStreamWriter xmlStreamWriter)
+            throws XMLStreamException {
+
+        SystemResourceConsideration[] systemResourceConsiderations = configurableComponent.getClass().getAnnotationsByType(SystemResourceConsideration.class);
+
+        writeSimpleElement(xmlStreamWriter, "h3", "System Resource Considerations:");
+        if (systemResourceConsiderations.length > 0) {
+            xmlStreamWriter.writeStartElement("table");
+            xmlStreamWriter.writeAttribute("id", "system-resource-considerations");
+            xmlStreamWriter.writeStartElement("tr");
+            writeSimpleElement(xmlStreamWriter, "th", "Resource");
+            writeSimpleElement(xmlStreamWriter, "th", "Description");
+            xmlStreamWriter.writeEndElement();
+            for (SystemResourceConsideration systemResourceConsideration : systemResourceConsiderations) {
+                xmlStreamWriter.writeStartElement("tr");
+                writeSimpleElement(xmlStreamWriter, "td", systemResourceConsideration.resource().name());
+                // TODO allow for HTML characters here.
--- End diff --

That TODO is also present on the reads/writes attributes code in 
HtmlProcessorDocumentationWriter.  Since the functionality is similar, I added 
the TODO there as well.  Will have to talk to @mcgilman about the intention 
there.
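
For context on the TODO: `XMLStreamWriter.writeCharacters` escapes markup characters, so a description that contains HTML would come out as literal text rather than as markup. Below is a minimal sketch of that behavior. It assumes the documentation writer's `writeSimpleElement` ends up calling `writeCharacters` (the helper's body is not shown in this diff), and `EscapingSketch` and its method are hypothetical stand-ins, not NiFi code.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical stand-in for the documentation writer's helper; not NiFi code.
class EscapingSketch {

    // Writes <tag>text</tag>, passing the text through writeCharacters.
    static String writeSimpleElement(String tag, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(tag);
            writer.writeCharacters(text); // '<', '>' and '&' are escaped here
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The <b> tag is emitted as escaped text, not as bold markup.
        System.out.println(writeSimpleElement("td", "Loads the <b>entire</b> FlowFile into memory"));
    }
}
```

Allowing HTML in descriptions would therefore presumably need a different write path than plain `writeCharacters`, which is likely what the TODO is pointing at.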




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370385#comment-16370385
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r169408354
  
--- Diff: 
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-documentation/src/main/java/org/apache/nifi/documentation/html/HtmlDocumentationWriter.java
 ---
@@ -727,6 +729,41 @@ protected void writeLink(final XMLStreamWriter xmlStreamWriter, final String tex
         xmlStreamWriter.writeEndElement();
     }
 
+    /**
+     * Writes all the system resource considerations for this component
+     *
+     * @param configurableComponent the component to describe
+     * @param xmlStreamWriter the xml stream writer to use
+     * @throws XMLStreamException thrown if there was a problem writing the XML
+     */
+    private void writeSystemResourceConsiderationInfo(ConfigurableComponent configurableComponent, XMLStreamWriter xmlStreamWriter)
+            throws XMLStreamException {
+
+        SystemResourceConsideration[] systemResourceConsiderations = configurableComponent.getClass().getAnnotationsByType(SystemResourceConsideration.class);
+
+        writeSimpleElement(xmlStreamWriter, "h3", "System Resource Considerations:");
+        if (systemResourceConsiderations.length > 0) {
+            xmlStreamWriter.writeStartElement("table");
+            xmlStreamWriter.writeAttribute("id", "system-resource-considerations");
+            xmlStreamWriter.writeStartElement("tr");
+            writeSimpleElement(xmlStreamWriter, "th", "Resource");
+            writeSimpleElement(xmlStreamWriter, "th", "Description");
+            xmlStreamWriter.writeEndElement();
+            for (SystemResourceConsideration systemResourceConsideration : systemResourceConsiderations) {
+                xmlStreamWriter.writeStartElement("tr");
+                writeSimpleElement(xmlStreamWriter, "td", systemResourceConsideration.resource().name());
+                // TODO allow for HTML characters here.
--- End diff --

probably need/want to sort out this todo?




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370379#comment-16370379
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user jtstorck commented on the issue:

https://github.com/apache/nifi/pull/2475
  
@joewitt PR has been rebased against current master, and I've implemented 
some of the changes you requested.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368244#comment-16368244
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/2475
  
A few suggestions regarding existing processors: ExtractText and ReplaceText 
can also be CPU intensive when using some tricky regular expressions. The same goes 
for the grok processors as well as TransformXML (depending on the XSLT). It's not 
true in most cases, but it can be in some situations. I will try to continue the 
review early next week.
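
To illustrate the regular expression point, here is a small sketch that counts how many characters the regex engine reads while failing to match a nested-quantifier pattern. The pattern `(a+)+b`, the `RegexCostSketch` class and the counting wrapper are all illustrative assumptions, not anything taken from ExtractText or ReplaceText, and how steeply the count grows depends on the JDK's regex implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of NiFi.
class RegexCostSketch {

    // CharSequence wrapper that counts how often the regex engine reads a character.
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads;

        CountingSequence(String text) { this.text = text; }

        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) { return text.subSequence(start, end); }
        @Override public String toString() { return text; }
    }

    // Runs a nested-quantifier pattern against n 'a' characters with no trailing 'b',
    // so the search must fail, and returns the number of character reads it took.
    static long readsForFailedMatch(int n) {
        Pattern tricky = Pattern.compile("(a+)+b");
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < n; i++) {
            input.append('a');
        }
        CountingSequence seq = new CountingSequence(input.toString());
        if (tricky.matcher(seq).find()) {
            throw new IllegalStateException("unexpected match");
        }
        return seq.reads;
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 12, 16}) {
            System.out.println(n + " chars -> " + readsForFailedMatch(n) + " character reads");
        }
    }
}
```

Running `main` prints the read counts for a few input lengths, which gives a rough feel for how a small change in input can change CPU cost for a given expression.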




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368025#comment-16368025
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user jtstorck commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r168896206
  
--- Diff: 
nifi-nar-bundles/nifi-amqp-bundle/nifi-amqp-processors/src/main/java/org/apache/nifi/amqp/processors/PublishAMQP.java
 ---
@@ -62,6 +64,7 @@
 + "and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If "
 + "that happens you will see a log in both app-log and bulletin stating to that effect. Fixing the binding "
 + "(normally done by AMQP administrator) will resolve the issue.")
+@HighResourceUsageScenario(resource = SystemResource.MEMORY)
--- End diff --

The developer can provide a description using the "scenario" argument on 
the annotation.  This first pass was to identify most of the processors that 
should have the annotation.  As we look through the list of components, specific 
descriptions can be added to override the default scenario from the annotation 
itself.
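
As a rough sketch of the default-versus-override behavior described here: the types below are local stand-ins that mirror the names in the PR, nested in a hypothetical `ScenarioSketch` class so the example is self-contained. The default scenario text is an assumption, since the actual default is not shown in this thread.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Self-contained illustration; the real annotation lives in nifi-api.
class ScenarioSketch {

    enum SystemResource { CPU, DISK, MEMORY, NETWORK }

    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    @interface HighResourceUsageScenario {
        SystemResource resource();

        // Hypothetical default text; the PR's actual default is not shown in this thread.
        String scenario() default "May use a large amount of this resource, depending on the data processed.";
    }

    // Relies on the default scenario.
    @HighResourceUsageScenario(resource = SystemResource.MEMORY)
    static class DefaultScenarioProcessor { }

    // Overrides the default with a component-specific explanation.
    @HighResourceUsageScenario(resource = SystemResource.MEMORY,
            scenario = "Each message is read fully into memory, so large messages consume large amounts of heap.")
    static class DescribedProcessor { }

    // Reads the scenario text from the annotation on the given class.
    static String scenarioOf(Class<?> componentClass) {
        return componentClass.getAnnotation(HighResourceUsageScenario.class).scenario();
    }

    public static void main(String[] args) {
        System.out.println(scenarioOf(DefaultScenarioProcessor.class));
        System.out.println(scenarioOf(DescribedProcessor.class));
    }
}
```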




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368020#comment-16368020
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user jtstorck commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r168895945
  
--- Diff: 
nifi-api/src/main/java/org/apache/nifi/annotation/behavior/HighResourceUsageScenario.java
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.annotation.behavior;
+
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Inherited;
+import java.lang.annotation.Repeatable;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Annotation that may be placed on a
+ * {@link org.apache.nifi.components.ConfigurableComponent Component} indicating that this
+ * component may cause high usage of a resource.
+ */
+@Documented
+@Target({ElementType.TYPE})
+@Retention(RetentionPolicy.RUNTIME)
+@Inherited
+@Repeatable(HighResourceUsageScenarios.class)
+public @interface HighResourceUsageScenario {
--- End diff --

I'm not tied to any of the names of classes that I've used so far.  
SystemResourceConsideration sounds good to me, especially since it has a wider 
range of meaning than just specifying higher resource usage.
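
For readers less familiar with `@Repeatable`, the sketch below shows how a repeatable annotation and its container fit together, and how `getAnnotationsByType` returns each use. It uses the proposed `SystemResourceConsideration` name; the types are local stand-ins nested in a hypothetical `RepeatableSketch` class, and the `description` member is an assumption for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Self-contained illustration; the real annotations live in nifi-api.
class RepeatableSketch {

    enum SystemResource { CPU, DISK, MEMORY, NETWORK }

    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    @Inherited
    @Repeatable(SystemResourceConsiderations.class)
    @interface SystemResourceConsideration {
        SystemResource resource();

        // Assumed member name, used here only for illustration.
        String description() default "";
    }

    // Container annotation required by @Repeatable; repeated uses are wrapped in it.
    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    @Inherited
    @interface SystemResourceConsiderations {
        SystemResourceConsideration[] value();
    }

    @SystemResourceConsideration(resource = SystemResource.MEMORY,
            description = "Reads the whole FlowFile content into memory.")
    @SystemResourceConsideration(resource = SystemResource.CPU,
            description = "Cost depends on the configured expression.")
    static class ExampleProcessor { }

    // Collects the resource names declared on a class, looking through the container.
    static List<String> resourcesOf(Class<?> componentClass) {
        List<String> names = new ArrayList<>();
        for (SystemResourceConsideration consideration
                : componentClass.getAnnotationsByType(SystemResourceConsideration.class)) {
            names.add(consideration.resource().name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(resourcesOf(ExampleProcessor.class));
    }
}
```

Using `getAnnotationsByType` rather than `getAnnotation` matters here: with two or more uses on one class, only the container annotation is directly present, and `getAnnotationsByType` unwraps it.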




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367513#comment-16367513
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r168796980
  
--- Diff: 
nifi-nar-bundles/nifi-amqp-bundle/nifi-amqp-processors/src/main/java/org/apache/nifi/amqp/processors/PublishAMQP.java
 ---
@@ -62,6 +64,7 @@
 + "and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If "
 + "that happens you will see a log in both app-log and bulletin stating to that effect. Fixing the binding "
 + "(normally done by AMQP administrator) will resolve the issue.")
+@HighResourceUsageScenario(resource = SystemResource.MEMORY)
--- End diff --

We need to be able to articulate the memory usage.  Is it that every 
message published is fully loaded into memory in a byte[], so large 
messages will consume large amounts of heap?  The same goes for a lot of items below.  
We need to be able to let the developer explain.  In some cases we have 
processors that operate on batches of things, and people will worry that it is the 
batch that is the problem.  But in reality the concern is that if any single 
event/record within a batch is large, that single event will be held in memory, etc.




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367508#comment-16367508
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r168796396
  
--- Diff: 
nifi-api/src/main/java/org/apache/nifi/annotation/behavior/HighResourceUsageScenarios.java
 ---
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.annotation.behavior;
+
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Inherited;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Annotation that may be placed on a
+ * {@link org.apache.nifi.components.ConfigurableComponent Component} indicating that this
+ * component may cause high usage of resources.
+ *
+ */
+@Documented
+@Target({ElementType.TYPE})
+@Retention(RetentionPolicy.RUNTIME)
+@Inherited
+public @interface HighResourceUsageScenarios {
--- End diff --

SystemResourceConsiderations instead?




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367507#comment-16367507
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2475#discussion_r168796071
  
--- Diff: 
nifi-api/src/main/java/org/apache/nifi/annotation/behavior/HighResourceUsageScenario.java
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.annotation.behavior;
+
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Inherited;
+import java.lang.annotation.Repeatable;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+/**
+ * Annotation that may be placed on a
+ * {@link org.apache.nifi.components.ConfigurableComponent Component} indicating that this
+ * component may cause high usage of a resource.
+ */
+@Documented
+@Target({ElementType.TYPE})
+@Retention(RetentionPolicy.RUNTIME)
+@Inherited
+@Repeatable(HighResourceUsageScenarios.class)
+public @interface HighResourceUsageScenario {
--- End diff --

What do you think about calling this 'SystemResourceConsideration' instead 
of HighResourceUsageScenario?  It takes 'SystemResource' types, which makes sense 
to me, and this isn't just about 'high usage'; it is also about giving 
the developer a way to articulate these concerns to a user.  We get questions 
all the time like 'Can you compress large objects?' - and the answer is yes, 
because it is done in a streaming/small buffer fashion regardless of whether 
something is 10KB or 10GB.
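
A minimal sketch of what "streaming/small buffer" means in practice: the copy loop below holds only a fixed-size buffer, so heap use does not grow with the size of the input. This is an illustration only, not NiFi's compression code; `StreamingCompressSketch`, its methods and the 8 KB buffer size are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only; not NiFi's compression implementation.
class StreamingCompressSketch {

    // Copies in to out through a fixed 8 KB buffer and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Compresses the input stream to the output stream without buffering the whole content.
    static long gzip(InputStream in, OutputStream out) {
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            return copy(in, gz);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompresses the input stream to the output stream, again through the small buffer.
    static long gunzip(InputStream in, OutputStream out) {
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            return copy(gz, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory streams keep the demo self-contained; real use would wrap file or content streams.
        byte[] data = new byte[1_000_000];
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        long copied = gzip(new ByteArrayInputStream(data), compressed);
        System.out.println(copied + " bytes in, " + compressed.size() + " bytes out");
    }
}
```

A description along the lines of "content is streamed through a small buffer" is the kind of text that would let the annotation reassure users rather than alarm them.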




[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366275#comment-16366275
 ] 

ASF GitHub Bot commented on NIFI-4872:
--

GitHub user jtstorck opened a pull request:

https://github.com/apache/nifi/pull/2475

NIFI-4872 Added annotation for specifying scenarios in which components can 
cause high usage of system resources.

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [x] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [x] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jtstorck/nifi NIFI-4872

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2475


commit 4906f2d1545f94c1ec264fc0a65615412b81cbe9
Author: Jeff Storck 
Date:   2018-02-15T18:12:49Z

NIFI-4872 Added annotation for specifying scenarios in which components can 
cause high usage of system resources.

commit ec85dadc21c9081297d6dcb1ae0424c33ed6f42b
Author: Jeff Storck 
Date:   2018-02-15T20:03:39Z

NIFI-4872 Initial set of components marked with the 
HighResourceUsageScenario annotation.




> NIFI component high resource usage annotation
> -
>
> Key: NIFI-4872
> URL: https://issues.apache.org/jira/browse/NIFI-4872
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework, Core UI
> Affects Versions: 1.5.0
> Reporter: Jeff Storck
> Assignee: Jeff Storck
> Priority: Critical
>
> NiFi Processors currently have no means to relay whether or not they may be 
> resource intensive. The idea here would be to introduce an Annotation that 
> can be added to Processors to indicate that they may cause high memory, disk, 
> CPU, or network usage. For instance, any Processor that reads the FlowFile 
> contents into memory (like many XML Processors) may cause high memory usage. 
> Whether there is actually high memory/disk/CPU/network usage will ultimately 
> depend on the FlowFiles being processed. Having many of these components in 
> the dataflow increases the risk of OutOfMemoryErrors and performance 
> degradation.
> The annotation should support one value from a fixed list of: CPU, Disk, 
> Memory, Network. It should also allow the developer to provide a custom 
> description of the scenario in which the component would fall under the high 
> usage category. The annotation should be able to be specified multiple 
> times, once for each resource for which the component has the potential for 
> high usage.
> By marking components with this new Annotation, we can update the generated 
> Processor documentation to include this fact.
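
The description above could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: the names `Resource`, `HighUsageScenario`, `HighUsageScenarios`, `ExampleXmlProcessor`, and `describe` are hypothetical. It shows one resource value per annotation, an optional custom description, repeated use on a single component, and a reflection helper of the kind a documentation generator might use.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class ResourceAnnotationSketch {

    // Fixed list of resources named in the issue description.
    enum Resource { CPU, DISK, MEMORY, NETWORK }

    // Hypothetical annotation: exactly one resource plus a free-form description.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @Repeatable(HighUsageScenarios.class)
    @interface HighUsageScenario {
        Resource resource();
        String description() default "";
    }

    // Container annotation that Java requires for any @Repeatable annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface HighUsageScenarios {
        HighUsageScenario[] value();
    }

    // Hypothetical component marked once per resource it may use heavily.
    @HighUsageScenario(resource = Resource.MEMORY,
            description = "Reads the whole FlowFile content into memory.")
    @HighUsageScenario(resource = Resource.CPU,
            description = "Parses large XML documents.")
    static class ExampleXmlProcessor { }

    // Collects the scenarios declared on a class, one per line, in the way a
    // documentation generator could when rendering component docs.
    static String describe(Class<?> componentClass) {
        StringBuilder sb = new StringBuilder();
        for (HighUsageScenario s
                : componentClass.getAnnotationsByType(HighUsageScenario.class)) {
            sb.append(s.resource()).append(": ")
              .append(s.description()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(ExampleXmlProcessor.class));
    }
}
```

Runtime retention is what makes the annotation visible to reflection. Using `getAnnotationsByType` means a component carrying a single annotation and one carrying several are handled the same way.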



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-13 Thread Pierre Villard (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363069#comment-16363069
 ] 

Pierre Villard commented on NIFI-4872:
--

[~jtstorck] - cool! Sounds good to me!






[jira] [Commented] (NIFI-4872) NIFI component high resource usage annotation

2018-02-13 Thread Pierre Villard (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361970#comment-16361970
 ] 

Pierre Villard commented on NIFI-4872:
--

Would it also make sense to add a description field to this annotation? I'm 
thinking about the Merge and Split processors: we often remind users to do the 
processing in two steps when using such a processor on a huge file. 
Alternatively, we could update the capability description.

> NIFI component high resource usage annotation
> -
>
> Key: NIFI-4872
> URL: https://issues.apache.org/jira/browse/NIFI-4872
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Core Framework, Core UI
> Affects Versions: 1.5.0
> Reporter: Jeff Storck
> Assignee: Jeff Storck
> Priority: Critical
>
> NiFi Processors currently have no means to relay whether or not they may be 
> resource intensive. The idea here would be to introduce an Annotation that 
> can be added to Processors to indicate that they may cause high memory, disk, 
> CPU, or network usage. For instance, any Processor that reads the FlowFile 
> contents into memory (like many XML Processors) may cause high memory usage. 
> Whether there is actually high memory/disk/CPU/network usage will ultimately 
> depend on the FlowFiles being processed. Having many of these components in 
> the dataflow increases the risk of OutOfMemoryErrors and performance 
> degradation.
> The annotation should support one or more values from a fixed list of: CPU, 
> Disk, Memory, Network.
> By marking components with this new Annotation, we can update the generated 
> Processor documentation to include this fact.


