[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2018-01-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308959#comment-16308959
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2351


> SiteToSiteProvenanceReportingTask not returning correct metadata
>
> Key: NIFI-4707
> URL: https://issues.apache.org/jira/browse/NIFI-4707
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Fix For: 1.5.0
>
> When the SiteToSiteProvenanceReportingTask emits flow files, some of them 
> include a "componentName" field and some do not. Investigation shows that 
> only the components (except connections) in the root process group have that 
> field populated. Having this information can be very helpful to the user: 
> even though the names might be duplicated, there would be a mapping between a 
> component's ID and its name. At the very least the behavior (i.e. component 
> name being available) should be consistent.
> Having a full map (by traversing the entire flow) also opens up the ability 
> to include Process Group information for the various components. The 
> reporting task could include the parent Process Group identifier and/or name, 
> with perhaps a special ID for the root PG's "parent", such as "@ROOT@" or 
> something unique.
> This could also allow for a PG ID in the list of filtered "component IDs", 
> where any provenance event for a processor in a particular PG could be 
> included in a filter when that PG's ID is in the filter list.
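
The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `GroupNode` and `ComponentMapSketch` are hypothetical stand-ins rather than NiFi classes, and the `@ROOT@` marker is only the placeholder suggested in the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a process group in the flow; not a NiFi class.
final class GroupNode {
    final String id;
    final String name;
    final Map<String, String> components = new HashMap<>(); // component ID -> name
    final List<GroupNode> children = new ArrayList<>();

    GroupNode(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical holder built by walking the whole flow, not only the root group.
final class ComponentMapSketch {
    // Assumed marker for the root group's "parent", as floated in the issue text.
    static final String ROOT_PARENT_ID = "@ROOT@";

    final Map<String, String> componentNames = new HashMap<>();   // ID -> name
    final Map<String, String> componentToGroup = new HashMap<>(); // component ID -> group ID
    final Map<String, String> groupToParent = new HashMap<>();    // group ID -> parent group ID

    static ComponentMapSketch build(GroupNode root) {
        ComponentMapSketch holder = new ComponentMapSketch();
        holder.visit(root, ROOT_PARENT_ID);
        return holder;
    }

    // Depth-first walk: record the group, its components, then recurse into child groups.
    private void visit(GroupNode group, String parentId) {
        componentNames.put(group.id, group.name);
        groupToParent.put(group.id, parentId);
        for (Map.Entry<String, String> e : group.components.entrySet()) {
            componentNames.put(e.getKey(), e.getValue());
            componentToGroup.put(e.getKey(), group.id);
        }
        for (GroupNode child : group.children) {
            visit(child, group.id);
        }
    }

    public static void main(String[] args) {
        GroupNode root = new GroupNode("root-id", "NiFi Flow");
        root.components.put("proc-0", "LogAttribute");
        GroupNode child = new GroupNode("pg-1", "Ingest");
        child.components.put("proc-1", "GetFile");
        root.children.add(child);

        ComponentMapSketch map = build(root);
        System.out.println(map.componentNames.get("proc-1"));   // GetFile
        System.out.println(map.componentToGroup.get("proc-1")); // pg-1
        System.out.println(map.groupToParent.get("root-id"));   // @ROOT@
    }
}
```

With every component recorded regardless of nesting depth, a name lookup behaves the same for nested components as for those in the root group, which is the consistency the description asks for.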



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2018-01-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308599#comment-16308599
 ] 

ASF subversion and git services commented on NIFI-4707:
---

Commit 1f793923a4f7663cdab6c259b49e6b4167553109 in nifi's branch 
refs/heads/master from [~ca9mbu]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=1f79392 ]

NIFI-4707: Build full component map for ID -> Name association in provenance 
reporting"

NIFI-4707: Add process group ID/name to S2SProvReportingTask records

NIFI-4707: Added support for filtering provenance on process group ID

NIFI-4707: Fixed support for provenance in Atlas reporting task

NIFI-4707: Refactored common code into reporting-utils, fixed filtering






[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2018-01-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308604#comment-16308604
 ] 

ASF subversion and git services commented on NIFI-4707:
---

Commit 84cecfbeea82e81faef77c5b9de76f54bd965316 in nifi's branch 
refs/heads/master from [~ijokarumawak]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=84cecfb ]

NIFI-4707: Fixed ProcessGroup tree

- Removed duplicated creation of a ParentProcessGroupSearchNode for the
root ProcessGroup.
- Removed duplicated creation of a ParentProcessGroupSearchNode for each
component inside a ProcessGroup.
- Fixed ProcessGroup id hierarchy.
- Fixed filtering logic.
- Added unit tests for filtering by ProcessGroupId and Remote
Input/Output ports.

Signed-off-by: Matthew Burgess 

This closes #2351




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2018-01-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308601#comment-16308601
 ] 

ASF subversion and git services commented on NIFI-4707:
---

Commit d65e6b25630fa918ede2cd6922dc777e816679c3 in nifi's branch 
refs/heads/master from [~ijokarumawak]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=d65e6b2 ]

NIFI-4707: Improved S2SProvenanceReportingTask

- Simplified consumeEvents method signature
- Refactored ComponentMapHolder methods visibility
- Renamed componentMap to componentNameMap
- Map more metadata from ConnectionStatus for Remote Input/Output Ports
- Support Process Group hierarchy filtering
- Throw an exception when the reporting task fails to send provenance
data to keep current provenance event index so that events can be
consumed again
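
The last item in the list above (failing loudly so the event index is not advanced) can be illustrated with a small sketch. It is an assumption-laden illustration, not the reporting task's actual code: `ResumableConsumerSketch`, its `consume` method, and the use of plain strings for events are all hypothetical.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical consumer that only advances its position after a successful send.
final class ResumableConsumerSketch {
    private int nextIndex = 0; // index of the first event not yet delivered

    int nextIndex() {
        return nextIndex;
    }

    // Sends at most batchSize events starting at nextIndex.
    // If the sender throws, the exception propagates and nextIndex is left
    // unchanged, so the same events are offered again on the next call.
    void consume(List<String> allEvents, int batchSize, Consumer<List<String>> sender) {
        int from = Math.min(nextIndex, allEvents.size());
        int to = Math.min(from + batchSize, allEvents.size());
        if (from == to) {
            return; // nothing new to send
        }
        sender.accept(allEvents.subList(from, to));
        nextIndex = to; // reached only when the send did not throw
    }
}
```

The design point is the ordering: the index update comes after the send, so swallowing the failure instead of throwing would silently skip events.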





[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2018-01-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308603#comment-16308603
 ] 

ASF subversion and git services commented on NIFI-4707:
---

Commit 97dc20e2d95057b890b3fecbe6f6e8877923e6b3 in nifi's branch 
refs/heads/master from [~ca9mbu]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=97dc20e ]

NIFI-4707: Changed process group parent stack to tree





[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2018-01-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308544#comment-16308544
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2351
  
@ijokarumawak Looks great, thanks! Guess I was a little careless with 
implementation and testing on that go-round, thanks for getting it across the 
finish line. I ran the tests and tried a number of scenarios on a live NiFi to 
verify that things are working smoothly. Will let Travis finish, then merge to 
master.




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303040#comment-16303040
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r158617030
  
--- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-reporting-utils/src/main/java/org/apache/nifi/reporting/util/provenance/ProvenanceEventConsumer.java ---
@@ -218,18 +230,35 @@ private boolean isFilteringEnabled() {
         return componentTypeRegex != null || !eventTypes.isEmpty() || !componentIds.isEmpty();
     }
 
-    private List<ProvenanceEventRecord> filterEvents(List<ProvenanceEventRecord> provenanceEvents) {
-        if(isFilteringEnabled()) {
-            List<ProvenanceEventRecord> filteredEvents = new ArrayList<ProvenanceEventRecord>();
+    private List<ProvenanceEventRecord> filterEvents(ComponentMapHolder componentMapHolder, List<ProvenanceEventRecord> provenanceEvents) {
+        if (isFilteringEnabled()) {
+            List<ProvenanceEventRecord> filteredEvents = new ArrayList<>();
 
             for (ProvenanceEventRecord provenanceEventRecord : provenanceEvents) {
-                if(!componentIds.isEmpty() && !componentIds.contains(provenanceEventRecord.getComponentId())) {
-                    continue;
+                final String componentId = provenanceEventRecord.getComponentId();
+                if (!componentIds.isEmpty() && !componentIds.contains(componentId)) {
+                    // If we aren't filtering it out based on component ID, let's see if this component has a parent process group IDs
+                    // that is being filtered on
+                    if (componentMapHolder == null) {
+                        continue;
+                    }
+                    final String processGroupId = componentMapHolder.getProcessGroupId(componentId, provenanceEventRecord.getComponentType());
+                    if (StringUtils.isEmpty(processGroupId)) {
+                        continue;
+                    }
+                    // Check if any parent process group has the specified component ID
--- End diff --

This comment does not sound right to me if 'the specified component ID' 
means what the user specified in the ReportingTask filter property.




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303034#comment-16303034
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r158616813
  
--- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-reporting-utils/src/main/java/org/apache/nifi/reporting/util/provenance/ProvenanceEventConsumer.java ---
@@ -218,18 +230,35 @@ private boolean isFilteringEnabled() {
         return componentTypeRegex != null || !eventTypes.isEmpty() || !componentIds.isEmpty();
     }
 
-    private List<ProvenanceEventRecord> filterEvents(List<ProvenanceEventRecord> provenanceEvents) {
-        if(isFilteringEnabled()) {
-            List<ProvenanceEventRecord> filteredEvents = new ArrayList<ProvenanceEventRecord>();
+    private List<ProvenanceEventRecord> filterEvents(ComponentMapHolder componentMapHolder, List<ProvenanceEventRecord> provenanceEvents) {
+        if (isFilteringEnabled()) {
+            List<ProvenanceEventRecord> filteredEvents = new ArrayList<>();
 
             for (ProvenanceEventRecord provenanceEventRecord : provenanceEvents) {
-                if(!componentIds.isEmpty() && !componentIds.contains(provenanceEventRecord.getComponentId())) {
-                    continue;
+                final String componentId = provenanceEventRecord.getComponentId();
+                if (!componentIds.isEmpty() && !componentIds.contains(componentId)) {
+                    // If we aren't filtering it out based on component ID, let's see if this component has a parent process group IDs
+                    // that is being filtered on
+                    if (componentMapHolder == null) {
+                        continue;
+                    }
+                    final String processGroupId = componentMapHolder.getProcessGroupId(componentId, provenanceEventRecord.getComponentType());
+                    if (StringUtils.isEmpty(processGroupId)) {
+                        continue;
+                    }
+                    // Check if any parent process group has the specified component ID
+                    ParentProcessGroupSearchNode matchedComponent = componentMapHolder.getProcessGroupParent(componentId);
--- End diff --

`componentMapHolder.getProcessGroupParent(componentId)` will not work with 
RemoteInputPorts and RemoteOutputPorts.




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303032#comment-16303032
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r158616024
  
--- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-reporting-utils/src/main/java/org/apache/nifi/reporting/util/provenance/ProvenanceEventConsumer.java ---
@@ -218,18 +230,35 @@ private boolean isFilteringEnabled() {
         return componentTypeRegex != null || !eventTypes.isEmpty() || !componentIds.isEmpty();
     }
 
-    private List<ProvenanceEventRecord> filterEvents(List<ProvenanceEventRecord> provenanceEvents) {
-        if(isFilteringEnabled()) {
-            List<ProvenanceEventRecord> filteredEvents = new ArrayList<ProvenanceEventRecord>();
+    private List<ProvenanceEventRecord> filterEvents(ComponentMapHolder componentMapHolder, List<ProvenanceEventRecord> provenanceEvents) {
+        if (isFilteringEnabled()) {
+            List<ProvenanceEventRecord> filteredEvents = new ArrayList<>();
 
             for (ProvenanceEventRecord provenanceEventRecord : provenanceEvents) {
-                if(!componentIds.isEmpty() && !componentIds.contains(provenanceEventRecord.getComponentId())) {
-                    continue;
+                final String componentId = provenanceEventRecord.getComponentId();
+                if (!componentIds.isEmpty() && !componentIds.contains(componentId)) {
+                    // If we aren't filtering it out based on component ID, let's see if this component has a parent process group IDs
+                    // that is being filtered on
+                    if (componentMapHolder == null) {
+                        continue;
+                    }
+                    final String processGroupId = componentMapHolder.getProcessGroupId(componentId, provenanceEventRecord.getComponentType());
+                    if (StringUtils.isEmpty(processGroupId)) {
+                        continue;
+                    }
+                    // Check if any parent process group has the specified component ID
+                    ParentProcessGroupSearchNode matchedComponent = componentMapHolder.getProcessGroupParent(componentId);
+                    while (matchedComponent != null && !matchedComponent.getId().equals(processGroupId) && !componentIds.contains(matchedComponent.getId())) {
--- End diff --

The condition `!matchedComponent.getId().equals(processGroupId)` should be 
removed.
It does not work if a ProcessGroup ID is used for filtering. For example, 
suppose there are Root, PG1, and PG2, Component C1 is in PG1, and the reporting 
task is configured to filter on PG2. In that case, `processGroupId` would be 
PG1, which is not specified in `componentIds`. Since `componentIds` only 
contains PG2, C1 in PG1 should be filtered out, but the condition lets C1 
pass.
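
The Root/PG1/PG2 scenario above can be reproduced with a small standalone sketch. It is not the ProvenanceEventConsumer code: `GroupFilterSketch`, `passes`, and the two plain maps are hypothetical stand-ins for the component map, and the loop shows only the intended behavior, where the ancestor walk stops solely on a filter match.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical filter check; the maps stand in for the component map holder.
final class GroupFilterSketch {

    // Returns true when the component itself, or any enclosing process group,
    // appears in filterIds. An empty filter lets everything through.
    static boolean passes(String componentId,
                          Set<String> filterIds,
                          Map<String, String> componentToGroup,
                          Map<String, String> groupToParent) {
        if (filterIds.isEmpty() || filterIds.contains(componentId)) {
            return true;
        }
        // Walk up from the component's own group to the root. The walk does not
        // stop just because it reached the component's own group; only a filter
        // match ends it early.
        String groupId = componentToGroup.get(componentId);
        while (groupId != null) {
            if (filterIds.contains(groupId)) {
                return true;
            }
            groupId = groupToParent.get(groupId); // null once past the root
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> componentToGroup = new HashMap<>();
        componentToGroup.put("C1", "PG1");
        Map<String, String> groupToParent = new HashMap<>();
        groupToParent.put("PG1", "Root");
        groupToParent.put("PG2", "Root");

        // Filtering on PG2: C1 lives in PG1, so it is filtered out.
        System.out.println(passes("C1", Collections.singleton("PG2"), componentToGroup, groupToParent)); // false
        // Filtering on PG1 or on Root: C1 is inside both, so it passes.
        System.out.println(passes("C1", Collections.singleton("PG1"), componentToGroup, groupToParent)); // true
        System.out.println(passes("C1", Collections.singleton("Root"), componentToGroup, groupToParent)); // true
    }
}
```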




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303029#comment-16303029
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r158615663
  
--- Diff: 
nifi-nar-bundles/nifi-extension-utils/nifi-reporting-utils/src/main/java/org/apache/nifi/reporting/util/provenance/ComponentMapHolder.java
 ---
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.reporting.util.provenance;
+
+import org.apache.nifi.controller.status.ConnectionStatus;
+import org.apache.nifi.controller.status.PortStatus;
+import org.apache.nifi.controller.status.ProcessGroupStatus;
+import org.apache.nifi.controller.status.ProcessorStatus;
+import org.apache.nifi.controller.status.RemoteProcessGroupStatus;
+
+import java.util.HashMap;
+import java.util.Map;
+
+public class ComponentMapHolder {
+    private static final String REMOTE_INPUT_PORT = "Remote Input Port";
+    private static final String REMOTE_OUTPUT_PORT = "Remote Output Port";
+    private final Map<String, String> componentNameMap = new HashMap<>();
+    private final Map<String, ParentProcessGroupSearchNode> componentToParentGroupMap = new HashMap<>();
+    private final Map<String, String> sourceToConnectionParentGroupMap = new HashMap<>();
+    private final Map<String, String> destinationToConnectionParentGroupMap = new HashMap<>();
+
+    private ComponentMapHolder putAll(ComponentMapHolder holder) {
+        this.componentNameMap.putAll(holder.componentNameMap);
+        this.componentToParentGroupMap.putAll(holder.componentToParentGroupMap);
+        this.sourceToConnectionParentGroupMap.putAll(holder.sourceToConnectionParentGroupMap);
+        this.destinationToConnectionParentGroupMap.putAll(holder.destinationToConnectionParentGroupMap);
+        return this;
+    }
+
+    public String getComponentName(final String componentId) {
+        return componentNameMap.get(componentId);
+    }
+
+    public String getProcessGroupId(final String componentId, final String componentType) {
+        // Where a Remote Input/Output Port resides is only available at ConnectionStatus.
+        if (REMOTE_INPUT_PORT.equals(componentType)) {
+            return destinationToConnectionParentGroupMap.get(componentId);
+        } else if (REMOTE_OUTPUT_PORT.equals(componentType)) {
+            return sourceToConnectionParentGroupMap.get(componentId);
+        }
+        ParentProcessGroupSearchNode parentNode = componentToParentGroupMap.get(componentId);
+        return parentNode == null ? null : parentNode.getId();
+    }
+
+    public ParentProcessGroupSearchNode getProcessGroupParent(final String componentId) {
+        return componentToParentGroupMap.get(componentId);
+    }
+
+    public static ComponentMapHolder createComponentMap(final ProcessGroupStatus status, final ParentProcessGroupSearchNode thisNode) {
+        final ComponentMapHolder holder = new ComponentMapHolder();
+        final Map<String, String> componentNameMap = holder.componentNameMap;
+        final Map<String, ParentProcessGroupSearchNode> componentToParentGroupMap = holder.componentToParentGroupMap;
+        final Map<String, String> sourceToConnectionParentGroupMap = holder.sourceToConnectionParentGroupMap;
+        final Map<String, String> destinationToConnectionParentGroupMap = holder.destinationToConnectionParentGroupMap;
+
+        if (status != null) {
+            ParentProcessGroupSearchNode parentNode = thisNode;
+            componentNameMap.put(status.getId(), status.getName());
+            // Put a root entry in if one does not yet exist
+            if (parentNode == null) {
+                parentNode = new ParentProcessGroupSearchNode(status.getId(), null);
 

[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299476#comment-16299476
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2351
  
@mattyb149 Good catch. Yeah, a simple ProcessGroupID tree structure would 
help optimize the lookup when filtering. Please add that and squash the 
commits, then I'll do a final review and merge it. Thanks!




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298653#comment-16298653
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/2351
  
Thanks for the commit, great stuff!  Do you think the processing of the 
stack for each record will be ok in terms of performance impact?  I wonder if 
we'd be better off building a tree (basically a flow graph model) when building 
a component map, along with another "index" map from component id -> node in 
the tree. It might replace the need for other "inheritance" maps or property 
maps (as the node could hold the properties). Then we can get the component and 
traverse to the root during the filter?
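
A rough sketch of that idea, assuming a tree of nodes plus an index map from
component id to node. FlowTreeSketch, Node, add and pathToRoot are
hypothetical names used only for illustration; they are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a flow tree plus an id -> node index; names are illustrative.
class FlowTreeSketch {

    static final class Node {
        final String id;
        final String name;   // the node can carry properties such as the name
        final Node parent;   // null for the root process group

        Node(String id, String name, Node parent) {
            this.id = id;
            this.name = name;
            this.parent = parent;
        }
    }

    // The "index" map: component id -> node in the tree.
    private final Map<String, Node> index = new HashMap<>();

    // Registers a node; parents must be added before their children.
    Node add(String id, String name, String parentId) {
        Node parent = null;
        if (parentId != null) {
            parent = index.get(parentId);
            if (parent == null) {
                throw new IllegalArgumentException("Unknown parent: " + parentId);
            }
        }
        Node node = new Node(id, name, parent);
        index.put(id, node);
        return node;
    }

    // Returns the ids from the component up to the root, or an empty list if the id is unknown.
    List<String> pathToRoot(String componentId) {
        List<String> path = new ArrayList<>();
        for (Node n = index.get(componentId); n != null; n = n.parent) {
            path.add(n.id);
        }
        return path;
    }

    public static void main(String[] args) {
        FlowTreeSketch tree = new FlowTreeSketch();
        tree.add("Root", "NiFi Flow", null);
        tree.add("PG1", "Ingest", "Root");
        tree.add("C1", "GenerateFlowFile", "PG1");
        System.out.println(tree.pathToRoot("C1")); // prints [C1, PG1, Root]
    }
}
```

Each lookup is one map access followed by a walk whose length is the nesting
depth of the component, so the per-record cost depends on how deep the flow
is nested rather than on the total number of components.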




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298219#comment-16298219
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2351
  
@mattyb149 I forgot to mention that I'm +1 with the rest of the code, so if 
my commit seems reasonable, please squash yours and merge it. Thanks!




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298037#comment-16298037
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/2351
  
Hi @mattyb149 Thanks for updating this PR. It mostly looks good; however, 
while I was testing, I found a few points that can be improved. I went ahead 
and added the following improvements on top of your commits. Would you 
cherry-pick this commit? 
https://github.com/ijokarumawak/nifi/commit/8effe3b19681ac34594a2f33e9d049ef081730a6

1. The "Remote Input/Output Port" port name and process group id can only be 
retrieved by mapping the ConnectionStatus source or destination component id.
2. When a ProcessGroupId is used to filter events, the filtering should 
consider the PG hierarchy: if PG1 is a child of Root, PG2 is a child of PG1, 
and the PG1 uuid is used as a filter component id, then provenance events 
happening in PG2 should also be reported.

Other minor improvements:
- Simplified the consumeEvents method signature
- Refactored the visibility of ComponentMapHolder methods
- Renamed componentMap to componentNameMap
- Throw an exception when the reporting task fails to send provenance 
data, to keep the current provenance event index so that events can be 
consumed again

Thank you!
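
The last point in the list (keeping the event index when sending fails) can
be sketched as follows. This is a simplified illustration, not the actual
ProvenanceEventConsumer: RetryableConsumerSketch, consume and
getNextEventIndex are hypothetical names, and a list of strings stands in
for the provenance repository.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the NiFi ProvenanceEventConsumer.
class RetryableConsumerSketch {

    // Index of the first event that has not yet been delivered.
    private int nextEventIndex = 0;

    int getNextEventIndex() {
        return nextEventIndex;
    }

    // Hands the next batch to the sender and advances the index
    // only if the sender returns normally.
    void consume(List<String> repository, int batchSize, Consumer<List<String>> sender) {
        int from = Math.min(nextEventIndex, repository.size());
        int to = Math.min(from + batchSize, repository.size());
        if (from == to) {
            return; // nothing new to send
        }
        sender.accept(repository.subList(from, to)); // an exception here skips the update below
        nextEventIndex = to;
    }

    public static void main(String[] args) {
        List<String> repository = Arrays.asList("e0", "e1", "e2");
        RetryableConsumerSketch consumer = new RetryableConsumerSketch();
        try {
            consumer.consume(repository, 2, batch -> {
                throw new IllegalStateException("simulated send failure");
            });
        } catch (IllegalStateException e) {
            // The index was not advanced, so the same batch is offered again.
        }
        consumer.consume(repository, 2, batch -> System.out.println("sent " + batch));
        System.out.println(consumer.getNextEventIndex());
    }
}
```

Because the index is only written after the sender returns, a failed send
leaves it untouched and the next run starts from the same event.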






[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296797#comment-16296797
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r157756093
  
--- Diff: 
nifi-nar-bundles/nifi-site-to-site-reporting-bundle/nifi-site-to-site-reporting-task/src/main/java/org/apache/nifi/reporting/SiteToSiteProvenanceReportingTask.java
 ---
@@ -174,34 +177,47 @@ public void onUnscheduled() {
         return properties;
     }
 
-    private Map<String, String> createComponentMap(final ProcessGroupStatus status) {
-        final Map<String, String> componentMap = new HashMap<>();
+    private ComponentMapHolder createComponentMap(final ProcessGroupStatus status) {
--- End diff --

I agree, will move it there.




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296798#comment-16296798
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r157756243
  
--- Diff: 
nifi-nar-bundles/nifi-atlas-bundle/nifi-atlas-reporting-task/src/main/java/org/apache/nifi/atlas/reporting/ReportLineageToAtlas.java
 ---
@@ -640,7 +640,7 @@ private void consumeNiFiProvenanceEvents(ReportingContext context, NiFiFlow nifi
         final AnalysisContext analysisContext = new StandardAnalysisContext(nifiFlow, clusterResolvers,
                 // FIXME: This class cast shouldn't be necessary to query lineage. Possible refactor target in next major update.
                 (ProvenanceRepository)eventAccess.getProvenanceRepository());
-        consumer.consumeEvents(eventAccess, context.getStateManager(), events -> {
+        consumer.consumeEvents(null, eventAccess, context.getStateManager(), events -> {
--- End diff --

Agreed, I will update




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296235#comment-16296235
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r157667259
  
--- Diff: 
nifi-nar-bundles/nifi-atlas-bundle/nifi-atlas-reporting-task/src/main/java/org/apache/nifi/atlas/reporting/ReportLineageToAtlas.java
 ---
@@ -640,7 +640,7 @@ private void consumeNiFiProvenanceEvents(ReportingContext context, NiFiFlow nifi
         final AnalysisContext analysisContext = new StandardAnalysisContext(nifiFlow, clusterResolvers,
                 // FIXME: This class cast shouldn't be necessary to query lineage. Possible refactor target in next major update.
                 (ProvenanceRepository)eventAccess.getProvenanceRepository());
-        consumer.consumeEvents(eventAccess, context.getStateManager(), events -> {
+        consumer.consumeEvents(null, eventAccess, context.getStateManager(), events -> {
--- End diff --

It would be more useful if we added another consumeEvents method signature 
(or replaced the existing one) that takes a `BiConsumer` as its last 
argument.
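
A possible shape for such an overload, as a sketch only. ConsumerSketch and
Holder are hypothetical stand-ins (Holder for ComponentMapHolder, strings for
provenance events), and the real method takes further arguments that are
omitted here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical sketch; the types and method bodies are illustrative only.
class ConsumerSketch {

    // Stand-in for ComponentMapHolder.
    static final class Holder {
        final String rootName;

        Holder(String rootName) {
            this.rootName = rootName;
        }
    }

    private final List<String> events = Arrays.asList("e1", "e2");

    // Existing-style signature: the callback only sees the events.
    void consumeEvents(Consumer<List<String>> callback) {
        consumeEvents((holder, evts) -> callback.accept(evts));
    }

    // Added signature: the callback also receives the component map holder,
    // so callers do not have to pass it in (or pass null) themselves.
    void consumeEvents(BiConsumer<Holder, List<String>> callback) {
        Holder holder = new Holder("root");
        callback.accept(holder, events);
    }

    public static void main(String[] args) {
        ConsumerSketch consumer = new ConsumerSketch();
        consumer.consumeEvents(evts -> System.out.println(evts.size()));
        consumer.consumeEvents((holder, evts) ->
                System.out.println(holder.rootName + " " + evts.size()));
    }
}
```

The compiler picks the overload from the number of lambda parameters, so
existing one-argument callers keep working while new callers can opt in to
the holder.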




[jira] [Commented] (NIFI-4707) SiteToSiteProvenanceReportingTask not returning correct metadata

2017-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296225#comment-16296225
 ] 

ASF GitHub Bot commented on NIFI-4707:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2351#discussion_r157666543
  
--- Diff: 
nifi-nar-bundles/nifi-site-to-site-reporting-bundle/nifi-site-to-site-reporting-task/src/main/java/org/apache/nifi/reporting/SiteToSiteProvenanceReportingTask.java
 ---
@@ -174,34 +177,47 @@ public void onUnscheduled() {
         return properties;
     }
 
-    private Map<String, String> createComponentMap(final ProcessGroupStatus status) {
-        final Map<String, String> componentMap = new HashMap<>();
+    private ComponentMapHolder createComponentMap(final ProcessGroupStatus status) {
--- End diff --

This method should probably be in ProvenanceEventConsumer instead of each 
ReportingTask implementation so that other ReportingTasks can benefit from 
it. What do you think?

