[jira] [Commented] (NIFI-1135) For Provenance Query, bring back Event Summaries instead of the Events themselves

2017-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828285#comment-15828285
 ] 

ASF GitHub Bot commented on NIFI-1135:
--

GitHub user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/1413


> For Provenance Query, bring back Event Summaries instead of the Events themselves
> ---------------------------------------------------------------------------------
>
>              Key: NIFI-1135
>              URL: https://issues.apache.org/jira/browse/NIFI-1135
>          Project: Apache NiFi
>       Issue Type: Improvement
>       Components: Core Framework, Core UI
> Affects Versions: 1.0.0
>         Reporter: Mark Payne
>         Assignee: Matt Gilman
>          Fix For: 1.2.0
>
>
> Currently, when we query Provenance, we pull back up to 1000 events. These
> are full Provenance Events with attributes, etc. If the query takes a long
> time, we will request the objects that have already matched the query many
> times over. This uses a great deal of heap and sends back very large JSON
> objects (10+ MB is not uncommon, and it could potentially be far worse).
>
> We should instead use a ProvenanceEventSummary object. This object should
> contain just the info shown in the results table and a pointer to the
> actual event in the Provenance Store. This allows us to return query results
> much faster, store less data in the heap, and send less data back to the
> end user with virtually the same experience.
>
> The one place where the UX would differ is when the user clicks the "info"
> button to view the entire provenance event: we would have to pull the event
> back from the server rather than already having it in memory.
>
> We should consider storing all of the fields in the results table in Lucene
> to provide faster results. Otherwise, we could still get potentially better
> results with the current approach if we just ensure that the first fields
> that we store are those in the results table. This would allow us to read
> just a small portion of the event from file and deserialize just a small
> amount of data before moving on to the next event.
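
For readers skimming the thread, the summary-plus-pointer idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: `EventSummary`, `FullEvent`, and `EventStore` are hypothetical names invented here, the chosen fields are a guess at what a results table might show, and none of it is taken from the NiFi codebase or from the ProvenanceEventSummary object the issue proposes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// All type and field names below are hypothetical, not NiFi's actual classes.

/** Just the columns a results table might show, plus a pointer to the full event. */
final class EventSummary {
    final long eventId;        // pointer used to look the full event up later
    final String eventType;
    final String componentName;

    EventSummary(long eventId, String eventType, String componentName) {
        this.eventId = eventId;
        this.eventType = eventType;
        this.componentName = componentName;
    }
}

/** A full event, including the potentially large attribute map. */
final class FullEvent {
    final long eventId;
    final String eventType;
    final String componentName;
    final Map<String, String> attributes;

    FullEvent(long eventId, String eventType, String componentName,
              Map<String, String> attributes) {
        this.eventId = eventId;
        this.eventType = eventType;
        this.componentName = componentName;
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    /** Drops the attributes and keeps only the summary fields. */
    EventSummary summarize() {
        return new EventSummary(eventId, eventType, componentName);
    }
}

/** In-memory stand-in for the provenance store. */
final class EventStore {
    private final Map<Long, FullEvent> eventsById = new HashMap<>();

    void add(FullEvent event) {
        eventsById.put(event.eventId, event);
    }

    /** Fetches the full event on demand, e.g. when the user asks for details. */
    Optional<FullEvent> fetch(long eventId) {
        return Optional.ofNullable(eventsById.get(eventId));
    }
}
```

In this sketch a query would return only `EventSummary` objects, and a details view would call `fetch(summary.eventId)` when the user asks for the full event, which is the extra round trip the description mentions.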



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NIFI-1135) For Provenance Query, bring back Event Summaries instead of the Events themselves

2017-01-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828287#comment-15828287
 ] 

ASF GitHub Bot commented on NIFI-1135:
--

GitHub user markap14 commented on the issue:

https://github.com/apache/nifi/pull/1413
  
@mcgilman Looks great! +1 merged to master. Thanks for updating this!




[jira] [Commented] (NIFI-1135) For Provenance Query, bring back Event Summaries instead of the Events themselves

2017-01-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828283#comment-15828283
 ] 

ASF subversion and git services commented on NIFI-1135:
---

Commit e925b18fe617ff4e04f86e44ee3500225c7157e6 in nifi's branch 
refs/heads/master from [~mcgilman]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=e925b18 ]

NIFI-1135:
- Adding additional parameters to be able to limit the size of the provenance 
response. Specifically, whether the events should be summarized and whether 
events should be returned incrementally before the query has completed.
- Ensuring the cluster node address is included in provenance events returned.
- Ensuring there is a cluster coordinator before attempting to get the cluster 
node address.
- Removing exponential back off between provenance requests.
- Ensuring the content viewer URL is retrieved before initializing the 
provenance table.

This closes #1413.
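
As a rough illustration of the two options named in the first bullet (summarized events and incremental results), a server-side decision could look something like the sketch below. The class, method, flag, and field names are all hypothetical and were chosen for this example; the actual parameter names and behavior are in the commit itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the flag and method names are illustrative, not NiFi's API.
final class ProvenanceResponseSketch {
    /** Fields kept when a summarized response is requested (an assumed set). */
    static final List<String> SUMMARY_FIELDS = Arrays.asList("id", "type", "component");

    /** Copies only the summary fields out of a full event. */
    static Map<String, String> summarize(Map<String, String> event) {
        Map<String, String> summary = new LinkedHashMap<>();
        for (String field : SUMMARY_FIELDS) {
            if (event.containsKey(field)) {
                summary.put(field, event.get(field));
            }
        }
        return summary;
    }

    /**
     * Chooses what to send back for one poll of a running query.
     * Without incremental results, nothing is returned until the query finishes.
     */
    static List<Map<String, String>> eventsToReturn(List<Map<String, String>> matchedSoFar,
                                                    boolean finished,
                                                    boolean summarizeEvents,
                                                    boolean incrementalResults) {
        if (!finished && !incrementalResults) {
            return Collections.emptyList();
        }
        List<Map<String, String>> response = new ArrayList<>();
        for (Map<String, String> event : matchedSoFar) {
            response.add(summarizeEvents ? summarize(event) : event);
        }
        return response;
    }
}
```

With both flags set to limit the response (summaries on, incremental results off), a poll of an unfinished query returns an empty list, and the finished query returns only the summary fields of each event.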


> For Provenance Query, bring back Event Summaries instead of the Events themselves
> ---------------------------------------------------------------------------------
>
>              Key: NIFI-1135
>              URL: https://issues.apache.org/jira/browse/NIFI-1135
>          Project: Apache NiFi
>       Issue Type: Improvement
>       Components: Core Framework, Core UI
> Affects Versions: 1.0.0
>         Reporter: Mark Payne
>         Assignee: Matt Gilman


[jira] [Commented] (NIFI-1135) For Provenance Query, bring back Event Summaries instead of the Events themselves

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822064#comment-15822064
 ] 

ASF GitHub Bot commented on NIFI-1135:
--

GitHub user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/1413
  
Got it. Will update.




[jira] [Commented] (NIFI-1135) For Provenance Query, bring back Event Summaries instead of the Events themselves

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822049#comment-15822049
 ] 

ASF GitHub Bot commented on NIFI-1135:
--

GitHub user markap14 commented on the issue:

https://github.com/apache/nifi/pull/1413
  
@mcgilman Very cool to have this making its way into NiFi! I did some 
testing, and things work well for the most part. However, I did hit an NPE when 
running in non-cluster mode:

```
2017-01-13 17:39:42,680 ERROR [NiFi Web Server-33] o.a.nifi.web.api.config.ThrowableMapper An unexpected error has occurred: java.lang.NullPointerException. Returning Internal Server Error response.
java.lang.NullPointerException: null
    at org.apache.nifi.web.api.ProvenanceEventResource.getProvenanceEvent(ProvenanceEventResource.java:293) ~[classes/:na]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_111]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_111]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_111]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_111]
```

Looks like the resource is calling getClusterCoordinator() and not 
expecting a 'null' to be returned. But in standalone mode, there's no Cluster 
Coordinator.
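
A minimal sketch of the kind of guard that avoids this NPE is shown below. It assumes only what the comment states, namely that the coordinator lookup returns null in standalone mode; `Coordinator`, `localNodeAddress()`, and `NodeAddressResolver` are hypothetical names for illustration and not the types used in ProvenanceEventResource.

```java
import java.util.Optional;

/** Hypothetical stand-in for a cluster coordinator; expected to be null when standalone. */
interface Coordinator {
    String localNodeAddress();
}

final class NodeAddressResolver {
    /** Returns the local node address when clustered, or empty in standalone mode. */
    static Optional<String> resolve(Coordinator coordinator) {
        if (coordinator == null) {
            // Standalone mode: there is no coordinator, so there is no node address.
            return Optional.empty();
        }
        return Optional.ofNullable(coordinator.localNodeAddress());
    }
}
```

A caller would then set the cluster node address on the returned event only when the `Optional` is present, and skip it otherwise.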




[jira] [Commented] (NIFI-1135) For Provenance Query, bring back Event Summaries instead of the Events themselves

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821858#comment-15821858
 ] 

ASF GitHub Bot commented on NIFI-1135:
--

GitHub user mcgilman opened a pull request:

https://github.com/apache/nifi/pull/1413

NIFI-1135: Returning event summaries instead of full events

NIFI-1135:
- Adding additional parameters to be able to limit the size of the 
provenance response. Specifically, whether the events should be summarized and 
whether events should be returned incrementally before the query has completed.
- Ensuring the cluster node address is included in provenance events 
returned.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mcgilman/nifi NIFI-1135

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/1413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1413


commit a89638d2f3928bc7c547996d282c85e89698f8bf
Author: Matt Gilman 
Date:   2017-01-13T14:25:10Z

NIFI-1135:
- Adding additional parameters to be able to limit the size of the 
provenance response. Specifically, whether the events should be summarized and 
whether events should be returned incrementally before the query has completed.
- Ensuring the cluster node address is included in provenance events 
returned.




> For Provenance Query, bring back Event Summaries instead of the Events themselves
> ---------------------------------------------------------------------------------
>
>              Key: NIFI-1135
>              URL: https://issues.apache.org/jira/browse/NIFI-1135
>          Project: Apache NiFi
>       Issue Type: Improvement
>       Components: Core Framework, Core UI
> Affects Versions: 1.0.0
>         Reporter: Mark Payne
>         Assignee: Mark Payne