Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-18 Thread Ali Nazemian
Sure. Thanks, Joe.


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-17 Thread Joe Witt
Ali,

There are many variables here that would need to be known before anyone
could say for sure.

But give it a try, measure, and forecast, and you'll know within a day or
two.

thanks
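
The "measure and forecast" step can be sketched as follows. This is a minimal illustration, assuming you record the on-disk size of the provenance repository a few times over a day or two; the sample values, the 2 TB capacity, and the function names are hypothetical and not part of NiFi. The estimate only holds while the repository is still growing, that is, before any retention limit starts ageing data off.

```python
from datetime import datetime


def fill_rate_bytes_per_day(samples):
    """Least-squares slope of repository size over time, in bytes per day.

    samples: list of (datetime, bytes_used) pairs taken at different times.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    # Express each timestamp as fractional days since the first sample.
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in samples]
    ys = [float(size) for _, size in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    denom = sum((x - mean_x) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("samples must be taken at different times")
    numer = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numer / denom


def retention_days(capacity_bytes, rate_bytes_per_day):
    """Days until capacity is reached at a constant fill rate."""
    if rate_bytes_per_day <= 0:
        return float("inf")
    return capacity_bytes / rate_bytes_per_day


# Hypothetical measurements: (time, size of the provenance directory in bytes).
samples = [
    (datetime(2019, 2, 18, 0, 0), 0),
    (datetime(2019, 2, 18, 12, 0), 20 * 10**9),
    (datetime(2019, 2, 19, 0, 0), 40 * 10**9),
]
rate = fill_rate_bytes_per_day(samples)
days = retention_days(2 * 10**12, rate)  # hypothetical 2 TB of provenance disk
print(f"fill rate: {rate / 10**9:.1f} GB/day, about {days:.0f} days of retention")
```

Because the traffic in this thread is bursty, samples taken across at least one full burst-and-idle cycle should give a more representative slope than a few readings taken during a quiet period.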


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-16 Thread Ali Nazemian
Thanks, Joe. Given that we would like to add a few attributes and have
them indexed for provenance, should the mentioned rate still be all
right?

Cheers,
Ali

-- 
A.Nazemian


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-15 Thread Joe Witt
Ali

You certainly can and at the rates you mention you should be able to keep
it for a good while.

Just set the properties you need for your system and measure the rate at
which prov storage fills.

Thanks
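
For reference, the properties in question live in nifi.properties. The fragment below is a sketch only: the retention values are illustrative rather than recommendations, and the attribute names order.id and customer.id are hypothetical placeholders for whatever attributes you want indexed. Data is aged off when either the storage time or the storage size limit is reached, so both need to be raised for long retention.

```properties
# Provenance repository settings in nifi.properties (illustrative values)
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
# Retention limits: events are aged off when either limit is reached
nifi.provenance.repository.max.storage.time=90 days
nifi.provenance.repository.max.storage.size=2 TB
# Standard event fields to index
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
# Extra FlowFile attributes to index (hypothetical names)
nifi.provenance.repository.indexed.attributes=order.id, customer.id
```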


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-15 Thread Ali Nazemian
I didn't mean using NiFi provenance search as an external provenance
search. I meant using it for internal provenance search, but keeping the
provenance data for longer than usual. That is, instead of expecting it
to keep provenance data for a few days, use it as an event store, since it
also provides the search capability.

Regards,
Ali

-- 
A.Nazemian


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-15 Thread Andrew Grande
NiFi provenance searches are not a good integration pattern for external
systems. That is, using them to periodically fetch history burdens the
cluster (those searches can be heavy) and disrupts normal processing SLAs.

Pushing provenance events out to an external system (potentially even
filtered down to components of interest) is a much more predictable pattern
and provides lots of flexibility in how to interpret the events.

Andrew


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-14 Thread Ali Nazemian
Can I expect the NiFi provenance search to do the job for me?


Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-14 Thread Mike Thomsen
Ali,

There is a site-to-site publishing task for provenance that you can add at
the root controller level, and it would be great here. It periodically
takes all of your provenance data and ships it off to another NiFi server
or cluster, which can process the provenance data as blocks of JSON data.
A common pattern there is to filter down to the events you want and publish
them to Elasticsearch.
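
The "filter down to the events you want" step can be sketched outside NiFi as below. This is a minimal illustration, assuming each shipped block is a JSON array of event objects carrying componentId and eventType fields; treat those field names, the sample IDs, and the function name as assumptions to check against the output of your own reporting task. Publishing the kept events to Elasticsearch is left out.

```python
import json


def filter_events(batch_json, component_ids, event_types=None):
    """Keep only events from the given components (and, optionally, types).

    batch_json: a JSON array of provenance event objects, as a string.
    component_ids: set of component IDs of interest.
    event_types: optional set of event type names, e.g. {"SEND", "RECEIVE"}.
    """
    kept = []
    for event in json.loads(batch_json):
        if event.get("componentId") not in component_ids:
            continue
        if event_types is not None and event.get("eventType") not in event_types:
            continue
        kept.append(event)
    return kept


# Hypothetical block of three events from two components.
sample_batch = json.dumps([
    {"eventId": "1", "eventType": "RECEIVE", "componentId": "proc-a"},
    {"eventId": "2", "eventType": "ATTRIBUTES_MODIFIED", "componentId": "proc-a"},
    {"eventId": "3", "eventType": "SEND", "componentId": "proc-b"},
])

kept = filter_events(sample_batch, {"proc-a"}, {"RECEIVE", "SEND"})
print([event["eventId"] for event in kept])
```

Filtering before indexing keeps the external store proportional to the events you actually query, rather than to everything the flow emits.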


Nifi provenance indexing throughput if it is being used as an event store

2019-02-14 Thread Ali Nazemian
Hi All,

I am investigating how NiFi provenance can be used as an event store
for a long period of time. Our use case is very bursty: sometimes we may
not receive any events for a period of time, and sometimes we may get
burst traffic. On average, around 1000 eps is the expected throughput at
this stage. NiFi has powerful provenance that also lets you index on
some attributes. I am investigating how reliable it is to use the NiFi
provenance store for a long period of time with indexing enabled for a
few extra attributes. Has anybody used NiFi provenance at this scale?
Since provenance uses Lucene for indexing, can a large number of Lucene
indices create other issues within NiFi?

P.S.: Our use case is pretty light for NiFi, as we are not going to have
any ETL and NiFi is being used mostly as an orchestrator of multiple
microservices.

Regards,
Ali
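
To put the 1000 eps figure in perspective, here is a back-of-the-envelope sketch. Only the 1000 eps average comes from the message above; the number of provenance events generated per incoming event and the average stored size per provenance event are hypothetical placeholders that would need to be measured on the real flow.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def provenance_events_per_day(inputs_per_second, events_per_input=1):
    """Provenance events written per day at a steady average input rate."""
    return inputs_per_second * events_per_input * SECONDS_PER_DAY


def bytes_per_day(inputs_per_second, events_per_input, avg_event_bytes):
    """Rough daily provenance storage, ignoring index and compression effects."""
    return provenance_events_per_day(inputs_per_second, events_per_input) * avg_event_bytes


# 1000 eps is from the message; one provenance event per input is a lower bound.
per_day = provenance_events_per_day(1000)
print(f"{per_day:,} provenance events per day at 1000 eps")

# Hypothetical: each input passes through 5 components, about 500 bytes per event.
daily = bytes_per_day(1000, events_per_input=5, avg_event_bytes=500)
print(f"{daily / 10**9:.1f} GB per day under these assumptions")
```

Each processor an event passes through typically adds its own provenance events, so the per-input multiplier matters more to the total than the raw input rate does.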