Re: Nifi provenance indexing throughput if it is being used as an event store
Sure. Thanks, Joe.

On Sun, 17 Feb. 2019, 22:52 Joe Witt wrote:
> Ali,
>
> There are many variables here that are needed before anyone could know for sure.
>
> But give it a try, measure and forecast, and you'll know within a day or two.
>
> Thanks
Re: Nifi provenance indexing throughput if it is being used as an event store
Ali,

There are many variables here that are needed before anyone could know for sure.

But give it a try, measure and forecast, and you'll know within a day or two.

Thanks

On Sat, Feb 16, 2019, 11:37 PM Ali Nazemian wrote:
> Thanks, Joe. Given that we would like to add a few attributes and set them to be indexed for provenance, should the mentioned rate be alright?
>
> Cheers,
> Ali
Re: Nifi provenance indexing throughput if it is being used as an event store
Thanks, Joe. Given that we would like to add a few attributes and set them to be indexed for provenance, should the mentioned rate be alright?

Cheers,
Ali

On Sat, Feb 16, 2019 at 2:56 PM Joe Witt wrote:
> Ali,
>
> You certainly can, and at the rates you mention you should be able to keep it for a good while.
>
> Just set the properties you need for your system and measure the rate at which prov storage fills.
>
> Thanks
Re: Nifi provenance indexing throughput if it is being used as an event store
Ali,

You certainly can, and at the rates you mention you should be able to keep it for a good while.

Just set the properties you need for your system and measure the rate at which prov storage fills.

Thanks

On Fri, Feb 15, 2019 at 10:29 PM Ali Nazemian wrote:
> I didn't mean to use Nifi provenance search as an external provenance search. I meant to use it for internal provenance search, but keep the provenance for a longer time than usual. That is, instead of expecting it to keep provenance data for a few days, use it as an event store, since it also provides the search capability.
>
> Regards,
> Ali
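Joe's suggestion to measure how fast provenance storage fills can be sketched roughly as below. This is only an illustration, not anything NiFi ships: the repository path, the sampling interval, and the 100 GiB cap are assumptions to replace with your own values (the cap would normally correspond to whatever you set for `nifi.provenance.repository.max.storage.size`), and all function names are made up for this example.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all files under path (event files and index directories alike)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may be rolled over or aged off while we are walking the tree.
                pass
    return total


def fill_rate_bytes_per_sec(size_start: int, size_end: int, elapsed_seconds: float) -> float:
    """Average growth rate between two size samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (size_end - size_start) / elapsed_seconds


def days_until_full(current_bytes: int, cap_bytes: int, rate_bytes_per_sec: float) -> float:
    """Linear forecast of the time left before the size cap is reached."""
    if rate_bytes_per_sec <= 0:
        return float("inf")  # not growing (or shrinking due to age-off)
    return max(cap_bytes - current_bytes, 0) / rate_bytes_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    repo = "./provenance_repository"  # assumption: adjust to your configured directory
    cap = 100 * 2**30                 # assumption: 100 GiB cap
    interval = 60                     # assumption: seconds between the two samples

    first = directory_size_bytes(repo)
    time.sleep(interval)
    second = directory_size_bytes(repo)

    rate = fill_rate_bytes_per_sec(first, second, interval)
    print(f"fill rate: {rate:.1f} bytes/s")
    print(f"days until cap: {days_until_full(second, cap, rate):.1f}")
```

With bursty traffic a single short sample can be misleading, so it is probably worth sampling over a day or two that includes at least one burst before trusting the forecast.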
Re: Nifi provenance indexing throughput if it is being used as an event store
I didn't mean to use Nifi provenance search as an external provenance search. I meant to use it for internal provenance search, but keep the provenance for a longer time than usual. That is, instead of expecting it to keep provenance data for a few days, use it as an event store, since it also provides the search capability.

Regards,
Ali

On Sat, Feb 16, 2019 at 5:29 AM Andrew Grande wrote:
> NiFi provenance searches are not a good integration pattern for external systems. That is, using them to periodically fetch history burdens the cluster (those searches can be heavy) and disrupts normal processing SLAs.
>
> Pushing provenance events out to an external system (potentially even filtered down to components of interest) is a much more predictable pattern and provides lots of flexibility on how to interpret the events.
>
> Andrew
Re: Nifi provenance indexing throughput if it is being used as an event store
NiFi provenance searches are not a good integration pattern for external systems. That is, using them to periodically fetch history burdens the cluster (those searches can be heavy) and disrupts normal processing SLAs.

Pushing provenance events out to an external system (potentially even filtered down to components of interest) is a much more predictable pattern and provides lots of flexibility on how to interpret the events.

Andrew

On Thu, Feb 14, 2019, 11:26 PM Ali Nazemian wrote:
> Can I expect the Nifi provenance search part to do the job for me?
Re: Nifi provenance indexing throughput if it is being used as an event store
Can I expect the Nifi provenance search part to do the job for me?

On Fri, 15 Feb. 2019, 13:21 Mike Thomsen wrote:
> Ali,
>
> There is a site-to-site publishing task for provenance that you can add as a root controller service that would be great here. It'll just take all of your provenance data periodically and ship it off to another NiFi server or cluster that can process all of the provenance data as blocks of JSON data. A common pattern there is to filter down to the events you want and publish to ElasticSearch.
Re: Nifi provenance indexing throughput if it is being used as an event store
Ali,

There is a site-to-site publishing task for provenance that you can add as a root controller service that would be great here. It'll just take all of your provenance data periodically and ship it off to another NiFi server or cluster that can process all of the provenance data as blocks of JSON data. A common pattern there is to filter down to the events you want and publish to ElasticSearch.

On Thu, Feb 14, 2019 at 7:05 PM Ali Nazemian wrote:
> Hi All,
>
> I am investigating how Nifi provenance can be used as an event store for a long period of time. Our traffic is very bursty: sometimes we may not receive any events for a period of time, and sometimes we may get a burst. On average, around 1000 events per second (eps) is the expected throughput at this stage. Nifi has powerful provenance that also lets you index on selected attributes. I am investigating how reliable it is to use the Nifi provenance store for a long period of time with indexing enabled for a few extra attributes. Has anybody used Nifi provenance at this scale? Since provenance uses Lucene for indexing, can a large number of Lucene indices create other issues within Nifi?
>
> P.S.: Our use case is pretty light for Nifi, as we are not going to have any ETL and Nifi is being used mostly as an orchestrator of multiple microservices.
>
> Regards,
> Ali
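The "filter down to the events you want" step Mike mentions might look something like the sketch below on the receiving side. It assumes the batch arrives as a JSON array of event objects and that each object carries `eventType` and `componentId` keys; those key names, the sample events, and the `filter_events` helper are assumptions for illustration, so check them against the JSON your reporting task actually emits. Publishing the kept events to a search index is deliberately left out.

```python
import json


def filter_events(events, event_types=None, component_ids=None):
    """Keep only events matching the given event types and/or component ids.

    A filter that is None (or empty) is not applied.
    """
    kept = []
    for event in events:
        if event_types and event.get("eventType") not in event_types:
            continue
        if component_ids and event.get("componentId") not in component_ids:
            continue
        kept.append(event)
    return kept


if __name__ == "__main__":
    # Invented sample batch; real batches will have many more fields per event.
    batch = json.loads("""
    [
      {"eventId": "1", "eventType": "RECEIVE", "componentId": "abc"},
      {"eventId": "2", "eventType": "ATTRIBUTES_MODIFIED", "componentId": "def"},
      {"eventId": "3", "eventType": "SEND", "componentId": "abc"}
    ]
    """)
    for event in filter_events(batch, event_types={"RECEIVE", "SEND"}):
        print(event["eventId"], event["eventType"])
```

Filtering before indexing keeps the external store limited to the events you actually intend to search, which matters more as retention grows.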
Nifi provenance indexing throughput if it is being used as an event store
Hi All,

I am investigating how Nifi provenance can be used as an event store for a long period of time. Our traffic is very bursty: sometimes we may not receive any events for a period of time, and sometimes we may get a burst. On average, around 1000 events per second (eps) is the expected throughput at this stage. Nifi has powerful provenance that also lets you index on selected attributes. I am investigating how reliable it is to use the Nifi provenance store for a long period of time with indexing enabled for a few extra attributes. Has anybody used Nifi provenance at this scale? Since provenance uses Lucene for indexing, can a large number of Lucene indices create other issues within Nifi?

P.S.: Our use case is pretty light for Nifi, as we are not going to have any ETL and Nifi is being used mostly as an orchestrator of multiple microservices.

Regards,
Ali
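For a sense of scale, the 1000 eps figure can be turned into a rough event count and storage estimate as sketched below. Only the 1000 eps average comes from the post; the bytes-per-event values, the 90-day retention, and the number of provenance events produced per incoming event are placeholder assumptions that would need to be measured on a real flow (a single flowfile usually generates several provenance events as it passes through processors).

```python
SECONDS_PER_DAY = 86_400


def events_per_day(events_per_second, prov_events_per_input=1):
    """Provenance events written per day for a given average input rate."""
    return events_per_second * prov_events_per_input * SECONDS_PER_DAY


def storage_bytes(events_per_second, retention_days, bytes_per_event, prov_events_per_input=1):
    """Approximate on-disk footprint for the whole retention window."""
    per_day = events_per_day(events_per_second, prov_events_per_input)
    return per_day * retention_days * bytes_per_event


if __name__ == "__main__":
    eps = 1000            # average rate from the post
    retention_days = 90   # assumption: "a long period of time"
    print(f"{events_per_day(eps):,} events/day")
    for bytes_per_event in (500, 1000, 2000):  # assumption: measure on your own flow
        gib = storage_bytes(eps, retention_days, bytes_per_event) / 2**30
        print(f"{bytes_per_event} B/event over {retention_days} days: {gib:,.0f} GiB")
```

The estimate is linear in every input, so once a real bytes-per-event figure (including index overhead for the extra attributes) has been measured, it can be substituted directly.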