Re: Nifi provenance indexing throughput if it is being used as an event store

2019-02-16 Thread Ali Nazemian
Thanks, Joe. Given that we would like to add a few attributes and have them
indexed for provenance, should the rate I mentioned still be all right?

Cheers,
Ali

On Sat, Feb 16, 2019 at 2:56 PM Joe Witt  wrote:

> Ali
>
> You certainly can and at the rates you mention you should be able to keep
> it for a good while.
>
> Just set the properties you need for your system and measure the rate at
> which prov storage fills.
>
> Thanks
>
> On Fri, Feb 15, 2019 at 10:29 PM Ali Nazemian 
> wrote:
>
>> I didn't mean using NiFi provenance search from an external system. I
>> meant using it for internal provenance searches, but keeping the
>> provenance data for longer than usual. Instead of expecting it to hold
>> provenance data for a few days, we would use it as an event store, since
>> it also provides the search capability.
>>
>> Regards,
>> Ali
>>
>> On Sat, Feb 16, 2019 at 5:29 AM Andrew Grande  wrote:
>>
>>> NiFi provenance searches are not a good integration pattern for external
>>> systems. I.e., using them to periodically fetch history burdens the cluster
>>> (those searches can be heavy) and disrupts normal processing SLAs.
>>>
>>> Pushing provenance events out to an external system (potentially even
>>> filtered down to components of interest) is a much more predictable pattern
>>> and provides lots of flexibility in how to interpret the events.
>>>
>>> Andrew
>>>
>>> On Thu, Feb 14, 2019, 11:26 PM Ali Nazemian 
>>> wrote:
>>>
>>>> Can I expect the NiFi provenance search to do the job for me?
>>>>
>>>> On Fri, 15 Feb. 2019, 13:21 Mike Thomsen wrote:
> Ali,
>
> There is a site to site publishing task for provenance that you can
> add as a root controller service that would be great here. It'll just take
> all of your provenance data periodically and ship it off to another NiFi
> server or cluster that can process all of the provenance data as blocks of
> JSON data. A common pattern there is to filter down to the events you want
> and publish to ElasticSearch.
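
The filter-then-publish pattern described above can be sketched in a few lines. This is only an illustration of the filtering step, not the reporting task's own code: the function name `filter_events` is hypothetical, and the JSON field names `componentId` and `eventType` are assumptions about the shape of the exported events that should be checked against the data your NiFi version actually emits.

```python
import json


def filter_events(json_batch, component_ids, event_types=None):
    """Keep only provenance events from the components of interest.

    json_batch    -- a JSON array of provenance events, as a string
    component_ids -- set of component IDs to keep (assumed field: componentId)
    event_types   -- optional set of event types to keep (assumed field: eventType)
    """
    events = json.loads(json_batch)
    kept = []
    for event in events:
        if event.get("componentId") not in component_ids:
            continue
        if event_types is not None and event.get("eventType") not in event_types:
            continue
        kept.append(event)
    return kept


if __name__ == "__main__":
    # Hypothetical sample batch; real events carry many more fields.
    batch = json.dumps([
        {"eventId": "1", "eventType": "RECEIVE", "componentId": "abc"},
        {"eventId": "2", "eventType": "DROP", "componentId": "abc"},
        {"eventId": "3", "eventType": "RECEIVE", "componentId": "xyz"},
    ])
    for event in filter_events(batch, {"abc"}, {"RECEIVE"}):
        print(event["eventId"])
```

The surviving events would then be handed to whatever client publishes to the external store; that step is left out here because it depends entirely on the target system.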
>
> On Thu, Feb 14, 2019 at 7:05 PM Ali Nazemian 
> wrote:
>
>> Hi All,
>>
>> I am investigating how NiFi provenance can be used as an event store for
>> a long period of time. Our use case is very bursty: sometimes we may not
>> receive any events for a while, and sometimes we get a burst of traffic.
>> On average, around 1000 eps is the expected throughput at this stage.
>> NiFi has a powerful provenance feature that also lets you index on
>> selected attributes. I am investigating how reliable it is to use the
>> NiFi provenance store for a long period of time with indexing enabled for
>> a few extra attributes. Has anybody used NiFi provenance at this scale?
>> Since provenance uses Lucene for indexing, can a large number of Lucene
>> indices cause other issues within NiFi?
>>
>> P.S: Our use case is pretty light for Nifi as we are not going to
>> have any ETL and Nifi is being used mostly as an Orchestrator of multiple
>> Microservices.
>>
>> Regards,
>> Ali
>>
>
>>
>> --
>> A.Nazemian
>>
>

-- 
A.Nazemian
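
Joe's suggestion in this thread, measuring the rate at which provenance storage fills, can be turned into a rough retention estimate. The sketch below is plain arithmetic under stated assumptions: the function names are hypothetical, the bytes-per-event figure has to come from your own measurements, and the event rate should be the rate of provenance events actually written, which can be higher than the inbound rate because each component a FlowFile passes through may emit its own events.

```python
SECONDS_PER_DAY = 86_400


def measured_bytes_per_event(size_start, size_end, events_written):
    """Derive average bytes per event from two repository size samples."""
    if events_written <= 0:
        raise ValueError("events_written must be positive")
    return (size_end - size_start) / events_written


def estimate_retention_days(storage_budget_bytes, events_per_second, bytes_per_event):
    """Return how many days of provenance fit in the given storage budget."""
    if events_per_second <= 0 or bytes_per_event <= 0:
        raise ValueError("events_per_second and bytes_per_event must be positive")
    bytes_per_day = events_per_second * bytes_per_event * SECONDS_PER_DAY
    return storage_budget_bytes / bytes_per_day


if __name__ == "__main__":
    # Assumed figures for illustration only: 1 TB budget, 1000 events/s,
    # 1000 bytes per event including index overhead.
    days = estimate_retention_days(10**12, 1000, 1000)
    print(f"{days:.1f} days")
```

With those assumed figures the budget lasts about 11.6 days; indexing extra attributes increases the bytes per event, so it is worth re-measuring after enabling them.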


Re: Record-oriented DetectDuplicate?

2019-02-16 Thread Mike Thomsen
Andrew, Mark, etc.

A new contributor alerted me on Jira that he did his own take on this
processor. I encouraged him to join the dev list so we can discuss the use
case in more depth and sort out what is the best way forward.

See https://issues.apache.org/jira/browse/NIFI-6047

I'll give him a little while to join and announce he's ready to go over it
before I move forward with a discussion on this.

On Sat, Feb 9, 2019 at 12:34 PM Mike Thomsen  wrote:

> PR if anyone is interested:
>
> https://github.com/apache/nifi/pull/3298
>
> On Fri, Feb 8, 2019 at 5:34 PM Mike Thomsen 
> wrote:
>
>> With Redis and HBase you can set a TTL on the data itself in the lookup
>> table. Were you thinking something more than that?
>>
>> On Fri, Feb 8, 2019 at 4:42 PM Andrew Grande  wrote:
>>
>>> Can I suggest a time-based option for specifying the window? I think we
>>> only mentioned the number of records.
>>>
>>> Andrew
>>>
>>> On Fri, Feb 8, 2019, 8:22 AM Mike Thomsen 
>>> wrote:
>>>
>>>> Thanks. That answers it succinctly for me. I'll build out a
>>>> DetectDuplicateRecord processor to handle this.
>>>>
>>>> On Fri, Feb 8, 2019 at 11:17 AM Mark Payne wrote:

> Matt,
>
> That would work if you want to select distinct records in a given
> FlowFile but not across FlowFiles.
> PartitionRecord -> UpdateAttribute (optionally to combine multiple
> attributes into one) -> DetectDuplicate
> would work, but given that you expect the records to be unique
> generally, this would have the effect of
> splitting each FlowFile into Record-per-FlowFile, which is certainly
> not ideal.
>
> Thanks
> -Mark
>
>
> > On Feb 8, 2019, at 11:14 AM, Matt Burgess wrote:
> >
> > Mike,
> >
> > I don't think so, but you could try a SELECT DISTINCT in QueryRecord.
> > It might be a bit of a pain if you want to select all columns and there
> > are lots of them.
> >
> > Alternatively you could try PartitionRecord -> QueryRecord (select *
> > limit 1). Neither PartitionRecord nor QueryRecord keeps state so you'd
> > likely need to use distributed cache or UpdateAttribute.
> >
> > Regards,
> > Matt
> >
> > On Fri, Feb 8, 2019 at 11:08 AM Mike Thomsen wrote:
> >>
> >> Do we have anything like DetectDuplicate for the Record API already?
> >> Didn't see anything, but wanted to ask before reinventing the wheel.
> >>
> >> Thanks,
> >>
> >> Mike
>
>
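
The idea discussed in this thread, detecting duplicate records across FlowFiles within a time-based window, can be sketched as follows. This is not the code from the linked PR or Jira ticket: the class `TimeWindowDeduplicator` is hypothetical, and an in-memory dict with expiry times stands in for a lookup table with a TTL such as Redis or HBase. Each batch is processed as a whole, so it does not have to be split into one record per FlowFile.

```python
import time


class TimeWindowDeduplicator:
    """Flag records whose key was already seen within the last ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so the window can be tested
        self._seen = {}             # record key -> expiry time

    def _evict_expired(self, now):
        expired = [key for key, expiry in self._seen.items() if expiry <= now]
        for key in expired:
            del self._seen[key]

    def split(self, records, key_fields):
        """Return (unique, duplicates) for one batch of dict records."""
        now = self.clock()
        self._evict_expired(now)
        unique, duplicates = [], []
        for record in records:
            key = tuple(record.get(field) for field in key_fields)
            if key in self._seen:
                duplicates.append(record)
            else:
                self._seen[key] = now + self.ttl_seconds
                unique.append(record)
        return unique, duplicates


if __name__ == "__main__":
    dedup = TimeWindowDeduplicator(ttl_seconds=60)
    first, _ = dedup.split([{"id": 1}, {"id": 2}], ["id"])
    _, repeats = dedup.split([{"id": 2}, {"id": 3}], ["id"])
    print(len(first), len(repeats))
```

A window based on a number of records, the other option mentioned in the thread, would replace the expiry times with a bounded structure that drops the oldest keys once a size limit is reached.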


Re: 1.9 release date?

2019-02-16 Thread Boris Tyukin
wondering the same thing
Boris

On Sat, Feb 16, 2019 at 10:41 AM dan young  wrote:

> Heya folks,
>
> Any insight on 1.9 release date?  Looks like a lot of goodies and fixes
> included...
>
> Regards,
>
> Dano
>


Re: 1.9 release date?

2019-02-16 Thread Joe Witt
dan

we did rc1 this week and will have rc2 up today or tomorrow ideally.

thanks

On Sat, Feb 16, 2019, 10:42 AM dan young wrote:

> Heya folks,
>
> Any insight on 1.9 release date?  Looks like a lot of goodies and fixes
> included...
>
> Regards,
>
> Dano
>


1.9 release date?

2019-02-16 Thread dan young
Heya folks,

Any insight on 1.9 release date?  Looks like a lot of goodies and fixes
included...

Regards,

Dano