Bryan,

I have a feeling you're right. This might call for a reporting task that
exports to Elasticsearch so that Kibana dashboards can be used to answer
these questions.
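
A minimal sketch of the export side of that idea, not working NiFi code. It
assumes the provenance events have already been pulled out as plain
dictionaries, and only shows how they could be turned into an Elasticsearch
bulk-API request body. The function name, the index name "nifi-provenance",
and the event fields (eventId, eventType, componentName, timestamp) are
illustrative assumptions, not taken from any existing reporting task.

```python
import json


def to_bulk_payload(events, index="nifi-provenance"):
    """Build a newline-delimited bulk request body from event dicts.

    Each event becomes two lines: an "index" action line and the document
    itself. The index name and the "eventId" key are assumptions made for
    this sketch.
    """
    lines = []
    for event in events:
        # Use the event id as the document id so re-sending is idempotent.
        action = {"index": {"_index": index, "_id": event["eventId"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {
            "eventId": "1",
            "eventType": "SEND",
            "componentName": "PutElasticSearchHttpRecord",
            "timestamp": "2018-03-01T10:20:00Z",
        }
    ]
    print(to_bulk_payload(sample), end="")
```

The resulting string would be POSTed to the cluster's _bulk endpoint with a
Content-Type of application/x-ndjson. Depending on the Elasticsearch version,
a document type may also need to be supplied. A Kibana index pattern over the
timestamp field could then cover questions like "when was data set XYZ
reindexed".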

Thanks,

Mike

On Thu, Mar 1, 2018 at 10:20 AM, Bryan Bende <[email protected]> wrote:

> Mike,
>
> As far as I know, Atlas is not really about "event level" lineage, it
> is more about "system level" or "data set" level lineage.
>
> So I believe the goal of Atlas is to show how the systems are
> connected and how a particular data set flows through the system.
>
> So an example might be... NiFi pulls from source #1, then publishes to
> Kafka topic #1, and then a stream processing system consumes from
> Kafka topic #1, and then writes results to Hive.
>
> Atlas can then tell you that source #1 flowed through all these
> systems and was the source for these results in Hive (something like
> that).
>
> I don't think it's a massive long-term store for event-level provenance
> data like NiFi has, but others can chime in here if I am wrong.
>
> -Bryan
>
>
> On Thu, Mar 1, 2018 at 10:11 AM, Mike Thomsen <[email protected]>
> wrote:
> > So I tried again, and finally got something populated (screenshot attached
> > for reference). What I don't see is anything like the provenance data that
> > the processors store. Like nothing about the flowfiles, their attributes,
> > etc.
> >
> > My goal here is to have a long term, searchable repository of provenance
> > data so questions like "when was data set XYZ reindexed" can be answered. Is
> > the flowfile provenance data not being captured and sent to Atlas or am I
> > doing it wrong?
> >
> > If the answer is "not yet" I'm cool with that and would be happy to take a
> > stab at expanding the scope of the reporting task's capabilities. I just
> > need someone more knowledgeable on this integration to give me pointers.
> >
> > Thanks,
> >
> > Mike
> >
> > On Wed, Feb 28, 2018 at 2:43 PM, Mike Thomsen <[email protected]>
> > wrote:
> >>
> >> Matt,
> >>
> >> Yeah, I saw that pretty early on. Admittedly my question may be a bit
> >> nebulous. What I'm trying to figure out is what I should be seeing in Atlas
> >> if NiFi is sending it events properly. Since the integration and knowledge
> >> around it is probably clustered here, I'm not sure I can go to the Atlas
> >> list and ask them the same question.
> >>
> >> Thanks,
> >>
> >> Mike
> >>
> >> On Wed, Feb 28, 2018 at 2:13 PM, Matt Burgess <[email protected]>
> >> wrote:
> >>>
> >>> Mike,
> >>>
> >>> There is a nifi-atlas-bundle in NiFi with a NAR that includes the
> >>> ReportLineageToAtlas reporting task, but IIRC it is so large that it
> >>> is not included in the default assembly. Instead there is an
> >>> "include-atlas" profile that can be activated when building the
> >>> assembly, and that should include the Atlas NAR and associated
> >>> reporting task.
> >>>
> >>> Regards,
> >>> Matt
> >>>
> >>>
> >>> On Wed, Feb 28, 2018 at 1:42 PM, Mike Thomsen <[email protected]>
> >>> wrote:
> >>> > I have Atlas 0.8.2 (BerkeleyDB and Embedded ES) and NiFi 1.6.0 nightly
> >>> > both up and claiming that they can talk to one another.
> >>> >
> >>> > What should I be seeing if they are? My test configuration consists of a
> >>> > simple process group that has GetMongo, UpdateAttribute and
> >>> > PutElasticSearchHttpRecord. I'm not sure if events are actually making
> >>> > it.
> >>> >
> >>> > The Atlas documentation is pretty limited on setting up a vanilla
> >>> > installation, so I was wondering if someone could point me in the right
> >>> > direction from a NiFi point of view on what I should be seeing so I can
> >>> > start fumbling around in the right direction.
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Mike
> >>
> >>
> >
>
