Mike,

As far as I know, Atlas is not really about "event level" lineage; it is more about "system level" or "data set level" lineage.
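To make that distinction a bit more concrete, here is a rough Python sketch. It is purely conceptual: it does not use the Atlas or NiFi APIs, and the `ProvenanceEvent` shape, the `dataset_lineage` helper, and the data set names are all made up for illustration. The idea is that many per-flowfile events collapse into a single data-set-level edge, and the per-flowfile detail is not kept.

```python
from collections import Counter
from typing import NamedTuple


class ProvenanceEvent(NamedTuple):
    """One event-level record (hypothetical shape, not NiFi's actual schema)."""
    flowfile_id: str
    source: str  # data set the flowfile was read from
    target: str  # data set the flowfile was written to


def dataset_lineage(events):
    """Collapse per-flowfile events into data-set-level edges with counts."""
    return Counter((e.source, e.target) for e in events)


# Three flowfiles moving between the same two (made-up) data sets.
events = [
    ProvenanceEvent("ff-1", "mongo:collection_a", "es:index_a"),
    ProvenanceEvent("ff-2", "mongo:collection_a", "es:index_a"),
    ProvenanceEvent("ff-3", "mongo:collection_a", "es:index_a"),
]

edges = dataset_lineage(events)
for (source, target), count in edges.items():
    print(f"{source} -> {target} ({count} events collapsed)")
```

In this sketch the three events become one edge, so a question about an individual flowfile can no longer be answered from the edges alone.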
So I believe the goal of Atlas is to show how the systems are connected and how a particular data set flows through the system. So an example might be... NiFi pulls from source #1, then publishes to Kafka topic #1, and then a stream processing system consumes from Kafka topic #1, and then writes results to Hive. Atlas can then tell you that source #1 flowed through all these systems and was the source for these results in Hive (something like that).

I don't think it's a massive long-term store for event-level provenance data like NiFi has, but others can chime in here if I am wrong.

-Bryan

On Thu, Mar 1, 2018 at 10:11 AM, Mike Thomsen <[email protected]> wrote:
> So I tried again, and finally got something populated (screenshot attached
> for reference). What I don't see is anything like the provenance data that
> the processors store. Like nothing about the flowfiles, their attributes,
> etc.
>
> My goal here is to have a long term, searchable repository of provenance
> data so questions like "when was data set XYZ reindexed" can be answered. Is
> the flowfile provenance data not being captured and sent to Atlas or am I
> doing it wrong?
>
> If the answer is "not yet" I'm cool with that and would be happy to take a
> stab at expanding the scope of the reporting task's capabilities. I just
> need someone more knowledgeable on this integration to give me pointers.
>
> Thanks,
>
> Mike
>
> On Wed, Feb 28, 2018 at 2:43 PM, Mike Thomsen <[email protected]>
> wrote:
>>
>> Matt,
>>
>> Yeah, I saw that pretty early on. Admittedly my question may be a bit
>> nebulous. What I'm trying to figure out is what I should be seeing in Atlas
>> if NiFi is sending it events properly. Since the integration and knowledge
>> around it is probably clustered here, I'm not sure I can go to the Atlas
>> list and ask them the same question.
>>
>> Thanks,
>>
>> Mike
>>
>> On Wed, Feb 28, 2018 at 2:13 PM, Matt Burgess <[email protected]>
>> wrote:
>>>
>>> Mike,
>>>
>>> There is a nifi-atlas-bundle in NiFi with a NAR that includes the
>>> ReportLineageToAtlas reporting task, but IIRC it is so large that it
>>> is not included in the default assembly. Instead there is a
>>> "include-atlas" profile that can be activated when building the
>>> assembly, and that should include the Atlas NAR and associated
>>> reporting task.
>>>
>>> Regards,
>>> Matt
>>>
>>>
>>> On Wed, Feb 28, 2018 at 1:42 PM, Mike Thomsen <[email protected]>
>>> wrote:
>>> > I have Atlas 0.8.2 (BerkeleyDB and Embedded ES) and NiFi 1.6.0 nightly both
>>> > up and claiming that they can talk to one another.
>>> >
>>> > What should I be seeing if they are? My test configuration consists of a
>>> > simple process group that has GetMongo, UpdateAttributes and
>>> > PutElasticSearchHttpRecord. I'm not sure if events are actually making it.
>>> >
>>> > The Atlas documentation is pretty limited on setting up a vanilla
>>> > installation, so I was wondering if someone could point me in the right
>>> > direction from a NiFi point of view on what I should be seeing so I can
>>> > start fumbling around in the right direction.
>>> >
>>> > Thanks,
>>> >
>>> > Mike
>>
>>
>
