Bryan,

I have a feeling you're right. This might call for a reporting task that exports to ElasticSearch so that Kibana dashboards can be used to answer these questions.

Thanks,

Mike

On Thu, Mar 1, 2018 at 10:20 AM, Bryan Bende <[email protected]> wrote:

> Mike,
>
> As far as I know, Atlas is not really about "event level" lineage, it
> is more about "system level" or "data set" level.
>
> So I believe the goal of Atlas is to show how the systems are
> connected and how a particular data set flows through the system.
>
> So an example might be... NiFi pulls from source #1, then publishes to
> Kafka topic #1, and then a stream processing system consumes from
> Kafka topic #1, and then writes results to Hive.
>
> Atlas can then tell you that source #1 flowed through all these
> systems and was the source for these results in Hive (something like
> that).
>
> I don't think it's a massive long-term store for event-level provenance
> data like NiFi has, but others can chime in here if I am wrong.
>
> -Bryan
>
>
> On Thu, Mar 1, 2018 at 10:11 AM, Mike Thomsen <[email protected]>
> wrote:
> > So I tried again, and finally got something populated (screenshot attached
> > for reference). What I don't see is anything like the provenance data that
> > the processors store. Like nothing about the flowfiles, their attributes,
> > etc.
> >
> > My goal here is to have a long term, searchable repository of provenance
> > data so questions like "when was data set XYZ reindexed" can be answered. Is
> > the flowfile provenance data not being captured and sent to Atlas or am I
> > doing it wrong?
> >
> > If the answer is "not yet" I'm cool with that and would be happy to take a
> > stab at expanding the scope of the reporting task's capabilities. I just
> > need someone more knowledgeable on this integration to give me pointers.
> >
> > Thanks,
> >
> > Mike
> >
> > On Wed, Feb 28, 2018 at 2:43 PM, Mike Thomsen <[email protected]>
> > wrote:
> >>
> >> Matt,
> >>
> >> Yeah, I saw that pretty early on. Admittedly my question may be a bit
> >> nebulous. What I'm trying to figure out is what I should be seeing in Atlas
> >> if NiFi is sending it events properly. Since the integration and knowledge
> >> around it is probably clustered here, I'm not sure I can go to the Atlas
> >> list and ask them the same question.
> >>
> >> Thanks,
> >>
> >> Mike
> >>
> >> On Wed, Feb 28, 2018 at 2:13 PM, Matt Burgess <[email protected]>
> >> wrote:
> >>>
> >>> Mike,
> >>>
> >>> There is a nifi-atlas-bundle in NiFi with a NAR that includes the
> >>> ReportLineageToAtlas reporting task, but IIRC it is so large that it
> >>> is not included in the default assembly. Instead there is a
> >>> "include-atlas" profile that can be activated when building the
> >>> assembly, and that should include the Atlas NAR and associated
> >>> reporting task.
> >>>
> >>> Regards,
> >>> Matt
> >>>
> >>>
> >>> On Wed, Feb 28, 2018 at 1:42 PM, Mike Thomsen <[email protected]>
> >>> wrote:
> >>> > I have Atlas 0.8.2 (BerkeleyDB and Embedded ES) and NiFi 1.6.0 nightly
> >>> > both up and claiming that they can talk to one another.
> >>> >
> >>> > What should I be seeing if they are? My test configuration consists of
> >>> > a simple process group that has GetMongo, UpdateAttributes and
> >>> > PutElasticSearchHttpRecord. I'm not sure if events are actually making
> >>> > it.
> >>> >
> >>> > The Atlas documentation is pretty limited on setting up a vanilla
> >>> > installation, so I was wondering if someone could point me in the right
> >>> > direction from a NiFi point of view on what I should be seeing so I can
> >>> > start fumbling around in the right direction.
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Mike
> >>
> >>
> >
>
