Bryan,

That did it. It might not be able to answer at the granularity of how many times a reindex was done, but with the right mix of UpdateAttribute processors and that reporting task, I was able to build an ElasticSearch index that can at least show a date histogram aggregation of when reindex operations happened, by data set.
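For reference, a minimal sketch of the kind of search body that a "date histogram by data set" view could use. The field names (`dataset.keyword`, `timestampMillis`) and the aggregation names are hypothetical placeholders, not taken from the actual index, and the `calendar_interval` key assumes a recent Elasticsearch release (older releases used `interval` for the same purpose).

```python
import json


def build_reindex_histogram_query(dataset_field="dataset.keyword",
                                  time_field="timestampMillis",
                                  interval="day",
                                  interval_key="calendar_interval"):
    """Build a search body that buckets events per data set, then per time interval.

    All field names are assumptions for illustration; substitute whatever
    attribute the flow actually writes into the indexed provenance events.
    """
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "aggs": {
            "by_dataset": {
                # one bucket per distinct data set name
                "terms": {"field": dataset_field},
                "aggs": {
                    "reindexes_over_time": {
                        # within each data set, count events per interval
                        "date_histogram": {
                            "field": time_field,
                            interval_key: interval,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_reindex_histogram_query(), indent=2))
```

The printed JSON could be pasted into a Kibana console or sent as the body of a `_search` request against the index; pass `interval_key="interval"` if the cluster predates `calendar_interval`.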

On Thu, Mar 1, 2018 at 11:23 AM, Bryan Bende <[email protected]> wrote:
> Mike,
>
> That is basically the point of SiteToSiteProvenanceReportingTask...
> you send the provenance events from reporting task back to the same
> cluster, and then leverage existing processors like the ElasticSearch
> processors.
>
> Otherwise we'd get into building 100 reporting tasks for all the
> various destinations, just like all the processors.
>
> -Bryan
>
> On Thu, Mar 1, 2018 at 11:04 AM, Mike Thomsen <[email protected]> wrote:
> > Bryan,
> >
> > I have a feeling you're right. This might call for a reporting task that
> > exports to ElasticSearch so that Kibana dashboards can be used to answer
> > these questions.
> >
> > Thanks,
> >
> > Mike
> >
> > On Thu, Mar 1, 2018 at 10:20 AM, Bryan Bende <[email protected]> wrote:
> >>
> >> Mike,
> >>
> >> As far as I know, Atlas is not really about "event level" lineage, it
> >> is more about "system level" or "data set" level.
> >>
> >> So I believe the goal of Atlas is to show how the systems are
> >> connected and how a particular data set flows through the system.
> >>
> >> So an example might be... NiFi pulls from source #1, then publishes to
> >> Kafka topic #1, and then a stream processing system consumes from
> >> Kafka topic #1, and then writes results to Hive.
> >>
> >> Atlas can then tell you that source #1 flowed through all these
> >> systems and was the source for these results in Hive (something like
> >> that).
> >>
> >> I don't think it's a massive long-term store for event-level provenance
> >> data like NiFi has, but others can chime in here if I am wrong.
> >>
> >> -Bryan
> >>
> >>
> >> On Thu, Mar 1, 2018 at 10:11 AM, Mike Thomsen <[email protected]> wrote:
> >> > So I tried again, and finally got something populated (screenshot
> >> > attached for reference). What I don't see is anything like the
> >> > provenance data that the processors store. Like nothing about the
> >> > flowfiles, their attributes, etc.
> >> >
> >> > My goal here is to have a long term, searchable repository of
> >> > provenance data so questions like "when was data set XYZ reindexed"
> >> > can be answered. Is the flowfile provenance data not being captured
> >> > and sent to Atlas or am I doing it wrong?
> >> >
> >> > If the answer is "not yet" I'm cool with that and would be happy to
> >> > take a stab at expanding the scope of the reporting task's
> >> > capabilities. I just need someone more knowledgeable on this
> >> > integration to give me pointers.
> >> >
> >> > Thanks,
> >> >
> >> > Mike
> >> >
> >> > On Wed, Feb 28, 2018 at 2:43 PM, Mike Thomsen <[email protected]> wrote:
> >> >>
> >> >> Matt,
> >> >>
> >> >> Yeah, I saw that pretty early on. Admittedly my question may be a bit
> >> >> nebulous. What I'm trying to figure out is what I should be seeing in
> >> >> Atlas if NiFi is sending it events properly. Since the integration and
> >> >> knowledge around it is probably clustered here, I'm not sure I can go
> >> >> to the Atlas list and ask them the same question.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Mike
> >> >>
> >> >> On Wed, Feb 28, 2018 at 2:13 PM, Matt Burgess <[email protected]> wrote:
> >> >>>
> >> >>> Mike,
> >> >>>
> >> >>> There is a nifi-atlas-bundle in NiFi with a NAR that includes the
> >> >>> ReportLineageToAtlas reporting task, but IIRC it is so large that it
> >> >>> is not included in the default assembly. Instead there is a
> >> >>> "include-atlas" profile that can be activated when building the
> >> >>> assembly, and that should include the Atlas NAR and associated
> >> >>> reporting task.
> >> >>>
> >> >>> Regards,
> >> >>> Matt
> >> >>>
> >> >>>
> >> >>> On Wed, Feb 28, 2018 at 1:42 PM, Mike Thomsen <[email protected]> wrote:
> >> >>> > I have Atlas 0.8.2 (BerkeleyDB and Embedded ES) and NiFi 1.6.0
> >> >>> > nightly both up and claiming that they can talk to one another.
> >> >>> >
> >> >>> > What should I be seeing if they are? My test configuration consists
> >> >>> > of a simple process group that has GetMongo, UpdateAttributes and
> >> >>> > PutElasticSearchHttpRecord. I'm not sure if events are actually
> >> >>> > making it.
> >> >>> >
> >> >>> > The Atlas documentation is pretty limited on setting up a vanilla
> >> >>> > installation, so I was wondering if someone could point me in the
> >> >>> > right direction from a NiFi point of view on what I should be seeing
> >> >>> > so I can start fumbling around in the right direction.
> >> >>> >
> >> >>> > Thanks,
> >> >>> >
> >> >>> > Mike
