Bryan,

That did it. It might not be able to answer at the granularity of how many
times a reindex was done, but with the right mix of UpdateAttribute and
SiteToSiteProvenanceReportingTask, I was able to build an ElasticSearch
index that can at least show a date histogram aggregation of when reindex
operations happened, by data set.
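
For reference, the kind of request behind such a date histogram can be
sketched as below. This is only a minimal sketch, not the query actually
used in this thread: the helper name and the field names `dataset` and
`event_time` are hypothetical and depend on how the provenance events were
mapped into the index. It also assumes Elasticsearch 7.2 or later, where the
histogram bucket size is given as `calendar_interval`; older releases use
`interval` instead.

```python
import json


def build_reindex_histogram_query(dataset_field="dataset",
                                  time_field="event_time",
                                  bucket="day"):
    """Build a search body that buckets events per data set, then per day.

    All field names are assumptions; adjust them to the actual index mapping.
    """
    return {
        # size 0: return only aggregation buckets, no individual hits
        "size": 0,
        "aggs": {
            "by_dataset": {
                # one bucket per distinct data set name
                "terms": {"field": dataset_field},
                "aggs": {
                    "reindex_over_time": {
                        # within each data set, one bucket per calendar unit
                        "date_histogram": {
                            "field": time_field,
                            "calendar_interval": bucket,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request.
    print(json.dumps(build_reindex_histogram_query(), indent=2))
```

The terms aggregation comes first so that each data set gets its own
histogram, which matches the "by data set" view described above.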

On Thu, Mar 1, 2018 at 11:23 AM, Bryan Bende <bbe...@gmail.com> wrote:

> Mike,
>
> That is basically the point of SiteToSiteProvenanceReportingTask...
> you send the provenance events from the reporting task back to the same
> cluster, and then leverage existing processors like the ElasticSearch
> processors.
>
> Otherwise we'd get into building 100 reporting tasks for all the
> various destinations, just like all the processors.
>
> -Bryan
>
> On Thu, Mar 1, 2018 at 11:04 AM, Mike Thomsen <mikerthom...@gmail.com>
> wrote:
> > Bryan,
> >
> > I have a feeling you're right. This might call for a reporting task that
> > exports to ElasticSearch so that Kibana dashboards can be used to answer
> > these questions.
> >
> > Thanks,
> >
> > Mike
> >
> > On Thu, Mar 1, 2018 at 10:20 AM, Bryan Bende <bbe...@gmail.com> wrote:
> >>
> >> Mike,
> >>
> >> As far as I know, Atlas is not really about "event level" lineage; it
> >> is more about "system level" or "data set" level.
> >>
> >> So I believe the goal of Atlas is to show how the systems are
> >> connected and how a particular data set flows through the system.
> >>
> >> So an example might be... NiFi pulls from source #1, then publishes to
> >> Kafka topic #1,  and then a stream processing system consumes from
> >> Kafka topic #1, and then writes results to Hive.
> >>
> >> Atlas can then tell you that source #1 flowed through all these
> >> systems and was the source for these results in Hive (something like
> >> that).
> >>
> >> I don't think it's a massive long-term store for event-level provenance
> >> data like NiFi has, but others can chime in here if I am wrong.
> >>
> >> -Bryan
> >>
> >>
> >> On Thu, Mar 1, 2018 at 10:11 AM, Mike Thomsen <mikerthom...@gmail.com>
> >> wrote:
> >> > So I tried again, and finally got something populated (screenshot
> >> > attached for reference). What I don't see is anything like the
> >> > provenance data that the processors store. Like nothing about the
> >> > flowfiles, their attributes, etc.
> >> >
> >> > My goal here is to have a long term, searchable repository of
> >> > provenance data so questions like "when was data set XYZ reindexed"
> >> > can be answered. Is the flowfile provenance data not being captured
> >> > and sent to Atlas or am I doing it wrong?
> >> >
> >> > If the answer is "not yet" I'm cool with that and would be happy to
> >> > take a stab at expanding the scope of the reporting task's
> >> > capabilities. I just need someone more knowledgeable on this
> >> > integration to give me pointers.
> >> >
> >> > Thanks,
> >> >
> >> > Mike
> >> >
> >> > On Wed, Feb 28, 2018 at 2:43 PM, Mike Thomsen <mikerthom...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Matt,
> >> >>
> >> >> Yeah, I saw that pretty early on. Admittedly my question may be a bit
> >> >> nebulous. What I'm trying to figure out is what I should be seeing in
> >> >> Atlas if NiFi is sending it events properly. Since the integration and
> >> >> knowledge around it is probably clustered here, I'm not sure I can go
> >> >> to the Atlas list and ask them the same question.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Mike
> >> >>
> >> >> On Wed, Feb 28, 2018 at 2:13 PM, Matt Burgess <mattyb...@apache.org>
> >> >> wrote:
> >> >>>
> >> >>> Mike,
> >> >>>
> >> >>> There is a nifi-atlas-bundle in NiFi with a NAR that includes the
> >> >>> ReportLineageToAtlas reporting task, but IIRC it is so large that it
> >> >>> is not included in the default assembly. Instead there is an
> >> >>> "include-atlas" profile that can be activated when building the
> >> >>> assembly, and that should include the Atlas NAR and associated
> >> >>> reporting task.
> >> >>>
> >> >>> Regards,
> >> >>> Matt
> >> >>>
> >> >>>
> >> >>> On Wed, Feb 28, 2018 at 1:42 PM, Mike Thomsen <mikerthom...@gmail.com>
> >> >>> wrote:
> >> >>> > I have Atlas 0.8.2 (BerkeleyDB and Embedded ES) and NiFi 1.6.0
> >> >>> > nightly both up and claiming that they can talk to one another.
> >> >>> >
> >> >>> > What should I be seeing if they are? My test configuration
> >> >>> > consists of a simple process group that has GetMongo,
> >> >>> > UpdateAttribute and PutElasticSearchHttpRecord. I'm not sure if
> >> >>> > events are actually making it.
> >> >>> >
> >> >>> > The Atlas documentation is pretty limited on setting up a vanilla
> >> >>> > installation, so I was wondering if someone could point me in the
> >> >>> > right direction from a NiFi point of view on what I should be
> >> >>> > seeing so I can start fumbling around in the right direction.
> >> >>> >
> >> >>> > Thanks,
> >> >>> >
> >> >>> > Mike
> >> >>
> >> >>
> >> >
> >
> >
>
