Dominique,

That's a great question. An important distinction between NiFi's lineage and Atlas's lineage is that NiFi provides data-level lineage (i.e., lineage for each individual FlowFile), while Atlas provides "dataset-level" lineage. So what gets exported to Atlas is not the lineage of each individual FlowFile but rather the connection between each source and destination.
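To illustrate the idea, here is a minimal sketch of collapsing FlowFile-level events into dataset-level lineage. It is not NiFi's actual Atlas integration code: `ProvenanceEvent`, `dataset_lineages`, and `simulate` are hypothetical names, and each event is assumed to carry only the source and destination dataset it touched. The point is simply that each (source, destination) pair is reported once, no matter how many FlowFiles follow it.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ProvenanceEvent(NamedTuple):
    """Hypothetical, simplified per-FlowFile event (not NiFi's real provenance model)."""
    flowfile_id: int
    source: str
    destination: str


def dataset_lineages(events: Iterable[ProvenanceEvent]) -> List[Tuple[str, str]]:
    """Reduce FlowFile-level events to unique dataset-level (source, destination) pairs."""
    seen = set()
    lineages = []
    for event in events:
        pair = (event.source, event.destination)
        if pair not in seen:  # report each dataset-level lineage only once
            seen.add(pair)
            lineages.append(pair)
    return lineages


def simulate() -> Iterable[ProvenanceEvent]:
    """Hypothetical traffic: 1,000 FlowFiles from topic ABC, then 1,000,000 from topic XYZ."""
    for i in range(1_000):
        yield ProvenanceEvent(i, "Kafka Topic ABC", "/path/to/my/new/file")
    for i in range(1_000_000):
        yield ProvenanceEvent(1_000 + i, "Kafka Topic XYZ", "/path/to/my/other/file")


if __name__ == "__main__":
    for source, destination in dataset_lineages(simulate()):
        print(f"{source} -> {destination}")
```

Although 1,001,000 events go in, only two lineages come out. A long-running reporter would presumably also need to remember the `seen` pairs across runs so it does not resend them; the sketch leaves that out.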
So, for example, if you pull 1,000 FlowFiles from Kafka Topic ABC and push them to HDFS directory /path/to/my/new/file, and then pull a million FlowFiles from Kafka Topic XYZ and push them to HDFS directory /path/to/my/other/file, what gets exported to Atlas is two separate lineages: one from Kafka Topic ABC to HDFS directory /path/to/my/new/file and one from Kafka Topic XYZ to HDFS directory /path/to/my/other/file. NiFi will not export the second lineage 1 million times just because 1 million FlowFiles traversed it. It sends each lineage only once.

Does that make sense?

Thanks
-Mark

> On Jan 23, 2018, at 7:15 AM, Dominique De Vito <[email protected]> wrote:
>
> Hi,
>
> AFAIK lineage occurs at FlowFile level.
> That is (AFAIU) each FlowFile could have its own lineage.
>
> So, if NiFi is reading a file as input, with 1 000 records, is NiFi going to
> send to Atlas 1 000 lineages?
>
> If yes, does NiFi send these 1 000 lineages (to Atlas) in one call in a batch
> way?
>
> Or do these 1 000 lineages correspond to 1 000 calls to Atlas?
>
> Thanks.
>
> Dominique
