Dominique,

That's a great question. An important distinction between NiFi's lineage and
Atlas's lineage is that NiFi provides data-level lineage (i.e., lineage for
each individual FlowFile), whereas Atlas provides "dataset-level" lineage. So
what gets exported to Atlas is not the lineage of each individual FlowFile but
rather the connection between each source and destination.

So, for example, if you pull 1,000 FlowFiles from Kafka Topic ABC and push them
to HDFS directory /path/to/my/new/file, and then you pull a million FlowFiles
from Kafka Topic XYZ and push them to HDFS directory /path/to/my/other/file,
then what gets exported to Atlas is two separate lineages: a lineage from Kafka
Topic ABC to HDFS directory /path/to/my/new/file and a lineage from Kafka Topic
XYZ to HDFS directory /path/to/my/other/file. NiFi will not export a lineage
1 million times just because 1 million FlowFiles traversed it. It will send
each lineage only one time.
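
If it helps, here is a rough Python sketch of the idea. To be clear, this is
not NiFi's actual reporting code, just an illustration of collapsing
per-FlowFile events into dataset-level lineages. The function name and the
"kafka:"/"hdfs:" labels are made up for the example.

```python
from collections import Counter


def dataset_level_lineages(flowfile_events):
    """Collapse per-FlowFile (source, destination) events into unique lineages.

    Hypothetical helper for illustration only; returns a Counter mapping each
    distinct (source, destination) pair to the number of FlowFiles seen on it.
    """
    return Counter(flowfile_events)


# 1,000 FlowFiles on one path and 1,000,000 on another (data-level view).
events = (
    [("kafka:ABC", "hdfs:/path/to/my/new/file")] * 1_000
    + [("kafka:XYZ", "hdfs:/path/to/my/other/file")] * 1_000_000
)

lineages = dataset_level_lineages(events)

# Only the distinct source -> destination pairs would be exported,
# each one a single time, regardless of how many FlowFiles used it.
for source, destination in lineages:
    print(f"{source} -> {destination}")
```

The loop prints two lines, one per lineage, even though more than a million
events went in.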

Does that make sense?

Thanks
-Mark


> On Jan 23, 2018, at 7:15 AM, Dominique De Vito <[email protected]> wrote:
> 
> Hi,
> 
> AFAIK lineage occurs at FlowFile level.
> That is (AFAIU) each FlowFile could have its own lineage.
> 
> So, if Nifi is reading a file as input, with 1 000 records, is Nifi going to 
> send to Atlas 1 000 lineages?
> 
> If yes, does Nifi send these 1 000 lineages (to Atlas) in one call in a batch 
> way?
> 
> Or do these 1 000 lineages correspond to 1 000 calls to Atlas?
> 
> Thanks.
> 
> Dominique
> 
> 
> 
> 
> 
