Hi Dominique,

Thank you for your interest in NiFi and Atlas integration.
I have some experience with it, and I actually wrote the NiFi reporting task.

I have two things in mind that could be related to your situation.
The first is NIFI-4971, which is currently under review. It fixes a
lineage reporting issue that occurs when the 'complete path' strategy
is used.
If you are using 'complete path', I'd recommend trying 'simple path'
to see whether that is the cause.

The other is Atlas not being able to consume messages from the Kafka
topic fast enough to keep up.
This happens when NiFi sends lots of messages to the Atlas hook topic.
It is seen particularly when many different files are written to or
retrieved from a file system and NiFi reports them, because those
entities are reported individually.
The following command can help you see how Atlas is consuming messages.
If LAG is large, those messages are still waiting to be consumed and
processed by Atlas.

# Sometimes Atlas consumer is not catching up and entities are not
# created even if NiFi reported as expected
KAFKA_HOME/bin/kafka-consumer-groups.sh --bootstrap-server server:port --describe --group atlas
GROUP  TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   OWNER
atlas  ATLAS_HOOK  0          24944           31897           6953
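
If you want to watch the lag over time rather than read it by eye, something like the following could be used. It is only a minimal sketch: it assumes the text printed by the command above has already been captured into a string, and that the header row contains the TOPIC, PARTITION and LAG column names (the exact column set differs between Kafka versions). The function name parse_consumer_lag and the SAMPLE string are mine, for illustration only.

```python
def parse_consumer_lag(describe_output):
    """Return {(topic, partition): lag} from captured --describe output text."""
    lags = {}
    columns = None  # column name -> position, filled in once the header is seen
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row tells us where each column sits for this Kafka version.
        if "TOPIC" in fields and "PARTITION" in fields and "LAG" in fields:
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None:
            continue  # ignore anything printed before the header
        needed = max(columns["TOPIC"], columns["PARTITION"], columns["LAG"])
        if len(fields) <= needed:
            continue  # row too short to contain the LAG column
        try:
            partition = int(fields[columns["PARTITION"]])
            lag = int(fields[columns["LAG"]])
        except ValueError:
            continue  # non-numeric values such as "-" are skipped
        lags[(fields[columns["TOPIC"]], partition)] = lag
    return lags


# Sample text modelled on the output shown above (OWNER is empty there).
SAMPLE = """\
GROUP  TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   OWNER
atlas  ATLAS_HOOK  0          24944           31897           6953
"""

if __name__ == "__main__":
    lags = parse_consumer_lag(SAMPLE)
    print(lags, "total lag:", sum(lags.values()))
```

If the total keeps growing between runs, Atlas is falling behind; if it shrinks towards 0, the missing entities should appear once Atlas has processed the backlog.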

Thanks,
Koji


On Thu, Apr 26, 2018 at 6:50 PM, Dominique De Vito <[email protected]> wrote:
> Hi,
>
> I have defined a simple pipeline in Nifi:
>
> GetFile => some processor doing a dummy transformation => PublishInKafka
>
> ...............with Atlas integration for lineage purposes
>
> Versions:
> -- Atlas 0.8.0 (Stack : HDP 2.6.4)
> -- Nifi 1.5.0
>
> and I have put some (dummy) file into the input directory, and it went up to
> the end of the pipeline.
>
> Results:
>
> * a "nifi_flow" entity and a "nifi_flow_path" entity were defined in Atlas
> <= good
>
> * PROBLEM_1: the "nifi_flow_path" entity has no input, neither output.
>
> But I see in the Nifi logs a trace stating that Nifi has sent a
> "ENTITY_PARTIAL_UPDATE" json to Atlas HOOK topic, with correct input and
> output.
>
> So, something looks like broken in Nifi<=>Atlas link, or within Atlas.
>
> * PROBLEM_2 (but Atlas related): when I use the GUI, Atlas says it can't
> found the "nifi_flow" entity while it's available through the REST api:
>
> 2018-04-24 05:48:14,317 ERROR - [pool-2-thread-5 -
> 3076c14e-9bb4-44a7-8299-d56476f3ec89:] ~ graph rollback due to exception
> AtlasBaseException:Instance nifi_flow with unique attribute
> {qualifiedName=76d4acd9-0162-1000-257a-7393e17b3a16@mycluster5} does not
> exist (GraphTransactionInterceptor:73)
>
> ============>
>
> So my questions:
>
> 1) Did anyone meet such problems ?
>
> 2) Does anyone have had some (good) experience integrating Nifi with Atlas ?
>
> Thanks.
>
> Dominique
>
