Re: Problem with NiFi / Atlas integration - does anyone have experience with this integration?
Hi Koji,

Many thanks for your answer and help.

2018-04-26 17:48 GMT+02:00 Koji Kawamura:
> I have two things in mind that could be related to your situation.
> One is NIFI-4971, which is currently under review. It fixes a lineage
> reporting issue when the 'complete path' strategy is used.
> If you are using 'complete path', I'd recommend trying 'simple path'
> to see if that's the case.

I was not aware of that one. Thanks for the info, and for the Kafka lag tip too.

After investigation, it looks like my problem was simpler. I was running:

a) NiFi 1.5 on my computer;
b) HDP (with Atlas and Kafka inside) in a VM (the HDP sandbox), where a lot of processes (all of them?) also run inside a container.

Kafka listens on port 6667 (the default) inside the container, but at first I didn't notice that the container does NOT expose that inner port 6667 outside (that is, on the VM).

Because of (b), as far as I understand, the messages sent by NiFi could not reach Kafka, and therefore could not reach Atlas.

When my non-sandbox environment (coming soon) is available, I will run more NiFi/Atlas integration tests. So far, the HDP sandbox has been a pain to use (because the inner port is not exposed).

Thanks.

Regards,
Dominique
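Since the root cause turned out to be a Kafka port that was not exposed outside the sandbox container, a quick reachability test run from the NiFi host can catch this early. The sketch below is a minimal illustration using only Python's standard library; the function name `port_is_reachable` is hypothetical, and a successful connection only shows the port is open from that machine, not that the rest of the Kafka client configuration is correct.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Run this from the machine hosting NiFi. A True result only proves the
    port is reachable from here; it does not validate the Kafka setup itself.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_is_reachable("<sandbox-host>", 6667)` returning False from the NiFi machine would match the symptom described above, where messages never reached Kafka.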
Re: Problem with NiFi / Atlas integration - does anyone have experience with this integration?
Hi Dominique,

Thank you for your interest in the NiFi and Atlas integration. I have some experience with it, and I actually wrote the NiFi reporting task.

I have two things in mind that could be related to your situation.

One is NIFI-4971, which is currently under review. It fixes a lineage reporting issue when the 'complete path' strategy is used. If you are using 'complete path', I'd recommend trying 'simple path' to see if that's the case.

The other is Atlas not being able to catch up fast enough to consume all messages from the Kafka topic. This happens when lots of messages are sent to the Atlas hook topic from NiFi, particularly when many different files are written to or retrieved from the file system and NiFi reports them, since those entities are reported individually.

The following command can be helpful to see how Atlas consumes messages. If there is a lot of LAG, those messages are still waiting to be consumed and processed by Atlas.

    # Sometimes the Atlas consumer is not catching up, and entities are not
    # created even if NiFi reported them as expected
    $KAFKA_HOME/bin/kafka-consumer-groups.sh --bootstrap-server server:port --describe --group atlas

    GROUP  TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   OWNER
    atlas  ATLAS_HOOK  0          24944           31897           6953

Thanks,
Koji

On Thu, Apr 26, 2018 at 6:50 PM, Dominique De Vito wrote:
> So my questions:
> 1) Has anyone run into such problems?
> 2) Has anyone had (good) experience integrating NiFi with Atlas?
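In the sample output above, LAG is simply LOG-END-OFFSET minus CURRENT-OFFSET (31897 - 24944 = 6953 messages not yet consumed by Atlas). When the topic has several partitions, it can help to add the lag up per topic. The sketch below is a hypothetical helper that parses the whitespace-separated table printed by `kafka-consumer-groups.sh --describe`; it assumes the header row contains `TOPIC` and `LAG` and that no column before `LAG` is blank, which holds for the sample shown but may differ across Kafka versions.

```python
def total_lag(describe_output: str, topic: str = "ATLAS_HOOK") -> int:
    """Sum the LAG column for `topic` in `kafka-consumer-groups.sh --describe` output.

    Assumes a whitespace-separated table with a header row naming TOPIC and LAG.
    Rows whose LAG is not a number (e.g. "-") are skipped.
    """
    lines = [ln for ln in describe_output.splitlines() if ln.strip()]
    # Locate the header row by looking for a standalone "LAG" token.
    header_idx = next((i for i, ln in enumerate(lines) if "LAG" in ln.split()), None)
    if header_idx is None:
        raise ValueError("no header row with a LAG column found")
    header = lines[header_idx].split()
    topic_col, lag_col = header.index("TOPIC"), header.index("LAG")

    total = 0
    for row in lines[header_idx + 1:]:
        fields = row.split()
        # Skip short rows and rows for other topics.
        if len(fields) <= lag_col or fields[topic_col] != topic:
            continue
        if fields[lag_col].isdigit():
            total += int(fields[lag_col])
    return total
```

One way to use it is to pass the captured stdout of the command above, for instance via `subprocess.run([...], capture_output=True, text=True).stdout`; a total that stays high or keeps growing suggests Atlas is falling behind.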
Problem with NiFi / Atlas integration - does anyone have experience with this integration?
Hi,

I have defined a simple pipeline in NiFi:

GetFile => some processor doing a dummy transformation => PublishInKafka

...with Atlas integration for lineage purposes.

Versions:
-- Atlas 0.8.0 (stack: HDP 2.6.4)
-- NiFi 1.5.0

I put a (dummy) file into the input directory, and it went all the way to the end of the pipeline.

Results:

* a "nifi_flow" entity and a "nifi_flow_path" entity were defined in Atlas <= good

* PROBLEM_1: the "nifi_flow_path" entity has neither inputs nor outputs.

But I see a trace in the NiFi logs stating that NiFi has sent an "ENTITY_PARTIAL_UPDATE" JSON message to the Atlas hook topic, with the correct input and output.

So something seems to be broken in the NiFi<=>Atlas link, or within Atlas.

* PROBLEM_2 (Atlas-related): when I use the GUI, Atlas says it can't find the "nifi_flow" entity, while it's available through the REST API:

2018-04-24 05:48:14,317 ERROR - [pool-2-thread-5 - 3076c14e-9bb4-44a7-8299-d56476f3ec89:] ~ graph rollback due to exception AtlasBaseException:Instance nifi_flow with unique attribute {qualifiedName=76d4acd9-0162-1000-257a-7393e17b3a16@mycluster5} does not exist (GraphTransactionInterceptor:73)

So my questions:

1) Has anyone run into such problems?

2) Has anyone had (good) experience integrating NiFi with Atlas?

Thanks.

Dominique
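To compare what the GUI shows with what the REST API returns (PROBLEM_2), and to check whether a "nifi_flow_path" entity really has empty inputs and outputs (PROBLEM_1), one option is to look the entity up by its qualifiedName. The sketch below assumes the Atlas v2 endpoint `GET /api/atlas/v2/entity/uniqueAttribute/type/{typeName}?attr:qualifiedName=...`, HTTP basic authentication, and a response whose top-level `entity` object carries an `attributes` map; the helper names, the host, and port 21000 are illustrative assumptions, not taken from the thread.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def entity_by_qualified_name_url(atlas_base: str, type_name: str, qualified_name: str) -> str:
    """Build an (assumed) Atlas v2 URL that looks an entity up by qualifiedName."""
    path = f"/api/atlas/v2/entity/uniqueAttribute/type/{quote(type_name, safe='')}"
    # '@' in qualifiedName (e.g. "<id>@mycluster5") is percent-encoded.
    return f"{atlas_base.rstrip('/')}{path}?attr:qualifiedName={quote(qualified_name, safe='')}"


def fetch_entity(url: str, user: str, password: str, timeout: float = 10.0) -> dict:
    """GET the URL with basic auth and return the decoded JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def inputs_and_outputs(entity_json: dict) -> tuple:
    """Return the (inputs, outputs) attributes, assuming an {"entity": {"attributes": ...}} shape."""
    attrs = entity_json.get("entity", {}).get("attributes", {})
    return attrs.get("inputs") or [], attrs.get("outputs") or []
```

For the entity in the log above, `entity_by_qualified_name_url("http://<atlas-host>:21000", "nifi_flow", "76d4acd9-0162-1000-257a-7393e17b3a16@mycluster5")` would produce the lookup URL; the same lookup with type `nifi_flow_path` and `inputs_and_outputs(...)` would show whether the partial update from NiFi was ever applied.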