Re: problem with Nifi / Atlas integration - has anyone some experience with this integration ?

2018-05-03 Thread Dominique De Vito
Hi Koji,

Many thanks for your answer / help.

> I have two things in mind could be related to your situation.
> One is NIFI-4971, it's under being reviewed now. It fixes lineage
> reporting issue when 'complete path' strategy is used.
> If you are using 'complete path', I'd recommend trying 'simple path'
> to see if that's the case.

I was not aware of that one. Thanks for the info, and for the Kafka lag tip too.

After investigation, it looks like my problem was simpler.

I was running:

a) Nifi 1.5 on my computer

b) HDP (with Atlas + Kafka inside) in a VM (HDP sandbox), where many
processes (all?) also run inside a container. Kafka listens on port 6667
(the default) inside the container, but I didn't notice at first that the
container DOES NOT expose the inner port 6667 to the outside (that is, on
the VM).

Due to (b) - AFAIU - the messages sent by Nifi could not reach Kafka and
therefore could not reach Atlas.
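
For anyone hitting the same symptom, a quick way to rule this out is to test from the Nifi host whether the Kafka port can be opened at all. Below is a minimal sketch using only the Python standard library; the function name and the host name in the usage comment are hypothetical, and 6667 is simply the port mentioned above.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (hypothetical host name, run from the machine hosting Nifi):
#   port_is_reachable("sandbox.example", 6667)
```

A successful connection only shows that the port is exposed. Kafka clients also connect to the addresses the broker advertises, so those need to be resolvable and reachable from the Nifi host as well.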

When my non-sandbox environment (coming soon) is available, I will run
further Nifi/Atlas integration tests. So far, the HDP sandbox has been a
pain to use (because the inner ports are not exposed).

Thanks.

Regards,
Dominique





Re: problem with Nifi / Atlas integration - has anyone some experience with this integration ?

2018-04-26 Thread Koji Kawamura
Hi Dominique,

Thank you for your interest in NiFi and Atlas integration.
I have some experience with that, and actually wrote the NiFi reporting task.

I have two things in mind that could be related to your situation.
One is NIFI-4971, which is under review now. It fixes a lineage
reporting issue when the 'complete path' strategy is used.
If you are using 'complete path', I'd recommend trying 'simple path'
to see if that's the case.

The other one is Atlas not being able to catch up fast enough to
consume all messages from the Kafka topic.
This happens when lots of messages are sent to the Atlas hook topic
from NiFi. It is seen particularly when many different files are written
to or retrieved from the file system and NiFi tries to report them, as
those entities are reported individually.
The following command can be helpful to see how Atlas consumes messages.
If there is a lot of LAG, those messages are still waiting to be consumed
and processed by Atlas.

# Sometimes Atlas consumer is not catching up and entities are not
# created even if NiFi reported as expected
KAFKA_HOME/bin/kafka-consumer-groups.sh --bootstrap-server server:port --describe --group atlas
GROUP  TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   OWNER
atlas  ATLAS_HOOK  0          24944           31897           6953
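
If you want to watch the lag over time rather than read the table by eye, the output can be parsed with a few lines of Python. This is only a sketch: the function name is made up, it assumes each table row arrives on a single line, and it reads the column positions from the header row because the layout differs between Kafka versions.

```python
def parse_consumer_lag(output: str) -> dict:
    """Map (topic, partition) -> lag from `kafka-consumer-groups.sh --describe` text."""
    lags = {}
    header = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row names the columns; remember their order.
        if "TOPIC" in fields and "LAG" in fields:
            header = fields
            continue
        if header is None:
            continue
        row = dict(zip(header, fields))
        try:
            lags[(row["TOPIC"], int(row["PARTITION"]))] = int(row["LAG"])
        except (KeyError, ValueError):
            # Skip rows without a numeric lag (for example "-").
            continue
    return lags


# Sample taken from the output shown above.
SAMPLE = """\
GROUP  TOPIC       PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG   OWNER
atlas  ATLAS_HOOK  0          24944           31897           6953
"""

lag_by_partition = parse_consumer_lag(SAMPLE)
total_lag = sum(lag_by_partition.values())
```

With the sample above, the lag for ATLAS_HOOK partition 0 is 6953, which is LOG-END-OFFSET minus CURRENT-OFFSET (31897 - 24944).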

Thanks,
Koji




problem with Nifi / Atlas integration - has anyone some experience with this integration ?

2018-04-26 Thread Dominique De Vito
Hi,

I have defined a simple pipeline in Nifi:

GetFile => some processor doing a dummy transformation => PublishInKafka

...with Atlas integration for lineage purposes

Versions:
-- Atlas 0.8.0 (Stack : HDP 2.6.4)
-- Nifi 1.5.0

and I put a (dummy) file into the input directory, and it went all the
way to the end of the pipeline.

Results:

* a "nifi_flow" entity and a "nifi_flow_path" entity were defined in Atlas
<= good

* PROBLEM_1: the "nifi_flow_path" entity has neither input nor output.

But I see in the Nifi logs a trace stating that Nifi has sent an
"ENTITY_PARTIAL_UPDATE" JSON message to the Atlas HOOK topic, with correct
input and output.

So, something looks broken in the Nifi<=>Atlas link, or within Atlas.

* PROBLEM_2 (but Atlas related): when I use the GUI, Atlas says it can't
find the "nifi_flow" entity, although it's available through the REST API:

2018-04-24 05:48:14,317 ERROR - [pool-2-thread-5 - 3076c14e-9bb4-44a7-8299-d56476f3ec89:] ~ graph rollback due to exception AtlasBaseException:Instance nifi_flow with unique attribute {qualifiedName=76d4acd9-0162-1000-257a-7393e17b3a16@mycluster5} does not exist (GraphTransactionInterceptor:73)
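
To cross-check the GUI against the REST API, the type name and qualifiedName can be pulled out of the error above and turned into a lookup URL. The sketch below is an illustration under assumptions: the helper name and the base URL (localhost, port 21000) are placeholders, and it assumes the Atlas v2 unique-attribute endpoint is available in the deployed version.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Matches the "Instance <type> with unique attribute {qualifiedName=<value>}"
# part of the log message, tolerating line wraps between the words.
_INSTANCE_RE = re.compile(
    r"Instance\s+(\w+)\s+with\s+unique\s+attribute\s+\{qualifiedName=([^}]+)\}"
)


def entity_lookup_url(log_text: str, atlas_base: str) -> Optional[str]:
    """Build a unique-attribute lookup URL from an Atlas error message."""
    match = _INSTANCE_RE.search(log_text)
    if match is None:
        return None
    type_name, qualified_name = match.groups()
    # Keep ':' and '@' readable; other special characters are percent-encoded.
    query = urlencode({"attr:qualifiedName": qualified_name}, safe=":@")
    return (
        f"{atlas_base.rstrip('/')}/api/atlas/v2/entity/uniqueAttribute"
        f"/type/{type_name}?{query}"
    )


LOG = (
    "graph rollback due to exception AtlasBaseException:Instance nifi_flow\n"
    "with unique attribute\n"
    "{qualifiedName=76d4acd9-0162-1000-257a-7393e17b3a16@mycluster5}\n"
    "does not exist (GraphTransactionInterceptor:73)"
)

# Assumed base URL; replace with the real Atlas host and port.
url = entity_lookup_url(LOG, "http://localhost:21000")
```

Requesting the resulting URL with valid Atlas credentials (for example with curl) shows whether the entity named in the error can be fetched by its qualifiedName.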


So my questions:

1) Has anyone encountered such problems?

2) Has anyone had some (good) experience integrating Nifi with Atlas?

Thanks.

Dominique