Got it - yep, focusing on the flow design itself is where I'd go for now.
Consider every configuration that could, in the case of errors or unexpected
data, allow things to leave the flow.  Provenance data can be valuable in
reviewing this.  Add steps to the flow that extract metadata provenance can
index; then when you find a missing event you can likely search for it.
There are a lot of techniques to help hunt such things down, but in the vast
majority of cases the cause is a relationship that routes to auto-terminate,
or to somewhere other than the path that ends at the destination.
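
As a rough sketch of the provenance idea: pull the provenance events for the
time window in question and look for flowfiles that were dropped without ever
being sent anywhere.  RECEIVE, SEND and DROP are NiFi provenance event types;
the dict layout, the key names, the component names in the sample data and the
helper name are assumptions for illustration, and the sketch ignores
FORK/CLONE lineage and assumes the destination processor reports a SEND event.

```python
from collections import defaultdict


def dropped_without_send(events):
    """Map flowfile UUID -> component that dropped it, for flowfiles
    that have a DROP event but no SEND event."""
    types_by_uuid = defaultdict(set)
    drop_component = {}
    for event in events:
        uuid = event["flowFileUuid"]
        types_by_uuid[uuid].add(event["eventType"])
        if event["eventType"] == "DROP":
            drop_component[uuid] = event.get("componentName", "unknown")
    return {
        uuid: drop_component[uuid]
        for uuid, types in types_by_uuid.items()
        if "DROP" in types and "SEND" not in types
    }


# Hypothetical sample data; real events would come from a provenance query.
sample_events = [
    {"flowFileUuid": "ff-1", "eventType": "RECEIVE", "componentName": "ConsumeKafka_2_6"},
    {"flowFileUuid": "ff-1", "eventType": "SEND", "componentName": "PutMongo"},
    {"flowFileUuid": "ff-1", "eventType": "DROP", "componentName": "PutMongo"},
    {"flowFileUuid": "ff-2", "eventType": "RECEIVE", "componentName": "ConsumeKafka_2_6"},
    {"flowFileUuid": "ff-2", "eventType": "DROP", "componentName": "EvaluateJsonPath"},
]

suspects = dropped_without_send(sample_events)
print(suspects)
```

A flowfile with both SEND and DROP is the normal case (delivered, then
auto-terminated), so only the DROP-without-SEND entries are worth a closer look.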

Thanks

On Thu, Aug 15, 2024 at 9:20 AM Chirthani, Deepak Reddy <
[email protected]> wrote:

> Hi Joe,
>
>
>
> Both the original and the duplicate flows are the same except for two
> changes: the Kafka group id and the target Mongo collection name. Each
> processor in the dataflow has all the relationships other than success
> connected to a funnel. Therefore, I should see data if any processor is
> routing to a relationship other than success, which I don’t see in either
> flow.
>
>
>
> I agree with you that it's generally unlikely there is data loss between
> Kafka and NiFi, but in this case I can clearly see it by querying both
> collections with the same eventid/transactionid.
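
The comparison described here can be made systematic with a set difference
over the IDs from the two collections.  This is only a minimal sketch: the ID
values are made up, and the helper name is hypothetical.  With pymongo the two
lists could come from something like `collection.distinct("eventid")`,
assuming that is the field name in your documents.

```python
def missing_ids(reference_ids, candidate_ids):
    """Return the IDs present in the reference set but absent from the candidate set."""
    return sorted(set(reference_ids) - set(candidate_ids))


# Hypothetical IDs: target2 is the duplicate flow, target1 the original flow.
target2_ids = ["evt-1", "evt-2", "evt-3", "evt-4"]
target1_ids = ["evt-1", "evt-3"]

missing = missing_ids(target2_ids, target1_ids)
print(missing)  # ['evt-2', 'evt-4']
```

The resulting list gives concrete IDs to search for in provenance on the
original flow.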
>
> *From:* Joe Witt <[email protected]>
> *Sent:* Thursday, August 15, 2024 11:11 AM
> *To:* [email protected]
> *Subject:* [EXTERNAL] Re: ConsumeKafka_2_6 Processor issue
>
>
>
>
>
> Hello
>
>
>
> The most likely scenario at play here is that configuration of the flow
> results in certain messages/events/flowfiles being routed to a failure path
> or some path that does not end up in Mongo.  It is highly unlikely there is
> loss between Kafka and NiFi and between NiFi and Mongo.  The more likely
> scenario is a configuration within the flow in nifi which directs certain
> data in certain conditions to be thrown out.
>
>
>
> Have you reviewed every possible relationship and how it is handled in the
> flow?
>
>
>
> Thanks
>
>
>
> On Thu, Aug 15, 2024 at 8:56 AM Chirthani, Deepak Reddy <
> [email protected]> wrote:
>
> Hi guys,
>
>
>
> I have a dataflow in a 3-node NiFi cluster reading from a Kafka topic and
> writing to a MongoDB collection *target1*. We are not filtering any
> messages in the dataflow. The group id for this consumer is *test1*, and
> this consumer has been active for *two years*.
>
> For at least a month, our business customers have been reporting that
> they are missing data in the target which they are sure they are publishing.
> The Kafka team even helped us find the message(s) on the Kafka brokers
> that the customers said they had published. So it's evident that the
> consumer is not picking them up.
>
>
> I then set up a new dataflow in NiFi duplicating the original dataflow,
> with two differences: a new group id *test2* for reading messages and a
> new target collection *target2* for writing the data. Apparently this
> duplicate dataflow is consuming the expected number of messages.
>
>
>
> Next, I changed the group id in the original dataflow from *test1* to
> *newgrouped*; the rest of the ConsumeKafka processor configuration remains
> the same, including the offset reset, which is latest. Both the original
> and duplicate dataflows have been running for quite some time, but the
> issue still exists with the original dataflow. The duplicate dataflow
> keeps working well, consuming the expected number of messages, then
> parsing and loading the processed data to the target.
>
>
>
> Please advise what could be the issue and how to resolve this.
>
>
>
> Additional note:
>
> Nifi Version: 1.21.0
>
> Number of concurrent threads on the consumekafka processor: 2
>
> Number of kafka partitions on the topic: 5
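
A quick sanity check on these numbers, as a sketch only: assuming the
processor is scheduled on all three nodes (not primary node only), the group
ends up with more consumers than partitions.  Kafka assigns each partition to
at most one consumer in a group, so the extra consumers sit idle.  The helper
name is hypothetical.

```python
def idle_consumers(nodes, concurrent_tasks, partitions):
    """Number of consumers in the group that cannot be assigned a partition."""
    consumers = nodes * concurrent_tasks
    return max(0, consumers - partitions)


# 3 nodes x 2 concurrent tasks = 6 consumers for 5 partitions.
idle = idle_consumers(nodes=3, concurrent_tasks=2, partitions=5)
print(idle)  # 1
```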
>
>
>
> Thanks
>
>
>
>
>
> The contents of this e-mail message and any attachments are intended
> solely for the addressee(s) and may contain confidential and/or legally
> privileged information. If you are not the intended recipient of this
> message or if this message has been addressed to you in error, please
> immediately alert the sender by reply e-mail and then delete this message
> and any attachments. If you are not the intended recipient, you are
> notified that any use, dissemination, distribution, copying, or storage of
> this message or any attachment is strictly prohibited.
>
