Hi Team,
I am doing some performance testing in NiFi. WorkFlow is GetSFTP -> update ->
PutKafka. I want to tune my setup to achieve high throughput without much
queuing.
But my average throughput drops during FlowFile checkpointing. I
believe stop-the-world is happening during that ti
What are you retrieving (particularly size) and what happens in the
"update" step?
Thanks,
Mike
On Wed, Jun 13, 2018 at 4:10 AM V, Prashanth (Nokia - IN/Bangalore) <
prashant...@nokia.com> wrote:
> Hi Team,
>
>
>
> I am doing some performance testing in NiFi. WorkFlow is *GetSFTP ->
> update ->
Hi Mike,
I am retrieving many small CSV files, each 1MB in size (total folder size around
100GB). In the update step, I am doing some enrichment on the ingress CSV. Anyway, my
flow doesn’t have anything to do with the stop-the-world time, right?
Can you please tell me about flowfile checkpointing related tunings
Hi,
What's the version of NiFi you're using?
What are the file systems you're using for the repositories?
I think that changing the heap won't make any difference in this case. I'd
keep it to something like 8GB (unless you're doing very specific stuff that
is memory-consuming) and let the remaini
Please find answers inline
Thanks & Regards,
Prashanth
From: Pierre Villard [mailto:pierre.villard...@gmail.com]
Sent: Wednesday, June 13, 2018 3:56 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Hi,
What's the version of NiFi you're using? 1.6.0
What are the
Relevant: http://www.idata.co.il/2016/09/moving-binary-data-with-kafka/
If you're throwing 1MB and bigger files at Kafka, that's probably where
your slowdown is occurring. Particularly if you're running a single node or
just two nodes. Kafka was designed to process extremely high volumes of
small
Hi Mike,
Thanks for the reply. Actually, we did all those optimisations with Kafka. I
am converting to Avro, and I also configured the Kafka producer properties accordingly.
I believe Kafka is not a bottleneck.
I am sure because I can see pretty good throughput with my flow. But average
throughput is
Prashanth - just out of curiosity could you share the average size of those
Avro files you are pushing to Kafka? It would be nice to know for some other
benchmark tests I am doing
Thanks,
Jeremy Dyer
Thanks - Jeremy Dyer
From: V, Prashanth (Nokia - IN/Bangalore)
Prasanth
I strongly recommend you reduce your JVM heap size for NiFi to 2 or 4
and no more than 8GB. The flow, well configured, will certainly not
need anywhere near that much and the more ram you give it the more
work GC has to do (some GCs are different and can be tuned/etc.. but
...that is for
Prashanth,
Whenever the FlowFile Repository performs a Checkpoint, it has to ensure that
it has flushed all data to disk
before continuing, so it performs an fsync() call so that any data buffered by
the Operating System is flushed
to disk as well. If you're using the same physical drive / physi
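
For illustration only, the cost of the flush described above can be observed with a small Java sketch. This is not NiFi's repository code: the class name FsyncSketch, the method writeAndSync, and the payload are hypothetical. The sketch relies on FileChannel.force(true), the standard Java call that asks the operating system to write a file's buffered data and metadata to the storage device, and it times only that call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncSketch {

    // Writes the bytes, then forces them to the storage device.
    // Returns the nanoseconds spent inside force(), the fsync-like call.
    public static long writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer); // may only land in the OS page cache
            }
            long start = System.nanoTime();
            channel.force(true); // blocks until data and metadata are flushed
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical payload; a real checkpoint would be far larger.
        Path file = Files.createTempFile("checkpoint-sketch", ".bin");
        try {
            byte[] data = "hypothetical checkpoint payload"
                    .getBytes(StandardCharsets.UTF_8);
            long nanos = writeAndSync(file, data);
            System.out.println("force() took " + nanos + " ns");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running this against a temporary file on each candidate drive gives a rough feel for how long a forced flush blocks there. It does not reproduce NiFi's checkpoint workload.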
Hi Jeremy,
With the built-in processor [UpdateRecord] and the controller services CsvReader &
AvroSetWriter, I can send an average of ~50MBps to Kafka. I also created a custom
processor for my business logic with internal Avro conversion (not using a
controller service); with it I can push an average of ~80Mbps.
Hi Martijn,
Can you share more about the details of what your DistributeLoad process group
is doing and how the 24 endpoints of the particular S3-compatible storage
service work? Are they fixed or could they change? Just hoping to understand
what are the constraints you have to work within.
Hi Kevin!
Thanks for your reply.
> Can you share more about the details of what your DistributeLoad process
> group is doing and how the 24 endpoints of the particular S3-compatible
> storage service work? Are they fixed or could they change? Just hoping to
> understand what are the constraints
Martijn,
Typically when I come across a set of processors like this, I go with an
approach like https://imgur.com/a/3Zh3FeN
So we have a DistributeLoad going to one of 24 different PutS3Object
processors. Each processor's 'failure'
relationship is then routed to a funnel, and that funnel just l
Hi Mark!
> Typically when I come across a set of processors like this, I go with an
> approach like https://imgur.com/a/3Zh3FeN
> So we have a DistributeLoad going to one of 24 different PutS3Object
> processors. Each processor's 'failure'
> relationship is then routed to a funnel, and that funnel
Martijn,
"As an aside, does DistributeLoad use backpressure to know what processor is /
is not available?"
- It depends on the value that you set for the Processor's "Distribution
Strategy." The default is
Round Robin, which means that if any of the connections applies Back Pressure,
then Distr
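
As a rough illustration of how a distribution strategy can interact with back pressure, here is a minimal Java sketch. It is not NiFi's DistributeLoad code. The class and method names are hypothetical, and the behaviour modelled is an assumption: a round-robin style strategy that waits whenever any destination reports back pressure, and a next-available style strategy that skips such destinations. Both methods assume a non-empty list of destinations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class DistributionSketch {

    // Round-robin style (assumed behaviour): distribute only when no
    // destination reports back pressure; otherwise return empty and wait.
    public static OptionalInt roundRobin(List<Boolean> backPressured, int lastIndex) {
        if (backPressured.contains(Boolean.TRUE)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of((lastIndex + 1) % backPressured.size());
    }

    // Next-available style (assumed behaviour): starting after lastIndex,
    // pick the first destination that is not under back pressure.
    public static OptionalInt nextAvailable(List<Boolean> backPressured, int lastIndex) {
        int n = backPressured.size();
        for (int step = 1; step <= n; step++) {
            int candidate = (lastIndex + step) % n;
            if (!backPressured.get(candidate)) {
                return OptionalInt.of(candidate);
            }
        }
        return OptionalInt.empty(); // every destination is full
    }

    public static void main(String[] args) {
        // Three hypothetical destinations; the middle one is under back pressure.
        List<Boolean> state = Arrays.asList(false, true, false);
        System.out.println("round robin:    " + roundRobin(state, 0));
        System.out.println("next available: " + nextAvailable(state, 0));
    }
}
```

With 24 destinations, the practical difference under these assumptions is that one slow endpoint stalls the first approach but not the second.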
Thanks for the additional details. It sounds like you have already explored
alternatives quite a bit and have found the best path. :) Looks like Mark has
some good advice for making this flow manageable, so if this is working for
you, I’d take his suggestions where it makes sense and run with it
Mark,
That sounds great, thanks!
On 13 June 2018 at 16:49, Mark Payne wrote:
> Martijn,
>
> "As an aside, does DistributeLoad use backpressure to know what processor
> is / is not available?"
> - It depends on the value that you set for the Processor's "Distribution
> Strategy." The default is
Joe,
Thanks for the reply. Please find the answers inline.
Thanks & Regards,
Prashanth
-----Original Message-----
From: Joe Witt [mailto:joe.w...@gmail.com]
Sent: Wednesday, June 13, 2018 6:04 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Prasanth
Mark,
Thanks for the reply. Please find the comments inline.
Thanks & Regards,
Prashanth
From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Wednesday, June 13, 2018 6:07 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Prashanth,
Whenever the FlowFile Reposito
Prashanth,
"Will it spread out the stop-the-world time across the intervals? In
that case, my average would fall to the same figures, right?"
It's hard to say - you'd have to give it a try and see if it improves. There
are a lot of different optimizations, both at the JVM
and the Operating Sy
Prashanth,
Also of note, are you actually updating any fields in the CSV that you receive
with UpdateRecord / your custom processor?
Or are you just using that to convert the CSV to Avro? If the latter, you can
actually just remove this processor from your flow
entirely and simply use PublishKaf
I am updating and adding a few fields in the CSV, hence I used UpdateRecord.
Thanks & Regards,
Prashanth
From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Wednesday, June 13, 2018 10:49 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Prashanth,
Also of note, are you
Hello Team,
We are using a TCP Processor to receive input from external systems.
We are frequently receiving a max connection timeout exception.
We feel the number of configured connections is more than our requirements.
Is there any way to monitor the open connections?
How we can know that open c