I am updating and adding a few fields in the CSV, hence I used UpdateRecord.
Thanks & Regards,
Prashanth
From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Wednesday, June 13, 2018 10:49 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Prashanth,
Also of note, are you actually updating any fields in the CSV that you receive
with UpdateRecord / your custom processor?
Or are you just using that to convert the CSV to Avro? If the latter, you can
actually just remove this processor from your flow
entirely and simply use
Prashanth,
"Will it spread out the stop-the-world time across the intervals? In that
case, my average would fall to the same figures, right?"
It's hard to say - you'd have to give it a try and see if it improves. There
are a lot of different optimizations, both at the JVM
and the Operating
Mark,
Thanks for the reply. Please find the comments inline.
Thanks & Regards,
Prashanth
From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Wednesday, June 13, 2018 6:07 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Prashanth,
Whenever the FlowFile
Joe,
Thanks for the reply. Please find the answers inline.
Thanks & Regards,
Prashanth
-----Original Message-----
From: Joe Witt [mailto:joe.w...@gmail.com]
Sent: Wednesday, June 13, 2018 6:04 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Prasanth
Mark,
That sounds great, thanks!
On 13 June 2018 at 16:49, Mark Payne wrote:
> Martijn,
>
> "As an aside, does DistributeLoad use backpressure to know what processor
> is / is not available?"
> - It depends on the value that you set for the Processor's "Distribution
> Strategy." The default is
Martijn,
"As an aside, does DistributeLoad use backpressure to know what processor is /
is not available?"
- It depends on the value that you set for the Processor's "Distribution
Strategy." The default is
Round Robin, which means that if any of the connections applies Back Pressure,
then
Thanks for the additional details. It sounds like you have already explored
alternatives quite a bit and have found the best path. :) Looks like Mark has
some good advice for making this flow manageable, so if this is working for
you, I’d take his suggestions where it makes sense and run with
Hi Mark!
> Typically when I come across a set of processors like this, I go with an
> approach like https://imgur.com/a/3Zh3FeN
> So we have a DistributeLoad going to one of 24 different PutS3Object
> processors. Each processor's 'failure'
> relationship is then routed to a funnel, and that funnel
Martijn,
Typically when I come across a set of processors like this, I go with an
approach like https://imgur.com/a/3Zh3FeN
So we have a DistributeLoad going to one of 24 different PutS3Object
processors. Each processor's 'failure'
relationship is then routed to a funnel, and that funnel just
Hi Kevin!
Thanks for your reply.
> Can you share more about the details of what your DistributeLoad process
> group is doing and how the 24 endpoints of the particular S3-compatible
> storage service work? Are they fixed or could they change? Just hoping to
> understand what are the constraints
Hi Martijn,
Can you share more about the details of what your DistributeLoad process group
is doing, and how the 24 endpoints of the particular S3-compatible storage
service work? Are they fixed, or could they change? Just hoping to understand
what constraints you have to work within.
Hi Jeremy,
With the built-in processor [UpdateRecord] and the controller services
CsvReader and AvroSetWriter, I can send an average of ~50 MB/s to Kafka. I also
created a custom processor for my business logic with internal Avro conversion
(not using a controller service); with it I can push an average of ~80 MB/s.
Prashanth,
Whenever the FlowFile Repository performs a Checkpoint, it has to ensure that
it has flushed all data to disk
before continuing, so it performs an fsync() call so that any data buffered by
the Operating System is flushed
to disk as well. If you're using the same physical drive /
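The flush-versus-fsync distinction Mark describes can be sketched in a few
lines of plain Python (the file name is hypothetical; this is not NiFi's
actual repository code): flush() only hands data to the operating system,
while os.fsync() forces the OS buffers down to the disk, and that second step
is what makes a checkpoint expensive.

```python
import os
import tempfile

# Sketch only: illustrates why a checkpoint must fsync() before continuing.
def durable_write(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)          # data may still sit in user-space buffers
        f.flush()              # hand the buffers to the OS page cache
        os.fsync(f.fileno())   # force the OS to write the page cache to disk

path = os.path.join(tempfile.mkdtemp(), "checkpoint.bin")
durable_write(path, b"flowfile-repo-checkpoint")
print(os.path.getsize(path))   # → 24
```

If the flowfile, content, and provenance repositories share one physical
drive, every such fsync competes for the same disk, which is why the drive
layout matters here.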
Prasanth
I strongly recommend you reduce your JVM heap size for NiFi to 2 or 4 GB,
and no more than 8 GB. The flow, well configured, will certainly not
need anywhere near that much, and the more RAM you give it the more
work GC has to do (some GCs are different and can be tuned, etc., but
...that is
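Joe's heap advice maps onto NiFi's conf/bootstrap.conf. A sketch with a 4 GB
heap (the java.arg indices vary between installs, so check your own file
before editing):

```
# conf/bootstrap.conf -- JVM heap settings (argument numbers may differ)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```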
Prashanth - just out of curiosity, could you share the average size of those
Avro files you are pushing to Kafka? It would be nice to know for some other
benchmark tests I am doing.
Thanks,
Jeremy Dyer
Relevant: http://www.idata.co.il/2016/09/moving-binary-data-with-kafka/
If you're throwing 1MB and bigger files at Kafka, that's probably where
your slowdown is occurring. Particularly if you're running a single node or
just two nodes. Kafka was designed to process extremely high volumes of
small
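Kafka's defaults do assume smaller records; if ~1 MB messages are unavoidable,
the usual producer-side knobs are the following (standard Kafka producer
properties; the values are illustrative rather than tuned, and the broker's
max.message.bytes has to agree with them):

```
# producer.properties -- illustrative values only
# allow ~2 MB records (default max.request.size is ~1 MB)
max.request.size=2097152
# cheap compression shrinks large payloads on the wire
compression.type=lz4
# bigger batches and a short linger amortize per-request overhead
batch.size=262144
linger.ms=50
```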
Please find answers inline
Thanks & Regards,
Prashanth
From: Pierre Villard [mailto:pierre.villard...@gmail.com]
Sent: Wednesday, June 13, 2018 3:56 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification
Hi,
What's the version of NiFi you're using? 1.6.0
What are
Hi,
What's the version of NiFi you're using?
What are the file systems you're using for the repositories?
I think that changing the heap won't make any difference in this case. I'd
keep it to something like 8GB (unless you're doing very specific stuff that
is memory-consuming) and let the
Hi Mike,
I am retrieving many small CSV files, each 1 MB in size (total folder size
around ~100 GB). In the update step, I am doing some enrichment on the ingress
CSV. Anyway, my flow doesn't do anything with the stop-the-world time, right?
Can you please tell me about flowfile checkpointing related
What are you retrieving (particularly size) and what happens in the
"update" step?
Thanks,
Mike
On Wed, Jun 13, 2018 at 4:10 AM V, Prashanth (Nokia - IN/Bangalore) <
prashant...@nokia.com> wrote:
> Hi Team,
>
>
>
> I am doing some performance testing in NiFi. WorkFlow is GetSFTP ->
> update
Hi Team,
I am doing some performance testing in NiFi. The workflow is GetSFTP ->
update -> PutKafka. I want to tune my setup to achieve high throughput without
much queuing.
But my average throughput drops during the flowfile checkpointing period. I
believe stop-the-world is happening during that
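The checkpoint cadence itself is configurable in nifi.properties; a sketch
(the property name is standard, but the value here is illustrative, trading a
longer recovery time for fewer checkpoint pauses):

```
# nifi.properties -- flowfile repository checkpointing
# default is "2 mins"
nifi.flowfile.repository.checkpoint.interval=5 mins
```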
see https://imgur.com/c4MDP7o
The success routes are not yet in place - each PutS3Object needs to be
routed to a success-handling set of processors.
Thanks
Martijn
On 13 June 2018 at 08:42, Sivaprasanna wrote:
> Is it possible to share screenshots of the flow which feels cluttered? I
> have
Is it possible to share screenshots of the flow which feels cluttered? I
have a hard time picturing how the PutS3Objects are routed to failures and
successes. A picture would certainly help.
Thanks.
On Wed, Jun 13, 2018 at 12:07 PM, Martijn Dekkers
wrote:
> Thanks, I already use process groups
Thanks, I already use process groups specifically for the PutS3Object
processors. However, with 24 of those, all needing a failure and success
connection, this screen is very cluttered.
Thanks
Martijn
On 13 June 2018 at 08:30, Sivaprasanna wrote:
> Martijn,
>
> One clean up approach that comes
Martijn,
One clean-up approach that comes immediately to my mind is to use 'Process
Groups', which let you group together processors that perform a related
sequence of actions. You can think of them as 'functions' or 'methods' in
programming terms. And since you mentioned that you are
All,
I have a more general question. We will be uploading files to an S3
compatible storage system. In our case, this system presents 24 endpoints
to upload to. Given the volume of data we are sending to this device, we
want to avoid using a loadbalancer like HAProxy for some use-cases, to
avoid