Relevant: http://www.idata.co.il/2016/09/moving-binary-data-with-kafka/

If you're throwing 1MB and bigger files at Kafka, that's probably where
your slowdown is occurring. Particularly if you're running a single node or
just two nodes. Kafka was designed to process extremely high volumes of
small messages (tens of KB at most, not MB and certainly not GB). What you
can try is building an Avro schema for your CSV files and using
PublishKafkaRecord to break everything down into records that are an
appropriate fit for Kafka.
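
To illustrate the idea of breaking each file into Kafka-sized records, here is a standalone Python sketch (not NiFi code). It turns one CSV file into one small message per row and rejects any message over a byte budget. In NiFi this splitting would be handled by PublishKafkaRecord with a record reader and an Avro record writer. Here JSON is swapped in only to keep the example dependency-free, and `csv_to_records` and the 64 KiB `MAX_RECORD_BYTES` limit are hypothetical choices.

```python
import csv
import io
import json
from typing import List

# Hypothetical per-message budget, following the "tens of KB at most" guidance.
MAX_RECORD_BYTES = 64 * 1024


def csv_to_records(csv_text: str, max_bytes: int = MAX_RECORD_BYTES) -> List[bytes]:
    """Split one CSV file into one small serialized message per data row.

    Each row becomes a JSON object keyed by the header columns. JSON stands in
    for the Avro record writer you would configure in NiFi.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    messages = []
    for row in reader:
        payload = json.dumps(row, separators=(",", ":")).encode("utf-8")
        if len(payload) > max_bytes:
            # Fail loudly instead of silently sending an oversized message.
            raise ValueError(
                f"row ending at line {reader.line_num} is {len(payload)} bytes, "
                f"over the {max_bytes}-byte budget"
            )
        messages.append(payload)
    return messages


if __name__ == "__main__":
    sample = "id,name,value\n1,alpha,10\n2,beta,20\n"
    for message in csv_to_records(sample):
        print(message)
    # b'{"id":"1","name":"alpha","value":"10"}'
    # b'{"id":"2","name":"beta","value":"20"}'
```

Each 1MB file would then become many row-sized messages instead of one 1MB message.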

On Wed, Jun 13, 2018 at 6:38 AM V, Prashanth (Nokia - IN/Bangalore) <
prashant...@nokia.com> wrote:

> Please find answers inline
>
> Thanks & Regards,
>
> Prashanth
>
> *From:* Pierre Villard [mailto:pierre.villard...@gmail.com]
> *Sent:* Wednesday, June 13, 2018 3:56 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi Performance Analysis Clarification
>
> Hi,
>
> What's the version of NiFi you're using?  *1.6.0*
>
> What are the file systems you're using for the repositories? *Local RHEL
> file system (/home dir)*
>
> I think that changing the heap won't make any difference in this case. I'd
> keep it at something like 8GB (unless you're doing very specific things that
> are memory-consuming) and leave the rest to the OS and disk caching.
>
> *I think NiFi holds the snapshot map in memory. Since we are dealing with
> very large ingress data [I allocated 32GB out of 42GB to NiFi], I
> increased the heap. Does this have anything to do with the FlowFile
> checkpoint delay?*
>
> Pierre
>
> 2018-06-13 11:58 GMT+02:00 V, Prashanth (Nokia - IN/Bangalore) <
> prashant...@nokia.com>:
>
> Hi Mike,
>
> I am retrieving many small CSV files, each about 1MB in size (total folder
> size ~100GB). In the update step, I am doing some enrichment on the ingress
> CSV. My flow itself doesn't contribute to the *stop-the-world* time, right?
>
> Can you please tell me about FlowFile checkpointing-related tuning?
>
> Thanks & Regards,
>
> Prashanth
>
> *From:* Mike Thomsen [mailto:mikerthom...@gmail.com]
> *Sent:* Wednesday, June 13, 2018 2:33 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi Performance Analysis Clarification
>
> What are you retrieving (particularly size) and what happens in the
> "update" step?
>
> Thanks,
>
> Mike
>
> On Wed, Jun 13, 2018 at 4:10 AM V, Prashanth (Nokia - IN/Bangalore) <
> prashant...@nokia.com> wrote:
>
> Hi Team,
>
> I am doing some performance testing in NiFi. The workflow is *GetSFTP ->
> update -> PutKafka*. I want to tune my setup to achieve high throughput
> without much queuing.
>
> But my average throughput drops during FlowFile checkpointing. I believe a
> *stop-the-world* pause is happening during that time.
>
> I can read roughly 100MB/s from SFTP and send almost the same to Kafka. But
> every 2 minutes, the whole flow stops executing. See the logs below:
>
> 2018-06-13 13:24:21,160 INFO [pool-10-thread-1]
> o.a.n.c.r.WriteAheadFlowFileRepository *Initiating checkpoint of FlowFile
> Repository*
>
> 2018-06-13 13:24:49,420 INFO [Write-Ahead Local State Provider
> Maintenance] org.wali.MinimalLockingWriteAheadLog
> org.wali.MinimalLockingWriteAheadLog@cf82c58 checkpointed with 23 Records
> and 0 Swap Files in 39353 milliseconds (Stop-the-world time = 3
> milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID 68
>
> 2018-06-13 13:25:00,165 INFO [pool-10-thread-1]
> o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead Log with
> 7 Records and 0 Swap Files in 39002 milliseconds (Stop-the-world time =
> 28275 milliseconds), max Transaction ID 316705
>
> 2018-06-13 13:25:00,169 INFO [pool-10-thread-1]
> o.a.n.c.r.WriteAheadFlowFileRepository *Successfully checkpointed
> FlowFile Repository with 7 records in 39008 milliseconds*
>
> I think all processors go idle for 39 seconds ☹. Please advise how to tune
> this.
>
> I changed the heap to 32GB [I am testing on a 12-core, 48GB machine].
> I disabled content repository archiving. All other properties remain the same.
>
> Thanks & Regards,
>
> Prashanth
>
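
To put a number on the stall, here is a small Python sketch that pulls the timings out of write-ahead-log checkpoint lines like the ones quoted above and estimates average throughput if processing halts once per checkpoint interval. It assumes each log entry sits on a single line, as in nifi-app.log before the mail client wrapped it. The regex, `parse_checkpoint`, `effective_throughput`, and the 120-second interval (taken from "every 2 mins") are illustrative assumptions, not part of NiFi. The SequentialAccessWriteAheadLog entry attributes 28,275 ms of its 39,002 ms checkpoint to stop-the-world time, so the sketch uses that figure rather than the full 39 seconds.

```python
import re
from typing import NamedTuple, Optional

# Matches the write-ahead-log checkpoint lines quoted in this thread, e.g.
# "... checkpointed with 23 Records and 0 Swap Files in 39353 milliseconds
# (Stop-the-world time = 3 milliseconds, ...". Assumes one entry per line.
CHECKPOINT_RE = re.compile(
    r"[Cc]heckpointed.*? with (\d+) Records and (\d+) Swap Files "
    r"in (\d+) milliseconds \(Stop-the-world time = (\d+) milliseconds"
)


class Checkpoint(NamedTuple):
    records: int
    swap_files: int
    total_ms: int
    stop_the_world_ms: int


def parse_checkpoint(line: str) -> Optional[Checkpoint]:
    """Return the checkpoint timings from one log line, or None if absent."""
    match = CHECKPOINT_RE.search(line)
    if match is None:
        return None
    return Checkpoint(*(int(group) for group in match.groups()))


def effective_throughput(peak_mb_s: float, pause_ms: int, interval_ms: int) -> float:
    """Average MB/s if processing halts for pause_ms once every interval_ms.

    Simplifying assumption: the pause falls inside the interval and nothing
    is processed while it lasts.
    """
    return peak_mb_s * (interval_ms - pause_ms) / interval_ms


if __name__ == "__main__":
    line = (
        "2018-06-13 13:25:00,165 INFO [pool-10-thread-1] "
        "o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead Log "
        "with 7 Records and 0 Swap Files in 39002 milliseconds "
        "(Stop-the-world time = 28275 milliseconds), max Transaction ID 316705"
    )
    cp = parse_checkpoint(line)
    print(cp)
    # Checkpoint(records=7, swap_files=0, total_ms=39002, stop_the_world_ms=28275)
    print(effective_throughput(100, cp.stop_the_world_ms, 120_000))  # 76.4375
```

With the ~100MB/s peak rate from the thread, a 28-second pause every two minutes works out to roughly 76MB/s on average, under the assumptions stated in the docstring.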
