Hello.

I'm testing on a VM with 8 vCPUs (E5606, 2.13 GHz) and 16 GB of RAM.


I just have a GenerateFlowFile processor which sends data to an output port for 
Spark. On this side the performance is very good: I can generate a huge number 
of flow files.

My Spark job is configured as local[4] and uses 3 receivers. It just does a 
simple word count on the stream, with a streaming context at 2 seconds.

My test is simple: I generate about 200k messages (about 200 MB) with 
GenerateFlowFile without starting the output port, so that the data queues up. 
Then I stop the processor and start the output port and my Spark job (see 
attached file) with:
 bin/spark-shell --master local[4] --packages "org.apache.nifi:nifi-spark-receiver:0.3.0" -i nifi.scala


With:
nifi.queue.swap.threshold=20000
nifi.swap.in.period=10 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I get the result shown in the 2015-10-08 17_07_04 screenshot.



With:
nifi.queue.swap.threshold=20000
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I get the result shown in the 2015-10-08 17_13_28 screenshot.


With:
nifi.queue.swap.threshold=200000
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I get the result shown in the 2015-10-08 17_21_09 screenshot.

Two questions:
- Why does NiFi limit the number of flow files sent to at most the swap threshold?
- Why does NiFi wait for swap.in.period before sending each batch of flow files?
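
To make these two questions concrete, here is a small Python sketch of the 
behaviour they assume. The function name and the model are only an 
illustration of my guess (at most one swap threshold's worth of flow files 
released per swap-in period, with the first batch already in memory). It is 
not taken from NiFi's code or documentation and may well be wrong.

```python
import math


def estimated_swap_wait_seconds(queued, swap_threshold, swap_in_period_s):
    """Rough time spent waiting on swap-in while draining a queue.

    ASSUMPTION (not confirmed NiFi behaviour): at most `swap_threshold`
    flow files are released per swap-in period, and the first batch is
    already in memory when the output port is started.
    """
    if queued <= swap_threshold:
        # Nothing would have been swapped out, so no swap-in wait is modelled.
        return 0.0
    batches = math.ceil(queued / swap_threshold)
    # Every batch after the first is assumed to wait one full swap-in period.
    return float((batches - 1) * swap_in_period_s)


# The three configurations tested above: (swap threshold, swap-in period in seconds).
configs = [(20000, 10), (20000, 1), (200000, 1)]
for threshold, period_s in configs:
    wait = estimated_swap_wait_seconds(200_000, threshold, period_s)
    print(f"threshold={threshold} swap.in.period={period_s}s -> ~{wait:.0f}s waiting on swap-in")
```

If this guess is right, the swap-in period, not Spark, would dominate the 
first configuration, which is why I think these settings deserve clearer 
documentation.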

It's not that I'm unhappy with NiFi's performance or the Spark receiver, but 
the configuration or the documentation should be clearer about tuning.

Regards.
________________________________
From: Bryan Bende <[email protected]>
Sent: Thursday, October 8, 2015 16:52
To: [email protected]
Subject: Re: Nifi & Spark receiver performance configuration

Hello,

When you say you were unhappy with the performance, can you give some more 
information about what was not performing well?

Was the NiFi Spark Receiver not pulling messages in fast enough and they were 
queuing up in NiFi?
Was NiFi not producing messages as fast as you expected?
What kind of environment were you running this in? All on a local machine for 
testing?

-Bryan

On Thu, Oct 8, 2015 at 6:52 AM, Aurélien DEHAY <[email protected]> wrote:

Hello.

I'm doing some experiments with Apache NiFi to see where we can use it.

One idea is to use NiFi to feed a Spark cluster, so I'm running a simple test 
(GenerateFlowFile => Spark output port, and a simple word count on the Spark 
side).

I was pretty unhappy with the performance out of the box, so I searched the 
net and found almost nothing.

I then looked at nifi.properties and found that some of the following 
properties have a huge impact on how many messages per second are delivered to 
Spark:

nifi.queue.swap.threshold=20000
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

The documentation seems unclear on this point for output ports. Does anyone 
have a pointer for me?

Thanks.

Aurélien.

Attachment: nifi.scala
Description: nifi.scala
