Hello.
I'm testing on a VM with 8 vCPUs (E5606, 2.13 GHz) and 16 GB of RAM. The flow is just a GenerateFlowFile processor that sends data to an output port for Spark. On the NiFi side the performance is very good: I can generate a huge number of flow files.

My Spark job is configured as local[4] and uses 3 receivers. It just does a simple word count on the stream, with a streaming context batch interval of 2 seconds.

My test is simple: I generate about 200k messages (about 200 MB) from GenerateFlowFile without starting the output port, so that the data queues up. Then I stop the processor and start the output port and my Spark job (see attached file) with:

bin/spark-shell --master local[4] --packages "org.apache.nifi:nifi-spark-receiver:0.3.0" -i nifi.scala

With:

nifi.queue.swap.threshold=20000
nifi.swap.in.period=10 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I get the result shown in the "2015-10-08 17_07_04" screenshot file.

With:

nifi.queue.swap.threshold=20000
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I get the result shown in the "2015-10-08 17_13_28" screenshot.

With:

nifi.queue.swap.threshold=200000
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I get the result shown in the "2015-10-08 17_21_09" screenshot.

This raises several questions:
- Why does NiFi limit the number of flow files sent to at most the swap threshold?
- Why does NiFi wait for nifi.swap.in.period before sending a batch of flow files?

It's not that I'm unhappy with NiFi's performance or with the Spark receiver, but the configuration or the documentation should be clearer about tuning.

Regards.

________________________________
From: Bryan Bende <[email protected]>
Sent: Thursday, October 8, 2015 16:52
To: [email protected]
Subject: Re: Nifi & Spark receiver performance configuration

Hello,

When you say you were unhappy with the performance, can you give some more information about what was not performing well?

Was the NiFi Spark Receiver not pulling messages in fast enough, so that they were queuing up in NiFi?
Was NiFi not producing messages as fast as you expected?

What kind of environment were you running this in? All on a local machine for testing?

-Bryan

On Thu, Oct 8, 2015 at 6:52 AM, Aurélien DEHAY <[email protected]> wrote:

Hello.

I'm doing some experiments with Apache NiFi to see where we can use it.

One idea is to use NiFi to feed a Spark cluster, so I'm running a simple test (GenerateFlowFile => Spark output port, and a simple word count on the Spark side).

I was pretty unhappy with the performance out of the box, so I searched the net and found almost nothing.

I then looked at nifi.properties and found that some of the following properties have a huge impact on how many messages per second are delivered to Spark:

nifi.queue.swap.threshold=20000
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

The documentation seems unclear on this point for output ports. Does anyone have a pointer for me?

Thanks.

Aurélien.
nifi.scala
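The attachment itself is not reproduced in this message. For readers who want to repeat the test, here is a minimal sketch of what a script of this kind could look like, based only on the setup described in the thread (3 receivers, a 2-second batch interval, a simple word count). It is not the original nifi.scala. The NiFi URL and the output port name are placeholders chosen for illustration, and wordsOf is a hypothetical helper name. The script assumes it is loaded in spark-shell, which provides sc, with the nifi-spark-receiver package on the classpath.

```scala
// Sketch only: NOT the original attachment. Intended for spark-shell, which provides `sc`.
import java.nio.charset.StandardCharsets
import org.apache.nifi.remote.client.{SiteToSiteClient, SiteToSiteClientConfig}
import org.apache.nifi.spark.{NiFiDataPacket, NiFiReceiver}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// ASSUMPTION: the URL and the port name are placeholders; use your own NiFi
// instance and the name of your output port.
val config: SiteToSiteClientConfig = new SiteToSiteClient.Builder()
  .url("http://localhost:8080/nifi")
  .portName("Data For Spark")
  .buildConfig()

// Hypothetical helper: split the content of one flow file into words.
def wordsOf(content: Array[Byte]): Seq[String] =
  new String(content, StandardCharsets.UTF_8).split("\\s+").filter(_.nonEmpty).toSeq

// 2-second batch interval, as in the test described above.
val ssc = new StreamingContext(sc, Seconds(2))

// Three receivers pulling from the same output port, merged into one stream.
val streams: Seq[DStream[NiFiDataPacket]] =
  (1 to 3).map(_ => ssc.receiverStream(new NiFiReceiver(config, StorageLevel.MEMORY_ONLY)))
val packets: DStream[NiFiDataPacket] = ssc.union(streams)

// Word count per batch; print() shows the first few counts of each batch.
val counts = packets
  .flatMap(packet => wordsOf(packet.getContent))
  .map(word => (word, 1L))
  .reduceByKey(_ + _)
counts.print()

ssc.start()
```

Each receiver occupies one core, so with local[4] and 3 receivers only one core remains for processing the batches; that is worth keeping in mind when reading throughput numbers from a setup like this.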
