Hello,
I'm just bumping this thread; if anyone knows how to make Avro message
consumption faster, I would be grateful.
Some more info: when we switched from ConsumeKafka with JSONs to
ConsumeKafkaRecord with Avro messages, we experienced a serious slowdown
(several times slower). I can get more precise numbers on the slowdown, but my
question about a ConsumeKafka/MergeContent-based flow becomes even more
relevant to me.
Or maybe I'm doing something wrong that makes ConsumeKafkaRecord this much slower?
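For context on why our byte-concatenation attempt fails: to my understanding, each Kafka message written with the HWX content-encoded schema reference carries its own 13-byte header (1-byte protocol version, 8-byte schema id, 4-byte schema version) in front of the raw Avro payload, so concatenating messages produces neither a single framed message nor an Avro container file. A minimal Python sketch of that framing (the header layout is my assumption from the NiFi/Hortonworks registry docs; `frame`/`parse` are illustrative helpers, not NiFi code):

```python
import struct

# Assumed "HWX Content-Encoded Schema Reference" framing, per my reading of
# the NiFi HortonworksSchemaRegistry docs -- a 13-byte header per message:
#   1 byte  protocol version
#   8 bytes schema id      (big-endian long)
#   4 bytes schema version (big-endian int)

def frame(schema_id, schema_version, payload, protocol=1):
    """Prefix an Avro-encoded payload with the 13-byte HWX-style header."""
    return struct.pack(">bqi", protocol, schema_id, schema_version) + payload

def parse(message):
    """Split a framed message into (protocol, schema_id, version, payload)."""
    protocol, schema_id, version = struct.unpack(">bqi", message[:13])
    return protocol, schema_id, version, message[13:]

# Two framed messages, as ConsumeKafka would emit them (one per flowfile);
# the payload bytes here are just placeholders, not real Avro data.
m1 = frame(42, 1, b"\x02hi")
m2 = frame(42, 1, b"\x02yo")

# What MergeContent's binary concatenation produces:
merged = m1 + m2

# The merged bytes are not an Avro data file: a container file must start
# with the magic bytes b'Obj\x01' and embed the schema in its header, which
# is presumably why PutParquet's AvroReader cannot read the merged flowfile.
print(merged.startswith(b"Obj\x01"))  # → False
```

So a pure byte-for-byte merge would at minimum need the per-message headers stripped and a proper Avro file header written, which MergeContent alone does not do.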

BTW, I'm on NiFi 1.7.1.

Thank you,
Krzysztof Zarzycki


On Fri, Dec 7, 2018 at 10:24 PM Krzysztof Zarzycki <[email protected]>
wrote:

> Hi everyone,
> I think I have quite a standard problem and maybe the answer would be
> quick, but I can't find it on the internet.
> We have Avro messages in a Kafka topic, written with an HWX schema reference.
> We're able to read them with e.g. ConsumeKafkaRecord with an Avro reader.
>
> Now we would like to merge the smaller flowfiles into larger files, because
> we load these files into HDFS. Which combination of processors should we use
> to achieve this with the highest performance?
> Option 1: ConsumeKafkaRecord with AvroReader and AvroRecordSetWriter, then
> MergeRecord with AvroReader/AvroRecordSetWriter. It works and seems
> straightforward, but to me it looks like there are too many interpretations
> and rewrites of the records. Each record interpretation is an unnecessary
> cost of deserialization and then serialization through the Java heap.
>
> Option 2: somehow configure ConsumeKafka and MergeContent to do this? We
> used this combination for plain JSONs (with binary concatenation), but we
> can't get it right with Avro messages carrying a schema reference (the
> PutParquet processor can't read the merged files with an AvroReader). On the
> other hand, this should be the fastest option, as there is no data
> interpretation, just a byte-for-byte rewrite. Maybe we just haven't tried
> the right configuration combination?
>
> Or maybe other options?
>
> Thank you for any advice.
> Krzysztof
>
