Re: MergeRecord takes too much time

2024-03-12 Thread Chris Sampson
I’m not an expert with MergeRecord, but looking at your screenshots, I’d guess that your setup is taking that long to reach one of the defined “maximum” settings, e.g. 2GB, 5,000,000 records, or 3600 seconds (1 hour). How large (number of records and content size in bytes) are the typical FlowF
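
As a rough illustration of the guess above, the sketch below estimates which of the three quoted maximums (2GB, 5,000,000 records, 3600 seconds) a steady input stream would hit first. It is a simplification, not MergeRecord's actual binning logic: the function name, the constant arrival rate, and reading "2GB" as 2 * 1024**3 bytes are all assumptions, and minimum settings are ignored.

```python
def first_limit_reached(records_per_sec, bytes_per_record,
                        max_bytes=2 * 1024**3,   # assumed meaning of "2GB"
                        max_records=5_000_000,
                        max_age_sec=3600):
    """Return (limit_name, seconds) for whichever maximum is reached first.

    Hypothetical helper: assumes records arrive at a constant rate and
    that a bin is released only when one of the maximums is reached.
    """
    seconds_to = {
        "size": max_bytes / (records_per_sec * bytes_per_record),
        "records": max_records / records_per_sec,
        "age": float(max_age_sec),
    }
    name = min(seconds_to, key=seconds_to.get)
    return name, seconds_to[name]


if __name__ == "__main__":
    # A slow stream of small records waits for the age limit.
    print(first_limit_reached(records_per_sec=100, bytes_per_record=200))
    # A faster stream reaches the record-count limit well before that.
    print(first_limit_reached(records_per_sec=10_000, bytes_per_record=200))
```

With low input rates the age limit dominates, which would match a merge that appears to take up to an hour.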

Re: MergeRecord performance

2020-06-01 Thread Robert R. Bruno
I have the back pressure object threshold set to 10 on that queue and my swap threshold is 20. I don't think the number of FlowFiles in that queue was very high when I had the issue, though, since the issue was now at UpdateRecord, after a MergeContent that greatly reduced the n

Re: MergeRecord performance

2020-06-01 Thread Mark Payne
Hey Robert, How big are the FlowFile queues that you have in front of your MergeContent/MergeRecord processors? Or, more specifically, what do you have configured for the back pressure threshold? I ask because there was a fix in 1.11.0 [1] that had to do with ordering when swapping and ensuring

Re: MergeRecord performance

2020-05-22 Thread Robert R. Bruno
Sorry, one other thing I thought of that may help. I noticed on 1.11.4 that when I stopped the UpdateRecord processor, it took a long time to stop (threads were hanging), but when I went back to 1.9.2 the processor stopped promptly. Not sure if that

Re: MergeRecord performance

2020-05-22 Thread Robert R. Bruno
I have more updates on this. Yesterday I again attempted to upgrade one of our 1.9.2 clusters, which is now using MergeContent instead of MergeRecord. The flow had been running on 1.9.2 for about a week with no issues. I did the upgrade to 1.11.4 and saw about 3 of 10 nodes unable to keep up. The

Re: MergeRecord performance

2020-05-08 Thread Robert R. Bruno
Sorry for the delayed answer, but I was doing some testing this week and found out a few more things. First, to answer some of your questions: I would say, with no actual raw numbers, that it was worse than a 10% degradation. I say this since the flow was badly backing up, and a 10% decrease in performan

Re: MergeRecord performance

2020-04-27 Thread Mark Payne
Robert, What kind of performance degradation were you seeing here? I put together some simple flows to see if I could reproduce using 1.9.2 and current master. My flow consisted of GenerateFlowFile (generating 2 CSV rows per FlowFile) -> ConvertRecord (to Avro) -> MergeRecord (read Avro, write A

Re: MergeRecord performance

2020-04-24 Thread Robert R. Bruno
Joe, In that part of the flow, we are using Avro readers and writers. We are using Snappy compression (which could be part of the problem). Since we are using Avro at that point, the embedded schema is used by the reader, and the writer uses the schema name property along with an interna

Re: MergeRecord performance

2020-04-24 Thread Joe Witt
Robert, Can you please detail the record readers and writers involved and how schemas are accessed? There can be very important performance-related changes in the parsers/serializers of the given formats. And we've added a lot to make schema caching really capable, but you have to opt into it. I

Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

2019-10-20 Thread wangl...@geekplus.com.cn
be balanced to the same node. The order in which ProcessorB receives them will probably not be the same as the order in which ProcessorA emitted them, and the order is nondeterministic. Thanks, Lei wangl...@geekplus.com.cn From: Koji Kawamura Date: 2019-10-20 18:02 To: users Subject: Re: Re: MergeRecord can not guarantee the

Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

2019-10-20 Thread Koji Kawamura
ngl...@geekplus.com.cn > > > From: wangl...@geekplus.com.cn > Date: 2019-10-16 10:21 > To: dev; users > CC: dev > Subject: Re: Re: MergeRecord can not guarantee the ordering of the input > sequence? > Hi Koji, > Actually I have set all connections to FIFO and concurrency t

Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

2019-10-18 Thread wangl...@geekplus.com.cn
It seems this is caused by the load-balance strategy in use; load balancing does not guarantee the order. Thanks, Lei wangl...@geekplus.com.cn From: wangl...@geekplus.com.cn Date: 2019-10-16 10:21 To: dev; users CC: dev Subject: Re: Re: MergeRecord can not guarantee the ordering of the input

Re: Re: MergeRecord can not guarantee the ordering of the input sequence?

2019-10-15 Thread wangl...@geekplus.com.cn
This is nondeterministic. I think I should look at the MergeRecord code and do further debugging. Thanks, Lei wangl...@geekplus.com.cn From: Koji Kawamura Date: 2019-10-16 09:46 To: users CC: dev Subject: Re: MergeRecord can not guarantee the ordering of the input sequence? Hi Lei, How about

Re: MergeRecord can not guarantee the ordering of the input sequence?

2019-10-15 Thread Koji Kawamura
Hi Lei, How about setting the FIFO prioritizer on all the connections preceding MergeRecord? Without any prioritizer set, FlowFile ordering is nondeterministic. Thanks, Koji On Tue, Oct 15, 2019 at 8:56 PM wangl...@geekplus.com.cn wrote: > > > If FlowFile A, B, C enter the MergeReco

Re: MergeRecord, queue & backpressure

2018-04-13 Thread Juan Sequeiros
Good afternoon, Another thing that may help you out: you can also tweak the nifi.properties setting: nifi.queue.swap.threshold=2 This setting controls the maximum FlowFile count on a connection; if it is exceeded, the excess FlowFiles are flushed to disk. I am not sure however there is

Re: MergeRecord, queue & backpressure

2018-04-13 Thread Mark Payne
Aurélien, In that case you're looking to merge about 500,000 FlowFiles into a single FlowFile, so you'll definitely want to use a cascading approach. I'd shoot for about 1 MB for the first MergeRecord and then merge 128 of those together for the second MergeRecord. The provenance backpressure i
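
The cascading approach suggested above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function and field names are made up, and the 256-byte average FlowFile size is an assumed value (the thread gives only the roughly 500,000 FlowFile count, the 1 MB first-stage target, and the fan-in of 128).

```python
from dataclasses import dataclass


@dataclass
class CascadePlan:
    per_first_bin: int         # FlowFiles merged into each first-stage output
    first_stage_outputs: int   # FlowFiles leaving the first MergeRecord
    second_stage_outputs: int  # FlowFiles leaving the second MergeRecord


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def cascade_plan(total_flowfiles: int, avg_bytes: int,
                 first_stage_bytes: int = 1024 * 1024,
                 second_stage_fan_in: int = 128) -> CascadePlan:
    """Estimate the output counts of a two-stage merge (hypothetical helper)."""
    per_first_bin = max(1, first_stage_bytes // avg_bytes)
    first_out = ceil_div(total_flowfiles, per_first_bin)
    second_out = ceil_div(first_out, second_stage_fan_in)
    return CascadePlan(per_first_bin, first_out, second_out)


if __name__ == "__main__":
    # avg_bytes=256 is an assumption made for this example only.
    print(cascade_plan(total_flowfiles=500_000, avg_bytes=256))
```

The point of the two stages is that neither processor ever has to hold hundreds of thousands of FlowFiles in a single bin: the first stage handles a few thousand small FlowFiles per bin, and the second handles at most 128 larger ones.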

RE: MergeRecord

2018-04-13 Thread DEHAY Aurelien
-Original Message- From: Koji Kawamura [mailto:ijokaruma...@gmail.com] Sent: Friday, 13 April 2018 09:20 To: users@nifi.apache.org Subject: Re: MergeRecord Hi, Just FYI, when I replaced the schema doc comment using UpdateAttribute, I was able to merge records. ${inferred.avro.schema:replaceAll

Re: MergeRecord

2018-04-13 Thread Koji Kawamura
Hi, Just FYI, when I replaced the schema doc comment using UpdateAttribute, I was able to merge records. ${inferred.avro.schema:replaceAll('"Type inferred from [^"]+"', '""')} I looked at InferAvroSchema and the underlying Kite source code, but there's no option to suppress the doc comment when inferring
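
To see what that replaceAll expression does, here is a minimal Python sketch that applies the same regular expression outside NiFi. The two sample schemas and the helper name are hypothetical; they assume the inferred doc text embeds a sample value, so that otherwise identical schemas differ only in their doc strings.

```python
import json
import re

# Same pattern as in the Expression Language call quoted above.
DOC_PATTERN = re.compile(r'"Type inferred from [^"]+"')


def strip_inferred_docs(schema_text: str) -> str:
    """Replace every inferred doc string with an empty string literal."""
    return DOC_PATTERN.sub('""', schema_text)


def sample_schema(sample_value: str) -> str:
    """Build a made-up inferred schema whose doc mentions a sample value."""
    return json.dumps({
        "type": "record",
        "name": "example",
        "doc": "Schema generated by Kite",
        "fields": [
            {"name": "id", "type": "long",
             "doc": f"Type inferred from '{sample_value}'"},
        ],
    })


if __name__ == "__main__":
    a, b = sample_schema("1"), sample_schema("2")
    print(a == b)                                            # schemas differ
    print(strip_inferred_docs(a) == strip_inferred_docs(b))  # now identical
```

Once the doc strings are blanked, the schema text no longer varies from FlowFile to FlowFile, which is consistent with the report that the records could then be merged.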

Re: MergeRecord

2018-04-13 Thread Koji Kawamura
Hi, I've tested the InferAvroSchema and MergeRecord scenario. As you described, records are not merged as expected. The reason in my case is that InferAvroSchema generates schema text like this: inferred.avro.schema { "type" : "record", "name" : "example", "doc" : "Schema generated by Kite", "fields" : [

Re: MergeRecord

2018-04-12 Thread DEHAY Aurelien
Hello. Thanks for the answer. The 20k is just the last test; I’ve tested with 100 and 1000, with an input queue of 10k, and it doesn’t change anything. I will try to simplify the test case and not use the inferred schema. Regards > On 13 Apr 2018, at 04:50, Koji Kawamura wrote: > > He

Re: MergeRecord

2018-04-12 Thread Koji Kawamura
Hello, I checked your template. I haven't run the flow since I don't have sample input XML files. However, when I looked at the MergeRecord processor configuration, I found that: Minimum Number of Records = 2 Max Bin Age = 10 sec From a brief look at the MergeRecord source code, it expires a bin th