Hi Minh,

The two settings are independent. You set them for different purposes.

The run schedule says how often the processor should run. A value of 0 means "run whenever there are resources free and it is my turn". Any other value means run at set intervals. A reason to use this is when you are waiting on an external resource and there is no point polling it as fast as possible. For example, if you are running a database query, you may want to run it at regular intervals to reduce stress on the database, rather than querying as fast as possible.

The run duration says how long the processor should run for once it is running. A value of 0 means process only one item from the queue (per concurrent task of the processor, per node in the cluster). A value of 25 ms means keep processing items until the 25 ms is up or the input queue is empty, whichever comes first. This tends to be used when you are fetching data from an external system such as a message queue and you want to make sure you have retrieved all of the data each time you poll it.

I also tend to use this when a processor has more work items flowing to it than other parts of the flow. For example, if the upstream processor splits a big file into many lines, then the downstream processor might have many times more items to process than the upstream one. If they just kept taking it in turns to process one item at a time, the second processor would never catch up. So setting a run duration is one tool you could use to ensure that the processor gets to process multiple items per tick.

I would say in your case you need to set both settings to 0.

Regards
Steve Hindmarch

From: e-soci...@gmx.fr <e-soci...@gmx.fr>
Sent: 08 November 2024 09:32
To: users@nifi.apache.org
Cc: users@nifi.apache.org
Subject: Re: Caused by: java.lang.OutOfMemoryError: Java heap space

Thanks a lot! You made my day with these videos.

[3] If the "Run Schedule" = 0 seconds, we don't need to change the "Run Duration" value, right?
Thanks
Minh

Sent: Thursday, 7 November 2024 at 18:12
From: "Mark Payne" <marka...@hotmail.com>
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Caused by: java.lang.OutOfMemoryError: Java heap space

OK so given that, the issue is almost certainly because you’re promoting huge chunks of JSON into attributes using EvaluateJsonPath. You’ll want to avoid putting anything larger than a few hundred characters into attributes. Instead, lean into using Record-based processors in order to manipulate the contents of the FlowFiles as they are, without creating attributes from content. EvaluateJsonPath is helpful for creating attributes on small JSON fields so that you can perform routing, etc., but should not be used to create large attributes. [1]

I also see in your canvas that you have several load-balanced connections, which you should avoid [2].

Re: the relationship between “Run Schedule” and “Run Duration” - Run Schedule indicates how long to wait between triggering the Processor. Run Duration says how long to run the Processor each time it’s scheduled to run. So if Run Schedule = 5 seconds and Run Duration = 2 seconds, then the Processor will run for up to 2 seconds. Then it will not run again for 5 seconds. Then it will run for 2 seconds. Then it will do nothing for 5 seconds. In practice, Processors should almost always have a Run Schedule of 0 seconds except for source processors. See [3] for more details.

Thanks
-Mark

[1] https://www.youtube.com/watch?v=RjWstt7nRVY&t=187
[2] https://www.youtube.com/watch?v=by9P0Zi8Dk8
[3] https://www.youtube.com/watch?v=pZq0EbfDBy4

On Nov 7, 2024, at 3:49 AM, e-soci...@gmx.fr wrote:

Here is the configuration for EvaluateJsonPath and ReplaceText.

Another question, about "Run Schedule" and "Run Duration": separately, I know how each of them works, but how do they work together?
I mean, if "Run Schedule" is set to 0s and "Run Duration" is set to 2s, does it mean the processor is always running? How does one impact the other?

Thanks a lot
Minh

Sent: Wednesday, 6 November 2024 at 16:13
From: "Mark Payne" <marka...@hotmail.com>
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Caused by: java.lang.OutOfMemoryError: Java heap space

OK so the decompress should be CPU intensive but not heap/memory intensive. EvaluateJsonPath will potentially consume large amounts of heap as well, depending on how it’s configured. The ExecuteGroovyScript sounds like it would use very little. ReplaceText may well consume huge amounts of heap, depending on how it’s configured. Can you share how EvaluateJsonPath and ReplaceText are configured?

The idea that 16 GB of RAM is the max recommended for a JVM was true a while ago, but with modern JVMs you can go much higher. That said, given the flow described, 4 GB should be more than sufficient if properly configured.

Thanks
-Mark

On Nov 6, 2024, at 9:51 AM, e-soci...@gmx.fr wrote:

Thanks for the reply Mark,

The Groovy script is very simple:

    hexContent = flowFile.getAttribute('hexContent')
    hexContent = hexContent.decodeHex()
    outputStream.write(hexContent)

The question is how to process flowfiles as quickly as possible. If I upgrade the CPU to 8 per node, would it be possible to process fewer flowfiles at the same time but more flowfiles overall?

The main NiFi dataflow is:

* Uncompress incoming flowfiles (consumes CPU/heap, I suppose)
* ReplaceText (consumes heap)
* EvaluateJsonPath (consumes heap)
* ExecuteGroovyScript (consumes heap)

I read that 16 GB of RAM is the maximum recommended for a JVM and that adding more isn’t beneficial. Is that true, or can I increase it to 32 GB?
Regards
Minh

Sent: Wednesday, 6 November 2024 at 15:24
From: "Mark Payne" <marka...@hotmail.com>
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Caused by: java.lang.OutOfMemoryError: Java heap space

Hi Minh,

It is possible that the heap is being exhausted by EvaluateJsonPath if you are using it to add large JSON chunks as attributes. For example, if you’re creating an attribute from `$.` to put the entire JSON contents into attributes. Generally, attributes should be kept pretty small.

Otherwise, based on the flow described, the issue is almost certainly within the ExecuteGroovyScript. There, there’s not much guidance we can provide, as it’s running your own script. You’d need to understand what in your own script is using up all of the heap.

Thanks
-Mark

On Nov 6, 2024, at 4:26 AM, e-soci...@gmx.fr wrote:

Hello all,

We have a cluster with 10 nodes (4 CPU / 16 GB each) - NiFi 1.25 - jdk-11.0.19

We use this cluster to send data to a GCP bucket. The data is sent by other clusters, so we use S2S between them.

I can't determine where the issue is. This message could be raised by EvaluateJsonPath/ExecuteGroovyScript/UpdateAttribute.

We have around 100,000 flowfiles (160 GB of data).

We need to configure more than 1 task for each processor to run faster, but we always get this error.

<evaluateJsonPath.png><evaluateJsonPath2.png><out_of_memory.png><replaceText.png><replaceText2.png>
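
The Run Schedule / Run Duration behaviour described in the replies in this thread can be illustrated with a small toy model. This is only a sketch in Python, not NiFi code: `run_windows` and `backlog_after` are hypothetical helper names, the fixed 1 ms cost per item and the strict turn-taking between two processors are simplifying assumptions, and the real NiFi scheduler is more involved.

```python
def run_windows(run_schedule_s, run_duration_s, cycles):
    """Return (start, end) times in seconds of each run window.

    Follows the description in the thread: run for the run duration,
    then wait the run schedule before being triggered again.
    Assumption: every window uses its full run duration.
    """
    windows = []
    start = 0
    for _ in range(cycles):
        end = start + run_duration_s
        windows.append((start, end))
        start = end + run_schedule_s  # idle gap before the next trigger
    return windows


def backlog_after(turns, lines_per_file, run_duration_ms, item_cost_ms=1):
    """Return the downstream queue size after `turns` rounds of turn-taking.

    Toy model (assumptions, not NiFi internals):
    - each round, the upstream processor splits one file into
      `lines_per_file` items;
    - then the downstream processor gets one turn;
    - with run duration 0 it handles a single item per turn;
    - otherwise it keeps going until the run duration is used up or
      the queue is empty, at `item_cost_ms` per item.
    """
    backlog = 0
    for _ in range(turns):
        backlog += lines_per_file  # upstream's turn
        if run_duration_ms == 0:
            batch = 1
        else:
            batch = max(1, run_duration_ms // item_cost_ms)
        backlog -= min(backlog, batch)  # downstream's turn
    return backlog


if __name__ == "__main__":
    print(run_windows(5, 2, 3))
    print(backlog_after(turns=10, lines_per_file=20, run_duration_ms=0))
    print(backlog_after(turns=10, lines_per_file=20, run_duration_ms=25))
```

Under these assumptions, `run_windows(5, 2, 3)` gives run windows of 0-2 s, 7-9 s and 14-16 s, which matches the 5 second / 2 second example above, and the downstream backlog after 10 turns drops from 190 items to 0 when its run duration goes from 0 to 25 ms.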