Re: Release from holding queue after timeout

2021-04-14 Thread Jeremy Pemberton-Pigott
Thanks Paul, that was an excellent idea. The line has been added to the Groovy
code that was already in the flow, and it works now :)

Jeremy

On Thu, Apr 15, 2021 at 12:44 AM Paul Kelly  wrote:

> Whatever processor is putting it into the queue could set a penalty on
> failed flow files that enter your retry queue.  It's the Penalty Duration
> setting on the processor's Settings tab.
>
> If you can't set it there for whatever reason, you could put an
> ExecuteScript processor within the retry loop with the following Groovy
> code in it, and set that ExecuteScript's penalty duration to 20s:
>
> flowFile = session.get()
> if (!flowFile) return
> // penalize() returns the updated FlowFile; transfer the returned reference
> flowFile = session.penalize(flowFile)
> session.transfer(flowFile, REL_SUCCESS)
>
> Paul
>
> On Wed, Apr 14, 2021 at 4:22 PM Jeremy Pemberton-Pigott <
> fuzzych...@gmail.com> wrote:
>
>> Yes it can. Is that an update attribute which can set that on the flow
>> file?
>>
>> Regards,
>>
>> Jeremy
>>
>>
>> On 15 Apr 2021, at 00:16, Paul Kelly  wrote:
>>
>> 
>> Are you able to penalize the flow file for 20s within the update retry
>> queue?  Penalizing without yielding would cause a flow file to sit in the
>> queue for 20s before the next processor will try acting on it.
>>
>> Paul
>>
>> On Wed, Apr 14, 2021 at 5:05 AM Jeremy Pemberton-Pigott <
>> fuzzych...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a parallel update process in 2 different flows running in a 3
>>> node cluster running 1.6.0.  If the insert side has not completed yet, the
>>> update side moves the flow file to a waiting queue for retry.  I want to
>>> retry every 20s all the flow files in the queue that have been waiting that
>>> long.  There may be tens of thousands waiting, so I don't want to do 1 flow
>>> file every 20s.  Any idea how I can achieve this?  I don't think I can use
>>> the yield mechanism, because after n retries it goes into a notification
>>> logging flow.  And wait/notify won't work, because the notification may
>>> appear before the update flow file arrives.
>>>
>>> Jeremy
>>>
>>


Re: Nifi throws an error when reading a large csv file

2021-04-14 Thread Mike Thomsen
I could be totally barking up the wrong tree, but I think this is our
clue: Requested array size exceeds VM limit

That means that something is causing the reader to try to allocate an
array with a number of entries greater than the VM allows.
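
For reference, and just to illustrate the limit itself rather than anything
NiFi-specific: HotSpot caps array length a little below Integer.MAX_VALUE
regardless of how much heap is configured, so a single oversized allocation
is enough to produce that exact message. A one-line Groovy sketch (assumes a
HotSpot JVM):

byte[] huge = new byte[Integer.MAX_VALUE]  // throws java.lang.OutOfMemoryError: Requested array size exceeds VM limit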

Without seeing the schema, a sample of the CSV, and a stack trace, it's
pretty hard to guess what's going on. For what it's worth, I've split
55GB JSON sets using a custom streaming JSON reader without a hiccup
on a NiFi instance with only 4-8GB of RAM allocated, so I'm fairly
confident we've got some quirky edge case here.

If you want to sanitize some inputs and share them along with a schema,
that might help.

On Wed, Apr 14, 2021 at 1:07 PM Vibhath Ileperuma
 wrote:
>
> Hi Chris,
>
> As you have mentioned, I am trying to split the large csv file in multiple
> stages, but this error is thrown at the first stage, before even a single
> flow file has been created.
> It seems the issue is not with the processor but with the CSV record
> reader: the error is thrown while reading the csv file. I also tried to
> write the data from the large csv file into a Kudu table using a PutKudu
> processor with the same CSV reader, and got the same error message.
>
> Hi Otto,
>
> Only the following information related to the exception is available in
> the log file:
>
> 2021-04-14 17:48:28,628 ERROR [Timer-Driven Process Thread-1] 
> o.a.nifi.processors.standard.SplitRecord 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] failed to process 
> session due to java.lang.OutOfMemoryError: Requested array size exceeds VM 
> limit; Processor Administratively Yielded for 1 sec: 
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> 2021-04-14 17:48:28,628 WARN [Timer-Driven Process Thread-1] 
> o.a.n.controller.tasks.ConnectableTask Administratively Yielding 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] due to uncaught 
> Exception: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> Thanks & Regards
>
> Vibhath Ileperuma
>
>
>
>
> On Wed, Apr 14, 2021 at 7:47 PM Otto Fowler  wrote:
>>
>> What is the complete stack trace of that exception?
>>
>> On Apr 14, 2021, at 02:36, Vibhath Ileperuma  
>> wrote:
>>
>> Requested array size exceeds VM limit
>>
>>


Re: InvokeHTTP hangs after several successful calls

2021-04-14 Thread jeanne-herndon


Upgrading NiFi from 1.11.4 to 1.13.2 resolved this problem for us.





Re: Nifi throws an error when reading a large csv file

2021-04-14 Thread Otto Fowler
It would be good to get the stack trace, or rather a fuller one, to see where
the array is being created.

How many columns does the file have?
How are you supplying the schema?
Which CSV parser have you configured?
Can you reproduce it with a smaller file (see the split command on Linux)?
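
For example (the file name and line count below are placeholders only):

split -l 1000000 large.csv chunk_

would break the file into roughly one-million-line pieces that could be run
through the same reader to see whether the error still occurs.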


> On Apr 14, 2021, at 13:07, Vibhath Ileperuma  
> wrote:
> 
> Hi Chris,
> 
> As you have mentioned, I am trying to split the large csv file in multiple
> stages, but this error is thrown at the first stage, before even a single
> flow file has been created.
> It seems the issue is not with the processor but with the CSV record
> reader: the error is thrown while reading the csv file. I also tried to
> write the data from the large csv file into a Kudu table using a PutKudu
> processor with the same CSV reader, and got the same error message.
>
> Hi Otto,
>
> Only the following information related to the exception is available in
> the log file:
> 
> 2021-04-14 17:48:28,628 ERROR [Timer-Driven Process Thread-1] 
> o.a.nifi.processors.standard.SplitRecord 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] failed to process 
> session due to java.lang.OutOfMemoryError: Requested array size exceeds VM 
> limit; Processor Administratively Yielded for 1 sec: 
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> 
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> 
> 2021-04-14 17:48:28,628 WARN [Timer-Driven Process Thread-1] 
> o.a.n.controller.tasks.ConnectableTask Administratively Yielding 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] due to uncaught 
> Exception: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> 
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> 
> Thanks & Regards
> Vibhath Ileperuma
> 
> 
> 
> On Wed, Apr 14, 2021 at 7:47 PM Otto Fowler  wrote:
> What is the complete stack trace of that exception?
> 
>> On Apr 14, 2021, at 02:36, Vibhath Ileperuma  wrote:
>> 
>> Requested array size exceeds VM limit
> 



Re: Nifi throws an error when reading a large csv file

2021-04-14 Thread Joe Witt
How large is each line expected to be?  You could have a massive line,
or one much larger than expected.  Or you could be creating far more
flow files than intended.  If you cut the file down in size, does it work
better?  We'll need more data to help narrow this down, but obviously we're
all very interested to know what is happening.  These processors and the
record readers/writers are meant to be quite bulletproof and to handle
very large data easily in most cases.
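
One quick way to check the line-length theory, assuming the file is readable
on the NiFi host (the path below is just a placeholder), is a small Groovy
snippet:

def maxLen = 0
new File('/path/to/large.csv').eachLine { line ->
    if (line.length() > maxLen) { maxLen = line.length() }
}
println "longest line: ${maxLen} characters"

If the longest line turns out to be enormous, that would point at a quoting
or delimiter problem in the source data rather than at the reader itself.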

On Wed, Apr 14, 2021 at 10:07 AM Vibhath Ileperuma
 wrote:
>
> Hi Chris,
>
> As you have mentioned, I am trying to split the large csv file in multiple
> stages, but this error is thrown at the first stage, before even a single
> flow file has been created.
> It seems the issue is not with the processor but with the CSV record
> reader: the error is thrown while reading the csv file. I also tried to
> write the data from the large csv file into a Kudu table using a PutKudu
> processor with the same CSV reader, and got the same error message.
>
> Hi Otto,
>
> Only the following information related to the exception is available in
> the log file:
>
> 2021-04-14 17:48:28,628 ERROR [Timer-Driven Process Thread-1] 
> o.a.nifi.processors.standard.SplitRecord 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] failed to process 
> session due to java.lang.OutOfMemoryError: Requested array size exceeds VM 
> limit; Processor Administratively Yielded for 1 sec: 
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> 2021-04-14 17:48:28,628 WARN [Timer-Driven Process Thread-1] 
> o.a.n.controller.tasks.ConnectableTask Administratively Yielding 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] due to uncaught 
> Exception: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> Thanks & Regards
>
> Vibhath Ileperuma
>
>
>
>
> On Wed, Apr 14, 2021 at 7:47 PM Otto Fowler  wrote:
>>
>> What is the complete stack trace of that exception?
>>
>> On Apr 14, 2021, at 02:36, Vibhath Ileperuma  
>> wrote:
>>
>> Requested array size exceeds VM limit
>>
>>


Re: Nifi throws an error when reading a large csv file

2021-04-14 Thread Vibhath Ileperuma
Hi Chris,

As you have mentioned, I am trying to split the large csv file in multiple
stages, but this error is thrown at the first stage, before even a single
flow file has been created.
It seems the issue is not with the processor but with the CSV record
reader: the error is thrown while reading the csv file. I also tried to
write the data from the large csv file into a Kudu table using a PutKudu
processor with the same CSV reader, and got the same error message.

Hi Otto,

Only the following information related to the exception is available in
the log file:

2021-04-14 17:48:28,628 ERROR [Timer-Driven Process Thread-1]
o.a.nifi.processors.standard.SplitRecord
SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34]
SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] failed to process
session due to java.lang.OutOfMemoryError: Requested array size exceeds VM
limit; Processor Administratively Yielded for 1 sec:
java.lang.OutOfMemoryError: Requested array size exceeds VM limit

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2021-04-14 17:48:28,628 WARN [Timer-Driven Process Thread-1]
o.a.n.controller.tasks.ConnectableTask Administratively Yielding
SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] due to uncaught
Exception: java.lang.OutOfMemoryError: Requested array size exceeds VM
limit

java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Thanks & Regards

Vibhath Ileperuma




On Wed, Apr 14, 2021 at 7:47 PM Otto Fowler  wrote:

> What is the complete stack trace of that exception?
>
> On Apr 14, 2021, at 02:36, Vibhath Ileperuma 
> wrote:
>
> Requested array size exceeds VM limit
>
>
>


Re: Release from holding queue after timeout

2021-04-14 Thread Paul Kelly
Whatever processor is putting it into the queue could set a penalty on
failed flow files that enter your retry queue.  It's the Penalty Duration
setting on the processor's Settings tab.

If you can't set it there for whatever reason, you could put an
ExecuteScript processor within the retry loop with the following Groovy
code in it, and set that ExecuteScript's penalty duration to 20s:

flowFile = session.get()
if (!flowFile) return
// penalize() returns the updated FlowFile; transfer the returned reference
flowFile = session.penalize(flowFile)
session.transfer(flowFile, REL_SUCCESS)

Paul

On Wed, Apr 14, 2021 at 4:22 PM Jeremy Pemberton-Pigott <
fuzzych...@gmail.com> wrote:

> Yes it can. Is that an update attribute which can set that on the flow
> file?
>
> Regards,
>
> Jeremy
>
>
> On 15 Apr 2021, at 00:16, Paul Kelly  wrote:
>
> 
> Are you able to penalize the flow file for 20s within the update retry
> queue?  Penalizing without yielding would cause a flow file to sit in the
> queue for 20s before the next processor will try acting on it.
>
> Paul
>
> On Wed, Apr 14, 2021 at 5:05 AM Jeremy Pemberton-Pigott <
> fuzzych...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a parallel update process in 2 different flows running in a 3 node
>> cluster running 1.6.0.  If the insert side has not completed yet, the
>> update side moves the flow file to a waiting queue for retry.  I want to
>> retry every 20s all the flow files in the queue that have been waiting that
>> long.  There may be tens of thousands waiting, so I don't want to do 1 flow
>> file every 20s.  Any idea how I can achieve this?  I don't think I can use
>> the yield mechanism, because after n retries it goes into a notification
>> logging flow.  And wait/notify won't work, because the notification may
>> appear before the update flow file arrives.
>>
>> Jeremy
>>
>


Re: Release from holding queue after timeout

2021-04-14 Thread Jeremy Pemberton-Pigott
Yes it can. Is that an update attribute which can set that on the flow file?

Regards,

Jeremy


On 15 Apr 2021, at 00:16, Paul Kelly  wrote:


Are you able to penalize the flow file for 20s within the update retry queue?  
Penalizing without yielding would cause a flow file to sit in the queue for 20s 
before the next processor will try acting on it.

Paul

On Wed, Apr 14, 2021 at 5:05 AM Jeremy Pemberton-Pigott  
wrote:
> Hi,
> 
> I have a parallel update process in 2 different flows running in a 3 node
> cluster running 1.6.0.  If the insert side has not completed yet, the update
> side moves the flow file to a waiting queue for retry.  I want to retry
> every 20s all the flow files in the queue that have been waiting that long.
> There may be tens of thousands waiting, so I don't want to do 1 flow file
> every 20s.  Any idea how I can achieve this?  I don't think I can use the
> yield mechanism, because after n retries it goes into a notification logging
> flow.  And wait/notify won't work, because the notification may appear
> before the update flow file arrives.
> 
> Jeremy


Re: Release from holding queue after timeout

2021-04-14 Thread Paul Kelly
Are you able to penalize the flow file for 20s within the update retry
queue?  Penalizing without yielding would cause a flow file to sit in the
queue for 20s before the next processor will try acting on it.

Paul

On Wed, Apr 14, 2021 at 5:05 AM Jeremy Pemberton-Pigott <
fuzzych...@gmail.com> wrote:

> Hi,
>
> I have a parallel update process in 2 different flows running in a 3 node
> cluster running 1.6.0.  If the insert side has not completed yet, the update
> side moves the flow file to a waiting queue for retry.  I want to retry
> every 20s all the flow files in the queue that have been waiting that long.
> There may be tens of thousands waiting, so I don't want to do 1 flow file
> every 20s.  Any idea how I can achieve this?  I don't think I can use the
> yield mechanism, because after n retries it goes into a notification logging
> flow.  And wait/notify won't work, because the notification may appear
> before the update flow file arrives.
>
> Jeremy
>


Re: Nifi throws an error when reading a large csv file

2021-04-14 Thread Otto Fowler
What is the complete stack trace of that exception?

> On Apr 14, 2021, at 02:36, Vibhath Ileperuma  
> wrote:
> 
> Requested array size exceeds VM limit



Re: Nifi throws an error when reading a large csv file

2021-04-14 Thread Chris Sampson
For splitting large files, it's often recommended to use a multi-stage
approach.

For example, if your file contains 1_000_000 records and you want to split
it into 1 record per FlowFile, it would be better to split into batches of,
say, 1_000 records using SplitRecord, and then split each of those FlowFiles
again into files of 1 record each. But bear in mind that this is still going
to result in 1_000_000 FlowFiles in the Flow, which is unlikely to be very
performant.
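
A two-stage layout along those lines might look like this (the split sizes
are illustrative, not a recommendation):

SplitRecord #1: Records Per Split = 1000  (1_000_000 records -> ~1_000 FlowFiles)
SplitRecord #2: Records Per Split = 1     (each 1_000-record FlowFile -> 1_000 single-record FlowFiles)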

While you may not be trying to split into individual records, the error
suggests you're trying to create too many FlowFiles from an incoming file
in a single operation. All FlowFiles created by a processor in a single
session (i.e. one run of the processor) are held in memory until the session
is committed, and each FlowFile also uses an open file descriptor, so it's
common to see OS/VM level errors like these in such scenarios.

The general recommendation is to try and use Record-based processors
throughout your Flow in order to avoid the need to Split/Merge file content
(but this isn't always possible, depending upon your use case and the
processors available in your version).

---
Chris Sampson
IT Consultant
chris.samp...@naimuri.com



On Wed, 14 Apr 2021 at 07:36, Vibhath Ileperuma 
wrote:

> Hi All,
>
> I'm using a SplitRecord processor with a CSV Reader and a CSV
> RecordSetWriter to split a large csv file (5.5GB-6GB) into multiple small
> csv files. When I start the processor, the exception below is thrown.
>
> "failed to process session due to Requested array size exceeds VM limit;
> Processor Administratively Yielded for 1 sec: java.lang.OutOfMemoryError:
> Requested array size exceeds VM limit"
>
>
> I would be grateful if someone can suggest a way to overcome this error.
>
> Thanks & Regards
>
> Vibhath Ileperuma
>


Re: InvokeHTTP hangs after several successful calls

2021-04-14 Thread Vijay Chhipa
Jeanne, 

This is not a JDK issue. It is specific to HTTP. 
You can try to apply the fix for
https://issues.apache.org/jira/browse/NIFI-8181 as a patch to your current
version.

HTH, 
Vijay

> On Apr 14, 2021, at 8:20 AM, jeanne-herndon 
>  wrote:
> 
> I am having this same issue and do not have access to the log files.  Same
> pattern as the others.  I need to terminate the InvokeHTTP processor to
> release threads.   Is this a JDK issue or an HTTP/2 issue?   I am not able
> to find any solutions that work.
> 
> 
> 
> 





Re: InvokeHTTP hangs after several successful calls

2021-04-14 Thread jeanne-herndon
I am having this same issue and do not have access to the log files.  Same
pattern as the others.  I need to terminate the InvokeHTTP processor to
release threads.   Is this a JDK issue or an HTTP/2 issue?   I am not able to
find any solutions that work.



