Peter,

Unfortunately, I don't have a perfect solution for your current problem. I would try starting with autoResume=false, just to limit what's going on in the system. If possible, you can also try temporarily giving the JVM more heap.
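In case it helps, both of those are plain config changes; roughly like the following (values are examples only, and the java.arg numbers may differ in your bootstrap.conf -- adjust whichever lines currently set -Xms/-Xmx):

    # conf/nifi.properties -- don't automatically start components on startup
    nifi.flowcontroller.autoResumeState=false

    # conf/bootstrap.conf -- temporarily raise the JVM heap for startup/recovery
    java.arg.2=-Xms8g
    java.arg.3=-Xmx8g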
This is, however, the use case that led to the idea of "recovery mode" in the new RocksDBFlowFileRepository[1] that should be in NiFi 1.10.0 (the documentation[2] is attached to the ticket):

"[Recovery mode] limits the number of FlowFiles loaded into the graph at a time, while not actually removing any FlowFiles (or content) from the system. This allows for the recovery of a system that is encountering OutOfMemory errors or similar on startup..."

(I've put a rough sketch of the relevant properties at the bottom of this message, below the quoted mail.)

[1] https://issues.apache.org/jira/browse/NIFI-4775
[2] https://issues.apache.org/jira/secure/attachment/12976954/RocksDBFlowFileRepo.html

On Wed, Aug 14, 2019 at 12:12 PM Peter Wicks (pwicks) <[email protected]> wrote:

> I have a node in a cluster whose FlowFile repository grew so fast that it
> exceeded the amount of available heap space and now can't checkpoint. Or
> that is my interpretation of the error.
>
> "Cannot update journal file flowfile_repository/journals/####.journal
> because this journal has already encountered a failure when attempting to
> write to the file."
>
> Additionally, on restart, we see that NiFi failed to start because it ran
> out of heap space while doing a SchemaRecordReader.readFieldValue. Feeling
> a bit stuck on where to go from here.
>
> Based on metrics we collect, we see a large increase in FlowFiles on that
> node right before it crashed, and in Linux we see the following:
>
> 94G ./journals/overflow-569618072
> 356G ./journals/overflow-569892338
>
> Oh, and a 280 GB checkpoint file.
>
> There are a few queues/known FlowFiles that are probably the problem, and
> I'm OK with dropping them, but there is plenty of other data in there too
> that I don't want to lose...
>
> Thanks,
>
> Peter
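For the recovery mode sketch mentioned above: once you are on 1.10.0 and using the RocksDB repository, enabling it would look roughly like this (property names and values here are from memory and only a sketch -- please verify the exact names and defaults against the documentation attached to [2]):

    # conf/nifi.properties -- switch to the RocksDB FlowFile repository (NiFi 1.10.0+)
    nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.RocksDBFlowFileRepository

    # limit how many FlowFiles are loaded into the graph at a time
    nifi.flowfile.repository.rocksdb.enable.recovery.mode=true
    nifi.flowfile.repository.rocksdb.recovery.mode.flowfile.count=5000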
