Hi Prabhu,

Thanks for sharing your experience with flow file archiving.
The case where a single flow.xml.gz file exceeds
archive.max.storage was not well considered when I implemented
NIFI-2145.

Looking at the code, it currently works as follows:
1. The original conf/flow.xml.gz (> 1MB) is archived to conf/archive
2. NiFi checks whether there are any expired archive files, and
deletes them if so
3. NiFi checks the total size of all archived files; if it exceeds the
limit, NiFi deletes the oldest archive, and keeps doing so until the
total size becomes less than or equal to the configured
archive.max.storage.

In your case, at step 3, the newly created archive is deleted because
its size was greater than archive.max.storage.
In this case, NiFi only logs an INFO level message, so it's hard for
the user to know what happened, as you reported.

I'm going to create a JIRA for this, and fix the current behavior with
one of the following solutions:

A. Treat archive.max.storage as a HARD limit. If the original
flow.xml.gz exceeds the configured archive.max.storage in size, then
throw an IOException, which results in a WARN level log message
"Unable to archive flow configuration as requested due to ...".

B. Treat archive.max.storage as a SOFT limit, by excluding the newly
created archive file from steps 2 and 3 above so that it can stay
there. A WARN level log message should probably still be logged.

For a better user experience, I'd prefer solution B, so that the flow
can be archived even when flow.xml.gz exceeds the archive storage
size: since it could be written to disk in the first place, the
physical disk clearly had enough space.

What do you think?

Thanks!
Koji

On Wed, Jan 18, 2017 at 3:27 PM, prabhu Mahendran
<[email protected]> wrote:
> I have checked the below properties used for the backup operations in
> NiFi-1.0.0 with respect to this JIRA.
>
> https://issues.apache.org/jira/browse/NIFI-2145
>
> nifi.flow.configuration.archive.max.time=1 hours
> nifi.flow.configuration.archive.max.storage=1 MB
>
> We have two backup operations: the first is "conf/flow.xml.gz" and the
> second is "conf/archive/flow.xml.gz".
>
> I have saved archived workflows (conf/archive/flow.xml.gz) hourly, as per
> the "max.time" property.
>
> At a particular time I reached "1 MB" [set as the default storage size].
>
> It then deleted the existing conf/archive/flow.xml.gz completely and did
> not write the new flow file into conf/archive, because the size was exceeded.
>
> No logs show that the new flow.xml.gz is larger than the specified
> storage.
>
> Why does it delete the existing flows and fail to write the new flows
> due to the storage limit?
>
> In this case, has one of the backup operations failed or not?
>
> Thanks,
>
> prabhu
