Hi Prabhu,

Thanks for the confirmation. I can't guarantee that it will be included
in the next release, but I'll try my best :) You can watch the JIRA to
get updates as it progresses.
https://issues.apache.org/jira/browse/NIFI-3373

Thanks,
Koji

On Fri, Jan 20, 2017 at 2:16 PM, prabhu Mahendran
<[email protected]> wrote:
> Hi Koji,
>
> Both simulations look perfect. I expected this exact behavior and it
> matches my requirement; it also sounds logical. Can I expect these
> changes in the next NiFi release version?
>
>
> Thank you so much for this tremendous support.
>
>
> On Fri, Jan 20, 2017 at 6:14 AM, Koji Kawamura <[email protected]>
> wrote:
>>
>> Hi Prabhu,
>>
>> In that case, yes, as you assumed: even if the latest archive
>> exceeds 500MB, it is saved, as long as it was written to disk
>> successfully.
>>
>> After that, when the user updates the NiFi flow, the previous archive
>> is removed before the new one is created, because max.storage is
>> exceeded. Then the latest flow is archived.
>>
>> Let's simulate the scenario with the logic as it will be updated by
>> NIFI-3373, in which the size of flow.xml keeps increasing:
>>
>> # CASE-1
>>
>> archive.max.storage=10MB
>> archive.max.count = 5
>>
>> Time | flow.xml | archives | archive total |
>> t1 | f1 5MB  | f1 | 5MB
>> t2 | f2 5MB  | f1, f2 | 10MB
>> t3 | f3 5MB  | f1, f2, f3 | 15MB
>> t4 | f4 10MB | f2, f3, f4 | 20MB
>> t5 | f5 15MB | f4, f5 | 25MB
>> t6 | f6 20MB | f6 | 20MB
>> t7 | f7 25MB | f7 | 25MB
>>
>> * t3: f3 is archived even though the total exceeds 10MB, because f1
>> + f2 <= 10MB. A WARN message starts to be logged from this point,
>> because the total archive size > 10MB.
>> * t4: The oldest f1 is removed, because f1 + f2 + f3 > 10MB.
>> * t5: Even if the flow.xml size exceeds max.storage, the latest
>> archive is created. f4 is kept because f4 <= 10MB.
>> * t6: f4 and f5 are removed because f4 + f5 > 10MB, and also f5 > 10MB.
>>
>> In this case, NiFi will keep logging a WARN (or should it be
>> ERROR?) message indicating that the archive storage size exceeds the
>> limit, starting from t3.
>> After t6, even with archive.max.count = 5, NiFi will only keep the
>> latest flow.xml.
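The CASE-1 pruning rule can be sketched as a small simulation. This is a hypothetical approximation of the NIFI-3373 soft-limit behavior described above, not NiFi's actual code; the function and variable names are my own:

```python
def archive_and_prune(archives, new_archive, max_storage_mb):
    """Soft-limit pruning: the newest archive is always kept; older
    archives are removed oldest-first until the *older* archives
    alone fit within max_storage_mb.

    archives: list of (name, size_mb) tuples, ordered oldest-first.
    """
    older = list(archives)
    while older and sum(size for _, size in older) > max_storage_mb:
        older.pop(0)  # drop the oldest archive first
    return older + [new_archive]


# Replay CASE-1: flow.xml grows 5, 5, 5, 10, 15, 20, 25 MB
archives = []
for t, size in enumerate([5, 5, 5, 10, 15, 20, 25], start=1):
    archives = archive_and_prune(archives, (f"f{t}", size), max_storage_mb=10)
    print(f"t{t}:", [name for name, _ in archives])
```

Running this reproduces the table: at t3 the archives are f1, f2, f3 (15MB total, hence the WARN), and from t6 onward only the single latest archive survives.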
>>
>> # CASE-2
>>
>> If you'd like to keep at least 5 archives no matter what, then leave
>> max.storage and max.time blank.
>>
>> archive.max.storage=
>> archive.max.time=
>> archive.max.count = 5 // Only limit archives by count
>>
>> Time | flow.xml | archives | archive total |
>> t1 | f1 5MB  | f1 | 5MB
>> t2 | f2 5MB  | f1, f2 | 10MB
>> t3 | f3 5MB  | f1, f2, f3 | 15MB
>> t4 | f4 10MB | f1, f2, f3, f4 | 25MB
>> t5 | f5 15MB | f1, f2, f3, f4, f5 | 40MB
>> t6 | f6 20MB | f2, f3, f4, f5, f6 | 55MB
>> t7 | f7 25MB | f3, f4, f5, f6, (f7) | 50MB, (75MB)
>> t8 | f8 30MB | f3, f4, f5, f6 | 50MB
>>
>> * From t6, the oldest archive is removed to keep the number of
>> archives <= 5
>> * At t7, if the disk has only 60MB of free space, f7 won't be
>> archived. After this point, the archive mechanism stops working
>> (NiFi keeps trying to create a new archive, but keeps getting an
>> exception: no space left on device).
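CASE-2's count-only behavior, including the disk-full failure at t7, can be sketched similarly. Again this is a hypothetical simulation: the 60MB disk-capacity check, and the assumption that the over-count archive is deleted before the new write is attempted, are read off the table above rather than taken from NiFi's code:

```python
def prune_by_count(archives, new_archive, max_count, disk_capacity_mb):
    """Count-only archiving (max.storage and max.time left blank).

    Mutates `archives` in place: the over-count archive is deleted
    before the new archive is written, so a failed write still leaves
    that deletion in effect (the t7 row above).
    """
    # the incoming archive counts toward the limit, so make room first
    while len(archives) >= max_count:
        archives.pop(0)
    _, size = new_archive
    if sum(s for _, s in archives) + size > disk_capacity_mb:
        raise OSError("No space left on device")  # archiving fails here
    archives.append(new_archive)


# Replay CASE-2 on a hypothetical 60MB disk, up to t6
archives = []
for t, size in enumerate([5, 5, 5, 10, 15, 20], start=1):
    prune_by_count(archives, (f"f{t}", size), max_count=5, disk_capacity_mb=60)
print([name for name, _ in archives])  # f2..f6, as in the t6 row
```

With this sketch, t7 (f7, 25MB on top of 50MB of remaining archives) raises the error, f3 through f6 survive, and t8 fails the same way.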
>>
>> In either case above, once flow.xml has grown to that size, some
>> human intervention would be needed.
>> Do those simulations look reasonable?
>>
>> Thanks,
>> Koji
>>
>> On Thu, Jan 19, 2017 at 5:48 PM, prabhu Mahendran
>> <[email protected]> wrote:
>> > Hi Koji,
>> >
>> > Thanks for your information.
>> >
>> > Actually the task description looks fine. I have one question
>> > here: consider that the storage limit is 500MB, and suppose my
>> > latest workflow exceeds this limit. Which behavior is performed
>> > with respect to the properties (max.count, max.time and
>> > max.storage)? My assumption is that the latest archive is saved
>> > even if it exceeds 500MB, so what happens from there? Will it keep
>> > saving only the single latest large archive, or will it notify the
>> > user to increase the size and preserve the latest file until we
>> > restart the flow? And if the size keeps increasing beyond 500MB,
>> > will it save archives based on count, or only the latest archive,
>> > for as long as NiFi is running?
>> >
>> > Many thanks
>> >
>> > On Thu, Jan 19, 2017 at 12:47 PM, Koji Kawamura <[email protected]>
>> > wrote:
>> >>
>> >> Hi Prabhu,
>> >>
>> >> Thank you for the suggestion.
>> >>
>> >> Keeping latest N archives is nice, it's simple :)
>> >>
>> >> The max.time and max.storage properties have other benefits, and
>> >> since they are already released, we should keep the existing
>> >> behavior with those settings, too.
>> >> I've created a JIRA to add archive.max.count property.
>> >> https://issues.apache.org/jira/browse/NIFI-3373
>> >>
>> >> Thanks,
>> >> Koji
>> >>
>> >> On Thu, Jan 19, 2017 at 2:21 PM, prabhu Mahendran
>> >> <[email protected]> wrote:
>> >> > Hi Koji,
>> >> >
>> >> >
>> >> > Thanks for your reply,
>> >> >
>> >> > Yes, Solution B may meet my requirement. Currently, when the
>> >> > storage size limit is reached, the complete folder gets deleted
>> >> > and the new flow is not tracked in the archive folder. This
>> >> > behavior is the drawback here. I need at least the last workflow
>> >> > to be saved in the archive folder, and the user to be notified
>> >> > to increase the size. At the same time, until NiFi restarts, at
>> >> > least the last complete workflow should be backed up.
>> >> >
>> >> >
>> >> > My other suggestion is as follows:
>> >> >
>> >> >
>> >> > Regardless of the max.time and max.storage properties, can we
>> >> > keep only a few files in the archive (say, 10 files)? Each
>> >> > action from the NiFi canvas should be tracked here; when the
>> >> > flow.xml.gz archive file count is reached, the oldest file
>> >> > should be deleted and the latest file saved, so that the count
>> >> > of 10 is maintained. This way we can maintain the workflow
>> >> > properly, and backup is also achieved without confusion between
>> >> > max.time and max.storage. Only in the case where the disk size
>> >> > is exceeded should we notify the user.
>> >> >
>> >> >
>> >> > Many thanks.
>> >> >
>> >> >
>> >> > On Thu, Jan 19, 2017 at 6:36 AM, Koji Kawamura
>> >> > <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> Hi Prabhu,
>> >> >>
>> >> >> Thanks for sharing your experience with flow file archiving.
>> >> >> The case where a single flow.xml.gz file exceeds
>> >> >> archive.max.storage in size was not considered well when I
>> >> >> implemented NIFI-2145.
>> >> >>
>> >> >> By looking at the code, it currently works as follows:
>> >> >> 1. The original conf/flow.xml.gz (> 1MB) is archived to
>> >> >> conf/archive
>> >> >> 2. NiFi checks whether there are any expired archive files, and
>> >> >> deletes them if so
>> >> >> 3. NiFi checks the total size of all archived files, then
>> >> >> deletes the oldest archive, repeating until the total size is
>> >> >> less than or equal to the configured archive.max.storage
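Those three steps can be approximated with a short sketch (my own simplification of the pre-fix behavior, not the actual NiFi code), which shows how an oversized flow.xml.gz wipes out the whole archive, including itself:

```python
def prune_pre_fix(archives, max_storage_mb):
    """Step 3 before the fix: the size check includes the just-created
    archive, so a flow.xml.gz larger than archive.max.storage ends up
    deleting every archive, including the brand-new one.

    archives: list of (name, size_mb) tuples, oldest first, newest last.
    """
    result = list(archives)
    while result and sum(size for _, size in result) > max_storage_mb:
        result.pop(0)  # removes oldest first, but can reach the newest
    return result


# archive.max.storage = 1 MB, new flow.xml.gz is 1.5 MB:
print(prune_pre_fix([("old", 0.5), ("new", 1.5)], max_storage_mb=1))  # []
```

The empty result matches the report in this thread: the existing archive is deleted and the new flow is never kept, with nothing but an INFO-level log to explain it.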
>> >> >>
>> >> >> In your case, at step 3, the newly created archive is deleted,
>> >> >> because its size was greater than archive.max.storage.
>> >> >> In this case, NiFi only logs an INFO-level message, and it's
>> >> >> hard for the user to know what happened, as you reported.
>> >> >>
>> >> >> I'm going to create a JIRA for this, and fix current behavior by
>> >> >> either one of following solutions:
>> >> >>
>> >> >> A. Treat archive.max.storage as a HARD limit. If the original
>> >> >> flow.xml.gz exceeds the configured archive.max.storage in size,
>> >> >> then throw an IOException, which results in a WARN-level log
>> >> >> message: "Unable to archive flow configuration as requested due
>> >> >> to ...".
>> >> >>
>> >> >> B. Treat archive.max.storage as a SOFT limit, by not including
>> >> >> the newly created archive file in steps 2 and 3 above, so that
>> >> >> it can stay there. Maybe a WARN-level log message should be
>> >> >> logged as well.
>> >> >>
>> >> >> For a better user experience, I'd prefer solution B, so that
>> >> >> the flow can be archived even if flow.xml.gz exceeds the
>> >> >> archive storage size; since it could be written to disk, the
>> >> >> physical disk had enough space.
>> >> >>
>> >> >> What do you think?
>> >> >>
>> >> >> Thanks!
>> >> >> Koji
>> >> >>
>> >> >> On Wed, Jan 18, 2017 at 3:27 PM, prabhu Mahendran
>> >> >> <[email protected]> wrote:
>> >> >> > I have checked the below properties, used for the backup
>> >> >> > operations in NiFi 1.0.0, with respect to this JIRA:
>> >> >> >
>> >> >> > https://issues.apache.org/jira/browse/NIFI-2145
>> >> >> >
>> >> >> > nifi.flow.configuration.archive.max.time=1 hours
>> >> >> > nifi.flow.configuration.archive.max.storage=1 MB
>> >> >> >
>> >> >> > We have two backup locations: the first one is
>> >> >> > "conf/flow.xml.gz" and the second is
>> >> >> > "conf/archive/flow.xml.gz".
>> >> >> >
>> >> >> > I have saved archived workflows (conf/archive/flow.xml.gz)
>> >> >> > hourly, as per the "max.time" property.
>> >> >> >
>> >> >> > At a particular time I reached "1 MB" [set as the default
>> >> >> > storage size].
>> >> >> >
>> >> >> > So it deleted the existing conf/archive/flow.xml.gz
>> >> >> > completely, and doesn't write new flow files to
>> >> >> > conf/archive/flow.xml.gz because the size is exceeded.
>> >> >> >
>> >> >> > No logs show that the new flow.xml.gz is larger than the
>> >> >> > specified storage size.
>> >> >> >
>> >> >> > Why does it delete the existing flows and not write the new
>> >> >> > flows due to storage?
>> >> >> >
>> >> >> > In this case, has one of the backup operations failed or not?
>> >> >> >
>> >> >> > Thanks,
>> >> >> >
>> >> >> > prabhu
>> >> >
>> >> >
>> >
>> >
>
>
