Hi Koji,

Both simulations look perfect. I was expecting this exact behavior, it matches my requirement, and it also sounds logical. Shall I expect these changes in the next NiFi release version?
Thank you so much for this tremendous support.

On Fri, Jan 20, 2017 at 6:14 AM, Koji Kawamura <[email protected]> wrote:
> Hi Prabhu,
>
> In that case, yes, as you assumed, even if the latest archive exceeds
> 500MB, the latest archive is saved, as long as it was written to disk
> successfully.
>
> After that, when the user updates the NiFi flow, the previous archive
> will be removed before the new one is created, because max.storage is
> exceeded. Then the latest flow is archived.
>
> Let's simulate the scenario with the to-be-updated logic from NIFI-3373,
> in which the size of flow.xml keeps increasing:
>
> # CASE-1
>
> archive.max.storage=10MB
> archive.max.count=5
>
> Time | flow.xml | archives   | archive total
> t1   | f1 5MB   | f1         | 5MB
> t2   | f2 5MB   | f1, f2     | 10MB
> t3   | f3 5MB   | f1, f2, f3 | 15MB
> t4   | f4 10MB  | f2, f3, f4 | 20MB
> t5   | f5 15MB  | f4, f5     | 25MB
> t6   | f6 20MB  | f6         | 20MB
> t7   | f7 25MB  | f7         | 25MB
>
> * t3: f3 is archived even though the total exceeds 10MB, because
> f1 + f2 <= 10MB. A WARN message starts to be logged from this point,
> because the total archive size > 10MB.
> * t4: The oldest, f1, is removed, because f1 + f2 + f3 > 10MB.
> * t5: Even though the flow.xml size exceeds max.storage, the latest
> archive is created. f4 is kept because f4 <= 10MB.
> * t6: f4 and f5 are removed because f4 + f5 > 10MB, and also f5 > 10MB.
>
> In this case, NiFi will keep logging a WARN (or should it be ERROR?)
> message from t3 on, indicating that the archive storage size exceeds
> the limit. After t6, even with archive.max.count = 5, NiFi will only
> keep the latest flow.xml.
>
> # CASE-2
>
> If you'd like to keep at least 5 archives no matter what, then leave
> max.storage and max.time blank.
>
> archive.max.storage=
> archive.max.time=
> archive.max.count=5  // Only limit archives by count
>
> Time | flow.xml | archives             | archive total
> t1   | f1 5MB   | f1                   | 5MB
> t2   | f2 5MB   | f1, f2               | 10MB
> t3   | f3 5MB   | f1, f2, f3           | 15MB
> t4   | f4 10MB  | f1, f2, f3, f4       | 25MB
> t5   | f5 15MB  | f1, f2, f3, f4, f5   | 40MB
> t6   | f6 20MB  | f2, f3, f4, f5, f6   | 55MB
> t7   | f7 25MB  | f3, f4, f5, f6, (f7) | 50MB, (75MB)
> t8   | f8 30MB  | f3, f4, f5, f6       | 50MB
>
> * From t6, the oldest archive is removed to keep the number of archives <= 5.
> * At t7, if the disk has only 60MB of free space, f7 won't be archived.
> After this point, the archive mechanism stops working (it keeps trying
> to create a new archive, but keeps getting an exception: no space left
> on device).
>
> In either case above, once flow.xml has grown to that size, some human
> intervention would be needed.
> Do those simulations look reasonable?
>
> Thanks,
> Koji
>
> On Thu, Jan 19, 2017 at 5:48 PM, prabhu Mahendran
> <[email protected]> wrote:
> > Hi Koji,
> >
> > Thanks for your information.
> >
> > Actually the task description looks fine. I have one question here:
> > consider that the storage limit is 500MB, and suppose my latest workflow
> > exceeds this limit. What behavior is performed with respect to the
> > properties (max.count, max.time and max.storage)? My assumption is that
> > the latest archive is saved even if it exceeds 500MB, so what happens
> > from there? Will it keep saving only the single latest (large) archive,
> > or will it notify the user to increase the size and preserve the latest
> > file until we restart the flow? And what happens if the size keeps
> > increasing past 500MB: will it save archives based on count, or only the
> > latest archive for as long as NiFi is running?
> >
> > Many thanks
> >
> > On Thu, Jan 19, 2017 at 12:47 PM, Koji Kawamura <[email protected]>
> > wrote:
> >>
> >> Hi Prabhu,
> >>
> >> Thank you for the suggestion.
> >>
> >> Keeping the latest N archives is nice, it's simple :)
> >>
> >> max.time and max.storage have other benefits, and since they are
> >> already released, we should keep the existing behavior with these
> >> settings, too. I've created a JIRA to add an archive.max.count property:
> >> https://issues.apache.org/jira/browse/NIFI-3373
> >>
> >> Thanks,
> >> Koji
> >>
> >> On Thu, Jan 19, 2017 at 2:21 PM, prabhu Mahendran
> >> <[email protected]> wrote:
> >> > Hi Koji,
> >> >
> >> > Thanks for your reply.
> >> >
> >> > Yes, Solution B may meet my requirement. Currently, if the storage
> >> > limit is reached, the complete folder is deleted and the new flow is
> >> > not tracked in the archive folder. This behavior is the drawback here.
> >> > I need at least the last workflow to be saved in the archive folder,
> >> > and the user to be notified to increase the size. At the same time,
> >> > until NiFi restarts, at least the last complete workflow should be
> >> > backed up.
> >> >
> >> > My other suggestion is as follows:
> >> >
> >> > Regardless of the max.time and max.storage properties, can we keep
> >> > only a few files in the archive (say, 10 files)? Each action on the
> >> > NiFi canvas should be tracked here; when the count of archived
> >> > flow.xml.gz files reaches the limit, the oldest file should be deleted
> >> > and the latest saved, so that the count of 10 is maintained. This way
> >> > the workflow is maintained properly and backup is also achieved,
> >> > without the confusion of max.time and max.storage. The only remaining
> >> > case is the disk size being exceeded, and we should notify the user
> >> > about that.
> >> >
> >> > Many thanks.
> >> >
> >> > On Thu, Jan 19, 2017 at 6:36 AM, Koji Kawamura <[email protected]>
> >> > wrote:
> >> >>
> >> >> Hi Prabhu,
> >> >>
> >> >> Thanks for sharing your experience with flow file archiving.
> >> >> The case where a single flow.xml.gz file's size exceeds
> >> >> archive.max.storage was not considered well when I implemented
> >> >> NIFI-2145.
> >> >>
> >> >> Looking at the code, it currently works as follows:
> >> >> 1. The original conf/flow.xml.gz (> 1MB) is archived to conf/archive
> >> >> 2. NiFi checks whether there are any expired archive files, and
> >> >> deletes them if so
> >> >> 3. NiFi checks the total size of all archived files, then deletes
> >> >> the oldest archive. It keeps doing this until the total size becomes
> >> >> less than or equal to the configured archive.max.storage.
> >> >>
> >> >> In your case, at step 3, the newly created archive is deleted,
> >> >> because its size was greater than archive.max.storage.
> >> >> In this case, NiFi only logs an INFO level message, and it's hard
> >> >> for the user to know what happened, as you reported.
> >> >>
> >> >> I'm going to create a JIRA for this, and fix the current behavior
> >> >> with one of the following solutions:
> >> >>
> >> >> A. Treat archive.max.storage as a HARD limit. If the original
> >> >> flow.xml.gz exceeds the configured archive.max.storage in size, then
> >> >> throw an IOException, which results in a WARN level log message
> >> >> "Unable to archive flow configuration as requested due to ...".
> >> >>
> >> >> B. Treat archive.max.storage as a SOFT limit, by not including the
> >> >> newly created archive file at steps 2 and 3 above, so that it can
> >> >> stay there. Maybe a WARN level log message should be logged.
> >> >>
> >> >> For a better user experience, I'd prefer solution B, so that the
> >> >> flow is archived even when flow.xml.gz exceeds the archive storage
> >> >> size: since it could be written to disk, the physical disk had
> >> >> enough space.
> >> >>
> >> >> What do you think?
> >> >>
> >> >> Thanks!
> >> >> Koji
> >> >>
> >> >> On Wed, Jan 18, 2017 at 3:27 PM, prabhu Mahendran
> >> >> <[email protected]> wrote:
> >> >> > I have checked the below properties, used for the backup
> >> >> > operations in NiFi 1.0.0, with respect to this JIRA:
> >> >> >
> >> >> > https://issues.apache.org/jira/browse/NIFI-2145
> >> >> >
> >> >> > nifi.flow.configuration.archive.max.time=1 hours
> >> >> > nifi.flow.configuration.archive.max.storage=1 MB
> >> >> >
> >> >> > We have two backups: "conf/flow.xml.gz" and
> >> >> > "conf/archive/flow.xml.gz".
> >> >> >
> >> >> > I have saved archived workflows (conf/archive/flow.xml.gz) hourly,
> >> >> > as per the "max.time" property.
> >> >> >
> >> >> > At a particular time I reached "1 MB" [set as the default storage
> >> >> > size].
> >> >> >
> >> >> > So it deleted the existing conf/archive/flow.xml.gz completely and
> >> >> > didn't write new flow files to conf/archive/flow.xml.gz, because
> >> >> > the size was exceeded.
> >> >> >
> >> >> > No log showed that the new flow.xml.gz was larger than the
> >> >> > specified storage.
> >> >> >
> >> >> > Why does it delete existing flows and not write new flows due to
> >> >> > storage?
> >> >> >
> >> >> > In this case, has one backup operation failed or not?
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > prabhu
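For anyone following along, the pruning rules Koji simulates above can be sketched in a few lines of Python. This is only an illustration of the described NIFI-3373 behavior (newest archive always kept as a soft limit, older archives removed oldest-first by count and then by storage), not the actual NiFi implementation; the function name `prune_archives` is made up for this sketch.

```python
def prune_archives(archives, new_archive, max_count=None, max_storage=None):
    """Sketch of the archive pruning described above.

    archives: list of (name, size_mb) tuples, oldest first.
    The newest archive is always kept (soft limit, solution B); older
    archives are removed oldest-first while a limit is exceeded.
    """
    archives = list(archives) + [new_archive]
    # archive.max.count: drop oldest entries until at most max_count remain.
    if max_count is not None:
        while len(archives) > max_count:
            archives.pop(0)
    # archive.max.storage as a soft limit: the newest archive never counts
    # toward its own removal, so drop oldest entries while the OLDER
    # archives together exceed max_storage.
    if max_storage is not None:
        while len(archives) > 1 and sum(s for _, s in archives[:-1]) > max_storage:
            archives.pop(0)
    return archives


# Replaying CASE-1 (archive.max.storage=10MB, archive.max.count=5):
sizes = [("f1", 5), ("f2", 5), ("f3", 5), ("f4", 10),
         ("f5", 15), ("f6", 20), ("f7", 25)]
archives = []
for new in sizes:
    archives = prune_archives(archives, new, max_count=5, max_storage=10)
    print([name for name, _ in archives])  # final iteration prints ['f7']
```

Running this reproduces the CASE-1 table step by step (f2/f3/f4 at t4, f4/f5 at t5, only f7 at t7); dropping the `max_storage` argument reproduces CASE-2, ignoring the disk-full scenario at t7.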
