Hi Prabhu,

Thanks for the confirmation. I can't guarantee that it will be included in
the next release, but I'll try my best :) You can watch the JIRA to get
updates as it proceeds.
https://issues.apache.org/jira/browse/NIFI-3373
Thanks,
Koji

On Fri, Jan 20, 2017 at 2:16 PM, prabhu Mahendran <[email protected]> wrote:
> Hi Koji,
>
> Both simulations look perfect. I expected this exact behavior and it
> matches my requirement; it also sounds logical. Shall I expect these
> changes in the next NiFi release version?
>
> Thank you so much for this tremendous support.
>
> On Fri, Jan 20, 2017 at 6:14 AM, Koji Kawamura <[email protected]> wrote:
>>
>> Hi Prabhu,
>>
>> In that case, yes, as you assumed, even if the latest archive exceeds
>> 500MB, the latest archive is saved, as long as it was written to disk
>> successfully.
>>
>> After that, when the user updates the NiFi flow, the previous archive
>> will be removed before the new one is created, because max.storage is
>> exceeded. Then the latest will be archived.
>>
>> Let's simulate the scenario with the to-be-updated logic of NIFI-3373,
>> in which the size of flow.xml keeps increasing:
>>
>> # CASE-1
>>
>> archive.max.storage=10MB
>> archive.max.count=5
>>
>> Time | flow.xml | archives   | archive total
>> t1   | f1 5MB   | f1         | 5MB
>> t2   | f2 5MB   | f1, f2     | 10MB
>> t3   | f3 5MB   | f1, f2, f3 | 15MB
>> t4   | f4 10MB  | f2, f3, f4 | 20MB
>> t5   | f5 15MB  | f4, f5     | 25MB
>> t6   | f6 20MB  | f6         | 20MB
>> t7   | f7 25MB  | f7         | 25MB
>>
>> * t3: f3 is archived even though the total exceeds 10MB, because
>> f1 + f2 <= 10MB. A WARN message starts to be logged from this point,
>> because the total archive size > 10MB.
>> * t4: The oldest, f1, is removed, because f1 + f2 + f3 > 10MB.
>> * t5: Even though the flow.xml size exceeds max.storage, the latest
>> archive is created. f4 is kept because f4 <= 10MB.
>> * t6: f4 and f5 are removed because f4 + f5 > 10MB, and also f5 > 10MB.
>>
>> In this case, NiFi will keep logging a WARN (or should it be ERROR??)
>> message from t3 on, indicating that the archive storage size exceeds
>> the limit.
>> After t6, even with archive.max.count=5, NiFi will only keep the
>> latest flow.xml.
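[Editor's note] The CASE-1 behavior above can be replayed with a short Python sketch. This is only an illustration of the simulated NIFI-3373 logic, assuming archives are trimmed (by count, then by storage) just before the new archive is written; the `archive` function is hypothetical, not NiFi's actual implementation.

```python
# Editor's sketch of the NIFI-3373 "soft limit" retention logic, inferred
# from Koji's CASE-1 simulation. Hypothetical code, not NiFi's implementation.

def archive(archives, name, size, max_storage=None, max_count=None):
    """Trim existing archives, then always write the newest one.

    archives: list of (name, size_mb) tuples, oldest first (mutated in place).
    """
    # Enforce max_count: make room for the incoming archive.
    if max_count is not None:
        while len(archives) >= max_count:
            archives.pop(0)  # drop the oldest
    # Enforce max_storage on the existing archives only, so the newest
    # archive survives even if it alone exceeds the limit.
    if max_storage is not None:
        while archives and sum(s for _, s in archives) > max_storage:
            archives.pop(0)  # drop the oldest
    # The new archive is always kept (soft limit); NiFi would log a WARN
    # here if the resulting total still exceeds max_storage.
    archives.append((name, size))
    return archives

# Replay CASE-1: archive.max.storage=10MB, archive.max.count=5
archives = []
for name, size in [("f1", 5), ("f2", 5), ("f3", 5), ("f4", 10),
                   ("f5", 15), ("f6", 20), ("f7", 25)]:
    archive(archives, name, size, max_storage=10, max_count=5)
    print(name, "->", ", ".join(n for n, _ in archives))
# The printed history matches the "archives" column of the CASE-1 table above.
```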
>>
>> # CASE-2
>>
>> If you'd like to keep at least 5 archives no matter what, then leave
>> max.storage and max.time blank:
>>
>> archive.max.storage=
>> archive.max.time=
>> archive.max.count=5 // Only limit archives by count
>>
>> Time | flow.xml | archives             | archive total
>> t1   | f1 5MB   | f1                   | 5MB
>> t2   | f2 5MB   | f1, f2               | 10MB
>> t3   | f3 5MB   | f1, f2, f3           | 15MB
>> t4   | f4 10MB  | f1, f2, f3, f4       | 25MB
>> t5   | f5 15MB  | f1, f2, f3, f4, f5   | 40MB
>> t6   | f6 20MB  | f2, f3, f4, f5, f6   | 55MB
>> t7   | f7 25MB  | f3, f4, f5, f6, (f7) | 50MB, (75MB)
>> t8   | f8 30MB  | f3, f4, f5, f6       | 50MB
>>
>> * From t6, the oldest archive is removed to keep the number of
>> archives <= 5.
>> * At t7, if the disk has only 60MB of space, f7 won't be archived, and
>> after this point the archive mechanism stops working (it keeps trying
>> to create a new archive, but keeps getting an exception: no space left
>> on device).
>>
>> In either case above, once flow.xml has grown to that size, some human
>> intervention would be needed.
>> Do those simulations look reasonable?
>>
>> Thanks,
>> Koji
>>
>> On Thu, Jan 19, 2017 at 5:48 PM, prabhu Mahendran
>> <[email protected]> wrote:
>> > Hi Koji,
>> >
>> > Thanks for your information.
>> >
>> > Actually, the task description looks fine. I have one question here:
>> > consider that the storage limit is 500MB, and suppose my latest
>> > workflow exceeds this limit. Which behavior is performed with respect
>> > to the properties (max.count, max.time and max.storage)? My
>> > assumption is that the latest archive is saved even if it exceeds
>> > 500MB, so what happens from there? Will it keep saving the single
>> > latest archive at the large size, or will it notify the user to
>> > increase the size and preserve the latest file until we restart the
>> > flow? And if the size keeps increasing past 500MB, will it save
>> > archives based on count, or only the latest archive, throughout the
>> > time NiFi is running?
>> >
>> > Many thanks
>> >
>> > On Thu, Jan 19, 2017 at 12:47 PM, Koji Kawamura <[email protected]> wrote:
>> >>
>> >> Hi Prabhu,
>> >>
>> >> Thank you for the suggestion.
>> >>
>> >> Keeping the latest N archives is nice; it's simple :)
>> >>
>> >> The max.time and max.storage settings have other benefits, and since
>> >> they are already released, we should keep the existing behavior with
>> >> these settings, too.
>> >> I've created a JIRA to add an archive.max.count property:
>> >> https://issues.apache.org/jira/browse/NIFI-3373
>> >>
>> >> Thanks,
>> >> Koji
>> >>
>> >> On Thu, Jan 19, 2017 at 2:21 PM, prabhu Mahendran
>> >> <[email protected]> wrote:
>> >> > Hi Koji,
>> >> >
>> >> > Thanks for your reply.
>> >> >
>> >> > Yes, Solution B may meet my requirement. Currently, when the
>> >> > storage size is reached, the complete folder gets deleted and the
>> >> > new flow is not tracked in the archive folder. This behavior is
>> >> > the drawback here. I need at least the last workflow to be saved
>> >> > in the archive folder, and the user to be notified to increase the
>> >> > size. At the same time, until NiFi restarts, at least the last
>> >> > complete workflow should be backed up.
>> >> >
>> >> > I have another suggestion as well:
>> >> >
>> >> > Regardless of the max.time and max.storage properties, can we keep
>> >> > only a few files in the archive (say, only 10 files)? Each action
>> >> > on the NiFi canvas should be tracked here; when the count of
>> >> > archived flow.xml.gz files is reached, it should delete the oldest
>> >> > file and save the latest file, so that the count of 10 is
>> >> > maintained. This way we can maintain the workflow properly, and
>> >> > backup is also achieved, without confusion between max.time and
>> >> > max.storage. Only in the case that the disk size is exceeded
>> >> > should we notify the user.
>> >> >
>> >> > Many thanks.
>> >> >
>> >> > On Thu, Jan 19, 2017 at 6:36 AM, Koji Kawamura
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >> Hi Prabhu,
>> >> >>
>> >> >> Thanks for sharing your experience with flow file archiving.
>> >> >> The case where a single flow.xml.gz file exceeds
>> >> >> archive.max.storage in size was not considered well when I
>> >> >> implemented NIFI-2145.
>> >> >>
>> >> >> Looking at the code, it currently works as follows:
>> >> >> 1. The original conf/flow.xml.gz (> 1MB) is archived to
>> >> >> conf/archive.
>> >> >> 2. NiFi checks whether there are any expired archive files, and
>> >> >> deletes them if so.
>> >> >> 3. NiFi checks the total size of all archived files, then deletes
>> >> >> the oldest archive, and keeps doing so until the total size
>> >> >> becomes less than or equal to the configured archive.max.storage.
>> >> >>
>> >> >> In your case, at step 3, the newly created archive is deleted,
>> >> >> because its size was greater than archive.max.storage.
>> >> >> In this case, NiFi only logs an INFO level message, and it's hard
>> >> >> for the user to know what happened, as you reported.
>> >> >>
>> >> >> I'm going to create a JIRA for this, and fix the current behavior
>> >> >> with one of the following solutions:
>> >> >>
>> >> >> A. Treat archive.max.storage as a HARD limit. If the original
>> >> >> flow.xml.gz exceeds the configured archive.max.storage in size,
>> >> >> then throw an IOException, which results in a WARN level log
>> >> >> message: "Unable to archive flow configuration as requested due
>> >> >> to ...".
>> >> >>
>> >> >> B. Treat archive.max.storage as a SOFT limit, by not including
>> >> >> the newly created archive file in steps 2 and 3 above, so that it
>> >> >> can stay there. Maybe a WARN level log message should be logged.
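[Editor's note] The contrast between the current step 3 and solution B can be sketched as follows. Both functions are hypothetical illustrations of the described behaviors, not NiFi's actual code.

```python
# Editor's illustration of the two trim strategies described above.
# Hypothetical code, not NiFi's implementation.

def trim_all(archives, max_storage):
    """Current behavior: trim oldest-first over ALL archives, including
    the newly created one, until the total size <= max_storage."""
    while archives and sum(s for _, s in archives) > max_storage:
        archives.pop(0)  # the brand-new archive can be deleted too
    return archives

def trim_keep_newest(archives, max_storage):
    """Solution B (soft limit): exclude the newest archive from the
    check, so it always survives."""
    while len(archives) > 1 and sum(s for _, s in archives[:-1]) > max_storage:
        archives.pop(0)
    return archives

# With max.storage = 1 MB and a 1.5 MB flow.xml.gz, as in Prabhu's report:
print(trim_all([("old", 0.5), ("new", 1.5)], 1.0))          # []
print(trim_keep_newest([("old", 0.5), ("new", 1.5)], 1.0))  # both entries kept
```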
>> >> >>
>> >> >> For a better user experience, I'd prefer solution B, so that the
>> >> >> flow can be archived even when flow.xml.gz exceeds the archive
>> >> >> storage size: since it could be written to disk, the physical
>> >> >> disk had enough space.
>> >> >>
>> >> >> What do you think?
>> >> >>
>> >> >> Thanks!
>> >> >> Koji
>> >> >>
>> >> >> On Wed, Jan 18, 2017 at 3:27 PM, prabhu Mahendran
>> >> >> <[email protected]> wrote:
>> >> >> > I have checked the below properties used for the backup
>> >> >> > operations in NiFi 1.0.0, with respect to this JIRA:
>> >> >> >
>> >> >> > https://issues.apache.org/jira/browse/NIFI-2145
>> >> >> >
>> >> >> > nifi.flow.configuration.archive.max.time=1 hours
>> >> >> > nifi.flow.configuration.archive.max.storage=1 MB
>> >> >> >
>> >> >> > We have two backup locations: the first is "conf/flow.xml.gz"
>> >> >> > and the second is "conf/archive/flow.xml.gz".
>> >> >> >
>> >> >> > I have saved archived workflows (conf/archive/flow.xml.gz)
>> >> >> > hourly, as per the "max.time" property.
>> >> >> >
>> >> >> > At a particular time I reached "1 MB" [set as the default
>> >> >> > storage size].
>> >> >> >
>> >> >> > So it deletes the existing conf/archive/flow.xml.gz completely,
>> >> >> > and doesn't write new flow files into conf/archive/flow.xml.gz
>> >> >> > because the size is exceeded.
>> >> >> >
>> >> >> > No logs show that the new flow.xml.gz is larger than the
>> >> >> > specified storage.
>> >> >> >
>> >> >> > Why does it delete existing flows and not write new flows,
>> >> >> > because of the storage limit?
>> >> >> >
>> >> >> > In this case, has one of the backup operations failed or not?
>> >> >> >
>> >> >> > Thanks,
>> >> >> >
>> >> >> > prabhu
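[Editor's note] For reference, the archive-related properties discussed in this thread, as they would appear in conf/nifi.properties. The max.count property is the one proposed by NIFI-3373 and is only available in releases that include that change; check the Admin Guide for your NiFi version before relying on it.

```properties
# conf/nifi.properties -- flow configuration archive settings
nifi.flow.configuration.archive.enabled=true
nifi.flow.configuration.archive.max.time=1 hours
nifi.flow.configuration.archive.max.storage=1 MB
# Proposed by NIFI-3373. Leaving max.time and max.storage blank limits
# archives by count only, as in CASE-2 above.
nifi.flow.configuration.archive.max.count=5
```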
