[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355238#comment-17355238 ]

Kartik Mishra commented on NIFI-3213:
-------------------------------------

Hi Team,

I am on NiFi 1.12.1. If I understand correctly, this issue should already be fixed, but we are still facing it in 1.12.1: ListFile, ListFTP and ListSFTP always skip the files with the latest timestamp.

> ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-3213
>                 URL: https://issues.apache.org/jira/browse/NIFI-3213
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>            Priority: Major
>             Fix For: 1.2.0
>
>
> NIFI-1484 added a few lines of code to avoid emitting files that have
> the latest timestamp within a listing iteration, because they may still be
> being written at that time.
> While this doesn't matter much if the ListFile processor is scheduled with a short
> period, such as a few ms, it has a negative effect if a user
> schedules it with a longer run schedule such as "1 day" or with the cron scheduler.
> For example, a user would expect to process the list of files on a daily basis. Even
> if a file was saved a few hours ago, the processor will skip it, because the
> file has the latest timestamp within the iteration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048867#comment-16048867 ]

Koji Kawamura commented on NIFI-3213:
-------------------------------------

[~bende] Sorry for my late response.

As you were concerned, with this JIRA's change the processor can list files without waiting an additional cycle, because it doesn't go into that else statement. We shouldn't compare System.nanoTime with file timestamps, as System.nanoTime uses an arbitrary origin that differs from one JVM to another.

Even before this JIRA was merged, filesystems that do not provide timestamps with millisecond precision had a problem: ListFile can miss some of the files that are written with the same timestamp at second precision.

I created NIFI-4069 to address your concern and also to work with filesystems that have less precise timestamps. I'd appreciate it if you could take a look at NIFI-4069 and its PR. Let's discuss further at NIFI-4069.
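The two points in this comment (comparing against a wall-clock value rather than System.nanoTime, and coarse filesystem timestamps) can be illustrated with a minimal sketch. It is not NiFi's code: the class and method names and the lag values are hypothetical, and it only assumes that file timestamps are expressed in epoch milliseconds, so the "now" they are compared with must use the same clock (for example System.currentTimeMillis()).

```java
final class ListingLagCheck {

    /**
     * True when the file's last-modified time is less than lagMillis before "now".
     * Both timestamps are assumed to be epoch milliseconds (hypothetical helper).
     */
    static boolean mayStillBeWritten(long lastModifiedMillis, long nowMillis, long lagMillis) {
        return nowMillis - lastModifiedMillis < lagMillis;
    }

    /** Truncates a timestamp to whole seconds, as a coarse filesystem might report it. */
    static long truncateToSeconds(long epochMillis) {
        return (epochMillis / 1000L) * 1000L;
    }

    public static void main(String[] args) {
        long writtenAt = 1_500_000_000_900L;            // real write time, epoch millis
        long reported = truncateToSeconds(writtenAt);   // what a second-precision filesystem reports
        long listingAt = writtenAt + 50L;               // the listing runs 50 ms after the write

        // With a 100 ms lag the file looks 950 ms old, so it is not held back.
        System.out.println(mayStillBeWritten(reported, listingAt, 100L));   // false
        // A lag wider than the timestamp precision still treats it as recent.
        System.out.println(mayStillBeWritten(reported, listingAt, 2000L));  // true
    }
}
```

In this sketch the lag window has to be at least as wide as the timestamp precision, otherwise a file written a moment ago can already look old.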
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025275#comment-16025275 ]

Bryan Bende commented on NIFI-3213:
-----------------------------------

I came across this JIRA while looking at a similar issue with ListHDFS, and I'm wondering about a couple of things...

I believe the reason for the original logic was the following scenario:

- file1 written with time1
- processor performs listing
- file2 written with time1

Since we are only tracking timestamps and not which files were listed, if we include file1 in the listing then we will miss file2 on the next execution, because we are looking for things newer than time1. If we include it on both sides, then we get file1 listed twice, because we don't know we listed it the first time. So instead we were leaving it out and getting them both next time, which has the drawback of a delay, but won't miss anything or produce duplicates.

With this change we are doing the following:

    final long currentListingTimestamp = System.nanoTime();

Then later using that value:

    else if (latestListingTimestamp >= currentListingTimestamp - LISTING_LAG_NANOS) {
        orderedEntries.remove(latestListingTimestamp);
    }

What if the directory we are listing is a remote directory where the timestamps don't really correspond with NiFi's timestamps?

Is latestListingTimestamp in milliseconds while we are comparing it against currentListingTimestamp in nanoseconds? I'm concerned that we may never go into that else statement for cases where we were supposed to.
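The file1/file2 scenario described in this comment can be reproduced with a small simulation. This is a simplified model written for illustration, not the processor's actual code: the class name, fields and the "hold back for one run" rule are assumptions, and the only state kept between runs is timestamps, as in the comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TimestampOnlyListing {
    private final boolean holdBackLatest;
    private long lastProcessed = -1L;   // newest timestamp already emitted
    private long lastSeenLatest = -1L;  // newest timestamp seen in the previous run

    TimestampOnlyListing(boolean holdBackLatest) {
        this.holdBackLatest = holdBackLatest;
    }

    /** Emits files newer than lastProcessed; optionally holds the newest timestamp back for one run. */
    List<String> list(Map<String, Long> directory) {
        long latest = -1L;
        for (long ts : directory.values()) {
            if (ts > lastProcessed) {
                latest = Math.max(latest, ts);
            }
        }
        List<String> emitted = new ArrayList<>();
        if (latest == -1L) {
            return emitted; // nothing newer than what was already emitted
        }
        // Hold back only the first time this newest timestamp is seen.
        boolean hold = holdBackLatest && latest != lastSeenLatest;
        lastSeenLatest = latest;

        long newestEmitted = lastProcessed;
        for (Map.Entry<String, Long> e : directory.entrySet()) {
            long ts = e.getValue();
            if (ts <= lastProcessed) continue;     // already handled in an earlier run
            if (hold && ts == latest) continue;    // postponed to the next run
            emitted.add(e.getKey());
            newestEmitted = Math.max(newestEmitted, ts);
        }
        lastProcessed = newestEmitted;
        return emitted;
    }

    public static void main(String[] args) {
        Map<String, Long> dir = new LinkedHashMap<>();
        dir.put("file1", 1000L);
        TimestampOnlyListing eager = new TimestampOnlyListing(false);
        System.out.println(eager.list(dir));      // [file1]
        dir.put("file2", 1000L);                  // written after the listing, same timestamp
        System.out.println(eager.list(dir));      // [] (file2 is missed)

        Map<String, Long> dir2 = new LinkedHashMap<>();
        dir2.put("file1", 1000L);
        TimestampOnlyListing cautious = new TimestampOnlyListing(true);
        System.out.println(cautious.list(dir2));  // []
        dir2.put("file2", 1000L);
        System.out.println(cautious.list(dir2));  // [file1, file2]
    }
}
```

The cautious variant neither misses file2 nor lists file1 twice, at the cost of a one-run delay, which is the trade-off the comment describes.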
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872144#comment-15872144 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1335

    @trixpan Thank you for merging! Closed.
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872145#comment-15872145 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user ijokarumawak closed the pull request at:

    https://github.com/apache/nifi/pull/1335
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871895#comment-15871895 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1335

    @ijokarumawak merged.

    However, I forgot to add a reference to the PR in the commit message.

    Would you mind closing it manually?

    Thank you in advance.
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871887#comment-15871887 ]

ASF subversion and git services commented on NIFI-3213:
-------------------------------------------------------

Commit 095c04eda0c604a02c51df085ba67847448224c0 in nifi's branch refs/heads/master from [~ijokarumawak]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=095c04e ]

NIFI-3213: ListFile do not skip obviously old files

Before this fix, files with the latest timestamp within a listing iteration were always held back one cycle, no matter how old they were.

Signed-off-by: Andre F de Miranda
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871872#comment-15871872 ]

Andre F de Miranda commented on NIFI-3213:
------------------------------------------

NIFI-3213: ListFile do not skip obviously old files

Fixed TestListFile.testFilterAge to make it consistent. The test used to set the last modified timestamps once at the beginning and reuse them throughout. This caused varying test results, because the test also makes use of Thread.sleep, so the meaning of `age1` to `age5` can drift in the later part of the test.

This fix resets the age variables and the last modified timestamps of the test input files before executing the next run, so that the meaning of the age variables stays consistent.
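Why stale age variables make such a test flaky can be shown with plain arithmetic. The sketch below is not taken from TestListFile: the class, the helper names and the numbers are hypothetical, and the clock is passed in as a value so the example is deterministic instead of relying on Thread.sleep.

```java
final class AgeFixture {

    /** Last-modified timestamp (epoch millis) that makes a file ageMillis old at nowMillis. */
    static long lastModifiedFor(long ageMillis, long nowMillis) {
        return nowMillis - ageMillis;
    }

    /** True when the file's age at nowMillis lies within [minAgeMillis, maxAgeMillis]. */
    static boolean passesAgeFilter(long lastModifiedMillis, long nowMillis,
                                   long minAgeMillis, long maxAgeMillis) {
        long age = nowMillis - lastModifiedMillis;
        return age >= minAgeMillis && age <= maxAgeMillis;
    }

    public static void main(String[] args) {
        long testStart = 1_000_000L;   // simulated clock at the beginning of the test
        long age5s = 5_000L;           // hypothetical "age" variable: file should be 5 s old
        long stale = lastModifiedFor(age5s, testStart);

        // 1.5 s later (for example after some sleeps) the same file is 6.5 s old,
        // so a filter allowing at most 6 s rejects it even though the test expects a match.
        long laterRun = testStart + 1_500L;
        System.out.println(passesAgeFilter(stale, laterRun, 0L, 6_000L));  // false

        // Recomputing the timestamp relative to the later run restores the intended age.
        long fresh = lastModifiedFor(age5s, laterRun);
        System.out.println(passesAgeFilter(fresh, laterRun, 0L, 6_000L));  // true
    }
}
```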
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871053#comment-15871053 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1335

    @trixpan Since the failing test has more Thread.sleep calls than before, the meaning of variables such as `age1` or `age2` became fragile in the later part of the test. I added another commit to fix the test case. I also rebased onto the latest master, just in case.

    Thanks again for catching this test issue. Please review the test again, and let me know if you want me to squash the commits.
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869791#comment-15869791 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1335

    @trixpan Thanks for reviewing and catching the unit test failure. Yes, I'd like to look at it more closely. I will update the PR accordingly.
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869630#comment-15869630 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1335

    @ijokarumawak my bad.

    I was just running an extra set of compilations and I noticed that under certain conditions there seems to be a race condition affecting a test:

    ```
    Tests run: 12, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.66 sec <<< FAILURE! - in org.apache.nifi.processors.standard.TestListFile
    testFilterAge(org.apache.nifi.processors.standard.TestListFile)  Time elapsed: 1.212 sec  <<< FAILURE!
    org.junit.ComparisonFailure: expected: but was:
    	at org.junit.Assert.assertEquals(Assert.java:115)
    	at org.junit.Assert.assertEquals(Assert.java:144)
    	at org.apache.nifi.processors.standard.TestListFile.testFilterAge(TestListFile.java:223)
    ```

    Do you want to have a look at it?
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869463#comment-15869463 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1335

    LGTM, merging.
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869374#comment-15869374 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user trixpan commented on the issue:

    https://github.com/apache/nifi/pull/1335

    @ijokarumawak reviewing it
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753933#comment-15753933 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1335

    I believe the old behavior, which always postpones emitting files with the latest timestamp, is counterintuitive and can be problematic for use cases where a user wants a longer run schedule. Please correct me if I'm missing any important purpose for this behavior.

    I had to fix many unit test cases because they were written with the assumption that the latest file should be skipped. Again, I believe those test cases became more natural and understandable with this PR, but I may be missing something.

    Thanks in advance for reviewing!
[jira] [Commented] (NIFI-3213) ListFile always skips files with the latest timestamp in an iteration even if the files have existed a while ago
[ https://issues.apache.org/jira/browse/NIFI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753922#comment-15753922 ]

ASF GitHub Bot commented on NIFI-3213:
--------------------------------------

GitHub user ijokarumawak opened a pull request:

    https://github.com/apache/nifi/pull/1335

    NIFI-3213: Do not skip obviously old files.

    Thank you for submitting a contribution to Apache NiFi.

    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:

    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [x] Is your initial contribution a single, squashed commit?

    ### For code changes:
    - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?

    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    Before this fix, files with the latest timestamp within a listing iteration were always held back one cycle, no matter how old they were.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijokarumawak/nifi nifi-3213

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1335

commit 9bfae4dda5f6e1fa37242207ee3b284da79dffaf
Author: Koji Kawamura
Date:   2016-12-16T08:48:06Z

    NIFI-3213: Do not skip obviously old files.

    Before this fix, files with the latest timestamp within a listing iteration were always held back one cycle, no matter how old they were.
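The idea behind this pull request, holding back the newest files only when they are recent enough to still be in the middle of a write, can be sketched as follows. This is an illustration under stated assumptions rather than the code from the PR: the class name, the `LISTING_LAG_MILLIS` constant and its value are hypothetical, and timestamps are taken to be epoch milliseconds compared against a wall-clock "now".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HoldBackRecentOnly {
    // Assumed lag window; the real value and unit used by the processor may differ.
    static final long LISTING_LAG_MILLIS = 100L;

    /** Returns the files to emit now, ordered by timestamp. */
    static List<String> selectForEmission(Map<String, Long> files, long nowMillis) {
        // Group file names by last-modified timestamp, oldest first.
        TreeMap<Long, List<String>> ordered = new TreeMap<>();
        for (Map.Entry<String, Long> e : files.entrySet()) {
            ordered.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        if (!ordered.isEmpty()) {
            long latest = ordered.lastKey();
            // Postpone the newest group only if it is within the lag window of "now".
            if (latest >= nowMillis - LISTING_LAG_MILLIS) {
                ordered.remove(latest);
            }
        }
        List<String> result = new ArrayList<>();
        for (List<String> group : ordered.values()) {
            result.addAll(group);
        }
        return result;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("a.txt", now - 3L * 60 * 60 * 1000);  // saved three hours ago
        files.put("b.txt", now - 1L * 60 * 60 * 1000);  // saved one hour ago, newest in this listing
        System.out.println(selectForEmission(files, now)); // [a.txt, b.txt]

        files.put("c.txt", now - 10L);                   // written 10 ms ago
        System.out.println(selectForEmission(files, now)); // [a.txt, b.txt]
    }
}
```

With a daily run schedule, the hour-old `b.txt` is emitted immediately instead of waiting for the next day's run, while the file written 10 ms ago is still postponed.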