[jira] [Commented] (FLUME-3219) Taildir source: if file is renamed, it is consumed again

2018-04-06 Thread John P. Kiffmeyer (JIRA)

[ https://issues.apache.org/jira/browse/FLUME-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429014#comment-16429014 ]

John P. Kiffmeyer commented on FLUME-3219:
--

I'm seeing this too. This means a plain old logrotate(8) setup on the log
directory that Taildir is pointed at will cause massive duplication.

> Taildir source: if file is renamed, it is consumed again
> 
>
> Key: FLUME-3219
> URL: https://issues.apache.org/jira/browse/FLUME-3219
> Project: Flume
>  Issue Type: Improvement
>  Components: Sinks+Sources
>Affects Versions: 1.8.0
>Reporter: Daniel Lanza García
>Priority: Major
>
> The current behavior of Taildir is that if a file is renamed (e.g. log
> rotated), it is consumed again.
> https://github.com/apache/flume/blob/d1f24f56ce9714bb3e1edc671da290c75a17dead/flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java#L247
> Wouldn't it be better to follow the inode and, if that inode has already
> been consumed, not consume it again? With the current implementation, once a
> file is rotated you get duplicates whenever the path includes previous
> days' data (which you want if the agent fails and needs to consume data
> from previous days).
>  
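The reporter's proposal (follow the inode, not the path) can be sketched as a tracker that stores read offsets per file identity. This is a minimal, hypothetical sketch and not Flume's actual code: the class and method names are invented for illustration, and it uses the standard `BasicFileAttributes.fileKey()` as the identity, which on Unix-like JDKs wraps the device and inode numbers but may be null on other platforms.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: remembers read offsets per file identity instead of per path. */
public class InodeOffsetTracker {
    // Key: BasicFileAttributes.fileKey(), which on Unix-like JDKs wraps device + inode.
    private final Map<Object, Long> offsets = new HashMap<>();

    /** Returns an object identifying the file itself, or null if the platform has none. */
    static Object fileKey(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class).fileKey();
    }

    /** Where to resume reading: 0 for a file never seen before, else the committed offset. */
    public long resumeOffset(Path p) throws IOException {
        Object key = fileKey(p);
        // Without a file key there is no identity to follow, so fall back to re-reading.
        return key == null ? 0L : offsets.getOrDefault(key, 0L);
    }

    /** Records that bytes [0, offset) of the file have been consumed. */
    public void commit(Path p, long offset) throws IOException {
        Object key = fileKey(p);
        if (key != null) {
            offsets.put(key, offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("taildir-sketch");
        Path log = dir.resolve("thing.log");
        Files.write(log, "line 1\nline 2\n".getBytes());

        InodeOffsetTracker tracker = new InodeOffsetTracker();
        tracker.commit(log, Files.size(log)); // pretend the whole file was shipped

        // logrotate-style rename: the path changes but the inode does not.
        Path rotated = dir.resolve("thing.log.1");
        Files.move(log, rotated, StandardCopyOption.ATOMIC_MOVE);

        System.out.println(tracker.resumeOffset(rotated)); // prints 14: nothing is re-read
    }
}
```

Two caveats apply to any inode-based approach: an inode only survives a rename within the same filesystem, and a deleted file's inode can be reused by a new file, so a real implementation would likely need an extra sanity check before trusting a match.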



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@flume.apache.org
For additional commands, e-mail: issues-h...@flume.apache.org



[jira] [Comment Edited] (FLUME-3219) Taildir source: if file is renamed, it is consumed again

2018-04-06 Thread John P. Kiffmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429014#comment-16429014
 ] 

John P. Kiffmeyer edited comment on FLUME-3219 at 4/6/18 9:27 PM:
--

I'm seeing this too. This means a plain old logrotate(8) setup on the log
directory that Taildir is pointed at will cause lots of reprocessing.

Specifically, a logrotate config like this one would cause Taildir to reprocess
a file every time the _n_ in "thing.log.n" gets bumped. So, lots of
duplication.
{code:none}
/var/log/thing/thing.log {
# Rotate a file when it gets bigger than 25MiB
maxsize 26214400
# Keep at most 40 rotated files
rotate 40
...
}
{code}
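A small simulation makes the duplication concrete. It is a hypothetical model, not Flume's code: the tailer below remembers offsets per (file identity, path) pair, which is one way to reproduce the "renamed file is consumed again" behavior described in this issue, and rotation is mimicked with plain renames, as logrotate does when `copytruncate` is not used. The class, method, and file names are invented for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a tailer that treats a renamed file as a brand-new file. */
public class RotationReplayDemo {
    // Key: (file identity, path). A rename changes the path, so the offset is "forgotten".
    private final Map<List<Object>, Long> offsets = new HashMap<>();

    /** Reads any unread bytes of files in dir matching glob; returns bytes read this pass. */
    public long scan(Path dir, String glob) throws IOException {
        long read = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, glob)) {
            for (Path f : files) {
                Object id = Files.readAttributes(f, BasicFileAttributes.class).fileKey();
                List<Object> key = Arrays.asList(id, f);
                long size = Files.size(f);
                read += size - offsets.getOrDefault(key, 0L);
                offsets.put(key, size);
            }
        }
        return read;
    }

    /** Mimics rename-based rotation: thing.log.n -> thing.log.(n+1), thing.log -> thing.log.1. */
    public static void rotate(Path dir, int keep) throws IOException {
        Files.deleteIfExists(dir.resolve("thing.log." + keep));
        for (int n = keep - 1; n >= 1; n--) {
            Path src = dir.resolve("thing.log." + n);
            if (Files.exists(src)) {
                Files.move(src, dir.resolve("thing.log." + (n + 1)));
            }
        }
        Files.move(dir.resolve("thing.log"), dir.resolve("thing.log.1"));
        Files.createFile(dir.resolve("thing.log"));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rotation-demo");
        Path live = dir.resolve("thing.log");
        RotationReplayDemo tailer = new RotationReplayDemo();
        long written = 0;
        long read = 0;
        for (int round = 0; round < 3; round++) {
            byte[] chunk = "0123456789\n".getBytes(); // 11 bytes per round
            Files.write(live, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            written += chunk.length;
            read += tailer.scan(dir, "thing.log*");
            rotate(dir, 40); // same retention as "rotate 40" in the config
        }
        read += tailer.scan(dir, "thing.log*");
        System.out.println("written=" + written + " read=" + read); // prints written=33 read=99
    }
}
```

In this model every retained file is read again each time its suffix is bumped, so 33 bytes written come out as 99 bytes read after three rotations; with `rotate 40` and a scan between every rotation, each line could be read up to 41 times.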


was (Author: jpk):
I'm seeing this too.  This means a plain old logrotate(8) setup on the log 
directory TailDir is pointed at will cause massive duplication.
