[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509796#comment-15509796 ]

Jason Kushmaul commented on FLUME-2498:
---------------------------------------

I added a new ticket, FLUME-2994, to add Windows support to Taildir, with a patch available. [~hn_mting], you should create a new ticket.

> Implement Taildir Source
> ------------------------
>
>                 Key: FLUME-2498
>                 URL: https://issues.apache.org/jira/browse/FLUME-2498
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>            Reporter: Satoshi Iijima
>             Fix For: v1.7.0
>
>         Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498-4.patch, FLUME-2498-5.patch, FLUME-2498.patch
>
>
> This is a proposal to implement a new tailing source.
> This source watches the specified files and tails them in near real time once appends to these files are detected.
> * This source is reliable and will not miss data even when the tailed files rotate.
> * It periodically writes the last read position of each file to a position file in JSON format.
> * If Flume is stopped or goes down for some reason, it can resume tailing from the position recorded in the existing position file.
> * It can add event headers to each tailed file group.
> An attached patch includes configuration documentation for this source.
> This source requires a Unix-style file system and Java 1.7 or later.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
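The issue description says that the last read position of each file is written periodically to a position file in JSON format. A minimal sketch of what such a serializer could look like follows. The class name `PositionFileSketch`, the `Entry` type, and the field names `inode`, `pos` and `file` are assumptions made for illustration; the thread itself does not spell out the layout, so treat this as one plausible shape rather than the committed format.

```java
import java.util.List;

/** Hypothetical sketch, not Flume code: renders tail positions as a JSON array. */
final class PositionFileSketch {

    /** One tracked file: its inode, the byte offset read so far, and its path. */
    static final class Entry {
        final long inode;
        final long pos;
        final String file;

        Entry(long inode, long pos, String file) {
            this.inode = inode;
            this.pos = pos;
            this.file = file;
        }
    }

    /** Escapes backslashes and double quotes only; control characters are not handled here. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a JSON array with one object per tracked file (assumed field names). */
    static String toJson(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"inode\":").append(e.inode)
              .append(",\"pos\":").append(e.pos)
              .append(",\"file\":\"").append(escape(e.file)).append("\"}");
        }
        return sb.append(']').toString();
    }
}
```

Keying each record by inode rather than by path alone is what would let a restarted agent recognise a file that was renamed while it was down; whether the real source does exactly this is not stated in the thread.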
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266010#comment-15266010 ]

mouwei commented on FLUME-2498:
-------------------------------

Hi, I found a bug in this Taildir source. I use a regular expression to match files under a folder. When one of the matched files is rolled by log4j, the start position used to record that file's tail position is set to 0, and then all of the matched files are read again.

After checking the code, I found the following. process() updates all inode info with

    existingInodes.addAll(reader.updateTailFiles());

but skipToEnd is set to false when the files are updated:

    public List updateTailFiles() throws IOException {
      return updateTailFiles(false);
    }

When a file has been rolled, the code below is executed. startPos is set to 0, so the file is read again:

    if (tf == null || !tf.getPath().equals(f.getAbsolutePath())) {
      long startPos = skipToEnd ? f.length() : 0;
      tf = openFile(f, headers, inode, startPos);
    }

Has anyone else run into the same problem, or is there a setting I missed?
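The behaviour reported above can be modelled in isolation. The sketch below is not Flume code: `TailState` and `StartPositionSketch` are hypothetical names, and `renameAwareStartPos` is only one possible alternative, not a fix confirmed anywhere in this thread. It contrasts the reported rule (a known inode seen under a new path restarts at 0 when skipToEnd is false) with a rule that keeps the recorded position for a known inode.

```java
/** Hypothetical, simplified per-inode tail state; not Flume's TailFile. */
final class TailState {
    final String path;
    final long pos;

    TailState(String path, long pos) {
        this.path = path;
        this.pos = pos;
    }
}

final class StartPositionSketch {

    /** Mirrors the reported condition: unknown inode OR changed path restarts the file. */
    static long reportedStartPos(TailState known, String currentPath,
                                 long fileLength, boolean skipToEnd) {
        if (known == null || !known.path.equals(currentPath)) {
            return skipToEnd ? fileLength : 0L;
        }
        return known.pos;
    }

    /** Assumed alternative: a known inode keeps its position even if it was renamed. */
    static long renameAwareStartPos(TailState known, String currentPath,
                                    long fileLength, boolean skipToEnd) {
        if (known == null) {
            return skipToEnd ? fileLength : 0L;
        }
        // Clamp to the current length in case the file shrank since the last read.
        return Math.min(known.pos, fileLength);
    }

    public static void main(String[] args) {
        TailState known = new TailState("/var/log/app.log", 120L);
        // log4j renames app.log to app.log.1; the inode is unchanged.
        System.out.println(reportedStartPos(known, "/var/log/app.log.1", 120L, false));
        System.out.println(renameAwareStartPos(known, "/var/log/app.log.1", 120L, false));
    }
}
```

With the reported rule the renamed file starts again at offset 0, which matches the duplicate reads described in the comment; with the alternative it resumes at the recorded offset.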
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905778#comment-14905778 ]

Jun Seok Hong commented on FLUME-2498:
--------------------------------------

It was a mistake. I removed it.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905676#comment-14905676 ]

Hari Shreedharan commented on FLUME-2498:
-----------------------------------------

This is already committed. Can you create a new JIRA and submit the patch there?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709683#comment-14709683 ]

Roshan Naik commented on FLUME-2498:
------------------------------------

[~evilezh], could you open a JIRA for that feature request and consider submitting a patch for it?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709844#comment-14709844 ]

Haralds Ulmanis commented on FLUME-2498:
----------------------------------------

OK, I'm already writing it. Not exactly a patch, but another module.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708160#comment-14708160 ]

Haralds Ulmanis commented on FLUME-2498:
----------------------------------------

It is supposed to work with a regex path, but it does not work if the regex is in the directory part, e.g. /var/log/.*/abc.log. I looked at the code: only a regex in the file name works.

Maybe add a file manager that adds files to the list. Simplified idea: on start, add all directories matching the directory part of the regex to inotify, and then process inotify create events:
* if it is a directory, add a watch;
* if it is a file, add it to the list of files to tail.

Regards
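The idea in the comment above can be sketched in Java with `java.nio.file.WatchService`, used here as a portable stand-in for inotify. This is an illustration under assumptions, not the module mentioned in the thread: the class `DirRegexWatcherSketch` is hypothetical, and it handles only one directory level below a fixed root, as in /var/log/.*/abc.log with root /var/log, directory regex `.*` and file regex `abc\.log`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Flume: collects files under regex-matched subdirectories. */
final class DirRegexWatcherSketch implements AutoCloseable {
    private final Path root;
    private final Pattern dirPattern;   // matched against a subdirectory name
    private final Pattern filePattern;  // matched against a file name
    private final WatchService watcher;
    private final Set<Path> watchedDirs = new TreeSet<>();
    private final Set<Path> filesToTail = new TreeSet<>();

    DirRegexWatcherSketch(Path root, String dirRegex, String fileRegex) throws IOException {
        this.root = root;
        this.dirPattern = Pattern.compile(dirRegex);
        this.filePattern = Pattern.compile(fileRegex);
        this.watcher = root.getFileSystem().newWatchService();
        // Watch the root so that subdirectories created later are noticed.
        root.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        scan(root);
    }

    /** Feeds every existing child of dir through onCreate. */
    private void scan(Path dir) throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                onCreate(child);
            }
        }
    }

    /** Handles one created path: watch matching directories, collect matching files. */
    void onCreate(Path created) throws IOException {
        String name = created.getFileName().toString();
        Path parent = created.getParent();
        if (Files.isDirectory(created)) {
            if (root.equals(parent) && dirPattern.matcher(name).matches()
                    && watchedDirs.add(created)) {
                created.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
                scan(created); // pick up files that already exist in the new directory
            }
        } else if (watchedDirs.contains(parent) && filePattern.matcher(name).matches()) {
            filesToTail.add(created);
        }
    }

    /** Processes pending create events without blocking. */
    void drainEvents() throws IOException {
        WatchKey key;
        while ((key = watcher.poll()) != null) {
            Path dir = (Path) key.watchable();
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                    onCreate(dir.resolve((Path) event.context()));
                }
            }
            key.reset();
        }
    }

    Set<Path> filesToTail() {
        return Collections.unmodifiableSet(filesToTail);
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

A caller would construct it once and call `drainEvents()` from its polling loop. Scanning a directory right after registering it covers files created before the watch was in place, and the sets make a file reported by both the scan and an event harmless.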
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700823#comment-14700823 ]

Hudson commented on FLUME-2498:
-------------------------------

SUCCESS: Integrated in Flume-trunk-hbase-1 #119 (See [https://builds.apache.org/job/Flume-trunk-hbase-1/119/])
FLUME-2498. Implement Taildir Source (roshan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.gita=commith=757a560db73c2e6fbec56deea4c753a45ccf9032)
* flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java
* pom.xml
* flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java
* flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java
* flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java
* flume-ng-sources/flume-taildir-source/pom.xml
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java
* flume-ng-dist/pom.xml
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TailFile.java
* flume-ng-sources/pom.xml
* flume-ng-doc/sphinx/FlumeUserGuide.rst
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700864#comment-14700864 ]

Satoshi Iijima commented on FLUME-2498:
---------------------------------------

Thanks to Roshan for committing this. Thanks to Johny and all the other reviewers. :)
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700656#comment-14700656 ]

ASF subversion and git services commented on FLUME-2498:
--------------------------------------------------------

Commit d02013f4e1ee429b57f24bdfad72e6c6707d0653 in flume's branch refs/heads/flume-1.7 from [~roshan_naik]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=d02013f ]

FLUME-2498. Implement Taildir Source

(Satoshi Iijima via Roshan Naik)
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700654#comment-14700654 ]

ASF subversion and git services commented on FLUME-2498:
--------------------------------------------------------

Commit 757a560db73c2e6fbec56deea4c753a45ccf9032 in flume's branch refs/heads/trunk from [~roshan_naik]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=757a560 ]

FLUME-2498. Implement Taildir Source

(Satoshi Iijima via Roshan Naik)
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700643#comment-14700643 ]

Roshan Naik commented on FLUME-2498:
------------------------------------

Seems like there are no blocker issues. And all changes have been reviewed by others and myself. So +1 from me. Will initiate the commit now.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698769#comment-14698769 ]

Johny Rufus commented on FLUME-2498:
------------------------------------

+1 for the changes related to ConsumeOrder
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698950#comment-14698950 ]

Satoshi Iijima commented on FLUME-2498:
---------------------------------------

+1 for the doc changes to the 'filegroups' setting.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693660#comment-14693660 ]

Johny Rufus commented on FLUME-2498:
------------------------------------

[~roshan_naik], let me look at the new line issue.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694141#comment-14694141 ]

Roshan Naik commented on FLUME-2498:
------------------------------------

Yes, I guess that sounds like a good idea. It would be good to have a little unit test for that function, with two or three different types of lines fed into it.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694144#comment-14694144 ]

Roshan Naik commented on FLUME-2498:
------------------------------------

If it works well, we should use your implementation for FLUME-2508 also.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694262#comment-14694262 ]

Johny Rufus commented on FLUME-2498:
------------------------------------

Sure, working on it. I will attach the patch with an extra test case once it is done.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693825#comment-14693825 ]

Roshan Naik commented on FLUME-2498:
------------------------------------

Thanks [~jrufus]! You might be able to leverage the code in FLUME-2508. It handles cases like a single file having both types of line endings (which is rare but does occur in mixed-OS environments).
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693975#comment-14693975 ]

Johny Rufus commented on FLUME-2498:
------------------------------------

Hi [~roshan_naik],
System.lineSeparator() returns the current system's line separator, which may not match the line endings of the file being processed. So typically we should
1) find the end of a line using '\n' (should work for both Unix and Windows), and
2) remove the trailing '\n' or '\r\n', depending on which one is present (should work for both Unix and Windows).
Let me know if this sounds good.
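The two steps proposed above can be sketched as follows. This is a minimal illustration, not the patch discussed in the thread: the class `LineSplitSketch` and its method are hypothetical, the input is assumed to be UTF-8, and trailing bytes without a terminating '\n' are treated as an incomplete line and left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a byte buffer into lines, accepting "\n" and "\r\n". */
final class LineSplitSketch {

    static List<String> splitLines(byte[] data) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            // Step 1: '\n' marks the end of a line on both Unix and Windows.
            if (data[i] == '\n') {
                int end = i;
                // Step 2: drop a '\r' that directly precedes the '\n', if present.
                if (end > start && data[end - 1] == '\r') {
                    end--;
                }
                lines.add(new String(data, start, end - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // Bytes after the last '\n' form an incomplete line and are not returned.
        return lines;
    }

    public static void main(String[] args) {
        byte[] mixed = "unix line\nwindows line\r\npartial".getBytes(StandardCharsets.UTF_8);
        for (String line : splitLines(mixed)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Because the decision is made per line from the bytes actually read, a single file containing both kinds of line endings is handled without consulting System.lineSeparator().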
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692881#comment-14692881 ]

Roshan Naik commented on FLUME-2498:
------------------------------------

Let me take a stab at the ordering issue. Any volunteer to take a stab at the new line issue, i.e. handling both types of line endings correctly?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681207#comment-14681207 ]
Satoshi Iijima commented on FLUME-2498:
- The newline issue
- Doc changes
I think the above should also be fixed before committing. The others would be nice to address in patches provided later. I do not have much time to address them now because I am assigned to other tasks. I would be happy if Roshan or others addressed them.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662870#comment-14662870 ] Roshan Naik commented on FLUME-2498: [~iijima_satoshi] do you think you can look into supporting deserializers?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662892#comment-14662892 ]
Juhani Connolly commented on FLUME-2498:
Pardon my being forward, but are these features we actually need for the patch to be released? I think we should fix any actual bugs (the \r / \n issue you mentioned) and the documentation (e.g. documenting that it is not appropriate for binary files) and then commit. After that, others are free to further improve the source by adding deserializer support, rather than further delaying inclusion. Committing without deserializer support does not strike me as harmful to users; it is just a missing feature that would be nice to have and would be an appropriate follow-up patch (same with most of the other suggestions).
As you mentioned, it is pretty mature in implementation. It has been in production use for about a year now, on a very large number of servers. Trying to throw more features into this patch (rather than a separate one) is just going to mean additional debugging and delays. Inclusion has no impact on other components, so it is not harmful to them, and the main considerations should be whether it is needed (I would say yes) and whether it works as documented (possibly needing a newline handling fix and documentation on what it does and does not handle). Committing it opens it up to modification by more people, who can contribute the features they would like to see added.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14663052#comment-14663052 ] Ashish Paliwal commented on FLUME-2498: +1, I think once this gets committed, other users can provide patches.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659613#comment-14659613 ]
Satoshi Iijima commented on FLUME-2498:
Thank you for reviewing and updating the patch, Roshan.
bq. 1. Was not able to verify if it handles subdirectories also ? can you confirm whether or not it handles it ?
Currently it cannot handle subdirectories. But it would be better to be able to track files in subdirectories.
bq. 2. Wasn't clear how often it commits to the position.json file ? Intuitively i would say for every batch committed into the channel the json file should get updated.
If position.json is updated for every batch committed, it impacts performance in a small way. On the other hand, even if position.json is only updated at a regular interval, no data loss occurs when Flume restarts for some reason.
bq. 3. can a regex be applied to the directory also and not just file name ?
Currently this source cannot apply it. But this feature sounds good. It would be good to implement these features (from questions 1 and 3) after this patch is merged to trunk.
bq. 4. Windows : What areas in this implementation do you feel may break on Windows ?
This source uses the inode to identify a file uniquely. On Windows it would need to use the file ID instead of the inode.
bq. 5. Is there some limit on how many files it will track ?
Although I have not confirmed the limit in a test, there are many hosts in my production environment where this source tracks several hundred files.
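To illustrate the answer to question 4 above (the inode as the file's identity), here is a minimal sketch of reading an inode number from Java. The class and method names are hypothetical and the sketch is not taken from the patch. It assumes Java 8 or later (for UncheckedIOException) and a platform that supports the "unix" file attribute view; elsewhere, for example on Windows, the Files.getAttribute call is expected to fail with UnsupportedOperationException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not code from the FLUME-2498 patch.
final class InodeLookup {
    private InodeLookup() {}

    /** Returns the inode number of a file via the "unix" attribute view. */
    static long inodeOf(Path file) {
        try {
            return (Long) Files.getAttribute(file, "unix:ino");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demonstration: a rename within one directory (as log rotation does) keeps the inode. */
    static boolean renameKeepsInode() {
        try {
            Path dir = Files.createTempDirectory("taildir-sketch");
            Path original = Files.createFile(dir.resolve("app.log"));
            long before = inodeOf(original);
            Path rotated = Files.move(original, dir.resolve("app.log.1"));
            boolean same = inodeOf(rotated) == before;
            // Clean up the temporary file and directory.
            Files.delete(rotated);
            Files.delete(dir);
            return same;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the inode survives a rename, a read position stored per inode can still be matched to the same file after rotation, which is why a Windows port would need an equivalent stable file identifier.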
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659792#comment-14659792 ] Otis Gospodnetic commented on FLUME-2498: Could this be used for tailing binary files?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660637#comment-14660637 ] jian jin commented on FLUME-2498: If that is the case, I think it is enough. But are you using SSDs? I tested it locally, and it is not so fast.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660565#comment-14660565 ] Roshan Naik commented on FLUME-2498: If it supported deserializers, you could give it a custom deserializer that splits the binary files into individual events... then yes. However, this source has a notion of checking if it has reached the EOF without reading a newline in the current event ... this is one area that needs a bit of investigation to see if the same behavior can be achieved with deserializer support.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659591#comment-14659591 ] Satoshi Iijima commented on FLUME-2498: In my production environment, this source can tail more than several thousand appends per second with a few percent CPU usage on a host that has 4 CPU cores. I think that is enough.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655182#comment-14655182 ]
Roshan Naik commented on FLUME-2498:
*General Comments*
# This patch seems relatively mature in its implementation. After making the above fixes, I gave it some testing on my Mac and tried to cover some potential corner cases, and it handled them pretty well.
# Like the filegroup feature.
# Like the fact that it can track many files at once.
# Handles the case when the event/line is still not completely written.
# Seems like it is able to pick up appends to files that have been previously closed due to timeout. That's very nice!
# Is tolerant to deletion of a file and recreation of a new file with the same name (treats them as different files). Again, very nice!
# Ran code coverage on the unit tests. Coverage is pretty good (80% line coverage).
*Questions*
# Was not able to verify if it handles subdirectories also. Can you confirm whether or not it handles them?
# Wasn't clear how often it commits to the position.json file. Intuitively I would say the JSON file should get updated for every batch committed into the channel.
# Can a regex be applied to the directory also, and not just the file name?
# Windows: What areas in this implementation do you feel may break on Windows?
# Is there some limit on how many files it will track?
*Suggestions*
# major - Need to document that it will not delete or rename files, and that this is expected to be done externally (unlike spooldir).
# major - It definitely needs deserializer support. readevent() can forward to the configured deserializer.
# major - Does not have a max event size setting (i.e. line length for text files). It would be good to default to a large number (8k?). Deserializer support will automatically give this.
# major - Files to consume should be selected in order of creation time by default.
# major - I think readline() has a bug. It is treating \r without a \n immediately following it as a new line. The patch in FLUME-2508 might be useful for this.
# minor - If the file is being overwritten (instead of appended to), it could log an error and exclude that file?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659352#comment-14659352 ] jian jin commented on FLUME-2498: A question: could we improve readline(), which calls read() to get data byte by byte? That is slow.
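One possible direction for the question above is to read the file in chunks and scan each chunk for '\n' in memory, instead of issuing one read() call per byte. The sketch below is only an illustration and is not the readline() from the patch. The class name, the method name, and the 8192-byte buffer size are assumptions, and it assumes the file is exposed as an InputStream already positioned at the last recorded offset (Java 8 or later for UncheckedIOException).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch for illustration; not the readline() from the FLUME-2498 patch.
final class ChunkedLineReader {
    private static final int BUFFER_SIZE = 8192; // assumed chunk size

    private ChunkedLineReader() {}

    /**
     * Reads the stream to EOF in chunks and appends every complete line to out.
     * Returns the number of bytes consumed up to and including the last '\n'.
     * Bytes after the last '\n' belong to an unfinished line and are not counted,
     * so a caller can read them again on the next pass.
     */
    static long readCompleteLines(InputStream in, List<String> out) {
        byte[] buf = new byte[BUFFER_SIZE];
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        long offset = 0;    // bytes scanned so far
        long committed = 0; // bytes belonging to complete lines
        try {
            int n;
            while ((n = in.read(buf)) > 0) {
                for (int i = 0; i < n; i++) {
                    offset++;
                    if (buf[i] == '\n') {
                        byte[] line = pending.toByteArray();
                        int len = line.length;
                        // Drop the '\r' of a "\r\n" ending; a lone '\r' elsewhere is kept.
                        if (len > 0 && line[len - 1] == '\r') {
                            len--;
                        }
                        out.add(new String(line, 0, len, StandardCharsets.UTF_8));
                        pending.reset();
                        committed = offset;
                    } else {
                        pending.write(buf[i]);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return committed;
    }
}
```

Returning the committed byte count rather than the total read keeps the existing behavior of not emitting an event for a line that has not yet been completely written: the caller would advance its stored position only by the returned amount.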
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654860#comment-14654860 ] Roshan Naik commented on FLUME-2498: I am beginning to look at this patch.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637397#comment-14637397 ]
sutanu das commented on FLUME-2498:
Can this patch be backported to Flume 1.5, please? We run Hortonworks Flume 1.5.2.2 and they will not backport this patch for us. Reasons we need this patch:
1. We want to resume ingesting log files as events from the point at which Flume stopped/recovered, while the log lines keep growing/appending.
2. We want to keep reading a log file even if it rotates to a new name, e.g. tail.log.x gets rotated to tail.log.y, where -F of the exec source does not work and spoolDir does not work either (because of timestamp limitations).
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637926#comment-14637926 ] Hari Shreedharan commented on FLUME-2498: [~jrufus]/[~roshan_naik] - Do you think one of you would be able to take a look at this one?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593680#comment-14593680 ] jian jin commented on FLUME-2498: You cannot do that directly against the Flume source code; there are some conflicts if you apply the patch.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593683#comment-14593683 ] jian jin commented on FLUME-2498: Any progress? One comment about the implementation: it reads the content byte by byte. I do not know whether that is necessary; if not, it really hurts performance.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554113#comment-14554113 ] flankwang commented on FLUME-2498: I want to merge this patch into apache-flume-1.5.2, but I don't know how to do it.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531513#comment-14531513 ] Hari Shreedharan commented on FLUME-2498: I will review this over the next few days.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531188#comment-14531188 ] jian jin commented on FLUME-2498: When could this be merged into trunk?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350131#comment-14350131 ] Satoshi Iijima commented on FLUME-2498: Hari or other committers, what about adding this as an experimental source?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327435#comment-14327435 ]

Satoshi Iijima commented on FLUME-2498:

It sounds good. I agree with implementing these features if this patch is merged into the trunk repository.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327905#comment-14327905 ]

Otis Gospodnetic commented on FLUME-2498:

+1. Too many people asking about tailing and this patch in particular to be ignored by Flume committers IMHO.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328529#comment-14328529 ]

Ashish Paliwal commented on FLUME-2498:

Can't we add this as an experimental feature? Maybe we can incorporate user feedback as we get it.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323992#comment-14323992 ]

Benjamin Fiorini commented on FLUME-2498:

Great source, works very well from what I could test!
Even though it's easy to work around, it would be cool to also have:
- regex in the directories as well (e.g. /var/spool/flume/.*/.*\.reports)
- the ability to add the filename to the headers (could be useful if a regex is used in the filegroup)
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278201#comment-14278201 ]

Satoshi Iijima commented on FLUME-2498:

Updated the patch. I have made a fix so that multi-byte characters encoded in UTF-8, such as Japanese text, are read correctly.
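The comment above does not say how the patch decodes multi-byte characters. One common approach is to split the raw bytes on the newline byte first and decode each complete line as UTF-8 afterwards, which is safe because the byte 0x0A never occurs inside a UTF-8 multi-byte sequence. The following is a minimal sketch of that idea only; the class and method names (Utf8Lines, completeLines) are hypothetical and are not taken from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the FLUME-2498 patch.
class Utf8Lines {
    // Returns every newline-terminated line in data, decoded as UTF-8.
    // Bytes after the last '\n' are ignored here; a real tailer would keep
    // them and retry once the writer has appended the rest of the line.
    static List<String> completeLines(byte[] data) {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (byte b : data) {
            if (b == '\n') {
                // Decode only once the whole line is available, so a
                // multi-byte character is never cut in half.
                lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                current.reset();
            } else {
                current.write(b);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // "\u30ed\u30b0" is the Japanese word for "log" (two 3-byte characters).
        byte[] appended = "\u30ed\u30b0\nok\npartial".getBytes(StandardCharsets.UTF_8);
        System.out.println(completeLines(appended));
    }
}
```

Decoding byte by byte (for example, casting each byte to a char) would turn each Japanese character into three unrelated characters, which is the kind of problem a UTF-8 fix has to avoid.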
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184882#comment-14184882 ]

Satoshi Iijima commented on FLUME-2498:

Thanks, Juhani-san.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184747#comment-14184747 ]

Juhani Connolly commented on FLUME-2498:

Since most of the implementation details should be the same as in an internal tool I wrote a while back, I should be able to answer a couple of the remaining queries.

> I can't tell if it's Yes or No to the will lines be read again

Duplicate reads are possible if the process is restarted and tailing has to be resumed. Checkpoints are written periodically. We didn't see this as an issue, since Flume can create duplicate lines in other places; the objective is to prevent log loss, not duplication.

> I think the person was asking whether this Taildir Source implementation deletes a file when it's done reading it or not. I think the answer is that it does NOT delete the file and that file deletion is somebody else's responsibility. Correct?

This is correct. We use Flume and the source as an invisible entity. We have it running on many internal services that do not need to worry about its existence, as it works behind the scenes. We never had a need for it to delete the files, and for something tailing in real time, I suspect such a thing would be awkward. When would you delete a file that's actively being appended to? Once you're done reading, it may still get more appends.

We close the file handles if there are no appends for a while, just to avoid hogging the file handle and so that log rotations and such are not obstructed, again with the objective that Flume and the tailer be as unobtrusive as possible.
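The duplicate reads described in the comment above follow from checkpointing only periodically: anything read after the last checkpoint is read again after a restart, so data is repeated but never skipped. The following is a minimal sketch of that behaviour under simplifying assumptions. The class CheckpointedTail and its methods are hypothetical, the "position file" is replaced by a field in memory, and RandomAccessFile.readLine() is used for brevity even though it only handles single-byte text and also returns an unterminated last line.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the Taildir Source implementation.
class CheckpointedTail {
    private final Path file;
    private long position;   // offset of the next unread byte (in memory only)
    private long checkpoint; // last offset "persisted"; stands in for the position file

    CheckpointedTail(Path file, long startPos) {
        this.file = file;
        this.position = startPos;
        this.checkpoint = startPos;
    }

    // Reads up to maxLines lines starting at the current position.
    // The file is opened and closed per call, so no handle is held while idle.
    List<String> readLines(int maxLines) {
        List<String> out = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(position);
            String line;
            while (out.size() < maxLines && (line = raf.readLine()) != null) {
                out.add(line);
                position = raf.getFilePointer();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    void writeCheckpoint() { checkpoint = position; }

    long lastCheckpoint() { return checkpoint; }

    // Simulates a crash between two checkpoints and returns what the
    // restarted reader delivers a second time.
    static List<String> replayAfterCrash() {
        try {
            Path log = Files.createTempFile("app", ".log");
            Files.write(log, "a\nb\nc\n".getBytes(StandardCharsets.US_ASCII));
            CheckpointedTail first = new CheckpointedTail(log, 0);
            first.readLines(2);      // delivers "a" and "b"
            first.writeCheckpoint(); // offset after "b" is saved
            first.readLines(1);      // delivers "c", then the process dies
            CheckpointedTail restarted = new CheckpointedTail(log, first.lastCheckpoint());
            List<String> again = restarted.readLines(10);
            Files.delete(log);
            return again;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("re-read after restart: " + replayAfterCrash());
    }
}
```

In this sketch the line "c" is delivered twice, once before the crash and once after the restart, while no line is lost. A shorter checkpoint interval narrows the window for duplicates at the cost of more frequent writes.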
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181075#comment-14181075 ]

Satoshi Iijima commented on FLUME-2498:

This source supports only plain text and is designed to tail files as newline-separated data. Tailing non-plain-text files would be possible if a later patch added a deserializer to the TailFile class that could be specified in flume.conf.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180276#comment-14180276 ]

Otis Gospodnetic commented on FLUME-2498:

bq. It would be possible to create a patch to control the order of files for consumption in some way, for example, sorting inode list using comparator.

+1 for this.

Can this source be used to tail a file where something is writing Avro or Thrift? Or just plain-text, line-oriented data?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176854#comment-14176854 ]

Satoshi Iijima commented on FLUME-2498:

It would be possible to create a patch to control the order of files for consumption in some way, for example, sorting inode list using comparator.

On file deletion: that is correct. This source currently has no option to delete the files.
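The ordering idea in the comment above (sorting the inode list with a comparator) could look roughly like the sketch below, which orders a backlog oldest-first by modification time, as suggested elsewhere in this thread. This is an illustration only: the Candidate class, its fields, and the tie-break on path are assumptions, and no such option exists in the patch as described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of ordering a backlog before consumption.
class BacklogOrder {
    // One tailable file: its inode, its path, and its last-modified time.
    static final class Candidate {
        final long inode;
        final String path;
        final long lastModifiedMillis;

        Candidate(long inode, String path, long lastModifiedMillis) {
            this.inode = inode;
            this.path = path;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // Oldest file first; the path breaks ties so the order is deterministic.
    static final Comparator<Candidate> OLDEST_FIRST =
        Comparator.comparingLong((Candidate c) -> c.lastModifiedMillis)
                  .thenComparing(c -> c.path);

    // Returns the paths in the order they would be consumed.
    static List<String> order(Collection<Candidate> backlog) {
        List<Candidate> sorted = new ArrayList<>(backlog);
        sorted.sort(OLDEST_FIRST);
        List<String> paths = new ArrayList<>();
        for (Candidate c : sorted) {
            paths.add(c.path);
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Candidate> backlog = Arrays.asList(
            new Candidate(103, "access.log", 3000L),
            new Candidate(101, "access.log.2", 1000L),
            new Candidate(102, "access.log.1", 2000L));
        System.out.println(order(backlog));
    }
}
```

In a real source the timestamps would come from something like File.lastModified(); they are passed in directly here so the ordering logic can be checked without touching the file system.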
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174871#comment-14174871 ]

Satoshi Iijima commented on FLUME-2498:

Below are answers to the questions posted to the mailing lists.

> Btw., because the position in the file is checkpointed periodically, does that mean that it is possible that, after a restart, some number of lines that have already been tailed, will be read again?

Yes. They will not be read again. On restart, this source will start reading from the last read position recorded in the position file.

> - How does it know when to stop tailing the current file and switch to or start tailing another file
> - When there is a backlog of many files being built up... how does it order the files for consumption

This source does not define an order, because it is basically meant to tail appended lines of files in nearly real-time. If there is a backlog of many files on start-up, one file will be selected in random order and be read to EOF, then the next file will be selected in the same way. Using the 'skipToEnd' property, it can also start tailing from EOF of the current files.

> - Sounds like there is some C/C++ native code + JNI to work with inodes ? what api are you using.

This source uses java.nio.file.Files.getAttribute() from the Java 7 API to identify the inode of a file.

> - does it auto delete the consumed files ?

No, the consumed files need not be deleted in this source. The files to be tailed and the position of each are recorded in the position file. For example, an application log file such as /var/log/app/access.log can be specified directly in flume.conf.
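The inode lookup mentioned in the answers above can be sketched as follows. The call Files.getAttribute() is the one named in the comment; the attribute name "unix:ino" is an assumption about which attribute is requested, and it is only available on file systems that support the JDK's "unix" attribute view. The class InodeLookup and its methods are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of identifying a file by inode.
class InodeLookup {
    // Reads the inode number through the "unix" attribute view.
    // Throws UnsupportedOperationException where that view is unavailable.
    static long inodeOf(Path file) {
        try {
            return ((Number) Files.getAttribute(file, "unix:ino")).longValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Renames a temp file the way a log rotation would and reports
    // whether the inode is the same before and after the rename.
    static boolean inodeSurvivesRename() {
        try {
            Path log = Files.createTempFile("access", ".log");
            long before = inodeOf(log);
            Path rotated = log.resolveSibling(log.getFileName() + ".1");
            Files.move(log, rotated); // same directory, so a plain rename
            long after = inodeOf(rotated);
            Files.delete(rotated);
            return before == after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inode unchanged after rename: " + inodeSurvivesRename());
    }
}
```

Because a rename within one file system keeps the inode, a tailer that keys its saved positions by inode instead of by path can keep following a file across a rotation such as access.log to access.log.1.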
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175530#comment-14175530 ]

Otis Gospodnetic commented on FLUME-2498:

bq. Yes. They will not be read again.

I can't tell if it's Yes or No to the will lines be read again :)

bq. If there is a backlog of many files on start-up, one file will be selected in random order

Would it be possible to look at the timestamp on unread files? Or see if they have a numeric extension, like .1, .2, etc. and use some heuristics to try and read them in the correct order?

bq. No, the consumed files need not be deleted in this source.

I think the person was asking whether this Taildir Source implementation deletes a file when it's done reading it or not. I think the answer is that it does NOT delete the file and that file deletion is somebody else's responsibility. Correct?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172294#comment-14172294 ]

Satoshi Iijima commented on FLUME-2498:

I have added 'Windows support' and 'Minimum Java version' to the above links.

I think it is possible to support Windows, because Windows has a file ID in place of an inode. If someone attaches a patch that lets this source run under Windows, I think that would be good, although I do not have a Windows environment myself.

'Pollable' means that the source implements PollableSource, or that it continues processing after the tailed files have been entirely consumed.

'Append header' means that the source can append headers to the events. When the source tails multiple files, it is useful to be able to append headers to the events of each file, as the Spooling Directory source does.
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170734#comment-14170734 ]

Satoshi Iijima commented on FLUME-2498:

I compared this source with other source components that can tail or read files. If anything is incorrect, please let me know.

Comparison with tail-pollable-source (FLUME-2344) and jambalay-file-source:
http://vschart.com/compare/flume-taildir-source/vs/flume-tail-pollable-source-flume-2344/vs/flume-jambalay-file-source

Comparison with Spooling Directory source and Exec source (tail -F):
http://vschart.com/compare/flume-taildir-source/vs/flume-spooling-directory/vs/flume-exec-source-tail-f
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171202#comment-14171202 ]

Otis Gospodnetic commented on FLUME-2498:

Thanks! Questions:
* what about Windows support? Should that be another row?
* what does Pollable mean?
* what does Append header mean/do? When is this used, how/why is it useful?
[jira] [Commented] (FLUME-2498) Implement Taildir Source
[ https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166578#comment-14166578 ]

Satoshi Iijima commented on FLUME-2498:

Hari, Santiago, Otis, Roshan - Thank you for replying on the dev ML. I am sorry that, for an unknown reason, I could not reply after that. I would like to discuss this source here.