[jira] [Commented] (FLUME-2498) Implement Taildir Source

2016-09-21 Thread Jason Kushmaul (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509796#comment-15509796
 ] 

Jason Kushmaul commented on FLUME-2498:
---

I opened a new ticket, FLUME-2994, to add Windows support to Taildir; a patch
is available.

[~hn_mting], you should create a new ticket.

> Implement Taildir Source
> 
>
> Key: FLUME-2498
> URL: https://issues.apache.org/jira/browse/FLUME-2498
> Project: Flume
>  Issue Type: New Feature
>  Components: Sinks+Sources
> Reporter: Satoshi Iijima
> Fix For: v1.7.0
>
> Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, 
> FLUME-2498-4.patch, FLUME-2498-5.patch, FLUME-2498.patch
>
>
> This is a proposal to implement a new tailing source.
> This source watches the specified files and tails them in near real time 
> once appends to these files are detected.
> * This source is reliable and will not miss data even when the tailed files 
> rotate.
> * It periodically writes the last read position of each file to a position 
> file in JSON format.
> * If Flume is stopped or goes down for some reason, it can resume tailing from 
> the position recorded in the existing position file.
> * It can add event headers to each tailing file group. 
> An attached patch includes configuration documentation for this source.
> This source requires a Unix-style file system and Java 1.7 or later.
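
The position tracking described in the issue can be illustrated with a minimal sketch. It is not the code from the attached patches: the class name, the method names, and the JSON field names ("inode", "pos", "file") are all assumptions made for illustration. It keeps one read offset per inode, so a file that is renamed during rotation keeps its offset, and renders the offsets as a JSON array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: tracks the last read offset per file and renders it as JSON.
// The JSON field names ("inode", "pos", "file") are assumptions for illustration.
class PositionSketch {
    // One tracked file: its current path and the byte offset read so far.
    static final class Tracked {
        final String path;
        final long pos;
        Tracked(String path, long pos) { this.path = path; this.pos = pos; }
    }

    // Keyed by inode so that a renamed (rotated) file keeps its offset.
    private final Map<Long, Tracked> positions = new LinkedHashMap<>();

    void record(long inode, String path, long pos) {
        positions.put(inode, new Tracked(path, pos));
    }

    // Offset to resume from after a restart; 0 for a file never seen before.
    long resumeFrom(long inode) {
        Tracked t = positions.get(inode);
        return t == null ? 0L : t.pos;
    }

    // Renders the tracked positions as a JSON array, one object per file.
    String toJson() {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<Long, Tracked> e : positions.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append("{\"inode\":").append(e.getKey())
              .append(",\"pos\":").append(e.getValue().pos)
              .append(",\"file\":\"").append(escape(e.getValue().path)).append("\"}");
        }
        return sb.append(']').toString();
    }

    // Escapes only backslashes and double quotes, which is enough for typical paths.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Writing such a string to disk periodically, and reading it back on startup, would give the restart behaviour the issue describes; the real source may store more or different fields.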



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLUME-2498) Implement Taildir Source

2016-05-01 Thread mouwei (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266010#comment-15266010
 ] 

mouwei commented on FLUME-2498:
---

Hi,
I found a bug in this tail source.

I use a regular expression to match files under a folder. When one of the 
matched files is rolled by log4j, the start position used to record that file's 
tail position is set to 0, and then all of the matched files are read again.

After checking the code, I found the following.
process() updates the inode information for all files with

    existingInodes.addAll(reader.updateTailFiles());

but this overload sets skipToEnd to false when it updates a file:

    public List updateTailFiles() throws IOException {
      return updateTailFiles(false);
    }

When a file has been rolled, the code below is executed. Because skipToEnd is 
false, startPos is set to 0 and the file is read again:

    if (tf == null || !tf.getPath().equals(f.getAbsolutePath())) {
      long startPos = skipToEnd ? f.length() : 0;
      tf = openFile(f, headers, inode, startPos);
    }

Has anyone else run into the same problem, or is there a setting I missed?
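
To make the reported behaviour concrete, here is a minimal sketch that contrasts the quoted start-position logic with one possible rename-aware alternative. It is not the Flume code and not a committed fix: the class, the per-inode map, and both method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Flume code: decides where to start reading a file.
class StartPosSketch {
    // Last known read offset per inode (an assumed bookkeeping structure).
    private final Map<Long, Long> lastPosByInode = new HashMap<>();

    void remember(long inode, long pos) {
        lastPosByInode.put(inode, pos);
    }

    // Mirrors the quoted logic: a file that is new, or whose path changed,
    // starts at 0 unless skipToEnd is set, in which case it starts at its length.
    static long quotedStartPos(boolean skipToEnd, long fileLength) {
        return skipToEnd ? fileLength : 0L;
    }

    // Alternative: if the inode is already known, the file was only renamed,
    // so continue from the remembered offset instead of rereading it.
    // An offset beyond the current length suggests truncation, so fall back.
    long renameAwareStartPos(long inode, boolean skipToEnd, long fileLength) {
        Long known = lastPosByInode.get(inode);
        if (known != null && known <= fileLength) {
            return known;
        }
        return quotedStartPos(skipToEnd, fileLength);
    }
}
```

With an offset of 500 remembered for an inode, the quoted logic restarts a renamed 800-byte file at 0, while the alternative resumes at 500. Whether this matches how the source should treat rolled files is a design question for the follow-up ticket.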






[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-09-23 Thread Jun Seok Hong (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905778#comment-14905778
 ] 

Jun Seok Hong commented on FLUME-2498:
--

It was a mistake. I removed it.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-09-23 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905676#comment-14905676
 ] 

Hari Shreedharan commented on FLUME-2498:
-

This is already committed. Can you create a new JIRA and submit the patch there?



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-24 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709683#comment-14709683
 ] 

Roshan Naik commented on FLUME-2498:


[~evilezh] could you open a JIRA for that feature request and consider 
submitting a patch for it?



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-24 Thread Haralds Ulmanis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709844#comment-14709844
 ] 

Haralds Ulmanis commented on FLUME-2498:


OK, I'm already writing it. It is not exactly a patch, but a separate module.





[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-22 Thread Haralds Ulmanis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708160#comment-14708160
 ] 

Haralds Ulmanis commented on FLUME-2498:


It is supposed to work with a regex path, but if the regex is in the directory 
part, it does not work,
e.g. /var/log/.*/abc.log
I looked at the code: a regex works only in the file name.

Maybe add a file manager that adds files to the list.
A simplified idea: on start, add all directories matching the directory part of 
the regex to inotify, and then process the inotify create events:
if dir - add a watch
if file - add it to the list of files to tail.

Regards
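
As a rough illustration of matching a regex that spans directory levels, the sketch below rescans a directory tree and returns the files whose path relative to the root matches the pattern. It is a polling simplification of the idea above, not the inotify-based design (in Java that design would likely build on java.nio.file.WatchService), and it is not part of the Taildir source. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: finds regular files under a root whose path relative
// to that root matches a regex that may span directory levels.
class DirRegexSketch {
    static List<String> matchRelative(Path root, String relativeRegex) throws IOException {
        Pattern pattern = Pattern.compile(relativeRegex);
        // Files.walk visits the whole tree; the stream must be closed after use.
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(Files::isRegularFile)
                // Normalize separators so the regex can always use '/'.
                .map(p -> root.relativize(p).toString().replace('\\', '/'))
                .filter(rel -> pattern.matcher(rel).matches())
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

For a root of /var/log, the pattern `.*/abc\.log` would select abc.log in any subdirectory but not /var/log/abc.log itself. A full rescan on every poll is simple but can be slow on large trees, which is the cost the event-driven approach avoids.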




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700823#comment-14700823
 ] 

Hudson commented on FLUME-2498:
---

SUCCESS: Integrated in Flume-trunk-hbase-1 #119 (See 
[https://builds.apache.org/job/Flume-trunk-hbase-1/119/])
FLUME-2498.  Implement Taildir Source (roshan: 
http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=757a560db73c2e6fbec56deea4c753a45ccf9032)
* flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirEventReader.java
* pom.xml
* flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceConfiguration.java
* flume-ng-sources/flume-taildir-source/src/test/java/org/apache/flume/source/taildir/TestTaildirSource.java
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/ReliableTaildirEventReader.java
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSource.java
* flume-ng-configuration/src/main/java/org/apache/flume/conf/source/SourceType.java
* flume-ng-sources/flume-taildir-source/pom.xml
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TaildirSourceConfigurationConstants.java
* flume-ng-dist/pom.xml
* flume-ng-sources/flume-taildir-source/src/main/java/org/apache/flume/source/taildir/TailFile.java
* flume-ng-sources/pom.xml
* flume-ng-doc/sphinx/FlumeUserGuide.rst




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-18 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700864#comment-14700864
 ] 

Satoshi Iijima commented on FLUME-2498:
---

Thanks to Roshan for committing this. Thanks to Johny and all the other reviewers. :)



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700656#comment-14700656
 ] 

ASF subversion and git services commented on FLUME-2498:


Commit d02013f4e1ee429b57f24bdfad72e6c6707d0653 in flume's branch 
refs/heads/flume-1.7 from [~roshan_naik]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=d02013f ]

FLUME-2498.  Implement Taildir Source

(Satoshi Iijima via Roshan Naik)




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-17 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700654#comment-14700654
 ] 

ASF subversion and git services commented on FLUME-2498:


Commit 757a560db73c2e6fbec56deea4c753a45ccf9032 in flume's branch 
refs/heads/trunk from [~roshan_naik]
[ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=757a560 ]

FLUME-2498.  Implement Taildir Source

(Satoshi Iijima via Roshan Naik)




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-17 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700643#comment-14700643
 ] 

Roshan Naik commented on FLUME-2498:


There seem to be no blocking issues, and all changes have been reviewed by 
others and by me.
So +1 from me. 
I will initiate the commit now.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-16 Thread Johny Rufus (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698769#comment-14698769
 ] 

Johny Rufus commented on FLUME-2498:


+1 for the changes related to ConsumeOrder




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-16 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698950#comment-14698950
 ] 

Satoshi Iijima commented on FLUME-2498:
---

+1 for the doc changes of 'filegroups' setting.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-12 Thread Johny Rufus (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693660#comment-14693660
 ] 

Johny Rufus commented on FLUME-2498:


[~roshan_naik], let me look at the newline issue.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-12 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694141#comment-14694141
 ] 

Roshan Naik commented on FLUME-2498:


Yes, I guess that sounds like a good idea. It would be good to have a small unit 
test for that function, with two or three different types of lines fed into it.




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-12 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694144#comment-14694144
 ] 

Roshan Naik commented on FLUME-2498:


If it works well, we should use your implementation for FLUME-2508 as well.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-12 Thread Johny Rufus (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694262#comment-14694262
 ] 

Johny Rufus commented on FLUME-2498:


Sure, working on it. I will attach the patch with an extra test case once it is done.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-12 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693825#comment-14693825
 ] 

Roshan Naik commented on FLUME-2498:


Thanks [~jrufus]!
You might be able to leverage the code in FLUME-2508.
It handles cases like a single file having both types of line endings (which is 
rare but does occur in mixed-OS environments).



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-12 Thread Johny Rufus (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693975#comment-14693975
 ] 

Johny Rufus commented on FLUME-2498:


Hi [~roshan_naik], System.lineSeparator() returns the current system's line 
separator, which may not match the line endings of the file being processed. 
So typically we should:
1) find the end of a line by looking for '\n' (this should work for both Unix 
and Windows files)
2) remove the trailing '\n' or '\r\n', depending on which one is present (this 
should work for both Unix and Windows files)

Let me know if this sounds good.
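
The two steps above can be sketched as follows. This is a minimal illustration, not the code from the patch or from FLUME-2508: the class and method names are hypothetical, and it assumes the input is a UTF-8 byte buffer already read from the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two steps; not the code from the patch.
class LineSplitSketch {
    // Splits a buffer into complete lines.
    // Step 1: '\n' marks the end of a line, for Unix and Windows files alike.
    // Step 2: drop the '\n', and the '\r' before it if there is one.
    // Bytes after the last '\n' form an incomplete line and are not returned.
    static List<String> completeLines(byte[] buf) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                int end = i;
                if (end > start && buf[end - 1] == '\r') {
                    end--; // Windows-style ending: strip the carriage return too
                }
                lines.add(new String(buf, start, end - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return lines;
    }
}
```

Because each line is examined on its own, a file that mixes both kinds of line endings is handled without any per-file or per-system setting.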



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-11 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692881#comment-14692881
 ] 

Roshan Naik commented on FLUME-2498:


Let me take a stab at the ordering issue.

Any volunteers to take a stab at the newline issue, i.e. handling both 
types of line endings correctly?

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-10 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681207#comment-14681207
 ] 

Satoshi Iijima commented on FLUME-2498:
---

- The new line issue
- doc changes

I think the above should also be fixed before committing.
The others would be nice to address in follow-up patches.

I do not have much time to address them now because I am assigned to other 
tasks.
I would be happy if Roshan or others could address them.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-08 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662870#comment-14662870
 ] 

Roshan Naik commented on FLUME-2498:


[~iijima_satoshi] do you think you can look into supporting deserializers?

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-08 Thread Juhani Connolly (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662892#comment-14662892
 ] 

Juhani Connolly commented on FLUME-2498:


Pardon my being forward, but are these features we actually need for the 
patch to be released?

I think we should fix any actual bugs (the \r / \n issue you mentioned) and 
the documentation (e.g. documenting that it is not appropriate for binary 
files), and then commit. After that, others are free to improve the source 
further by adding deserializer support, rather than delaying inclusion 
further. Committing without deserializer support does not strike me as 
harmful to users; it is just a missing feature that would be nice to have and 
would make an appropriate follow-up patch (same with most of the other 
suggestions).

As you mentioned, the implementation is pretty mature. It has been in 
production use for about a year now, on a very large number of servers. Trying 
to throw more features into this patch (rather than a separate one) is just 
going to mean additional debugging and delays. Inclusion has no impact on 
other components, so it is not harmful to them, and the main considerations 
should be whether it is needed (I would say yes) and whether it works as 
documented (possibly needing a newline handling fix and documentation on what 
it does and does not handle). Committing it opens it up to modification by 
more people, who can contribute the features they would like to see added.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-08 Thread Ashish Paliwal (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14663052#comment-14663052
 ] 

Ashish Paliwal commented on FLUME-2498:
---

+1. I think once this gets committed, other users can provide patches.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-06 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659613#comment-14659613
 ] 

Satoshi Iijima commented on FLUME-2498:
---


Thank you for reviewing and updating the patch, Roshan.

bq. 1. Was not able to verify if it handles subdirectories also ? can you 
confirm whether or not it handles it ?

Currently it cannot handle subdirectories. It would be better if it could 
track files in subdirectories.

bq. 2. Wasn't clear how often it commits to the position.json file ? 
Intuitively i would say for every batch committed into the channel the json 
file should get updated.

Updating position.json for every committed batch would have a small impact 
on performance.
On the other hand, even if position.json is only updated at regular intervals, 
no data loss occurs when Flume restarts for some reason.

bq. 3. can a regex be applied to the directory also and not just file name ?

Currently this source cannot do that, but the feature sounds good.
It would be good to implement these features (from questions 1 and 3) after 
this patch is merged to trunk.

bq. 4. Windows : What areas in this implementation do you feel may break on 
Windows ?

This source uses the inode to identify a file uniquely. On Windows it would 
need to use the file ID instead of the inode.

bq. 5. Is there some limit on how many files it will track ?

I have not confirmed the limit with a test, but there are many hosts in my 
production environment where this source tracks several hundred files.
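
To illustrate identifying a file independently of its name (the point of 
answer 4), here is a small Java sketch. It uses the standard java.nio 
fileKey() attribute, which on typical Unix file systems is derived from the 
device ID and inode and may be null on platforms that do not expose one. This 
is not necessarily how the patch reads the inode; FileIdentity and its methods 
are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class FileIdentity {
    private FileIdentity() {}

    /** Returns the platform file key for the path, or null if none is available. */
    static Object fileKey(Path path) throws IOException {
        return Files.readAttributes(path, BasicFileAttributes.class).fileKey();
    }

    /**
     * True when the file currently at 'now' has the same non-null file key as
     * the one recorded earlier, i.e. it is the same file even if renamed.
     */
    static boolean sameFile(Object keyBefore, Path now) throws IOException {
        Object keyNow = fileKey(now);
        return keyBefore != null && keyBefore.equals(keyNow);
    }
}
```

With a key recorded before rotation, a file renamed from app.log to app.log.1 
still matches, while a newly created app.log does not.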


 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-06 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659792#comment-14659792
 ] 

Otis Gospodnetic commented on FLUME-2498:
-

Could this be used for tailing binary files?

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-06 Thread jian jin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660637#comment-14660637
 ] 

jian jin commented on FLUME-2498:
-

If that is the case, I think it is enough. But are you using an SSD? I tested 
it locally, and it is not so fast.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-06 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660565#comment-14660565
 ] 

Roshan Naik commented on FLUME-2498:


If it supported deserializers, you could give it a custom deserializer that 
splits the binary files into individual events... then yes.

However, this source has a notion of checking if it has reached the EOF without 
reading a newline in the current event ... this is one area that needs a bit of 
investigation to see if the same behavior can be achieved with deserializer 
support.
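
In isolation, the EOF check described above might look roughly like the 
following Java sketch. It is not the patch's code: CompleteLineReader is a 
hypothetical name, only '\n' is treated as a line terminator, and it reads 
byte by byte purely to keep the logic visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

final class CompleteLineReader {
    private CompleteLineReader() {}

    /**
     * Reads one '\n'-terminated line starting at the current file pointer and
     * returns it including the terminator. If EOF is reached before a '\n',
     * the pointer is moved back to where it started and null is returned, so
     * the half-written line can be retried once the writer has finished it.
     */
    static byte[] readCompleteLine(RandomAccessFile file) throws IOException {
        long start = file.getFilePointer();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = file.read()) != -1) {
            line.write(b);
            if (b == '\n') {
                return line.toByteArray();
            }
        }
        file.seek(start); // incomplete line: rewind and report nothing
        return null;
    }
}
```

A deserializer-based design would need some equivalent way to signal "no 
complete event yet" and to rewind, which is the open question raised here.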

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-06 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659591#comment-14659591
 ] 

Satoshi Iijima commented on FLUME-2498:
---

In my production environment, this source can tail more than several thousand 
appends per second with a few percent CPU usage on a host that has 4 CPU 
cores. I think that is enough.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-05 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655182#comment-14655182
 ] 

Roshan Naik commented on FLUME-2498:


*General Comments*
 # This patch seems relatively mature in its implementation. After making the 
above fixes, I gave it some testing on my Mac and tried to cover some potential 
corner cases, and it handled them pretty well.
 # I like the filegroup feature.
 # I like the fact that it can track many files at once.
 # It handles the case where the event/line is not yet completely written.
 # It seems able to pick up appends to files that were previously closed due 
to timeout. That's very nice!
 # It is tolerant of a file being deleted and a new file being created with 
the same name (it treats them as different files). Again, very nice!
 # I ran code coverage on the unit tests. Coverage is pretty good (80% line 
coverage).

*Questions:*
 # Was not able to verify if it handles subdirectories also ? can you confirm 
whether or not it handles it ?
 # Wasn't clear how often it commits to the position.json file ? Intuitively i 
would say for every batch committed into the channel the json file should get 
updated.
 # can a regex be applied to the directory also and not just file name ?
 # Windows : What areas in this implementation do you feel may break on Windows 
? 
 # Is there some limit on how many files it will track ?


*Suggestions*
# major - Need to document that it will not delete or rename files, and that 
this is expected to be done externally (unlike spooldir).
# major - It definitely needs deserializer support. readevent() can forward to 
the configured deserializer.
# major - It does not have a max event size setting (i.e. line length for text 
files). It would be good to default to a large number (8k?). Deserializer 
support will automatically give this.
# major - Files to consume should be selected in order of creation time by 
default.
# major - I think readline() has a bug: it treats a \r without a \n 
immediately following it as a new line.
   The patch in FLUME-2508 might be useful for this.
# minor - If the file is being overwritten (instead of appended to), could it 
log an error and exclude that file?



 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-05 Thread jian jin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659352#comment-14659352
 ] 

jian jin commented on FLUME-2498:
-

A question:

Could we improve readline()? It calls read() to get data byte by byte, 
which is slow.
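
One possible direction, sketched under assumptions rather than taken from the 
patch: read a block of bytes into a buffer and scan for '\n' in memory, so the 
number of read() calls is per block instead of per byte. ChunkedLineReader and 
the 8192-byte buffer size are hypothetical choices.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkedLineReader {
    private static final int BUFFER_SIZE = 8192; // assumed block size

    private ChunkedLineReader() {}

    /**
     * Reads every complete '\n'-terminated line from the current file pointer
     * to EOF (terminators included) and leaves the pointer just after the
     * last complete line, so a trailing partial line is re-read next time.
     */
    static List<byte[]> readLines(RandomAccessFile file) throws IOException {
        List<byte[]> lines = new ArrayList<>();
        long consumed = file.getFilePointer();
        byte[] pending = new byte[0]; // bytes of a line spanning blocks
        byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        while ((n = file.read(buffer)) > 0) {
            int lineStart = 0;
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    byte[] line = concat(pending, buffer, lineStart, i + 1);
                    lines.add(line);
                    consumed += line.length;
                    pending = new byte[0];
                    lineStart = i + 1;
                }
            }
            pending = concat(pending, buffer, lineStart, n);
        }
        file.seek(consumed);
        return lines;
    }

    /** Returns head followed by src[from, to). */
    private static byte[] concat(byte[] head, byte[] src, int from, int to) {
        byte[] out = Arrays.copyOf(head, head.length + (to - from));
        System.arraycopy(src, from, out, head.length, to - from);
        return out;
    }
}
```

Whether this is noticeably faster in practice would need measuring; the 
sketch only shows that block reads and partial-line handling can coexist.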

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498-3.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-08-04 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654860#comment-14654860
 ] 

Roshan Naik commented on FLUME-2498:


I am beginning to look at this patch.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-07-22 Thread sutanu das (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637397#comment-14637397
 ] 

sutanu das commented on FLUME-2498:
---

Can this patch be backported to Flume 1.5, please?

We run Hortonworks Flume 1.5.2.2 and they will not backport this patch for us.

Reasons we need this patch:

1. We want to resume ingesting log files as events from the point where Flume 
stopped/recovered, even though the log lines keep growing/appending.

2. We want to keep reading a log file even if it is rotated to a new name, 
e.g. tail.log.x gets rotated to tail.log.y -- where -F of the exec source 
doesn't work and spoolDir doesn't work either (because of timestamp 
limitations).

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-07-22 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637926#comment-14637926
 ] 

Hari Shreedharan commented on FLUME-2498:
-

[~jrufus]/[~roshan_naik] - Do you think one of you would be able to take a 
look at this one?

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-06-19 Thread jian jin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593680#comment-14593680
 ] 

jian jin commented on FLUME-2498:
-

You cannot do that directly against the Flume source code; there are some 
conflicts if you apply the patch.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-06-19 Thread jian jin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593683#comment-14593683
 ] 

jian jin commented on FLUME-2498:
-

Any progress? One comment about the implementation: it reads the content 
byte by byte. I do not know if that is necessary; if not, it really hurts 
performance.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-05-21 Thread flankwang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554113#comment-14554113
 ] 

flankwang commented on FLUME-2498:
--

I want to merge this patch into apache-flume-1.5.2, but I don't know how to 
do it.

 Implement Taildir Source
 

 Key: FLUME-2498
 URL: https://issues.apache.org/jira/browse/FLUME-2498
 Project: Flume
  Issue Type: New Feature
  Components: Sinks+Sources
Reporter: Satoshi Iijima
 Fix For: v1.7.0

 Attachments: FLUME-2498-2.patch, FLUME-2498.patch




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-05-06 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531513#comment-14531513
 ] 

Hari Shreedharan commented on FLUME-2498:
-

I will review this over the next few days.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-05-06 Thread jian jin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531188#comment-14531188
 ] 

jian jin commented on FLUME-2498:
-

When could this be merged into trunk?



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-03-06 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350131#comment-14350131
 ] 

Satoshi Iijima commented on FLUME-2498:
---

Hari or other committers, what about adding this as an experimental source?



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-02-19 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327435#comment-14327435
 ] 

Satoshi Iijima commented on FLUME-2498:
---

That sounds good. I agree with implementing these features if this patch is 
merged into trunk.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-02-19 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327905#comment-14327905
 ] 

Otis Gospodnetic commented on FLUME-2498:
-

+1.  Too many people asking about tailing and this patch in particular to be 
ignored by Flume committers IMHO.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-02-19 Thread Ashish Paliwal (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328529#comment-14328529
 ] 

Ashish Paliwal commented on FLUME-2498:
---

Can't we add this as an experimental feature? Maybe we can incorporate user 
feedback as we get it.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-02-17 Thread Benjamin Fiorini (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323992#comment-14323992
 ] 

Benjamin Fiorini commented on FLUME-2498:
-

Great source, works very well from what I could test!

Even though it's easy to work around, it would be cool to also have:
- regex support in the directories as well (e.g. /var/spool/flume/.*/.*\.reports)
- the ability to add the filename to the headers (could be useful if a regex is 
used in the filegroup)



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2015-01-14 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278201#comment-14278201
 ] 

Satoshi Iijima commented on FLUME-2498:
---

Updated the patch. I have fixed the reading of multi-byte characters encoded in 
UTF-8, such as Japanese text.

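The comment does not say what the original problem was. A common way for a line reader to break on multi-byte UTF-8 text is to turn each byte into a character on its own, and the sketch below assumes that failure mode purely for illustration. The class and method names are hypothetical and are not taken from the patch.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LineDecoding {
    /** Breaks multi-byte text: widens every byte to a char by itself. */
    static String decodeBytePerChar(byte[] line) {
        StringBuilder sb = new StringBuilder();
        for (byte b : line) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the whole line at once, so multi-byte sequences stay intact. */
    static String decodeUtf8(byte[] line) {
        return new String(line, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u30ED\u30B0" is the katakana word for "log"; each character is 3 bytes in UTF-8.
        String original = "\u30ED\u30B0 line 1";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("whole-line decode matches: " + decodeUtf8(raw).equals(original));
        System.out.println("byte-per-char decode matches: " + decodeBytePerChar(raw).equals(original));
    }
}
```

Buffering the bytes of a complete line and decoding them in one call keeps the reader independent of how many bytes each character occupies.
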


[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-27 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184882#comment-14184882
 ] 

Satoshi Iijima commented on FLUME-2498:
---

Thanks, Juhani-san.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-26 Thread Juhani Connolly (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184747#comment-14184747
 ] 

Juhani Connolly commented on FLUME-2498:


Since most of the implementation details should be the same as in an internal 
tool I wrote a while back, I should be able to answer a couple of the remaining 
queries.

> I can't tell if it's Yes or No to the will lines be read again 

Duplicate reads are possible if the process is restarted and tailing has to be 
resumed, because checkpoints are only written periodically. We didn't see this 
as an issue, since Flume can create duplicate lines in other places too; the 
objective is to prevent log loss, not duplication.

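To make the duplicate-read case concrete, here is a small in-memory simulation. It is only a sketch under simplifying assumptions: the checkpoint is a line index rather than a byte offset, it is persisted every N lines rather than on a timer, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CheckpointReplay {
    /**
     * Delivers lines starting at index `checkpoint`, persists the checkpoint only
     * when the next index is a multiple of `interval`, and "crashes" after
     * `crashAfter` deliveries. Returns the checkpoint that survived the crash.
     */
    static int runUntilCrash(List<String> log, int checkpoint, int interval,
                             int crashAfter, List<String> delivered) {
        int saved = checkpoint;
        int count = 0;
        for (int i = checkpoint; i < log.size() && count < crashAfter; i++) {
            delivered.add(log.get(i));
            count++;
            if ((i + 1) % interval == 0) {
                saved = i + 1; // periodic checkpoint write
            }
        }
        return saved;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("l1", "l2", "l3", "l4", "l5");
        List<String> delivered = new ArrayList<>();

        // First run crashes after three lines; only the checkpoint at index 2 was saved.
        int saved = runUntilCrash(log, 0, 2, 3, delivered);
        // The restart resumes from the saved checkpoint, so "l3" is delivered again.
        runUntilCrash(log, saved, 2, Integer.MAX_VALUE, delivered);

        System.out.println(delivered);
    }
}
```

Every line is delivered at least once and nothing is skipped; the cost is that lines read after the last saved checkpoint are repeated after a restart.
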
> I think the person was asking whether this Taildir Source implementation 
> deletes a file when it's done reading it or not. I think the answer is that 
> it does NOT delete the file and that file deletion is somebody else's 
> responsibility. Correct?

This is correct. We use Flume and the source as an invisible entity. We have 
it running for many internal services, whose owners do not need to worry about 
its existence as it works behind the scenes. We never had a need for it to 
delete the files, and for something tailing in real time, I suspect such a 
thing would be awkward. When would you delete a file that's actively being 
appended to? Once you're done reading, it may still get more appends. We close 
the file handles if there are no appends for a while, just to avoid hogging the 
file handle and so that log rotations and such are not obstructed, again with 
the objective that Flume/the tailer be as unobtrusive as possible.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-23 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181075#comment-14181075
 ] 

Satoshi Iijima commented on FLUME-2498:
---

This source supports only plain text and assumes that the tailed files contain 
newline-separated data.
A later patch could make it possible to tail non-plain-text files by adding a 
deserializer to the TailFile class that can be specified in flume.conf.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-22 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180276#comment-14180276
 ] 

Otis Gospodnetic commented on FLUME-2498:
-

bq. It would be possible to create a patch to control the order of files for 
consumption in some way, for example, sorting inode list using comparator.

+1 for this.

Can this source be used to tail a file where something is writing Avro or 
Thrift?  Or just plain-text, line-oriented data?




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-20 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176854#comment-14176854
 ] 

Satoshi Iijima commented on FLUME-2498:
---

It would be possible to create a patch to control the order of files for 
consumption in some way, for example, sorting inode list using comparator.

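The comment does not say which key such a comparator would sort on. As a rough sketch, the example below orders a backlog oldest-first by last-modified time; the sort key and the class name are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BacklogOrder {
    /** Returns a copy of the given files ordered oldest-modified first. */
    static List<Path> oldestFirst(List<Path> files) throws IOException {
        // Snapshot the timestamps first so the comparator stays consistent
        // even if a file is appended to while the sort is running.
        Map<Path, Long> modified = new HashMap<>();
        for (Path p : files) {
            modified.put(p, Files.getLastModifiedTime(p).toMillis());
        }
        List<Path> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((Path p) -> modified.get(p)));
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        List<Path> backlog = new ArrayList<>();
        backlog.add(Files.createTempFile("backlog-a", ".log"));
        backlog.add(Files.createTempFile("backlog-b", ".log"));
        for (Path p : oldestFirst(backlog)) {
            System.out.println(Files.getLastModifiedTime(p) + " " + p);
            Files.delete(p);
        }
    }
}
```

Modification time is only a heuristic: a file that is still being appended to always looks newest, whatever order the files were created in.
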
That is correct. This source currently has no option to delete files.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-17 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174871#comment-14174871
 ] 

Satoshi Iijima commented on FLUME-2498:
---

Below are answers to the questions posted to the mailing lists.

> Btw., because the position in the file is checkpointed periodically, does
> that mean that it is possible that, after a restart, some number of lines
> that have already been tailed, will be read again?

Yes. They will not be read again.
On restart, this source resumes reading from the last read position recorded 
in the position file.

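Resuming from a recorded position can be sketched as a seek followed by a line-by-line read. This is a minimal illustration, not the patch's code: the class and method names are hypothetical, the JSON position file is left out, and the saved offset is simply passed in as a long.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class ResumeTail {
    /** Result of one read pass: the complete lines found and the offset to resume from. */
    static final class Batch {
        final List<String> lines;
        final long nextPosition;

        Batch(List<String> lines, long nextPosition) {
            this.lines = lines;
            this.nextPosition = nextPosition;
        }
    }

    /** Reads all complete lines that start at or after the given byte offset. */
    static Batch readFrom(Path file, long position) throws IOException {
        List<String> lines = new ArrayList<>();
        long next = position;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(position);
            ByteArrayOutputStream current = new ByteArrayOutputStream();
            int b;
            // Byte-at-a-time reads keep the sketch short; a real reader would buffer.
            while ((b = raf.read()) != -1) {
                if (b == '\n') {
                    lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                    current.reset();
                    next = raf.getFilePointer(); // advance only past complete lines
                } else {
                    current.write(b);
                }
            }
        }
        return new Batch(lines, next);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("app", ".log");
        Files.write(log, "a\nb\n".getBytes(StandardCharsets.UTF_8));
        Batch first = readFrom(log, 0);
        System.out.println(first.lines + " resume at " + first.nextPosition);

        // More data arrives, including a line that is not terminated yet.
        Files.write(log, "c\npartial".getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
        Batch second = readFrom(log, first.nextPosition);
        System.out.println(second.lines + " resume at " + second.nextPosition);
        Files.delete(log);
    }
}
```

Because the offset only moves past newline-terminated lines, an unfinished trailing line is picked up whole on a later pass instead of being split in two.
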
> - How does it know when to stop tailing the current file and switch to or 
>   start tailing another file
> - When there is a backlog of many files being built up... how does it order 
>   the files for consumption

This source does not define an order, because it is basically meant to tail 
appended lines of files in nearly real-time.
If there is a backlog of many files on start-up, one file will be selected in 
random order and be read to EOF, then the next file will be selected in the 
same way.
With the 'skipToEnd' property, it can also start tailing from the EOF of the 
current files.

> - Sounds like there is some C/C++ native code + JNI to work with inodes ? 
>   what api are you using.

This source uses java.nio.file.Files.getAttribute() from the Java 7 API to 
identify the inode of a file.

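A minimal sketch of that lookup, assuming a Unix-style file system where the JDK's "unix" attribute view is available (the class and method names are hypothetical). The rename stands in for a log rotation, to show why the inode is a better identity for a tailed file than its path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class InodeLookup {
    /** Returns the file's inode number; fails where the "unix" attribute view is unsupported. */
    static long inodeOf(Path file) throws IOException {
        return ((Number) Files.getAttribute(file, "unix:ino")).longValue();
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("access", ".log");
        long before = inodeOf(log);

        // Simulate a rotation: the same file gets a new name in the same directory.
        Path rotated = log.resolveSibling(log.getFileName() + ".1");
        Files.move(log, rotated);
        long after = inodeOf(rotated);

        System.out.println("inode unchanged by rename: " + (before == after));
        Files.delete(rotated);
    }
}
```

Keying read positions by inode rather than by path lets a tailer recognise a rotated file as the one it was already reading.
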
> - does it auto delete the consumed files ?

No, the consumed files need not be deleted in this source. The files to be 
tailed and the position reached in each are recorded in the position file.
For example, an application's log file such as /var/log/app/access.log can be 
specified directly in flume.conf.



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-17 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175530#comment-14175530
 ] 

Otis Gospodnetic commented on FLUME-2498:
-

bq. Yes. They will not be read again.

I can't tell if it's Yes or No to the will lines be read again :)

bq. If there is a backlog of many files on start-up, one file will be selected 
in random order

Would it be possible to look at the timestamp on unread files?  Or see if they 
have a numeric extension, like .1, .2, etc. and use some heuristics to try and 
read them in the correct order?

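One possible form of the numeric-extension heuristic is sketched below. It assumes the common rotation scheme in which a higher suffix means an older file (app.log.2 is older than app.log.1, and app.log is the live file); that scheme and all names here are assumptions, not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RotationOrder {
    /** Returns N for names ending in ".N" (ASCII digits only), or 0 for any other name. */
    static int rotationIndex(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        if (ext.isEmpty() || ext.length() > 9) {
            return 0; // no suffix, or too long to be a plausible rotation number
        }
        for (int i = 0; i < ext.length(); i++) {
            char c = ext.charAt(i);
            if (c < '0' || c > '9') {
                return 0;
            }
        }
        return Integer.parseInt(ext);
    }

    /** Orders names so that the highest rotation number (the oldest file) comes first. */
    static List<String> oldestFirst(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort((a, b) -> Integer.compare(rotationIndex(b), rotationIndex(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> backlog = Arrays.asList("access.log", "access.log.2", "access.log.1");
        System.out.println(oldestFirst(backlog));
    }
}
```

Files without a numeric suffix all share index 0 and keep their relative input order, since the sort is stable; a timestamp could serve as a tie-breaker there.
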
bq. No, the consumed files need not be deleted in this source.

I think the person was asking whether this Taildir Source implementation 
deletes a file when it's done reading it or not.  I think the answer is that it 
does NOT delete the file and that file deletion is somebody else's 
responsibility.  Correct?



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-15 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172294#comment-14172294
 ] 

Satoshi Iijima commented on FLUME-2498:
---

I have added 'Windows support' and 'Minimum Java version' rows to the 
comparison charts linked in my previous comment. 
I think it is possible to support Windows, because Windows has a file ID in 
place of the inode.
If someone attaches a patch that lets this source run under Windows, I think 
that would be good, although I do not have a Windows environment.

'Pollable' means that the source implements PollableSource, or that it 
continues processing after the tailed files have been entirely consumed.

'Append header' means that the source can append headers to the events.
When the source tails multiple files, it is useful to be able to append headers 
to the events of each file, as the Spooling Directory source does.

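As a rough illustration of per-file-group headers, the sketch below tags each tailed line with the headers configured for its group. TaggedLine is a hypothetical stand-in for a Flume event, and the group names and header keys are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class GroupHeaders {
    /** Hypothetical stand-in for a Flume event: a body plus string headers. */
    static final class TaggedLine {
        final String body;
        final Map<String, String> headers;

        TaggedLine(String body, Map<String, String> headers) {
            this.body = body;
            this.headers = headers;
        }
    }

    /** Copies the headers configured for the file's group onto the tailed line. */
    static TaggedLine tag(String line, String group,
                          Map<String, Map<String, String>> headersByGroup) {
        Map<String, String> configured =
                headersByGroup.getOrDefault(group, Collections.<String, String>emptyMap());
        return new TaggedLine(line, new HashMap<>(configured));
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> headersByGroup = new HashMap<>();
        headersByGroup.put("access", Collections.singletonMap("logtype", "access"));
        headersByGroup.put("error", Collections.singletonMap("logtype", "error"));

        TaggedLine event = tag("GET /index.html 200", "access", headersByGroup);
        System.out.println(event.headers + " " + event.body);
    }
}
```

Downstream components can then route or partition on the header without having to parse the event body.
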


[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-14 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170734#comment-14170734
 ] 

Satoshi Iijima commented on FLUME-2498:
---

I compared this source with other source components that can tail or read 
files. If anything is incorrect, please let me know.

Comparison with tail-pollable-source (FLUME-2344) and jambalay-file-source:
http://vschart.com/compare/flume-taildir-source/vs/flume-tail-pollable-source-flume-2344/vs/flume-jambalay-file-source

Comparison with Spooling Directory source and Exec source (tail -F):
http://vschart.com/compare/flume-taildir-source/vs/flume-spooling-directory/vs/flume-exec-source-tail-f



[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171202#comment-14171202
 ] 

Otis Gospodnetic commented on FLUME-2498:
-

Thanks!
Questions:
* what about Windows support?  Should that be another row?
* what does Pollable mean?
* what does Append header mean/do?  When is this used, how/why is it useful?




[jira] [Commented] (FLUME-2498) Implement Taildir Source

2014-10-10 Thread Satoshi Iijima (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166578#comment-14166578
 ] 

Satoshi Iijima commented on FLUME-2498:
---

Hari, Santiago, Otis, Roshan - thank you for replying on the dev ML.
I am sorry that, for an unknown reason, I could not reply there afterwards...
I would like to discuss this source here.
