The existing "tail" source is not the best choice in your scenario, as you
have pointed out.

SpoolDir could be a solution if your log file rotation interval is very short
(5 minutes, for example), but then you have to deal with a huge number of
files in the folder (slower listings).

There is a proposal for a new approach, something that combines the best of
"tail" and "spoolDir". Take a look here:

https://issues.apache.org/jira/browse/FLUME-2498
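
In the meantime, the restart problem described in this thread (not knowing
how many lines to go back) can be avoided by checkpointing the byte offset of
the last line shipped. Below is a minimal Python sketch of that idea. It is
not Flume code and not necessarily how the proposal above works;
`read_new_lines`, the offset file and the `send` callback are illustrative
names, and `send` stands in for whatever Kafka producer call you would use.

```python
import json
import os


def read_new_lines(log_path, offset_path, send):
    """Forward lines appended to log_path since the last saved offset.

    `send` is a hypothetical callback standing in for a Kafka producer call.
    Returns the number of lines forwarded.
    """
    # Load the checkpoint left by the previous run, if any.
    state = {"inode": None, "offset": 0}
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            state = json.load(f)

    with open(log_path, "rb") as log:
        st = os.fstat(log.fileno())
        # A different inode or a shorter file means the log was rotated
        # or truncated, so start again from the beginning.
        if state["inode"] != st.st_ino or st.st_size < state["offset"]:
            state = {"inode": st.st_ino, "offset": 0}
        log.seek(state["offset"])
        count = 0
        while True:
            line = log.readline()
            # Stop at EOF or at a partially written line (no newline yet).
            if not line.endswith(b"\n"):
                break
            send(line.rstrip(b"\r\n").decode("utf-8", errors="replace"))
            state["offset"] = log.tell()
            count += 1

    # Replace the checkpoint atomically so a crash cannot leave it half-written.
    tmp = offset_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, offset_path)
    return count
```

Calling this in a loop with a short sleep gives tail-like latency. Because
the checkpoint is written only after the lines are sent, a crash in between
resends those lines on restart (at-least-once delivery), rather than a whole
day's worth of data.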




2015-01-29 0:24 GMT+01:00 Lakshmanan Muthuraman <lakshma...@tokbox.com>:

> We have been using Flume to solve a very similar use case. Our servers
> write the log files to a local file system, and then we have a Flume agent
> which ships the data to Kafka.
>
> You can use Flume with an exec source running tail. Though the exec source
> runs well with tail, there are issues if the agent goes down or the file
> channel starts building up. If the agent goes down, you can ask the exec
> tail source to go back n lines or to read from the beginning of the file.
> The challenge is that we roll our log files daily. What if the agent goes
> down in the evening? We would have to go back over the entire day's worth
> of data for reprocessing, which slows down the data flow. We can also go
> back an arbitrary number of lines, but then we don't know what the right
> number is. This is something of a challenge for us. We have tried the
> spooling directory source, which works, but it needs a different log file
> rotation policy. We considered even rotating files every minute, but that
> would still affect the real-time data flow in our
> Kafka --> Storm --> Elasticsearch pipeline with a one-minute delay.
>
> We are going to do a PoC with Logstash to see whether it solves the
> problem we have with Flume.
>
> On Wed, Jan 28, 2015 at 10:39 AM, Fernando O. <fot...@gmail.com> wrote:
>
> > Hi all,
> >     I'm evaluating using Kafka.
> >
> > I like how with Facebook's Scribe you log to your own machine and
> > then there's a separate process that forwards messages to the central
> > logger.
> >
> > With Kafka it seems that I have to embed the publisher in my app, and
> > deal with any communication problem managing that on the producer side.
> >
> > I googled quite a bit trying to find a project that would basically use
> > a daemon that parses a log file and sends the lines to the Kafka cluster
> > (something like a tail file.log, but instead of redirecting the output
> > to the console, it sends it to Kafka).
> >
> > Does anyone know of something like that?
> >
> >
> > Thanks!
> > Fernando.
> >
>



-- 

David Morales de Frías  ::  +34 607 010 411 :: @dmoralesdf
<https://twitter.com/dmoralesdf>


<http://www.stratio.com/>
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
<https://twitter.com/StratioBD>*
