Hi Israel, thank you for this detailed answer. I'll give it a try.
Best Regards,
Christian.

2013/4/8 Israel Ekpo <[email protected]>

> Christian,
>
> From your comments, it seems Flume will be the right tool for the task.
>
> The SpoolingDirectorySource would be a great choice for the task you have,
> since the log data has already been generated.
>
> However, the Spooling Directory Source requires that the files be
> immutable. This means that once a file is created or dropped in the
> spooling directory, it cannot be modified.
>
> Consequently, you will not be able to simply use the current log
> directory, where the log files are continuously being appended to.
>
> I would recommend that you set aside a separate spooling directory for
> Flume and then set up some sort of cron job or scheduled task that
> periodically drops the logs into the spooling directory after traversing
> the symlinks and recursively processing the log directories.
>
> The SpoolingDirectorySource currently does not recursively traverse the
> spooled folders. It assumes that all the files you plan to consume are in
> the root folder you are spooling.
>
> Use FileChannel as the channel, as it is more reliable.
>
> Depending on the type of analysis you want to conduct, the
> ElasticSearchSink might be a good choice for your sink.
>
> Feel free to review the user guide for other sink options:
>
> http://flume.apache.org/FlumeUserGuide.html
>
> You can also set up your own custom sink if you have other centralized
> datastores in mind.
>
> Spend some time going through the user guide and developer guide so that
> you get a better understanding of the architecture and use cases.
>
> http://flume.apache.org/FlumeUserGuide.html
> http://flume.apache.org/FlumeDeveloperGuide.html
>
>
> On 8 April 2013 10:33, Christian Schneider <[email protected]> wrote:
>
>> Hi,
>> I need to collect log data from our cluster.
>>
>> For this I think I need to copy the contents of:
>> * JobTracker: /var/log/hadoop-0.20-mapreduce/history/
>> * TaskTracker: /var/log/hadoop-0.20-mapreduce/userlogs/
>>
>> It should also follow symlinks and copy recursively.
>>
>> Is Flume the right tool to do this?
>>
>> E.g. with the "Spooling Directory Source"?
>>
>> Best Regards,
>> Christian.
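The pipeline outlined in the reply (spooldir source, file channel, Elasticsearch sink) might look roughly like this in the agent's properties file. The agent name, directories, index name, and Elasticsearch host are placeholders, and the property keys should be double-checked against the user guide linked above.

```properties
# Example Flume agent: spooldir source -> file channel -> Elasticsearch sink.
agent1.sources  = spool-src
agent1.channels = file-ch
agent1.sinks    = es-sink

agent1.sources.spool-src.type     = spooldir
agent1.sources.spool-src.spoolDir = /var/spool/flume
agent1.sources.spool-src.channels = file-ch

agent1.channels.file-ch.type          = file
agent1.channels.file-ch.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.file-ch.dataDirs      = /var/lib/flume/data

agent1.sinks.es-sink.type      = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent1.sinks.es-sink.hostNames = es-host:9300
agent1.sinks.es-sink.indexName = hadoop_logs
agent1.sinks.es-sink.channel   = file-ch
```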
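The cron job suggested in the reply above could be sketched roughly like this. The function name, the staging-directory convention, and the paths are illustrative, not from the thread; `find -L` follows symlinks and recurses, and copying into a staging directory before an atomic `mv` on the same filesystem ensures the spooling source only ever sees complete, immutable files.

```shell
#!/bin/sh
# Sketch: flatten completed log files from a source tree into a Flume
# spool directory, following symlinks. Paths and names are examples.
spool_logs() {
    src="$1"
    spool="$2"
    stage="${spool}.stage"   # staging dir on the same filesystem as $spool
    mkdir -p "$spool" "$stage"
    # find -L follows symlinks and recurses; in production you would also
    # want to skip files that may still be appended to (e.g. -mmin +5).
    find -L "$src" -type f | while read -r f; do
        # Flatten the full path into a unique name, because the
        # SpoolingDirectorySource does not traverse subdirectories.
        name=$(printf '%s' "$f" | tr '/' '_')
        [ -e "$spool/$name" ] && continue   # already delivered earlier
        # Copy into the staging dir, then rename into the spool dir:
        # mv within one filesystem is atomic, so Flume never sees a
        # half-written file.
        cp "$f" "$stage/$name" && mv "$stage/$name" "$spool/$name"
    done
}
```

A crontab entry could then invoke it periodically for each log tree, e.g. `spool_logs /var/log/hadoop-0.20-mapreduce/history /var/spool/flume` (paths taken from the question; the spool directory is a placeholder).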
