Hi Israel, thank you for this detailed answer. I'll give it a try.
Best Regards,
Christian.

2013/4/8 Israel Ekpo <[email protected]>

> Christian,
>
> From your comments, it seems Flume will be the right tool for the task.
>
> The SpoolingDirectorySource would be a great choice for the task you have,
> since the log data has already been generated.
>
> However, the Spooling Directory Source requires that the files be
> immutable. This means that once a file is created or dropped in the
> spooling directory, it cannot be modified.
>
> Consequently, you will not be able to simply use the current log
> directory, where the log files are continuously being appended to.
>
> I would recommend that you set aside a separate spooling directory for
> Flume and then set up some sort of cron job or scheduled task that
> periodically drops the logs into the spooling directory after traversing
> the symlinks and recursively processing the log directories.
>
> The SpoolingDirectorySource currently does not recursively traverse the
> spooled folders. It assumes that all the files you plan to consume are in
> the root folder you are spooling.
>
> Use FileChannel as the channel, as it is more reliable.
>
> Depending on the type of analysis you want to conduct, the
> ElasticSearchSink might be a good choice for your sink.
>
> Feel free to review the user guide for other sink options:
>
> http://flume.apache.org/FlumeUserGuide.html
>
> You can also set up your own custom sink if you have other centralized
> datastores in mind.
>
> Spend some time going through the user guide and developer guide so that
> you get a better understanding of the architecture and use cases.
>
> http://flume.apache.org/FlumeUserGuide.html
> http://flume.apache.org/FlumeDeveloperGuide.html
>
>
> On 8 April 2013 10:33, Christian Schneider <[email protected]> wrote:
>
>> Hi,
>> I need to collect log data from our cluster.
>>
>> For this I think I need to copy the contents of:
>> * JobTracker: /var/log/hadoop-0.20-mapreduce/history/
>> * TaskTracker: /var/log/hadoop-0.20-mapreduce/userlogs/
>>
>> It should also follow symlinks and copy recursively.
>>
>> Is Flume the right tool to do this?
>>
>> E.g. with the "Spooling Directory Source"?
>>
>> Best Regards,
>> Christian.
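The pipeline outlined in the reply (spooldir source, file channel, Elasticsearch sink) might look roughly like this in the agent's properties file. The agent name, directories, index name, and Elasticsearch host are placeholders, and the property keys should be double-checked against the user guide linked above.

```properties
# Example Flume agent: spooldir source -> file channel -> Elasticsearch sink.
agent1.sources  = spool-src
agent1.channels = file-ch
agent1.sinks    = es-sink

agent1.sources.spool-src.type     = spooldir
agent1.sources.spool-src.spoolDir = /var/spool/flume
agent1.sources.spool-src.channels = file-ch

agent1.channels.file-ch.type          = file
agent1.channels.file-ch.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.file-ch.dataDirs      = /var/lib/flume/data

agent1.sinks.es-sink.type      = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent1.sinks.es-sink.hostNames = es-host:9300
agent1.sinks.es-sink.indexName = hadoop_logs
agent1.sinks.es-sink.channel   = file-ch
```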
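The cron job suggested in the reply above could be sketched roughly like this. The function name, the staging-directory convention, and the paths are illustrative, not from the thread; `find -L` follows symlinks and recurses, and copying into a staging directory before an atomic `mv` on the same filesystem ensures the spooling source only ever sees complete, immutable files.

```shell
#!/bin/sh
# Sketch: flatten completed log files from a source tree into a Flume
# spool directory, following symlinks. Paths and names are examples.
spool_logs() {
    src="$1"
    spool="$2"
    stage="${spool}.stage"   # staging dir on the same filesystem as $spool
    mkdir -p "$spool" "$stage"
    # find -L follows symlinks and recurses; in production you would also
    # want to skip files that may still be appended to (e.g. -mmin +5).
    find -L "$src" -type f | while read -r f; do
        # Flatten the full path into a unique name, because the
        # SpoolingDirectorySource does not traverse subdirectories.
        name=$(printf '%s' "$f" | tr '/' '_')
        [ -e "$spool/$name" ] && continue   # already delivered earlier
        # Copy into the staging dir, then rename into the spool dir:
        # mv within one filesystem is atomic, so Flume never sees a
        # half-written file.
        cp "$f" "$stage/$name" && mv "$stage/$name" "$spool/$name"
    done
}
```

A crontab entry could then invoke it periodically for each log tree, e.g. `spool_logs /var/log/hadoop-0.20-mapreduce/history /var/spool/flume` (paths taken from the question; the spool directory is a placeholder).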
