Re: Using Chukwa for Nutch Log Analysis and Monitoring

Eric Yang Sat, 14 Feb 2015 09:46:18 -0800

Hi Lewis,

Parse errors can be captured and stored in a separate HDFS location.  In
Chukwa 0.4 and earlier, a demux MapReduce job does the extraction and
stores the structured data in HDFS; errors are channeled to another HDFS
folder called InError, along with the cause of the parsing error.
This is still a batch-oriented operation.  In Chukwa 0.6, we can set up
multiple pipeline writers.  A pipeline writer can be configured to parse
the data and channel errors somewhere else; if the data parses properly,
it is then written to HBase or HDFS.  However, you will need to write
your own pipeline writer class to add this functionality.  We currently
only have a few pipeline writers: LocalWriter, HBaseWriter, and
SeqFileWriter.  SeqFileWriter needs to be the last one in the pipeline if
you choose to write data to HDFS.  See this page for how to configure
pipeline writers to achieve part of what you are looking for:


http://chukwa.apache.org/docs/r0.6.0/pipeline.html
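
To make the routing idea concrete, here is a rough, language-neutral
sketch in Python.  It is not Chukwa's writer API (a real pipeline writer
would be a Java class plugged into the pipeline described on the page
above).  The class names ParsingStage and ListSink, the add() method, and
the log line layout are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed log layout (hypothetical, adjust to your log4j pattern):
#   "2015-02-12 23:12:01,123 ERROR parse.ParseUtil - message text"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<source>\S+) - (?P<message>.*)$"
)


@dataclass
class ListSink:
    """In-memory stand-in for a final stage such as an HDFS or HBase writer."""
    records: list = field(default_factory=list)

    def add(self, record):
        self.records.append(record)


class ParsingStage:
    """Hypothetical pipeline stage: parse each line, divert failures."""

    def __init__(self, next_stage, error_sink):
        self.next_stage = next_stage  # receives records that parsed cleanly
        self.error_sink = error_sink  # receives raw lines plus the cause

    def add(self, raw_line):
        match = LOG_LINE.match(raw_line)
        if match is None:
            # Keep the raw input and the reason, similar in spirit to InError.
            self.error_sink.add({
                "raw": raw_line,
                "cause": "line does not match the expected log format",
            })
            return
        self.next_stage.add(match.groupdict())


if __name__ == "__main__":
    parsed, errors = ListSink(), ListSink()
    stage = ParsingStage(parsed, errors)
    stage.add("2015-02-12 23:12:01,123 ERROR parse.ParseUtil - Unable to parse")
    stage.add("this line is not in the expected format")
    print(len(parsed.records), "parsed;", len(errors.records), "in error")
```

In this sketch the error sink is where an email alert or a separate HDFS
folder could hang off, while the last stage plays the role that
SeqFileWriter plays at the end of a real pipeline.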

Hope this helps.

regards,
Eric

On Thu, Feb 12, 2015 at 11:12 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Folks,
> For some time I have been meaning to get in touch to get advice on
> developing a tool for log analysis of Apache Nutch [0] logs.
> What I am referring to particularly is monitoring of logs in a bid to
> identify particular errors which we may anticipate.
> Nutch jobs are batch oriented, an architecture inherited from Hadoop. We
> typically see errors in the parse phase of a crawl, so it is events like
> this that I would like to anticipate, monitor and report on, possibly
> through email.
> I am therefore thinking about building a Chukwa-powered tool for Nutch
> which would become part of our codebase.
> Is Chukwa the right tool for this? Any information about similar efforts
> would be very much appreciated.
> best
> Lewis
>
> [0] http://nutch.apache.org
>
> --
> *Lewis*
>
