Hi Lewis,

Parse errors can be captured and stored in a separate HDFS location. In Chukwa 0.4 and earlier, the demux MapReduce job does the extraction and stores structured data in HDFS. Records that fail to parse are channeled to another HDFS folder called InError, along with the cause of the parsing error. This is still a batch-oriented operation.

In Chukwa 0.6, you can set up multiple pipeline writers. The pipeline writers can be configured to do the parsing and channel errors somewhere else. If the data parses properly, it is then written to HBase or HDFS. However, you will need to write your own pipeline writer class to add this functionality. We currently have only a few pipeline writers: LocalWriter, HBaseWriter, and SeqFileWriter. If you choose to write data to HDFS, SeqFileWriter needs to be the last one in the pipeline. See this page for how to configure pipeline writers to achieve part of what you are looking for:
http://chukwa.apache.org/docs/r0.6.0/pipeline.html

Hope this helps.

Regards,
Eric

On Thu, Feb 12, 2015 at 11:12 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:

> Hi Folks,
> For some time I have been meaning to get in touch to get advice on
> developing a tool for log analysis of Apache Nutch [0] logs.
> What I am referring to particularly is monitoring of logs in a bid to
> identify particular errors which we may anticipate.
> Nutch jobs are batch oriented in architecture which are inherited from
> Hadoop, we typically see errors in the parse phase of a crawl so it is
> events like this that I would like to anticipate, monitor and report on,
> possibly through email.
> So I am therefore thinking about building a Chukwa-powered tool for Nutch
> which would become part of our codebase.
> Is Chukwa the right tool for this? Any information about similar efforts
> would be very much appreciated.
> best
> Lewis
>
> [0] http://nutch.apache.org
>
> --
> *Lewis*
>
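The pipeline idea in Eric's reply (a stage that parses each record, forwards clean records to the next writer, and diverts failures plus their cause elsewhere, much like demux's InError folder) can be sketched in plain Java. This is a minimal, self-contained illustration only: the `Stage`, `ParsingStage`, and `CollectingStage` types are hypothetical stand-ins, not Chukwa's writer API. A real implementation would implement Chukwa's own pipeline writer interface as described on the pipeline documentation page. The log layout `<LEVEL> <component>: <message>` and the sample lines are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseRoutingPipeline {

    /** Hypothetical pipeline stage; NOT Chukwa's actual writer interface. */
    interface Stage {
        void add(List<String> records);
    }

    /** Hypothetical terminal stage that just collects records (stands in for an HBase/HDFS writer). */
    static final class CollectingStage implements Stage {
        final List<String> records = new ArrayList<>();

        @Override
        public void add(List<String> batch) {
            records.addAll(batch);
        }
    }

    /**
     * Parses each raw log line. Lines that parse are forwarded to the next stage;
     * lines that fail are diverted to an error stage together with the failure cause.
     */
    static final class ParsingStage implements Stage {
        // Assumed log layout: "<LEVEL> <component>: <message>"
        private static final Pattern LINE = Pattern.compile("^(INFO|WARN|ERROR) (\\S+): (.*)$");
        private final Stage next;
        private final Stage errors;

        ParsingStage(Stage next, Stage errors) {
            this.next = next;
            this.errors = errors;
        }

        @Override
        public void add(List<String> batch) {
            List<String> parsed = new ArrayList<>();
            List<String> failed = new ArrayList<>();
            for (String line : batch) {
                Matcher m = LINE.matcher(line);
                if (m.matches()) {
                    // Structured form: level|component|message
                    parsed.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
                } else {
                    // Keep the cause alongside the raw record, similar in spirit to InError
                    failed.add("unparseable line: " + line);
                }
            }
            if (!parsed.isEmpty()) {
                next.add(parsed);
            }
            if (!failed.isEmpty()) {
                errors.add(failed);
            }
        }
    }

    public static void main(String[] args) {
        CollectingStage store = new CollectingStage();
        CollectingStage errors = new CollectingStage();
        Stage pipeline = new ParsingStage(store, errors);
        pipeline.add(List.of("ERROR ParseSegment: bad content", "garbage"));
        System.out.println(store.records);  // [ERROR|ParseSegment|bad content]
        System.out.println(errors.records); // [unparseable line: garbage]
    }
}
```

Keeping the error channel as just another stage means an alerting step (for example, the email notification Lewis mentions) could be chained after it without touching the parser. That alerting step is not shown here.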