askNutch wrote:
Hi Kubes: You are the expert! Can you tell me what development environment you use to
develop Nutch?

Linux, Ubuntu (usually the most recent release), Sun JDK, a Core 2 laptop (although I'm hoping to upgrade to a sagernotebook.com quad core soon :) ), and Eclipse stable (3.4, I think).
Such as an IDE, etc. I want to debug Nutch.

Debugging MapReduce jobs, and therefore Nutch jobs, is difficult. The main reason is that Hadoop/Nutch spins up a new JVM for each Map and Reduce task, so it is hard to connect to that JVM, since it is created and launched automagically. Here are some options, depending on what you are trying to debug:

1) Run all of the Hadoop server processes (namenode, etc.) through Eclipse using its internal debugger. This isn't always the best way; it is usually only used when debugging some part of the Hadoop infrastructure, such as socket communication.

2) Run most of the Hadoop servers in separate processes, and run the tasktracker inside Eclipse with the internal debugger. This is mainly used when debugging how a specific MapRunner, MapTask, or ReduceTask interacts with Hadoop. You won't be able to debug the Map or Reduce task itself, only its communication with the Hadoop server, for instance status reporting.

3) Debug the Map/Reduce task itself with logging. Judicious logging is what I use most often. Also, work with a very small example if you can, to keep your turnaround times short. Unless your problem occurs only on a large dataset, don't debug on a large dataset.
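A minimal sketch of option 3, assuming you can pull the per-record logic out of your mapper into a plain method with no Hadoop types. The class MapLogicDebug, its methods, and the "url<TAB>status" input format are all hypothetical, made up for illustration. It uses java.util.logging only to stay self-contained; in a real Nutch job or plugin you would use whatever logger the surrounding code already uses. The idea is to log the offending record (not just "something failed"), keep per-record detail at a level you can switch off, and run the logic on a handful of records for fast turnaround.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class MapLogicDebug {
    private static final Logger LOG = Logger.getLogger(MapLogicDebug.class.getName());

    // Hypothetical map logic: extract the host from a "url<TAB>status" line.
    // No Hadoop types here, so it can be run directly on a tiny input.
    static String extractHost(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length < 2) {
            // Log the bad record itself so the log tells you what to look at.
            LOG.warning("Skipping malformed record: [" + line + "]");
            return null;
        }
        String url = fields[0];
        int start = url.indexOf("://");
        if (start < 0) {
            LOG.warning("No scheme in URL: [" + url + "]");
            return null;
        }
        start += 3;
        int end = url.indexOf('/', start);
        String host = end < 0 ? url.substring(start) : url.substring(start, end);
        // Per-record detail at FINE: the default console handler hides it,
        // so it costs little until you raise the logging level.
        LOG.log(Level.FINE, "{0} -> {1}", new Object[] {url, host});
        return host;
    }

    // Runs the logic over a small list of records, like a local "mini job".
    static List<String> run(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        int skipped = 0;
        for (String line : lines) {
            String host = extractHost(line);
            if (host == null) {
                skipped++;
            } else {
                hosts.add(host);
            }
        }
        // One summary line per run keeps the log readable.
        LOG.info("processed=" + lines.size() + " emitted=" + hosts.size() + " skipped=" + skipped);
        return hosts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "http://lucene.apache.org/nutch/\t200",
                "http://example.com\t404",
                "not-a-record");
        System.out.println(run(sample));
    }
}
```

Because the logic is a plain static method, you can also set breakpoints on it in Eclipse and step through it directly, which sidesteps the child-JVM problem for this part of the code.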

Hope this helps.

Dennis

Thank you!!!
