Re: hi Kubes:the question about develop environment!

Dennis Kubes Wed, 22 Apr 2009 07:04:56 -0700


askNutch wrote:

Hi Kubes: You are the expert! Can you tell me what development environment you use to
develop Nutch?

Linux, Ubuntu (usually the most recent), Sun JDK, Core 2 laptop (although hoping to upgrade to a sagernotebook.com quad core soon :) ), Eclipse stable (3.4 I think).

Such as the IDE, etc. I want to debug Nutch.

Debugging MapReduce jobs, and hence Nutch jobs, is difficult. The main reason is that Hadoop/Nutch spins up a new JVM for each Map and Reduce task, so it is difficult to connect to that JVM as it is created and launched automagically. Here are some options depending on what you are trying to debug:

1) Run all Hadoop server processes (namenode, etc.) through Eclipse using the internal debugger. This isn't always the best way; it is usually only used when debugging some part of the Hadoop infrastructure, such as socket communication.

2) Run most of the Hadoop servers in separate processes, and run the tasktracker inside of Eclipse with the internal debugger. This is mainly used when debugging a specific MapRunner, MapTask, or ReduceTask interacting with Hadoop. You won't be able to debug the Map or Reduce task itself, just the communication with the Hadoop server, for instance reporting status.

3) Debugging the Map/Reduce task itself: logging. Judicious logging is what I use most often. Also use a very small example if you can, to give yourself short turnaround times. Unless your problem occurs only on a large dataset, don't debug on a large dataset.
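
To make option 3 concrete, here is a minimal sketch of "judicious logging on a small example". It is plain Java with java.util.logging and does not use the Hadoop or Nutch APIs; the class LoggedMapSketch, its sample and map methods, and the URLs are all hypothetical names made up for illustration. The idea is only to show the pattern: cut the input down to a few records, log a sample of what goes through, and always log what gets skipped.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the body of a map task; not a Hadoop or Nutch class.
class LoggedMapSketch {
    private static final Logger LOG =
        Logger.getLogger(LoggedMapSketch.class.getName());

    // Log only every Nth good record so the task log stays readable.
    static final int LOG_EVERY = 2;

    // Keep only the first `limit` records so each debug run is fast.
    static List<String> sample(List<String> records, int limit) {
        return new ArrayList<>(records.subList(0, Math.min(limit, records.size())));
    }

    // Toy map step: emits "host<TAB>url" per URL and logs anything it skips.
    static List<String> map(List<String> urls) {
        List<String> out = new ArrayList<>();
        int seen = 0;
        for (String url : urls) {
            seen++;
            try {
                String host = URI.create(url).getHost();
                if (host == null) {
                    LOG.warning("record " + seen + " has no host, skipping: " + url);
                    continue;
                }
                if (seen % LOG_EVERY == 0) {
                    LOG.info("record " + seen + ": " + url + " -> " + host);
                }
                out.add(host + "\t" + url);
            } catch (IllegalArgumentException e) {
                // URI.create throws IllegalArgumentException on malformed input.
                LOG.log(Level.WARNING, "record " + seen + " is not a valid URI: " + url, e);
            }
        }
        LOG.info("processed " + seen + " records, emitted " + out.size());
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "http://example.org/a", "not a url",
            "http://nutch.apache.org/", "http://example.com/b");
        // Debug on three records rather than the whole input.
        for (String line : map(sample(input, 3))) {
            System.out.println(line);
        }
    }
}
```

In a real job the same log calls would sit inside the map or reduce method and end up in the per-task logs, so the debugger never has to attach to the child JVM.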


Hope this helps.

Dennis

thank you !!!
