askNutch wrote:
Hi Kubes:
Thank you for your answers!
I'm sorry that I didn't express my question clearly.
I run Nutch on only one machine, and I can't debug Hadoop from within
Nutch, because Hadoop is only present as a jar in lib.
How can I debug the Hadoop source in Nutch?
Build Hadoop from source, or run it inside of Eclipse as a project. You
will have to start up each Hadoop server manually through an Eclipse
launcher and will have to have the project source as part of the
debugger's source lookup.
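A launcher for the namenode might look something like the sketch below
(an untested illustration; the class names are from the Hadoop 0.19.x
bundled with Nutch 1.0 and may differ in other versions, and
hadoop-site.xml / the conf directory has to be on the launcher's
classpath so the server finds its ports and directories):

  // Hypothetical Eclipse launch class; it calls the same entry point that
  // "bin/hadoop namenode" does, so breakpoints in the namenode code are
  // hit by the Eclipse debugger.
  public class DebugNameNode {
    public static void main(String[] args) throws Exception {
      org.apache.hadoop.hdfs.server.namenode.NameNode.main(new String[] {});
    }
  }
  // Analogous launchers can be written for
  // org.apache.hadoop.hdfs.server.datanode.DataNode,
  // org.apache.hadoop.mapred.JobTracker and
  // org.apache.hadoop.mapred.TaskTracker.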
And to my surprise, the "RunNutchInEclipse1.0" tutorial doesn't start or
configure Hadoop, including the master listen port etc.
When I debug Nutch with a breakpoint, it displays: "there is no source file
attached to the class file URLClassPath.class!" Why?
When running it through Eclipse you will also need to remove the Hadoop
jar in lib from Nutch (or at least from the classpath in Eclipse) and add
the Hadoop project in its place. This way it pulls from the Hadoop source
code and will display the source file.
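A quick sanity check (an illustrative snippet, any Hadoop class will do)
is to print where the Hadoop classes are being loaded from in your
Eclipse launch:

  // Prints the code source of a Hadoop class: the Hadoop project's output
  // directory if the classpath is set up right, or lib/hadoop-*.jar if the
  // bundled jar is still ahead of the project.
  public class WhichHadoop {
    public static void main(String[] args) {
      Class<?> c = org.apache.hadoop.mapred.JobClient.class;
      System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
  }

If that prints a path into lib, the jar is still winning and the debugger
will keep complaining about missing source.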
Can Hadoop run in a VMware machine?
Probably, yes. Many people run it under Xen; I don't know if there is that
much difference. I don't see why there would be a problem as long as it
can get socket access.
Dennis
I also ran into other problems; they are described in another message,
"run nutch on eclipse problem?".
Thanks!!!
Dennis Kubes-2 wrote:
askNutch wrote:
Hi Kubes:
You are the expert!
Can you tell me what development environment you use to develop Nutch?
Linux, Ubuntu (usually the most recent), Sun JDK, Core 2 laptop (although
hoping to upgrade to a sagernotebook.com quad core soon :) ), Eclipse
stable (3.4 I think).
Such as which IDE, etc.
I want to debug Nutch.
Debugging MapReduce, and hence Nutch, jobs is difficult. The main reason
is that Hadoop/Nutch spins up a new JVM for each Map and Reduce task, so
it is difficult to connect to that JVM as it is created and launched
automagically. Here are some options depending on what you are trying to
debug:
1) Run all Hadoop server processes (namenode, etc.) through Eclipse using
the internal debugger. This isn't always the best way; it is usually only
used when debugging some part of the Hadoop infrastructure such as socket
communication.
2) Run most of the Hadoop servers as separate processes, and run the
tasktracker inside of Eclipse with the internal debugger. This is mainly
used when debugging how a specific MapRunner, MapTask, or ReduceTask
interacts with Hadoop. You won't be able to debug the Map or Reduce task
itself, just the communication with the Hadoop server, for instance
reporting status.
3) Debugging the Map/Reduce task itself: logging. Judicious logging is
what I use most often (see the sketch below). Also, use a very small
example if you can, to give yourself short turnaround times. Unless your
problem occurs only on a large dataset, don't debug on a large dataset.
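A minimal sketch of what that judicious logging can look like, using the
old org.apache.hadoop.mapred API that Nutch 1.0 is written against (the
mapper here is illustrative, not an actual Nutch class):

  import java.io.IOException;
  import org.apache.commons.logging.Log;
  import org.apache.commons.logging.LogFactory;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class LoggingMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {

    private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

    public void map(Text key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // Log the records you suspect; the output ends up in the task's log
      // directory, typically logs/userlogs/<task-id>/ on the node that ran it.
      LOG.info("map key=" + key);
      // Status strings show up per task in the jobtracker web UI.
      reporter.setStatus("processing " + key);
      output.collect(key, value);
    }
  }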
Hope this helps.
Dennis
Thank you!!!