Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RunNutchInEclipse" page has been changed by store88.
The comment on this change is: update tutorial hyperlink.
http://wiki.apache.org/nutch/RunNutchInEclipse?action=diff&rev1=14&rev2=15

--------------------------------------------------

  = RunNutchInEclipse =
- 
  This is a work in progress. If you find errors or would like to improve this 
page, just create an account [UserPreferences] and start editing this page :-)
  
  == Tested with ==
@@ -11, +10 @@

   * Ubuntu (should work on most platforms, though)
  
  == Before you start ==
+ Setting up Nutch to run in Eclipse can be tricky, and most of the time you 
are much faster if you edit Nutch in Eclipse but run the scripts from the 
command line (my 2 cents). However, it's very useful to be able to debug Nutch 
in Eclipse. But again, you might be quicker by looking at the logs 
(logs/hadoop.log)...
- 
- Setting up Nutch to run in Eclipse can be tricky, and most of the time you 
are much faster if you edit Nutch in Eclipse but run the scripts from the 
command line (my 2 cents).
- However, it's very useful to be able to debug Nutch in Eclipse. But again, you 
might be quicker by looking at the logs (logs/hadoop.log)...
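Since the log file is often the quickest route, here is a minimal Python sketch for pulling the suspicious lines out of logs/hadoop.log. It assumes the usual log4j-style layout in which the level (ERROR, WARN, ...) appears as its own whitespace-separated token. The function name `find_problem_lines` is made up for this example and is not part of Nutch or Hadoop.

```python
def find_problem_lines(log_path, levels=("ERROR", "FATAL", "WARN")):
    """Return (line_number, text) pairs for log lines that carry one of the levels.

    Assumption: the level is a standalone token on the line, as in
    '2007-01-01 12:00:01,000 ERROR fetcher.Fetcher - ...'.
    """
    hits = []
    # errors="replace" keeps the scan going if the log contains odd bytes
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            tokens = line.split()
            if any(level in tokens for level in levels):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling `find_problem_lines("logs/hadoop.log")` from the Nutch directory would list the lines worth reading first. Stack traces that follow an ERROR line are not captured by this sketch.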
  
  == Steps ==
- 
  === Install Nutch ===
   * Grab a fresh release of Nutch 0.8 or make a fresh checkout of Nutch 0.8 
from svn
   * Do not build Nutch now. Make sure you have no .project and .classpath 
files in the Nutch directory
@@ -26, +22 @@

   * select "Create project from existing source" and use the location where 
you downloaded Nutch
   * click on Next, and wait while Eclipse is scanning the folders
   * add the folder "conf" to the classpath (scroll down the list and 
right-click on "conf". This step is necessary)
-  * Eclipse should have guessed all the java files that must be added to your 
classpath. If it's not the case, add "src/java", "src/test" and all plugin 
"src/java" and "src/test" folders to your source folders. Also add all jars in 
"lib" and in the plugin lib folders to your libraries 
+  * Eclipse should have guessed all the java files that must be added to your 
classpath. If it's not the case, add "src/java", "src/test" and all plugin 
"src/java" and "src/test" folders to your source folders. Also add all jars in 
"lib" and in the plugin lib folders to your libraries
   * set output dir to "tmp_build", create it if necessary
   * DO NOT add "build" to classpath
-  * '''or you can use [[attachment:.classpath|.classpath]] file'''
+  * '''or you can use [[attachment:.classpath]] file'''
  
  ==== If you're using the trunk ====
- 
  As of revision 511012 there were a few plugins on the trunk and a couple 
other files that did not build, and are actually excluded from the ant 
projects.  You may want to remove the following projects from the build 
structure:
  
   * plugin/parse-mp3
@@ -40, +35 @@

   * contrib/*
  
  === Configure Nutch ===
-  * see the [[http://lucene.apache.org/nutch/tutorial8.html|Tutorial]]
+  * see the [[http://wiki.apache.org/nutch/NutchTutorial|Tutorial]]
   * change the property "plugin.folders" to "./src/plugin" on 
$NUTCH_HOME/conf/nutch-site.xml
   * make sure Nutch is configured correctly before testing it in Eclipse ;-)
  
@@ -51, +46 @@

   * Menu Run > "Run..."
   * create "New" for "Java Application"
   * set in Main class
+ 
  {{{
  org.apache.nutch.crawl.Crawl
  }}}
   * on tab Arguments, Program Arguments
+ 
  {{{
  urls -dir crawl -depth 3 -topN 50
  }}}
   * in VM arguments
+ 
  {{{
  -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
  }}}
@@ -68, +66 @@

  == Debug Nutch in Eclipse ==
   * Set breakpoints and debug a crawl
   * It can be tricky to find out where to set the breakpoint, because of the 
Hadoop jobs. Here are a few good places to set breakpoints:
+ 
  {{{
  Fetcher [line: 371] - run
  Fetcher [line: 438] - fetch
@@ -76, +75 @@

  Generator$Selector [line: 119] - map
  OutlinkExtractor [line: 111] - getOutlinks
  }}}
- 
  == If things do not work... ==
  Yes, Nutch and Eclipse can be a difficult companionship sometimes ;-)
  
@@ -85, +83 @@

  
  === plugin dir not found ===
  Make sure your plugin.folders property is set correctly. Instead of a 
relative path you can also use an absolute one, either in nutch-default.xml or, 
maybe better, in nutch-site.xml
+ 
  {{{
  <property>
    <name>plugin.folders</name>
    <value>/home/....../nutch-0.8/src/plugin</value>
  </property>
  }}}
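To check the setting before launching Eclipse, a small Python sketch along these lines can read plugin.folders from the config file and report whether each folder exists. The helper names `read_property` and `plugin_folders_status` are invented for this example. It assumes the value may be a comma-separated list, and it checks relative paths against a base directory you pass in, which may differ from how Nutch itself resolves them.

```python
import os
import xml.etree.ElementTree as ET


def read_property(conf_file, name):
    """Return the <value> of the named <property> in the XML file, or None."""
    root = ET.parse(conf_file).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def plugin_folders_status(conf_file, base_dir="."):
    """Map each configured plugin folder to True if it exists on disk.

    Assumption: relative folders are resolved against base_dir, which is
    only an approximation of what Nutch does at runtime.
    """
    value = read_property(conf_file, "plugin.folders")
    if value is None:
        return {}
    status = {}
    for folder in value.split(","):
        folder = folder.strip()
        if not folder:
            continue
        path = folder if os.path.isabs(folder) else os.path.join(base_dir, folder)
        status[folder] = os.path.isdir(path)
    return status
```

For example, `plugin_folders_status("conf/nutch-site.xml", base_dir=".")` run from the Nutch directory returns a dictionary such as `{"./src/plugin": True}` when the folder is present. An empty dictionary means the property is not defined in that file.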
- 
- 
  === No plugins loaded during unit tests in Eclipse ===
- 
  During unit testing, Eclipse ignored conf/nutch-site.xml in favor of 
src/test/nutch-site.xml, so you might need to add the plugin directory 
configuration to that file as well.
  
- 
  === Unit tests work in eclipse but fail when running ant in the command line 
===
- 
  Suppose your unit tests work perfectly in eclipse, but each and every one 
fails when running '''ant test''' in the command line - including the ones you 
haven't modified.   Check if you defined the '''plugin.folders''' property in 
hadoop-site.xml. In that case, try removing it from that file and adding it 
directly to nutch-site.xml
  
  Run '''ant test''' again.  That should have solved the problem.
  
- If that didn't solve the problem, are you testing a plugin?  If so, did you 
add the plugin to the list of packages in plugin\build.xml, on the test target? 
+ If that didn't solve the problem, are you testing a plugin?  If so, did you 
add the plugin to the list of packages in plugin\build.xml, on the test target?
- 
  
  === classNotFound ===
   * open the class itself, right-click
@@ -117, +110 @@

  
  http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-rtf/lib/
  
- You need to copy jar files into plugin "lib" path and refresh the project. 
+ You need to copy jar files into plugin "lib" path and refresh the project.
- 
  
  === debugging hadoop classes ===
-  Sometimes it makes sense to also have the hadoop classes available during 
debugging. So, you can check out the Hadoop sources on your machine and add the 
sources to the  hadoop-xxx.jar. Alternatively, you can: 
+  . Sometimes it makes sense to also have the hadoop classes available during 
debugging. So, you can check out the Hadoop sources on your machine and add the 
sources to the  hadoop-xxx.jar. Alternatively, you can:
    * Remove the hadoopXXX.jar from your classpath libraries
    * Check out the hadoop branch that is used within nutch
    * configure a hadoop project similar to the nutch project within your 
eclipse
-   * add the hadoop project as a dependent project of nutch project 
+   * add the hadoop project as a dependent project of nutch project
-   * you can now also set break points within hadoop classes like InputFormat 
implementations etc. 
+   * you can now also set break points within hadoop classes like InputFormat 
implementations etc.
  
  Original credits: RenaudRichardet
  
