I was able to resolve the issue.

The wiki page [http://wiki.apache.org/nutch/RunNutchInEclipse1.0] for Eclipse
lets you build 'nutch_1_0' but not the current trunk. To build the trunk, do
the following:

1. Execute 'ant job' (which is the default) after downloading Nutch through SVN
2. Update "plugin.folders" (in nutch-default.xml) to
ECLIPSE_OUTPUT_FOLDER/plugins
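
To check that the folder "plugin.folders" points at really contains built
plugins, something like the following may help. The Hadoop log in the original
message shows the plugins being looked up under src\plugin, which would
normally hold the plugin.xml descriptors but not the compiled jars, and that
would be consistent with the ClassNotFoundException. This is only a sketch:
the class PluginFolderCheck and its method are made up for illustration and
are not part of Nutch, and it assumes that a built plugin directory holds a
plugin.xml next to at least one .jar file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Nutch.
public class PluginFolderCheck {

    /**
     * Returns the sorted names of sub-directories of pluginRoot that contain
     * a plugin.xml but no .jar file (assumed here to mean "not built").
     */
    static List<String> pluginsWithoutJars(Path pluginRoot) throws IOException {
        List<String> missing = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(pluginRoot)) {
            for (Path dir : dirs) {
                // Skip anything that does not look like a plugin directory.
                if (!Files.isDirectory(dir) || !Files.exists(dir.resolve("plugin.xml"))) {
                    continue;
                }
                boolean hasJar;
                try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
                    hasJar = jars.iterator().hasNext();
                }
                if (!hasJar) {
                    missing.add(dir.getFileName().toString());
                }
            }
        }
        Collections.sort(missing);
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PluginFolderCheck <plugin folder>");
            return;
        }
        List<String> missing = pluginsWithoutJars(Paths.get(args[0]));
        if (missing.isEmpty()) {
            System.out.println("Every plugin directory contains a jar.");
        } else {
            System.out.println("No jar found for: " + missing);
        }
    }
}
```

Run it with the same path you put into "plugin.folders". If it lists plugins
without a jar, the folder is probably the source tree rather than the build
output.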

If it still fails, increase your memory allocation or find a simpler website to
crawl.


--- On Fri, 2/19/10, Zeeshan Ul Haq <maqbool...@yahoo.com> wrote:

From: Zeeshan Ul Haq <maqbool...@yahoo.com>
Subject: Plugins are not properly initialized - BasicURLNormalizer exception
To: nutch-user@lucene.apache.org
Date: Friday, February 19, 2010, 2:17 PM

Operating System - Windows XP
Eclipse - Version: 3.3.1 (Europa)
Nutch - Building Trunk after downloading through SVN

ISSUE - Plugins are not properly initialized 

System log
=========
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
org.apache.nutch.plugin.PluginRuntimeException: 
java.lang.ClassNotFoundException: 
org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at 
org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.ClassNotFoundException: 
org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 21 more
org.apache.nutch.plugin.PluginRuntimeException: 
java.lang.ClassNotFoundException: 
org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at 
org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.ClassNotFoundException: 
org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 21 more
org.apache.nutch.plugin.PluginRuntimeException: 
java.lang.ClassNotFoundException: 
org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at 
org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.ClassNotFoundException: 
org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 21 more
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

Hadoop-log
========
2010-02-19 14:13:57,475 WARN  mapred.JobClient - Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
2010-02-19 14:13:57,647 WARN  mapred.JobClient - No job jar file set.  User 
classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2
2010-02-19 14:08:38,370 INFO  plugin.PluginRepository - Plugins: looking in: 
\eclipse\workspace\Nutch_trunk\src\plugin
2010-02-19 14:08:39,558 INFO  plugin.PluginRepository - Plugin Auto-activation 
mode: [true]
2010-02-19 14:08:39,558 INFO  plugin.PluginRepository - Registered Plugins:
2010-02-19 14:08:39,558 INFO  plugin.PluginRepository -     the nutch core 
extension points (nutch-extensionpoints)
2010-02-1
....

2010-02-19 14:08:39,651 WARN  net.URLNormalizers - 
URLNormalizers:PluginRuntimeException when initializing url normalizer plugin 
urlnormalizer-regex instance in getURLNormalizers function: attempting to 
continue instantiating plugins
2010-02-19 14:08:39,651 WARN  net.URLNormalizers - 
URLNormalizers:PluginRuntimeException when initializing url normalizer plugin 
urlnormalizer-pass instance in getURLNormalizers function: attempting to 
continue instantiating plugins
2010-02-19 14:08:39,698 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Error in configuring object
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 13 more
Caused by: java.lang.RuntimeException: 
org.apache.nutch.plugin.PluginRuntimeException: 
java.lang.ClassNotFoundException: 
org.apache.nutch.urlfilter.regex.RegexURLFilter
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:77)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
    ... 18 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: 
java.lang.ClassNotFoundException: 
org.apache.nutch.urlfilter.regex.RegexURLFilter
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:57)
    ... 19 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.nutch.urlfilter.regex.RegexURLFilter
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 20 more

Followed steps in wiki - http://wiki.apache.org/nutch/RunNutchInEclipse1.0

1. Install cygwin and set the PATH environment variable for it
2. Create a new Java Project in Eclipse
   - Checkout trunk from SVN into a new project
   - File > New > Project > Java project > click Next
   - Name the project (Nutch_Trunk for instance)
   - Click on Next, and wait while Eclipse is scanning the folders
   - Add the folder "conf" to the classpath: right-click on the project,
     select "properties" then "Java Build Path" tab (left menu) and then the
     "Libraries" tab. Click "Add Class Folder..." button, and select "conf"
     from the list
   - Go to "Order and Export" tab, find the entry for added "conf" folder and
     move it to the top (by checking it and clicking the "Top" button). This
     is required so Eclipse will take config (nutch-default.xml,
     nutch-final.xml, etc.) resources from our "conf" folder and not from
     somewhere else.
   - Eclipse should have guessed all the Java files that must be added to
     your classpath. If that's not the case, add "src/java", "src/test" and
     all plugin "src/java" and "src/test" folders to your source folders.
     Also add all jars in "lib" and in the plugin lib folders to your
     libraries
   - Click the "Source" tab and set the default output folder to
     "Nutch_Trunk/bin/tmp_build". (You may need to create the tmp_build
     folder.)
   - Click the "Finish" button
   - DO NOT add "build" to classpath
3. Open up the $NUTCH_HOME/conf/nutch-default.xml file, search for
http.agent.name, and give it the value 'YOURNAME Spider'
4. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with the 
name of the domain
5. Update the missing jars as described in README

6. Eclipse -> Window -> Preferences -> Java -> Installed JREs -> edit -> 
Default VM arguments  "-Xms5m -Xmx150m"



7. Create Eclipse launcher
Menu Run > "Run...", create "New" for "Java Application"
set in Main class - "org.apache.nutch.crawl.Crawl"
on tab Arguments, Program Arguments - "urls -dir crawl -depth 3 -topN 50"
in VM arguments - "-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log"
click on "Run"




      


      
