I was able to resolve the issue. The wiki page for Eclipse [http://wiki.apache.org/nutch/RunNutchInEclipse1.0] lets you build 'nutch_1_0' but not the current trunk. To build the trunk, do the following:
1. Execute 'ant job' (which is the default) after downloading nutch through SVN
2. Update "plugin.folders" (under nutch-default.xml) to ECLIPSE_OUTPUT_FOLDER/plugins

If it still fails increase your memory allocation or find a simpler website to crawl.

--- On Fri, 2/19/10, Zeeshan Ul Haq <maqbool...@yahoo.com> wrote:

From: Zeeshan Ul Haq <maqbool...@yahoo.com>
Subject: Plugins are not properly initialized - BasicURLNormalizer exception
To: nutch-user@lucene.apache.org
Date: Friday, February 19, 2010, 2:17 PM

Operating System - Windows XP
Eclipse - Version: 3.3.1 (Europa)
Nutch - Building Trunk after downloading through SVN
ISSUE - Plugins are not properly initialized

System log
=========
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 21 more
org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 21 more
org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLNormalizers.getURLNormalizers(URLNormalizers.java:170)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:128)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 21 more
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

Hadoop-log
========
2010-02-19 14:13:57,475 WARN mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-02-19 14:13:57,647 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2
2010-02-19 14:08:38,370 INFO plugin.PluginRepository - Plugins: looking in: \eclipse\workspace\Nutch_trunk\src\plugin
2010-02-19 14:08:39,558 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2010-02-19 14:08:39,558 INFO plugin.PluginRepository - Registered Plugins:
2010-02-19 14:08:39,558 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2010-02-1 ....
2010-02-19 14:08:39,651 WARN net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-regex instance in getURLNormalizers function: attempting to continue instantiating plugins
2010-02-19 14:08:39,651 WARN net.URLNormalizers - URLNormalizers:PluginRuntimeException when initializing url normalizer plugin urlnormalizer-pass instance in getURLNormalizers function: attempting to continue instantiating plugins
2010-02-19 14:08:39,698 WARN mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 13 more
Caused by: java.lang.RuntimeException: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:77)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
    ... 18 more
Caused by: org.apache.nutch.plugin.PluginRuntimeException: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:166)
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:57)
    ... 19 more
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.urlfilter.regex.RegexURLFilter
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:156)
    ... 20 more

Followed steps in wiki - http://wiki.apache.org/nutch/RunNutchInEclipse1.0

1. Install cygwin and set the PATH environment variable for it
2. Create a new Java Project in Eclipse
   - Checkout trunk from SVN into a new project
   - File > New > Project > Java project > click Next
   - Name the project (Nutch_Trunk for instance)
   - Click on Next, and wait while Eclipse is scanning the folders
   - Add the folder "conf" to the classpath
   - Right-click on the project, select "properties" then "Java Build Path" tab (left menu) and then the "Libraries" tab.
   - Click the "Add Class Folder..." button, and select "conf" from the list.
   - Go to the "Order and Export" tab, find the entry for the added "conf" folder and move it to the top (by checking it and clicking the "Top" button). This is required so Eclipse will take config (nutch-default.xml, nutch-final.xml, etc.) resources from our "conf" folder and not from somewhere else.
   - Eclipse should have guessed all the Java files that must be added to your classpath. If that's not the case, add "src/java", "src/test" and all plugin "src/java" and "src/test" folders to your source folders. Also add all jars in "lib" and in the plugin lib folders to your libraries.
   - Click the "Source" tab and set the default output folder to "Nutch_Trunk/bin/tmp_build". (You may need to create the tmp_build folder.)
   - Click the "Finish" button.
   - DO NOT add "build" to the classpath.
3. Open the $NUTCH_HOME/conf/nutch-default.xml file, search for http.agent.name, and give it the value 'YOURNAME Spider'
4. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with the name of the domain
5. Update the missing jars as described in README
6. Eclipse -> Window -> Preferences -> Java -> Installed JREs -> edit -> Default VM arguments "-Xms5m -Xmx150m"
7. Create Eclipse launcher
   - Menu Run > "Run..."
   - Create "New" for "Java Application"
   - Set in Main class - "org.apache.nutch.crawl.Crawl"
   - On tab Arguments, Program Arguments - "urls -dir crawl -depth 3 -topN 50"
   - In VM arguments - "-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log"
   - Click on "Run"
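
For reference, the "plugin.folders" change from step 2 of the fix at the top of this message is an ordinary property entry in the Nutch configuration. A minimal sketch of what that entry might look like, assuming ECLIPSE_OUTPUT_FOLDER is a placeholder for whatever folder your own build writes to (it is not a real path) and that the built plugins sit in a "plugins" folder beneath it:

```xml
<!-- Sketch only: ECLIPSE_OUTPUT_FOLDER is a placeholder, replace it with your own build output path. -->
<property>
  <name>plugin.folders</name>
  <value>ECLIPSE_OUTPUT_FOLDER/plugins</value>
</property>
```

The Hadoop log above shows the plugin repository looking in \eclipse\workspace\Nutch_trunk\src\plugin. As far as I can tell that folder holds only the plugin sources and descriptors, not compiled classes, which would explain the ClassNotFoundException entries. The same property can also be set in conf/nutch-site.xml, which overrides nutch-default.xml, if you prefer to leave the defaults file untouched.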