Hi, check whether your "Working directory" (Run -> Run Configurations -> Tab Arguments -> Working Directory) points to the Nutch base directory (where your conf/nutch-site.xml is located). Regards Hannes
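A quick way to verify this from inside the run configuration is to print the working directory and look for the config file there. The following is only a minimal sketch: the class name WorkingDirCheck and the helper looksLikeNutchBase are made up for illustration and are not part of Nutch, and the check only tests that conf/nutch-site.xml exists, not that Nutch actually loads it.

```java
import java.io.File;

// Hypothetical helper (not part of Nutch): reports whether the JVM's
// working directory looks like the Nutch base directory.
class WorkingDirCheck {

    // Returns true if baseDir contains a regular file conf/nutch-site.xml.
    static boolean looksLikeNutchBase(File baseDir) {
        File conf = new File(baseDir, "conf");
        return new File(conf, "nutch-site.xml").isFile();
    }

    public static void main(String[] args) {
        // "user.dir" is the directory the JVM was started in, i.e. the
        // "Working directory" setting of the Eclipse run configuration.
        File cwd = new File(System.getProperty("user.dir"));
        System.out.println("Working directory: " + cwd.getAbsolutePath());
        if (looksLikeNutchBase(cwd)) {
            System.out.println("Found conf/nutch-site.xml");
        } else {
            System.out.println("conf/nutch-site.xml not found here; "
                    + "check the Working Directory setting");
        }
    }
}
```

Running it with the same run configuration as the crawl (same Working Directory) shows which directory the crawl would actually start in.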
On Mon, Oct 4, 2010 at 11:02 AM, Marseld Dedgjonaj <marseld.dedgjo...@ikubinfo.com> wrote:
> Hello,
> Thanks for your answer. I try it but I got this error.
> Maybe any problem on reading in "conf" folder. I see its ok.
> If I run crawl from linux script it works.
> Thanks
>
> This is the error message:
>
> 10/10/04 10:46:40 INFO crawl.Crawl: crawl started in: crawl
> 10/10/04 10:46:40 INFO crawl.Crawl: rootUrlDir = my_urls
> 10/10/04 10:46:40 INFO crawl.Crawl: threads = 5
> 10/10/04 10:46:40 INFO crawl.Crawl: depth = 3
> 10/10/04 10:46:40 INFO crawl.Crawl: indexer=lucene
> 10/10/04 10:46:40 INFO crawl.Crawl: topN = 50
> 10/10/04 10:46:40 INFO crawl.Injector: Injector: starting at 2010-10-04 10:46:40
> 10/10/04 10:46:40 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
> 10/10/04 10:46:40 INFO crawl.Injector: Injector: urlDir: my_urls
> 10/10/04 10:46:40 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
> 10/10/04 10:46:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10/10/04 10:46:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/04 10:46:41 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 10/10/04 10:46:41 INFO mapred.FileInputFormat: Total input paths to process : 1
> 10/10/04 10:46:42 INFO mapred.JobClient: Running job: job_local_0001
> 10/10/04 10:46:42 INFO mapred.FileInputFormat: Total input paths to process : 1
> 10/10/04 10:46:42 INFO mapred.MapTask: numReduceTasks: 1
> 10/10/04 10:46:42 INFO mapred.MapTask: io.sort.mb = 100
> 10/10/04 10:46:43 INFO mapred.JobClient: map 0% reduce 0%
> 10/10/04 10:46:43 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/10/04 10:46:43 INFO mapred.MapTask: record buffer = 262144/327680
> 10/10/04 10:46:43 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 5 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>     ... 10 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 13 more
> Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
>     at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
>     at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:72)
>     at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
>     at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
>     at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
>     ... 18 more
> 10/10/04 10:46:44 INFO mapred.JobClient: Job complete: job_local_0001
> 10/10/04 10:46:44 INFO mapred.JobClient: Counters: 0
> 10/10/04 10:46:48 INFO mapred.LocalJobRunner: file:/home/administrator/workspace/nutch-1.2/my_urls/urls:0+443
>
> -----Original Message-----
> From: Ahmad Al-Amri [mailto:amri...@yahoo.com]
> Sent: Sunday, October 03, 2010 11:34 AM
> To: user@nutch.apache.org
> Subject: Re: Run crawl from java code
>
> Hello;
>
> open nutch with eclipse;
>
> Run -> debug Configuration -> 'right click on' java application and choose new
>
> -- set the main class;
> org.apache.nutch.crawl.Crawl
>
> -- and the arguments:
> urls -dir crawloutput -threads 5 -depth 3 -topN 50
>
> then set your breakpoints and run the debug for this configuration
>
> Good Luck :)
>
> ________________________________
> From: Marseld Dedgjonaj <marseld.dedgjo...@ikubinfo.com>
> To: user@nutch.apache.org
> Sent: Sat, October 2, 2010 4:51:28 PM
> Subject: Run crawl from java code
>
> Hi,
>
> I have configured nutch 1.2 in Eclipse project.
>
> I need to run crawl from java code to follow it with debug.
>
> This is the script in linux that I execute for crawl.
>
> . bin/nutch inject /home/administrator/nutch/albanian_crawl/crawldb my_urls
> . bin/nutch generate /home/administrator/nutch/albanian_crawl/crawldb /home/administrator/nutch/albanian_crawl/segments
> . segment=`ls -d /home/administrator/nutch/albanian_crawl/segments/2* | tail -1`
> . bin/nutch fetch $segment
> . bin/nutch updatedb /home/administrator/nutch/albanian_crawl/crawldb $segment
> . bin/nutch mergesegs /home/administrator/nutch/albanian_crawl/segments /home/administrator/nutch/albanian_crawl/segments/*
> . bin/nutch invertlinks /home/administrator/nutch/albanian_crawl/linkdb /home/administrator/nutch/albanian_crawl/segments/*
> . bin/nutch index /home/administrator/nutch/albanian_crawl/indexes /home/administrator/nutch/albanian_crawl/crawldb /home/administrator/nutch/albanian_crawl/linkdb /home/administrator/nutch/albanian_crawl/segments/*
> . bin/nutch dedup /home/administrator/nutch/albanian_crawl/indexes
>
> Can anybody help to translate it in java.
>
> Thanks in advance ,
> Marseld.
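
For the original question of starting the crawl from Java code, the main class and arguments from Ahmad's mail can also be passed programmatically. This is only a rough sketch under assumptions: CrawlLauncher and buildArgs are hypothetical names, the Nutch 1.2 jars and the conf directory are assumed to be on the classpath, and reflection is used here only so the snippet compiles without Nutch on the build path. Inside the Eclipse Nutch project, calling the main method of org.apache.nutch.crawl.Crawl directly with the same array should be equivalent.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher (not part of Nutch): builds the argument list from
// Ahmad's mail and hands it to org.apache.nutch.crawl.Crawl.
class CrawlLauncher {

    // Builds: <urlDir> -dir <crawlDir> -threads <n> -depth <n> -topN <n>
    static String[] buildArgs(String urlDir, String crawlDir,
                              int threads, int depth, int topN) {
        List<String> args = new ArrayList<String>();
        args.add(urlDir);
        args.add("-dir");
        args.add(crawlDir);
        args.add("-threads");
        args.add(String.valueOf(threads));
        args.add("-depth");
        args.add(String.valueOf(depth));
        args.add("-topN");
        args.add(String.valueOf(topN));
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) throws Exception {
        String[] crawlArgs = buildArgs("urls", "crawloutput", 5, 3, 50);
        // Assumption: the Nutch jars are on the classpath; otherwise this
        // line throws ClassNotFoundException.
        Class<?> crawl = Class.forName("org.apache.nutch.crawl.Crawl");
        Method main = crawl.getMethod("main", String[].class);
        // Cast to Object so the array is passed as one String[] parameter.
        main.invoke(null, (Object) crawlArgs);
    }
}
```

A breakpoint set in Crawl or Injector is then hit when CrawlLauncher is started in debug mode, provided the working directory and classpath are set as described at the top of this thread.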