Hello,
Thanks for your answer. I tried it, but I got the error below.
Maybe there is a problem reading the "conf" folder, but as far as I can see it's OK.
If I run the crawl from the Linux script, it works.
Thanks


This is the error message:

10/10/04 10:46:40 INFO crawl.Crawl: crawl started in: crawl
10/10/04 10:46:40 INFO crawl.Crawl: rootUrlDir = my_urls
10/10/04 10:46:40 INFO crawl.Crawl: threads = 5
10/10/04 10:46:40 INFO crawl.Crawl: depth = 3
10/10/04 10:46:40 INFO crawl.Crawl: indexer=lucene
10/10/04 10:46:40 INFO crawl.Crawl: topN = 50
10/10/04 10:46:40 INFO crawl.Injector: Injector: starting at 2010-10-04 10:46:40
10/10/04 10:46:40 INFO crawl.Injector: Injector: crawlDb: crawl/crawldb
10/10/04 10:46:40 INFO crawl.Injector: Injector: urlDir: my_urls
10/10/04 10:46:40 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
10/10/04 10:46:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/10/04 10:46:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/04 10:46:41 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
10/10/04 10:46:41 INFO mapred.FileInputFormat: Total input paths to process : 1
10/10/04 10:46:42 INFO mapred.JobClient: Running job: job_local_0001
10/10/04 10:46:42 INFO mapred.FileInputFormat: Total input paths to process : 1
10/10/04 10:46:42 INFO mapred.MapTask: numReduceTasks: 1
10/10/04 10:46:42 INFO mapred.MapTask: io.sort.mb = 100
10/10/04 10:46:43 INFO mapred.JobClient:  map 0% reduce 0%
10/10/04 10:46:43 INFO mapred.MapTask: data buffer = 79691776/99614720
10/10/04 10:46:43 INFO mapred.MapTask: record buffer = 262144/327680
10/10/04 10:46:43 WARN mapred.LocalJobRunner: job_local_0001
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 13 more
Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
    at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:72)
    at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
    ... 18 more
10/10/04 10:46:44 INFO mapred.JobClient: Job complete: job_local_0001
10/10/04 10:46:44 INFO mapred.JobClient: Counters: 0
10/10/04 10:46:48 INFO mapred.LocalJobRunner: file:/home/administrator/workspace/nutch-1.2/my_urls/urls:0+443
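The innermost cause in the trace is "plugin.folders is not defined". In Nutch 1.x that property is normally set in nutch-default.xml, which is read from the classpath, and the bin/nutch script puts the conf directory on the classpath for you. That would be consistent with the crawl working from the script but not from Eclipse. A minimal standalone check, assuming the resource names nutch-default.xml and nutch-site.xml (the ConfCheck class itself is made up for illustration), is to run the class below with the same classpath as the Eclipse debug configuration and see whether both files are found:

```java
import java.net.URL;

/**
 * Reports whether the Nutch configuration files are visible on the classpath.
 * Assumption: Nutch loads its configuration by resource name, so the "conf"
 * folder has to be on the classpath of the Eclipse run/debug configuration.
 */
public class ConfCheck {

    // Resource names assumed to be the ones the Nutch configuration loads.
    static final String[] RESOURCES = {"nutch-default.xml", "nutch-site.xml"};

    /** Returns the location of a classpath resource, or null if it is not visible. */
    static URL locate(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = ConfCheck.class.getClassLoader();
        }
        return loader.getResource(name);
    }

    public static void main(String[] args) {
        for (String name : RESOURCES) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

If nutch-default.xml is reported as missing, adding the conf folder to the Classpath tab of the debug configuration is a reasonable next thing to try.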



-----Original Message-----
From: Ahmad Al-Amri [mailto:[email protected]] 
Sent: Sunday, October 03, 2010 11:34 AM
To: [email protected]
Subject: Re: Run crawl from java code

Hello,

Open Nutch in Eclipse, then go to Run -> Debug Configurations, right-click on "Java Application" and choose "New".

-- Set the main class:
org.apache.nutch.crawl.Crawl

-- and the arguments:
urls -dir crawloutput -threads 5 -depth 3 -topN 50

Then set your breakpoints and start debugging with this configuration.
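The same run can also be started from Java code instead of a launch configuration. The sketch below builds the argument array from the steps above and calls the main method of org.apache.nutch.crawl.Crawl. It looks the class up by reflection only so that the file compiles without Nutch; with Nutch 1.2 on the build path, a direct Crawl.main(args) call does the same thing. The RunCrawl class and its crawlArgs helper are hypothetical names. The conf folder still has to be on the classpath, otherwise the run will most likely fail with the same plugin.folders error.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class RunCrawl {

    static final String CRAWL_CLASS = "org.apache.nutch.crawl.Crawl";

    /** Builds the argument list: <urlDir> -dir <outDir> -threads N -depth N -topN N */
    static String[] crawlArgs(String urlDir, String outDir, int threads, int depth, int topN) {
        return new String[] {
            urlDir,
            "-dir", outDir,
            "-threads", Integer.toString(threads),
            "-depth", Integer.toString(depth),
            "-topN", Integer.toString(topN)
        };
    }

    public static void main(String[] ignored) throws Exception {
        String[] args = crawlArgs("urls", "crawloutput", 5, 3, 50);
        System.out.println("Crawl args: " + Arrays.toString(args));

        Class<?> crawl;
        try {
            crawl = Class.forName(CRAWL_CLASS);
        } catch (ClassNotFoundException e) {
            System.out.println(CRAWL_CLASS + " is not on the classpath");
            return;
        }
        Method mainMethod = crawl.getMethod("main", String[].class);
        // Cast to Object so the array is passed as a single argument, not as varargs.
        mainMethod.invoke(null, (Object) args);
    }
}
```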

Good Luck :)





________________________________
From: Marseld Dedgjonaj <[email protected]>
To: [email protected]
Sent: Sat, October 2, 2010 4:51:28 PM
Subject: Run crawl from java code

Hi,

I have set up Nutch 1.2 as an Eclipse project.

I need to run the crawl from Java code so that I can follow it in the debugger.



This is the Linux script that I run for the crawl:

    bin/nutch inject /home/administrator/nutch/albanian_crawl/crawldb my_urls
    bin/nutch generate /home/administrator/nutch/albanian_crawl/crawldb /home/administrator/nutch/albanian_crawl/segments
    segment=`ls -d /home/administrator/nutch/albanian_crawl/segments/2* | tail -1`
    bin/nutch fetch $segment
    bin/nutch updatedb /home/administrator/nutch/albanian_crawl/crawldb $segment
    bin/nutch mergesegs /home/administrator/nutch/albanian_crawl/segments /home/administrator/nutch/albanian_crawl/segments/*
    bin/nutch invertlinks /home/administrator/nutch/albanian_crawl/linkdb /home/administrator/nutch/albanian_crawl/segments/*
    bin/nutch index /home/administrator/nutch/albanian_crawl/indexes /home/administrator/nutch/albanian_crawl/crawldb /home/administrator/nutch/albanian_crawl/linkdb /home/administrator/nutch/albanian_crawl/segments/*
    bin/nutch dedup /home/administrator/nutch/albanian_crawl/indexes
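The one line in the script that is not a plain tool call is segment=`ls -d .../segments/2* | tail -1`. Nutch names segment directories with a yyyyMMddHHmmss timestamp, so the last name in sorted order is the newest segment. A minimal java.nio sketch of that line and of the segments/* glob follows; the Segments class and its method names are made up for illustration. Unlike the shell glob, it keeps only directories and does not skip hidden entries.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Segments {

    /** Roughly `ls -d <dir>/*`: sub-directories of segmentsDir, sorted by name. */
    static List<Path> listSegments(Path segmentsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentsDir)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    result.add(p);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Roughly `ls -d <dir>/2* | tail -1`: the last directory whose name starts with "2". */
    static Path latestSegment(Path segmentsDir) throws IOException {
        Path latest = null;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(segmentsDir, "2*")) {
            for (Path p : stream) {
                if (Files.isDirectory(p) && (latest == null || p.compareTo(latest) > 0)) {
                    latest = p;
                }
            }
        }
        return latest; // null if generate has not produced a segment yet
    }

    public static void main(String[] args) throws IOException {
        Path segments = Paths.get(args.length > 0 ? args[0]
                : "/home/administrator/nutch/albanian_crawl/segments");
        System.out.println("all segments:   " + listSegments(segments));
        System.out.println("latest segment: " + latestSegment(segments));
    }
}
```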



Can anybody help me translate it into Java?
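One way to organize the translation is as an ordered list of (class, arguments) steps, one per line of the script. The class names below are the ones the bin/nutch launcher of Nutch 1.x maps these commands to; treat that as an assumption and check it against the bin/nutch in your own checkout. CrawlPlan, Step and withSegments are hypothetical names. The plan is split in two because the segment name only exists after generate has run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrawlPlan {

    /** One tool invocation: fully qualified class name plus its arguments. */
    static final class Step {
        final String className;
        final String[] args;

        Step(String className, String... args) {
            this.className = className;
            this.args = args;
        }

        @Override
        public String toString() {
            return className + " " + String.join(" ", args);
        }
    }

    /** Leading arguments followed by the expanded segment list (stands in for segments/*). */
    static String[] withSegments(List<String> segments, String... leading) {
        List<String> all = new ArrayList<>(Arrays.asList(leading));
        all.addAll(segments);
        return all.toArray(new String[0]);
    }

    /** inject and generate: these can be built before any segment exists. */
    static List<Step> prepare(String base, String urlDir) {
        List<Step> steps = new ArrayList<>();
        steps.add(new Step("org.apache.nutch.crawl.Injector", base + "/crawldb", urlDir));
        steps.add(new Step("org.apache.nutch.crawl.Generator", base + "/crawldb", base + "/segments"));
        return steps;
    }

    /** fetch through dedup: segment is the newest segment, allSegments replaces segments/*. */
    static List<Step> afterGenerate(String base, String segment, List<String> allSegments) {
        List<Step> steps = new ArrayList<>();
        steps.add(new Step("org.apache.nutch.fetcher.Fetcher", segment));                  // fetch
        steps.add(new Step("org.apache.nutch.crawl.CrawlDb", base + "/crawldb", segment)); // updatedb
        steps.add(new Step("org.apache.nutch.segment.SegmentMerger",                       // mergesegs
                withSegments(allSegments, base + "/segments")));
        steps.add(new Step("org.apache.nutch.crawl.LinkDb",                                // invertlinks
                withSegments(allSegments, base + "/linkdb")));
        steps.add(new Step("org.apache.nutch.indexer.Indexer",                             // index
                withSegments(allSegments, base + "/indexes", base + "/crawldb", base + "/linkdb")));
        steps.add(new Step("org.apache.nutch.indexer.DeleteDuplicates", base + "/indexes")); // dedup
        return steps;
    }

    public static void main(String[] args) {
        String base = "/home/administrator/nutch/albanian_crawl";
        for (Step s : prepare(base, "my_urls")) {
            System.out.println(s);
        }
        // Hypothetical segment name; in a real run, read it from disk after generate.
        String segment = base + "/segments/20101004104640";
        for (Step s : afterGenerate(base, segment, Arrays.asList(segment))) {
            System.out.println(s);
        }
    }
}
```

To actually run the steps in one JVM, calling each class's main() in turn is risky, because several of these tools may end their main() with System.exit(). For the ones that implement Hadoop's Tool interface, ToolRunner.run(NutchConfiguration.create(), tool, args) is the usual in-process alternative.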





Thanks in advance,

Marseld.
