Hi all: Just FYI, I have solved the problem. I checked the hadoop.log file and found that the "plugin.folders" property was set incorrectly.
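For reference, that property lives in conf/nutch-site.xml. The snippet below is just a sketch of what a working entry looks like; the value ("plugins" here) depends on where the compiled plugin directories sit in your own install:

==============================================================
<property>
  <name>plugin.folders</name>
  <!-- Directories to scan for plugins, relative to the Nutch
       home or absolute. "plugins" is the usual value for a
       local runtime install; adjust for your layout. -->
  <value>plugins</value>
</property>
==============================================================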
Thank you for your help.

Andy

On 11 April 2012 16:41, Andy Xue <[email protected]> wrote:

> Hi Lewis:
>
> Thank you for the help. This is the (entire) output after I set the log4j
> property to debug.
> ==============================================================
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 2
> solrUrl=http://localhost:8983/solr/
> topN = 10
> Injector: starting at 2012-04-11 16:37:20
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
>
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>     at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> ==============================================================
>
> And btw, the "urls" directory is correct and it does contain a txt file
> with a list of urls.
>
> Regards
> Andy
>
>
> On 10 April 2012 22:08, Lewis John Mcgibbney <[email protected]> wrote:
>
>> There is no more log information before the solrUrl stuff, no?
>>
>> Try setting log4j.properties to debug in conf/, rebuild the project and
>> see what's going on.
>>
>> On Tue, Apr 10, 2012 at 1:03 PM, Andy Xue <[email protected]> wrote:
>>
>> > Lewis:
>> > Thanks for the reply.
>> > However, as far as I know, I don't have to set solrUrl unless I want to
>> > index using solr.
>>
>> Correct. My fault. I just assumed that this was required.
>>
>> Lewis
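P.S. For anyone finding this in the archives: the "set log4j.properties to debug" step Lewis describes above means turning the logger levels in conf/log4j.properties up from INFO to DEBUG, roughly like this (a sketch using standard log4j syntax; match it to the logger names in your own copy of the file, then rebuild so the change lands under runtime/):

==============================================================
# Log Nutch's own classes at DEBUG instead of INFO
log4j.logger.org.apache.nutch=DEBUG
# The Injector runs as a MapReduce job, so Hadoop's logging can help too
log4j.logger.org.apache.hadoop=DEBUG
==============================================================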

