Re: error while crawling

reinhard schwab Wed, 10 Feb 2010 22:09:58 -0800

Nutch expects "urls" to be a directory, not a file.
Create a directory named "urls", create a file inside it (any name
will do), and add the URLs you want to crawl to that file, one per
line.
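
Something like the following should do it from the Cygwin shell. This is only a sketch: it assumes you run it from the nutch-0.9 directory and that "urls" is currently the plain file created by the echo command. The names seed.txt and urls.bak are arbitrary choices for this example, not anything Nutch requires.

```shell
# Run from the Nutch install directory (assumed: nutch-0.9).

# If "urls" is still a plain file, move it out of the way first,
# otherwise mkdir cannot create a directory with the same name.
if [ -f urls ]; then
  mv urls urls.bak
fi

# Create the seed directory that the Injector reads.
mkdir -p urls

# Write the seed list, one URL per line.
# seed.txt is a hypothetical file name; any name should work.
echo 'http://dawahweb.net' > urls/seed.txt
```

After that, rerunning the same crawl command should get past the "Input path doesnt exist" error, since the Injector now finds a directory at that path.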


Injector: urlDir: urls

Input path doesnt exist : C:/cygwin/home/Mouad&Sibel/nutch-0.9/urls


Mouad wrote:
> Hello,
> I installed Nutch on Windows and everything went well until I wanted to
> crawl a website.
> I typed this line in the nutch directory to create the urls file: echo
> 'http://dawahweb.net' > urls
> I could not create a WebDB by typing admin db -create.
> I received this log:
> crawl started in: crawl-tinysite
> rootUrlDir = urls
> threads = 10
> depth = 1
> Injector: starting
> Injector: crawlDb: crawl-tinysite/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : C:/cygwin/home/Mouad&Sibel/nutch-0.9/urls
>       at org.apache.hadoop.mapred.InputFormatBase.validateInput(InputFormatBase.java:138)
>       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:326)
>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
>       at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
>       at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
>
> Can anyone help, please?
>
> Mouad
>
