If I remember correctly, the crawl command expects its first argument (urls
here) to be a directory that contains the URL list file, not the file itself.
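In practice that means turning the flat urls file into a directory of the same name with the seed list inside it. Below is a minimal sketch of that step in plain sh. It assumes the seed list is currently a single file called urls in the Nutch directory. The function name urls_to_dir and the inner file name seeds.txt are made up for illustration. As far as I know the crawler reads whatever files sit in that directory, but treat that as an assumption.

```shell
#!/bin/sh
# Minimal sketch, ASSUMING the crawl command wants a directory of seed files
# rather than a single file. "urls_to_dir" and "seeds.txt" are hypothetical names.
urls_to_dir() {
  src="$1"                        # path of the existing flat seed file
  if [ -f "$src" ]; then          # only act if it is still a plain file
    tmp="$src.tmp.$$"             # temporary name while the directory is created
    mv "$src" "$tmp"
    mkdir "$src"                  # reuse the original name for the directory
    mv "$tmp" "$src/seeds.txt"    # the seed list now lives inside the directory
  fi
}

urls_to_dir urls
```

After that, the original command (bin/nutch crawl urls -dir cdir -depth 2 >&log) can be rerun unchanged, since urls now names a directory.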


On Tue, 2006-01-17 at 11:36 -0800, Chris Shepard wrote:
> Hi all,
> 
> I'm having some problems getting Nutch to run on
> Windows XP/Cygwin.
> This is with nutch-2006-01-17.
> 
> Intranet crawl........
> 
> When I do this (after making the urls file, etc.):
> 
>       bin/nutch crawl urls -dir cdir -depth 2 >&log
>       
> I get this in the log:
>       
> 060117 114833 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 crawl started in: cdir
> 060117 114834 rootUrlDir = urls
> 060117 114834 threads = 10
> 060117 114834 depth = 2
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Injector: starting
> 060117 114834 Injector: crawlDb: cdir\crawldb
> 060117 114834 Injector: urlDir: urls
> 060117 114834 Injector: Converting injected urls to crawl db entries.
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Running job: job_krj0e1
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114835 parsing \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml
> 060117 114835 parsing file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml , nutch-site.xml
>       at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
>       at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
>       at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> 060117 114835  map 0%
> java.io.IOException: Job failed!
>       at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>       at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
>       at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> Exception in thread "main"
> 
> I see that:
> 
>       nutch-site.xml is empty
>       mapred-default.xml is empty
> 
> 
> Whole Web setup............................ 
> 
> When I do this (after creating the directories):
> 
>       bin/nutch admin db -create
>  
> I get this at the prompt:
> 
>       Exception in thread "main" java.lang.NoClassDefFoundError: admin
>       
> I don't speak Java, so I'm not sure what it's saying.
> 
> 
> Please help.
> 
> TIA.