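If I remember correctly, crawl expects its first argument to be a
directory that contains the urls file, not the urls file itself. Your
log shows "rootUrlDir = urls" and then fails with "No input directories
specified", which would fit urls being a plain file.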
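
A rough sketch of what I mean, run from the Nutch directory. The names
"urldir" and "seeds.txt" are just placeholders I made up, and I am
assuming your seed list is currently a plain file called "urls"; I have
not tested this on XP/Cygwin.

```shell
# Hypothetical names: "urldir" and "seeds.txt" are only illustrative.
mkdir -p urldir

# If "urls" is currently a plain file, move it into the new directory.
if [ -f urls ]; then
    mv urls urldir/seeds.txt
fi

# Pass the directory (not the file) as the first argument to crawl.
# Guarded so the snippet does nothing outside a Nutch checkout.
if [ -x bin/nutch ]; then
    bin/nutch crawl urldir -dir cdir -depth 2 > log 2>&1
fi
```

If the injector then gets past "Converting injected urls to crawl db
entries" in the log, the directory was the problem.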


On Tue, 2006-01-17 at 11:36 -0800, Chris Shepard wrote:
> Hi all,
> 
> Having some problems getting nutch to run on
> XP/Cygwin.
> This is re nutch-2006-01-17
> 
> Intranet crawl........
> 
> When I do this (after making urls file, etc.):
> 
>       bin/nutch crawl urls -dir cdir -depth 2 >&log
>       
> I get this in the log:
>       
> 060117 114833 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 crawl started in: cdir
> 060117 114834 rootUrlDir = urls
> 060117 114834 threads = 10
> 060117 114834 depth = 2
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Injector: starting
> 060117 114834 Injector: crawlDb: cdir\crawldb
> 060117 114834 Injector: urlDir: urls
> 060117 114834 Injector: Converting injected urls to
> crawl db entries.
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/crawl-tool.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> 060117 114834 Running job: job_krj0e1
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-default.xml
> 060117 114834 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/mapred-default.xml
> 060117 114835 parsing
> \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml
> 060117 114835 parsing
> file:/C:/cygwin/usr/local/src/nutch-nightly/conf/nutch-site.xml
> java.io.IOException: No input directories specified
> in: NutchConf: nutch-default.xml , mapred-default.xml
> , \tmp\nutch\mapred\local\localRunner\job_krj0e1.xml ,
> nutch-site.xml
>       at
> org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
>       at
> org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
>       at
> org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> 060117 114835  map 0%
> java.io.IOException: Job failed!
>       at
> org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>       at
> org.apache.nutch.crawl.Injector.inject(Injector.java:102)
>       at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> Exception in thread "main" 
> 
> I see that:
> 
>       nutch-site.xml is empty
>       mapred-default is empty
> 
> 
> Whole Web setup............................ 
> 
> When I do this: (after mkdirs)
> 
>       bin/nutch admin db -create
>  
> I get this at the prompt:
> 
>       Exception in thread "main"
> java.lang.NoClassDefFoundError: admin
>       
> I don't speak Java, so I'm not sure what it's saying.
> 
> 
> Please help.
> 
> TIA.
> 
> 
> 
> 
> 
> 




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
