bin/nutch crawl urls -dir crawled -depth 2 >& crawl.log
Well, at least it ran this time, but with zero results. The urls file contains:
#+^http://([a-z0-9]*\.)*apache.org/
+^http://www.calpoly.edu/~acadprog/2005course.html
<attached crawl.log>
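A quick sanity check on the seed file may help here. The sketch below assumes the seed list should hold one plain URL per line and flags anything else (blank lines included). The `#` and `+^` prefixes in the two lines above look like URL filter pattern syntax rather than seed URLs, which is only a guess at why the crawl came back empty. The function name `check_seed_lines` is hypothetical.

```shell
# Hypothetical helper: print seed-file lines that are not plain http(s) URLs.
# Assumption: the seed list holds one URL per line, with no filter prefixes.
check_seed_lines() {
  # -n prefixes each hit with its line number; -v inverts the match;
  # -E enables extended regular expressions.
  grep -n -v -E '^https?://' "$1"
}
```

Running `check_seed_lines urls` from the Nutch directory prints any offending lines with their line numbers.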
Whenever I try to use -topN, I get this error:
bin/nutch crawl urls -dir crawled -depth 2 -topN 1000
returns:
{blah}
060223 132142 crawl started in: crawled
060223 132142 rootUrlFile = 1000
060223 132142 threads = 10
060223 132142 depth = 2
060223 132142 Created webdb at LocalFS,C:\cygwin\home\falieson\nutch\crawled\db
Exception in thread "main" java.io.FileNotFoundException: 1000 (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
{blah}
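The `rootUrlFile = 1000` line suggests how the arguments were read. As a rough model only (an assumption drawn from the log, not from the Nutch source): if this version of the crawl tool does not know `-topN`, and every argument it does not recognize is taken as the root URL file, then the last such argument, `1000`, wins. The function `guess_root_url_file` below is hypothetical and only mimics that behavior.

```shell
# Hypothetical model of the argument handling suggested by the log above.
# Assumption: only -dir and -depth take a value; anything else is treated
# as the root URL file, and a later value overwrites an earlier one.
guess_root_url_file() {
  root=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -dir|-depth)
        shift                              # drop the option name
        if [ $# -gt 0 ]; then shift; fi    # drop its value, if present
        ;;
      *)
        root="$1"                          # unrecognized: becomes the root URL file
        shift
        ;;
    esac
  done
  echo "rootUrlFile = $root"
}

guess_root_url_file urls -dir crawled -depth 2 -topN 1000
guess_root_url_file urls -dir crawled -depth 2
```

Under this model the first call prints `rootUrlFile = 1000`, matching the log, and the second prints `rootUrlFile = urls`. If the model holds, dropping `-topN` would be the way to get the crawl to read the intended file.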
On 2/23/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Hi Florian,
Where is your urls file located? If you created urls in the conf folder,
then you have to call:
bin/nutch crawl conf/urls -dir crawlresults/ -depth 2 -topN 1000
Good luck
Detlev
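Detlev's question can be answered from the shell before a crawl is started. A minimal sketch, assuming the crawl command is run from the Nutch install directory and that a relative path such as `urls` or `conf/urls` is resolved against the current directory. `find_seed_file` is a hypothetical helper, not part of Nutch.

```shell
# Hypothetical helper: report the first candidate seed-file path that exists,
# relative to the current working directory.
find_seed_file() {
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      echo "found: $candidate"
      return 0
    fi
  done
  echo "not found in $(pwd): $*"
  return 1
}
```

Calling `find_seed_file urls conf/urls` reports the first path that exists; that path is the one to pass as the first argument to `bin/nutch crawl`.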
I am running Cygwin (I know), with JDK 1.5.0 and Tomcat 4.1.
From cygwin I run:
bin/nutch crawl urls -dir crawlresults/ -depth 2 - topN 1000
results:
run java in C:/program files/java/jdk1.5.0/
060223 123010 parsing file:/c:/cygwin/home/falieson/nutch/conf/nutch-default.xml
060223 123010 parsing file:/c:/cygwin/home/falieson/nutch/conf/crawl-tool.xml
060223 123010 parsing file:/c:/cygwin/home/falieson/nutch/conf/nutch-site.xml
060223 123010 No FS indicated, using default:local
060223 123010 rootUrlFile = 10000
060223 123010 thread = 10
060223 123010 depth = 2
060223 123011 Created webdb at LocalFS,C:\cygwin\home\falieson\crawlresults\db
Exception in thread "main" java.io.FileNotFoundException: 10000 <the system cannot find the file specified>
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at java.io.FileReader.<init>(FileReader.java:55)
at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
~~~
bin/nutch crawl urls -dir crawled -depth 3
results:
run java in C:/program files/java/jdk1.5.0/
060223 123832 parsing file:/C:/cygwin/home/falieson/nutch/conf/nutch-default.xml
060223 123832 parsing file:/C:/cygwin/home/falieson/nutch/conf/crawl-tool.xml
060223 123832 parsing file:/C:/cygwin/home/falieson/nutch/conf/nutch-site.xml
060223 123832 No FS indicated, using default:local
060223 123832 crawl started in: crawled
060223 123832 rootUrlFile = urls
060223 123832 threads = 10
060223 123832 depth = 3
060223 123832 Created webdb at LocalFS,C:\cygwin\home\falieson\crawled\db
Exception in thread "main" java.io.FileNotFoundException: urls (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at java.io.FileReader.<init>(FileReader.java:55)
at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
~~
TIA
--
Best Regards,
Florian Mettetal
