Are you sure your urls file doesn't have an extension? I had a
similar problem and found that my urls file had an .rtf extension, which
I didn't notice until I viewed the file via the command line.
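
Something like the following should show whether that is the case. It is only a rough sketch: it assumes the seed file sits in the current directory and is named urls, as in the command quoted below, and is_rtf is a helper name made up for this example. It relies on RTF documents starting with the characters {\rtf.

```shell
# is_rtf is a hypothetical helper, not part of Nutch.
# It succeeds when the first five bytes of the file are the RTF header {\rtf
is_rtf() {
    [ "$(head -c 5 "$1")" = '{\rtf' ]
}

# List every name starting with "urls" so a hidden extension becomes visible.
ls -l urls* 2>/dev/null || echo "nothing named urls* in this directory"

# Inspect the content of each match.
for f in urls*; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    if is_rtf "$f"; then
        echo "$f: looks like RTF, re-save it as plain text"
    else
        echo "$f: does not start with an RTF header"
    fi
done
```

If a file turns up as urls.rtf, or the check reports RTF, saving it again as plain text under the name urls would be the thing to try.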

On 8/28/05, Nils Hoeller <[EMAIL PROTECTED]> wrote:
> Hi
> 
> my Problem is:
> 
> I've done everything as described in the Getting Started Tutorial at
> nutch.org.
> 
> When I now run the command:
> bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
> 
> I get this Exception in the log file:
> run java in /usr/java/jdk1.5.0_04
> 050828 104004 parsing file:/home/nils/Studienarbeit/nutch-nightly/conf/nutch-default.xml
> 050828 104004 parsing file:/home/nils/Studienarbeit/nutch-nightly/conf/crawl-tool.xml
> 050828 104004 parsing file:/home/nils/Studienarbeit/nutch-nightly/conf/nutch-site.xml
> 050828 104004 No FS indicated, using default:local
> 050828 104004 crawl started in: crawl.test
> 050828 104004 rootUrlFile = urls
> 050828 104004 threads = 10
> 050828 104004 depth = 3
> Exception in thread "main" java.lang.RuntimeException: java.net.UnknownHostException: linux: linux
>         at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:67)
>         at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:94)
>         at org.apache.nutch.db.WebDBWriter.<init>(WebDBWriter.java:1507)
>         at org.apache.nutch.db.WebDBWriter.createWebDB(WebDBWriter.java:1438)
>         at org.apache.nutch.tools.WebDBAdminTool.main(WebDBAdminTool.java:172)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:133)
> Caused by: java.net.UnknownHostException: linux: linux
>         at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
>         at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:64)
>         ... 5 more
> 
> 
> My urls file looks like this:
> 
> http://www.nutch.org/
> 
> I've also tried:
> 
> http://www.ifis.uni-luebeck.de/ which I'd like to get nutched
> 
> The urlfilter conf also contains:
> 
> +^http://([a-z0-9]*\.)*ifis.uni-luebeck.de/
> +^http://([a-z0-9]*\.)*nutch.org/
> 
> 
> Can anyone give me a hint?
> Where is the error?
> 
> Thanks Nils
> 
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
