I am trying to do a crawl with trunk on one of my sites, and it isn't working.

I made a file named urls that contains just the site:

    http://shopthar.com/

In my conf/crawl-urlfilter.txt I have:

    +^http://shopthar.com/

I then run:

    bin/nutch crawl urls -dir crawl.test -depth 100 -threads 20

It kicks in, and I get repeating chunks like this:

    051019 010450 Updating /home/nutch/nutch/trunk/crawl.test/db
    051019 010450 Updating for /home/nutch/nutch/trunk/crawl.test/segments/20051019010449
    051019 010450 Finishing update
    051019 010450 Update finished
    051019 010450 FetchListTool started
    051019 010450 Overall processing: Sorted 0 entries in 0.0 seconds.
    051019 010450 Overall processing: Sorted NaN entries/second
    051019 010450 FetchListTool completed
    051019 010450 logging at INFO

This goes on for ages, but I only ever see two Nutch hits in my access log: one for robots.txt and one for my front page. Nothing else. The "crawl" finishes, and when I then do a search I only get hits for the front page.

When I run the search via lynx, I briefly see:

    Bad partial reference! Stripping lead dots.

I can't imagine that is really the problem, but nearly all of my links are relative. Nutch has to be able to follow relative links, right?
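To convince myself that relative links shouldn't be a problem in principle, I threw together a quick sanity check with plain java.net.URL resolution. This is just my own throwaway test, not Nutch code, and the shopthar.com paths in it are made up:

    import java.net.URL;

    // Throwaway check: does a relative href on one of my pages
    // resolve to an absolute URL that still matches my filter
    // line +^http://shopthar.com/ ?
    public class RelativeLinkCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical base page and a typical relative href.
            URL base = new URL("http://shopthar.com/catalog/index.html");
            URL resolved = new URL(base, "../about.html");

            System.out.println(resolved);
            // prints: http://shopthar.com/about.html

            System.out.println(
                resolved.toString().startsWith("http://shopthar.com/"));
            // prints: true
        }
    }

Both lines print what I expect, so resolution itself looks fine to me, and the resolved links should still land inside my filter pattern.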
Ideas?

Thanks,
Earl
