The "default:local" message just means that Nutch will use the local disk/file system to store the database rather than a special distributed file system (called NDFS in 0.7
and Hadoop in 0.8).

If it's telling you that there are no pages in your db, the most common reason is that the conf/regex-urlfilter.txt file is filtering out all of your web pages. Check
your crawl.log; it should tell you whether any pages were actually fetched.
Look for errors in the crawl.log to make sure that the crawl actually did something and didn't just die with an exception. Then check your regex-urlfilter.txt file and make sure that the entries that include your URLs appear before any that exclude a whole bunch
of URLs. Order is important in regex-urlfilter.txt.
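
To illustrate why order matters, here is a rough Python sketch of a first-match-wins filter. It is not Nutch's code, only a model of the behaviour described above, and it assumes that each rule line is a "+" (include) or "-" (exclude) followed by a regular expression, that the first matching rule decides, and that a URL matching no rule is dropped. The host intranet.example.com and the names parse_rules and accepts are made up for the example.

```python
import re


def parse_rules(text):
    """Parse rule lines of the assumed form '+regex' or '-regex'."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first rule that matches the URL is an include rule."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # assumption: a URL that matches no rule is rejected


# Hypothetical rule sets; only the order of the two lines differs.
EXCLUDE_FIRST = r"""
-.
+^http://intranet\.example\.com/
"""

INCLUDE_FIRST = r"""
+^http://intranet\.example\.com/
-.
"""

url = "http://intranet.example.com/index.html"
print(accepts(parse_rules(EXCLUDE_FIRST), url))  # False: "-." matches first
print(accepts(parse_rules(INCLUDE_FIRST), url))  # True: the include rule matches first
```

With the catch-all exclude on top, every URL is rejected before the include line is ever consulted, which would leave the db with 0 pages.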

Howie

I am having a similar problem. I execute: bin/nutch crawl urls.txt -dir ct -depth 3 >& crawl.log This creates the ct directory and all the files, and the crawl.log says "No FS indicated, using default:local". I don't know what that means.

When I execute: bin/nutch readdb ct/db -stats, it says: No FS indicated, using default:local, Number of pages: 0, Number of links: 0

I have my tomcat running. It seems that the crawl ran but it did not find any webpages to index.

  p. Cone


Rafael Cardoso <[EMAIL PROTECTED]> wrote:
  Hi,
I'm crawling my intranet. I saw that it fetched a lot of pages, but when I
search for anything, it doesn't return any results.

"Resultados *0-0* (de um total de 0 documentos):"
"Results 0-0 (from a total of 0 documents):"

Is there some step between the crawling and the searching? (The app is already
deployed in Tomcat.)



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
