The default:local message just means that Nutch will use the local file
system to store the database rather than a special distributed file system
(called NDFS in 0.7 and Hadoop in 0.8).
If it's telling you that there are no pages in your db, the most common
reason is that the conf/regex-urlfilter.txt file is filtering out all of
your web pages. Check your crawl.log; it should tell you whether any pages
were actually fetched. Look for errors in the crawl.log to make sure that
the crawl actually did something and didn't just abort with an exception.
Then check your regex-urlfilter.txt file and make sure that the entries
that include your URLs appear before any entries that exclude a broad set
of URLs. Order is important in regex-urlfilter.txt: the first rule that
matches a URL decides whether it is kept.
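
To illustrate why the order matters, here is a minimal Python sketch of
first-match-wins filtering. It is a simplified model, not Nutch's actual
code: the function names and the intranet.example.com host are made up
for the example, and it assumes only that each rule line is "+" or "-"
followed by a regex, and that a URL matching no rule is rejected.

```python
import re

def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line[0] not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        # True means include, False means exclude.
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides the outcome."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # assumption: no matching rule means the URL is dropped

# Hypothetical rule sets: same two rules, different order.
BAD_ORDER = r"""
# exclude everything, then try to include the intranet (never reached)
-.
+^http://intranet\.example\.com/
"""

GOOD_ORDER = r"""
# include the intranet first, then exclude everything else
+^http://intranet\.example\.com/
-.
"""

url = "http://intranet.example.com/index.html"
print(accepts(parse_rules(BAD_ORDER), url))   # False
print(accepts(parse_rules(GOOD_ORDER), url))  # True
```

With the catch-all exclude listed first, every URL is rejected before the
include rule is ever consulted, which would leave the db with 0 pages.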
Howie
I am having a similar problem. I execute: bin/nutch crawl urls.txt -dir ct
-depth 3 >& crawl.log
This creates the ct directory and all the files, and the crawl.log says
"No FS indicated, using default:local". I don't know what that means.
When I execute: bin/nutch readdb ct/db -stats, it says "No FS indicated,
using default:local, Number of pages: 0 Number of links: 0".
I have my Tomcat running. It seems that the crawl ran but did not
find any web pages to index.
p. Cone
Rafael Cardoso <[EMAIL PROTECTED]> wrote:
Hi,
I'm crawling my intranet. I saw that it fetched a lot of pages, but when I
search for anything, it doesn't return any results.
"Resultados *0-0* (de um total de 0 documentos):"
"Results 0-0 (from a total of 0 documents):"
Is there some step between crawling and searching? (The app is already
deployed in Tomcat.)
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general