I have a web server with about 4 GB of static HTML files on it. Is there a way to get Nutch to crawl those files directly from the filesystem, without going through the web server? I could have it crawl through the web server, but reading the files directly from disk should make the crawl much faster. Is that possible?
Thanks
