> I'm not sure what you mean. I set environment variables in my .bashrc,
> then simply use 'bin/start-all.sh' and 'bin/nutch crawl'.
Well, I'm not sure if you looked at my tutorial, which is now on the wiki:
http://wiki.apache.org/nutch/SimpleMapReduceTutorial

But yes, that is much simpler than what I am doing. It looks like a little example has been added to the FAQ, which wasn't there the last time I looked.

> NutchBean now looks for things in the subdirectory of the connected
> directory named 'crawl'. Is that an improvement or is it just confusing?

I think magic is okay as long as it is documented and it works.

> I think it would be better to have the junit tests start jetty then
> crawl localhost. I'd love to see some end-to-end unit tests like that.

I think I will start to work on this. Maybe start with a page that contains just a few phrases, or maybe just the word "nutch", then make sure it can be queried out at the end? The test could also check status throughout the process to make sure everything looks good. If nothing else, I would likely understand the process pretty well by the time I finished writing it.

I think this would also make it easy to test things like recursive linking, parsing PDFs and other file formats, observing robots.txt, and any crawling bugs that are encountered and then fixed. Suggestions for where to put such test content in the tree?

> You should be able to add them to the wiki yourself.

Thanks, I added them.

Earl
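P.S. A rough sketch of the skeleton I have in mind for the end-to-end test. This is only an outline, not working Nutch code: it uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for Jetty just to show the shape (serve a known page on localhost, then verify the phrase comes back out); the class name and the "nutch" marker phrase are my own placeholders, and the middle step would of course be replaced by an actual crawl-and-query against the served URL:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.URL;

public class CrawlSmokeTestSketch {
    public static void main(String[] args) throws Exception {
        // Serve a single page containing a known marker phrase on an ephemeral port.
        String body = "<html><body>nutch</body></html>";
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] bytes = body.getBytes("UTF-8");
            exchange.sendResponseHeaders(200, bytes.length);
            exchange.getResponseBody().write(bytes);
            exchange.close();
        });
        server.start();
        int port = server.getAddress().getPort();

        // Placeholder for the real test: crawl http://127.0.0.1:<port>/ and
        // query the index for "nutch". Here we just fetch the page directly
        // to show the start-server / exercise / assert shape.
        URL url = new URL("http://127.0.0.1:" + port + "/");
        StringBuilder fetched = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                fetched.append(line);
            }
        }
        server.stop(0);

        // Assert the marker phrase survived the round trip.
        if (!fetched.toString().contains("nutch")) {
            throw new AssertionError("marker phrase not found in fetched page");
        }
        System.out.println("OK");
    }
}
```

The same start/exercise/assert structure should carry over once the middle step is a real crawl of localhost, and extra pages could be added to cover recursive links, robots.txt, and other file formats.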
