Earl Cahill wrote:
> 1. Sounds like some of you have some glue programs
> that help run the whole process. Are these going to
> end up in subversion sometime? I am guessing there is
> much duplicated effort.

I'm not sure what you mean. I set environment variables in my .bashrc,
then simply use 'bin/start-all.sh' and 'bin/nutch crawl'.

> 2. Not sure how to test that my index actually
> worked. Starting catalina in my index directory
> didn't work this time.

NutchBean now looks for things in a subdirectory named 'crawl' of the
connected (current working) directory. Is that an improvement, or is it
just confusing?

> 3. What do you all think of setting up some test
> directories to crawl, in say
> http://lucene.apache.org/nutch/test/
> Thinking it would be kind of cool to have junit run
> through a whole process on external pages.

I think it would be better to have the JUnit tests start Jetty and then
crawl localhost. I'd love to see some end-to-end unit tests like that.
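
To make the idea concrete, here is a rough sketch of the shape such a test
could take. It is only an illustration under several assumptions: it uses the
JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in for Jetty, a toy
link-following loop as a stand-in for the real Nutch crawl, and a plain main()
rather than JUnit. The class name, the page set, and the helper methods are all
hypothetical and are not part of Nutch.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: serve a tiny fixed site on localhost, then crawl it.
class LocalCrawlSketch {
    // Test fixture: three pages with known links (made up for this example).
    static final Map<String, String> PAGES = Map.of(
        "/", "<a href=\"/a.html\">a</a> <a href=\"/b.html\">b</a>",
        "/a.html", "<a href=\"/b.html\">b</a>",
        "/b.html", "leaf page");

    // Start an in-process HTTP server on a free port (stand-in for Jetty).
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/", exchange -> {
            String body = PAGES.get(exchange.getRequestURI().getPath());
            byte[] bytes = (body == null ? "not found" : body).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(body == null ? 404 : 200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    // Toy breadth-first crawl from "/" (stand-in for the real crawl).
    static Set<String> crawl(String base) throws IOException {
        Pattern link = Pattern.compile("href=\"([^\"]+)\"");
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add("/");
        while (!queue.isEmpty()) {
            String path = queue.poll();
            if (!seen.add(path)) {
                continue; // already fetched
            }
            try (InputStream in = URI.create(base + path).toURL().openStream()) {
                String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                Matcher m = link.matcher(html);
                while (m.find()) {
                    queue.add(m.group(1));
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String base = "http://localhost:" + server.getAddress().getPort();
            Set<String> pages = crawl(base);
            // End-to-end check: every page of the fixture site was reached.
            if (!pages.equals(Set.of("/", "/a.html", "/b.html"))) {
                throw new AssertionError("unexpected crawl result: " + pages);
            }
        } finally {
            server.stop(0);
        }
    }
}
```

Because the server runs in-process on a port chosen at startup, a test built
this way would not depend on any external site being up, which is the main
reason to prefer localhost over pages hosted elsewhere.
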

> 4. Any way that
> http://spack.net/nutch/SimpleMapReduceTutorial.html
> http://spack.net/nutch/GettingNutchRunningOnUbuntu.html
> can get on the wiki? I am using apache-ish style and
> would change to whatever, but as fun as these are to
> write, I would like to see them used.

You should be able to add them to the wiki yourself. Just fill out:
http://wiki.apache.org/nutch/UserPreferences
Thanks,
Doug