Tempted to do each question as a separate email, but
here you go.
1. Does Nutch use pure Lucene for its indexing? Is the
Nutch index just Lucene, potentially plus NDFS? If I am
going to run a search web service, I am wondering what
advantages Nutch would offer over plain Lucene.
2. It turns out I am going to write a web service for
search. I have played with the Nutch search example,
but if I want to handle rather arbitrary key/value
pairs and have the web service return XML, I am
guessing I will have to write my own. Is that right?
Is there an easy way to get results in XML format?
3. In another project, I want to use NDFS to store
two or more distinct copies of a file, but I really
don't want anything else to do with Nutch on the
project. Is that possible? Is there a clean break? I
want to make a list of servers, then have an API call
that takes a file and stores 2+ copies across my
servers, and an API call that reads a file, with
appropriate failover.
4. I am guessing I need to write a plugin, but I want
to inject some code into the Nutch crawl process that
does some analysis and actually does the index
insertion itself. Are there any good docs on how to do
such a thing?
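
For concreteness, the interface described in question 3 could look
roughly like the sketch below. It is Python rather than Java, it uses
local directories as stand-ins for servers, and every name in it
(ReplicatedStore, put, get) is made up for illustration; none of it is
NDFS API.

```python
import os
import tempfile


class ReplicatedStore:
    """Toy replicated file store; each directory stands in for a server."""

    def __init__(self, servers, copies=2):
        if copies > len(servers):
            raise ValueError("need at least as many servers as copies")
        self.servers = list(servers)
        self.copies = copies

    def put(self, name, data):
        """Write data to servers in list order until `copies` succeed."""
        written = []
        for server in self.servers:
            try:
                os.makedirs(server, exist_ok=True)
                with open(os.path.join(server, name), "wb") as f:
                    f.write(data)
                written.append(server)
            except OSError:
                continue  # this server is unavailable; try the next one
            if len(written) == self.copies:
                return written
        raise IOError(f"only {len(written)} of {self.copies} copies written")

    def get(self, name):
        """Return the file's bytes from the first server that has it."""
        for server in self.servers:
            try:
                with open(os.path.join(server, name), "rb") as f:
                    return f.read()
            except OSError:
                continue  # fail over to the next server
        raise FileNotFoundError(name)


# Usage: three stand-in servers, two copies per file.
root = tempfile.mkdtemp()
store = ReplicatedStore([os.path.join(root, s) for s in ("a", "b", "c")])
placed = store.put("report.txt", b"hello")
os.remove(os.path.join(placed[0], "report.txt"))  # simulate losing one copy
data = store.get("report.txt")  # served from the surviving copy
```

A real version would talk to remote machines and would have to deal
with partial writes and re-replication after a failure, which is the
part I would hope NDFS already handles.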
Thanks,
Earl