Hi all,

I'm involved in a project to develop a search tool with an engineering
bent for a collaborative client of my university. (FYI, the collaborative
"team" is very small: me and an engineer from the company.) We're going to
use UIMA to analyse documents and Solr/Lucene to store the results for later
searching. Rather than reinvent the wheel, I'd like to use some existing
crawler implementation(s) to feed my CollectionReaders (also, it's the
analysis that interests me, not so much the development work). I think I
may need three different crawlers (or one very flexible one) to cover the
three areas where documents will be found:
- intranet-based web
- network attached storage, home drives, etc.
- emails (in particular their attachments) stored on an Exchange server
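For the second case (NAS and home drives), a plain recursive directory walk may already be enough to feed a CollectionReader, without pulling in a full crawler framework. A minimal sketch using only the JDK (the class name and the extension list are mine, purely illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Minimal file-system "crawler": walks a directory tree and collects
// the paths of candidate documents that a UIMA CollectionReader could
// then read and pass on for analysis.
public class FileSystemCrawler {

    // Extensions treated as indexable documents (illustrative only).
    private static final List<String> EXTENSIONS =
            List.of(".txt", ".pdf", ".doc", ".docx");

    public static List<Path> crawl(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        try (Stream<Path> stream = Files.walk(root)) {
            stream.filter(Files::isRegularFile)
                  .filter(p -> {
                      String name = p.getFileName().toString().toLowerCase();
                      return EXTENSIONS.stream().anyMatch(name::endsWith);
                  })
                  .forEach(found::add);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : crawl(root)) {
            System.out.println(p);
        }
    }
}
```

Something like this could hand each path to the CollectionReader's getNext(), leaving all the interesting work in the analysis engines.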

Preferably I'd like to minimise the amount of work required to incorporate
them into the search tool. I've looked at things like Nutch (
http://lucene.apache.org/nutch/) but it appears to be too heavily
web-oriented; I welcome being corrected on that point though :-)

What have others used in similar situations?

What might others recommend even if they haven't used them yet?

Thanks,
James.
