Hi there,

I just started playing with Nutch and have not yet decided whether it
would be appropriate for us, hence my questions. I already have experience
with Lucene in my own projects, so I think I could tweak it a bit. I
browsed the documentation I could find, the Wiki and the mail archives, and
then thought I would check with people already using it to see whether my
impression is correct. So, here we go:

.- I'm planning on using it on a single node to crawl and search our
different web servers, to provide a search facility inside our own pages,
not for the whole web. I read that the 0.7.x branch might be more
appropriate, as 0.8.x seems to be more focused on multi-node setups and
that might cause performance problems on a single node. Is that still
true? Should I stick to the 0.7.x branch?

.- I would like to be able to crawl, index and search the documents using
specific analyzers, because our documents are in Latin-1. I already use an
appropriate analyzer in my own programs, but I'm not sure whether Nutch
lets me change it easily through some property, or whether I have to get
into the code and do it myself. I have no problem with that, but the less I
deviate from a standard Nutch installation, the better, I guess. The same
goes for the indexer and the search options: I would like to use something
other than a Boolean query. Can those things be tweaked through properties?

.- Lastly, the search interface is not exactly what I want, and I'm also not
too keen on plain JSPs with the scripting inside. I thought I might as well
replicate the functionality using an XML-based framework we already use, so
that the UI is kept separate from the rest. Are there any plans to develop
the search UI further, or should I simply look at the JSPs and replicate,
more or less, their behaviour? In that case, any special tips for that?

.- Does anyone using Nutch in a similar scenario have any tips or advice?

Thanks for any insight you can provide. I do have plenty of experience with
Java on the server side and with Open Source, but I'd rather not duplicate
work if I can help it, and I'd like to stick as close to the "standard"
Nutch as possible.

Cheers!
D.
