Hi there, I just started playing with Nutch and have not yet decided whether it is appropriate for my needs, hence my questions. I already have experience with Lucene in my own projects, so I think I could tweak it a bit. I browsed the documentation I could find, the Wiki and the mail archives, and then thought I would check with the people already using it to see if my impression is correct. So, here we go:
.- I'm planning to use it on a single node to crawl/search our different web servers, to provide a search facility inside our own pages, not for the whole web. I read that the 0.7.x branch might be more appropriate, as 0.8.x seemed to be more focused on multi-node sites and that might cause performance problems on a single machine. Is that still true? Should I stick to the 0.7.x branch?

.- I would like to be able to crawl/index/search the documents using specific analyzers, because our documents are in Latin-1. I already use an appropriate analyzer in my own programs, but I'm not sure whether Nutch allows changing it easily, through some property, or whether I have to get into the code and do it myself. I have no problem with that, but the less I deviate from a standard Nutch installation, the better, I guess. The same goes for the Indexer and the search options: I would like to use something other than a Boolean query. Can those things be tweaked through properties?

.- Lastly, the search interface is not exactly what I want, and I'm also not too keen on plain JSPs with the scripting inside. I thought I might as well replicate the functionality using a framework we use, based on XML, so that the UI and the rest are kept separate. Are there any plans to develop the search UI further, or should I simply look at the JSPs and replicate, more or less, their behaviour? In that case, any special tips for that?

.- Does anyone using Nutch in a similar scenario have any special tips/advice?

Thanks for any insight you can provide. I do have plenty of experience with Java on the server side and with Open Source, but I'd rather not duplicate work if I can help it, and I'd like to stick as close to the "standard" Nutch as possible.

Cheers!
D.
