I have some technical questions about Nutch.
1. I would like to implement German analysis. The package org.apache.lucene.analysis.de
<http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/de/package-summary.html>
is available. Maybe it only takes a simple change of the package that is used. Could somebody tell us something about this?
2. What about content reduction, for example showing only two hits per domain? We indexed 2,000,000 pages once for testing. When searching for some very common words, many pages from the same server come up. So far we have not refetched them. Would content reduction be useful, or is it unnecessary once enough pages have been refetched and indexed?
3. Is it possible to change the URL filter without deleting all data? Would URLs that no longer pass the URL filters be dropped when a segment is fetched again? How do old URLs leave the database?
4. What about changing the list of common words if an index has already been built?
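
To make clearer what I mean by content reduction in question 2, here is a minimal sketch in Java. It is not Nutch code: the class name HostLimiter and the method limitPerHost are made up for illustration, and it assumes the hits arrive as a ranked list of URL strings. It groups by host name, which is simpler than grouping by registered domain, and keeps at most a fixed number of hits per host.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Lucene.
public class HostLimiter {

    /**
     * Returns the URLs from rankedUrls in their original order,
     * keeping at most maxPerHost entries for each host.
     */
    public static List<String> limitPerHost(List<String> rankedUrls, int maxPerHost) {
        Map<String, Integer> seenPerHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : rankedUrls) {
            String host = URI.create(url).getHost();
            // Assumption: if no host can be parsed, treat the whole string as its own group.
            String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
            int count = seenPerHost.merge(key, 1, Integer::sum);
            if (count <= maxPerHost) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example hits with made-up URLs, already sorted by score.
        List<String> hits = Arrays.asList(
                "http://a.example/1",
                "http://a.example/2",
                "http://a.example/3",
                "http://b.example/1");
        System.out.println(limitPerHost(hits, 2));
    }
}
```

With a limit of two, the third hit from the same host is dropped and lower-ranked hits from other hosts move up. Whether something like this belongs in the search front end or in the index itself is part of my question.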
Many thanks for your reply.
Matthias Jaekle
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers