Hi,

I have some technical questions about Nutch.

1.
I would like to implement German language analysis. There is a package,
org.apache.lucene.analysis.de
<http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/de/package-summary.html>,
available for this.
Maybe it is only a simple matter of changing the package that is used. Could
somebody tell us something about this?

2.
What about content reduction, for example showing only two hits per
domain?
We once indexed 2,000,000 pages for testing. When searching for some very
common words, many pages from the same server come up. So far we have
not refetched them.
Would content reduction be useful? Or is it unnecessary if enough
pages are refetched and indexed?

3. Is it possible to change the URL filter without deleting all data?
Would URLs that no longer pass the URL filters be kicked
out when a segment is fetched again?
How do old URLs leave the database?

4. What about changing the list of common words once an index is already
built?
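To make question 2 concrete, here is a minimal sketch of the kind of per-domain limiting meant: walk the hits in ranked order and keep at most N per host. It assumes hits arrive as ranked URL strings and groups by exact host name; the class and method names (HitLimiter, limitPerHost) are hypothetical and this is not Nutch's own code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: keep at most maxPerHost hits per host,
// preserving the original ranking order of the remaining hits.
public class HitLimiter {
    public static List<String> limitPerHost(List<String> rankedUrls, int maxPerHost) {
        Map<String, Integer> perHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : rankedUrls) {
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // skip URLs without a host component
            }
            host = host.toLowerCase(Locale.ROOT); // host names are case-insensitive
            int count = perHost.getOrDefault(host, 0);
            if (count < maxPerHost) {
                kept.add(url);
                perHost.put(host, count + 1);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = List.of(
                "http://a.example/1", "http://a.example/2", "http://a.example/3",
                "http://b.example/1", "http://a.example/4");
        System.out.println(limitPerHost(hits, 2));
    }
}
```

Because filtering happens after ranking, a searcher built this way would probably need to fetch more raw hits than it displays, so that a page still fills up after repeats from one server are dropped.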
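Question 3 asks whether URLs that no longer pass a changed filter get dropped. A minimal sketch of re-applying a filter to already-known URLs might look like the following. It assumes rules in the style of a regex filter file: lines starting with "+" (include) or "-" (exclude) followed by a regular expression, where the first matching rule decides and a URL matching no rule is rejected. The class and method names are hypothetical, and this does not say what Nutch itself does with such URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: re-apply +/- regex rules to a list of known URLs.
public class UrlFilterSketch {
    // One rule: include (+) or exclude (-) URLs matching the pattern.
    record Rule(boolean include, Pattern pattern) {}

    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                continue; // assumption: ignore malformed lines
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // First matching rule decides; no match means the URL is rejected.
    static boolean accept(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) {
                return r.include();
            }
        }
        return false;
    }

    // Returns only the URLs that still pass the (possibly changed) rules.
    static List<String> keepAccepted(List<Rule> rules, List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (accept(rules, url)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(List.of("-\\.gif$", "+^http://www\\.example\\.de/", "-."));
        List<String> known = List.of(
                "http://www.example.de/a.html",
                "http://www.example.de/b.gif",
                "http://other.example.com/");
        System.out.println(keepAccepted(rules, known));
    }
}
```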

Many thanks for your reply

Matthias Jaekle




_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers