Hello,

We use Nutch as the search engine for our intranet solution. It works very well. Thanks to all of you who have put work into it.

We have one question: is it possible to index only one part of an HTML page (or to specify that one part of a page is NOT put in the index)?

In the past we used Alkaline (alkaline.vestris.com), but since it is no longer actively developed and has problems with UTF-8, we searched for an alternative and found Nutch. In Alkaline, you can put a tag like <alkaline skip>text</alkaline> in the HTML code, and the text inside these tags is not put in the index. (Links inside them are not followed either.)

The reason for using this is the following: suppose you have a page layout with the navigation on the left, the content in the middle, and an overview of the current news on the right (www.ertech.ch, for example). With normal indexing, all the text that appears in the news area is indexed as part of every page. This is obviously not the intended result: when searching for a string found in the news area, every page of the website is displayed in the results.

Probably the solution is some kind of custom filter for HTML content...?

André

aarboard ag - internet - networks - databases
Egliweg 10 - CH-2560 Nidau - Switzerland
Phone +41 32 332 97 14
Fax +41 32 332 97 15
Mail: [EMAIL PROTECTED]
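
P.S. To make the idea concrete, here is a rough sketch of the kind of filter we have in mind: it removes the marked regions from the raw HTML before the text would reach the indexer. The class name SkipRegionFilter and the method strip are made up for illustration. This is not an existing Nutch API, and where such a step would be hooked into Nutch (a custom parse plugin, perhaps) is an open question for us. The marker syntax is simply the Alkaline one from above, and nested markers are not handled.

```java
import java.util.regex.Pattern;

public class SkipRegionFilter {
    // Hypothetical marker syntax borrowed from Alkaline; Nutch itself does not define it.
    // DOTALL lets a skipped region span several lines; ".*?" stops at the first closing tag.
    private static final Pattern SKIP = Pattern.compile(
            "<alkaline\\s+skip\\s*>.*?</alkaline\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the HTML with every marked region (text and links) removed. */
    public static String strip(String html) {
        return SKIP.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<div>content</div>"
                + "<alkaline skip><a href=\"/news\">news</a></alkaline>"
                + "<p>more</p>";
        // prints: <div>content</div><p>more</p>
        System.out.println(strip(page));
    }
}
```

Because the links inside a marked region are removed together with the text, a crawler that only sees the stripped HTML would not follow them either, which matches the Alkaline behaviour described above.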
