Since Nutch moved to Apache, it has been available under the Apache License, version 2.0 (you can't get more open source than that) :-)
Feel free to use the "Powered by Nutch" logo, but you don't have to. The developers are always happy to see working installations, though, so please at least list yours in the Nutch wiki under the known installations.
HTH
Stefan
On 01.04.2005 at 04:39, <[EMAIL PROTECTED]> wrote:


Hi,

 I loved this code.
 Nutch is really the best open source search engine.

 I would like to use it, but I don't understand how the license works.
 What is the difference between using open source software and using
paid software?

Do I have to put the Nutch GIF logo on my search pages, saying that they
are powered by Nutch?
May I omit the logo?


 Thanks!
----- Original Message -----
From: "Sami Siren" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, March 31, 2005 3:32 PM
Subject: Re: Index only part of page content


One way is to implement a plugin which extracts only the text you really need to index. See parse-html to get the idea. Limiting the link extraction could also be implemented by the same plugin.
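
To make the idea concrete, here is a minimal, language-neutral sketch of that kind of extraction. It is written in Python with only the standard library and is not Nutch's plugin API; the class name `RegionTextExtractor`, the helper `extract_region`, and the convention that the indexable part of the page is wrapped in an element with `id="content"` are all assumptions made for illustration. A real plugin would apply the same logic inside its parse step.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RegionTextExtractor(HTMLParser):
    """Collects text and links only inside the element with a given id.

    Assumes reasonably well-formed HTML: an unclosed <p> or <li> inside
    the region would throw the depth counter off.
    """

    def __init__(self, region_id):
        super().__init__()
        self.region_id = region_id  # hypothetical marker, e.g. "content"
        self.depth = 0              # > 0 while inside the target region
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth > 0:
            self.depth += 1
            # Only links inside the region are kept for following.
            if tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])
        elif attrs.get("id") == self.region_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text.append(data.strip())


def extract_region(html, region_id="content"):
    """Return (text, links) found inside the element with the given id."""
    parser = RegionTextExtractor(region_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links
```

Calling `extract_region(page_html)` returns the text and the outgoing links of the marked region only, so navigation and news boxes outside it contribute neither index terms nor links.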

--
  Sami Siren


Hello,

we use nutch as the search engine for our intranet solution.
It works very well.

Thanks to all of you who have put work into it.

We have one question:

Is it possible to index only one part of an HTML page
(or to specify that one part of a page is NOT put in the index)?


In the past we used alkaline (alkaline.vestris.com), but since it's no longer actively developed and has problems with UTF-8 stuff, we searched and found nutch.


In Alkaline, you can put a tag like <alkaline skip>text</alkaline> in the HTML code; the text inside these tags is then not put in the index (and links inside them are not followed either).
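
As a rough illustration of that skip-tag behaviour, the following Python sketch (standard library only) drops both the text and the links found inside `<alkaline skip>...</alkaline>`. It is not Alkaline's or Nutch's actual implementation; the class name `SkipAwareExtractor` and the `skip_tag` parameter are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class SkipAwareExtractor(HTMLParser):
    """Collects text and links, ignoring everything inside a skip tag."""

    def __init__(self, skip_tag="alkaline"):
        super().__init__()
        self.skip_tag = skip_tag  # HTMLParser reports tag names in lowercase
        self.skip_depth = 0       # > 0 while inside a skipped region
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == self.skip_tag:
            self.skip_depth += 1
        elif tag == "a" and self.skip_depth == 0:
            # Links are only collected outside skipped regions.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == self.skip_tag and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text.append(data.strip())
```

After `feed()` has been called with a page, `text` holds the indexable fragments and `links` the URLs that may be followed; anything wrapped in the skip tag appears in neither.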


The reason for using this is the following: suppose you have a page layout with the navigation on the left, the content in the middle, and an overview of the current news on the right (www.ertech.ch, for example).

With normal indexing, all the text that appears in the news area
is indexed and found on every page. Obviously this is not
the intended result: when searching for a string found in the news
area, every page of the website is displayed in the results.


Probably the solution is some kind of custom filter for HTML content?



André

aarboard ag - internet - networks - databases
Egliweg 10 - CH-2560 Nidau - Switzerland
Phone +41 32 332 97 14 Fax +41 32 332 97 15
Mail: [EMAIL PROTECTED]







-----------information technology-------------------
company:     http://www.media-style.com
forum:           http://www.text-mining.org
blog:                http://www.find23.net


