Re: [Nutch-dev] Nutch based directory and crawler based on keyword

Stefan Groschupf Sun, 09 Jul 2006 22:54:35 -0700

Hi,

this question is difficult to answer, and there may be more experts on
the nutch user list than on the developer list.
In nutch 0.8 you can use the new scoring API to change the score of a
page, and with it how the page is scheduled for crawling.
Have a look at the OPIC scoring plugin and at the CrawlDatum metadata.
The metadata can be used to carry information such as custom category
weighting scores that take effect in the CrawlDatum score calculation.
Note that this is not scoring at search time; this is scoring for
crawl scheduling.
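
To make the idea of a weight carried in the metadata concrete, here is a
minimal sketch in plain Java. It does not use the Nutch scoring API: the
class name, the method and the "category.weight" key are all hypothetical,
and a real scoring plugin would read the value from the CrawlDatum metadata
instead of a plain Map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Nutch scoring API: it only illustrates how a
// weight carried in per-page metadata could influence a crawl-scheduling score.
class CategoryWeightScoring {

    // Assumed metadata key; Nutch does not define this name.
    static final String WEIGHT_KEY = "category.weight";

    // Multiply the base score by the weight found in the metadata.
    // A missing or unparsable weight leaves the score unchanged.
    static float adjustScore(float baseScore, Map<String, String> metaData) {
        String raw = metaData.get(WEIGHT_KEY);
        if (raw == null) {
            return baseScore;
        }
        try {
            return baseScore * Float.parseFloat(raw.trim());
        } catch (NumberFormatException e) {
            return baseScore;
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put(WEIGHT_KEY, "2.0");
        // Pages in a preferred category get a higher scheduling score.
        System.out.println(adjustScore(0.5f, meta));
    }
}
```

Multiplying is only one possible choice; an additive boost or a cap on the
result would fit the same shape.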
Besides that, maybe the simplest way is to write an index plugin that
tags each page (keywordMatch:true / false) according to whether a
keyword occurs in it or not.
During search you extend the search string behind the scenes with
something like: yourSearchString+" keywordMatch:true"
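
A rough sketch of the two halves of this approach, again in plain Java
without any Nutch classes. The class and method names are hypothetical, and
the case-insensitive substring test is just an assumption about what
"keyword occurs" should mean; a real index plugin would add the value as a
field of the indexed document.

```java
import java.util.Locale;

// Hypothetical sketch: the index-time tag and the search-time query extension.
class KeywordMatchSketch {

    // Field name taken from the example above.
    static final String FIELD = "keywordMatch";

    // Index time: decide which tag a page gets.
    // Assumption: a case-insensitive substring match counts as an occurrence.
    static String tagFor(String pageText, String keyword) {
        boolean found = pageText.toLowerCase(Locale.ROOT)
                .contains(keyword.toLowerCase(Locale.ROOT));
        return FIELD + ":" + found;
    }

    // Search time: restrict the user's query to tagged pages.
    static String extendQuery(String yourSearchString) {
        return yourSearchString + " " + FIELD + ":true";
    }

    public static void main(String[] args) {
        System.out.println(tagFor("Nutch is a web crawler", "crawler"));
        System.out.println(extendQuery("open source search"));
    }
}
```

The user never sees the extra clause; it is appended just before the query
is handed to the searcher.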


Stefan




On 08.07.2006 at 07:03, Syed Kamran Ali wrote:

> Hi,
>
> I have successfully configured nutch 0.7.2. I ran the crawler a few
> times and all is working fine. Now I want to know whether there is a
> way to run the crawler so that it indexes a website only if it finds
> a certain keyword there, and not otherwise. Also, once the index is
> created, is it possible to create a categorized directory, like the
> Yahoo and Google directories?
>
> -- 
> Thanks
> Kamran



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
