[Nutch-dev] Re: Per-page crawling policy

Andrzej Bialecki Fri, 06 Jan 2006 21:50:36 -0800

Jack Tang wrote:

Hi Andrzej


The idea brings vertical search into nutch, and it is definitely great :)
I think nutch should add an information retrieval layer to the whole
architecture and export some abstract interfaces, say
UrlBasedInformationRetrieve (you could implement your url grouping idea
here?), TextBasedInformationRetrieve, DomBasedInformationRetrieve.
Users can then implement these on their own in their vertical search.

We sort of reached an agreement to add Properties to CrawlDatum. Users
will be able to put arbitrary metadata in there, so that each page
record could be processed differently if need be.
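
To make the idea concrete, here is a rough sketch of how per-page metadata could drive a per-page policy. It is only an illustration under stated assumptions, not the actual Nutch API: PageRecord stands in for CrawlDatum, and the "fetchIntervalDays" key, the 30-day default and the fetchIntervalDays() helper are all hypothetical names invented for this example.

```java
import java.util.Properties;

// Illustrative sketch only. PageRecord is a hypothetical stand-in for a
// per-page crawl record; it is NOT the real Nutch CrawlDatum class.
public class PageRecordSketch {

    static class PageRecord {
        final String url;
        // Arbitrary per-page metadata, as proposed in the thread.
        final Properties metadata = new Properties();

        PageRecord(String url) {
            this.url = url;
        }
    }

    // Hypothetical metadata key and default value, chosen for illustration.
    static final String FETCH_INTERVAL_KEY = "fetchIntervalDays";
    static final int DEFAULT_FETCH_INTERVAL_DAYS = 30;

    // Returns the re-fetch interval for a page: the per-page override if the
    // metadata holds a valid integer, otherwise the global default.
    static int fetchIntervalDays(PageRecord page) {
        String value = page.metadata.getProperty(FETCH_INTERVAL_KEY);
        if (value == null) {
            return DEFAULT_FETCH_INTERVAL_DAYS;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            // Malformed metadata falls back to the default policy.
            return DEFAULT_FETCH_INTERVAL_DAYS;
        }
    }

    public static void main(String[] args) {
        PageRecord news = new PageRecord("http://example.com/news");
        news.metadata.setProperty(FETCH_INTERVAL_KEY, "1"); // per-page override

        PageRecord archive = new PageRecord("http://example.com/archive");
        // No metadata set, so the default policy applies.

        System.out.println(news.url + " -> " + fetchIntervalDays(news) + " day(s)");
        System.out.println(archive.url + " -> " + fetchIntervalDays(archive) + " day(s)");
    }
}
```

Keeping the metadata as free-form key/value pairs means a plugin can attach its own keys without changes to the record class, at the cost of having to validate values when reading them back.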


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




