Jack Tang wrote:
Hi Andrzej
The idea brings vertical search into nutch and definitely it is great:)
I think nutch should add information retrieving layer into the who
architecture, and export some abstract interface, say
UrlBasedInformationRetrieve(you can implement your url grouping idea
here?), TextBasedInformationRetrieve, DomBasedInformationRetrieve. The
user can implement these in their vertical search by their own.
We sort of reached an agreement to add Properties to CrawlDatum. Users
will be able to put arbitrary metadata in there, so that each page
record could be processed differently if needs be.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers