Ashish wrote:
> Hey everyone,
>
> What would be a good way to read inlinks (actually, the anchor text
> associated with inlinks) for each crawled page?
>
> Is there some way to make this information available at fetch time? Any
> pointers to sample code would be a huge help! I'm using Nutch 0.8.1.
> Thanks.

A fetched page contains only its own outlinks, so inlinks (and their 
anchor text) are not available until you build the inverted 
relationship with invertlinks. That is what the linkdb is for.
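
To make the idea of inverting links concrete, here is a minimal, self-contained sketch in plain Java. It is not Nutch code and does not use the Nutch API: the class InlinkInverter, the Link type, and the example URLs are all hypothetical names made up for illustration. It only shows the principle: walk every page's outlinks and regroup them by target URL, carrying the anchor text along.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical stand-in for the idea behind link
// inversion, not the Nutch implementation.
public class InlinkInverter {

    /** One end of a link: the URL at the other end plus the anchor text. */
    public static final class Link {
        public final String url;
        public final String anchor;

        public Link(String url, String anchor) {
            this.url = url;
            this.anchor = anchor;
        }
    }

    /**
     * Takes a map of source page URL to its outlinks and returns a map of
     * target URL to its inlinks. In the result, Link.url is the source page
     * and Link.anchor is the anchor text used on that page.
     */
    public static Map<String, List<Link>> invert(Map<String, List<Link>> outlinksByPage) {
        Map<String, List<Link>> inlinksByTarget = new LinkedHashMap<>();
        for (Map.Entry<String, List<Link>> page : outlinksByPage.entrySet()) {
            String fromUrl = page.getKey();
            for (Link out : page.getValue()) {
                // Regroup by target URL, remembering where the link came from.
                inlinksByTarget
                    .computeIfAbsent(out.url, k -> new ArrayList<>())
                    .add(new Link(fromUrl, out.anchor));
            }
        }
        return inlinksByTarget;
    }

    public static void main(String[] args) {
        Map<String, List<Link>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://a.example/",
            Arrays.asList(new Link("http://c.example/", "C home")));
        outlinks.put("http://b.example/",
            Arrays.asList(new Link("http://c.example/", "see C")));

        // List every inlink of c.example together with its anchor text.
        for (Link in : invert(outlinks).get("http://c.example/")) {
            System.out.println(in.url + " -> " + in.anchor);
        }
    }
}
```

The point of the sketch is that the inlinks of a page cannot be known from that page alone: they only appear once the outlinks of all the other pages have been collected and regrouped, which is why this is a separate step after parsing.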

Why do you need this during fetching? You could either modify the 
Fetcher to read the linkdb while it fetches, or modify the Generator to 
include information from the linkdb when it generates new segments, 
whichever suits your requirements better.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
