Cool Coder wrote:
Hello, I am just wondering how I can read the CrawlDb and get the content of each stored URL. I am not sure whether this is possible or not.
In Nutch 0.8 and later, page information and link information are stored separately, in the CrawlDb and the LinkDb. You first need to build the linkdb (see the bin/nutch invertlinks command); then you can use the LinkDbReader class to retrieve this information. From the command line this is bin/nutch readlinkdb.
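As a rough sketch of the steps above, assuming a Nutch 0.8+ crawl directory laid out as crawl/crawldb, crawl/linkdb, and crawl/segments (the paths are illustrative, not required names):

```shell
# Build the LinkDb by inverting the links found in all fetched segments
bin/nutch invertlinks crawl/linkdb -dir crawl/segments

# Look up the inlinks recorded for a single URL (uses LinkDbReader)
bin/nutch readlinkdb crawl/linkdb -url http://www.example.com/

# Or dump the whole LinkDb as plain text for inspection
bin/nutch readlinkdb crawl/linkdb -dump linkdb_dump

# The CrawlDb itself (per-URL fetch status, score, metadata) can be
# dumped the same way with the readdb command
bin/nutch readdb crawl/crawldb -dump crawldb_dump
```

Note that the CrawlDb holds per-URL status and metadata rather than the fetched page content; the content itself lives in the segments.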
-- Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web | Embedded Unix, System Integration
http://www.sigram.com | Contact: info at sigram dot com
