Cool Coder wrote:
Hello, I am just wondering how I can read the crawldb and get the content of
each stored URL. I am not sure whether this is possible or not.

In Nutch 0.8 and later, the page information and the link information are stored separately, in the CrawlDb and the LinkDb. You need to build the linkdb first (see the bin/nutch invertlinks command), and then you can use the LinkDbReader class to retrieve this information. From the command line this is bin/nutch readlinkdb.
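
Something along these lines should work; the crawl/linkdb and crawl/segments paths below are just placeholders, adjust them to your own crawl directory:

  # build the linkdb by inverting the links found in the crawl segments
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments

  # dump the whole linkdb as plain text into an output directory
  bin/nutch readlinkdb crawl/linkdb -dump linkdb_dump

  # or print the inlinks recorded for a single URL
  bin/nutch readlinkdb crawl/linkdb -url http://example.com/

The dump output can then be inspected with ordinary text tools.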


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
