Correct, content is not stored in the crawldb.

The crawldb holds each URL and its state (fetched or unfetched, last fetch time, etc.). The content of the page is held in the segments: the Content folder holds the actual page content, Parse data is the page metadata, and Parse text is the actual text of the page after parsing.
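
To make the split concrete, here is a rough sketch in Python. It is only a toy model of where each piece of information lives, not Nutch's actual classes or on-disk format. All names in it (CrawlDbEntry, SegmentRecord, record_fetch, page_text) are made up for illustration, and the content_md5 field is just the hash mentioned in the question.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CrawlDbEntry:
    """Hypothetical model of a crawldb entry: URL state only, no page content."""
    url: str
    status: str                     # "fetched" or "unfetched"
    last_fetch_time: Optional[int]  # epoch seconds, None if never fetched
    content_md5: Optional[str]      # hash of the content, not the content itself


@dataclass
class SegmentRecord:
    """Hypothetical model of what a segment keeps for one fetched page."""
    content: bytes                  # Content: the actual page as fetched
    parse_data: Dict[str, str]      # Parse data: the page metadata
    parse_text: str                 # Parse text: the text after parsing


# Two separate stores, keyed by URL (an assumption made for this sketch).
crawldb: Dict[str, CrawlDbEntry] = {}
segment: Dict[str, SegmentRecord] = {}


def record_fetch(url: str, raw: bytes, metadata: Dict[str, str],
                 text: str, fetch_time: int) -> None:
    """Store the page in the segment and only its state in the crawldb."""
    segment[url] = SegmentRecord(raw, metadata, text)
    crawldb[url] = CrawlDbEntry(url, "fetched", fetch_time,
                                hashlib.md5(raw).hexdigest())


def page_text(url: str) -> Optional[str]:
    """Check the crawldb for state, then read the text from the segment."""
    entry = crawldb.get(url)
    if entry is None or entry.status != "fetched":
        return None
    return segment[url].parse_text
```

In this model, asking the crawldb for a page tells you whether and when it was fetched, but the text itself always has to come from the segment.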

Dennis

Qi Wu wrote:
Hi All,

   I want to know what kind of information about a page is kept in the WebDB. It
seems the content of a page can't be retrieved from the WebDB, only the MD5 hash
of the page contents, and the page contents themselves can only be retrieved
from the Segments. Is this right?


Thanks,
Qi


