Re: Does WebDB keep the contents of the pages?

Dennis Kubes Tue, 24 Oct 2006 07:37:01 -0700

Correct, content is not stored in the crawldb.

The crawldb holds the URL and its state (fetched or unfetched, last fetch time, etc.). The content of the page is held in the segments. The content folder holds the actual page content, parse data holds the page metadata, and parse text holds the actual text of the page after parsing.
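To make the split concrete, here is a minimal sketch that models it with plain Python dictionaries. It assumes a greatly simplified picture: the classes `CrawlDbEntry` and `SegmentRecord`, the helpers `record_fetch` and `get_content`, and all field names are hypothetical and are not part of Nutch. The `content_md5` field reflects the MD5 hash mentioned in the question below. The point is only that the crawldb entry carries per-URL state and a hash, while the page bytes, metadata and parsed text live in the segment.

```python
import hashlib
from dataclasses import dataclass, field


# Hypothetical, simplified model of the crawldb/segment split.
# These are NOT Nutch classes; names are illustrative only.

@dataclass
class CrawlDbEntry:
    """Per-URL state kept in the crawldb: no page content."""
    url: str
    status: str          # e.g. "fetched" or "unfetched"
    fetch_time: int      # last fetch time (epoch milliseconds)
    content_md5: str     # hash of the content, not the content itself


@dataclass
class SegmentRecord:
    """Per-page data kept in a segment."""
    content: bytes                                  # raw page (content folder)
    parse_data: dict = field(default_factory=dict)  # page metadata (parse data)
    parse_text: str = ""                            # extracted text (parse text)


def record_fetch(crawldb, segment, url, raw, metadata, text, fetch_time):
    """Store URL state in the crawldb and the page itself in the segment."""
    crawldb[url] = CrawlDbEntry(url, "fetched", fetch_time,
                                hashlib.md5(raw).hexdigest())
    segment[url] = SegmentRecord(raw, metadata, text)


def get_content(crawldb, segment, url):
    """Page content can only be read back from the segment."""
    if url not in crawldb:
        raise KeyError(url)
    return segment[url].content
```

Usage: after `record_fetch(crawldb, segment, "http://example.com/", html, {"title": "Hi"}, "Hi Hello", 1161700621000)`, the crawldb entry exposes only the status, fetch time and hash, and `get_content` has to go through the segment to return `html`.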


Dennis

Qi Wu wrote:

Hi All,

   I want to know what information about a page is kept in the WebDB. It
seems the content of a page can't be retrieved from the WebDB, only the MD5
hash of the page contents, and the page contents can only be retrieved from
the segments. Is this right?


Thanks,
Qi
