how web crawler crawls contents after the first crawling

Shigeki Kobayashi Thu, 05 Feb 2015 23:28:18 -0800

Hi Karl,


I have a basic question about how the web crawler handles content on
crawls after the first one.

Does it crawl and index all pages from the root every time, or does it
crawl only pages that have been modified?

If it crawls only modified pages, how does it figure out which pages
have been modified?
By checking the size of the pages? By a hash?

What about document files, such as PDFs, that are linked from web pages?
If those documents are modified, how does MCF figure out that they have
changed?

I am using an old version, MCF 1.4.1.

Best regards,


Shigeki
