Re: Getting page information given the URL

Carl Cerecke Thu, 30 Aug 2007 19:36:02 -0700

John Mendenhall wrote:

I know about readdb. But, unless I am missing something, it doesn't know which segment a URL is stored in. I'm after the information stored in the segment for a URL, not the information in the crawldb.
I'm pretty sure the indexing process includes some kind of link from a URL to the data in a segment for that URL, but I'm still looking....
You'll need to do a dump of the segments to find which
segment it is in, using readseg.

As far as I understand it, the system does not need to
find which segment it is in.  It indexes it the other
way around, indexing the segments, which have the url
(the key) attached to them.
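
A minimal sketch of the dump-and-scan approach suggested above, assuming each segment has already been dumped to text with readseg. The "URL:: " record prefix, the dump_root/<segment>/dump layout, and all function names here are assumptions for illustration, not something confirmed in this thread; check them against your own dump output.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Assumed prefix of the line that names each record's URL in a text dump.
URL_PREFIX = "URL:: "


def urls_in_dump(lines: Iterable[str]) -> Set[str]:
    """Collect every URL named on a line starting with URL_PREFIX."""
    return {
        line[len(URL_PREFIX):].strip()
        for line in lines
        if line.startswith(URL_PREFIX)
    }


def segments_containing(url: str, dumps: Dict[str, Iterable[str]]) -> List[str]:
    """Return the sorted names of all segments whose dump mentions url."""
    return sorted(
        name for name, lines in dumps.items() if url in urls_in_dump(lines)
    )


def load_dumps(dump_root: str) -> Dict[str, List[str]]:
    """Read dump_root/<segment>/dump files (hypothetical layout) into memory."""
    dumps: Dict[str, List[str]] = {}
    for dump_file in Path(dump_root).glob("*/dump"):
        text = dump_file.read_text(errors="replace")
        dumps[dump_file.parent.name] = text.splitlines()
    return dumps
```

With the dumps loaded, segments_containing(url, load_dumps(dump_root)) lists the segment or segments that hold the page. This is a linear scan over every dump, so it only suits occasional lookups.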

I don't think so. Each segment I have has no extra information added to it after indexing. But the indexing operation did create a file indexes/part-00000/_mq0.fdt which seems to be a map between url and segment.


Cheers,
Carl.
