John Mendenhall wrote:
I know about readdb. But, unless I am missing something, it doesn't know which segment a URL is stored in. I'm after the information stored in the segment for a URL, not the information in the crawldb.

I'm pretty sure the indexing process includes some kind of link from a URL to the data in a segment for that URL, but I'm still looking....

You'll need to do a dump of the segments to find which
segment it is in, using readseg.
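
A rough sketch of that lookup, in case it helps. It assumes each segment has already been dumped with something like "bin/nutch readseg -dump <segment> <outdir>/<segment-name>", that each dump directory contains a text file named "dump", and that each record in it has a line of the form "URL:: <url>". Those layout details are from memory and may differ between Nutch versions. The function name segments_containing is made up for illustration.

```python
import os


def segments_containing(dump_root, url):
    """Return the names of dumped segments under dump_root that list url.

    Assumed layout (not verified against every Nutch version):
    dump_root/<segment-name>/dump, with one "URL:: <url>" line per record.
    """
    marker = "URL:: " + url
    hits = []
    for name in sorted(os.listdir(dump_root)):
        dump_file = os.path.join(dump_root, name, "dump")
        if not os.path.isfile(dump_file):
            continue  # not a dumped segment, skip it
        with open(dump_file, encoding="utf-8", errors="replace") as fh:
            # Exact match on the whole line, so a URL that merely shares
            # a prefix with another one does not count as a hit.
            if any(line.strip() == marker for line in fh):
                hits.append(name)
    return hits
```

Calling segments_containing("dumps", "http://example.com/") would then return the list of segment names whose dump mentions that URL. This scans every dump in full, so it is only practical for small crawls.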

As far as I understand it, the system does not need to
find which segment a URL is in. It works the other way
around: it indexes the segments, and each entry in a
segment has its URL (the key) attached to it.

I don't think so. None of my segments has any extra information added to it after indexing. But the indexing operation did create a file, indexes/part-00000/_mq0.fdt, which seems to be a map between URL and segment.

Cheers,
Carl.
