Hi Andrzej,
Thanks for the reply. Regarding the outlink limit: I thought the default was
100 outlinks per page, and that in those cases the first 100 outlinks are
stored? I have a few pages like this in the crawl database. The problem I'm
having is that the outlink object is empty for some pages, when on previous
days it was not empty and contained outlinks.
At the moment I'm using the following code in my for loop while reading the
segment, to make sure that I ignore pages that couldn't be fetched and pages
that could not be parsed:
    if (fetcherOutput.getStatus() != FetcherOutput.SUCCESS)
    {
        continue;
    }
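For reference, here is a rough sketch of what skipping on both stages could look like. It is only an illustration: Record, fetchOk, parseOk and usable are hypothetical stand-ins rather than Nutch classes, since the real accessor for the parse status depends on the Nutch version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SegmentFilter {
    // Hypothetical stand-in for one segment record; NOT a Nutch class.
    static final class Record {
        final String url;
        final boolean fetchOk;   // assumed: fetch stage succeeded
        final boolean parseOk;   // assumed: parse stage succeeded
        final List<String> outlinks;

        Record(String url, boolean fetchOk, boolean parseOk, List<String> outlinks) {
            this.url = url;
            this.fetchOk = fetchOk;
            this.parseOk = parseOk;
            this.outlinks = outlinks;
        }
    }

    // Keep only records that passed both the fetch and the parse stage.
    static List<Record> usable(List<Record> records) {
        List<Record> kept = new ArrayList<>();
        for (Record r : records) {
            if (!r.fetchOk) {
                continue; // could not be fetched
            }
            if (!r.parseOk) {
                continue; // fetched, but the parser failed, so no outlinks
            }
            kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Record> records = Arrays.asList(
            new Record("http://a.example/", true, true, Arrays.asList("http://b.example/")),
            new Record("http://c.example/", true, false, new ArrayList<String>()),
            new Record("http://d.example/", false, false, new ArrayList<String>()));
        for (Record r : usable(records)) {
            System.out.println(r.url + " outlinks=" + r.outlinks.size());
        }
    }
}
```

The point of the second check is that a record with a successful fetch but a failed parse would otherwise pass the filter with an empty outlink list.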
I've also checked the status of a couple of pages whose outlinks are missing
and they all appear to have a SUCCESS status.
Regards,
Karen
Hello Karen,
Outlinks should be stored in the segment, so that's the right place to
look for them.
One common cause of missing outlinks is hitting the maximum-outlinks
limit, though this is set to 100 by default. Another common cause is the
content parser catching an exception: you then get a positive status for
the fetch but an error in parsing, hence no outlinks.
Could you use the "segread" command on these two records, and check the
status both for the fetch and the parsing stages?
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com