Hi Karl,
I am developing my own repository connector, borrowing some code from the file
repository connector. I use my connector to crawl documents from an IBM Domino
system. I managed to retrieve all the files in Domino. However, when I restart
my job to recrawl the Domino database, I run into a problem with the code
below: previousDocuments.get(documentIdentifierHash) in WorkerThread.java
(org.apache.manifoldcf.crawler.system) returns null for some of the document
IDs. As a result, the job gets stuck on those document IDs.
Could you please tell me how I could fix the problem?
protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(
    String documentIdentifierHash,
    String componentIdentifierHash,
    String documentIdentifier)
{
  // This returns null. The problem is here.
  QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
  if (qd == null)
    throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
  return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
}
Thanks a lot.
Cheng