Hi Karl,
Sorry for not explaining the issue in more detail.
(1) Is it likely to go away or not on a retry;
The DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and
DM_OBJECT_E_LOAD_INVALID_STRING_LEN errors are not likely to go away on an
immediate retry.
(2) Does it substantially impact the ability of ManifoldCF to properly
process the document;
The impact is that someone needs to monitor the indexing, and if the job stops on
these errors, they need to use restart-minimal to start the indexing again.
(3) Is it generally acceptable to skip ALL documents where the error occurs.
Yes. These errors occur for a large number of documents, and restarting the
indexing each time is tough for the user. The counts are:
• Total documents: 700000+
• DM_OBJECT_E_LOAD_INVALID_STRING_LEN: 11147
• DM_PLATFORM_E_INTEGER_CONVERSION_ERROR: 21708
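To put these counts in proportion, the short Java sketch below adds the two error counts and expresses them as a share of the total. It assumes exactly 700000 documents (the lower bound of "700000+"), so the resulting percentage is an upper bound; the class and method names are illustrative only.

```java
// Illustrative only: share of documents hitting the two non-transient errors.
final class ErrorShare {
    /** Sum of the per-error document counts. */
    static long affected(long... counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Percentage of total represented by part. */
    static double percent(long part, long total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        long total = 700_000L; // lower bound taken from "700000+"
        // DM_OBJECT_E_LOAD_INVALID_STRING_LEN + DM_PLATFORM_E_INTEGER_CONVERSION_ERROR
        long affected = affected(11_147L, 21_708L);
        System.out.printf("%d of %d+ documents (at most %.2f%%)%n",
            affected, total, percent(affected, total));
    }
}
```

With these numbers the two errors affect 32855 documents, which is at most about 4.7% of the repository.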
I'm not sure whether these issues are common in Documentum or are due to
improper Documentum configuration/maintenance. We have encountered these
errors on a couple of Documentum instances in lower environments (not
validated in production).
The Documentum repository errors DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and
DM_OBJECT_E_LOAD_INVALID_STRING_LEN are DfExceptions thrown from the
getObjectByQualification method in
org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.
We made a fix that prints the error to the log (of the Documentum server
process) and returns null:
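catch (DfException e)
{
  // Log the repository error to the Documentum server process log and
  // return null ("no object") instead of aborting the whole job.
  e.printStackTrace();
  return null;
  // Previous behavior:
  //throw new DocumentumException("Documentum error: "+e.getMessage());
}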
In the run() method of the ProcessDocumentThread inner class in
org.apache.manifoldcf.crawler.connectors.DCTM.DCTM, we added a null check so
that the document is processed only when the object was actually returned:
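try
{
  IDocumentumObject object = session.getObjectByQualification(
    "dm_document where i_chronicle_id='" + documentIdentifier +
    "' and any r_version_label='CURRENT'");
  // getObjectByQualification() now returns null on these repository
  // errors, so the document is skipped instead of stopping the job.
  if (object != null)
  {
    …
  }
}
catch (Throwable e)
{
  this.exception = e;
}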
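The same log-and-skip idea can be shown outside the connector. The sketch below is self-contained and does not use the real DFC or ManifoldCF classes: RepositoryException, Lookup, fetchOrSkip and processAll are hypothetical stand-ins for DfException, getObjectByQualification() and the per-document loop. It returns Optional.empty() instead of null, so a skipped document cannot be confused with a valid object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

final class SkipOnErrorSketch {
    private static final Logger LOG = Logger.getLogger(SkipOnErrorSketch.class.getName());

    /** Hypothetical stand-in for DfException. */
    static final class RepositoryException extends Exception {
        RepositoryException(String message) { super(message); }
    }

    /** Hypothetical stand-in for getObjectByQualification(). */
    interface Lookup {
        String fetch(String chronicleId) throws RepositoryException;
    }

    /** Log a failed lookup and return empty instead of aborting the whole run. */
    static Optional<String> fetchOrSkip(Lookup lookup, String id) {
        try {
            return Optional.of(lookup.fetch(id));
        } catch (RepositoryException e) {
            LOG.warning("Skipping " + id + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    /** Process every id; failed lookups are skipped rather than fatal. */
    static List<String> processAll(Lookup lookup, List<String> ids) {
        List<String> processed = new ArrayList<>();
        for (String id : ids) {
            fetchOrSkip(lookup, id).ifPresent(processed::add);
        }
        return processed;
    }

    public static void main(String[] args) {
        Lookup lookup = id -> {
            if (id.startsWith("bad")) {
                throw new RepositoryException("[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error");
            }
            return "object-" + id;
        };
        System.out.println(processAll(lookup, List.of("a", "bad1", "b")));
    }
}
```

Running main prints [object-a, object-b] and logs a warning for bad1; the loop keeps going after the failure.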
The [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED] error occurs very rarely. It
happens because an uploaded document is parked on an interim BOCS server and
moved to the repository a short time later.
If indexing happens during that window, the properties are accessible but the
document content is not, which causes the error. The fix for this case is not
yet complete.
The code that raises this error is shown below, from the run() method of the
ProcessDocumentThread inner class in
org.apache.manifoldcf.crawler.connectors.DCTM.DCTM:
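try
{
  strFilePath = object.getFile(objFileTemp.getCanonicalPath());
}
catch (DocumentumException dfe)
{
  // Fetch failed, so log it
  activityStatus = "NOCONTENT";
  activityMessage = dfe.getMessage();
  // Every type except TYPE_NOTALLOWED (e.g. TYPE_SERVICEINTERRUPTION)
  // is propagated to the caller.
  if (dfe.getType() != DocumentumException.TYPE_NOTALLOWED)
    throw dfe;
  // TYPE_NOTALLOWED: give up on this document's content.
  return;
}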
The getFile method in
org.apache.manifoldcf.crawler.common.DCTM.DocumentumObjectImpl translates the
DfException as follows:
catch (DfException dfe)
{
  // Can't decide what to do without looking at the exception text.
  // This is crappy but it's the best we can manage, apparently.
  String errorMessage = dfe.getMessage();
  if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1)
    // Treat it as transient, and retry
    throw new DocumentumException(dfe.getMessage(), DocumentumException.TYPE_SERVICEINTERRUPTION);
  // It's probably not a transient error. Report it as an access violation, even though it
  // may well not be. We don't have much info as to what's happening.
  throw new DocumentumException(dfe.getMessage(), DocumentumException.TYPE_NOTALLOWED);
}
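To see why the parked-content error ends in "Repeated service interruptions", the decision in the getFile() catch block can be isolated as a pure function. In this sketch the ErrorType enum is a hypothetical stand-in for the DocumentumException.TYPE_* constants; the string test is the same indexOf check as in the snippet. Any message that does not contain [DM_CONTENT_E_CANT_START_PULL], including the PARKED error, lands in the transient branch.

```java
final class GetFileClassification {
    /** Hypothetical stand-in for DocumentumException.TYPE_SERVICEINTERRUPTION / TYPE_NOTALLOWED. */
    enum ErrorType { SERVICEINTERRUPTION, NOTALLOWED }

    /** Same decision as the getFile() catch block: only CANT_START_PULL is treated as non-transient. */
    static ErrorType classify(String errorMessage) {
        if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1) {
            return ErrorType.SERVICEINTERRUPTION; // treated as transient and retried
        }
        return ErrorType.NOTALLOWED; // reported as an access violation
    }

    public static void main(String[] args) {
        String parked = "[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "
            + "\"The content is temporarily parked on a BOCS server host.\"";
        System.out.println(classify(parked));
    }
}
```

main prints SERVICEINTERRUPTION for the parked-content message. Because run() rethrows every type except TYPE_NOTALLOWED, such a document is retried rather than skipped, which is consistent with the "Repeated service interruptions" failure in the original report.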
We think discarding uncrawlable documents and continuing the indexing process
makes more sense than stalling the job. If you agree, could you please add the
required coding exceptions to the connector?
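One way to express the requested coding exceptions is a per-error policy keyed on the error code in the message, recording the answers to questions (1) and (3) for each code. This is a hypothetical sketch, not existing connector code: the Action enum, the POLICY map and decide() are illustrative names, and the entries simply record the answers given in this mail (both conversion/load errors skippable, the parked-content error transient). Unknown codes fall back to FAIL_JOB, which matches the current behavior of stopping the job.

```java
import java.util.Map;

final class ErrorPolicySketch {
    enum Action { RETRY, SKIP_DOCUMENT, FAIL_JOB }

    // Hypothetical policy: RETRY = likely to go away on retry (question 1);
    // SKIP_DOCUMENT = acceptable to skip every document with this error (question 3).
    private static final Map<String, Action> POLICY = Map.of(
        "DM_PLATFORM_E_INTEGER_CONVERSION_ERROR", Action.SKIP_DOCUMENT,
        "DM_OBJECT_E_LOAD_INVALID_STRING_LEN", Action.SKIP_DOCUMENT,
        "DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED", Action.RETRY);

    /** Extract the first "[CODE]" tag from the message and look it up; unknown codes fail the job. */
    static Action decide(String errorMessage) {
        int start = errorMessage.indexOf('[');
        int end = errorMessage.indexOf(']', start + 1);
        if (start < 0 || end < 0) {
            return Action.FAIL_JOB;
        }
        String code = errorMessage.substring(start + 1, end);
        return POLICY.getOrDefault(code, Action.FAIL_JOB);
    }

    public static void main(String[] args) {
        String[] samples = {
            "Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error: ...",
            "Documentum error: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: ...",
            "[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: ..."
        };
        for (String s : samples) {
            System.out.println(decide(s) + " <- " + s);
        }
    }
}
```

Keeping the mapping explicit means an unrecognized error still stops the job, so only error codes that have been reviewed against the three questions are ever skipped.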
Regards,
Tamizh Kumaran Thamizharasan
From: Karl Wright [mailto:[email protected]]
Sent: Friday, July 14, 2017 12:36 PM
To: [email protected]
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error
Hi Tamizh,
For any repository errors, ManifoldCF needs to know the following:
(1) Is it likely to go away or not on a retry;
(2) Does it substantially impact the ability of ManifoldCF to properly process
the document;
(3) Is it generally acceptable to skip ALL documents where the error occurs.
In this case your underlying error seems quite worrying:
[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily
parked on a BOCS server host. It will be available when it is moved to a
permanent storage area."
I could imagine that many or most documents are in fact in that state, in which
case nothing can really be crawled?
I'm happy to make coding exceptions in the Documentum connector for discarding
uncrawlable documents, but only if it makes sense to do that. Here it is not
clear at all that we'd want to change MCF to throw away all documents with this
problem. It sounds instead like there's some significant Documentum
configuration issue to me.
Thanks,
Karl
On Fri, Jul 14, 2017 at 2:39 AM, Tamizh Kumaran Thamizharasan
<[email protected]>
wrote:
Hi Team,
We have observed the following behavior when using the ManifoldCF Documentum
connector.
• On any Documentum-specific error, the connector throws the error and the job
stops abruptly. Is there a specific reason for this approach?
Could we instead handle these errors by logging them, skipping the affected
document, and continuing the indexing?
Below are sample errors that caused the job to fail.
Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error: "The server
was unable to convert the following string (String Unavailable) to an integer
or long."
Caused by: org.apache.manifoldcf.crawler.common.DCTM.DocumentumException:
Documentum error: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: "Error loading
object: invalid string length 0 found in input stream"
Error: Repeated service interruptions - failure processing document:
[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily
parked on a BOCS server host. It will be available when it is moved to a
permanent storage area."
Please let us know your suggestions on this.
Regards,
Tamizh Kumaran Thamizharasan