I've attached a third patch to this ticket that should fix both of these cases. The patches must be applied in order.
Karl

On Mon, Jul 17, 2017 at 2:46 AM, Tamizh Kumaran Thamizharasan <[email protected]> wrote:

> Thanks Karl for the patch!!!
>
> A minor correction is required on the patch
> https://issues.apache.org/jira/secure/attachment/12877287/CONNECTORS-1444-2.patch
> (file: DCTM.java):
>
>     else if (dfe.getType() != DocumentumException.TYPE_CORRUPTEDDOCUMENT)
>
> needs to be modified to
>
>     else if (dfe.getType() == DocumentumException.TYPE_CORRUPTEDDOCUMENT)
>
> After the change it's working fine.
>
> Also, the observation is that these errors
> (DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and
> DM_OBJECT_E_LOAD_INVALID_STRING_LEN) are emitted from the
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification
> method call. So all the changes on
> https://issues.apache.org/jira/secure/attachment/12877287/CONNECTORS-1444-2.patch
> and the DocumentumException.java file change on
> https://issues.apache.org/jira/secure/attachment/12877277/CONNECTORS-1444.patch
> should be sufficient.
>
> Regards,
> Tamizh Kumaran Thamizharasan
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Friday, July 14, 2017 5:41 PM
> *To:* [email protected]
> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
> *Subject:* Re: Documentum job stops on error
>
> Ok, I've attached and committed an additional patch. Please let me know.
>
> Karl
>
> On Fri, Jul 14, 2017 at 7:54 AM, Tamizh Kumaran Thamizharasan <[email protected]> wrote:
>
> Hi Karl,
>
> The patch provided is not working since the error is thrown from
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification
>
>     return new DocumentumObjectImpl(objIDfSession,objIDfSession.
>     getObjectByQualification(dql));
>
> Error log as follows:
>
> DfException:: THREAD: RMI TCP Connection(1083)-127.0.0.1; MSG: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: "Error loading object: invalid string length 0 found in input stream"; ERRORCODE: 100; NEXT: null
>     at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
>     at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
>     at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
>     at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
>     at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
>     at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1370)
>     at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1129)
>     at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1362)
>     at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
>     at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
>     at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
>     at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
>     at com.documentum.fc.client.impl.objectmanager.
>     PersistentObjectManager.getObject(PersistentObjectManager.java:311)
>     at com.documentum.fc.client.impl.session.Session.getObject(Session.java:958)
>     at com.documentum.fc.client.impl.session.Session.getObjectByQualificationEx(Session.java:1139)
>     at com.documentum.fc.client.impl.session.Session.getObjectByQualification(Session.java:1117)
>     at com.documentum.fc.client.impl.session.SessionHandle.getObjectByQualification(SessionHandle.java:755)
>     at org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification(DocumentumImpl.java:334)
>     at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
>     at sun.rmi.transport.Transport$1.run(Transport.java:200)
>     at sun.rmi.transport.Transport$1.run(Transport.java:197)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>     at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> Regards,
> Tamizh Kumaran Thamizharasan
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Friday, July 14, 2017 4:32 PM
> *To:* [email protected]
> *Cc:* Sharnel Merdeck Pereira;
> Sundarapandian Arumaidurai Vethasigamani
> *Subject:* Re: Documentum job stops on error
>
> I have created a ticket (CONNECTORS-1444) to track this issue, and
> attached a fix. I've also committed the fix to trunk.
>
> The fix is not the code change you have done, but instead introduces a
> new kind of DocumentumException: CORRUPTEDDOCUMENT. This will be thrown
> whenever permanent document corruption is detected, and will cause the
> document to be skipped and not indexed.
>
> The "DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED" error should cause the
> connector to retry the document at a later time, so if indeed this is
> not a permanent error, no special fix should be required.
>
> Please let me know if the fix I have committed works for you.
>
> Karl
>
> On Fri, Jul 14, 2017 at 5:41 AM, Tamizh Kumaran Thamizharasan <[email protected]> wrote:
>
> Hi Karl,
>
> Sorry for not explaining the issue in detail.
>
> (1) Is it likely to go away or not on a retry;
>
> The DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and
> DM_OBJECT_E_LOAD_INVALID_STRING_LEN errors are not likely to go away on
> immediate retry.
>
> (2) Does it substantially impact the ability of ManifoldCF to properly
> process the document;
>
> The impact is that someone needs to monitor the indexing and, if it gets
> stopped on these issues, use restart-minimal to start the indexing
> again.
>
> (3) Is it generally acceptable to skip ALL documents where the error
> occurs.
>
> Yes, those errors occurred for a large number of documents, and it is
> tough for the user to restart the indexing each time. Total document
> count: 700000+
>
> DM_OBJECT_E_LOAD_INVALID_STRING_LEN - 11147
> DM_PLATFORM_E_INTEGER_CONVERSION_ERROR - 21708
>
> I'm not sure whether these issues are common on Documentum or are due
> to improper Documentum configuration/maintenance.
> We have encountered those errors on a couple of the Documentum instances
> in lower environments (not validated on production).
>
> The Documentum repository errors DM_PLATFORM_E_INTEGER_CONVERSION_ERROR
> and DM_OBJECT_E_LOAD_INVALID_STRING_LEN are of type DfException, thrown
> from the getObjectByQualification method in
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.
>
> We made a fix to print the error to the log (Documentum server process)
> and return null.
>
>     catch (DfException e)
>     {
>       e.printStackTrace();
>       return null;
>       //throw new DocumentumException("Documentum error: "+e.getMessage());
>     }
>
> In the run() method of the ProcessDocumentThread inner class in the
> org.apache.manifoldcf.crawler.connectors.DCTM.DCTM file, we added a null
> check to continue with the document processing.
>
>     try
>     {
>       IDocumentumObject object = session.getObjectByQualification("dm_document where i_chronicle_id='" + documentIdentifier +
>         "' and any r_version_label='CURRENT'");
>       if (object != null) {
>         …
>       }
>     }
>     catch (Throwable e)
>     {
>       this.exception = e;
>     }
>
> The [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED] error occurs very
> rarely. It happens when an uploaded document is parked in an interim
> BOCS and moved to the repository a short time later.
>
> If indexing happens in that gap, the properties will be accessible, but
> the document content will not be available, which causes the error. The
> fix is not yet completed.
>
> The code snippet that causes this error is shared below.
>
> The run() method of the ProcessDocumentThread inner class in
> org.apache.manifoldcf.crawler.connectors.DCTM.DCTM:
>
>     try
>     {
>       strFilePath = object.getFile(objFileTemp.getCanonicalPath());
>     }
>     catch (DocumentumException dfe)
>     {
>       // Fetch failed, so log it
>       activityStatus = "NOCONTENT";
>       activityMessage = dfe.getMessage();
>       if (dfe.getType() != DocumentumException.TYPE_NOTALLOWED)
>         throw dfe;
>       return;
>     }
>
> The getFile method in
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumObjectImpl:
>
>     catch (DfException dfe)
>     {
>       // Can't decide what to do without looking at the exception text.
>       // This is crappy but it's the best we can manage, apparently.
>       String errorMessage = dfe.getMessage();
>       if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1)
>         // Treat it as transient, and retry
>         throw new DocumentumException(dfe.getMessage(), DocumentumException.TYPE_SERVICEINTERRUPTION);
>       // It's probably not a transient error. Report it as an access violation, even though it
>       // may well not be. We don't have much info as to what's happening.
>       throw new DocumentumException(dfe.getMessage(), DocumentumException.TYPE_NOTALLOWED);
>     }
>
> The approach of discarding uncrawlable documents and continuing with the
> indexing process is more meaningful than stalling it. If you feel it is
> good to include, kindly add the required coding exception.
>
> Regards,
> Tamizh Kumaran Thamizharasan
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* Friday, July 14, 2017 12:36 PM
> *To:* [email protected]
> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
> *Subject:* Re: Documentum job stops on error
>
> Hi Tamizh,
>
> For any repository errors, ManifoldCF needs to know the following:
>
> (1) Is it likely to go away or not on a retry;
> (2) Does it substantially impact the ability of ManifoldCF to properly
>     process the document;
> (3) Is it generally acceptable to skip ALL documents where the error
>     occurs.
>
> In this case your underlying error seems quite worrying:
>
> [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is
> temporarily parked on a BOCS server host. It will be available when it
> is moved to a permanent storage area."
>
> I could imagine that many or most documents are in fact in that state,
> in which case nothing can really be crawled?
>
> I'm happy to make coding exceptions in the Documentum connector for
> discarding uncrawlable documents, but only if it makes sense to do that.
> Here it is not clear at all that we'd want to change MCF to throw away
> all documents with this problem. It sounds instead like there's some
> significant Documentum configuration issue to me.
>
> Thanks,
> Karl
>
> On Fri, Jul 14, 2017 at 2:39 AM, Tamizh Kumaran Thamizharasan <[email protected]> wrote:
>
> Hi Team,
>
> The behavior below is observed when using the ManifoldCF Documentum
> connector.
>
> - On any Documentum-specific error, the application throws the error
>   and the job stops abruptly. Is there any specific reason for this
>   approach?
>
> Can we handle these errors by logging them, ignoring the document, and
> continuing the indexing?
>
> Please find the sample error causing the job to fail.
>
> Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error: "The
> server was unable to convert the following string (String Unavailable)
> to an integer or long."
>
> Caused by: org.apache.manifoldcf.crawler.common.DCTM.DocumentumException:
> Documentum error: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: "Error
> loading object: invalid string length 0 found in input stream"
>
> Error: Repeated service interruptions - failure processing document:
> [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is
> temporarily parked on a BOCS server host. It will be available when it
> is moved to a permanent storage area."
>
> Kindly provide your suggestion on this.
>
> Regards,
> Tamizh Kumaran Thamizharasan
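
The skip-or-rethrow decision discussed in this thread can be sketched roughly as follows. This is an illustrative stand-in, not the committed patch: CorruptedDocumentSketch, SketchException, Action, classify and decide are hypothetical names, the constant values are placeholders, and recognizing a permanent error by its message text is an assumption made only for this sketch.

```java
// Illustrative sketch only; none of these types come from ManifoldCF.
class CorruptedDocumentSketch {

  // Hypothetical stand-in for the connector's DocumentumException.
  // The numeric values are placeholders chosen for this sketch.
  static class SketchException extends Exception {
    static final int TYPE_SERVICEINTERRUPTION = 0;
    static final int TYPE_NOTALLOWED = 1;
    static final int TYPE_CORRUPTEDDOCUMENT = 2;

    private final int type;

    SketchException(String message, int type) {
      super(message);
      this.type = type;
    }

    int getType() {
      return type;
    }
  }

  // What the caller should do with a document whose fetch failed.
  enum Action { SKIP_DOCUMENT, RETHROW }

  // Assumption: the two error codes reported as permanent in this thread
  // are recognized by their text; anything else is treated as transient.
  static int classify(String message) {
    if (message != null
        && (message.contains("[DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]")
            || message.contains("[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]"))) {
      return SketchException.TYPE_CORRUPTEDDOCUMENT;
    }
    return SketchException.TYPE_SERVICEINTERRUPTION;
  }

  // Note the "==": only a corrupted document is skipped; every other
  // type is handed back to the caller, which may retry later.
  static Action decide(SketchException e) {
    if (e.getType() == SketchException.TYPE_CORRUPTEDDOCUMENT) {
      return Action.SKIP_DOCUMENT;
    }
    return Action.RETHROW;
  }

  public static void main(String[] args) {
    String[] messages = {
      "[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: \"Error loading object\"",
      "[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: \"The content is temporarily parked\""
    };
    for (String m : messages) {
      SketchException e = new SketchException(m, classify(m));
      System.out.println(decide(e) + " <- " + m);
    }
  }
}
```

The "==" comparison in decide mirrors the correction requested in the July 17 message; the parked-content error is deliberately left in the retry path, as suggested in the July 14 reply.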
