Hi Nikita,

Until you fix your connector, nothing can be done to address your Out Of
Memory problem.

The problem is that you are not calling the following IProcessActivity
method:

  /** Check whether a document of a specific length is indexable by the
  * currently specified output connector.
  *@param length is the document length.
  *@return true if the document is indexable.
  */
  public boolean checkLengthIndexable(long length)
    throws ManifoldCFException, ServiceInterruption;

Your connector should call this and honor the response.
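
As a rough illustration only (not taken from your connector or from the
ManifoldCF sources), honoring the response could look something like the
sketch below. LengthCheck is a hypothetical stand-in for the single
IProcessActivity method quoted above, declared locally so the example
compiles without ManifoldCF on the classpath; it throws plain Exception
rather than ManifoldCFException and ServiceInterruption. RemoteDoc,
selectIndexable, and the 1,000,000-byte limit are likewise invented for the
example. The point is that the length check happens before the document
content is fetched and handed to the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the one IProcessActivity method quoted above. */
interface LengthCheck {
  boolean checkLengthIndexable(long length) throws Exception;
}

/** Hypothetical record: a document identifier plus the length the repository reports. */
final class RemoteDoc {
  final String id;
  final long length;

  RemoteDoc(String id, long length) {
    this.id = id;
    this.length = length;
  }
}

final class LengthGateSketch {
  /**
   * Returns the identifiers of the documents that pass the length check.
   * Documents that fail the check are skipped before any content is fetched.
   */
  static List<String> selectIndexable(LengthCheck activities, List<RemoteDoc> docs)
      throws Exception {
    List<String> accepted = new ArrayList<>();
    for (RemoteDoc doc : docs) {
      if (!activities.checkLengthIndexable(doc.length)) {
        // Rejected by the pipeline: do not download or ingest this document.
        continue;
      }
      // Accepted: a real connector would fetch the content and ingest it here.
      accepted.add(doc.id);
    }
    return accepted;
  }

  public static void main(String[] args) throws Exception {
    // Assumed limit for the demo; in ManifoldCF the pipeline decides this.
    LengthCheck limit = length -> length <= 1_000_000L;
    List<RemoteDoc> docs = Arrays.asList(
        new RemoteDoc("small.pdf", 500L),
        new RemoteDoc("large.pdf", 5_000_000L));
    System.out.println(selectIndexable(limit, docs)); // prints [small.pdf]
  }
}
```

In a real connector the accepted branch is where the fetch and ingest calls
would go, and the rejected branch would normally also tell the framework that
the document was skipped.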

Thanks,
Karl



On Fri, Aug 24, 2018 at 9:55 AM Nikita Ahuja <[email protected]> wrote:

> Hi Karl,
>
> I have checked for a coding error and found none: "Allowed Documents"
> works fine with the same code on the other system.
>
> But the main issue now is that ManifoldCF shuts down and the system
> reports *"java.lang.OutOfMemoryError: GC overhead limit exceeded"*.
>
> PostgreSQL is being used for ManifoldCF and the memory allotted to the
> system is generous, but this issue still occurs very frequently.
> Throttling (2) and the worker thread count (45) have also been checked, and
> different values were tried as per the documentation.
>
>
> Please suggest the possible problem area and steps to be taken.
>
> On Mon, Aug 20, 2018 at 7:30 PM, Karl Wright <[email protected]> wrote:
>
>> Obviously your Allowed Documents filter is somehow causing all documents
>> to be excluded.  Since you have a custom repository connector I would bet
>> there is a coding error in it that is responsible.
>>
>> Karl
>>
>>
>> On Mon, Aug 20, 2018 at 8:49 AM Nikita Ahuja <[email protected]>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> Thanks for the reply.
>>>
>>> I am using them in that same sequence: the Allowed Documents transformer is
>>> added first, and then the Tika transformation.
>>>
>>> But nothing runs in that scenario. The job simply ends without returning
>>> anything in the output.
>>>
>>> On Mon, Aug 20, 2018 at 5:36 PM, Karl Wright <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> You are running out of memory.
>>>> Tika's memory consumption is not well defined so you will need to limit
>>>> the size of documents that reach it.  This is not the same as limiting the
>>>> size of documents *after* Tika extracts them.
>>>>
>>>> The Allowed Documents transformer therefore should be placed in the
>>>> pipeline before the Tika Extractor.
>>>>
>>>> "Also it is not compatible with the Allowed Documents and Metadata
>>>> Adjuster Connectors."
>>>>
>>>> This is a huge red flag.  Why not?
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Aug 20, 2018 at 6:47 AM Nikita Ahuja <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> There is a custom job executing for Aconex in the ManifoldCF
>>>>> environment. But while executing, it is not able to crawl the complete
>>>>> set of documents. It crashes in the middle of the execution.
>>>>>
>>>>> Also it is not compatible with the Allowed Documents and Metadata
>>>>> Adjuster Connectors.
>>>>>
>>>>> The custom job is similar to the existing Jira connector in
>>>>> ManifoldCF.
>>>>>
>>>>> It shows the following errors. Please suggest the appropriate steps
>>>>> to follow to make it run smoothly.
>>>>>
>>>>>
>>>>>
>>>>> Connect to uk1.aconex.co.uk:443 [uk1.aconex.co.uk/---.---.---.---] failed: Read timed out
>>>>> agents process ran out of memory - shutting down
>>>>> agents process ran out of memory - shutting down
>>>>> agents process ran out of memory - shutting down
>>>>> agents process ran out of memory - shutting down
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>         at org.apache.manifoldcf.core.database.Database.beginTransaction(Database.java:240)
>>>>>         at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.beginTransaction(DBInterfaceHSQLDB.java:1361)
>>>>>         at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.beginTransaction(DBInterfaceHSQLDB.java:1327)
>>>>>         at org.apache.manifoldcf.crawler.jobs.JobManager.assessMarkedJobs(JobManager.java:823)
>>>>>         at org.apache.manifoldcf.crawler.system.AssessmentThread.run(AssessmentThread.java:65)
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>         at org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState.clone(PDGraphicsState.java:494)
>>>>>         at org.apache.pdfbox.contentstream.PDFStreamEngine.saveGraphicsState(PDFStreamEngine.java:898)
>>>>>         at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:721)
>>>>>         at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:587)
>>>>>         at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
>>>>>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
>>>>>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
>>>>>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
>>>>>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
>>>>>         at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
>>>>>         at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
>>>>>         at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
>>>>>         at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
>>>>>         at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>>>>>         at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
>>>>>         at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)
>>>>>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>>>>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>>>>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>>>>>         at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>>>>>         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>>>>>         at org.apache.manifoldcf.crawler.connectors.aconex.AconexSession.fetchAndIndexFile(AconexSession.java:720)
>>>>>         at org.apache.manifoldcf.crawler.connectors.aconex.AconexRepositoryConnector.processDocuments(AconexRepositoryConnector.java:1194)
>>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>>>> [Thread-431] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2c0b4c83{HTTP/1.1}{0.0.0.0:8345}
>>>>> [Thread-431] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@4c03a37{/mcf-api-service,file:/C:/Users/smartshore/AppData/Local/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-3117653580650249372.dir/webapp/,UNAVAILABLE}{D:\Manifold\apache-manifoldcf-2.8.1\example\.\..\web\war\mcf-api-service.war}
>>>>> [Thread-431] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@65ae095c{/mcf-authority-service,file:/C:/Users/smartshore/AppData/Local/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-8288503227579256193.dir/webapp/,UNAVAILABLE}{D:\Manifold\apache-manifoldcf-2.8.1\example\.\..\web\war\mcf-authority-service.war}
>>>>> Connect to uk1.aconex.co.uk:443 [uk1.aconex.co.uk/23.10.35.84] failed: Read timed out
>>>>> --
>>>>> Thanks and Regards,
>>>>> Nikita
>>>>> Email: [email protected]
>>>>> United Sources Service Pvt. Ltd.
>>>>> a "Smartshore" Company
>>>>> Mobile: +91 99 888 57720
>>>>> http://www.smartshore.nl
>>>>>
>>>>
>>>
>>
>
