Re: Job stuck internal http error 500
I replaced all four of my Tika 1.17 .jar files (parsers, core, java7, xmp) with the 1.19 nightly-build versions, and it works! No more 500 error, and the file has been indexed!

From the link https://builds.apache.org/job/tika-branch-1x/73/ you can use the subfolders:

Apache Tika core
Apache Tika Java-7 Components
Apache Tika parsers
Apache Tika XMP

I downloaded:

tika-xmp-1.19-20180807.184545-61.jar
tika-core-1.19-20180807.184018-61.jar
tika-parsers-1.19-20180807.184508-61.jar
tika-java7-1.19-20180807.185414-60.jar

and renamed them to:

-rw-r--r-- 1 root root  687651 Aug  8 14:16 tika-core-1.19.jar
-rw-r--r-- 1 root root   14012 Aug  8 14:16 tika-java7-1.19.jar
-rw-r--r-- 1 root root 1131862 Aug  8 14:16 tika-parsers-1.19.jar
-rw-r--r-- 1 root root   34447 Aug  8 14:16 tika-xmp-1.19.jar

So, in the /opt/solr-7.3.1/contrib/extraction/lib directory of my Solr installation I now have:

-rw-r--r-- 1 root root  663109 Dec  9  2017 tika-core-1.17.jarOLD
-rw-r--r-- 1 root root  687651 Aug  8 14:16 tika-core-1.19.jar
-rw-r--r-- 1 root root   13268 Dec  9  2017 tika-java7-1.17.jarOLD
-rw-r--r-- 1 root root   14012 Aug  8 14:16 tika-java7-1.19.jar
-rw-r--r-- 1 root root 1078626 Dec  9  2017 tika-parsers-1.17.jarOO
-rw-r--r-- 1 root root 1131862 Aug  8 14:16 tika-parsers-1.19.jar
-rw-r--r-- 1 root root   33705 Dec  9  2017 tika-xmp-1.17.jarOLD
-rw-r--r-- 1 root root   34447 Aug  8 14:16 tika-xmp-1.19.jar

You have to restart Solr to use the new Tika version.

Tika 1.19 will be released in the next few weeks.

Here is the link to my issue:
https://issues.apache.org/jira/browse/TIKA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=16573125#comment-16573125

Mario

From: Karl Wright
Sent: Wednesday, August 8, 2018 14:54
To: user@manifoldcf.apache.org
Subject: Re: Job stuck internal http error 500

Thanks for the update! Did the Tika people say when 1.19 will be released?

Karl
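The jar swap described in this message can be sketched as a small script. This is only an illustration under stated assumptions: the function names `release_name` and `swap_jars` are made up for this example, the snapshot file-name pattern is inferred from the four file names listed in the message, and the directories are passed in by the caller. It parks the old jars by appending "OLD" to their names and copies the nightly jars in under release-style names, as was done by hand here.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Pattern inferred from names such as tika-core-1.19-20180807.184018-61.jar
# (assumption: other nightly builds follow the same naming scheme).
SNAPSHOT_RE = re.compile(r"^(tika-[a-z0-9]+)-(\d+\.\d+)-\d{8}\.\d{6}-\d+\.jar$")


def release_name(snapshot_name: str) -> str:
    """Map a nightly snapshot file name to a release-style name."""
    match = SNAPSHOT_RE.match(snapshot_name)
    if match is None:
        raise ValueError(f"not a recognised Tika snapshot name: {snapshot_name}")
    return f"{match.group(1)}-{match.group(2)}.jar"


def swap_jars(lib_dir: Path, download_dir: Path, old_version: str) -> List[str]:
    """Park the old Tika jars and copy the downloaded ones into lib_dir.

    Returns the names of the newly installed jars.
    """
    # Rename e.g. tika-core-1.17.jar to tika-core-1.17.jarOLD so the file
    # no longer ends in .jar (assumed to be enough for Solr to skip it).
    for old in sorted(lib_dir.glob(f"tika-*-{old_version}.jar")):
        old.rename(old.with_name(old.name + "OLD"))

    installed = []
    for snapshot in sorted(download_dir.glob("tika-*.jar")):
        target = lib_dir / release_name(snapshot.name)
        shutil.copy2(snapshot, target)
        installed.append(target.name)
    return installed
```

A call such as `swap_jars(Path("/opt/solr-7.3.1/contrib/extraction/lib"), Path("downloads"), "1.17")` would mirror the manual steps, where `downloads` is a hypothetical folder holding the four nightly jars. Solr still has to be restarted afterwards.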
Re: Job stuck internal http error 500
Hello.

You were right, Karl. The Tika people helped me: they patched the Tika jar in the Solr installation, and the problem was solved! I have now fixed it by using the Tika 1.19 nightly build.

Thanks a lot.

From: Karl Wright
Sent: Friday, July 27, 2018 12:39
To: user@manifoldcf.apache.org
Subject: Re: Job stuck internal http error 500

I am afraid you will need to open a Tika ticket, and be prepared to attach your file to it.

Thanks,
Karl
Re: Job stuck internal http error 500
It isn't a memory problem, because bigger xls files (30MB) have been processed.

This xlsm file, with many colors etc., hangs.

I suspect it is a Tika/Solr error, but I don't know how to solve it ☹
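A note on why the heap size may not matter here: the top frame of the trace posted in this thread is `AbstractStringBuilder.hugeCapacity`. In OpenJDK that method throws `OutOfMemoryError` when the requested buffer length would exceed what a single Java array can hold, regardless of how much heap is free, which would be consistent with raising `-Xmx` making no difference. The following is a rough sketch of that bound check, written from memory of the OpenJDK source and not copied from it. The function name and the `coder` parameter (0 for Latin-1 storage, 1 for UTF-16) are illustrative assumptions.

```python
# java.lang.Integer.MAX_VALUE; Java array lengths are signed 32-bit ints.
INT_MAX = 2**31 - 1


def exceeds_string_builder_limit(requested_chars: int, coder: int) -> bool:
    """Return True if a StringBuilder could not grow to requested_chars.

    coder is 0 when characters are stored one byte each (Latin-1) and 1 when
    they take two bytes each (UTF-16). Python integers do not overflow, so
    requested_chars is the true requested length, not a wrapped Java int.
    """
    # Largest character count whose backing byte array still fits in an int.
    hard_limit = INT_MAX >> coder
    return requested_chars > hard_limit
```

Under these assumptions, a builder holding UTF-16 text hits the limit at roughly one billion characters, however large the heap is. That would point at the extracted text growing without bound for this particular file, not at a lack of memory.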
Re: Job stuck internal http error 500
Yes, I am using:

/opt/manifoldcf/multiprocess-file-example-proprietary

I set:

sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I get the same error.

I suspect it could be a Solr/Tika problem. What can I do?

I restricted the scan to a single file and I get the same error.

From: Karl Wright
Sent: Friday, July 27, 2018 11:36
To: user@manifoldcf.apache.org
Subject: Re: Job stuck internal http error 500

I am presuming you are using the examples. If so, edit the options file to grant more memory to your agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario <mario.biso...@vimar.com> wrote:

Hello.

My job gets stuck indexing an xlsx file of 38MB.

What can I do to solve my problem?

The error is below:

2018-07-27 08:55:15.562 WARN (qtp1521083627-52) [ x:core_share] o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
    at java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
    at java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
    at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
    at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleGeneralTextContainingPart(AbstractOOXMLExtractor.java:506)
    at