R: Job stuck internal http error 500

2018-08-08 Thread Bisonti Mario
I substitute all my four .jar tika files 1.17 (parsers, core, java7, xmp)  
versions with the 1.19 versions nightly version and it works!
No more 500 error and the file has been indexed!

From the link:
https://builds.apache.org/job/tika-branch-1x/73/
you can use the subfolder:
Apache Tika core
Apache Tika Java-7 Components
Apache Tika parsers
Apache Tika XMP

I downloaded the:
tika-xmp-1.19-20180807.184545-61.jar
tika-core-1.19-20180807.184018-61.jar
tika-parsers-1.19-20180807.184508-61.jar
tika-java7-1.19-20180807.185414-60.jar
and I renamed them in:
-rw-r--r-- 1 root root  687651 Aug  8 14:16 tika-core-1.19.jar
-rw-r--r-- 1 root root   14012 Aug  8 14:16 tika-java7-1.19.jar
-rw-r--r-- 1 root root 1131862 Aug  8 14:16 tika-parsers-1.19.jar
-rw-r--r-- 1 root root   34447 Aug  8 14:16 tika-xmp-1.19.jar

So, in my /opt/solr-7.3.1/contrib/extraction/lib directory of solr I have:
-rw-r--r-- 1 root root  663109 Dec  9  2017 tika-core-1.17.jarOLD
-rw-r--r-- 1 root root  687651 Aug  8 14:16 tika-core-1.19.jar
-rw-r--r-- 1 root root   13268 Dec  9  2017 tika-java7-1.17.jarOLD
-rw-r--r-- 1 root root   14012 Aug  8 14:16 tika-java7-1.19.jar
-rw-r--r-- 1 root root 1078626 Dec  9  2017 tika-parsers-1.17.jarOO
-rw-r--r-- 1 root root 1131862 Aug  8 14:16 tika-parsers-1.19.jar
-rw-r--r-- 1 root root   33705 Dec  9  2017 tika-xmp-1.17.jarOLD
-rw-r--r-- 1 root root   34447 Aug  8 14:16 tika-xmp-1.19.jar

You have to restart solr to use the new tika version

Tha tika 1.19 version will be released in the next few weeks.

Here is the link about my issue:

https://issues.apache.org/jira/browse/TIKA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=16573125#comment-16573125


Mario



Da: Karl Wright 
Inviato: mercoledì 8 agosto 2018 14:54
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck internal http error 500

Thanks for the update!

Did the Tika people say when 1.19 will be released?

Karl


On Wed, Aug 8, 2018 at 8:29 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo
You had right, Karl.

I have been helped by the tika people and they patched the tika jar of the solr 
installation and the problem was solved!

Now I solved using the tika 1.19 versions nightly build.


Thanks a lot.



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 12:39
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am afraid you will need to open a Tika ticket, and be prepared to attach your 
file to it.

Thanks,

Karl


On Fri, Jul 27, 2018 at 6:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
It isn’t a memory problem because xls file bigger (30MB) have been processed.

This file xlsm with many colors etc hang
I could suppose that it is a tika/solr erro but I don’t know how to solve it
☹

Oggetto: R: Job stuck internal http error 500

Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to you agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is stucking indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.c

R: Job stuck internal http error 500

2018-08-08 Thread Bisonti Mario
Hallo
You had right, Karl.

I have been helped by the tika people and they patched the tika jar of the solr 
installation and the problem was solved!

Now I solved using the tika 1.19 versions nightly build.


Thanks a lot.



Da: Karl Wright 
Inviato: venerdì 27 luglio 2018 12:39
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck internal http error 500

I am afraid you will need to open a Tika ticket, and be prepared to attach your 
file to it.

Thanks,

Karl


On Fri, Jul 27, 2018 at 6:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
It isn’t a memory problem because xls file bigger (30MB) have been processed.

This file xlsm with many colors etc hang
I could suppose that it is a tika/solr erro but I don’t know how to solve it
☹

Oggetto: R: Job stuck internal http error 500

Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to you agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is stucking indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:8

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
It isn’t a memory problem because xls file bigger (30MB) have been processed.

This file xlsm with many colors etc hang
I could suppose that it is a tika/solr erro but I don’t know how to solve it
☹

Oggetto: R: Job stuck internal http error 500

Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright mailto:daddy...@gmail.com>>
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org<mailto:user@manifoldcf.apache.org>
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to you agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is stucking indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
 

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I obtain the same error.
My doubt is that it could be a solr/tika problem.
What could I do?
I restrict the scan to a single file and I obtain the same error



Da: Karl Wright 
Inviato: venerdì 27 luglio 2018 11:36
A: user@manifoldcf.apache.org
Oggetto: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to 
grant more memory to you agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
mailto:mario.biso...@vimar.com>> wrote:
Hallo.
My job is stucking indexing an xlsx file of 38MB

What could I do to solve my problem?

In the following there is the error:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
at 
java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
at 
java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
at 
java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
at 
java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
at 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at 
java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at 
java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at 
java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
at 
org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleGeneralTextContainingPart(AbstractOOXMLExtractor.java:506)
at