Re: Exclude files ~$*

2018-07-27 Thread Karl Wright
Can you view the job and include a screen shot of where this is displayed?
Thanks.

The exclusions are not regexps -- they are file specs.  The file specs have
special meanings for "*" (matches everything) and "?" (matches one
character).  You do not need to URL encode them.
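
To make the difference from a regexp concrete, here is a minimal sketch of how a file spec with "*" and "?" can be matched. It is an illustration only, not the JCIFS connector's actual code: the class name FileSpecSketch and the method matches are hypothetical, and the sketch compares characters case-sensitively, which may differ from the connector's behavior.

```java
// Hypothetical illustration of file-spec matching; not ManifoldCF code.
final class FileSpecSketch {

    /**
     * Returns true if fileName matches spec, where '*' matches any run of
     * characters (including none) and '?' matches exactly one character.
     * Every other character, including '~', '$' and '%', is literal.
     */
    static boolean matches(String spec, String fileName) {
        int s = 0;          // position in spec
        int f = 0;          // position in fileName
        int starPos = -1;   // position of the most recent '*' in spec
        int resume = 0;     // fileName position to retry from after a '*'
        while (f < fileName.length()) {
            if (s < spec.length() && spec.charAt(s) == '*') {
                starPos = s++;
                resume = f;
            } else if (s < spec.length()
                    && (spec.charAt(s) == '?' || spec.charAt(s) == fileName.charAt(f))) {
                s++;
                f++;
            } else if (starPos >= 0) {
                // Backtrack: let the last '*' absorb one more character.
                s = starPos + 1;
                f = ++resume;
            } else {
                return false;
            }
        }
        // Only trailing '*' may remain unmatched in the spec.
        while (s < spec.length() && spec.charAt(s) == '*') {
            s++;
        }
        return s == spec.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("~$*", "~$report.docx"));     // true
        System.out.println(matches("~$*", "report.docx"));       // false
        System.out.println(matches("%7E%24*", "~$report.docx")); // false
    }
}
```

In this sketch the URL-encoded form %7E%24* fails because "%", "7" and "E" are compared as ordinary characters, which is consistent with the advice not to URL encode the spec.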

If you enable connector debugging () you will see
statements like these, which should hint as to what went wrong:

Logging.connectors.debug("JCIFS: Checking '"+match+"' against '"+fileName.substring(matchEnd-1)+"'");


Karl


On Fri, Jul 27, 2018 at 8:36 AM msaunier  wrote:

> Hi Karl,
>
>
>
> In my JCIFS connector, I want to configure an exclude condition for files
> whose names match ~$*
>
> I have added the condition, but it does not work.
>
> Do I need to enter %7E%24*, or a regex?
>
>
>
> Thanks,
>
>
>
> Maxence,
>


Re: Tika/POI bugs

2018-07-27 Thread Karl Wright
To solve your production problem I highly recommend limiting the size of
the docs fed to Tika, for a start.  But that is no guarantee, I understand.

Out of memory problems are very hard to get good forensics for because they
cause major disruptions to the running server.  You could turn on a degree
of logging so that you can see what documents are being processed at any
time by all threads, but that is pretty verbose.  In your properties.xml
file, add .  But I suspect that will generate far too much noise.
Still, it's the best I can offer.
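
The idea of knowing which document each thread is working on at the moment of a crash can be sketched as follows. This is a hypothetical helper, not ManifoldCF's logging: the class InFlightDocuments and its methods are assumptions made for illustration. The point it shows is that the document identifier is written out before processing starts, so the last line per thread survives even if the JVM dies with an OutOfMemoryError.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not part of ManifoldCF.
final class InFlightDocuments {
    // Thread name -> identifier of the document that thread is processing.
    private final Map<String, String> byThread = new ConcurrentHashMap<>();

    /** Call before handing the document to the extractor. */
    void started(String documentId) {
        String thread = Thread.currentThread().getName();
        byThread.put(thread, documentId);
        // Log first, process afterwards, so the entry exists if processing crashes.
        System.err.println("START " + thread + " " + documentId);
    }

    /** Call after the document has been processed, successfully or not. */
    void finished() {
        byThread.remove(Thread.currentThread().getName());
    }

    /** Sorted copy of the documents currently being processed. */
    Map<String, String> snapshot() {
        return new TreeMap<>(byThread);
    }

    public static void main(String[] args) {
        InFlightDocuments tracker = new InFlightDocuments();
        tracker.started("file://share/big.xlsx");
        System.out.println(tracker.snapshot());
        tracker.finished();
    }
}
```

After a crash, the START lines without a later completion are the candidate documents, one per worker thread.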

Karl


On Fri, Jul 27, 2018 at 7:52 AM msaunier  wrote:

> Hi Karl,
>
>
>
> Okay. For the Out of Memory:
>
>
>
> This is the last day that I can spend finding out where the error comes
> from. After that, I have to go into production to meet my deadlines.
>
> I hope to find time in the future to fix this problem on this server;
> otherwise I will not be able to index it. Unfortunately, it is very
> difficult to find the documents that cause this error. I did not find any
> trace in the database. Even in debug mode, it is difficult to find the
> problematic document. Maybe if I limit the crawl to 1 thread I could find
> it more easily, but I'm afraid the crawl would take very long.
>
> Do you have an idea of the best method to find this document or these
> documents?
>
>
>
> Maxence
>
>
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Friday, July 27, 2018 12:47
> *To:* dev ; user@manifoldcf.apache.org
> *Subject:* Tika/POI bugs
>
>
>
> Hi all,
>
>
>
> I've easily spent 40 hours over the last two weeks chasing down bugs in
> Apache Tika and POI.  The two kinds I see are "ClassNotFound" (due to usage
> of the wrong ClassLoader), and "OutOfMemoryError" (not clear what it is due
> to yet).
>
> I don't have enough time to create tickets directly in Tika for all
> possible documents where these failures occur, so I urge our users to
> create tickets DIRECTLY in the Tika project in Jira.  I guess you can let
> the Tika people create the POI tickets, if need be.  For OutOfMemory
> problems, please attach the file that causes the problem to the ticket, and
> also the amount of memory you gave the agents process.  For ClassNotFound
> problems, also include the stack trace.
>
>
>
> Thanks in advance,
>
> Karl
>


Exclude files ~$*

2018-07-27 Thread msaunier
Hi Karl,

 

In my JCIFS connector, I want to configure an exclude condition for files
whose names match ~$*

I have added the condition, but it does not work.

Do I need to enter %7E%24*, or a regex?

 

Thanks,

 

Maxence,



RE: Tika/POI bugs

2018-07-27 Thread msaunier
Hi Karl,

 

Okay. For the Out of Memory:

 

This is the last day that I can spend finding out where the error comes from.
After that, I have to go into production to meet my deadlines.

I hope to find time in the future to fix this problem on this server;
otherwise I will not be able to index it. Unfortunately, it is very difficult to
find the documents that cause this error. I did not find any trace in the
database. Even in debug mode, it is difficult to find the problematic document.
Maybe if I limit the crawl to 1 thread I could find it more easily, but I'm
afraid the crawl would take very long.

Do you have an idea of the best method to find this document or these
documents?

 

Maxence

 

From: Karl Wright [mailto:daddy...@gmail.com]
Sent: Friday, July 27, 2018 12:47
To: dev ; user@manifoldcf.apache.org
Subject: Tika/POI bugs

 

Hi all,

 

I've easily spent 40 hours over the last two weeks chasing down bugs in Apache 
Tika and POI.  The two kinds I see are "ClassNotFound" (due to usage of the 
wrong ClassLoader), and "OutOfMemoryError" (not clear what it is due to yet).

I don't have enough time to create tickets directly in Tika for all possible 
documents where these failures occur, so I urge our users to create tickets 
DIRECTLY in the Tika project in Jira.  I guess you can let the Tika people 
create the POI tickets, if need be.  For OutOfMemory problems, please attach 
the file that causes the problem to the ticket, and also the amount of memory 
you gave the agents process.  For ClassNotFound problems, also include the 
stack trace.

 

Thanks in advance,

Karl



Tika/POI bugs

2018-07-27 Thread Karl Wright
Hi all,

I've easily spent 40 hours over the last two weeks chasing down bugs in
Apache Tika and POI.  The two kinds I see are "ClassNotFound" (due to usage
of the wrong ClassLoader), and "OutOfMemoryError" (not clear what it is due
to yet).

I don't have enough time to create tickets directly in Tika for all
possible documents where these failures occur, so I urge our users to
create tickets DIRECTLY in the Tika project in Jira.  I guess you can let
the Tika people create the POI tickets, if need be.  For OutOfMemory
problems, please attach the file that causes the problem to the ticket, and
also the amount of memory you gave the agents process.  For ClassNotFound
problems, also include the stack trace.

Thanks in advance,
Karl


Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
I am afraid you will need to open a Tika ticket, and be prepared to attach
your file to it.

Thanks,

Karl


On Fri, Jul 27, 2018 at 6:04 AM Bisonti Mario 
wrote:

> It isn’t a memory problem, because bigger xls files (30 MB) have been
> processed.
>
>
>
> This xlsm file, which has many colors etc., hangs.
>
> I suppose it is a Tika/Solr error, but I don’t know how to solve
> it.
>
> ☹
>
>
>
> *Subject:* Re: Job stuck internal http error 500
>
>
>
> Yes, I am using:
> /opt/manifoldcf/multiprocess-file-example-proprietary
> I set:
>
> sudo nano options.env.unix
>
> -Xms2048m
>
> -Xmx2048m
>
>
>
> But I get the same error.
>
> I suspect it could be a Solr/Tika problem.
>
> What can I do?
>
> I restricted the scan to a single file and I get the same error.
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Friday, July 27, 2018 11:36
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job stuck internal http error 500
>
>
>
> I am presuming you are using the examples.  If so, edit the options file
> to grant more memory to your agents process by increasing the Xmx value.
>
>
>
> Karl
>
>
>
> On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
> wrote:
>
> Hello.
>
> My job gets stuck while indexing a 38 MB xlsx file.
>
>
>
> What can I do to solve this problem?
>
>
>
> The error is shown below:
> 2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share]
> o.e.j.s.HttpChannel /solr/core_share/update/extract
>
> java.lang.OutOfMemoryError
>

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
It isn’t a memory problem, because bigger xls files (30 MB) have been processed.

This xlsm file, which has many colors etc., hangs.
I suppose it is a Tika/Solr error, but I don’t know how to solve it.
☹

Subject: Re: Job stuck internal http error 500

Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I get the same error.
I suspect it could be a Solr/Tika problem.
What can I do?
I restricted the scan to a single file and I get the same error.



From: Karl Wright <daddy...@gmail.com>
Sent: Friday, July 27, 2018 11:36
To: user@manifoldcf.apache.org
Subject: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to
grant more memory to your agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hello.
My job gets stuck while indexing a 38 MB xlsx file.

What can I do to solve this problem?

The error is shown below:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError

R: Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
Yes, I am using:
/opt/manifoldcf/multiprocess-file-example-proprietary
I set:
sudo nano options.env.unix
-Xms2048m
-Xmx2048m

But I get the same error.
I suspect it could be a Solr/Tika problem.
What can I do?
I restricted the scan to a single file and I get the same error.



From: Karl Wright
Sent: Friday, July 27, 2018 11:36
To: user@manifoldcf.apache.org
Subject: Re: Job stuck internal http error 500

I am presuming you are using the examples.  If so, edit the options file to
grant more memory to your agents process by increasing the Xmx value.

Karl

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario <mario.biso...@vimar.com> wrote:
Hello.
My job gets stuck while indexing a 38 MB xlsx file.

What can I do to solve this problem?

The error is shown below:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] 
o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
Although it is not clear which process you are talking about.  If it is Solr,
ask them.

Karl

On Fri, Jul 27, 2018, 5:36 AM Karl Wright  wrote:

> I am presuming you are using the examples.  If so, edit the options file
> to grant more memory to your agents process by increasing the Xmx value.
>
> Karl
>
> On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario 
> wrote:
>
>> Hello.
>>
>> My job gets stuck while indexing a 38 MB xlsx file.
>>
>>
>>
>> What can I do to solve this problem?
>>
>>
>>
>> The error is shown below:
>> 2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share]
>> o.e.j.s.HttpChannel /solr/core_share/update/extract
>>
>> java.lang.OutOfMemoryError
>>

Re: Job stuck internal http error 500

2018-07-27 Thread Karl Wright
I am presuming you are using the examples.  If so, edit the options file to
grant more memory to your agents process by increasing the Xmx value.

Karl
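
When changing -Xmx it helps to confirm which limit a JVM actually picked up. The following minimal sketch uses the standard Runtime.maxMemory() call; the class name HeapCheck is hypothetical. It only reports the limit of the JVM it runs in, so it would have to be started with the same options as the process in question. Note also that the WARN line quoted in this thread was logged for /solr/core_share/update/extract, which suggests (as an inference, not something confirmed here) that the error was raised in the Solr JVM rather than in the agents process.

```java
// Hypothetical helper for checking the effective heap limit of a JVM.
final class HeapCheck {

    /** Maximum heap this JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value is usually somewhat below the -Xmx setting, because the JVM excludes part of the heap from this figure.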

On Fri, Jul 27, 2018, 3:04 AM Bisonti Mario  wrote:

> Hello.
>
> My job gets stuck while indexing a 38 MB xlsx file.
>
>
>
> What can I do to solve this problem?
>
>
>
> The error is shown below:
> 2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share]
> o.e.j.s.HttpChannel /solr/core_share/update/extract
>
> java.lang.OutOfMemoryError
>

Re: Solr connection, max connections and CPU

2018-07-27 Thread Bisonti Mario
Thanks a lot Karl!!!

On 2018/07/26 13:28:47, Karl Wright  wrote:
> Hi Mario,
>
> There is no connection between the number of CPUs and the number of output
> connections.  You pick the maximum number of output connections based on
> the number of listening threads that you can use at the same time in Solr.
>
> Karl
>
> On Thu, Jul 26, 2018 at 9:22 AM Bisonti Mario
> wrote:
>
> > Hello, I set up a Solr connection in the "Output connections" of Manifold.
> >
> > I don't understand whether there is a relation between "Max Connections"
> > and the number of CPUs in the host.
> >
> > Could you help me to understand it?
> >
> > Thanks a lot
> >
> > Mario
> >
>
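
The role of a maximum connection count can be pictured as a fixed number of permits that is independent of the CPU count. The sketch below is a generic illustration using java.util.concurrent.Semaphore, not ManifoldCF's connection pool: the class BoundedConnections and its methods are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical illustration of a connection cap; not ManifoldCF code.
final class BoundedConnections {
    private final Semaphore permits;

    /** maxConnections would be chosen to fit what the receiving server can handle at once. */
    BoundedConnections(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Returns true if a connection slot was free and is now taken. */
    boolean tryOpen() {
        return permits.tryAcquire();
    }

    /** Gives a previously taken slot back. */
    void close() {
        permits.release();
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        BoundedConnections pool = new BoundedConnections(2);
        System.out.println(pool.tryOpen()); // true
        System.out.println(pool.tryOpen()); // true
        System.out.println(pool.tryOpen()); // false
    }
}
```

Raising the cap above what the server can accept at the same time only makes callers wait on the server side instead of on the client side.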


Job stuck internal http error 500

2018-07-27 Thread Bisonti Mario
Hello.
My job gets stuck while indexing a 38 MB xlsx file.

What can I do to solve this problem?

The error is shown below:
2018-07-27 08:55:15.562 WARN  (qtp1521083627-52) [   x:core_share] o.e.j.s.HttpChannel /solr/core_share/update/extract
java.lang.OutOfMemoryError
    at java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
    at java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
    at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:302)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler.run(OOXMLTikaBodyPartHandler.java:147)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.handleEndOfRun(OOXMLWordAndPowerPointTextHandler.java:468)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.endElement(OOXMLWordAndPowerPointTextHandler.java:450)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1714)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2879)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:532)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
    at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleGeneralTextContainingPart(AbstractOOXMLExtractor.java:506)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processShapes(XSSFExcelExtractorDecorator.java:279)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:185)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
    at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML(XSSFExcelExtractorDecorator.java:120)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:143)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
    at
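
The top frames of this trace show the error being raised in AbstractStringBuilder.hugeCapacity while SolrContentHandler.characters appends extracted text. That suggests the accumulated text grew beyond what a single StringBuilder can hold, which would be independent of the heap size; this reading is an inference from the trace and is not confirmed anywhere in the thread. One general way to guard against unbounded growth is to cap the number of characters accumulated. The sketch below illustrates that idea only: the class BoundedTextBuffer is hypothetical and is not part of Solr or Tika, and its characters method merely mirrors the shape of the SAX callback of the same name.

```java
// Hypothetical illustration of capping accumulated text; not Solr or Tika code.
final class BoundedTextBuffer {
    private final StringBuilder text = new StringBuilder();
    private final int maxChars;
    private boolean truncated;

    BoundedTextBuffer(int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must not be negative");
        }
        this.maxChars = maxChars;
    }

    /** Appends at most as many characters as still fit under the cap. */
    void characters(char[] ch, int start, int length) {
        int room = maxChars - text.length();
        int toCopy = length;
        if (length > room) {
            truncated = true;   // remember that input was dropped
            toCopy = room;      // room is never negative here
        }
        text.append(ch, start, toCopy);
    }

    boolean isTruncated() {
        return truncated;
    }

    int length() {
        return text.length();
    }

    @Override
    public String toString() {
        return text.toString();
    }

    public static void main(String[] args) {
        BoundedTextBuffer buf = new BoundedTextBuffer(5);
        char[] input = "hello world".toCharArray();
        buf.characters(input, 0, input.length);
        System.out.println(buf + " truncated=" + buf.isTruncated()); // hello truncated=true
    }
}
```

A cap like this trades completeness of the indexed text for a bounded memory footprint, which is in the same spirit as limiting the size of documents sent for extraction.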