The help on the file size limit was great; we still have a problem with small JPG files, though.
solr.log contains:
ERROR - 2013-10-29 15:47:19.815; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoClassDefFoundError: com/adobe/xmp/XMPException
    at com.drew.imaging.jpeg.JpegMetadataReader.extractMetadataFromJpegSegmentReader(JpegMetadataReader.java:112)
    at com.drew.imaging.jpeg.JpegMetadataReader.readMetadata(JpegMetadataReader.java:71)
    at org.apache.tika.parser.image.ImageMetadataExtractor.parseJpeg(ImageMetadataExtractor.java:91)
    at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    ... 16 more
Caused by: java.lang.ClassNotFoundException: com.adobe.xmp.XMPException
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 30 more
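The NoClassDefFoundError for com/adobe/xmp/XMPException suggests that the Adobe xmpcore jar, a dependency of the metadata-extractor library Tika uses to parse JPEG metadata, is missing from Solr's extraction classpath. A minimal sketch of a check and fix, assuming a standard Solr 4.4 layout (the directory paths and the xmpcore version number are assumptions and may differ on your install):

```shell
# Check whether the XMP library is present among the Solr Cell
# (extraction) dependencies; metadata-extractor needs xmpcore to
# read the XMP segment of JPEG files.
ls contrib/extraction/lib/ | grep -i xmp

# If no xmpcore jar is listed, copy one next to the other Tika
# dependencies (version 5.1.2 is an assumption; match whatever your
# metadata-extractor jar expects), then restart Tomcat so Solr
# picks up the new classpath entry.
cp xmpcore-5.1.2.jar contrib/extraction/lib/
```

Restarting Tomcat after adding the jar is needed because the classpath is scanned at startup.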
On Tue, Oct 29, 2013 at 1:25 PM, Ronny Heylen <[email protected]> wrote:
> That was a very good suggestion!
> Setting the max size has solved the problem for the first subfolder on
> which we test.
> Now we will retry on the full drive and let you know the result.
>
>
> On Tue, Oct 29, 2013 at 12:12 PM, Karl Wright <[email protected]> wrote:
>
>> Based on the error message, Adrian is correct and this is once again a
>> Solr-side problem. Since Solr puts all documents into memory, my guess is
>> that you are attempting to index some very large documents, and those are
>> causing Solr to run out of memory. Either exclude these from the crawl or
>> set a reasonable maximum length.
>>
>> Karl
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: Ronny Heylen
>> Sent: 10/29/2013 6:52 AM
>>
>> To: [email protected]
>> Subject: Error in Manifoldcf, what's the first step?
>>
>> Hi,
>>
>> Solr is 4.4, ManifoldCF is 1.3.
>>
>> We are indexing a shared Windows network drive, filtering on *.doc*,
>> *.xls*, *.pdf ... with about 650,000 files to index, giving a Solr index
>> 35 GB in size.
>>
>> The result is great, except that the ManifoldCF job crashes before the end.
>>
>> Note that:
>> - ignoreTikaException is set to true in solrconfig.xml (otherwise the
>> ManifoldCF job stops very early).
>> - Tomcat has been given 24 GB of memory (it uses 15 GB).
>> - there are 8 cores.
>>
>> Message in http://localhost:8080/mcf-crawler-ui/showjobstatus.jsp is:
>> Error: Repeated service interruptions - failure processing document:
>> Server at http://localhost:8080/solr/collection1 returned non ok
>> status:500, message:Internal Server Error
>>
>> Then, instead of indexing the full drive in one job, we have defined one
>> job for each subfolder.
>>
>> Almost all "subfolder" jobs end successfully; only for 2 or 3 do we receive
>> the same message, and for 2 or 3 other ones a different message:
>>
>> Error: Repeated service interruptions - failure processing document: Read
>> timed out
>>
>> If we try to go further (defining one job for each subfolder of a
>> subfolder in error), the same thing happens: success for almost all
>> subfolders except 1 or 2.
>>
>> What is the first step we should take to solve this problem?
>>
>> Thanks.
>>
>
>