Re: Java heap space error
Hi,

did you set a garbage collection strategy on your JVM?

Marcello

On 07/24/2014 03:32 PM, Ameya Aware wrote:

Hi,

I am in the process of indexing around 2,00,000 documents. I have increased the Java heap space to 4 GB using the command below:

java -Xmx4096M -Xms4096M -jar start.jar

Still, after indexing around 15000 documents, it gives a Java heap space error again. Any fix for this?

Thanks,
Ameya
Re: Java heap space error
I think that with a large heap it is advisable to monitor the garbage collection behavior and to choose a strategy suited to your workload. On my production environment, with a heap of 6 GB, I set these parameters (server with 8 cores):

-server -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -Dcom.sun.management.jmxremote -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:ConcGCThreads=6 -XX:ParallelGCThreads=6

Marcello

On 07/24/2014 03:53 PM, Ameya Aware wrote:

I did not make any other change than this; the rest of the settings are default. Do I need to set a garbage collection strategy?

On Thu, Jul 24, 2014 at 9:49 AM, Marcello Lorenzi mlore...@sorint.it wrote:

Hi,

did you set a garbage collection strategy on your JVM?

Marcello

On 07/24/2014 03:32 PM, Ameya Aware wrote:

Hi,

I am in the process of indexing around 2,00,000 documents. I have increased the Java heap space to 4 GB using the command below:

java -Xmx4096M -Xms4096M -jar start.jar

Still, after indexing around 15000 documents, it gives a Java heap space error again. Any fix for this?

Thanks,
Ameya
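As a follow-up to the advice about monitoring garbage collection, here is a minimal sketch that prints the heap figures and the active collectors of the JVM it runs in, using the standard java.lang.management API. The class name GcReport and the output format are illustrative assumptions, not something from the thread. To inspect Solr itself, the same beans could be read remotely over JMX, which the -Dcom.sun.management.jmxremote flag above enables.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class: reports heap size and GC activity of the current JVM.
class GcReport {
    /** Builds a one-line summary of the current heap usage in megabytes. */
    static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the maximum is undefined.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + "MB";
        return "heap used=" + (heap.getUsed() / mb) + "MB committed="
                + (heap.getCommitted() / mb) + "MB max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
        // One bean per collector; the names depend on the GC flags in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Running this with the same -Xmx and GC flags as the Solr process gives a quick check that the options were actually picked up.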
Heap size and Solr 4.3
Hi All,

we have deployed a new Solr 4.3 instance (2 nodes with SolrCloud) on our production environment, but this morning one node ran out of memory, and we have noticed that the JVM uses a lot of Old Gen space during its normal lifecycle. Which factors contribute to this high heap usage?

Thanks,
Marcello
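To make the Old Gen observation measurable, the following sketch lists every heap memory pool of the current JVM with its usage. It relies only on the standard java.lang.management API; the class name HeapPools is an assumption made for illustration, and the pool names printed (for example an Old Gen pool) vary with the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: reports usage per heap memory pool.
class HeapPools {
    /** Returns one report line per heap pool (Eden, Survivor, Old Gen, ...). */
    static List<String> heapPoolReport() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 when undefined
            String percent = max > 0
                    ? String.format("%.1f%%", 100.0 * usage.getUsed() / max)
                    : "n/a";
            lines.add(pool.getName() + ": used=" + usage.getUsed()
                    + " max=" + max + " (" + percent + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        heapPoolReport().forEach(System.out::println);
    }
}
```

Sampling these figures over time shows whether the old generation keeps growing after full collections, which would point to retained objects rather than to collector tuning.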
SolR vs large PDF
Hi All,

on our test environment we have implemented a new search engine based on Solr 4.3, with 2 instances hosted on different servers and 1 shard on each servlet container. During some stress tests we noticed a bottleneck when crawling large PDF files, which blocks the serving of query results from the collections. Is it possible to speed up the crawl or mitigate the overhead created by PDFBox?

Thanks,
Marcello
Re: SolR vs large PDF
Hi Erick,

in our architecture we use Apache ManifoldCF: jobs are scheduled from Manifold-web, and the Manifold-agent takes the PDF files from the filesystem to the Solr instances. Is it possible to redirect the ManifoldCF jobs to a SolrJ client for specific schedules?

Thanks,
Marcello

On 11/27/2013 06:14 PM, Erick Erickson wrote:

I'm assuming you're using the ExtractingRequestHandler. Offloading the entire work onto your Solr box that is also serving queries and indexing is not going to scale well. Consider using Tika/SolrJ (Tika is what the ERH uses anyway) to offload the PDF parsing amongst as many clients as you can afford. Here's a way to get started: http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Wed, Nov 27, 2013 at 10:00 AM, Marcello Lorenzi mlore...@sorint.it wrote:

Hi All,

on our test environment we have implemented a new search engine based on Solr 4.3, with 2 instances hosted on different servers and 1 shard on each servlet container. During some stress tests we noticed a bottleneck when crawling large PDF files, which blocks the serving of query results from the collections. Is it possible to speed up the crawl or mitigate the overhead created by PDFBox?

Thanks,
Marcello
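The shape of the client-side approach Erick describes can be sketched with the standard library alone: parse each file on a bounded thread pool outside Solr, then send only the extracted text. This is a sketch under stated assumptions, not the method from the linked article. The extractor and indexer parameters are hypothetical placeholders; in a real setup the first would wrap a Tika parser and the second a SolrJ client.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical pipeline: extraction happens on the client, not inside Solr.
class ClientSideExtraction {
    /**
     * Parses each file on a fixed-size pool and hands (id, text) to the indexer.
     * Returns the number of documents handed over.
     */
    static int run(List<Path> files, int threads,
                   Function<Path, String> extractor,   // placeholder for a Tika-based parser
                   BiConsumer<String, String> indexer) // placeholder for a SolrJ-based sender
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path file : files) {
                results.add(pool.submit(() -> {
                    String text = extractor.apply(file);   // CPU-heavy step, off the Solr box
                    indexer.accept(file.toString(), text); // Solr receives plain text only
                    return true;
                }));
            }
            int sent = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) { // rethrows any extraction or indexing failure
                    sent++;
                }
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = Arrays.asList(Paths.get("a.pdf"), Paths.get("b.pdf"));
        int sent = run(files, 2,
                path -> "text of " + path.getFileName(),
                (id, text) -> System.out.println(id + " -> " + text));
        System.out.println(sent + " documents sent");
    }
}
```

The pool size caps how much parsing runs at once, so a batch of large PDFs slows down the crawl client instead of the nodes that serve queries.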
Re: PDF indexing issues
Hi,

I have checked the PDFBox Jira issue, but it offers no solution, because some users experienced the same issue with different CMAP entries. Would it be possible to update the PDFBox library in the Solr installation?

Thanks,
Marcello

On 11/15/2013 06:27 PM, Furkan KAMACI wrote:

You should check the Apache PDFBox project. A similar question: https://issues.apache.org/jira/browse/PDFBOX-940

2013/11/15 Marcello Lorenzi mlore...@sorint.it

Hi,

during our testing of Apache Solr 4.3, we noticed some errors during PDF indexing:

ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont; Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont; Error: Could not parse predefined CMAP file for '--UCS2'

and

ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter; FlateFilter: stop reading corrupt stream due to a DataFormatException

Could these errors be related to the PDF file format?

Thanks,
Marcello
Re: Solr xml img parsing exception
Hi Jack,

we have analyzed the issue and found duplicate Tika JARs on the Tomcat classpath. After removing the duplicate library, the search engine works as expected.

Thanks for the support,
Marcello

On 11/14/2013 05:24 PM, Jack Krupansky wrote:

The actual error appears to be:

Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".

So, check the input document at line 91, column 105. There should be an <img> tag there, but SAX is complaining that there is no matching </img>.

-- Jack Krupansky

-----Original Message-----
From: Marcello Lorenzi
Sent: Thursday, November 14, 2013 9:26 AM
To: solr-user@lucene.apache.org
Subject: Solr xml img parsing exception

Hi,

I have installed a Solr 4.3 instance and we have configured ManifoldCF to pass web content to the shard collection, but during the crawl we have noticed many occurrences of this exception:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
Caused by: org.apache.tika.exception.TikaException: XML parse error
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".
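Since the root cause in this thread turned out to be duplicate JARs on the classpath, a small scan like the one below may help spot the same situation elsewhere. It is a heuristic sketch, not a tool mentioned in the thread: the class name DuplicateJars and the version-stripping pattern are assumptions, and the directories to scan (for example a webapp's WEB-INF/lib and the container's lib directory) are passed as arguments.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class: finds JARs that appear more than once across directories.
class DuplicateJars {
    // Heuristic: a trailing "-<digit...>.jar" is treated as the version suffix.
    private static final Pattern VERSION_SUFFIX = Pattern.compile("-\\d[\\w.\\-]*\\.jar$");

    /** Maps "tika-core-1.3.jar" to "tika-core" and "log4j.jar" to "log4j". */
    static String artifactName(String fileName) {
        String stripped = VERSION_SUFFIX.matcher(fileName).replaceFirst("");
        return stripped.endsWith(".jar")
                ? stripped.substring(0, stripped.length() - ".jar".length())
                : stripped;
    }

    /** Groups JAR paths by artifact name and keeps only groups with two or more entries. */
    static Map<String, List<String>> findDuplicates(Collection<String> jarPaths) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String jarPath : jarPaths) {
            String fileName = Paths.get(jarPath).getFileName().toString();
            if (!fileName.endsWith(".jar")) {
                continue; // ignore non-JAR files
            }
            byArtifact.computeIfAbsent(artifactName(fileName), k -> new ArrayList<>()).add(jarPath);
        }
        byArtifact.values().removeIf(group -> group.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) throws IOException {
        List<String> paths = new ArrayList<>();
        for (String dir : args) {
            try (Stream<Path> entries = Files.list(Paths.get(dir))) {
                entries.forEach(path -> paths.add(path.toString()));
            }
        }
        findDuplicates(paths).forEach((artifact, files) ->
                System.out.println(artifact + " -> " + files));
    }
}
```

The scan reports both the same file present in two directories and two versions of the same artifact side by side; either case can make the class loader pick up an unexpected class.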
PDF indexing issues
Hi,

during our testing of Apache Solr 4.3, we noticed some errors during PDF indexing:

ERROR - 2013-11-15 15:14:26.248; org.apache.pdfbox.pdmodel.font.PDCIDFont; Error: Could not parse predefined CMAP file for 'PDFXC30-Indentity0-UCS2'
ERROR - 2013-11-15 15:14:36.108; org.apache.pdfbox.pdmodel.font.PDCIDFont; Error: Could not parse predefined CMAP file for '--UCS2'

and

ERROR - 2013-11-15 15:12:18.928; org.apache.pdfbox.filter.FlateFilter; FlateFilter: stop reading corrupt stream due to a DataFormatException

Could these errors be related to the PDF file format?

Thanks,
Marcello
Solr xml img parsing exception
Hi,

I have installed a Solr 4.3 instance and we have configured ManifoldCF to pass web content to the shard collection, but during the crawl we have noticed many occurrences of this exception:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
    at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.tika.exception.TikaException: XML parse error
    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
    ... 24 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
Re: Solr xml img parsing exception
Hi Erik,

but in this case, does the custom loader receive an HTTP 500 error from Solr?

Thanks,
Marcello

On 11/14/2013 04:29 PM, Erik Hatcher wrote:

Also there's a custom loader here that is the culprit: com.lsegroup.solr.handler.CwsExtractingDocumentLoader

On Nov 14, 2013, at 10:20, Erick Erickson erickerick...@gmail.com wrote:

It looks like bad data. The XML you're sending to Solr looks malformed, so I suspect this is completely outside of Solr's purview.

Best,
Erick

On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi mlore...@sorint.it wrote:

Hi,

I have installed a Solr 4.3 instance and we have configured ManifoldCF to pass web content to the shard collection, but during the crawl we have noticed many occurrences of this exception:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
Caused by: org.apache.tika.exception.TikaException: XML parse error
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".
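The parse failure discussed in this thread is easy to reproduce outside Solr: HTML allows an unclosed img element, while a strict XML parser does not. The sketch below uses only the JDK's built-in parser; the class name ImgTagCheck and the sample markup are assumptions made for illustration, and the exact wording of the message depends on the parser implementation and locale.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: checks whether a string is well-formed XML.
class ImgTagCheck {
    /** Returns null for well-formed XML, otherwise "line:column message". */
    static String check(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Valid HTML, but not well-formed XML: <img> is never closed.
        System.out.println(check("<div><img src=\"a.png\"></div>"));
        // Self-closing form is accepted by the XML parser.
        System.out.println(check("<div><img src=\"a.png\"/></div>"));
    }
}
```

If crawled web pages are being routed to an XML parser, checking the content type detected for those documents is a reasonable next step, since such pages are usually HTML rather than XML.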