[ https://issues.apache.org/jira/browse/SOLR-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexandre Rafalovitch closed SOLR-2886.
---------------------------------------
    Resolution: Cannot Reproduce

> Out of Memory Error with DIH and TikaEntityProcessor
> ----------------------------------------------------
>
>                 Key: SOLR-2886
>                 URL: https://issues.apache.org/jira/browse/SOLR-2886
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 4.0-ALPHA
>            Reporter: Tricia Jenkins
>
> I've recently upgraded from apache-solr-4.0-2011-06-14_08-33-23.war to
> apache-solr-4.0-2011-10-14_08-56-59.war and then
> apache-solr-4.0-2011-10-30_09-00-00.war to index ~5300 pdfs, of various
> sizes, using the TikaEntityProcessor. My indexing would run to completion
> and was completely successful under the June build. The only error was
> readability of the fulltext in highlighting. This was fixed in Tika 0.10
> (TIKA-611). I chose to use the October 14 build of Solr because Tika 0.10
> had recently been included (SOLR-2372).
> On the same machine without changing any memory settings my initial problem
> is a Perm Gen error. Fine, I increase the PermGen space.
> I've set the "onError" parameter to "skip" for the TikaEntityProcessor. Now
> I get several (6)
> SEVERE: Exception thrown while getting data
> java.net.SocketTimeoutException: Read timed out
> SEVERE: Exception in entity :
> tika:org.apache.solr.handler.dataimport.DataImportHandlerException:
> Exception in invoking url <url removed> # 2975
> pairs. And after ~3881 documents, with auto commit set unreasonably
> frequently I consistently get an Out of Memory Error
> SEVERE: Exception while processing: f document :
> null:org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.OutOfMemoryError: Java heap space
> The stack trace points to
> org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
> and
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:718).
> The October 30 build performs identically.
> Funny thing is that monitoring via JConsole doesn't reveal any memory issues.
> Because the out of Memory error did not occur in June, this leads me to
> believe that a bug has been introduced to the code since then.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)