[ https://issues.apache.org/jira/browse/TIKA-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-397.
--------------------------------

    Assignee: Jukka Zitting
    Resolution: Duplicate

This was fixed in Tika 0.5 as a side-effect of other changes. Solr trunk has already upgraded to a more recent Tika version (see SOLR-1819), so the fix will also be included in the next Solr release.

> Parser crashes on very simple file
> ----------------------------------
>
>                 Key: TIKA-397
>                 URL: https://issues.apache.org/jira/browse/TIKA-397
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4
>         Environment: Solr 1.4 on Ubuntu 9.10. OpenJDK Runtime Environment (IcedTea6 1.6.1) (6b16-1.6.1-3ubuntu1)
>            Reporter: Ross Keatinge
>            Assignee: Jukka Zitting
>
> Sorry but I can only talk about this from a Solr user's point of view. I'm
> using Solr's ExtractingRequestHandler (Solr Cell) to index some text files.
> In general it's working fine but Tika crashes when parsing a text file with
> certain upper case short words near the start of the file. I haven't
> been able to discover the pattern of what works and what doesn't but here's a
> real simple example.
>
> A file with just the letters XE and nothing else crashes. If I edit the file
> and change it to any of XA, XB, XC, XD or XF it works but XE always crashes.
> Lower case works.
>
> I discovered this with certain five letter words that unfortunately are very
> common in my documents.
>
> Here's the error message from Solr.
>
> Apache Tomcat/6.0.20 - Error report
>
> HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@a51027
>
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@a51027
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>     at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@a51027
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>     ... 18 more
> Caused by: java.lang.NullPointerException
>     at java.io.Reader.<init>(Reader.java:78)
>     at java.io.BufferedReader.<init>(BufferedReader.java:93)
>     at java.io.BufferedReader.<init>(BufferedReader.java:108)
>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>     ... 20 more

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
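
For readers following the innermost "Caused by" in the quoted trace: the bottom frames (Reader.<init> called from BufferedReader.<init>) are what the JDK produces when a null Reader is handed to the BufferedReader constructor. The report does not say why the reader was null at TXTParser.java:59, so the sketch below only reproduces that last step in isolation. The class NullReaderDemo and its methods are hypothetical names made up for illustration and are not Tika code; the null check in wrap() is one possible defensive pattern, not the fix that went into Tika 0.5.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Tika or Solr.
class NullReaderDemo {

    // Reproduces the bottom of the trace: BufferedReader passes its argument
    // to Reader(Object lock), which rejects null with a NullPointerException.
    static boolean nullReaderThrowsNpe() {
        try {
            new BufferedReader((Reader) null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Defensive wrapper (an assumption, for illustration): fail with a
    // descriptive message instead of a bare NullPointerException.
    static BufferedReader wrap(Reader source) {
        if (source == null) {
            throw new IllegalArgumentException("no reader available for this input");
        }
        return new BufferedReader(source);
    }

    // Reads the first line through the wrapper, converting the checked exception.
    static String firstLine(Reader source) {
        try (BufferedReader reader = wrap(source)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nullReaderThrowsNpe());                // prints true
        System.out.println(firstLine(new StringReader("XE")));    // prints XE
    }
}
```

With the check in place, a caller sees an exception that names the missing reader rather than an NPE from inside java.io, which makes a case like the "XE" file easier to diagnose from the Solr side.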