RE: Can't extract Outlook message files
This is an issue with extractOnly=true on Solr 3.6.1. We upgraded to 4.0 Beta 2 and the problem went away. Posting this just in case anyone else runs into it.

Sincerely,
Alex

-----Original Message-----
From: Alexander Cougarman [mailto:acoug...@bwc.org]
Sent: 23 August 2012 12:27 PM
To: solr-user@lucene.apache.org
Subject: Can't extract Outlook message files

Hi. We're trying to use the following curl command to perform an extract-only of a *.MSG file, but it blows up:

  curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@92.msg"

If we do this, it works fine:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@92.msg"

We've tried a variety of MSG files and they all produce the same error; they all have content in them. What are we doing wrong?
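For anyone re-running the two requests from this thread, for example to check the behavior after an upgrade, here is a minimal Python sketch that builds both URLs and prints shell-safe curl commands. The endpoint, parameters, and file name come from the thread; the helper names `extract_url` and `curl_command` are made up for illustration. It only builds strings and does not contact Solr. The point it illustrates is that the query parameters must be joined with `&` and the whole URL quoted, otherwise the shell treats `&` as a command separator.

```python
import shlex
from urllib.parse import urlencode

# Endpoint as given in the thread (a default local Solr instance).
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def extract_url(params):
    """Return the extract URL with params URL-encoded and joined by '&'."""
    return EXTRACT_URL + "?" + urlencode(params)


def curl_command(url, path):
    """Return a curl command line with the URL quoted for the shell."""
    return "curl " + shlex.quote(url) + " -F " + shlex.quote("myfile=@" + path)


# The failing request (extract only) and the working one (index with an id).
extract_only = extract_url({"extractOnly": "true"})
index_doc = extract_url({"literal.id": "doc1", "commit": "true"})

print(curl_command(extract_only, "92.msg"))
print(curl_command(index_doc, "92.msg"))
```

Running the printed commands against a 3.6.1 and a 4.0 instance would be one way to confirm that only the extractOnly=true request is affected.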
Re: Can't extract Outlook message files
Hmmm, it kind of looks like your file doesn't have an id field, but that's just guessing based on your statement that providing an ID works just fine. Does it work if you take the uniqueKey definition out of your schema.xml (you'll also have to remove the 'required=true' from the id field)? But this is a wild shot in the dark.

Best
Erick

On Thu, Aug 23, 2012 at 5:27 AM, Alexander Cougarman <acoug...@bwc.org> wrote:
> Hi. We're trying to use the following curl command to perform an extract-only of a *.MSG file, but it blows up:
>
>   curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@92.msg"
>
> If we do this, it works fine:
>
>   curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@92.msg"
>
> We've tried a variety of MSG files and they all produce the same error; they all have content in them. What are we doing wrong?
Can't extract Outlook message files
Hi. We're trying to use the following curl command to perform an extract-only of a *.MSG file, but it blows up:

  curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@92.msg"

If we do this, it works fine:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@92.msg"

We've tried a variety of MSG files and they all produce the same error; they all have content in them. What are we doing wrong? Here's the exception the extractOnly=true command generates:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 null

org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@aaf063
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
    ... 23 more
Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
    at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
    at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
    at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
    at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 26 more
</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/update/extract. Reason:
<pre>    null

org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
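When reading a nested Java trace like this one, the last "Caused by:" entry is usually the most useful place to start. As a rough sketch, the hypothetical helper `root_cause` below pulls that entry out of a trace pasted as text. The sample is an abbreviated copy of the trace in this message (one frame per exception), not the full output, and the function is a plain string scan rather than a real stack-trace parser.

```python
def root_cause(trace):
    """Return the innermost 'Caused by:' message of a Java stack trace.

    Falls back to the first non-empty line when there is no 'Caused by:'
    entry, and to an empty string for empty input.
    """
    prefix = "Caused by: "
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    causes = [ln[len(prefix):] for ln in lines if ln.startswith(prefix)]
    if causes:
        return causes[-1]
    return lines[0] if lines else ""


# Abbreviated copy of the trace from this message.
SAMPLE = """\
org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@aaf063
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
    at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
"""

print(root_cause(SAMPLE))
```

For this trace the innermost entry is the IllegalStateException raised in org.apache.xml.serialize while Tika's OfficeParser finishes the document, which is where the extractOnly=true request fails.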