RE: Can't extract Outlook message files

2012-08-26 Thread Alexander Cougarman
This is an issue with extractOnly=true on Solr 3.6.1. We upgraded to 4.0 Beta 2 
and the problem went away. Posting this in case anyone else runs into it.
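
For anyone re-running the two requests from the original message after an 
upgrade, here is a minimal sketch of how they might be wrapped in a shell 
script. The function names solr_extract_url and solr_extract and the SOLR_URL 
variable are assumptions made for illustration; they are not part of Solr or 
curl. The only point it shows is that the URL is quoted, so the shell does not 
treat the "&" between parameters as a background operator.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr or curl): builds the URL of the
# extract handler from a query string. SOLR_URL is an assumed variable
# name; it defaults to the host and port used in this thread.
solr_extract_url() {
  printf '%s/update/extract?%s\n' "${SOLR_URL:-http://localhost:8983/solr}" "$1"
}

# Hypothetical wrapper: posts one file to the extract handler as a
# multipart form field named "myfile", as in the original commands.
# The URL is double-quoted so "&" reaches curl instead of the shell.
solr_extract() {
  curl "$(solr_extract_url "$1")" -F "myfile=@$2"
}
```

With these defined, the failing and the working request from the original 
message would be issued as:

   solr_extract 'extractOnly=true' 92.msg
   solr_extract 'literal.id=doc1&commit=true' 92.msg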

Sincerely,
Alex 



Re: Can't extract Outlook message files

2012-08-24 Thread Erick Erickson
Hmmm, it kind of looks like your file doesn't have an id field, but
that's just a guess based on your statement that providing an ID
works just fine. Does it work if you take the uniqueKey definition
out of your schema.xml (you'll also have to remove the
required="true" from the id field)?

But this is a wild shot in the dark.

Best
Erick


Can't extract Outlook message files

2012-08-23 Thread Alexander Cougarman
Hi. We're trying to use the following curl command to perform an extract-only 
request on a *.MSG file, but it blows up:

   curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@92.msg"

If we do this, it works fine:

   curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@92.msg"

We've tried a variety of MSG files and they all produce the same error; they 
all have content in them. What are we doing wrong?

Here's the exception the extractOnly=true command generates:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 null

org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@aaf063
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
    ... 23 more
Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
    at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
    at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
    at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
    at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 26 more
</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/update/extract. Reason:
<pre>null

org.apache.solr.common.SolrException
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)