Posting Word documents

2009-07-30 Thread Kevin Miller
I am trying to post a Word document using the Solr post.jar file.  When
I attempt this, using a command line interface, I get a fatal error.

I have looked at the following resources:

Solr.com: Tutorial, Docs, FAQ, ExtractingRequestHandler.

As near as I can tell, I have all the files in the proper place.

Following is a portion of the error displayed in the cmd window:

C:\Solr\Apache~1\example\exampledocs>java -jar post.jar *.doc
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in
UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file BadNews.doc
SimplePostTool: FATAL: Solr returned an error:
Unexpected_character__code_65533__0xfffd_in_prolog_expected___at_rowcol_
unknownsoruce_11_javaioIOException_Unexpected_charater__code65533__0xfff
d_in_prolog_expected___at_rowcol_unknownsource_11___at_orgapachesolrhand
lerXMLLoaderloadXMLLoaderjava73___at_orgapahcesolrhandlerContentStreamHa
ndlerBasehandlerRequrestBodyContentStreamHandlerBasejava54___...

There is more, and if needed I will be happy to post all of it.

Here is the entry that was written to the log file:

127.0.0.1 -  -  [30/07/2009:15:20:09 +] POST /solr/update HTTP/1.1 500 4011
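The 0xfffd at [1,1] is consistent with the file having been handed to the Solr XML loader, which decodes the request body as UTF-8 text (post.jar sends it as XML in UTF-8). A Word 97-2003 .doc is an OLE2 compound file whose first byte, 0xD0, is not followed by a valid UTF-8 continuation byte, so a decoder that substitutes bad input yields U+FFFD (65533) exactly where the parser expected '<'. This minimal Python sketch reproduces that decoding step; it approximates the server's behaviour with Python's `errors="replace"` and uses a synthetic header rather than the actual BadNews.doc.

```python
# Word 97-2003 .doc files are OLE2 compound files; they start with this fixed
# 8-byte signature instead of the '<' that an XML document would start with.
OLE2_SIGNATURE = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def first_char_as_utf8(data: bytes) -> str:
    """Return the first character a UTF-8 decoder (with replacement) would see."""
    return data[:4].decode("utf-8", errors="replace")[0]


def looks_like_ole2(data: bytes) -> bool:
    """True if the bytes begin with the OLE2 compound-file signature."""
    return data.startswith(OLE2_SIGNATURE)


if __name__ == "__main__":
    header = OLE2_SIGNATURE + b"\x00" * 8  # synthetic stand-in for a .doc header
    ch = first_char_as_utf8(header)
    print(looks_like_ole2(header))  # True
    print(ord(ch), hex(ord(ch)))    # 65533 0xfffd
```

A real Solr XML update, by contrast, starts with `<add>`, so its first decoded character is '<' and the prolog check passes.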

Kevin Miller
Web Services


Re: Posting Word documents

2009-07-30 Thread Mark Miller

Look again at ExtractingRequestHandler.

I haven't looked at what post.jar does internally, but it probably
doesn't work with ExtractingRequestHandler unless you can send other
params as well. I would use curl, as the examples in the docs for
ExtractingRequestHandler do, or figure out whether post.jar will work for
you and use it correctly. What handler is /update mapped to? If it's
not mapped to ExtractingRequestHandler, then you have no hope of this
working in any case. It looks to me like it's trying to process the file as
Solr XML, which means you are not submitting it to ExtractingRequestHandler.
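For comparison, here is a minimal Python sketch of sending the file to ExtractingRequestHandler rather than to /update. It assumes the handler is registered at /update/extract (check your own solrconfig.xml) and that it accepts the raw file bytes as the POST body, with the Content-Type header describing the file. `literal.id` supplies the unique key field and `commit=true` commits after the add; the helper names and the document id are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: ExtractingRequestHandler is mapped to /update/extract.
# Plain /update expects Solr XML, which is why the .doc was rejected.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url: str, doc_id: str, commit: bool = True) -> str:
    """Build the request URL; literal.id sets the document's unique key."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_word_doc(path: str, doc_id: str, base_url: str = SOLR_EXTRACT_URL) -> int:
    """POST the raw .doc bytes as the request body and return the HTTP status."""
    with open(path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_extract_url(base_url, doc_id),
        data=body,
        headers={"Content-Type": "application/msword"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a running Solr, `post_word_doc("BadNews.doc", "BadNews")` would send one file. Posting the raw body keeps the sketch to the standard library; a multipart upload, which the curl examples use, would work as well but needs more code to build by hand.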


--
- Mark

http://www.lucidimagination.com


