Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by EricPugh:
http://wiki.apache.org/solr/UpdateRichDocuments

The comment on the change is:
tweaking the example...

------------------------------------------------------------------------------
  = Updating a Solr Index with Rich Documents such as PDF and MS Office =

- Solr has an extensible
+ Solr has an extensible DocumentHandler architecture that allows you to feed it XML and CSV documents. There is now a patch file available as part of [https://issues.apache.org/jira/browse/SOLR-284 SOLR-284] that adds support for parsing rich binary formats.

- Solr accepts index updates in [http://en.wikipedia.org/wiki/Comma-separated_values CSV] (Comma Separated Values) format. Different separators are configurable, and multi-valued fields are supported.
+ This page talks about how to get started using this patch. If you like it, please [https://issues.apache.org/jira/secure/ViewVoters!default.jspa?id=12372848 vote] for it on the JIRA issue tracker so we can get it added to the Solr codebase!
+
  [[TableOfContents]]

@@ -28, +29 @@

  4) Unzip the test-files.zip into SOLR_HOME/test/test-files/. These are various test files for running the included unit tests.

- 5) Apply the rich.patch to your source. Rich.patch has tweaks that add the solr.RichDocumentRequestHandler to your solrconfig.xml.
+ 5) Apply the rich.patch to your source. Rich.patch has tweaks that add the solr.RichDocumentRequestHandler to your solrconfig.xml files.

  6) Copy the contents of source.zip into SOLR_HOME/src/java/org/apache/solr/handler

@@ -41, +42 @@

  All of the normal methods for [SolrContentStreams uploading content] are supported.

  === Example ===
+ These examples assume you have run {{{ant example}}} first and have it up and running using {{{java -jar start.jar}}}.
+
  There is a sample PDF file at {{{src/test/test-files/simple.pdf}}} that may be used to add a PDF to the solr example server.

- Example of using HTTP-POST to send the CSV data over the network to the Solr server:
+ Example of using HTTP-POST to send the PDF data over the network to the Solr server:
  {{{
- cd src/test/test-files/simple.pdf
+ cd src/test/test-files/
- curl http://localhost:8983/solr/update/rich --data-binary @simple.pdf -H 'Content-type:text/plain; charset=utf-8'
+ curl http://localhost:8983/solr/update/rich?stream.type=pdf --data-binary @simple.pdf -H 'Content-type:text/plain; charset=utf-8'
  }}}

  Uploading a binary file can be more efficient than sending it over the network via HTTP.
@@ -57, +60 @@

  The following request will cause Solr to directly read the input file:

  {{{
- curl http://localhost:8983/solr/update/rich?stream.type=pdf
+ curl http://localhost:8983/solr/update/rich?stream.type=pdf&stream.file=src/test/test-files/simple.pdf&id=100&stream.fieldname=name
  #NOTE: The full path, or a path relative to the CWD of the running solr server must be used.
  }}}
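
Note on the last curl line in the diff: typed unquoted at a POSIX shell prompt, each `&` acts as a command separator that backgrounds what precedes it, so curl would most likely receive the URL only up to `stream.type=pdf`. The following is a minimal sketch of a quoted form, assuming the example server is listening on localhost:8983 with the SOLR-284 patch applied. The parameter names are the ones shown in the diff; the variable names `SOLR_URL`, `QUERY`, `FULL_URL`, and `SEND` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical variable names; the endpoint and parameters are copied from the diff.
SOLR_URL='http://localhost:8983/solr/update/rich'

# Single quotes keep '&' and '?' literal instead of letting the shell interpret them.
QUERY='stream.type=pdf&stream.file=src/test/test-files/simple.pdf&id=100&stream.fieldname=name'

FULL_URL="${SOLR_URL}?${QUERY}"

# Print the URL so it can be checked before anything is sent.
echo "$FULL_URL"

# Only contact the server when SEND=1 is set (assumption: a Solr example server is running).
if [ "${SEND:-0}" = "1" ]; then
  curl "$FULL_URL"
fi
```

Run with `SEND=1` in the environment to issue the request. As the NOTE in the diff says, the `stream.file` path must be a full path or relative to the working directory of the running Solr server.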
