Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by seanoc5:
http://wiki.apache.org/solr/ExtractingRequestHandler

------------------------------------------------------------------------------
  * Check out Solr trunk or get a 1.4 release or later if it exists
  * cd example
- * Add the Configuration as defined below to the solrconfig.xml (or your solrconfig.xml), the libs will be added to the Solr home lib automatically by the example target, but the example Solr configuration does not contain the configuration of the ExtractingRequestHandler
+ * Add the Configuration as defined below to the solrconfig.xml (or your solrconfig.xml), the libs will be added to the Solr home lib automatically by the example target, but the example Solr configuration does not contain the configuration of the ExtractingRequestHandler
+ *''for recent solr code from svn, just uncomment existing section in solr/conf/solrconfig.xml under 'example' dir''
  * java -jar start.jar

  In a separate window, post a file:
- * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text -F "[email protected]" //Note, the trunk/site contains some nice example docs.
+ * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text -F "[email protected]" //Note, the trunk/site contains some nice example docs
+ * hint: [email protected] needs a valid path (absolute or relative), e.g. "myfi...@../../site/tutorial.html" if you are still in exampledocs dir.
+ * with recent svn, you may need to add a unique '''id''' param to curl (see [http://www.nabble.com/Missing-required-field:-id-Using-ExtractingRequestHandler-td22611039.html nabble msg]):
+ * e.g. curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.literal.id=123 -F "myfi...@../../site/tutorial.html"

  or

@@ -51, +55 @@

  <!> NOTE, this literally streams the file, which does not, then, provide info to Solr about the name of the file.

  or whatever other way you know how to do it.
  Don't forget to COMMIT!
+ * e.g. curl "http://localhost:8983/solr/update/" -H "Content-Type: text/xml" --data-binary '<commit waitFlush="false"/>' --see [http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#example.source LucidImagination note]

  = Configuration =
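
The post-then-commit steps in the change above can be collected into one small script. This is only a sketch under stated assumptions: the `ext.*` parameter names and URLs are copied from the diff and may differ in other Solr versions, the form field name `myfile` is a guess (the field name is garbled in this notification), and the function names `extract_url`, `post_file` and `commit` are hypothetical helpers, not part of Solr or curl.

```shell
#!/bin/sh
# Sketch only: parameters are taken from the wiki change above.
# ASSUMPTION: the multipart field name "myfile" (garbled in the notification).
SOLR_URL="http://localhost:8983/solr"

# Build the extract URL, passing a unique id as a literal field value.
# $1 = document id
extract_url() {
  printf '%s/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=%s\n' \
    "$SOLR_URL" "$1"
}

# Post one file as multipart form data (curl -F "field=@path").
# $1 = document id, $2 = absolute or relative path to the file
post_file() {
  # Fail early on a bad path instead of letting curl report it.
  [ -f "$2" ] || { echo "no such file: $2" >&2; return 1; }
  curl "$(extract_url "$1")" -F "myfile=@$2"
}

# Send an explicit commit after posting.
commit() {
  curl "$SOLR_URL/update/" -H "Content-Type: text/xml" \
       --data-binary '<commit waitFlush="false"/>'
}

# Usage (needs a running Solr, so nothing runs when the file is loaded):
#   post_file 123 ../../site/tutorial.html && commit
```

Building the URL inside a quoted string avoids the `\&` escaping needed when the URL is typed unquoted on the command line, and the path check covers the "needs a valid path" hint before any request is sent.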
