Solr Cell revamped as an UpdateProcessor?

Chris Hostetter Mon, 07 Dec 2009 12:52:14 -0800

As someone with very little knowledge of Solr Cell and/or Tika, I find myself wondering if ExtractingRequestHandler would make more sense as an extractingUpdateProcessor -- where it could be configured to take either binary fields (or string fields containing URLs) out of the Documents, parse them with Tika, and add the various XPath-matching hunks of text back into the document as new fields.

Then ExtractingRequestHandler just becomes a handler that slurps up its ContentStreams and adds them as binary data fields and adds the other literal params as fields.

Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths in XML and CSV based updates, fairly trivial?
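
To make the split concrete, here is a rough sketch in plain Java. It deliberately uses no real Solr or Tika classes: documents are modelled as plain Maps, and Extractor, ExtractingUpdateProcessor, handle(), the "raw" field name and the "attr_" prefix are all hypothetical names invented for illustration. Fetching content from URL-valued string fields is left out. Treat it as an outline of the idea, not a proposed implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ExtractingProcessorSketch {

    /** Hypothetical stand-in for a Tika-backed parser: raw bytes in, extracted name -> text out. */
    interface Extractor extends Function<byte[], Map<String, String>> {}

    /** Hypothetical processor: pulls a binary field out of the document and adds extracted fields. */
    static final class ExtractingUpdateProcessor {
        private final String sourceField;
        private final String targetPrefix;
        private final Extractor extractor;

        ExtractingUpdateProcessor(String sourceField, String targetPrefix, Extractor extractor) {
            this.sourceField = sourceField;
            this.targetPrefix = targetPrefix;
            this.extractor = extractor;
        }

        Map<String, Object> process(Map<String, Object> doc) {
            Map<String, Object> out = new LinkedHashMap<>(doc);
            // Take the binary field out of the document...
            Object raw = out.remove(sourceField);
            if (raw instanceof byte[]) {
                // ...parse it, and add each extracted hunk of text back as a new field.
                extractor.apply((byte[]) raw)
                         .forEach((name, text) -> out.put(targetPrefix + name, text));
            }
            return out;
        }
    }

    /** Hypothetical thin handler: literal params become fields, the content stream a binary field. */
    static Map<String, Object> handle(byte[] contentStream, Map<String, String> literals,
                                      String binaryField) {
        Map<String, Object> doc = new LinkedHashMap<>(literals);
        doc.put(binaryField, contentStream);
        return doc;
    }

    public static void main(String[] args) {
        // Fake extractor for the demo; a real one would delegate to Tika.
        Extractor fake = bytes ->
            Map.of("content", new String(bytes, StandardCharsets.UTF_8).trim());

        Map<String, Object> doc = handle(
            " hello ".getBytes(StandardCharsets.UTF_8), Map.of("id", "doc1"), "raw");

        Map<String, Object> indexed =
            new ExtractingUpdateProcessor("raw", "attr_", fake).process(doc);
        System.out.println(indexed);
    }
}
```

With this shape, an XML or CSV update that already carries a binary field could go through the same processor without touching the handler at all.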




-Hoss
