Integrate Solr Cell/Tika as an UpdateRequestProcessor
-----------------------------------------------------
Key: SOLR-1763
URL: https://issues.apache.org/jira/browse/SOLR-1763
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Jan Høydahl
From Chris Hostetter's original post in solr-dev:
As someone with very little knowledge of Solr Cell and/or Tika, I find myself
wondering if ExtractingRequestHandler would make more sense as an
extractingUpdateProcessor -- where it could be configured to take either
binary fields (or string fields containing URLs) out of the Documents, parse
them with Tika, and add the various XPath-matching hunks of text back into the
document as new fields.
Then ExtractingRequestHandler just becomes a handler that slurps up its
ContentStreams and adds them as binary data fields and adds the other literal
params as fields.
Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths in
XML and CSV based updates fairly trivial?
-Hoss
I couldn't agree more, so I decided to add it as an issue.
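A minimal, dependency-free Java sketch of the handler/processor split Hoss describes might look like the following. It deliberately does not use the real Solr or Tika APIs: a document is modelled as a plain `Map`, the Tika parse step is a pluggable `Function<byte[], String>`, and every name here (`ExtractingProcessorSketch`, `toDocument`, `process`, and the `content_bin`/`content_text` fields) is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of an "extracting update processor". No Solr or Tika
 * classes are used: documents are plain maps and the parser is a Function.
 */
public class ExtractingProcessorSketch {

    private final String sourceField;                 // field holding raw binary content
    private final String targetField;                 // field that receives extracted text
    private final Function<byte[], String> extractor; // stand-in for a Tika parse call

    public ExtractingProcessorSketch(String sourceField, String targetField,
                                     Function<byte[], String> extractor) {
        this.sourceField = sourceField;
        this.targetField = targetField;
        this.extractor = extractor;
    }

    /** Handler half: store the raw stream as a binary field and keep literal params as fields. */
    public static Map<String, Object> toDocument(InputStream stream, String binaryField,
                                                 Map<String, String> literals) {
        Map<String, Object> doc = new LinkedHashMap<>(literals);
        try {
            doc.put(binaryField, stream.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return doc;
    }

    /** Processor half: parse the binary field, if present, and add the text as a new field. */
    public Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object raw = out.get(sourceField);
        if (raw instanceof byte[]) {
            out.put(targetField, extractor.apply((byte[]) raw));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("id", "doc1");
        Map<String, Object> doc = toDocument(
                new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8)),
                "content_bin", literals);

        // Trivial "parser" for the demo: decode the bytes as UTF-8 text.
        ExtractingProcessorSketch processor = new ExtractingProcessorSketch(
                "content_bin", "content_text", b -> new String(b, StandardCharsets.UTF_8));

        System.out.println(processor.process(doc).get("content_text")); // prints "hello world"
    }
}
```

Keeping the parser behind a `Function` is a choice made only for this sketch, so the processor can be exercised without Tika on the classpath; a real plugin would call a Tika parser at that step. Fields containing URLs or file paths (the SOLR-1358 case) would need an additional fetch step that is not shown here.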