Re: Solr Cell revamped as an UpdateProcessor?

Grant Ingersoll Mon, 07 Dec 2009 15:29:59 -0800

On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote:

> 
> As someone with very little knowledge of Solr Cell and/or Tika, I find 
> myself wondering if ExtractingRequestHandler would make more sense as an 
> extractingUpdateProcessor -- where it could be configured to take either 
> binary fields (or string fields containing URLs) out of the Documents, parse 
> them with Tika, and add the various XPath-matching hunks of text back into 
> the document as new fields.
> 
> Then ExtractingRequestHandler just becomes a handler that slurps up its 
> ContentStreams and adds them as binary data fields and adds the other literal 
> params as fields.
> 
> Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths 
> in XML- and CSV-based updates, fairly trivial?


It probably could, but I'm not sure how it would work in a processor chain.  
Then again, I'm not sure I understand processor chains all that well either.  
BTW, I also plan on adding a SolrJ client for Tika that does the extraction on 
the client.  The ExtractingRequestHandler is really only designed for 
lighter-weight extraction cases, since in many cases one would simply not want 
to send that much rich content over the wire.
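
To make the processor-chain question a bit more concrete, here is a rough 
sketch in plain Java of the idea described above: a step in a chain pulls a 
binary field out of a document, hands it to an extractor, and adds the 
extracted fields back.  It deliberately does not use the Solr or Tika APIs.  
Processor, runChain, extracting and fakeExtract are all hypothetical names, 
the document is modeled as a plain Map, and fakeExtract is only a stand-in 
for a real Tika parse (it treats the bytes as UTF-8 text and calls the first 
line the title).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins only: none of these types are Solr or Tika classes.
class ExtractingProcessorSketch {

    /** One step in a chain; it may modify the document in place. */
    interface Processor {
        void process(Map<String, Object> doc);
    }

    /** Runs each processor in order, loosely mimicking an update processor chain. */
    static void runChain(List<Processor> chain, Map<String, Object> doc) {
        for (Processor p : chain) {
            p.process(doc);
        }
    }

    /**
     * Builds a processor that replaces a binary source field with the
     * fields an extractor derives from it. Non-binary values are left alone.
     */
    static Processor extracting(String sourceField,
                                Function<byte[], Map<String, String>> extractor) {
        return doc -> {
            Object raw = doc.get(sourceField);
            if (!(raw instanceof byte[])) {
                return; // nothing to extract from
            }
            doc.remove(sourceField);
            doc.putAll(extractor.apply((byte[]) raw));
        };
    }

    /** Placeholder for a Tika parse: first line becomes "title", the rest "content". */
    static Map<String, String> fakeExtract(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        int nl = text.indexOf('\n');
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", nl < 0 ? text : text.substring(0, nl));
        fields.put("content", nl < 0 ? "" : text.substring(nl + 1));
        return fields;
    }

    /** Builds a document with a binary "raw" field and runs it through the chain. */
    static Map<String, Object> demo() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("raw", "Hello\nBody text".getBytes(StandardCharsets.UTF_8));

        List<Processor> chain = new ArrayList<>();
        chain.add(extracting("raw", ExtractingProcessorSketch::fakeExtract));
        runChain(chain, doc);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this shape the request handler would only have to put the uploaded bytes 
into a field such as "raw" and let the chain do the rest, which is roughly 
the split of responsibilities suggested above.  Whether a real Solr update 
processor can be wired up this way is exactly the part I'm unsure about.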
