Re: Store tika extracted result as xhtml

Chris Hostetter Sun, 25 Oct 2009 08:42:51 -0700

: My objective is to be able to store it as xhtml in the field and be 
: able to retrieve it as cached output. Since tika is already giving xhtml 
: output, I wonder why Solr saves it as plain text. (Maybe I missed 
: something in the configuration??)


I'm not very familiar with Tika or Solr CELL, but I think what you are 
seeing is that Solr only asks Tika for the *content* of the DOM Nodes 
matched by the xpath and/or capture params (ie: node.getTextContent()).

I suspect it wouldn't be too hard to add an option to allow the capture of 
the serialized DOM Nodes.
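
To illustrate the difference, here is a rough sketch using only the JDK's 
own XML APIs. It is not Solr or Tika code: the class name DomCaptureSketch, 
its helper methods, and the sample markup are all made up for this example. 
It only shows why getTextContent() drops the markup while serializing the 
same node keeps it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of Solr Cell or Tika.
public class DomCaptureSketch {

    // Parse a well-formed XHTML string and return the first element with the given tag name.
    static Node firstElement(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tagName).item(0);
    }

    // Text-only capture: concatenated text of the node and its descendants, no tags.
    static String textOf(Node node) {
        return node.getTextContent();
    }

    // Markup-preserving capture: serialize the node with an identity transform.
    static String serialize(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input standing in for Tika's XHTML output.
        String xhtml = "<html><body><p>Hello <b>world</b></p></body></html>";
        Node p = firstElement(xhtml, "p");
        System.out.println("text only:  " + textOf(p));
        System.out.println("serialized: " + serialize(p));
    }
}
```

An option like the one suggested above would presumably store something 
like the second form in the field instead of the first.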



-Hoss
