Hi,

I use Tika through the Solr ExtractingRequestHandler and I am facing what
I believe is a very common use case: postprocessing the fields extracted
by Tika in order to normalize some field values or to override them with
explicitly passed "literal" values.
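
To make the use case concrete, here is a rough sketch of the behaviour I
am after. It deliberately uses plain Java maps as a stand-in for the
extracted fields rather than any real Tika or Solr API; the class name
FieldPostProcessor and the method postProcess are hypothetical, and the
normalization rules (trimming, lower-casing field names) are only
assumed examples.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: NOT part of Tika or Solr, only an illustration.
public class FieldPostProcessor {

    // Normalizes extracted fields, then lets explicitly passed literals win.
    static Map<String, String> postProcess(Map<String, String> extracted,
                                           Map<String, String> literals) {
        Map<String, String> result = new LinkedHashMap<>();

        // Step 1: normalize field names and values (assumed rules).
        for (Map.Entry<String, String> e : extracted.entrySet()) {
            String name = normalizeName(e.getKey());
            String value = e.getValue() == null ? "" : e.getValue().trim();
            result.put(name, value);
        }

        // Step 2: literal values override whatever was extracted.
        for (Map.Entry<String, String> e : literals.entrySet()) {
            result.put(normalizeName(e.getKey()), e.getValue());
        }
        return result;
    }

    // Assumed naming rule: trimmed, lower-cased field names.
    private static String normalizeName(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put(" Author ", "  value guessed by the parser ");
        extracted.put("Title", " Some report ");

        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("author", "value passed explicitly");

        System.out.println(postProcess(extracted, literals));
    }
}
```

What I am looking for is the proper place to hook this kind of logic
into the extraction pipeline.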

With the exception of some vague statements about "ContentHandler", I
failed to find any good examples of this (although it appears to be
quite an important feature).
I would also like to work at the API "field" level rather than working
with XPath on the raw Tika output.

Does anyone know of any good resources or samples showing the proper way
to "postprocess" fields in the context of a Solr integration?

PS: I could have posted this on the Solr mailing list, but I know that
Tika not only outputs XML but also overrides the fields passed to the
ExtractingRequestHandler, so I guess that the changes I need would
rather apply somewhere around the Tika API.


Thank you in advance.
