I have the same problem with discarding the metadata title.
I thought the "captureAttr" parameter (which can be set in solrconfig.xml or passed as a GET/POST parameter) was responsible for that? I set it to false both in the XML and as a request parameter, but I still get "not multivalued field" errors because metadata and literals both deliver content to a non-multivalued field. ;(

I'm using 3.1, though.

On 02.02.2011 17:13, Grant Ingersoll wrote:
On Jan 28, 2011, at 5:38 PM, Andreas Kemkes wrote:

I'm just getting my feet wet with text extraction, using both the schema and
solrconfig settings from the example directory in the 1.4 distribution, so I
might be missing something obvious.

Trying to provide my own title (and discarding the one received through Tika's
metadata) wasn't straightforward. I had to use the following:

fmap.title=tika_title (to discard the Tika title)
literal.attr_title=New Title (to provide the correct one)
fmap.attr_title=title (to map it back to the field as I would like to use title
in searches)

Is there anything easier than the above?
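
For reference, the three parameters above could be assembled into a single extract request roughly like this. This is only a sketch: the host, port, and the /update/extract path are assumed to match the example configuration, build_extract_url is a hypothetical helper, and literal.id assumes the schema's unique key field is named "id".

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, title):
    """Build an extract request URL that replaces Tika's title with our own.

    Hypothetical helper; base_url and the handler path are assumptions.
    """
    params = [
        # Assumption: the unique key field in the schema is called "id".
        ("literal.id", doc_id),
        # Move the title found by Tika out of the way.
        ("fmap.title", "tika_title"),
        # Supply the title we actually want.
        ("literal.attr_title", title),
        # Map the supplied title back onto the "title" field.
        ("fmap.attr_title", "title"),
    ]
    return base_url + "/update/extract?" + urlencode(params)


url = build_extract_url("http://localhost:8983/solr", "doc1", "New Title")
print(url)
```

The document itself would still have to be sent as the body of a POST to that URL; the sketch only covers the parameter mapping.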

How can this best be generalized to other metadata provided by Tika (which in
our use case will be mostly ignored, as it is provided separately)?

You can provide your own ContentHandler (see the wiki docs). I think it would
be reasonable to patch the ExtractingRequestHandler to add a no-metadata
option, and it wouldn't be that hard.
