Exactly right.  The custom metadata fields come through as-is, with no
metadata filter applied.  The metadata filter is applied before these
fields are added (IIRC...?).

Still need to improve our documentation...
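
For anyone scripting this, here is a minimal Python sketch of building the
request body from the messages quoted below. It only assembles the JSON and
prints it; it does not talk to a server. The function name build_request is
made up for illustration, the "http" and "solr" names and the field names are
the ones from Sam's curl example, and the POST target is left out since the
thread doesn't spell it out. The point is that the custom fields go under
"metadata", not at the top level of the request.

```python
import json


def build_request(fetch_key, emit_key, custom_fields,
                  fetcher="http", emitter="solr"):
    """Assemble a fetch/emit request body with per-document metadata.

    build_request is a hypothetical helper, not part of Tika.
    """
    metadata = {}
    for name, value in custom_fields.items():
        # Lists/tuples stay multi-valued; anything else becomes one string.
        if isinstance(value, (list, tuple)):
            metadata[name] = [str(v) for v in value]
        else:
            metadata[name] = str(value)
    return {
        "fetcher": fetcher,
        "fetchKey": fetch_key,
        "emitter": emitter,
        "emitKey": emit_key,
        # Custom fields are nested here rather than added as top-level keys.
        "metadata": metadata,
    }


body = build_request(
    "<URL>",
    "<Document Id>",
    {
        "anotherSolrField": "<Value>",
        "yetAnotherSolrField": ["<Value>", "<Value>"],
    },
)
print(json.dumps(body, indent=4))
```

Since the metadata filter apparently runs before these fields are added,
it is probably safest to name the keys exactly as the Solr fields should be
named.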

On Fri, Nov 4, 2022 at 4:30 PM sam k <[email protected]> wrote:

> Thanks Tim, this is great!
>
> I was experimenting to see whether
> org.apache.tika.metadata.filter.FieldNameMappingFilter in tika-config.xml
> can also be used to rename those custom metadata fields, but it seems to
> let them through without renaming. Not sure it would be a very useful
> feature anyhow.
>
> Thanks,
> -Sam
>
> On Thu, Nov 3, 2022 at 12:52 PM Tim Allison <[email protected]> wrote:
>
>> Yes.  We need to do a better job of documenting this. To inject
>> custom/external metadata, do something like this:
>>
>> {
>>     "emitKey": "emitKey1",
>>     "emitter": "my_emitter",
>>     "fetchKey": "fetchKey1",
>>     "fetcher": "my_fetcher",
>>     "handlerConfig": {
>>         "maxEmbeddedResources": 10,
>>         "parseMode": "concatenate",
>>         "type": "xml",
>>         "writeLimit": 10000
>>     },
>>     "id": "my_id",
>>     "metadata": {
>>         "m1": [
>>             "v1",
>>             "v1"
>>         ],
>>         "m2": [
>>             "v2",
>>             "v3"
>>         ],
>>         "m3": "v4"
>>     },
>>     "onParseException": "skip"
>> }
>>
>> On Thu, Nov 3, 2022 at 2:08 PM sam k <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm running a Tika server with HttpFetcher and SolrEmitter, and it works
>>> great.
>>>
>>> When asking Tika to send documents to Solr, I can specify the document
>>> id as "emitKey" parameter:
>>>
>>> curl -X POST -H "Content-Type: application/json" -d '{"fetcher":"http",
>>> "fetchKey":"<URL>", "emitter":"solr", "emitKey":"<Document Id>"}'
>>> http://tika.server
>>>
>>> Is there a way to specify more custom fields for the Solr document being
>>> submitted, like:
>>>
>>> curl -X POST -H "Content-Type: application/json" -d '{"fetcher":"http",
>>> "fetchKey":"<URL>", "emitter":"solr", "emitKey":"<Document Id>",
>>> "anotherSolrField":"<Value>", "yetAnotherSolrField":"<Value>"}'
>>> http://tika.server
>>>
>>> We would like to set around 10 custom fields in each Solr document, such
>>> as the id of the user who created the PDF/Word, etc, so the values for the
>>> Solr fields would be different for each Solr document.
>>>
>>> Thanks,
>>> -Sam
>>>
>>
