Thanks Tim, this is great! I was experimenting to see whether org.apache.tika.metadata.filter.FieldNameMappingFilter in tika-config.xml could also be used to rename those custom metadata fields, but it seems to pass them through without renaming them. I'm not sure it would be a very useful feature anyhow.
Thanks,
-Sam

On Thu, Nov 3, 2022 at 12:52 PM Tim Allison <[email protected]> wrote:

> Yes. We need to do a better job of documenting this. To inject
> custom/external metadata, do something like this:
>
> {
>   "emitKey": "emitKey1",
>   "emitter": "my_emitter",
>   "fetchKey": "fetchKey1",
>   "fetcher": "my_fetcher",
>   "handlerConfig": {
>     "maxEmbeddedResources": 10,
>     "parseMode": "concatenate",
>     "type": "xml",
>     "writeLimit": 10000
>   },
>   "id": "my_id",
>   "metadata": {
>     "m1": [
>       "v1",
>       "v1"
>     ],
>     "m2": [
>       "v2",
>       "v3"
>     ],
>     "m3": "v4"
>   },
>   "onParseException": "skip"
> }
>
> On Thu, Nov 3, 2022 at 2:08 PM sam k <[email protected]> wrote:
>
>> Hi,
>>
>> I'm running a Tika server with HttpFetcher and SolrEmitter, and it works
>> great.
>>
>> When asking Tika to send documents to Solr, I can specify the document id
>> as the "emitKey" parameter:
>>
>> curl -X POST -H "Content-Type: application/json" -d '{"fetcher":"http",
>> "fetchKey":"<URL>", "emitter":"solr", "emitKey":"<Document Id>"}'
>> http://tika.server
>>
>> Is there a way to specify more custom fields for the Solr document being
>> submitted, like:
>>
>> curl -X POST -H "Content-Type: application/json" -d '{"fetcher":"http",
>> "fetchKey":"<URL>", "emitter":"solr", "emitKey":"<Document Id>",
>> "anotherSolrField":"<Value>", "yetAnotherSolrField":"<Value>"}'
>> http://tika.server
>>
>> We would like to set around 10 custom fields in each Solr document, such
>> as the id of the user who created the PDF/Word file, etc., so the values
>> for the Solr fields would be different for each Solr document.
>>
>> Thanks,
>> -Sam
>>
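For anyone finding this thread later: combining Sam's curl request with the "metadata" object from Tim's reply might look roughly like the shell sketch below. It is only a sketch under assumptions. The server URL is the same placeholder used above, read here from a hypothetical TIKA_URL variable (nothing is sent unless it is set); the field names created_by and department and all values are made up; and whether those fields are accepted by Solr depends on your Solr schema and emitter configuration.

```shell
#!/bin/sh
# Sketch: one fetch/emit request with extra per-document metadata,
# following the "metadata" object shown in Tim's reply.
# Field names (created_by, department) and values are hypothetical.

# Print the JSON request body. NOTE: values are inserted as-is, with no
# JSON escaping, so they must not contain double quotes or backslashes.
build_request() {
  doc_url=$1
  doc_id=$2
  user_id=$3
  cat <<EOF
{
  "fetcher": "http",
  "fetchKey": "${doc_url}",
  "emitter": "solr",
  "emitKey": "${doc_id}",
  "metadata": {
    "created_by": "${user_id}",
    "department": ["sales", "emea"]
  }
}
EOF
}

# Only POST when TIKA_URL is set, e.g. TIKA_URL=http://tika.server
if [ -n "${TIKA_URL:-}" ]; then
  build_request "https://example.com/report.pdf" "doc-42" "user-17" |
    curl -X POST -H "Content-Type: application/json" --data-binary @- "$TIKA_URL"
fi
```

Building the body with a heredoc keeps the request readable when there are ten or so custom fields; if the values can contain arbitrary text, generating the JSON with a proper encoder (for example jq) would be safer than plain string interpolation.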
