I agree with Nick’s recommendation on post-parsing key mapping, and I’d like to 
put in a plug for the RecursiveParserWrapper, which may be of use to you.  
I’ve been intending to add it to the app command line and to the server.  How 
are you handling embedded document metadata?  Would the wrapper be of any use, 
or do you not have any embedded docs in your doc set?
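
A minimal sketch of that post-parsing mapping, in plain Java.  It assumes the 
parsed metadata has already been copied into a Map (with Tika that would mean 
looping over Metadata.names() and getValues(name)).  MetadataKeyMapper, KEY_MAP 
and the key names on both sides are hypothetical, chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: renames metadata keys after parsing.
final class MetadataKeyMapper {

    // Example mapping only; source and target key names are assumptions.
    static final Map<String, String> KEY_MAP = Map.of(
            "dc:title", "title",
            "dc:creator", "author",
            "Content-Type", "mimeType");

    // Returns a new map with known keys renamed; unknown keys pass through unchanged.
    static Map<String, List<String>> remap(Map<String, List<String>> parsed) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : parsed.entrySet()) {
            String key = KEY_MAP.getOrDefault(e.getKey(), e.getKey());
            // Merge values when two source keys map to the same target key.
            out.computeIfAbsent(key, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parsed = new LinkedHashMap<>();
        parsed.put("dc:title", List.of("Annual Report"));
        parsed.put("Content-Type", List.of("application/pdf"));
        parsed.put("xmpTPg:NPages", List.of("12"));
        System.out.println(remap(parsed));
    }
}
```

Because the renaming happens once, between the parse call and the output step, 
no parser needs to change; whether unmapped keys should pass through or be 
dropped depends on your requirements.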

I’ve also been meaning to dump counts of metadata keys from the govdocs1 
corpus.  Would that be of any use, or do you already know the keys that you 
care about?

Cheers,

         Tim

From: Can Duruk [mailto:c...@duruk.net]
Sent: Thursday, October 09, 2014 12:13 PM
To: user@tika.apache.org
Subject: Re: Customizing Metadata Keys

>I'd suggest you do the mapping from Tika keys to your keys in the server.
>All the parsers should return consistent keys, so the "output" side is the
>best place to map.

That seems to be the now-obvious solution, thanks for the suggestion.

> Perhaps a re-mapping downstream ContentHandler
> that takes in the Metadata object and will reformat
> the <meta name=.. section of the XHTML?

I tried adding a step late in the pipeline, but I'm not very familiar with the 
Tika codebase and got a bit lost. Are there any pointers (examples / tutorials) 
you could guide me towards? Chapters in the Tika book? I want to explore this 
if the server idea doesn't pan out.

On Wed, Oct 8, 2014 at 10:25 PM, Chris Mattmann <chris.mattm...@gmail.com> wrote:
>
> Perhaps a re-mapping downstream ContentHandler
> that takes in the Metadata object and will reformat
> the <meta name=.. section of the XHTML?
>
>
> ------------------------
> Chris Mattmann
> chris.mattm...@gmail.com
>
>
>
>
> -----Original Message-----
> From: Nick Burch <apa...@gagravarr.org>
> Reply-To: <user@tika.apache.org>
> Date: Thursday, October 9, 2014 at 12:32 PM
> To: <user@tika.apache.org>
> Subject: Re: Customizing Metadata Keys
>
> >On Wed, 8 Oct 2014, Can Duruk wrote:
> >> My question is regarding setting the metadata keys coming from the parsers
> >> to my own keys.
> >>
> >> For my application, I am using Tika to extract the metadata for a bunch of
> >> files. I am using the embedded HTTP server which I modified for my needs to
> >> return JSON instead of CSV. (Hoping to submit that as a patch soon)
> >>
> >> However, the keys in the JSON are all in different formats and I need them
> >> to conform to my own requirements.
> >
> >I'd suggest you do the mapping from Tika keys to your keys in the server.
> >All the parsers should return consistent keys, so the "output" side is the
> >best place to map. Trying to do it in each parser would be much more work.
> >Just put the mapping in between where you call the parser, and where you
> >output.
> >
> >Nick
>
>
