Re: Customizing Metadata Keys

2014-10-09 Thread Nick Burch
On Wed, 8 Oct 2014, Can Duruk wrote: My question is regarding setting the metadata keys coming from the parsers to my own keys. For my application, I am using Tika to extract the metadata for a bunch of files. I am using the embedded HTTP server which I modified for my needs to return instead

Formatted Content Extraction and Title Detection

2014-10-09 Thread imyuka
Hi all, Here is my problem: I have extracted plain text from a series of doc(x) documents, and their titles via the dc:title label of the metadata, but I'm not sure this is the right way to obtain the title of a document. In many cases, a title inside a document could be of the largest

Re: Customizing Metadata Keys

2014-10-09 Thread Chris Mattmann
Perhaps a re-mapping downstream ContentHandler that takes in the Metadata object and will reformat the meta name=.. section of the XHTML? Chris Mattmann

Re: Customizing Metadata Keys

2014-10-09 Thread Can Duruk
> I'd suggest you do the mapping from Tika keys to your keys in the server. All the parsers should return consistent keys, so the output side is the best place to map.
That seems to be the now-obvious solution, thanks for the suggestion.
> Perhaps a re-mapping downstream ContentHandler that takes
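The output-side mapping suggested in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the extracted metadata is modelled as a plain `Map<String, String>` rather than Tika's own `Metadata` object, and the class name `MetadataKeyMapper`, the `KEY_MAP` table, and the target keys `title` and `mimeType` are hypothetical names chosen for the example. Keys with no entry in the table are passed through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataKeyMapper {

    // Hypothetical mapping from parser-side keys to application keys.
    // The right-hand names are assumptions made for this sketch.
    static final Map<String, String> KEY_MAP = Map.of(
            "dc:title", "title",
            "Content-Type", "mimeType");

    // Returns a copy of the extracted metadata with keys renamed according to keyMap.
    // Keys without a mapping are kept as they are; insertion order is preserved.
    static Map<String, String> remap(Map<String, String> extracted,
                                     Map<String, String> keyMap) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : extracted.entrySet()) {
            String targetKey = keyMap.getOrDefault(entry.getKey(), entry.getKey());
            out.put(targetKey, entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample values standing in for what a parser might return.
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("dc:title", "Quarterly Report");
        extracted.put("Content-Type", "application/pdf");
        extracted.put("meta:page-count", "12");

        System.out.println(remap(extracted, KEY_MAP));
    }
}
```

Doing the rename in one place after parsing means the parsers themselves stay untouched, which matches the advice that the output side is the best place to map.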

RE: Customizing Metadata Keys

2014-10-09 Thread Allison, Timothy B.
I agree with Nick’s recommendation on post-parsing key mapping, and I’d like to put in a plug for the RecursiveParserWrapper, which may be of use for you. I’ve been intending to add that to the app commandline and to server…how are you handling embedded document metadata? Would the wrapper be

Re: Customizing Metadata Keys

2014-10-09 Thread Can Duruk
I agree with Nick’s recommendation on post-parsing key mapping, and I’d like to put in a plug for the RecursiveParserWrapper, which may be of use for you. I’ve been intending to add that to the app commandline and to server…how are you handling embedded document metadata? Would the wrapper be