Hi fellows,

Just wanted to check in and see whether this has progressed since I last asked. Maybe I should open a JIRA issue if there isn't one already? Not having it is really problematic for me, since I use Tika to index documents. For example, a consistent date format across all parsers, and a consistent set of metadata produced by each parser, are a must; otherwise, searching the resulting documents is problematic.
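To make the request concrete, here is a rough sketch of the kind of normalization I mean, written as plain Java post-processing over a `Map` (it is not a Tika API). The class `MetadataNormalizer`, the source key names (`dc:title`, `Title`, `dc:creator`, `Author`, `Creation-Date`) and the accepted input date patterns are all assumptions for illustration, not a description of what Tika's parsers actually emit today.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: maps parser-specific metadata keys onto a small common
 * set (author, title, date) and rewrites dates as ISO 8601 in UTC.
 * Key names and input date patterns are illustrative assumptions only.
 */
public final class MetadataNormalizer {

    // Assumed aliases: parser-specific key -> common key.
    private static final Map<String, String> KEY_ALIASES = Map.of(
            "dc:title", "title",
            "Title", "title",
            "dc:creator", "author",
            "Author", "author",
            "Creation-Date", "date");

    // Assumed date formats a parser might emit; tried in order.
    private static final List<DateTimeFormatter> INPUT_DATE_FORMATS = List.of(
            DateTimeFormatter.ISO_OFFSET_DATE_TIME,
            DateTimeFormatter.RFC_1123_DATE_TIME);

    private MetadataNormalizer() {}

    /**
     * Returns a new map with common keys and normalized dates. If several source
     * keys map to the same common key, the first one seen (in the input map's
     * iteration order) wins. Unknown keys are passed through unchanged.
     */
    public static Map<String, String> normalize(Map<String, String> raw) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String key = KEY_ALIASES.getOrDefault(e.getKey(), e.getKey());
            String value = key.equals("date") ? toIsoUtc(e.getValue()) : e.getValue();
            out.putIfAbsent(key, value);
        }
        return out;
    }

    /** Converts a recognized date string to ISO 8601 UTC, e.g. "2010-04-05T01:10:00Z". */
    static String toIsoUtc(String value) {
        for (DateTimeFormatter format : INPUT_DATE_FORMATS) {
            try {
                OffsetDateTime parsed = OffsetDateTime.parse(value, format);
                return parsed.withOffsetSameInstant(ZoneOffset.UTC)
                        .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
            } catch (DateTimeParseException ignored) {
                // Not this format; try the next one.
            }
        }
        return value; // Leave unrecognized date strings untouched.
    }
}
```

With something like this applied, a Lucene field named `date` would hold the same ISO 8601 UTC form regardless of which parser produced it, and `author`/`title` would be searchable under the same names for every document type.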
-shay.banon

On Mon, Apr 5, 2010 at 1:10 AM, Shay Banon <[email protected]> wrote:
> Hi,
>
> From a brief review of the different parsers Tika provides, it seems like
> the parsers basically add metadata as the underlying system provides it. It
> would be great to try and streamline the metadata creation so that some of it
> is consistent across parsers. For example, author, title, and date (in ISO
> format). This means, for example, that if you create a Lucene Document out of
> it, you know what to search on regardless of the type.
>
> Also, it would be great if each parser documented which metadata it
> adds, and in which format (number, string, date).
>
> What do you think?
>
> cheers,
> shay.banon
