Hi fellows,

   Just wanted to check in and see whether this has progressed since I last
asked. Should I open a JIRA issue if there isn't one already? Not having this
is a real problem for me when using Tika to index documents. For example, a
consistent date format across all parsers and a constant set of metadata
associated with each parser are a must; otherwise searching the produced
documents is unreliable.
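
To make the idea concrete, here is a rough sketch of the kind of
normalization I mean. It is plain Python for brevity, not Tika code, and the
key names, the alias table, and the list of date layouts are all made up for
illustration; real parsers may emit different keys and formats.

```python
from datetime import datetime

# Hypothetical mapping from parser-specific keys to common field names.
# These aliases are illustrative assumptions, not Tika's actual key set.
KEY_ALIASES = {
    "Author": "author",
    "creator": "author",
    "dc:creator": "author",
    "Title": "title",
    "dc:title": "title",
    "Last-Modified": "date",
    "Creation-Date": "date",
}

# Date layouts assumed for this sketch; a real list would be longer.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%d/%m/%Y")


def to_iso_date(raw):
    """Return the date as an ISO 8601 string, or None if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%dT%H:%M:%S")
    return None


def normalize(metadata):
    """Reduce parser-specific metadata to a small, consistent set of fields."""
    common = {}
    for key, value in metadata.items():
        field = KEY_ALIASES.get(key)
        if field is None:
            continue  # unknown keys are left out of the common set
        if field == "date":
            value = to_iso_date(value)
            if value is None:
                continue  # skip dates that cannot be parsed
        # Keep the first value seen for each common field.
        common.setdefault(field, value)
    return common
```

With something like this in place, an indexer could always search on
"author", "title" and "date" no matter which parser produced the metadata.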

-shay.banon

On Mon, Apr 5, 2010 at 1:10 AM, Shay Banon <[email protected]> wrote:

> Hi,
>
>   From a brief review of the different parsers Tika provides, it seems like
> the parsers basically add metadata as the underlying system provides it. It
> would be great to streamline metadata creation so that some fields are
> consistent across parsers, for example author, title, and date (in ISO
> format). That way, if you create a Lucene Document out of it, you know
> which fields to search on regardless of the document type.
>
>   Also, it would be great if each parser documented which metadata it
> adds, and in which format (number, string, date).
>
>   What do you think?
>
> cheers,
> shay.banon
>
