Hi,

On Wed, Dec 3, 2008 at 9:37 AM, Stephane Bastian <[EMAIL PROTECTED]> wrote:
> While this certainly sounds like a very good idea, it will be difficult to
> settle on using solely a single metadata format in Tika. Dublin Core is one
> of several metadata formats available, and while it is certainly suitable for
> some documents (Word, Excel, OpenDocument and such), it's not a silver
> bullet. For instance, when it comes to images, audio and others, it is fairly
> limited and we've got almost no choice but to describe the metadata in
> another format than Dublin Core (for instance we could use something like
> this: http://www.metadataworkinggroup.com/pdf/mwg_guidance.pdf )

Using Dublin Core as the standard does not mean that we couldn't _also_
use other, more specific metadata schemas where appropriate.

The basic metadata use case is just knowing the type, name, descriptive
title, and perhaps the author of the document. This we can do with
Dublin Core for all documents where such basic metadata is available,
and my point is that a client that only ever cares about such basic
things shouldn't need to worry about different metadata schemas for
different types of documents.

Also, for things like images we should settle on some common image
metadata schema so that a client that only cares about basic image
things like resolution, depth, etc. doesn't need complex logic to
determine which metadata keys it should use to get at such information.

> What is important for me though is that Tika Parsers should never extract
> meta-data using a key that doesn't belong to a known format as it makes it
> difficult to use the data.

It's IMHO fine to use such novel keys when there is no standard metadata
schema that covers such information.

BR,

Jukka Zitting
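
To illustrate the point above about clients that only care about basic
metadata, here is a rough sketch. It does not use Tika's actual API: the
class name, the two parse methods, the sample values and the "image:"
keys are all made up for illustration. Only the Dublin Core element
names (title, creator, format) are real. The idea is that one code path
can describe any document type as long as every parser fills in the same
basic keys, while type-specific keys sit alongside them.

```java
import java.util.HashMap;
import java.util.Map;

class BasicMetadataSketch {
    // Dublin Core element names shared by all document types.
    static final String TITLE = "title";
    static final String CREATOR = "creator";
    static final String FORMAT = "format";

    // Hypothetical parser output for a word-processing document.
    static Map<String, String> parseDocument() {
        Map<String, String> m = new HashMap<>();
        m.put(FORMAT, "application/msword");
        m.put(TITLE, "Quarterly report");
        m.put(CREATOR, "Jane Doe");
        return m;
    }

    // Hypothetical parser output for an image: the same basic keys,
    // plus image-specific keys (names invented for this sketch).
    static Map<String, String> parseImage() {
        Map<String, String> m = new HashMap<>();
        m.put(FORMAT, "image/jpeg");
        m.put(TITLE, "Holiday photo");
        m.put("image:width", "1024");
        m.put("image:height", "768");
        return m;
    }

    // A basic client: no per-type logic, only the common keys.
    // Missing values fall back to a placeholder.
    static String describe(Map<String, String> metadata) {
        String title = metadata.getOrDefault(TITLE, "(untitled)");
        String creator = metadata.getOrDefault(CREATOR, "(unknown)");
        String format = metadata.getOrDefault(FORMAT, "(unknown)");
        return title + " by " + creator + " [" + format + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe(parseDocument()));
        System.out.println(describe(parseImage()));
    }
}
```

A client interested in image details would read the image-specific keys
in addition; a client that is not can ignore them entirely.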