Hi,

On Mon, Feb 9, 2009 at 3:11 PM, Jonathan Koren <jonat...@soe.ucsc.edu> wrote:
> On Feb 8, 2009, at 10:59 AM, Jukka Zitting wrote:
>> Note that I'm only proposing that we change the keys of the six
>> metadata entries I listed.
>
> But why only those six?

Because they are useful pieces of metadata that are already accurately
defined in the respective XMP schemas. I didn't, for example, propose
changing the MIDI metadata key "patches", as AFAIK there is no
standard schema that covers that piece of information.

> You're not proposing to support all of XMP, just the bare minimum that you
> need this week.  At some point you're going to want to add more metadata
> and then you're going to have to deal with the ontology mismatch
> problem.

I'm not proposing that we try to map all the metadata we support into
the XMP schemas. All I'm trying to do is avoid using custom keys for
information where a well-defined and widely used standard alternative
already exists.

If there's an ontology mismatch, then we can use custom keys. But I
don't see why we should invent new keys when standard alternatives
with the exact same semantics already exist.

A Tika-specific client shouldn't care whether the metadata key is
"width", "tiff:ImageWidth", "xyzzy" or even "the return value of
javax.imageio.ImageReader.getWidth(0)"; it should just use a constant
like Metadata.IMAGE_WIDTH.
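
As a rough sketch of what I mean (the SketchMetadata and
ImageWidthClient classes below are made up for illustration and are
not existing Tika code; only the key string and the IMAGE_WIDTH
constant name come from this thread):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata container; not Tika's actual class.
class SketchMetadata {
    // The constant hides the concrete key string from client code.
    static final String IMAGE_WIDTH = "tiff:ImageWidth";

    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    String get(String key) {
        return values.get(key);
    }
}

class ImageWidthClient {
    // Client code refers only to the constant, never to the literal key.
    static int widthOf(SketchMetadata metadata) {
        return Integer.parseInt(metadata.get(SketchMetadata.IMAGE_WIDTH));
    }

    public static void main(String[] args) {
        SketchMetadata metadata = new SketchMetadata();
        metadata.set(SketchMetadata.IMAGE_WIDTH, "640");
        System.out.println(widthOf(metadata));
    }
}
```

If the string behind the constant changes, widthOf() compiles and
behaves the same; only code that hard-codes the literal key is
affected.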

The metadata key "tiff:ImageWidth" is well documented and makes life
easier when your application needs to interact with existing XMP
infrastructure (or other metadata tools that already know how to
import XMP metadata), and I don't see why the key would be any worse
than the alternatives.

> You create a new class that takes the raw key-value pairs that are stored
> in Tika::Metadata and translates them to something else. Call it
> Metadata2XMP or whatever.  That can be packaged within Tika as a
> convenient class that does least common denominator mapping in a
> well-defined way.

Having such a mapping class within Tika is an alternative, but as
discussed in the Dublin Core thread [1] in December, I'm not sure if
it's worth the added complexity. My proposal covers the use case with
much less extra code or documentation.
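
For comparison, a minimal sketch of what such a mapping class might
look like. This is only an illustration under assumptions: the custom
"height" key and the translate() method are hypothetical, the class
name is borrowed from the suggestion quoted above, and nothing like
this exists in Tika today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapper; the name follows the "Metadata2XMP" suggestion.
class Metadata2XMP {
    // Assumed custom keys on the left, XMP TIFF schema properties on the right.
    private static final Map<String, String> KEY_MAP = new LinkedHashMap<>();
    static {
        KEY_MAP.put("width", "tiff:ImageWidth");
        KEY_MAP.put("height", "tiff:ImageLength");
    }

    // Returns a copy of the input with known keys renamed;
    // keys without a mapping pass through unchanged.
    static Map<String, String> translate(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String key = KEY_MAP.getOrDefault(entry.getKey(), entry.getKey());
            out.put(key, entry.getValue());
        }
        return out;
    }
}
```

Even in this small form, the key table has to be maintained and
documented alongside the parsers that produce the custom keys.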

[1] http://markmail.org/message/zjsjslaelx6acf6z

BR,

Jukka Zitting
