Re: Using standard XMP schemas for image and audio metadata

Jonathan Koren Mon, 09 Feb 2009 06:11:48 -0800


On Feb 8, 2009, at 10:59 AM, Jukka Zitting wrote:

Hi,

On Sun, Feb 8, 2009 at 4:57 PM, Jonathan Koren <jonat...@soe.ucsc.edu> wrote:

On Feb 8, 2009, at 5:55 AM, Jukka Zitting wrote:
On Sun, Feb 8, 2009 at 6:22 AM, Jonathan Koren <jonat...@soe.ucsc.edu>
wrote:
The problem with all these metadata standards is that they're all dumb in
the sense that they duplicate effort.
Agreed. So why would we want to duplicate the effort in Tika?
Because someone is going to be stuck doing it anyway.


Why? The metadata keys I proposed are semantically equivalent to the
custom keys we use now. Why would someone need to specify custom keys
when standard alternatives for the exact same concepts already exist?
Note that I'm only proposing that we change the keys of the six
metadata entries I listed.

But why only those six? It certainly seems like an arbitrary list
based on temporary convenience. You're not proposing to support all
of XMP, just the bare minimum that you need this week. At some point
you're going to want to add more metadata, and then you're going
to have to deal with the ontology mismatch problem. By luck or design
you've picked ones that do map 1-to-1 to some other ontology, but this
doesn't hold across XMP and it doesn't scale across multiple
ontologies, including the ontologies you're currently using. When the
day comes that you want to add more metadata, you haven't explained
how you're going to solve the mismatch problem.

I don't understand what you do with the things that don't map 1-to-1
with XMP. Ignore them? That doesn't work because then you're
arbitrarily dictating what kinds of problems the user can solve. Map
them to some other space? That doesn't work either because then if
the user wants to grab all the metadata from the foo space the user
will have to know that foo:one gets mapped to bar:uno, foo:two gets
mapped to baz:cinco, and foo:three doesn't get mapped. It's
unreasonable to force such an ugly hack on all users just because it
was easier to do this for one person once.
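
To make the foo/bar/baz problem concrete, here is a rough sketch. The
key names are the placeholders from the paragraph above, and the class
and method names are made up for illustration; none of this is Tika
API. Once the foo keys have been scattered, anyone who wants "all the
foo metadata" back has to carry the same table around, and foo:three
is simply gone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a mapping that scatters one namespace. */
final class ScatteredMappingDemo {

    // Forward mapping applied at extraction time (placeholder names).
    static final Map<String, String> FORWARD = new LinkedHashMap<>();
    static {
        FORWARD.put("foo:one", "bar:uno");
        FORWARD.put("foo:two", "baz:cinco");
        // foo:three has no equivalent, so it never reaches the user.
    }

    private ScatteredMappingDemo() {}

    /**
     * Rebuilds the foo-space entries from already-translated metadata.
     * This only works if the caller has the FORWARD table, and it can
     * never recover keys that were dropped during translation.
     */
    static Map<String, String> collectFoo(Map<String, String> translated) {
        Map<String, String> foo = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : FORWARD.entrySet()) {
            String value = translated.get(mapping.getValue());
            if (value != null) {
                foo.put(mapping.getKey(), value);
            }
        }
        return foo;
    }
}
```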

I have a concrete use case where doing this would be beneficial: My
employer is building a digital asset management application where we
plan to leverage XMP for metadata handling. Rather than explicitly
mapping each individual Tika metadata key to equivalent XMP entries,
it would be much easier and clearer to just map the "tiff" and "xmlDM"
prefixes to appropriate XMP namespaces when importing Tika metadata.
We also wouldn't need to keep updating the metadata mappings whenever
new Tika versions start supporting new keys.

I understand that you don't want to keep updating your own code every
time Tika changes, but as you said, this is a 0.x release, so you're
going to be stuck doing that for a while. What I don't understand is
why a publicly available library is the appropriate place to naively
hardcode the requirements of your current project.

Is there some better way for us to implement this use case?

Yes. Tika does no translation between ontologies. It simply dumps
all metadata detected for a file into its own namespace. This means
that an MS Office file gets an MS namespace. Something with XMP gets
an XMP namespace. ID3 tags go into the ID3 namespace. Tika does no
mapping among the types by default. You create a new class that takes
the raw key-value pairs that are stored in Tika::Metadata and translates
them to something else. Call it Metadata2XMP or whatever. That can
be packaged within Tika as a convenient class that does least common
denominator mapping in a well-defined way. By breaking the mapping
out to a class separate from Metadata, you avoid spreading a single
metadata namespace across 15 namespaces, and you make all mapping 100%
reversible (well, in this case, ignorable), since inevitably some will
be wrong in some case. If all a user wants is LCD metadata, they can
get it through a common XMP namespace.
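
Roughly what I have in mind, as a sketch only. It works on a plain
java.util.Map rather than Tika's Metadata class so that it doesn't
assume anything about Tika's API. The raw key names ("id3:title",
"msoffice:author", and so on) are invented for illustration; only the
dc:title and dc:creator targets are real Dublin Core properties as
used by XMP. The point is that the raw metadata is never modified, so
anyone who disagrees with a mapping can just ignore the translator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a translator kept separate from the metadata
 * store: raw per-format keys go in, a common XMP-style view comes out.
 */
final class Metadata2XMP {

    // Illustrative least-common-denominator table; raw keys are made up.
    private static final Map<String, String> KEY_MAP = new LinkedHashMap<>();
    static {
        KEY_MAP.put("id3:title", "dc:title");
        KEY_MAP.put("msoffice:title", "dc:title");
        KEY_MAP.put("id3:artist", "dc:creator");
        KEY_MAP.put("msoffice:author", "dc:creator");
    }

    private Metadata2XMP() {}

    /**
     * Returns a new map containing only the entries that have a known
     * mapping. The input map is left untouched, so the raw namespaces
     * stay intact and the translation can always be ignored.
     */
    static Map<String, String> translate(Map<String, String> raw) {
        Map<String, String> xmp = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String target = KEY_MAP.get(entry.getKey());
            if (target != null) {
                xmp.put(target, entry.getValue());
            }
        }
        return xmp;
    }
}
```

A caller who only wants the LCD view calls Metadata2XMP.translate(raw)
and reads dc:* keys. A caller who needs everything an ID3 tag said
keeps reading the raw map.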



--
Jonathan Koren
jonat...@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/
