> Philipp, I'm not talking about just reading meta data, but also writing
> it.

ok, i understand ;-). having a uniform way to access/write meta data is indeed something worth thinking about - you are right! i have the "digital asset management" use case in mind (that i currently develop) which currently handles the meta data stuff for most of the formats individually...

> Tika is a metadata extraction kit. I'm talking about something more general.
> If the common metadata storage model, if we can agree on one, at the end
> becomes a subproject/subproduct of Tika, I'm cool.

yes, this sounds interesting.

> But I'm not sure Tika could cover all this translation functionality for all
> the projects using metadata. That's something the individual document format
> libraries will be much better at. Tika is more of an aggregator.

well, i am not sure if we can ever make sure that ALL "individual document format libraries" will support such a translation functionality. so having something (like tika (currently only for reading)) in between would definitely make sense to me.

regards,
philipp

On 11/20/07, Jeremias Maerki <[EMAIL PROTECTED]> wrote:
> On 20.11.2007 08:24:01 Philipp Koch wrote:
> > > Jeremias, it sounds like you are considering a new project which can
> > > translate data from many formats (read by a variety of projects) into
> > > XMP. That sounds great!
> >
> > hmm, i am not sure if (yet) another new project should be set up for
> > this since the tika project already offers all the "infrastructure" to
> > read meta data from various formats. from my point of view, the tika
> > project should offer some kind of "meta data to xmp" translator.
>
> Philipp, I'm not talking about just reading metadata, but also writing
> it. Sanselan supports creating new TIFF, JPEG etc. files. FOP creates
> new PDF, SVG etc. files. These processes all need metadata. Tika is a
> metadata extraction kit. I'm talking about something more general. If
> the common metadata storage model, if we can agree on one, at the end
> becomes a subproject/subproduct of Tika, I'm cool. But I'm not sure Tika
> could cover all this translation functionality for all the projects
> using metadata. That's something the individual document format
> libraries will be much better at. Tika is more of an aggregator.
>
> > > Sanselan could not use XMP internally to represent metadata,
> > > though. Sanselan's goal is to read & write metadata (such as EXIF
> > > metadata) preserving not just tag values but directory structure,
> > > field order, field location, etc.
> >
> > this makes sense to me, since i have only seen embedded xmp in adobe's
> > products that are using the pdf "file format" to store its data
> > (acrobat and illustrator at least)
>
> Sure, the adoption of XMP is somewhat limited. But I've worked with it
> for some time now and I've experienced the benefit. Our adopting it
> could actually improve acceptance elsewhere.
>
> <snip/>
>
> Jeremias Maerki
