Re: Metadata use by Apache Java projects

Philipp Koch Tue, 20 Nov 2007 00:58:49 -0800

> Philipp, I'm not talking about just reading meta data, but also writing
> it.
ok, i understand ;-). having a uniform way to access/write meta data
is indeed something worth thinking about - you are right! i have the
"digital asset management" use case in mind (which i am currently
developing); it currently handles the meta data for most of the formats
individually...


> Tika is a metadata extraction kit. I'm talking about something more general.
> If the common metadata storage model, if we can agree on one, at the end
> becomes a subproject/subproduct of Tika, I'm cool.
yes, this sounds interesting.

> But I'm not sure Tika could cover all this translation functionality for all
> the projects using metadata. That's something the individual document format
> libraries will be much better at. Tika is more of an aggregator.
well, i am not sure we can ever ensure that ALL "individual
document format libraries" will support such translation
functionality. so having something in between (like tika, which currently
only reads) would definitely make sense to me.

regards,
philipp


On 11/20/07, Jeremias Maerki <[EMAIL PROTECTED]> wrote:
> On 20.11.2007 08:24:01 Philipp Koch wrote:
> > >    Jeremias, it sounds like you are considering a new project which can
> > > translate data from many formats (read by a variety of projects) into
> > > XMP.  That sounds great!
> > hmm, i am not sure if (yet) another new project should be set up for
> > this since the tika project already offers all the "infrastructure" to
> > read meta data from various formats. from my point of view, the tika
> > project should offer some kind of "meta data to xmp" translator.
>
> Philipp, I'm not talking about just reading metadata, but also writing
> it. Sanselan supports creating new TIFF, JPEG etc. files. FOP creates
> new PDF, SVG etc. files. These processes all need metadata. Tika is a
> metadata extraction kit. I'm talking about something more general. If
> the common metadata storage model, if we can agree on one, at the end
> becomes a subproject/subproduct of Tika, I'm cool. But I'm not sure Tika
> could cover all this translation functionality for all the projects
> using metadata. That's something the individual document format
> libraries will be much better at. Tika is more of an aggregator.
>
> > >    Sanselan could not use XMP internally to represent metadata,
> > > though.  Sanselan's goal is to read & write metadata (such as EXIF
> > > metadata) preserving not just tag values but directory structure,
> > > field order, field location, etc.
> > this makes sense to me, since i have only seen embedded xmp in adobe's
> > products that are using the pdf "file format" to store their data
> > (acrobat and illustrator at least)
>
> Sure, the adoption of XMP is somewhat limited. But I've worked with it
> for some time now and I've experienced the benefit. Our adopting it
> could actually improve acceptance elsewhere.
>
> <snip/>
>
> Jeremias Maerki
>
>
