Re: [Wikitech-l] Proposal for new table image_metadata

William Lee Mon, 05 Dec 2011 11:08:26 -0800

Thanks to everyone for your feedback about this plan.

After careful consideration, we have decided to discontinue our plan. It
does not go far enough to support the XMP standard. Instead, we will use
the field Image.img_metadata for the time being.


William

On Thu, Dec 1, 2011 at 8:49 PM, bawolff <[email protected]> wrote:

> > Message: 7
> > Date: Thu, 1 Dec 2011 12:36:02 -0500
> > From: Chad <[email protected]>
> > Subject: Re: [Wikitech-l] Proposal for new table image_metadata
> > To: Wikimedia developers <[email protected]>
> > Message-ID:
> >       <cadn73rnusx8regdubcesyg8mz1qg5sa49vamb5ed_y-vb-l...@mail.gmail.com>
> > Content-Type: text/plain; charset=UTF-8
> >
> > On Thu, Dec 1, 2011 at 12:34 PM, William Lee <[email protected]> wrote:
> > > I'm a developer at Wikia. We have a use case for searching through a
> > > file's metadata. This task is challenging now, because the field
> > > Image.img_metadata is a blob.
> > >
> > > We propose expanding the metadata field into a new table. We propose
> > > the name image_metadata. It will have three columns: img_name,
> > > attribute (varchar) and value (varchar). It can be joined with Image
> > > on img_name.
> > >
> > > On the application side, LocalFile's load* and decodeRow methods will
> > > have to be changed to support the new table.
> > >
> > > One issue to consider is the file archive. Should we replicate the
> > > metadata table for file archive? Or serialize the data and store it in
> > > a new table (something like fa_metadata)?
> > >
> > > Please let us know if you see any issues with this plan. We hope that
> > > this will be useful to the MediaWiki project, and a candidate to merge
> > > back.
> > >
> >
> > That was part of bawolff's plan last summer for GSoC when he overhauled
> > our metadata support. He got a lot of his project done, but never quite
> > got to this point. Something we'd definitely like to see though!
> >
> > -Chad
> >
> >
>
> Chad beat me to writing essentially what I was going to say. Basically
> my project ended up being more about extracting more information, and
> I didn't really touch what we did with it after we extracted it.
>
> However, it should be noted that storing the image metadata nicely is
> a little more complicated than it appears at first glance (and that's
> mostly my fault, due to stuff I added during GSoC ;)
>
> Basically there are 4 different types of metadata values we store (in
> terms of the types of metadata you think of when you think EXIF et al.;
> we stuff other stuff into img_metadata for extra fun):
> *Normal values - Things like Shutter speed = 1/110.
> *Unordered array - For example, we can extract a "tags" field that's an
> arbitrary list of tags, the subject field (from XMP) is an unordered
> list, etc.
> *Ordered array - Not used for a whole lot. The most prominent example is
> the XMP author field, which is supposed to be an ordered list of
> authors, in order of importance. Honestly, we could just ditch caring
> about this, and probably nobody would notice.
> *Language array - XMP and PNG text chunks support a special value
> where you can specify language alternatives. In essence this looks
> like an associative array of "lang-code" => "translation of field into
> that lang", plus a special fallback "x-default" dummy lang code.
> *Also, contact info and software fields are stored kind of weirdly....
>
> Thus, just storing a table of key/value pairs is kind of problematic -
> how do you store an "array" value? Additionally, you have to consider
> how the info will be found. You probably want to be able to search
> efficiently through lang values in a specific language, or for a
> specific property regardless of language.
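
One possible way around the "array" problem described above is to flatten
every value into rows that carry two extra columns, a position and a
language code. The sketch below is only an illustration under assumptions:
the row layout (img_name, attribute, position, lang, value), the use of
Python dict/set/list to tell the value types apart, the field names and the
sample data are all hypothetical and not what MediaWiki does.

```python
def flatten(img_name, metadata):
    """Turn one file's metadata into (img_name, attribute, position, lang, value) rows.

    Assumed conventions (hypothetical): dict = language array,
    set = unordered array, list/tuple = ordered array, anything else = normal value.
    """
    rows = []
    for attribute, value in metadata.items():
        if isinstance(value, dict):
            # Language array: one row per language code, including "x-default".
            for lang, text in sorted(value.items()):
                rows.append((img_name, attribute, None, lang, text))
        elif isinstance(value, (set, frozenset)):
            # Unordered array: one row per item, no position recorded.
            for item in sorted(value):
                rows.append((img_name, attribute, None, None, item))
        elif isinstance(value, (list, tuple)):
            # Ordered array: the position column preserves the order.
            for position, item in enumerate(value):
                rows.append((img_name, attribute, position, None, item))
        else:
            # Normal value: a single row.
            rows.append((img_name, attribute, None, None, str(value)))
    return rows


def values_in_language(rows, attribute, lang):
    """Values of an attribute in one language, falling back to "x-default"."""
    exact = [r[4] for r in rows if r[1] == attribute and r[3] == lang]
    if exact:
        return exact
    return [r[4] for r in rows if r[1] == attribute and r[3] == "x-default"]


# Made-up metadata covering the four value types.
sample = {
    "ExposureTime": "1/110",
    "subject": {"harbor", "boats"},
    "author": ["First Author", "Second Author"],
    "description": {"x-default": "A harbor", "en": "A harbor", "fr": "Un port"},
}
rows = flatten("Harbor.jpg", sample)
```

With rows shaped like this, a search in one language filters on the lang
column, and a search regardless of language simply ignores it. The cost is
two mostly empty columns on every normal value.
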
>
> Also consider how big a metadata field can get. Theoretically it's not
> really limited. While I don't expect it to be huge, more than 255 bytes
> of UTF-8 seems a totally reasonable size for a value of a metadata field.
>
> Last of all, you have to keep in mind that all sorts of stuff is stored
> in img_metadata. This includes things like the text layer of DjVu
> files (although arguably that shouldn't be stored there...) and other
> handler-specific things (OggHandler stores some very complex
> structures in img_metadata). Of course, we could just keep the
> img_metadata blob there, and simply stop using it for "exif-like"
> data, but continue using it for handler-specific ugly metadata that's
> generally invisible to the user [probably a good idea. The two types of
> data are actually quite different].
>
> > One issue to consider is the file archive. Should we replicate the
> > metadata table for file archive? Or serialize the data and store it in
> > a new table (something like fa_metadata)?
>
> Honestly, I wouldn't worry about that, especially in the beginning. As
> far as I know, the only place fa_metadata/oi_metadata is used is that
> you can request it via the API (I suppose it's copied over during file
> reverts as well). I don't think anyone really uses that field on
> archived images. (Maybe one day bug 26741 will be fixed and this would
> be less of a concern.)
>
>
> Anyhow, I do believe it would be awesome to store this data better. I
> can definitely think of many uses for being able to efficiently query
> it. (While I'm on the subject, making Lucene index it would also be
> awesome.)
>
> Cheers,
> Bawolff
>
> p.s. If it's helpful - some of my ideas from last year for making a new
> metadata table are at
> http://www.mediawiki.org/wiki/User:Bawolff/metadata_table and in the
> thread at
> http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/48268
> However, they're probably over-complicated or otherwise not ideal (I
> was naive back then ;). They also try to be able to encode anything
> encodable by XMP, which is most definitely a bad idea, since XMP is
> very complicated...
>
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
