Hi,

On 8/19/06, Sami Siren <[EMAIL PROTECTED]> wrote:
> So far Nutch has been built to deal mainly with text documents.
> There is, however, also a need to deal with non-textual objects, e.g. images,
> movies, and sound, which provide content only in the form of metadata (OK,
> perhaps also some text about the context of the object, if applicable), so
> the metadata names we have today are only a subset of what might be needed.
>
> I really would not want to restrict the metadata the interface can carry
> to a fixed set.

But if it's an open Map, how do you index and search using that, i.e.
what is the mapping between the Map keys used by a parser component
and the field names in the resulting Lucene index? How do we enforce
that an MPEG parser uses the same Map keys as a JPEG parser when
encountering metadata with the same semantics?

I'm not opposed to using a Map for truly variable metadata, like HTML
<meta/> tags with unknown names, but if we want common handling of, for
example, Dublin Core metadata, it would be better to enforce that at
the interface level.
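
A rough sketch of what that could look like, in case it helps the
discussion. Everything here is hypothetical (the DublinCore enum, the
ParsedMetadata class, and the "dc." and "meta." field prefixes are made
up for illustration, not existing Nutch API): well-known properties go
through a typed setter, so every parser ends up with the same index
field name, and only the truly variable metadata goes into an open Map.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout; none of this is existing Nutch API.

/** Well-known metadata with fixed semantics, here a few Dublin Core elements. */
enum DublinCore {
    TITLE("dc.title"),
    CREATOR("dc.creator"),
    FORMAT("dc.format");

    private final String fieldName;

    DublinCore(String fieldName) {
        this.fieldName = fieldName;
    }

    /** The one index field name that all parsers share for this property. */
    String fieldName() {
        return fieldName;
    }
}

/** Parsed metadata: typed setter for common properties, open Map for the rest. */
class ParsedMetadata {
    private final Map<DublinCore, String> common = new EnumMap<>(DublinCore.class);
    private final Map<String, String> other = new LinkedHashMap<>();

    /** Parsers cannot misspell or rename a common property: the key is an enum constant. */
    void set(DublinCore property, String value) {
        common.put(property, value);
    }

    /** Free-form metadata, e.g. HTML meta tags with unknown names. */
    void setOther(String name, String value) {
        other.put(name, value);
    }

    /** Flatten to index field names; free-form keys get an assumed "meta." prefix. */
    Map<String, String> toIndexFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<DublinCore, String> e : common.entrySet()) {
            fields.put(e.getKey().fieldName(), e.getValue());
        }
        for (Map.Entry<String, String> e : other.entrySet()) {
            fields.put("meta." + e.getKey(), e.getValue());
        }
        return fields;
    }
}
```

With something like this, an MPEG parser and a JPEG parser that both
call set(DublinCore.TITLE, ...) land in the same index field, while
setOther() stays available for whatever else a format happens to carry.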

BR,

Jukka Zitting

-- 
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
