I meant extracting the title, src link, and alt text from <img> tags, not storing the image files themselves. For a keyword search, the results must display the link, which automatically displays the image itself on the search page. I'm not sure what you mean by image content-based retrieval. Do image files have tags like MP3 files do? Must a parse plugin be written in both cases?
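A minimal sketch of the extraction described above. It uses Python's standard `html.parser` instead of Nutch's own plugin API, which it does not model; in Nutch this logic would presumably live in a Java HTML parse filter plugin. The class `ImgAttrCollector`, the helper `extract_img_attrs`, and the dictionary output shape are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgAttrCollector(HTMLParser):
    """Hypothetical collector for the title, src and alt attributes of <img> tags."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so "IMG SRC" matches too.
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src")
        if not src:
            return  # without a src there is no link to show on the search page
        self.images.append({
            # Resolve relative links so the result page can display the image.
            "src": urljoin(self.base_url, src),
            "alt": a.get("alt") or "",
            "title": a.get("title") or "",
        })
    # A self-closing <img ... /> triggers handle_startendtag, whose default
    # implementation calls handle_starttag, so that form is covered as well.


def extract_img_attrs(html, base_url=""):
    """Hypothetical helper: return one dict per <img> tag that has a src."""
    parser = ImgAttrCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images


if __name__ == "__main__":
    page = '<p><img src="/logo.png" alt="Logo" title="Home"><img alt="no src"></p>'
    print(extract_img_attrs(page, "http://example.com/index.html"))
    # [{'src': 'http://example.com/logo.png', 'alt': 'Logo', 'title': 'Home'}]
```

Only the attribute text is kept, so the index would store the resolved `src` URL plus the `alt` and `title` strings for keyword matching, and the browser fetches the image itself when the result page renders the link.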
Thanks,
Alex

-----Original Message-----
From: Andrzej Bialecki <[email protected]>
To: user <[email protected]>
Sent: Tue, Mar 8, 2011 12:58 pm
Subject: Re: will nutch-2 be able to index image files

On 3/8/11 9:09 PM, [email protected] wrote:
> Hello,
>
> I wondered if nutch version 2 will be able to index image files?

In what way? Extract metadata and index image metadata as text? Sure, if we implement a plugin for it. Tika already supports EXIF, so this shouldn't be complicated; perhaps it's a tweak to the parse-tika configuration.

Or did you mean image content-based retrieval (e.g. using wavelets)?

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com

