Dima.

I think there are several issues that need to be thought through
thoroughly before we can implement this.

I created a Wiki page to discuss the design:

http://wiki.apache.org/nutch/Image_Search_Design

Writing a MapReduce job is completely new to me, so with my limited
knowledge in this area I cannot answer your question.

Anyway, I think now is the time to read the Hadoop MapReduce code :)

Rgrds, Thomas



On 6/3/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
Hi, TDLN.

But how will image data be stored in the Nutch database?
Would it affect the rest of the data in it?
>> (E.g. Nutch defines one URL == one index document.)

> Why can't we create a document for every image that is found?

> Then it would be as if we had a parse-image plugin, just like the
> parse-html and parse-pdf plugins, with the only difference that it
> would run after all the pages in the segment have been fetched?

> Rgrds, Thomas




--
Regards,
 Dima                          mailto:[EMAIL PROTECTED]

