Dima,

I think there are several issues that need to be thought through carefully before we can implement this.
I created a Wiki page to discuss the design:
http://wiki.apache.org/nutch/Image_Search_Design

Writing a MapReduce job is completely new to me, so with my limited knowledge in this area I cannot answer your question. Anyway, I think now is the time to read the Hadoop MapReduce code :)

Rgrds, Thomas

On 6/3/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
Hi, TDLN.

But how will image data be stored in the Nutch database? Would it affect the rest of the data in it?

>> (E.g. Nutch defines one URL == one index document.)
> Why can't we create a document for every image that is found?
> Then it is as if we will have a parse-image plugin just like we have a
> parse-html and parse-pdf plugin, with the only difference that it will
> be run after all the pages in the segment have been fetched?
> Rgrds, Thomas

--
Regards,
Dima mailto:[EMAIL PROTECTED]
