Dima,

I think there are several issues that need to be thought through thoroughly before we can implement this.
I created a Wiki page to discuss the design:
http://wiki.apache.org/nutch/Image_Search_Design

Writing a MapReduce job is completely new to me, so with my limited knowledge in this area I cannot answer your question. Anyway, I think now is the time to read the Hadoop MapReduce code :)

Rgrds,
Thomas

On 6/3/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
> Hi, TDLN.
>
> But how will image data be stored in the Nutch database?
> Would it affect the rest of the data in it?
>
> > (E.g., Nutch defines one URL == one index document.)
>
> > Why can't we create a document for every image that is found?
> >
> > Then it is as if we will have a parse-image plugin just like we have a
> > parse-html and parse-pdf plugin, with the only difference that it will
> > be run after all the pages in the segment have been fetched?
> >
> > Rgrds, Thomas
>
> --
> Regards,
> Dima mailto:[EMAIL PROTECTED]
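For illustration only, here is a minimal sketch of the "one document per image" idea from this thread. It is written in plain Java, not against the Hadoop or Nutch APIs, and every name in it (ImageDocSketch, Page, ImageDoc, mapPage, reduceByImage, buildImageDocs) is hypothetical. A map-style step emits one (image URL, referring page, alt text) pair per <img> tag. A reduce-style step then groups those pairs so that each distinct image becomes a single document. Because that grouping needs every referring page, the step could only run after all pages in the segment have been fetched.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one index document per image instead of per page. */
public class ImageDocSketch {

    /** A fetched page: its URL and raw HTML (assumed already fetched). */
    record Page(String url, String html) {}

    /** One image document, keyed by image URL rather than by page URL. */
    record ImageDoc(String imageUrl, Set<String> sourcePages, Set<String> altTexts) {}

    // Naive regexes for the sketch; a real parser would walk an HTML DOM.
    private static final Pattern IMG_TAG =
        Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern SRC =
        Pattern.compile("src\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern ALT =
        Pattern.compile("alt\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** "Map" step: emit (absolute image URL, [page URL, alt text]) per <img>. */
    static List<Map.Entry<String, String[]>> mapPage(Page page) {
        List<Map.Entry<String, String[]>> out = new ArrayList<>();
        Matcher tag = IMG_TAG.matcher(page.html());
        while (tag.find()) {
            String t = tag.group();
            Matcher src = SRC.matcher(t);
            if (!src.find()) continue; // skip <img> tags without a src
            // Resolve a relative src against the page URL.
            String imageUrl = URI.create(page.url()).resolve(src.group(1)).toString();
            Matcher alt = ALT.matcher(t);
            String altText = alt.find() ? alt.group(1) : "";
            out.add(Map.entry(imageUrl, new String[] {page.url(), altText}));
        }
        return out;
    }

    /** "Reduce" step: group pairs by image URL so each image is one document. */
    static Map<String, ImageDoc> reduceByImage(List<Map.Entry<String, String[]>> pairs) {
        Map<String, ImageDoc> docs = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : pairs) {
            ImageDoc doc = docs.computeIfAbsent(e.getKey(),
                k -> new ImageDoc(k, new LinkedHashSet<>(), new LinkedHashSet<>()));
            doc.sourcePages().add(e.getValue()[0]);
            if (!e.getValue()[1].isEmpty()) doc.altTexts().add(e.getValue()[1]);
        }
        return docs;
    }

    /** Runs both steps over a whole (already fetched) segment. */
    static Map<String, ImageDoc> buildImageDocs(List<Page> segment) {
        List<Map.Entry<String, String[]>> pairs = new ArrayList<>();
        for (Page p : segment) pairs.addAll(mapPage(p));
        return reduceByImage(pairs);
    }

    public static void main(String[] args) {
        List<Page> segment = List.of(
            new Page("http://example.com/a.html",
                "<p><img src=\"/logo.png\" alt=\"Logo\"><img src=\"cat.jpg\"></p>"),
            new Page("http://example.com/b/index.html",
                "<img alt=\"Site logo\" src=\"../logo.png\">"));
        buildImageDocs(segment).values().forEach(System.out::println);
    }
}
```

In this sketch the logo referenced from both pages ends up as one document carrying both source pages and both alt texts. That is the point where "one URL == one index document" would shift from page URLs to image URLs. A real plugin would presumably build on Nutch's own parse output and a proper HTML parser rather than regular expressions.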
