Hi,

using the search frontend to collect the image URLs (search.jsp?query=http) is a bad idea, since you get many but not all pages. Just write a Hadoop map-reduce job that processes the fetched content in your segments; that should be easy.

Storing images in a file system will be very slow as soon as you have too many of them. Personally I don't like databases, since compared to Nutch they are slow as a snail. For another project, also related to images, I created my own ImageWritable that contained the binary data of a compressed image together with some metadata. If you use a MapFile, finding an image based on its key should be very fast, I think much faster than a database with binary content.
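Very roughly, and untested, such a Writable could look like this (the field names are just illustration, not taken from my actual code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// A Writable holding the compressed image bytes plus a bit of metadata.
public class ImageWritable implements Writable {

  private byte[] imageData = new byte[0]; // e.g. JPEG-compressed thumbnail
  private String altText = "";            // alt/title text from the page

  public ImageWritable() {
  }

  public ImageWritable(byte[] imageData, String altText) {
    this.imageData = imageData;
    this.altText = altText;
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(imageData.length);
    out.write(imageData);
    out.writeUTF(altText);
  }

  public void readFields(DataInput in) throws IOException {
    imageData = new byte[in.readInt()];
    in.readFully(imageData);
    altText = in.readUTF();
  }

  public byte[] getImageData() { return imageData; }
  public String getAltText() { return altText; }
}

The map job over your segment content then only needs to filter for image content types and collect <url, ImageWritable> pairs; if you write the job output with MapFileOutputFormat you get the MapFile for free, since the reduce phase already sorts by key. makeThumbnail() below is a placeholder for your own scaling code, not an existing Nutch method:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.nutch.protocol.Content;

// Reads <url, Content> records from a segment's content directory,
// keeps only the images and emits <url, ImageWritable>.
public class ThumbnailMapper extends MapReduceBase
    implements Mapper<Text, Content, Text, ImageWritable> {

  public void map(Text url, Content content,
                  OutputCollector<Text, ImageWritable> output,
                  Reporter reporter) throws IOException {
    String type = content.getContentType();
    if (type != null && type.startsWith("image/")) {
      byte[] thumb = makeThumbnail(content.getContent()); // placeholder
      output.collect(url, new ImageWritable(thumb, ""));
    }
  }

  // Put your own scaling/recompression (javax.imageio etc.) here.
  private byte[] makeThumbnail(byte[] raw) {
    return raw;
  }
}

Getting a thumbnail back out by its URL is then just a lookup against the MapFile, something like:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class ThumbnailLookup {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // "thumbnails" is whatever directory your job wrote the MapFile to
    MapFile.Reader reader = new MapFile.Reader(fs, "thumbnails", conf);
    Text key = new Text(args[0]); // the image URL is the key
    ImageWritable value = new ImageWritable();
    if (reader.get(key, value) != null) {
      System.out.println("found " + value.getImageData().length + " bytes");
    }
    reader.close();
  }
}

A MapFile is just a sorted SequenceFile plus an index of every n-th key that is held in memory, so a get() is one seek into the right neighborhood and a short scan, no full scan and no database round trip.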
HTH,
Stefan

On 02.06.2006, at 21:10, Marco Pereira wrote:

> Hi Everybody,
>
> I've got Nutch to index images by searching their url, alt and title
> tags. But the problem comes when storing the thumbnails.
> I've indexed 3 million images for a national search engine.
> I was in doubt whether to use a file system scheme or a database to
> store the thumbnails. The thumbnails are created with a script that
> gets the image URLs from the Nutch index by doing a search for http
> (search.jsp?query=http).
>
> Do you have any tips or ideas on this?
>
> Thank you,
> Marco

---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com
