Hi,
using the search page (search.jsp?query=http) is a bad idea, since it
returns many, but not all, pages.
Just write a Hadoop MapReduce job that processes the fetched content
in your segments; that should be easy.
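A minimal sketch of that scan (Java; the segment layout and the Nutch
Content method names are from memory, so treat them as assumptions).
It reads a segment's content data directly; a real MapReduce job would
do the same work inside its map function:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;
  import org.apache.nutch.protocol.Content;

  public class SegmentImageScan {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // fetched content lives in SequenceFiles of <Text url, Content>,
      // e.g. <segment>/content/part-00000/data
      SequenceFile.Reader reader =
          new SequenceFile.Reader(fs, new Path(args[0]), conf);
      Text url = new Text();
      Content content = new Content();
      while (reader.next(url, content)) {
        String type = content.getContentType();
        if (type != null && type.startsWith("image/")) {
          byte[] raw = content.getContent(); // the raw image bytes
          // create the thumbnail from raw here
        }
      }
      reader.close();
    }
  }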
Storing the images in a file system will be very slow as soon as you
have too many of them.
Personally, I don't like databases, since compared to Nutch they are
as slow as a snail.
For another project, also related to images, I created my own
ImageWritable that contained the binary data of a compressed image
together with some metadata.
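Roughly like this (just a sketch; the field names here are
illustrative, not the actual class I wrote):

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.Writable;

  public class ImageWritable implements Writable {
    private byte[] data = new byte[0]; // compressed image, e.g. JPEG
    private String mimeType = "";
    private String sourceUrl = "";

    public void set(byte[] data, String mimeType, String sourceUrl) {
      this.data = data;
      this.mimeType = mimeType;
      this.sourceUrl = sourceUrl;
    }

    public byte[] getData() { return data; }

    public void write(DataOutput out) throws IOException {
      out.writeInt(data.length);
      out.write(data);
      out.writeUTF(mimeType);
      out.writeUTF(sourceUrl);
    }

    public void readFields(DataInput in) throws IOException {
      data = new byte[in.readInt()];
      in.readFully(data);
      mimeType = in.readUTF();
      sourceUrl = in.readUTF();
    }
  }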
If you use a MapFile, finding an image by key should be very fast; I
think much faster than a database with binary content.
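Usage would look something like this (again from memory; note that
MapFile.Writer expects its keys to be appended in sorted order):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.MapFile;
  import org.apache.hadoop.io.Text;

  public class ThumbStore {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // writing: keys (here the image URL) must arrive in sorted order
      MapFile.Writer writer = new MapFile.Writer(
          conf, fs, "thumbs", Text.class, ImageWritable.class);
      ImageWritable image = new ImageWritable();
      image.set(new byte[] { /* thumbnail bytes */ }, "image/jpeg",
          "http://example.com/a.jpg");
      writer.append(new Text("http://example.com/a.jpg"), image);
      writer.close();

      // reading: binary search in the in-memory index, then a seek
      MapFile.Reader reader = new MapFile.Reader(fs, "thumbs", conf);
      ImageWritable thumb = new ImageWritable();
      reader.get(new Text("http://example.com/a.jpg"), thumb);
      reader.close();
    }
  }

That is also why it should beat a database for blobs: a MapFile is
just a sorted SequenceFile plus a small index held in memory, so a
lookup costs one binary search and one or two disk seeks.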

HTH
Stefan




Am 02.06.2006 um 21:10 schrieb Marco Pereira:

> Hi Everybody,
>
> I've got Nutch to index images by searching their URLs and their alt
> and title tags.
> But the problem comes when storing the thumbnails.
> I've indexed 3 million images for a national search engine.
> I was in doubt whether to use a file-system scheme or a database to
> store the thumbnails.
> The thumbnails are created with a script that gets the image URLs
> from the Nutch index by doing a search for http (search.jsp?query=http).
>
> Do you have any tips, ideas on this?
>
> Thank you,
> Marco

---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com



