Re: Image Search

Stefan Groschupf Fri, 02 Jun 2006 12:26:58 -0700

Hi,
Using a search for "http" is a bad idea, since you get many but not all pages.

Just write a Hadoop MapReduce job that processes the fetched content in your segments; that should be easy.
Storing images in a file system will be very slow as soon as you have too many.
Personally, I don't like databases, since compared to Nutch they are as slow as a snail.
For another project, also related to images, I created my own ImageWritable that contained the binary data of a compressed image together with some metadata.
If you use a MapFile, finding an image based on a key should be very fast, I think much faster than a database with binary content.
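
To make the ImageWritable idea a bit more concrete, here is a minimal sketch of what such a value type could look like. It is not the class from the project mentioned above: the field names (url, width, height, data) and the byte layout are assumptions chosen for illustration. The write/readFields pair mirrors the contract of Hadoop's Writable interface, but the sketch deliberately avoids importing Hadoop so it runs on a plain JDK; in a real job the class would declare that it implements org.apache.hadoop.io.Writable. The toBytes/fromBytes helpers are hypothetical conveniences for trying the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type: compressed image bytes plus a little metadata.
// In a real Hadoop job this would implement org.apache.hadoop.io.Writable.
final class ImageWritable {
    String url = "";            // assumed metadata: source URL of the image
    int width;                  // assumed metadata: thumbnail width in pixels
    int height;                 // assumed metadata: thumbnail height in pixels
    byte[] data = new byte[0];  // compressed image bytes (e.g. JPEG)

    ImageWritable() {}

    ImageWritable(String url, int width, int height, byte[] data) {
        this.url = url;
        this.width = width;
        this.height = height;
        this.data = data;
    }

    // Serialize metadata first, then a length-prefixed binary payload.
    void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(width);
        out.writeInt(height);
        out.writeInt(data.length);
        out.write(data);
    }

    // Read the fields back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        width = in.readInt();
        height = in.readInt();
        data = new byte[in.readInt()];
        in.readFully(data);
    }

    // Hypothetical helper: serialize to an in-memory byte array.
    static byte[] toBytes(ImageWritable image) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            image.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: rebuild an instance from serialized bytes.
    static ImageWritable fromBytes(byte[] bytes) {
        try {
            ImageWritable image = new ImageWritable();
            image.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return image;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ImageWritable thumb =
            new ImageWritable("http://example.org/a.jpg", 120, 90, new byte[] {1, 2, 3});
        ImageWritable copy = fromBytes(toBytes(thumb));
        System.out.println(copy.url + " " + copy.width + "x" + copy.height
            + " (" + copy.data.length + " bytes)");
    }
}
```

Values of this kind could then be stored in a MapFile keyed by, say, the image URL, so that a thumbnail is fetched with a single key lookup instead of a file system walk or a database query.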


HTH
Stefan




On 02.06.2006 at 21:10, Marco Pereira wrote:

Hi Everybody,
I've got Nutch to index images, searching their URL, alt, and title tags.
But the problem comes when storing the thumbnails.
I've indexed 3 million images for a national search engine.
I am unsure whether to use a file system scheme or a database to store the
thumbnails.
The thumbnails are created with a script that gets the image URLs from the
Nutch index by doing a search for http (search.jsp?query=http).

Do you have any tips or ideas on this?

Thank you,
Marco


---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com
