Having an image search component for Nutch would be nice. However, I think we need to implement it as a separate tool outside the Nutch code itself, since it cannot be fully integrated into the Nutch code (e.g. Nutch defines one URL == one index document). Maybe this would be a nice project for a Nutch sandbox. If you like, you can open an issue requesting a Nutch sandbox project "image search". If enough people vote for this issue, we may have a chance of getting it created.
Stefan

On 03.06.2006 at 10:38, TDLN wrote:

> I am interested in developing such a solution as well.
>
> I am currently storing the thumbnails on the file system under a
> system generated name. My indexing plugin stores the filename in the
> index. Thumbnails are later served to the client by a separate Apache
> HTTP server. This required some changes but is otherwise pretty
> straightforward and performs very well for my current 300,000+
> images, around 15 KB each.
>
> If you are developing the more "Nutch-like" solution I could
> contribute to that. For instance, I have some code that generates the
> thumbs using ImageJ that yields very good results.
>
> But I would definitely need some guidance in writing the Hadoop map
> reduce job. We could even contribute this back and base a small
> tutorial on this work.
>
> What do you think?
>
> Rgrds, Thomas
>
> On 6/2/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>> Hi,
>> using a search for "http" is a bad idea, since you get many but not
>> all pages.
>> Just write a Hadoop map reduce job that processes the fetched content
>> in your segments; that should be easy.
>> Storing images in a file system will be very slow as soon as you have
>> too many.
>> I personally don't like databases, since compared to Nutch they are
>> slow as a snail.
>> For another project also related to images I had created my own
>> ImageWritable that contained the binary data of a compressed image
>> combined with some meta data.
>> If you use a MapFile, finding an image based on a key should be very
>> fast. I think much faster than a database with binary content.
>>
>> HTH
>> Stefan
>>
>> On 02.06.2006 at 21:10, Marco Pereira wrote:
>>
>> > Hi Everybody,
>> >
>> > I've got Nutch to index images by searching their URL and their alt
>> > and title tags.
>> > But the problem comes when storing the thumbnails.
>> > I've indexed 3 million images for a national search engine.
>> > I was in doubt whether to use a file system scheme or a database to
>> > store the thumbnails.
>> > The thumbnails are created with a script that gets the image URLs
>> > from the Nutch index by doing a search for http
>> > (search.jsp?query=http).
>> >
>> > Do you have any tips or ideas on this?
>> >
>> > Thank you,
>> > Marco
>>
>> ---------------------------------------------
>> blog: http://www.find23.org
>> company: http://www.media-style.com

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
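The ImageWritable idea from the thread (compressed image bytes stored together with some meta data) could look roughly like the sketch below. This is not the original class, which was never posted. The name ImageRecord and the fields mimeType, width and height are assumptions made for illustration. The class only mirrors the write/readFields method pair of Hadoop's Writable interface and uses plain java.io, so it runs without Hadoop on the classpath; in a real job it would implement Writable and be stored as the value of a MapFile keyed by the image URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the ImageWritable mentioned in the thread.
// Field names are assumptions; only java.io is used.
class ImageRecord {
    private String mimeType = "";
    private int width;
    private int height;
    private byte[] data = new byte[0];

    // No-argument constructor so an empty record can be filled by readFields.
    ImageRecord() {}

    ImageRecord(String mimeType, int width, int height, byte[] data) {
        this.mimeType = mimeType;
        this.width = width;
        this.height = height;
        this.data = data.clone();
    }

    // Serialize the meta data first, then the length-prefixed image bytes.
    void write(DataOutput out) throws IOException {
        out.writeUTF(mimeType);
        out.writeInt(width);
        out.writeInt(height);
        out.writeInt(data.length);
        out.write(data);
    }

    // Read the fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        mimeType = in.readUTF();
        width = in.readInt();
        height = in.readInt();
        data = new byte[in.readInt()];
        in.readFully(data);
    }

    String getMimeType() { return mimeType; }
    int getWidth() { return width; }
    int getHeight() { return height; }
    byte[] getData() { return data.clone(); }

    // Round-trip demo: serialize to a byte buffer and read it back.
    public static void main(String[] args) throws IOException {
        ImageRecord original =
            new ImageRecord("image/jpeg", 120, 90, new byte[] {1, 2, 3});
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        ImageRecord copy = new ImageRecord();
        copy.readFields(new DataInputStream(
            new ByteArrayInputStream(buffer.toByteArray())));
        System.out.println(copy.getMimeType() + " " + copy.getWidth()
            + "x" + copy.getHeight() + ", " + copy.getData().length + " bytes");
    }
}
```

Keeping the meta data in front of the image bytes means a reader that only needs the dimensions can stop after the first few fields.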
