Hi, Stefan. That would be great!!! I think many people would vote for this. Since Nutch is a really powerful search engine, it would be nice to see several types of search in it.
You wrote June 3, 2006, 20:17:06:

> Having an image search component for Nutch would be nice.
> However, I think we need to implement this as a kind of separate tool
> outside of the Nutch code itself, since it cannot be 100% integrated
> into the Nutch code (e.g. Nutch defines one URL == one index document).
> Maybe this would be a nice project for a Nutch sandbox.
> If you like, you can open an issue to request a Nutch sandbox project
> "image search".
> If we get enough people to vote for this issue, we may have a chance
> to get it created.
>
> Stefan
>
> On 03.06.2006 at 10:38, TDLN wrote:
>> I am interested in developing such a solution as well.
>>
>> I am currently storing the thumbnails on the file system under a
>> system-generated name. My indexing plugin stores the filename in the
>> index. Thumbnails are later served to the client by a separate Apache
>> HTTP server. This required some changes but is otherwise pretty
>> straightforward and performs very well for my current 300,000+
>> images, around 15 KB each.
>>
>> If you are developing the more "Nutch-like" solution, I could
>> contribute to that. For instance, I have some code that generates the
>> thumbs using ImageJ that yields very good results.
>>
>> But I would definitely need some guidance in writing the Hadoop
>> MapReduce job. We could even contribute this back and base a small
>> tutorial on this work.
>>
>> What do you think?
>>
>> Rgrds, Thomas
>>
>> On 6/2/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>> using a search for http is a bad idea, since you get many but not
>>> all pages.
>>> Just write a Hadoop MapReduce job that processes the fetched content
>>> in your segments; that should be easy.
>>> Storing images in a file system will be very slow as soon as you
>>> have too many.
>>> I personally don't like databases, since compared to Nutch they are
>>> slow as a snail.
>>> For another project also related to images I created my own
>>> ImageWritable that contained the binary data of a compressed image
>>> together with some metadata.
>>> If you use a MapFile, finding an image based on a key should be very
>>> fast. I think much faster than a database with binary content.
>>>
>>> HTH
>>> Stefan
>>>
>>> On 02.06.2006 at 21:10, Marco Pereira wrote:
>>>
>>> > Hi Everybody,
>>> >
>>> > I've got Nutch to index images, searching on their URL and their
>>> > alt and title tags.
>>> > But the problem comes when storing the thumbnails.
>>> > I've indexed 3 million images for a national search engine.
>>> > I am unsure whether to use a file system scheme or a database to
>>> > store the thumbnails.
>>> > The thumbnails are created with a script that gets the image URLs
>>> > from the Nutch index by doing a search for http
>>> > (search.jsp?query=http).
>>> >
>>> > Do you have any tips or ideas on this?
>>> >
>>> > Thank you,
>>> > Marco
>>>
>>> ---------------------------------------------
>>> blog: http://www.find23.org
>>> company: http://www.media-style.com

-- 
Regards,
Dima
mailto:[EMAIL PROTECTED]

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
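For readers following the thread: the record Stefan describes (compressed image bytes plus some metadata, looked up by key in a MapFile) might look roughly like the sketch below. This is not his actual ImageWritable. The class name ThumbnailRecord and its fields are hypothetical, and the sketch uses only java.io so it runs without Hadoop on the classpath. The two methods write(DataOutput) and readFields(DataInput) match the ones Hadoop's Writable interface declares, so declaring the class as implementing org.apache.hadoop.io.Writable should be the main step needed before storing it as a MapFile value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical thumbnail record: compressed image bytes plus a little metadata. */
class ThumbnailRecord {
    String sourceUrl = "";            // URL of the original image (assumed metadata field)
    int width;                        // thumbnail width in pixels (assumed metadata field)
    int height;                       // thumbnail height in pixels (assumed metadata field)
    byte[] imageBytes = new byte[0];  // already-compressed image data, e.g. JPEG

    ThumbnailRecord() {}

    ThumbnailRecord(String sourceUrl, int width, int height, byte[] imageBytes) {
        this.sourceUrl = sourceUrl;
        this.width = width;
        this.height = height;
        this.imageBytes = imageBytes;
    }

    /** Serialize all fields; same signature as Hadoop's Writable.write. */
    public void write(DataOutput out) throws IOException {
        out.writeUTF(sourceUrl);          // writeUTF is limited to 65535 encoded bytes
        out.writeInt(width);
        out.writeInt(height);
        out.writeInt(imageBytes.length);  // length prefix so the reader knows how much to read
        out.write(imageBytes);
    }

    /** Restore all fields in the order they were written; same signature as Writable.readFields. */
    public void readFields(DataInput in) throws IOException {
        sourceUrl = in.readUTF();
        width = in.readInt();
        height = in.readInt();
        imageBytes = new byte[in.readInt()];
        in.readFully(imageBytes);
    }

    /** Convenience helper for trying the round trip in memory. */
    static byte[] toBytes(ThumbnailRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Convenience helper: rebuild a record from bytes produced by toBytes. */
    static ThumbnailRecord fromBytes(byte[] data) {
        ThumbnailRecord record = new ThumbnailRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

In a MapFile the key would probably be the image URL or a hash of it; that choice is an assumption here, since the thread does not say which key was used.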
