Re: [Nutch-general] Image Search

Dima Mazmanov Sat, 03 Jun 2006 08:25:36 -0700

Hi, Stefan.

That would be great!!!
I think many people would vote for this.
Since Nutch is a really powerful search engine, it would be nice to
see several types of search in it.


You wrote on 3 June 2006, 20:17:06:

> Having an image search component for Nutch would be nice.
> However, I think we need to implement this as a kind of separate tool
> outside of the Nutch code itself, since it cannot be 100% integrated
> into the Nutch code.
> (E.g., Nutch defines one URL == one index document.)
> Maybe this would be a nice project for a Nutch sandbox.
> If you like, you can open an issue to request a Nutch sandbox project
> "image search".
> If enough people vote for this issue, we may have a chance to
> get it created.

> Stefan

> On 03.06.2006 at 10:38, TDLN wrote:

>> I am interested in developing such a solution as well.
>>
>> I am currently storing the thumbnails on the file system under a
>> system-generated name. My indexing plugin stores the filename in the
>> index. Thumbnails are later served to the client by a separate Apache
>> HTTP server. This required some changes but is otherwise pretty
>> straightforward and performs very well for my current 300,000+
>> images, around 15 KB each.
>>
>> If you are developing the more "Nutch-like" solution, I could
>> contribute to that. For instance, I have some code that generates the
>> thumbnails using ImageJ and yields very good results.
>>
>> But I would definitely need some guidance in writing the Hadoop
>> MapReduce job. We could even contribute this back and base a small
>> tutorial on this work.
>>
>> What do you think?
>>
>> Rgrds, Thomas
>>
>> On 6/2/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>> Using a search for "http" is a bad idea, since you get many but not
>>> all pages.
>>> Just write a Hadoop MapReduce job that processes the fetched content
>>> in your segments; that should be easy.
>>> Storing images in a file system will be very slow as soon as you have
>>> too many.
>>> I personally don't like databases, since compared to Nutch they are
>>> slow as a snail.
>>> For another project also related to images, I created my own
>>> ImageWritable that contained the binary data of a compressed image
>>> combined with some metadata.
>>> If you use a MapFile, finding an image based on a key should be very
>>> fast. I think much faster than a database with binary content.
>>>
>>> HTH
>>> Stefan
>>>
>>>
>>>
>>>
>>> On 02.06.2006 at 21:10, Marco Pereira wrote:
>>>
>>> > Hi Everybody,
>>> >
>>> > I've got Nutch to index images so that their URL, alt, and title
>>> > tags are searchable.
>>> > But the problem comes when storing the thumbnails.
>>> > I've indexed 3 million images for a national search engine.
>>> > I am unsure whether to use a file system scheme or a database to
>>> > store the thumbnails.
>>> > The thumbnails are created with a script that gets the image URLs
>>> > from the Nutch index by doing a search for http
>>> > (search.jsp?query=http).
>>> >
>>> > Do you have any tips, ideas on this?
>>> >
>>> > Thank you,
>>> > Marco
>>>
>>> ---------------------------------------------
>>> blog: http://www.find23.org
>>> company: http://www.media-style.com
>>>
>>>
>>>
>>
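
Stefan's MapFile suggestion above comes down to keeping all records sorted by key in one large file and locating a record through an index, rather than keeping one file per image. The following is a minimal Python sketch of that idea using only the standard library. It is not Hadoop's MapFile API or on-disk format; `write_store`, `lookup`, and the length-prefixed record layout are hypothetical names and choices made for illustration.

```python
import bisect
import struct


def write_store(path, records):
    """Write {key: bytes} to one file in key order; return (keys, offsets).

    Assumption: the whole index fits in memory. This is an illustration,
    not the format Hadoop uses.
    """
    keys, offsets = [], []
    with open(path, "wb") as f:
        for key, blob in sorted(records.items()):
            keys.append(key)
            offsets.append(f.tell())
            # Length-prefix each value so it can be read back without a delimiter.
            f.write(struct.pack(">I", len(blob)))
            f.write(blob)
    return keys, offsets


def lookup(path, index, key):
    """Binary-search the sorted keys, then do one seek and one read."""
    keys, offsets = index
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None  # key not present
    with open(path, "rb") as f:
        f.seek(offsets[i])
        (length,) = struct.unpack(">I", f.read(4))
        return f.read(length)
```

A caller would build the store once, for example `index = write_store("thumbs.dat", {url: thumbnail_bytes})`, and then fetch a thumbnail with `lookup("thumbs.dat", index, url)`. Each lookup costs one binary search plus one seek, and the file system only ever sees a single large file.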







-- 
Regards,
 Dima                          mailto:[EMAIL PROTECTED]



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
