Hi,

One advantage of storing thumbnails in a separate folder is that the index
size will be much smaller. However, I have recently heard that Linux systems
have problems storing millions of files in a single folder. Is that true?
Also, I checked Google and Yandex image search, and it seems that both of
them store files in a folder, since their image src links are
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSGL7cSq_9YwSB3sUc6p2CcjioRrtYxouBcgVbo_063ghF8DODZ
and http://im0-tub-ru.yandex.net/i?id=209844222-71-72&n=21, respectively.

Can anyone comment on this problem?

Thanks.
Alex.


Jorge Luis Betancourt Gonzalez wrote
> Hi Alex:
> 
> For us it was just a question of taste, if you like. We keep two different
> Solr cores in our configuration: one to store all kinds of documents (HTML,
> PDF, DOC, etc.) and another for images. We decided on this because the two
> schemas were very different, and of course it's "cleaner" to keep these
> things separate. In the image index of our Solr configuration we store the
> thumbnail, which is just a base64-encoded string. I think one advantage of
> this approach is that when we query the Solr index that holds the images,
> a single query returns all the data required to display the results page
> (built using Symfony2 and Solarium). If we stored the images as files in a
> normal filesystem, we would also need to request the thumbnails from the
> web server, which could slow down rendering of the page. Again, this is not
> bulletproof: this approach reduces the number of requests, but increases
> the network traffic per request to Solr and eventually to the client,
> which, depending on the environment, may not be acceptable.
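To put rough numbers on the traffic trade-off described above: standard base64 turns every 3 input bytes into 4 ASCII characters, so a thumbnail stored this way costs about a third more bytes in the index and in every Solr response that returns the field. In exchange, the page can render it inline, for example as a `data:` URI, without a second HTTP request. A minimal sketch, assuming JPEG thumbnails; the function names are hypothetical and the image bytes are a stand-in, not real data from either setup.

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of the padded standard base64 encoding of n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def to_data_uri(thumbnail: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw thumbnail bytes in a data: URI usable directly as <img src=...>."""
    return "data:{};base64,{}".format(mime, base64.b64encode(thumbnail).decode("ascii"))


if __name__ == "__main__":
    raw = bytes(range(256)) * 24  # stand-in for a 6 KiB JPEG thumbnail
    encoded = base64.b64encode(raw)
    print(len(raw), len(encoded))  # 6144 8192
    print(to_data_uri(raw)[:30])
```

For a results page of, say, 20 such thumbnails, that is roughly 40 KiB of extra response body compared with sending the raw bytes, which is the cost being weighed against 20 fewer HTTP requests.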
> 
> Another advantage could be the simplicity gained when deploying our stack:
> right now we just need to deploy the Solr index onto a shiny new server. If
> we kept the images in the filesystem, we would also have to deploy some NFS
> infrastructure to keep the images available at all times, and perhaps we
> would need to consider a separate server just for storing the thumbnails,
> with a lightweight web server specially configured to serve them.
> 
> Anyway, I don't think this is a "silver bullet" for storing thumbnails;
> these are just the considerations that led us to this approach. Any
> comments or suggestions from Alex or anyone else are welcome.
> 
> Greetings,
> 
> On Oct 21, 2012, at 10:51 PM, alxsss@ wrote:
> 
>> Hello,
>> 
>> I have also written this kind of plugin, but instead of putting thumbnail
>> files in the Solr index, I put them in a folder. Only the filenames are
>> kept in the Solr index.
>>
>> What is the advantage of putting thumbnail files in the Solr index?
>> 
>> Thanks in advance.
>> Alex.
>> 
>> -----Original Message-----
>> From: Jorge Luis Betancourt Gonzalez <jlbetancourt@>
>> To: user <[email protected]>
>> Sent: Sun, Oct 21, 2012 7:26 pm
>> Subject: Re: Image search engine based on nutch/solr
>> 
>> 
>> Hi,
>> 
>> As Lewis said before, if you are going to use Nutch for image retrieval
>> and indexing in Solr, you'll need to invest some time writing some tools,
>> depending on your needs. I've been working on a search engine that uses
>> Nutch for the crawling process and Solr as the indexing server (the
>> typical setup). When we started dealing with images, we became aware that
>> Nutch (through the Tika project) extracts too little information about the
>> image per se (basically only metadata gets extracted); I think this is the
>> biggest problem with Nutch. One particular requirement for me was to show
>> a thumbnail of the image, so I wrote a plugin that generates the
>> thumbnail, encodes it using base64, and stores it in the Solr index.
>> Another need was to annotate the image with its surrounding text to
>> improve the search, and I also wrote a plugin for this.
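The surrounding-text plugin itself isn't shown in the thread. As a rough illustration of the idea only, the sketch below uses Python's standard `html.parser` to collect, for each `<img>`, its `alt` text plus a window of visible words before and after it, skipping `<script>` and `<style>` content. The window size, the class name `ImageContextParser`, and the output field names are assumptions; an actual Nutch plugin would be written in Java against Nutch's own plugin interfaces.

```python
from html.parser import HTMLParser


class ImageContextParser(HTMLParser):
    """Collect each <img> with its alt text and up to `window` words of
    visible text on either side (a hypothetical annotation scheme)."""

    SKIP = {"script", "style"}

    def __init__(self, window: int = 10):
        super().__init__()
        self.window = window
        self.words = []    # visible words in document order
        self.images = []   # (src, alt, index in self.words where the image sits)
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            a = dict(attrs)
            self.images.append((a.get("src") or "", a.get("alt") or "", len(self.words)))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())

    def annotations(self):
        """Return one dict per image, e.g. as extra fields for a Solr document."""
        out = []
        for src, alt, pos in self.images:
            before = self.words[max(0, pos - self.window):pos]
            after = self.words[pos:pos + self.window]
            out.append({"src": src, "alt": alt, "context": " ".join(before + after)})
        return out


if __name__ == "__main__":
    p = ImageContextParser(window=3)
    p.feed('<p>A photo of the old harbour <img src="h.jpg" alt="harbour"> at sunset last summer.</p>')
    p.close()
    print(p.annotations())
    # [{'src': 'h.jpg', 'alt': 'harbour', 'context': 'the old harbour at sunset last'}]
```

A word-count window is the simplest choice; weighting nearby words more heavily, or using the enclosing paragraph or caption element instead, are common variations.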
>> 
>> In summary, Nutch is a very good starting point, but depending on your
>> particular needs you'll have to write some plugins of your own.
>> 
>> Greetings
>> 
>> On Oct 20, 2012, at 10:02 AM, Lewis John Mcgibbney <lewis.mcgibbney@>
>> wrote:
>> 
>>> Hi,
>>> 
>>> On Fri, Oct 19, 2012 at 10:48 PM, Santosh Mahto <santosh.inbox7@> wrote:
>>>> Hi all
>>> 
>>>> I have a few questions:
>>>> 1. Does Nutch support image crawling and indexing (or how much support is
>>>> there)?
>>> 
>>> Depending on how you wish to process and then present your images (e.g.
>>> as thumbnails), I would say you need to invest some time writing a custom
>>> parser for images. You can read a pretty thorough and comprehensive
>>> thread [0] on this topic.
>>> 
>>>> 2. I found some links where the Apache Tika plugin is used to build an
>>>> image search engine, and with a little exploration I found that Tika is
>>>> used by default in Nutch (I think, but I'm not sure). So does image
>>>> searching also happen by default?
>>> 
>>> Image processing and indexing is not enabled by default in the above
>>> context.
>>> 
>>>> 3. I think I also need to configure Solr to show the image results.
>>>> Could you tell me what extra configuration needs to be set on the Solr
>>>> side?
>>> 
>>> Unless someone here who has worked with image indexing in Solr can help
>>> you in more detail than I can, I would certainly direct you to the
>>> solr-user@ list archives [1]. There appears to be plenty there.
>>> 
>>> hth
>>> 
>>> Lewis
>>> 
>>> [0] http://www.mail-archive.com/[email protected]/msg06758.html
>>> [1] http://www.mail-archive.com/search?q=image&l=solr-user%40lucene.apache.org
>>> 
>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>>> INFORMATICAS...
>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>> 
>>> http://www.uci.cu
>>> http://www.facebook.com/universidad.uci
>>> http://www.flickr.com/photos/universidad_uci
>> 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Image-search-engine-based-on-nutch-solr-tp4014844p4032276.html
Sent from the Nutch - User mailing list archive at Nabble.com.
