Hi Alex,

For us it was mostly a question of taste, if you like. We keep two different solr cores in our configuration: one to store all kinds of documents (HTML, PDF, DOC, etc.) and another for images. We decided this because the differences between the two schemas were big and, of course, it's "cleaner" to keep these things separated. Inside the image index we store the thumbnail, which is just a base64-encoded string.

I think one plus of this approach is that when we query the solr index in which we store the images, a single query gives us all the data required to display the results page (built using symfony2 and solarium). If we stored the images as files in a normal filesystem, we would also need to request the thumbnails from the web server, which could slow down the rendering of the page. Again, this is not bulletproof: with this approach we reduce the number of requests, but we increase the network traffic per request to solr and eventually to the client, which, depending on the environment, might not be acceptable.
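To make the trade-off a bit more concrete, here is a minimal Python sketch of the idea. It is not our plugin (that one runs inside nutch), and the helper names, the `id` and `thumbnail` field names, the example URL and the 3000-byte stand-in image are all assumptions made up for illustration. It only shows that a base64 string can travel inside the document returned by the query and be inlined in the page as a data URI, and that the encoding costs roughly one third more bytes than the raw thumbnail.

```python
import base64


def encode_thumbnail(image_bytes: bytes) -> str:
    """Return the base64 text that would be stored in a string field of the index."""
    return base64.b64encode(image_bytes).decode("ascii")


def to_data_uri(encoded: str, mime_type: str = "image/jpeg") -> str:
    """Build a data URI so the results page can inline the thumbnail
    instead of asking the web server for a separate file."""
    return f"data:{mime_type};base64,{encoded}"


def base64_overhead(image_bytes: bytes) -> float:
    """Ratio of encoded size to raw size (about 4/3 for non-trivial inputs)."""
    return len(base64.b64encode(image_bytes)) / len(image_bytes)


# Stand-in for real thumbnail bytes: a hypothetical 3000-byte image.
thumbnail = b"\xff\xd8\xff" + bytes(2997)

# Hypothetical document as it might come back from the image index.
doc = {
    "id": "http://example.com/photo.jpg",
    "thumbnail": encode_thumbnail(thumbnail),
}

img_src = to_data_uri(doc["thumbnail"])      # usable as <img src="...">
print(len(thumbnail), len(doc["thumbnail"]))  # 3000 4000
print(f"{base64_overhead(thumbnail):.2f}")    # 1.33
```

So every thumbnail in a result page costs about 33% more bytes on the wire than the raw file would, in exchange for not needing a second request per image.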
Another plus could be the simplicity gained when deploying our stack: right now we just need to deploy the solr index onto a shiny new server. If we kept the images in the filesystem, we would also have to deploy some NFS infrastructure to keep the images available at any time, and perhaps we should consider using a separate server just for storing the thumbnails, with a lightweight web server specially configured to serve them.

Anyway, I don't think there is a "silver bullet" for storing thumbnails; these are just the considerations that led us to this approach. Any comments or suggestions are welcome, from Alex or anyone else.

Greetings,

On Oct 21, 2012, at 10:51 PM, [email protected] wrote:

> Hello,
>
> I have also written this kind of plugin. But instead of putting thumbnail
> files in the solr index, they are put in a folder. Only filenames are kept in
> the solr index.
>
> I wondered what is the advantage of putting thumbnail files in the solr index?
>
> Thanks in advance.
> Alex.
>
> -----Original Message-----
> From: Jorge Luis Betancourt Gonzalez <[email protected]>
> To: user <[email protected]>
> Sent: Sun, Oct 21, 2012 7:26 pm
> Subject: Re: Image search engine based on nutch/solr
>
> Hi,
>
> As Lewis said before, if you are going to use nutch for image retrieval and
> indexing in solr, you'll need to invest some time writing some tools depending
> on your needs. I've been working on a search engine using nutch for the
> crawling process and solr as an indexing server, the typical use. When we
> started dealing with images we became aware that nutch (through the Tika
> project) extracts too little information about the image "per se" (basically
> only metadata gets extracted); I think that this is the biggest problem with
> nutch. One particular requirement for me was to show a thumbnail of the image,
> so I wrote a plugin that generates the thumbnail, then encodes it using base64
> and stores it in the solr index.
> Another need was to annotate the image with the surrounding text to improve
> the search; I also wrote a plugin for this.
>
> Summarizing, nutch is a very good starting point, but depending on your
> particular needs you'll have to write some plugins on your own.
>
> Greetings
>
> On Oct 20, 2012, at 10:02 AM, Lewis John Mcgibbney <[email protected]> wrote:
>
>> Hi,
>>
>> On Fri, Oct 19, 2012 at 10:48 PM, Santosh Mahto <[email protected]> wrote:
>>> Hi all
>>>
>>> I have a few questions:
>>> 1. Does nutch support image crawling and indexing (or how much support is
>>> there)?
>>
>> Depending on how you wish to process and then present your images, e.g.
>> as thumbnails, I would say you need to invest some time
>> writing a custom parser for images. You can read a pretty thorough and
>> comprehensive thread [0] on this topic.
>>
>>> 2. I found some links where the apache-tika plugin is used to make an image
>>> search engine. With a little exploration I found that Tika is the default
>>> in nutch (I think, not sure). So does image searching also happen by
>>> default?
>>
>> Image processing and indexing is not enabled by default in the above context.
>>
>>> 3. I think I also need to configure solr to show the image results.
>>> Could you guide me on what extra configuration needs to be set on the solr
>>> side?
>>
>> Unless someone here who has worked with image indexing in Solr can
>> help you in a more verbose manner than me, I would certainly direct
>> you to the solr-user@ list archives [1]. There appears to be plenty
>> there.
>>
>> hth
>>
>> Lewis
>>
>> [0] http://www.mail-archive.com/[email protected]/msg06758.html
>> [1] http://www.mail-archive.com/search?q=image&l=solr-user%40lucene.apache.org
>>
>> 10TH ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>
>> http://www.uci.cu
>> http://www.facebook.com/universidad.uci
>> http://www.flickr.com/photos/universidad_uci

