Hi,

One advantage of storing thumbnails in a separate folder is that the Solr index stays much smaller. However, I recently heard that Linux systems have problems storing millions of files in a single folder. Is that the case?

I also checked Google and Yandex image search, and it seems that both of them store the files in a folder, since the src links for their images are
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSGL7cSq_9YwSB3sUc6p2CcjioRrtYxouBcgVbo_063ghF8DODZ
and
http://im0-tub-ru.yandex.net/i?id=209844222-71-72&n=21
respectively.
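If the single-folder concern turns out to be real, one way around it that I can think of is to spread the files over nested subfolders named after a hash of the image URL. Below is a minimal sketch of that idea in Python, not what my plugin currently does. The function name `sharded_thumbnail_path`, the `thumbs` root folder, the `.jpg` extension and the 2x2 hex-character layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sharded_thumbnail_path(root: str, image_url: str,
                           levels: int = 2, width: int = 2) -> Path:
    """Map an image URL to a nested path such as root/ab/cd/<sha1>.jpg.

    Hypothetical helper: the layout and the .jpg extension are assumptions.
    """
    # Hash the URL so that file names are evenly distributed and filesystem-safe.
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()
    # Use the leading hex characters of the digest as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, digest + ".jpg")


if __name__ == "__main__":
    path = sharded_thumbnail_path("thumbs", "http://example.com/images/cat.png")
    # Create the shard directories on demand before writing the thumbnail.
    path.parent.mkdir(parents=True, exist_ok=True)
    print(path)
```

With two levels of two hex characters there are 256 x 256 = 65,536 leaf folders, so ten million thumbnails would average roughly 150 files per folder. Only the relative path (or just the URL, since the path can be recomputed from it) would need to be kept in the Solr index.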
Can anyone comment on this problem?

Thanks.
Alex.

Jorge Luis Betancourt Gonzalez wrote
> Hi Alex:
>
> For us it was just a question of taste, if you like. We keep 2 different
> Solr cores in our configuration: one to store all kinds of documents
> (HTML, PDF, DOC, etc.) and another for images. We decided this because
> the difference between the 2 schemas was big, and of course it's
> "cleaner" to keep these things separated. Inside the image index in our
> Solr configuration we store the thumbnail, which is only a base64-encoded
> string. I think one plus of this approach is that when we query the Solr
> index (in which we store the images), a single query returns all the data
> required to display the results page (built using symfony2 and solarium).
> If we stored the images as files in a normal filesystem, we would also
> need to request the thumbnails from the web server, which could slow down
> the rendering of the page. Again, this is not bulletproof: with this
> approach we reduce the number of requests, but increase the network
> traffic per request to Solr and eventually to the client, which,
> depending on the environment, might not be acceptable.
>
> Another plus could be the simplicity gained when deploying our stack.
> Right now we just need to deploy the Solr index onto a shiny new server.
> If we kept the images in the filesystem, we would also have to deploy
> some NFS infrastructure to keep the images available at any time, and
> perhaps we should consider using a separate server just for storing the
> thumbnails, with a lightweight web server specially configured to serve
> them.
>
> Anyway, I don't think there is a "silver bullet" for storing thumbnails.
> These are just the considerations that led us to this approach. Any
> comments or suggestions are welcome from Alex or anyone else.
>
> Greetings,
>
> On Oct 21, 2012, at 10:51 PM, alxsss@ wrote:
>
>> Hello,
>>
>> I have also written this kind of plugin.
>> But instead of putting the thumbnail files in the Solr index, they are
>> put in a folder. Only the filenames are kept in the Solr index.
>>
>> I wondered what the advantage is of putting thumbnail files in the Solr
>> index?
>>
>> Thanks in advance.
>> Alex.
>>
>> -----Original Message-----
>> From: Jorge Luis Betancourt Gonzalez <jlbetancourt@>
>> To: user <[email protected]>
>> Sent: Sun, Oct 21, 2012 7:26 pm
>> Subject: Re: Image search engine based on nutch/solr
>>
>> Hi,
>>
>> As Lewis said before, if you are going to use Nutch for image retrieval
>> and indexing in Solr, you'll need to invest some time writing some tools
>> depending on your needs. I've been working on a search engine using
>> Nutch for the crawling process and Solr as an indexing server, the
>> typical use. When we started dealing with images we became aware that
>> Nutch (through the Tika project) extracts too little information about
>> the image "per se" (basically only metadata gets extracted). I think
>> that this is the biggest problem with Nutch. One particular requirement
>> for me was to show a thumbnail of the image, so I wrote a plugin that
>> generates the thumbnail, then encodes it using base64 and stores it in
>> the Solr index. Another need was to annotate the image with the
>> surrounding text to improve the search; I also wrote a plugin for this.
>>
>> Summarizing, Nutch is a very good starting point, but depending on your
>> particular needs you'll have to write some plugins of your own.
>>
>> Greetings
>>
>> On Oct 20, 2012, at 10:02 AM, Lewis John Mcgibbney <lewis.mcgibbney@>
>> wrote:
>>
>>> Hi,
>>>
>>> On Fri, Oct 19, 2012 at 10:48 PM, Santosh Mahto
>>> <santosh.inbox7@> wrote:
>>>> Hi all
>>>
>>>> I have a few questions:
>>>> 1. Does Nutch support image crawling and indexing (or how much
>>>> support is there)?
>>>
>>> Depending on how you wish to process and then present your images, e.g.
>>> as thumbnails, I would say you need to invest some time
>>> writing a custom parser for images. You can read a pretty thorough and
>>> comprehensive thread [0] on this topic.
>>>
>>>> 2. I found some links where the apache-tika plugin is used to make an
>>>> image search engine. With a little exploration I found that Tika is
>>>> enabled by default in Nutch (I think, not sure). So does image
>>>> searching also happen by default?
>>>
>>> Image processing and indexing is not enabled by default in the above
>>> context.
>>>
>>>> 3. I think I also need to configure Solr to show the image results.
>>>> Could you guide me on what extra configuration needs to be set on the
>>>> Solr side?
>>>
>>> Unless someone here who has worked with image indexing in Solr can
>>> help you in a more verbose manner than me, I would certainly direct
>>> you to the solr-user@ list archives [1]. There appears to be plenty
>>> there.
>>>
>>> hth
>>>
>>> Lewis
>>>
>>> [0] http://www.mail-archive.com/[email protected]/msg06758.html
>>> [1]
>>> http://www.mail-archive.com/search?q=image&l=solr-user%40lucene.apache.org
>>>
>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>>> INFORMATICAS...
>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>
>>> http://www.uci.cu
>>> http://www.facebook.com/universidad.uci
>>> http://www.flickr.com/photos/universidad_uci

--
View this message in context: http://lucene.472066.n3.nabble.com/Image-search-engine-based-on-nutch-solr-tp4014844p4032276.html
Sent from the Nutch - User mailing list archive at Nabble.com.

