Re: Arguments for Solr implementation at public web site
Some extras for the pros list:

- Full control over which content is searchable and which is not
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed

Friendly,
Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:

> Hi,
>
> I am looking for good arguments to justify implementing search for sites
> which are available on the public internet. There are many sites in the
> "Powered by Solr" section which are indexed by Google and other search
> engines, but they still decided to invest resources into building and
> maintaining their own search functionality rather than going with
> [user_query site:my_site.com] Google search. Why? By no means am I saying
> it makes no sense to implement Solr! But I want to put together a list of
> reasons, possibly with examples. Your help would be much appreciated!
>
> Let's narrow the scope of this discussion to the following:
> - the search should cover several community sites running open source
>   CMSs, JIRAs, Bugzillas and the like
> - all documents use open formats (no need to parse Word or Excel)
> (maybe something close to what LucidImagination does for the mailing
> lists of Lucene and Solr)
>
> My initial kick-off list would be:
>
> pros:
> - since we understand the content (we understand the domain scope), we
>   can fine-tune the search engine to provide more accurate results
> - Solr can give us facets
> - we have user search logs (valuable for analysis)
> - implementing Solr is fun
>
> cons:
> - requires resources (but the cost is relatively low, depending on query
>   traffic, index size and frequency of updates)
>
> Regards,
> Lukas
> http://blog.lukas-vlcek.com/

--
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy
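To illustrate the "searchable almost instantly" point above: with your own Solr instance you can post a document with an immediate commit, and it becomes visible to searches as soon as the request returns. A sketch (the URL and field names are illustrative, assuming a default single-core setup on localhost):

```
curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: text/xml' \
  --data-binary '<add><doc>
  <field name="id">page-123</field>
  <field name="title">Just-published page</field>
</doc></add>'
```

No crawler schedule is involved: the page is searchable the moment the commit completes.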
Re: Arguments for Solr implementation at public web site
Next to the faceting engine:

- MoreLikeThis
- Highlighting
- Spellchecker

But also more flexible querying using the DisMax handler, which is clearly superior. Solr can also be used to store data which can be retrieved in an instant! We have used this technique on a site, and it is obviously much faster than multiple large and complex SQL statements.

On Fri, 2009-11-13 at 10:52 +0100, Lukáš Vlček wrote:

> pros:
> - since we understand the content (we understand the domain scope), we
>   can fine-tune the search engine to provide more accurate results
> [...]
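The features above are all switched on through plain query parameters, so a request combining DisMax, highlighting and spell checking might look like the following sketch (the field names and boosts are invented for illustration; adjust them to your schema):

```python
from urllib.parse import urlencode

# Hypothetical fields and boosts -- adjust to your own schema.
params = {
    "q": "solr facet",
    "defType": "dismax",         # use the DisMax query parser
    "qf": "title^2.0 body^1.0",  # search title and body, boosting title
    "hl": "true",                # return highlighted snippets
    "hl.fl": "body",             # highlight matches in the body field
    "spellcheck": "true",        # suggest corrections for misspellings
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

The same request string works from any HTTP client; nothing Solr-specific is needed on the calling side.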
Re: Arguments for Solr implementation at public web site
Jan-Eirik B. Nævdal wrote:

> Some extras for the pros list:
> - Full control over which content is searchable and which is not
> - Possibility to make pages searchable almost instantly after publication
> - Control over when the site is indexed

+1, especially the last point. You can also add a robots.txt and prohibit spidering of the site to reduce traffic. Google won't index any highly dynamic content, then.

> On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:
> [...]
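For example, a robots.txt at the site root could keep external crawlers away from the dynamic parts of the site while your own Solr index still covers them (the paths below are illustrative):

```
User-agent: *
Disallow: /search
Disallow: /dynamic/
```

Crawler traffic drops, and the dynamic pages remain findable through the site's own search.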
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote:

> I am looking for good arguments to justify implementing search for sites
> which are available on the public internet. There are many sites in the
> "Powered by Solr" section which are indexed by Google and other search
> engines, but they still decided to invest resources into building and
> maintaining their own search functionality rather than going with
> [user_query site:my_site.com] Google search. Why?

You're assuming that Solr is just used in these cases to index discrete web pages which Google etc. would be able to access by following navigational links. I would imagine that in a lot of cases, Solr is used to index database entities which are used to build [parts of] pages dynamically, and which might be viewable in different forms on various different pages.

Plus, with stored fields, you have the option of actually driving a website off Solr instead of directly off a database, which might make sense from a speed perspective in some cases.

And further, going back to page-only indexing -- you have no guarantee of when Google will decide to recrawl your site, so there may be a delay before changes show up in their index. With an in-house search engine you can reindex as often as you like.

Andrew.

--
View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
Sent from the Solr - User mailing list archive at Nabble.com.
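As an illustration of the stored-fields point above: a single Solr request can return the stored content directly, with no database round trip at all (the core location and field names here are hypothetical):

```
http://localhost:8983/solr/select?q=id:article-42&fl=id,title,body&wt=json
```

The `fl` parameter picks which stored fields come back, so a page template can be rendered straight from the response.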
Re: Arguments for Solr implementation at public web site
Hi,

Thanks for the input so far... however, let's put it this way: when you need to search for something Lucene- or Solr-related, which one do you use?

- generic Google
- go to a particular mailing list web site and search from there (if there is any search form at all)
- go to LucidImagination.com and use its search capability

Regards,
Lukas

On Fri, Nov 13, 2009 at 11:50 AM, Andrew Clegg <andrew.cl...@gmail.com> wrote:

> You're assuming that Solr is just used in these cases to index discrete
> web pages which Google etc. would be able to access by following
> navigational links.
> [...]
Re: Arguments for Solr implementation at public web site
Lukáš Vlček wrote:

> When you need to search for something Lucene- or Solr-related, which one
> do you use:
> - generic Google
> - go to a particular mailing list web site and search from there (if
>   there is any search form at all)

Both of these (Nabble in the second case), in case any recent posts have appeared which Google hasn't picked up.

Andrew.
Re: Arguments for Solr implementation at public web site
For this list I usually end up at http://solr.markmail.org (which I believe also uses Lucene under the hood). Google is such a black box...

Pros:
+1 Open Source (enough said :-)

There also always seems to be the notion that crawling lends itself to producing the best results, but that is rarely the case. And unless you are a special type of site, Google will not overlay your results with some type of context in the search (i.e. news or sports, etc.).

What I think really needs to happen in Solr (and is a bit missing at the moment) is a common interface for reindexing another index (if that makes sense)... something akin to OpenSearch (http://www.opensearch.org/Community/OpenSearch_software). For example, what I would like to do is have my site, have my search index, and connect Google to index just my search index (and not crawl the site)... the only current option for something like that is sitemaps, which I think Solr (templates) should have a contrib project for (but you would have to generate these offline for sure).

- Jon

On Nov 13, 2009, at 6:00 AM, Lukáš Vlček wrote:

> Hi,
> thanks for the input so far... however, let's put it this way: when you
> need to search for something Lucene- or Solr-related, which one do you
> use?
> [...]
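Generating such a sitemap offline from the documents in your index is straightforward; a minimal sketch (the URLs are invented for illustration, and a real generator would page through the index rather than take a literal list):

```python
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Build a minimal sitemap.xml body from a list of page URLs."""
    entries = "\n".join(
        "  <url><loc>%s</loc></url>" % escape(u) for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        "%s\n</urlset>" % entries
    )

# Example: pages we want Google to pick up without crawling the whole site.
print(build_sitemap([
    "http://my_site.com/page/1",
    "http://my_site.com/page/2",
]))
```

The resulting file is then served from the site root and registered with the search engines, so they fetch the list instead of spidering every page.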