Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Jan-Eirik B. Nævdal
Some extras for the pros list:

- Full control over which content is searchable and which is not.
- Possibility to make pages searchable almost instantly after publication.
- Control over when the site is indexed.
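As a sketch of the "searchable almost instantly" point: Solr lets you post a document together with a bound on how soon it must become visible, via the standard commitWithin parameter. A minimal example (the core name `site` and the document fields are hypothetical):

```python
import json

# Hypothetical core name; adjust to the real Solr setup.
SOLR_UPDATE = "http://localhost:8983/solr/site/update"

def build_update_request(doc, commit_within_ms=1000):
    """Build the URL and JSON body for an update that Solr should make
    searchable within commit_within_ms milliseconds (the standard
    commitWithin parameter) -- no external recrawl involved."""
    url = "%s?commitWithin=%d" % (SOLR_UPDATE, commit_within_ms)
    body = json.dumps([doc])
    return url, body

url, body = build_update_request(
    {"id": "page-42", "title": "Fresh article", "text": "published just now"})
# POST `body` to `url` with Content-Type: application/json to index it.
```
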


Friendly

Jan-Eirik

On Fri, Nov 13, 2009 at 10:52 AM, Lukáš Vlček lukas.vl...@gmail.com wrote:

 Hi,

 I am looking for good arguments to justify implementing search for sites
 that are available on the public internet. There are many sites in the
 "Powered by Solr" section that are indexed by Google and other search
 engines, yet they still decided to invest resources into building and
 maintaining their own search functionality rather than going with a
 [user_query site:my_site.com] Google search. Why?

 By no means am I saying it makes no sense to implement Solr! But I want to
 put together a list of reasons, possibly with examples. Your help would be
 much appreciated!

 Let's narrow the scope of this discussion to the following:
 - the search should cover several community sites running open source CMSs,
 JIRAs, Bugzillas and the like
 - all documents use open formats (no need to parse Word or Excel)
 (maybe something close to what LucidImagination does for the mailing lists
 of Lucene and Solr)

 My initial kick-off list would be:

 pros:
 - since we understand the content (we understand the domain scope), we can
 fine-tune the search engine to provide more accurate results
 - Solr can give us facets
 - we have user search logs (valuable for analysis)
 - implementing Solr is fun

 cons:
 - requires resources (though the cost is relatively low, depending on query
 traffic, index size and update frequency)
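To illustrate the facets point: a faceted query is just a couple of extra request parameters. A sketch (the field names `site` and `author` are made up; `facet` and `facet.field` are the standard Solr parameter names):

```python
from urllib.parse import urlencode

def facet_query(user_query, facet_fields):
    """Build the query string for a faceted Solr search request."""
    params = [("q", user_query), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)

qs = facet_query("lucene scoring", ["site", "author"])
# Append to http://host:8983/solr/<core>/select? to run it.
```
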

 Regards,
 Lukas

 http://blog.lukas-vlcek.com/




-- 
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy


Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Markus Jelsma - Buyways B.V.
Next to the faceting engine:
- MoreLikeThis
- Highlighting
- Spellchecker

But also more flexible querying using the DisMax handler, which is
clearly superior. Solr can also be used to store data that can be
retrieved in an instant! We have used this technique on a site, and it is
obviously much faster than multiple large and complex SQL statements.
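A sketch of a DisMax request with highlighting turned on. The field names and boosts below are hypothetical; the parameter names (defType, qf, hl, hl.fl) are standard Solr:

```python
from urllib.parse import urlencode

def dismax_query(user_input):
    """Build the query string for a DisMax search with highlighting."""
    params = {
        "q": user_input,
        "defType": "dismax",
        "qf": "title^2.0 body",   # search both fields, boost title matches
        "hl": "true",
        "hl.fl": "body",          # return highlighted snippets from body
    }
    return urlencode(params)

qs = dismax_query("solr faceting")
```
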




Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Chantal Ackermann



Jan-Eirik B. Nævdal wrote:

Some extras for the pros list:

- Full control over which content is searchable and which is not.
- Possibility to make pages searchable almost instantly after publication.
- Control over when the site is indexed.


+1, especially the last point.
You can also add a robots.txt and prohibit spidering of the site to
reduce traffic. Google won't index any highly dynamic content then.
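For instance, a robots.txt that keeps crawlers away from the dynamic pages while the in-house search still indexes them might look like this (the /dynamic/ path is made up; adjust to the actual site layout):

```
User-agent: *
Disallow: /dynamic/
```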







Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Andrew Clegg


Lukáš Vlček wrote:
 
 I am looking for good arguments to justify implementing search for sites
 that are available on the public internet. There are many sites in the
 "Powered by Solr" section that are indexed by Google and other search
 engines, yet they still decided to invest resources into building and
 maintaining their own search functionality rather than going with a
 [user_query site:my_site.com] Google search. Why?
 

You're assuming that Solr is just used in these cases to index discrete web
pages which Google etc. would be able to reach by following navigation
links.

I would imagine that in a lot of cases, Solr is used to index database
entities which are used to build [parts of] pages dynamically, and which
might be viewable in different forms in various different pages.

Plus, with stored fields, you have the option of actually driving a website
off Solr instead of directly off a database, which might make sense from a
speed perspective in some cases.

And further, going back to page-only indexing -- you have no guarantee when
Google will decide to recrawl your site, so there may be a delay before
changes show up in their index. With an in-house search engine you can
reindex as often as you like.
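Andrew's stored-fields point in miniature: the select response already contains everything needed to render a page, with no SQL round-trip. The documents below are made up; the response shape follows Solr's standard JSON response writer (wt=json):

```python
import json

# A made-up /select response in the shape produced by Solr's
# standard JSON response writer.
raw = json.dumps({
    "responseHeader": {"status": 0},
    "response": {"numFound": 1, "start": 0, "docs": [
        {"id": "page-7", "title": "About us", "body": "Stored in Solr"},
    ]},
})

def docs_from_response(raw_json):
    """Extract the stored documents from a select response."""
    return json.loads(raw_json)["response"]["docs"]

page = docs_from_response(raw)[0]
# page["title"] and page["body"] can be rendered directly.
```
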

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Lukáš Vlček
Hi,

thanks for the inputs so far... however, let's put it this way:

When you need to search for something Lucene- or Solr-related, which one do
you use:
- generic Google
- go to a particular mailing list web site and search from there (if there is
any search form at all)
- go to LucidImagination.com and use its search capability

Regards,
Lukas




Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Andrew Clegg


Lukáš Vlček wrote:
 
 When you need to search for something Lucene- or Solr-related, which one do
 you use:
 - generic Google
 - go to a particular mailing list web site and search from there (if there is
 any search form at all)
 

Both of these (Nabble in the second case) in case any recent posts have
appeared which Google hasn't picked up.

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Arguments for Solr implementation at public web site

2009-11-13 Thread Jon Baer
For this list I usually end up @ http://solr.markmail.org (which I believe also 
uses Lucene under the hood)

Google is such a black box ... 

Pros:
+ 1 Open Source (enough said :-)

There also always seems to be a notion that crawling by itself produces the 
best results, but that is rarely the case.  And unless you are a special 
type of site, Google will not overlay your results with some type of 
context in the search (i.e. news or sports, etc).  

What I think really needs to happen in Solr (and is a bit missing @ the moment) 
is a common interface for reindexing another index (if that makes sense) ... 
something akin to OpenSearch 
(http://www.opensearch.org/Community/OpenSearch_software)

For example, what I would like to do is have my site, have my search index, and 
point Google at indexing just my search index (and not crawling the site) ... 
the only current option for something like that is sitemaps, which I think Solr 
(templates) should have a contrib project for (but you would have to generate 
these offline for sure).
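Jon's sitemap idea could be sketched like this: query the index for all public page URLs (offline) and write a sitemap.xml for Google to consume. The example.com URL scheme is made up; the XML shape follows the sitemaps.org protocol:

```python
from xml.sax.saxutils import escape

def sitemap_xml(urls):
    """Render a minimal sitemap.xml (sitemaps.org protocol) from a list
    of page URLs, e.g. ones pulled from the Solr index offline."""
    entries = "\n".join(
        "  <url><loc>%s</loc></url>" % escape(u) for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "\n</urlset>")

xml = sitemap_xml(["http://example.com/doc/1", "http://example.com/doc/2"])
```
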

- Jon  
