Re: Solr for enterprise search

Mike Klaas Fri, 23 Nov 2007 17:11:29 -0800


On 23-Nov-07, at 7:28 AM, David Thibault wrote:

Hello all,
I'm new to Solr. From what little I have seen, Solr has made great
strides in open source search, but is lacking some significant features
that would really allow it to become a viable alternative to things like
FAST and Autonomy for enterprise search. I am sure these issues have
been discussed on the list before, but I would like to help push these
issues forward if I can:

It sounds to me like you are describing an application that can be built with Solr rather than what Solr aims to provide. That said, I see no reason that there couldn't exist some add-on modules providing this functionality.

1) Crawling--ShareHound does Windows shares, but it ignores
document-level permissions.  A modular approach to crawling file
systems, websites, intranet sites, etc, would be huge. Also, I realize
Nutch has a crawler but Solr looks much more full-featured in terms of
things like faceted search, etc, so I'd rather help push Solr forward.

It seems to me that every domain would require a different schema and have different requirements. I'm not sure that the solution to this problem belongs in Solr.

2) ACLs and document-level security--The lack of doc-level security is
a real deal-breaker in terms of indexing enterprise fileshares.  I could
envision this type of functionality to be embedded in the various
crawlers above, on an OS-dependent or web app-dependent basis. For
example, when indexing a file from a share, the ACL should be indexed as
well, that way a results list can be brought back and the permissions
would not need to be re-checked against the original file server. Also,
this implies that ACL changes need to be monitored and updated as well
as file content changes.

Again, I don't see this as within the purview of Solr. Solr provides lots of functionality to help implement access control (namely, rich filtering and faceting support), and may provide more once updateable documents are implemented. However, it has no concept of users, files, permissions, monitoring OS-level changes, etc. Growing such awareness seems somewhat outside what Solr should provide.
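
To make the filtering approach concrete, here is a minimal sketch of how an application sitting in front of Solr might restrict results by ACL. It assumes a hypothetical multi-valued "acl" field that the crawler fills in at index time with the principals allowed to read each document, and it assumes the application already knows which principals the current user holds. The "q" and "fq" request parameters are standard Solr; the field name, the principal strings, and the helper functions are made up for illustration.

```python
from urllib.parse import urlencode


def escape_term(term):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def acl_filter(principals, field="acl"):
    """Build a filter query matching documents readable by any principal.

    'field' is a hypothetical multi-valued field holding the indexed ACL.
    """
    if not principals:
        # Refuse to build a filter that could accidentally match everything.
        raise ValueError("at least one principal is required")
    terms = " OR ".join(escape_term(p) for p in sorted(principals))
    return f"{field}:({terms})"


def select_params(user_query, principals):
    """URL-encode the user's query plus the ACL restriction as an fq."""
    return urlencode({"q": user_query, "fq": acl_filter(principals)})
```

For a user holding the principals "user:dthibault" and "group:staff", acl_filter returns a filter that matches any document whose acl field contains either value. Keeping the restriction in fq rather than in q means it does not affect relevance scoring. Keeping the indexed ACLs in sync with the file server remains the application's job.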

There are other differences, obviously, between the leading commercial
products and Solr, but those two features alone would make a huge
difference in the power of Solr, in my opinion. I have little Java
experience, but I could easily prototype this functionality in other
languages and work with others to integrate them into the code base in
Java. Also, I headed up an enterprise search request for information for
a large pharmaceutical company in the past, so I am familiar with the
feature sets of FAST and Autonomy, and I could help manage the project
in terms of competing feature sets.

Again, this feels more like an application to me. I could see someone putting together a solution to these problems in one package, perhaps by distributing a separate webapp along with Solr, complete with a pre-defined schema, a nicer admin console, and automatic crawling/indexing tools. In fact, I suspect such a product would be very cool and garner lots of attention. I don't see Solr becoming that product, though. Besides being outside the scope of the project, I think there might be a lack of interest among the core devs to develop and maintain that direction. Mightn't it be better to start a separate project, where a different set of people (with different priorities and interests) could have full control?

This situation is analogous to Solr and Lucene: they are tightly integrated, and several people contribute to both, but they are different layers, and can proceed somewhat independently. And that is a Good Thing.


-Mike
