On 23-Nov-07, at 7:28 AM, David Thibault wrote:

Hello all,

I'm new to Solr. From what little I have seen, Solr has made great strides in open source search, but is lacking some significant features that would
really allow it to become a viable alternative to things like FAST and
Autonomy for enterprise search. I am sure these issues have been discussed on the list before, but I would like to help push these issues forward if I
can:

It sounds to me like you are describing an application that can be built with Solr rather than what Solr aims to provide. That said, I see no reason that there couldn't exist some add-on modules providing this functionality.

1) Crawling--ShareHound does Windows shares, but it ignores document-level
permissions.  A modular approach to crawling file systems, websites,
intranet sites, etc, would be huge. Also, I realize Nutch has a crawler but Solr looks much more full-featured in terms of things like faceted search,
etc, so I'd rather help push Solr forward.

It seems to me that every domain would require a different schema and have different requirements. I'm not sure that the solution to this problem belongs in Solr.

2) ACLs and document-level security--The lack of doc-level security is a
real deal-breaker in terms of indexing enterprise fileshares.  I could
envision this type of functionality to be embedded in the various crawlers above, on an OS-dependent or web app-dependent basis. For example, when indexing a file from a share, the ACL should be indexed as well, that way a results list can be brought back and the permissions would not need to be re-checked against the original file server. Also, this implies that ACL changes need to be monitored and updated as well as file content changes.

Again, I don't see this as within the purview of Solr. Solr provides lots of functionality to help implement access control (namely, rich filtering and faceting support), and may provide more once updateable documents are implemented. However, it has no concept of users, files, or permissions, and it does not monitor OS-level changes. Growing such awareness seems somewhat outside what Solr should provide.
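To make the filtering point concrete, here is a minimal sketch of how an application layered on top of Solr might handle this. It assumes a hypothetical multi-valued string field named "acl" in the schema, and that the application (not Solr) resolves a user to a list of principals such as a user name and group names. The helper names are made up for illustration; only the "q" and "fq" request parameters are Solr's own.

```python
from urllib.parse import urlencode

# Hypothetical field name; it must exist in your schema as a
# multi-valued, non-tokenized string field.
ACL_FIELD = "acl"


def make_document(doc_id, text, allowed_principals):
    """Build an index document that carries its ACL next to its content."""
    return {
        "id": doc_id,
        "text": text,
        ACL_FIELD: sorted(set(allowed_principals)),
    }


def quote_term(value):
    """Quote a value as a phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_acl_filter(principals):
    """Return a filter query matching documents readable by any principal."""
    if not principals:
        # Refuse to build a filter that could silently match everything.
        raise ValueError("at least one principal is required")
    terms = " OR ".join(quote_term(p) for p in sorted(set(principals)))
    return f"{ACL_FIELD}:({terms})"


def build_select_params(user_query, principals):
    """URL-encode the user's query plus the ACL restriction as a filter."""
    return urlencode({"q": user_query, "fq": build_acl_filter(principals)})


if __name__ == "__main__":
    print(make_document("share/report.doc", "quarterly numbers", ["staff", "alice"]))
    print(build_acl_filter(["alice", "staff"]))
    print(build_select_params("quarterly", ["alice", "staff"]))
```

Since the ACL lives in the index, a permission change on the file server means re-indexing the affected document, which is the monitoring job that would have to belong to the crawler or application rather than to Solr.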

There are other differences, obviously, between the leading commercial
products and Solr, but those two features alone would make a huge difference in the power of Solr, in my opinion. I have little Java experience, but I could easily prototype this functionality in other languages and work with others to integrate them into the code base in Java. Also, I headed up an enterprise search request for information for a large pharmaceutical company in the past, so I am familiar with the feature sets of FAST and Autonomy, and I could help manage the project in terms of competing feature sets.

Again, this feels more like an application to me. I could see someone putting together a solution to these problems in one package, perhaps by distributing a separate webapp along with Solr, complete with a pre-defined schema, a nicer admin console, and automatic crawling/indexing tools. In fact, I suspect such a product would be very cool and garner lots of attention. I don't see Solr becoming that product, though. Besides being outside the scope of the project, I think there might be a lack of interest among the core devs to develop and maintain that direction. Mightn't it be better to start a separate project, where a different set of people (with different priorities and interests) could have full control?

This situation is analogous to the Solr/Lucene relationship: they are tightly integrated, and several people contribute to both, but they are different layers, and can proceed somewhat independently. And that is a Good Thing.

-Mike
