Some thoughts: One of the most powerful and useful concepts that many of the other engines (well, the good ones) use is the notion of processing pipelines.
For queries, this means a series of stages that do things such as:
* faceting
* collapsing
* applying default values
* spell checking
* adding in promotions/boosted content
* applying relevancy logic
* more like this

But pipelines are also heavily used at indexing time. The more complex engines use them for all kinds of crazy stuff like converting MS Office docs, OCR, speech to text, etc., which I think is what Nutch does to some extent. However, Solr could still use the same notion for lower-level operations like:
* applying synonyms
* removing/renaming fields
* translating XML formats (it would be nice to have any update handler be able to apply an XSLT to incoming data)
* validating incoming data against some business logic

I think much of this is wrapped up in the field definitions at the moment, but it could be extended to be more document-aware. Anything that makes chaining of pre-built processing easier would be nice.

In addition, if these stages are specified in solrconfig, then decisions like 'do I want faceting before or after collapsing' become simple cut/paste choices, not code changes. Further, if the last processing step is 'index this doc' or 'search the index', those should be easy to replace with 'send this doc to segment x' or 'search all the sub-indexes' with simple XML config file changes, assuming those stages exist (which, again, is how many of the other engines do things).

- will

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Sunday, June 10, 2007 12:51 PM
To: solr-dev@lucene.apache.org
Subject: search components (plugins)

Some people have needed some custom query logic, and they had to implement their own request handlers. They still wanted all of the other functionality (or almost all), so they are forced to copy the standard request handler or dismax, or both. That's not the easiest to maintain, and could be more elegant.

Another layer of plugins sounded like overkill at first, but I'm starting to rethink it, esp. in the face of the expanding number of different variations:
- standard
- dismax
- more-like-this
- field collapsing

Seems like we should be able to more easily mix and match, or add new pieces, w/o having whole new request handlers.

Looking toward the future, and distributed search, this might be a natural place to add hooks to implement that distributed logic. This would allow other people to efficiently support their custom functionality in a distributed environment.

Thoughts?
-Yonik
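
To make the "faceting before or after collapsing" point and the mix-and-match idea concrete, here is a minimal Java sketch of a named, ordered chain of query stages. Everything in it is hypothetical and for illustration only: `Stage`, `QueryContext`, `REGISTRY`, the "facet"/"collapse" stage names, and the rule that a doc id like "a-1" belongs to group "a" are assumptions, not Solr's API and not code from either message. The list of stage names stands in for what a solrconfig entry might declare.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a configurable processing pipeline.
 * None of these names are Solr APIs; they only illustrate the idea.
 */
public class PipelineSketch {

    /** Hypothetical per-request state shared by all stages. */
    static final class QueryContext {
        List<String> docs;                                          // current result list (doc ids)
        final Map<String, Integer> facets = new LinkedHashMap<>();  // group -> doc count

        QueryContext(List<String> docs) {
            this.docs = new ArrayList<>(docs);
        }
    }

    /** One pluggable step: faceting, collapsing, spell checking, etc. */
    interface Stage {
        void process(QueryContext ctx);
    }

    /** Assumed grouping rule: the group of "a-1" is the part before the dash. */
    static String group(String docId) {
        int dash = docId.indexOf('-');
        return dash < 0 ? docId : docId.substring(0, dash);
    }

    /** Collapsing: keep only the first document from each group. */
    static final Stage COLLAPSE = ctx -> {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String id : ctx.docs) {
            if (seen.add(group(id))) kept.add(id);
        }
        ctx.docs = kept;
    };

    /** Faceting: count the documents currently in the result list, per group. */
    static final Stage FACET = ctx -> {
        ctx.facets.clear();
        for (String id : ctx.docs) ctx.facets.merge(group(id), 1, Integer::sum);
    };

    /** Named stages, standing in for plugins declared in a config file. */
    static final Map<String, Stage> REGISTRY = Map.of("collapse", COLLAPSE, "facet", FACET);

    /** Runs the stages in exactly the order the (hypothetical) config lists them. */
    static QueryContext run(List<String> stageNames, List<String> docs) {
        QueryContext ctx = new QueryContext(docs);
        for (String name : stageNames) {
            Stage stage = REGISTRY.get(name);
            if (stage == null) throw new IllegalArgumentException("unknown stage: " + name);
            stage.process(ctx);
        }
        return ctx;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a-1", "a-2", "b-1");
        // Same stages, different order: a config edit, not a code change.
        System.out.println(run(List.of("facet", "collapse"), docs).facets); // {a=2, b=1}
        System.out.println(run(List.of("collapse", "facet"), docs).facets); // {a=1, b=1}
    }
}
```

Reordering the two names changes the facet counts without touching any stage code, which is the cut/paste choice described in the reply. Under the same assumptions, a distributed variant would register a different final stage under a name the config can point at, rather than requiring a new request handler.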