On 6/7/07, Rafael Rossini <[EMAIL PROTECTED]> wrote:
Hi, Jeff and Mike. Would you mind telling us a little about the architecture of your solutions? Mike, you said that you implemented a highly distributed search engine using Solr as indexing nodes. What does that mean? Did you implement a master/multi-slave setup for replication? Or did you shard the whole index for high availability and failover?
Our solution doesn't use Solr; it goes directly to Lucene. It's built on Windows, so the interop communication service is built on .NET Remoting (TCP-based). Microsoft has deprecated ongoing development of .NET Remoting in favor of more standard mechanisms, e.g. HTTP. So we're looking to migrate our solution to a more community-supported model.

The underlying structure sounds similar to what others have done: index shards are distributed across several servers, each responsible for a subset of the index. A merging server coordinates the concurrent requests sent to the shards and merges the results as they're returned.

The thread coordination and result-interleaving process works, but it doesn't scale well. That's acceptable for our user model, because our users rarely page deeply through results. We want to remove that limitation so we can use Solr as the primary mechanism for reading data on our site.

-- j
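The merging server isn't shown in the thread, so here is a minimal Python sketch of the general scatter-gather pattern described above. It assumes toy in-memory shards and raw term-frequency scoring; `search_shard`, `federated_search`, and `rank_key` are hypothetical names, not Lucene, Solr, or .NET Remoting APIs. The sketch also shows why deep paging is the weak spot: a shard can't know which of its hits will land on the requested global page, so each shard must return its top offset + limit hits, and the merger discards all but `limit` of them.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical toy model: a shard maps doc_id -> document text,
# and a hit is (score, doc_id) where a higher score ranks first.
Shard = Dict[str, str]
Hit = Tuple[int, str]


def rank_key(hit: Hit) -> Tuple[int, str]:
    # Score descending, then doc_id ascending as a deterministic tie-break.
    return (-hit[0], hit[1])


def search_shard(shard: Shard, term: str, depth: int) -> List[Hit]:
    """Score docs by raw term frequency and return this shard's top `depth` hits."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard.items()]
    hits = [h for h in hits if h[0] > 0]
    hits.sort(key=rank_key)
    return hits[:depth]


def federated_search(shards: List[Shard], term: str, offset: int, limit: int):
    """Scatter the query to every shard concurrently, then merge one page.

    Returns (page, fetched), where `fetched` counts the hits the merger
    had to pull from the shards to build that page.
    """
    # Each shard must return its top offset + limit hits, because any of
    # them could belong on the requested global page.
    depth = offset + limit
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(lambda s: search_shard(s, term, depth), shards))
    # Each per-shard list is already sorted by rank_key, so a k-way merge
    # yields hits in global rank order without re-sorting everything.
    merged = heapq.merge(*per_shard, key=rank_key)
    page = list(itertools.islice(merged, offset, offset + limit))
    fetched = sum(len(hits) for hits in per_shard)
    return page, fetched


if __name__ == "__main__":
    shards = [
        {"a": "solr solr lucene", "b": "lucene"},
        {"c": "solr", "d": "solr solr solr"},
        {"e": "net remoting solr"},
    ]
    print(federated_search(shards, "solr", offset=0, limit=2))
    # → ([(3, 'd'), (2, 'a')], 4)
```

The `fetched` count is the interesting part: with three shards of 50 matching documents each, the first page of 10 pulls 30 hits, but the page at offset 40 pulls 150 hits to show the same 10. The cost grows linearly with page depth, which fits the observation that this kind of interleaving holds up only while users don't page deeply.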