On 6/7/07, Rafael Rossini <[EMAIL PROTECTED]> wrote:

Hi, Jeff and Mike.

   Would you mind telling us about the architecture of your solutions a
little bit? Mike, you said that you implemented a highly-distributed
search
engine using Solr as indexing nodes. What does that mean? You guys
implemented a master, multi-slave solution for replication? Or the whole
index shards for high availability and fail over?


Our solution doesn't use solr, but goes directly to lucene.  It's built on
windows, so the interop communication service is built on .net remoting (tcp
based).  Microsoft has deprecated ongoing development with .net remoting, in
favor of other more standard mechanisms, i.e. http.  So, we're looking to
migrate our solution to a more community-supported model.

The underlying structure sounds similar to what others have done: index
shards distributed to various servers, each responsible for a subset of the
index.  A merging server handles coordination of concurrent thread requests
and synchronizes the results as they're returned.  The thread coordination
and search results interleaving process is functional but not really
scalable.  It works for our user model, where users tend not to page deeply
through results.  We want to change that so we can use solr as our primary
data source read mechanism for our site.

-- j

Reply via email to