On 9/21/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 21-Sep-07, at 11:08 AM, Yonik Seeley wrote:
>
> > I wanted to take a step back for a second and think about if HTTP was
> > really the right choice for the transport for distributed search.
> >
> > I think the high-level approach in SOLR-303 is the right way to go
> > about it, but I'm unsure if HTTP is the right transport.
>
> I don't know anything about RMI, but is it possible to do 100's of
> simultaneous asynchronous requests cheaply?

Good question... probably only important for really big clusters (like yours), but it would be nice.

Even if we go HTTP, I'm not sure it will be async at first - does HTTPClient even support async?

I assume when you say async that you mean getting rid of the thread-per-connection via NIO. Some protocols do "async" by handing off the request to another thread to wait on the response and then do a callback to the original thread - this is async with respect to the original calling thread, but still requires a thread-per-connection.

Of course HTTP has some issues too - you effectively need a separate connection per outstanding request. Pipelining won't work well because things need to come back in-order. I'm not sure if RMI has this limitation as well.

> FWIW, our distributed search uses http over 120+ shards... and is
> written in python.

That would be an awesome test case if you were able to use what Solr is going to provide out-of-the-box. Any unusual requirements?

-Yonik
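
To make the hand-off flavor of "async" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the shard count, `blocking_shard_query`, and `scatter_gather` are hypothetical names, and a barrier stands in for a blocking network read. It is not how Solr or SOLR-303 does it. The caller gets futures back immediately and results arrive via callbacks, yet every outstanding request still parks its own worker thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 8  # hypothetical shard count, kept small for the demo


def blocking_shard_query(shard_id, barrier):
    # Stand-in for a blocking network call (assumption: no real I/O here).
    # The worker thread is parked until every other request is in flight,
    # which forces all requests to be outstanding at the same time.
    barrier.wait(timeout=5)
    return shard_id, threading.current_thread().name


def scatter_gather(num_shards):
    results = []  # list.append is thread-safe in CPython
    barrier = threading.Barrier(num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # submit() returns at once, so the caller is not blocked here.
        futures = [
            pool.submit(blocking_shard_query, shard, barrier)
            for shard in range(num_shards)
        ]
        # The callback fires when each response is ready.
        for future in futures:
            future.add_done_callback(lambda f: results.append(f.result()))
    # Leaving the with-block waits for all workers to finish.
    return results


if __name__ == "__main__":
    responses = scatter_gather(NUM_SHARDS)
    threads_used = {thread_name for _, thread_name in responses}
    print(len(responses), "responses handled by", len(threads_used), "threads")
```

The number of distinct worker threads equals the number of outstanding requests, which is the cost the NIO approach is meant to avoid.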