RE: HTTP or RMI, Jini, JavaSpaces for distributed search

Peuss, Thomas Sat, 22 Sep 2007 02:34:40 -0700

Hello!

What about JMX Remoting (JSR-160)? See
http://mx4j.sourceforge.net/docs/ch03.html for an implementation with
a compatible license. It is used by Apache Geronimo as well.


The advantage from my perspective: all the communication hassle is
handled behind the scenes with configurable transports (RMI, HTTP, ...)
and it's a standard. It even supports callbacks from the server to the
registered client! This is great for asynchronous request handling.

Disadvantage: it adds a dependency on an extra lib.
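
To make the callback idea concrete, here is a minimal sketch using only the
standard javax.management API. It stays in-process for brevity; with JSR-160
the same addNotificationListener call would be made on an
MBeanServerConnection obtained from a JMXConnector. The ShardSearcher MBean,
the "sketch.search.done" notification type and the object name are all
hypothetical, not anything that exists in Solr.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;
import javax.management.ObjectName;

public class JmxCallbackSketch {

    /** Hypothetical management interface; the "MBean" suffix is required for a Standard MBean. */
    public interface ShardSearcherMBean {
        void search(String query);
    }

    /** Hypothetical shard-side MBean that notifies listeners when a search completes. */
    public static class ShardSearcher extends NotificationBroadcasterSupport
            implements ShardSearcherMBean {
        private long sequence = 0;

        @Override
        public synchronized void search(String query) {
            // A real searcher would run the query here; the sketch only signals completion.
            sendNotification(new Notification(
                    "sketch.search.done", this, ++sequence, "results ready for: " + query));
        }
    }

    /** Registers the MBean, subscribes a listener, invokes search, and returns the messages received. */
    public static List<String> runDemo(String query) {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            ObjectName name = new ObjectName("sketch:type=ShardSearcher");
            server.registerMBean(new ShardSearcher(), name);

            List<String> received = new ArrayList<>();
            NotificationListener listener =
                    (notification, handback) -> received.add(notification.getMessage());
            server.addNotificationListener(name, listener, null, null);

            // The client side only knows the object name and the operation signature.
            server.invoke(name, "search",
                    new Object[] {query}, new String[] {String.class.getName()});
            return received;
        } catch (Exception e) {
            throw new IllegalStateException("JMX sketch failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo("solr"));
    }
}
```

In this in-process form the listener is called before invoke returns; over a
remote connector, notifications are delivered asynchronously, so a client
would have to wait for them.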

About Jini/JavaSpaces: it would be interesting to see an implementation
based on them. ;-)

Just my .02
Thomas

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Friday, September 21, 2007 8:08 PM
To: solr-dev@lucene.apache.org
Subject: HTTP or RMI, Jini, JavaSpaces for distributed search

I wanted to take a step back for a second and think about whether HTTP
is really the right choice of transport for distributed search.

I think the high-level approach in SOLR-303 is the right way to go about
it, but I'm unsure if HTTP is the right transport.

Pro HTTP:
  - using HTTP allows one to use an http load-balancer to distribute
load across multiple copies of the same shard by assigning a VIP
(virtual IP) to each shard.
  - because you do pretty much everything by hand, you know that there
isn't some hidden limitation that will jump out and bite you later.

Con HTTP:
  - you end up doing everything by hand... connection handling, request
serialization, response parsing, etc...
  - goes through normal servlet channels... every sub-request will be
logged to the access logs, slowing things down.
  - more network bandwidth used unless we come up with a new
BinaryResponseWriter and Parser

Currently, SOLR-303 uses and parses the XML response format, which has
some serious downsides:
  - response size limits scalability and how deep into the results you
can go...
If you want to retrieve documents 5000 through 5009, even though the
user only requested 10 documents, the top-level searcher needs to get
the top 5009 documents from *each* shard... and that can quickly exhaust
the network bandwidth of the NIC.  XML parsing on the order of
nShards*5009 entries won't be any picnic either.
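
The deep-paging cost described above can be sketched as follows. This is an
illustration only, not the SOLR-303 code: the Hit class and both method names
are hypothetical, and the offset is assumed to be 0-based, so the per-shard
count comes out as start + rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShardMergeSketch {

    /** Hypothetical per-shard hit: a document id plus its relevance score. */
    public static final class Hit {
        public final String id;
        public final double score;

        public Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /**
     * Number of top hits every shard has to return so that the merged
     * window [start, start + rows) is correct (start is 0-based here).
     */
    public static int hitsNeededPerShard(int start, int rows) {
        return start + rows;
    }

    /** Merges the per-shard hit lists by descending score and cuts out the requested window. */
    public static List<Hit> mergeWindow(List<List<Hit>> shardResults, int start, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            all.addAll(shard);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int from = Math.min(start, all.size());
        int to = Math.min(start + rows, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        int perShard = hitsNeededPerShard(5000, 10);
        int shards = 8; // assumed shard count, for illustration only
        System.out.println("hits per shard: " + perShard);
        System.out.println("entries to transfer and parse: " + (long) shards * perShard);
    }
}
```

The merger keeps only rows entries, but it has to receive and parse
nShards * (start + rows) of them first, which is why the size of each entry
on the wire matters so much.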

I also think HTTP load-balancing is overrated, because it's
inflexible.  Adding another shard requires adding another VIP in the
load-balancer, and changing which servers have which shards or adding
new copies of a shard also requires load-balancer configuration.
Everything points to Solr being able to do the load-balancing itself in
the future, and there wouldn't seem to be much benefit to using a
load-balancer with a VIP for each shard vs. having Solr do it.

So even if we stuck with HTTP, Solr would need
 - a binary protocol to minimize network bandwidth use
 - its own load balancing across shard copies
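
As a rough idea of what Solr-side balancing could look like, here is a
minimal round-robin sketch. It is an assumption for illustration, not an
existing Solr class: the class name, the shard names and the replica
addresses are all made up, and a real version would also need health checks
and failover.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ShardBalancerSketch {

    // shard name -> addresses of the servers holding a copy of that shard
    private final Map<String, List<String>> replicasByShard;
    // one round-robin counter per shard
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    public ShardBalancerSketch(Map<String, List<String>> replicasByShard) {
        this.replicasByShard = replicasByShard;
    }

    /** Returns the next copy of the given shard in round-robin order. */
    public String nextReplica(String shard) {
        List<String> replicas = replicasByShard.get(shard);
        if (replicas == null || replicas.isEmpty()) {
            throw new IllegalArgumentException("no copies known for shard: " + shard);
        }
        int n = counters.computeIfAbsent(shard, s -> new AtomicInteger()).getAndIncrement();
        // floorMod keeps the index valid even after the counter overflows
        return replicas.get(Math.floorMod(n, replicas.size()));
    }
}
```

Adding a shard or another copy of a shard then means updating this map
rather than reconfiguring a VIP in an external load-balancer.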

Given that, would it make sense to just go with RMI instead?
And perhaps leverage some other higher-level services (Jini?
JavaSpaces?)

I'd like to hear from people with more experience with RMI & friends,
and what the potential downsides are to using these technologies.

-Yonik
