Christopher Dolan wrote:
I've seen v2.1 Reggie scaled up to 4000 registered services and 2000
event listeners on a LAN, and I've seen Reggie working across WAN
systems about half that size.  In the steady state in healthy systems,
this works great.  A couple of problems I've seen:


 1) When Reggie starts (or restarts), it gets hammered with new requests
as services and clients find it via lookup locator polling, and chaos
ensues.  I've seen Reggie go non-linear and hang when it exceeds N
threads all waiting for RegistrarImpl.concurrentObj, where N is
something like 1000.  In that regime, stack memory per thread matters a
lot too.

 2) The implementation of RegistrarProxy.lookup(ServiceTemplate,int)
insists on unmarshalling all matches serially, which hangs for a long
time if any services have bad codebase jars (denial of service). This
weakest-link problem is exacerbated by WAN latencies.  I have an
experimental patch that unmarshals in parallel, but it requires
cooperation of both Reggie and clients (i.e. changes to both
RegistrarProxy and ServiceDiscoveryManager).
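A rough sketch of the parallel idea in plain Java. This is not the patch mentioned above, and it does not touch RegistrarProxy or ServiceDiscoveryManager. The hypothetical helper `ParallelUnmarshal.unmarshalAll` models each match as a `Callable` that does the (possibly slow) unmarshalling. It runs them all on a small pool via `ExecutorService.invokeAll` with one shared deadline and drops matches that fail or time out, so a bad codebase jar costs only its own slot instead of stalling the whole lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

/**
 * Hypothetical sketch: unmarshal lookup matches in parallel so one slow or
 * broken codebase cannot stall the others. Each "match" is modelled as a
 * Callable that performs the unmarshalling; real Jini types are not used.
 */
public class ParallelUnmarshal {

    /** Runs every task concurrently and keeps only those that succeed
     *  before the shared deadline; failures and timeouts are dropped. */
    public static <T> List<T> unmarshalAll(List<Callable<T>> tasks,
                                           long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Math.min(tasks.size(), 16)));
        try {
            // invokeAll waits up to the deadline, then cancels unfinished tasks.
            // The returned futures are in the same order as the input tasks.
            List<Future<T>> futures = pool.invokeAll(tasks, timeout, unit);
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                if (f.isCancelled()) continue; // timed out: skip this match
                try {
                    results.add(f.get()); // already complete, does not block
                } catch (ExecutionException e) {
                    // e.g. ClassNotFoundException from a bad codebase: skip it
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "serviceA");
        tasks.add(() -> { Thread.sleep(10_000); return "slowService"; }); // hung codebase
        tasks.add(() -> { throw new ClassNotFoundException("bad.jar"); }); // broken codebase
        tasks.add(() -> "serviceB");
        List<String> ok = unmarshalAll(tasks, 500, TimeUnit.MILLISECONDS);
        System.out.println(ok); // [serviceA, serviceB]
    }
}
```

One caveat on the design: `invokeAll` cancels late tasks by interrupting them, but a class-loading call blocked in socket I/O may ignore the interrupt, so a pool thread can stay busy after the lookup has already returned.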
Chris, can you upload your experimental patch?

This looks like a problem that needs addressing.

Cheers,

Peter.

The latter problem is solvable, but the former is hard.  Reggie is
really sensitive to performance bottlenecks at startup.  Rewriting
ReadersWriter to avoid using notifyAll() might help, perhaps borrowing
CAS tricks from ReentrantReadWriteLock.
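A minimal sketch of that direction, assuming a ReadersWriter-style API with readLock/readUnlock/writeLock/writeUnlock methods. Those names are an assumption here, not checked against Reggie's class. The hypothetical `LockBackedReadersWriter` simply delegates to `java.util.concurrent.locks.ReentrantReadWriteLock`. Its AbstractQueuedSynchronizer base updates lock state with CAS and unparks queued waiters in turn rather than waking every blocked thread at once, as notifyAll() does. This would not cap the number of waiting threads; it only avoids waking all of them on every release.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical replacement for a notifyAll()-based readers/writer lock.
 * Method names mirror an assumed readLock/readUnlock/writeLock/writeUnlock
 * API; they are illustrative, not Reggie's actual interface.
 */
public class LockBackedReadersWriter {
    // Non-fair mode (the default) favours throughput; passing true instead
    // grants the lock roughly in arrival order.
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    public void readLock()    { rw.readLock().lock(); }
    public void readUnlock()  { rw.readLock().unlock(); }
    public void writeLock()   { rw.writeLock().lock(); }
    public void writeUnlock() { rw.writeLock().unlock(); }

    public static void main(String[] args) throws InterruptedException {
        LockBackedReadersWriter lock = new LockBackedReadersWriter();
        int[] counter = {0};
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    lock.writeLock(); // exclusive: increments never interleave
                    try { counter[0]++; } finally { lock.writeUnlock(); }
                    lock.readLock();  // shared: many readers may hold it at once
                    try { int seen = counter[0]; } finally { lock.readUnlock(); }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter[0]); // 8000
    }
}
```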

Chris

