On Jun 26, 2012, at 11:39 AM, exabrial wrote:

> Thanks for pointing exactly where I need to look!
> 
> I'm relieved to see that the underlying protocol isn't CORBA/IIOP. It looks
> like it's sort of a custom protocol. The request is encapsulated with a
> request type (auth, jndi, or ejb) then it's reading serialized java objects
> with ObjectInputStream. Overall, it's probably pretty danged fast.

At one point the protocol included custom ObjectInputStream/ObjectOutputStream 
implementations I wrote that *were* faster.  JVM optimizations put an end to 
that and instead of being 30% faster they were actually slower :)  So we peeled 
them out and reverted to the built-in JVM implementations.

But overall the protocol has been written with an intimate knowledge of 
serialization and attempts to avoid some of the chattier parts of it.  For 
example object serialization writes structure information about a class (once) 
then writes the data for that class (once per instance).  Generally speaking, 
we've cut out the structure part of all our objects and get straight to the 
data writing part.  So "our" portion of the communication is incredibly small 
leaving the rest for your objects.
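
To make the structure-versus-data split concrete, here is a minimal sketch that compares default serialization of a small object with writing only its field data.  The Ping class and the method names are hypothetical, made up for illustration; this is not the actual OpenEJB request code, just the general idea under the assumption that both sides already agree on the field layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class StructureVsData {
    // Hypothetical request type, not an OpenEJB class.
    static class Ping implements Serializable {
        private static final long serialVersionUID = 1L;
        final byte type;
        final long id;
        Ping(byte type, long id) { this.type = type; this.id = id; }
    }

    // Default serialization: stream header + class descriptor + field data.
    static int defaultSize(Ping p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Data only: both ends know the layout, so no structure is written.
    static int dataOnlySize(Ping p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeByte(p.type); // 1 byte
                out.writeLong(p.id);   // 8 bytes
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Ping p = new Ping((byte) 2, 42L);
        System.out.println("default serialization: " + defaultSize(p) + " bytes");
        System.out.println("data only:             " + dataOnlySize(p) + " bytes");
    }
}
```

The data-only path costs exactly the size of the fields, while the default path also pays for the class name, serialVersionUID and field descriptors.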

It also supports sending a versioned list of server addresses for clustering 
support.  The client sends the version number on every request.  If the list 
has changed, the server sends back a new list & version with the regular 
response.
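
A rough in-memory sketch of that version handshake, with no real networking.  Every name here (Server, Client, Response, handle, call) is hypothetical and chosen for illustration only; the real wire format and classes differ.  The `result` argument simply stands in for whatever the regular response would be.

```java
import java.util.List;

class ClusterListSketch {
    // Hypothetical response: the regular result, plus a server list
    // only when the client's copy is out of date.
    static final class Response {
        final Object result;
        final long listVersion;
        final List<String> servers; // null when the client is current
        Response(Object result, long listVersion, List<String> servers) {
            this.result = result;
            this.listVersion = listVersion;
            this.servers = servers;
        }
    }

    static final class Server {
        private long version = 1;
        private List<String> servers;
        Server(List<String> initial) { this.servers = List.copyOf(initial); }

        synchronized void updateServers(List<String> updated) {
            this.servers = List.copyOf(updated);
            this.version++;
        }

        // The client sends only its version number (a long) on each request.
        synchronized Response handle(long clientVersion, Object result) {
            List<String> update = (clientVersion == version) ? null : servers;
            return new Response(result, version, update);
        }
    }

    static final class Client {
        long version = 0;                 // 0 means "no list yet"
        List<String> servers = List.of();

        Object call(Server server, Object result) {
            Response r = server.handle(version, result);
            if (r.servers != null) {      // list changed: adopt it
                servers = r.servers;
                version = r.listVersion;
            }
            return r.result;
        }
    }

    public static void main(String[] args) {
        Server server = new Server(List.of("host-a:4201"));
        Client client = new Client();
        client.call(server, "first");
        System.out.println(client.version + " " + client.servers);
        server.updateServers(List.of("host-a:4201", "host-b:4201"));
        client.call(server, "second");
        System.out.println(client.version + " " + client.servers);
    }
}
```

In the common case where nothing has changed, the clustering overhead on a request is one long going out and no list coming back.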

In general we try and keep state or similar things boiled down to a byte or 
long and only transmit "full" data when necessary.

If ObjectInputStream/ObjectOutputStream implementations weren't so expensive to 
maintain, I'd take another crack at writing a better one.  Basically, an OOS or 
OIS will cache both class and instance data.  The instance data is cached so 
that if you see a reference to the same object again you just write its id 
instead of writing the entire object.  Because there's instance data cached in 
the OOS and OIS instances, you have to throw them away and create new ones on 
each request.  This unfortunately throws everything away including the class 
descriptor data.  So if you send 1000 requests using an object graph consisting 
of 30 objects, you're writing effectively constant data 29970 times more than 
you need to.
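
The cost is easy to measure with the stock JDK classes.  The sketch below (the Item class and method names are hypothetical) writes the same kind of object once per "request", first with a fresh ObjectOutputStream each time and then through one long-lived stream.  The long-lived stream is shown only for comparison: it keeps every written instance in its handle table, and calling reset() to clear that table also forgets the class descriptors.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class DescriptorOverhead {
    // Hypothetical payload class for the measurement.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value;
        Item(int value) { this.value = value; }
    }

    // New stream per request: the class descriptor is rewritten every time.
    static int freshStreamPerRequest(int requests) {
        try {
            int total = 0;
            for (int i = 0; i < requests; i++) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                    out.writeObject(new Item(i));
                }
                total += bytes.size();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One stream for all requests: the descriptor is written once and
    // later objects refer back to it by handle.
    static int oneSharedStream(int requests) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < requests; i++) {
                    out.writeObject(new Item(i));
                }
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fresh stream per request: " + freshStreamPerRequest(1000) + " bytes");
        System.out.println("one shared stream:        " + oneSharedStream(1000) + " bytes");
    }
}
```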

The optimization would be to simply split OOS into two objects (two caches).  
One to hold class descriptor cache -- this object you keep and reuse on every 
request.  And one to hold instance cache -- this one you create on every 
request.  

Then communication would naturally compress.  After the first few requests, 
you'd be done writing class descriptor data for the most part and only be 
writing instance data.

Anyway, I get way too into this stuff :)  If you were looking for something fun 
to hack on, this would be one of those cool areas.  Grabbing the OOS and OIS 
code from Harmony would be a great way to get started.  Then it's just a matter 
of refactoring the code into a thread-safe outer class to hold the class 
descriptor cache and a factory method to create an ObjectOutputStream which is 
really just a non-static inner class that can reuse the class cache and has 
its own cache for instances.
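
Here is a toy sketch of that shape, not real serialization and not Harmony or OpenEJB code.  All names (SplitCacheSketch, RequestWriter, writeRef, requestSize) are hypothetical, and the one-byte tags are invented for the example.  It writes only the class and identity bookkeeping, skips field data and the reading side entirely, and assumes the peer has already seen every descriptor the shared cache knows about, which a real implementation would have to track per connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class SplitCacheSketch {
    // Long-lived, thread-safe class cache: kept and reused across requests.
    private final Map<Class<?>, Integer> classIds = new ConcurrentHashMap<>();
    private final AtomicInteger nextClassId = new AtomicInteger();

    // Factory method: one cheap writer per request.
    RequestWriter newWriter(OutputStream target) {
        return new RequestWriter(target);
    }

    // Non-static inner class: shares the outer class cache, owns the
    // per-request instance cache.
    final class RequestWriter {
        private final DataOutputStream out;
        private final Map<Object, Integer> handles = new IdentityHashMap<>();

        private RequestWriter(OutputStream target) {
            this.out = new DataOutputStream(target);
        }

        void writeRef(Object obj) throws IOException {
            Integer handle = handles.get(obj);
            if (handle != null) {             // same instance seen again
                out.writeByte('R');
                out.writeInt(handle);
                return;
            }
            handles.put(obj, handles.size());
            boolean[] isNew = {false};
            int classId = classIds.computeIfAbsent(obj.getClass(), c -> {
                isNew[0] = true;
                return nextClassId.getAndIncrement();
            });
            if (isNew[0]) {                   // first sighting: full descriptor
                out.writeByte('D');
                out.writeInt(classId);
                out.writeUTF(obj.getClass().getName());
            } else {                          // known class: id only
                out.writeByte('C');
                out.writeInt(classId);
            }
        }
    }

    // Helper for experiments: bytes written for one request's objects.
    static int requestSize(SplitCacheSketch cache, Object... graph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        RequestWriter writer = cache.newWriter(bytes);
        try {
            for (Object o : graph) {
                writer.writeRef(o);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        SplitCacheSketch cache = new SplitCacheSketch();
        System.out.println("first request:  " + requestSize(cache, new Object()) + " bytes");
        System.out.println("second request: " + requestSize(cache, new Object()) + " bytes");
    }
}
```

The second request is smaller than the first because the class name is no longer sent, while each request still starts with an empty instance cache.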

> 
> It's not modular however, but the design is beautifully simplistic; I'd hate
> to see it get trashed with pluggable handlers :(

Thanks very much :)  I like to think "tight and simple" describes OpenEJB 
overall, but definitely the protocol is one of my favorite parts of the code.


-David
