In practice, we use flavors of Nutch RPC, RMI and Externalizable.

RMI has been easy to implement when only one server needs to be accessed
(such as a status check) and the class has many methods.

The Nutch RPC is excellent for distribution -- yes, one needs to serialize by
hand and create the OP_CODE, but when distributing you don't want the
classes to be very heavy. We've created a few other distributed servers that
use the Nutch RPC as the distribution mechanism.
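
A minimal sketch of what "serialize by hand and create the OP_CODE" can look
like in general. The class name, op codes and field layout below are made up
for illustration; this is not the Nutch RPC API, only the same idea using
plain DataOutputStream/DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class OpCodeFraming {
    // Hypothetical op codes; a real protocol would define its own set.
    public static final byte OP_STATUS = 1;
    public static final byte OP_FETCH = 2;

    // Client side: write the op code first, then each argument by hand.
    public static byte[] encodeFetch(String url, int depth) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(OP_FETCH);
        out.writeUTF(url);
        out.writeInt(depth);
        out.flush();
        return buf.toByteArray();
    }

    // Server side: read the op code, then decode the arguments it implies.
    public static String dispatch(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        byte op = in.readByte();
        switch (op) {
            case OP_STATUS:
                return "status";
            case OP_FETCH: {
                String url = in.readUTF();
                int depth = in.readInt();
                return "fetch " + url + " depth=" + depth;
            }
            default:
                throw new IOException("unknown op code: " + op);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(dispatch(encodeFetch("http://example.com/", 2)));
    }
}
```

The message carries only the op code and the raw fields, which is why the
classes stay light; the cost is that every new call needs a new op code and a
matching pair of encode/decode branches.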

Java's RMI implementation, while it has improved over the years, is still
heavier than Nutch RPC and in our tests took slightly longer. While we wanted
to use one or the other -- we got a lot better performance/mileage by
evaluating which would be better for the particular subsystem. For
distributed, homogeneous systems we use Nutch RPC. On more fluid,
complex/vertical systems we started with plain RMI (as it's a lot faster to
develop/test) and then switched to Externalizable once the functionality had
solidified.

This is just our experience, though; as (and if) things get more complicated,
it may make sense to look at RMI again. I personally feel that if you're going
to do RMI and then write all the externalization code, why not just stick with
the simplified RPC -- the work involved is pretty much the same, and the
latter gives you more control with better speed.
 



-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 08, 2005 5:29 PM
To: [email protected]
Subject: Re: Writable vs Externalizable

Stefan Groschupf wrote:
> can someone please tell me what is the technical difference between 
> org.apache.nutch.io.Writable and java.io.Externalizable?
> 
> For me that looks very similar and Externalizable is available since 
> jdk 1.1.
> What do I miss?

You don't miss much!

I avoided using Java's built-in Serialization and RMI when first writing
Nutch as I wanted close control of how objects are written and of the
client/server architecture (how it connects, how many connections, what
happens when things fail, etc).  I felt that it might be difficult to use
parts of Serialization and RMI without getting tangled in the rest.

Yes, we could easily switch to using java.io.Externalizable in place of
org.apache.nutch.io.Writable.  We would also then need to switch to using
ObjectInput and ObjectOutput in place of DataInput and DataOutput.
But how should we implement writeObject() and readObject()?  I'm hesitant
to use ObjectInputStream and ObjectOutputStream, since these have a lot of
other baggage, but maybe I'm just paranoid.
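
To make the comparison concrete, here is a minimal sketch. It assumes a
hypothetical SimpleWritable interface as a stand-in for a Writable-style
contract (it is not copied from Nutch) and a toy two-int point class. The
Externalizable variant goes through the standard ObjectOutputStream and
ObjectInputStream, which is where the extra stream header and class
description come from.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class WritableVsExternalizable {

    // Hypothetical stand-in for a Writable-style interface (an assumption).
    public interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    public static class PointW implements SimpleWritable {
        public int x, y;
        public PointW() {}
        public PointW(int x, int y) { this.x = x; this.y = y; }
        public void write(DataOutput out) throws IOException { out.writeInt(x); out.writeInt(y); }
        public void readFields(DataInput in) throws IOException { x = in.readInt(); y = in.readInt(); }
    }

    public static class PointE implements Externalizable {
        public int x, y;
        public PointE() {}  // Externalizable requires a public no-arg constructor
        public PointE(int x, int y) { this.x = x; this.y = y; }
        public void writeExternal(ObjectOutput out) throws IOException { out.writeInt(x); out.writeInt(y); }
        public void readExternal(ObjectInput in) throws IOException { x = in.readInt(); y = in.readInt(); }
    }

    // Writable style: only the two ints reach the stream.
    public static byte[] writableBytes(PointW p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) { p.write(out); }
        return buf.toByteArray();
    }

    public static PointW readWritable(byte[] bytes) throws IOException {
        PointW p = new PointW();
        p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return p;
    }

    // Externalizable: the object stream adds a header and a class descriptor.
    public static byte[] externalizableBytes(PointE p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) { out.writeObject(p); }
        return buf.toByteArray();
    }

    public static PointE readExternalizable(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (PointE) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Writable-style bytes: " + writableBytes(new PointW(3, 4)).length);
        System.out.println("Externalizable bytes: " + externalizableBytes(new PointE(3, 4)).length);
    }
}
```

The two classes have nearly identical read/write methods, which matches the
observation that the interfaces look very similar. The difference is in what
surrounds them: the Writable-style path needs nothing beyond a DataOutput,
while the Externalizable path in this sketch pulls in the object stream
machinery.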

That said, in org.apache.nutch.io.ObjectWritable (mapred branch) I have now
recreated much of object serialization, so perhaps it is time to seriously
reconsider this decision.

In general I try to not adopt libraries into the core that include a lot of
complex functionality that we don't intend to use.  Java's Serialization
provides a lot of features needed for RMI that I don't think that Nutch
requires.

What do others think?

Doug



