In our experience, we use flavors of Nutch RPC, RMI and Externalizable. RMI has been easy to implement when only one server needs to be accessed (such as a status check) and the class has many functions.
Nutch RPC is excellent for distribution -- yes, one needs to serialize by hand and create the OP_CODE, but while distributing you don't want the classes to be very heavy. We've created a few other distributed servers that use Nutch RPC as the distribution mechanism. Java's RMI implementation, while having improved over the years, is still heavier than Nutch RPC and in our tests took slightly longer.

We wanted to standardize on one or the other, but we got a lot better performance/mileage by evaluating which would be better for each particular subsystem. For distributed, homogeneous systems we use Nutch RPC. On more fluid, complex/vertical systems we started with plain RMI (as it's a lot faster to develop/test) and then externalized once functionality was solidified.

This is just our experience, though as (and if) things get more complicated it may make sense to look at RMI again. I personally feel that if you're going to do RMI and then write all the externalization stuff, why not just stick with the simplified RPC -- the work involved is pretty much the same, and the latter gives you more control with better speed.

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Monday, August 08, 2005 5:29 PM
To: [email protected]
Subject: Re: Writable vs Externalizable

Stefan Groschupf wrote:
> can someone please tell me what is the technical difference between
> org.apache.nutch.io.Writable and java.io.Externalizable?
>
> For me that looks very similar and Externalizable is available since
> jdk 1.1.
> What do I miss?

You don't miss much! I avoided using Java's built-in Serialization and RMI when first writing Nutch as I wanted close control of how objects are written and of the client/server architecture (how it connects, how many connections, what happens when things fail, etc). I felt that it might be difficult to use parts of Serialization and RMI without getting tangled in the rest.
Yes, we could easily switch to using java.io.Externalizable in place of org.apache.nutch.io.Writable. We would also then need to switch to using ObjectInput and ObjectOutput in place of DataInput and DataOutput. But how should we implement writeObject() and readObject()? I'm hesitant to use ObjectInputStream and ObjectOutputStream, since these have a lot of other baggage, but maybe I'm just paranoid.

That said, in org.apache.nutch.io.ObjectWritable (mapred branch) I have now recreated much of object serialization, so perhaps it is time to seriously reconsider this decision.

In general I try to not adopt libraries into the core that include a lot of complex functionality that we don't intend to use. Java's Serialization provides a lot of features needed for RMI that I don't think that Nutch requires.

What do others think?

Doug
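To make the comparison in this thread concrete, here is a minimal Java sketch. It is not Nutch code: the `Writable` interface below is a local stand-in assumed to mirror the shape of org.apache.nutch.io.Writable, and `Point` and `SerializationSketch` are hypothetical names invented for the illustration. It shows one class written both ways, and what each path puts on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Stand-in interface (assumption): same two-method shape as Nutch's Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type implementing both styles for a side-by-side look.
class Point implements Writable, Externalizable {
    int x;
    int y;

    public Point() {}  // Externalizable needs a public no-arg constructor

    Point(int x, int y) { this.x = x; this.y = y; }

    // Writable style: raw fields on a DataOutput, nothing else.
    @Override public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }

    // Externalizable style: ObjectOutput extends DataOutput and ObjectInput
    // extends DataInput, so the same field code can be reused unchanged.
    @Override public void writeExternal(ObjectOutput out) throws IOException {
        write(out);
    }

    @Override public void readExternal(ObjectInput in) throws IOException {
        readFields(in);
    }
}

class SerializationSketch {
    // Writable path: the caller decides the type; only field bytes are written.
    static byte[] viaWritable(Point p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            p.write(out);
        }
        return buf.toByteArray();
    }

    static Point readWritable(byte[] bytes) throws IOException {
        Point p = new Point();
        p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return p;
    }

    // Externalizable path: ObjectOutputStream adds a stream header and a
    // class descriptor around the same field bytes.
    static byte[] viaExternalizable(Point p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }
        return buf.toByteArray();
    }

    static Point readExternalizable(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 4);
        System.out.println("Writable bytes:       " + viaWritable(p).length);
        System.out.println("Externalizable bytes: " + viaExternalizable(p).length);
    }
}
```

The per-class code is nearly identical in both styles. The difference is in the surrounding machinery: the Writable path emits exactly the two ints, while the Externalizable path goes through ObjectOutputStream and ObjectInputStream, which is the "baggage" the message above is hesitant about.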
