Redesign of the Java API

Johan Stuyts Tue, 09 Sep 2008 05:41:22 -0700

Hi,

I thought the move from Facebook to Apache packages might be an opportunity to redesign the Java API. Redesigning the API at this stage means that there are no backward compatibility issues: you either use the old or the new packages. A switch can be added to the compiler to choose between code that uses the old or the new packages.


Why can a redesign be beneficial?

- A different set of functions for TProtocol so protocol implementations have more flexibility. For example, see https://issues.apache.org/jira/browse/THRIFT-110 (changes to the IDL would still be required to be able to implement a compact format).
- Support for changes to the IDL needed to support more formats (e.g. the compact format) if the IDL is changed.
- Properly supporting optional members of structures.
- Removal of the name when writing the beginning of the structure as nobody seems to need this (https://issues.apache.org/jira/browse/THRIFT-8). There is no need to hang on to obsolete constructs and confuse new users.
- Replace base classes with interfaces so implementations that want to do things differently do not have to extend the base class awkwardly. For example: TProtocol and TServer.
- Refactoring of strange constructs in the current API. For example: the 'getTransport(TTransport)' method of TTransportFactory is used as a transport transformer in addition to being used as a factory. When used as a factory method the parameter makes no sense.
- Remove the Client and Processor classes from generated code of services so server and client implementations have more freedom concerning how (service and) function selection information is communicated and processed, and how I/O is handled. See below.
- Drop transports. See below.
- Add an interface to the generated code and add support classes for doing asynchronous calls (i.e. support for sequence IDs). See below.
- Drop the many constructors of a number of classes and replace them with a single constructor taking an options object.
- Drop the 'T' prefix of types as this is not customary in Java.
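
To make the "single constructor taking an options object" item concrete, here is a minimal sketch. The names ServerOptions and SketchServer, and the three settings, are hypothetical and are not part of the existing Thrift API.

```java
// Hypothetical sketch: one options object instead of many constructors.
// None of these classes or settings exist in Thrift.
final class ServerOptions {
    // Defaults apply unless the caller overrides them.
    int port = 9090;
    int workerThreads = 4;
    boolean framed = false;

    ServerOptions port(int value) { port = value; return this; }
    ServerOptions workerThreads(int value) { workerThreads = value; return this; }
    ServerOptions framed(boolean value) { framed = value; return this; }
}

final class SketchServer {
    final int port;
    final int workerThreads;
    final boolean framed;

    // The only constructor: new settings can be added to ServerOptions
    // without adding constructor overloads here.
    SketchServer(ServerOptions options) {
        if (options.workerThreads < 1) {
            throw new IllegalArgumentException("workerThreads must be at least 1");
        }
        this.port = options.port;
        this.workerThreads = options.workerThreads;
        this.framed = options.framed;
    }

    public static void main(String[] args) {
        SketchServer server = new SketchServer(new ServerOptions().port(9191).framed(true));
        System.out.println(server.port + " " + server.workerThreads + " " + server.framed);
    }
}
```

Settings that are not mentioned keep their defaults, which is what the many overloaded constructors are usually trying to achieve.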

Removal of Client and Processor
===============================

It is currently impossible for client and server implementations to handle services in a generic way. The only types available are TProcessorFactory and TProcessor, and these types do not provide enough information. For example: for some server implementations it is useful to know whether or not a function has the 'async' keyword. If a function does have it, the communication channel can be released immediately instead of being held onto while the asynchronous function is running.

In addition, the assumption that the protocol can be assigned to the Client and Processor classes once makes it more difficult to implement multiplexing servers (which also need to send the service name in addition to the function name), clients and servers using non-blocking I/O, connection pooling clients, asynchronous calls, etc. It would be much better if client and server implementations had more control over how they want to handle things before passing control to the function implementations.

By supplying only the minimal information about services and functions to client and server implementations, and by passing the protocol as late as possible to function implementations, a lot of flexibility is gained. Here is a rough draft of what the API would look like (from the perspective of the server; information and methods needed for client implementations are missing):

class Service {
  Function get(String);
}
class Function {
  boolean isAsynchronous();

  // Throws an exception if function is asynchronous.
  //
  // Only reads the structure containing the parameters
  // using the protocol. Messages are handled by the
  // server.
  //
  // Returns the wrapper around the result.
  //
  // Server can decide when to write the result without
  // having to copy it to a byte array.
  Base invoke(Protocol);

  // Discards result if function is synchronous.
  //
  // Only reads the structure containing the parameters
  // using the protocol. Messages are handled by the
  // server.
  void invoke(Protocol);
}
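
As an illustration of how a server might use the draft above, here is a minimal sketch. Everything in it is an assumption made for the example: Protocol is reduced to a queue of already-decoded values, Base exposes a single value, Function is an interface rather than a class, and the second method is called invokeAndDiscard because Java does not allow two methods that differ only in their return type.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a protocol: hands out already-decoded values.
final class Protocol {
    private final Deque<Object> values = new ArrayDeque<>();
    Protocol(Object... input) { for (Object v : input) values.add(v); }
    int readI32() { return (Integer) values.remove(); }
    String readString() { return (String) values.remove(); }
}

// Hypothetical stand-in for the wrapper around a result.
interface Base { Object value(); }

interface Function {
    boolean isAsynchronous();
    Base invoke(Protocol in);           // expected to throw if asynchronous
    void invokeAndDiscard(Protocol in); // hypothetical name for the void variant
}

final class Service {
    private final Map<String, Function> functions = new HashMap<>();
    void register(String name, Function function) { functions.put(name, function); }
    Function get(String name) { return functions.get(name); }
}

// Example function: reads two i32 parameters and returns their sum.
final class AddFunction implements Function {
    public boolean isAsynchronous() { return false; }
    public Base invoke(Protocol in) {
        int sum = in.readI32() + in.readI32();
        return () -> sum;
    }
    public void invokeAndDiscard(Protocol in) { invoke(in); }
}

final class DispatchSketch {
    // The server reads the function name itself, then passes the protocol on
    // so the function only reads its parameters.
    static Base dispatch(Service service, Protocol in) {
        String name = in.readString();
        Function function = service.get(name);
        if (function == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        if (function.isAsynchronous()) {
            function.invokeAndDiscard(in);
            return null; // nothing to write back
        }
        return function.invoke(in); // the server decides when to write this
    }

    public static void main(String[] args) {
        Service service = new Service();
        service.register("add", new AddFunction());
        System.out.println(dispatch(service, new Protocol("add", 2, 3)).value());
    }
}
```

The point of the sketch is that the server, not generated code, owns the message handling and the decision about when to write the result.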

Another example that could be built is a client that requests the mapping of (services and) functions to IDs from the server, so invocation of a particular function does not require the sending of the (service and) function names. Instead a few bytes are all that is needed to select a function. This is very difficult to do now because the Client and Processor classes have control over the information of the message being sent.

For the multiplexing server that I want to build, having more freedom could mean replacing this request header:

i32:    version and type
string: service name
i32:    sequence ID (unused)
i32:    version and type
string: function name
i32:    sequence ID

With this request header (if I implement the function to ID mapping). For a 15-character ASCII service name, a 10-character ASCII function name and fewer than 16384 functions, this saves 42-43 bytes per function invocation:

byte:   version and type
vint:   function ID (1 or 2 bytes)
i32:    sequence ID

If I used one bit of the version and type byte as a flag to indicate that sequence IDs are not needed, another 4 bytes could be saved, i.e. only 2-3 bytes would be needed to select a function.
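
The byte counts above can be checked with a small sketch. It rests on two assumptions that the figures imply but the text does not spell out: a string is written as an i32 length followed by its bytes, and 'vint' stores 7 bits per byte with a continuation bit (which gives the limit of 16384 IDs for 2 bytes). The class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class HeaderSizes {
    // Assumption: a string is an i32 length prefix followed by its bytes.
    static int stringSize(String s) {
        return 4 + s.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Assumption: 'vint' stores 7 bits per byte, low bits first, with the
    // high bit set on every byte except the last.
    static byte[] writeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Current header: two (version and type, name, sequence ID) triples.
    static int currentHeaderSize(String serviceName, String functionName) {
        return 4 + stringSize(serviceName) + 4
             + 4 + stringSize(functionName) + 4;
    }

    // Proposed header: one byte, a varint function ID, optional sequence ID.
    static int newHeaderSize(int functionId, boolean withSequenceId) {
        return 1 + writeVarint(functionId).length + (withSequenceId ? 4 : 0);
    }

    public static void main(String[] args) {
        int current = currentHeaderSize("123456789012345", "1234567890");
        System.out.println("current header:       " + current);
        System.out.println("new header, ID 127:   " + newHeaderSize(127, true));
        System.out.println("new header, ID 16383: " + newHeaderSize(16383, true));
        System.out.println("without sequence ID:  " + newHeaderSize(16383, false));
    }
}
```

Under these assumptions the current header is 49 bytes and the new one is 6 or 7 bytes, which matches the 42-43 bytes saved; without the sequence ID it shrinks to 2 or 3 bytes.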

Note that protocols must understand more header types than 'TMessage' for this to work. Maybe the clients and servers must be given the freedom to write the header instead of having a single message type. Not all clients and servers can have their own header format because that would defeat interoperability; instead a number of header formats have to be agreed upon. For example:

- Current header format for single-service clients and servers.
- New header format for multiplexed clients and servers with support for:
  - a function to ID map for more efficient function selection;
  - indicating a sequence ID is not needed.

Removal of the Client and Processor classes will also remove a lot of duplicate code for handling calls from the generated classes. In my opinion this duplicate code can easily be transformed into generic code in client and server implementations.


Drop transports
===============

All servers currently require a transport, but the internal workings of a server are closely tied to a specific transport. Why have servers work with wrappers around sockets instead of working with sockets directly? Possibilities for server implementations are now limited by the wrappers instead of allowing the full flexibility of the socket API.

In my opinion a transport is internal to a server. If a server can be implemented in a better way, the thin transport wrapper around sockets should not be a limiting factor. It would be very cumbersome if an API change of a 'TServerTransport' implementation is needed to be able to implement a server differently.

The only thing that needs to be handled is 'TFramedTransport' because this is a decorator. In my opinion it may be better to implement it as a protocol decorator which frames messages.
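
A minimal sketch of the framing-decorator idea follows. The MessageWriter interface and both classes are hypothetical and heavily reduced (a real protocol has many more write methods), and the frame layout is assumed to be a 4-byte big-endian length followed by the message bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical, heavily reduced stand-in for the writing side of a protocol.
interface MessageWriter {
    void writeBytes(byte[] data);
    void endMessage();
}

// Plain writer that collects bytes in memory, standing in for a socket.
final class ByteArraySink implements MessageWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    public void writeBytes(byte[] data) { out.write(data, 0, data.length); }
    public void endMessage() { /* a real writer would flush here */ }
    byte[] toByteArray() { return out.toByteArray(); }
}

// Decorator: buffers one message, then forwards it with a length prefix.
final class FramingWriter implements MessageWriter {
    private final MessageWriter delegate;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    FramingWriter(MessageWriter delegate) { this.delegate = delegate; }

    public void writeBytes(byte[] data) { buffer.write(data, 0, data.length); }

    public void endMessage() {
        byte[] payload = buffer.toByteArray();
        buffer.reset();
        // Assumed frame layout: 4-byte big-endian length, then the payload.
        delegate.writeBytes(ByteBuffer.allocate(4).putInt(payload.length).array());
        delegate.writeBytes(payload);
        delegate.endMessage();
    }

    public static void main(String[] args) {
        ByteArraySink sink = new ByteArraySink();
        MessageWriter framed = new FramingWriter(sink);
        framed.writeBytes(new byte[] {1, 2, 3});
        framed.endMessage();
        System.out.println(sink.toByteArray().length + " bytes written");
    }
}
```

Because the decorator implements the same interface as the writer it wraps, a server can add or remove framing without any transport class being involved.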

I don't know how this works out for clients but I think it would be very similar.


Asynchronous interface
======================

(Does anyone have a better name than 'asynchronous functions' so the 'async' keyword and sequence IDs cannot be confused?)

The addition of an asynchronous interface (to support sequence IDs) alongside the synchronous one makes it easier to implement asynchronous calls (using sequence IDs). Asynchronous calls need to be handled differently by a client anyway, so it is best to make this distinction clear. For some languages only asynchronous interfaces might be generated. Here is a rough draft of how this would look if polling is used to retrieve the response:

interface Calculator.IfaceAsync {
  AsynchronousCall ping();
  IntAsynchronousCall add(int, int);
  IntAsynchronousCall calculate(int, Work);
  // zip() not added because it does not return a result
}
interface AsynchronousCall {
  boolean hasResponseArrived();
}
interface IntAsynchronousCall extends AsynchronousCall {
  int getResult();
}
interface StructAsynchronousCall<R> extends AsynchronousCall {
  R getResult();
}
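
To show how sequence IDs could sit behind the polling interfaces above, here is a minimal client-side sketch. PendingIntCall, PollingClientSketch and onResponse are hypothetical names, and no I/O is done: the request is never written and the response is delivered by calling onResponse directly.

```java
import java.util.HashMap;
import java.util.Map;

interface AsynchronousCall {
    boolean hasResponseArrived();
}

interface IntAsynchronousCall extends AsynchronousCall {
    int getResult();
}

// Hypothetical implementation: holds the result once the response is in.
final class PendingIntCall implements IntAsynchronousCall {
    private boolean arrived;
    private int result;

    synchronized void complete(int value) { result = value; arrived = true; }
    public synchronized boolean hasResponseArrived() { return arrived; }
    public synchronized int getResult() {
        if (!arrived) throw new IllegalStateException("Response has not arrived yet");
        return result;
    }
}

final class PollingClientSketch {
    private final Map<Integer, PendingIntCall> pending = new HashMap<>();
    private int nextSequenceId = 1;

    // A real client would also write the request with this sequence ID.
    synchronized IntAsynchronousCall add(int a, int b) {
        PendingIntCall call = new PendingIntCall();
        pending.put(nextSequenceId++, call);
        return call;
    }

    // Called by the (omitted) reading side; responses may arrive in any order.
    synchronized void onResponse(int sequenceId, int value) {
        PendingIntCall call = pending.remove(sequenceId);
        if (call != null) call.complete(value);
    }

    public static void main(String[] args) {
        PollingClientSketch client = new PollingClientSketch();
        IntAsynchronousCall first = client.add(1, 2);
        IntAsynchronousCall second = client.add(3, 4);
        client.onResponse(2, 7); // the second call is answered first
        System.out.println(first.hasResponseArrived() + " " + second.getResult());
    }
}
```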

An alternative is to use events instead of polling:

interface Calculator.IfaceAsync {
  void ping(VoidAsynchronousCompletion);
  void add(int, int, IntAsynchronousCompletion);
  void calculate(int, Work, IntAsynchronousCompletion);
  // zip() not added because it does not return a result
}
interface VoidAsynchronousCompletion {
  void handleResponse();
}
interface IntAsynchronousCompletion {
  void handleResponse(int);
}
interface StructAsynchronousCompletion<R> {
  void handleResponse(R);
}
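
The event-based variant could be wired to sequence IDs in a similar way. In this sketch EventClientSketch and onResponse are hypothetical, and the response is again delivered by hand rather than read from a connection.

```java
import java.util.HashMap;
import java.util.Map;

interface IntAsynchronousCompletion {
    void handleResponse(int result);
}

final class EventClientSketch {
    private final Map<Integer, IntAsynchronousCompletion> handlers = new HashMap<>();
    private int nextSequenceId = 1;

    // Registers the handler under a fresh sequence ID and returns that ID
    // only so the example can simulate the matching response.
    synchronized int add(int a, int b, IntAsynchronousCompletion completion) {
        int sequenceId = nextSequenceId++;
        handlers.put(sequenceId, completion);
        return sequenceId; // a real client would write the request here
    }

    // Called by the (omitted) reading side when a response has been decoded.
    void onResponse(int sequenceId, int value) {
        IntAsynchronousCompletion completion;
        synchronized (this) {
            completion = handlers.remove(sequenceId);
        }
        // Run the handler outside the lock so it may issue new calls.
        if (completion != null) completion.handleResponse(value);
    }

    public static void main(String[] args) {
        EventClientSketch client = new EventClientSketch();
        int id = client.add(1, 2, result -> System.out.println("add returned " + result));
        client.onResponse(id, 3);
    }
}
```

Compared with polling, the caller never holds a call object; the trade-off is that the handler runs on whichever thread reads the response.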


What do you think? Would a redesign be useful and worth it?

--
Kind regards,

Johan Stuyts
