Hi,

I thought the move from Facebook to Apache packages might be an opportunity to redesign the Java API. Redesigning the API at this stage means that there are no backward compatibility issues: you either use the old or the new packages. A switch can be added to the compiler to choose between code that uses the old or the new packages.

Why can a redesign be beneficial?
- A different set of functions for TProtocol so protocol implementations have more flexibility (For example see https://issues.apache.org/jira/browse/THRIFT-110 (Changes to the IDL would still be required to be able to implement a compact format)). - Support for changes to the IDL needed to support more formats (e.g. the compact format) if the IDL is changed.
- Properly supporting optional members of structures.
- Removal of the name when writing the beginning of the structure as nobody seems to need this (https://issues.apache.org/jira/browse/THRIFT-8). There is no need to hang on to obsolete constructs and confuse new users. - Replace base classes with interfaces so implementations that want to do things differently do not have to extend the base class awkwardly. For example: TProtocol and TServer. - Refactoring of strange constructs in the current API. For example: the 'getTransport(TTransport)' method of TTransportFactory is used as a transport transformer in addition to being used as a factory. When used as a factory method the parameter makes no sense. - Remove the Client and Processor classes from generated code of services so server and client implementations have more freedom concerning how (service and) function selection information is communicated and processed, and how I/O is handled. See below.
- Drop transports. See below.
- Add an interface to the generated code and add support classes for doing asynchronous calls (i.e. support for sequence IDs). See below. - Drop the many constructors of a number of classes and replace them with a single constructor taking an options object.
- Drop the 'T' prefix of types as this is not customary in Java.

Removal of Client and Processor
===============================
It is impossible for the client and server implementations to handle services in a generic way currently. The only types available are TProcessorFactory and TProcessor. These types do not provide enough information. For example: for some server implementations it is interesting to know whether or not a function has the 'async' keyword. If a function does have it, the communication channel can be released immediatly instead of being held onto while the asynchronous function is running.

In addition, the assumption that the protocol can be assigned to the classes Client and Processor once, makes it more difficult to implement multiplexing servers (which also need to send the service name in addition to the function name), clients and servers using non-blocking I/O, connection pooling clients, asynchronous calls, etc. It would be much better if client and server implementations have more control over how they want to handle things before passing control to the function implementations.

By only supplying the minimal information about services and functions to client and server implementations, and by passing the protocol as late as possible to function implementations, lots of flexibility is gained. Here is a rough draft of how the API would look like (from the perspective of the server. Information and methods needed for client implementations are missing):
class Service {
  Function get(String);
}
class Function {
  boolean isAsynchronous();

  // Throws an exception if function is asynchronous.
  //
  // Only reads the structure containing the parameters
  // using the protocol. Messages are handled by the
  // server.
  //
  // Returns the wrapper around the result.
  //
  // Server can decide when to write the result without
  // having to copy it to a byte array.
  Base invoke(Protocol);

  // Discards result if function is synchronous.
  //
  // Only reads the structure containing the parameters
  // using the protocol. Messages are handled by the
  // server.
  void invoke(Protocol);
}

Another example that could be built is a client that requests the mapping of (services and) functions to IDs from the server, so invocation of a particular function does not require the sending of the (service and) function names. Instead a few bytes are all that is needed to select a function. This is very difficult to do now because the Client and Processor classes have control over the information of the message being sent.

For the multiplexing server that I want to build having more freedom could mean replacing this request header:
i32:    version and type
string: service name
i32:    sequence ID (unused)
i32:    version and type
string: function name
i32:    sequence ID

With this request header (if I implement the function to ID mapping). This saves (for a 15-character ASCII service name, a 10-character ASCII function name and less than 16384 functions) 42-43 bytes per function invocation:
byte:   version and type
vint:   function ID (1 or 2 bytes)
i32:    sequence ID

If I would use one bit of the version and type byte as a flag to indicate sequence IDs are not needed, another 4 bytes could be saved, i.e. only 2-3 bytes would be needed to select a function.

Note that protocols must understand more header types than 'TMessage' for this to work. Maybe the clients and servers must be given the freedom to write the header instead of having a single message type. Not all clients and servers can have their own header format because that would defeat interoperability, instead a number of header formats have to be agreed upon. For example:
- Current header format for single-service clients and servers.
- New header format for multiplexed clients and servers with support for:
  - a function to ID map for more efficient function selection;
  - indicating a sequence ID is not needed.

Removal of the Client and Processor classes will also remove a lot of duplicate code for handling calls from the generated classes. In my opinion this duplicate code can easily be transformed into generic code in client and server implementations.

Drop transports
===============
All servers currently require a transport, but the internal workings of a server are closely tied to a specific transport. Why have servers work with wrappers around sockets instead of working with sockets directly? Possibilities for server implementation will now be limited by the wrappers instead of allowing the full flexibility of the socket API.

In my opinion a transport is internal to a server. If a server can be implemented in a better way, the thin transport wrapper around sockets should not be a limiting factor. It would be very cumbersome if an API change of a 'TServerTransport' implementation is needed to be able to implement a server differently.

The only thing that needs to be handled is 'TFramedTransport' because this is a decorator. In my opinion it may be better to implement it as a protocol decorator which frames messages.

I don't know how this works out for clients but I think it would be very similar.

Asynchronous interface
======================
(Does anyone have a better name than 'asynchronous functions' so the 'async' keyword and sequence IDs cannot be confused?)

The addition of an asynchronous interface (to support sequence IDs) besides the synchronous one makes it easier to implement asynchronous calls (using sequence IDs). Asynchronous calls need to be handled differently by a client anyway, so it is best to make this distinction clear. For some languages only asynchronous interfaces might be generated. Here is a rough draft of how this would look if polling is used to retrieve the response:
interface Calculator.IfaceAsync {
  AsynchronousCall ping();
  IntAsynchronousCall add(int, int);
  IntAsynchronousCall calculate(int, Work);
  // zip() not added because it does not return a result
}
interface AsynchronousCall {
  boolean hasResponseArrived();
}
interface IntAsynchronousCall extends AsynchronousCall {
  int getResult();
}
interface StructAsynchronousCall<R> extends AsynchronousCall {
  R getResult();
}

An alternative is to use events instead of polling:
interface Calculator.IfaceAsync {
  void ping(VoidAsynchronousCompletion);
  void add(int, int, IntAsynchronousCompletion);
  void calculate(int, Work, IntAsynchronousCompletion);
  // zip() not added because it does not return a result
}
interface VoidAsynchronousCompletion {
  void handleResponse();
}
interface IntAsynchronousCompletion {
  void handleResponse(int);
}
interface StructAsynchronousCompletion<R> {
  void handleResponse(R);
}


What do you think? Would a redesign be useful and worth it?

--
Kind regards,

Johan Stuyts

Reply via email to