Re: Redesign of the Java API

Bryan Duxbury Tue, 09 Sep 2008 07:54:11 -0700

This is a very ambitious proposal! I will comment on specifics inline, but let me start by saying that we should probably create a lot of separate JIRA tickets for dealing with the ideas raised in this issue. As one monolithic blob, it will be too difficult to implement and review.


On Sep 9, 2008, at 5:40 AM, Johan Stuyts wrote:

Hi,
I thought the move from Facebook to Apache packages might be an opportunity to redesign the Java API. Redesigning the API at this stage means that there are no backward compatibility issues: you either use the old or the new packages. A switch can be added to the compiler to choose between code that uses the old or the new packages.

I appreciate the attention to backward compatibility issues, but I don't think we should be so quick to fork the Java libraries at this stage in the game. While it may not be possible for every one of your suggestions below, I think in general it would be best if we could apply the improvements to the existing libraries rather than create new ones. This specter of backwards compatibility has been brought up a few times now in regard to big refactors (like changing packages, etc.), and I really think that while we're in the Incubator, pre-release, we should just refactor what we want to refactor. We're never going to get such an open chance again. That's not to say we shouldn't take steps to ease the transition for people with existing code, but let's find a way that doesn't involve creating a new library and deprecating the old one.

Why can a redesign be beneficial?
- A different set of functions for TProtocol so protocol implementations have more flexibility (for example, see https://issues.apache.org/jira/browse/THRIFT-110 (changes to the IDL would still be required to be able to implement a compact format)).
- Support for changes to the IDL needed to support more formats (e.g. the compact format) if the IDL is changed.

It's true that it would be nice if the TProtocol interface were richer, because it would definitely make implementing the compact protocol easier. That said, I'm not convinced it's necessary. We've established that it can be done with a stateful TProtocol. (We also don't need IDL changes to implement the compact protocol, with the possible exception of extern strings.)

- Properly supporting optional members of structures.

This is done in the java:beans generator already. I do wonder why anyone would use the non-beans generator at all, though, so you might make the argument that the two paths should be unified.

- Removal of the name when writing the beginning of the structure as nobody seems to need this (https://issues.apache.org/jira/browse/THRIFT-8). There is no need to hang on to obsolete constructs and confuse new users.

This is part of the protocol interface, correct? Are you suggesting that we remove struct names from all of Thrift, or just from the Java API? If it's unused, I'm pro-removal everywhere. However, we do have to decide if that is a limitation we want to have.

- Replace base classes with interfaces so implementations that want to do things differently do not have to extend the base class awkwardly. For example: TProtocol and TServer.

+1

- Refactoring of strange constructs in the current API. For example: the 'getTransport(TTransport)' method of TTransportFactory is used as a transport transformer in addition to being used as a factory. When used as a factory method the parameter makes no sense.

+1

- Remove the Client and Processor classes from generated code of services so server and client implementations have more freedom concerning how (service and) function selection information is communicated and processed, and how I/O is handled. See below.

Are you sure we should stop generating the code altogether? For most default cases, I doubt people are going to want to reimplement these pieces. I'd be for making the Processor interface clearer and easier to implement outside of code generation, though.

- Drop transports. See below.

I sort of like the transport abstraction. Not every use of transports is hidden behind a server, after all - Rapleaf does a lot of serializing structures to byte arrays and such. For the case of a transport and a server being intimately intertwined, I would say that there's nothing stopping you from not taking a Transport in the constructor and doing whatever you want internally. The only point at which you need to use a Transport is as soon as you want to rely on the generated code (de)serialization, and at that point you can use the memory-based transports. I did something analogous in the TNonblockingServer.

- Add an interface to the generated code and add support classes for doing asynchronous calls (i.e. support for sequence IDs). See below.
- Drop the many constructors of a number of classes and replace them with a single constructor taking an options object.

I see where you're going with this, but doesn't this just mean that you now have to manually code the validation of order and presence of certain parameters in certain situations? Certainly the existing scenario of tons of constructors is overly verbose, but at least it avoids the bugs that would show up if we tried to do it ourselves.
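To make the trade-off concrete, here is a minimal sketch of an options object whose checks live in a single build() method. Everything in it (ServerOptions, its fields and the default values) is hypothetical and not part of Thrift; it only illustrates where the hand-written validation would end up.

```java
// Hypothetical options object; none of these names exist in Thrift.
final class ServerOptions {
    private final int port;
    private final int workerThreads;
    private final boolean framed;

    private ServerOptions(Builder b) {
        this.port = b.port;
        this.workerThreads = b.workerThreads;
        this.framed = b.framed;
    }

    int port() { return port; }
    int workerThreads() { return workerThreads; }
    boolean framed() { return framed; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private Integer port;             // required, no default
        private int workerThreads = 4;    // assumed default
        private boolean framed = false;   // assumed default

        Builder port(int port) { this.port = port; return this; }
        Builder workerThreads(int n) { this.workerThreads = n; return this; }
        Builder framed(boolean framed) { this.framed = framed; return this; }

        // All presence and range checks are written by hand, in one place.
        ServerOptions build() {
            if (port == null) {
                throw new IllegalStateException("port is required");
            }
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("port out of range: " + port);
            }
            if (workerThreads < 1) {
                throw new IllegalArgumentException("workerThreads must be positive");
            }
            return new ServerOptions(this);
        }
    }
}
```

A server would then take a single ServerOptions argument, e.g. ServerOptions.builder().port(9090).build(). The checks that overloaded constructors get from the compiler become runtime checks here, which is the cost raised above.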

- Drop the 'T' prefix of types as this is not customary in Java.

I'm mostly pro on this one, though there are a few situations that would yield name clashes if we did this to every class (TException -> Exception, for one).

Removal of Client and Processor
===============================
It is impossible for the client and server implementations to handle services in a generic way currently. The only types available are TProcessorFactory and TProcessor. These types do not provide enough information. For example: for some server implementations it is interesting to know whether or not a function has the 'async' keyword. If a function does have it, the communication channel can be released immediately instead of being held onto while the asynchronous function is running.
In addition, the assumption that the protocol can be assigned to the classes Client and Processor once makes it more difficult to implement multiplexing servers (which also need to send the service name in addition to the function name), clients and servers using non-blocking I/O, connection pooling clients, asynchronous calls, etc. It would be much better if client and server implementations have more control over how they want to handle things before passing control to the function implementations.
By only supplying the minimal information about services and functions to client and server implementations, and by passing the protocol as late as possible to function implementations, lots of flexibility is gained. Here is a rough draft of what the API would look like (from the perspective of the server; information and methods needed for client implementations are missing):
class Service {
  Function get(String);
}
class Function {
  boolean isAsynchronous();

  // Throws an exception if function is asynchronous.
  //
  // Only reads the structure containing the parameters
  // using the protocol. Messages are handled by the
  // server.
  //
  // Returns the wrapper around the result.
  //
  // Server can decide when to write the result without
  // having to copy it to a byte array.
  Base invoke(Protocol);

  // Discards result if function is synchronous.
  //
  // Only reads the structure containing the parameters
  // using the protocol. Messages are handled by the
  // server.
  void invoke(Protocol);
}
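Read as literal Java, the draft above would not compile, because two methods cannot differ only in their return type. The sketch below is one compilable variant showing how a server could dispatch generically on top of it. The method names invokeAndReturn and invokeAndDiscard, the add method, the GenericDispatcher class and the empty Protocol and Base stand-ins are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

interface Protocol {}   // stand-in for the proposed protocol type
interface Base {}       // stand-in for the wrapper around a result

interface Function {
    boolean isAsynchronous();

    // Reads the parameter structure and returns the wrapped result.
    Base invokeAndReturn(Protocol in);

    // Reads the parameter structure and discards any result.
    void invokeAndDiscard(Protocol in);
}

final class Service {
    private final Map<String, Function> functions = new HashMap<>();

    void add(String name, Function function) { functions.put(name, function); }

    Function get(String name) { return functions.get(name); }
}

final class GenericDispatcher {
    // Returns the result the server should write, or null when the
    // function is asynchronous and nothing has to be sent back.
    static Base dispatch(Service service, String functionName, Protocol in) {
        Function function = service.get(functionName);
        if (function == null) {
            throw new IllegalArgumentException("unknown function: " + functionName);
        }
        if (function.isAsynchronous()) {
            // The server is free to release the channel before or after this call.
            function.invokeAndDiscard(in);
            return null;
        }
        return function.invokeAndReturn(in);
    }
}
```

Because the dispatcher only sees Service and Function, the server decides how the function name arrives (message header, ID lookup, multiplexing prefix) and when the result is written.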
Another example that could be built is a client that requests the mapping of (services and) functions to IDs from the server, so invocation of a particular function does not require the sending of the (service and) function names. Instead a few bytes are all that is needed to select a function. This is very difficult to do now because the Client and Processor classes have control over the information of the message being sent.
For the multiplexing server that I want to build, having more freedom could mean replacing this request header:
i32:    version and type
string: service name
i32:    sequence ID (unused)
i32:    version and type
string: function name
i32:    sequence ID
With this request header (if I implement the function-to-ID mapping). This saves (for a 15-character ASCII service name, a 10-character ASCII function name and fewer than 16384 functions) 42-43 bytes per function invocation:
byte:   version and type
vint:   function ID (1 or 2 bytes)
i32:    sequence ID
If I used one bit of the version and type byte as a flag to indicate sequence IDs are not needed, another 4 bytes could be saved, i.e. only 2-3 bytes would be needed to select a function.
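The byte counts above can be checked with a short calculation. The sketch assumes an i32 takes 4 bytes, a string is written as an i32 length followed by one byte per ASCII character, and a vint carries 7 bits of payload per byte. The HeaderSizes class and its method names are hypothetical.

```java
// Hypothetical helper that reproduces the header-size arithmetic.
final class HeaderSizes {
    static final int I32 = 4;

    // Assumption: i32 length prefix plus one byte per ASCII character.
    static int stringSize(int asciiLength) {
        return I32 + asciiLength;
    }

    // Current multiplexed header: two nested message headers.
    static int currentHeaderSize(int serviceNameLength, int functionNameLength) {
        return I32 + stringSize(serviceNameLength) + I32      // service message
             + I32 + stringSize(functionNameLength) + I32;    // function message
    }

    // Assumption: 7 payload bits per vint byte.
    static int vintSize(int value) {
        int size = 1;
        while ((value >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Proposed header: 1 byte version and type, vint function ID,
    // and an optional i32 sequence ID.
    static int proposedHeaderSize(int functionId, boolean withSequenceId) {
        return 1 + vintSize(functionId) + (withSequenceId ? I32 : 0);
    }
}
```

With a 15-character service name and a 10-character function name, currentHeaderSize(15, 10) is 4 + 19 + 4 + 4 + 14 + 4 = 49 bytes. The proposed header is 6 bytes for function IDs below 128 and 7 bytes for IDs up to 16383, which gives the saving of 43 or 42 bytes. Dropping the sequence ID leaves 2 or 3 bytes.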
Note that protocols must understand more header types than 'TMessage' for this to work. Maybe the clients and servers must be given the freedom to write the header instead of having a single message type. Not all clients and servers can have their own header format because that would defeat interoperability; instead, a number of header formats have to be agreed upon. For example:
- Current header format for single-service clients and servers.
- New header format for multiplexed clients and servers with support for:
  - a function to ID map for more efficient function selection;
  - indicating a sequence ID is not needed.
Removal of the Client and Processor classes will also remove a lot of duplicate code for handling calls from the generated classes. In my opinion this duplicate code can easily be transformed into generic code in client and server implementations.
Drop transports
===============
All servers currently require a transport, but the internal workings of a server are closely tied to a specific transport. Why have servers work with wrappers around sockets instead of working with sockets directly? Possibilities for server implementation will now be limited by the wrappers instead of allowing the full flexibility of the socket API.
In my opinion a transport is internal to a server. If a server can be implemented in a better way, the thin transport wrapper around sockets should not be a limiting factor. It would be very cumbersome if an API change of a 'TServerTransport' implementation is needed to be able to implement a server differently.
The only thing that needs to be handled is 'TFramedTransport' because this is a decorator. In my opinion it may be better to implement it as a protocol decorator which frames messages.
I don't know how this works out for clients but I think it would be very similar.
Asynchronous interface
======================
(Does anyone have a better name than 'asynchronous functions' so the 'async' keyword and sequence IDs cannot be confused?)
The addition of an asynchronous interface (to support sequence IDs) besides the synchronous one makes it easier to implement asynchronous calls (using sequence IDs). Asynchronous calls need to be handled differently by a client anyway, so it is best to make this distinction clear. For some languages only asynchronous interfaces might be generated. Here is a rough draft of how this would look if polling is used to retrieve the response:
interface Calculator.IfaceAsync {
  AsynchronousCall ping();
  IntAsynchronousCall add(int, int);
  IntAsynchronousCall calculate(int, Work);
  // zip() not added because it does not return a result
}
interface AsynchronousCall {
  boolean hasResponseArrived();
}
interface IntAsynchronousCall extends AsynchronousCall {
  int getResult();
}
interface StructAsynchronousCall<R> extends AsynchronousCall {
  R getResult();
}
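As an illustration of the polling variant, a client-side support class could implement IntAsynchronousCall roughly as follows. PendingIntCall and its complete method are hypothetical. The assumption is that the client's I/O code calls complete once it has read the response carrying the matching sequence ID.

```java
interface AsynchronousCall {
    boolean hasResponseArrived();
}

interface IntAsynchronousCall extends AsynchronousCall {
    int getResult();
}

// Hypothetical support class, not part of the proposal itself.
final class PendingIntCall implements IntAsynchronousCall {
    private boolean arrived;
    private int result;

    // Assumed to be called by the client's I/O code when the response arrives.
    synchronized void complete(int value) {
        result = value;
        arrived = true;
    }

    @Override
    public synchronized boolean hasResponseArrived() {
        return arrived;
    }

    @Override
    public synchronized int getResult() {
        if (!arrived) {
            throw new IllegalStateException("response has not arrived yet");
        }
        return result;
    }
}
```

The caller keeps the returned object, does other work, and checks hasResponseArrived() before calling getResult(). Whether getResult() should block or throw when called too early is a design decision the draft leaves open; this sketch throws.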

An alternative is to use events instead of polling:
interface Calculator.IfaceAsync {
  void ping(VoidAsynchronousCompletion);
  void add(int, int, IntAsynchronousCompletion);
  void calculate(int, Work, IntAsynchronousCompletion);
  // zip() not added because it does not return a result
}
interface VoidAsynchronousCompletion {
  void handleResponse();
}
interface IntAsynchronousCompletion {
  void handleResponse(int);
}
interface StructAsynchronousCompletion<R> {
  void handleResponse(R);
}
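For the event-based variant, the client needs some way to find the right completion when a response comes in. The following sketch keys pending completions by sequence ID. PendingCalls, register and dispatch are hypothetical names, and only the int-returning case is shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

interface IntAsynchronousCompletion {
    void handleResponse(int result);
}

// Hypothetical support class that matches responses to completions.
final class PendingCalls {
    private final AtomicInteger nextSequenceId = new AtomicInteger();
    private final Map<Integer, IntAsynchronousCompletion> pending =
            new ConcurrentHashMap<>();

    // Stores the completion and returns the sequence ID to send with the request.
    int register(IntAsynchronousCompletion completion) {
        int sequenceId = nextSequenceId.incrementAndGet();
        pending.put(sequenceId, completion);
        return sequenceId;
    }

    // Called when a response has been read. Returns false if no call
    // is waiting for this sequence ID.
    boolean dispatch(int sequenceId, int result) {
        IntAsynchronousCompletion completion = pending.remove(sequenceId);
        if (completion == null) {
            return false;
        }
        completion.handleResponse(result);
        return true;
    }
}
```

A generated add(int, int, IntAsynchronousCompletion) method could call register before writing the request, and the code that reads responses could call dispatch with the sequence ID from the header.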


What do you think? Would a redesign be useful and worth it?

--
Kind regards,

Johan Stuyts
