Re: Redesign of the Java API

Johan Stuyts Tue, 09 Sep 2008 08:54:49 -0700

I appreciate the attention to backward compatibility issues, but I don't think we should be so quick to fork the Java libraries at this stage in the game. While it may not be possible for every one of your suggestions below, I think in general it would be best if we could apply the improvements to the existing libraries rather than create new ones. This specter of backwards compatibility has been brought up a few times now in regard to big refactors (like changing packages, etc.), and I really think while we're in the Incubator, pre-release, we should just refactor what we want to refactor. We're never going to get such an open chance again. That's not to say we shouldn't take steps to ease the transition for people with existing code, but let's find a way that doesn't involve creating a new library and deprecating the old one.

The reason I would like to start fresh is that I think we can move more quickly if the old API is dropped. If the current API is kept, keeping up with incremental improvements could become a maintenance nightmare. I understand this is an enormous step and of course it should not be taken lightly.

- Support for changes to the IDL needed to support more formats (e.g. the compact format) if the IDL is changed.
It's true that it would be nice if the TProtocol interface was richer, because it would definitely make implementing the compact protocol easier. However, that said, I'm not convinced it's necessary. We've established that it can be done with a stateful TProtocol. (We also don't need IDL changes to implement the compact protocol, with the possible exception of extern strings.)

Most things I want to do can indeed be done. But the things I have to write feel like kludges instead of solutions that fit naturally into the Thrift framework.

What if everything could be simpler so people are more likely to contribute and Thrift can grow faster without being held back by a more cumbersome API? That is the advantage I want to gain with a redesign.

For example: one change to the IDL might be specifying the range of values a member or container element is allowed to have. If this range is passed to the protocol while writing, it is very easy for the protocol to determine how to encode a single value or a container of values. Sure, it can be done by inspecting the value or all the values in a container, but it is so much easier and faster if the information is handed to the protocol.
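To make the range idea concrete, here is a minimal sketch of how a protocol could pick an encoding width once it is handed the declared range. The class RangeAwareEncoder and its methods are hypothetical and exist nowhere in Thrift; the byte layout (big-endian offset from the minimum) is just an assumption for illustration.

```java
// Hypothetical sketch: RangeAwareEncoder is not part of Thrift.
final class RangeAwareEncoder {

  /** Bytes needed to store any value in [min, max] as an offset from min. */
  static int bytesFor(long min, long max) {
    if (min > max) {
      throw new IllegalArgumentException("min must not exceed max");
    }
    long span = max - min;
    if (span < 0) { // the subtraction overflowed, so the range is too wide
      throw new IllegalArgumentException("range too wide for this sketch");
    }
    int bytes = 1;
    while (bytes < 8 && (span >>> (8 * bytes)) != 0) {
      bytes++;
    }
    return bytes;
  }

  /** Encodes value as a big-endian offset from min, using the minimal width. */
  static byte[] encode(long value, long min, long max) {
    if (value < min || value > max) {
      throw new IllegalArgumentException("value outside declared range");
    }
    int width = bytesFor(min, max);
    long offset = value - min;
    byte[] out = new byte[width];
    for (int i = 0; i < width; i++) {
      out[i] = (byte) (offset >>> (8 * (width - 1 - i)));
    }
    return out;
  }
}
```

With a declared range of 0 to 65535 every value takes two bytes, and the protocol never has to inspect the values of a container to find that out.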

- Properly supporting optional members of structures.
This is done in the java:beans generator already. I do wonder why anyone would use the non-beans generator at all, though, so you might make the argument that the two paths should be unified.

Okay, I thought that maybe some library classes were involved. Great to hear that they are not.

- Removal of the name when writing the beginning of the structure as nobody seems to need this (https://issues.apache.org/jira/browse/THRIFT-8). There is no need to hang on to obsolete constructs and confuse new users.
This is part of the protocol interface, correct? Are you suggesting that we remove struct names from all of thrift, or just from the Java API? If it's unused, I'm pro-removal everywhere. However, we do have to decide if that is a limitation we want to have.

If nobody needs it, and therefore nobody is going to support it, it can be dropped from all languages as far as I am concerned. I created THRIFT-8 to get some clarification and got no response, so apparently nobody is using it. I understand the idea behind it and think that it can work (for some languages). At the moment support for it is incomplete and broken. The remnants in the classes might confuse new users, which should be avoided in my opinion.

- Remove the Client and Processor classes from generated code of services so server and client implementations have more freedom concerning how (service and) function selection information is communicated and processed, and how I/O is handled. See below.
Are you sure we should stop generating the code altogether? For most default cases, I doubt people are going to want to reimplement these pieces. I'd be for making the Processor interface clearer and easier to implement outside of code generation, though.

Yes, I want to remove all this code and move the code to client and server implementations. I absolutely do not want people to have to implement these procedures over and over again. I think the way requests are sent, received and processed can differ significantly between implementations and therefore the details belong to client and server implementations and not to the services themselves.

- Drop transports. See below.
I sort of like the transport abstraction. Not every use of transports is hidden behind a server, after all - Rapleaf does a lot of serializing structures to byte arrays and such. For the case of a transport and a server being intimately intertwined, I would say that there's nothing stopping you from not taking a Transport in the constructor and doing whatever you want internally. The only point at which you need to use a Transport is as soon as you want to rely on the generated code (de)serialization, and at that point you can use the memory based transports. I did something analogous in the TNonblockingServer.

Why would I need a TMemoryBuffer to be able to stream to memory, a file, etc.? A protocol could write directly to I/O streams. The extra layer of transports around I/O streams seems redundant to me.

The advantage a transport provides over I/O streams is a method to completely fill a byte array with input data. This function can easily be extracted to a helper function.
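Such a helper could look roughly like the sketch below. StreamHelper is a made-up name, not an existing Thrift class. The JDK already offers the same behavior through java.io.DataInputStream.readFully(byte[]), so wrapping the stream in a DataInputStream would be an alternative to writing the loop by hand.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: StreamHelper is not part of Thrift.
final class StreamHelper {

  /**
   * Reads from the stream until the buffer is completely filled.
   * A single InputStream.read call may return fewer bytes than requested,
   * so the loop keeps reading until nothing is missing.
   */
  static void readFully(InputStream in, byte[] buffer) throws IOException {
    int offset = 0;
    while (offset < buffer.length) {
      int count = in.read(buffer, offset, buffer.length - offset);
      if (count < 0) {
        throw new EOFException(
            "Stream ended after " + offset + " of " + buffer.length + " bytes");
      }
      offset += count;
    }
  }
}
```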

The only issue is with TFramedTransport, because it is hard to define what it should frame if it were implemented as a protocol: a message (service invocation), a structure (serialization), ...

- Drop the many constructors of a number of classes and replace them with a single constructor taking an options object.
I see where you're going with this, but doesn't this just mean that you now have to manually code the validation of order and presence of certain parameters in certain situations? Certainly the existing scenario of tons of constructors is overly verbose, but at least it avoids the bugs that would show up if we tried to do it ourselves.

No, it should be easy. You just need a single validation function for the options object passed in, and call this from the constructor. A simplistic example:

static void validate(Options o) {
  if (o.getInputTransportFactory() == null) {
    throw new IllegalArgumentException("Input transport factory cannot be 'null'");
  }
  if (<some condition>) {
    if (o.getX() == null) {
      throw new IllegalArgumentException("X cannot be 'null'");
    }
  }
  ...
}

The options object needs to have convenience functions so it is easy to create a valid instance. For example:

void setTransportFactory(TTransportFactory);
void setInputTransportFactory(TTransportFactory);
void setOutputTransportFactory(TTransportFactory);

I use this pattern all the time and it greatly simplifies my code:

- no management of configuration values in the object itself, i.e. everything is moved to the options object;
- a single location containing the validation rules: one method to validate whether an options object is correct;
- easy to check if the object is in a correctly configured state in methods: just check for a non-null options object (only needed when using two-stage construction).
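Put together, the pattern could look like the following self-contained sketch. All names here (TransportFactory, ServerOptions, Server) are placeholders for illustration: TransportFactory stands in for Thrift's TTransportFactory so the example compiles on its own, and the rule that both factories are required is an assumption.

```java
// Stand-in for Thrift's TTransportFactory so the sketch compiles on its own.
interface TransportFactory {}

// Hypothetical options object holding all configuration values.
final class ServerOptions {
  private TransportFactory inputTransportFactory;
  private TransportFactory outputTransportFactory;

  // Convenience setter: use the same factory for both directions.
  void setTransportFactory(TransportFactory factory) {
    inputTransportFactory = factory;
    outputTransportFactory = factory;
  }

  void setInputTransportFactory(TransportFactory factory) {
    inputTransportFactory = factory;
  }

  void setOutputTransportFactory(TransportFactory factory) {
    outputTransportFactory = factory;
  }

  TransportFactory getInputTransportFactory() {
    return inputTransportFactory;
  }

  TransportFactory getOutputTransportFactory() {
    return outputTransportFactory;
  }

  // The single location containing the validation rules.
  static void validate(ServerOptions o) {
    if (o == null) {
      throw new IllegalArgumentException("Options cannot be 'null'");
    }
    if (o.getInputTransportFactory() == null) {
      throw new IllegalArgumentException("Input transport factory cannot be 'null'");
    }
    if (o.getOutputTransportFactory() == null) {
      throw new IllegalArgumentException("Output transport factory cannot be 'null'");
    }
  }
}

// Hypothetical server with one constructor instead of many overloads.
final class Server {
  private final ServerOptions options;

  Server(ServerOptions options) {
    ServerOptions.validate(options);
    this.options = options;
  }

  ServerOptions getOptions() {
    return options;
  }
}
```

A caller then creates a ServerOptions instance, calls setTransportFactory once, and passes the instance to the Server constructor. A real implementation would probably also copy the options in the constructor so later changes to the caller's instance cannot invalidate the server.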

- Drop the 'T' prefix of types as this is not customary in Java.
I'm mostly pro on this one, though there are a few situations that would yield name clashes if we did this to every class (TException -> Exception, for one).

Name clashes with standard classes should be avoided; the less confusion the better. Where a clash would occur, 'Thrift' can be used as the prefix instead of 'T'.

In general I would like to say that the reason I posted this is that I have a working prototype of a non-Thrift, non-blocking, multiplexing server which does not require the use of framing. I am about to build a Thrift implementation based on it and I ran into some issues (incomplete list):
- the need for a thread-local protocol implementation so a Client or Processor can be used from multiple threads;
- the impossibility of detecting that a function has the 'async' keyword, so I cannot start these functions in a background thread and release the communication channel;
- the inability to write an efficient header for a multiplexed implementation.

If things were less tightly coupled, and the request processing was not hardcoded in the services, the implementation would be much easier.


--
Kind regards,

Johan Stuyts
