I appreciate the attention to backward compatibility issues, but I don't think we should be so quick to fork the Java libraries at this stage in the game. While it may not be possible in every regards to every one of your suggestions below, I think in general it would be best if we could apply the improvements to the existing libraries rather than create new ones. This specter of backwards compatibility has been brought up a few times now in regards to big refactors (like changing packages, etc), and I really think while we're in the Incubator, pre-release, we should just refactor what we want to refactor. We're never going to get such an open chance again. That's not to say we shouldn't take steps to ease the transition for people with existing code, but let's find a way that doesn't involve creating a new library and deprecating the old one.

The reason why I would like to start fresh is because I think we can move quicker if the old API is dropped. The advantage I see with dropping the old API completely is that if the current API is kept, keeping up with incremental improvements could become a maintenance nightmare. I understand this is an enormous step and of course it should not be taken lightly.

- Support for changes to the IDL needed to support more formats (e.g. the compact format) if the IDL is changed.
It's true that it would be nice if the TProtocol interface was richer, because it would definitely make implementing the compact protocol easier. However, that said, I'm not convinced its necessary. We've established that it can be done with a stateful TProtocol. (We also don't need IDL changes to implement the compact protocol, with the possible exception of extern strings.)

Most things I want to do can indeed be done. But the things I have to write feel like kludges instead of solutions that fit naturally into the Thrift framework.

What if everything could be simpler so people are more likely to contribute and Thrift can grow faster without being held back by a more cumbersome API? That is the advantage I want to gain with a redesign.

For example: one change to the IDL might be specifying the range of values a member or container element is allowed to have. By passing this range to the protocol while writing it is very easy for the protocol to determine how to encode a single value or a container of values. Sure, it can be done by inspecting the value or all the values in a container, but it is so much easier and faster if the information is handed to the protocol.

- Properly supporting optional members of structures.
This is done in the java:beans generator already. I do wonder why anyone would use the non-beans generator at all, though, so you might make the argument that the two paths should be unified.

Okay, I thought that maybe some library classes were involved. Great to hear that they are not.

- Removal of the name when writing the beginning of the structure as nobody seems to need this (https://issues.apache.org/jira/browse/ THRIFT-8). There is no need to hang on to obsolete constructs and confuse new users.
This is part of the protocol interface, correct? Are you suggesting that we remove struct names from all of thrift, or just from the Java API? If it's unused, I'm pro-removal everywhere. However, we do have to decide if that is a limitation we want to have.

If nobody needs it, and therefore nobody is going to support it, it can be dropped from all languages as far as I am concerned. I created THRIFT-8 to get some clarification and got no response so apparently nobody is using it. I understand the idea behind it and think that it can work (for some languages). At the moment support for it is incomplete and broken. The remnants in the classes might confuse new users which should be avoided in my opinion.

- Remove the Client and Processor classes from generated code of services so server and client implementations have more freedom concerning how (service and) function selection information is communicated and processed, and how I/O is handled. See below.
Are you sure we should stop generating the code altogether? For most default cases, I doubt people are going to want to reimplement these pieces. I'd be for making the Processor interface clearer and easier to implement outside of code generation, though.

Yes, I want to remove all this code and move the code to client and server implementations. I absolutely do not want people to have to implement these procedures over and over again. I think the way requests are sent, received and processed can differ significantly between implementations and therefore the details belong to client and server implementations and not to the services themselves.

- Drop transports. See below.
I sort of like the transport abstraction. Not every use of transports is hidden behind a server, after all - Rapleaf does a lot of serializing structures to byte arrays and such. For the case of a transport and a server being intimately intertwined, I would say that there's nothing stopping you from not taking a Transport in the constructor and doing whatever you want internally. The only point at which you need to use a Transport is as soon as you want to rely on the generated code (de)serialization, and at that point you can use the memory based transports. I did something analogous in the TNonblockingServer.

Why would I need a TMemoryBuffer to be able to stream to memory, a file, etc.? A protocol could write directly to I/O streams. The extra layer of transports around I/O streams seems redundant to me.

The advantage a transport provides over I/O streams is a method to completely fill a byte array with input data. This function can easily be extracted to a helper function.

The only issue is with TFramedTransport because it is hard to define what the transport should frame if it was implemented as a protocol: a message (service invocation), a structure (serialization), ...

- Drop the many constructors of a number of classes and replace them with a single constructor taking an options object.
I see where you're going with this, but doesn't this just mean that you now have to manually code the validation of order and presence of certain parameters in certain situations? Certainly the existing scenario of tons of constructors is overly verbose, but at least it avoids the bugs that would show up if we tried to do it ourself.

No, is should be easy. You just need a single validation function for the option object passed in, and call this from the constructor. A simplistic example:
static void validate(Options o) {
  if (o.getInputTransportFactory() == null) {
throw new IllegalArgumentException("Input transport factory cannot be 'null'");
  }
  if (<some condition>) {
    if (o.getX() == null) {
      throw new IllegalArgumentException("X cannot be 'null'");
    }
  }
  ...
}

The options object needs to have convenience functions so it is easy to create a valid instance. For example:
void setTransportFactory(TTransportFactory);
void setInputTransportFactory(TTransportFactory);
void setOutputTransportFactory(TTransportFactory);

I use this pattern all the time and it greatly simplifies my code:
- no management of configuration values in the object itself, i.e. everything is moved to the options object; - a single location containing the validation rules: one method to validate whether an options object is correct; - easy to check if the object is in a correctly configured state in methods: just check for a non-null options object (only needed when using two-stage construction).

- Drop the 'T' prefix of types as this is not customary in Java.
I'm mostly pro on this one, though there are a few situations that would yield name clashes if we did this to every class (TException -> Exception, for one).

Name clashes with standard classes should be avoided; the less confusion the better. Instead of 'T' 'Thrift' can be used then.


In general I would like to say that the reason I posted this is that I have a working prototype of a non-Thrift, non-blocking, multiplexing server which does not require the use of framing. I am about to build a Thrift implementation based on it and I ran into some issues (incomplete list): - the need for a thread-local protocol implementation so a Client or Processor can be used from multiple threads; - the impossibility to detect that a function has the 'async' keyword so I cannot start these functions in a background thread and release the communication channel; - the inability to write an efficient header for a multiplexed implementation.

If things were less tightly coupled, and the request processing was not hardcoded in the services, the implementation would be much easier.

--
Kind regards,

Johan Stuyts

Reply via email to