I appreciate the attention to backward compatibility issues, but I don't
think we should be so quick to fork the Java libraries at this stage in
the game. While it may not be possible in every regards to every one of
your suggestions below, I think in general it would be best if we could
apply the improvements to the existing libraries rather than create new
ones. This specter of backwards compatibility has been brought up a few
times now in regards to big refactors (like changing packages, etc), and
I really think while we're in the Incubator, pre-release, we should just
refactor what we want to refactor. We're never going to get such an open
chance again. That's not to say we shouldn't take steps to ease the
transition for people with existing code, but let's find a way that
doesn't involve creating a new library and deprecating the old one.
The reason why I would like to start fresh is because I think we can move
quicker if the old API is dropped. The advantage I see with dropping the
old API completely is that if the current API is kept, keeping up with
incremental improvements could become a maintenance nightmare. I
understand this is an enormous step and of course it should not be taken
lightly.
- Support for changes to the IDL needed to support more formats (e.g.
the compact format) if the IDL is changed.
It's true that it would be nice if the TProtocol interface was richer,
because it would definitely make implementing the compact protocol
easier. However, that said, I'm not convinced its necessary. We've
established that it can be done with a stateful TProtocol. (We also
don't need IDL changes to implement the compact protocol, with the
possible exception of extern strings.)
Most things I want to do can indeed be done. But the things I have to
write feel like kludges instead of solutions that fit naturally into the
Thrift framework.
What if everything could be simpler so people are more likely to
contribute and Thrift can grow faster without being held back by a more
cumbersome API? That is the advantage I want to gain with a redesign.
For example: one change to the IDL might be specifying the range of values
a member or container element is allowed to have. By passing this range to
the protocol while writing it is very easy for the protocol to determine
how to encode a single value or a container of values. Sure, it can be
done by inspecting the value or all the values in a container, but it is
so much easier and faster if the information is handed to the protocol.
- Properly supporting optional members of structures.
This is done in the java:beans generator already. I do wonder why anyone
would use the non-beans generator at all, though, so you might make the
argument that the two paths should be unified.
Okay, I thought that maybe some library classes were involved. Great to
hear that they are not.
- Removal of the name when writing the beginning of the structure as
nobody seems to need this (https://issues.apache.org/jira/browse/
THRIFT-8). There is no need to hang on to obsolete constructs and
confuse new users.
This is part of the protocol interface, correct? Are you suggesting that
we remove struct names from all of thrift, or just from the Java API? If
it's unused, I'm pro-removal everywhere. However, we do have to decide
if that is a limitation we want to have.
If nobody needs it, and therefore nobody is going to support it, it can be
dropped from all languages as far as I am concerned. I created THRIFT-8 to
get some clarification and got no response so apparently nobody is using
it. I understand the idea behind it and think that it can work (for some
languages). At the moment support for it is incomplete and broken. The
remnants in the classes might confuse new users which should be avoided in
my opinion.
- Remove the Client and Processor classes from generated code of
services so server and client implementations have more freedom
concerning how (service and) function selection information is
communicated and processed, and how I/O is handled. See below.
Are you sure we should stop generating the code altogether? For most
default cases, I doubt people are going to want to reimplement these
pieces. I'd be for making the Processor interface clearer and easier to
implement outside of code generation, though.
Yes, I want to remove all this code and move the code to client and server
implementations. I absolutely do not want people to have to implement
these procedures over and over again. I think the way requests are sent,
received and processed can differ significantly between implementations
and therefore the details belong to client and server implementations and
not to the services themselves.
- Drop transports. See below.
I sort of like the transport abstraction. Not every use of transports is
hidden behind a server, after all - Rapleaf does a lot of serializing
structures to byte arrays and such. For the case of a transport and a
server being intimately intertwined, I would say that there's nothing
stopping you from not taking a Transport in the constructor and doing
whatever you want internally. The only point at which you need to use a
Transport is as soon as you want to rely on the generated code
(de)serialization, and at that point you can use the memory based
transports. I did something analogous in the TNonblockingServer.
Why would I need a TMemoryBuffer to be able to stream to memory, a file,
etc.? A protocol could write directly to I/O streams. The extra layer of
transports around I/O streams seems redundant to me.
The advantage a transport provides over I/O streams is a method to
completely fill a byte array with input data. This function can easily be
extracted to a helper function.
The only issue is with TFramedTransport because it is hard to define what
the transport should frame if it was implemented as a protocol: a message
(service invocation), a structure (serialization), ...
- Drop the many constructors of a number of classes and replace them
with a single constructor taking an options object.
I see where you're going with this, but doesn't this just mean that you
now have to manually code the validation of order and presence of
certain parameters in certain situations? Certainly the existing
scenario of tons of constructors is overly verbose, but at least it
avoids the bugs that would show up if we tried to do it ourself.
No, is should be easy. You just need a single validation function for the
option object passed in, and call this from the constructor. A simplistic
example:
static void validate(Options o) {
if (o.getInputTransportFactory() == null) {
throw new IllegalArgumentException("Input transport factory cannot be
'null'");
}
if (<some condition>) {
if (o.getX() == null) {
throw new IllegalArgumentException("X cannot be 'null'");
}
}
...
}
The options object needs to have convenience functions so it is easy to
create a valid instance. For example:
void setTransportFactory(TTransportFactory);
void setInputTransportFactory(TTransportFactory);
void setOutputTransportFactory(TTransportFactory);
I use this pattern all the time and it greatly simplifies my code:
- no management of configuration values in the object itself, i.e.
everything is moved to the options object;
- a single location containing the validation rules: one method to
validate whether an options object is correct;
- easy to check if the object is in a correctly configured state in
methods: just check for a non-null options object (only needed when using
two-stage construction).
- Drop the 'T' prefix of types as this is not customary in Java.
I'm mostly pro on this one, though there are a few situations that would
yield name clashes if we did this to every class (TException ->
Exception, for one).
Name clashes with standard classes should be avoided; the less confusion
the better. Instead of 'T' 'Thrift' can be used then.
In general I would like to say that the reason I posted this is that I
have a working prototype of a non-Thrift, non-blocking, multiplexing
server which does not require the use of framing. I am about to build a
Thrift implementation based on it and I ran into some issues (incomplete
list):
- the need for a thread-local protocol implementation so a Client or
Processor can be used from multiple threads;
- the impossibility to detect that a function has the 'async' keyword so I
cannot start these functions in a background thread and release the
communication channel;
- the inability to write an efficient header for a multiplexed
implementation.
If things were less tightly coupled, and the request processing was not
hardcoded in the services, the implementation would be much easier.
--
Kind regards,
Johan Stuyts