Hey Chet,
I have thought on this a bit and considered some implementation
possibilities. I want
to make sure I understand your view and get your thoughts on a struct based
approach
to the problem.
Here's a summary:
>>>>>
Situation:
1. There are many uses for a simple text protocol, particularly
reading/writing configs
2. Embedding schema (or even just the ordinals) in the serialized stream is
undesirable [e.g. below could work today but is too messy]
3. Without embedded schema/ordinals there is no way for a plugin protocol
today to figure out what it is reading
Proposed (cleanest/lowest impact) Fix:
1. Modify the compiler to emit a schema for every struct in the types files
(struct types include: service method args, structs, exceptions and unions)
2. Modify TProtocol writeStructBegin() and readStructBegin() to accept a
schema arg
3. Modify the compiler to pass the appropriate schema to all calls to
TProtocol write/readStructBegin() **
Upshot:
1. Existing protos would simply ignore the schema arg and new protos
(text/JSON/YAML/whatever) [de]serialize field names as desired.
2. This would be a breaking change, though code could be brought up to date
by recompiling IDL and building (without src changes).
<<<<<
** I realize that only the read side needs the schema/ordinal mapping but
something tells me we'll be sorry if we don't maintain the symmetry
Did I miss anything or get anything wrong?
-Randy
{
"__schema__": {
"1": {
"name": "a",
"type": "i32"
},
"2": {
"name": "b",
"type": "str"
},
"3": {
"name": "c",
"type": {
"1": {
"name": "a",
"type": "i32"
},
"2": {
"name": "b",
"type": "str"
}
}
}
},
"a": 1,
"b": "ugh",
"c": {
"a": 2,
"b": "argh"
}
}
On Mon, Oct 2, 2017 at 12:40 PM, Chet Murthy <[email protected]> wrote:
> Randy,
>
> [There -is- a way that this could be done without modifying the interface
> of TProtocol, but it's pretty involved/abstruse; I'll write about this in a
> second email.]
>
> I think that adding such self-describing type-information would vitiate the
> value of the format. Let me try to convince you.
>
> TL;DR -- config-files need to be human-readable/editable, and maintaining
> what is arguably superfluous type-information in the config-file is never
> going to be appealing to humans.
>
> (1) in a former life, I worked for a company that used protobufs
> extensively for comms. It was really really nice (for all the reasons that
> people who use Thrift a lot are aware of) -- you could count on data being
> strongly-typed, and yet the marshalling tech was efficient and had a
> modicum of version-to-version compatibility.
>
> (2) One thing that I found really lovely, was that they also expressed
> config-files in protobufs. So almost everywhere, you found code dealing
> with configuration -objects-, and not YAML files, CSV, INI, etc. There was
> -one- representation in-memory, and it was as protobuf objects. This was
> great, for the same reason that storing data in protobufs was great.
>
> (3) Now, config-files are different from -data- -- Humans read and edit
> config-files. So having a nice human-readable/editable format for
> protobufs, was critical to this use-case. Turns out, protobufs 2.0 has
> one: "CompactText". And protobufs 3.0 has added a new one: "JSON". Both
> of these have the property that there's no (to the human) superfluous type
> information to keep accurately up-to-date. In the "CompactText"
> representation, the example from my first email would look like:
>
> a: 1 b: "ugh" c: < a: 2 b: "argh" >
>
> So: a lot like JSON, but even more bare-bones.
>
> The key thing here, and in the JSON example, is that the config-file needs
> to be somewhat reasonably near to what a human would have written in the
> first place. Adding type-information for the marshaller breaks that
> property.
>
> (4) Another use-case: where you could have an RPC server endpoint that
> presented a human-readable wireline, and users could invoke it (for
> learning, debugging, etc) using something as simple as "nc" and a
> shellscript) is also pretty much vitiated by requring the invoker to pass
> along complex type-information.
>
> --chet--
>
>
>
> On Mon, Oct 2, 2017 at 10:20 AM, Randy Abernethy <
> [email protected]>
> wrote:
>
> > I see. Are you opposed to serializing the mapping? The proto could buffer
> > writes and collect the mappings, then on writeStructEnd() you could emit
> > the map (maybe as an __map__ attribute or something) followed by the
> data.
> > The read side could read the map in response to readStructBegin(). Not
> only
> > would this require no mods to Thrift but it would have the added
> advantage
> > of making your wire format self describing. Kind of like Avro.
> >
> > Thoughts?
> >
> > On Mon, Oct 2, 2017 at 10:00 AM, Chet Murthy <[email protected]>
> > wrote:
> >
> > > Randy,
> > >
> > > Thank you for your questions! I'm hoping that I'm mistaken, and maybe
> > via
> > > this conversation, you can help me figure out that indeed I am.
> > >
> > > (1) you're right, that the writeFieldBegin method is passed the
> > field-name,
> > > so it can write it on the JSON wire.
> > >
> > > (2) the problem is, readFieldBegin can read that back, but it cannot
> > > *infer* the fieldid from that name, and the *fieldid* is what's used in
> > > generated code to drive the switch for demarshalling. Concretely, in
> > your
> > > example, even if "fname" were set to either "f1" or "f2", the switch
> > logic
> > > is driven by fid being set to either 1 or 2. And there's no way for
> that
> > > to happen in a TProtocol, and specifically TSimpleJSONProtocol doesn't
> do
> > > it. But generally, there's no way for it to happen, b/c inferring
> > fieldid
> > > from fieldname depends in which message is being demarshalled, and the
> > > *protocol* object doesn't have access to type-(IDL)-information at all.
> > >
> > > I haven't yet implemented the change I contemplate, only b/c I wanted
> to
> > > find out how open Thrift was, to such a change. But I can do so, if
> > it'll
> > > help to explain what I mean -- it isn't difficult.
> > >
> > > --chet--
> > >
> > > On Mon, Oct 2, 2017 at 9:41 AM, Randy Abernethy <[email protected]> wrote:
> > >
> > > > Hi Chet,
> > > >
> > > > You say there is no mapping between the field names and type/ids, yet
> > > every
> > > > struct (including param structs) hands just such data to the proto on
> > > > write. Why are the field string names supplied to the
> > > > TProtocol::writeFieldBegin method by the generated struct code
> > > > insufficient? The write code passes the proto the field name, type
> and
> > > id;
> > > > and the read is offered the opportunity to return them. Sounds like
> > > > everything your new protocol would need is supplied. As per Jens you
> > just
> > > > need to serialize the data provided the way you want it (swapping
> field
> > > > names for ids).
> > > >
> > > > What am I missing (I'm guessing something :-)?
> > > >
> > > > For example, thrift IDL (notice bold items):
> > > >
> > > > >>>>>>>>>>>>
> > > >
> > > > struct data {
> > > > 1: i16 *f1*
> > > > 2: i16 *f2*
> > > > }
> > > >
> > > > <<<<<<<<<<<<<
> > > >
> > > > generates c++ write:
> > > >
> > > > >>>>>>>>>>
> > > >
> > > > uint32_t data::write(::apache::thrift::protocol::TProtocol* oprot)
> > > const {
> > > > uint32_t xfer = 0;
> > > > ::apache::thrift::protocol::TOutputRecursionTracker
> tracker(*oprot);
> > > > xfer += oprot->writeStructBegin("data");
> > > >
> > > > xfer += oprot->writeFieldBegin("*f1*",
> ::apache::thrift::protocol::T_
> > > > I16,
> > > > 1);
> > > > xfer += oprot->writeI16(this->f1);
> > > > xfer += oprot->writeFieldEnd();
> > > >
> > > > xfer += oprot->writeFieldBegin("*f2*",
> ::apache::thrift::protocol::T_
> > > > I16,
> > > > 2);
> > > > xfer += oprot->writeI16(this->f2);
> > > > xfer += oprot->writeFieldEnd();
> > > >
> > > > xfer += oprot->writeFieldStop();
> > > > xfer += oprot->writeStructEnd();
> > > > return xfer;
> > > > }
> > > >
> > > > <<<<<<<<<<<<<
> > > >
> > > > and read:
> > > >
> > > > >>>>>>>>>>>>>
> > > >
> > > > uint32_t data::read(::apache::thrift::protocol::TProtocol* iprot) {
> > > >
> > > > ::apache::thrift::protocol::TInputRecursionTracker
> tracker(*iprot);
> > > > uint32_t xfer = 0;
> > > > std::string fname;
> > > > ::apache::thrift::protocol::TType ftype;
> > > > int16_t fid;
> > > >
> > > > xfer += iprot->readStructBegin(fname);
> > > >
> > > > using ::apache::thrift::protocol::TProtocolException;
> > > >
> > > > while (true)
> > > > {
> > > > xfer += iprot->readFieldBegin(*fname*, ftype, fid);
> > > > if (ftype == ::apache::thrift::protocol::T_STOP) {
> > > > break;
> > > > }
> > > > switch (fid)
> > > > {
> > > > case 1:
> > > > if (ftype == ::apache::thrift::protocol::T_I16) {
> > > > xfer += iprot->readI16(this->f1);
> > > > this->__isset.f1 = true;
> > > > } else {
> > > > xfer += iprot->skip(ftype);
> > > > }
> > > > break;
> > > > case 2:
> > > > if (ftype == ::apache::thrift::protocol::T_I16) {
> > > > xfer += iprot->readI16(this->f2);
> > > > this->__isset.f2 = true;
> > > > } else {
> > > > xfer += iprot->skip(ftype);
> > > > }
> > > > break;
> > > > default:
> > > > xfer += iprot->skip(ftype);
> > > > break;
> > > > }
> > > > xfer += iprot->readFieldEnd();
> > > > }
> > > >
> > > > xfer += iprot->readStructEnd();
> > > >
> > > > return xfer;
> > > > }
> > > >
> > > > <<<<<<<<<<<<<
> > > >
> > > >
> > > > --Randy
> > > >
> > > >
> > > > On Sat, Sep 30, 2017 at 6:38 AM, Edward Capriolo <
> > [email protected]>
> > > > wrote:
> > > >
> > > > > Also i wonder if what is meant by human readable is simple a clever
> > way
> > > > to
> > > > > generate pcap modules so tools like wireshark/tcp dump can read the
> > > data.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Sep 28, 2017 at 3:49 PM, Jens Geyer <[email protected]
> >
> > > > wrote:
> > > > >
> > > > > > Hi Chet,
> > > > > >
> > > > > > well, Thrift is primarily about efficiency, not human
> readability.
> > If
> > > > > > machines and programs talk to each other, nobody really needs
> human
> > > > > > readable
> > > > > > messages, because there are no humans involved, except maybe for
> > > > > debugging
> > > > > > (but that's not a real production use case). If one asked you to
> > > pick
> > > > > just
> > > > > > one single feature about any Serialization and RPC library,
> > > potentially
> > > > > > sacrificing any other requirement if needed, you probably would
> > > answer
> > > > > that
> > > > > > it should be as fast and efficient as possible.
> > > > > >
> > > > > > I only wonder if the human readability has sth to do with the
> fact
> > > that
> > > > > > gRPC
> > > > > > is often found being slower than Thrift ... ;-)
> > > > > >
> > > > > > You still want a human readable fomat? Ok, here's how to do it.
> > > Thrift
> > > > > > indeed offers the ability to achieve that, because it is a
> > framework.
> > > > For
> > > > > > example, look at the implementation of the TSimpleJSONProtocol
> > (link
> > > > > below)
> > > > > > and use this as a starting point to write your own JSON-like
> > > TProtocol
> > > > > > implementation that suits your needs. That's what makes Thrift so
> > > > > flexible
> > > > > > -
> > > > > > even if you have special needs, you need to replace only those
> > parts
> > > > and
> > > > > it
> > > > > > still simply works. If you prefer XML or some other format, even
> > that
> > > > > > should
> > > > > > be feasible, but you have to invest some work either way.
> > > > > >
> > > > > > https://github.com/apache/thrift/blob/master/lib/java/
> > > > > > src/org/apache/thrift/protocol/TSimpleJSONProtocol.java
> > > > > >
> > > > > > Does that help you?
> > > > > >
> > > > > > Have fun,
> > > > > > JensG
> > > > > >
> > > > > >
> > > > > > -----Ursprüngliche Nachricht-----
> > > > > > From: Chet Murthy
> > > > > > Sent: Thursday, September 28, 2017 3:04 AM
> > > > > > To: [email protected]
> > > > > > Subject: Human-readable wire-format for Thrift?
> > > > > >
> > > > > > [I hope I'm sending this mail to the right list -- it wasn't
> clear
> > to
> > > > me
> > > > > > that it should go to thrift-dev, so I figured I'd send it here
> > > first.]
> > > > > >
> > > > > > The -one- thing that protobufs has going for it, over Thrift, is
> > that
> > > > > > protobufs has "CompactTextFormat" (and JSON too) as full
> > > wire-formats.
> > > > > > This is .... incredibly useful for the following use-case:
> > > > > >
> > > > > > You want to write a config-file format, and you want to get the
> > > > benefits
> > > > > of
> > > > > > version-to-version compatibility. In your program, you'd like to
> > > > access
> > > > > a
> > > > > > strongly-typed "config object" with typed fields, and you'd
> -like-
> > > for
> > > > > > marshalling to/from flat-text to be automatically generated.
> > > > > >
> > > > > > I have personal experience with using protobufs in exactly this
> > way,
> > > > and
> > > > > > it's really, really, really nice.
> > > > > >
> > > > > > The current Thrift JSON protocol isn't designed for this, and
> given
> > > the
> > > > > > interface of the (C++) TProtocol class, I think it isn't
> possible.
> > > But
> > > > > > with a small change, it -would- be possible, so I thought I'd
> > > describe
> > > > > the
> > > > > > change, and see what you all thought (b/c it would require a
> change
> > > to
> > > > > > generated code, and to the TProtocol base class interfaces
> > > > (specifically
> > > > > to
> > > > > > the readFieldBegin method):
> > > > > >
> > > > > > [I'll describe this for the C++ generated code; I haven't looked
> > > > > carefully
> > > > > > into the rest of the languages, but I'd guess that something
> could
> > be
> > > > > > done.]
> > > > > >
> > > > > > (0) Let me first note that these datastructures are constant, and
> > > we're
> > > > > > talking about passing an extra parameter to the read method
> listed
> > > > above.
> > > > > > That's it.
> > > > > >
> > > > > > (1) For concreteness, imagine a couple of message types
> > > > > >
> > > > > > struct Bar {
> > > > > > 4: required i32 a ,
> > > > > > 5: required string b,
> > > > > > }
> > > > > >
> > > > > > struct Foo {
> > > > > > 1: required i32 a ,
> > > > > > 2: required string b,
> > > > > > 3: required Bar c,
> > > > > > }
> > > > > >
> > > > > > Again for concreteness, here's an example of the JSON protoocol
> > for a
> > > > > value
> > > > > > of type Foo:
> > > > > >
> > > > > > {
> > > > > > "1": {
> > > > > > "i32": 1
> > > > > > },
> > > > > > "2": {
> > > > > > "str": "ugh"
> > > > > > },
> > > > > > "3": {
> > > > > > "rec": {
> > > > > > "4": {
> > > > > > "i32": 2
> > > > > > },
> > > > > > "5": {
> > > > > > "str": "argh"
> > > > > > }
> > > > > > }
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > (2) I'd prefer that that look like:
> > > > > > {
> > > > > > "a": 1,
> > > > > > "b": "ugh",
> > > > > > "c": {
> > > > > > "a": 2,
> > > > > > "b": "argh"
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > (3) For each message-type, we need a mapping field-name ->
> > > > > > pair<Thrift-type, field-id>. So, generate a constant
> > data-structure
> > > of
> > > > > > type
> > > > > >
> > > > > > map<string, pair<Type, int16_t> >
> > > > > >
> > > > > > for each message-type.
> > > > > >
> > > > > > (3) Marshalling is easy -- all the field-names are known, and we
> > > could
> > > > > just
> > > > > > emit those instead of field-ids; similarly, we could skip putting
> > > > > > type-information in the wire-format too.
> > > > > >
> > > > > > (4) At demarshalling time, we always know the type of the message
> > > we're
> > > > > > demarshalling. So as we read field-names, we can use the map in
> #3
> > > to
> > > > > look
> > > > > > up TType and field-id, and then just demarshal in the normal way.
> > We
> > > > > just
> > > > > > need to pass that map as a constref to readFieldBegin.
> > > > > >
> > > > > > I -think- that that works, and can't find any problems with what
> > I've
> > > > > > described.
> > > > > >
> > > > > > I can make this change to the C++ library and code-generator, but
> > > > before
> > > > > I
> > > > > > start down that path, I figured I should get some input on
> whether
> > > this
> > > > > is
> > > > > > something that the Thrift community (and maintainers) would
> accept?
> > > > > >
> > > > > > I think that a human-readable/writable wire would be immensely
> > > > valuable,
> > > > > > and not just for the example of config-files.
> > > > > >
> > > > > > Your feedback appreciated,
> > > > > > --chet--
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>