[ 
https://issues.apache.org/jira/browse/THRIFT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670813#action_12670813
 ] 

Bryan Duxbury commented on THRIFT-110:
--------------------------------------


bq. First: allow casting from bool to int, so that you can send the integer 
values 0 and 1 as boolean-false and boolean-true respectively.

That would be cool, but it is one of those changes that would require all of 
Thrift to change, too. I'm not usually one to avoid sweeping changes when I 
see a benefit, but right now I'm not in favor of changing the Thrift 
interface. We're talking about every library, protocol, and code generator 
changing to match a different protocol/struct interface. While Ben has 
mentioned this a few times, I haven't seen a complete proposal for it yet, 
and it's definitely a nontrivial change. 

For the sake of expediency, I'd really like to limit the scope of the 
discussion of this protocol to the *current* Thrift interface. Changing Thrift 
to be even more compact isn't going to happen in the near future (and possibly 
not before our first release), while this protocol implementation could be 
committed, working, now. 

bq. Second: make up your mind--are you using zigzag ints or not?

Zigzags are important in this protocol, but they're not used in every 
situation. User-entered data and field ids can be negative, so I have to 
protect against worst-case sign extension by using zigzag (a negative value 
written as a plain varint takes the maximum number of bytes). Other things, 
like list and string lengths, are uniformly non-negative, so zigzagging them 
would be a waste.

Also, while it would be nice to have specific-sized int headers available, 
that doesn't help me when I'm in a map or list/set and have to use one type 
header without knowing the range of values up front. Zigzag lets me just put 
values in there and still get pretty respectable compression.
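
To make the zigzag/varint distinction concrete, here is a minimal Python 
sketch. It is an illustration under assumptions, not the code from the 
attached patches: the function names are hypothetical, and it assumes the 
usual base-128 varint (7 payload bits per byte, high bit set on every byte 
except the last) and 32-bit signed values.

```python
def zigzag_encode(n: int, bits: int = 32) -> int:
    """Map a signed int to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(u: int) -> bytes:
    """Base-128 varint for a non-negative int (assumed wire format)."""
    if u < 0:
        raise ValueError("varint_encode expects a non-negative value")
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def write_signed(n: int, bits: int = 32) -> bytes:
    """Possibly negative values (user data, field ids): zigzag, then varint."""
    return varint_encode(zigzag_encode(n, bits))


def write_length(n: int) -> bytes:
    """Never-negative values (list/string lengths): plain varint, no zigzag."""
    return varint_encode(n)


def write_signed_without_zigzag(n: int, bits: int = 32) -> bytes:
    """What happens if a negative value skips zigzag: sign bits fill the word."""
    return varint_encode(n & ((1 << bits) - 1))


# One element type for the whole list, yet each element takes only the
# bytes its magnitude needs.
element_sizes = [len(write_signed(v)) for v in (1, -1, 300, -70000)]
```

With these definitions, `write_signed(-1)` is a single byte, while 
`write_signed_without_zigzag(-1)` is five bytes, which is the worst case the 
zigzag step is protecting against. A length such as 300 gains nothing from 
zigzag, so `write_length(300)` writes it directly in two bytes.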

bq. ... "followed by a variable-length type-header value" ... 

I'm not sure I understand this proposal. What does this accomplish? Leaving 
extra room for more types in the future? The current formulation of the 
protocol leaves 3 open type spots, and while one might be spoken for by 
externalized strings, I don't really know what other types we're likely to 
introduce in the future. 


> A more compact format 
> ----------------------
>
>                 Key: THRIFT-110
>                 URL: https://issues.apache.org/jira/browse/THRIFT-110
>             Project: Thrift
>          Issue Type: Improvement
>            Reporter: Noble Paul
>            Assignee: Bryan Duxbury
>         Attachments: compact-proto-spec-2.txt, compact_proto_spec.txt, 
> compact_proto_spec.txt, thrift-110-v2.patch, thrift-110-v3.patch, 
> thrift-110-v4.patch, thrift-110-v5.patch, thrift-110-v6.patch, 
> thrift-110-v7.patch, thrift-110-v8.patch, thrift-110-v9.patch, 
> thrift-110.patch
>
>
> Thrift is not as compact in writing out data as (say) protobuf. It does 
> not have the concept of variable-length integers or various other possible 
> optimizations. In Solr we use a lot of such optimizations to make a very 
> compact payload. Thrift has a lot in common with that format.
> It is all done in a single class:
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup
> The other optimizations include writing type/value in the same byte, very 
> fast writes of Strings, externalizable strings, etc. 
> We could use a Thrift format for non-Java clients, and I would like to see 
> it be as compact as the current Java version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
