I think there is definitely a lot of room for improvement in the 
size-of-generated-code department. Generally, Thrift has always optimized for 
execution speed over code size and memory footprint, which has meant generating 
a lot of obvious, flat, inlined code.

I don't know that anyone's ever spent significant time specifically working on 
this. There are probably some reasonable improvements to be made without taking 
a perf hit or introducing dynamic constructs (stuff like generating helpers for 
common container types, instead of iterating all over the place).
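
As a rough illustration of the container-helper idea, here is a minimal sketch in Python. It assumes a toy protocol object: `RecordingProtocol`, `write_list`, `Scores`, and `Tags` are hypothetical names made up for this example, not the real Thrift library API or actual compiler output. The point is only that the begin/loop/end pattern for a list field is written once and called from each struct, instead of being repeated inline.

```python
# Hypothetical stand-in for a protocol object; NOT the real Thrift API.
class RecordingProtocol:
    def __init__(self):
        self.calls = []

    def write_list_begin(self, elem_type, size):
        self.calls.append(("list_begin", elem_type, size))

    def write_i32(self, value):
        self.calls.append(("i32", value))

    def write_string(self, value):
        self.calls.append(("string", value))

    def write_list_end(self):
        self.calls.append(("list_end",))


def write_list(proto, elem_type, values, write_elem):
    """Shared helper: defined once, called by every struct with a list field."""
    proto.write_list_begin(elem_type, len(values))
    for value in values:
        write_elem(value)
    proto.write_list_end()


class Scores:
    def __init__(self, points):
        self.points = points

    def write(self, proto):
        # One call replaces the begin/loop/end block that would be inlined here.
        write_list(proto, "i32", self.points, proto.write_i32)


class Tags:
    def __init__(self, names):
        self.names = names

    def write(self, proto):
        write_list(proto, "string", self.names, proto.write_string)
```

The per-element write is still a direct method call, so a helper like this should cost roughly one extra function call per container rather than per element.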

I think a compiler switch could be reasonable for changes that would 
significantly alter the structure of the generated code, such as switching to 
slower dynamic constructs like introspection, which could reduce code footprint 
much further.
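
For a sense of what the dynamic variant might look like, here is a minimal Python sketch under stated assumptions. The `FIELDS` table, `write_struct`, `RecordingProtocol`, and `User` are all hypothetical names for illustration, not an existing Thrift option or API. Each struct carries only a small field table, and one generic writer walks it via `getattr`, trading per-struct generated code for a slower lookup at runtime.

```python
# Hypothetical stand-in for a protocol object; NOT the real Thrift API.
class RecordingProtocol:
    def __init__(self):
        self.calls = []

    def write_field(self, field_id, type_name, value):
        self.calls.append((field_id, type_name, value))


def write_struct(proto, obj):
    """Generic writer: walks the class's field table instead of per-struct code."""
    for field_id, type_name, attr in type(obj).FIELDS:
        value = getattr(obj, attr)  # dynamic lookup, slower than inlined access
        if value is not None:       # unset fields are skipped
            proto.write_field(field_id, type_name, value)


class User:
    # The only per-struct "generated" part: (field id, type name, attribute).
    FIELDS = ((1, "i32", "id"), (2, "string", "name"))

    def __init__(self, id=None, name=None):
        self.id = id
        self.name = name
```

With this shape, adding another struct costs one table entry per field rather than a full hand-unrolled write method.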

Cheers,
mcslee

-----Original Message-----
From: Bryan Duxbury [mailto:br...@rapleaf.com] 
Sent: Monday, March 22, 2010 8:10 AM
To: thrift-dev@incubator.apache.org
Subject: Re: hyper-inflation of generated code

I've noticed how big our jars can get, and I opened a ticket about
decreasing the amount of duplication with libraries some time ago, but it
hasn't been a priority yet. (
https://issues.apache.org/jira/browse/THRIFT-447,
https://issues.apache.org/jira/browse/THRIFT-701 are the relevant tickets.)

I'm all for making some changes, but is 1.6MB of jar really a problem for
you? I know that personally my project depends on 30MB of jar, only 2 of
which is my Thrift stuff.

I'd love to work with you to get a patch in to extract some of the redundant
code. I doubt it will be that hard to do - someone just has to take a look
at it. Feel free to email me off-list if you would like to chat. I have to
imagine you could fix thrift a lot faster than you could build a competing
system from scratch.

-Bryan

On Mon, Mar 22, 2010 at 7:43 AM, tomer filiba <tomerfil...@gmail.com> wrote:

> if you recall, i'm working on a project called xthrift, which adds passing
> objects by-reference on top of thrift. the project seemed very promising up
> until yesterday, when i realized thrift generates way too much code to make
> it feasible.
>
> i made a test case of 6 classes, each with 6 methods and 6 attributes, and
> 6 service functions that expose those. i attached the thrift file that's
> generated from my xthrift file -- it contains around 100 functions.
>
> generating java code using the thrift compiler yields a 2.2 MB java source
> file! when compiled, it yields a 1.6MB jar! in csharp and python, the
> situation is slightly better: ~700 KB. just for the sake of entropy,
> compressing (bz2) the generated java code yields a 34 KB file (a ratio
> of 65!)
>
> for our project, that contains ~100 classes, each with ~10 methods and ~5
> attributes, plus ~50 functions, the generated java code would weigh tens if
> not hundreds of MBs, which is unacceptable, of course.
>
> looking at the generated code, it's easy to spot the redundancy: thrift
> employs a "full beta-reduction policy", i.e., it doesn't encapsulate common
> functionality into functions, instead it just repeats them over and over.
> this yields ~80,000 lines of code that mostly repeat one another.
>
> judging from the code size, i understand thrift is not meant to handle more
> than ~50 functions per project, unless you are willing to accept tens of MBs
> of library footprint.[1]
> is there any "compiler switch" or planned feature, to eliminate this code
> bloat?
>
> if not, my company will have to drop thrift and adopt an in-house solution
> (which we really hoped to avoid...)
>
>
> thanks in advance,
> -tomer
>
> [1] a 100 MB library, on today's hardware, is not unheard of, but our
> project's RAM footprint is ~30 MB... it would be a pity to require such a
> big footprint just for glue code.
>
>
>
> An NCO and a Gentleman
>
