RE: hyper-inflation of generated code

mayan Wed, 31 Mar 2010 13:35:00 -0700

I'm almost done with the state-machine-based demarshaller approach for
cthrift (I have maps & containers-as-structs left to get done).


I started off doing this so that I could get true non-blocking receives,
but, as it turns out, it also has other interesting properties:
- the footprint is proportional to the number of types, not to the way they
are composed.
- the approach can parse recursive data structures such as trees.

Of course, to get the full benefit of this, I'll have to add a state-based
marshaller to cthrift, which is the next thing I may tackle.

Anyway, it may be worth the Thrift team's while to look into the approach
I used.


> I think there is definitely a lot of room for improvement in the
> size-of-generated-code department. Generally, Thrift has always optimized
> for execution speed over code size and memory footprint, which has meant
> generating a lot of obvious, flat, inlined code.
>
> I don't know that anyone's ever spent significant time specifically
> working on this. There are probably some reasonable improvements to be
> made without taking a perf hit or introducing dynamic constructs (stuff
> like generating helpers for common container types, instead of iterating
> all over the place).
>
> I think a compiler switch could be reasonable for changes that would
> significantly alter the structure of the generated code, making it use
> slower dynamic constructs like introspection that could more significantly
> reduce code footprint.
>
> Cheers,
> mcslee
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:br...@rapleaf.com]
> Sent: Monday, March 22, 2010 8:10 AM
> To: thrift-dev@incubator.apache.org
> Subject: Re: hyper-inflation of generated code
>
> I've noticed how big our jars can get, and I opened a ticket about
> decreasing the amount of duplication with libraries some time ago, but it
> hasn't been a priority yet. (The relevant tickets are
> https://issues.apache.org/jira/browse/THRIFT-447 and
> https://issues.apache.org/jira/browse/THRIFT-701.)
>
> I'm all for making some changes, but is 1.6 MB of jar really a problem for
> you? I know that personally my project depends on 30 MB of jars, only 2 MB
> of which is my Thrift stuff.
>
> I'd love to work with you to get a patch in to extract some of the
> redundant code. I doubt it will be that hard to do; someone just has to
> take a look at it. Feel free to email me off-list if you would like to
> chat. I have to imagine you could fix Thrift a lot faster than you could
> build a competing system from scratch.
>
> -Bryan
>
> On Mon, Mar 22, 2010 at 7:43 AM, tomer filiba <tomerfil...@gmail.com> wrote:
>
>> If you recall, I'm working on a project called xthrift, which adds
>> passing objects by reference on top of Thrift. The project seemed very
>> promising until yesterday, when I realized Thrift generates way too much
>> code to make it feasible.
>>
>> I made a test case of 6 classes, each with 6 methods and 6 attributes,
>> and 6 service functions that expose them. I attached the Thrift file
>> that's generated from my xthrift file; it contains around 100 functions.
>>
>> Generating Java code with the Thrift compiler yields a 2.2 MB Java
>> source file! When compiled, it yields a 1.6 MB jar! In C# and Python,
>> the situation is slightly better: ~700 KB. Just to gauge the entropy,
>> compressing (bz2) the generated Java code yields a 34 KB file (a ratio
>> of about 65!).
>>
>> For our project, which contains ~100 classes, each with ~10 methods and
>> ~5 attributes, plus ~50 functions, the generated Java code would weigh
>> tens if not hundreds of MBs, which is unacceptable, of course.
>>
>> Looking at the generated code, it's easy to spot the redundancy: Thrift
>> employs a "full beta-reduction policy", i.e., it doesn't encapsulate
>> common functionality into functions; instead it repeats it over and
>> over. This yields ~80,000 lines of code that mostly repeat one another.
>>
>> Judging from the code size, I gather Thrift is not meant to handle more
>> than ~50 functions per project, unless you are willing to accept tens of
>> MBs of library footprint.[1]
>> Is there any "compiler switch" or planned feature to eliminate this code
>> bloat?
>>
>> If not, my company will have to drop Thrift and adopt an in-house
>> solution (which we really hoped to avoid...)
>>
>>
>> thanks in advance,
>> -tomer
>>
>> [1] A 100 MB library is not unheard of on today's hardware, but our
>> project's RAM footprint is ~30 MB... it would be a pity to require such
>> a big footprint just for glue code.
>>
>>
>>
>> An NCO and a Gentleman
>>
>
