I think there is definitely a lot of room for improvement in the size-of-generated-code department. Generally, Thrift has always optimized for execution speed over code size and memory footprint, which has meant generating a lot of obvious, flat, inlined code.
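
To make "flat, inlined code" concrete, here is a minimal sketch contrasting a per-field copy of a list-writing loop with a single shared helper. It is not actual Thrift compiler output: the `Writer` class, its methods, and all field names are hypothetical stand-ins invented for illustration, and the real generated code and protocol API differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContainerHelperSketch {
    /** Hypothetical stand-in for a protocol writer; NOT the Thrift API. */
    static final class Writer {
        final List<String> tokens = new ArrayList<>();
        void listBegin(int size) { tokens.add("list:" + size); }
        void string(String s) { tokens.add("str:" + s); }
        void listEnd() { tokens.add("end"); }
    }

    /** Flat style: every list field carries its own copy of the loop. */
    static void writeInlined(Writer out, List<String> names, List<String> tags) {
        out.listBegin(names.size());
        for (String s : names) { out.string(s); }
        out.listEnd();

        out.listBegin(tags.size());
        for (String s : tags) { out.string(s); }
        out.listEnd();
    }

    /** Shared helper: the loop exists once; each field costs one call. */
    static void writeStringList(Writer out, List<String> values) {
        out.listBegin(values.size());
        for (String s : values) { out.string(s); }
        out.listEnd();
    }

    /** Same output as writeInlined, expressed through the helper. */
    static void writeWithHelper(Writer out, List<String> names, List<String> tags) {
        writeStringList(out, names);
        writeStringList(out, tags);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b");
        List<String> tags = Arrays.asList("x");
        Writer flat = new Writer();
        Writer shared = new Writer();
        writeInlined(flat, names, tags);
        writeWithHelper(shared, names, tags);
        // Both styles emit the same token stream.
        System.out.println(flat.tokens.equals(shared.tokens));
    }
}
```

In this sketch the helper version stays statically typed and uses no reflection, so the only added cost is one method call per container field, while the loop body is emitted once instead of once per field.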
I don't know that anyone's ever spent significant time specifically working on this. There are probably some reasonable improvements to be made without taking a perf hit or introducing dynamic constructs (stuff like generating helpers for common container types, instead of iterating all over the place).

I think a compiler switch could be reasonable for changes that would significantly alter the structure of the generated code, e.g. making it use slower dynamic constructs like introspection, which could reduce code footprint more significantly.

Cheers,
mcslee

-----Original Message-----
From: Bryan Duxbury [mailto:br...@rapleaf.com]
Sent: Monday, March 22, 2010 8:10 AM
To: thrift-dev@incubator.apache.org
Subject: Re: hyper-inflation of generated code

I've noticed how big our jars can get, and I opened a ticket about decreasing the amount of duplication with libraries some time ago, but it hasn't been a priority yet. (https://issues.apache.org/jira/browse/THRIFT-447 and https://issues.apache.org/jira/browse/THRIFT-701 are the relevant tickets.)

I'm all for making some changes, but is 1.6MB of jar really a problem for you? I know that personally my project depends on 30MB of jar, only 2MB of which is my Thrift stuff.

I'd love to work with you to get a patch in to extract some of the redundant code. I doubt it will be that hard to do - someone just has to take a look at it. Feel free to email me off-list if you would like to chat. I have to imagine you could fix Thrift a lot faster than you could build a competing system from scratch.

-Bryan

On Mon, Mar 22, 2010 at 7:43 AM, tomer filiba <tomerfil...@gmail.com> wrote:

> if you recall, i'm working on a project called xthrift, which adds passing
> objects by-reference on top of thrift. the project seemed very promising up
> until yesterday, when i realized thrift generates way too much code to make
> it feasible.
>
> i made a test case of 6 classes, each with 6 methods and 6 attributes, and
> 6 service functions that expose those.
> i attached the thrift file that's generated from my xthrift file -- it
> contains around 100 functions.
>
> generating java code using the thrift compiler yields a 2.2 MB java source
> file! when compiled, it yields a 1.6MB jar! in csharp and python, the
> situation is slightly better: ~700 KB. just for the sake of entropy,
> compressing (bz2) the generated java code yields a 34 KB file (a ratio
> of 65!)
>
> for our project, that contains ~100 classes, each with ~10 methods and ~5
> attributes, plus ~50 functions, the generated java code would weigh tens if
> not hundreds of MBs, which is unacceptable, of course.
>
> looking at the generated code, it's easy to spot the redundancy: thrift
> employs a "full beta-reduction policy", i.e., it doesn't encapsulate common
> functionality into functions, instead it just repeats them over and over.
> this yields ~80,000 lines of code that mostly repeat one another.
>
> judging from the code size, i understand thrift is not meant to handle more
> than ~50 functions per project, unless you are willing to accept tens of MBs
> of library footprint.[1]
>
> is there any "compiler switch" or planned feature, to eliminate this code
> bloat?
>
> if not, my company will have to drop thrift and adopt an in-house solution
> (which we really hoped to avoid...)
>
>
> thanks in advance,
> -tomer
>
> [1] a 100 MB library, on today's hardware, is not unheard of, but our
> project's RAM footprint is ~30 MB... it would be a pity to require so big
> a footprint just for glue code.
>
> An NCO and a Gentleman
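
The size figures quoted in the original message can be sanity-checked with a rough calculation. The sketch below recomputes the bz2 compression ratio and extrapolates the source size to the larger project. The extrapolation rests on two assumptions that are not in the thread: that generated source grows linearly with the number of functions, and that ~100 classes with ~10 methods and ~5 attributes each, plus ~50 functions, map to roughly 1550 generated functions. Treat the result as an order-of-magnitude estimate only.

```java
public class FootprintEstimate {
    /** Compression ratio = original size / compressed size (same units). */
    static double ratio(double originalKb, double compressedKb) {
        return originalKb / compressedKb;
    }

    /** Linear extrapolation; ASSUMES size is proportional to function count. */
    static double estimateMb(double sampleMb, int sampleFunctions, int targetFunctions) {
        return sampleMb / sampleFunctions * targetFunctions;
    }

    public static void main(String[] args) {
        // Figures from the message: 2.2 MB of Java source, 34 KB after bz2.
        System.out.printf("compression ratio ~ %.0f%n", ratio(2200.0, 34.0));

        // ASSUMPTION: one generated function per method and per attribute.
        int targetFunctions = 100 * (10 + 5) + 50;
        System.out.printf("estimated source ~ %.0f MB%n",
                estimateMb(2.2, 100, targetFunctions));
    }
}
```

Under these assumptions the ratio comes out at roughly 65, matching the message, and the extrapolated source size lands in the tens of MBs (about 34 MB). Counting a getter and a setter per attribute would push the estimate higher.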