I'm almost done with the state-machine based demarshaller approach for cthrift (I have maps & containers-as-structs left to get done).
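
To illustrate the general idea (this is not the cthrift code, just a minimal sketch in Python): the toy wire format, the SCHEMA table and the Demarshaller class below are all made up for the example. The parser keeps its position on an explicit stack of frames instead of the call stack, so it can be fed bytes in whatever chunks arrive and simply stops when it runs out.

```python
import struct

# Hypothetical toy schema: a struct is a list of (field, type) pairs; a type
# is "i32", a struct name, or ("list", element_type). "node" refers to itself.
SCHEMA = {"node": [("value", "i32"), ("kids", ("list", "node"))]}


class Demarshaller:
    """Resumable parser: feed() never blocks, it stops when bytes run out."""

    def __init__(self, schema, root):
        self.schema = schema
        self.buf = bytearray()
        self.result = None
        self.done = False
        self.stack = []  # explicit frames instead of the call stack
        self._push(root)

    def _push(self, typ):
        if typ == "i32":
            self.stack.append({"kind": "i32"})
        elif isinstance(typ, tuple):  # ("list", element_type)
            self.stack.append(
                {"kind": "list", "elem": typ[1], "left": None, "out": []})
        else:  # named struct
            self.stack.append(
                {"kind": "struct", "fields": self.schema[typ], "idx": 0, "out": {}})

    def _take(self, n):
        """Return n buffered bytes, or None if fewer than n are available."""
        if len(self.buf) < n:
            return None
        chunk = bytes(self.buf[:n])
        del self.buf[:n]
        return chunk

    def _complete(self, value):
        """Pop the finished frame and hand its value to the parent frame."""
        self.stack.pop()
        if not self.stack:
            self.result, self.done = value, True
            return
        parent = self.stack[-1]
        if parent["kind"] == "list":
            parent["out"].append(value)
            parent["left"] -= 1
        else:
            name, _ = parent["fields"][parent["idx"]]
            parent["out"][name] = value
            parent["idx"] += 1

    def feed(self, data):
        """Consume data as far as possible; True once the root value is done."""
        self.buf += data
        while self.stack:
            top = self.stack[-1]
            if top["kind"] == "i32":  # 4 bytes, big-endian, signed
                raw = self._take(4)
                if raw is None:
                    return False
                self._complete(struct.unpack(">i", raw)[0])
            elif top["kind"] == "list":  # 1-byte count, then the elements
                if top["left"] is None:
                    raw = self._take(1)
                    if raw is None:
                        return False
                    top["left"] = raw[0]
                if top["left"] == 0:
                    self._complete(top["out"])
                else:
                    self._push(top["elem"])
            else:  # struct: fields in schema order, no header
                if top["idx"] == len(top["fields"]):
                    self._complete(top["out"])
                else:
                    self._push(top["fields"][top["idx"]][1])
        return self.done


# A node with value 1 and two leaf children (2 and 3), fed one byte at a time.
wire = struct.pack(">ibibib", 1, 2, 2, 0, 3, 0)
d = Demarshaller(SCHEMA, "node")
for byte in wire:
    finished = d.feed(bytes([byte]))
print(d.result)
```

The feed() loop has one branch per kind of type, however the schema nests them.
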
I started off doing this so that I could get true non-blocking receives,
but, as it turns out, it also has other interesting properties:
- the footprint is proportional to the number of types, not the way they
  are composed.
- the approach allows for recursive data-structures such as trees to be
  parsed.

Of course, to get the full benefit of this, I'll have to add a state-based
marshaller to cthrift, which is the next thing I may tackle. Anyway, it may
be worth the Thrift team's while to look into the approach I used.

> I think there is definitely a lot of room for improvement in the
> size-of-generated-code department. Generally, Thrift has always optimized
> for execution speed over code size and memory footprint, which has meant
> generating a lot of obvious, flat, inlined code.
>
> I don't know that anyone's ever spent significant time specifically
> working on this. There are probably some reasonable improvements to be
> made without taking a perf hit or introducing dynamic constructs (stuff
> like generating helpers for common container types, instead of iterating
> all over the place).
>
> I think a compiler switch could be reasonable for changes that would
> significantly alter the structure of the generated code, making it use
> slower dynamic constructs like introspection that could more significantly
> reduce code footprint.
>
> Cheers,
> mcslee
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:br...@rapleaf.com]
> Sent: Monday, March 22, 2010 8:10 AM
> To: thrift-dev@incubator.apache.org
> Subject: Re: hyper-inflation of generated code
>
> I've noticed how big our jars can get, and I opened a ticket about
> decreasing the amount of duplication with libraries some time ago, but it
> hasn't been a priority yet. (
> https://issues.apache.org/jira/browse/THRIFT-447,
> https://issues.apache.org/jira/browse/THRIFT-701 are the relevant
> tickets.)
>
> I'm all for making some changes, but is 1.6MB of jar really a problem for
> you?
> I know that personally my project depends on 30MB of jar, only 2 of
> which is my Thrift stuff.
>
> I'd love to work with you to get a patch in to extract some of the
> redundant code. I doubt it will be that hard to do - someone just has to
> take a look at it. Feel free to email me off-list if you would like to
> chat. I have to imagine you could fix thrift a lot faster than you could
> build a competing system from scratch.
>
> -Bryan
>
> On Mon, Mar 22, 2010 at 7:43 AM, tomer filiba <tomerfil...@gmail.com>
> wrote:
>
>> if you recall, i'm working on a project called xthrift, which adds
>> passing objects by-reference on top of thrift. the project seemed very
>> promising up until yesterday, when i realized thrift generates way too
>> much code to make it feasible.
>>
>> i made a test case of 6 classes, each with 6 methods and 6 attributes,
>> and 6 service functions that expose those. i attached the thrift file
>> that's generated from my xthrift file -- it contains around 100
>> functions.
>>
>> generating java code using the thrift compiler yields a 2.2 MB java
>> source file! when compiled, it yields a 1.6MB jar! in csharp and python,
>> the situation is slightly better: ~700 KB. just for the sake of entropy,
>> compressing (bz2) the generated java code yields a 34 KB file (the
>> ratio is 65!)
>>
>> for our project, that contains ~100 classes, each with ~10 methods and
>> ~5 attributes, plus ~50 functions, the generated java code would weigh
>> tens if not hundreds of MBs, which is unacceptable, of course.
>>
>> looking at the generated code, it's easy to spot the redundancy: thrift
>> employs a "full beta-reduction policy", i.e., it doesn't encapsulate
>> common functionality into functions, instead it just repeats them over
>> and over. this yields ~80,000 lines of code that mostly repeat one
>> another.
>>
>> judging from the code size, i understand thrift is not meant to handle
>> more than ~50 functions per project, unless you are willing to accept
>> tens of MBs of library footprint.[1]
>> is there any "compiler switch" or planned feature, to eliminate this
>> code bloat?
>>
>> if not, my company will have to drop thrift and adopt an in-house
>> solution (which we really hoped to avoid...)
>>
>>
>> thanks in advance,
>> -tomer
>>
>> [1] a 100 MB library, on today's hardware, is not unheard of, but our
>> project's RAM footprint is ~30 MB... it would be a pity to require such
>> a big footprint just for glue code.
>>
>>
>>
>> An NCO and a Gentleman
>>
>