Dear Tomas, Inaki and the rest of R-devel team, thank you for your explainations and suggestions. I talked with gEcon development team and we decided to change our implementation along the lines you suggested.
Best regards, Karol Podemski pt., 17 sie 2018 o 13:38 Tomas Kalibera <tomas.kalib...@gmail.com> napisał(a): > Dear Karol, > > I don't understand the models behind these function, but I can tell that > the code generated is very inefficient. The AST interpreter will be very > inefficient performing each scalar computation with all boxing, > allocations, function calls. The byte-code compiler removes some of the > boxing and allocation. While it could certainly compile faster, it will > always be taking long compiling such functions with so many commands: so > many expressions to track, so many source references to map, for so little > computation. The same code could be much more efficient if it used > vectorized operations and loops. The compiler cannot infer the loops and > vector operations from the code - it is not that smart and I doubt it could > easily be for R, but such optimizations could certainly be done easily at a > higher level, when optimizing computation within the model, not within R > with all its complicated semantics. I think you could hope for ~100x > speedups compared to current generated code running with R AST interpreter. > > So I think it might be worth thinking about writing an interpreter for the > model (the generator would compute function values on the fly, without > actually generating code). If that was too slow, it might pay off to > generate some intermediate representation for the model that would be > faster to interpret. If that was too hard, then perhaps generating the code > from the model in a smarter way (use vector operations, loops). It is ok to > do that opportunistically - only when possible. With compilers, this is > normal, optimizations often take advantage of certain patterns in the code > if they are present. If you had more specific questions how to optimize the > code feel free to ask (also offline). > > Certainly I don't want the existence of the byte-code compiler to require > you to switch from R to C/C++, that would be exactly the opposite of what > the compiler is aiming for. If it turns out you really need a way to > disable compilation of these generated functions (so they run much slower, > but you don't have to wait for them to compile), we will provide it and > using a hack/workaround it is already possible in existing versions of R, > with all the drawbacks I mentioned previously. > > Best > Tomas > > > On 08/17/2018 12:43 AM, Karol Podemski wrote: > > Dear Thomas, > > thank you for prompt response and taking interest in this issue. I really > appreciate your compiler project and efficiency gains in usual case. I am > aware of limitations of interpreted languages too and because of that even > when writing my first mail I had a hunch that it is not that easy to > address this problem. As you mentioned optimisation of compiler for > handling non-standard code may be tricky and harmful for usual code. The > question is if gEcon is the only package that may face the same issue > because of compilation. > > The functions generated by gEcon are systems of non-linear equations > defining the equilibrium of an economy (see > http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf if you > want to learn a bit how we obtain it). The rows, you suggested to > vectorise, are indeed vectorisable because they define equilibrium for > similiar markets (e.g. production and sale of beverages and food) but do > not have to be vectorisable in general case. So that not to delve into too > much details I will stop here in description of how the equations > originate. However, I would like to point that similiar large systems of > linear equations may arise in other fields ( > https://en.wikipedia.org/wiki/Steady_state ) and there may be other > packages that generate similar large systems (e.g. network problems like > hydraulic networks). In that case, reports such as mine may help you to > assess the scale of the problems. > > Thank you for suggestions for improvement in our approach, i am going to > discuss them with other package developers. > > Regards, > Karol Podemski > > pon., 13 sie 2018 o 18:02 Tomas Kalibera <tomas.kalib...@gmail.com> > napisał(a): > >> Dear Karol, >> >> thank you for the report. I can reproduce that the function from you >> example takes very long to compile and I can see where most time is spent. >> The compiler is itself written in R and requires a lot of resources for >> large functions (foo() has over 16,000 lines of code, nearly 1 million of >> instructions/operands, 45,000 constants). In particular a lot of time is >> spent in garbage collection and in finding a unique set of constants. Some >> optimizations of the compiler may be possible, but it is unlikely that >> functions this large will compile fast any soon. For non-generated code, we >> now have the byte-compilation on installation by default which at least >> removes the compile overhead from runtime. Even though the compiler is >> slow, it is important to keep in mind that in principle, with any compiler >> there will be functions where compilation would not be improve performance >> (when the compile time is included or not). >> >> I think it is not a good idea to generate code for functions like foo() >> in R (or any interpreted language). You say that R's byte-code compiler >> produces code that runs 5-10x faster than when the function is interpreted >> by the AST interpreter (uncompiled), which sounds like a good result, but I >> believe that avoiding code generation would be much faster than that, apart >> from drastically reducing code size and therefore compile time. The >> generator of these functions has much more information than the compiler - >> it could be turned into an interpreter of these functions and compute their >> values on the fly. >> >> A significant source of inefficiency of the generated code are >> element-wise operations, such as >> >> r[12] <- -vv[88] + vv[16] * (1 + ppff[1307]) >> ... >> >> r[139] <- -vv[215] + vv[47] * (1 + ppff[1434]) >> >> (these could be vectorized, which would reduce code size and improve >> interpretation speed; and make it somewhat readable). Most of the code >> lines in the generated functions seem to be easily vectorizable. >> >> Compilers and interpreters necessarily use some heuristics or optimize at >> some code patterns. Optimizing for generated code may be tricky as it could >> even harm performance of usual code. And, I would much rather optimize the >> compiler for the usual code. >> >> Indeed, a pragmatic solution requiring the least amount of work would be >> to disable compilation of these generated functions. There is not a >> documented way to do that and maybe we could add it (and technically it is >> trivial), but I have been reluctant so far - in some cases, compilation >> even of these functions may be beneficial - if the speedup is 5-10x and we >> run very many times. But once the generated code included some pragma >> preventing compilation, it won't be ever compiled. Also, the trade-offs may >> change as the compiler evolves, perhaps not in this case, but in other >> where such pragma may be used. >> >> Well so the short answer would be that these functions should not be >> generated in the first place. If it were too much work rewriting, perhaps >> the generator could just be improved to produce vectorized operations. >> >> Best >> Tomas >> On 12.8.2018 21:31, Karol Podemski wrote: >> >> Dear R team, >> >> I am a co-author and maintainer of one of R packages distributed by R-forge >> (gEcon). One of gEcon package users found a strange behaviour of package (R >> froze for couple of minutes) and reported it to me. I traced the strange >> behaviour to compiler package. I attach short demonstration of the problem >> to this mail (demonstration makes use of compiler and tictoc packages only). >> >> In short, the compiler package has problems in compiling large functions - >> their compilation and execution may take much longer than direct execution >> of an uncompiled function. Such functions are generated by gEcon package as >> they describe steady state for economy. >> >> I am curious if you are aware of such problems and plan to handle the >> efficiency issues. On one of the boards I saw that there were efficiency >> issues in rpart package but they have been resolved. Or would you advise to >> turn off JIT on package load (package heavily uses such long functions >> generated whenever a new model is created)? >> >> Best regards, >> Karol Podemski >> >> >> >> ______________________________________________r-de...@r-project.org mailing >> listhttps://stat.ethz.ch/mailman/listinfo/r-devel >> >> >> > [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel