LuciferYang commented on PR #56430: URL: https://github.com/apache/spark/pull/56430#issuecomment-4677413010
> There is a nontrivial perf delta compared to janino, last time I had looked at this approach ... do we have any numbers of this @LuciferYang ? Perhaps it is better now !

Yes, the per-compile delta is still very real -- javac won't catch Janino on cold compiles.

Fresh numbers, measured per generated unit with what this PR's implementation actually does (a shared, reused file manager plus `-proc:none -g:none -nowarn -implicit:none -Xlint:none`). JDK 17, Apple M3; best times over 30 iterations, stable across two independent runs (within ~5-7%):

| generated unit | Janino | javac | relative |
|---|---|---|---|
| ~20 lines (trivial filter) | 0.06 ms | ~16 ms | ~270x |
| ~80 lines (10-field projection) | 0.21 ms | ~20 ms | ~95x |
| ~500 lines (50-col projection, split methods) | 0.8 ms | ~16 ms | ~19x |
| ~2000 lines (200-col projection) | 3.3 ms | ~19 ms | ~6x |
| ~10000 lines (1000-col projection) | 17 ms | 38 ms | ~2.2x |
| ~40000 lines (4000-col projection) | 73 ms | 186 ms | ~2.5x |

For reference, a naive `getTask` per compile without a reused file manager costs 360-600 ms regardless of unit size -- most of javac's fixed cost is file-manager/classpath setup, which the implementation amortizes away.

Two observations from the shape of the table. The scary relative ratios are confined to small units, where the absolute cost is tens of milliseconds; at the wide-schema end where compile time actually hurts (the SPARK-18016 constant-pool scale, where Janino itself is at 70+ ms), the gap converges to ~2-2.5x, because both compilers are doing real parse/typecheck/emit work and javac's fixed overhead has amortized.

And in all cases it's a one-time cost per distinct generated unit per JVM: results go into the existing codegen cache (whose key now includes the backend), so the cost does not scale with rows or repeated executions.
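To make the "shared, reused file manager" point concrete, here is a minimal sketch of that pattern using the standard `javax.tools` API. It is not the PR's code: the class name `SharedJavacSketch`, the helpers `StringSource` and `BytecodeSink`, and the choice to serialize compiles with `synchronized` are all assumptions made for illustration. Only the javac flags are taken from the comment above.

```java
import javax.tools.DiagnosticCollector;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one long-lived file manager, many in-memory compiles.
final class SharedJavacSketch {
    // Requires a JDK; getSystemJavaCompiler() returns null on a JRE-only runtime.
    private static final JavaCompiler COMPILER = ToolProvider.getSystemJavaCompiler();
    // Created once and never closed here, so classpath setup is paid a single time.
    private static final StandardJavaFileManager SHARED =
        COMPILER.getStandardFileManager(null, null, null);
    // Flags quoted in the comment above.
    private static final List<String> OPTIONS =
        List.of("-proc:none", "-g:none", "-nowarn", "-implicit:none", "-Xlint:none");

    // Source held in a String rather than on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"),
                  JavaFileObject.Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Receives the bytecode javac emits for one class.
    static final class BytecodeSink extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BytecodeSink(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + ".class"),
                  JavaFileObject.Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() {
            return bytes;
        }
    }

    // Assumption: compiles are serialized because they share one file manager.
    static synchronized Map<String, byte[]> compile(String className, String source) {
        Map<String, BytecodeSink> sinks = new LinkedHashMap<>();
        // Thin per-compile wrapper: lookups go to SHARED, output is captured in memory.
        JavaFileManager perCompile =
            new ForwardingJavaFileManager<StandardJavaFileManager>(SHARED) {
                @Override
                public JavaFileObject getJavaFileForOutput(
                        JavaFileManager.Location location, String name,
                        JavaFileObject.Kind kind, FileObject sibling) {
                    return sinks.computeIfAbsent(name, BytecodeSink::new);
                }
            };
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        boolean ok = COMPILER.getTask(null, perCompile, diagnostics, OPTIONS, null,
                                      List.of(new StringSource(className, source))).call();
        if (!ok) {
            throw new IllegalStateException("javac failed: " + diagnostics.getDiagnostics());
        }
        Map<String, byte[]> result = new LinkedHashMap<>();
        sinks.forEach((name, sink) -> result.put(name, sink.bytes.toByteArray()));
        return result;
    }

    public static void main(String[] args) {
        String src = "public class Hello { public static int answer() { return 42; } }";
        compile("Hello", src).forEach(
            (name, bytes) -> System.out.println(name + ": " + bytes.length + " bytes"));
    }
}
```

The wrapper is deliberately not closed after each compile, since closing a `ForwardingJavaFileManager` also closes the manager it delegates to, which would throw away the state being amortized. Timing this sketch against a variant that calls `getStandardFileManager` inside `compile` is one way to see the fixed-cost difference for yourself; the numbers will depend on the JDK and classpath.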
For this reason, I believe we need to keep Janino as the default compiler for the foreseeable future, with the native JDK javac only serving as a fallback escape hatch to mitigate long-term maintenance risks associated with Janino.
