LuciferYang commented on PR #56430:
URL: https://github.com/apache/spark/pull/56430#issuecomment-4677413010

   > There is a nontrivial perf delta compared to janino, last time I had 
looked at this approach ... do we have any numbers of this @LuciferYang ? 
Perhaps it is better now!
   
   Yes, the per-compile delta is still very real -- javac won't catch Janino on 
cold compiles. Fresh numbers, measured per generated unit with what this PR's 
implementation actually does (a shared, reused file manager plus `-proc:none 
-g:none -nowarn -implicit:none -Xlint:none`). JDK 17, Apple M3; best times over 
30 iterations, stable across two independent runs (within ~5-7%):
   
   | generated unit | Janino (best time) | javac (best time) | javac / Janino |
   |---|---|---|---|
   | ~20 lines (trivial filter) | 0.06 ms | ~16 ms | ~270x |
   | ~80 lines (10-field projection) | 0.21 ms | ~20 ms | ~95x |
   | ~500 lines (50-col projection, split methods) | 0.8 ms | ~16 ms | ~19x |
   | ~2000 lines (200-col projection) | 3.3 ms | ~19 ms | ~6x |
   | ~10000 lines (1000-col projection) | 17 ms | 38 ms | ~2.2x |
   | ~40000 lines (4000-col projection) | 73 ms | 186 ms | ~2.5x |
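For anyone who wants to poke at the setup, here is a minimal sketch of what "a shared, reused file manager plus the flags above" can look like with the standard `javax.tools` API. It is not this PR's code. The class name `SharedJavacCompiler`, the in-memory output wrapper, and the `synchronized` guard are assumptions made for illustration (`StandardJavaFileManager` is not documented as thread-safe, so serializing compiles is the conservative choice).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.tools.DiagnosticCollector;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical sketch (not the PR's code): compile one generated unit in memory,
// reusing a single StandardJavaFileManager across all compiles. Needs a full JDK.
public final class SharedJavacCompiler {
  // The option set quoted in the comment above.
  private static final List<String> OPTIONS =
      List.of("-proc:none", "-g:none", "-nowarn", "-implicit:none", "-Xlint:none");

  private final JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
  // Built once and reused, so per-compile cost excludes file-manager setup.
  private final StandardJavaFileManager shared =
      javac.getStandardFileManager(null, null, StandardCharsets.UTF_8);

  /** Compiles source declaring public class className; returns binary name to bytecode. */
  public synchronized Map<String, byte[]> compile(String className, String source) {
    JavaFileObject unit = new SimpleJavaFileObject(
        URI.create("string:///" + className.replace('.', '/') + ".java"),
        JavaFileObject.Kind.SOURCE) {
      @Override
      public CharSequence getCharContent(boolean ignoreEncodingErrors) {
        return source;
      }
    };

    Map<String, ByteArrayOutputStream> outputs = new HashMap<>();
    // Cheap per-compile wrapper: delegates lookups to the shared manager,
    // but captures emitted .class files in memory instead of writing to disk.
    JavaFileManager inMemory = new ForwardingJavaFileManager<StandardJavaFileManager>(shared) {
      @Override
      public JavaFileObject getJavaFileForOutput(
          JavaFileManager.Location location, String binaryName,
          JavaFileObject.Kind kind, FileObject sibling) {
        return new SimpleJavaFileObject(
            URI.create("mem:///" + binaryName.replace('.', '/') + kind.extension), kind) {
          @Override
          public OutputStream openOutputStream() {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            outputs.put(binaryName, bytes);
            return bytes;
          }
        };
      }
    };

    DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
    boolean ok = javac.getTask(null, inMemory, diagnostics, OPTIONS, null, List.of(unit)).call();
    if (!ok) {
      throw new IllegalStateException("javac failed: " + diagnostics.getDiagnostics());
    }
    Map<String, byte[]> classes = new HashMap<>();
    outputs.forEach((name, bytes) -> classes.put(name, bytes.toByteArray()));
    return classes;
  }
}
```

Only the thin forwarding wrapper is created per compile; the standard file manager is built once and reused. The returned bytes would then be defined through a custom `ClassLoader`, which is left out here.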
   
   For reference, naively calling `getTask` with a fresh file manager for every 
compile costs 360-600 ms regardless of unit size -- most of javac's fixed cost is 
file-manager/classpath setup, which the implementation amortizes away.
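To see that fixed cost on your own machine, a rough harness along these lines compares best-of-30 times for a tiny unit compiled with a fresh `StandardJavaFileManager` per compile versus one shared instance. `FileManagerReuseBench` and the `Tiny` class are made up for illustration. It writes class files to a temp directory via `-d` rather than to memory, and the absolute numbers depend on JDK, hardware, and classpath size, so a small standalone classpath will not necessarily reproduce the 360-600 ms figure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical micro-benchmark: best-of-N wall time for compiling one tiny unit,
// (a) reusing one StandardJavaFileManager, vs (b) creating a fresh one per compile.
public final class FileManagerReuseBench {
  static final String SRC = "public class Tiny { int f(int x) { return x + 1; } }";

  static JavaFileObject source() {
    return new SimpleJavaFileObject(URI.create("string:///Tiny.java"), JavaFileObject.Kind.SOURCE) {
      @Override
      public CharSequence getCharContent(boolean ignoreEncodingErrors) {
        return SRC;
      }
    };
  }

  // Same flags as the reused-manager setup, plus -d so output lands in outDir.
  static boolean compile(JavaCompiler javac, JavaFileManager fm, Path outDir) {
    List<String> opts = List.of("-proc:none", "-g:none", "-nowarn", "-implicit:none",
        "-Xlint:none", "-d", outDir.toString());
    return javac.getTask(null, fm, null, opts, null, List.of(source())).call();
  }

  /** Best (minimum) wall time over n runs, in milliseconds. */
  static double bestMillis(int n, Runnable r) {
    long best = Long.MAX_VALUE;
    for (int i = 0; i < n; i++) {
      long t0 = System.nanoTime();
      r.run();
      best = Math.min(best, System.nanoTime() - t0);
    }
    return best / 1e6;
  }

  public static void main(String[] args) throws IOException {
    JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
    Path out = Files.createTempDirectory("bench");
    try (StandardJavaFileManager shared = javac.getStandardFileManager(null, null, null)) {
      double reused = bestMillis(30, () -> compile(javac, shared, out));
      double fresh = bestMillis(30, () -> {
        // Naive variant: set up (and tear down) a file manager for every compile.
        try (StandardJavaFileManager fm = javac.getStandardFileManager(null, null, null)) {
          compile(javac, fm, out);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      System.out.printf("reused: %.2f ms, fresh per compile: %.2f ms%n", reused, fresh);
    }
  }
}
```

Taking the minimum over many runs, as the table above does, filters out JIT warm-up and GC noise without needing a separate warm-up phase.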
   
   Two observations from the shape of the table. The scary relative ratios are 
confined to small units, where the absolute cost is tens of milliseconds; at 
the wide-schema end where compile time actually hurts (the SPARK-18016 
constant-pool scale, where Janino itself is at 70+ ms), the gap converges to 
~2-2.5x, because both compilers are doing real parse/typecheck/emit work and 
javac's fixed overhead has amortized. And in all cases it's a one-time cost per 
distinct generated unit per JVM: results go into the existing codegen cache 
(whose key now includes the backend), so the cost does not scale with rows or 
repeated executions.
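A hypothetical sketch of a cache keyed on the backend as well as the source, so the same generated code compiled by Janino and by javac never shares an entry. `CodegenCache`, `Backend`, and `Key` are illustrative names, not Spark's actual cache classes, and `ConcurrentHashMap.computeIfAbsent` stands in for whatever loading mechanism the real cache uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical codegen cache: the key carries the compiler backend, so a unit
// compiled by Janino and the same unit compiled by javac are separate entries.
public final class CodegenCache<V> {
  public enum Backend { JANINO, JAVAC }

  // Illustrative key: backend plus the generated source text.
  public record Key(Backend backend, String source) {}

  private final Map<Key, V> entries = new ConcurrentHashMap<>();

  /**
   * Returns the cached result for (backend, source), invoking compile only on
   * the first request for that key; repeated requests reuse the stored result.
   */
  public V getOrCompile(Backend backend, String source, Function<Key, V> compile) {
    return entries.computeIfAbsent(new Key(backend, source), compile);
  }

  public int size() {
    return entries.size();
  }
}
```

Because the compile function runs at most once per key, the javac cost in the table is paid once per distinct unit and backend, not once per row or per query execution.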
   
   Given that per-compile gap, I believe we should keep Janino as the default 
compiler for the foreseeable future, with the JDK's built-in javac serving only as 
a fallback escape hatch that mitigates the long-term maintenance risk of 
depending on Janino.


-- 
This is an automated message from the Apache Git Service.