HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known 
Janino bug janino-compiler/janino#113 and apply workaround automatically as a 
fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598492806
 
 
   Thanks for spending your time analyzing the patch and providing detailed 
feedback, @rednaxelafx .
   
   The rationale of this patch is to address the known issue "before" we 
reach the real solution, as the real solution is out of our control; no one can 
say when Janino 3.0.16 will be released, or even whether it will be released at 
all. The Janino maintainer has refactored everything single-handedly in 3.1.x, 
which makes things unstable - recent versions of Janino make lots of Spark UTs 
fail, which leaves me less confident we can migrate to Janino 3.1.x.
   
   I totally agree the patch is a band-aid fix and not well designed for 
extension (I didn't intend it to be extensible, though), and there are better 
options available (listed from more to less preferable, personally):
   
   1. Participate actively in Janino - fix all the issues in 3.1.x which break 
Spark, and adopt Janino 3.1.x. It would be ideal if we could provide a set of 
regression tests so that the Janino community can maintain stability. That said, 
I'm not an expert on the JVM, so I can't lead the effort; a couple of JVM 
experts would need to jump in and lead it.
   
   2. Fork Janino - create a main dev branch from v3.0.15, apply the patch for 
janino-compiler/janino#114 (and back-port some bugfixes landed after 3.0.15), 
and release v3.0.16 on our side. The license appears to be the 3-clause BSD 
license, which doesn't seem to restrict redistribution. The open question for 
this option is who (or which group) is willing to maintain the fork and release 
it under their name.
   (I might not want to maintain the fork, but I'd be willing to contribute to 
it, as I'm the author of the patch for janino-compiler/janino#114 and 
interested in fixing the stack overflow issue in SPARK-25987, which is also an 
issue in Janino 3.0.x.)
   
   3. This patch (try to compile, and fall back on failure).
   
   4. Modify ExpandExec to check the number of operations in the for statement, 
and use `if ~ else if` when the number of operations exceeds the threshold. 
Ideally this would check the bytecode offset length, but it would be weird for 
Spark to do that, so we'd count the lines blindly. A performance regression may 
happen in cases that could run with `switch` but, due to the blind count, run 
with `if ~ else if` instead; such cases shouldn't be common.
   
   5. Give up addressing the issue - either let end users tune their queries, 
or guide end users to enable the failback option and run in interactive mode.
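   For illustration, here is a minimal Java sketch of the `switch` vs 
`if ~ else if` rewrite that option 4 describes. The class and method names are 
hypothetical, not Spark's actual codegen output; the point is only that the two 
forms are semantically equivalent, while the if-chain avoids emitting the 
switch construct that triggers janino-compiler/janino#113 when case bodies 
grow large:

```java
// Hypothetical sketch of dispatching on a projection index, as ExpandExec's
// generated code does. Names are illustrative, not Spark's real codegen.
public class DispatchSketch {

    // Current form: a switch statement, which is the shape that hits the
    // Janino bug when each case body becomes large.
    static int withSwitch(int i) {
        switch (i) {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            default: return -1;
        }
    }

    // Workaround form: an equivalent if ~ else if chain, which compiles to
    // plain comparison branches and sidesteps the switch-related bug.
    static int withIfChain(int i) {
        if (i == 0) { return 10; }
        else if (i == 1) { return 20; }
        else if (i == 2) { return 30; }
        else { return -1; }
    }

    public static void main(String[] args) {
        // Both forms must agree for every input, including the default branch.
        for (int i = -1; i <= 3; i++) {
            if (withSwitch(i) != withIfChain(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println("ok");
    }
}
```

   The trade-off mentioned above is only about performance: the if-chain does 
sequential comparisons where `switch` can use a jump table, which is why the 
blind line count could cause a regression for queries that would have been fine 
with `switch`.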
   
   WDYT? Would it be better to raise this for discussion on the dev@ list?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
