rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code URL: https://github.com/apache/spark/pull/27872#issuecomment-599809129 My suggestion for tuning codegen may have been too cryptic, sorry about not providing more background explanation. Let me explain why I arrived at the `Literal` codegen improvement suggestion (targeted for Spark 2.3.x branch -- anything starting from Spark 2.4.0 has it already). @viirya probably saw through this already ;-) My assumptions: 1. The problematic code from actual customer was generated from Spark 2.3.x that @HeartSaVioR is trying to help fix. So to unblock this specific customer, slightly improving Spark 2.3.x branch may help. 2. This PR targets master because the same Janino issue does still show up in master as well. 3. But the specific customer isn't using Spark 2.4 yet (nor Spark 3.0 since it isn't even released yet). From what @HeartSaVioR was able to share in https://github.com/janino-compiler/janino/issues/113, specifically in the attachment https://github.com/janino-compiler/janino/files/4295191/error-codegen.log, we can see that a large portion of the switch...case body is filled with code like the following: ```java /* 4677 */ case 0: /* 4678 */ final long expand_value129 = -1L; /* 4679 */ expand_isNull66 = true; /* 4680 */ expand_value66 = expand_value129; /* 4681 */ /* 4682 */ final long expand_value130 = -1L; /* 4683 */ expand_isNull67 = true; /* 4684 */ expand_value67 = expand_value130; /* 4685 */ /* 4686 */ final long expand_value131 = -1L; /* 4687 */ expand_isNull68 = true; /* 4688 */ expand_value68 = expand_value131; /* 4689 */ /* 4690 */ final long expand_value132 = -1L; /* 4691 */ expand_isNull69 = true; /* 4692 */ expand_value69 = expand_value132; ``` If we zoom in on the 3 statements here: ```java /* 4678 */ final long expand_value129 = -1L; /* 4679 */ expand_isNull66 = true; /* 4680 */ expand_value66 = expand_value129; ``` `javac` would have followed JLS and compiled it into two assignments (L4679 and L4680 with constant propagation), and skipped L4678 because it's a Java language level constant expression initialization. In pseudo bytecode, that's: ``` load constant long 1 # constant true store to local expand_isNull66 load constant int -1L store to local expand_value66 ``` Janino, on the other hand, doesn't handle all of Java language level constant expression constructs. Specifically, it does not support local constant declarations. So it compiles the code as 3 assignments: ``` load constant long -1L store to local expand_value129 load constant int 1 # constant true store to local expand_isNull66 load from local expand_value129 store to local expand_value66 ``` So every "chunk" of code above would be a few more bytes of bytecode. It may be small for just one chunk, but the codegen for `ExpandExec` makes that problem really bad by generating a lot of them. So where does this "value = -1L; isNull = true" thingy come from? We can infer that it's generated from `Literal(null, LongType)`. That is, https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L282-L285 We can easily make Spark 2.3.x generate slightly better code, by improving `Literal`'s codegen, replacing ```scala if (value == null) { ev.isNull = "true" ev.copy(s"final $javaType ${ev.value} = ${ctx.defaultValue(dataType)};") } ``` with something like: ```scala if (value == null) { ev.copy( isNull = "true", value = ctx.defaultValue(dataType) ) } ``` (the `value` part may or may not have to be further adjusted. Perhaps like `value = s"(($javaType)${ctx.defaultValue(dataType)})"`) for maximal safety. which would improve the example above into: ```java /* 4678 */ // THIS IS GONE: final long expand_value129 = -1L; /* 4679 */ expand_isNull66 = true; /* 4680 */ expand_value66 = -1L; ``` so that Janino can generate less bytecode for this particular query.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
