Github user a-roberts commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15620#discussion_r91834301
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
    @@ -97,6 +97,27 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
         assert(actual(0) == cases)
       }
     
    +  test("SPARK-18091: split large if expressions into blocks due to JVM 
code size limit") {
    +    val inStr = "StringForTesting"
    +    val row = create_row(inStr)
    +    val inputStrAttr = 'a.string.at(0)
    +
    +    var strExpr: Expression = inputStrAttr
    +    for (_ <- 1 to 13) {
    +      strExpr = If(EqualTo(Decode(Encode(strExpr, "utf-8"), "utf-8"), 
inputStrAttr),
    --- End diff --
    
    Very interested in this. We know that with 5 iterations instead of 13 we don't hit the problem, which makes sense as we'd be generating far less code and wouldn't exceed the 64K (0xFFFF) constant pool entry limit. I'm testing with IBM's SDK for Java, so I'm curious whether this manifests differently for us or whether we have a problem to fix on our end.
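
    For reference, here's how I'm probing the threshold locally. This is a minimal sketch of my own (not part of the PR), reusing the helpers already in scope in the suite (create_row, the dsl's 'a.string.at(0), If/EqualTo/Decode/Encode, checkEvaluation) and inferring the then/else branches from the printed strExpr further down:

    ```scala
    // Sketch only: vary the nesting depth to see where codegen starts to fail.
    // Both branches of the If are the previous strExpr, as the prints below suggest.
    for (depth <- Seq(5, 10, 13)) {
      val inStr = "StringForTesting"
      val row = create_row(inStr)
      val inputStrAttr = 'a.string.at(0)

      var strExpr: Expression = inputStrAttr
      for (_ <- 1 to depth) {
        strExpr = If(EqualTo(Decode(Encode(strExpr, "utf-8"), "utf-8"), inputStrAttr), strExpr, strExpr)
      }

      // Passes for me at depth 5; around depth 13 is where the constant pool error shows up.
      checkEvaluation(strExpr, inStr, row)
    }
    ```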
    
    I have log files exceeding 2 GB because the test prints the generated code on failure.
    
    The failure is:
    ```
    CodeGenerationSuite:
    - multithreaded eval
    - metrics are recorded on compile
    - SPARK-8443: split wide projections into blocks due to JVM code size limit
    - SPARK-13242: case-when expression with large number of branches (or cases)
    - SPARK-18091: split large if expressions into blocks due to JVM code size limit *** FAILED ***
      java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF
    /* 001 */ public java.lang.Object generate(Object[] references) {
    /* 002 */   return new SpecificUnsafeProjection(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
    /* 006 */
    /* 007 */   private Object[] references;
    /* 008 */   private boolean isNull42;
    /* 009 */   private boolean value42;
    /* 010 */   private boolean isNull43;
    /* 011 */   private UTF8String value43;
    /* 012 */   private boolean isNull44;
    /* 013 */   private UTF8String value44;
    /* 014 */   private boolean isNull58;
    ```
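
    As an aside, in case it helps to isolate the compile step: a minimal sketch of my own (not from the PR) that drives the same code path directly, assuming strExpr is the nested If built by the loop in the test:

    ```scala
    // Sketch only: compile the projection for strExpr without going through checkEvaluation.
    // The attribute is already bound via 'a.string.at(0), so no input schema is needed here.
    import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection

    // With 13 nesting levels this is where the JaninoRuntimeException about the
    // 0xFFFF constant pool limit surfaces for me; with 5 levels it compiles fine.
    val projection = GenerateUnsafeProjection.generate(Seq(strExpr))
    ```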
    
    If we add prints for the strExpr we see something like:
    ```
    debug, row: [StringForTesting]
    input string attr: input[0, string, true]
    
    in the loop
    strExpr is: if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true]

    in the loop
    strExpr is: if ((decode(encode(if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true], utf-8), utf-8) = input[0, string, true])) if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true] else if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true]

    in the loop
    strExpr is: if ((decode(encode(if ((decode(encode(if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true], utf-8), utf-8) = input[0, string, true])) if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true] else if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true], utf-8), utf-8) = input[0, string, true])) if ((decode(encode(if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true], utf-8), utf-8) = input[0, string, true])) if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true] else if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true] else if ((decode(encode(if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true], utf-8), utf-8) = input[0, string, true])) if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true] else if ((decode(encode(input[0, string, true], utf-8), utf-8) = input[0, string, true])) input[0, string, true] else input[0, string, true]
    etc
    ```
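
    Each iteration embeds the previous strExpr three times (inside decode(encode(...)) and in both branches), so the unrolled expression grows roughly 3x per iteration. A rough back-of-the-envelope sketch of my own (assuming the generated code tracks the expression tree):

    ```scala
    // Rough estimate only: copies of the input attribute grow as ~3^n with the nesting depth.
    def copiesOfInput(iterations: Int): Long = math.pow(3, iterations).toLong

    copiesOfInput(5)   // 243 -- small enough to compile
    copiesOfInput(13)  // 1594323 -- consistent with the multi-GB dumps and blowing the 0xFFFF constant pool limit
    ```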
    
    Is this the same issue we're referring to? I've also seen timeouts (our Jenkins farm has a 5h limit). I see the problem against branch-2.0, branch-2.1, and master, but didn't see it for Spark 2.1.0 RC1.

