[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

kiszk Wed, 09 Nov 2016 06:53:52 -0800

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13909
  
    Yes, #15780 does not directly change the size of the generated code. However,
    because it changes `nullable` in the DataFrame schema, the size of the generated
    code changes as a result.

    Let me show you the following example. Without #15780, the generated code
    contains if-statements for null checks; with #15780, those if-statements are gone.
    Precise schema information can improve the quality of the generated code and
    reduce its size.
    
    With #15780
    ```java
    protected void processNext() throws java.io.IOException {
      while (inputadapter_input.hasNext()) {
        InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
        int inputadapter_value = inputadapter_row.getInt(0);

        boolean project_isNull1 = false;

        int project_value1 = -1;
        project_value1 = inputadapter_value + 1;
        project_values[0] = project_value1;

        boolean project_isNull4 = false;

        int project_value4 = -1;
        project_value4 = inputadapter_value + 2;
        project_values[1] = project_value4;
        final ArrayData project_value = org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(project_values);
        project_holder.reset();

        // Remember the current cursor so that we can calculate how many bytes are
        // written later.
        final int project_tmpCursor = project_holder.cursor;
    ...
    ```

    Without #15780
    ```java
    protected void processNext() throws java.io.IOException {
      while (inputadapter_input.hasNext()) {
        InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
        int inputadapter_value = inputadapter_row.getInt(0);

        final boolean project_isNull = false;
        this.project_values = new Object[2];
        boolean project_isNull1 = true;
        int project_value1 = -1;

        if (!false) {
          project_isNull1 = false; // resultCode could change nullability.
          project_value1 = inputadapter_value + 1;

        }
        if (project_isNull1) {
          project_values[0] = null;
        } else {
          project_values[0] = project_value1;
        }

        boolean project_isNull4 = true;
        int project_value4 = -1;

        if (!false) {
          project_isNull4 = false; // resultCode could change nullability.
          project_value4 = inputadapter_value + 2;

        }
        if (project_isNull4) {
          project_values[1] = null;
        } else {
          project_values[1] = project_value4;
        }

        final ArrayData project_value = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_values);
        this.project_values = null;
        project_holder.reset();

        project_rowWriter.zeroOutNullBytes();

        if (project_isNull) {
          project_rowWriter.setNullAt(0);
        } else {
          // Remember the current cursor so that we can calculate how many bytes are
          // written later.
          final int project_tmpCursor = project_holder.cursor;
    ...
    ```
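
The effect can be illustrated with a toy code generator. This is only a minimal sketch, not Spark's actual code generator: the class `NullCheckCodegenSketch`, the helper `genElementStore`, and the emitted names `values` and `inputIsNull` are all hypothetical. It shows how knowing `nullable = false` when the code is generated lets the generator leave out the null-check branches entirely.

```java
public class NullCheckCodegenSketch {

    // Hypothetical helper (not a Spark API): emits Java source that stores one
    // int element into an array named `values`.
    public static String genElementStore(int index, String valueExpr, boolean nullable) {
        String value = "value" + index;
        if (!nullable) {
            // The schema guarantees a non-null element: store it directly.
            return String.format("int %s = %s;%n", value, valueExpr)
                 + String.format("values[%d] = %s;%n", index, value);
        }
        // The element may be null: track an isNull flag and branch on it.
        // `inputIsNull` is an assumed name for the null flag of the input.
        String isNull = "isNull" + index;
        return String.format("boolean %s = inputIsNull;%n", isNull)
             + String.format("int %s = -1;%n", value)
             + String.format("if (!%s) { %s = %s; }%n", isNull, value, valueExpr)
             + String.format("if (%s) { values[%d] = null; } else { values[%d] = %s; }%n",
                             isNull, index, index, value);
    }

    public static void main(String[] args) {
        System.out.println("// nullable = true");
        System.out.print(genElementStore(0, "input + 1", true));
        System.out.println("// nullable = false");
        System.out.print(genElementStore(0, "input + 1", false));
    }
}
```

Running `main` prints both variants side by side; the non-nullable variant has no `if` branches and is shorter, which mirrors the difference between the two listings above.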


