Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/11301#issuecomment-204257179
  
    @kiszk Now I'm inspecting this patch and I noticed some defects so far.
    
    1.
    I ran a job like as follows.
    
    ```
    sc.parallelize(1 to 1, 1).toDF.show
    ```
    
    And then, I got generated code like as follows.
    
    ```
    /* 006 */ class SpecificSafeProjection extends 
org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
    /* 007 */
    /* 008 */   private Object[] references;
    /* 009 */   private MutableRow mutableRow;
    /* 010 */   private org.apache.spark.sql.types.StructType schema;
    /* 011 */
    /* 012 */
    /* 013 */   public SpecificSafeProjection(Object[] references) {
    /* 014 */     this.references = references;
    /* 015 */     mutableRow = (MutableRow) references[references.length - 1];
    /* 016 */     this.schema = (org.apache.spark.sql.types.StructType) 
references[0];
    /* 017 */   }
    /* 018 */
    /* 019 */   public java.lang.Object apply(java.lang.Object _i) {
    /* 020 */     InternalRow i = (InternalRow) _i;
    /* 021 */     /* createexternalrow(if (isnull(input[0, int])) null else 
input[0, int], StructField(value,IntegerType,false)) @ selectExpr at 
<console>:27 */
    /* 022 */     boolean isNull = false;
    /* 023 */     final Object[] values = new Object[1];
    ```
    
    The code above is about `SpecificSafeProjection` generated by 
`boundTEncoder.fromRow` called in `Dataset#collect`.
    I think the code above is not directly related to query or DSL written by 
application developers so It's funny that call sites are in the comments.
    
    2.
    When columns are described in DSL style like $"column", call sites in 
comment can be `$`.
    You will reproduce it by the code bellow.
    
    ```
    sc.parallelize(1 to 1, 1).select($"value").show
    ```
    
    3.
    Call sites of multiple jobs can be scrambled. For example, you can 
reproduce by following code.
    
    ```
    val df = sc.parallelize(1 to 1, 1).toDF
    df.select($"value").show
    df.selectExpr("value").show
    ```
    
    After you run the second job above, you can see generated code as follows.
    
    ```
    /* 006 */ class SpecificSafeProjection extends 
org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
    /* 007 */
    /* 008 */   private Object[] references;
    /* 009 */   private MutableRow mutableRow;
    /* 010 */   private org.apache.spark.sql.types.StructType schema;
    /* 011 */
    /* 012 */
    /* 013 */   public SpecificSafeProjection(Object[] references) {
    /* 014 */     this.references = references;
    /* 015 */     mutableRow = (MutableRow) references[references.length - 1];
    /* 016 */     this.schema = (org.apache.spark.sql.types.StructType) 
references[0];
    /* 017 */   }
    /* 018 */
    /* 019 */   public java.lang.Object apply(java.lang.Object _i) {
    /* 020 */     InternalRow i = (InternalRow) _i;
    /* 021 */     /* createexternalrow(if (isnull(input[0, int])) null else 
input[0, int], StructField(value,IntegerType,false)) @ selectExpr at 
<console>:27 */
    /* 022 */     boolean isNull = false;
    /* 023 */     final Object[] values = new Object[1];
    /* 024 */     /* if (isnull(input[0, int])) null else input[0, int] @ 
selectExpr at <console>:27 */
    ```
    
    As you can see, `select` and `selectExpr` appear in the comment above 
although `select` is not used in the second job.
    
    Another case is here.
    
    ```
    val df = sc.parallelize(1 to 1, 1).toDF
    df.select($"value" + 1).show
    df.selectExpr("value + 2").show
    
    ...
    
    /* 009 */ final class GeneratedIterator extends 
org.apache.spark.sql.execution.BufferedRowIterator {
    
    ...
    
    /* 034 */       /*** CONSUME: Project [(value#1 + 2) AS (value + 2)#7] */
    /* 035 */
    /* 036 */       /*** CONSUME: WholeStageCodegen */
    /* 037 */
    /* 038 */       /* (input[0, int] + 2) @ selectExpr at <console>:27 */
    /* 039 */       /* input[0, int] @ select at <console>:27 */
    /* 040 */       /* input[0, int] @ selectExpr at <console>:27 */
    /* 041 */       int inputadapter_value = inputadapter_row.getInt(0);
    
    ...
    
    /* 050 */ }
    ```
    
    4.
    As I mentioned before, could you make `TreeNode` serializable?
    
    >Sorry, I said we don't need to make TreeNode serializable but it's needed 
otherwise origin is not serialized and origin information like callsite is not 
in the comment in the generated code when WholeStageCodegen is disabled.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to