[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

icexelloss Wed, 15 Aug 2018 14:25:57 -0700

Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22104#discussion_r210414941
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala ---
    @@ -117,15 +117,18 @@ abstract class EvalPythonExec(udfs: Seq[PythonUDF], output: Seq[Attribute], chil
               }
             }.toArray
           }.toArray
    -      val projection = newMutableProjection(allInputs, child.output)
    +
    +      // Project input rows to unsafe row so we can put it in the row queue
    +      val unsafeProjection = UnsafeProjection.create(child.output, child.output)
    --- End diff --
    
    This requires some discussion.
    
    This is probably another bug I found while testing this: if the input node to 
EvalPythonExec doesn't produce UnsafeRow, the cast here will fail. I don't know 
whether we require data sources to produce unsafe rows; if we don't, then this 
is a problem.
    
    I also don't know whether this introduces an additional copy when the input 
is already UnsafeRow. It seems like UnsafeProjection should be smart enough to 
skip the copy, but I am not sure that's actually the case.
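
    To make the concern concrete, here is a minimal sketch (not the code in 
this PR) of converting an arbitrary InternalRow to UnsafeRow while passing 
existing UnsafeRow inputs through untouched. It assumes spark-catalyst is on the 
classpath and uses internal catalyst classes; the object `UnsafeRowSketch`, the 
helper `ensureUnsafe`, and the two-column schema are hypothetical names chosen 
for illustration. Whether UnsafeProjection itself avoids the copy is left open 
here; the explicit pattern match simply sidesteps the question.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{GenericInternalRow, UnsafeProjection, UnsafeRow}
import org.apache.spark.sql.types.{IntegerType, LongType, StructField, StructType}

object UnsafeRowSketch {
  // Hypothetical helper: return UnsafeRow inputs as-is, project anything else.
  def ensureUnsafe(row: InternalRow, toUnsafe: UnsafeProjection): UnsafeRow = row match {
    case unsafe: UnsafeRow => unsafe            // no projection, no extra copy
    case other             => toUnsafe(other)   // convert to the unsafe format
  }

  def main(args: Array[String]): Unit = {
    // Illustrative schema; in EvalPythonExec this would come from child.output.
    val schema = StructType(Seq(StructField("a", IntegerType), StructField("b", LongType)))
    val toUnsafe = UnsafeProjection.create(schema)

    // A row that is NOT an UnsafeRow, as a non-codegen input node might produce.
    val generic: InternalRow = new GenericInternalRow(Array[Any](1, 2L))
    // generic.asInstanceOf[UnsafeRow] would throw ClassCastException here.

    // Assumption: the projection may reuse its output row between calls,
    // so copy() before holding on to the result.
    val converted = ensureUnsafe(generic, toUnsafe).copy()
    println(converted.getInt(0))
    println(converted.getLong(1))

    // An input that is already UnsafeRow comes back as the same instance.
    println(ensureUnsafe(converted, toUnsafe) eq converted)
  }
}
```

    The explicit match costs one type check per row; if UnsafeProjection 
already short-circuits on UnsafeRow input, the guard is redundant but harmless.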



