Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7480#discussion_r35155967
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashSemiJoin.scala 
---
    @@ -32,34 +32,45 @@ trait HashSemiJoin {
     
       override def output: Seq[Attribute] = left.output
     
    -  @transient protected lazy val rightKeyGenerator: Projection =
    -    newProjection(rightKeys, right.output)
    +  protected[this] def supportUnsafe: Boolean = {
    +    (self.codegenEnabled && UnsafeProjection.canSupport(leftKeys)
    +      && UnsafeProjection.canSupport(rightKeys)
    +      && UnsafeProjection.canSupport(left.schema))
    +  }
    +
    +  override def outputsUnsafeRows: Boolean = right.outputsUnsafeRows
    +  override def canProcessUnsafeRows: Boolean = supportUnsafe
    --- End diff --
    
    Actually, one question: in the cases where we support UnsafeRows do we 
always want to force this to use the Unsafe execution path even if the inputs 
are both safe rows?  If so, I think that we may want to override 
`canProcessSafeRows` and set it to `!supportUnsafe` in order to force the 
unsafe path to be taken whenever it can be taken.
    
    As the code exists now, I think that it will choose the Unsafe path as long 
as either `left` or `right` already happens to output Unsafe before converters 
have been applied.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to