Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11055#discussion_r51959711
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -345,6 +345,14 @@ private[spark] object SQLConf {
         defaultValue = Some(true),
         doc = "Enables using the custom ParquetUnsafeRowRecordReader.")
     
    +  // Note: this can not be enabled all the time because the reader will not be returning UnsafeRows.
    +  // Doing so is very expensive and we should remove this requirement instead of fixing it here.
    +  // Initial testing seems to indicate only sort requires this.
    --- End diff --
    
    I think we can consider a few things.
    
    1. Only turn this on when it is part of the whole-stage codegen pipeline, which shouldn't have any of these requirements.
    2. Clean up InternalRow. It's not helpful to use InternalRow as a superclass if many places need a specific implementation. I don't think we want to have only UnsafeRow, since its requirements are too high (and it is therefore slow).
    3. Relax the requirements so that they are enforced by the operator, not the row. For example, I think we should remove copy(). The places that currently need copy() should use something like a row serializer that copies to a contiguous byte buffer, or whatever the operator wants. I'm not convinced a general-purpose copy is necessary internally.
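
To make point 3 concrete, here is a minimal sketch of an operator that owns the copy instead of asking the row for one. It does not use Spark's classes: `SimpleRow`, `ReusableRow`, `LongRowSerializer`, and `BufferingSort` are hypothetical names invented for illustration, and the sketch assumes rows made only of fixed-width `Long` fields.

```scala
import java.nio.ByteBuffer

// Hypothetical read-only row interface: getters only, no copy(),
// and no requirement on how the row is laid out in memory.
trait SimpleRow {
  def numFields: Int
  def getLong(ordinal: Int): Long
}

// A reader can reuse one mutable row object for every record. That is cheap,
// but callers must not keep a reference to it across records.
final class ReusableRow(n: Int) extends SimpleRow {
  private val values = new Array[Long](n)
  def numFields: Int = n
  def getLong(ordinal: Int): Long = values(ordinal)
  def set(ordinal: Int, v: Long): Unit = { values(ordinal) = v }
}

// Hypothetical serializer: appends a row's fields to a contiguous byte buffer
// (8 bytes per field) and reads a field back by row index and ordinal.
object LongRowSerializer {
  def serialize(row: SimpleRow, out: ByteBuffer): Unit = {
    var i = 0
    while (i < row.numFields) {
      out.putLong(row.getLong(i))
      i += 1
    }
  }

  def readLong(buf: ByteBuffer, rowIndex: Int, numFields: Int, ordinal: Int): Long =
    buf.getLong((rowIndex * numFields + ordinal) * 8)
}

// A buffering operator (a stand-in for sort) that needs rows to outlive the
// reader's reusable object. It serializes each incoming row into its own
// buffer, so the requirement is enforced here rather than by the row type.
final class BufferingSort(numFields: Int, capacity: Int) {
  private val buf = ByteBuffer.allocate(capacity * numFields * 8)
  private var count = 0

  def insert(row: SimpleRow): Unit = {
    LongRowSerializer.serialize(row, buf)
    count += 1
  }

  // Returns the first column of every buffered row, in ascending order.
  def sortedKeys(): Seq[Long] =
    (0 until count)
      .map(i => LongRowSerializer.readLong(buf, i, numFields, 0))
      .sorted
}
```

In this sketch the reader can keep overwriting a single `ReusableRow`, and only `BufferingSort` pays for a copy, in the format it chooses. Operators that consume each row immediately would pay nothing.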

