Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11055#discussion_r51952950
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -345,6 +345,14 @@ private[spark] object SQLConf {
         defaultValue = Some(true),
         doc = "Enables using the custom ParquetUnsafeRowRecordReader.")
     
    +  // Note: this can not be enabled all the time because the reader will not be returning UnsafeRows.
    +  // Doing so is very expensive and we should remove this requirement instead of fixing it here.
    +  // Initial testing seems to indicate only sort requires this.
    --- End diff --
    
    Right, all the operators output UnsafeRow, but some operators may depend on certain properties of UnsafeRow:
    
    1). copy() returns UnsafeRow
    
    2). getStruct() returns UnsafeRow, getArray() returns UnsafeArrayData, getMap() returns UnsafeMapData
    
    3). hashCode() is Murmur3 computed over the bytes of the UnsafeRow
    
    4). compareTo() compares the rows as raw bytes
    
    For example, the in-memory cache requires 2), and Except requires 4). A quick sketch of 1) through 3) is below.
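    To make those concrete, here is a minimal sketch, not from this PR, against the catalyst internals. The schema and values are invented for illustration; `UnsafeProjection.create`, `UnsafeRow.copy`, and `UnsafeRow.getStruct` are the actual catalyst APIs:
    
    ```scala
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.catalyst.expressions.{UnsafeProjection, UnsafeRow}
    import org.apache.spark.sql.types._
    
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("point", StructType(Seq(
        StructField("x", DoubleType),
        StructField("y", DoubleType))))))
    
    // Project a generic InternalRow into the UnsafeRow binary format.
    val toUnsafe = UnsafeProjection.create(schema)
    val row: UnsafeRow =
      toUnsafe(InternalRow(1L, InternalRow(1.0, 2.0))).copy()
    
    // 1). copy() preserves the concrete type: we still have an UnsafeRow.
    val copied: UnsafeRow = row.copy()
    
    // 2). getStruct() on an UnsafeRow returns the nested struct as an
    // UnsafeRow backed by the same bytes.
    val nested: UnsafeRow = row.getStruct(1, 2)
    
    // 3). hashCode() is Murmur3 over the backing bytes, so byte-identical
    // rows hash identically -- this is what hash-based operators rely on.
    assert(row.hashCode == copied.hashCode)
    ```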
    
    @rxin Do you have more comments on this?

