[GitHub] spark pull request #21118: SPARK-23325: Use InternalRow when reading with Da...

cloud-fan Tue, 07 Aug 2018 23:49:40 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21118#discussion_r208472316
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceReader.java ---
    @@ -76,5 +76,5 @@
        * If this method fails (by throwing an exception), the action will fail and no Spark job will be
        * submitted.
        */
    -  List<InputPartition<Row>> planInputPartitions();
    +  List<InputPartition<InternalRow>> planInputPartitions();
    --- End diff --
    
    The rationale is that data source v2 is not stable yet, so we should make
    it usable first; that way more people will implement data sources and
    provide feedback. Eventually we should design a stable and efficient row
    builder in data source v2, but for now we should switch to `InternalRow`
    to make it usable. `Row` is too slow for implementing a decent data source
    (like Iceberg).



---
