Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21118

@rdblue, this is a good point. Since not all operators need unsafe rows, we can save the copy on the data source side when we don't need to produce unsafe rows. We actually had such a mechanism before: https://github.com/apache/spark/pull/10511 But I'm not sure it's worth bringing back. We expect data sources to produce `ColumnarBatch` for better performance, so the performance of the row interface is not that important. `SupportsScanUnsafeRow` is really only there to avoid a performance regression when migrating the file sources. If you think it's not a good public API, we can move it to an internal package and use it only for the file sources.
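To illustrate the idea being discussed, here is a minimal, self-contained Scala sketch (all names are hypothetical and stand in for Spark's actual `UnsafeRow`/`SupportsScanUnsafeRow` machinery): a scan normally yields generic rows that the engine must copy into its internal binary format, and an opt-in capability trait lets a source declare that it already emits that format so the planner can skip the conversion copy.

```scala
// Hypothetical model, not Spark's real API.
trait Row { def values: Seq[Any] }
case class GenericRow(values: Seq[Any]) extends Row
// Stand-in for UnsafeRow (Spark's binary row format).
case class UnsafeLikeRow(values: Seq[Any]) extends Row

trait Scan { def rows: Iterator[Row] }
// Marker capability, analogous in spirit to SupportsScanUnsafeRow:
// a source mixing this in promises its rows are already UnsafeLikeRow.
trait ProducesUnsafeRows extends Scan

def toUnsafe(r: Row): UnsafeLikeRow = r match {
  case u: UnsafeLikeRow => u                            // already binary: no copy
  case other            => UnsafeLikeRow(other.values)  // conversion copy
}

// The planner inserts the conversion only when the scan does not
// advertise the capability.
def planScan(scan: Scan): Iterator[UnsafeLikeRow] = scan match {
  case s: ProducesUnsafeRows => s.rows.map(_.asInstanceOf[UnsafeLikeRow])
  case s                     => s.rows.map(toUnsafe)
}
```

This mirrors the trade-off in the comment: the capability trait saves a per-row copy for sources that can produce the binary format directly, at the cost of a wider public API surface.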