[GitHub] spark issue #21242: [SPARK-23657][SQL] Document and expose the internal data...

rdblue Mon, 21 May 2018 15:43:11 -0700

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21242
  
    @rxin, after a few attempts at cleanly exposing these classes, I tend to 
agree that it isn't going to be worth it. But the problem is that we still need 
an API for the data that sources produce. What's worse is that sources can 
already produce this data, but there's no documentation or guidance for doing 
so. What do you suggest as a fix?
    
    I'm fine using `InternalRow` as-is in the v2 sources and creating a 
markdown doc that describes the internal data formats that should be used.
    
    Even if we just add docs and don't expose these classes, the internals are 
going to need to change. `ArrayData`, for example, exposes an `array` method 
that is used in places to get the backing array. Because callers can hold on to 
that array, readers can't safely reuse the objects, which increases both 
allocation and GC costs. There is no real reason to use that method besides the 
fact that it happens to be available.
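
    To illustrate the reuse problem, here is a minimal Java sketch. It does not 
use Spark's classes: `ReusableIntArray`, `backingArray()`, and the demo class 
are hypothetical names invented for this example, and the assumption is that a 
reader wants to refill one container per record instead of allocating a new one.

```java
import java.util.Arrays;

// Hypothetical stand-in for an array container; this is NOT Spark's ArrayData.
final class ReusableIntArray {
    private int[] values = new int[0];
    private int length = 0;

    // A reader refills the same buffer for each record instead of allocating.
    void set(int[] source, int count) {
        if (values.length < count) {
            values = new int[count];
        }
        System.arraycopy(source, 0, values, 0, count);
        length = count;
    }

    int numElements() {
        return length;
    }

    // Element access that never hands out the underlying storage.
    int getInt(int ordinal) {
        if (ordinal < 0 || ordinal >= length) {
            throw new IndexOutOfBoundsException("ordinal: " + ordinal);
        }
        return values[ordinal];
    }

    // Leaky accessor, analogous in spirit to exposing the backing array.
    int[] backingArray() {
        return values;
    }

    // Defensive copy: safe to keep, but costs an allocation per call.
    int[] copy() {
        return Arrays.copyOf(values, length);
    }
}

final class BackingArrayDemo {
    public static void main(String[] args) {
        ReusableIntArray container = new ReusableIntArray();

        container.set(new int[] {1, 2, 3}, 3);
        int[] leaked = container.backingArray(); // caller keeps the internal array
        int[] safe = container.copy();           // caller keeps its own copy

        // The reader reuses the container for the next record.
        container.set(new int[] {7, 8, 9}, 3);

        System.out.println(leaked[0]); // 7: the earlier record was overwritten
        System.out.println(safe[0]);   // 1: the copy is unaffected
    }
}
```

    As long as an accessor like `backingArray()` exists and is called, a reader 
has to allocate a fresh container per record to stay correct, which is where 
the extra allocation and GC cost comes from.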



