GitHub user rdblue commented on the issue:

https://github.com/apache/spark/pull/21242

@rxin, after a few attempts at cleanly exposing these classes, I tend to agree that it isn't going to be worth it. But the problem is that we still need an API for the data that data sources produce. What's worse is that sources can already produce this data, but there's no documentation or guidance for doing so.

What do you suggest as a fix? I'm fine with using `InternalRow` as-is in the v2 sources and writing a Markdown doc that describes the internal data formats that should be used.

Even if we just add docs and don't expose these classes, the internals are going to need to change. `ArrayData`, for example, exposes an `array` method that is used in places to get the backing array. That prevents readers from reusing the objects, which increases both allocations and GC cost. There is no real reason to use that method besides the fact that it happens to be available.
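
To illustrate the reuse problem, here is a minimal Java sketch. It does not use Spark's classes: `ReusableIntArray`, `keepBackingArray`, and `copyViaAccessors` are hypothetical names invented for this example, and the container is only a stand-in for the idea of an array object that a reader refills for each record. The assumption is that a reader wants to allocate one container and overwrite it in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not Spark classes.
final class ReuseSketch {

    // Stand-in for an array container that a reader refills for each record.
    static final class ReusableIntArray {
        private final int[] values;

        ReusableIntArray(int size) {
            this.values = new int[size];
        }

        int numElements() {
            return values.length;
        }

        int getInt(int i) {
            return values[i];
        }

        // Leaky accessor: hands the backing storage to the caller.
        int[] array() {
            return values;
        }

        // Reader side: overwrite the contents in place for the next record.
        void load(int base) {
            for (int i = 0; i < values.length; i++) {
                values[i] = base + i;
            }
        }
    }

    // Consumer that keeps the backing array. Every kept reference aliases
    // the same storage, so earlier records are silently overwritten.
    static List<int[]> keepBackingArray(int records, int size) {
        ReusableIntArray reused = new ReusableIntArray(size);
        List<int[]> kept = new ArrayList<>();
        for (int r = 0; r < records; r++) {
            reused.load(r * 10);
            kept.add(reused.array());
        }
        return kept;
    }

    // Consumer that reads through the accessors and copies what it keeps,
    // so the reader is free to reuse the container.
    static List<int[]> copyViaAccessors(int records, int size) {
        ReusableIntArray reused = new ReusableIntArray(size);
        List<int[]> kept = new ArrayList<>();
        for (int r = 0; r < records; r++) {
            reused.load(r * 10);
            int[] copy = new int[reused.numElements()];
            for (int i = 0; i < copy.length; i++) {
                copy[i] = reused.getInt(i);
            }
            kept.add(copy);
        }
        return kept;
    }

    public static void main(String[] args) {
        // First record as seen by each consumer after two records were read.
        System.out.println(Arrays.toString(keepBackingArray(2, 3).get(0))); // [10, 11, 12]
        System.out.println(Arrays.toString(copyViaAccessors(2, 3).get(0))); // [0, 1, 2]
    }
}
```

In this sketch, as long as any caller may hold on to the result of `array()`, the reader cannot safely refill the container and has to allocate a fresh one per record. Restricting callers to index-based accessors is what makes in-place reuse safe.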