Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21242
@rxin, after a few attempts at cleanly exposing these classes, I tend to
agree that it isn't going to be worth it. But the problem remains that we need
an API for the data that sources produce. What's worse, sources can already
produce this data, but there's no documentation or guidance for doing so. What
do you suggest as a fix?
I'm fine using `InternalRow` as is in the v2 sources, and creating a
markdown doc that describes the internal data formats that should be used.
Even if we just add docs and don't expose these classes, the internals are
going to need to change. `ArrayData`, for example, exposes `array`, which is
used in places to get the backing array. That prevents readers from reusing
the objects and increases both allocations and GC costs. There is no real
reason to use that method besides the fact that it happens to be available.
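
To make the reuse problem concrete, here is a minimal sketch in plain Java. It does not use Spark's classes: `ReusableArray`, `fill`, `backingArray`, and `toIntArray` are hypothetical names made up for illustration, with `backingArray` standing in for an accessor that hands out the internal storage. It only shows why a reader cannot safely reuse a container once callers hold its backing array.

```java
import java.util.Arrays;

class ReuseSketch {
    // Hypothetical container, NOT Spark's ArrayData. A reader would like to
    // refill one instance per record instead of allocating a new one.
    static final class ReusableArray {
        private int[] values = new int[0];
        private int size = 0;

        // Overwrite the contents in place, growing storage only when needed.
        void fill(int... newValues) {
            if (values.length < newValues.length) {
                values = new int[newValues.length];
            }
            System.arraycopy(newValues, 0, values, 0, newValues.length);
            size = newValues.length;
        }

        int numElements() {
            return size;
        }

        // Leaky accessor (assumption: plays the role of exposing the
        // backing array). Callers now alias the internal storage.
        int[] backingArray() {
            return values;
        }

        // Copying accessor: the caller owns the result, so reuse stays safe.
        int[] toIntArray() {
            return Arrays.copyOf(values, size);
        }
    }

    public static void main(String[] args) {
        ReusableArray record = new ReusableArray();
        record.fill(1, 2, 3);

        int[] leaked = record.backingArray(); // aliases internal storage
        int[] copied = record.toIntArray();   // independent snapshot

        // The reader reuses the same object for the next record.
        record.fill(7, 8, 9);

        System.out.println(Arrays.toString(leaked)); // [7, 8, 9]
        System.out.println(Arrays.toString(copied)); // [1, 2, 3]
    }
}
```

Because `leaked` changes underneath its holder, a reader that wants correct results has to allocate a fresh container per record as long as anything calls the leaky accessor. Removing that accessor, or at least steering callers toward element getters or copies, is what would make reuse possible.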