On Sun, Jul 26, 2020 at 5:52 AM Chris Nuernberger <[email protected]> wrote:
> Hmm, sounds reasonable enough. I may be mistaken but it appears to me
> that the fact that the current code relies on mutably updating the vector
> schema root does preclude concurrent access or parallelized access to
> multiple record batches. Potentially a map-batch method that returns a new
> vector-schema-root each time would work.
>

Yeah, you could do something like that. The issue you may see, depending on your vector/batch sizes, is increased heap usage. The stream-based design of the current classes was built to minimize heap churn when working with large pipelines.
