Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21312
@viirya I looked into it a bit more, and calling `clear()` won't cause any
problems, but it does trigger a reallocation of the vector buffers on the next
write. What do you think about changing this to do a manual "reset" so that
the buffers can be reused? It just needs to zero out the buffers and set the
value count to 0, so something like this:
```
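// Zero the existing buffers in place instead of releasing them
val buffers = repeatedValueVector.getBuffers(false)
buffers.foreach(buf => buf.setZero(0, buf.capacity()))
// Mark the vector as empty so the next write starts from index 0
repeatedValueVector.setValueCount(0)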
```
Once we upgrade to Arrow 0.10.0, this can be cleaned up because there is a
common interface to `reset()`. I think we should definitely get this
backported to the 2.3 branch too.
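
To illustrate why the reset avoids the extra allocation, here is a rough
sketch that does not use Arrow at all. `ToyVector` and its `allocations`
counter are hypothetical names made up for this example; they only model the
behavior described above (`clear()` releases the buffer, a manual reset zeroes
it in place) and are not how Arrow implements its vectors.

```scala
// Toy model only: ToyVector is hypothetical and is not part of Arrow.
final class ToyVector(initialCapacity: Int) {
  require(initialCapacity > 0, "capacity must be positive")

  private var buffer: Array[Byte] = new Array[Byte](initialCapacity)
  private var valueCount: Int = 0
  var allocations: Int = 1 // how many times a buffer has been allocated

  def write(value: Byte): Unit = {
    if (buffer.isEmpty) { // buffer was released by clear(), allocate again
      buffer = new Array[Byte](initialCapacity)
      allocations += 1
    }
    buffer(valueCount) = value
    valueCount += 1
  }

  // Mimics clear(): drop the buffer, so the next write must reallocate.
  def clear(): Unit = { buffer = Array.emptyByteArray; valueCount = 0 }

  // Mimics the manual reset: zero the buffer in place and keep it allocated.
  def reset(): Unit = { java.util.Arrays.fill(buffer, 0.toByte); valueCount = 0 }

  def count: Int = valueCount
}

object ResetDemo extends App {
  val cleared = new ToyVector(16)
  cleared.write(1); cleared.clear(); cleared.write(2)
  println(cleared.allocations) // 2: clear() forced a second allocation

  val reused = new ToyVector(16)
  reused.write(1); reused.reset(); reused.write(2)
  println(reused.allocations) // 1: the original buffer was reused
}
```

In both cases the vector ends up holding a single value; the only difference
is whether the backing buffer had to be allocated a second time.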