Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/21312

@viirya I looked into it a bit more. Calling `clear()` won't cause any problems, but it does trigger a reallocation of the vector buffers the next time the vector is written to. What do you think about changing this to do a manual "reset" so that the buffers can be reused? It just needs to zero out the buffers and set the value count to 0, so something like this:

```
val buffers = repeatedValueVector.getBuffers(false)
buffers.foreach(buf => buf.setZero(0, buf.capacity()))
repeatedValueVector.setValueCount(0)
```

Once we upgrade to Arrow 0.10.0, this can be cleaned up because that version has a common `reset()` interface. I think we should definitely get this backported to the 2.3 branch too.
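To illustrate the difference between releasing and reusing buffers, here is a minimal toy sketch. It is not Arrow code: `ToyVector`, `allocationsAfterTwoBatches`, and the allocation counter are hypothetical names made up for this example, and a plain `java.nio.ByteBuffer` stands in for the vector's buffers. It only models the behavior described above, where `clear()` forces a new allocation on the next write and a manual reset does not.

```scala
import java.nio.ByteBuffer
import java.util.Arrays

object BufferReuseSketch {

  // Hypothetical stand-in for a vector with one backing buffer; NOT the Arrow API.
  final class ToyVector(capacity: Int) {
    private var buffer: ByteBuffer = ByteBuffer.allocate(capacity)
    private var valueCount: Int = 0
    private var allocationCount: Int = 1 // the constructor allocates once

    def allocations: Int = allocationCount
    def count: Int = valueCount

    // Writes one byte; callers must stay within `capacity` (this toy never grows).
    def write(value: Byte): Unit = {
      if (buffer == null) {
        // The buffer was released by clear(), so writing has to allocate again.
        buffer = ByteBuffer.allocate(capacity)
        allocationCount += 1
      }
      buffer.put(valueCount, value)
      valueCount += 1
    }

    // Models clear(): drop the buffer and the value count.
    def clear(): Unit = {
      buffer = null
      valueCount = 0
    }

    // Models the manual reset: zero the buffer, set the count to 0, keep the allocation.
    def reset(): Unit = {
      if (buffer != null) Arrays.fill(buffer.array(), 0.toByte)
      valueCount = 0
    }
  }

  // Writes a value, clears or resets, writes again, and reports total allocations.
  def allocationsAfterTwoBatches(releaseBetween: Boolean): Int = {
    val vector = new ToyVector(16)
    vector.write(1.toByte)
    if (releaseBetween) vector.clear() else vector.reset()
    vector.write(2.toByte)
    vector.allocations
  }
}
```

In this model, `BufferReuseSketch.allocationsAfterTwoBatches(releaseBetween = true)` counts two allocations, while the reset path counts one, which is the saving the manual reset is after.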