Hi Diego,

Regarding system memory usage, allocation and deallocation are ultimately handled by the memory implementation you are using (Netty or Unsafe). For deallocation, BufferAllocators count references to the buffers they allocate, and when a buffer's reference count hits zero, the memory implementation's release() API is called. This means that Arrow objects that use a BufferAllocator need to be explicitly closed so that their references to the underlying memory are removed. Direct memory allocation and deallocation are expensive, so the memory implementation may not deallocate right away. Is the memory usage you are seeing impacting anything at runtime? I think what you are seeing is common behavior for Arrow implementations.
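To make the reference-counting point above concrete, here is a minimal sketch of the idea using only the JDK. ToyAllocator and ToyBuffer are hypothetical names invented for this illustration; they are not Arrow classes and they leave out almost everything the real allocators do. The sketch only shows why every holder has to close its reference before the memory is handed back, and why closing an allocator that still has outstanding buffers is reported as a leak.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model only: hypothetical class, not part of Arrow. */
final class ToyAllocator implements AutoCloseable {
    private final String name;
    private final AtomicLong allocatedBytes = new AtomicLong();

    ToyAllocator(String name) {
        this.name = name;
    }

    /** Hands out a buffer whose reference count starts at 1. */
    ToyBuffer buffer(long size) {
        allocatedBytes.addAndGet(size);
        return new ToyBuffer(this, size);
    }

    long allocatedBytes() {
        return allocatedBytes.get();
    }

    /** Called by a buffer once its last reference is gone. */
    void release(long size) {
        allocatedBytes.addAndGet(-size);
    }

    /** Closing while buffers are still outstanding is treated as a leak. */
    @Override
    public void close() {
        long outstanding = allocatedBytes.get();
        if (outstanding != 0) {
            throw new IllegalStateException(name + " leaked " + outstanding + " bytes");
        }
    }
}

/** Toy model only: hypothetical class, not part of Arrow. */
final class ToyBuffer implements AutoCloseable {
    private final ToyAllocator owner;
    private final long size;
    private final AtomicInteger refCount = new AtomicInteger(1);

    ToyBuffer(ToyAllocator owner, long size) {
        this.owner = owner;
        this.size = size;
    }

    /** A second holder takes a reference to the same memory. */
    void retain() {
        refCount.incrementAndGet();
    }

    /** Drops one reference; the memory is returned only when the count reaches zero. */
    @Override
    public void close() {
        if (refCount.decrementAndGet() == 0) {
            owner.release(size);
        }
    }
}

final class RefCountSketch {
    public static void main(String[] args) {
        try (ToyAllocator allocator = new ToyAllocator("request-scope")) {
            ToyBuffer buf = allocator.buffer(1024);
            buf.retain();  // a second holder shares the buffer
            buf.close();   // first holder done: count goes 2 -> 1
            System.out.println(allocator.allocatedBytes()); // prints 1024
            buf.close();   // second holder done: count goes 1 -> 0
            System.out.println(allocator.allocatedBytes()); // prints 0
        } // allocator closes cleanly because nothing is outstanding
    }
}
```

If the second close() is forgotten, the toy allocator's close() throws, which is the kind of leak an audit of your own close() calls is meant to catch.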
Regarding child allocators, you'll probably want to profile your specific usage to see if there is a performance impact. I think it can be fine to use one within a single method in certain cases. I would recommend pairing them logically with specific functionality. For example, the docs state:

"Child allocators are not strictly required, but can help better organize code. For instance, a lower memory limit can be set for a particular section of code. The child allocator can be closed when that section completes, at which point it checks that that section didn’t leak any memory. Child allocators can also be named, which makes it easier to tell where an ArrowBuf came from during debugging."

In general, you'll want to audit your usage for memory leaks. The Arrow Java APIs can be misused rather easily. The vector life cycle explained here [1] provides a good example of the expected order of operations for a ValueVector.

[1] https://arrow.apache.org/docs/java/vector.html

On Mon, Nov 13, 2023 at 1:31 PM Diego Fernandez <[email protected]> wrote:

> Hey all,
>
> I've read the memory docs <https://arrow.apache.org/docs/java/memory.html> a
> couple of times but I still have a few questions. Maybe the answers to some
> of this might be good addons to the docs.
>
> We currently have a single RootAllocator that we use everywhere, and we
> see our memory usage on the system slowly grow until it's almost 100% and
> stays there (although we don't see any OOM errors). The JVM immediately
> allocates the bulk of the memory, and my understanding is the rest is just
> the RootAllocator slowly allocating more direct memory as needed but never
> releasing it since it can just reuse memory that has previously been
> allocated but is no longer in use.
>
> System memory usage
> Is my understanding above correct? Does the RootAllocator just continue to
> take up direct memory until it reaches max system memory, max direct buffer
> memory, or max memory allowed for the RootAllocator (whichever comes first)?
>
>> It is the responsibility of the application owning a particular allocator
>> to frequently confirm whether the allocator is over its memory limit
>> (BufferAllocator.isOverLimit()) and if so, attempt to aggressively release
>> memory to ameliorate the situation.
>
> Does this also apply to the RootAllocator? How exactly do we release
> memory? I see `releaseBytes` as the only related method on an allocator,
> but how would we know how many bytes we can release at a given time?
>
> Child allocator life span
> The docs explain the purpose of child allocators pretty well, but they
> don't really mention much about recommended best practices. Looking at the
> project, it seems most child allocators are used for the lifespan of a
> class, although some instances (ArrowFlightSqlClientHandler and
> VectorSchemaRootTransformer for example) don't even seem to ever close the
> child allocator.
>
> Is creating and closing a child allocator an expensive operation? would it
> make sense to create a child allocator just for the span of a method call
> and close it when you're done?
>
> Any other info regarding best practices around BufferAllocators or issues
> to look out for would be greatly appreciated!
