[GitHub] [arrow] jacques-n commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++
jacques-n commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-679358407 > The logic might warrant its own interface instead of a std::Function. I believe we already have this functionality in Gandiva. Let's see if we can use a consistent pattern (whether that is use what Gandiva has or make sure Gandiva uses whatever is proposed here--either way). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jacques-n commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++
jacques-n commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-657096664 > @jacques-n > > I think we have several choices: we can try implement a c++ memory pool which is backed by a Java Allocator, actually I have a PoC branch for that: https://github.com/zhztheplayer/arrow-1/tree/ARROW-7808-wip-bridged-allocator. > > Or we can continue hacking Java API to reserve direct memory after buffer is registered (performance impact will be smaller). The reservation should happen inside Netty API because Netty has its own direct memory counter: https://github.com/netty/netty/blob/cbe238a95bab238455e3ff1849d450a9836f39ef/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L749-L766. > > Which one do you suggest or maybe there's a better one than both? If you use Netty, then Netty has a flag that already respects the direct memory limits (so no additional work is necessary). I was suggesting a third approach: C++ memory pool that simply updates the direct memory info via Bits in java. It should be done chunk-wise to avoid excessive JNI boundary calls. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jacques-n commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++
jacques-n commented on pull request #7030: URL: https://github.com/apache/arrow/pull/7030#issuecomment-650227890 > 1. We are package-hacking `org.apache.arrow.memory` in module `arrow-dataset`. >Yes `Ownerships.java` uses some hacks, so I've removed the class in latest commits (the whole class is not necessary for current use case). Now only `NativeUnderlyingMemory.java` is still in the package `org.apache.arrow.memory`. We need to avoid this entirely. If you need some functionality, let's figure out what should be exposed. BaseAllocator should not be referenced. > 2. We lose java native heap caps. >Can we clarify this? I think the explanation can be either we lose caps from JVM direct memory, or we lose caps from BufferAllocator. For the latter one I think buffers are already registered to allocator. JVM should respect the max direct memory setting. I added more comments on your other comment to that end. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org