[GitHub] [arrow] jacques-n commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-08-24 Thread GitBox


jacques-n commented on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-679358407


   > The logic might warrant its own interface instead of a `std::function`.
   
   I believe we already have this functionality in Gandiva. Let's see if we can 
use a consistent pattern (either using what Gandiva has, or making sure 
Gandiva adopts whatever is proposed here).
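
   To illustrate the design point being discussed (a named callback interface 
rather than a bare function object, so the contract is documented and reusable 
across modules), a minimal Java sketch might look like the following. 
`ReservationListener` and its methods are hypothetical names chosen for this 
example, not an existing Arrow or Gandiva API:

```java
// Hypothetical named callback interface: callers see a documented contract
// (reserve/unreserve must be paired) instead of an anonymous function type.
interface ReservationListener {
  void reserve(long size);    // called before native memory is allocated
  void unreserve(long size);  // called after native memory is freed
}

// A trivial implementation that just tracks outstanding bytes.
class CountingListener implements ReservationListener {
  private long reserved;

  @Override
  public void reserve(long size) {
    reserved += size;
  }

  @Override
  public void unreserve(long size) {
    reserved -= size;
  }

  long reserved() {
    return reserved;
  }
}
```

   A named interface like this can be shared between modules (e.g. Gandiva and 
the dataset bindings), which is harder to do cleanly with an ad-hoc function 
type at each call site.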



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jacques-n commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-07-11 Thread GitBox


jacques-n commented on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-657096664


   > @jacques-n
   > 
   > I think we have several choices. We could implement a C++ memory pool 
backed by a Java allocator; I actually have a PoC branch for that: 
https://github.com/zhztheplayer/arrow-1/tree/ARROW-7808-wip-bridged-allocator.
   > 
   > Or we could continue hacking the Java API to reserve direct memory after a 
buffer is registered (the performance impact would be smaller). The reservation 
should happen inside the Netty API because Netty keeps its own direct-memory 
counter: 
https://github.com/netty/netty/blob/cbe238a95bab238455e3ff1849d450a9836f39ef/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L749-L766.
   > 
   > Which one do you suggest, or is there a better option than either?
   
   If you use Netty, then Netty has a flag that already respects the direct 
memory limits (so no additional work is necessary).
   
   I was suggesting a third approach: a C++ memory pool that simply updates the 
direct-memory accounting via `Bits` in Java. The updates should be done 
chunk-wise to avoid excessive JNI boundary calls.
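
   The chunk-wise idea could be sketched as below. This is only an illustration 
of the batching, with hypothetical names: in the real design `reserveFromJvm` / 
`unreserveFromJvm` would stand in for whatever mechanism (e.g. the internal 
`java.nio.Bits` counters) actually updates the JVM's direct-memory accounting, 
and `reserve`/`unreserve` would be invoked from the C++ pool over JNI:

```java
// Sketch: aggregate many small native allocations into large "chunks" so the
// JVM-side accounting call (a JNI boundary crossing in the real design) runs
// once per chunk instead of once per allocation.
class ChunkedReservationListener {
  static final long CHUNK_SIZE = 8 * 1024 * 1024; // 8 MiB per accounting call

  private long reservedFromJvm;  // bytes reported to the JVM-side counter
  private long usedByNative;     // bytes actually handed out natively

  // Called (conceptually via JNI) on every native allocation.
  void reserve(long size) {
    usedByNative += size;
    while (usedByNative > reservedFromJvm) {
      reserveFromJvm(CHUNK_SIZE);   // expensive call, done chunk-wise
      reservedFromJvm += CHUNK_SIZE;
    }
  }

  // Called on every native free; whole unused chunks are given back.
  void unreserve(long size) {
    usedByNative -= size;
    while (reservedFromJvm - usedByNative >= CHUNK_SIZE) {
      unreserveFromJvm(CHUNK_SIZE);
      reservedFromJvm -= CHUNK_SIZE;
    }
  }

  // Placeholders for the real JVM direct-memory accounting hooks.
  void reserveFromJvm(long size) {}
  void unreserveFromJvm(long size) {}

  long reservedFromJvm() { return reservedFromJvm; }
}
```

   With this shape, thousands of small allocations cost only a handful of 
accounting calls, which is the point of doing the bookkeeping chunk-wise.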







[GitHub] [arrow] jacques-n commented on pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-06-26 Thread GitBox


jacques-n commented on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-650227890


   
   > 1. We are package-hacking `org.apache.arrow.memory` in module 
`arrow-dataset`.
   > Yes, `Ownerships.java` used some hacks, so I've removed the class in the 
latest commits (the whole class isn't necessary for the current use case). Now 
only `NativeUnderlyingMemory.java` remains in the package 
`org.apache.arrow.memory`.
   
   We need to avoid this entirely. If you need some functionality, let's figure 
out what should be exposed. BaseAllocator should not be referenced.
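   
   As an illustration of the alternative being asked for, the dataset module 
could depend on a small public extension point exported by the memory module, 
rather than placing its classes inside `org.apache.arrow.memory`. The 
`NativeMemoryHook` interface below is hypothetical, not an existing Arrow API; 
it only shows the shape such an exposed contract might take:

```java
// Hypothetical public SPI that the memory module could export, so other
// modules (e.g. arrow-dataset) never need package-private access.
interface NativeMemoryHook {
  long allocate(long size);               // returns the native address
  void release(long address, long size);  // frees a previous allocation
}

// The dataset module would then implement the hook in its own package.
class TrackingHook implements NativeMemoryHook {
  private long outstanding;

  @Override
  public long allocate(long size) {
    outstanding += size;
    return 0L; // a real implementation would call into native code here
  }

  @Override
  public void release(long address, long size) {
    outstanding -= size;
  }

  long outstanding() { return outstanding; }
}
```

   The design choice here is that only the interface crosses the module 
boundary, so internals like `BaseAllocator` stay private to the memory module.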
   
   > 2. We lose Java native heap caps.
   > Can we clarify this? It could mean either that we lose the cap from JVM 
direct memory, or that we lose the cap from the `BufferAllocator`. For the 
latter, I think the buffers are already registered with the allocator.
   
   The JVM should respect the max direct memory setting. I left more comments 
on your other comment to that end.


