I'd also add that your point:

> There are certainly other situations such as small files where the copying
> pathway is indeed faster, but for these pathways it is not even close.

This is pretty much the intended design of the Java library. Not small files per se, but small batches streamed through processing pipelines.

On Thu, Aug 13, 2020 at 7:59 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Chris,
> Nice write-up. I'm curious if you did more analysis on where time was
> spent for each method?
>
> It seems to confirm that investing in zero-copy reads from disk provides a
> nice speedup. I'm curious, did you attempt to create a buffer allocator
> based on memory-mapped files for comparison?
>
> Thanks,
> Micah
>
> On Thursday, August 13, 2020, Chris Nuernberger <ch...@techascent.com>
> wrote:
>
>> Arrow Users -
>>
>> We took some time and wrote a blog post on Arrow's binary format and
>> memory mapping on the JVM. We are happy with how succinctly we broke down
>> the binary format in a visual way and think Arrow users looking to do
>> interesting/unsupported things with Arrow may be interested in the
>> presentation.
>>
>> https://techascent.com/blog/memory-mapping-arrow.html
>>
>> Chris
>>