Ok, I have a PR up that creates a new non-Hadoop API. It also includes a port of the tool that demonstrates reading and writing ORC without Hadoop on the classpath at all.
https://github.com/apache/orc/pull/641

Check it out and let me know if it works for you.

.. Owen

On Fri, Jan 22, 2021 at 6:32 PM Andrey Elenskiy <andrey.elens...@arista.com> wrote:

> Thanks to both of you. I've actually gone ahead with implementing the
> FileSystem API following this util:
> https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/util/StreamWrapperFileSystem.java
> I think it would be awesome to have ORC separated from the Hadoop classes
> eventually, as I have to pull those jars in as a dependency, and of course
> there are multiple layers of indirection here.
>
> On Fri, Jan 22, 2021 at 10:21 AM Owen O'Malley <owen.omal...@gmail.com> wrote:
>
>> Ok, a couple of things:
>>
>> - The PhysicalWriter was intended so that LLAP could implement a
>> write-through cache, where the new file was put into the cache as well as
>> written to long-term storage.
>> - The Hadoop FileSystem API, which is what ORC currently uses, is
>> extensible and has a lot of bindings other than HDFS. For your use case,
>> you probably want to use "file:///my-dir/my.orc".
>> - Somewhere in the unit tests there is an implementation of Hadoop
>> FileSystem that uses ByteBuffers in memory.
>> - Finally, over the years there has been an ask for using ORC core
>> without having Hadoop on the class path. Let me take a pass at that today
>> to see if I can make that work. See
>> https://issues.apache.org/jira/browse/ORC-508 .
>>
>> .. Owen
>>
>> On Tue, Jan 19, 2021 at 7:20 PM Andrey Elenskiy <andrey.elens...@arista.com> wrote:
>>
>>> Hello, currently there's only a single implementation of PhysicalWriter
>>> that I was able to find -- PhysicalFSWriter, which only gives the option
>>> to write to HDFS.
>>>
>>> I'd like to reuse the ORC file format for my own purposes without the
>>> destination being HDFS, but just some byte buffer where I can decide myself
>>> where the bytes end up being saved.
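Owen's pointers above (the "file://" binding of the Hadoop FileSystem API, and StreamWrapperFileSystem for presenting a single seekable stream as a file system) can be sketched roughly as follows. This is a minimal sketch, not code from the PR: the path, schema, and row count are made up for illustration, and it assumes orc-core and its Hadoop dependencies are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;
import org.apache.orc.util.StreamWrapperFileSystem;

public class StreamWrapperDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "file://" routes through the local-file binding, so no HDFS is involved.
    Path path = new Path("file:///tmp/wrapped.orc"); // hypothetical path
    TypeDescription schema = TypeDescription.fromString("struct<x:bigint>");

    // Write a small ORC file to local disk.
    Writer writer = OrcFile.createWriter(path,
        OrcFile.writerOptions(conf).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    for (long i = 0; i < 5; i++) {
      x.vector[batch.size++] = i;
    }
    writer.addRowBatch(batch);
    writer.close();

    // Read it back through StreamWrapperFileSystem, which wraps one open
    // seekable input stream as a FileSystem for the reader to use.
    FileSystem local = path.getFileSystem(conf);
    long size = local.getFileStatus(path).getLen();
    FSDataInputStream in = local.open(path);
    FileSystem wrapped = new StreamWrapperFileSystem(in, path, size, conf);
    Reader reader = OrcFile.createReader(path,
        OrcFile.readerOptions(conf).filesystem(wrapped));
    System.out.println("rows: " + reader.getNumberOfRows());
  }
}
```

The same wrapping trick works for bytes that never touch disk, as long as you can hand the wrapper a seekable FSDataInputStream and the file's length.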
>>>
>>> I've started implementing PhysicalWriter, but it seems like a lot of it
>>> just ends up being copied over from PhysicalFSWriter, which seems redundant.
>>> So I'm wondering if maybe something already exists to achieve my goal of
>>> just writing the resulting columns to a DataOutputStream (maybe there's some
>>> unofficial Java library, or I'm missing some obvious official API).
>>>
>>> Thanks,
>>> Andrey
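For the use case in the original question -- writing ORC somewhere other than HDFS -- the thread's suggestion is to lean on the Hadoop FileSystem abstraction rather than re-implement PhysicalWriter. A minimal round-trip sketch along those lines, assuming orc-core on the classpath; the path and two-column schema are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class LocalOrcRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    TypeDescription schema =
        TypeDescription.fromString("struct<x:bigint,y:bigint>");
    // The standard writer targets any FileSystem binding; "file://" keeps
    // everything on the local disk with no cluster involved.
    Path path = new Path("file:///tmp/local-demo.orc"); // hypothetical path

    Writer writer = OrcFile.createWriter(path,
        OrcFile.writerOptions(conf).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    LongColumnVector y = (LongColumnVector) batch.cols[1];
    for (long i = 0; i < 3; i++) {
      int row = batch.size++;
      x.vector[row] = i;
      y.vector[row] = i * 10;
    }
    writer.addRowBatch(batch);
    writer.close();

    // Scan the file back and sum the second column.
    Reader reader = OrcFile.createReader(path, OrcFile.readerOptions(conf));
    RecordReader rows = reader.rows();
    VectorizedRowBatch read = reader.getSchema().createRowBatch();
    long sum = 0;
    while (rows.nextBatch(read)) {
      LongColumnVector ys = (LongColumnVector) read.cols[1];
      for (int r = 0; r < read.size; r++) {
        sum += ys.vector[r];
      }
    }
    rows.close();
    System.out.println("sum(y) = " + sum); // 0 + 10 + 20
  }
}
```

This avoids copying PhysicalFSWriter entirely; the newer non-Hadoop API in the PR at the top of the thread is the longer-term answer for dropping the Hadoop jars altogether.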