Ok, I have a PR up that creates a new non-Hadoop API. It also includes a
port of the tool that demonstrates reading and writing ORC without Hadoop
on the classpath at all.

https://github.com/apache/orc/pull/641

Check it out and let me know if it works for you.

.. Owen

On Fri, Jan 22, 2021 at 6:32 PM Andrey Elenskiy <andrey.elens...@arista.com>
wrote:

> Thanks to both of you. I've actually gone ahead with implementing the
> FileSystem API following this util:
> https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/util/StreamWrapperFileSystem.java
> I think it would be awesome to have ORC separated from the Hadoop
> classpath eventually, as I have to pull those jars in as a dependency and
> of course there are multiple layers of indirection here.
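In case it helps anyone else on the list, here is a rough sketch of wiring an in-memory byte array into an ORC Reader through that util. This assumes a recent ORC release where StreamWrapperFileSystem exists, and that its four-argument constructor takes (stream, path, fileSize, conf) as in the linked source; the SeekableBytes helper is hypothetical glue written for this sketch, needed because FSDataInputStream requires a stream that implements Seekable and PositionedReadable:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PositionedReadable;
import org.apache.hadoop.fs.Seekable;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.util.StreamWrapperFileSystem;

public class InMemoryOrcRead {

  /** Hypothetical helper: a seekable in-memory stream, since
   *  FSDataInputStream needs Seekable + PositionedReadable. */
  static class SeekableBytes extends ByteArrayInputStream
      implements Seekable, PositionedReadable {
    SeekableBytes(byte[] buf) { super(buf); }

    public void seek(long newPos) { this.pos = (int) newPos; }
    public long getPos() { return pos; }
    public boolean seekToNewSource(long target) { return false; }

    public int read(long position, byte[] buffer, int offset, int length) {
      // Positioned read: read at the given offset without moving the cursor.
      long saved = getPos();
      seek(position);
      int n = read(buffer, offset, length);
      seek(saved);
      return n;
    }
    public void readFully(long position, byte[] buffer, int offset, int length)
        throws IOException {
      if (read(position, buffer, offset, length) < length) {
        throw new IOException("EOF reached before reading fully");
      }
    }
    public void readFully(long position, byte[] buffer) throws IOException {
      readFully(position, buffer, 0, buffer.length);
    }
  }

  /** Open an ORC Reader over bytes already in memory, no real file system. */
  static Reader openFromBytes(byte[] orcBytes, Configuration conf)
      throws IOException {
    FSDataInputStream in = new FSDataInputStream(new SeekableBytes(orcBytes));
    Path fakePath = new Path("in-memory.orc");
    FileSystem fs =
        new StreamWrapperFileSystem(in, fakePath, orcBytes.length, conf);
    return OrcFile.createReader(fakePath,
        OrcFile.readerOptions(conf).filesystem(fs));
  }
}
```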
>
> On Fri, Jan 22, 2021 at 10:21 AM Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> Ok, a couple of things:
>>
>>    - The PhysicalWriter was intended so that LLAP could implement a
>>    write through cache where the new file was put into the cache as well as
>>    written to long term storage.
>>    - The Hadoop FileSystem API, which is what ORC currently uses, is
>>    extensible and has a lot of bindings other than HDFS. For your use case,
>>    you probably want to use "file:///my-dir/my.orc"
>>    - Somewhere in the unit tests there is an implementation of Hadoop
>>    FileSystem that uses ByteBuffers in memory.
>>    - Finally, over the years there has been an ask for using ORC core
>>    without having Hadoop on the class path. Let me take a pass at that today
>>    to see if I can make that work. See
>>    https://issues.apache.org/jira/browse/ORC-508 .
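To illustrate the second point, a minimal sketch of writing an ORC file through Hadoop's LocalFileSystem with a "file://" URI; the path, schema, and row count are invented for illustration, and this assumes orc-core plus hive-storage-api on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class LocalOrcExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    TypeDescription schema = TypeDescription.fromString("struct<x:bigint>");

    // A "file://" URI routes through Hadoop's LocalFileSystem,
    // so no HDFS cluster is involved at all.
    Writer writer = OrcFile.createWriter(new Path("file:///tmp/example.orc"),
        OrcFile.writerOptions(conf).setSchema(schema).overwrite(true));

    // Fill one batch with ten rows and write it out.
    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector x = (LongColumnVector) batch.cols[0];
    for (int r = 0; r < 10; ++r) {
      x.vector[batch.size++] = r;
    }
    writer.addRowBatch(batch);
    writer.close();
  }
}
```

Reading it back works the same way: pass the same file:// Path to OrcFile.createReader.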
>>
>> .. Owen
>>
>> On Tue, Jan 19, 2021 at 7:20 PM Andrey Elenskiy <
>> andrey.elens...@arista.com> wrote:
>>
>>> Hello, currently there's only a single implementation of PhysicalWriter
>>> that I was able to find -- PhysicalFSWriter, which only gives the option
>>> to write to HDFS.
>>>
>>> I'd like to reuse the ORC file format for my own purposes without the
>>> destination being HDFS, but just some byte buffer where I can decide myself
>>> where the bytes end up being saved.
>>>
>>> I've started implementing PhysicalWriter, but it seems like a lot of it
>>> just ends up being copied over from PhysicalFSWriter, which seems redundant.
>>> So, I'm wondering if something already exists to achieve my goal of
>>> writing the resulting columns to a DataOutputStream (maybe there's some
>>> unofficial Java library, or I'm missing some obvious official API).
>>>
>>> Thanks,
>>> Andrey
>>>
>>