On Wed, Feb 22, 2017 at 12:41 AM, István <[email protected]> wrote:
> Hi,
>
> I was wondering how hard it would be to drop Hadoop as a dependency from
> ORC.
>
We could make a new module that removes the Hadoop dependency. The
fundamental parts we would need to abstract out are:

* Configuration
* FileSystem

The biggest concern is API compatibility and making sure that we don't break
users. Another concern is that we'd need to change the storage-api jar to not
depend on Hadoop either. That would be harder in some ways, because it has
some uses of the Writable interfaces.

> I need Hadoop because I would like to set a path (not on HDFS) for the ORC
> file, and OrcFile requires an empty Hadoop config. If I am not mistaken,
> these could be achieved without using the Hadoop libraries.
>

You shouldn't need HDFS or an empty Hadoop config. My Mac laptop can use the
orc-tools-1.3.3-uber.jar to read ORC files from local disk without Hadoop (or
its configuration) installed. The uber tools jar has the Hadoop jars included,
but they have no impact other than making it larger. I've filed a jira
(https://issues.apache.org/jira/browse/ORC-151) for going through and
excluding more of the transitive dependencies from the direct dependencies,
especially the hadoop jar.

> Does anybody have a solution for avoiding the Hadoop libraries in an ORC
> project?
>
> Thank you in advance,
> Istvan
>
> --
> the sun shines for all
>
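To illustrate the point about local reads: with the Java API, an empty
Configuration and a plain local path are all you need (a minimal sketch; the
file name and class name here are examples, and you still need orc-core and
its transitive Hadoop client jars on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class LocalOrcRead {
  public static void main(String[] args) throws Exception {
    // An empty Configuration is fine: no Hadoop installation, no HDFS.
    Configuration conf = new Configuration();
    // A plain local path, not an hdfs:// URI. The file name is an example.
    Reader reader = OrcFile.createReader(new Path("/tmp/example.orc"),
                                         OrcFile.readerOptions(conf));
    System.out.println("rows = " + reader.getNumberOfRows());
    System.out.println("schema = " + reader.getSchema());
  }
}
```

Run it with the uber jar on the classpath, e.g.
`java -cp orc-tools-1.3.3-uber.jar:. LocalOrcRead`.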

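For the abstraction idea above, a purely hypothetical sketch of what a
Hadoop-free module might expose in place of org.apache.hadoop.fs.FileSystem;
none of these names exist in ORC today:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical: a minimal read abstraction an ORC core module could depend
// on instead of Hadoop's FileSystem/Path. Names are illustrative only.
interface OrcDataSource extends AutoCloseable {
  long size() throws IOException;                               // file length
  void readFully(long offset, ByteBuffer into) throws IOException;
  @Override default void close() throws IOException {}
}

// Trivial in-memory implementation, e.g. for tests or non-HDFS storage.
class InMemoryDataSource implements OrcDataSource {
  private final byte[] data;
  InMemoryDataSource(byte[] data) { this.data = data; }
  public long size() { return data.length; }
  public void readFully(long offset, ByteBuffer into) {
    // Copy into.remaining() bytes starting at offset, then rewind for reading.
    into.put(data, (int) offset, into.remaining());
    into.flip();
  }
}
```

A Hadoop-backed implementation would then live in a separate module, keeping
the core jar free of the hadoop dependency.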