On Wed, Feb 22, 2017 at 12:41 AM, István <[email protected]> wrote:
> Hi,
>
> I was wondering how hard it would be to drop Hadoop as a dependency from
> ORC.
>
We could make a new module that removes the Hadoop dependency. The
fundamental parts we would need to abstract out are:

* Configuration
* FileSystem

The biggest concern is API compatibility and making sure that we don't break
users. Another concern is that we'd need to change the storage-api jar to not
depend on Hadoop either. That would be harder in some ways, because it has
some uses of the Writable interfaces.

> I need Hadoop because I would like to set a path (not on HDFS) for the ORC
> file, and OrcFile requires an empty Hadoop config. If I am not mistaken,
> these could be achieved without using the Hadoop libraries.
>

You shouldn't need HDFS or an empty Hadoop config. My Mac laptop can use the
orc-tools-1.3.3-uber.jar to read ORC files from local disk without Hadoop (or
its configuration) installed. The uber tools jar has the Hadoop jars included,
but they have no impact other than making it larger. I've filed a jira
(https://issues.apache.org/jira/browse/ORC-151) for going through and
excluding more of the transitive dependencies from the direct dependencies,
especially the hadoop jar.

> Does anybody have a solution for avoiding the Hadoop libraries in an ORC
> project?
>
> Thank you in advance,
> Istvan
>
> --
> the sun shines for all
>
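To illustrate the point about local reads: with the Java API, an empty
Configuration and a plain local path are all you need (a minimal sketch; the
file name and class name here are examples, and you still need orc-core and
its transitive Hadoop client jars on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class LocalOrcRead {
  public static void main(String[] args) throws Exception {
    // An empty Configuration is fine: no Hadoop installation, no HDFS.
    Configuration conf = new Configuration();
    // A plain local path, not an hdfs:// URI. The file name is an example.
    Reader reader = OrcFile.createReader(new Path("/tmp/example.orc"),
                                         OrcFile.readerOptions(conf));
    System.out.println("rows = " + reader.getNumberOfRows());
    System.out.println("schema = " + reader.getSchema());
  }
}
```

Run it with the uber jar on the classpath, e.g.
`java -cp orc-tools-1.3.3-uber.jar:. LocalOrcRead`.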

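For the abstraction idea above, a purely hypothetical sketch of what a
Hadoop-free module might expose in place of org.apache.hadoop.fs.FileSystem;
none of these names exist in ORC today:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical: a minimal read abstraction an ORC core module could depend
// on instead of Hadoop's FileSystem/Path. Names are illustrative only.
interface OrcDataSource extends AutoCloseable {
  long size() throws IOException;                               // file length
  void readFully(long offset, ByteBuffer into) throws IOException;
  @Override default void close() throws IOException {}
}

// Trivial in-memory implementation, e.g. for tests or non-HDFS storage.
class InMemoryDataSource implements OrcDataSource {
  private final byte[] data;
  InMemoryDataSource(byte[] data) { this.data = data; }
  public long size() { return data.length; }
  public void readFully(long offset, ByteBuffer into) {
    // Copy into.remaining() bytes starting at offset, then rewind for reading.
    into.put(data, (int) offset, into.remaining());
    into.flip();
  }
}
```

A Hadoop-backed implementation would then live in a separate module, keeping
the core jar free of the hadoop dependency.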