Thanks, Owen, for explaining. I understand that ORC was originally developed in Hadoop land, but by now there are several use cases that require neither HDFS nor Hadoop. I have looked into Configuration: we are initialising the Writer with a configuration that has 100+ entries that are totally irrelevant to writing ORC files locally. I prefer composable systems built from smaller building blocks to pulling in 106 packages, several of which duplicate or conflict with other packages. I am going to look into the easiest way to split out orc-core and make it independent from Hadoop (using only the Hive library for the VectorizedRowBatch if necessary).
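As a rough illustration of what a Hadoop-free core could depend on, here is a minimal Java sketch of the two pieces Owen names in the quoted message (Configuration and FileSystem), reduced to what a local writer would plausibly need. Everything in it is an assumption made for illustration: WriterSettings, Storage, LocalStorage, LocalWriteDemo and the key "demo.stripe.size" are hypothetical names, not part of ORC, Hive or Hadoop, and the sketch writes plain bytes rather than a real ORC file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the few settings a writer actually reads.
// Not an ORC or Hadoop class.
final class WriterSettings {
    private final Map<String, String> values = new HashMap<>();

    WriterSettings set(String key, String value) {
        values.put(key, value);
        return this;
    }

    // Returns the stored value parsed as a long, or the default if the key is absent.
    long getLong(String key, long defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Long.parseLong(v);
    }
}

// Hypothetical minimal storage abstraction: create a file, open a file.
interface Storage {
    OutputStream create(Path path) throws IOException;
    InputStream open(Path path) throws IOException;
}

// Local-disk implementation backed by java.nio; no Hadoop classes involved.
final class LocalStorage implements Storage {
    @Override
    public OutputStream create(Path path) throws IOException {
        return Files.newOutputStream(path); // creates or truncates the file
    }

    @Override
    public InputStream open(Path path) throws IOException {
        return Files.newInputStream(path);
    }
}

final class LocalWriteDemo {
    // Writes the payload through the abstraction and reads it back.
    static String roundTrip(Storage storage, Path path, String payload) throws IOException {
        try (OutputStream out = storage.create(path)) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = storage.open(path)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // "demo.stripe.size" is an illustrative key, not a real ORC option.
        WriterSettings settings = new WriterSettings().set("demo.stripe.size", "67108864");
        Path tmp = Files.createTempFile("orc-sketch", ".bin");
        try {
            System.out.println(roundTrip(new LocalStorage(), tmp, "hello"));
            System.out.println(settings.getLong("demo.stripe.size", 0L));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The point of the sketch is only the shape of the dependency: a writer coded against Storage and WriterSettings needs nothing beyond the JDK, and a Hadoop-backed Storage could live in a separate module for users who want HDFS.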
Here is the current dependency tree of hadoop-common:

[INFO] | +- org.apache.hadoop:hadoop-common:jar:2.6.4:compile
[INFO] | | +- org.apache.hadoop:hadoop-annotations:jar:2.6.4:compile
[INFO] | | +- com.google.guava:guava:jar:11.0.2:compile
[INFO] | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | +- org.apache.commons:commons-math3:jar:3.1.1:compile
[INFO] | | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] | | +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | | | +- (commons-logging:commons-logging:jar:1.0.4:compile - omitted for conflict with 1.1.3)
[INFO] | | | \- (commons-codec:commons-codec:jar:1.2:compile - omitted for conflict with 1.4)
[INFO] | | +- commons-codec:commons-codec:jar:1.4:compile
[INFO] | | +- commons-io:commons-io:jar:2.4:compile
[INFO] | | +- commons-net:commons-net:jar:3.1:compile
[INFO] | | +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | +- com.sun.jersey:jersey-core:jar:1.9:compile
[INFO] | | +- com.sun.jersey:jersey-json:jar:1.9:compile
[INFO] | | | +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] | | | +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] | | | | \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] | | | | +- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] | | | | \- javax.activation:activation:jar:1.1:compile
[INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] | | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] | | | +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile
[INFO] | | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] | | | | \- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] | | | +- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile
[INFO] | | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] | | | | \- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.3:compile - omitted for conflict with 1.9.13)
[INFO] | | | \- (com.sun.jersey:jersey-core:jar:1.9:compile - omitted for duplicate)
[INFO] | | +- com.sun.jersey:jersey-server:jar:1.9:compile
[INFO] | | | +- asm:asm:jar:3.1:compile
[INFO] | | | \- (com.sun.jersey:jersey-core:jar:1.9:compile - omitted for duplicate)
[INFO] | | +- tomcat:jasper-compiler:jar:5.5.23:runtime
[INFO] | | +- tomcat:jasper-runtime:jar:5.5.23:runtime
[INFO] | | | \- (commons-el:commons-el:jar:1.0:runtime - omitted for duplicate)
[INFO] | | +- commons-el:commons-el:jar:1.0:runtime
[INFO] | | | \- (commons-logging:commons-logging:jar:1.0.3:runtime - omitted for conflict with 1.0.4)
[INFO] | | +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] | | +- (log4j:log4j:jar:1.2.17:compile - omitted for duplicate)
[INFO] | | +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for duplicate)
[INFO] | | | +- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] | | | +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] | | | | \- (org.apache.httpcomponents:httpcore:jar:4.1.2:compile - omitted for duplicate)
[INFO] | | | +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
[INFO] | | | \- com.jamesmurty.utils:java-xmlbuilder:jar:0.4:compile
[INFO] | | +- (commons-lang:commons-lang:jar:2.6:compile - omitted for duplicate)
[INFO] | | +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] | | | +- (commons-collections:commons-collections:jar:3.2.1:compile - omitted for conflict with 3.2.2)
[INFO] | | | +- (commons-lang:commons-lang:jar:2.4:compile - omitted for conflict with 2.6)
[INFO] | | | +- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] | | | +- commons-digester:commons-digester:jar:1.8:compile
[INFO] | | | | +- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] | | | | | \- (commons-logging:commons-logging:jar:1.0.3:compile - omitted for conflict with 1.1.3)
[INFO] | | | | \- (commons-logging:commons-logging:jar:1.1:compile - omitted for conflict with 1.1.3)
[INFO] | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] | | | \- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] | | +- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] | | +- (org.slf4j:slf4j-log4j12:jar:1.7.5:compile - scope updated from runtime; omitted for duplicate)
[INFO] | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile - omitted for duplicate)
[INFO] | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile - omitted for duplicate)
[INFO] | | +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for duplicate)
[INFO] | | +- com.google.code.gson:gson:jar:2.2.4:compile
[INFO] | | +- org.apache.hadoop:hadoop-auth:jar:2.6.4:compile
[INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] | | | +- (commons-codec:commons-codec:jar:1.4:compile - omitted for duplicate)
[INFO] | | | +- (log4j:log4j:jar:1.2.17:runtime - omitted for duplicate)
[INFO] | | | +- (org.slf4j:slf4j-log4j12:jar:1.7.5:runtime - omitted for duplicate)
[INFO] | | | +- (org.apache.httpcomponents:httpclient:jar:4.2.5:compile - omitted for conflict with 4.1.2)
[INFO] | | | +- org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15:compile
[INFO] | | | | +- org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15:compile
[INFO] | | | | | \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] | | | | +- org.apache.directory.api:api-asn1-api:jar:1.0.0-M20:compile
[INFO] | | | | | \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] | | | | +- org.apache.directory.api:api-util:jar:1.0.0-M20:compile
[INFO] | | | | | \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] | | | | \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)
[INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] | | | \- org.apache.curator:curator-framework:jar:2.6.0:compile
[INFO] | | | +- (org.apache.curator:curator-client:jar:2.6.0:compile - omitted for duplicate)
[INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] | | | \- (com.google.guava:guava:jar:16.0.1:compile - omitted for conflict with 11.0.2)
[INFO] | | +- com.jcraft:jsch:jar:0.1.42:compile
[INFO] | | +- org.apache.curator:curator-client:jar:2.6.0:compile
[INFO] | | | +- (org.slf4j:slf4j-api:jar:1.7.6:compile - omitted for conflict with 1.7.7)
[INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] | | | \- (com.google.guava:guava:jar:16.0.1:compile - omitted for conflict with 11.0.2)
[INFO] | | +- org.apache.curator:curator-recipes:jar:2.6.0:compile
[INFO] | | | +- (org.apache.curator:curator-framework:jar:2.6.0:compile - omitted for duplicate)
[INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.6:compile - omitted for duplicate)
[INFO] | | | \- (com.google.guava:guava:jar:16.0.1:compile - omitted for conflict with 11.0.2)
[INFO] | | +- org.htrace:htrace-core:jar:3.0.4:compile
[INFO] | | | +- (com.google.guava:guava:jar:12.0.1:compile - omitted for conflict with 11.0.2)
[INFO] | | | \- (commons-logging:commons-logging:jar:1.1.1:compile - omitted for conflict with 1.1.3)
[INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
[INFO] | | | +- (org.slf4j:slf4j-api:jar:1.6.1:compile - omitted for conflict with 1.7.7)
[INFO] | | | +- org.slf4j:slf4j-log4j12:jar:1.6.1:compile
[INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.6.1:compile - omitted for conflict with 1.7.7)
[INFO] | | | | \- (log4j:log4j:jar:1.2.16:compile - omitted for conflict with 1.2.17)
[INFO] | | | +- (log4j:log4j:jar:1.2.16:compile - omitted for conflict with 1.2.17)
[INFO] | | | \- io.netty:netty:jar:3.7.0.Final:compile
[INFO] | | \- (org.apache.commons:commons-compress:jar:1.4.1:compile - omitted for conflict with 1.8.1)
[INFO] | +- org.apache.hive:hive-storage-api:jar:2.2.0:compile
[INFO] | | +- (commons-lang:commons-lang:jar:2.6:compile - omitted for duplicate)
[INFO] | | \- (org.slf4j:slf4j-api:jar:1.7.10:compile - omitted for conflict with 1.7.7)
[INFO] | \- (org.slf4j:slf4j-api:jar:1.7.5:compile - omitted for conflict with 1.7.7)

Regards,
Istvan

On Wed, Feb 22, 2017 at 5:31 PM, Owen O'Malley wrote:

> On Wed, Feb 22, 2017 at 12:41 AM, István wrote:
>
>> Hi,
>>
>> I was wondering how hard it would be to drop Hadoop as a dependency from
>> ORC.
>
> We could make a new module that removes the Hadoop dependency. The
> fundamental parts we would need to abstract out are:
>
> * Configuration
> * FileSystem
>
> The biggest concern is API compatibility and making sure that we don't
> break users.
>
> Another concern is that we'd need to change the storage-api jar to not
> depend on Hadoop either. That would be harder in some ways, because it has
> some uses of the Writable interfaces.
>
>> I need Hadoop because I would like to set a path (not on HDFS) for the
>> ORC file and OrcFile requires an empty Hadoop config. If I am not mistaken
>> this could be achieved without using the Hadoop libraries.
>
> You shouldn't need hdfs or an empty hadoop config. My Mac laptop can use
> the orc-tools-1.3.3-uber.jar to read ORC files from local disk without
> Hadoop (or its configuration) installed. The uber tools jar has the Hadoop
> jars included, but it doesn't have an impact other than making the size
> larger.
>
> I've filed a jira https://issues.apache.org/jira/browse/ORC-151 for going
> through and excluding more of the transitive dependencies from the direct
> dependencies, especially the hadoop jar.
>
> So
>
>> Does anybody have a solution for avoiding the Hadoop libraries in an ORC
>> project?
>>
>> Thank you in advance,
>> Istvan
>>
>> --
>> the sun shines for all

--
the sun shines for all
