Re: Dependency hell in Spark applications
Thanks everyone for weighing in on this. I had backported the Kinesis module from master to Spark 1.0.2, so just to confirm I am not missing anything, I compared the dependency graph of my Spark build against spark-master, and org.apache.httpcomponents:httpclient:jar does indeed resolve to 4.1.2. I need Hive, so I can't really do a build without it.

Even if I exclude the httpclient dependency from my project's build, it will not solve the problem, because the AWS SDK has been compiled against a newer version of httpclient. My Spark Streaming project does not use httpclient directly. The AWS SDK will look for the class org.apache.http.impl.conn.DefaultClientConnectionOperator, and that class will be loaded from the spark-assembly jar regardless of how I package my project (unless I am missing something?). I enabled verbose class loading to confirm that the class is indeed loaded from the spark-assembly jar. The spark.files.userClassPathFirst option doesn't seem to work on my Spark 1.0.2 build (not sure why). That left me with custom-building Spark and forcibly introducing the latest httpclient version as a dependency.

Finally, I tested this on 1.1.0-RC4 today and it has the same issue. Has anyone ever been able to get the Kinesis example to work with a spark-hadoop2.4 (with Hive and YARN) build? I feel like this is a bug that exists even in 1.1.0. I still believe we need a better solution to the dependency-hell problem. If OSGi is deemed too over the top, what solutions are being investigated?

On 6 September 2014 04:44, Ted Yu <yuzhih...@gmail.com> wrote:
[...]
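For reference, the two knobs mentioned above are wired up roughly like this; a minimal Scala sketch assuming the standard Spark 1.x configuration keys (the app name is illustrative, and spark.files.userClassPathFirst is experimental and executor-side only in this era):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kinesis-stream") // illustrative
      // Experimental: prefer classes from user jars over the assembly on executors.
      .set("spark.files.userClassPathFirst", "true")
      // Log every class load so you can see which jar each class comes from.
      .set("spark.executor.extraJavaOptions", "-verbose:class")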
Re: Dependency hell in Spark applications
If the httpclient dependency is coming from Hive, you could build Spark without Hive. Alternatively, have you tried excluding httpclient from the spark-streaming dependency in your sbt/maven project?

TD

On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers <ko...@tresata.com> wrote:
[...]
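A minimal sbt sketch of the exclusion TD suggests (the Spark version shown is illustrative; note that this only changes what the application packages, it cannot remove the copy already baked into the spark-assembly jar on the cluster):

    libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.2" exclude("org.apache.httpcomponents", "httpclient")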
Re: Dependency hell in Spark applications
From the output of dependency:tree:

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ spark-streaming_2.10 ---
[INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
...
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile

bq. excluding httpclient from spark-streaming dependency in your sbt/maven project

This should work.

On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
[...]
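An alternative to excluding the artifact is pinning it in the application build; a hedged sbt sketch (again, this governs only the jars the application packages, not classes served from the assembly):

    dependencyOverrides += "org.apache.httpcomponents" % "httpclient" % "4.2"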
Dependency hell in Spark applications
I am trying to use Kinesis as a source for Spark Streaming and have run into a dependency issue that can't be resolved without making my own custom Spark build. The issue is that Spark transitively depends on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of libfb303 coming from hbase and hive-serde), whereas the AWS SDK depends on org.apache.httpcomponents:httpclient:jar:4.2. When I package and run my Spark Streaming application, I get the following:

Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
  at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
  at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
  at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
  at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
  at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
  at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
  at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
  at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
  at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
  at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
  at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

(The constructor DefaultClientConnectionOperator(SchemeRegistry, DnsResolver) was added in httpclient 4.2 and does not exist in 4.1.2, which is why the call fails at runtime.)

I can create a custom Spark build with org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I was wondering whether this is something Spark devs have noticed and are looking to resolve in upcoming releases.

Here are my thoughts on this issue. Containers that run custom user code often have to resolve conflicts between the framework's dependencies and the user code's dependencies. Here is how I have seen some frameworks approach it:

1. Provide a child-first class loader: some JEE containers provide a child-first class loader that loads classes from user code first. I don't think this approach completely solves the problem, as the framework then becomes susceptible to class-mismatch errors (see the sketch after this message).

2. Fold all dependencies into a sub-package: relocate every dependency into a project-specific sub-package (like spark.dependencies). This is tedious because it involves building custom versions of all dependencies (and their transitive dependencies) -- see the shading sketch after this message.

3. Use something like OSGi: some frameworks have successfully used OSGi to manage dependencies between modules. The challenge in this approach is to OSGify the framework and hide OSGi's complexities from the end user.

My personal preference is OSGi (or at least some support for OSGi), but I would love to hear what Spark devs are thinking in terms of resolving the problem.

Thanks,
Aniket
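For approach 1, a minimal child-first class loader sketch in Scala (illustrative, not Spark's actual implementation; a real one needs more care around resource lookup and security):

    import java.net.{URL, URLClassLoader}

    class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
        extends URLClassLoader(urls, parent) {

      override def loadClass(name: String, resolve: Boolean): Class[_] = {
        // Core classes must always come from the parent/bootstrap loader.
        if (name.startsWith("java.") || name.startsWith("scala.")) {
          super.loadClass(name, resolve)
        } else {
          var c = findLoadedClass(name)
          if (c == null) {
            c = try findClass(name) // look in the user (child) jars first
                catch {
                  case _: ClassNotFoundException =>
                    super.loadClass(name, resolve) // fall back to the framework
                }
          }
          if (resolve) resolveClass(c)
          c
        }
      }
    }

This is also where the class-mismatch risk comes from: if user code hands a child-loaded org.apache.http object to framework code compiled against the parent's copy, the two classes are distinct at runtime even though they share a name.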
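For approach 2, shading tools automate the relocation. A sketch of what it looks like with sbt-assembly's shade rules (a feature of later sbt-assembly releases; Maven users would reach for maven-shade-plugin relocations instead, and the mydeps prefix is illustrative):

    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("org.apache.http.**" -> "mydeps.org.apache.http.@1").inAll
    )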
Re: Dependency hell in Spark applications
Dumb question -- are you using a Spark build that includes the Kinesis dependency? That build would have resolved conflicts like this for you. Ideally, your app would use the same version of the Kinesis client SDK.

All of these ideas are well known, yes. Super-common dependencies like Guava are already shaded. httpclient is a less common source of conflicts, so I don't think it is shaded, especially since it is not used directly by Spark. I think this is a case of your app conflicting with a third-party dependency? I think OSGi is deemed too over the top for things like this.

On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:
[...]
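A hedged sketch of what this looks like from the application side, in sbt (versions are illustrative; the point is that the Kinesis client version should match the one in the kinesis-asl pom of the Spark build you run against, not simply the newest release):

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.1.0",
      // Align with whatever KCL version your Spark build was compiled against.
      "com.amazonaws" % "amazon-kinesis-client" % "1.1.0"
    )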
Re: Dependency hell in Spark applications
Hi, I ran into the same issue, and apart from the ideas Aniket mentioned, I could only find a nasty workaround: adding my own custom PoolingClientConnectionManager to my classpath.

http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955

On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen <so...@cloudera.com> wrote:
[...]
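Independent of the workaround, a quick way to confirm which jar a conflicting class is actually served from (plain JDK reflection, runnable from a driver or spark-shell; getCodeSource can be null for bootstrap-loaded classes):

    val cls = Class.forName("org.apache.http.impl.conn.DefaultClientConnectionOperator")
    println(cls.getProtectionDomain.getCodeSource.getLocation)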
Re: Dependency hell in Spark applications
Custom Spark builds should not be the answer, at least not if Spark ever wants to have a vibrant community for Spark apps.

Spark does support a user-classpath-first option, which would deal with some of these issues, but I don't think it works.

On Sep 4, 2014 9:01 AM, Felix Garcia Borrego <fborr...@gilt.com> wrote:
[...]