Re: Dependency hell in Spark applications

2014-09-11 Thread Aniket Bhatnagar
Thanks everyone for weighing in on this.

I had backported the Kinesis module from master to Spark 1.0.2, so, just to
confirm that I am not missing anything, I compared the dependency graph of
my Spark build with spark-master,
and org.apache.httpcomponents:httpclient:jar does indeed resolve to version
4.1.2.

I need Hive, so I can't really do a build without it. Even if I exclude the
httpclient dependency from my project's build, it will not solve the problem,
because the AWS SDK has been compiled against a newer version of httpclient.
My Spark Streaming project does not use httpclient directly. The AWS SDK will
look for the class org.apache.http.impl.conn.DefaultClientConnectionOperator,
and it will be loaded from the spark-assembly jar regardless of how I package
my project (unless I am missing something?). I enabled verbose class loading
to confirm that the class is indeed loaded from the spark-assembly jar.
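For anyone who wants to make the same check without restarting the JVM with
verbose class loading, here is a minimal, self-contained sketch (my own
illustrative snippet, nothing Spark-specific) that prints which jar a class
was loaded from:

  // Print the code source (jar) that a class was loaded from.
  object WhichJar {
    def main(args: Array[String]): Unit = {
      // The class the AWS SDK fails on, per the stack traces in this thread
      val cls = Class.forName("org.apache.http.impl.conn.DefaultClientConnectionOperator")
      val src = cls.getProtectionDomain.getCodeSource
      // In my case this prints the spark-assembly jar's location
      println(if (src != null) src.getLocation else "bootstrap classpath")
    }
  }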

The spark.files.userClassPathFirst option doesn't seem to work on my
Spark 1.0.2 build (not sure why).
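For reference, this is the kind of configuration I mean; a minimal sketch
(the app name is illustrative). The 1.x docs describe
spark.files.userClassPathFirst as experimental and scoped to class loading in
executors, which may be part of why it doesn't help here:

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Ask executors to prefer classes from user-added jars over Spark's own.
  val conf = new SparkConf()
    .setAppName("kinesis-streaming-app") // illustrative
    .set("spark.files.userClassPathFirst", "true")
  val ssc = new StreamingContext(conf, Seconds(1))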

That left me with custom-building Spark and forcibly introducing the latest
httpclient version as a dependency.

Finally, I tested this on 1.1.0-RC4 today and it has the same issue. Has
anyone ever been able to get the Kinesis example to work with a spark-hadoop2.4
(with Hive and YARN) build? I feel this is a bug that exists even in
1.1.0.

I still believe we need a better solution to the dependency-hell
problem. If OSGi is deemed too over the top, what solutions are being
investigated?

On 6 September 2014 04:44, Ted Yu yuzhih...@gmail.com wrote:

 From output of dependency:tree:

 [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @
 spark-streaming_2.10 ---
 [INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
 [INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
 [INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
 ...
 [INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
 [INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
 [INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
 [INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile

 bq. excluding httpclient from spark-streaming dependency in your
 sbt/maven project

 This should work.


 On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das tathagata.das1...@gmail.com
  wrote:

 If the httpClient dependency is coming from Hive, you could build Spark
 without Hive. Alternatively, have you tried excluding httpclient from the
 spark-streaming dependency in your sbt/maven project?

 TD



 On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers ko...@tresata.com wrote:

  custom spark builds should not be the answer. at least not if spark ever
  wants to have a vibrant community for spark apps.
 
  spark does support a user-classpath-first option, which would deal with
  some of these issues, but I don't think it works.
  On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com
 wrote:
 
   Hi,
   I ran into the same issue and, apart from the ideas Aniket mentioned, I
   could only find a nasty workaround: adding my own custom
   PoolingClientConnectionManager to my classpath.

  http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
  
  
  
   On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote:

    Dumb question -- are you using a Spark build that includes the Kinesis
    dependency? That build would have resolved conflicts like this for
    you. Your app would need to use the same version of the Kinesis client
    SDK, ideally.

    All of these ideas are well-known, yes. In cases of super-common
    dependencies like Guava, they are already shaded. This is a
    less-common source of conflicts, so I don't think http-client is
    shaded, especially since it is not used directly by Spark. I think
    this is a case of your app conflicting with a third-party dependency?

    I think OSGi is deemed too over the top for things like this.
   
On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
 I am trying to use Kinesis as a source for Spark Streaming and have run
 into a dependency issue that can't be resolved without making my own
 custom Spark build. The issue is that Spark is transitively dependent
 on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
 libfb303 coming from hbase and hive-serde) whereas the AWS SDK is
 dependent on org.apache.httpcomponents:httpclient:jar:4.2. When I
 package and run my Spark Streaming application, I get the following:

 Caused by: java.lang.NoSuchMethodError:
 org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V

Re: Dependency hell in Spark applications

2014-09-05 Thread Tathagata Das
If the httpClient dependency is coming from Hive, you could build Spark without
Hive. Alternatively, have you tried excluding httpclient from the
spark-streaming dependency in your sbt/maven project?

TD



On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers ko...@tresata.com wrote:

 custom spark builds should not be the answer. at least not if spark ever
 wants to have a vibrant community for spark apps.

 spark does support a user-classpath-first option, which would deal with
 some of these issues, but I don't think it works.
 On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com wrote:

  Hi,
  I ran into the same issue and, apart from the ideas Aniket mentioned, I
  could only find a nasty workaround: adding my own custom
  PoolingClientConnectionManager to my classpath.
 
 
 
 http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
 
 
 
  On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote:
 
   Dumb question -- are you using a Spark build that includes the Kinesis
   dependency? That build would have resolved conflicts like this for
   you. Your app would need to use the same version of the Kinesis client
   SDK, ideally.

   All of these ideas are well-known, yes. In cases of super-common
   dependencies like Guava, they are already shaded. This is a
   less-common source of conflicts, so I don't think http-client is
   shaded, especially since it is not used directly by Spark. I think
   this is a case of your app conflicting with a third-party dependency?
  
   I think OSGi is deemed too over the top for things like this.
  
   On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
   aniket.bhatna...@gmail.com wrote:
 I am trying to use Kinesis as a source for Spark Streaming and have run
 into a dependency issue that can't be resolved without making my own
 custom Spark build. The issue is that Spark is transitively dependent
 on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
 libfb303 coming from hbase and hive-serde) whereas the AWS SDK is
 dependent on org.apache.httpcomponents:httpclient:jar:4.2. When I
 package and run my Spark Streaming application, I get the following:

 Caused by: java.lang.NoSuchMethodError:
 org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
 at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
 at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
 at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
 at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
 at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
 at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
 at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
 at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
 at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
 at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
 at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

 I can create a custom Spark build with
 org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I
 was wondering whether this is something the Spark devs have noticed and are
 looking to resolve in upcoming releases. Here are my thoughts on this issue:

 Containers that allow running custom user code often have to resolve
 dependency issues in cases of conflict between the framework's and the user
 code's dependencies. Here is how I have seen some frameworks resolve the issue:
 1. Provide a child-first class loader: Some JEE containers provide a
 child-first class loader that allows classes to be loaded from user code
 first. I don't think this approach completely solves the problem, as the
 framework is then susceptible to class mismatch errors.
 2. Fold all dependencies into a sub-package: This approach involves
 relocating all dependencies into a project-specific sub-package (like
 spark.dependencies). This approach is tedious because it involves building
 custom versions of all dependencies (and their transitive dependencies).
 3. Use something like OSGi: Some frameworks have successfully used OSGi to
 manage dependencies between modules. The challenge in this approach is to
 OSGify the framework and hide OSGi complexities from the end user.

Re: Dependency hell in Spark applications

2014-09-05 Thread Ted Yu
From output of dependency:tree:

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @
spark-streaming_2.10 ---
[INFO] org.apache.spark:spark-streaming_2.10:jar:1.1.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0-SNAPSHOT:compile
[INFO] |  +- org.apache.hadoop:hadoop-client:jar:2.4.0:compile
...
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.5:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile

bq. excluding httpclient from spark-streaming dependency in your sbt/maven
project

This should work.
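
To make that concrete, a minimal build.sbt sketch of the exclusion (the
versions and the AWS artifact are illustrative; the Maven equivalent is an
<exclusions> block on the spark-streaming dependency):

  // Drop the transitive httpclient 4.1.2 (pulled in via jets3t, per the
  // dependency:tree output above) so the AWS SDK's 4.2 wins in the app build.
  libraryDependencies ++= Seq(
    ("org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided")
      .exclude("org.apache.httpcomponents", "httpclient"),
    "com.amazonaws" % "aws-java-sdk" % "1.8.3" // illustrative version
  )

Note, though, Aniket's point above: an exclusion only changes what goes into
the application jar; it does not help when the old httpclient classes are
loaded from the spark-assembly jar at runtime.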


On Fri, Sep 5, 2014 at 3:14 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:

 If the httpClient dependency is coming from Hive, you could build Spark
 without Hive. Alternatively, have you tried excluding httpclient from the
 spark-streaming dependency in your sbt/maven project?

 TD



 On Thu, Sep 4, 2014 at 6:42 AM, Koert Kuipers ko...@tresata.com wrote:

  custom spark builds should not be the answer. at least not if spark ever
  wants to have a vibrant community for spark apps.
 
  spark does support a user-classpath-first option, which would deal with
  some of these issues, but I don't think it works.
  On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com
 wrote:
 
   Hi,
   I ran into the same issue and, apart from the ideas Aniket mentioned, I
   could only find a nasty workaround: adding my own custom
   PoolingClientConnectionManager to my classpath.

  http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955
  
  
  
   On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote:
  
    Dumb question -- are you using a Spark build that includes the Kinesis
    dependency? That build would have resolved conflicts like this for
    you. Your app would need to use the same version of the Kinesis client
    SDK, ideally.

    All of these ideas are well-known, yes. In cases of super-common
    dependencies like Guava, they are already shaded. This is a
    less-common source of conflicts, so I don't think http-client is
    shaded, especially since it is not used directly by Spark. I think
    this is a case of your app conflicting with a third-party dependency?

    I think OSGi is deemed too over the top for things like this.
   
On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
 I am trying to use Kinesis as a source for Spark Streaming and have run
 into a dependency issue that can't be resolved without making my own
 custom Spark build. The issue is that Spark is transitively dependent
 on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
 libfb303 coming from hbase and hive-serde) whereas the AWS SDK is
 dependent on org.apache.httpcomponents:httpclient:jar:4.2. When I
 package and run my Spark Streaming application, I get the following:

 Caused by: java.lang.NoSuchMethodError:
 org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
 at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
 at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
 at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
 at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
 at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
 at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
 at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
 at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
 at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
 at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
 at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

 I can create a custom Spark build with
 org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I
 was wondering whether this is something the Spark devs have noticed and are
 looking to resolve in upcoming releases. Here are my thoughts on this issue:

 Containers that allow running custom user code often have to resolve
 dependency issues in cases of conflict between the framework's and the user
 code's dependencies.

Dependency hell in Spark applications

2014-09-04 Thread Aniket Bhatnagar
I am trying to use Kinesis as a source for Spark Streaming and have run into a
dependency issue that can't be resolved without making my own custom Spark
build. The issue is that Spark is transitively dependent
on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
libfb303 coming from hbase and hive-serde) whereas the AWS SDK is dependent
on org.apache.httpcomponents:httpclient:jar:4.2. When I package and run my
Spark Streaming application, I get the following:

Caused by: java.lang.NoSuchMethodError:
org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

I can create a custom Spark build with
org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I
was wondering whether this is something the Spark devs have noticed and are
looking to resolve in upcoming releases. Here are my thoughts on this issue:

Containers that allow running custom user code often have to resolve
dependency issues in cases of conflict between the framework's and the user
code's dependencies. Here is how I have seen some frameworks resolve the issue:
1. Provide a child-first class loader: Some JEE containers provide a
child-first class loader that allows classes to be loaded from user code
first. I don't think this approach completely solves the problem, as the
framework is then susceptible to class mismatch errors (a sketch of such a
loader follows this list).
2. Fold all dependencies into a sub-package: This approach involves
relocating all dependencies into a project-specific sub-package (like
spark.dependencies). This approach is tedious because it involves building
custom versions of all dependencies (and their transitive dependencies); see
the shading sketch after this list.
3. Use something like OSGi: Some frameworks have successfully used OSGi to
manage dependencies between modules. The challenge in this approach is to
OSGify the framework and hide OSGi complexities from the end user.
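
To illustrate option 1, here is a minimal sketch of a child-first class loader
of the kind JEE containers provide (my own illustrative code, not something
Spark ships):

  import java.net.{URL, URLClassLoader}

  // Child-first delegation: look in the user's jars before asking the parent,
  // falling back to the parent for anything not found locally.
  class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
      extends URLClassLoader(urls, parent) {
    override def loadClass(name: String, resolve: Boolean): Class[_] = {
      var c = findLoadedClass(name)
      if (c == null) {
        c = try findClass(name) // user code first
            catch { case _: ClassNotFoundException =>
              super.loadClass(name, resolve) // then the framework's classes
            }
      }
      if (resolve) resolveClass(c)
      c
    }
  }

The weakness is exactly the one noted above: the same class name can now
resolve differently in framework code and user code, producing class mismatch
(linkage) errors at the boundary.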
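
And to illustrate option 2, a sketch of the relocation idea using
sbt-assembly's shade rules, applied here on the application side (the shaded
package name is illustrative, and ShadeRule comes from a newer sbt-assembly
than the releases contemporary with this thread):

  // build.sbt: rewrite httpclient's packages into a private namespace inside
  // the application assembly so they cannot collide with spark-assembly's copy.
  assemblyShadeRules in assembly := Seq(
    ShadeRule.rename("org.apache.http.**" -> "shaded.myapp.http.@1").inAll
  )

Because .inAll rewrites references in the bytecode of every dependency in the
assembly (the AWS SDK included), callers keep working; doing this consistently
across a whole dependency tree is the tedium described above.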

My personal preference is OSGi (or at least some support for OSGi), but I
would love to hear what the Spark devs are thinking in terms of resolving the
problem.

Thanks,
Aniket


Re: Dependency hell in Spark applications

2014-09-04 Thread Sean Owen
Dumb question -- are you using a Spark build that includes the Kinesis
dependency? That build would have resolved conflicts like this for
you. Your app would need to use the same version of the Kinesis client
SDK, ideally.

All of these ideas are well-known, yes. In cases of super-common
dependencies like Guava, they are already shaded. This is a
less-common source of conflicts, so I don't think http-client is
shaded, especially since it is not used directly by Spark. I think
this is a case of your app conflicting with a third-party dependency?

I think OSGi is deemed too over the top for things like this.

On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com wrote:
 I am trying to use Kinesis as a source for Spark Streaming and have run into a
 dependency issue that can't be resolved without making my own custom Spark
 build. The issue is that Spark is transitively dependent
 on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
 libfb303 coming from hbase and hive-serde) whereas the AWS SDK is dependent
 on org.apache.httpcomponents:httpclient:jar:4.2. When I package and run my
 Spark Streaming application, I get the following:

 Caused by: java.lang.NoSuchMethodError:
 org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
 at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
 at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
 at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
 at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
 at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
 at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
 at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
 at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
 at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
 at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
 at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

 I can create a custom Spark build with
 org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I
 was wondering whether this is something the Spark devs have noticed and are
 looking to resolve in upcoming releases. Here are my thoughts on this issue:

 Containers that allow running custom user code often have to resolve
 dependency issues in cases of conflict between the framework's and the user
 code's dependencies. Here is how I have seen some frameworks resolve the issue:
 1. Provide a child-first class loader: Some JEE containers provide a
 child-first class loader that allows classes to be loaded from user code
 first. I don't think this approach completely solves the problem, as the
 framework is then susceptible to class mismatch errors.
 2. Fold all dependencies into a sub-package: This approach involves
 relocating all dependencies into a project-specific sub-package (like
 spark.dependencies). This approach is tedious because it involves building
 custom versions of all dependencies (and their transitive dependencies).
 3. Use something like OSGi: Some frameworks have successfully used OSGi to
 manage dependencies between modules. The challenge in this approach is to
 OSGify the framework and hide OSGi complexities from the end user.

 My personal preference is OSGi (or at least some support for OSGi), but I
 would love to hear what the Spark devs are thinking in terms of resolving the
 problem.

 Thanks,
 Aniket

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Dependency hell in Spark applications

2014-09-04 Thread Felix Garcia Borrego
Hi,
I ran into the same issue and, apart from the ideas Aniket mentioned, I could
only find a nasty workaround: adding my own custom
PoolingClientConnectionManager to my classpath.

http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955



On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote:

 Dumb question -- are you using a Spark build that includes the Kinesis
 dependency? That build would have resolved conflicts like this for
 you. Your app would need to use the same version of the Kinesis client
 SDK, ideally.

 All of these ideas are well-known, yes. In cases of super-common
 dependencies like Guava, they are already shaded. This is a
 less-common source of conflicts, so I don't think http-client is
 shaded, especially since it is not used directly by Spark. I think
 this is a case of your app conflicting with a third-party dependency?

 I think OSGi is deemed too over the top for things like this.

 On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
 aniket.bhatna...@gmail.com wrote:
  I am trying to use Kinesis as a source for Spark Streaming and have run
  into a dependency issue that can't be resolved without making my own custom
  Spark build. The issue is that Spark is transitively dependent
  on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
  libfb303 coming from hbase and hive-serde) whereas the AWS SDK is dependent
  on org.apache.httpcomponents:httpclient:jar:4.2. When I package and run my
  Spark Streaming application, I get the following:

  Caused by: java.lang.NoSuchMethodError:
  org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
  at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
  at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
  at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
  at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
  at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
  at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
  at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
  at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
  at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
  at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
  at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

  I can create a custom Spark build with
  org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I
  was wondering whether this is something the Spark devs have noticed and are
  looking to resolve in upcoming releases. Here are my thoughts on this issue:

  Containers that allow running custom user code often have to resolve
  dependency issues in cases of conflict between the framework's and the user
  code's dependencies. Here is how I have seen some frameworks resolve the issue:
  1. Provide a child-first class loader: Some JEE containers provide a
  child-first class loader that allows classes to be loaded from user code
  first. I don't think this approach completely solves the problem, as the
  framework is then susceptible to class mismatch errors.
  2. Fold all dependencies into a sub-package: This approach involves
  relocating all dependencies into a project-specific sub-package (like
  spark.dependencies). This approach is tedious because it involves building
  custom versions of all dependencies (and their transitive dependencies).
  3. Use something like OSGi: Some frameworks have successfully used OSGi to
  manage dependencies between modules. The challenge in this approach is to
  OSGify the framework and hide OSGi complexities from the end user.

  My personal preference is OSGi (or at least some support for OSGi), but I
  would love to hear what the Spark devs are thinking in terms of resolving the
  problem.

  Thanks,
  Aniket





Re: Dependency hell in Spark applications

2014-09-04 Thread Koert Kuipers
custom spark builds should not be the answer. at least not if spark ever
wants to have a vibrant community for spark apps.

spark does support a user-classpath-first option, which would deal with
some of these issues, but I don't think it works.
On Sep 4, 2014 9:01 AM, Felix Garcia Borrego fborr...@gilt.com wrote:

 Hi,
 I ran into the same issue and, apart from the ideas Aniket mentioned, I
 could only find a nasty workaround: adding my own custom
 PoolingClientConnectionManager to my classpath.


 http://stackoverflow.com/questions/24788949/nosuchmethoderror-while-running-aws-s3-client-on-spark-while-javap-shows-otherwi/25488955#25488955



 On Thu, Sep 4, 2014 at 11:43 AM, Sean Owen so...@cloudera.com wrote:

  Dumb question -- are you using a Spark build that includes the Kinesis
  dependency? That build would have resolved conflicts like this for
  you. Your app would need to use the same version of the Kinesis client
  SDK, ideally.

  All of these ideas are well-known, yes. In cases of super-common
  dependencies like Guava, they are already shaded. This is a
  less-common source of conflicts, so I don't think http-client is
  shaded, especially since it is not used directly by Spark. I think
  this is a case of your app conflicting with a third-party dependency?
 
  I think OSGi is deemed too over the top for things like this.
 
  On Thu, Sep 4, 2014 at 11:35 AM, Aniket Bhatnagar
  aniket.bhatna...@gmail.com wrote:
   I am trying to use Kinesis as a source for Spark Streaming and have run
   into a dependency issue that can't be resolved without making my own custom
   Spark build. The issue is that Spark is transitively dependent
   on org.apache.httpcomponents:httpclient:jar:4.1.2 (I think because of
   libfb303 coming from hbase and hive-serde) whereas the AWS SDK is dependent
   on org.apache.httpcomponents:httpclient:jar:4.2. When I package and run my
   Spark Streaming application, I get the following:

   Caused by: java.lang.NoSuchMethodError:
   org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
   at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
   at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
   at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
   at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
   at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
   at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:181)
   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
   at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:136)
   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:117)
   at com.amazonaws.services.kinesis.AmazonKinesisAsyncClient.<init>(AmazonKinesisAsyncClient.java:132)

   I can create a custom Spark build with
   org.apache.httpcomponents:httpclient:jar:4.2 included in the assembly, but I
   was wondering whether this is something the Spark devs have noticed and are
   looking to resolve in upcoming releases. Here are my thoughts on this issue:

   Containers that allow running custom user code often have to resolve
   dependency issues in cases of conflict between the framework's and the user
   code's dependencies. Here is how I have seen some frameworks resolve the issue:
   1. Provide a child-first class loader: Some JEE containers provide a
   child-first class loader that allows classes to be loaded from user code
   first. I don't think this approach completely solves the problem, as the
   framework is then susceptible to class mismatch errors.
   2. Fold all dependencies into a sub-package: This approach involves
   relocating all dependencies into a project-specific sub-package (like
   spark.dependencies). This approach is tedious because it involves building
   custom versions of all dependencies (and their transitive dependencies).
   3. Use something like OSGi: Some frameworks have successfully used OSGi to
   manage dependencies between modules. The challenge in this approach is to
   OSGify the framework and hide OSGi complexities from the end user.

   My personal preference is OSGi (or at least some support for OSGi), but I
   would love to hear what the Spark devs are thinking in terms of resolving
   the problem.

   Thanks,
   Aniket
 