Hi Nicolas,

I removed ~/.ivy2 and reran the Spark job with the package included (the one
that works).

Under ~/.ivy2/jars I have 37 jar files, including the one that I had before.

/home/hduser/.ivy2/jars> ls
com.databricks_spark-avro_2.11-4.0.0.jar
com.google.cloud.bigdataoss_gcs-connector-1.9.4-hadoop2.jar
com.google.oauth-client_google-oauth-client-1.24.1.jar
org.checkerframework_checker-qual-2.5.2.jar
com.fasterxml.jackson.core_jackson-core-2.9.2.jar
com.google.cloud.bigdataoss_gcsio-1.9.4.jar
com.google.oauth-client_google-oauth-client-java6-1.24.1.jar
org.codehaus.jackson_jackson-core-asl-1.9.13.jar
com.github.samelamin_spark-bigquery_2.11-0.2.6.jar
com.google.cloud.bigdataoss_util-1.9.4.jar
commons-codec_commons-codec-1.6.jar
org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar
com.google.api-client_google-api-client-1.24.1.jar
com.google.cloud.bigdataoss_util-hadoop-1.9.4-hadoop2.jar
commons-logging_commons-logging-1.1.1.jar
org.codehaus.mojo_animal-sniffer-annotations-1.14.jar
com.google.api-client_google-api-client-jackson2-1.24.1.jar
com.google.code.findbugs_jsr305-3.0.2.jar
com.thoughtworks.paranamer_paranamer-2.3.jar
org.slf4j_slf4j-api-1.7.5.jar
com.google.api-client_google-api-client-java6-1.24.1.jar
com.google.errorprone_error_prone_annotations-2.1.3.jar
joda-time_joda-time-2.9.3.jar
org.tukaani_xz-1.0.jar
com.google.apis_google-api-services-bigquery-v2-rev398-1.24.1.jar
com.google.guava_guava-26.0-jre.jar
org.apache.avro_avro-1.7.6.jar
org.xerial.snappy_snappy-java-1.0.5.jar
com.google.apis_google-api-services-storage-v1-rev135-1.24.1.jar
com.google.http-client_google-http-client-1.24.1.jar
org.apache.commons_commons-compress-1.4.1.jar
com.google.auto.value_auto-value-annotations-1.6.2.jar
com.google.http-client_google-http-client-jackson2-1.24.1.jar
org.apache.httpcomponents_httpclient-4.0.1.jar
com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar
com.google.j2objc_j2objc-annotations-1.1.jar
org.apache.httpcomponents_httpcore-4.0.1.jar

I don't think I need to add all of these to the spark-submit --jars list. Is
there a way I can find out which dependency is missing?
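One way to answer that question is to scan the ivy cache for the class named
in the stack trace below. A minimal sketch in Python (assuming the jars live
under ~/.ivy2/jars as listed above; `find_class` is a hypothetical helper,
not part of Spark or Ivy):

```python
import glob
import os
import zipfile


def find_class(jar_dir, class_entry):
    """Return the jars under jar_dir whose entries include class_entry."""
    hits = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            pass  # skip corrupt or partial downloads
    return hits


# For the class in the stack trace below, something like:
# find_class(os.path.expanduser("~/.ivy2/jars"),
#            "com/google/api/client/http/HttpRequestInitializer.class")
```

Any jar this reports but that is absent from the --jars list is a candidate
for the missing dependency.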

This is the error I am getting when I use the jar file
*com.github.samelamin_spark-bigquery_2.11-0.2.6.jar* instead of the
package *com.github.samelamin:spark-bigquery_2.11:0.2.6*:

java.lang.NoClassDefFoundError:
com/google/api/client/http/HttpRequestInitializer
  at
com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
  at
com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
  at
com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
  ... 76 elided
Caused by: java.lang.ClassNotFoundException:
com.google.api.client.http.HttpRequestInitializer
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
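Alternatively, rather than hunting for the one missing jar, the whole
~/.ivy2/jars listing can be turned into a --jars value programmatically. A
sketch, again in Python; `jars_arg` is a hypothetical helper and the
directory path is the one from the listing above:

```python
import glob
import os


def jars_arg(jar_dir):
    """Build a comma-separated --jars value from every jar in jar_dir."""
    return ",".join(sorted(glob.glob(os.path.join(jar_dir, "*.jar"))))


# e.g. print(jars_arg(os.path.expanduser("~/.ivy2/jars"))) and paste the
# result after --jars in the spark-submit command line.
```

This trades precision for convenience: all 37 jars end up on the classpath,
which avoids the NoClassDefFoundError at the cost of a longer command line.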


Thanks



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 20 Oct 2020 at 20:09, Nicolas Paris <nicolas.pa...@riseup.net>
wrote:

> once you get the jars from --packages into the ~/.ivy2 folder, you can
> then add the list to --jars. That way there is no missing dependency.
>
>
> ayan guha <guha.a...@gmail.com> writes:
>
> > Hi
> >
> > One way to think of this is --packages is better when you have third
> party
> > dependency and --jars is better when you have custom in-house built jars.
> >
> > On Wed, 21 Oct 2020 at 3:44 am, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> > wrote:
> >
> >> Thanks Sean and Russell. Much appreciated.
> >>
> >> Just to clarify: recently I had issues with different versions of Google
> >> Guava jar files when building an uber jar (to evict the unwanted ones).
> >> This used to work a year and a half ago using Google Dataproc compute
> >> engines (which come with Spark preloaded), and I could create an uber jar
> >> file.
> >>
> >> Unfortunately this has become problematic now, so I tried to use
> >> spark-submit instead, as follows:
> >>
> >> ${SPARK_HOME}/bin/spark-submit \
> >>                 --master yarn \
> >>                 --deploy-mode client \
> >>                 --conf spark.executor.memoryOverhead=3000 \
> >>                 --class org.apache.spark.repl.Main \
> >>                 --name "Spark shell on Yarn" \
> >>                 --driver-class-path /home/hduser/jars/ddhybrid.jar \
> >>                 --jars /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar \
> >>                 --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
> >>                 "$@"
> >>
> >> This is effectively a tailored spark-shell. However, I do not think there
> >> is a mechanism to resolve jar conflicts without building an uber jar file
> >> through SBT?
> >>
> >> Cheers
> >>
> >>
> >>
> >> On Tue, 20 Oct 2020 at 16:54, Russell Spitzer <
> russell.spit...@gmail.com>
> >> wrote:
> >>
> >>> --jars adds only that jar
> >>> --packages adds the jar plus its dependencies as listed in Maven
> >>>
> >>> On Tue, Oct 20, 2020 at 10:50 AM Mich Talebzadeh <
> >>> mich.talebza...@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have a scenario that I use in Spark submit as follows:
> >>>>
> >>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
> >>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar,
> >>>> */home/hduser/jars/spark-bigquery_2.11-0.2.6.jar*
> >>>>
> >>>> As you can see the jar files needed are added.
> >>>>
> >>>>
> >>>> This comes back with error message as below
> >>>>
> >>>>
> >>>> Creating model test.weights_MODEL
> >>>>
> >>>> java.lang.NoClassDefFoundError:
> >>>> com/google/api/client/http/HttpRequestInitializer
> >>>>
> >>>>   at
> >>>>
> com.samelamin.spark.bigquery.BigQuerySQLContext.bq$lzycompute(BigQuerySQLContext.scala:19)
> >>>>
> >>>>   at
> >>>>
> com.samelamin.spark.bigquery.BigQuerySQLContext.bq(BigQuerySQLContext.scala:19)
> >>>>
> >>>>   at
> >>>>
> com.samelamin.spark.bigquery.BigQuerySQLContext.runDMLQuery(BigQuerySQLContext.scala:105)
> >>>>
> >>>>   ... 76 elided
> >>>>
> >>>> Caused by: java.lang.ClassNotFoundException:
> >>>> com.google.api.client.http.HttpRequestInitializer
> >>>>
> >>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >>>>
> >>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>>>
> >>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>>>
> >>>>
> >>>>
> >>>> So there is an issue with finding the class, although the jar file
> >>>> used,
> >>>>
> >>>> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar,
> >>>>
> >>>> has it.
> >>>>
> >>>>
> >>>> Now if *I remove the above jar file and replace it with the package of
> >>>> the same version* it works!
> >>>>
> >>>>
> >>>> spark-submit --driver-class-path /home/hduser/jars/ddhybrid.jar --jars
> >>>> /home/hduser/jars/spark-bigquery-latest.jar,/home/hduser/jars/ddhybrid.jar
> >>>> *--packages com.github.samelamin:spark-bigquery_2.11:0.2.6*
> >>>>
> >>>>
> >>>> I have read the write-ups about packages searching the Maven
> >>>> repositories etc. I am not convinced why using the package should make
> >>>> so much difference between failure and success. In other words, when
> >>>> should one use a package rather than a jar?
> >>>>
> >>>>
> >>>> Any ideas will be appreciated.
> >>>>
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>> --
> > Best Regards,
> > Ayan Guha
>
>
> --
> nicolas paris
>
