Thanks again all.

Anyway, as Nicola suggested, I took the trench-war approach to sort this out:
using the jars directly and working out their dependencies in the
~/.ivy2/jars directory with grep -lRi <missing> :)
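
For reference, the sort of check I mean is along these lines (only a sketch;
GoogleCredential here is just an illustrative class name, substitute whatever
the ClassNotFoundException / NoClassDefFoundError actually reports as missing):

cd ~/.ivy2/jars
# list the cached jar(s) whose contents mention the missing class name
grep -lRi "GoogleCredential" .
# confirm by listing the entries of a candidate jar
jar tf com.google.api-client_google-api-client-1.24.1.jar | grep -i GoogleCredential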


This now works using only jars (the newly added ones were highlighted in
grey) after resolving the dependencies:


${SPARK_HOME}/bin/spark-submit \
                --master yarn \
                --deploy-mode client \
                --conf spark.executor.memoryOverhead=3000 \
                --class org.apache.spark.repl.Main \
                --name "my own Spark shell on Yarn" "$@" \
                --driver-class-path /home/hduser/jars/ddhybrid.jar \
                --jars /home/hduser/jars/spark-bigquery-latest.jar,\
/home/hduser/jars/ddhybrid.jar,\
/home/hduser/jars/com.google.http-client_google-http-client-1.24.1.jar,\
/home/hduser/jars/com.google.http-client_google-http-client-jackson2-1.24.1.jar,\
/home/hduser/jars/com.google.cloud.bigdataoss_util-1.9.4.jar,\
/home/hduser/jars/com.google.api-client_google-api-client-1.24.1.jar,\
/home/hduser/jars/com.google.oauth-client_google-oauth-client-1.24.1.jar,\
/home/hduser/jars/com.google.apis_google-api-services-bigquery-v2-rev398-1.24.1.jar,\
/home/hduser/jars/com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar,\
/home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
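
Note that the --jars value has to reach spark-submit as a single
comma-separated argument with no spaces, which is why the continuation lines
above carry no leading whitespace. As an aside, and purely a sketch on my
part (assuming every jar under /home/hduser/jars should be shipped; JARS_DIR
and EXTRA_JARS are just names I made up), the list can also be built in a
variable first:

JARS_DIR=/home/hduser/jars
# join every jar in the directory into one comma-separated list (no spaces)
EXTRA_JARS=$(ls ${JARS_DIR}/*.jar | paste -sd, -)

${SPARK_HOME}/bin/spark-submit \
                --master yarn \
                --deploy-mode client \
                --conf spark.executor.memoryOverhead=3000 \
                --class org.apache.spark.repl.Main \
                --name "my own Spark shell on Yarn" "$@" \
                --driver-class-path ${JARS_DIR}/ddhybrid.jar \
                --jars "${EXTRA_JARS}"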


Compared with using the package itself, as before:


${SPARK_HOME}/bin/spark-submit \
                --master yarn \
                --deploy-mode client \
                --conf spark.executor.memoryOverhead=3000 \
                --class org.apache.spark.repl.Main \
                --name "my own Spark shell on Yarn" "$@" \
                --driver-class-path /home/hduser/jars/ddhybrid.jar \
                --jars /home/hduser/jars/spark-bigquery-latest.jar,\
/home/hduser/jars/ddhybrid.jar \
                --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
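
With --packages, Ivy resolves the transitive dependencies itself and caches
the downloaded jars under ~/.ivy2/jars, so one can sanity-check what it
actually pulled in with something like:

ls ~/.ivy2/jars | grep -i bigquery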



I think, as Sean suggested, this approach may or may not work (it is a manual
process), and if the jars change the whole thing has to be re-evaluated,
which adds to the complexity.


Cheers


On Tue, 20 Oct 2020 at 23:01, Sean Owen <sro...@gmail.com> wrote:

> Rather, let --packages (via Ivy) worry about them, because they tell Ivy
> what they need.
> There's no 100% guarantee that conflicting dependencies are resolved in a
> way that works in every single case, which you run into sometimes when
> using incompatible libraries, but yes this is the point of --packages and
> Ivy.
>
> On Tue, Oct 20, 2020 at 4:43 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks again all.
>>
>> Hi Sean,
>>
>> As I understood from your statement, you are suggesting just use
>> --packages without worrying about individual jar dependencies?
>>