Re: run new spark version on old spark cluster ?
Hi,

I finally got it all working. Here are the steps (for information, I am on HDP 2.6.5):

- copy the old hive-site.xml into the new spark conf folder
- (optional?) download the jersey-bundle-1.8.jar and put it into the jars folder
- build a tar.gz from all the jars and copy that archive to hdfs with chown hdfs:hadoop
- create a spark-defaults.conf file in the conf folder and add the below lines:

> spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
> spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
> spark.driver.extraJavaOptions -Dhdp.version=2.6.5.0-292
> spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.5.0-292
> spark.eventLog.dir hdfs:///spark-history
> spark.eventLog.enabled false
> spark.hadoop.yarn.timeline-service.enabled false
> spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
> spark.yarn.containerLauncherMaxThreads 25
> spark.driver.memoryOverhead 200
> spark.executor.memoryOverhead 200
> spark.yarn.max.executor.failures 3
> spark.yarn.preserve.staging.files false
> spark.yarn.queue default
> spark.yarn.scheduler.heartbeat.interval-ms 5000
> spark.yarn.submit.file.replication 3
> spark.yarn.archive hdfs:///hdp/apps/2.6.5.0-292/spark2/spark2.4.tar.gz
> spark.ui.port 4041

then the below command works (hive, hdfs and yarn included):

> bin/spark-shell --master yarn

Thanks for your support,

On Mon, May 20, 2019 at 03:42:46PM -0400, Koert Kuipers wrote:
> most likely have to set something in spark-defaults.conf like
>
> spark.master yarn
> spark.submit.deployMode client

--
nicolas
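For reference, the jar-archive step in the list above can be sketched roughly as below. SPARK_HOME is an assumed path to the new Spark install, and the HDFS target was chosen to match the spark.yarn.archive value in the conf; adjust both to your layout.

```shell
# Sketch of the "build a tar.gz from all the jars" step above.
# SPARK_HOME is an assumed path; the HDFS target matches spark.yarn.archive.
SPARK_HOME=${SPARK_HOME:-/tmp/spark-2.4.0-bin-without-hadoop}
mkdir -p "$SPARK_HOME/jars"   # no-op on a real install

# Bundle every jar shipped with the new Spark into one archive.
tar -czf /tmp/spark2.4.tar.gz -C "$SPARK_HOME/jars" .

# Publish it on HDFS so YARN containers can localize it (run on the cluster):
#   hdfs dfs -mkdir -p /hdp/apps/2.6.5.0-292/spark2
#   hdfs dfs -put /tmp/spark2.4.tar.gz /hdp/apps/2.6.5.0-292/spark2/
#   hdfs dfs -chown hdfs:hadoop /hdp/apps/2.6.5.0-292/spark2/spark2.4.tar.gz
```

With the archive on HDFS, every job localizes the same jars instead of re-uploading them at each submit.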
Re: run new spark version on old spark cluster ?
most likely have to set something in spark-defaults.conf like

spark.master yarn
spark.submit.deployMode client

On Mon, May 20, 2019 at 3:14 PM Nicolas Paris wrote:
> Finally that was easy to connect to both hive/hdfs. I just had to copy
> the hive-site.xml from the old spark version and that worked instantly
> after unzipping.
>
> Right now I am stuck on connecting to yarn.
Re: run new spark version on old spark cluster ?
Finally that was easy to connect to both hive/hdfs. I just had to copy
the hive-site.xml from the old spark version and that worked instantly
after unzipping.

Right now I am stuck on connecting to yarn.

On Mon, May 20, 2019 at 02:50:44PM -0400, Koert Kuipers wrote:
> we had very little issues with hdfs or hive, but then we use hive only for
> basic reading and writing of tables.
>
> depending on your vendor you might have to add a few settings to your
> spark-defaults.conf. i remember on hdp you had to set the hdp.version somehow.

--
nicolas

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: run new spark version on old spark cluster ?
we had very little issues with hdfs or hive, but then we use hive only for
basic reading and writing of tables.

depending on your vendor you might have to add a few settings to your
spark-defaults.conf. i remember on hdp you had to set the hdp.version somehow.

we prefer to build spark with hadoop being provided, and then add the hadoop
classpath to the spark classpath. this works well on cdh, hdp, and also for
cloud providers.

for example this is a typical build with hive for cdh 5 (which is based on
hadoop 2.6, you change the hadoop version based on vendor):

dev/make-distribution.sh --name --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phadoop-provided -Phive

then add the hadoop classpath to the spark classpath in spark-env.sh:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

i think certain vendors support multiple "vendor supported" installs, so you
could also look into that if you are not comfortable with running your own
spark build.

On Mon, May 20, 2019 at 2:24 PM Nicolas Paris wrote:
> That sound reasonably doable for me. My guess is I will have some
> troubles to make that spark version work with both hive & hdfs installed
> on the cluster - or maybe that's finally plug-&-play i don't know.
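To make that build recipe concrete, here is a hedged sketch, run from a Spark source checkout. The distribution label "my-dist" is a placeholder I introduced (the original post elided the value of --name); the Hadoop profile and version follow the cdh5 example and should be swapped per vendor.

```shell
# Hedged sketch of the build described above (run from a Spark source tree).
# "my-dist" is a placeholder label; -Phadoop-2.6/-Dhadoop.version=2.6.0
# follow the cdh5 example and change per vendor.
./dev/make-distribution.sh --name my-dist --tgz \
    -Phadoop-2.6 -Dhadoop.version=2.6.0 \
    -Pyarn -Phadoop-provided -Phive

# Then, in conf/spark-env.sh of the unpacked distribution, borrow the
# cluster's own Hadoop jars at runtime instead of bundling them:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```

The -Phadoop-provided profile is what keeps the vendor's Hadoop out of the tarball, so the same Spark build can ride on whichever Hadoop the cluster ships.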
Re: run new spark version on old spark cluster ?
> correct. note that you only need to install spark on the node you launch it
> from. spark doesnt need to be installed on cluster itself.

That sounds reasonably doable for me. My guess is I will have some
troubles to make that spark version work with both hive & hdfs installed
on the cluster - or maybe that's finally plug-&-play, i don't know.

thanks

On Mon, May 20, 2019 at 02:16:43PM -0400, Koert Kuipers wrote:
> the shared components between spark jobs on yarn are only really
> spark-shuffle-service in yarn and spark-history-server. i have found
> compatibility for these to be good. its best if these run latest version.

--
nicolas
Re: run new spark version on old spark cluster ?
correct. note that you only need to install spark on the node you launch it
from. spark doesnt need to be installed on the cluster itself.

the shared components between spark jobs on yarn are only really the
spark-shuffle-service in yarn and the spark-history-server. i have found
compatibility for these to be good. its best if these run the latest version.

On Mon, May 20, 2019 at 2:02 PM Nicolas Paris wrote:
> does this mean to install a second spark version (2.4) on the cluster?
>
> thanks
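As a hedged illustration of leaning on that shared shuffle service: the new Spark install can opt in from its own conf directory. The conf path below is an assumption; the two property names are standard Spark settings.

```shell
# Illustrative only: have jobs from the new Spark register with the
# cluster's existing YARN external shuffle service. SPARK_CONF_DIR is an
# assumed path; the property names are standard Spark settings.
SPARK_CONF_DIR=${SPARK_CONF_DIR:-/tmp/spark24-conf}
mkdir -p "$SPARK_CONF_DIR"
cat >> "$SPARK_CONF_DIR/spark-defaults.conf" <<'EOF'
spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true
EOF
```

Dynamic allocation requires the external shuffle service, which is why the two settings travel together here.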
Re: run new spark version on old spark cluster ?
It is always dangerous to run a NEWER version of code on an OLDER cluster.
The danger increases with the semver change, and this one is not just a
build #. In other words, 2.4 is considered to be a fairly major change from
2.3. Not much else can be said.

From: Nicolas Paris
Reply: user@spark.apache.org
Date: May 20, 2019 at 11:02:49 AM
To: user@spark.apache.org
Subject: Re: run new spark version on old spark cluster ?

> does this mean to install a second spark version (2.4) on the cluster?
>
> thanks
Re: run new spark version on old spark cluster ?
> you will need the spark version you intend to launch with on the machine you
> launch from and point to the correct spark-submit

does this mean to install a second spark version (2.4) on the cluster?

thanks

On Mon, May 20, 2019 at 01:58:11PM -0400, Koert Kuipers wrote:
> yarn can happily run multiple spark versions side-by-side

--
nicolas
Re: run new spark version on old spark cluster ?
yarn can happily run multiple spark versions side-by-side.
you will need the spark version you intend to launch with on the machine you
launch from, and point to the correct spark-submit.

On Mon, May 20, 2019 at 1:50 PM Nicolas Paris wrote:
> Hi
>
> I am wondering whether that's feasible to:
> - build a spark application (with sbt/maven) based on spark2.4
> - deploy that jar on yarn on a spark2.3 based installation
>
> thanks by advance,
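A sketch of what "point to the correct spark-submit" means in practice; the install paths and versions below are hypothetical.

```shell
# Hypothetical layout: two Spark installs side by side on the launch node.
# YARN runs whichever version the submitting client ships, so choosing a
# version is just calling that install's own spark-submit.
/opt/spark-2.3.2/bin/spark-submit --master yarn --deploy-mode client my-app.jar   # old
/opt/spark-2.4.3/bin/spark-submit --master yarn --deploy-mode client my-app.jar   # new
```

Nothing cluster-side changes between the two commands; only the client-side install differs.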
run new spark version on old spark cluster ?
Hi

I am wondering whether that's feasible to:
- build a spark application (with sbt/maven) based on spark2.4
- deploy that jar on yarn on a spark2.3 based installation

thanks by advance,

--
nicolas