Re: run new spark version on old spark cluster ?

2019-05-21 Thread Nicolas Paris
Hi

I finally got everything working. Here are the steps (for reference, I am on
HDP 2.6.5):

- copy the old hive-site.xml into the new spark conf folder
- (optional?) download jersey-bundle-1.8.jar and put it into the jars folder
- build a tar.gz archive from all the jars, copy it to hdfs, and chown it to
  hdfs:hadoop
- create a spark-defaults.conf file in the conf folder and add the lines below:

> spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
> spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
> spark.driver.extraJavaOptions -Dhdp.version=2.6.5.0-292
> spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.5.0-292
> spark.eventLog.dir hdfs:///spark-history
> spark.eventLog.enabled false
> spark.hadoop.yarn.timeline-service.enabled false
> spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
> spark.yarn.containerLauncherMaxThreads 25
> spark.driver.memoryOverhead 200
> spark.executor.memoryOverhead 200
> spark.yarn.max.executor.failures 3
> spark.yarn.preserve.staging.files false
> spark.yarn.queue default
> spark.yarn.scheduler.heartbeat.interval-ms 5000
> spark.yarn.submit.file.replication 3
> spark.yarn.archive hdfs:///hdp/apps/2.6.5.0-292/spark2/spark2.4.tar.gz
> spark.ui.port 4041

then the command below works (connecting to hive, hdfs and yarn):

> bin/spark-shell --master yarn
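
For reference, a rough shell sketch of those steps on the launch node. The
paths (e.g. /opt/spark-2.4.3-bin-without-hadoop and
/usr/hdp/current/spark2-client/conf) are assumptions, adjust to your layout:

# assumption: a spark 2.4 build unpacked on the launch node only
SPARK_HOME=/opt/spark-2.4.3-bin-without-hadoop
HDP_VERSION=2.6.5.0-292

# reuse the cluster's hive configuration
cp /usr/hdp/current/spark2-client/conf/hive-site.xml "$SPARK_HOME/conf/"

# (optional?) drop jersey-bundle-1.8.jar into the jars folder
# cp jersey-bundle-1.8.jar "$SPARK_HOME/jars/"

# package all the jars and push the archive to hdfs, owned by hdfs:hadoop
(cd "$SPARK_HOME/jars" && tar -czf /tmp/spark2.4.tar.gz .)
hdfs dfs -mkdir -p /hdp/apps/$HDP_VERSION/spark2
hdfs dfs -put /tmp/spark2.4.tar.gz /hdp/apps/$HDP_VERSION/spark2/
hdfs dfs -chown hdfs:hadoop /hdp/apps/$HDP_VERSION/spark2/spark2.4.tar.gz

# with conf/spark-defaults.conf filled in as above, the shell connects to yarn
"$SPARK_HOME/bin/spark-shell" --master yarn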


Thanks for your support,




Re: run new spark version on old spark cluster ?

2019-05-20 Thread Koert Kuipers
you most likely have to set something in spark-defaults.conf like

spark.master yarn
spark.submit.deployMode client
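
a minimal sketch of that on the launch node (the HADOOP_CONF_DIR value is an
assumption for a typical client-config layout):

# append the defaults to the new spark's conf
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.master            yarn
spark.submit.deployMode client
EOF

# spark also needs to find the cluster's yarn/hdfs client configs
export HADOOP_CONF_DIR=/etc/hadoop/conf

# spark-shell / spark-submit now default to yarn client mode
"$SPARK_HOME/bin/spark-shell"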

On Mon, May 20, 2019 at 3:14 PM Nicolas Paris wrote:

> Finally, connecting to both hive/hdfs was easy. I just had to copy
> the hive-site.xml from the old spark version and it worked instantly
> after unzipping.
>
> Right now I am stuck on connecting to yarn.


Re: run new spark version on old spark cluster ?

2019-05-20 Thread Nicolas Paris
Finally, connecting to both hive/hdfs was easy. I just had to copy
the hive-site.xml from the old spark version and it worked instantly
after unzipping.

Right now I am stuck on connecting to yarn. 



-- 
nicolas




Re: run new spark version on old spark cluster ?

2019-05-20 Thread Koert Kuipers
we had very few issues with hdfs or hive, but then we use hive only for
basic reading and writing of tables.

depending on your vendor you might have to add a few settings to your
spark-defaults.conf. i remember on hdp you had to set the hdp.version
somehow.
we prefer to build spark with hadoop being provided, and then add hadoop
classpath to spark classpath. this works well on cdh, hdp, and also for
cloud providers.

for example this is a typical build with hive for cdh 5 (which is based on
hadoop 2.6; change the hadoop version to match your vendor):
dev/make-distribution.sh --name  --tgz -Phadoop-2.6
-Dhadoop.version=2.6.0 -Pyarn -Phadoop-provided -Phive
add hadoop classpath to the spark classpath in spark-env.sh:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

i think certain vendors support multiple "vendor supported" installs, so
you could also look into that if you are not comfortable with running your
own spark build.
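
as a sketch, the hdp.version part usually ends up looking like this in
spark-defaults.conf (the exact value must match your cluster's HDP build;
2.6.5.0-292 is just an example):

# without hdp.version, container launch on hdp can fail with an
# unsubstituted ${hdp.version} ("bad substitution") in the classpath
spark.driver.extraJavaOptions  -Dhdp.version=2.6.5.0-292
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.5.0-292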

On Mon, May 20, 2019 at 2:24 PM Nicolas Paris wrote:

> > correct. note that you only need to install spark on the node you launch it
> > from. spark doesn't need to be installed on the cluster itself.
>
> That sounds reasonably doable for me. My guess is I will have some
> trouble making that spark version work with both hive & hdfs installed
> on the cluster - or maybe it's finally plug-&-play, i don't know.
>
> thanks


Re: run new spark version on old spark cluster ?

2019-05-20 Thread Nicolas Paris
> correct. note that you only need to install spark on the node you launch it
> from. spark doesn't need to be installed on the cluster itself.

That sounds reasonably doable for me. My guess is I will have some
trouble making that spark version work with both hive & hdfs installed
on the cluster - or maybe it's finally plug-&-play, i don't know.

thanks


-- 
nicolas




Re: run new spark version on old spark cluster ?

2019-05-20 Thread Koert Kuipers
correct. note that you only need to install spark on the node you launch it
from. spark doesn't need to be installed on the cluster itself.

the shared components between spark jobs on yarn are only really the
spark-shuffle-service in yarn and the spark-history-server. i have found
compatibility for these to be good. it's best if these run the latest version.
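
for reference, the yarn-side shuffle service is wired up roughly like this
(property names per the standard spark-on-yarn docs; hdp typically registers
it under a vendor name such as spark2_shuffle, so check your cluster):

# yarn-site.xml on every nodemanager:
#   yarn.nodemanager.aux-services                       ...,spark_shuffle
#   yarn.nodemanager.aux-services.spark_shuffle.class   org.apache.spark.network.yarn.YarnShuffleService
# spark side, in spark-defaults.conf:
spark.shuffle.service.enabled true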

On Mon, May 20, 2019 at 2:02 PM Nicolas Paris wrote:

> > you will need the spark version you intend to launch with on the machine
> > you launch from and point to the correct spark-submit
>
> does this mean installing a second spark version (2.4) on the cluster?
>
> thanks


Re: run new spark version on old spark cluster ?

2019-05-20 Thread Pat Ferrel
It is always dangerous to run a NEWER version of code on an OLDER cluster.
The danger increases with the size of the semver change, and this one is not
just a build number bump. In other words, 2.4 is considered a fairly major
change from 2.3. Not much else can be said.


From: Nicolas Paris  
Reply: user@spark.apache.org  
Date: May 20, 2019 at 11:02:49 AM
To: user@spark.apache.org  
Subject:  Re: run new spark version on old spark cluster ?



Re: run new spark version on old spark cluster ?

2019-05-20 Thread Nicolas Paris
> you will need the spark version you intend to launch with on the machine you
> launch from and point to the correct spark-submit

does this mean installing a second spark version (2.4) on the cluster?

thanks


-- 
nicolas




Re: run new spark version on old spark cluster ?

2019-05-20 Thread Koert Kuipers
yarn can happily run multiple spark versions side-by-side.
you will need the spark version you intend to launch with on the machine
you launch from, and point to the correct spark-submit.
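
a small sketch of what "point to the correct spark-submit" looks like in
practice (the install paths, --class and myapp.jar are made-up placeholders):

# both clients live on the launch node only; the cluster keeps its spark 2.3
export HADOOP_CONF_DIR=/etc/hadoop/conf

# submit with the old client
/opt/spark-2.3.2/bin/spark-submit --master yarn --deploy-mode client \
  --class com.example.Main myapp.jar

# or with the new one, side by side
/opt/spark-2.4.3/bin/spark-submit --master yarn --deploy-mode client \
  --class com.example.Main myapp.jar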



run new spark version on old spark cluster ?

2019-05-20 Thread Nicolas Paris
Hi

I am wondering whether it's feasible to:
- build a spark application (with sbt/maven) based on spark2.4
- deploy that jar on yarn on a spark2.3-based installation

thanks in advance,


-- 
nicolas
