Hi Bala, 

I believe with respect to rolling upgrades, you would be installing the new 
version of Tez without removing the older one ( i.e. “simple” rpms are probably 
a bad idea if you want rolling upgrades :-) ). What this implies is that 
HADOOP_CLASSPATH in any scenario ( MR on Tez, Hive on Tez or Hive Server ) can 
continue pointing to the older version of Tez. Likewise for the tez jars on 
HDFS. This also means that you need 2 versions of tez-site.xml in versioned 
config dirs, i.e. 
TEZ_CLASSPATH=/opt/tez-0.4.1/conf:/opt/tez-0.4.1/*:/opt/tez-0.4.1/lib/* 
for the older version and 
TEZ_CLASSPATH=/opt/tez-0.4.2/conf:/opt/tez-0.4.2/*:/opt/tez-0.4.2/lib/* 
for the new version.

Switching to the newer version of Tez should just be done by changing the env 
var to point to the new version directory of Tez ( the conf in it will also 
point to the newer version of Tez on HDFS).
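
A minimal sketch of that switch, assuming each Tez version is unpacked under 
/opt/tez-<version> with its own conf dir as in the paths above. The helper 
name tez_classpath is hypothetical and not part of Tez; how TEZ_CLASSPATH is 
then fed into HADOOP_CLASSPATH depends on your own setup.

```shell
# Hypothetical helper (not part of Tez): build the TEZ_CLASSPATH value for one
# installed version, assuming the layout /opt/tez-<version>/{conf,lib}.
tez_classpath() {
  version="$1"
  home="/opt/tez-${version}"
  # The '*' entries are quoted, so the shell passes them through unexpanded
  # and the JVM treats them as classpath wildcards.
  printf '%s\n' "${home}/conf:${home}/*:${home}/lib/*"
}

# Point newly submitted jobs at 0.4.2. The 0.4.1 directory stays in place.
export TEZ_CLASSPATH="$(tez_classpath 0.4.2)"
```

Switching back is the same call with 0.4.1, which is why the older install 
should stay on disk next to the new one.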

Given that Tez is completely client-side, any job ( be it a hive query or MR 
job ) already running on the cluster will not be affected when the switch is 
made ( jars are localized when the job kicks off ). All newly submitted jobs 
will now pick the new version. Likewise for the Hive Server, assuming it has 
been configured with a particular class path, it need not be affected until it 
is restarted with a modified class path to the newly installed version. The 
only gotcha is that the older jars cannot be deleted until all running jobs 
using them have completed.

We can set up a face-to-face meeting/meetup on this area if there is 
interest.

thanks
— Hitesh

On Jul 8, 2014, at 11:26 AM, Bala Krishna Gangisetty <[email protected]> wrote:

> Thanks Gopal, Bikas and Hitesh for sharing your thoughts.
> 
> Hi Gopal,
> 
> One follow-up question: As you advised, in case of rolling upgrades to 
> overcome these errors, for hive, the best place to update HADOOP_CLASSPATH 
> with Tez jars is through hive-config.sh. Could you also suggest the best ways 
> to update HADOOP_CLASSPATH with Tez jars for mapreduce programs and also for 
> non Hive cli sessions (Through HiveServer2, et al)?
> 
> --Bala G.
> 
> 
> On Mon, Jul 7, 2014 at 7:30 PM, Gopal V <[email protected]> wrote:
> On 7/7/14, 5:50 PM, Bala Krishna Gangisetty wrote:
> Thanks Hitesh for your inputs. I've not come across any issues yet. So, I
> can safely assume that putting Tez jars in Hadoop class path will not cause
> the map reduce programs to use Tez framework unless it is enabled. Let me
> know if my understanding is not correct.
> 
> Your assumptions are correct.
> 
> But this is not advised because it will break rolling upgrades.
> 
> The main issue early adopters have run into is installing a tez built against 
> hadoop-2.4.x into a cluster running hadoop-2.2.x.
> 
> As Hitesh/Bikas mentioned, that would cause errors at runtime even for MR 
> jobs.
> 
> The errors you will get for that case are similar to the errors you get 
> during a rolling upgrade between versions.
> 
> There is no real reason to include tez jars for any hadoop daemons (datanode, 
> nodemanager) you run in your cluster, and doing so is risky because they 
> might error out while those files are being replaced.
> 
> The correct solution for this is to install Tez in its own versioned 
> directory.
> 
> And for hive, do the following within your hive-config.sh.
> 
> export HADOOP_CLASSPATH=/opt/tez/current/*:/opt/tez/current/lib/*:/etc/tez/conf/:/usr/share/java/*:$HADOOP_CLASSPATH
> 
> This setup with symlinks from
> 
> /etc/tez/conf -> /opt/tez/current/conf
> /opt/tez/current -> /opt/tez/0.4.1
> 
> will ensure that you are ready to do rolling upgrades from day #1.
> 
> After the symlinks point to a new version, the only daemon to restart would 
> be hive-server2.
> 
> Cheers,
> Gopal
> 
> 
> On Mon, Jul 7, 2014 at 4:10 PM, Hitesh Shah <[email protected]> wrote:
> 
> Hi
> 
> For the most part, there should be no issues as most dependencies that Tez
> pulls in are compatible with the hadoop version that it is compiled with (
> 2.2 or higher ). The major issue to be aware of is that you should compile
> Tez against the same version of hadoop/mapreduce that is deployed on your
> cluster.  The tez dependency jars contain both 3rd party deps as well as
> hadoop jars ( hdfs, common, yarn client-side and mapreduce client-side ) -
> if there is a version mismatch, this may cause a problem when the tez
> directory is added to the hadoop classpath.
> 
> Have you seen any issues? If yes, could you provide more details?
> 
> thanks
> — Hitesh
> 
> 
> On Jul 7, 2014, at 3:44 PM, Bala Krishna Gangisetty <[email protected]>
> wrote:
> 
> > I'm wondering, from an operational point of view, are there any specifics
> > that need special attention to make MRv2 and Tez frameworks coexist in
> > harmony? I heard that putting Tez jars in Hadoop class path would impact
> > the mapred behavior, even when Tez is not enabled (either through
> > mapred-site.xml, or Hive). Could someone shed more light and share
> > thoughts on it?
> >
> > --Bala G.
> 