Hi Bala,

With respect to rolling upgrades, I believe you would install the new version of Tez without removing the older one (i.e. "simple" rpms are probably a bad idea if you want rolling upgrades :-) ). This implies that HADOOP_CLASSPATH in any scenario (MR on Tez, Hive on Tez or Hive Server) can continue pointing to the older version of Tez. Likewise for the Tez jars on HDFS. It also means that you need 2 versions of tez-site.xml in versioned config dirs, i.e.

  TEZ_CLASSPATH=/opt/tez-0.4.1/conf:/opt/tez-0.4.1/*:/opt/tez-0.4.1/lib/*

and, for the new version:

  TEZ_CLASSPATH=/opt/tez-0.4.2/conf:/opt/tez-0.4.2/*:/opt/tez-0.4.2/lib/*
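
As a rough sketch of how the versioned class path could be built in a client-side environment script: the TEZ_VERSION variable and the tez_classpath function below are hypothetical names of my own, not something Tez ships, and the /opt/tez-<version> layout is simply the one from the TEZ_CLASSPATH examples above.

```shell
#!/bin/sh
# Hypothetical helper: print the Tez class path for one versioned install.
# Assumes each version lives in /opt/tez-<version> with its own conf/ dir.
tez_classpath() {
  tez_home="/opt/tez-$1"
  # The globs stay quoted so the JVM, not the shell, expands them.
  printf '%s\n' "${tez_home}/conf:${tez_home}/*:${tez_home}/lib/*"
}

# Assumed variable name; changing this one value is the whole "switch".
TEZ_VERSION="${TEZ_VERSION:-0.4.1}"

TEZ_CLASSPATH="$(tez_classpath "$TEZ_VERSION")"
export TEZ_CLASSPATH

# Prepend Tez to whatever HADOOP_CLASSPATH already holds (if anything).
export HADOOP_CLASSPATH="${TEZ_CLASSPATH}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"
```

Sourcing this with TEZ_VERSION=0.4.2 in the environment would point the client at the new install and its conf dir, while the 0.4.1 directory stays in place for anything still using it.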

Switching to the newer version of Tez should just be a matter of changing the env var to point to the new version directory of Tez (the conf in it will also point to the newer version of Tez on HDFS). Given that Tez is completely client-side, any job (be it a Hive query or an MR job) already running on the cluster will not be affected when the switch is made (jars are localized when the job kicks off). All newly submitted jobs will pick up the new version. Likewise for the Hive Server: assuming it has been configured with a particular class path, it is not affected until it is restarted with a class path modified to point to the newly installed version.

The only gotcha is that the older jars cannot be deleted until all running jobs using them have completed.

We can set up a face-to-face meeting/meetup on this area if there is interest.

thanks
-- Hitesh

On Jul 8, 2014, at 11:26 AM, Bala Krishna Gangisetty wrote:

> Thanks Gopal, Bikas and Hitesh for pouring your thoughts.
>
> Hi Gopal,
>
> One follow-up question: As you advised, in case of rolling upgrades to
> overcome these errors, for hive, the best place to update HADOOP_CLASSPATH
> with Tez jars is through hive-config.sh. Could you also suggest the best ways
> to update HADOOP_CLASSPATH with Tez jars for mapreduce programs and also for
> non Hive cli sessions (Through HiveServer2, et al)?
>
> --Bala G.
>
>
> On Mon, Jul 7, 2014 at 7:30 PM, Gopal V wrote:
> On 7/7/14, 5:50 PM, Bala Krishna Gangisetty wrote:
> Thanks Hitesh for your inputs. I've not come across any issues yet. So, I
> can safely assume that putting Tez jars in Hadoop class path will not cause
> the map reduce programs to use Tez framework unless it is enabled. Let me
> know if my understanding is not correct.
>
> Your assumptions are correct.
>
> But this is not advised because it will break rolling upgrades.
>
> The main issue early adopters have run into is installing a tez built against
> hadoop-2.4.x into a cluster running hadoop-2.2.x.
>
> As Hitesh/Bikas mentioned, that would cause errors at runtime even for MR
> jobs.
>
> The errors you will get for that case are similar to the errors you get during
> a rolling upgrade between versions.
>
> There is no real reason to include tez jars for any hadoop daemons (datanode,
> nodemanager) you run in your cluster because they might error out while
> replacing those files.
>
> The correct solution for this is to install Tez in its own versioned
> directory.
>
> And for hive, within your hive-config.sh to do the following.
>
> export HADOOP_CLASSPATH=/opt/tez/current/*:/opt/tez/current/lib/*:/etc/tez/conf/:/usr/share/java/*:$HADOOP_CLASSPATH
>
> This setup with symlinks from
>
> /etc/tez/conf -> /opt/tez/current/conf
> /opt/tez/current -> /opt/tez/0.4.1
>
> Will ensure that you are ready to do rolling upgrades from day #1.
>
> After the symlinks point to a new version, the only daemon to restart would
> be hive-server2.
>
> Cheers,
> Gopal
>
>
> On Mon, Jul 7, 2014 at 4:10 PM, Hitesh Shah wrote:
>
> Hi
>
> For the most part, there should be no issues as most dependencies that Tez
> pulls in are compatible with the hadoop version that it is compiled with (
> 2.2 or higher ). The major issue to be aware of is that you should compile
> Tez against the same version of hadoop/mapreduce that is deployed on your
> cluster. The tez dependency jars contain both 3rd party deps as well as
> hadoop jars ( hdfs, common, yarn client-side and mapreduce client-side ) -
> if there is a version mismatch, this may cause a problem when the tez
> directory is added to the hadoop classpath.
>
> Have you seen any issues? If yes, could you provide more details?
>
> thanks
> -- Hitesh
>
>
> On Jul 7, 2014, at 3:44 PM, Bala Krishna Gangisetty wrote:
>
> > I'm wondering, from operational point of view, are there any specifics
> > that need special attention to make MRv2 and Tez frameworks coexist in
> > harmony? I heard that putting Tez jars in Hadoop class path would impact
> > the mapred behavior, even when Tez is not enabled (either through
> > mapred-site.xml, or Hive). Could someone throw more light and share
> > thoughts on it?
> >
> > --Bala G.
