Thanks Gopal, Bikas and Hitesh for sharing your thoughts.

Hi Gopal,
One follow-up question: as you advised, to avoid these errors during
rolling upgrades, the best place to add the Tez jars to HADOOP_CLASSPATH
for Hive is hive-config.sh. Could you also suggest the best way to add the
Tez jars to HADOOP_CLASSPATH for MapReduce programs, and for Hive sessions
that do not go through the Hive CLI (HiveServer2, et al.)?

--Bala G.

On Mon, Jul 7, 2014 at 7:30 PM, Gopal V <[email protected]> wrote:

> On 7/7/14, 5:50 PM, Bala Krishna Gangisetty wrote:
>
>> Thanks Hitesh for your inputs. I've not come across any issues yet. So, I
>> can safely assume that putting Tez jars in Hadoop class path will not
>> cause the map reduce programs to use Tez framework unless it is enabled.
>> Let me know if my understanding is not correct.
>
> Your assumptions are correct.
>
> But this is not advised because it will break rolling upgrades.
>
> The main issue early adopters have run into is installing a tez built
> against hadoop-2.4.x into a cluster running hadoop-2.2.x.
>
> As Hitesh/Bikas mentioned, that would cause errors at runtime even for MR
> jobs.
>
> The errors you will get for that case are similar to the errors you get
> during a rolling upgrade between versions.
>
> There is no real reason to include tez jars for any hadoop daemons
> (datanode, nodemanager) you run in your cluster because they might error
> out while replacing those files.
>
> The correct solution for this is to install Tez in its own versioned
> directory.
>
> And for hive, within your hive-config.sh to do the following.
>
> export HADOOP_CLASSPATH=/opt/tez/current/*:/opt/tez/current/lib/*:/etc/tez/conf/:/usr/share/java/*:$HADOOP_CLASSPATH
>
> This setup with symlinks from
>
> /etc/tez/conf -> /opt/tez/current/conf
> /opt/tez/current -> /opt/tez/0.4.1
>
> Will ensure that you are ready to do rolling upgrades from day #1.
>
> After the symlinks point to a new version, the only daemon to restart
> would be hive-server2.
>
> Cheers,
> Gopal
>
>> On Mon, Jul 7, 2014 at 4:10 PM, Hitesh Shah <[email protected]> wrote:
>>
>>> Hi
>>>
>>> For the most part, there should be no issues as most dependencies that
>>> Tez pulls in are compatible with the hadoop version that it is compiled
>>> with ( 2.2 or higher ). The major issue to be aware of is that you
>>> should compile Tez against the same version of hadoop/mapreduce that is
>>> deployed on your cluster. The tez dependency jars contain both 3rd party
>>> deps as well as hadoop jars ( hdfs, common, yarn client-side and
>>> mapreduce client-side ) - if there is a version mismatch, this may cause
>>> a problem when the tez directory is added to the hadoop classpath.
>>>
>>> Have you seen any issues? If yes, could you provide more details?
>>>
>>> thanks
>>> -- Hitesh
>>>
>>> On Jul 7, 2014, at 3:44 PM, Bala Krishna Gangisetty <[email protected]>
>>> wrote:
>>>
>>> > I'm wondering, from an operational point of view, are there any
>>> > specifics that need special attention to make MRv2 and Tez frameworks
>>> > coexist in harmony? I heard that putting Tez jars in Hadoop class path
>>> > would impact the mapred behavior, even when Tez is not enabled (either
>>> > through mapred-site.xml, or Hive). Could someone shed more light and
>>> > share thoughts on it?
>>> >
>>> > --Bala G.
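
For reference, the versioned-directory layout described in Gopal's quoted
message can be sketched as a small shell script. This is only a sketch
under assumptions: ROOT is a hypothetical scratch prefix so the layout can
be tried without touching /opt or /etc (on a real host the paths would be
rooted at /), TEZ_VERSION is just a variable holding the 0.4.1 from the
example, and the ${HADOOP_CLASSPATH:+...} guard is an addition that is not
in the original export line. It also relies on "ln -sfn", which GNU and BSD
ln support but POSIX does not require.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch prefix; on a real host the layout lives under /.
ROOT="${ROOT:-$(mktemp -d)}"
TEZ_VERSION="0.4.1"

# Tez is unpacked into its own versioned directory.
mkdir -p "$ROOT/opt/tez/$TEZ_VERSION/conf" \
         "$ROOT/opt/tez/$TEZ_VERSION/lib" \
         "$ROOT/etc/tez"

# /opt/tez/current -> /opt/tez/<version>
# -n replaces an existing "current" link instead of descending into it.
ln -sfn "$ROOT/opt/tez/$TEZ_VERSION" "$ROOT/opt/tez/current"

# /etc/tez/conf -> /opt/tez/current/conf
ln -sfn "$ROOT/opt/tez/current/conf" "$ROOT/etc/tez/conf"

# Classpath entries as they would appear in hive-config.sh. The wildcards
# stay quoted so the JVM, not the shell, expands them.
TEZ_CP="$ROOT/opt/tez/current/*:$ROOT/opt/tez/current/lib/*:$ROOT/etc/tez/conf/:/usr/share/java/*"

# Append any existing HADOOP_CLASSPATH, without leaving a trailing colon
# when it is unset or empty (assumption: not part of the quoted line).
export HADOOP_CLASSPATH="$TEZ_CP${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

echo "current -> $(readlink "$ROOT/opt/tez/current")"
```

To move to a new Tez build under this layout, the new version would be
unpacked next to the old one and the first "ln -sfn" re-run with the new
TEZ_VERSION; per the quoted message, hive-server2 is then the only daemon
that needs a restart.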
