Here is what I learned from various docs; please correct me if I'm wrong. The old API (mapred) and the new API (mapreduce) are compatible, so you can use either one.
The old mapred API can be used to communicate with either MRv1 (JobTracker) or MRv2 (YARN). In both cases the client uses the old, deprecated property:

mapred.job.tracker

Set it to the address of the JobTracker or ResourceManager, or to "local" if you want to run in local mode.

The new mapreduce API should also be capable of communicating with either MRv1 (JobTracker) or MRv2 (YARN). A new property, mapreduce.jobtracker.address, is introduced in place of the deprecated property above; it specifies how to communicate with a JobTracker. When the new API is communicating with MRv2 (YARN) on the backend, you need to use these properties instead:

yarn.resourcemanager.address
yarn.framework.name=local|classic|yarn

Is the last statement correct? I haven't actually tried this out. What does "classic" mean?

Lastly, when I run the old API (mapred) against an MRv2 (YARN) backend, I get the following exception:

2014-05-11 14:48:46,571 [tomcat-http--1] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:saspad (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE

This leads me to believe there is some incompatibility between the client and server sides. I am using CDH4.6 jars on both the client and the server. What else am I missing?

Gaining some insight, but still a little confused. Thanks.

-Tony

-----Original Message-----
From: Tony Dean
Sent: Sunday, May 11, 2014 8:20 AM
To: 'Harsh J'; [email protected]
Subject: RE: mr1 and mr2

Hi Harsh,

Thanks for your reply. The confusion comes into play between API vs. implementation. I'm using the mapred JobClient API on the client and the MRv2 (YARN) implementation on the server. Changing the client configuration to use mapred.job.tracker, and setting it to the YARN ResourceManager host:port, did make the correct connection this time.

When would I use mapreduce.jobtracker.address vs. yarn.resourcemanager.address? Sorry for the confusion.
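To make the distinction being asked about concrete, here is a client-side configuration sketch. All host:port values are placeholders, and the property names are assumed from the Hadoop 2.x / CDH4 documentation: mapred.job.tracker selects an MRv1 JobTracker (or "local"), while mapreduce.framework.name (valid values: local, classic, yarn, where "classic" means the MRv1 JobTracker runtime) plus yarn.resourcemanager.address target MRv2 (YARN).

```xml
<!-- Sketch only: host:port values are placeholders. -->

<!-- Old-style client config targeting MRv1: point at the JobTracker,
     or set the value to "local" for local mode. -->
<property>
  <name>mapred.job.tracker</name>
  <value>jthost:8021</value>
</property>

<!-- New-style client config targeting MRv2 (YARN): select the YARN
     framework and point at the ResourceManager. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rmhost:8032</value>
</property>
```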
Also, now that I'm connecting to the ResourceManager, I'm getting the following exception:

2014-05-11 07:43:41,315 [tomcat-http--1] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:saspad (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc kind RPC_WRITABLE

I have a simple security setup. User saspad can write to the HDFS file system with no problem, and I do not have any service privileges enabled. I'm sure this is another misconfiguration, but I'm not sure which. I appreciate any guidance. Thanks!

-----Original Message-----
From: Harsh J [mailto:[email protected]]
Sent: Sunday, May 11, 2014 2:35 AM
To: [email protected]; Tony Dean
Subject: Re: mr1 and mr2

The MR1 configuration is 'mapred.job.tracker', not 'mapreduce.jobtracker.address' (the latter is a newer name understood only by MR in 2.x). Without the former, if you target an MR1 runtime, the job will evaluate the default of 'mapred.job.tracker' as 'local' and run a local job.

If your confusion arose after reading http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html, then please see the note at the bottom of http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Release-Notes/cdh4rn_topic_2.html:

"""
In Hadoop 2.0.0 and later (MRv2), a number of Hadoop and HDFS properties have been deprecated. (The change dates from Hadoop 0.23.1, on which the Beta releases of CDH4 were based.) A list of deprecated properties and their replacements can be found on the Apache Deprecated Properties page. Note: All of these deprecated properties continue to work in MRv1. Conversely, the new mapreduce* properties listed do not work in MRv1.
"""

On Sun, May 11, 2014 at 5:22 AM, Tony Dean <[email protected]> wrote:
> Hi,
>
> I am trying to write a Java application that works with either MR1 or MR2.
> At present I have the MR2 (YARN) implementation deployed and running.
> I am using the mapred API. I believe I read that the mapred and mapreduce
> APIs are compatible, so either should work. The only thing that differs
> is the configuration properties that need to be specified depending on
> whether the back-end is MR1 or MR2. BTW: I'm using CDH 4.6 (Hadoop 2.0).
>
> My problem is that I can't seem to submit a job to the cluster; it
> always runs locally. I set up a JobConf with the appropriate properties
> and submit the jobs using JobClient. The properties that I set on the
> JobConf are as follows:
>
> mapreduce.jobtracker.address=host:port (I know this is for MR1, but
> I'm trying everything)
> mapreduce.framework.name=yarn
> yarn.resourcemanager.address=host:port
> yarn.resourcemanager.host=host:port
>
> The last two are meant to be the same setting, but I found them written
> two different ways in conflicting documentation.
>
> Anyway, can someone explain how to get this seemingly simple
> deployment to work? What am I missing?
>
> Thanks!!!

--
Harsh J
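For reference, a sketch of the minimal client-side settings that should direct the submission above to YARN rather than the local runner. This is an assumption based on the Hadoop 2.x documented defaults, not something verified against this exact cluster: without mapreduce.framework.name=yarn the client falls back to local execution, and 8032 is only the conventional ResourceManager IPC port, so check it against your yarn-site.xml.

```xml
<!-- Sketch: rmhost:8032 is a placeholder; verify the RM address
     against the cluster's yarn-site.xml. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rmhost:8032</value>
</property>
```

Note that yarn.resourcemanager.host (without the port in a separate setting) does not appear in the stock property list, which would explain the conflicting documentation; yarn.resourcemanager.address is the name in yarn-default.xml.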
