Here is what I have learned from various docs.  Please correct me if I'm wrong.

The old API (mapred) and the new API (mapreduce) are compatible.  You can use 
either one.

The old mapred API can be used to communicate with either MRv1 (JobTracker) or 
MRv2 (YARN).
In both cases the client uses the old, deprecated property mapred.job.tracker.
Set it to the address of the JobTracker or ResourceManager, or to "local" if 
you want to run in local mode.
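A minimal sketch of that client-side setup, using plain properties so it stands on its own (the host and port here are placeholders, not a real address):

```java
import java.util.Properties;

// Sketch of the old-API (mapred) client configuration described above.
public class OldApiConf {
    static Properties oldApiProps(String trackerOrLocal) {
        Properties conf = new Properties();
        // Points at the JobTracker (MRv1), the ResourceManager (MRv2),
        // or "local" for in-process execution.
        conf.setProperty("mapred.job.tracker", trackerOrLocal);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(oldApiProps("jt-host:8021"));
    }
}
```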

The new mapreduce API should also be capable of communicating with either MRv1 
(JobTracker) or MRv2 (YARN).
A new property, mapreduce.jobtracker.address, is introduced in place of the 
deprecated property above.  It specifies
how to communicate with a JobTracker.
When the new API is communicating with MRv2 (YARN) on the backend, you need to 
use these properties instead:
yarn.resourcemanager.address
yarn.framework.name=local|classic|yarn

Is the last statement correct?  I haven't actually tried this out.  What does 
"classic" mean?

Lastly, when running with the old API (mapred) against an MRv2 (YARN) backend, 
I'm getting the following exception:
2014-05-11 14:48:46,571 [tomcat-http--1] ERROR 
org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException 
as:saspad (auth:SIMPLE) 
cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc 
kind RPC_WRITABLE

This leads me to believe there is some incompatibility between the client and 
server sides.  I am using CDH4.6 jars on both the client and the server.

What else am I missing?

Gaining some insight, but still a little confused.

Thanks.
-Tony

-----Original Message-----
From: Tony Dean 
Sent: Sunday, May 11, 2014 8:20 AM
To: 'Harsh J'; [email protected]
Subject: RE: mr1 and mr2

Hi Harsh,
Thanks for your reply.

The confusion comes into play between API vs. implementation.  I'm using YARN 
on the server.

I'm using the mapred JobClient on the client and the MRv2 (YARN) 
implementation on the server.  Changing the client configuration to use 
mapred.job.tracker and setting it to the YARN resource manager host:port did 
establish the correct connection this time.  When would I use 
mapreduce.jobtracker.address vs. yarn.resourcemanager.address?
Sorry for the confusion.
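To make my question concrete, this is my current mental model of which address property applies in each framework mode.  This mapping is an assumption on my part, not something I've confirmed in the docs:

```java
// Sketch of my assumed framework-mode -> address-property mapping.
public class AddressKey {
    static String addressKeyFor(String framework) {
        switch (framework) {
            case "yarn":    return "yarn.resourcemanager.address"; // MRv2 / YARN
            case "classic": return "mapreduce.jobtracker.address"; // MRv1 JobTracker
            default:        return null; // "local": no remote daemon to contact
        }
    }

    public static void main(String[] args) {
        System.out.println(addressKeyFor("yarn"));
    }
}
```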

Also, now that I'm connecting to the ResourceManager, I'm getting the following 
exception:
2014-05-11 07:43:41,315 [tomcat-http--1] ERROR 
org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException 
as:saspad (auth:SIMPLE) 
cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc 
kind RPC_WRITABLE

I have a simple security setup.  User saspad can write to the HDFS file system 
with no problem.  I do not have any service privileges enabled.  I'm sure this 
is another misconfiguration, but I'm not sure what.

I appreciate any guidance.

Thanks!

-----Original Message-----
From: Harsh J [mailto:[email protected]]
Sent: Sunday, May 11, 2014 2:35 AM
To: [email protected]; Tony Dean
Subject: Re: mr1 and mr2

The MR1 configuration is 'mapred.job.tracker', not 
'mapreduce.jobtracker.address' (this is a newer name understood only by MR in 
2.x). Without the former, if you target an MR1 runtime, the job will evaluate 
the default of 'mapred.job.tracker' as 'local' and run a local job.
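In other words (a sketch of that fallback, not taken from the docs):

```java
import java.util.Properties;

// The fallback described above: with no mapred.job.tracker set, an MR1
// client evaluates the default "local" and runs the job in-process.
public class LocalFallback {
    static boolean submitsLocally(Properties conf) {
        return conf.getProperty("mapred.job.tracker", "local").equals("local");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(submitsLocally(conf)); // nothing set: local job
        conf.setProperty("mapred.job.tracker", "jt-host:8021"); // placeholder
        System.out.println(submitsLocally(conf)); // now targets the daemon
    }
}
```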

If your confusion is after following the given page at 
http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-common/DeprecatedProperties.html,
then please see the note at the bottom of
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Release-Notes/cdh4rn_topic_2.html:
"""
In Hadoop 2.0.0 and later (MRv2), a number of Hadoop and HDFS properties have 
been deprecated. (The change dates from Hadoop 0.23.1, on which the Beta 
releases of CDH4 were based). A list of deprecated properties and their 
replacements can be found on the Apache Deprecated Properties page.
Note: All of these deprecated properties continue to work in MRv1.
Conversely the new mapreduce* properties listed do not work in MRv1.
"""

On Sun, May 11, 2014 at 5:22 AM, Tony Dean <[email protected]> wrote:
> Hi,
>
>
>
> I am trying to write a Java application that works with either MR1 or MR2.
> At the present I have MR2 (YARN) implementation deployed and running.  
> I am using mapred API.  I believe that I read mapred and mapreduce 
> APIs are compatible so either should work.  The only thing that is 
> different is the configuration properties that need to be specified 
> depending on whether the back-end is MR1 or MR2. BTW: I’m using CDH 4.6 
> (Hadoop 2.0).
>
>
>
> My problem is that I can’t seem to submit a job to the cluster.  It 
> always runs locally.  I setup JobConf with appropriate properties and 
> submit the jobs using JobClient.  The properties that I set on JobConf are as 
> follows:
>
>
>
> mapreduce.jobtracker.address=host:port (I know this is for MR1, but 
> I’m trying everything)
>
> mapreduce.framework.name=yarn
>
> yarn.resourcemanager.address=host:port
>
> yarn.resourcemanager.host=host:port
>
>
>
> The last 2 are the same but I read 2 different ways to set it in 
> different conflicting documentations.
>
>
>
> Anyway, can someone explain how to get this seemingly simple 
> deployment to work?  What am I missing?
>
>
>
> Thanks!!!



--
Harsh J
