Hello Micah,

Thank you for your answer. There are a couple of problems with this approach in my case:

- When I use the Job definition that you have given (using Configured and Tool), my configuration still gets initialized to local.
- My jobs are generally not defined as classes with a main method; there is only one main() method, in the class which performs orchestration, and it uses Job definitions in separate classes which do not have a main method. That is, I am not able to implement Tool, since my job definitions don't have a main method.
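To make the structure concrete, here is a minimal sketch of the shape I described above (all class and path names are made up for illustration): a single orchestrator class owns the only main() method and chains job definitions that have no main() of their own. The real job classes of course build and submit Hadoop Jobs; here they just pass paths along so the sketch is self-contained.

```java
// Hypothetical names, only to illustrate the project structure described
// above: one orchestrator with the only main(), plus job definitions
// that have no main() and therefore cannot individually implement Tool.
class FirstJobDefinition {
    // In the real code this would configure a Hadoop Job and call
    // waitForCompletion(); here it just derives the output path.
    String run(String inputPath) {
        return inputPath + "/firstJobOutput";
    }
}

class SecondJobDefinition {
    String run(String inputPath) {
        return inputPath + "/secondJobOutput";
    }
}

public class Orchestrator {

    // Chains the jobs: the second reads what the first wrote.
    static String pipeline(String inputPath) {
        String firstOutput = new FirstJobDefinition().run(inputPath);
        return new SecondJobDefinition().run(firstOutput);
    }

    // The only main() method in the whole project.
    public static void main(String[] args) {
        System.out.println(pipeline(args[0]));
    }
}
```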

I do not understand why my configuration gets initialized to local; do you have any idea? I still have:

mapreduce.jobtracker.address = local
mapreduce.framework.name = local

I do get execution on the cluster when I add:

conf.addResource(new Path("file:///",
        System.getProperty("oozie.action.conf.xml")));

But then I have the problem that some jars are added to my distributed cache, which causes a problem when I try to retrieve something that I added to it (since I no longer know its position). For example, here's what's located in my distributed cache:

/user/hdfs/training/lib/KMedoidsUsingFAMES-2.0-SNAPSHOT-jar-with-dependencies.jar
/user/oozie/share/lib/lib_20160128122044/oozie/aws-java-sdk-1.7.4.jar
/user/oozie/share/lib/lib_20160128122044/oozie/azure-storage-2.2.0.jar
/user/oozie/share/lib/lib_20160128122044/oozie/commons-lang3-3.3.2.jar
/user/oozie/share/lib/lib_20160128122044/oozie/guava-11.0.2.jar
/user/oozie/share/lib/lib_20160128122044/oozie/hadoop-aws-2.7.1.2.3.4.0-3485.jar
/user/oozie/share/lib/lib_20160128122044/oozie/hadoop-azure-2.7.1.2.3.4.0-3485.jar
/user/oozie/share/lib/lib_20160128122044/oozie/jackson-annotations-2.2.3.jar
/user/oozie/share/lib/lib_20160128122044/oozie/jackson-core-2.2.3.jar
/user/oozie/share/lib/lib_20160128122044/oozie/jackson-databind-2.2.3.jar
/user/oozie/share/lib/lib_20160128122044/oozie/joda-time-2.1.jar
/user/oozie/share/lib/lib_20160128122044/oozie/json-simple-1.1.jar
/user/oozie/share/lib/lib_20160128122044/oozie/oozie-hadoop-utils-hadoop-2-4.2.0.2.3.4.0-3485.jar
/user/oozie/share/lib/lib_20160128122044/oozie/oozie-sharelib-oozie-4.2.0.2.3.4.0-3485.jar
/user/hdfs/sessions/777/11072010/initialSeed/part-r-00000


As you can see, the file that I added to the distributed cache is now last (it was first before), so this could be a problem for me.
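If position is the only thing that breaks, one workaround would be to look the entry up by its file name instead of by index. Below is a sketch of that idea, under the assumption that the cache entries are available as path strings (e.g. obtained from getCacheFiles() in real code); the helper name is made up, and plain strings are used here so the logic is self-contained:

```java
import java.util.Arrays;
import java.util.List;

public class CacheLookup {

    /** Returns the first cache path whose file name matches, or null. */
    static String findByFileName(List<String> cachePaths, String fileName) {
        for (String p : cachePaths) {
            int slash = p.lastIndexOf('/');
            String name = (slash >= 0) ? p.substring(slash + 1) : p;
            if (name.equals(fileName)) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> cache = Arrays.asList(
                "/user/oozie/share/lib/lib_20160128122044/oozie/guava-11.0.2.jar",
                "/user/hdfs/sessions/777/11072010/initialSeed/part-r-00000");
        // Look up the added file by name, regardless of where it sits.
        System.out.println(findByFileName(cache, "part-r-00000"));
    }
}
```

This keeps working no matter how many share-lib jars Oozie prepends to the cache.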

Are you aware of such behaviour, where the distributed cache gets "polluted" by jar locations that you didn't specify?

Best regards,

On 05/05/2016 05:51 PM, Micah Whitacre wrote:
Not sure how your main class is structured, but a lot of our Java actions extend Hadoop's Configured and implement Tool:

public class MyJob extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        MyJob job = new MyJob();

        ToolRunner.run(new Configuration(), job, args);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration config = getConf();
        // do stuff
        return 0;
    }
}

Inside the run method, getConf() will usually return a populated config for kicking
off jobs.  We have found on some occasions that adding the Oozie conf
helps on secured clusters when dealing with tokens etc.  So we have code that
looks like the following:

if (System.getProperty("oozie.action.conf.xml") != null) {
    conf.addResource(new Path("file:///",
            System.getProperty("oozie.action.conf.xml")));
}

conf.addResource("core-site.xml");
conf.addResource("hdfs-site.xml");
conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");
conf.addResource("hive-site.xml");

With that code we can handle running on the command line or through Oozie
without caring.  Also, we can talk to Hive without extra command-line config.


On Thu, May 5, 2016 at 9:31 AM, Marko Dinic <[email protected]> wrote:

    I should add that this is what my Configuration looks like when I
    create it using the default constructor:

    Configuration conf = new Configuration();

    mapreduce.jobtracker.address = local
    mapreduce.framework.name = local

    And here is what happens when using

    Configuration conf = new Configuration(false);
    conf.addResource(new Path("file:///",
            System.getProperty("oozie.action.conf.xml")));

    mapreduce.jobtracker.address = 192.168.84.27:8050
    mapreduce.framework.name = yarn

    Any help would be highly appreciated.


    On 05/05/2016 10:39 AM, Marko Dinic wrote:
    Hello everyone,

    I'm trying to run a sequence of MR jobs using Java action for
    their drivers in Oozie.

    The problem is that the MR jobs run locally instead of on the
    Hadoop cluster. How can I fix this?

    The first job reads from HBase, performs some processing, and puts
    the result on HDFS, while the next job should read from it. There
    are 10 mappers in the first job, but I'm only showing the last one
    as an example.

    Here is the error log from the HBase MR job:

            Aw==, start row: 9-777-1123456789113, end row:
    9-777-1123456789114, region location:
    hdp-slave1.nissatech.local:16020)
        2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task
    Executor #0]
    org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
    identifier=hconnection-0x860ce79 connecting to ZooKeeper
    ensemble=192.168.84.27:2181
        2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.zookeeper.ZooKeeper: Initiating client
    connection, connectString=192.168.84.27:2181 sessionTimeout=90000
    watcher=hconnection-0x860ce790x0, quorum=192.168.84.27:2181,
    baseZNode=/hbase-unsecure
        2016-05-04 14:33:48,378 INFO [LocalJobRunner Map Task
    Executor #0-SendThread(192.168.84.27:2181)]
    org.apache.zookeeper.ClientCnxn: Opening socket connection to
    server 192.168.84.27/192.168.84.27:2181. Will not attempt to
    authenticate using SASL (unknown error)
        2016-05-04 14:33:48,379 INFO [LocalJobRunner Map Task
    Executor #0-SendThread(192.168.84.27:2181)]
    org.apache.zookeeper.ClientCnxn: Socket connection established
    to 192.168.84.27/192.168.84.27:2181, initiating session
        2016-05-04 14:33:48,391 INFO [LocalJobRunner Map Task
    Executor #0-SendThread(192.168.84.27:2181)]
    org.apache.zookeeper.ClientCnxn: Session establishment complete
    on server 192.168.84.27/192.168.84.27:2181, sessionid =
    0x152f8f85214096b, negotiated timeout = 40000
        2016-05-04 14:33:48,394 INFO [LocalJobRunner Map Task
    Executor #0]
    org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Input
    split length: 0 bytes.
        2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi
    26214396(104857584)
        2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask:
    mapreduce.task.io.sort.mb: 100
        2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: soft limit at 83886080
        2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: bufstart = 0;
    bufvoid = 104857600
        2016-05-04 14:33:48,591 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: kvstart =
    26214396; length = 6553600
        2016-05-04 14:33:48,592 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: Map output
    collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
        2016-05-04 14:33:48,801 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.LocalJobRunner:
        2016-05-04 14:33:48,802 INFO [LocalJobRunner Map Task
    Executor #0]
    org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
    Closing zookeeper sessionid=0x152f8f85214096b
        2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task
    Executor #0-EventThread] org.apache.zookeeper.ClientCnxn:
    EventThread shut down
        2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.zookeeper.ZooKeeper: Session:
    0x152f8f85214096b closed
        2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: Starting flush of
    map output
        2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: Spilling map output
        2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: bufstart = 0;
    bufend = 5734062; bufvoid = 104857600
        2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: kvstart =
    26214396(104857584); kvend = 26210008(104840032); length =
    4389/6553600
        2016-05-04 14:33:48,874 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.MapTask: Finished spill 0
        2016-05-04 14:33:48,877 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.Task:
    Task:attempt_local1149688163_0001_m_000009_0 is done. And is in
    the process of committing
        2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.LocalJobRunner: map
        2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.Task: Task
    'attempt_local1149688163_0001_m_000009_0' done.
        2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task
    Executor #0] org.apache.hadoop.mapred.LocalJobRunner: Finishing
    task: attempt_local1149688163_0001_m_000009_0
        2016-05-04 14:33:48,897 INFO [Thread-42]
    org.apache.hadoop.mapred.LocalJobRunner: map task executor complete.
        2016-05-04 14:33:48,901 INFO [Thread-42]
    org.apache.hadoop.mapred.LocalJobRunner: Waiting for reduce tasks
        2016-05-04 14:33:48,901 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.LocalJobRunner: Starting task:
    attempt_local1149688163_0001_r_000000_0
        2016-05-04 14:33:48,918 INFO [pool-9-thread-1]
    org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File
    Output Committer Algorithm version is 1
        2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
    org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
    FileOutputCommitter skip cleanup _temporary folders under output
    directory:false, ignore cleanup failures: false
        2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.Task:  Using
    ResourceCalculatorProcessTree : [ ]
        2016-05-04 14:33:48,932 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin:
    org.apache.hadoop.mapreduce.task.reduce.Shuffle@697f13c9
        2016-05-04 14:33:48,959 INFO [pool-9-thread-1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    MergerManager: memoryLimit=289931264,
    maxSingleShuffleLimit=72482816, mergeThreshold=191354640,
    ioSortFactor=10, memToMemMergeOutputsThreshold=10
        2016-05-04 14:33:48,965 INFO [EventFetcher for fetching Map
    Completion Events]
    org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
    attempt_local1149688163_0001_r_000000_0 Thread started:
    EventFetcher for fetching Map Completion Events
        2016-05-04 14:33:49,035 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000007_0 decomp: 5381537 len:
    5381541 to MEMORY
        2016-05-04 14:33:49,056 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5381537 bytes from map-output for
    attempt_local1149688163_0001_m_000007_0
        2016-05-04 14:33:49,061 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5381537,
    inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory
    ->5381537
        2016-05-04 14:33:49,070 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000000_0 decomp: 5472201 len:
    5472205 to MEMORY
        2016-05-04 14:33:49,084 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5472201 bytes from map-output for
    attempt_local1149688163_0001_m_000000_0
        2016-05-04 14:33:49,084 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5472201,
    inMemoryMapOutputs.size() -> 2, commitMemory -> 5381537,
    usedMemory ->10853738
        2016-05-04 14:33:49,110 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000001_0 decomp: 5387977 len:
    5387981 to MEMORY
        2016-05-04 14:33:49,124 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5387977 bytes from map-output for
    attempt_local1149688163_0001_m_000001_0
        2016-05-04 14:33:49,125 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5387977,
    inMemoryMapOutputs.size() -> 3, commitMemory -> 10853738,
    usedMemory ->16241715
        2016-05-04 14:33:49,129 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000004_0 decomp: 5347914 len:
    5347918 to MEMORY
        2016-05-04 14:33:49,143 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5347914 bytes from map-output for
    attempt_local1149688163_0001_m_000004_0
        2016-05-04 14:33:49,144 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5347914,
    inMemoryMapOutputs.size() -> 4, commitMemory -> 16241715,
    usedMemory ->21589629
        2016-05-04 14:33:49,148 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000002_0 decomp: 5671398 len:
    5671402 to MEMORY
        2016-05-04 14:33:49,161 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5671398 bytes from map-output for
    attempt_local1149688163_0001_m_000002_0
        2016-05-04 14:33:49,161 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5671398,
    inMemoryMapOutputs.size() -> 5, commitMemory -> 21589629,
    usedMemory ->27261027
        2016-05-04 14:33:49,166 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000005_0 decomp: 5743249 len:
    5743253 to MEMORY
        2016-05-04 14:33:49,180 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5743249 bytes from map-output for
    attempt_local1149688163_0001_m_000005_0
        2016-05-04 14:33:49,180 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5743249,
    inMemoryMapOutputs.size() -> 6, commitMemory -> 27261027,
    usedMemory ->33004276
        2016-05-04 14:33:49,184 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000008_0 decomp: 5471488 len:
    5471492 to MEMORY
        2016-05-04 14:33:49,197 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5471488 bytes from map-output for
    attempt_local1149688163_0001_m_000008_0
        2016-05-04 14:33:49,197 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5471488,
    inMemoryMapOutputs.size() -> 7, commitMemory -> 33004276,
    usedMemory ->38475764
        2016-05-04 14:33:49,313 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000003_0 decomp: 5579502 len:
    5579506 to MEMORY
        2016-05-04 14:33:49,327 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5579502 bytes from map-output for
    attempt_local1149688163_0001_m_000003_0
        2016-05-04 14:33:49,327 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5579502,
    inMemoryMapOutputs.size() -> 8, commitMemory -> 38475764,
    usedMemory ->44055266
        2016-05-04 14:33:49,332 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000006_0 decomp: 5605456 len:
    5605460 to MEMORY
        2016-05-04 14:33:49,344 INFO [main]
    org.apache.hadoop.mapreduce.Job:  map 100% reduce 0%
        2016-05-04 14:33:49,349 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5605456 bytes from map-output for
    attempt_local1149688163_0001_m_000006_0
        2016-05-04 14:33:49,349 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5605456,
    inMemoryMapOutputs.size() -> 9, commitMemory -> 44055266,
    usedMemory ->49660722
        2016-05-04 14:33:49,354 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.LocalFetcher:
    localfetcher#1 about to shuffle output of map
    attempt_local1149688163_0001_m_000009_0 decomp: 5738455 len:
    5738459 to MEMORY
        2016-05-04 14:33:49,370 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read
    5738455 bytes from map-output for
    attempt_local1149688163_0001_m_000009_0
        2016-05-04 14:33:49,370 INFO [localfetcher#1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    closeInMemoryFile -> map-output of size: 5738455,
    inMemoryMapOutputs.size() -> 10, commitMemory -> 49660722,
    usedMemory ->55399177
        2016-05-04 14:33:49,373 INFO [EventFetcher for fetching Map
    Completion Events]
    org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
    EventFetcher is interrupted.. Returning
        2016-05-04 14:33:49,375 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
        2016-05-04 14:33:49,376 INFO [pool-9-thread-1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl:
    finalMerge called with 10 in-memory map-outputs and 0 on-disk
    map-outputs
        2016-05-04 14:33:49,388 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.Merger: Merging 10 sorted segments
        2016-05-04 14:33:49,389 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
    with 10 segments left of total size: 55398877 bytes
        2016-05-04 14:33:49,711 INFO [pool-9-thread-1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged
    10 segments, 55399177 bytes to disk to satisfy reduce memory limit
        2016-05-04 14:33:49,712 INFO [pool-9-thread-1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging
    1 files, 55399163 bytes from disk
        2016-05-04 14:33:49,713 INFO [pool-9-thread-1]
    org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging
    0 segments, 0 bytes from memory into reduce
        2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
        2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.Merger: Down to the last merge-pass,
    with 1 segments left of total size: 55399129 bytes
        2016-05-04 14:33:49,715 INFO [pool-9-thread-1]
    org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
        2016-05-04 14:33:49,742 INFO [Thread-42]
    org.apache.hadoop.mapred.LocalJobRunner: reduce task executor
    complete.
        2016-05-04 14:33:49,797 WARN [Thread-42]
    org.apache.hadoop.mapred.LocalJobRunner: job_local1149688163_0001
        java.lang.Exception: java.io.IOException: Mkdirs failed to
    create
    
file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
    (exists=false,
    
cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
            at
    
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
            at
    org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
        Caused by: java.io.IOException: Mkdirs failed to create
    
file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
    (exists=false,
    
cwd=file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002)
            at
    org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
            at
    org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
            at
    org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
            at
    org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1074)
            at
    org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
            at
    org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
            at
    
org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
            at
    
org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
            at
    
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
            at
    org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
            at
    org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
            at
    
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
            at
    java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at
    
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at
    
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
        2016-05-04 14:33:50,346 INFO [main]
    org.apache.hadoop.mapreduce.Job: Job job_local1149688163_0001
    failed with state FAILED due to: NA
        2016-05-04 14:33:50,407 INFO [main]
    org.apache.hadoop.mapreduce.Job: Counters: 38
            File System Counters
                FILE: Number of bytes read=1287449333
                FILE: Number of bytes written=1607139426
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1111590
                HDFS: Number of bytes written=220
                HDFS: Number of read operations=40
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=20
            Map-Reduce Framework
                Map input records=10906
                Map output records=10906
                Map output bytes=55355550
                Map output materialized bytes=55399217
                Input split bytes=2900
                Combine input records=0
                Combine output records=0
                Reduce input groups=0
                Reduce shuffle bytes=55399217
                Reduce input records=0
                Reduce output records=0
                Spilled Records=10906
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=641
                CPU time spent (ms)=11290
                Physical memory (bytes) snapshot=4507889664
                Virtual memory (bytes) snapshot=22225674240
                Total committed heap usage (bytes)=2925002752
            Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
            File Input Format Counters
                Bytes Read=0
            File Output Format Counters
                Bytes Written=0

    And here is the exception from the next job:

        Failing Oozie Launcher, Main class
    [org.apache.oozie.action.hadoop.JavaMain], main() threw exception,
    org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
    Input path does not exist:
    file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
    org.apache.oozie.action.hadoop.JavaMainException:
    org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
    Input path does not exist:
    file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
            at
    org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
            at
    org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
            at
    org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
    Method)
            at
    
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at
    
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at
    org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
            at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
            at
    org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
            at
    org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at
    
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
            at
    org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
        Caused by:
    org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
    Input path does not exist:
    file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
            at
    
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
            at
    
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
            at
    
org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
            at
    
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
            at
    
org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
            at
    org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
            at
    
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
            at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at
    
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
            at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
            at
    org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
            at
    
com.nissatech.kmedoidsusingfames.algorithms.initialization.RandomSeedDriver.generateRandomSeed(RandomSeedDriver.java:52)
            at
    
com.nissatech.kmedoidsusingfames.algorithms.initialization.ScalableKMeansPPInitialization.performInitialization(ScalableKMeansPPInitialization.java:43)
            at
    
com.nissatech.kmedoidsusingfames.algorithms.kmedoids.KMedoidsUsingFAMES.perform(KMedoidsUsingFAMES.java:54)
            at
    
com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmRepetitor.performIteratingForSameNoOfClusters(ClusteringAlgorithmRepetitor.java:43)
            at
    
com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmIterator.performTraining(ClusteringAlgorithmIterator.java:46)
            at
    
com.nissatech.kmedoidsusingfames.orchestration.Orchestrator.main(Orchestrator.java:74)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
    Method)
            at
    
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at
    
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at
    org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
            ... 15 more

    It seems to me that the first job runs locally, and hence there
    is no result on HDFS for the next one. Am I wrong?

    ___________________________


    I was able to make my MR job run on the HDP cluster by adding this
    to the configuration (based on the following link):

        Configuration conf = new Configuration(false);
        conf.addResource(new Path("file:///",
    System.getProperty("oozie.action.conf.xml")));

    But why do I need to do that, and how can I avoid it? I have a
    sequence of MR jobs run from this Java action, and I don't want
    to bind myself to Oozie by adding this to the config of each
    job. Is there a way to make my jobs run on the cluster from
    Oozie by default?
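    One way to at least avoid repeating this per job would be a small factory that computes the resource list once and is reused by every driver. Below is a sketch: the Hadoop-specific conf.addResource calls are replaced by a plain list so the example stands on its own, and the ordering simply mirrors the snippet from Micah's reply (class and method names are made up).

```java
import java.util.ArrayList;
import java.util.List;

public class JobConfResources {

    // Resources every job's Configuration should load, in order.
    // In real code each entry would be handed to conf.addResource(...).
    static List<String> resources(String oozieActionConfXml) {
        List<String> res = new ArrayList<>();
        // oozie.action.conf.xml is set by Oozie inside a Java action;
        // it is null when running from the command line.
        if (oozieActionConfXml != null) {
            res.add(oozieActionConfXml);
        }
        res.add("core-site.xml");
        res.add("hdfs-site.xml");
        res.add("mapred-site.xml");
        res.add("yarn-site.xml");
        return res;
    }
}
```

    Each driver would then build its Configuration from resources(System.getProperty("oozie.action.conf.xml")) instead of hard-coding the Oozie path in every job.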

    I should probably mention that this is an HDP cluster and setup
    was performed through Ambari.
-- 
    *Marko Dinić*
    /Software engineer @/
    Nissatech
    Kajmakčalanska 8
    18000 Niš, Serbia
    website <http://www.nissatech.com> | email <mailto:[email protected]>
    tel/fax: +381 18 288 111
    mobile: +381 63 82 49 556
    skype: vesto91



