Did you try adding the following?
conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");
If that doesn't work, then I'd guess the config on your Oozie server might
not be set up with the right RM configuration. Keep in mind that, as far as
I know, a bare Configuration only loads core-default.xml and core-site.xml
from the classpath, and mapreduce.framework.name defaults to "local", so
unless a mapred-site.xml with the YARN settings is visible to the launcher
you will end up with a local run.
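
A quick way to confirm what the client Configuration actually resolved to
before submitting is to print the relevant keys, e.g. (a minimal sketch,
assuming org.apache.hadoop.conf.Configuration is on the classpath):

Configuration conf = new Configuration();
conf.addResource("mapred-site.xml");
conf.addResource("yarn-site.xml");
// On a cluster-aware classpath these should print "yarn" and the RM
// address rather than "local":
System.out.println("mapreduce.framework.name = "
        + conf.get("mapreduce.framework.name"));
System.out.println("yarn.resourcemanager.address = "
        + conf.get("yarn.resourcemanager.address"));
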
On Fri, May 6, 2016 at 3:30 AM, Marko Dinic <[email protected]>
wrote:
> Hello Micah,
>
> Thank you for your answer. There are a couple of problems with this
> approach in my case:
>
> - When I use the Job definition that you have given (using Configured and
> Tool), my configuration still gets initialized to local.
> - My jobs are generally not defined as classes with a main() method; there
> is only one main() method, in a class which performs the orchestration and
> uses Job definitions from separate classes that have no main() method
> (sketched below). That is, I am not able to implement Tool, since my job
> definitions don't have a main() method.
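>
> For illustration, my code is structured roughly like this (simplified; the
> internals of the job-definition classes are just an example):
>
> public class Orchestrator {
>     public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         // job-definition classes, none of which has its own main():
>         new RandomSeedDriver(conf).run();
>         // ... further MR jobs, chained on the previous job's HDFS output
>     }
> }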
>
> I do not understand why my configuration is initialized to local, do you
> have any idea? So I still have:
>
> mapreduce.jobtracker.address = local
> mapreduce.framework.name = local
>
> I do get execution on the cluster when I add:
>
> conf.addResource(new Path("file:///",
> System.getProperty("oozie.action.conf.xml")));
>
> But then I have the problem that a number of extra jars are added to my
> distributed cache, which causes a problem when I try to retrieve something
> that I added to it myself (since I no longer know its position). For
> example, here's what's located in my distributed cache:
>
> /user/hdfs/training/lib/KMedoidsUsingFAMES-2.0-SNAPSHOT-jar-with-dependencies.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/aws-java-sdk-1.7.4.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/azure-storage-2.2.0.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/commons-lang3-3.3.2.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/guava-11.0.2.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/hadoop-aws-2.7.1.2.3.4.0-3485.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/hadoop-azure-2.7.1.2.3.4.0-3485.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/jackson-annotations-2.2.3.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/jackson-core-2.2.3.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/jackson-databind-2.2.3.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/joda-time-2.1.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/json-simple-1.1.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/oozie-hadoop-utils-hadoop-2-4.2.0.2.3.4.0-3485.jar
> /user/oozie/share/lib/lib_20160128122044/oozie/oozie-sharelib-oozie-4.2.0.2.3.4.0-3485.jar
> /user/hdfs/sessions/777/11072010/initialSeed/part-r-00000
>
>
> As you can see, the file that I added to the distributed cache is now
> last (it was first before), so this could be a problem for me.
>
> Are you aware of such behaviour, where the distributed cache gets
> "polluted" by jar locations that you didn't specify?
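>
> One workaround I'm considering is to look the file up by name instead of
> by position, roughly like this (a sketch, assuming the new mapreduce API;
> part-r-00000 is the file I added above):
>
> // e.g. in the mapper's setup(Context context); URI is java.net.URI
> Path seedFile = null;
> for (URI uri : context.getCacheFiles()) {
>     if (uri.getPath().endsWith("initialSeed/part-r-00000")) {
>         seedFile = new Path(uri.getPath());
>     }
> }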
>
> Best regards,
>
>
> On 05/05/2016 05:51 PM, Micah Whitacre wrote:
>
> Not sure how your main class is structured, but a lot of our Java actions
> extend Configured and implement the Hadoop Tool interface:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
>
> public class MyJob extends Configured implements Tool {
>
>     public static void main(String[] args) throws Exception {
>         MyJob job = new MyJob();
>         ToolRunner.run(new Configuration(), job, args);
>     }
>
>     @Override
>     public int run(String[] args) throws Exception {
>         Configuration config = getConf();
>
>         // do stuff
>
>         return 0;
>     }
> }
>
> Inside the run method the config will usually already be populated for
> kicking off jobs. We have found on some occasions that adding the Oozie
> conf helps in secured clusters when dealing with tokens etc., so we have
> code that looks like the following:
>
> if (System.getProperty("oozie.action.conf.xml") != null) {
>     conf.addResource(new Path("file:///",
>             System.getProperty("oozie.action.conf.xml")));
> }
>
> conf.addResource("core-site.xml");
> conf.addResource("hdfs-site.xml");
> conf.addResource("mapred-site.xml");
> conf.addResource("yarn-site.xml");
> conf.addResource("hive-site.xml");
>
> With that code we can handle running on the command line or through Oozie
> without caring which it is. It also lets us talk to Hive without extra
> command-line config.
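>
> The same pattern works if only your entry point implements Tool; the
> per-job classes can simply take the already-populated Configuration, e.g.
> (a sketch, where FirstJob and SecondJob are illustrative names):
>
> public class MyWorkflow extends Configured implements Tool {
>     public static void main(String[] args) throws Exception {
>         System.exit(ToolRunner.run(new Configuration(), new MyWorkflow(), args));
>     }
>
>     @Override
>     public int run(String[] args) throws Exception {
>         Configuration conf = getConf();  // already has the resources added above
>         new FirstJob(conf).submit();     // job-definition classes that never
>         new SecondJob(conf).submit();    // need their own main() method
>         return 0;
>     }
> }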
>
>
>
> On Thu, May 5, 2016 at 9:31 AM, Marko Dinic <[email protected]>
> wrote:
>
>> I should add that this is what my Configuration looks like when I create
>> it using the default constructor:
>>
>> Configuration conf = new Configuration();
>>
>> mapreduce.jobtracker.address = local
>> mapreduce.framework.name = local
>>
>> And here is what happens when using
>>
>> Configuration conf = new Configuration(false);
>> conf.addResource(new Path("file:///",
>> System.getProperty("oozie.action.conf.xml")));
>>
>> mapreduce.jobtracker.address = 192.168.84.27:8050
>> mapreduce.framework.name = yarn
>>
>> Any help would be highly appreciated.
>>
>>
>> On 05/05/2016 10:39 AM, Marko Dinic wrote:
>>
>> Hello everyone,
>>
>> I'm trying to run a sequence of MR jobs, using an Oozie Java action for
>> their drivers.
>>
>> The problem is that the MR jobs are run locally instead of on the Hadoop
>> cluster. How can I fix this?
>>
>> The first job reads from HBase, performs some processing and puts the
>> result on HDFS, while the next job should read from it. There are 10
>> mappers in the first job, but I'm only showing the last one as an example.
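>>
>> For reference, the first job is set up roughly like this (simplified; the
>> table and mapper names are placeholders, the output path is the real one):
>>
>> Job job = Job.getInstance(conf, "read-from-hbase");
>> TableMapReduceUtil.initTableMapperJob(
>>         "myTable", new Scan(), MyTableMapper.class,
>>         Text.class, MyWritable.class, job);
>> job.setOutputFormatClass(SequenceFileOutputFormat.class);
>> SequenceFileOutputFormat.setOutputPath(job,
>>         new Path("/user/hdfs/sessions/777/23115/inputRecordsAsWritables"));
>> job.waitForCompletion(true);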
>>
>> Here is the error log from the HBase MR job:
>>
>> Aw==, start row: 9-777-1123456789113, end row:
>> 9-777-1123456789114, region location: hdp-slave1.nissatech.local:16020)
>> 2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
>> identifier=hconnection-0x860ce79 connecting to ZooKeeper ensemble=
>> 192.168.84.27:2181
>> 2016-05-04 14:33:48,373 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=
>> 192.168.84.27:2181 sessionTimeout=90000
>> watcher=hconnection-0x860ce790x0, quorum=192.168.84.27:2181,
>> baseZNode=/hbase-unsecure
>> 2016-05-04 14:33:48,378 INFO [LocalJobRunner Map Task Executor
>> #0-SendThread(192.168.84.27:2181)] org.apache.zookeeper.ClientCnxn:
>> Opening socket connection to server 192.168.84.27/192.168.84.27:2181.
>> Will not attempt to authenticate using SASL (unknown error)
>> 2016-05-04 14:33:48,379 INFO [LocalJobRunner Map Task Executor
>> #0-SendThread(192.168.84.27:2181)] org.apache.zookeeper.ClientCnxn:
>> Socket connection established to 192.168.84.27/192.168.84.27:2181,
>> initiating session
>> 2016-05-04 14:33:48,391 INFO [LocalJobRunner Map Task Executor
>> #0-SendThread(192.168.84.27:2181)] org.apache.zookeeper.ClientCnxn:
>> Session establishment complete on server 192.168.84.27/192.168.84.27:2181,
>> sessionid = 0x152f8f85214096b, negotiated timeout = 40000
>> 2016-05-04 14:33:48,394 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Input split length:
>> 0 bytes.
>> 2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
>> 2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
>> 2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: soft limit at 83886080
>> 2016-05-04 14:33:48,590 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
>> 2016-05-04 14:33:48,591 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
>> 2016-05-04 14:33:48,592 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: Map output collector class =
>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer
>> 2016-05-04 14:33:48,801 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.LocalJobRunner:
>> 2016-05-04 14:33:48,802 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
>> Closing zookeeper sessionid=0x152f8f85214096b
>> 2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task Executor
>> #0-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
>> 2016-05-04 14:33:48,828 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.zookeeper.ZooKeeper: Session: 0x152f8f85214096b closed
>> 2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: Starting flush of map output
>> 2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: Spilling map output
>> 2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 5734062; bufvoid =
>> 104857600
>> 2016-05-04 14:33:48,839 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend =
>> 26210008(104840032); length = 4389/6553600
>> 2016-05-04 14:33:48,874 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.MapTask: Finished spill 0
>> 2016-05-04 14:33:48,877 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.Task: Task:attempt_local1149688163_0001_m_000009_0
>> is done. And is in the process of committing
>> 2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.LocalJobRunner: map
>> 2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.Task: Task
>> 'attempt_local1149688163_0001_m_000009_0' done.
>> 2016-05-04 14:33:48,897 INFO [LocalJobRunner Map Task Executor #0]
>> org.apache.hadoop.mapred.LocalJobRunner: Finishing task:
>> attempt_local1149688163_0001_m_000009_0
>> 2016-05-04 14:33:48,897 INFO [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: map task executor complete.
>> 2016-05-04 14:33:48,901 INFO [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: Waiting for reduce tasks
>> 2016-05-04 14:33:48,901 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.LocalJobRunner: Starting task:
>> attempt_local1149688163_0001_r_000000_0
>> 2016-05-04 14:33:48,918 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output
>> Committer Algorithm version is 1
>> 2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter:
>> FileOutputCommitter skip cleanup _temporary folders under output
>> directory:false, ignore cleanup failures: false
>> 2016-05-04 14:33:48,919 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
>> 2016-05-04 14:33:48,932 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin:
>> org.apache.hadoop.mapreduce.task.reduce.Shuffle@697f13c9
>> 2016-05-04 14:33:48,959 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager:
>> memoryLimit=289931264, maxSingleShuffleLimit=72482816,
>> mergeThreshold=191354640, ioSortFactor=10, memToMemMergeOutputsThreshold=10
>> 2016-05-04 14:33:48,965 INFO [EventFetcher for fetching Map
>> Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
>> attempt_local1149688163_0001_r_000000_0 Thread started: EventFetcher for
>> fetching Map Completion Events
>> 2016-05-04 14:33:49,035 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000007_0 decomp:
>> 5381537 len: 5381541 to MEMORY
>> 2016-05-04 14:33:49,056 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5381537
>> bytes from map-output for attempt_local1149688163_0001_m_000007_0
>> 2016-05-04 14:33:49,061 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5381537, inMemoryMapOutputs.size() -> 1,
>> commitMemory -> 0, usedMemory ->5381537
>> 2016-05-04 14:33:49,070 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000000_0 decomp:
>> 5472201 len: 5472205 to MEMORY
>> 2016-05-04 14:33:49,084 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5472201
>> bytes from map-output for attempt_local1149688163_0001_m_000000_0
>> 2016-05-04 14:33:49,084 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5472201, inMemoryMapOutputs.size() -> 2,
>> commitMemory -> 5381537, usedMemory ->10853738
>> 2016-05-04 14:33:49,110 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000001_0 decomp:
>> 5387977 len: 5387981 to MEMORY
>> 2016-05-04 14:33:49,124 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5387977
>> bytes from map-output for attempt_local1149688163_0001_m_000001_0
>> 2016-05-04 14:33:49,125 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5387977, inMemoryMapOutputs.size() -> 3,
>> commitMemory -> 10853738, usedMemory ->16241715
>> 2016-05-04 14:33:49,129 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000004_0 decomp:
>> 5347914 len: 5347918 to MEMORY
>> 2016-05-04 14:33:49,143 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5347914
>> bytes from map-output for attempt_local1149688163_0001_m_000004_0
>> 2016-05-04 14:33:49,144 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5347914, inMemoryMapOutputs.size() -> 4,
>> commitMemory -> 16241715, usedMemory ->21589629
>> 2016-05-04 14:33:49,148 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000002_0 decomp:
>> 5671398 len: 5671402 to MEMORY
>> 2016-05-04 14:33:49,161 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5671398
>> bytes from map-output for attempt_local1149688163_0001_m_000002_0
>> 2016-05-04 14:33:49,161 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5671398, inMemoryMapOutputs.size() -> 5,
>> commitMemory -> 21589629, usedMemory ->27261027
>> 2016-05-04 14:33:49,166 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000005_0 decomp:
>> 5743249 len: 5743253 to MEMORY
>> 2016-05-04 14:33:49,180 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5743249
>> bytes from map-output for attempt_local1149688163_0001_m_000005_0
>> 2016-05-04 14:33:49,180 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5743249, inMemoryMapOutputs.size() -> 6,
>> commitMemory -> 27261027, usedMemory ->33004276
>> 2016-05-04 14:33:49,184 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000008_0 decomp:
>> 5471488 len: 5471492 to MEMORY
>> 2016-05-04 14:33:49,197 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5471488
>> bytes from map-output for attempt_local1149688163_0001_m_000008_0
>> 2016-05-04 14:33:49,197 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5471488, inMemoryMapOutputs.size() -> 7,
>> commitMemory -> 33004276, usedMemory ->38475764
>> 2016-05-04 14:33:49,313 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000003_0 decomp:
>> 5579502 len: 5579506 to MEMORY
>> 2016-05-04 14:33:49,327 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5579502
>> bytes from map-output for attempt_local1149688163_0001_m_000003_0
>> 2016-05-04 14:33:49,327 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5579502, inMemoryMapOutputs.size() -> 8,
>> commitMemory -> 38475764, usedMemory ->44055266
>> 2016-05-04 14:33:49,332 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000006_0 decomp:
>> 5605456 len: 5605460 to MEMORY
>> 2016-05-04 14:33:49,344 INFO [main] org.apache.hadoop.mapreduce.Job:
>> map 100% reduce 0%
>> 2016-05-04 14:33:49,349 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5605456
>> bytes from map-output for attempt_local1149688163_0001_m_000006_0
>> 2016-05-04 14:33:49,349 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5605456, inMemoryMapOutputs.size() -> 9,
>> commitMemory -> 44055266, usedMemory ->49660722
>> 2016-05-04 14:33:49,354 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: localfetcher#1 about
>> to shuffle output of map attempt_local1149688163_0001_m_000009_0 decomp:
>> 5738455 len: 5738459 to MEMORY
>> 2016-05-04 14:33:49,370 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 5738455
>> bytes from map-output for attempt_local1149688163_0001_m_000009_0
>> 2016-05-04 14:33:49,370 INFO [localfetcher#1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile
>> -> map-output of size: 5738455, inMemoryMapOutputs.size() -> 10,
>> commitMemory -> 49660722, usedMemory ->55399177
>> 2016-05-04 14:33:49,373 INFO [EventFetcher for fetching Map
>> Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
>> EventFetcher is interrupted.. Returning
>> 2016-05-04 14:33:49,375 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>> 2016-05-04 14:33:49,376 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called
>> with 10 in-memory map-outputs and 0 on-disk map-outputs
>> 2016-05-04 14:33:49,388 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Merging 10 sorted segments
>> 2016-05-04 14:33:49,389 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10
>> segments left of total size: 55398877 bytes
>> 2016-05-04 14:33:49,711 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 10
>> segments, 55399177 bytes to disk to satisfy reduce memory limit
>> 2016-05-04 14:33:49,712 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 1 files,
>> 55399163 bytes from disk
>> 2016-05-04 14:33:49,713 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0
>> segments, 0 bytes from memory into reduce
>> 2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
>> 2016-05-04 14:33:49,714 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1
>> segments left of total size: 55399129 bytes
>> 2016-05-04 14:33:49,715 INFO [pool-9-thread-1]
>> org.apache.hadoop.mapred.LocalJobRunner: 10 / 10 copied.
>> 2016-05-04 14:33:49,742 INFO [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: reduce task executor complete.
>> 2016-05-04 14:33:49,797 WARN [Thread-42]
>> org.apache.hadoop.mapred.LocalJobRunner: job_local1149688163_0001
>> java.lang.Exception: java.io.IOException: Mkdirs failed to create
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
>> (exists=false, cwd=
>> file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002
>> )
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
>> Caused by: java.io.IOException: Mkdirs failed to create
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables/_temporary/0/_temporary/attempt_local1149688163_0001_r_000000_0
>> (exists=false, cwd=
>> file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1461858162941_0054/container_e12_1461858162941_0054_01_000002
>> )
>> at
>> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:449)
>> at
>> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
>> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
>> at
>> org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1074)
>> at
>> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
>> at
>> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
>> at
>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
>> at
>> org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
>> at
>> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
>> at
>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>> at
>> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2016-05-04 14:33:50,346 INFO [main] org.apache.hadoop.mapreduce.Job:
>> Job job_local1149688163_0001 failed with state FAILED due to: NA
>> 2016-05-04 14:33:50,407 INFO [main] org.apache.hadoop.mapreduce.Job:
>> Counters: 38
>> File System Counters
>> FILE: Number of bytes read=1287449333
>> FILE: Number of bytes written=1607139426
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=1111590
>> HDFS: Number of bytes written=220
>> HDFS: Number of read operations=40
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=20
>> Map-Reduce Framework
>> Map input records=10906
>> Map output records=10906
>> Map output bytes=55355550
>> Map output materialized bytes=55399217
>> Input split bytes=2900
>> Combine input records=0
>> Combine output records=0
>> Reduce input groups=0
>> Reduce shuffle bytes=55399217
>> Reduce input records=0
>> Reduce output records=0
>> Spilled Records=10906
>> Shuffled Maps =10
>> Failed Shuffles=0
>> Merged Map outputs=10
>> GC time elapsed (ms)=641
>> CPU time spent (ms)=11290
>> Physical memory (bytes) snapshot=4507889664
>> Virtual memory (bytes) snapshot=22225674240
>> Total committed heap usage (bytes)=2925002752
>> Shuffle Errors
>> BAD_ID=0
>> CONNECTION=0
>> IO_ERROR=0
>> WRONG_LENGTH=0
>> WRONG_MAP=0
>> WRONG_REDUCE=0
>> File Input Format Counters
>> Bytes Read=0
>> File Output Format Counters
>> Bytes Written=0
>>
>> And here is the exception from the next job:
>>
>> Failing Oozie Launcher, Main class
>> [org.apache.oozie.action.hadoop.JavaMain], main() threw exception,
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>> does not exist:
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>> org.apache.oozie.action.hadoop.JavaMainException:
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>> does not exist:
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>> at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
>> at
>> org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
>> at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at
>> org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>> Caused by:
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
>> does not exist:
>> file:/user/hdfs/sessions/777/23115/inputRecordsAsWritables
>> at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>> at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
>> at
>> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:59)
>> at
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>> at
>> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>> at
>> com.nissatech.kmedoidsusingfames.algorithms.initialization.RandomSeedDriver.generateRandomSeed(RandomSeedDriver.java:52)
>> at
>> com.nissatech.kmedoidsusingfames.algorithms.initialization.ScalableKMeansPPInitialization.performInitialization(ScalableKMeansPPInitialization.java:43)
>> at
>> com.nissatech.kmedoidsusingfames.algorithms.kmedoids.KMedoidsUsingFAMES.perform(KMedoidsUsingFAMES.java:54)
>> at
>> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmRepetitor.performIteratingForSameNoOfClusters(ClusteringAlgorithmRepetitor.java:43)
>> at
>> com.nissatech.kmedoidsusingfames.algorithms.ClusteringAlgorithmIterator.performTraining(ClusteringAlgorithmIterator.java:46)
>> at
>> com.nissatech.kmedoidsusingfames.orchestration.Orchestrator.main(Orchestrator.java:74)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
>> ... 15 more
>>
>> It seems to me that the first job is run locally (the log above shows
>> LocalJobRunner and file:/ output paths), and hence there is no result on
>> HDFS for the next one. Am I wrong?
>>
>> ___________________________
>>
>>
>> I was able to make my MR jobs run on the HDP cluster by adding this to
>> the configuration (based on the following link):
>>
>> Configuration conf = new Configuration(false);
>> conf.addResource(new Path("file:///",
>> System.getProperty("oozie.action.conf.xml")));
>>
>> But why do I need to do that, and how can I avoid it? I have a sequence
>> of MR jobs run from this Java action, and I don't want to bind myself to
>> using Oozie by adding this to the config of each job. Is there a way to
>> make my jobs run on the cluster from Oozie by default?
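>>
>> The only workaround I can think of right now is to centralize it in one
>> place so each job doesn't have to repeat it, something like this (a
>> sketch; the class name is illustrative):
>>
>> public final class ConfFactory {
>>     public static Configuration create() {
>>         Configuration conf = new Configuration();
>>         String oozieConf = System.getProperty("oozie.action.conf.xml");
>>         if (oozieConf != null) {
>>             // running under an Oozie launcher: pick up the action config
>>             conf.addResource(new Path("file:///", oozieConf));
>>         }
>>         return conf;
>>     }
>> }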
>>
>> I should probably mention that this is an HDP cluster and the setup was
>> performed through Ambari.