> Can you check whether hdfs related config was passed to Job correctly?
Ahhh, that was it! It wasn't picking up the .xml files. Fixed that and it seems to be working now. Thank you for your help!!!

Sean

From: Ted Yu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, February 6, 2013 2:25 PM
To: "[email protected]" <[email protected]>
Subject: Re: TaskStatus Exception using HFileOutputFormat

Thanks for this information.

Here is related code:

    public static void configureIncrementalLoad(Job job, HTable table)
        throws IOException {
      Configuration conf = job.getConfiguration();
      ...
      Path partitionsPath = new Path(job.getWorkingDirectory(),
                                     "partitions_" + UUID.randomUUID());
      LOG.info("Writing partition information to " + partitionsPath);

      FileSystem fs = partitionsPath.getFileSystem(conf);
      writePartitions(conf, partitionsPath, startKeys);
      partitionsPath.makeQualified(fs);

Can you check whether hdfs related config was passed to Job correctly?

Thanks

On Wed, Feb 6, 2013 at 1:15 PM, Sean McNamara <[email protected]> wrote:

Ok, a bit more info: from what I can tell, the partitions file is being placed into the working dir on the node I launch from, and the task trackers are trying to look for that file, which doesn't exist where they run (since they are on other nodes).

Here is the exception on the TT in case it is helpful:

2013-02-06 17:05:13,002 WARN org.apache.hadoop.mapred.TaskTracker: Exception while localization
java.io.FileNotFoundException: File /opt/jobs/MyMapreduceJob/partitions_1360170306728 does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
	at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:179)
	at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1212)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
	at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
	at java.lang.Thread.run(Thread.java:662)

From: Sean McNamara <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, February 6, 2013 9:35 AM
To: "[email protected]" <[email protected]>
Subject: Re: TaskStatus Exception using HFileOutputFormat

> Using the below construct, do you still get the exception?

Correct, I am still getting this exception.

Sean

From: Ted Yu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, February 5, 2013 7:50 PM
To: "[email protected]" <[email protected]>
Subject: Re: TaskStatus Exception using HFileOutputFormat

Using the below construct, do you still get the exception?

Please consider upgrading to Hadoop 1.0.4.

Thanks

On Tue, Feb 5, 2013 at 4:55 PM, Sean McNamara <[email protected]> wrote:

> Can you tell us the HBase and hadoop versions you were using?
Ahh yes, sorry I left that out:

Hadoop: 1.0.3
HBase: 0.92.0

> I guess you have used the above construct

Our code is as follows:

    HTable table = new HTable(conf, configHBaseTable);
    FileOutputFormat.setOutputPath(job, outputDir);
    HFileOutputFormat.configureIncrementalLoad(job, table);

Thanks!

From: Ted Yu <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, February 5, 2013 5:46 PM
To: "[email protected]" <[email protected]>
Subject: Re: TaskStatus Exception using HFileOutputFormat

Can you tell us the HBase and hadoop versions you were using?

From TestHFileOutputFormat:

    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, outDir);

I guess you have used the above construct?

Cheers

On Tue, Feb 5, 2013 at 4:31 PM, Sean McNamara <[email protected]> wrote:

We're trying to use HFileOutputFormat for bulk HBase loading. When using HFileOutputFormat's setOutputPath or configureIncrementalLoad, the job is unable to run. The error I see in the jobtracker logs is:

Trying to set finish time for task attempt_201301030046_123198_m_000002_0 when no start time is set, stackTrace is : java.lang.Exception

If I remove any references to HFileOutputFormat and use FileOutputFormat.setOutputPath, things seem to run great.

Does anyone know what could be causing the TaskStatus error when using HFileOutputFormat?
Thanks,
Sean

What I see on the Job Tracker:

2013-02-06 00:17:33,685 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time for task attempt_201301030046_123198_m_000002_0 when no start time is set, stackTrace is : java.lang.Exception
	at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:145)
	at org.apache.hadoop.mapred.TaskInProgress.incompleteSubTask(TaskInProgress.java:670)
	at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:2945)
	at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1162)
	at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4739)
	at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3683)
	at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3378)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

What I see from the console:

391 [main] INFO org.apache.hadoop.hbase.mapreduce.HFileOutputFormat - Looking up current regions for table org.apache.hadoop.hbase.client.HTable@3a083b1b
1284 [main] INFO org.apache.hadoop.hbase.mapreduce.HFileOutputFormat - Configuring 41 reduce partitions to match current region count
1285 [main] INFO org.apache.hadoop.hbase.mapreduce.HFileOutputFormat - Writing partition information to file:/opt/webtrends/oozie/jobs/Lab/O/VisitorAnalytics.MapReduce/bin/partitions_1360109875112
1319 [main] INFO org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
1328 [main] INFO org.apache.hadoop.io.compress.zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
1329 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor
1588 [main] INFO org.apache.hadoop.hbase.mapreduce.HFileOutputFormat - Incremental table output configured.
2896 [main] INFO org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table instance for Lab_O_VisitorHistory
2910 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
Job Name: job_201301030046_123199
Job Id: http://strack01.staging.dmz:50030/jobdetails.jsp?jobid=job_201301030046_123199
Job URL: VisitorHistory MapReduce (soozie01.Lab.O)
3141 [main] INFO org.apache.hadoop.mapred.JobClient - Running job: job_201301030046_123199
4145 [main] INFO org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
10162 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_m_000002_0, Status : FAILED
10196 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_0&filter=stdout
10199 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_0&filter=stderr
10199 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_r_000042_0, Status : FAILED
10203 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_0&filter=stdout
10205 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_0&filter=stderr
10206 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_m_000002_1, Status : FAILED
10210 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_1&filter=stdout
10213 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_1&filter=stderr
10213 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_r_000042_1, Status : FAILED
10217 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_1&filter=stdout
10219 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_1&filter=stderr
10220 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_m_000002_2, Status : FAILED
10224 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_2&filter=stdout
10226 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_2&filter=stderr
10227 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_r_000042_2, Status : FAILED
10236 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_2&filter=stdout
10239 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_2&filter=stderr
10239 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_m_000001_0, Status : FAILED
10244 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_0&filter=stdout
10247 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_0&filter=stderr
10247 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_r_000041_0, Status : FAILED
10250 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_0&filter=stdout
10252 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_0&filter=stderr
11255 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_m_000001_1, Status : FAILED
11259 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_1&filter=stdout
11262 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_1&filter=stderr
11262 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_r_000041_1, Status : FAILED
11265 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_1&filter=stdout
11267 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_1&filter=stderr
11267 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_m_000001_2, Status : FAILED
11271 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_2&filter=stdout
11273 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_2&filter=stderr
11274 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201301030046_123199_r_000041_2, Status : FAILED
11277 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_2&filter=stdout
11279 [main] WARN org.apache.hadoop.mapred.JobClient - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_2&filter=stderr
11280 [main] INFO org.apache.hadoop.mapred.JobClient - Job complete: job_201301030046_123199
11291 [main] INFO org.apache.hadoop.mapred.JobClient - Counters: 4
11292 [main] INFO org.apache.hadoop.mapred.JobClient - Job Counters
11292 [main] INFO org.apache.hadoop.mapred.JobClient - SLOTS_MILLIS_MAPS=0
11292 [main] INFO org.apache.hadoop.mapred.JobClient - Total time spent by all reduces waiting after reserving slots (ms)=0
11292 [main] INFO org.apache.hadoop.mapred.JobClient - Total time spent by all maps waiting after reserving slots (ms)=0
11293 [main] INFO org.apache.hadoop.mapred.JobClient - SLOTS_MILLIS_REDUCES=0
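A hedged note on the root cause. Ted's excerpt builds the partitions path from `job.getWorkingDirectory()`, and the console output shows the file being written to a `file:/opt/webtrends/...` path. In Hadoop 1.x, a path without a scheme takes its scheme and authority from the default filesystem (`fs.default.name`). That setting defaults to `file:///` in `core-default.xml` unless the cluster's `core-site.xml` is found and overrides it. Presumably that is why the file ended up on the launching node's local disk, where the TaskTrackers on other nodes could not find it.

The stdlib-only Java below is a minimal sketch that only mimics this qualification step using `java.net.URI`. It is not Hadoop's `Path.makeQualified`. The class `PathQualifier`, its `qualify` method, and the `hdfs://namenode:8020` address are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; this is NOT Hadoop's
// Path.makeQualified. It mimics how a path without a scheme picks up the
// scheme and authority of the default filesystem (fs.default.name).
final class PathQualifier {

    static URI qualify(String defaultFs, String workingDir, String path) {
        URI given = URI.create(path);
        if (given.getScheme() != null) {
            return given; // already carries a scheme, e.g. hdfs://... or file:/...
        }
        // Relative names are resolved against the working directory.
        String dir = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        String absolutePath = path.startsWith("/")
                ? path
                : URI.create(dir).resolve(given).getPath();
        URI fs = URI.create(defaultFs);
        try {
            // Scheme and authority come from the default FS, the path from above.
            return new URI(fs.getScheme(), fs.getAuthority(), absolutePath, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify " + path, e);
        }
    }

    public static void main(String[] args) {
        // Hadoop 1.x default for fs.default.name (from core-default.xml)
        System.out.println(qualify("file:///", "/opt/jobs/MyMapreduceJob", "partitions_123"));
        // prints file:/opt/jobs/MyMapreduceJob/partitions_123

        // Hypothetical cluster address, as if core-site.xml had been picked up
        System.out.println(qualify("hdfs://namenode:8020", "/user/sean", "partitions_123"));
        // prints hdfs://namenode:8020/user/sean/partitions_123
    }
}
```

With the local default, the same relative name resolves to a path that exists only on the submitting node. With an HDFS default, it lands on the shared filesystem that every TaskTracker can read.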
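Some hedged context on what the partitions file is for. The console output shows HFileOutputFormat configuring 41 reduce partitions to match the current region count. Ted's excerpt passes the table's region `startKeys` to `writePartitions`, and the TaskTracker stack trace shows the file being localized through the distributed cache. My understanding is that the job then partitions map output in total order, so that each reducer receives the row keys of exactly one region.

The stdlib-only Java below is a minimal sketch of that mapping under this assumption. It binary-searches for the last region start key that is <= the row key, using unsigned lexicographic byte order (the order HBase uses for row keys). It is not the `TotalOrderPartitioner` implementation. `RegionPartitionSketch`, `partitionFor`, `compare`, and `bytes` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch for illustration only; this is NOT the
// TotalOrderPartitioner implementation. It shows how a row key can be mapped
// to one reducer per region when the split points are the region start keys.
final class RegionPartitionSketch {

    /** Unsigned lexicographic comparison; a proper prefix sorts first. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the index of the region (and so the reducer) that owns row:
     * the last region whose start key is <= row. startKeys must be sorted,
     * and startKeys[0] must be the empty key, as for a table's first region.
     */
    static int partitionFor(byte[][] startKeys, byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the range always shrinks
            if (compare(startKeys[mid], row) <= 0) {
                lo = mid;     // region mid starts at or before the row
            } else {
                hi = mid - 1; // region mid starts after the row
            }
        }
        return lo;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three hypothetical regions: [, g), [g, p), [p, )
        byte[][] startKeys = { new byte[0], bytes("g"), bytes("p") };
        for (String row : new String[] { "apple", "g", "kiwi", "zebra" }) {
            System.out.println(row + " -> reducer " + partitionFor(startKeys, bytes(row)));
        }
    }
}
```

Comparing bytes as unsigned matters once keys contain values of 0x80 or above. A signed comparison would sort those bytes before 0x00 and route rows to the wrong region.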
