> Can you check whether hdfs related config was passed to Job correctly?

Ahhh, that was it!  It wasn't picking up the .xml files.  Fixed that and it 
seems to be working now.

Thank you for your help!!!

Sean
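
A note for anyone landing on this thread with the same symptom: the message
above does not say how the .xml files were made visible to the job, so the
following is only a sketch of one way to check for the problem before
submitting. Hadoop's Configuration loads core-site.xml from the classpath, so
a launcher can ask its class loader whether the site files are there. The
class name SiteConfigCheck, the method missingResources, and the list of file
names are assumptions made for illustration, not something taken from the
thread.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/** Pre-flight check: are the Hadoop *-site.xml files visible on the classpath? */
class SiteConfigCheck {

    /**
     * Returns the resource names that the given class loader cannot find.
     * An empty list means every requested file is visible.
     */
    static List<String> missingResources(ClassLoader loader, String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            URL location = loader.getResource(name);
            if (location == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // core-site.xml is where the default filesystem is normally set; the
        // other two names are assumptions about a typical Hadoop 1.x setup.
        List<String> missing = missingResources(
                loader, "core-site.xml", "hdfs-site.xml", "mapred-site.xml");
        if (missing.isEmpty()) {
            System.out.println("All site files are visible on the classpath.");
        } else {
            System.out.println("Not on the classpath: " + missing);
        }
    }
}
```

If core-site.xml is reported as missing, the job is likely to run with the
built-in local filesystem defaults, which would be consistent with the
file:/... partitions path in the console output further down.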


From: Ted Yu <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Wednesday, February 6, 2013 2:25 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: TaskStatus Exception using HFileOutputFormat

Thanks for this information. Here is related code:

  public static void configureIncrementalLoad(Job job, HTable table)
  throws IOException {
    Configuration conf = job.getConfiguration();
...
    Path partitionsPath = new Path(job.getWorkingDirectory(),
                                   "partitions_" + UUID.randomUUID());
    LOG.info("Writing partition information to " + partitionsPath);
    FileSystem fs = partitionsPath.getFileSystem(conf);
    writePartitions(conf, partitionsPath, startKeys);
    partitionsPath.makeQualified(fs);

Can you check whether hdfs related config was passed to Job correctly?

Thanks

On Wed, Feb 6, 2013 at 1:15 PM, Sean McNamara 
<[email protected]<mailto:[email protected]>> wrote:
Ok, a bit more info: from what I can tell, the partitions file is being 
placed into the working dir on the node I launch from, and the task trackers 
are trying to look for that file, which doesn't exist where they run (since 
they are on other nodes).
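
The diagnosis above fits how the partitions path is built in
configureIncrementalLoad: it is placed under the job's working directory,
which belongs to whatever the default filesystem is. Hadoop's built-in
default for fs.default.name is file:///, so without a site file that
overrides it the path ends up on the launching node's local disk. The
following is a simplified model of that fallback in plain Java, not Hadoop's
actual code; the class DefaultFsModel, the method qualify, and the host
namenode.example are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Simplified model of how a path is qualified against the default filesystem. */
class DefaultFsModel {

    // Hadoop's built-in value when no site file overrides fs.default.name.
    static final String LOCAL_FS = "file:///";

    /** Resolves workingDir/name against the configured default filesystem. */
    static URI qualify(Map<String, String> conf, String workingDir, String name) {
        String defaultFs = conf.getOrDefault("fs.default.name", LOCAL_FS);
        String path = workingDir.endsWith("/")
                ? workingDir + name
                : workingDir + "/" + name;
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        // No site files loaded: the partitions file lands on the local disk.
        Map<String, String> noSiteFiles = new HashMap<>();
        System.out.println(qualify(noSiteFiles, "/opt/jobs/MyMapreduceJob", "partitions_1"));

        // Site files loaded (hypothetical namenode address): path is on HDFS.
        Map<String, String> withSiteFiles = new HashMap<>();
        withSiteFiles.put("fs.default.name", "hdfs://namenode.example:8020");
        System.out.println(qualify(withSiteFiles, "/user/sean", "partitions_1"));
    }
}
```

In the first case the scheme is file, so only the submitting node can see the
partitions file; in the second the scheme is hdfs and every task tracker can
read it.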


Here is the exception on the TT in case it is helpful:


2013-02-06 17:05:13,002 WARN org.apache.hadoop.mapred.TaskTracker: Exception while localization java.io.FileNotFoundException: File /opt/jobs/MyMapreduceJob/partitions_1360170306728 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:179)
        at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1212)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
        at java.lang.Thread.run(Thread.java:662)

From: Sean McNamara 
<[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Wednesday, February 6, 2013 9:35 AM

To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: TaskStatus Exception using HFileOutputFormat

> Using the below construct, do you still get the exception?

Correct, I am still getting this exception.

Sean

From: Ted Yu <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Tuesday, February 5, 2013 7:50 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: TaskStatus Exception using HFileOutputFormat

Using the below construct, do you still get the exception?

Please consider upgrading to hadoop 1.0.4

Thanks

On Tue, Feb 5, 2013 at 4:55 PM, Sean McNamara 
<[email protected]<mailto:[email protected]>> wrote:
> Can you tell us the HBase and hadoop versions you were using ?

Ahh yes, sorry I left that out:

Hadoop: 1.0.3
HBase: 0.92.0


> I guess you have used the above construct


Our code is as follows:
HTable table = new HTable(conf, configHBaseTable);
FileOutputFormat.setOutputPath(job, outputDir);
HFileOutputFormat.configureIncrementalLoad(job, table);


Thanks!

From: Ted Yu <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Tuesday, February 5, 2013 5:46 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: TaskStatus Exception using HFileOutputFormat

Can you tell us the HBase and hadoop versions you were using ?
From TestHFileOutputFormat:

    HFileOutputFormat.configureIncrementalLoad(job, table);
    FileOutputFormat.setOutputPath(job, outDir);

I guess you have used the above construct ?

Cheers

On Tue, Feb 5, 2013 at 4:31 PM, Sean McNamara 
<[email protected]<mailto:[email protected]>> wrote:

We're trying to use HFileOutputFormat for bulk hbase loading.   When using 
HFileOutputFormat's setOutputPath or configureIncrementalLoad, the job is 
unable to run.  The error I see in the jobtracker logs is: Trying to set finish 
time for task attempt_201301030046_123198_m_000002_0 when no start time is set, 
stackTrace is : java.lang.Exception

If I remove any references to HFileOutputFormat, and use 
FileOutputFormat.setOutputPath, things seem to run great.  Does anyone know 
what could be causing the TaskStatus error when using HFileOutputFormat?

Thanks,

Sean


What I see on the Job Tracker:

2013-02-06 00:17:33,685 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time for task attempt_201301030046_123198_m_000002_0 when no start time is set, stackTrace is : java.lang.Exception
        at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:145)
        at org.apache.hadoop.mapred.TaskInProgress.incompleteSubTask(TaskInProgress.java:670)
        at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:2945)
        at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1162)
        at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4739)
        at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3683)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3378)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)


What I see from the console:

391  [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Looking up current regions for table org.apache.hadoop.hbase.client.HTable@3a083b1b
1284 [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Configuring 41 reduce partitions to match current region count
1285 [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Writing partition information to file:/opt/webtrends/oozie/jobs/Lab/O/VisitorAnalytics.MapReduce/bin/partitions_1360109875112
1319 [main] INFO  org.apache.hadoop.util.NativeCodeLoader  - Loaded the native-hadoop library
1328 [main] INFO  org.apache.hadoop.io.compress.zlib.ZlibFactory  - Successfully loaded & initialized native-zlib library
1329 [main] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor
1588 [main] INFO  org.apache.hadoop.hbase.mapreduce.HFileOutputFormat  - Incremental table output configured.
2896 [main] INFO  org.apache.hadoop.hbase.mapreduce.TableOutputFormat  - Created table instance for Lab_O_VisitorHistory
2910 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat  - Total input paths to process : 1
Job Name:       job_201301030046_123199
Job Id: http://strack01.staging.dmz:50030/jobdetails.jsp?jobid=job_201301030046_123199
Job URL:        VisitorHistory MapReduce (soozie01.Lab.O)
3141 [main] INFO  org.apache.hadoop.mapred.JobClient  - Running job: job_201301030046_123199
4145 [main] INFO  org.apache.hadoop.mapred.JobClient  -  map 0% reduce 0%
10162 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000002_0, Status : FAILED
10196 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_0&filter=stdout
10199 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_0&filter=stderr
10199 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000042_0, Status : FAILED
10203 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_0&filter=stdout
10205 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata01.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_0&filter=stderr
10206 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000002_1, Status : FAILED
10210 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_1&filter=stdout
10213 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_1&filter=stderr
10213 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000042_1, Status : FAILED
10217 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_1&filter=stdout
10219 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_1&filter=stderr
10220 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000002_2, Status : FAILED
10224 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_2&filter=stdout
10226 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000002_2&filter=stderr
10227 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000042_2, Status : FAILED
10236 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_2&filter=stdout
10239 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000042_2&filter=stderr
10239 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000001_0, Status : FAILED
10244 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_0&filter=stdout
10247 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_0&filter=stderr
10247 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000041_0, Status : FAILED
10250 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_0&filter=stdout
10252 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata02.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_0&filter=stderr
11255 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000001_1, Status : FAILED
11259 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_1&filter=stdout
11262 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_1&filter=stderr
11262 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000041_1, Status : FAILED
11265 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_1&filter=stdout
11267 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata05.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_1&filter=stderr
11267 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_m_000001_2, Status : FAILED
11271 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_2&filter=stdout
11273 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_m_000001_2&filter=stderr
11274 [main] INFO  org.apache.hadoop.mapred.JobClient  - Task Id : attempt_201301030046_123199_r_000041_2, Status : FAILED
11277 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_2&filter=stdout
11279 [main] WARN  org.apache.hadoop.mapred.JobClient  - Error reading task outputhttp://sdata03.staging.dmz:50060/tasklog?plaintext=true&attemptid=attempt_201301030046_123199_r_000041_2&filter=stderr
11280 [main] INFO  org.apache.hadoop.mapred.JobClient  - Job complete: job_201301030046_123199
11291 [main] INFO  org.apache.hadoop.mapred.JobClient  - Counters: 4
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -   Job Counters
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -     SLOTS_MILLIS_MAPS=0
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -     Total time spent by all reduces waiting after reserving slots (ms)=0
11292 [main] INFO  org.apache.hadoop.mapred.JobClient  -     Total time spent by all maps waiting after reserving slots (ms)=0
11293 [main] INFO  org.apache.hadoop.mapred.JobClient  -     SLOTS_MILLIS_REDUCES=0





