HFiles are missing from an incremental load
-------------------------------------------

                 Key: HBASE-5210
                 URL: https://issues.apache.org/jira/browse/HBASE-5210
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
    Affects Versions: 0.90.2
         Environment: HBase 0.90.2 with Hadoop-0.20.2 (with durable sync).  
RHEL 2.6.18-164.15.1.el5.  4 node cluster (1 master, 3 slaves)
            Reporter: Lawrence Simpson


We run an overnight map/reduce job that loads data from an external source and 
adds that data to an existing HBase table.  The input files have been loaded 
into hdfs.  The map/reduce job uses the HFileOutputFormat (and the 
TotalOrderPartitioner) to create HFiles which are subsequently added to the 
HBase table.  On at least two separate occasions (that we know of), a range of 
output was missing for a given day.  The range of keys for the missing 
values corresponded to those of a particular region.  This implied that a 
complete HFile somehow went missing from the job.  Further investigation 
revealed the following:

Two different reducers (running in separate JVMs and thus separate class 
loaders) on the same server can end up using the same file names for their 
HFiles.  The scenario is as follows:

     1.  Both reducers start at nearly the same time.
     2.  The first reducer reaches the point where it wants to write its 
         first file.
     3.  It uses the StoreFile class, which contains a static Random object 
         that is initialized by default using a timestamp.
     4.  The file name is generated using the random number generator.
     5.  The file name is checked against other existing files.
     6.  The file is written into temporary files in a directory named 
         after the reducer attempt.
     7.  The second reduce task reaches the same point, but its StoreFile 
         class (which is now in the file system's cache) gets loaded within 
         the time resolution of the OS and thus initializes its Random 
         object with the same seed as the first task.
     8.  The second task also checks for an existing file with the name 
         generated by the random number generator and finds no conflict, 
         because each task is writing files in its own temporary folder.
     9.  The first task finishes and gets its temporary files committed 
         to the "real" folder specified for output of the HFiles.
    10.  The second task then reaches its own conclusion and commits its 
         files (moveTaskOutputs).  The released Hadoop code simply 
         overwrites any files with the same name, with no warning of any 
         kind.  The first task's HFiles just go missing.

Note:  The reducers here are NOT different attempts at the same reduce 
task.  They are different reduce tasks, so data is really lost.
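
The seed collision in steps 3 and 7 can be reproduced in isolation.  The 
sketch below is illustrative only (hfileName is a hypothetical stand-in for 
StoreFile's actual name generation, not the real HBase code); it shows that 
two Random objects constructed with the same timestamp seed emit identical 
"random" file names:

```java
import java.util.Random;

public class SeedCollision {
    // Hypothetical stand-in for StoreFile's file-name generation, which
    // draws from a static Random seeded (by default) from the clock.
    static String hfileName(Random rand) {
        return Long.toString(Math.abs(rand.nextLong()));
    }

    public static void main(String[] args) {
        long seed = System.currentTimeMillis();
        // Two JVMs whose StoreFile classes load within the clock's
        // resolution observe the same timestamp, hence the same seed...
        Random reducer1 = new Random(seed);
        Random reducer2 = new Random(seed);
        // ...and therefore produce identical file names.
        System.out.println(hfileName(reducer1).equals(hfileName(reducer2)));
    }
}
```

Since each task only checks for conflicts inside its own temporary folder 
(step 8), neither one ever notices the duplicate name.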

I am currently testing a fix in which I have added code to the Hadoop 
FileOutputCommitter.moveTaskOutputs method to check for a conflict with
an existing file in the final output folder and to rename the HFile if
needed.  This may not be appropriate for all uses of FileOutputFormat, so 
I have put it into a new class which is then used by a subclass of 
HFileOutputFormat.  Subclassing FileOutputCommitter itself proved more of 
a problem due to its private declarations.
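
The idea behind the fix can be sketched as follows.  This is a minimal, 
hypothetical illustration of the conflict check described above, not the 
actual patch; the names and the suffix-based renaming scheme are my own, 
and the exists predicate stands in for FileSystem.exists() on the final 
output folder:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

public class ConflictSafeNaming {
    // Instead of letting moveTaskOutputs silently overwrite, probe the
    // final output folder for a free name before committing the file.
    static String resolveConflict(String name, Predicate<String> exists) {
        String candidate = name;
        int suffix = 0;
        while (exists.test(candidate)) {
            candidate = name + "_" + (++suffix);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> committed = new HashSet<>();
        // First reducer commits its HFile under the colliding name.
        committed.add(resolveConflict("3021995862", committed::contains));
        // Second reducer arrives with the SAME name; it gets renamed
        // rather than clobbering the first task's output.
        System.out.println(resolveConflict("3021995862", committed::contains));
    }
}
```

Renaming at commit time is safe for HFiles because HBase does not depend 
on the generated file names being stable, only on their uniqueness within 
the output folder.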

I don't know if my approach is the best fix for the problem.  If someone
more knowledgeable than myself deems that it is, I will be happy to share
what I have done and by that time I may have some information on the
results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira