[ https://issues.apache.org/jira/browse/HBASE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Stack resolved HBASE-1978.
----------------------------------
    Resolution: Later

> Change the range/block index scheme from [start,end) to (start, end], and index range/block by endKey, specially in HFile
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1978
>                 URL: https://issues.apache.org/jira/browse/HBASE-1978
>             Project: HBase
>          Issue Type: New Feature
>          Components: io, master, regionserver
>            Reporter: Schubert Zhang
>            Assignee: Schubert Zhang
>            Priority: Major
>         Attachments: HBASE-1978-HFile-v1.patch
>
>
> From the code review of HFile (HBASE-1818), we found that HFile allows
> duplicate keys. However, the old implementation can miss duplicate keys
> during seek and scan when the duplicates span multiple blocks.
> We provided a patch (HBASE-1841 is its step 1) to resolve the above issue.
> That patch modified HFile.Writer so that it does not generate an HFile with
> cross-block duplicate keys: it only starts a new block when the key being
> appended differs from the last appended key. It still carries a risk: if the
> user of HFile.Writer appends many copies of the same key, the result is a
> very large block that needs a lot of memory or causes an out-of-memory error.
> The current HFile block index uses startKey to index a block, i.e. the
> range/block index scheme is [startKey,endKey).
> Section 5.1 of the Google Bigtable paper says:
> "The METADATA table stores the location of a tablet under a row key that is
> an encoding of the tablet's table identifier and its end row."
> The principle behind Bigtable's METADATA is the same as that of the
> BlockIndex in an SSTable or HFile, so we should use endKey in HFile's
> BlockIndex. In my experience with Hypertable, the METADATA key is also
> "tableID:endRow".
> We would change the index scheme in HFile from [startKey,endKey) to
> (startKey,endKey], and change the binary search method to match this index
> scheme.
> This change can resolve the duplicate-key issue described above.
> Note:
> A complete fix needs to modify many modules in HBase; these seem to include
> HFile, the META schema, some internal code, etc.
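
To illustrate why indexing by endKey helps, here is a minimal Python sketch of the two lookup schemes. It is a simplified model, not HFile code: blocks are plain sorted lists of keys, and the function names (`seek_by_start_key`, `seek_by_end_key`, `scan_from`) are hypothetical. It assumes the start-key lookup picks the last block whose first key is <= the search key, and the end-key lookup picks the first block whose last key is >= the search key.

```python
from bisect import bisect_left, bisect_right


def seek_by_start_key(blocks, key):
    """[startKey, endKey) scheme: last block whose first key <= key."""
    start_keys = [block[0] for block in blocks]
    idx = bisect_right(start_keys, key) - 1
    return max(idx, 0)


def seek_by_end_key(blocks, key):
    """(startKey, endKey] scheme: first block whose last key >= key.

    Returns len(blocks) when key is greater than every indexed end key.
    """
    end_keys = [block[-1] for block in blocks]
    return bisect_left(end_keys, key)


def scan_from(blocks, block_idx, key):
    """Collect every entry equal to key, scanning forward from block_idx."""
    hits = []
    for block in blocks[block_idx:]:
        for k in block:
            if k == key:
                hits.append(k)
            elif k > key:
                return hits  # keys are sorted, so nothing further can match
    return hits


# The duplicate key "c" spans the first two blocks.
blocks = [["a", "b", "c"], ["c", "c", "d"], ["e", "f"]]

by_start = scan_from(blocks, seek_by_start_key(blocks, "c"), "c")
by_end = scan_from(blocks, seek_by_end_key(blocks, "c"), "c")

print(len(by_start))  # start-key index lands on block 1 and skips the "c" in block 0
print(len(by_end))    # end-key index lands on block 0 and sees every "c"
```

In this model the end-key lookup always lands on the block that holds the first occurrence of a key, so a forward scan sees every duplicate without requiring the writer to keep all copies of a key in one block.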