Thanks Sergey. In my use case I want to directly analyze the underlying HFiles, so I can't tolerate duplicate data.
Can you give me some pointers on how to make this procedure atomic?
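
(For concreteness, below is a minimal sketch of the meta/manifest idea Sergey mentions in his reply: record the intended swap in a small manifest before touching the region directory, so a crash between the move and the deletes can be cleaned up on restart. All class, method, and file names here are hypothetical, and plain java.nio stands in for HBase's actual file-system layer; this is an illustration of the approach, not HBase internals.)

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.util.List;

    // Sketch: make the "move new file in, delete replaced files" swap recoverable
    // by writing a manifest first. A recovering process that finds the manifest
    // knows which old files are logically dead even if the crash happened between
    // the move and the deletes.
    public class CompactionCommitSketch {

        static void commit(Path compactedTmp, Path regionDir, List<Path> replacedFiles)
                throws IOException {
            Path manifest = regionDir.resolve("COMPACTION_MANIFEST");

            // 1. Write the manifest: the new file plus the files it replaces.
            StringBuilder sb = new StringBuilder(compactedTmp.getFileName().toString());
            for (Path old : replacedFiles) {
                sb.append('\n').append(old.getFileName());
            }
            Files.write(manifest, sb.toString().getBytes(StandardCharsets.UTF_8));

            // 2. Move the compacted file into the region directory (atomic rename).
            Files.move(compactedTmp, regionDir.resolve(compactedTmp.getFileName()),
                    StandardCopyOption.ATOMIC_MOVE);

            // 3. Delete the replaced files.
            for (Path old : replacedFiles) {
                Files.deleteIfExists(old);
            }

            // 4. Drop the manifest; from here the directory alone is consistent.
            Files.delete(manifest);
        }

        // On startup, finish or roll back an interrupted commit.
        static void recover(Path regionDir) throws IOException {
            Path manifest = regionDir.resolve("COMPACTION_MANIFEST");
            if (!Files.exists(manifest)) {
                return;  // nothing to do
            }
            List<String> lines = Files.readAllLines(manifest, StandardCharsets.UTF_8);
            Path newFile = regionDir.resolve(lines.get(0));
            if (Files.exists(newFile)) {
                // Crash happened after the move: finish deleting the replaced files.
                for (String old : lines.subList(1, lines.size())) {
                    Files.deleteIfExists(regionDir.resolve(old));
                }
            }
            // Otherwise the move never happened and the old files are still complete,
            // so rolling back is just dropping the manifest.
            Files.delete(manifest);
        }
    }

The point of the sketch is that the manifest is the single commit point: before it exists the old files are authoritative, and once it exists the replaced files are logically dead regardless of when they physically disappear.
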
On Thu, Feb 21, 2013 at 6:07 AM, Sergey Shelukhin <[email protected]> wrote:

> There should be no duplicate records despite the file not being deleted -
> between records with the exact same key/version/etc., the newer file would
> be chosen by logical sequence. If those happen to be the same, some choice
> (by time, or name) is made, and still only one file will be chosen.
> Eventually, the file will be compacted again and disappear. Granted, by
> making the move atomic (via some meta/manifest file) we could avoid some
> overhead in this case at the cost of some added complexity, but it should
> be rather rare.
>
> On Tue, Feb 19, 2013 at 7:10 PM, Anty <[email protected]> wrote:
>
> > Hi guys,
> >
> > I have some trouble understanding the compaction process. Can someone
> > shed some light on this? Much appreciated. Here is the problem:
> >
> > After the region server successfully generates the final compacted
> > file, it goes through two steps:
> > 1. move the compacted file into the region's directory
> > 2. delete the replaced files
> >
> > These two steps are not atomic. If the region server crashes after
> > step 1 and before step 2, then there are duplicate records! Is this
> > problem handled in the read path, or is there another mechanism to
> > fix it?
> >
> > --
> > Best Regards
> > Anty Rao
>

--
Best Regards
Anty Rao
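
For the direct-HFile-analysis use case above, the read side would have to do what Sergey describes: when the same key/version appears in both a leftover pre-compaction file and the new compacted file, keep the copy from the file with the newer logical sequence. A rough sketch of that resolution follows; CellRecord and fileSequenceId are illustrative names for this sketch, not the actual HBase reader API.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch: when scanning several HFiles directly, a key that appears in both
    // an old (pre-compaction) file and the new compacted file should be taken
    // from the file with the higher sequence id, so a leftover old file
    // contributes no duplicates.
    public class DedupBySequence {

        static class CellRecord {
            final String rowKey;        // row + column + version, flattened for the sketch
            final long fileSequenceId;  // sequence id of the HFile the record came from
            final byte[] value;

            CellRecord(String rowKey, long fileSequenceId, byte[] value) {
                this.rowKey = rowKey;
                this.fileSequenceId = fileSequenceId;
                this.value = value;
            }
        }

        // Keep, for each logical key, only the record from the newest file.
        static Map<String, CellRecord> dedup(List<CellRecord> records) {
            Map<String, CellRecord> newest = new HashMap<>();
            for (CellRecord r : records) {
                newest.merge(r.rowKey, r,
                        (a, b) -> a.fileSequenceId >= b.fileSequenceId ? a : b);
            }
            return newest;
        }
    }
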
