Thanks Sergey. In my use case I want to directly analyze the underlying HFiles, so I can't tolerate duplicate data.
Can you give me some pointers on how to make this procedure atomic?
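
(For concreteness, below is a minimal sketch of the meta/manifest idea Sergey mentions in his reply: record the intended swap in a small manifest before touching the region directory, so a crash between the move and the deletes can be cleaned up on restart. All class, method, and file names here are hypothetical, and plain java.nio stands in for HBase's actual file-system layer; this is an illustration of the approach, not HBase internals.)

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.util.List;

    // Sketch: make the "move new file in, delete replaced files" swap recoverable
    // by writing a manifest first. A recovering process that finds the manifest
    // knows which old files are logically dead even if the crash happened between
    // the move and the deletes.
    public class CompactionCommitSketch {

        static void commit(Path compactedTmp, Path regionDir, List<Path> replacedFiles)
                throws IOException {
            Path manifest = regionDir.resolve("COMPACTION_MANIFEST");

            // 1. Write the manifest: the new file plus the files it replaces.
            StringBuilder sb = new StringBuilder(compactedTmp.getFileName().toString());
            for (Path old : replacedFiles) {
                sb.append('\n').append(old.getFileName());
            }
            Files.write(manifest, sb.toString().getBytes(StandardCharsets.UTF_8));

            // 2. Move the compacted file into the region directory (atomic rename).
            Files.move(compactedTmp, regionDir.resolve(compactedTmp.getFileName()),
                    StandardCopyOption.ATOMIC_MOVE);

            // 3. Delete the replaced files.
            for (Path old : replacedFiles) {
                Files.deleteIfExists(old);
            }

            // 4. Drop the manifest; from here the directory alone is consistent.
            Files.delete(manifest);
        }

        // On startup, finish or roll back an interrupted commit.
        static void recover(Path regionDir) throws IOException {
            Path manifest = regionDir.resolve("COMPACTION_MANIFEST");
            if (!Files.exists(manifest)) {
                return;  // nothing to do
            }
            List<String> lines = Files.readAllLines(manifest, StandardCharsets.UTF_8);
            Path newFile = regionDir.resolve(lines.get(0));
            if (Files.exists(newFile)) {
                // Crash happened after the move: finish deleting the replaced files.
                for (String old : lines.subList(1, lines.size())) {
                    Files.deleteIfExists(regionDir.resolve(old));
                }
            }
            // Otherwise the move never happened and the old files are still complete,
            // so rolling back is just dropping the manifest.
            Files.delete(manifest);
        }
    }

The point of the sketch is that the manifest is the single commit point: before it exists the old files are authoritative, and once it exists the replaced files are logically dead regardless of when they physically disappear.
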
On Thu, Feb 21, 2013 at 6:07 AM, Sergey Shelukhin <[email protected]> wrote:

> There should be no duplicate records despite the file not being deleted -
> between records with the exact same key/version/etc., the newer file would
> be chosen by logical sequence. If those happen to be the same, some choice
> (by time, or name) is made, and still only one file will be chosen.
> Eventually, the file will be compacted again and disappear. Granted, by
> making the move atomic (via some meta/manifest file) we could avoid some
> overhead in this case at the cost of some added complexity, but it should
> be rather rare.
>
> On Tue, Feb 19, 2013 at 7:10 PM, Anty <[email protected]> wrote:
>
> > Hi guys,
> >
> > I have some trouble understanding the compaction process. Can someone
> > shed some light on this? Much appreciated. Here is the problem:
> >
> > After the region server successfully generates the final compacted
> > file, it goes through two steps:
> > 1. move the compacted file into the region's directory
> > 2. delete the replaced files
> >
> > These two steps are not atomic. If the region server crashes after
> > step 1 and before step 2, then there are duplicate records! Is this
> > problem handled in the read path, or is there another mechanism to
> > fix it?
> >
> > --
> > Best Regards
> > Anty Rao
>

--
Best Regards
Anty Rao
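
For the direct-HFile-analysis use case above, the read side would have to do what Sergey describes: when the same key/version appears in both a leftover pre-compaction file and the new compacted file, keep the copy from the file with the newer logical sequence. A rough sketch of that resolution follows; CellRecord and fileSequenceId are illustrative names for this sketch, not the actual HBase reader API.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch: when scanning several HFiles directly, a key that appears in both
    // an old (pre-compaction) file and the new compacted file should be taken
    // from the file with the higher sequence id, so a leftover old file
    // contributes no duplicates.
    public class DedupBySequence {

        static class CellRecord {
            final String rowKey;        // row + column + version, flattened for the sketch
            final long fileSequenceId;  // sequence id of the HFile the record came from
            final byte[] value;

            CellRecord(String rowKey, long fileSequenceId, byte[] value) {
                this.rowKey = rowKey;
                this.fileSequenceId = fileSequenceId;
                this.value = value;
            }
        }

        // Keep, for each logical key, only the record from the newest file.
        static Map<String, CellRecord> dedup(List<CellRecord> records) {
            Map<String, CellRecord> newest = new HashMap<>();
            for (CellRecord r : records) {
                newest.merge(r.rowKey, r,
                        (a, b) -> a.fileSequenceId >= b.fileSequenceId ? a : b);
            }
            return newest;
        }
    }
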
