Re: Avoiding duplicate writes

Lalit Jadhav Thu, 11 Jan 2018 03:04:15 -0800

Hello Peter,

    You can add a Random number in Row key for avoiding Rowkey overriding.
Even though timeStamp at one ms is same Random Number provides uniqueness.


On Thu, Jan 11, 2018 at 3:46 PM, Peter Marron <[email protected]>
wrote:

> Hi,
>
> We have a problem when we are writing lots of records to HBase.
> We are not specifying timestamps explicitly and so the situation arises
> where multiple records are being written in the same millisecond.
> Unfortunately when the records are written and the timestamps are the same
> then later writes are treated as updates of the previous records and not
> separate records, which is what we want.
> So we want to be able to guarantee that records are not treated as
> overwrites (unless we explicitly make them so).
>
> As I understand it there are (at least) two different ways to proceed.
>
> The first approach is to increase the resolution of the timestamp.
> So we could use something like java.lang.System.nanoTime()
> However although this seems to ameliorate the problem it seems to
> introduce other problems.
> Also ideally we would like something that guarantees that we don't lose
> writes rather than making them more unlikely.
>
> The second approach is to write a prePut co-processor.
> In the prePut I can do a read using the same rowkey, column family and
> column qualifier and omit the timestamp.
> As I understand it this will return me the latest timestamp.
> Then I can update the timestamp that I am going to write, if necessary, to
> make sure that the timestamp is always unique.
> In this way I can guarantee that none of my writes are accidentally turned
> into updates.
>
> However this approach seems to be expensive.
> I have to do a read before each write, and although (I believe) it will be
> on the same region server, it's still going to slow things down a lot.
> Also I am assuming that the prePut co-processor is executed inside a
> record lock so that I don't have to worry about synchronization.
> Is this true?
>
> Is there a better way?
>
> Maybe there is some implementation of this already that I can pick up?
>
> Maybe there is some way that I can implement this more efficiently?
>
> It seems to me that this might be better handled at compaction.
> Shouldn't there be some way that I can mark writes with some sort of
> special value of timestamp that means that this write should never be
> considered as an update but always as a separate write?
>
> Any advice gratefully received.
>
> Peter Marron
>



-- 
Regards,
Lalit Jadhav
Network Component Private Limited.

Re: Avoiding duplicate writes

Reply via email to