Agreed.
Whether that works becomes a function of the number of updates per second (per key) and the max number of versions kept around for a col ...

Of course, solving this in general is not easy anyway :-)



Regards,
Mridul


On Thursday 21 April 2011 05:20 PM, Dmitriy Ryaboy wrote:
We don't have that functionality in the HBase loader yet, but technically one 
can get around this inconsistency by specifying a max timestamp on the HBase 
scan. As long as the number of versions HBase is configured to keep is larger 
than the number of updates to a single row during your scan, you'd get a 
consistent snapshot of the data (otherwise the version you want may already 
have been evicted). There is a JIRA open requesting that we add timestamp 
support....
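
To illustrate why the version count matters for a max-timestamp read, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API: the class `VersionedStore`, the helper `snapshot_read`, the key `"row1"`, and the timestamps are all hypothetical names and values chosen for illustration. The store keeps only the newest `max_versions` values per key, and a read asks for the newest value at or before a given timestamp.

```python
from collections import defaultdict


class VersionedStore:
    """Toy model (assumption, not HBase): keeps the newest max_versions values per key."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.cells = defaultdict(list)  # key -> [(timestamp, value)], newest first

    def put(self, key, timestamp, value):
        versions = self.cells[key]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]  # evict everything beyond the newest N

    def get(self, key, max_timestamp):
        """Return the newest value written at or before max_timestamp, else None."""
        for ts, value in self.cells.get(key, []):
            if ts <= max_timestamp:
                return value
        return None


def snapshot_read(max_versions, updates_during_scan):
    """Write one value, apply later updates, then read as of the scan start time."""
    store = VersionedStore(max_versions)
    scan_start = 100
    store.put("row1", scan_start, "before-scan")
    for i in range(updates_during_scan):
        store.put("row1", scan_start + 1 + i, f"update-{i}")
    return store.get("row1", scan_start)


print(snapshot_read(max_versions=3, updates_during_scan=2))  # before-scan
print(snapshot_read(max_versions=3, updates_during_scan=3))  # None
```

With three versions kept, two concurrent updates still leave the pre-scan value readable; a third update evicts it, and the timestamp-bounded read no longer sees a consistent snapshot for that row.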

-----Original Message-----
From: "Mridul Muralidharan"<[email protected]>
To: "[email protected]"<[email protected]>
Cc: "Bing Wei"<[email protected]>
Sent: 4/21/2011 1:19 AM
Subject: Re: pig query on Cassandra


In general (on Hadoop-based systems), if the input is not immutable,
you can end up with issues during task re-execution, etc.: a re-run task
may read different data than the original attempt did.
This happens not just with Cassandra but also with HBase and other stores
where you modify data in place.



Regards,
Mridul

On Thursday 21 April 2011 04:29 AM, Bing Wei wrote:
Hi, All.

When I run a Pig query on Cassandra while Cassandra is being updated by
an application at the same time, what will happen? I may get inconsistent
results, right?



