On Thursday 21 April 2011 06:41 PM, Jeremy Hanna wrote:

On Apr 21, 2011, at 3:19 AM, Mridul Muralidharan wrote:


In general (on Hadoop-based systems), if the input is not immutable, you can
end up with issues during task re-execution, etc.
This happens not just with Cassandra but with HBase and others too, wherever you
modify data in place.


So do you mean that between the time of the first execution and the time of the 
re-execution, input data can change?  Yes that's possible.  However, unless you 
are reading stale data the second time, it's not a consistency issue, is it?  I 
mean, if I am guaranteed to read the most recent data on the first execution 
and the second execution, that's consistent.  If I am reading updated data the 
second time, that's consistent and may or may not be a problem.

Just trying to make sure I understand.

To clarify, I am referring to re-execution of a task, not of a job.

From the point of view of a (single) Hadoop job (and of everything else that consumes its output), it is a consistency issue: the re-execution of a task can generate a set of key/values different from those produced by the initial invocation, which some reducers might already have consumed.
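A toy Python sketch may make the task-level problem concrete. It is not Hadoop or Cassandra code: the in-memory `store`, `map_task`, `fetch` and the two-reducer partitioning are hypothetical stand-ins. One reducer copies the output of the first map attempt before that output is lost; the application then updates both rows together; the re-executed attempt reads the newer data, and the second reducer copies from it.

```python
"""Toy simulation (not real Hadoop code) of a map task re-executed over mutable input."""


def map_task(store):
    """Hypothetical map task: scan the mutable store and emit (key, value) pairs."""
    return sorted(store.items())


def fetch(map_output, partition):
    """Hypothetical shuffle step: keep only the pairs routed to one reducer."""
    return [(k, v) for k, v in map_output if k in partition]


def run_job():
    store = {"a": 1, "b": 1}          # the application keeps a == b
    partitions = [{"a"}, {"b"}]       # reducer 0 owns "a", reducer 1 owns "b"

    attempt1 = map_task(store)
    reducer0_input = fetch(attempt1, partitions[0])  # copied before the failure

    # The output of attempt 1 is lost (e.g. its node dies); meanwhile the
    # application updates both rows together.
    store["a"] = store["b"] = 2

    attempt2 = map_task(store)        # re-execution of the same map task
    reducer1_input = fetch(attempt2, partitions[1])

    return dict(reducer0_input + reducer1_input)


if __name__ == "__main__":
    print(run_job())  # {'a': 1, 'b': 2}: matches neither state of the store
```

Each attempt read the most recent data, yet the job's combined output mixes two different states of the input, which is the inconsistency described above.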


Regards,
Mridul





On Thursday 21 April 2011 04:29 AM, Bing Wei wrote:
Hi, All.

When I run a Pig query on Cassandra while an application is updating
Cassandra at the same time, what will happen? I may get inconsistent
results, right?



