On Thursday 21 April 2011 06:41 PM, Jeremy Hanna wrote:
On Apr 21, 2011, at 3:19 AM, Mridul Muralidharan wrote:
In general (on Hadoop-based systems), if the input is not immutable, you can
end up with issues during task re-execution, etc.
This happens not just for Cassandra but for HBase and others too, wherever you
modify data in place.
So do you mean that between the time of the first execution and the time of the
re-execution, input data can change? Yes that's possible. However, unless you
are reading stale data the second time, it's not a consistency issue, is it? I
mean, if I am guaranteed to read the most recent data on the first execution
and the second execution, that's consistent. If I am reading updated data the
second time, that's consistent and may or may not be a problem.
Just trying to make sure I understand.
To clarify, I am referring to re-execution of a task, not of a job.
From the point of view of a (single) Hadoop job (and of everything else that
consumes its output), it is a consistency issue: the re-execution of
a task can generate a set of key/values that differs from that of the initial
invocation (which might already have been used by some reducers).
Regards,
Mridul
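To make the task re-execution point concrete, here is a toy simulation in plain Python. It does not use Hadoop or Cassandra at all: the `store` dict, `map_attempt` and `reduce_partition` are hypothetical stand-ins, and the failure between the two attempts is simply assumed. It only sketches how two reducers of the same job can end up consuming output from different attempts of the same map task.

```python
def map_attempt(store):
    """One attempt of a map task: emit a (city, 1) pair per row currently in the store."""
    return [(city, 1) for _user, city in sorted(store.items())]


def reduce_partition(map_output, key):
    """Reducer for a single key: sum the counts it fetched from one map attempt."""
    return sum(count for k, count in map_output if k == key)


# Mutable input, standing in for a table that is updated in place.
store = {"alice": "paris", "bob": "london"}

# First attempt of the map task; the "london" reducer fetches its output.
attempt_1 = map_attempt(store)
london_total = reduce_partition(attempt_1, "london")

# The application updates a row in place while the job is still running.
store["bob"] = "paris"

# Assumed failure: attempt 1's output is lost before the "paris" reducer
# fetched it, so the task is re-executed against the modified input.
attempt_2 = map_attempt(store)
paris_total = reduce_partition(attempt_2, "paris")

job_result = {"london": london_total, "paris": paris_total}
print(job_result, "rows in store:", len(store))
```

In this sketch bob is counted once under london and once under paris, so the job reports three rows in total although the store only ever held two. Neither read was stale, yet the combined output matches no single state of the input.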
On Thursday 21 April 2011 04:29 AM, Bing Wei wrote:
Hi, All.
When I run a Pig query on Cassandra while Cassandra is being updated by an
application at the same time, what will happen? I may get inconsistent
results, right?