On Apr 21, 2011, at 9:25 AM, Mridul Muralidharan wrote:

> On Thursday 21 April 2011 06:41 PM, Jeremy Hanna wrote:
>>
>> On Apr 21, 2011, at 3:19 AM, Mridul Muralidharan wrote:
>>
>>>
>>> In general (on hadoop based systems), if the input is not immutable - you
>>> can end up with issues during task re-execution, etc.
>>> This happens not just for cassandra but for hbase, others too - where you
>>> modify data in-place.
>>>
>>
>> So do you mean that between the time of the first execution and the time of
>> the re-execution, input data can change? Yes that's possible. However,
>> unless you are reading stale data the second time, it's not a consistency
>> issue, is it? I mean, if I am guaranteed to read the most recent data on
>> the first execution and the second execution, that's consistent. If I am
>> reading updated data the second time, that's consistent and may or may not
>> be a problem.
>>
>> Just trying to make sure I understand.
>
> To clarify, I am referring to re-execution of a task, not job.
>
> From a (single) hadoop job point of view (and everything else which consumes
> its output) - it is a consistency issue : the re-execution of a task can
> generate set of key/values which are different from initial invocation (which
> might have been used by some reducers).
>

Good point about inputs that are not immutable. Currently Cassandra doesn't have a way to snapshot the data so that it can be used as immutable input. I created a ticket to address that: https://issues.apache.org/jira/browse/CASSANDRA-2527

I guess I was more focused on Cassandra's architecture with respect to consistency, since it's often misunderstood, and on how to use consistency levels with MapReduce/Pig.

>
> Regards,
> Mridul
>
>>
>>>
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Thursday 21 April 2011 04:29 AM, Bing Wei wrote:
>>>> Hi, All.
>>>>
>>>> When I do a pig query on Cassandra, and the Cassandra is updated by
>>>> application at the same time, what will happen? I may get inconsistent
>>>> results, right?
>>>>
>>>
>>
>
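The task re-execution problem Mridul describes can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: a toy in-memory dict stands in for the data store, and a fixed two-reducer partitioner is used. `map_task`, `partition`, `run_job` and `REDUCER_FOR` are hypothetical names, not Hadoop, Pig or Cassandra APIs. One reducer consumes the first map attempt's output, the application then updates the rows, and the other reducer consumes the re-executed attempt's output:

```python
# Hypothetical simulation only: none of these names are Hadoop/Pig/Cassandra APIs.
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, int]]

# Hypothetical fixed partitioner: key "a" goes to reducer 0, key "b" to reducer 1.
REDUCER_FOR = {"a": 0, "b": 1}


def map_task(rows: Dict[str, int]) -> Pairs:
    """Emit (key, value) pairs from whatever the input holds at read time."""
    return sorted(rows.items())


def partition(pairs: Pairs, reducer: int) -> Pairs:
    """Keep only the pairs routed to the given reducer."""
    return [(k, v) for k, v in pairs if REDUCER_FOR[k] == reducer]


def run_job(store: Dict[str, int], use_snapshot: bool) -> Dict[str, int]:
    """Reducer 0 reads the first map attempt; the task is then re-executed
    and reducer 1 reads the second attempt. Returns what the reducers saw."""
    # With a snapshot both attempts read a frozen copy; otherwise they read live data.
    source = dict(store) if use_snapshot else store
    reducer0_input = partition(map_task(source), 0)

    # The application updates both rows together while the job is running.
    store["a"], store["b"] = 10, 20

    # Task re-execution (e.g. after a node failure); reducer 1 reads this attempt.
    reducer1_input = partition(map_task(source), 1)

    return dict(reducer0_input + reducer1_input)


if __name__ == "__main__":
    print(run_job({"a": 1, "b": 2}, use_snapshot=False))  # {'a': 1, 'b': 20}
    print(run_job({"a": 1, "b": 2}, use_snapshot=True))   # {'a': 1, 'b': 2}
```

Without a snapshot, the reducers together see `{'a': 1, 'b': 20}`, a combination the store never held at any single moment, even though each individual read was "consistent". Reading from a frozen copy, the kind of immutable input the ticket above is meant to provide, keeps both task attempts identical.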
