On Apr 21, 2011, at 9:25 AM, Mridul Muralidharan wrote:

> On Thursday 21 April 2011 06:41 PM, Jeremy Hanna wrote:
>> 
>> On Apr 21, 2011, at 3:19 AM, Mridul Muralidharan wrote:
>> 
>>> 
>>> In general (on Hadoop-based systems), if the input is not immutable, you 
>>> can run into problems during task re-execution, etc.
>>> This happens not just with Cassandra but also with HBase and other stores 
>>> where you modify data in place.
>>> 
>> 
>> So do you mean that between the time of the first execution and the time of 
>> the re-execution, input data can change?  Yes, that's possible.  However, 
>> unless you are reading stale data the second time, it's not a consistency 
>> issue, is it?  I mean, if I am guaranteed to read the most recent data on 
>> the first execution and the second execution, that's consistent.  If I am 
>> reading updated data the second time, that's consistent and may or may not 
>> be a problem.
>> 
>> Just trying to make sure I understand.
> 
> To clarify, I am referring to re-execution of a task, not of a job.
> 
> From the point of view of a (single) Hadoop job (and of everything else that 
> consumes its output), it is a consistency issue: the re-execution of a task 
> can generate a set of key/values that differs from the initial invocation's 
> (which might already have been used by some reducers).
> 

Good point about inputs that are not immutable.  Currently Cassandra doesn't 
have a way to snapshot the data so that it can serve as immutable input.  I 
created a ticket to address that: 
https://issues.apache.org/jira/browse/CASSANDRA-2527
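
To make the task re-execution problem concrete, here is a minimal sketch as a 
plain-Python simulation.  It is not Hadoop or Cassandra code: `store` and 
`run_map_task` are hypothetical stand-ins for a mutable input source and for 
one map task attempt.

```python
# Plain-Python simulation; not Hadoop or Cassandra code.
# `store` stands in for a mutable input source and `run_map_task`
# for one map task attempt; both names are hypothetical.

def run_map_task(store, split_keys):
    """Emit (key, value) pairs for one input split, read at call time."""
    return [(key, store[key]) for key in split_keys]

store = {"row1": 10, "row2": 20}
split = ["row1", "row2"]

# First attempt: a reducer may already have fetched this output.
first_attempt = run_map_task(store, split)

# An application updates the data in place while the job is running.
store["row2"] = 99

# The same task is re-executed (for example after a node failure).
second_attempt = run_map_task(store, split)

# The two attempts of one task disagree, so reducers that consumed
# different attempts see different inputs within a single job.
print(first_attempt)   # [('row1', 10), ('row2', 20)]
print(second_attempt)  # [('row1', 10), ('row2', 99)]

# Reading from an immutable snapshot makes re-execution repeatable,
# even if the live store keeps changing afterwards.
snapshot = dict(store)
before_update = run_map_task(snapshot, split)
store["row1"] = -1
after_update = run_map_task(snapshot, split)
print(before_update == after_update)  # True
```

The last part is only meant to show why an immutable snapshot would help; it 
says nothing about how such a snapshot would actually be implemented.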

I guess I was more focused on Cassandra's architecture with respect to 
consistency, since it's often misunderstood, and on how to use consistency 
levels with MapReduce/Pig.
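
On the consistency-level side, a rough sketch of the usual replica-overlap 
rule may help: a read is guaranteed to see the latest acknowledged write when 
(replicas read) + (replicas written) > (replication factor).  The helper names 
below are made up for illustration and are not part of any Cassandra API.

```python
# Illustrative arithmetic only; these helpers are hypothetical and do not
# call Cassandra.

def quorum(replication_factor):
    """Number of replicas in a quorum: a strict majority."""
    return replication_factor // 2 + 1

def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read replica set overlaps every write replica set."""
    return read_replicas + write_replicas > replication_factor

rf = 3

# Writing and reading at ONE: the replica sets may not overlap,
# so a read can return stale data.
one_one = read_sees_latest_write(rf, write_replicas=1, read_replicas=1)

# Writing and reading at QUORUM: 2 + 2 > 3, so the sets always overlap.
quorum_quorum = read_sees_latest_write(rf, quorum(rf), quorum(rf))

print(one_one)        # False
print(quorum_quorum)  # True
```

Note that this only covers stale reads.  It does not make the input immutable, 
so the task re-execution issue above still applies.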

> 
> Regards,
> Mridul
> 
>> 
>>> 
>>> 
>>> Regards,
>>> Mridul
>>> 
>>> On Thursday 21 April 2011 04:29 AM, Bing Wei wrote:
>>>> Hi, All.
>>>> 
>>>> When I run a Pig query on Cassandra while Cassandra is being updated by
>>>> an application at the same time, what will happen? I may get inconsistent
>>>> results, right?
>>>> 
>>> 
>> 
> 
