Re: pig query on Cassandra

Mridul Muralidharan Thu, 21 Apr 2011 07:25:41 -0700

On Thursday 21 April 2011 06:41 PM, Jeremy Hanna wrote:


On Apr 21, 2011, at 3:19 AM, Mridul Muralidharan wrote:


In general (on hadoop based systems), if the input is not immutable - you can 
end up with issues during task re-execution, etc.
This happens not just for cassandra but for hbase, others too - where you 
modify data in-place.


So do you mean that between the time of the first execution and the time of the 
re-execution, input data can change?  Yes that's possible.  However, unless you 
are reading stale data the second time, it's not a consistency issue, is it?  I 
mean, if I am guaranteed to read the most recent data on the first execution 
and the second execution, that's consistent.  If I am reading updated data the 
second time, that's consistent and may or may not be a problem.

Just trying to make sure I understand.


To clarify, I am referring to re-execution of a task, not job.

From a (single) hadoop job point of view (and everything else which
consumes its output) - it is a consistency issue: the re-execution of
a task can generate a set of key/values which are different from the
initial invocation (which might have been used by some reducers).
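
A toy sketch of that scenario, in plain Python rather than real Hadoop or Cassandra APIs. Everything here is an assumption made for illustration: the `store` dict stands in for a mutable input, and `map_task` and `fetch_partition` are hypothetical helpers, not anything from either system.

```python
# Toy model only: a dict stands in for a mutable input store, and the two
# helper functions below are hypothetical, not Hadoop or Cassandra calls.

def map_task(store):
    """Emit (status, user) pairs from whatever the store holds right now."""
    return [(status, user) for user, status in sorted(store.items())]


def fetch_partition(map_output, key):
    """Return what the reducer responsible for `key` pulls from a map output."""
    return [user for status, user in map_output if status == key]


store = {"user1": "active", "user2": "inactive"}

# First attempt of the map task; the "active" reducer fetches its share.
first_attempt = map_task(store)
active_users = fetch_partition(first_attempt, "active")

# An application updates the input in place while the job is still running.
store["user1"] = "inactive"

# The same task is re-executed (say, after a node failure), and the
# "inactive" reducer fetches its share from the second attempt.
second_attempt = map_task(store)
inactive_users = fetch_partition(second_attempt, "inactive")

print(active_users)    # ['user1']
print(inactive_users)  # ['user1', 'user2']
```

Each attempt read the latest data at the time it ran, yet within the one job user1 ends up counted as both active and inactive, because the two reducers consumed output from different attempts of the same task.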



Regards,
Mridul



Regards,
Mridul

On Thursday 21 April 2011 04:29 AM, Bing Wei wrote:

Hi, All.

When I run a pig query on Cassandra while Cassandra is being updated by
an application at the same time, what will happen? I may get inconsistent
results, right?
