(1) The hash calculation costs only a small amount of CPU -- MD5 is specifically designed to be efficient in this kind of situation.
(2) We compute one hash per query, so for multiple columns the advantage over timestamp-per-column gets large quickly.
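To make the two points concrete, here is a minimal sketch of the idea, not Cassandra's actual code. The column layout (name, value, timestamp), the length-prefixed encoding, and the names `row_digest` and `timestamps` are all assumptions made for illustration. It shows a single MD5 digest covering every column in a result, and a case where two replicas carry identical timestamps but different values, so that only the digest reveals the mismatch.

```python
import hashlib
import struct


def row_digest(columns):
    """Return one MD5 digest covering every column in a query result.

    `columns` is an iterable of (name, value, timestamp) tuples, with
    name and value as bytes and timestamp as an int. This layout is an
    assumption for illustration, not Cassandra's wire format.
    """
    md5 = hashlib.md5()
    # Sort so that the digest does not depend on the order of the input.
    for name, value, timestamp in sorted(columns):
        # Length-prefix each field so adjacent fields cannot run together.
        md5.update(struct.pack(">I", len(name)))
        md5.update(name)
        md5.update(struct.pack(">I", len(value)))
        md5.update(value)
        md5.update(struct.pack(">q", timestamp))
    return md5.digest()


def timestamps(columns):
    """Hypothetical timestamp-only reply: one timestamp per column."""
    return {name: ts for name, _value, ts in columns}


# Two replicas whose "email" column differs in value but not in timestamp.
replica_a = [(b"email", b"a@example.com", 1310556000),
             (b"name", b"Ann", 1310556000)]
replica_b = [(b"email", b"b@example.com", 1310556000),
             (b"name", b"Ann", 1310556000)]

# The digests differ, so a coordinator comparing them would see the mismatch.
digests_match = row_digest(replica_a) == row_digest(replica_b)

# The timestamps are identical, so a timestamp-only comparison would not.
timestamps_match = timestamps(replica_a) == timestamps(replica_b)
```

In this sketch the digest is always 16 bytes no matter how many columns the query touches, while the timestamp-only reply grows by one entry per column.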
On Wed, Jul 13, 2011 at 7:31 AM, David Boxenhorn <da...@citypath.com> wrote:
> Is that the actual reason?
>
> This seems like a big inefficiency to me. For those of us who don't worry
> about this extreme edge case (that probably will NEVER happen in real life,
> for most applications), is there a way to turn this off?
>
> Or am I wrong about this making the operation MUCH more expensive?
>
>
> On Wed, Jul 13, 2011 at 3:20 PM, Boris Yen <yulin...@gmail.com> wrote:
>>
>> For a specific column, if there are two versions with the same timestamp,
>> the value of the column is used to break the tie.
>> if v1.value().compareTo(v2.value()) < 0, it means that v2 wins.
>>
>> On Wed, Jul 13, 2011 at 7:13 PM, David Boxenhorn <da...@citypath.com>
>> wrote:
>>>
>>> How would you know which data is correct, if they both have the same
>>> timestamp?
>>>
>>> On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen <yulin...@gmail.com> wrote:
>>>>
>>>> I can only say, "data" does matter, that is why the developers use hash
>>>> instead of timestamp. If hash value comes from other node is not a match, a
>>>> read repair would perform. so that correct data can be returned.
>>>>
>>>> On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn <da...@citypath.com>
>>>> wrote:
>>>>>
>>>>> If you have two pieces of data that are different but have the same
>>>>> timestamp, how can you resolve consistency?
>>>>>
>>>>> This is a pathological situation to begin with, why should you waste
>>>>> effort to (not) solve it?
>>>>>
>>>>> On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen <yulin...@gmail.com> wrote:
>>>>>>
>>>>>> I guess it is because the timestamp does not guarantee data
>>>>>> consistency, but hash does.
>>>>>> Boris
>>>>>>
>>>>>> On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn <da...@citypath.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I just saw this
>>>>>>>
>>>>>>> http://wiki.apache.org/cassandra/DigestQueries
>>>>>>>
>>>>>>> and I was wondering why it returns a hash of the data. Wouldn't it be
>>>>>>> better and easier to return the timestamp? You don't really care what
>>>>>>> the data is, you only care whether it is more or less recent than
>>>>>>> another piece of data.
>>>>>>
>>>>>
>>>>
>>>
>>
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com