RE: MD5 in the read path

2018-09-26 Thread Tyagi, Preetika
Makes sense. Thanks! -Original Message- From: Joseph Lynch [mailto:joe.e.ly...@gmail.com] Sent: Wednesday, September 26, 2018 9:02 PM To: dev@cassandra.apache.org Subject: Re: MD5 in the read path > > Thank you all for the response. > For RandomPartitioner, MD5 is used to avoid

Re: MD5 in the read path

2018-09-26 Thread Joseph Lynch
> > Thank you all for the response. > For RandomPartitioner, MD5 is used to avoid collision. However, why is it > necessary for comparing data between different replicas? Is it not feasible > to use CRC for data comparison? > My understanding is that it is not necessary to use MD5 and we can

RE: MD5 in the read path

2018-09-26 Thread Tyagi, Preetika
Thank you all for the response. For RandomPartitioner, MD5 is used to avoid collision. However, why is it necessary for comparing data between different replicas? Is it not feasible to use CRC for data comparison? Thanks, Preetika -Original Message- From: Elliott Sims

Re: MD5 in the read path

2018-09-26 Thread Elliott Sims
Would xxHash be large enough for digests? Looks like there's no 128-bit version yet, and it seems like 64 bits would be a bit short to avoid accidental collisions/matches. FarmHash128 or MetroHash128 might be a good choice. Not quite as fast as xxHash64, but not far off and still much, much

Re: MD5 in the read path

2018-09-26 Thread Joseph Lynch
Michael Kjellman and others (Jason, Sam, et al.) have already done a lot of work in 4.0 to help change the use of MD5 to something more modern [1][2]. Also I cut a ticket a little while back about the significant performance penalty of using MD5 for digests when doing quorum reads of wide

Re: MD5 in the read path

2018-09-26 Thread Elliott Sims
They also don't matter for digests, as long as we're assuming all nodes in the cluster are non-malicious (which is a pretty reasonable and probably necessary assumption). Or at least, deliberate collisions don't. Accidental collisions do, but 128 bits is sufficient to make that sufficiently

Re: MD5 in the read path

2018-09-26 Thread Brandon Williams
Collisions don't matter in the partitioner. On Wed, Sep 26, 2018, 6:53 PM Anirudh Kubatoor wrote: > Isn't MD5 broken from a security standpoint? From wikipedia > *"One basic requirement of any cryptographic hash function is that it > should be computationally infeasible

Re: MD5 in the read path

2018-09-26 Thread Anirudh Kubatoor
Isn't MD5 broken from a security standpoint? From Wikipedia: *"One basic requirement of any cryptographic hash function is that it should be computationally infeasible to find two non-identical messages which hash to the

Re: MD5 in the read path

2018-09-26 Thread Jeff Jirsa
In some installations, it's used for hashing the partition key to find the host (RandomPartitioner).
It's used for prepared statement IDs.
It's used for hashing the data for reads to know if the data matches on all different replicas.
We don't use CRC because conflicts would be really bad.
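
The read-path use described above (hashing the data so replicas can be compared without shipping the full result) can be sketched roughly as follows. This is only an illustration in Python, not Cassandra's actual Java implementation: the function names, the length-prefix framing, and the sample rows are all assumptions made for the example.

```python
import hashlib
import zlib


def md5_digest(rows):
    """Return a 128-bit MD5 digest over an ordered sequence of byte strings."""
    h = hashlib.md5()
    for row in rows:
        # Length-prefix each value so (b"ab", b"c") and (b"a", b"bc") hash differently.
        h.update(len(row).to_bytes(4, "big"))
        h.update(row)
    return h.digest()


def crc32_digest(rows):
    """Return a 32-bit CRC over the same framing, for size comparison only."""
    crc = 0
    for row in rows:
        crc = zlib.crc32(len(row).to_bytes(4, "big"), crc)
        crc = zlib.crc32(row, crc)
    return crc


def replicas_agree(full_response, digest_from_other_replica):
    """Compare a full data response against a digest-only response."""
    return md5_digest(full_response) == digest_from_other_replica


# Hypothetical replica contents: B matches A, C holds a stale value.
replica_a = [b"k1:v1", b"k2:v2"]
replica_b = [b"k1:v1", b"k2:v2"]
replica_c = [b"k1:v1", b"k2:STALE"]

print(replicas_agree(replica_a, md5_digest(replica_b)))
print(replicas_agree(replica_a, md5_digest(replica_c)))
```

The point of the sketch is the size difference: the MD5 digest is 16 bytes, while the CRC32 value has only 32 bits, so two different results are far more likely to produce the same CRC by accident and be treated as matching.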

Re: MD5 in the read path

2018-09-26 Thread Elliott Sims
Thanks to open source, you can answer yourself: https://github.com/apache/cassandra/search?q=md5_q=md5 At a glance, it looks like it's used for digest verification and to get a good hash distribution on the RandomPartitioner. I haven't done the math, but I suspect CRC32's just not good enough either
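
A back-of-the-envelope version of "the math" for accidental digest matches might look like the sketch below. It assumes an idealized hash whose output is uniformly distributed, so two different payloads share a digest with probability 2^-bits; a real CRC is not an ideal hash, so treat this as a rough estimate. The workload figure (100,000 comparisons per second between replicas whose data really differs, sustained for a year) is made up purely for illustration.

```python
def false_match_probability(bits):
    """Chance that two different payloads share an ideal `bits`-bit digest."""
    return 2.0 ** -bits


def expected_false_matches(bits, comparisons):
    """Expected number of accidental matches over `comparisons` mismatched pairs."""
    return comparisons * false_match_probability(bits)


# Hypothetical workload: 100,000 mismatched comparisons per second for one year.
COMPARISONS = 100_000 * 60 * 60 * 24 * 365

for bits in (32, 64, 128):
    print(bits, expected_false_matches(bits, COMPARISONS))
```

Under these assumptions a 32-bit check would be expected to miss real differences hundreds of times a year, while 64-bit and 128-bit digests push the expected count far below one.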

MD5 in the read path

2018-09-26 Thread Tyagi, Preetika
Hi all, I have a question about MD5 being used in the read path in Cassandra. I wanted to understand what exactly it is being used for, and why something like CRC, which is less complex than MD5, is not used instead. Thanks, Preetika

Re: QA signup

2018-09-26 Thread Jay Zhuang
+1 for publishing official snapshot artifacts for 4.0 and even other branches. We're publishing snapshot artifacts to our internal artifactory. One minor bug we found is: currently build.xml won't publish any snapshot artifact: https://issues.apache.org/jira/browse/CASSANDRA-12704 On Thu, Sep