[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern
[ https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840218#comment-16840218 ]

Benedict commented on CASSANDRA-13292:
--

https://github.com/rurban/smhasher

> Replace MessagingService usage of MD5 with something more modern
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Priority: Normal
> Attachments: quorum-concurrency-reads-quorum.svg
>
> While profiling C* via multiple profilers, I've consistently seen a
> significant amount of time being spent calculating MD5 digests.
> {code}
> Stack Trace | Sample Count | Percentage(%)
> sun.security.provider.MD5.implCompress(byte[], int) | 264 | 1.566
> sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int) | 200 | 1.187
> sun.security.provider.DigestBase.engineUpdate(byte[], int, int) | 200 | 1.187
> java.security.MessageDigestSpi.engineUpdate(ByteBuffer) | 200 | 1.187
> java.security.MessageDigest$Delegate.engineUpdate(ByteBuffer) | 200 | 1.187
> java.security.MessageDigest.update(ByteBuffer) | 200 | 1.187
> org.apache.cassandra.db.Column.updateDigest(MessageDigest) | 193 | 1.145
> org.apache.cassandra.db.ColumnFamily.updateDigest(MessageDigest) | 193 | 1.145
> org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily) | 193 | 1.145
> org.apache.cassandra.service.RowDigestResolver.resolve() | 106 | 0.629
> org.apache.cassandra.service.RowDigestResolver.resolve() | 106 | 0.629
> org.apache.cassandra.service.ReadCallback.get() | 88 | 0.522
> org.apache.cassandra.service.AbstractReadExecutor.get() | 88 | 0.522
> org.apache.cassandra.service.StorageProxy.fetchRows(List, ConsistencyLevel) | 88 | 0.522
> org.apache.cassandra.service.StorageProxy.read(List, ConsistencyLevel) | 88 | 0.522
> org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(int, ConsistencyLevel, boolean) | 88 | 0.522
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(int) | 88 | 0.522
> org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(int) | 88 | 0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions) | 88 | 0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions) | 88 | 0.522
> org.apache.cassandra.cql3.QueryProcessor.processStatement(CQLStatement, QueryState, QueryOptions) | 88 | 0.522
> org.apache.cassandra.cql3.QueryProcessor.process(String, QueryState, QueryOptions) | 88 | 0.522
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryState) | 88 | 0.522
> org.apache.cassandra.transport.Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent) | 88 | 0.522
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(ChannelHandlerContext, ChannelEvent) | 88 | 0.522
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent) | 88 | 0.522
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent) | 88 | 0.522
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun() | 88 | 0.522
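The hot path in the quoted stack trace, where each column feeds its bytes into a shared `MessageDigest` via `update(ByteBuffer)`, can be sketched roughly as below. This is a minimal illustration using only the JDK; the class and method names (`IncrementalDigestSketch`, `digestCells`, `newMd5`) are hypothetical and are not Cassandra's `Column` or `ColumnFamily` code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch; not Cassandra's Column/ColumnFamily implementation.
final class IncrementalDigestSketch {

    // Feed every cell's bytes into one shared MessageDigest, then finish it.
    static byte[] digestCells(List<ByteBuffer> cells) {
        MessageDigest md5 = newMd5();
        for (ByteBuffer cell : cells) {
            // duplicate() so the caller's buffer position is left untouched
            md5.update(cell.duplicate());
        }
        return md5.digest();
    }

    // MD5 is one of the algorithms every Java platform must provide.
    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        List<ByteBuffer> cells = List.of(
            ByteBuffer.wrap("name".getBytes(StandardCharsets.UTF_8)),
            ByteBuffer.wrap("value".getBytes(StandardCharsets.UTF_8)));
        byte[] digest = digestCells(cells);
        System.out.println("digest length in bytes: " + digest.length);
    }
}
```

Every byte of the result passes through the MD5 compression function, so the cost grows with the size of the data being digested, which is consistent with `MD5.implCompress` sitting at the top of the profile.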
[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern
[ https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839229#comment-16839229 ]

Elliott Sims commented on CASSANDRA-13292:
--

In terms of hash algorithms, a cryptographic hash is one that's expensive to invert; being cryptographic doesn't necessarily affect collision probabilities. For digests, I don't think difficulty of inversion matters at all, since the digest is definitely not trying to hide the original data or protect against deliberate corruption. What does matter is output size and distribution. So any fast 128-bit hash with good distribution should be equivalent to MD5: Murmur3F (faster than MD5 but slower than the rest, well-supported; the greenrobot implementation claims to be much faster than Guava's), xxH3 (fast, brand new/unstable, possible collisions), Farmhash128, Spookyhash128.

Default/reference implementations seem to all be in C/C++, along with most benchmarks, so "best/fastest" may not be the same as "best/fastest in available Java libraries with compatible licenses".

> Replace MessagingService usage of MD5 with something more modern
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Priority: Normal
> Attachments: quorum-concurrency-reads-quorum.svg
>
> While profiling C* via multiple profilers, I've consistently seen a
> significant amount of time being spent calculating MD5 digests.
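To put a rough number on the "output size" point in the comment above: assuming an idealized hash whose b-bit output is uniformly distributed (an assumption; real functions only approximate this, and Adler-32 in particular is known to distribute poorly on short inputs), the chance that two different replica responses yield the same digest is approximately:

```latex
P(\text{collision}) \approx 2^{-b}, \qquad
b = 128:\ 2^{-128} \approx 2.9 \times 10^{-39}, \qquad
b = 32:\ 2^{-32} \approx 2.3 \times 10^{-10}
```

Under that assumption, a well-distributed non-cryptographic 128-bit hash is no more likely than MD5 to hide a real difference between two replicas, whereas a 32-bit checksum is weaker by many orders of magnitude.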
[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern
[ https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823594#comment-16823594 ]

Jon Haddad commented on CASSANDRA-13292:
--

I was recently doing some testing on 3.11.4 to see the impact of using huge partitions, and noticed digest calculations are taking up over 50% of CPU time. Icicle graph attached. [^quorum-concurrency-reads-quorum.svg]

> Replace MessagingService usage of MD5 with something more modern
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Priority: Normal
> Attachments: quorum-concurrency-reads-quorum.svg
>
> While profiling C* via multiple profilers, I've consistently seen a
> significant amount of time being spent calculating MD5 digests.
[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern
[ https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740157#comment-16740157 ]

JinhuaLuo commented on CASSANDRA-13292:
--

I have a question: Adler and Murmur3 are not _cryptographic_ hashes, so different inputs may collide. That is, two different query results may produce the same digest value. But the digest request is used to check whether all replicas contain the same data for the specific query, so if the hash does not reflect the actual difference, it would give a wrong result and not trigger read repair.

But I also think the digest is heavyweight and brings unnecessary overhead, especially when it is calculated over large, unchanged data. I'm wondering whether we could introduce a digest cache: if the schema and the queried columns (or fields in complex columns) have not been mutated, the digest request could be fulfilled directly from the cache.

> Replace MessagingService usage of MD5 with something more modern
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Priority: Major
>
> While profiling C* via multiple profilers, I've consistently seen a
> significant amount of time being spent calculating MD5 digests.
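The check described in the comment above, where digests stand in for full responses and a mismatch is what prompts read repair, might look roughly like the following. This is a minimal sketch using only the JDK; `DigestMismatchSketch`, `digest` and `replicasAgree` are hypothetical names for illustration, not Cassandra's `RowDigestResolver` logic.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch of a digest read check; names are illustrative only.
final class DigestMismatchSketch {

    // Digest of one replica's serialized result (MD5, as in the ticket).
    static byte[] digest(byte[] replicaResult) {
        try {
            return MessageDigest.getInstance("MD5").digest(replicaResult);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // True when every replica digest equals the digest of the full data response.
    static boolean replicasAgree(byte[] dataResponse, List<byte[]> replicaDigests) {
        byte[] expected = digest(dataResponse);
        for (byte[] replicaDigest : replicaDigests) {
            if (!MessageDigest.isEqual(expected, replicaDigest)) {
                return false; // a mismatch is what would prompt read repair
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] fresh = "k=1,v=new".getBytes(StandardCharsets.UTF_8);
        byte[] stale = "k=1,v=old".getBytes(StandardCharsets.UTF_8);
        System.out.println(replicasAgree(fresh, List.of(digest(fresh), digest(fresh))));
        System.out.println(replicasAgree(fresh, List.of(digest(fresh), digest(stale))));
    }
}
```

The concern raised in the comment corresponds to the case where `replicasAgree` returns true even though the underlying results differ; how likely that is depends on the digest's output size and distribution rather than on whether the function is cryptographic.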
[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern
[ https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893145#comment-15893145 ]

Michael Kjellman commented on CASSANDRA-13292:
--

I have a patch for this as a separate commit on top of CASSANDRA-13291. I'll hold off attaching one until we have some discussion about what hashing implementations we might want to go with -- and after CASSANDRA-13291 is +1'ed (which takes care of the bulk of changes required to make this change).

> Replace MessagingService usage of MD5 with something more modern
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
>
> While profiling C* via multiple profilers, I've consistently seen a
> significant amount of time being spent calculating MD5 digests.