[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2019-05-15 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840218#comment-16840218
 ] 

Benedict commented on CASSANDRA-13292:
--

https://github.com/rurban/smhasher

> Replace MessagingService usage of MD5 with something more modern
> 
>
> Key: CASSANDRA-13292
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13292
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Core
> Reporter: Michael Kjellman
> Assignee: Michael Kjellman
> Priority: Normal
> Attachments: quorum-concurrency-reads-quorum.svg
>
>
> While profiling C* via multiple profilers, I've consistently seen a 
> significant amount of time being spent calculating MD5 digests.
> {code}
> Stack Trace  Sample Count  Percentage(%)
> sun.security.provider.MD5.implCompress(byte[], int)  264  1.566
> sun.security.provider.DigestBase.implCompressMultiBlock(byte[], int, int)  200  1.187
> sun.security.provider.DigestBase.engineUpdate(byte[], int, int)  200  1.187
> java.security.MessageDigestSpi.engineUpdate(ByteBuffer)  200  1.187
> java.security.MessageDigest$Delegate.engineUpdate(ByteBuffer)  200  1.187
> java.security.MessageDigest.update(ByteBuffer)  200  1.187
> org.apache.cassandra.db.Column.updateDigest(MessageDigest)  193  1.145
> org.apache.cassandra.db.ColumnFamily.updateDigest(MessageDigest)  193  1.145
> org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily)  193  1.145
> org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
> org.apache.cassandra.service.RowDigestResolver.resolve()  106  0.629
> org.apache.cassandra.service.ReadCallback.get()  88  0.522
> org.apache.cassandra.service.AbstractReadExecutor.get()  88  0.522
> org.apache.cassandra.service.StorageProxy.fetchRows(List, ConsistencyLevel)  88  0.522
> org.apache.cassandra.service.StorageProxy.read(List, ConsistencyLevel)  88  0.522
> org.apache.cassandra.service.pager.SliceQueryPager.queryNextPage(int, ConsistencyLevel, boolean)  88  0.522
> org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(int)  88  0.522
> org.apache.cassandra.service.pager.SliceQueryPager.fetchPage(int)  88  0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.statements.SelectStatement.execute(QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.QueryProcessor.processStatement(CQLStatement, QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.cql3.QueryProcessor.process(String, QueryState, QueryOptions)  88  0.522
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryState)  88  0.522
> org.apache.cassandra.transport.Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent)  88  0.522
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(ChannelHandlerContext, ChannelEvent)  88  0.522
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent)  88  0.522
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent)  88  0.522
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun()  88  0.522

[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2019-05-14 Thread Elliott Sims (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839229#comment-16839229
 ] 

Elliott Sims commented on CASSANDRA-13292:
--

In terms of hash algorithms, a cryptographic hash is one that's expensive to 
invert; that property doesn't necessarily affect collision probabilities. For 
digests, I don't think difficulty of inversion matters at all, since the digest 
is definitely not trying to hide the original data or protect against 
deliberate corruption.

What does matter is output size and distribution. So any fast 128-bit hash 
with good distribution should be equivalent to MD5 for this purpose: 
Murmur3F (faster than MD5 but slower than the rest, well-supported; the 
greenrobot implementation claims to be much faster than Guava's), 
XXH3 (fast, brand new/unstable, possible collisions), 
FarmHash128, 
SpookyHash128.

Default/reference implementations all seem to be in C/C++, along with most 
benchmarks, so "best/fastest" may not be the same as "best/fastest in available 
Java libraries with compatible licenses".
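
To put rough numbers on the "output size" point, here is a minimal sketch of the standard birthday approximation, p ≈ n(n-1)/2^(b+1), for n digests from an ideal, uniformly distributed b-bit hash. The class name, method name, and the figure of 10^12 digests are illustrative assumptions, not measurements from Cassandra, and a real hash function only approaches this bound if its distribution is good.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

// Hypothetical helper, not part of Cassandra: estimates accidental-collision odds.
final class BirthdayBound {

    /**
     * Birthday approximation for the probability of at least one collision
     * among n uniformly random hashes of the given bit width:
     * p ~= n(n-1) / 2^(bits+1), capped at 1.0 where the approximation breaks down.
     */
    static double collisionProbability(long n, int bits) {
        BigInteger pairsTimesTwo = BigInteger.valueOf(n).multiply(BigInteger.valueOf(n - 1));
        BigInteger denominator = BigInteger.ONE.shiftLeft(bits + 1);
        double p = new BigDecimal(pairsTimesTwo)
                .divide(new BigDecimal(denominator), MathContext.DECIMAL64)
                .doubleValue();
        return Math.min(1.0, p);
    }

    public static void main(String[] args) {
        long digests = 1_000_000_000_000L; // assumed workload: 10^12 digests
        for (int bits : new int[] {32, 64, 128}) {
            System.out.printf("%d-bit hash: p ~= %.3e%n", bits, collisionProbability(digests, bits));
        }
    }
}
```

Under this idealized model the hash family does not appear in the formula at all: only the width and the number of digests do, which is why a well-distributed 128-bit non-cryptographic hash can stand in for MD5 when nobody is deliberately constructing collisions.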


[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2019-04-22 Thread Jon Haddad (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823594#comment-16823594
 ] 

Jon Haddad commented on CASSANDRA-13292:


I was recently doing some testing on 3.11.4 to see the impact of using huge 
partitions, and noticed digest calculations are taking up over 50% of CPU time.

Icicle graph attached.

 [^quorum-concurrency-reads-quorum.svg] 


[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2019-01-11 Thread JinhuaLuo (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740157#comment-16740157
 ] 

JinhuaLuo commented on CASSANDRA-13292:
---

I have a question: Adler and Murmur3 are not _cryptographic_ hashes, so two 
different inputs may collide. That is, two different query results may produce 
the same digest value. But a digest request is used to check whether all 
replicas contain the same data for a specific query, so if the hash does not 
reflect an actual difference, the check would give the wrong result and would 
not trigger read repair.
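
The check described above can be sketched as follows. This is a minimal illustration using only `java.security.MessageDigest`, not Cassandra's actual read path; the class and method names are hypothetical, and each replica's result is assumed to be available as a plain byte array.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a digest comparison between two replica results.
final class DigestCompare {

    /** MD5 digest of one replica's serialized result (16 bytes). */
    static byte[] digest(byte[] replicaResult) {
        try {
            return MessageDigest.getInstance("MD5").digest(replicaResult);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** True when both results hash to the same digest, i.e. no repair would be triggered. */
    static boolean replicasAgree(byte[] resultA, byte[] resultB) {
        return MessageDigest.isEqual(digest(resultA), digest(resultB));
    }

    public static void main(String[] args) {
        byte[] a = "row1:v1".getBytes(StandardCharsets.UTF_8);
        byte[] b = "row1:v2".getBytes(StandardCharsets.UTF_8);
        System.out.println("same data agrees: " + replicasAgree(a, a.clone()));
        System.out.println("different data agrees: " + replicasAgree(a, b));
    }
}
```

A collision in this sketch would mean `replicasAgree` returns true for two different results, which is exactly the case where a needed read repair would be skipped.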

But I also think the digest is heavyweight and adds unnecessary overhead, 
especially when it is recalculated over large, unchanged data.

I'm wondering whether we could introduce a digest cache: if the schema and the 
queried columns (or fields in complex columns) have not been mutated, the 
digest request could be fulfilled directly from the cache.
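
One way to picture the cache idea is the sketch below. It is not a design for Cassandra: the `DigestCache` class, the string query key, and the `version` number standing in for "nothing was mutated" are all assumptions made for illustration, and how such a version would actually be tracked is left open.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical digest cache: recomputes a digest only when the data version changes.
final class DigestCache {

    private static final class Entry {
        final long version;
        final byte[] digest;

        Entry(long version, byte[] digest) {
            this.version = version;
            this.digest = digest;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

    /** Counts real digest computations, so cache hits are observable. */
    final AtomicInteger computations = new AtomicInteger();

    /** Returns the cached digest if the version is unchanged, otherwise recomputes and stores it. */
    byte[] digestFor(String queryKey, long version, byte[] data) {
        Entry entry = cache.compute(queryKey, (key, old) -> {
            if (old != null && old.version == version) {
                return old; // unchanged since the last digest: reuse it
            }
            computations.incrementAndGet();
            return new Entry(version, md5(data));
        });
        return entry.digest.clone(); // defensive copy so callers cannot alter the cached bytes
    }

    private static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DigestCache digests = new DigestCache();
        byte[] data = "large unchanged partition".getBytes(StandardCharsets.UTF_8);
        digests.digestFor("query-1", 1L, data);
        digests.digestFor("query-1", 1L, data);
        System.out.println("digest computations: " + digests.computations.get());
    }
}
```

The sketch trades memory and invalidation complexity for CPU: it only helps if a reliable, cheap change indicator exists, since a stale version would hide a real difference between replicas.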


[jira] [Commented] (CASSANDRA-13292) Replace MessagingService usage of MD5 with something more modern

2017-03-02 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893145#comment-15893145
 ] 

Michael Kjellman commented on CASSANDRA-13292:
--

I have a patch for this as a separate commit on top of CASSANDRA-13291.

I'll hold off attaching one until we have some discussion about what hashing 
implementations we might want to go with -- and after CASSANDRA-13291 is +1'ed 
(which takes care of the bulk of changes required to make this change).
