[jira] [Updated] (CASSANDRA-5664) Improve serialization in the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-5664: Attachment: 0002-Avoid-copy-when-compressing-native-protocol-frames.txt 0001-Rewrite-encoding-methods.txt Thanks for the remarks. And while I don't disagree with those, it seems to me that they all require a bit more investigation work to see whether they are worth the trouble, and while we should certainly do this additional work at some point, I think the patches here do improve the cleanness of the current serialization code enough that I'd rather leave those to a follow up and commit this now (provided we agree that the patches are already an improvement on the status quo). So attaching rebased patches but so far they are just rebased, nothing more. But to address the observations more precisely: bq. There's quite a bit of string encoding and object serialization going on in some of the encodedSize() methods. This means that strings/objects will be encoded/serialized twice. True. For strings, while I'm far from being an expert in unicode encodings and charsets, I'm not sure how one could compute the size of the result of UTF8 encoding without doing the actual encoding. Of course, the current CBUtil.sizeOfString does an array allocation which in theory could be avoided, but I'm not aware of a method that return the size of an encoded string without encoding it (and I'm certainly not writing one manually). So I'm open to suggestions, but I doubt this is much of a bottleneck in practice (see below for more on that) so... As for object serialization (that are not strings), I'm not totally sure which ones you are referring to exactly but I don't see anything that looks like a great waste of effort. bq. byte[] allocation and copying in encode() should be possible to avoid when serializing strings by careful use of ChannelBuffer.toByteBuffer(), CharBuffer.wrap() and CharsetEncoder.encode(). You're probably right, but I believe that means assuming (correct me if I'm wrong) that the result of ChannelBuffer.toByteBuffer() *will* share content with the ChannelBuffer it is called on, which is not guaranteed by that API. And while it is true that as of this patch we only call CBUtil.writeString() on ChannelBuffer for which this will be true, it bothers me to rely on it as this could easily break in a very hard to notice way in the future. Furthermore, I'm reasonably confident that this doesn't really matter in practice (performance wise) since if you use prepared statement (which you should if you care about performance) and with the optimization of the v2 protocol of not returning metadata in the result set for said prepared statement, I don't think there's any string serialization going on in the hot path. bq. It might be worth investigating if the code duplication in encode() and encodedSize() can be eliminated It does is worth the investigation. But we already do similar code duplication for other serialization code (namely IVersionedSerializer stuffs). So given the patch is already written and since, as said above, I'm pretty convinced this does already is better than the status quo, I'd rather left such refactoring to a latter ticket that could consider cleaning up all existing serialization code (though on a personal level, that particular code duplication from IVersionedSerializer has never bothered me much in practice so I don't see fixing it a priority). Improve serialization in the native protocol Key: CASSANDRA-5664 URL: https://issues.apache.org/jira/browse/CASSANDRA-5664 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0 Attachments: 0001-Rewrite-encoding-methods.txt, 0002-Avoid-copy-when-compressing-native-protocol-frames.txt Message serialization in the native protocol currently make a Netty's ChannelBuffers.wrappedBuffer(). The rational was to avoid copying of the values bytes when such value are biggish. This has a cost however, especially with lots of small values, and as suggested in CASSANDRA-5422, this might well be a more common scenario for Cassandra, so let's consider directly serializing in a newly allocated buffer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5664) Improve serialization in the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-5664: Attachment: (was: 0002-Avoid-copy-when-compressing-native-protocol-frames.txt) Improve serialization in the native protocol Key: CASSANDRA-5664 URL: https://issues.apache.org/jira/browse/CASSANDRA-5664 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0 Attachments: 0001-Rewrite-encoding-methods.txt, 0002-Avoid-copy-when-compressing-native-protocol-frames.txt Message serialization in the native protocol currently make a Netty's ChannelBuffers.wrappedBuffer(). The rational was to avoid copying of the values bytes when such value are biggish. This has a cost however, especially with lots of small values, and as suggested in CASSANDRA-5422, this might well be a more common scenario for Cassandra, so let's consider directly serializing in a newly allocated buffer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5664) Improve serialization in the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-5664: Attachment: (was: 0001-Rewrite-encoding-methods.txt) Improve serialization in the native protocol Key: CASSANDRA-5664 URL: https://issues.apache.org/jira/browse/CASSANDRA-5664 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0 Attachments: 0001-Rewrite-encoding-methods.txt, 0002-Avoid-copy-when-compressing-native-protocol-frames.txt Message serialization in the native protocol currently make a Netty's ChannelBuffers.wrappedBuffer(). The rational was to avoid copying of the values bytes when such value are biggish. This has a cost however, especially with lots of small values, and as suggested in CASSANDRA-5422, this might well be a more common scenario for Cassandra, so let's consider directly serializing in a newly allocated buffer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5664) Improve serialization in the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-5664: -- Can you review [~danielnorberg]? Improve serialization in the native protocol Key: CASSANDRA-5664 URL: https://issues.apache.org/jira/browse/CASSANDRA-5664 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0 Attachments: 0001-Rewrite-encoding-methods.txt, 0002-Avoid-copy-when-compressing-native-protocol-frames.txt Message serialization in the native protocol currently make a Netty's ChannelBuffers.wrappedBuffer(). The rational was to avoid copying of the values bytes when such value are biggish. This has a cost however, especially with lots of small values, and as suggested in CASSANDRA-5422, this might well be a more common scenario for Cassandra, so let's consider directly serializing in a newly allocated buffer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5664) Improve serialization in the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-5664: -- Reviewer: danielnorberg Can you review [~danielnorberg]? Improve serialization in the native protocol Key: CASSANDRA-5664 URL: https://issues.apache.org/jira/browse/CASSANDRA-5664 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0 Attachments: 0001-Rewrite-encoding-methods.txt, 0002-Avoid-copy-when-compressing-native-protocol-frames.txt Message serialization in the native protocol currently make a Netty's ChannelBuffers.wrappedBuffer(). The rational was to avoid copying of the values bytes when such value are biggish. This has a cost however, especially with lots of small values, and as suggested in CASSANDRA-5422, this might well be a more common scenario for Cassandra, so let's consider directly serializing in a newly allocated buffer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5664) Improve serialization in the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-5664: Attachment: 0002-Avoid-copy-when-compressing-native-protocol-frames.txt 0001-Rewrite-encoding-methods.txt Attaching patch for this. This does clean up serialization a bit, will probably be a bit faster and allocate less objects. Last but not least, this make it easy to not do another copy when we compress the frames, which is what the 2nd patch does. Improve serialization in the native protocol Key: CASSANDRA-5664 URL: https://issues.apache.org/jira/browse/CASSANDRA-5664 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Priority: Minor Attachments: 0001-Rewrite-encoding-methods.txt, 0002-Avoid-copy-when-compressing-native-protocol-frames.txt Message serialization in the native protocol currently make a Netty's ChannelBuffers.wrappedBuffer(). The rational was to avoid copying of the values bytes when such value are biggish. This has a cost however, especially with lots of small values, and as suggested in CASSANDRA-5422, this might well be a more common scenario for Cassandra, so let's consider directly serializing in a newly allocated buffer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira