[jira] [Commented] (CASSANDRA-6060) Remove internal use of Strings for ks/cf names

2015-03-03 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345250#comment-14345250
 ] 

 Brian Hess commented on CASSANDRA-6060:


I know this ticket is closed, but there is another use case that might make 
this more useful.  Namely, with the advent of CTAS (CASSANDRA-8234), you could 
want to change the primary key of a table.  To do that, you could create a new 
table with the new primary key and select the old data into it.  The last step, 
for cleanliness, might be to drop the original table and rename the new table 
to the original table name, thereby completing the change of the primary key.

 Remove internal use of Strings for ks/cf names
 --

 Key: CASSANDRA-6060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance

 We toss a lot of Strings around internally, including across the network.  
 Once a request has been Prepared, we ought to be able to encode these as int 
 ids.
 Unfortunately, we moved from int to uuid in CASSANDRA-3794, which was a 
 reasonable move at the time, but a uuid is a lot bigger than an int.  Now 
 that we have CAS we can allow concurrent schema updates while still using 
 sequential int IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6060) Remove internal use of Strings for ks/cf names

2014-12-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241364#comment-14241364
 ] 

Ariel Weisberg commented on CASSANDRA-6060:
---

I am still digging but I am not sure there is much value here.

For prepared statements between client and server there are no ks/cf names.

Here is the breakdown for a minimum-size mutation inside the cluster:

Size of Ethernet frame - 24 bytes
Size of IPv4 header (without any options) - 20 bytes
Size of TCP header (without any options) - 20 bytes

4-bytes protocol magic
4-bytes version
4-bytes timestamp
4-bytes verb
4-bytes parameter count
4-bytes payload length prefix
No keyspace name in current versions
2-byte key length
key say 10 bytes
4-byte mutation count

1-byte boolean
16-byte cf id
4-byte count of columns

Per column
2-byte column name length prefix
column name say 8 bytes
1-byte serialization flags
8-byte timestamp
4-byte length prefix
column value say 8 bytes

Total is 158 bytes. Saving 12 bytes by replacing the 16-byte CF uuid with a 
4-byte int would be about 7.5%. 
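
The breakdown above can be tallied with a short script. This is only a sketch: 
the field labels are names chosen here for illustration, not identifiers from 
the Cassandra code base, and the sizes are copied from the list. The itemized 
sizes sum to 156 bytes, two fewer than the 158 quoted, so the quoted total may 
include bytes that are not itemized. Either way the saving is a bit under 8%.

```python
# Sizes in bytes, copied from the breakdown above. The labels are
# illustrative names, not identifiers from the Cassandra code base.
FRAMING = {"ethernet": 24, "ipv4": 20, "tcp": 20}
MESSAGE = {
    "magic": 4, "version": 4, "timestamp": 4, "verb": 4,
    "parameter_count": 4, "payload_length": 4,
    "key_length": 2, "key": 10, "mutation_count": 4,
}
PER_CF = {"boolean": 1, "cf_id": 16, "column_count": 4}
PER_COLUMN = {
    "name_length": 2, "name": 8, "flags": 1,
    "timestamp": 8, "value_length": 4, "value": 8,
}

UUID_SIZE = 16                  # current cf id
INT_SIZE = 4                    # proposed int id
SAVING = UUID_SIZE - INT_SIZE   # 12 bytes per cf id


def itemized_total() -> int:
    """Sum every size listed in the breakdown (one CF, one column)."""
    groups = (FRAMING, MESSAGE, PER_CF, PER_COLUMN)
    return sum(sum(group.values()) for group in groups)


def saving_percent(total: int) -> float:
    """Percentage of `total` saved by shrinking one cf id to an int."""
    return 100.0 * SAVING / total


if __name__ == "__main__":
    total = itemized_total()
    print(f"itemized total: {total} bytes")
    print(f"saving vs itemized total: {saving_percent(total):.1f}%")
    print(f"saving vs quoted 158 bytes: {saving_percent(158):.1f}%")
```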

For single CF mutations this is not a win. Loading data points 16 bytes at a 
time isn't going to work so hot anyways so people might look into batching at 
that point.

The UUID is not repeated for each cell, so it is a one-time cost for 
workloads that modify multiple cells per CF. The one case where the 12 bytes 
become significant is single-cell updates to multiple CFs in one mutation. 
There the 12-byte overhead converges on 23%.
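
The 23% figure can be checked with a few lines of arithmetic. The sketch below 
assumes the itemized sizes from the breakdown: 104 bytes of fixed overhead 
(framing, message header, key and mutation count) and 52 bytes per CF carrying 
a single cell (21-byte CF header plus one 31-byte column). It also ignores 
extra packet framing for large mutations, which is a simplification. The names 
are hypothetical.

```python
# Illustrative constants derived from the itemized breakdown; the names
# are made up for this sketch.
FIXED_OVERHEAD = 64 + 24 + 16   # framing + message header + key/mutation count
PER_CF_SINGLE_CELL = 21 + 31    # per-CF header + one column
SAVING_PER_CF = 16 - 4          # 16-byte uuid replaced by a 4-byte int


def saving_fraction(cf_count: int) -> float:
    """Fraction of the mutation size saved when each of `cf_count` CFs
    carries exactly one cell and every cf id shrinks by 12 bytes."""
    saved = SAVING_PER_CF * cf_count
    total = FIXED_OVERHEAD + PER_CF_SINGLE_CELL * cf_count
    return saved / total


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        print(f"{n:5d} CFs: {100 * saving_fraction(n):.2f}% saved")
    # The fixed overhead is amortized away as the CF count grows,
    # so the fraction approaches 12 / 52.
    print(f"limit: {100 * SAVING_PER_CF / PER_CF_SINGLE_CELL:.2f}%")
```

As the number of CFs grows, the fixed overhead is amortized away and the 
fraction approaches 12/52, which is where the 23% comes from.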

I am going to look at the read path next, but I kind of expect to find 
something similar. A read is going to have key overhead and possibly overhead 
for all the other query parameters that should match the simple single cell 
mutation case.

 Remove internal use of Strings for ks/cf names
 --

 Key: CASSANDRA-6060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0







[jira] [Commented] (CASSANDRA-6060) Remove internal use of Strings for ks/cf names

2014-04-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13968216#comment-13968216
 ] 

Benedict commented on CASSANDRA-6060:
-

Bumping to 3.0, as this won't make it into 2.1 now

 Remove internal use of Strings for ks/cf names
 --

 Key: CASSANDRA-6060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Vijay
  Labels: performance
 Fix For: 3.0




