[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503192#comment-15503192 ]

Sylvain Lebresne commented on CASSANDRA-12367:

Updated patch looks good, but we should have some basic tests for this before committing.

bq. I don't feel strongly either way since I also agree that both options have merit. I've left the check in for now but I have no objection to removing it if others feel strongly.

Not really feeling strongly either. Ok to leave it as is for now.

> Add an API to request the size of a CQL partition
> -------------------------------------------------
>
>                 Key: CASSANDRA-12367
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12367
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Geoffrey Yu
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 12367-trunk-v2.txt, 12367-trunk.txt
>
>
> It would be useful to have an API that we could use to get the total
> serialized size of a CQL partition, scoped by keyspace and table, on disk.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497594#comment-15497594 ]

sankalp kohli commented on CASSANDRA-12367:

I think we should return something like -1, not 0, if the key is not replicated to the box. 0 should mean the key is not present on that instance, while -1 tells you that you are not calling the correct instance.
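The distinction argued for in this comment can be sketched as below. This is only an illustration: the class name, the `NOT_REPLICATED` constant, and the set and map standing in for replica ownership and stored data are all hypothetical and are not taken from the attached patch.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical model of one node; not code from the attached patch. */
final class PartitionSizeLookup {
    /** Assumed sentinel: this node is not a replica for the key. */
    static final long NOT_REPLICATED = -1L;

    // Keys this node replicates (a stand-in for a real token-range check).
    private final Set<String> ownedKeys;
    // Serialized bytes per key actually stored on this node.
    private final Map<String, Long> storedSizes;

    PartitionSizeLookup(Set<String> ownedKeys, Map<String, Long> storedSizes) {
        this.ownedKeys = ownedKeys;
        this.storedSizes = storedSizes;
    }

    long sizeInBytes(String key) {
        if (!ownedKeys.contains(key)) {
            // The caller asked the wrong instance.
            return NOT_REPLICATED;
        }
        // Right instance; the partition may simply be absent, which yields 0.
        return storedSizes.getOrDefault(key, 0L);
    }
}
```

With this shape a caller can tell "wrong node" (-1) apart from "right node, no data" (0), which is the point being made; whether a sentinel or an error is the better signal is left open in the thread.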
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495387#comment-15495387 ]

Geoffrey Yu commented on CASSANDRA-12367:

Thanks for the first pass [~slebresne]! I added another commit to address your comments [here|https://github.com/geoffxy/cassandra/commit/a71968ebba8b67591b88cafd2daf3b37e17fec52]. I added {{rowCount()}} to the {{Partition}} interface to be able to pass in a {{rowEstimate}} to {{UnfilteredRowIteratorSerializer.serializedSize()}} since all the implementing classes already had that method available. Please let me know how it looks now!

{quote}
I wonder if it wouldn't be more user friendly to return 0 if the key is not hosted on that replica (which will simply happen if we don't check anything). Genuine question though, I could see both options having advantages, so mentioning it for the sake of discussion.
{quote}

I don't feel strongly either way since I also agree that both options have merit. I've left the check in for now but I have no objection to removing it if others feel strongly.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488541#comment-15488541 ]

sankalp kohli commented on CASSANDRA-12367:

We created this JIRA to get the size on disk of a CQL partition. You might want to create a separate JIRA for the SIZE ON feature.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488516#comment-15488516 ]

Russell Bradberry commented on CASSANDRA-12367:

{quote}
Also by SIZE ON, will it return the size of data the query is returning or size on disk?
{quote}

It would probably make the most sense as the size of the data returned by the query. Size on disk could mean many things, e.g. with or without compression.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488499#comment-15488499 ]

sankalp kohli commented on CASSANDRA-12367:

The reason you need to make this call before a write is that you don't want to make the partition too big.

"In this case it would be the size of the query, if you want the size of a given partition then you would run a query specifying only the partition key."

If we run a query specifying only the partition key, it will read gigs of data and will probably time out, so that won't work. We want a cheap way to know the size of a CQL partition.

Also, by SIZE ON, will it return the size of the data the query is returning or the size on disk?
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487263#comment-15487263 ]

Russell Bradberry commented on CASSANDRA-12367:

{quote}
I am not sure how it will work like tracing with SIZE ON? When you issue a query after SIZE ON, will it give the size of the query or CQL partition?
{quote}

In this case it would be the size of the query. If you want the size of a given partition, you would run a query specifying only the partition key.

{quote}
Also we will need the size before every read or write. This will cause calling SIZE ON and then OFF after every operation.
{quote}

Why? I was suggesting this for the CQL-specific representation; the internal representation could still be a JMX call. If the client needs it for every read/write then it would just always be on, just as if you wanted to have the trace information for every read/write.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486892#comment-15486892 ]

Sylvain Lebresne commented on CASSANDRA-12367:

bq. Are these changes similar to what you had in mind?

Yes, that's what I had in mind, thanks. A few remarks from eyeballing it:
* You can get the uncompressed length of a {{SSTableReader}} with the {{uncompressedLength()}} method, no need to open a scanner.
* You can get the sstables for a given table using {{ColumnFamilyStore#getLiveSSTables()}} (or {{ColumnFamilyStore#getSSTables(SSTableSet.CANONICAL)}} if you really want the canonical set, though that probably doesn't matter much here) rather than iterating over all sstables of the keyspace.
* It would be more consistent to reuse {{StorageService#getValidColumnFamilies()}} rather than re-inventing your own checking (namely {{validateKeyspaceTableCombination}}).
* Regarding the memtable, it makes sense to have the option to include it, but I think we should be consistent in what we sum. For sstables, what we use is the serialized size of the partition, so I think we should do the same for memtables, that is, call {{UnfilteredRowIteratorSerializer.serializedSize(partition.unfilteredIterator())}}.
* I wonder if it wouldn't be more user friendly to return 0 if the key is not hosted on that replica (which will simply happen if we don't check anything). Genuine question though, I could see both options having advantages, so mentioning it for the sake of discussion.
* I'd maybe call the JMX call {{getSerializedPartitionSize}} (or even {{getSerializedPartitionSizeInBytes}}) so it's a bit more explicit.
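The remark about summing a single, consistent measure across sstables and the memtable can be illustrated with a toy model. This is a sketch under stated assumptions: the class and method names are hypothetical, and each source is modelled as a plain function from partition key to serialized bytes rather than as Cassandra's {{SSTableReader}} or {{Memtable}}.

```java
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical aggregation; the real patch works on Cassandra's own sstable and memtable types. */
final class SerializedSizeSum {
    /**
     * Sums one measure (the serialized size of the partition) over every source.
     * Each function maps a partition key to the bytes that source holds for it, or 0 if none.
     */
    static long serializedPartitionSize(String key,
                                        List<ToLongFunction<String>> sstables,
                                        ToLongFunction<String> memtable,
                                        boolean includeMemtable) {
        long total = 0;
        for (ToLongFunction<String> sstable : sstables) {
            total += sstable.applyAsLong(key);
        }
        // The memtable contribution is optional but uses the same unit as the sstables.
        if (includeMemtable) {
            total += memtable.applyAsLong(key);
        }
        return total;
    }
}
```

Because every source reports the same unit, the total stays meaningful whether or not the memtable is included.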
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485830#comment-15485830 ]

sankalp kohli commented on CASSANDRA-12367:

I am not sure how SIZE ON would work like tracing. When you issue a query after SIZE ON, will it give the size of the query or of the CQL partition?

Also, we will need the size before every read or write. That would mean calling SIZE ON and then OFF after every operation.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484582#comment-15484582 ]

Russell Bradberry commented on CASSANDRA-12367:

I agree with [~thobbs] that it doesn't really belong in CQL directly. The writeTime and ttl meta information in CQL is at the column level and makes sense.

What about exposing it in the same way that TRACING is exposed? Setting something like "SIZES ON" would modify the output, and it could be implemented in the clients in a similar fashion.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15461776#comment-15461776 ]

Geoffrey Yu commented on CASSANDRA-12367:

[~slebresne]: Are [these changes|https://github.com/geoffxy/cassandra/compare/trunk...geoffxy:CASSANDRA-12367?w=1] similar to what you had in mind? They are meant to subtract the offsets between the {{RowIndexEntry}} objects corresponding to the partition key and the next partition key in the file, to get a size in bytes.

I also kept the code that reads the partition from the memtable so that it would be possible for the operator to get information on the partition's footprint in the memtable as well. However, that code also ignores {{Unfiltered}} objects that are not {{Row}}s.
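The offset subtraction described in this comment can be sketched with a sorted map standing in for an sstable's partition index. Everything here is hypothetical (class, fields, and the use of plain `long` tokens); it does not use Cassandra's index classes. Falling back to the data file length for the last partition is an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for an sstable's partition index; not Cassandra's actual classes. */
final class IndexOffsetSize {
    // token -> offset in the data file where that partition starts
    private final TreeMap<Long, Long> startOffsetByToken = new TreeMap<>();
    // Assumed: uncompressed length of the data file, used as the end of the last partition.
    private final long dataFileLength;

    IndexOffsetSize(long dataFileLength) {
        this.dataFileLength = dataFileLength;
    }

    void addEntry(long token, long startOffset) {
        startOffsetByToken.put(token, startOffset);
    }

    /** Bytes between this partition's start and the next partition's start (or end of file). */
    long partitionSize(long token) {
        Long start = startOffsetByToken.get(token);
        if (start == null) {
            return 0L; // partition is not in this sstable
        }
        Map.Entry<Long, Long> next = startOffsetByToken.higherEntry(token);
        long end = (next == null) ? dataFileLength : next.getValue();
        return end - start;
    }
}
```

For entries at offsets 0, 300 and 450 in a 1000-byte file, the three partitions come out as 300, 150 and 550 bytes, with no row ever being read.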
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449767#comment-15449767 ]

sankalp kohli commented on CASSANDRA-12367:

Let's implement this in JMX and create another JIRA to do it with virtual tables then. I still think it is similar to write time, as it also returns internal state of the DB, even if this is not the CQL path.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449755#comment-15449755 ]

sankalp kohli commented on CASSANDRA-12367:

OK, let's do JMX for this.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15449351#comment-15449351 ]

Sylvain Lebresne commented on CASSANDRA-12367:

bq. We need this feature now so we can do it with CQL and then when virtual tables are implemented, move this feature there. What do you think Sylvain Lebresne

With all due respect, that's not really an argument. I'm staying on my position that the proposed CQL mechanism is imo weird, unintuitive and ad hoc (from a CQL standpoint) and I really think we shouldn't do it that way. I get that "you" want it, but we have to think of the good of the software in general, and things are much easier to add than to remove, so adding something "ugly" now to change it later doesn't really work.

I'd be ok with focusing on JMX only for this ticket and creating a new one to do it well in CQL, which again probably means using this as the initial motivation for introducing the virtual table mechanism we've been talking about. I'm even happy to help with that, as I think 1) this would be more generally useful anyway and 2) I'm not sure it's *that* hard in practice.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446980#comment-15446980 ]

sankalp kohli commented on CASSANDRA-12367:

We need this feature now, so we can do it with CQL and then, when virtual tables are implemented, move this feature there. What do you think, [~slebresne]?

JMX is not an option, since clients need a parallel effort to do connection pooling, etc. to use it. Also, JMX does not perform well, as we have seen in perf testing for high-volume calls.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446458#comment-15446458 ]

Jon Haddad commented on CASSANDRA-12367:

If you're going to include it as a CQL option, I'd like to suggest making it a function size() rather than a special keyword.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445841#comment-15445841 ]

Sylvain Lebresne commented on CASSANDRA-12367:

I'm not entirely convinced by the way this is implemented because:
# it iterates over every row, which sounds pretty wasteful, especially if the goal is to have a cheap way to determine how big a partition is on disk (though the description of the ticket could use a bit more in terms of motivation, so I'm mainly guessing that's the intended use case).
# it uses {{Row#dataSize()}}, which only returns the size of the data contained in the row, ignoring all the artifacts of the serialization. It also ignores range tombstones. Overall, this means the returned number doesn't really represent the size on disk, and what it does represent is a bit ad hoc currently imo.

What I'd suggest is instead to use the index file, and return the actual size of the data on disk (by simply subtracting the offset of the start of the partition in the sstable from the offset of its end). This would be *a lot* faster and imo more meaningful (the only caveat being that it's still not the size on disk since it ignores compression, but that's probably kind of ok).

Regarding exposing that in CQL however, I'm pretty much -1 on the syntax suggested. I agree with Tyler, this is way too weird to make such a special case in CQL. This is very different from the {{ttl()}} and {{writetime()}} methods for instance, in that those just return data that is part of CQL. This metric implies a completely different path (since it's intrinsically a local query) and result set, which means it'd be almost cleaner to have a fully different statement, like {{GET_PARTITION_SIZE FROM foo WHERE ...}}, instead of reusing {{SELECT}}. I'm *not* suggesting we add that either, since imo it's way too ad hoc to justify the addition.

Don't get me wrong, I think this could be exposed much more elegantly once we have virtual tables and I'll be happy to do so when we have them. And yes, virtual tables will probably take a bit more time to come, but we'll have the JMX call in the meantime.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445156#comment-15445156 ]

Marcus Eriksson commented on CASSANDRA-12367:

[~geoffxy] I *think* we could do something like this:
{code}
DataRange keyRange = DataRange.forKeyRange(new Range<>(key.getToken().minKeyBound(), key.getToken().maxKeyBound()));
sstable.getScanner(ColumnFilter.all(store.metadata), keyRange, false);
{code}
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445093#comment-15445093 ]

Marcus Eriksson commented on CASSANDRA-12367:

We already expose some metadata using CQL (writetime(..), ttl(..)) so it wouldn't be a total special case, even though the syntax looks a bit weird (but I can't think of a better one).
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438264#comment-15438264 ]

sankalp kohli commented on CASSANDRA-12367:

As per discussion with [~thobbs], we could expose this through some sort of virtual tables. But I think that will be a longer project. Can we do this here and, once we have that feature, move this call over later?
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437526#comment-15437526 ]

sankalp kohli commented on CASSANDRA-12367:

I agree JMX will be simpler; however, it will be too much effort for the client teams to do this via JMX. Different teams would need to implement this in their stack, and it will be hard to use. They would also need to set timeouts and connection pooling for making these calls, which already happens in the Java driver. Given that, and the need to create a parallel process to get this information, I think we should do it over CQL.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437495#comment-15437495 ]

Tyler Hobbs commented on CASSANDRA-12367:

bq. As per NGCC talk of Patrick..we are opening up CQL to query C* metrics.

The discussion at NGCC was about exposing virtual tables that contain metrics, not necessarily modifying the query language to support metrics directly.

bq. Also if we expose it with JMX...how will apps make the call for which this is useful. They need to know which replica the key maps to and then call the JMX. Also we don't want to expose JMX auth to clients to call at will. So I don't see any other way besides CQL to expose this to clients.

The drivers have tools for determining the replicas for a partition key. As for exposing JMX to clients, you could use something like mx4j or jolokia in front of Cassandra instead to present a limited interface.
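For readers unfamiliar with what a JMX-exposed operation looks like from the calling side, here is a minimal in-process sketch using only the standard `javax.management` API. The MBean interface, its implementation, the fixed return value and the object name `example.sketch:type=PartitionSize` are all made up for illustration; only the operation name {{getSerializedPartitionSizeInBytes}} echoes a name suggested elsewhere in this thread, and its parameter list here is an assumption.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class PartitionSizeJmxSketch {
    /** Hypothetical management interface; name and signature are assumptions. */
    public interface PartitionSizeMBean {
        long getSerializedPartitionSizeInBytes(String keyspace, String table, String key);
    }

    /** Toy implementation returning a fixed value in place of a real lookup. */
    public static final class PartitionSize implements PartitionSizeMBean {
        @Override
        public long getSerializedPartitionSizeInBytes(String keyspace, String table, String key) {
            return 987987L;
        }
    }

    /** Registers the toy bean in this JVM and invokes it by name, as a JMX client would. */
    static long registerAndInvoke() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example.sketch:type=PartitionSize"); // made-up name
            if (!server.isRegistered(name)) {
                server.registerMBean(new PartitionSize(), name);
            }
            String stringType = String.class.getName();
            Object result = server.invoke(
                name,
                "getSerializedPartitionSizeInBytes",
                new Object[] {"ks", "tbl", "a"},
                new String[] {stringType, stringType, stringType});
            return (Long) result;
        } catch (Exception e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }
}
```

A remote client would obtain an `MBeanServerConnection` instead of the platform server and would still need to pick the right replica first, which is the part the drivers' replica-lookup tools cover.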
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437481#comment-15437481 ]

sankalp kohli commented on CASSANDRA-12367:

As per Patrick's NGCC talk, we are opening up CQL to query C* metrics.

Also, if we expose it with JMX, how will the apps for which this is useful make the call? They need to know which replica the key maps to and then call JMX. We also don't want to expose JMX auth to clients to call at will. So I don't see any way besides CQL to expose this to clients.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437450#comment-15437450 ]

Tyler Hobbs commented on CASSANDRA-12367:

Doing this as a special case in CQL feels wrong to me. The query language is really designed for querying data in the database, not metadata about the storage layer. I'd prefer to stick with JMX for this.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435591#comment-15435591 ] sankalp kohli commented on CASSANDRA-12367:
We should actually expose a CQL call that reads this value from the replicas and returns all of the results. Example:

    SELECT SIZE FROM test WHERE a = 10; // a is the CQL partition key

Running this query at consistency QUORUM with RF=3 would return:

    Endpoint    Size
    10.0.0.1    987987
    10.0.0.2    7897

cc [~krummas] What do you think?
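The two-row result in the example above is consistent with QUORUM at RF=3, where the coordinator waits for floor(RF / 2) + 1 = 2 replica responses. The following is only a rough sketch of that shape, not code from the attached patch: the class and method names, the third endpoint, and its size are all made up for illustration, and no such CQL syntax is confirmed anywhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the proposed result shape; not part of Cassandra.
public class PartitionSizeQuorumSketch {

    // Number of replica responses required at QUORUM for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Keeps the first `quorum` responses, in arrival order, as endpoint -> size rows.
    static Map<String, Long> collectSizes(Map<String, Long> replicaResponses, int replicationFactor) {
        int needed = quorum(replicationFactor);
        Map<String, Long> rows = new LinkedHashMap<>();
        for (Map.Entry<String, Long> response : replicaResponses.entrySet()) {
            if (rows.size() == needed) {
                break;
            }
            rows.put(response.getKey(), response.getValue());
        }
        if (rows.size() < needed) {
            throw new IllegalStateException("Only " + rows.size() + " of " + needed + " responses received");
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sizes for the first two endpoints come from the example above;
        // the third endpoint and its size are invented for this sketch.
        Map<String, Long> responses = new LinkedHashMap<>();
        responses.put("10.0.0.1", 987987L);
        responses.put("10.0.0.2", 7897L);
        responses.put("10.0.0.3", 987987L);

        for (Map.Entry<String, Long> row : collectSizes(responses, 3).entrySet()) {
            System.out.println(row.getKey() + " " + row.getValue());
        }
    }
}
```

Returning one row per endpoint, rather than a single merged number, keeps the per-replica differences visible, which seems to be the point of the proposal.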
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433389#comment-15433389 ] sankalp kohli commented on CASSANDRA-12367:
Also, what if we expose this through CQL as well? There are clients who want to know how big a CQL partition is. So something like:

    select bytes from table where

What do you think? This call would be a lot cheaper than counting the CQL rows to find out how big the partition is.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433370#comment-15433370 ] sankalp kohli commented on CASSANDRA-12367:
I don't think we should output more information, as it would make this call expensive. So we should stick to size for this JMX call. We can always add more JMX calls for the things you are suggesting.
[jira] [Commented] (CASSANDRA-12367) Add an API to request the size of a CQL partition
[ https://issues.apache.org/jira/browse/CASSANDRA-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432280#comment-15432280 ] Marcus Eriksson commented on CASSANDRA-12367:
Could we use {{SSTableReader.getScanner(Range range, ...)}} instead of scanning all the partitions in the sstable? We would need to create the range so that it includes the token requested but I think it should save us some time by seeking to the correct position directly.
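To illustrate the difference between scanning every partition and seeking to a range that contains the requested token, here is a minimal sketch that does not use Cassandra's classes. Each sstable is modeled as a sorted map from token to serialized partition size, and `NavigableMap.subMap` stands in for a range-restricted scanner. The class name, method names, and sample sizes are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in model: each "sstable" is a sorted map of token -> partition size in bytes.
public class PartitionSeekSketch {

    // Baseline: visit every partition in every sstable and keep only the matching token.
    static long sizeByFullScan(List<NavigableMap<Long, Long>> sstables, long token) {
        long total = 0;
        for (NavigableMap<Long, Long> sstable : sstables) {
            for (Map.Entry<Long, Long> partition : sstable.entrySet()) {
                if (partition.getKey() == token) {
                    total += partition.getValue();
                }
            }
        }
        return total;
    }

    // Range-restricted variant: ask each sorted map only for the range that
    // contains the token, so lookup is a seek rather than a walk over all entries.
    static long sizeBySeek(List<NavigableMap<Long, Long>> sstables, long token) {
        long total = 0;
        for (NavigableMap<Long, Long> sstable : sstables) {
            NavigableMap<Long, Long> range = sstable.subMap(token, true, token, true);
            for (long size : range.values()) {
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up tokens and sizes; the partition with token 42 lives in both sstables.
        NavigableMap<Long, Long> first = new TreeMap<>();
        first.put(10L, 100L);
        first.put(42L, 987L);
        first.put(90L, 5L);
        NavigableMap<Long, Long> second = new TreeMap<>();
        second.put(42L, 13L);
        second.put(77L, 64L);
        List<NavigableMap<Long, Long>> sstables = Arrays.asList(first, second);

        System.out.println(sizeBySeek(sstables, 42L));     // prints 1000
        System.out.println(sizeByFullScan(sstables, 42L)); // prints 1000
    }
}
```

Both methods return the same total. The seek variant touches only the entries inside the requested range, which is the saving the comment above is after.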