Re: trouble with deleted counter columns
On Wed, Nov 30, 2011 at 8:36 AM, Thorsten von Eicken t...@rightscale.com wrote:
> Running a single 1.0.3 node and using counter columns, I have a problem. I have rows with ~200k counters. I deleted a number of such rows and now I can't put counters back in, or really, I can't query what I put back in.

The reason is explained at http://wiki.apache.org/cassandra/Counters#Technical_limitations, though it wasn't clear that it was taking your situation into account (I've just updated it). To rephrase: counter removal is only supported if it is definitive. You cannot increment after a deletion; or rather, if you do, the behavior is undefined. This holds for row deletion too: if you delete a row, you can't increment any counter that was there previously. (The truth is that if you wait long enough it would work, but how long is enough depends on things like when compaction happens and what your gc_grace value is.) I understand this could be a problem for your use case, but it is an unfortunate limitation of the current design.

> Example using the cli:
>
> [default@rslog_production] get req_word_freq['2024'];
> Returned 0 results.
> Elapsed time: 2089 msec(s).
> [default@rslog_production] incr req_word_freq['2024']['test'];
> Value incremented.
> [default@rslog_production] get req_word_freq['2024'];
> Returned 0 results.
> Elapsed time: 2018 msec(s).
>
> Note how long it's taking, presumably because it's going through 200K+ tombstones?

That is likely the reason, yes.

> Here's the same using a fresh row key; note the timings:
>
> [default@rslog_production] get req_word_freq['test'];
> Returned 0 results.
> Elapsed time: 1 msec(s).
> [default@rslog_production] incr req_word_freq['test']['test'];
> Value incremented.
> [default@rslog_production] get req_word_freq['test'];
> => (counter=test, value=1)
> Returned 1 results.
> Elapsed time: 6 msec(s).
> Incidentally, I then tried deleting the column, and I don't understand why the value is 2 at the end:
>
> [default@rslog_production] del req_word_freq['test'];
> row removed.
> [default@rslog_production] get req_word_freq['test'];
> Returned 0 results.
> Elapsed time: 1 msec(s).
> [default@rslog_production] incr req_word_freq['test']['test'];
> Value incremented.
> [default@rslog_production] get req_word_freq['test'];
> => (counter=test, value=2)
> Returned 1 results.
> Elapsed time: 1 msec(s).
>
> All this is on a single-node system, running the cassandra-cli on the system itself. The CF is as follows:
>
> [default@rslog_production] describe req_word_freq;
> ColumnFamily: req_word_freq
>   Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>   Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   Row cache size / save period in seconds / keys to save: 0.0/0/all
>   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
>   Key cache size / save period in seconds: 20.0/14400
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Built indexes: []
>   Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>
> I must be missing something...
> Thorsten
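The surprising value=2 in the session above is consistent with a toy model of counter storage (a deliberate simplification sketched here, not Cassandra's actual counter code): increments accumulate in per-replica shards, and a delete only writes a tombstone without resetting those shards.

```python
# Toy model (NOT Cassandra internals): a counter is a map of
# replica -> (clock, count); an increment bumps the replica's own shard.
# A delete writes a tombstone but does not reset any shard.

def increment(shards, replica):
    clock, count = shards.get(replica, (0, 0))
    shards[replica] = (clock + 1, count + 1)

def value(shards):
    return sum(count for _, count in shards.values())

shards = {}
increment(shards, "A")      # counter reads 1
assert value(shards) == 1

# "del req_word_freq['test']" writes a tombstone; reads now return
# nothing, but the shard (clock 1, count 1) survives underneath.

increment(shards, "A")      # increment after the delete
# The shard is now (clock 2, count 2). Its clock is newer than the
# pre-delete state, so once the read path resolves past the tombstone
# the counter shows 2, not 1, matching the cli session above.
assert value(shards) == 2
```

In this simplified picture, "deletion is only supported if definitive" follows directly: the tombstone and the surviving shards race, and which one wins depends on compaction timing and gc_grace.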
Can't run cleanup
Cassandra version 0.8.7: after adding new nodes we can't run cleanup on any node. The log reports:

Cleanup cannot run before a node has joined the ring

The new nodes have joined (one by one), and all nodes are up, running, reading, and writing. No streams have been sent or received on any node for more than 12 hours. Nodetool's info/ring/tpstats/netstats look fine on all nodes. Restarts don't help.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
http://www.adform.com/
Follow: http://twitter.com/#!/adforminsider
Visit our blog: http://www.adform.com/site/blog

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
Cleanup in a write-only environment
In my understanding, cleanup is meant to help clear out data that has been removed. If you have an environment where data is only ever added (the case for the production system I'm working with), is there a point to automating cleanup? I understand that if we were ever to purge a segment of data from our cluster we'd certainly want to run it, or after adding a new node and adjusting the tokens. So I want to make sure I'm not missing something here: are there other reasons to run cleanup regularly?

--
David McNelis
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
c: 219.384.5143
A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.
RE: Can't run cleanup
Nodetool repair also doesn't start; on all nodes the log reports:

INFO 15:57:51,070 Starting repair command #2, repairing 0 ranges.
INFO 15:57:51,070 Repair command #2 completed successfully

Regular read repair is working, as are reads and writes.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
[RELEASE] Apache Cassandra 1.0.5 released
As indicated in a preceding mail (http://goo.gl/R1r1V), the 1.0.4 release unfortunately shipped with two important regressions. The Cassandra team is pleased to announce the release of Apache Cassandra version 1.0.5, which fixes those two issues[1] but is otherwise identical to 1.0.4.

Cassandra 1.0.5 can be downloaded in the usual places, i.e.: http://cassandra.apache.org/download/

We sincerely apologize for any inconvenience caused by 1.0.4. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems.

Have fun!

[1]: http://goo.gl/Fod0B (CHANGES.txt)
[2]: http://goo.gl/gtUvs (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
RE: [RELEASE] Apache Cassandra 1.0.5 released
The files are not on the site:

The requested URL /apache//cassandra/1.0.5/apache-cassandra-1.0.5-bin.tar.gz was not found on this server.

Thanks,
Michael
Re: [RELEASE] Apache Cassandra 1.0.5 released
On Wed, Nov 30, 2011 at 1:29 PM, Michael Vaknine micha...@citypath.com wrote: The files are not on the site The requested URL /apache//cassandra/1.0.5/apache-cassandra-1.0.5-bin.tar.gz was not found on this server. It takes the mirrors some time to sync. -Brandon
RE: [RELEASE] Apache Cassandra 1.0.5 released
Thanks, the files are there already.
Cassandra_Jobs on Twitter
For those interested in Apache Cassandra related jobs - either hiring or in search of - there is now a @Cassandra_Jobs account on Twitter. You can either send posts to that account on twitter or send them to me at this email address with a public link to the job posting and I will tweet them. Cheers.
RE: [RELEASE] Apache Cassandra 1.0.5 released
Hi, upgrading 1.0.3 to 1.0.5 I get these errors:

TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread
java.lang.AssertionError
    at org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:671)
    at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:745)
    at org.apache.cassandra.db.ColumnFamilyStore.forceBlockingFlush(ColumnFamilyStore.java:750)
    at org.apache.cassandra.db.index.keys.KeysIndex.forceBlockingFlush(KeysIndex.java:119)
    at org.apache.cassandra.db.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:258)
    at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:123)
    at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:151)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

Is this another regression?
Thanks,
Michael
data modeling question
hey all! i've started my first project using cassandra and have some data model questions. i'm working on an app that fetches stock market data. i need to keep track of when i fetch a set of data for any given stock in any sector. here's what i think my model should look like:

fetches : {
  sector : {
    quote : { timeuuid : { symbol : --- } }
    ticks : { timeuuid : { symbol : --- } }
    fundamentals : { timeuuid : { symbol : --- } }
  }
}

is there anything less than ideal about doing it this way versus creating a separate CF per sector? how do you create a Super CF inside of a Super CF via the CLI?

thanks,
deno
Re: data modeling question
Personally, I would create a separate column family for each basic area. For example, to organize my sectors and symbols I would create a column family where the key is the sector name and the column names are the symbols for that sector, i.e.:

sector : {
  key: sector name
  column names: symbols
  column values: null
}

Then I would have a column family for quotes where the key is the symbol, the column name is the timestamp, and the value is the quote:

quote : {
  key: symbol
  column names: timeuuid
  column values: quote at that time for that symbol
}

I would then use the same basic structure for your other column families, ticks and fundamentals.

In general, people tend to stay away from super column families when possible for several reasons, but the most commonly cited one is that when you get a SCF, the entire SCF must be deserialized in order to access it. So if you have a bunch of SCFs, you run the risk of needing to read in a lot more data than is necessary to get the information you are looking for.
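The proposed layout can be sketched as plain Python structures (a toy picture, not a client API: a CF is roughly a map of row key to an ordered map of column name to value; symbol names here are just examples):

```python
# sector CF: key = sector name, column names = symbols, values unused.
sector = {
    "tech": {"appl": None, "goog": None, "ibm": None},
}

# quote CF: key = symbol, column name = time (a timeuuid in practice),
# column value = the quote at that time.
quote = {
    "goog": {100.0: "600.25", 160.0: "598.00"},
}

# Listing a sector's symbols is a single row read:
assert sorted(sector["tech"]) == ["appl", "goog", "ibm"]

# The latest quote for a symbol is the last column of its row
# (columns are stored sorted by name, i.e. by time here):
assert quote["goog"][max(quote["goog"])] == "598.00"
```

The point of splitting into two plain CFs rather than nesting: each lookup touches exactly one row and only the columns it needs, instead of deserializing a whole super column.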
Re: Cleanup in a write-only environment
I believe you are misunderstanding what cleanup does. Cleanup is used to remove data from a node that the node no longer owns. For example, when you move a node in the ring, its range of responsibility changes and it gets new data, but it does not automatically delete the data it used to be responsible for but no longer is. In this situation, you run cleanup to delete all of that old data. Data that has been deleted or expired is removed automatically as compaction runs.
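The idea can be sketched in a few lines (a toy model with small integer tokens, not Cassandra internals): cleanup keeps only the rows whose token still falls inside a range the node owns.

```python
# Toy model of cleanup: drop rows whose token is outside the node's
# owned ranges. Ranges are (start, end], and may wrap around the ring.

def owns(token, ranges):
    """True if token falls in any (start, end] range, wrapping allowed."""
    for start, end in ranges:
        if start < end:
            if start < token <= end:
                return True
        elif token > start or token <= end:  # range wraps around the ring
            return True
    return False

# Rows keyed by token. Before the ring change this node owned (0, 100].
rows = {5: "row-a", 40: "row-b", 80: "row-c"}

# After a new node takes over (50, 100], this node owns only (0, 50],
# but row-c still sits on disk until cleanup rewrites the SSTables.
owned = [(0, 50)]
cleaned = {t: v for t, v in rows.items() if owns(t, owned)}
assert cleaned == {5: "row-a", 40: "row-b"}
```

This is why cleanup is pointless in a write-only cluster with a stable ring: nothing ever falls outside the owned ranges, so there is nothing for it to drop.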
Re: data modeling question
with the quote CF below, how would one query for all keys that have a column with a timeuuid later than x minutes ago? i need to be able to find all symbols that have not been fetched in x minutes, by sector. i know i can get the list of symbols per sector from my sector CF.

thanks,
deno

On 11/30/2011 1:07 PM, David McNelis wrote:
> Then I would have a column family for quotes where the key is the symbol, the column name is the timestamp, and the value is the quote:
>
> quote : {
>   key: symbol
>   column names: timeuuid
>   column values: quote at that time for that symbol
> }
Re: data modeling question
You wouldn't query for all the keys that have a column named x, exactly. Instead, what you would do is: for sector x, grab your list of symbols S. Then you would get the last column for each of those symbols (which you do in different ways depending on the API), and test whether that date is within your threshold. If not, it goes into your list of symbols to fetch. Alternatively, you could iterate over the symbols grabbing data where the date is between range A and B; if you get an empty set / no columns returned, then you need to re-pull for that symbol. Does that make sense? Either way you end up hitting each of the individual symbols. Maybe someone else has a better idea of how to structure the data for that particular use case.
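The first strategy described above can be sketched in plain Python (a toy model, not a Cassandra client; the symbol names and timestamps are made up): for each symbol in the sector, look at its newest column's timestamp and re-fetch if it is older than the threshold or missing entirely.

```python
# Toy model: quotes maps symbol -> list of (timestamp, value) columns,
# kept sorted by timestamp (as Cassandra would store them).
import time

def stale_symbols(quotes, symbols, max_age_s, now=None):
    """Return the symbols whose newest column is older than max_age_s
    (or that have never been fetched), preserving input order."""
    now = now if now is not None else time.time()
    stale = []
    for symbol in symbols:
        columns = quotes.get(symbol, [])
        if not columns or now - columns[-1][0] > max_age_s:
            stale.append(symbol)
    return stale

quotes = {"goog": [(100.0, "600.25")], "appl": [(40.0, "380.10")]}
# At t=160 with a 60s threshold: goog (fetched at t=100) is still fresh,
# appl (t=40) is stale, and ibm has never been fetched.
assert stale_symbols(quotes, ["goog", "appl", "ibm"], 60, now=160.0) == ["appl", "ibm"]
```

With a real client this is one "last column" read per symbol (a reversed slice with limit 1), which is why either strategy ends up touching every symbol in the sector.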
Re: [RELEASE] Apache Cassandra 1.0.5 released
I don't think so. That code hasn't changed in a long time. Is it reproducible?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
can not create a column family named 'index'
Hi, just wondering if this is intentional:

[default@test] create column family index;
Syntax error at position 21: mismatched input 'index' expecting set null
[default@test] create column family idx;
b9aae960-1bb2-11e1--bf27a177f2f6
Waiting for schema agreement...
... schemas agree across the cluster

Thanks,
Shu
Re: Cleanup in a write-only environment
Your understanding of nodetool cleanup is not correct. Cleanup is used only after cluster rebalancing, like adding or removing nodes. It removes data that no longer belongs on the node (in older versions it removed hints as well). What you are really debating is whether you need to run compaction. In a write-only workload you should let Cassandra do its normal compaction (in most cases).
Re: Cleanup in a write-only environment
Thanks, folks. I think I must have read compaction, thought cleanup, and gotten muddled from there.

David
Re: data modeling question
here's what i ended up with; this seems to work for me.

@Test
public void readAndWriteSettingTTL() throws Exception {
    int ttl = 2;
    String columnFamily = "Quote";
    Set<String> symbols = new HashSet<String>() {{
        add("appl"); add("goog"); add("ibm"); add("csco");
    }};
    UUID timeUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
    Mutator<String> mutator = HFactory.createMutator(_keyspace, _stringSerializer);
    for (String symbol : symbols)
        addInsertionToMutator(columnFamily, timeUUID, mutator, symbol, ttl);
    mutator.execute();

    RangeSlicesQuery<String, UUID, String> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(_keyspace, _stringSerializer, _uuidSerializer, _stringSerializer);
    rangeSlicesQuery.setColumnFamily(columnFamily);
    rangeSlicesQuery.setKeys("", "");
    rangeSlicesQuery.setRange(null, null, false, 1);
    QueryResult<OrderedRows<String, UUID, String>> result = rangeSlicesQuery.execute();

    UUID uuid = result.get().getList().get(0).getColumnSlice().getColumns().get(0).getName();
    Assert.assertEquals("UUID should be the same", timeUUID, uuid);
    Assert.assertEquals("We should have 4 records", 4, result.get().getList().size());

    Thread.sleep(5000); // wait till TTL hits to make sure keys are getting flushed

    QueryResult<OrderedRows<String, UUID, String>> result2 = rangeSlicesQuery.execute();
    for (Row<String, UUID, String> row : result2.get().getList()) {
        Assert.assertEquals("We should have no records", 0, row.getColumnSlice().getColumns().size());
    }
}

private void addInsertionToMutator(String columnFamily, UUID columnName, Mutator<String> mutator, String symbol, int ttl) {
    mutator.addInsertion(symbol, columnFamily,
        HFactory.createColumn(columnName, "", ttl, _uuidSerializer, _stringSerializer));
}
read repair and column range queries
Looking at the docs, I can't conclusively answer this question. Suppose I make this CQL query with consistency level ONE and read repair chance 1.0:

select 'a'..'z' from cf where key = 'xyz' limit 5;

Suppose the node I connect to has the key and responds with (improvised syntax):

['a'-0, 'c'-2, 'e'-4, 'g'-6, 'i'-8]

Suppose another node has a column 'b'-1; would this be caught by the read repair? The question really boils down to whether the digest query being sent is the same as the one above, or whether it's more of the form "select a, c, e, g, i from cf where key = 'xyz'" and thus only checks whether the returned column values are in agreement.

Thanks!
Thorsten
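The distinction being asked about can be made concrete with a toy illustration (this is not Cassandra's digest implementation, just a sketch of why the two digest forms give different answers for the 'b'-1 scenario):

```python
# Toy illustration: hash a set of columns into a digest, then compare
# digesting the full range slice vs digesting only the returned names.
import hashlib

def digest(columns):
    """Deterministic digest over a name -> value mapping."""
    h = hashlib.sha256()
    for name, value in sorted(columns.items()):
        h.update(f"{name}={value};".encode())
    return h.hexdigest()

node1 = {"a": 0, "c": 2, "e": 4, "g": 6, "i": 8}
node2 = dict(node1, b=1)   # the other replica also has 'b'-1

# If the digest covers the same range slice 'a'..'z', the extra 'b'
# makes the digests differ, so the mismatch would be detected:
assert digest(node1) != digest(node2)

# If the digest covers only the names the first node returned,
# both replicas agree and the extra 'b' goes unnoticed:
names = ["a", "c", "e", "g", "i"]
assert digest({n: node2[n] for n in names}) == digest(node1)
```

So the question of whether 'b'-1 is repaired reduces exactly to which of these two digests the replicas compute.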