Suggestion: Date as a Partition key
Hi all, please excuse me if my questions are those of a novice user. Continuing from my last table design issue, I am thinking of creating a partition key on the date (only), since a time frame will always be part of our search criteria. So my questions are:

1) Is this a good idea, given that I don't have any other field to combine with the date key? All the time-series examples point to a composite key; here I am talking about only a date, which I would convert to an int format like 04022015.

2) Is there any detailed doc or write-up on identifying how much data is on which node, so that I can see the distribution of data across the nodes?

For your reference, below is my table structure:

CREATE TABLE logentries (
    eventDate bigint PRIMARY KEY,
    context text,
    date_to_hour bigint,
    durationinseconds float,
    eventtimestamputc timestamp,
    ipaddress inet,
    logentrytimestamputc timestamp,
    loglevel int,
    logmessagestring text,
    logsequence int,
    message text,
    modulename text,
    productname text,
    searchitems map<text, text>,
    servername text,
    sessionname text,
    stacktrace text,
    threadname text,
    timefinishutc timestamp,
    timestartutc timestamp,
    urihostname text,
    uripathvalue text,
    uriquerystring text,
    useragentstring text,
    username text
);

Thanks so much all for the help.

Cheers,
Asit
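For what it's worth, a date-only partition key is workable as long as each day's partition stays a manageable size; the usual refinement is to keep the date as the partition key and add the event timestamp as a clustering column, so one day's events are ordered and range-scannable inside a single partition. A minimal sketch (hypothetical table name, reusing a few of the columns above):

```sql
-- Sketch: date bucket as the partition key, event timestamp as a
-- clustering column. All events for one day share a partition, so a
-- time-frame query within a day becomes a single-partition slice.
CREATE TABLE logentries_by_day (
    eventdate bigint,             -- e.g. 20150204
    eventtimestamputc timestamp,
    logmessagestring text,
    PRIMARY KEY (eventdate, eventtimestamputc)
) WITH CLUSTERING ORDER BY (eventtimestamputc DESC);
```

For question 2, `nodetool status` shows the load per node, and `nodetool cfstats <keyspace>` breaks that down per table on each node.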
RE: Smart column searching for a particular rowKey
Astyanax allows you to execute CQL statements. I don't remember the details, but the capability is there. One tip: when you create the column family, use WITH CLUSTERING ORDER BY (timestamp DESC). Then your query becomes straightforward and C* will do all the heavy lifting for you.

Mohammed
Re: No schema agreement from live replicas?
FWIW, increasing withMaxSchemaAgreementWaitSeconds to 30 seconds was enough to fix my problem. I would still like to understand whether the cluster has some kind of configuration problem that made doing so necessary, however. Thanks!
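For reference, the knob mentioned above is set on the Cluster builder. A sketch, assuming the DataStax Java driver of that era (the contact point is just the node from the log lines):

```java
import com.datastax.driver.core.Cluster;

// Sketch: raise the schema-agreement wait so that DDL followed by an
// immediate write has time to propagate to all replicas.
public class ClusterFactory {
    public static Cluster build() {
        return Cluster.builder()
                .addContactPoint("10.241.17.134")
                .withMaxSchemaAgreementWaitSeconds(30)
                .build();
    }
}
```

The driver also exposes `cluster.getMetadata().checkSchemaAgreement()`, which an application can poll after a CREATE TABLE before issuing the first write.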
No schema agreement from live replicas?
Hi all, I have an application that uses the Java driver to create a table and then immediately write to it. I see the following warning in my logs:

[10.241.17.134] out: 15/02/03 09:32:24 WARN com.datastax.driver.core.Cluster: No schema agreement from live replicas after 10 s. The schema may not be up to date on some nodes.

This seems to happen after creating a table, and the schema not being up to date leads to errors when trying to write to the new tables:

[10.241.17.134] out: Exception in thread "main" com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured columnfamily schema_hash

Any suggestions on what to do about this (other than increasing withMaxSchemaAgreementWaitSeconds)? This is only a three-node test cluster; I have not gotten this warning before, even on much bigger clusters.

Best regards,
Clint
Smart column searching for a particular rowKey
Hi guys, I need help with this. My row key is a stock name like GOOGLE or APPLE. Columns are sorted by timestamp and include a set of data fields such as price and size, so the data looks like:

1. 9:31:00, $520, 100 shares
2. 9:35:09, $530, 1000 shares
3. 9:45:39, $520, 500 shares

I want to search this column family by timestamp. For a given row key, if I search for data at 9:33:00, which does not actually exist among the columns, I want to get back the last value where data was present; in this case 9:31:00, $520, 100 shares, since the next timestamp, 9:35:09, is greater than the input value. One obvious way would be to iterate through the columns, storing the last data seen, and return it as soon as a timestamp greater than the given one is reached.

Is there an optimized way to achieve the same, since the columns are already sorted? Thanks
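The lookup being asked for is a "floor" search over already-sorted timestamps, which is a binary search rather than a linear scan. A small client-side sketch in Python (sample data taken from the question; server-side, Cassandra can do the same with a `ts <= ?` slice, as discussed in the replies):

```python
import bisect

# Columns for one row key, already sorted by timestamp, as C* stores them.
# Each entry: (timestamp, price, shares) -- sample data from the question.
columns = [
    ("09:31:00", 520, 100),
    ("09:35:09", 530, 1000),
    ("09:45:39", 520, 500),
]

def last_at_or_before(cols, ts):
    """Return the latest column whose timestamp is <= ts, or None."""
    keys = [c[0] for c in cols]
    i = bisect.bisect_right(keys, ts)  # first index with key > ts
    return cols[i - 1] if i > 0 else None

print(last_at_or_before(columns, "09:33:00"))  # -> ('09:31:00', 520, 100)
```

Binary search makes the lookup O(log n) per query instead of O(n) for the iterate-and-remember approach.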
Re: No schema agreement from live replicas?
What version of C* are you using? You could be seeing https://issues.apache.org/jira/browse/CASSANDRA-7734, which I think affects 2.0.7 through 2.0.10.
RE: Smart column searching for a particular rowKey
I cannot find anything corresponding to a WHERE clause there.
RE: Smart column searching for a particular rowKey
Thanks, it does. How about in Astyanax?
Re: Smart column searching for a particular rowKey
WHERE + ORDER BY ... DESC + LIMIT should be able to accomplish that.
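Spelled out in CQL, the pattern above might look like the following (hypothetical table and column names, assuming the timestamp is a clustering column within the stock partition):

```sql
-- Assumed schema:
--   CREATE TABLE trades (
--       stock  text,
--       ts     timestamp,
--       price  decimal,
--       shares int,
--       PRIMARY KEY (stock, ts)
--   ) WITH CLUSTERING ORDER BY (ts DESC);

-- Latest entry at or before 9:33:00 in the GOOGLE partition:
SELECT ts, price, shares
FROM trades
WHERE stock = 'GOOGLE'
  AND ts <= '2015-02-03 09:33:00'
ORDER BY ts DESC
LIMIT 1;
```

Because the rows are clustered in descending timestamp order, the server returns the first matching cell instead of scanning the partition.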
Re: Unable to create a keyspace
Thanks, Carlos, for pointing that out. The clock on one of the nodes was not in sync, and fixing that solved the issue.

On Saturday, January 31, 2015, Jan wrote:

Saurabh;
a) How exactly are the three nodes hosted?
b) Can you take down node 2 and create the keyspace from node 1?
c) Can you take down node 1 and create the keyspace from node 2?
d) Do the nodes see each other with 'nodetool status'?

cheers
Jan / C* Architect

On Saturday, January 31, 2015, Carlos Rolo wrote:

Something that can cause weird behavior is the machine clocks not being properly synced. I didn't read the thread in full detail, so disregard this if it is not the case.
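For anyone hitting the same symptom, a quick way to spot a drifting clock (hypothetical hostnames; the real fix is running ntpd/chrony on every node) is to compare epoch seconds across the cluster:

```shell
# Print each node's clock next to the local one; a node more than a
# second or two off is a candidate for NTP trouble.
for h in node1 node2 node3; do
  printf '%s: ' "$h"
  ssh "$h" date -u +%s
done
date -u +%s   # local reference
```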
Re: Tombstone gc after gc grace seconds
Hi, thanks for sharing your tests!

Though, how did you insert the data? Did you try adding columns in an atomic and random order, with a small memtable size, to achieve heavy sharding (normal in a time-series use case)? I think running the md_test configuration against that kind of data set would be interesting, taking care to shard enough that parts of each key are in every SSTable, or at least in many. This would measure the effectiveness of this parameter in a normal time-series workflow (which is a standard use case of C*). By the way, this might be done on both LCS (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra) and STCS (http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/tabProp.html?scroll=tabProp__moreCompaction) to see how both behave.

Thanks again, it is interesting to hear about tests like yours!

C*heers,
Alain

2015-01-30 17:32 GMT+01:00 Ravi Agrawal:

I did a small test. I wrote data to 4 different column families: 30MB of data, 256 row keys, and 100K columns on average. I then deleted all data from all of them.

1. md_normal - created with default compaction parameters and gc_grace_seconds of 5 seconds. Data was written and then deleted, and compaction was run with nodetool compact keyspace columnfamily. I see the full data size on disk; I cannot query columns (consistent behavior, since the data was deleted) and cannot query rows in cqlsh - it hits the timeout.

2. md_test - created with compaction={'tombstone_threshold': '0.01', 'class': 'SizeTieredCompactionStrategy'} and gc_grace_seconds of 5 seconds. Disk size is reduced, and I am able to query rows, which return 0.

3. md_test2 - created with compaction={'tombstone_threshold': '0.0', 'class': 'SizeTieredCompactionStrategy'}. Disk size is reduced; not able to query rows in cqlsh - hits the timeout.
4. md_forcecompact - created with compaction={'unchecked_tombstone_compaction': 'true', 'class': 'SizeTieredCompactionStrategy'} and gc_grace_seconds of 5 seconds. Data was written and then deleted. I see the full data size on disk, cannot query any data using mddbreader, and cannot query rows in cqlsh - hits the timeout.

The next day the sizes were:

30M   ./md_forcecompact
4.0K  ./md_test
304K  ./md_test2
30M   ./md_normal

A feel for the data we have: 8,000 row keys per day, with columns added throughout the day - around 300K columns per row key on average.

From: Alain RODRIGUEZ
Sent: Friday, January 30, 2015 4:26 AM
Subject: Re: Tombstone gc after gc grace seconds

The point is that all the parts or fragments of a row need to be in the SSTables involved in a compaction for C* to be able to evict the row effectively. My understanding of these parameters is that they trigger a compaction on any single SSTable that exceeds the ratio. This works properly if you never update a row (by modifying a value or adding a column): if your workflow is write-once per partition key, the parameter will do the job. If you have row fragments, you might trigger this compaction for nothing. For frequently updated rows (as with wide rows / time series), your only way to get rid of tombstones is a major compaction. That's how I understand this. Hope this helps,

C*heers,
Alain

2015-01-30 1:29 GMT+01:00 Mohammed Guller:

Ravi - it may help. What version are you running? Do you know if minor compaction is getting triggered at all? One way to check would be to see how many SSTables the data directory has.

Mohammed

From: Ravi Agrawal
Sent: Thursday, January 29, 2015 1:29 PM
Subject: RE: Tombstone gc after gc grace seconds

Hi, I saw that there are 2 more interesting parameters:

a. tombstone_threshold - a ratio of garbage-collectable tombstones to all contained columns which, if exceeded by an SSTable, triggers a compaction (with no other SSTables) for the purpose of purging the tombstones. Default value: 0.2.

b. unchecked_tombstone_compaction - true enables more aggressive than normal tombstone compactions: a single-SSTable tombstone compaction runs without checking the likelihood of success. Cassandra 2.0.9 and later.

Could I use these to get what I want? The problem I am encountering is that, even long after gc_grace_seconds, I see no reduction in disk space until I run compaction manually. I was thinking of setting tombstone_threshold close to 0 and unchecked_tombstone_compaction to true. Also, we are not running nodetool repair on a weekly basis as of now.

From: Eric Stevens
Sent: Monday, January 26, 2015 12:11 PM
Subject: Re: Tombstone
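Putting parameters (a) and (b) together, the per-table change would look something like this (hypothetical keyspace and table names; unchecked_tombstone_compaction requires C* 2.0.9+, and gc_grace_seconds must still have elapsed before tombstones become purgeable at all):

```sql
-- Sketch: make single-SSTable tombstone compactions trigger at a low
-- tombstone ratio and run without the likelihood-of-success check.
ALTER TABLE mykeyspace.logdata
  WITH compaction = {
      'class': 'SizeTieredCompactionStrategy',
      'tombstone_threshold': '0.05',
      'unchecked_tombstone_compaction': 'true'
  };
```

Note the trade-off: aggressive tombstone compactions burn I/O rewriting SSTables that may share row fragments with others, which is why the thread points to major compaction as the only sure way for frequently updated wide rows.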