Why Cassandra 2.1.2 couldn't populate row cache in between
Hi, I've enabled the row cache for a column family. When I request a row that is not at the beginning of the partition, Cassandra doesn't populate the row cache. Why is that? For older versions, I think it was because the row cache held the complete merged partition, so an incomplete partition couldn't reside in it. However, in the new version we can resize the cache, so why not populate it from somewhere other than the start? Nitin Padalia
Comparison of multiple ways to query cassandra
Hi, could someone please shed some light on which is the more efficient way to retrieve data from Cassandra: using a range slice query (I'm using Hector) or filtering using secondary indexes? Best, Parth
How to know disk utilization by each row on a node
Hello, everybody. Does anyone know a way to list, for an arbitrary column family, all the rows owned (including replicas) by a given node and the data size (real size or disk occupation) of each one of them on that node? I would like to do that because I have data on one of my nodes growing faster than the others, although rows (and replicas) seem evenly distributed across the cluster. So, I would like to verify if I have some specific rows growing too much. Thank you.
Re: Dynamic Columns
Hello, Have you looked at solving this challenge with clustering columns? Also, please describe the problem set details for more specific advice from this group. Starting new projects on Thrift isn't the recommended approach. Jonathan
Jonathan Lacefield Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
On Tue, Jan 20, 2015 at 1:24 PM, chetan verma chetanverm...@gmail.com wrote:
Hi, I am starting a new project with Cassandra as the database. I have unstructured data, so I need dynamic columns. In CQL3 we can achieve this via collections, but there are some downsides:
1. Collections are meant to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is limited to 64,000.
There is also no support for getting a single column by map key, which is possible via cassandra-cli. Please suggest whether I should use CQL3 or Thrift, and which driver is best.
-- *Regards,* *Chetan Verma* *+91 99860 86634*
Dynamic Columns
Hi, I am starting a new project with Cassandra as the database. I have unstructured data, so I need dynamic columns. In CQL3 we can achieve this via collections, but there are some downsides:
1. Collections are meant to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is limited to 64,000.
There is also no support for getting a single column by map key, which is possible via cassandra-cli. Please suggest whether I should use CQL3 or Thrift, and which driver is best.
-- *Regards,* *Chetan Verma* *+91 99860 86634*
[RELEASE] Apache Cassandra 2.0.12 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 2.0.12.
Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/
Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/
This version is a bug fix release[1] on the 2.0 series. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems. Enjoy!
[1]: http://goo.gl/ZeeTfs (CHANGES.txt)
[2]: http://goo.gl/1zEijH (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: Dynamic Columns
Could you please explain how we can achieve dynamic column behavior with clustering columns?
On Wed, Jan 21, 2015 at 12:10 AM, chetan verma chetanverm...@gmail.com wrote:
Hi, I am creating a review system. For instance, let's assume the following are the attributes of the system: Review { id bigint, product_id bigint, created_at timestamp, summary text, description text, pros set<text>, cons set<text>, feature_rating map<text, int>, etc. } I created the partition key as product_id (so that all the reviews for a given product will reside on the same node) and the clustering key as created_at and id (DESC) so that reviews will be sorted by time. I can have more columns, and I want to fulfil that requirement with dynamic columns, but there are limitations to it, as explained above. Could you please let me know the best way?
-- *Regards,* *Chetan Verma* *+91 99860 86634*
Re: Dynamic Columns
Hi, I am creating a review system. For instance, let's assume the following are the attributes of the system: Review { id bigint, product_id bigint, created_at timestamp, summary text, description text, pros set<text>, cons set<text>, feature_rating map<text, int>, etc. } I created the partition key as product_id (so that all the reviews for a given product will reside on the same node) and the clustering key as created_at and id (DESC) so that reviews will be sorted by time. I can have more columns, and I want to fulfil that requirement with dynamic columns, but there are limitations to it, as explained above. Could you please let me know the best way?
On Tue, Jan 20, 2015 at 11:59 PM, Jonathan Lacefield jlacefi...@datastax.com wrote:
Hello, Have you looked at solving this challenge with clustering columns? Also, please describe the problem set details for more specific advice from this group. Starting new projects on Thrift isn't the recommended approach. Jonathan
Jonathan Lacefield Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
-- *Regards,* *Chetan Verma* *+91 99860 86634*
Re: number of replicas per data center?
On Sun, Jan 18, 2015 at 8:50 PM, Kevin Burton bur...@spinn3r.com wrote:
Ah.. six replicas. At least it's super inexpensive that way (sarcasm!)
People with larger numbers of data centers do tend to reduce their replication factor per DC. It's all about how much consistency you want to risk, rebuilds over the WAN, etc. =Rob
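The trade-off Rob mentions can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name and the data center layouts are assumptions (the thread does not say how the six replicas are distributed), and it uses the usual majority formula, floor(RF / 2) + 1, for quorum sizes.

```python
def replica_totals(rf_per_dc):
    """Return (total replicas, cluster-wide quorum, per-DC local quorum sizes)."""
    total = sum(rf_per_dc.values())
    quorum = total // 2 + 1  # majority of all replicas
    local_quorum = {dc: rf // 2 + 1 for dc, rf in rf_per_dc.items()}
    return total, quorum, local_quorum


# One possible way to end up with six replicas: two DCs, three replicas each (assumed layout).
two_dcs = replica_totals({"dc1": 3, "dc2": 3})

# More DCs with a reduced per-DC factor (illustrative numbers only).
four_dcs = replica_totals({"dc1": 2, "dc2": 2, "dc3": 2, "dc4": 2})
```

With a per-DC factor of 2, a local quorum needs both local replicas, which is one way to see the consistency and availability risk that comes with reducing the replication factor per DC.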
Re: How do replica become out of sync
On Mon, Jan 19, 2015 at 5:44 PM, Flavien Charlon flavien.char...@gmail.com wrote:
Thanks Andi. The reason I was asking is that even though my nodes have been 100% available and no write has been rejected, when running an incremental repair the logs still indicate that some ranges are out of sync (which then results in large amounts of compaction). How can this be possible?
This is most likely, as you conjecture, due to slight differences between nodes at the time of Merkle tree calculation. How many rows differ? =Rob
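To illustrate why a slight difference at tree-building time marks a whole range as out of sync, here is a deliberately simplified sketch. It only models the leaves of a hash tree (one hash per token-range bucket) and is not how Cassandra actually builds its Merkle trees; the function names, the bucketing scheme and the sample data are all assumptions made for illustration.

```python
import hashlib


def leaf_hashes(rows, num_ranges):
    """Hash rows into a fixed number of token-range buckets (the tree's leaves)."""
    buckets = [hashlib.sha256() for _ in range(num_ranges)]
    for key in sorted(rows):
        token = int(hashlib.md5(key.encode()).hexdigest(), 16)
        buckets[token % num_ranges].update(f"{key}={rows[key]};".encode())
    return [b.hexdigest() for b in buckets]


def mismatched_ranges(rows_a, rows_b, num_ranges=4):
    """Return the indexes of the ranges whose leaf hashes differ between two replicas."""
    a = leaf_hashes(rows_a, num_ranges)
    b = leaf_hashes(rows_b, num_ranges)
    return [i for i in range(num_ranges) if a[i] != b[i]]


replica_a = {f"row{i}": "v1" for i in range(100)}
replica_b = dict(replica_a)
replica_b["row42"] = "v2"  # a write that reached B just before its tree was built

out_of_sync = mismatched_ranges(replica_a, replica_b)
```

A single differing row is enough to flag the entire range that contains it, so the amount of data streamed and compacted afterwards can be much larger than the actual difference.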
Re: Why Cassandra 2.1.2 couldn't populate row cache in between
On Mon, Jan 19, 2015 at 11:57 PM, nitin padalia padalia.ni...@gmail.com wrote:
I've enabled the row cache for a column family. When I request a row that is not at the beginning of the partition, Cassandra doesn't populate the row cache. Why is that? For older versions, I think it was because the row cache held the complete merged partition, so an incomplete partition couldn't reside in it. However, in the new version we can resize the cache, so why not populate it from somewhere other than the start?
https://issues.apache.org/jira/browse/CASSANDRA-5357 has the details of the new version of the row cache. =Rob
Re: Compaction failing to trigger
On Sun, Jan 18, 2015 at 6:06 PM, Flavien Charlon flavien.char...@gmail.com wrote:
It's set on all the tables, as I'm using the default for all the tables. But for that particular table there are 41 SSTables between 60MB and 85MB; it should only take 4 for the compaction to kick in.
What version of Cassandra are you running? Are they all live? Are there pending compactions, or exceptions regarding compactions in your logs?
As this is probably a bug and going back in the mailing list archive, it seems it's already been reported:
This is a weird statement. Are you saying that you've found it in the mailing list archives? If so, why not paste the threads so those of us who might remember can refer to them?
- Will it be fixed in 2.1.3?
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ =Rob
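Flavien's expectation can be checked with a simplified model of size-tiered bucketing. This is a sketch, not Cassandra's implementation: the function names are made up, the individual SSTable sizes are invented to match the reported 60MB to 85MB spread, and the `bucket_low`/`bucket_high` values of 0.5 and 1.5 are the strategy defaults as I understand them. The real strategy applies further filters that this model ignores.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar' size, smallest first."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no existing bucket is close enough
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4):
    """Buckets holding at least min_threshold SSTables are eligible by size alone."""
    return [b for b in size_tiered_buckets(sizes_mb) if len(b) >= min_threshold]


# 41 SSTables spread evenly between 60MB and 85MB (sizes invented for illustration).
sstables = [60 + 25 * i / 40 for i in range(41)]
candidates = compaction_candidates(sstables)
```

Under this model all 41 SSTables land in a single bucket, far above the threshold of 4, so size alone does not explain why compaction is not triggered; that is consistent with looking for another cause, as the replies do.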
Re: keyspace not exists?
On Sun, Jan 18, 2015 at 8:55 PM, Jason Wee peich...@gmail.com wrote: two nodes running cassandra 2.1.2 and one running cassandra 2.1.1 For the record, this is an unsupported persistent configuration. You are only supposed to have split minor versions during an upgrade. I have no idea if it is causing the problem you are having. =Rob
Re: Compaction failing to trigger
@Rob - he's probably referring to the thread titled "Reasons for nodes not compacting?" where Tyler speculates that the tables are falling below the cold read threshold for compaction. He speculated it may be a bug. At the same time in a different thread, Roland had a similar problem, and Tyler's proposed workaround seemed to work for him.
On Tue, Jan 20, 2015 at 3:35 PM, Robert Coli rc...@eventbrite.com wrote:
On Sun, Jan 18, 2015 at 6:06 PM, Flavien Charlon flavien.char...@gmail.com wrote:
It's set on all the tables, as I'm using the default for all the tables. But for that particular table there are 41 SSTables between 60MB and 85MB; it should only take 4 for the compaction to kick in.
What version of Cassandra are you running? Are they all live? Are there pending compactions, or exceptions regarding compactions in your logs?
As this is probably a bug and going back in the mailing list archive, it seems it's already been reported:
This is a weird statement. Are you saying that you've found it in the mailing list archives? If so, why not paste the threads so those of us who might remember can refer to them?
- Will it be fixed in 2.1.3?
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ =Rob
Re: Dynamic Columns
Maybe this is the closest thing to dynamic columns in CQL 3:
create table review ( product_id bigint, created_at timestamp, data_key text, data_tvalue text, data_ivalue int, primary key ((product_id, created_at), data_key) );
data_tvalue and data_ivalue are optional.
At 2015-01-21 04:44:07, chetan verma chetanverm...@gmail.com wrote:
Hi, Adding to my previous mail, for example: we have a column family named review (with some arbitrary data in maps).
CREATE TABLE review ( product_id bigint, created_at timestamp, data_int map<text, int>, data_text map<text, text>, PRIMARY KEY (product_id, created_at) );
Assume that I use these two maps to store arbitrary data (i.e. data_int and data_text for int and text values). When we look at the output in cassandra-cli, each entry appears within the partition with clustering_key:data_int:map_key as the column name and the map value as the column value. Suppose I need to get this value: I couldn't do that with CQL3, but in Thrift it's possible. Any solution?
On Wed, Jan 21, 2015 at 1:06 AM, chetan verma chetanverm...@gmail.com wrote:
Hi, Most of the time I will be querying on product_id and created_at, but for analytics I need to query on almost all columns. The multiple-collections idea is good, but the only problem is that Cassandra reads a collection entirely. What if I need a slice of it, i.e. the columns for certain keys, which is possible with Thrift? Please suggest.
On Wed, Jan 21, 2015 at 12:36 AM, Jonathan Lacefield jlacefi...@datastax.com wrote:
Hello, There are probably lots of options for this challenge. The more details around your use case that you can provide, the easier it will be for this group to offer advice. A few follow-up questions:
- How will you query this data?
- Do your queries require filtering on specific columns other than product_id and created_at, i.e. the dynamic columns?
Depending on the answers to these questions, you have several options, of which here are a few:
- Cassandra efficiently stores sparse data, so you could create columns and not populate them, without much of a penalty
- Could use a clustering column to store a column's type and another col (potentially clustering) to store the value
  - i.e. CREATE TABLE foo (col1 int, attname text, attvalue text, col4...n, PRIMARY KEY (col1, attname, attvalue));
  - where attname stores the name of the attribute/column and attvalue stores the value of that attribute
  - have seen users use this model and create a main attribute row within a partition that stores the values associated with col4...n
- Could store multiple collections
- Others probably have ideas as well
You may want to look in the archives for a similar discussion topic. Believe this item was asked a few months ago as well.
Jonathan Lacefield Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
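As a rough mental model of what the data_key table (and Jonathan's attname/attvalue variant) buys over a map column, the following sketch keeps one sorted list of attribute rows per partition. It is a toy in-memory model, not driver code: the `Partition` class, the `insert` helper and the sample attribute names are hypothetical, and it only assumes that rows within a partition are ordered by the clustering column, as the thread describes.

```python
from bisect import bisect_left, bisect_right


class Partition:
    """Rows of one partition, kept sorted by the clustering column (data_key)."""

    def __init__(self):
        self._keys = []    # sorted clustering values
        self._values = {}  # data_key -> (data_tvalue, data_ivalue)

    def upsert(self, data_key, tvalue=None, ivalue=None):
        if data_key not in self._values:
            self._keys.insert(bisect_left(self._keys, data_key), data_key)
        self._values[data_key] = (tvalue, ivalue)

    def get(self, data_key):
        """Read a single attribute without touching the others."""
        return self._values.get(data_key)

    def slice_keys(self, start, end):
        """Read only the attributes whose key falls in [start, end]."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = {}  # (product_id, created_at) -> Partition


def insert(product_id, created_at, data_key, tvalue=None, ivalue=None):
    table.setdefault((product_id, created_at), Partition()).upsert(data_key, tvalue, ivalue)


insert(42, "2015-01-20T10:00", "battery_rating", ivalue=4)
insert(42, "2015-01-20T10:00", "color", tvalue="black")
insert(42, "2015-01-20T10:00", "screen_rating", ivalue=5)

review = table[(42, "2015-01-20T10:00")]
one = review.get("color")
some = review.slice_keys("b", "d")
```

Because each attribute is its own row, a single key or a range of keys can be read on its own, which is the capability Chetan says he misses with collections.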
Re: Dynamic Columns
I think that table example misses the point of Chetan's functional requirement. He actually needs dynamic columns.
On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing xu_zhong_x...@163.com wrote:
Maybe this is the closest thing to dynamic columns in CQL 3:
create table review ( product_id bigint, created_at timestamp, data_key text, data_tvalue text, data_ivalue int, primary key ((product_id, created_at), data_key) );
data_tvalue and data_ivalue are optional.
Re:Re: Dynamic Columns
I approximate dynamic columns with the data_key and data_value columns. Is there a better way to get dynamic columns in CQL 3?
At 2015-01-21 09:41:02, Peter Lin wool...@gmail.com wrote:
I think that table example misses the point of Chetan's functional requirement. He actually needs dynamic columns.
Re: Dynamic Columns
The original dynamic column idea in the Google BigTable paper is a mapping of: (row key, raw bytes) -> raw bytes
The restriction imposed by CQL is, as far as I understand, that you need to have a type for each column. If the value types involved in the schema are limited, e.g. text or int or timestamp, we can approximate the raw bytes mapping by setting up a few value columns of explicit type.
At 2015-01-21 10:46:27, Peter Lin wool...@gmail.com wrote:
The thing is, CQL only handles some types of dynamic column use cases. There are plenty of examples on datastax.com that show how to do CQL-style dynamic columns. Based on what was described by Chetan, I don't feel CQL3 is a perfect fit for what he wants to do. To use CQL3, he'd have to change his approach. In my temporal database, I use both Thrift and CQL. They complement each other very nicely. I don't understand why people have to put down Thrift or pretend that CQL supports 100% of the use cases. Lots of people started using Cassandra before CQL and had no problems using Thrift. Yes, you have to understand more and the learning curve is steeper, but taking time to learn the internals of Cassandra is a good thing. Using CQL3 lists or maps would force the query to load the entire collection, but that is by design. To get the full power of the old style of dynamic columns, Thrift is a better fit. I hope CQL continues to improve so that it supports 100% of the existing use cases.
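The approximation described above, a few explicitly typed value columns standing in for the raw bytes, can be sketched as follows. This is only an illustration of the idea: `data_tvalue` and `data_ivalue` come from the thread, while `data_tsvalue`, the `row_key` field and both helper functions are hypothetical names introduced here.

```python
from datetime import datetime

# Typed value columns; data_tsvalue is a hypothetical addition for timestamps.
TYPED_COLUMNS = {str: "data_tvalue", int: "data_ivalue", datetime: "data_tsvalue"}


def to_row(row_key, column_name, value):
    """Map one dynamic (row key, column name) -> value cell onto typed columns."""
    column = TYPED_COLUMNS.get(type(value))
    if column is None:
        raise TypeError(f"no typed value column for {type(value).__name__}")
    row = {"row_key": row_key, "data_key": column_name}
    row.update({name: None for name in TYPED_COLUMNS.values()})
    row[column] = value  # exactly one typed column is populated
    return row


def from_row(row):
    """Recover the dynamic value: whichever typed column is populated."""
    for name in TYPED_COLUMNS.values():
        if row[name] is not None:
            return row[name]
    return None


rating = to_row("product-1", "rating", 5)
color = to_row("product-1", "color", "black")
```

The limitation is visible in the `TypeError` branch: any value type without a dedicated column cannot be stored, which is the sense in which this only approximates the raw-bytes mapping.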
Re: Re: Dynamic Columns
The thing is, CQL only handles some types of dynamic column use cases. There are plenty of examples on datastax.com that show how to do CQL-style dynamic columns. Based on what Chetan described, I don't feel CQL3 is a perfect fit for what he wants to do. To use CQL3, he'd have to change his approach. In my temporal database, I use both Thrift and CQL. They complement each other very nicely. I don't understand why people have to put down Thrift or pretend CQL supports 100% of the use cases. Lots of people started using Cassandra before CQL and had no problems using Thrift. Yes, you have to understand more and the learning curve is steeper, but taking time to learn the internals of Cassandra is a good thing. Using CQL3 lists or maps forces the query to load the entire collection, but that is by design. To get the full power of the old style of dynamic columns, Thrift is a better fit. I hope CQL continues to improve so that it supports 100% of the existing use cases.
On Tue, Jan 20, 2015 at 8:50 PM, Xu Zhongxing xu_zhong_x...@163.com wrote: I approximate dynamic columns by data_key and data_value columns. Is there a better way to get dynamic columns in CQL 3?
At 2015-01-21 09:41:02, Peter Lin wool...@gmail.com wrote: I think that table example misses the point of Chetan's functional requirement. He actually needs dynamic columns.
On Tue, Jan 20, 2015 at 8:12 PM, Xu Zhongxing xu_zhong_x...@163.com wrote: Maybe this is the closest thing to dynamic columns in CQL 3.
create table review ( product_id bigint, created_at timestamp, data_key text, data_tvalue text, data_ivalue int, primary key ((product_id, created_at), data_key) );
data_tvalue and data_ivalue are optional.
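As a rough illustration of the data_key / data_tvalue / data_ivalue layout quoted in this message, the sketch below shows how a client might turn the key/value rows of one review back into a single set of "dynamic" attributes. The two CQL strings assume the table definition from the quoted message; the helper name pivot_attributes and the sample rows are hypothetical, and no driver call is made here.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

# CQL text for the table sketched in the quoted message (assumed schema).
# `?` is the bind marker used by prepared statements.
SELECT_ALL_ATTRS = (
    "SELECT data_key, data_tvalue, data_ivalue FROM review "
    "WHERE product_id = ? AND created_at = ?"
)
# Because data_key is a clustering column, one attribute can be read by key.
SELECT_ONE_ATTR = SELECT_ALL_ATTRS + " AND data_key = ?"

# One result row: (data_key, data_tvalue, data_ivalue); unused value is None.
Row = Tuple[str, Optional[str], Optional[int]]


def pivot_attributes(rows: Iterable[Row]) -> Dict[str, Any]:
    """Collapse key/value rows into one attribute dict (hypothetical helper)."""
    attrs: Dict[str, Any] = {}
    for data_key, tvalue, ivalue in rows:
        # Prefer the text value when present, otherwise fall back to the int.
        attrs[data_key] = tvalue if tvalue is not None else ivalue
    return attrs


# Made-up rows standing in for the result of SELECT_ALL_ATTRS.
sample_rows = [("color", "red", None), ("weight_g", None, 250)]
review_attrs = pivot_attributes(sample_rows)
print(review_attrs)
```

The trade-off, as the thread notes, is that this changes the approach: each attribute becomes its own CQL row rather than a column of one logical row.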
Re: Should one expect to see hints being stored/delivered occasionally?
On Sat, Jan 17, 2015 at 3:32 PM, Vasileios Vlachos vasileiosvlac...@gmail.com wrote: Is there any other occasion that hints are stored and then being sent in a cluster, other than network or other temporary or permanent failure? Could it be that the client responsible for establishing a connection is causing this? We use the Datastax C# driver for connecting to the cluster and we run C* 1.2.18 on Ubuntu 12.04.
Other than restarting nodes manually (which I consider a temporary failure for the purposes of this question), no. Seeing hints being stored and delivered outside of this context is a warning sign that something may be wrong with your cluster. Probably what is happening is that you have stop-the-world GCs long enough to trigger queueing of hints via timeouts during these GCs.
=Rob
Versioning in cassandra while indexing ?
Hi, I just wanted to know if there is any kind of versioning system in Cassandra when indexing new data (like the one we have in ElasticSearch, for example). For example, I have a series of payloads, each coming with an id and an 'updatedAt' timestamp. I just want to maintain the latest state of any payload for all the ids, i.e. index the data only if the current payload has a greater 'updatedAt' than the previously stored timestamp. I can do this with one additional self-lookup, but is there a way to achieve this without the overhead of an additional lookup? Thanks! -- Regards, Pandian
Re: Dynamic Columns
Hi, Adding to my previous mail, here is an example. We have a column family named review (with some arbitrary data in maps):
CREATE TABLE review ( product_id bigint, created_at timestamp, data_int map<text, int>, data_text map<text, text>, PRIMARY KEY (product_id, created_at) );
Assume I use these two maps to store arbitrary data (data_int for int values and data_text for text values). When we look at the output in cassandra-cli, each map entry appears within a partition as a column named clustering_key:data_int:map_key, with the map value as the column value. Suppose I need to get a single such value: I couldn't do that with CQL3, but in Thrift it's possible. Any solution?
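The cell naming described in this message (clustering_key:data_int:map_key as the column name, the map value as the column value) can be pictured with a toy model. This is only an illustration of the layout as the message describes it, not Cassandra's actual serialization; the function name flatten_map_cells and the sample values are made up.

```python
from typing import Dict, List, Tuple


def flatten_map_cells(
    clustering_key: str, column: str, entries: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return one (cell_name, value) pair per map entry, sorted by cell name.

    Toy model only: cell names are joined with ':' the way cassandra-cli
    displays them in the message above.
    """
    cells = [
        (f"{clustering_key}:{column}:{map_key}", value)
        for map_key, value in entries.items()
    ]
    return sorted(cells)


# Hypothetical review row whose data_int map holds two entries.
cells = flatten_map_cells("2015-01-21", "data_int", {"stars": 4, "helpful": 12})
for name, value in cells:
    print(name, "=>", value)
```

Seen this way, each map entry is its own cell inside the partition, which is why a Thrift client addressing cells by name can fetch one entry directly.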
Re: How to know disk utilization by each row on a node
Hi, Datastax comes with sstablekeys that does that. You could also use the sstable2json script to find keys. Cheers, Jens
On Tue, Jan 20, 2015 at 2:53 PM, Edson Marquezani Filho edsonmarquez...@gmail.com wrote: Hello, everybody. Does anyone know a way to list, for an arbitrary column family, all the rows owned (including replicas) by a given node and the data size (real size or disk occupation) of each one of them on that node? I would like to do that because I have data on one of my nodes growing faster than the others, although rows (and replicas) seem evenly distributed across the cluster. So, I would like to verify if I have some specific rows growing too much. Thank you.
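Building on the sstable2json suggestion in this reply, a small script could rank row keys by how much data they carry in one SSTable dump. The sketch below assumes the dump is a JSON array of objects with a "key" field and a "cells" (or, in older versions, "columns") list; that layout varies by Cassandra version, so treat it as an assumption and check it against real output. The function names are hypothetical, and the size is only a proxy (length of the JSON-encoded cells), not on-disk bytes.

```python
import json
from typing import Dict, List, Tuple


def row_sizes(sstable_json: str) -> Dict[str, int]:
    """Approximate per-row size as the length of the JSON-encoded cells."""
    sizes: Dict[str, int] = {}
    for row in json.loads(sstable_json):
        # Assumed field names; adjust to what your sstable2json version emits.
        cells = row.get("cells", row.get("columns", []))
        sizes[row["key"]] = sizes.get(row["key"], 0) + len(json.dumps(cells))
    return sizes


def largest_rows(sizes: Dict[str, int], top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` row keys with the largest approximate size."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Made-up dump standing in for the output of sstable2json on one SSTable.
sample = json.dumps([
    {"key": "a1", "cells": [["c1", "x", 1]]},
    {"key": "b2", "cells": [["c1", "x" * 100, 1], ["c2", "y", 2]]},
])
sizes = row_sizes(sample)
top = largest_rows(sizes, top=1)
print(top)
```

A row can span several SSTables, so the per-file results would need to be summed across all SSTables of the column family on that node before comparing rows.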
Re: Dynamic Columns
Hi, Most of the time I will be querying on product_id and created_at, but for analytics I need to query on almost all columns. The multiple-collections idea is good, but the only problem is that Cassandra reads a collection entirely. What if I need a slice of it, I mean the columns for certain keys, which is possible with Thrift? Please suggest.
-- Regards, Chetan Verma +91 99860 86634
Re: Dynamic Columns
Hello, There are probably lots of options for this challenge. The more details about your use case that you can provide, the easier it will be for this group to offer advice. A few follow-up questions:
- How will you query this data?
- Do your queries require filtering on specific columns other than product_id and created_at, i.e. the dynamic columns?
Depending on the answers to these questions, you have several options, of which here are a few:
- Cassandra efficiently stores sparse data, so you could create columns and not populate them, without much of a penalty
- Could use a clustering column to store a column's type and another col (potentially clustering) to store the value
- i.e. CREATE TABLE foo (col1 int, attname text, attvalue text, col4...n, PRIMARY KEY (col1, attname, attvalue));
- where attname stores the name of the attribute/column and attvalue stores the value of that attribute
- have seen users use this model and create a main attribute row within a partition that stores the values associated with col4...n
- Could store multiple collections
- Others probably have ideas as well
You may want to look in the archives for a similar discussion topic. Believe this item was asked a few months ago as well.
Jonathan Lacefield Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
On Tue, Jan 20, 2015 at 1:40 PM, chetan verma chetanverm...@gmail.com wrote: Hi, I am creating a review system. For instance, let's assume the following are the attributes of the system: Review{ id bigint, product_id bigint, created_at timestamp, summary text, description text, pros set<text>, cons set<text>, feature_rating map<text, int>, etc. } I created the partition key as product_id (so that all the reviews for a given product will reside on the same node) and the clustering key as created_at and id (desc) so that reviews will be sorted by time. I can have more columns, and I want to fulfil that requirement with dynamic columns, but there are the limitations explained above. Could you please let me know the best way?
On Tue, Jan 20, 2015 at 11:59 PM, Jonathan Lacefield jlacefi...@datastax.com wrote: Hello, Have you looked at solving this challenge with clustering columns? Also, please describe the problem set details for more specific advice from this group. Starting new projects on Thrift isn't the recommended approach. Jonathan
On Tue, Jan 20, 2015 at 1:24 PM, chetan verma chetanverm...@gmail.com wrote: Hi, I am starting a new project with Cassandra as the database. I have unstructured data so I need dynamic columns. In CQL3 we can achieve this via collections, but there are some downsides:
1. Collections are used to store small amounts of data.
2. The maximum size of an item in a collection is 64K.
3. Cassandra reads a collection in its entirety.
4. The number of items in a collection is restricted to 64,000.
There is also no support for getting a single column by map key, which is possible via cassandra-cli. Please suggest whether I should use CQL3 or Thrift and which driver is best.
-- Regards, Chetan Verma +91 99860 86634