Re: Better option to load data to cassandra
On Wed, Nov 12, 2014 at 5:19 PM, cass savy casss...@gmail.com wrote: Sstableloader works well for large tables if you want to move data from Cassandra to Cassandra. This works if both C* are on the same version. Sstable2json and json2sstable is another alternative. This post is getting a bit long in the tooth, but is still pretty relevant : http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra =Rob http://twitter.com/rcolidba
Re: Cassandra sort using updatable query
Hi Jonathan, Thank you very much, it worked this way.

On Thu, Nov 13, 2014 at 12:07 AM, Jonathan Haddad j...@jonhaddad.com wrote:

With Cassandra you're going to want to model tables to meet the requirements of your queries, instead of building tables in 3NF and optimizing afterward as you would with a relational database. For your optimized select query, your table (with a caveat, see below) could start out as:

create table words ( year int, frequency int, content text, primary key (year, frequency, content) );

You may want to maintain other tables as well for different types of select statements. Your UPDATE statement above won't work; you'll have to DELETE and INSERT, since you can't change the value of a clustering column. If you don't know what your old frequency is ahead of time (to do the delete), you'll need to keep another table mapping (content, year) -> frequency.

Now, the tricky part is that the above model limits the total number of partitions to the number of years you're working with, and will not scale as the cluster increases in size. Ideally you would bucket frequencies. If that feels like too much work (it's starting to for me), this may be better suited to something like Solr, Elasticsearch, or DSE (Cassandra + Solr). Does that help? Jon

On Wed Nov 12 2014 at 9:01:44 AM Chamila Wijayarathna cdwijayarat...@gmail.com wrote:

Hello all, I have a data set with attributes content and year. I want to put them into CF 'words' with attributes ('content', 'year', 'frequency'). The CF should support the following operations:
- The frequency attribute of a column can be updated (i.e. I can run a query like UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;), where the WHERE clause contains content and year.
- It should support a select query like SELECT content FROM words WHERE year = 2010 ORDER BY frequency DESC LIMIT 10; (WHERE clause only has year), where results can be ordered by frequency.

Can this kind of requirement be fulfilled using Cassandra?
What CF structure and indexing do I need to use here? What queries should I use to create the CF and its indexes? Thank You! -- *Chamila Dilshan Wijayarathna,* SMIEEE, SMIESL, Undergraduate, Department of Computer Science and Engineering, University of Moratuwa.
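The model Jon describes in the thread can be sketched in CQL roughly as below. The table and column names follow the thread; the CLUSTERING ORDER BY clause and the companion lookup table's name are my additions, so treat this as an illustrative sketch rather than a definitive schema.

```cql
-- Main table: one partition per year, rows ordered by frequency.
-- Clustering descending means the top-N query needs no ORDER BY.
CREATE TABLE words (
    year int,
    frequency int,
    content text,
    PRIMARY KEY (year, frequency, content)
) WITH CLUSTERING ORDER BY (frequency DESC);

-- Lookup table so the old frequency can be found before an "update"
-- (frequency is a clustering column in words, so it cannot be SET):
CREATE TABLE word_frequency (
    content text,
    year int,
    frequency int,
    PRIMARY KEY ((content, year))
);

-- A frequency change is a delete of the old row plus an insert:
DELETE FROM words WHERE year = 1990 AND frequency = 1 AND content = 'abc';
INSERT INTO words (year, frequency, content) VALUES (1990, 2, 'abc');
INSERT INTO word_frequency (content, year, frequency) VALUES ('abc', 1990, 2);

-- Top-10 words for a year, highest frequency first:
SELECT content FROM words WHERE year = 2010 LIMIT 10;
```

As Jon notes, this puts all data for a year in a single partition, which will not scale indefinitely; bucketing by frequency range (or another dimension) would spread the load.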
Re: Better option to load data to cassandra
Here's another post which is pretty comprehensive for this topic: http://informationsurvival.blogspot.com/2014/02/cassandra-cql3-integration.html

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
http://www.linkedin.com/in/jlacefield/
Re: Programmatic Cassandra version detection/extraction
There is a ReleaseVersion attribute in the org.apache.cassandra.db:StorageService bean.
--- Chris Lohfink

On Wed, Nov 12, 2014 at 5:57 PM, Michael Shuler mich...@pbandjelly.org wrote:
On 11/12/2014 04:58 PM, Michael Shuler wrote:
On 11/12/2014 04:44 PM, Otis Gospodnetic wrote: Is there a way to detect which version of Cassandra one is running? Is there an API for that, or a constant with this value, or maybe an MBean or some other way to get to this info?
I'm not sure if there are other methods, but this should always work: SELECT release_version from system.local;
I asked the devs about where I might find the version in JMX and got the hint that I could cheat and look at `nodetool gossipinfo`. It looks like RELEASE_VERSION is reported as a field in org.apache.cassandra.net FailureDetector AllEndpointStates.
-- Michael
Re: Programmatic Cassandra version detection/extraction
This is interesting. If I do a SELECT release_version from system.local; on my system, it's telling me that I'm using 2.1.1:

[root@beta-new:/usr/local/apache-cassandra-2.1.2] #cqlsh
Connected to Jokefire Cluster at beta-new.jokefire.com:9042.
[cqlsh 5.0.1 | Cassandra 2.1.1 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> SELECT release_version from system.local;

 release_version
-----------------
 2.1.1

(1 rows)

[root@beta-new:/usr/local/apache-cassandra-2.1.2] #nodetool gossipinfo | grep RELEASE_VERSION
RELEASE_VERSION:2.1.1

But I definitely launched Cassandra from a cassandra-2.1.2 directory. Could this be because I rsync'd the data directory from a cassandra 2.1.1 directory over to the 2.1.2 directory?

Thanks
Tim
-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: Better option to load data to cassandra
So sstableloader is a CPU-efficient online method of loading data if you already have sstables. An option you may not have considered is just using batch inserts. It was a surprise to me coming from another database system, but C*'s primary use case is shoving data into an append-only log. Is there a faster way to write a chunk of data than as sequential writes? The only caveat is that there seems to be more parsing overhead (source: jvisualvm) when going the batch-writing route, which is why I made sure to mention sstableloader's CPU efficiency.
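For reference, a batch insert in CQL looks like the sketch below. The table and values are hypothetical, and the unlogged-batch-per-partition shape is a common loading pattern rather than something the poster specifies; logged batches across many partitions tend to hurt rather than help throughput.

```cql
-- Hypothetical table for illustration:
CREATE TABLE sensor_readings (
    sensor_id text,
    ts timestamp,
    value double,
    PRIMARY KEY (sensor_id, ts)
);

-- An unlogged batch targeting a single partition groups several
-- writes into one request without the batchlog coordination cost:
BEGIN UNLOGGED BATCH
  INSERT INTO sensor_readings (sensor_id, ts, value) VALUES ('s1', '2014-11-13 00:00:00', 1.0);
  INSERT INTO sensor_readings (sensor_id, ts, value) VALUES ('s1', '2014-11-13 00:01:00', 1.5);
APPLY BATCH;
```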
Is it more performant to split data with the same schema into multiple keyspaces, as opposed to putting all of it into the same keyspace?
Hi, we use Cassandra to store some association-type data. For example, we store user-to-course (course registration) associations and user-to-school (school enrollment) associations. The schema for these two types of associations is the same. So there are two options to store the data: 1. Put user-to-course association data into one keyspace, and user-to-school association data into another keyspace. 2. Put both of them into the same keyspace. In the long run, this data will grow to be very large. With that in mind, is it better to use the first approach (multiple keyspaces) for better performance? Thanks. George
Re: Is it more performant to split data with the same schema into multiple keyspaces, as opposed to putting all of it into the same keyspace?
Performance will be the same. There's no performance benefit to using multiple keyspaces.
Re: Is it more performant to split data with the same schema into multiple keyspaces, as opposed to putting all of it into the same keyspace?
That's not necessarily true. You don't need to split them into separate keyspaces, but separate tables may have some advantages. For example, in Cassandra 2.1, compaction and index summary management are optimized based on read rates for SSTables. If you have different read rates or patterns for the two types of data, mixing them will confuse/eliminate these optimizations. If you have two separate sets of data with (potentially) two separate read patterns, don't put them in the same table.
-- Tyler Hobbs DataStax http://datastax.com/
Re: Is it more performant to split data with the same schema into multiple keyspaces, as opposed to putting all of it into the same keyspace?
Tables, yes, but that wasn't the question. The question was about using different keyspaces.
Re: Programmatic Cassandra version detection/extraction
On Wed, Nov 12, 2014 at 2:44 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi, Is there a way to detect which version of Cassandra one is running? Is there an API for that, or a constant with this value, or maybe an MBean or some other way to get to this info? Here's the use case: SPM monitors Cassandra (http://sematext.com/spm/), but Cassandra MBeans and metrics may change over time. How will the SPM agent know which MBeans to look for, which metrics to extract, and how to interpret the values it extracts without knowing which version of Cassandra it's monitoring? It could try probing for some known MBeans and deduce the Cassandra version from that, but that feels a little sloppy. Ideally, we'd be able to grab the version from some MBean and, based on that, extract the metrics we know are exposed in that version of Cassandra. Thanks, Otis

If you are using the Java driver, you can retrieve the version of Cassandra for each Host: http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Host.html#getCassandraVersion()

-- [:-a) Alex Popescu Sen. Product Manager @ DataStax @al3xandru
Cassandra communication between 2 datacenter
Hi, we have two datacenters with this info: Cassandra version 2.1.0, DC1 with 5 nodes, DC2 with 5 nodes. We set the snitch to GossipingPropertyFileSnitch, and in cassandra-rackdc.properties we put: in DC1: dc=DC1 rack=RAC1; in DC2: dc=DC2 rack=RAC1. In every node's cassandra.yaml we define two seeds from DC1 and two seeds from DC2. We restarted both DCs and created a keyspace with NetworkTopologyStrategy in DC1, expecting that it would also be created in DC2, but that was not the case... so we created the same keyspace in DC2 and created a table in both DCs. We did an insert in DC1, but doing a select from the same table in DC2 we found 0 rows. So it seems that our clusters are not communicating with each other. Doing a nodetool status on each DC, we see only the 5 nodes corresponding to the current DC. Have we missed some configuration? Thanks in advance.
Re: Cassandra communication between 2 datacenter
On Thu, Nov 13, 2014 at 10:26 AM, Adil adil.cha...@gmail.com wrote: Hi, we have two datacenter with those inof: Cassandra version 2.1.0 DC1 with 5 nodes DC2 with 5 nodes we set the snitch to GossipingPropertyFileSnitch and in cassandra-rackdc.properties we put: in DC1: dc=DC1 rack=RAC1 in DC2: dc=DC2 rack=RAC1 and in every node's cassandra.yaml we define two seeds of DC1 and two seed of DC2. Do you start the nodes one at a time, and then consult nodetool ring (etc.) to see if the cluster coalesces in the way you expect? If so, a Keyspace created in one should very quickly be created in the other. =Rob http://twitter.com/rcolidba
Re: Cassandra communication between 2 datacenter
Yeah, we started the nodes one at a time. My doubt is whether we should also configure cassandra-topology.properties or not? We left it with default values.
Re: Cassandra communication between 2 datacenter
Are you sure that both DCs can communicate with each other over the necessary ports?
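Once the two DCs actually gossip as one cluster (nodetool status should list all 10 nodes), a keyspace meant to span both of them also needs an explicit replication entry per datacenter. A minimal sketch, with an assumed keyspace name and illustrative replication factors:

```cql
-- NetworkTopologyStrategy takes a replication factor per datacenter;
-- a DC with no entry gets no replicas of this keyspace's data.
CREATE KEYSPACE my_keyspace WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
};
```

The DC names must match what the snitch reports (here, the dc= values from cassandra-rackdc.properties).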
TLS for clients on DSE
Hi, I configured TLS encryption for cqlsh and Cassandra (open source). I now have to do it for DataStax Enterprise. I found the cassandra.yaml file for DSE (.dse-4.5.3/resources/cassandra/conf) and made the same changes as previously.. but not with the same results. Any pointers? Regards Danie
Question about node repair
Hi There, I have a question about Cassandra node repair. There is a function called forceTerminateAllRepairSessions(), so will the function terminate all the repair sessions in only one node, or will it terminate all the sessions in the ring? And when it terminates all repair sessions, does it terminate them immediately, or does it just send a terminate signal and do the real termination later? Thanks a lot. Regards, -Jieming-
OR mapping for set appends…
I’m trying to figure out the best way to handle things like set appends (and other CQL extensions) in traditional OR mapping. Our OR mapper does basic setFoo() .. then save() to write the record back to the database. So if foo is a Set then I can set all members. But I want to do some appends with a custom TTL on the set.. this isn’t normally handled in an OR system. I was thinking of having an AppendSet so if one calls setFoo with a different Set implementation the OR mapper knows to perform an append. Thoughts? -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
how wide can wide rows get?
I’m struggling with this wide row business. Is there an upward limit on the number of columns you can have? Adaryl Bob Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData
Re: how wide can wide rows get?
The theoretical limit is maybe 2 billion but recommended max is around 10-20 thousand. Br, Hannu On 14.11.2014, at 8.10, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote: I’m struggling with this wide row business. Is there an upward limit on the number of columns you can have? Adaryl Bob Wakefield, MBA Principal Mass Street Analytics 913.938.6685 www.linkedin.com/in/bobwakefieldmba Twitter: @BobLovesData
Re: how wide can wide rows get?
You can have up to 2 billion columns but there are some considerations. This article might be of some help: http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/#.VGWdT4enCS0
Re[2]: how wide can wide rows get?
We have 380k of them in some of our rows and it's ok.

-- Original Message --
From: Hannu Kröger hkro...@gmail.com
To: user@cassandra.apache.org
Sent: 14.11.2014 16:13:49
Subject: Re: how wide can wide rows get?

The theoretical limit is maybe 2 billion but recommended max is around 10-20 thousand. Br, Hannu
about collections limit
Can I scan a collection (list, set) paged by LIMIT?
Re: OR mapping for set appends…
Alternatively, build a SetWrapper that implements the Java Set interface and intercepts calls to add(), addAll(), remove()... For each of these methods, the SetWrapper will generate an appropriate UPDATE statement. This is more general-purpose than just an AppendSet. TTL is another story; it is an extra option you can add to any insert/update query.
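The set-append-with-TTL that the thread is mapping onto can be expressed directly in CQL. A minimal sketch with a hypothetical table; note that the TTL applies only to the newly appended elements, not to the rest of the set:

```cql
-- Hypothetical table for illustration:
CREATE TABLE posts (
    id uuid PRIMARY KEY,
    tags set<text>
);

-- Append an element to the set; USING TTL scopes the expiry
-- to the cells written by this statement (the new element):
UPDATE posts USING TTL 86400
   SET tags = tags + {'breaking'}
 WHERE id = 123e4567-e89b-12d3-a456-426655440000;
```

This is the UPDATE shape a SetWrapper's add()/addAll() interception would emit.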
Re: Re[2]: how wide can wide rows get?
We have up to a few hundred million columns in a super wide row. There are two major issues you should care about: 1. The wider the row is, the more memory pressure you get for every slice query. 2. Repair is row-based, which means a huge row could be transferred at every repair. 1 is not a big issue if you don't have many concurrent slice requests; having more cores is a good investment to reduce memory pressure. 2 could cause very high memory pressure as well as poorer disk utilization.