Re: Denormalization leads to terrible, rather than better, Cassandra performance -- I am really puzzled
A few observations from what you've said so far:

1) IN clauses in CQL can have a performance impact because they include sets of keys that are spread across the cluster.

2) We previously used m3.large instances in our cluster and would see occasional read timeouts even at CL.ONE. We upgraded to i2.xlarge with local SSD drives and no longer experience those problems.

3) You didn't state how your storage is configured, but if you're using EBS for your Cassandra partitions, it can seriously impact performance due to network lag. If you're using local storage on your instances (which is recommended), you should use separate drives for your data and commitlog partitions because the access patterns are very different. SSD is the preferred local storage option (over spinning disks) in any case.

4) If you have all of the above covered, you might also want to compare CL.ONE read results with CL.QUORUM read results. The former will likely perform much better.

Any of the issues above can cause excessive contention between nodes and seriously degrade performance as traffic increases. These are just some of the first things that jump out. Others on the list are a lot more experienced than I am with Cassandra performance and may have additional advice. There are also quite a few good papers and videos on Planet Cassandra and the YouTube channel regarding performance, storage, data models, and the interactions between them. Hope that helps, Steve
Re: Denormalization leads to terrible, rather than better, Cassandra performance -- I am really puzzled
Hello, there. In relation to the Java driver, I would recommend updating to the latest version, as there were a lot of issues reported in versions earlier than 2.0.9 where the driver incorrectly marks nodes as down/unavailable. In fact, there is a new version of the driver being released in the next 24-48 hours that reverts JAVA-425 to resolve this issue. Cheers, Erick

On Wed, Apr 29, 2015 at 4:56 AM, dlu66061 dlu66...@yahoo.com wrote:

Cassandra gurus, I am really puzzled by my observations, and hope to get some help explaining the results. Thanks in advance.

I think it has always been advocated in the Cassandra community that denormalization leads to better performance. I wanted to see how much performance improvement it can offer, but the results were totally the opposite: performance degraded dramatically under simultaneous requests for the same set of data.

*Environment:* I have a Cassandra cluster consisting of 3 AWS m3.large instances, with Cassandra 2.0.6 installed and pretty much default settings. My program is written in Java using Java Driver 2.0.8.

*Normalized case:* I have two tables created with the following 2 CQL statements:

    CREATE TABLE event (
        event_id UUID,
        time_token timeuuid,
        ... 30 other attributes ...,
        PRIMARY KEY (event_id)
    )

    CREATE TABLE event_index (
        index_key text,
        time_token timeuuid,
        event_id UUID,
        PRIMARY KEY (index_key, time_token)
    )

In my program, given the proper index_key and a token range (tokenLowerBound to tokenUpperBound), I first query the event_index table:

*Query 1:*

    SELECT * FROM event_index
    WHERE index_key IN (...)
      AND time_token > tokenLowerBound
      AND time_token <= tokenUpperBound
    ORDER BY time_token ASC LIMIT 2000

to get a list of event_ids, and then run the following CQL to get the event details.
*Query 2:*

    SELECT * FROM event WHERE event_id IN (a list of event_ids from the above query)

I repeat the above process, with the token range updated from the previous run. This actually performs pretty well. In this normalized process, I have to *run 2 queries* to get the data: the first one should be very quick since it is getting a slice of an internally wide row; the second query may take longer because it needs to hit up to 2000 rows of the event table.

*De-normalized case:* What if we can attach the event detail to the index and run just 1 query? Like Query 1, would it be much faster, since it is also getting a slice of an internally wide row? I created a third table that merges the above two tables together. Notice the first three attributes and the PRIMARY KEY definition are exactly the same as in the event_index table:

    CREATE TABLE event_index_with_detail (
        index_key text,
        time_token timeuuid,
        event_id UUID,
        ... 30 other attributes ...,
        PRIMARY KEY (index_key, time_token)
    )

Then I can just run the following query to achieve my goal, with the same index and token range as in Query 1:

*Query 3:*

    SELECT * FROM event_index_with_detail
    WHERE index_key IN (...)
      AND time_token > tokenLowerBound
      AND time_token <= tokenUpperBound
    ORDER BY time_token ASC LIMIT 2000

*Performance observations:* Using Java Driver 2.0.8, I wrote a program that runs Query 1 + Query 2 in the normalized case, or Query 3 in the denormalized case. All queries are set to LOCAL_QUORUM consistency level. Then I created 1 or more instances of the program to simultaneously retrieve the SAME set of 1 million events stored in Cassandra. Each test runs for 5 minutes, and the results are shown below. Note that the unit of measure is number of operations.

                   1 instance    5 instances    10 instances
    Normalized         89            315            417
    Denormalized      100            *43*           *3*
So in the normalized case, the program runs 89 times and retrieves 178K events for a single instance; 315 times and 630K events across 5 instances (each instance gets about 126K events); and 417 times and 834K events across 10 simultaneous instances (each instance gets about 83.4K events).

For the de-normalized case, the performance is a little better in the single-instance case, where the program runs 100 times and retrieves 200K events. However, it turns sharply south for multiple simultaneous instances. All 5 instances together completed only 43 operations successfully, and all 10 instances together completed only 3. In the latter case, the log showed that 3 instances each retrieved 2000 events successfully, while the other 7 instances retrieved 0. In the de-normalized case, the program reported a lot of exceptions like:

    com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)
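For reference, the normalized read loop described above (slice the index by time_token, fetch the details for that page, then advance the lower bound) can be sketched as follows. This is a minimal stand-in, not the poster's actual Java program: the two fetch functions represent Query 1 and Query 2, and all names are illustrative.

```python
# Sketch of the normalized read loop: page through event_index by
# time_token, then fetch the event rows for the ids in each page.
# fetch_index_page / fetch_events stand in for the real CQL queries.

PAGE_SIZE = 2000

def page_through(fetch_index_page, fetch_events, token_lower, token_upper):
    """Yield batches of event rows, advancing the token lower bound past
    the last row seen, mirroring the Query 1 + Query 2 process."""
    while True:
        # Query 1: a slice of the wide row, over (token_lower, token_upper]
        index_rows = fetch_index_page(token_lower, token_upper, PAGE_SIZE)
        if not index_rows:
            return
        event_ids = [row["event_id"] for row in index_rows]
        # Query 2: hit up to PAGE_SIZE rows of the event table
        yield fetch_events(event_ids)
        # The next iteration starts just past the last token returned
        token_lower = index_rows[-1]["time_token"]
```

The key detail is the last line: because Query 1 is ordered by time_token, the last row of each page becomes the exclusive lower bound of the next page.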
Re: Denormalization
In my experience, we design main column families and lookup column families. The main column family holds all of the denormalized data, and each lookup column family maps a lookup value to the row key of the main column family. For example, the users column family holds all of a user's denormalized data, with a lookup column family named userByemail. When a request first goes to userByemail, it returns the unique key that is the row key of the User column family; a call to the User column family then returns all the data. Other lookup column families work the same way. - Chandra

On Sun, Jan 27, 2013 at 4:03 PM, Fredrik Stigbäck fredrik.l.stigb...@sitevision.se wrote: Hi. Since denormalized data is a first-class citizen in Cassandra, how do you handle updating denormalized data? E.g. if we have a USER cf with name, email, etc. and denormalize the user data into many other CFs, and then update the information about a user (name, email, ...), what is the best way to handle updating those user data properties, which might be spread out over many CFs and many rows? Regards /Fredrik
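A minimal sketch of that lookup pattern, using plain dictionaries to stand in for the two column families (the names and data here are purely illustrative):

```python
# Two in-memory "column families": the main CF keyed by user id, and a
# lookup CF mapping email -> user id, mirroring the userByemail pattern.
users = {
    "u-123": {"name": "Alice", "email": "alice@example.com"},
}
user_by_email = {
    "alice@example.com": "u-123",
}

def get_user_by_email(email):
    """First request hits the lookup CF for the row key; a second
    request hits the main CF for the full denormalized data."""
    user_id = user_by_email.get(email)
    if user_id is None:
        return None
    return users.get(user_id)
```

The cost is two round trips per lookup, but the user's data lives in exactly one place, so an update touches the main row plus one small lookup row rather than every denormalized copy.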
Re: Denormalization
There is really a mix of denormalization and normalization; it really depends on specific use-cases. To get better help on the email list, a more specific use case may be appropriate. Dean
Re: Denormalization
I don't have a current use-case. I was just curious how applications handle this and how to think when modelling, since I guess denormalization might increase the complexity of the application. Fredrik -- Fredrik Larsson Stigbäck, SiteVision AB, Vasagatan 10, 107 10 Örebro, 019-17 30 30
Re: Denormalization
In my experience, if you foresee needing to do a lot of updates where a master record would need to propagate its changes to other records, then in general a non-SQL data store may be the wrong fit for your data. If you have a lot of data that doesn't really change, or is not linked in some way to other rows (in Cassandra's case), then a non-SQL data store could be a great fit. Yes, you can do some fancy stuff to force things like Cassandra to behave like an RDBMS, but it comes at the cost of application complexity: more code, more bugs. I often end up mixing SQL and non-SQL data stores to play to their respective strengths. If I start seeing a lot of related data, relational databases are really good at solving that problem.
Re: Denormalization
Things like PlayOrm exist to help you with half-and-half of denormalized and normalized data. There are more and more patterns out there for denormalization and normalization that still allow for scalability. Here is one patterns page: https://github.com/deanhiller/playorm/wiki/Patterns-Page Dean
Re: Denormalization
Oh, and check out the last pattern, Scalable equals only index, which can allow you to still have normalized data. The pattern does just enough denormalization that you can: 1. Update just two pieces of info (the user's email, for instance, and the xref table's email as well). 2. Let everyone else hold foreign references into that piece (everyone references the guid, not the email, while the xref table maps email to guid for your use). This can actually be quite a common pattern when you are having issues denormalizing. Dean
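A rough sketch of that xref idea, again with in-memory dictionaries standing in for the tables (all names are illustrative): other records hold the stable guid, so only the user row and the email-to-guid xref row change when the email changes.

```python
# Main table keyed by a stable guid; an xref table maps email -> guid.
# Everything else references the guid, so an email change touches
# exactly two rows: the user row and the xref row.
users = {"guid-1": {"email": "old@example.com", "name": "Dean"}}
email_xref = {"old@example.com": "guid-1"}
acl_entries = [{"object": "doc-42", "user": "guid-1"}]  # references by guid

def change_email(guid, new_email):
    old_email = users[guid]["email"]
    users[guid]["email"] = new_email   # update 1: the user row
    del email_xref[old_email]          # update 2: the xref row
    email_xref[new_email] = guid
```

Because the ACL entries reference the guid rather than the email, none of them need rewriting when the email changes, which is exactly the point of the pattern.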
Re: Denormalization
One technique is, on the client side, to build a tool that takes the event and produces N mutations. In C*, writes are cheap, so essentially you re-write everything on all changes.
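That fan-out can be sketched as a small client-side helper that turns one logical change into one mutation per denormalized view; the view names here are made up for illustration:

```python
# Turn one logical user-change event into N mutations, one per
# denormalized view that carries a copy of the user's data.
VIEWS = ["users", "user_by_email", "posts_by_user"]  # illustrative names

def mutations_for(event):
    """Produce one (table, key, payload) mutation per view, so every
    copy of the denormalized data is re-written on each change."""
    return [(view, event["user_id"], dict(event["fields"])) for view in VIEWS]
```

In a real application each tuple would become an insert through the driver; the point is that the write amplification factor is simply the number of views holding a copy.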
Re: Denormalization
When I said that writes were cheap, I meant that in a normal case people are making 2-10 inserts for what in a relational database might be one. 30K inserts is certainly not cheap; your use case with 30,000 inserts is probably a special case. Most directory services that I am aware of (OpenLDAP, Active Directory, Sun Directory Server) do eventually consistent master/slave and multi-master replication, so no worries about having to background something. You just want the replication to be fast enough that when you call the employee about to be fired into the office, by the time he leaves and gets home he cannot VPN in and rm -rf / your main file server :)

On Sun, Jan 27, 2013 at 7:57 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

Sometimes this is true, sometimes not... We have a use case with an admin tool where we chose to do this denorm for ACLs, to make permission checks extremely fast. That said, we have one issue with one object that has too many children (30,000): when someone gives a user access to this one object with 30,000 children, we end up with a bad 60-second wait, and users ended up getting frustrated and trying to cancel. (Our plan, since admin activity hardly ever happens, is to do it on a background thread, return immediately to the user, and tell him his changes will take effect in 1 minute.) After all, admin changes are infrequent anyway. This example demonstrates how it can sometimes almost burn you.

I guess my real point is that it really depends on your use cases ;). In a lot of cases denorm can work, but in some cases it burns you, so you have to balance it all. In 90% of our cases our denorm is working great, and for this one case we need to background the permission change, as we still LOVE the performance of our ACL checks. P.S. 30,000 writes in Cassandra is not cheap when done from one server ;) but in general parallelized writes are very fast for something like 500. Later, Dean
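Dean's point about parallelized writes can be sketched like this: instead of issuing 30,000 writes sequentially from one client, issue them in concurrent chunks of a few hundred. The writer function and chunk size are illustrative stand-ins, not a real driver call:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 500  # illustrative parallelism, per the "very fast for like 500" remark

def write_all(write_one, mutations, chunk=CHUNK):
    """Apply write_one to every mutation, `chunk` at a time in parallel.
    write_one stands in for a real driver insert (ideally async)."""
    done = 0
    with ThreadPoolExecutor(max_workers=chunk) as pool:
        for i in range(0, len(mutations), chunk):
            # wait for each chunk to finish before starting the next,
            # so one client never has more than `chunk` writes in flight
            done += sum(1 for _ in pool.map(write_one, mutations[i:i + chunk]))
    return done
```

With a real driver you would use its async execute API rather than threads, but the bounded in-flight window is the same idea: enough concurrency to hide latency without flooding the coordinator from a single server.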
Re: Denormalization
Agreed, was just making sure others knew ;). Dean