RE: Read Perf
Thanks. For our case, the number of rows will more or less be the same. The only thing that changes is the columns, and they keep getting added.

-----Original Message-----
From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
Sent: 26 February 2013 09:21
To: user@cassandra.apache.org
Subject: Re: Read Perf

To find stuff on disk, there is a bloom filter for each file held in memory. Per the docs, 1 billion rows needs about 2 GB of RAM, so memory use depends heavily on your number of rows. As you add more rows, you may need to raise the bloom filter false-positive chance to use less RAM, but that means slower reads. I.e., as you add more rows, you will have slower reads on a single machine.

We hit the RAM limit on one machine with 1 billion rows, so we are in the process of raising the ratio from 0.000744 (the default) to 0.1 to give us more time to solve it. Since we see little to no I/O load on our machines, we plan on moving to leveled compaction, where 0.1 is the default in new releases (the new size-tiered default is 0.01, I think).

If you store more data per row, this is not as much of an issue, but it is still something to consider. (Also, rows have a limit on total data size as well, though I am not sure what it is. I know the column limit on a row is in the millions, somewhere below 10 million.)

Later,
Dean
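For context, a rough back-of-envelope sketch (not from the original thread) of how bloom filter memory scales with row count and false-positive chance. It uses the standard bloom filter sizing formula, roughly -ln(p)/(ln 2)^2 bits per key, which lines up with the "2 GB per billion rows" figure Dean cites for the 0.000744 default.

```python
import math

def bloom_filter_bytes(num_rows, fp_chance):
    """Approximate memory for an optimally sized bloom filter:
    bits per key = -ln(p) / (ln 2)^2."""
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return num_rows * bits_per_key / 8  # bytes

rows = 10 ** 9
for p in (0.000744, 0.01, 0.1):
    gb = bloom_filter_bytes(rows, p) / 1e9
    print("fp_chance=%g: ~%.1f GB for %d rows" % (p, gb, rows))

# fp_chance=0.000744: ~1.9 GB  (matches the ~2 GB per billion rows rule of thumb)
# fp_chance=0.01:     ~1.2 GB
# fp_chance=0.1:      ~0.6 GB
```

So raising the false-positive chance from the old default to 0.1 cuts the per-node bloom filter footprint by roughly two thirds, at the cost of more wasted disk probes on reads.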
Re: Read Perf
In that case, make sure you don't plan on going into the millions of columns per row, or test the limit first, as I'm pretty sure it can't go above 10 million (from previous posts on this list).

Dean

On 2/26/13 8:23 AM, Kanwar Sangha <kan...@mavenir.com> wrote:
> Thanks. For our case, the number of rows will more or less be the same.
> The only thing that changes is the columns, and they keep getting added.
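A hedged sketch (not from the thread) of how one might "test the limit" Dean mentions with pycassa, since the poster's rows only grow in column count. The keyspace, column family, host, and comparator assumptions here are hypothetical; adjust to your own schema before running.

```python
import time
import pycassa

# Hypothetical keyspace/column family; assumes an ASCII/UTF8 column comparator.
pool = pycassa.ConnectionPool('TestKS', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'WideRows')

key = 'wide-row-probe'
batch_size = 10000

# Grow one row in batches and time a small fixed-size slice read after each
# step, so you can see where (or whether) reads degrade as columns pile up.
for step in range(1, 101):  # up to 1,000,000 columns total
    cols = dict(('col%09d' % ((step - 1) * batch_size + i), 'x')
                for i in range(batch_size))
    cf.insert(key, cols)

    start = time.time()
    cf.get(key, column_count=100)  # read a 100-column slice
    elapsed_ms = (time.time() - start) * 1000.0
    print('%d columns, slice read took %.1f ms' % (step * batch_size, elapsed_ms))
```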
Re: Read Perf
Depends, are you:

1. Reading the same size of data as the data set grows? (Reading more data generally does get slower, e.g. reading 1 MB vs. 10 MB.)
2. Reading the same number of columns as the data set grows?
3. Never reading in the entire row?

If the answer to all of the above is yes, yes, yes, then it should be fine, but it is always better to test.

ALSO, a big note: you MUST test doing a read repair, as that will slow things down BIG TIME. We only have 130 GB per node, and in general Cassandra is sized for 300 GB to 500 GB per node on a 1 TB drive (a typical config). This is due to maintenance, so TEST your maintenance operations before you get burned there. Just run nodetool upgradesstables and time it. This definitely gets slower as your data grows and gives you a good idea of how long operations will take. Of course, better yet, take a node completely out, wipe it, put it back in, and see how long it takes to get all the data back by running the read repair. With 10 TB per node, I imagine you will have a lot of issues.

Dean

On 2/26/13 8:43 AM, Kanwar Sangha <kan...@mavenir.com> wrote:
> Yep. So the read will remain constant in this case?
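A minimal sketch (not from the thread) of the timing exercise Dean suggests: wrap the nodetool maintenance command in a timer and record how long it takes as the data set grows. The keyspace and column family arguments are hypothetical; a plain `time nodetool upgradesstables` from a shell works just as well.

```python
import subprocess
import time

# Hypothetical keyspace/column family names; "nodetool upgradesstables" with
# no arguments rewrites all sstables on the node.
cmd = ['nodetool', 'upgradesstables', 'MyKeyspace', 'MyColumnFamily']

start = time.time()
subprocess.check_call(cmd)          # raises if nodetool exits non-zero
elapsed = time.time() - start
print('upgradesstables took %.1f minutes' % (elapsed / 60.0))
```

Running this periodically as data grows gives a rough curve of how long full-sstable maintenance takes per node, which is the number that blows up long before steady-state read latency does.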
Read Perf
Hi - I am doing a performance run using a modified YCSB client. I was able to populate 8 TB on a node and then ran some read workloads. I am seeing an average TPS of 930 ops/sec for random reads. There is no key cache/row cache.

Question - Will the read TPS degrade if the data size increases to, say, 20 TB, 50 TB, 100 TB? If I understand correctly, the read throughput should remain constant irrespective of the data size, since we eventually have sorted SSTables and a binary search would be done on the index to find the row?

Thanks,
Kanwar
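A quick worked illustration (not from the thread) of why the index binary search itself barely changes with data size: its depth grows only logarithmically, so any TPS drop comes from the factors the replies above point out (disk seeks, caching, bloom filter memory), not from the search. The ~1 KB average row used to turn terabytes into row counts is a hypothetical figure for illustration only.

```python
import math

# Hypothetical: assume ~1 KB per row so 8 TB ~ 8e9 rows, 100 TB ~ 1e11 rows.
for tb, rows in ((8, 8e9), (20, 2e10), (50, 5e10), (100, 1e11)):
    print('%4d TB: ~%.0f binary-search steps per index lookup' % (tb, math.log2(rows)))

# 8 TB -> ~33 steps, 100 TB -> ~37 steps: the search depth barely moves, so a
# 12x increase in data adds only a handful of comparisons per lookup.
```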
Read perf investigation
All,

I've done a bit more homework, and I continue to see long 200 ms to 300 ms read times for some keys.

Test Setup
EC2 M1Large sending requests to a 5-node C* cluster also in EC2, also all M1Large. RF=3, ReadConsistency = ONE. I'm using pycassa from Python for all communication.

Data Model
One column family with tens of millions of rows. The number of columns per row varies between 0 and 1440 (per-minute records). The values are all ints. All data is stored on EBS volumes. Total load per node is ~110 GB. According to vmstat I'm not swapping at all.

Highest %util I see:
Device:  rrqm/s   wrqm/s    r/s     w/s   rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdf       0.00  2788.00  17.00  267.50  1168.00  23020.00     85.02     32.37  107.73   1.22  34.60

A more average profile I see is:
Device:  rrqm/s   wrqm/s    r/s     w/s   rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdf       0.00     0.00  21.00    0.00  1288.00      0.00     61.33      0.37   18.38   9.43  19.80

QUESTION
Where should I look next? I'd love to get a profile of exactly where Cassandra is spending its time on a per-call basis.

Thanks in advance,
Ian
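Since the poster is already using pycassa, a minimal client-side sketch (hypothetical keyspace, column family, servers, and keys, not from the thread) for measuring per-key read latency. Probing a sample of keys this way makes it easy to see whether the 200-300 ms reads cluster around particular keys rather than being uniformly distributed.

```python
import time
import pycassa

# Hypothetical names: substitute your keyspace, column family, servers, and keys.
pool = pycassa.ConnectionPool('MyKeyspace', ['10.0.0.1:9160', '10.0.0.2:9160'])
cf = pycassa.ColumnFamily(pool, 'MinuteRecords')

keys = ['row-00001', 'row-00002', 'row-00003']   # sample of keys to probe

timings = []
for key in keys:
    start = time.time()
    try:
        cf.get(key, column_count=1440)            # up to a full day of per-minute columns
    except pycassa.NotFoundException:
        pass                                      # empty rows are expected in this model
    timings.append((time.time() - start) * 1000.0)

timings.sort()
print('min %.1f ms / median %.1f ms / max %.1f ms'
      % (timings[0], timings[len(timings) // 2], timings[-1]))
```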
RE: Read perf investigation
Uh, so look at your await time of *107.73*. From the iostat man page:

    await: The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

If the key you are reading from is not in Cassandra's key cache or row cache, Cassandra needs to do two disk seeks (http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra). This means that some of your reads *must* take on average ~215 ms, not even including network latency.

Looks like EBS, or more generally disk saturation, is your problem. Perhaps consider RAID0 with ephemeral drives.

Dan
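A quick back-of-envelope (not from the thread) restating Dan's arithmetic: with the peak await from the iostat sample and two seeks per uncached read, the floor on a cold read is roughly twice the per-request queue-plus-service time.

```python
# Worst-case iostat sample from the thread.
await_ms = 107.73              # avg time (queue + service) per I/O request, in ms
seeks_per_uncached_read = 2    # index seek + data seek on a key/row cache miss

cold_read_floor_ms = seeks_per_uncached_read * await_ms
print('uncached read floor: ~%.0f ms (excluding network latency)' % cold_read_floor_ms)
# -> ~215 ms, which matches the 200-300 ms latencies Ian reports for some keys
```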