Re: need some help with counters
On Jun 13, 2011, at 5:10 AM, aaron morton wrote:
>> I am wondering how to index on the most recent hour as well (i.e. a "show me the top 5 URLs" type query).
>
> AFAIK that's not a great application for counters. You would need range support in the secondary indexes so you could get the first X rows ordered by a column value. To be honest, depending on scale, I'd consider a sorted set in Redis for that.
>
> Hope that helps.

It does. Thanks Aaron.

[...]
Re: need some help with counters
On 11 Jun 2011, at 00:36, Ian Holsman wrote:
> I am wondering how to index on the most recent hour as well (i.e. a "show me the top 5 URLs" type query).

AFAIK that's not a great application for counters. You would need range support in the secondary indexes so you could get the first X rows ordered by a column value. To be honest, depending on scale, I'd consider a sorted set in Redis for that.

Hope that helps.

- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

[...]
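Aaron's sorted-set suggestion maps onto Redis's ZINCRBY (bump a URL's score in the current hour's set) and ZREVRANGE (read back the top N). A minimal self-contained sketch of that pattern, using a plain dict per hour bucket instead of a live Redis instance (all names here are illustrative, not part of any real API):

```python
from collections import defaultdict

# One "sorted set" per hour bucket: {bucket: {url: score}}.
# In Redis this would be: ZINCRBY hourly:<bucket> 1 <url>
hourly = defaultdict(lambda: defaultdict(float))

def count_view(bucket, url):
    """Record one page view for url in the given hour bucket."""
    hourly[bucket][url] += 1

def top_urls(bucket, n=5):
    """Return the n highest-scoring URLs for the bucket.

    In Redis this would be: ZREVRANGE hourly:<bucket> 0 n-1 WITHSCORES
    """
    ranked = sorted(hourly[bucket].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

# Simulate some traffic in one hour bucket.
for url, views in [("/a", 3), ("/b", 7), ("/c", 1)]:
    for _ in range(views):
        count_view("20110609T01", url)

print(top_urls("20110609T01", n=2))  # [('/b', 7.0), ('/a', 3.0)]
```

In real Redis, putting each hour in its own key also solves the 48-hour cleanup problem from earlier in the thread: an EXPIRE of 48 hours on each `hourly:<bucket>` key lets Redis drop old buckets automatically.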
Re: need some help with counters
On Jun 9, 2011, at 10:04 PM, aaron morton wrote:
> I may be missing something, but could you use a column for each of the last 48 hours, all in the same row for a URL? e.g.
>
>   { /url.com/hourly : {
>       20110609T01:00:00 : 456,
>       20110609T02:00:00 : 4567,
>   } }

Yes, that would work better. I was storing all the different times in the same row:

  { /url.com : {
      H-20110609T01:00:00 : 456,
      H-20110609T02:00:00 : 4567,
      D-20110609 : 5678,
  } }

I am wondering how to index on the most recent hour as well (i.e. a "show me the top 5 URLs" type query).

> Increment the current hour only. Delete the older columns either when a read detects there are old values or as a maintenance job. Or as part of writing values for the first 5 minutes of any hour.

Yes, I thought of that. The problem with doing it on read is that there may be a case where an old URL never gets read, so it will just sit there taking up space. The maintenance job is the route I went down.

> The row will get spread out over a lot of sstables, which may reduce read speed. If this is a problem, consider a separate CF with more aggressive GC and compaction settings.

Thanks!

[...]
Re: need some help with counters
On Thu, Jun 9, 2011 at 12:41 PM, Ian Holsman <had...@holsman.net> wrote:
> Hi. I had a brief look at CASSANDRA-2103 (expiring counter columns), and I was wondering if anyone can help me with my problem.
>
> I want to keep some page-view stats on a URL at different levels of granularity (page views per hour, page views per day, page views per year, etc.). So my thinking was to create a counter with a key based on Year-Month-Day-Hour, and simply increment the counter as I go along. This works well and my metrics are going into the right places.
>
> The only problem I have is that I only need the last 48 hours' worth of metrics at the hour level. How do I get rid of the old counters? Do I need to write an archiver that will go through each URL (could be millions) and just delete them? I'm sure other people have encountered this, and was wondering how they approached it.

Here's how we are going to do it at twitter: https://issues.apache.org/jira/browse/CASSANDRA-2735

-ryan
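Ian's scheme, one counter per Year-Month-Day-Hour bucket (plus coarser day and year buckets), can be sketched in plain Python. The dict below stands in for a Cassandra counter column family; the key prefixes follow the H-/D- naming used later in the thread, and everything else is illustrative:

```python
from collections import defaultdict
from datetime import datetime

# In-memory stand-in for a Cassandra counter CF: {row_key: {column_name: count}}.
counters = defaultdict(lambda: defaultdict(int))

def record_page_view(url, when):
    """Increment one counter per granularity for a single page view."""
    buckets = [
        "H-" + when.strftime("%Y%m%dT%H:00:00"),  # hourly bucket
        "D-" + when.strftime("%Y%m%d"),           # daily bucket
        "Y-" + when.strftime("%Y"),               # yearly bucket
    ]
    for bucket in buckets:
        counters[url][bucket] += 1

# Three views of the same URL: two in the 01:00 hour, one in the 02:00 hour.
record_page_view("/url.com", datetime(2011, 6, 9, 1, 15))
record_page_view("/url.com", datetime(2011, 6, 9, 1, 45))
record_page_view("/url.com", datetime(2011, 6, 9, 2, 5))

print(counters["/url.com"]["H-20110609T01:00:00"])  # 2
print(counters["/url.com"]["D-20110609"])           # 3
```

In Cassandra the increment would be a counter `add` per bucket column; only the hourly columns ever need expiring, which is the cleanup problem the rest of the thread discusses.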
Re: need some help with counters
Hey guy, have you tried amazon turk?

--
Colin Clark
+1 315 886 3422 cell
+1 701 212 4314 office
http://cloudeventprocessing.com
http://blog.cloudeventprocessing.com
@EventCloudPro

*Sent from Star Trek like flat panel device, which although larger than my Star Trek like communicator device, may have typos and exhibit improper grammar due to haste and less than perfect use of the virtual keyboard*

On Jun 9, 2011, at 3:41 PM, Ian Holsman <had...@holsman.net> wrote:
> [...]
Re: need some help with counters
Something like this: https://issues.apache.org/jira/browse/CASSANDRA-2103 -- but this turns out not to be feasible.

On Thu, Jun 9, 2011 at 12:41 PM, Ian Holsman <had...@holsman.net> wrote:
> [...]
Re: need some help with counters
On Jun 9, 2011, at 3:44 PM, Ryan King wrote:
> Here's how we are going to do it at twitter: https://issues.apache.org/jira/browse/CASSANDRA-2735

Hi Ryan. You wouldn't have your version of cassandra up on github, would you?

Colin.. always a pleasure.
Re: need some help with counters
On Thu, Jun 9, 2011 at 1:06 PM, Ian Holsman <had...@holsman.net> wrote:
> Hi Ryan. You wouldn't have your version of cassandra up on github, would you?

No, and the patch isn't in our version yet either. We're still working on it.

-ryan
Re: need some help with counters
So would doing something like storing it in reverse (so I know what to delete) work? Or is storing a million columns in a supercolumn impossible?

I could always use a logfile and run the archiver off that as a worst case, I guess. Would doing so many deletes screw up the db / cause other problems?

---
Ian Holsman - 703 879-3128
"I saw the angel in the marble and carved until I set him free" -- Michelangelo

On 09/06/2011, at 4:22 PM, Ryan King <r...@twitter.com> wrote:
> No, and the patch isn't in our version yet either. We're still working on it.
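Ian's "storing it in reverse" idea, where the newest columns sort first so the deletable tail is easy to find, can be sketched by encoding each hour's column name as a fixed-width max-minus-epoch-hour string. This is a hypothetical encoding, shown for illustration only:

```python
from datetime import datetime, timezone

MAX_HOURS = 10**7  # larger than any epoch-hour value we will ever see

def reversed_column_name(when):
    """Column name that sorts newest-first under plain string ordering."""
    epoch_hour = int(when.replace(tzinfo=timezone.utc).timestamp()) // 3600
    # Zero-padded to a fixed width so lexicographic order == numeric order.
    return "%07d" % (MAX_HOURS - epoch_hour)

a = reversed_column_name(datetime(2011, 6, 9, 1))  # older hour
b = reversed_column_name(datetime(2011, 6, 9, 2))  # newer hour

# The later hour gets the lexicographically smaller name, so the first 48
# columns of a row slice are always the newest 48 hours; everything after
# that point in the ordering is safe to delete.
print(b < a)  # True
```

With this encoding an archiver can read columns from position 48 onward and delete them without inspecting timestamps, at the cost of the tombstone load the thread worries about.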
Re: need some help with counters
I may be missing something, but could you use a column for each of the last 48 hours, all in the same row for a URL? e.g.

  { /url.com/hourly : {
      20110609T01:00:00 : 456,
      20110609T02:00:00 : 4567,
  } }

Increment the current hour only. Delete the older columns either when a read detects there are old values, or as a maintenance job, or as part of writing values for the first 5 minutes of any hour.

The row will get spread out over a lot of sstables, which may reduce read speed. If this is a problem, consider a separate CF with more aggressive GC and compaction settings.

Cheers

- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

[...]
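The maintenance-job variant of Aaron's "delete the older columns" suggestion can be sketched in plain Python. The dict stands in for one row of hourly counter columns named in the 20110609T01:00:00 style above; the helper name is made up for illustration:

```python
from datetime import datetime, timedelta

# In-memory stand-in for one row of hourly counters, keyed by timestamp.
row = {
    "20110609T01:00:00": 456,
    "20110609T02:00:00": 4567,
    "20110611T09:00:00": 12,
}

def prune_old_hours(row, now, keep_hours=48):
    """Maintenance job: drop hourly columns older than the retention window."""
    cutoff = (now - timedelta(hours=keep_hours)).strftime("%Y%m%dT%H:00:00")
    # The fixed-width ISO-style names sort lexicographically in time order,
    # so a plain string comparison is enough to find the stale columns.
    for column in [c for c in row if c < cutoff]:
        del row[column]

prune_old_hours(row, now=datetime(2011, 6, 11, 10, 0))
print(sorted(row))  # ['20110611T09:00:00']
```

In Cassandra the same job would page over rows and issue deletes for columns below the cutoff name; because the names sort in time order, a single range query per row finds everything to remove.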