Re: implementing a 'sorted set' on top of cassandra
That was what I had in mind. Which solution (populate on demand or pre-populate) really fits your needs depends on:

- write frequency
- required cache expiration time
- read frequency
- ratio of written to read sets

If you prefer an event-based approach, the Kafka solution Jon proposed could be quite interesting.

2017-01-17 19:10 GMT+01:00 Jonathan Haddad:

> You could store the key -> score pairs in Cassandra, pull out the full
> partition and repopulate the cache in redis with the top N whatever you
> need. I'd only read the Cassandra values directly in order to repopulate
> the cache.
>
> I wouldn't try to store the score -> key values, the perf will be a
> nightmare.
Re: implementing a 'sorted set' on top of cassandra
You could store the key -> score pairs in Cassandra, pull out the full partition and repopulate the cache in redis with the top N whatever you need. I'd only read the Cassandra values directly in order to repopulate the cache.

I wouldn't try to store the score -> key values, the perf will be a nightmare.

On Tue, Jan 17, 2017 at 8:47 AM Mike Torra wrote:

> Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are
> indeed what we use today.
>
> Caching the resulting 'sorted sets' in redis is exactly what I plan to do.
> There will be tens of thousands of these sorted sets, each generally with
> <10k items (with maybe a few exceptions going a bit over that). The reason
> to periodically calculate the set and store it in cassandra is to avoid
> having the client do that work, when the client only really cares about the
> top 100 or so items at any given time. Being truly "real time" is not
> critical for us, but it is a selling point to be as up to date as possible.
>
> I'd like to understand the performance issue of frequently updating these
> sets. I understand that every time I 'regenerate' the sorted set, any rows
> that change will create a tombstone - for example, if "item_1" is in first
> place and "item_2" is in second place, then they switch on the next update,
> that would be two tombstones. Do you think this will be a big enough
> problem that it is worth doing the sorting work client side, on demand, and
> just try to eat the performance hit there? My thought was to make a
> tradeoff by using more cassandra disk space (i.e. precalculating all sets),
> in exchange for faster reads when requests actually come in that need this
> data.
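The "pull out the full partition and keep only the top N" step can be sketched in a few lines. This is only an illustration: the `partition` dict is made-up data standing in for the key -> score rows read from Cassandra, and `top_n` is a hypothetical helper, not part of any driver.

```python
import heapq

def top_n(scores, n):
    """Return the n highest-scoring (key, score) pairs, best first."""
    # nlargest avoids fully sorting the partition when n is small
    # compared to the number of keys.
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])

# Hypothetical data standing in for one partition of key -> score pairs.
partition = {"item_1": 42, "item_2": 17, "item_3": 99, "item_4": 5}

cache_entry = top_n(partition, 2)
print(cache_entry)  # [('item_3', 99), ('item_1', 42)]
```

The resulting list is what would be written into the redis cache; the full partition is only read when the cache entry needs rebuilding.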
Re: implementing a 'sorted set' on top of cassandra
Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are indeed what we use today.

Caching the resulting 'sorted sets' in redis is exactly what I plan to do. There will be tens of thousands of these sorted sets, each generally with <10k items (with maybe a few exceptions going a bit over that). The reason to periodically calculate the set and store it in cassandra is to avoid having the client do that work, when the client only really cares about the top 100 or so items at any given time. Being truly "real time" is not critical for us, but it is a selling point to be as up to date as possible.

I'd like to understand the performance issue of frequently updating these sets. I understand that every time I 'regenerate' the sorted set, any rows that change will create a tombstone - for example, if "item_1" is in first place and "item_2" is in second place, then they switch on the next update, that would be two tombstones. Do you think this will be a big enough problem that it is worth doing the sorting work client side, on demand, and just try to eat the performance hit there? My thought was to make a tradeoff by using more cassandra disk space (i.e. precalculating all sets), in exchange for faster reads when requests actually come in that need this data.

From: Benjamin Roth
Reply-To: "user@cassandra.apache.org"
Date: Saturday, January 14, 2017 at 1:25 PM
To: "user@cassandra.apache.org"
Subject: Re: implementing a 'sorted set' on top of cassandra

Mike mentioned "increment" in his initial post. That made me think of a case with increments and fetching a top list by a counter, like
https://redis.io/commands/zincrby
https://redis.io/commands/zrangebyscore

1. Cassandra is absolutely not made to sort by a counter (or a non-counter numeric incrementing value) but it is made to store counters. In this case a partition could be seen as a set.
2. I thought of Cassandra for persistence and - depending on the app requirements like real-time and set size - still using redis as a read cache.
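One rough way to reason about the tombstone question is to diff the stored rows against the freshly computed ones. The sketch below assumes the precomputed table is clustered by (score, key), so a changed score means deleting the old row and inserting a new one; `diff_rows` and the sample data are hypothetical and no Cassandra driver is involved.

```python
def diff_rows(old_rows, new_scores):
    """Compare stored (score, key) rows with freshly computed scores.

    Returns (deletes, inserts). Under the assumption above, every
    element of `deletes` would leave one tombstone behind.
    """
    new_rows = {(score, key) for key, score in new_scores.items()}
    deletes = old_rows - new_rows   # rows whose score is no longer current
    inserts = new_rows - old_rows   # rows that have to be written
    return deletes, inserts

# item_1 was first and item_2 second; on the next update both scores
# change and the two items switch places.
old_rows = {(10, "item_1"), (9, "item_2"), (3, "item_3")}
new_scores = {"item_1": 11, "item_2": 12, "item_3": 3}

deletes, inserts = diff_rows(old_rows, new_scores)
print(len(deletes), len(inserts))  # 2 2
```

Unchanged rows (item_3 here) cost nothing, so the tombstone count tracks how many scores moved between two regenerations rather than the size of the set.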
Re: implementing a 'sorted set' on top of cassandra
Mike mentioned "increment" in his initial post. That made me think of a case with increments and fetching a top list by a counter, like
https://redis.io/commands/zincrby
https://redis.io/commands/zrangebyscore

1. Cassandra is absolutely not made to sort by a counter (or a non-counter numeric incrementing value) but it is made to store counters. In this case a partition could be seen as a set.
2. I thought of Cassandra for persistence and - depending on the app requirements like real-time and set size - still using redis as a read cache.

2017-01-14 18:45 GMT+01:00 Jonathan Haddad:

> Sorted sets don't have a requirement of incrementing / decrementing.
> They're commonly used for things like leaderboards where the values are
> arbitrary.
>
> In Redis they are implemented with 2 data structures for efficient lookups
> of either key or value. No getting around that as far as I know.
>
> In Cassandra they would require using the score as a clustering column in
> order to select top N scores (and paginate). That means a tombstone
> whenever the value for a key in the set changes. In sets with high rates of
> change that means a lot of tombstones and thus terrible performance.

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
Re: implementing a 'sorted set' on top of cassandra
Sorted sets don't have a requirement of incrementing / decrementing. They're commonly used for things like leaderboards where the values are arbitrary.

In Redis they are implemented with 2 data structures for efficient lookups of either key or value. No getting around that as far as I know.

In Cassandra they would require using the score as a clustering column in order to select top N scores (and paginate). That means a tombstone whenever the value for a key in the set changes. In sets with high rates of change that means a lot of tombstones and thus terrible performance.

On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan wrote:

> Sorting on an "incremented" numeric value has always been a nightmare to
> do properly in C*.
>
> Either use the Counter type, but then no sorting is possible since a
> counter cannot be used as the type of a clustering column (which is what
> allows sorting).
>
> Or use a simple numeric type on the clustering column, but then
> incrementing the value *concurrently* and *safely* is prohibitive:
> (SELECT to fetch the current value + UPDATE ... IF value = <current value>)
> + retry.
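To make the "2 data structures" point concrete, here is a small in-memory sketch. As far as I know Redis pairs a hash table with a skip list; the sketch substitutes a plain sorted Python list for the skip list, so it shows the idea rather than the Redis implementation. The class and method names are made up for illustration.

```python
import bisect

class SortedSet:
    """Toy sorted set: a dict for key lookups plus a list ordered by score."""

    def __init__(self):
        self._score_of = {}   # key -> score, for lookups by key
        self._ordered = []    # sorted list of (score, key), for top-N reads

    def add(self, key, score):
        # A score change touches both structures: remove the old
        # (score, key) entry, then insert the new one in order.
        if key in self._score_of:
            old = (self._score_of[key], key)
            del self._ordered[bisect.bisect_left(self._ordered, old)]
        self._score_of[key] = score
        bisect.insort(self._ordered, (score, key))

    def incr_by(self, key, delta):
        new_score = self._score_of.get(key, 0) + delta
        self.add(key, new_score)
        return new_score

    def score(self, key):
        return self._score_of.get(key)

    def top(self, n):
        if n <= 0:
            return []
        return [(key, score) for score, key in reversed(self._ordered[-n:])]

s = SortedSet()
s.add("item_1", 5)
s.add("item_2", 7)
s.incr_by("item_1", 10)
print(s.top(2))  # [('item_1', 15), ('item_2', 7)]
```

The removal of the old (score, key) entry in `add` is the in-memory counterpart of the delete that produces a tombstone when the score is a clustering column.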
Re: implementing a 'sorted set' on top of cassandra
Sorting on an "incremented" numeric value has always been a nightmare to do properly in C*.

Either use the Counter type, but then no sorting is possible since a counter cannot be used as the type of a clustering column (which is what allows sorting).

Or use a simple numeric type on the clustering column, but then incrementing the value *concurrently* and *safely* is prohibitive: (SELECT to fetch the current value + UPDATE ... IF value = <current value>) + retry.

On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth wrote:

> Whether your proposed solution is crazy depends on your needs :)
> It sounds like you can live with not-realtime data, so it is OK to cache
> it. Why preproduce the results if you only need 5% of them? Why not use
> redis as a cache with expiring sorted sets that are filled on demand from
> cassandra partitions with counters?
> That way redis has much less to do and can scale much better. And you are
> not limited to keeping all data in RAM, as cache data is volatile and can
> be evicted on demand.
> Whether this is effective also depends on the size of your sets. Cassandra
> won't be able to sort them by score for you, so you will have to load the
> complete set into redis for caching and / or do the sorting in your app on
> demand. This certainly won't work out well for sets with millions of
> entries.
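The SELECT + conditional UPDATE + retry pattern can be illustrated without a cluster. In the sketch below `Row` is an in-memory stand-in, not a Cassandra API: `update_if` mimics `UPDATE ... IF value = <current value>` by applying the write only when the stored value still matches. All names are hypothetical.

```python
import threading

class Row:
    """In-memory stand-in for a single row holding a numeric value."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def select(self):
        with self._lock:
            return self._value

    def update_if(self, expected, new_value):
        """Apply the write only if the stored value equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new_value
            return True

def safe_increment(row, delta, max_retries=1000):
    """Read, conditionally write, and retry on conflict.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_retries + 1):
        current = row.select()
        if row.update_if(current, current + delta):
            return attempt
    raise RuntimeError("gave up after too many conflicting writes")

def worker(row, times):
    for _ in range(times):
        safe_increment(row, 1)

row = Row()
threads = [threading.Thread(target=worker, args=(row, 100)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(row.select())  # 400
```

Every increment costs at least one read plus one conditional write, and more under contention, which is why this approach is described above as prohibitive for frequently incremented values.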
Re: implementing a 'sorted set' on top of cassandra
Whether your proposed solution is crazy depends on your needs :)

It sounds like you can live with not-realtime data, so it is OK to cache it. Why preproduce the results if you only need 5% of them? Why not use redis as a cache with expiring sorted sets that are filled on demand from cassandra partitions with counters? That way redis has much less to do and can scale much better. And you are not limited to keeping all data in RAM, as cache data is volatile and can be evicted on demand.

Whether this is effective also depends on the size of your sets. Cassandra won't be able to sort them by score for you, so you will have to load the complete set into redis for caching and / or do the sorting in your app on demand. This certainly won't work out well for sets with millions of entries.

2017-01-13 23:14 GMT+01:00 Mike Torra:

> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>    1. Use a counter CF to store the values I want to sort by
>    2. Periodically read in all key/values in the counter CF and sort in
>    the client application (~every five minutes or so)
>    3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
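The "expiring sorted sets filled on demand" idea could look roughly like this. It is a sketch under assumptions: `loader` is a hypothetical callable standing in for reading one Cassandra counter partition as a key -> score dict, and a plain Python dict stands in for redis with a TTL.

```python
import heapq
import time

class TopNCache:
    """Cache of top-N lists that are rebuilt on demand once expired."""

    def __init__(self, loader, n, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # set_id -> dict of key -> score
        self._n = n
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # set_id -> (expires_at, top list)

    def get(self, set_id):
        now = self._clock()
        entry = self._entries.get(set_id)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh, no read from the store
        # Missing or expired: load the whole set and sort in the app.
        scores = self._loader(set_id)
        top = heapq.nlargest(self._n, scores.items(), key=lambda kv: kv[1])
        self._entries[set_id] = (now + self._ttl, top)
        return top

def fake_loader(set_id):
    # Made-up data; a real loader would query the counter table.
    return {"item_1": 3, "item_2": 9, "item_3": 5}

cache = TopNCache(fake_loader, n=2, ttl_seconds=300)
print(cache.get("set_a"))  # [('item_2', 9), ('item_3', 5)]
```

Sets that are never read are never loaded or sorted, which fits a workload where only about 5% of the sets are read.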
Re: implementing a 'sorted set' on top of cassandra
Not if you want to sort by score (a counter).

On 14.01.2017 08:33, "DuyHai Doan" wrote:

> Clustering column can be seen as sorted set
Re: implementing a 'sorted set' on top of cassandra
Clustering column can be seen as sorted set.

Table abstraction == Map<PartitionKey, SortedMap<ClusteringColumn, Value>>

On Sat, Jan 14, 2017 at 2:28 AM, Edward Capriolo wrote:

> Redis is the other side of the coin.
>
> Fast:
> https://groups.google.com/forum/#!topic/redis-db/4TAItKMyUEE
>
> http://stackoverflow.com/questions/6076342/is-there-a-practical-limit-to-the-number-of-elements-in-a-sorted-set-in-redis
>
> 320MB of memory for 2,000,000 email addresses is hard to scale. If you are
> only maintaining a single list, great, but if you have millions of lists
> this memory / cost profile is not ideal.
Re: implementing a 'sorted set' on top of cassandra
On Fri, Jan 13, 2017 at 8:14 PM, Jonathan Haddad wrote:

> I've thought about this for years and have never arrived at a
> particularly great implementation. Your idea may be OK if the sets are
> very small and if the values don't change very often. But in a system
> where the values of the keys in the set change frequently (lots of
> tombstones) or the sets are large, I think you're going to experience
> quite a bit of pain.
>
> On Fri, Jan 13, 2017 at 2:14 PM Mike Torra wrote:
>
> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are
> ever read. We are getting to the point where redis is becoming difficult
> to scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
> 1. Use a counter CF to store the values I want to sort by
> 2. Periodically read in all key/values in the counter CF and sort in
>    the client application (~every five minutes or so)
> 3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?

Redis is the other side of the coin.

Fast:
https://groups.google.com/forum/#!topic/redis-db/4TAItKMyUEE
http://stackoverflow.com/questions/6076342/is-there-a-practical-limit-to-the-number-of-elements-in-a-sorted-set-in-redis

320MB of memory for 2,000,000 email addresses is hard to scale. If you are only maintaining a single list, great, but if you have millions of lists this memory/cost profile is not ideal.
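To put the 320MB figure above in per-entry terms, here is a rough back-of-the-envelope sketch. It assumes "MB" means MiB and that memory grows linearly with the number of members, which ignores any fixed per-set overhead. The function names and the 10,000 x 10,000 workload are illustrative inputs, not measurements.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def bytes_per_member(total_bytes: float, members: int) -> float:
    """Average memory cost of one sorted-set member."""
    return total_bytes / members


def estimate_total_gib(num_sets: int, members_per_set: int, per_member: float) -> float:
    """Linear extrapolation of total memory, in GiB (assumption: no per-set overhead)."""
    return num_sets * members_per_set * per_member / GIB


# 320 MiB for 2,000,000 members works out to roughly 168 bytes per member.
per_member = bytes_per_member(320 * MIB, 2_000_000)

# Illustrative workload: 10,000 sets of 10,000 members each.
total_gib = estimate_total_gib(10_000, 10_000, per_member)
```

Under these assumptions the illustrative workload needs on the order of 15 to 16 GiB of RAM, before replication or any other Redis overhead.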
Re: implementing a 'sorted set' on top of cassandra
I've thought about this for years and have never arrived at a particularly great implementation. Your idea may be OK if the sets are very small and if the values don't change very often. But in a system where the values of the keys in the set change frequently (lots of tombstones) or the sets are large, I think you're going to experience quite a bit of pain.

On Fri, Jan 13, 2017 at 2:14 PM Mike Torra wrote:

We currently use redis to store sorted sets that we increment many, many times more than we read. For example, only about 5% of these sets are ever read. We are getting to the point where redis is becoming difficult to scale (currently at >20 nodes).

We've started using cassandra for other things, and now we are experimenting to see if having a similar 'sorted set' data structure is feasible in cassandra. My approach so far is:

1. Use a counter CF to store the values I want to sort by
2. Periodically read in all key/values in the counter CF and sort in the client application (~every five minutes or so)
3. Write back to a different CF with the ordered keys I care about

Does this seem crazy? Is there a simpler way to do this in cassandra?
Re: implementing a 'sorted set' on top of cassandra
On Fri, Jan 13, 2017 at 5:14 PM, Mike Torra wrote:

> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are
> ever read. We are getting to the point where redis is becoming difficult
> to scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
> 1. Use a counter CF to store the values I want to sort by
> 2. Periodically read in all key/values in the counter CF and sort in
>    the client application (~every five minutes or so)
> 3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?

Have you considered using only the keys in Cassandra's map type? I proposed an implementation that I wanted to experiment with adding to a set: https://issues.apache.org/jira/browse/CASSANDRA-6870

Even though redis and its feature set are wildly popular, there is not a great consensus that Cassandra should do those things as manipulations of a single column.
implementing a 'sorted set' on top of cassandra
We currently use redis to store sorted sets that we increment many, many times more than we read. For example, only about 5% of these sets are ever read. We are getting to the point where redis is becoming difficult to scale (currently at >20 nodes).

We've started using cassandra for other things, and now we are experimenting to see if having a similar 'sorted set' data structure is feasible in cassandra. My approach so far is:

1. Use a counter CF to store the values I want to sort by
2. Periodically read in all key/values in the counter CF and sort in the client application (~every five minutes or so)
3. Write back to a different CF with the ordered keys I care about

Does this seem crazy? Is there a simpler way to do this in cassandra?
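Steps 2 and 3 of the approach above could be sketched roughly as follows. This is a minimal illustration, not the poster's code: it takes the counter values as an in-memory dict instead of reading them from Cassandra, and the names `top_n`, `ranked_rows`, and the (set_id, rank, key, score) row shape are assumptions made for the example.

```python
import heapq
from typing import Dict, List, Tuple


def top_n(scores: Dict[str, int], n: int = 100) -> List[Tuple[str, int]]:
    """Return the n highest-scoring (key, score) pairs, highest first.

    Ties are broken by key so the output is deterministic.
    """
    # Smallest (-score, key) means highest score first, then key ascending.
    return heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))


def ranked_rows(set_id: str, scores: Dict[str, int], n: int = 100):
    """Build the rows that step 3 would write back (hypothetical row shape)."""
    return [
        (set_id, rank, key, score)
        for rank, (key, score) in enumerate(top_n(scores, n), start=1)
    ]


# Example: counter values for one set, as read in step 2.
counters = {"item_a": 5, "item_b": 9, "item_c": 5}
rows = ranked_rows("set_1", counters, n=2)
```

Using a heap-based selection instead of a full sort keeps the work proportional to the set size times log n, which may matter if only the top 100 or so entries are ever written back.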