Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Benjamin Roth
That was what I had in mind. Which solution (populate on demand,
pre-populate) really fits your needs depends on
- write frequency
- required cache expiration time
- read frequency
- ratio of written / read sets

If you prefer an event-based approach, the Kafka solution Jon proposed could be
quite interesting.

2017-01-17 19:10 GMT+01:00 Jonathan Haddad :

> You could store the key -> score pairs in Cassandra, pull out the full
> partition and repopulate the cache in redis with the top N whatever you
> need.  I'd only read the Cassandra values directly in order to repopulate
> the cache.
>
> I wouldn't try to store the score -> key values; the perf will be a
> nightmare.
>
> On Tue, Jan 17, 2017 at 8:47 AM Mike Torra  wrote:
>
>> Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are
>> indeed what we use today.
>>
>> Caching the resulting 'sorted sets' in redis is exactly what I plan to
>> do. There will be tens of thousands of these sorted sets, each generally
>> with <10k items (with maybe a few exceptions going a bit over that). The
>> reason to periodically calculate the set and store it in cassandra is to
>> avoid having the client do that work, when the client only really cares
>> about the top 100 or so items at any given time. Being truly "real time" is
>> not critical for us, but it is a selling point to be as up to date as
>> possible.
>>
>> I'd like to understand the performance issue of frequently updating these
>> sets. I understand that every time I 'regenerate' the sorted set, any rows
>> that change will create a tombstone - for example, if "item_1" is in first
>> place and "item_2" is in second place, then they switch on the next update,
>> that would be two tombstones. Do you think this will be a big enough
>> problem that it is worth doing the sorting work client side, on demand, and
>> just try to eat the performance hit there? My thought was to make a
>> tradeoff by using more cassandra disk space (ie pre calculating all sets),
>> in exchange for faster reads when requests actually come in that need this
>> data.
>>
>> From: Benjamin Roth 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Saturday, January 14, 2017 at 1:25 PM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: implementing a 'sorted set' on top of cassandra
>>
>> Mike mentioned "increment" in his initial post. That let me think of a
>> case with increments and fetching a top list by a counter like
>> https://redis.io/commands/zincrby
>> https://redis.io/commands/zrangebyscore
>>
>> 1. Cassandra is absolutely not made to sort by a counter (or a
>> non-counter numeric incrementing value) but it is made to store counters.
>> In this case a partition could be seen as a set.
>> 2. I thought of CS for persistence and - depending on the app
>> requirements like real-time and set size - still use redis as a read cache
>>
>> 2017-01-14 18:45 GMT+01:00 Jonathan Haddad :
>>
>> Sorted sets don't have a requirement of incrementing / decrementing.
>> They're commonly used for thing like leaderboards where the values are
>> arbitrary.
>>
>> In Redis they are implemented with 2 data structures for efficient
>> lookups of either key or value. No getting around that as far as I know.
>>
>> In Cassandra they would require using the score as a clustering column in
>> order to select top N scores (and paginate). That means a tombstone
>> whenever the value for a key in the set changes. In sets with high rates of
>> change that means a lot of tombstones and thus terrible performance.
>> On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan  wrote:
>>
>> Sorting on an "incremented" numeric value has always been a nightmare to
>> be done properly in C*
>>
>> Either use Counter type but then no sorting is possible since counter
>> cannot be used as type for clustering column (which allows sort)
>>
>> Or use simple numeric type on clustering column but then to increment the
>> value *concurrently* and *safely* it's prohibitive (SELECT to fetch current
>> value + UPDATE ... IF value = ) + retry
>>
>>
>>
>> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
>> wrote:
>>
>> If your proposed solution is crazy depends on your needs :)
>> It sounds like you can live with not-realtime data. So it is ok to cache
>> it. Why preproduce the results if you only need 5% of them? Why not use
>> redis as a cache with expiring sorted sets that are filled

Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Jonathan Haddad
You could store the key -> score pairs in Cassandra, pull out the full
partition and repopulate the cache in redis with the top N whatever you
need.  I'd only read the Cassandra values directly in order to repopulate
the cache.

I wouldn't try to store the score -> key values; the perf will be a
nightmare.
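
A minimal sketch of the "pull out the full partition, keep the top N" step described above. The `partition` dict and the `top_n` helper are hypothetical stand-ins for illustration: the Cassandra read and the Redis write are left out, and only the selection logic is shown.

```python
import heapq


def top_n(scores, n):
    """Return the n highest-scoring (key, score) pairs, best first."""
    # Ties on score are broken by key so the result is deterministic.
    return heapq.nlargest(n, scores.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical contents of one Cassandra partition (key -> score).
partition = {"item_1": 42, "item_2": 17, "item_3": 99, "item_4": 5}

# These pairs are what would be written to the Redis cache.
top = top_n(partition, 2)
```

With sets of under 10k items, as mentioned in the thread, selecting the top entries in the application this way should be cheap compared to the partition read itself.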

On Tue, Jan 17, 2017 at 8:47 AM Mike Torra  wrote:

> Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are
> indeed what we use today.
>
> Caching the resulting 'sorted sets' in redis is exactly what I plan to do.
> There will be tens of thousands of these sorted sets, each generally with
> <10k items (with maybe a few exceptions going a bit over that). The reason
> to periodically calculate the set and store it in cassandra is to avoid
> having the client do that work, when the client only really cares about the
> top 100 or so items at any given time. Being truly "real time" is not
> critical for us, but it is a selling point to be as up to date as possible.
>
> I'd like to understand the performance issue of frequently updating these
> sets. I understand that every time I 'regenerate' the sorted set, any rows
> that change will create a tombstone - for example, if "item_1" is in first
> place and "item_2" is in second place, then they switch on the next update,
> that would be two tombstones. Do you think this will be a big enough
> problem that it is worth doing the sorting work client side, on demand, and
> just try to eat the performance hit there? My thought was to make a
> tradeoff by using more cassandra disk space (ie pre calculating all sets),
> in exchange for faster reads when requests actually come in that need this
> data.
>
> From: Benjamin Roth 
> Reply-To: "user@cassandra.apache.org" 
> Date: Saturday, January 14, 2017 at 1:25 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: implementing a 'sorted set' on top of cassandra
>
> Mike mentioned "increment" in his initial post. That let me think of a
> case with increments and fetching a top list by a counter like
> https://redis.io/commands/zincrby
> https://redis.io/commands/zrangebyscore
>
> 1. Cassandra is absolutely not made to sort by a counter (or a non-counter
> numeric incrementing value) but it is made to store counters. In this case
> a partition could be seen as a set.
> 2. I thought of CS for persistence and - depending on the app requirements
> like real-time and set size - still use redis as a read cache
>
> 2017-01-14 18:45 GMT+01:00 Jonathan Haddad :
>
> Sorted sets don't have a requirement of incrementing / decrementing.
> They're commonly used for thing like leaderboards where the values are
> arbitrary.
>
> In Redis they are implemented with 2 data structures for efficient lookups
> of either key or value. No getting around that as far as I know.
>
> In Cassandra they would require using the score as a clustering column in
> order to select top N scores (and paginate). That means a tombstone
> whenever the value for a key in the set changes. In sets with high rates of
> change that means a lot of tombstones and thus terrible performance.
> On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan  wrote:
>
> Sorting on an "incremented" numeric value has always been a nightmare to
> be done properly in C*
>
> Either use Counter type but then no sorting is possible since counter
> cannot be used as type for clustering column (which allows sort)
>
> Or use simple numeric type on clustering column but then to increment the
> value *concurrently* and *safely* it's prohibitive (SELECT to fetch current
> value + UPDATE ... IF value = ) + retry
>
>
>
> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
> wrote:
>
> If your proposed solution is crazy depends on your needs :)
> It sounds like you can live with not-realtime data. So it is ok to cache
> it. Why preproduce the results if you only need 5% of them? Why not use
> redis as a cache with expiring sorted sets that are filled on demand from
> cassandra partitions with counters?
> So redis has much less to do and can scale much better. And you are not
> limited on keeping all data in ram as cache data is volatile and can be
> evicted on demand.
> If this is effective also depends on the size of your sets. CS wont be
> able to sort them by score for you, so you will have to load the complete
> set to redis for caching and / or do sorting in your app on demand. This
> certainly won't work out well with sets with millions of entries.
>
> 2017-01-13 23:14 GMT+01:00 Mike Torra :
>
> We currently use redis to store sorted sets that we increment many, many
> times more than w

Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Edward Capriolo
On Tue, Jan 17, 2017 at 11:47 AM, Mike Torra  wrote:

> Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are
> indeed what we use today.
>
> Caching the resulting 'sorted sets' in redis is exactly what I plan to do.
> There will be tens of thousands of these sorted sets, each generally with
> <10k items (with maybe a few exceptions going a bit over that). The reason
> to periodically calculate the set and store it in cassandra is to avoid
> having the client do that work, when the client only really cares about the
> top 100 or so items at any given time. Being truly "real time" is not
> critical for us, but it is a selling point to be as up to date as possible.
>
> I'd like to understand the performance issue of frequently updating these
> sets. I understand that every time I 'regenerate' the sorted set, any rows
> that change will create a tombstone - for example, if "item_1" is in first
> place and "item_2" is in second place, then they switch on the next update,
> that would be two tombstones. Do you think this will be a big enough
> problem that it is worth doing the sorting work client side, on demand, and
> just try to eat the performance hit there? My thought was to make a
> tradeoff by using more cassandra disk space (ie pre calculating all sets),
> in exchange for faster reads when requests actually come in that need this
> data.
>
> From: Benjamin Roth 
> Reply-To: "user@cassandra.apache.org" 
> Date: Saturday, January 14, 2017 at 1:25 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: implementing a 'sorted set' on top of cassandra
>
> Mike mentioned "increment" in his initial post. That let me think of a
> case with increments and fetching a top list by a counter like
> https://redis.io/commands/zincrby
> https://redis.io/commands/zrangebyscore
>
> 1. Cassandra is absolutely not made to sort by a counter (or a non-counter
> numeric incrementing value) but it is made to store counters. In this case
> a partition could be seen as a set.
> 2. I thought of CS for persistence and - depending on the app requirements
> like real-time and set size - still use redis as a read cache
>
> 2017-01-14 18:45 GMT+01:00 Jonathan Haddad :
>
>> Sorted sets don't have a requirement of incrementing / decrementing.
>> They're commonly used for thing like leaderboards where the values are
>> arbitrary.
>>
>> In Redis they are implemented with 2 data structures for efficient
>> lookups of either key or value. No getting around that as far as I know.
>>
>> In Cassandra they would require using the score as a clustering column in
>> order to select top N scores (and paginate). That means a tombstone
>> whenever the value for a key in the set changes. In sets with high rates of
>> change that means a lot of tombstones and thus terrible performance.
>> On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan  wrote:
>>
>>> Sorting on an "incremented" numeric value has always been a nightmare to
>>> be done properly in C*
>>>
>>> Either use Counter type but then no sorting is possible since counter
>>> cannot be used as type for clustering column (which allows sort)
>>>
>>> Or use simple numeric type on clustering column but then to increment
>>> the value *concurrently* and *safely* it's prohibitive (SELECT to fetch
>>> current value + UPDATE ... IF value = ) + retry
>>>
>>>
>>>
>>> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
>>> wrote:
>>>
>>> If your proposed solution is crazy depends on your needs :)
>>> It sounds like you can live with not-realtime data. So it is ok to cache
>>> it. Why preproduce the results if you only need 5% of them? Why not use
>>> redis as a cache with expiring sorted sets that are filled on demand from
>>> cassandra partitions with counters?
>>> So redis has much less to do and can scale much better. And you are not
>>> limited on keeping all data in ram as cache data is volatile and can be
>>> evicted on demand.
>>> If this is effective also depends on the size of your sets. CS wont be
>>> able to sort them by score for you, so you will have to load the complete
>>> set to redis for caching and / or do sorting in your app on demand. This
>>> certainly won't work out well with sets with millions of entries.
>>>
>>> 2017-01-13 23:14 GMT+01:00 Mike Torra :
>>>
>>> We currently use redis to store sorted sets that we increment many, many
>>> times more than we read. For example

Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Mike Torra
Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are
indeed what we use today.

Caching the resulting 'sorted sets' in redis is exactly what I plan to do. 
There will be tens of thousands of these sorted sets, each generally with <10k 
items (with maybe a few exceptions going a bit over that). The reason to 
periodically calculate the set and store it in cassandra is to avoid having the 
client do that work, when the client only really cares about the top 100 or so 
items at any given time. Being truly "real time" is not critical for us, but it 
is a selling point to be as up to date as possible.

I'd like to understand the performance issue of frequently updating these sets. 
I understand that every time I 'regenerate' the sorted set, any rows that 
change will create a tombstone - for example, if "item_1" is in first place and 
"item_2" is in second place, then they switch on the next update, that would be 
two tombstones. Do you think this will be a big enough problem that it is worth
doing the sorting work client side, on demand, and just eating the
performance hit there? My thought was to trade more Cassandra disk space
(i.e. precalculating all sets) for faster reads when requests that need this
data actually come in.
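
To make the tombstone count concrete, here is a small sketch. It assumes a table whose rows are clustered by (score, key), which is one reading of the design discussed in this thread and not a confirmed schema. Under that assumption, every row whose score changes has to be deleted and re-inserted, and each delete leaves a tombstone. The helper name `rows_to_delete` is made up for illustration.

```python
def rows_to_delete(old_scores, new_scores):
    """Rows (score, key) present in the old set but not in the new one."""
    old_rows = {(score, key) for key, score in old_scores.items()}
    new_rows = {(score, key) for key, score in new_scores.items()}
    return old_rows - new_rows


# item_1 and item_2 swap places between two regenerations.
before = {"item_1": 100, "item_2": 90, "item_3": 10}
after = {"item_1": 90, "item_2": 100, "item_3": 10}

# Each row in this set would need a DELETE, i.e. one tombstone.
tombstones = rows_to_delete(before, after)
```

In this example the swap produces two deleted rows, matching the "two tombstones" estimate, while the unchanged item_3 costs nothing.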

From: Benjamin Roth <benjamin.r...@jaumo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Saturday, January 14, 2017 at 1:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: implementing a 'sorted set' on top of cassandra

Mike mentioned "increment" in his initial post. That let me think of a case 
with increments and fetching a top list by a counter like
https://redis.io/commands/zincrby
https://redis.io/commands/zrangebyscore

1. Cassandra is absolutely not made to sort by a counter (or a non-counter 
numeric incrementing value) but it is made to store counters. In this case a 
partition could be seen as a set.
2. I thought of CS for persistence and - depending on the app requirements like 
real-time and set size - still use redis as a read cache

2017-01-14 18:45 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:
Sorted sets don't have a requirement of incrementing / decrementing. They're 
commonly used for thing like leaderboards where the values are arbitrary.

In Redis they are implemented with 2 data structures for efficient lookups of 
either key or value. No getting around that as far as I know.

In Cassandra they would require using the score as a clustering column in order 
to select top N scores (and paginate). That means a tombstone whenever the 
value for a key in the set changes. In sets with high rates of change that 
means a lot of tombstones and thus terrible performance.
On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan <doanduy...@gmail.com> wrote:
Sorting on an "incremented" numeric value has always been a nightmare to be 
done properly in C*

Either use Counter type but then no sorting is possible since counter cannot be 
used as type for clustering column (which allows sort)

Or use simple numeric type on clustering column but then to increment the value 
*concurrently* and *safely* it's prohibitive (SELECT to fetch current value + 
UPDATE ... IF value = ) + retry



On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth <benjamin.r...@jaumo.com> wrote:
If your proposed solution is crazy depends on your needs :)
It sounds like you can live with not-realtime data. So it is ok to cache it. 
Why preproduce the results if you only need 5% of them? Why not use redis as a 
cache with expiring sorted sets that are filled on demand from cassandra 
partitions with counters?
So redis has much less to do and can scale much better. And you are not limited 
on keeping all data in ram as cache data is volatile and can be evicted on 
demand.
If this is effective also depends on the size of your sets. CS wont be able to 
sort them by score for you, so you will have to load the complete set to redis 
for caching and / or do sorting in your app on demand. This certainly won't 
work out well with sets with millions of entries.

2017-01-13 23:14 GMT+01:00 Mike Torra <mto...@demandware.com>:
We currently use redis to store sorted sets that we increment many, many times 
more than we read. For example, only about 5% of these sets are ever read. We 
are getting to the point where redis is becoming difficult to scale (currently 
at >20 nodes).

We've started using cassandra for other things, and now we are experimenting to 
see if having a similar 'sorted set' data structure is feasible in cassandra. 
My approach so far is:

  1.  Use a counter CF to store the values I wan

Re: implementing a 'sorted set' on top of cassandra

2017-01-14 Thread Benjamin Roth
Mike mentioned "increment" in his initial post. That made me think of a case
with increments and fetching a top list by a counter, like
https://redis.io/commands/zincrby
https://redis.io/commands/zrangebyscore

1. Cassandra is absolutely not made to sort by a counter (or a non-counter
numeric incrementing value), but it is made to store counters. In this case
a partition could be seen as a set.
2. I thought of using Cassandra (CS) for persistence and, depending on app
requirements such as real-time needs and set size, still using Redis as a
read cache.

2017-01-14 18:45 GMT+01:00 Jonathan Haddad :

> Sorted sets don't have a requirement of incrementing / decrementing.
> They're commonly used for thing like leaderboards where the values are
> arbitrary.
>
> In Redis they are implemented with 2 data structures for efficient lookups
> of either key or value. No getting around that as far as I know.
>
> In Cassandra they would require using the score as a clustering column in
> order to select top N scores (and paginate). That means a tombstone
> whenever the value for a key in the set changes. In sets with high rates of
> change that means a lot of tombstones and thus terrible performance.
> On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan  wrote:
>
>> Sorting on an "incremented" numeric value has always been a nightmare to
>> be done properly in C*
>>
>> Either use Counter type but then no sorting is possible since counter
>> cannot be used as type for clustering column (which allows sort)
>>
>> Or use simple numeric type on clustering column but then to increment the
>> value *concurrently* and *safely* it's prohibitive (SELECT to fetch current
>> value + UPDATE ... IF value = ) + retry
>>
>>
>>
>> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
>> wrote:
>>
>> If your proposed solution is crazy depends on your needs :)
>> It sounds like you can live with not-realtime data. So it is ok to cache
>> it. Why preproduce the results if you only need 5% of them? Why not use
>> redis as a cache with expiring sorted sets that are filled on demand from
>> cassandra partitions with counters?
>> So redis has much less to do and can scale much better. And you are not
>> limited on keeping all data in ram as cache data is volatile and can be
>> evicted on demand.
>> If this is effective also depends on the size of your sets. CS wont be
>> able to sort them by score for you, so you will have to load the complete
>> set to redis for caching and / or do sorting in your app on demand. This
>> certainly won't work out well with sets with millions of entries.
>>
>> 2017-01-13 23:14 GMT+01:00 Mike Torra :
>>
>> We currently use redis to store sorted sets that we increment many, many
>> times more than we read. For example, only about 5% of these sets are ever
>> read. We are getting to the point where redis is becoming difficult to
>> scale (currently at >20 nodes).
>>
>> We've started using cassandra for other things, and now we are
>> experimenting to see if having a similar 'sorted set' data structure is
>> feasible in cassandra. My approach so far is:
>>
>>    1. Use a counter CF to store the values I want to sort by
>>    2. Periodically read in all key/values in the counter CF and sort in
>>       the client application (~every five minutes or so)
>>    3. Write back to a different CF with the ordered keys I care about
>>
>> Does this seem crazy? Is there a simpler way to do this in cassandra?
>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>>
>>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: implementing a 'sorted set' on top of cassandra

2017-01-14 Thread Jonathan Haddad
Sorted sets don't have a requirement of incrementing / decrementing.
They're commonly used for things like leaderboards where the values are
arbitrary.

In Redis they are implemented with 2 data structures for efficient lookups
of either key or value. No getting around that as far as I know.

In Cassandra they would require using the score as a clustering column in
order to select top N scores (and paginate). That means a tombstone
whenever the value for a key in the set changes. In sets with high rates of
change that means a lot of tombstones and thus terrible performance.
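
The "two data structures" idea can be sketched as follows. Redis itself pairs a hash table with a skip list; this illustration swaps in a plain dict and a sorted Python list, so it shows the shape of the structure and not Redis's actual implementation. The class name `SortedSet` and its methods are made up for the example.

```python
import bisect


class SortedSet:
    def __init__(self):
        self._score_by_key = {}  # key -> score, for lookups by key
        self._ordered = []       # (score, key) tuples kept sorted ascending

    def add(self, key, score):
        old = self._score_by_key.get(key)
        if old is not None:
            # Remove the old (score, key) entry. With the score as a
            # clustering column, this is the step that becomes a delete
            # (and therefore a tombstone) in Cassandra.
            idx = bisect.bisect_left(self._ordered, (old, key))
            del self._ordered[idx]
        self._score_by_key[key] = score
        bisect.insort(self._ordered, (score, key))

    def score(self, key):
        return self._score_by_key.get(key)

    def top(self, n):
        # Highest scores first; n <= 0 yields an empty list.
        if n <= 0:
            return []
        return [(k, s) for s, k in reversed(self._ordered[-n:])]
```

Every score change touches both structures, which is cheap in memory but maps to a delete plus an insert when the ordered half lives in a clustering column.
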
On Sat, Jan 14, 2017 at 9:40 AM DuyHai Doan  wrote:

> Sorting on an "incremented" numeric value has always been a nightmare to
> be done properly in C*
>
> Either use Counter type but then no sorting is possible since counter
> cannot be used as type for clustering column (which allows sort)
>
> Or use simple numeric type on clustering column but then to increment the
> value *concurrently* and *safely* it's prohibitive (SELECT to fetch current
> value + UPDATE ... IF value = ) + retry
>
>
>
> On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
> wrote:
>
> If your proposed solution is crazy depends on your needs :)
> It sounds like you can live with not-realtime data. So it is ok to cache
> it. Why preproduce the results if you only need 5% of them? Why not use
> redis as a cache with expiring sorted sets that are filled on demand from
> cassandra partitions with counters?
> So redis has much less to do and can scale much better. And you are not
> limited on keeping all data in ram as cache data is volatile and can be
> evicted on demand.
> If this is effective also depends on the size of your sets. CS wont be
> able to sort them by score for you, so you will have to load the complete
> set to redis for caching and / or do sorting in your app on demand. This
> certainly won't work out well with sets with millions of entries.
>
> 2017-01-13 23:14 GMT+01:00 Mike Torra :
>
> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>    1. Use a counter CF to store the values I want to sort by
>    2. Periodically read in all key/values in the counter CF and sort in
>       the client application (~every five minutes or so)
>    3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>


Re: implementing a 'sorted set' on top of cassandra

2017-01-14 Thread DuyHai Doan
Sorting on an "incremented" numeric value has always been a nightmare to do
properly in C*.

Either use the Counter type, but then no sorting is possible, since a counter
cannot be used as the type of a clustering column (which is what allows sorting).

Or use a simple numeric type on a clustering column, but then incrementing the
value *concurrently* and *safely* is prohibitively expensive (SELECT to fetch
the current value + UPDATE ... IF value = <current value>, plus retries).
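
The read, conditional-update, retry pattern mentioned above can be sketched like this. `FakeTable` is a hypothetical in-memory stand-in and not a Cassandra driver API; `update_if` only mimics the behaviour of a conditional `UPDATE ... IF`, so the sketch shows the control flow, not the real cost.

```python
import threading


class FakeTable:
    """In-memory stand-in for a table that supports a conditional update."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def select(self, key):
        with self._lock:
            return self._values.get(key, 0)

    def update_if(self, key, expected, new):
        # Mimics UPDATE ... IF value = expected: apply only on a match.
        with self._lock:
            if self._values.get(key, 0) != expected:
                return False
            self._values[key] = new
            return True


def increment(table, key, delta, max_retries=1000):
    """Increment safely by retrying until the conditional update succeeds."""
    for _ in range(max_retries):
        current = table.select(key)
        if table.update_if(key, current, current + delta):
            return current + delta
    raise RuntimeError("too much contention on " + repr(key))
```

Each increment costs at least one read and one conditional write, and more under contention, which is why this approach is described as prohibitive for values that are incremented very often.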



On Sat, Jan 14, 2017 at 8:54 AM, Benjamin Roth 
wrote:

> If your proposed solution is crazy depends on your needs :)
> It sounds like you can live with not-realtime data. So it is ok to cache
> it. Why preproduce the results if you only need 5% of them? Why not use
> redis as a cache with expiring sorted sets that are filled on demand from
> cassandra partitions with counters?
> So redis has much less to do and can scale much better. And you are not
> limited on keeping all data in ram as cache data is volatile and can be
> evicted on demand.
> If this is effective also depends on the size of your sets. CS wont be
> able to sort them by score for you, so you will have to load the complete
> set to redis for caching and / or do sorting in your app on demand. This
> certainly won't work out well with sets with millions of entries.
>
> 2017-01-13 23:14 GMT+01:00 Mike Torra :
>
>> We currently use redis to store sorted sets that we increment many, many
>> times more than we read. For example, only about 5% of these sets are ever
>> read. We are getting to the point where redis is becoming difficult to
>> scale (currently at >20 nodes).
>>
>> We've started using cassandra for other things, and now we are
>> experimenting to see if having a similar 'sorted set' data structure is
>> feasible in cassandra. My approach so far is:
>>
>>    1. Use a counter CF to store the values I want to sort by
>>    2. Periodically read in all key/values in the counter CF and sort in
>>       the client application (~every five minutes or so)
>>    3. Write back to a different CF with the ordered keys I care about
>>
>> Does this seem crazy? Is there a simpler way to do this in cassandra?
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Benjamin Roth
Whether your proposed solution is crazy depends on your needs :)
It sounds like you can live with data that is not real-time, so it is OK to
cache it. Why precompute the results if you only need 5% of them? Why not use
Redis as a cache with expiring sorted sets that are filled on demand from
Cassandra partitions with counters?
That way Redis has much less to do and can scale much better. And you are not
forced to keep all data in RAM, as cache data is volatile and can be
evicted on demand.
Whether this is effective also depends on the size of your sets. Cassandra
(CS) won't be able to sort them by score for you, so you will have to load the
complete set into Redis for caching and/or do the sorting in your app on
demand. This certainly won't work out well for sets with millions of entries.
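
A rough sketch of the populate-on-demand idea, under stated assumptions: a plain dict stands in for Redis, `load_partition` is a hypothetical callable standing in for the Cassandra read of one counter partition, and the class name and default TTL are invented for the example.

```python
import heapq
import time


class ExpiringTopNCache:
    def __init__(self, load_partition, top_n=100, ttl_seconds=300.0,
                 clock=time.monotonic):
        self._load_partition = load_partition  # set_id -> {key: score}
        self._top_n = top_n
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # set_id -> (expires_at, top list)

    def get(self, set_id):
        now = self._clock()
        entry = self._entries.get(set_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no Cassandra read needed
        # Miss or expired: load the full partition and sort in the app.
        scores = self._load_partition(set_id)
        top = heapq.nlargest(self._top_n, scores.items(),
                             key=lambda kv: kv[1])
        self._entries[set_id] = (now + self._ttl, top)
        return top
```

Only sets that are actually requested are ever loaded and sorted, which fits the case where about 5% of the sets are read. The cost is that the first read after expiry pays for the full partition load.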

2017-01-13 23:14 GMT+01:00 Mike Torra :

> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>    1. Use a counter CF to store the values I want to sort by
>    2. Periodically read in all key/values in the counter CF and sort in
>       the client application (~every five minutes or so)
>    3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Benjamin Roth
Not if you want to sort by score (a counter)

On 14.01.2017 08:33, "DuyHai Doan" wrote:

> Clustering column can be seen as sorted set
>
> Table abstraction == Map>
>
>
> On Sat, Jan 14, 2017 at 2:28 AM, Edward Capriolo 
> wrote:
>
>>
>>
>> On Fri, Jan 13, 2017 at 8:14 PM, Jonathan Haddad 
>> wrote:
>>
>>> I've thought about this for years and have never arrived on a
>>> particularly great implementation.  Your idea will be maybe OK if the sets
>>> are very small and if the values don't change very often.  But in a system
>>> where the values of the keys in the set change frequently (lots of
>>> tombstones) or the sets are large I think you're going to experience quite
>>> a bit of pain.
>>>
>>> On Fri, Jan 13, 2017 at 2:14 PM Mike Torra 
>>> wrote:
>>>
>>> We currently use redis to store sorted sets that we increment many, many
>>> times more than we read. For example, only about 5% of these sets are ever
>>> read. We are getting to the point where redis is becoming difficult to
>>> scale (currently at >20 nodes).
>>>
>>> We've started using cassandra for other things, and now we are
>>> experimenting to see if having a similar 'sorted set' data structure is
>>> feasible in cassandra. My approach so far is:
>>>
>>>    1. Use a counter CF to store the values I want to sort by
>>>    2. Periodically read in all key/values in the counter CF and sort in
>>>       the client application (~every five minutes or so)
>>>    3. Write back to a different CF with the ordered keys I care about
>>>
>>> Does this seem crazy? Is there a simpler way to do this in cassandra?
>>>
>>>
>> Redis is the other side of the coin.
>>
>> Fast:
>> https://groups.google.com/forum/#!topic/redis-db/4TAItKMyUEE
>>
>> http://stackoverflow.com/questions/6076342/is-there-a-practical-limit-to-the-number-of-elements-in-a-sorted-set-in-redis
>>
>> 320MB of memory for 2,000,000 email addresses is hard to scale. If you are
>> only maintaining a single list, great, but if you have millions of lists
>> this memory/cost profile is not ideal.
>>
>
>


Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread DuyHai Doan
Clustering column can be seen as sorted set

Table abstraction == Map>
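
One way to read this: within a partition, Cassandra keeps rows ordered by the clustering column, so a partition behaves like a sorted collection keyed by that column. Below is a rough in-memory analogy in Python. The `Table` class, its method names and the sample data are made up for illustration and are not a Cassandra API.

```python
import bisect
from collections import defaultdict

class Table:
    """Toy model: partition key -> clustering values kept in sorted order."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_value):
        rows = self._partitions[partition_key]
        # Find the sorted position; skip the insert if the value is already there.
        i = bisect.bisect_left(rows, clustering_value)
        if i == len(rows) or rows[i] != clustering_value:
            rows.insert(i, clustering_value)

    def scan(self, partition_key):
        # Rows come back in clustering order; no sort is needed at read time.
        return list(self._partitions[partition_key])

# Hypothetical data: duplicates collapse, order follows the clustering value.
t = Table()
for name in ["pear", "apple", "fig", "apple"]:
    t.insert("fruits", name)
```

Note that the ordering here is by the clustering value itself. Ordering by a score that changes over time would require the score to be part of the clustering key, and a counter column cannot be used that way.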


On Sat, Jan 14, 2017 at 2:28 AM, Edward Capriolo 
wrote:

>
>
> On Fri, Jan 13, 2017 at 8:14 PM, Jonathan Haddad 
> wrote:
>
>> I've thought about this for years and have never arrived at a
>> particularly great implementation.  Your idea will maybe be OK if the sets
>> are very small and if the values don't change very often.  But in a system
>> where the values of the keys in the set change frequently (lots of
>> tombstones) or the sets are large, I think you're going to experience quite
>> a bit of pain.
>>
>> On Fri, Jan 13, 2017 at 2:14 PM Mike Torra  wrote:
>>
>> We currently use redis to store sorted sets that we increment many, many
>> times more than we read. For example, only about 5% of these sets are ever
>> read. We are getting to the point where redis is becoming difficult to
>> scale (currently at >20 nodes).
>>
>> We've started using cassandra for other things, and now we are
>> experimenting to see if having a similar 'sorted set' data structure is
>> feasible in cassandra. My approach so far is:
>>
>>    1. Use a counter CF to store the values I want to sort by
>>    2. Periodically read in all key/values in the counter CF and sort in
>>    the client application (~every five minutes or so)
>>    3. Write back to a different CF with the ordered keys I care about
>>
>> Does this seem crazy? Is there a simpler way to do this in cassandra?
>>
>>
> Redis is the other side of the coin.
>
> Fast:
> https://groups.google.com/forum/#!topic/redis-db/4TAItKMyUEE
>
> http://stackoverflow.com/questions/6076342/is-there-a-practical-limit-to-the-number-of-elements-in-a-sorted-set-in-redis
>
> 320MB of memory for 2,000,000 email addresses is hard to scale. If you are
> only maintaining a single list, great, but if you have millions of lists
> this memory/cost profile is not ideal.
>


Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Edward Capriolo
On Fri, Jan 13, 2017 at 8:14 PM, Jonathan Haddad  wrote:

> I've thought about this for years and have never arrived at a particularly
> great implementation.  Your idea will maybe be OK if the sets are very
> small and if the values don't change very often.  But in a system where the
> values of the keys in the set change frequently (lots of tombstones) or the
> sets are large, I think you're going to experience quite a bit of pain.
>
> On Fri, Jan 13, 2017 at 2:14 PM Mike Torra  wrote:
>
> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>    1. Use a counter CF to store the values I want to sort by
>    2. Periodically read in all key/values in the counter CF and sort in
>    the client application (~every five minutes or so)
>    3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>
>
Redis is the other side of the coin.

Fast:
https://groups.google.com/forum/#!topic/redis-db/4TAItKMyUEE

http://stackoverflow.com/questions/6076342/is-there-a-practical-limit-to-the-number-of-elements-in-a-sorted-set-in-redis

320MB of memory for 2,000,000 email addresses is hard to scale. If you are
only maintaining a single list, great, but if you have millions of lists
this memory/cost profile is not ideal.
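
To put the quoted figure in per-member terms, here is a back-of-the-envelope calculation in Python. It assumes 1 MB = 10**6 bytes and that memory grows linearly with the number of members. The workload numbers (`num_sets`, `members_per_set`) are hypothetical and chosen only for illustration.

```python
# Figure quoted above: 320 MB for one sorted set of 2,000,000 members.
total_bytes = 320 * 10**6          # assumes 1 MB = 10**6 bytes
members = 2_000_000
bytes_per_member = total_bytes / members

# Hypothetical workload, for illustration only.
num_sets = 1_000_000
members_per_set = 1_000
estimated_gb = num_sets * members_per_set * bytes_per_member / 10**9
```

Under these assumptions each member costs about 160 bytes, so a million sets of a thousand members each would need on the order of 160 GB of RAM.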


Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Jonathan Haddad
I've thought about this for years and have never arrived at a particularly
great implementation.  Your idea will maybe be OK if the sets are very
small and if the values don't change very often.  But in a system where the
values of the keys in the set change frequently (lots of tombstones) or the
sets are large, I think you're going to experience quite a bit of pain.

On Fri, Jan 13, 2017 at 2:14 PM Mike Torra  wrote:

We currently use redis to store sorted sets that we increment many, many
times more than we read. For example, only about 5% of these sets are ever
read. We are getting to the point where redis is becoming difficult to
scale (currently at >20 nodes).

We've started using cassandra for other things, and now we are
experimenting to see if having a similar 'sorted set' data structure is
feasible in cassandra. My approach so far is:

   1. Use a counter CF to store the values I want to sort by
   2. Periodically read in all key/values in the counter CF and sort in the
   client application (~every five minutes or so)
   3. Write back to a different CF with the ordered keys I care about

Does this seem crazy? Is there a simpler way to do this in cassandra?


Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Edward Capriolo
On Fri, Jan 13, 2017 at 5:14 PM, Mike Torra  wrote:

> We currently use redis to store sorted sets that we increment many, many
> times more than we read. For example, only about 5% of these sets are ever
> read. We are getting to the point where redis is becoming difficult to
> scale (currently at >20 nodes).
>
> We've started using cassandra for other things, and now we are
> experimenting to see if having a similar 'sorted set' data structure is
> feasible in cassandra. My approach so far is:
>
>    1. Use a counter CF to store the values I want to sort by
>    2. Periodically read in all key/values in the counter CF and sort in
>    the client application (~every five minutes or so)
>    3. Write back to a different CF with the ordered keys I care about
>
> Does this seem crazy? Is there a simpler way to do this in cassandra?
>

Have you considered using only the keys in Cassandra's map type?

I proposed an implementation that I wanted to experiment with adding to a
set: https://issues.apache.org/jira/browse/CASSANDRA-6870 . Even though
redis and its feature set are wildly popular, there is not a great consensus
that Cassandra should do those things as manipulations of a single column.


implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Mike Torra
We currently use redis to store sorted sets that we increment many, many times 
more than we read. For example, only about 5% of these sets are ever read. We 
are getting to the point where redis is becoming difficult to scale (currently 
at >20 nodes).

We've started using cassandra for other things, and now we are experimenting to 
see if having a similar 'sorted set' data structure is feasible in cassandra. 
My approach so far is:

  1.  Use a counter CF to store the values I want to sort by
  2.  Periodically read in all key/values in the counter CF and sort in the
      client application (~every five minutes or so)
  3.  Write back to a different CF with the ordered keys I care about

Does this seem crazy? Is there a simpler way to do this in cassandra?
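
The client-side part of the steps above (sorting what was read in step 2 to produce what gets written in step 3) can be sketched roughly as follows. This is a minimal illustration in plain Python, not a Cassandra client: the `counts` dictionary stands in for whatever a read of the counter CF returns, and `top_n` and the item names are hypothetical. `heapq.nlargest` avoids a full sort when only the top N items are needed.

```python
import heapq

def top_n(counts, n):
    """Return the n highest-scoring (key, score) pairs, best first.

    `counts` stands in for the key -> counter value pairs read from the
    counter CF in step 2 (hypothetical data, not a driver call).
    """
    # Rank by score; ties are broken by key so the result is deterministic.
    return heapq.nlargest(n, counts.items(), key=lambda kv: (kv[1], kv[0]))

# Hypothetical snapshot of one set's counters.
counts = {"item_1": 42, "item_2": 17, "item_3": 99, "item_4": 5}
ranked = top_n(counts, 3)
```

The `ranked` list is what step 3 would write to the second CF, for example once per refresh interval.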