Thanks for the clarification.

In my case, Cassandra is the only storage. If the counters become incorrect,
they can't be corrected. If we have to store the raw data anyway to fix them,
we might as well go with that approach. But the granularity has to be at the
seconds level, since more than one user can click the same link. So the data
will be huge, with more writes and more rows to count on reads, right?
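
Just to make the concern concrete, a raw-data model would probably look
something like this in CQL (the table and column names here are only
illustrative):

    CREATE TABLE link_clicks (
        link_id    text,
        day        text,        -- e.g. '2014-12-29', to bucket partitions
        clicked_at timeuuid,    -- one row per click, unique even within a second
        user_id    text,
        PRIMARY KEY ((link_id, day), clicked_at)
    );

    -- one write per click
    INSERT INTO link_clicks (link_id, day, clicked_at, user_id)
    VALUES ('home-banner', '2014-12-29', now(), 'user42');

    -- the read has to touch every row in the partition
    SELECT count(*) FROM link_clicks
    WHERE link_id = 'home-banner' AND day = '2014-12-29';

So both the write volume and the work per read grow with the number of clicks.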

Thanks
Ajay


On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi Ajay,
>
> Here is a good explanation you might want to read.
>
>
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>
> We have been using counters for 3 years now, since C* 0.8, and we are happy
> with them. The limits I can see with both approaches are:
>
> Counters:
>
> - Accuracy indeed (the drift tends to be small in our use case, < 5%, while
> the business allows 10%, so fair enough for us), and we recount them through
> a batch processing tool (Spark / Hadoop, a kind of lambda architecture). So
> our real-time stats are inaccurate, and after a few minutes or hours we have
> the real value.
> - Read-before-write model, which is an anti-pattern. It makes you use more
> machines due to the pressure involved, but that is affordable for us too.
>
> Raw data (counted at read time):
>
> - Space used (it can become quite impressive very fast, depending on your
> business)!
> - Time to answer a request (we expose the data to customers; they don't
> want to wait 10 seconds for Cassandra to read 1 000 000+ columns).
> - Performance is O(n) (linear) instead of O(1) (constant). Customers won't
> always understand that for you it is harder to read 1 000 000 than 1, since
> they are asking for 1 number in both cases, and your interface will have
> very unstable read times (see the sketch below).
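>
> To make the difference concrete, here is a minimal CQL sketch of the counter
> model (the table and column names are invented for the example, not our
> actual schema):
>
>     CREATE TABLE clicks_by_link (
>         link_id text,
>         day     text,
>         clicks  counter,
>         PRIMARY KEY (link_id, day)
>     );
>
>     -- one increment per click
>     UPDATE clicks_by_link SET clicks = clicks + 1
>     WHERE link_id = 'home-banner' AND day = '2014-12-29';
>
>     -- the read is a single cell, whatever the click volume: O(1)
>     SELECT clicks FROM clicks_by_link
>     WHERE link_id = 'home-banner' AND day = '2014-12-29';
>
> With raw data you instead store one row per click and count the rows of
> the partition at read time, which is the O(n) read mentioned above.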
>
> Pick the best solution (or combination) for your use case. These lists of
> disadvantages are not exhaustive, just things that came to my mind right
> now.
>
> C*heers
>
> Alain
>
> 2014-12-29 13:33 GMT+01:00 Ajay <ajay.ga...@gmail.com>:
>
>> Hi,
>>
>> So you mean to say counters are not accurate? (It is highly likely that
>> multiple parallel threads will be trying to increment the counter as users
>> click the links.)
>>
>> Thanks
>> Ajay
>>
>>
>> On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen <janne.jalka...@ecyrd.com
>> > wrote:
>>
>>>
>>> Hi!
>>>
>>> It’s really a tradeoff between accuracy and speed, and it depends on your
>>> read access patterns; if you need it to be fairly fast, use counters by all
>>> means, but accept the fact that they will (especially in older versions of
>>> Cassandra or under adverse network conditions) drift off from the true click
>>> count.  If you need accuracy, use a timeuuid and count the rows (this is
>>> fairly safe for replays too).  However, if you use timeuuids your storage
>>> will need lots of space, and your reads will be slow if the click counts are
>>> huge (because Cassandra will need to read every item).  Using counters makes
>>> it easy to just grab a slice of the time series data and shove it to a
>>> client for visualization.
>>>
>>> You could of course do a hybrid system; use timeuuids and then
>>> periodically count and add the result to a regular column, and then remove
>>> the columns.  Note that you might want to optimize this so that you don’t
>>> end up with a lot of tombstones, e.g. by bucketing the writes so that you
>>> can delete everything with just a single partition delete.
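>>>
>>> As a rough CQL sketch of that hybrid (all names here are made up, not an
>>> actual schema), bucketing the raw clicks by hour could look like:
>>>
>>>     CREATE TABLE raw_clicks (
>>>         link_id    text,
>>>         hour       text,       -- e.g. '2014-12-29T11', the bucket
>>>         clicked_at timeuuid,
>>>         PRIMARY KEY ((link_id, hour), clicked_at)
>>>     );
>>>
>>>     CREATE TABLE rolled_up_clicks (
>>>         link_id text,
>>>         hour    text,
>>>         clicks  bigint,        -- regular column written by the roll-up job
>>>         PRIMARY KEY (link_id, hour)
>>>     );
>>>
>>>     -- roll-up job, once an hour bucket is closed:
>>>     -- 1) SELECT count(*) FROM raw_clicks WHERE link_id = ? AND hour = ?;
>>>     -- 2) INSERT INTO rolled_up_clicks (link_id, hour, clicks) VALUES (?, ?, ?);
>>>     -- 3) DELETE FROM raw_clicks WHERE link_id = ? AND hour = ?;
>>>
>>> Step 3 is the single partition delete, so you end up with one tombstone
>>> per bucket instead of one per click.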
>>>
>>> At Thinglink some of the more important counters that we use are backed
>>> up by the actual data. So for speed purposes we always use counters for
>>> reads, but there’s a repair process that fixes the counter value if we
>>> suspect it starts drifting too far from the real data.  (You might be able
>>> to tell that we’ve been using counters for quite some time :-P)
>>>
>>> /Janne
>>>
>>> On 29 Dec 2014, at 13:00, Ajay <ajay.ga...@gmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > Is it better to use a counter for the user click count, or to create a
>>> new row per click (keyed as user id : timestamp) and count the rows?
>>> >
>>> > Basically we want to track user clicks and use the same data for
>>> hourly/daily/monthly reports.
>>> >
>>> > Thanks
>>> > Ajay
>>>
>>>
>>
>
