Re: RE: RE: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
This is the reason why one would like to use an MV for it. An MV handles
this: it adds a clustering key while preserving the uniqueness of the
original PK.
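
For context, a sketch of what such a view could look like (hypothetical names; note materialized views require Cassandra 3.0+, while the thread mentions 2.1.16 in production):

```cql
-- Base table keeps exactly one row per (user, video); upserts preserve that.
CREATE TABLE recent (
    user_name text,
    video_id  text,
    position  int,
    last_time timestamp,
    PRIMARY KEY (user_name, video_id)
);

-- The view adds last_time as a clustering key while keeping all base PK
-- columns, so per-video uniqueness is preserved but rows sort by recency.
CREATE MATERIALIZED VIEW recent_by_time AS
    SELECT user_name, video_id, position, last_time
    FROM recent
    WHERE user_name IS NOT NULL
      AND video_id  IS NOT NULL
      AND last_time IS NOT NULL
    PRIMARY KEY (user_name, last_time, video_id)
    WITH CLUSTERING ORDER BY (last_time DESC);

-- "Last ten videos" then becomes:
SELECT video_id, position, last_time
FROM recent_by_time
WHERE user_name = 'some user'
LIMIT 10;
```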

On 11.11.2016 02:33, "Gang Liu" wrote:

> I guess the original design is to keep one record for one video per user.
> Maybe their app will report many play records while the user watches one
> video, so there will be many records if the primary key is changed to
> (user_name, last_time). Also,
> SELECT * FROM user_views WHERE user_name = ? LIMIT 10
> without grouping by video_id can't fit the business requirement.
>
> regards,
> Gang
>
>
> On Thu, Nov 10, 2016 at 6:47 PM, Carlos Alonso  wrote:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record every time a user watches a video, and then
>> have a batch task (every night, maybe?) that deletes the extra rows that
>> are no longer needed.
>> The query pattern for this is quite efficient, something like: SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
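
One possible refinement of the quoted schema (an assumption, not something proposed in the thread): since the requirement is "within 30 days", a table-level default TTL could expire old rows automatically, leaving the nightly batch task only the per-video de-duplication to do:

```cql
CREATE TABLE user_views (
    user_name text,
    video_id  text,
    position  int,
    last_time timestamp,
    PRIMARY KEY (user_name, last_time)
) WITH CLUSTERING ORDER BY (last_time DESC)
  AND default_time_to_live = 2592000;  -- 30 days, in seconds
```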
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin 
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>>
>>>
>>> On Wed, 09 Nov 2016 20:47:05 -0500, Diamond ben wrote:
>>>
>>> The solution may work. However, the play list will grow over time and
>>> somebody may have tens of thousands of entries, which will slow down the
>>> query and sort. Do you mean the oldest one should be removed when a new
>>> play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin
>>> *Sent:* November 9, 2016, 18:11:26
>>> *To:* user
>>> *Subject:* Re: RE: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id would be required to fetch a row, so all this stuff may be:
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> list<frozen<play>>);
>>>
>>> You can easily retrieve the play list for a specific user by his ID.
>>> Instead of a LIST you can use a MAP; I don't think it matters for ten
>>> entries.
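
A sketch of the MAP variant (hypothetical values; in Cassandra 2.1+ a UDT inside a collection must be frozen). Keying the map by video_id has the nice property that re-watching a video simply overwrites its entry, keeping one position per video:

```cql
CREATE TYPE play (position int, last_time timestamp);

CREATE TABLE recent (
    user_name text PRIMARY KEY,
    play_list map<text, frozen<play>>  -- map key is the video_id
);

-- Re-watching 'great video' overwrites the existing map entry:
UPDATE recent
SET play_list['great video'] = {position: 1234, last_time: '2016-11-08 10:00:00'}
WHERE user_name = 'some user';
```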
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>>
>>>
>>> On Tue, 08 Nov 2016 22:29:48 -0500, ben ben wrote:
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>> Thank you very much for your detailed explanation. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user logs in to our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movies and continue watching
>>> from the last position she/he reached. BTW, a movie can be watched several
>>> times by a user, and the last position is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin
>>> *Sent:* November 8, 2016, 22:35:48
>>> *To:* user
>>> *Subject:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if you need a very limited number of positions (as you said, ten), maybe
>>> you can store them in a LIST of UDTs? Or just as a JSON string?
>>> So you'll have one row per user-video pair.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> list<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent SET review = review + [{position: 1234, last_time: 12345}]
>>> WHERE user_name='some user' AND video_id='great video';
>>> UPDATE recent SET review = review + [{position: 1234, last_time: 123456}]
>>> WHERE user_name='some user' AND video_id='great video';
>>> UPDATE recent SET review = review + [{position: 1234, last_time: 1234567}]
>>> WHERE user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [{position: 1234, last_time: 12345}]
>>> WHERE user_name='some user' AND video_id='great video';
>>>
>>> Best 

Re: failure node rejoin

2016-11-10 Thread Ben Slater
From a quick look I couldn't find any defects other than the ones you've
found that seem potentially relevant to your issue (if anyone else on the
list knows of one, please chime in). Maybe the next step, if you haven't
done so already, is to check your Cassandra logs for any signs of issues
(i.e. WARNING or ERROR logs) in the failing case.

Cheers
Ben

On Fri, 11 Nov 2016 at 13:07 Yuji Ito  wrote:

> Thanks Ben,
>
> I tried 2.2.8 and could reproduce the problem.
> So, I'm investigating some bug fixes of repair and commitlog between 2.2.8
> and 3.0.9.
>
> - CASSANDRA-12508: "nodetool repair returns status code 0 for some errors"
>
> - CASSANDRA-12436: "Under some races commit log may incorrectly think it
> has unflushed data"
>   - related to CASSANDRA-9669, CASSANDRA-11828 (the fix of 2.2 is
> different from that of 3.0?)
>
> Do you know other bug fixes related to commitlog?
>
> Regards
> yuji
>
> On Wed, Nov 9, 2016 at 11:34 AM, Ben Slater 
> wrote:
>
> There have been a few commit log bugs around in the last couple of months
> so perhaps you’ve hit something that was fixed recently. Would be
> interesting to know the problem is still occurring in 2.2.8.
>
> I suspect what is happening is that when you do your initial read (without
> flush) to check the number of rows, the data is in memtables and
> theoretically the commitlogs but not sstables. With the forced stop the
> memtables are lost and Cassandra should read the commitlog from disk at
> startup to reconstruct the memtables. However, it looks like that didn’t
> happen for some (bad) reason.
>
> Good news that 3.0.9 fixes the problem so up to you if you want to
> investigate further and see if you can narrow it down to file a JIRA
> (although the first step of that would be trying 2.2.9 to make sure it’s
> not already fixed there).
>
> Cheers
> Ben
>
> On Wed, 9 Nov 2016 at 12:56 Yuji Ito  wrote:
>
> I tried C* 3.0.9 instead of 2.2.
> The data loss problem hasn't happened so far (without `nodetool flush`).
>
> Thanks
>
> On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito  wrote:
>
> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen.
> Did replay from old commit logs delete rows?
>
> Perhaps the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables).
> (Insertion and check in step 2 would succeed if one node was down because
> the consistency level was SERIAL.
> If the flush failed on more than one node, the test would retry step 2.)
> However, if so, the problem would happen without deleting Cassandra data.
>
> Regards,
> yuji
>
>
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater 
> wrote:
>
> Definitely sounds to me like something is not working as expected but I
> don’t really have any idea what would cause that (other than the fairly
> extreme failure scenario). A couple of things I can think of to try to
> narrow it down:
> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
> data is written to sstables rather than relying on commit logs
> 2) Run the test with consistency level QUORUM rather than SERIAL
> (shouldn't be any different but QUORUM is more widely used so maybe there
> is a bug that's specific to SERIAL)
>
> Cheers
> Ben
>
> On Mon, 24 Oct 2016 at 10:29 Yuji Ito  wrote:
>
> Hi Ben,
>
> The test without killing nodes has been working well without data loss.
> I've repeated my test about 200 times after removing data and
> rebuild/repair.
>
> Regards,
>
>
> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito  wrote:
>
> > Just to confirm, are you saying:
> > a) after operation 2, you select all and get 1000 rows
> > b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> That's right!
>
> I've started the test without killing nodes.
> I'll report the result to you next Monday.
>
> Thanks
>
>
> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater 
> wrote:
>
> Just to confirm, are you saying:
> a) after operation 2, you select all and get 1000 rows
> b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> If so, that would be very unexpected. If you run your tests without
> killing nodes do you get the expected (1,000) rows?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 17:00 Yuji Ito  wrote:
>
> > Are you certain your tests don’t generate any overlapping inserts (by
> PK)?
>
> Yes. The operation 2) also checks the number of rows just after all
> insertions.
>
>
> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater 
> wrote:
>
> OK. Are you certain your tests don’t generate any overlapping inserts (by
> PK)? Cassandra basically treats any inserts with the same primary key as
> updates (so 1000 insert operations may not necessarily result in 1000 

Re: RE: RE: A difficult data model with C*

2016-11-10 Thread Gang Liu
I guess the original design is to keep one record for one video per user.
Maybe their app will report many play records while the user watches one
video, so there will be many records if the primary key is changed to
(user_name, last_time). Also,
SELECT * FROM user_views WHERE user_name = ? LIMIT 10
without grouping by video_id can't fit the business requirement.

regards,
Gang


>> On Mon, 07 Nov 2016 21:54:08 -0500, ben ben wrote:
>>
>> Hi guys,
>>
>> We are maintaining a system for an on-line video service. ALL users'
>> viewing records of 

Re: failure node rejoin

2016-11-10 Thread Yuji Ito
Thanks Ben,

I tried 2.2.8 and could reproduce the problem.
So, I'm investigating some bug fixes of repair and commitlog between 2.2.8
and 3.0.9.

- CASSANDRA-12508: "nodetool repair returns status code 0 for some errors"

- CASSANDRA-12436: "Under some races commit log may incorrectly think it
has unflushed data"
  - related to CASSANDRA-9669, CASSANDRA-11828 (the fix of 2.2 is different
from that of 3.0?)

Do you know other bug fixes related to commitlog?

Regards
yuji

>> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater 
>> wrote:
>>
>> OK. Are you certain your tests don’t generate any overlapping inserts (by
>> PK)? Cassandra basically treats any inserts with the same primary key as
>> updates (so 1000 insert operations may not necessarily result in 1000 rows
>> in the DB).
>>
>> On Fri, 21 Oct 2016 at 16:30 Yuji Ito  wrote:
>>
>> thanks Ben,
>>
>> > 1) At what stage did you have (or expect to have) 1000 rows (and have
>> the mismatch between actual and expected) - at that end of operation (2) or
>> after operation (3)?
>>
>> after operation 3), at operation 4) which reads all rows by cqlsh 

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Alexander Dejanovski
Shalom,

you may have a high trace probability which could explain what you're
observing :
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html

On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink  wrote:

> count(*) actually pages through all the data. So a select count(*) without
> a limit would be expected to cause a lot of load on the system. The hit is
> more than just IO load and CPU, it also creates a lot of garbage that can
> cause pauses slowing down the entire JVM. Some details here:
> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
> 
>
> You may want to consider maintaining the count yourself, using Spark, or
> if you just want a ball park number you can grab it from JMX.
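
A minimal sketch of "maintaining the count yourself" with a counter table (hypothetical names; note that counter updates are not idempotent, so client retries after timeouts can drift the count):

```cql
CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    row_count  counter
);

-- On every application-level insert:
UPDATE row_counts SET row_count = row_count + 1
WHERE table_name = 'user_views';

-- Instead of SELECT count(*):
SELECT row_count FROM row_counts WHERE table_name = 'user_views';
```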
>
> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
> actually has nothing to do with flushes. A flush is the operation of moving
> data from memory (memtable) to disk (SSTable).
>
> FWIW in 2.0 that's not completely accurate. Before 2.1 the process of
> memtable flushing acquired a switchlock that blocks mutations during the
> flush (the "pending task" metric is the measure of how many mutations are
> blocked by this lock).
>
> Chris
>
> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges 
> wrote:
>
> Hi Alexander,
>
> I'm referring to Writes Count generated from JMX:
> [image: Inline image 1]
>
> The higher curve shows the total write count per second for all nodes in
> the cluster and the lower curve is the average write count per second per
> node.
> The drop in the end is the result of shutting down one application node
> that performed this kind of query (we still haven't removed the query
> itself in this cluster).
>
>
> On a different cluster, where we already removed the "select count(*)"
> query completely, we can see that the issue was resolved (also verified
> this with running nodetool cfstats a few times and checked the write count
> difference):
> [image: Inline image 2]
>
>
> Naturally I asked how a select query can affect the write count of a
> node, but weird as it seems, the issue was resolved once the query was
> removed from the code.
>
> Another side note: one of our developers who wrote the query thought it
> would be nice to limit the query results to 560,000,000. Perhaps the
> ridiculously high limit might have caused this?
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>  
>  We Create Meaningful Connections
>
> 
>
>
> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
> Hi Shalom,
>
> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
> has nothing to do with flushes. A flush is the operation of moving data
> from memory (memtable) to disk (SSTable).
>
> The Cassandra write path and read path are two different things and, as
> far as I know, I see no way for a select count(*) to increase your write
> count (if you are indeed talking about actual Cassandra writes, and not I/O
> operations).
>
> Cheers,
>
> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges 
> wrote:
>
> Yes, I know it's obsolete, but unfortunately this takes time.
> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035 <+972%2074-700-4035>
>  
>  We Create Meaningful Connections
>
> 
>
>
> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin 
> wrote:
>
> As I said I'm not sure about it, but it would be interesting to check the
> memory heap state with any JMX tool, e.g.
> https://github.com/patric-r/jvmtop
>
> By the way, why Cassandra 2.0.14? It's quite an old and unsupported
> version. Even in the 2.0 branch there is 2.0.17 available.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
> On Thu, 10 Nov 2016 05:47:37 -0500, Shalom Sagges wrote:
>
> Thanks for the quick reply Vladimir.
> Is it really possible that ~12,500 writes per second (per node in a 12
> nodes DC) are caused by memory flushes?
>
>
>
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
> 
> 
> 
> We Create Meaningful Connections
>
> 

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Alexander Dejanovski
Could you check the write count on a per-table basis in order to see
which specific table is actually receiving writes?
Check the OneMinuteRate attribute of
org.apache.cassandra.metrics:type=ColumnFamily,keyspace=keyspace1,scope=standard1,name=WriteLatency
(make sure you replace the keyspace and table name here).

Also, check if you have tracing turned on as it can indeed generate writes
for every query you send in the sessions and events table :
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tracing_r.html

Cheers,

> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin 
> wrote:
>
>
>
>
>
> Hi Shalom,
>
> I'm not so sure, but probably excessive memory consumption by this SELECT
> causes C* to flush tables to free memory.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone  - Hosted Cloud
> 

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Chris Lohfink
count(*) actually pages through all the data. So a select count(*) without
a limit would be expected to cause a lot of load on the system. The hit is
more than just IO load and CPU, it also creates a lot of garbage that can
cause pauses slowing down the entire JVM. Some details here:
http://www.datastax.com/dev/blog/counting-keys-in-cassandra


You may want to consider maintaining the count yourself, using Spark, or if
you just want a ball park number you can grab it from JMX.

> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
has nothing to do with flushes. A flush is the operation of moving data
from memory (memtable) to disk (SSTable).

FWIW in 2.0 that's not completely accurate. Before 2.1 the process of
memtable flushing acquired a switchlock that blocks mutations during the
flush (the "pending task" metric is the measure of how many mutations are
blocked by this lock).

Chris


Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Shalom Sagges
Hi Alexander,

I'm referring to Writes Count generated from JMX:
[image: Inline image 1]

The higher curve shows the total write count per second for all nodes in
the cluster and the lower curve is the average write count per second per
node.
The drop in the end is the result of shutting down one application node
that performed this kind of query (we still haven't removed the query
itself in this cluster).


On a different cluster, where we already removed the "select count(*)"
query completely, we can see that the issue was resolved (also verified
by running nodetool cfstats a few times and checking the write count
difference):
[image: Inline image 2]


Naturally I asked how a select query can affect the write count of a node,
but weird as it seems, the issue was resolved once the query was removed
from the code.

Another side note: one of our developers who wrote the query thought it
would be nice to limit the query results to 560,000,000. Perhaps the
ridiculously high limit might have caused this?

Thanks!



Shalom Sagges
DBA
T: +972-74-700-4035
 
 We Create Meaningful Connections



On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Shalom,
>
> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
> has nothing to do with flushes. A flush is the operation of moving data
> from memory (memtable) to disk (SSTable).
>
> The Cassandra write path and read path are two different things and, as
> far as I know, I see no way for a select count(*) to increase your write
> count (if you are indeed talking about actual Cassandra writes, and not I/O
> operations).
>
> Cheers,
>
> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges 
> wrote:
>
>> Yes, I know it's obsolete, but unfortunately this takes time.
>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>>
>> Thanks!
>>
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035
>>  
>>  We Create Meaningful Connections
>>
>> 
>>
>>
>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin 
>> wrote:
>>
>> As I said I'm not sure about it, but it will be interesting to check
>> memory heap state with any JMX tool, e.g. https://github.com/patric-r/jvmtop
>>
>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
>> Even in the 2.0 branch there is 2.0.17 available.
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>
>>
>>  On Thu, 10 Nov 2016 05:47:37 -0500*Shalom Sagges
>> >* wrote 
>>
>> Thanks for the quick reply Vladimir.
>> Is it really possible that ~12,500 writes per second (per node in a 12
>> nodes DC) are caused by memory flushes?
>>
>>
>>
>>
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035
>> 
>> 
>> 
>> We Create Meaningful Connections
>>
>> 
>>
>>
>>
>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin 
>> wrote:
>>
>>
>>
>> This message may contain confidential and/or privileged information.
>> If you are not the addressee or authorized to receive this on behalf of
>> the addressee you must not use, copy, disclose or take action based on this
>> message or any information herein.
>> If you have received this message in error, please advise the sender
>> immediately by reply email and delete this message. Thank you.
>>
>>
>> Hi Shalom,
>>
>> so not sure, but probably excessive memory consumption by this SELECT
>> causes C* to flush tables to free memory.
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>
>>
>>  On Thu, 10 Nov 2016 03:36:59 -0500*Shalom Sagges
>> >* wrote 
>>
>> Hi There!
>>
>> I'm using C* 2.0.14.
>> I experienced a scenario where a "select count(*)" that ran every minute
>> on a table with practically no results limit (yes, this should definitely
>> be avoided), caused a huge increase in Cassandra writes to around 150
>> thousand writes per second for that particular table.
>>
>> Can anyone explain this behavior? Why would a Select query significantly
>> increase write count in Cassandra?
>>
>> Thanks!

Re: 答复: 答复: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
Yes, sorry. I was thrown off by the fact that video_id wasn't. Anyway, an MV
could probably be a way to go.

On 10 Nov 2016 at 13:38, "Carlos Alonso" wrote:

Hi Ben, you're right, but in my example the last_time timestamp field is
actually part of the primary key.

Regards

Carlos Alonso | Software Engineer | @calonso 

On 10 November 2016 at 11:50, Benjamin Roth  wrote:

> I pretty much guess the CQL you posted is invalid. You cannot set a
> clustering column that is not part of the primary key.
> But you can use a materialized view to append the last_time to the primary
> key and still preserve the uniqueness of user_name + vedio_id (guess it is a
> typo of video).
>
> 2016-11-10 10:47 GMT+00:00 Carlos Alonso :
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record every time a user watches a video and then
>> have a batch task (every night, maybe?) that deletes the extra rows that
>> are no longer needed.
>> The query pattern for this is quite efficient, something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin 
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>>
>>>
>>>  On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> >* wrote 
>>>
>>> The solution may work. However, the play list will grow over time and
>>> somebody may have tens of thousands of records, which will slow down the
>>> query and sort.
>>> Do you mean the oldest one should be removed when a new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin 
>>> *Sent:* November 9, 2016 18:11:26
>>> *To:* user
>>> *Subject:* Re: 答复: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is required to fetch a row, so all this could be:
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<FROZEN<play>>);
>>>
>>> You can easily retrieve play list for specific user by his ID. Instead
>>> of LIST you can use MAP, I don't think that for ten entries it matters.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone  - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>>  On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> >* wrote 
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>> Thank you very much for your detailed explanation. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user logs in to our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movies and continue to watch
>>> from the last position she/he reached. BTW, a movie can be watched several
>>> times by a user, and the last position is indeed needed.
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin 
>>> *Sent:* November 8, 2016 22:35:48
>>> *To:* user
>>> *Subject:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if you need a very limited number of positions (ten, as you said), maybe
>>> you can store them in a LIST of UDT? Or just as a JSON string?
>>> So you'll have one row for each user-video pair.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<FROZEN<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>

Re: 答复: 答复: A difficult data model with C*

2016-11-10 Thread Carlos Alonso
Hi Ben, you're right, but in my example the last_time timestamp field is
actually part of the primary key.

Regards

Carlos Alonso | Software Engineer | @calonso 

On 10 November 2016 at 11:50, Benjamin Roth  wrote:

> I pretty much guess the CQL you posted is invalid. You cannot set a
> clustering column that is not part of the primary key.
> But you can use a materialized view to append the last_time to the primary
> key and still preserve the uniqueness of user_name + vedio_id (guess it is a
> typo of video).
>
> 2016-11-10 10:47 GMT+00:00 Carlos Alonso :
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record every time a user watches a video and then
>> have a batch task (every night, maybe?) that deletes the extra rows that
>> are no longer needed.
>> The query pattern for this is quite efficient, something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin 
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>>
>>>
>>>  On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> >* wrote 
>>>
>>> The solution may work. However, the play list will grow over time and
>>> somebody may have tens of thousands of records, which will slow down the
>>> query and sort.
>>> Do you mean the oldest one should be removed when a new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin 
>>> *Sent:* November 9, 2016 18:11:26
>>> *To:* user
>>> *Subject:* Re: 答复: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is required to fetch a row, so all this could be:
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<FROZEN<play>>);
>>>
>>> You can easily retrieve play list for specific user by his ID. Instead
>>> of LIST you can use MAP, I don't think that for ten entries it matters.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone  - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>>  On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> >* wrote 
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>> Thank you very much for your detailed explanation. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user logs in to our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movies and continue to watch
>>> from the last position she/he reached. BTW, a movie can be watched several
>>> times by a user, and the last position is indeed needed.
>>>
>>> BRs,
>>>
>>> BEN
>>> --
>>>
>>> *From:* Vladimir Yudovin 
>>> *Sent:* November 8, 2016 22:35:48
>>> *To:* user
>>> *Subject:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if you need a very limited number of positions (ten, as you said), maybe
>>> you can store them in a LIST of UDT? Or just as a JSON string?
>>> So you'll have one row for each user-video pair.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<FROZEN<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [(1234,12345)]  WHERE
>>> user_name='some user' AND video_id='great video';
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> 

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Alexander Dejanovski
Hi Shalom,

Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs; they actually
have nothing to do with flushes. A flush is the operation of moving data
from memory (memtable) to disk (SSTable).

The Cassandra write path and read path are two different things and, as far
as I know, there is no way for a select count(*) to increase your write count
(if you are indeed talking about actual Cassandra writes, and not I/O
operations).

Cheers,

On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges 
wrote:

> Yes, I know it's obsolete, but unfortunately this takes time.
> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>
> Thanks!
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>  
>  We Create Meaningful Connections
>
> 
>
>
> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin 
> wrote:
>
> As I said I'm not sure about it, but it will be interesting to check
> memory heap state with any JMX tool, e.g.
> https://github.com/patric-r/jvmtop
>
> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
> Even in the 2.0 branch there is 2.0.17 available.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Thu, 10 Nov 2016 05:47:37 -0500*Shalom Sagges
> >* wrote 
>
> Thanks for the quick reply Vladimir.
> Is it really possible that ~12,500 writes per second (per node in a 12
> nodes DC) are caused by memory flushes?
>
>
>
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
> 
> 
> 
> We Create Meaningful Connections
>
> 
>
>
>
> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin 
> wrote:
>
>
>
>
>
> Hi Shalom,
>
> so not sure, but probably excessive memory consumption by this SELECT
> causes C* to flush tables to free memory.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Thu, 10 Nov 2016 03:36:59 -0500*Shalom Sagges
> >* wrote 
>
> Hi There!
>
> I'm using C* 2.0.14.
> I experienced a scenario where a "select count(*)" that ran every minute
> on a table with practically no results limit (yes, this should definitely
> be avoided), caused a huge increase in Cassandra writes to around 150
> thousand writes per second for that particular table.
>
> Can anyone explain this behavior? Why would a Select query significantly
> increase write count in Cassandra?
>
> Thanks!
>
>
> Shalom Sagges
>
> 
> 
> 
> We Create Meaningful Connections
>
> 
>
>
>
>
>
>
>
>
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Shalom Sagges
Yes, I know it's obsolete, but unfortunately this takes time.
We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.

Thanks!



Shalom Sagges
DBA
T: +972-74-700-4035
 
 We Create Meaningful Connections



On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin 
wrote:

> As I said I'm not sure about it, but it will be interesting to check
> memory heap state with any JMX tool, e.g. https://github.com/patric-r/jvmtop
>
> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
> Even in the 2.0 branch there is 2.0.17 available.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Thu, 10 Nov 2016 05:47:37 -0500*Shalom Sagges
> >* wrote 
>
> Thanks for the quick reply Vladimir.
> Is it really possible that ~12,500 writes per second (per node in a 12
> nodes DC) are caused by memory flushes?
>
>
>
>
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
> 
> 
> 
> We Create Meaningful Connections
>
> 
>
>
>
> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin 
> wrote:
>
>
>
>
>
> Hi Shalom,
>
> so not sure, but probably excessive memory consumption by this SELECT
> causes C* to flush tables to free memory.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Thu, 10 Nov 2016 03:36:59 -0500*Shalom Sagges
> >* wrote 
>
> Hi There!
>
> I'm using C* 2.0.14.
> I experienced a scenario where a "select count(*)" that ran every minute
> on a table with practically no results limit (yes, this should definitely
> be avoided), caused a huge increase in Cassandra writes to around 150
> thousand writes per second for that particular table.
>
> Can anyone explain this behavior? Why would a Select query significantly
> increase write count in Cassandra?
>
> Thanks!
>
>
> Shalom Sagges
>
> 
> 
> 
> We Create Meaningful Connections
>
> 
>
>
>
>
>
>
>



Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Vladimir Yudovin
As I said, I'm not sure about it, but it would be interesting to check the
memory heap state with a JMX tool, e.g. https://github.com/patric-r/jvmtop



By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
Even in the 2.0 branch there is 2.0.17 available.


Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 10 Nov 2016 05:47:37 -0500 Shalom Sagges shal...@liveperson.com wrote 

Thanks for the quick reply Vladimir.
Is it really possible that ~12,500 writes per second (per node in a 12 nodes
DC) are caused by memory flushes?

Shalom Sagges
DBA
T: +972-74-700-4035
We Create Meaningful Connections

On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin vla...@winguzone.com
wrote:

Hi Shalom,

so not sure, but probably excessive memory consumption by this SELECT causes
C* to flush tables to free memory.

Best regards, Vladimir Yudovin,

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.

 On Thu, 10 Nov 2016 03:36:59 -0500 Shalom Sagges shal...@liveperson.com wrote 

Hi There!

I'm using C* 2.0.14.
I experienced a scenario where a "select count(*)" that ran every minute on a
table with practically no results limit (yes, this should definitely be
avoided), caused a huge increase in Cassandra writes to around 150 thousand
writes per second for that particular table.

Can anyone explain this behavior? Why would a Select query significantly
increase write count in Cassandra?

Thanks!

Shalom Sagges
We Create Meaningful Connections


Re: 答复: 答复: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
I pretty much guess the CQL you posted is invalid. You cannot set a
clustering column that is not part of the primary key.
But you can use a materialized view to append the last_time to the primary
key and still preserve the uniqueness of user_name + vedio_id (guess it is a
typo of video).
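
For illustration, a sketch of what such a view could look like, built on the
original `recent` table posted in this thread. This is an assumption-laden
example: materialized views only exist in Cassandra 3.0+ (the live system
mentioned in this thread runs 2.1.16), and the view name is made up:

```sql
-- Hypothetical sketch; requires Cassandra 3.0+.
-- Base table, as posted earlier in the thread:
-- CREATE TABLE recent (user_name text, vedio_id text, position int,
--                      last_time timestamp,
--                      PRIMARY KEY (user_name, vedio_id));
CREATE MATERIALIZED VIEW recent_by_time AS
    SELECT user_name, last_time, vedio_id, position
    FROM recent
    WHERE user_name IS NOT NULL
      AND vedio_id IS NOT NULL
      AND last_time IS NOT NULL
    PRIMARY KEY (user_name, last_time, vedio_id)
    WITH CLUSTERING ORDER BY (last_time DESC);

-- The "last ten plays" read then becomes a single-partition slice:
-- SELECT * FROM recent_by_time WHERE user_name = ? LIMIT 10;
```

Because the view key still contains every base primary key column (user_name,
vedio_id), each base row maps to exactly one view row, which is the
uniqueness-preserving property described above.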

2016-11-10 10:47 GMT+00:00 Carlos Alonso :

> What about having something like
>
> CREATE TABLE user_views (
>   user_name text,
>   video_id text,
>   position int,
>   last_time timestamp,
>   PRIMARY KEY(user_name, last_time)
> ) WITH CLUSTERING ORDER BY (last_time DESC);
>
> Where you insert a record every time a user watches a video and then have
> a batch task (every night, maybe?) that deletes the extra rows that are no
> longer needed.
> The query pattern for this is quite efficient, something like SELECT *
> FROM user_views WHERE user_name = ? LIMIT 10;
>
> Regards
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 10 November 2016 at 09:19, Vladimir Yudovin 
> wrote:
>
>> >Do you mean the oldest one should be removed when a new play is added?
>> Sure. As you described the issue "the last ten items may be adequate for
>> the business"
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>>
>>
>>  On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>> >* wrote 
>>
>> The solution may work. However, the play list will grow over time and
>> somebody may have tens of thousands of records, which will slow down the
>> query and sort.
>> Do you mean the oldest one should be removed when a new play is added?
>>
>> BTW, the version is 2.1.16 in our live system.
>>
>>
>> BRs,
>>
>> BEN
>> --
>>
>> *From:* Vladimir Yudovin 
>> *Sent:* November 9, 2016 18:11:26
>> *To:* user
>> *Subject:* Re: 答复: A difficult data model with C*
>>
>> You are welcome! )
>>
>> >recent ten movies watched by the user within 30 days.
>> In this case you can't use PRIMARY KEY (user_name, video_id), as video_id
>> is required to fetch a row, so all this could be:
>>
>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>> LIST<FROZEN<play>>);
>>
>> You can easily retrieve play list for specific user by his ID. Instead of
>> LIST you can use MAP, I don't think that for ten entries it matters.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone  - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>>  On Tue, 08 Nov 2016 22:29:48 -0500*ben ben > >* wrote 
>>
>> Hi Vladimir Yudovin,
>>
>>
>> Thank you very much for your detailed explanation. Maybe I didn't
>> describe the requirement clearly. The use cases should be:
>>
>> 1. a user logs in to our app.
>>
>> 2. show the recent ten movies watched by the user within 30 days.
>>
>> 3. the user can click any one of the ten movies and continue to watch
>> from the last position she/he reached. BTW, a movie can be watched several
>> times by a user, and the last position is indeed needed.
>>
>> BRs,
>>
>> BEN
>> --
>>
>> *From:* Vladimir Yudovin 
>> *Sent:* November 8, 2016 22:35:48
>> *To:* user
>> *Subject:* Re: A difficult data model with C*
>>
>> Hi Ben,
>>
>> if you need a very limited number of positions (ten, as you said), maybe
>> you can store them in a LIST of UDT? Or just as a JSON string?
>> So you'll have one row for each user-video pair.
>>
>> It can be something like this:
>>
>> CREATE TYPE play (position int, last_time timestamp);
>> CREATE TABLE recent (user_name text, video_id text, review
>> LIST<FROZEN<play>>, PRIMARY KEY (user_name, video_id));
>>
>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>> user' AND video_id='great video';
>> UPDATE recent set review = review + [(1234,123456)] where user_name='some
>> user' AND video_id='great video';
>> UPDATE recent set review = review + [(1234,1234567)] where
>> user_name='some user' AND video_id='great video';
>>
>> You can delete the oldest entry by index:
>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>> video_id='great video';
>>
>> or by value, if you know the oldest entry:
>>
>> UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some
>> user' AND video_id='great video';
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone  - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>>  On Mon, 07 Nov 2016 21:54:08 -0500*ben ben > >* wrote 
>>
>>
>> Hi guys,
>>
>> We are maintaining a system for an on-line video service. ALL users'
>> viewing records of every movie are stored in C*. So she/he can continue to
>> enjoy the movie from 

Re: 答复: 答复: A difficult data model with C*

2016-11-10 Thread Carlos Alonso
What about having something like

CREATE TABLE user_views (
  user_name text,
  video_id text,
  position int,
  last_time timestamp,
  PRIMARY KEY(user_name, last_time)
) WITH CLUSTERING ORDER BY (last_time DESC);

Where you insert a record every time a user watches a video, and then have
a batch task (every night, maybe?) that deletes the extra rows that are no
longer needed.
The query pattern for this is quite efficient, something like SELECT *
FROM user_views WHERE user_name = ? LIMIT 10;
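
As a hedged sketch of that cleanup step: since last_time is a clustering
column of user_views, rows that fall outside the 30-day business window can
be removed per user. Note that the range delete below needs Cassandra 3.0+;
on the 2.1.16 cluster mentioned in this thread, the batch task would instead
have to read the stale rows and delete them with fully specified primary keys:

```sql
-- Hypothetical nightly cleanup for the user_views table above.
-- Drop one user's rows older than the 30-day window (range deletes on
-- clustering columns require Cassandra 3.0+):
DELETE FROM user_views
 WHERE user_name = 'some user'
   AND last_time < '2016-10-11 00:00:00+0000';

-- The read path stays unchanged:
-- SELECT * FROM user_views WHERE user_name = 'some user' LIMIT 10;
```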

Regards

Carlos Alonso | Software Engineer | @calonso 

On 10 November 2016 at 09:19, Vladimir Yudovin  wrote:

> >Do you mean the oldest one should be removed when a new play is added?
> Sure. As you described the issue "the last ten items may be adequate for
> the business"
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone - Hosted Cloud Cassandra. Launch your cluster in minutes.*
>
>
>  On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
> >* wrote 
>
> The solution may work. However, the play list will grow over time and
> somebody may have tens of thousands of records, which will slow down the
> query and sort.
> Do you mean the oldest one should be removed when a new play is added?
>
> BTW, the version is 2.1.16 in our live system.
>
>
> BRs,
>
> BEN
> --
>
> *From:* Vladimir Yudovin 
> *Sent:* November 9, 2016 18:11:26
> *To:* user
> *Subject:* Re: 答复: A difficult data model with C*
>
> You are welcome! )
>
> >recent ten movies watched by the user within 30 days.
> In this case you can't use PRIMARY KEY (user_name, video_id), as video_id
> is required to fetch a row, so all this could be:
>
> CREATE TYPE play (video_id text, position int, last_time timestamp);
> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
> LIST<FROZEN<play>>);
>
> You can easily retrieve play list for specific user by his ID. Instead of
> LIST you can use MAP, I don't think that for ten entries it matters.
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone  - Hosted Cloud Cassandra
> Launch your cluster in minutes.*
>
>
>  On Tue, 08 Nov 2016 22:29:48 -0500*ben ben  >* wrote 
>
> Hi Vladimir Yudovin,
>
>
> Thank you very much for your detailed explanation. Maybe I didn't
> describe the requirement clearly. The use cases should be:
>
> 1. a user logs in to our app.
>
> 2. show the recent ten movies watched by the user within 30 days.
>
> 3. the user can click any one of the ten movies and continue to watch from
> the last position she/he reached. BTW, a movie can be watched several times
> by a user, and the last position is indeed needed.
>
> BRs,
>
> BEN
> --
>
> *From:* Vladimir Yudovin 
> *Sent:* November 8, 2016 22:35:48
> *To:* user
> *Subject:* Re: A difficult data model with C*
>
> Hi Ben,
>
> if you need a very limited number of positions (ten, as you said), maybe
> you can store them in a LIST of UDT? Or just as a JSON string?
> So you'll have one row for each user-video pair.
>
> It can be something like this:
>
> CREATE TYPE play (position int, last_time timestamp);
> CREATE TABLE recent (user_name text, video_id text, review
> LIST<FROZEN<play>>, PRIMARY KEY (user_name, video_id));
>
> UPDATE recent set review = review + [(1234,12345)] where user_name='some
> user' AND video_id='great video';
> UPDATE recent set review = review + [(1234,123456)] where user_name='some
> user' AND video_id='great video';
> UPDATE recent set review = review + [(1234,1234567)] where user_name='some
> user' AND video_id='great video';
>
> You can delete the oldest entry by index:
> DELETE review[0] FROM recent WHERE user_name='some user' AND
> video_id='great video';
>
> or by value, if you know the oldest entry:
>
> UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some
> user' AND video_id='great video';
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone  - Hosted Cloud Cassandra
> Launch your cluster in minutes.*
>
>
>  On Mon, 07 Nov 2016 21:54:08 -0500*ben ben  >* wrote 
>
>
> Hi guys,
>
> We are maintaining a system for an on-line video service. ALL users'
> viewing records of every movie are stored in C*. So she/he can continue to
> enjoy the movie from the last point next time. The table is designed as
> below:
> CREATE TABLE recent (
> user_name text,
> vedio_id text,
> position int,
> last_time timestamp,
> PRIMARY KEY (user_name, vedio_id)
> )
>
> It worked well before. However, the records increase every day and the
> last ten items may be adequate for the business. The current model use
> vedio_id as cluster key to keep a row for a movie, but as you know, the
> business prefer to order by the last_time desc. If we use last_time as
> cluster key, there 

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Shalom Sagges
Thanks for the quick reply Vladimir.
Is it really possible that ~12,500 writes per second (per node, in a 12-node
DC) are caused by memtable flushes?





Shalom Sagges
DBA
T: +972-74-700-4035
We Create Meaningful Connections



On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin 
wrote:

> Hi Shalom,
>
> Not sure, but probably excessive memory consumption by this SELECT is
> causing C* to flush memtables to free memory.
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone  - Hosted Cloud Cassandra*
> *Launch your cluster in minutes.*
>
>
>  On Thu, 10 Nov 2016 03:36:59 -0500, *Shalom Sagges* wrote 
>
> Hi There!
>
> I'm using C* 2.0.14.
> I experienced a scenario where a "select count(*)" that ran every minute
> on a table with practically no results limit (yes, this should definitely
> be avoided), caused a huge increase in Cassandra writes to around 150
> thousand writes per second for that particular table.
>
> Can anyone explain this behavior? Why would a Select query significantly
> increase write count in Cassandra?
>
> Thanks!
>
>
> Shalom Sagges
>
> We Create Meaningful Connections
>
>
>
> This message may contain confidential and/or privileged information.
> If you are not the addressee or authorized to receive this on behalf of
> the addressee you must not use, copy, disclose or take action based on this
> message or any information herein.
> If you have received this message in error, please advise the sender
> immediately by reply email and delete this message. Thank you.
>
>
>



Re: large number of pending compactions, sstables steadily increasing

2016-11-10 Thread Eiti Kimura
Ben, Benjamin, thanks for the replies.

What we're doing here is changing from LeveledCompactionStrategy to
SizeTieredCompactionStrategy. This task is in progress, and we are going to
measure the results for just some column families.
Ben, thanks for the procedure; I'll try it again later. When the problem
happened here, we started to 'destroy' the node, which means decommissioning
it, removing all of its data directories, and bootstrapping it again. The
problem was that the bootstrap was taking a long time to complete, more than
5 hours...

Benjamin, I hope they start to take care of the ticket you pointed to; it
looks like a bug to me, and a widespread problem, since a lot of people are
using Cassandra 2.1.x.
Do you guys know if Cassandra 3.0 and 3.1 are affected by this problem as
well?

Regards,
Eiti



J.P. Eiti Kimura
Plataformas

+55 19 3518  5500
+ 55 19 98232 2792
skype: eitikimura

  


2016-11-07 18:49 GMT-02:00 Benjamin Roth :

> Hm, this MAY somehow relate to the issue I encountered recently:
> https://issues.apache.org/jira/browse/CASSANDRA-12730
> I also made a proposal to mitigate excessive (unnecessary) flushes during
> repair streams but unfortunately nobody commented on it yet.
> Maybe there are some opinions on it around here?
>
> 2016-11-07 20:15 GMT+00:00 Ben Slater :
>
>> What I’ve seen happen a number of times is that you get into a negative
>> feedback loop:
>> not enough capacity to keep up with compactions (often triggered by
>> repair or compaction hitting a large partition) -> more sstables -> more
>> expensive reads -> even less capacity to keep up with compactions -> repeat
>>
>> The way we deal with this at Instaclustr is typically to take the node
>> offline to let it catch up with compactions. We take it offline by running
>> nodetool disablegossip + disablethrift + disablebinary, unthrottle
>> compactions (nodetool setcompactionthroughput 0) and then leave it to chug
>> through compactions until it gets close to zero then reverse the settings
>> or restart C* to set things back to normal. This typically resolves the
>> issues. If you see it happening regularly your cluster probably needs more
>> processing capacity (or other tuning).
>>
>> Cheers
>> Ben
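The catch-up procedure quoted above can be sketched as a short script. This is a dry run (it only prints the commands); remove `DRYRUN=echo` to apply it on a live node. Note that `16` MB/s is only assumed here as the usual 2.x default compaction throughput — restore whatever your cluster normally uses.

```shell
#!/bin/sh
# Sketch of the compaction catch-up procedure described above.
# Dry run by default: every command is printed, not executed.
DRYRUN=echo

$DRYRUN nodetool disablegossip               # stop participating in gossip
$DRYRUN nodetool disablethrift               # stop serving Thrift clients
$DRYRUN nodetool disablebinary               # stop serving native-protocol clients
$DRYRUN nodetool setcompactionthroughput 0   # unthrottle compactions
# ...wait until 'nodetool compactionstats' shows pending tasks near zero...
$DRYRUN nodetool setcompactionthroughput 16  # restore the (assumed) default throttle
$DRYRUN nodetool enablebinary
$DRYRUN nodetool enablethrift
$DRYRUN nodetool enablegossip
```

Alternatively, a plain restart of the node after compactions catch up resets all of these toggles at once.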
>>
>> On Tue, 8 Nov 2016 at 02:38 Eiti Kimura  wrote:
>>
>>> Hey guys,
>>>
>>> Do we have any conclusions about this case? Ezra, did you solve your
>>> problem?
>>> We are facing a very similar problem here. LeveledCompaction with VNodes
>>> and looks like a node went to a weird state and start to consume lot of
>>> CPU, the compaction process seems to be stucked and the number of SSTables
>>> increased significantly.
>>>
>>> Do you have any clue about it?
>>>
>>> Thanks,
>>> Eiti
>>>
>>>
>>>
>>> J.P. Eiti Kimura
>>> Plataformas
>>>
>>> +55 19 3518  5500
>>> + 55 19 98232 2792
>>> skype: eitikimura
>>> 
>>>   
>>> 
>>>
>>> 2016-09-11 18:20 GMT-03:00 Jens Rantil :
>>>
>>> I just want to chime in and say that we also had issues keeping up with
>>> compaction once (with vnodes/ssd disks) and I also want to recommend
>>> keeping track of your open file limit which might bite you.
>>>
>>> Cheers,
>>> Jens
>>>
>>>
>>> On Friday, August 19, 2016, Mark Rose  wrote:
>>>
>>> Hi Ezra,
>>>
>>> Are you making frequent changes to your rows (including TTL'ed
>>> values), or mostly inserting new ones? If you're only inserting new
>>> data, it's probable using size-tiered compaction would work better for
>>> you. If you are TTL'ing whole rows, consider date-tiered.
>>>
>>> If leveled compaction is still the best strategy, one way to catch up
>>> with compactions is to have less data per partition -- in other words,
>>> use more machines. Leveled compaction is CPU expensive. You are CPU
>>> bottlenecked currently, or from the other perspective, you have too
>>> much data per node for leveled compaction.
>>>
>>> At this point, compaction is so far behind that you'll likely be
>>> getting high latency if you're reading old rows (since dozens to
>>> hundreds of uncompacted sstables will likely need to be checked for
>>> matching rows). You may be better off with size tiered compaction,
>>> even if it will mean always reading several sstables per read (higher
>>> latency than when leveled can keep up).
>>>
>>> How much data do you have per node? Do you update/insert to/delete
>>> rows? Do you TTL?
>>>
>>> Cheers,
>>> Mark
>>>
>>> On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel 
>>> wrote:
>>> > I have one 

Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Vladimir Yudovin
Hi Shalom,



Not sure, but probably excessive memory consumption by this SELECT is causing 
C* to flush memtables to free memory. 


Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 10 Nov 2016 03:36:59 -0500, Shalom Sagges 
shal...@liveperson.com wrote 




Hi There!



I'm using C* 2.0.14. 

I experienced a scenario where a "select count(*)" that ran every minute on a 
table with practically no results limit (yes, this should definitely be 
avoided), caused a huge increase in Cassandra writes to around 150 thousand 
writes per second for that particular table.



Can anyone explain this behavior? Why would a Select query significantly 
increase write count in Cassandra?



Thanks!

 


 
Shalom Sagges
We Create Meaningful Connections










Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Shalom Sagges
Hi There!

I'm using C* 2.0.14.
I experienced a scenario where a "select count(*)" that ran every minute on
a table with practically no results limit (yes, this should definitely be
avoided), caused a huge increase in Cassandra writes to around 150 thousand
writes per second for that particular table.

Can anyone explain this behavior? Why would a Select query significantly
increase write count in Cassandra?

Thanks!

Shalom Sagges

 
 We Create Meaningful Connections




Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Vladimir Yudovin
> Do you mean the oldest one should be removed when a new play is added?

Sure. As you described the issue, "the last ten items may be adequate for the 
business".



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 09 Nov 2016 20:47:05 -0500, Diamond ben 
diamond@outlook.com wrote 




The solution may work. However, the play list will grow over time, and 
somebody may have tens of thousands of records, which will slow down the 
query and the sort. Do you mean the oldest one should be removed when a new 
play is added?

BTW, the version is 2.1.16 in our live system.



BRs,

BEN




From: Vladimir Yudovin vla...@winguzone.com
 Sent: 9 November 2016 18:11:26
 To: user
 Subject: Re: Reply: A difficult data model with C* 
 


You are welcome! )



> recent ten movies watched by the user within 30 days.

In this case you can't use PRIMARY KEY (user_name, video_id), as video_id 
would then be required to fetch a row, so all this stuff may be:

CREATE TYPE play (video_id text, position int, last_time timestamp);

CREATE TABLE recent (user_name text PRIMARY KEY, play_list 
LIST<frozen<play>>);


You can easily retrieve the play list for a specific user by their ID. 
Instead of LIST you can use MAP; I don't think it matters for ten entries.
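As a quick illustration of this single-row model, here is a hypothetical in-memory sketch (plain Python, not driver code — the `Play`/`record_play` names are invented for illustration) of how a ten-entry play_list could be maintained in the application before being written back:

```python
from collections import namedtuple

# Hypothetical in-memory analogue of the play_list column above:
# keep the ten most recent plays per user, newest first, one entry per video.
Play = namedtuple("Play", ["video_id", "position", "last_time"])

def record_play(play_list, play, limit=10):
    """Drop any earlier entry for the same video, prepend the new play,
    and trim to the `limit` most recent entries."""
    kept = [p for p in play_list if p.video_id != play.video_id]
    return ([play] + kept)[:limit]

plays = []
for t in range(1, 13):                       # 12 plays of 12 distinct videos
    plays = record_play(plays, Play("video-%d" % t, 0, t))
plays = record_play(plays, Play("video-3", 42, 13))   # re-watch video-3

assert len(plays) == 10                      # trimmed to ten entries
assert plays[0] == Play("video-3", 42, 13)   # re-watch moved to the front
```

The read-modify-write this implies is the trade-off of the single-row design: the application (not C*) enforces the ten-entry limit and the one-entry-per-video rule.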





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
 Launch your cluster in minutes.





 On Tue, 08 Nov 2016 22:29:48 -0500, ben ben diamond@outlook.com 
wrote 




Hi Vladimir Yudovin,



Thank you very much for your detailed explanation. Maybe I didn't describe 
the requirement clearly. The use cases should be:

1. A user logs in to our app.

2. Show the recent ten movies watched by the user within 30 days.

3. The user can click any one of the ten movies and continue to watch from the 
last position she/he reached. BTW, a movie can be watched several times by a 
user, and the last position is indeed needed.



BRs,

BEN




From: Vladimir Yudovin vla...@winguzone.com
 Sent: 8 November 2016 22:35:48
 To: user
 Subject: Re: A difficult data model with C*
 


Hi Ben,



If you need a very limited number of positions (ten, as you said), maybe you 
can store them in a LIST of a UDT? Or just as a JSON string?

So you'll have one row per user-video pair. 



It can be something like this:



CREATE TYPE play (position int, last_time timestamp);

CREATE TABLE recent (user_name text, video_id text, review 
LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));



UPDATE recent set review = review + [(1234,12345)] where user_name='some user' 
AND video_id='great video';

UPDATE recent set review = review + [(1234,123456)] where user_name='some user' 
AND video_id='great video';

UPDATE recent set review = review + [(1234,1234567)] where user_name='some 
user' AND video_id='great video';



You can delete the oldest entry by index:

DELETE review[0] FROM recent WHERE user_name='some user' AND video_id='great 
video';



or by value, if you know the oldest entry:



UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some user' 
AND video_id='great video';
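A minimal in-memory sketch of these two deletion patterns (plain Python, for illustration only; note that CQL's `list - [value]` removes every occurrence of the value, not just the first, which the sketch mirrors):

```python
# In-memory analogue of the two CQL deletions above (illustrative only).
review = [(1234, 12345), (1234, 123456), (1234, 1234567)]

# DELETE review[0] ...            -> remove by index
by_index = review[1:]
assert by_index == [(1234, 123456), (1234, 1234567)]

# SET review = review - [value]   -> remove by value (all occurrences)
target = (1234, 12345)
by_value = [entry for entry in review if entry != target]
assert by_value == [(1234, 123456), (1234, 1234567)]
```

Also worth knowing: in Cassandra, deleting a list element by index requires an internal read-before-write, so the by-value form is generally cheaper when the value is known.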



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
 Launch your cluster in minutes.





 On Mon, 07 Nov 2016 21:54:08 -0500, ben ben diamond@outlook.com 
wrote 






Hi guys,



We are maintaining a system for an on-line video service. ALL users' viewing 
records of every movie are stored in C*. So she/he can continue to enjoy the 
movie from the last point next time. The table is designed as below:

CREATE TABLE recent (

user_name text,

video_id text,

position int,

last_time timestamp,

PRIMARY KEY (user_name, video_id)

)



It worked well before. However, the records increase every day, and the last 
ten items may be adequate for the business. The current model uses video_id 
as the clustering key to keep one row per movie, but as you know, the 
business prefers to order by last_time desc. If we use last_time as the 
clustering key, there will be many records for a single movie while only the 
most recent one is actually desired. So how to model that? Do you have any 
suggestions? 

Thanks!





BRs,

BEN