subject:"\[jira\] \[Commented\] \(CASSANDRA\-11138\) cassandra\-stress tool \- clustering key values not distributed"

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2019-03-31 Thread Ben Slater (JIRA)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806295#comment-16806295
 ] 

Ben Slater commented on CASSANDRA-11138:


All a long time ago now but I suspect CASSANDRA-12744 (which was discussed in 
CASSANDRA-12490) will have helped with this issue but that this is a specific 
issue that still needs fixing. As Alwyn says in the main description, you'd 
expect up to 1200 rows per partition if things were working properly.

CASSANDRA-12490 shouldn't have any impact unless you use the new distribution 
type it added. 

Note that there is also CASSANDRA-13940 which is a better job of 
CASSANDRA-12744 but I just noticed has been left in open state by the 
contributor but is actually patch available.

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>Assignee: Alwyn Davis
>Priority: Normal
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J 06:38:40+
>  N!\x0eUA7^r7d\x06J 21:16:47+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2014-11-23 
> 17:05:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-23 
> 23:20:54+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-19 
> 12:05:15+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2005-10-17 
> 04:22:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2003-02-24 
> 19:45:06+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1996-12-18 
> 06:18:31+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1991-06-10 
> 22:07:45+00

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2019-03-31 Thread mck (JIRA)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806292#comment-16806292
 ] 

mck commented on CASSANDRA-11138:
-

[~alwyn], presuming this bug still an issue post- CASSANDRA-12490 ?


I did a quick test. In 3.9, 3.10 and 3.11.x, with the schema you describe 
above, I got 30 rows in one partition, which is not the same as the 10 you got 
in one partition (although you got 30 across 3). Applying the patch gave me 390 
rows per partition.

[~benedict], [~Stefania], I had trouble following the conversation in 
CASSANDRA-12490, are the concerns there relevant to this patch?

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>Assignee: Alwyn Davis
>Priority: Normal
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J 06:38:40+
>  N!\x0eUA7^r7d\x06J 21:16:47+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2014-11-23 
> 17:05:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-23 
> 23:20:54+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-19 
> 12:05:15+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2005-10-17 
> 04:22:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2003-02-24 
> 19:45:06+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1996-12-18 
> 06:18:31+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1991-06-10 
> 22:07:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1983-05-05 
> 12:29:09+
> oy\x1c0077H"i\x07\x13_%\x06 || \n

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-29 Thread Yap Sok Ann (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534169#comment-15534169
 ] 

Yap Sok Ann commented on CASSANDRA-11138:
-

I didn't set the population for {{col2}}, so it should be the default 
{{uniform(1..100B)}}.

Yes, so far I am able to replicate this consistently. May I know if you get the 
same results as mine without the patch?

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J 06:38:40+
>  N!\x0eUA7^r7d\x06J 21:16:47+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2014-11-23 
> 17:05:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-23 
> 23:20:54+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-19 
> 12:05:15+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2005-10-17 
> 04:22:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2003-02-24 
> 19:45:06+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1996-12-18 
> 06:18:31+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1991-06-10 
> 22:07:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1983-05-05 
> 12:29:09+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1972-04-17 
> 21:24:52+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1971-05-09 
> 23:00:02+
> (30 rows)
> cqlsh>
> {noformat}
> If I remove the {{created_at}} clustering key, then the other two clustering 
> keys are being assigned variable values

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-28 Thread Alwyn Davis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531351#comment-15531351
 ] 

Alwyn Davis commented on CASSANDRA-11138:
-

I couldn't actually replicate your results with the patch, unless I set 
{{col2}} to have only have 1 population value (e.g. {{population: 
uniform(1..1)}}).

Assuming that setting isn't in your columnspec, are you able to replicate this 
consistently?

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J 06:38:40+
>  N!\x0eUA7^r7d\x06J 21:16:47+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2014-11-23 
> 17:05:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-23 
> 23:20:54+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-19 
> 12:05:15+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2005-10-17 
> 04:22:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2003-02-24 
> 19:45:06+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1996-12-18 
> 06:18:31+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1991-06-10 
> 22:07:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1983-05-05 
> 12:29:09+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1972-04-17 
> 21:24:52+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1971-05-09 
> 23:00:02+
> (30 rows)
> cqlsh>
> {noformat}
> If I remove the {{created_at}} clustering key, then the other two clustering 
> k

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-28 Thread Yap Sok Ann (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530531#comment-15530531
 ] 

Yap Sok Ann commented on CASSANDRA-11138:
-

Encounter similar problem with the current trunk, with or without the patch.

With the following sample config:
{code:none}
table: table1
table_definition: |
  CREATE TABLE table1 (
pk1 text,
pk2 int,
col1 timestamp,
col2 text,
col3 blob,
PRIMARY KEY ((pk1, pk2), col1, col2)
  ); 

columnspec:
  - name: col1
cluster: uniform(1..3)

  - name: col2
cluster: uniform(1..4)
{code}

after insert, there will be duplicate values for {{col2}} *and* {{col3}}:
{code:sql}
cqlsh:stress> select * from table1 where pk1 = 'VFk.mZLR' and pk2 = 1772149447; 
 

 pk1  | pk2| col1 | col2 | col3
--++--+--+
 VFk.mZLR | 1772149447 | 1994-05-17 08:23:01+ | QyCJtb6` | 
0x5728b1b79dd2372a
 VFk.mZLR | 1772149447 | 2010-11-24 11:19:30+ | QyCJtb6` | 
0x5728b1b79dd2372a

(2 rows)
{code}

If I just remove {{col2}} from clustering key, then there is no duplicate:
{code:sql}
cqlsh:stress> select * from table1 where pk1 = 'VFk.mZLR' and pk2 = 1772149447;

 pk1  | pk2| col1 | col2 | col3
--++--+--+
 VFk.mZLR | 1772149447 | 1994-05-17 08:23:01+ |  '{9j\(; | 0x1080f88c325e
 VFk.mZLR | 1772149447 | 2010-11-24 11:19:30+ | sA0wlY>' | 0x763588f2f5a8

(2 rows)
{code}

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 1

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-27 Thread Alan Boudreault (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526118#comment-15526118
 ] 

Alan Boudreault commented on CASSANDRA-11138:
-

I confirm that the bug I mentioned in CASSANDRA-12490 is fixed with this patch.

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
> Attachments: 11138-trunk.patch
>
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J 06:38:40+
>  N!\x0eUA7^r7d\x06J 21:16:47+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2014-11-23 
> 17:05:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-23 
> 23:20:54+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-19 
> 12:05:15+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2005-10-17 
> 04:22:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2003-02-24 
> 19:45:06+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1996-12-18 
> 06:18:31+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1991-06-10 
> 22:07:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1983-05-05 
> 12:29:09+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1972-04-17 
> 21:24:52+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1971-05-09 
> 23:00:02+
> (30 rows)
> cqlsh>
> {noformat}
> If I remove the {{created_at}} clustering key, then the other two clustering 
> keys are being assigned variable values per partition key.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

2016-09-22 Thread Alwyn Davis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514810#comment-15514810
 ] 

Alwyn Davis commented on CASSANDRA-11138:
-

I think this is occurring because the {{lastRow}} in the {{PartitionIterator}} 
class will always exit once the clustering components are exhausted, for 3 or 
more clustering keys.  

When {{decompose}} creates lastRow, position is always a product of 
{{generator.clusteringDescendantAverages\[0\]}} and is then divided by it 
again.  So there's never a remainder and consequently any lastRow to currentRow 
comparison will indicate that there are more distinct values in the currentRow 
column then we want - lastRow will be something like:
{code}{, 0, 0}{code}.

As a fix, could it instead set lastRow (for MultiRowIterators) to just the 
corresponding clusteringDescendantAverages values?

> cassandra-stress tool - clustering key values not distributed
> -
>
> Key: CASSANDRA-11138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>Reporter: Ralf Steppacher
>  Labels: stress
>
> I am trying to get the stress tool to generate random values for three 
> clustering keys. I am trying to simulate collecting events per user id (text, 
> partition key). Events have a session type (text), event type (text), and 
> creation time (timestamp) (clustering keys, in that order). For testing 
> purposes I ended up with the following column spec:
> {noformat}
> columnspec:
> - name: created_at
>   cluster: uniform(10..10)
> - name: event_type
>   size: uniform(5..10)
>   population: uniform(1..30)
>   cluster: uniform(1..30)
> - name: session_type
>   size: fixed(5)
>   population: uniform(1..4)
>   cluster: uniform(1..4)
> - name: user_id
>   size: fixed(15)
>   population: uniform(1..100)
> - name: message
>   size: uniform(10..100)
>   population: uniform(1..100B)
> {noformat}
> My expectation was that this would lead to anywhere between 10 and 1200 rows 
> to be created per partition key. But it seems that exactly 10 rows are being 
> created, with the {{created_at}} timestamp being the only variable that is 
> assigned variable values (per partition key). The {{session_type}} and 
> {{event_type}} variables are assigned fixed values. This is even the case if 
> I set the cluster distribution to uniform(30..30) and uniform(4..4) 
> respectively. With this setting I expected 1200 rows per partition key to be 
> created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
> total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from 
> stresscql.batch_too_large LIMIT 30 ;
> user_id | event_type   | session_type | created_at
> -+--+--+--
>   %\x7f\x03/.d29 08:14:11+
>   %\x7f\x03/.d29 04:04:56+
>   %\x7f\x03/.d29 00:39:23+
>   %\x7f\x03/.d29 19:56:30+
>   %\x7f\x03/.d29 20:46:26+
>   %\x7f\x03/.d29 03:27:17+
>   %\x7f\x03/.d29 23:30:34+
>   %\x7f\x03/.d29 02:41:28+
>   %\x7f\x03/.d29 07:23:48+
>   %\x7f\x03/.d29 23:23:04+
>  N!\x0eUA7^r7d\x06J 17:48:51+
>  N!\x0eUA7^r7d\x06J 06:21:13+
>  N!\x0eUA7^r7d\x06J 03:34:41+
>  N!\x0eUA7^r7d\x06J 05:26:21+
>  N!\x0eUA7^r7d\x06J 01:31:24+
>  N!\x0eUA7^r7d\x06J 14:22:43+
>  N!\x0eUA7^r7d\x06J 14:54:29+
>  N!\x0eUA7^r7d\x06J 13:31:54+
>  N!\x0eUA7^r7d\x06J 06:38:40+
>  N!\x0eUA7^r7d\x06J 21:16:47+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2014-11-23 
> 17:05:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-23 
> 23:20:54+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2012-02-19 
> 12:05:15+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2005-10-17 
> 04:22:45+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 2003-02-24 
> 19:45:06+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1996-12-18 
> 06:18:31+
> oy\x1c0077H"i\x07\x13_%\x06 || \nz@Qj\x1cB |E}P^k | 1991-06-10 
> 22:07:45+
>

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

[jira] [Commented] (CASSANDRA-11138) cassandra-stress tool - clustering key values not distributed

7 matches

Site Navigation

Mail list logo

Footer information