[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-23 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600792#comment-15600792
 ] 

Stefania commented on CASSANDRA-12490:
--

Update committed to 3.X as 7c759e2c326357a241e63c19cd4cd329f7920ea3 and merged 
into trunk.

> Add sequence distribution type to cassandra stress
> --
>
> Key: CASSANDRA-12490
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12490
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Ben Slater
>Assignee: Ben Slater
>Priority: Minor
> Fix For: 3.10
>
> Attachments: 12490-trunk.patch, 12490.yaml, 
> 12490updatev2-trunk.patch, cqlstress-seq-example.yaml
>
>
> When using the write command, cassandra stress sequentially generates seeds. 
> This ensures generated values don't overlap (unless the sequence wraps), 
> providing a more predictable number of inserted records (and generating a base 
> set of data without wasted writes).
> When using a yaml stress spec there is no sequenced distribution available. 
> I think it would be useful to have this for doing an initial load of data for 
> testing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-23 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600672#comment-15600672
 ] 

Stefania commented on CASSANDRA-12490:
--

The trunk testall job failed for an unrelated reason, so I've relaunched it. I 
should be able to commit the update if the job completes successfully or has 
only known failures.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-21 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594193#comment-15594193
 ] 

Stefania commented on CASSANDRA-12490:
--

Thank you for the update, it LGTM.

I've launched the CI jobs here:

||3.X||trunk||
|[patch|https://github.com/stef1927/cassandra/tree/12490-3.X]|[patch|https://github.com/stef1927/cassandra/tree/12490]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12490-3.X-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12490-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12490-3.X-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12490-dtest/]|




[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-20 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593955#comment-15593955
 ] 

Stefania commented on CASSANDRA-12490:
--

We still have time before 3.10 so there is no need to work over the w/e. I can 
also help you with finalizing it if you are too busy.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-20 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593928#comment-15593928
 ] 

Ben Slater commented on CASSANDRA-12490:


Yep, will take a look over the (Australian) weekend.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-20 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593781#comment-15593781
 ] 

Stefania commented on CASSANDRA-12490:
--

No comments in the last 3 days.  If SEQ is being released in 3.10, and it looks 
like it, I'd rather have the fix for {{setSeed()}} in as well. So Ben, would 
you mind finalizing the second update to the patch so we can commit it?



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-17 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581709#comment-15581709
 ] 

Ben Slater commented on CASSANDRA-12490:


ok, agreed.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-16 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581311#comment-15581311
 ] 

Stefania commented on CASSANDRA-12490:
--

bq. It still generates min..max as nextWithWrap() (which is called by the other 
variants of next()) returns start + (next % totalCount) so it starts again from 
min if it goes past max.

You're correct it does wrap, but it won't necessarily start at min, so the 
documentation should clearly state this. I would also add a call to 
{{setSeed()}} in one of the unit tests if the patch goes ahead. 
{{inverseCumProb()}} is instead correct because of the wrapping.
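
To make the wrapping behaviour concrete, here is a minimal Java sketch. It is not the code from the patch: the class name `SeqSketch` and its fields are hypothetical, and it uses `Math.floorMod` rather than a plain `%` so that a negative seed still lands in range (an assumption on my side). It only illustrates that `start + (next % totalCount)` keeps values in min..max, and that after `setSeed()` the sequence begins at an offset rather than at min.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a wrapping sequence; not the actual cassandra-stress class.
final class SeqSketch
{
    private final long start;
    private final long totalCount;
    private final AtomicLong next = new AtomicLong();

    SeqSketch(long min, long max)
    {
        if (max < min)
            throw new IllegalArgumentException("max must be >= min");
        this.start = min;
        this.totalCount = max - min + 1;
    }

    // Returns start + (counter mod totalCount), so values stay in min..max and wrap.
    long next()
    {
        return start + Math.floorMod(next.getAndIncrement(), totalCount);
    }

    // Repositions the counter; the sequence then begins at an offset, not necessarily at min.
    void setSeed(long seed)
    {
        next.set(seed);
    }

    public static void main(String[] args)
    {
        SeqSketch seq = new SeqSketch(1, 5);
        seq.setSeed(3);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 7; i++)
            out.append(seq.next()).append(' ');
        System.out.println(out.toString().trim()); // prints: 4 5 1 2 3 4 5
    }
}
```

With min=1, max=5 and a seed of 3, the first value is 4 and the sequence wraps back to 1 after 5, which is the behaviour the documentation would need to spell out.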



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-16 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581194#comment-15581194
 ] 

Ben Slater commented on CASSANDRA-12490:


Thanks for the feedback. 

You're right - next.set(seed) would be a better implementation - I didn't check 
if there was another method of setting the value of an AtomicLong. I can update 
the patch if the decision is to go ahead.

It still generates min..max as nextWithWrap() (which is called by the other 
variants of next()) returns start + (next % totalCount) so it starts again from 
min if it goes past max.

I think the need for this is less with Jake's fix to the random generator in 
CASSANDRA-12744. However, I still think it serves a purpose for loading 
background data for a test without overlap. However, if I'm over-ruled on that 
so be it.




[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-16 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581172#comment-15581172
 ] 

Stefania commented on CASSANDRA-12490:
--

Thanks for the patch Ben. 

Why not simply call {{next.set(seed)}}, or use a {{compareAndSet()}} in a loop 
so that {{next}} can remain final? 

Also, the updated patch changes the meaning of {{SEQ(min..max)}} to generate 
values between {{(min + seed .. min + seed + max)}} rather than {{(min .. 
max)}}. I assume this is what you wanted but the SEQ declaration looks a bit 
misleading now, so I would remove start from the declaration and set it to the 
seed, leaving next to always start at zero, which has the added advantage that 
you don't need to fix {{inverseCumProb()}}. The help section also needs 
rewriting. Lastly, add a unit test case that calls {{setSeed()}} so we can 
catch any other problems I may have missed.

I'm happy to review the code, and to do any commits or rollbacks as required, 
but I don't feel like the most qualified person to discuss the merits of the 
SEQ approach, vs. what was suggested by [~benedict], since I don't have a 
particularly good knowledge of cassandra-stress data generation. So if anyone 
with better knowledge wants to comment further, this is welcome.
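
For reference, the two options mentioned at the top of this comment could look roughly like the following. This is a sketch only: `SeedResetSketch` and its method names are hypothetical and are not taken from the patch. Both variants mutate the existing `AtomicLong`, so the `next` field itself can stay final.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of resetting a final AtomicLong counter in place.
final class SeedResetSketch
{
    private final AtomicLong next = new AtomicLong();

    // Option 1: overwrite the counter directly.
    void setSeedDirect(long seed)
    {
        next.set(seed);
    }

    // Option 2: retry compareAndSet() until the swap succeeds.
    void setSeedWithCas(long seed)
    {
        long current;
        do
        {
            current = next.get();
        }
        while (!next.compareAndSet(current, seed));
    }

    // Exposes the counter value for inspection.
    long current()
    {
        return next.get();
    }
}
```

For an unconditional overwrite the two end in the same state; the loop form mainly matters when the new value has to be derived from the current one.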



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-15 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578207#comment-15578207
 ] 

Alan Boudreault commented on CASSANDRA-12490:
-

Awesome. Thanks Ben.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575388#comment-15575388
 ] 

T Jake Luciani commented on CASSANDRA-12490:


Regarding the distribution issues I also noticed this and opened 
CASSANDRA-12744.   



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-14 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575377#comment-15575377
 ] 

T Jake Luciani commented on CASSANDRA-12490:


Ah you are right!  I'm not sure why but for runs with an older version of 
stress I was getting read errors and assumed it was validation but perhaps it's 
a different bug.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573973#comment-15573973
 ] 

Ben Slater commented on CASSANDRA-12490:


I realised [~tjake] was saying that a validation error is the expected behaviour 
and occurs in 3.9 but not trunk. 

I just tried but can't get a validation error in 3.9 with a YAML file (as I 
said, I wasn't aware that validation functionality existed for YAML specs). Can 
you provide some more details on how to reproduce?



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573966#comment-15573966
 ] 

Ben Slater commented on CASSANDRA-12490:


Moving back from dev list where I think the discussion ended up by accident. 

Jake said:
No, I'm not using a seq anywhere other than the command line

I said:
OK, I think it’s pretty unlikely to be this change as I didn’t change the 
existing code (certainly nothing near what is used by -pop) and also I just 
noticed you said you had the issue in 3.9 and CASS-12490 is destined for 3.10. 

Also, last time I looked, I thought stress didn’t validate returned results for 
YAML specs. Did I miss something or did that get added recently? Can you add 
your actual command, etc to the ticket?

Anyway, I will try to do some more digging over the weekend as I still suspect 
there is something wrong (or at least unexpected) going on aside from this 
change.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573119#comment-15573119
 ] 

Ben Slater commented on CASSANDRA-12490:


Just to check [~tjake] when you say "this also breaks validation", I assume you 
mean it breaks validation when you use the sequence distribution type, not in 
the case where you don't use seq()?



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573116#comment-15573116
 ] 

Ben Slater commented on CASSANDRA-12490:


Yeah, that would be my misunderstanding and misimplementation of setSeed(). The 
fix appears to be trivial (discussed somewhere in the wall of text above). I'll 
test a bit more and submit a patch in the next day or two.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572941#comment-15572941
 ] 

T Jake Luciani commented on CASSANDRA-12490:


I'm pretty sure this also breaks read validation somehow.

If you write with a yaml profile and -pop seq=1..1000 then read with -pop 
seq=1..1m it will return without error. In 3.9 this errors as it should.




[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571614#comment-15571614
 ] 

Benedict commented on CASSANDRA-12490:
--

_partition_ keys are a distinct beast, and if your population distribution for 
these is tiny then yes you will get overwrites, and I'm really not sure there's 
anything we can _reliably_ do about that.  Mostly I've been talking about 
behaviour _within_ a partition (except when pointing out some breakages).

The command line "-pop" property specifies the population of unique partition 
_seeds_.  These have to be translated into the partition key population 
distribution(s) first, which then between them identify the partition's 
contents.  The problem is that the size of the unique seed set could be 
gigantic (we let n be billions in size, and it is often necessary to run with 
datasets this large), so enumerating all of these unique seeds and determining 
their value in the partition key column population distributions would be 
prohibitively expensive.  So we just accept that users should sensibly ensure 
their partition key population distribution is large enough to accommodate 
enough random samples to fulfil their seed population.
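
As a rough illustration of why a small partition key population leads to overwrites, the toy model below draws one key per seed from a uniform population and counts the distinct keys. It is not how cassandra-stress maps seeds to keys; `KeyCollisionSketch`, its parameters, and the use of a single `java.util.Random` are all assumptions made for the sake of the example.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy model: each seed draws one key from a uniform population of keyPopulation values.
final class KeyCollisionSketch
{
    static int distinctKeys(int seedCount, int keyPopulation, long rngSeed)
    {
        Random rng = new Random(rngSeed);
        Set<Integer> keys = new HashSet<>();
        for (int seed = 0; seed < seedCount; seed++)
            keys.add(1 + rng.nextInt(keyPopulation)); // key in 1..keyPopulation
        return keys.size();
    }

    public static void main(String[] args)
    {
        // The distinct count can never exceed the key population, however many seeds are used.
        System.out.println(distinctKeys(50, 50, 1L));
        System.out.println(distinctKeys(500, 50, 1L));
    }
}
```

However large the seed population is, the number of distinct partitions written is capped by the key population, and random sampling typically leaves some keys unvisited while others are written more than once.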

Now, for small populations we *could* mode-switch.  But I'm not sure it ever 
makes sense to so materially constrain your partition key population 
distribution.  It might even make sense for stress to forbid constraining this 
distribution too much, as it has essentially no impact on the behaviour profile 
of the cluster.

If you want to visit a single partition many times, there are better ways to do 
that, i.e., specifying that the seed population is small, but that you want to 
run many operations.  This will give you an identically constrained population, 
without any risk of weirdness, as well as permitting the same yaml to be used 
for different scales of test.  Ideally you want each visit, in such a scenario, 
to use the more advanced features of stress anyway (such as partial visitation 
of the whole generated (presumably huge) partition, or incremental visitation).



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-13 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571589#comment-15571589
 ] 

Ben Slater commented on CASSANDRA-12490:


OK, so I did some more investigation this evening to try to better understand 
this and found a few interesting things. I suspect there is at least one bug 
here but I'll be interested to see what you think.

I set up a simple spec to test what was going on:
```
table: test4
table_definition: |
  CREATE TABLE test4 (
    pk text,
    val text,
    PRIMARY KEY (pk)
  )
columnspec:
  - name: pk
    size: fixed(32)
    population: uniform(1..50)
```

When I run this with `ops(insert=1) n=50` the end result is 1 row added to the 
table. When I run it with n=500 I get 3 rows. Some other observations:
a) tracing this through it seems that's because the small number of seed values 
from the population (due to the small n=) results in a very low variation in 
values being returned from `delegate.sample()` in 
`DistributionBoundApache.next()`. They were (all 37

[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565128#comment-15565128
 ] 

Benedict commented on CASSANDRA-12490:
--

bq. which does make me wonder how stress is respecting the distribution for the 
PK value 

The random number generator is separate to the distribution.  Resetting the 
seed just provides a deterministically pseudo-random value to the distribution; 
it does not affect the distribution of the output value.
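
The separation described here can be illustrated with inverse transform sampling: a seeded generator supplies a number in [0, 1), and the distribution maps that number onto a value. The following is a minimal Python sketch under that assumption, not the stress tool's code; `uniform_inverse_cdf` and `sample` are hypothetical names.

```python
import random

def uniform_inverse_cdf(p, low, high):
    """Map a probability p in [0, 1) onto the integer range low..high (inclusive)."""
    return low + int(p * (high - low + 1))

def sample(seed, low=1, high=50):
    """Draw one value: the seed fixes the random number, the distribution shapes it."""
    rng = random.Random(seed)  # hypothetical stand-in for a per-partition generator
    return uniform_inverse_cdf(rng.random(), low, high)

# The same seed always yields the same value...
first = sample(seed=42)
second = sample(seed=42)

# ...while many different seeds still spread over the configured range.
values = {sample(seed=s) for s in range(10_000)}
```

Resetting the seed makes each draw repeatable, but the range and shape of the output come only from the mapping function.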

bq. For say normal distribution you'd need several * n to cover all the 
possible values and have close to a normal distribution

Well, it's not clear what the "correct" behaviour is here - which spec should 
win? The two specs (population and value count) are at odds, and users found it 
confusing (esp. for populating a dataset) to have far fewer values produced 
than they expected (for any population distribution).  So, when it is *easy* to 
do so we let value count win the competition, i.e. when the total set of values 
we should produce is known.  Though I'll admit I don't like the inconsistency. 
Perhaps we should always honour it, switching to the inverse distribution and 
deselecting values where it would be costly.

bq. (a) there was already a sequence distribution type in the "legacy" 
distribution sets (presumably for just this purpose) 

I'm really not sure what you're referring to here.  It's been a while since I 
wrote this or looked at the really old legacy stuff, but I don't recall any 
legacy sequential distribution. There is a sequential *population* mode, i.e. 
to visit partitions in order, but I don't recall a sequential distribution (and 
I just had a glance at the source code and couldn't find it).

bq. (b) to me, one way of describing this is a uniform distribution with 
minimal chance of collisions (ie it's just another way for selecting values 
from a range).

Well, here's the rub: that is not semantically consistent with other 
distributions; you even had to qualify it with "to me" - unfortunately (as I have 
discovered) not everyone shares your intuitions.  Users will be surprised, and 
annoyance and JIRA traffic will ensue. stress is already too difficult to use, 
and I think we should be aiming for maximum conformity of behaviour.  We have, 
after all, just discussed three possible variants of the "same" distribution - 
and users won't know which of these they have.  That said, since your modified 
approach doesn't absolutely break behaviour I just think it's a bad idea, not a 
terrible one.

bq. Finally, it's not quite correct to say I'm trying to populate all possible 
values for a column, rather trying to generate as many unique values as 
possible (within the specified ranges) for a given sample size (to minimise 
overwriting)

There is no overwriting, just the samples are discarded if they clash (when the 
population distribution wins).  It seems like you fall into the large bucket of 
users I mentioned before, who really want the count to be a value count not a 
sample count, so why not just honour this everywhere with the approach I 
suggested above?  i.e. If it looks likely to be costly to select {{count}} 
values from the distribution (because of collisions), instead invert the 
probability function and deselect values.





[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-10 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564148#comment-15564148
 ] 

Ben Slater commented on CASSANDRA-12490:


Yes, you're right: resetting the counter to zero on setSeed() does result in the 
same row being generated over and over again (which does make me wonder how 
stress is respecting the distribution for the PK value but didn't investigate 
at this point). However, that is pretty easily fixed by having setSeed() set 
the counter to the supplied seed value. I think once we do this SEQ behaves 
very similarly to the other distributions.
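
The two setSeed() variants discussed in this paragraph can be compared with a small Python sketch. It assumes the generator is reseeded once per partition before the first value is drawn; `SequenceSketch` and `first_value_per_partition` are hypothetical names and do not reproduce the patch.

```python
class SequenceSketch:
    """Hypothetical sequence 'distribution' over start..end (inclusive)."""

    def __init__(self, start, end, reset_to_zero):
        self.start = start
        self.total = end - start + 1
        self.reset_to_zero = reset_to_zero
        self.counter = 0

    def set_seed(self, seed):
        # Variant 1: always restart from zero. Variant 2: restart from the seed.
        self.counter = 0 if self.reset_to_zero else seed

    def next(self):
        # Wrap around once the end of the range is passed.
        value = self.start + self.counter % self.total
        self.counter += 1
        return value

def first_value_per_partition(dist, seeds):
    """Reseed once per partition (an assumption) and take the first value."""
    out = []
    for seed in seeds:
        dist.set_seed(seed)
        out.append(dist.next())
    return out

seeds = range(5)
zero_reset = first_value_per_partition(SequenceSketch(1, 50, True), seeds)
seed_reset = first_value_per_partition(SequenceSketch(1, 50, False), seeds)
```

With the zero reset every partition starts from the same value; with the seed-based reset the first value follows the partition seed.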

I don't think it's correct that stress generates every value if the number of 
unique values it can generate is <= the number of values it is being asked to 
generate for a partition. This would only respect the distribution in the case 
of uniform distribution, however even then I don't think it's guaranteed to be 
completely uniform (and thus generate all values) from n samples of a 1..n 
distribution (you probably need to do many * n to get very close to uniform) - 
it certainly doesn't seem to behave this way in testing. For say normal 
distribution you'd need several * n to cover all the possible values and have 
close to a normal distribution.

I'm afraid I don't really understand why you think this is abusing the notion of 
distributions when (a) there was already a sequence distribution type in the 
"legacy" distribution sets (presumably for just this purpose) and (b) to me, 
one way of describing this is a uniform distribution with minimal chance of 
collisions (ie it's just another way for selecting values from a range).

Finally, it's not quite correct to say I'm trying to populate all possible 
values for a column, rather trying to generate as many unique values as 
possible (within the specified ranges) for a given sample size (to minimise 
overwriting).
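
The point about n samples from a 1..n uniform distribution can be checked numerically. For independent draws, the expected number of distinct values is n * (1 - (1 - 1/n)^n), roughly 63% of n for large n. The Python sketch below assumes ideal independent sampling, which is not necessarily how stress seeds its generators; the function names are illustrative.

```python
import random

def expected_distinct(n):
    """Expected number of distinct values in n independent uniform draws from 1..n."""
    return n * (1 - (1 - 1 / n) ** n)

def simulated_distinct(n, rng):
    """Count distinct values in one run of n uniform draws from 1..n."""
    return len({rng.randint(1, n) for _ in range(n)})

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50
expected = expected_distinct(n)
observed = [simulated_distinct(n, rng) for _ in range(1000)]
average = sum(observed) / len(observed)
```

For n = 50 the expectation is just under 32 distinct values, so a plain uniform population leaves a substantial share of the range unvisited.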



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561963#comment-15561963
 ] 

Benedict commented on CASSANDRA-12490:
--

Well, the cqlstress-seq-example.yaml attached to the ticket shows this in use 
by the partition key specification, which is broken either way you cut it.  If 
we reset the seed each time, we will only ever generate one partition, no 
matter how hard we try (making it a fairly terrible load test).  If we do not, 
we can never query the data (meaningfully, reliably; perhaps by chance).

If all you want is some way to populate all possible values for a clustering 
column, that's a very specific problem and I'm not sure abusing distribution is 
the right way to achieve that.  Perhaps the distribution parameter could take 
an ALL value, that tells stress it should generate every value.  However it 
does this already (iirc) if the number of unique values it can generate is <= 
the number of values it is being asked to generate for a partition.

For generating specific value distributions within a partition, my view is that 
we should really be supporting nashorn function definitions in the json.  These 
could accept the partition and clustering row seeds (and perhaps, optionally, 
an index array, i.e. with three clustering columns, the index within each column 
we are generating: the first row would be [0,0,0], the second [0,0,1] or 
[0,1,0] or [1,0,0]) as their parameters, and return a value for the column.  
This would allow you to reliably produce whatever distribution of values you 
wanted.  



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-10 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561852#comment-15561852
 ] 

Ben Slater commented on CASSANDRA-12490:


Hi Benedict,

I must be missing something here because as far as I can tell from testing a 
few different scenarios, setting -pop seq=1..N doesn't have any impact on the 
set of data generated when used with a YAML file.

That aside, the intent is that you use the SEQ distribution for doing an 
initial load of background data before running say a read test or a mixed 
read/write test so that you are running with a representative volume of data on 
disk (and that you probably wouldn't use SEQ for these later tests). In 
that case you wouldn't expect/care whether the set of data generated initially 
lines up in the same order as what is generated by later runs (although you 
would expect them to be from the same overall populations of values which I 
believe does hold). I believe the sequence of data generation would have to 
change similarly if you changed between existing distribution types between 
runs?

Looking again at the code, I can see how the current implementation of SEQ is 
an issue for implementing future data validation, as it doesn't "reset" as 
you visit each partition.  I think the other distributions effectively reset due 
to the call to setSeed(). However, I think this can fairly easily be rectified 
by having the setSeed() implementation of DistributionSequence reset the next 
value to 0?

Cheers
Ben





[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560318#comment-15560318
 ] 

Benedict commented on CASSANDRA-12490:
--

I'm afraid I think this was a terrible idea, and it should probably be rolled 
back.  The example yaml permits its use as a column value seed generator, which 
means the contents of a partition no longer depend on the partition's seed, but 
on the order of visitation.  

For partition and clustering columns (as in the example) this breaks behaviour 
for queries.  Stress no longer knows what records exist (it will generate 
different values to query than it originally wrote).

It also completely breaks any possibility of data validation, which is 
currently supported for thrift and was always intended to be extended to CQL to 
improve testing. 

As already mentioned, the -pop seq=1..N mode can be provided on the command 
line for sequentially visiting partitions.  For generating *values* that can 
step forwards with this, the most sensible design (and what had been on the 
cards) is to accept a functional specification that depends on the seed of the 
partition, the simplest being to return 1 when the partition's seed was 1.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-10-02 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15540529#comment-15540529
 ] 

Alan Boudreault commented on CASSANDRA-12490:
-

Hey Ben, It looks like when using seq() on partition keys, we are limited to a 
rate of threads=1. Can you confirm? I wonder if we could do something about 
this.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-09-26 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524585#comment-15524585
 ] 

Alan Boudreault commented on CASSANDRA-12490:
-

Thanks Ben!



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-09-26 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524405#comment-15524405
 ] 

Ben Slater commented on CASSANDRA-12490:


Actually, a colleague of mine just submitted a patch for that issue last week - 
it occurs regardless of distribution but is probably more obvious using SEQ. 
See https://issues.apache.org/jira/browse/CASSANDRA-11138



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-09-26 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523039#comment-15523039
 ] 

Alan Boudreault commented on CASSANDRA-12490:
-

[~slater_ben] I've been testing this feature during the weekend. At first 
sight, it worked well and it's very useful. However, I'm experiencing 
unexpected results when I add a third clustering column. I've attached my yaml 
configuration as a test case. There are some comments in the file but here is a 
brief description of the issue:

cassandra-stress user profile=12490.yaml ops\(insert=1\) n=10 -rate threads=1

{code}
-->  PRIMARY KEY ((stid, year, month), day, hour, minute)
{code}

{code}
  - name: day
    cluster: fixed(30)
    population: seq(1..30)
  - name: hour
    cluster: fixed(24)
    population: seq(1..24)
  - name: minute
    cluster: fixed(60)
    population: seq(1..60)
{code}

With 3 clustering columns, it looks like only the last one is considered. So, 
with n=10, I got 600 rows total, when I should have (60*24*30) rows per 
partition. If I remove the minute in the clustering columns, things work as 
expected: 7200 rows total (10*24*30). 
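
The row counts in this report can be checked with a few lines of Python. The sketch assumes that n=10 produces 10 partitions and that each `cluster: fixed(...)` value is the number of distinct values per clustering column; the variable names are illustrative.

```python
partitions = 10                      # n=10 in the command above (assumed: one partition each)
days, hours, minutes = 30, 24, 60    # cluster: fixed(...) for each clustering column

expected_per_partition = days * hours * minutes
expected_total = partitions * expected_per_partition
observed_total = 600                 # row count reported in the comment
without_minute_total = partitions * days * hours
```

The reported 600 rows equal `partitions * minutes`, which matches the observation that only the last clustering column contributes, while the expected total would be 432000 rows.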





[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-23 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434079#comment-15434079
 ] 

Stefania commented on CASSANDRA-12490:
--

Thanks, can this override the distributions specified in the stress profiles?



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-23 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434067#comment-15434067
 ] 

T Jake Luciani commented on CASSANDRA-12490:


FYI we do support this on the command line via -pop seq=1..100



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-23 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434049#comment-15434049
 ] 

Stefania commented on CASSANDRA-12490:
--

CI results are good, the failed testall build is due to a recent change in that 
build that now counts Eclipse warnings as failures. They are not new warnings. 

Committed to trunk as e4f6045806906332b6c1ed3cf2ceece8f5bb922b, thank you for 
the patch!



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-22 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432105#comment-15432105
 ] 

Ben Slater commented on CASSANDRA-12490:


Nits look good to me. I think the example is sufficient - the ones I've used 
don't really illustrate anything additional. Thanks!



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432083#comment-15432083
 ] 

Stefania commented on CASSANDRA-12490:
--

Thanks for the updated patch, LGTM. 

I've created a branch with your patch and some nits. I've also started the CI 
jobs for trunk but they are still in a queue at the moment:

|[patch|https://github.com/stef1927/cassandra/commits/12490]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12490-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12490-dtest/]|

If you are happy with the nits, and the CI results are good, I'll squash and 
commit. 

Do you think we should add a sample stress profile? I've used [this 
one|^cqlstress-seq-example.yaml] for testing; do you have a more realistic 
sample, or is this one good enough? 



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430221#comment-15430221
 ] 

Stefania commented on CASSANDRA-12490:
--

Great, thank you!



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-22 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430211#comment-15430211
 ] 

Ben Slater commented on CASSANDRA-12490:


Makes sense - I'll fix up inverseCumProb() and add a test.



[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430197#comment-15430197
 ] 

Stefania commented on CASSANDRA-12490:
--

Thank you for the patch!

I'm not an expert in probability either, but I believe {{inverseCumProb()}} 
should return the inverse of the cumulative distribution function, also known 
as the quantile function, see 
[here|https://en.wikipedia.org/wiki/Cumulative_distribution_function#Inverse_distribution_function_.28quantile_function.29].
Regardless, what we care about is that the {{Distribution}} base class uses it 
to calculate the min, max and average. The min and average don't seem to 
matter, but we should nonetheless ensure that they are correct. So I believe, 
if I am not mistaken, that {{inverseCumProb()}} should return {{start + 
(totalCount - 1) * cumProb}}, so that it returns {{start}} when {{cumProb}} is 
zero and {{end}} when it is one.
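
To make the endpoint behaviour of that formula concrete, here is a minimal 
sketch. The class name {{SequenceQuantile}} and its constructor are 
hypothetical and not taken from the patch, and the sketch assumes that 
{{totalCount}} equals {{end - start + 1}}.

```java
public final class SequenceQuantile {
    private final long start;
    private final long totalCount;

    public SequenceQuantile(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end must be >= start");
        }
        this.start = start;
        // Assumption: the range is inclusive on both ends.
        this.totalCount = end - start + 1;
    }

    // Quantile function of the sequence: maps a cumulative probability
    // in [0, 1] linearly onto [start, end], truncating towards zero.
    public long inverseCumProb(double cumProb) {
        if (!(cumProb >= 0.0 && cumProb <= 1.0)) {
            throw new IllegalArgumentException("cumProb must be in [0, 1]");
        }
        return start + (long) ((totalCount - 1) * cumProb);
    }

    public static void main(String[] args) {
        SequenceQuantile q = new SequenceQuantile(5, 14);
        System.out.println(q.inverseCumProb(0.0)); // start
        System.out.println(q.inverseCumProb(1.0)); // end
    }
}
```

With a range of 5 to 14, {{totalCount}} is 10, so a {{cumProb}} of zero gives 5 
and a {{cumProb}} of one gives 5 + 9 = 14.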

I know we don't have many unit tests for stress, but we do have 3 of them. If 
we add one for {{DistributionSequence}}, we can verify not only that it 
returns a sequence as expected but also that the min, max and average are 
correct. This should be sufficient to ensure the correctness of this class.
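
As a rough idea of what such a test could check, here is a self-contained 
sketch that avoids any test framework. {{SequenceSketch}} is a hypothetical 
stand-in for the class under test, not the code from the patch; it assumes an 
inclusive range, wrap-around after {{end}}, and min and max values derived 
from {{inverseCumProb()}}.

```java
public final class SequenceSketch {
    private final long start;
    private final long totalCount;
    private long counter = 0; // not thread-safe; fine for a sketch

    public SequenceSketch(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end must be >= start");
        }
        this.start = start;
        this.totalCount = end - start + 1; // assumption: inclusive range
    }

    // Returns start, start + 1, ..., end, then wraps back to start.
    public long next() {
        return start + (counter++ % totalCount);
    }

    public long inverseCumProb(double cumProb) {
        return start + (long) ((totalCount - 1) * cumProb);
    }

    // Hypothetical helpers mirroring how a base class might derive bounds.
    public long minValue() { return inverseCumProb(0.0); }
    public long maxValue() { return inverseCumProb(1.0); }

    public static void main(String[] args) {
        SequenceSketch s = new SequenceSketch(5, 14);
        // One full pass must yield every value in order, exactly once.
        for (long expected = 5; expected <= 14; expected++) {
            if (s.next() != expected) {
                throw new AssertionError("unexpected sequence value");
            }
        }
        // The next call wraps around to the start of the range.
        if (s.next() != 5) throw new AssertionError("sequence did not wrap");
        if (s.minValue() != 5) throw new AssertionError("wrong min");
        if (s.maxValue() != 14) throw new AssertionError("wrong max");
        System.out.println("all checks passed");
    }
}
```

A real test would presumably also assert on the average reported by the base 
class, which is left out here because its exact calculation is not described 
in this thread.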







[jira] [Commented] (CASSANDRA-12490) Add sequence distribution type to cassandra stress

2016-08-21 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430002#comment-15430002
 ] 

Ben Slater commented on CASSANDRA-12490:


I'm pretty sure my implementation of inverseCumProb() is incorrect, but it 
doesn't appear to matter in practice. Happy to update it if someone can explain 
what it's supposed to return.



