[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-10-05 Thread Daniel Cranford (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192827#comment-16192827
 ] 

Daniel Cranford commented on CASSANDRA-12744:
-

Created CASSANDRA-13940 to fix this.

> Randomness of stress distributions is not good
> --
>
> Key: CASSANDRA-12744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: Ben Slater
>Priority: Minor
>  Labels: stress
> Fix For: 4.0
>
> Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch
>
>
> The randomness of our distributions is pretty bad.  We are using the 
> JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 
> iterations it's only outputting 3.  If you bump it to 10k it hits all 3 
> values. 
> I made a change to just use the default commons math random generator and now 
> see all 3 values for n=10



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-10-05 Thread Daniel Cranford (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192820#comment-16192820
 ] 

Daniel Cranford commented on CASSANDRA-12744:
-

Some more thoughts:

The generation of partition keys has been "broken" since CASSANDRA-7519.

The Linear Congruential Generators (LCGs) used in java.util.Random and by 
extension JDKRandomGenerator generate good random number sequences, but similar 
seeds result in similar sequences. Using the LCG update function {{lcg(x) = 
a*x + c}} like

random ~1~ = lcg(1)
random ~2~ = lcg(2)
random ~3~ = lcg(3)
...
random ~n~ = lcg(n)

does not generate a good random sequence; this is a misuse of the LCG. LCGs are 
supposed to be used like

random ~1~ = lcg(1)
random ~2~ = lcg(lcg(1))
random ~3~ = lcg(lcg(lcg(1)))
...
random ~n~ = lcg ^n^ (1)
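
The two usage patterns above can be compared with a minimal Java sketch. It assumes, purely for illustration, that a uniform(1..3) value is derived as 1 + floor(3 * nextDouble()); that mapping and the class name LcgSeedingDemo are hypothetical and are not the actual cassandra-stress code path.

```java
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

final class LcgSeedingDemo {
    // Hypothetical mapping of a double in [0, 1) onto uniform(1..3).
    static int toUniform1to3(double d) {
        return 1 + (int) (d * 3);
    }

    // Misuse: a fresh generator per consecutive seed, one draw from each.
    static Set<Integer> freshGeneratorPerSeed(int n) {
        Set<Integer> seen = new TreeSet<>();
        for (long seed = 1; seed <= n; seed++) {
            seen.add(toUniform1to3(new Random(seed).nextDouble()));
        }
        return seen;
    }

    // Intended use: a single generator, advanced n times.
    static Set<Integer> singleGenerator(int n) {
        Random random = new Random(1);
        Set<Integer> seen = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(toUniform1to3(random.nextDouble()));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("fresh generator per seed: " + freshGeneratorPerSeed(100));
        System.out.println("single generator:         " + singleGenerator(100));
    }
}
```

With java.util.Random, the first double drawn from each of the seeds 1..100 lands in a narrow band around 0.73, so the per-seed variant collapses to a single value, while the single generator reaches all three.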

I say "broken" in quotes because the misuse of LCGs ends up not mattering. 
{{new java.util.Random(seed).nextDouble()}} will always differ from {{new 
java.util.Random(seed + 1).nextDouble()}} by more than 1/100,000,000,000. Thus, 
with the default partition key population (=UNIFORM(1..100B)), seeds that 
differ by 1 will generate distinct partition keys.

The only thing that matters about partition keys is how many distinct values 
there are (and how large their lexical value is). The number of partition key 
components doesn't matter. The cardinality of each partition key component 
doesn't matter. The distribution of values in the lexical partition key space 
doesn't matter.

At the end of the day, all the partition key components get concatenated and 
the resulting bit vector is hashed resulting in a uniformly distributed 64 bit 
token that determines where the data will be stored.

The easiest "fix" is to not use the partition key population to define the 
number of partition keys. Take advantage of the fact that the only thing that 
matters from a performance standpoint is the number of distinct partitions. 
Leave the partition key distribution at uniform(1..100B), and use the n= 
parameter to define the number of partitions.

An ideal fix would update the way partition keys are generated to use the LCG 
generator properly. However, this seems difficult since LCGs don't support 
random access (i.e., the only way to calculate the nth item in an LCG sequence 
is to first calculate the n-1 preceding items), and all three seed generation 
modes rely on the ability to randomly jump around in the seed sequence. This 
could be worked around by using a PRNG that supports random access to the n'th 
item in the sequence (e.g. something like JDK 1.8's SplittableRandom could be 
easily extended to support this).
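
As a rough illustration of what "random access" could look like, here is a minimal sketch of a counter-plus-mixer generator in the SplitMix style. The class name RandomAccessMix, the increment constant, and the mixing function are my own illustrative choices; this is not SplittableRandom itself and its output is not claimed to match any JDK version.

```java
final class RandomAccessMix {
    // Odd increment added to the counter on every step (illustrative constant).
    static final long GAMMA = 0x9E3779B97F4A7C15L;

    private long state;

    RandomAccessMix(long seed) {
        this.state = seed;
    }

    // Bijective 64-bit mixer (MurmurHash3 finalizer constants).
    static long mix64(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    // Sequential use: advance the counter, then mix it.
    long next() {
        state += GAMMA;
        return mix64(state);
    }

    // Random access: the n-th output (1-based) without computing the n-1 before it.
    static long nth(long seed, long n) {
        return mix64(seed + n * GAMMA);
    }

    public static void main(String[] args) {
        RandomAccessMix sequential = new RandomAccessMix(42);
        for (long n = 1; n <= 5; n++) {
            System.out.println(sequential.next() + " == " + nth(42, n));
        }
    }
}
```

Because the state is just a counter, jumping to position n is a single multiply, add, and mix, which is the property all three seed generation modes would need.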

A more workable fix is to spread the generated seeds (typically drawn from a 
smallish range of integers) around in the 2 ^64^ values a long can take before 
seeding the LCG. An additional caveat is that whatever function is used for 
spreading the seeds needs to be invertible, since LookbackableWriteGenerator's 
implementation relies on the properties of the sequence it generates to perform 
internal bookkeeping.

Multiplication by an odd integer happens to be an invertible function (although 
integer division is NOT the inverse operation, multiplication by the modular 
inverse is). So the initial implementation (although broken) is not actually 
that bad an idea. I propose fixing things by picking a static integer as the 
multiplier and using multiplication by its modular inverse to invert it for 
LookbackableWriteGenerator.
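
A minimal sketch of the multiply/invert pair, assuming 64-bit wraparound arithmetic (i.e. working modulo 2^64). The class name SeedSpreader and the particular odd constant are hypothetical; the proposal above does not name a multiplier.

```java
final class SeedSpreader {
    // Hypothetical odd multiplier; any odd 64-bit constant is invertible mod 2^64.
    static final long MULTIPLIER = 0x9E3779B97F4A7C15L;
    static final long INVERSE = modInverse(MULTIPLIER);

    // Modular inverse of an odd long modulo 2^64 via Newton iteration.
    // a * a == 1 (mod 8) for odd a, so the start value is correct to 3 bits;
    // each step doubles the number of correct bits (3, 6, 12, 24, 48, 96).
    static long modInverse(long a) {
        if ((a & 1) == 0) {
            throw new IllegalArgumentException("multiplier must be odd");
        }
        long inv = a;
        for (int i = 0; i < 5; i++) {
            inv *= 2 - a * inv;
        }
        return inv;
    }

    // Spread a small seed across the full range of a long.
    static long spread(long seed) {
        return seed * MULTIPLIER;
    }

    // Recover the original seed from a spread value.
    static long unspread(long spreadSeed) {
        return spreadSeed * INVERSE;
    }

    public static void main(String[] args) {
        long seed = 2;
        long spreadSeed = spread(seed);
        System.out.println("spread:           " + spreadSeed);
        System.out.println("unspread:         " + unspread(spreadSeed));
        System.out.println("integer division: " + (spreadSeed / MULTIPLIER));
    }
}
```

The last line of main shows why division is not the inverse: once the product wraps around, dividing by the multiplier no longer returns the original seed, whereas multiplying by the modular inverse always does.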





[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-10-03 Thread Daniel Cranford (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190085#comment-16190085
 ] 

Daniel Cranford commented on CASSANDRA-12744:
-

As I've thought about how to fix the seed multiplier, I've come to the 
conclusion that it is impossible to use an adaptive multiplier without breaking 
existing functionality or changing the command line interface.

One of the key reasons you can specify how the seeds get generated is so that 
you can partition the seed space and run multiple cassandra-stress processes on 
different machines in parallel so the cassandra-stress client doesn't become 
the bottleneck. E.g., to write 2 million partitions from two client machines, 
you'd run {noformat}cassandra-stress write n=100 -pop 
seq=1..100{noformat} on one client machine and {noformat}cassandra-stress 
write n=100 -pop seq=101..200{noformat} on the other client machine.

An adaptive multiplier that attempts to scale the seed sequence so that its 
range is 10^22 (or better, Long.MAX_VALUE since seeds are 64 bit longs) would 
generate the same multiplier for both client processes, resulting in seed 
sequence overlaps.

To correctly generate an adaptive multiplier, you need global knowledge of the 
entire range of seeds being generated by all cassandra-stress processes. This 
information cannot be supplied via the current command line interface. The 
command line interface would have to be updated in a breaking fashion to 
support an adaptive multiplier.

Using a hardcoded static multiplier is safe, but would reduce the allowable 
range of seed values (and thus reduce the maximum number of distinct partition 
keys). This probably isn't a big deal since nobody wants to write 2^64 
partitions. But it would need to be chosen with care so that the number of 
distinct seeds (and thus the number of distinct partitions) doesn't become too 
small.






[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-09-28 Thread Daniel Cranford (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184418#comment-16184418
 ] 

Daniel Cranford commented on CASSANDRA-12744:
-

I think the math on this is slightly broken. The seed multiplier is intended to 
scale all seeds to the 10^22 magnitude. However, seeds (and the multiplier) are 
all stored in 64 bit integers and the math performed on them is 64 bit math.

10^22 is not representable as a long which has range {noformat}[-(2^63) : 2^63 
- 1] = [-9,223,372,036,854,775,808 : 9,223,372,036,854,775,807]{noformat}

Consider that for sample sizes under 1084, the line that calculates the 
sample multiplier 
{noformat}this.sampleMultiplier = 1 + Math.round(Math.pow(10D, 22 - 
Math.log10(sampleSize)));{noformat}
will result in a multiplier of Long.MIN_VALUE (Math.round saturates at 
Long.MAX_VALUE and adding 1 overflows), which when multiplied by any long 
will result in 0 or Long.MIN_VALUE, reducing your seeds to two distinct values.

I think using 18 instead of 22 as the target exponent should resolve this issue.
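
The overflow can be reproduced in isolation with a short Java sketch. The expression inside sampleMultiplier is the one quoted above; the class name, the helper scaledSeeds, and the choice of a sample size of 100 are my own for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

final class MultiplierOverflowDemo {
    // Same expression as the quoted line, with the target exponent made a parameter.
    static long sampleMultiplier(long sampleSize, int targetExponent) {
        return 1 + Math.round(Math.pow(10D, targetExponent - Math.log10(sampleSize)));
    }

    // Multiply every seed in 1..sampleSize by the multiplier and collect the distinct results.
    static Set<Long> scaledSeeds(long sampleSize, int targetExponent) {
        long multiplier = sampleMultiplier(sampleSize, targetExponent);
        Set<Long> scaled = new TreeSet<>();
        for (long seed = 1; seed <= sampleSize; seed++) {
            scaled.add(seed * multiplier);
        }
        return scaled;
    }

    public static void main(String[] args) {
        System.out.println("multiplier, exponent 22: " + sampleMultiplier(100, 22));
        System.out.println("distinct seeds, exponent 22: " + scaledSeeds(100, 22).size());
        System.out.println("multiplier, exponent 18: " + sampleMultiplier(100, 18));
        System.out.println("distinct seeds, exponent 18: " + scaledSeeds(100, 18).size());
    }
}
```

With exponent 22 the 100 seeds collapse to two values (odd seeds map to Long.MIN_VALUE, even seeds to 0); with exponent 18 all 100 scaled seeds stay distinct.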

Additionally, I think the seed population size is being incorrectly calculated 
as the range of the revisit distribution (which defaults to uniform(1..1M)). 
However, when running in the default sequential seed mode (without revisits), 
e.g. {noformat}cassandra-stress write n=100{noformat}, the size of the seed 
population is actually the length of the seed sequence (in this case 100).

And when running with seeds generated from a distribution, e.g. 
{noformat}cassandra-stress read -pop dist=gaussian(1..250M){noformat}, the size 
of the seed population is actually the range of the seed distribution (in this 
case 250 million).





[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-06-08 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043871#comment-16043871
 ] 

Ben Slater commented on CASSANDRA-12744:


One extra note for future searching: There is a fair chance this fix will 
change the workload quite substantially in a number of scenarios. So, if you 
want to compare benchmarks make sure you don't compare results from stress with 
this fix vs stress without this fix.




[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-06-05 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038091#comment-16038091
 ] 

Ben Slater commented on CASSANDRA-12744:


Looks like the test failures are unrelated? Are we OK to commit?




[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-05-30 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029477#comment-16029477
 ] 

T Jake Luciani commented on CASSANDRA-12744:


good find! I re-started the tests with your patch




[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-05-28 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027755#comment-16027755
 ] 

Ben Slater commented on CASSANDRA-12744:


Actually, I think it's a bit more complex than I just said but still think it's 
related to the interaction between the population distribution and the 
individual column distributions. Just tried 10,000 inserts with -pop 
dist=uniform(1..25) and the following YAML and only get 1 row inserted. 
table_definition: |
  CREATE TABLE test4 (
    pk text,
    pk2 text,
    val text,
    PRIMARY KEY ((pk,pk2))
  )
columnspec:
  - name: pk
    size: fixed(2)
    population: exp(1..5)
  - name: pk2
    size: fixed(2)
    population: exp(1..5)

Running with -pop dist=uniform(1..10B) gives the expected 25 rows so it may be 
as simple as just setting a really big default population when running in user 
mode but I'll do a bit more digging.




[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-05-28 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027722#comment-16027722
 ] 

Ben Slater commented on CASSANDRA-12744:


So I took a look into this with the following findings:
1) The dtest is broken because it assumes that when you run c*-stress with 
n=1 you will end up with 10,000 rows inserted when I think the actual 
functional guarantee is that it will run 10,000 insert operations.
2) However, with the JDKRandomGenerator this assumption holds up to a few hundred 
thousand records. Even with n=1M you end up with 999,999 records in the table. 
For some reason, changing to the library default Well19937c generator means not 
only is the assumption broken at n=10k but it seems to get proportionally worse 
as n increases.

So, on those findings, I don't think changing the generator is a good idea.

So, I tried to dig a bit deeper about what was causing the issue. As part of 
this, I wrote some code to generate values directly from the distributions in 
various ways and the results all seemed as expected (ie reasonably aligned with 
the distribution type). 

After a bit more digging, and to cut a long story short, I found that the 
actual issue is related to the -pop setting. I'm still a bit hazy on this but it 
seems -pop is the distribution of all possible keys. So, if I have a -pop of 
dist(1..10) I can only have 10 possible key values (ie combinations across all 
columns) no matter what the ranges specified for the key column in the YAML 
file are. The default for -pop is UNIFORM(1..n) where n is specified or 
1..1,000,000 where no n is specified. I think this all results in somewhat 
counter-intuitive results, particularly with multi-part keys.
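
To make the effect concrete, here is a deliberately simplified Java model. It assumes that each seed drawn from the -pop distribution deterministically fixes every key column; the class PopulationModel, the key layout, and the use of java.util.Random are all hypothetical and only approximate what stress does internally.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class PopulationModel {
    // Simplified model: the seed alone determines every key column value.
    static String keyForSeed(long seed) {
        Random perSeed = new Random(seed);
        int partitionKey = 1 + perSeed.nextInt(1_000_000); // column range 1..1M
        int clusterKey = 1 + perSeed.nextInt(100);         // column range 1..100
        return partitionKey + ":" + clusterKey;
    }

    // Run the given number of operations, drawing seeds uniformly from 1..populationSize.
    static Set<String> distinctKeys(int populationSize, int operations) {
        Random seedPicker = new Random(12345);
        Set<String> keys = new HashSet<>();
        for (int i = 0; i < operations; i++) {
            long seed = 1 + seedPicker.nextInt(populationSize);
            keys.add(keyForSeed(seed));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("distinct keys with a population of 10: "
                + distinctKeys(10, 10_000).size());
    }
}
```

However wide the column ranges are, 10,000 operations over a population of 10 seeds can never produce more than 10 distinct keys in this model.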

So, I think the actual answer here is to change the rules for the default -pop  
for yaml runs to have a population size equal to the product of the population 
size of each key as specified in the YAML.  For example, if I have two columns: 
partition_key UNIFORM(1..1M)
cluster_key UNIFORM(1..100)

Then the default population should be 1..100M. I think this is already implied 
by the YAML and what people would expect (certainly what I expected).
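
The proposed default is just the product of the per-column population sizes. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and the overflow check is my own addition, since the product of several large ranges may not fit in a long.

```java
final class DefaultPopulation {
    // Product of the population sizes of all key columns; fails loudly on overflow.
    static long defaultPopulationSize(long... keyColumnPopulationSizes) {
        long product = 1;
        for (long size : keyColumnPopulationSizes) {
            product = Math.multiplyExact(product, size); // throws ArithmeticException on overflow
        }
        return product;
    }

    public static void main(String[] args) {
        // partition_key UNIFORM(1..1M) and cluster_key UNIFORM(1..100)
        System.out.println(defaultPopulationSize(1_000_000L, 100L));
    }
}
```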

I don't think this change will be too hard to make but I'm interested to hear if 
anyone has any opinions before I jump into it.




[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-05-23 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021150#comment-16021150
 ] 

T Jake Luciani commented on CASSANDRA-12744:


Thanks Ben, I'd appreciate it

Rebased on trunk:

[branch|https://github.com/tjake/cassandra/tree/stress-random-trunk]
[utest|http://cassci.datastax.com/job/tjake-stress-random-trunk-testall/]
[dtests| http://cassci.datastax.com/job/tjake-stress-random-trunk-dtest/]

> Randomness of stress distributions is not good
> --
>
> Key: CASSANDRA-12744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
>Priority: Minor
>  Labels: stress
> Fix For: 4.0



[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-05-23 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021038#comment-16021038
 ] 

Ben Slater commented on CASSANDRA-12744:


[~tjake] - just realised this one was still open. If you can kick off the tests 
again, I'd be happy to dig into any issues.

> Randomness of stress distributions is not good
> --
>
> Key: CASSANDRA-12744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
>Priority: Minor
>  Labels: stress
> Fix For: 3.0.x



[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-01-19 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830595#comment-15830595
 ] 

T Jake Luciani commented on CASSANDRA-12744:


There are test failures I haven't looked into yet so no.



[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-01-17 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826181#comment-15826181
 ] 

Ben Slater commented on CASSANDRA-12744:


I am on leave until Monday 30 Jan. If you need an immediate response please 
contact sa...@instaclustr.com or supp...@instaclustr.com as appropriate. 
For less urgent queries, I will be checking email every couple of days and 
respond or redirect. Cheers Ben Slater Instaclustr

--
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
www.instaclustr.com




[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2017-01-17 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826180#comment-15826180
 ] 

Joshua McKenzie commented on CASSANDRA-12744:
-

[~tjake] - is this patch available?



[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2016-10-16 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580762#comment-15580762
 ] 

Ben Slater commented on CASSANDRA-12744:


I tried this patch out. It definitely seems to improve the distribution to 
something like what you'd expect. Didn't notice any issues.

> Randomness of stress distributions is not good
> --
>
> Key: CASSANDRA-12744
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12744
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: T Jake Luciani
>Priority: Minor
>  Labels: stress
> Fix For: 3.0.10


[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2016-10-10 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562403#comment-15562403
 ] 

Paulo Motta commented on CASSANDRA-12744:
-

some 
[cqlsh_copy_tests|http://cassci.datastax.com/job/tjake-stress-random-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/]
 are outputting the following error:
{noformat}
cassandra-stress did not import enough records
{noformat}

do you think this could be related to the distribution change? if not, can you 
maybe rebase and resubmit those?



[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good

2016-10-03 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542859#comment-15542859
 ] 

T Jake Luciani commented on CASSANDRA-12744:


[branch|https://github.com/tjake/cassandra/tree/stress-random]
[utest|http://cassci.datastax.com/job/tjake-stress-random-testall/]
[dtests| http://cassci.datastax.com/job/tjake-stress-random-dtest/]
