[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192827#comment-16192827 ] Daniel Cranford commented on CASSANDRA-12744: - Created CASSANDRA-13940 to fix this. > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: Ben Slater >Priority: Minor > Labels: stress > Fix For: 4.0 > > Attachments: CASSANDRA_12744_SeedManager_changes-trunk.patch > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192820#comment-16192820 ] Daniel Cranford commented on CASSANDRA-12744: - Some more thoughts:

The generation of partition keys has been "broken" since CASSANDRA-7519.

The Linear Congruential Generators (LCGs) used in java.util.Random, and by extension JDKRandomGenerator, generate good random number sequences, but similar seeds result in similar sequences. Using the LCG update function {{lcg(x) = a*x + c}} like

random ~1~ = lcg(1)
random ~2~ = lcg(2)
random ~3~ = lcg(3)
...
random ~n~ = lcg(n)

does not generate a good random sequence; this is a misuse of the LCG. LCGs are supposed to be used like

random ~1~ = lcg(1)
random ~2~ = lcg(lcg(1))
random ~3~ = lcg(lcg(lcg(1)))
...
random ~n~ = lcg ^n^ (1)

I say "broken" in quotes because the misuse of LCGs ends up not mattering. {{new java.util.Random(seed).nextDouble()}} will always differ from {{new java.util.Random(seed + 1).nextDouble()}} by more than 1/100,000,000,000. Thus with the default partition key population (=UNIFORM(1..100B)), seeds that differ by 1 will generate distinct partition keys.

The only thing that matters about partition keys is how many distinct values there are (and how large their lexical value is). The number of partition key components doesn't matter. The cardinality of each partition key component doesn't matter. The distribution of values in the lexical partition key space doesn't matter. At the end of the day, all the partition key components get concatenated and the resulting bit vector is hashed, resulting in a uniformly distributed 64 bit token that determines where the data will be stored.

The easiest "fix" is to not use the partition key population to define the number of partition keys. Take advantage of the fact that the only thing that matters from a performance standpoint is the number of distinct partitions. Leave the partition key distribution at uniform(1..100B), and use the n= parameter to define the number of partitions.

An ideal fix would update the way partition keys are generated to use the LCG generator properly. However, this seems difficult since java.util.Random does not expose random access (i.e., the straightforward way to calculate the nth item in its sequence is to first calculate the n-1 preceding items), and all three seed generation modes rely on the ability to randomly jump around in the seed sequence. This could be worked around by using a PRNG that supports random access to the nth item in the sequence (e.g. something like JDK 1.8's SplittableRandom could be easily extended to support this).

A more workable fix is to spread the generated seeds (typically drawn from a smallish range of integers) around in the 2 ^64^ values a long can take before seeding the LCG. An additional caveat: whatever function is used for spreading the seeds needs to be invertible, since LookbackableWriteGenerator's implementation relies on the properties of the sequence it generates to perform internal bookkeeping. Multiplication by an odd integer happens to be an invertible function (although integer division is NOT the inverse operation, multiplication by the modular inverse is). So the initial implementation (although broken) is not actually that bad an idea. I propose fixing things by picking a static integer as the multiplier and using multiplication by its modular inverse to invert it for LookbackableWriteGenerator.
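To make the seed-spreading idea from the comment above concrete, here is a minimal Java sketch. It is not the cassandra-stress implementation: the class name, the helper names and the multiplier constant are all hypothetical and chosen only for illustration. It shows that multiplication by an odd 64-bit value can be undone by multiplying with its modular inverse, and it exposes the first double of a freshly seeded java.util.Random so that neighbouring seeds can be compared.

```java
import java.util.Random;

final class SeedSpreadSketch {
    // Hypothetical odd multiplier, picked only for illustration.
    // Any odd 64-bit value is invertible modulo 2^64.
    static final long MULTIPLIER = 0x9E3779B97F4A7C15L;
    static final long INVERSE = modInverse(MULTIPLIER);

    /** Multiplicative inverse of an odd value modulo 2^64, via Newton iteration. */
    static long modInverse(long odd) {
        if ((odd & 1L) == 0) {
            throw new IllegalArgumentException("only odd values are invertible modulo 2^64");
        }
        long x = odd; // odd * odd == 1 (mod 8), so x is already correct to 3 bits
        for (int i = 0; i < 5; i++) {
            x *= 2 - odd * x; // each step doubles the number of correct low bits
        }
        return x;
    }

    /** Spreads a small seed across the 64-bit space (wrapping multiplication). */
    static long spread(long seed) {
        return seed * MULTIPLIER;
    }

    /** Recovers the original seed from a spread value. */
    static long unspread(long spreadSeed) {
        return spreadSeed * INVERSE;
    }

    /** First double produced by a freshly seeded java.util.Random. */
    static double firstDouble(long seed) {
        return new Random(seed).nextDouble();
    }

    public static void main(String[] args) {
        for (long seed = 1; seed <= 3; seed++) {
            long s = spread(seed);
            System.out.printf("seed=%d raw=%.6f spread=%.6f roundTrip=%d%n",
                              seed, firstDouble(seed), firstDouble(s), unspread(s));
        }
    }
}
```

In the printed table, the "raw" column shows how close the first outputs of neighbouring seeds are, and the "roundTrip" column shows that the spreading step is reversible, which is the property LookbackableWriteGenerator is said to depend on.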
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190085#comment-16190085 ] Daniel Cranford commented on CASSANDRA-12744: - As I've thought about how to fix the seed multiplier, I've come to the conclusion that it is impossible to use an adaptive multiplier without breaking existing functionality or changing the command line interface.

One of the key reasons you can specify how the seeds get generated is so that you can partition the seed space and run multiple cassandra-stress processes on different machines in parallel, so the cassandra-stress client doesn't become the bottleneck. E.g., to write 2 million partitions from two client machines, you'd run {noformat}cassandra-stress write n=1000000 -pop seq=1..1000000{noformat} on one client machine and {noformat}cassandra-stress write n=1000000 -pop seq=1000001..2000000{noformat} on the other client machine.

An adaptive multiplier that attempts to scale the seed sequence so that its range is 10^22 (or better, Long.MAX_VALUE, since seeds are 64 bit longs) would generate the same multiplier for both client processes, resulting in seed sequence overlaps. To correctly generate an adaptive multiplier, you need global knowledge of the entire range of seeds being generated by all cassandra-stress processes. This information cannot be supplied via the current command line interface. The command line interface would have to be updated in a breaking fashion to support an adaptive multiplier.

Using a hardcoded static multiplier is safe, but would reduce the allowable range of seed values (and thus reduce the maximum number of distinct partition keys). This probably isn't a big deal since nobody wants to write 2^64 partitions. But it would need to be chosen with care so that the number of distinct seeds (and thus the number of distinct partitions) doesn't become too small.
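The seed-space partitioning described above (2 million partitions written from two client machines) can be sketched as a small helper that prints one command line per client. The class and method names are hypothetical; only the {{write n=}} and {{-pop seq=}} options come from the comment itself, and the sketch assumes the total divides evenly across clients.

```java
import java.util.ArrayList;
import java.util.List;

final class SeedRangeSplitSketch {
    /**
     * Builds one cassandra-stress command per client so that the sequential
     * seed ranges are contiguous and do not overlap.
     * Assumes totalPartitions is evenly divisible by clients.
     */
    static List<String> commands(long totalPartitions, int clients) {
        List<String> out = new ArrayList<>();
        long perClient = totalPartitions / clients;
        for (int i = 0; i < clients; i++) {
            long first = i * perClient + 1;     // seeds are 1-based in the comment's example
            long last = first + perClient - 1;  // inclusive upper bound
            out.add("cassandra-stress write n=" + perClient
                    + " -pop seq=" + first + ".." + last);
        }
        return out;
    }

    public static void main(String[] args) {
        commands(2_000_000L, 2).forEach(System.out::println);
    }
}
```

Each process only ever sees its own range, which is why a multiplier derived from the per-process range cannot account for the seeds used by the other clients.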
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184418#comment-16184418 ] Daniel Cranford commented on CASSANDRA-12744: - I think the math on this is slightly broken. The seed multiplier is intended to scale all seeds to the 10^22 magnitude. However, seeds (and the multiplier) are all stored in 64 bit integers, and the math performed on them is 64 bit math. 10^22 is not representable as a long, which has range {noformat}[-(2^63) : 2^63 - 1] = [-9,223,372,036,854,775,808 : 9,223,372,036,854,775,807]{noformat}

Consider that for sample sizes under 1084, the line that calculates the sample multiplier {noformat}this.sampleMultiplier = 1 + Math.round(Math.pow(10D, 22 - Math.log10(sampleSize)));{noformat} will result in a multiplier of Long.MIN_VALUE (Math.round saturates at Long.MAX_VALUE and adding 1 wraps around), which when multiplied by any long will result in 0 or Long.MIN_VALUE, reducing your seeds to two distinct values. I think using 18 instead of 22 as the target exponent should resolve this issue.

Additionally, I think the seed population size is being incorrectly calculated as the range of the revisit distribution (which defaults to uniform(1..1M)). However, when running in the default sequential seed mode (without revisits), e.g. {noformat}cassandra-stress write n=100{noformat}, the size of the seed population is actually the length of the seed sequence (in this case 100). And when running with seeds generated from a distribution, e.g. {noformat}cassandra-stress read -pop dist=gaussian(1..250M){noformat} the size of the seed population is actually the range of the seed distribution (in this case 250 million).
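The overflow described above can be reproduced in isolation. The following sketch copies the multiplier expression quoted in the comment into a hypothetical helper (the class and method names are not from the patch) and parameterises the target exponent so that 22 and the suggested 18 can be compared.

```java
final class SampleMultiplierSketch {
    /** Same expression as quoted in the comment, with the exponent made a parameter. */
    static long multiplier(int targetExponent, long sampleSize) {
        return 1 + Math.round(Math.pow(10D, targetExponent - Math.log10(sampleSize)));
    }

    public static void main(String[] args) {
        long broken = multiplier(22, 1000); // 10^19 exceeds Long.MAX_VALUE
        long fixed = multiplier(18, 1000);  // 10^15 fits comfortably in a long

        // Math.round saturates at Long.MAX_VALUE; adding 1 wraps to Long.MIN_VALUE.
        System.out.println(broken == Long.MIN_VALUE);
        // Multiplying by Long.MIN_VALUE leaves only two possible results.
        System.out.println(broken * 3 == Long.MIN_VALUE); // odd seed
        System.out.println(broken * 4 == 0);              // even seed
        System.out.println(fixed);
    }
}
```

With exponent 22 every odd seed collapses to Long.MIN_VALUE and every even seed to 0, which matches the "two distinct values" observation; with exponent 18 the multiplier stays positive for any sample size of 1 or more.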
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043871#comment-16043871 ] Ben Slater commented on CASSANDRA-12744: One extra note for future searching: There is a fair chance this fix will change the workload quite substantially in a number of scenarios. So, if you want to compare benchmarks make sure you don't compare results from stress with this fix vs stress without this fix.
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038091#comment-16038091 ] Ben Slater commented on CASSANDRA-12744: Looks like the test failures are unrelated? Are we OK to commit?
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029477#comment-16029477 ] T Jake Luciani commented on CASSANDRA-12744: good find! I re-started the tests with your patch
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027755#comment-16027755 ] Ben Slater commented on CASSANDRA-12744: Actually, I think it's a bit more complex than I just said but still think it's related to the interaction between the population distribution and the individual column distributions. Just tried 10,000 inserts with -pop dist=uniform(1..25) and the following YAML and only get 1 row inserted.

table_definition: |
  CREATE TABLE test4 (
    pk text,
    pk2 text,
    val text,
    PRIMARY KEY ((pk,pk2))
  )
columnspec:
  - name: pk
    size: fixed(2)
    population: exp(1..5)
  - name: pk2
    size: fixed(2)
    population: exp(1..5)

Running with -pop dist=uniform(1..10B) gives the expected 25 rows so it may be as simple as just setting a really big default population when running in user mode but I'll do a bit more digging.
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027722#comment-16027722 ] Ben Slater commented on CASSANDRA-12744: So I took a look into this with the following findings:

1) The dtest is broken because it assumes that when you run c*-stress with n=10000 you will end up with 10,000 rows inserted, when I think the actual functional guarantee is that it will run 10,000 insert operations.
2) However, with the JDKRandomGenerator this assumption holds up to a few hundred thousand records. Even with n=1M you end up with 999,999 records in the table. For some reason, the change to the library default Well19937c generator means not only is the assumption broken at n=10k but it seems to get proportionally worse as n increases.

So, on those findings, I don't think changing the generator is a good idea.

So, I tried to dig a bit deeper into what was causing the issue. As part of this, I wrote some code to generate values directly from the distributions in various ways and the results all seemed as expected (i.e. reasonably aligned with the distribution type). After a bit more digging, and to cut a long story short, I found that the actual issue is related to the -pop setting. I'm still a bit hazy on this but it seems -pop is the distribution of all possible keys. So, if I have a -pop of dist(1..10) I can only have 10 possible key values (i.e. combinations across all columns) no matter what the ranges specified for the key columns in the YAML file are. The default for -pop is UNIFORM(1..n) where n is specified, or 1..1,000,000 where no n is specified. I think this all leads to somewhat counter-intuitive results, particularly with multi-part keys.

So, I think the actual answer here is to change the rules for the default -pop for YAML runs to have a population size equal to the product of the population size of each key as specified in the YAML.
For example, if I have two columns:

partition_key UNIFORM(1..1M)
cluster_key UNIFORM(1..100)

then the default population should be 1..100M. I think this is already implied by the YAML and what people would expect (certainly what I expected). I don't think this change will be too hard to make, but I'm interested to hear if anyone has an opinion before I jump into it.
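The proposed default, a population size equal to the product of the per-key population sizes, amounts to the small calculation below. This is only a sketch of the arithmetic with hypothetical names, not code from cassandra-stress; it uses Math.multiplyExact so that an overflowing product fails loudly instead of wrapping silently.

```java
final class DefaultPopulationSketch {
    /**
     * Product of the population sizes declared for each key column.
     * Throws ArithmeticException if the product does not fit in a long.
     */
    static long defaultPopulationSize(long... keyPopulationSizes) {
        long product = 1;
        for (long size : keyPopulationSizes) {
            product = Math.multiplyExact(product, size);
        }
        return product;
    }

    public static void main(String[] args) {
        // partition_key UNIFORM(1..1M) and cluster_key UNIFORM(1..100)
        System.out.println(defaultPopulationSize(1_000_000L, 100L));
    }
}
```

For the two columns in the example this gives 100,000,000, i.e. a default population of 1..100M.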
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021150#comment-16021150 ] T Jake Luciani commented on CASSANDRA-12744: Thanks Ben, I'd appreciate it Rebased on trunk: [branch|https://github.com/tjake/cassandra/tree/stress-random-trunk] [utest|http://cassci.datastax.com/job/tjake-stress-random-trunk-testall/] [dtests| http://cassci.datastax.com/job/tjake-stress-random-trunk-dtest/] > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 4.0 > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021038#comment-16021038 ] Ben Slater commented on CASSANDRA-12744: [~tjake] - just realised this one was still open. If you can kick off the tests again, I'd be happy to dig into any issues. > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 3.0.x > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830595#comment-15830595 ] T Jake Luciani commented on CASSANDRA-12744: There are test failures I haven't looked into yet so no. > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 3.0.x > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826180#comment-15826180 ] Joshua McKenzie commented on CASSANDRA-12744: - [~tjake] - is this patch available? > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 3.0.x > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580762#comment-15580762 ] Ben Slater commented on CASSANDRA-12744: I tried this patch out. It definitely seems to improve the distribution to something like what you'd expect. Didn't notice any issues. > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 3.0.10 > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15562403#comment-15562403 ] Paulo Motta commented on CASSANDRA-12744: - some [cqlsh_copy_tests|http://cassci.datastax.com/job/tjake-stress-random-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/] are outputting the following error: {noformat} cassandra-stress did not import enough records {noformat} do you think this could be related to the distribution change? if not, can you maybe rebase and resubmit those? > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 3.0.10 > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-12744) Randomness of stress distributions is not good
[ https://issues.apache.org/jira/browse/CASSANDRA-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542859#comment-15542859 ] T Jake Luciani commented on CASSANDRA-12744: [branch|https://github.com/tjake/cassandra/tree/stress-random] [utest|http://cassci.datastax.com/job/tjake-stress-random-testall/] [dtests| http://cassci.datastax.com/job/tjake-stress-random-dtest/] > Randomness of stress distributions is not good > -- > > Key: CASSANDRA-12744 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12744 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: T Jake Luciani >Assignee: T Jake Luciani >Priority: Minor > Labels: stress > Fix For: 3.0.10 > > > The randomness of our distributions is pretty bad. We are using the > JDKRandomGenerator() but in testing of uniform(1..3) we see for 100 > iterations it's only outputting 3. If you bump it to 10k it hits all 3 > values. > I made a change to just use the default commons math random generator and now > see all 3 values for n=10 -- This message was sent by Atlassian JIRA (v6.3.4#6332)