[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612042#comment-14612042 ]

Jim Witschey commented on CASSANDRA-9522:
-----------------------------------------

I failed to review this properly and have to reopen -- I didn't ask for cassci links before +1ing this change:

http://cassci.datastax.com/view/Dev/view/tjake/job/tjake-stress-9522-dtest/1/console

This fails on, e.g., {{sstablesplit_test.py:TestSSTableSplit.single_file_split_test}} with an NPE in {{PredefinedOperation.init}}. Failing output in [this Gist|https://gist.github.com/mambocab/acaa2a880c2e55d9de8b].

Specify unset column ratios in cassandra-stress write
-----------------------------------------------------

Key: CASSANDRA-9522
URL: https://issues.apache.org/jira/browse/CASSANDRA-9522
Project: Cassandra
Issue Type: Improvement
Components: Tools
Reporter: Jim Witschey
Assignee: T Jake Luciani
Fix For: 3.0 beta 1

I'd like to be able to use stress to generate workloads with different distributions of unset columns -- so, for instance, you could specify that rows will have 70% unset columns, and on average, a 100-column row would contain only 30 values.

This would help us test the new row formats introduced in 8099. There are 2 different row formats, used depending on the ratio of set to unset columns, and this feature would let us generate workloads that would be stored in each of those formats.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
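The arithmetic in the issue description (70% unset columns, so a 100-column row holds about 30 values on average) can be checked with a small simulation. This is only a sketch: the helper names are hypothetical, and it assumes each column is set independently with the given probability, which may not be how cassandra-stress actually picks columns.

```python
import random


def populate_row(column_count, population_ratio, rng):
    """Return one row: a value for each set column, None for each unset one.

    Assumption: every column is set independently with probability
    population_ratio. The real tool may choose columns differently.
    """
    return [
        "value" if rng.random() < population_ratio else None
        for _ in range(column_count)
    ]


def average_set_columns(rows):
    """Average number of non-None columns per row."""
    set_counts = [sum(1 for cell in row if cell is not None) for row in rows]
    return sum(set_counts) / len(set_counts)


rng = random.Random(9522)  # fixed seed so repeated runs give the same rows
rows = [populate_row(100, 0.30, rng) for _ in range(1000)]
average = average_set_columns(rows)
print("average set columns per row:", average)
```

With a population ratio of 0.30, the printed average should land close to 30, matching the example in the description.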
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608766#comment-14608766 ]

Jim Witschey commented on CASSANDRA-9522:
-----------------------------------------

What distributions can you specify via the command line? {{FIXED}} works, but {{UNIFORM}} doesn't:

{code}
$ ./tools/bin/cassandra-stress write n=1000 -rate threads=50 -col n=FIXED\(50\) -insert row-population-ratio=uniform\(5..10\)
Invalid parameter row-population-ratio=uniform(5..10)
$ ./tools/bin/cassandra-stress write n=1000 -rate threads=50 -col n=FIXED\(50\) -insert row-population-ratio=fixed\(1\)/2
# expected stress output
{code}

I haven't tried specifying via a yaml file.
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608779#comment-14608779 ]

T Jake Luciani commented on CASSANDRA-9522:
-------------------------------------------

bq. What distributions can you specify via the command line? FIXED works, but UNIFORM doesn't:

It's a *Ratio*, so it needs to be divided by a number, like fixed(1)/2.
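To make the "distribution divided by a number" shape concrete, here is a minimal Python sketch of how such a spec could be read. It is not cassandra-stress's own parser. The function name, the regular expression, and the assumption that a spec like uniform(5..10)/10 would be accepted are all hypothetical; only fixed(1)/2 is confirmed to work in this thread.

```python
import random
import re

# Hypothetical grammar: fixed(N)/D or uniform(LOW..HIGH)/D.
# A bare distribution with no "/D" part is rejected, because it is not a ratio.
_RATIO_SPEC = re.compile(
    r"^(fixed|uniform)\((\d+)(?:\.\.(\d+))?\)/(\d+)$", re.IGNORECASE
)


def parse_ratio(spec, rng=None):
    """Return a zero-argument function that samples a ratio from spec."""
    rng = rng or random.Random()
    match = _RATIO_SPEC.match(spec)
    if match is None:
        raise ValueError("Invalid parameter " + spec)
    name, low, high, divisor = match.groups()
    name, low, divisor = name.lower(), int(low), int(divisor)
    if divisor == 0:
        raise ValueError("Divisor must be non-zero: " + spec)
    if name == "fixed":
        if high is not None:
            raise ValueError("fixed takes a single value: " + spec)
        return lambda: low / divisor
    if high is None or int(high) < low:
        raise ValueError("uniform needs a LOW..HIGH range: " + spec)
    high = int(high)
    # randint is inclusive on both ends.
    return lambda: rng.randint(low, high) / divisor


half = parse_ratio("fixed(1)/2")
print(half())  # 0.5, i.e. half of the columns populated
```

Under this reading, row-population-ratio=uniform(5..10) fails because there is no divisor, which is consistent with the "Invalid parameter" error quoted above.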
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608957#comment-14608957 ]

Jim Witschey commented on CASSANDRA-9522:
-----------------------------------------

I understand; my mistake. Looks great!
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605785#comment-14605785 ]

Jim Witschey commented on CASSANDRA-9522:
-----------------------------------------

Last he and I talked, [~tjake] proposed an {{insert_sparseness_distribution}} parameter in the stress yaml that would allow you to set sparseness per partition with a distribution specifier like {{fixed(50)}} or {{uniform(40..60)}}. That'd work for me; is that still a workable change?
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605791#comment-14605791 ]

T Jake Luciani commented on CASSANDRA-9522:
-------------------------------------------

Yeah, working on this atm
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606270#comment-14606270 ]

T Jake Luciani commented on CASSANDRA-9522:
-------------------------------------------

Branch here: https://github.com/tjake/cassandra/tree/stress-9522

Works with command line arg or yaml param.

Simple example that leaves 50% of the columns null:

{code}
./tools/bin/cassandra-stress write n=1000 -rate threads=50 -col n=FIXED\(50\) -insert row-population-ratio=fixed\(1\)/2
{code}
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605077#comment-14605077 ]

Jonathan Ellis commented on CASSANDRA-9522:
-------------------------------------------

/cc [~mambocab]
[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write
[ https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585915#comment-14585915 ]

T Jake Luciani commented on CASSANDRA-9522:
-------------------------------------------

The magic ratio seems to be 50%.

Would adding the ability to ignore certain columns defined in the schema be good enough? For example, in the column spec of the yaml file you could add ignored: true and stress would just not insert into that column.

Then you could do one test with 1/2 of the columns set to ignored = true, and another with 1/2 - 1 set.
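The two test configurations suggested above (half of the columns ignored, and one fewer than half) can be sketched in Python as follows. The "ignored" flag is only the proposal from this comment, not a confirmed stress yaml key, and the helper name and column names are made up for illustration.

```python
def mark_ignored(columns, ignored_count):
    """Build hypothetical column specs, flagging the first ignored_count as ignored."""
    return [
        {"name": name, "ignored": index < ignored_count}
        for index, name in enumerate(columns)
    ]


columns = ["c%d" % i for i in range(10)]  # made-up column names
half = len(columns) // 2

# Test 1: exactly 1/2 of the columns ignored, right at the 50% mark.
at_half = mark_ignored(columns, half)

# Test 2: 1/2 - 1 of the columns ignored, on the other side of the 50% mark.
below_half = mark_ignored(columns, half - 1)

for label, spec in (("1/2", at_half), ("1/2 - 1", below_half)):
    ignored = sum(1 for column in spec if column["ignored"])
    print(label, "->", ignored, "of", len(spec), "columns ignored")
```

Running the same workload against both specs would put the set-to-unset ratio on either side of 50%.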