[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-07-02 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612042#comment-14612042
 ] 

Jim Witschey commented on CASSANDRA-9522:
-

I failed to review this properly and have to reopen -- I didn't ask for cassci 
links before +1ing this change: 

http://cassci.datastax.com/view/Dev/view/tjake/job/tjake-stress-9522-dtest/1/console

This fails on, e.g., 
{{sstablesplit_test.py:TestSSTableSplit.single_file_split_test}} with an NPE in 
{{PredefinedOperation.init}}. The failing output is in [this 
Gist|https://gist.github.com/mambocab/acaa2a880c2e55d9de8b].

 Specify unset column ratios in cassandra-stress write
 -

 Key: CASSANDRA-9522
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9522
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jim Witschey
Assignee: T Jake Luciani
 Fix For: 3.0 beta 1


 I'd like to be able to use stress to generate workloads with different 
 distributions of unset columns -- so, for instance, you could specify that 
 rows will have 70% unset columns, and on average, a 100-column row would 
 contain only 30 values.
 This would help us test the new row formats introduced in 8099. There are 2 
 different row formats, used depending on the ratio of set to unset columns, 
 and this feature would let us generate workloads that would be stored in each 
 of those formats.
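
 The arithmetic in the request above can be checked with a small simulation. This is only an illustrative sketch in Python, not cassandra-stress code: the function names are hypothetical, and it assumes each column is left unset independently with the given probability.

```python
import random


def populate_row(num_columns, unset_ratio, rng):
    """Build one row: None marks an unset column, an int marks a set one."""
    return [None if rng.random() < unset_ratio else rng.randint(0, 9)
            for _ in range(num_columns)]


def average_set_columns(num_rows, num_columns, unset_ratio, seed=0):
    """Average number of set (non-None) columns per row over num_rows rows."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    total = 0
    for _ in range(num_rows):
        row = populate_row(num_columns, unset_ratio, rng)
        total += sum(1 for value in row if value is not None)
    return total / num_rows


if __name__ == "__main__":
    # 100-column rows with 70% of columns unset: roughly 30 values per row.
    print(average_set_columns(num_rows=2000, num_columns=100, unset_ratio=0.7))
```

 With 70% of columns unset, the average settles close to 30 set values per 100-column row, which matches the example in the description.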



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-30 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608766#comment-14608766
 ] 

Jim Witschey commented on CASSANDRA-9522:
-

What distributions can you specify via the command line? {{FIXED}} works, but 
{{UNIFORM}} doesn't:

{code}
$ ./tools/bin/cassandra-stress write n=1000 -rate threads=50 -col n=FIXED\(50\) \
    -insert row-population-ratio=uniform\(5..10\)
Invalid parameter row-population-ratio=uniform(5..10)

$ ./tools/bin/cassandra-stress write n=1000 -rate threads=50 -col n=FIXED\(50\) \
    -insert row-population-ratio=fixed\(1\)/2
# expected stress output
{code}

I haven't tried specifying via a yaml file.



[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-30 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608779#comment-14608779
 ] 

T Jake Luciani commented on CASSANDRA-9522:
---

bq. What distributions can you specify via the command line? FIXED works, but 
UNIFORM doesn't:

It's a *Ratio*, so it needs to be divided by a number, like fixed(1)/2.
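
To make the "distribution divided by a number" rule concrete, here is a rough Python sketch of how such a ratio spec could be parsed and sampled. It is not the cassandra-stress parser: the `parse_ratio` helper, its error message, the `uniform(5..10)/10` spec, and the assumption that a ratio must fall between 0 and 1 are all hypothetical and only illustrate the idea.

```python
import random
import re

# Hypothetical grammar: DIST(args)/DIVISOR, e.g. fixed(1)/2 or uniform(5..10)/10.
_RATIO = re.compile(r"^(fixed|uniform)\((\d+)(?:\.\.(\d+))?\)/(\d+)$", re.IGNORECASE)


def parse_ratio(spec):
    """Return a function rng -> ratio in [0, 1], or raise ValueError."""
    match = _RATIO.match(spec)
    if match is None:
        # A bare distribution such as uniform(5..10) has no divisor: rejected.
        raise ValueError(f"ratio spec must look like DIST(...)/N, got {spec!r}")
    name = match.group(1).lower()
    low = int(match.group(2))
    divisor = int(match.group(4))
    if name == "fixed":
        if match.group(3) is not None:
            raise ValueError("fixed takes a single value")
        high = low
    else:
        if match.group(3) is None:
            raise ValueError("uniform takes a range low..high")
        high = int(match.group(3))
    # Assumption of this sketch: the sampled value never exceeds the divisor.
    if divisor == 0 or not (0 <= low <= high <= divisor):
        raise ValueError("ratio must fall between 0 and 1")

    def sample(rng):
        # Draw an integer from the distribution, then scale it by the divisor.
        return rng.randint(low, high) / divisor

    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    print(parse_ratio("fixed(1)/2")(rng))  # always 0.5
    print(parse_ratio("uniform(5..10)/10")(rng))  # somewhere in 0.5..1.0
```

In this sketch `fixed(1)/2` always yields 0.5, while a bare `uniform(5..10)` is rejected because it has no divisor.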



[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-30 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608957#comment-14608957
 ] 

Jim Witschey commented on CASSANDRA-9522:
-

I understand; my mistake. Looks great!



[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-29 Thread Jim Witschey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605785#comment-14605785
 ] 

Jim Witschey commented on CASSANDRA-9522:
-

Last he and I talked, [~tjake] proposed an {{insert_sparseness_distribution}} 
parameter in the stress yaml that would allow you to set sparseness per 
partition with a distribution specifier like {{fixed(50)}} or 
{{uniform(40..60)}}. That'd work for me; is that still a workable change?



[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-29 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605791#comment-14605791
 ] 

T Jake Luciani commented on CASSANDRA-9522:
---

Yeah, working on this atm



[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-29 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606270#comment-14606270
 ] 

T Jake Luciani commented on CASSANDRA-9522:
---

Branch here: https://github.com/tjake/cassandra/tree/stress-9522

It works with either a command-line arg or a yaml param.

A simple example that leaves 50% of the columns null:
{code}
./tools/bin/cassandra-stress write n=1000 -rate threads=50 -col n=FIXED\(50\) \
    -insert row-population-ratio=fixed\(1\)/2
{code}





[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-28 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605077#comment-14605077
 ] 

Jonathan Ellis commented on CASSANDRA-9522:
---

/cc [~mambocab]



[jira] [Commented] (CASSANDRA-9522) Specify unset column ratios in cassandra-stress write

2015-06-15 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585915#comment-14585915
 ] 

T Jake Luciani commented on CASSANDRA-9522:
---

The magic ratio seems to be 50%.
Would adding the ability to ignore certain columns defined in the schema be 
good enough?

For example, in the column spec of the yaml file you could add {{ignored: true}}, 
and stress would just not insert into that column. Then you could do one test 
with 1/2 of the columns set to {{ignored: true}}, and another with 1/2 - 1 set.
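
As a rough illustration of this proposal, the Python sketch below filters a list of column specs on an `ignored` flag and compares the two test cases on either side of the 50% mark. The spec layout, the `ignored` key, and both helper names are hypothetical; nothing here reflects an existing stress yaml option.

```python
def make_specs(total, num_ignored):
    """Build hypothetical column specs; the first num_ignored are ignored."""
    return [{"name": f"c{i}", "ignored": i < num_ignored} for i in range(total)]


def columns_to_insert(column_specs):
    """Names of the columns a writer would populate (ignored ones skipped)."""
    return [spec["name"] for spec in column_specs
            if not spec.get("ignored", False)]


if __name__ == "__main__":
    total = 10
    # Case 1: exactly half of the columns ignored.
    half = columns_to_insert(make_specs(total, total // 2))
    # Case 2: one fewer than half ignored, so just over half are set.
    just_under = columns_to_insert(make_specs(total, total // 2 - 1))
    print(len(half), len(just_under))
```

Running both cases gives one workload with exactly half of the columns set and one with just over half, which is the pair of tests suggested above.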
