[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large
[ https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026822#comment-17026822 ] Alexander Dejanovski commented on CASSANDRA-11105:
--------------------------------------------------
I agree with [~mck]. The code has evolved too much since my patch was written anyway, and internally we've moved our efforts to a cassandra-stress replacement tool. Happy to have the ticket closed as "won't do".

> cassandra-stress tool - InvalidQueryException: Batch too large
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-11105
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11105
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Tools
>         Environment: Cassandra 2.2.4, Java 8, CentOS 6.5
>            Reporter: Ralf Steppacher
>            Priority: Normal
>             Fix For: 4.0
>
>         Attachments: 11105-trunk.txt, batch_too_large.yaml
>
> I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress
> tool to work for my test scenario. I have followed the example on
> http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
> to create a yaml file describing my test (attached).
> I am collecting events per user id (text, partition key). Events have a
> session type (text), event type (text), and creation time (timestamp)
> (clustering keys, in that order), plus some more attributes required for
> rendering the events in a UI.
> For testing purposes I ended up with the following column spec and insert
> distribution:
> {noformat}
> columnspec:
>   - name: created_at
>     cluster: uniform(10..1)
>   - name: event_type
>     size: uniform(5..10)
>     population: uniform(1..30)
>     cluster: uniform(1..30)
>   - name: session_type
>     size: fixed(5)
>     population: uniform(1..4)
>     cluster: uniform(1..4)
>   - name: user_id
>     size: fixed(15)
>     population: uniform(1..100)
>   - name: message
>     size: uniform(10..100)
>     population: uniform(1..100B)
>
> insert:
>   partitions: fixed(1)
>   batchtype: UNLOGGED
>   select: fixed(1)/120
> {noformat}
> Running the stress tool for just the insert prints
> {noformat}
> Generating batches with [1..1] partitions and [0..1] rows (of [10..120] total rows in the partitions)
> {noformat}
> and then immediately starts flooding me with
> {{com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large}}.
> Why I should be exceeding the {{batch_size_fail_threshold_in_kb: 50}} in the
> {{cassandra.yaml}} I do not understand. My understanding is that the stress
> tool should generate one row per batch. The size of a single row should not
> exceed {{8+10*3+5*3+15*3+100*3 = 398 bytes}}, assuming a worst case of all
> text characters being 3-byte unicode characters.
> This is how I start the attached user scenario:
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 10.211.55.8
> INFO  08:00:07 Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
> INFO  08:00:08 Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
> INFO  08:00:08 New Cassandra host /10.211.55.8:9042 added
> Connected to cluster: Titan_DEV
> Datatacenter: datacenter1; Host: /10.211.55.8; Rack: rack1
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [0..1] rows (of [10..120] total rows in the partitions)
> com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
> 	at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
> 	at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:271)
> 	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:185)
> 	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:55)
> 	at org.apache.cassandra.stress.operations.userdefined.SchemaInsert$JavaDriverRun.run(SchemaInsert.java:87)
> 	at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:159)
> 	at org.apache.cassandra.stress.operations.userdefined.SchemaInsert.run(SchemaInsert.java:119)
> 	at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:309)
> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
> 	at com.datastax.driver.core.Responses$Error.asException(Responses.java:125)
> 	at com.datastax.driver.core.DefaultResult
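As a side note, the reporter's worst-case single-row estimate ({{8+10*3+5*3+15*3+100*3 = 398 bytes}}) can be sanity-checked; this is a minimal sketch, assuming (as the report does) 3 bytes per character as the worst case for all text columns:

```python
# Worst-case bytes for a single generated row, per the column spec quoted above.
TIMESTAMP_BYTES = 8   # created_at: timestamp
BYTES_PER_CHAR = 3    # assumed worst-case UTF-8 width for text columns

max_chars = {
    "event_type": 10,   # size: uniform(5..10)
    "session_type": 5,  # size: fixed(5)
    "user_id": 15,      # size: fixed(15)
    "message": 100,     # size: uniform(10..100)
}

row_bytes = TIMESTAMP_BYTES + sum(n * BYTES_PER_CHAR for n in max_chars.values())
print(row_bytes)  # 398 -- far below the 50 KB batch_size_fail_threshold_in_kb
```

So a single row is indeed tiny; the question the later comments answer is how many rows end up in one batch.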
[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large
[ https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026814#comment-17026814 ] Michael Semb Wever commented on CASSANDRA-11105:
------------------------------------------------
I'd prefer to close the ticket out as 'won't do', because not only is there a workaround, but that workaround is probably closer to what you are trying to benchmark. That is, big batches are not normal and not recommended. That cassandra-stress uses batches by default is unfortunate, and it is even more unfortunate that it is so convoluted to make batches consist of only single inserts.
[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large
[ https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026804#comment-17026804 ] Josh McKenzie commented on CASSANDRA-11105:
-------------------------------------------
[~adejanovski] - it's been almost 3 years since your patch on this ticket. Are you still active on the project, and do you by any chance want to move this forward (i.e. should we rebase and drum up a reviewer here)? If not, [~mck] - do you have cycles to take this on, or perhaps a position on its importance to 4.0?
[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large
[ https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533307#comment-16533307 ] mck commented on CASSANDRA-11105:
---------------------------------
For reference's sake: in newer versions the syntax is {{-insert visits=FIXED\(10M\)}}. For example:
{code}./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -insert visits=FIXED\(10M\) -log level=verbose file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 10.211.55.8{code}
"FIXED" can also be any of the distribution specifications found [here|https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/tools/stress/src/org/apache/cassandra/stress/settings/OptionDistribution.java#L158-L170].
[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large
[ https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731138#comment-15731138 ] mck commented on CASSANDRA-11105:
---------------------------------
[~ralfsteppacher],
{quote}My understanding is that the stress tool should generate one row per batch.{quote}
The stress tool generates one batch request (or operation) per partition. As you're asking for partitions with up to 1.2M rows here, the batch statements are growing too large.
A simple way of dealing with this, if I've understood you correctly, is to limit the number of rows per partition included within a batch statement by using the "visits" option, i.e. adding {{-insert visits=\(X\)}} to the command line, where X is the number of divisions of each partition you'd like to see. Setting it to "10M" would effectively ensure one request per row, i.e. batch statements with only one insert each. For example:
{code}./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -insert visits=\(10M\) -log level=verbose file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 10.211.55.8{code}
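The one-batch-per-partition behaviour makes the threshold overrun easy to see numerically. A rough, hypothetical estimate, taking the product of the cluster() maxima printed in the quoted profile (10 × 30 × 4 rows; the comment above suggests the real profile yields far larger partitions still) and the reporter's 398-byte worst-case row, ignoring per-statement batch overhead:

```python
# Rough size estimate of an unlogged batch covering one full partition.
rows_per_partition = 10 * 30 * 4   # product of the cluster() maxima in the quoted profile
worst_case_row_bytes = 398         # reporter's worst-case single-row estimate
threshold_bytes = 50 * 1024        # batch_size_fail_threshold_in_kb: 50

batch_bytes = rows_per_partition * worst_case_row_bytes
print(batch_bytes, batch_bytes > threshold_bytes)  # 477600 True -> "Batch too large"
```

So even this conservative reading of the profile already puts a full-partition batch an order of magnitude over the 50 KB fail threshold, which is why the error fires immediately.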
[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large
[ https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128631#comment-15128631 ] Eric Evans commented on CASSANDRA-11105:
----------------------------------------
FWIW, I'm seeing the same thing (2.1.12), [yaml gist here|https://gist.github.com/eevans/1babf3fab9206951d7e6]. When I run this config with {{n=1}}, I can see that 50 CQL rows are added, all with the same partition key, with two unique {{rev}} columns (25 each).