[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large

2020-01-30 Thread Alexander Dejanovski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026822#comment-17026822
 ] 

Alexander Dejanovski commented on CASSANDRA-11105:
--

I agree with [~mck].

The code has evolved too much anyway since my patch was written, and internally 
we've moved our efforts to a cassandra-stress replacement tool.

Happy to have the ticket closed as "won't do".

> cassandra-stress tool - InvalidQueryException: Batch too large
> --
>
> Key: CASSANDRA-11105
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11105
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools
> Environment: Cassandra 2.2.4, Java 8, CentOS 6.5
>Reporter: Ralf Steppacher
>Priority: Normal
> Fix For: 4.0
>
> Attachments: 11105-trunk.txt, batch_too_large.yaml
>
>
> I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress 
> tool to work for my test scenario. I have followed the example on 
> http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
>  to create a yaml file describing my test (attached).
> I am collecting events per user id (text, partition key). Events have a 
> session type (text), event type (text), and creation time (timestamp) 
> (clustering keys, in that order). Plus some more attributes required for 
> rendering the events in a UI. For testing purposes I ended up with the 
> following column spec and insert distribution:
> {noformat}
> columnspec:
>   - name: created_at
> cluster: uniform(10..1)
>   - name: event_type
> size: uniform(5..10)
> population: uniform(1..30)
> cluster: uniform(1..30)
>   - name: session_type
> size: fixed(5)
> population: uniform(1..4)
> cluster: uniform(1..4)
>   - name: user_id
> size: fixed(15)
> population: uniform(1..100)
>   - name: message
> size: uniform(10..100)
> population: uniform(1..100B)
> insert:
>   partitions: fixed(1)
>   batchtype: UNLOGGED
>   select: fixed(1)/120
> {noformat}
> Running stress tool for just the insert prints 
> {noformat}
> Generating batches with [1..1] partitions and [0..1] rows (of [10..120] 
> total rows in the partitions)
> {noformat}
> and then immediately starts flooding me with 
> {{com.datastax.driver.core.exceptions.InvalidQueryException: Batch too 
> large}}. 
> Why I should be exceeding the {{batch_size_fail_threshold_in_kb: 50}} in the 
> {{cassandra.yaml}} I do not understand. My understanding is that the stress 
> tool should generate one row per batch. The size of a single row should not 
> exceed {{8+10*3+5*3+15*3+100*3 = 398 bytes}}, assuming a worst case of all 
> text characters being 3-byte unicode characters.
> This is how I start the attached user scenario:
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user 
> profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose 
> file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 
> 10.211.55.8
> INFO  08:00:07 Did not find Netty's native epoll transport in the classpath, 
> defaulting to NIO.
> INFO  08:00:08 Using data-center name 'datacenter1' for 
> DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct 
> datacenter name with DCAwareRoundRobinPolicy constructor)
> INFO  08:00:08 New Cassandra host /10.211.55.8:9042 added
> Connected to cluster: Titan_DEV
> Datatacenter: datacenter1; Host: /10.211.55.8; Rack: rack1
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [0..1] rows (of [10..120] 
> total rows in the partitions)
> com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
>   at 
> com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
>   at 
> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:271)
>   at 
> com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:185)
>   at 
> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:55)
>   at 
> org.apache.cassandra.stress.operations.userdefined.SchemaInsert$JavaDriverRun.run(SchemaInsert.java:87)
>   at 
> org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:159)
>   at 
> org.apache.cassandra.stress.operations.userdefined.SchemaInsert.run(SchemaInsert.java:119)
>   at 
> org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:309)
> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch 
> too large
>   at 
> com.datastax.driver.core.Responses$Error.asException(Responses.java:125)
>   at 
> com.datastax.driver.core.DefaultResult

[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large

2020-01-30 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026814#comment-17026814
 ] 

Michael Semb Wever commented on CASSANDRA-11105:


I'd prefer to close the ticket out as 'won't do', because not only is there a 
workaround, but that workaround is probably closer to what you are trying to 
benchmark. That is, big batches are not normal and not recommended. That 
cassandra-stress uses batches by default is unfortunate, and it is even more 
unfortunate that making batches consist of only single inserts is so 
convoluted.
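
(A sketch of that workaround, lifted from the comments further down in this 
thread: the {{-insert visits=...}} option divides each partition's rows across 
multiple operations, and a large fixed value such as {{FIXED(10M)}} 
effectively yields single-insert batches. The profile and node address below 
are the reporter's, kept only for illustration.)
{code}./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) 
-insert visits=FIXED\(10M\) -node 10.211.55.8{code}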


[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large

2020-01-30 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026804#comment-17026804
 ] 

Josh McKenzie commented on CASSANDRA-11105:
---

[~adejanovski] - it's been almost 3 years since your patch on this ticket. Are 
you still active on the project, and would you like to move this forward 
(i.e. should we rebase and drum up a reviewer here)? If not, [~mck] - do you 
have cycles to take this on, or perhaps a position on its importance to 4.0?


[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large

2018-07-04 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533307#comment-16533307
 ] 

mck commented on CASSANDRA-11105:
-

For reference's sake: in newer versions the syntax is {{-insert 
visits=FIXED\(10M\)}}.

For example:
{code}./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -insert 
visits=FIXED\(10M\) -log level=verbose 
file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 
10.211.55.8{code}

"FIXED" can also be any of the distribution specifications found 
[here|https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/tools/stress/src/org/apache/cassandra/stress/settings/OptionDistribution.java#L158-L170].
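
(As an illustration of that flexibility, any of the linked distribution 
specifications can be substituted for FIXED. This is a sketch only: the 
UNIFORM syntax below follows the linked OptionDistribution parser, and the 
bounds are invented for the example.)
{code}./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) 
-insert visits=UNIFORM\(1..10\) -node 10.211.55.8{code}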


[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large

2016-12-07 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731138#comment-15731138
 ] 

mck commented on CASSANDRA-11105:
-

[~ralfsteppacher],
{quote}My understanding is that the stress tool should generate one row per 
batch.{quote}

The stress tool generates one batch request (or operation) per partition. As 
you're asking for partitions with up to 1.2M rows here, the batch statements 
are blowing past the size threshold.

A simple way of dealing with this, if I've understood you correctly, should be 
to limit the number of rows per partition included within a batch statement by 
using the "visits" option, i.e. adding {{-insert visits=\(X\)}} to the command 
line, where X is the number of divisions within each batch statement you'd 
like to see. Setting it to "10M" would effectively ensure one request per row, 
i.e. batch statements with only one insert each.
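
(For scale, using the reporter's worst-case estimate of ~398 bytes per row: 
the 50 KB {{batch_size_fail_threshold_in_kb}} is crossed at roughly 
50*1024/398, i.e. around 128 rows in a single batch, so a partition of 
anywhere near 1.2M rows sent as one batch fails immediately.)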

For example:
{code}./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) 
-insert visits=\(10M\) -log level=verbose 
file=~/centos_event_by_patient_session_event_timestamp_insert_only.log -node 
10.211.55.8{code}


[jira] [Commented] (CASSANDRA-11105) cassandra-stress tool - InvalidQueryException: Batch too large

2016-02-02 Thread Eric Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128631#comment-15128631
 ] 

Eric Evans commented on CASSANDRA-11105:


FWIW, I'm seeing the same thing (2.1.12), [yaml gist 
here|https://gist.github.com/eevans/1babf3fab9206951d7e6]. When I run this 
config with {{n=1}}, I can see that 50 CQL rows are added, all with the same 
partition key, with two unique {{rev}} values (25 rows each).
