cqlsh COPY does batches intelligently by only grouping inserts targeting
the same partition in a batch.
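To make the idea concrete, here is a minimal sketch of per-partition batching. It is not cqlsh's actual code; the function name `batches_by_partition` and the sample rows are made up for illustration, and it only shows the grouping step (no driver calls).

```python
from collections import defaultdict


def batches_by_partition(rows, partition_key, max_batch_size):
    """Yield (key, rows) batches; every batch holds rows of one partition only."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[partition_key(row)].append(row)
    # Split each partition's rows into chunks of at most max_batch_size.
    for key, group in grouped.items():
        for start in range(0, len(group), max_batch_size):
            yield key, group[start:start + max_batch_size]


# Hypothetical sample data: "pkey" plays the role of the partition key.
rows = [
    {"pkey": "a", "skey": 1},
    {"pkey": "b", "skey": 1},
    {"pkey": "a", "skey": 2},
    {"pkey": "a", "skey": 3},
]
batches = list(batches_by_partition(rows, lambda r: r["pkey"], max_batch_size=2))
for key, batch in batches:
    print(key, [r["skey"] for r in batch])
```

Rows for different partitions never share a batch, and a partition with more rows than the limit is simply split into several batches.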
As of version 3.6, C* will not emit the "batch size exceeded" errors if all
statements in a batch belong to the same partition (CASSANDRA-13467).
The docs (https://cassandra.apache.org/doc/latest/tools/cqlsh.html#copy-from)
are a good reference for how to use COPY FROM.
https://www.datastax.com/dev/blog/new-features-in-cqlsh-copy is also a good
reference.
Here's an example from something I was working on locally:
cqlsh -e "COPY andy.table100b (pkey,skey,text1,text2,text3,text4,text5)
from 'csv/ordered/100b/*.csv' WITH header = true AND INGESTRATE=1000000 AND
NUMPROCESSES=32 AND MAXBATCHSIZE=100;" myhostname
Note that you should probably still keep your batches relatively small, even
with single-partition batches, depending on your dataset. In my particular
case I was working with relatively small data (100-byte rows). There are
diminishing returns in terms of throughput as you increase your batch
size, but that will vary based on your data and environment.
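As a rough sanity check when picking MAXBATCHSIZE, you can estimate the payload of one batch from the row size. This is only back-of-the-envelope arithmetic: the helper name `estimated_batch_kb` is made up, and the estimate ignores protocol and per-statement overhead, so treat the numbers as a lower bound.

```python
def estimated_batch_kb(rows_per_batch, avg_row_bytes):
    """Approximate batch payload in KB (row data only, no protocol overhead)."""
    return rows_per_batch * avg_row_bytes / 1024.0


# 100-byte rows as in my case; 100 is the MAXBATCHSIZE from the command above.
for rows_per_batch in (10, 100, 1000):
    kb = estimated_batch_kb(rows_per_batch, 100)
    print(rows_per_batch, "rows ->", round(kb, 1), "KB")
```

You can compare the result against batch_size_warn_threshold_in_kb and batch_size_fail_threshold_in_kb in your cassandra.yaml to get a feel for how large your batches are relative to what the cluster considers big.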
On Wed, Oct 25, 2017 at 11:51 AM Suresh Babu Mallampati <
> Hi All,
> Can someone provide me a code snippet for cqlsh COPY from a CSV file?
> I just want to know how the COPY mechanism works compared to normal
> insert/commit, to avoid exceeding the batch size limit.