Re: Huge Batches

techpyaasa . Fri, 09 Jun 2017 03:30:05 -0700

Hi Justin,

We have very few columns in the PK (at most 2 partition columns and at most 2
clustering columns), and there won't be a huge amount of data or a huge number
of primary keys.
I just want to print the names and values of these columns for huge batches.
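
A minimal client-side sketch of that idea, assuming the application can see each statement's key columns before it sends the batch. The Entry holder, its size estimate and the maxKeys cap are hypothetical names for illustration, not part of C* or the driver. The cap is there so that a batch with many keys does not flood the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Client-side check that mirrors the server's batch size warning but also reports key values. */
final class BatchKeyLogger {

    /** Hypothetical holder for one statement in a batch: target table, key columns, payload size. */
    static final class Entry {
        final String table;                    // e.g. "ks1.cf1"
        final Map<String, Object> keyColumns;  // partition and clustering columns, in insertion order
        final long sizeBytes;                  // caller's own estimate of the statement size

        Entry(String table, Map<String, Object> keyColumns, long sizeBytes) {
            this.table = table;
            this.keyColumns = new LinkedHashMap<>(keyColumns);
            this.sizeBytes = sizeBytes;
        }
    }

    /** Sums the estimated size of every entry in the batch. */
    static long totalSize(List<Entry> batch) {
        long size = 0;
        for (Entry e : batch) {
            size += e.sizeBytes;
        }
        return size;
    }

    /**
     * Returns a warning line with the key column names and values of the first
     * maxKeys entries, or null when the batch is within the threshold.
     */
    static String describeIfOversized(List<Entry> batch, long warnThreshold, int maxKeys) {
        long size = totalSize(batch);
        if (size <= warnThreshold) {
            return null;
        }
        List<String> keys = new ArrayList<>();
        for (Entry e : batch) {
            if (keys.size() == maxKeys) {
                break; // stop collecting once the cap is reached
            }
            keys.add(e.table + e.keyColumns); // e.g. ks1.cf1{user_id=42, day=2017-06-08}
        }
        int omitted = batch.size() - keys.size();
        return String.format(
            "Batch is of size %d, exceeding threshold of %d by %d; keys=%s (%d more omitted)",
            size, warnThreshold, size - warnThreshold, keys, omitted);
    }
}
```

The caller would pass the returned line to its own logger together with whatever identifies the user, which is the part the server-side warning does not give us.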


PS: We are using C* 2.1.

Thanks for the replies, @Justin and @Akhil.

On Fri, Jun 9, 2017 at 5:31 AM, Justin Cameron <jus...@instaclustr.com>
wrote:

> I don't believe the keys within a large batch are logged by Cassandra. A
> large batch could potentially contain tens of thousands of primary keys, so
> this could quickly fill up the logs.
>
> Here are a couple of suggestions:
>
>    - Large batches should also be slow, so you could try setting up slow
>    query logging in the Java driver and see what gets caught:
>    https://docs.datastax.com/en/developer/java-driver/3.2/manual/logging/
>    - You could write your own custom QueryHandler to log those details on
>    the server-side, as described here:
>    https://www.slideshare.net/planetcassandra/cassandra-summit-2014-lesser-known-features-of-cassandra-21
>
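
The slow-query idea in the first suggestion can be sketched in plain Java. This is not the driver's own slow query logging, only a generic timing wrapper that reports calls slower than a threshold. The names SlowCallLogger and timed are hypothetical, and the description string is assumed to be built by the caller from the key column names and values.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Times a call and reports it to a sink when it runs longer than a fixed threshold. */
final class SlowCallLogger {
    private final long thresholdNanos;
    private final Consumer<String> sink; // where slow-call lines go, e.g. logger::warn

    SlowCallLogger(long thresholdMillis, Consumer<String> sink) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
        this.sink = sink;
    }

    /** Runs the call, then emits one line if the elapsed time exceeded the threshold. */
    <T> T timed(String description, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed > thresholdNanos) {
                sink.accept(String.format("Slow call (%d ms): %s",
                    TimeUnit.NANOSECONDS.toMillis(elapsed), description));
            }
        }
    }
}
```

Wrapping the code path that executes a batch in timed(...) with a description such as "batch for user_id=42" would record which keys were involved whenever the batch turned out to be slow.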
>
> Cheers,
> Justin
>
> On Thu, 8 Jun 2017 at 18:49 techpyaasa . <techpya...@gmail.com> wrote:
>
>> Hi,
>>
>> Recently we have been seeing huge batches, with log entries like the one
>> below in the C* logs:
>>
>>
>> *Batch of prepared statements for [ks1.cf1] is of size 413350, exceeding
>> specified threshold of 5120 by 362150*
>> Along with the column family name (as shown in the log entry above), we
>> would also like to know the partition key and clustering column values
>> (along with their names), so that it is easy to trace the user who is
>> inserting such huge batches.
>>
>> I looked through the C* code base, as shown below, but could not figure out
>> how to get the values of the partition keys and clustering columns. :(
>> Can someone please help me out?
>>
>>     public static void verifyBatchSize(Iterable<ColumnFamily> cfs)
>>     {
>>         long size = 0;
>>         long warnThreshold = DatabaseDescriptor.getBatchSizeWarnThreshold();
>>
>>         for (ColumnFamily cf : cfs)
>>             size += cf.dataSize();
>>
>>         if (size > warnThreshold)
>>         {
>>             Set<String> ksCfPairs = new HashSet<>();
>>             for (ColumnFamily cf : cfs)
>>             {
>>                 ksCfPairs.add(String.format("%s.%s size=%s", cf.metadata().ksName, cf.metadata().cfName, cf.dataSize()));
>>                 // Exploratory: fetch the first cell name; its dataSize() result is currently unused.
>>                 Iterator<CellName> cns = cf.getColumnNames().iterator();
>>                 if (cns.hasNext())
>>                 {
>>                     CellName cn = cns.next();
>>                     cn.dataSize();
>>                 }
>>             }
>>
>>             String format = "Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.";
>>             logger.warn(format, ksCfPairs, size, warnThreshold, size - warnThreshold);
>>         }
>>     }
>>
>>
>> Thanks
>>
>> TechPyaasa
>>
> --
>
>
> Justin Cameron, Senior Software Engineer
