tax.com/dev/blog/cassandra-2-1-now-
>>>>>>>> over-50-faster) which massively helps performance. It provides
>>>>>>>> the benefit of batches but without the coordinator overhead.
>>>>>>>>
>>>>>>>> Ca
>>>>>>>> have 100 servers, and perform a mutation on 100 partitions, you could
>>>>>>>> have
>>>>>>>> a coordinator that's
>>>>>>>>
>>>>>>>> 1) talking to every machine in the cluster and
>>>>>>>> Jonathan says “It is absolutely not going to help you if you're
>>>>>>>> trying to lump queries together to reduce network & server overhead -
>>>>>>>> in
>>>>>>>> fact it'll do the opposite”, but I would note that the CQL3 spec says “
>>>>>>>> The BATCH statement ... serves
ps between the client and the server (and sometimes
>>>>>>> between the server coordinator and the replicas) when batching multiple
>>>>>>> updates.” Is the spec inaccurate? I mean, it seems in conflict with your
>>>>>>> statement.
>>>>>>>
>>>>>>> See:
>>>>>>> https://cassandra.apache.org/doc/cql3/CQL.html
>>>>>>>
>>>>>>> I see the spec as gospel – if it’s not accurate, let’s propose a
>
er
>>>>>> coordinator/replicas. However, because of the distributed nature of
>>>>>> Cassandra, spread requests across nearby nodes as much as possible to
>>>>>> optimize performance. Using batches to optimize performance is usually not
>>>>> successful, as described in Using and misusing batches section. For
>>>>> information about the fastest way to load data, see "Cassandra: Batch
>>>>> loading without the Batch keyword."”
>>>>>
atch”, which is
>>>> simply a way to collect “batches” of operations in the client/driver and
>>>> then let the driver determine what degree of batching and asynchronous
>>>> operation is appropriate.
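
As a rough illustration of collecting operations in the client and letting it decide how to dispatch them, the sketch below groups pending writes by partition key and sends one asynchronous request per partition with a cap on in-flight requests. It uses only the Python standard library: `send_group` is a hypothetical stand-in for a real driver call, and `max_in_flight=32` is an arbitrary assumption, not a recommended value.

```python
import asyncio
from collections import defaultdict


def group_by_partition(mutations):
    """Group (partition_key, payload) pairs so each group touches one partition."""
    groups = defaultdict(list)
    for partition_key, payload in mutations:
        groups[partition_key].append(payload)
    return dict(groups)


async def send_group(partition_key, payloads):
    """Hypothetical stand-in for one async write; a real driver call would go here."""
    await asyncio.sleep(0)  # yield control, simulating non-blocking I/O
    return partition_key, len(payloads)


async def write_all(mutations, max_in_flight=32):
    """Send one request per partition, with at most max_in_flight running at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(partition_key, payloads):
        async with limit:
            return await send_group(partition_key, payloads)

    groups = group_by_partition(mutations)
    results = await asyncio.gather(*(bounded(k, v) for k, v in groups.items()))
    return dict(results)


if __name__ == "__main__":
    sample = [("user:1", "a"), ("user:2", "b"), ("user:1", "c")]
    print(asyncio.run(write_all(sample)))
```

Because each request covers a single partition, no one coordinator has to fan a multi-partition batch out to the rest of the cluster; the in-flight cap is the knob that could be made dynamic.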
>>>>
>>>> It might also be nice t
>>> connections, and to have that be dynamic based
>>> on overall cluster load.
>>>
>>> I would also note that the example in the spec has multiple inserts with
>>> different partition key values, which flies in the face of the admonition
>>> to refrain from using server-side distribution of requests.
>>>
>>> At a minimum the CQL
verhead - in fact it'll do the
>>> opposite. If you're trying to do that, instead perform many async
>>> queries. The overhead of batches in cassandra is significant and you're
>>> going to hit a lot of problems if you use them excessively (timeou
Ryan,
>>>
>>> Thanks for the quick response.
>>>
>>>
>>>
>>> I did see that jira before posting my question on this list. However, I
>>> didn’t see any information about why 5kb+ data will cause instability. 5kb
>>> or e
ore clear statement of intent and
> non-intent for BATCH.
>
> -- Jack Krupansky
>
> *From:* Jonathan Haddad
> *Sent:* Friday, December 12, 2014 12:58 PM
> *To:* user@cassandra.apache.org ; Ryan Svihla
> *Subject:* Re: batch_size_warn_threshold_in_kb
>
> The really important thing
>>
>>
>> In addition, Patrick is saying that he does not recommend more than 100
>> mutations per batch. So why not warn users just on the # of mutations in a
>> batch?
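
To make the size-versus-count question concrete, here is a minimal client-side sketch that checks a batch against both a kilobyte threshold and a mutation count. It is not how Cassandra computes batch size: `batch_warnings` is a hypothetical helper, mutations are assumed to be already-serialized `bytes`, and 1 kb is assumed to mean 1024 bytes.

```python
WARN_THRESHOLD_KB = 5   # mirrors the 5kb default mentioned in the thread
MAX_MUTATIONS = 100     # the per-batch count attributed to Patrick in the thread


def batch_warnings(mutations, threshold_kb=WARN_THRESHOLD_KB,
                   max_mutations=MAX_MUTATIONS):
    """Return warning strings for a batch given as a list of serialized mutations."""
    warnings = []
    total_bytes = sum(len(m) for m in mutations)
    # Size check: a few large mutations can trip this with a low count.
    if total_bytes > threshold_kb * 1024:
        warnings.append(
            f"batch is {total_bytes} bytes, over the {threshold_kb}kb threshold"
        )
    # Count check: many tiny mutations can trip this while staying small.
    if len(mutations) > max_mutations:
        warnings.append(
            f"batch has {len(mutations)} mutations, over the limit of {max_mutations}"
        )
    return warnings
```

The two checks catch different cases: six 1 KB mutations exceed the size threshold with a low count, while 101 one-byte mutations exceed the count but stay far below 5 KB.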
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Ryan Svi
, 2014 12:58 PM
To: user@cassandra.apache.org ; Ryan Svihla
Subject: Re: batch_size_warn_threshold_in_kb
The really important thing to really take away from Ryan's original post is
that batches are not there for performance. The only case I consider batches
to be useful for is when you absolut
>
>
> *From:* Ryan Svihla [mailto:rsvi...@datastax.com]
> *Sent:* Thursday, December 11, 2014 12:56 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: batch_size_warn_threshold_in_kb
>
>
>
> Nothing magic, just put in there based on experience. You can find the
> s
up for debate."
>>
>> It's totally changeable, however, it's there in no small part because so
>> many people confuse the BATCH keyword as a performance optimization, this
>> helps flag those cases of misuse.
>>
>> On Thu, Dec 11, 2014 at 2:43 PM, M
g those cases of misuse.
> On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller
> wrote:
>>
>> Hi –
>>
>> The cassandra.yaml file has a property called *batch_size_warn_threshold_in_kb*.
>>
>> The default size is 5kb and according to the comments in the ya
@cassandra.apache.org
Subject: Re: batch_size_warn_threshold_in_kb
Nothing magic, just put in there based on experience. You can find the story
behind the original recommendation here
https://issues.apache.org/jira/browse/CASSANDRA-6487
Key reasoning for the desire comes from Patrick McFadden:
"Yes
roperty called *batch_size_warn_threshold_in_kb*.
>
> The default size is 5kb and according to the comments in the yaml file, it
> is used to log WARN on any batch size exceeding this value in kilobytes. It
> says caution should be taken on increasing the size of this threshold as it
Hi -
The cassandra.yaml file has a property called batch_size_warn_threshold_in_kb.
The default size is 5kb and according to the comments in the yaml file, it is
used to log WARN on any batch size exceeding this value in kilobytes. It says
caution should be taken on increasing the size of this