Re: various TTL datas in one table (TWCS)

2020-10-28 Thread Jeff Jirsa


Those properties, but 21600 is probably more aggressive than I’d use myself - 
I’m not 100% sure, but I suspect I’d try something over 12 hours.
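
A minimal CQL sketch of that suggestion - not a definitive recommendation - assuming the my_table from the original post and reusing the subproperties quoted below; 12 hours expressed in seconds is 43200:

ALTER TABLE my_table WITH compaction = {
    'class' : 'TimeWindowCompactionStrategy',
    'compaction_window_unit' : 'HOURS', 'compaction_window_size' : 12,
    'unchecked_tombstone_compaction' : true, 'tombstone_threshold' : 0.05,
    'tombstone_compaction_interval' : 43200 };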

> On Oct 28, 2020, at 10:37 PM, Eunsu Kim  wrote:
> 
> 
> Thank you for your response.
> 
> What subproperties do you mean specifically?
> 
> Currently, I have the following settings for aggressive purging.
> 
> AND COMPACTION = { 'class' : 'TimeWindowCompactionStrategy', 
> 'compaction_window_unit' : 'HOURS', 'compaction_window_size' : 12, 
> 'unchecked_tombstone_compaction': true, 'tombstone_threshold' : 0.05, 
> 'tombstone_compaction_interval' : 21600 }
> AND gc_grace_seconds = 600
> 
> Apache Cassandra Version 3.11.4
> 
> 
>> On 2020. 10. 29. 12:26, Jeff Jirsa wrote:
>> 
>> Works, but it requires you to enable the tombstone compaction subproperties if you 
>> need to purge the 2w TTL data before the highest TTL time you chose.
>> 
>>> On Oct 28, 2020, at 5:58 PM, Eunsu Kim  wrote:
>>> 
>>> Hello,
>>> 
>>> I have a table with a default TTL(2w). I'm using TWCS(window size : 12h) on 
>>> the recommendation of experts. This table is quite big, high WPS.
>>> 
>>> I would like to insert data with a TTL different from the default into this table, 
>>> according to the type of data.
>>> About four different TTLs (4w, 6w, 8w, 10w)
>>> 
>>> ex.)
>>> INSERT INTO my_table (…..) VALUES (….) USING TTL 4w
>>> 
>>> 
>>> Could this cause performance problems or unexpected problems in the 
>>> compaction?
>>> 
>>> Please give me advice,
>>> 
>>> Thank you.
>>> 
>> 
>> 
> 


Re: various TTL datas in one table (TWCS)

2020-10-28 Thread Eunsu Kim
Thank you for your response.

What subproperties do you mean specifically?

Currently, I have the following settings for aggressive purging.

AND COMPACTION = { 'class' : 'TimeWindowCompactionStrategy', 
'compaction_window_unit' : 'HOURS', 'compaction_window_size' : 12, 
'unchecked_tombstone_compaction': true, 'tombstone_threshold' : 0.05, 
'tombstone_compaction_interval' : 21600 }
AND gc_grace_seconds = 600

Apache Cassandra Version 3.11.4


> On 2020. 10. 29. 12:26, Jeff Jirsa wrote:
> 
> Works, but it requires you to enable the tombstone compaction subproperties if you 
> need to purge the 2w TTL data before the highest TTL time you chose.
> 
>> On Oct 28, 2020, at 5:58 PM, Eunsu Kim  wrote:
>> 
>> Hello,
>> 
>> I have a table with a default TTL(2w). I'm using TWCS(window size : 12h) on 
>> the recommendation of experts. This table is quite big, high WPS.
>> 
>> I would like to insert data with a TTL different from the default into this table, 
>> according to the type of data.
>> About four different TTLs (4w, 6w, 8w, 10w)
>> 
>> ex.)
>> INSERT INTO my_table (…..) VALUES (….) USING TTL 4w
>> 
>> 
>> Could this cause performance problems or unexpected problems in the 
>> compaction?
>> 
>> Please give me advice,
>> 
>> Thank you.
>> 
> 
> 



Re: various TTL datas in one table (TWCS)

2020-10-28 Thread Jeff Jirsa
Works, but it requires you to enable the tombstone compaction subproperties if you 
need to purge the 2w TTL data before the highest TTL time you chose.

> On Oct 28, 2020, at 5:58 PM, Eunsu Kim  wrote:
> 
> Hello,
> 
> I have a table with a default TTL(2w). I'm using TWCS(window size : 12h) on 
> the recommendation of experts. This table is quite big, high WPS.
> 
> I would like to insert data with a TTL different from the default into this table, 
> according to the type of data.
> About four different TTLs (4w, 6w, 8w, 10w)
> 
> ex.)
> INSERT INTO my_table (…..) VALUES (….) USING TTL 4w
> 
> 
> Could this cause performance problems or unexpected problems in the 
> compaction?
> 
> Please give me advice,
> 
> Thank you.
> 




various TTL datas in one table (TWCS)

2020-10-28 Thread Eunsu Kim
Hello,

I have a table with a default TTL (2w). I'm using TWCS (window size: 12h) on the 
recommendation of experts. This table is quite big, with high WPS.

I would like to insert data with a TTL different from the default into this table, 
according to the type of data.
About four different TTLs (4w, 6w, 8w, 10w)

ex.)
INSERT INTO my_table (…..) VALUES (….) USING TTL 4w
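
Note: CQL expresses TTL as a whole number of seconds rather than a "4w"-style duration, so written out the statement above would look roughly like this (the id and value columns are hypothetical placeholders):

-- 4 weeks = 4 * 7 * 24 * 3600 = 2419200 seconds
INSERT INTO my_table (id, value) VALUES (uuid(), 'example') USING TTL 2419200;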


Could this cause performance problems or unexpected problems in the compaction?

Please give me advice,

Thank you.



Re: Running and Managing Large Cassandra Clusters

2020-10-28 Thread Tom van der Woerdt
That particular cluster exists for archival purposes, and as such gets a
very low amount of traffic (maybe 5 queries per minute). So not
particularly helpful to answer your question :-) With that said, we've seen
in other clusters that scalability issues are much more likely to come from
hot partitions, hardware change rate (so basically any change to the token
ring, which we never do concurrently), repairs (though largely mitigated
now that we've switched to num_tokens=16), and connection count (sometimes
I'd consider it advisable to configure drivers to *not* establish a
connection to every node, but bound this and let the Cassandra coordinator
route requests instead).

The scalability in terms of client requests/reads/writes tends to be pretty
linear with the node count (and size of course), and on clusters that are
slightly smaller we can see this as well, easily doing hundreds of
thousands to a million queries per second.

As for repairs, we have our own tools for this, but it's fairly similar to
what Reaper does: we take all the ranges in the cluster and then schedule
them to be repaired over the course of a week. No manual `nodetool repair`
invocations, but specific single-range repairs.
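
For reference, a single-range repair of that kind can be expressed with nodetool's subrange options; the token values and keyspace below are placeholders rather than the actual tooling described above:

nodetool repair -full -st <start_token> -et <end_token> my_keyspace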

Tom van der Woerdt
Senior Site Reliability Engineer

Booking.com BV
Vijzelstraat Amsterdam Netherlands 1017HL
[image: Booking.com] 
Making it easier for everyone to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
million reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)


On Wed, Oct 28, 2020 at 2:20 PM Gediminas Blazys
 wrote:

> Hey,
>
>
>
> Thanks for chipping in, Tom. Could you describe what sort of workload the
> big cluster is receiving in terms of local C* reads, writes and client
> requests as well?
>
>
>
> You mention repairs, how do you run them?
>
>
>
> Gediminas
>
>
>
> *From:* Tom van der Woerdt 
> *Sent:* Wednesday, October 28, 2020 14:35
> *To:* user 
> *Subject:* [EXTERNAL] Re: Running and Managing Large Cassandra Clusters
>
>
>
> Heya,
>
>
>
> We're running version 3.11.7, can't use 3.11.8 as it won't even start
> (CASSANDRA-16091). Our policy is to use LCS for everything unless there's a
> good argument for a different compaction strategy (I don't think we have
> *any* STCS at all other than system keyspaces). Since our nodes are mostly
> on-prem they are generally oversized on cpu count, but when idle the
> cluster with 360 nodes ends up using less than two cores *peak* for
> background tasks like (full, weekly) repairs and tombstone compactions.
> That said they do get 32 logical threads because that's what the hardware
> ships with (-:
>
>
>
> Haven't had major problems with Gossip over the years. I think we've had
> to run nodetool assassinate exactly once, a few years ago. Probably the
> only gossip related annoyance is that when you decommission all seed nodes
> Cassandra will happily run a single core at 100% trying to connect until
> you update the list of seeds, but that's really minor.
>
>
>
> There's also one cluster that has 50TB nodes, 60 of them, storing
> reasonably large cells (using LCS, previously TWCS, both fine). Replacing a
> node takes a few days, but other than that it's not particularly
> problematic.
>
>
>
> In my experience it's the small clusters that wake you up ;-)
>
>
> *Tom van der Woerdt*
>
> Senior Site Reliability Engineer
>
> Booking.com BV
> Vijzelstraat Amsterdam Netherlands 1017HL
>
> *[image: Booking.com]*
> 
>
> Making it easier for everyone to experience the world since 1996
>
> 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
> million reported listings
> Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)
>
>
>
>
>
> On Wed, Oct 28, 2020 at 12:32 PM Joshua McKenzie 
> wrote:
>
> A few questions for you Tom if you have 30 seconds and care to disclose:
>
>1. What version of C*?
>2. What compaction strategy?
>3. What's core count allocated per C* node?
>4. Gossip give you any headaches / you have to be delicate there or
>does it behave itself?
>
> Context: pmc/committer and I manage the OSS C* team at DataStax. We're
> doing a lot of thinking about how to generally improve the operator
> experience across the board for folks in the post 4.0 time frame, so data
> like the above (where things are going well at scale and why) is super
> useful to help feed into that effort.
>
>
>
> Thanks!
>
>
>
>
>
>
>
> On Wed, Oct 

RE: [EXTERNAL] Re: Running and Managing Large Cassandra Clusters

2020-10-28 Thread Gediminas Blazys
Hey,

Thanks for chipping in, Tom. Could you describe what sort of workload the big 
cluster is receiving in terms of local C* reads, writes and client requests as 
well?

You mention repairs, how do you run them?

Gediminas

From: Tom van der Woerdt 
Sent: Wednesday, October 28, 2020 14:35
To: user 
Subject: [EXTERNAL] Re: Running and Managing Large Cassandra Clusters

Heya,

We're running version 3.11.7, can't use 3.11.8 as it won't even start 
(CASSANDRA-16091). Our policy is to use LCS for everything unless there's a 
good argument for a different compaction strategy (I don't think we have *any* 
STCS at all other than system keyspaces). Since our nodes are mostly on-prem 
they are generally oversized on cpu count, but when idle the cluster with 360 
nodes ends up using less than two cores *peak* for background tasks like (full, 
weekly) repairs and tombstone compactions. That said they do get 32 logical 
threads because that's what the hardware ships with (-:

Haven't had major problems with Gossip over the years. I think we've had to run 
nodetool assassinate exactly once, a few years ago. Probably the only gossip 
related annoyance is that when you decommission all seed nodes Cassandra will 
happily run a single core at 100% trying to connect until you update the list 
of seeds, but that's really minor.

There's also one cluster that has 50TB nodes, 60 of them, storing reasonably 
large cells (using LCS, previously TWCS, both fine). Replacing a node takes a 
few days, but other than that it's not particularly problematic.

In my experience it's the small clusters that wake you up ;-)

Tom van der Woerdt
Senior Site Reliability Engineer
Booking.com BV
Vijzelstraat Amsterdam Netherlands 1017HL
[Booking.com]
Making it easier for everyone to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 million 
reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)


On Wed, Oct 28, 2020 at 12:32 PM Joshua McKenzie <jmcken...@apache.org> wrote:

A few questions for you Tom if you have 30 seconds and care to disclose:

  1.  What version of C*?
  2.  What compaction strategy?
  3.  What's core count allocated per C* node?
  4.  Gossip give you any headaches / you have to be delicate there or does it 
behave itself?
Context: pmc/committer and I manage the OSS C* team at DataStax. We're doing a 
lot of thinking about how to generally improve the operator experience across 
the board for folks in the post 4.0 time frame, so data like the above (where 
things are going well at scale and why) is super useful to help feed into that 
effort.

Thanks!



On Wed, Oct 28, 2020 at 7:14 AM, Tom van der Woerdt <tom.vanderwoe...@booking.com.invalid> wrote:
Does 360 count? :-)

num_tokens is 16, works fine (had 256 on a 300 node cluster as well, not too 
many problems either). Roughly 2.5TB per node, running on-prem on reasonably 
stable hardware so replacements end up happening once a week at most, and 
there's no particular change needed in the automation. Scaling up or down takes 
a while, but it doesn't appear to be slower than any other cluster. 
Configuration wise it's no different than a 5-node cluster either. Pretty 
uneventful tbh.

Tom van der Woerdt
Senior Site Reliability Engineer
Booking.com
 BV
Vijzelstraat Amsterdam Netherlands 1017HL
[Booking.com]
Making it easier for everyone to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 million 
reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)


On Wed, Oct 28, 2020 at 8:58 AM Gediminas Blazys <gediminas.bla...@microsoft.com.invalid> wrote:
Hello,

I wanted to seek out your opinion and experience.

Has anyone of you had a chance to run a Cassandra cluster of more than 350 
nodes?
What are the major 

Re: Running and Managing Large Cassandra Clusters

2020-10-28 Thread Tom van der Woerdt
Heya,

We're running version 3.11.7, can't use 3.11.8 as it won't even start
(CASSANDRA-16091). Our policy is to use LCS for everything unless there's a
good argument for a different compaction strategy (I don't think we have
*any* STCS at all other than system keyspaces). Since our nodes are mostly
on-prem they are generally oversized on cpu count, but when idle the
cluster with 360 nodes ends up using less than two cores *peak* for
background tasks like (full, weekly) repairs and tombstone compactions.
That said they do get 32 logical threads because that's what the hardware
ships with (-:

Haven't had major problems with Gossip over the years. I think we've had to
run nodetool assassinate exactly once, a few years ago. Probably the only
gossip related annoyance is that when you decommission all seed nodes
Cassandra will happily run a single core at 100% trying to connect until
you update the list of seeds, but that's really minor.

There's also one cluster that has 50TB nodes, 60 of them, storing
reasonably large cells (using LCS, previously TWCS, both fine). Replacing a
node takes a few days, but other than that it's not particularly
problematic.

In my experience it's the small clusters that wake you up ;-)

Tom van der Woerdt
Senior Site Reliability Engineer

Booking.com BV
Vijzelstraat Amsterdam Netherlands 1017HL
[image: Booking.com] 
Making it easier for everyone to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
million reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)


On Wed, Oct 28, 2020 at 12:32 PM Joshua McKenzie 
wrote:

> A few questions for you Tom if you have 30 seconds and care to disclose:
>
>1. What version of C*?
>2. What compaction strategy?
>3. What's core count allocated per C* node?
>4. Gossip give you any headaches / you have to be delicate there or
>does it behave itself?
>
> Context: pmc/committer and I manage the OSS C* team at DataStax. We're
> doing a lot of thinking about how to generally improve the operator
> experience across the board for folks in the post 4.0 time frame, so data
> like the above (where things are going well at scale and why) is super
> useful to help feed into that effort.
>
> Thanks!
>
>
>
> On Wed, Oct 28, 2020 at 7:14 AM, Tom van der Woerdt <
> tom.vanderwoe...@booking.com.invalid> wrote:
>
>> Does 360 count? :-)
>>
>> num_tokens is 16, works fine (had 256 on a 300 node cluster as well, not
>> too many problems either). Roughly 2.5TB per node, running on-prem on
>> reasonably stable hardware so replacements end up happening once a week at
>> most, and there's no particular change needed in the automation. Scaling up
>> or down takes a while, but it doesn't appear to be slower than any other
>> cluster. Configuration wise it's no different than a 5-node cluster either.
>> Pretty uneventful tbh.
>>
>> Tom van der Woerdt
>> Senior Site Reliability Engineer
>>
>> Booking.com  BV
>> Vijzelstraat Amsterdam Netherlands 1017HL
>> [image: Booking.com] 
>> Making it easier for everyone to experience the world since 1996
>> 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
>> million reported listings
>> Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)
>>
>>
>> On Wed, Oct 28, 2020 at 8:58 AM Gediminas Blazys <
>> gediminas.bla...@microsoft.com.invalid> wrote:
>>
>>> Hello,
>>>
>>>
>>>
>>> I wanted to seek out your opinion and experience.
>>>
>>>
>>>
>>> Has anyone of you had a chance to run a Cassandra cluster of more than
>>> 350 nodes?
>>>
>>> What are the major configuration considerations that you had to focus
>>> on? What number of vnodes did you use?
>>>
>>> Once the cluster was up and running what would you have done differently?
>>>
>>> Perhaps it would be more manageable to run multiple smaller clusters?
>>> Did you try this approach? What were the major challenges?
>>>
>>>
>>>
>>> I don’t know if questions like that are allowed here but I’m really
>>> interested in what other folks ran into while running massive operations.
>>>
>>>
>>>
>>> Gediminas
>>>
>>
>


Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary upgrade

2020-10-28 Thread Steinmaurer, Thomas
Leon,

we had an awful performance/throughput experience with 3.x coming from 2.1. 
3.11 is simply a memory hog, if you are using batch statements on the client 
side. If so, you are likely affected by 
https://issues.apache.org/jira/browse/CASSANDRA-16201


Regards,
Thomas

From: Leon Zaruvinsky 
Sent: Wednesday, October 28, 2020 5:21 AM
To: user@cassandra.apache.org 
Subject: Re: GC pauses way up after single node Cassandra 2.2 -> 3.11 binary 
upgrade



Our JVM options are unchanged between 2.2 and 3.11

For the sake of clarity, do you mean:
(a) you're using the default JVM options in 3.11 and it's different to the 
options you had in 2.2?
(b) you've copied the same JVM options you had in 2.2 to 3.11?

(b), which are the default options from 2.2 (and I believe the default options 
in 3.11 from a brief glance).

Copied here for clarity, though I'm skeptical that GC settings are actually a 
cause here because I would expect them to only impact the upgraded node and not 
the cluster overall.

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=1
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled

The distinction is important because at the moment, you need to go through a 
process of elimination to identify the cause.


Read throughput (rate, bytes read/range scanned, etc.) seems fairly consistent 
before and after the upgrade across all nodes.

What I was trying to get at is whether the upgraded node was getting hit with 
more traffic compared to the other nodes, since that would indicate that the longer 
GCs are just the symptom, not the cause.


I don't see any distinct change, nor do I see an increase in traffic to the 
upgraded node that would result in longer GC pauses.  Frankly I don't see any 
changes or aberrations in client-related metrics at all that correlate to the 
GC pauses, except for the corresponding timeouts.
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4020 Linz, Austria, Am 
Fünfundzwanziger Turm 20


Re: Running and Managing Large Cassandra Clusters

2020-10-28 Thread Joshua McKenzie
A few questions for you Tom if you have 30 seconds and care to disclose:

   1. What version of C*?
   2. What compaction strategy?
   3. What's core count allocated per C* node?
   4. Gossip give you any headaches / you have to be delicate there or does
   it behave itself?

Context: pmc/committer and I manage the OSS C* team at DataStax. We're
doing a lot of thinking about how to generally improve the operator
experience across the board for folks in the post 4.0 time frame, so data
like the above (where things are going well at scale and why) is super
useful to help feed into that effort.

Thanks!



On Wed, Oct 28, 2020 at 7:14 AM, Tom van der Woerdt <
tom.vanderwoe...@booking.com.invalid> wrote:

> Does 360 count? :-)
>
> num_tokens is 16, works fine (had 256 on a 300 node cluster as well, not
> too many problems either). Roughly 2.5TB per node, running on-prem on
> reasonably stable hardware so replacements end up happening once a week at
> most, and there's no particular change needed in the automation. Scaling up
> or down takes a while, but it doesn't appear to be slower than any other
> cluster. Configuration wise it's no different than a 5-node cluster either.
> Pretty uneventful tbh.
>
> Tom van der Woerdt
> Senior Site Reliability Engineer
>
> Booking.com  BV
> Vijzelstraat Amsterdam Netherlands 1017HL
> [image: Booking.com] 
> Making it easier for everyone to experience the world since 1996
> 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
> million reported listings
> Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)
>
>
> On Wed, Oct 28, 2020 at 8:58 AM Gediminas Blazys <gediminas.bla...@microsoft.com.invalid> wrote:
>
> Hello,
>
>
>
> I wanted to seek out your opinion and experience.
>
>
>
> Has anyone of you had a chance to run a Cassandra cluster of more than 350
> nodes?
>
> What are the major configuration considerations that you had to focus on?
> What number of vnodes did you use?
>
> Once the cluster was up and running what would you have done differently?
>
> Perhaps it would be more manageable to run multiple smaller clusters? Did
> you try this approach? What were the major challenges?
>
>
>
> I don’t know if questions like that are allowed here but I’m really
> interested in what other folks ran into while running massive operations.
>
>
>
> Gediminas
>
>


Re: Running and Managing Large Cassandra Clusters

2020-10-28 Thread Tom van der Woerdt
Does 360 count? :-)

num_tokens is 16, works fine (had 256 on a 300 node cluster as well, not
too many problems either). Roughly 2.5TB per node, running on-prem on
reasonably stable hardware so replacements end up happening once a week at
most, and there's no particular change needed in the automation. Scaling up
or down takes a while, but it doesn't appear to be slower than any other
cluster. Configuration wise it's no different than a 5-node cluster either.
Pretty uneventful tbh.

Tom van der Woerdt
Senior Site Reliability Engineer

Booking.com BV
Vijzelstraat Amsterdam Netherlands 1017HL
[image: Booking.com] 
Making it easier for everyone to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
million reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)


On Wed, Oct 28, 2020 at 8:58 AM Gediminas Blazys
 wrote:

> Hello,
>
>
>
> I wanted to seek out your opinion and experience.
>
>
>
> Has anyone of you had a chance to run a Cassandra cluster of more than 350
> nodes?
>
> What are the major configuration considerations that you had to focus on?
> What number of vnodes did you use?
>
> Once the cluster was up and running what would you have done differently?
>
> Perhaps it would be more manageable to run multiple smaller clusters? Did
> you try this approach? What were the major challenges?
>
>
>
> I don’t know if questions like that are allowed here but I’m really
> interested in what other folks ran into while running massive operations.
>
>
>
> Gediminas
>
>
>


Running and Managing Large Cassandra Clusters

2020-10-28 Thread Gediminas Blazys
Hello,

I wanted to seek out your opinion and experience.

Has anyone of you had a chance to run a Cassandra cluster of more than 350 
nodes?
What are the major configuration considerations that you had to focus on? What 
number of vnodes did you use?
Once the cluster was up and running what would you have done differently?
Perhaps it would be more manageable to run multiple smaller clusters? Did you 
try this approach? What were the major challenges?

I don't know if questions like that are allowed here but I'm really interested 
in what other folks ran into while running massive operations.

Gediminas



Re: Cassandra timeout during read query

2020-10-28 Thread Attila Wind

Hey Deepak,

"Are you suggesting to reduce the fetchSize (right now fetchSize is 
5000) for this query?"


Definitely yes! If you went with only 1000, the specific Cassandra node(s) 
executing your query would have 5x more headroom to pull together each 
page of records in time - which helps you avoid the timeout issue.
Based on our measurements, smaller page sizes do not add much to the 
overall query time at all, but they help Cassandra a lot in eventually 
fulfilling the full request, since it can also do much better load 
balancing as you iterate over your result set.

I would give it a try - the same tactic helped a lot on our side.
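
A minimal sketch of that in code - assuming the DataStax Java driver 3.x (the same driver family as the snippet quoted further down), with a placeholder contact point, keyspace and query:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class SmallerPages {
    public static void main(String[] args) throws Exception {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            // Cap the page size at 1000 rows so each replica only has to
            // assemble a small page per round trip instead of 5000 rows.
            SimpleStatement stmt = new SimpleStatement("SELECT * FROM my_table");
            stmt.setFetchSize(1000);
            // The driver fetches the remaining pages transparently while iterating.
            for (Row row : session.execute(stmt)) {
                System.out.println(row); // placeholder for real row processing
            }
        }
    }
}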

I also recommend trying to optimize your data in parallel with the above 
- if possible and if there is room for improvement.
Everything I wrote earlier still counts a lot. You also need to take care 
of data cleanup strategies in your tables to keep the amount of data under 
control somehow. A TTL-based approach, e.g., is the best one if you ask me, 
especially if you have a huge data set.
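
As a rough sketch of that TTL-based approach (keyspace, table and the 14-day value are placeholders; default_time_to_live is expressed in seconds):

-- expire rows 14 days (1209600 seconds) after they are written
ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 1209600;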


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 27.10.2020 at 20:07, Deepak Sharma wrote:

Hi Attlila,

We did have larger partitions, which are now below the 100MB threshold 
after we ran nodetool repair. Now we see that most of the time the 
query runs complete successfully, but a small percentage of runs are 
still failing.


Regarding your comment ```considered with your fetchSize together 
(driver setting on the query level)```, can you elaborate more on it? 
Are you suggesting to reduce the fetchSize (right now fetchSize is 
5000) for this query?


Also, we are trying to use the prefetch feature as well, but it is not 
helping either. The code follows:


Iterator<Row> iter = resultSet.iterator();
while (iter.hasNext()) {
  // Ask the driver to start fetching the next page in the background
  // before the locally buffered rows run out.
  if (resultSet.getAvailableWithoutFetching() <= fetchSize && !resultSet.isFullyFetched()) {
    resultSet.fetchMoreResults();
  }
  Row row = iter.next();
  .
}

Thanks,
Deepak

On Sat, Sep 19, 2020 at 6:56 PM Deepak Sharma <sharma.dee...@salesforce.com> wrote:


Thanks Attila and Aaron for the response. These are great
insights. I will check and get back to you in case I have any
questions.

Best,
Deepak

On Tue, Sep 15, 2020 at 4:33 AM Attila Wind
 wrote:

Hi Deepak,

Aaron is right - in order to be able to help (better) you
need to share those details.

That 5 sec timeout comes from the coordinator node, I think -
see the cassandra.yaml "read_request_timeout_in_ms" setting (its
default is 5000 ms, which matches the ~5 seconds you observe) -
that is what influences this.

But it does not matter too much... The point is that none of
the replicas could complete your query within those 5 secs.
And this is a clear indication that something is slow with your
query.
Maybe 4) is a bit less important here, or rather I would make
it more precise: consider it together with your fetchSize
(a driver setting at the query level).

In our experience, one reason a query that used to work stops
working is a growing amount of data.
And a possible "wide cluster" problem.
Do you have monitoring on the Cassandra machines? What does
iowait show? (for us, that is a clear indicator when things like
this start happening)

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 14.09.2020 at 18:36, Aaron Ploetz wrote:

Deepak,

Can you reply with:

1) The query you are trying to run.
2) The table definition (PRIMARY KEY, specifically).
3) Maybe a little description of what the table is designed
to do.
4) How much data you're expecting returned (both # of rows
and data size).

Thanks,

Aaron


On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma

 wrote:

Hi There,

We are running into a strange issue in our Cassandra
cluster where one specific query is failing with the
following error:

Cassandra timeout during read query at consistency QUORUM
(3 responses were required but only 0 replica responded)

This is not a typical query read timeout, that much we know for
sure. This error gets spit out within 5 seconds, while the
query timeout we have set is around 30 seconds.

Can you tell us what is happening here, and how we can
reproduce this in our local environment?

Thanks,
Deepak