Re: Bootstrapping fails with < 128GB RAM ...

2018-02-16 Thread Jürgen Albersdorfer
Hi Jon,
I was able to get a heap dump. - I created a JIRA and attached as many
details as possible.

https://issues.apache.org/jira/browse/CASSANDRA-14239

The heap dump is 42GB in size. I will keep it - if you need more information
please don't hesitate to let me know.
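
In case it helps with the transfer: heap dumps compress quite well, so
something along these lines (just a sketch - the chunk size is arbitrary)
should shrink it considerably before uploading:

# compress the dump (can take a while at this size)
gzip -9 heap.bin
# optionally split the archive into chunks for the upload
split -b 2000m heap.bin.gz heap.bin.gz.part-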

thanks,
Jürgen

2018-02-09 12:07 GMT+01:00 Jürgen Albersdorfer:

> Hi Jon,
> should I register to the JIRA and open an issue, or will you do so?
> I'm currently trying to bootstrap another node - with 100GB RAM this
> time - and I'm recording Java heap memory over time via JConsole,
> watching Top Threads, and monitoring the debug.log.
>
> There, in the debug.log, I can see that the other nodes seem to
> immediately start hinting to the joining node, indicated by the following
> logs, of which I have hundreds per second in my debug.log:
>
> DEBUG [MutationStage-27] 2018-02-09 12:06:03,241 HintVerbHandler.java:95 -
> Failed to apply hint
> java.util.concurrent.CompletionException:
> org.apache.cassandra.exceptions.WriteTimeoutException:
> Operation timed out - received only 0 responses.
> at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[na:1.8.0_151]
> at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[na:1.8.0_151]
> at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[na:1.8.0_151]
> at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[na:1.8.0_151]
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[na:1.8.0_151]
> at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[na:1.8.0_151]
> at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:523) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:538) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_151]
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
> Caused by: org.apache.cassandra.exceptions.WriteTimeoutException:
> Operation timed out - received only 0 responses.
> ... 6 common frames omitted
>
> Could this be connected? Maybe it is causing the excessive RAM requirement?
>
> Thanks so far, regards
> Juergen
>
> 2018-02-07 19:49 GMT+01:00 Jon Haddad :
>
>> It would be extremely helpful to get some info about your heap.  At a
>> bare minimum, a histogram of the heap dump would be useful, but a full
>> heap dump would be best.
>>
>> jmap  -dump:live,format=b,file=heap.bin PID
>>
>> Taking a look at that in YourKit should give some pretty quick insight
>> into what kinds of objects are allocated; then we can get to the bottom of
>> the issue.  This should be moved to a JIRA
>> (https://issues.apache.org/jira/secure/Dashboard.jspa) in order to track
>> and fix it; if you could attach that heap dump, it would be very helpful.
>>
>> Jon
>>
>>
On Feb 7, 2018, at 6:11 AM, Nicolas Guyomar wrote:
>>
Ok then, following up on the wild guess: because you have quite a lot of
concurrent compactors, maybe it is too many concurrent compactions for the
JVM to deal with (taking into account that your load average of 106 seems
really high, IMHO)
>>
55GB of data is not that much; you can try to reduce the concurrent
compactors to make sure your box is not under too much stress (how many
compactions do you have running in parallel during bootstrap?)
>>
In the end, it does seem that you're going to have to share a heap dump
for further investigation (sorry, I'm not going to be much help on this matter)
>>
On 7 February 2018 at 14:43, Jürgen Albersdorfer wrote:
>>
>>> Hi Nicolas,
>>>
>>> Do you know how many sstables this new node is supposed to receive?
>>>
>>>
>>> If I can find this out via nodetool netstats, then it would be 619, as
>>> follows:
>>>
>>> # nodetool netstats
>>> Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
>>> /192.168.1.215 - Receiving 71 files, 7744612158 bytes total. Already
>>> received 0 files, 893897583 bytes total
>>> /192.168.1.214 - Receiving 58 files, 5693392001 bytes total.
>>> Already received 0 files, 1078372756 bytes total
>>> /192.168.1.206 - Receiving 52 files, 3389096409 bytes total.
>>> Already received 3 files, 508592758 bytes total
>>> /192.168.1.213 - Receiving 59 files, 6041633329 bytes total.
>>> Already received 0 files, 1038760653 bytes total
>>> /192.168.1.231 - Receiving 79 files, 7579181689
>

Re: Bootstrapping fails with < 128GB RAM ...

2018-02-09 Thread Jürgen Albersdorfer
Hi Jon,
should I register to the JIRA and open an issue, or will you do so?
I'm currently trying to bootstrap another node - with 100GB RAM this time -
and I'm recording Java heap memory over time via JConsole, watching Top
Threads, and monitoring the debug.log.

There, in the debug.log, I can see that the other nodes seem to
immediately start hinting to the joining node, indicated by the following
logs, of which I have hundreds per second in my debug.log:

DEBUG [MutationStage-27] 2018-02-09 12:06:03,241 HintVerbHandler.java:95 -
Failed to apply hint
java.util.concurrent.CompletionException:
org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out
- received only 0 responses.
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[na:1.8.0_151]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[na:1.8.0_151]
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[na:1.8.0_151]
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[na:1.8.0_151]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[na:1.8.0_151]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[na:1.8.0_151]
at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:523) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:538) ~[apache-cassandra-3.11.1.jar:3.11.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_151]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) ~[apache-cassandra-3.11.1.jar:3.11.1]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation
timed out - received only 0 responses.
... 6 common frames omitted

Could this be connected? Maybe it is causing the excessive RAM requirement?

Thanks so far, regards
Juergen

2018-02-07 19:49 GMT+01:00 Jon Haddad:

> It would be extremely helpful to get some info about your heap.  At a bare
> minimum, a histogram of the heap dump would be useful, but a full heap
> dump would be best.
>
> jmap  -dump:live,format=b,file=heap.bin PID
>
> Taking a look at that in YourKit should give some pretty quick insight
> into what kinds of objects are allocated; then we can get to the bottom of
> the issue.  This should be moved to a JIRA
> (https://issues.apache.org/jira/secure/Dashboard.jspa) in order to track
> and fix it; if you could attach that heap dump, it would be very helpful.
>
> Jon
>
>
> On Feb 7, 2018, at 6:11 AM, Nicolas Guyomar wrote:
>
> Ok then, following up on the wild guess: because you have quite a lot of
> concurrent compactors, maybe it is too many concurrent compactions for the
> JVM to deal with (taking into account that your load average of 106 seems
> really high, IMHO)
>
> 55GB of data is not that much; you can try to reduce the concurrent
> compactors to make sure your box is not under too much stress (how many
> compactions do you have running in parallel during bootstrap?)
>
> In the end, it does seem that you're going to have to share a heap dump
> for further investigation (sorry, I'm not going to be much help on this matter)
>
> On 7 February 2018 at 14:43, Jürgen Albersdorfer wrote:
>
>> Hi Nicolas,
>>
>> Do you know how many sstables this new node is supposed to receive?
>>
>>
>> If I can find this out via nodetool netstats, then it would be 619, as
>> follows:
>>
>> # nodetool netstats
>> Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
>> /192.168.1.215 - Receiving 71 files, 7744612158 bytes total. Already
>> received 0 files, 893897583 bytes total
>> /192.168.1.214 - Receiving 58 files, 5693392001 bytes total. Already
>> received 0 files, 1078372756 bytes total
>> /192.168.1.206 - Receiving 52 files, 3389096409 bytes total. Already
>> received 3 files, 508592758 bytes total
>> /192.168.1.213 - Receiving 59 files, 6041633329 bytes total. Already
>> received 0 files, 1038760653 bytes total
>> /192.168.1.231 - Receiving 79 files, 7579181689 bytes total. Already
>> received 4 files, 38387859 bytes total
>> /192.168.1.208 - Receiving 51 files, 3272885123 bytes total. Already
>> received 3 files, 362450903 bytes total
>> /192.168.1.207 - Receiving 56 files, 3028344200 bytes total. Already
>> received 3 files, 57790197 bytes total
>> /192.168.1.232 - Receiving 79 files, 7268716317 bytes total. Already
>> received 1 files, 1127174421 bytes total
>> /192.168.1.209 - Receiving 114 files, 2138184

Re: Bootstrapping fails with < 128GB RAM ...

2018-02-07 Thread Jon Haddad
It would be extremely helpful to get some info about your heap.  At a bare 
minimum, a histogram of the heap dump would be useful, but a full heap
dump would be best.

jmap  -dump:live,format=b,file=heap.bin PID
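
If the full dump turns out to be too unwieldy to attach, a class histogram is
tiny by comparison; something like this should produce one (same PID as above;
note that the :live option triggers a full GC first):

jmap -histo:live PID > histogram.txt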

Taking a look at that in YourKit should give some pretty quick insight into
what kinds of objects are allocated; then we can get to the bottom of the
issue.  This should be moved to a JIRA
(https://issues.apache.org/jira/secure/Dashboard.jspa) in order to track and
fix it; if you could attach that heap dump, it would be very helpful.

Jon

> On Feb 7, 2018, at 6:11 AM, Nicolas Guyomar wrote:
> 
> Ok then, following up on the wild guess: because you have quite a lot of
> concurrent compactors, maybe it is too many concurrent compactions for the
> JVM to deal with (taking into account that your load average of 106 seems
> really high, IMHO)
> 
> 55GB of data is not that much; you can try to reduce the concurrent
> compactors to make sure your box is not under too much stress (how many
> compactions do you have running in parallel during bootstrap?)
> 
> In the end, it does seem that you're going to have to share a heap dump
> for further investigation (sorry, I'm not going to be much help on this matter)
> 
> On 7 February 2018 at 14:43, Jürgen Albersdorfer wrote:
> Hi Nicolas,
> 
> Do you know how many sstables this new node is supposed to receive?
> 
> If I can find this out via nodetool netstats, then it would be 619, as
> follows:
> 
> # nodetool netstats
> Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
> /192.168.1.215 - Receiving 71 files, 7744612158 bytes total. Already
> received 0 files, 893897583 bytes total
> /192.168.1.214 - Receiving 58 files, 5693392001 bytes total. Already
> received 0 files, 1078372756 bytes total
> /192.168.1.206 - Receiving 52 files, 3389096409 bytes total. Already
> received 3 files, 508592758 bytes total
> /192.168.1.213 - Receiving 59 files, 6041633329 bytes total. Already
> received 0 files, 1038760653 bytes total
> /192.168.1.231 - Receiving 79 files, 7579181689 bytes total. Already
> received 4 files, 38387859 bytes total
> /192.168.1.208 - Receiving 51 files, 3272885123 bytes total. Already
> received 3 files, 362450903 bytes total
> /192.168.1.207 - Receiving 56 files, 3028344200 bytes total. Already
> received 3 files, 57790197 bytes total
> /192.168.1.232 - Receiving 79 files, 7268716317 bytes total. Already
> received 1 files, 1127174421 bytes total
> /192.168.1.209 - Receiving 114 files, 21381846105 bytes total. Already
> received 1 files, 961497222 bytes total
> 
> 
> does disabling compaction_throughput_mb_per_sec or increasing
> concurrent_compactors have any effect ?
> 
> I will give it a try:
> 
> # nodetool getcompactionthroughput
> Current compaction throughput: 128 MB/s
> 
> # nodetool setcompactionthroughput 0
> 
> # nodetool getcompactionthroughput
> Current compaction throughput: 0 MB/s
> 
> # nodetool getconcurrentcompactors
> Current concurrent compactors in the system is:
> 16
> 
> 
> Which memtable_allocation_type are you using ?
>  
> # grep memtable_allocation_type /etc/cassandra/conf/cassandra.yaml
> memtable_allocation_type: heap_buffers
> 
> 
> thanks so far, regards
> Juergen
> 
> 2018-02-07 14:29 GMT+01:00 Nicolas Guyomar:
> Hi Jurgen,
> 
> It does feel like some OOM during bootstrap from previous C* v2, but that
> should be fixed in your version.
> 
> Do you know how many sstables this new node is supposed to receive?
> 
> Just a wild guess: it may have something to do with compaction not keeping
> up because all the other nodes are streaming data to this new one (resulting
> in long-lived objects in the heap). Does disabling
> compaction_throughput_mb_per_sec or increasing concurrent_compactors have
> any effect?
> 
> Which memtable_allocation_type are you using ? 
> 
> 
> On 7 February 2018 at 12:38, Jürgen Albersdorfer wrote:
> Hi, I always face an issue when bootstrapping a node having less than 184GB
> RAM (156GB JVM heap) on our 10-node C* 3.11.1 cluster.
> During bootstrap, when I watch the cassandra.log, I observe growth in the
> JVM Heap Old Gen which no longer gets significantly freed.
> I know that the JVM collects the Old Gen only when really needed. I can see
> collections, but there is always a remainder which
> seems to grow forever without ever getting freed.
> After the node has successfully joined the cluster, I can remove the extra
> 128GB of RAM I gave it for bootstrapping without any further effect.
> 
> It feels like Cassandra will not forget about every single byte streamed over 

Re: Bootstrapping fails with < 128GB RAM ...

2018-02-07 Thread Nicolas Guyomar
Ok then, following up on the wild guess: because you have quite a lot of
concurrent compactors, maybe it is too many concurrent compactions for the
JVM to deal with (taking into account that your load average of 106 seems
really high, IMHO)

55GB of data is not that much; you can try to reduce the concurrent
compactors to make sure your box is not under too much stress (how many
compactions do you have running in parallel during bootstrap?)
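
If you want to check and change that without a restart, something along these
lines should do it (the value 4 is only an example, and setconcurrentcompactors
needs a recent-enough nodetool, 3.10+ if I remember correctly):

# how many compactions are running/pending right now?
nodetool compactionstats
# cap the number of concurrent compactors
nodetool setconcurrentcompactors 4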

In the end, it does seem that you're going to have to share a heap dump for
further investigation (sorry, I'm not going to be much help on this matter)

On 7 February 2018 at 14:43, Jürgen Albersdorfer wrote:

> Hi Nicolas,
>
> Do you know how many sstables this new node is supposed to receive?
>
>
> If I can find this out via nodetool netstats, then it would be 619, as
> follows:
>
> # nodetool netstats
> Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
> /192.168.1.215 - Receiving 71 files, 7744612158 bytes total. Already
> received 0 files, 893897583 bytes total
> /192.168.1.214 - Receiving 58 files, 5693392001 bytes total. Already
> received 0 files, 1078372756 bytes total
> /192.168.1.206 - Receiving 52 files, 3389096409 bytes total. Already
> received 3 files, 508592758 bytes total
> /192.168.1.213 - Receiving 59 files, 6041633329 bytes total. Already
> received 0 files, 1038760653 bytes total
> /192.168.1.231 - Receiving 79 files, 7579181689 bytes total. Already
> received 4 files, 38387859 bytes total
> /192.168.1.208 - Receiving 51 files, 3272885123 bytes total. Already
> received 3 files, 362450903 bytes total
> /192.168.1.207 - Receiving 56 files, 3028344200 bytes total. Already
> received 3 files, 57790197 bytes total
> /192.168.1.232 - Receiving 79 files, 7268716317 bytes total. Already
> received 1 files, 1127174421 bytes total
> /192.168.1.209 - Receiving 114 files, 21381846105 bytes total.
> Already received 1 files, 961497222 bytes total
>
>
> does disabling compaction_throughput_mb_per_sec or increasing
> concurrent_compactors have any effect ?
>
>
> I will give it a try:
>
> # nodetool getcompactionthroughput
> Current compaction throughput: 128 MB/s
>
> # nodetool setcompactionthroughput 0
>
> # nodetool getcompactionthroughput
> Current compaction throughput: 0 MB/s
>
> # nodetool getconcurrentcompactors
> Current concurrent compactors in the system is:
> 16
>
>
> Which memtable_allocation_type are you using ?
>
>
> # grep memtable_allocation_type /etc/cassandra/conf/cassandra.yaml
> memtable_allocation_type: heap_buffers
>
>
> thanks so far, regards
> Juergen
>
> 2018-02-07 14:29 GMT+01:00 Nicolas Guyomar:
>
>> Hi Jurgen,
>>
>> It does feel like some OOM during bootstrap from previous C* v2, but that
>> should be fixed in your version.
>>
>> Do you know how many sstables this new node is supposed to receive?
>>
>> Just a wild guess: it may have something to do with compaction not
>> keeping up because all the other nodes are streaming data to this new one
>> (resulting in long-lived objects in the heap). Does disabling
>> compaction_throughput_mb_per_sec or increasing concurrent_compactors have
>> any effect?
>>
>> Which memtable_allocation_type are you using ?
>>
>>
>> On 7 February 2018 at 12:38, Jürgen Albersdorfer wrote:
>>
>>> Hi, I always face an issue when bootstrapping a node having less than
>>> 184GB RAM (156GB JVM heap) on our 10-node C* 3.11.1 cluster.
>>> During bootstrap, when I watch the cassandra.log, I observe growth in the
>>> JVM Heap Old Gen which no longer gets significantly freed.
>>> I know that the JVM collects the Old Gen only when really needed. I can
>>> see collections, but there is always a remainder which
>>> seems to grow forever without ever getting freed.
>>> After the node has successfully joined the cluster, I can remove the extra
>>> 128GB of RAM I gave it for bootstrapping without any further effect.
>>>
>>> It feels like Cassandra never forgets about a single byte streamed over
>>> the network during bootstrapping - which would be a memory leak and a
>>> major problem, too.
>>>
>>> Or is there something I'm doing wrong? Any ideas?
>>>
>>> Here are my observations of a failing bootstrap - the following node has
>>> 72GB RAM installed, 64GB of which are configured as JVM heap space.
>>>
>>> cassandra.log (truncated):
>>> INFO  [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1
>>> Young Generation GC in 984ms.  G1 Eden Space: 14763950080 -> 0; G1 Old Gen:
>>> 36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
>>> INFO  [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1
>>> Young Generation GC in 784ms.  G1 Eden Space: 18387828736 -> 0; G1 Old
>>> Gen: 39661338640 -> 41053847560; G1
>>> Survivor Space: 1476395008 -> 1845493760;
>>> INFO  [Service Thread] 2018-02-07 11:13:08,639 GCInspector.java:284 - G1
>>> Young Generation GC in 718ms.  G1 Eden Space: 16743661568 -> 0; G1 Old Gen:
>>> 41053847560 -> 42832

Re: Bootstrapping fails with < 128GB RAM ...

2018-02-07 Thread Jürgen Albersdorfer
Hi Nicolas,

Do you know how many sstables this new node is supposed to receive?


If I can find this out via nodetool netstats, then it would be 619, as
follows:

# nodetool netstats
Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
/192.168.1.215 - Receiving 71 files, 7744612158 bytes total. Already
received 0 files, 893897583 bytes total
/192.168.1.214 - Receiving 58 files, 5693392001 bytes total. Already
received 0 files, 1078372756 bytes total
/192.168.1.206 - Receiving 52 files, 3389096409 bytes total. Already
received 3 files, 508592758 bytes total
/192.168.1.213 - Receiving 59 files, 6041633329 bytes total. Already
received 0 files, 1038760653 bytes total
/192.168.1.231 - Receiving 79 files, 7579181689 bytes total. Already
received 4 files, 38387859 bytes total
/192.168.1.208 - Receiving 51 files, 3272885123 bytes total. Already
received 3 files, 362450903 bytes total
/192.168.1.207 - Receiving 56 files, 3028344200 bytes total. Already
received 3 files, 57790197 bytes total
/192.168.1.232 - Receiving 79 files, 7268716317 bytes total. Already
received 1 files, 1127174421 bytes total
/192.168.1.209 - Receiving 114 files, 21381846105 bytes total. Already
received 1 files, 961497222 bytes total


does disabling compaction_throughput_mb_per_sec or increasing
concurrent_compactors have any effect ?


I will give it a try:

# nodetool getcompactionthroughput
Current compaction throughput: 128 MB/s

# nodetool setcompactionthroughput 0

# nodetool getcompactionthroughput
Current compaction throughput: 0 MB/s

# nodetool getconcurrentcompactors
Current concurrent compactors in the system is:
16


Which memtable_allocation_type are you using ?


# grep memtable_allocation_type /etc/cassandra/conf/cassandra.yaml
memtable_allocation_type: heap_buffers


thanks so far, regards
Juergen

2018-02-07 14:29 GMT+01:00 Nicolas Guyomar:

> Hi Jurgen,
>
> It does feel like some OOM during bootstrap from previous C* v2, but that
> should be fixed in your version.
>
> Do you know how many sstables this new node is supposed to receive?
>
> Just a wild guess: it may have something to do with compaction not
> keeping up because all the other nodes are streaming data to this new one
> (resulting in long-lived objects in the heap). Does disabling
> compaction_throughput_mb_per_sec or increasing concurrent_compactors have
> any effect?
>
> Which memtable_allocation_type are you using ?
>
>
> On 7 February 2018 at 12:38, Jürgen Albersdorfer wrote:
>
>> Hi, I always face an issue when bootstrapping a node having less than
>> 184GB RAM (156GB JVM heap) on our 10-node C* 3.11.1 cluster.
>> During bootstrap, when I watch the cassandra.log, I observe growth in the
>> JVM Heap Old Gen which no longer gets significantly freed.
>> I know that the JVM collects the Old Gen only when really needed. I can
>> see collections, but there is always a remainder which
>> seems to grow forever without ever getting freed.
>> After the node has successfully joined the cluster, I can remove the extra
>> 128GB of RAM I gave it for bootstrapping without any further effect.
>>
>> It feels like Cassandra never forgets about a single byte streamed over
>> the network during bootstrapping - which would be a memory leak and a
>> major problem, too.
>>
>> Or is there something I'm doing wrong? Any ideas?
>>
>> Here are my observations of a failing bootstrap - the following node has
>> 72GB RAM installed, 64GB of which are configured as JVM heap space.
>>
>> cassandra.log (truncated):
>> INFO  [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1
>> Young Generation GC in 984ms.  G1 Eden Space: 14763950080 -> 0; G1 Old Gen:
>> 36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
>> INFO  [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1
>> Young Generation GC in 784ms.  G1 Eden Space: 18387828736 -> 0; G1 Old
>> Gen: 39661338640 -> 41053847560; G1
>> Survivor Space: 1476395008 -> 1845493760;
>> INFO  [Service Thread] 2018-02-07 11:13:08,639 GCInspector.java:284 - G1
>> Young Generation GC in 718ms.  G1 Eden Space: 16743661568 -> 0; G1 Old Gen:
>> 41053847560 -> 42832232472; G1 Survivor Space: 1845493760 -> 1375731712;
>> INFO  [Service Thread] 2018-02-07 11:13:18,271 GCInspector.java:284 - G1
>> Young Generation GC in 546ms.  G1 Eden Space: 15535702016 -> 0; G1 Old Gen:
>> 42831004832 -> 44206736544; G1 Survivor Space: 1375731712 -> 1006632960;
>> INFO  [Service Thread] 2018-02-07 11:13:35,364 GCInspector.java:284 - G1
>> Young Generation GC in 638ms.  G1 Eden Space: 14025752576 -> 0; G1 Old
>> Gen: 44206737048 -> 45213369488; G1
>> Survivor Space: 1778384896 -> 1610612736;
>> INFO  [Service Thread] 2018-02-07 11:13:42,898 GCInspector.java:284 - G1
>> Young Generation GC in 614ms.  G1 Eden Space: 13388218368 -> 0; G1 Old Gen:
>> 45213369488 -> 46152893584; G1 Survivor Space: 1610612736 -> 1006632960;
>> INFO  [Service Thread]

Re: Bootstrapping fails with < 128GB RAM ...

2018-02-07 Thread Nicolas Guyomar
Hi Jurgen,

It does feel like some OOM during bootstrap from previous C* v2, but that
should be fixed in your version.

Do you know how many sstables this new node is supposed to receive?

Just a wild guess: it may have something to do with compaction not
keeping up because all the other nodes are streaming data to this new one
(resulting in long-lived objects in the heap). Does disabling
compaction_throughput_mb_per_sec or increasing concurrent_compactors have
any effect?

Which memtable_allocation_type are you using ?
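
If it turns out to be heap_buffers, moving memtables off heap might relieve
some of the pressure; just a suggestion (offheap_buffers should be available
again in 3.11), by setting this in cassandra.yaml and restarting the node:

memtable_allocation_type: offheap_buffers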


On 7 February 2018 at 12:38, Jürgen Albersdorfer wrote:

> Hi, I always face an issue when bootstrapping a node having less than
> 184GB RAM (156GB JVM heap) on our 10-node C* 3.11.1 cluster.
> During bootstrap, when I watch the cassandra.log, I observe growth in the
> JVM Heap Old Gen which no longer gets significantly freed.
> I know that the JVM collects the Old Gen only when really needed. I can
> see collections, but there is always a remainder which
> seems to grow forever without ever getting freed.
> After the node has successfully joined the cluster, I can remove the extra
> 128GB of RAM I gave it for bootstrapping without any further effect.
>
> It feels like Cassandra never forgets about a single byte streamed over
> the network during bootstrapping - which would be a memory leak and a
> major problem, too.
>
> Or is there something I'm doing wrong? Any ideas?
>
> Here are my observations of a failing bootstrap - the following node has
> 72GB RAM installed, 64GB of which are configured as JVM heap space.
>
> cassandra.log (truncated):
> INFO  [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1
> Young Generation GC in 984ms.  G1 Eden Space: 14763950080 -> 0; G1 Old Gen:
> 36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
> INFO  [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1
> Young Generation GC in 784ms.  G1 Eden Space: 18387828736 -> 0; G1 Old Gen:
> 39661338640 -> 41053847560; G1 Survivor Space: 1476395008 -> 1845493760;
> INFO  [Service Thread] 2018-02-07 11:13:08,639 GCInspector.java:284 - G1
> Young Generation GC in 718ms.  G1 Eden Space: 16743661568 -> 0; G1 Old Gen:
> 41053847560 -> 42832232472; G1 Survivor Space: 1845493760 -> 1375731712;
> INFO  [Service Thread] 2018-02-07 11:13:18,271 GCInspector.java:284 - G1
> Young Generation GC in 546ms.  G1 Eden Space: 15535702016 -> 0; G1 Old Gen:
> 42831004832 -> 44206736544; G1 Survivor Space: 1375731712 -> 1006632960;
> INFO  [Service Thread] 2018-02-07 11:13:35,364 GCInspector.java:284 - G1
> Young Generation GC in 638ms.  G1 Eden Space: 14025752576 -> 0; G1 Old Gen:
> 44206737048 -> 45213369488; G1 Survivor Space: 1778384896 -> 1610612736;
> INFO  [Service Thread] 2018-02-07 11:13:42,898 GCInspector.java:284 - G1
> Young Generation GC in 614ms.  G1 Eden Space: 13388218368 -> 0; G1 Old Gen:
> 45213369488 -> 46152893584; G1 Survivor Space: 1610612736 -> 1006632960;
> INFO  [Service Thread] 2018-02-07 11:13:58,291 GCInspector.java:284 - G1
> Young Generation GC in 400ms.  G1 Eden Space: 13119782912 -> 0; G1 Old Gen:
> 46136116376 -> 47171400848; G1 Survivor Space: 1275068416 -> 771751936;
> INFO  [Service Thread] 2018-02-07 11:14:23,071 GCInspector.java:284 - G1
> Young Generation GC in 303ms.  G1 Eden Space: 11676942336 -> 0; G1 Old Gen:
> 47710958232 -> 48239699096; G1 Survivor Space: 1207959552 -> 973078528;
> INFO  [Service Thread] 2018-02-07 11:14:46,157 GCInspector.java:284 - G1
> Young Generation GC in 305ms.  G1 Eden Space: 11005853696 -> 0; G1 Old Gen:
> 48903342232 -> 49289001104; G1 Survivor Space: 939524096 -> 973078528;
> INFO  [Service Thread] 2018-02-07 11:14:53,045 GCInspector.java:284 - G1
> Young Generation GC in 380ms.  G1 Eden Space: 10569646080 -> 0; G1 Old Gen:
> 49289001104 -> 49586732696; G1 Survivor Space: 973078528 -> 1308622848;
> INFO  [Service Thread] 2018-02-07 11:15:04,692 GCInspector.java:284 - G1
> Young Generation GC in 360ms.  G1 Eden Space: 9294577664 -> 0; G1 Old Gen:
> 50671712912 -> 51269944472; G1 Survivor Space: 905969664 -> 805306368;
> WARN  [Service Thread] 2018-02-07 11:15:07,317 GCInspector.java:282 - G1
> Young Generation GC in 1102ms.  G1 Eden Space: 2617245696 -> 0; G1 Old
> Gen: 51269944472 -> 47310521496; G1 Survivor Space: 805306368 ->
> 301989888;
> 
> INFO  [Service Thread] 2018-02-07 11:16:36,535 GCInspector.java:284 - G1
> Young Generation GC in 377ms.  G1 Eden Space: 7683964928 -> 0; G1 Old Gen:
> 51958433432 -> 52658554008; G1 Survivor Space: 1073741824 -> 1040187392;
> INFO  [Service Thread] 2018-02-07 11:16:41,756 GCInspector.java:284 - G1
> Young Generation GC in 340ms.  G1 Eden Space: 7046430720 -> 0; G1 Old Gen:
> 52624999576 -> 53299987616; G1 Survivor Space: 1040187392 -> 805306368;
> WARN  [Service Thread] 2018-02-07 11:16:44,087 GCInspector.java:282 - G1
> Young Generation GC in 1005ms.  G1 Eden Space: 2617245696 -> 0; G1 Old
> Gen: 53299987616 -> 49659331752; G1 Survivor Space: 805306368 -