Hi Jurgen,

This does look like the OOMs during bootstrap seen in earlier C* v2
releases, but those should be fixed in your version.

Do you know how many sstables this new node is supposed to receive?
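
One rough way to get that number is to add up the per-peer counts that
nodetool netstats prints. The following is only a sketch: the function name
summarize_netstats is made up for illustration, and it assumes the
"Receiving N files, B bytes total" / "Already received N files, B bytes total"
wording shown in the quoted output, including the "." thousands separators.

```python
import re

# Patterns assume the netstats wording quoted in this thread (an assumption,
# not a stable format guarantee).
RECEIVING = re.compile(r"Receiving (\d+) files, ([\d.,]+) bytes total")
RECEIVED = re.compile(r"Already received (\d+) files, ([\d.,]+) bytes total")


def _to_int(number_text):
    """Drop locale thousands separators ('.' or ',') and parse as int."""
    return int(re.sub(r"[.,]", "", number_text))


def summarize_netstats(text):
    """Hypothetical helper: total expected vs. received files and bytes."""
    expected = [(int(f), _to_int(b)) for f, b in RECEIVING.findall(text)]
    done = [(int(f), _to_int(b)) for f, b in RECEIVED.findall(text)]
    return {
        "files_expected": sum(f for f, _ in expected),
        "bytes_expected": sum(b for _, b in expected),
        "files_received": sum(f for f, _ in done),
        "bytes_received": sum(b for _, b in done),
    }
```

Fed the truncated netstats excerpt quoted below, this would count 552 files
expected and 25 received, but since that output was shortened the real totals
are probably higher.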

Just a wild guess: it may have something to do with compaction not
keeping up because all the other nodes are streaming data to this new one
(resulting in long-lived objects in the heap). Does disabling compaction
throttling (compaction_throughput_mb_per_sec set to 0) or increasing
concurrent_compactors have any effect?
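
To compare runs before and after changing those settings, it may help to
pull the Old Gen figures out of the GCInspector lines rather than eyeballing
them. This is a minimal sketch, assuming each log entry is on a single line
(unwrapped, unlike in the mail) and has the format quoted below; the function
names are hypothetical. It only measures how fast Old Gen grows, not why.

```python
import re
from datetime import datetime

# Assumes the GCInspector line layout quoted in this thread.
GC_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}).*?"
    r"G1 Old Gen: (\d+) -> (\d+)"
)


def old_gen_samples(lines):
    """Return (timestamp, old_gen_before, old_gen_after) per GC line."""
    samples = []
    for line in lines:
        match = GC_LINE.search(line)
        if match is None:
            continue  # not a G1 GCInspector line
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        ts = ts.replace(microsecond=int(match.group(2)) * 1000)
        samples.append((ts, int(match.group(3)), int(match.group(4))))
    return samples


def old_gen_growth_mb_per_s(samples):
    """Net Old Gen change between first and last sample, in MB per second."""
    if len(samples) < 2:
        return None
    t0, _, first_after = samples[0]
    t1, _, last_after = samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        return None
    return (last_after - first_after) / elapsed / 1e6
```

If the growth rate drops noticeably once compaction is unthrottled, that
would support the guess above; if it stays the same, the cause is probably
elsewhere.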

Which memtable_allocation_type are you using?


On 7 February 2018 at 12:38, Jürgen Albersdorfer <[email protected]>
wrote:

> Hi, I always run into an issue when bootstrapping a node with less than
> 184GB RAM (156GB JVM heap) on our 10-node C* 3.11.1 cluster.
> During bootstrap, when I watch cassandra.log, I observe growth in the JVM
> heap Old Gen that is no longer freed to any significant degree.
> I know that the JVM collects Old Gen only when really needed. I can see
> collections, but there is always a remainder which
> seems to grow forever without ever getting freed.
> After the node has successfully joined the cluster, I can remove the extra
> 128GB of RAM I gave it for bootstrapping without any further effect.
>
> It feels like Cassandra never forgets a single byte streamed over the
> network during bootstrapping, which would be a memory leak and a major
> problem.
>
> Or is there something I'm doing wrong? Any ideas?
>
> Here are my observations on a failing bootstrap. The following node has
> 72GB RAM installed, 64GB of which is configured as JVM heap space.
>
> cassandra.log (truncated):
> INFO  [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1
> Young Generation GC in 984ms.  G1 Eden Space: 14763950080 -> 0; G1 Old Gen:
> 36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
> INFO  [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1
> Young Generation GC in 784ms.  G1 Eden Space: 18387828736 -> 0; G1 Old Gen:
> 39661338640 -> 41053847560; G1 Survivor Space: 1476395008 -> 1845493760;
> INFO  [Service Thread] 2018-02-07 11:13:08,639 GCInspector.java:284 - G1
> Young Generation GC in 718ms.  G1 Eden Space: 16743661568 -> 0; G1 Old Gen:
> 41053847560 -> 42832232472; G1 Survivor Space: 1845493760 -> 1375731712;
> INFO  [Service Thread] 2018-02-07 11:13:18,271 GCInspector.java:284 - G1
> Young Generation GC in 546ms.  G1 Eden Space: 15535702016 -> 0; G1 Old Gen:
> 42831004832 -> 44206736544; G1 Survivor Space: 1375731712 -> 1006632960;
> INFO  [Service Thread] 2018-02-07 11:13:35,364 GCInspector.java:284 - G1
> Young Generation GC in 638ms.  G1 Eden Space: 14025752576 -> 0; G1 Old Gen:
> 44206737048 -> 45213369488; G1 Survivor Space: 1778384896 -> 1610612736;
> INFO  [Service Thread] 2018-02-07 11:13:42,898 GCInspector.java:284 - G1
> Young Generation GC in 614ms.  G1 Eden Space: 13388218368 -> 0; G1 Old Gen:
> 45213369488 -> 46152893584; G1 Survivor Space: 1610612736 -> 1006632960;
> INFO  [Service Thread] 2018-02-07 11:13:58,291 GCInspector.java:284 - G1
> Young Generation GC in 400ms.  G1 Eden Space: 13119782912 -> 0; G1 Old Gen:
> 46136116376 -> 47171400848; G1 Survivor Space: 1275068416 -> 771751936;
> INFO  [Service Thread] 2018-02-07 11:14:23,071 GCInspector.java:284 - G1
> Young Generation GC in 303ms.  G1 Eden Space: 11676942336 -> 0; G1 Old Gen:
> 47710958232 -> 48239699096; G1 Survivor Space: 1207959552 -> 973078528;
> INFO  [Service Thread] 2018-02-07 11:14:46,157 GCInspector.java:284 - G1
> Young Generation GC in 305ms.  G1 Eden Space: 11005853696 -> 0; G1 Old Gen:
> 48903342232 -> 49289001104; G1 Survivor Space: 939524096 -> 973078528;
> INFO  [Service Thread] 2018-02-07 11:14:53,045 GCInspector.java:284 - G1
> Young Generation GC in 380ms.  G1 Eden Space: 10569646080 -> 0; G1 Old Gen:
> 49289001104 -> 49586732696; G1 Survivor Space: 973078528 -> 1308622848;
> INFO  [Service Thread] 2018-02-07 11:15:04,692 GCInspector.java:284 - G1
> Young Generation GC in 360ms.  G1 Eden Space: 9294577664 -> 0; G1 Old Gen:
> 50671712912 -> 51269944472; G1 Survivor Space: 905969664 -> 805306368;
> WARN  [Service Thread] 2018-02-07 11:15:07,317 GCInspector.java:282 - G1
> Young Generation GC in 1102ms.  G1 Eden Space: 2617245696 -> 0; G1 Old
> Gen: 51269944472 -> 47310521496; G1 Survivor Space: 805306368 ->
> 301989888;
> ....
> INFO  [Service Thread] 2018-02-07 11:16:36,535 GCInspector.java:284 - G1
> Young Generation GC in 377ms.  G1 Eden Space: 7683964928 -> 0; G1 Old Gen:
> 51958433432 -> 52658554008; G1 Survivor Space: 1073741824 -> 1040187392;
> INFO  [Service Thread] 2018-02-07 11:16:41,756 GCInspector.java:284 - G1
> Young Generation GC in 340ms.  G1 Eden Space: 7046430720 -> 0; G1 Old Gen:
> 52624999576 -> 53299987616; G1 Survivor Space: 1040187392 -> 805306368;
> WARN  [Service Thread] 2018-02-07 11:16:44,087 GCInspector.java:282 - G1
> Young Generation GC in 1005ms.  G1 Eden Space: 2617245696 -> 0; G1 Old
> Gen: 53299987616 -> 49659331752; G1 Survivor Space: 805306368 ->
> 436207616;
> ...
> INFO  [Service Thread] 2018-02-07 11:25:40,902 GCInspector.java:284 - G1
> Young Generation GC in 254ms.  G1 Eden Space: 11475615744 -> 0; G1 Old Gen:
> 48904357040 -> 48904357544; G1 Survivor Space: 704643072 -> 805306368;
> INFO  [Service Thread] 2018-02-07 11:26:11,424 GCInspector.java:284 - G1
> Young Generation GC in 202ms.  G1 Eden Space: 11005853696 -> 0; G1 Old Gen:
> 48904357544 -> 49321014960; G1 Survivor Space: 939524096 -> 536870912;
> WARN  [Service Thread] 2018-02-07 11:26:44,484 GCInspector.java:282 - G1
> Young Generation GC in 1295ms.  G1 Eden Space: 2617245696 -> 0; G1 Old
> Gen: 49321014960 -> 46255753384; G1 Survivor Space: 805306368 ->
> 402653184;
> ...
> INFO  [Service Thread] 2018-02-07 11:30:37,828 GCInspector.java:284 - G1
> Young Generation GC in 958ms.  G1 Eden Space: 2785017856 -> 0; G1 Old
> Gen: 51196393000 -> 50629766184; G1 Survivor Space: 637534208 ->
> 436207616;
> INFO  [Service Thread] 2018-02-07 11:30:45,036 GCInspector.java:284 - G1
> Young Generation GC in 270ms.  G1 Eden Space: 10267656192 -> 0; G1 Old
> Gen: 50629766184 -> 50626254144; G1 Survivor Space: 436207616 ->
> 738197504;
> INFO  [Service Thread] 2018-02-07 11:31:48,128 GCInspector.java:284 - G1
> Young Generation GC in 984ms.  G1 Eden Space: 2617245696 -> 0; G1 Old
> Gen: 51086410272 -> 50443965480; G1 Survivor Space: 805306368 ->
> 369098752;
>
>
> jvm.options as follows (comments removed):
> ## Use the Hotspot garbage-first collector.
> -XX:+UseG1GC
> -XX:MaxGCPauseMillis=1000
> -XX:InitiatingHeapOccupancyPercent=70
> -XX:ParallelGCThreads=16
> -XX:ConcGCThreads=16
>
> ### GC logging options -- uncomment to enable
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> #-XX:PrintFLSStatistics=1
> #-Xloggc:/var/log/cassandra/gc.log
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
>
> I tried this with ParNewGC and ConcMarkSweepGC, too, and I see the same
> behavior there.
>
> From nodetool netstats I see that it wants to stream about 55,9 GB.
> After 1,5h of streaming at more than 10MB/s (about 54GB seen with dstat),
> nodetool netstats shows that only 3,5 GB of the 55,9 GB has been received.
>
> uptime
>  11:30:52 up  1:36,  3 users,  load average: 106.01, 87.54, 66.01
>
> nodetool netstats (truncated for readability):
> Wed Feb  7 11:19:07 CET 2018
> Mode: JOINING
> Bootstrap 56d204d0-0be9-11e8-ae30-617216855b4a
>     /192.168.1.213 - Receiving 68 files, 6.774.831.556 bytes total.
> Already received 3 files, 279.238.740 bytes total
>     /192.168.1.215 - Receiving 68 files, 5.721.460.494 bytes total.
> Already received 4 files, 109.051.913 bytes total
>     /192.168.1.214 - Receiving 68 files, 7.497.726.056 bytes total.
> Already received 4 files, 870.592.708 bytes total
>     /192.168.1.207 - Receiving 63 files, 4.945.809.501 bytes total.
> Already received 4 files, 700.599.427 bytes total
>     /192.168.1.232 - Receiving 91 files, 7.344.537.682 bytes total.
> Already received 3 files, 237.482.005 bytes total
>     /192.168.1.209 - Receiving 102 files, 15.931.849.729 bytes total.
> Already received 3 files, 1.108.754.920 bytes total
>     /192.168.1.231 - Receiving 92 files, 7.927.882.516 bytes total.
> Already received 4 files, 269.514.936 bytes total
>
>
> nodetool status:
> Datacenter: main
> ================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns  Host ID                               Rack
> UN  192.168.1.232  83,31 GiB  256     ?     510a0068-ee2b-4d1f-9965-9e29602d2f8f  rack04
> UN  192.168.1.206  51,41 GiB  256     ?     a168b632-52e7-408a-ae7f-6ba6b9c55cea  rack01
> UN  192.168.1.207  57,66 GiB  256     ?     7401ab8f-114d-41b4-801d-53a4b042de52  rack01
> UN  192.168.1.208  56,47 GiB  256     ?     767980ef-52f2-4c21-8567-324fc1db274c  rack01...
> UJ  192.168.1.160  68,95 GiB  256     ?     a3a5a169-512f-4e1f-8c0b-419c828f23e1  rack02
> UN  192.168.1.209  94,27 GiB  256     ?     8757cb4a-183e-4828-8212-7715b5563935  rack02
> UN  192.168.1.213  78,26 GiB  256     ?     b1e9481c-4ba2-4396-837a-84be35737fe7  rack05
> UN  192.168.1.214  80,66 GiB  256     ?     457fc606-7002-49ad-8da5-309b92093acf  rack06
> UN  192.168.1.231  87,5 GiB   256     ?     2017a9e8-3638-465e-bc4a-5e59e693fb49  rack03
> UN  192.168.1.215  86,97 GiB  256     ?     5dfe4c35-8f8a-4305-824a-4610cec9411b  rack07
>
> thanks, and kind regards
> Juergen
>
