Hi Jon, I was able to get a heap dump - I created a JIRA and attached as many details as possible:
https://issues.apache.org/jira/browse/CASSANDRA-14239

The heap dump is 42 GB in size. I will keep it - if you need more information, please don't hesitate to let me know.

thanks,
Jürgen

2018-02-09 12:07 GMT+01:00 Jürgen Albersdorfer <jalbersdor...@gmail.com>:

> Hi Jon,
> should I register to the JIRA and open an issue, or will you do so?
> I'm currently trying to bootstrap another node - with 100 GB RAM this
> time - and I'm recording Java heap memory over time via JConsole and Top
> Threads, and monitoring the debug.log.
>
> There, in the debug.log, I can see that the other nodes seem to
> immediately start hinting to the joining node, indicated by the following
> log entries, of which I get hundreds per second:
>
> DEBUG [MutationStage-27] 2018-02-09 12:06:03,241 HintVerbHandler.java:95 - Failed to apply hint
> java.util.concurrent.CompletionException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
>         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[na:1.8.0_151]
>         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[na:1.8.0_151]
>         at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[na:1.8.0_151]
>         at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[na:1.8.0_151]
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[na:1.8.0_151]
>         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[na:1.8.0_151]
>         at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:523) ~[apache-cassandra-3.11.1.jar:3.11.1]
>         at org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:538) ~[apache-cassandra-3.11.1.jar:3.11.1]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_151]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.1.jar:3.11.1]
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) ~[apache-cassandra-3.11.1.jar:3.11.1]
>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_151]
> Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
>         ... 6 common frames omitted
>
> Could this be connected? Maybe causing the excessive RAM requirement?
>
> Thanks so far, regards
> Juergen
>
> 2018-02-07 19:49 GMT+01:00 Jon Haddad <j...@jonhaddad.com>:
>
>> It would be extremely helpful to get some info about your heap. At a
>> bare minimum, a histogram of the heap dump would be useful, but ideally a
>> full heap dump would be best.
>>
>> jmap -dump:live,format=b,file=heap.bin PID
>>
>> Taking a look at that in YourKit should give some pretty quick insight
>> into what kinds of objects are allocated; then we can get to the bottom of
>> the issue. This should be moved to a JIRA
>> (https://issues.apache.org/jira/secure/Dashboard.jspa) in order to track
>> and fix it; if you could attach that heap dump it would be very helpful.
>>
>> Jon
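For reference, a much lighter-weight first step than shipping the 42 GB dump is the class histogram of the live heap that Jon mentions, which stock JDK 8 tools can produce directly. A sketch, assuming a full JDK (not just a JRE) is installed on the node and that the Cassandra process can be found by its main class name:

    # histogram of live objects, heaviest classes first
    # (the pgrep pattern assumes the default CassandraDaemon main class)
    jmap -histo:live $(pgrep -f CassandraDaemon) | head -n 40

    # the same information via jcmd
    jcmd $(pgrep -f CassandraDaemon) GC.class_histogram | head -n 40

Note that both trigger a full GC before sampling, so expect a pause on a node that is already under pressure.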
>>
>> On Feb 7, 2018, at 6:11 AM, Nicolas Guyomar <nicolas.guyo...@gmail.com> wrote:
>>
>> Ok then, following up on the wild guess: because you have quite a lot of
>> concurrent compactors, maybe it is too many concurrent compactions for the
>> JVM to deal with (taking into account that your load average of 106 seems
>> really high IMHO).
>>
>> 55 GB of data is not that much; you can try to reduce those concurrent
>> compactors to make sure your box is not under too much stress (how many
>> compactions do you have running in parallel during bootstrap?).
>>
>> In the end, it does seem that you're gonna have to share some heap dump
>> for further investigation (sorry, I'm not gonna be a lot of help on this
>> matter).
>>
>> On 7 February 2018 at 14:43, Jürgen Albersdorfer <jalbersdor...@gmail.com> wrote:
>>
>>> Hi Nicolas,
>>>
>>> Do you know how many sstables this new node is supposed to receive?
>>>
>>> If I can find this out via nodetool netstats, then it would be 619, as
>>> follows:
>>>
>>> # nodetool netstats
>>> Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
>>>     /192.168.1.215 - Receiving 71 files, 7744612158 bytes total. Already received 0 files, 893897583 bytes total
>>>     /192.168.1.214 - Receiving 58 files, 5693392001 bytes total. Already received 0 files, 1078372756 bytes total
>>>     /192.168.1.206 - Receiving 52 files, 3389096409 bytes total. Already received 3 files, 508592758 bytes total
>>>     /192.168.1.213 - Receiving 59 files, 6041633329 bytes total. Already received 0 files, 1038760653 bytes total
>>>     /192.168.1.231 - Receiving 79 files, 7579181689 bytes total. Already received 4 files, 38387859 bytes total
>>>     /192.168.1.208 - Receiving 51 files, 3272885123 bytes total. Already received 3 files, 362450903 bytes total
>>>     /192.168.1.207 - Receiving 56 files, 3028344200 bytes total. Already received 3 files, 57790197 bytes total
>>>     /192.168.1.232 - Receiving 79 files, 7268716317 bytes total. Already received 1 files, 1127174421 bytes total
>>>     /192.168.1.209 - Receiving 114 files, 21381846105 bytes total. Already received 1 files, 961497222 bytes total
>>>
>>>> does disabling compaction_throughput_mb_per_sec or
>>>> increasing concurrent_compactors have any effect?
>>>
>>> I will give it a try:
>>>
>>> # nodetool getcompactionthroughput
>>> Current compaction throughput: 128 MB/s
>>>
>>> # nodetool setcompactionthroughput 0
>>>
>>> # nodetool getcompactionthroughput
>>> Current compaction throughput: 0 MB/s
>>>
>>> # nodetool getconcurrentcompactors
>>> Current concurrent compactors in the system is: 16
>>>
>>>> Which memtable_allocation_type are you using?
>>>
>>> # grep memtable_allocation_type /etc/cassandra/conf/cassandra.yaml
>>> memtable_allocation_type: heap_buffers
>>>
>>> thanks so far, regards
>>> Juergen
>>>
>>> 2018-02-07 14:29 GMT+01:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:
>>>
>>>> Hi Jurgen,
>>>>
>>>> It does feel like some OOM during bootstrap from previous C* v2, but
>>>> that should be fixed in your version.
>>>>
>>>> Do you know how many sstables this new node is supposed to receive?
>>>>
>>>> Just a wild guess: it may have something to do with compaction not
>>>> keeping up because every other node is streaming data to this new one
>>>> (resulting in long-lived objects in the heap). Does disabling
>>>> compaction_throughput_mb_per_sec or increasing concurrent_compactors
>>>> have any effect?
>>>>
>>>> Which memtable_allocation_type are you using?
>>>>
>>>> On 7 February 2018 at 12:38, Jürgen Albersdorfer <jalbersdor...@gmail.com> wrote:
>>>>
>>>>> Hi, I always face an issue when bootstrapping a node that has less than
>>>>> 184 GB RAM (156 GB JVM heap) in our 10-node C* 3.11.1 cluster.
>>>>> During bootstrap, when I watch the cassandra.log, I observe growth in
>>>>> the JVM heap old gen which never gets significantly freed again.
>>>>> I know that the JVM collects the old gen only when really needed. I can
>>>>> see collections, but there is always a remainder which seems to grow
>>>>> forever without ever getting freed.
>>>>> After the node has successfully joined the cluster, I can remove the
>>>>> extra 128 GB of RAM I gave it for bootstrapping without any ill effect.
>>>>>
>>>>> It feels like Cassandra holds on to every single byte streamed over
>>>>> the network during bootstrapping - which would be a memory leak, and a
>>>>> major problem, too.
>>>>>
>>>>> Or is there something I'm doing wrong? - Any ideas?
>>>>>
>>>>> Here are my observations on a failing bootstrap. The following node has
>>>>> 72 GB RAM installed, 64 GB of it configured for JVM heap space.
>>>>>
>>>>> cassandra.log (truncated):
>>>>> INFO [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1 Young Generation GC in 984ms. G1 Eden Space: 14763950080 -> 0; G1 Old Gen: 36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
>>>>> INFO [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1 Young Generation GC in 784ms. G1 Eden Space: 18387828736 -> 0; G1 Old Gen: 39661338640 -> 41053847560; G1 Survivor Space: 1476395008 -> 1845493760;
>>>>> INFO [Service Thread] 2018-02-07 11:13:08,639 GCInspector.java:284 - G1 Young Generation GC in 718ms. G1 Eden Space: 16743661568 -> 0; G1 Old Gen: 41053847560 -> 42832232472; G1 Survivor Space: 1845493760 -> 1375731712;
>>>>> INFO [Service Thread] 2018-02-07 11:13:18,271 GCInspector.java:284 - G1 Young Generation GC in 546ms. G1 Eden Space: 15535702016 -> 0; G1 Old Gen: 42831004832 -> 44206736544; G1 Survivor Space: 1375731712 -> 1006632960;
>>>>> INFO [Service Thread] 2018-02-07 11:13:35,364 GCInspector.java:284 - G1 Young Generation GC in 638ms. G1 Eden Space: 14025752576 -> 0; G1 Old Gen: 44206737048 -> 45213369488; G1 Survivor Space: 1778384896 -> 1610612736;
>>>>> INFO [Service Thread] 2018-02-07 11:13:42,898 GCInspector.java:284 - G1 Young Generation GC in 614ms. G1 Eden Space: 13388218368 -> 0; G1 Old Gen: 45213369488 -> 46152893584; G1 Survivor Space: 1610612736 -> 1006632960;
>>>>> INFO [Service Thread] 2018-02-07 11:13:58,291 GCInspector.java:284 - G1 Young Generation GC in 400ms. G1 Eden Space: 13119782912 -> 0; G1 Old Gen: 46136116376 -> 47171400848; G1 Survivor Space: 1275068416 -> 771751936;
>>>>> INFO [Service Thread] 2018-02-07 11:14:23,071 GCInspector.java:284 - G1 Young Generation GC in 303ms. G1 Eden Space: 11676942336 -> 0; G1 Old Gen: 47710958232 -> 48239699096; G1 Survivor Space: 1207959552 -> 973078528;
>>>>> INFO [Service Thread] 2018-02-07 11:14:46,157 GCInspector.java:284 - G1 Young Generation GC in 305ms. G1 Eden Space: 11005853696 -> 0; G1 Old Gen: 48903342232 -> 49289001104; G1 Survivor Space: 939524096 -> 973078528;
>>>>> INFO [Service Thread] 2018-02-07 11:14:53,045 GCInspector.java:284 - G1 Young Generation GC in 380ms. G1 Eden Space: 10569646080 -> 0; G1 Old Gen: 49289001104 -> 49586732696; G1 Survivor Space: 973078528 -> 1308622848;
>>>>> INFO [Service Thread] 2018-02-07 11:15:04,692 GCInspector.java:284 - G1 Young Generation GC in 360ms. G1 Eden Space: 9294577664 -> 0; G1 Old Gen: 50671712912 -> 51269944472; G1 Survivor Space: 905969664 -> 805306368;
>>>>> WARN [Service Thread] 2018-02-07 11:15:07,317 GCInspector.java:282 - G1 Young Generation GC in 1102ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 51269944472 -> 47310521496; G1 Survivor Space: 805306368 -> 301989888;
>>>>> ...
>>>>> INFO [Service Thread] 2018-02-07 11:16:36,535 GCInspector.java:284 - G1 Young Generation GC in 377ms. G1 Eden Space: 7683964928 -> 0; G1 Old Gen: 51958433432 -> 52658554008; G1 Survivor Space: 1073741824 -> 1040187392;
>>>>> INFO [Service Thread] 2018-02-07 11:16:41,756 GCInspector.java:284 - G1 Young Generation GC in 340ms. G1 Eden Space: 7046430720 -> 0; G1 Old Gen: 52624999576 -> 53299987616; G1 Survivor Space: 1040187392 -> 805306368;
>>>>> WARN [Service Thread] 2018-02-07 11:16:44,087 GCInspector.java:282 - G1 Young Generation GC in 1005ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 53299987616 -> 49659331752; G1 Survivor Space: 805306368 -> 436207616;
>>>>> ...
>>>>> INFO [Service Thread] 2018-02-07 11:25:40,902 GCInspector.java:284 - G1 Young Generation GC in 254ms. G1 Eden Space: 11475615744 -> 0; G1 Old Gen: 48904357040 -> 48904357544; G1 Survivor Space: 704643072 -> 805306368;
>>>>> INFO [Service Thread] 2018-02-07 11:26:11,424 GCInspector.java:284 - G1 Young Generation GC in 202ms. G1 Eden Space: 11005853696 -> 0; G1 Old Gen: 48904357544 -> 49321014960; G1 Survivor Space: 939524096 -> 536870912;
>>>>> WARN [Service Thread] 2018-02-07 11:26:44,484 GCInspector.java:282 - G1 Young Generation GC in 1295ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 49321014960 -> 46255753384; G1 Survivor Space: 805306368 -> 402653184;
>>>>> ...
>>>>> INFO [Service Thread] 2018-02-07 11:30:37,828 GCInspector.java:284 - G1 Young Generation GC in 958ms. G1 Eden Space: 2785017856 -> 0; G1 Old Gen: 51196393000 -> 50629766184; G1 Survivor Space: 637534208 -> 436207616;
>>>>> INFO [Service Thread] 2018-02-07 11:30:45,036 GCInspector.java:284 - G1 Young Generation GC in 270ms. G1 Eden Space: 10267656192 -> 0; G1 Old Gen: 50629766184 -> 50626254144; G1 Survivor Space: 436207616 -> 738197504;
>>>>> INFO [Service Thread] 2018-02-07 11:31:48,128 GCInspector.java:284 - G1 Young Generation GC in 984ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 51086410272 -> 50443965480; G1 Survivor Space: 805306368 -> 369098752;
>>>>>
>>>>> jvm.options as follows (comments removed):
>>>>> ## Use the Hotspot garbage-first collector.
>>>>> -XX:+UseG1GC
>>>>> -XX:MaxGCPauseMillis=1000
>>>>> -XX:InitiatingHeapOccupancyPercent=70
>>>>> -XX:ParallelGCThreads=16
>>>>> -XX:ConcGCThreads=16
>>>>>
>>>>> ### GC logging options -- uncomment to enable
>>>>> -XX:+PrintGCDetails
>>>>> -XX:+PrintGCDateStamps
>>>>> -XX:+PrintHeapAtGC
>>>>> -XX:+PrintTenuringDistribution
>>>>> -XX:+PrintGCApplicationStoppedTime
>>>>> -XX:+PrintPromotionFailure
>>>>> #-XX:PrintFLSStatistics=1
>>>>> #-Xloggc:/var/log/cassandra/gc.log
>>>>> -XX:+UseGCLogFileRotation
>>>>> -XX:NumberOfGCLogFiles=10
>>>>> -XX:GCLogFileSize=10M
>>>>>
>>>>> I tried this with ParNewGC and ConcMarkSweepGC, too - and I see the
>>>>> same behavior there as well.
>>>>>
>>>>> From nodetool netstats I see that it wants to stream about 55.9 GB.
>>>>> After 1.5 h of streaming at more than 10 MB/s (about 54 GB seen with
>>>>> dstat), nodetool netstats shows that only 3.5 GB of the 55.9 GB have
>>>>> been received.
>>>>>
>>>>> uptime
>>>>>  11:30:52 up 1:36, 3 users, load average: 106.01, 87.54, 66.01
>>>>>
>>>>> nodetool netstats (truncated for better reading):
>>>>> Wed Feb 7 11:19:07 CET 2018
>>>>> Mode: JOINING
>>>>> Bootstrap 56d204d0-0be9-11e8-ae30-617216855b4a
>>>>>     /192.168.1.213 - Receiving 68 files, 6.774.831.556 bytes total. Already received 3 files, 279.238.740 bytes total
>>>>>     /192.168.1.215 - Receiving 68 files, 5.721.460.494 bytes total. Already received 4 files, 109.051.913 bytes total
>>>>>     /192.168.1.214 - Receiving 68 files, 7.497.726.056 bytes total. Already received 4 files, 870.592.708 bytes total
>>>>>     /192.168.1.207 - Receiving 63 files, 4.945.809.501 bytes total. Already received 4 files, 700.599.427 bytes total
>>>>>     /192.168.1.232 - Receiving 91 files, 7.344.537.682 bytes total. Already received 3 files, 237.482.005 bytes total
>>>>>     /192.168.1.209 - Receiving 102 files, 15.931.849.729 bytes total. Already received 3 files, 1.108.754.920 bytes total
>>>>>     /192.168.1.231 - Receiving 92 files, 7.927.882.516 bytes total. Already received 4 files, 269.514.936 bytes total
>>>>>
>>>>> nodetool status:
>>>>> Datacenter: main
>>>>> ================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address        Load       Tokens  Owns  Host ID                               Rack
>>>>> UN  192.168.1.232  83,31 GiB  256     ?     510a0068-ee2b-4d1f-9965-9e29602d2f8f  rack04
>>>>> UN  192.168.1.206  51,41 GiB  256     ?     a168b632-52e7-408a-ae7f-6ba6b9c55cea  rack01
>>>>> UN  192.168.1.207  57,66 GiB  256     ?     7401ab8f-114d-41b4-801d-53a4b042de52  rack01
>>>>> UN  192.168.1.208  56,47 GiB  256     ?     767980ef-52f2-4c21-8567-324fc1db274c  rack01
>>>>> ...
>>>>> UJ  192.168.1.160  68,95 GiB  256     ?     a3a5a169-512f-4e1f-8c0b-419c828f23e1  rack02
>>>>> UN  192.168.1.209  94,27 GiB  256     ?     8757cb4a-183e-4828-8212-7715b5563935  rack02
>>>>> UN  192.168.1.213  78,26 GiB  256     ?     b1e9481c-4ba2-4396-837a-84be35737fe7  rack05
>>>>> UN  192.168.1.214  80,66 GiB  256     ?     457fc606-7002-49ad-8da5-309b92093acf  rack06
>>>>> UN  192.168.1.231  87,5 GiB   256     ?     2017a9e8-3638-465e-bc4a-5e59e693fb49  rack03
>>>>> UN  192.168.1.215  86,97 GiB  256     ?     5dfe4c35-8f8a-4305-824a-4610cec9411b  rack07
>>>>>
>>>>> thanks, and kind regards
>>>>> Juergen
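A closing note on the flood of "Failed to apply hint" messages reported earlier in the thread: if hint delivery to the joining node turns out to be part of the memory pressure, it can be throttled or paused on the sending nodes while the bootstrap runs. A sketch using standard nodetool commands (the 512 KB/s figure is an illustrative value, not a recommendation):

    # check whether hinted handoff is active on this node
    nodetool statushandoff

    # throttle hint delivery (value in KB/s)
    nodetool sethintedhandoffthrottlekb 512

    # or pause hint delivery entirely during the join, and resume afterwards
    nodetool pausehandoff
    nodetool resumehandoff

Whether hints are actually what fills the old gen here is exactly what the heap histogram or dump should confirm first.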