Using:
- JDK 1.8u40
- UseG1GC, ParallelRefProcEnabled, Xmx12g,Xms12g
- Solr 4.10.4


When using G1GC we are seeing very high processing times in the GC Remark
phase during reference processing. Originally we saw high times during
WeakReference processing but adding"-XX:+ParallelRefProcEnabled" flag did
away with this. Now we frequently see these times are very high for JNI
Weak Reference processing (90seconds+!!!).  I've only noticed this being an
issue during STW processing in the Remark phase after an initial-mark phase
has executed. Reference processing during young or mixed collections never
seems to be as big an issue (milliseconds to 1-2s max). Currently JNI Weak
Reference are not processed in parallel in HotSpot so adding the parallel
flag has no affect.  (https://bugs.openjdk.java.net/browse/JDK-8072498)

Over the lifetime of an application we will see a similar pattern after
days of uptime, one where the amount of time taken for Remark gradually
continues increasing from milliseconds to minutes. If a FullGC is issued we
will see the times for reference processing reset back down to milliseconds.

Likely related: after a heap dump of a running application we noticed
hundreds of thousands of unreferenced DirectByteBuffers. And hundreds of
thousands more of them being referenced from Lucene410DocValuesProducer.

*Application Details:*
- Heavy indexing:
  - ~4000 bulk updates per minute, ~20 documents each update or 80k
documents per minute
- ~100 fields per document, mostly small strings, TrieInt, TrieDouble
- Usage of docValues (1-5 fields per document)
- Some fields multi-value TrieInt (precisionStep=0) fields could
potentially have hundreds of values
- 1s auto soft commit openSearcher=true, 15s auto hard commit
openSearcher=false


*Questions*:
1) What is causing the JNI Weak References?
  - Is it from nio? Usage of MMaps, DirectBuffers, etc?
2) Why does it become worse during the lifetime of the application?
  - Is there a leak?
  - Are we getting into a situation where we begin promoting Searchers
along with
    their IndexReaders/Writers too early. i.e. do we have a lot expensive
objects moving to
    tenured generation just before they become dereferenced.
      - Tune G1NewSizePercent?
3) Are the JNI Weak References being visited but not cleaned during GC?
  - Why not? tune G1HeapWastePercent?


As of right now we are experimenting with different G1GC flags, however
without understanding the root cause it sometimes requires days between
experiments for the problem to present itself.


We have not tuned any of the options above which disable the adaptive
sizing policy, some options we are actively testing are:

- Lowering InitiatingHeapOccupancyPercent
 - To force dereferenced objects in tenured to be cleaned earlier.

- Increasing MaxGCPauseMillis
 - To encourage policy for the young generation size to be larger
(discouraging premature promotion).


Example of remark phases below (full gc log can be provided if required)

One *example log file* (however we have seen worse), using flags in the
beginning:

1348278.630: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
Desired survivor size 41943040 bytes, new threshold 15 (max 15)
- age   1:   36138128 bytes,   36138128 total
1348278.676: [SoftReference, 0 refs, 0.0086523 secs]1348278.684:
[WeakReference, 7128 refs, 0.0047846 secs]1348278.689: [FinalReference, 414
refs, 0.0063207 secs]1348278.695: [PhantomReference, 0 refs, 376 refs,
0.0129979 secs]1348278.709: [JNI Weak Reference, 0.1018689 secs], 0.1955299
secs]
   [Parallel Time: 39.7 ms, GC Workers: 23]
      [GC Worker Start (ms): Min: 1348278633.2, Avg: 1348278633.7, Max:
1348278634.2, Diff: 1.0]
      [Ext Root Scanning (ms): Min: 11.2, Avg: 11.7, Max: 14.9, Diff: 3.7,
Sum: 269.2]
      [Update RS (ms): Min: 9.9, Avg: 13.3, Max: 14.0, Diff: 4.1, Sum:
304.8]
         [Processed Buffers: Min: 23, Avg: 66.9, Max: 117, Diff: 94, Sum:
1538]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.0]
      [Object Copy (ms): Min: 11.8, Avg: 12.4, Max: 12.9, Diff: 1.1, Sum:
286.3]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.3]
      [GC Worker Other (ms): Min: 0.1, Avg: 0.3, Max: 0.7, Diff: 0.6, Sum:
6.2]
      [GC Worker Total (ms): Min: 37.0, Avg: 37.7, Max: 38.6, Diff: 1.6,
Sum: 867.4]
      [GC Worker End (ms): Min: 1348278671.2, Avg: 1348278671.4, Max:
1348278671.8, Diff: 0.6]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 2.8 ms]
   [Other: 152.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 137.2 ms]
      [Ref Enq: 2.6 ms]
      [Redirty Cards: 3.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.6 ms]
   [Eden: 532.0M(532.0M)->0.0B(532.0M) Survivors: 80.0M->80.0M Heap:
6225.6M(12.0G)->5701.0M(12.0G)]
 [Times: user=0.80 sys=0.04, real=0.20 secs]
1348278.827: [GC concurrent-root-region-scan-start]
1348278.828: Total time for which application threads were stopped:
0.2145574 seconds, Stopping threads took: 0.0012976 seconds
1348278.844: [GC concurrent-root-region-scan-end, 0.0170768 secs]
1348278.844: [GC concurrent-mark-start]
1348279.075: [GC concurrent-mark-end, 0.2306171 secs]
1348279.088: [GC remark 1348279.088: [Finalize Marking, 0.0051946 secs]
1348279.093: [GC ref-proc1348279.093: [SoftReference, 186 refs, 0.0089505
secs]1348279.102: [WeakReference, 2668 refs, 0.0051839 secs]1348279.107:
[FinalReference, 848 refs, 0.0088167 secs]1348279.116: [PhantomReference, 0
refs, 3836 refs, 0.0130326 secs]1348279.129: [JNI Weak Reference,
18.9741270 secs], 19.0130579 secs] 1348298.106: [Unloading, 0.0177559
secs], 19.0412558 secs]
 [Times: user=19.59 sys=0.00, real=19.04 secs]
1348298.132: Total time for which application threads were stopped:
19.0566871 seconds, Stopping threads took: 0.0011100 seconds
1348298.147: [GC cleanup 5836M->4256M(12G), 0.0124121 secs]
 [Times: user=0.11 sys=0.00, real=0.01 secs]
...
1348358.227: [GC remark 1348358.227: [Finalize Marking, 0.0042513 secs]
1348358.231: [GC ref-proc1348358.231: [SoftReference, 197 refs, 0.0085051
secs]1348358.240: [WeakReference, 5962 refs, 0.0082517 secs]1348358.248:
[FinalReference, 1881 refs, 0.0074058 secs]1348358.255: [PhantomReference,
0 refs, 3802 refs, 0.0116230 secs]1348358.267: [JNI Weak Reference,
29.3914835 secs], 29.4310975 secs] 1348387.662: [Unloading, 0.0179274
secs], 29.4583167 secs]
1350907.355: [GC remark 1350907.355: [Finalize Marking, 0.0066394 secs]
1350907.362: [GC ref-proc1350907.362: [SoftReference, 189 refs, 0.0084702
secs]1350907.370: [WeakReference, 2379 refs, 0.0046033 secs]1350907.375:
[FinalReference, 779 refs, 0.0086055 secs]1350907.383: [PhantomReference, 0
refs, 4349 refs, 0.0124574 secs]1350907.396: [JNI Weak Reference,
29.9550822 secs], 29.9923290 secs] 1350937.354: [Unloading, 0.0171185
secs], 30.0213910 secs]
...
...
...
1377495.992: [GC remark 1377495.992: [Finalize Marking, 0.0039972 secs]
1377495.996: [GC ref-proc1377495.996: [SoftReference, 192 refs, 0.0084780
secs]1377496.005: [WeakReference, 1171 refs, 0.0040591 secs]1377496.009:
[FinalReference, 619 refs, 0.0068656 secs]1377496.016: [PhantomReference, 0
refs, 4886 refs, 0.0094143 secs]1377496.025: [JNI Weak Reference,
48.3669099 secs], 48.3980356 secs] 1377544.394: [Unloading, 0.0159881
secs], 48.4228951 secs]
1378034.868: [GC remark 1378034.868: [Finalize Marking, 0.0033629 secs]
1378034.871: [GC ref-proc1378034.871: [SoftReference, 193 refs, 0.0059063
secs]1378034.877: [WeakReference, 1332 refs, 0.0036461 secs]1378034.881:
[FinalReference, 611 refs, 0.0046526 secs]1378034.885: [PhantomReference, 0
refs, 4254 refs, 0.0059437 secs]1378034.891: [JNI Weak Reference,
46.6392836 secs], 46.6624796 secs] 1378081.534: [Unloading, 0.0180802
secs], 46.6893835 secs]
1378677.326: [GC remark 1378677.326: [Finalize Marking, 0.0039769 secs]
1378677.330: [GC ref-proc1378677.330: [SoftReference, 193 refs, 0.0080377
secs]1378677.338: [WeakReference, 1719 refs, 0.0035972 secs]1378677.341:
[FinalReference, 351 refs, 0.0063767 secs]1378677.348: [PhantomReference, 0
refs, 4753 refs, 0.0103290 secs]1378677.358: [JNI Weak Reference,
43.9230851 secs], 43.9537699 secs] 1378721.284: [Unloading, 0.0150821
secs], 43.9780128 secs]
1379626.009: [GC remark 1379626.009: [Finalize Marking, 0.0040175 secs]
1379626.013: [GC ref-proc1379626.013: [SoftReference, 186 refs, 0.0080680
secs]1379626.021: [WeakReference, 955 refs, 0.0064521 secs]1379626.028:
[FinalReference, 567 refs, 0.0098877 secs]1379626.038: [PhantomReference, 0
refs, 4701 refs, 0.0104484 secs]1379626.048: [JNI Weak Reference,
51.3181908 secs], 51.3558998 secs] 1379677.369: [Unloading, 0.0207052
secs], 51.3856987 secs]
...
1380251.013: [GC pause (G1 Evacuation Pause) (young) (initial-mark)
Desired survivor size 41943040 bytes, new threshold 1 (max 15)
- age   1:   46612512 bytes,   46612512 total
[SoftReference, 0 refs, 0.0074261 secs]1380251.048: [WeakReference, 5189
refs, 0.0041909 secs]1380251.052: [FinalReference, 393 refs, 0.0042160
secs]1380251.056: [PhantomReference, 0 refs, 83 refs, 0.0074198
secs]1380251.064: [JNI Weak Reference, 3.0041020 secs], 3.0678505 secs]
   [Parallel Time: 23.5 ms, GC Workers: 23]
      [GC Worker Start (ms): Min: 1380251014.8, Avg: 1380251015.3, Max:
1380251015.9, Diff: 1.1]
      [Ext Root Scanning (ms): Min: 6.7, Avg: 7.5, Max: 10.9, Diff: 4.2,
Sum: 173.4]
      [Update RS (ms): Min: 3.7, Avg: 6.5, Max: 7.0, Diff: 3.3, Sum: 150.5]
         [Processed Buffers: Min: 27, Avg: 66.7, Max: 112, Diff: 85, Sum:
1534]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.0]
      [Object Copy (ms): Min: 6.3, Avg: 6.7, Max: 7.3, Diff: 1.0, Sum:
154.9]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Other (ms): Min: 0.1, Avg: 0.4, Max: 0.8, Diff: 0.8, Sum:
8.3]
      [GC Worker Total (ms): Min: 20.4, Avg: 21.2, Max: 21.8, Diff: 1.4,
Sum: 487.7]
      [GC Worker End (ms): Min: 1380251036.2, Avg: 1380251036.5, Max:
1380251037.0, Diff: 0.8]
   [Code Root Fixup: 0.4 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 2.9 ms]
   [Other: 3041.1 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 3029.5 ms]
      [Ref Enq: 2.7 ms]
      [Redirty Cards: 2.7 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.3 ms]
   [Eden: 532.0M(532.0M)->0.0B(548.0M) Survivors: 80.0M->64.0M Heap:
6142.5M(12.0G)->5620.7M(12.0G)]
 [Times: user=3.46 sys=0.00, real=3.07 secs]
1380254.081: [GC concurrent-root-region-scan-start]
1380254.082: Total time for which application threads were stopped:
3.0847197 seconds, Stopping threads took: 0.0011519 seconds
1380254.092: [GC concurrent-root-region-scan-end, 0.0105392 secs]
1380254.092: [GC concurrent-mark-start]
1380254.275: [GC concurrent-mark-end, 0.1832853 secs]
1380254.289: [GC remark 1380254.290: [Finalize Marking, 0.0039580 secs]
1380254.294: [GC ref-proc1380254.294: [SoftReference, 183 refs, 0.0068857
secs]1380254.300: [WeakReference, 1384 refs, 0.0043966 secs]1380254.305:
[FinalReference, 579 refs, 0.0073469 secs]1380254.312: [PhantomReference, 0
refs, 4327 refs, 0.0077366 secs]1380254.320: [JNI Weak Reference,
38.7211438 secs], 38.7506058 secs] 1380293.044: [Unloading, 0.0174879
secs], 38.7765019 secs]
 [Times: user=39.37 sys=0.00, real=38.78 secs]
1380293.067: Total time for which application threads were stopped:
38.7917823 seconds, Stopping threads took: 0.0011561 seconds
1380293.077: [GC cleanup 5659M->4167M(12G), 0.0120391 secs]
 [Times: user=0.11 sys=0.00, real=0.01 secs]

Reply via email to