Hi Chuck,

of course it could be something other than garbage collection, but at this 
point I'm pretty sure it's something related to memory, either GC, or snapshot 
cache size, or something related.

I checked for all sorts of things that can occur at 5pm and didn't find any. In 
fact the problematic behaviour can also start a few minutes before or after 
that sometimes, and I don't see any meaningful spikes in resource consumption 
of any kind at that time. Backup is at night, I have monitoring for expensive 
processing jobs and also would see them in the application log. Database is 
exclusive for this application, and EOF transaction times are fast.

The one thing that I always see when this happens is a large heap, and it never 
happens as long as the heap is small. I spent the last 24 hours learning about 
garbage collectors, and there's a lot to test and try there, but ultimately 
there are unavoidable Stop-The-World events with all available options in the 
GC implementations we have in the Oracle JVM, and the only thing you can tune 
is how often they occur vs. how long they take, and both depend on either the 
live object set size or the total heap size.

Maik



> Am 26.01.2016 um 19:44 schrieb Chuck Hill <ch...@gevityinc.com>:
> 
> Hi Maik,
> 
> I will guess that the answer to the following is No as you said if you 
> restart the apps at 3PM the problem does not a happen.
> 
> Could it be something other than garbage collection?  I’ve seen a similar 
> problem when someone installed backup software that ran during business 
> hours.  Could it be something unrelated to  your apps that starts at 5PM?  Or 
> are some users starting very intensive processing at the end of their day and 
> impacting other users?  Could it be on the database side?  A delay there 
> would leave the app stalled with very little CPU or IO activity.  That is the 
> only thing that occurred to me other than the GC you are investigating.
> 
> Chuck
> 
> 
> From: <webobjects-dev-bounces+chill=gevityinc....@lists.apple.com 
> <mailto:webobjects-dev-bounces+chill=gevityinc....@lists.apple.com>> on 
> behalf of Musall Maik <m...@selbstdenker.ag <mailto:m...@selbstdenker.ag>>
> Date: Tuesday, January 26, 2016 at 5:28 AM
> To: Apple WebObjects Developer List <webobjects-dev@lists.apple.com 
> <mailto:webobjects-dev@lists.apple.com>>
> Subject: Re: Instance becomes unresponsive after 12 hours
> 
> Hi all,
> 
> a few of you also replied by private mail, thanks. Supplying the -Xloggc 
> option worked, although I don't understand where the info went before, and 
> the log rotation option doesn't rotate anything but just truncates the log 
> file on every JVM restart, which sucks. But ok, I have some data.
> 
> Below you find the log of one session. Users reported bad response times that 
> correspond with the Full GC runs, but also very bad response times from 17:00 
> onward where there is often no GC activity logged at the times that users 
> experienced waiting time. So I'm not entirely sure if this is all caused by 
> GC.
> 
> Anyway, heap sizes of up to 60 GByte are apparently not so common, at least 
> not with WebObjects. I found a helpful article [1] about very large JVM 
> heaps, including the hint to the Zing Azul JVM, which features a stopless 
> garbage collector [3], but costs $8000 per server and year. In the end, we 
> may actually be trying that if other options fail, but first I need to 
> understand more about what's in memory. I'm currently approaching that 
> question via JProfiler, especially to find out if that's all snapshot cache 
> (how would I limit that in size?) or some other homemade memory leak.
> 
> Does anyone have experience with the combination of Azul and WO?
> 
> I also found one strange event today where the application appeared stuck, 
> but continued after a minute or two, while the gc log file was truncated at 
> that time, without the JVM restarting. I can't see how that's supposed to 
> happen. Here's how that looked in my shell running "tail -F gc.log":
> 
> 
> [PSYoungGen: 15256082K->3431947K(16962048K)] 25580475K->13801928K(27895808K), 
> 1.1505415 secs] [Times: user=6.00 sys=0.03, real=1.15 secs]
> 2016-01-26T12:11:38.640+0100: 26136.407: [GC (Allocation Failure)
> Desired survivor size 4679794688 bytes, new threshold 15 (max 15)
> [PSYoungGen: 15791627K->3414349K(16958976K)] 26161608K->13935770K(27892736K), 
> 1.0761646 secs] [Times: user=6.10 sys=0.17, real=1.08 secs]
> 2016-01-26T12:16:10.835+0100: 26408.603: [GC (Allocation Failure)
> Desired survivor size 4657250304 bytes, new threshold 15 (max 15)
> [PSYoungGen: 15774029K->3449938K(17397760K)] 26295450K->14128338K(28331520K), 
> 1.3285101 secs] [Times: user=6.24 sys=1.39, real=1.33 secs]
> 2016-01-26T12:16:12.164+0100: 26409.931: [Full GC (Ergonomics) [PSYoungGen: 
> 3449938K->2355719K(17397760K)] [ParOldGen: 10678400K->10933508K(13649408K)] 
> 14128338K->13289227K(31047168K), [Metaspace: 84345K->84345K(94208K)], 
> 12.1170523 secs] [Times: user=67.98 sys=1.03, real=12.11 secs]
> 2016-01-26T12:21:18.990+0100: 26716.757: [GC (Allocation Failure)
> Desired survivor size 4644667392 bytes, new threshold 15 (max 15)
> [PSYoungGen: 15218183K->2398851K(17410560K)] 26151691K->13458637K(31059968K), 
> 1.3389149 secs] [Times: user=8.72 sys=0.17, real=1.34 secs]
> tail: gc.log: file truncated
> Java HotSpot(TM) 64-Bit Server VM (25.66-b17) for linux-amd64 JRE 
> (1.8.0_66-b17), built on Oct  6 2015 17:28:34 by "java_re" with gcc 4.3.0 
> 20080428 (Red Hat 4.3.0-8)
> Memory: 4k page, physical 99008820k(13655516k free), swap 67108860k(66990156k 
> free)
> CommandLine flags: -XX:InitialHeapSize=1584141120 -XX:+ManagementServer 
> -XX:MaxHeapSize=90194313216 -XX:+PrintGC -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
> -XX:-UseGCLogFileRotation -XX:+UseParallelGC
> 2016-01-26T12:33:36.029+0100: 27453.796: [GC (Allocation Failure)
> Desired survivor size 4588568576 bytes, new threshold 15 (max 15)
> [PSYoungGen: 15261315K->2538940K(17720832K)] 26321101K->13670347K(31370240K), 
> 0.7936427 secs] [Times: user=4.62 sys=0.03, real=0.80 secs]
> 2016-01-26T12:39:29.156+0100: 27806.923: [GC (Allocation Failure)
> 
> 
> And here's the above-mentioned full gc log of 4 hours during the most 
> critical usage time where users experienced waits. Notable wait times 
> according to user's notes occurred at least at the following times: 15:07, 
> 16:05, 16:57, 17:00-17:35, 17:59.
> 
> 
> 2016-01-25T15:06:54.344+0100: 36634.642: [GC (Allocation Failure) 
> [PSYoungGen: 18701268K->2058545K(23079424K)] 33613120K->17079754K(41686016K), 
> 0.8635088 secs] [Times: user=4.22 sys=0.50, real=0.86 secs] 
> 2016-01-25T15:15:48.438+0100: 37168.736: [GC (Allocation Failure) 
> [PSYoungGen: 19683633K->2231668K(23157248K)] 34704842K->17370085K(41763840K), 
> 0.9898285 secs] [Times: user=4.98 sys=0.05, real=0.99 secs] 
> 2016-01-25T15:24:54.891+0100: 37715.189: [GC (Allocation Failure) 
> [PSYoungGen: 19856756K->2254238K(23623168K)] 34995173K->17626274K(42229760K), 
> 1.0030974 secs] [Times: user=4.91 sys=0.30, real=1.00 secs] 
> 2016-01-25T15:33:58.784+0100: 38259.082: [GC (Allocation Failure) 
> [PSYoungGen: 20552606K->2502793K(23740416K)] 35924642K->17914660K(42347008K), 
> 1.0203825 secs] [Times: user=5.32 sys=0.24, real=1.02 secs] 
> 2016-01-25T15:43:12.959+0100: 38813.256: [GC (Allocation Failure) 
> [PSYoungGen: 20801161K->2677482K(23608320K)] 36213028K->18138909K(42214912K), 
> 1.2300001 secs] [Times: user=5.53 sys=0.22, real=1.23 secs] 
> 2016-01-25T15:51:16.686+0100: 39296.983: [GC (Allocation Failure) 
> [PSYoungGen: 21276394K->2800542K(23760384K)] 36737821K->18411609K(42366976K), 
> 1.1514720 secs] [Times: user=6.19 sys=0.15, real=1.15 secs] 
> 2016-01-25T15:58:35.696+0100: 39735.994: [GC (Allocation Failure) 
> [PSYoungGen: 21399454K->2938402K(23849472K)] 37010521K->18700024K(42456064K), 
> 2.8563665 secs] [Times: user=6.23 sys=8.99, real=2.86 secs] 
> 2016-01-25T16:05:33.310+0100: 40153.607: [GC (System.gc()) [PSYoungGen: 
> 20526288K->2986997K(22197760K)] 36287910K->18889778K(40804352K), 1.2626979 
> secs] [Times: user=6.22 sys=1.00, real=1.27 secs] 
> 2016-01-25T16:05:34.572+0100: 40154.870: [Full GC (System.gc()) [PSYoungGen: 
> 2986997K->0K(22197760K)] [ParOldGen: 15902781K->16935926K(18606592K)] 
> 18889778K->16935926K(40804352K), [Metaspace: 90226K->90226K(100352K)], 
> 15.0685505 secs] [Times: user=86.67 sys=7.55, real=15.06 secs] 
> 2016-01-25T16:13:35.766+0100: 40636.064: [GC (Allocation Failure) 
> [PSYoungGen: 19210752K->419289K(23559168K)] 36146678K->17355224K(42165760K), 
> 0.5276242 secs] [Times: user=2.41 sys=0.00, real=0.53 secs] 
> 2016-01-25T16:22:46.735+0100: 41187.033: [GC (Allocation Failure) 
> [PSYoungGen: 19353561K->740457K(23835648K)] 36289496K->17676391K(42442240K), 
> 0.6634034 secs] [Times: user=2.52 sys=0.00, real=0.66 secs] 
> 2016-01-25T16:30:31.503+0100: 41651.801: [GC (Allocation Failure) 
> [PSYoungGen: 19674729K->1023572K(24313856K)] 36610663K->17959507K(42920448K), 
> 0.8099960 secs] [Times: user=2.72 sys=0.08, real=0.81 secs] 
> 2016-01-25T16:37:51.095+0100: 42091.393: [GC (Allocation Failure) 
> [PSYoungGen: 20311636K->1263580K(20551680K)] 37247571K->18199515K(39158272K), 
> 0.7992434 secs] [Times: user=3.32 sys=0.01, real=0.80 secs] 
> 2016-01-25T16:49:34.671+0100: 42794.969: [GC (Allocation Failure) 
> [PSYoungGen: 20551644K->1565656K(24408576K)] 37487579K->18501598K(43015168K), 
> 0.8773248 secs] [Times: user=3.74 sys=0.00, real=0.87 secs] 
> 2016-01-25T16:56:34.541+0100: 43214.839: [GC (Allocation Failure) 
> [PSYoungGen: 21011416K->1791681K(21237760K)] 37947358K->18727623K(39844352K), 
> 0.7816826 secs] [Times: user=4.03 sys=0.00, real=0.79 secs] 
> 2016-01-25T17:01:31.494+0100: 43511.792: [GC (Allocation Failure) 
> [PSYoungGen: 21237441K->1997601K(24417280K)] 38173383K->18933551K(43023872K), 
> 0.8769351 secs] [Times: user=4.45 sys=0.09, real=0.88 secs] 
> 2016-01-25T17:07:13.856+0100: 43854.154: [GC (Allocation Failure) 
> [PSYoungGen: 21797153K->2080272K(24207360K)] 38733103K->19016222K(42813952K), 
> 0.8543820 secs] [Times: user=4.50 sys=0.00, real=0.86 secs] 
> 2016-01-25T17:12:37.866+0100: 44178.164: [GC (Allocation Failure) 
> [PSYoungGen: 21879824K->2170000K(24282112K)] 38815774K->19105950K(42888704K), 
> 0.8236605 secs] [Times: user=4.92 sys=0.00, real=0.82 secs] 
> 2016-01-25T17:18:05.977+0100: 44506.275: [GC (Allocation Failure) 
> [PSYoungGen: 22576272K->2299192K(24404992K)] 39512222K->19235143K(43011584K), 
> 0.7759199 secs] [Times: user=4.77 sys=0.00, real=0.78 secs] 
> 2016-01-25T17:28:11.288+0100: 45111.586: [GC (Allocation Failure) 
> [PSYoungGen: 22705464K->2550874K(24838656K)] 39641415K->19486825K(43445248K), 
> 1.3544459 secs] [Times: user=5.52 sys=1.93, real=1.36 secs] 
> 2016-01-25T17:35:39.350+0100: 45559.648: [GC (Allocation Failure) 
> [PSYoungGen: 23578202K->2665756K(24871424K)] 40514153K->19601715K(43478016K), 
> 1.0373681 secs] [Times: user=5.67 sys=0.04, real=1.04 secs] 
> 2016-01-25T17:43:31.775+0100: 46032.073: [GC (Allocation Failure) 
> [PSYoungGen: 23693084K->2891569K(25351168K)] 40629043K->19827535K(43957760K), 
> 2.8873970 secs] [Times: user=5.28 sys=11.09, real=2.89 secs] 
> 2016-01-25T17:51:22.218+0100: 46502.516: [GC (Allocation Failure) 
> [PSYoungGen: 24390449K->3020322K(24519680K)] 41326415K->19956296K(43126272K), 
> 1.0685985 secs] [Times: user=6.17 sys=0.00, real=1.07 secs] 
> 2016-01-25T18:00:19.997+0100: 47040.295: [GC (Allocation Failure) 
> [PSYoungGen: 24519202K->3177498K(25069568K)] 41455176K->20113480K(43676160K), 
> 1.3560052 secs] [Times: user=6.39 sys=0.34, real=1.36 secs] 
> 2016-01-25T18:09:14.251+0100: 47574.548: [GC (Allocation Failure) 
> [PSYoungGen: 24215578K->3059878K(24098304K)] 41151560K->20219258K(42704896K), 
> 1.5082240 secs] [Times: user=6.62 sys=2.86, real=1.50 secs] 
> 2016-01-25T18:15:01.783+0100: 47922.080: [GC (Allocation Failure) 
> [PSYoungGen: 24097958K->2809652K(25209856K)] 41257338K->20232628K(43816448K), 
> 1.8871482 secs] [Times: user=6.38 sys=6.59, real=1.88 secs] 
> 2016-01-25T18:24:10.926+0100: 48471.223: [GC (Allocation Failure) 
> [PSYoungGen: 23806772K->2757610K(23754752K)] 41229748K->20472641K(42361344K), 
> 1.8807114 secs] [Times: user=6.50 sys=6.05, real=1.88 secs] 
> 2016-01-25T18:32:11.039+0100: 48951.336: [GC (Allocation Failure) 
> [PSYoungGen: 23754730K->2664985K(25179136K)] 41469761K->20610485K(43785728K), 
> 1.6722986 secs] [Times: user=5.99 sys=4.89, real=1.68 secs] 
> 2016-01-25T18:32:33.693+0100: 48973.991: [GC (System.gc()) [PSYoungGen: 
> 5342511K->2261788K(25064448K)] 23288012K->20489350K(43671040K), 1.4310250 
> secs] [Times: user=5.08 sys=5.69, real=1.43 secs] 
> 2016-01-25T18:32:35.124+0100: 48975.422: [Full GC (System.gc()) [PSYoungGen: 
> 2261788K->373549K(25064448K)] [ParOldGen: 18227562K->18606102K(18606592K)] 
> 20489350K->18979651K(43671040K), [Metaspace: 94654K->94654K(104448K)], 
> 17.7869839 secs] [Times: user=109.81 sys=4.10, real=17.79 secs] 
> 2016-01-25T18:38:01.155+0100: 49301.452: [Full GC (Ergonomics) [PSYoungGen: 
> 21485357K->0K(25064448K)] [ParOldGen: 18606102K->17803897K(22484992K)] 
> 40091459K->17803897K(47549440K), [Metaspace: 94728K->94728K(104448K)], 
> 11.2804123 secs] [Times: user=71.57 sys=1.84, real=11.28 secs] 
> 2016-01-25T18:49:54.421+0100: 50014.719: [GC (Allocation Failure) 
> [PSYoungGen: 21111808K->310417K(24994816K)] 38915705K->18114322K(47479808K), 
> 0.4822526 secs] [Times: user=1.49 sys=0.00, real=0.49 secs] 
> 2016-01-25T19:02:11.033+0100: 50751.331: [GC (Allocation Failure) 
> [PSYoungGen: 21435537K->650681K(21775872K)] 39239442K->18454586K(44260864K), 
> 0.4177362 secs] [Times: user=1.93 sys=0.01, real=0.42 secs] 
> 2016-01-25T19:09:54.731+0100: 51215.028: [GC (Allocation Failure) 
> [PSYoungGen: 21775801K->1405472K(25216000K)] 39579706K->19209377K(47700992K), 
> 0.6537367 secs] [Times: user=3.92 sys=0.00, real=0.65 secs] 
> 
> 
> As always, I'm thankful for any more hints. Please reply publicly so everyone 
> else can benefit from the answers in the future.
> 
> 
> Thanks
> Maik
> 
> 
> [1] 
> http://stackoverflow.com/questions/214362/java-very-large-heap-sizes#6340745 
> <http://stackoverflow.com/questions/214362/java-very-large-heap-sizes#6340745>
> [2] https://www.azul.com/products/zing/ <https://www.azul.com/products/zing/>
> [3] http://www.infoq.com/articles/azul_gc_in_detail 
> <http://www.infoq.com/articles/azul_gc_in_detail>
> 
> 
>> Am 24.01.2016 um 21:01 schrieb Bogdan Zlatanov <bogdan.zlata...@gmail.com 
>> <mailto:bogdan.zlata...@gmail.com>>:
>> 
>> Hi again,
>> 
>> Try redirecting all the GC output in it's own file with the -Xloggc: 
>> /path/to/gc/log option.
>> 
>> Cheers,
>> Bogdan
>> 
>> On 24/01/2016 20:19, Maik Musall wrote:
>>> Hi,
>>> 
>>> yes, I tried, but the documented log output would never appear anywhere in 
>>> my logs. These are the options I'm using:
>>> 
>>> -verbose:gc
>>> -XX:+PrintGCDetails
>>> -XX:+PrintGCDateStamps
>>> 
>>> Despite these settings, I can't see ANYTHING in the logs about GC. I don't 
>>> know what I'm doing wrong there?
>>> 
>>> Maik
>>> 
>>>> Am 24.01.2016 um 13:27 schrieb Bogdan Zlatanov <bogdan.zlata...@gmail.com> 
>>>> <mailto:bogdan.zlata...@gmail.com>:
>>>> 
>>>> Hi,
>>>> 
>>>> Have you tried tracing the garbage collection -> 
>>>> https://docs.oracle.com/cd/E19159-01/819-3681/abeig/index.html 
>>>> <https://docs.oracle.com/cd/E19159-01/819-3681/abeig/index.html>?
>>>> 
>>>> Cheers,
>>>> Bogdan
>>>> 
>>>>> On 24/01/2016 13:01, Musall Maik wrote:
>>>>> Hi,
>>>>> 
>>>>> I have an application that often becomes laggy and shows slow responses 
>>>>> almost every day around 5 pm. The instance has daily scheduled restarts 
>>>>> at 5 am, so it occurs after 12 hours of use. When I restart it 
>>>>> preemptively at 3 pm, the problem does not occur. When I don't, users 
>>>>> suffer bad response times on many requests, but not all.
>>>>> 
>>>>> Users are computing heavy statistics in this application, instance size 
>>>>> (mostly snapshot cache) at that time is usually about 50 GByte, while Xmx 
>>>>> is 84 GByte, and XX:MaxPermSize is 1024m. The biggest instance size I've 
>>>>> seen so far in production is 61 GByte, so 50 seems not like a limit where 
>>>>> something weird happens.
>>>>> 
>>>>> CPU, memory, filesystem, database and network all show nothing special 
>>>>> before and after 5 pm. I monitored the instance with jvisualvm to no 
>>>>> avail. I've been searching for the reason for quite some time now, so I'd 
>>>>> appreciate any suggestion you might have what else to check.
>>>>> 
>>>>> Thanks
>>>>> Maik
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Do not post admin requests to the list. They will be ignored.
>>>>> Webobjects-dev mailing list      (Webobjects-dev@lists.apple.com 
>>>>> <mailto:Webobjects-dev@lists.apple.com>)
>>>>> Help/Unsubscribe/Update your Subscription:
>>>>> https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com
>>>>>  
>>>>> <https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com>
>>>>> 
>>>>> This email sent to bogdan.zlata...@gmail.com 
>>>>> <mailto:bogdan.zlata...@gmail.com>_______________________________________________
>>>> Do not post admin requests to the list. They will be ignored.
>>>> Webobjects-dev mailing list      (Webobjects-dev@lists.apple.com 
>>>> <mailto:Webobjects-dev@lists.apple.com>)
>>>> Help/Unsubscribe/Update your Subscription:
>>>> https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag
>>>>  
>>>> <https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag>
>>>> 
>>>> This email sent to m...@selbstdenker.ag <mailto:m...@selbstdenker.ag> 
>>>> _______________________________________________
>>> Do not post admin requests to the list. They will be ignored.
>>> Webobjects-dev mailing list      (Webobjects-dev@lists.apple.com 
>>> <mailto:Webobjects-dev@lists.apple.com>)
>>> Help/Unsubscribe/Update your Subscription:
>>> https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com
>>>  
>>> <https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com>
>>> 
>>> This email sent to bogdan.zlata...@gmail.com 
>>> <mailto:bogdan.zlata...@gmail.com>
>> _______________________________________________
>> Do not post admin requests to the list. They will be ignored.
>> Webobjects-dev mailing list      (Webobjects-dev@lists.apple.com 
>> <mailto:Webobjects-dev@lists.apple.com>)
>> Help/Unsubscribe/Update your Subscription:
>> https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag
>>  
>> <https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag>
>> 
>> This email sent to m...@selbstdenker.ag <mailto:m...@selbstdenker.ag>

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      (Webobjects-dev@lists.apple.com)
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Reply via email to