# file managed by puppet
# filled out via puppet templating
############################
# \ this is not a typo /
# set min and max to javamemorymax
export JAVA_OPTS="-server -Xms<%= javamemorymax %> -Xmx<%= javamemorymax %> \
    -XX:PermSize=<%= javapermsize %> -XX:MaxPermSize=<%= javapermsize %> \
    -XX:CMSInitiatingOccupancyFraction=70 \
    -XX:NewRatio=3 -XX:-UseAdaptiveSizePolicy \
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
    -XX:MaxTenuringThreshold=0 -XX:-DisableExplicitGC \
    -XX:+UseCMSInitiatingOccupancyOnly \
    -Djava.awt.headless=true -verbose:gc \
    -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution \
    -XX:+PrintCommandLineFlags"
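
# A minimal sketch, not part of the managed template: the JAVA_OPTS line above
# relies on a few pairing rules (-Xms equal to -Xmx, PermSize equal to
# MaxPermSize, CMSInitiatingOccupancyFraction accompanied by
# UseCMSInitiatingOccupancyOnly). The functions below check an already
# rendered option string against those rules. The names get_opt_value,
# has_opt and check_java_opts are hypothetical helpers introduced only for
# illustration; they inspect a string and never start a JVM.

```shell
#!/bin/sh

# get_opt_value PREFIX OPTS
# Print the text that follows PREFIX in the first option starting with it.
get_opt_value() {
    for opt in $2; do
        case $opt in
            "$1"*) printf '%s\n' "${opt#"$1"}"; return 0 ;;
        esac
    done
    return 1
}

# has_opt FLAG OPTS
# Succeed if FLAG appears as a whole option in OPTS.
has_opt() {
    for opt in $2; do
        if [ "$opt" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# check_java_opts OPTS
# Print one warning per violated rule; return non-zero if any rule is violated.
check_java_opts() {
    status=0
    xms=$(get_opt_value -Xms "$1") || xms=
    xmx=$(get_opt_value -Xmx "$1") || xmx=
    if [ "$xms" != "$xmx" ]; then
        echo "warning: -Xms (${xms:-unset}) differs from -Xmx (${xmx:-unset})"
        status=1
    fi
    perm=$(get_opt_value -XX:PermSize= "$1") || perm=
    maxperm=$(get_opt_value -XX:MaxPermSize= "$1") || maxperm=
    if [ "$perm" != "$maxperm" ]; then
        echo "warning: PermSize (${perm:-unset}) differs from MaxPermSize (${maxperm:-unset})"
        status=1
    fi
    if get_opt_value -XX:CMSInitiatingOccupancyFraction= "$1" >/dev/null &&
       ! has_opt -XX:+UseCMSInitiatingOccupancyOnly "$1"; then
        echo "warning: CMSInitiatingOccupancyFraction set without UseCMSInitiatingOccupancyOnly"
        status=1
    fi
    return "$status"
}
```

# For example, check_java_opts "-Xmx1500m -XX:MaxPermSize=256m -server" prints
# two warnings (no -Xms, no PermSize) and returns non-zero.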
# http://randomlyrr.blogspot.it/2012/03/java-tuning-in-nutshell-part-1.html
#
# -Xmx should be equal to -Xms
# Growing from Xms to Xmx requires Full GCs to resize the heap. Set these to
# the same value if Full GCs are to be completely eliminated in production.
#
# -XX:PermSize should be equal to -XX:MaxPermSize
# Both params need to be specified and should have the same value. Otherwise,
# a Full GC is required for each Perm Gen resize while it grows up to
# MaxPermSize.
#
# -XX:NewSize is specified but not equal to -XX:MaxNewSize
# Like the other heap params, a resize of the new/young gen requires a Full GC.
# The preferred approach is to avoid these two parameters and use -Xmn instead.
# This eliminates the problem, as setting, say, "-Xmn1g" is the same as setting
# "-XX:NewSize=1g -XX:MaxNewSize=1g".
#
# Although UseConcMarkSweepGC is specified, CMS can and often will kick in too
# late, causing a Full GC when it can't catch up. In other words, although CMS
# is collecting garbage, the application threads that are executing concurrently
# run out of heap for allocation because CMS couldn't free garbage soon enough.
# At this point, the JVM stops all application threads and does a Full GC.
# This is also called a "concurrent mode failure" in GC logs. The reason for
# concurrent mode failure: the JVM dynamically finds a value for when CMS
# should be initiated and changes this value based on statistics. However, in
# production, load is often bursty, which leads to misses/miscalculation for the
# last dynamically computed initiation value. To prevent this, provide a static
# value for CMS initiation. Use -XX:CMSInitiatingOccupancyFraction (a percentage
# of old generation occupancy) to tell the JVM at what point it should initiate
# CMS. A value between 40 and 70 usually works for most Fusion middleware
# products.
# Start with the higher value (70) and tune down only if you still see the
# string "concurrent mode failure" in the GC log.
#
# Secondly, always specify -XX:+UseCMSInitiatingOccupancyOnly when
# CMSInitiatingOccupancyFraction is used, otherwise the value you specify
# does not stick (the JVM will dynamically change it on the fly again). This is
# very important and commonly missed.
#
# -XX:+UseCompressedOops
# Highly recommended on 64-bit JVMs with an Xmx value less than 32g. However,
# this is available only on JDK6 update 14+.

On Aug 20, 2012, at 7:40 PM, "Walters, Beren" <bwalt...@csu.edu.au> wrote:

> What sort of SAKAI3_JAVA_OPTS does everyone else use?
>
> What sort of environment is this in (physical memory, local solr servers,
> etc)?
>
> Is anyone using the concurrent collector to try and reduce application pauses?
>
> We are running the options shown below on two (virtual) app servers, each
> with 4GB of memory, with separate solr and database servers, no garbage
> collector specified.
>
> Thanks,
> Beren
>
> -----Original Message-----
> From: Branden Visser [mailto:mrvis...@gmail.com]
> Sent: Monday, 20 August 2012 8:40 PM
> To: Walters, Beren
> Cc: oae-dev@collab.sakaiproject.org
> Subject: Re: [oae-dev] OAE-model-loader
>
> Thanks for the graphs, Beren. Given the spike in GC activity around
> the times that the loading failed, there is some substance to the
> theory that the JVM was struggling with memory.
>
> Cheers,
> Branden
>
> On Sun, Aug 19, 2012 at 11:07 PM, Walters, Beren <bwalt...@csu.edu.au> wrote:
>> Hi Brandon,
>>
>> I was just looking at the total server memory using the free command. Not
>> very useful in retrospect.
>>
>> We are currently running using the following java options:
>> SAKAI3_JAVA_OPTS="-Xmx1500m -XX:MaxPermSize=256m -server
>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=******
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.password.file=****** -Djava.security.manager
>> -Djava.security.policy=****** -Djava.awt.headless=true -Dhttp.proxySet=true
>> -Dhttp.proxyHost=****** -Dhttp.proxyPort=******
>> -Dhttp.nonProxyHosts='******' -Dhttp.proxyUser=******
>> -Dhttp.proxyPassword=******* -Dcom.sun.management.snmp.port=******
>> -Dcom.sun.management.snmp.acl.file=******"
>>
>> So I guess the server only had 1.5GB allocated for the JVM itself.
>>
>> We had some monitoring running while I ran the two tests this morning:
>>
>> https://oae-community.sakaiproject.org/content#p=mTbbqA15C/PS-marksweep.png
>>
>> https://oae-community.sakaiproject.org/content#p=mTbbjAU7aa/PS-scavenge.png
>>
>> The first run ran until about 10:45am and the second till about 11:30am.
>>
>> I'm going to have to run the JVM changes through approval (even if it is
>> only temporary) so it may take me a while to produce the log you are after.
>>
>> Unfortunately I'm not able to run the model loader on the app server
>> itself.
>>
>> Thanks for the package.js info.
>>
>> Cheers,
>> Beren.
>>
>> -----Original Message-----
>> From: Branden Visser [mailto:mrvis...@gmail.com]
>> Sent: Monday, 20 August 2012 12:21 PM
>> To: Walters, Beren
>> Cc: oae-dev@collab.sakaiproject.org
>> Subject: Re: [oae-dev] OAE-model-loader
>>
>> Hi Beren,
>>
>> When you say that the app server has 4GB of memory, do you mean the
>> JVM is configured with 4GB in the startup params (e.g., -Xmx)? The JVM
>> itself may have less allocated space. To see if you're running into
>> significant garbage-collection issues, try and enable the verbose
>> garbage collector logs on the app server. If they're already enabled,
>> then its output would be useful if you could attach it.
>>
>> Here are some relevant parameters:
>>
>> java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> -Xloggc:<path/to/output/file.log> ...
>>
>> The garbage collector is capable of completely locking up the JVM to
>> clean out de-referenced objects. If it locks for a significant amount
>> of time, timeouts may occur. As you've found with the
>> OAE-model-loader, it only takes one timeout to crash it.
>>
>> Another possibility is intermittent network connectivity? Or maybe a
>> firewall / IDP causing issues? If possible, you could try running the
>> OAE-model-loader from the same machine as the app server. This can
>> actually greatly improve the loading time as well if there is a
>> relatively larger cost establishing an HTTP connection over the
>> network.
>>
>> Also, I forgot to reply about the lack of the package.js script in the
>> OAE-model-loader. The PR [1] that adds this functionality is actually
>> still outstanding, I guess I got a little excited with the
>> documentation. To get a head start, you could work from my
>> performance-testing branch for now [2]. Everything about generating
>> and loading the data is still the same, though. So no need to redo
>> anything by taking in those changes.
>>
>> Hope that helps,
>> Branden
>>
>> [1] https://github.com/sakaiproject/OAE-model-loader/pull/28
>> [2] https://github.com/mrvisser/OAE-model-loader/tree/performance-testing
>>
>> On Sun, Aug 19, 2012 at 9:35 PM, Walters, Beren <bwalt...@csu.edu.au> wrote:
>>>
>>> Hi Branden,
>>>
>>> The app server has 4GB of memory (and 6GB of swap) and does not run the
>>> solr or database services. I did have to bump the memory to 1.5GB on the
>>> debian VM I use for running OAE-model-loader as it kept being killed by the
>>> kernel's OOM process killer during the generate phase.
>>>
>>> I'm running the import across a 100mb wired network which never seems to
>>> exceed about 2-3% utilisation, generally more like 0.5-1%.
>>>
>>> Run 1: Using the previously run data.
>>> =====
>>>
>>> At start of run (100 batches of 500 users, 2 worlds, 2 content, 2
>>> collections) the server is using 1.6GB of memory.
>>>
>>> After 180 of the first users in the batch it has hit 2.2GB of used memory.
>>>
>>> After 321 of the first 500 user batch it has reached 2.25GB.
>>>
>>> After 481 of the first 500 it has reached 2.28GB.
>>>
>>> It appears to have finished the users in that batch at this point then died
>>> (ECONNREFUSED) while loading Contact 244 of 6879.
>>>
>>> I was able to keep using the app server during and after this load test -
>>> browsing content etc.
>>>
>>> I ran packet captures on the VM where I run OAE-model-loader when I first
>>> hit this issue and it appeared that the loader was trying to create a TCP
>>> connection for the HTTP transaction but never received a response before
>>> hitting some timeout.
>>>
>>> Attached is the log from this load test (import-oaeappdev01-logs2.zip). I
>>> use 2>&1 when running the test so error out may be mixed with the standard
>>> out and this file is 12+MB as it contains all of the errors for failing to
>>> insert existing users and contacts.
>>>
>>> Run 2: Reran the generate.js script using the same settings before this
>>> test.
>>>
>>> At start of run (100 batches of 500 users, 2 worlds, 2 content, 2
>>> collections) the server is using 2.32GB of memory. I assume this is due to
>>> the Linux virtual memory strategy, it won't release the memory until it
>>> reaches the cache pressure threshold.
>>>
>>> After 141 users of first batch it is at 2.36GB.
>>>
>>> At user 189 I saw an error in the log: Could not create user
>>> batch0-lory-turnell-217 because No live SolrServers available to handle
>>> this request
>>> This didn't abort the load as per the ECONNREFUSED error.
>>>
>>> After 250 users of first batch it is at 2.39GB.
>>>
>>> After 360 users of the first batch it is at 2.43GB.
>>>
>>> At user 460 I received the solrserver error again.
>>>
>>> It then died (ECONNREFUSED) while loading contact 360 of 4954.
>>>
>>> The app server kept running fine during and after this load.
>>>
>>> Attached is the log from this load test (import-oaeappdev01-logs3.zip)
>>>
>>> Let me know if there are any more details I can provide. Perhaps monitoring
>>> the solr servers (we run a master + slave config)?
>>>
>>> Thanks,
>>> Beren.
>>>
>>> From: Branden Visser [mailto:mrvis...@gmail.com]
>>> Sent: Monday, 20 August 2012 10:19 AM
>>> To: Walters, Beren
>>> Subject: Re: [oae-dev] OAE-model-loader
>>>
>>> Hi Beren, when that happens, is the app server responsive at all?
>>>
>>> You may be running out of memory. I managed to load 5000 users on my
>>> MacBook by feeding the server 3gb of memory. I was running solr embedded
>>> with postgres on the MacBook as well.
>>>
>>> Hope that helps,
>>> Branden
>>>
>>> On Aug 19, 2012 7:08 PM, "Walters, Beren" <bwalt...@csu.edu.au> wrote:
>>>
>>> Hi All,
>>>
>>> I'm having a bit of trouble with the OAE-model-loader.
>>>
>>> I can't add more than about 1000 users without the load failing with this
>>> error:
>>>
>>> events.js:66
>>>         throw arguments[1]; // Unhandled 'error' event
>>>                        ^
>>> Error: connect ECONNREFUSED
>>>     at errnoException (net.js:782:11)
>>>     at Object.afterConnect [as oncomplete] (net.js:773:19)
>>>
>>> I have tried regenerating the data files in case there was an error but it
>>> happens on every load.
>>>
>>> I have already split the load into <500 per batch and run non-concurrent
>>> loads but this problem persists.
>>>
>>> Even loading the existing users again (receiving item already exists HTTP
>>> responses) the load will still fail, often before it gets to the users that
>>> have not yet been loaded.
>>>
>>> Does anyone have any ideas on how to solve this issue? Could the error be
>>> caught? The server doesn't seem to be heavily loaded during the process and
>>> continues working after the failure with nothing obvious in the logs.
>>>
>>> I also haven't been able to find the package.js script referred to at
>>> https://confluence.sakaiproject.org/display/3AK/Performance+Testing+Methodology#PerformanceTestingMethodology-LoadingSourceDataintoOAE
>>> could someone point me in the direction of it?
>>>
>>> Thanks,
>>> Beren.
>>>
>>> | ALBURY-WODONGA | BATHURST | CANBERRA | DUBBO | GOULBURN
>>> | MELBOURNE | ONTARIO | ORANGE | PORT MACQUARIE |
>>> SYDNEY | WAGGA WAGGA |
>>>
>>> ________________________________
>>>
>>> LEGAL NOTICE
>>> This email (and any attachment) is confidential and is intended for the use
>>> of the addressee(s) only. If you are not the intended recipient of this
>>> email, you must not copy, distribute, take any action in reliance on it or
>>> disclose it to anyone. Any confidentiality is not waived or lost by reason
>>> of mistaken delivery. Email should be checked for viruses and defects
>>> before opening. Charles Sturt University (CSU) does not accept liability
>>> for viruses or any consequence which arise as a result of this email
>>> transmission. Email communications with CSU may be subject to automated
>>> email filtering, which could result in the delay or deletion of a
>>> legitimate email before it is read at CSU. The views expressed in this
>>> email are not necessarily those of CSU.
>>>
>>> Charles Sturt University in Australia The Grange Chancellery, Panorama
>>> Avenue, Bathurst NSW Australia 2795 (ABN: 83 878 708 551; CRICOS Provider
>>> Numbers: 00005F (NSW), 01947G (VIC), 02960B (ACT)).
>>> TEQSA Provider Number: PV12018
>>> Charles Sturt University in Ontario 860 Harrington Court, Burlington
>>> Ontario Canada L7N 3N4 Registration: www.peqab.ca
>>>
>>> Consider the environment before printing this email.
>>>
>>> _______________________________________________
>>> oae-dev mailing list
>>> oae-dev@collab.sakaiproject.org
>>> http://collab.sakaiproject.org/mailman/listinfo/oae-dev
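
The thread above suggests turning on GC logging (-verbose:gc, -XX:+PrintGCDetails, -Xloggc) and the tuning notes at the top say to watch for the string "concurrent mode failure". A minimal sketch of how such a log might be scanned follows. It assumes the log is a plain text file and that stop-the-world collections are marked with the text "Full GC", as HotSpot logs of that era do; the function name summarize_gc_log is hypothetical and the log path is whatever was passed to -Xloggc.

```shell
#!/bin/sh

# summarize_gc_log FILE
# Hypothetical helper: count the log lines that mention a Full GC and the
# lines that mention a concurrent mode failure.
summarize_gc_log() {
    if [ ! -r "$1" ]; then
        echo "cannot read GC log: $1" >&2
        return 2
    fi
    # grep -c exits non-zero when nothing matches but still prints 0.
    full=$(grep -c 'Full GC' "$1") || true
    cmf=$(grep -c 'concurrent mode failure' "$1") || true
    echo "full_gc=$full concurrent_mode_failure=$cmf"
}
```

A non-zero concurrent_mode_failure count would be the cue, per the notes at the top, to lower CMSInitiatingOccupancyFraction from 70. The counts are line counts, so they say how often each event was logged, not how long the pauses lasted.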