Hi Beren,

When you say that the app server has 4GB of memory, do you mean the JVM is configured with 4GB in its startup parameters (e.g., -Xmx)? The JVM itself may have less heap allocated than the machine has physical memory. To see whether you're running into significant garbage-collection pauses, try enabling verbose garbage-collection logging on the app server. If it's already enabled, it would be useful if you could attach the output.
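As a quick way to confirm how much heap the JVM actually received, here is a minimal sketch that prints the heap limits of the JVM it runs in. The class name HeapCheck and its helper methods are my own for illustration and are not part of OAE; it only tells you something useful if it is started with the same memory flags as the app server.

```java
public class HeapCheck {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Summarise the heap as seen by the running JVM:
    // max       = upper bound the heap may grow to (governed by -Xmx)
    // committed = heap currently reserved from the OS
    // free      = unused portion of the committed heap
    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        return String.format("max=%d MiB, committed=%d MiB, free=%d MiB",
                toMiB(rt.maxMemory()), toMiB(rt.totalMemory()), toMiB(rt.freeMemory()));
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
    }
}
```

If the reported max is far below 4GB when launched with the app server's flags, the 4GB figure is the machine's memory rather than the JVM's heap. The JVM may report a max slightly below the -Xmx value, so an exact match is not expected.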
Here are the relevant parameters for GC logging:

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<path/to/output/file.log> ...

The garbage collector can pause the entire JVM (a "stop-the-world" collection) while it cleans up unreferenced objects. If a pause lasts long enough, requests may time out. As you've found with the OAE-model-loader, it only takes one timeout to crash it.

Another possibility is intermittent network connectivity, or perhaps a firewall or IDP interfering. If possible, try running the OAE-model-loader on the same machine as the app server. This can also greatly improve loading time if establishing an HTTP connection over the network is relatively expensive.

Also, I forgot to reply about the missing package.js script in the OAE-model-loader. The PR [1] that adds this functionality is still outstanding; I guess I got a little ahead of myself with the documentation. To get a head start, you could work from my performance-testing branch [2] for now. Everything about generating and loading the data is still the same, though, so there's no need to redo anything when you take in those changes.

Hope that helps,
Branden

[1] https://github.com/sakaiproject/OAE-model-loader/pull/28
[2] https://github.com/mrvisser/OAE-model-loader/tree/performance-testing

On Sun, Aug 19, 2012 at 9:35 PM, Walters, Beren <bwalt...@csu.edu.au> wrote:
> Hi Branden,
>
> The app server has 4GB of memory (and 6GB of swap) and does not run the solr
> or database services. I did have to bump the memory to 1.5GB on the Debian VM
> I use for running OAE-model-loader, as it kept being killed by the kernel's
> OOM killer during the generate phase.
>
> I’m running the import across a 100Mb wired network which never seems to
> exceed about 2-3% utilisation, generally more like 0.5-1%.
>
> Run 1: Using the previously run data.
> =====
>
> At start of run (100 batches of 500 users, 2 worlds, 2 content, 2
> collections) the server is using 1.6GB of memory.
>
> After 180 users of the first batch it has hit 2.2GB of used memory.
>
> After 321 users of the first 500-user batch it has reached 2.25GB.
>
> After 481 of the first 500 it has reached 2.28GB.
>
> It appears to have finished the users in that batch at this point, then died
> (ECONNREFUSED) while loading contact 244 of 6879.
>
> I was able to keep using the app server during and after this load test
> (browsing content etc.).
>
> I ran packet captures on the VM where I run OAE-model-loader when I first hit
> this issue, and it appeared that the loader was trying to create a TCP
> connection for the HTTP transaction but never received a response before
> hitting some timeout.
>
> Attached is the log from this load test (import-oaeappdev01-logs2.zip). I use
> 2>&1 when running the test, so stderr may be mixed with stdout, and the file
> is 12+MB as it contains all of the errors for failing to insert existing
> users and contacts.
>
> Run 2: Reran the generate.js script using the same settings before this test.
>
> At start of run (100 batches of 500 users, 2 worlds, 2 content, 2
> collections) the server is using 2.32GB of memory. I assume this is due to
> the Linux virtual memory strategy: it won’t release the memory until it
> reaches the cache pressure threshold.
>
> After 141 users of the first batch it is at 2.36GB.
>
> At user 189 I saw an error in the log: Could not create user
> batch0-lory-turnell-217 because No live SolrServers available to handle this
> request
>
> This didn’t abort the load the way the ECONNREFUSED error does.
>
> After 250 users of the first batch it is at 2.39GB.
>
> After 360 users of the first batch it is at 2.43GB.
>
> At user 460 I received the SolrServer error again.
>
> It then died (ECONNREFUSED) while loading contact 360 of 4954.
>
> The app server kept running fine during and after this load.
>
> Attached is the log from this load test (import-oaeappdev01-logs3.zip)
>
> Let me know if there are any more details I can provide. Perhaps monitoring
> the solr servers (we run a master + slave config)?
>
> Thanks,
>
> Beren.
>
> From: Branden Visser [mailto:mrvis...@gmail.com]
> Sent: Monday, 20 August 2012 10:19 AM
> To: Walters, Beren
> Subject: Re: [oae-dev] OAE-model-loader
>
> Hi Beren, when that happens, is the app server responsive at all?
>
> You may be running out of memory. I managed to load 5000 users on my MacBook
> by feeding the server 3GB of memory. I was running solr embedded with
> postgres on the MacBook as well.
>
> Hope that helps,
> Branden
>
> On Aug 19, 2012 7:08 PM, "Walters, Beren" <bwalt...@csu.edu.au> wrote:
>
> Hi All,
>
> I’m having a bit of trouble with the OAE-model-loader.
>
> I can’t add more than about 1000 users without the load failing with this
> error:
>
> events.js:66
>         throw arguments[1]; // Unhandled 'error' event
>               ^
> Error: connect ECONNREFUSED
>     at errnoException (net.js:782:11)
>     at Object.afterConnect [as oncomplete] (net.js:773:19)
>
> I have tried regenerating the data files in case there was an error but it
> happens on every load.
>
> I have already split the load into <500 per batch and run non-concurrent
> loads but this problem persists.
>
> Even loading the existing users again (receiving item already exists HTTP
> responses) the load will still fail, often before it gets to the users that
> have not yet been loaded.
>
> Does anyone have any ideas on how to solve this issue? Could the error be
> caught? The server doesn’t seem to be heavily loaded during the process and
> continues working after the failure with nothing obvious in the logs.
>
> I also haven’t been able to find the package.js script referred to at
> https://confluence.sakaiproject.org/display/3AK/Performance+Testing+Methodology#PerformanceTestingMethodology-LoadingSourceDataintoOAE
> could someone point me in the direction of it?
>
> Thanks,
>
> Beren.
>
> _______________________________________________
> oae-dev mailing list
> oae-dev@collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/oae-dev