I had replied with this on the uportal-user list:

For the performance, my first pointer would be to uportal-impl/src/main/resources/properties/ehcache.xml
In each release of uPortal we've been moving more and more data out of static caches and the user session into Ehcache. I'm not sure it's out in a released version yet, but I recently reviewed and tuned the default cache config here at UW and checked in an updated config file that at least has comments describing how each cache is used.
Also, all of the cache statistics are available via JMX. I'd recommend that you monitor those while you're doing your load testing to see which caches are filling up and which have poor hit rates. Tuning the sizes and TTLs of the caches should do a lot to reduce database I/O and load times.
So I guess I'd be very interested to have you do some basic tuning in ehcache then re-run the tests and watch the caches to see if they are both large enough and have appropriate TTLs for your usage patterns.
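To illustrate the kind of check described above, here is a minimal Python sketch that flags caches worth a closer look. It assumes the hit, miss, and object counts have already been read from the JMX statistics by some other tool. The CacheSnapshot type, the function names, and both thresholds are hypothetical starting points, not part of uPortal or Ehcache.

```python
from dataclasses import dataclass


@dataclass
class CacheSnapshot:
    """Counters for one cache, as sampled from its JMX statistics (hypothetical type)."""
    name: str
    hits: int
    misses: int
    object_count: int
    max_elements: int  # configured in-memory capacity from ehcache.xml


def hit_rate(snapshot: CacheSnapshot) -> float:
    """Fraction of lookups served from the cache (0.0 if it has seen no traffic)."""
    lookups = snapshot.hits + snapshot.misses
    return snapshot.hits / lookups if lookups else 0.0


def caches_to_review(snapshots, min_hit_rate=0.80, full_at=0.95):
    """Return (name, reason) pairs for caches that look undersized or ineffective.

    min_hit_rate and full_at are arbitrary assumptions, not uPortal defaults.
    """
    findings = []
    for snap in snapshots:
        # A cache sitting at or near its configured size is probably evicting entries.
        if snap.max_elements > 0 and snap.object_count >= full_at * snap.max_elements:
            findings.append((snap.name, "near capacity"))
        # Only judge the hit rate once the cache has actually been used.
        if snap.hits + snap.misses > 0 and hit_rate(snap) < min_hit_rate:
            findings.append((snap.name, "low hit rate"))
    return findings
```

A cache flagged as "near capacity" with a low hit rate is a candidate for a larger size; one with a low hit rate but plenty of room may instead have a TTL that is too short for the usage pattern.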
-Eric

On 06/08/2010 12:13 AM, Alex Bragg wrote:
Hello,
I'm doing some performance testing, and I could use some hints on a couple of
issues. First, I'm looking for some hints on things I can tweak in 3.1.1/3.2.1
to improve performance under heavy load. Second, I'm hitting a bug in 2.6.1
that is preventing me from gathering solid baseline performance numbers, and
perhaps someone else has seen it. Let me explain in further detail.
We have been preparing for an upgrade of our production systems from uPortal
2.6.1 to uPortal 3.x. Currently, we're looking at two 3.x versions, 3.1.1 and
3.2.1. In my development environment, I have installed 2.6.1, 3.1.1, and
3.2.1. My 2.6.1 install is running out of a 5.5.28 Tomcat, and my 3.x versions
are running in a 6.0.24 Tomcat. All versions are running under Java version
1.6.0_12-b04, 64-bit, and I have an Oracle 11gR2 database backing them.
The layout in each instance is a simple 5-tab layout, with nothing on the
default tab. I have a custom testing portlet that simply executes a SQL query
5, 10, or 15 times and renders a 3-line text output. On the remaining four
tabs, I have mixtures of two or more of these testing portlets. I run tests
with JMeter, and the click path is get login page, login, click tab 2, click
tab 3, click tab 4, click tab 5, and logout. JMeter verifies each page renders
properly. The tests I run execute this click path 4000 times spread across 1,
4, 50, and 200 threads, and there are no waits built into the scripts.
Here are results from the tests I have run so far. The values are the 90th percentile
page-response time in seconds. Please note that the number for 2.6.1 in the 200-thread
column isn't valid. At the 200-thread level most of the 200 threads complete their 20
iterations before JMeter starts additional threads during ramp-up. I end up with no more
than 4 or 5 threads running concurrently. Another thing that skews these numbers is that
I can only get valid results using users that have successfully logged in before.
Anything above 2 threads with users that have not previously logged in results in
channels failing to render (with the message "You are not authorized to view this
channel").
version   1      4      50     200      50-lb2   200-lb2   50-lb4   200-lb4
2.6.1     0.07   0.08   0.7    *0.08*   0.69     4.56
3.1.1     0.09   0.09   1.96   7.81     1.18     6.02      1.12     5.49
3.2.1     0.17   0.18   7.04   26.43    6.17     20.22
The "lb2" and "lb4" designators signify that I have started multiple Tomcats on
the server, 2 for lb2 and 4 for lb4, and I'm balancing load with HAProxy. I see much better
utilization on the server, and both page-response times and elapsed test run times (below)
improve significantly even though I have not added any hardware.
This table shows the elapsed time in seconds to complete the above tests.
version   1       4     50      200     50-lb2    200-lb2   50-lb4   200-lb4
2.6.1     934     454   216     212     209.09    263.43
3.1.1     1,537   462   495     813     386.92    660.39    421.41   414.32
3.2.1     3,299   862   1,999   3,958   1259.99   2636.8
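For comparison across versions, the elapsed times can be turned into throughput. The following is a rough Python sketch, assuming each of the 4000 click paths issues 7 page requests (login page, login, four tab clicks, logout) and ignoring embedded resources. The helper name pages_per_second is made up for illustration, and only the single-Tomcat columns are included.

```python
ITERATIONS = 4000        # click paths per test run, from the test description
PAGES_PER_ITERATION = 7  # assumption: login page, login, four tab clicks, logout


def pages_per_second(elapsed_seconds: float) -> float:
    """Average page requests completed per second over a whole test run."""
    return ITERATIONS * PAGES_PER_ITERATION / elapsed_seconds


# Elapsed seconds for the single-Tomcat runs, keyed by JMeter thread count.
# Note that the 2.6.1 run at 200 threads never reached real 200-way concurrency.
elapsed = {
    "2.6.1": {1: 934, 4: 454, 50: 216, 200: 212},
    "3.1.1": {1: 1537, 4: 462, 50: 495, 200: 813},
    "3.2.1": {1: 3299, 4: 862, 50: 1999, 200: 3958},
}

throughput = {
    version: {threads: round(pages_per_second(sec), 1) for threads, sec in runs.items()}
    for version, runs in elapsed.items()
}

for version, runs in throughput.items():
    print(version, runs)
```

Throughput that drops when going from 50 to 200 threads, rather than levelling off, is one way to see the "cliff" in numbers.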
Basically, what I see here is that at low concurrency 2.6.1 and 3.1.1 are
fairly comparable, and 3.2.1 is noticeably slower. At 50 threads and above, I
see that 2.6.1 is much faster than 3.x. I also see that at very high loads,
3.x seems to have a point where it just falls over the edge of a cliff.
Part of that I'm sure is the change in page sizes. Here are the page sizes
JMeter reports (this does not include embedded resources).
             2.6.1        3.1.1        3.2.1
             Avg. Bytes   Avg. Bytes   Avg. Bytes
Login Page   2014.93      12865        23963
Login        2958.61      21716        21909
Tab 1        5950.05      24221.21     24656
Tab 2        8840.34      26755.38     27430
Tab 3        8835.95      26753.3      27428
Tab 4        7380.03      25525.27     26068
Logout       2014.94      12865        23963
TOTAL        5427.84      21528.74     25059.57
So, back to my two questions.
1. What has changed in 3.1.1 that might explain a significant (at least 2x)
slowdown under load? To me it feels like 2.6.1 is caching rendered elements
to a much greater degree than 3.1.1. What can I tweak to improve this?
2. Is anyone aware of something I can change in 2.6.1 to fix the behavior with
new logins and prevent the channels failing with "not authorized"?
Thanks,
Alex Bragg
Unicon, Inc.
