On 28/08/13 21:54, James Wennmacher wrote:
Do you have any statistics on your portlets for average execution time?
How many portlets do you have on your pages (in particular, pages with
longer-running portlets)? What's the average execution time of the
portlets on those pages?
Unfortunately I don't have execution time data.
Our home tab has 8 portlets on it. Timeouts: 1 x 10s, 1 x 7s, 1 x 6s and
5 x 5s.
3 of these portlets use ajax to load their "remote" content.
... after this conversation I think it's time for me to review our status.
Do you have a lot of custom portlets and is internal portlet caching not
an option to avoid requests to external systems?
Yes we have lots of custom portlets. We do not use portlets such as the
simple content portlet where all of its data comes from the portal.
All of our portlets make connects elsewhere. In some cases this may be a
DB connection to different schemas on the main portal database server
but more often to different database servers. A larger proportion of our
portlets make SOAP or REST calls to remote systems. We make heavy use of
the Jasig WebProxyPortlet.
Many of our portlets have their own cache, but more recently we've been
switching to just using portlet caching. (This is very robust now in
uP4). Caching, however, is a luxury; the data still has to be fetched at
some point. (We've talked around just-in-time strategies, perhaps
triggered from login, but have not actually done anything like this).
Do you have a lot of portlet timeout indications and on what portlets?
I see no evidence in the logs or from support calls that timeouts are an
issue for our users.
Are your uPortal servers nearing capacity (how are they doing on JVM heap
usage and garbage collection, CPU usage, etc.)?
This is not an area I'm confident with, however I suspect that our heap
could use being a bit bigger. We have -Xmx4608m and
-XX:MaxNewSize=2304m. This is the most I can allocate without causing
the system to swap. CPU load is generally low.
There are 150 worker threads which of course are shared for all page
render requests so this does provide some indication of worker thread
queue sizing. (Separate thought - it would be cool if the worker thread
queue defaulted to a configured size but auto-adjusted up to a separate
max value and dropped back to a configured min value as needed).
Our thread pool size is the default 150.
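The auto-adjusting pool idea above can be sketched with the JDK's own ThreadPoolExecutor. This is only an illustration of the min/max behaviour, not how uPortal configures its portlet worker pool; the class name ElasticWorkerPool and the parameter names are made up for the example.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of uPortal.
public class ElasticWorkerPool {

    /**
     * Builds a pool that keeps minThreads alive, grows on demand up to
     * maxThreads, and lets the extra threads die after idleSeconds of idleness.
     */
    public static ThreadPoolExecutor create(int minThreads, int maxThreads, long idleSeconds) {
        // SynchronousQueue holds no tasks, so a new thread is started whenever
        // every existing worker is busy. Once maxThreads are busy, further
        // submissions are rejected (RejectedExecutionException by default).
        return new ThreadPoolExecutor(
                minThreads,
                maxThreads,
                idleSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<>());
    }
}
```

With an unbounded queue instead of SynchronousQueue the pool would never grow past its minimum, because ThreadPoolExecutor only adds threads beyond the core size when the queue refuses a task.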
One thought I came across: on a page with several portlets of different
timeouts, all portlets might as well be allowed to run for as long
as the longest timeout!
Regarding the specific situation you mentioned (and others you are aware
of):
- what do you mean by brought down the servers? Were user requests for
pages immediately failing or queueing up for processing (I'm not sure
what the behavior is when you run out of worker threads)?
When the thread pool is exhausted you get the uPortal error.jsp. uPortal
generally recovers from this.
Today we got this
WARN [uP-TaskExec-4-cleanupHungWorkers]
rendering.PortletExecutionManager.[] 2013-08-28 07:51:01,098 -
PortletExecutionWorker [portletFname=man-portlet-calendar,
timeout=10000, portletWindowId=53_u23l1n12_27890, started=1377672501857,
submitted=1377672501857, complete=0, retrieved=true, canceled=true,
cancelCount=150, wait=0, duration=-1377672501857] is still hung, cancel
has been called 150 times
I think we need to explicitly set the timeout on our remote web service
connection :-)
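For a plain HTTP call, setting the timeouts explicitly might look something like the sketch below. It uses the JDK's HttpURLConnection, whose connect and read timeouts both default to 0 (wait forever). The class name RemoteCallTimeouts and the values in the usage note are assumptions for illustration; SOAP stacks and other HTTP clients have their own equivalent settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not taken from any of the portlets discussed here.
public class RemoteCallTimeouts {

    /**
     * Prepares an HTTP connection with explicit timeouts. Nothing is sent
     * over the network until the caller connects or reads the response.
     */
    public static HttpURLConnection open(String url, int connectMs, int readMs) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMs); // give up if the TCP connect stalls
            conn.setReadTimeout(readMs);       // give up if the remote end stops sending
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a portlet with timeout=10000, something like open(url, 2000, 8000) would keep both limits below the portlet timeout, so a hung remote system fails the call rather than leaving the worker thread stuck.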
On the user-facing side, today we got error 500 from Apache, so I suspect
that we exhausted Tomcat or mod_jk threads. I didn't do a full
investigation on this occasion :-(
- can you explain in more detail why adjusting one portlet's timeout
from 5s to 10s brought down the servers? How many other portlets are on
those specific pages and what are their average and peak response
times? Is it on the home page, guest pages or authenticated user
pages? Etc.
Today was an exception in many ways, and really we're only talking about
exceptional situations where a remote system has hung. In day-to-day
working I agree that if a portlet genuinely needs more time it
should have it.
uPortal's self protection features and better programming on my part
mean that I'm more confident these days to increase timeouts. But I
still see the timeout as our last line of defense against thread
exhaustion, 500 errors or even death.
Clearly we'll want to really understand the impacts and potential risk
scenarios based on your comments.
Hope this helps?
-- Anthony.