Re: [uportal-dev] Proposal to change portal's default render timeout from 5000ms to 20000ms

Anthony Colebourne Wed, 28 Aug 2013 15:18:01 -0700

On 28/08/13 21:54, James Wennmacher wrote:

Do you have any statistics on your portlets for average execution time?
How many portlets do you have on your pages (in particular, pages with
longer-running portlets)?  What's the average execution time of the
portlets on those pages?


Unfortunately I don't have execution time data.

Our home tab has 8 portlets on it. Timeouts: 1 x 10s, 1 x 7s, 1 x 6s and 5 x 5s.


3 of these portlets use ajax to load their "remote" content.

... after this conversation I think it's time for me to review our status.

Do you have a lot of custom portlets and is internal portlet caching not
an option to avoid requests to external systems?

Yes, we have lots of custom portlets. We do not use portlets such as the simple content portlet where all of its data comes from the portal.

All of our portlets make connections elsewhere. In some cases this may be a DB connection to different schemas on the main portal database server, but more often to different database servers. A larger proportion of our portlets make SOAP or REST calls to remote systems. We make heavy use of the Jasig WebProxyPortlet.

Many of our portlets have their own cache, but more recently we've been switching to just using portlet caching. (This is very robust now in uP4.) Caching, however, is a luxury; the data still has to be fetched at some point. (We've talked around just-in-time strategies, perhaps triggered from login, but have not actually done anything like this.)

Do you have a lot of portlet timeout indications and on what portlets?

I see no evidence in the logs or from support calls that timeouts are an issue for our users.

Are your uPortal servers nearing capacity (how is it doing on JVM heap
usage and garbage collection, CPU usage, etc.)?

This is not an area I'm confident with, however I suspect that our heap could use being a bit bigger. We have -Xmx4608m and -XX:MaxNewSize=2304m. This is the most I can allocate without causing the system to swap. CPU load is generally low.
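
For anyone wanting numbers on heap usage and garbage collection, the standard java.lang.management API gives a quick view. The following is only a minimal sketch: the class name HeapSnapshot is hypothetical, and it reports on the JVM it runs in, so it would have to be executed inside the portal's JVM to say anything about uPortal.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of uPortal.
class HeapSnapshot {
    /** Fraction of the maximum heap currently in use, or -1 if the max is undefined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no limit
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    public static void main(String[] args) {
        double fraction = heapUsedFraction();
        if (fraction >= 0) {
            System.out.printf("heap used: %.1f%% of max%n", fraction * 100);
        }
        // Cumulative collection counts and times since JVM start, per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

A heap that stays close to its maximum together with fast-growing collection times would suggest the heap is undersized; the same MXBeans can also be read remotely over JMX if that is enabled.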

There are 150 worker threads which of course are shared for all page
render requests so this does provide some indication of worker thread
queue sizing.  (Separate thought - it would be cool if the worker thread
queue defaulted to a configured size but auto-adjusted up to a separate
max value and dropped back to a configured min value as needed).


Our thread pool size is the default 150.
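
The auto-adjusting pool suggested above can be approximated with the JDK's ThreadPoolExecutor. This is a sketch only, not how uPortal configures its workers: the class name ElasticWorkerPool and the min/max/idle values are assumptions for illustration.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of uPortal.
class ElasticWorkerPool {
    /** Pool that keeps up to `min` threads alive, grows on demand to `max`, and retires idle extras. */
    static ThreadPoolExecutor create(int min, int max, long idleSeconds) {
        // A SynchronousQueue never holds tasks, so a new thread is started
        // whenever all existing threads are busy, until `max` is reached.
        // Threads above `min` exit after being idle for `idleSeconds`.
        return new ThreadPoolExecutor(
                min, max, idleSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                // At `max`, the submitting thread runs the task itself,
                // which throttles submitters instead of rejecting work.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(150, 300, 60); // illustrative sizes
        System.out.println("core=" + pool.getCorePoolSize()
                + " max=" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

The choice of queue matters: with an unbounded queue the executor never grows beyond its core size, because extra threads are only created when the queue refuses a task.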

One thought I came across: on a page with several portlets with different timeouts, all portlets might as well be allowed to run for as long as the longest timeout!

Regarding the specific situation you mentioned (and others you are aware
of):
- what do you mean by brought down the servers?  Were user requests for
pages immediately failing or queueing up for processing (I'm not sure
what the behavior is when you run out of worker threads)?

When the thread pool is exhausted you get the uPortal error.jsp. uPortal generally recovers from this.


Today we got this

WARN [uP-TaskExec-4-cleanupHungWorkers] rendering.PortletExecutionManager.[] 2013-08-28 07:51:01,098 - PortletExecutionWorker [portletFname=man-portlet-calendar, timeout=10000, portletWindowId=53_u23l1n12_27890, started=1377672501857, submitted=1377672501857, complete=0, retrieved=true, canceled=true, cancelCount=150, wait=0, duration=-1377672501857] is still hung, cancel has been called 150 times

I think we need to explicitly set the timeout on our remote web service connection :-)
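
As an illustration of explicit connection timeouts, here is a minimal sketch using the JDK's HttpURLConnection. The class name, the URL and the timeout values are assumptions; our portlets may well use a different HTTP or SOAP client, in which case the equivalent settings on that client apply.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not taken from any of our portlets.
class RemoteCallTimeouts {
    // Illustrative values, chosen to stay below a 10s portlet render timeout.
    static final int CONNECT_TIMEOUT_MS = 2000;
    static final int READ_TIMEOUT_MS = 6000;

    /** Prepares a connection with explicit timeouts; no network I/O happens here. */
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // max wait to establish the connection
        conn.setReadTimeout(READ_TIMEOUT_MS);       // max wait for data once connected
        return conn;
    }

    /** Returns the HTTP status; a hung remote system yields a SocketTimeoutException. */
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = open(url);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Both timeouts default to 0 (wait forever) on HttpURLConnection, so without them a hung remote system can hold a worker thread indefinitely. Cancelling the worker does not help in that case, because interrupting a thread does not unblock a classic blocking socket read.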

On the user-facing side, today we got error 500 from Apache, so I suspect that we exhausted Tomcat or mod_jk threads. I didn't do a full investigation on this occasion :-(

- can you explain in more detail why adjusting one portlet's timeout
from 5s to 10s brought down the servers.  How many other portlets are on
those specific pages and what are their average and peak response
times?  Is it on the home page, guest pages or authenticated user
pages?  Etc.

Today was an exception in many ways, and really we're talking only of exceptional situations where a remote system has hung. In day-to-day working I agree that if a portlet genuinely needs more time it should have it.

uPortal's self-protection features and better programming on my part mean that I'm more confident these days about increasing timeouts. But I still see the timeout as our last line of defense against thread exhaustion, 500 errors or even death.
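
To make the "last line of defense" idea concrete, here is a rough sketch of a render call bounded by a timeout, using plain java.util.concurrent. It is not uPortal's actual rendering code; the class name RenderWithTimeout and the placeholder text are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of uPortal.
class RenderWithTimeout {
    static final String PLACEHOLDER = "[portlet unavailable]";

    /** Runs the portlet on the pool and gives up waiting after timeoutMs. */
    static String render(ExecutorService pool, Callable<String> portlet, long timeoutMs) {
        Future<String> result = pool.submit(portlet);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The page stops waiting, but the worker is only freed if the
            // task responds to the interrupt that cancel(true) sends.
            result.cancel(true);
            return PLACEHOLDER;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            result.cancel(true);
            return PLACEHOLDER;
        } catch (ExecutionException e) {
            return PLACEHOLDER; // the portlet itself threw
        }
    }
}
```

The timeout protects the page, not the pool: a task that ignores interrupts keeps its worker thread after the page has moved on, which is why longer timeouts and hung remote systems together can exhaust the threads.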

Clearly we'll want to really understand the impacts and potential risk
scenarios based on your comments.


Hope this helps?

-- Anthony.
