Thanks for passing on a cautionary note and providing this perspective, Anthony. This is excellent information.

Do you have any statistics on your portlets for average execution time?
How many portlets do you have on your pages (in particular, pages with longer-running portlets)? What's the average execution time of the portlets on those pages? Do you have a lot of custom portlets and is internal portlet caching not an option to avoid requests to external systems?
Do you have a lot of portlet timeout indications and on what portlets?
Are your uPortal servers nearing capacity (how are they doing on JVM heap usage and garbage collection, CPU usage, etc.)?

There are 150 worker threads which of course are shared for all page render requests so this does provide some indication of worker thread queue sizing. (Separate thought - it would be cool if the worker thread queue defaulted to a configured size but auto-adjusted up to a separate max value and dropped back to a configured min value as needed).
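
To make the auto-adjusting idea concrete, here is a minimal sketch using plain java.util.concurrent. It is not uPortal's actual worker pool configuration; the class name ElasticRenderPool and the sizes used in main are hypothetical, chosen only for illustration. It assumes that "auto-adjust" means the thread count grows from a configured minimum up to a separate maximum under load and shrinks back once the extra threads sit idle.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ElasticRenderPool {
    // Hypothetical helper; the sizes passed in are illustrative, not uPortal defaults.
    static ThreadPoolExecutor create(int minThreads, int maxThreads, long idleSeconds) {
        // A SynchronousQueue holds no tasks: each task is handed directly to a
        // thread, so the pool adds threads beyond minThreads as soon as every
        // existing thread is busy, up to maxThreads.
        // Threads above minThreads exit after idleSeconds without work.
        return new ThreadPoolExecutor(
                minThreads, maxThreads,
                idleSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<>());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(2, 4, 1);
        for (int i = 0; i < 4; i++) {
            // Each task simulates a short portlet render.
            pool.execute(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("threads under load: " + pool.getPoolSize());
        Thread.sleep(2500); // long enough for the extra threads to time out
        System.out.println("threads after idle: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

One caveat with this design: once all maxThreads are busy, a further execute() call is rejected with a RejectedExecutionException under the default handler, so the maximum still has to be sized against the timeout values discussed in this thread.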

Regarding the specific situation you mentioned (and others you are aware of):
- What do you mean by "brought down the servers"? Were user requests for pages immediately failing or queueing up for processing (I'm not sure what the behavior is when you run out of worker threads)?
- Can you explain in more detail why adjusting one portlet's timeout from 5s to 10s brought down the servers? How many other portlets are on those specific pages and what are their average and peak response times? Is it on the home page, guest pages or authenticated user pages? Etc.

Clearly we'll want to really understand the impacts and potential risk scenarios based on your comments.

James Wennmacher
Unicon
480.558.2420
On 08/28/2013 01:26 PM, Anthony Colebourne wrote:
Hi James,

In my experience I would recommend extreme caution when raising portlet timeouts.

Under heavy load, it is very easy to use up all the portlet rendering threads.

Just this morning I accidentally mis-configured a portlet whose timeout is 10000 and brought down our servers. We're pretty strict about timeout values and almost all of our portlets use the 5s default.

In many cases we use ajax to load content from long-running processes. In some of these cases I have reluctantly raised the timeout of the resource requests only.

I would like to see documented the relationship between rendering threads and timeout values, also how this relates to tomcat threads and apache threads where applicable. (I guess db connection pools are also impacted, both portal and portlet?).

Some guidance on how to choose sensible thread/timeout values based on averages such as portlets per page / server resources would be useful.
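
As a rough starting point for that kind of guidance, the arithmetic can be sketched with Little's law: threads in use is roughly the arrival rate multiplied by how long each request holds a thread. The sketch below is an illustration only. It assumes the 150 worker threads mentioned in this thread, and it assumes a stalled portlet holds its rendering thread for the full timeout; the class and method names are hypothetical and nothing here is measured data.

```java
class RenderThreadBudget {
    /** Little's law: busy threads = arrival rate x time each render holds a thread. */
    static double busyThreads(double rendersPerSecond, double secondsHeld) {
        return rendersPerSecond * secondsHeld;
    }

    /**
     * Highest rate of stalled renders (each held until the timeout fires)
     * that a pool can absorb before every worker thread is busy.
     */
    static double maxStalledRendersPerSecond(int workerThreads, double timeoutSeconds) {
        return workerThreads / timeoutSeconds;
    }

    public static void main(String[] args) {
        int workerThreads = 150; // figure quoted in this thread
        double[] timeoutsSeconds = {5.0, 10.0, 20.0};
        for (double timeout : timeoutsSeconds) {
            System.out.println("timeout " + timeout + "s: pool saturates at "
                    + maxStalledRendersPerSecond(workerThreads, timeout)
                    + " stalled renders/second");
        }
    }
}
```

Under these assumptions, doubling a timeout halves the rate of stalled renders the pool can tolerate, which is one way a single change from 5s to 10s could tip a heavily loaded server over.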

I'm happy to provide information and statistics from our production cluster if it helps.

-- Anthony.



On 28/08/13 20:08, James Wennmacher wrote:
I propose we change the portal's default render timeout from 5000ms to
20000ms.  There are portlets that tend to take a long time (such as email
preview) or custom portlets connecting to back end systems where 5000ms
is sometimes too short a time.

I think the user experience in general would be better to have a longer
default so:
- The portal doesn't display an unpleasant message to the user for
longer-running scenarios
- The portal is more tolerant of operational issues such as uPortal or
dependent systems running a bit slow
- Longer-running processes should typically use ajax requests to obtain
the data for a better user experience, so the urgent user response is
less important in these situations since the entire UI is not impacted

This would affect new portlets that are created via the UI (or imported
without a timeout value I believe) but not existing portlet instances.

Thoughts?

