[
https://issues.apache.org/jira/browse/SHINDIG-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618039#action_12618039
]
Kevin Brown commented on SHINDIG-479:
-------------------------------------
For reference, Louis and I ran some benchmarks using the YourKit profiler and
found that for a gadget with 4 proxied requests, 22% of the time was spent in
character set detection alone. The ICU algorithm for detecting the likely
character set is really awful (see the source for details).
There are a couple of potential options:
1. Do a quick check for UTF-8 (extremely fast and easy). If not UTF-8, try ICU.
2. Do the UTF-8 check. If not UTF-8, assume ISO-8859-1 (the standard character
set for most web servers).
In both cases we continue to believe the server if it does send an appropriate
character set in the response headers.
The first won't break anything, but it will still be doing a lot of unnecessary
work.
The second will break some things (though probably not many), but the CPU usage
will be down significantly. Anything that might be "broken" can be addressed by
simply specifying the character set in the HTTP response headers.
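
Option 2 could look roughly like the sketch below. It is only an illustration,
not the Shindig code: the class and method names (CharsetGuess, isValidUtf8,
chooseCharset) are made up here, and it uses the JDK's strict UTF-8 decoder
rather than a hand-rolled byte scan, so it is simpler but probably slower than
a dedicated check would be.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Shindig.
final class CharsetGuess {
  private CharsetGuess() {}

  /** Returns true if the bytes decode as strict UTF-8 (pure ASCII also passes). */
  static boolean isValidUtf8(byte[] body) {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      decoder.decode(ByteBuffer.wrap(body));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  /**
   * Believes the server's charset if it sent a usable one; otherwise
   * picks UTF-8 when the body is valid UTF-8 and ISO-8859-1 when it is not.
   */
  static Charset chooseCharset(String headerCharset, byte[] body) {
    if (headerCharset != null) {
      try {
        if (Charset.isSupported(headerCharset)) {
          return Charset.forName(headerCharset);
        }
      } catch (IllegalCharsetNameException e) {
        // Malformed charset name in the header: fall through to the byte check.
      }
    }
    return isValidUtf8(body) ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
  }
}
```

Option 1 would differ only in the last line, calling ICU instead of returning
ISO-8859-1 when the UTF-8 check fails.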
> Character set detection is EXPENSIVE using ICU4J.
> -------------------------------------------------
>
> Key: SHINDIG-479
> URL: https://issues.apache.org/jira/browse/SHINDIG-479
> Project: Shindig
> Issue Type: Bug
> Components: Gadget Rendering Server (Java)
> Reporter: Louis Ryan
> Assignee: Louis Ryan
> Priority: Critical
>
> We use the ICU4J library to detect the character set on HTTP content fetched
> from 3rd parties when the content-type header does not contain the charset.
> The code is quite expensive, and the cost was also being incurred on cached
> and rewritten content, i.e. basically everything that runs through Shindig.
> I've submitted a partial fix: once the charset is derived, it is stored
> back into the HTTP headers so that caching and rewriting can benefit from it.
> For containers with good caching this should eliminate ~75% of the CPU
> overhead, but a lot of makeRequest traffic is not cacheable and still suffers
> from this.
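
The partial fix described in the issue (storing the derived charset back into
the headers) might be sketched as follows. This is an assumption about the
general idea only: ContentTypeCharset and withCharset are hypothetical names,
and the real change operates on Shindig's own response objects rather than on
a bare header string.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, not part of Shindig.
final class ContentTypeCharset {
  private ContentTypeCharset() {}

  /**
   * Returns the Content-Type value with the detected charset appended,
   * or the value unchanged if it is missing or already names a charset.
   */
  static String withCharset(String contentType, Charset detected) {
    if (contentType == null || contentType.isEmpty()) {
      return contentType;
    }
    if (contentType.toLowerCase(Locale.ROOT).contains("charset=")) {
      return contentType;
    }
    return contentType + "; charset=" + detected.name();
  }
}
```

Once the header carries the charset, later passes over the same cached response
can skip detection entirely.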