On 6/14/11 5:37 PM, Dan Scott wrote:
On Tue, Jun 14, 2011 at 04:15:02PM -0400, Bill Erickson wrote:
On 6/14/11 9:17 AM, Dan Scott wrote:
On Tue, Jun 14, 2011 at 08:41:59AM -0400, Bill Erickson wrote:
Hi Dan,
I'd like to suggest we not make this change or at least make the
default significantly lower. With a 30-second timeout and a slow or
crippled added content provider, it would not take long for the
Apache processes to be gobbled up, leaving EG unusable.
Hmm. I guess as you say below that depends on load and the added content
provider; we've been running with timeout set to 45 seconds and using
the new OpenLibrary Read API where some requests do take a long time to
resolve (30 seconds for an ISBN with many editions is not unusual, at
least in this early stage before they've optimized their own service). I
thought that with caching integrated into added content, the idea was
that the initial request would be costly but subsequent requests would
be cached - therefore spreading out the pain.
Yes, in some environments a high timeout works fine. I think it's
very subjective. And, yes, that is the goal of caching. It helps a
lot, but obviously it doesn't remove the need to make network calls.
Thanks very much for the response, Bill. To make a long story short:
* Our added content infrastructure is vulnerable to a denial of service;
if it is enabled, then setting the timeout value is a balance between
incurring an accidental denial of service and actually working.
Thanks for summarizing/vocalizing my knee-jerk reaction ;)
Here, I would lean towards:
1) Keeping it enabled - we've generally tried to make things work
with the default settings.
2) Setting the default timeout to your extreme limit of discomfort
- 3 seconds - again, we want things to work with the default
settings.
3) Documenting this as a setting to consider the pros and cons of
for production.
+1
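For what it's worth, the cache-plus-short-timeout approach discussed above can be sketched roughly as follows. This is purely illustrative (Evergreen's actual added-content code is Perl, and the names here are made up), but it shows the intended behavior: the first request pays the network cost under a tight timeout, subsequent requests hit the cache, and a slow provider degrades to "no content" instead of tying up an Apache worker:

```python
# Hypothetical sketch of the cache-plus-timeout pattern; fetch_fn and
# cache are illustrative stand-ins, not Evergreen's actual API.

DEFAULT_TIMEOUT = 3  # seconds -- the conservative default suggested above


def get_added_content(key, fetch_fn, cache, timeout=DEFAULT_TIMEOUT):
    """Return cached content if present; otherwise fetch with a short
    timeout so a slow provider cannot hold the worker for long.
    On timeout, return None (no content) rather than blocking."""
    if key in cache:
        return cache[key]  # subsequent requests are cheap
    try:
        # The costly first request: bounded by the timeout.
        content = fetch_fn(key, timeout=timeout)
    except TimeoutError:
        return None  # degrade gracefully instead of hanging the process
    cache[key] = content
    return content
```

With a shared cache (memcached in practice), the "pain" of the slow initial lookup is spread out exactly as described above, while the 3-second ceiling caps the worst case per request.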
* To avoid nice-to-have requests like added content from blocking core
requests like fine payments, checkouts, etc. (aka: to remove the
possibility of denial of service attacks via AC requests) we need to
rearchitect added content - probably doing something like having all
added content requests served by a dedicated AC server; possibly
offering responses via JSONP to help circumvent same-domain request
restrictions if we offload to something like http://ac.example.com.
+1
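A minimal sketch of the JSONP side of that idea, assuming a dedicated host like ac.example.com: the server wraps its JSON payload in a caller-supplied callback so the catalog page can consume it via a script tag despite same-origin restrictions. The function and the callback whitelist below are assumptions for illustration, not existing Evergreen code; validating the callback name matters, since an unchecked callback parameter would let a request inject arbitrary script:

```python
import json
import re

# Only allow identifier-style callback names so a malicious callback
# parameter cannot inject arbitrary script into the response body.
CALLBACK_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def jsonp_wrap(payload, callback):
    """Wrap a JSON-serializable payload in a JSONP callback invocation,
    letting a page on the main catalog domain consume responses served
    from a separate added-content host."""
    if not CALLBACK_RE.match(callback):
        raise ValueError("invalid JSONP callback name")
    return "%s(%s);" % (callback, json.dumps(payload))
```

The client side would then be an ordinary script-tag include pointing at the AC host with a `callback=` query parameter, keeping slow added-content lookups entirely out of the main Apache request path.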
An alternative is to treat added content like other catalogs and load
the content directly from the provider, avoiding the proxy step
altogether. (I know some providers would prefer this approach). That
would be a pretty drastic departure, of course. It would solve the DOS
problem, better parallelize retrieval, and reduce server load, but it
comes with a host of new complications and limitations.
Note that this also has implications for the TT OPAC. We don't want to
block for a few seconds waiting for AC requests to return before
returning the final page. We can cheat with cover art like we currently
do, but for excerpts / TOCs / reviews / online previews, etc., are we
staring down the barrel of JavaScript to enable async results?
FWIW, I would consider any added content beyond cover art
"nice-to-haves" - and online previews via Google Books / OpenLibrary
generally require JavaScript support anyway - so that use of JavaScript
in the TT OPAC wouldn't bother me.
Indeed, added (i.e. remote) content seems to be the sticking point for
JS and I agree we don't want TT pages waiting for remote content to
load. I'll add my $0.02 to the other thread as soon as I can.
-b
--
Bill Erickson
| VP, Software Development & Integration
| Equinox Software, Inc. / Your Library's Guide to Open Source
| phone: 877-OPEN-ILS (673-6457)
| email: [email protected]
| web: http://esilibrary.com
Equinox is going to New Orleans! Please visit us at booth 550
at ALA Annual to learn more about Koha and Evergreen.