On 6/14/11 5:37 PM, Dan Scott wrote:
On Tue, Jun 14, 2011 at 04:15:02PM -0400, Bill Erickson wrote:
On 6/14/11 9:17 AM, Dan Scott wrote:
On Tue, Jun 14, 2011 at 08:41:59AM -0400, Bill Erickson wrote:

Hi Dan,

I'd like to suggest we not make this change or at least make the
default significantly lower.  With a 30-second timeout and a slow or
crippled added content provider, it would not take long for the
Apache processes to be gobbled up, leaving EG unusable.

Hmm. I guess as you say below that depends on load and the added content
provider; we've been running with timeout set to 45 seconds and using
the new OpenLibrary Read API where some requests do take a long time to
resolve (30 seconds for an ISBN with many editions is not unusual, at
least in this early stage before they've optimized their own service). I
thought that with caching integrated into added content, the idea was
that the initial request would be costly but subsequent requests would
be cached - therefore spreading out the pain.

Yes, in some environments a high timeout works fine.  I think it's
very subjective.  And, yes, that is the goal of caching.  It helps a
lot, but obviously it doesn't remove the need to make network calls.
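
The caching idea above can be sketched in a few lines. This is a generic memoizing wrapper with a hard timeout, not Evergreen's actual added-content code; `makeCachedLookup` and its arguments are illustrative names. The first lookup pays the remote-provider cost (bounded by the timeout), and subsequent lookups for the same key are served locally.

```javascript
// Hypothetical sketch: memoize added-content lookups behind a hard timeout,
// so only the first request for a key pays the slow remote-provider cost.
function makeCachedLookup(fetchFn, timeoutMs) {
  const cache = new Map();
  return async function lookup(key) {
    if (cache.has(key)) return cache.get(key);   // subsequent hits are local
    const result = await Promise.race([
      fetchFn(key),                              // slow remote provider call
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('AC timeout')), timeoutMs)),
    ]);
    cache.set(key, result);                      // pay the cost only once
    return result;
  };
}
```

With a low timeout (the 3 seconds discussed above), a crippled provider costs each Apache child at most that long per uncached key, rather than tying it up indefinitely.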

Thanks very much for the response, Bill. To make a long story short:

* Our added content infrastructure is vulnerable to a denial of service;
   if it is enabled, then setting the timeout value is a balance between
   incurring an accidental denial of service and actually working.

Thanks for summarizing/vocalizing my knee-jerk reaction ;)


Here, I would lean towards:

   1) Keeping it enabled - we've generally tried to make things work
      with the default settings.

   2) Setting the default timeout to your extreme limit of discomfort
      - 3 seconds - again, we want things to work with the default
      settings.

   3) Documenting this as a setting whose pros and cons should be
      weighed for production.

+1


* To avoid nice-to-have requests like added content from blocking core
   requests like fine payments, checkouts, etc. (i.e., to remove the
   possibility of denial-of-service attacks via AC requests) we need to
   rearchitect added content - probably doing something like having all
   added content requests served by a dedicated AC server; possibly
   offering responses via JSONP to help circumvent same-domain request
   restrictions if we offload to something like http://ac.example.com.

+1
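
To make the cross-domain idea concrete: if AC were offloaded to a host like ac.example.com, the server side of JSONP is just wrapping the JSON payload in a caller-supplied callback so a `<script>` tag on the OPAC page can consume it despite the same-origin policy. This is a minimal illustrative sketch, not a proposed implementation; `wrapJsonp` and the callback convention are hypothetical names.

```javascript
// Sketch, assuming AC is served from a dedicated host (e.g. ac.example.com):
// wrap the JSON response in the caller-supplied callback for <script>-tag use.
function wrapJsonp(callbackName, payload) {
  // Whitelist the callback name so we never reflect arbitrary script.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error('invalid JSONP callback name');
  }
  return `${callbackName}(${JSON.stringify(payload)});`;
}
```

The OPAC page would then include something like `<script src="http://ac.example.com/reviews?isbn=...&callback=handleReviews">` and define `handleReviews` to render the result.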

An alternative is to treat added content like other catalogs do and load the content directly from the provider, avoiding the proxy step altogether. (I know some providers would prefer this approach.) That would be a pretty drastic departure, of course. It would solve the DoS problem, better parallelize retrieval, and reduce server load, but it comes with a host of new complications and limitations.
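
As an illustration of the direct-from-provider alternative, the browser could build a provider lookup URL itself and fetch it via a callback-style script tag, bypassing the Evergreen server entirely. The parameter names below are from memory of Google Books' classic Dynamic Links endpoint and should be verified against the provider's current documentation before relying on them.

```javascript
// Sketch of loading added content straight from the provider in the browser.
// The bibkeys/jscmd parameters reflect Google Books' classic Dynamic Links
// API as I recall it -- treat them as assumptions, not a verified contract.
function googleBooksLookupUrl(isbn, callbackName) {
  const params = new URLSearchParams({
    bibkeys: `ISBN:${isbn}`, // key the response by ISBN
    jscmd: 'viewapi',        // ask for viewability info
    callback: callbackName,  // browser-side handler for the script response
  });
  return `https://books.google.com/books?${params}`;
}
```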


Note that this also has implications for the TT OPAC. We don't want to
block for a few seconds waiting for AC requests to return before
returning the final page. We can cheat with cover art like we currently
do, but for excerpts / TOCs / reviews / online previews etc. are we
staring down the barrel of JavaScript to enable async results?

FWIW, I would consider any added content beyond cover art
"nice-to-haves" - and online previews via Google Books / OpenLibrary
generally require JavaScript support anyway - so that use of JavaScript
in the TT OPAC wouldn't bother me.

Indeed, added (i.e. remote) content seems to be the sticking point for JS, and I agree we don't want TT pages waiting for remote content to load. I'll add my $0.02 to the other thread as soon as I can.
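
The async approach being discussed might look roughly like this on a TT page: render the record page immediately, then fill an added-content slot in whenever (or if) the AC request completes. The endpoint path and element id below are hypothetical, not current Evergreen markup.

```javascript
// Rough sketch (hypothetical endpoint and element id): the page renders
// immediately, and this fills the reviews slot in after the fact, so a slow
// or dead provider never blocks the rest of the page.
async function loadReviews(recordId) {
  const slot = document.getElementById('ac-reviews');
  try {
    const res = await fetch(`/opac/extras/ac/reviews/html/${recordId}`);
    if (!res.ok) return;               // nice-to-have: fail silently
    slot.innerHTML = await res.text(); // swap the content in when ready
  } catch (e) {
    // provider slow or down: the rest of the page is unaffected
  }
}
```

The TT template would supply the record id and invoke this from a DOMContentLoaded handler.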

-b

--
Bill Erickson
| VP, Software Development & Integration
| Equinox Software, Inc. / Your Library's Guide to Open Source
| phone: 877-OPEN-ILS (673-6457)
| email: [email protected]
| web: http://esilibrary.com

Equinox is going to New Orleans! Please visit us at booth 550
at ALA Annual to learn more about Koha and Evergreen.
