Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-02 Thread JonGeorg SageLibrary via Evergreen-general
I tried that and still got the loopback address, after restarting services.
Any other ideas? And the robots.txt file seems to be doing nothing, which
is not much of a surprise. I've reached out to the people who host our
network and have control of everything on the other side of the firewall.
-Jon
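[For readers following along: the robots.txt changes discussed in this thread would look roughly like the sketch below. Note that support varies - Bing honors the non-standard Crawl-delay directive, Google ignores it, and badly behaved bots such as UT-Dorkbot typically ignore robots.txt entirely, which matches the behavior reported here. The exact rules are illustrative, not copied from the poster's file.]

```
# Illustrative sketch only - rule values are assumptions, not the poster's file.
User-agent: bingbot
Crawl-delay: 10

User-agent: UT-Dorkbot
Disallow: /

User-agent: *
Crawl-delay: 10
```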


On Wed, Dec 1, 2021 at 3:57 AM Jason Stephenson wrote:

> JonGeorg,
>
> If you're using nginx as a proxy, that may be the configuration of
> Apache and nginx.
>
> First, make sure that mod_remoteip is installed and enabled for Apache 2.
>
> Then, in eg_vhost.conf, find the three lines that begin with
> "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
>
> Next, see what header Apache checks for the remote IP address. In my
> example it is "RemoteIPHeader X-Forwarded-For".
>
> Next, make sure that the following two lines appear in BOTH "location /"
> blocks in the nginx configuration:
>
>  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>  proxy_set_header X-Forwarded-Proto $scheme;
>
> After reloading/restarting nginx and Apache, you should start seeing
> remote IP addresses in the Apache logs.
>
> Hope that helps!
> Jason
>
>
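[To tie the Apache steps above together, here is a minimal sketch of the relevant eg_vhost.conf fragment. The header name and the 127.0.0.1/24 range are as given in the thread; enabling the module with a2enmod is a Debian/Ubuntu assumption - other distributions load mod_remoteip in the main httpd configuration instead.]

```apache
# Debian/Ubuntu (assumption): enable the module, then restart Apache:
#   sudo a2enmod remoteip && sudo systemctl restart apache2

# In eg_vhost.conf: read the real client IP from the header that nginx
# sets, and treat the local proxy address range as internal:
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 127.0.0.1/24
```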
> On 12/1/21 12:53 AM, JonGeorg SageLibrary wrote:
> > Because we're behind a firewall, all the addresses display as 127.0.0.1.
> > I can talk to the people who administer the firewall about blocking
> > IPs, though. Thanks
> > -Jon
> >
> > On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general
> > wrote:
> >
> > JonGeorg,
> >
> > Check your Apache logs for the source IP addresses. If you can't find
> > them, I can share the correct configuration for Apache with Nginx so
> > that you will get the addresses logged.
> >
> > Once you know the IP address ranges, block them. If you have a
> > firewall, I suggest you block them there. If not, you can block them
> > in Nginx or in your load balancer configuration if you have one and
> > it allows that.
> >
> > You may think you want your catalog to show up in search engines, but
> > bad bots will lie about who they are. All you can do with misbehaving
> > bots is to block them.
> >
> > HtH,
> > Jason
> >
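[On the Nginx-blocking option mentioned above, a hedged sketch of what that looks like in an nginx server or location block. The CIDR ranges are placeholders from the RFC 5737 documentation space, not real bot addresses - substitute the ranges actually seen in the Apache logs.]

```nginx
# Deny the offending ranges identified in the Apache logs
# (placeholder CIDRs shown; replace with the real ones):
deny 203.0.113.0/24;   # example range only
deny 198.51.100.0/24;  # example range only
allow all;
```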
> > On 11/30/21 9:34 PM, JonGeorg SageLibrary via Evergreen-general wrote:
> >  > Question. We've been getting hammered by search engine bots [?], but
> >  > they seem to all query our system at the same time - enough that it's
> >  > crashing the app servers. We have a robots.txt file in place. I've
> >  > increased the crawl delay from 3 to 10 seconds and have explicitly
> >  > disallowed the specific bots, but I've seen no change from the worst
> >  > offenders - Bingbot and UT-Dorkbot. We had over 4k hits from Dorkbot
> >  > alone from 2pm-5pm today, and over 5k from Bingbot in the same
> >  > timeframe, all a couple of hours after I made the changes to the
> >  > robots file and restarted Apache services. Out of 100k entries in the
> >  > vhost logs in that time frame that doesn't sound like a lot, but the
> >  > rest of the traffic looks normal. This issue has been happening
> >  > intermittently [the last three incidents were 11/30, 11/3, and 7/20],
> >  > and the only thing that seems to work is to manually kill the
> >  > services on the DB servers and restart services on the application
> >  > servers.
> >  >
> >  > The symptom is an immediate spike in database CPU load. I start
> >  > killing all queries older than 2 minutes, but it still usually
> >  > overwhelms the system, causing the app servers to stop serving
> >  > requests. The stuck queries are almost always along the lines of:
> >  >
> >  > -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
> >  > from_metarecord(*/BIB_RECORD#/*) core_limit(10)
> >  > badge_orgs(1,138,151) estimation_strategy(inclusion) skip_check(0)
> >  > check_limit(1000) sort(1) filter_group_entry(1) 1
> >  > site(*/LIBRARY_BRANCH/*) depth(2)
> >  > WITH w AS (
> >  >   WITH */STRING/*_keyword_xq AS (SELECT
> >  >     (to_tsquery('english_nostop', COALESCE(NULLIF( '(' ||
> >  > btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
> >  > */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), '')) ||
> >  > to_tsquery('simple', COALESCE(NULLIF( '(' ||
> >  > btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
> >  > */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), '')
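[A side note on the "killing all queries older than 2 minutes" step described above: in PostgreSQL this can be done from psql rather than by hand. A minimal sketch using the standard pg_stat_activity view and pg_cancel_backend(); the 2-minute threshold matches the thread, everything else is generic.]

```sql
-- Cancel (not terminate) any active query running longer than 2 minutes,
-- excluding this session itself.
SELECT pid, now() - query_start AS runtime, pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '2 minutes'
  AND pid <> pg_backend_pid();
```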

[Evergreen-general] Documentation Interest Group (DIG) Meeting Today!

2021-12-02 Thread Deborah Luchenbill via Evergreen-general
Reminder: The December DIG meeting is today, Thursday, December 2, at 2
p.m. EST / 11 a.m. PST, on Zoom! We'll be checking in on our docs
reorganization work and then having a holiday party. Everyone is welcome!

You can find the agenda (including connection information) at
https://wiki.evergreen-ils.org/doku.php?id=evergreen-docs:dig_meetings:20211202-agenda

Though we will try to keep other business to a minimum at this meeting, if
you have something important to add to the agenda, feel free to do so, or
to let me know. Please include your name in parentheses to any agenda items
you add.

You can find minutes from our November meeting (and all others) at
https://wiki.evergreen-ils.org/doku.php?id=evergreen-docs:dig_meetings

Best,
Debbie
DIG Facilitator

Debbie Luchenbill
Associate Director, Open Source Initiatives
MOBIUS
2511 Broadway Bluffs, Ste. 101
Columbia, MO  65201
deb...@mobiusconsortium.org 
573-234-4914
https://mobiusconsortium.org
Evergreen Help Desk: h...@mobiusconsortium.org / 877-312-3517
Pronouns: She/Her/Hers
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general