Re: [Evergreen-general] Hold emails appear to not be working

2024-06-17 Thread JonGeorg SageLibrary via Evergreen-general
I just wanted to say thanks to those who responded to me. Most notably to
Chris Sharp, who helped me identify the issue and get it resolved the next
business day. Your help was very much appreciated.
-Jon

On Fri, Jun 7, 2024 at 8:18 PM JonGeorg SageLibrary <
jongeorg.sagelibr...@gmail.com> wrote:

> Greetings. I have two libraries that have noticed that hold emails have
> not been going out the last couple days. One of the libraries was able to
> send me specific patron info and I was able to query the db and verify that
> no hold emails went out for those patrons. I do see other emails going out
> from the system via the mail.log on the server, so the mail server is up
> and working.
>
> I did have an issue where the log server ran out of space and I had to
> reboot it after clearing space; however, I have been able to verify that
> hold emails went out after that, so that should not be an issue, but it is
> the only unusual recent event for our system that I'm aware of. Email
> receipts are working as expected, including for the patrons who I know did
> not receive hold email notifications.
>
> I've checked syslog for the cron jobs, such as hold_targeter.pl,
> action_trigger_runner.pl, and thaw_expired_frozen_holds.srfsh, and I don't
> see any errors; all of the cron jobs appear to be running and finishing
> successfully.
>
> I've also checked the mail.log file and do not see any entries for these
> patrons regarding their holds, which makes sense as there is no entry in the
> db, but I do see email receipts. I do see a ton of 'Network is unreachable'
> errors related to gmail and other providers, but that is a separate issue
> and likely a reason to consider a 3rd party application to handle
> notifications.
>
> Is it possible that there is a backlog of stuck hold emails like what
> happens when the reporter gets stuck? Is there a way to force the hold
> email function to stop and restart, perhaps in the database itself? I've
> restarted all services on all servers [minus the db servers] without
> resolution.
>
> Thanks
> -Jon
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] IRC links

2024-06-10 Thread JonGeorg SageLibrary via Evergreen-general
Thank you
-Jon

On Mon, Jun 10, 2024 at 5:19 PM Jason Stephenson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> JonGeorg,
>
> http://irc.evergreen-ils.org/evergreen is the correct link.
>
> I'll see about fixing the links on the site tomorrow unless someone else
> does it before me.
>
> Jason Stephenson
>
> On 6/10/24 7:27 PM, JonGeorg SageLibrary via Evergreen-general wrote:
> > I'm not sure if this is the correct place to point this out, but the IRC
> > log links to https://evergreen-ils.org/evergreen/today do not work. They
> > take you to a page that says what you're looking for is missing. Same
> > with the link in the footer of https://evergreen-ils.org/
> >
> > Thanks
> > -Jon
> >
> > ___
> > Evergreen-general mailing list
> > Evergreen-general@list.evergreen-ils.org
> > http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] IRC links

2024-06-10 Thread JonGeorg SageLibrary via Evergreen-general
I'm not sure if this is the correct place to point this out, but the IRC
log links to https://evergreen-ils.org/evergreen/today do not work. They
take you to a page that says what you're looking for is missing. Same with
the link in the footer of https://evergreen-ils.org/

Thanks
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Hold emails appear to not be working

2024-06-07 Thread JonGeorg SageLibrary via Evergreen-general
Greetings. I have two libraries that have noticed that hold emails have not
been going out the last couple days. One of the libraries was able to send
me specific patron info and I was able to query the db and verify that no
hold emails went out for those patrons. I do see other emails going out
from the system via the mail.log on the server, so the mail server is up
and working.

I did have an issue where the log server ran out of space and I had to
reboot it after clearing space; however, I have been able to verify that
hold emails went out after that, so that should not be an issue, but it is
the only unusual recent event for our system that I'm aware of. Email
receipts are working as expected, including for the patrons who I know did
not receive hold email notifications.

I've checked syslog for the cron jobs, such as hold_targeter.pl,
action_trigger_runner.pl, and thaw_expired_frozen_holds.srfsh, and I don't
see any errors; all of the cron jobs appear to be running and finishing
successfully.

I've also checked the mail.log file and do not see any entries for these
patrons regarding their holds, which makes sense as there is no entry in the
db, but I do see email receipts. I do see a ton of 'Network is unreachable'
errors related to gmail and other providers, but that is a separate issue
and likely a reason to consider a 3rd party application to handle
notifications.
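
A quick way to spot-check the mail log for a given patron, with a
placeholder address standing in for a real one:

# look for any delivery attempts to a given patron address
# ("patron@example.org" is a placeholder)
grep -i 'patron@example.org' /var/log/mail.log
# rough count of the unrelated connectivity errors mentioned above
grep -c 'Network is unreachable' /var/log/mail.log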

Is it possible that there is a backlog of stuck hold emails like what
happens when the reporter gets stuck? Is there a way to force the hold
email function to stop and restart, perhaps in the database itself? I've
restarted all services on all servers [minus the db servers] without
resolution.
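
One place to look for such a backlog is the action_trigger event queue in
the database. A rough sketch, assuming the hold notice definitions use the
stock hold.available hook:

-- count A/T events per definition and state; a large pile of 'pending'
-- rows for the hold-available email definition would suggest a stuck queue
SELECT d.id, d.name, e.state, COUNT(*)
  FROM action_trigger.event e
  JOIN action_trigger.event_definition d ON d.id = e.event_def
 WHERE d.hook = 'hold.available'
 GROUP BY 1, 2, 3
 ORDER BY 1, 3;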

Thanks
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] [External] Reporter folder cleanup question

2024-05-28 Thread JonGeorg SageLibrary via Evergreen-general
Thank you everyone. This helps a ton.
-Jon

On Tue, May 28, 2024 at 12:48 PM Jason Stephenson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> I hit reply too soon!
>
> We also run this a few minutes later to remove empty directories:
>
> find /openils/var/data/reports/ -empty -type d -delete
>
> You could bundle them up into a single script. I recommend running the
> above find command after the one in the previous email.
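
A minimal sketch of such a wrapper script, assuming the stock
/openils/var/data/reports path and the 90-day retention described in this
thread:

#!/bin/bash
# purge report output older than 90 days, then prune empty directories
REPORTS_DIR=/openils/var/data/reports
find "$REPORTS_DIR" -type f -mtime +90 -delete
find "$REPORTS_DIR" -empty -type d -delete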
>
> On 5/28/24 15:46, Jason Stephenson wrote:
> > Jon,
> >
> > We run something similar to what Blake shared below.
> >
> > We also run this daily via crontab to remove the files after 90 days:
> >
> > find /openils/var/data/reports/ -type f -mtime +90 -delete
> >
> > You'll need to adjust "/openils/var/data" for wherever your reports are
> > stored.
> >
> > HtH,
> > Jason Stephenson
> >
> > On 5/28/24 15:29, Blake Graham-Henderson via Evergreen-general wrote:
> >> Jon,
> >>
> >> Here's the cron line:
> >>
> >>
> >> # purge 90 day old reports
> >> 0 01 * * * psql  < purge_reports.sql >/dev/null 2>&1
> >>
> >> And the contents of purge_reports.sql:
> >>
> >> BEGIN;
> >>
> >> DELETE FROM reporter.schedule WHERE run_time < NOW() - INTERVAL '90
> >> days';
> >> DELETE FROM reporter.report WHERE create_time < NOW() - INTERVAL '90
> >> days' AND recur=FALSE AND id NOT IN
> >> (SELECT r.id FROM reporter.report r INNER JOIN reporter.schedule s ON
> >> r.id=s.report);
> >>
> >> COMMIT;
> >>
> >>
> >> -Blake-
> >> Conducting Magic
> >> Will consume any data format
> >> MOBIUS
> >>
> >> On 5/28/2024 1:45 PM, Murphy, Benjamin via Evergreen-general wrote:
> >>> NC Cardinal has a process that deletes old output and non-recurring
> >>> reports after 3 months. We don't touch the templates. (It's a cron job
> >>> that Mobius runs for us.)
> >>>
> >>> *Benjamin Murphy*
> >>>
> >>> NC Cardinal Program Manager
> >>>
> >>> State Library of North Carolina
> >>>
> >>> _benjamin.mur...@dncr.nc.gov _ |
> >>> https://statelibrary.ncdcr.gov/services-libraries/nc-cardinal
> >>>
> >>> 109 East Jones Street  | 4640 Mail Service Center
> >>>
> >>> Raleigh, North Carolina 27699-4600
> >>>
> >>> The State Library is part of the NC Department of Natural & Cultural
> >>> Resources.
> >>>
> >>> /Email correspondence to and from this address is subject to the
> >>> North Carolina Public Records Law and may be disclosed to third
> >>> parties./
> >>>
> >>> Please note new email address
> >>>
> >>>
> 
> >>> *From:* Evergreen-general
> >>>  on behalf of
> >>> JonGeorg SageLibrary via Evergreen-general
> >>> 
> >>> *Sent:* Tuesday, May 28, 2024 2:35 PM
> >>> *To:* Evergreen Discussion Group
> >>> 
> >>> *Cc:* JonGeorg SageLibrary 
> >>> *Subject:* [External] [Evergreen-general] Reporter folder cleanup
> >>> question
> >>> CAUTION: External email. Do not click links or open attachments
> >>> unless verified. Report suspicious emails with the Report Message
> >>> button located on your Outlook menu bar on the Home tab.
> >>>
> >>> What methodology are you all using to periodically purge old reports
> >>> out of the /openils/var/web/reporter folder?
> >>>
> >>> Thanks
> >>> -Jon
> >>>
> >>>
> 
> >>>
> >>> Email correspondence to and from this address may be subject to the
> >>> North Carolina Public Records Law and may be disclosed to third
> >>> parties by an authorized state official.
> >>>
> >>> ___
> >>> Evergreen-general mailing list
> >>> Evergreen-general@list.evergreen-ils.org
> >>>
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
> >>
> >>
> >> ___
> >> Evergreen-general mailing list
> >> Evergreen-general@list.evergreen-ils.org
> >>
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Reporter folder cleanup question

2024-05-28 Thread JonGeorg SageLibrary via Evergreen-general
What methodology are you all using to periodically purge old reports out of
the /openils/var/web/reporter folder?

Thanks
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Cover Image Scaling question

2024-05-03 Thread JonGeorg SageLibrary via Evergreen-general
In prior versions of Evergreen, cover images missing from ContentCafe were
always square when they had to be manually uploaded to the server.

We're still currently on 3.7 [planning to upgrade soon] and the traditional
view shows them correctly. The new Angular search results, however, stretch
the images to the new rectangular, portrait-oriented format. Documentation
suggests that how those images are handled can be changed, but it doesn't
say how or where. We have approximately 1500 cover images that I've
uploaded, and I'd rather not resize and re-upload them all if there is a
way to avoid that.

Suggestions?
Thanks
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Permission reset question

2024-05-03 Thread JonGeorg SageLibrary via Evergreen-general
I have a question. Some of our staff accounts have individually set
permissions, or at least customized permissions, rather than all being
managed by groups. Is there an easy way to reset a user's permissions to
null, so I can then reassign them by group? I'm assuming that this would
have to be done on the database side?
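
For what it's worth, individually granted permissions live in
permission.usr_perm_map, so a database-side reset would look roughly like
the sketch below (the user ID is a placeholder; test on a copy of the
database first):

-- remove all individually granted permissions for one staff account,
-- leaving group-level grants (permission.grp_perm_map) untouched
DELETE FROM permission.usr_perm_map WHERE usr = 12345;  -- placeholder actor.usr id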

Thanks
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] SMS Messages Not Being Received

2024-04-05 Thread JonGeorg SageLibrary via Evergreen-general
Elizabeth, what 3rd party service did you end up using and how well has it
been working for your libraries?
-Jon

On Fri, Apr 5, 2024 at 11:29 AM Elizabeth Davis via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> Hi Will
>
>
>
> We’ve had several SMS issues with most of the carriers.
> Sometimes SMS messages were throttled, and patrons would get them days or
> weeks later if at all.  Some carriers just stopped sending them totally.
> Sometimes the messages to a specific carrier would deliver for one library
> in the consortium but not others.  We tried changing from sending SMS to
> MMS with no luck.  Also we were getting asked to add new carriers that
> don’t have gateways and so we were unable to provide SMS service to those
> patrons. In the end we moved to a third-party option to deliver SMS
> notifications.
>
>
>
>
>
> *Elizabeth Davis* (she/her), *Support & Project Management Specialist*
>
> *Pennsylvania Integrated Library System **(PaILS) | SPARK*
>
> (717) 256-1627 | elizabeth.da...@sparkpa.org
> 
> support.sparkpa.org | supp...@sparkpa.org
>
>
>
> *From:* Evergreen-general <
> evergreen-general-boun...@list.evergreen-ils.org> *On Behalf Of *Szwagiel,
> Will via Evergreen-general
> *Sent:* Friday, April 5, 2024 2:13 PM
> *To:* Szwagiel, Will via Evergreen-general <
> evergreen-general@list.evergreen-ils.org>
> *Cc:* Szwagiel, Will 
> *Subject:* [Evergreen-general] SMS Messages Not Being Received
>
>
>
> Good afternoon,
>
>
>
> We have recently been receiving a number of reports from different
> libraries that patrons are not receiving SMS notifications, particularly
> those for holds.  Evergreen is sending the messages like it is supposed to,
> so we are thinking that some carriers may be flagging the messages as
> spam.  Based on the reports we have received, it appears to be most common
> with AT&T.
>
>
>
> For anyone else who might have experienced this in the past, did you have
> any direct interactions with the carrier/s, and if so, what were the steps
> that needed to be taken to prevent this from happening in the future?
>
>
>
> Thank you.
>
>
>
> *William C. Szwagiel*
>
> NC Cardinal Project Manager
>
> State Library of North Carolina
>
> william.szwag...@ncdcr.gov | 919.814.6721
>
> https://statelibrary.ncdcr.gov/services-libraries/nc-cardinal
> 
>
> 109 East Jones Street  | 4640 Mail Service Center
>
> Raleigh, North Carolina 27699-4600
>
> The State Library is part of the NC Department of Natural & Cultural
> Resources.
>
> *Email correspondence to and from this address is subject to the North
> Carolina Public Records Law and may be disclosed to third parties.*
>
>
>
>
> --
>
>
> Email correspondence to and from this address may be subject to the North
> Carolina Public Records Law and may be disclosed to third parties by an
> authorized state official.
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] SMS Messages Not Being Received

2024-04-05 Thread JonGeorg SageLibrary via Evergreen-general
According to patrons, we've had a lot of issues with SMS notifications not
going out.

Sometimes it's because a carrier was bought out by another carrier, like
when StraightTalk was purchased by Verizon out here in Oregon, so I updated
the SMS server settings.
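
For anyone making the same kind of change, the carrier gateways live in
config.sms_carrier; the update is roughly the sketch below (the gateway
address is a placeholder, not the carrier's real one):

-- point the acquired carrier's entry at the new owner's email-to-SMS gateway
UPDATE config.sms_carrier
   SET email_gateway = '$number@gateway.example.com'  -- placeholder gateway
 WHERE name = 'StraightTalk';  -- name as it appears in your carrier table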

Another was when US Cellular apparently stopped supporting email-to-text
altogether, or at least that is what I was told by their support. However,
some users are still receiving notifications while others aren't, leaving
me completely confused and unable to verify the issue.

For now, we've been suggesting that staff recommend email notifications
instead of SMS, and I've set up email forwarders for all branches to avoid
SPF issues for the time being.

Looking forward to seeing more responses on this topic and learning what
solutions others are using. It's been mentioned before that the time to use
a 3rd party application for email & text notifications is approaching and
I'd like to hear more about success/issues with that if anyone has gone
that route.

Thanks for bringing this topic up.
-Jon

On Fri, Apr 5, 2024 at 11:13 AM Szwagiel, Will via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> Good afternoon,
>
> We have recently been receiving a number of reports from different
> libraries that patrons are not receiving SMS notifications, particularly
> those for holds.  Evergreen is sending the messages like it is supposed to,
> so we are thinking that some carriers may be flagging the messages as
> spam.  Based on the reports we have received, it appears to be most common
> with AT&T.
>
> For anyone else who might have experienced this in the past, did you have
> any direct interactions with the carrier/s, and if so, what were the steps
> that needed to be taken to prevent this from happening in the future?
>
> Thank you.
>
> *William C. Szwagiel*
>
> NC Cardinal Project Manager
>
> State Library of North Carolina
>
> william.szwag...@ncdcr.gov | 919.814.6721
>
> https://statelibrary.ncdcr.gov/services-libraries/nc-cardinal
>
> 109 East Jones Street  | 4640 Mail Service Center
>
> Raleigh, North Carolina 27699-4600
>
> The State Library is part of the NC Department of Natural & Cultural
> Resources.
>
> *Email correspondence to and from this address is subject to the North
> Carolina Public Records Law and may be disclosed to third parties.*
>
>
>
> --
>
> Email correspondence to and from this address may be subject to the North
> Carolina Public Records Law and may be disclosed to third parties by an
> authorized state official.
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Issues with Evergreen wiki site?

2023-10-24 Thread JonGeorg SageLibrary via Evergreen-general
I was looking for the database schema and am getting 404 errors for the
links to it. This includes the link on the wiki:
https://wiki.evergreen-ils.org/doku.php?id=dev:database_schemas

-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-03 Thread JonGeorg SageLibrary via Evergreen-general
The DorkBot queries I'm referring to look like this:
[02/Dec/2021:12:08:13 -0800] "GET
/eg/opac/results?do_basket_action=Go=1_record_view=1=Search_highlight=1=metabib_basket_action=1=keyword%27%22%3Amat_format=1=176=1
HTTP/1.0" 200 62417 "-" "UT-Dorkbot/1.0"

They vary after metabib, but all are using the basket feature. They come
from different library branch URLs.
-Jon

On Fri, Dec 3, 2021 at 10:45 AM JonGeorg SageLibrary <
jongeorg.sagelibr...@gmail.com> wrote:

> Yeah, I'm not seeing any /opac/extras/unapi requests in the Apache logs.
> Is DorkBot used legitimately for querying the opac?
> -Jon
>
> On Fri, Dec 3, 2021 at 10:37 AM JonGeorg SageLibrary <
> jongeorg.sagelibr...@gmail.com> wrote:
>
>> Thank you!
>> -Jon
>>
>> On Fri, Dec 3, 2021 at 8:10 AM Blake Henderson via Evergreen-general <
>> evergreen-general@list.evergreen-ils.org> wrote:
>>
>>> JonGeorg,
>>>
>>> This reminds me of a similar issue that we had. We resolved it with
>>> this change to NGINX. Here's the link:
>>>
>>>
>>> https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/blake/LP1913610_nginx_request_limits
>>>
>>> and the bug:
>>> https://bugs.launchpad.net/evergreen/+bug/1913610
>>>
>>> I'm not sure that it's the same issue though, as you've shared a search
>>> SQL query and this solution addresses external requests to
>>> "/opac/extras/unapi"
>>> But you might be able to apply the same nginx rate limiting technique
>>> here if you can detect the URL they are using.
>>>
>>> There is a tool called "apachetop" which I used in order to see the
>>> URL's that were being used.
>>>
>>> apt-get -y install apachetop && apachetop -f
>>> /var/log/apache2/other_vhosts_access.log
>>>
>>> and another useful command:
>>>
>>> cat /var/log/apache2/other_vhosts_access.log | awk '{print $2}' | sort |
>>> uniq -c | sort -rn
>>>
>>> You have to ignore (not limit) all the requests to the Evergreen gateway
>>> as most of that traffic is the staff client and should (probably) not be
>>> limited.
>>>
>>> I'm just throwing some ideas out there for you. Good luck!
>>>
>>> -Blake-
>>> Conducting Magic
>>> Can consume data in any format
>>> MOBIUS
>>>
>>> On 12/2/2021 9:07 PM, JonGeorg SageLibrary via Evergreen-general wrote:
>>>
>>> I tried that and still got the loopback address, after restarting
>>> services. Any other ideas? And the robots.txt file seems to be doing
>>> nothing, which is not much of a surprise. I've reached out to the people
>>> who host our network and have control of everything on the other side of
>>> the firewall.
>>> -Jon
>>>
>>>
>>> On Wed, Dec 1, 2021 at 3:57 AM Jason Stephenson  wrote:
>>>
>>>> JonGeorg,
>>>>
>>>> If you're using nginx as a proxy, that may be the configuration of
>>>> Apache and nginx.
>>>>
>>>> First, make sure that mod_remote_ip is installed and enabled for Apache
>>>> 2.
>>>>
>>>> Then, in eg_vhost.conf, find the 3 lines that begin with
>>>> "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
>>>>
>>>> Next, see what header Apache checks for the remote IP address. In my
>>>> example it is "RemoteIPHeader X-Forwarded-For"
>>>>
>>>> Next, make sure that the following two lines appear in BOTH "location
>>>> /"
>>>> blocks in the nginx configuration:
>>>>
>>>>  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>>>>  proxy_set_header X-Forwarded-Proto $scheme;
>>>>
>>>> After reloading/restarting nginx and Apache, you should start seeing
>>>> remote IP addresses in the Apache logs.
>>>>
>>>> Hope that helps!
>>>> Jason
>>>>
>>>>
>>>> On 12/1/21 12:53 AM, JonGeorg SageLibrary wrote:
>>>> > Because we're behind a firewall, all the addresses display as
>>>> 127.0.0.1.
>>>> > I can talk to the people who administer the firewall though about
>>>> > blocking IP's. Thanks
>>>> > -Jon
>>>> >
>>>> > On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via
>>>> Evergreen-general
>>>> >

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-03 Thread JonGeorg SageLibrary via Evergreen-general
Thank you!
-Jon

On Fri, Dec 3, 2021 at 8:10 AM Blake Henderson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> JonGeorg,
>
> This reminds me of a similar issue that we had. We resolved it with this
> change to NGINX. Here's the link:
>
>
> https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/blake/LP1913610_nginx_request_limits
>
> and the bug:
> https://bugs.launchpad.net/evergreen/+bug/1913610
>
> I'm not sure that it's the same issue though, as you've shared a search
> SQL query and this solution addresses external requests to
> "/opac/extras/unapi"
> But you might be able to apply the same nginx rate limiting technique here
> if you can detect the URL they are using.
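
If a different URL turns out to be the target, a rough sketch of the same
rate-limiting idea with nginx's stock limit_req module (zone name, rate,
burst, and the location shown are arbitrary examples):

# in the http {} block: one shared zone keyed on client IP
limit_req_zone $binary_remote_addr zone=opac_search:10m rate=2r/s;

# in the location handling the OPAC search/basket URLs
location /eg/opac/results {
    limit_req zone=opac_search burst=10 nodelay;
    # ... existing proxy_pass / proxy_set_header directives ...
}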
>
> There is a tool called "apachetop" which I used in order to see the URL's
> that were being used.
>
> apt-get -y install apachetop && apachetop -f
> /var/log/apache2/other_vhosts_access.log
>
> and another useful command:
>
> cat /var/log/apache2/other_vhosts_access.log | awk '{print $2}' | sort |
> uniq -c | sort -rn
>
> You have to ignore (not limit) all the requests to the Evergreen gateway
> as most of that traffic is the staff client and should (probably) not be
> limited.
>
> I'm just throwing some ideas out there for you. Good luck!
>
> -Blake-
> Conducting Magic
> Can consume data in any format
> MOBIUS
>
> On 12/2/2021 9:07 PM, JonGeorg SageLibrary via Evergreen-general wrote:
>
> I tried that and still got the loopback address, after restarting
> services. Any other ideas? And the robots.txt file seems to be doing
> nothing, which is not much of a surprise. I've reached out to the people
> who host our network and have control of everything on the other side of
> the firewall.
> -Jon
>
>
> On Wed, Dec 1, 2021 at 3:57 AM Jason Stephenson  wrote:
>
>> JonGeorg,
>>
>> If you're using nginx as a proxy, that may be the configuration of
>> Apache and nginx.
>>
>> First, make sure that mod_remote_ip is installed and enabled for Apache 2.
>>
>> Then, in eg_vhost.conf, find the 3 lines that begin with
>> "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
>>
>> Next, see what header Apache checks for the remote IP address. In my
>> example it is "RemoteIPHeader X-Forwarded-For"
>>
>> Next, make sure that the following two lines appear in BOTH "location /"
>> blocks in the nginx configuration:
>>
>>  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>>  proxy_set_header X-Forwarded-Proto $scheme;
>>
>> After reloading/restarting nginx and Apache, you should start seeing
>> remote IP addresses in the Apache logs.
>>
>> Hope that helps!
>> Jason
>>
>>
>> On 12/1/21 12:53 AM, JonGeorg SageLibrary wrote:
>> > Because we're behind a firewall, all the addresses display as
>> 127.0.0.1.
>> > I can talk to the people who administer the firewall though about
>> > blocking IP's. Thanks
>> > -Jon
>> >
>> > On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general
>> > > > <mailto:evergreen-general@list.evergreen-ils.org>> wrote:
>> >
>> > JonGeorg,
>> >
>> > Check your Apache logs for the source IP addresses. If you can't
>> find
>> > them, I can share the correct configuration for Apache with Nginx so
>> > that you will get the addresses logged.
>> >
>> >     Once you know the IP address ranges, block them. If you have a
>> > firewall,
>> > I suggest you block them there. If not, you can block them in Nginx
>> or
>> > in your load balancer configuration if you have one and it allows
>> that.
>> >
>> > You may think you want your catalog to show up in search engines,
>> but
>> > bad bots will lie about who they are. All you can do with
>> misbehaving
>> > bots is to block them.
>> >
>> > HtH,
>> > Jason
>> >
>> > On 11/30/21 9:34 PM, JonGeorg SageLibrary via Evergreen-general
>> wrote:
>> >  > Question. We've been getting hammered by search engine bots [?],
>> but
>> >  > they seem to all query our system at the same time. Enough that
>> it's
>> >  > crashing the app servers. We have a robots.txt file in place.
>> I've
>> >  > increased the crawling delay speed from 3 to 10 seconds, and have
>> >  > explicitly disallowed the specif

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-02 Thread JonGeorg SageLibrary via Evergreen-general
I tried that and still got the loopback address, after restarting services.
Any other ideas? And the robots.txt file seems to be doing nothing, which
is not much of a surprise. I've reached out to the people who host our
network and have control of everything on the other side of the firewall.
-Jon


On Wed, Dec 1, 2021 at 3:57 AM Jason Stephenson  wrote:

> JonGeorg,
>
> If you're using nginx as a proxy, that may be the configuration of
> Apache and nginx.
>
> First, make sure that mod_remote_ip is installed and enabled for Apache 2.
>
> Then, in eg_vhost.conf, find the 3 lines that begin with
> "RemoteIPInternalProxy 127.0.0.1/24" and uncomment them.
>
> Next, see what header Apache checks for the remote IP address. In my
> example it is "RemoteIPHeader X-Forwarded-For"
>
> Next, make sure that the following two lines appear in BOTH "location /"
> blocks in the nginx configuration:
>
>  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>  proxy_set_header X-Forwarded-Proto $scheme;
>
> After reloading/restarting nginx and Apache, you should start seeing
> remote IP addresses in the Apache logs.
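
On a Debian/Ubuntu-style install, the enable-and-restart steps being
referred to are roughly (a sketch; service names may differ):

# enable Apache's remote-IP module, then restart/reload both services
sudo a2enmod remoteip
sudo systemctl restart apache2
sudo systemctl reload nginx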
>
> Hope that helps!
> Jason
>
>
> On 12/1/21 12:53 AM, JonGeorg SageLibrary wrote:
> > Because we're behind a firewall, all the addresses display as 127.0.0.1.
> > I can talk to the people who administer the firewall though about
> > blocking IP's. Thanks
> > -Jon
> >
> > On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general
> >  > <mailto:evergreen-general@list.evergreen-ils.org>> wrote:
> >
> > JonGeorg,
> >
> > Check your Apache logs for the source IP addresses. If you can't find
> > them, I can share the correct configuration for Apache with Nginx so
> > that you will get the addresses logged.
> >
> > Once you know the IP address ranges, block them. If you have a
> > firewall,
> > I suggest you block them there. If not, you can block them in Nginx
> or
> > in your load balancer configuration if you have one and it allows
> that.
> >
> > You may think you want your catalog to show up in search engines, but
> > bad bots will lie about who they are. All you can do with misbehaving
> > bots is to block them.
> >
> > HtH,
> > Jason
> >
> > On 11/30/21 9:34 PM, JonGeorg SageLibrary via Evergreen-general
> wrote:
> >  > Question. We've been getting hammered by search engine bots [?],
> but
> >  > they seem to all query our system at the same time. Enough that
> it's
> >  > crashing the app servers. We have a robots.txt file in place. I've
> >  > increased the crawling delay speed from 3 to 10 seconds, and have
> >  > explicitly disallowed the specific bots, but I've seen no change
> > from
> >  > the worst offenders - Bingbot and UT-Dorkbot. We had over 4k hits
> > from
> >  > Dorkbot alone from 2pm-5pm today, and over 5k from Bingbot in the
> > same
> >  > timeframe. All a couple hours after I made the changes to the
> robots
> >  > file and restarted apache services. Which out of 100k entries in
> the
> >  > vhosts files in that time frame doesn't sound like a lot, but the
> > rest
> >  > of the traffic looks normal. This issue has been happening
> >  > intermittently [last 3 are 11/30, 11/3, 7/20] for a while, and
> > the only
> >  > thing that seems to work is to manually kill the services on the
> DB
> >  > servers and restart services on the application servers.
> >  >
> >  > The symptom is an immediate spike in the Database CPU load. I
> start
> >  > killing all queries older than 2 minutes, but it still usually
> >  > overwhelms the system causing the app servers to stop serving
> > requests.
> >  > The stuck queries are almost always ones along the lines of:
> >  >
> >  > -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
> >  > from_metarecord(*/BIB_RECORD#/*) core_limit(10)
> >  > badge_orgs(1,138,151) estimation_strategy(inclusion) skip_check(0)
> >  > check_limit(1000) sort(1) filter_group_entry(1) 1
> >  > site(*/LIBRARY_BRANCH/*) depth(2)
> >  >  +
> >  >   |   | WITH w AS (
> >  >  |   | WITH */STRING/*_keyword_xq AS (SELECT
> >  >

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-12-01 Thread JonGeorg SageLibrary via Evergreen-general
The LONG STRING sometimes contains a word, but it's usually just a string
of numbers repeated, like this: $_78110$[$_78110$, $_78110$$_78110$),
$_78110$]$_78110$, $_78110$$_78110$. The numbers change which is why I
suspect it's a SQL injection attempt.

I agree re blocking by IP's. I didn't set the robots file crawl time any
higher as I wanted to see what, if any, effect the initial change had
during an attack.
-Jon

On Wed, Dec 1, 2021 at 11:27 AM Jeff Davis via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> Our robots.txt file (https://catalogue.libraries.coop/robots.txt)
> throttles Googlebot and Bingbot to 60 seconds and disallows certain
> other crawlers entirely.  So even 10 seconds seems generous to me.
>
> Of course, robots.txt will only be respected by well-behaved crawlers;
> there's nothing preventing a bot from ignoring it (in which case, as
> Jason says, your best bet may be to block the offending IP).
>
> Is the "LONG_STRING" in your examples a legitimate search -- i.e, no
> unusual characters or obvious SQL injection attempts?  Does it contain
> complex nesting of search terms?
>
> Jeff
>
>
> On 2021-11-30 6:34 p.m., JonGeorg SageLibrary via Evergreen-general wrote:
> > Question. We've been getting hammered by search engine bots [?], but
> > they seem to all query our system at the same time. Enough that it's
> > crashing the app servers. We have a robots.txt file in place. I've
> > increased the crawling delay speed from 3 to 10 seconds, and have
> > explicitly disallowed the specific bots, but I've seen no change from
> > the worst offenders - Bingbot and UT-Dorkbot. We had over 4k hits from
> > Dorkbot alone from 2pm-5pm today, and over 5k from Bingbot in the same
> > timeframe. All a couple hours after I made the changes to the robots
> > file and restarted apache services. Which out of 100k entries in the
> > vhosts files in that time frame doesn't sound like a lot, but the rest
> > of the traffic looks normal. This issue has been happening
> > intermittently [last 3 are 11/30, 11/3, 7/20] for a while, and the only
> > thing that seems to work is to manually kill the services on the DB
> > servers and restart services on the application servers.
> >
> > The symptom is an immediate spike in the Database CPU load. I start
> > killing all queries older than 2 minutes, but it still usually
> > overwhelms the system causing the app servers to stop serving requests.
> > The stuck queries are almost always ones along the lines of:
> >
> > -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
> > from_metarecord(*/BIB_RECORD#/*) core_limit(10)
> > badge_orgs(1,138,151) estimation_strategy(inclusion) skip_check(0)
> > check_limit(1000) sort(1) filter_group_entry(1) 1
> > site(*/LIBRARY_BRANCH/*) depth(2)
> >  +
> >   |   | WITH w AS (
> >  |   | WITH */STRING/*_keyword_xq AS (SELECT
> >  +
> >   |   |   (to_tsquery('english_nostop',
> > COALESCE(NULLIF( '(' ||
> >
> btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
>
> > */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), '')) ||
> > to_tsquery('simple', COALESCE(NULLIF( '(' ||
> >
> btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
>
> > */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), ''))) AS
> > tsq,+
> >   |   |   (to_tsquery('english_nostop',
> > COALESCE(NULLIF( '(' ||
> > btrim(regexp_replace(split_date_range(search_normalize
> >   00:02:17.319491 | */STRING/* |
> >
> > And the queries by DorkBot look like they could be starting the query
> > since it's using the basket function in the OPAC.
> >
> > "GET
> >
> /eg/opac/results?do_basket_action=Go=1_record_view=*/LONG_STRING/*=Search_highlight=1=metabib_basket_action=1=keyword%3Amat_format=1=112=1
>
> > HTTP/1.0" 500 16796 "-" "UT-Dorkbot/1.0"
> >
> > I've anonymized the output just to be cautious. Reports are run off the
> > backup database server, so it cannot be an auto generated report, and it
> > doesn't happen often enough for that either. At this point I'm tempted
> > to block the IP addresses. What strategies are you all using to deal
> > with crawlers, and does anyone have an idea what is causing this?
> > -Jon
> >
> > ___
> 

Re: [Evergreen-general] Question about search engine bots & DB CPU spikes

2021-11-30 Thread JonGeorg SageLibrary via Evergreen-general
Because we're behind a firewall, all the addresses display as 127.0.0.1. I
can talk to the people who administer the firewall though about blocking
IP's. Thanks
-Jon

On Tue, Nov 30, 2021 at 8:20 PM Jason Stephenson via Evergreen-general <
evergreen-general@list.evergreen-ils.org> wrote:

> JonGeorg,
>
> Check your Apache logs for the source IP addresses. If you can't find
> them, I can share the correct configuration for Apache with Nginx so
> that you will get the addresses logged.
>
> Once you know the IP address ranges, block them. If you have a firewall,
> I suggest you block them there. If not, you can block them in Nginx or
> in your load balancer configuration if you have one and it allows that.
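
If the blocking ends up happening in nginx rather than at the firewall, it
is just a deny list; a sketch with a documentation-range placeholder:

# in the server {} or location {} block serving the OPAC
deny 203.0.113.0/24;   # example offending range (placeholder)
allow all;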
>
> You may think you want your catalog to show up in search engines, but
> bad bots will lie about who they are. All you can do with misbehaving
> bots is to block them.
>
> HtH,
> Jason
>
> On 11/30/21 9:34 PM, JonGeorg SageLibrary via Evergreen-general wrote:
> > Question. We've been getting hammered by search engine bots [?], but
> > they seem to all query our system at the same time. Enough that it's
> > crashing the app servers. We have a robots.txt file in place. I've
> > increased the crawling delay speed from 3 to 10 seconds, and have
> > explicitly disallowed the specific bots, but I've seen no change from
> > the worst offenders - Bingbot and UT-Dorkbot. We had over 4k hits from
> > Dorkbot alone from 2pm-5pm today, and over 5k from Bingbot in the same
> > timeframe. All a couple hours after I made the changes to the robots
> > file and restarted apache services. Which out of 100k entries in the
> > vhosts files in that time frame doesn't sound like a lot, but the rest
> > of the traffic looks normal. This issue has been happening
> > intermittently [last 3 are 11/30, 11/3, 7/20] for a while, and the only
> > thing that seems to work is to manually kill the services on the DB
> > servers and restart services on the application servers.
> >
> > The symptom is an immediate spike in the Database CPU load. I start
> > killing all queries older than 2 minutes, but it still usually
> > overwhelms the system causing the app servers to stop serving requests.
> > The stuck queries are almost always ones along the lines of:
> >
> > -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
> > from_metarecord(*/BIB_RECORD#/*) core_limit(10)
> > badge_orgs(1,138,151) estimation_strategy(inclusion) skip_check(0)
> > check_limit(1000) sort(1) filter_group_entry(1) 1
> > site(*/LIBRARY_BRANCH/*) depth(2)
> >  +
> >   |   | WITH w AS (
> >  |   | WITH */STRING/*_keyword_xq AS (SELECT
> >  +
> >   |   |   (to_tsquery('english_nostop',
> > COALESCE(NULLIF( '(' ||
> >
> btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
>
> > */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), '')) ||
> > to_tsquery('simple', COALESCE(NULLIF( '(' ||
> >
> btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
>
> > */LONG_STRING/*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), ''))) AS
> > tsq,+
> >   |   |   (to_tsquery('english_nostop',
> > COALESCE(NULLIF( '(' ||
> > btrim(regexp_replace(split_date_range(search_normalize
> >   00:02:17.319491 | */STRING/* |
> >
> > And the queries by DorkBot look like they could be starting the query
> > since it's using the basket function in the OPAC.
> >
> > "GET
> >
> /eg/opac/results?do_basket_action=Go=1_record_view=*/LONG_STRING/*=Search_highlight=1=metabib_basket_action=1=keyword%3Amat_format=1=112=1
>
> > HTTP/1.0" 500 16796 "-" "UT-Dorkbot/1.0"
> >
> > I've anonymized the output just to be cautious. Reports are run off the
> > backup database server, so it cannot be an auto generated report, and it
> > doesn't happen often enough for that either. At this point I'm tempted
> > to block the IP addresses. What strategies are you all using to deal
> > with crawlers, and does anyone have an idea what is causing this?
> > -Jon
> >
> > ___
> > Evergreen-general mailing list
> > Evergreen-general@list.evergreen-ils.org
> > http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
> >
> ___
> Evergreen-general mailing list
> Evergreen-general@list.evergreen-ils.org
> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general
>
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general


[Evergreen-general] Question about search engine bots & DB CPU spikes

2021-11-30 Thread JonGeorg SageLibrary via Evergreen-general
Question. We've been getting hammered by search engine bots [?], but they
seem to all query our system at the same time. Enough that it's crashing
the app servers. We have a robots.txt file in place. I've increased the
crawl delay from 3 to 10 seconds, and have explicitly disallowed
the specific bots, but I've seen no change from the worst offenders -
Bingbot and UT-Dorkbot. We had over 4k hits from Dorkbot alone from 2pm-5pm
today, and over 5k from Bingbot in the same timeframe. All a couple hours
after I made the changes to the robots file and restarted apache services.
Which out of 100k entries in the vhosts files in that time frame doesn't
sound like a lot, but the rest of the traffic looks normal. This issue has
been happening intermittently [last 3 are 11/30, 11/3, 7/20] for a while,
and the only thing that seems to work is to manually kill the services on
the DB servers and restart services on the application servers.

The symptom is an immediate spike in the Database CPU load. I start killing
all queries older than 2 minutes, but it still usually overwhelms the
system causing the app servers to stop serving requests. The stuck queries
are almost always ones along the lines of:

-- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords
from_metarecord(*BIB_RECORD#*) core_limit(10) badge_orgs(1,138,151)
estimation_strategy(inclusion) skip_check(0) check_limit(1000) sort(1)
filter_group_entry(1) 1 site(*LIBRARY_BRANCH*) depth(2)
+
 |   | WITH w AS (
|   | WITH *STRING*_keyword_xq AS (SELECT
  +
 |   |   (to_tsquery('english_nostop',
COALESCE(NULLIF( '(' ||
btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
*LONG_STRING*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), '')) ||
to_tsquery('simple', COALESCE(NULLIF( '(' ||
btrim(regexp_replace(split_date_range(search_normalize(replace(replace(uppercase(translate_isbn1013(E'1')),
*LONG_STRING*))),E'(?:\\s+|:)','&','g'),'&|')  || ')', '()'), ''))) AS tsq,+
 |   |   (to_tsquery('english_nostop',
COALESCE(NULLIF( '(' ||
btrim(regexp_replace(split_date_range(search_normalize
 00:02:17.319491 | *STRING* |
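
For the manual cleanup step of killing queries older than two minutes, a
sketch of the sort of statement used from psql, assuming sufficient
database privileges:

-- terminate active backends whose current query has run for over 2 minutes
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'active'
   AND now() - query_start > interval '2 minutes'
   AND pid <> pg_backend_pid();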

And the queries by DorkBot look like they could be what is kicking off these
searches, since they use the basket function in the OPAC.

"GET /eg/opac/results?do_basket_action=Go=1_record_view=
*LONG_STRING*=Search_highlight=1=metabib_basket_action=1=keyword%3Amat_format=1=112=1
HTTP/1.0" 500 16796 "-" "UT-Dorkbot/1.0"

I've anonymized the output just to be cautious. Reports are run off the
backup database server, so it cannot be an auto generated report, and it
doesn't happen often enough for that either. At this point I'm tempted to
block the IP addresses. What strategies are you all using to deal with
crawlers, and does anyone have an idea what is causing this?
-Jon
___
Evergreen-general mailing list
Evergreen-general@list.evergreen-ils.org
http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-general