Re: [Archivesspace_Users_Group] External Solr - Memory Allocation?

2023-01-30 Thread Joshua D. Shaw
Thanks, Andrew & Blake, et al!

One of my thoughts was to run the index against an earlier 8.x version of Solr 
to see if there are any significant changes in the indexer time.

Failing that, I think my only option is to increase the timeout to about 2hrs.

@Brian - I don't know if this is related to the DST issue since the indexer 
does not​ restart the index round for me - like it typically does in those DST 
circumstances.

To elaborate on a couple of specific thoughts Blake brought up.


  1.  We only have about 31.5K records in the deleted table, but as Andrew 
pointed out, I'm pretty sure those aren't really relevant here.
  2.  I may be reading the configs wrong, but it looks like 'debug' is the 
default level 
(https://github.com/archivesspace/archivesspace/blob/v3.3.1/common/config/config-defaults.rb#L53)
  3.  We do have some very 'wide' collections with many series and many files 
under each series. Though these collections tend not to be very 'deep' - ie 
only 3-4 levels.

I'd be interested to hear if other institutions with similar collections also 
experience the same timeout issues. Thanks, Peter, for chiming in.

Will update if/when I find out more.

Joshua


From: archivesspace_users_group-boun...@lyralists.lyrasis.org 
 on behalf of Andrew 
Morrison 
Sent: Monday, January 30, 2023 4:46 AM
To: archivesspace_users_group@lyralists.lyrasis.org 

Subject: Re: [Archivesspace_Users_Group] External Solr - Memory Allocation?


Truncating the "deleted_records" table will prevent the OAI-PMH service from 
being able to send out deletion notifications. Maybe it is worth trying on a 
testing system, but probably not a good idea on a production system.


Also note that is this...


> Deleted 186992 documents


... is not logging the deletion of deleted records. It is logging the deletion 
of Solr documents for unpublished records. But few, if any, exist to be 
deleted. ArchivesSpace sent 186992 IDs to Solr to delete, just in case any of 
them were unpublished immediately before this index run. Solr returned a 200 OK 
response, even if none were found, so ArchivesSpace reports them as all deleted.


The other logged deletions, before that, are likewise not "real" 99.9% of the 
time. These batches...


> Deleted 100 documents


...are when ArchivesSpace tells Solr to delete the tree nodes for all archival 
objects without children, just in case any of them had children before this 
index run. Only the PUIIndexer does this, which is part of why it is slower (it 
is also allocated fewer threads in default config.) Again, 99.99% of the time 
there's nothing for Solr to delete, but it has to search its indexes for them 
anyway.


And these...


> Deleted 25 documents


...are the deletion of URIs in the deleted_records table. But most of those 
were deleted long ago by previous index runs. Again, it is just in case any new 
ones were recently deleted (and even those were probably deleted by the 
RealtimeIndexer.)


This "belt and braces" approach prevents a few stray records remaining in the 
PUI when they've been deleted or unpublished, but it seems to be the cause of 
the longest wait times for commits when re-indexing large repositories. Maybe 
something has changed in newer versions of Solr to make this process slower, 
possibly specifically for deletions?


Andrew.




On 27/01/2023 14:01, Blake Carver wrote:
> I'm running default config values for the AS log levels so they are all set 
> to 'debug'.

I'm seeing "INFO" and not "DEBUG" there.

> Deleted 186992 documents

How much is in the deleted_records table? Try truncating that. ArchivesSpace is 
going and deleting anything in that table.

> So I'm falling back to this just being super​ slow for some reason.

Could be some complex records, could be there's way too much in the deleted 
table.

> but I'm not sure why the PUI indexer would become so much slower (21 hours)

Yep, sounds about right. The PUI is slow.

> We do have some collections that are quite large (10s of thousands of AOs), 
> so maybe that's part of the issue.

No doubt that's slowing it down too.



From: 
archivesspace_users_group-boun...@lyralists.lyrasis.org
 

 on behalf of Joshua D. Shaw 

Sent: Thursday, January 26, 2023 6:02 PM
To: Archivesspace Users Group 

Subject: Re: [Archivesspace_Users_Group] External Solr - Memory Allocation?

Thanks, Blake!

I'm running default config values for the AS log levels so they are all set to 
'debug'. I took a closer look, and the timeout message happens exactly after 
the timeout amount I set (as you'd expect). Interestingly, Solr is in the 
middle of deleting documents when it goes silent

I, [2023-01-26T09:18:40.357101 #78764]  INFO -- : Thread-3384: Deleted 

Re: [Archivesspace_Users_Group] External Solr - Memory Allocation?

2023-01-30 Thread Andrew Morrison
Truncating the "deleted_records" table will prevent the OAI-PMH service 
from being able to send out deletion notifications. Maybe it is worth 
trying on a testing system, but probably not a good idea on a production 
system.



Also note that is this...


> Deleted 186992 documents


... is not logging the deletion of deleted records. It is logging the 
deletion of Solr documents for unpublished records. But few, if any, 
exist to be deleted. ArchivesSpace sent 186992 IDs to Solr to delete, 
just in case any of them were unpublished immediately before this index 
run. Solr returned a 200 OK response, even if none were found, so 
ArchivesSpace reports them as all deleted.



The other logged deletions, before that, are likewise not "real" 99.9% 
of the time. These batches...



> Deleted 100 documents


...are when ArchivesSpace tells Solr to delete the tree nodes for all 
archival objects without children, just in case any of them had children 
before this index run. Only the PUIIndexer does this, which is part of 
why it is slower (it is also allocated fewer threads in default config.) 
Again, 99.99% of the time there's nothing for Solr to delete, but it has 
to search its indexes for them anyway.



And these...


> Deleted 25 documents


...are the deletion of URIs in the deleted_records table. But most of 
those were deleted long ago by previous index runs. Again, it is just in 
case any new ones were recently deleted (and even those were probably 
deleted by the RealtimeIndexer.)



This "belt and braces" approach prevents a few stray records remaining 
in the PUI when they've been deleted or unpublished, but it seems to be 
the cause of the longest wait times for commits when re-indexing large 
repositories. Maybe something has changed in newer versions of Solr to 
make this process slower, possibly specifically for deletions?



Andrew.




On 27/01/2023 14:01, Blake Carver wrote:
> I'm running default config values for the AS log levels so they are 
all set to 'debug'.


I'm seeing "INFO" and not "DEBUG" there.

> Deleted 186992 documents

How much is in the deleted_records table? Try truncating that. 
ArchivesSpace is going and deleting anything in that table.


> So I'm falling back to this just being super​ slow for some reason.

Could be some complex records, could be there's way too much in the 
deleted table.


> but I'm not sure why the PUI indexer would become so much slower (21 
hours)


Yep, sounds about right. The PUI is slow.

> We do have some collections that are quite large (10s of thousands 
of AOs), so maybe that's part of the issue.


No doubt that's slowing it down too.



*From:* archivesspace_users_group-boun...@lyralists.lyrasis.org 
 on behalf of 
Joshua D. Shaw 

*Sent:* Thursday, January 26, 2023 6:02 PM
*To:* Archivesspace Users Group 

*Subject:* Re: [Archivesspace_Users_Group] External Solr - Memory 
Allocation?

Thanks, Blake!

I'm running default config values for the AS log levels so they are 
all set to 'debug'. I took a closer look, and the timeout message 
happens exactly after the timeout amount I set (as you'd expect). 
Interestingly, Solr is in the middle of deleting documents when it 
goes silent


I, [2023-01-26T09:18:40.357101 #78764]  INFO -- : Thread-3384: Deleted 
100 documents: #


 40 minutes pass with all the other AS log chatter ...

E, [2023-01-26T09:58:40.400971 #78764] ERROR -- : Thread-3384: 
SolrIndexerError when deleting records: Timeout error with  POST {}
I, [2023-01-26T09:58:40.410522 #78764]  INFO -- : Thread-3384: Deleted 
100 documents: #


This continuing delete phase goes on for a bit until it stops logging 
batch deletes.


I, [2023-01-26T09:59:11.734200 #78764]  INFO -- : Thread-3384: Deleted 
9 documents: #


 40 minutes pass with all the other AS log chatter ... And then 
the commit error pops up


E, [2023-01-26T10:39:11.746166 #78764] ERROR -- : Thread-3384: 
SolrIndexerError when committing:

Timeout error with  POST {"commit":{"softCommit":false}}.

Then after some more time

I, [2023-01-26T11:06:35.678926 #78764]  INFO -- : Thread-3384: Deleted 
186992 documents: #


 This all seems to indicate to me that the commit phase is taking 
an inordinate amount of time (almost 2 hours - maybe that's what I 
need to set the timeout to?). After that, the indexer starts the 2nd repo


I, [2023-01-26T11:06:35.765797 #78764]  INFO -- : Thread-3384: PUI 
Indexer [2023-01-26 11:06:35 -0500] Indexed 2 additional PUI records 
in repository Sherman


 The indexer waits for a looong time with no timeout and no 
messaging - even though this is a tiny repo - and then starts the 3rd repo


I, [2023-01-26T11:31:32.795602 #78764]  INFO -- : Thread-3384: PUI 
Indexer [2023-01-26 11:31:32 -0500] Indexed 188 additional PUI records 
in repository Rauner-XO


And then the indexer starts the 4th repo soon after and seems to go on 
to complete normally


I, [2023-01-26