Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-22 Thread Blake Carver
I did some experimenting this weekend, messing around with indexer speeds, and 
found I could get it to succeed with the right indexer settings. I think the 
answer is going to be "it depends," and you'll need to experiment with what 
works on your setup with your data. I started with the defaults, then dropped 
it to really slow (1 thread, 1 record per thread), then just tried to dial it 
up and down. The last one I tried worked fine: it was fast enough to finish in 
a reasonable amount of time and didn't slow down or crash. Your settings may 
not look like this, but here's something to try.

AppConfig[:pui_indexer_records_per_thread] = 50
AppConfig[:pui_indexer_thread_count] = 1


So some extra detail for the mailing list archives... if your site keeps 
crashing before the indexers finish and you're not seeing any particular errors 
in the logs that make you think you have a problem with your data, try turning 
the knobs on your indexer speed and see if that helps.

It looks like maybe the indexer just eats up too much memory on BIG records, 
and having too many threads running (too many being 15ish) causes it to crash. 
I know BIG is pretty subjective. If you have a bunch of resources (maybe a few 
thousand) AND those resources all have ALLOTA (maybe a few thousand) children 
with ALLOTA subjects/agents/notes/stuff, then you might hit this problem. It 
seems the cause is not the total number of resources but that those resources 
are big/complex/deep.
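
A rough way to compare settings is to multiply the thread count by the records per thread. The sketch below does only that. It assumes each indexer thread holds one batch of records at a time, which is a simplification and not something confirmed from the indexer code; `records_in_flight` is a hypothetical helper, and the 25 records per thread used for the "15ish threads" case is an assumed value.

```ruby
# Back-of-the-envelope comparison of indexer settings (illustrative only).
# Assumption: each indexer thread works on one batch at a time, so roughly
# thread_count * records_per_thread records are in memory at once.
def records_in_flight(thread_count, records_per_thread)
  thread_count * records_per_thread
end

many_threads = records_in_flight(15, 25) # "15ish" threads; 25 per thread is assumed
one_thread   = records_in_flight(1, 50)  # the settings that worked above

puts "15 threads x 25 records: about #{many_threads} records at once"
puts " 1 thread  x 50 records: about #{one_thread} records at once"
```

With very large records, it is the per-record size that dominates, so treat the product only as a relative measure between two candidate settings.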



Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-22 Thread Tom Hanstra

Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-22 Thread Custer, Mark
Tom,

Not sure if it will help (and not sure if you shared your config.rb file in a 
previous message), but have you tried:

  *   Turning off the Solr backups during the re-indexing 
(https://github.com/archivesspace/archivesspace/blob/d207e8a7bb01c2b7b6f42ee5c0025d95f35ee7ae/common/config/config-defaults.rb#L76)?
  This goes back to Dave's suggestion about keeping an eye on disk space.
  *   Updating the record inheritance settings and removing the bit about 
inheriting scope and contents notes, which really bloats the index since most 
finding aids won't have lower-level descriptive notes 
(https://github.com/archivesspace/archivesspace/blob/d207e8a7bb01c2b7b6f42ee5c0025d95f35ee7ae/common/config/config-defaults.rb#L411-L413)?
  In our record inheritance settings 
(https://github.com/YaleArchivesSpace/aspace-deployment/blob/master/prod/config.rb#L204-L244),
  we only inherit two notes currently: access notes and preferred citation 
notes.
  *   Turning off the PUI indexer until the staff indexing is done, and then 
turning the PUI indexer back on?
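
The last suggestion in the list above could look roughly like this in config.rb. This is a sketch, assuming `AppConfig[:pui_indexer_enabled]` is the switch in your version (it appears in config-defaults.rb for the 2.x releases); please check the defaults file that ships with your install. The Solr backup behaviour is governed by the `solr_backup_*` settings near the line linked above and is not shown here.

```ruby
# config/config.rb: staged re-index (a sketch; verify the setting name
# against the config-defaults.rb that ships with your ArchivesSpace version)

# Pass 1: leave the PUI indexer off so the staff indexer can finish alone.
AppConfig[:pui_indexer_enabled] = false

# Pass 2: once the staff index is complete, switch the line above to true
# and restart ArchivesSpace so the change takes effect.
# AppConfig[:pui_indexer_enabled] = true
```

Each change to this setting needs a restart, so it is a two-step process rather than something that can be toggled while the indexer is running.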

When testing re-indexing locally, I usually bump up the two values that Blake 
listed, at least until I start getting Java heap space errors and don't have 
any more RAM to allot to ASpace. But even just waiting for the archival 
objects can take a while, depending on the settings in your config.rb file. 
Once, while waiting for a full re-index on a server that I didn't have access 
to, I'd periodically look at the last archival object ID that was indexed and 
then run a database query to see how many more archival objects were left in 
that repo, since a full re-index seems to go in order of the primary keys, 
e.g.:

select count(*) from archival_object
where id > {archival_object id}
and repo_id = {repo id};

That would at least give me a sense of how much longer it might take to get 
through all the archival objects left in one of our repos.
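
Turning that count into a time estimate is simple division. The following is a minimal sketch; `hours_remaining` is a hypothetical helper, the remaining count is made up, and the two rates only echo the fast and slow ends of the throughput Tom reported in this thread.

```ruby
# Rough ETA for the archival-object pass (illustrative sketch).
# remaining:        result of the count(*) query above
# records_per_hour: throughput recently observed in the indexer log
def hours_remaining(remaining, records_per_hour)
  raise ArgumentError, "records_per_hour must be positive" unless records_per_hour.positive?

  remaining.to_f / records_per_hour
end

remaining = 90_000 # made-up figure for illustration
[50_000, 1_800].each do |rate|
  printf("at %6d records/hr: about %.1f hours left\n", rate, hours_remaining(remaining, rate))
end
```

Because the rate drops as indexing proceeds, an estimate based on the most recent hour is more useful than one based on the average since startup.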

As for the slowdown, after all the archival objects are indexed in a 
repository, the next thing that happens (although it can take quite a while) 
is that all the tree indexes are created and finally committed to Solr. 
See 
https://github.com/archivesspace/archivesspace/blob/82c4603fe22bf0fd06043974478d4caf26e1c646/indexer/app/lib/pui_indexer.rb#L136.
If I recall correctly, there won't be any specific mentions in the logs about 
that, but after you get a message about all of the archival objects being 
indexed in a specific repository, you'll get another message about the archival 
objects being indexed again sometime later, at which point the full trees have 
been reindexed, and then the indexer will be off to the next repo (or 
record type, like classification records, etc.).


Mark




Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-22 Thread Tom Hanstra
Thanks, Blake.

In your testing, how big was the repository that you were testing against?
Mine has "763368 archival_object records" and I consistently get into the
670K range for staff and 575K range for PUI before things really slow down.
I'm now trying to really increase the Java settings to see if that will
help. So far, the problem is similar: real slowdowns after zipping through
the first records. I'll also try some of the settings you have there to see
if fewer but larger threads work better than multiple smaller threads.

Tom


Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-18 Thread Tom Hanstra
Dave,

Thanks for the suggestion, but unless there is some direct limitation
within Solr, that should not be an issue. My disk is at only about 50% of
capacity and Solr should be able to expand as needed. In my case, I don't
think there has been much addition to Solr because I'm reindexing records
which have been indexed already. So the deleted records are growing, but
not the overall number of records. My index is currently at about 6GB.

Any other thoughts out there?

Thanks,
Tom



-- 
*Tom Hanstra*
*Sr. Systems Administrator*
hans...@nd.edu
___
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-18 Thread Mayo, Dave
This is a little bit of a shot in the dark, but have you looked at disk space 
on whatever host Solr is resident on (the ASpace server, if you’re not running 
an external one)?

A thing we’ve hit a couple times is that Solr, at least in some configurations, 
needs substantial headroom on disk to perform well – I think it’s related to 
how it builds and maintains the index?  So it might be worth looking to see if 
Solr is filling up the disk enough that it can’t efficiently handle itself.

--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS



Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-17 Thread Tom Hanstra
> - What really bothers me is the slowdown. That indicates to me that some
> resource is being lost along the way. Anyone have thoughts on what that
> might be?

Just to follow up on my earlier post, I did get even lower numbers from
Blake to try based upon what he used for our hosted account. But I'm seeing
the same pattern in terms of slowdowns regarding the number of records that
get processed/hour. Is this typical?  Is it just hitting records that have
more work to be done? Or do I still have a resource issue?

I note that the number of docs in Solr has not changed at all throughout
the last couple of attempts, which again leads me to believe it has already
handled these records (at least once) before and thus there is no more
indexing to really be done with the records which it is running through
the PUI indexer again. Which leads back to the "why does PUI indexing
restart each time from 0" question. How does one add an enhancement request
to have this reviewed and (perhaps) changed?

Thanks,
Tom

--
*Tom Hanstra*
*Sr. Systems Administrator*
hans...@nd.edu


Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-16 Thread Blake Carver
I can only answer some of those.

> - Staff indexing is done and has its files written. So does the number of 
> threads given to that make a difference? Is it still taking up resources?

Not so much if it's not doing anything.

> - Does there happen to be any way to stop the staff indexing and just let PUI 
> have full access to the server for indexing?

You can disable either indexer, but that requires a restart. There's a setting 
in the config. The PUI indexer is just slower than the staff indexer.

> - Should our repositories be broken up into smaller groupings?  I'm beginning 
> to wonder if we have things set up incorrectly, since it sounds like we have a 
> very large data set compared to others.

It's probably not the total number of resources in a repo, just that the 
resources are quite large.



From: archivesspace_users_group-boun...@lyralists.lyrasis.org on behalf of 
Tom Hanstra
Sent: Tuesday, March 16, 2021 1:52 PM
To: Archivesspace Users Group 
Subject: Re: [Archivesspace_Users_Group] PUI indexing issues

Thanks for the suggestion, Blake. A couple additional questions:

- Staff indexing is done and has its files written. So does the number of 
threads given to that make a difference? Is it still taking up resources?

- Does there happen to be any way to stop the staff indexing and just let PUI 
have full access to the server for indexing?

- What really bothers me is the slowdown. That indicates to me that some 
resource is being lost along the way. Anyone have thoughts on what that might 
be?

- Should our repositories be broken up into smaller groupings?  I'm beginning 
to wonder if we have things set up incorrectly, since it sounds like we have a 
very large data set compared to others.


And a comment

It is really frustrating to have to start over on the indexing each time. It 
seems that there should be some way to document progress along the way so that 
the indexing can pick up where it left off. Is that something that might also 
be looked at?

Thanks all. Appreciate your help.

Tom



Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-16 Thread Blake Carver
> I've now left my PUI indexing threads and count at the default (which I 
> believe is 1 thread and 25 records/thread).

Try lowering both indexer_records_per_thread and indexer_thread_count for both 
the PUI and staff indexers, maybe by half or so. Sometimes with larger records 
it just needs to be slowed down.
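
As a concrete starting point, halving might look like the fragment below. It is a sketch that assumes the shipped defaults are 25 records per thread for both indexers, 4 threads for the staff indexer, and 1 thread for the PUI indexer; those defaults, and the `pui_`-prefixed names, should be checked against the config-defaults.rb for your version.

```ruby
# config/config.rb: a "slow it down" starting point (sketch, not a recommendation)
# Assumed defaults being halved: 25 records per thread for both indexers,
# 4 staff threads and 1 PUI thread.

# Staff indexer
AppConfig[:indexer_records_per_thread] = 12
AppConfig[:indexer_thread_count] = 2

# PUI indexer (already at 1 thread, so only the batch size shrinks)
AppConfig[:pui_indexer_records_per_thread] = 12
AppConfig[:pui_indexer_thread_count] = 1
```

ArchivesSpace needs a restart to pick up the new values, so it helps to change one pair at a time and note the records/hr after each change.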

From: archivesspace_users_group-boun...@lyralists.lyrasis.org on behalf of 
Tom Hanstra
Sent: Tuesday, March 16, 2021 12:51 PM
To: Archivesspace Users Group 
Subject: [Archivesspace_Users_Group] PUI indexing issues

Hello again.

I'm still trying to understand some indexing issues. I've now left my PUI 
indexing threads and count at the default (which I believe is 1 thread and 25 
records/thread). And I have given 4GB to Java processes. I've tried other 
values as well, but with similar results.

No matter what values I use, I cannot seem to fully index PUI. Each time, it 
starts well but steadily slows down. I've kept a spreadsheet of the number of 
records/hr I'm indexing, and several attempts have started in the 50-60K/hr 
range and then steadily slowed to the 1,500-1,800/hr range before finally dying 
with a Java Heap error. I think I'm headed there again this round.

Why might this be happening?  Could my data have been corrupted during the 
transfer from Lyrasis? (I'm working with a database export of our production 
data.) Is the database too far away? (Our database is in AWS RDS and is 
accessed from our AWS EC2 instance.)

I do have one log which gave this error:

E, [2021-03-12T18:14:53.886243 #2919] ERROR -- : Thread-9472: Failed fetching 
archival_object id=1484623: too many connection resets (due to Net::ReadTimeout 
- Net::ReadTimeout) after 0 requests on 3150, last used 1615590893.870297 
seconds ago

prior to the Java Heap error. In that log, there were a number of connections 
for the staff indexer after the PUI indexer stopped reporting, then an 88 
minute gap prior to the above connection error and then finally a Java Heap 
error in the archivesspace.out log.

Does the indexer reauthenticate each time it connects to get more information?  
The earlier question about authentication has me wondering if my database 
server might be balking at the number of reconnections or something. I'm trying 
to index 760K records.

Bottom line is that I'm still not getting my PUI index creation to complete. 
Each run can take several days before it finally fails and I have to start all 
over again.  I'm looking for any help to track down why this slowdown is 
occurring and what I can do to address it.

Thanks,
Tom
--
Tom Hanstra
Sr. Systems Administrator
hans...@nd.edu

___
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


Re: [Archivesspace_Users_Group] PUI indexing issues

2021-03-16 Thread Tom Hanstra
Thanks for the suggestion, Blake. A couple additional questions:

- Staff indexing is done and has its files written. So does the number of
threads given to that make a difference? Is it still taking up resources?

- Does there happen to be any way to stop the staff indexing and just let
PUI have full access to the server for indexing?

- What really bothers me is the slowdown. That indicates to me that some
resource is being lost along the way. Anyone have thoughts on what that
might be?

- Should our repositories be broken up into smaller groupings?  I'm
beginning to wonder if we have things set up incorrectly, since it sounds
like we have a very large data set compared to others.


And a comment:

It is really frustrating to have to start over on the indexing each time.
It seems that there should be some way to document progress along the way
so that the indexing can pick up where it left off. Is that something that
might also be looked at?

Thanks all. Appreciate your help.

Tom


On Tue, Mar 16, 2021 at 1:15 PM Blake Carver wrote:

> > I've now left my PUI indexing threads and count at the default (which I
> > believe is 1 thread and 25 records/thread).
>
> Try lowering both indexer_records_per_thread and indexer_thread_count for
> both the PUI and Staff indexers, maybe by half or so. Sometimes, with larger
> records, the indexer just needs to be slowed down.


-- 
*Tom Hanstra*
*Sr. Systems Administrator*
hans...@nd.edu


[Archivesspace_Users_Group] PUI indexing issues

2021-03-16 Thread Tom Hanstra
Hello again.

I'm still trying to understand some indexing issues. I've now left my PUI
indexing threads and count at the default (which I believe is 1 thread and
25 records/thread). And I have given 4GB to Java processes. I've tried
other values as well, but with similar results.

No matter what values I use, I cannot seem to fully index PUI. Each time,
it starts well but steadily slows down. I've kept a spreadsheet of the
number of records/hr I'm indexing, and several attempts have started in the
50-60K/hr range and then steadily slowed to the 1,500-1,800/hr range before
finally dying with a Java Heap error. I think I'm headed there again this
round.

Why might this be happening?  Could my data have been corrupted during the
transfer from Lyrasis? (I'm working with a database export of our
production data.) Is the database too far away? (Our database is in AWS RDS
and is accessed from our AWS EC2 instance.)

I do have one log which gave this error:

E, [2021-03-12T18:14:53.886243 #2919] ERROR -- : Thread-9472: Failed
fetching archival_object id=1484623: too many connection resets (due to
Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 3150, last used
1615590893.870297 seconds ago

prior to the Java Heap error. In that log, there were a number of
connections for the staff indexer after the PUI indexer stopped reporting,
then an 88 minute gap prior to the above connection error and then finally
a Java Heap error in the archivesspace.out log.
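
One detail in that log line may be worth a second look. The "last used ... 
seconds ago" figure is about 51 years, which happens to be the Unix time at 
which the error was logged. The sketch below checks that arithmetic. It 
assumes the log clock was at UTC-5 (the log line carries no zone), and the 
reading that the connection had simply never been used, which would fit 
"after 0 requests", is a guess rather than something the log states.

```ruby
require "time"

# Values copied from the logged error.
last_used_secs = 1615590893.870297 # the "last used ... seconds ago" figure

# Interpret the figure as seconds since the Unix epoch.
as_epoch = Time.at(last_used_secs).utc
puts as_epoch.strftime("%Y-%m-%dT%H:%M:%S UTC") # 2021-03-12T23:14:53 UTC

# Assumption: the log timestamp 2021-03-12T18:14:53.886243 is at UTC-5.
logged_at = Time.iso8601("2021-03-12T18:14:53.886243-05:00")

# Distance between the log timestamp and the "last used" figure, in seconds.
gap = (logged_at - as_epoch).abs
puts gap
```

Under that offset assumption the two values differ by a small fraction of a 
second, so the figure looks like "now minus zero" rather than a real idle 
time.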

Does the indexer reauthenticate each time it connects to get more
information?  The earlier question about authentication has me wondering if
my database server might be balking at the number of reconnections or
something. I'm trying to index 760K records.

Bottom line is that I'm still not getting my PUI index creation to
complete. Each run can take several days before it finally fails and I have
to start all over again.  I'm looking for any help to track down why this
slowdown is occurring and what I can do to address it.
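
For scale, here is a back-of-the-envelope sketch of what those rates mean for 
760K records. It assumes a constant rate for the whole run, which is not what 
actually happens, and uses 55,000/hr as a hypothetical midpoint of the 
50-60K/hr starting range.

```ruby
# Hours needed to index a given number of records at a constant rate.
def hours_to_index(total_records, records_per_hour)
  total_records / records_per_hour.to_f
end

TOTAL_RECORDS = 760_000 # record count from the message above

# Rates from the message: 50-60K/hr at the start, 1,500-1,800/hr late in a run.
{ "start, 55,000/hr (midpoint)" => 55_000,
  "late, 1,800/hr" => 1_800,
  "late, 1,500/hr" => 1_500 }.each do |label, rate|
  hours = hours_to_index(TOTAL_RECORDS, rate)
  puts format("%-28s %7.1f hours (%.1f days)", label, hours, hours / 24)
end
```

At the starting rate the full set would take about 14 hours. At the late 
rates it would take roughly 18 to 21 days.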

Thanks,
Tom
-- 
*Tom Hanstra*
*Sr. Systems Administrator*
hans...@nd.edu