Re: [Archivesspace_Users_Group] Top container ranges

2020-06-22 Thread Rees, John (NIH/NLM) [E]
Hi Dawne,

I'm doing pretty much what Kate laid out to edit source EAD, for all sorts of 
different container conventions. You can see my ever-growing punch list at 
https://github.com/John-Rees/aspace-migrations/issues?q=is%3Aopen+is%3Aissue+milestone%3Amigrations

I use a variety of regexes and XSLT stylesheets to solve various problems.

John


John P. Rees
Archivist and Digital Resources Manager
History of Medicine Division
National Library of Medicine
301-827-4510
Teleworking M-F 8:00AM - 4:30PM each day until further notice



Re: [Archivesspace_Users_Group] Top container ranges

2020-06-19 Thread Andrew Morrison
Just for completeness, another option is to create your own customized 
version of the EAD importer, by subclassing the EADConverter class 
<https://github.com/archivesspace/archivesspace/blob/master/backend/app/converters/ead_converter.rb> 
in a backend plugin. Then you'd just have another option in the 
drop-down in the import job form, and no need to pre-process.


But that would require both Ruby skills and an understanding of the
ArchivesSpace data model for containers. I'd say even a complete novice
with XSLT would find it easier to learn enough to tweak the Yale
example that Adrien has given below. It also produces EAD you can view,
validate and import on a test system to check the effects. We do both,
but only use the plugin when changing the EAD has no effect (e.g. to
alter how agents get roles, or the rules for whether a certain note is
published).


Andrew.


On 18/06/2020 14:57, Hilton, Adrien wrote:


Hi Dawne,

I believe Yale created a script to break out container ranges: 
https://github.com/YaleArchivesSpace/xslt-files/blob/master/EAD_expand_top_container_ranges_prior_to_import.xsl


Best wishes,

Adrien


Re: [Archivesspace_Users_Group] Top container ranges

2020-06-19 Thread Lucas, Dawne Howard
Thank you to everyone for the responses!

Best,

Dawne

--
Dawne Howard Lucas (she/her/hers)
Technical Services Archivist

Wilson Special Collections Library
200 South Road, CB #3926
Chapel Hill, NC 27515
The University of North Carolina at Chapel Hill
P  919-966-1776   E  dawne_lu...@unc.edu


Re: [Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Hilton, Adrien
Hi Dawne,

I believe Yale created a script to break out container ranges: 
https://github.com/YaleArchivesSpace/xslt-files/blob/master/EAD_expand_top_container_ranges_prior_to_import.xsl

Best wishes,
Adrien


Re: [Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Bowers, Kate A.
What Dave said.

From a practical standpoint, you have these options:

  1.  Change the EAD pre-migration
  2.  Change the data in ASpace
  3.  Live with the data like that (your cost of cleaning vs. living with it
may vary, but it is worth having the discussion. My repository had some issues
we decided to live with and some we decided to fix.)

Option 1 works if you have consistency in the choices folks have made in
finding aids. Unfortunately at Harvard, for some repositories text like “Box
3-4” referred to a single box with the identifier 3-4, and not to “Box 3 and
Box 4”. Thus, we could not implement a single script that would work for all
cases. We were also constrained by time and could not implement scripted
solutions across sub-sets of our corpus. However…

Individual repositories did implement changes in their own ways. We had very
few of these “box range” type of finding aids, so I (OK, I know this is a
really crude, sledgehammer type of method!)

  *   Got the subset of finding aids that have this issue (granted, this can
be a task in itself)
  *   Put them in their own directory
  *   Used regex find-and-replace (taking great care, of course, to do no
harm by accident) in either my favorite text editor or oXygen to
find-and-replace all instances of the problem
  *   Double-checked that they were all still valid
  *   Spot-checked the results
Your mileage and access to a real programmer for stuff like this may vary.
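
The find-and-check steps above could be sketched in Python roughly as follows.
This is only an illustration, not the workflow actually used at Harvard: the
pattern assumes un-namespaced <container> elements whose whole text is "N-M",
and the function names are made up for the example. It reports candidates
rather than replacing them, because (as noted above) "3-4" may be the
identifier of a single box. The second function checks only that the XML is
still well-formed; validating against the EAD DTD or schema needs a
validating parser such as the one in oXygen.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of the problem markup, e.g. <container type="box">3-4</container>
RANGE_CONTAINER = re.compile(
    r"(<container\b[^>]*>)\s*(\d+)\s*-\s*(\d+)\s*(</container>)"
)


def find_range_containers(ead_text):
    """Return (line_number, matched_text) for every range-like container."""
    hits = []
    for match in RANGE_CONTAINER.finditer(ead_text):
        # Line number = newlines before the match, plus one.
        line = ead_text.count("\n", 0, match.start()) + 1
        hits.append((line, match.group(0)))
    return hits


def is_well_formed(ead_text):
    """True if the text parses as XML (well-formedness only, not validity)."""
    try:
        ET.fromstring(ead_text)
        return True
    except ET.ParseError:
        return False
```

Running find_range_containers over each file in the working directory gives a
list to review by eye before any replacement is made, and is_well_formed can
be run again afterwards as a first sanity check.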





Re: [Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Mayo, Dave
Also, specifically:


  1.  Using an XML database like eXist-db or BaseX with XPath/XQuery was 
invaluable when doing analysis of issues and of the impact of changes
  2.  One of the tools I wrote, the EAD Checker, is available online: 
https://eadchecker.lib.harvard.edu – it doesn’t catch this specific issue, but 
it does catch a bunch of issues, some of which cause corrupted data rather than 
failure to import.
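
For a small corpus, a rough stand-in for this kind of analysis can be done
with the Python standard library alone. The sketch below is not the
eXist-db/BaseX approach described above and not the EAD Checker; it simply
tallies what the container values look like, so you can see how regular the
ranges are before scripting anything. The category names and both function
names are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def classify(value):
    """Sort a container value into 'single', 'range', 'list' or 'other'."""
    value = (value or "").strip()
    if re.fullmatch(r"\d+", value):
        return "single"                      # e.g. "5"
    if re.fullmatch(r"\d+\s*-\s*\d+", value):
        return "range"                       # e.g. "3-4"
    if re.fullmatch(r"\d+(\s*-\s*\d+)?(\s*,\s*\d+(\s*-\s*\d+)?)+", value):
        return "list"                        # e.g. "3,4,7-10"
    return "other"                           # anything else, e.g. "OS 1"


def tally_containers(ead_documents):
    """Count container value shapes across an iterable of EAD strings."""
    counts = Counter()
    for text in ead_documents:
        root = ET.fromstring(text)
        for element in root.iter():
            # Compare on the local name so namespaced EAD also matches.
            if element.tag.rsplit("}", 1)[-1] == "container":
                counts[classify(element.text)] += 1
    return counts
```

A corpus where almost everything falls under "single" and "range" is a good
candidate for a scripted fix; a large "list" or "other" count suggests more
manual review.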

--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS


Re: [Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Mayo, Dave
So, with the caveat that we put a lot of resources (a bunch of archivists’ 
time, a full year of a full-time developer (me!)), we had very solid results; I
think remediating issues prior to import is almost always worth the expense of 
significant effort, particularly over a large corpus.

My main advice would be to be very, very careful about changes – version your 
EADs, compare before and after scripts run, and in general be very systematic 
about how you find, report, and correct changes.
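
One way to "compare before and after" is a plain line diff of each file,
plus a check that nothing outside the containers moved. The sketch below uses
Python's difflib; the function names and the rule "every changed line must
mention <container" are assumptions for illustration, and a version control
system such as git would do the versioning part.

```python
import difflib


def changed_lines(before, after, name="finding-aid.xml"):
    """Return a unified diff (as a list of lines) between two EAD versions."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"before/{name}",
        tofile=f"after/{name}",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))


def only_containers_changed(before, after):
    """True if every added or removed line mentions a <container> element.

    Also True when the two versions are identical (nothing changed).
    """
    diff = changed_lines(before, after)
    # Skip the two header lines; keep only added ("+") and removed ("-") lines.
    body = [line for line in diff[2:] if line.startswith(("+", "-"))]
    return all("<container" in line for line in body)
```

Any file for which only_containers_changed returns False is worth opening by
hand before it goes anywhere near an import.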

I don’t know if you’ve seen it, but Kate Bowers and I did a write-up of what we 
did during our migration – it has links to a number of open source tools I 
wrote for doing this kind of work.  They’re a bit involved to get running, but 
they definitely work at basically any scale out there, and I’m happy to help 
people get started with them.  https://journal.code4lib.org/articles/12239

--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS

From: Lucas, Dawne Howard
Reply-To: Archivesspace Users Group
Date: Thursday, June 18, 2020 at 9:12 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Top container ranges

Thanks, Dave.  I guess I should have specified that changing the EAD isn’t a 
viable solution for us unless it’s automated. We do not plan to edit individual 
finding aids manually except in cases where the ranges aren’t regular.

If you’ve done this at Harvard, have there been any drawbacks? Anything we 
should be looking to avoid?

Thanks again,

Dawne


From: Mayo, Dave <dave_m...@harvard.edu>
Sent: Thursday, June 18, 2020 9:04 AM
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Top container ranges

The two options I see here are essentially:

1. Change the EAD
2. Change the containers after they’re ingested.

Of the two, changing the EAD seems _easier_ to me; if you wouldn’t mind going 
more into why that’s not a viable solution for you, it might help us provide 
better advice?

Either way, at 7000 finding aids, the solution would basically need to be 
automated – if your box ranges are very regular (i.e. only single number or 
range, no “3,4,7-10” or similar), it wouldn’t be too difficult – split the 
range on ‘-‘, generate list of numbers, replace container with multiple 
containers.
--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS

From:  on behalf of 
"Lucas, Dawne Howard" 
Reply-To: Archivesspace Users Group 

Date: Thursday, June 18, 2020 at 8:13 AM
To: Archivesspace Users Group 
Subject: [Archivesspace_Users_Group] Top container ranges


Hi all,



We are formulating a plan to import our 7000+ EAD finding aids into 
ArchivesSpace and are wondering how other institutions have handled top 
container ranges.



For example, we have finding aids coded like this:



3-4Photographs



This imports into ASpace just fine (yay!), but of course also creates a top 
container for Box 3-4 instead of Box 3 and Box 4 (boo!). We assume this will be 
an issue later when we integrate with Aeon.



The most obvious solution to this problem appears to be to change the encoding 
to:



3Photographs



4 
Photographs



For several reasons, this is not a viable solution for us. Have other 
institutions figured out a way to deal with this issue that does not include 
editing the EAD in individual finding aids?

Thanks for your help,

Dawne

--
Dawne Howard Lucas (she/her/hers)
Technical Services Archivist

Wilson Special Collections Library
200 South Road, CB #3926
Chapel Hill, NC 27515
The University of North Carolina at Chapel Hill
P  919-966-1776   E  dawne_lu...@unc.edu<mailto:dawne_lu...@unc.edu>

[cid:image001.png@01D5F200.0D957C80]<https://urldefense.proofpoint.com/v2/url?u=https-3A__library.unc.edu_wilson_=DwMFAg=WO-RGvefibhHBZq3fL85hQ=_Mv1dY22K7jvT5MD7xjbvGVzRDOUMhx4WYcnPSIzYnE=tkJE1JdGvSoNb5i6NSRbF3z1n28dGeVJ4ogcFmpTpQo=e9r4LIAN87oWg7LLTrzui9bCYcCMX-8twYfh3y0I8tY=>



___
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


Re: [Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Lucas, Dawne Howard
Thanks, Dave.  I guess I should have specified that changing the EAD isn’t a 
viable solution for us unless it’s automated. We do not plan to edit individual 
finding aids manually except in cases where the ranges aren’t regular.
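
One way to find the cases that would need manual attention is to tally container values by shape before changing anything. This is a sketch only, assuming the values have already been pulled out of the EAD as plain strings; classify and triage are hypothetical names, and the three categories are just a suggestion.

```python
import re
from collections import Counter

SINGLE = re.compile(r"^\d+$")                   # e.g. "3"
SIMPLE_RANGE = re.compile(r"^\d+\s*-\s*\d+$")   # e.g. "3-4"


def classify(value):
    """Label a container value as 'single', 'range', or 'irregular'."""
    value = value.strip()
    if SINGLE.match(value):
        return "single"
    if SIMPLE_RANGE.match(value):
        return "range"
    return "irregular"  # anything else, e.g. "3,4,7-10", goes to manual review


def triage(values):
    """Count how many container values fall into each category."""
    return Counter(classify(v) for v in values)


if __name__ == "__main__":
    print(triage(["3", "3-4", "3,4,7-10", "OS 1"]))
```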

If you’ve done this at Harvard, have there been any drawbacks? Anything we 
should be looking to avoid?

Thanks again,

Dawne




Re: [Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Mayo, Dave
The two options I see here are essentially:

1. Change the EAD
2. Change the containers after they’re ingested.

Of the two, changing the EAD seems _easier_ to me; if you wouldn’t mind going 
more into why that’s not a viable solution for you, it might help us provide 
better advice?

Either way, at 7000 finding aids, the solution would basically need to be 
automated. If your box ranges are very regular (i.e. only a single number or a 
range, no “3,4,7-10” or similar), it wouldn’t be too difficult: split the 
range on ‘-’, generate the list of numbers, and replace the container with 
multiple containers.
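
The steps above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a tested migration tool: it assumes the EAD has no XML namespace, that box numbers live in <container type="box"> elements, and that a value like "3-4" really does mean two boxes. A box whose identifier is literally "3-4" would be split wrongly, so the local conventions need checking first. The names expand_range and split_box_ranges are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches only a simple "start-end" range such as "3-4".
SIMPLE_RANGE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*$")


def expand_range(text):
    """Return ['3', '4'] for '3-4'; return None for anything that is not a simple ascending range."""
    match = SIMPLE_RANGE.match(text or "")
    if not match:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    if start >= end:
        return None
    return [str(n) for n in range(start, end + 1)]


def split_box_ranges(root):
    """Replace each ranged box <container> with one <container> per box number.

    Returns the number of ranged containers that were replaced.
    """
    changed = 0
    # Materialize the element list first so the tree can be edited safely.
    for parent in list(root.iter()):
        # Walk children in reverse so earlier indexes stay valid after inserts.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag != "container" or child.get("type", "").lower() != "box":
                continue
            numbers = expand_range(child.text)
            if numbers is None:
                continue  # single numbers and irregular values are left untouched
            parent.remove(child)
            for offset, number in enumerate(numbers):
                clone = ET.Element("container", dict(child.attrib))
                clone.text = number
                clone.tail = child.tail
                parent.insert(index + offset, clone)
            changed += 1
    return changed


if __name__ == "__main__":
    # Hypothetical fragment; real finding aids will have more structure around it.
    sample = '<did><container type="box">3-4</container><unittitle>Photographs</unittitle></did>'
    root = ET.fromstring(sample)
    split_box_ranges(root)
    print(ET.tostring(root, encoding="unicode"))
```

Values such as "3,4,7-10" are deliberately skipped rather than guessed at, so they can be reported and handled by hand.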

--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS



Re: [Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Lucas, Dawne Howard
One quick follow-up: we’re not completely clueless about how we might do this, 
but we would appreciate hearing about the experience at other institutions. For 
those institutions that have done this, were there any drawbacks?

Thanks,

Dawne




[Archivesspace_Users_Group] Top container ranges

2020-06-18 Thread Lucas, Dawne Howard
Hi all,



We are formulating a plan to import our 7000+ EAD finding aids into 
ArchivesSpace and are wondering how other institutions have handled top 
container ranges.



For example, we have finding aids coded like this:



<container type="box">3-4</container><unittitle>Photographs</unittitle>



This imports into ASpace just fine (yay!), but of course also creates a top 
container for Box 3-4 instead of Box 3 and Box 4 (boo!). We assume this will be 
an issue later when we integrate with Aeon.



The most obvious solution to this problem appears to be to change the encoding 
to:



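<container type="box">3</container><unittitle>Photographs</unittitle>

<container type="box">4</container><unittitle>Photographs</unittitle>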



For several reasons, this is not a viable solution for us. Have other 
institutions figured out a way to deal with this issue that does not include 
editing the EAD in individual finding aids?

Thanks for your help,

Dawne

--
Dawne Howard Lucas (she/her/hers)
Technical Services Archivist

Wilson Special Collections Library
200 South Road, CB #3926
Chapel Hill, NC 27515
The University of North Carolina at Chapel Hill
P  919-966-1776   E  dawne_lu...@unc.edu


