Re: [Archivesspace_Users_Group] EAD Importer and DAOs

2018-05-26 Thread Custer, Mark
Brian, Tim:

In my opinion, the entire digital object module deserves a review, and that 
would include both EAD imports and exports (in those exports, for example, the 
digital object title is repeated as both an @xlink:title AND a daodesc/p; and 
worse yet, the ASpace required identifier isn't even included in the export).

In general, I think that some of the import/export and migration tools in 
ASpace assume more than they should.  In this case, I don't think that the 
importer should assume that it must repurpose a title as a caption, especially 
since the caption is not required by ASpace. 

So in the short term, if anything is done, I'd think that the importer should 
stop duplicating the title information in two places in the database.  Any time 
data is duplicated like that (not to mention being repurposed for a completely 
different field), the odds of messy metadata go way up.  

And there would be other ways to get around this in the meantime, such as 
shortening the DAO title and then updating it after the import... but even 
then, you'd have to ask:  do we really want that title repeated in the caption 
field? 

Mark




___
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


Re: [Archivesspace_Users_Group] EAD Importer and DAOs

2018-05-25 Thread Mayo, Dave
IIRC, also, a lot of the smaller DB-level limits (and some of the larger 
ones) are the result of various defaults or arbitrary guesses.  Before just 
saying that it's part of the data model and we should just live with it, one 
should always positively verify that it's legitimately part of the data model.  

In this case, I can actually negatively verify this: in the commit where the 
file_version:caption field is added, the DB migration doesn't specify a length 
(and so gets the default of 255), but the schema, which is specified, reads:

> "caption" => {"type" => "string", "maxLength" => 16384},

So it's definitely a bug - I'll file a ticket, and I'm planning to make a pull 
request to fix this, though it won't really be a solution to your problem until 
it gets accepted and you update.  

If you have DB access, I think you'd be fine to just go directly in and run:

ALTER TABLE `file_version` MODIFY caption VARCHAR(16384);

Since you're expanding it, you won't risk truncating data.
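
The kind of mismatch described above (a schema maxLength larger than the 
column that stores the value) can be looked for mechanically.  The following 
is only a rough sketch in Python, not anything that ships with ASpace: the 
function name and the two dictionaries are hypothetical, and the numbers are 
simply the ones quoted in this thread.

```python
def find_length_mismatches(schema_max, column_max):
    """Return (field, schema_limit, column_limit) for every field whose
    database column is narrower than the schema's maxLength."""
    mismatches = []
    for field, schema_limit in sorted(schema_max.items()):
        column_limit = column_max.get(field)
        # Skip fields we have no column size for; flag only narrower columns.
        if column_limit is not None and column_limit < schema_limit:
            mismatches.append((field, schema_limit, column_limit))
    return mismatches


# Hypothetical inputs, filled in by hand from the values mentioned above.
schema_max = {"file_version.caption": 16384}  # "maxLength" in the schema
column_max = {"file_version.caption": 255}    # VARCHAR size in the database

for field, schema_limit, column_limit in find_length_mismatches(schema_max, column_max):
    print(f"{field}: schema allows {schema_limit}, column holds {column_limit}")
```

Anything it reports is a value the schema would accept but the database 
would reject, which is the failure Brian ran into.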

- Dave Mayo
ASpace Core Committer's Group



Re: [Archivesspace_Users_Group] EAD Importer and DAOs

2018-05-25 Thread Brian Harrington
Hi Tim,

I agree it seems a bit backwards to change the data model to suit the importer, 
which is one of the reasons I decided to pose the question to the list.  There 
could be valid reasons (display issues?) for limiting the length of the 
caption, but these things are often assigned somewhat arbitrarily, so I thought 
I would ask.  If there are reasons for keeping the caption at 255, 
then I think it makes sense to truncate it in the importer, rather than just 
having things die on a database error.

I currently use a modified version of Mark Custer’s schematron to check EADs 
prior to import, and can certainly add code to check <dao> @titles.  The 
problem with doing that is the double use of @title for both 
digital_object:title and file_version:caption.  Since ASpace supports long 
titles, and the archivist presumably assigned a long title for a reason, I 
would hate to shorten it before import just to make sure that it fits when 
re-used as a caption.
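
For illustration, the "truncate only the caption" option could look 
something like the sketch below.  This is Python rather than the importer's 
own code, the function name and the "..." marker are made up for the 
example, and 255 is just the current file_version:caption column size 
mentioned earlier in the thread.

```python
CAPTION_LIMIT = 255  # current size of file_version:caption, per this thread


def title_and_caption(dao_title, limit=CAPTION_LIMIT, ellipsis="..."):
    """Return (title, caption): the title is left untouched and only the
    caption is shortened so that it fits within `limit` characters."""
    if len(dao_title) <= limit:
        return dao_title, dao_title
    # Leave room for the marker so the caption never exceeds the limit.
    cut = limit - len(ellipsis)
    return dao_title, dao_title[:cut].rstrip() + ellipsis


title, caption = title_and_caption("A very long digital object title " * 20)
print(len(title), len(caption))
```

The full title still goes to digital_object:title, so nothing the 
archivist wrote is lost; only the re-used copy is cut down.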

Thanks,

Brian




Re: [Archivesspace_Users_Group] EAD Importer and DAOs

2018-05-24 Thread Timothy Dilauro
Hi Brian,

I don't think it's a good idea to change the data model just to avoid imports 
failing, though there may be other rationales that result in such a change.

In the meantime, it might be useful to write some XSLT or some other custom 
code to perform sanity checks relative to ASpace restrictions ahead of EAD 
import attempts. In that manner, those non-conformant captions (and anything 
else you check on) could be tweaked before import.
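
As a rough illustration of that kind of pre-import check, here is a small 
Python sketch (rather than XSLT) that lists dao elements whose title is 
longer than the caption column allows.  It assumes EAD 2002, where the 
title may be a plain @title (DTD-based files) or an @xlink:title 
(schema-based files); the function names are made up for the example, and 
255 is the file_version:caption size quoted in this thread.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
CAPTION_LIMIT = 255  # file_version:caption column size, per this thread


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_long_dao_titles(ead_xml, limit=CAPTION_LIMIT):
    """Return (href, title_length) for each dao whose title exceeds `limit`."""
    root = ET.fromstring(ead_xml)
    problems = []
    for elem in root.iter():
        if not isinstance(elem.tag, str) or local_name(elem.tag) != "dao":
            continue
        # Schema-based EAD uses xlink:title, DTD-based EAD uses title.
        title = elem.get(f"{{{XLINK_NS}}}title") or elem.get("title")
        href = elem.get(f"{{{XLINK_NS}}}href") or elem.get("href")
        if title and len(title) > limit:
            problems.append((href, len(title)))
    return problems


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as fh:
        for href, length in find_long_dao_titles(fh.read()):
            print(f"{href}: title is {length} characters")
```

Run against an EAD file, it only reports the offending titles; deciding 
whether to shorten them is left to whoever is preparing the import.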

Cheers,
~Tim






[Archivesspace_Users_Group] EAD Importer and DAOs

2018-05-23 Thread Brian Harrington
Currently when importing an EAD, <dao>s are used to create digital objects.  As 
part of this process, the @title attribute is used for both the digital object 
title and the caption under file versions.  I've recently run into a fun issue 
with <dao>s with @titles longer than 255 characters.  These titles are OK for 
digital_object:title, which is VARCHAR(8704), but too long for 
file_version:caption, which is VARCHAR(255).  So the import fails.

Should this be considered a bug?  If it is, and if one were theoretically 
considering a PR, would it make more sense to harmonize the length of the title 
and caption, or truncate the caption to 255 characters?  My inclination is just 
to increase the maximum length of captions, and rely on people to show 
restraint, but I know that other people might have different opinions.

Thanks,

Brian 

-- 
Brian Harrington
Migration Specialist
LYRASIS
brian.harring...@lyrasis.org
skype: abbistani

