Re: [Archivesspace_Users_Group] PUI question: external indexing of container list tree link text

2019-01-04 Thread Custer, Mark
John,

Exactly what Steve said!

As an aside, one thing that I thought about (quite a while back now, I guess it 
was) was adding those mappings to the JSON-LD metadata that is currently only 
applied on Resource, Repository, and Agent pages within the PUI (if you view 
the source of a Resource landing page, for example, you’ll see a bit of JSON-LD 
at the very top).  In other words, the Resource JSON-LD could have 
https://schema.org/hasPart statements that would include the URLs of the 
immediate child archival objects.  But that could result in a lot of extra 
data, since sometimes folks have very, very flat hierarchies (e.g. 10,000 
child archival objects all attached to 1 resource record…. yikes!). Because 
of that very real possibility, it might be a better mapping strategy not to use 
hasPart in ASpace, but instead to just add a single https://schema.org/isPartOf 
to each archival object page. I don’t think that would help your use case, 
though.
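
To make the trade-off concrete, here is a rough Python sketch of the two mappings. It is only an illustration, not what the PUI emits: the helper names, the example.edu URLs, and the choice of the schema.org types Collection and ArchiveComponent are all assumptions.

```python
import json


def resource_jsonld(name, url, child_urls):
    """Resource-level JSON-LD that lists every immediate child via hasPart.

    Hypothetical helper: the "Collection" type is an assumption.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Collection",
        "@id": url,
        "name": name,
        "hasPart": [{"@id": child} for child in child_urls],
    }


def archival_object_jsonld(name, url, parent_url):
    """Archival-object JSON-LD that points up to its parent via isPartOf.

    Hypothetical helper: the "ArchiveComponent" type is an assumption.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "ArchiveComponent",
        "@id": url,
        "name": name,
        "isPartOf": {"@id": parent_url},
    }


if __name__ == "__main__":
    base = "https://archives.example.edu/repositories/2"  # made-up host
    parent = base + "/resources/1"
    children = [base + "/archival_objects/%d" % n for n in range(1, 4)]
    print(json.dumps(resource_jsonld("Example papers", parent, children), indent=2))
    print(json.dumps(archival_object_jsonld("Box 1", children[0], parent), indent=2))
```

With a flat hierarchy of 10,000 children, the hasPart list on the Resource page grows to 10,000 entries, while each archival object page carries exactly one isPartOf statement.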

We haven’t done this just yet, but I was planning to add a README file in 
GitHub that just lists out all of the EAD files for this sort of aggregation 
purpose, e.g. 
https://github.com/YaleArchivesSpace/Archives-at-Yale-EAD3/blob/master/med-ead/README.md
We’re not using the OAI endpoint right now, though, since we post-process 
and validate our EAD files after export.  For folks who are using it, that 
would be a very good way to get the data harvested.

As for search engines indexing at a deep level in the PUI, I suspect that could 
be an issue. It would probably be best if the PUI had sitemaps out of the box.  
That said, we have a LOT of archival objects being indexed by Google in our 
instance of the PUI, but I don’t think the crawler is getting the entire 
finding aids (as long as they get all of the Resources, I’m happy for now).  
I’d expect that a sitemap would be best for ensuring that, but I’ve honestly no 
clue in this day and age of the Web!
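
For anyone who wants to experiment with a sitemap in the meantime, a minimal one can be generated from a list of PUI URLs. This is a sketch under assumptions: the build_sitemap helper and the example.edu URLs are invented for illustration, and how you collect the list of Resource URLs is left out entirely.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Made-up PUI URLs; a real list would come from your own instance.
    print(build_sitemap([
        "https://archives.example.edu/repositories/2/resources/1",
        "https://archives.example.edu/repositories/2/resources/2",
    ]))
```

The sitemap protocol caps a single file at 50,000 URLs, so a large instance would need several files plus a sitemap index.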

Mark




___
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


Re: [Archivesspace_Users_Group] PUI question: external indexing of container list tree link text

2019-01-04 Thread Majewski, Steven Dennis (sdm7g)

I would suggest crawling the OAI endpoint for indexing, but linking to the PUI 
record.
The oai_ead metadata is just EAD in an OAI wrapper, and it has the complete 
resource tree.
The problem with that is that not everyone may have configured OAI or made it 
public.

But yes: that’s a problem with progressive web apps: not all of the data you 
want indexed is in the page.
I wonder if there is a way through the Google webmaster console or sitemaps to 
configure this sort of action, i.e. 
“use this other URL to index this resource.”
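
A rough sketch of the harvesting side, in Python, might look like the following. The verb, metadataPrefix, and resumptionToken arguments are standard OAI-PMH; everything else is an assumption for illustration: the function names, the example base URLs, and above all the idea in pui_link that the record path can be read off the part of the OAI identifier after "//". Check the identifiers your own endpoint returns before relying on that.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_ead"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


def pui_link(pui_base, identifier):
    """ASSUMPTION: the text after '//' in the identifier is the record path."""
    _, _, path = identifier.partition("//")
    return pui_base.rstrip("/") + "/" + path
```

A harvester would fetch list_records_url(...), index the EAD in each record, store pui_link(...) as the link shown to users, and repeat with the returned token until parse_page returns None for it.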

— Steve Majewski







[Archivesspace_Users_Group] PUI question: external indexing of container list tree link text

2019-01-04 Thread Rees, John (NIH/NLM) [E]
Hi all,

I administer a finding aids aggregation service that in part scrapes HTML 
source code as a data input, and I am looking for some advice and hoping to 
start a conversation.

Several of our contributing repositories with this data type moved to 
ArchivesSpace in 2018, and we are not able to crawl ASpace's 
collection_organization#tree source, which seems to be the only organized view 
of container list data. As many of you probably know, the tree entries are 
coded as URIs in the HTML source and are only rendered as visible text at 
runtime via JavaScript and CSS in the browser.

Our crawler cannot translate these HTML-source URIs into text that it can 
index. The only workaround we've been able to find is indexing the PDF view, 
but not everyone implements this feature. Additionally, our crawler sometimes 
times out on large PDFs as it can take ASpace a while to generate them at 
runtime.
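
To illustrate what a crawler working only from the static source runs into, here is a small Python sketch that collects links and flags the ones with no visible text. The class and function names and the sample markup are hypothetical; this is not the PUI's actual HTML, only a stand-in for links whose text gets filled in at runtime.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect [href, visible text] pairs for every <a href=...> in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            self._in_link = href is not None
            if href is not None:
                self.links.append([href, ""])

    def handle_data(self, data):
        # Only text that is literally present in the source is seen here.
        if self._in_link:
            self.links[-1][1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


def links_without_text(html_source):
    """Return hrefs whose link text is empty in the static source."""
    parser = LinkTextCollector()
    parser.feed(html_source)
    parser.close()
    return [href for href, text in parser.links if not text.strip()]


if __name__ == "__main__":
    # Hypothetical markup, not copied from the PUI.
    sample = (
        '<a href="/repositories/2/archival_objects/10"></a>'
        '<a href="/repositories/2/resources/1">Example papers</a>'
    )
    print(links_without_text(sample))
```

Links reported by links_without_text are the ones a non-JavaScript crawler can follow but cannot index by title.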

I'm also wondering if PUI implementers have noticed any issues with other 
search engines having difficulty indexing their PUI content at a full-document 
level?

I searched the Jira backlog and PUI Enhancements wikispace and did not find 
anything specifically addressing this use case.

Thanks,
John


John P. Rees
Archivist and Digital Resources Manager
History of Medicine Division
National Library of Medicine
301-827-4510

