You could try to programmatically match up each hOCR text block to a
corresponding fragment from the transcripts, based on textual similarity
(then replace the hOCR text with the "real" text). There's monotonicity in
terms of XY coordinates vs offset in the transcript, i.e. (X1,Y1) < (X2,Y2)
=> text
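
Here is a rough sketch of that matching idea, not a tested pipeline. It assumes the hOCR blocks have already been pulled out as plain strings and sorted in reading order (top to bottom, left to right), and that the transcript is a single string. The name `align_blocks` and the `slack` parameter are made up for illustration. Similarity is measured with Python's standard `difflib.SequenceMatcher`, which is only one of several reasonable choices. The monotonicity is used by never searching the transcript before the end of the previous match.

```python
from difflib import SequenceMatcher


def align_blocks(ocr_blocks, transcript, slack=40):
    """Pair each OCR block with the most similar transcript span.

    ocr_blocks: OCR strings, assumed to be sorted in reading order.
    transcript: the clean transcript as one string.
    slack: how many words past the cursor to search (hypothetical knob).
    Returns a list of (ocr_text, matched_text, score) tuples.
    """
    words = transcript.split()
    cursor = 0  # monotonicity: never look before the previous match
    results = []
    for block in ocr_blocks:
        n = max(1, len(block.split()))
        best = (0.0, cursor, cursor)  # (score, start, end)
        # Try windows that start at or shortly after the cursor.
        for start in range(cursor, min(len(words), cursor + slack)):
            # Allow the window to be one word shorter or longer than the
            # OCR block, since OCR may merge or split words.
            for length in (n - 1, n, n + 1):
                if length < 1 or start + length > len(words):
                    continue
                candidate = " ".join(words[start:start + length])
                score = SequenceMatcher(
                    None, block.lower(), candidate.lower()
                ).ratio()
                if score > best[0]:
                    best = (score, start, start + length)
        score, start, end = best
        results.append((block, " ".join(words[start:end]), score))
        cursor = end  # the next block must match later in the transcript
    return results
```

You would call it as `align_blocks(blocks, transcript_text)` and then write the matched text back into the hOCR only where the score is above some threshold you trust. Low-scoring blocks are probably better left for manual review.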
*** Apologies for cross-posting ***
—
*10th annual Taxonomy Boot Camp*
November 4-5 as part of KMWorld
Washington, DC
*Website:* http://www.taxonomybootcamp.com/2014/
*Call for Speakers:*
http://www.taxonomybootcamp.com/2014/CallForSpeakers.asp
*Deadline:* March 20, 2014
——
If you were to select a set of RDF ontologies intended to be used in the linked
data of archival descriptions, then what ontologies would you select?
For simplicity's sake, RDF ontologies are akin to the fields in MARC records or
the entities in EAD/XML files. Articulated more accurately, they a
Roy, I'm not sure what tips you over into sarcasm mode (unless it's
anything I say), but 1) the answer is a few posts down, albeit not in
any detail, and 2) as a member-based organization that exists to serve its
members, I would think that OCLC would want to encourage the gathering
of information