[CODE4LIB] units of metadata, was Namespace management, was Models of MARC in RDF

2011-12-12 Thread [Your Name]
On Dec 11, 2011, at 11:00 PM, Karen Coyle wrote:

> I'll let you battle that one out with Simon :-), but I am often at a loss for 
> a better term to describe the unit of metadata that libraries may create in 
> the future to describe their resources. Suggestions highly welcome.


I'm sure you're aware of these, but for general edification here are some 
possible ways to think about an "implicit record":

Concise Bounded Description: http://www.w3.org/Submission/CBD, or better for 
libraries IMHO, Symmetric Concise Bounded Description: 
http://www.w3.org/Submission/CBD/#scbd

The Minimum Self-contained Graph ("MSG"), details of which are available in 
"Signing individual fragments of an RDF graph." from (ACM WWW '05), as well as 
"RDFSync: efficient remote synchronization of RDF models" from 
(ISWC2007+ASWC2007).
http://www.www2005.org/cdrom/docs/p1020.pdf
http://data.semanticweb.org/pdfs/iswc-aswc/2007/ISWC2007_RT_Tummarello(1).pdf

RDF Molecules, details in "Tracking RDF Graph Provenance using RDF Molecules," 
from (ISWC '05).
http://aisl.umbc.edu/resources/178.pdf

All of these are basically defined as pieces of larger graphs, although they 
can be considered as conditions of some kind of "validity" for a graph. I 
suspect that part of the hurdle for our community in moving to new patterns of 
work is the gap between current workflows (which create records) and future 
workflows (which may enrich shared graphs by much smaller increments than 
current notions of "record"). "[T]he unit of metadata that libraries may create 
in the future" as the unit in a given workflow may be only as large as the 
triple.
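
For anyone who wants to experiment, CBD is easy to sketch in a few lines. Here is a minimal pure-Python version over a toy triple set (the predicates and identifiers are invented, and it ignores CBD's rules about reified statements):

```python
# Minimal sketch of Concise Bounded Description (CBD) extraction.
# Triples are plain tuples; blank nodes are strings starting "_:".
# Hypothetical data -- not drawn from any real catalog.

def cbd(graph, resource, seen=None):
    """Return the CBD of `resource`: all triples with it as subject,
    plus (recursively) the triples of any blank-node objects."""
    if seen is None:
        seen = set()
    seen.add(resource)
    result = set()
    for s, p, o in graph:
        if s == resource:
            result.add((s, p, o))
            # Blank nodes have no independent identity, so pull
            # their descriptions in too.
            if isinstance(o, str) and o.startswith("_:") and o not in seen:
                result |= cbd(graph, o, seen)
    return result

graph = {
    ("ex:book1", "dc:title", "Moby-Dick"),
    ("ex:book1", "ex:publication", "_:pub"),
    ("_:pub", "ex:place", "New York"),
    ("ex:book2", "dc:title", "Pierre"),
}

print(sorted(cbd(graph, "ex:book1")))
```

The symmetric variant (SCBD) would additionally follow triples where the resource appears as object; that's a one-line extension of the loop above.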

---
A. Soroka
Online Library Environment
the University of Virginia Library




On Dec 11, 2011, at 11:00 PM, Karen Coyle wrote:

>> I know it is only semantics (no pun intended), but we need to stop using the 
>> word 'record' when talking about the future description of 'things' or 
>> entities that are then linked together.   That word has so many built in
>> assumptions, especially in the library world.
> 
> I'll let you battle that one out with Simon :-), but I am often at a loss for 
> a better term to describe the unit of metadata that libraries may create in 
> the future to describe their resources. Suggestions highly welcome.


Re: [CODE4LIB] EAD in Blacklight (was: Re: [CODE4LIB] Batch loading in fedora)

2010-08-11 Thread [Your Name]
Tom--

Yes and no. Yes, in the sense that nothing of policy prevents us from sharing 
it, but no, in the sense that it is currently -very- tightly bound up with our 
workflow machinery, so I don't know how useful it could immediately be to you. 
I can put you in touch with the programmer who constructed that workflow, if 
you like. Anyone else interested in that tooling is also welcome to contact me 
off-list.


---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library


On Aug 9, 2010, at 11:00 PM, CODE4LIB automatic digest system wrote:

> From: Tom Cramer 
> Date: August 9, 2010 11:09:02 AM EDT
> Subject: Re: EAD in Blacklight (was: Re: [CODE4LIB] Batch loading in fedora)
> 
> 
> Adam,
> 
> Is the EAD-to-RDF "graphinator" code you describe shareable? I'd like to 
> experiment with it for some ongoing work that involves ingesting archival 
> collections into Fedora, and then editing them with Hydra and viewing them 
> via Blacklight. 
> 
> - Tom
> 
> 
> On Aug 8, 2010, at 8:13 AM, [Your Name] wrote:
> 
>> I'd like to share an alternative approach that we're pursuing here at UVa. 
>> It doesn't speak quite directly to operations on finding aids by themselves, 
>> with no attention to representing on-line the collection so described, but 
>> more to those situations where you make an attempt at a full digital 
>> surrogate for a collection, using repository machinery. I hope, though, that 
>> it might be useful to hear about. We started from a few principles as 
>> follows. (All of them have exceptions, of course. {grin})


Re: [CODE4LIB] EAD in Blacklight (was: Re: [CODE4LIB] Batch loading in fedora)

2010-08-08 Thread [Your Name]
I'd like to share an alternative approach that we're pursuing here at UVa. It 
doesn't speak quite directly to operations on finding aids by themselves, with 
no attention to representing on-line the collection so described, but more to 
those situations where you make an attempt at a full digital surrogate for a 
collection, using repository machinery. I hope, though, that it might be useful 
to hear about. We started from a few principles as follows. (All of them have 
exceptions, of course. {grin})

1) EAD is a wonderful markup language, but not always an optimal metadata 
standard. 

2) XML is for serializing, not for storage.

3) Solr is a fantastic indexing tool, but it's neither a datastore nor a 
database.

4) Collections do not have an absolutely correct structure. Archivists and 
scholars disagree sometimes.

5) The best ways to describe an individual entity are not necessarily the best 
ways to describe the relationships between entities.

We assemble digital surrogates for archival collections as assemblages of 
Fedora objects linked together by RDF. When we start with a finding aid, we 
disassemble the EAD to develop a graph of documents, containers, series, etc. 
in Fedora, with RDF predicates along the lines of "isConstituentOf", 
"hasCollectionMember", etc. When we haven't got a finding aid, we build up the 
graph from annotations on the physical objects (boxes, folders, etc.) as they 
are processed for scanning. Obviously, we get a much simpler graph that way, 
because no claims have been made by archivists about the structure of the 
collection. Descriptive and other metadata is stored with each object in MODS 
and other good -metadata- formats. A document object has metadata that pertain 
only to the document (along with any data that permits us to represent the 
document on-line, e.g. a scanned image or TEI text ), a folder object has 
metadata for that folder, etc. Since we want to offer EAD for a collection (or 
any piece thereof), we supply a Fedora behavior (dissemination) against any 
object, which behavior assembles a collection structure as "seen" from that 
object (by following the RDF graph), then recursively assembles the appropriate 
metadata and transforms it to produce EAD.
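
As a toy illustration of the disassembly step (the element names below are a drastically simplified stand-in for real EAD, and the predicates are just the informal ones mentioned above, not our production vocabulary):

```python
# Toy sketch: walk a (much-simplified, hypothetical) EAD fragment and
# emit RDF-style triples linking each component to its parent.
import xml.etree.ElementTree as ET

EAD = """<ead><archdesc id="coll">
  <c01 id="series1">
    <c02 id="folder1"/><c02 id="folder2"/>
  </c01>
</archdesc></ead>"""

def disassemble(elem, parent_id=None, triples=None):
    """Depth-first walk, recording parent/child containment triples."""
    if triples is None:
        triples = []
    this_id = elem.get("id")
    if this_id and parent_id:
        triples.append((this_id, "isConstituentOf", parent_id))
        triples.append((parent_id, "hasCollectionMember", this_id))
    for child in elem:
        # Elements without an id are transparent: children attach
        # to the nearest identified ancestor.
        disassemble(child, this_id or parent_id, triples)
    return triples

triples = disassemble(ET.fromstring(EAD))
for t in triples:
    print(t)
```

The reverse operation (producing EAD "live") is essentially this walk run backwards: follow the graph out from an object, then merge per-object metadata into the nested structure.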

We like this approach because it offers a great deal of extensibility (we could 
imagine using more sophisticated RDF to account for different opinions about a 
collection, or offering a METS or other structured view as well) and it keeps 
the repository contents "idiomatic". We haven't yet figured out entirely how we 
bring this kind of content to Blacklight, but we'll be aided by the fact that 
we have appropriately-attached metadata for anything that should appear as a 
record in our indexes.

We're bringing the first part of this scheme (the assembly of object graphs) to 
production in the next fortnight or so. We've got the code ready and tested and 
are now enjoying the really fun stuff-- moving servers around and tinkering 
with clustering and the like. The second part (producing EAD "live") is waiting 
to go to production on some work from our cataloging dep't, who have assigned 
some staff to polish up the mappings involved. We have very simple mappings in 
place now, but not ones good enough to publish publicly. They're working away, 
and we hope to see something in production later this fall. As for how we 
provide discoverability, we'll start simply by indexing all these objects into 
our local Blacklight instance. There's no need to consider how to index 
highly-structured XML because we're not storing it. We can move on to providing 
special views for records with awareness of the relationships that Fedora has 
recorded on those objects and tools for discovering, visualizing, and 
following them. Unfortunately, our one Blacklight developer 
has plenty on her plate already, so I don't know how quickly we'll be able to 
look at that. In the meanwhile, we can simply style out the 
dynamically-constructed EAD as part of a Blacklight view for a given record, 
which isn't particularly exciting, but is useful.

---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library


Re: [CODE4LIB] Batch loading in fedora

2010-07-29 Thread [Your Name]
Mr. Banerjee--

I'll assume that you're using a recent version of Fedora (3.x series). 

You've got a number of methods at your disposal, including the batch ingest 
scripts, but if you can write some simple scripts, you can use Fedora's REST 
interface to create objects by:

POST to MY_FEDORA/objects/new
or 
POST to MY_FEDORA/objects/my-pid

If you use "new" Fedora will return a PID (persistent ID for the new object) or 
you can select one yourself (as shown above). In either case you can then use 
the PID of the object to add datastreams by:

POST to MY_FEDORA/objects/my-pid/datastreams/datastream-id

e.g. MY_FEDORA/objects/test:345/datastreams/DC 
and
MY_FEDORA/objects/test:345/datastreams/JPG

or the like. That's all you need do. I suggest that might be the simplest 
approach. Take a look at the documentation on the REST interface:

https://wiki.duraspace.org/display/FCR30/REST+API

and I think you'll find it pretty clear and useful. The "create an object" 
method I describe above is called "ingest" for historical reasons. You'll find 
its documentation at:

https://wiki.duraspace.org/display/FCR30/REST+API#RESTAPI-ingest

The "add a datastream" method is documented at:

https://wiki.duraspace.org/display/FCR30/REST+API#RESTAPI-addDatastream
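
A sketch of what such a script might start from in Python (the base URL and PIDs are placeholders; authentication and the various request parameters are left out, so consult the REST API docs above before actually POSTing):

```python
# Sketch of scripting the two REST calls above. The URL shapes follow
# the Fedora 3.x REST API; base URL and PIDs here are placeholders.

FEDORA = "http://localhost:8080/fedora"  # your Fedora base URL

def ingest_url(pid=None):
    # POST here creates an object; "new" asks Fedora to mint the PID.
    return f"{FEDORA}/objects/{pid or 'new'}"

def add_datastream_url(pid, dsid):
    # POST here attaches a datastream (e.g. DC, JPG) to an object.
    return f"{FEDORA}/objects/{pid}/datastreams/{dsid}"

print(ingest_url())
print(ingest_url("test:345"))
print(add_datastream_url("test:345", "DC"))
# To actually send the requests you would POST to these URLs with
# your HTTP client of choice (curl, urllib, etc.), looping over your
# DC records and image files.
```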

Good luck!

---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library



On Jul 28, 2010, at 11:00 PM, CODE4LIB automatic digest system wrote:

> From: Kyle Banerjee 
> Date: July 28, 2010 7:36:34 PM EDT
> Subject: Batch loading in fedora
> 
> 
> Howdy all,
> 
> I've never used fedora before, and I've found myself in the position of
> needing to load thousands of objects with metadata into a test instance
> which I don't maintain (and won't have to maintain later). Timeline is tight
> so I need to figure this out fast.
> 
> The data are very simple -- DC records which correspond with a bunch of
> image files that I have
> 
> In the short time I've had to familiarize myself with fedora today, it
> appears that I need to use the batch utility. I've found some documentation,
> most notably at
> http://www.fedora-commons.org/download/1.0/userdocs/client/admin-gui/batch/
> but
> I'm wondering where else I need to look.
> 
> If someone could point me to documentation (or better yet, an example) that
> a n00b could use fairly quickly prep what I have so that it will be ready to
> go in, it would be highly appreciated. I'm concerned that if I try to plow
> through documentation until I get it, I'll run out of time.
> 
> Thanks,
> 
> kyle
> 
> -- 
> --
> Kyle Banerjee
> Digital Services Program Manager
> Orbis Cascade Alliance
> baner...@uoregon.edu / 503.999.9787


Re: [CODE4LIB] Running a repository on Debian Stable

2010-04-09 Thread [Your Name]
>  Mike Taylor writes:
> Fedora,
> 
>  The problem there, as I understand it is that Fedora expects
>  everything to be in one directory. This setup is inimical to the
>  Debian setup.

Personally, I would think that Fedora is well beyond anything you're describing 
as desired, but just as a point of general information:

If by the above you mean that Fedora requires the web-app, object store, 
indexes, etc. to be in one directory, this is certainly not the case. A simple 
default install will indeed put all these inside one directory, along with a 
Apache Tomcat install and Apache Derby plant (if you ask for those things to be 

---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library


Re: [CODE4LIB] Auto-suggest and the id.loc.gov LCSH web service

2009-12-07 Thread [Your Name]
How about feeding back from web request stats?

Things that get pulled more often are probably more popular.

It's admittedly not very clever, but it would be easy to implement...
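
A minimal sketch of what I mean, with invented request counts:

```python
# Sketch: rank autosuggest candidates by how often each heading has
# been requested. The identifiers and counts below are invented.
from collections import Counter

# In practice this would come from your web server's access logs.
request_log = [
    "sh85026371",  # requested three times
    "sh85026371",
    "sh85046004",  # requested once
    "sh85026371",
]

popularity = Counter(request_log)

def rank_suggestions(candidates):
    # Most-requested headings first; never-requested ones sort last.
    return sorted(candidates, key=lambda h: -popularity[h])

print(rank_suggestions(["sh85046004", "sh85026371", "sh99999999"]))
```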

---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library



On Dec 7, 2009, at 5:56 PM, Ed Summers wrote:

> On Mon, Dec 7, 2009 at 3:43 PM, LeVan,Ralph  wrote:
>> For VIAF, rankings are calculated based on the number of institutions
>> that have controlled that name and the amount of attention the
>> institutions have given to that name (e.g. size of their respective name
>> authority records).
> 
> Neat. It would be great to have some external dataset to use in
> ranking LCSH suggestions at id.loc.gov. But at the moment it's a
> simple mysql db loaded up with some MARC LCSH data. I guess it could
> do something smart with PageRank-like ranking of 'super-concepts'
> (concepts that are linked to a lot)...but that would've taken longer
> than 20 minutes :-)
> 
> //Ed


[CODE4LIB] new mailing list for XForms in libraries

2009-12-03 Thread [Your Name]
There's been some interest lately on this list in the use of W3C XForms for 
library metadata (e.g. MODS, EAD, VRA Core...). Several institutions have 
committed in one degree or another to their use, and many more are 
investigating the possibility. To provide a venue for more specific discussion 
(implementations, code sharing, etc.) I've created a list at:

https://list.mail.virginia.edu/mailman/listinfo/xforms4lib

I hope we can generate some useful discussion there, and perhaps even some 
partnership-building. As my colleague Ethan Gruber has pointed out to me, there 
are at least four or five institutions implementing MODS editors alone. It 
would seem that there's a lot of room to help each other.

---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library


Re: [CODE4LIB] XForms EAD editor sandbox available

2009-11-13 Thread [Your Name]
In discussion with colleagues around this topic, the question of  
controlled vocabularies has been prominent. We're looking to move away  
from list instances that are packed into the XForm at render time to  
lists that are exposed from other services through REST interfaces,  
which can be dynamically coupled into a form.
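
The service side of such a REST-exposed vocabulary could be as simple as this sketch (the element names are invented for illustration, not any standard; an XForm would then load the result dynamically, e.g. via an instance whose src points at the endpoint):

```python
# Sketch: render a controlled vocabulary as a small XML document that
# a REST endpoint could serve and an XForm could load at runtime.
import xml.etree.ElementTree as ET

def vocabulary_xml(terms):
    """Serialize (value, label) pairs as a flat XML vocabulary list."""
    root = ET.Element("vocabulary")
    for value, label in terms:
        item = ET.SubElement(root, "item", value=value)
        item.text = label
    return ET.tostring(root, encoding="unicode")

xml_doc = vocabulary_xml([("vra:painting", "Painting"),
                          ("vra:print", "Print")])
print(xml_doc)
```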


On the other hand, 4 seconds is really not terribly long. {grin}

---
A. Soroka
Digital Research and Scholarship R & D
the University of Virginia Library



On Nov 13, 2009, at 12:45 PM, Ford, Kevin wrote:

We've been using Orbeon forms for about a year now for cataloging  
our digital collections.  We use Fedora Commons, so using the XML as  
input and outputting to XML seemed a no brainer.  It has worked very  
nicely for editing VRA Core4 records. But, instead of doing anything  
terribly fancy with Orbeon, we simply use the little sandbox  
application that comes with Orbeon (there's an online demo [1]).   
The URL to the XForm is part of the query string. This solution has  
greatly reduced our time investment in making Orbeon part of our  
workflow and, more importantly, getting Orbeon to work for us.  All  
that being said, Ethan's sharp looking EAD editor makes me jealous  
that we haven't created our own custom editor.


As for Orbeon's performance, once we worked out some quirks, we've  
been quite happy with Orbeon.  Orbeon hosts a useful performance and  
tuning page [2].  We also learned that it is helpful to stop the  
Orbeon app and restart it about once every two weeks as performance  
can become progressively slower.  It seems to need a little reboot.   
In any event, a typical XForm for us is about 200k, with a number of  
authority lists, one of which includes nearly 1500 items.  Orbeon  
loads and renders the XForm fairly quickly (less than 4 seconds) and  
editing performance hasn't been an issue either, which is great  
considering that a 1500-item-subject-authority drop down list is  
created for each subject being added to a record.


Moving such a large XForm to a server-based solution was necessary.   
Our XForm cataloging application, which began with a simple DC  
record and focused on producing a viable XForm, initially used the  
Mozilla XForm add-on [3].  The Firefox add-on, which of course runs  
on the client, easily scaled for a VRA Core4 record, but it couldn't  
handle a burgeoning subject authority file.  Hence the need for an  
alternative solution, quick.


-Kevin

[1] http://www.orbeon.com/ops/xforms-sandbox/
[2] http://wiki.orbeon.com/forms/doc/developer-guide/performance-tuning

[3] http://www.mozilla.org/projects/xforms/

--
Kevin Ford
Library Digital Collections
Columbia College Chicago



-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf  
Of Andrew Ashton

Sent: Friday, November 13, 2009 8:37 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] XForms EAD editor sandbox available

Nice job, Ethan.  This looks really cool.

We have an Orbeon-based MODS editor, but I have found Orbeon to be a  
bit
tough to develop/maintain and more heavyweight than we really need.   
We're
considering more Xforms implementations, but I would love to find a  
more

lightweight Xforms application.  Does anyone have any recommendations?

The only one I know of is XSLTForms (http://www.agencexml.com/xsltforms) but

I haven't messed with it yet.

-Andy

On 11/13/09 9:13 AM, "Eric Hellman"  wrote:


XForms and Orbeon are very interesting tools for developing metadata
management tools.

The ONIX developers have used this stack to produce an interface  
for ONIX-PL

called OPLE that people should try out.

http://www.jisc.ac.uk/whatwedo/programmes/pals3/onixeditor.aspx

Questions about Orbeon relate to performance and integrability, but  
I think

it's an impressive use of XForms nonetheless.

- Eric

On Nov 12, 2009, at 1:30 PM, Ethan Gruber wrote:


Hello all,

Over the past few months I have been working on and off on a  
research
project to develop a XForms, web-based editor for EAD finding aids  
that runs
within the Orbeon tomcat application.  While still in a very early  
alpha
stage (I have probably put only 60-80 hours of work into it thus  
far), I
think that it's ready for a general demonstration to solicit  
opinions,

criticism, etc. from librarians, and technical staff.

Background:
For those not familiar with XForms, it is a W3C standard for  
creating
next-generation forms.  It is powerful and can allow you to create  
XML in
the way that it is intended to be created, without limits to  
repeatability,
complex hierarchies, or mixed content.  Orbeon adds a level on top  
of that,
taking care of all the ajax calls, serialization, CRUD operations,  
and a

variety of widgets that allow nice features like tabs and
autocomplete/autosuggest that can be bound to authority lists and  
controlled
access terms.  By default, Orbeon reads and writes data from and  
to an eXist
database that comes packaged wi