[CODE4LIB] Code4Lib Journal, Issue 26 is now available!

2014-10-21 Thread Kelley McGrath
The Code4Lib Journal, Issue 26 is now available!

http://journal.code4lib.org/issues/issue26

Here is what you will find inside:  

Editorial Introduction: On Being on The Code4Lib Journal Editorial Committee
Kelley McGrath
Behind the scenes of The Code4Lib Journal...

Archiving the Web: A Case Study from the University of Victoria
Corey Davis
The University of Victoria Libraries started archiving websites in 2013, and it 
quickly became apparent that many scholarly websites being produced by faculty, 
especially in the digital humanities, were going to prove very challenging to 
capture and play back effectively. This article provides an overview of web 
archiving and explores the considerable legal and technical challenges of 
implementing a web archiving initiative at a research library. It uses the 
University of Victoria's implementation of Archive-It, a web archiving service 
from the Internet Archive, as a case study, with a special focus on capturing 
the complex, interactive websites that scholars are creating to disseminate 
their research in new ways.

Technical Challenges in Developing Software to Collect Twitter Data
Daniel Chudnov, Daniel Kerchner, Ankushi Sharma and Laura Wrubel
Over the past two years, George Washington University Libraries developed 
Social Feed Manager (SFM), a Python and Django-based application for collecting 
social media data from Twitter. Expanding the project from a research prototype 
to a more widely useful application has presented a number of technical 
challenges, including changes in the Twitter API, supervision of simultaneous 
streaming processes, management, storage, and organization of collected data, 
meeting researcher needs for groups or sets of data, and improving 
documentation to facilitate other institutions' installation and use of SFM. 
This article will describe how the Social Feed Manager project addressed these 
issues, its use of supervisord to manage processes, and other technical 
decisions made in the course of this project through late summer 2014. This 
article is targeted towards librarians and archivists who are interested in 
building collections around web archives and social media data, and who have a 
particular interest in the technical work involved in applying software to the 
problem of building a sustainable collection management program around these 
sources.
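
As an illustration of the process-supervision piece (a sketch only, not SFM's 
actual code; the names are hypothetical), a collector that supervisord manages 
needs little more than a long-running entry point that exits cleanly on SIGTERM:

import json
import signal

running = True

def shutdown(signum, frame):
    # supervisord stops programs with SIGTERM; end the loop cleanly
    global running
    running = False

signal.signal(signal.SIGTERM, shutdown)

def collect(stream, path):
    # append each item from an iterable stream as one JSON object per line
    with open(path, "a") as out:
        for item in stream:
            if not running:
                break
            out.write(json.dumps(item) + "\n")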

Exposing Library Services with AngularJS
Jakob Voß and Moritz Horn
This article provides an introduction to the JavaScript framework AngularJS and 
specific AngularJS modules for accessing library services. It shows how 
information such as search suggestions, additional links, and availability can 
be embedded in any website. The ease of reuse may encourage more libraries to 
expose their services via standard APIs to allow usage in different contexts.

Hacking Summon 2.0 The Elegant Way
Annette Bailey and Godmar Back
Libraries have long been adding content and customizations to vendor-provided 
web-based search interfaces, including discovery systems such as ProQuest's 
Summon™. Unlike solutions based on using an API, these approaches augment 
the vendor-designed user interface using library-provided JavaScript code. 
Recently, vendors have been implementing such user interfaces using 
client-centric model-view-controller (MVC) frameworks such as AngularJS, which 
are characterized by the use of modern software engineering techniques such as 
domain-specific markup, data binding, encapsulation, and dependency injection.

Consequently, traditional approaches such as reverse-engineering the document 
object model (DOM) have become more difficult or even impossible to use because the 
DOM is highly dynamic, the templates used are difficult to discern, the 
vendor-provided JavaScript code is both encapsulated and partially obfuscated, 
and the data binding mechanisms impose a strict separation of model and view 
that discourages direct DOM manipulation. In fact, practitioners have started 
to complain that AngularJS-based websites such as Summon 2.0 are very difficult 
to enhance with custom content in a robust and efficient manner.

In this article, we show how to reverse-engineer the AngularJS-based Summon 2.0 
interface to discover the modules, directives, controllers, and services it 
uses, and we explain how we can use AngularJS's built-in mechanisms to create 
new directives and controllers that integrate with and augment the 
vendor-provided ones to add desired customization and interactions.

We have implemented several features that demonstrate our approach, such as a 
click-recording script, COinS and facet customization, and the integration of 
eBook public notes. Our explanation and code should be of direct use for 
adoption or as examples for other Summon 2.0 customers, but they may also be 
useful to anyone faced with the need to add enhancements to other 
vendor-controlled MVC-based sites.

Parsing and Matching Dates in VIAF
Jenny A. Toves and Thomas B. Hickey

[CODE4LIB] Code4Lib Journal call for proposals

2014-06-12 Thread Kelley McGrath
Call for Papers (and apologies for cross-posting):

The Code4Lib Journal (C4LJ) exists to foster community and share information 
among those interested in the intersection of libraries, technology, and the 
future.

We are now accepting proposals for publication in our 26th issue. Don't miss 
out on this opportunity to share your ideas and experiences. To be included in 
the 26th issue, which is scheduled for publication in mid-October 2014, please 
submit articles, abstracts, or proposals at 
http://journal.code4lib.org/submit-proposal or to jour...@code4lib.org by 
Friday, July 11, 2014. When submitting, please include the title or subject of 
the proposal in the subject line of the email message.

C4LJ encourages creativity and flexibility, and the editors welcome submissions 
across a broad variety of topics that support the mission of the journal. 
Possible topics include, but are not limited to:

* Practical applications of library technology (both actual and hypothetical)
* Technology projects (failed, successful, or proposed), including how they 
were done and challenges faced
* Case studies
* Best practices
* Reviews
* Comparisons of third party software or libraries
* Analyses of library metadata for use with technology
* Project management and communication within the library environment
* Assessment and user studies

C4LJ strives to promote professional communication by minimizing the barriers 
to publication. While articles should be of a high quality, they need not 
follow any formal structure. Writers should aim for the middle ground between 
blog posts and articles in traditional refereed journals. Where appropriate, we 
encourage authors to submit code samples, algorithms, and pseudo-code. For more 
information, visit C4LJ's Article Guidelines or browse articles from earlier 
issues published on our website: http://journal.code4lib.org.

Remember, for consideration for the 26th issue, please send proposals, 
abstracts, or draft articles to jour...@code4lib.org no later than Friday, July 
11, 2014.

Send in a submission. Your peers would like to hear what you are doing.


Code4Lib Journal Editorial Committee


Re: [CODE4LIB] Looking for two coders to help with discoverability of videos

2013-12-03 Thread Kelley McGrath
Thanks, Jon. I have seen the Variations work and also talked to Jenn Riley 
about it. It has definitely influenced me, although we are going in a slightly 
different direction and moving images have some different needs from music.

One thing about Variations that struck me is this paragraph from the usability 
testing report 
(http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/usability/usabilityTest/ScherzoUTestReport.pdf):

There was an assumption among the development team that works would be a 
window for organizing and narrowing results in a way that users searching for 
scores and recordings would find useful. One of the main ideas behind FRBR is 
that the work, or the intellectual entity that is produced by people and is 
packaged in many forms, is the core information – Scherzo’s interface reflected 
that organization. (See Appendix E, Fig. 14 for Scherzo’s search results 
page.) But the participants tended to latch on to a person’s name and search 
for that name in a particular role. The reasons for this are not completely 
clear and further discussion follows, but it is worth bearing this finding in 
mind. Additionally, from the search results page, work results were clicked 
only 14 times in comparison to items in recordings & scores, which were 
clicked 65 times. Regardless of how the FRBRized data is organized on the back 
end, the interface needs to reflect the way users want to search, and that 
might not mean with search results organized by work.

Does this mean that a work-focused approach is not actually what users want or 
need? Does it mean that the work-centered approach needs to be implemented 
differently in the user interface? Are these results somehow specific to music? 
Do they reflect users' familiarity with the typical library catalog and the 
strategies they've become accustomed to using?

It does suggest to me that there should be more studies on how users interact 
with FRBRized data (and not just the clustering that so many discovery 
interfaces do now, but real FRBR-based data) and how FRBRized data is best 
presented.

Kelley

On Tue, Dec 3, 2013 at 11:35 AM, Dunn, Jon William Butcher 
j...@iu.edu wrote:
Hi Kelley,

If you haven't already, you might want to look at the music score and sound 
recording FRBRization work done on the Variations-FRBR project here at Indiana 
University. I'm not sure how directly useful this would be for your work with 
moving images, but there may be some useful mapping ideas:

FRBR XML schemas: 
http://www.dlib.indiana.edu/projects/vfrbr/schemas/1.1/index.shtml

MARC-FRBR mapping specifications: 
http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/metadata/mappings/spring2010/vfrbrSpring2010mappings.shtml

Java FRBRization code and documentation: 
http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/index.shtml

Jon


Re: [CODE4LIB] Looking for two coders to help with discoverability of videos

2013-12-02 Thread Kelley McGrath
Well, that would be much easier, but most of what I am working with are records 
for physical items (DVD, VHS, film) or licensed streaming video. The sample 
records are also not all UO records so I don't necessarily even have access to 
the source material (our goal is to build a general purpose tool). So I think I 
am stuck with extracting from MARC.

We should be able to get data for some resources by matching the MARC up with 
external data sources. That won't work for everything, though, so we want to 
make the process of extracting data from MARC as effective as possible.

Kelley


On Mon, Dec 2, 2013 at 7:03 AM, Alexander Duryee 
alexanderdur...@gmail.com wrote:
Is it out of the question to extract technical metadata from the
audiovisual materials themselves (via MediaInfo et al)?  It would minimize
the amount of MARC that needs to be processed and give more
accurate/complete data than relying on old cataloging records.


Re: [CODE4LIB] Looking for two coders to help with discoverability of videos

2013-12-02 Thread Kelley McGrath
Robert,

Your work also sounds very interesting and definitely overlaps with some of 
what we want to do. It seems like a lot of people are trying to get useful 
format information out of MARC records and it's unfortunate that it is so 
complicated. I would be very interested to see your logic for determining 
format and dealing with self-contradictory records. Runtime from the 008 is, as 
you say, pretty straightforward, but it is not always filled in and it is 
useless if the resource is longer than 999 minutes.

It's interesting that you mention identifying directors. We have also been 
working on a similar, although more generalized, process. We're trying to 
identify all of the personal and organizational names mentioned in video 
records and, where possible, their roles. Our existing process is pretty 
accurate for personal names and for roles in English. It tends to struggle with 
credits involving multiple corporate bodies and we're working on building a 
lexicon of non-English terms for common roles. We're also trying to get people 
to hand-annotate credits to build a corpus to help us improve our process. 
(Help us out at http://olac-annotator.org/. And if you're willing to be on call 
to help with translating non-English credits, email me with the language(s) 
you'd be able to help out with. We also just started a mailing list at 
https://lists.uoregon.edu/mailman/listinfo/olac-credits)

Matching MARC records for moving images with external data sources is also on 
our radar. Most feature film type material can probably be identified by the 
attributes you mention: title, original date and director (probably 2 out of 3 
would work in most cases). We are also hoping to use these attributes (and 
possibly others) to cluster records for the same FRBR work.
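
For illustration, a minimal sketch of that two-out-of-three idea (the record 
fields here are hypothetical, and real matching would need normalization):

def probably_same_work(a, b):
    # count agreements on title, original date and director
    points = 0
    if a["title"].casefold() == b["title"].casefold():
        points += 1
    if a["date"] == b["date"]:
        points += 1
    if a["director"].casefold() == b["director"].casefold():
        points += 1
    return points >= 2

us = {"title": "Dracula", "date": "1931", "director": "Tod Browning"}
uk = {"title": "Dracula", "date": "1931", "director": "Browning, Tod"}
print(probably_same_work(us, uk))  # True on title and date, despite the
                                   # inverted form of the director's name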

It would be great to talk with you more about this off-list.

Kelley
kell...@uoregon.edu

From: Robert Haschart [rh...@virginia.edu]
Sent: Monday, December 02, 2013 10:49 AM
To: Code for Libraries
Cc: Kelley McGrath
Subject: Re: [CODE4LIB] Looking for two coders to help with discoverability of 
videos

Kelley,

The work you are proposing is interesting and overlaps somewhat both
with work I have already done and with a new project I'm looking into
here at UVa.
I have been the primary contributor to the Marc4j java project for the
past several years and am the creator of the project SolrMarc which
extracts data from Marc records based on a customizable specification,
to build Solr index records to facilitate rich discovery.

Much of my work on creating and improving these projects has been in
service of my actual job of creating and maintaining the Solr Index
behind our Blacklight-based discovery interface.   As a part of that
work I have created custom SolrMarc routines that extract the format of
items similar to what is described in Example 3, including looking in
the leader, 006, 007 and 008 to determine the format as-coded but
further looking in the 245 h, 300 and 538 fields to heuristically
determine when the format as-coded is incorrect and ought to be
overridden.   Most of the heuristic determination is targeted towards
Video material, and was initiated when I found an item that due to a
coding error was listed as a Video in Braille format.

Further, I have developed a set of custom routines that look more closely
at Video items, one of which already extracts the runtime from the
008[18-20] field. Modifying it from its current form, which returns the
runtime in minutes, to instead return it as HH:MM as specified in your xls
file, and to further handle the edge case of 008[18-20] = 000 by returning
"over 16:39", would literally take about 15 minutes.
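
For concreteness, a minimal sketch of that conversion (assuming the three
characters at 008[18-20] have already been pulled out of the record):

def format_runtime(code):
    # MARC codes runtime in whole minutes; "000" means 1000 minutes or more,
    # "nnn" not applicable, "---" unknown
    if code == "000":
        return "over 16:39"  # 999 minutes, the maximum expressible, is 16:39
    if not code.strip().isdigit():
        return None
    return "%d:%02d" % divmod(int(code), 60)

print(format_runtime("090"))  # 1:30
print(format_runtime("000"))  # over 16:39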

Another of these custom routines, one that is more fully formed, is code for
extracting the Director of a video from the Marc record.  It examines
the contents of the fields 245c, 508a, 500a, 505a, 505t, employing
heuristics and targeted natural language processing techniques, to
attempt to correctly extract the Director.   At this point I believe
it achieves better results than a careful cataloger would achieve, even
one who specializes in film and video.
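
As a toy illustration of the kind of pattern involved (the real routine is
far more elaborate than a single regular expression):

import re

DIRECTED_BY = re.compile(
    r"direct(?:ed|ion)\s+by[:\s]+([A-Z][^;,/.\[]+)", re.IGNORECASE)

def guess_director(text):
    # scan statement-of-responsibility or credits text for "directed by ..."
    m = DIRECTED_BY.search(text)
    return m.group(1).strip() if m else None

print(guess_director("produced and directed by Tod Browning ; screenplay..."))
# Tod Browning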

The other project I have just started investigating is an effort to
create and/or flesh out Marc records for video items based on heuristic
matching of title and director and date with data returned from
publicly-accessible movie information sites.

This more recent work may not be relevant to your needs but the custom
extraction routines seem directly applicable to your goals, and may also
provide a template that may make your other goals more easily achievable.

-Robert Haschart


Re: [CODE4LIB] Looking for two coders to help with discoverability of videos

2013-12-01 Thread Kelley McGrath
I wanted to follow up on my previous post with a couple points.

1. This is probably too late for anybody thinking about applying, but I thought 
there may be some general interest. I have put up some more detailed 
specifications about what I am hoping to do at 
http://pages.uoregon.edu/kelleym/miw/. Data extraction overview.doc is the 
general overview and the other files contain supporting documents.

2. I replied some time ago to Heather's offer below about her website that will 
connect researchers with volunteer software developers. I have to admit that 
looking for volunteer software developers had not really occurred to me. 
However, I do have additional things that I would like to do for which I 
currently have no funding, so if you would be interested in volunteering in the 
future, let me know.

Kelley
kell...@uoregon.edu


On Tue, Nov 12, 2013 at 6:33 PM, Heather Claxton 
claxt...@gmail.com wrote:
Hi Kelley,

I might be able to help in your search.   I'm in the process of starting a
website that connects academic researchers with volunteer software
developers.  I'm looking for people to post programming projects on the
website once it's launched in late January.   I realize that may be a
little late for you, but perhaps the project you mentioned in your PS
(clustering based on title, name, date, etc.) would be perfect?  The
one caveat is that the website is targeting software developers who wish to
volunteer.   Anyway, if you're interested in posting, please send me an
e-mail at sciencesolved2...@gmail.com
I would greatly appreciate it.
Oh and of course it would be free to post  :)  Best of luck in your
hiring process,

Heather Claxton-Douglas


On Mon, Nov 11, 2013 at 9:58 PM, Kelley McGrath 
kell...@uoregon.edu wrote:

 I have a small amount of money to work with and am looking for two people
 to help with extracting data from MARC records as described below. This is
 part of a larger project to develop a FRBR-based data store and discovery
 interface for moving images. Our previous work includes a consideration of
 the feasibility of the project from a cataloging perspective (
 http://www.olacinc.org/drupal/?q=node/27), a prototype end-user interface
 (https://blazing-sunset-24.heroku.com/,
 https://blazing-sunset-24.heroku.com/page/about) and a web form to
 crowdsource the parsing of movie credits (
 http://olac-annotator.org/#/about).
 Planned work period: six months beginning around the second week of
 December (I can be somewhat flexible on the dates if you want to wait and
 start after the New Year)
 Payment: flat sum of $2500 upon completion of the work

 Required skills and knowledge:

   *   Familiarity with the MARC 21 bibliographic format
   *   Familiarity with Natural Language Processing concepts (or
 willingness to learn)
   *   Experience with Java, Python, and/or Ruby programming languages

 Description of work: Use language and text processing tools and provided
 strategies to write code to extract and normalize data in existing MARC
 bibliographic records for moving images. Refine code based on feedback from
 analysis of results obtained with a sample dataset.

 Data to be extracted:
 Tasks for Position 1:
 Titles (including the main title of the video, uniform titles, variant
 titles, series titles, television program titles and titles of contents)
 Authors and titles of related works on which an adaptation is based
 Duration
 Color
 Sound vs. silent
 Tasks for Position 2:
 Format (DVD, VHS, film, online, etc.)
 Original language
 Country of production
 Aspect ratio
 Flag for whether a record represents multiple works or not
 We have already done some work with dates, names and roles and have a
 framework to work in. I have the basic logic for the data extraction
 processes, but expect to need some iteration to refine these strategies.

 To apply please send me an email at kelleym@uoregon explaining why you
 are interested in this project, what relevant experience you would bring
 and any other reasons why I should hire you. If you have a preference for
 position 1 or 2, let me know (it's not necessary to have a preference). The
 deadline for applications is Monday, December 2, 2013. Let me know if you
 have any questions.

 Thank you for your consideration.

 Kelley

 PS In the near future, I will also be looking for someone to help with
 work clustering based on title, name, date and identifier data from MARC
 records. This will not involve any direct interaction with MARC.


 Kelley McGrath
 Metadata Management Librarian
 University of Oregon Libraries
 541-346-8232
 kell...@uoregon.edu



[CODE4LIB] OLAC Movie & Video Credit Annotation Experiment

2013-10-14 Thread Kelley McGrath
This project may be of interest to some on this list as an experiment to 
explore extracting structured data from free text in MARC. You also have a 
chance to help make it easier to find film and video in libraries if you're 
willing to take a few minutes to participate.

OLAC (http://www.olacinc.org/) is working on project to try to make the process 
of finding film and video in library catalogs better. Please help us by 
annotating some film and video credits at http://olac-annotator.org/. It only 
takes a few minutes to make a contribution. We are challenging OLAC members to 
annotate three credits per day this week to see how many we can get done. 
Please join us in this endeavor. We are especially looking for people who know 
languages other than English to help us translate credits in languages from 
Chinese to Spanish to Urdu. Full announcement below. Please share this 
information with anyone you think might be interested.

Kelley

***

The OLAC Movie & Video Credit Annotation Experiment (http://olac-annotator.org) 
is part of a larger project to make it easier to find film and video in 
libraries and archives. In the current phase, we're trying to break existing 
MARC movie records down and pull out all the cast and crew information so that 
it may be re-ordered and manipulated. We also want to make explicit connections 
between cast and crew names and their roles or functions in the movie 
production. Adding these formal connections to movie records will allow us to 
provide a better user experience. For example, library patrons would be able to 
search just for directors or just for cast members or only for movies where 
Clint Eastwood is actually in the cast rather than all the movies that he is 
connected with. Libraries would have the flexibility to create more 
standardized and readable displays of production credits, such as you see at 
IMDb (see http://www.imdb.com/title/tt1205489/ -- not that we necessarily want 
IMDb's display, but that we would have much more flexibility in designing 
displays), rather than views like a typical library catalog (such as 
http://janus.uoregon.edu/record=b3958782).

We therefore want to convert our existing records into more structured sets of 
data. Eventually, we intend to automate most of this conversion. For now, we 
need help from human volunteers, who can train our software to recognize the 
many ways names and roles have been listed in library records for movies. Give 
us a hand at http://olac-annotator.org. For an explanation with more library 
jargon thrown in, see http://olac-annotator.org/#/more.

The OLAC Movie  Video Credit Annotation Experiment was conceived by Kelley 
McGrath, developed by Chris Fitzpatrick and funded by a Richard and Mary 
Corrigan Solari Library Fellowship Incentive Award from the University of 
Oregon Libraries.


Kelley McGrath
Metadata Management Librarian
University of Oregon Libraries
541-346-8232
kell...@uoregon.edu


[CODE4LIB]

2012-11-27 Thread Kelley McGrath
I'll second the idea of approaching people individually and explicitly asking 
them to participate. It worked on me. I never would have written my first 
article for the Code4Lib Journal or become a member of the editorial committee 
if someone hadn't encouraged me individually (Thanks Jonathan!).

It would also be good to find a way to somehow target the pool of lurkers who 
maybe aren't already connected to someone and get them more involved.

As far as anonymous proposals go, we recently had a very good workshop on 
implicit bias here. Someone brought up a study that found significant changes in the 
gender proportions in symphony orchestras after candidates started auditioning 
behind screens. There are also lots of studies about the different responses to 
the same resume/application depending on whether a stereotypically male/female 
or white/black name was used. Probably it's impossible to make proposals 
completely anonymous, but it would be an interesting experiment to leave off 
the names.

Kelley

PS Interestingly, I wouldn't instinctively self-identify as a member of the 
Code4Lib community, although my first thought is that that has more to do with 
not being a coder than with being a woman.


**
Kelley McGrath
Metadata Management Librarian
University of Oregon Libraries 
1299 University of Oregon
Eugene, OR 97403

541-346-8232
kell...@uoregon.edu


Re: [CODE4LIB] Code4Lib Journal: Editors Wanted

2012-04-23 Thread Kelley McGrath
Hello, again!

Just a reminder that the deadline to apply for the Code4Lib Journal editorial 
committee is Monday, April 30.

If you have been thinking about the call for new editors, I encourage you to 
apply. It's a great opportunity to contribute to something that makes a real 
difference to the library community by improving the dissemination of 
innovative and practical ideas. In my experience, it's both a lot work and a 
lot of fun and the editorial committee is made up of dedicated, supportive 
people who are great to work with.

Questions? Ask me (kell...@uoregon.edu) or anybody on the editorial committee 
(http://journal.code4lib.org/editorial-committee) or all of us 
(jour...@code4lib.org).

Kelley



[CODE4LIB] Code4Lib Journal: Editors Wanted

2012-04-10 Thread Kelley McGrath
The Code4Lib Journal (http://journal.code4lib.org/) is looking for volunteers 
to join its editorial committee.  Editorial committee members work 
collaboratively to produce the quarterly Code4Lib Journal.  Editors are 
expected to:

* Read, discuss, and vote on incoming proposals.
* Volunteer to be the assigned editor or second reader for specific proposals.
** Assigned editors work with the author(s) to make sure the article is as 
strong as possible, that the copy is clean, and deadlines are met.  They also 
enter the article into WordPress, making sure the formatting is okay, all 
images and tables look good, etc.
** Second readers act as a second set of eyes for the assigned editor.
* Read and comment on any other article that interests you.
* Volunteer for administrative tasks and projects as they crop up.
* Take a turn as Coordinating Editor for an Issue.  The Coordinating Editor 
shepherds the issue through its life cycle.

We seek an individual who is self-motivated, organized and able to meet 
deadlines; is familiar with ideas and trends in the field; and has an interest 
in the mechanics of writing.  There is a sometimes significant time commitment 
involved; expect to set aside ten or more hours a month.

It sounds like a lot of work, but it's also a lot of fun (if editing is your 
idea of fun).

Intrigued?  Please send a letter of interest by Monday, April 30 to 
jour...@code4lib.org. Your letter should address these two basic questions:

1) What is your vision for the Code4Lib Journal? Why are you interested in it?

2) How can you contribute to the Code4Lib Journal, i.e. what do you have to 
offer?

We encourage people who have previously applied and who are still interested to 
re-apply. We have had to turn down a lot of highly-qualified people in the past 
due to the large number of applications.

If you have any questions, contact us by email at jour...@code4lib.org or ask 
any member of the editorial committee (listed at 
http://journal.code4lib.org/editorial-committee). We plan to make decisions 
about additional editors by mid-May.

Kelley McGrath
on behalf of the Code4Lib Editorial Committee


[CODE4LIB] Finding movies with FRBR facets lightning talk: expanded version

2012-02-26 Thread Kelley McGrath
Since my lightning talk at the Code4Lib conference only really talked 
about the prototype discovery interface, it ended up giving an 
incomplete sense of the overarching project. If anyone is interested, I 
put up an expanded version of my slides that includes more of the bigger 
picture of where we're going to get the data from and how we hope to 
cooperatively maintain a central data store of information about moving 
image materials in libraries to drive the user interface.


http://pages.uoregon.edu/kelleym/publications/FRBRFacets_C4L2012.pdf

Kelley McGrath
Metadata Management Librarian
University of Oregon
kell...@uoregon.edu


Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your brains

2011-11-18 Thread Kelley McGrath
 to account for.

I have no idea how a computer would know whether ill. ought to map to
illustration or illustrations in most cases since the distinction was not
recorded. Perhaps illustration(s) would work.

That doesn't even start to address mistakes in data, allowing for older
rules (AACR1's illus.), non-English-language records or local practices that
go against the rules. All this is not to say that there isn't a real need
here. There ought to be a way to both minimize the amount of typing that
catalogers have to do and at the same time provide full, unambiguous
displays for users.
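
A toy sketch of the expansion problem, assuming a hand-built lookup table
(where the original distinction was never recorded, an ambiguous expansion
may be the best a program can do):

ABBREVIATIONS = {
    "ill.": "illustration(s)",
    "illus.": "illustration(s)",  # older AACR1 form
}

def expand(term):
    return ABBREVIATIONS.get(term.lower(), term)

print(expand("illus."))  # illustration(s)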

So what I wish is that there were some way to get more catalogers to see
that despite Watson, there are serious limitations to what computers can
practically do and that we would be better off if we worked with computers'
strengths instead of trying to make them do things that are hard for them to
do so we can reproduce the form of the card catalog (as opposed to the
function).

Kelley

On Fri, Nov 18, 2011 at 8:26 AM, Bohyun Kim k...@fiu.edu wrote:
 As a side note to this, the communication issue is not unique between
catalogers and coders. It is a common discussion topic (librarians vs. IT;
emerging technology librarians vs. library coders; even web designers vs.
web developers).  I hear about this a lot in library conferences. But of
course, discussion there is mostly from the librarians' point of view. Since
code4lib is unique in that many library coders get together, it would be
good to hear the thoughts on this from the coders' point of view as well.

 ~Bohyun


Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your brains

2011-11-17 Thread Kelley McGrath
I am not by any stretch of the imagination a coder, but I think it would be
helpful to have some discussion of common cataloger-coder communication
issues. So many cataloger-coder discussions online seem to consist of people
talking past each other (although I do think there is a much larger and less
vocal common ground in the middle). In addition, I have sometimes seen my
cataloger and coder/IT colleagues struggle to communicate with each other
and find myself trying to translate. Are there ways to make that translation
process easier or cultivate more translators? What do coders wish that
catalogers knew about how computers interact with metadata?

I would also be interested in ideas on how to shift the conversation more
towards underlying functionality. A central failing of computerized catalogs
IMO is that they tend to replicate the literal form and actions of cards and
the card catalog rather than tried to find a way to express the underlying
functionality of the card catalog in a computer environment. This is also
sometimes badly done because the programmers don't understand the point of
what they're replicating (although to be fair, what they're trying to work
with is often not in a form optimized for a computer environment). Uniform
titles in many catalogs are a good example of this.

Kelley

PS Some of the other emails mention wanting help with understanding where
real data differs from what's in specifications or differs over time or for
other reasons. Speaking as a reasonably competent cataloger, I would say
that, although some things can be anticipated in advance, I find this to
inevitably be an iterative process.

PPS I'm looking forward to attending.

On Thu, Nov 10, 2011 at 11:14 AM, Becky Yoose b.yo...@gmail.com wrote:
 Hey folks,

 There's been increasing discussion and interest about cataloging 
 around this community (and others like it) for quite a while. I found 
 some co-conspirators and we are planning to propose a pre-conference 
 on cataloging/library metadata creation geared towards the huddled 
 code4lib masses (otherwise known as coders) who are yearning for 
 knowledge of this Darkest of Library Arts.

 We need your help before we post our proposal. We realize that there's 
 a wide range of cataloging knowledge and experience in the community, 
 and we want to make sure that those interested get the most out of the 
 pre-conference. If this pre-conference has piqued your interest, can 
 you help us in letting us know:

 - What experience do you have with cataloging/library metadata creation?
 - What do you want us to cover? Do you have any questions that you 
 want covered?

 This information will help us greatly in how we structure the 
 pre-conference both in content and schedule. For now, we're planning a 
 half-day pre-conference, but if there's enough interest between 
 beginners and more experienced folks, we will consider offering two 
 half-day preconferences in order to focus on specific participant needs.

 Feel free to ask questions as well - I'll try to answer them as best 
 as possible given what our group has brainstormed so far.

 Thanks for reading,
 Becky
 Official cat[aloger] herder


 ---
 Becky Yoose
 Systems Librarian
 Grinnell College Libraries
 yoose...@grinnell.edu



Re: [CODE4LIB] LCSH and Linked Data

2011-04-18 Thread Kelley McGrath

On Sun, Apr 17, 2011 at 7:40 AM, Simon Spero s...@unc.edu wrote:


The main study on this subject was the Michigan study performed/led by Karen
Markey (some reports were written as Karen M. Drabenstott). The final report
of the project is available at
http://deepblue.lib.umich.edu/handle/2027.42/57992 . The work took place in
the mid to late 90s, after Airlie.

...

The most perplexing results were those that showed that measured
understanding was lower when headings were displayed in the context of a
bibliographic record rather than on their own. This indicates either a
problem in the measurement process, or an even more fundamental problem
with subdivided headings that may so negate the significant theoretical
advantages of pre-coordination that the value of the whole practice is
thrown into doubt.


That is fascinating. And disturbing. I don't think I ever read the 
original study, but now I'll have to.


Touching on another topic, I believe that the movement of geographical
subdivisions to follow the rightmost geographically sub-dividable
subdivision can sometimes be interrupted by the interposition of a $x
topical subdivision, but I haven't determined whether this is a legacy
exception (the ones that came to mind were related to subtopics of the US
Civil War, which seems inevitable given that the first elements are United
States--History--Civil War, 1861-1865--).

I think the key here is partly "In 1992, it was decided to adopt that 
order where it could be applied," so LC didn't promise to do them all. 
$x History is probably the biggest one that hasn't been made 
geographically subdividable, but it's hard to say if that's on principle 
or because of practical concerns about the huge amount of disruption 
that would cause in individual systems. It's interesting that some of 
the biggies like economic aspects are more recent.


One of the challenges for pre-coordinated strings at least as currently 
implemented (that facets evade) is that no order will suit everyone. 
Which of the following is better?


Dwellings $z Australia $x History $y 20th century
Dwellings $z Indonesia $x Economic aspects
Dwellings $z Indonesia $x Psychological aspects
Dwellings $z Indonesia $x Social aspects
Dwellings $z Ireland $x Economic aspects
Dwellings $z Ireland $x Psychological aspects
Dwellings $z Ireland $x Social aspects
Dwellings $z Japan $x Economic aspects
Dwellings $z Japan $x Psychological aspects
Dwellings $z Japan $x Social aspects

OR (mostly current practice)

*Dwellings $z Australia $x History $y 20th century  **Current practice
Dwellings $x Economic aspects $z Indonesia
Dwellings $x Economic aspects $z Ireland
Dwellings $x Economic aspects $z Japan
*Dwellings $x History $z Australia $y 20th century  **Airlie recommendation

Dwellings $x Psychological aspects $z Indonesia
Dwellings $x Psychological aspects $z Ireland
Dwellings $x Psychological aspects $z Japan
Dwellings $x Social aspects $z Indonesia
Dwellings $x Social aspects $z Ireland
Dwellings $x Social aspects $z Japan

Probably not helpful to have history be an outlier, though.

Kelley


Re: [CODE4LIB] LCSH and Linked Data

2011-04-15 Thread Kelley McGrath
A few belated ramblings from a cataloger:

 

1) GEOGRAPHICAL SUBDIVISION

 

It used to be that geographical subdivision was much more flexible and was 
supposed to convey different meanings depending on where it occurred in the 
string. Then there was some research showing that not only did users not know 
how to interpret this, but catalogers did not understand these rules and were 
constructing inconsistent headings. This led to a movement for simplification. 
From LC's Subject Heading Manual:

 

The Subject Subdivisions Conference that took place at Airlie, Virginia, in 
1991 recommended that the standard order of subdivisions be 
[topic]–[place]–[chronology]–[form].  In 1992, it was decided to adopt that 
order where it could be applied. 

 

This leaves a standard order of $a, $b [rare], $x, $z, $y, $v with some 
exceptions.

 

As was pointed out earlier, the current rule is to put the geographic 
subdivision ($z) as near the end as is legal. This can be mechanically 
determined based on a fixed field in the authority record. Although fixed 
fields in bib records are often unreliable, those in authority records are 
probably as accurate as they can reasonably be made to be, allowing for human 
error. This is both because LC coordinates training and reviews records and 
because the fixed fields are used as decision points so there are short-term 
consequences for later catalogers if they're not done right.

 

The fixed field (008/06) in LCSH authority records tells you whether a 
geographic subdivision can come after the heading 
(http://www.loc.gov/marc/authority/ad008.html). Id.loc.gov doesn't seem to give 
you that info, but it might be nice if it did.

 

650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided 
geographically-indirect] $z England [n  82068148] $x Finance [sh2002007885, Geo 
Subd = # = Not subdivided geographically]

 

650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided 
geographically-indirect] $x Economic aspects [sh 99005484 Geo Subd = i = 
Subdivided geographically-indirect] $z England [n  82068148].

 

One reason not to rely on found order is that LC has been moving in the 
direction of the Airlie House recommendation so in addition to the usual 
mistakes, you'll probably come across a lot of older forms if you take data 
from the wild. For example, until somewhat recently, the economic aspects 
record above looked like the finance one so you'll probably still see records 
like 

 

650 _0 $a Education $z England $x Economic aspects.
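
In code, the placement rule might look something like this sketch (the input
structure is hypothetical; the flags are the 008/06 values from the topical
authority records):

def place_geographic(elements, geo):
    # elements: (subfield code, term, 008/06 flag) tuples in heading order;
    # put $z after the rightmost element coded 'i' (may subdivide
    # geographically, indirect)
    slot = 0
    for i, (code, term, flag) in enumerate(elements):
        if flag == "i":
            slot = i + 1
    return elements[:slot] + [(c, t, None) for c, t in geo] + elements[slot:]

heading = [("a", "Education", "i"), ("x", "Finance", "#")]
built = place_geographic(heading, [("z", "England")])
print(" ".join("$%s %s" % (c, t) for c, t, _ in built))
# $a Education $z England $x Finance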

 

A) Indirect Subdivision

 

In general, when a heading string starts with a geographic name, it is in 
direct order:

 

651 _0 $a London (England) [n  79005665] $x Economic conditions [sh 99005736].

 

If a geographic name is modifying a topical heading, it is given in indirect 
order:

 

650 _0 $a Education [sh 85040989] $z England $z London [n  79005665; covers 
both $z subfields].

 

Thanks to a project that OCLC did for FAST (which uses only the indirect 
style), in most cases both of these can be extracted from the authority record, 
which will have a 781 with the indirect form added:

 

n  79005665

151  $a London (England)

451  $a Londinium (England)

...

781 0 $z England $z London
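
With pymarc, pulling that 781-supplied indirect form out of a file of
authority records is a few lines (the file name here is made up):

from pymarc import MARCReader

with open("authorities.mrc", "rb") as fh:
    for record in MARCReader(fh):
        headings = record.get_fields("151")
        indirects = record.get_fields("781")
        if headings and indirects:
            # e.g. "London (England)" -> "England London"
            print(headings[0].format_field(), "->",
                  indirects[0].format_field())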

 

Some records (usually for geographic areas within cities) cannot be used to 
modify topical headings, but can be used in 651$a as the main term in a heading 
string. These are identified by a note and the lack of a 781.

 

n  85192245

151  $a Hackney (London, England)

667  $a SUBJECT USAGE: This heading is not valid for use as a geographic 
subdivision.

 

B) Geographic Entities and Name vs. Subject Headings

 

Notice that in the above example, the control number/identifier for Education 
starts with sh while the one for London starts with n. This is an important 
distinction. Heading identifiers that start with sh are LCSH terms found in the 
subject authority file and are available from id.loc.gov. I think these all 
fall into FRBR's group 3 bib entities. Heading identifiers that start with n 
are stored in the LC NAF (Name Authority File) and are not available as linked 
data. These are the FRBR group 1 and 2 entities and maybe some from group 3. 
Most of these can also be used as subjects in LCSH. So you can't actually get 
at all the building blocks of LCSH strings nor use linked data for all subjects.

 

Named geographic features (e.g., mountains, lakes, continents) are established 
in the subject authority file using the rules in the Subject Cataloging Manual 
for LCSH. The headings are tagged 151 and can be found at id.loc.gov.

 

sh 85082617 

151  $a McKinley, Mount (Alaska)

 

sh 85044620 

151  $a Erie, Lake

 

sh 85008606

151 $a Asia

 

Geographic features appear in bib records only as 651 or 650+ $z subject terms.

 

Jurisdiction names (e.g., cities, states, countries) are established in the 
name authority file using descriptive cataloging rules (e.g., AACR2 ch 23 and 
the NACO Participants' Manual).  They 

Re: [CODE4LIB] regexp for LCC?

2011-04-01 Thread Kelley McGrath
At one point, much to my surprise, someone told me that 050 is defined for
numbers assigned by LC, not for LCC numbers per se. It doesn't really sound
like that from the current definition
(http://www.loc.gov/marc/bibliographic/bd050.html), but if you look on the
ITS page (http://www.itsmarc.com/crs/edit7592.htm), which I think is not
up-to-date, you'll see a discussion of "pseudo call numbers" and other forms
of LC call numbers.

As someone pointed out, only a very few classes start with three letters
(off the top of my head; a couple in D and a number in K; see
http://library.duke.edu/services/instruction/libraryguide/lcclass.html, but
there are more in K than are listed here).

The pseudo or shelf numbers I've seen most often in 050 are MLC and SD
(which unfortunately is the same as the class for forestry). Look for SD on
musical recording records (it used to really mess up my former catalog's
attempts to facet music CDs by LC class; there were a few other common ones,
but I've forgotten them).

Depending what you're doing, you might try to prefer a call number in 090 if
there is one. These are more likely to reflect local preference.

Looking up 090 (http://www.oclc.org/bibformats/en/0xx/090.shtm) produced
some other examples of non-LCC 050's: PAR, Newspaper, UNC, or NOT IN LC.

Good luck!

Kelley

***
Except now I wonder if those annoying MLCS call numbers might actually be
properly MATCHED by this regex, when I need 'em excluded. They are annoyingly
_similar_ to a classified call number. Well, one way to find out.

And the reason this matters is to try and use an LCC to map to a
'discipline' or other broad category, either directly from the LCC schedule
labels, or using a mapping like umich's:
http://www.lib.umich.edu/browse/categories/

But if it's not really an LCC at all, and you try to map it, you'll get bad
postings.

On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:

 Thanks, that looks good!

 It's hosted on Google Code, but I don't think that code is anything 
 Google uses, it looks like it's from our very own Bill Dueber.

 On 3/31/2011 12:38 PM, Tod Olson wrote:

 Check the regexp that Google uses in their call number normalization:

        http://code.google.com/p/library-callnumber-lc/wiki/Home

 You may want to remove the prefix part, and allow for a fourth cutter.

 The folks at UNC pointed me to this a few months ago.

 -Tod

 On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:

 Does anyone have a good regular expression that will match all legal 
 LC Call Numbers from the LC Classified Schedule, but will generally 
 not match things that could not possibly be an LC Call Number from 
 the LC Classified Schedule?

 In particular, I need it to NOT match an MLC call number, which is 
 an LC assigned call number that shows up in an 050 with no way to 
 distinguish based on indicators, but isn't actually from the LC 
 Schedules.  Here's an example of an MLC call number:

 MLCS 83/5180 (P)

 Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can 
 exclude them just like that. But it looks like there are also OTHER 
 things that can show up in the 050 but aren't actually from the 
 classified schedule; the OCLC documentation even contains an example 
 of "Microfilm 19072 E".

 What a mess, huh?  So, yeah, regex anyone?

 [You can probably guess why I care if it's from the LC Classified 
 Schedule or not].

 Tod Olson  t...@uchicago.edu
 Systems Librarian
 University of Chicago Library



Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

2010-12-28 Thread Kelley McGrath
-Original Message-
From: McElwain, Paul Benjamin [mailto:pbmce...@indiana.edu] 



In my work on the Variations FRBR implementation, as a data modeler, I
was struck by how little attention had been paid to the relationships by the
FRBR Report.  I'm not surprised though; treating the relationships at the
entity level (having their own attribution) is a more obtuse exercise of
abstraction.

 

We do treat relationships as attributed entities, to carry more information
about the role involved (a creator may act as a composer).

 

One way to think about the originality of an expression of a work, being the
first ever expression, could be as an attribute of the relationship between
the work and expression.

 

Paul...

 



 

Paul,

 

I agree that from a theoretical perspective it makes a lot of sense to model
the original expression as an attribute of the relationship between the work
and that expression. Or to do what FRBRoo did and make classes like F27 Work
Conception and F28 Expression Creation.

 

It's not really clear to me that in our particular situation there is any
practical advantage to trying to do that rather than creating a merged
work/primary expression entity.

 

This has to do with the kind of expressions we're mostly modeling and the
way we're trying to model them. Most of the moving image expressions that
average libraries deal with are defined not by what I think of as bundled
attributes, but are rather a set of independent attributes.

 

This is unlike the typical music expression, which I think of as a set of
bundled attributes. If you know you have a performance of X work on Y date
in Z venue then, if someone has previously created in an expression record,
you know a number of other things about that expression such as the
composer, performers, and arrangement of the piece without having to
re-verify them again. All those things could productively be stored as a
unit.

 

This also happens with film with various cuts such as airplane versions or
director's cuts. These do have some associated attributes, notably length,
but also perhaps a different editor. 

 

For the kinds of unbundled attributes that are common with moving images,
especially DVDs, there are a large number of attributes, like soundtrack and
subtitle languages, accessibility options (captions, audio descriptions),
and aspect ratio, that vary independently. With these kinds of unbundled
expression attributes, a cataloger has to reexamine all of them every time
there is a new manifestation. If there's a change in our knowledge of what
subtitles are on a specific manifestation, it does not have automatic
implications for any other manifestation that might have that same
constellation of options.

 

The other types of attributes that describe the original expression of a
film are those that never change because they are important facts about the
history of the work that we want to note in conjunction with any future
expression. Many of these are things that RDA says are attributes of
expressions that moving image catalogers would tend to think of as
attributes of works (e.g., casts and costume designers do not vary among
expressions so why record them on every expression?).

 

In a sense, at least for moving images, the original expression is a bit of an
abstraction and in practice we get most of our information from reference
sources. 

 

At first, I thought we could just model these unbundled attributes of the
expression as attributes of the manifestation/publication since, as I
mentioned above, they have to be verified with every new manifestation
anyway. 


Work record 1:
    Dracula (1931)
    Tod Browning
    English

Manifestation record 1:
    1 VHS videocassette (1985)
    OCLC#: 13754402
    Audio: English

 

 

I ran into trouble with manifestations that include more than one work.

 

Some still work well enough, either because the expression-level information
is all the same or is unknown.


Work record 1:
    Ursula (1961)
    Lloyd Michael Williams
    English

Manifestation record 1:
    1 DVD video (2005)
    Experiments in terror
    ISBN: 0976523922
    Audio: English

Work record 2:
    Journey into the Unknown (2002)
    Kerry Laitala
    English

 

 

However, in some cases, the expression-level information varies between two
works on a single manifestation/publication. The manifestation below
includes two versions of Dracula, each in its original language. For the
prototype, I just made two different manifestation records, which repeat
most of the same information. That doesn't seem to me to be a desirable
long-term solution.


Work record 1:
    Dracula (1931)
    Tod Browning
    English

Manifestation record 1:
    1 DVD video (1999)
    ISBN: 0783227450
    Audio: English
    Subtitles: English or French

Work record 2:
    Dracula (1931)
    George Melford
    Spanish

Manifestation record 2:
    1 DVD video (1999)
    ISBN: 0783227450
    Audio: Spanish
    Subtitles: English or French
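
One possible alternative, sketched with invented class and field names: a
single manifestation record that embodies two expressions, each carrying its
own expression-level data, so nothing is repeated:

from dataclasses import dataclass, field

@dataclass
class Work:
    title: str
    director: str
    language: str

@dataclass
class Expression:
    work: Work
    audio: str

@dataclass
class Manifestation:
    # one manifestation may embody several expressions, so shared data
    # (ISBN, subtitle options) is recorded once
    description: str
    isbn: str
    subtitles: str
    expressions: list = field(default_factory=list)

english = Expression(Work("Dracula (1931)", "Tod Browning", "English"), "English")
spanish = Expression(Work("Dracula (1931)", "George Melford", "Spanish"), "Spanish")
dvd = Manifestation("1 DVD video (1999)", "0783227450",
                    "English or French", [english, spanish])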

 

 

So I think we do need the intermediate expression level, but I am not sure
if 
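
To make the alternative concrete, here is a minimal sketch of what that
intermediate expression level might look like, written as hypothetical
Rails-style models (my own illustration, not the prototype's schema; all
model and column names are assumptions):

  require 'active_record'

  class Work < ActiveRecord::Base
    # Dracula (Browning) and Dracula (Melford) would be two works
    has_many :expressions
  end

  class Expression < ActiveRecord::Base
    # carries the unbundled attributes: audio language, subtitles, etc.
    belongs_to :work
    has_many :expression_manifestations
    has_many :manifestations, through: :expression_manifestations
  end

  class ExpressionManifestation < ActiveRecord::Base
    # join table: one disc can carry several expressions
    belongs_to :expression
    belongs_to :manifestation
  end

  class Manifestation < ActiveRecord::Base
    # the 1999 DVD: format, date, and ISBN would be recorded once
    has_many :expression_manifestations
    has_many :expressions, through: :expression_manifestations
  end

With this shape, the 1999 DVD above would be a single manifestation record
linked to two expression records (English audio and Spanish audio), so the
shared ISBN, format, and date would not have to be repeated.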

Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

2010-12-28 Thread Kelley McGrath
Okay, I tried to put in tables and that didn't work. I'm trying again with
tabs. See if this makes more sense--Kelley

-Original Message-
From: McElwain, Paul Benjamin [mailto:pbmce...@indiana.edu] 
In my work on the Variations FRBR implementation, as a data modeler, I
was struck by how little attention had been paid to the relationships by the
FRBR Report. I'm not surprised, though; treating the relationships at an
entity level (having their own attribution) is a more abstruse exercise of
abstraction.

We do treat relationships as attributed entities, to capture more
information about the role involved (a creator may, for example, act as a
composer).

One way to think about the originality of an expression of a work--its
being the first-ever expression--could be as an attribute of the
relationship between the work and the expression.

Paul...



Paul,

I agree that from a theoretical perspective it makes a lot of sense to model
the original expression as an attribute of the relationship between the work
and that expression. Or to do what FRBRoo did and make classes like F27 Work
Conception and F28 Expression Creation.

It's not really clear to me that in our particular situation there is any
practical advantage to trying to do that rather than creating a merged
work/primary expression entity.
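
As a rough illustration of the difference (hypothetical Rails-style models,
not anything we have implemented; the names are made up), the two options
might look like this:

  require 'active_record'

  # Option 1: originality as an attribute of the work-expression
  # relationship, stored on a join model
  class Realization < ActiveRecord::Base
    belongs_to :work
    belongs_to :expression
    # a boolean column such as original would mark the first-ever expression
  end

  # Option 2: a merged work/primary expression entity, as in our prototype
  class MovieProgram < ActiveRecord::Base
    # work attributes (title, year, director) and primary-expression
    # attributes (original language) sit on the same record
    has_many :versions
  end

For our data, the merged entity saves a join and matches the way reference
sources like IMDb present this information.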

This has to do with the kind of expressions we're mostly modeling and the
way we're trying to model them. Most of the moving image expressions that
average libraries deal with are defined not by what I think of as bundled
attributes, but rather by a set of independent attributes.

This is unlike the typical music expression, which I think of as a set of
bundled attributes. If you know you have a performance of work X on date Y
in venue Z then, if someone has previously created an expression record,
you know a number of other things about that expression, such as the
composer, performers, and arrangement of the piece, without having to
re-verify them. All those things could productively be stored as a unit.

This also happens with film, where there are various cuts, such as airline
versions or director's cuts. These do have some associated attributes,
notably length, but also perhaps a different editor.

Moving images, especially DVDs, have a large number of unbundled
attributes, like soundtrack and subtitle languages, accessibility options
(captions, audio descriptions), and aspect ratio, that vary independently.
With these kinds of unbundled expression attributes, a cataloger has to
reexamine all of them every time there is a new manifestation. If there's a
change in our knowledge of what subtitles are on a specific manifestation,
it has no automatic implications for any other manifestation that might
have that same constellation of options.

The other types of attributes that describe the original expression of a
film are those that never change, because they are important facts about
the history of the work that we want to note in conjunction with any future
expression. Many of these are things that RDA says are attributes of
expressions but that moving image catalogers would tend to think of as
attributes of works (e.g., casts and costume designers do not vary among
expressions, so why record them on every expression?).

In a sense, at least for moving images, the original expression is a bit of
an abstraction, and in practice we get most of our information from
reference sources.

At first, I thought we could just model these unbundled attributes of the
expression as attributes of the manifestation/publication since, as I
mentioned above, they have to be verified with every new manifestation
anyway. 

[Work record 1 is linked to Manifestation record 1]

Work record 1
  Dracula (1931)
  Tod Browning
  English

Manifestation record 1
  OCLC#: 13754402
  Audio: English


I ran into trouble with manifestations that include more than one work.

Some still work well enough, either because the expression-level information
is all the same or is unknown.

[Work record 1 is linked to Manifestation record 1]
[Work record 2 is also linked to Manifestation record 1]

Work record 1
  Ursula (1961)
  Lloyd Michael Williams
  English

Work record 2
  Journey into the Unknown (2002)
  Kerry Laitala
  English

Manifestation record 1
  1 DVD video (2005)
  Experiments in terror
  ISBN: 0976523922
  Audio: English



However, in some cases, the expression-level information varies between two
works on a single manifestation/publication. The manifestation below
includes two versions of Dracula, each in its original language. For the
prototype, I just made two different manifestation records, which repeat
most of the same information. That doesn't seem to me 

Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

2010-12-13 Thread Kelley McGrath
Matthew,

I find it confusing as well, but as Karen points out, that's the way the
FRBR model does things. It seems to be driven by the need for the work to be
such an abstract thing that it is prior to words. However, it does seem to
me that the meaning of the language of a particular expression is not
complete without reference to the original language.

One of the FRAD drafts
(http://archive.ifla.org/VII/d4/franar-conceptual-model-2ndreview.pdf)
actually did propose original language as an attribute of the work ("the
language in which the work was first expressed"), but that was axed, so it
seems to have been a very conscious decision on the part of the creators of
FRBR.

The idea does seem to have generated some controversy. From ALA's feedback
on this draft:

"At least one task force member was a bit uneasy with this attribute,
noting that, although the attribute has a certain utility, the work entity
is abstract in FRBR and is not associated with any particular language
(e.g., Ancient Greek is the language of the first expression of the Iliad,
but not the language of the work, which encompasses what all of the
expressions have in common). Others thought that an original language
attribute was appropriate for work (for textual works, anyway), that all
expressions of a work do have the same original language even if the
language of the expressions themselves can differ, and that the attribute
is necessary for determining whether or not the expression represents a
translation. It was suggested that the attribute would not be appropriate
for a superwork entity, were one to be defined."
(http://www.libraries.psu.edu/tas/jca/ccda/docs/tf-frad3.pdf)

Kelley

-Original Message-
From: Beacom, Matthew matthew.bea...@yale.edu

Thank you, Karen,

It has been a while since I refreshed my memory by actually reading FRBR.
Language is an attribute of the FRBR expression and not the FRBR work
entity. I must still have a dominant pre-FRBR concept of work in my mind! I
need another 5 years in the re-education camp.

Matthew

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Karen Coyle
Sent: Monday, December 13, 2010 10:51 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving
image discovery interface

Quoting Beacom, Matthew matthew.bea...@yale.edu:

Sometimes I feel like we should all have the FRBR diagram tattoo'd on our
arms so we can consult it any time anywhere. :-)



 With as complex a thing as a film--so many authors, images, music,
 dialog, acting, sets, costume, etc., etc., etc.--applying the FRBR
 model is tough, and your implementation is quite sensible. However, I
 had a small question about one thing you said about FRBR not allowing
 language at the work level. That doesn't seem right to me.
 How could the language of a thing that is primarily or even partially
 a work made of language--like a novel or a motion picture with spoken
 dialogue--not be considered at the work level rather than at some other
 level?

Matthew, I can't answer how it is possible but I can tell you that it is a
fact: language is an attribute of Expression, not of Work. That's kind of
the key meaning of frbr:Expression -- it is the Expression of the Work, and
the Work doesn't exist until Expressed. So Work is a very abstract concept
in FRBR. (Which is why more than one attempted implementation of FRBR that I
have seen combines Work and Expression attributes in some way.)

Not only that, but Kelley's model uses something that I consider to be
missing from FRBR: the concept of an original Expression. For FRBR (and
thus for RDA) all expressions are in a sense equal; there is no privileged
first or original expression. Yet there is evidence that this is a useful
concept in the minds of users. Some recent user studies [1] around FRBR
showed that this is a concept that users come up with spontaneously. Also, I
can't think of any field of study where knowing what the original expression
of a work was wouldn't be important.

 Because of the way we treat translations--not just in FRBR--as what
 FRBR calls expressions, not as new works, a translation from the
 original language to another would be considered an FRBR expression.
 Could you explain this a bit more?

The FRBR relationship "translation of" is an Expression-to-Expression
relationship. (See my personal cheat sheet of RDA/FRBR relationships [2]).

kc
[1] http://www.asis.org/asist2010/abstracts/75.html
[2] http://kcoyle.net/rda/group1relsby.html


 Thank you.

 Matthew



 -Original Message-
 ...

 This also allowed us to get around some of the areas of more orthodox 
 FRBR modeling that we found unhelpful. For example, FRBR doesn't 
 allow language at the Work level, but we think it is important to 
 record the original language of a moving image at the top level.




--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234

Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

2010-12-13 Thread Kelley McGrath
Karen,

I'm glad you found it helpful and I will definitely consider writing it up
somewhere. Right now I'm also struggling to write something up on the data
modeling problems I had in a way that is comprehensible to anyone other than
me. That might make a good complement to this discussion.

I look forward to any comments or suggestions that you or anyone else has.
We are trying to get as much feedback as possible.

Kelley

-Original Message-
Kelley, this is great! Thanks. And since you already have so much written
up, would you consider going a bit further and offering it to the code4lib
journal? My reasons are selfish -- i'd like to be able to find and cite this
in the future.

Later I may have a few comments.

kc

Quoting Kelley McGrath kell...@uoregon.edu:

 We called it FRBR-inspired since it probably wouldn't pass muster as 
 an orthodox FRBR interpretation. We were looking to experiment with a 
 practical approach that we thought would make it much easier for 
 patrons to discover moving images in libraries and archives. If you 
 haven't read it, the about page gives a general overview of our 
 approach at http://blazing-sunset-24.heroku.com/page/about


Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

2010-12-13 Thread Kelley McGrath
One other thing about this project that might be of interest to Code4Lib
readers is that the most technically challenging part of the interface was
making the facets work properly so that they simultaneously applied limits
across tables that are linked with a many-to-many relationship.

The two main tables that are involved are Movies/Programs (works/primary
expressions) and Versions (expression/manifestations). These go with the two
sets of facets, which we visually separate for the interface in the hopes of
communicating their different functions to users.

Movies obviously can have many versions. If you look at the Citizen Kane
record, you can see that it was released in many formats, including VHS, DVD
and LaserDisc, with various language options.

A given manifestation can also contain more than one work. If you search for
Kyle XY, you'll get ten records for episodes that are part of a season of
the TV program. These are all on the same manifestation.

The versions table is also linked to a table that represents items; the
items table is the intersection of the versions/manifestations table and
the libraries table, though versions-to-items is a one-to-many
relationship.
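
In relational terms, the shape is something like the sketch below
(hypothetical model and column names, not our actual schema; in the
prototype the faceting itself is handled through Solr and Blacklight rather
than SQL):

  require 'active_record'

  class Movie < ActiveRecord::Base          # works/primary expressions
    has_many :movie_versions
    has_many :versions, through: :movie_versions
  end

  class Version < ActiveRecord::Base        # expressions/manifestations
    has_many :movie_versions
    has_many :movies, through: :movie_versions
    has_many :items                         # one-to-many, as described above
  end

  class MovieVersion < ActiveRecord::Base   # the many-to-many link
    belongs_to :movie
    belongs_to :version
  end

  class Item < ActiveRecord::Base           # a holding at one library
    belongs_to :version
    belongs_to :library
  end

  # Applying one facet from each level at the same time means constraining
  # both sides of the many-to-many join at once, e.g. (assumed columns):
  Movie.joins(:versions)
       .where(original_language: 'English', versions: { format: 'DVD' })
       .distinct

The tricky part is keeping the two facet sets consistent: a limit on
Versions changes which Movies qualify, and vice versa.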

The facet counts under Versions are really for items, but it would be
interesting to see whether this would be more useful if the count were for
versions.

Kelley


Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

2010-12-11 Thread Kelley McGrath
We called it FRBR-inspired since it probably wouldn't pass muster as 
an orthodox FRBR interpretation. We were looking to experiment with a 
practical approach that we thought would make it much easier for patrons 
to discover moving images in libraries and archives. If you haven't read 
it, the about page gives a general overview of our approach at 
http://blazing-sunset-24.heroku.com/page/about


Our top level is a combination of FRBR work information and information 
about what we are calling the primary expression. We haven't made any 
internal distinction between these two types of information. This 
enables us to record together the data that we think people expect to 
see about the generic moving image and reflects the sort of information 
that is given in IMDb, the All Movie Guide, and film and TV reference 
sources. This is also the data that we would want to re-use in every 
MARC record for a manifestation of a given movie.


This also allowed us to get around some of the areas of more orthodox 
FRBR modeling that we found unhelpful. For example, FRBR doesn't allow 
language at the Work level, but we think it is important to record the 
original language of a moving image at the top level. In addition, RDA 
has mapped a number of functions, such as art director, costume designer 
and performer, to the expression level. We would prefer to present these 
at the top level. It is hard to imagine a version of Gone With the Wind 
with a different costume designer or cast that would still be the same 
work. So all the Seven Samurai data you listed above belongs either to 
the work or the primary expression.


We mingle expression, manifestation and item information in the version 
facets on the right. We don't show any explicit expression records. In 
this demonstration we are not actually identifying any unique 
expressions, although in the future we will probably want to do this for 
what I think of as named expressions. Since this is a demo, we are 
working with a limited number of attributes and the only 
expression-level facets we provide are soundtrack and subtitle 
languages.


In this sense, our approach is similar to the near manifestation idea 
that Simon mentioned. We are not trying to assert that we have 
identified particular expressions. Rather, we are trying to provide a 
mechanism for the user to identify the set of items that meet their 
needs. It is not clear to me that libraries are always in a position to 
accurately identify expressions.


Rather than providing a hierarchical view where the user selects a 
work, then an expression, and so on, as is common in FRBR presentations, 
we permit the user to begin at any FRBR level. The user is invited to 
limit by as many characteristics as they desire to delineate the set of 
things that they are interested in. They only need to select as many 
attributes as are important to them and no more. This may not meet the 
needs of all scholars, but we hope that it will meet the vast majority 
of general purpose user needs.


It's a bit of a different approach than I have seen elsewhere, but I
think it works particularly well for moving images. One of the main
reasons I think this is that certain types of expressions predominate
in commercial moving images. I will try to explain some of my thoughts
on types of expressions below.


1. Expressions that can be reduced to controlled vocabulary options

These are the most common types of commercial moving image
expressions, especially in the DVD era. They are distinguished by
characteristics such as

  Soundtrack language(s)
  Subtitle language(s)
  Accessibility options (captioning, SDH, and audio description)
  Aspect ratio (although in this era of widescreen TVs, full screen
  modifications are less common)
  Colorization
  Soundtracks for silent films

These can be fully described based on standardized data (although for
the silent film soundtracks, this would involve multiple pieces of
information, i.e., musical work, composer, conductor, performer(s),
etc.)


DVDs often contain what are essentially multiple expressions in that
they offer multiple soundtrack and subtitle options and may offer
multiple aspect ratios. A silent film on DVD may come with alternate
soundtracks. All of these can be combined in various ways by the viewer,
which can make for a large number of expressions contained in a single
manifestation.
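
A toy example of how fast the combinations multiply (the option lists here
are made up):

  soundtracks   = ['English', 'Spanish', 'French']
  subtitles     = ['none', 'English', 'Spanish', 'French']
  aspect_ratios = ['widescreen', 'full screen']

  # every viewer-selectable combination is, in effect, its own expression
  combinations = soundtracks.product(subtitles, aspect_ratios)
  puts combinations.length   # => 24, from a single disc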


2. Named expressions

These are versions that differ in moving image content due to having
been edited differently. Examples include

  Theatrical release
  Director's cut
  Unrated version

Although Martha Yee found a strong correlation between differences in
duration and the likelihood that two things represented two different
expressions, this doesn't always work. The archetypal example is Blade
Runner, which was released on DVD with five different versions
(http://en.wikipedia.org/wiki/Versions_of_Blade_Runner), all of which
had run times within 

[CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

2010-12-09 Thread Kelley McGrath
OLAC (Online Audiovisual Catalogers) is excited to announce the 
availability of our prototype for a FRBR-inspired, work-centric, faceted 
discovery interface for moving images at 
http://blazing-sunset-24.heroku.com.


The OLAC Work-Centric Moving Image Discovery Interface Prototype is an 
exploration of the possibilities of leveraging the Functional 
Requirements for Bibliographic Records (FRBR) model and faceted search 
to improve access to moving image materials held by libraries and 
archives.


This prototype was funded by OLAC. Chris Fitzpatrick developed the 
demonstration interface to meet OLAC’s specifications using the free 
open source tools Ruby on Rails, Solr, and the Blacklight and Hydra 
plug-ins. This project was only possible due to the contributions of a 
great many people, some of whom are listed at 
http://blazing-sunset-24.heroku.com/page/credits.


In this demonstration interface we present the user with a two-level 
view inspired by the FRBR model. The top level, labeled Movie or 
Program, provides information about the FRBR Work and what we are 
calling the Primary Expression, usually the first publicly-released 
Expression. Facets for the Work/Primary Expression level are displayed 
across the top of the screen and the records found in the hit list 
contain information about the Work and Primary Expression. The second 
level, labeled Version, includes information about Expressions (language 
options), Manifestations (format and publication date), and in a very 
basic way about Items (what libraries or archives hold a particular 
Manifestation). Facets for the Version level are displayed separately on 
the side of the screen and information about the particular Versions 
that meet the user’s qualifications are displayed below each 
Work/Primary Expression.


An overview of the goals of the interface is available at 
http://blazing-sunset-24.heroku.com/page/about. Some suggested sample 
searches and potential use cases may be seen at 
http://blazing-sunset-24.heroku.com/page/samples.


We invite you to check it out and send us your feedback. Comments, 
questions, complaints, and suggestions may be sent to me at 
kell...@uoregon.edu. Also, if you are interested in contributing to a 
larger grant project to try to bring this idea into a production 
environment, please contact me.


Kelley McGrath
Metadata Management Librarian
University of Oregon
kell...@uoregon.edu


[CODE4LIB] FRBR work-centric, faceted UI demo developer sought

2010-10-18 Thread Kelley McGrath
Hi,

I thought I would send this again since so far I haven't heard from anyone. 
Unfortunately, we don't have a great deal of money to offer, but I think this 
would be an interesting project for the right person. It might even be a good 
project for an LIS student or recent grad looking for something for a resume.  
If you have any questions, please feel free to contact me.

Kelley

-Original Message-
OLAC (Online Audiovisual Catalogers) has been investigating the potential
of the FRBR model and a work-centric approach to improve access to moving
images for some time. We are looking for someone to make a basic but
functional demonstration end-user interface for moving images that is
focused on FRBR works and that offers faceted navigation using sample data
for 143 moving image works, 210 manifestations, and 297 items. Ideally,
this will be developed with open source tools such as MySQL, Solr and
Lucene. I have some ideas about what the interface might look like (see
link below) and am looking for someone to put up something quick and dirty,
but functional and interactive, so people can get a better idea of how this
might work. This may not turn out to be anything like what would work for a
final user interface, but I am hoping that it will make the potential for a
FRBR-based, faceted approach clear and make it easier for people to
understand the kinds of searching options we want to provide.

OLAC has agreed to fund $1500 to be awarded to the individual(s) who
successfully completes this project. More information on and the sample
data for this project are available at
http://www.olacinc.org/drupal/?q=node/437

If you are interested in taking this project on, please contact me at
kell...@uoregon.edu via email by Friday, October 22 with a list of your
qualifications, a suggested timeline, and any other information you think
might be helpful for us to know. We are willing to negotiate on the
timetable, but are interested in having a finished product as soon as
possible. Please contact me if you have any questions.

 Kelley McGrath
 kell...@uoregon.edu