[CODE4LIB] Code4Lib Journal, Issue 26 is now available!
The Code4Lib Journal, Issue 26 is now available! http://journal.code4lib.org/issues/issue26 Here is what you will find inside:

Editorial Introduction: On Being on The Code4Lib Journal Editorial Committee
Kelley McGrath
Behind the scenes of The Code4Lib Journal...

Archiving the Web: A Case Study from the University of Victoria
Corey Davis
The University of Victoria Libraries started archiving websites in 2013, and it quickly became apparent that many scholarly websites being produced by faculty, especially in the digital humanities, were going to prove very challenging to capture and play back effectively. This article will provide an overview of web archiving and explore the considerable legal and technical challenges of implementing a web archiving initiative at a research library, using the University of Victoria's implementation of Archive-It, a web archiving service from the Internet Archive, as a case study, with a special focus on capturing complex, interactive websites that scholars are creating to disseminate their research in new ways.

Technical Challenges in Developing Software to Collect Twitter Data
Daniel Chudnov, Daniel Kerchner, Ankushi Sharma and Laura Wrubel
Over the past two years, George Washington University Libraries developed Social Feed Manager (SFM), a Python and Django-based application for collecting social media data from Twitter. Expanding the project from a research prototype to a more widely useful application has presented a number of technical challenges, including changes in the Twitter API, supervision of simultaneous streaming processes, management, storage, and organization of collected data, meeting researcher needs for groups or sets of data, and improving documentation to facilitate other institutions' installation and use of SFM. This article will describe how the Social Feed Manager project addressed these issues, its use of supervisord to manage processes, and other technical decisions made in the course of this project through late summer 2014. This article is targeted towards librarians and archivists who are interested in building collections around web archives and social media data, and have a particular interest in the technical work involved in applying software to the problem of building a sustainable collection management program around these sources.

Exposing Library Services with AngularJS
Jakob Voß and Moritz Horn
This article provides an introduction to the JavaScript framework AngularJS and specific AngularJS modules for accessing library services. It shows how information such as search suggestions, additional links, and availability can be embedded in any website. The ease of reuse may encourage more libraries to expose their services via standard APIs to allow usage in different contexts.

Hacking Summon 2.0 the Elegant Way
Annette Bailey and Godmar Back
Libraries have long been adding content and customizations to vendor-provided web-based search interfaces, including discovery systems such as ProQuest's Summon(TM). Unlike solutions based on using an API, these approaches augment the vendor-designed user interface using library-provided JavaScript code. Recently, vendors have been implementing such user interfaces using client-centric model-view-controller (MVC) frameworks such as AngularJS, which are characterized by the use of modern software engineering techniques such as domain-specific markup, data binding, encapsulation, and dependency injection. Consequently, traditional approaches such as reverse-engineering the Document Object Model (DOM) have become more difficult or even impossible to use because the DOM is highly dynamic, the templates used are difficult to discern, the vendor-provided JavaScript code is both encapsulated and partially obfuscated, and the data binding mechanisms impose a strict separation of model and view that discourages direct DOM manipulation. In fact, practitioners have started to complain that AngularJS-based websites such as Summon 2.0 are very difficult to enhance with custom content in a robust and efficient manner. In this article, we show how to reverse-engineer the AngularJS-based Summon 2.0 interface to discover the modules, directives, controllers, and services it uses, and we explain how we can use AngularJS's built-in mechanisms to create new directives and controllers that integrate with and augment the vendor-provided ones to add desired customization and interactions. We have implemented several features that demonstrate our approach, such as a click-recording script, COinS and facet customization, and the integration of eBook public notes. Our explanation and code should be of direct use for adoption or as examples for other Summon 2.0 customers, but they may also be useful to anyone faced with the need to add enhancements to other vendor-controlled MVC-based sites.

Parsing and Matching Dates in VIAF
Jenny A. Toves and Thomas B. Hickey
[CODE4LIB] Code4Lib Journal call for proposals
Call for Papers (and apologies for cross-posting):

The Code4Lib Journal (C4LJ) exists to foster community and share information among those interested in the intersection of libraries, technology, and the future. We are now accepting proposals for publication in our 26th issue. Don't miss out on this opportunity to share your ideas and experiences.

To be included in the 26th issue, which is scheduled for publication in mid-October 2014, please submit articles, abstracts, or proposals at http://journal.code4lib.org/submit-proposal or to jour...@code4lib.org by Friday, July 11, 2014. When submitting, please include the title or subject of the proposal in the subject line of the email message.

C4LJ encourages creativity and flexibility, and the editors welcome submissions across a broad variety of topics that support the mission of the journal. Possible topics include, but are not limited to:

* Practical applications of library technology (both actual and hypothetical)
* Technology projects (failed, successful, or proposed), including how they were done and challenges faced
* Case studies
* Best practices
* Reviews
* Comparisons of third party software or libraries
* Analyses of library metadata for use with technology
* Project management and communication within the library environment
* Assessment and user studies

C4LJ strives to promote professional communication by minimizing the barriers to publication. While articles should be of a high quality, they need not follow any formal structure. Writers should aim for the middle ground between blog posts and articles in traditional refereed journals. Where appropriate, we encourage authors to submit code samples, algorithms, and pseudo-code.

For more information, visit C4LJ's Article Guidelines or browse articles from earlier issues published on our website: http://journal.code4lib.org.

Remember, for consideration for the 26th issue, please send proposals, abstracts, or draft articles to jour...@code4lib.org no later than Friday, July 11, 2014.

Send in a submission. Your peers would like to hear what you are doing.

Code4Lib Journal Editorial Committee
Re: [CODE4LIB] Looking for two coders to help with discoverability of videos
Thanks, Jon. I have seen the Variations work and also talked to Jenn Riley about it. It has definitely influenced me, although we are going in a slightly different direction and moving images have some different needs from music. One thing about Variations that struck me is this paragraph from the usability testing report (http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/usability/usabilityTest/ScherzoUTestReport.pdf):

There was an assumption among the development team that works would be a window for organizing and narrowing results in a way that users searching for scores and recordings would find useful. One of the main ideas behind FRBR is that the work, or the intellectual entity that is produced by people and is packaged in many forms, is the core information – Scherzo's interface reflected that organization. (See Appendix E, Fig. 14 for Scherzo's search results page.) But the participants tended to latch on to a person's name and search for that name in a particular role. The reasons for this are not completely clear and further discussion follows, but it is worth bearing this finding in mind. Additionally, from the search results page, work results were clicked only 14 times in comparison to items in recordings/scores, which were clicked 65 times. Regardless of how the FRBRized data is organized on the back end, the interface needs to reflect the way users want to search, and that might not mean with search results organized by work.

Does this mean that a work-focused approach is not actually what users want or need? Does it mean that the work-centered approach needs to be implemented differently in the user interface? Are these results somehow specific to music? Do they reflect users' familiarity with the typical library catalog and the strategies they've become accustomed to using? It does suggest to me that there should be more studies on how users interact with FRBRized data (and not just the clustering that so many discovery interfaces do now, but real FRBR-based data) and how FRBRized data is best presented. Kelley

On Tue, Dec 3, 2013 at 11:35 AM, Dunn, Jon William Butcher j...@iu.edu wrote: Hi Kelley, If you haven't already, you might want to look at the music score and sound recording FRBRization work done on the Variations-FRBR project here at Indiana University. I'm not sure how directly useful this would be for your work with moving images, but there may be some useful mapping ideas: FRBR XML schemas: http://www.dlib.indiana.edu/projects/vfrbr/schemas/1.1/index.shtml MARC-FRBR mapping specifications: http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/metadata/mappings/spring2010/vfrbrSpring2010mappings.shtml Java FRBRization code and documentation: http://www.dlib.indiana.edu/projects/vfrbr/projectDoc/index.shtml Jon
Re: [CODE4LIB] Looking for two coders to help with discoverability of videos
Well, that would be much easier, but most of what I am working with are records for physical items (DVD, VHS, film) or licensed streaming video. The sample records are also not all UO records, so I don't necessarily even have access to the source material (our goal is to build a general-purpose tool). So I think I am stuck with extracting from MARC. We should be able to get data for some resources by matching the MARC up with external data sources. That won't work for everything, though, so we want to make the process of extracting data from MARC as effective as possible. Kelley

On Mon, Dec 2, 2013 at 7:03 AM, Alexander Duryee alexanderdur...@gmail.com wrote: Is it out of the question to extract technical metadata from the audiovisual materials themselves (via MediaInfo et al.)? It would minimize the amount of MARC that needs to be processed and give more accurate/complete data than relying on old cataloging records.
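[Illustrative aside, not from the original thread: one rough Python sketch of the "matching the MARC up with external data sources" idea, using pymarc and a normalized title-plus-date key. The filename, field choices, and the in-memory dict standing in for the external source are all assumptions.]

# Sketch only: match MARC video records to an external data source on a
# normalized title + date key. Assumes pymarc is installed; 'videos.mrc'
# is a placeholder filename.
import re
from pymarc import MARCReader

def normalized_key(title, year):
    # Lowercase, strip punctuation and a leading article, append the year.
    t = re.sub(r'[^a-z0-9 ]', '', title.lower())
    t = re.sub(r'^(the|a|an) ', '', t).strip()
    return f"{t}|{year}"

# Hypothetical external data, keyed the same way.
external = {"dracula|1931": {"director": "Tod Browning"}}

with open('videos.mrc', 'rb') as fh:
    for record in MARCReader(fh):
        f245 = record.get_fields('245')
        title = f245[0].get_subfields('a')[0] if f245 and f245[0].get_subfields('a') else ''
        f008 = record.get_fields('008')
        year = f008[0].data[7:11].strip() if f008 else ''   # 008/07-10 = Date 1
        match = external.get(normalized_key(title, year))
        if match:
            print(title, '->', match)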
Re: [CODE4LIB] Looking for two coders to help with discoverability of videos
Robert, Your work also sounds very interesting and definitely overlaps with some of what we want to do. It seems like a lot of people are trying to get useful format information out of MARC records, and it's unfortunate that it is so complicated. I would be very interested to see your logic for determining format and dealing with self-contradictory records. Runtime from the 008 is, as you say, pretty straightforward, but it is not always filled out and is useless if the resource is longer than 999 minutes. It's interesting that you mention identifying directors. We have also been working on a similar, although more generalized, process. We're trying to identify all of the personal and organizational names mentioned in video records and, where possible, their roles. Our existing process is pretty accurate for personal names and for roles in English. It tends to struggle with credits involving multiple corporate bodies, and we're working on building a lexicon of non-English terms for common roles. We're also trying to get people to hand-annotate credits to build a corpus to help us improve our process. (Help us out at http://olac-annotator.org/. And if you're willing to be on call to help with translating non-English credits, email me with the language(s) you'd be able to help out with. We also just started a mailing list at https://lists.uoregon.edu/mailman/listinfo/olac-credits) Matching MARC records for moving images with external data sources is also on our radar. Most feature film type material can probably be identified by the attributes you mention: title, original date and director (probably 2 out of 3 would work in most cases). We are also hoping to use these attributes (and possibly others) to cluster records for the same FRBR work. It would be great to talk with you more about this off-list. Kelley kell...@uoregon.edu

From: Robert Haschart [rh...@virginia.edu] Sent: Monday, December 02, 2013 10:49 AM To: Code for Libraries Cc: Kelley McGrath Subject: Re: [CODE4LIB] Looking for two coders to help with discoverability of videos

Kelley, The work you are proposing is interesting and overlaps somewhat both with work I have already done and with a new project I'm looking into here at UVa. I have been the primary contributor to the Marc4j Java project for the past several years and am the creator of SolrMarc, a project which extracts data from MARC records based on a customizable specification to build Solr index records that facilitate rich discovery. Much of my work on creating and improving these projects has been in service of my actual job of creating and maintaining the Solr index behind our Blacklight-based discovery interface. As a part of that work I have created custom SolrMarc routines that extract the format of items similar to what is described in Example 3, including looking in the leader, 006, 007 and 008 to determine the format as coded, but further looking in the 245 $h, 300 and 538 fields to heuristically determine when the format as coded is incorrect and ought to be overridden. Most of the heuristic determination is targeted towards video material, and was initiated when I found an item that, due to a coding error, was listed as a video in Braille format.
Further, I have developed a set of custom routines that look more closely at video items, one of which already extracts the runtime from the 008[18-20] field. Modifying it from its current form, which returns the runtime in minutes, to instead return it as HH:MM as specified in your xls file, and to further handle the edge case of 008[18-20] = 000 by returning "over 16:39", would literally take about 15 minutes. Another of these custom routines, which is more fully formed, is code for extracting the director of a video from the MARC record. It examines the contents of the fields 245 $c, 508 $a, 500 $a, 505 $a, and 505 $t, employing heuristics and targeted natural language processing techniques to attempt to correctly extract the director. At this point I believe it achieves better results than a careful cataloger would achieve, even one who specializes in film and video. The other project I have just started investigating is an effort to create and/or flesh out MARC records for video items based on heuristic matching of title, director and date with data returned from publicly accessible movie information sites. This more recent work may not be relevant to your needs, but the custom extraction routines seem directly applicable to your goals, and may also provide a template that may make your other goals more easily achievable. -Robert Haschart
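[Illustrative aside, not from the original thread: Robert's routines are Java/SolrMarc, but the conversion he describes is simple enough to sketch in Python. The byte positions and the "over 16:39" edge case are taken directly from his message; everything else is an assumption.]

def runtime_hhmm(field_008):
    """Convert 008/18-20 (running time in minutes, visual materials) to HH:MM.
    '000' means the running time exceeds three characters (over 999 minutes)."""
    raw = field_008[18:21]
    if raw == '000':
        return 'over 16:39'            # 999 minutes = 16 hours 39 minutes
    if not raw.isdigit():
        return None                    # 'nnn', '---', blanks, truncated field, etc.
    minutes = int(raw)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

# e.g. runtime_hhmm(record_008_data) returns '02:00' for a 120-minute video.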
Re: [CODE4LIB] Looking for two coders to help with discoverability of videos
I wanted to follow up on my previous post with a couple points.

1. This is probably too late for anybody thinking about applying, but I thought there may be some general interest. I have put up some more detailed specifications about what I am hoping to do at http://pages.uoregon.edu/kelleym/miw/. Data extraction overview.doc is the general overview and the other files contain supporting documents.

2. I replied some time ago to Heather's offer below about her website that will connect researchers with volunteer software developers. I have to admit that looking for volunteer software developers had not really occurred to me. However, I do have additional things that I would like to do for which I currently have no funding, so if you would be interested in volunteering in the future, let me know.

Kelley kell...@uoregon.edu

On Tue, Nov 12, 2013 at 6:33 PM, Heather Claxton claxt...@gmail.com wrote: Hi Kelley, I might be able to help in your search. I'm in the process of starting a website that connects academic researchers with volunteer software developers. I'm looking for people to post programming projects on the website once it's launched in late January. I realize that may be a little late for you, but perhaps the project you mentioned in your PS (clustering based on title, name, date, etc.) would be perfect? The one caveat is that the website is targeting software developers who wish to volunteer. Anyway, if you're interested in posting, please send me an e-mail at sciencesolved2...@gmail.com I would greatly appreciate it. Oh, and of course it would be free to post :) Best of luck in your hiring process, Heather Claxton-Douglas

On Mon, Nov 11, 2013 at 9:58 PM, Kelley McGrath kell...@uoregon.edu wrote: I have a small amount of money to work with and am looking for two people to help with extracting data from MARC records as described below. This is part of a larger project to develop a FRBR-based data store and discovery interface for moving images. Our previous work includes a consideration of the feasibility of the project from a cataloging perspective (http://www.olacinc.org/drupal/?q=node/27), a prototype end-user interface (https://blazing-sunset-24.heroku.com/, https://blazing-sunset-24.heroku.com/page/about) and a web form to crowdsource the parsing of movie credits (http://olac-annotator.org/#/about).

Planned work period: six months beginning around the second week of December (I can be somewhat flexible on the dates if you want to wait and start after the New Year)

Payment: flat sum of $2500 upon completion of the work

Required skills and knowledge:
* Familiarity with the MARC 21 bibliographic format
* Familiarity with Natural Language Processing concepts (or willingness to learn)
* Experience with Java, Python, and/or Ruby programming languages

Description of work: Use language and text processing tools and provided strategies to write code to extract and normalize data in existing MARC bibliographic records for moving images. Refine code based on feedback from analysis of results obtained with a sample dataset.

Data to be extracted:

Tasks for Position 1:
* Titles (including the main title of the video, uniform titles, variant titles, series titles, television program titles and titles of contents)
* Authors and titles of related works on which an adaptation is based
* Duration
* Color
* Sound vs. silent

Tasks for Position 2:
* Format (DVD, VHS, film, online, etc.)
* Original language
* Country of production
* Aspect ratio
* Flag for whether a record represents multiple works or not

We have already done some work with dates, names and roles and have a framework to work in. I have the basic logic for the data extraction processes, but expect to need some iteration to refine these strategies.

To apply, please send me an email at kelleym@uoregon explaining why you are interested in this project, what relevant experience you would bring and any other reasons why I should hire you. If you have a preference for position 1 or 2, let me know (it's not necessary to have a preference). The deadline for applications is Monday, December 2, 2013. Let me know if you have any questions. Thank you for your consideration.

Kelley

PS In the near future, I will also be looking for someone to help with work clustering based on title, name, date and identifier data from MARC records. This will not involve any direct interaction with MARC.

Kelley McGrath
Metadata Management Librarian
University of Oregon Libraries
541-346-8232
kell...@uoregon.edu
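[Illustrative aside, not from the original thread: a very rough Python sketch of the Position 2 format task, combining coded values with the free-text override approach Robert Haschart describes earlier in this thread (leader/007 first, then 245 $h, 300 and 538). It assumes a pymarc Record; the 007 byte values shown (position 01 d/f for videodisc/videocassette, position 04 'v' = DVD, 'b' = VHS) and the text cues should be verified against the MARC documentation before use.]

def guess_format(record):
    """Best-effort format guess for a video record: start from coded 007
    values, then let human-readable fields (245, 300, 538) override."""
    fmt = None
    for f007 in record.get_fields('007'):
        data = f007.data or ''
        if not data.startswith('v'):          # 'v' = videorecording
            continue
        if len(data) > 4 and data[4] == 'v':
            fmt = 'DVD'
        elif len(data) > 4 and data[4] == 'b':
            fmt = 'VHS'
        elif data[1:2] == 'd':
            fmt = 'videodisc'
        elif data[1:2] == 'f':
            fmt = 'videocassette'
    # Heuristic override from the descriptive fields.
    text = ' '.join(f.value() for f in record.get_fields('245', '300', '538')).lower()
    if 'blu-ray' in text:
        fmt = 'Blu-ray'
    elif 'dvd' in text:
        fmt = fmt or 'DVD'
    elif 'vhs' in text:
        fmt = fmt or 'VHS'
    elif 'streaming' in text or 'online' in text:
        fmt = fmt or 'online'
    return fmt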
[CODE4LIB] OLAC Movie Video Credit Annotation Experiment
This project may be of interest to some on this list as an experiment to explore extracting structured data from free text in MARC. You also have a chance to help make it easier to find film and video in libraries if you're willing to take a few minutes to participate. OLAC (http://www.olacinc.org/) is working on a project to try to make the process of finding film and video in library catalogs better. Please help us by annotating some film and video credits at http://olac-annotator.org/. It only takes a few minutes to make a contribution. We are challenging OLAC members to annotate three credits per day this week to see how many we can get done. Please join us in this endeavor. We are especially looking for people who know languages other than English to help us translate credits in languages from Chinese to Spanish to Urdu. Full announcement below. Please share this information with anyone you think might be interested. Kelley

***

The OLAC Movie Video Credit Annotation Experiment (http://olac-annotator.org) is part of a larger project to make it easier to find film and video in libraries and archives. In the current phase, we're trying to break existing MARC movie records down and pull out all the cast and crew information so that it may be re-ordered and manipulated. We also want to make explicit connections between cast and crew names and their roles or functions in the movie production. Adding these formal connections to movie records will allow us to provide a better user experience. For example, library patrons would be able to search just for directors or just for cast members, or only for movies where Clint Eastwood is actually in the cast rather than all the movies that he is connected with. Libraries would have the flexibility to create more standardized and readable displays of production credits, such as you see at IMDb (see http://www.imdb.com/title/tt1205489/ -- not that we necessarily want IMDb's display, but that we would have much more flexibility in designing displays), rather than views like a typical library catalog (such as http://janus.uoregon.edu/record=b3958782). We therefore want to convert our existing records into more structured sets of data. Eventually, we intend to automate most of this conversion. For now, we need help from human volunteers, who can train our software to recognize the many ways names and roles have been listed in library records for movies. Give us a hand at http://olac-annotator.org. For an explanation with more library jargon thrown in, see http://olac-annotator.org/#/more.

The OLAC Movie Video Credit Annotation Experiment was conceived by Kelley McGrath, developed by Chris Fitzpatrick and funded by a Richard and Mary Corrigan Solari Library Fellowship Incentive Award from the University of Oregon Libraries.

Kelley McGrath
Metadata Management Librarian
University of Oregon Libraries
541-346-8232
kell...@uoregon.edu
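[Illustrative aside, not from the project: a toy Python sketch of what "recognizing names and roles" in a credit statement means in practice. The role vocabulary and patterns are made up for illustration; the actual OLAC experiment collects human annotations precisely because real credits are far messier than this.]

import re

# Toy parser for credit statements such as:
#   "directed by Tod Browning ; produced by Carl Laemmle Jr."
ROLE_WORDS = {'directed', 'produced', 'written', 'edited', 'photographed'}

def parse_credits(statement):
    pairs = []
    for segment in statement.split(';'):
        m = re.match(r'\s*(\w+) by (.+)', segment.strip(), re.IGNORECASE)
        if m and m.group(1).lower() in ROLE_WORDS:
            role = m.group(1).lower()
            names = re.split(r',| and ', m.group(2))
            pairs.extend((role, n.strip()) for n in names if n.strip())
    return pairs

print(parse_credits("directed by Tod Browning ; produced by Carl Laemmle Jr."))
# [('directed', 'Tod Browning'), ('produced', 'Carl Laemmle Jr.')]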
[CODE4LIB]
I'll second the idea of approaching people individually and explicitly asking them to participate. It worked on me. I never would have written my first article for the Code4Lib Journal or become a member of the editorial committee if someone hadn't encouraged me individually (Thanks, Jonathan!). It would also be good to find a way to somehow target the pool of lurkers who maybe aren't already connected to someone and get them more involved. As far as anonymous proposals go, we recently had a very good workshop on implicit bias here. Someone brought up a study that found significant changes in the gender proportions in symphony orchestras after candidates started auditioning behind screens. There are also lots of studies about the different responses to the same resume/application depending on whether a stereotypically male/female or white/black name was used. Probably it's impossible to make proposals completely anonymous, but it would be an interesting experiment to leave off the names. Kelley PS Interestingly, I wouldn't instinctively self-identify as a member of the Code4Lib community, although my first thought is that that has more to do with not being a coder than with being a woman. ** Kelley McGrath Metadata Management Librarian University of Oregon Libraries 1299 University of Oregon Eugene, OR 97403 541-346-8232 kell...@uoregon.edu
Re: [CODE4LIB] Code4Lib Journal: Editors Wanted
Hello, again! Just a reminder that the deadline to apply for the Code4Lib Journal editorial committee is Monday, April 30. If you have been thinking about the call for new editors, I encourage you to apply. It's a great opportunity to contribute to something that makes a real difference to the library community by improving the dissemination of innovative and practical ideas. In my experience, it's both a lot of work and a lot of fun, and the editorial committee is made up of dedicated, supportive people who are great to work with. Questions? Ask me (kell...@uoregon.edu) or anybody on the editorial committee (http://journal.code4lib.org/editorial-committee) or all of us (jour...@code4lib.org). Kelley

On Tue, Apr 10, 2012 at 10:22 AM, Kelley McGrath kell...@uoregon.edu wrote: The Code4Lib Journal (http://journal.code4lib.org/) is looking for volunteers to join its editorial committee. Editorial committee members work collaboratively to produce the quarterly Code4Lib Journal. Editors are expected to:

* Read, discuss, and vote on incoming proposals.
* Volunteer to be the assigned editor or second reader for specific proposals.
** Assigned editors work with the author(s) to make sure the article is as strong as possible, that the copy is clean, and deadlines are met. They also enter the article into WordPress, making sure the formatting is okay, all images and tables look good, etc.
** Second readers act as a second set of eyes for the assigned editor.
* Read and comment on any other article that interests you.
* Volunteer for administrative tasks and projects as they crop up.
* Take a turn as Coordinating Editor for an issue. The Coordinating Editor shepherds the issue through its life cycle.

We seek an individual who is self-motivated, organized and able to meet deadlines; is familiar with ideas and trends in the field; and has an interest in the mechanics of writing. There is a sometimes significant time commitment involved; expect to set aside ten or more hours a month. It sounds like a lot of work, but it's also a lot of fun (if editing is your idea of fun).

Intrigued? Please send a letter of interest by Monday, April 30 to jour...@code4lib.org. Your letter should address these two basic questions: 1) What is your vision for the Code4Lib Journal? Why are you interested in it? 2) How can you contribute to the Code4Lib Journal, i.e. what do you have to offer?

We encourage people who have previously applied and who are still interested to re-apply. We have had to turn down a lot of highly qualified people in the past due to the large number of applications. If you have any questions, contact us by email at jour...@code4lib.org or ask any member of the editorial committee (listed at http://journal.code4lib.org/editorial-committee). We plan to make decisions about additional editors by mid-May.

Kelley McGrath on behalf of the Code4Lib Editorial Committee
[CODE4LIB] Code4Lib Journal: Editors Wanted
The Code4Lib Journal (http://journal.code4lib.org/) is looking for volunteers to join its editorial committee. Editorial committee members work collaboratively to produce the quarterly Code4Lib Journal. Editors are expected to:

* Read, discuss, and vote on incoming proposals.
* Volunteer to be the assigned editor or second reader for specific proposals.
** Assigned editors work with the author(s) to make sure the article is as strong as possible, that the copy is clean, and deadlines are met. They also enter the article into WordPress, making sure the formatting is okay, all images and tables look good, etc.
** Second readers act as a second set of eyes for the assigned editor.
* Read and comment on any other article that interests you.
* Volunteer for administrative tasks and projects as they crop up.
* Take a turn as Coordinating Editor for an issue. The Coordinating Editor shepherds the issue through its life cycle.

We seek an individual who is self-motivated, organized and able to meet deadlines; is familiar with ideas and trends in the field; and has an interest in the mechanics of writing. There is a sometimes significant time commitment involved; expect to set aside ten or more hours a month. It sounds like a lot of work, but it's also a lot of fun (if editing is your idea of fun).

Intrigued? Please send a letter of interest by Monday, April 30 to jour...@code4lib.org. Your letter should address these two basic questions: 1) What is your vision for the Code4Lib Journal? Why are you interested in it? 2) How can you contribute to the Code4Lib Journal, i.e. what do you have to offer?

We encourage people who have previously applied and who are still interested to re-apply. We have had to turn down a lot of highly qualified people in the past due to the large number of applications. If you have any questions, contact us by email at jour...@code4lib.org or ask any member of the editorial committee (listed at http://journal.code4lib.org/editorial-committee). We plan to make decisions about additional editors by mid-May.

Kelley McGrath on behalf of the Code4Lib Editorial Committee
[CODE4LIB] Finding movies with FRBR facets lightning talk: expanded version
Since my lightning talk at the Code4Lib conference only really talked about the prototype discovery interface, it ended up giving an incomplete sense of the overarching project. If anyone is interested, I put up an expanded version of my slides that includes more of the bigger picture of where we're going to get the data from and how we hope to cooperatively maintain a central data store of information about moving image materials in libraries to drive the user interface. http://pages.uoregon.edu/kelleym/publications/FRBRFacets_C4L2012.pdf Kelley McGrath Metadata Management Librarian University of Oregon kell...@uoregon.edu
Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your brains
to account for. I have no idea how a computer would know whether ill. ought to map to illustration or illustrations in most cases, since the distinction was not recorded. Perhaps illustration(s) would work. That doesn't even start to address mistakes in data, allowing for older rules (AACR1's illus.), non-English language records or local practices that go against the rules. All this is not to say that there isn't a real need here. There ought to be a way both to minimize the amount of typing that catalogers have to do and at the same time provide full, unambiguous displays for users. So what I wish is that there were some way to get more catalogers to see that, despite Watson, there are serious limitations to what computers can practically do, and that we would be better off if we worked with computers' strengths instead of trying to make them do things that are hard for them to do so we can reproduce the form of the card catalog (as opposed to the function). Kelley

On Fri, Nov 18, 2011 at 8:26 AM, Bohyun Kim k...@fiu.edu wrote: As a side note to this, the communication issue is not unique between catalogers and coders. It is a common discussion topic (librarians vs. IT; emerging technology librarians vs. library coders; even web designers vs. web developers). I hear about this a lot in library conferences. But of course, discussion there is mostly from the librarians' point of view. Since code4lib is unique in that many library coders get together, it would be good to hear the thoughts on this from the coders' point of view as well. ~Bohyun

-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kelley McGrath Sent: Thursday, November 17, 2011 7:19 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your brains

I am not by any stretch of the imagination a coder, but I think it would be helpful to have some discussion of common cataloger-coder communication issues. So many cataloger-coder discussions online seem to consist of people talking past each other (although I do think there is a much larger and less vocal common ground in the middle). In addition, I have sometimes seen my cataloger and coder/IT colleagues struggle to communicate with each other and find myself trying to translate. Are there ways to make that translation process easier or cultivate more translators? What do coders wish that catalogers knew about how computers interact with metadata? I would also be interested in ideas on how to shift the conversation more towards underlying functionality. A central failing of computerized catalogs IMO is that they tend to replicate the literal form and actions of cards and the card catalog rather than trying to find a way to express the underlying functionality of the card catalog in a computer environment. This is also sometimes badly done because the programmers don't understand the point of what they're replicating (although to be fair, what they're trying to work with is often not in a form optimized for a computer environment). Uniform titles in many catalogs are a good example of this. Kelley PS Some of the other emails mention wanting help with understanding where real data differs from what's in specifications or differs over time or for other reasons. Speaking as a reasonably competent cataloger, I would say that, although some things can be anticipated in advance, I find this to inevitably be an iterative process. PPS I'm looking forward to attending.
On Thu, Nov 10, 2011 at 11:14 AM, Becky Yoose b.yo...@gmail.com wrote: Hey folks, There's been increasing discussion of and interest in cataloging around this community (and others like it) for quite a while. I found some co-conspirators and we are planning to propose a pre-conference on cataloging/library metadata creation geared towards the huddled code4lib masses (otherwise known as coders) who are yearning for knowledge of this Darkest of Library Arts. We need your help before we post our proposal. We realize that there's a wide range of cataloging knowledge and experience in the community, and we want to make sure that those interested get the most out of the pre-conference. If this pre-conference has piqued your interest, can you help us by letting us know: - What experience do you have with cataloging/library metadata creation? - What do you want us to cover? Do you have any questions that you want covered? This information will help us greatly in how we structure the pre-conference, both in content and schedule. For now, we're planning a half-day pre-conference, but if there's enough interest between beginners and more experienced folks, we will consider offering two half-day preconferences in order to focus on specific participant needs. Feel free to ask questions as well - I'll try to answer them as best as possible given what our group has brainstormed so far. Thanks for reading
Re: [CODE4LIB] Cataloging4Coders @ C4L12 - We need your brains
I am not by any stretch of the imagination a coder, but I think it would be helpful to have some discussion of common cataloger-coder communication issues. So many cataloger-coder discussions online seem to consist of people talking past each other (although I do think there is a much larger and less vocal common ground in the middle). In addition, I have sometimes seen my cataloger and coder/IT colleagues struggle to communicate with each other and find myself trying to translate. Are there ways to make that translation process easier or cultivate more translators? What do coders wish that catalogers knew about how computers interact with metadata? I would also be interested in ideas on how to shift the conversation more towards underlying functionality. A central failing of computerized catalogs IMO is that they tend to replicate the literal form and actions of cards and the card catalog rather than trying to find a way to express the underlying functionality of the card catalog in a computer environment. This is also sometimes badly done because the programmers don't understand the point of what they're replicating (although to be fair, what they're trying to work with is often not in a form optimized for a computer environment). Uniform titles in many catalogs are a good example of this. Kelley PS Some of the other emails mention wanting help with understanding where real data differs from what's in specifications or differs over time or for other reasons. Speaking as a reasonably competent cataloger, I would say that, although some things can be anticipated in advance, I find this to inevitably be an iterative process. PPS I'm looking forward to attending.

On Thu, Nov 10, 2011 at 11:14 AM, Becky Yoose b.yo...@gmail.com wrote: Hey folks, There's been increasing discussion of and interest in cataloging around this community (and others like it) for quite a while. I found some co-conspirators and we are planning to propose a pre-conference on cataloging/library metadata creation geared towards the huddled code4lib masses (otherwise known as coders) who are yearning for knowledge of this Darkest of Library Arts. We need your help before we post our proposal. We realize that there's a wide range of cataloging knowledge and experience in the community, and we want to make sure that those interested get the most out of the pre-conference. If this pre-conference has piqued your interest, can you help us by letting us know: - What experience do you have with cataloging/library metadata creation? - What do you want us to cover? Do you have any questions that you want covered? This information will help us greatly in how we structure the pre-conference, both in content and schedule. For now, we're planning a half-day pre-conference, but if there's enough interest between beginners and more experienced folks, we will consider offering two half-day preconferences in order to focus on specific participant needs. Feel free to ask questions as well - I'll try to answer them as best as possible given what our group has brainstormed so far. Thanks for reading, Becky Official cat[aloger] herder --- Becky Yoose Systems Librarian Grinnell College Libraries yoose...@grinnell.edu
Re: [CODE4LIB] LCSH and Linked Data
On Sun, Apr 17, 2011 at 7:40 AM, Simon Spero s...@unc.edu wrote: The main study on this subject was the Michigan study performed/led by Karen Markey (some reports were written as Karen M. Drabenstott). The final report of the project is available at http://deepblue.lib.umich.edu/handle/2027.42/57992. The work took place in the mid to late 90s, after Airlie. ... The most perplexing results were those that showed that measured understanding was lower when headings were displayed in the context of a bibliographic record rather than on their own. This indicates either a problem in the measurement process, or an even more fundamental problem with subdivided headings that may so negate the significant theoretical advantages of pre-coordination that the value of the whole practice is thrown into doubt.

That is fascinating. And disturbing. I don't think I ever read the original study, but now I'll have to.

Touching on another topic, I believe that the movement of geographic subdivisions to follow the rightmost geographically subdividable subdivision can sometimes be interrupted by the interposition of a $x topical subdivision, but I haven't determined whether this is a legacy exception (the ones that came to mind were related to subtopics of the US Civil War, which seems inevitable given that the first elements are United States--History--Civil War, 1861-1865--).

I think the key here is partly "In 1992, it was decided to adopt that order where it could be applied," so LC didn't promise to do them all. $x History is probably the biggest one that hasn't been made geographically subdividable, but it's hard to say if that's on principle or because of practical concerns about the huge amount of disruption that would cause in individual systems. It's interesting that some of the biggies like economic aspects are more recent.

One of the challenges for pre-coordinated strings, at least as currently implemented (and one that facets evade), is that no order will suit everyone. Which of the following is better?

Dwellings $z Australia $x History $y 20th century
Dwellings $z Indonesia $x Economic aspects
Dwellings $z Indonesia $x Psychological aspects
Dwellings $z Indonesia $x Social aspects
Dwellings $z Ireland $x Economic aspects
Dwellings $z Ireland $x Psychological aspects
Dwellings $z Ireland $x Social aspects
Dwellings $z Japan $x Economic aspects
Dwellings $z Japan $x Psychological aspects
Dwellings $z Japan $x Social aspects

OR (mostly current practice)

Dwellings $z Australia $x History $y 20th century [current practice]
Dwellings $x Economic aspects $z Indonesia
Dwellings $x Economic aspects $z Ireland
Dwellings $x Economic aspects $z Japan
Dwellings $x History $z Australia $y 20th century [Airlie recommendation]
Dwellings $x Psychological aspects $z Indonesia
Dwellings $x Psychological aspects $z Ireland
Dwellings $x Psychological aspects $z Japan
Dwellings $x Social aspects $z Indonesia
Dwellings $x Social aspects $z Ireland
Dwellings $x Social aspects $z Japan

Probably not helpful to have history be an outlier, though. Kelley
Re: [CODE4LIB] LCSH and Linked Data
A few belated ramblings from a cataloger:

1) GEOGRAPHICAL SUBDIVISION

It used to be that geographical subdivision was much more flexible and was supposed to convey different meanings depending on where it occurred in the string. Then there was some research showing that not only did users not know how to interpret this, but catalogers did not understand these rules and were constructing inconsistent headings. This led to a movement for simplification. From LC's Subject Heading Manual: The Subject Subdivisions Conference that took place at Airlie, Virginia, in 1991 recommended that the standard order of subdivisions be [topic]–[place]–[chronology]–[form]. In 1992, it was decided to adopt that order where it could be applied. This leaves a standard order of $a, $b [rare], $x, $z, $y, $v, with some exceptions.

As was pointed out earlier, the current rule is to put the geographic subdivision ($z) as near the end as is legal. This can be mechanically determined based on a fixed field in the authority record. Although fixed fields in bib records are often unreliable, those in authority records are probably as accurate as they can reasonably be made to be, allowing for human error. This is both because LC coordinates training and reviews records and because the fixed fields are used as decision points, so there are short-term consequences for later catalogers if they're not done right. The fixed field (008/06) in LCSH authority records tells you whether a geographic subdivision can come after the heading (http://www.loc.gov/marc/authority/ad008.html). Id.loc.gov doesn't seem to give you that info, but it might be nice if it did.

650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided geographically-indirect] $z England [n 82068148] $x Finance [sh 2002007885, Geo Subd = # = Not subdivided geographically]
650 _0 $a Education [sh 85040989, Geo Subd = i = Subdivided geographically-indirect] $x Economic aspects [sh 99005484, Geo Subd = i = Subdivided geographically-indirect] $z England [n 82068148]

One reason not to rely on found order is that LC has been moving in the direction of the Airlie House recommendation, so in addition to the usual mistakes, you'll probably come across a lot of older forms if you take data from the wild. For example, until somewhat recently, the economic aspects record above looked like the finance one, so you'll probably still see records like 650 _0 $a Education $z England $x Economic aspects.

A) Indirect Subdivision

In general, when a heading string starts with a geographic name, it is in direct order: 651 _0 $a London (England) [n 79005665] $x Economic conditions [sh 99005736]. If a geographic name is modifying a topical heading, it is given in indirect order: 650 _0 $a Education [sh 85040989] $z England $z London [n 79005665; covers both $z subfields]. Thanks to a project that OCLC did for FAST (which uses only the indirect style), in most cases both of these can be extracted from the authority record, which will have a 781 with the indirect form added:

n 79005665
151 $a London (England)
451 $a Londinium (England)
...
781 0 $z England $z London

Some records (usually for geographic areas within cities) cannot be used to modify topical headings, but can be used in 651 $a as the main term in a heading string. These are identified by a note and the lack of a 781.

n 85192245
151 $a Hackney (London, England)
667 $a SUBJECT USAGE: This heading is not valid for use as a geographic subdivision.

B) Geographic Entities and Name vs. Subject Headings

Notice that in the above example, the control number/identifier for Education starts with sh while the one for London starts with n. This is an important distinction. Heading identifiers that start with sh are LCSH terms found in the subject authority file and are available from id.loc.gov. I think these all fall into FRBR's group 3 bib entities. Heading identifiers that start with n are stored in the LC NAF (Name Authority File) and are not available as linked data. These are the FRBR group 1 and 2 entities and maybe some from group 3. Most of these can also be used as subjects in LCSH. So you can't actually get at all the building blocks of LCSH strings nor use linked data for all subjects.

Named geographic features (e.g., mountains, lakes, continents) are established in the subject authority file using the rules in the Subject Cataloging Manual for LCSH. The headings are tagged 151 and can be found at id.loc.gov.

sh 85082617 151 $a McKinley, Mount (Alaska)
sh 85044620 151 $a Erie, Lake
sh 85008606 151 $a Asia

Geographic features appear in bib records only as 651 or 650 + $z subject terms. Jurisdiction names (e.g., cities, states, countries) are established in the name authority file using descriptive cataloging rules (e.g., AACR2 ch. 23 and the NACO Participants' Manual). They
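[Illustrative aside, not from the original post: a small Python sketch of the mechanics described above, using the authority 008/06 value and the 781 indirect form. The record structures are simplified dict stand-ins (not id.loc.gov calls), and it ignores the rule that the $z actually follows the last geographically subdividable element in the string, so treat it only as a starting point.]

def build_heading(topic, place, topical_subdivisions=()):
    """topic: {'label': ..., 'geo_subd': ...}  geo_subd taken from authority 008/06
    place: {'indirect': [...]}  indirect form taken from the 781 $z subfields"""
    parts = ['$a ' + topic['label']]
    parts += ['$x ' + s for s in topical_subdivisions]
    if topic['geo_subd'] == 'i':             # 'i' = may be subdivided geographically (indirect)
        parts += ['$z ' + z for z in place['indirect']]
    return ' '.join(parts)

print(build_heading({'label': 'Education', 'geo_subd': 'i'},
                    {'indirect': ['England', 'London']}))
# $a Education $z England $z London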
Re: [CODE4LIB] regexp for LCC?
At one point, much to my surprise, someone told me that 050 is defined for numbers assigned by LC, not for LCC numbers per se. It doesn't really sound like that from the current definition (http://www.loc.gov/marc/bibliographic/bd050.html), but if you look on the ITS page (http://www.itsmarc.com/crs/edit7592.htm), which I think is not up-to-date, you'll see a discussion of pseudo call numbers and other forms of LC call numbers. As someone pointed out, only a very few classes start with three letters (off the top of my head, a couple in D and a number in K; see http://library.duke.edu/services/instruction/libraryguide/lcclass.html, but there are more in K than are listed here). The pseudo or shelf numbers I've seen most often in 050 are MLC and SD (which unfortunately is the same as the class for forestry). Look for SD on musical recording records (it used to really mess up the attempts of the catalog where I used to work to facet music CDs on LC class; there were a few other common ones, but I've forgotten them). Depending on what you're doing, you might try to prefer a call number in 090 if there is one. These are more likely to reflect local preference. Looking up 090 (http://www.oclc.org/bibformats/en/0xx/090.shtm) produced some other examples of non-LCC 050s: PAR, Newspaper, UNC, or NOT IN LC. Good luck! Kelley

***

Except now I wonder if those annoying MLCS call numbers might actually be properly MATCHED by this regex, when I need 'em excluded. They are annoyingly _similar_ to a classified call number. Well, one way to find out. And the reason this matters is to try and use an LCC to map to a 'discipline' or other broad category, either directly from the LCC schedule labels, or using a mapping like umich's: http://www.lib.umich.edu/browse/categories/ But if it's not really an LCC at all, and you try to map it, you'll get bad postings.

On 3/31/2011 1:03 PM, Jonathan Rochkind wrote: Thanks, that looks good! It's hosted on Google Code, but I don't think that code is anything Google uses; it looks like it's from our very own Bill Dueber.

On 3/31/2011 12:38 PM, Tod Olson wrote: Check the regexp that Google uses in their call number normalization: http://code.google.com/p/library-callnumber-lc/wiki/Home You may want to remove the prefix part, and allow for a fourth cutter. The folks at UNC pointed me to this a few months ago. -Tod

On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: Does anyone have a good regular expression that will match all legal LC Call Numbers from the LC Classified Schedule, but will generally not match things that could not possibly be an LC Call Number from the LC Classified Schedule? In particular, I need it to NOT match an MLC call number, which is an LC-assigned call number that shows up in an 050 with no way to distinguish based on indicators, but isn't actually from the LC Schedules. Here's an example of an MLC call number: MLCS 83/5180 (P) Hmm, maybe all MLC call numbers begin with MLC; okay, I guess I can exclude them just like that. But it looks like there are also OTHER things that can show up in the 050 but aren't actually from the classified schedule; the OCLC documentation even contains an example of Microfilm 19072 E. What a mess, huh? So, yeah, regex anyone? [You can probably guess why I care if it's from the LC Classified Schedule or not.]

Tod Olson t...@uchicago.edu Systems Librarian University of Chicago Library
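[Illustrative aside, not from the original thread: since the thread never settles on a pattern, here is one rough Python attempt assembled from the constraints mentioned above (1-3 class letters, class number, optional decimal, cutters) with an exclusion list for the pseudo numbers discussed. SD is deliberately not excluded because it is also the real forestry class. Treat it as a starting point, not a vetted validator like the library-callnumber-lc code linked above.]

import re

# Known LC-assigned shelf/pseudo prefixes mentioned in the thread.
PSEUDO = re.compile(r'^(MLC|Microfilm|Newspaper|PAR|UNC|NOT IN LC)', re.IGNORECASE)

# Very rough LCC shape: class letters, class number, optional decimal,
# zero or more cutters, optional year.
LCC = re.compile(r"""
    ^[A-Z]{1,3}            # class letters
    \s?\d{1,4}(\.\d+)?     # class number, optional decimal
    (\s?\.?[A-Z]\d+)*      # cutters
    (\s\d{4})?             # optional year
    """, re.VERBOSE)

def looks_like_lcc(callnum):
    callnum = callnum.strip()
    if PSEUDO.match(callnum):
        return False
    return bool(LCC.match(callnum.upper()))

print(looks_like_lcc('PN1995.9.V3'))       # True
print(looks_like_lcc('MLCS 83/5180 (P)'))  # False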
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
-Original Message- From: McElwain, Paul Benjamin [mailto:pbmce...@indiana.edu] In my work on the Variations FRBR implementation, as a data modeler, I was struck by how little attention had been paid to the relationships in the FRBR Report. I'm not surprised, though; treating the relationships at the entity level (having their own attribution) is a more obtuse exercise of abstraction. We do treat relationships as attributed entities, for more information about the role involved (a creator may act as a composer). One way to think about the originality of an expression of a work, being the first-ever expression, could be as an attribute of the relationship between the work and expression. Paul...

Paul, I agree that from a theoretical perspective it makes a lot of sense to model the original expression as an attribute of the relationship between the work and that expression. Or to do what FRBRoo did and make classes like Work Conception and F28 Expression Creation. It's not really clear to me that in our particular situation there is any practical advantage to trying to do that rather than creating a merged work/primary expression entity. This has to do with the kind of expressions we're mostly modeling and the way we're trying to model them.

Most of the moving image expressions that average libraries deal with are defined not by what I think of as bundled attributes, but rather by a set of independent attributes. This is unlike the typical music expression, which I think of as a set of bundled attributes. If you know you have a performance of X work on Y date in Z venue, then, if someone has previously created an expression record, you know a number of other things about that expression, such as the composer, performers, and arrangement of the piece, without having to re-verify them again. All those things could productively be stored as a unit. This also happens with film for various cuts, such as airplane versions or director's cuts. These do have some associated attributes, notably length, but also perhaps a different editor.

For the kinds of unbundled attributes that are common with moving images, especially DVDs, there are a large number of attributes, like soundtrack and subtitle languages, accessibility options (captions, audio descriptions), and aspect ratio, that vary independently. With these kinds of unbundled expression attributes, a cataloger has to reexamine all of them every time there is a new manifestation. If there's a change in our knowledge of what subtitles are on a specific manifestation, it does not have automatic implications for any other manifestation that might have that same constellation of options.

The other types of attributes that describe the original expression of a film are those that never change because they are important facts about the history of the work that we want to note in conjunction with any future expression. Many of these are things that RDA says are attributes of expressions that moving image catalogers would tend to think of as attributes of works (e.g., casts and costume designers do not vary among expressions, so why record them on every expression?). In a sense, at least for moving images, the original expression is a bit of an abstraction, and in practice we get most of our information from reference sources.

At first, I thought we could just model these unbundled attributes of the expression as attributes of the manifestation/publication since, as I mentioned above, they have to be verified with every new manifestation anyway.
Work record 1
Dracula (1931)
Tod Browning
English

Manifestation record 1
1 VHS videocassette (1985)
OCLC#: 13754402
Audio: English

I ran into trouble with manifestations that include more than one work. Some still work well enough, either because the expression-level information is all the same or is unknown.

Work record 1
Ursula (1961)
Lloyd Michael Williams
English

Manifestation record 1
1 DVD video (2005)
Experiments in terror
ISBN: 0976523922
Audio: English

Work record 2
Journey into the Unknown (2002)
Kerry Laitala
English

However, in some cases, the expression-level information varies between two works on a single manifestation/publication. The manifestation below includes two versions of Dracula, each in its original language. For the prototype, I just made two different manifestation records, which repeat most of the same information. That doesn't seem to me to be a desirable long-term solution.

Work record 1
Dracula (1931)
Tod Browning
English

Manifestation record 1
1 DVD video (1999)
ISBN: 0783227450
Audio: English
Subtitles: English or French

Work record 2
Dracula (1931)
George Melford
Spanish

Manifestation record 2
1 DVD video (1999)
ISBN: 0783227450
Audio: Spanish
Subtitles: English or French

So I think we do need the intermediate expression level, but I am not sure if
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
Okay, I tried to put in tables and that didn't work. I'm trying again with tabs. See if this makes more sense--Kelley

-Original Message- From: McElwain, Paul Benjamin [mailto:pbmce...@indiana.edu] In my work on the Variations FRBR implementation, as a data modeler, I was struck by how little attention had been paid to the relationships in the FRBR Report. I'm not surprised, though; treating the relationships at the entity level (having their own attribution) is a more obtuse exercise of abstraction. We do treat relationships as attributed entities, for more information about the role involved (a creator may act as a composer). One way to think about the originality of an expression of a work, being the first-ever expression, could be as an attribute of the relationship between the work and expression. Paul...

Paul, I agree that from a theoretical perspective it makes a lot of sense to model the original expression as an attribute of the relationship between the work and that expression. Or to do what FRBRoo did and make classes like Work Conception and F28 Expression Creation. It's not really clear to me that in our particular situation there is any practical advantage to trying to do that rather than creating a merged work/primary expression entity. This has to do with the kind of expressions we're mostly modeling and the way we're trying to model them.

Most of the moving image expressions that average libraries deal with are defined not by what I think of as bundled attributes, but rather by a set of independent attributes. This is unlike the typical music expression, which I think of as a set of bundled attributes. If you know you have a performance of X work on Y date in Z venue, then, if someone has previously created an expression record, you know a number of other things about that expression, such as the composer, performers, and arrangement of the piece, without having to re-verify them again. All those things could productively be stored as a unit. This also happens with film for various cuts, such as airplane versions or director's cuts. These do have some associated attributes, notably length, but also perhaps a different editor.

For the kinds of unbundled attributes that are common with moving images, especially DVDs, there are a large number of attributes, like soundtrack and subtitle languages, accessibility options (captions, audio descriptions), and aspect ratio, that vary independently. With these kinds of unbundled expression attributes, a cataloger has to reexamine all of them every time there is a new manifestation. If there's a change in our knowledge of what subtitles are on a specific manifestation, it does not have automatic implications for any other manifestation that might have that same constellation of options.

The other types of attributes that describe the original expression of a film are those that never change because they are important facts about the history of the work that we want to note in conjunction with any future expression. Many of these are things that RDA says are attributes of expressions that moving image catalogers would tend to think of as attributes of works (e.g., casts and costume designers do not vary among expressions, so why record them on every expression?). In a sense, at least for moving images, the original expression is a bit of an abstraction, and in practice we get most of our information from reference sources.
At first, I thought we could just model these unbundled attributes of the expression as attributes of the manifestation/publication since, as I mentioned above, they have to be verified with every new manifestation anyway.

[Work record 1 is linked to Manifestation record 1]
Work record 1
    Dracula (1931)
    Tod Browning
    English
Manifestation record 1
    OCLC#: 13754402
    Audio: English

I ran into trouble with manifestations that include more than one work. Some still work well enough, either because the expression-level information is all the same or is unknown.

[Work record 1 is linked to Manifestation record 1]
[Work record 2 is also linked to Manifestation record 1]
Work record 1
    Ursula (1961)
    Lloyd Michael Williams
    English
Work record 2
    Journey into the Unknown (2002)
    Kerry Laitala
    English
Manifestation record 1
    1 DVD video (2005)
    Experiments in terror
    ISBN: 0976523922
    Audio: English

However, in some cases, the expression-level information varies between two works on a single manifestation/publication. The manifestation below includes two versions of Dracula, each in its original language. For the prototype, I just made two different manifestation records, which repeat most of the same information. That doesn't seem to me
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
Matthew, I find it confusing as well, but as Karen points out, that's the way the FRBR model does things. It seems to be driven by the need for the work to be such an abstract thing that it is prior to words. However, it does seem to me that the meaning of the language of a particular expression is not complete without reference to the original language. One of the FRAD drafts (http://archive.ifla.org/VII/d4/franar-conceptual-model-2ndreview.pdf) actually did propose original language as an attribute of the work ("The language in which the work was first expressed"), but that was axed, so it seems to have been a very conscious decision on the part of the creators of FRBR. The idea does seem to have generated some controversy. From ALA's feedback on this draft: At least one task force member was a bit uneasy with this attribute, noting that, although the attribute has a certain utility, the work entity is abstract in FRBR and is not associated with any particular language (e.g. Ancient Greek is the language of the first expression of the Iliad, but not the language of the work, which encompasses what all of the expressions have in common). Others thought that an original language attribute was appropriate for work (for textual works, anyway), that all expressions of a work do have the same original language even if the language of the expressions themselves can differ, and that the attribute is necessary for determining whether or not the expression represents a translation. It was suggested that the attribute would not be appropriate for a superwork entity, were one to be defined. (http://www.libraries.psu.edu/tas/jca/ccda/docs/tf-frad3.pdf) Kelley

-Original Message- From: Beacom, Matthew matthew.bea...@yale.edu

Thank you, Karen, It has been awhile since I refreshed my memory with actually reading FRBR. Language is an attribute of the FRBR expression and not the FRBR work entity. I must still have a dominant pre-FRBR concept of work in my mind! I need another 5 years in the re-education camp. Matthew

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Karen Coyle Sent: Monday, December 13, 2010 10:51 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface

Quoting Beacom, Matthew matthew.bea...@yale.edu: Sometimes I feel like we should all have the FRBR diagram tattoo'd on our arms so we can consult it any time anywhere. :-) With as complex a thing as a film--so many authors, images, music, dialog, acting, sets, costume, etc., etc., etc.--applying the FRBR model is tough, and your implementation is quite sensible. However, I had a small question about one thing you said about FRBR not allowing language at the work level. That doesn't seem right to me. How could the language of a thing that is primarily or even partially made of language--like a novel or a motion picture with spoken dialogue--not be considered at the work level rather than at some other level?

Matthew, I can't answer how it is possible, but I can tell you that it is a fact: language is an attribute of Expression, not of Work. That's kind of the key meaning of frbr:Expression -- it is the Expression of the Work, and the Work doesn't exist until Expressed. So Work is a very abstract concept in FRBR. (Which is why more than one attempted implementation of FRBR that I have seen combines Work and Expression attributes in some way.)
Not only that, but Kelley's model uses something that I consider to be missing from FRBR: the concept of an original Expression. For FRBR (and thus for RDA) all expressions are in a sense equal; there is no privileged first or original expression. Yet there is evidence that this is a useful concept in the minds of users. Some recent user studies [1] around FRBR showed that this is a concept that users come up with spontaneously. Also, I can't think of any field of study where knowing what the original expression of a work was wouldn't be important. Because of the way we treat translations--not just in FRBR--as what FRBR calls expressions, not as new works, a translation from the original language to another would be considered an FRBR expression. Could you explain this a bit more? The FRBR relationship "translation of" is an Expression-to-Expression relationship. (See my personal cheat sheet of RDA/FRBR relationships [2].) kc [1] http://www.asis.org/asist2010/abstracts/75.html [2] http://kcoyle.net/rda/group1relsby.html Thank you. Matthew -Original Message- ... This also allowed us to get around some of the areas of more orthodox FRBR modeling that we found unhelpful. For example, FRBR doesn't allow language at the Work level, but we think it is important to record the original language of a moving image at the top level. -- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234
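A tiny sketch of the distinction Karen describes, using plain Ruby structs (purely illustrative; the field names are invented): language hangs off the Expression, "translation of" links two Expressions, and nothing in the model marks one Expression as the original without adding something FRBR itself does not define.

    # Illustrative only: FRBR puts language on the Expression, not the Work,
    # and "translation of" is an Expression-to-Expression relationship.
    Work       = Struct.new(:title)
    Expression = Struct.new(:work, :language, :translation_of)

    iliad         = Work.new("Iliad")
    greek_text    = Expression.new(iliad, "Ancient Greek", nil)
    english_trans = Expression.new(iliad, "English", greek_text)

    # To get a privileged "original expression" (or an original-language
    # statement on the work itself), you have to add a flag or a work-level
    # pointer -- which is essentially the extension the OLAC model makes by
    # folding primary-expression facts into the top level.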
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
Karen, I'm glad you found it helpful and I will definitely consider writing it up somewhere. Right now I'm also struggling to write something up on the data modeling problems I had in a way that is comprehensible to anyone other than me. That might make a good complement to this discussion. I look forward to any comments or suggestions that you or anyone else has. We are trying to get as much feedback as possible. Kelley -Original Message- Kelley, this is great! Thanks. And since you already have so much written up, would you consider going a bit further and offering it to the code4lib journal? My reasons are selfish -- I'd like to be able to find and cite this in the future. Later I may have a few comments. kc Quoting Kelley McGrath kell...@uoregon.edu: We called it FRBR-inspired since it probably wouldn't pass muster as an orthodox FRBR interpretation. We were looking to experiment with a practical approach that we thought would make it much easier for patrons to discover moving images in libraries and archives. If you haven't read it, the about page gives a general overview of our approach at http://blazing-sunset-24.heroku.com/page/about
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
One other thing about this project that might be of interest to Code4Lib readers is that the most technically challenging part of the interface was making the facets work properly so that they simultaneously apply limits across tables that are linked by a many-to-many relationship. The two main tables involved are Movies/Programs (works/primary expressions) and Versions (expressions/manifestations). These go with the two sets of facets, which we visually separate in the interface in the hope of communicating their different functions to users. Movies obviously can have many versions. If you look at the Citizen Kane record, you can see that it was released in many formats, including VHS, DVD and LaserDisc, with various language options. A given manifestation can also contain more than one work. If you search for Kyle XY, you'll get ten records for episodes that are part of a season of the TV program. These are all on the same manifestation. The versions table is also linked to a table that represents items and is the intersection of the versions/manifestations table and the libraries table, but this is a one-to-many relationship. The facet counts under Versions are really for items, but it would be interesting to see whether this would be more useful if the counts were for versions. Kelley
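For readers wondering how facets can cut across a many-to-many join like this in a Solr/Blacklight stack, one common technique is denormalization: index one Solr document per Movie/Program and copy the version- and item-level facet values into multivalued fields on that document. The sketch below shows only that general technique; the field names and the movie/version/item accessors are made up and this is not the prototype's actual indexing code.

    require 'rsolr'  # Ruby Solr client commonly used alongside Blacklight

    # Hypothetical objects: movie.versions returns the linked versions,
    # and each version knows its items and their holding libraries.
    def index_movie(solr, movie)
      solr.add(
        id:                  movie.id,
        title_t:             movie.title,
        year_i:              movie.year,
        original_language_f: movie.original_language,  # work/primary-expression facet
        # Version- and item-level values flattened onto the work document
        # so a single Solr query can facet on both levels at once:
        format_f:          movie.versions.map(&:format).uniq,
        soundtrack_lang_f: movie.versions.flat_map(&:soundtrack_languages).uniq,
        subtitle_lang_f:   movie.versions.flat_map(&:subtitle_languages).uniq,
        library_f:         movie.versions.flat_map { |v| v.items.map(&:library) }.uniq
      )
    end

    # solr = RSolr.connect(url: 'http://localhost:8983/solr/movies')
    # movies.each { |m| index_movie(solr, m) }
    # solr.commit

With this kind of flattening, the trade-off Kelley mentions shows up immediately: facet counts reflect whatever unit the indexed documents represent, so counting per item versus per version is a deliberate indexing decision rather than something you get for free.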
Re: [CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
We called it FRBR-inspired since it probably wouldn't pass muster as an orthodox FRBR interpretation. We were looking to experiment with a practical approach that we thought would make it much easier for patrons to discover moving images in libraries and archives. If you haven't read it, the about page gives a general overview of our approach at http://blazing-sunset-24.heroku.com/page/about

Our top level is a combination of FRBR work information and information about what we are calling the primary expression. We haven't made any internal distinction between these two types of information. This enables us to record together the data that we think people expect to see about the generic moving image and reflects the sort of information that is given in IMDb, the All Movie Guide, and film and TV reference sources. This is also the data that we would want to re-use in every MARC record for a manifestation of a given movie. This also allowed us to get around some of the areas of more orthodox FRBR modeling that we found unhelpful. For example, FRBR doesn't allow language at the Work level, but we think it is important to record the original language of a moving image at the top level. In addition, RDA has mapped a number of functions, such as art director, costume designer and performer, to the expression level. We would prefer to present these at the top level. It is hard to imagine a version of Gone With the Wind with a different costume designer or cast that would still be the same work. So all the Seven Samurai data you listed above belongs either to the work or the primary expression.

We mingle expression, manifestation and item information in the version facets on the right. We don't show any explicit expression records. In this demonstration we are not actually identifying any unique expressions, although in the future we will probably want to do this for what I think of as named expressions. Since this is a demo, we are working with a limited number of attributes and the only expression-level facets we provide are soundtrack and subtitle languages. In this sense, our approach is similar to the "near manifestation" idea that Simon mentioned. We are not trying to assert that we have identified particular expressions. Rather, we are trying to provide a mechanism for the user to identify the set of items that meet their needs. It is not clear to me that libraries are always in a position to accurately identify expressions.

Rather than providing a hierarchical view where the user selects a work, then an expression, and so on, as is common in FRBR presentations, we permit the user to begin at any FRBR level. The user is invited to limit by as many characteristics as they desire to delineate the set of things that they are interested in. They only need to select as many attributes as are important to them and no more. This may not meet the needs of all scholars, but we hope that it will meet the vast majority of general-purpose user needs. It's a bit of a different approach from what I have seen elsewhere, but I think it works particularly well for moving images. One of the main reasons I think this is the types of expressions that predominate in commercial moving images. I will try to explain some of my thoughts on types of expressions below.

1. Expressions that can be reduced to controlled vocabulary options

These are the most common types of commercial moving image expressions, especially in the DVD era.
They are distinguished by characteristics such as:
- Soundtrack language(s)
- Subtitle language(s)
- Accessibility options (captioning, SDH, and audio description)
- Aspect ratio (although in this era of widescreen TVs, full screen modifications are less common)
- Colorization
- Soundtracks for silent films

These can be fully described based on standardized data (although for the silent film soundtracks, this would involve multiple pieces of information, i.e., musical work, composer, conductor, performer(s), etc.). DVDs often contain what are essentially multiple expressions in that they offer multiple soundtrack and subtitle options and may offer multiple aspect ratios. A silent film on DVD may come with alternate soundtracks. All of these can be combined in various ways by the viewer, which can make for a large number of expressions contained in a single manifestation.

2. Named expressions

These are versions that differ in moving image content because they have been edited differently. Examples include:
- Theatrical release
- Director's cut
- Unrated version

Although Martha Yee found a strong correlation between differences in duration and the likelihood that two things represent two different expressions, this doesn't always work. The archetypical example of Blade Runner was released on DVD with five different versions (http://en.wikipedia.org/wiki/Versions_of_Blade_Runner), all of which had run times within
[CODE4LIB] Announcing OLAC's prototype FRBR-inspired moving image discovery interface
OLAC (Online Audiovisual Catalogers) is excited to announce the availability of our prototype for a FRBR-inspired, work-centric, faceted discovery interface for moving images at http://blazing-sunset-24.heroku.com. The OLAC Work-Centric Moving Image Discovery Interface Prototype is an exploration of the possibilities of leveraging the Functional Requirements for Bibliographic Records (FRBR) model and faceted search to improve access to moving image materials held by libraries and archives. This prototype was funded by OLAC. Chris Fitzpatrick developed the demonstration interface to meet OLAC’s specifications using the free open source tools Ruby on Rails, Solr, and the Blacklight and Hydra plug-ins. This project was only possible due to the contributions of a great many people, some of whom are listed at http://blazing-sunset-24.heroku.com/page/credits. In this demonstration interface we present the user with a two-level view inspired by the FRBR model. The top level, labeled Movie or Program, provides information about the FRBR Work and what we are calling the Primary Expression, usually the first publicly released Expression. Facets for the Work/Primary Expression level are displayed across the top of the screen, and the records found in the hit list contain information about the Work and Primary Expression. The second level, labeled Version, includes information about Expressions (language options), Manifestations (format and publication date), and in a very basic way about Items (what libraries or archives hold a particular Manifestation). Facets for the Version level are displayed separately on the side of the screen, and information about the particular Versions that meet the user’s qualifications is displayed below each Work/Primary Expression. An overview of the goals of the interface is available at http://blazing-sunset-24.heroku.com/page/about. Some suggested sample searches and potential use cases may be seen at http://blazing-sunset-24.heroku.com/page/samples. We invite you to check it out and send us your feedback. Comments, questions, complaints, and suggestions may be sent to me at kell...@uoregon.edu. Also, if you are interested in contributing to a larger grant project to try to bring this idea into a production environment, please contact me. Kelley McGrath Metadata Management Librarian University of Oregon kell...@uoregon.edu
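For anyone curious how the two facet groups described above might be declared in a Blacklight-based application like this one, here is a generic configuration sketch. It uses the configuration style of more recent Blacklight releases (the 2010 prototype may have been wired differently), the Solr field names are invented, and the visual separation of the two groups into "across the top" and "on the side" would be a view-layer customization rather than anything Blacklight provides out of the box.

    # app/controllers/catalog_controller.rb (generic Blacklight sketch, not the prototype's code)
    class CatalogController < ApplicationController
      include Blacklight::Catalog

      configure_blacklight do |config|
        # Movie/Program (Work + Primary Expression) facets
        config.add_facet_field 'genre_f',             label: 'Genre'
        config.add_facet_field 'original_language_f', label: 'Original language'
        config.add_facet_field 'release_year_i',      label: 'Year'

        # Version (Expression/Manifestation/Item) facets
        config.add_facet_field 'format_f',          label: 'Format'
        config.add_facet_field 'soundtrack_lang_f', label: 'Soundtrack language'
        config.add_facet_field 'subtitle_lang_f',   label: 'Subtitle language'
        config.add_facet_field 'library_f',         label: 'Library'
      end
    end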
[CODE4LIB] FRBR work-centric, faceted UI demo developer sought
Hi, I thought I would send this again since so far I haven't heard from anyone. Unfortunately, we don't have a great deal of money to offer, but I think this would be an interesting project for the right person. It might even be a good project for an LIS student or recent grad looking for something for a resume. If you have any questions, please feel free to contact me. Kelley -Original Message- OLAC (Online Audiovisual Catalogers) has been investigating the potential of the FRBR model and a work-centric approach to improve access to moving images for some time. We are looking for someone to make a basic but functional demonstration end-user interface for moving images that is focused on FRBR works and that offers faceted navigation using sample data for 143 moving image works, 210 manifestations, and 297 items. Ideally, this will be developed with open source tools such as MySQL, Solr and Lucene. I have some ideas about what the interface might look like (see link below) and am looking for someone to put up something quick and dirty, but functional and interactive, so people can get a better idea of how this might work. This may not turn out to be anything like what would work for a final user interface, but I am hoping that it will make the potential for a FRBR-based, faceted approach clear and make it easier for people to understand the kinds of searching options we want to provide. OLAC has agreed to fund $1500 to be awarded to the individual(s) who successfully completes this project. More information on this project and the sample data are available at http://www.olacinc.org/drupal/?q=node/437 If you are interested in taking this project on, please contact me at kell...@uoregon.edu via email by Friday, October 22 with a list of your qualifications, a suggested timeline, and any other information you think might be helpful for us to know. We are willing to negotiate on the timetable, but are interested in having a finished product as soon as possible. Please contact me if you have any questions. Kelley McGrath kell...@uoregon.edu