Re: [CODE4LIB] pdf2txt [tesseract]
On Oct 16, 2013, at 10:56 AM, Robert Haschart rh...@virginia.edu wrote: The abstract extraction routine I have been working on does use tesseract internally for doing OCR when it encounters a document that doesn't have usable full-text. I agree that tesseract is not that easy to install, especially if (as in my case) you do not have root/sudo access to the machine. Since I have gone through installing tesseract quite recently, perhaps my experience can be helpful to you. Robert, can you outline the process you used to get Tesseract to do OCR against PDF documents? I installed Tesseract a few months ago, but I couldn't figure out how to get it to work against PDF, only some image files. Any pointers would be greatly appreciated. (Hmmm. Maybe Tesseract doesn't do PDF files, only image files, and I need to convert my PDFs to images, and then to Tesseract.) --Eric Morgan
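Eric's guess (render the PDF to images, then OCR the images) is the approach Robert describes later in this thread. A minimal sketch in Python, assuming Ghostscript (`gs`) and the `tesseract` command-line tool are both on the PATH. The `tiffg4` device, the 300 dpi resolution, the five-page limit and the `page-%03d.tif` naming are illustrative choices, not anyone's actual settings, and `ghostscript_cmd`, `tesseract_cmd` and `ocr_pdf` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def ghostscript_cmd(pdf: str, out_dir: str, first: int = 1, last: int = 5,
                    dpi: int = 300) -> list[str]:
    """Build a gs command that renders pages first..last to one TIFF per page."""
    pattern = str(Path(out_dir) / "page-%03d.tif")  # gs numbers output pages from 001
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiffg4",  # 1-bit G4 TIFF; tiffgray is a greyscale alternative
        f"-r{dpi}",
        f"-dFirstPage={first}", f"-dLastPage={last}",
        f"-sOutputFile={pattern}",
        pdf,
    ]


def tesseract_cmd(tiff: str) -> list[str]:
    """tesseract writes <outputbase>.txt, so page-001.tif yields page-001.txt."""
    base = str(Path(tiff).with_suffix(""))
    return ["tesseract", tiff, base]


def ocr_pdf(pdf: str, out_dir: str, pages: int = 5) -> list[str]:
    """Render the first `pages` pages, OCR each image, return one text per page."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_cmd(pdf, out_dir, 1, pages), check=True)
    texts = []
    for tiff in sorted(Path(out_dir).glob("page-*.tif")):
        subprocess.run(tesseract_cmd(str(tiff)), check=True)
        texts.append(tiff.with_suffix(".txt").read_text(encoding="utf-8"))
    return texts
```

Keeping the command builders separate from `ocr_pdf` makes it easy to inspect or log the exact `gs` and `tesseract` invocations before running them on a batch.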
[CODE4LIB] Online validator for RelaxNG or Schematron?
Does anyone know of an online validator for either RELAX NG or Schematron? Thanks, Mark Mark Wolfe Curator of Digital Collections M. E. Grenander Department of Special Collections & Archives, Science Library 355, University at Albany, SUNY 1400 Washington Avenue, Albany NY 1 Phone: (518) 437-3934 Email: mwo...@albany.edu
Re: [CODE4LIB] pdf2txt [tesseract]
Hi Eric, On Thu, Oct 17, 2013 at 09:43:04AM -0400, Eric Lease Morgan wrote: (Hmmm. Maybe Tesseract doesn't do PDF files, only image files, and I need to convert my PDFs to images, and then to Tesseract.) Once you have Tesseract installed, the easiest way to use it for adding an OCR text layer to PDF files is, IMHO, this Ruby script: https://github.com/gkovacs/pdfocr Geza Kovacs wrote it for Cuneiform and an old version of OCRopus. I added Tesseract support later. If you cannot use Ruby for some reason, I could upload a Bash script that does the same thing. Cheers, Christian -- Christian Pietsch · http://purl.org/net/pietsch LibTec · Library Technology and Knowledge Management Bielefeld University Library, Bielefeld, Germany
Re: [CODE4LIB] Google Analytics on multiple systems
Hi Joel, It usually ends up being easiest to go with one GA account, separating different sources by using different properties (e.g., UA-[acct number]-1 for CONTENTdm, UA-[acct number]-2 for LibGuides, etc.) rather than separate accounts entirely. Each property can have different users with different permission levels, so you can customize who has access to what. You can further refine each property into different profiles if you want to filter data from one source in different ways. Having everything under one account makes it easy to manage and apply common settings (like users, filters, or custom reports) across properties and profiles. If you add another user, you only have to add them to one account, too. There are limits to the number of allowed properties (it's quite high and goes up occasionally; I'm not sure what it is offhand), so if you bumped into that you could use another GA account. Google has made it easier in recent months to jump between accounts and properties, though. (Sorry for the delayed reply; catching up on listservs.) On Mon, Oct 14, 2013 at 2:36 PM, Joel Marchesoni jma...@email.wcu.edu wrote: Hello, We currently have Google Analytics on our main library pages and digital collections pages on the same domain. Now that CONTENTdm has a GA easy button we are going to add Analytics to it as well, and while we're at it probably LibGuides and non-authenticated ILLiad pages (I mainly want to see what percentage of ILLiad hits are mobile) as well. I was hoping to hear from the list whether you have all service points in one GA account or a separate account for each one, and why. Thanks, Joel Marchesoni Tech Support Analyst Hunter Library, Western Carolina University http://library.wcu.edu/ 828-227-2860 ~Please consider the environment before printing this email~
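A minimal sketch of the one-account, many-properties layout Josh describes, kept as a simple planning table in Python. The account number `12345678`, the service list and the helper name `property_ids` are hypothetical. Google assigns the trailing property number itself when a property is created, so a table like this only records the intended mapping; it does not create anything.

```python
def property_ids(account_number: str, services: list[str]) -> dict[str, str]:
    """Map each service to a UA-<account>-<n> property ID, numbering from 1.

    Assumes properties are created in the order listed, one per service.
    """
    return {
        service: f"UA-{account_number}-{n}"
        for n, service in enumerate(services, start=1)
    }


# Hypothetical account number and services, for illustration only.
plan = property_ids("12345678", ["CONTENTdm", "LibGuides", "ILLiad"])
for service, prop in plan.items():
    print(service, prop)
```

Users, filters and custom reports are then managed once at the account level, while each service's tracking snippet carries its own property ID.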
Re: [CODE4LIB] Google Analytics on multiple systems
Thank you all for your replies. I'm thinking we'll go with one account (we already have a Google account for various other services) with multiple properties. One thing that has complicated matters is that the property we currently use cannot yet be upgraded to Universal Analytics, which is what CONTENTdm uses. FYI I noticed in my own research that the property limit is 250,000. I don't see us hitting that ever... Joel
Re: [CODE4LIB] pdf2txt [tesseract]
On 10/17/2013 9:43 AM, Eric Lease Morgan wrote: (Hmmm. Maybe Tesseract doesn't do PDF files, only image files, and I need to convert my PDFs to images, and then to Tesseract.) That's correct. I use Ghostscript to print the PDF to a series of .tiff files, and then use Tesseract to perform OCR on the individual .tiff images, producing a .txt file for each page. Since I'm only looking to extract the abstract, I limit Ghostscript to the first 5 pages, and then do post-processing and apply various heuristics to find and fix the abstract. One particular issue I've found is that Tesseract is fond of detecting ligatures such as fi, fl, ff, ffl and ffi but doesn't seem to be very good at selecting the correct one (at least for my data), so one of the post-processing steps is to expand each ligature into its individual characters and do a dictionary look-up to help select the correct expansion.
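The ligature clean-up step Robert mentions can be sketched as follows. This is an illustration of the idea, not Robert's code, and it assumes the OCR text contains the Unicode presentation-form ligatures U+FB00 to U+FB04 and that a lowercase word list is available as a Python set. Each ligature is first expanded the way Tesseract recognised it; if that spelling is not in the dictionary, the other expansions are tried in turn.

```python
import re
from itertools import product

# Unicode presentation-form ligatures (U+FB00..U+FB04) and the letters they stand for.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}
EXPANSIONS = sorted(set(LIGATURES.values()))


def _choices(lig: str) -> list[str]:
    """The expansion matching what the OCR recognised first, then the alternatives."""
    own = LIGATURES[lig]
    return [own] + [e for e in EXPANSIONS if e != own]


def expand_word(word: str, dictionary: set[str]) -> str:
    slots = [i for i, ch in enumerate(word) if ch in LIGATURES]
    if not slots:
        return word
    # Try every combination of expansions, starting with the OCR's own reading.
    for combo in product(*(_choices(word[i]) for i in slots)):
        chars = list(word)
        for i, expansion in zip(slots, combo):
            chars[i] = expansion
        candidate = "".join(chars)
        if candidate.lower() in dictionary:
            return candidate
    # Nothing matched: fall back to the literal expansion of what was recognised.
    return "".join(LIGATURES.get(ch, ch) for ch in word)


def expand_text(text: str, dictionary: set[str]) -> str:
    # \w also matches the ligature characters, which are Unicode letters.
    return re.sub(r"\w+", lambda m: expand_word(m.group(0), dictionary), text)
```

For example, if Tesseract reads "efficient" with an fl ligature, the dictionary look-up rejects "eflcient" and "effcient" and settles on "efficient".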
Re: [CODE4LIB] MARC field lengths
Thanks, Bill. What you say about assumptions is a good part of what is motivating me to try to instigate a discussion. As you know, both FRBR and RDA were developed by the cataloging community with no input from technologists. There are sweeping statements about FRBR being more efficient than the MARC model but, as far as I can find, no real analysis behind them. There was a study done at OCLC on the ratio of Works to Manifestations (and that shows in their stats today), but the OCLC catalog is not representative of the catalog of a single library. What I'm hoping to do is to surface some of the assumptions so that we can talk about them. I'll make a stab at an analysis, but I'm really interested in the conversation that could follow what I have to say. kc On 10/16/13 5:43 PM, Bill Dueber wrote: My guess is that traversing the WEM structure for display of a single record (e.g., in a librarian's ILS client or whatnot) will not be a problem at all, because the volume is so low. In terms of the OPAC interface itself, well, there are lots and lots of ways to denormalize the data (meaning copy over and inline data whose canonical values are in their own tables somewhere) for search and display purposes. Heck, lots of us do this on a smaller and less complicated scale already, as we dump data into Solr for our public catalogs. This adds complexity to the system (determining what to denormalize, determining when some underlying value has changed and knowing what other elements need updating), but it's the sort of complexity that's been well-studied and doesn't worry me too much. I'm much, *much* more nerd than librarian, and if there's one thing I wish I could get across to people who swing the other way, it's that getting the data model right is so very much harder than figuring out how to process it.
Make sure the individual elements are machine-intelligible, and there are hordes of smart people (both within and outside of the library world) who will figure out how to efficiently(-enough) store and retrieve them. And, for the love of god, have someone around who can at least speak authoritatively about what sorts of things fall into the hard and easy-peasy categories in terms of the technology, instead of making assumptions. On Wed, Oct 16, 2013 at 6:23 PM, Karen Coyle li...@kcoyle.net wrote: Yes, that's my take as well, but I think it's worth quantifying if possible. There is the usual trade-off between time and space -- and I'd be interested in hearing whether anyone here thinks that there is any concern about traversing the WEM structure for each search and display. Does it matter if every display of author in a Manifestation has to connect M-E-W? Or is that a concern, like space, that is no longer relevant? kc On 10/16/13 12:57 PM, Bill Dueber wrote: If anyone out there is really making a case for FRBR based on whether or not it saves a few characters in a database, well, she should give up the library business and go make money off her time machine. Maybe -- *maybe* -- 15 years ago. But I have to say, I'm sitting on 10m records right now, and would happily figure out how to deal with double or triple the space requirements for added utility. Space is always a consideration, but it's slipped down into about 15th place on my Giant List of Things to Worry About. On Wed, Oct 16, 2013 at 3:49 PM, Karen Coyle li...@kcoyle.net wrote: On 10/16/13 12:33 PM, Kyle Banerjee wrote: BTW, I don't think 240 is a good substitute as the content is very different from that in the regular title. That's where you'll find music, laws, selections, translations and it's totally littered with subfields. The 70.1 figure from the stripped 245 is probably closer to the mark. Yes, you are right, especially for the particular purpose I am looking at. Thanks.
IMO, what you stand to gain in functionality, maintenance, and analysis is much more interesting than potential space gains/losses. Yes, obviously. But there exists an apology for FRBR that says that it will save cataloger time and will be more efficient in a database. I think it's worth taking a look at those assumptions. If there is a way to measure functionality, maintenance, etc., then we should measure it, for sure. kc kyle On Wed, Oct 16, 2013 at 12:00 PM, Karen Coyle li...@kcoyle.net wrote: Thanks, Roy (and others!). It looks like the 245 is including the $c - dang! I should have been more specific. I'm mainly interested in the title, which is $a $b -- I'm looking at the gains and losses of bytes should one implement FRBR. As a hedge, could I ask what you've got for the 240? That may be closer to reality. kc On 10/16/13 10:57 AM, Roy Tennant wrote: I don't even have to fire it up. That's a statistic that we generate quarterly (albeit via Hadoop). Here you go: 100 - 30.3; 245 - 103.1; 600 - 41; 610 - 48.8; 611 - 61.4; 630 - 40.8; 648 - 23.8; 650 - 35.1; 651 - 39.6; 653 - 33.3; 654 - 38.1; 655 - 22.5; 656 - 30.6; 657 - 27.4; 658
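To make Bill's denormalization point concrete, here is a minimal Python sketch with hypothetical class and field names (no real FRBR or RDA schema is implied). It flattens one Manifestation plus its Expression and Work into a single search document, the kind of record one might send to Solr, so the M-E-W traversal Karen asks about happens once at index time rather than on every display. `stale_docs` covers the extra complexity Bill mentions: finding which flat documents must be rebuilt when a Work's values change.

```python
from dataclasses import dataclass


@dataclass
class Work:
    id: str
    title: str
    creator: str


@dataclass
class Expression:
    id: str
    work_id: str
    language: str


@dataclass
class Manifestation:
    id: str
    expression_id: str
    publisher: str
    year: int


def denormalize(m: Manifestation, expressions: dict[str, Expression],
                works: dict[str, Work]) -> dict:
    """Copy Work and Expression values inline into one flat document."""
    e = expressions[m.expression_id]
    w = works[e.work_id]
    return {
        "id": m.id,
        "title": w.title,        # canonical value lives on the Work
        "creator": w.creator,    # canonical value lives on the Work
        "language": e.language,  # canonical value lives on the Expression
        "publisher": m.publisher,
        "year": m.year,
    }


def stale_docs(work_id: str, expressions: dict[str, Expression],
               manifestations: dict[str, Manifestation]) -> list[str]:
    """IDs of flat documents to rebuild after the given Work is edited."""
    expr_ids = {e.id for e in expressions.values() if e.work_id == work_id}
    return sorted(m.id for m in manifestations.values()
                  if m.expression_id in expr_ids)
```

The trade-off is the one discussed in the thread: the title is stored once per Work in the canonical data but copied into every Manifestation's search document, spending space to avoid traversal at display time.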
Re: [CODE4LIB] Google Analytics on multiple systems
Wow, 250,000? I'm not sure that's right, though I'm prepared to believe anything. I checked the GA documentation, which says you can officially have 50 profiles per account. Each property has at least one default profile, so that's probably the official limit of properties too, before you'd need to use an extra account. (In turn, you can evidently manage 25 GA accounts per Google user account.) Not sure where the 250,000 figure comes from, but I've seen a number of scripting workarounds for the profile limit in various analytics blogs, so maybe you can sort of 'overclock' your accounts if you needed to.
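Josh's reasoning about limits reduces to simple arithmetic. A sketch in Python, using the figures quoted in this thread (50 profiles per account, at least one profile per property) as defaults; these are the 2013 numbers from the discussion, not current Google limits, and `accounts_needed` is a hypothetical helper name.

```python
import math


def accounts_needed(properties: int, profiles_per_property: int = 1,
                    profile_limit: int = 50) -> int:
    """GA accounts needed if each property, with all its profiles, fits in one account."""
    if profiles_per_property < 1 or profiles_per_property > profile_limit:
        raise ValueError("profiles_per_property must be between 1 and profile_limit")
    # A property cannot be split across accounts, so round down per account.
    properties_per_account = profile_limit // profiles_per_property
    return math.ceil(properties / properties_per_account)
```

With roughly the four services Joel mentions (main site, CONTENTdm, LibGuides, ILLiad) and one profile each, `accounts_needed(4)` is 1; even at two profiles per property, a second account would only be needed past 25 properties.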
[CODE4LIB] Call for Proposals: MARC Formats Transition Interest Group at ALA Midwinter
**Apologies for cross-posting** -- The LITA/ALCTS MARC Formats Transition Interest Group invites proposals for presentations for its session at the 2014 ALA Midwinter Conference in Philadelphia, Pennsylvania. The meeting will take place on Saturday, January 25th, from 3pm to 4pm. Proposals may be between 15 and 30 minutes in length. Possible topics include, but are not limited to: * Harvesting bibliographic data from MARC records for use in discovery tools, next-gen catalogs and other applications * Transforming MARC data to other metadata schemes (BIBFRAME, Dublin Core, EAD, VRA, etc.) * Using data from MARC records with data from linked data sources * Discussions of recent MARC changes, RDA in MARC or ongoing problems or complexities of the standard. * Other unconventional projects using MARC data. Proposals should be e-mailed to Sarah Weeks (wee...@stolaf.edu) by Monday, November 11, 2013. Please include presentation title, summary, amount of time needed for the presentation, and the names, titles and contact information for the presenter(s). -- Sarah Beth Weeks Head of Technical Services St Olaf College Rolvaag Memorial Library 1510 St. Olaf Avenue Northfield, MN 55057 507-786-3453 (office)
Re: [CODE4LIB] Google Analytics on multiple systems
Oh wow, sorry, that's not right. I was thinking 25; not sure where the 4 zeros came from... Joel
[CODE4LIB] Job: Digital Initiatives Librarian at University of Wisconsin-Parkside
The University of Wisconsin-Parkside invites applications for the Digital Initiatives Librarian (Official Title: Associate Academic Librarian). This is a full-time, 12 month Academic Staff position. The Digital Initiatives Librarian manages the digital assets and digital collections of the University of Wisconsin-Parkside Library and Archives and Area Research Center. Works directly with students, faculty, staff and members of the community to support access to archival materials in digital formats. Provides direct support for student and faculty archival research. Teaches scheduled archival instruction sessions for UW-Parkside research courses, and collaborates with instructors to determine which resources and delivery methods are most appropriate to class objectives. Provides expertise and leadership in the design and development of the University digital collections. Oversees web-based access to local and remote digital content. Manages digitization processes and workflows. Partners with Head of Library Systems and Emerging Technologies Librarian to develop appropriate preservation, storage and retrieval of University digital assets. Monitors digitization trends and developments and assists in the adoption and implementation of new technologies and methods. This position reports to the Head of Archives at the University of Wisconsin-Parkside Library. **QUALIFICATIONS** _Required_ * ALA-accredited MLS, with archival coursework * Strong oral and written communication skills * Demonstrated project management skills _Preferred_ * Experience working in an archives or special collections * Experience creating standard archival finding aids * Experience in public records reference * Experience working in an academic environment * Strong preference will be given to applicants with experience using Omeka, Archivists' Toolkit or Archon, and oXygen or other XML editors **RESPONSIBILITIES** _A._
* Oversees the creation, preservation, and delivery of digital content and collections in support of research and instruction * Identifies and selects archival material/collections to be digitized. * Establishes project priorities and manages all facets of digitization projects including development of workflows and schedules, coordination of staff and equipment resources, quality control and creation of documentation for digital project procedures. * Coordinates the daily operations of digital content creation including digitization of a variety of digital content formats and quality control activities. * Keeps current with digitization and digital delivery systems, standards and trends in higher education. * Investigates and recommends digitization hardware and software; monitors and maintains specialized hardware and software to capture, manipulate and save digital files. * Trains digitization staff; trains and supervises student employees. * Experiments with open-source software solutions to anticipate future ancillary services, including GIS mapping, mobile interfacing, and interoperability with new systems. * Designs and implements digital exhibits. * Creates metadata for digital material and collections; stays current with a variety of digital library standards including best practices for digitization and metadata creation. * Engages in outreach activities with campus and community partners. _B._ * Manages the daily functions of the Archives * Manages ARC transfers and trains students/staff in receiving and transferring procedures. * Performs archival and public records reference. * Conducts records management functions, such as answering records questions, referring university employees to the proper schedules, and receiving records collections in any format. * Coordinates the technological environment for bar coding the Wisconsin Historical Society collections and supervises the project. _C._
* Participates in reference services, library instruction and library liaison program as needed _D._ * Serves on library committees, teams, UW-System and professional committees as elected or appointed _Knowledge, Skills and Abilities_ * Experience with metadata standards and schema such as Dublin Core, EAD, and DACS. * Experience working with XML data and XML editors. * Experience in archival reference. * Familiarity with basic web languages and editors. * Ability to work with a diverse group of faculty, students, administrators, donors, staff and general public. * Familiarity with intellectual property and copyright issues. * Knowledge of digital project strategies, technologies and standards. * Knowledge of proper methods of handling and conserving archival materials in varied formats. **SPECIAL NOTES:** Salary: Commensurate with qualifications and experience. The University of Wisconsin System provides a liberal benefits package, including participation in a state pension plan. It is the
[CODE4LIB] Job: Professor of Audiovisual Archival Studies at University of California, Los Angeles
Assistant/Associate Professor of Audiovisual Archival Studies The Department of Information Studies of the Graduate School of Education and Information Studies at UCLA invites applications for a tenure-track assistant professor or tenured associate professor specializing in audiovisual archival studies. The successful applicant will have research and teaching interests that relate to any aspect of audiovisual archival studies, broadly conceived as encompassing moving image, recorded sound, and digital media archives. These interests might include one or more of the following: * the nature, history, and role in society, of physical and digital collections of archival moving images, sound recordings, and new media objects; * the nature, history, and role in society, of media and technologies for the production, transmission, organization, discovery, retrieval, presentation, and playback of audiovisual works; * uses and users of audiovisual archives; * the appraisal, description, arrangement, documentation, curatorship, conservation, restoration, preservation, and exhibition of audiovisual archival resources, and of textual, visual, and material artifacts relating to such resources; * the design, evaluation, and use of collections, records, data/metadata, and digital/media asset management systems for audiovisual archival resources; * public programming and outreach in audiovisual archives; * the provision of equitable and open access to audiovisual cultural heritage; * community, ethnic, and Indigenous audiovisual archives and memory-keeping traditions; * the management of audiovisual archives in commercial (e.g., studio) and nonprofit (e.g., library special collections, museum) settings; * policy development and analysis for audiovisual archives; * the evolving identity of the moving image and recorded sound archivists' professions; * social, economic, political, and legal aspects of audiovisual archives management; and * international collaboration, policymaking, 
and standards development for audiovisual archives. The Graduate School of Education and Information Studies (GSEIS) is one of the top-ranked schools in the U.S., and supports internationally recognized research centers including the Center for Information as Evidence. Within the school, the Department of Information Studies has emerged as an innovative, interdisciplinary locus for theory and research in information studies, including archival and museum informatics, data curatorship, information policy, new media, preservation, and textual and visual studies. The Department's faculty has been recognized as among the most productive and highly cited in the field. Faculty members have close ties with UCLA's Center for Digital Humanities, Ethnomusicology Archive, Film & Television Archive, Library Digital Collections, and Library Special Collections. The Department offers an M.A. program in Moving Image Archive Studies (MIAS),* an M.L.I.S. (Master of Library and Information Science) degree with specializations in archival studies, library studies, informatics, and rare books and print and visual culture, and a Ph.D. program in Information Studies. The MIAS program was established in 2002 as the first graduate program in North America (and still the only one on the West Coast) to address the technical, cultural, and policy challenges of preserving moving image cultural heritage (film, video, and digital) through a systematic program for preparing future moving image archivists to lead the field. The archival studies specialization of the M.L.I.S. program is among the most highly regarded nationally and internationally, and a leader in initiatives to pluralize archival practice and research. All faculty in the Department teach at both master's and doctoral levels; thus, candidates should be able to demonstrate how their research and teaching interests and experience will help foster the growth of the M.A., M.L.I.S., and Ph.D. programs.
This position entails: teaching four four-unit courses (including at least two of the core seminars in the MIAS M.A. program) per year, or their equivalent, in accordance with the Department's workload policy; advising and mentoring graduate students; actively engaging in research; and actively participating in administrative responsibilities for the Department, the School, and the University. The School and the Department have strong commitments to the rich and varied multicultural communities of the Southern California region, and a reputation for merging research and practice in statewide, national, and international outreach and service. We seek a scholar who will make the most of Los Angeles' unique advantages as a setting for research that links audiovisual archival studies to public engagement, and for creating international connections, especially with the Pacific Rim and Latin America. We particularly encourage applications from those whose research and
[CODE4LIB] Job: University Archivist and Special Collections Librarian at Adelphi University
University Archives and Special Collections (UASC) comprises two distinct collections: the official archives of the University, in multiple formats, and some 30 distinctive special collections in a variety of different subjects. Reporting to the Dean of Libraries, the University Archivist and Special Collections Librarian position provides leadership within the department in accordance with the Libraries' goals and strategic planning; facilitates communication about UASC within the University Libraries, throughout the University community, and to the general public of current and potential users. This is a tenure-track library Associate Professor faculty position. Applicants must hold a master's degree from an ALA-accredited school of library/information science, preferably with a concentration in archives or some advanced training in archives, manuscripts, and special collections. A second post-baccalaureate degree or similar proof of advanced study is required for tenure. The successful candidate will also have 3-5 years of significant experience in an archives or special collections environment, including at least three years of supervisory and budgetary responsibilities, as well as a broad understanding of archival related activities in an academic research library setting. Primary Responsibilities: * Coordinates all aspects of Special Collections and Archives operations, including the ongoing acquisition of relevant material; preservation, conservation and management of collections; maintenance of intellectual control; and development of access and usage policies appropriate to both physical and virtual collections. * Provides overall supervisory oversight of staff, including full-time and part-time librarians/archivists, an administrative assistant, and student employees. * Oversees the formulation and periodic review of collection development and materials selection policies and profiles; oversees policies relating to the use of both collections.
* Oversees specialized collection management functions, including the handling of gift materials, collection selection and deselection processes, identification of materials in the general collection that are candidates for conservation and preservation, and collection analysis. * Maintains a strategic development plan that encompasses growth and enhancement of the library's physical and digital collections documenting the history and functions of the university. * Monitors resources within the department, including faculty/staff, budget, equipment, space and physical facilities. * Works collaboratively with the staff to set priorities, create strategic plans and documentation, and meet project deadlines. * Fosters communication and collegiality within the department and with other departments in the library. * Collaborates with Adelphi faculty and staff and all divisions of the Libraries to develop digital collections, including both digitized and born-digital resources; establish digitization priorities for print and audiovisual collections; and ensure that digitization projects are successfully completed. * Supports a high level of public service and dedication to the Libraries' mission within the department. * Promotes the use of primary resources within university courses and research. * Cultivates relationships with donors and prospective donors of unique special collections and archival materials. * Collaborates with department faculty/staff and library leadership to identify potential grant and funding sources, prepare required applications, and manage funded projects. * Works closely with department faculty/staff to develop programs and exhibits that will promote collections and contribute to the mission and vision of the Libraries and the University. OTHER RESPONSIBILITIES: * Collection development and liaison responsibilities for one or more schools or departments. * Participation in the Libraries' information literacy program.
* Service at Swirbul Library's main reference desk, including occasional evenings and weekends. * Service on University and Library committees. * Active participation in professional associations and activities. * Active participation in scholarly activities, including research and publishing, as required for reappointment and tenure. QUALIFICATIONS: * Knowledge of standards-based archival description and metadata schemas, such as EAD, XML, MODS, and Dublin Core * Excellent communication and interpersonal skills * The ability to work effectively in a collegial environment * Evidence of ability to meet criteria for promotion and tenure * Experience with digitization projects, archival database management systems, and website construction. Other desirable qualifications include: * Familiarity with CONTENTdm and Archivists' Toolkit * Experience with records retention policies and schedules, exhibits, and writing
[CODE4LIB] Job: Curator - Gordon W. Prange Collection and Librarian for East Asian Studies at University of Maryland, College Park
The University of Maryland Libraries are seeking dynamic and innovative applicants for the position of Curator of the Gordon W. Prange Collection and Librarian for East Asian Studies. The successful candidate will create and implement a vision for the Gordon W. Prange Collection, a world-renowned special collection of rare and archival materials that constitutes the most comprehensive collection of Japanese language publications issued in Japan during the post-World War II period of 1945-1949. The Prange Collection encompasses over 1.7 million items representing virtually everything published in Japan during this period. The University of Maryland Libraries, in partnership with the National Diet Library of Japan, have engaged in large-scale microfilming and digitization projects to preserve and improve access to this historically significant and unique collection. Project funders have included the National Endowment for the Humanities, the Japan Foundation Center for Global Partnership and the Nippon Foundation. The Curator/Librarian will also be responsible for East Asian studies materials in the Libraries' general collection, which includes over 80,000 monographs, periodicals and reference works in Chinese, Japanese and Korean languages. Particular strengths include humanities and social sciences with an emphasis on Chinese and Japanese history and culture in support of the research and curricular needs of faculty and students in East Asian Studies. The Curator/Librarian will develop a robust program of collection development, research services, digitization, outreach, and scholarly activity to support these collections. In addition, the successful candidate will not only manage these collections and related services, but will also be a scholar with an active program of print and digital research based in the Prange and East Asia Collections.
For the full job announcement and position description, please go to [http://www.lib.umd.edu/hr/employment-opportunities/staff-faculty-positions](http://www.lib.umd.edu/hr/employment-opportunities/staff-faculty-positions). Position is appointed to Librarian Faculty Ranks as established by the University System of Maryland Board of Regents. Rank at appointment is based on the successful applicant's experience and relevant credentials. For additional information, consult the following website: [http://www.president.umd.edu/policies/ii-100B.html](http://www.president.umd.edu/policies/ii-100B.html). **APPLICATIONS**: Electronic applications required. Please apply online at [https://ejobs.umd.edu/postings/22149](https://ejobs.umd.edu/postings/22149). An application consists of a cover letter which includes the source of advertisement, a resume, and names/e-mail addresses of three references. Applications will be reviewed as they are received and accepted until Monday, November 18, 2013. The University of Maryland, College Park, actively subscribes to a policy of equal employment opportunity, and will not discriminate against any employee or applicant because of race, age, sex, color, sexual orientation, physical or mental disability, religion, ancestry or national origin, marital status, genetic information, political affiliation, or gender identity and expression. Minorities and women are encouraged to apply. Brought to you by code4lib jobs: http://jobs.code4lib.org/job/10356/
[CODE4LIB] Job: Systems Engineers at Virginia Polytechnic Institute and State University
Virginia Tech's Newman Library and the Center for Digital Research and Scholarship (CDRS) are seeking qualified candidates for two Systems Engineers for data initiatives. Incumbents will develop systems that: 1) enable data integration across distributed and heterogeneous local and external data sources to maximize data use and reuse in applications, and 2) support digital preservation strategies and repository systems research, development, and implementation. Primary responsibilities include leading technical contributions, such as data architecture design, data integration, system design and testing, and applications development, implementation, administration, and support, for data publishing and preservation projects (initial focus on VIVO and Fedora). Additional responsibilities include ensuring systems compatibility to meet project/program functionality, technical, and design specifications, and scheduling objectives; collaborating with colleagues in the Libraries and at other institutions in delivering system and web development projects; providing informed IT-related advice for Center for Digital Research and Scholarship (CDRS) projects; liaising with CDRS and Information Technologies and Services (ITS) personnel for planning and service development; participating in selected cross-Libraries working groups to improve systems and services; providing training to Libraries personnel (and library users where appropriate); participating in various systems engineering projects as a result of developments and changes in Library services. Required Qualifications: Master's degree in computer/information science, management information systems, or related field, or Bachelor's degree and significant experience equivalent to an advanced degree. 
Successful candidates must have: familiarity with semantic web technologies; knowledge of and experience with Java and/or object-oriented programming in PHP, relational databases (e.g., MySQL), web application technologies (e.g., HTTP, CSS, HTML, XML, REST APIs), and software development methods and tools (e.g., version control, agile programming methodologies, documentation, and sound security practices); experience with Windows 200x and/or UNIX/Linux server environments and related support and maintenance, a thorough understanding of application server (Apache Web) technical architecture, and familiarity with shell scripting; experience with backups, caching, role servers, DNS, SMTP/mail relays, SQL query writing/troubleshooting, SSL certificates, systems design, and networking and security administration; knowledge of authentication mechanisms (local and/or external), such as Active Directory, LDAP, Shibboleth, EZproxy (or similar); ability to work independently and with initiative to identify and solve problems; excellent analytical and design skills at the multi-product/multi-environment level; ability to work collaboratively with individuals and groups, both onsite and remotely; good interpersonal and communication skills; commitment to service excellence and customer care. Preferred Qualifications: knowledge of and experience with JCR or J2EE; knowledge of Ruby, Solr indexes, semantic triplestores, and cloud infrastructures; experience working with RDF in practical applications; experience working in a managed programming environment using one or more of the following: an IDE (e.g., Eclipse), a code repository (e.g., Redmine, Trac, Subversion, GitHub), in-code documentation (e.g., PHPDoc/Javadoc), a bug tracking system (e.g., Mantis); experience with remote desktop applications; experience with acceptance testing or unit testing and usability testing; training in a formalized project management methodology; experience working in academic libraries; experience with digital repository platforms such as DSpace, Fedora Commons, and EPrints; a proven record of innovative development for the web; experience documenting procedures and systems; experience working in a formal project-managed work environment; RHCE certification or equivalent. How to Apply for this Job: Applications must be submitted online at www.jobs.vt.edu (search for posting #AP0130182). The application package must include a resume; a cover letter addressing the candidate's experience with the position's responsibilities and with the required and preferred qualifications; and the names and contact information of three (3) references. Brought to you by code4lib jobs: http://jobs.code4lib.org/job/10357/
[CODE4LIB] Job: Library Web Manager at Brown University
Brown University Library seeks a Library Web Manager to oversee and manage content and software tools to support the Library's web presence. The Library Web Manager will coordinate with stakeholders across the Library to administer the Library's content management system and ensure consistency and accuracy of information on the Library's public websites, intranet, and social media. S/he will work to improve the integration of the Library's web-based discovery systems, and assist in the assessment, testing and implementation of new or improved services. **Responsibilities:** • Provide oversight and guidance for the creation, organization and maintenance of content for the Library's public websites and intranet • Coordinate with content owners to ensure that the Library's web presence is relevant, accurate, up-to-date, user-centered and accessible • Collaborate in the design, implementation, and management of a Drupal content management system (CMS) for the Libraries, including responsibility for configuration and user support • Develop and recommend policies, workflows, and content authoring guidelines for Web content development, implementation, and maintenance • Conduct training in creating web content using the CMS • Working with library-wide advisory groups, assist in the development of effective and intuitive interfaces for the discovery of library content by ensuring the effective presentation of search options, metadata, and related resources • Field questions and identify solutions to improve access and retrieval of library resources via the web • Regularly assess and promote awareness of new and existing services, such as Summon, VuFind, etc.
• Participate in the design, implementation and analysis of user research/usability studies • Regularly analyze web analytics data to identify opportunities for improvement **Qualifications:** Required: • Bachelor's Degree • Demonstrated content management and web publishing experience • 3-4 years of related professional experience • Excellent organizational, analytical, and problem-solving skills • Excellent written and oral communication skills • Ability to think creatively • Demonstrated experience working with content management systems (e.g. Drupal, WordPress) and information technologies relevant to web site design and maintenance (e.g. HTML, CSS, JavaScript, PHP) • Strong understanding of usability, usability testing, and information architecture concepts • Experience with web analytics • Strong interpersonal and collaborative skills Desired: • Master's degree in Library or Information Science, Computer Science, or equivalent • Experience developing and managing Drupal-based web sites • Supervisory experience leading a team To apply for this position (Job #B01524), please visit Brown's Online Employment website (https://careers.brown.edu), complete an application online, attach documents, and submit for immediate consideration. Documents should include cover letter, resume, and the names and e-mail addresses of three references. Review of applications will continue until the position is filled. **Brown University is an Equal Opportunity/Affirmative Action Employer** Brought to you by code4lib jobs: http://jobs.code4lib.org/job/10388/
Re: [CODE4LIB] Online validator for RelaxNG or Schematron?
For RNG, as long as your schema is reachable and referenced correctly, it looks like this should work: http://validator.nu. Please let us know how you find it. Nothing known or found in a quick scan for Schematron. Somewhat surprised at the apparent lack of options. Cheers Hugh Barnes Digital Access Coordinator Library, Teaching and Learning Lincoln University Christchurch New Zealand p +64 3 423 0357 -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Wolfe, Mark D Sent: Friday, 18 October 2013 2:54 a.m. To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Online validator for RelaxNG or Schematron? Does anyone know of an online validator for either Relax NG or Schematron? Thanks, Mark Mark Wolfe Curator of Digital Collections M. E. Grenander Department of Special Collections Archives Science Library 355, University at Albany, SUNY 1400 Washington Avenue, Albany NY 1 Phone: (518) 437-3934 Email: mwo...@albany.edu
[CODE4LIB] Code4Lib 2014 Call for Proposals
Code4Lib 2014 is a loosely structured conference that provides people working at the intersection of libraries/archives/museums and technology with a chance to share ideas, be inspired, and forge collaborations. The conference will be held at the *Sheraton Raleigh Hotel in downtown Raleigh, NC from March 24 - 27, 2014*. For more information about the hotel, visit http://www.sheratonraleigh.com/ We are currently accepting proposals for prepared talks and pre-conferences. While only a limited number of these can be selected, multiple lightning talk and breakout sessions will provide additional opportunities for you to make your voice heard at the conference. *Proposals for Prepared Talks:* Prepared talks are 20 minutes (including setup and questions), and should focus on one or more of the following areas: - Projects you've worked on which incorporate innovative implementation of existing technologies and/or development of new software - Tools and technologies – How to get the most out of existing tools, standards and protocols (and ideas on how to make them better) - Technical issues – Big issues in library technology that should be addressed or better understood - Relevant non-technical issues – Concerns of interest to the Code4Lib community which are not strictly technical in nature, e.g. collaboration, diversity, organizational challenges, etc. *To submit a proposal:* - Go to http://wiki.code4lib.org/index.php/2014_Prepared_Talk_Proposals - Log in to the wiki in order to submit a proposal. If you are not already registered, follow the instructions to do so. - Provide a title and brief (500 words or fewer) description of your proposed talk. - If you so choose, you may also indicate when, if ever, you have presented at a prior Code4Lib conference. This information is completely optional, but it may assist us in opening the conference to new presenters.
As in past years, the Code4Lib community will vote on proposals that they would like to see included in the program. This year, however, only the top 10 proposals will be guaranteed a slot at the conference. Additional presentations will be selected by the Program Committee in an effort to ensure diversity in program content. Community votes will, of course, still weigh heavily in these decisions. Presenters whose proposals are selected for inclusion in the program will be guaranteed an opportunity to register for the conference. The standard conference registration fee will still apply. Proposals can be submitted through Friday, November 8, 2013, at 5pm PST. Voting will commence on November 18, 2013 and continue through December 6, 2013. The final line-up of presentations will be announced in early January 2014. *Pre-Conference Proposals:* Pre-conferences are full- or half-day sessions that will be held on Monday, March 24th, 2014 and can cover just about any topic you can think of [1]. If you are interested in hosting a pre-conference session, please create a pitch at http://wiki.code4lib.org/index.php/2014_preconference_proposals. Pitches should be added to the wiki by December 6. Please indicate the topic of your session and your preference for full-day or half-day. This is expected to be a fluid process, as our venue provides some flexibility in determining space. *Pre-Conference Attendance:* If you are interested in attending a pre-conference, please list your name underneath the pre-conference description on the wiki; this does not incur any obligation on your part, but will help planners. You might want to visit the page occasionally as new session pitches are added. Actual, less-revocable registration for pre-conferences will be handled as part of the overall conference registration, and will involve a very small fee. We look forward to reading your proposals, and seeing you at the conference! Code4Lib 2014 Program Committee -- Bulk mail. Postage paid.