Re: [CODE4LIB] pdf2txt [tesseract]

2013-10-17 Thread Eric Lease Morgan
On Oct 16, 2013, at 10:56 AM, Robert Haschart rh...@virginia.edu wrote:

 The abstract extraction routine I have been working on does use 
 tesseract internally for doing OCR when it encounters a document that 
 doesn't have usable full-text.  I agree that tesseract is not that easy 
 to install, especially if (as in my case) you do not have root/sudo 
 access to the machine.  Since I have gone through installing tesseract 
 quite recently, perhaps my experience can be helpful to you.


Robert, can you outline the process you used to get Tesseract to do OCR against
PDF documents? I installed Tesseract a few months ago, but I couldn't figure
out how to get it to work against PDF, only some image files. Any pointers would
be greatly appreciated. (Hmmm. Maybe Tesseract doesn't do PDF files, only image
files, and I need to convert my PDFs to images and then feed those to Tesseract.)
--Eric Morgan


[CODE4LIB] Online validator for RelaxNG or Schematron?

2013-10-17 Thread Wolfe, Mark D
Does anyone know of an online validator for either Relax NG or Schematron?

Thanks, Mark


Mark Wolfe
Curator of Digital Collections
M. E. Grenander Department of Special Collections & Archives
Science Library 355, University at Albany, SUNY
1400 Washington Avenue, Albany NY  1 
Phone: (518) 437-3934
Email: mwo...@albany.edu


Re: [CODE4LIB] pdf2txt [tesseract]

2013-10-17 Thread Christian Pietsch
Hi Eric,

On Thu, Oct 17, 2013 at 09:43:04AM -0400, Eric Lease Morgan wrote:
 Robert, can you outline the process you used to get Tesseract to do
 OCR against PDF documents? I installed Tesseract a few months ago,
 but I couldn't figure out how to get it to work against PDF, only some
 image files. Any pointers would be greatly appreciated. (Hmmm. Maybe
 Tesseract doesn't do PDF files, only image files, and I need to
 convert my PDFs to images and then feed those to Tesseract.) --Eric Morgan

Once you have Tesseract installed, the easiest way to use it for
adding an OCR text layer to PDF files is this Ruby script IMHO:
https://github.com/gkovacs/pdfocr
Geza Kovacs wrote it for Cuneiform and an old version of OCRopus.
I added Tesseract support later.

If you cannot use Ruby for some reason, I could upload a BASH script
doing the same thing.

Cheers,
Christian

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec · Library Technology and Knowledge Management
  Bielefeld University Library, Bielefeld, Germany


Re: [CODE4LIB] Google Analytics on multiple systems

2013-10-17 Thread Josh Wilson
Hi Joel,
It usually ends up being easiest to go with one GA account, separating
different sources by using different properties (e.g., UA-[acct number]-1
for CONTENTdm, UA-[acct number]-2 for LibGuides, etc.) rather than separate
accounts entirely. Each property can have different users with different
permissions levels so you can customize who has access to what. You can
further refine each property into different profiles if you want to filter
data from one source in different ways. Having everything under one account
makes it easy to manage and apply common settings (like users, filters, or
custom reports) between properties and profiles. If you add another user,
you only have to add them to one account, too.

There are limits to the number of allowed properties (it's quite high and
goes up occasionally; not sure what it is offhand), so if you bumped into
that you could use another GA account. Google has made it easier in recent
months to jump between accounts and properties, though.

(Sorry for delayed reply, catching up on listservs)



On Mon, Oct 14, 2013 at 2:36 PM, Joel Marchesoni jma...@email.wcu.edu wrote:

 Hello,

 We currently have Google Analytics on our main library pages and digital
 collections pages on the same domain. Now that CONTENTdm has a GA easy
 button we are going to add Analytics to it as well, and while we're at it
 probably LibGuides and non-authenticated ILLiad pages (I mainly want to see
 how big a percentage of mobile hits ILLiad gets) as well. I was hoping to
 hear from the list whether you have all service points in one GA account
 or a separate account for each one, and why.

 Thanks,

 Joel Marchesoni
 Tech Support Analyst
 Hunter Library, Western Carolina University
 http://library.wcu.edu/
 828-227-2860
 ~Please consider the environment before printing this email~



Re: [CODE4LIB] Google Analytics on multiple systems

2013-10-17 Thread Joel Marchesoni
Thank you all for your replies. I'm thinking we'll go with one account (we 
already have a Google account for various other services) with multiple 
properties. One thing that has complicated matters is that the property we
currently use cannot yet be upgraded to Universal Analytics, which is what
CONTENTdm uses.

FYI I noticed in my own research that the property limit is 250,000. I don't 
see us hitting that ever...

Joel

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Josh 
Wilson
Sent: Thursday, October 17, 2013 10:24
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Google Analytics on multiple systems

Hi Joel,
It usually ends up being easiest to go with one GA account, separating 
different sources by using different properties (e.g., UA-[acct number]-1 for 
CONTENTdm, UA-[acct number]-2 for LibGuides, etc.) rather than separate 
accounts entirely. Each property can have different users with different 
permissions levels so you can customize who has access to what. You can further 
refine each property into different profiles if you want to filter data from 
one source in different ways. Having everything under one account makes it easy 
to manage and apply common settings (like users, filters, or custom reports) 
between properties and profiles. If you add another user, you only have to add 
them to one account, too.

There are limits to the number of allowed properties (it's quite high and goes 
up occasionally; not sure what it is offhand), so if you bumped into that you 
could use another GA account. Google has made it easier in recent months to 
jump between accounts and properties, though.

(Sorry for delayed reply, catching up on listservs)



On Mon, Oct 14, 2013 at 2:36 PM, Joel Marchesoni jma...@email.wcu.edu wrote:

 Hello,

 We currently have Google Analytics on our main library pages and 
 digital collections pages on the same domain. Now that CONTENTdm has a 
 GA easy button we are going to add Analytics to it as well, and 
 while we're at it probably LibGuides and non-authenticated ILLiad 
 pages (I mainly want to see how big a percentage of mobile hits ILLiad 
 gets) as well. I was hoping to hear from the list whether you have all 
 service points in one GA account or a separate account for each one, and 
 why.

 Thanks,

 Joel Marchesoni
 Tech Support Analyst
 Hunter Library, Western Carolina University http://library.wcu.edu/
 828-227-2860
 ~Please consider the environment before printing this email~



Re: [CODE4LIB] pdf2txt [tesseract]

2013-10-17 Thread Robert Haschart

On 10/17/2013 9:43 AM, Eric Lease Morgan wrote:

On Oct 16, 2013, at 10:56 AM, Robert Haschart rh...@virginia.edu wrote:


The abstract extraction routine I have been working on does use
tesseract internally for doing OCR when it encounters a document that
doesn't have usable full-text.  I agree that tesseract is not that easy
to install, especially if (as in my case) you do not have root/sudo
access to the machine.  Since I have gone through installing tesseract
quite recently, perhaps my experience can be helpful to you.


Robert, can you outline the process you used to get Tesseract to do OCR against
PDF documents? I installed Tesseract a few months ago, but I couldn't figure
out how to get it to work against PDF, only some image files. Any pointers would
be greatly appreciated. (Hmmm. Maybe Tesseract doesn't do PDF files, only image
files, and I need to convert my PDFs to images and then feed those to Tesseract.)
--Eric Morgan

That's correct. I use ghostscript to print the PDF to a series of .tiff
files, and then use tesseract to perform OCR on the individual .tiff
images, producing a .txt file for each page. Since I'm only looking to
extract the abstract, I limit ghostscript to the first 5 pages, and
then do post-processing and various heuristics to find and fix the
abstract. One particular issue I've found is that tesseract is fond of
detecting ligatures such as fi, fl, ff, ffl, and ffi, but doesn't seem
to be very good at selecting the correct one (at least for my data), so
one of the post-processing steps expands each ligature to individual
characters and does a dictionary look-up to help select the correct expansion.
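
In case it helps, here is a minimal sketch of that ghostscript-plus-tesseract
front end as a shell script. The file names, resolution, and TIFF device are
assumptions, and the post-processing is only indicated by a comment:

  #!/bin/bash
  # Render the first five pages of a PDF to TIFF, then OCR each page.
  pdf="$1"
  base="${pdf%.pdf}"
  gs -dNOPAUSE -dBATCH -dFirstPage=1 -dLastPage=5 \
     -sDEVICE=tiffgray -r300 -sOutputFile="${base}-%02d.tif" "$pdf"
  for tif in "${base}"-*.tif; do
      tesseract "$tif" "${tif%.tif}"     # writes ${tif%.tif}.txt
  done
  # Ligature expansion, dictionary checks, and abstract detection would
  # happen here, on the resulting .txt files.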


Re: [CODE4LIB] MARC field lengths

2013-10-17 Thread Karen Coyle
Thanks, Bill. What you say about assumptions is a good part of what is 
motivating me to try to instigate a discussion. As you know, both FRBR 
and RDA were developed by the cataloging community with no input from 
technologists. There are sweeping statements about FRBR being more 
efficient than the MARC model, but without any real analysis that I can
find. There was a study done at OCLC on the ratio of Works to
Manifestations (and that shows in their stats today), but the OCLC 
catalog is not representative of the catalog of a single library.


What I'm hoping to do is to surface some of the assumptions so that we 
can talk about them. I'll make a stab at an analysis, but I'm really 
interested in the conversation that could follow what I have to say.


kc

On 10/16/13 5:43 PM, Bill Dueber wrote:

My guess is that traversing the WEM structure for display of a single
record (e.g., in a librarian's ILS client or what not) will not be a
problem at all, because the volume is so low.  In terms of the OPAC
interface itself, well, there are lots and lots of ways to denormalize the
data (meaning copy over and inline data whose canonical values are in
their own tables somewhere) for search and display purposes. Heck, lots of
us do this on a smaller and less complicated scale already, as we dump data
into Solr for our public catalogs.
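
As a purely illustrative sketch of that kind of denormalization (the Solr core
name, field names, and values here are all made up), a Manifestation record
might be posted to Solr with the Work-level title and author already copied
onto it, so search and display never have to walk M-E-W at query time:

  # post one denormalized Manifestation document to a hypothetical Solr core
  curl 'http://localhost:8983/solr/catalog/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{
             "id": "manifestation-123",
             "work_id": "work-42",
             "title_display": "Moby Dick; or, The Whale",
             "author_display": "Melville, Herman, 1819-1891",
             "pub_year": 1851
           }]'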

This adds complexity to the system (determining what to denormalize,
determining when some underlying value has changed and knowing what other
elements need updating), but it's the sort of complexity that's been
well-studied and doesn't worry me too much.

I'm much, *much* more nerd than librarian, and if there's one thing I
wish I could get across to people who swing the other way, it's that
getting the data model right is so very much harder than figuring out how
to process it. Make sure the individual elements are machine-intelligible,
and there are hordes of smart people (both within and outside of the
library world) who will figure out how to efficiently(-enough) store and
retrieve it. And, for the love of god, have someone around who can at least
speak authoritatively about what sorts of things fall into the hard and
easy-peasy categories in terms of the technology, instead of making
assumptions.




On Wed, Oct 16, 2013 at 6:23 PM, Karen Coyle li...@kcoyle.net wrote:


Yes, that's my take as well, but I think it's worth quantifying if
possible. There is the usual trade-off between time and space -- and I'd be
interested in hearing whether anyone here thinks that there is any concern
about traversing the WEM structure for each search and display. Does it
matter if every display of author in a Manifestation has to connect M-E-W?
Or is that a concern, like space, that is no longer relevant?

kc



On 10/16/13 12:57 PM, Bill Dueber wrote:


If anyone out there is really making a case for FRBR based on whether or
not it saves a few characters in a database, well, she should give up the
library business and go make money off her time machine. Maybe -- *maybe* --
15 years ago. But I have to say, I'm sitting on 10m records right now, and
would happily figure out how to deal with double or triple the space
requirements for added utility. Space is always a consideration, but it's
slipped down into about 15th place on my Giant List of Things to Worry
About.


On Wed, Oct 16, 2013 at 3:49 PM, Karen Coyle li...@kcoyle.net wrote:

  On 10/16/13 12:33 PM, Kyle Banerjee wrote:

BTW, I don't think 240 is a good substitute as the content is very
different than in the regular title. That's where you'll find music, laws,
selections, translations and it's totally littered with subfields. The 70.1
figure from the stripped 245 is probably closer to the mark

Yes, you are right, especially for the particular purpose I am looking at.
Thanks.



IMO, what you stand to gain in functionality, maintenance, and analysis is
much more interesting than potential space gains/losses.

Yes, obviously. But there exists an apology for FRBR that says that it will
save cataloger time and will be more efficient in a database. I think it's
worth taking a look at those assumptions. If there is a way to measure
functionality, maintenance, etc. then we should measure it, for sure.

kc



  kyle




On Wed, Oct 16, 2013 at 12:00 PM, Karen Coyle li...@kcoyle.net wrote:

   Thanks, Roy (and others!)


It looks like the 245 is including the $c - dang! I should have been more
specific. I'm mainly interested in the title, which is $a $b -- I'm looking
at the gains and losses of bytes should one implement FRBR. As a hedge,
could I ask what've you got for the 240? That may be closer to reality.

kc


On 10/16/13 10:57 AM, Roy Tennant wrote:

I don't even have to fire it up. That's a statistic that we generate
quarterly (albeit via Hadoop). Here you go:

100 - 30.3
245 - 103.1
600 - 41
610 - 48.8
611 - 61.4
630 - 40.8
648 - 23.8
650 - 35.1
651 - 39.6
653 - 33.3
654 - 38.1
655 - 22.5
656 - 30.6
657 - 27.4
658 

Re: [CODE4LIB] Google Analytics on multiple systems

2013-10-17 Thread Josh Wilson
Wow, 250,000? I'm not sure that's right, though I'm prepared to believe
anything. I checked the GA documentation, which says you can officially
have 50 profiles per account. Each property has at least one default
profile, so that's probably the official limit of properties too, before
you'd need to use an extra account. (In turn, you can evidently manage 25
GA accounts per Google user account.)

Not sure where the 250,000 figure comes from, but I've seen a number of
scripting workarounds for the profile limit in various analytics blogs, so
maybe you can sort of 'overclock' your accounts if you needed to.


On Thu, Oct 17, 2013 at 10:41 AM, Joel Marchesoni jma...@email.wcu.edu wrote:

 Thank you all for your replies. I'm thinking we'll go with one account (we
 already have a Google account for various other services) with multiple
 properties. One thing that has complicated matters is the property we
 currently use is not yet able to be upgraded to Universal Analytics, which
 is what CONTENTdm uses.

 FYI I noticed in my own research that the property limit is 250,000. I
 don't see us hitting that ever...

 Joel

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
 Josh Wilson
 Sent: Thursday, October 17, 2013 10:24
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Google Analytics on multiple systems

 Hi Joel,
 It usually ends up being easiest to go with one GA account, separating
 different sources by using different properties (e.g., UA-[acct number]-1
 for CONTENTdm, UA-[acct number]-2 for LibGuides, etc.) rather than separate
 accounts entirely. Each property can have different users with different
 permissions levels so you can customize who has access to what. You can
 further refine each property into different profiles if you want to filter
 data from one source in different ways. Having everything under one account
 makes it easy to manage and apply common settings (like users, filters, or
 custom reports) between properties and profiles. If you add another user,
 you only have to add them to one account, too.

 There are limits to the number of allowed properties (it's quite high and
 goes up occasionally; not sure what it is offhand), so if you bumped into
 that you could use another GA account. Google has made it easier in recent
 months to jump between accounts and properties, though.

 (Sorry for delayed reply, catching up on listservs)



 On Mon, Oct 14, 2013 at 2:36 PM, Joel Marchesoni jma...@email.wcu.edu
 wrote:

  Hello,
 
  We currently have Google Analytics on our main library pages and
  digital collections pages on the same domain. Now that CONTENTdm has a
  GA easy button we are going to add Analytics to it as well, and
  while we're at it probably LibGuides and non-authenticated ILLiad
  pages (I mainly want to see how big a percentage of mobile hits ILLiad
  gets) as well. I was hoping to hear from the list whether you have all
  service points in one GA account or a separate account for each one,
 and why.
 
  Thanks,
 
  Joel Marchesoni
  Tech Support Analyst
  Hunter Library, Western Carolina University http://library.wcu.edu/
  828-227-2860
  ~Please consider the environment before printing this email~
 



[CODE4LIB] Call for Proposals: MARC Formats Transition Interest Group at ALA Midwinter

2013-10-17 Thread Sarah Weeks
**Apologies for cross posting**
--

The LITA/ALCTS MARC Formats Transition Interest Group invites proposals for
presentations at its session at the 2014 ALA Midwinter Conference in
Philadelphia, Pennsylvania. The meeting will take place on Saturday,
January 25th, from 3pm to 4pm.

Proposals may be for presentations between 15 and 30 minutes in length.
Possible topics include, but are not limited to:

* Harvesting bibliographic data from MARC records for use in discovery
tools, next-gen catalogs and other applications
* Transforming MARC data to other metadata schemes (BIBFRAME, Dublin Core,
EAD, VRA, etc…)
* Using data from MARC records with data from linked data sources
* Discussions of recent MARC changes, RDA in MARC or ongoing problems or
complexities of the standard.
* Other unconventional projects using MARC data.

Proposals should be e-mailed to Sarah Weeks (wee...@stolaf.edu) by Monday,
November 11, 2013. Please include presentation title, summary, amount of
time needed for the presentation, and the names, titles and contact
information for the presenter(s).

-- 
Sarah Beth Weeks
Head of Technical Services
St Olaf College Rolvaag Memorial Library
1510 St. Olaf Avenue
Northfield, MN 55057
507-786-3453 (office)


Re: [CODE4LIB] Google Analytics on multiple systems

2013-10-17 Thread Joel Marchesoni
Oh wow, sorry, that's not right. I was thinking 25; not sure where the 4 zeros 
came from...

Joel

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Josh 
Wilson
Sent: Thursday, October 17, 2013 11:18
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Google Analytics on multiple systems

Wow, 250,000? I'm not sure that's right, though I'm prepared to believe 
anything. I checked the GA documentation, which says you can officially have 50 
profiles per account. Each property has at least one default profile, so that's 
probably the official limit of properties too, before you'd need to use an 
extra account. (In turn, you can evidently manage 25 GA accounts per Google 
user account.)

Not sure where the 250,000 figure comes from, but I've seen a number of 
scripting workarounds for the profile limit in various analytics blogs, so 
maybe you can sort of 'overclock' your accounts if you needed to.


On Thu, Oct 17, 2013 at 10:41 AM, Joel Marchesoni jma...@email.wcu.edu wrote:

 Thank you all for your replies. I'm thinking we'll go with one account 
 (we already have a Google account for various other services) with 
 multiple properties. One thing that has complicated matters is the 
 property we currently use is not yet able to be upgraded to Universal 
 Analytics, which is what CONTENTdm uses.

 FYI I noticed in my own research that the property limit is 250,000. I 
 don't see us hitting that ever...

 Joel

 -Original Message-
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
 Of Josh Wilson
 Sent: Thursday, October 17, 2013 10:24
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Google Analytics on multiple systems

 Hi Joel,
 It usually ends up being easiest to go with one GA account, separating 
 different sources by using different properties (e.g., UA-[acct 
 number]-1 for CONTENTdm, UA-[acct number]-2 for LibGuides, etc.) 
 rather than separate accounts entirely. Each property can have 
 different users with different permissions levels so you can customize 
 who has access to what. You can further refine each property into 
 different profiles if you want to filter data from one source in 
 different ways. Having everything under one account makes it easy to 
 manage and apply common settings (like users, filters, or custom 
 reports) between properties and profiles. If you add another user, you only 
 have to add them to one account, too.

 There are limits to the number of allowed properties (it's quite high 
 and goes up occasionally; not sure what it is offhand), so if you 
 bumped into that you could use another GA account. Google has made it 
 easier in recent months to jump between accounts and properties, though.

 (Sorry for delayed reply, catching up on listservs)



 On Mon, Oct 14, 2013 at 2:36 PM, Joel Marchesoni jma...@email.wcu.edu
 wrote:

  Hello,
 
  We currently have Google Analytics on our main library pages and 
  digital collections pages on the same domain. Now that CONTENTdm has 
  a GA easy button we are going to add Analytics to it as well, and 
  while we're at it probably LibGuides and non-authenticated ILLiad 
  pages (I mainly want to see how big a percentage of mobile hits 
  ILLiad
  gets) as well. I was hoping to hear from the list whether you have 
  all service points in one GA account or a separate account for 
  each one,
 and why.
 
  Thanks,
 
  Joel Marchesoni
  Tech Support Analyst
  Hunter Library, Western Carolina University http://library.wcu.edu/
  828-227-2860
  ~Please consider the environment before printing this email~
 



[CODE4LIB] Job: Digital Initiatives Librarian at University of Wisconsin-Parkside

2013-10-17 Thread jobs
The University of Wisconsin-Parkside invites applications for the Digital
Initiatives Librarian (Official Title: Associate Academic Librarian). This is
a full-time, 12 month Academic Staff position.

  
The Digital Initiatives Librarian manages the digital assets and digital
collections of the University of Wisconsin-Parkside Library and Archives and
Area Research Center. Works directly with students, faculty, staff and members
of the community to support access to archival materials in digital formats.
Provides direct support for student and faculty archival research. Teaches
scheduled archival instruction sessions for UW-Parkside research courses, and
collaborates with instructors to determine which resources and delivery
methods are most appropriate to class objectives. Provides expertise and
leadership in the design and development of the University digital
collections. Oversees web-based access to local and remote digital content.
Manages digitization processes and workflows. Partners with Head of Library
Systems and Emerging Technologies Librarian to develop appropriate
preservation, storage and retrieval of University digital assets. Monitors
digitization trends and developments and assists in the adoption and
implementation of new technologies and methods. This position reports to the
Head of Archives at the University of Wisconsin-Parkside Library.

  
**QUALIFICATIONS**  
  
_Required_

  * ALA accredited MLS, with archival coursework
  * Strong oral and written communication skills
  * Demonstrated project management skills

_Preferred_

  * Experience working in an archives or special collections
  * Experience creating standard archival finding aids
  * Experience in public records reference
  * Experience working in an academic environment
  * Strong preference will be given to applicants with experience using Omeka, 
Archivists' Toolkit or Archon, and oXygen or other XML editors
**RESPONSIBILITIES**  
  
_A_.

  * Oversees the creation, preservation, and delivery of digital content and 
collections in support of research and instruction
  * Identifies and selects archival material/collections to be digitized.
  * Establishes project priorities and manages all facets of digitization 
projects including development of workflows and schedules, coordination of 
staff and equipment resources, quality control and creation of documentation 
for digital project procedures.
  * Coordinates the daily operations of digital content creation including 
digitization of a variety of digital content formats and quality control 
activities.
  * Keeps current with digitization and digital delivery systems, standards and 
trends in higher education.
  * Investigates and recommends digitization hardware and software; monitors 
and maintains specialized hardware and software to capture, manipulate and save 
digital files.
  * Trains digitization staff; trains and supervises student employees.
  * Experiments with open-source software solutions to anticipate future 
ancillary services, including GIS mapping, mobile interfacing, and 
interoperability with new systems.
  * Designs and implements digital exhibits.
  * Creates metadata for digital material and collections; stays current with a 
variety of digital library standards including best practices for digitization 
and metadata creation.
  * Engages in outreach activities with campus and community partners.
_B._

  * Manages the daily functions of the Archives
  * Manages ARC transfers and trains students/staff in receiving and 
transferring procedures.
  * Performs archival and public records reference.
  * Conducts records management functions, such as answering records questions, 
referring university employees to the proper schedules, and receiving records 
collections in any format.
  * Coordinates the technological environment for bar coding the Wisconsin 
Historical Society collections and supervises the project.
_C._

  * Participates in reference services, library instruction and library liaison 
program as needed
_D._

  * Serves on library committees, teams, UW-System and professional committees 
as elected or appointed
  
_Knowledge, Skills and Abilities_

  * Experience with metadata standards and schema such as Dublin Core, EAD, and 
DACS.
  * Experience working with XML data and XML editors.
  * Experience in archival reference.
  * Familiarity with basic web languages and editors.
  * Ability to work with a diverse group of faculty, students, administrators, 
donors, staff and general public.
  * Familiarity with intellectual property and copyright issues.
  * Knowledge of digital project strategies, technologies and standards.
  * Knowledge of proper methods of handling and conserving archival materials 
in varied formats.
**SPECIAL NOTES:**  
  
Salary: Commensurate with qualifications and experience. The University of
Wisconsin System provides a liberal benefits package, including participation
in a state pension plan.

  
It is the 

[CODE4LIB] Job: Professor of Audiovisual Archival Studies at University of California, Los Angeles

2013-10-17 Thread jobs
Assistant/Associate Professor of Audiovisual Archival Studies

The Department of Information Studies of the Graduate School of Education and
Information Studies at UCLA invites applications for a tenure-track assistant
professor or tenured associate professor specializing in audiovisual archival
studies. The successful applicant will have research and teaching interests
that relate to any aspect of audiovisual archival studies, broadly conceived
as encompassing moving image, recorded sound, and digital media archives.
These interests might include one or more of the following:

  * the nature, history, and role in society, of physical and digital 
collections of archival moving images, sound recordings, and new media objects;
  * the nature, history, and role in society, of media and technologies for the 
production, transmission, organization, discovery, retrieval, presentation, and 
playback of audiovisual works;
  * uses and users of audiovisual archives;
  * the appraisal, description, arrangement, documentation, curatorship, 
conservation, restoration, preservation, and exhibition of audiovisual archival 
resources, and of textual, visual, and material artifacts relating to such 
resources;
  * the design, evaluation, and use of collections, records, data/metadata, and 
digital/media asset management systems for audiovisual archival resources;
  * public programming and outreach in audiovisual archives;
  * the provision of equitable and open access to audiovisual cultural heritage;
  * community, ethnic, and Indigenous audiovisual archives and memory-keeping 
traditions;
  * the management of audiovisual archives in commercial (e.g., studio) and 
nonprofit (e.g., library special collections, museum) settings;
  * policy development and analysis for audiovisual archives;
  * the evolving identity of the moving image and recorded sound archivists' 
professions;
  * social, economic, political, and legal aspects of audiovisual archives 
management; and
  * international collaboration, policymaking, and standards development for 
audiovisual archives.
The Graduate School of Education and Information Studies (GSEIS) is one of
the top-ranked schools in the U.S., and supports internationally recognized
research centers including the Center for Information as Evidence. Within the
school, the Department of Information Studies has emerged as an innovative,
interdisciplinary locus for theory and research in information studies,
including archival and museum informatics, data curatorship, information
policy, new media, preservation, and textual and visual studies. The
Department's faculty has been recognized as among the most productive and
highly-cited in the field. Faculty members have close ties with UCLA's Center
for Digital Humanities, Ethnomusicology Archive, Film & Television Archive,
Library Digital Collections, and Library Special Collections.

  
The Department offers an M.A. program in Moving Image Archive Studies (MIAS),*
an M.L.I.S. (Master of Library and Information Science) degree with
specializations in archival studies, library studies, informatics, and rare
books and print and visual culture, and a Ph.D. program in Information
Studies. The MIAS program was established in 2002 as the first graduate
program in North America (and still the only one on the West Coast) to address
the technical, cultural, and policy challenges of preserving moving image
cultural heritage (film, video, and digital) through a systematic program for
preparing future moving image archivists to lead the field. The archival
studies specialization of the M.L.I.S. program is among the most highly
regarded nationally and internationally, and a leader in initiatives to
pluralize archival practice and research.

  
All faculty in the Department teach at both master's and doctoral levels;
thus, candidates should be able to demonstrate how their research and teaching
interests and experience will help foster the growth of the M.A., M.L.I.S.,
and Ph.D. programs. This position entails: teaching four four-unit courses
(including at least two of the core seminars in the MIAS M.A. program) per
year, or their equivalent, in accordance with the Department's workload
policy; advising and mentoring graduate students; actively engaging in
research; and actively participating in administrative responsibilities for
the Department, the School, and the University.

  
The School and the Department have strong commitments to the rich and varied
multicultural communities of the Southern California region, and a reputation
for merging research and practice in statewide, national, and international
outreach and service. We seek a scholar who will make the most of Los Angeles'
unique advantages as a setting for research that links audiovisual archival
studies to public engagement, and for creating international connections,
especially with the Pacific Rim and Latin America. We particularly encourage
applications from those whose research and 

[CODE4LIB] Job: University Archivist & Special Collections Librarian at Adelphi University

2013-10-17 Thread jobs
University Archives and Special Collections (UASC) is comprised of two
distinct collections--the official archives of the University, in multiple
formats, and some 30 distinctive special collections in a variety of different
subjects.

  
Reporting to the Dean of Libraries, the University Archivist and Special
Collections Librarian position provides leadership within the department in
accordance with the Libraries' goals and strategic planning; facilitates
communication about UASC within the University Libraries, throughout the
University community, and to the general public of current and potential
users.

  
  
This is a tenure-track library Associate Professor faculty
position. Applicants must hold a master's degree from an
ALA accredited school of library/ information science, preferably with a
concentration in archives or some advanced training in archives, manuscripts,
and special collections. A second post-baccalaureate degree
or similar proof of advanced study is required for tenure.
The successful candidate will also have 3-5 years of significant experience in
an archives or special collections environment, including at least three years
of supervisory and budgetary responsibilities, as well as a broad
understanding of archival related activities in an academic research library
setting.

  
Primary Responsibilities:

  * Coordinates all aspects of Special Collections & Archives operations, 
including the ongoing acquisition of relevant material; preservation, 
conservation and management of collections; maintenance of intellectual 
control; and development of access and usage policies appropriate to both 
physical and virtual collections.
  * Provides overall supervisory oversight of staff, including full-time and 
part-time librarians/archivists, an administrative assistant, and student 
employees.
  * Oversees the formulation and periodic review of collection development and 
materials selection policies and profiles; oversees policies relating to the 
use of both collections.
  * Oversees specialized collection management functions, including the 
handling of gift materials, selection and de-selection collection processes, 
identification of potential conservation and preservation materials in the 
general collection, and collection analysis.
  * Maintains a strategic development plan that will encompass growth and 
enhancement of the library's physical and digital collections documenting the 
history and functions of the university.
  * Monitors resources within the department, including faculty/staff, budget, 
equipment, space and physical facilities.
  * Works collaboratively with the staff to set priorities, create strategic 
plans and documentation, and meet project deadlines.
  * Fosters communication and collegiality within the department and with other 
departments in the library.
  * Collaborates with Adelphi faculty and staff and all divisions of the 
Libraries to develop digital collections, including both digitized and 
born-digital resources; establish digitization priorities for print and 
audiovisual collections; and ensure that digitization projects are successfully 
completed.
  * Supports a high level of public service and dedication to the Libraries' 
mission within the department.
  * Promotes the use of primary resources within university courses and 
research.
  * Cultivates relationships with donors and prospective donors of unique 
special collections and archival materials.
  * Collaborates with department faculty/staff and library leadership to 
identify potential grant and funding sources, prepare required applications, 
and manage funded projects.
  * Works closely with department faculty/staff to develop programs and 
exhibits that will promote collections and contribute to the mission and vision 
of the Libraries and the University.
OTHER RESPONSIBILITIES:

  * Collection development and liaison responsibilities for one or more schools 
or departments.
  * Participation in the Libraries' information literacy program.
  * Provision of services at Swirbul Library's main reference desk including 
occasional evenings and weekends.
  * Service on University and Library committees.
  * Active participation in professional associations and activities.
  * Active participation in scholarly activities including research and 
publishing, as required for reappointment and tenure.


QUALIFICATIONS:

  * Knowledge of standards-based archival description and metadata schema, such 
as EAD, XML, MODS, and Dublin Core
  * Excellent communication and interpersonal skills
  * The ability to work effectively in a collegial environment
  * Evidence of ability to meet criteria for promotion and tenure
  * Experience with digitization projects, archival database management 
systems, and website construction.
Other desirable qualifications include:

  * Familiarity with ContentDM and Archivist's Toolkit
  * Experience with records retention policies and schedules, exhibits, and 
writing 

[CODE4LIB] Job: Curator - Gordon W. Prange Collection and Librarian for East Asian Studies at University of Maryland, College Park

2013-10-17 Thread jobs
The University of Maryland Libraries are seeking dynamic and innovative
applicants for the position of Curator of the Gordon W. Prange Collection and
Librarian for East Asian Studies.

  
The successful candidate will create and implement a vision for the Gordon W.
Prange Collection, a world-renowned special collection of rare and archival
materials that constitutes the most comprehensive collection of Japanese
language publications issued in Japan during the post-World War II period of
1945-1949. The Prange Collection encompasses over 1.7 million items
representing virtually everything published in Japan during this period. The
University of Maryland Libraries, in partnership with the National Diet
Library of Japan, have engaged in large-scale microfilming and digitization
projects to preserve and improve access to this historically significant and
unique collection. Project funders have included the National Endowment for
the Humanities, the Japan Foundation Center for Global Partnership and the
Nippon Foundation.

  
The Curator/Librarian will also be responsible for East Asian studies
materials in the Libraries' general collection, which includes over 80,000
monographs, periodicals and reference works in Chinese, Japanese and Korean
languages. Particular strengths include humanities and social sciences with an
emphasis on Chinese and Japanese history and culture in support of the
research and curricular needs of faculty and students in East Asian Studies.
The Curator/Librarian will develop a robust program of collection development,
research services, digitization, outreach, and scholarly activity to support
these collections.

  
In addition, the successful candidate will not only manage these collections
and related services, but will also be a scholar with an active program of
print and digital research based in the Prange and East Asia Collections. For
the full job announcement and position description, please go to
[http://www.lib.umd.edu/hr/employment-opportunities/staff-faculty-
positions](http://www.lib.umd.edu/hr/employment-opportunities/staff-faculty-
positions).

  
Position is appointed to Librarian Faculty Ranks as established by the
University System of Maryland Board of Regents. Rank at appointment is based
on the successful applicant's experience and relevant credentials. For
additional information, consult the following website:
[http://www.president.umd.edu/policies/ii-
100B.html](http://www.president.umd.edu/policies/ii-100B.html).

  
**APPLICATIONS**:  
Electronic applications required. Please apply online at
[https://ejobs.umd.edu/postings/22149](https://ejobs.umd.edu/postings/22149).
An application consists of a cover letter which includes
the source of advertisement, a resume, and names/e-mail addresses of three
references. Applications will be reviewed as they are received and accepted
until Monday, November 18, 2013. The University of Maryland, College Park,
actively subscribes to a policy of equal employment opportunity, and will not
discriminate against any employee or applicant because of race, age, sex,
color, sexual orientation, physical or mental disability, religion, ancestry
or national origin, marital status, genetic information, political
affiliation, or gender identity and expression. Minorities
and women are encouraged to apply.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/10356/


[CODE4LIB] Job: Systems Engineers at Virginia Polytechnic Institute and State University

2013-10-17 Thread jobs
Virginia Tech's Newman Library and the Center for Digital Research and
Scholarship (CDRS) are seeking qualified candidates for two Systems Engineers
for data initiatives. Incumbents will develop systems that: 1) enable data
integration across distributed and heterogeneous local and external data
sources to maximize data use and reuse in applications, and 2) support digital
preservation strategies and repository systems research, development, and
implementation. Primary responsibilities include leading technical
contributions, such as data architecture design, data integration, system
design and testing, and applications development, implementation,
administration, and support, for data publishing and preservation projects
(initial focus on VIVO and Fedora). Additional responsibilities include
ensuring systems compatibility to meet project/program functionality,
technical, and design specifications, and scheduling objectives; collaborating
with colleagues in the Libraries and at other institutions in delivering
system and web development projects; providing informed IT-related advice for
Center for Digital Research and Scholarship (CDRS) projects; liaising with
CDRS and Information Technologies and Services (ITS) personnel for planning
and service development; participating in selected cross-Libraries working
groups to improve systems and services; providing training to Libraries
personnel (and library users where appropriate); participating in various
systems engineering projects as a result of developments and changes in
Library services.

  
Required Qualifications:

Master's degree in computer/information science, management information
systems, or related field, or Bachelor's degree and significant experience
equivalent to an advanced degree. Successful candidates must have: familiarity
with semantic web technologies; knowledge of and experience with: Java and/or
Object Oriented programming in PHP, relational databases (e.g., MySQL), web
applications (e.g., HTTP, CSS, HTML, XML, REST API), software development
methods and tools (e.g., version control, agile programming methodologies,
documentation, and sound security practices); experience with Windows 200x
and/or UNIX/LINUX server environments and related support and maintenance,
thorough understanding of application server (Apache Web) technical
architecture, and familiarity with shell scripting; experience with backups,
caching, role servers, DNS, SMTP/mail relays, SQL query
writing/troubleshooting, SSL certificates, systems design and networking and
security administration; knowledge of authentication mechanisms (local and/or
external) - Active Directory, LDAP, Shibboleth, EZproxy (or similar); ability
to work independently and with initiative to identify and solve problems;
excellent analytical and design skills at multi-product/multi-environment
level; ability to work collaboratively with individuals and groups, both
onsite and remotely; good interpersonal and communications skills; commitment
to service excellence and customer care.

  
Preferred Qualifications:

Knowledge of and experience with JCR or J2EE; knowledge of Ruby, Solr Indexes,
Semantic Triplestores, and Cloud Infrastructures; experience working with RDF
in practical applications; experience working in a managed programming
environment using one or more of the following: an IDE (e.g. Eclipse), a code
repository (e.g. Redmine, Trac, Subversion, Github), in-code documentation
(e.g. PHPDoc/Javadoc), a bug tracking system (e.g. Mantis); experience with
remote desktop applications; experience with acceptance testing or unit
testing and usability testing; training in a formalized project management
methodology; experience working in academic libraries; experience with digital
repository platforms such as DSpace, Fedora Commons, and EPrints; a proven
record of innovative development for the web; experience of documenting
procedures and systems; experience working in a formal project-managed work
environment; RHCE certification or equivalent.

  
How to Apply for this Job:

Applications must be submitted online at www.jobs.vt.edu search posting #
AP0130182. The application package needs to include a resume, cover letter
addressing the candidate's experience with the responsibilities associated
with the position, and the required and preferred qualifications, names of
three (3) references and their contact information.



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/10357/


[CODE4LIB] Job: Library Web Manager at Brown University

2013-10-17 Thread jobs
Brown University Library seeks a Library Web Manager to oversee and manage
content and software tools to support the Library's web
presence. The Library Web Manager will coordinate with
stakeholders across the Library to administer the Library's content management
system and ensure consistency and accuracy of information on the Library's
public websites, intranet, and social media. S/he will work
to improve the integration of the Library's web-based discovery systems, and
assist in the assessment, testing and implementation of new or improved
services.

  
**Responsibilities:**  
• Provide oversight and guidance for the creation, organization and
maintenance of content for the Library's public websites and intranet

• Coordinate with content owners to ensure that the Library's web presence is
relevant, accurate, up-to-date, user-centered and accessible

• Collaborate in the design, implementation, and management of a Drupal
content management system (CMS) for the Libraries, including responsibility
for configuration and user support

• Develop and recommend policies, workflows, and content authoring guidelines
for Web content development, implementation, and maintenance

• Conduct training in creating web content using the CMS

• Working with library-wide advisory groups, assist in the development of
effective and intuitive interfaces for the discovery of library content by
ensuring the effective presentation of search options, metadata, and related
resources

• Field questions and identify solutions to improve access and retrieval of
library resources via the web

• Regularly assess and promote awareness of new and existing services, such as
Summon, VuFind, etc.

• Participate in the design, implementation and analysis of user
research/usability studies

• Conduct regular analytics to identify opportunities for improvement

  
**Qualifications:**  
Required:

• Bachelor's Degree

• Demonstrated content management and web publishing experience

• 3-4 years of related professional experience

• Excellent organizational, analytical, and problem-solving skills.

• Excellent written and oral communication skills

• Ability to think creatively

• Demonstrated experience working with content management systems (e.g.
Drupal, Wordpress) and information technologies relevant to web site design
and maintenance (e.g. HTML, CSS, Javascript, PHP)

• Strong understanding of usability, usability testing, and information
architecture concepts

• Experience with web analytics analysis

• Strong interpersonal and collaborative skills

  
Desired:

• Master's degree in Library or Information Science, Computer Science, or
equivalent.

• Experience developing and managing Drupal-based web sites

• Supervisory experience leading a team

  
To apply for this position (Job #B01524), please visit Brown's Online
Employment website (https://careers.brown.edu), complete an application
online, attach documents, and submit for immediate consideration. Documents
should include cover letter, resume, and the names and e-mail addresses of
three references. Review of applications will continue until
the position is filled.

  
**Brown University is an Equal Opportunity/Affirmative Action Employer**



Brought to you by code4lib jobs: http://jobs.code4lib.org/job/10388/


Re: [CODE4LIB] Online validator for RelaxNG or Schematron?

2013-10-17 Thread Barnes, Hugh
For RNG, as long as your schema is reachable and referenced correctly, it looks 
like this should work: http://validator.nu . Please let us know how you find it.

Nothing known or found in a quick scan for Schematron.

Somewhat surprised at the apparent lack of options.
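
If a local, command-line check would tide you over in the meantime, libxml2's
xmllint can validate against RELAX NG. A quick sketch (file names are
placeholders):

  # validate instance.xml against the RELAX NG schema schema.rng
  xmllint --noout --relaxng schema.rng instance.xml

(xmllint also has a --schematron option, though its Schematron support is
only partial, so your mileage may vary.)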

Cheers

Hugh Barnes
Digital Access Coordinator
Library, Teaching and Learning
Lincoln University
Christchurch
New Zealand
p +64 3 423 0357

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Wolfe, 
Mark D
Sent: Friday, 18 October 2013 2:54 a.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Online validator for RelaxNG or Schematron?

Does anyone know of an online validator for either Relax NG or Schematron?

Thanks, Mark


Mark Wolfe
Curator of Digital Collections
M. E. Grenander Department of Special Collections & Archives
Science Library 355, University at Albany, SUNY
1400 Washington Avenue, Albany NY  1
Phone: (518) 437-3934
Email: mwo...@albany.edu


Please consider the environment before you print this email.
The contents of this e-mail (including any attachments) may be confidential 
and/or subject to copyright. Any unauthorised use, distribution, or copying of 
the contents is expressly prohibited. If you have received this e-mail in 
error, please advise the sender by return e-mail or telephone and then delete 
this e-mail together with all attachments from your system.


[CODE4LIB] Code4Lib 2014 Call for Proposals

2013-10-17 Thread Ranti Junus
Code4lib 2014 is a loosely-structured conference that provides people
working at the intersection of libraries/archives/museums and technology
with a chance to share ideas, be inspired, and forge collaborations.

The conference will be held at the *Sheraton Raleigh Hotel in downtown
Raleigh, NC from March 24 - 27, 2014*.  For more information about the
hotel, visit http://www.sheratonraleigh.com/

We are currently accepting proposals for prepared talks and
pre-conferences. While only a limited number of these can be selected,
multiple lightning talk and breakout sessions will provide additional
opportunities for you to make your voice heard at the conference.



*Proposals for Prepared Talks:*

Prepared talks are 20 minutes (including setup and questions), and should
focus on one or more of the following areas:

- Projects you've worked on which incorporate innovative implementation of
existing technologies and/or development of new software

- Tools and technologies – How to get the most out of existing tools,
standards and protocols (and ideas on how to make them better)

- Technical issues – Big issues in library technology that should be
addressed or better understood

- Relevant non-technical issues – Concerns of interest to the Code4Lib
community which are not strictly technical in nature, e.g. collaboration,
diversity, organizational challenges, etc.

*To submit a proposal:*

- Go to http://wiki.code4lib.org/index.php/2014_Prepared_Talk_Proposals

- Log in to the wiki in order to submit a proposal. If you are not already
registered, follow the instructions to do so.

- Provide a title and brief (500 words or fewer) description of your
proposed talk.

- If you so choose, you may also indicate when, if ever, you have presented
at a prior Code4Lib conference. This information is completely optional,
but it may assist us in opening the conference to new presenters.

As in past years, the Code4Lib community will vote on proposals that they
would like to see included in the program. This year, however, only the top
10 proposals will be guaranteed a slot at the conference. Additional
presentations will be selected by the Program Committee in an effort to
ensure diversity in program content. Community votes will, of course, still
weigh heavily in these decisions.

Presenters whose proposals are selected for inclusion in the program will
be guaranteed an opportunity to register for the conference. The standard
conference registration fee will still apply.

Proposals can be submitted through Friday, November 8, 2013, at 5pm PST.
Voting will commence on November 18, 2013 and continue through December 6,
2013. The final line-up of presentations will be announced in early
January, 2014.



*Pre-Conference Proposals:*

Pre-conferences are full- or half-day sessions that will be held on Monday,
March 24th, 2014 and can cover just about any topic you can think of [1].

If you are interested in hosting a pre-conference session, please create a
pitch at http://wiki.code4lib.org/index.php/2014_preconference_proposals.
Pitches should be added to the wiki by December 6.

Please indicate the topic of your session and your preference for full-day
or half-day.  This is expected to be a fluid process, as our venue provides
some flexibility in determining space.

*Pre-Conference Attendance:*

If you are interested in attending a pre-conference, please list your name
underneath the pre-conference description on the wiki; this does not incur
any obligation on your part, but will help planners. You might want to
visit the page occasionally as new session pitches are added.  Actual,
less-revocable registration for pre-conferences will be handled as part of
the overall conference registration, and will involve a very small fee.


We look forward to reading your proposals, and seeing you at the conference!

Code4Lib 2014 Program Committee



-- 
Bulk mail.  Postage paid.