Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
Thanks Jason and Ed,

I suspect within this project we'll keep using OAI-PMH because we've got tight deadlines and the other project strands (which do stuff with the harvested content) need time from the developer. At the moment it looks like we will probably combine OAI-PMH with web crawling (using Nutch) - so use data from the

However, that said, one of the things we are meant to be doing is offering recommendations or good-practice guidelines back to the (repository) community based on our experience. If we have time I would love to tackle the questions (a)-(d) that you highlight here - perhaps especially (a) and (c). Since this particular project is part of the wider JISC 'Discovery' programme (http://discovery.ac.uk, with tech principles at http://technicalfoundations.ukoln.info/guidance/technical-principles-discovery-ecosystem) - one of whose main themes might be summarised as 'work with the web' - these questions are definitely relevant.

I need to look at Jason's stuff again as I think this definitely has parallels with some of the Discovery work, as, of course, does some of the recent discussion on here about the indexing of library catalogues by search engines.

Thanks again to all who have contributed to the discussion - very useful

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 1 Mar 2012, at 11:42, Ed Summers wrote:

On Mon, Feb 27, 2012 at 12:15 PM, Jason Ronallo jrona...@gmail.com wrote:
I'd like to bring this back to your suggestion to just forget OAI-PMH and crawl the web. I think that's probably the long-term way forward.

I definitely had the same thoughts while reading this thread. Owen, are you forced to stay within the context of OAI-PMH because you are working with existing institutional repositories?

I don't know if it's appropriate, or if it has been done before, but as part of your work it would be interesting to determine:

a) how many IRs allow crawling (robots.txt or lack thereof)
b) how many IRs support crawling with a sitemap
c) how many IR HTML splash pages use the rel-license [1] pattern
d) how many IRs support syndication (RSS/Atom) to publish changes

If you could do this in a semi-automated way for the UK it would be great if you could then apply it to IRs around the world. It would also align really nicely with the sort of work that Jason has been doing around CAPS [2]. It seems to me that there might be an opportunity to educate digital repository managers about better aligning their content w/ the Web ... instead of trying to cook up new standards. I imagine this is way out of scope for what you are currently doing--if so, maybe this can be your next grant :-)

//Ed

[1] http://microformats.org/wiki/rel-license
[2] https://github.com/jronallo/capsys
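Ed's questions (a) and (b) lend themselves to semi-automation with very little code. As a rough illustration, here is a minimal Python sketch - the repository base URLs are made-up placeholders, and the sitemap is only probed at its conventional location (it may also be declared via a Sitemap: line inside robots.txt):

import urllib.request
import urllib.robotparser

# Placeholder IR base URLs -- substitute a real list of repositories.
REPOSITORIES = [
    "http://eprints.example.ac.uk",
    "http://dspace.example.ac.uk",
]

def survey(base_url):
    """Check (a) robots.txt permits crawling and (b) a sitemap is offered."""
    rp = urllib.robotparser.RobotFileParser(base_url + "/robots.txt")
    try:
        rp.read()
        crawlable = rp.can_fetch("*", base_url + "/")
    except OSError:
        crawlable = True  # robots.txt unreachable: no restrictions stated
    try:
        with urllib.request.urlopen(base_url + "/sitemap.xml", timeout=10) as resp:
            has_sitemap = resp.status == 200
    except OSError:
        has_sitemap = False
    return crawlable, has_sitemap

for repo in REPOSITORIES:
    crawlable, has_sitemap = survey(repo)
    print(repo, "crawlable:", crawlable, "sitemap:", has_sitemap)

Question (d) could be probed in a similar spirit by requesting common feed locations or looking for <link rel="alternate"> elements on the repository home page.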
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
Owen...

Just wanted to say that, whilst I've been silent since my initial response, I'm not sure I agree with all the viewpoints presented here. From the point of view of, for example, CultureGrid, I'm not sure what has been done could have been pragmatically achieved solely with web crawling as it's described in this thread. I don't have a problem with anything that's been written here - it certainly represents a great cross-section of viewpoints. However, from a JISC Discovery perspective, I don't want to contribute to any confirmation bias that we could dispose of pesky old OAI. I'd be interested in providing a counter-point to any best-practice document that suggested we could.

Ian.

On Thu, Mar 1, 2012 at 12:36 PM, Owen Stephens o...@ostephens.com wrote:
[snip]
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
Thanks Ian,

Agree that it is clear from this discussion that there are differing viewpoints and also very different requirements depending on the context and desired outcomes. I think I said earlier in the thread - I'm not against niche solutions, they just make me want to double-check that they are justified. For me the jury is still out on 'crawl' vs 'harvest' - but I think it definitely needs more investigation and thought - and of course different problems require different solutions. It would be interesting to try to go through the case for OAI-PMH, especially specific examples where it has achieved something that would have been difficult/impossible to do with more general solutions. Not sure if that could be done here on list, or better/easier through other discussion - or both (possibly over that beer? :)

From the CORE project, any 'best practice' would be focussed on institutional research publication repositories, and it seems highly unlikely we would make a recommendation on 'crawl' vs 'harvest' - we just won't have time to do enough work on this to understand the pros/cons of these even from our own singular perspective. I think any recommendations are more along the lines of: ensuring robots.txt is consistent with other policies; the impact of using splash pages as opposed to links to actual resources in the OAI-PMH feed; configuring access to embargoed papers (as per Raffaele's suggestion); how to deal with multi-part resources; etc. Anything coming out of the project would, of course, be just one project's recommendations for JISC to consider - nothing more than that.

Cheers,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 1 Mar 2012, at 14:38, Ian Ibbotson wrote:
[snip]
[CODE4LIB] DC / Baltimore Perl Workshop
Apologies in advance if you've already seen this from other mailing lists; I know we have a few Perl folks on here, but I don't know how many in the DC area. The DC and Baltimore Perl Mongers groups are organizing a Perl workshop on Sat, April 14th in Catonsville, MD. We're still filling out the program schedule, but I thought I'd mention it as today's the last day for early registration ($25 vs. $50, although free for students and the unemployed).

http://dcbpw.org/dcbpw2012/

-Joe
[CODE4LIB] Follow Up to the Naming a 'Favorites' System for a Library Survey
*Apologies for cross-posting*

A few weeks ago, I sent a link to a quick poll to a couple of listservs looking for information about what libraries have chosen to name the "save this for later" or "favorites" tool on their sites. A handful of folks asked for the summarized results. I wrote up a brief summary on our library's technology blog at http://mblog.lib.umich.edu/blt/archives/2012/03/bookmarks_favor.html

Ken Varnum
Web Systems Manager
University of Michigan Library -- http://lib.umich.edu/
var...@umich.edu

From: Ken Varnum var...@umich.edu
Date: Mon, 13 Feb 2012 14:51:35 -0500
Subject: Quick Survey: Naming a "Favorites" System for a Library

*Apologies for cross-posting*

We're working on a tool for our library website that will allow users to save a catalog entry, a link to a journal or database, or an article citation for future use. There are a variety of names for this kind of tool (Favorites, Saved Items, Save for Later, Bookshelf, and so on), and I'd like to learn a bit from what you've done. While many licensed databases and other web sites have this mechanism, I'm particularly interested in library-built systems.

The survey should take less than 3 minutes to complete: http://bit.ly/library-faves

Please feel free to share with others as appropriate. I'm happy to summarize the results of the survey after it closes on February 20.

Ken Varnum
Web Systems Manager
University of Michigan Library -- http://lib.umich.edu/
var...@umich.edu
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
If your HTML includes embedded semantic data using HTML5 microdata or RDFa or something similar (using a standard vocabulary -- the standard for repositories seems to be DC-based, since that's often all you can get out of OAI-PMH anyway), then web crawling combined with sitemaps probably provides about as much functionality as OAI-PMH. But embedded semantic metadata is key.

However, even in the current OAI-PMH-considered-standard-best-practice world, the document-level metadata from repositories is often _extremely_ basic, as well as often unreliable. This severely limits what harvesters can do with what they harvest. So it's not necessarily really about OAI-PMH vs web crawling. It's about sufficient and sufficiently reliable metadata. And even in the OAI-PMH world, we rarely have it.

Note for instance that OAIster and similar harvesters are _unable to know_ whether a harvested document is open-access full text or not. That seems like something you'd want to tell people in their search results, right? They might only want stuff that they can actually access. But it's not really possible, because most (all?) repos do not expose any standard metadata in their OAI-PMH that would specify this.

On 3/1/2012 9:38 AM, Ian Ibbotson wrote:
[snip]
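Ed's question (c) - and the embedded-metadata point above - can be spot-checked in the same semi-automated spirit. Here is a rough sketch using only the Python standard library; the splash-page URL is hypothetical, and meta name="DC.*" tags are just one common convention for embedding DC in HTML (real RDFa or microdata would call for a dedicated parser):

from html.parser import HTMLParser
import urllib.request

class SplashPageAudit(HTMLParser):
    """Collect rel-license links and DC-style meta tags from a splash page."""
    def __init__(self):
        super().__init__()
        self.license_urls = []
        self.dc_terms = {}

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        # microformats rel-license pattern: <a rel="license" href="...">
        if tag == "a" and "license" in attrs.get("rel", "").split():
            self.license_urls.append(attrs.get("href", ""))
        # one common DC embedding: <meta name="DC.title" content="...">
        elif tag == "meta" and attrs.get("name", "").lower().startswith("dc."):
            self.dc_terms[attrs["name"]] = attrs.get("content", "")

# Hypothetical splash page URL.
url = "http://eprints.example.ac.uk/1234/"
with urllib.request.urlopen(url, timeout=10) as resp:
    page = resp.read().decode("utf-8", errors="replace")

audit = SplashPageAudit()
audit.feed(page)
print("rel-license links:", audit.license_urls or "none")
print("embedded DC terms:", len(audit.dc_terms))

Run against a sample of splash pages, even a crude check like this would answer (c) at scale - and, incidentally, measure how far the "embedded semantic metadata is key" condition is actually met in the wild.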
[CODE4LIB] Job: Archivist for Digital Collections at Tufts University
**Posting Title: ARCHIVIST FOR DIGITAL COLLECTIONS - Digital Collections and Archives**

**Job Description - Overview:**

The Digital Collections and Archives (DCA) supports the teaching and research mission of Tufts University by ensuring the enduring preservation and accessibility of the university's permanently valuable records and collections. The DCA assists departments, faculty, and staff in managing records and other assets. The DCA collaborates with members of the Tufts community and others to develop tools to discover and access collections to support teaching, research, and administrative needs.

The Archivist for Digital Collections (ADC) oversees the formulation, preparation, and management of digital objects and collections for the DCA, with a particular focus on developing tools and workflows to maximize efficiency in digital collections management. This work includes: database manipulation, scripting, supervising student workers, developing policies and procedures concerning digital objects and metadata, implementing appropriate standards and best practices, conducting quality assurance for digital collections, undertaking preservation activities, and managing the DCA's locally-developed collections management system, CIDER. The ADC, working closely with the Director, acts as project manager for projects yielding digital collections, including proposal development and implementation and oversight of funded projects, and serves as a primary point of contact for faculty requiring assistance managing electronic research materials. The ADC collaborates closely with department colleagues on workflow development and implementation.

**Job Description - Requirements:**

Basic Requirements:

* ALA-accredited MLS with concentration in Archives Management or related advanced degree.
* 3-5 years of related experience.
* Experience with at least one programming or scripting language, such as Perl; some experience with database manipulation; knowledge of XML, HTML, CSS, digital imaging, and metadata and digital object creation and preservation standards. Ability to work in both Windows and Apple OS X environments. Comfort with learning new technologies on an ongoing basis.

Preferred Qualifications:

* Strong written and oral communication skills; ability to function in a highly collaborative environment with many simultaneous projects. Familiarity with digital repository systems, particularly Fedora, a plus. Knowledge of Ruby on Rails, MySQL, jQuery, Catalyst, a plus.

_Tufts University is an AA/EO employer and actively seeks candidates from diverse backgrounds._

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/815/
[CODE4LIB] Job: Preservation Digital Technology Internship at Library of Congress
The Preservation Reformatting Division (PRD) provides access to at-risk Library serials, brittle books, newspapers, photographs and manuscripts by converting items to new formats such as microfilm, facsimile copies or digital reproductions. Reformatting is accomplished through programs for microphotography and digital capture.

The goal of the internship is to provide Library Science and Information Technology students, graduates, and post-graduates with the opportunity to study and work with state-of-the-art digital technologies: those used for the digital reformatting of library materials; those used to document and model reformatting and related preservation workflows; and those used to ensure proper workflow execution by enabling statistical process monitoring and control. Interns have the opportunity to participate in the following key activities to plan, get, describe, sustain, and make accessible reformatted digital and/or microfilm formats for serials, photographs, manuscripts, brittle books, and other items.

Digital Preservation Activities

Plan: Processing management (e.g., assessing materials, processing brittle books, reviewing reformatting policies, etc.) in order to identify the functions and processes to be represented in a fashion comprehensible to library management and IT personnel.

Sustain: Microphotography using two state-of-the-art microfilm digitization workstations (16/35mm roll and fiche), a high-resolution color overhead capture workstation, and an image processing and data storage infrastructure that enables: high-resolution digital image capture/importation and image quality analysis from microform and printed materials; and image inspection/auditing, editing, post-processing, image quality measurement, and process control activities, with a focus on digitizing microform materials.

Make Available: Digital imaging production processes of books and serials with open-source and commercial image editing/image processing software (e.g., imaging materials, managing vendor-created images, conducting quality reviews, and preparing files for use in online delivery systems).

Other Activities

Research: Specification development and deployment using computerized modeling/design tools to develop preservation-relevant process models, data models, flowcharts, and other products that represent existing and planned Preservation Directorate operations.

Tours: The Library of Congress has tremendous quantity, quality, and diversity in its holdings. Interns have the opportunity to tour the other Directorate divisions as well as the many custodial divisions in the Library.

Training and Conservation Professional Activities: Participation in outreach activities such as lab tours for visitors and relevant in-house lectures and conferences. Interns meet curators to discuss collections and are expected to give a farewell presentation of work and accomplishments to Library staff.

Application and Selection Procedure

Internships may be on a part-time or full-time schedule, but a minimum of 200 hours is generally required. The length of the internship generally ranges from 6 weeks to 6 months. Applicants should complete and submit by email the Preservation Fellowship and Internship Application Form [PDF: 18 KB / 3 p.], plus a resume, two letters of recommendation, and a formal letter of interest. Please follow the additional instructions on the application form and note that the Preservation Directorate uses this one application form for all of the various internships and fellowships offered.
Citizenship requirements: U.S. citizenship not required.

Application Schedule: Applications are accepted at any time.

To apply, please direct applications to:

Mary Oey
Preservation Education Specialist
Library of Congress
Telephone: (202) 707-8345
FAX: (202) 707-1525
m...@loc.gov

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/816/
[CODE4LIB] Job: Library Digital Services Manager at St. Edward's Hall
## Overview:

The Scarborough-Phillips Library at St. Edward's University seeks a creative, innovative individual to work on all things digital, including but not limited to the library's web presence, digitization initiatives, and integrated library systems. This position reports to the Head of Library Systems. Salary range in the mid-to-high $50s, commensurate with experience.

This position provides planning, organization and implementation of digital library services under the general direction and leadership of the Head of Library Systems, including: usability testing of digital products; user experience design to create a nurturing, usable, and flexible digital environment for learning; digitization of analog formats; system administration of III's Millennium enterprise library solution; system administration of public services products, including LibGuides, LibAnswers, and LibAnalytics; assisting in writing and testing of programming code for the web, open source solutions (e.g., Omeka, Book Reader, and Open Journal Systems), and automation of internal library processes (e.g., sending and receiving data from vendors, integration with an ERP system); maintaining computers, printers, and other technology for staff and public service points in the library; working with IT to solve problems with library systems; supporting the creation of digital learning objects; and supporting resource sharing solutions.

## Responsibilities:

* Plan and implement a usability program for various library digital services, including but not limited to the library's website, databases, online catalog, and digital collections.
* Collaborate with Instructional Technology and library staff to create new and support existing platforms for library reference and instruction, including tutorials, online chat, streaming media, podcasting, and 3rd-party software.
* Provide administrative support for the library's integrated library system (III's Millennium).
* Provide administrative support for the library's interlibrary loan system and other resource sharing initiatives.
* Provide administrative support for digital library tools, including but not limited to LibGuides, LibAnswers, and LibAnalytics.
* Provide administrative support for staff computers, including the management of Deep Freeze, print queues, and installation of software.
* Program and maintain open source solutions (e.g., Omeka, Book Reader, Open Journal Systems).
* Support technology-related issues throughout the library, including digitization projects, user experience design, automation of technical services routines, and the creation of digital learning objects.

## Qualifications:

* Undergraduate degree in an area related to computer science or information systems required by time of employment. Advanced degree preferred. Experience with web development or technology support services in a library or academic setting preferred.
* Experience with usability testing and user experience design required.
* Familiarity with use of social media (e.g., Facebook, Twitter, Foursquare) in academic library settings preferred.
* Demonstrated familiarity with developing and maintaining dynamic data-driven websites with relevant standards and technologies such as PHP, XML/XSLT, XHTML, CSS, JavaScript, and UNIX-like environments preferred.
* Familiarity with digital media industry standards and production of high-quality audio, video, images and screencasts preferred.
* Graphic design skills, including the use of Adobe Creative Suite, preferred.
* Demonstrated effective oral, written and interpersonal communication skills.
* Demonstrated ability to think critically and analytically and to work in a collegial, collaborative, service-focused environment.
* Familiarity with copyright laws and digital rights management preferred.
* Experience with distance education courses preferred.
* Ability to constantly adapt to a fast-evolving environment required.
* Successful completion of an employment and/or criminal background check required.

## About St. Edward's University:

Founded in 1885 by the Congregation of Holy Cross, St. Edward's University is the premier private institution of higher learning in Austin. Enrolling approximately 5,300 students, the university offers more than 90 academic programs. In addition to the many programs designed for traditional undergraduates, the university offers more than 15 undergraduate degree programs designed for working adults and 11 master's degree programs. Over the last two decades the university has doubled its enrollment and invested more than $147 million in new campus facilities. U.S. News & World Report has ranked St. Edward's among the top regional universities in the West for nine consecutive years, and peers identified St. Edward's as one of a handful of up-and-coming universities in both 2010 and 2011. The university's newly adopted strategic plan, Academic
[CODE4LIB] Job: Lead Programmer for Digital Libraries at University of North Texas
**Department Overview**

The digital library repository of the UNT Libraries is ranked in the top 10 repositories in North America. The University Libraries house print and electronic collections of almost 6 million cataloged items, in five libraries located in five separate facilities. For more information about our department and strategic vision, please visit our website at http://www.library.unt.edu

**Job Description**

The Library is seeking an IT Programmer Analyst I to serve as lead programmer for the UNT Libraries' various digital library initiatives, including The Portal to Texas History, the UNT Digital Library, and the CyberCemetery and web archiving activities. Responsibilities include but are not limited to:

* Supervise other software developers and programmers in the Digital Libraries Division
* Serve as primary programmer for the CODA digital archiving environment and replication system
* Serve as primary programmer for the Aubrey Search Service
* Establish and monitor testing practices for software and interfaces developed by the unit
* Adhere to the unit's version control practices for software development and deployment
* Participate in grant and externally funded projects
* Act as lead developer and administrator of the LOCKSS systems managed by the Libraries for the MetaArchive and the global LOCKSS network, and the Texas-History Online search system

**Minimum Qualifications**

The successful candidate will possess a Bachelor's Degree with coursework in computing or information systems and two years of related computer programming experience, or any equivalent combination of education, training and experience. The following knowledge, skills, and abilities are required:

* Considerable knowledge of the methods and equipment used in electronic data processing, including system analysis and design, and computer programming techniques
* Strong skill in writing programs for computer applications
* Ability to analyze problems and develop solutions

**Preferred Qualifications**

The preferred candidate will possess the following additional qualifications:

* Demonstrated leadership in project teamwork
* Ability to coordinate and evaluate the work of others
* Understanding of digital library concepts and operations
* Broad familiarity with open source tools and environments
* Extensive knowledge of dynamic scripting languages such as Python, Perl or Ruby
* Working knowledge of version control systems
* Working knowledge of XML and related technologies
* Extensive knowledge of Linux/Unix environments for software development and deployment
* Working knowledge of Solr indexing software, including setup, configuration and interface design
* Familiarity with the following technologies and/or applications: Python, PHP, Apache, MySQL, HTML, Java, XSLT

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/818/
[CODE4LIB] Autoscaling and streaming apps on EC2
Howdy all,

I have no experience with autoscaling or streaming, so I'm looking for thoughts that help me wrap my mind around how to implement it in a production setting.

I have been asked to examine the possibility of providing a consortium-level music reserves system using Variations (which I also have no experience with). The software would be maintained centrally, but each institution will manage its own collections, users, etc. Load is expected to vary considerably, from practically nothing to possibly hundreds of simultaneous streams at peak times. This strikes me as an excellent elastic application and a good fit for EC2, and as far as I can tell there are two basic ways to achieve the elasticity I'm looking for in that environment.

The first is to store the files in S3 buckets and serve them via CloudFront. This strikes me as the preferred solution, but I don't yet know if I'll be able to get the user and staff clients to play well with this configuration.

The second is to have a script monitor the service and spin up more instances when certain triggers are met and destroy them when demand drops. But if I do that, all instances need to be able to access the same live data. For DB data, that's a no-brainer since I can just run a DB server. But how do you synchronize live files across instances, since EBS volumes can only be attached to one instance at a time? Somehow, NFS strikes me as an ugly way to deal with the problem. Actually, even if EBS volumes could be attached to multiple instances, that solution would still suck, as you could have multiple apps trying to access the same files at the same time.

Obviously, I'm having trouble getting pointed in the right direction. I could punt and just order enough capacity to handle heavy use cases. But that's a copout, and autoscaling bandwidth and computing capacity feels like one of those tools that's really handy to have in your bag of tricks.

Any pointers would be appreciated.

kyle

--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787
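For what it's worth, the S3/CloudFront route has few moving parts on the storage side. Below is a minimal sketch using boto3, the current AWS SDK for Python (the bucket and key names are made up), that pushes a media file to S3 and hands the player a time-limited URL so the bucket itself can stay private:

import boto3

# Assumed names -- substitute your own bucket and object key.
BUCKET = "music-reserves-media"
KEY = "institution-a/track-001.mp3"

s3 = boto3.client("s3")

# One-time: push the media file into the bucket.
s3.upload_file("track-001.mp3", BUCKET, KEY)

# Per-request: generate a URL that expires, so access control stays
# in the application while AWS serves the bytes.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": KEY},
    ExpiresIn=3600,  # one hour
)
print(url)

CloudFront signed URLs work the same way conceptually, just issued against the distribution rather than the bucket - either way, the scaling problem for the media files themselves goes away because AWS does the serving.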
Re: [CODE4LIB] Autoscaling and streaming apps on EC2
> I might be missing something, but it seems to me that you are comparing using CloudFront to trying to build your own CloudFront. Building your own does not seem like it would be very easy or cost effective. Essentially, S3 is an NFS, innit? We use it that way. What is the issue with CloudFront?

There's no philosophical problem with CloudFront, but there might be practical ones. While I should theoretically be able to use s3fs to let the software interact seamlessly with S3, the software also assumes it is streaming the media files itself rather than handing them off to an external service. Maybe this change will be easy to implement, maybe it won't -- I won't know until I try. If it isn't, I need to come up with a Plan B.

At this point in time, I'm just trying to make sure I understand my major options for setting up the service.

kyle