Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
Thanks Jason and Ed,

I suspect within this project we'll keep using OAI-PMH because we've got tight deadlines and the other project strands (which do stuff with the harvested content) need time from the developer. At the moment it looks like we will probably combine OAI-PMH with web crawling (using Nutch) - so use data from the

However, that said, one of the things we are meant to be doing is offering recommendations or good-practice guidelines back to the (repository) community based on our experience. If we have time I would love to tackle the questions (a)-(d) that you highlight here - perhaps especially (a) and (c). Since this particular project is part of the wider JISC 'Discovery' programme (http://discovery.ac.uk, with tech principles at http://technicalfoundations.ukoln.info/guidance/technical-principles-discovery-ecosystem) - one of whose main themes might be summarised as 'work with the web' - these questions are definitely relevant.

I need to look at Jason's stuff again as I think this definitely has parallels with some of the Discovery work, as, of course, does some of the recent discussion on here about the indexing of library catalogues by search engines.

Thanks again to all who have contributed to the discussion - very useful

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 1 Mar 2012, at 11:42, Ed Summers wrote:

On Mon, Feb 27, 2012 at 12:15 PM, Jason Ronallo jrona...@gmail.com wrote:
I'd like to bring this back to your suggestion to just forget OAI-PMH and crawl the web. I think that's probably the long-term way forward.

I definitely had the same thoughts while reading this thread. Owen, are you forced to stay within the context of OAI-PMH because you are working with existing institutional repositories?

I don't know if it's appropriate, or if it has been done before, but as part of your work it would be interesting to determine:

a) how many IRs allow crawling (robots.txt or lack thereof)
b) how many IRs support crawling with a sitemap
c) how many IR HTML splash pages use the rel-license [1] pattern
d) how many IRs support syndication (RSS/Atom) to publish changes

If you could do this in a semi-automated way for the UK it would be great if you could then apply it to IRs around the world. It would also align really nicely with the sort of work that Jason has been doing around CAPS [2]. It seems to me that there might be an opportunity to educate digital repository managers about better aligning their content w/ the Web ... instead of trying to cook up new standards. I imagine this is way out of scope for what you are currently doing--if so, maybe this can be your next grant :-)

//Ed

[1] http://microformats.org/wiki/rel-license
[2] https://github.com/jronallo/capsys
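Ed's questions (a) and (b) lend themselves to semi-automation with very little code. As a rough illustration, here is a minimal Python sketch - the repository base URLs are made-up placeholders, and the sitemap is only probed at its conventional location (it may also be declared via a Sitemap: line inside robots.txt):

import urllib.request
import urllib.robotparser

# Placeholder IR base URLs -- substitute a real list of repositories.
REPOSITORIES = [
    "http://eprints.example.ac.uk",
    "http://dspace.example.ac.uk",
]

def survey(base_url):
    """Check (a) robots.txt permits crawling and (b) a sitemap is offered."""
    rp = urllib.robotparser.RobotFileParser(base_url + "/robots.txt")
    try:
        rp.read()
        crawlable = rp.can_fetch("*", base_url + "/")
    except OSError:
        crawlable = True  # robots.txt unreachable: no restrictions stated
    try:
        with urllib.request.urlopen(base_url + "/sitemap.xml", timeout=10) as resp:
            has_sitemap = resp.status == 200
    except OSError:
        has_sitemap = False
    return crawlable, has_sitemap

for repo in REPOSITORIES:
    crawlable, has_sitemap = survey(repo)
    print(repo, "crawlable:", crawlable, "sitemap:", has_sitemap)

Question (d) could be probed in a similar spirit by requesting common feed locations or looking for <link rel="alternate"> elements on the repository home page.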
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
Owen...

Just wanted to say that, whilst I've been silent since my initial response, I'm not sure I agree with all the viewpoints presented here. From the point of view of, for example, CultureGrid, I'm not sure what has been done could have been pragmatically achieved solely with web crawling as it's described in this thread. I don't have a problem with anything that's been written here - it certainly represents a great cross-section of viewpoints. However, from a JISC Discovery perspective, I don't want to contribute to any confirmation bias that we could dispose of pesky old OAI. I'd be interested in providing a counter-point to any best-practice document that suggested we could.

Ian.

On Thu, Mar 1, 2012 at 12:36 PM, Owen Stephens o...@ostephens.com wrote:
[snip]
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
Thanks Ian,

Agree that it is clear from this discussion that there are differing viewpoints and also very different requirements depending on the context and desired outcomes. I think I said earlier in the thread - I'm not against niche solutions, they just make me want to double-check that they are justified. For me the jury is still out on 'crawl' vs 'harvest' - but I think it definitely needs more investigation and thought - and of course different problems require different solutions. It would be interesting to try to go through the case for OAI-PMH, especially specific examples where it has achieved something that would have been difficult/impossible to do with more general solutions. Not sure if that could be done here on list, or better/easier through other discussion - or both (possibly over that beer? :)

From the CORE project, any 'best practice' would be focussed on institutional research publication repositories, and it seems highly unlikely we would make a recommendation on 'crawl' vs 'harvest' - we just won't have time to do enough work on this to understand the pros/cons of these even from our own singular perspective. I think any recommendations are more along the lines of: ensuring robots.txt is consistent with other policies; the impact of using splash pages as opposed to links to actual resources in the OAI-PMH feed; configuring access to embargoed papers (as per Raffaele's suggestion); how to deal with multi-part resources; etc. Anything coming out of the project would, of course, be just one project's recommendations for JISC to consider - nothing more than that.

Cheers,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 1 Mar 2012, at 14:38, Ian Ibbotson wrote:
[snip]
[CODE4LIB] DC / Baltimore Perl Workshop
Apologies in advance if you've already seen this from other mailing lists; I know we have a few Perl folks on here, but I don't know how many in the DC area. The DC and Baltimore Perl Mongers groups are organizing a Perl workshop on Sat, April 14th in Catonsville, MD. We're still filling out the program schedule, but I thought I'd mention it as today's the last day for early registration ($25 vs. $50, although free for students and the unemployed).

http://dcbpw.org/dcbpw2012/

-Joe
[CODE4LIB] Follow Up to the Naming a 'Favorites' System for a Library Survey
*Apologies for cross-posting*

A few weeks ago, I sent a link to a quick poll to a couple of listservs looking for information about what libraries have chosen to name the "save this for later" or "favorites" tool on their sites. A handful of folks asked for the summarized results. I wrote up a brief summary on our library's technology blog at http://mblog.lib.umich.edu/blt/archives/2012/03/bookmarks_favor.html

Ken Varnum
Web Systems Manager
University of Michigan Library -- http://lib.umich.edu/
var...@umich.edu

From: Ken Varnum var...@umich.edu
Date: Mon, 13 Feb 2012 14:51:35 -0500
Subject: Quick Survey: Naming a "Favorites" System for a Library

*Apologies for cross-posting*

We're working on a tool for our library website that will allow users to save a catalog entry, a link to a journal or database, or an article citation for future use. There are a variety of names for this kind of tool (Favorites, Saved Items, Save for Later, Bookshelf, and so on), and I'd like to learn a bit from what you've done. While many licensed databases and other web sites have this mechanism, I'm particularly interested in library-built systems.

The survey should take less than 3 minutes to complete: http://bit.ly/library-faves

Please feel free to share with others as appropriate. I'm happy to summarize the results of the survey after it closes on February 20.

Ken Varnum
Web Systems Manager
University of Michigan Library -- http://lib.umich.edu/
var...@umich.edu
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
If your HTML includes embedded semantic data using HTML5 microdata or RDFa or something similar (using a standard vocabulary -- the standard for repositories seems to be DC-based, since that's often all you can get out of OAI-PMH anyway), then web crawling combined with sitemaps probably provides about as much functionality as OAI-PMH. But embedded semantic metadata is key.

However, even in the current OAI-PMH-considered-standard-best-practice world, the document-level metadata from repositories is often _extremely_ basic, as well as often unreliable. This severely limits what harvesters can do with what they harvest. So it's not necessarily really about OAI-PMH vs web crawling. It's about sufficient and sufficiently reliable metadata. And even in the OAI-PMH world, we rarely have it.

Note for instance that OAIster and similar harvesters are _unable to know_ whether a harvested document is open-access full text or not. That seems like something you'd want to tell people in their search results, right? They might only want stuff that they can actually access. But it's not really possible, because most (all?) repos do not expose any standard metadata in their OAI-PMH that would specify this.

On 3/1/2012 9:38 AM, Ian Ibbotson wrote:
[snip]
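Ed's question (c) - and the embedded-metadata point above - can be spot-checked in the same semi-automated spirit. Here is a rough sketch using only the Python standard library; the splash-page URL is hypothetical, and meta name="DC.*" tags are just one common convention for embedding DC in HTML (real RDFa or microdata would call for a dedicated parser):

from html.parser import HTMLParser
import urllib.request

class SplashPageAudit(HTMLParser):
    """Collect rel-license links and DC-style meta tags from a splash page."""
    def __init__(self):
        super().__init__()
        self.license_urls = []
        self.dc_terms = {}

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        # microformats rel-license pattern: <a rel="license" href="...">
        if tag == "a" and "license" in attrs.get("rel", "").split():
            self.license_urls.append(attrs.get("href", ""))
        # one common DC embedding: <meta name="DC.title" content="...">
        elif tag == "meta" and attrs.get("name", "").lower().startswith("dc."):
            self.dc_terms[attrs["name"]] = attrs.get("content", "")

# Hypothetical splash page URL.
url = "http://eprints.example.ac.uk/1234/"
with urllib.request.urlopen(url, timeout=10) as resp:
    page = resp.read().decode("utf-8", errors="replace")

audit = SplashPageAudit()
audit.feed(page)
print("rel-license links:", audit.license_urls or "none")
print("embedded DC terms:", len(audit.dc_terms))

Run against a sample of splash pages, even a crude check like this would answer (c) at scale - and, incidentally, measure how far the "embedded semantic metadata is key" condition is actually met in the wild.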
[CODE4LIB] Job: Archivist for Digital Collections at Tufts University
**Posting Title: ARCHIVIST FOR DIGITAL COLLECTIONS - Digital Collections and Archives**

**Job Description - Overview:**

The Digital Collections and Archives (DCA) supports the teaching and research mission of Tufts University by ensuring the enduring preservation and accessibility of the university's permanently valuable records and collections. The DCA assists departments, faculty, and staff in managing records and other assets. The DCA collaborates with members of the Tufts community and others to develop tools to discover and access collections to support teaching, research, and administrative needs.

The Archivist for Digital Collections (ADC) oversees the formulation, preparation, and management of digital objects and collections for the DCA, with a particular focus on developing tools and workflows to maximize efficiency in digital collections management. This work includes: database manipulation, scripting, supervising student workers, developing policies and procedures concerning digital objects and metadata, implementing appropriate standards and best practices, conducting quality assurance for digital collections, undertaking preservation activities, and managing the DCA's locally-developed collections management system, CIDER. The ADC, working closely with the Director, acts as project manager for projects yielding digital collections, including proposal development and implementation and oversight of funded projects, and serves as a primary point of contact for faculty requiring assistance managing electronic research materials. The ADC collaborates closely with department colleagues on workflow development and implementation.

**Job Description - Requirements:**

Basic Requirements:

* ALA-accredited MLS with concentration in Archives Management or related advanced degree.
* 3-5 years of related experience.
* Experience with at least one programming or scripting language, such as Perl; some experience with database manipulation; knowledge of XML, HTML, CSS, digital imaging, and metadata and digital object creation and preservation standards. Ability to work in both Windows and Apple OS X environments. Comfort with learning new technologies on an ongoing basis.

Preferred Qualifications:

* Strong written and oral communication skills; ability to function in a highly collaborative environment with many simultaneous projects. Familiarity with digital repository systems, particularly Fedora, a plus. Knowledge of Ruby on Rails, MySQL, jQuery, Catalyst, a plus.

_Tufts University is an AA/EO employer and actively seeks candidates from diverse backgrounds._

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/815/
[CODE4LIB] Job: Preservation Digital Technology Internship at Library of Congress
The Preservation Reformatting Division (PRD) provides access to at-risk Library serials, brittle books, newspapers, photographs and manuscripts by converting items to new formats such as microfilm, facsimile copies or digital reproductions. Reformatting is accomplished through programs for microphotography and digital capture.

The goal of the internship is to provide Library Science and Information Technology students, graduates, and post-graduates with the opportunity to study and work with state-of-the-art digital technologies: those used for the digital reformatting of library materials; those used to document and model reformatting and related preservation workflows; and those used to ensure proper workflow execution by enabling statistical process monitoring and control. Interns have the opportunity to participate in the following key activities to plan, get, describe, sustain, and make accessible reformatted digital and/or microfilm formats for serials, photographs, manuscripts, brittle books, and other items.

Digital Preservation Activities

Plan: Processing management (e.g., assessing materials, processing brittle books, reviewing reformatting policies, etc.) in order to identify the functions and processes to be represented in a fashion comprehensible to library management and IT personnel.

Sustain: Microphotography using two state-of-the-art microfilm digitization workstations (16/35mm roll and fiche), a high-resolution color overhead capture workstation, and an image processing and data storage infrastructure that enables: high-resolution digital image capture/importation and image quality analysis from microform and printed materials; and image inspection/auditing, editing, post-processing, image quality measurement, and process control activities, with a focus on digitizing microform materials.

Make Available: Digital imaging production processes of books and serials with open-source and commercial image editing/image processing software (e.g., imaging materials, managing vendor-created images, conducting quality reviews, and preparing files for use in online delivery systems).

Other Activities

Research: Specification development and deployment using computerized modeling/design tools to develop preservation-relevant process models, data models, flowcharts, and other products that represent existing and planned Preservation Directorate operations.

Tours: The Library of Congress has tremendous quantity, quality, and diversity in its holdings. Interns have the opportunity to tour the other Directorate divisions as well as the many custodial divisions in the Library.

Training and Conservation Professional Activities: Participation in outreach activities such as lab tours for visitors and relevant in-house lectures and conferences. Interns meet curators to discuss collections and are expected to give a farewell presentation of work and accomplishments to Library staff.

Application and Selection Procedure

Internships may be on a part-time or full-time schedule, but a minimum of 200 hours is generally required. The length of the internship generally ranges from 6 weeks to 6 months. Applicants should complete and submit by email the Preservation Fellowship and Internship Application Form [PDF: 18 KB / 3 p.], plus a resume, two letters of recommendation, and a formal letter of interest. Please follow the additional instructions on the application form and note that the Preservation Directorate uses this one application form for all of the various internships and fellowships offered.
Citizenship requirements: U.S. citizenship not required.

Application Schedule: Applications are accepted at any time.

To apply, please direct applications to:

Mary Oey
Preservation Education Specialist
Library of Congress
Telephone: (202) 707-8345
FAX: (202) 707-1525
m...@loc.gov

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/816/
[CODE4LIB] Job: Library Digital Services Manager at St. Edward's Hall
## Overview:

The Scarborough-Phillips Library at St. Edward's University seeks a creative, innovative individual to work on all things digital, including but not limited to the library's web presence, digitization initiatives, and integrated library systems. This position reports to the Head of Library Systems. Salary range in the mid-to-high $50s, commensurate with experience.

This position provides planning, organization and implementation of digital library services under the general direction and leadership of the Head of Library Systems, including: usability testing of digital products; user experience design to create a nurturing, usable, and flexible digital environment for learning; digitization of analog formats; system administration of III's Millennium enterprise library solution; system administration of public services products, including LibGuides, LibAnswers, and LibAnalytics; assisting in writing and testing of programming code for the web, open source solutions (e.g., Omeka, Book Reader, and Open Journal Systems), and automation of internal library processes (e.g., sending and receiving data from vendors, integration with an ERP system); maintaining computers, printers, and other technology for staff and public service points in the library; working with IT to solve problems with library systems; supporting the creation of digital learning objects; and supporting resource sharing solutions.

## Responsibilities:

* Plan and implement a usability program for various library digital services, including but not limited to the library's website, databases, online catalog, and digital collections.
* Collaborate with Instructional Technology and library staff to create new and support existing platforms for library reference and instruction, including tutorials, online chat, streaming media, podcasting, and 3rd-party software.
* Provide administrative support for the library's integrated library system (III's Millennium).
* Provide administrative support for the library's interlibrary loan system and other resource sharing initiatives.
* Provide administrative support for digital library tools, including but not limited to LibGuides, LibAnswers, and LibAnalytics.
* Provide administrative support for staff computers, including the management of Deep Freeze, print queues, and installation of software.
* Program and maintain open source solutions (e.g., Omeka, Book Reader, Open Journal Systems).
* Support technology-related issues throughout the library, including digitization projects, user experience design, automation of technical services routines, and the creation of digital learning objects.

## Qualifications:

* Undergraduate degree in an area related to computer science or information systems required by time of employment. Advanced degree preferred. Experience with web development or technology support services in a library or academic setting preferred.
* Experience with usability testing and user experience design required.
* Familiarity with use of social media (e.g., Facebook, Twitter, Foursquare) in academic library settings preferred.
* Demonstrated familiarity with developing and maintaining dynamic data-driven websites with relevant standards and technologies such as PHP, XML/XSLT, XHTML, CSS, JavaScript, and UNIX-like environments preferred.
* Familiarity with digital media industry standards and production of high-quality audio, video, images and screencasts preferred.
* Graphic design skills, including the use of Adobe Creative Suite, preferred.
* Demonstrated effective oral, written and interpersonal communication skills.
* Demonstrated ability to think critically and analytically and to work in a collegial, collaborative, service-focused environment.
* Familiarity with copyright laws and digital rights management preferred.
* Experience with distance education courses preferred.
* Ability to constantly adapt to a fast-evolving environment required.
* Successful completion of an employment and/or criminal background check required.

## About St. Edward's University:

Founded in 1885 by the Congregation of Holy Cross, St. Edward's University is the premier private institution of higher learning in Austin. Enrolling approximately 5,300 students, the university offers more than 90 academic programs. In addition to the many programs designed for traditional undergraduates, the university offers more than 15 undergraduate degree programs designed for working adults and 11 master's degree programs. Over the last two decades the university has doubled its enrollment and invested more than $147 million in new campus facilities. U.S. News & World Report has ranked St. Edward's among the top regional universities in the West for nine consecutive years, and peers identified St. Edward's as one of a handful of up-and-coming universities in both 2010 and 2011. The university's newly adopted strategic plan, Academic
[CODE4LIB] Job: Lead Programmer for Digital Libraries at University of North Texas
**Department Overview**

The digital library repository of the UNT Libraries is ranked in the top 10 repositories in North America. The University Libraries house print and electronic collections of almost 6 million cataloged items, in five libraries located in five separate facilities. For more information about our department and strategic vision, please visit our website at http://www.library.unt.edu

**Job Description**

The Library is seeking an IT Programmer Analyst I to serve as lead programmer for the UNT Libraries' various digital library initiatives, including The Portal to Texas History, the UNT Digital Library, and the CyberCemetery and web archiving activities. Responsibilities include but are not limited to:

* Supervise other software developers and programmers in the Digital Libraries Division
* Serve as primary programmer for the CODA digital archiving environment and replication system
* Serve as primary programmer for the Aubrey Search Service
* Establish and monitor testing practices for software and interfaces developed by the unit
* Adhere to the unit's version control practices for software development and deployment
* Participate in grant and externally funded projects
* Act as lead developer and administrator of the LOCKSS systems managed by the Libraries for the MetaArchive and the global LOCKSS network, and the Texas-History Online search system

**Minimum Qualifications**

The successful candidate will possess a Bachelor's Degree with coursework in computing or information systems and two years of related computer programming experience, or any equivalent combination of education, training and experience. The following knowledge, skills, and abilities are required:

* Considerable knowledge of the methods and equipment used in electronic data processing, including system analysis and design, and computer programming techniques
* Strong skill in writing programs for computer applications
* Ability to analyze problems and develop solutions

**Preferred Qualifications**

The preferred candidate will possess the following additional qualifications:

* Demonstrated leadership in project teamwork
* Ability to coordinate and evaluate the work of others
* Understanding of digital library concepts and operations
* Broad familiarity with open source tools and environments
* Extensive knowledge of dynamic scripting languages such as Python, Perl or Ruby
* Working knowledge of version control systems
* Working knowledge of XML and related technologies
* Extensive knowledge of Linux/Unix environments for software development and deployment
* Working knowledge of Solr indexing software, including setup, configuration and interface design
* Familiarity with the following technologies and/or applications: Python, PHP, Apache, MySQL, HTML, Java, XSLT

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/818/
[CODE4LIB] Autoscaling and streaming apps on EC2
Howdy all,

I have no experience with autoscaling or streaming, so I'm looking for thoughts that help me wrap my mind around how to implement it in a production setting.

I have been asked to examine the possibility of providing a consortium-level music reserves system using Variations (which I also have no experience with). The software would be maintained centrally, but each institution will manage its own collections, users, etc. Load is expected to vary considerably, from practically nothing to possibly hundreds of simultaneous streams at peak times. This strikes me as an excellent elastic application and a good fit for EC2, and as far as I can tell there are two basic ways to achieve the elasticity I'm looking for in that environment.

The first is to store the files in S3 buckets and serve them via CloudFront. This strikes me as the preferred solution, but I don't yet know if I'll be able to get the user and staff clients to play well with this configuration.

The second is to have a script monitor the service and spin up more instances when certain triggers are met and destroy them when demand drops. But if I do that, all instances need to be able to access the same live data. For DB data, that's a no-brainer since I can just run a DB server. But how do you synchronize live files across instances, since EBS volumes can only be attached to one instance at a time? Somehow, NFS strikes me as an ugly way to deal with the problem. Actually, even if EBS volumes could be attached to multiple instances, that solution would still suck, as you could have multiple apps trying to access the same files at the same time.

Obviously, I'm having trouble getting pointed in the right direction. I could punt and just order enough capacity to handle heavy use cases. But that's a copout, and autoscaling bandwidth and computing capacity feels like one of those tools that's really handy to have in your bag of tricks.

Any pointers would be appreciated.

kyle

--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787
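For what it's worth, the S3/CloudFront route has few moving parts on the storage side. Below is a minimal sketch using boto3, the current AWS SDK for Python (the bucket and key names are made up), that pushes a media file to S3 and hands the player a time-limited URL so the bucket itself can stay private:

import boto3

# Assumed names -- substitute your own bucket and object key.
BUCKET = "music-reserves-media"
KEY = "institution-a/track-001.mp3"

s3 = boto3.client("s3")

# One-time: push the media file into the bucket.
s3.upload_file("track-001.mp3", BUCKET, KEY)

# Per-request: generate a URL that expires, so access control stays
# in the application while AWS serves the bytes.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": KEY},
    ExpiresIn=3600,  # one hour
)
print(url)

CloudFront signed URLs work the same way conceptually, just issued against the distribution rather than the bucket - either way, the scaling problem for the media files themselves goes away because AWS does the serving.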
Re: [CODE4LIB] Autoscaling and streaming apps on EC2
> I might be missing something, but it seems to me that you are comparing using CloudFront to trying to build your own CloudFront. Building your own does not seem like it would be very easy or cost effective. Essentially, S3 is an NFS, innit? We use it that way. What is the issue with CloudFront?

There's no philosophical problem with CloudFront, but there might be practical ones. While I should theoretically be able to use s3fs to let the software interact seamlessly with S3, the software also assumes it is streaming the media files itself rather than handing them off to an external service. Maybe this change will be easy to implement, maybe it won't -- I won't know until I try. If it isn't, I need to come up with a Plan B.

At this point in time, I'm just trying to make sure I understand my major options for setting up the service.

kyle