[CODE4LIB] Job: Developer Needed for Omeka/Scripto + Wordpress Newman website archive at The National Institute for Newman Studies
Developer Needed for Omeka/Scripto + Wordpress Newman website archive
The National Institute for Newman Studies
Pittsburgh

Please follow the link below to see all the project details: https://docs.google.com/document/d/1poSC-A7V_TPhlx2uXKSx2zUdXJl_i4Dtz3q0jiIBuVc/edit

Please contact me ASAP; we need to get this project started and finished.

Brought to you by code4lib jobs: http://jobs.code4lib.org/job/22120/ To post a new job please visit http://jobs.code4lib.org/
[CODE4LIB] Job: RESEARCH DATA MANAGEMENT LIBRARIAN at Indiana University Bloomington
RESEARCH DATA MANAGEMENT LIBRARIAN
Indiana University Bloomington
Bloomington

**RESEARCH DATA MANAGEMENT LIBRARIAN, ASSISTANT LIBRARIAN OR ASSOCIATE LIBRARIAN, INDIANA UNIVERSITY BLOOMINGTON LIBRARIES**

Founded in 1820, Indiana University Bloomington has grown from a small state seminary into the flagship campus of a great public university with over 42,000 students and almost 3,000 faculty. Innovation, creativity, and academic freedom are hallmarks of IU Bloomington and its world-class contributions in research and the arts. The campus covers over 1,800 wooded acres and is distinctive for both its park-like beauty and an architectural heritage inspired by local craftsmanship in limestone.

The Indiana University Bloomington Libraries (http://www.libraries.iub.edu) are among the leading academic research library systems in North America, having recently been named the top university library by the Association of College and Research Libraries. The IUB Libraries provide strong collections, quality service and instructional programs, and leadership in the application of information technologies. The collections support every academic discipline on campus and include more than 6.6 million books, journals, maps, films, and audio/visual materials in over 900 languages. Users can access more than 400 databases, 43,000 electronic journals, and 224,000 electronic books, as well as locally developed digital content. Of particular note are the 8-million-volume high-density Auxiliary Library Facility (ALF) for preservation of and access to the libraries' collections and archives, and the Lilly Library, the rare books, manuscripts, and special collections library of the Indiana University Libraries, Bloomington.
The IUB Libraries are active members of regional and national associations and consortia, including the Committee on Institutional Cooperation (CIC), the Association of Research Libraries (ARL), the Digital Library Federation (DLF), and the Hydra community, and are a founding member of HathiTrust, a shared digital repository. IU is the principal investigator for the Kuali Open Library Environment (OLE) and is working with academic library partners to develop a next-generation open source library management system. Indiana University is an organizational member of the Research Data Alliance, working internationally to bridge research data use and sharing across domains and disciplines.

The Indiana University Bloomington Libraries seek a Research Data Management Librarian to be part of a collaborative team that will plan and develop new services and promote existing services for research data management -- consultation, outreach and training, and repository services -- to meet the diverse needs of all scholars across the Bloomington campus. Working across units within the Libraries -- especially Library Technologies, Scholarly Communications, Digital Collections Services, and the Office of Scholarly Publishing -- and with subject librarians, the Research Data Management Librarian will provide data management expertise for both the Libraries and individual researchers as part of the Scholars' Commons suite of digital scholarship services. In addition to working with library units and scholars, this position will foster collaborations and relationships that complement the Libraries' capacity to support the University's interdisciplinary research and technology initiatives, building upon a foundation of successful library-campus collaborations to date, including partnerships with Indiana University's Office of Research Administration, University Information Technology Services, the Pervasive Technology Institute's Data to Insight Center, and the Office of the Vice Provost for Research.
These larger partnerships are instrumental in ensuring cohesion and collaboration in data management resources at the institutional level. Reporting to the Associate Dean for Library Technologies, this librarian will consult with faculty, graduate students, and other researchers on data management planning and data curation activities; develop instructional programming and documentation to support scholars in this area; and work with colleagues in Library Technologies and University Information Technology Services to adapt, design, and develop tools and repository services for storing and sharing research data. The successful candidate will demonstrate a clear vision of the services, infrastructure, and skills required to provide high-quality assistance and tools to IU researchers.

RESPONSIBILITIES

* Contribute to university- and campus-wide initiatives to develop and design policies, services, and infrastructure to enable faculty and students to preserve and make available, and thus maximize the utility of, their research data.
* Develop, enhance, deliver, and assess research data workflows for IUB faculty, students, and staff.
* Serve as a library consultant to IUB faculty, researchers, and project teams on the development of
[CODE4LIB] Processing Circ data
Hi all. What are you using to process circ data for ad-hoc queries? I usually extract csv or tab-delimited files - one row per item record, with identifying bib record data, then total checkouts over the given time period(s). I have been importing these into Access and then grouping them by bib record. I think that I've reached the limits of scalability for Access for this project now, with 250,000 item records.

Does anyone do this in R? My other go-to software for data processing is the RapidMiner free version. Or do you just use MySQL or another SQL database? I was looking into doing it in R with RSQLite (I just read about this and sqldf: http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because I'm sure my IT department will be skeptical of letting me have MySQL on my desktop. (I've moved into a much more users-don't-do-real-computing kind of environment.) I'm rusty enough in R that if anyone will give me some start-off data import code, that would be great.

Cindy Harper
E-services and periodicals librarian
Virginia Theological Seminary
Bishop Payne Library
3737 Seminary Road
Alexandria VA 22304
char...@vts.edu
703-461-1794
Re: [CODE4LIB] Processing Circ data
Another option might be to use OpenRefine (http://openrefine.org) - this should easily handle 250,000 rows. I find it good for basic data analysis, and there are extensions which offer some visualisations (e.g. the VIB BITs extension, which will plot simple data using d3: https://www.bits.vib.be/index.php/software-overview/openrefine). I've written an introduction to OpenRefine, available at http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 5 Aug 2015, at 21:07, Harper, Cynthia <char...@vts.edu> wrote:
> Hi all. What are you using to process circ data for ad-hoc queries? [...]
Re: [CODE4LIB] Processing Circ data
Hi Cindy,

This doesn't quite address your issue, but, unless you've hit the 2 GB Access size limit [1], Access can handle a good deal more than the 250,000 item records (rows, yes?) you cited. What makes you think you've hit the limit? Slowness, something else?

All the best,
Kevin

[1] https://support.office.com/en-us/article/Access-2010-specifications-1e521481-7f9a-46f7-8ed9-ea9dff1fa854

On 8/5/15 3:07 PM, Harper, Cynthia wrote:
> Hi all. What are you using to process circ data for ad-hoc queries? [...]
Re: [CODE4LIB] Processing Circ data
Well, I guess it could be bad data, but I don't know how to tell. I think I've done more than this before. I have a Find Duplicates query that groups by bib record number. That query seemed to take about 40 minutes to process. Then I added a criterion to limit it to only records that had 0 circs this year. That query displays the rotating cursor, then says Not Responding, then the cursor again, and loops through that for hours. Maybe I can find the bad data in Access, but I'd be glad to find more modern data-analysis software. My db is 136,256 KB, but adding that extra query will probably put it over the 2 GB mark. I've tried extracting to a csv, and that didn't work. Maybe I'll try a Make Table query to a separate db. Or the OpenRefine suggestion sounds good too.

Cindy Harper

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kevin Ford
Sent: Wednesday, August 05, 2015 4:23 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Processing Circ data

> Hi Cindy, This doesn't quite address your issue, but, unless you've hit the 2 GB Access size limit, Access can handle a good deal more than the 250,000 item records (rows, yes?) you cited. What makes you think you've hit the limit? Slowness, something else? [...]
Re: [CODE4LIB] Processing Circ data
On the surface, your difficulties suggest you may need to look at a few optimization tactics. Apologies if these are things you've already considered and addressed - just offering a suggestion. This page [1] is for Access 2003, but the items under "Improve query performance" should apply - I think - to newer versions also. I'll draw specific attention to 1) compacting the database; 2) making sure you have an index set up on the bib record number field and the number-of-circs field; and 3) making sure you are using the GROUP BY SQL syntax [2]. Now, I'm not terribly familiar with Access, so I can't actually help you with point-and-click instructions, but the above are common 'gotchas' that could be a problem regardless of RDBMS.

Yours,
Kevin

[1] https://support.microsoft.com/en-us/kb/209126
[2] http://www.w3schools.com/sql/sql_groupby.asp

On 8/5/15 4:01 PM, Harper, Cynthia wrote:
> Well, I guess it could be bad data, but I don't know how to tell. I think I've done more than this before. I have a Find Duplicates query that groups by bib record number. That query seemed to take about 40 minutes to process. [...]
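Kevin's indexing and GROUP BY points translate into a couple of lines of generic SQL. A sketch run through Python's sqlite3 module for illustration (Access's dialect differs slightly, and the table and column names here are assumptions, not Cindy's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id TEXT, bib_id TEXT, circs_this_year INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [("i1", "b1", 0), ("i2", "b1", 0), ("i3", "b2", 4)])

# Point 2: an index on the columns used for grouping and filtering lets
# the query planner avoid repeated full-table scans.
conn.execute("CREATE INDEX idx_items_bib_circs ON items (bib_id, circs_this_year)")

# Point 3: GROUP BY with HAVING finds bib records where no attached item
# circulated this year, in a single aggregate pass.
zero_circ = conn.execute("""SELECT bib_id
                            FROM items
                            GROUP BY bib_id
                            HAVING SUM(circs_this_year) = 0""").fetchall()
```

The HAVING clause filters on the aggregated sum rather than on individual rows, which is the part that is easy to get wrong when building the equivalent query through Access's designer.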
[CODE4LIB] Fwd: Survey on embedded metadata in digital objects
Dear Colleagues,

You are invited to participate in a survey designed to collect information on the practice of embedding metadata into digital objects. The purpose of the survey is to explore the costs and benefits of embedding additional (i.e., LAM-generated) metadata into digital objects, with the aim of evaluating current practice and defining best practices. The survey consists of a mix of closed- and open-ended questions. Participation should take 15-20 minutes.

*Please follow this link to complete the survey:* http://goo.gl/forms/okWuTIyTcN

Rachel Jaffe, Metadata Librarian, UC Santa Cruz, and Edward Corrado, Associate Dean, Library Technology Planning and Policy, University of Alabama, are conducting this survey.

*Participation is voluntary; participants will have the right to discontinue the survey at any point without penalty.* Information obtained from the online survey will be collected in a manner such that human subjects cannot be identified, directly or through identifiers linked to the subject. Data will be made available to the profession, along with an analysis of current practice and possibilities for future research. The University of California, Santa Cruz Institutional Review Board has determined that this survey qualifies as exempt from full IRB oversight. No harm to human subjects is expected to occur during the online survey.

*Deadline for completing the survey is September 15, 2015.*

Contact Rachel Jaffe at 831-502-7291 or jaf...@ucsc.edu, or Edward Corrado at 205-348-0266 or emcorr...@ua.edu with questions or concerns about this study. If you have questions about your rights as a participant in this research, please contact the University of California, Santa Cruz Office of Research Compliance Administration at 831-459-1473 or o...@ucsc.edu.

Regards,

Rachel Jaffe
Metadata Librarian
Metadata Services, University Library
University of California, Santa Cruz
1156 High Street
Santa Cruz, CA 95064
(831) 502-7291
jaf...@ucsc.edu

Edward M. Corrado
Associate Dean
Library Technology Planning and Policy, University Libraries
University of Alabama
Box 870266
Tuscaloosa, AL 35487-0266
(205) 348-0266
emcorr...@ua.edu